IBIS-AMI: Secure the AMI model and software with code signing

Preface:

During the weekly meetings of the IBIS Advanced Technology Modeling (ATM) task group earlier this year, a participant proposed requiring released IBIS-AMI models to be code signed, for identification and security reasons. While this direction was not pursued further in the end due to various considerations, several attendees suggested it would be helpful if someone with experience in this process could share it (e.g. at an IBIS summit). It just so happens that all of SPISim’s digital downloads, including executables, patches and AMI models, are digitally signed. So even though I am no computer security expert, I am writing this post to share some information and our process for you, the AMI model maker, to consider. If you decide to go this route, this post may even save you hundreds or even thousands of dollars on the certificate purchase (really!). So read on…

What is code signing:

According to Wikipedia [HERE], code signing is the process of digitally “signing” executables to confirm the software author and guarantee that the code has not been altered or corrupted since it was signed. That is, if SPISim creates an application or an AMI model, digitally signs it and then releases it, recipients will be able to verify that the binary files they have at hand, regardless of how they obtained them, are the originals. (I will use “software” to represent both executables and AMI models from this point on, as they are all binaries containing program code.) If the software’s content has been tampered with, for example by becoming corrupted during download or being injected with malicious instructions, its digital signature will no longer be valid. This will be detected by the operating system or UAC (User Account Control). The digital signature is embedded inside the software. It only guarantees the source of the files, not their behavior. So the software may still have bugs or crash… only now you know which author to blame 🙂

Code signing makes use of an asymmetric (public/private key) cipher algorithm. The certificate file, which bundles the signing key with the publisher’s identity, is platform neutral. However, the code signing process itself is platform dependent, so Windows software needs to be signed on Windows, and likewise for OSX. Linux is a little different in this respect; you will find more information about this at the Wikipedia link above.
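To make the public/private key idea concrete, here is a minimal sketch in Python. It is a toy "textbook RSA" built from two well-known Mersenne primes, with no padding and a key far too small for real use, so please treat it purely as an illustration of the principle and not as how Windows, OSX or any CA actually implements signing. All names in it are made up for this example.

```python
import hashlib

# Toy textbook-RSA key built from two known Mersenne primes.
# Illustration only: no padding, tiny key, NOT secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """SHA-256 digest of the data, reduced to fit under the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Publisher side: transform the digest with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recipient side: undo the transform with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(data)


model_bytes = b"pretend this is the content of an AMI .dll"
sig = sign(model_bytes)
print(verify(model_bytes, sig))            # untouched file
print(verify(model_bytes + b"\x90", sig))  # same file with one byte appended
```

The first check passes and the second fails, because the signature was computed over the digest of the original bytes. Only the holder of the private exponent can produce a matching signature, while anyone with the public values can check it.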

In the subsequent sections, I will use the Windows platform as an example.

What does signed software look like:

If your software is unsigned, such as the application shown below, users will see “Publisher Unknown” on a yellow background. This may raise a red flag depending on your company’s IT policy or the computer’s UAC settings.

 

On the other hand, a signed application like ours will look like this (blue):

Besides showing “Verified”, more information such as company name, address, certificate issuer and validity period is also available on request, as it is embedded in the software.

In fact, as soon as you download a software application, you may right-click it and open its properties to check the “Digital Signatures” tab. Only signed binaries have this tab:

For a signed web-based application, such as our free SPISim_AMI and SPISim_LINK web apps, a browser like Chrome will let the user run it directly. An unsigned or self-signed (more on this later) web app will most likely be blocked.

Regarding .dll files (e.g. IBIS-AMI models), an unsigned one will not have any source info:

A signed one will look like this:

 

Why you should have your software signed:

For a desktop or server-based application, a platform mechanism such as Windows’ UAC will check the signature during the initial installation process. For a web-based application, the browser serves as a gatekeeper to detect or even block unsigned ones from running, as they present more frequent dangers. For a .dll file such as an IBIS-AMI model, users need to right-click the file and check its properties themselves, unless the application that loads the dll checks the signature before calling the APIs within. At this point in time, I am not aware of any EDA tool or simulator which does so. Nevertheless, code signing your software is like putting a seal of approval on it. It makes your product look more professional and also shows that, as a software or model publisher, you care about your customers’ or clients’ cyber security when your product is being used.

How to have your software signed:

Now that you are interested in getting your software signed, here is the process:

  • Purchase a certificate:

You will need a certificate (together with its private key) to sign the software. The certificate binds your identity to your key. Only a reputable entity can serve as a certificate authority (CA) and issue such a certificate. You can also create a self-signed certificate, but that may only be useful within the same organization (intranet). For public distribution, the CA will verify that you are who you say you are before signing the identity you have provided with their own key (i.e. they vouch for your information) to form the certificate. This is like a chain of trust. This way you can’t pretend to be Microsoft or Google etc. It is a very involved process that includes human effort, so it may not be cheap! By the way, a certificate is not forever… it expires after a certain period of time.

If you search for “purchase certificate for code signing” online, the top results are probably quite expensive:

I am going to save you money here… Visit tucows.com, open an account and purchase the certificate there. Instead of thousands, you pay less than two hundred for a three-year certificate!

  • Validate your identification:

During the purchasing process, you will be asked to provide identity information such as your personal or company name, company email address, phone number, physical address etc. All the information provided must be verifiable by the entity you are purchasing from. For example, you will need to have a public (government or state) record which shows the same information. You may be asked to provide IDs and to verify that you are the domain owner. You will then be asked to wait for a period of time, possibly with a phone call from the CA. They will go through a human-based verification process to make sure the data you provided is valid and you are who you say you are. Only after that will they email you the link to download/install the certificate they have issued for you. A supported web browser like Firefox is required to install the issued certificate.

For example, here is the info. requested during the purchasing process:

  • Download and backup certificate

The issued certificate will be delivered via a clickable link and can only be installed in the same browser used during the application/purchasing process. It is very important that you install and export this certificate right away, then back it up to a different location. With a backup of the certificate, you can use another computer to sign software, or the same computer after an OS re-install. You will also need this certificate as a file as part of the build and release process.

More info. about how to do this is available [HERE]

  • Make code signing part of your build/release process

Once the certificate file is available, you may incorporate it into your software release flow by signing and then verifying the signature at the end of the build process. This is platform and OS-bit (32-bit or 64-bit) dependent. The white blocks below cover the password I used with our certificate.

Signing and verifying an IBIS-AMI dll file:

Signing a web-app jar file:

Signing and verifying an installer:

Time-stamping with a time-stamp server is required:

Signtool64 above is a Windows tool from Microsoft. Java has its own version:
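As a rough sketch of how these steps might be scripted in a release flow, the Python helpers below only assemble command lines for Microsoft's signtool and the JDK's jarsigner. The file names, keystore alias and time-stamp URL are placeholders I made up for illustration, not our actual settings; the switches used (`/f`, `/p`, `/fd`, `/tr`, `/td`, `/pa` for signtool and `-keystore`, `-storetype`, `-tsa` for jarsigner) are standard ones, but please check them against the tool versions you have installed.

```python
import subprocess
from typing import List


def signtool_sign_cmd(target: str, pfx_file: str, password: str,
                      timestamp_url: str, signtool: str = "signtool") -> List[str]:
    """Command line to sign a .dll/.exe/installer with a certificate file."""
    # Note: a password on the command line can show up in logs and process
    # lists, so a real build script may want to handle it differently.
    return [signtool, "sign",
            "/f", pfx_file,          # exported certificate file
            "/p", password,          # password protecting that file
            "/fd", "sha256",         # file digest algorithm
            "/tr", timestamp_url,    # time-stamp server
            "/td", "sha256",         # time-stamp digest algorithm
            target]


def signtool_verify_cmd(target: str, signtool: str = "signtool") -> List[str]:
    """Command line to verify the signature of a signed binary."""
    return [signtool, "verify", "/pa", target]


def jarsigner_sign_cmd(jar_file: str, keystore: str, alias: str,
                       timestamp_url: str, jarsigner: str = "jarsigner") -> List[str]:
    """Command line to sign a jar; jarsigner prompts for the password."""
    return [jarsigner,
            "-keystore", keystore,
            "-storetype", "pkcs12",
            "-tsa", timestamp_url,
            jar_file, alias]


def run_step(cmd: List[str]) -> bool:
    """Run one step and report success (exit code 0)."""
    return subprocess.run(cmd).returncode == 0


# Placeholder names, for illustration only.
TSA_URL = "http://timestamp.example.com"
print(" ".join(signtool_sign_cmd("my_model.dll", "codesign.pfx", "****", TSA_URL)))
print(" ".join(signtool_verify_cmd("my_model.dll")))
print(" ".join(jarsigner_sign_cmd("my_webapp.jar", "codesign.pfx", "my_alias", TSA_URL)))
```

In a build script one could call `run_step(...)` on the sign command, then on the verify command, and stop the release if either returns False.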

 

Unsign signed software:

Because a certificate is valid only through a certain period of time (e.g. three years), it will “expire” after that. While not immediately after the expiration date, you will soon be unable to sign software with the expired certificate. Time-stamping via a time server during the signing process is a mechanism that prevents you from back-dating the software to be released… That means you will need to keep your certificate up to date by purchasing a new one. A platform mechanism such as Windows’ UAC may also prevent one from installing software signed with an outdated certificate. The last possibility is that a certificate may be revoked because the “chain of trust” has been broken somewhere upstream.

More often than not, once an AMI model is released, it may stay out there beyond the original publisher’s control for quite some time, and its signature may then expire. Signed software can be un-signed to strip this time-stamped signature if IT policies prohibit installing outdated binaries. More information may be obtained [HERE]

Finally:

I hope this blog post will be helpful for model makers publishing their AMI models or similar software. Another good blog article on this topic may be found [HERE]

AI/ML: Use ML techniques for CTLE AMI modeling

Note:

  • The dataset and IPython notebook for this post are available at SPISim’s github page: [HERE]
  • This notebook may be better rendered by nbviewer and can be viewed [HERE].

Use ML techniques for SERDES CTLE modeling for IBIS-AMI simulation

Table of contents:

1.Motivation ...

2.Problem Statements ...

3.Generate Data ...

4.Prepare Data ...

5.Choose a Model ...

6.Training ...

7.Prediction ...

8.Reverse Direction ...

9.Deployment ...

10.Conclusion ...

Motivation:

One of SPISim's main services is IBIS-AMI modeling and consulting. In most cases, this happens when an IC company is ready to release simulation models for its customers to do system design. We then require data from circuit simulation, lab measurements or data sheet specs in order to create the associated IBIS-AMI model. Occasionally, we also receive requests to provide AMI models for architecture planning purposes. In this situation, there is no data or spec. Instead, the client asks to input performance parameters dynamically (not as presets) so that they can evaluate performance at the architecture level before committing to a certain spec and designing accordingly. In such cases, we may need to generate data dynamically based on the user's input before it is fed into existing IBIS-AMI models of the same kind. The continuous-time linear equalizer, CTLE, is often used in modern SERDES or even DDR5 designs. It is basically a filter in the frequency domain (FD) with various peaking and gain properties. As IBIS-AMI is simulated in the time domain (TD), the core implementation in the model is an iFFT that converts the response into an impulse response in TD.

CtleFDTD

In order to create such a CTLE AMI model from a user-provided spec on the fly, we would like to avoid a time-consuming parameter sweep (in order to match the performance) at runtime during the initial set-up call. Thus machine learning techniques may be applied to help us build a prediction model that maps input attributes to the associated CTLE design parameters, so that its FD and TD responses can be generated directly. After that, we can feed the data into existing CTLE C/C++ code blocks for channel simulation.

Problem Statement:

We would like to build a prediction model such that when the user provides desired CTLE performance parameters, such as DC gain, peaking frequency and peaking value, this model maps them to the corresponding CTLE design parameters, such as pole and zero locations. Once this is mapped, the CTLE frequency response is generated, followed by the time-domain impulse response. The resulting IBIS-AMI CTLE model can be used in channel simulation to evaluate the CTLE block's impact in a system... before the actual silicon design has been done.
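To illustrate the last two steps (frequency response, then impulse response), here is a minimal numpy sketch. The transfer function is a generic one-zero, two-pole peaking filter that I am assuming for illustration; it is not necessarily the exact C2M or COM equation, and the parameter values and sample rate are arbitrary picks rather than values from our models.

```python
import numpy as np


def ctle_response(freq_hz, gdc, fz, fp1, fp2):
    """Generic one-zero, two-pole peaking filter (assumed form, see text)."""
    jf = 1j * np.asarray(freq_hz, dtype=float)
    return gdc * (1 + jf / fz) / ((1 + jf / fp1) * (1 + jf / fp2))


def impulse_response(gdc, fz, fp1, fp2, sample_rate_hz, num_samples):
    """Sample the FD response on the FFT grid and convert to TD with an iFFT."""
    freq = np.fft.rfftfreq(num_samples, d=1.0 / sample_rate_hz)
    spectrum = ctle_response(freq, gdc, fz, fp1, fp2)
    # irfft assumes a real time-domain signal, i.e. a conjugate-symmetric spectrum
    return np.fft.irfft(spectrum, n=num_samples)


h = impulse_response(gdc=0.5, fz=1e9, fp1=1e10, fp2=2e10,
                     sample_rate_hz=3.2e11, num_samples=4096)
print(h.shape)   # one impulse-response sample per time step
print(h.sum())   # the samples add up to the DC gain (0.5 here)
```

The sum of the time-domain samples equals the DC bin of the spectrum, which makes a handy sanity check that the conversion did not lose or rescale anything.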

Generate Data:

Overview

The model to be built is for numerical prediction (i.e. regression) of about three to six attributes, depending on the CTLE structure, from four input parameters, namely dc gain, peak frequency, peak value and bandwidth. We will sample the input space randomly (as the full combinatorial set is impractical), then perform measurements programmatically in order to generate a large enough dataset for modeling purposes.
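A minimal sketch of such random sampling on a quantized grid could look like the following. The ranges are taken from the min/max values of the COM CTLE dataset shown later in this post, but the step sizes, the function names and the seed handling are my assumptions for illustration and may differ from the actual script.

```python
import random

# (low, high, step) per design parameter; the steps are assumed values.
GRID = {
    "Gdc": (0.20, 0.95, 0.05),
    "P1": (1.0e10, 2.45e10, 0.5e9),
    "P2": (1.0e9, 9.5e9, 0.5e9),
    "Z1": (1.0e9, 9.5e9, 0.5e9),
}


def grid_values(lo, hi, step):
    """All quantized values from lo to hi inclusive."""
    count = int(round((hi - lo) / step)) + 1
    return [lo + i * step for i in range(count)]


def sample_configs(num_points, seed=0):
    """Draw random configurations, one independent pick per parameter."""
    rng = random.Random(seed)
    axes = {name: grid_values(*spec) for name, spec in GRID.items()}
    return [{name: rng.choice(values) for name, values in axes.items()}
            for _ in range(num_points)]


configs = sample_configs(1000, seed=1)
print(len(configs))
print(configs[0])
```

Fixing the seed makes the dataset reproducible, which helps when the same configurations need to be regenerated for debugging.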

Defining CTLE equations

We will target the following two most common CTLE structures:

IEEE 802.3bs 200/400Gbe Chip-to-module (C2M) CTLE:

C2M_CTLE

IEEE 802.3bj Channel Operating Margin (COM) CTLE:

COM_CTLE

Define sampling points and attributes

All these pole and zero values are continuous (numerical), so a sub-sampling of the full solution space will be performed. Once the frequency response corresponding to each configuration is generated, we will measure its dc gain, peak frequency and value, bandwidth (3dB loss from the peak), and the frequencies at which 10% and 50% of the gain between the dc and peak values are reached. The last two attributes will help us increase the number of attributes available when creating the prediction model.
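The measurements described above can be sketched as follows. This is my reading of the attribute definitions, not a copy of the actual script: the bandwidth is taken as the first frequency past the peak where the magnitude falls 3dB below the peak, and the 10%/50% frequencies as the first points on the rising edge that reach that fraction of the dc-to-peak gain. The CTLE form is again a generic one-zero, two-pole filter assumed for illustration.

```python
import numpy as np


def ctle_magnitude(freq_hz, gdc, fz, fp1, fp2):
    """Magnitude of a generic one-zero, two-pole peaking filter (assumed form)."""
    jf = 1j * np.asarray(freq_hz, dtype=float)
    return np.abs(gdc * (1 + jf / fz) / ((1 + jf / fp1) * (1 + jf / fp2)))


def measure(freq, mag):
    """Extract the performance attributes from one magnitude response."""
    dc_gain = float(mag[0])
    peak_idx = int(np.argmax(mag))
    peak_f, peak_val = float(freq[peak_idx]), float(mag[peak_idx])

    # Bandwidth: first point past the peak that is 3 dB below the peak,
    # or the last frequency if the response never drops that far.
    past_peak = np.nonzero(mag[peak_idx:] <= peak_val / np.sqrt(2.0))[0]
    band_w = float(freq[peak_idx + past_peak[0]]) if past_peak.size else float(freq[-1])

    def rise_freq(fraction):
        # First point on the rising edge reaching dc + fraction * (peak - dc).
        level = dc_gain + fraction * (peak_val - dc_gain)
        return float(freq[int(np.argmax(mag[:peak_idx + 1] >= level))])

    return {"Gain": dc_gain, "PeakF": peak_f, "PeakVal": peak_val,
            "BandW": band_w, "Freq10": rise_freq(0.1), "Freq50": rise_freq(0.5)}


freq = np.linspace(0.0, 4e10, 4001)
peaking = measure(freq, ctle_magnitude(freq, 0.5, 1e9, 1e10, 2e10))
flat = measure(freq, ctle_magnitude(freq, 0.5, 5e9, 1e10, 2e9))
print(peaking)
print(flat["PeakF"])  # 0.0: the maximum sits at DC, so there is no peaking
```

With this assumed filter form, a configuration whose zero is not sufficiently below the poles has its maximum at DC, so its measured peak frequency comes out as zero.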

Attributes to be extracted:

CTLEAttr

Synthesize and measure data

A python script (so that it can be used with subsequent Q-learning w/ OpenAI later) has been written to synthesize these frequency responses and perform the measurements at the same time.

Quantize and Sampling:

Sampling

Synthesize:

Synthesize

Measurement:

Measurement

The end result of this data generation phase is a 100,000-point dataset for each of the two CTLE structures. We can now proceed to the prediction modeling.

Prepare Data:

In [42]:
## Using COM CTLE as an example below:

# Environment Setup:
import os
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

prjHome = 'C:/Temp/WinProj/CTLEMdl'
workDir = prjHome + '/wsp/'
srcFile = prjHome + '/dat/COM_CTLEData.csv'

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(workDir, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)
In [43]:
# Read Data
srcData = pd.read_csv(srcFile)

# Info about the data
srcData.head()
srcData.info()
srcData.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 11 columns):
ID         100000 non-null int64
Gdc        100000 non-null float64
P1         100000 non-null float64
P2         100000 non-null float64
Z1         100000 non-null float64
Gain       100000 non-null float64
PeakF      100000 non-null float64
PeakVal    100000 non-null float64
BandW      100000 non-null float64
Freq10     100000 non-null float64
Freq50     100000 non-null float64
dtypes: float64(10), int64(1)
memory usage: 8.4 MB
Out[43]:
ID Gdc P1 P2 Z1 Gain PeakF PeakVal BandW Freq10 Freq50
count 100000.000000 100000.000000 1.000000e+05 1.000000e+05 1.000000e+05 100000.000000 1.000000e+05 100000.000000 1.000000e+05 1.000000e+05 1.000000e+05
mean 49999.500000 0.574911 1.724742e+10 5.235940e+09 5.241625e+09 0.574911 3.428616e+09 2.374620 1.496343e+10 4.145855e+08 1.136585e+09
std 28867.657797 0.230302 4.323718e+09 2.593138e+09 2.589769e+09 0.230302 4.468393e+09 3.346094 1.048535e+10 5.330043e+08 1.446972e+09
min 0.000000 0.200000 1.000000e+10 1.000000e+09 1.000000e+09 0.200000 0.000000e+00 0.200000 9.965473e+08 0.000000e+00 -1.000000e+00
25% 24999.750000 0.350000 1.350000e+10 3.000000e+09 3.000000e+09 0.350000 0.000000e+00 0.500000 4.770445e+09 0.000000e+00 -1.000000e+00
50% 49999.500000 0.600000 1.700000e+10 5.000000e+09 5.000000e+09 0.600000 0.000000e+00 0.800000 1.410597e+10 0.000000e+00 -1.000000e+00
75% 74999.250000 0.750000 2.100000e+10 7.500000e+09 7.500000e+09 0.750000 7.557558e+09 2.710536 2.386211e+10 8.974728e+08 2.510339e+09
max 99999.000000 0.950000 2.450000e+10 9.500000e+09 9.500000e+09 0.950000 1.516517e+10 16.731528 3.965752e+10 1.768803e+09 4.678211e+09

The data looks complete, with no missing values. Let's plot some distributions...

In [44]:
# Drop the ID column
mdlData = srcData.drop(columns=['ID'])

# plot distribution
mdlData.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
Saving figure attribute_histogram_plots

There are abnormally high peaks for Freq10, Freq50 and PeakF. We need to plot the FD data to see what's going on...

Error checking:

NoPeaking

Apparently, this is caused by CTLE configurations without peaking. We can safely remove these data points as they will not be used in an actual design.

In [45]:
# Drop rows whose peak sits at the start of the frequency axis (i.e. no peaking)
mdlTemp = mdlData[(mdlData['PeakF'] > 100)]
mdlTemp.info()


# plot distribution again
mdlTemp.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots2")
plt.show()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41588 entries, 1 to 99997
Data columns (total 10 columns):
Gdc        41588 non-null float64
P1         41588 non-null float64
P2         41588 non-null float64
Z1         41588 non-null float64
Gain       41588 non-null float64
PeakF      41588 non-null float64
PeakVal    41588 non-null float64
BandW      41588 non-null float64
Freq10     41588 non-null float64
Freq50     41588 non-null float64
dtypes: float64(10)
memory usage: 3.5 MB
Saving figure attribute_histogram_plots2

Now the distribution seems good. We can proceed to separate variables (i.e. attributes) and targets

In [46]:
# take this as modeling data from this point
mdlData = mdlTemp

varList = ['Gdc', 'P1', 'P2', 'Z1']
tarList = ['Gain', 'PeakF', 'PeakVal']

varData = mdlData[varList]
tarData = mdlData[tarList]

Choose a Model:

We will use Keras as the modeling framework. While it calls TensorFlow on our machine in this case, the GPU is only used for training. We will use a (shallow) neural network for modeling, as we want to implement the resulting model in our IBIS-AMI model's C++ code.

In [47]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

numVars = len(varList)  # independent variables
numTars = len(tarList)  # output targets
nnetMdl = Sequential()
# first hidden layer, fed by the numVars inputs
nnetMdl.add(Dense(units=64, activation='relu', input_dim=numVars))

# dropout and second hidden layer
nnetMdl.add(Dropout(0.3, noise_shape=None, seed=None))
nnetMdl.add(Dense(64, activation = "relu"))
nnetMdl.add(Dropout(0.2, noise_shape=None, seed=None))
          
# output layer
nnetMdl.add(Dense(units=numTars, activation='sigmoid'))
nnetMdl.compile(loss='mean_squared_error', optimizer='adam')

# Provide some info
#from keras.utils import plot_model
#plot_model(nnetMdl, to_file= workDir + 'model.png')
nnetMdl.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_10 (Dense)             (None, 64)                320       
_________________________________________________________________
dropout_7 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_11 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_8 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_12 (Dense)             (None, 3)                 195       
=================================================================
Total params: 4,675
Trainable params: 4,675
Non-trainable params: 0
_________________________________________________________________
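Since the trained network eventually has to run inside C++ code, it helps to see how little math the forward pass needs. The numpy sketch below mirrors the layer sizes above (4 inputs, two hidden layers of 64 with ReLU, 3 sigmoid outputs) using random placeholder weights; in practice the arrays would come from the trained model (for example via Keras' `get_weights()`), and the dropout layers are simply skipped because they are only active during training. The function names are made up for this illustration.

```python
import numpy as np

LAYER_SIZES = [4, 64, 64, 3]  # inputs, hidden, hidden, outputs


def init_random_weights(sizes, seed=0):
    """Placeholder (kernel, bias) pairs with the same shapes Keras would use."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x, layers):
    """Dense -> ReLU for hidden layers, Dense -> sigmoid for the output layer."""
    act = np.asarray(x, dtype=float)
    for idx, (kernel, bias) in enumerate(layers):
        z = act @ kernel + bias
        if idx == len(layers) - 1:
            act = 1.0 / (1.0 + np.exp(-z))   # sigmoid keeps outputs in 0~1
        else:
            act = np.maximum(z, 0.0)         # ReLU
    return act


layers = init_random_weights(LAYER_SIZES)
num_params = sum(kernel.size + bias.size for kernel, bias in layers)
out = forward(np.full((2, 4), 0.5), layers)
print(num_params)  # 4675, matching the summary above
print(out.shape)   # (2, 3)
```

Each layer is one matrix multiply, one vector add and an element-wise function, which maps directly onto a few loops in C++.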

Training:

We will use an 80%/20% training/testing split for the modeling. Note that we need to scale the input attributes to be between 0 and 1 so that the neurons' activation functions can be used to differentiate and calculate weights. The targets are scaled the same way, since the output layer uses a sigmoid. These scalers will be applied "inversely" when we predict the actual performance later on.

In [48]:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Prepare Training (tran) and Validation (test) dataset
varTran, varTest, tarTran, tarTest = train_test_split(varData, tarData, test_size=0.2)

# scale the data
from sklearn import preprocessing
varScal = preprocessing.MinMaxScaler()
varTran = varScal.fit_transform(varTran)
varTest = varScal.transform(varTest)

tarScal = preprocessing.MinMaxScaler()
tarTran = tarScal.fit_transform(tarTran)
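For reference, the min-max scaling used above boils down to the few lines below, shown here as a plain numpy sketch with made-up helper names. This is the arithmetic that would need to be reproduced on the C++ side, using the per-column minimum and maximum learned from the training data.

```python
import numpy as np


def fit_min_max(data):
    """Per-column minimum and maximum, learned from the training data."""
    data = np.asarray(data, dtype=float)
    return data.min(axis=0), data.max(axis=0)


def scale(data, lo, hi):
    """Map each column into 0~1: (x - min) / (max - min)."""
    return (np.asarray(data, dtype=float) - lo) / (hi - lo)


def unscale(scaled, lo, hi):
    """Inverse mapping back to the original units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo


# Two illustrative columns (e.g. a gain and a pole frequency in Hz).
sample = np.array([[0.20, 1.00e10],
                   [0.60, 1.70e10],
                   [0.95, 2.45e10]])
lo, hi = fit_min_max(sample)
scaled = scale(sample, lo, hi)
print(scaled)
print(unscale(scaled, lo, hi))
```

Note that the minimum and maximum are taken from the training set only and then reused unchanged for the test set and at prediction time, just as `fit_transform` and `transform` are used above.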

Now we can do the model fit:

In [49]:
# model fit
hist = nnetMdl.fit(varTran, tarTran, epochs=100, batch_size=1000, validation_split=0.1)
tarTemp = nnetMdl.predict(varTest, batch_size=1000)
#predict = tarScal.inverse_transform(tarTemp)
#resRMSE = np.sqrt(mean_squared_error(tarTest, predict))
resRMSE = np.sqrt(mean_squared_error(tarScal.transform(tarTest), tarTemp))
resRMSE
Train on 29943 samples, validate on 3327 samples
Epoch 1/100
29943/29943 [==============================] - 0s 12us/step - loss: 0.0632 - val_loss: 0.0462
Epoch 2/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0394 - val_loss: 0.0218
Epoch 3/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0218 - val_loss: 0.0090
Epoch 4/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0134 - val_loss: 0.0046
Epoch 5/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0102 - val_loss: 0.0039
Epoch 6/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0090 - val_loss: 0.0036
Epoch 7/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0082 - val_loss: 0.0032
Epoch 8/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0075 - val_loss: 0.0030
Epoch 9/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0071 - val_loss: 0.0027
Epoch 10/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0067 - val_loss: 0.0025
Epoch 11/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0063 - val_loss: 0.0022
Epoch 12/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0059 - val_loss: 0.0020
Epoch 13/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0056 - val_loss: 0.0018
Epoch 14/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0053 - val_loss: 0.0017
Epoch 15/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0050 - val_loss: 0.0015
Epoch 16/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0048 - val_loss: 0.0014
Epoch 17/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0046 - val_loss: 0.0013
Epoch 18/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0045 - val_loss: 0.0012
Epoch 19/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0043 - val_loss: 0.0011
Epoch 20/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0042 - val_loss: 0.0011
Epoch 21/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0041 - val_loss: 9.9891e-04
Epoch 22/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0040 - val_loss: 9.5673e-04
Epoch 23/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0039 - val_loss: 9.1935e-04
Epoch 24/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0038 - val_loss: 8.7424e-04
Epoch 25/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0037 - val_loss: 8.3335e-04
Epoch 26/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0036 - val_loss: 8.0617e-04
Epoch 27/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0035 - val_loss: 7.7511e-04
Epoch 28/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0035 - val_loss: 7.6336e-04
Epoch 29/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0034 - val_loss: 7.4145e-04
Epoch 30/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0034 - val_loss: 7.1555e-04
Epoch 31/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0033 - val_loss: 6.8232e-04
Epoch 32/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0033 - val_loss: 6.8118e-04
Epoch 33/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0033 - val_loss: 6.5987e-04
Epoch 34/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0032 - val_loss: 6.5535e-04
Epoch 35/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0032 - val_loss: 6.4880e-04
Epoch 36/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0032 - val_loss: 6.2126e-04
Epoch 37/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0031 - val_loss: 6.1235e-04
Epoch 38/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0031 - val_loss: 6.0875e-04
Epoch 39/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.8204e-04
Epoch 40/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.8521e-04
Epoch 41/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.8456e-04
Epoch 42/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.5742e-04
Epoch 43/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.5412e-04
Epoch 44/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.5415e-04
Epoch 45/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.3159e-04
Epoch 46/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.2046e-04
Epoch 47/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0028 - val_loss: 5.1748e-04
Epoch 48/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0028 - val_loss: 5.1205e-04
Epoch 49/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0027 - val_loss: 5.0424e-04
Epoch 50/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0028 - val_loss: 4.9067e-04
Epoch 51/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0027 - val_loss: 4.7902e-04
Epoch 52/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0027 - val_loss: 4.7667e-04
Epoch 53/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0026 - val_loss: 4.6521e-04
Epoch 54/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0026 - val_loss: 4.6684e-04
Epoch 55/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0026 - val_loss: 4.7006e-04
Epoch 56/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0026 - val_loss: 4.5770e-04
Epoch 57/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0025 - val_loss: 4.3075e-04
Epoch 58/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0025 - val_loss: 4.3796e-04
Epoch 59/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0025 - val_loss: 4.3114e-04
Epoch 60/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 4.1051e-04
Epoch 61/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 4.0642e-04
Epoch 62/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 4.1214e-04
Epoch 63/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 3.9472e-04
Epoch 64/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0024 - val_loss: 3.9697e-04
Epoch 65/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0024 - val_loss: 3.8548e-04
Epoch 66/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0024 - val_loss: 3.9030e-04
Epoch 67/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.7588e-04
Epoch 68/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0024 - val_loss: 3.6643e-04
Epoch 69/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.6973e-04
Epoch 70/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.6345e-04
Epoch 71/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.5743e-04
Epoch 72/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0023 - val_loss: 3.5294e-04
Epoch 73/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.6533e-04
Epoch 74/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.5859e-04
Epoch 75/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0022 - val_loss: 3.3832e-04
Epoch 76/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0022 - val_loss: 3.5197e-04
Epoch 77/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.4445e-04
Epoch 78/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.3888e-04
Epoch 79/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.3597e-04
Epoch 80/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.2317e-04
Epoch 81/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0021 - val_loss: 3.2205e-04
Epoch 82/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.4191e-04
Epoch 83/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.2288e-04
Epoch 84/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1419e-04
Epoch 85/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1307e-04
Epoch 86/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1795e-04
Epoch 87/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1200e-04
Epoch 88/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.0641e-04
Epoch 89/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.2401e-04
Epoch 90/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.0903e-04
Epoch 91/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.1448e-04
Epoch 92/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0788e-04
Epoch 93/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0349e-04
Epoch 94/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0098e-04
Epoch 95/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0020 - val_loss: 3.1119e-04
Epoch 96/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0249e-04
Epoch 97/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 2.8934e-04
Epoch 98/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 2.9429e-04
Epoch 99/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 2.8466e-04
Epoch 100/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0019 - val_loss: 3.0773e-04
Out[49]:
0.01786895428237113

Let's see how this neural network learns over the epochs.

In [50]:
# plot history
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])  # validation curve, so both legend entries have a line
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

Looks quite reasonable. We can now save the Keras model together with the scalers for later evaluation.

In [51]:
# save model and architecture to single file
nnetMdl.save(workDir + "COM_nnetMdl.h5")

# also save scaler
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(varScal, workDir + 'VarScaler.save') 
joblib.dump(tarScal, workDir + 'TarScaler.save') 
print("Saved model to disk")
Saved model to disk

Prediction:

Now let's use this model to make some predictions.

In [52]:
# generate prediction
predict = tarScal.inverse_transform(tarTemp)
allData = np.concatenate([varTest, tarTest, predict], axis = 1)
allData.shape
# one flat header row: inputs, actual targets, predicted targets
headLst = list(varList) + list(tarList) + ['Pred_' + str(t) for t in tarList]
headStr = ','.join(headLst)
np.savetxt(workDir + 'COMCtleIOP.csv', allData, delimiter=',', header=headStr)

Let's take 50 points and see how well the predictions work.

In [53]:
# plot some data
begIndx = 100
endIndx = 150
indxAry = np.arange(0, len(varTest), 1)

plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,0][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,0][begIndx:endIndx])
Out[53]:
<matplotlib.collections.PathCollection at 0x242d059d390>
In [54]:
# Plot Peak Freq.
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,1][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,1][begIndx:endIndx])
Out[54]:
<matplotlib.collections.PathCollection at 0x242d72df2e8>
In [55]:
# Plot Peak Value
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,2][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,2][begIndx:endIndx])
Out[55]:
<matplotlib.collections.PathCollection at 0x242d5ea39e8>

Reverse Direction:

The goal of this modeling is to map performance to the CTLE's pole and zero locations. What we just did goes the other way around (to make sure such a neural network structure meets our needs). Now we need to reverse the direction for the actual modeling. To provide more attributes for better predictions, we will also use the frequencies at which the 10% and 50% gain points occur as part of the input attributes.

In [56]:
tarList = ['Gdc', 'P1', 'P2', 'Z1']
varList = ['Gain', 'PeakF', 'PeakVal', 'Freq10', 'Freq50']

varData = mdlData[varList]
tarData = mdlData[tarList]
In [57]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

numVars = len(varList)  # independent variables
numTars = len(tarList)  # output targets
nnetMdl = Sequential()
# input layer
nnetMdl.add(Dense(units=64, activation='relu', input_dim=numVars))

# hidden layers
nnetMdl.add(Dropout(0.3, noise_shape=None, seed=None))
nnetMdl.add(Dense(64, activation = "relu"))
nnetMdl.add(Dropout(0.2, noise_shape=None, seed=None))
          
# output layer
nnetMdl.add(Dense(units=numTars, activation='sigmoid'))
nnetMdl.compile(loss='mean_squared_error', optimizer='adam')

# Provide some info
#from keras.utils import plot_model
#plot_model(nnetMdl, to_file= workDir + 'model.png')
nnetMdl.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_13 (Dense)             (None, 64)                384       
_________________________________________________________________
dropout_9 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_14 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_10 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_15 (Dense)             (None, 4)                 260       
=================================================================
Total params: 4,804
Trainable params: 4,804
Non-trainable params: 0
_________________________________________________________________
In [58]:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Prepare Training (tran) and Validation (test) dataset
varTran, varTest, tarTran, tarTest = train_test_split(varData, tarData, test_size=0.2)

# scale the data
from sklearn import preprocessing
varScal = preprocessing.MinMaxScaler()
varTran = varScal.fit_transform(varTran)
varTest = varScal.transform(varTest)

tarScal = preprocessing.MinMaxScaler()
tarTran = tarScal.fit_transform(tarTran)
In [59]:
# model fit
hist = nnetMdl.fit(varTran, tarTran, epochs=100, batch_size=1000, validation_split=0.1)
tarTemp = nnetMdl.predict(varTest, batch_size=1000)
#predict = tarScal.inverse_transform(tarTemp)
#resRMSE = np.sqrt(mean_squared_error(tarTest, predict))
resRMSE = np.sqrt(mean_squared_error(tarScal.transform(tarTest), tarTemp))
resRMSE
Train on 29943 samples, validate on 3327 samples
Epoch 1/100
29943/29943 [==============================] - 0s 15us/step - loss: 0.0800 - val_loss: 0.0638
Epoch 2/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0578 - val_loss: 0.0457
Epoch 3/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0458 - val_loss: 0.0380
Epoch 4/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0408 - val_loss: 0.0344
Epoch 5/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0378 - val_loss: 0.0317
Epoch 6/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0354 - val_loss: 0.0299
Epoch 7/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0340 - val_loss: 0.0287
Epoch 8/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0327 - val_loss: 0.0276
Epoch 9/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0315 - val_loss: 0.0265
Epoch 10/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0303 - val_loss: 0.0254
Epoch 11/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0293 - val_loss: 0.0244
Epoch 12/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0284 - val_loss: 0.0235
Epoch 13/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0274 - val_loss: 0.0225
Epoch 14/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0266 - val_loss: 0.0215
Epoch 15/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0257 - val_loss: 0.0202
Epoch 16/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0245 - val_loss: 0.0189
Epoch 17/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0235 - val_loss: 0.0175
Epoch 18/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0223 - val_loss: 0.0160
Epoch 19/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0213 - val_loss: 0.0146
Epoch 20/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0202 - val_loss: 0.0135
Epoch 21/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0193 - val_loss: 0.0125
Epoch 22/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0186 - val_loss: 0.0117
Epoch 23/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0179 - val_loss: 0.0109
Epoch 24/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0171 - val_loss: 0.0104
Epoch 25/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0167 - val_loss: 0.0099
Epoch 26/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0163 - val_loss: 0.0094
Epoch 27/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0156 - val_loss: 0.0090
Epoch 28/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0154 - val_loss: 0.0087
Epoch 29/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0151 - val_loss: 0.0084
Epoch 30/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0146 - val_loss: 0.0081
Epoch 31/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0143 - val_loss: 0.0077
Epoch 32/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0139 - val_loss: 0.0074
Epoch 33/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0138 - val_loss: 0.0072
Epoch 34/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0135 - val_loss: 0.0071
Epoch 35/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0132 - val_loss: 0.0069
Epoch 36/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0131 - val_loss: 0.0068
Epoch 37/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0130 - val_loss: 0.0066
Epoch 38/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0129 - val_loss: 0.0065
Epoch 39/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0127 - val_loss: 0.0063
Epoch 40/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0124 - val_loss: 0.0062
Epoch 41/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0122 - val_loss: 0.0061
Epoch 42/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0122 - val_loss: 0.0059
Epoch 43/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0120 - val_loss: 0.0059
Epoch 44/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0119 - val_loss: 0.0057
Epoch 45/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0118 - val_loss: 0.0057
Epoch 46/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0117 - val_loss: 0.0056
Epoch 47/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0115 - val_loss: 0.0055
Epoch 48/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0114 - val_loss: 0.0055
Epoch 49/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0114 - val_loss: 0.0055
Epoch 50/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0112 - val_loss: 0.0053
Epoch 51/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0111 - val_loss: 0.0052
Epoch 52/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0112 - val_loss: 0.0052
Epoch 53/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0111 - val_loss: 0.0052
Epoch 54/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0109 - val_loss: 0.0051
Epoch 55/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0108 - val_loss: 0.0050
Epoch 56/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0107 - val_loss: 0.0049
Epoch 57/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0107 - val_loss: 0.0050
Epoch 58/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0107 - val_loss: 0.0049
Epoch 59/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0106 - val_loss: 0.0048
Epoch 60/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0104 - val_loss: 0.0047
Epoch 61/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0103 - val_loss: 0.0046
Epoch 62/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0102 - val_loss: 0.0046
Epoch 63/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0103 - val_loss: 0.0046
Epoch 64/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0102 - val_loss: 0.0045
Epoch 65/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0101 - val_loss: 0.0044
Epoch 66/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0101 - val_loss: 0.0044
Epoch 67/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0100 - val_loss: 0.0045
Epoch 68/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0098 - val_loss: 0.0043
Epoch 69/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0098 - val_loss: 0.0043
Epoch 70/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0098 - val_loss: 0.0043
Epoch 71/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0097 - val_loss: 0.0042
Epoch 72/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0096 - val_loss: 0.0043
Epoch 73/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0096 - val_loss: 0.0041
Epoch 74/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0042
Epoch 75/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0041
Epoch 76/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0041
Epoch 77/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0093 - val_loss: 0.0040
Epoch 78/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0040
Epoch 79/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0093 - val_loss: 0.0040
Epoch 80/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0092 - val_loss: 0.0039
Epoch 81/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0039
Epoch 82/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0039
Epoch 83/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0039
Epoch 84/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0038
Epoch 85/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0090 - val_loss: 0.0039
Epoch 86/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0089 - val_loss: 0.0038
Epoch 87/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0089 - val_loss: 0.0038
Epoch 88/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0088 - val_loss: 0.0037
Epoch 89/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0088 - val_loss: 0.0037
Epoch 90/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0087 - val_loss: 0.0037
Epoch 91/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0087 - val_loss: 0.0037
Epoch 92/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0086 - val_loss: 0.0036
Epoch 93/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0087 - val_loss: 0.0037
Epoch 94/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0086 - val_loss: 0.0036
Epoch 95/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0084 - val_loss: 0.0036
Epoch 96/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0086 - val_loss: 0.0036
Epoch 97/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0085 - val_loss: 0.0036
Epoch 98/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0085 - val_loss: 0.0035
Epoch 99/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0085 - val_loss: 0.0035
Epoch 100/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0084 - val_loss: 0.0034
Out[59]:
0.0589564154176633
In [60]:
# plot history
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])  # validation curve, so the 'Val' legend entry has a line
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()
In [61]:
# Save Keras' architecture and synapse weights separately for later C++ conversion
from keras.models import model_from_json
# serialize model to JSON
nnetMdl_json = nnetMdl.to_json()
with open("COM_nnetMdl_Rev.json", "w") as json_file:
    json_file.write(nnetMdl_json)
# serialize weights to HDF5
nnetMdl.save_weights("COM_nnetMdl_W_Rev.h5")

# save model and architecture to single file
nnetMdl.save(workDir + "COM_nnetMdl_Rev.h5")
print("Saved model to disk")

# also save scaler
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(varScal, workDir + 'Rev_VarScaler.save') 
joblib.dump(tarScal, workDir + 'Rev_TarScaler.save') 
Saved model to disk
Out[61]:
['C:/Temp/WinProj/CTLEMdl/wsp/Rev_TarScaler.save']
In [62]:
# generate prediction
predict = tarScal.inverse_transform(tarTemp)
allData = np.concatenate([varTest, tarTest, predict], axis = 1)
allData.shape
# one flat header row: inputs, actual targets, predicted targets
headLst = list(varList) + list(tarList) + ['Pred_' + str(t) for t in tarList]
headStr = ','.join(headLst)
np.savetxt(workDir + 'COMCtleIOP_Rev.csv', allData, delimiter=',', header=headStr)
In [63]:
# plot Gdc
begIndx = 100
endIndx = 150
indxAry = np.arange(0, len(varTest), 1)
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,0][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,0][begIndx:endIndx])
Out[63]:
<matplotlib.collections.PathCollection at 0x242ccefe1d0>
In [64]:
# Plot P1
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,1][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,1][begIndx:endIndx])
Out[64]:
<matplotlib.collections.PathCollection at 0x242ccc7c470>
In [65]:
# Plot P2
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,2][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,2][begIndx:endIndx])
Out[65]:
<matplotlib.collections.PathCollection at 0x242d6f15dd8>
In [66]:
# Plot Z1
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,3][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,3][begIndx:endIndx])
Out[66]:
<matplotlib.collections.PathCollection at 0x242ccbaa978>

It seems this "reversed" neural network also works reasonably well. We will fine-tune it further later on.

Deployment:

Now that we have a trained model in Keras' .h5 format, we can translate it into corresponding C++ code using Keras2Cpp.

Its GitHub repository is here: Keras2Cpp

The resulting file can be compiled together with keras_model.cc and keras_model.h into our AMI library.

Conclusion:

In this post/notebook, we explored the flow to create a neural network based model for predicting CTLE parameters, using data science techniques. The resulting Keras model is then converted into C++ code for implementation in our IBIS-AMI library. With this performance-based CTLE model, our users can run channel simulations before committing to an actual silicon design.

IBIS-AMI: Study of DDR Asymmetric Rt/Ft in Existing IBIS-AMI Flow

[This blog post is written in preparation for the presentation of the same title to be given at the 2019 DesignCon IBIS Summit. Presentation slides and audio recording are linked at the bottom of this post.]

This paper was written by both Wei-hsing Huang (principal consultant at SPISim USA) and Wei-kai Shih, who is Tokyo-based.

Motivation:

Here in the US, one of the IBIS committee’s working groups, IBIS-ATM (Advanced Technology Modeling), has a regular meeting on Tuesdays. I try to call in whenever possible to gain insights into upcoming modeling trends. In mid-2018, DDR5-related topics were brought up: the existing AMI reference flow described in the spec. focuses on differential or SERDES designs. For example, the stimulus waveform goes from -0.5 to 0.5 and/or a single impulse response is used for analysis, so symmetric rise time (Rt) and fall time (Ft) are mostly assumed. Whether this reference flow can be applied to DDR, which may have asymmetric Rt/Ft and single-ended signals such as DQ, is the center of the discussion. Different EDA companies in this working group have different opinions. Some think the flow can be used directly with minimal change while others think the flow has fundamental shortcomings for DDR. The thing about an IBIS spec. change is that whoever thinks the current version has deficiencies needs to write a “buffer issue resolution document” (BIRD). Doing so will inevitably disclose some trade secrets or expose shortcomings of the tool. As a result, while there are companies which think a change may be needed, no flow change has been proposed at this point. As a model maker, I wonder how the existing flow can be applied to DDR without major changes. This study therefore demonstrates “one” possible implementation. EDA companies may have more sophisticated algorithms/implementations to support this asymmetric condition, but the existence of “one” such possible flow may convince model makers that it’s time to think about how DDR AMI may be implemented rather than waiting for an unlikely spec. change.

AMI_Init:

There are both “statistical” and “bit-by-bit” flows in channel analysis. In either case, the first step an EDA tool performs before calling the AMI model is “channel calibration”. According to the spec., the impulse response of the channel, which includes the analog buffers, is obtained here. For a SERDES design, which has no asymmetric Rt/Ft issue, this impulse response is then sent to the TX AMI followed by the RX AMI. The resulting impulse response is used to calculate a probability density function (PDF), which is integrated into a cumulative distribution function (CDF), from which bathtub plots etc. are obtained.
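The PDF-to-CDF step for the symmetric SERDES case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular EDA tool does it: the `cursors` and `main_cursor` values are made-up samples of a pulse response at one sampling phase, each neighbouring bit is assumed to be an independent 50/50 one or zero, and the helper names `isi_pdf` and `cdf_from_pdf` are hypothetical.

```python
from collections import defaultdict

def isi_pdf(cursors, resolution=1e-3):
    """Build the ISI voltage PDF by combining one two-point distribution per cursor.

    Each non-main cursor adds +c or -c with 50% probability (independent bits).
    Keys are voltages expressed as integer multiples of `resolution`.
    """
    pdf = {0: 1.0}
    for c in cursors:
        step = round(c / resolution)
        nxt = defaultdict(float)
        for volt, prob in pdf.items():
            nxt[volt + step] += 0.5 * prob
            nxt[volt - step] += 0.5 * prob
        pdf = dict(nxt)
    return pdf

def cdf_from_pdf(pdf):
    """Integrate (accumulate) the PDF from the lowest voltage upward."""
    total, cdf = 0.0, []
    for volt in sorted(pdf):
        total += pdf[volt]
        cdf.append((volt, total))
    return cdf

# Hypothetical ISI cursors (volts) of an equalized pulse response
cursors = [0.05, -0.03, 0.02, 0.01]
pdf = isi_pdf(cursors)
cdf = cdf_from_pdf(pdf)

main_cursor = 0.08  # hypothetical main-cursor amplitude (volts)
# Probability that ISI pulls a transmitted "1" below the 0 V threshold
threshold_key = round(-main_cursor / 1e-3)
p_error = sum(p for v, p in pdf.items() if v < threshold_key)
print(len(pdf), cdf[-1][1], p_error)
```

Repeating this at every sampling phase across the UI would give the data for a bathtub plot. Note that the independent 50/50 assumption used here is the symmetric-case assumption only.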

The textbook definition of an impulse response is the response to a “delta” input that happens within an infinitely small time step. In reality, there is no such thing as an “infinitely small time step”. The minimum step used by a simulator is a “time step”, which is usually 1ps or more. A buffer will not toggle from low to high and back to low in a single time step. So in practice, a simulator often uses the step response and then takes its derivative to get the impulse response. Now the problem comes: for an analog channel with asymmetric Rt/Ft, the two step responses (ignoring the sign) are different. That means we will have two different impulse responses, so which one should be sent to the AMI models? A note up front: it is the EDA tool which sets up the calibration, so it has access to any nodal information, such as the pads of the Tx and Rx analog buffers, if needed.
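As a minimal sketch of the step-to-impulse conversion, the snippet below differentiates two step responses with different edge speeds. The first-order (RC-like) edge shape, the time constants and the helper names `rc_edge` and `derivative` are assumptions chosen for illustration, not data from an actual buffer.

```python
import math

def rc_edge(tau, n, dt=1.0):
    """Unit first-order (RC-like) edge sampled every dt; tau sets how fast it settles."""
    return [1.0 - math.exp(-k * dt / tau) for k in range(n)]

def derivative(wave, dt=1.0):
    """Backward difference: converts a step response into an impulse response."""
    return [(wave[k] - (wave[k - 1] if k else 0.0)) / dt for k in range(len(wave))]

N = 50
rise_step = rc_edge(tau=2.0, n=N)   # hypothetical fast rising edge
fall_step = rc_edge(tau=5.0, n=N)   # hypothetical slower falling edge, sign ignored

imp_rise = derivative(rise_step)
imp_fall = derivative(fall_step)

# Both integrate back to their own step swing, yet their shapes differ:
# these are the two candidate "impulse responses" of the asymmetric channel.
print(max(imp_rise), max(imp_fall))
```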

Asymmetric Rt/Ft:

One may argue that nothing limits an AMI model to being called only once. So theoretically, a simulator can run the analysis flow twice… the impulse response calculated from the rising step response is used the first time and the one from the falling step response the second time. However, not only is this inefficient, a model may also not be implemented properly, so calling AMI_Init again right after AMI_Close may cause a crash if it happens in the same process and the model pointer was not released completely. Doing so may therefore hamper a simulator’s robustness.

As depicted in the picture above… if a simulator uses a pulse with a long UI to calibrate the channel, then both rising and falling step responses are included in one simulation. Let the data captured at the Tx analog pad be X1 and X2 for the rising and falling portions respectively. The data captured at the Rx analog pad, Y1 and Y2, will then be X1 and X2 convolved with the interconnect’s transfer function, which is LTI. If we derive an Xform(t), which is the transfer function between X1 and X2, then that Xform(t) should also be able to transform between Y1 and Y2. That means if a simulator can calculate Xform(t) itself, then regardless of whether the impulse response it sent to the AMI models was calculated from the rising or the falling step response, it can always “reconstruct” the result for the other type of impulse response using this Xform(t) function.

To prove this concept, we have written a simple MATLAB script that takes step inputs of different slew rates, say inp1 and inp2. It calculates the Xform(t) function from both inputs and then reconstructs the response out2′ from out1. When overlaying the nominal output out2 and the reconstructed out2′, we can see that they match very well, which proves the concept.
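The idea can be re-created with a short Python sketch. This is not the MATLAB script mentioned above: the edge shapes, the `channel` taps and the helper names are made up for illustration, and the Xform is obtained here by time-domain deconvolution (polynomial long division), which is just one possible way to compute it.

```python
def convolve(a, b, n):
    """Causal convolution of a and b, truncated to the first n samples."""
    out = [0.0] * n
    for i, av in enumerate(a[:n]):
        for j, bv in enumerate(b[:n - i]):
            out[i + j] += av * bv
    return out

def deconvolve(num, den, n):
    """Return h (n samples) such that convolve(den, h, n) == num; requires den[0] != 0."""
    h = []
    for k in range(n):
        acc = num[k] - sum(den[k - j] * h[j] for j in range(k))
        h.append(acc / den[0])
    return h

N = 64
# Hypothetical Tx-pad step data: X1 is the faster (rising) edge,
# X2 the slower (falling) edge with its sign ignored
x1 = [1.0 - 0.5 ** (k + 1) for k in range(N)]
x2 = [1.0 - 0.8 ** (k + 1) for k in range(N)]

# Hypothetical LTI interconnect: a short delay followed by a decaying tail
channel = [0.0, 0.0, 0.5, 0.3, 0.15, 0.05]

y1 = convolve(x1, channel, N)   # Rx-pad data for the rising edge
y2 = convolve(x2, channel, N)   # Rx-pad data for the falling edge (reference)

xform = deconvolve(x2, x1, N)         # Xform derived from Tx-pad data only
y2_rebuilt = convolve(y1, xform, N)   # reconstruct Y2 from Y1

worst = max(abs(a - b) for a, b in zip(y2, y2_rebuilt))
print(worst)
```

Because the interconnect is LTI, the Xform computed from the Tx pad alone carries Y1 over to Y2; `worst` is the largest mismatch between the reference and the reconstruction. With measured or noisy data, a regularized frequency-domain division would probably be needed instead of plain long division.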

Once we have the responses for both slew rates, we can construct their respective eyes and then use a different portion of each to construct a synthesized eye. Such an eye will not be symmetric like the one calculated for SERDES.

When calculating the PDF for the asymmetric case, one may also need to consider the preceding bits’ values and use a tree-like structure to keep track of possible bit sequences. For example, for a typical SERDES bit sequence, if encoding is not considered, each bit has a 50% chance of being one and a 50% chance of being zero. The PDF is constructed based on that assumption. But in the asymmetric case, if the data used at the cursor is from the rising response, then the cursor bit must be 1 while (cursor – 1) must be zero. If (cursor – 2) is 1 again, then the tail of the falling response at (cursor – 1) will be superimposed on the cursor data. That is, we can’t treat each bit as having the same 50% probability when constructing the PDF. It’s not a binomial distribution, as the occurrences are not independent. A simulator may first need to determine the maximum bit length to keep track of, then, based on that depth, form the tree-like sequences which lead to the rising or falling steps at the cursor location. Finally, superposition is used to construct the overall response.
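A small sketch of this bookkeeping is shown below. It simply enumerates every bit history up to a chosen depth (a flat listing of the tree’s leaves) and groups them by the edge type at the cursor. The function names and the depth of 3 are hypothetical, and a real simulator would likely prune or weight the tree differently.

```python
from itertools import product

def classify_histories(depth):
    """Enumerate all bit histories (oldest bit first, cursor bit last) and group
    them by what happens at the cursor: a rising edge, a falling edge, or no edge."""
    groups = {"rise": [], "fall": [], "hold": []}
    for bits in product((0, 1), repeat=depth + 1):
        prev, cur = bits[-2], bits[-1]
        if (prev, cur) == (0, 1):
            groups["rise"].append(bits)
        elif (prev, cur) == (1, 0):
            groups["fall"].append(bits)
        else:
            groups["hold"].append(bits)
    return groups

def edges_in(bits):
    """List the edges of one history as (UI offset from cursor, 'rise' or 'fall')."""
    last = len(bits) - 1
    return [(k - last, "rise" if bits[k] else "fall")
            for k in range(1, len(bits)) if bits[k] != bits[k - 1]]

groups = classify_histories(depth=3)   # cursor plus 3 preceding bits
for bits in groups["rise"]:
    # Each listed edge would contribute a shifted rising or falling step response;
    # the contributions are superimposed to form the waveform at the cursor.
    print(bits, edges_in(bits))
```

Within the "rise" group the last two bits are fixed at 0 then 1, so only the older bits remain free. This is why the per-bit 50% assumption of the symmetric case no longer applies directly.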

AMI_GetWave:

According to the reference flow for the bit-by-bit case, the equalized Tx output from the digital bit sequence is convolved with the channel’s impulse response. The resulting waveform is then sent to the Rx EQ to get the final results. Either the Tx EQ or the Rx EQ, or both, may not be LTI, so the aforementioned Xform(t) is not applicable.

As food for thought… the spec. only mentions that in bit-by-bit mode, the output of the Tx AMI model is the equalized digital sequence, while the input to the Rx EQ must be the channel response to that sequence. Are there other ways to get such a response to the Rx, yet with different Rt/Ft taken into account?

One example is shown in the top half of the picture above. If a simulator takes that equalized digital input and “simulates” to get the final response, then this “simulation process” should have taken the different Rt/Ft into account and should give valid results. However, this process will be slow and I don’t think any simulator does it this way. Furthermore, the spec. specifically says it needs to “convolve” with the impulse response. First of all, this impulse response can be from the rising or the falling edge. Secondly, even if we decide to deconvolve the input first (thus getting sequences of different delta responses) and then convolve with the pulse response (i.e. one simulated UI), will there be any issue?

From the plot above… we can see that when a pulse has different rising and falling slew rates, using superposition to construct 011… produces “glitches” in the trailing high-state portion. The severity of these “glitches” depends on how different Rt and Ft are. So using a pulse response here will still not work.

A simple MATLAB script has also been written to demonstrate the occurrence of such “glitches”. This proves that not only is convolving an impulse response with the Tx EQ’s output problematic, even convolving a fully simulated pulse (which includes the asymmetric Rt/Ft effect) with the delta sequence (the original TX EQ output deconvolved with one digital bit) is still problematic. Glitches will happen for consecutive ones or zeros due to the mismatch between Rt and Ft. Thus one must use the rising and falling step responses instead when doing this kind of convolution.
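The glitch can be reproduced with a few lines of Python. This is a sketch under simple assumptions rather than the MATLAB script itself: edges are modeled as first-order exponentials, time is in arbitrary units, and `edge` and `glitch_peak` are hypothetical helper names.

```python
import math

def edge(t, tau):
    """Unit first-order edge that starts at t = 0 (zero before, settling to 1 after)."""
    return 1.0 - math.exp(-t / tau) if t > 0 else 0.0

def glitch_peak(tau_rise, tau_fall, ui=100.0, n=400):
    """Largest deviation between a two-UI-high waveform built by superimposing two
    one-UI pulses and the same waveform built from one rising and one falling step."""
    def pulse(t):
        # One simulated UI: rising edge at 0, falling edge at ui
        return edge(t, tau_rise) - edge(t - ui, tau_fall)

    worst = 0.0
    for k in range(n):
        t = float(k)
        superposed = pulse(t) + pulse(t - ui)                        # two consecutive ones
        from_steps = edge(t, tau_rise) - edge(t - 2 * ui, tau_fall)  # rise once, fall once
        worst = max(worst, abs(superposed - from_steps))
    return worst

print(glitch_peak(5.0, 5.0))    # symmetric Rt/Ft
print(glitch_peak(5.0, 20.0))   # asymmetric Rt/Ft
```

With matching time constants, the falling edge of the first pulse and the rising edge of the second cancel at the UI boundary. When they differ, the leftover shows up as a bump in the high state, which is the “glitch” described above.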

Summary:

In this presentation, we discussed how the existing AMI flow may be applied to asymmetric Rt/Ft cases such as those often seen in DDR. A “smarter” EDA tool should be able to handle this situation without changing the spec.’s reference flow. When a channel analysis is performed in the “statistical” flow, an EDA tool can obtain waveform data at both the Tx and Rx analog buffers’ pads during the calibration process. Such data can be used to construct a transform function, Xform(t). With this function, the impulse response through the EQ can be reconstructed and an asymmetric eye can thus be built. A tree structure may be needed to keep track of possible bit combinations. In the “bit-by-bit” flow, the current spec. may be too specific, as it forces convolution of the TX EQ’s output with the channel’s impulse response before sending it to the RX EQ. Such direct convolution may be problematic. A “smarter” simulator may calculate it using a different method without changing the data output from the TX EQ and input to the RX EQ. Step responses should be used, as different Rt/Ft will cause “glitches” when consecutive ones/zeros are present if the convolution method is used.

Links:

Presentation: [HERE] (http://www.spisim.com/support/paperetc/20180202_DesignConSummit_SPISim.pdf)

Audio recording (English): [HERE]

IBIS-AMI: An end-to-end AMI modeling flow

In a previous post, I mentioned the “IBIS cook-book” as a good reference for the analog portion of buffer modeling. Unfortunately, when it comes to the equalization part, i.e. AMI, there is no similar counterpart AFAIK. For AMI modeling, the EQ algorithms need to be implemented as spec.-compliant API functions written in the C language. These functions then need to be compiled into a dynamic library, either a “dynamic link library” (.dll on Windows) or a “shared object” (.so on Linux-like systems). Different compilers and build tools have different ways to create such files. So it’s fair to say that many of these aspects are actually in the computer science/programming domain, outside the electrical or modeling scope. It is unlikely that a single document will detail all these processes step by step.

In this post, instead of writing about those “programming” details, I would like to give a high-level overview of the different steps of the AMI modeling process… from end to end. Briefly, they can be arranged in the following steps based on execution order:

  1. Analog modeling
  2. Prepare collateral
  3. Define architecture
  4. Create models
  5. Model validation
  6. Channel correlation
  7. Documentation

The following sections describe each part in detail.

Analog modeling:

Believe it or not, the first step of AMI modeling is to create proper IBIS models… i.e. the analog portion. This is particularly true if the circuit being modeled belongs to the TX. A TX AMI model equalizes signals which include its own analog buffer’s effect measured at the TX pad. So if there is no channel (pass-through) and it’s under the nominal loading condition, the analog response of the TX will be the signal to be equalized. That is to say, without knowing what will be equalized (i.e. what the model’s analog behavior is), one can’t calculate the TX AMI model’s EQ parameters.

Take the plot above as an example. This is an FFE EQ circuit. The flat lines indicated by the two yellow arrows are different de-emphasis settings, and thus controlled by AMI. However, the rising/falling slew rates, wave shape, DC levels etc., as circled in red, are all analog behaviors. Thus an accurate IBIS model must be created first to establish the baseline for equalization. Recently, BIRD 194 has been proposed to use a Touchstone file in lieu of an IBIS model… still, the analog model must be there.

For an RX circuit, it may be easier, as an input buffer is usually just an ESD clamp or terminator. Thus it doesn’t take much effort to create the IBIS model. Interested readers may see my previous posts on various IBIS modeling topics.

Prepare collateral:

AMI’s data can be obtained from different sources: circuit simulation, lab/silicon measurement or a data sheet. For the simulation case, the simulation must be run and the resulting waveform’s performance needs to be extracted. These values will serve as the “design targets” based on which the AMI model’s parameters are tuned.

For example, this is a typical TX waveform and measured data:

Various curves have been “lined up” for easy post-processing. Using our VPro, we batch-measured the value at 5.3ns for the different curves and created a table:

Similarly, data collected from measurement needs to be quantified. This may be done manually and may be labor-intensive, as noise is usually present:

Some circuits may have their response in the frequency domain. In this case, various points (DC, fundamental frequency, 2X fundamental etc.) need to be measured as above.

If it’s from a data sheet, then the values are already there, yet there may be different ways to realize such performance. For example, equations with different zero and pole locations may all have the same DC gain or gain at particular frequencies, so which one to pick may depend on other factors.

Define architecture:

Based on the collateral and the data sheet, the modeler needs to determine how the AMI models will be built. Usually they should reflect the IC’s design functions, so there is not much ambiguity here. For example, if the Rx circuit has DFE/CDR functions, then the AMI models must also contain such modules. On the other hand, some data may be represented in different ways and proper judgement needs to be made. Take this waveform as an example:

It’s already very obvious that it has an FFE with one post-tap. However, since the analog behavior needs to be represented by an IBIS model, one needs to decide how these different behaviors, boxed in different colors, should be modeled. They can be constructed with several different IBIS models, or with a single IBIS model plus a “scaling” block so that IBIS waveforms of similar shape can be squeezed or stretched. For a repeater, oftentimes people only care about what goes into and what comes out of the AMI model. The ability to “probe” signals between a repeater’s RX and TX may be limited by the capabilities of the simulator used. As a result, a modeler may have some freedom in determining which functions go into the Rx and which go into the Tx. In some cases, the same model needs to be created with different architectures to meet different usage scenarios. An example has been discussed in our previous post [HERE].

Create models:

Once the architecture is defined, the next step is the actual C/C++ implementation. This is where the programming part starts. Ideally, building blocks from previous projects are already there, or new ones will be created as modules so that they can be reused in the future. Multiple instances of the same model may be loaded together in some cases, so “static” variables or functions need to be used very carefully. Good programming practice comes into play here. I have seen models that only work at a certain bit-rate with 32 samples per UI. That indicates the model is “hard-coded”… it has no code to up-sample or down-sample the data based on the sampling interval passed in from the API function. Writing the model’s C code goes together with unit testing, source revision control, compilation and dependency checks etc. The last one is particularly important on Linux: if your model relies on some external libraries and they are not linked statically, the same model that runs fine on the developer’s machine will not even pass the golden checker at the user’s end… because the library is not available there. Typically one will need to prepare several machines, virtual or not, which are “fresh” from OS installation and run the oldest “distros” one is willing to support. All of this is the typical software development process applied to AMI modeling.
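To make the “hard-coded samples per UI” point concrete, here is a minimal sketch of the kind of resampling a model needs internally. It is written in Python for brevity (a real model would do the same in C/C++), it uses plain linear interpolation, and the function name resample_linear is just an illustration, not something from the AMI API:

```python
def resample_linear(samples, dt_in, dt_out):
    """Resample uniformly spaced samples from time step dt_in to dt_out.

    Linear interpolation is an assumption made for this sketch; a real
    model may prefer a higher-order or band-limited scheme.
    """
    if len(samples) < 2:
        return list(samples)
    duration = (len(samples) - 1) * dt_in
    n_out = int(duration / dt_out + 1e-9) + 1
    out = []
    for k in range(n_out):
        pos = k * dt_out / dt_in            # fractional index into the input
        i = min(int(pos), len(samples) - 2)  # left neighbour, clamped at the end
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


# Data stored at one time step, simulator asks for a step half as large.
stored = [0.0, 1.0, 2.0, 3.0]
print(resample_linear(stored, dt_in=1.0, dt_out=0.5))
```

With something like this in place, the internal data can stay at whatever resolution it was characterized at, and the model adapts to the sample interval it receives at run time.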

After the binary .dll/.so files are generated, the next step is to assemble a proper .ami file. Depending on the parameter types (integer, values, corners etc.), different flavors of syntax are available to create such a file. In addition, different EDA simulators have different ways of presenting the parameter selections to the end user. So one may need to choose the syntax carefully, so that parameter values will always be selected properly in the targeted simulators. For example, if one has already selected the TYP/MIN/MAX corner for the IBIS model, he/she should not have to do so again for the AMI part. It doesn’t make sense at all for a MIN AMI model to be used with a MAX corner IBIS model… the corners should be “synchronized”.

Once the model is ready, the next step is to tune the parameters so that each of the performance targets is matched. Some interfaces, such as PCIe, have pre-defined FFE tap weights, so there is no ambiguity. In most cases, one needs to find the parameter values that match the measured or simulated performance. Such a task is very tedious and error prone if done manually, and a process like our “AutoTune” comes in very handy:

Basically, our tool lets the user specify the matching target, and the tool will use a bisection algorithm to find the tap values. Hundreds of cases can be “tuned” in a matter of minutes. In some other cases, a grid search may be needed.
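For readers who have not used bisection for this kind of tuning, below is a minimal Python sketch. It is not our AutoTune code. It assumes a two-tap FFE normalized so that |main| + |post| = 1, for which the de-emphasis of a long run of identical bits is 20·log10(1 − 2·post); in a real flow, the deemphasis_db function would instead drive the AMI model and measure the waveform.

```python
import math


def deemphasis_db(post_tap):
    """De-emphasis (dB) of a normalized 2-tap FFE: main = 1 - post_tap.

    Assumed closed-form stand-in for "run the model and measure".
    """
    return 20.0 * math.log10(1.0 - 2.0 * post_tap)


def bisect(func, target, lo, hi, tol=1e-9, max_iter=200):
    """Find x in [lo, hi] with func(x) == target, for monotonic func."""
    f_lo = func(lo) - target
    f_hi = func(hi) - target
    if f_lo * f_hi > 0:
        raise ValueError("target is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid) - target
        if abs(f_mid) < tol or (hi - lo) < tol:
            return mid
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return 0.5 * (lo + hi)


# Which post-tap weight gives -6 dB of de-emphasis?
tap = bisect(deemphasis_db, -6.0, 0.0, 0.45)
print(round(tap, 4), round(deemphasis_db(tap), 3))
```

Bisection only works when the measured quantity changes monotonically with the parameter, which is why a grid search is still needed for the less well-behaved cases.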

Model validation:

Just like traditional IBIS, the first step of model validation is to run it through golden checker. However, one needs to do so on different platforms:

The golden checker didn’t start checking the included AMI binary models until quite recently. Basically it loads the .ibs file, identifies models with AMI functions, then checks the .ami file syntax. Finally, the checker will load the associated .dll/.so files. Because each OS platform loads binary files differently, a given binary (e.g. a .dll) can only be checked on its associated platform (e.g. Windows). That’s why one needs to perform the same check on different platforms to make sure they are all successful. Library dependencies or platform issues can be identified quickly here. However, the golden checker will not drive the binary file, so the functional checks described in the next paragraph are the next step.
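If you just want a quick sanity check of the “can it be loaded on this platform” part before running the golden checker, something like the Python sketch below may help. It is only an illustration and is no replacement for the checker: it tries to load the binary with ctypes and reports which of the three AMI entry points are exported. The helper name probe_ami_binary is made up for this post.

```python
import ctypes

AMI_ENTRY_POINTS = ("AMI_Init", "AMI_GetWave", "AMI_Close")


def probe_ami_binary(path, symbols=AMI_ENTRY_POINTS):
    """Try to load a .dll/.so and report which entry points it exports.

    A load failure usually means a wrong platform/bitness or a missing
    dependent library on this machine.
    """
    try:
        lib = ctypes.CDLL(path)
    except OSError as exc:
        return {"loaded": False, "error": str(exc), "exports": {}}
    # Looking up a missing symbol raises AttributeError, so hasattr() works.
    exports = {name: hasattr(lib, name) for name in symbols}
    return {"loaded": True, "error": "", "exports": exports}


# Placeholder path: point this at your own model binary.
print(probe_ami_binary("./my_model.so"))
```

Note that the binary must match the bitness of the Python interpreter running the script, which is the same constraint a simulator has.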

Typically, an AMI model has several parameters. To validate a model thoroughly, all combinations of these parameter values need to be exercised. We can “parameterize” the settings in a .ami file like below:

Here, a pattern like %VARIABLE_NAME% is used to create a .ami template. Then our SPIMPro can be used to generate all combinations of possible parameter values and put them in a table. There can easily be hundreds or even thousands of cases. Similar to the systematic approach described in my previous post, we can then generate the corresponding .ami files for all these cases. So there will be hundreds or thousands of them! The next step is to be able to “drive” them and obtain each single model’s performance. Most EDA tools either do not have the automation capability to do this in batch mode or require further programming. In our case, SPIMPro and SPIVPro have built-in functions to support this sweeping flow in batch mode, all in the same environment. The SPISimAMI model driver is used extensively here! Once each case’s simulation is done, one again needs to extract the performance, compare it with the values obtained from the raw data, and compute the deltas.
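The template expansion itself is simple enough to sketch in a few lines of Python. This is not how SPIMPro does it internally; the parameter names PRE_TAP and POST_TAP and the .ami fragment are made up for illustration, and the fragment is not a complete .ami file.

```python
import itertools

# Hypothetical fragment of a .ami template using %NAME% placeholders.
TEMPLATE = """(Model_Specific
  (pre_tap (Usage In) (Type Float) (Value %PRE_TAP%))
  (post_tap (Usage In) (Type Float) (Value %POST_TAP%))
)"""


def expand_template(template, sweep):
    """Return a list of (settings, text) for every combination in sweep."""
    names = sorted(sweep)
    cases = []
    for values in itertools.product(*(sweep[name] for name in names)):
        text = template
        for name, value in zip(names, values):
            text = text.replace("%" + name + "%", str(value))
        cases.append((dict(zip(names, values)), text))
    return cases


sweep = {"PRE_TAP": [0.0, -0.05, -0.1], "POST_TAP": [0.0, -0.1, -0.2, -0.3]}
cases = expand_template(TEMPLATE, sweep)
print(len(cases))        # one .ami text per combination
print(cases[-1][1])
```

Each generated text would then be written to its own .ami file and handed to the model driver; keeping the settings dictionary next to the text makes the later delta comparison easier to tabulate.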

A scatter plot like the one below will quickly indicate which AMI parameter combinations may not work properly in the newly created AMI models. In this case, one needs to go back to the modeling stage to check the code, then do this sweep validation all over again.

Channel correlation:

The model validation mentioned in the previous section is only for a single model, not the full channel. So one still needs to pick several full channel set-ups to fully qualify the models. A caveat of channel analysis is that it only shows time domain data, regardless of whether the flow is “statistical” or “bit-by-bit”, which means it is often not easy to qualify a frequency domain component such as a CTLE. In this case, a corresponding s-parameter whose Sdd21 (differential input to differential output) represents the same CTLE AMI setting can be used for an apples-to-apples comparison, like the schematic shown below:

Another required step here is to test with different EDA vendors’ tools. This presents another challenge, because channel simulators are usually pricey and it’s rarely the case that one company will have all of them (e.g. ADS, HyperLynx, SystemSI, QCD and HSpice etc.). Different EDA tools do invoke AMI models differently… for example, some simulators pass an absolute path for the DLL_Path reserved parameter while others only send a relative path. So without going through this step, it’s difficult to predict how a model will behave in different tools.

Documentation:

Once all these are done, the final step is of course to create an AMI model usage guide together with some sample set-ups. Usually it will start with the IBIS model’s pin/model associations and some performance charts, followed by descriptions of the different AMI parameters’ meanings and their mapping to the data sheet. One may also add extra info., such as alternatives if the user’s EDA tool does not support newer keywords such as DLL_Path, DLL_ID or Supporting_Files etc. Waveform comparisons between the original data and the AMI results (e.g. silicon measurement vs AMI) should also be included. Finally, it will be beneficial to provide instructions on how an example channel using this model can be set up in popular EDA tools such as ADS, HyperLynx or HSpice.

Summary:

There you have it.. the end-to-end AMI modeling process without touching programming details! Both AMI API and programming languages are moving targets as they both evolve with time. Thus one must continue honing skills and techniques involved to be able to deliver good quality models efficiently and quickly. This is a task which requires disciplines and experience of different domains. After sharing these with you readers, do you still want to do it yourself? 🙂 Happy modeling!

IBIS-AMI: Something about CTLE

Overview:

The continuous time linear equalizer, or CTLE for short, is commonly used in modern communication channels. In a system where lossy channels are present, a CTLE can often recover signal quality for the receiver or for down stream continuous signaling. There are many articles online discussing how a CTLE works theoretically. More thorough technical details are certainly also available in college/graduate level communication/IC design text books. In this blog post, I would like to focus more on its IBIS-AMI modeling aspect from a practical point of view. While not all of the secret sauce will be revealed here :-), hopefully the points mentioned will give readers a good starting point for implementing or deciding on their CTLE/AMI modeling methodologies.

[Credit:] Some of the pictures used in this post are from Professor Sam Palermo’s course webpage linked below. He was also my colleague at Intel previously. (not knowing each other though..)

ECEN 720: High-Speed Links Circuits and Systems

 

What and why CTLE:

The picture above shows two common SERDES channel setups. While the one at the top has a direct connection between Tx and Rx, the bottom one has a “repeater” to cascade the up stream and down stream channels together. This “cascading” can be repeated more than once, so there may be more than two channels involved. A CTLE may sit inside the Rx of both set-ups or inside the middle “ReDriver” of the bottom one. In either case, the S-parameter block represents a generalized channel. It may contain passive elements such as packages, transmission lines, vias or connectors etc. A characteristic of such a channel is that it presents different losses across the spectrum, i.e. dispersion.

For example, if we plot these channel’s differential input to differential output, we may see their frequency domain (FD) loss as shown above.

Digital signals being transmitted are more or less a sequence of bits/square pulses. We know that very rich frequency components are generated during their sharp rising/falling transition edges. Ideally, a communication channel that propagates these signals should behave like a (unit-gain) all pass filter. That is, the various frequency components of the signal should all be treated equally, otherwise distortion will occur. Such an ideal response is indicated by the green box below:

In reality, such an all pass filter does not come often. In order to compensate our lossy channel (indicated by the red box) so that the end result is more like the ideal case (green box), we need to apply equalization (indicated by the blue box). This is why an equalizer is often used… basically it provides a transfer function/frequency response that compensates the lossy channel in order to recover the signal quality. A point worth noting here is that the blue box and the red box are “tied” together. So using the same equalizer for channels of different losses may cause under- or over-compensation. That is, an equalizer is tied to the channel being compensated. Another point is that the CTLE is just a subset of such linear equalizers.

CTLE is a subset of linear equalizer:

A linear equalizer can be implemented in many different ways. For example, a feed-forward equalizer is often used in the Tx side and within DFE:

An FFE’s behavior is comparatively easy to predict and its AMI implementation is also quite straightforward. For example, the response of an FFE with a single pre-tap or post-tap can be easily visualized and predicted:
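As a quick illustration of why the FFE response is easy to predict, here is a symbol-rate FIR sketch in Python. The tap values are example numbers only, and holding the first symbol as pre-history is a simplification made for this sketch:

```python
def ffe(symbols, taps):
    """Apply FIR taps at symbol rate: out[n] = sum_k taps[k] * symbols[n - k].

    Symbols before the start of the sequence are assumed equal to the
    first symbol, so the output starts from a settled level.
    """
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, weight in enumerate(taps):
            idx = max(n - k, 0)       # hold the first symbol as pre-history
            acc += weight * symbols[idx]
        out.append(acc)
    return out


bits = [-1, -1, -1, 1, 1, 1, -1]
# Main tap 0.75 with one post-tap of -0.25: full swing on a transition,
# de-emphasized level (+/-0.5) when the bit repeats.
print(ffe(bits, [0.75, -0.25]))
```

A pre-tap works the same way if the main tap is placed second, e.g. [-0.1, 0.9], at the cost of one symbol of latency in the output.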

Now, a CTLE is a more “generalized” linear equalizer, so its behavior is usually represented in terms of frequency responses. Thus, to accommodate/compensate channels of different losses, we will have different FD responses for CTLE:

Now that IBIS-AMI modeling for CTLE is of concern, how do we obtain such modeling data for CTLE and how they should be modeled?

Different types of CTLE modeling data:

While a CTLE’s behavior can be easily understood in the frequency domain (FD), for IBIS-AMI or channel analysis it eventually needs to come back to the time domain (TD) to be convolved with the inputs. This is because both the statistical and the bit-by-bit modes of link analysis work in the time domain. Thus we have several choices: provide the model with FD data and have it converted to TD inside the implemented AMI model, or simply provide the TD response directly to the model. The benefit of the first approach is that the model can perform the iFFT based on the analysis’ bit rate and sampling rate settings. The advantage of the latter is that the provided TD data can be checked for quality beforehand, and the model does not need to perform a similar iFFT every time a simulation starts. Of course, the best implementation, i.e. SPISim’s approach, is to support both modes for the best flexibility and expandability 🙂

  • Frequency domain data:

Depending on the availability of the original EQ design, there are several possibilities for FD data: synthesis from poles and zeros, extraction from S-parameters, or AC simulation.

  1. Poles and Zeros: Given different numbers of poles and zeros and their locations, along with the DC boost level, one can synthesize FD response curves: So say we are given a data sheet which has EQ levels at some key frequencies like below: Then one can sweep different numbers and locations of poles and zeros to obtain matching curves that meet the spec.: Such synthesized curves are well behaved in terms of passivity and causality etc., and can be extended to cover the desired frequency bandwidth.
  2. Extract from S-parameters: Another way to obtain the frequency response is from the EQ circuit’s existing S-parameters. This provides the best correlation scenario for the generated AMI model because the raw data can serve as a design target. However, there are many intermediate steps one has to perform first. For example, the given s-parameter may be single ended and only cover a limited frequency range (due to limitations of the VNA being used), so if a tool like our SPISim’s SPro is used, one needs to: reorder the ports (from Even-Odd ordering, i.e. 1-3, 2-4 etc. to Sequential ordering, i.e. 1, 2 -> 3, 4), then convert to differential/mixed mode, after that extrapolate toward DC and high frequencies (many algorithms can be used, and such extrapolation must also abide by physics), and finally extract only the relevant differential input -> differential output portion of the data.
  3. AC simulation: This assumes the original design is available for AC simulation. Such raw data still needs to be sanity checked in terms of loss and phase change. For example, if the gain is not flat toward DC and in the high-frequency range, then extra fixing may be needed, otherwise the iFFT results will be spurious.
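To illustrate the pole/zero option in item 1, here is a minimal Python sketch that synthesizes a CTLE frequency response. It assumes real poles and zeros and the transfer function H(s) = G · Π(1 + s/ωz) / Π(1 + s/ωp), with G being the DC gain; the pole/zero locations below are made-up example values, not taken from any data sheet.

```python
import math


def ctle_response(freq_hz, dc_gain_db, zeros_hz, poles_hz):
    """Complex response of a CTLE built from real poles and zeros (in Hz)."""
    s = 2j * math.pi * freq_hz
    h = complex(10.0 ** (dc_gain_db / 20.0))
    for fz in zeros_hz:
        h *= 1.0 + s / (2.0 * math.pi * fz)
    for fp in poles_hz:
        h /= 1.0 + s / (2.0 * math.pi * fp)
    return h


def gain_db(h):
    return 20.0 * math.log10(abs(h))


# Example: -6 dB at DC, one zero at 1 GHz, poles at 5 GHz and 10 GHz.
for f in (0.0, 1e9, 5e9, 20e9):
    h = ctle_response(f, -6.0, [1e9], [5e9, 10e9])
    print(f"{f / 1e9:5.1f} GHz  {gain_db(h):7.2f} dB")
```

Sweeping the pole/zero locations and comparing gain_db at the data sheet’s key frequencies is then a small loop around this function. Having more poles than zeros makes the response roll off at high frequency.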
  • Time domain data: the time domain response can be obtained from the aforementioned FD data by doing an iFFT directly, as shown below. It may also be obtained by simulating the original EQ circuit in the time domain. However, there are still several considerations:
  1. How to do the iFFT: padding with zeros or with complex conjugates is usually needed for an iFFT that yields real data. If the original FD data is not “clean” in terms of causality, passivity or asymptotic behavior, then it needs to be fixed first.
  2. TD simulation: Is simulating the impulse response possible? If not, maybe a step response should be simulated instead. Then what time step or ramp speed should be used for the input stimulus? Note that during IBIS-AMI link analysis, the time step being used may be different from the one used here, so how will you scale the data accordingly? Once a step response is available, differentiating it will produce the impulse response, given proper scaling.
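The “conjugate padding” in the iFFT item above can be sketched as follows. This Python example takes one-sided FD data (DC up to the Nyquist bin), mirrors it with complex conjugates to form a full spectrum, then applies a plain inverse DFT. The O(N²) loop is used only to keep the sketch dependency-free; a real implementation would call an FFT routine.

```python
import cmath


def one_sided_to_impulse(half_spectrum):
    """Convert bins DC..Nyquist (length m + 1) to a real sequence of length 2*m."""
    m = len(half_spectrum) - 1            # index of the Nyquist bin
    n = 2 * m
    # Negative-frequency bins are the conjugates of the positive ones.
    mirror = [complex(half_spectrum[m - k]).conjugate() for k in range(1, m)]
    full = [complex(v) for v in half_spectrum] + mirror
    impulse = []
    for t in range(n):
        acc = sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
        impulse.append((acc / n).real)
    return impulse


# A flat (all pass) spectrum should come back as a single unit sample at t = 0.
print([round(v, 6) for v in one_sided_to_impulse([1.0] * 5)])
```

For the result to be purely real, the DC and Nyquist bins themselves must be real, which is one of the “clean-up” items to check before the transform.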

How to implement CTLE AMI model:

Now that we have data to model, how will they be implemented in C/C++ codes to support AMI API for link analysis is another level of consideration.

  • Decision mechanism: As mentioned previously, a CTLE FD response targets a channel of a certain loss, so the choice of the appropriate CTLE setting for the particular channel at hand must be made either by the user or by the model itself. While the former (user decision) does not need further explanation, the latter case (model decision, i.e. being adaptive) is a whole different topic and often vendor specific.

Typically, such an adaptive mechanism has a set of CTLE curves pre-sorted by strength or EQ level, and a figure-of-merit (FOM) is extracted from the equalized signal. That is, apply a tentative CTLE to the received data, then calculate the FOM. Then increase or decrease the EQ level by using the adjacent CTLE curves and see whether the FOM improves. Continue doing so until either the selected CTLE “ID” settles or the range bounds are reached. This process may be performed across many cycles until it has “stabilized” or is “locked”. Thus, the model may need to go through a training period first to determine the best CTLE to use during subsequent link analysis.
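The search loop described above can be sketched as a simple hill climb over the pre-sorted CTLE indices. This is a generic Python illustration, not any vendor’s adaptation algorithm; the fom callback is a placeholder for whatever metric (eye height, for instance) the model extracts after applying a given CTLE setting.

```python
def adapt_ctle(fom, n_settings, start=0, max_steps=100):
    """Walk to a neighbouring CTLE index while the FOM keeps improving.

    fom(index) must return a number where larger is better.
    Returns (selected_index, fom_value).
    """
    current = start
    best = fom(current)
    for _ in range(max_steps):
        moved = False
        for candidate in (current + 1, current - 1):
            if 0 <= candidate < n_settings:      # stay inside the range bounds
                value = fom(candidate)
                if value > best:
                    current, best, moved = candidate, value, True
                    break
        if not moved:                            # settled: no neighbour is better
            break
    return current, best


# Toy FOM that peaks at setting 5 out of 8 pre-sorted CTLE curves.
print(adapt_ctle(lambda i: -(i - 5) ** 2, n_settings=8, start=0))
```

A hill climb like this can stop at a local optimum when the FOM is noisy, which is one reason real models average the FOM over many bits before each decision.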

  • EQ configurations:

So now you have a bunch of settings or data like below. How should you architect the model properly, such that it can be extended in the future with revised CTLE responses or allow the user to perform corner selections (which essentially adds another dimension):

This is now more in the software architecture domain and needs some trade-off considerations. For example, you may want to provide a fine-grid, full-spectrum FD/TD response, but the data may become too big, so internal re-sampling may be needed. For FD data, the model may need to re-sample to 2^N points for an efficient iFFT. Different corner/parameter selections should not be hard coded in the model, because a future revised model’s parameters may be different. For external source data, encryption is usually needed to protect the modeling IP. With proper planning, one may reuse the same CTLE module in many different designs without customization on a case-by-case basis.

  • Correlations:

Finally it’s time to correlate the created CTLE AMI model against the original EQ design or its behavioral model. Done properly, you should see signals being “recovered” from left to right below:

However, getting results like this on the first try may be wishful thinking. In particular, the IBIS-AMI model does not work alone… it needs to work together with the associated IBIS model (analog front-end) in most link simulators. So the IBIS model’s parasitics and loading etc. will all affect the result. Besides, the high-impedance assumption of the AMI model also means proper termination matching is needed before one can drop it in as a direct replacement of an existing EQ circuit or behavioral model for correlation.

Summary:

At this point, you may realize that while a CTLE can be easily understood from its theoretical behavior, implementing it to meet IBIS-AMI demands is a different story. We have seen CTLE models made by other vendors which are not expandable at all, such that the client needs to scrap the existing ones for a minor revision of CTLE behavior/settings (also because this particular model maker charges too much, of course). It’s our hope that the learning experience shared in this post will provide some guidance and considerations, whether you decide to deep dive into developing your own CTLE IBIS-AMI model, or maybe just delegate such tasks to professional model makers like us 🙂

IBIS/AMI: Equalization in coming DDR standard

Preface:

At the DesignCon IBIS summit this year, the second half of the meeting focused on the trend and possible approaches for equalization modeling for DDR interface. For the past several years, IBIS-AMI modeling and stat-eye like link analysis have been applied widely and successfully on the SERDES interfaces. DDR, on the other hand, didn’t consider EQ much until recent DDR4 3200 or faster or upcoming DDR5/DDR6 standard. Whether AMI like model and SERDES like flow can be applicable to DDR are still topics of discussion, as the AMI spec itself still doesn’t support DDR properly yet. Nevertheless, the trend indicates that EQ model being part of the DDR simulation will be inevitable.

At the summit, representatives from EDA vendors focusing on AMI and VHDL based approaches and a DDR manufacturer have shared their studies. Interested reader may find related presentation at the IBIS website. Following up these discussion, we at SPISim also performed several experiments of AMI on DDR. In this blog post, we would like to share some of the considerations on this topic as well.

Some major differences between SERDES vs DDR:

There are several major differences between SERDES and DDR interfaces which will affect how EQ models are used as part of the simulation:

  • Point-to-point vs Multi-Drop:

A SERDES channel is point-to-point, as shown above. Signals start from a TX, propagate through the channel and are received by one RX. This Tx-Rx connection may be cascaded in several stages with repeaters used in the middle. A repeater itself contains an Rx and a Tx. Repeaters may be required because SERDES interfaces may extend very far… from the controller to the edge of the board and beyond, to connect to external devices (USB, SATA etc.). Nevertheless, a SERDES channel looks like a single long “chain”. Thus the nature of SERDES is “long” and “lossy”.

DDR, on the other hand, is multi-dropped by nature. There is usually one controller on board but several “DIMM” connections at the other end. For example, a typical laptop has at least two SO-DIMMs, which may be soldered on board, pluggable through memory sockets, or a combination of both. A desktop or server board will have more DIMMs to allow more installed memory. Depending on whether it’s dual-channel, 3-channel or quad-channel etc., they may come in groups of 2, 3 or 4 respectively. These memory modules usually do not reside too far away from the controller in order to avoid latency, thus no repeater mechanism is needed. The DDR topology has a “short” yet “reflective” nature due to the impedance changes at branch points and the different terminations within each DIMM module.

  • Differential vs Single-ended, Embedded clock vs source synchronous:

SERDES interfaces are differential, which means they are more immune to noise such as voltage droop or ground bounce, as both P and N signals are subject to the same effect so the overall noise is cancelled out. That’s why power-aware models starting from IBIS V5.1 are rarely needed for SERDES. DDR, on the other hand, has many single-ended signals. All the DQ byte lanes are single-ended, so power noise is a major concern.

Another architectural difference is the clocking mechanism. SERDES uses an embedded clock, so the clock signal needs to be recovered at the Rx from the encoded bit-stream (e.g. 8b/10b), which is also part of the transmitted data. A CDR is needed to recover such a clock signal, and it is itself level sensitive/dependent. DDR is source synchronous, so the clock is transmitted separately.

  • Operation modes:

For SERDES, there is one direction of signal propagation. Schematic-wise, the Tx is located at the far left while the Rx sits at the far right. Some DDR signals (e.g. DQ) have both read and write modes. Both the controller and the memory module can take the Tx and Rx roles in different modes, so the signal is bi-directional. In addition, there are different on-die-termination (ODT) settings in DDR, so the impedance of each DDR module will be different depending on which one is receiving/driving. This “combinatorial” characteristic increases the complexity of EQ optimization, as more dimensions need to be swept or analyzed.

Various EQ methods for DDR:

Until recent years, the analysis methods for DDR and non-DDR interfaces were very similar. Topologies (either pre-layout or post-layout) are composed or extracted for spice-like analysis in the time domain. The worst case pattern may be decided in advance, or one may just run a simulation long enough to cover sufficient bit sequences. Timing-related performance parameters are then processed and compared against the spec. to determine the channel performance.

With higher bit-rates and low BER requirements, this approach is no longer valid for SERDES. StatEye-like convolution-based simulation has replaced spice simulation, and EQ modeling has also changed to accommodate this analysis requirement. That is why AMI is getting popular and important these days. We have started seeing EQ in DDR4 3200 and will surely see it being part of the upcoming DDR5 and DDR6 etc. So which of the EQ modules we often use in SERDES can be applied to DDR?

  • FFE: Feed-forward equalizer. It uses various numbers of taps and weight to eliminate or de-emphasize the signals at different UI. As DDR is quite “reflective”, this EQ method should improve the link performance as it can be used to cancel ISI. The challenging part is that FFE tap weight is pre-defined and may not be adaptive during communication.

The screen-cap below shows the FFE effect on either single-ended (SE) or differential (DP) signals for different tap locations.

  • CTLE: Continuous time linear equalizer. This is usually used to amplify signals around a particular frequency and/or provide DC boost. For example, USB 3.0 runs at 5 Gbps, so a CTLE with its boost placed around the corresponding Nyquist frequency (2.5 GHz) can help compensate a lossy channel for better signal quality. A CTLE usually resides at the RX side. Its other capability is to provide DC boost, so that the received voltage swing can be amplified to meet the eye requirement. Given a data sheet:

  A set of CTLE curves are often available to boost these performance parameters in frequency domain:

As a DDR channel is short and not that lossy, it’s been shown that a CTLE is not that useful there compared to its role in SERDES.

  • DFE: Decision feedback equalizer. In SERDES, a DFE comes with a CDR, as a DFE needs clock signals to perform “slicing” for tap adaptation:

While this is another form of ISI cancellation, it can be applied dynamically based on the link condition. So there is a period before the DFE will “lock in” with stable tap weights. For this reason, it has a similar effect to an FFE for a reflective channel yet may be more versatile. However, the DFE itself is non-linear, so it can only be used in bit-by-bit mode. In contrast, an FFE is an FIR filter and can be used in both statistical and bit-by-bit mode simulation.
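The non-linearity comes from the slicer in the feedback path, which the Python sketch below tries to show. It works on one sample per UI (i.e. it assumes the clock is already known), uses fixed tap weights, and leaves out the adaptation step, so it is only an illustration of the feedback structure, not a complete DFE model.

```python
def dfe(samples, taps, threshold=0.0, levels=(-1.0, 1.0)):
    """Fixed-tap decision feedback equalizer on one sample per UI.

    taps[0] weights the most recent decision, taps[1] the one before, etc.
    Returns (equalized_samples, decisions).
    """
    history = [0.0] * len(taps)          # past decisions, newest first
    equalized, decisions = [], []
    for x in samples:
        feedback = sum(w * d for w, d in zip(taps, history))
        y = x - feedback                 # remove ISI predicted from past bits
        d = levels[1] if y > threshold else levels[0]   # the slicer (non-linear)
        equalized.append(y)
        decisions.append(d)
        history = [d] + history[:-1]
    return equalized, decisions


bits = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
# Toy channel: each bit leaks 40% of its amplitude into the next UI.
received = [b + 0.4 * (bits[i - 1] if i else 0.0) for i, b in enumerate(bits)]
equalized, decisions = dfe(received, taps=[0.4])
print([round(v, 3) for v in equalized], decisions)
```

Because each output depends on earlier decisions rather than earlier inputs, there is no single impulse response that describes the block, which is why it cannot be folded into a statistical flow the way an FIR can.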

As both FFE and DFE show similar effects of ISI cancellation, there may be redundancy if both are used at the same time. Our study confirms this assumption in one of the cases, whose results are shown below:

Either FFE or DFE alone will open the closed eye significantly, yet when both are used together, the result is not much different from using only one of them. If this holds for most DDR cases, then the important topic is how to perform “sweeps” efficiently in order to find the number of taps and the weights required in whichever FFE or DFE module is used.

Insufficiencies of current AMI spec for DDR:

As of today, whether IBIS-AMI is applicable to DDR is still questionable. This is because IBIS-AMI so far is SERDES focused and its spec. needs to be revised before DDR can be covered. Here are some of the shortcomings that we are aware of:

  • Step response/RF response:

In the spec, the “statistical” simulation flow describes that a channel’s impulse response is sent into the Tx model in the AMI_Init call. In practice, such an impulse response only exists in theory and is not easily obtainable with circuit simulation. Instead, most link analysis tools use a step response, then post-process it by taking the derivative to obtain the impulse response. This approach assumes that the channel’s rising and falling transitions are symmetric, which is usually not the case for single-ended signals such as DDR. Thus, in order to perform StatEye-like convolution-based link analysis more accurately, one may need to forgo the single-impulse-based statistical flow and resort to a full-sequence-based (e.g. PRBS) bit-by-bit flow.
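The derivative step, and a simple way to see whether the symmetry assumption holds, can be sketched in Python as below. The forward difference and the tolerance value are assumptions made for this illustration; the waveforms are toy numbers.

```python
def step_to_impulse(step, dt):
    """Forward-difference derivative of a uniformly sampled step response."""
    return [(step[i + 1] - step[i]) / dt for i in range(len(step) - 1)]


def edges_are_symmetric(rise, fall, tol=1e-3):
    """True if the falling edge mirrors the rising edge sample by sample.

    Each edge is compared relative to its own starting level.
    """
    return all(
        abs((r - rise[0]) + (f - fall[0])) <= tol for r, f in zip(rise, fall)
    )


rise = [0.0, 0.0, 0.5, 1.0, 1.0]
print(step_to_impulse(rise, dt=0.5))                          # area equals the 1.0 swing
print(edges_are_symmetric(rise, [1.0, 1.0, 0.5, 0.0, 0.0]))   # mirrored fall
print(edges_are_symmetric(rise, [1.0, 0.9, 0.7, 0.3, 0.0]))   # slower fall
```

When the second check fails, an impulse response derived from the rising edge alone will misrepresent the falling transitions, which is the motivation for the bit-by-bit flow mentioned above.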

  • Clocking:

IBIS spec. assumes the signaling is clock embedded as the only place where clock is mentioned is the function signature of AMI_GetWave:

This clock_times argument is an “output” from the AMI model. That is, the (RX) model can optionally recover the clock from the “*wave” array, then return the clock data back to the circuit simulator. As mentioned previously, DDR is source synchronous, so a clock signal is already available outside the “*wave” data. In my opinion, this is an easy change, as the spec. can simply indicate that the “clock” data can be bi-directional, meaning that the simulator may obtain the clock elsewhere, then pass its data into the AMI model through the clock_times argument when calling the AMI API. Then the RX’s DFE can make use of the pre-determined clock to perform slicing and tap adaptation. Nevertheless, this clocking difference has not yet been addressed in the spec. as of today.

  • Signaling:

In the IBIS-AMI spec, user will find differential signal being assumed as the description below indicates stimulus is from -0.5 ~ 0.5:

By definition, an LTI EQ model’s transfer function is independent of the input being scaled or shifted, thus it basically behaves the same regardless of whether the signal is single-ended (SE) or differential (DP). An NLTV model, like an RX DFE, does depend on the proper threshold to determine the signal’s data bits. Thus whether it’s SE or DP does make a difference, just like whether the encoding scheme is NRZ or PAM4. Besides changing descriptions in the spec. like the one shown above, a simulator/link analyzer can theoretically shift the signal before calling the AMI model, then restore it afterward automatically, so that most of the already developed AMI model mechanisms can still be used. Alternatively, an AMI model can achieve a similar effect using a level shifter, if the spec. indicates that such an adjustment will not be performed by the link simulator.

  The waveform shown above is from a 3rd party vendor’s link simulator being applied to DDR analysis using IBIS-AMI. As one can see on the left, the channel characterization shows that the voltage swing of the single-ended model is 0.4 volt, ranging from 0.95 ~ 1.35. However, the waveform sent to the IBIS-AMI models has been re-centered around 0.0, so the swing range is -0.2 ~ 0.2. Apparently, what the simulator has done is take the ground-based channel characterization result and convolve it with the differential stimulus required by the spec. This vendor’s simulator then smartly restores the output from the AMI model back to single-ended, so the final eye is presented at the correct voltage level.
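The shift-and-restore step described above amounts to very little arithmetic. Here is a Python sketch using the 0.95 ~ 1.35 V numbers from this example; the function names are made up, and whether a given simulator does exactly this is an assumption on my part.

```python
def recenter(wave, v_low, v_high):
    """Shift a ground-referenced waveform so its swing is centered on 0."""
    mid = 0.5 * (v_low + v_high)
    return [v - mid for v in wave], mid


def restore(wave, mid):
    """Undo recenter(): put the waveform back at its original DC level."""
    return [v + mid for v in wave]


single_ended = [0.95, 1.15, 1.35, 1.15, 0.95]
centered, mid = recenter(single_ended, 0.95, 1.35)
print([round(v, 3) for v in centered])        # what the AMI model would see
print([round(v, 3) for v in restore(centered, mid)])
```

An AMI model with its own level shifter would do the same thing internally, using the threshold voltage as the mid level before its slicer.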

Other EQ modeling methods:

At this DesignCon IBIS summit, a VHDL/Verilog-A based modeling approach was also presented, to show that with such an EQ model/mechanism available, the traditional time-based simulation flow can still be used and that, for DDR, million-bit simulations (which require AMI-like models) are not necessarily needed. That paper shows comparable results, such as eye margin etc., obtained with this VHDL based model. Personally, I don’t think this result carries much beyond that particular study. The reason is that VHDL/Verilog-A as a modeling language has great limitations compared to the C/C++ based languages which AMI uses. This is particularly true in the following aspects:

  • Libraries: performing numerical computation in C/C++ is quite routine, and many libraries have been developed so one very rarely needs to start from scratch. For example, GNU and LAPACK are both widely used as foundations of C/C++ based numerical analysis. Where are their counterparts in VHDL/Verilog-A? Without these, model development beyond simple sequential programming will be almost impossible or too tedious.
  • IP Protection: Compiled C/C++ code is basically machine code and can’t be easily de-compiled. This is why AMI is considered IP protected, while IBIS is only abstracted to the behavioral level. VHDL/Verilog-A models are mostly plain-text based, and even when encryption/obfuscation is possible, the model soon becomes vendor simulator specific because only that simulator can decrypt/interpret the scrambled code. This defeats the purpose of a shareable model.
  • Speed/Flexibility: Interpreted languages will not be as efficient as compiled ones. While whether there is really a need for a compiled language such as C/C++ can be discussed, VHDL/Verilog-A will still be the less likely choice regardless. In my opinion, a possible direction may be a language such as Python, because it is not only open source, but also supports a C API and has rich numerical library support (NumPy, SciPy).

The discussion above summarizes my understanding and observation of EQ or AMI usage for DDR. While the implementation may not necessarily be the same, I believe we can rest assured that EQ will be part of the DDR specs to come. Even if it’s not AMI, it will be built on top of existing modeling methodologies, just like AMI stands on the shoulders of traditional IBIS.

IBIS-AMI: Using IBIS-AMI in COM Analysis

[This blog post is written in preparation for the presentation of the same title to be given at the 2018 DesignCon IBIS Summit. Presentation slides and audio recording are linked at the bottom of this post.]

Motivation:

An AMI model comes in the binary form of a .dll (dynamic link library) or .so (shared object). It is not itself an executable and can’t be used directly. To load or run an AMI model, one needs a “driver”. Commercial tools like HSpice have a license-required utility called “AMICheck” to test drive the given AMI models with a rise/fall/single bit response. We at SPISim also provide a free utility called SPISimAMI.exe which does pretty much the same. These small drivers are good when you want to quickly check whether the AMI models at hand are “run-able”. However, to validate or test a model’s full function, such a simple tester is often insufficient. Ideally, a link analysis simulator, which loads the Tx and Rx AMI models involved and performs calculation/optimization, is preferred as the driver. If a model developer can use an IDE to attach to this simulator process and has access to the simulator codes as well, then he/she can set break points within both the simulator and the loaded AMI model to step through and debug during the whole analysis process.

Even if one doesn’t have access to the simulator’s codes or a debug build, theoretically an IDE can still “attach” to a process before it loads the AMI dlls in which we have break points set (as model builders, we have access to the model codes). However, things are not so straightforward in the real world. Most of the EDA tools I have seen let the user adjust the link analysis settings via a GUI; then, when a “simulate” button is clicked, a separate process is launched/forked, and that process does the work, such as characterizing the channel, loading the AMI models and simulating, before giving the results back to the front-end GUI for display. It is not easy (if even possible) to automatically attach the dll files being debugged to these “spawned/forked” processes. Not to mention that if both Tx and Rx models are involved in an optimization process (such as back-channel), then simply stopping at a break point within one of the AMI models is not enough… one can’t observe the interactions for the full picture. With these limitations, developing and testing AMI models within a full link analysis flow becomes challenging.

For a model developer who does not have access to the sources of these full link simulators, an open source platform is one direction. There are several out there already… PyBert and COM are two such examples. From what I have seen, most of them already have some generic Tx/Rx algorithm blocks in place. So these EQ portions may be replaced with AMI models to meet our needs. Being able to do so will shorten the model design cycle and make it possible to develop blocks with more advanced capabilities (such as back-channel communication). As PyBert already has some sort of AMI modeling support, this paper intends to explore the possibility of adding similar capabilities to the channel operating margin (COM) flow supported by the IEEE 802.3 spec.

Background:

Channel Operating Margin (COM) is a ratified IEEE 802.3 spec. Interested readers can find overview slides given by the main COM author (also my former colleague at Intel) linked here: [Channel Operating Margin Tutorial]. More technical details are available in the IEEE 802.3bj spec document and Richard’s 2013 DesignCon paper of the same title. Furthermore, its matlab source codes are also available at the 802.3 website.

Given the technical depth of COM, trying to describe it in several paragraphs of this post would not be meaningful. So I will just give an overview from an AMI builder’s perspective and help the reader see how AMI models can be plugged into the flow.

COM’s reference model is shown above. The upper half of the right side represents the through channel with inter-symbol interference (ISI), and the lower half is for the crosstalk (XTK), which can be near end, far end or both. Simply put, COM is an evaluation of the signal-to-noise ratio for the full system. Most of the noise terms, such as the ISI, XTK and jitter mentioned, have been taken into account. The signal part is the peak of the single bit response (SBR, i.e. pulse response). COM has published algorithms for many of the blocks above, and also interface specific default parameters for different 802.3 interfaces. The EQ portions, such as FFE in Tx, CTLE in Rx and even DFE, are also implemented.

For a SERDES designer or AMI model builder, the channel S-param (with or without the package portion) is assumed given, and the COM flow will select the best FFE tap weights, CTLE pole/zero locations and DFE tap weights. The search for these parameters is exhaustive… every combination of FFE taps and CTLE DC gains is applied to the channel. A figure of merit (FOM) is then calculated for each combination, and the best case is decided based on the FOM value. Once the EQ settings have been decided, an SBR is formed and a full blown BER-like analysis is applied, with DFE involved, to calculate the final COM value.

For a link analysis flow, the first step is to “characterize” the channel, i.e. obtain its impulse response. The devil is in the details of this step… single-ended s-params may need to be converted to mixed mode, package models of different sections need to be cascaded, and finally the cascaded s-param needs to be “conditioned” before doing the IFFT (COM does not use an IBIS model or analog front-end). All of these are important yet may be outside an AMI model builder’s direct concern… they just want this channel to “work”. Fortunately, these steps have all been included in the COM flow already and can be used as they are.

Regarding Tx and Rx EQ, the original COM implementation (circa 2014) only supports one FFE pre-tap and one post-tap for Tx. Recently, it has been extended to support two pre-taps and three post-taps. For CTLE, a two-pole, one-zero equation is used and the user can only sweep the DC gain. The analysis flow is very similar to what’s described in section 10.2 of the IBIS spec, but only with the LTI assumption. That is, the impulse response obtained from the conditioned S-param is sent to the Tx EQ, then passed through the Rx CTLE before further processing. DFE taps are not optimized within each iteration of the FOM calculation; they are calculated only after the optimized FFE + CTLE settings have been found.

As mentioned previously, the search algorithm for these EQ settings is exhaustive. So if one opens the published COM matlab codes, he/she will find the multi-level loops over the different Tx EQ taps and Rx CTLE Gdc settings, as shown above. To replace these generic EQ functions with our AMI models, this is where the codes need to be changed.
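The structure of such an exhaustive search can be sketched in Python as below. This is only an illustration of the multi-level loops, not the published COM matlab code: the tap grids, the Gdc list and the toy FOM function are all hypothetical, and were sized only so that the loop count (13 Gdc values times 24 tap combinations) is easy to follow.

```python
def exhaustive_search(pre_taps, post_taps, gdc_values, fom):
    """Try every (Gdc, pre-tap, post-tap) combination and keep the best FOM."""
    best = None
    for gdc in gdc_values:              # Rx CTLE DC gain loop
        for pre in pre_taps:            # Tx FFE pre-cursor tap loop
            for post in post_taps:      # Tx FFE post-cursor tap loop
                score = fom(pre, post, gdc)
                if best is None or score > best[0]:
                    best = (score, pre, post, gdc)
    return best


# Hypothetical grids: 4 pre-tap values x 6 post-tap values x 13 Gdc values.
PRE_TAPS = [-0.05 * i for i in range(4)]
POST_TAPS = [-0.05 * i for i in range(6)]
GDC_DB = [-float(i) for i in range(13)]


def toy_fom(pre, post, gdc):
    """Made-up figure of merit that peaks at pre=-0.05, post=-0.15, gdc=-6."""
    return -((pre + 0.05) ** 2 + (post + 0.15) ** 2 + ((gdc + 6.0) / 10.0) ** 2)


best_score, best_pre, best_post, best_gdc = exhaustive_search(
    PRE_TAPS, POST_TAPS, GDC_DB, toy_fom
)
```

In a real flow, the `fom` callback is where the EQ would be applied to the channel response; that is also the spot where an AMI model call would take the place of the generic FFE/CTLE functions.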

Using AMI in COM flow:

To use an AMI model in a COM flow, one needs to replace these FFE and CTLE calls in the COM codes with the corresponding AMI model invocations. Two possible modification routes are shown here:

  • LTI (Linear, time invariant) design: As the COM flow uses the impulse response by default, it’s easier to plug in an LTI AMI model (i.e. a model which doesn’t use AMI_GetWave to process data) directly.

The first step is to “combine” or “collapse” those multi-level loops into a single loop. This single loop can be an iteration over an array which contains all the AMI parameter combinations to be tried (which need not be exhaustive), or it can have a “stopping criterion” which “breaks” the loop, for example once the optimization within this single loop has reached a solution. Tx and Rx need not be FFE/CTLE respectively, and they can have different formats (for example, the CTLE can iterate over a list of frequency response curves rather than pole/zero data). For the latter case (optimization), Tx and Rx can be calculated together if needed. The original COM’s package length and DFE can still be used to calculate the FOM of different conditions if needed.

  • NLTV (Non-linear, Time variant) design: In this case, a PRBS-like bit pattern is needed first in order to convolve with the channel’s impulse response. The bit-stream response is then formed and fed into the model’s AMI_GetWave function within each loop. Just like what’s described in the IBIS spec, the Tx and Rx GetWave functions are called sequentially, and the model’s own DFE and FOM functions (not COM’s) may be used at the end to decide when to finish the iteration.
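The “collapsed” single loop described for the LTI route can be sketched as follows. This is a hedged Python illustration, not the modified COM code: the parameter names (`tx_preset`, `rx_gdc_db`), the candidate list and the toy FOM are hypothetical, and in a real flow the `fom` callback would invoke the AMI models.

```python
import itertools


def single_loop_search(candidates, fom, good_enough=None):
    """Walk one flat list of parameter sets, with an optional stopping criterion."""
    best_score, best_params = None, None
    tried = 0
    for params in candidates:
        tried += 1
        score = fom(params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
        # Optional early exit once the result is judged good enough.
        if good_enough is not None and score >= good_enough:
            break
    return best_params, best_score, tried


# Flatten what used to be nested loops into one array of parameter sets.
candidates = [
    {"tx_preset": p, "rx_gdc_db": g}
    for p, g in itertools.product(range(4), range(0, -13, -1))
]


def toy_fom(params):
    """Made-up figure of merit that peaks at tx_preset=2, rx_gdc_db=-6."""
    return -abs(params["tx_preset"] - 2) - abs(params["rx_gdc_db"] + 6)


full = single_loop_search(candidates, toy_fom)
early = single_loop_search(candidates, toy_fom, good_enough=-1)
```

Because the loop body no longer cares how the parameter sets were generated, the same code works for an exhaustive list, a trimmed list, or an optimizer that stops early.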

Regarding implementation details, as COM was originally written in matlab, matlab’s mechanism for loading and calling external DLL functions needs to be used to replace the original FFE/CTLE function calls. Basically (as shown in the right part of the picture above), mex -setup needs to be called to determine which IDE environment is installed on the working computer. A header file which includes the declarations of the AMI API functions is also needed. Then the following functions are called in sequence:

  • Load the AMI model using: loadlibrary('XXXXXX.dll', 'ami.h')
  • Check libisloaded('XXXXXX') and list the functions in the library using libfunctions('XXXXXX')
  • Call an AMI library function using calllib('XXXXXX', 'ami_init', htInput, rowSize…)
  • Finally, unloadlibrary('XXXXXX')

Also worth mentioning is that if we are doing this for AMI models under development, not for a generalized AMI-capable link simulator, then a parser that turns the .ami file into a parameter tree is not necessarily needed to form the arguments passed into the ami_init function etc. We can manually form a string of parsed “key-value” pairs in advance and pass it into the AMI function. Other open platforms like PyBert do have an AMI parser built in for their AMI capabilities.
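Forming that “key-value” string by hand can be sketched in a few lines of Python. The parenthesized layout below follows the general look of an AMI parameter string with the model name at the root; the helper name and the parameter names (`ctle_gdc_db`, `dfe_taps`) are made up for illustration, and the exact keys must match whatever the model under development expects.

```python
def build_ami_parameter_string(root, params):
    """Format name-value pairs as one parenthesized string under a root name."""
    leaves = " ".join(f"({name} {value})" for name, value in params.items())
    return f"({root} {leaves})"


# Hypothetical model name and parameters, written down manually
# instead of being produced by a full .ami parser.
ami_input = build_ami_parameter_string(
    "RX_model", {"ctle_gdc_db": -6, "dfe_taps": 4}
)
```

The resulting string is what would be handed to the model’s init call in place of the output of a full .ami parser.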

Results:

In our experiment, we want to avoid the multi-level loops over all possible FFE tap weight combinations by using our AMI FFE model, which is capable of self-optimization. The concept is simple: if we already have a channel’s impulse response, then the optimal weights to make the output the same as the input (i.e. recover the signal) in the minimum mean-squared error sense can be solved by using the pseudo-inverse and linear algebra techniques. We want to validate that this approach works and can find a similar (if not the same) solution compared to the full exhaustive search.
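To make the pseudo-inverse idea concrete, here is a minimal Python sketch under stated assumptions; it is not SPISim’s actual FFE model. It builds the convolution matrix H from a UI-sampled pulse response, sets the target d to a single unit cursor, and solves the normal equations (HᵀH)w = Hᵀd, which gives the same least-squares taps as the pseudo-inverse when H has full column rank. The pulse samples, tap count and delay are made up, and no tap normalization or quantization is applied.

```python
def convolve(a, b):
    """Plain discrete convolution of two sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mmse_ffe_taps(pulse, num_taps, delay):
    """Least-squares FFE taps so that pulse * taps approximates a unit cursor."""
    rows = len(pulse) + num_taps - 1
    h = [[pulse[i - j] if 0 <= i - j < len(pulse) else 0.0
          for j in range(num_taps)] for i in range(rows)]
    d = [1.0 if i == delay else 0.0 for i in range(rows)]
    hth = [[sum(h[i][r] * h[i][c] for i in range(rows))
            for c in range(num_taps)] for r in range(num_taps)]
    htd = [sum(h[i][r] * d[i] for i in range(rows)) for r in range(num_taps)]
    return solve_linear(hth, htd)


def residual_error(pulse, taps, delay):
    """Sum of squared differences between the equalized pulse and the target."""
    eq = convolve(pulse, taps)
    return sum((v - (1.0 if i == delay else 0.0)) ** 2 for i, v in enumerate(eq))


# Hypothetical UI-sampled pulse response: one pre-cursor, main, two post-cursors.
pulse = [0.1, 1.0, 0.4, 0.2]
delay = 2  # main cursor index (1) plus FFE main tap index (1)
taps = mmse_ffe_taps(pulse, num_taps=3, delay=delay)
error_before = residual_error(pulse, [0.0, 1.0, 0.0], delay)
error_after = residual_error(pulse, taps, delay)
```

One solve replaces the sweep over tap combinations, which is the saving the experiment is after; whether the solved taps land on the same answer as the exhaustive search is what the results section checks.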

The result is shown above. Red dots represent the original COM’s sweeping results (FOM values). There are 13 Gdc values, each with 24 possible combinations of one pre-tap and one post-tap… so 312 runs are needed in total. Blue dots are our AMI results… since we still use COM’s CTLE, 13 runs are performed. However, for each Gdc run, the AMI model computes only once based on the self-optimization algorithm mentioned, and it finally reports the best result together with the best CTLE Gdc. As the blue dots are almost at the top of all 13 original “COM chunks”, we validate that this algorithm/our optimization-capable FFE does work.

Summary:

To summarize this study, first we want to emphasize that a model developer, who can be an individual model provider or a SERDES designer asked to develop AMI models, needs a full flow which can be used to debug the AMI model under development. This can’t be covered by a simple utility driver, particularly when optimization such as back-channel comes into play.

To meet our needs, an open-source link-analysis platform is worth considering. In particular, the COM flow of IEEE 802.3 is attractive because it has been ratified, is well documented and widely used, and supports a BER-like flow with source codes. While its Tx and Rx block functions may be generic, it’s not difficult to replace those function calls with our own AMI models’ API functions in either the LTI or NLTV scenario. This process not only helps shorten model development cycles, but is also very beneficial for further understanding how link analysis is actually performed.

Links:

Presentation: [HERE] (http://www.spisim.com/support/paperetc/20180202_DesignConSummit_SPISim.pdf)

Audio recording (English): [HERE]

IBIS-AMI: “Pairing-off” of the models in three scenarios

Here at SPISim, we provide an AMI modeling service in addition to our EDA tools. Companies interested in the AMI service are mostly IC/IP vendors. They need to provide AMI models so that their users, the system companies which use their ICs, can perform design and analysis. Oftentimes, our client is only interested in AMI modeling for either the Tx or the Rx circuit because that’s where their IC design resides. However, they also often find later on that in many scenarios it actually takes “two” models to be useful for their users. In this post, I will talk about some of these “pairing-off” cases so that interested readers may be well informed when such modeling needs arise.

Tx and Rx:

The topology shown above is the one most commonly used for link analysis. At the Tx and Rx ends, the user is often asked to specify the corresponding IBIS models. Most definitely, they need to specify the AMI model and associated AMI parameters. The interconnect in the middle is generalized here… it can be the Tx package, main route, connector, Rx package etc., represented either as separate models or as a cascaded S-parameter. The whole purpose of the interconnect here is to provide an impulse response so that, when the link simulator is operating in either “Statistical” or “Bit-By-Bit” mode, the AMI models at the Tx and Rx ends will be able to process it accordingly. The analysis flow an AMI-compatible simulator should follow is described in detail in section 10.2 of the IBIS spec. The user should find that in those descriptions, and in almost all the tutorial documents on-line such as the one below from Keysight, both Tx and Rx AMI models are required to be part of the analysis.

Thus, the first “pairing-off” in AMI modeling is that each Tx AMI model needs an Rx AMI model to work… at least mostly. From this perspective, it is very different from traditional spice-based simulation, where you may have the Tx/Rx parts in IBIS, transistor circuit or behavioral/Verilog-A modeling formats and can simulate without any issue.

So what if you have only either a Tx or an Rx AMI model? It will be tricky (depending on the simulator), if not impossible, in many cases. So we here at SPISim often have to provide a “dummy” AMI model in addition to the one ordered. As a “dummy” model, it is a simple pass-through which does nothing. When combined with a pass-through S-param or interconnect, it makes testing and validating the delivered AMI model’s performance very straightforward and clear. When all the specs are met (e.g. EQ levels), the user can then add the real interconnect and the Tx/Rx AMI models from other vendors to proceed with the design process.

(A pass-through “dummy” AMI model implements the AMI API functions either with empty bodies or in a way that simply does not change the variables passed by reference.)

Repeater:

IBIS version 6.0 includes technical advances such as mid-channel repeaters, documented in BIRD 156.3. Repeaters are often used in longer SERDES channels such as PCIe or HDMI. There are two types of repeaters: redriver and retimer.

  • Redriver: A redriver performs signal conditioning through equalization, providing compensation for input channel loss from deterministic jitter such as inter-symbol interference.
  • Retimer: A retimer is a mixed-signal device that includes equalization functions plus a clock data recovery (CDR) function to compensate both deterministic and random jitter, and in turn transmit a clean signal downstream.

It’s easy to see the “pairing” part within a repeater:

Within a repeater, the Rx sits at the front to receive signals from upstream, then passes them to the Tx part of the repeater to transmit again toward the downstream. So one really can’t work without the other. Add the “paired” Tx AMI at the upstream and the final Rx AMI at the end of the downstream, and it takes at least four AMI models in most cases to start the analysis. More than one repeater is also allowed, so the number of “paired” AMIs can grow. The proper analysis sequence when repeater(s) are involved is also documented in section 10.7 of the spec.

As far as a repeater being an “electrical” component is concerned, there may be no need to probe signals between the Rx and Tx within a repeater. However, when we model an optical link as a repeater (shown below, from Agilent slides), then things become much more interesting:

We handled such a case just recently… an optical transceiver used in an (electrical) system may be modeled as a repeater. Having said that, please note that in the “optical” world, the Tx is now at the front of the repeater, as a laser is used to light the optical fiber and transmit the signal toward the Rx (now the 2nd part of the repeater). So potential confusion may arise when the optical and electrical worlds meet. In addition, there are two other considerations:

  1. There may be a need to probe the signal between the optical Tx and Rx ends, which can be single-ended-like (optical pulse in milliwatts).
  2. Users of such an optical module based repeater may focus on the design at only the upstream side, only the downstream side, or both. In the first two situations, they may not have AMI models for the other side.

The first one above is tricky to solve… mainly due to the non-differential nature of such signals in a link simulator. Until now, if one reads the AMI portion of the IBIS spec, he/she should notice that the signals mentioned are always differential. So while there is no problem creating an AMI model that handles single-ended signals, how the probing/analysis goes really depends on the simulator being used. Having said that, with DDR adopting the AMI methodology, single-ended signal support should happen very soon.

To accommodate the needs of our clients’ different possible users, we made use of the “pass-through” dummy model again and proposed the following AMI model combinations:

In usage scenario 1: The modeled Tx and Rx optical ends (each has its own circuitry) are assembled accordingly. Thus if a simulator can handle the “single-ended-like” optical signal in between, then this model can be used directly.

In usage scenario 2: The signal between the optical ends within a redriver is not of concern, and support for the single-ended-like signal is uncertain. In this case, we “lumped” the TxRx together and put it at the front of the redriver. The second half of the redriver is a simple pass-through. As the input and output of the first part are always differential, it will meet the expectations of the simulation flow documented in the spec.

In usage scenario 3: we assume the user is only interested in the downstream end and has only an Rx AMI model. In such a case, a repeater will not work because it lacks the upstream side. So we combined the created TxRx here and made it a transmitter AMI.

In usage scenario 4: we assume the user is only interested in the upstream portion and has only a Tx model. Similar to scenario 3, we combine the TxRx of the optical models and make it a receiver AMI.

In the “pairing-off” for a repeater, there are two points worth mentioning:

  • Unless DFE/CDR are being modeled (and thus a retimer), the distinction between Tx and Rx EQ is often blurred, particularly when both are LTI. For example, either a Tx or an Rx can have an FFE EQ, and the CTLE of an Rx is also just a passive filter. In such cases, a model can be used as either a Tx or an Rx (like scenarios 3 and 4). The only thing that needs attention is the jitter assignment, as the AMI jitter parameters for Tx and Rx are different.
  • We want to cover all usage scenarios and make our client happy. However, we also don’t want to overburden SPISim engineers by creating too many different AMI models. Thus, as mentioned in a previous post, the architecture of AMI modeling becomes important. When done properly (such as in our case 🙂 ), we can simply cascade stages at the .ami file level without rewriting any C/C++ codes or re-compiling.

IBIS-AMI and IBIS:

When we focus on the more technically challenging AMI models, we often forget the still important role a traditional IBIS model plays in link analysis. Simply put, the accuracy of the IBIS model still greatly impacts the final results (e.g. the BER eye).

The normal link analysis process starts with “link characterization”. Mostly this involves a time domain simulation of the driver driving through the interconnect, with the receiver analog front-end (no EQ) connected, to obtain the pad to pad step response. This step response is then differentiated to obtain the impulse response. Finally, AMI models are brought into play by convolving with the impulse response (LTI or statistical mode) or processing the bit response sequence (NLTV or bit-by-bit mode). In this characterization stage, no EQ (i.e. no AMI) is involved. It is the “analog front-end”, i.e. the IBIS models or corresponding spice/behavioral models, that is being used. Thus, accurate modeling of this portion does impact the link analysis results.
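The differentiate-then-convolve step can be sketched in plain Python as below. This is a toy illustration with made-up step response samples and time step, not the output of any real characterization; a first-order finite difference stands in for whatever differentiation a link simulator actually uses.

```python
def step_to_impulse(step, dt):
    """Finite-difference derivative of a sampled step response."""
    impulse = [step[0] / dt]
    impulse += [(step[k] - step[k - 1]) / dt for k in range(1, len(step))]
    return impulse


def convolve(x, h, dt):
    """Discrete approximation of the convolution integral of x and h."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            out[i + j] += xv * hv * dt
    return out


# Hypothetical pad-to-pad step response, sampled every 1 ps.
dt = 1e-12
step = [0.0, 0.5, 0.9, 1.0, 1.0]
impulse = step_to_impulse(step, dt)

# Sanity check: driving the impulse response with a unit step
# should give back the original step response.
rebuilt = convolve([1.0] * len(step), impulse, dt)[: len(step)]
```

Any error in the analog front-end model shows up in `step`, is carried into `impulse`, and from there into every eye the AMI models produce.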

In the picture above, eye plots of an identical AMI model, each with a different IBIS model, are shown. It’s clearly seen how the IBIS model’s VT waveforms and IV curves affect the width and height of the eye opening. Thus the third scenario of IBIS/AMI “pairing-off” we want to emphasize is the AMI model and its corresponding IBIS model.

SPISim is a relatively small EDA/consulting company compared to others such as Agilent, Cadence or MathWorks (MATLAB). These companies each have their own AMI modeling solutions, which cost much more than our offerings. However, when we look into their AMI product details, it’s not easy for us to find (or maybe it’s not even there?) how the IBIS portion will be handled. As an AMI model is mostly used for SERDES applications, a differential or current-mode-logic (CML) design is often involved. If one reads chapter 4 of the IBIS cookbook carefully (available at the IBIS website), he/she should find that the process of differential IBIS modeling is more complicated than that of the simple single-ended IBIS buffer. Thus when considering the importance of “pairing” between AMI and IBIS, one should really take this into account. An AMI model without a proper analog front end will definitely come back to haunt you during the validation stage. Readers interested in differential IBIS modeling may also refer to our previous post or our 2016 Asian IBIS summit paper.

IBIS-AMI: AMI Modeling Using Scripts and Spice Models

Preface:

This blog post is written in preparation for the upcoming IBIS summits at EPEPS (San Jose, Oct/18/2017), Shanghai (Nov/13/2017) and Taipei (Nov/15/2017), where I will present a paper on the same topic. Slides, example models and an audio recording will be made available below:

Motivation:

Many years ago when I entered the signal integrity field, we analyzed the channel by performing spice-like simulation for several hundred nanoseconds at most. Post-processing was then done to get FOM metrics. At that time, the bus speed was barely around 1Gbps. These days, high speed IO SERDES are common among various computing devices and their speed easily reaches multi-Gbps or higher. Not to mention the several new 802.3 network protocols which have even higher speeds (50G to >100G). With such high data rates, one needs to “simulate” a number of bits at the 1E12 level to reach a certain bit-error-rate (BER, say 1E-12) with a certain confidence level (CL, say 99%). As such, the traditional SI analysis method is no longer valid because it is simply impossible to simulate so many bits in a reasonable time using a spice-based simulator. A new channel analysis methodology, link analysis, was thus needed and was invented around 2003 to address this problem (e.g. StatEye). For link analysis, traditional buffer models such as IBIS are not very useful as they are mostly time-domain based. Algorithmic modeling interface (AMI, a subset of IBIS) models are mostly used instead.
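As a rough back-of-the-envelope check of the “1E12 level” claim, the commonly used zero-error confidence formula N = -ln(1 - CL) / BER can be evaluated in Python as below. The assumption here (not stated in the text above) is that no bit errors are observed during the run; the function name is made up for this sketch.

```python
import math


def bits_needed(ber, confidence):
    """Bits to observe error-free to claim `ber` at the given confidence level."""
    return -math.log(1.0 - confidence) / ber


# BER target 1E-12 at a 99% confidence level.
n_bits = bits_needed(1e-12, 0.99)
```

For BER = 1E-12 and CL = 99% this gives roughly 4.6E12 bits, which is far beyond what a spice-like transient simulation of a few hundred nanoseconds can cover.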

AMI modeling is technically very challenging. It requires cross domain expertise such as simulation, modeling and C/C++ programming across different OSes and platforms. Thus it usually takes much longer for an engineer to ramp up and be able to develop and deliver a model compared to traditional IBIS. Two of the big hurdles which cause this slow ramp-up for AMI modeling are the requirements to express the circuit’s behaviors in the C/C++ language and then to compile them according to the spec’s API requirements. To lower such barriers, we are often asked: 1. Can we create an AMI model using scripting languages? 2. Can I simulate existing spice models using a link simulator before committing to develop a full blown C/C++ version?

We propose approaches to meet these two common requests in this presentation.

Background:

Channel analysis: Nowadays high speed link analysis most definitely includes stages such as Tx/Rx EQ, which are beyond traditional IBIS. Equalization is needed to compensate for channel noise such as inter-symbol interference (self-channel interference) and crosstalk (co-channel interference). These EQ stages can open a closed “eye” of a noisy channel, which is represented by an S-parameter interconnect.

There are two analysis methodologies for modern link analysis:

  • Statistical: If the circuit is linear time-invariant (LTI), one can obtain a lot of information about the channel’s limits by using a single pulse or impulse response. In this flow, the channel is assumed LTI (the s-parameter may first need to be enforced/fixed in terms of causality, passivity etc.). Its impulse response is then “fed” into the Tx/Rx circuitry to obtain the response. By superposition, the responses of the channel + circuitry at different UIs are added together, and the probability of various BER levels can be computed from there.
  • Bit-by-bit: If a circuit is non-linear time variant (NLTV), then such superposition is not allowed. In that case, a bit-stream may be fed into the Tx/Rx circuitry by the link tool/simulator to obtain their continuous time-domain response. These outputs are then convolved with the LTI channel to obtain the overall channel response. In order to do this for many millions of bits, some assumptions need to be made (high-z, to be discussed later) in order to gain speed compared to the same time-domain spice-like nodal analysis. Also, the link tool may break the bits into several chunks and feed them to the Tx/Rx separately before combining them together, with “aliasing” between adjacent chunks of bits being taken care of properly at the link tool level.

AMI models support both of these two channel analysis methodologies.

AMI Model: an IBIS-AMI model contains several parts:

  • .ibs file: In the .ibs file, there is a section called “Algorithmic Model” which points to the paths where the .ami and .dll/.so files reside. This keyword block also provides info such as the bitness (32/64-bit), the OS platform and the compiler used to generate the .dll/.so files. Other than this info, the .ibs file and the AMI model it points to are basically independent. Furthermore, in link analysis, the traditional IBIS part is often considered the “analog front end” and is “absorbed” into either the AMI portion or the channel portion of the data.
  • .dll/.so file: This is the compiled binary. The language MUST be plain old C, and it must be compatible with the defined AMI API spec. in order to be loaded by the link simulator.
  • .ami file: This is the plain text file which contains the “config. settings” for the binary .dll/.so files. As shown in the picture below, it has two main sections:

  • Reserved Parameters: This block is usually at the top of the .ami file. These settings are for the link simulator only. Depending on some of the settings in this block, such as “Resolve_Exists” or “GetWave_Exists” being set to true or false, the associated API functions are invoked by the link tool.
  • Model Specific: This block is usually at the bottom of the .ami file. It contains parameters defined by the AMI model developer, which the link tool and API spec do not interfere with. While there is a lot of “text” in this block, by the time the .dll/.so portion of the AMI model receives the info passed by the link tool, it has been “filtered” and converted to simple “name-value” pairs as shown above. So while the “tree structure” of this model specific section can be many levels deep, the parameters received inside the model can be just two levels in the simplest form, with the model name (RX_model above) at the root and the name-value pairs as “leaves”.

AMI-API functions: As of IBIS Spec. V6.1, there are up to five API functions which can be used in a compiled AMI model:

Among these, AMI_Resolve and AMI_Resolve_Close are for string parameter pre-processing and can be ignored in a lot of cases. AMI_Close is like garbage collection/clean-up to release allocated memory, so it is trivial in most cases as well. A modern OS may reclaim the memory space even if one does not do any “Free” or “Delete” in AMI_Close. The two most important ones, AMI_Init and AMI_GetWave, are marked in red. In particular, they participate in the aforementioned “Statistical (for LTI)” and “Time-domain bit-by-bit (for NLTV)” modes. That is, for an LTI model, the AMI model must/should implement the AMI_Init function call. An LTI model can also implement the AMI_GetWave function call, but this is optional. On the other hand, an NLTV model must implement the AMI_GetWave function, while implementing AMI_Init as the “initialization” rather than the “computing” portion of the codes. The bit-by-bit convolution part of an NLTV model should be implemented in the AMI_GetWave part of the codes.

When looking at the function declaration part of the spec, as boxed in red in the right part of the image above, one should also realize that the first argument (an array represented by a double pointer) serves as both input and output. It is a “pass by reference” argument as it is a pointer. So at the beginning of the AMI_Init/AMI_GetWave calls, the model can obtain either the impulse response of the channel or the digital bit sequence from the simulator via this pointer. Then the model performs the necessary computing using info from the rest of the arguments (some of which also serve as outputs, such as char **msg, but they are not that important in this context). At the end of the computation in this function call, the modified response must be written back to the address the first argument points to, so that the link simulator can retrieve the values and carry on with the rest of the analysis.
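The in/out behavior of that first argument can be mimicked in Python with a ctypes array standing in for the C double pointer. This is only a sketch of the calling pattern, not a real AMI function: the function name, the reduced argument list, the gain-only “EQ” and the return value of 1 as a success flag are all assumptions of this example.

```python
import ctypes


def ami_init_like(impulse_matrix, row_size, gain):
    """Read the response through the 'pointer', then overwrite it in place."""
    for i in range(row_size):
        impulse_matrix[i] *= gain  # result goes back to the same memory
    return 1  # assumed success flag for this sketch


# The "simulator" side owns the buffer and passes it to the model.
row_size = 4
buffer = (ctypes.c_double * row_size)(0.0, 1.0, 0.5, 0.25)
status = ami_init_like(buffer, row_size, gain=0.5)

# After the call, the simulator reads the modified response back.
result = list(buffer)
```

Nothing is returned through the function’s return value except the status; the processed samples are picked up by the caller from the very buffer it passed in.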

AMI Modeling Flow:

A typical AMI modeling flow involves the following steps:

  1. Identify the behaviors of the circuits being modeled. As we are going to use a computer language to describe the model’s behavior, we must first know how it works. These behaviors can be obtained via mathematical derivation, simulation results or measurements. In the last two cases, a look-up table may be used inside the model.
  2. Code the behavior and the IBIS-AMI API: There are two parts to this step. The API part MUST be implemented in C (not even C++). The other part of the codes can be in any language, shape or form as long as the developed C codes know how to communicate and exchange data with it. That is, the actual computing part can be in a language other than C/C++ if you like, or can even be completed via circuit simulation.
  3. Compile and link as .dll/.so: This is the compilation of the strict C portion of the API. On Windows, one usually needs to compile for both 32 and 64-bit. On Linux, on top of the different bitness, one should also test on various distros (Debian or Red Hat based) as they may use different versions of GNU C (and thus support different versions of the C spec, e.g. C99 (1999) or C11 (2011)).

Now let’s talk more about item 2 above. There are many considerations on how you should code this part. A model coded specifically in C/C++ for one circuit will mostly run very fast. However, it may require frequent re-compilation/re-testing when a new design comes along. If we can make this part as simple as possible and not design specific, such as by calling external scripts, then this work may only need to be done once, as all the variations are now external to the compiled .dll/.so.

Modeling with Scripts:

Knowing the requirements of the AMI API, we can now propose a flow to create an AMI model using the user’s favorite scripting language:

Flow:

To support script based AMI modeling, we need a thin API implementation (in C, as required by the AMI spec) whose sole task is to translate all the received arguments into a text format and write them to a text file. It then passes the path of the text file to a user defined script or batch file via a system call. The location and type of the script are defined in advance in the plain-text .ami file. The script first retrieves the argument information by parsing the text file, performs the necessary computing, then writes the results into another (or the same) text file which the thin-layer AMI model knows where to find. When the script completes, the AMI model parses the text file generated by the script, fills the information back into the aforementioned “passed by reference” pointer array, then completes this step. As discussed, whether a script is needed for AMI_Init, AMI_GetWave or both is pre-defined based on circuit behavior. And since this is the developer’s favorite language, parsing and writing text files should not be an issue compared to doing so in, say, C. Lastly, should any information need to be passed between API calls (such as model member values between AMI_Init and AMI_GetWave), it can also be file based. To summarize, the thin-layer AMI model completes the API calls with the upper simulator like any regular AMI model. However, its “transactions” with the underlying user scripts are all file based.
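The file-based round trip can be sketched as follows. This is only an illustration written in Python, not the actual thin layer (which is in C): the file names, the one-sample-per-line text format and the embedded user script are all assumptions made up for this example.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical user script: reads one sample per line, doubles it, writes it out.
USER_SCRIPT = """\
import sys
with open(sys.argv[1]) as f:
    samples = [float(line) for line in f if line.strip()]
with open(sys.argv[2], "w") as f:
    for s in samples:
        f.write(repr(s * 2.0) + "\\n")
"""

def call_script(buffer, script_path):
    """Round-trip `buffer` through an external script using text files."""
    with tempfile.TemporaryDirectory() as work:
        in_path = os.path.join(work, "ami_in.txt")    # assumed file name
        out_path = os.path.join(work, "ami_out.txt")  # assumed file name
        with open(in_path, "w") as f:
            for value in buffer:
                f.write(repr(value) + "\n")
        # "System call" into the user's script, passing the file paths.
        subprocess.run([sys.executable, script_path, in_path, out_path],
                       check=True)
        with open(out_path) as f:
            result = [float(line) for line in f if line.strip()]
    if len(result) != len(buffer):
        raise ValueError("script returned a different number of samples")
    buffer[:] = result  # fill the values back into the caller's array
    return buffer

def demo():
    """Write the stand-in user script to disk and run one round trip."""
    with tempfile.TemporaryDirectory() as d:
        script = os.path.join(d, "user_model.py")
        with open(script, "w") as f:
            f.write(USER_SCRIPT)
        wave = [0.0, 0.5, 1.0]
        call_script(wave, script)
        return wave
```

Calling demo() runs the stand-in script once and returns the samples after they have been written back in place. A real thin layer would also have to pass the remaining API arguments and report script failures through the msg argument.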

Example:

A matlab example of AMI_Init is shown above. The matlab code first calls the parseInput function to read the text file, which is prepared by the thin AMI model and contains the input waveform. It then performs computation such as convolution with the FFE, and the result matrix is written back via the storeOutput function call to a text file which the thin AMI model knows where to find. Since matlab’s “conv” function is used directly, the model developer does not need to deal with C-based implementation details such as memory allocation, FFT/iFFT in some cases, or linking/compiling other math libraries.
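For readers without matlab, the same three steps can be sketched in Python. The function names parse_input and store_output mirror parseInput and storeOutput from the matlab example, but their bodies here are stand-ins that work on text strings, and the tap values are hypothetical. The sketch also assumes the taps are already expressed at the sample spacing of the input.

```python
def parse_input(text):
    """Stand-in for parseInput: whitespace-separated samples to a list."""
    return [float(tok) for tok in text.split()]

def convolve(signal, taps):
    """Full linear convolution of two sequences (length n + m - 1)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, t in enumerate(taps):
            out[i + j] += s * t
    return out

def store_output(samples):
    """Stand-in for storeOutput: one sample per line."""
    return "\n".join(repr(v) for v in samples) + "\n"

def ami_init_script(text, taps):
    """Parse the input, apply the FFE taps, and format the result."""
    impulse = parse_input(text)
    shaped = convolve(impulse, taps)[: len(impulse)]  # keep original length
    return store_output(shaped)

# Hypothetical two-tap FFE (main cursor and one post cursor).
result_text = ami_init_script("0 1 0 0\n", [0.75, -0.25])
```

The output is truncated to the input length because the thin layer has to write back the same number of samples it handed over.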

Considerations:

While script based AMI development is simple and handy, there are several considerations before deciding to release such models:

  • Performance and distribution: Since all communications between the thin AMI model and the user’s script are file based, it will inevitably suffer some performance penalty. If only AMI_Init is scripted, it’s called only once by the simulator during analysis, so the penalty is less of a concern. Next, one must also consider how the model can be distributed. If the script is a matlab .m file, then model clients need to have a matlab environment installed as well. If it’s compiled matlab, then the client needs to install the MATLAB Compiler Runtime (MCR). If the scripting language is perl, then a perl interpreter, which is usually installed by default on Linux but not on Windows, is needed. To distribute such an interpreter, one must also check the license terms and then also think about the elegance of such a model release.
  • Consider Python!: Python is a very worthy candidate here because it has rich math or matlab-like libraries such as SciPy and NumPy. More importantly, there is a mechanism called “embedded python” in which the whole python interpreter, together with the math libraries used, can be bundled and distributed in a single zip file. That is, the end user does not need to install a python environment first, as the thin AMI model is already linked with CPython and can find all required functions in either the user’s scripts or the bundled zip file.

Modeling with Spice circuits:

Now that we know a thin layer can call an external script either directly or via its interpreter, we may also come to the conclusion that it can call an external program such as a circuit simulator. This is of course true!

An assumption we mentioned at the beginning of the post is the “High-Z” condition. In a typical spice-like nodal analysis, there are many “Newton-Raphson” iterations going on within the same time step. At the beginning of each Newton iteration, tentative voltages are given at each node. Each component then computes the current it draws from or outputs into these nodes based on these voltages and its own behavior. At the end of the iteration, the circuit simulator solves the system matrix to see whether KCL/KVL is balanced, and then determines whether another Newton iteration is needed or it can march on to the next time step.

In channel simulation, such iteration is not needed as there is a “High-Z” assumption… and that’s why it can run much faster than nodal spice simulation. Under the High-Z assumption, each block is assumed to have high input impedance and output impedance, so it will not draw any current at the input and the output is set once determined. Since the thin-layer AMI model can obtain the inputs from the simulator via the API call, if it can perform another task… such as converting these inputs into a PWL source with time step equal to UI/number_sample_per_UI (both are known and passed by the simulator), then it can theoretically call a simulator to drive a user provided spice subckt like the one above. Note that the input waveform is just a voltage which represents the potential difference between two nodes, so there is no reference to GND at all. It’s up to the user’s spice circuit to determine what the reference is and provide a GND reference if needed.
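The conversion of evenly spaced samples into a PWL source might look like the sketch below. The source name VIN and the node names inp/inn are hypothetical, and the number formatting is just one reasonable choice. The source is placed between two nodes, so no GND reference is implied.

```python
def pwl_source(samples, ui, samples_per_ui, name="VIN", pos="inp", neg="inn"):
    """Build a SPICE PWL voltage source line from evenly spaced samples."""
    dt = ui / samples_per_ui  # time step = UI / number of samples per UI
    points = " ".join("%.6e %.6e" % (i * dt, v) for i, v in enumerate(samples))
    return "%s %s %s PWL(%s)" % (name, pos, neg, points)

# Two samples at a 200ps UI with 2 samples per UI (values for illustration).
line = pwl_source([0.0, 1.0], 200e-12, 2)
```

The resulting line can then be dropped into the netlist that drives the user's subckt.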

Flow:

The picture above shows the flow to simulate/model AMI with a spice subcircuit. It is very similar to the flow for AMI with a scripting language. First, the thin-layer AMI model needs to generate a PWL source dynamically based on the provided inputs. It then forms (either written out as a file or internally in memory) a netlist that contains the driving circuit and probes the output. This driving circuit uses the user provided spice subckt with possible value overrides defined in the .ami file. Then the thin-layer AMI model calls an external spice simulator (or an internal API) to perform the nodal simulation. The output (like .tr0 for HSpice) is then processed and its values are again filled back into the initial API pointer array to be returned to the upper circuit simulator.

Example:

An example is shown above. The template is pre-defined, with the PWL source and the path to the user’s spice circuit left to be filled in. The thin AMI model generates the netlist with all values filled in properly upon being called by the simulator. It then calls the external simulator to do the nodal simulation. The resulting waveforms are post-processed and filled into the API argument’s memory address to complete this API call.

Considerations:

Similar to the AMI modeling with script language case, there are several considerations when adopting the spice-based AMI modeling approach. The first is performance… as each AMI API call involves nodal simulator initialization (allocating the matrix, solving for DC, Newton iterations between time samples, etc.), it will be significantly slower than a pure C implementation. However, this is less of a concern if only AMI_Init is needed, as it’s called only once. Moreover, one does not need to derive any equations or do any coding at all, and can get accurate link performance using the Tx/Rx circuitry directly… so development time is reduced significantly. If one later decides to implement such a block in C/C++, the simulation results obtained during this process can serve as very good reference or correlation data for the C/C++ based model to be developed.

Another consideration is again distributability: If the spice model uses a particular MOSFET model and requires, say, HSpice, then the AMI model recipient also needs to have HSpice in their environment in order to run it. While a commercial simulator like HSpice may not provide an API or be usable as a shared library, many open source ones do. Examples are NgSpice and/or QUCS. In these cases, the compiled thin AMI model is basically a simulator in .dll/.so form and can perform the simulation all by itself. The binary is around 8MB larger than it would be otherwise, as it also needs to link with all the device models supported by the simulator.

Summary:

Using either scripts or existing spice circuits for AMI modeling is actually doable. The presentation I give here is not just talk on paper: the implemented “thin-layer AMI” and examples are also provided together with the slides. These flows can be considered part of the AMI development process, as they can shorten the modeling cycle significantly while providing data for correlation should one decide to go for a full C/C++ implementation at a later stage. The points to consider include performance, elegance of the released models and distributability of either the script’s interpreter or the simulator. A thin-layer AMI model is also needed. This thin layer is called a proxy model in computer science terms. As a matter of fact, SPISim has implemented such proxy models and made them available for the public to use free of charge. [Link Here] That said, this API model only needs to be done once, as all the model variations are located externally in the user’s scripts/spice models and thus require no re-compilation when the design changes. Overall, these two modeling approaches give an AMI model developer alternative ways to develop a model more efficiently and effectively.

IBIS-AMI: A case study

IBIS-AMI modeling is a task usually executed at the end of the IP development process. That is, hardware IPs are created first, then the associated AMI models are developed and released to accompany the hardware IP, since the IP vendor usually does not want to expose the design details to their customers. On the other hand, it is also likely that a user has received behavioral or encrypted spice models first, or even obtained measurement data from the lab, and would then like to create corresponding AMI models for channel analysis. Such needs usually arise because the original IP vendor either does not have an AMI model or may charge too much for it. It is also possible that the end user would like to explore different design parameters before determining which IP/part to use.

In this post, we would like to discuss how such usage demands can be met with SPISim’s AMI flow without any C/C++ coding or compilation for AMI. Here are the topics for this case study:

  • Collateral
  • Design spec. and modeling goals
  • Modeling process
  • Model release
  • Other thoughts

Collateral:

For this design, we received a testbench for a typical SERDES channel using this IP. The schematic is shown below:

Both Tx and Rx are encrypted hspice models with adjustable parameters. When picking one set of parameters to test run the channel test bench, we get valid simulation results.

As this is an hspice netlist, we have the freedom to probe different points and change the simulation options as needed (for example, from transient to ac or transfer-function).

Design Spec. and modeling goals:

The original IP vendor only provides simple descriptions for the various blocks and design parameters. The Tx and Rx packages are spice sub-circuits and the channel is an s-parameter file. Both the Tx and Rx models have typ/min/max corners. On the Tx side, there are settings for the voltage supply, which is independent of the corner settings, a 7-bit resolution multiplier for the swing amplitude, a 6-bit resolution de-emphasis setting and a flag for turning the boost on/off.

With these IP design specs, the modeling goal is to create associated AMI models which allow end users to fine tune performance with parameters similar to those of the original IP. Furthermore, SPIPro’s spec. AMI model is used so that the model developer does not need to write any C/C++ code or perform any .dll/.so compilation for this AMI modeling.

Modeling process:

If one needs to create AMI models from simulation or measurement results, that usually means the original design details or even the design spec. are not available. So the so-called “top-down” approach will not work, simply because there is no design at all to translate into corresponding C/C++ code. Instead, we need to create an architecture via trial and error to “reverse engineer” the design so that the performance matches what’s given. Here are the steps we used in this case:

  • Create Tx/Rx test bench:

While the full channel test bench demonstrates how the Tx/Rx work, it also includes information not needed in the AMI modeling. For example, package and channel are usually kept separate from the AMI model because they are usage or client specific. Besides, as mentioned in previous posts, one assumption of the AMI model is high-Z input and output. So the first step of the modeling is to separate the channel from the driver/receiver and create Tx-only and Rx-only test benches together with their inputs and outputs.

For Tx, it’s a simple PRBS input with the output connected to 50 ohm loads.

And its input/output should look like this:

When we modify the test bench to connect Tx directly to Rx, i.e. bypass all the package and channel models, we get such input/output eyes:

From these eye plots, we do not see a DFE-like abrupt force-zero output, thus the Rx model seems to be a CTLE-like boost filter. So we created an Rx test bench like the one below and obtained its frequency response:

Note that in order to perform the what-if analysis of the next step, we also use VPro to capture the inputs to these Tx and Rx models so that they are evenly spaced, as required by AMI’s Init and GetWave functions. These evenly spaced inputs will be used later on by SPISimAMI.exe to drive the model. For HSpice, the following option may be needed so that the .tr0 file will have evenly spaced data points even though the simulation uses a variable time step:

  • Define architecture:

Now that we have input data and desired output for this particular set of parameters, the next step is to explore whether SPISim’s existing spec. model is sufficient to meet the performance.

Tx: for tx module, we observed the following two characteristics:

  1. It has de-emphasis, which happens after the peak.
  2. The swing range and offset differ between input (0 ~ 1) and output (-0.5 ~ 0.5). There is also a loading effect, so the rising and falling edges show RC charging/discharging like behavior.

For the de-emphasis part, we can use Spec. AMI’s what-if analysis to see how different pre/de-emphasis settings cause different responses:

We change these pre and post cursors one at a time, using a value of -0.1.

We then test drive using the built-in PRBS with the same UI time, 200ps:

From the summary plot above, correlated with the Tx output, it’s apparent that this Tx has a one post-cursor FFE because only this set-up gives de-emphasis one UI after the peak.

The output swing and slew rate are signs of an AFE (analog front-end). So we can simply add an AFE stage right after the FFE, with high/low rail voltages “clamped” to the simulated peaks:

After several rounds of trial and error, all within SPISim’s Spec-AMI GUI, we obtained the following Tx correlation between the spice test bench and AMI results for this particular set of parameters:

Similarly, as Rx behaves like a continuous filter, we use a CTLE stage to mimic its behavior:

After some tuning, we obtain these Rx correlations:

From these correlations, we believe that the stages built into SPISim’s AMI are sufficient to meet the performance needs. So the next step is to find the parameters to support all these design specs.

As the process is similar for Tx and Rx, we will use Tx as an example to demonstrate the flow.

  • Simulation/Sweep plan:

Before generating a simulation plan, we first make sure these design parameters are independent of each other. We performed simple 1D sweeps (several points each), in particular for the swing and de-emphasis levels.

Notice that the slew rate and peaks are not affected much by the de-emphasis level. In addition, the levels seem to be linearly spaced according to the bit settings. With this, we can sweep them independently.

Now for Tx, a full factorial sweep would require 2^6 (de-emphasis level) * 2^7 (swing level) * 2 (boost flag) * 3 (corner) * 3 (voltage levels) ~= 150000 runs. Due to the linearity, we do not need to sweep the full 6 and 7 bits of de-emphasis and swing levels. Instead, we increase the step interval for simulation because the built-in interpolation of SPISim’s model can be applied automatically; more about this in the model release section.
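As a quick check of that count, the product can be worked out in a couple of lines. The dictionary keys are just labels chosen for this sketch.

```python
from math import prod

# Number of settings per Tx design variable, as listed in the text.
levels = {
    "de_emphasis": 2 ** 6,  # 6-bit de-emphasis level
    "swing": 2 ** 7,        # 7-bit swing multiplier
    "boost": 2,             # boost flag on/off
    "corner": 3,            # typ/min/max
    "supply": 3,            # voltage levels
}

full_factorial = prod(levels.values())  # 64 * 128 * 2 * 3 * 3 = 147456
```

So the "~= 150000" figure is 147456 runs exactly.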

For these five variables, we use MPro to generate 2403 test cases (about 1/60 of the full factorial). We then create a template from the Tx spice test bench so that parameters can be updated according to the table and its column headers above:

Notice that variables are enclosed with “%” (e.g. %XVPTX%). They are replaced by the corresponding values in each row of the table to generate a new spice file. With all 2403 spice files generated, one for each combination of parameters, we kicked off the batch mode simulation and obtained all two thousand plus cases within one hour. This is all done within MPro and can also be done using the user’s own script.
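For those who prefer the script route, the “%” substitution can be sketched as below. Only the %XVPTX% marker comes from the template above; the %XSWING% marker, the .param lines and the table values are hypothetical stand-ins.

```python
import csv
import io
import re

def fill_template(template, row):
    """Replace every %NAME% marker with the value of column NAME in `row`."""
    def substitute(match):
        return str(row[match.group(1)])  # KeyError if the table lacks the column
    return re.sub(r"%(\w+)%", substitute, template)

def generate_netlists(template, table_csv):
    """Return one filled-in netlist per row of the CSV sweep table."""
    reader = csv.DictReader(io.StringIO(table_csv))
    return [fill_template(template, row) for row in reader]

# Hypothetical two-parameter template and sweep table.
TEMPLATE = ".param vptx=%XVPTX%\n.param swing=%XSWING%\n"
TABLE = "XVPTX,XSWING\n0.9,16\n1.0,32\n"
netlists = generate_netlists(TEMPLATE, TABLE)
```

Each entry of netlists would then be written to its own spice file before the batch simulation is kicked off.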

  • Parameter tuning:

With all results available, the next step is to find the corresponding AMI parameters for each case. SPISim’s model driver, SPISimAMI.exe, is a must-have here. It takes the collected, evenly spaced inputs and the AMI parameters to drive SPISim’s spec. model; the output results can then be correlated with the .tr0 file simulated with HSpice for that particular case. When the correlation is not good, the AMI parameters are updated for another iteration of performance matching. This process is summarized in the flow below:

Clearly, such a task is too tedious and error-prone for a human to perform. Since the input pattern is the same, the relative timing locations of different properties (such as peak and de-emphasis levels) are also the same across test cases, so we can process these data automatically. As for “adjusting” the AMI parameters, a simple bi-section search will quickly converge to the desired de-emphasis tap value for a given test case.

In our process, we first parameterized the .ami file as shown below. This ami file is based on the architecture identified in the second step… only now we need to find different values for the different parameters. In particular, the swing level and FFE post tap are unknowns and need to be swept.

Because these two variables are independent, we can first identify the peak value as “Swing” from the .tr0 simulation results, then use bi-section to adjust FFE_PARM_POS1 such that the output from SPISimAMI for these .ami file settings and SPISim’s spec .dll/.so matches the tr0 data.
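A bi-section search over the post tap can be sketched as follows. In the real flow each evaluation would be a SPISimAMI run followed by reading the de-emphasis level off the output; here a toy function stands in for that run. The toy assumes a normalized one-post-tap FFE (main cursor = 1 - |post|, post <= 0), which is an assumption of this sketch, as are the search range and tolerance.

```python
def toy_deemphasis_level(post_tap):
    """Toy stand-in for a model run: settled level of a normalized FFE."""
    return 1.0 + 2.0 * post_tap  # main + post, with main = 1 - |post|

def bisect_tap(simulate, target, lo=-0.5, hi=0.0, tol=1e-6, max_iter=60):
    """Find the tap where simulate(tap) matches target by bi-section."""
    f_lo = simulate(lo) - target
    f_hi = simulate(hi) - target
    if f_lo * f_hi > 0:
        raise ValueError("target is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = simulate(mid) - target
        if abs(f_mid) < tol or (hi - lo) < tol:
            return mid
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return 0.5 * (lo + hi)

# Target de-emphasis level 0.6, taken here as an example "tr0" measurement.
tap = bisect_tap(toy_deemphasis_level, 0.6)
```

Each iteration halves the search range, so only a couple of dozen model runs are needed per test case, which is what makes the batch processing of all cases practical.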

With this flow and process, we use our SPIProcPro to batch process the simulation results and get all 2403 fine-tuned sets of AMI parameters within one hour:

Here is the partial config. settings for SPIProcPro of this modeling process:

For each of the test cases, the post-processed AMI result is saved to an individual .csv file. All these csv files are then combined and concatenated side-by-side with the original input conditions. The end result is a table with the input conditions on the left and the associated AMI parameters on the right. This table will serve as the preset data table for our AMI models to be released.
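The side-by-side concatenation is straightforward to script as well. In the sketch below the column names XVPTX, Swing and FFE_PARM_POS1 follow the text, while XSWING and all the numbers are made up for illustration. It assumes both tables list the test cases in the same order.

```python
import csv
import io

def concat_side_by_side(conditions_csv, results_csv):
    """Join two CSV texts row by row: conditions left, AMI results right."""
    left_rows = list(csv.reader(io.StringIO(conditions_csv)))
    right_rows = list(csv.reader(io.StringIO(results_csv)))
    if len(left_rows) != len(right_rows):
        raise ValueError("row counts differ; cases cannot be paired")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for left, right in zip(left_rows, right_rows):
        writer.writerow(left + right)
    return out.getvalue()

# Hypothetical input conditions and tuned AMI parameters for two cases.
CONDITIONS = "XVPTX,XSWING\n0.9,16\n1.0,32\n"
RESULTS = "Swing,FFE_PARM_POS1\n0.45,-0.10\n0.50,-0.12\n"
combined = concat_side_by_side(CONDITIONS, RESULTS)
```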

Model release:

The combined table looks like below:

The columns boxed in blue above are those required for SPISim’s AMI spec models. These columns have headers named exactly the same as those in the ami file, green boxed below. The row entries for the settings of each test case in the csv table are filled into the fields boxed in red in the ami file below.

Now this is not the format to be released to the AMI user. Instead, the design parameters we want to expose to the end user are those listed in the red box in the csv table. Their column headers match the original IP’s design parameters. Also note that this csv table only contains partial results, as we didn’t perform a full factorial sweep of all ~150000 cases.

The design parameters, boxed in red in the csv file, must be mappable to those boxed in blue, the AMI stage’s parameters.

One unique feature of SPISim’s AMI model is its support of table look-up and up to two dimensional linear interpolation for numerical values not found in the table. The set-up is easily done from the model config. GUI. The example below is for PCIe with 11 preset tables.

In this case, we do not use preset index. Instead, we set the preset index value to “-1” to signify that model’s table look-up will be used:

In the final .ami file above, the exposed parameters (boxed in red) are exactly those defined in the original given IP. Their names and values are used to look up the matching row in the table. If up to two of the numerical values have no exact match, linear interpolation is performed by the model during initialization. The final matching row or interpolated values are then assigned to the parameters required by the spec model.
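The look-up with interpolation can be illustrated with a one dimensional sketch. This is not SPISim’s implementation, which is not shown in this post and supports up to two dimensions; the design parameter name DEEMPH and the table values are hypothetical, and only FFE_PARM_POS1 is a name taken from the .ami file.

```python
from bisect import bisect_left

def lookup(table, key, x, column):
    """Return table[column] at key == x, interpolating linearly if needed."""
    rows = sorted(table, key=lambda r: r[key])
    xs = [r[key] for r in rows]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("value outside table range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return rows[i][column]  # exact match, use the row as is
    # No exact match: interpolate between the two bracketing rows.
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = rows[i - 1][column], rows[i][column]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical preset table simulated at a coarse step of 8 codes.
TABLE = [
    {"DEEMPH": 0, "FFE_PARM_POS1": 0.0},
    {"DEEMPH": 8, "FFE_PARM_POS1": -0.1},
    {"DEEMPH": 16, "FFE_PARM_POS1": -0.2},
]
tap_at_4 = lookup(TABLE, "DEEMPH", 4, "FFE_PARM_POS1")
tap_at_8 = lookup(TABLE, "DEEMPH", 8, "FFE_PARM_POS1")
```

This is why a coarse sweep is enough when the response is linear in the bit settings: the codes that were not simulated are filled in by interpolation at model initialization.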

The final result of the created model is one .ami file, one .csv file (plain or encrypted), and one set of .dll/.so files. There is no need to write any C/C++ code or do any compilation for the AMI modeling, and SPISim’s spec. models can be used directly. For the post-processing portion, python/perl/matlab scripts can be used instead if SPIProcPro is not sufficient.

Other thoughts:

In this case study, a look-up table based approach is used to create the AMI model via reverse engineering. Alternatively, we can create a spice simulation session using our SSolver/NgSpice spice engine to simulate the given Tx/Rx spice models directly. The pre-requisite for this approach is that the given hspice model is not encrypted, can be converted to SSolver/NgSpice based syntax and has no process file involved. Regarding the search mechanism, a bi-section search algorithm is used in our modeling process here as there is only one value (the EQ variable) that needs to be determined. The amplitude/swing is deterministic and can be found once the tr0 simulation results are post-processed. If more than one variable is involved in the optimization, such as with multi-tap EQs, then a gradient descent based algorithm may be needed if a full surface sweep is not practical.