IBIS’s buffer overclocking

In our SPIBPro Q3'19 update, we added several enhancements for handling buffer overclocking. While the technical details of these capabilities are reserved for our customers, the general problem statements and solutions are nevertheless worth sharing. This short post covers the buffer overclocking topic and will also serve as a future reference for our customers.

Some of the pictures used here are from previous IBIS Summit presentations. References to these slides can be found toward the end of this post.

What is buffer overclocking:

We all know that frequency is the reciprocal of the period (f = 1/T). When an IBIS model is used, the shortest possible period is the sum of the complete rising and falling transition durations given by the V-T data tables:

This shortest possible time span thus dictates the highest possible operating frequency. When an input stimulus exceeds this frequency, the buffer is being overclocked. An overclocked buffer does not have sufficient time to complete its rising transition, its falling transition, or both. As a result, before its pad can reach steady state (indicated by the green curve below), it has to start transitioning in the other direction again. The outcome is that there may be discontinuities, glitches, or non-convergence during the simulation.

The IBIS spec. doesn't spell out how a simulator should handle this case. As a result, an overclocked buffer may behave differently across different EDA tools. That's why the IBIS cookbook suggests that a model maker generate V-T tables with a short enough duration so that the buffer can operate fast enough to avoid being overclocked during normal usage. For example, if a USB3 buffer has a total V-T duration longer than 200 ps, then it will definitely be overclocked when a user drives it with a normal 5 Gbps PRBS signal (each unit interval is only 200 ps).
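To make the relationship concrete, below is a minimal Python sketch (purely illustrative, not part of any IBIS tool or spec.) that estimates the highest non-overclocked frequency from the durations of the rising and falling V-T tables:

def max_toggle_frequency(riseDuration, fallDuration):
    """Highest frequency a buffer supports without being overclocked.

    riseDuration, fallDuration: total time spans (in seconds) covered by the
    [Rising Waveform] and [Falling Waveform] V-T tables.
    """
    minPeriod = riseDuration + fallDuration   # shortest complete rise + fall cycle
    return 1.0 / minPeriod

# Example: a 120 ps rising table plus a 110 ps falling table supports ~4.3 GHz
print(max_toggle_frequency(120e-12, 110e-12))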

 

Why buffer overclocking is more problematic in a power-aware model:

To make a buffer's V-T table duration shorter, a common approach is to remove the steady-state (flat) portion of the leading and trailing waveform. However, when a power-aware model is being considered, this presents some problems:

The picture above shows the V-T data (rising only here) and the I-T data together. The I-T data is the composite current for a DDR DQ-like power-aware model. Since the extracted current data takes the pre-driver into account, it is often the case that there is "activity" (i.e., non-zero current) long before the voltage at the pad starts its transition. As a result, the amount of leading steady state that can be removed in a power-aware model is often bounded by the I-T table. This limitation is usually severe enough that the trimmed model still can't meet the spec-required operating frequency.

The picture above (from a summit paper) highlights this issue more clearly. It's also worth mentioning that the "T" of the V-T and I-T tables is time-synchronized, so one can't treat these two tables separately or shift their x-axis time points at will.

Solutions to the overclocking issue:

There are several approaches to address this overclocking issue:

  1. A model user may specify the amount of delay to be trimmed by a simulator, or
  2. A simulator may apply "windowing" automatically to find the trimming amounts and apply them before simulation, or
  3. A model maker can create a proper model using the approaches mentioned in the sections below.

The first approach is not only time consuming but will also easily lead to usage errors or non-convergence. The second one may or may not produce the desired results; it is certainly simulator dependent, and a model maker/user may not have any control over it. The third approach is therefore the preferred one, as only the model maker knows how the buffer behaves and the exact amount of delay to remove.

Caveats in model data windowing:

One common mistake when trimming data to avoid overclocking is processing the different V-T or I-T tables individually. For example, if the V-T table simulated from one of the fixtures has a longer delay than the others, a model maker may be tempted to remove different amounts of delay from different V-T tables. An even more severe problem happens when V-T and I-T data are trimmed independently. All of these handlings are erroneous for the following reasons:

  • A simulator usually needs to compute "switching coefficients" to apply to the I-T table data. Their values need to be calculated from the V-T tables of the different fixture set-ups. Trimming different amounts from V-T tables of different fixtures may therefore cause singularities when solving for the switching coefficients.
    • That means all [Rising Waveform] tables should be trimmed by the same amount;
    • Similarly, all [Falling Waveform] tables should be trimmed by the same amount.
  • A buffer's rising and falling responses may be used together in a differential buffer. If their timing references are inconsistent, the P and N terminals will not make transitions at the same time. This may cause "ledges" in the middle of the voltage swings.
    • That means the [Rising Waveform] and [Falling Waveform] tables need to be trimmed by the same amount.
  • As mentioned earlier, the I-T and V-T data need to be time-synchronized so that the gate-modulation effect can be properly accounted for.
    • That means the V-T and I-T tables need to be trimmed by the same amount.

To summarize, a model maker needs to trim ALL time-dependent table data together by the same amount. Trimming trailing time points is relatively easy (simply removing those points will do), but trimming from the waveform's beginning also requires shifting the remaining x-axis points so the data starts at t = 0, as shown in the sketch below. So this is usually better done with a modeling tool or flow.
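As an illustration of this "trim everything by the same amount" rule, the following Python sketch (a simplified illustration, not SPIBPro's actual implementation) removes one common leading delay from a set of V-T and I-T tables and re-zeros the time axis; each table is assumed to be a 2-D NumPy array whose first column is time:

import numpy as np

def trim_leading_delay(tables, trimDelay):
    """Remove the same leading delay from every time-dependent table.

    tables: list of 2-D arrays, column 0 = time, remaining columns = V or I data.
    trimDelay: leading time span (seconds) to remove from ALL tables.
    """
    trimmed = []
    for tbl in tables:
        keep = tbl[:, 0] >= trimDelay          # drop points before the trim point
        newTbl = tbl[keep].copy()
        newTbl[:, 0] -= newTbl[0, 0]           # shift x-axis so the data starts at t = 0
        trimmed.append(newTbl)
    return trimmed

# All rising/falling V-T tables and all I-T tables go through the SAME call with the
# SAME trimDelay, so they stay time-synchronized after trimming.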

Handling of the overclocking:

  • [Initial Delay] keyword:

Because the pre-driver current shows activity much earlier than the pad's voltage transition, a simulator can't do much if both are bundled together in one device model. However, if the I-T table can be pulled out of the IBIS device simulation model and handled as a time-dependent current source, then things become much easier. That's why IBIS V6.1 introduces the [Initial Delay] keyword for this purpose, with two entries: V-T and I-T:

Before using this keyword, one should carefully read the related BIRD document (referenced at the end) and the associated section in the IBIS spec.:

For example, it is the simulator's responsibility to remove the specified delay from the "raw" model data before simulation. So a model maker should NOT remove the delay themselves; simply specifying the delay values will do.

That is to say, the modeling process may become two "passes". In the first pass, once the un-trimmed model data are produced, a model maker can use an inspection tool (such as our SPIBPro's built-in specter) to plot all V-T tables together and measure the common minimal delay to remove, and then do the same for all I-T tables.

In the second pass, one or both of these values are specified under the [Initial Delay] keyword to complete the modeling process. A sketch of the delay-measurement step from the first pass is shown below.
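For reference, a minimal sketch of that first-pass delay measurement (an illustration only; the 1% flatness threshold is an arbitrary assumption) could look like this:

import numpy as np

def leading_delay(timeAxis, waveData, relTol=0.01):
    """Time before a waveform departs from its initial steady state."""
    swing = np.max(waveData) - np.min(waveData)
    moved = np.abs(waveData - waveData[0]) > relTol * swing
    return timeAxis[np.argmax(moved)] if moved.any() else timeAxis[-1]

def common_delay(tableList, relTol=0.01):
    """Largest delay that can safely be removed from ALL tables in the list."""
    return min(leading_delay(t[:, 0], t[:, 1], relTol) for t in tableList)

# Measure the V-T tables and the I-T tables separately, then specify the two values
# under [Initial Delay] (V-T and I-T) instead of trimming the data yourself.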

  • Data trimming and auto-tuning:

For a power-aware model with I-T data, the solution using [Initial Delay] is the best one. For a traditional, voltage-only buffer, a model maker has more choices.

For example, a model maker can start by trimming the trailing steady-state data, then increase the trimming amount, or also trim the leading portion, until the supported frequency meets the demand.

This method does not rely on the simulator's support for [Initial Delay], so an older simulator version will also work. In the meantime, a caveat worth noting is that the distortion of the model data depends on how much data is removed:

So if an excessive amount of data is removed, the resulting model may not pass the IBIS checker (it will report a DC mismatch violation). In this case, further tuning is needed. Our SPIBPro performs "AutoTune" automatically during the trimming process, so a trimmed buffer will almost always be error/warning free.

Reference:

The overclocking issue has been around since the early 2000s. I remember that one of the projects early in my career was to address non-convergence issues in Mentor's simulator caused by user-trimmed models. Interested readers may find the following references useful and worth reading:

IBIS’s package model

Recently I gave a training session to some of our customers regarding package model handling in an IBIS file/model. Most of the treatment of this topic falls within the IBIS spec. and the extraction and simulator tools used, rather than a modeling tool like our modeling suite or SPIBPro. Thus I think it is beneficial to organize the materials and share them through this blog post, which will also serve as a future reference for our customers.

Package model:

An IBIS file may contain one or more IBIS models. Each IBIS model has its own I-T and V-T tables to describe the behavior of the Tx or Rx silicon design attached to that signal pin. The effect of the package model, on the other hand, is not included in those table data. Information about the package model is "linked" with the silicon portion of the buffer either through IBIS keywords within the same IBIS file, or as a separate file in a different format. It's the simulator's job to take both the Tx/Rx silicon behavior and the package model into account during circuit simulation.

In general, there are two ways to link the IBIS silicon behavior with the package model. Each also has two possible routes to achieve the same purpose:

Add package model info:

    1. Inside an IBIS file:
      • As a lumped model
      • As a distributed model
    2. Outside an IBIS file:
      • ICM (IBIS Interconnect Modeling)
      • General spice circuit elements

More details will be discussed below. Please note that these package models should be generated by an extraction tool such as HFSS, Q3D, etc.; the IBIS model or file mentioned here simply serves as a placeholder. The circuit simulator will use all these data together during simulation.

 

Package as a lumped model inside an IBIS model:

From the IBIS spec, we can see two keywords related to this lumped model representation: [Package] and [Pin].

To be more specific, please see this example model:

These values represent the parasitic structures as shown below:

This is the simplest, yet least accurate, method to describe the package information. While the [Package] keyword and its data are required, the parasitic portion of the [Pin] keyword is optional. When both are present, the values in the [Pin] section supersede (i.e. override) those defined in the [Package] section. Thus, a model maker can use the [Pin] keyword to introduce pin-specific lumped parasitics while using the [Package] keyword for general or default values.
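For readers without the picture at hand, a minimal sketch of how these two keywords typically look in an IBIS file is shown below; all pin names, model names, and parasitic values are hypothetical and for illustration only:

[Package]
| variable      typ        min        max
R_pkg           0.20       0.10       0.30
L_pkg           4.0nH      3.0nH      5.0nH
C_pkg           0.6pF      0.4pF      0.8pF
|
[Pin]  signal_name  model_name  R_pin   L_pin   C_pin
1      DQ0          DQ_IO       0.18    3.8nH   0.55pF
2      VSS          GND         NA      NA      NA
3      DQ1          DQ_IO       0.22    4.1nH   0.62pF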

 

Package as a distributed model inside an IBIS file:

A more comprehensive (distributed) package model may be introduced into an IBIS file and model using the [Package Model] and [Define Package Model] keywords:

Using this method, the component declares the name of the package model it is using (so the value of [Package Model] is simply a model name string), and the detailed contents of this package model are defined as a separate section under the top-level keyword [Define Package Model] inside the same IBIS file (usually toward the end of the file, after all [Component] descriptions):

What's inside the defined package model? Since it's a "distributed" model and belongs to a component, it contains terminals mapped to the component's pins as well as frequency-dependent R/L/G/C matrices, etc. The dimension of the matrices is equal to the number of pins mapped. With that said, the S-parameter format is not supported here. To use an S-parameter as a package model, one must use one of the remaining two methods described below.

As mentioned earlier, an IBIS file or IBIS model that carries package model info. is just a placeholder; the package model's effects or behaviors are not included in an IBIS model's I-V/V-T/I-T table data. The simulator has the responsibility to combine the two, either when computing switching coefficients for the lumped version or by "stamping" extra elements into the nodal matrices for a distributed one. This also means that different EDA tools may have different methods, syntax, or GUIs to introduce the package model's data:

 

Use ICM as a separate file for the package model outside an IBIS file:

The ICM (IBIS Interconnect Modeling) spec. is a separate "top-level" spec. document similar to the IBIS spec. itself. So to learn more about it, one has to read through the ICM spec. document.

While it is more or less similar to the [Define Package Model] approach mentioned earlier in terms of frequency-dependent R/L/G/C matrices, it has one unique strength in particular: it supports S-parameters:

So if your extraction tool can only produce the S-parameter format, then ICM is the way to go. The table below compares the different interconnect model formats (PKG is the defined package model):

Use a general circuit description for the package model outside an IBIS file:

Last but not least, one may also introduce the package model contents using a non-IBIS/ICM-specific format. For example, a model maker may provide a SPICE .subckt representing the package model (converted from frequency-domain data using a tool like Broadband SPICE or IdEM, possibly with further hierarchy) or simply an S-parameter file. The connectivity should be documented in the simulation guide or usage manual of the IBIS model, and it is the end user's responsibility to include this as one of the stages in the simulation topology. For example, Intel's platform design guide uses such a method for an ICH package model:

There are pros and cons to this approach: a model user has more work to do to include the package effects. On the other hand, this method has the best compatibility: it does not require the simulator to support a specific IBIS/ICM version, and it also allows probing or sweeping internal nodes of the package model for debugging or optimization purposes. A generic example is sketched below.
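As a generic illustration (not the Intel ICH model referenced above), such an externally supplied package model can be as simple as a lumped R-L-C subcircuit per pin; the topology and values below are hypothetical:

* Hypothetical single-pin package model (illustrative values only)
.subckt PKG_PIN die pad
R1  die  n1   0.15
L1  n1   n2   2.2n
C1  n2   0    0.35p
R2  n2   pad  0.05
.ends PKG_PIN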

AI/ML: Use ML techniques for CTLE AMI modeling

Note:

  • The dataset and IPython notebook for this post are available on SPISim's github page: [HERE]
  • This notebook may be better rendered by nbviewer and can be viewed [HERE].

Use ML techniques for SERDES CTLE modeling for IBIS-AMI simulation

Table of contents:

1. Motivation

2. Problem Statements

3. Generate Data

4. Prepare Data

5. Choose a Model

6. Training

7. Prediction

8. Reverse Direction

9. Deployment

10. Conclusion

Motivation:

One of SPISim's main services is IBIS-AMI modeling and consulting. In most cases, this happens when an IC company is ready to release simulation models for their customers to do system design. We then require data, either from circuit simulation, lab measurements, or data sheet specs, in order to create the associated IBIS-AMI model. Occasionally, we also receive requests to provide AMI models for architecture-planning purposes. In this situation, there is no data or spec. Instead, the client asks to input performance parameters dynamically (not as presets) so that they can evaluate performance at the architecture level before committing to a certain spec and designing accordingly. In such cases, we may need to generate data dynamically based on the user's input before it is fed into existing IBIS-AMI models of the same kind. A continuous-time linear equalizer, CTLE, is often used in modern SERDES and even DDR5 designs. It is basically a filter in the frequency domain (FD) with various peaking and gain properties. As IBIS-AMI is simulated in the time domain (TD), the core implementation in the model is an iFFT to convert the response into a TD impulse response.

CtleFDTD

In order to create such a CTLE AMI model from user-provided specs on the fly, we would like to avoid a time-consuming parameter sweep (to match the performance) during the model's initial set-up call. Thus machine learning techniques may be applied to help us build a prediction model mapping input attributes to the associated CTLE design parameters, so that its FD and TD responses can be generated directly. After that, we can feed the data into existing CTLE C/C++ code blocks for channel simulation.

Problem Statement:

We would like to build a prediction model such that when a user provides desired CTLE performance parameters, such as DC gain, peaking frequency, and peaking value, the model maps them to the corresponding CTLE design parameters such as pole and zero locations. Once this mapping is done, the CTLE frequency response can be generated, followed by the time-domain impulse response. The resulting IBIS-AMI CTLE model can then be used in channel simulation to evaluate such a CTLE block's impact in a system... before the actual silicon design has been done.
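Since the AMI model ultimately needs a time-domain impulse response, the frequency response produced by this mapping still has to go through an inverse FFT. A minimal sketch of that conversion (assuming the response is uniformly sampled from DC to fMax and the filter is real-valued) is:

import numpy as np

def fd_to_impulse(hFreq, fMax):
    """Convert a uniformly sampled frequency response (DC..fMax) into a
    time-domain impulse response."""
    hTime = np.fft.irfft(hFreq)        # Hermitian symmetry is handled by irfft
    dt = 1.0 / (2.0 * fMax)            # time step implied by the highest frequency
    tAxis = np.arange(len(hTime)) * dt
    return tAxis, hTime / dt           # scale samples to approximate h(t)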

Generate Data:

Overview

The model to be built is for numerical (i.e. regression-type) prediction of about three to six attributes, depending on the CTLE structure, from four input parameters, namely DC gain, peak frequency, peak value, and bandwidth. We will sample the input space randomly (as a full combinatorial sweep is impractical) and then perform measurements programmatically in order to generate a large enough dataset for modeling purposes.

Defining CTLE equations

We will target the following two most common CTLE structures:

IEEE 802.3bs 200/400GbE Chip-to-Module (C2M) CTLE:

C2M_CTLE

IEEE 802.3bj Channel Operating Margin (COM) CTLE:

COM_CTLE

Define sampling points and attributes

All these pole and zero values are continuous (numerical), so a sub-sampling of the full solution space will be performed. Once the frequency response corresponding to each configuration is generated, we will measure its DC gain, peak frequency and value, bandwidth (3 dB down from the peak), and the frequencies where 10% and 50% of the gain between the DC and peak values is reached. The last two attributes increase the number of attributes available when creating the prediction model.

Attributes to be extracted:

CTLEAttr

Synthesize and measure data

A Python script (chosen so it can also be used with subsequent Q-learning w/ OpenAI later) has been written to synthesize these frequency responses and perform the measurements at the same time. A simplified sketch of this step is shown after the figures below.

Quantize and Sampling:

Sampling

Synthesize:

Synthesize

Measurement:

Measurement
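For readers who want to reproduce the idea, here is a simplified sketch of the synthesis and measurement steps. The single-zero, two-pole form and the exact attribute definitions (peak value, bandwidth, 10%/50% rise frequencies) are assumptions for illustration and may differ from the actual generation script:

import numpy as np

def com_ctle_response(freq, gDC, fZ, fP1, fP2):
    # Single-zero, two-pole CTLE form (COM-style) assumed here
    s = 1j * freq
    return (gDC + s / fZ) / ((1.0 + s / fP1) * (1.0 + s / fP2))

def measure_ctle(freq, hMag):
    # Extract dataset-like attributes from |H(f)|; definitions are assumptions
    pkIdx   = np.argmax(hMag)
    gain    = hMag[0]                               # DC gain
    peakF   = freq[pkIdx]
    peakVal = hMag[pkIdx]                           # peak magnitude
    above   = hMag >= hMag[pkIdx] / np.sqrt(2.0)    # within 3 dB of the peak
    bandW   = freq[np.nonzero(above)[0][-1]]
    def riseFreq(frac):                             # where frac of the DC-to-peak rise happens
        tgt = gain + frac * (hMag[pkIdx] - gain)
        return freq[np.argmax(hMag[:pkIdx + 1] >= tgt)]
    return gain, peakF, peakVal, bandW, riseFreq(0.1), riseFreq(0.5)

freq = np.linspace(1e6, 40e9, 4001)
hMag = np.abs(com_ctle_response(freq, gDC=0.6, fZ=5e9, fP1=1.7e10, fP2=5e9))
print(measure_ctle(freq, hMag))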

The end result of this data generation phase is a 100,000-point dataset for each of the two CTLE structures. We can now proceed to the prediction modeling.

Prepare Data:

In [42]:
## Using COM CTLE as an example below:

# Environment Setup:
import os
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

prjHome = 'C:/Temp/WinProj/CTLEMdl'
workDir = prjHome + '/wsp/'
srcFile = prjHome + '/dat/COM_CTLEData.csv'

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(workDir, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)
In [43]:
# Read Data
srcData = pd.read_csv(srcFile)
srcData.head()

# Info about the data
srcData.head()
srcData.info()
srcData.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 11 columns):
ID         100000 non-null int64
Gdc        100000 non-null float64
P1         100000 non-null float64
P2         100000 non-null float64
Z1         100000 non-null float64
Gain       100000 non-null float64
PeakF      100000 non-null float64
PeakVal    100000 non-null float64
BandW      100000 non-null float64
Freq10     100000 non-null float64
Freq50     100000 non-null float64
dtypes: float64(10), int64(1)
memory usage: 8.4 MB
Out[43]:
ID Gdc P1 P2 Z1 Gain PeakF PeakVal BandW Freq10 Freq50
count 100000.000000 100000.000000 1.000000e+05 1.000000e+05 1.000000e+05 100000.000000 1.000000e+05 100000.000000 1.000000e+05 1.000000e+05 1.000000e+05
mean 49999.500000 0.574911 1.724742e+10 5.235940e+09 5.241625e+09 0.574911 3.428616e+09 2.374620 1.496343e+10 4.145855e+08 1.136585e+09
std 28867.657797 0.230302 4.323718e+09 2.593138e+09 2.589769e+09 0.230302 4.468393e+09 3.346094 1.048535e+10 5.330043e+08 1.446972e+09
min 0.000000 0.200000 1.000000e+10 1.000000e+09 1.000000e+09 0.200000 0.000000e+00 0.200000 9.965473e+08 0.000000e+00 -1.000000e+00
25% 24999.750000 0.350000 1.350000e+10 3.000000e+09 3.000000e+09 0.350000 0.000000e+00 0.500000 4.770445e+09 0.000000e+00 -1.000000e+00
50% 49999.500000 0.600000 1.700000e+10 5.000000e+09 5.000000e+09 0.600000 0.000000e+00 0.800000 1.410597e+10 0.000000e+00 -1.000000e+00
75% 74999.250000 0.750000 2.100000e+10 7.500000e+09 7.500000e+09 0.750000 7.557558e+09 2.710536 2.386211e+10 8.974728e+08 2.510339e+09
max 99999.000000 0.950000 2.450000e+10 9.500000e+09 9.500000e+09 0.950000 1.516517e+10 16.731528 3.965752e+10 1.768803e+09 4.678211e+09

The data seems fully populated. Let's plot some distributions...

In [44]:
# Drop the ID column
mdlData = srcData.drop(columns=['ID'])

# plot distribution
mdlData.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
Saving figure attribute_histogram_plots

There are abnormally high peaks for Freq10, Freq50, and PeakF. We need to plot the FD data to see what's going on...

Error checking:

NoPeaking

Apparently, this is caused by CTLE configurations without peaking. We can safely remove these data points as they will not be used in an actual design.

In [45]:
# Drop those freq peak at the beginning (i.e. no peak)
mdlTemp = mdlData[(mdlData['PeakF'] > 100)]
mdlTemp.info()


# plot distribution again
mdlTemp.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots2")
plt.show()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41588 entries, 1 to 99997
Data columns (total 10 columns):
Gdc        41588 non-null float64
P1         41588 non-null float64
P2         41588 non-null float64
Z1         41588 non-null float64
Gain       41588 non-null float64
PeakF      41588 non-null float64
PeakVal    41588 non-null float64
BandW      41588 non-null float64
Freq10     41588 non-null float64
Freq50     41588 non-null float64
dtypes: float64(10)
memory usage: 3.5 MB
Saving figure attribute_histogram_plots2

Now the distribution looks good. We can proceed to separate the variables (i.e. attributes) and the targets.

In [46]:
# take this as modeling data from this point
mdlData = mdlTemp

varList = ['Gdc', 'P1', 'P2', 'Z1']
tarList = ['Gain', 'PeakF', 'PeakVal']

varData = mdlData[varList]
tarData = mdlData[tarList]

Choose a Model:

We will use Keras as the modeling framework. While it calls TensorFlow on our machine in this case, the GPU is only used for training purposes. We will use a (shallow) neural network for modeling because we want to implement the resulting model in our IBIS-AMI model's C++ code.

In [47]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

numVars = len(varList)  # independent variables
numTars = len(tarList)  # output targets
nnetMdl = Sequential()
# input layer
nnetMdl.add(Dense(units=64, activation='relu', input_dim=numVars))

# hidden layers
nnetMdl.add(Dropout(0.3, noise_shape=None, seed=None))
nnetMdl.add(Dense(64, activation = "relu"))
nnetMdl.add(Dropout(0.2, noise_shape=None, seed=None))
          
# output layer
nnetMdl.add(Dense(units=numTars, activation='sigmoid'))
nnetMdl.compile(loss='mean_squared_error', optimizer='adam')

# Provide some info
#from keras.utils import plot_model
#plot_model(nnetMdl, to_file= workDir + 'model.png')
nnetMdl.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_10 (Dense)             (None, 64)                320       
_________________________________________________________________
dropout_7 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_11 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_8 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_12 (Dense)             (None, 3)                 195       
=================================================================
Total params: 4,675
Trainable params: 4,675
Non-trainable params: 0
_________________________________________________________________

Training:

We will hold out 20% of the data for testing. Note that we need to scale the input attributes to be between 0 and 1 so that the neurons' activation functions operate in a well-conditioned range while the weights are trained. These scalers will be applied "inversely" when we predict the actual performance later on.

In [48]:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Prepare Training (tran) and Validation (test) dataset
varTran, varTest, tarTran, tarTest = train_test_split(varData, tarData, test_size=0.2)

# scale the data
from sklearn import preprocessing
varScal = preprocessing.MinMaxScaler()
varTran = varScal.fit_transform(varTran)
varTest = varScal.transform(varTest)

tarScal = preprocessing.MinMaxScaler()
tarTran = tarScal.fit_transform(tarTran)

Now we can do the model fit:

In [49]:
# model fit
hist = nnetMdl.fit(varTran, tarTran, epochs=100, batch_size=1000, validation_split=0.1)
tarTemp = nnetMdl.predict(varTest, batch_size=1000)
#predict = tarScal.inverse_transform(tarTemp)
#resRMSE = np.sqrt(mean_squared_error(tarTest, predict))
resRMSE = np.sqrt(mean_squared_error(tarScal.transform(tarTest), tarTemp))
resRMSE
Train on 29943 samples, validate on 3327 samples
Epoch 1/100
29943/29943 [==============================] - 0s 12us/step - loss: 0.0632 - val_loss: 0.0462
Epoch 2/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0394 - val_loss: 0.0218
Epoch 3/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0218 - val_loss: 0.0090
Epoch 4/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0134 - val_loss: 0.0046
Epoch 5/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0102 - val_loss: 0.0039
Epoch 6/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0090 - val_loss: 0.0036
Epoch 7/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0082 - val_loss: 0.0032
Epoch 8/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0075 - val_loss: 0.0030
Epoch 9/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0071 - val_loss: 0.0027
Epoch 10/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0067 - val_loss: 0.0025
Epoch 11/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0063 - val_loss: 0.0022
Epoch 12/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0059 - val_loss: 0.0020
Epoch 13/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0056 - val_loss: 0.0018
Epoch 14/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0053 - val_loss: 0.0017
Epoch 15/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0050 - val_loss: 0.0015
Epoch 16/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0048 - val_loss: 0.0014
Epoch 17/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0046 - val_loss: 0.0013
Epoch 18/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0045 - val_loss: 0.0012
Epoch 19/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0043 - val_loss: 0.0011
Epoch 20/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0042 - val_loss: 0.0011
Epoch 21/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0041 - val_loss: 9.9891e-04
Epoch 22/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0040 - val_loss: 9.5673e-04
Epoch 23/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0039 - val_loss: 9.1935e-04
Epoch 24/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0038 - val_loss: 8.7424e-04
Epoch 25/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0037 - val_loss: 8.3335e-04
Epoch 26/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0036 - val_loss: 8.0617e-04
Epoch 27/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0035 - val_loss: 7.7511e-04
Epoch 28/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0035 - val_loss: 7.6336e-04
Epoch 29/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0034 - val_loss: 7.4145e-04
Epoch 30/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0034 - val_loss: 7.1555e-04
Epoch 31/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0033 - val_loss: 6.8232e-04
Epoch 32/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0033 - val_loss: 6.8118e-04
Epoch 33/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0033 - val_loss: 6.5987e-04
Epoch 34/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0032 - val_loss: 6.5535e-04
Epoch 35/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0032 - val_loss: 6.4880e-04
Epoch 36/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0032 - val_loss: 6.2126e-04
Epoch 37/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0031 - val_loss: 6.1235e-04
Epoch 38/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0031 - val_loss: 6.0875e-04
Epoch 39/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.8204e-04
Epoch 40/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.8521e-04
Epoch 41/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.8456e-04
Epoch 42/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.5742e-04
Epoch 43/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.5412e-04
Epoch 44/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.5415e-04
Epoch 45/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.3159e-04
Epoch 46/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.2046e-04
Epoch 47/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0028 - val_loss: 5.1748e-04
Epoch 48/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0028 - val_loss: 5.1205e-04
Epoch 49/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0027 - val_loss: 5.0424e-04
Epoch 50/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0028 - val_loss: 4.9067e-04
Epoch 51/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0027 - val_loss: 4.7902e-04
Epoch 52/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0027 - val_loss: 4.7667e-04
Epoch 53/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0026 - val_loss: 4.6521e-04
Epoch 54/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0026 - val_loss: 4.6684e-04
Epoch 55/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0026 - val_loss: 4.7006e-04
Epoch 56/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0026 - val_loss: 4.5770e-04
Epoch 57/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0025 - val_loss: 4.3075e-04
Epoch 58/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0025 - val_loss: 4.3796e-04
Epoch 59/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0025 - val_loss: 4.3114e-04
Epoch 60/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 4.1051e-04
Epoch 61/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 4.0642e-04
Epoch 62/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 4.1214e-04
Epoch 63/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 3.9472e-04
Epoch 64/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0024 - val_loss: 3.9697e-04
Epoch 65/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0024 - val_loss: 3.8548e-04
Epoch 66/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0024 - val_loss: 3.9030e-04
Epoch 67/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.7588e-04
Epoch 68/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0024 - val_loss: 3.6643e-04
Epoch 69/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.6973e-04
Epoch 70/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.6345e-04
Epoch 71/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.5743e-04
Epoch 72/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0023 - val_loss: 3.5294e-04
Epoch 73/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.6533e-04
Epoch 74/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.5859e-04
Epoch 75/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0022 - val_loss: 3.3832e-04
Epoch 76/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0022 - val_loss: 3.5197e-04
Epoch 77/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.4445e-04
Epoch 78/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.3888e-04
Epoch 79/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.3597e-04
Epoch 80/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.2317e-04
Epoch 81/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0021 - val_loss: 3.2205e-04
Epoch 82/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.4191e-04
Epoch 83/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.2288e-04
Epoch 84/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1419e-04
Epoch 85/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1307e-04
Epoch 86/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1795e-04
Epoch 87/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1200e-04
Epoch 88/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.0641e-04
Epoch 89/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.2401e-04
Epoch 90/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.0903e-04
Epoch 91/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.1448e-04
Epoch 92/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0788e-04
Epoch 93/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0349e-04
Epoch 94/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0098e-04
Epoch 95/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0020 - val_loss: 3.1119e-04
Epoch 96/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0249e-04
Epoch 97/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 2.8934e-04
Epoch 98/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 2.9429e-04
Epoch 99/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 2.8466e-04
Epoch 100/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0019 - val_loss: 3.0773e-04
Out[49]:
0.01786895428237113

Let's see how this neural network learns over the epochs.

In [50]:
# plot history
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])  # also plot validation loss so it matches the legend
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

Looks quite reasonable. We can now save the Keras model, together with the scalers, for later evaluation.

In [51]:
# save model and architecture to single file
nnetMdl.save(workDir + "COM_nnetMdl.h5")

# also save scaler
from sklearn.externals import joblib
joblib.dump(varScal, workDir + 'VarScaler.save') 
joblib.dump(tarScal, workDir + 'TarScaler.save') 
print("Saved model to disk")
Saved model to disk

Prediction:

Now let's use this model to make some predictions.

In [52]:
# generate prediction
predict = tarScal.inverse_transform(tarTemp)
allData = np.concatenate([varTest, tarTest, predict], axis = 1)
allData.shape
headStr = ','.join(varList + tarList + tarList)  # input columns, then measured and predicted targets
np.savetxt(workDir + 'COMCtleIOP.csv', allData, delimiter=',', header=headStr)

Let's take 50 points and see how the predictions look.

In [53]:
# Plot Gain
begIndx = 100
endIndx = 150
indxAry = np.arange(0, len(varTest), 1)

plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,0][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,0][begIndx:endIndx])
Out[53]:
<matplotlib.collections.PathCollection at 0x242d059d390>
In [54]:
# Plot Peak Freq.
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,1][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,1][begIndx:endIndx])
Out[54]:
<matplotlib.collections.PathCollection at 0x242d72df2e8>
In [55]:
# Plot Peak Value
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,2][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,2][begIndx:endIndx])
Out[55]:
<matplotlib.collections.PathCollection at 0x242d5ea39e8>

Reverse Direction:

The goal of this modeling is to map performance to the CTLE pole and zero locations. What we just did is the other way around (to make sure such a neural network structure meets our needs). Now we need to reverse the direction for the actual modeling. To provide more attributes for better predictions, we will also use the frequencies where the 10% and 50% gains happen as part of the input attributes.

In [56]:
tarList = ['Gdc', 'P1', 'P2', 'Z1']
varList = ['Gain', 'PeakF', 'PeakVal', 'Freq10', 'Freq50']

varData = mdlData[varList]
tarData = mdlData[tarList]
In [57]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

numVars = len(varList)  # independent variables
numTars = len(tarList)  # output targets
nnetMdl = Sequential()
# input layer
nnetMdl.add(Dense(units=64, activation='relu', input_dim=numVars))

# hidden layers
nnetMdl.add(Dropout(0.3, noise_shape=None, seed=None))
nnetMdl.add(Dense(64, activation = "relu"))
nnetMdl.add(Dropout(0.2, noise_shape=None, seed=None))
          
# output layer
nnetMdl.add(Dense(units=numTars, activation='sigmoid'))
nnetMdl.compile(loss='mean_squared_error', optimizer='adam')

# Provide some info
#from keras.utils import plot_model
#plot_model(nnetMdl, to_file= workDir + 'model.png')
nnetMdl.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_13 (Dense)             (None, 64)                384       
_________________________________________________________________
dropout_9 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_14 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_10 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_15 (Dense)             (None, 4)                 260       
=================================================================
Total params: 4,804
Trainable params: 4,804
Non-trainable params: 0
_________________________________________________________________
In [58]:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Prepare Training (tran) and Validation (test) dataset
varTran, varTest, tarTran, tarTest = train_test_split(varData, tarData, test_size=0.2)

# scale the data
from sklearn import preprocessing
varScal = preprocessing.MinMaxScaler()
varTran = varScal.fit_transform(varTran)
varTest = varScal.transform(varTest)

tarScal = preprocessing.MinMaxScaler()
tarTran = tarScal.fit_transform(tarTran)
In [59]:
# model fit
hist = nnetMdl.fit(varTran, tarTran, epochs=100, batch_size=1000, validation_split=0.1)
tarTemp = nnetMdl.predict(varTest, batch_size=1000)
#predict = tarScal.inverse_transform(tarTemp)
#resRMSE = np.sqrt(mean_squared_error(tarTest, predict))
resRMSE = np.sqrt(mean_squared_error(tarScal.transform(tarTest), tarTemp))
resRMSE
Train on 29943 samples, validate on 3327 samples
Epoch 1/100
29943/29943 [==============================] - 0s 15us/step - loss: 0.0800 - val_loss: 0.0638
Epoch 2/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0578 - val_loss: 0.0457
Epoch 3/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0458 - val_loss: 0.0380
Epoch 4/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0408 - val_loss: 0.0344
Epoch 5/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0378 - val_loss: 0.0317
Epoch 6/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0354 - val_loss: 0.0299
Epoch 7/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0340 - val_loss: 0.0287
Epoch 8/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0327 - val_loss: 0.0276
Epoch 9/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0315 - val_loss: 0.0265
Epoch 10/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0303 - val_loss: 0.0254
Epoch 11/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0293 - val_loss: 0.0244
Epoch 12/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0284 - val_loss: 0.0235
Epoch 13/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0274 - val_loss: 0.0225
Epoch 14/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0266 - val_loss: 0.0215
Epoch 15/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0257 - val_loss: 0.0202
Epoch 16/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0245 - val_loss: 0.0189
Epoch 17/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0235 - val_loss: 0.0175
Epoch 18/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0223 - val_loss: 0.0160
Epoch 19/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0213 - val_loss: 0.0146
Epoch 20/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0202 - val_loss: 0.0135
Epoch 21/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0193 - val_loss: 0.0125
Epoch 22/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0186 - val_loss: 0.0117
Epoch 23/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0179 - val_loss: 0.0109
Epoch 24/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0171 - val_loss: 0.0104
Epoch 25/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0167 - val_loss: 0.0099
Epoch 26/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0163 - val_loss: 0.0094
Epoch 27/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0156 - val_loss: 0.0090
Epoch 28/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0154 - val_loss: 0.0087
Epoch 29/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0151 - val_loss: 0.0084
Epoch 30/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0146 - val_loss: 0.0081
Epoch 31/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0143 - val_loss: 0.0077
Epoch 32/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0139 - val_loss: 0.0074
Epoch 33/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0138 - val_loss: 0.0072
Epoch 34/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0135 - val_loss: 0.0071
Epoch 35/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0132 - val_loss: 0.0069
Epoch 36/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0131 - val_loss: 0.0068
Epoch 37/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0130 - val_loss: 0.0066
Epoch 38/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0129 - val_loss: 0.0065
Epoch 39/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0127 - val_loss: 0.0063
Epoch 40/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0124 - val_loss: 0.0062
Epoch 41/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0122 - val_loss: 0.0061
Epoch 42/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0122 - val_loss: 0.0059
Epoch 43/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0120 - val_loss: 0.0059
Epoch 44/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0119 - val_loss: 0.0057
Epoch 45/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0118 - val_loss: 0.0057
Epoch 46/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0117 - val_loss: 0.0056
Epoch 47/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0115 - val_loss: 0.0055
Epoch 48/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0114 - val_loss: 0.0055
Epoch 49/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0114 - val_loss: 0.0055
Epoch 50/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0112 - val_loss: 0.0053
Epoch 51/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0111 - val_loss: 0.0052
Epoch 52/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0112 - val_loss: 0.0052
Epoch 53/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0111 - val_loss: 0.0052
Epoch 54/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0109 - val_loss: 0.0051
Epoch 55/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0108 - val_loss: 0.0050
Epoch 56/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0107 - val_loss: 0.0049
Epoch 57/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0107 - val_loss: 0.0050
Epoch 58/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0107 - val_loss: 0.0049
Epoch 59/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0106 - val_loss: 0.0048
Epoch 60/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0104 - val_loss: 0.0047
Epoch 61/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0103 - val_loss: 0.0046
Epoch 62/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0102 - val_loss: 0.0046
Epoch 63/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0103 - val_loss: 0.0046
Epoch 64/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0102 - val_loss: 0.0045
Epoch 65/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0101 - val_loss: 0.0044
Epoch 66/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0101 - val_loss: 0.0044
Epoch 67/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0100 - val_loss: 0.0045
Epoch 68/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0098 - val_loss: 0.0043
Epoch 69/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0098 - val_loss: 0.0043
Epoch 70/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0098 - val_loss: 0.0043
Epoch 71/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0097 - val_loss: 0.0042
Epoch 72/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0096 - val_loss: 0.0043
Epoch 73/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0096 - val_loss: 0.0041
Epoch 74/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0042
Epoch 75/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0041
Epoch 76/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0041
Epoch 77/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0093 - val_loss: 0.0040
Epoch 78/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0040
Epoch 79/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0093 - val_loss: 0.0040
Epoch 80/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0092 - val_loss: 0.0039
Epoch 81/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0039
Epoch 82/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0039
Epoch 83/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0039
Epoch 84/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0038
Epoch 85/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0090 - val_loss: 0.0039
Epoch 86/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0089 - val_loss: 0.0038
Epoch 87/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0089 - val_loss: 0.0038
Epoch 88/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0088 - val_loss: 0.0037
Epoch 89/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0088 - val_loss: 0.0037
Epoch 90/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0087 - val_loss: 0.0037
Epoch 91/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0087 - val_loss: 0.0037
Epoch 92/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0086 - val_loss: 0.0036
Epoch 93/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0087 - val_loss: 0.0037
Epoch 94/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0086 - val_loss: 0.0036
Epoch 95/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0084 - val_loss: 0.0036
Epoch 96/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0086 - val_loss: 0.0036
Epoch 97/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0085 - val_loss: 0.0036
Epoch 98/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0085 - val_loss: 0.0035
Epoch 99/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0085 - val_loss: 0.0035
Epoch 100/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0084 - val_loss: 0.0034
Out[59]:
0.0589564154176633
In [60]:
# plot history
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])  # also plot validation loss so it matches the legend
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()
In [61]:
# Separate Keras' architecture and synapse weights for later C++ conversion
from keras.models import model_from_json
# serialize model to JSON
nnetMdl_json = nnetMdl.to_json()
with open("COM_nnetMdl_Rev.json", "w") as json_file:
    json_file.write(nnetMdl_json)
# serialize weights to HDF5
nnetMdl.save_weights("COM_nnetMdl_W_Rev.h5")

# save model and architecture to single file
nnetMdl.save(workDir + "COM_nnetMdl_Rev.h5")
print("Saved model to disk")

# also save scaler
from sklearn.externals import joblib
joblib.dump(varScal, workDir + 'Rev_VarScaler.save') 
joblib.dump(tarScal, workDir + 'Rev_TarScaler.save') 
Saved model to disk
Out[61]:
['C:/Temp/WinProj/CTLEMdl/wsp/Rev_TarScaler.save']
In [62]:
# generate prediction
predict = tarScal.inverse_transform(tarTemp)
allData = np.concatenate([varTest, tarTest, predict], axis = 1)
allData.shape
headStr = ','.join(varList + tarList + tarList)  # input columns, then measured and predicted targets
np.savetxt(workDir + 'COMCtleIOP_Rev.csv', allData, delimiter=',', header=headStr)
In [63]:
# plot Gdc
begIndx = 100
endIndx = 150
indxAry = np.arange(0, len(varTest), 1)
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,0][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,0][begIndx:endIndx])
Out[63]:
<matplotlib.collections.PathCollection at 0x242ccefe1d0>
In [64]:
# Plot P1
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,1][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,1][begIndx:endIndx])
Out[64]:
<matplotlib.collections.PathCollection at 0x242ccc7c470>
In [65]:
# Plot P2
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,2][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,2][begIndx:endIndx])
Out[65]:
<matplotlib.collections.PathCollection at 0x242d6f15dd8>
In [66]:
# Plot Z1
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,3][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,3][begIndx:endIndx])
Out[66]:
<matplotlib.collections.PathCollection at 0x242ccbaa978>

It seems this "reversed" neural network also works reasonably well. We will fine-tune it further later on. A minimal sketch of how this model might be reloaded and queried is shown below.
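Before moving on to the C++ deployment, here is a minimal sketch (not an actual notebook cell) of how the reverse model and scalers saved above might be reloaded and queried; the user-requested performance numbers are hypothetical:

from keras.models import load_model
from sklearn.externals import joblib
import numpy as np

# reload the reverse model and its scalers saved earlier in this notebook
nnetRev = load_model(workDir + 'COM_nnetMdl_Rev.h5')
varScal = joblib.load(workDir + 'Rev_VarScaler.save')
tarScal = joblib.load(workDir + 'Rev_TarScaler.save')

# hypothetical user-requested performance: Gain, PeakF, PeakVal, Freq10, Freq50
userPerf  = np.array([[0.6, 7.5e9, 2.7, 0.9e9, 2.5e9]])
scaledIn  = varScal.transform(userPerf)
scaledOut = nnetRev.predict(scaledIn)
gdc, p1, p2, z1 = tarScal.inverse_transform(scaledOut)[0]
print(gdc, p1, p2, z1)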

Deployment:

Now that we have the trained model in Keras' .h5 format, we can translate it into corresponding C++ code using Keras2Cpp:

Keras2Cpp

Its github repository is here: Keras2Cpp

The resulting file can then be compiled together with keras_model.cc and keras_model.h in our AMI library.

Conclusion:

In this post/notebook, we explored the flow to create a neural-network-based model for CTLE parameter prediction, using data science techniques along the way. The resulting Keras model is then converted into C++ code for implementation in our IBIS-AMI library. With this performance-based CTLE model, our users can run channel simulations before committing to an actual silicon design.

AI/ML: Use ML techniques for layer stackup modeling

Note:

  • The dataset and IPython notebook for this post are available on SPISim's github page: [HERE]
  • This notebook may be better rendered by nbviewer and can be viewed [HERE].

Motivation:

When planning a PCB layer stackup, we often would like to know the trade-off between various layout options and their signal integrity performance. For example, a wider trace may provide lower impedance yet occupy more routing area. Narrowing the spacing between differential pairs may save some space but will also increase crosstalk. Most EDA tools for system-level signal integrity analysis provide a "transmission line calculator", like the one shown below, for designers to quickly make estimates and determine the trade-offs:
TLineCalc

However, all such "calculators" I have seen, even the commercial ones, only consider a single trace or one differential pair by itself. They do not take crosstalk into account. Moreover, stackup parameters such as conductivity and permittivity must be entered individually rather than as a range. As a result, users can't easily visualize the relationships between the performance parameters and the stackup properties. Thus, an enhanced version of such a "T-Line calculator", one that addresses the aforementioned gaps, would be very useful. Such a tool requires a prediction model linking the various stackup parameters to their performance targets. Data science/machine learning techniques can thus be used to build such a model.

Problem Statements:

We would like to build a prediction model such that, given a set of stackup parameters such as trace width and spacing, performance metrics such as impedance, attenuation, and near-end/far-end crosstalk can be quickly estimated. This model can then be deployed in a stand-alone tool for range-based sweeps, so that visual plots can be generated showing the relationships between various parameters for deciding design trade-offs.

Generate Data:

Overview:

The model to be built here is for numerical (regression) prediction with around 10 attributes, i.e. input variables. Various stackup configurations will be generated via sampling, and their corresponding stackup models, in the form of frequency-dependent R/L/G/C matrices, will be solved via a field solver. Such a process is deterministic. Post-processing steps will then read these solved models and calculate performance. Here we define the performance to be predicted as impedance, attenuation, near-end/far-end crosstalk and propagation speed.

Define stackup structure:

There are many possible stackup structures, as shown below. For more accurate prediction, we are going to generate one prediction model per structure. Presets

Using three single-ended traces (victim in the middle) in a strip-line setup as an example, various attributes may be defined as shown below: Setup

These parameters, such as S (spacing), W (width), Sigma (conductivity), Er (permittivity), H (height) etc., are represented as variables to be sampled.

Define sampling points:

The next step is to define the ranges of variable values and the sampling points. Since there are about 10 parameters, full combinatorial data would be impractical. Thus we may need to apply sampling algorithms such as design-of-experiments or space-filling methods to establish the best coverage of the solution space. For this setup, we have generated 10,000 cases to be simulated; a minimal sampling sketch follows below. Sample
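
As an illustration of space-filling sampling (a sketch assuming SciPy's qmc module is available; the parameter ranges below simply mirror the min/max values seen later in the dataset and are otherwise illustrative):

# Latin-hypercube sampling: good space coverage without a full-factorial explosion
import numpy as np
from scipy.stats import qmc

ranges = {'H1': (1.4, 7.0), 'H3': (2.0, 13.0), 'ER1': (3.0, 4.7), 'ER3': (3.0, 4.7),
          'TD1': (0.0, 0.05), 'TD3': (0.0, 0.05), 'H2': (0.3, 2.5),
          'W': (2.0, 9.0), 'S': (2.0, 35.0), 'EDW': (0.0, 0.75)}
names = list(ranges)
lowerB = [ranges[k][0] for k in names]
upperB = [ranges[k][1] for k in names]

sampler = qmc.LatinHypercube(d=len(names), seed=0)
points = qmc.scale(sampler.random(n=10000), lowerB, upperB)   # 10000 x 10 sample matrix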

Generate inputs setup and simulate:

Once we have the sample points, layer stackup configurations for the solver to be used will be generated. Each field solver has a different syntax, thus a conversion flow is needed... TProFlow

In this case, we use HSpice from Synopsys as the field solver, so each of the 10K parameter combinations will be used to generate its spice input file for simulation: Simulate

The next step is to perform circuit simulation for all these cases. This may be a time-consuming process, so a distributed environment or simulation farm may be used.

Performance measurement:

The outcome of each simulation is a frequency-dependent tabular model corresponding to its layer stackup settings. HSpice's tabular format looks like this: Tabular The next step is to load these models and perform the performance measurement: Measure Matrix manipulation such as eigenvalue decomposition will be applied in order to obtain the characteristic impedance and propagation speed etc., as sketched below. The measurement output of each model should be a set of parameters which will be combined with the original inputs to form the dataset for our prediction modeling.
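
A rough sketch of that kind of matrix post-processing (standard multi-conductor transmission-line relations; the R/L/G/C matrices are assumed to be read from the tabular model at each frequency point):

# Modal decomposition of per-unit-length R/L/G/C matrices at one frequency
import numpy as np

def line_performance(R, L, G, C, freq):
    w = 2.0 * np.pi * freq
    Z = R + 1j * w * L                                      # series impedance matrix
    Y = G + 1j * w * C                                      # shunt admittance matrix
    gamma2, T = np.linalg.eig(Z @ Y)                        # eigen-decomposition of Z*Y
    gamma = np.sqrt(gamma2)                                 # per-mode propagation constants
    speed = w / np.abs(np.imag(gamma))                      # per-mode propagation speed
    Zc = T @ np.diag(1.0 / gamma) @ np.linalg.inv(T) @ Z    # characteristic impedance matrix
    return np.real(np.diag(Zc)), speed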

Prepare Data:

From this point, we can start the modeling process using python and various packages.

In [101]:
%matplotlib inline

## Initial set-up for data
import os
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

prjHome = 'C:/Temp/WinProj/LStkMdl'
workDir = prjHome + '/wsp/'
srcFile = prjHome + '/dat/SLSE3.csv'

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(workDir, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)
In [102]:
# Let's read the data and do some statistic
srcData = pd.read_csv(srcFile)

# Take a peek:
srcData.head()
Out[102]:
H1 H3 ER1 ER3 TD1 TD3 H2 W S EDW SIGMA H FNAME Z0SE(1_SE) S0SE(1_SE) KBSENB(1_1) KFSENB(1_1) A(1_1)
0 4.9311 5.5859 3.6964 3.3277 0.013771 0.030565 0.91236 7.5573 30.6610 0.177530 50000000.0 15 SPIMDL00001.TAB 37.715088 1.601068e+08 0.000035 -7.297762e-15 0.497289
1 4.8918 5.3156 4.0410 3.8404 0.022008 0.008600 0.53126 7.0378 6.9217 0.733370 50000000.0 15 SPIMDL00002.TAB 39.831954 1.510345e+08 0.019608 -1.093612e-12 0.249591
2 1.7907 11.5230 3.6984 3.3534 0.013859 0.045137 2.00990 3.6964 27.3230 0.707750 50000000.0 15 SPIMDL00003.TAB 35.663928 1.587798e+08 0.000305 -2.797417e-13 0.705953
3 2.8595 4.7259 3.9605 3.3981 0.010481 0.018028 0.36547 2.3872 7.2803 0.735210 50000000.0 15 SPIMDL00004.TAB 59.456438 1.558444e+08 0.010125 -5.478920e-12 0.327952
4 5.5946 8.2553 3.4176 3.3249 0.024434 0.003663 2.08810 6.3776 20.1680 0.053585 50000000.0 15 SPIMDL00005.TAB 46.082722 1.633106e+08 0.004136 -4.787903e-13 0.169760
In [103]:
srcData.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 18 columns):
H1             10000 non-null float64
H3             10000 non-null float64
ER1            10000 non-null float64
ER3            10000 non-null float64
TD1            10000 non-null float64
TD3            10000 non-null float64
H2             10000 non-null float64
W              10000 non-null float64
S              10000 non-null float64
EDW            10000 non-null float64
SIGMA          10000 non-null float64
H              10000 non-null int64
FNAME          10000 non-null object
Z0SE(1_SE)     10000 non-null float64
S0SE(1_SE)     10000 non-null float64
KBSENB(1_1)    9998 non-null float64
KFSENB(1_1)    9998 non-null float64
A(1_1)         10000 non-null float64
dtypes: float64(16), int64(1), object(1)
memory usage: 1.4+ MB
In [104]:
srcData.describe()
Out[104]:
H1 H3 ER1 ER3 TD1 TD3 H2 W S EDW SIGMA H Z0SE(1_SE) S0SE(1_SE) KBSENB(1_1) KFSENB(1_1) A(1_1)
count 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.0 10000.0 10000.000000 1.000000e+04 9.998000e+03 9.998000e+03 10000.000000
mean 4.199999 7.500007 3.850001 3.850001 0.025000 0.025000 1.400000 5.500001 18.500007 0.375000 50000000.0 15.0 42.985345 1.533140e+08 1.475118e-02 -2.155868e-12 0.509534
std 1.616661 3.175591 0.490773 0.490772 0.014434 0.014434 0.635117 2.020829 9.526750 0.216515 0.0 0.0 172.163835 7.085288e+06 2.847164e-02 3.787910e-11 0.288974
min 1.400000 2.001000 3.000000 3.000100 0.000003 0.000002 0.300140 2.000100 2.002600 0.000002 50000000.0 15.0 11.630424 1.384625e+08 1.009146e-10 -2.796522e-09 0.011486
25% 2.799925 4.750325 3.425075 3.425050 0.012500 0.012501 0.850038 3.750350 10.250000 0.187510 50000000.0 15.0 32.173933 1.479284e+08 2.216767e-04 -1.112024e-12 0.328938
50% 4.200000 7.500500 3.850000 3.849950 0.025000 0.025000 1.399950 5.500150 18.500000 0.374985 50000000.0 15.0 40.031614 1.527750e+08 1.969018e-03 -1.928585e-15 0.507237
75% 5.600075 10.249500 4.274950 4.274950 0.037499 0.037498 1.949875 7.249650 26.748000 0.562507 50000000.0 15.0 48.688036 1.583456e+08 1.376321e-02 7.447876e-13 0.680500
max 6.999900 13.000000 4.699800 4.699900 0.050000 0.049995 2.500000 8.999600 34.999000 0.749930 50000000.0 15.0 17097.973570 1.725066e+08 3.184933e-01 9.613918e-11 14.524624

Note that:

  • Sigma (conductivity) and H (default layer height) are constants in this setup;
  • FNAME (file name) is not needed for modeling;
  • Z0 (impedance) has outliers;
  • Forward/backward crosstalk (Kb/Kf) have missing entries.
In [105]:
# drop constant and file name columns
stkData = srcData.drop(columns=['H', 'SIGMA', 'FNAME'])
In [106]:
# plot distributions before dropping measurement outliers
stkData.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
Saving figure attribute_histogram_plots
In [107]:
# drop outliers and invalid Kb/Kf cells
# These may be caused by an unphysical stackup model or by calculation issues during post-processing
maxZVal = 200
minZVal = 10
stkTemp = stkData[(stkData['Z0SE(1_SE)'] < maxZVal) & \
                  (stkData['Z0SE(1_SE)'] > minZVal) & \
                  (np.abs(stkData['KBSENB(1_1)']) > 0.0) & \
                  (np.abs(stkData['KFSENB(1_1)']) > 0.0)]

# Check again to make sure the data are now clean
stkTemp.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9994 entries, 0 to 9999
Data columns (total 15 columns):
H1             9994 non-null float64
H3             9994 non-null float64
ER1            9994 non-null float64
ER3            9994 non-null float64
TD1            9994 non-null float64
TD3            9994 non-null float64
H2             9994 non-null float64
W              9994 non-null float64
S              9994 non-null float64
EDW            9994 non-null float64
Z0SE(1_SE)     9994 non-null float64
S0SE(1_SE)     9994 non-null float64
KBSENB(1_1)    9994 non-null float64
KFSENB(1_1)    9994 non-null float64
A(1_1)         9994 non-null float64
dtypes: float64(15)
memory usage: 1.2 MB
In [108]:
# now plot distributions again; should see proper distributions now
stkData = stkTemp
stkData.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
Saving figure attribute_histogram_plots
In [109]:
# rank input attributes by their correlation with Z0
corr_matrix = stkData.drop(columns=['KBSENB(1_1)', 'KFSENB(1_1)', 'S0SE(1_SE)', 'A(1_1)']).corr()
corr_matrix['Z0SE(1_SE)'].abs().sort_values(ascending=False)
Out[109]:
Z0SE(1_SE)    1.000000
W             0.664396
H1            0.535432
H3            0.344356
H2            0.186618
ER1           0.119390
ER3           0.110507
EDW           0.067261
TD3           0.061906
TD1           0.017937
S             0.000107
Name: Z0SE(1_SE), dtype: float64

From the correlation ranking above, it can be seen that trace width and dielectric height are the dominant factors for the trace's impedance.

Choose a Model:

Since we are building a numerical estimator here, I will try a simple linear regressor first:

In [110]:
# Separate input and output attributes
allTars = ['Z0SE(1_SE)', 'KBSENB(1_1)', 'KFSENB(1_1)', 'S0SE(1_SE)', 'A(1_1)']
varList = [e for e in list(stkData) if e not in allTars]
varData = stkData[varList]
In [111]:
# We have 10,000 cases here, try in-memory normal equation directly first:

# LinearRegression Fit Impedance
from sklearn.linear_model import LinearRegression

tarData = stkData['Z0SE(1_SE)']
lin_reg = LinearRegression()
lin_reg.fit(varData, tarData)

# Fit and check predictions using MSE etc
from sklearn.metrics import mean_squared_error, mean_absolute_error
predict = lin_reg.predict(varData)
resRMSE = np.sqrt(mean_squared_error(tarData, predict))
resRMSE
Out[111]:
3.478480854776983
In [112]:
# Use 10-Split for cross validations:
def display_scores(attribs, scores):
    print("Attribute:", attribs)
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())
    
from sklearn.model_selection import cross_val_score
lin_scores = cross_val_score(lin_reg, varData, tarData,
                             scoring="neg_mean_squared_error", cv=10)
lin_rmse_scores = np.sqrt(-lin_scores)
display_scores(varList, lin_rmse_scores)
Attribute: ['H1', 'H3', 'ER1', 'ER3', 'TD1', 'TD3', 'H2', 'W', 'S', 'EDW']
Scores: [3.25291562 4.16881619 3.22429058 3.32251943 3.70096691 3.46812853
 3.34169206 3.5511994  3.28259776 3.39876687]
Mean: 3.4711893352193854
Standard deviation: 0.270957271481177
In [113]:
# try regularization (ridge regression) as well
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=1, solver="cholesky")
ridge_reg.fit(varData, tarData)
predict = ridge_reg.predict(varData)
resRMSE = np.sqrt(mean_squared_error(tarData, predict))
resRMSE
Out[113]:
3.4921877809274595

Thus an error of about 3 ohms RMSE is obtained from this estimator. What if a higher-order regression is used:

In [114]:
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
varPoly = poly_features.fit_transform(varData)
lin_reg = LinearRegression()
lin_reg.fit(varPoly, tarData)
predict = lin_reg.predict(varPoly)
resRMSE = np.sqrt(mean_squared_error(tarData, predict))
resRMSE
Out[114]:
1.2780298094097002

A more accurate model may thus be obtained this way.

Training and Evaluation:

In [115]:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=10)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])
        y_train_predict = model.predict(X_train[:m])
        y_val_predict = model.predict(X_val)
        train_errors.append(mean_squared_error(y_train_predict, y_train[:m]))
        val_errors.append(mean_squared_error(y_val_predict, y_val))

    plt.plot(np.sqrt(train_errors), "r-+", linewidth=2, label="Training set")
    plt.plot(np.sqrt(val_errors), "b-", linewidth=3, label="Validation set")
    plt.legend(loc="upper right", fontsize=14)
    plt.xlabel("Training set size", fontsize=14)
    plt.ylabel("RMSE", fontsize=14)

lin_reg = LinearRegression()
plot_learning_curves(lin_reg, varData, tarData)
plt.axis([0, 8000, 0, 20])
save_fig("underfitting_learning_curves_plot")
plt.show()
Saving figure underfitting_learning_curves_plot

Neural Network:

As the difference between prediction and actual measurement is about two ohms, it has met our modeling goal. As an alternative approach, let's try neural network modeling below.

In [116]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

numInps = len(varList)
nnetMdl = Sequential()
# input layer
nnetMdl.add(Dense(units=64, activation='relu', input_dim=numInps))

# hidden layers
nnetMdl.add(Dropout(0.3, noise_shape=None, seed=None))
nnetMdl.add(Dense(64, activation = "relu"))
nnetMdl.add(Dropout(0.2, noise_shape=None, seed=None))
          
# output layer (sigmoid is acceptable here because the target is MinMax-scaled to [0, 1])
nnetMdl.add(Dense(units=1, activation='sigmoid'))
nnetMdl.compile(loss='mean_squared_error', optimizer='adam')

# Provide some info
#from keras.utils import plot_model
#plot_model(nnetMdl, to_file= workDir + 'model.png')
nnetMdl.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_10 (Dense)             (None, 64)                704       
_________________________________________________________________
dropout_7 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_11 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_8 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_12 (Dense)             (None, 1)                 65        
=================================================================
Total params: 4,929
Trainable params: 4,929
Non-trainable params: 0
_________________________________________________________________
In [117]:
# Prepare Training (tran) and Validation (test) dataset
varTran, varTest, tarTran, tarTest = train_test_split(varData, tarData, test_size=0.2)

# scale the data
from sklearn import preprocessing
varScal = preprocessing.MinMaxScaler()
varTran = varScal.fit_transform(varTran)
varTest = varScal.transform(varTest)

tarScal = preprocessing.MinMaxScaler()
tarTran = tarScal.fit_transform(tarTran.values.reshape(-1, 1))
In [118]:
hist = nnetMdl.fit(varTran, tarTran, epochs=50, batch_size=1000, validation_split=0.1)
tarTemp = nnetMdl.predict(varTest, batch_size=1000)
predict = tarScal.inverse_transform(tarTemp)
resRMSE = np.sqrt(mean_squared_error(tarTest, predict))
resRMSE
Train on 7195 samples, validate on 800 samples
Epoch 1/50
7195/7195 [==============================] - 0s 43us/step - loss: 0.0457 - val_loss: 0.0205
Epoch 2/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0202 - val_loss: 0.0150
Epoch 3/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0173 - val_loss: 0.0157
Epoch 4/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0169 - val_loss: 0.0134
Epoch 5/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0148 - val_loss: 0.0111
Epoch 6/50
7195/7195 [==============================] - 0s 6us/step - loss: 0.0129 - val_loss: 0.0094
Epoch 7/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0110 - val_loss: 0.0078
Epoch 8/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0093 - val_loss: 0.0061
Epoch 9/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0078 - val_loss: 0.0045
Epoch 10/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0063 - val_loss: 0.0034
Epoch 11/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0055 - val_loss: 0.0026
Epoch 12/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0047 - val_loss: 0.0021
Epoch 13/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0043 - val_loss: 0.0018
Epoch 14/50
7195/7195 [==============================] - 0s 6us/step - loss: 0.0039 - val_loss: 0.0017
Epoch 15/50
7195/7195 [==============================] - 0s 6us/step - loss: 0.0037 - val_loss: 0.0016
Epoch 16/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0034 - val_loss: 0.0015
Epoch 17/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0032 - val_loss: 0.0015
Epoch 18/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0030 - val_loss: 0.0015
Epoch 19/50
7195/7195 [==============================] - ETA: 0s - loss: 0.002 - 0s 8us/step - loss: 0.0029 - val_loss: 0.0015
Epoch 20/50
7195/7195 [==============================] - 0s 6us/step - loss: 0.0028 - val_loss: 0.0014
Epoch 21/50
7195/7195 [==============================] - 0s 6us/step - loss: 0.0028 - val_loss: 0.0014
Epoch 22/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0026 - val_loss: 0.0014
Epoch 23/50
7195/7195 [==============================] - 0s 6us/step - loss: 0.0025 - val_loss: 0.0013
Epoch 24/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0024 - val_loss: 0.0013
Epoch 25/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0024 - val_loss: 0.0013
Epoch 26/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0023 - val_loss: 0.0012
Epoch 27/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0023 - val_loss: 0.0012
Epoch 28/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0022 - val_loss: 0.0012
Epoch 29/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0022 - val_loss: 0.0012
Epoch 30/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0020 - val_loss: 0.0011
Epoch 31/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0021 - val_loss: 0.0011
Epoch 32/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0020 - val_loss: 0.0011
Epoch 33/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0020 - val_loss: 0.0011
Epoch 34/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0019 - val_loss: 0.0010
Epoch 35/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0018 - val_loss: 0.0010
Epoch 36/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0018 - val_loss: 9.9226e-04
Epoch 37/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0018 - val_loss: 9.8783e-04
Epoch 38/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0017 - val_loss: 9.5433e-04
Epoch 39/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0017 - val_loss: 9.5077e-04
Epoch 40/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0017 - val_loss: 9.2018e-04
Epoch 41/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0016 - val_loss: 9.1261e-04
Epoch 42/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0016 - val_loss: 9.0183e-04
Epoch 43/50
7195/7195 [==============================] - 0s 4us/step - loss: 0.0016 - val_loss: 8.8474e-04
Epoch 44/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0016 - val_loss: 8.6719e-04
Epoch 45/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0015 - val_loss: 8.4854e-04
Epoch 46/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0016 - val_loss: 8.2477e-04
Epoch 47/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0015 - val_loss: 8.0656e-04
Epoch 48/50
7195/7195 [==============================] - 0s 6us/step - loss: 0.0014 - val_loss: 8.1118e-04
Epoch 49/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0014 - val_loss: 7.8844e-04
Epoch 50/50
7195/7195 [==============================] - 0s 5us/step - loss: 0.0014 - val_loss: 7.9169e-04
Out[118]:
1.982441126153167
In [119]:
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

With epochs increased to 100, we can even obtain 1.5 ohms accuracy. It seems this neural network model is comparable to the polynomial regressor and meets our needs.

In [120]:
# save model and architecture to single file
nnetMdl.save(workDir + "LStkMdl.h5")

# finally
print("Saved model to disk")
Saved model to disk

Deploy:

Using the SLSE3 data set as an example, we followed a similar process and built 10+ prediction models for different stackup structure setups. The polynomial model or neural network can be implemented in Java/C++ to avoid dependencies on Python packages for distribution purposes. The implemented front-end, shown below, provides a quick and easy method for system designers doing stackup/routing planning: Deploy

Conclusion:

In this post/markdown document, we described the stackup modeling process using data science/machine learning techniques. The outcome is a deployed front-end with the modeled neural network for the user's instant performance evaluation. The data set and this markdown document are published on this project's GitHub page.

S-Param: Various indicators… ILFit, ILD, ICN, ICR, IMR, INEXT, PQM, RQM…

S-Parameter:

In modern channel design, scattering parameters (S-param) are commonly used to represent passive interconnects. An S-param model is basically a power view (i.e. incident and reflected waves) of each “port” under a certain reference impedance at various frequencies. Its format is a set of frequency-dependent matrices, each N x N for an N-port network. In a college textbook, a two-port network is usually used to explain these incident/reflected relationships. Using linear algebra, different forms (Z, Y, T, ABCD etc.) may also be calculated for use in different scenarios. One can also cascade two or more S-params together to form a consolidated S-param model.

For example, a typical SERDES channel is usually a point-to-point topology, so its various interconnect stages may be cascaded into a single S-parameter.

Usually a 3D field solver tool like HFSS/Q3D etc. will extract a physical design into an S-parameter. For a homogeneous interconnect such as a transmission line, a 2D/2.5D field solver can be used to extract RLGC/tabular model data. These RLGC data can easily be converted to corresponding S-parameters. So essentially every stage’s device model can be in S-param format and cascaded together. However, when there are more than two ports on each side… such as in the schematic shown above, the basic two-port network formulas need to be “generalized” so that the plain S/Z/Y/T/ABCD conversion formulas can still be applied. That is, the ports need to be arranged in such a way that a “generalized 2-N port” can be formed:

As an S-parameter is passive in nature and represents “power” data, its value or “decay” can be viewed on a dB scale and considered a type of “loss” (compared to the 0 Hz origin, or 0 dB). Depending on the incident (through-type) or reflective relationship between ports, different parts of an S-param may be viewed as “insertion loss (IL)” or “return loss (RL)” at the near end (NE) or far end (FE):

The notation above is for a “single-ended” S-param. To describe a differential channel, a “differential-mode” or “mixed-mode” conversion needs to be applied. Forming the “differential” relationship essentially combines the matrix elements that represent the signal and its return path or return channel. The resulting S-parameter can then describe relationships between “modes” such as differential mode or common mode:
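
As a minimal numerical sketch of such a conversion (assuming single-ended ports 1/3 form the input pair and ports 2/4 the output pair; the actual port ordering depends on the model):

# Mixed-mode terms from a 4-port single-ended S matrix (0-based indexing)
import numpy as np

def single_to_mixed_mode(S):
    sdd11 = 0.5 * (S[0, 0] - S[0, 2] - S[2, 0] + S[2, 2])   # differential reflection, input side
    sdd21 = 0.5 * (S[1, 0] - S[1, 2] - S[3, 0] + S[3, 2])   # differential insertion loss
    scc21 = 0.5 * (S[1, 0] + S[1, 2] + S[3, 0] + S[3, 2])   # common-mode transmission
    scd21 = 0.5 * (S[1, 0] - S[1, 2] + S[3, 0] - S[3, 2])   # differential-to-common conversion
    return sdd11, sdd21, scc21, scd21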

As one can see, an S-parameter is very “versatile” in describing an interconnect’s properties, yet its data consists of non-intuitive complex numerical matrices. The usual or traditional way of checking an S-param’s quality is to visualize its data and “eye-ball” it against model data from a previous design revision or results from a different extractor:

While this method is still useful, the judgement of a model’s quality very much depends on one’s experience or tolerance and is thus not very objective. As data rates become higher and interconnects become more “coupled” or lossy, new parameters have been defined to check an S-parameter’s figure of merit.

Here we list some of the important, non-interface-specific S-parameter checking parameters for the reader’s reference. Many of these quality metrics have been incorporated into our SPISPro module as a one-stop shop for all S-param processing needs.

S-param. check introduced in IEEE 802.3bj: This spec. defines a “figure of merit” for each interconnect used in a high-speed SERDES (>= 100G) channel. Generic package and motherboard portions are included using behavioral models and cascaded with the user-provided through-type/NEXT/FEXT S-parameters to form a complete, consolidated channel. Then the parameters of the FFE at the Tx and the CTLE+DFE at the Rx are swept to see how well the channel can perform in the best scenario. A “COM” (channel operating margin) value is then calculated to represent the channel’s potential. COM is significantly affected by the cascaded S-param’s parameters listed below:

  • ILFit: Insertion loss fit

The inductive and capacitive properties of a channel introduce frequency-dependent loss. To avoid signal distortion, it is usually desirable to have a smooth, near-linear relationship between frequency and channel loss, such as the well-behaved example below:

However, due to impedance mismatch and multiple reflections, such a linear dependency may not be achievable. “Ripples” may exist, which in turn introduce time-domain noise and degrade the final eye height/width. A “fitted curve” (ILFit, up to half the bandwidth, i.e. the Nyquist rate) may be introduced to approximate the insertion loss. Such a smoothed curve can be viewed as the channel without multi-reflection and is thus a “best-case” scenario.

ILfit is usually calculated in a minimum-squared-error (MSE) sense: a pre-defined curve form is chosen and its weighting coefficients are then fitted.

  • ILD: Insertion loss deviation

Once an ILfit is calculated, one can take the “difference” between this fitted curve and the original data to form the ILD:

The calculated ILD values can then be further integrated to form a channel figure of merit:
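
As an illustrative sketch of the two steps above (a generic least-squares fit over a smooth basis, not necessarily the exact basis functions prescribed by 802.3bj; freq and ilMag are assumed to be arrays of frequency points and insertion-loss magnitude in dB):

# Fit insertion loss up to Nyquist, then form the insertion-loss deviation (ILD)
import numpy as np

def il_fit_and_ild(freq, ilMag, fNyquist):
    mask = freq <= fNyquist
    f = freq[mask]
    basis = np.column_stack([np.ones_like(f), np.sqrt(f), f, f**2])   # smooth basis terms
    coef, *_ = np.linalg.lstsq(basis, ilMag[mask], rcond=None)        # least-squares fit
    ilFit = basis @ coef
    ild = ilMag[mask] - ilFit                                         # deviation from fit
    return ilFit, ild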

  • ICN/ICF: Integrated NE/FE cross-talk noise

These are RMS values integrated (summed) over a frequency range. The integrated values represent the impact due to near-end and far-end crosstalk respectively:
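
A schematic sketch of such an integration (the weighting function and normalization here are placeholders; the actual spec applies specific transmit-spectrum and receiver-filter weights, so this is conceptual only):

# Integrated RMS crosstalk over a frequency band -- not the spec's exact formula
import numpy as np

def integrated_xtalk_rms(freq, xtalkMag, weight=None):
    if weight is None:
        weight = np.ones_like(freq)            # placeholder weighting
    df = np.gradient(freq)
    band = freq[-1] - freq[0]
    return np.sqrt(np.sum(weight * xtalkMag**2 * df) / band)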

  • ICR: Insertion loss to cross-talk ratio

This value compares the channel’s performance impact due to insertion loss versus crosstalk.

  • COM: Channel operating margin

The amplitude resulting from the single-bit response (SBR) can be separated into the signal and the losses due to crosstalk, insertion loss etc. With equalization at both the Tx and Rx sides taken into account, the best-case value is eventually summarized as an indicator, “COM”, for comparing channel performance.

The S-param checks and parameters defined in the COM methodology have had a profound impact. They not only set an example for other interfaces (e.g. USB-C, discussed below) to check their interconnect models, but also provide a meaningful correlation between an S-param’s deviation and its final eye impact. Details about these calculations can be found in the 802.3bj spec. SPISim’s SPro integrates COM as part of the S-param reporting flow, as shown in the various plots above and the screenshots below:

 

S-param. check introduced in USB-C Cable/Connector:

USB Type C (USB 3.1) spec. was published in 2014. Not only does it increase the signaling speed over the previous generation to 10Gbps, it also provides power-charging capability, a smaller connector form factor and non-directional connectivity. As this spec. is backward compatible, both legacy and low-speed USB connectivity are also supported. As such, there are many different scenarios, coupled with different VNA capabilities, to be considered:

However, the main checks (ILD/IMR etc.) in USB-C are more or less similar to those used in the COM methodology. Nevertheless, there are still minor differences in implementation, such as how the insertion loss should be fitted and weighted… as discussed below. In addition, since there are different pairs of traces (e.g. SS, D+/D- etc.) in the cable/connector, dedicated checks between signals are also included. Essentially they are calculations applied to the specific S-parameter matrix elements corresponding to the incident/reflective behaviors between those traces/ports.

  • ILfitNq: Insertion loss fit at Nyquist rate

The COM methodology in the 802.3bj spec. spells out which fitting formula should be used. However, this is not the case for USB-C. The reference material for the USB-C developer forum’s tool, InterPar, suggests that a different formula, shown below, should be used:

In addition, a “weighting” function is suggested to downplay the deviation at the high-frequency portion:

The checking of the IL_Fit/ILD relies more on frequency-dependent spec. lines, up to the Nyquist rate. A “Pass”/“Fail” indicator is then calculated against these spec. lines.

  • IMR: Integrated multi-reflection

The deviation part (ILD) has been separated out to form its own figure of merit using the formula shown below:

The resulting IMR values are super-speed pair (SS) specific:

  • IRL: Integrated return loss

This is a metric dedicated to return loss checks:

  • INEXT/IFEXT: Integrated NEXT/FEXT noise

Similarly, there are metrics dedicated to crosstalk checks, with their own spec-lines:

  • Others: other signal-specific checks can be performed by simply configuring and plotting specific matrix elements within certain frequency ranges, then comparing with defined spec. lines. An example of differential-to-common mode conversion is shown below.

Both COM and USB-C S-param methodologies are part of our SPro’s reporting flow:

S-param check introduced in IEEE P370:

The IEEE P370 committee’s objective is to define fixture design as well as data quality metric standards for high-speed interconnects. A good overview article is available at the Signal Integrity Journal, linked [HERE]. The P370 WG3, which we at SPISim took part in, focuses on an S-parameter’s various quality metrics (QM):

  • PQM: S/Y-Passivity quality metric
  • RQM: Reciprocal/symmetry quality metric
  • CQM: Causality quality metric
  • EQM: Even-Odd quality metric

Previously, we wrote a post [LINKED HERE] detailing how these metrics are calculated. Interested readers may also wait for the upcoming P370 spec. for the details. One thing to note is that regardless of whether it’s COM, USB-C or a channel simulation involving IBIS-AMI models, a high-quality interconnect/S-param model is usually required, as it significantly impacts the resulting impulse response or pulse/single-bit response. When convolving these time-domain responses many times to obtain the final PDF/CDF for BER, any initial imperfection in the S-param model will be “amplified” and cause errors in the end results. As such, in both the COM and USB-C flows, the cascaded or user-measured S-param is always checked and adjusted, particularly in the aspects of passivity and causality. This is done especially in the low-frequency (toward DC) and high-frequency (trailing tail) regions. Separate procedures are applied to extrapolate a DC point using the first several data points, then fit with a “theoretically” well-behaved model equation to make sure the DC response is correct. These details are not even mentioned in the 802.3/USB-C specs… one must bite the bullet and study their reference implementations (e.g. matlab scripts) to find such procedures’ existence. Without this auto-fixing, the calculated COM/spec. parameters will not match those produced by the reference script or the InterPar forum tool.
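
As a simple illustration of the DC-extrapolation idea (a generic low-order polynomial fit of the first few frequency points; the actual reference implementations use their own model equations and also enforce passivity/causality):

# Estimate the DC value of one S-parameter element from its lowest-frequency points
import numpy as np

def extrapolate_dc(freq, sElemReal, nPoints=5, order=2):
    coef = np.polyfit(freq[:nPoints], sElemReal[:nPoints], order)   # low-order fit
    return np.polyval(coef, 0.0)                                    # evaluate at f = 0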

IBIS-AMI: Study of DDR Asymmetric Rt/Ft in Existing IBIS-AMI Flow

[This blog post is written in preparation for the presentation of the same title to be given at the 2019 DesignCon IBIS Summit. Presentation slides and audio recording are linked at the bottom of this post.]

This paper is written by both Wei-hsing Huang (principal consultant at SPISim USA) and Wei-kai Shih, who is based in Tokyo.

Motivation:

Here in the US, one of the IBIS committee’s working groups, IBIS-ATM (advanced technology modeling), holds a regular meeting on Tuesdays. I try to call in whenever possible to gain insight into upcoming modeling trends. During mid 2018, DDR5-related topics were brought up: the existing AMI reference flow described in the spec. focuses on differential or SERDES applications. For example, the stimulus waveform goes from -0.5 to 0.5 and/or a single impulse response is used for analysis, thus mostly assuming symmetric rise time (Rt) and fall time (Ft). Whether this reference flow can be applied to DDR, which may have asymmetric Rt/Ft and single-ended signals like DQ, was the center of discussion. Different EDA companies in this working group have different opinions. Some think the flow can be used directly with minimal change, while others think the flow has fundamental shortcomings for DDR. The thing about an IBIS spec. change is that whoever thinks the current version has deficiencies needs to write a “buffer issue resolution document (BIRD)”. Doing so will inevitably disclose some trade secrets or expose shortcomings of the tool. As a result, while there are companies which think a change may be needed, no flow change has been proposed at this point. As a model maker, I then wondered how the existing flow could be applied to DDR without major change. Thus this study is to demonstrate “one” possible implementation. Existing EDA companies may have more sophisticated algorithms/implementations to support this asymmetric condition, but the existence of “one” such possible flow may convince model makers that it’s time to think about how DDR AMI may be implemented rather than waiting for an unlikely spec. change.

AMI_Init:

There are both “statistical” and “bit-by-bit” flows in channel analysis. In either case, the first step an EDA tool performs before calling the AMI model is “channel calibration”. According to the spec., the impulse response of the channel, which includes the analog buffer, is obtained here. For a SERDES design, which has no asymmetric Rt/Ft issue, this impulse is then sent to the Tx AMI followed by the Rx AMI; the resulting impulse response is then used to calculate the probability density function (PDF), integrated to get the cumulative density function (CDF), and then to obtain bathtub plots etc.

The textbook definition of an impulse response is the response to a “delta” input which happens in an infinitely small time step. In a real situation, there is no such thing as an “infinitely small time step”. The minimal step used by a simulator is a “time step”, which is usually 1ps or more. A buffer will not toggle from low to high and back to low within a single time step. So in practice, a simulator often uses the step response and takes its derivative to get the impulse response, as sketched below. Now the problem comes: for an analog channel with asymmetric Rt/Ft, these two step responses (ignoring the sign) are different. That means we will have two different impulse responses; which one should then be sent to the AMI models? A note up front is that it’s the EDA tool which sets up the calibration, so it has access to any nodal information it needs, such as the pads of the Tx and Rx analog buffers.
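
A one-line sketch of that step-to-impulse conversion (assuming step is a sampled step response with uniform time step dt):

# Numerical impulse response as the time derivative of a step response
import numpy as np

def impulse_from_step(step, dt):
    return np.gradient(step, dt)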

Asymmetric Rt/Ft:

One may think that there is no limitation saying an AMI model can only be called once. So theoretically, a simulator could run the analysis flow twice… the impulse calculated from the rising step response is used the first time and the one from the falling step response the second time. However, not only is this inefficient, a model may not be implemented properly, such that calling AMI_Init again right after AMI_Close may cause a crash if it’s in the same process and the model’s pointers were not released completely. Thus doing so may hamper a simulator’s robustness.

As depicted in the picture above… if a simulator uses a long-UI pulse to calibrate the channel, then both rising and falling step responses are included in one simulation. Now let the data captured at the Tx analog pad be X1 and X2 for the rising and falling portions respectively; the data captured at the Rx analog pad, Y1 and Y2, will be X1 and X2 convolved with the interconnect’s transfer function, which is LTI. If we derive an Xform(t) which is the transfer function between X1 and X2, then that Xform(t) should also be able to transform between Y1 and Y2. That means if a simulator can calculate Xform(t) itself, then regardless of whether the impulse response it sends to the AMI models is calculated from the rising or the falling step response, it can always “reconstruct” the result for the other type of impulse response using this Xform(t) function.

To prove this concept, we have written a simple matlab script taking step inputs of different slew rates, say inp1 and inp2. It calculates the Xform(t) function from both inputs and then reconstructs the response out2′ from out1. When overlaying the nominal output out2 and the reconstructed out2′ together, we can see that they match very well, thus proving the concept.
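
The same idea can be sketched in a few lines of Python (frequency-domain deconvolution with a small regularization term; the signal names follow the description above and are otherwise placeholders):

# Derive Xform between two Tx-pad edges, then reconstruct the other Rx-pad response
import numpy as np

def reconstruct_other_edge(x1, x2, y1, eps=1e-9):
    X1, X2, Y1 = np.fft.fft(x1), np.fft.fft(x2), np.fft.fft(y1)
    Xform = X2 / (X1 + eps)                     # transfer function between the two edges
    return np.real(np.fft.ifft(Y1 * Xform))     # reconstructed y2 from y1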

Once we have responses for both slew rates, we can construct their respective eyes, then use the appropriate portion of each to construct a synthesized eye. Such an eye will not be symmetric like one calculated for SERDES.

When calculating the PDF for the asymmetric case, one also needs to consider the preceding bits’ values and use a tree-like structure to keep track of possible bit sequences. For example, for a typical SERDES bit sequence, if encoding is not considered, each bit is 1 with 50% probability and 0 with 50% probability, and the PDF is constructed based on that assumption. But in an asymmetric case, if the data used at the cursor is from the rising response, then the cursor bit must be 1 while (cursor - 1) must be 0. If (cursor - 2) is 1 again, then the tail of the falling response at (cursor - 1) will be superimposed onto the cursor data. That is, we can’t treat each bit as having the same independent 50% probability when constructing the PDF; it’s not a binomial distribution because the occurrences are not independent. A simulator may need to first determine the maximum bit length to keep track of, then based on that depth form the tree-like sequences which lead to the rising or falling step at the cursor location, and finally use superposition to construct the overall response.

AMI_GetWave:

According to the reference flow for the bit-by-bit case: the equalized Tx output from the digital bit sequence is convolved with the channel’s impulse response. The resulting waveform is then sent to the Rx EQ before obtaining the final results. Either the Tx EQ or the Rx EQ, or both, may not be LTI, so the aforementioned Xform(t) is not applicable.

As food for thought… the spec. only says that in a bit-by-bit mode, the output of the Tx AMI model is an equalized digital sequence, while the input to the Rx EQ must be the channel response to that sequence. Are there then other ways to get such a response to the Rx while taking the different Rt/Ft into account?

One example is shown in the top half of the picture above. If a simulator takes that equalized digital input and “simulates” it to get the final response, then this “simulation process” will have taken the different Rt/Ft into account and produce valid results. However, this process will be slow and I don’t think any simulator does it this way. Furthermore, the spec. specifically says it needs to “convolve” with the impulse response. First of all, this impulse can be from the rising or the falling edge. Secondly, even if we decide to deconvolve with the input first (thus obtaining a sequence of scaled delta responses) and then convolve with the pulse response (i.e. one simulated UI), will there be any issue?

From the plot above… we can see that when a pulse has different rising and falling slew rates, using superposition to construct 011… produces “glitches” in the trailing high-state portion. The severity of these “glitches” depends on how different the Rt/Ft are. So using a pulse response here will still not work.

A simple matlab script has also been written to demonstrate the occurrence of such “glitches”. This proves that not only is using an impulse response to convolve with the Tx EQ’s output problematic, even using a full simulated pulse (which has the asymmetric Rt/Ft’s effect) to convolve with the delta sequence (the original Tx EQ’s output deconvolved with one digital bit) is still problematic. Glitches will happen for consecutive ones or zeros due to the mismatch of Rt and Ft. Thus one must use rising-step and falling-step responses instead when doing such a convolution.
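
A quick numpy sketch of that demonstration (the pulse shape and bit pattern are made up for illustration): superimposing a one-UI pulse with asymmetric edges for a 0111-style pattern makes the summed level deviate from a flat high state wherever the slow falling tail of one pulse overlaps the next pulse’s fast rising edge.

# Superposition of an asymmetric single-UI pulse creates "glitches" for consecutive ones
import numpy as np

samples_per_ui = 32
rise = np.linspace(0.0, 1.0, 8)                     # fast rising edge
fall = np.linspace(1.0, 0.0, 24)                    # slow falling edge (asymmetric)
pulse = np.concatenate([rise, np.ones(samples_per_ui - 8), fall])   # one-UI pulse

bits = [0, 1, 1, 1, 0]
wave = np.zeros(len(bits) * samples_per_ui + len(pulse))
for i, b in enumerate(bits):
    if b:
        wave[i * samples_per_ui : i * samples_per_ui + len(pulse)] += pulse

# During consecutive ones the level overshoots instead of holding flat at 1.0
print("peak level during consecutive ones:", wave.max())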

Summary:

In this presentation, we discussed how the existing AMI flow may be applied to asymmetric Rt/Ft cases such as those often seen in DDR. A “smarter” EDA tool should be able to handle this situation without changes to the spec.’s reference flow. When channel analysis is performed in a “statistical” flow, an EDA tool can obtain waveform data at both the Tx and Rx analog buffers’ pads during the calibration process. Such data can be used to construct a transform function, Xform(t). With this function, the impulse response through the EQ can be reconstructed so an asymmetric eye can be built. A tree structure may be needed to keep track of possible bit combinations. In a “bit-by-bit” flow, the current spec. may be too specific, as it forces convolution of the Tx EQ’s output with the channel’s impulse response before sending it to the Rx EQ. Such direct convolution may be problematic. A “smarter” simulator may calculate the result using a different method without changing the data output from the Tx EQ and input to the Rx EQ. Step responses should be used, since different Rt/Ft will cause “glitches” when consecutive ones/zeros are present if the convolution method is used.

Links:

Presentation: [HERE] (http://www.spisim.com/support/paperetc/20180202_DesignConSummit_SPISim.pdf)

Audio recording (English): [HERE]

IBIS-AMI: An end-to-end AMI modeling flow

In a previous post, I mentioned the “IBIS cook-book” as a good reference for the analog portion of buffer modeling. Unfortunately, when it comes to the equalization part, i.e. AMI, there is no similar counterpart AFAIK. For AMI modeling, the EQ algorithms need to be realized as spec.-compliant APIs implemented in the C language. These functions then need to be compiled into a dynamic library, either a dynamic link library (.dll on Windows) or a shared object (.so on Linux-like systems). Different compilers and build tools have different ways to create such files. So it’s fair to say that many of these aspects are actually in the computer science/programming domain, outside the electrical or modeling scope. It is unlikely that a single document will detail all these processes step-by-step.

In this post, instead of writing about those “programming” details, I would like to give a high-level overview of the different steps of the AMI modeling process… from end to end. Briefly, they can be arranged in the following steps based on execution order:

  1. Analog modeling
  2. Prepare collateral
  3. Define architecture
  4. Create models
  5. Model validation
  6. Channel correlation
  7. Documentation

The following sections will describe each part in details.

Analog modeling:

Believe it or not, the first step of AMI modeling is to create proper IBIS models… i.e. the analog portion. This is particularly true if the circuit being modeled belongs to a Tx. A Tx AMI model equalizes signals which include its own analog buffer’s effect measured at the Tx pad. So if there is no channel (pass-through) and it’s under nominal loading conditions, the analog response of the Tx will be the signal to be equalized. That is to say, without knowing what will be equalized (i.e. what the model’s analog behavior is), one can’t calculate the Tx AMI model’s EQ parameters.

Take the plot above as an example. This is an FFE EQ circuit. The flat lines indicated by the two yellow arrows are different de-emphasis settings, thus controlled by AMI. However, the rising/falling slew rates, wave shape and dc levels etc. circled in red are all analog behaviors. Thus an accurate IBIS model must be created first to establish the baselines for equalization. Recently, BIRD 194 has been proposed to use a Touchstone file in lieu of an IBIS model… still, the analog model must be there.

For an Rx circuit, it may be easier, as an input buffer is usually just an ESD clamp or terminator. Thus it doesn’t take much effort to create the IBIS model. Interested readers may see my previous posts regarding various IBIS modeling topics.

Prepare collateral:

AMI’s data can be obtained from different sources: circuit simulation, lab/silicon measurement or the data sheet. For the simulation case, simulations must be done and the resulting waveforms’ performance needs to be extracted. These values will serve as “design targets” against which the AMI model’s parameters are tuned.

For example, this is a typical TX waveform and measured data:

Various curves have been “lined up” for easy post-processing. Using our VPro, we batch-measured the values at 5.3ns for the different curves and created a table. Similarly, data collected from lab measurement needs to be quantified. This may be done manually and can be labor intensive, since noise is usually present:

Some circuits may have their response defined in the frequency domain. In this case, various points (DC, fundamental freq., 2X fundamental etc.) need to be measured, as shown above.

If it’s from a data sheet, then the values are already there, yet there may be different ways to realize such performance. For example, equations with different zero and pole locations may all have the same DC gain or gain at particular frequencies, so which one to pick may depend on other factors.

Define architecture:

Based on the collateral and the data sheet, the modeler needs to determine how the AMI models will be built. Usually this should reflect the IC’s design functions, so there is not much ambiguity here. For example, if the Rx circuit has DFE/CDR functions, then the AMI models must also contain such modules. On the other hand, some data may be represented in different ways and proper judgement needs to be made. Take this waveform as an example:

It’s already very obvious that it has an FFE with one post-tap. However, since the analog behavior needs to be represented by an IBIS model, one needs to decide how these different behaviors, boxed in different colors, should be modeled. They can be constructed with several different IBIS models, or with a single IBIS model plus some “scaling” block so that IBIS curves of similar wave shape can be squeezed or stretched. For a repeater, oftentimes people only care about what goes into and what comes out of the AMI model. The ability to “probe” signals between a repeater’s Rx and Tx may be limited by the capabilities of the simulator used. As a result, a modeler may have freedom in determining which functions go into the Rx and which go into the Tx. In some cases, the same model with different architectures needs to be created to meet different usage scenarios. An example has been discussed in our previous post [HERE].

Create models:

Once the architecture is defined, the next step is the actual C/C++ implementation. This is where the programming part starts. Ideally, building blocks from previous projects are there already or will be created as modules so that they can be reused in the future. Multiple instances of the same model may be loaded together in some cases, so the usage of “static” variables or functions needs to be very careful. Good programming practice comes into play here. I have seen models that only work at a certain bit-rate and 32 samples per UI. That indicates the model is “hard-coded”… it does not have code to up-sample or down-sample the data based on the sampling interval passed in from the API function. Accompanying the writing of the model’s C code are unit testing, source revision control, compilation and dependency checks etc. The last one is particularly important on Linux: if your model relies on some external libraries and is not linked statically, the same model running fine on the developer’s machine will not even pass the golden checker at the user’s end… because the library is not available there. Typically one will need to prepare several machines, virtual or not, which are “fresh” from OS installation and are the oldest “distros” one is willing to support. All these are typical software development processes being applied to this AMI modeling scope.

After the binary .dll/.so files are generated, the next step is to assemble proper .ami files. Depending on the parameter types (integer, values, corners etc.), different flavors of syntax are available to create such a file. In addition, different EDA simulators present the parameter selections to their end users differently. So one may need to choose the syntax such that parameter values will always be selected properly in the targeted simulators. For example, if one has already selected the TYP/MIN/MAX corner for the IBIS model, he/she should not have to do so again for the AMI part. It doesn’t make sense at all if a MIN AMI model is used with a MAX corner IBIS model… the corners should be “synchronized”.

Once the model is ready, the next step is to tune the parameters so that each performance target is matched. Some interfaces, such as PCIe, have pre-defined FFE tap weights, so there is no ambiguity. In most cases, one needs to find the parameter values that match the measured or simulated performance. Such a task is very tedious and error-prone if done manually, and a process like our “AutoTune” comes in very handy:

Basically, our tool lets the user specify the matching targets and uses a bisection algorithm to find the tap values. Hundreds of cases can be “tuned” in a matter of minutes. In some other cases, a grid search may be needed.
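
A minimal sketch of the bisection idea (the measure function is a placeholder standing for “run the model/simulation with this tap value and extract the performance number”, assumed monotonic over the search range):

# Bisection search for a tap value that matches a performance target
def bisect_tap(measure, target, lo, hi, tol=1e-4, max_iter=50):
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        err = measure(mid) - target
        if abs(err) < tol:
            return mid
        if err > 0:
            hi = mid          # measured value too high -> shrink upper bound
        else:
            lo = mid          # measured value too low  -> raise lower bound
    return 0.5 * (lo + hi)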

Model validation:

Just like traditional IBIS, the first step of model validation is to run the model through the golden checker. However, one needs to do so on different platforms:

The golden checker didn’t start checking the included AMI binary models until quite recently. Basically it loads the .ibs file, identifies models with AMI functions, then checks the .ami file syntax. Finally, the checker loads the associated .dll/.so files. Because different OS platforms load binary files differently, certain models (e.g. a .dll) can only be checked on the associated platform (e.g. Windows). That’s why one needs to perform the same check on different platforms to make sure they all succeed. Library dependency or platform issues can be identified quickly here. However, the golden checker will not drive the binary file, so the functional checks described in the next paragraph are the next step.

Typically, an AMI model has several parameters. To validate a model thoroughly, all combinations of these parameter values need to be exercised. We can “parameterize” the settings in a .ami file like below:

Here, a pattern like %VARIABLE_NAME% is used to create a .ami template. Our SPIMPro can then be used to generate all combinations of possible parameter values as a table. There can easily be hundreds or even thousands of cases. Similar to the process described in the “systematic approach” mentioned in my previous post, we can then generate corresponding .ami files for all these cases. So there will be hundreds or thousands of them! The next step is to be able to “drive” them and obtain each single model’s performance. Depending on the EDA tools, most either do not have the automation capability to do this in batch mode or require further programming. In our case, our SPIMPro and SPIVPro have built-in functions to support this sweeping flow in batch mode within the same environment. The SPISimAMI model driver is used extensively here! Once each case’s simulation is done, one again needs to extract the performance, then compare against the values obtained from raw data and make a delta comparison.
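
A small sketch of such template-based sweep generation (file names and parameter names are illustrative; the template is assumed to contain %NAME%-style tokens as described above):

# Generate one .ami file per parameter combination from a %NAME% template
import itertools

sweep = {                                   # illustrative parameter values to sweep
    'TAP_PRE':      [0.0, -0.05, -0.10],
    'TAP_POST':     [0.0, -0.10, -0.20],
    'CTLE_DC_GAIN': [0, 1, 2],
}

with open('model_template.ami') as fin:     # .ami file containing %NAME% placeholders
    template = fin.read()

for idx, combo in enumerate(itertools.product(*sweep.values())):
    text = template
    for name, value in zip(sweep.keys(), combo):
        text = text.replace('%' + name + '%', str(value))
    with open('case_%04d.ami' % idx, 'w') as fout:
        fout.write(text)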

A scatter plot like the one below will quickly indicate which AMI parameter combinations may not work properly in the newly created AMI models. In that case, one needs to go back to the modeling stage, check the code, then do this sweep validation all over again.

Channel correlation:

The model validation mentioned in the previous section is only for a single model, not the full channel. So one still needs to pick several full-channel set-ups to fully qualify the models. A caveat of channel analysis is that it only shows time-domain data, regardless of whether the flow is “statistical” or “bit-by-bit”; that means it is often not easy to qualify a frequency-domain component such as a CTLE. In this case, a corresponding S-parameter whose Sdd12 (differential input to differential output) is represented by the CTLE AMI settings can be used for an apples-to-apples comparison, like the schematic shown below:

Another required step here is to test with different EDA vendors’ tools. This presents another challenge because channel simulators are usually pricey, and it’s rarely the case that one company will have all of them (e.g. ADS, HyperLynx, SystemSI, QCD and HSpice etc.). Different EDA tools do invoke AMI models differently… for example, some simulators pass an absolute path for the DLL_Path reserved parameter while others send only a relative path. So without going through this step, it’s difficult to predict how a model will behave in different tools.

Documentation:

Once all these are done, the final step is of course to create an AMI model usage guide together with some sample set-ups. Usually it starts with the IBIS model’s pin/model associations and some performance charts, followed by descriptions of the different AMI parameters’ meanings and their mapping to the data sheet. One may also add extra info, such as alternatives if the user’s EDA tool does not support newer keywords such as Dll_Path, Dll_ID or Supporting_Files. Waveform comparisons between original data (silicon measurement) and AMI results should also be included. Finally, it is beneficial to provide instructions on how an example channel using this model can be set up in popular EDA tools such as ADS, HyperLynx or HSpice.

Summary:

There you have it… the end-to-end AMI modeling process without touching on programming details! Both the AMI API and the programming languages are moving targets as they evolve with time. Thus one must continue honing the skills and techniques involved to be able to deliver good quality models efficiently and quickly. This is a task which requires discipline and experience across different domains. After sharing these with you readers, do you still want to do it yourself? 🙂 Happy modeling!

A quick and easy IBIS modeling flow

For engineers who are new to IBIS modeling, the “IBIS Cookbook” [LINK HERE] is a very good reference document to get started. The latest version, V4.0, was created back in 2005. While most of the documented extraction procedures still hold true to this date, some of them may be tedious or even ambiguous in terms of execution. This is particularly true for the processes mentioned in Chapter 4, differential buffer modeling. Furthermore, most recent IBIS summit presentations focus on “new and hot” topics like IBIS-AMI modeling methodologies, and not many are about traditional IBIS. In this post, I would like to first review these “formal” processes, dive into how each modeling table is extracted and used in simulation, then propose a “quick and easy” method, particularly for differential buffers. I will then summarize with this approach’s pros and cons.

IBIS model components:

The most basic IBIS building block, as defined in spec. version 3.2, is shown above. Typically at least six tables will be included in an output-type buffer. They are IV (pull-up, pull-down) and VT (rising and falling) under two different test load conditions. Additional clamp IV tables (power and ground clamp) may be added for an input-type buffer. After spec. version 5.1, six additional IT tables for ISSO_PU/PD/composite currents have also been added to address PDN effects. To create an IBIS model, the data extraction process starts with exciting a particular portion of the buffer so that the measured data can be post-processed into a spec-compatible table format. Because a model also has TYP/MIN/MAX skews, the number of simulations is basically the aforementioned number of tables times three. That is, for the most basic IBIS model, one may need to simulate at least eighteen cases (or simulation “decks”).

To explain a little more about the blocks untouched by the proposed new method, I list them in the bullets below:

  • Package/Pin parasitics: The IBIS cookbook and the normal modeling flow do not cover this part. Usually a buffer package’s model is extracted using tools such as HFSS or Q3D into the form of S-parameters or an equivalent broadband spice model. An IBIS model can use a lumped R+L+C structure to describe pin-specific or package-wide (applying to all pins) parasitics. Alternatively, an IBIS model can also use the more detailed tree-structure package model shown below for a non-lumped structure. Regardless, converting such an extracted S-parameter or multi-terminal sub-circuit into these simple lumped RLC values or tree structures is the extraction tool’s task; it’s a separate process and not discussed here as part of the buffer modeling.

  • C_Comp: At the very beginning, there was only a single C_Comp value between pad and ground, used to describe frequency-dependent behavior beyond the parasitics. Later on, tools like HSpice introduced extra simulation syntax to split this single C_Comp value into branches between the pad and various power terminals for better accuracy. Even later, this type of syntax was adopted as part of the IBIS spec. Still, most materials only describe how a single C_Comp value is computed. Briefly speaking, it can be calculated using a time-domain method based on the RC charging/discharging time constant, or a frequency-domain method based on the imaginary part of the pad current at a particular frequency (a small estimation sketch follows this list). How to split this single value into several to best match the frequency plot remains an art (i.e., not standardized). In addition, the C_Comp value is not visible during modeling… its effect only shows up when there are reflections back from the other end due to impedance mismatch. What we have found is that an IC designer usually has a better idea of what this value should be, and the aforementioned time/frequency-domain calculation methods may not produce an accurate estimate.

  • Clamp current: Power/Ground clamp currents and Pull-up/Pull-down currents are both IV based (i.e., dc steady state), so they are combined for load-line analysis during simulation. The difference between Pull-up/Pull-down and Clamp is that the latter (i.e., Clamp) can’t be turned off, so its effect is always present even when we are extracting IV for the Pull-up/Pull-down structures. Thus, to avoid “double counting”, the post-processing stage needs to remove the clamp current from the pull-up/pull-down currents before putting them into separate tables. To simplify the situation… particularly for an output differential buffer, we may just use IV data even though it is an IO buffer.
  • IT current: These are different dc- or transient-based sweeps used to obtain the buffer’s drawn current when power or ground is not “ideal”. This is important in the DDR case, where the DQ is single ended and subject to PDN noise. For differential applications like SERDES, PDN effects are usually present at both the P and N terminals and cancel each other, so their extraction may mostly be skipped for a differential buffer. One may also note that the IT extraction of composite current is “synchronized” with the VT extraction of the rising/falling waveforms, so these current data are extracted with additional “probes” rather than separate simulations.
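Regarding the C_Comp estimates mentioned in the list above, here is a rough sketch of the two first-order formulas (the exact procedure a given tool uses may differ); both are textbook approximations, not a prescribed recipe.

    #include <cmath>
    #include <complex>

    // Time-domain estimate: pad charged through a known series resistance R;
    // tau is the time to reach ~63.2% of the final value, so C = tau / R.
    double ccompFromTimeConstant(double tauSeconds, double rSeriesOhms)
    {
        return tauSeconds / rSeriesOhms;
    }

    // Frequency-domain estimate: with a small AC voltage of magnitude |V|
    // applied at the pad (output disabled), the capacitive part of the
    // measured current is its imaginary component, so
    // C = Im(I) / (2 * pi * f * |V|).
    double ccompFromAcCurrent(std::complex<double> padCurrent,
                              double freqHz, double acMagnitude)
    {
        const double pi = 3.14159265358979323846;
        return std::imag(padCurrent) / (2.0 * pi * freqHz * acMagnitude);
    }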

Full IBIS modeling flow:

The process suggested in IBIS’s cookbook can be summarized as the following steps. They are also implemented in our “Full IBIS modeling flow” within SPIBPro:

  • 0, Collect design data and collateral: A modeler needs to gather PVT (process, voltage, temperature) data, silicon design, buffer terminals’ definitions and bias conditions etc. A buffer may have several tuning “legs” and bit-set settings so a modeler needs to determine which will be used for TYP, MIN and MAX corners.
  • 1, Prepare working space: Create a working space on the disk.
  • 2, Generate simulation inputs: Generate simulation “decks” to excite different blocks of the buffer… one at a time. So one will have eighteen or more decks at the end of this stage waiting to be simulated.
  • 3, Perform simulations: Perform the simulations either sequentially on a local machine or with a simulation “farm”. Double check the results and make sure they make sense; otherwise, go back to step 0 to see which settings may be incorrect or missing.
  • 4, Generate IBIS model: Post-process the simulation data and generate the IBIS model. This is usually done by a tool like ours, as the manual process is tedious and error prone.
  • 5, Syntax check: The first quality check of an IBIS model is that it must pass the golden checker. The check here is mostly syntax-wise, though there are also basic behavioral checks such as monotonicity or DC mismatch.
  • 6, Validate IBIS model: A formal validation of an IBIS model is to hook up the test loads and make sure the model produces results that correlate with those from the silicon simulations at the end of step 3 above.
  • 7, Performance report: The modeler needs to extract performance metrics such as PU/PD impedance values and slew rate for documentation purposes and to check against the spec or data sheet.

Full step-by-step modeling flow in SPIBPro

Data extraction for a single-ended buffer:

For a single-ended buffer, the first hurdle in the modeling process is to make sure each block is excited properly and the simulation results make sense. As mentioned, there are at least eighteen simulations that need to be done:

There are also some complications regarding the DC simulation part: some buffers may have “clocking” and it’s not easy to separate it from the buffer itself. Also, there may be many RC parasitics between nodes for a buffer netlist extracted from post-layout. In other cases one can’t even separate the actual IO part from the pre-driving portions, and the resulting circuits to be simulated become huge and time consuming. These situations will make IV data extraction slow and often problematic. As a result, a simple step 0~7 modeling process may not work properly, and one needs to iterate to tune the set-up such that the simulation will always converge and the resulting IV curve is monotonic. Nevertheless, the single-ended buffer’s modeling is easier to manage.

Data extraction for a differential buffer:

A differential buffer’s IBIS modeling extends the challenge and effort to another dimension… literally! First of all, each pin in an IBIS file or component connects to an IBIS model, and the possible structures and connections between different pins are very limited. So for a differential buffer, a series element needs to be created to describe the coupling relations between pins. All the pictures used in this paragraph are from the IBIS cookbook, and readers may find further descriptions there.

In order to construct such a series model, the IV sweep needs to be performed in two dimensions, both at similar resolutions. So if, say, a typical single-ended IV curve has one hundred points, then the second dimension should also have that much data. That means for one particular corner, there will be one hundred IV simulations in order to construct the 2D response surface shown below. First-stage post-processing also needs to be performed so that the common-mode current can be eliminated. All these need to be done before formulating a 2D data view. Only after one can visualize the resulting data can he or she determine what components are needed to create such a series model. This presents the first challenge on top of the IV simulation issues mentioned for the single-ended buffer.
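To make the bookkeeping concrete, here is a minimal sketch of how such a two-dimensional sweep grid might be organized before post-processing; the voltage range, step count, and the simple half-difference split into differential and common-mode currents are illustrative assumptions, not the cookbook’s prescribed recipe.

    #include <vector>

    struct SweepPoint {
        double vp, vn;     // voltages forced at the P and N pads
        double ip, in;     // currents measured at each pad (filled by simulation)
    };

    // Build the 2D grid of (Vp, Vn) bias points; each point is one DC solution.
    std::vector<SweepPoint> buildIvGrid(double vMin, double vMax, int nSteps)
    {
        std::vector<SweepPoint> grid;
        const double dv = (vMax - vMin) / (nSteps - 1);
        for (int i = 0; i < nSteps; ++i)
            for (int j = 0; j < nSteps; ++j)
                grid.push_back({vMin + i * dv, vMin + j * dv, 0.0, 0.0});
        return grid;   // e.g. 100 x 100 = 10,000 bias points, typically run as
                       // 100 swept DC simulations per corner
    }

    // Illustrative post-processing: split each point's pad currents into a
    // differential part (candidate series-element current) and a common-mode
    // part to be removed, assuming a simple half-difference decomposition.
    double differentialCurrent(const SweepPoint& p) { return 0.5 * (p.ip - p.in); }
    double commonModeCurrent (const SweepPoint& p) { return 0.5 * (p.ip + p.in); }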

The second challenge regards the VT simulation. The current flowing through this newly constructed series element needs to be “eliminated” to avoid being double counted. For spice-like simulators, there is no such thing as “negative resistance”, “negative capacitance” etc., so one has to resort to approaches like controlled elements or even Verilog-A (as we presented at IBIS Summit 2016) to have proper VT data extracted. A controlled-source based approach is limited to describing pin couplings of a simple R/C, but not a non-linear resistance or surface such as a series MOSFET. For that, an intermediate step to map device or equation parameters to the calculated 2D surface is needed. Even using Verilog-A’s look-up table, the grid resolution is limited by the step size used in the first two-dimensional IV sweep, and there may be non-convergence issues if it’s too coarse. That’s why the cookbook (the first two lines in the picture below) doesn’t suggest any approach: it’s really not that easy!

Due to these two great challenges, we have found that differential modeling may not be easy for most modelers. We feel this even more when providing modeling services to clients who want to perform the simulations themselves and then send us data. They may want to do so due to IP concerns or because they know the design better. In those cases, the back-and-forth tuning and tweaking process becomes a burden on their side and also delays the whole schedule. Thus we were motivated to find an alternative “quick-and-easy” approach to substitute for the “formal” modeling steps mentioned above. While accurate simulation with great performance remains the number one priority, we are ok with models that can only be used in certain contexts (such as channel simulation).

Quick and easy approach:

In a previous post, we explained how an IBIS model’s data are used in circuit simulation. Simply speaking, the “VT” data is considered the “target” while the “IV” tables are used to compute so-called “switching coefficients” so that the appropriate amount of current will be injected into or withdrawn from the buffer pad. When this is done, the nodal voltage specified by the VT table at that particular time point will be satisfied per KCL/KVL. Since there are switching coefficients for both the pull-up and pull-down structures, it takes two equations to solve these two unknowns. That’s why two sets of VT data, each under a different test load, are required. Based on this algorithm, the IV data and the calculated coefficients are actually “coupled” and affect each other. If the current in an IV table is larger, then the calculated coefficients will become smaller, and vice versa. This way the overall injected/withdrawn current will still meet the KCL/KVL required by VT. In this sense, the actual IV data is not that important, as it will always be “adjusted” or “weighted” by these coefficients.
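As a concrete (and deliberately simplified) illustration of this two-equation, two-unknown relationship, the sketch below solves for the pull-up/pull-down switching coefficients Ku(t) and Kd(t) at one time point from the two VT waveforms and the corresponding IV-table currents; clamp and C_Comp terms, and the IBIS sign conventions, are omitted here, so this is not the full algorithm any particular simulator uses.

    #include <array>
    #include <cmath>
    #include <stdexcept>

    // Solve, at a single time point, the 2x2 system:
    //   Ipu1*Ku + Ipd1*Kd = Iload1   (waveform under test load 1)
    //   Ipu2*Ku + Ipd2*Kd = Iload2   (waveform under test load 2)
    // where Ipu/Ipd are the IV-table currents looked up at that instant's pad
    // voltage and Iload is the current demanded by the test load.
    std::array<double, 2> solveSwitchingCoeffs(double ipu1, double ipd1, double iload1,
                                               double ipu2, double ipd2, double iload2)
    {
        const double det = ipu1 * ipd2 - ipu2 * ipd1;
        if (std::abs(det) < 1e-18)
            throw std::runtime_error("degenerate system at this time point");
        const double ku = (iload1 * ipd2 - iload2 * ipd1) / det;
        const double kd = (ipu1 * iload2 - ipu2 * iload1) / det;
        return {ku, kd};   // Ku(t), Kd(t)
    }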

On the other hand, the VT data also contains several DC points, and they need to correlate with the IV tables, otherwise DC mismatch errors will be thrown by the golden checker. In addition, the IV data is limited to 100 points and needs to be monotonic to avoid convergence issues. So if we have several sets of VT data, including one under a normal test load (say 100 ohms for a differential buffer), they will give us “hints” regarding what the IV data should look like.

With this assumption, we propose the following quick-N-easy modeling steps:

  • Connect the silicon buffer to nominal loading conditions and obtain VT simulation data
    • For single-ended, these are simple VT waveforms under two different test loads;
    • For differential, say use the nominal 100 ohms first and observe the voltage range between V1 and V2
      • Let V3 = (V1 + V2) / 2, use VFixture = V3 and RFixture = say 40 & 60 respectively to obtain two waveforms;
      • Alternatively, use RFixture = 50 and VFixture = say (V1 + V3) / 2, (V2 + V3) / 2 respectively to obtain two waveforms;
      • The main goal is to have two set-ups covering the operating range when a nominal test load (say 100 ohms) is used (see the short numeric sketch after this list).
  • Obtain C_Comp values from the buffer IC designer
  • Obtain voltage range, temperature and other parameters.
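As a tiny numeric illustration of the fixture choices above (the swing values here are made-up numbers and the resistor values are just the examples given in the list, not required settings):

    #include <cstdio>

    int main()
    {
        // Suppose the nominal 100-ohm differential load showed the pad swinging
        // between V1 and V2 (illustrative numbers only).
        const double v1 = 0.2, v2 = 1.0;
        const double v3 = 0.5 * (v1 + v2);          // mid-swing reference

        // Option 1: keep VFixture at V3 and vary RFixture (e.g. 40 and 60 ohms).
        std::printf("Option 1: VFixture=%.3f, RFixture=40 and 60 ohms\n", v3);

        // Option 2: keep RFixture at 50 ohms and vary VFixture.
        std::printf("Option 2: RFixture=50, VFixture=%.3f and %.3f\n",
                    0.5 * (v1 + v3), 0.5 * (v2 + v3));
        return 0;
    }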

And that’s all. Through carefully implemented algorithms and computation, we can generate an IBIS model based on these data with minimal simulation requirements. And the generated model is guaranteed to be error/warning free.

While we will not disclose how these are actually done in detail, we can show how they are incorporated in our SPIBPro… as shown below. As a matter of fact, this process has been used in the modeling projects of the past year and has shown great success.

Only two VT simulation waveforms are required to create an IBIS model

Pros and cons:

We use this approach to create differential IBIS models for channel analysis purposes (together with AMI) and have not yet found any problems. Having said that, I would offer several pros and cons for readers’ consideration:

Pros:

  • Minimal simulation required and easy to perform;
  • Will be mathematically correct: no DC mismatch or monotonicity warnings; the output will match the provided VT waveforms under the nominal test load.

Cons:

  • May not be accurate if the model is used for a DC sweep, as the IV data in the model are artificially generated;
  • No “disable” or High-Z state, as clamp currents (if there are any) have been incorporated into the IV data without separation;
  • No power-aware consideration, as ISSO_PU/ISSO_PD generation is not taken into account.

Summary:

In this blog post, we reviewed the formal IBIS modeling process described in the cookbook and the challenges modelers will face, and proposed an alternative “quick-and-easy” approach to address these issues. The proposed flow uses minimal simulation data while maintaining great accuracy. Models generated this way have limitations, such as neither the disable state nor power-aware data being accounted for. However, in the context of channel analysis, particularly when a differential model is used together with its IBIS-AMI model, we have found great success with this flow. We have also incorporated this algorithm into our SPIBPro so our tool users can benefit from this efficient yet effective flow.

IBIS-AMI: Something about CTLE

Overview:

A continuous time linear equalizer, or CTLE for short, is commonly used in modern communication channels. In a system where lossy channels are present, a CTLE can often recover signal quality for the receiver or downstream continued signaling. There have been many articles online discussing how a CTLE works theoretically. More thorough technical details are certainly also available in college/graduate level communication and IC design textbooks. In this blog post, I would like to focus more on its IBIS-AMI modeling aspects from a practical point of view. While not all the secret sauce will be revealed here :-), hopefully the points mentioned will give readers a good starting point in implementing or determining their CTLE/AMI modeling methodologies.

[Credit:] Some of the pictures used in this post are from Professor Sam Palermo’s course webpage linked below. He was also my colleague at Intel previously (we did not know each other then, though).

ECEN 720: High-Speed Links Circuits and Systems

 

What and why CTLE:

The picture above shows two common SERDES channel setups. While the one at the top has a direct connection between Tx and Rx, the bottom one has a “repeater” to cascade upstream and downstream channels together. This “cascading” can be repeated more than once, so there may be more than two channels involved. A CTLE may sit inside the Rx of both setups or inside the middle “ReDriver” of the bottom one. In either case, the S-parameter block represents a generalized channel. It may contain passive elements such as the package, transmission lines, vias or connectors. A characteristic of such a channel is that it presents different losses across the spectrum, i.e., dispersion.

For example, if we plot these channels’ differential-input to differential-output responses, we may see their frequency domain (FD) losses as shown above.

Digital signals being transmitted are more or less sequences of bit/square pulses. We know that very rich frequency components are generated by their sharp rising/falling transition edges. Ideally, a communication channel propagating these signals should behave like a (unity-gain) all-pass filter. That is, the various frequency components of the signal should all be treated equally, otherwise distortion will occur. Such an ideal response is indicated by the green box below:

In reality, such an all-pass filter rarely exists. In order to compensate our lossy channels (as indicated by the red box) so the end result is more like the ideal case (green box), we need to apply equalization (indicated by the blue box). This is why an equalizer is often used… basically it provides a transfer function/frequency response to compensate the lossy channel in order to recover the signal quality. A point worth taking here is that the blue box and the red box are “tied” together, so using the same equalizer for channels of different losses may cause under- or over-compensation. That is, an equalizer is related to the channel being compensated. Another point is that a CTLE is just one subset of such linear equalizers.

CTLE is a subset of linear equalizer:

A linear equalizer can be implemented in many different ways. For example, a feed-forward equalizer is often used on the Tx side and within a DFE:

An FFE’s behavior is relatively easy to predict and its AMI implementation is also quite straightforward. For example, a single pre-tap or post-tap FFE response can be easily visualized and predicted:
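For instance, below is a minimal sketch of a normalized FIR-style FFE with one pre-tap, a main tap, and one post-tap applied to a symbol sequence; the normalization convention and tap placement are illustrative, not taken from any particular device.

    #include <cmath>
    #include <vector>

    // Apply a simple 3-tap FFE (pre, main, post) to a sequence of symbols.
    // Taps are normalized so their absolute values sum to 1, keeping the
    // peak output bounded (a common, but not universal, convention).
    std::vector<double> applyFfe(const std::vector<double>& symbols,
                                 double pre, double mainTap, double post)
    {
        const double norm = std::fabs(pre) + std::fabs(mainTap) + std::fabs(post);
        std::vector<double> out(symbols.size(), 0.0);
        for (size_t i = 0; i < symbols.size(); ++i) {
            double acc = mainTap * symbols[i];
            if (i + 1 < symbols.size()) acc += pre  * symbols[i + 1]; // pre-cursor tap
            if (i >= 1)                 acc += post * symbols[i - 1]; // post-cursor tap
            out[i] = acc / norm;
        }
        return out;
    }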

Now, a CTLE is a more “generalized” linear equalizer, so its behavior is usually represented in terms of frequency responses. Thus, to accommodate/compensate channels of different losses, we will have different FD responses for CTLE:

Now that IBIS-AMI modeling for CTLE is the concern, how do we obtain such modeling data for a CTLE and how should it be modeled?

Different types of CTLE modeling data:

While a CTLE’s behavior can be easily understood in the frequency domain, for IBIS-AMI or channel analysis it eventually needs to come back to the time domain (TD) to be convolved with the inputs. This is because both the statistical and bit-by-bit modes of link analysis operate in the time domain. Thus we have several choices: provide the model with FD data and have it converted to TD inside the AMI model, or simply provide the TD response directly to the model. The benefit of the first approach is that the model can perform the iFFT based on the analysis’ bit-rate and sampling-rate settings. The advantage of the latter is that the provided TD data can be checked for good quality beforehand, and the model does not need to do a similar iFFT every time a simulation starts. Of course, the best implementation, i.e. SPISim’s approach, is to support both modes for best flexibility and expandability 🙂

  • Frequency domain data:

Depending on the availability of the original EQ design, there are several possibilities for FD data: synthesize it with poles and zeros, extract it from S-parameters, or run an AC simulation to extract the response.

  1. Poles and Zeros: Given different numbers of poles and zeros, their locations, and the dc boost level, one can synthesize FD response curves. So say we are given a data sheet which specifies EQ levels at some key frequencies like below: one can then sweep different numbers and locations of poles and zeros to obtain matching curves that meet the spec. Such synthesized curves are well behaved in terms of passivity and causality etc., and can be extended to cover the desired frequency bandwidth (a small synthesis sketch follows this list).
  2. Extract from S-parameters: Another way to obtain the frequency response is from the EQ circuit’s existing S-parameter. This provides the best correlation scenario for the generated AMI model because the raw data can serve as a design target. However, there are many intermediate steps one has to perform first. For example, the given S-parameter may be single ended and only cover a limited frequency range (due to limitations of the VNA being used), so if a tool like our SPISim’s SPro is used, one then needs to: reorder the ports (from even-odd ordering, i.e. 1-3, 2-4 etc., to sequential ordering, i.e. 1, 2 -> 3, 4), convert to differential/mixed mode, then extrapolate toward dc and high frequencies (many algorithms can be used and such extrapolation must also abide by physics), and finally extract only the related differential-input to differential-output portion of the data.
  3. AC simulation: This assumes the original design is available for AC simulation. Such raw data still needs to be sanity checked in terms of loss and phase change. For example, if the gain is not flat toward the DC and high-frequency ranges, then extra fixing may be needed, otherwise the iFFT results will be spurious.
  • Time domain data: the time domain response can be obtained from the aforementioned FD data by doing an iFFT directly, as shown below. It may also be obtained by simulating the original EQ circuit in the time domain. However, there are still several considerations:
  1. How to do the iFFT: padding with zeros or conjugate mirroring is usually needed to obtain a real-valued result. If the original FD data is not “clean” in terms of causality, passivity or asymptotic behavior, then it needs to be fixed first (the sketch after this list shows one simple way to enforce conjugate symmetry).
  2. TD simulation: Is simulating an impulse response possible? If not, maybe a step response should be simulated instead. Then what time step or ramp speed should be used for the input stimulus? Note that during IBIS-AMI link analysis, the time step used there may be different from the one used here, so the data must be scaled accordingly. Once a step response is available, successive differentiation will produce the impulse response with proper scaling.
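To tie the pole/zero synthesis and the iFFT considerations above together, here is a minimal numeric sketch (with illustrative pole/zero values and grid choices, not from any real design): it builds a one-zero/two-pole CTLE-style transfer function on a frequency grid, mirrors it into a conjugate-symmetric spectrum, and recovers a real impulse response with a naive inverse DFT. The O(N^2) inverse transform is fine for a sketch, but a real model would use an FFT.

    #include <cmath>
    #include <complex>
    #include <vector>

    using cplx = std::complex<double>;
    static const double PI = 3.14159265358979323846;

    // One-zero / two-pole CTLE-style response:
    // H(f) = Adc * (1 + jw/wz) / ((1 + jw/wp1)(1 + jw/wp2)), w = 2*pi*f.
    cplx ctleResponse(double freqHz, double adc, double fz, double fp1, double fp2)
    {
        const cplx jw(0.0, 2.0 * PI * freqHz);
        return adc * (1.0 + jw / (2.0 * PI * fz))
                   / ((1.0 + jw / (2.0 * PI * fp1)) * (1.0 + jw / (2.0 * PI * fp2)));
    }

    // Build a conjugate-symmetric spectrum on n points (n even) and recover a
    // real impulse response via a naive inverse DFT.  Time step dt = 1/(n*df).
    std::vector<double> impulseFromFd(int n, double df,
                                      double adc, double fz, double fp1, double fp2)
    {
        std::vector<cplx> spec(n);
        for (int k = 0; k <= n / 2; ++k)
            spec[k] = ctleResponse(k * df, adc, fz, fp1, fp2);
        spec[n / 2] = cplx(spec[n / 2].real(), 0.0);   // force Nyquist bin real
        for (int k = n / 2 + 1; k < n; ++k)
            spec[k] = std::conj(spec[n - k]);          // enforce a real time signal

        std::vector<double> h(n, 0.0);
        for (int t = 0; t < n; ++t) {
            cplx acc(0.0, 0.0);
            for (int k = 0; k < n; ++k)
                acc += spec[k] * std::exp(cplx(0.0, 2.0 * PI * k * t / n));
            h[t] = acc.real() * df;                    // approximates the continuous iFT
        }
        return h;                                      // sampled at dt = 1/(n*df)
    }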

How to implement CTLE AMI model:

Now that we have data to model, how it will be implemented in C/C++ code to support the AMI API for link analysis is another level of consideration.
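For reference, these are the entry points a link simulator looks for in the compiled .dll/.so. The prototypes below are paraphrased from the IBIS specification (the spec itself remains the authoritative source for the exact declarations); parameter names are descriptive only, and C linkage is used so the symbols export un-mangled when built as C++.

    // AMI API entry points exported by the model's .dll/.so.
    extern "C" {

    long AMI_Init(double* impulse_matrix, long row_size, long aggressors,
                  double sample_interval, double bit_time,
                  char* AMI_parameters_in, char** AMI_parameters_out,
                  void** AMI_memory_handle, char** msg);

    long AMI_GetWave(double* wave, long wave_size, double* clock_times,
                     char** AMI_parameters_out, void* AMI_memory);

    long AMI_Close(void* AMI_memory);

    } // extern "C"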

  • Decision mechanism: As mentioned previously, a CTLE FD response targets a channel of a certain loss, thus the appropriate CTLE setting for the particular channel at hand must be decided either by the user or by the model itself. While the former (user decision) does not need further explanation, the latter case (model decision, i.e. being adaptive) is a whole different topic and often vendor specific.

Typically, such an adaptive mechanism has the CTLE curves pre-sorted in terms of strength or EQ level, and a figure-of-merit (FOM) extracted from the equalized signal. That is, apply a tentative CTLE to the received data, then calculate the FOM. Then increase or decrease the EQ level by using adjacent CTLE curves and see whether the FOM improves. Continue doing so until either the selected CTLE “ID” settles or the range bounds are reached. This process may be performed across many cycles until it is “stabilized” or “locked”. Thus, the model may need to go through a training period first to determine the best CTLE to use during subsequent link analysis.
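A minimal sketch of such a training loop is shown below; the figure-of-merit here is a placeholder supplied by the caller (e.g., some eye-opening estimate), and real adaptive schemes are typically more elaborate and vendor specific.

    #include <functional>
    #include <initializer_list>

    // Hill-climb over a bank of pre-sorted CTLE settings (weakest to strongest).
    // 'evaluateFom' applies the candidate CTLE to the received data and returns
    // a figure-of-merit (larger is better); its definition is up to the model.
    int adaptCtleIndex(int numSettings, int startIndex,
                       const std::function<double(int)>& evaluateFom)
    {
        int best = startIndex;
        double bestFom = evaluateFom(best);
        bool improved = true;
        while (improved) {
            improved = false;
            for (int step : {-1, +1}) {                    // try adjacent curves
                const int cand = best + step;
                if (cand < 0 || cand >= numSettings) continue;
                const double fom = evaluateFom(cand);
                if (fom > bestFom) { bestFom = fom; best = cand; improved = true; }
            }
        }
        return best;   // the "locked" CTLE ID used for subsequent analysis
    }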

  • EQ configurations:

So now you have a bunch of settings or data like those shown below; how should you architect the model so that it can be extended in the future with revised CTLE responses, or allow the user to perform corner selections (which essentially adds another dimension)?

This is now more in the software architecture domain and needs some trade-off considerations. For example, you may want to provide fine-grid, full-spectrum FD/TD responses, but the data may become too big, so internal re-sampling may be needed. For FD data, the model may need to resample to 2^N points for an efficient iFFT. Different corner/parameter selections should not be hard coded in the model, because a future revised model’s parameters may be different. For externally sourced data, encryption is usually needed to protect the modeling IP. With proper planning, one may reuse the same CTLE module in many different designs without customization on a case-by-case basis.

  • Correlations:

Finally, it’s time to correlate the created CTLE AMI model against the original EQ design or its behavioral model. Done properly, you should see signals being “recovered” from left to right below:

However, getting results like this on the first try may be wishful thinking. In particular, the IBIS-AMI model does not work alone… it needs to work together with an associated IBIS model (the analog front-end) in most link simulators, so that IBIS model’s parasitics and loading will all affect the result. Besides, the high-impedance assumption of the AMI model also means proper termination matching is needed before one can drop it in as a direct replacement for the existing EQ circuit or behavioral model for correlation.

Summary:

At this point, you may realize that while a CTLE can be easily understood from its theoretical behavior perspective, its implementation to meet IBIS-AMI demands is a different story. We have seen CTLE models made by another vendor that were not expandable at all, such that the client needed to scrap the existing ones for a minor revision of CTLE behavior/settings (also because that particular model maker charged too much, of course). It’s our hope that the learning experiences mentioned in this post will provide some guidance or considerations when you decide to dive deep into developing your own CTLE IBIS-AMI model, or maybe just delegate such tasks to professional model makers like us 🙂

A channel analysis trilogy

Preface:

We at SPISim recently developed and released free web apps for various channel analysis tasks [Click HERE for overview]. While their product pages each give good descriptions and demos of what each tool can do individually, we think it will also be beneficial to put them all together in one flow so that users may have a better picture of the ideas behind these developments. Thus this post is written for dual purposes: first, we would like to explain how a channel analysis is usually performed; second, we want to show how one can perform such a process using apps directly from the web browser. Just for comparison, creating AMI models and licensing this type of simulator usually costs tens of thousands of dollars up front. Now it can be done for free!

The big picture:

[Fig. 1, A SERDES system]

Let’s use a SERDES system, shown in Fig. 1 above, as an example. For other interfaces such as DDR, please see our previous blog post for considerations about similar applications.

A SERDES system uses a point-to-point topology. In Fig. 1, the middle block enclosed in blue represents the channel. It mainly contains passive interconnect such as packages, transmission lines, vias, connectors or even cables. However, the channel may also include active components such as Tx and Rx devices. These active components are usually represented by IBIS or spice models. Alternatively, we can pull these active components out of the channel and “merge” them into the EQ as “analog front end” stages. In that case, the channel becomes purely passive. One reason to do it this way may be either that the IBIS models are not ready, or that there is no simulator available to include them as part of the simulation. This is usually the case when a free spice simulator is used, as none of them that I am aware of supports IBIS out of the box. In general, unless analog-front-end AMI models and a purely passive channel, represented as an S-parameter, are used together for the analysis, the active devices do need to be part of the channel characterization in order to obtain an accurate time domain response.

The next step is to obtain or generate AMI models for the Tx and Rx EQ circuits. Interested readers may refer to the several posts we have written about possible software architectures and methodologies.

Regarding the channel analysis process, a simulator will first convert the characterized channel (an S-parameter or a step response) into an impulse response; then, depending on the simulation mode specified, this response or a convolved bit sequence will be fed into the Tx and Rx models respectively to obtain the overall response. For an LTI system, a permutation process is performed over all possible bit combinations to calculate the PDF at each sampling point statistically, which is then integrated to obtain the CDF and thus create a bathtub plot. For an NLTV system, bit-by-bit waveforms are accumulated and overlapped to plot the eye, and the BER may be extrapolated from there. While not discussed here and also not yet implemented, various jitter and noise components are also important when creating the stimulus or calculating final results.
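Below is a minimal numeric sketch of two of the core operations just described: differentiating a sampled step response into an impulse response, and convolving that impulse with a bit sequence to obtain the time-domain waveform. Statistical PDF/CDF accumulation is left out, and the scaling assumes a uniform time step.

    #include <vector>

    // Differentiate a sampled step response (uniform time step dt) to get the
    // impulse response: h[i] ~= (step[i] - step[i-1]) / dt.
    std::vector<double> stepToImpulse(const std::vector<double>& step, double dt)
    {
        std::vector<double> h(step.size(), 0.0);
        for (size_t i = 1; i < step.size(); ++i)
            h[i] = (step[i] - step[i - 1]) / dt;
        return h;
    }

    // Convolve the impulse response with a bit sequence held for 'samplesPerBit'
    // samples each, producing the overall time-domain waveform (scaled by dt so
    // the discrete convolution approximates the continuous one).
    std::vector<double> convolveBits(const std::vector<double>& h, double dt,
                                     const std::vector<int>& bits, int samplesPerBit)
    {
        std::vector<double> x;                       // expand bits into samples
        for (int b : bits)
            x.insert(x.end(), samplesPerBit, b ? 1.0 : 0.0);

        std::vector<double> y(x.size() + h.size() - 1, 0.0);
        for (size_t i = 0; i < x.size(); ++i)
            for (size_t j = 0; j < h.size(); ++j)
                y[i + j] += x[i] * h[j] * dt;
        return y;
    }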

Following the analysis procedure detailed in Section 10.2 of the IBIS spec, and to summarize briefly, a channel analysis includes three tasks: channel characterization, EQ model creation, and putting them all together for channel simulation.

Task 1. Channel Characterization:

The first step of channel analysis is to characterize the channel, the blue block of Fig. 1, in order to obtain its time domain response. Even with the active Tx/Rx front end involved, the assumption here is that the characterized response is linear time invariant (LTI). With an LTI input, a channel simulator can either perform statistical analysis if the other EQ components are also LTI, or it can convolve the impulse with a PRBS bit sequence to generate a full time domain waveform and finally process it with the NLTV EQ components to get the final time domain waveform for further eye analysis.

If the passive channel is from post-layout, then the user will need to use a 3D extraction tool to obtain the interconnect’s S-parameter. The devil is in the details here, including making sure the S-parameter is of good quality: passive, causal, symmetric and asymptotic etc. Also, depending on the simulator, converting the single-ended S-parameter to a mixed-mode/differential one may be needed. On the other hand, for the pre-layout case, the user will first obtain or generate each component’s simulation model. Assuming what the user has here are Tx and Rx IBIS models, transmission lines and other R/L/C-based behavioral models for the package, vias and connectors, one can first use our [SPISim_IBIS web app] to convert those IBIS models into spice subcircuits compatible with free simulators:

Then head over to LTSpice, offered by Analog Devices, to download and install the free simulator.

Either create a schematic or a text netlist of the channel, then perform a transient simulation to get its step response. The output will be a raw file in compressed format which can be viewed either in place in LTSpice or with SPISim’s free SPILite:


Task 2. Tx/Rx EQ Modeling:

If no EQ circuit is involved, then the user can simply tinker with the circuit/simulation settings completed in task 1 to perform conventional time-domain-based SI analysis. The reason stat-eye-like channel analysis comes into play is that EQ circuits (represented by the green block for Tx EQ and the orange block for Rx EQ in Fig. 1) are involved to open the eyes, and many more bits need to be “simulated” to obtain/extrapolate a low BER; this can’t be done easily using a nodal-based spice simulator, as it will simply take too much time. EQ circuits can come as different models such as behavioral, spice or AMI, with AMI being the common denominator supported by almost all channel analysis tools from different vendors. Thus in the second task, we need to generate IBIS-AMI models for the Tx and Rx EQ.

IBIS-AMI modeling usually involves C/C++ coding and compilation into a .dll/.so if starting from scratch. However, the user may use the [SPISim_AMI web app] to generate AMI models instantly without going through those steps:

Test drive the model and view its response in place to verify that the model’s parameters meet the performance needs:

Then click the “Generate” button and the AMI model will be generated instantly. If cross-platform models are desired, use [SPILite] instead and Windows 64, Windows 32 and Linux 64 models will all be generated in one shot.

At the beginning of this section, we mentioned that the EQ model can also be in the form of a spice circuit, whether encrypted or not. In that case, its detailed behavior cannot be described exactly by template-based or pre-defined models. However, a spice-wrapper AMI model supported by SPILite can still be used; it makes the spice model “AMI compatible” so it can be used in other channel simulators. The user’s licensed/installed simulator will be called during the channel simulation.


Task 3. Channel analysis:

With both Tx/Rx EQ models and the channel response ready, we can then perform the StatEye-based channel analysis using the [SPISim_Link web app]:

Detailed usage of this tool is demonstrated in a video on its product page. Basically, the user can specify the generated AMI models in the “Tx EQ” and “Rx EQ” tabs respectively, then do the same for the step response waveform in the “Channel” tab. Both “statistical” and “bit-by-bit” modes are supported here, yet if an NLTV EQ such as a DFE is used as part of the receiver, then “statistical” mode cannot be performed. With this set-up ready, a “Simulate” click will show the results in place within seconds:

The bathtub curve representing the CDF is also ready for inspection/eye measurement:

Alternatively, tasks 1 and 2 are also directly supported within SPISim_LINK alone, so the user may choose to experiment with different settings and see their responses first before generating the corresponding AMI model. For example, a simple change of the post-tap value in Tx can be done in the UI:

Then the re-simulation will quickly show its effect in the resulting eye:

A system developer may obtain the corresponding AMI models from their IC vendors and follow the same process to give these models a try. For an IC vendor, the AMI models generated here will also be compatible with other vendors’ tools, and you may provide these models to your system clients before committing to a perpetual version of the model.

Here you have it… an economical yet efficient channel analysis, which can be done directly through the web, has been enabled for your design needs without any cost!