IBIS’s buffer overclocking

In our SPIBPro’s Q3’19 update, we added some new enhancements regarding buffer overclocking handling. While the technical details of these capabilities are reserved only for our customers, the general problem statements and solutions are, nevertheless, worth sharing. This short post will cover this buffer overclocking topic and also serve as a future reference for our customers.

Some of the pictures used here are from previous IBIS summit’s presentations. User may find references to these slides toward the end of this post.

What is buffer overclocking:

We all know that the frequency is equivalent to the 1.0 / period. When an IBIS model is used, the minimal/shortest time period is equivalent to the summation of complete rising and falling transition given by the VT data table:

This “shortest” possible time span. thus dictates the highest possible operating frequency. When an input stimulus exceeds this frequency, then this buffer is being overclocked. An overclocked buffer will not have sufficient time to complete either rising, falling or both transitions. As a results, before it’s pad can reach steady state (indicated by the green curve below), it has to make transition toward the other direction again. The outcome of this situation is that there may be discontinuities, glitches or non-convergence during the simulation.

IBIS spec. doesn’t spell out how a simulator should handle in this case. As a result, an overclocked buffer may behave differently across different EDA tools. That’s why in the IBIS cookbook, it’s suggested that a model maker should generate short enough VT duration table so that the buffer can operate fast enough to avoid being overclocked by end user during normal usage. For example, if a USB3 buffer has total VT duration longer than 200ps, then it will definitely be overclocked when user use a normal 5G bps PRBS signal to drive this buffer.

 

Why buffer overclocking is more problematic in a power-aware model:

To make a buffer’s VT table duration shorter, a common approach is to remove the steady state (flat portion) of the leading and trailing waveform. However, when a power-aware model is being considered, this will present some problems:

The picture above shows both the VT data (rising only here) and IT together. IT is the composite current for DDR DQ-like power-aware model. Since the extracted data of the current portion has pre-driver into account, it’s often the case that there will be “activities” (i.e., non-zero current) long before the voltage at pad starts transition. As a result, the amount of leading steady state can be removed in a power-aware model are often bounded by the IT table. And this limitation is usually very “severe” such that the resulting trimmed model still can’t meet the spec.-required operating frequency.

This picture above (from a summit paper) high-light this issue more clearly. It’s also worth mentioning that the “T” of VT and IT table are time synchronized. So one can’t treat these two table separately or shift x-axis time point at will.

Solution to the overclocking issue:

There are several approaches to address this overclocking issue:

  1. An model user may specify the amount of delay to be trimmed by a simulator, or
  2. A simulator may apply “windowing” automatically to find the amount of trimming values and apply before simulation, or
  3. A model maker can create a proper model using approaches to be mentioned in sections below.

The first approach is not only time consuming, but will also easily lead to usage error or non-convergence. The second one may or may not produce desired results. It’s certainly simulator dependent and a model maker/user may not have any control. Nevertheless, the third approach is the preferred one as only the model maker knows how the buffer behaves and exact amount of delay to be removed.

Caveats in model data windowing:

One common mistake of trimming data to avoid overclocking is that different VT tables or IT tables are processed individually. For example, if VT table simulated from one of the fixtures has longer delays than the others, a model maker may tends to remove different amount of delays from different VT tables. Even more severe problem happens when VT and IT are removed independently. All these handling are erroneous due to the following reasons:

  • A simulator usually needs to compute “switching coefficients” to apply to IT table data. Their values needs to be calculated from VT tables of different fixture set-ups. Thus trimming different amount from VT tables from different fixtures may cause singularity when solving for switching coefficients.
    • That means all [Rising waveform] tables should be trimmed with same value;
    • Similarly, all [Falling waveform] tables should be trimmed with same value.
  • A buffer’s rising and falling response may be used together in a differential buffer. So if their timing reference are inconsistent, then the P and N terminals will not make transitions at the same time. This may cause “ledges” in the middle of the voltage swings
    • That means [Rising waveform] and [Falling waveform] needs to be trimmed with same value
  • As mentioned earlier, IT and VT needs to be time synchronized so that gate modulation effect can be properly accounted for.
    • That means VT and IT needs to be trimmed with same value.

To summarize, a model maker needs to trim ALL time-dependent table data together with same amount. While trimming trailing time point are relatively easy (just remove those points will do). Trimming from the waveform’s beginning also requires shifting x-axis points to start from t=0. So this is usually better be done with a modeling tool or flow.

Handling of the overclocking:

  • [Initial Delay] keyword:

Due to the limitation of the pre-driver current showing activities much earlier than the pad’s voltage transitions, a simulator can’t do much if both are bundled together in a device model in a simulator. However, if the IT table can be pulled-out from the IBIS device simulation model and be handled as a time-dependent current source, then things will be much easier. That’s why in the IBIS V6.1, two new keywords are introduced for this purpose. They are V-T and I-T for [Initial Delay]:

Before using these keyword, one should carefully read the related BIRD document (referenced at the end) and the associated section in the IBIS spec.:

For example, it is the simulator’s responsibilities to remove these specified delay from the “raw” model data before simulation. So a model maker should NOT remove the delay themselves…. simply specify the delay values will do.

That is to say, the modeling process may become two “passes”. At first, once the un-trimmed model data are produced, a maker can use an inspection tool (such as our SPIBPro’s built-in specter) to plot all VTs tables together and measure the common minimal delay to remove. Do the same to all IT table as well.

In the second “pass”, one or both of these two different values are specified under the [Initial Delay] keyword to complete the modeling process.

  • Data trimming and auto-tuning:

For a power-aware model which has IT data, the solution using [Initial Keyword] is the best one. For a traditional, or only voltage related, buffer, a model maker has more choices.

For example, a model maker can start by trimming the trailing steady data. Increment the trimming value or also trim the leading portion until the frequency meets the demands.

This method will alleviate the restrictions of simulator’s support for the [Initial Delay] support… thus an older version of simulator will also work. At the meantime, a caveat worth noting is that the distortion of the model data depends on the amount of data being removed:

So if excessive amount of data is removed, the resulting model may not pass ibis checker (will give DC mismatch violation). In this case, further tuning is needed. Our SPIBPro does perform “AutoTune” automatically during trimming process so a trimmed buffer will almost always error/warning free.

Reference:

The overclocking issue has long been there since early 2000. I remember one of the projects at my earlier career was to address the non-convergence issue in Mentor’s simulator brought by user trimmed model. Interested user may find the following reference useful and worth reading:

IBIS’s package model

Recently I gave a training to some of our customers regarding package model handling in an IBIS file/model. Most of the treatments in this topic fall within IBIS spec., extraction and simulator tools used, rather than modeling tool like our modeling suite or SPIBPro. Thus I think it will be beneficial to organize the materials and share through this blog post, while serving as a future reference for our customers.

Package model:

An IBIS file may contain one or more IBIS models. Each of the IBIS models has its own IT or VT tables to describe the behavior of the Tx or Rx silicon design attached to that signal pin. The effect of the package model, on the other hand, is not included in those table data. Information about the package model is “linked” with the silicon portion of the buffer through either IBIS keywords within the same IBIS file, or as a separated file in different formats. It’s the simulator’s job to take both the separated Tx/Rx silicon behavior, as well as package model into account during circuit simulation.

In general, there are two ways to link the IBIS’s silicon behavior with the package model. Each also have two possible routes to achieve the same purpose:

Add package model info:

    1. Inside an IBIS file:
      • As a lumped model
      • As a distributed model
    2. Outside an IBIS file:
      • ICM (IBIS Interconnect Modeling)
      • General spice circuit elements

More details will be discussed below… Please note that these package models should be generated from extraction tool such as HFSS, Q3d etc. The IBIS model or file mentioned here simply serves as a place holder. Circuit simulator will use all these data together during simulation.

 

Package as a lumped model inside an IBIS model:

From the IBIS spec, we can see two related keywords for this lumped model representations: [Package] and  [Pin]

To be more specific, please see this example model:

These values represent the parasitic structures as shown below:

This is the simplest, yet least accurate, method to describe the package information. While the [Package] keyword and data is required, parasitic portion of the [Pin] keyword is optional. When both are present, those in the [Pin] section will superseded (i.e. override) values defined in the [Package] section. Thus, a model maker can use [Pin] keyword to introduce pin specific lumped parasitics while using [Package] keyword for general or default values.

 

Package as a distributed model inside an IBIS file:

A more comprehensive (distributed) package model may be introduced into an IBIS file and model using the [Package Model] and [Defined Package Model] keywords:

Using this method, the component declares the name of a package model it is using (thus the value of the [Package Model] is simply a model name string), and the detailed contents of this package model is defined as a separated section using top-level keyword, [Define Package Model] inside the same IBIS file (usually toward the end of the file, after all [Component] descriptions):

What’s in side the defined package model? Since it’s a “distributed” model and belongs to a component, terminals mapped to a component’s pins are there and so are frequency dependent R/L/G/C matrices etc. The dimension of the matrices is equivalent to the number of pins mapped. With that said, a S-parameter format is not supported here. To use a S-parameter as a package model, one must use either of the remaining two methods to be described below.

As mentioned earlier, an IBIS file or an IBIS model which has package model info. is just serving as a place holder. Package model’s effects or behaviors are not included in an IBIS model’s IV/VT/IT table data. Simulator has the responsibility to combine these two together… either when computing switching coefficients for lumped model version or “stamping” extra elements into nodal matrices for a distributed one. This also means that different EDA tool may have different methods, syntax or GUIs to introduce package model’s data:

 

Use ICM as a separated file for package model outside an IBIS file:

The ICM (IBIS interconnect modeling) spec. is a separated “top-level” spec. document similar to an IBIS spec. So to learn more about it, one has to read the through ICM spec. document

While it’s more or less similar to the [Defined Package Model] mentioned earlier in terms of R/L/G/C frequency dependent matrices etc, it has one unique strength in particular that it supports S-parameter:

So if your extraction tool can only produce S-paramter format, then ICM is the way to go. The table below lists the comparison of different interconnect model format: (PKG is the defined package model)

Use general circuit description for package model outside an IBIS file:

Last but not least… one may also introduce package model contents using a non IBIS/ICM specific format. For example, a model maker may provide a spice .subckt representing the package model (converted from frequency data using tool like broadband Spice or IDEM work etc, can have further hierarchy) or a simple S-parameter. The connectivity should be documented in the simulation guide or usage manual of the IBIS model. It’s the end user’s responsibility to put this as one of the stages in simulation topology. For example, Intel’s platform design guide use such method for an ICH package model:

There are pros and cons for this approach: a model user may have more work to have package effects introduced. On the other hand, this method has best compatibility… it does not require supports of a specific IBIS/ICM version in the simulator used, and it also allows probing or sweeping internals inside the package model for debugging or optimization purpose.

AI/ML: Use ML techniques for CTLE AMI modeling

Note:

  • Dataset and ipython notebook of this post is available at SPISim’s github page: [HERE]
  • This notebook may be better rendered by nbviewer and can be viewed  [HERE].

Use ML techniques for SERDES CTLE modeling for IBIS-AMI simulation

Table of contents:

1.Motivation ...

2.Problem Statements ...

3.Generate Data ...

4.Prepare Data ...

5.Choose a Model ...

6.Training ...

7.Prediction ...

8.Reverse Direction ...

9.Deployment ...

10.Conclusion ...

Motivation:

One of SPISim's main service is IBIS-AMI modeling and consulting. In most cases, this happens when IC companies is ready to release simulation model for their customers to do system design. We will then require data either from circuit simulation, lab measurements or data sheet spec. in order to create associated IBIS-AMI model. Occasionally, we also receive request to provide AMI model for architecture planning purpose. In this situation, there is no data or spec. In stead, the client asks to input performance parameters dynamically (not as preset) so that they can evaluate performance at the architecture level before committing to a certain spec. and design accordingly. In such case, we may need to generate data dynamically based on user's input before it will be fed into existing IBIS-AMI models of same kind. Continues linear equalizer, CTLE, is often used in modern SERDES or even DDR5 design. It is basically a filter in the frequency domain (FD) with various peaking and gain properties. As IBIS-AMI is simulated in the time domain (TD), the core implementations in the model is iFFT to convert into impulse response in TD. CtleFDTD

In order to create such CTLE AMI model from user provided spec. on the fly, we would like to avoid time-consuming parameter sweep (in order to match the performance) during runtime of initial set-up call. Thus machine learning techniques may be applied to help use build a prediction model to map input attributes to associated CTLE design parameters so that its FD and TD response can be generated directly. After that, we can feed the data into existing CTLE C/C++ code blocks for channel simulation.

Problem Statement:

We would like to build a prediction model such that when user provide a desired CTLE performance parameters, such as DC Gain, peaking frequency and peaking value, this model will map to corresponding CTLE design parameters such as pole and zero locations. Once this is mapped, CTLE frequency response will be generated followed by time-domain impulse response. The resulting IBIS-AMI CTLE model can be used for channel simulation for evaluating such CTLE block's impact in a system... before actual silicon design has been done.

Generate Data:

Overview

The model to be build is for nominal (i.e. numerical) prediction for about three to six attributes, depending on the CTLE structure, from four input parameters, namely dc gain, peak frequency, peak value and bandwidth. We will sample the input space randomly (as full combinatorial is impractical) then perform measurement programmingly in order generate enough dataset for modeling purpose.

Defing CTLE equations

We will target the following two most common CTLE structures:

IEEE 802.3bs 200/400Gbe Chip-to-module (C2M) CTLE:

C2M_CTLE

IEEE 802.3bj Channel Operating Margin (COM) CTLE:

COM_CTLE

Define sampling points and attributes

All these pole and zeros values are continuous (numerical), so a sub-sampling from full solution space will be performed. Once frequency response corresponding to each set of configuration is generated, we will measure is dc gain, peak frequency and value, bandwidth (3dB loss from the peak), and frequencies when 10% and 50% gain between dc and peak values happened. The last two attributes will help us increasing the attributes available when creating the prediction model.

Attributes to be extracted:

CTLEAttr

Synthesize and measure data

A python script (in order to be used with subsequent Q-learning w/ OpenAI later) has been written to synthesize these frequency response and perform measurement at the same time.

Quantize and Sampling:

Sampling

Synthesize:

Synthesize

Measurement:

Measurement

The end results after this data generation phase is a 100,000 points dataset for each of the two CTLE structures. We can now proceed for the prediction modeling.

Prepare Data:

In [42]:
## Using COM CTLE as an example below:

# Environment Setup:
import os
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

prjHome = 'C:/Temp/WinProj/CTLEMdl'
workDir = prjHome + '/wsp/'
srcFile = prjHome + '/dat/COM_CTLEData.csv'

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(workDir, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)
In [43]:
# Read Data
srcData = pd.read_csv(srcFile)
srcData.head()

# Info about the data
srcData.head()
srcData.info()
srcData.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 11 columns):
ID         100000 non-null int64
Gdc        100000 non-null float64
P1         100000 non-null float64
P2         100000 non-null float64
Z1         100000 non-null float64
Gain       100000 non-null float64
PeakF      100000 non-null float64
PeakVal    100000 non-null float64
BandW      100000 non-null float64
Freq10     100000 non-null float64
Freq50     100000 non-null float64
dtypes: float64(10), int64(1)
memory usage: 8.4 MB
Out[43]:
ID Gdc P1 P2 Z1 Gain PeakF PeakVal BandW Freq10 Freq50
count 100000.000000 100000.000000 1.000000e+05 1.000000e+05 1.000000e+05 100000.000000 1.000000e+05 100000.000000 1.000000e+05 1.000000e+05 1.000000e+05
mean 49999.500000 0.574911 1.724742e+10 5.235940e+09 5.241625e+09 0.574911 3.428616e+09 2.374620 1.496343e+10 4.145855e+08 1.136585e+09
std 28867.657797 0.230302 4.323718e+09 2.593138e+09 2.589769e+09 0.230302 4.468393e+09 3.346094 1.048535e+10 5.330043e+08 1.446972e+09
min 0.000000 0.200000 1.000000e+10 1.000000e+09 1.000000e+09 0.200000 0.000000e+00 0.200000 9.965473e+08 0.000000e+00 -1.000000e+00
25% 24999.750000 0.350000 1.350000e+10 3.000000e+09 3.000000e+09 0.350000 0.000000e+00 0.500000 4.770445e+09 0.000000e+00 -1.000000e+00
50% 49999.500000 0.600000 1.700000e+10 5.000000e+09 5.000000e+09 0.600000 0.000000e+00 0.800000 1.410597e+10 0.000000e+00 -1.000000e+00
75% 74999.250000 0.750000 2.100000e+10 7.500000e+09 7.500000e+09 0.750000 7.557558e+09 2.710536 2.386211e+10 8.974728e+08 2.510339e+09
max 99999.000000 0.950000 2.450000e+10 9.500000e+09 9.500000e+09 0.950000 1.516517e+10 16.731528 3.965752e+10 1.768803e+09 4.678211e+09

Seems full justified! Let's plot some distribution...

In [44]:
# Drop the ID column
mdlData = srcData.drop(columns=['ID'])

# plot distribution
mdlData.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
Saving figure attribute_histogram_plots

There are abnomal high peaks for Freq10, Freq50 and PeakF. We need to plot the FD data to see what's going on...

Error checking:

NoPeaking

Apparently, this is caused by CTLE without peaking. We can safely remove these data points as they will not be used in actual design.

In [45]:
# Drop those freq peak at the beginning (i.e. no peak)
mdlTemp = mdlData[(mdlData['PeakF'] > 100)]
mdlTemp.info()


# plot distribution again
mdlTemp.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots2")
plt.show()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41588 entries, 1 to 99997
Data columns (total 10 columns):
Gdc        41588 non-null float64
P1         41588 non-null float64
P2         41588 non-null float64
Z1         41588 non-null float64
Gain       41588 non-null float64
PeakF      41588 non-null float64
PeakVal    41588 non-null float64
BandW      41588 non-null float64
Freq10     41588 non-null float64
Freq50     41588 non-null float64
dtypes: float64(10)
memory usage: 3.5 MB
Saving figure attribute_histogram_plots2

Now the distribution seems good. We can proceed to separate variables (i.e. attributes) and targets

In [46]:
# take this as modeling data from this point
mdlData = mdlTemp

varList = ['Gdc', 'P1', 'P2', 'Z1']
tarList = ['Gain', 'PeakF', 'PeakVal']

varData = mdlData[varList]
tarData = mdlData[tarList]

Choose a Model:

We will use Keras for the modeling framework. While it will call Tensorflow on our machine in this case, the GPU is only used for training purpose. We will use (shallow) neural network for modeling as we want to implement the resulting models in our IBIS-AMI model's C++ codes.

In [47]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

numVars = len(varList)  # independent variables
numTars = len(tarList)  # output targets
nnetMdl = Sequential()
# input layer
nnetMdl.add(Dense(units=64, activation='relu', input_dim=numVars))

# hidden layers
nnetMdl.add(Dropout(0.3, noise_shape=None, seed=None))
nnetMdl.add(Dense(64, activation = "relu"))
nnetMdl.add(Dropout(0.2, noise_shape=None, seed=None))
          
# output layer
nnetMdl.add(Dense(units=numTars, activation='sigmoid'))
nnetMdl.compile(loss='mean_squared_error', optimizer='adam')

# Provide some info
#from keras.utils import plot_model
#plot_model(nnetMdl, to_file= workDir + 'model.png')
nnetMdl.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_10 (Dense)             (None, 64)                320       
_________________________________________________________________
dropout_7 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_11 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_8 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_12 (Dense)             (None, 3)                 195       
=================================================================
Total params: 4,675
Trainable params: 4,675
Non-trainable params: 0
_________________________________________________________________

Training:

We will do the 20% training/testing split for the modeling. Note that we need to scale the input attributes to be between 0~1 so that neuron's activation function can be used to differentiate and calculate weights. These scaler will be applied "inversely" when we predict the actual performance later on.

In [48]:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Prepare Training (tran) and Validation (test) dataset
varTran, varTest, tarTran, tarTest = train_test_split(varData, tarData, test_size=0.2)

# scale the data
from sklearn import preprocessing
varScal = preprocessing.MinMaxScaler()
varTran = varScal.fit_transform(varTran)
varTest = varScal.transform(varTest)

tarScal = preprocessing.MinMaxScaler()
tarTran = tarScal.fit_transform(tarTran)

Now we can do the model fit:

In [49]:
# model fit
hist = nnetMdl.fit(varTran, tarTran, epochs=100, batch_size=1000, validation_split=0.1)
tarTemp = nnetMdl.predict(varTest, batch_size=1000)
#predict = tarScal.inverse_transform(tarTemp)
#resRMSE = np.sqrt(mean_squared_error(tarTest, predict))
resRMSE = np.sqrt(mean_squared_error(tarScal.transform(tarTest), tarTemp))
resRMSE
Train on 29943 samples, validate on 3327 samples
Epoch 1/100
29943/29943 [==============================] - 0s 12us/step - loss: 0.0632 - val_loss: 0.0462
Epoch 2/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0394 - val_loss: 0.0218
Epoch 3/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0218 - val_loss: 0.0090
Epoch 4/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0134 - val_loss: 0.0046
Epoch 5/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0102 - val_loss: 0.0039
Epoch 6/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0090 - val_loss: 0.0036
Epoch 7/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0082 - val_loss: 0.0032
Epoch 8/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0075 - val_loss: 0.0030
Epoch 9/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0071 - val_loss: 0.0027
Epoch 10/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0067 - val_loss: 0.0025
Epoch 11/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0063 - val_loss: 0.0022
Epoch 12/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0059 - val_loss: 0.0020
Epoch 13/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0056 - val_loss: 0.0018
Epoch 14/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0053 - val_loss: 0.0017
Epoch 15/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0050 - val_loss: 0.0015
Epoch 16/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0048 - val_loss: 0.0014
Epoch 17/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0046 - val_loss: 0.0013
Epoch 18/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0045 - val_loss: 0.0012
Epoch 19/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0043 - val_loss: 0.0011
Epoch 20/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0042 - val_loss: 0.0011
Epoch 21/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0041 - val_loss: 9.9891e-04
Epoch 22/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0040 - val_loss: 9.5673e-04
Epoch 23/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0039 - val_loss: 9.1935e-04
Epoch 24/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0038 - val_loss: 8.7424e-04
Epoch 25/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0037 - val_loss: 8.3335e-04
Epoch 26/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0036 - val_loss: 8.0617e-04
Epoch 27/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0035 - val_loss: 7.7511e-04
Epoch 28/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0035 - val_loss: 7.6336e-04
Epoch 29/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0034 - val_loss: 7.4145e-04
Epoch 30/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0034 - val_loss: 7.1555e-04
Epoch 31/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0033 - val_loss: 6.8232e-04
Epoch 32/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0033 - val_loss: 6.8118e-04
Epoch 33/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0033 - val_loss: 6.5987e-04
Epoch 34/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0032 - val_loss: 6.5535e-04
Epoch 35/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0032 - val_loss: 6.4880e-04
Epoch 36/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0032 - val_loss: 6.2126e-04
Epoch 37/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0031 - val_loss: 6.1235e-04
Epoch 38/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0031 - val_loss: 6.0875e-04
Epoch 39/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.8204e-04
Epoch 40/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.8521e-04
Epoch 41/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.8456e-04
Epoch 42/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0030 - val_loss: 5.5742e-04
Epoch 43/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.5412e-04
Epoch 44/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.5415e-04
Epoch 45/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.3159e-04
Epoch 46/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0029 - val_loss: 5.2046e-04
Epoch 47/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0028 - val_loss: 5.1748e-04
Epoch 48/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0028 - val_loss: 5.1205e-04
Epoch 49/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0027 - val_loss: 5.0424e-04
Epoch 50/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0028 - val_loss: 4.9067e-04
Epoch 51/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0027 - val_loss: 4.7902e-04
Epoch 52/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0027 - val_loss: 4.7667e-04
Epoch 53/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0026 - val_loss: 4.6521e-04
Epoch 54/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0026 - val_loss: 4.6684e-04
Epoch 55/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0026 - val_loss: 4.7006e-04
Epoch 56/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0026 - val_loss: 4.5770e-04
Epoch 57/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0025 - val_loss: 4.3075e-04
Epoch 58/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0025 - val_loss: 4.3796e-04
Epoch 59/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0025 - val_loss: 4.3114e-04
Epoch 60/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 4.1051e-04
Epoch 61/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 4.0642e-04
Epoch 62/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 4.1214e-04
Epoch 63/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0025 - val_loss: 3.9472e-04
Epoch 64/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0024 - val_loss: 3.9697e-04
Epoch 65/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0024 - val_loss: 3.8548e-04
Epoch 66/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0024 - val_loss: 3.9030e-04
Epoch 67/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.7588e-04
Epoch 68/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0024 - val_loss: 3.6643e-04
Epoch 69/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.6973e-04
Epoch 70/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.6345e-04
Epoch 71/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.5743e-04
Epoch 72/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0023 - val_loss: 3.5294e-04
Epoch 73/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0023 - val_loss: 3.6533e-04
Epoch 74/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.5859e-04
Epoch 75/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0022 - val_loss: 3.3832e-04
Epoch 76/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0022 - val_loss: 3.5197e-04
Epoch 77/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.4445e-04
Epoch 78/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.3888e-04
Epoch 79/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.3597e-04
Epoch 80/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0022 - val_loss: 3.2317e-04
Epoch 81/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0021 - val_loss: 3.2205e-04
Epoch 82/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.4191e-04
Epoch 83/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.2288e-04
Epoch 84/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1419e-04
Epoch 85/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1307e-04
Epoch 86/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1795e-04
Epoch 87/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.1200e-04
Epoch 88/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.0641e-04
Epoch 89/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.2401e-04
Epoch 90/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0021 - val_loss: 3.0903e-04
Epoch 91/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.1448e-04
Epoch 92/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0788e-04
Epoch 93/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0349e-04
Epoch 94/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0098e-04
Epoch 95/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0020 - val_loss: 3.1119e-04
Epoch 96/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 3.0249e-04
Epoch 97/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 2.8934e-04
Epoch 98/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 2.9429e-04
Epoch 99/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0020 - val_loss: 2.8466e-04
Epoch 100/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0019 - val_loss: 3.0773e-04
Out[49]:
0.01786895428237113

Let's see how this neural network learns over different Epoch

In [50]:
# plot history
plt.plot(hist.history['loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

Looks quite reasonable. We can save the Keras model now together with scaler for later evaluation.

In [51]:
# save model and architecture to single file
nnetMdl.save(workDir + "COM_nnetMdl.h5")

# also save scaler
from sklearn.externals import joblib
joblib.dump(varScal, workDir + 'VarScaler.save') 
joblib.dump(tarScal, workDir + 'TarScaler.save') 
print("Saved model to disk")
Saved model to disk

Prediction:

Now let's use this model to make some prediction

In [52]:
# generate prediction
predict = tarScal.inverse_transform(tarTemp)
allData = np.concatenate([varTest, tarTest, predict], axis = 1)
allData.shape
headLst = [varList, tarList, tarList]
headStr = ''.join(str(e) + ',' for e in headLst)
np.savetxt(workDir + 'COMCtleIOP.csv', allData, delimiter=',', header=headStr)

Let's take some 50 points and see how the prediction work

In [53]:
# plot some data
begIndx = 100
endIndx = 150
indxAry = np.arange(0, len(varTest), 1)

plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,0][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,0][begIndx:endIndx])
Out[53]:
<matplotlib.collections.PathCollection at 0x242d059d390>
In [54]:
# Plot Peak Freq.
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,1][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,1][begIndx:endIndx])
Out[54]:
<matplotlib.collections.PathCollection at 0x242d72df2e8>
In [55]:
# Plot Peak Value
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,2][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,2][begIndx:endIndx])
Out[55]:
<matplotlib.collections.PathCollection at 0x242d5ea39e8>

Reverse Direction:

The goal of this modeling is to map performance to CTLE poles and zeros locations. What we just did is the other way around (to make sure such neural network's structure meets our need). Now we needs to reverse the direction for actual modeling. To provide more attributes for better predictions, we will also use frequencies where 10% and 50% gain happened as part of the input attributes.

In [56]:
tarList = ['Gdc', 'P1', 'P2', 'Z1']
varList = ['Gain', 'PeakF', 'PeakVal', 'Freq10', 'Freq50']

varData = mdlData[varList]
tarData = mdlData[tarList]
In [57]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

numVars = len(varList)  # independent variables
numTars = len(tarList)  # output targets
nnetMdl = Sequential()
# input layer
nnetMdl.add(Dense(units=64, activation='relu', input_dim=numVars))

# hidden layers
nnetMdl.add(Dropout(0.3, noise_shape=None, seed=None))
nnetMdl.add(Dense(64, activation = "relu"))
nnetMdl.add(Dropout(0.2, noise_shape=None, seed=None))
          
# output layer
nnetMdl.add(Dense(units=numTars, activation='sigmoid'))
nnetMdl.compile(loss='mean_squared_error', optimizer='adam')

# Provide some info
#from keras.utils import plot_model
#plot_model(nnetMdl, to_file= workDir + 'model.png')
nnetMdl.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_13 (Dense)             (None, 64)                384       
_________________________________________________________________
dropout_9 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_14 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_10 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_15 (Dense)             (None, 4)                 260       
=================================================================
Total params: 4,804
Trainable params: 4,804
Non-trainable params: 0
_________________________________________________________________
In [58]:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Prepare Training (tran) and Validation (test) dataset
varTran, varTest, tarTran, tarTest = train_test_split(varData, tarData, test_size=0.2)

# scale the data
from sklearn import preprocessing
varScal = preprocessing.MinMaxScaler()
varTran = varScal.fit_transform(varTran)
varTest = varScal.transform(varTest)

tarScal = preprocessing.MinMaxScaler()
tarTran = tarScal.fit_transform(tarTran)
In [59]:
# model fit
hist = nnetMdl.fit(varTran, tarTran, epochs=100, batch_size=1000, validation_split=0.1)
tarTemp = nnetMdl.predict(varTest, batch_size=1000)
#predict = tarScal.inverse_transform(tarTemp)
#resRMSE = np.sqrt(mean_squared_error(tarTest, predict))
resRMSE = np.sqrt(mean_squared_error(tarScal.transform(tarTest), tarTemp))
resRMSE
Train on 29943 samples, validate on 3327 samples
Epoch 1/100
29943/29943 [==============================] - 0s 15us/step - loss: 0.0800 - val_loss: 0.0638
Epoch 2/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0578 - val_loss: 0.0457
Epoch 3/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0458 - val_loss: 0.0380
Epoch 4/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0408 - val_loss: 0.0344
Epoch 5/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0378 - val_loss: 0.0317
Epoch 6/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0354 - val_loss: 0.0299
Epoch 7/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0340 - val_loss: 0.0287
Epoch 8/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0327 - val_loss: 0.0276
Epoch 9/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0315 - val_loss: 0.0265
Epoch 10/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0303 - val_loss: 0.0254
Epoch 11/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0293 - val_loss: 0.0244
Epoch 12/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0284 - val_loss: 0.0235
Epoch 13/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0274 - val_loss: 0.0225
Epoch 14/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0266 - val_loss: 0.0215
Epoch 15/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0257 - val_loss: 0.0202
Epoch 16/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0245 - val_loss: 0.0189
Epoch 17/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0235 - val_loss: 0.0175
Epoch 18/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0223 - val_loss: 0.0160
Epoch 19/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0213 - val_loss: 0.0146
Epoch 20/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0202 - val_loss: 0.0135
Epoch 21/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0193 - val_loss: 0.0125
Epoch 22/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0186 - val_loss: 0.0117
Epoch 23/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0179 - val_loss: 0.0109
Epoch 24/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0171 - val_loss: 0.0104
Epoch 25/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0167 - val_loss: 0.0099
Epoch 26/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0163 - val_loss: 0.0094
Epoch 27/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0156 - val_loss: 0.0090
Epoch 28/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0154 - val_loss: 0.0087
Epoch 29/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0151 - val_loss: 0.0084
Epoch 30/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0146 - val_loss: 0.0081
Epoch 31/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0143 - val_loss: 0.0077
Epoch 32/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0139 - val_loss: 0.0074
Epoch 33/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0138 - val_loss: 0.0072
Epoch 34/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0135 - val_loss: 0.0071
Epoch 35/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0132 - val_loss: 0.0069
Epoch 36/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0131 - val_loss: 0.0068
Epoch 37/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0130 - val_loss: 0.0066
Epoch 38/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0129 - val_loss: 0.0065
Epoch 39/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0127 - val_loss: 0.0063
Epoch 40/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0124 - val_loss: 0.0062
Epoch 41/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0122 - val_loss: 0.0061
Epoch 42/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0122 - val_loss: 0.0059
Epoch 43/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0120 - val_loss: 0.0059
Epoch 44/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0119 - val_loss: 0.0057
Epoch 45/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0118 - val_loss: 0.0057
Epoch 46/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0117 - val_loss: 0.0056
Epoch 47/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0115 - val_loss: 0.0055
Epoch 48/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0114 - val_loss: 0.0055
Epoch 49/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0114 - val_loss: 0.0055
Epoch 50/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0112 - val_loss: 0.0053
Epoch 51/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0111 - val_loss: 0.0052
Epoch 52/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0112 - val_loss: 0.0052
Epoch 53/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0111 - val_loss: 0.0052
Epoch 54/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0109 - val_loss: 0.0051
Epoch 55/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0108 - val_loss: 0.0050
Epoch 56/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0107 - val_loss: 0.0049
Epoch 57/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0107 - val_loss: 0.0050
Epoch 58/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0107 - val_loss: 0.0049
Epoch 59/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0106 - val_loss: 0.0048
Epoch 60/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0104 - val_loss: 0.0047
Epoch 61/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0103 - val_loss: 0.0046
Epoch 62/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0102 - val_loss: 0.0046
Epoch 63/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0103 - val_loss: 0.0046
Epoch 64/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0102 - val_loss: 0.0045
Epoch 65/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0101 - val_loss: 0.0044
Epoch 66/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0101 - val_loss: 0.0044
Epoch 67/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0100 - val_loss: 0.0045
Epoch 68/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0098 - val_loss: 0.0043
Epoch 69/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0098 - val_loss: 0.0043
Epoch 70/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0098 - val_loss: 0.0043
Epoch 71/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0097 - val_loss: 0.0042
Epoch 72/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0096 - val_loss: 0.0043
Epoch 73/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0096 - val_loss: 0.0041
Epoch 74/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0042
Epoch 75/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0041
Epoch 76/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0041
Epoch 77/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0093 - val_loss: 0.0040
Epoch 78/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0094 - val_loss: 0.0040
Epoch 79/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0093 - val_loss: 0.0040
Epoch 80/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0092 - val_loss: 0.0039
Epoch 81/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0039
Epoch 82/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0039
Epoch 83/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0039
Epoch 84/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0091 - val_loss: 0.0038
Epoch 85/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0090 - val_loss: 0.0039
Epoch 86/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0089 - val_loss: 0.0038
Epoch 87/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0089 - val_loss: 0.0038
Epoch 88/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0088 - val_loss: 0.0037
Epoch 89/100
29943/29943 [==============================] - 0s 4us/step - loss: 0.0088 - val_loss: 0.0037
Epoch 90/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0087 - val_loss: 0.0037
Epoch 91/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0087 - val_loss: 0.0037
Epoch 92/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0086 - val_loss: 0.0036
Epoch 93/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0087 - val_loss: 0.0037
Epoch 94/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0086 - val_loss: 0.0036
Epoch 95/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0084 - val_loss: 0.0036
Epoch 96/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0086 - val_loss: 0.0036
Epoch 97/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0085 - val_loss: 0.0036
Epoch 98/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0085 - val_loss: 0.0035
Epoch 99/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0085 - val_loss: 0.0035
Epoch 100/100
29943/29943 [==============================] - 0s 3us/step - loss: 0.0084 - val_loss: 0.0034
Out[59]:
0.0589564154176633
In [60]:
# plot history
plt.plot(hist.history['loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()
In [61]:
# Separated Keras' architecture and synopse weight for later Cpp conversion
from keras.models import model_from_json
# serialize model to JSON
nnetMdl_json = nnetMdl.to_json()
with open("COM_nnetMdl_Rev.json", "w") as json_file:
    json_file.write(nnetMdl_json)
# serialize weights to HDF5
nnetMdl.save_weights("COM_nnetMdl_W_Rev.h5")

# save model and architecture to single file
nnetMdl.save(workDir + "COM_nnetMdl_Rev.h5")
print("Saved model to disk")

# also save scaler
from sklearn.externals import joblib
joblib.dump(varScal, workDir + 'Rev_VarScaler.save') 
joblib.dump(tarScal, workDir + 'Rev_TarScaler.save') 
Saved model to disk
Out[61]:
['C:/Temp/WinProj/CTLEMdl/wsp/Rev_TarScaler.save']
In [62]:
# generate prediction
predict = tarScal.inverse_transform(tarTemp)
allData = np.concatenate([varTest, tarTest, predict], axis = 1)
allData.shape
headLst = [varList, tarList, tarList]
headStr = ''.join(str(e) + ',' for e in headLst)
np.savetxt(workDir + 'COMCtleIOP_Rev.csv', allData, delimiter=',', header=headStr)
In [63]:
# plot Gdc
begIndx = 100
endIndx = 150
indxAry = np.arange(0, len(varTest), 1)
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,0][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,0][begIndx:endIndx])
Out[63]:
<matplotlib.collections.PathCollection at 0x242ccefe1d0>
In [64]:
# Plot P1
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,1][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,1][begIndx:endIndx])
Out[64]:
<matplotlib.collections.PathCollection at 0x242ccc7c470>
In [65]:
# Plot P2
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,2][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,2][begIndx:endIndx])
Out[65]:
<matplotlib.collections.PathCollection at 0x242d6f15dd8>
In [66]:
# Plot Z1
plt.scatter(indxAry[begIndx:endIndx], tarTest.iloc[:,3][begIndx:endIndx])
plt.scatter(indxAry[begIndx:endIndx], predict[:,3][begIndx:endIndx])
Out[66]:
<matplotlib.collections.PathCollection at 0x242ccbaa978>

It seems this "reversed" neural network also work reasonably well. We will further fine-tune later on.

Deployment:

Now that we have trained model in Keras' .h5 format, we can translate this model into corresponding cpp codes using Keras2Cpp:

Keras2Cpp

Its github repository is here: Keras2Cpp

Resulting file can be compiled together with keras_model.cc, keras_model.h in our AMI library.

Conclusion:

In this post/notebook, we explore the flow to create a neural network based model for CTLE's parameter prediction. Data science techniques have been used. The resulting Keras' model is then converted into C++ code for implementation in our IBIS-AMI library. With this performance based CTLE model, our user can run channel simulation before committing actual silicon design.

A quick and easy IBIS modeling flow

For engineers who are new to IBIS modeling, the “IBIS CookBook” [LINK HERE] is a very good reference document to get started. The latest version, V4.0, was created back in 2005. While most of the documented extraction procedures still hold true to this date, some of them may be tedious or even ambiguous in terms of executions. This is particular true for processes mentioned in Chapter 4, differential buffer modeling. Further more, most recent IBIS summit presentations focus on “new and hot” topics like IBIS-AMI modeling methodologies and not many are for the traditional IBIS. In this post, I would like to first review these “formal” process, dive into how each modeling table is extracted and used in simulation, then propose a “quick and easy” method particular for differential buffer. I will then summarize with and this approach’s pros and cons.

IBIS model components:

The most basic IBIS building block, as defined in Spec. Version 3.2, is shown above. Typically at least six tables will be included in an output type buffer. They are IV (Pull-up, Pull-down) and Vt( Rising and falling) under two different test load conditions. Additional clamp IV table (Power and Ground clamp) may be added for input type buffer. After Spec version V5.1, Six additional IT tables for ISSO_PU/PD/Composite currents have also been added to address PDN effects. To create an IBIS model, the data extraction processes start with exciting particular portion of the buffer so that measured data can be post-processed to formulate as a spec-compatible table format. Because a model also has TYP/MIN/MAX skews, so the number of simulations are basically the aforementioned number of tables times three. That is, for a most basic IBIS modeling, one may need to simulate at least eighteen cases (or simulation  “decks”).

To explain a little bit more regarding blocks untouched by proposed new method, I list them in the bullets below:

  • Package/Pin parasitics: IBIS cookbook and normal modeling flow do not mention about this part. Usually a buffer package’s model is extracted using tools such as HFSS or Q3d into a form of S-parameters or equivalent broad-band spice model. An IBIS model can use a lumped R+L+C structure to describe pin specific or package (apply to all pins) specific parasitics. Alternatively, an IBIS model can also use a more detailed tree structure package model shown below for non-lumped structure. Regardless, it’s HFSS or Q3D’s task to convert such extracted S-parameter or multi-terminal sub-circuit into these simple lumped RLC values or tree structures to be included in an IBIS file. It’s a separated process and not discussed here as a part of the buffer modeling.

  • C_Comp: At the very beginning, there is only a C_Comp value between pad and ground and it is used to describe frequency dependent behavior besides the parasitics. Later on, tool like HSpice introduces extra simulation syntax to split this single C_Comp value into branches between pad and various power terminals for better accuracy. Even later, this type of syntax was adopted as part of the IBIS spec. Still, user may only find how a single C_Comp value is computed in most materials. Briefly speaking, they can be calculated using time-domain method based on RC charging/discharging time constant or freq-domain method based on the imaginary current at a particular frequency. How to split this single value into several ones to match the frequency plot best remains an art (i.e. not standardize). In addition, the value C_Comp is not visible during modeling… their effects are only shown when there are reflections back from the other end due to impedance mismatch. What we have found is that usually an IC designer has a better idea about how this value should be and the aforementioned time/frequency domain calculation method may not produce an accurate estimate.

  • Clamp current: Power/Ground clamp currents and Pull-up/Pull-down currents are both IV based (i.e. dc steady state). So they are combined for load-line analysis during simulation. The difference between Pull-up/Pull-down and Clamp is that the latter one (i.e. Clamp) can’t be turned-off. So its effect is always there even when we are extracting IV for Pull-up/Pull-down structures. Thus to avoid “double-counting”, the post-processing stage needs to remove the clamp current from pull-up/pull-down currents first before putting them into separated table. To simplify the situation… particular for an output differential buffer, we may just use IV data even though this is an IO buffer.
  • IT current: These are different dc or transient based sweep in order to obtain buffer’s drawing current when power or ground are not “ideal”. This is important in DDR case when the DQ is single ended and it’s subjective to PDN’s noise. For differential application like SERDES, PDN’s effects are usually present at both the P and N terminals and will cancel with each other. Thus their extraction may be skipped for a differential buffer mostly. One may also note that the IT extraction of composite current is “synchronized” with VT extraction of rising/falling waveform so these current data are extracted with additional “probes” rather than separated simulation.

Full IBIS modeling flow:

The process suggested in IBIS’s cookbook can be summarized as the following steps. They are also implemented in our “Full IBIS modeling flow” within SPIBPro:

  • 0, Collect design data and collateral: A modeler needs to gather PVT (process, voltage, temperature) data, silicon design, buffer terminals’ definitions and bias conditions etc. A buffer may have several tuning “legs” and bit-set settings so a modeler needs to determine which will be used for TYP, MIN and MAX corners.
  • 1, Prepare working space: Create a working space on the disk.
  • 2, Generate simulation inputs: Generate simulation “decks” to excite different block of the buffer…one at a time. So one will have eighteen or more decks at the end of this stage waiting to be simulated.
  • 3, Perform simulations: Perform simulation either sequentially on a local machine or with a simulation “farm”. Double check the results and make sure they make sense, otherwise, go back to step 0 to see which settings may be incorrect or missing.
  • 4, Generate IBIS model: Post-process the simulation data and generate IBIS model. This is usually done by the tool like ours as manual process is tedious and error prone.
  • 5, Syntax check: First quality check of an IBIS model is that it must pass the golden checker. The check here is mostly syntax-wise though there are also basic behavior check such as monotonicity or DC mismatch etc.
  • 6, Validate IBIS model: A formal validation for an IBIS model is to hook-up test load and make sure they produce correlated results comparing to those from silicon at the end of step 3 above.
  • 7, Performance report: The modeler needs to extract the performance such as PU/PD impedance values and slew rate etc. for documentation purpose and check against the spec. or data sheet.

Full step-by-step modeling flow in SPIBPro

Data extraction for a single-ended buffer:

For a single-ended buffer, the first hurdle in the modeling process is to make sure each blocks are excited properly and simulation results make sense. As mentioned, there are at least eighteen simulation needs to be done:

There are also some complications regarding the DC simulation part: some of the buffer may have “clocking” and it’s not easy to separate them from the buffer iteself. Also,  there may be many RC parasitics between nodes for a buffer netlist extracted from post-layout. In other cases one can’t even separate the actual IO part from the pre-driving portions and the resulting circuits to be simulated become huge and time consuming. These situations will make IV data extraction slow and often problematic. As a result, a simple step 0~7 modeling process may not work properly and one need to iterate to tune the set-up such that simulation will always converge and resulting IV curve be monotonic. Nevertheless, the single buffer’s modeling is easier to manage.

Data extraction for a differential buffer:

Differential buffer’s IBIS modeling extends the challenge and effort to another dimension…literally! First of all, each pin in an IBIS file or component connect to an IBIS model and the possible structures and connections between different pins are very limited. So for a differential buffer, a series element needs to be created to describe the coupling relations between pins. All the pictures used in this paragraph are from IBIS cookbook and user may find further descriptions there.

In order to construct such series model, the IV sweep needs to be performed in two dimensions, both at similar resolutions. So if say a typical single-ended IV curve has one hundred points, then the second dimension should also have that much data. That means for one particular corner, there will be one hundred IV simulation in order to construct the 2D response surface shown below. First stage post-processing also needs to be preformed so that common-mode current can be eliminated. All these need to be done before formulating a 2D data view. Only after one can visualize the resulting data, he or she can determine what components are needed to create such series model. This presents the first challenges on top of the IV simulation issues mentioned for single-ended buffer.

The second challenge is regarding the VT simulation. The current flow through this newly constructed series element needs to be “eliminated” to avoid being double counted. For spice-like simulator, there is no such thing as “negative resistance”, “negative capacitance” etc. So one has to resort to approaches like control elements or even Verilog-A (as we presented in IBIS Summit 2016) to have proper VT data extracted. For control-source based approach, it is only limited describe pin couplings of a simple R/C but not non-linear resistance or surface such as series mosfet. For that, an intermediate step to map device or equation parameters to the calculated 2d surface is needed. Even using Verilog-A’s look-up table, the grid resolution is limited by the step size used in first two-dimensional IV step and may have non-convergence issue if it’s to coarse. That’s why in the cook book (the first two lines in the picture below), it doesn’t suggest any approach as it’s really not that easy!

Due to these two great challenges, we have found that differential modeling may not be easy for most modeler. We feel more this way when providing modeling service to clients who wants to perform simulations themselves then send us data. They may want to do so due to IP concern or they knowing more about the design. In those cases, the back-and-forth tuning and tweaking process become a burden on their side and also delay the whole schedule. Thus we are motivated to find an alternative “quick-and-easy” approach to substitute the “formal” modeling steps mentioned above. While being able to simulate accurately w/ great performance is still number one priority, we are ok that they can only be used under some context (such as channel simulation).

Quick and easy approach:

In previous post, we explained how IBIS model’s data are used in a circuit simulation. Simply speaking, the “VT” data is considered as “target” while “IV” tables are used to compute so called “switching coefficients” so that appropriate amount of current will be injected or withdrawn from the buffer pad to achieve. When this is true, the nodal voltage specified by that VT table at that particular time point will be satisfied due to KCL/KVL. Now there are switching coefficients for both pull-up and pull-down structures… thus it takes two equations to solve these two unknowns. That’s why two set of VT, each under different test loads, are required. Based on this algorithm, an IV data and calculated coefficients are actually “coupled” and affect each other. If current in IV table is larger, than the calculated coefficients will become smaller and vice versa. This way the overall injected/withdrawn current will still meet KCL/KVL required for VT. In this sense, the actual IV data is not that important as it will always be “adjusted” or “weighted” by the parameters.

On the other hand, the VT data also contains several DC points and they need to be correlate to the IV table, otherwise DC mismatch errors will be thrown by the golden checker. In addition, the IV data is limited to 100 points and they need to be monotonic to avoid convergence issue. So if we have several sets of VT data and one under normal test load (say 100 ohms for a differential buffer), then they will give us “hints” regarding how IV data will look like.

With this assumption, we propose the following quick-N-easy modeling steps:

  • Connect the silicon buffer to nominal loading conditions and obtain VT simulation data
    • For Single-ended, these are simple VT waveform under two different test loads;
    • For Differential, say use nominal 100 ohms first and see voltage range between V1 and V2
      • Let V3 = (V1 + V2) / 2, use VFixture = V3 and RFixture = say 40 & 60 respectively to obtain two waveforms;
      • Alternatively, use RFixture = 50 and VFixture = say (V1 + V3) / 2, (V2 + V3) / 2 respectively to obtain two waveforms;
      • The main goals is to have two set or set-up covering operating range when a nominal test load (say 100 ohms) is used.
  • Obtain C_Comp values from buffer IC designer
  • Obtain voltage range, temperature etc parameters.

And that’s all, through carefully implemented algorithm and computation, we can generate an IBIS model based on these data with minimal simulation requirements. An the generated model is guaranteed to be error/warning free.

While we will not disclose how these are actually done in details, we can show how they are incorporated in our SPIBPro… as shown below. As a matter of fact, this process has been used in the modeling projects of past year and shown great success.

Only two VT simulation data are required to create an IBIS model

Pros and cons:

We use this approach to create differential IBIS for channel analysis purpose (together with AMI) and have not yet found any problems. Having that said, I would offer several pros and cons for reader’s considerations:

Pros:

  • Minimal simulation required and easy to perform;
  • Will be mathematically correct: no DC mismatch or monotonic warnings, output will match provided VT waveform under nominal test load.

Cons:

  • May not be accurate if the model is used for DC sweep as the IV data in the model are artificially generated;
  • No “disable” or High-Z state as clamp currents (if there are any) has been incorporate into IV data without separation;
  • No Power-aware consideration as ISSO_PU/ISSO_PD generation are not taken into account.

Summary:

In this blog post, we reviewed the formal IBIS modeling process described in the cook book, challenges modelers will face and proposed an alternative “quick-and-easy” approach to address these issues. The proposed flow uses minimum simulation data while maintaining great accuracy. There might be limitations on models generated this way such as neither disable state nor power-aware data are accounted. However, in the context of channel analysis particular when a differential model is used together with its IBIS-AMI model, we have found great success with this flow. We have also incorporated this algorithm to our SPIBPro so our tool users can benefit from this efficient yet effective flow.

IBIS-AMI: “Paring-off” of the models in three scenarios

Here at SPISim, we provide AMI modeling service in addition to the developed EDA tools. Companies interested in AMI service are mostly IC/IP vendors. They need to provide AMI model in particular so that their user, system companies which use their ICs, can perform the design and analysis. Oftentimes, our client is only interested in AMI modeling for either Tx or Rx circuit because that’s where their IC design resides. However, they also often found later on that in many scenarios, it actually takes “two” models to be useful for their users. In this post, I will talk about some of such “paring-off” cases so that interested reader may be well informed when such modeling needs arise.

Tx and Rx:

The topology shown above is most commonly used for link analysis. At the Tx and Rx ends, user often is asked to specify corresponding IBIS models. Most definitely, they need to specify AMI model and associated AMI parameters. The interconnect in the middle is generalized here… it can be Tx package, main route, connector and Rx package etc represented in either separate models or cascaded S-parameter. The whole purpose of interconnect here is to provide an impulse response so that when link simulator is operating in either “Statistical” or “Bit-By-Bit” modes, AMI models at the Tx and Rx ends will be able to process accordingly. The analysis flow an AMI-compatible simulator should follow is described in details in section 10.2 of the IBIS spec. User should find that in those descriptions, and almost all the tutorial documents on-line such as the one below from KeySight, both Tx and Rx AMI models are both required to be part of the analysis.

Thus, the first “paring-off” in AMI modelings is each Tx needs an Rx AMI models to work… at least mostly. From this perspective, this is very different from traditional spice-based simulation, where you may have Tx/Rx part in either IBIS, transistor circuit or behavior/Verilog-A modeling format and can simulate without any issue.

So what if you have only either Tx or Rx AMI model? It will be tricky (depends on the simulator) if not possible in many cases. So we here at SPISim often have to provide a “dummy” AMI model in additional to the one ordered. As a “dummy” model, it is a simple pass-through without doing anything. When combined with pass-through S-param or interconnect, it will be very straight forward and clear to test and validate delivered AMI model’s performance. When all the spec. are met (e.g. EQ levels), then the user can add real interconnect and Tx/Rx AMI models from other vendors to proceed the design process.

(A pass-through “Dummy” AMI model implements AMI API but without any implementation body or just don’t change those variables passed by reference)

Repeater:

IBIS version 6.0 includes technical advances such as mid-channel repeaters documented in BIRD 156.3. Repeater is often used in longer SERDES channel such as PCIe or HDMI etc. There are two types of repeaters: redriver and retimer.

  • Redriver: A redriver performs signal conditioning through equalization, providing compensation for input channel loss from deterministic jitter such as inter-symbol interference.
  • Retimer: A retimer is a mixed-signal device that includes equalization functions plus a clock data recovery (CDR) function to compensate both deterministic and random jitter, and in turn transmit a clean signal downstream.

It’s easy to see the “paring” part within a repeater:

Within a repeater, Rx sits at the front to receive signals from upstream, then pass to Tx part of the repeater to transmit again toward the downstream. So one really can’t work without the other. Plus also “paired” Tx AMI at upstream and final Rx AMI at the end of the downstream, it takes at least four AMI models in most cases to start the analysis. More than one repeaters are also allowed so the number of “paired” AMIs can grow. The proper analysis sequence when repeater(s) is involved is also documented in 10.7 of the spec.

So far as repeater being an “electrical” component is concerned, there may be no such needs to probe signals between Rx and Tx within a repeater. However, when we model an optical link as a repeater (shown below from Agilent slides), then thing becomes much more interesting:

We handled such a case just recently… an optical transceiver used in an (electrical) system may be modeled as a repeater. Having that said, please note that in “optical” world, Tx is now again at the front of the repeater as laser is used to light the optical fiber to transmit signal toward Rx (now the 2nd part of the repeater). So a potential confusion may happen here when optical and electrical worlds meet. In addition, there might be two other considerations:

  1. There may be needs to probe signal between optical’s Tx and Rx ends, which can be single-ended-like (optical pulse in mili-watt)
  2. Users of such optical module based repeater may only focus on design at the upstream side, the downstream side or both. In the former two situations, they may not have AMI models at the other side.

The first one above is tricky to solve… mainly due to the non-differential nature of signals in a link simulator. Until now, if one reads the AMI portion of the IBIS spec, he/she should notice that the signal mentioned are always differential. So while there is no problem to create an AMI model handling single ended signals, how the probing/analysis goes really depends on simulator being used. Having that said, with DDR adopting AMI methodology, the single-ended signal support should happen very soon.

To accommodate needs of different possible users of our clients, we make use of the “pass-through” dummy model again and proposed the following different AMI models combinations:

In usage scenario 1: The modeled Tx and Rx optical ends (each have their own circuitry) are assembled accordingly. Thus if a simulator can handle “single-ended-like” optical signal in between, then this model can be used directly.

In usage scenario 2: The signal between optical ends within a redriver is not of concern and the support of single-ended-like signal is uncertain. In this case, we “lumped” the TxRx together and put it at the front of the redriver. The second half of the redriver is a simple pass-through. As input and output from the first part are always differential, it will meet the expectation of the simulation flow documented in the spec.

In usage scenario 3: we assume the user is only interested in the downstream end and have only Rx AMI model. In such a case, a repeater will not work because it’s lack of the up-stream. So we combined the created TxRx here and make it a transmitter AMI.

In usage scenario 4: we assume the user is only interested in the upstream portion and only have Tx model. Similar to scenario 3, we combine the TxRx of the optical models and make it a receiver AMI.

In the “paring-off” for a repeater, there are two points worth mentioning:

  • Unless DFE/CDR are being modeled and thus retimer, distinction between Tx and Rx EQ are often blur particular when they both are LTI. For example, either Tx or Rx can have a FFE EQ and CTLE of a Rx is also just a passive filter. In such case, a model can be used as either a Tx or Rx (like scenario 3 and 4). The only thing needs to pay attention to is the jitters assignment as AMI parameters for Tx and Rx’s jitters are different.
  • We want to cover all usage scenarios and make our client happy. However, we also don’t want to over burden SPISim engineers too much in creating too many different AMI models. Thus as mentioned in previous post, the architecture of AMI modeling becomes important. When done properly.. (such as in our cases 🙂 ), we can simply cascade stages easily at the .ami file level without even rewriting C/C++ codes or any re-compilation.

IBIS-AMI and IBIS:

When we focus on more technically challenging AMI models, we often forgot still important role a traditional IBIS model will play in link analysis. Simply put, the accuracy of an IBIS model still impacts the final results (e.g. BER eye) greatly.

The normal process of the link analysis starts with “link characterization”. Mostly this involves a time domain simulation of driver driving through interconnect with receiver analog front-end (no EQ) connected to obtain the pad to pad step response. Then this step response is differentiated to obtain the impulse response. Finally AMI models are brought into play by convolving with impulse response (LTI or statistics mode) or processing with bit response sequence (NLTV or bit-by-bit mode). In this characterization stage, no EQ (i.e. no AMI) is involved. It’s is their “analog front-end”, i,.e. IBIS models or corresponding spice/behavioral models being used. Thus, the accurate modeling of this portion does impact the processed link analysis results.

In the picture above, eye plots of identical AMI model yet each with different IBIS models are shown. It’ clearly seen how IBIS’s VT waveform and IV curves affect the eye openings width and height. Thus the third scenarios we want to emphasis of IBIS/AMI “paring-off” is the AMI model and their corresponding IBIS model.

SPISim is a relative small EDA/consulting company comparing to others such as Agilent, Cadence or Mathsoft (matlab). These companies each have their own AMI modeling solutions and cost much much higher when comparing to our offerings. However, when we look into their AMI product details, it’s not easy for us to find (or maybe even not there?) how the IBIS portion will be handled. As an AMI model is mostly used for SERDES application, differential or current-mode-logic (CML) design is often involved. If one reads chapter 4 of IBIS cookbook carefully (available at the IBIS website), he/she should find that the process of differential IBIS modeling is more complicated than the simple single-ended IBIS buffer. Thus when considering the importance of “paring” between AMI and IBIS, one should really take this into account. An AMI model without proper analog front end will definitely come back to haunt you during validation stage. Interested reader regarding differential IBIS modeling may also refer to our previous post or our 2016 Asian IBIS summit paper.

IBIS-AMI: A case study

IBIS-AMI modeling is a task usually executed at the end of IP development process. That is, hardward IPs are created first, then associated AMI models are developed and released to be accompany with this hardware IP since the IPs vendor usually does not want to expose the design details to the their customers. On the other hand, it is also likely the case that a user may have received behavioral or encrypted spice models first, or even obtain measurement data from the lab, then would like to create corresponding AMI models for channel analysis. Such needs arise usually because either original IP vendor does not have AMI model or may charge too much. It is also possible that end user would like to explore different design parameters before determining which IP/part to use.

In this post, we would like to discuss how such an usage demands can be accomplished with SPISim’s AMI flow without any C/C++ coding or compilation for AMI. Here are the topics for this case study:

  • Collateral
  • Design spec. and modeling goals
  • Modeling process
  • Model release
  • Other thoughts

Collateral:

For this design, we received a testbench for a typical SERDES channel using this IP. The schematic is shown below:

Both Tx and Rx are encrypted hspice models with adjustable parameters. When picking one set of parameters to test run the channel test bench, we get valid simulation results.

As this is a hspice netlist, we have the freedom to probe different points and change the simulation options as needed (for example, from transient to ac or transfer-function).

Design Spec. and modeling goals:

The original IP vendor only provides simple descriptions for various blocks and design parameters. The Tx and Rx packages are spice sub-circuits and channel is a s-parameter file. Both Tx and Rx model has typ/min/max corners. On the Tx side, there are settings for voltage supply which is independent with the corner settings, a 7-bit resolutions for a multiplier to work with the swing amplitude, a 6-bit resolution for a de-emphasis and a flag for turning-on/off the boost.

With these IP design spec, the modeling goal is to create associate AMI models which will allow end users to fine tune performance with similar parameters like original IP. Further more, SPIPro is used for its spec. AMI model so that model developer should not need to write any C/C++ codes or perform any .dll/.so compilation for this AMI modeling.

Modeling process:

If one needs to create AMI models from simulation or measurement results, that usually means he/she does not have original design details or even design spec. available. So the so called “top-down” approach will not work simply because there is no design at all to translate to corresponding C/C++ code. Instead, we need to create an architecture via trial-and-error to “reverse engineer” the design so that the performance will match what’s given. Here are the steps we used in this case:

  • Create Tx/Rx test bench:

While the full channel test bench demonstrate how the Tx/Rx work, it also includes information not needed in the AMI modeling. For example, package and channel are usually separated from the AMI model because they are usage or client specific. Besides, as mentioned in previous posts, one assumption of AMI model is high-Z input and output. So the first step of the modeling is to separate channel from driver/receiver and create Tx and Rx only test bench together with inputs and outputs.

For Tx, it’s a simple PRBS input with output connecting to 50 ohms loads.

And its input/output should look like this:

When we modify the test bench to connect Tx directly to Rx, i.e. bypass all the package and channel models, we get such input/output eyes:

From these eye plot, we do not see DFE like abrupt force-zero output, thus the Rx model seems to be a CTLE like boost filter. So we created Rx test bench like below and obtain its frequency response:

Note that in order to perform what-if analysis of next step, we also use VPro to capture inputs to these Tx and Rx models so that they are evenly spaced.. as required by AMI’s Init and GetWave function. These evenly spaced inputs will be used later on by SPISimAMI.exe to drive the model. For HSpice, the following option may be needed so that the .tr0 file will have evenly spaced data points even simulation is various time-step sized:

  • Define architecture:

Now that we have input data and desired output for this particular set of parameters, next step is to explore whether SPISim’s existing spec. model is sufficient to meet the performance.

Tx: for tx module, we observed the following two characteristics:

  1. Has de-emphasis, happened after the peak
  2. Swing range and offset are different between input (0 ~ 1) and outputs (-0.5 ~ 0.5). Also there are loading effect so rising and falling have RC charging/discharging like behaviors.

For the de-emphasis part. We can use Spec. AMI’s what-if analysis to see how different pre/de-emphasis settings will cause difference responses:

We change these pre and post cursors one at a time only, using -0.1 for value .

We then test drive using built-in PRBS with same UI time, 200ps:

From the summary plot above, and correlate with the Tx output, it’s apparently that this Tx has one post-cursor FFE beause only this set-up gives de-emphasis after one UI after the peak.

For the output swing and slew rate, it’s the sign of AFE (analog front-end). So we can simply add an AFE stage right after FFE, with high/low rail voltage “clamped” to the simulated peaks:

With several trial-and-errors, all within SPISim’s Spec-AMI GUI, we obtained the following Tx correlation between spice test bench and AMI results for this particular set of parameters:

Similarly, as Rx behaves like a continuous filter, we use a CTLE stage to mimic its behavior:

With several tuning, we obtain such Rx correlations:

From these correlations, we believe that the stages built-in to the SPISim’s AMI are sufficient to meet the performance needs. So next step is to find the parameters to support all these design specs.

As similar process between Tx and Rx, we will use Tx as an example to demonstrate the flow.

  • Simulation/Sweep plan:

Before generating a simulation plan, we first also make sure these design parameters are independent with each other. We performed simple 1D sweep (several points) in particular for the swing and de-emphasis level.

Notice that the slew rate and peaks are not affected by de-emphasis level much. In addition, the spacing seems linearly spaced according to the bit settings. With this, we can sweep them independently.

Now for Tx, a full factorial sweep will require 2^6 (de-emphasis level) * 2^7 (swing level) * 2 (boost flag) * 3 (corner) * 3 (voltage levels) ~= 150000 runs. Due to the linearity, we do no need to sweep full 6 and 7-bits of de-emphasis and swing levels. Instead, we increase step interval for simulation because SPISim’s model’s built-in interpolation can be applied automatically, more about this at the model release section.

For these five variables, we use MPro to generate 2403 test case (1/60 of full factorial). We then create a template from the Tx spice test bench so that parameters can be updated according to the table and its column header above:

Noticed that variables are enclosed with “%” (e.g. %XVPTX%). Their values are replaced by corresponding values in each row of the table to generate a new spice file. With all these 2403 spice file generated, each for different combinations of parameters, we kicked-off the batch mode simulation and obtained all two thousand cases within one hour.  This is done all within MPro and can also be done using user’s script.

  • Parameter tuning:

With all results being available, next step is to find corresponding AMI parameters for each of the case. SPISim’s model driver, SPISimAMI.exe, is a must have here. It will take the collected, evenly spaced inputs to drive the AMI parameters together with the SPISim’s spec. model, output results can then be correlated with the .tr0 file simulated with HSPice for that particular case. In case when the correlation is not good, AMI parameters should be updated again for another iteration of performance matching. This process is summarized in the flow below:

Apparently, such as task is too tedious and error prune for a human to perform. Since the input pattern is the same, relative timing locations of different properties (such as peak and de-emphasis levels) are also the same even between different test cases, we can process these data automatically. Regarding “adjusting” AMI parameters, a simple bi-section search will quickly converge to the desired de-emphasis tap values for the given test case.

In our process, we first parameterized the .ami file as shown below. This ami file is based on the architecture identified in the second step… only that now we need to find different values for different parameters. In particular, the swing level and FFE post tap are unknowns and need to be swept.

Because these two variables are independent, we can first identify the peak value as “Swing” from the .tr0 simulation results, then use bi-section to adjust the FFE_PARM_POS1 such that the output from SPISimAMI for this .ami file settings and SPISim’s spec .dll/.so will match the tr0 data.

With this flow and process, we use our SPIProcPro to bath process the simulation results and get all 2403 fine-tuned AMI parameters within one hour:

Here is the partial config. settings for SPIProcPro of this modeling process:

For each of the test cases, the post-processed AMI result is saved to an individual .csv file. All these csv files are also combined and then concatenated side-by-side with original input condition. The end result is a table with input condition at the left and associated AMI parameters at the right. This table will serve as pre-set data table for our AMI models to be released.

 Model release:

The combined table looks like below:

The columns boxed in blue above are those required for SPISim’s AMI spec models. These columns has header named exactly the same as those in the ami file green boxed below. The row entries for settings of each test case in the csv table is filled in to the boxed in red in ami file below.

Now this is not the format to be released to the AMI user. In stead, the design parameter we want to expose to the end user are listed in the red box in the csv table. They have the column headers matching original IP’s design parameters. Also note that this csv table only contains partial results as we didn’t perform full factorial sweep for all ~150000 cases.

The design parameters, boxed in red in csv file, need to be able to be mapped to those boxed in blue, the AMI stage’s parameters.

One unique feature of SPISim’s AMI model is it supports of table look up and up to two dimensional linear interpolation for numerical values not found. The set-up is easily done from model config. GUI. The example below is for PCIe with 11 preset tables.

In this case, we do not use preset index. Instead, we set the preset index value to “-1” to signify that model’s table look-up will be used:

In the final .ami file above, the exposed parameters (boxed in red) are exactly those defined in the original given IP. Their name and values will be used to look-up from the table to find matching row. If up to two rows have no exact match in terms of numerical value, linear interpolation will be performed by the model during initialization. A final matching row or values will be used to assign to the parameters required by spec model.

The final results of the create model is one .ami file, one .csv file (plain or encrypted), and one set of .dll/.so file. There is no need to write any C/C++ codes or even compilation for the AMI modeling purpose and the SPISim’s spec. models can be used directly. For the post-processing portion, a python/perl/matalb scripts can be used instead if SPIProPro is not sufficient.

Other thoughts:

In this case study, a look-up table based approach is used to create AMI model via reverse engineering. Alternatively, we can create a spice simulation session using our SSolver/ngspice spice engine to simulate these given Tx/Rx spice model directly. The pre-requisite for this approach is that the given hspice model is not encrypted, can be converted to the SSolver/NgSpice based syntax and has no process file involved. Regarding search mechanism, a bi-section searching algorithm is used in our modeling process here as there is only one value (EQ variable) needs to be determined. The amplitude/swing is deterministic and can be found once tr0 simulation results is post-processed. If there are more than one variable involved for optimization , such multi-tap EQs, then a gradient descent based algorithm may be needed if full surface sweep is not practical.

Differential modeling flow: Development

Flow considerations:

Two major considerations when developing a modeling flow are consistency and flexibility. This is particular true when it comes to differential buffer modeling. As discussed in our previous post, a half/true differential buffer goes through most of the same steps as single ended buffer yet certain process must be elaborated (e.g. modeling of the 2d surface sweep) and order needs to be preserved (i.e. extract C_Diff, I_Diff before performing VT simulation). In this post, we will briefly discuss how these considerations are incorporated into design concepts and realized in our SPIBPro modeling flow.

Design setup:

Conceptually, user does not need to know whether a buffer is true/half/pseudo differential before modeling. The IBIS cookbook V4 also states this and also use encrypted hspice as an example… as it’s black box and can’t be deciphered, a model developer needs to assume coupling exists between P and N (thus true differential). The computed differential current from both 2D sweeps of IV and C_Diff will reveal whether such (true differential) assumption is valid. The decision can then be made based on the value of current.. if it’s in uA or nA range, then coupling is insignificant and can be modeled as pseudo differential.

In reality, we may argue that one should start with pseudo-differential approach and switch to half/full differential only if the final validation suggest so. This is because modeling engineer usually can get insights about buffer to be modeled by talking to the circuit designer (usually in the same company, also no needs to reveal or know much of the design details during such inquiry). In addition, the DC sweep of extra dimension and modelings of the half/true differential flow are often overkill for many of the designs.

In terms of modeling setup, only input and output (for N pin) are needed in addition to common single-ended buffer. A model type selection of “differential” is sufficient to indicate the differential modeling flow rather than linear single-ended modeling.

DiffDrvCfg

Modeling flow overview:

The modeling flow is summarized in the picture below: the key change here is that flow is not linear anymore. The simulation and post-processing stages need to be gone through twice. When simulating for the first time, only IV/C_Die sweep are needed. The data is then post-process for the first time to extract the common-mode and differential-mode current. The latter will also be modeled as a separated component in this stage. This differential component is then inserted between P and N pins in the transient netlist and simulated in the second iteration of the simulation step. However, only VT data is needed this time. When that’s done, all IV, VT and C-Die data are now available and the flow can continue linearly just like that of single-ended buffer.DiffMdlFlow

Modeling of the IV data:

While PU/PD is steady state DC sweep in theory, their simulation is performed in “pseudo DC” most of the time due to likely existence of clock signal, thus not possible for true DC sweep. In “pseudo-DC” simulation, voltage are swept very slowly. In addition, sweeping along X coordinate of different Y values are mutually independent. So they can be simulated in parallel (multi threaded or can be distributed). Simulation under certain biasing condition may have convergence issue, so a flow must be tolerant about missing sweep data on some grid points due to non-convergence.

In our SPIBPro example below, sweep of different Y bias points (e.g. voltage at output N) are separated in different .sp files and they can be simulated in parallel.IVSweep

The task of post-processing step in the first iteration is to extract the simulation data just swept, calculate common-mode current, shift the surface data vertically, summarized as a table output and also perform initial modeling. Such a table is important for user to validate the the result and also perform some what-if modeling using tool like Excel if neded.

DiffMdlCsv

  • Surface modeling:

    Depending on the symmetric differential mode data of shifted surface table (data on the surface alone the blue line), one may decide whether it’s pure resistive, linear or non-linear. The symmetric differential mode is orthogonal to the zeroed common mode curve (red straight line) and it represents the most likely differential operation (i.e. symmetric output of P and N)  For example, the surface plot below suggest a resistor will suffice for the content of the series element:

DiffMdlSerR

With the csv table, one can quickly confirm whether the resistance is same across different voltage or not. If positive, then a linear resistor can be used. Otherwise, the non-linear resistance need to be modeled either with PWL spice element or as a series current (another IV table) in the series element. To be able to visualize the data with certain degrees of manipulation, a flow needs to have such 3D plotting capabilities. Otherwise, tool like matlab and familiarity of its syntax becomes necessary.

DiffMdlSurf

If the sweeping data shows surface like above, then either a surface modeling is needed or a “series MOSFET” needs to be created  as a series element. For surface modeling, one can use “response surface modeling” like method to calculate fitting coefficients in minimizing the mean squared error sense. Other general modeling approach like neural network is certainly also possible.

DiffMdlRSM

One should also check the residue of the prediction formula and maybe visualized as a scattered plot. SPIMPro’s general modeling and plotting functions are demonstrated below:

DiffMdlRSM2

DiffMdlRSM3

A good fit should have very small residue, shown as grey line across 0.0 in the picture above (other red dots are nominal values from sweeping grids). With valid results, one will need to translate this “prediction formula” into spice netlist using E/F/G/H control elements.

In matlab, similar process can be done using “lsfit” function.

  • Series element:DiffMdlSerThe same table content is sufficient to construct a series element. The steps needed here is to translate data into a IBIS compatible format. Such process is trivial when the model is simply R/L/C. For series current and series MOSFET (can have up to 100 tables of different bias condition), the attention needed to perform such work manually is not economical and a tool/flow should be used instead.

Modeling of the C-Diff:

CDiff

Similar process (tabulated and modeling) can be applied to calculated C_Diff. However, it’s much more limited in terms of series element’s syntax and E/F/G/H equation such that describing such surface (both frequency and voltage dependent) for C_Diff is not practical using either spice or series element. As a result, a trade-off needs to be made when picking values (or to average) from the surface and a single value is used when constructing model for C_Diff.

Verilog-A modeling:

In our submission to the IBIS summit later this year, we proposed another flow which is Verilog/VHDL based. One of the advantages of this implementation is that the raw, table-like data can be used directly using built-in $table_model function. Its usage also enables polarity differentiation and elaborated description of C_Diff, which is very crucial to the transient data accuracy. The details about this will be published here once our paper is accepted.

Combined model:

In previous post, we mentioned that in addition to the “series element” for differential model, pure [External Model] of the differential model type may also be used. Using the proposed Verilog-A/VHDL for series element only model, these two methods can then work together, as shown below:

CombSer

A series model is still declared in the “[Series PIn Mapping]” section to be connected between buffer’s N and P outputs. Its definition, can optionally include an external model such as the one implemented with behavioral language. This “[external model]”  works on top of existing series model definitions and can provide extra info. such as frequency/voltage dependent C_Diff if the simulator supports. Optionally, the “top” series model can be simply a  shell (e.g. with high impedance) and all the info. regarding the differential current is encapsulated inside the added model. When comparing to external model attached under output/IO buffer, this external model can be significantly less complicated and also provide more tuning capabilities.

Differential flow validation:

To validate a differential modeling flow,  one may construct a differential buffer by creating an artificial coupling (e.g. using some R/C elements) and then connect it between two known IBIS buffers. During the validation, the calculated common-mode data from DC sweep should reconstruct exactly the same PU/PD/PC/GC tables as those in the known IBIS buffer. Differential current will reveal the resitive element of the coupling portion. Calculated C_Diff should reveal the coupling capacitive element. Both of these steady state differential current and C_Diff can be validated using generated csv table or raw data. User should then find that with both captured accurately, the transient VT data calculated will also correlate to those of original IBIS buffer very well.

Paper and audio recording:

We presented this study at the 2016 Asian IBIS Summits. Readers may download the presentation from IBIS website or [HERE]. Audio recording at Tokyo is also available [HERE]

Differential modeling flow: Revisit

In the past, we have blogged several articles regarding buffer modeling. The flow and focus there are mainly for single-ended. The considerations are that:

  • Single ended (SE) buffer modeling is a good introduction to modeling flow in general without further complication;
  • (Pseudo) differential buffer can actually be modeled as two single-ended buffers;
  • Many (true/half) differential buffer can be modeled like pseudo differential buffer without significant loss of accuracy [LINK]
  • Differential buffer modeling involve several steps which are a little more complicated.

During our workshops in Taiwan last month, several attendees asked about CML (current-mode logic) modeling. With the maturing of the concepts in SE modeling and the prevalence of SERDES interfaces such as USB, SATA, PCIe etc, we think it’s a good time to revisit the differential modeling flow in more details. We will also show how this design concepts are realized in our SPIBPro in next post.

Note that the contents of this and next posts are also written in conjunction with the paper submission to the IBIS summit later this year.

Model descriptions:

[The info. here are summarized from the IBIS cookbook V4]

A differential buffer can be categorized into three different types:

  • Pseudo differential: shown in the rightmost of the picture below. The “P” and “N” driving portion are mutually independent. There is no coupling between these two pins in the circuit.
  • Half differential: shown in the middle. the pull-down portion of the “P” and “N” are coupled via a current source or shared load.
  • True differential: shown in the leftmost. Both the pull-up and pull-down circuits are coupled via “current-mirror” and shared current source.

DiffBuffer

For pseudo differential type buffer, the single ended buffer modeling flow can be applied directly. One only needs to generate these two “uncoupled” buffer separately (or use the same model with input flipped) and describe their differential behavior using the “diff pin” IBIS keyword:

DiffPin

Note that this “Diff Pin” keyword is also optional. So one may also simply instantiates two instance of the single ended buffer and use them together outside the IBIS file, such as in the spice netlist.

True/half differential buffer:

For a true/half differential buffers, the existence of “coupling” between P and N circuits must be described using existing IBIS syntax. There are two ways to do so: either using “series” or “external model” keywords:

  • Series: There are two places series needs to be declared: the first one is “[series pin mapping]” in the model headers and the other one is “series model” of the “coupling” circuit in the “[model]” descriptions:SerPinTake the picture shown above as an example. Under the “Series Pin Mapping” keyword, a model named “R_SERIES_100” is declared to connect pin 1 to and 2. Another instance of such model also sit between pin 3 and 4. Pin 1~4 each has its own model defined in other part of the ibis file already. For example, pin 1, 2 may have an output buffer connected while pin 3, 4 have open-drain buffers. This R_SERIES_100 model must have type “Series” defined as part of the IBIS file. Since “Series” is one of predefined IBIS model type, its contents (keywords) are not free form and must be one or more of the following series elements: R, L, C, Series current and Series MOSFET which contains up to 100 I/V tables under different biasing voltages.SerElem
  • External Model:  In the IBIS spec, four different differential specific model types are also support.:DiffMdlTypeThese differential model types must be implemented with language such as Spice, VHDL-AMS, Verlog-AMS, IBIS Interconnect Spice Sub-circuits (IBIS ISS) and declared with the “External” model section. These languages provides much more flexibility in terms of modeling capabilities, yet they also diminish the portability of the generated IBIS model.DiffMdlDescIn the example above, a separate file “ideal_driver.vhd” must be provided outside the IBIS file and an “entity” of name “driver_ideal” needs to be defined. The port connections is described using reserved keywords after the “Ports” statement. In the IBIS Spec, all possible ports are pre-defined and the declaring order here must match their definitions in the associated Verilog/VHDL etc file.

Modeling process:

Pure method 2 implementation, i.e. coding in language like Verilog/VHDL, is up to the model developer. On the other hand, the syntax and structure of series elements, used in method 1, is strictly limited. Thus the modeling process below focuses on method 1 above, i.e. the series model method.

As the coupled half/true buffer described by separated single drivers and connected via a series model for the coupling, the modeling flow must somehow be able to extract the data for these blocks independently so that they can be described with conformed IBIS keyword and reproduce the response when putting together.

For the “Driver” block, it’s a generic input/output/IO type buffer. So tables such as pull-up/pull-down/power-clamp/ground-clamp, when applicable, must be extracted. So are the transient waveform under different test loads. The extraction must be done such that the coupling with other half can be separately as an independent “series element” model.DiffDrvDataExt

  • PU/PD: A two dimensional table produced by sweeping at both the P and N outputs are performed, as shown in the upper right block above. One major assumption for PU/PD extraction is that when both output pins are at the same voltage level, there will be no current flowing through the series model (i.e. the coupling portion). So in that condition, we can assume the series model does not exist at all. The current flows into or out of the buffer at that point is considered as “common-mode” current. This common-mode info is than extracted from surface table and modeled as pull-up and pull-down IV. The whole surface is then shifted vertically by subtracting this common-mode current at every x/y grids. The resulting surface table is considered as the differential current (resistance portion of the series model) and will affect the output when voltage at N and P are different. Note that to satisfy the aforementioned assumption of zero current through series element at common mode voltages, the sweep done at the P/N pins must step very slowly such that the charging/discharging of the reactive components, such as C_Comp or C_Diff can be neglected.
  • PC/GC: Clamp table in IBIS are always optional. On the other hand, when a circuit have current which always exist and can’t be turned-off, these data must go to PC/GC to avoid being double counted. That is, the PC/GC table must account for current which always exist and this current will be subtracted from the PU/PD table during the modeling process to avoid being double counted. Situations like this include the ESD circuitry and/or pull-up portion of the half-differential buffer. For PC/GC extraction, common mode sweep like that for PU/PD needs to be performed while buffer is disabled (put into high-Z state). The common mode current will then be extracted as the PC/GC table. Differential mode current is disregarded as it has been accounted for in the PU/PD differential data.DiffDrvCDifExt
  • C_Comp/C_Diff: An important step here is to extract and compute the differential capacitance, C_Diff. This has to be done before VT simulation. It is because C_Diff will affect the transient state differential current, thus it must be accounted for before VT transient simulation. The extracted value will be part of the series element model by adding as C_series. In real design, the C_Comp and C_Diff are both frequency and voltage dependent. So an user can perform 2D sweep in these two dimensions and select a proper value for C_Diff. This formula to calculate c_diff/c_comp is shown above.
  • VT waveform: With both static (resistive) data being processed from PU/PD/PC/GC simulation, and the reactive data, i.e. C_Diff, computed from the C_Die simulation. these two info can then be modeled together to be eliminated during VT simulation. The considerations is that: once the effect of coupling part is removed, then the remaining data can be considered as single ended and model accordingly. Having that said, reader should realize that there are much more details involved in this step. For example, how do you model these data? Two approaches may be considered:
    • Use E/F/G/H control element: the surface data can be fit to minimize the mean squared error. Resulting coefficients are then realized using poly function or like of a control sources.
    • Create a series element: decide the proper components and then create a series element.

Interested user may find some study presented in IBIS summit, such as [this one]

As one can see, the modeling process for a differential buffer is not linear and a certain order must be followed. Inaccurate common-mode extraction will affect IV tables and cause dc-mismatch of the steady state. In addition, failure to extract C_Diff and model different current accurately will greatly impact the accuracy of the transient data. We will see how a modeling flow should take these into account and be validated easily. These are also design concepts behind our SPIBPro flow.

 

Simulate IBIS Data with Free Spice [Q & A]

I presented our developed “Ibis to Spice” flow last week at the IBIS summit of 2016 DesignCon. It was well received. In this post, I recapture discussions raised in the Q & A section for those who didn’t attend the summit.

  • Q: Does free spice have IBIS parser?
    • A: Nope. Neither NgSpice nor other free spice simulators that I am aware of come with IBIS parser. As a matter of fact, none of the free spice supports IBIS model at this moment. However, interested user may see what pyIBIS has offered and make use accordingly. Lastly, accordingly to the IBIS website, one can pay about $2500 and obtain source codes of golden parser. In our tool, we do have built-in IBIS parser implemented in cross-platform format to support this Ibis to spice conversion flow.
  • Q: Where is ASRC? I can’t find it in Berkeley Spice 3f5’s manual.
    • A: Please try NgSpice instead. Free spice simulator like NgSpice, EiSpice, LTSpice or even TISpice are all Berkeley spice derivatives. Their syntax may be deviates a little bit but the proposed algorithm should work on these simulators with very small syntax changes.
  • Q: Does this template only work for buffer with symmetric rising and falling?
    • A: Not true. The template we have developed will digitize the analog input to a digital one… one with very sharp rising/falling edges. The detection is based on a voltage threshold. So it does not matter whether the rising and falling time of inputs are symmetric or not.
  • Q: Can this spice template account for buffer overclocking?
    • A: IBIS spec does not specify how a circuit simulator should behave when a buffer is overclocked. So the results of a overclocked buffer maybe already circuit simulator dependent. Ideally, a mechanism will make sure current contributed by the pull-up/pull-down branches stays continuous during buffer clocking. This way voltage will also be continuous instead of sudden jump or drop. I suppose a more complicated mechanism can be added to the template to account for situations like these. However, it is not there at this moment.
  • Q: Will there be any issue if I simulate the buffer very long, say for 1us?
    • A: If the parameter used in ASRC element is too small, for example the original value of time “t” used by switching coefficients, then it will be rounded and cause incorrect results. This is because we usually use ps range as time step. That’s why we scale the value up in the template and also account for for this scaling in the pre-calculated Ku/Kd coefficients. If the scaling is 1E9 (as based line for for 1ns), then 1ps will becomes 1E-3 and 1us will be 1E3, both should not be rounded by the simulator. Thus I think this approach should work for long simulation time, such as 1us.
  • Q: Why is T-Line needed? Isn’t it the case that simulator has buffer history which can be accessed for the same reason?
    • A: Yes, most of circuit simulators have a buffer to record nodal voltage history. This way if one decides to use Trapezoidal rule to calculate first derivative, the previous two points can be used. Note that I am using purely spice syntax and element to mimic the dv/dt and delay effect. I have no access to the underlying C/C++ buffer structure, that’s why lossless T-Line is used. With this approach, if one want to get history data of more than one time step before, cascaded T-Line can be used. At each junction of the T-Elements stores the voltage history of previous n time steps (each T-element is one time step delay long).
  • Q: Why is “fine tuning” for handing off needed? Ku/Kd should have reached steady state already before transition.
    • A: When buffer is not overclocked, Ku/Kd will reach its steady states. For example, Kur will be come 1.0 at the end of rising transition so that pull-up branch is fully “ON”, while Kdr will becomes 0.0 so that pull-down branch is fully “off”. The reason we need the “fine tuning” to make sure “handing-off” between Kur and Kuf (Kdr, Kdf) is smooth is purely due to the spice syntax based implementation. Theoretically, it’s not needed. User may take the provided template and comment out the “fine-tuning” section and observe the spikes. If you find out other simpler method to filter out these spikes, please let me know.
  • Q: What is next step?
    • A: We plan to continue the path to support system simulation using free spice. Take NgSpice as an example, there are three approaches to add support for a new device, such as IBIS:
      1. Add as a native device like R, L, C: this requires knowledge of circuit simulator algorithm, the code structure and flow. Developer needs to codes in c and extend the arrays of existing device structure. It requires lots of effort and may take longer time to complete.
      2. Add a code-model library using XSpice extension: this requires knowledge of XSpice extension to NgSpice. Developer still needs to code in c still and compile code model as dynamic linked library, but the structure and flow is limited to XSpice device. There is no need to touch underlying spice infrastructure. It’s easier comparing to approach “1”. The compiled library is platform and os dependent, and is a little bit tedious in debugging (need to attach to a running process as the code model is loaded library). However, the developed code model library can run on any existing or released NgSpice simulator.
      3. Use ASRC element to model the behavior outside the simulator: This is what we have accomplished so far. It works well but performance is not that great.

In current version of NgSpice, IBIS, W-element and S-parameter device needs to be added to support system analysis. We plan to continue to approach “2” above and provide support of these devices in the form of  code-model library.

Simulate IBIS Data with Free Spice [Part 2]

Note:

Interested reader of this post may also want to follow up with the [Part 1] of same topic and try out the [Free Web App] which realizes this conversion flow.

For the motivation behind our Ibis to spice development and Ramp based Ibis2Spice conversion flow, please see [previous post]

Waveform based Ibis2Spice conversion:

Typical IBIS model simulation makes use of its waveform data. Thus a converted spice model from its IBIS model must be able to use these timing related data to have good correlations .

Both voltage and current related data can be timing related: In IBIS model of version 3.2 and up, one ore more “rising” and “falling” waveform may be included. Typically, two set of waveform under two different loading condition will be provided. In case only one waveform is available, simulator may convert ramp data to form additional set of waveform. In IBIS model version 5.1 and up, “composite current” is also included in order to account for simultaneous switching noise. This current information is point-to-point aligned with their voltage counterpart to withdraw current from power supply terminals of the buffer model. Note that what we call “timing” here is actually “elapsed time” after buffer’s input signal switches from low to high or high to low. Two sets of switching coefficients, Ku/Kd for rise and fall transition respectively, are used to mimic the gradually turning on/off the PU/PD branches such that under test load condition, the voltage and current waveform data can be reproduced accurately. By using these swtching coefficients, buffer’s strength will be scaled accordingly depending on the load. User may see [our past s post] for buffer’s operation in more details.

Once we have the related  info. then we can create building blocks and assembled exactly as  they are meant to be used in IBIS model, but now in spice format. So the schematic of our waveform based converted spice model will look exactly like a “stock” IBIS:

Selection_209

To achieve this, i.e. be able to use timing data in converted spice model and run like a stock IBIS, three main challenges need to be overcome:

  • Extract “elapsed time” info. during simulation;
  • Compute the switching coefficients Ku(t)/Kd(t) for both rise and fall transitions;
  • Assembled the block together and mimic actual IBIS operation.

Extract “elapsed time” info. during simulation:

In free spice (e.g. NgSpice)’s ASRC (Arbitrary SouRCe) element, function can be used.  One function parameter which may be used for this timing purpose is the “time” variable:

Selection_210

By using this “time” variable, we can calculate and extract the “elapsed time” info.:

  1. First, we need to be able to differentiate between rising and falling from input. This is because both rising and falling transition needs its own Ku(t)/Kd(t) table. After converting input analog to a digital signal, we can use first directive such that rising and falling can be differentiated. To compute dv/dt, a transmission line (TLine) can be used. Other choice may include using a Laplace equation in control element to mimic the “delay”. The decision of using TLine for the delay is also because in free spice, the control element does not support “delay” parameter like other commercial simulator does. Measurement of voltage differentials across TLine gives us dv and TLine delay is a single simulation step dT. TLine is terminated with its characteristic impedance so no reflection will incur at the far end.Selection_211
  2.  Once rising and falling transitions are separated, we can multiply the converted dv/dt with “time” variable and also use a “latch” to hold the signal at the end of the transition. Again, we use a matched TLine as a “latch”: by feeding near end voltage with the far end one, which is data of previous time step, we can have a latch holding signal indefinitely. The latched signal is show in red of the upper plot below. With another “time” variable subtracting this latched signal, we can then obtain “elapsed” time as show in the lower half of the plot. One caveat is that in typical time domain simulation for SI/PI, time step is pico-second range. Spice will ignore variables of very small value, such as 1E-12 scale, thus a proper scaling will be necessary.Selection_212

Compute the switching coefficients Ku(t)/Kd(t):

With the elapsed time, t, now available, we can compute Ku(t)/Kd(t). There are numerous paper and presentation detailing the algorithms. For example, one may compute this switching coefficients from model data based on the first reference below, or use measured based approach based on spice simulation detailed in second reference:

  1. Extraction of Transient Behavioral Model of Digital I/O Buffers from IBIS, Peivand Tehrani et. al, 1996 Electronic Components and Technology Conference.
  2. General K-table Extraction w/ Spice, Bob Ross, IBIS Summit, DesignCon 2015

In general, waveform data from two different test loads are used to solve two unknowns at each time points during rising/falling transition. This calculation is done offline in advance and the resulting data are again realized in the form of ASRC devices. The PU/PD/PC/GC branches will be exactly PWL based ASRC, only that the PU/PD takes one more input (NKUX, NKDX below) which is the switching coefficients, Ku(t) and Kd(t):Selection_205

Ku/Kd are ASRC with pre-computed coefficients and variable of “elapsed time” computed from the first step:Selection_206

PU/PD branches draw associated  current based on the Ku/Kd value presented as voltage value at node 5 below:Selection_207

Assembled the block together in spice formats:

While netlisting different PU/PD/PC/GC/Ku/Kd blocks together is trivial, a nuance becomes apparent during validation: the assembled spice model will have “spikes” right before and after each transitions which makes the model unusable.

Further investigation reveals how devil is hiding in such details… As explained in our previous post, there is a “hand-off” during transitions:

That is, when we “run out of” Ku/Kd for, say rising transition, we keep using the last point data until buffer is about to switch from high to low state. At that point, “hand-off” happens so that Ku/Kd for falling transition will be used. The “hand-off” process is triggered from input and even a time step delay in terms of table usage switching will cause huge spike and cause problem. With carefully looking into different signals in details and figure out their relation in timing, we have a solution which filters out all the spikes and produce a well correlated results, as shown below:

Selection_202

Both the voltage waveform and drawing current matches quite well. The small delay is insignificant because it’s a constant added to each cycle and will be eliminated in the eye plot.

As we introduce two single time step delay transmission line elements to calculate dv/dt and serve as sample-and-hold functions, they also limit the simulation time step a simulator can take. As a result, the converted model run much slower compare to its IBIS counterpart run in other commercial simulator under test load condition. In extracted channel simulation, the “slow down” becomes less apparent as there are multiple transmission line segments present in the topology. The time step limit introduced by the shortest transmission line is common to all simulators which may not be much longer than the TLines we have introduced.

Another note worth mentioning is that while this post focuses mainly about Ku/Kd and voltage waveform data, Composite current and their ISSO_PU/ISSO_PD are treated similarly: An ASRC constructed from composite current based on time aligned point with voltage counterpart is used to draw current. Another set of scaling table, based on ISSO data ares computed off line in advance in order to take the non-ideal voltage supply into account.

With both ramp and waveform based Ibis to spice flow being available, we are able to perform SI/PI simulation using IBIS on free spice simulators. Such flow has been integrated in our SPIBPro flow as shown below:

Ibis to Spice function is SPIBPro

Ibis to Spice function is SPIBPro

This post is written in preparation for the upcoming IBIS Summit at DesignCon 2016, where we will present this Ibis to spice flow. The presented slides and associated examples are shown below. Reader may also download them from IBIS's official page for this summit.

Presented slides: View externally [HERE]

Example files: [click to download]