IBIS’s buffer overclocking

In SPIBPro’s Q3’19 update, we added several enhancements for handling buffer overclocking. While the technical details of these capabilities are reserved for our customers, the general problem statements and solutions are nevertheless worth sharing. This short post covers the buffer overclocking topic and also serves as a future reference for our customers.

Some of the pictures used here are from previous IBIS summit presentations. Readers may find references to these slides toward the end of this post.

What is buffer overclocking:

We all know that frequency is the reciprocal of the period (f = 1/T). When an IBIS model is used, the shortest possible period is the sum of the complete rising and falling transition times given by the VT data tables:

This shortest possible time span thus dictates the highest possible operating frequency. When an input stimulus exceeds this frequency, the buffer is being overclocked. An overclocked buffer does not have sufficient time to complete the rising transition, the falling transition, or both. As a result, before its pad can reach steady state (indicated by the green curve below), it has to transition in the other direction again. The outcome is that there may be discontinuities, glitches, or non-convergence during simulation.

The IBIS spec doesn’t spell out how a simulator should handle this case. As a result, an overclocked buffer may behave differently across different EDA tools. That’s why the IBIS cookbook suggests that a model maker generate VT tables of short enough duration that the buffer can operate fast enough to avoid being overclocked during normal usage. For example, if a USB3 buffer has a total VT duration longer than 200ps, it will certainly be overclocked when driven with a normal 5Gbps PRBS signal (whose unit interval is only 200ps).
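As a quick sanity check of this relationship, below is a minimal Python sketch comparing each table duration against one unit interval (the strictest case, a lone 0-1-0 bit). The durations are made-up placeholders, not data from any real model:

```python
# Quick overclocking check: each complete transition in the VT tables should
# fit within one unit interval (UI) of the intended data rate, otherwise a
# lone 0->1->0 (or 1->0->1) bit will overclock the buffer.
# Durations below are illustrative placeholders, not real model data.

t_rise = 260e-12          # [Rising Waveform] table duration, seconds
t_fall = 240e-12          # [Falling Waveform] table duration, seconds
ui     = 1.0 / 5.0e9      # 5 Gbps PRBS -> 200 ps unit interval

for name, dur in (("rising", t_rise), ("falling", t_fall)):
    status = "overclocked" if dur > ui else "ok"
    print(f"{name}: table duration {dur*1e12:.0f} ps vs UI {ui*1e12:.0f} ps -> {status}")

# Shortest period the model supports for a complete rise+fall cycle:
print(f"max clean toggle frequency: {1.0/(t_rise + t_fall)/1e9:.2f} GHz")
```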

 

Why buffer overclocking is more problematic in a power-aware model:

To make a buffer’s VT table duration shorter, a common approach is to remove the steady-state (flat) portion of the leading and trailing waveform. However, when a power-aware model is considered, this presents some problems:

The picture above shows both the VT data (rising only here) and the IT data together. IT is the composite current for a DDR DQ-like power-aware model. Since the extracted current data takes the pre-driver into account, there is often “activity” (i.e., non-zero current) long before the voltage at the pad starts its transition. As a result, the amount of leading steady state that can be removed in a power-aware model is often bounded by the IT table, and this limitation is usually severe enough that the trimmed model still can’t meet the spec-required operating frequency.

The picture above (from a summit paper) highlights this issue more clearly. It’s also worth mentioning that the “T” axes of the VT and IT tables are time-synchronized, so one can’t treat these two tables separately or shift their x-axis time points at will.

Solution to the overclocking issue:

There are several approaches to address this overclocking issue:

  1. A model user may specify the amount of delay to be trimmed by the simulator, or
  2. A simulator may apply “windowing” automatically to determine the trimming values and apply them before simulation, or
  3. A model maker can create a proper model using the approaches described in the sections below.

The first approach is not only time-consuming but can also easily lead to usage errors or non-convergence. The second may or may not produce the desired results; it is simulator dependent, and a model maker/user may not have any control over it. The third approach is the preferred one, as only the model maker knows how the buffer behaves and the exact amount of delay to remove.

Caveats in model data windowing:

One common mistake when trimming data to avoid overclocking is processing the VT or IT tables individually. For example, if the VT table simulated from one of the fixtures has a longer delay than the others, a model maker may be tempted to remove a different amount of delay from each VT table. An even more severe problem arises when VT and IT are trimmed independently. All of these practices are erroneous, for the following reasons:

  • A simulator usually needs to compute “switching coefficients” to apply to the IT table data. Their values are calculated from the VT tables of the different fixture set-ups, so trimming different amounts from VT tables of different fixtures may cause singularities when solving for the switching coefficients.
    • That means all [Rising Waveform] tables should be trimmed by the same value;
    • Similarly, all [Falling Waveform] tables should be trimmed by the same value.
  • A buffer’s rising and falling responses may be used together in a differential buffer. If their timing references are inconsistent, the P and N terminals will not make transitions at the same time, which may cause “ledges” in the middle of the voltage swings.
    • That means the [Rising Waveform] and [Falling Waveform] tables need to be trimmed by the same value.
  • As mentioned earlier, the IT and VT tables need to be time-synchronized so that the gate modulation effect can be properly accounted for.
    • That means VT and IT need to be trimmed by the same value.

To summarize, a model maker needs to trim ALL time-dependent table data together by the same amount. Trimming trailing time points is relatively easy (simply removing those points will do), but trimming from the waveform’s beginning also requires shifting the x-axis points to start from t=0. So this is usually better done with a modeling tool or flow.
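As an illustration of these rules, here is a minimal Python sketch, assuming uniformly sampled tables stored as (time, value) arrays with hypothetical names: one common leading delay is measured across ALL VT and IT tables, and every table is trimmed by that same amount with its time axis re-zeroed:

```python
import numpy as np

def leading_delay(t, v, frac=0.01):
    """Time before a waveform first departs from its initial value by more
    than `frac` of its total swing (a simple activity threshold)."""
    swing = np.max(v) - np.min(v)
    active = np.nonzero(np.abs(v - v[0]) > frac * swing)[0]
    return t[active[0]] if active.size else t[-1]

def trim_tables(tables, delay):
    """Trim the same leading `delay` from every (t, v) table, re-zero t."""
    out = []
    for t, v in tables:
        keep = t >= delay
        out.append((t[keep] - delay, v[keep]))  # shift so first point is t=0
    return out

# `vt_tables` / `it_tables` are hypothetical lists of (time, value) arrays,
# one per fixture.  The common trim is the *minimum* delay over ALL tables,
# so no table loses actual transition data:
# common = min(leading_delay(t, v) for t, v in vt_tables + it_tables)
# vt_tables = trim_tables(vt_tables, common)
# it_tables = trim_tables(it_tables, common)
```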

Handling the overclocking:

  • [Initial Delay] keyword:

Because the pre-driver current shows activity much earlier than the pad’s voltage transition, a simulator can’t do much if both are bundled together in a single device model. However, if the IT table can be pulled out of the IBIS device simulation model and handled as a time-dependent current source, things become much easier. That’s why IBIS v6.1 introduced two new sub-parameters for this purpose: V-T and I-T under [Initial Delay]:

Before using these keywords, one should carefully read the related BIRD document (referenced at the end) and the associated section in the IBIS spec:

For example, it is the simulator’s responsibility to remove the specified delays from the “raw” model data before simulation. A model maker should therefore NOT remove the delay themselves; simply specifying the delay values will do.

That is to say, the modeling process may become a two-pass one. In the first pass, once the un-trimmed model data are produced, the model maker can use an inspection tool (such as the one built into our SPIBPro) to plot all VT tables together and measure the common minimal delay to remove, then do the same for all IT tables.

In the second pass, one or both of these two values are specified under the [Initial Delay] keyword to complete the modeling process.

  • Data trimming and auto-tuning:

For a power-aware model that has IT data, the [Initial Delay] solution is the best one. For a traditional, voltage-only buffer, a model maker has more choices.

For example, a model maker can start by trimming the trailing steady-state data, then increase the trimming amount, or also trim the leading portion, until the supported frequency meets the demand.

This method does not rely on the simulator’s support for [Initial Delay], so an older simulator will also work. Meanwhile, a caveat worth noting is that the distortion of the model data depends on the amount of data removed:

So if an excessive amount of data is removed, the resulting model may not pass the IBIS checker (it will report DC mismatch violations). In this case, further tuning is needed. Our SPIBPro performs “AutoTune” automatically during the trimming process, so a trimmed buffer will almost always be error/warning free.

Reference:

The overclocking issue has been around since the early 2000s. One of the projects early in my career was to address non-convergence in Mentor’s simulator caused by user-trimmed models. Interested readers may find the following references useful and worth reading:

IBIS’s package model

Recently I gave a training session to some of our customers regarding package model handling in an IBIS file/model. Most of the treatment of this topic falls within the IBIS spec and the extraction and simulation tools used, rather than a modeling tool like our modeling suite or SPIBPro. Thus I think it will be beneficial to organize the materials and share them through this blog post, which also serves as a future reference for our customers.

Package model:

An IBIS file may contain one or more IBIS models. Each IBIS model has its own IV/VT/IT tables describing the behavior of the Tx or Rx silicon design attached to that signal pin. The effect of the package model, on the other hand, is not included in those table data. Information about the package model is “linked” to the silicon portion of the buffer either through IBIS keywords within the same IBIS file, or as a separate file in a different format. It’s the simulator’s job to take both the Tx/Rx silicon behavior and the package model into account during circuit simulation.

In general, there are two ways to link the IBIS silicon behavior with the package model, and each has two possible routes to achieve the same purpose:

Add package model info:

    1. Inside an IBIS file:
      • As a lumped model
      • As a distributed model
    2. Outside an IBIS file:
      • ICM (IBIS Interconnect Modeling)
      • General spice circuit elements

More details are discussed below. Please note that these package models should be generated by an extraction tool such as HFSS, Q3D, etc. The IBIS model or file mentioned here simply serves as a placeholder; the circuit simulator will use all these data together during simulation.

 

Package as a lumped model inside an IBIS model:

From the IBIS spec, we can see two related keywords for this lumped model representation: [Package] and [Pin].

To be more specific, please see this example model:

These values represent the parasitic structures as shown below:

This is the simplest, yet least accurate, method of describing the package information. While the [Package] keyword and its data are required, the parasitic portion of the [Pin] keyword is optional. When both are present, the values in the [Pin] section supersede (i.e., override) those defined in the [Package] section. Thus, a model maker can use the [Pin] keyword to introduce pin-specific lumped parasitics while using the [Package] keyword for general or default values.
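As an illustration of the syntax (the values and pin/model names below are invented for demonstration, not taken from any real component), a fragment using both keywords might look like this, with pin 2 overriding the [Package] defaults:

```
[Package]
| variable       typ        min        max
R_pkg           0.15       0.10       0.20
L_pkg           4.0nH      3.0nH      5.0nH
C_pkg           0.6pF      0.4pF      0.8pF
|
[Pin]  signal_name  model_name  R_pin   L_pin   C_pin
1      DQ0          DQ_IO       0.15    4.0nH   0.6pF
2      DQ1          DQ_IO       0.25    6.5nH   1.1pF
```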

 

Package as a distributed model inside an IBIS file:

A more comprehensive (distributed) package model may be introduced into an IBIS file and model using the [Package Model] and [Define Package Model] keywords:

Using this method, the component declares the name of the package model it uses (thus the value of [Package Model] is simply a model name string), and the detailed contents of that package model are defined in a separate section under the top-level keyword [Define Package Model] in the same IBIS file (usually toward the end of the file, after all [Component] descriptions):

What’s inside the defined package model? Since it’s a “distributed” model and belongs to a component, it contains terminals mapped to the component’s pins as well as frequency-dependent R/L/G/C matrices, etc. The dimension of the matrices equals the number of mapped pins. That said, the S-parameter format is not supported here; to use an S-parameter as a package model, one must use one of the two remaining methods described below.

As mentioned earlier, an IBIS file or model containing package model info is just a placeholder; the package model’s effects are not included in the IBIS model’s IV/VT/IT table data. The simulator has the responsibility of combining the two: either when computing switching coefficients for the lumped version, or when “stamping” extra elements into the nodal matrices for a distributed one. This also means that different EDA tools may have different methods, syntax, or GUIs for introducing the package model data:

 

Using ICM as a separate file for the package model outside an IBIS file:

The ICM (IBIS Interconnect Modeling) spec is a separate top-level spec document, similar to the IBIS spec itself. So to learn more about it, one has to read through the ICM spec document.

While it is more or less similar to the [Define Package Model] approach mentioned earlier in terms of frequency-dependent R/L/G/C matrices, it has one unique strength in particular: it supports S-parameters:

So if your extraction tool can only produce the S-parameter format, then ICM is the way to go. The table below compares the different interconnect model formats (PKG is the defined package model):

Using a general circuit description for the package model outside an IBIS file:

Last but not least, one may also introduce the package model contents using a non-IBIS/ICM-specific format. For example, a model maker may provide a spice .subckt representing the package model (converted from frequency-domain data using a tool like Broadband SPICE or IdEM, possibly with further hierarchy) or simply an S-parameter. The connectivity should be documented in the simulation guide or usage manual of the IBIS model, and it’s the end user’s responsibility to include this as one of the stages in the simulation topology. For example, Intel’s platform design guides use such a method for an ICH package model:

There are pros and cons to this approach: a model user has more work to do to introduce the package effects. On the other hand, this method has the best compatibility: it does not require support for a specific IBIS/ICM version in the simulator used, and it also allows probing or sweeping internals of the package model for debugging or optimization purposes.

IBIS-AMI: Study of DDR Asymmetric Rt/Ft in Existing IBIS-AMI Flow

[This blog post is written in preparation for the presentation of the same title to be given at the 2019 DesignCon IBIS Summit. Presentation slides and audio recording are linked at the bottom of this post.]

This paper is written by both Wei-hsing Huang (principal consultant at SPISim USA) and Wei-kai Shih, who is Tokyo based.

Motivation:

Here in the US, one of the IBIS committee’s working groups, IBIS-ATM (advanced technology modeling), has a regular meeting on Tuesdays. I try to call in whenever possible to gain insight into upcoming modeling trends. During mid-2018, DDR5-related topics were brought up: the existing AMI reference flow described in the spec focuses on differential/SERDES designs. For example, the stimulus waveform ranges from -0.5 to 0.5 and/or a single impulse response is used for analysis, mostly assuming symmetric rise time (Rt) and fall time (Ft). Whether this reference flow can be applied to DDR, which may have asymmetric Rt/Ft and single-ended signals like DQ, was the center of discussion. Different EDA companies in this working group have different opinions: some think the flow can be used directly with minimal change, while others think it has fundamental shortcomings for DDR. The thing about IBIS spec changes is that whoever thinks the current version has deficiencies needs to write a “buffer issue resolution document (BIRD)”, and doing so will inevitably disclose trade secrets or expose shortcomings of one’s own tool. As a result, while there are companies that think a change may be needed, no flow change has been proposed at this point. As a model maker, I wondered how the existing flow could be applied to DDR without major change. This study demonstrates “one” possible implementation. Existing EDA companies may have more sophisticated algorithms/implementations to support this asymmetric condition, but the existence of “one” such flow may convince model makers that it’s time to think about how DDR AMI can be implemented, rather than waiting for an unlikely spec change.

AMI_Init:

There are both “statistical” and “bit-by-bit” flows in channel analysis. In either case, the first step an EDA tool performs before calling the AMI model is “channel calibration”. According to the spec, the impulse response of the channel, which includes the analog buffer, is obtained here. For a SERDES design, which has no asymmetric Rt/Ft issue, this impulse is then sent through the Tx AMI model followed by the Rx AMI model; the resulting impulse response is used to calculate the probability density function (PDF), which is integrated into the cumulative density function (CDF) to obtain bathtub plots, etc.

The textbook impulse response comes from a “delta” input applied over an infinitesimally small time step. In a real simulation there is no such thing: the minimal step used by a simulator is usually 1ps or more, and a buffer will not toggle from low to high and back to low within a single time step. So in practice, a simulator often uses the step response and takes its derivative to get the impulse response. Now the problem arises: for an analog channel with asymmetric Rt/Ft, the two step responses (ignoring sign) are different. That means we have two different impulse responses; which one should be sent to the AMI models? A note up front: it’s the EDA tool that sets up the calibration, so it has access to any nodal information it needs, such as the pads of the Tx and Rx analog buffers.

Asymmetric Rt/Ft:

One may think that there is no limitation saying an AMI model can only be called once, so theoretically a simulator could run the analysis flow twice: the impulse calculated from the rising step response used the first time, and the one from the falling step response the second time. However, not only is this inefficient, but a model may not be implemented robustly; calling AMI_Init again right after AMI_Close within the same process may crash if the model pointer was not released completely. Doing so may thus hamper a simulator’s robustness.

As depicted in the picture above, if a simulator uses a long-UI pulse to calibrate the channel, both rising and falling step responses are included in one simulation. Let the data captured at the Tx analog pad be X1 and X2 for the rising and falling portions respectively; the data captured at the Rx analog pad, Y1 and Y2, will be X1 and X2 convolved with the interconnect’s transfer function, which is LTI. If we derive a transfer function Xform(t) between X1 and X2, then the same Xform(t) should also transform between Y1 and Y2. That means if a simulator can calculate Xform(t) itself, then regardless of whether the impulse response it sends to the AMI models is calculated from the rising or falling step response, it can always “reconstruct” the result for the other type of impulse response using this Xform(t) function.

To prove this concept, we wrote a simple MATLAB script taking step inputs of different slew rates, say inp1 and inp2. It calculates the Xform(t) function from both inputs and then reconstructs the response out2′ from out1. Overlaying the nominal output out2 and the reconstructed out2′ shows that they match very well, which proves the concept.
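The original proof was a MATLAB script; below is an equivalent Python sketch of the same idea, with synthetic ramp edges standing in for the pad waveforms X1/X2 and a simple low-pass filter standing in for the interconnect (all names and values are illustrative only):

```python
import numpy as np

def make_step(n, dt, t0, tau):
    """Unit step with finite slew, modeled as a simple linear ramp."""
    t = np.arange(n) * dt
    return np.clip((t - t0) / tau, 0.0, 1.0)

n, dt = 4096, 1e-12
inp1 = make_step(n, dt, 50e-12, 30e-12)   # faster edge (e.g. rising, X1)
inp2 = make_step(n, dt, 50e-12, 70e-12)   # slower edge (falling, sign ignored, X2)

# Any LTI channel will do: here a lossy low-pass for illustration.
h = np.exp(-np.arange(n) * dt / 40e-12); h /= h.sum()
out1 = np.convolve(inp1, h)[:n]           # Y1 at the Rx pad
out2 = np.convolve(inp2, h)[:n]           # Y2 at the Rx pad

# Transfer function between the two Tx-pad waveforms...
eps = 1e-12                               # regularizes the deconvolution
Xform = np.fft.fft(inp2) / (np.fft.fft(inp1) + eps)

# ...applied to the Rx response of edge 1 reconstructs that of edge 2.
out2_rec = np.real(np.fft.ifft(np.fft.fft(out1) * Xform))
print("max reconstruction error:", np.max(np.abs(out2_rec - out2)))
```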

Once we have the responses for both slew rates, we can construct their respective eyes and then use different portions of each to construct a synthesized eye. Such an eye will not be symmetric like one calculated for SERDES.

When calculating the PDF for the asymmetric case, one also needs to consider the preceding bits’ values and use a tree-like structure to keep track of possible bit sequences. For a typical SERDES bit sequence, if encoding is not considered, each bit is 50% one and 50% zero, and the PDF is constructed on that assumption. In the asymmetric case, however, if the data used at the cursor is from the rising response, then the cursor bit must be 1 while (cursor - 1) must be 0. If (cursor - 2) is 1 again, then the tail of the falling response at (cursor - 1) will be superimposed onto the cursor data. That is, we can’t treat each bit as having the same 50% probability when constructing the PDF; it’s not a binomial distribution, since the occurrences are not independent. A simulator may need to determine the maximum bit depth to track first, then use that depth to form the tree of sequences leading to a rising or falling step at the cursor location, and finally superimpose them to construct the overall response.

AMI_GetWave:

According to the reference flow for the bit-by-bit case, the equalized Tx output derived from the digital bit sequence is convolved with the channel’s impulse response, and the resulting waveform is then sent to the Rx EQ for the final results. The Tx EQ, the Rx EQ, or both may not be LTI, so the aforementioned Xform(t) is not applicable.

As food for thought: the spec only says that in bit-by-bit mode, the output of the Tx AMI model is an equalized digital sequence, while the input to the Rx EQ must be the channel response to that sequence. Are there other ways to deliver such a response to the Rx while still accounting for the different Rt/Ft?

One example is shown in the top half of the picture above. If a simulator takes that equalized digital input and “simulates” it to get the final response, this simulation process will have taken the different Rt/Ft into account and produce valid results. However, this process is slow, and I don’t think any simulator does it this way. Furthermore, the spec specifically says the output needs to be “convolved” with the impulse response. First of all, that impulse can be from either the rising or falling edge. Secondly, even if we deconvolve with the input first (obtaining a sequence of delta responses) and then convolve with the pulse response (i.e., one simulated UI), will there be any issue?

From the plot above, we can see that when a pulse has different rising and falling slew rates, using superposition to construct 011… produces “glitches” on the trailing high-state portion. The severity of these glitches depends on how different Rt and Ft are. So using a pulse response here still does not work.

A simple MATLAB script has also been written to demonstrate the occurrence of such glitches. It shows that not only is convolving an impulse response with the Tx EQ’s output problematic; even convolving a full simulated pulse (which includes the asymmetric Rt/Ft effect) with the delta sequence (the original Tx EQ output deconvolved with one digital bit) is still problematic. Glitches happen for consecutive ones or zeros due to the mismatch of Rt and Ft. Thus one must use the rising-step and falling-step responses instead when doing this kind of convolution.
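Below is a minimal Python sketch of the same demonstration, using idealized ramp edges rather than simulated buffer data (all values are illustrative): superimposing shifted copies of an asymmetric-edge pulse for consecutive ones leaves a plateau level above 1.0, i.e., the glitch:

```python
import numpy as np

n, dt, ui = 8192, 1e-12, 200e-12
t = np.arange(n) * dt
nui = int(round(ui / dt))

def edge(t0, tau):                     # linear ramp edge with slew time tau
    return np.clip((t - t0) / tau, 0.0, 1.0)

# One simulated UI pulse with asymmetric edges: fast rise, slow fall.
pulse = edge(0.0, 20e-12) - edge(ui, 80e-12)

# Superimpose shifted pulses for the bit sequence 0 1 1 1 0.
bits = [0, 1, 1, 1, 0]
wave = sum(b * np.roll(pulse, i * nui) for i, b in enumerate(bits))

# On the consecutive-ones plateau the level should stay at 1.0, but the
# slow fall of pulse i overlaps the already-risen pulse i+1, so the sum
# overshoots: the "glitch" caused by Rt != Ft.
plateau = wave[nui:4 * nui]
print("max level on plateau:", plateau.max())   # > 1.0 when Rt != Ft
```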

Summary:

In this presentation, we discussed how the existing AMI flow may be applied to asymmetric Rt/Ft cases such as those often seen in DDR. A “smarter” EDA tool should be able to handle this situation without changing the spec’s reference flow. When channel analysis is performed in the “statistical” flow, an EDA tool can obtain waveform data at both the Tx and Rx analog buffers’ pads during the calibration process. Such data can be used to construct a transform function, Xform(t), with which the impulse response through the EQ can be reconstructed and an asymmetric eye built. A tree structure may be needed to keep track of the possible bit combinations. In the “bit-by-bit” flow, the current spec may be too specific, as it forces convolution of the Tx EQ’s output with the channel’s impulse response before sending it to the Rx EQ; such direct convolution may be problematic. A “smarter” simulator may calculate the result using a different method without changing the data output from the Tx EQ or the input to the Rx EQ. Step responses should be used, as different Rt/Ft will cause glitches with consecutive ones/zeros if the convolution method is used.

Links:

Presentation: [HERE] (http://www.spisim.com/support/paperetc/20180202_DesignConSummit_SPISim.pdf)

Audio recording (English): [HERE]

A channel analysis trilogy

Preface:

We at SPISim recently developed and released free web apps for various channel analysis tasks [Click HERE for overview]. While each product page gives a good description and demo of what the individual tool can do, we think it is also beneficial to put them all together in one flow, so that users get a better picture of the ideas behind these developments. Thus this post serves dual purposes: first, to explain how a channel analysis is usually performed; second, to show how one can perform this process using the apps directly from a web browser. For comparison, creating AMI models plus a license for this type of simulator usually costs tens of thousands of dollars up front. Now it can be done for free!

The big picture:

[Fig. 1, A SERDES system]

Let’s use the SERDES system shown in Fig. 1 above as an example. For other interfaces such as DDR, please see our previous blog post for considerations about similar applications.

A SERDES system uses a point-to-point topology. In Fig. 1, the middle block enclosed in blue represents the channel. It mainly contains passive interconnect such as the package, transmission lines, vias, connectors, or even cables. However, the channel may also include active components such as the Tx and Rx devices, usually represented by IBIS or spice models. Alternatively, we can pull these active components out of the channel and “merge” them into the EQ as “analog front end” stages; in that case, the channel becomes purely passive. One reason to do it this way may be that the IBIS models are not ready, or that no available simulator can include them as part of the simulation. This is usually the case when a free spice simulator is used, as none that I am aware of supports IBIS out of the box. In general, unless analog-front-end AMI models and a purely passive channel, represented as an S-parameter, are used together for the analysis, the active devices do need to be part of the channel characterization in order to obtain an accurate time-domain response.

The next step is to obtain or generate AMI models for the Tx and Rx EQ circuits. Interested readers may refer to several posts we have written about possible software architectures and methodologies.

Regarding the channel analysis process: a simulator will first convert the characterized channel, in S-parameter or step-response form, into an impulse response. Then, depending on the simulation mode specified, this response or the convolved bit sequence is fed into the Tx and Rx models respectively to obtain the overall response. For an LTI system, a permutation over all possible bit combinations is performed to calculate the PDF at each sampling point statistically, which is then integrated into a CDF to create a bathtub plot. For an NLTV system, bit-by-bit waveforms are accumulated and overlapped to plot the eye, and the BER may be extrapolated from there. While not covered here and not yet implemented, various jitter and noise components are also important when creating the stimulus and calculating the final results.
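The core mechanics of that conversion can be sketched in a few lines of Python; the RC-like step response and ideal NRZ stimulus below are synthetic, purely for illustration:

```python
import numpy as np

# Characterized channel: step response sampled every dt (synthetic here).
dt = 1e-12
t = np.arange(4096) * dt
step = 1.0 - np.exp(-np.clip(t - 100e-12, 0, None) / 50e-12)

# Impulse response = time derivative of the step response.
impulse = np.diff(step, prepend=step[0]) / dt

# Bit-by-bit mode: convolve the impulse with a PRBS-driven waveform.
ui = 100e-12
nui = int(round(ui / dt))
bits = np.random.default_rng(0).integers(0, 2, 64)
stimulus = np.repeat(bits, nui)                     # ideal NRZ drive
waveform = np.convolve(stimulus, impulse)[:stimulus.size] * dt
```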

The analysis procedure is detailed in Section 10.2 of the IBIS spec. To summarize briefly, a channel analysis includes three tasks: channel characterization, EQ model creation, and putting them all together for channel simulation.

Task 1. Channel Characterization:

The first step of channel analysis is to characterize the channel, the blue block in Fig. 1, in order to obtain its time-domain response. Even with an active Tx/Rx front end involved, the assumption here is that the characterized response is linear time-invariant (LTI). With an LTI input, a channel simulator can either perform statistical analysis (if the other EQ components are also LTI), or convolve the impulse response with a PRBS bit sequence to generate the full time-domain waveform and then process it with the NLTV EQ components to get the final time-domain waveform for eye analysis.

If the passive channel is from post-layout, the user will need a 3D extraction tool to obtain the interconnect’s S-parameter. The devil’s details here include making sure the S-parameter is of good quality: passive, causal, symmetric, asymptotic, etc. Also, depending on the simulator, converting the single-ended S-parameter to a mixed-mode/differential one may be needed. For the pre-layout case, the user first obtains or generates each component’s simulation model. Assuming the user has Tx and Rx IBIS models plus transmission-line and R/L/C-based behavioral models for the package, vias, and connectors, the first step can be to use our [SPISim_IBIS web app] to convert those IBIS models into spice subcircuits compatible with free simulators.
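As an aside on the post-layout quality checks just mentioned, a basic passivity/reciprocity screen can be scripted with scikit-rf. This is only a rough sketch (the file name is hypothetical), not a substitute for a full quality check:

```python
import numpy as np
import skrf  # scikit-rf; assumes the package is installed

ntwk = skrf.Network("channel.s4p")      # hypothetical extracted S-parameters

# Passivity: the largest singular value of S must not exceed 1 at any
# frequency, otherwise the "passive" channel would create energy.
sv_max = np.array([np.linalg.svd(s, compute_uv=False)[0] for s in ntwk.s])
bad = ntwk.f[sv_max > 1.0]
print("passive" if bad.size == 0 else f"non-passive at {bad.size} frequencies")

# Reciprocity: S should equal its transpose for passive interconnect.
recip_err = np.max(np.abs(ntwk.s - np.transpose(ntwk.s, (0, 2, 1))))
print("max reciprocity error:", recip_err)
```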

Next, head over to LTspice, offered by Analog Devices, to download and install the free simulator.

Either create a schematic or a text netlist of the channel, then perform a transient simulation to get its step response. The output will be a raw file in compressed format, which can be viewed in place in LTspice or with SPISim’s free SPILite:


Task 2. Tx/Rx EQ Modeling:

If no EQ circuit is involved, the user can simply adjust the circuit/simulation settings from task 1 to perform conventional time-domain SI analysis. The reason StatEye-like channel analysis comes into play is that EQ circuits (the green block for Tx EQ and the orange block for Rx EQ in Fig. 1) are involved in opening the eye, and many more bits need to be “simulated” to obtain/extrapolate a low BER. This can’t be done easily with a nodal-based spice simulator, as it would simply take too much time. EQ circuits can come as different model types, such as behavioral, spice, or AMI, with AMI being the common denominator supported by almost all channel analysis tools from different vendors. Thus, in the second task, we need to generate IBIS-AMI models for the Tx and Rx EQ.

IBIS-AMI modeling usually involves C/C++ coding and compilation into .dll/.so libraries when starting from scratch. However, users may use the [SPISim_AMI web app] to generate AMI models instantly without going through those steps:

Test-drive the model and view its response in place to verify that the model’s parameters meet the performance needs:

Then click the “Generate” button and the AMI model will be generated instantly. If cross-platform models are desired, use [SPILite] instead and all Windows 64, Windows 32, and Linux 64 models will be generated in one shot.

At the beginning of this section, we mentioned that the EQ model can also be in the form of a spice circuit, encrypted or not. In that case, its detailed behavior cannot be described exactly by template-based or pre-defined models. However, the spice-wrapper AMI model supported by SPILite can still be used; it makes the spice model “AMI compatible” so it can be used in other channel simulators. The user’s licensed/installed simulator will be called during the channel simulation.


Task 3. Channel analysis:

With both the Tx/Rx EQ models and the channel response ready, we can perform StatEye-based channel analysis using the [SPISim_Link web app]:

Detailed usage of this tool is demonstrated in a video on its product page. Basically, the user specifies the generated AMI models in the “Tx EQ” and “Rx EQ” tabs respectively, then does the same for the step-response waveform in the “Channel” tab. Both “statistical” and “bit-by-bit” modes are supported here, but if an NLTV EQ such as a DFE is used as part of the receiver, the “statistical” mode cannot be performed. With this set-up ready, a “Simulate” click will show the results in place within seconds:

The bathtub curve representing the CDF is also ready for inspection/eye measurement:

Alternatively, tasks 1 and 2 are also directly supported within SPISim_LINK alone, so the user may experiment with different settings and see their responses first before generating the corresponding AMI model. For example, a simple change of the post-tap value in the Tx can be done in the UI:

Then re-simulation will quickly show its effect on the resulting eye:

A system developer may obtain the corresponding AMI models from their IC vendors and follow the same process to give these models a try. For an IC vendor, the AMI models generated here are compatible with other vendors’ tools, and you may provide these models to your system clients before committing to a perpetual version of the model.

There you have it: an economical yet efficient channel analysis that can be done directly through the web, enabled for your design needs without any cost!

IBIS-AMI: An economical yet efficient modeling flow

Preface:

It is often believed that IBIS-AMI modeling imposes comparatively high costs and technical barriers to getting started. An AMI modeling engineer certainly has due diligence to be familiar with the basics of link analysis and the AMI flow. However, the requirements of implementing the models in C/C++, compiling them into .dll/.so libraries, and being able to run on third-party (often expensive) EDA tools with consistent results are often too much to ask, or at least will lengthen the design cycle. To meet these challenges, several big EDA companies provide top-down flows that generate AMI models directly from architecture codes. In exchange for the “click-button” convenience, the SERDES needs to be designed in that environment first, and the tool’s up-front expense also needs to be considered. Furthermore, the long-term costs of the generated AMI models (in terms of support, maintenance, and extensibility) are often ignored. It’s also true that even with these top-down flows, compilation for different platforms (win32, win64, linux, etc.) is still inevitable.

We published several free tools recently and presented a paper at the recent IBIS summit regarding AMI modeling. Together with the other open-source/free link analysis tools mentioned at the end of this post, we think it’s now a good time to consider an alternative AMI modeling flow. The methodology proposed in this post is economical (no front-end tool cost) and efficient (giving the SERDES/AMI developer the most control).

[Note that in this post, we use “link tool” and “simulator” interchangeably to represent the application loading the IBIS-AMI models and calling their functions.]

Concepts:

Engineers familiar with spice-like simulators know that we usually need only one platform-specific simulator binary. A schematic netlist is in plain-text format and can be used across platforms (assuming good practices, such as relative model paths and consistent file line endings, are followed). A general-purpose spice simulator is not design specific, so there is no need for different simulators for different circuits or IC designs. This compatibility is achieved by building on simple rules, i.e., KCL, KVL, and linear algebra. This way a simulator is decoupled from the implementation details of specific designs.

In the “AMI modeling” world, are there such simple rules, so that we don’t need to compile a new binary just because the design is different? Granted, there are always trade-offs, e.g., the simulation speed of a design-specific binary model versus slower but more convenient iteration during the modeling cycle. At the least, we should find a way for modeling engineers to focus on the basics of the algorithmic blocks and, hopefully, still be able at a later stage to generate a speedy model with minimal extra work (C/C++, .dll, etc.).

AMI models, in their current scope, are mainly for SERDES applications. They are mostly point-to-point systems and can be processed stage by stage. We may also look at it through the defined AMI API function prototypes:

AMI_Init and AMI_GetWave are the two main processing routines defined in the API spec. Various arguments are passed in, and the AMI model is responsible for performing the designed algorithmic processing and returning the values in place. By “in place”, we mean the contents of impulse_matrix and wave are modified at the same memory addresses before returning to the calling application, which is usually the simulation platform or circuit simulator.

Based on these observations, we can break the couplings between 1) the link tool and the model, and 2) the model and the underlying algorithmic processing. With this decoupling, an economical yet efficient flow becomes possible.

Decoupling within the model:

First, we want to observe how data is passed from the link tool to the model. In the schematic above, the input to the Tx stage is either the channel’s response (LTI mode) or a bit pattern (NLTV). If the Tx is a simple pass-through, the Rx will receive similar information unaffected by the Tx:

Take AMI_Init as an example. We can achieve this decoupling using SPISim’s free SPISimProxy model. SPISimProxy writes the arguments received from the link tool to a plain-text file. After the AMI model’s processing, it again writes the processed data to another text file before handing it back to the link tool. This way, the data exchanged between the link tool and the model is exposed, even if both are compiled binaries. The two blue boxes above represent the generated text data. The main job of the AMI_Init function (the purple block in the middle, which exists in the model’s .dll/.so file as a C function) is to transform the input into the output response.

With this, we can now replace the AMI_Init function with our own, written in MATLAB, Python, Perl, Java, etc., instead of the more demanding C/C++ form. It only needs to interface with the two text files just exposed, with the following operation sequence:

  1. SPISimProxy exposes the calling arguments in a text file (the top blue box);
  2. The user’s script reads the text file and performs the necessary processing;
  3. The user’s script writes out the data in a similar text format;
  4. SPISimProxy reads the generated text data, forms the arguments, and passes them back to the simulator.

Because each of these steps can be customized through the proxy’s .ami settings, a configuration file, or even environment variables, AMI developers are now free to use whatever language they prefer without dealing with any C/C++. There is also no need for .dll/.so compilation, as SPISimProxy is pre-compiled for most platforms.
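As a sketch of steps 2 and 3 above (the file names and the one-sample-per-line layout are purely hypothetical; the actual format is whatever the proxy is configured to emit), a script-based AMI_Init body could be as small as:

```python
import numpy as np

# Hypothetical layout: one impulse-response sample per line, written by the
# proxy before calling this script and read back by the proxy afterwards.
impulse = np.loadtxt("ami_init_in.txt")

# The "business logic" of the model: here a toy 2-tap FIR equalizer
# (pre-cursor + main tap) applied to the impulse response in place.
taps = np.array([-0.15, 1.0])
equalized = np.convolve(impulse, taps)[: impulse.size]

np.savetxt("ami_init_out.txt", equalized)
```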

The example above uses AMI_Init; other API calls, such as AMI_GetWave or AMI_Resolve, can be handled in a similar fashion. Clean-up calls such as AMI_Close or AMI_Resolve_Close are also supported by SPISimProxy, so if needed, the AMI modeler can clean up all the file traces at the end.

Part of the arguments passed from the simulator is the model pointer. This pointer (void*) is supposed to be persistent during the AMI process, so the script author may use a file-based persistence mechanism across AMI_Init/AMI_GetWave calls to store constructed data structures, settings, etc. By avoiding C/C++-specific pointers and data structures, the process stays language neutral and can support many different languages.

Decoupling with the EDA tool:

The aforementioned process requires an application to drive the proxy model and thus the modeling scripts. This can be achieved with our free SPISimAMI.exe, again pre-compiled for most platforms. Its built-in pulse response enables the user to exercise LTI-based AMI processing directly. For bit-by-bit input data, the user may use our SPILite or another free simulator, such as our SSolver or NGSpice, to generate the bit sequence, then feed this input in .csv format to the application. SPISimAMI then takes the user’s input, forms the proper arguments, and sends them to the underlying AMI model, which may be the modeling script currently under development or an existing IBIS-AMI model for testing or validation purposes. SPISimAMI can also be used to drive an existing AMI model via SPISimProxy so that the user may gain insight into this process. In the demo video posted on the SPISimProxy page, a MATLAB script is used to demonstrate the AMI_Init process.

Test drive with open-source link tool:

Once the responses work properly with the prototype implemented in the modeling scripts, the next step is to run them in a link tool for BER-like analysis. For this purpose, PyBERT may be used, as it also loads IBIS-AMI models, including our SPISimProxy.

Up to this point, no front-end cost is involved in the AMI modeling process, and the developer only needs to use his/her favorite language to deal with plain-text input/output. In addition, debugging and testing of the model prototype can be done with direct command calls instead of multi-step GUI operations. This not only avoids needing a license for a 3rd-party tool, but also ensures an efficient workflow with easily repeatable, consistent results.

Optimization and model release:

There are several possibilities for releasing models based on this process:

  • The model publisher may encrypt the script if needed, then distribute it as-is. This will produce the most accurate results, as they have been validated by the author during the modeling process. The disadvantages include: 1) it’s less efficient, as the data exchanged between SPISimProxy and the modeling scripts is file based; 2) the model recipient may need to install a run-time interpreter, such as Perl or Python, in order to run the encrypted/compiled script; and 3) the client also needs to download SPISimProxy from our site, as unlicensed redistribution is strictly prohibited.
  • We can work with the model publisher to provide a specific API to the prototype model such that the IP and accuracy are maintained, yet the performance is improved dramatically. We can also remove the unlicensed terms and rename the proxy class with your company’s name so that you can distribute SPISimProxy together with your model.
  • We can also create the corresponding AMI model in pure C/C++ code so that there is only one model to release, with the best performance and convenience for the clients.

The modeling flow suggested above is not proprietary and can also be implemented within a corporation or modeling team. We believe that by liberating modeling engineers from these unrelated AMI process chores, they will be able to focus on the core business logic, i.e., the algorithmic part, and deliver the best quality models for the industry’s progress. The non-directly-related tasks can be left to other EDA professionals (* cough *) if needed.

Differential modeling flow: Development

Flow considerations:

Two major considerations when developing a modeling flow are consistency and flexibility. This is particularly true when it comes to differential buffer modeling. As discussed in our previous post, a half/true differential buffer goes through most of the same steps as a single-ended buffer, yet certain processes must be elaborated (e.g., modeling of the 2D surface sweep) and order must be preserved (i.e., extract C_Diff and I_Diff before performing the VT simulation). In this post, we briefly discuss how these considerations are incorporated into the design concepts and realized in our SPIBPro modeling flow.

Design setup:

Conceptually, the user does not need to know whether a buffer is true/half/pseudo differential before modeling. The IBIS cookbook V4 also states this and uses encrypted HSPICE as an example: since it’s a black box and can’t be deciphered, a model developer needs to assume coupling exists between P and N (thus true differential). The differential current computed from the 2D sweeps of IV and C_Diff will reveal whether such an assumption is valid. The decision can then be made based on the magnitude of the current: if it’s in the uA or nA range, the coupling is insignificant and the buffer can be modeled as pseudo differential.

In reality, one may argue that we should start with the pseudo-differential approach and switch to half/true differential only if the final validation suggests so. This is because the modeling engineer can usually get insights about the buffer from the circuit designer (usually in the same company; there is no need to reveal or know much of the design details during such an inquiry). In addition, the extra-dimension DC sweep and modeling of the half/true differential flow are often overkill for many designs.

In terms of modeling setup, only the input and output for the N pin are needed in addition to those for a common single-ended buffer. A model type selection of “differential” is sufficient to indicate the differential modeling flow rather than the linear single-ended one.

[Figure: differential buffer modeling setup (DiffDrvCfg)]

Modeling flow overview:

The modeling flow is summarized in the picture below. The key change here is that the flow is no longer linear: the simulation and post-processing stages are gone through twice. In the first simulation pass, only the IV/C_Die sweeps are needed. The data is then post-processed for the first time to extract the common-mode and differential-mode currents; the latter is also modeled as a separate component at this stage. This differential component is then inserted between the P and N pins in the transient netlist and simulated in the second pass, where only the VT data is needed. Once that’s done, all the IV, VT, and C_Die data are available, and the flow can continue linearly just like that of a single-ended buffer.

[Figure: differential modeling flow (DiffMdlFlow)]

Modeling of the IV data:

While PU/PD are steady-state DC sweeps in theory, their simulation is performed in “pseudo DC” fashion most of the time due to the likely presence of clock signals, which makes a true DC sweep impossible. In a “pseudo DC” simulation, voltages are swept very slowly. In addition, sweeps along the X coordinate for different Y values are mutually independent, so they can be simulated in parallel (multi-threaded or distributed). Simulation under certain biasing conditions may have convergence issues, so the flow must be tolerant of missing sweep data at some grid points due to non-convergence.

In our SPIBPro example below, sweeps of different Y bias points (e.g., the voltage at output N) are separated into different .sp files, and they can be simulated in parallel.

[Figure: IV sweep files (IVSweep)]

The task of the post-processing step in the first iteration is to extract the swept simulation data, calculate the common-mode current, shift the surface data vertically, summarize it as a table output, and also perform initial modeling. Such a table is important for the user to validate the results and to perform some what-if modeling using a tool like Excel if needed.

[Figure: tabulated sweep data (DiffMdlCsv)]

  • Surface modeling:

    Depending on the symmetric differential-mode data of the shifted surface table (data on the surface along the blue line), one may decide whether it’s purely resistive, linear, or non-linear. The symmetric differential mode is orthogonal to the zeroed common-mode curve (red straight line) and represents the most likely differential operation (i.e., symmetric output of P and N). For example, the surface plot below suggests that a resistor will suffice for the content of the series element:

[Figure: resistive series-element surface (DiffMdlSerR)]

With the csv table, one can quickly confirm whether the resistance is the same across different voltages. If so, a linear resistor can be used; otherwise, the non-linear resistance needs to be modeled either with a PWL spice element or as a series current (another IV table) in the series element. To visualize the data with some degree of manipulation, a flow needs such 3D plotting capabilities; otherwise, a tool like MATLAB and familiarity with its syntax become necessary.

[Figure: non-linear sweep surface (DiffMdlSurf)]

If the sweep data shows a surface like the one above, then either surface modeling is needed or a “series MOSFET” needs to be created as the series element. For surface modeling, one can use a “response surface modeling” like method to calculate fitting coefficients in the minimum mean-squared-error sense. Other general modeling approaches, like neural networks, are certainly also possible.

[Figure: response surface modeling (DiffMdlRSM)]

One should also check the residue of the prediction formula, perhaps visualized as a scatter plot. SPIMPro’s general modeling and plotting functions are demonstrated below:

[Figures: RSM fitting and residue plots (DiffMdlRSM2, DiffMdlRSM3)]

A good fit should have very small residues, shown as the grey line across 0.0 in the picture above (the red dots are nominal values from the sweep grid). With valid results, one then needs to translate this “prediction formula” into a spice netlist using E/F/G/H controlled sources.

In MATLAB, a similar fit can be obtained with the backslash (least-squares) operator or related functions such as lscov.

  • Series element:

[Figure: series element (DiffMdlSer)]

    The same table content is sufficient to construct a series element. The step needed here is to translate the data into an IBIS-compatible format. This process is trivial when the model is simply R/L/C. For a series current or series MOSFET (which can have up to 100 tables of different bias conditions), the attention needed to perform such work manually is not economical, and a tool/flow should be used instead.

Modeling of the C-Diff:

[Figure: C_Diff surface (CDiff)]

A similar process (tabulation and modeling) can be applied to the calculated C_Diff. However, the series element’s syntax and the E/F/G/H equations are much more limited, such that describing this surface (both frequency and voltage dependent) for C_Diff is not practical using either spice or the series element. As a result, a trade-off must be made when picking values from (or averaging over) the surface, and a single value is used when constructing the model for C_Diff.

Verilog-A modeling:

In our submission to the IBIS summit later this year, we proposed another flow which is Verilog-A/VHDL based. One advantage of this implementation is that the raw, table-like data can be used directly via the built-in $table_model function. Its usage also enables polarity differentiation and an elaborate description of C_Diff, which is crucial to transient data accuracy. The details will be published here once our paper is accepted.

Combined model:

In a previous post, we mentioned that in addition to the “series element” of the differential model, a pure [External Model] of the differential model type may also be used. Using the proposed Verilog-A/VHDL series-element-only model, these two methods can work together, as shown below:

[Figure: combined series model (CombSer)]

A series model is still declared in the [Series Pin Mapping] section to be connected between the buffer’s N and P outputs. Its definition can optionally include an external model, such as one implemented in a behavioral language. This [External Model] works on top of the existing series model definitions and can provide extra info, such as frequency/voltage-dependent C_Diff, if the simulator supports it. Optionally, the “top” series model can simply be a shell (e.g., with high impedance), with all the info regarding the differential current encapsulated inside the added model. Compared to an external model attached under an output/IO buffer, this external model can be significantly less complicated and also provide more tuning capability.

Differential flow validation:

To validate a differential modeling flow, one may construct a differential buffer by creating artificial coupling (e.g., using some R/C elements) and connecting it between two known IBIS buffers. During validation, the common-mode data calculated from the DC sweep should reconstruct exactly the same PU/PD/PC/GC tables as those in the known IBIS buffer. The differential current will reveal the resistive element of the coupling portion, and the calculated C_Diff should reveal the coupling capacitance. Both the steady-state differential current and C_Diff can be validated using the generated csv tables or raw data. The user should then find that, with both captured accurately, the calculated transient VT data will correlate very well with that of the original IBIS buffer.

Paper and audio recording:

We presented this study at the 2016 Asian IBIS Summits. Readers may download the presentation from the IBIS website or [HERE]. An audio recording from Tokyo is also available [HERE].

Optimization for SI & PI: Systematic approach

In a previous post, we talked about exploring the solution space linearly using what-if analysis. When a more comprehensive or near-global search for best/worst performance is desired, a systematic approach must be used.

Response surface modeling (RSM):

System output responses Y1, Y2, shown below, may depend on both controllable input variables X and uncontrollable ones Z. In system analysis, the output is mostly obtained via circuit simulation and is thus deterministic. As a result, uncontrollable factors may be lumped into a constant term, and the mapping from the controllable factors X to the output can be viewed as a multi-dimensional surface, as shown on the right below. Searching for the optimized combination is like searching for the maximum or minimum on this curvature.

[Figure: DOE/RSM response surface (doersm)]

This type of “mapping” from X to Y is called response surface modeling (RSM). It takes many sampling points to construct such a response surface, so a design of experiments (DOE) method is often used in the RSM approach.

Design of experiment (DOE):

When more than several variables are involved and each of them has a range of possible values, using a full grid (all combinations) for an exhaustive search of the best combination is simply not feasible.

If a performance measurement Y is represented as a function f(x) of design variables x1, x2 … xn, then we can use a Taylor series to approximate f(x).
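In multi-index notation, a truncated polynomial expansion of maximum order d reads:

```latex
f(\mathbf{x}) \;\approx\; \sum_{|\alpha| \le d} c_{\alpha}\,\mathbf{x}^{\alpha}
\;=\; \sum_{\alpha_1 + \cdots + \alpha_n \le d}
      c_{\alpha_1 \cdots \alpha_n}\, x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}
```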

The higher the order (bigger values of alpha above) included in the series, the more accurately it resembles the original function f(x). It’s a little like decomposing a time-domain square wave in the frequency domain using an FFT. In system analysis, just like many phenomena in the real world, f(x) is dominated by the lower-order terms. Taking two variables and a maximum order of two as an example, the expansion above simplifies to a quadratic form.
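Written out with the betas as fitting coefficients and epsilon as the residual error, the two-variable second-order model is:

```latex
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2
  + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon
```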


Different values of the input variables (X1, X2, etc.) produce different output performance Y. When more than several sampling points are taken, the equations can be written in matrix form, each row representing one sampling run.
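With superscripts indexing the sampling runs, the stacked system (restated here in standard least-squares notation) is:

```latex
\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}
=
\begin{bmatrix}
1 & x_1^{(1)} & x_2^{(1)} & (x_1^{(1)})^2 & (x_2^{(1)})^2 & x_1^{(1)} x_2^{(1)} \\
1 & x_1^{(2)} & x_2^{(2)} & (x_1^{(2)})^2 & (x_2^{(2)})^2 & x_1^{(2)} x_2^{(2)} \\
\vdots & & & & & \vdots \\
1 & x_1^{(m)} & x_2^{(m)} & (x_1^{(m)})^2 & (x_2^{(m)})^2 & x_1^{(m)} x_2^{(m)}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_{11} \\ \beta_{22} \\ \beta_{12} \end{bmatrix}
+ \boldsymbol{\varepsilon}
```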


When further generalized to all the variables, a linear system is formed.
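In compact form, with X the matrix of basis terms evaluated at each run, the system and its minimum-MSE solution are:

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\hat{\boldsymbol{\beta}}
= \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert_2^2
= (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}
```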


With this, one can use linear-algebra techniques such as the pseudo-inverse and/or singular value decomposition (SVD) to solve for the coefficients beta such that the error is minimized in the mean-squared-error (MSE) sense.

[Figure: model fit (modelfit)]
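A minimal Python sketch of this fit, using synthetic data in place of real simulation results:

```python
import numpy as np

# Sampled design points (one row per simulation run) and measured outputs.
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + 0.5 * x1**2 + 0.3 * x1 * x2 \
    + rng.normal(0, 0.01, 200)                      # synthetic "simulation"

# Quadratic RSM basis: [1, x1, x2, x1^2, x2^2, x1*x2]
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # SVD-based LSQ solve
resid = y - X @ beta
r2 = 1.0 - resid.var() / y.var()                    # fit quality, want >= 0.95
print("coefficients:", np.round(beta, 3), " R^2:", round(r2, 4))
```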

Using this DOE/RSM methodology, several decisions need to be made in advance:

  • Selection of input variables X and order: only dominant variables should be used, to minimize the number of columns;
  • Selection of output target Y: output targets obtained by mathematical operations (post-processing) may lose their lower-order relation to the original variables x;
  • Choice of sampling algorithm and number of samples: each row of the matrix corresponds to a “simulation” run; the samples must cover enough of the design space to make the solution meaningful while minimizing the impact of modeling noise.

DOE analysis flow:

A systematic optimization flow based on DOE/RSM thus includes the following steps:

  • Define variables: only dominant variables should be used in the analysis. A trivial variable will increase the matrix size yet have a very small coefficient (beta). Dominant variables may be identified from experience, previous analysis runs, linear sweeps, or what-if analysis. The DOE flow may also be performed several times, with non-significant variables removed at the end of each iteration.
  • Create sampling points: there are several sampling algorithms for choosing sampling points. The choice should be based on the design’s coverage, optimality, and efficiency. For SI/PI, when the number of variables is around 10, a central composite design is a good choice, as it is fully quadratic with only about 1000 designs to run. D-Optimal is a good choice when the number of variables is larger (up to 30). When using a neural network for the final modeling, a full quadratic design is not needed and a space-filling design is a good choice. All these designs are available in statistics software packages, and a subset of them has been implemented in our MPro module.

[Figure: sampling design table (Design)]

  • Create corresponding test cases: regardless of the design, each variable’s range needs to be decided. Depending on whether a variable is categorical (non-continuous) or numerical (continuous), possible step values may be decided. A generic representation usually uses -1, 0, and 1 in the design table to represent the minimum, typical, and maximum variable values. The next step is to translate such settings into the corresponding designs. For a netlist-type circuit representation, pattern replacement is sufficient. For geometry synthesis, which requires further mathematical manipulation of these design variables, a more flexible mapping mechanism should be provided. At the end of this stage, each row of the design table is translated into a corresponding circuit design ready to be simulated.

[Figure: translating design-table rows into test cases]

  • Simulate and post-process: A simulation manager is often desired in this step, in order to distribute test cases to run on different CPU threads or different machines. A post-processing step is executed right after simulation ends to extract performance metrics from the results. The outcome of this step is a row of output measurements for each test case run.

[Figure: simulation manager distributing test cases]

  • Map inputs to outputs: Form second- or third-order equations using the defined independent variables, then solve for their coefficients using an SVD solver. Residual values, i.e., the differences between the original responses and the “predicted” ones based on the solved formula, can then be calculated. A well-fit model will have very small residuals. An R² value, which is the portion of the variation attributed to the model, can be used to indicate the quality of the fit. An R² >= 0.95 is usually desired.

[Figure: model fitting results]

[Figure: prediction vs. actual responses]

  • Optimize: Constraints, such as values being non-negative or falling within each variable’s range, need to be imposed. Under these restrictions, a solution that minimizes or maximizes a cost function, which can be a weighted sum of several performance targets, can be searched for (a small search sketch follows this list). Depending on the order of the prediction formula constructed, different types of optimization methods can be used:
    • Linear programming: good for formulas with only first-order variables, which is usually the case for stackup performance based on geometric parameters;
    • Non-linear methods: when the formula has higher-order terms, a method like the Nelder-Mead algorithm may be used;
    • Genetic algorithms: when the model is highly non-linear or neural-network based, this type of algorithm is best for searching for an optimized solution.

[Figure: optimization over the fitted model]

  • Prune variables for the next iteration: As the variables’ coefficients reveal their significance toward the output Y, some of them may be removed for the next iteration of the analysis. A significance list may also be formed as a reference for the design process.
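As the sketch referenced in the optimize step above, the snippet below assumes the quadratic model fitted earlier (hypothetical coefficients) and normalized variable ranges; a crude random search like this is gradient-free, honors the range constraints, and is nearly free because it evaluates the prediction formula rather than invoking the simulator.

```python
import numpy as np

# Minimal sketch (assumed fitted coefficients, normalized variable ranges):
# search the prediction formula, not the simulator, for an optimum that
# honors the [-1, 1] range constraints.

beta = np.array([1.0, 2.0, -0.5, 0.3, 0.1, 0.8])  # [1, x1, x2, x1^2, x2^2, x1*x2]

def predict(x1, x2):
    terms = np.array([np.ones_like(x1), x1, x2, x1**2, x2**2, x1*x2])
    return beta @ terms

rng = np.random.default_rng(1)
x1 = rng.uniform(-1.0, 1.0, 100_000)   # candidates within the allowed range
x2 = rng.uniform(-1.0, 1.0, 100_000)

cost = predict(x1, x2)     # here the cost is the response itself; a weighted
best = np.argmin(cost)     # sum of several targets works the same way
print(x1[best], x2[best], cost[best])
```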

As one can see, a systematic flow such as DOE/RSM requires much more effort and many more intermediate steps compared to a simple linear sweep or what-if analysis. On the other hand, a well-fit prediction model can also serve as the basis of quick “what-if” analysis, replacing time-consuming simulation, and as initial guidance when choosing design variables.

A stackup what-if based on a model built via the DOE/RSM flow


Simulator development: Modeling (S & P)

A system channel is usually represented by S-parameters. They can be extracted in the frequency domain using a 3D field solver and/or cascaded stage by stage using a tool like SPISim’s SPro. With the LTI (linear time invariant) assumption, it’s possible to synthesize an eye or BER plot of millions of bits using statistical analysis based on a single time-domain pulse response of these parameters. However, it’s still often desirable to simulate the S-parameter in the time domain for defined bit patterns. Thus, a system simulator like our SSolver must be able to support such a requirement. In addition, one may also want to know the frequency response of given broadband-spice-converted elements, such as via, package or connector models. So the reverse process, i.e., extracting S-parameters from spice elements, is also often required. In this post, we will briefly talk about how these may be developed in a simulator like SSolver.

S-Element… S-Parameters:

There have been many conference and journal papers proposing different methods of simulating S-parameters in the time domain. However, at the most basic level, an S-parameter can be considered a transfer function or filter block, so DSP techniques apply: multiplication of the transfer function and the input in the frequency domain becomes convolution in the time domain:

Y(s) = H(s)·X(s)  ⟺  y(t) = ∫ h(τ)·x(t−τ) dτ

The time-domain right-hand side can be further separated into two parts: the history up to this time point (integrating from -infinity to t = n-1) and the value at this particular moment t = n due to the present input. The first part is a constant, as it has already happened in the past and can’t be changed; the second part is input dependent and must be updated within the “solve” and “stamp” hot loop inside the Newton iteration. Put together, they form a Norton circuit of the form I = Y·V + J, where Y is the value contributed at this moment and J, a constant, is due to the past history. This Norton form can then be “stamped” accordingly for matrix solving.
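As a toy illustration of this split (not SSolver’s actual code), the sketch below assumes a fixed time step and a short, made-up impulse response, and separates the convolution sum into the admittance-like instantaneous term Y and the history constant J of the Norton form I = Y·V + J.

```python
import numpy as np

# Toy sketch (fixed dt, made-up impulse response): split the convolution sum
# into the instantaneous term Y and the history constant J of the Norton
# form I = Y*V + J.

h = np.array([0.5, 0.3, 0.15, 0.05])   # impulse response samples at fixed dt
x_hist = [0.0, 1.0, 1.0]               # inputs solved at previous time steps

def norton_terms(h, x_hist):
    # History term J: past inputs weighted by h[1:]; a constant during the
    # Newton iterations of the present time step.
    J = sum(h[k] * x_hist[-k] for k in range(1, min(len(h), len(x_hist) + 1)))
    # Instantaneous term: h[0] multiplies the yet-unknown input at t = n,
    # so it is stamped like an admittance Y.
    Y = h[0]
    return Y, J

Y, J = norton_terms(h, x_hist)
# Inside the solver, the present response is y[n] = Y * x[n] + J.
print(Y, J)
```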

Interested readers may refer to the HP paper linked below for detailed explanations and math:

Integration of Transient S-Parameter Simulation into HSpice

Equations [18] and [19] there give the aforementioned Norton equivalent circuit form and can be used accordingly.

The convolution method needs to update the history with the solved results of this time step so they can be used at the next one. In addition, basic convolution requires dt to be constant, so a variable-time-step simulation will be greatly hampered in performance by this requirement. The convolution modeling thus has room for improvement.

One possible approach is the vector fitting technique mentioned in the previous post about “W-element” modeling. With the S-parameter data in the frequency domain, one may construct a Padé approximation using several poles and zeros. Basis functions can then be created for each pole in the time domain and simulated accordingly. A benefit of this process is that the constructed form is a rational function, which is guaranteed to be causal. So if there are issues regarding the causality of the provided S-parameter, they can be fixed during the modeling process. Lastly, because an exact fit of a multi-port S-parameter across the frequency spectrum is unlikely, some sort of error minimization (in the MSE sense) is needed to strike a balance between accuracy and the number of poles.
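To show why the rational (pole-residue) form helps, here is a sketch with a single assumed real pole and a zero-order-hold style discretization (both assumptions, for illustration only): each fitted pole term can be advanced with a constant-time recursive update per step, rather than re-convolving the entire history.

```python
import numpy as np

# Sketch with a single assumed real pole: after vector fitting gives
# H(s) = r / (s - p), each pole term advances with an O(1) recursive update
# per time step instead of a full-history convolution.

p, r = -2.0e9, 1.5e9        # hypothetical pole and residue from a fit
dt = 1.0e-12                # fixed time step for this illustration

a = np.exp(p * dt)          # state decay over one step
b = (r / p) * (a - 1.0)     # input weight for one held sample

y = 0.0
for x in [0.0, 1.0, 1.0, 1.0]:   # input samples
    y = a * y + b * x            # recursive update, no history array needed
    print(y)
```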


P-Element… Port element:

Oftentimes, after a package, connector or via modeling engineer creates a model, he or she will use a tool like Broadband Spice to convert the 3D-extracted S-parameters into a spice equivalent circuit composed of various basic elements. By the time a system designer or SI engineer receives such a converted model, the original S-parameters may no longer be available. Rather than inserting this model into the channel and simulating blindly, it’s often beneficial to reconstruct and inspect the model’s frequency-domain response before actually using it. S-parameter extraction via simulation is basically a form of AC simulation, so with the AC models of the system elements constructed, the S-parameter extraction part becomes easy.

The context here is small-signal S-parameter extraction, so all AC stimuli are small perturbations around the operating point. That is, a DC solution is obtained first for each port’s respective bias condition, then the AC stimulus is applied and solved for at each frequency point.

A “port” or “P-element” has several properties: DC bias condition, reference impedance, port name and port order. For a multi-port S-parameter extraction, one port is excited at a time with its specified DC bias value. An AC sweep is then performed while the other ports are terminated with their reference impedances. The input and output power waves, computed from the injected current and the measured nodal voltages, then relate the driven port to the other ports. Using the simple math described in the link below:

S-parameter measurements

one can easily obtain the Sij content (i being the port with the input stimulus, j the other ports) this way. Repeat the same process for the other ports one at a time (with their respective DC bias conditions) and the full S-parameter matrix can be obtained. Finally, the ports are arranged according to the port-order property, their respective port names are written out at the top of the Touchstone file, and the process is complete. Should there be a need to convert to other formats such as Y or Z parameters (sometimes good for checking connectivity), one can do so easily with formulas (assuming generalized 2-N ports) or simply use a developed tool like our SPro.
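A condensed version of that power-wave math, with hypothetical solved voltages and currents at two ports, might look like the following sketch.

```python
import numpy as np

# Sketch using textbook power-wave definitions (hypothetical solved data):
# with port i driven and all ports terminated in the reference impedance Z0,
# column S[:, i] follows from the port voltages V and port currents I
# (currents defined as flowing into the ports).

Z0 = 50.0
V = np.array([0.60 + 0.10j, 0.25 - 0.05j])    # nodal voltages at the ports
I = np.array([8e-3 - 2e-3j, -5e-3 + 1e-3j])   # currents into the ports

a = (V + Z0 * I) / (2.0 * np.sqrt(Z0))   # incident power waves
b = (V - Z0 * I) / (2.0 * np.sqrt(Z0))   # reflected power waves

i = 0                       # index of the driven port
S_col = b / a[i]            # S[j, i] for every observed port j
print(S_col)
```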


Back to the top:

Back from this modeling physics to the computer science domain, one also needs to consider the following topics when doing simulator development:

  • Memory pool management (allocation, expansion and clean-up)
  • Multi-threading consideration
  • Plug-in architecture for future devices
  • ….

While the list can go on and on and the tasks may be daunting, the end results definitely add value to the analysis flows and methodology development built on top. With a developed simulator, there is no longer an absolute need to form a closed-form formula or equation in order to solve circuit behavior. The modules or flows running on top can simply create a netlist and have the simulator solve it. Not to mention this also makes maintenance and testing much easier. For an EDA company like us, I would say this is a journey worth taking.

Simulator development: Modeling (B & W)

When modeling a device for a circuit simulator, the raw netlist input needs to be converted into an internal structure first, then a physical model is constructed during the “modeling” phase, and the corresponding equivalent Norton or Thevenin circuit’s parameters are solved within each Newton iteration at each time step. The solved parameters are finally “stamped” into the system matrix for the Newton iteration to solve. “Model” and “solve” are the essential parts of device modeling for a circuit simulator, and that’s where “physics” comes into play.

In these two posts, we will briefly talk about how system devices, in particular IBIS, transmission lines and S-parameters, are “modeled” and “solved”.

B-Element… IBIS:

Looking at the IBIS model’s structure, the modeling part is actually quite straightforward:

[Figure: structure of the IBIS buffer model]

The four IV curves (pull-up, pull-down, power clamp and ground clamp) act like non-linear resistors. With the terminal voltage known within each Newton iteration, the conductance can be looked up from these curve tables using linear interpolation.
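A minimal sketch of that table lookup (with a made-up IV table, not data from any real model) could look like the following; the conductance and Norton source together form the linearized branch for this iteration.

```python
import numpy as np

# Minimal sketch (made-up IV table, not from any real model): interpolate a
# branch current and its conductance at this iteration's terminal voltage,
# then form the Norton companion source for stamping.

v_tab = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])      # volts
i_tab = np.array([-0.08, 0.0, 0.02, 0.06, 0.15])  # amps

def branch_companion(v):
    i = np.interp(v, v_tab, i_tab)        # branch current at voltage v
    slopes = np.gradient(i_tab, v_tab)    # dI/dV at the table points
    G = np.interp(v, v_tab, slopes)       # conductance at voltage v
    Ieq = i - G * v                       # Norton source: i(v) ~ G*v + Ieq
    return G, Ieq

G, Ieq = branch_companion(1.3)            # linearized branch for this iteration
print(G, Ieq)
```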

The switching coefficients and composite currents are both time dependent. Their values are calculated in the “modeling” phase, before the simulation has even started. The obtained coefficients are multipliers that further scale the conductance calculated from the IV data, and thus the stamped values. The scaling is such that, when the test load specified in the waveform section is connected, the driver at the pin will reproduce exactly the waveform data given in the model. As for C_comp, it can be inserted using the simulator’s existing infrastructure, so the integrator there will manage its stamping and error prediction.

The more complicated portion of IBIS modeling inside a simulator comes from the options available to the end user, which the model developer must plan for in advance. For example, C_comp may be split across different terminals. Each waveform, IV table or component has its own skew, which bookkeeping code must take care of. There might be added submodels for pre-emphasis or de-emphasis, so implementation-wise one should consider a “composite” class pattern so that recursive inclusion can happen. In the end, this is a relatively simple device to model for a simulator, particularly when compared to the transmission line.


W-Element… Transmission line:

Every electromagnetics textbook presents the transmission line structure shown below:

[Figure: unit-length RLGC transmission line model]

This is a uniformly distributed model and was implemented early on in various simulators as the “U” element. While implementing a T-line model this way is now outdated due to performance issues, it’s still how the T-line’s raw data, the frequency-dependent tabular model, is given:

[Figure: frequency-dependent tabular RLGC model]

The tabular model comes from field solving Maxwell’s equations based on the layer stackup, trace layout, material properties and sometimes special treatments (like surface roughness), finally presented as R/L/G/C data at low (DC) and high (infinite) frequencies and many points in between. Thus, to model a T-line for a simulator’s use, one has to first convert these data into a mathematical form, which can then be used in either the time or frequency domain. For a transmission line, this starts with the telegrapher’s equations.

By solving the KCL/KVL of the unit-length RLGC circuit above, one can derive the telegrapher’s equations:

dV(x)/dx = −(R + sL)·I(x),    dI(x)/dx = −(G + sC)·V(x)

And the solution to these equations, as explained in the wiki link above, involves a wave propagation function with propagation constant γ:

γ = √((R + sL)·(G + sC)),    Zc = √((R + sL)/(G + sC))

When realized in the system model, it takes the Norton equivalent circuit form:

So on each side (near end and far end) of the transmission line there are two components: an admittance at that particular time step and a current source due to the propagation-delayed wave originating from the other end. These two components (Z(s) and r(s)) can be obtained from the tabular data in the frequency domain and then converted into an integrable form in the time domain so that they can be “stamped”. Generally, this includes the following steps:

  • Parse and store the raw tabular model;
  • Calculate the propagation delay and characteristic impedance using the highest-frequency data (or using extrapolation); these values will be used for interpolation later in the time domain.
  • Construct the Z(s) or Y(s) and the wave function r(s) shown in the system model. As transmission lines are usually coupled, these curves are multi-dimensional in the frequency domain;
  • Use vector fitting to represent these frequency-domain functions as a series of poles and zeros. In most cases, particularly when the model data has insufficient bandwidth or low quality, an exact fit is not possible with a reasonable number of poles/zeros, so a best fit in the mean-squared-error sense is performed.
  • Once poles and zeros are found, they can be converted into the time domain as terms of different orders. All these terms combined form Y(t) or r(t) in the time domain. Padé approximation may be used here.
  • During time-domain simulation, use interpolation to find r(t)’s value in the past history (one propagation delay ago) and use that data to construct the equivalent model of this end at this particular time point.
  • For frequency-domain analysis, vector fitting and conversion to an integrable form are not needed. The Y(s) and r(s) data can be used directly for stamping at each frequency, with some interpolation.

For the first three steps, I wrote some simple MATLAB code to demonstrate how they are done:

[Figure: MATLAB code for the impedance function]

[Figure: MATLAB code for the propagation function]

[Figure: plots of the impedance and propagation functions]
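Since those MATLAB snippets are shown as images, here is a rough Python equivalent of the same first three steps, using a made-up single-line RLGC table (all values assumed, for illustration only).

```python
import numpy as np

# Rough Python equivalent of the first three steps (made-up single-line
# RLGC table): parse the tabular data, then build the characteristic
# impedance Zc(s) and the propagation function r(s) = exp(-gamma*length).

freq = np.array([1e6, 1e8, 1e9, 5e9, 1e10])     # Hz, table points
R = np.array([0.5, 2.0, 6.0, 14.0, 20.0])       # ohm/m (skin-effect rise)
L = np.full_like(freq, 3.5e-7)                  # H/m
G = np.array([1e-12, 1e-6, 1e-5, 5e-5, 1e-4])   # S/m
C = np.full_like(freq, 1.4e-10)                 # F/m
length = 0.1                                    # line length in meters

s = 2j * np.pi * freq
Zs = R + s * L                 # series impedance per unit length
Ys = G + s * C                 # shunt admittance per unit length

gamma = np.sqrt(Zs * Ys)       # propagation constant
Zc = np.sqrt(Zs / Ys)          # characteristic impedance Z(s)
H = np.exp(-gamma * length)    # propagation function r(s): delay plus loss

# Delay estimated from the highest-frequency point, for later interpolation:
td = length * gamma[-1].imag / (2.0 * np.pi * freq[-1])
print(Zc[-1], td)
```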

While the MATLAB code above seems straightforward, most simulators (including SPISim’s SSolver) implement this in native code (C/C++) for performance reasons. So a whole lot of matrix operations (inverse, eigenvalue, LU decomposition, etc.) also come into play during the process.

It’s rarely the case that the developed model or code will work on the first try. With so many terms (converted poles and zeros) across so many dimensions (coupled cases), it’s a daunting task to figure out what has gone wrong when the waveform is not as expected. It’s often simpler to go back to basics: check the steady-state solution first, use one line with no reflection by matching the impedance at the other end, use the characteristic impedance to find the nominal reflection value, and so on, to help identify the issue:

[Figure: step-by-step transmission line debugging]

Interested readers may find more details about SPISim’s implementation, which follows the same approach as HSpice’s, in the following paper:

D. B. Kuznetsov and J. E. Schutt-Ainé, “Optimal Transient Simulation of Transmission Lines,” IEEE Trans. on Circuits and Systems I, Feb. 1996.

In terms of books, I have found that Chap. 5 of Dan Oh’s “High-Speed Signaling” book, S6 in our reference book section, gives the best explanation among others. This may be because Mr. Oh was at UIUC around the same time the paper was published 🙂 It’s also worth mentioning that similar techniques can be applied to other passive, homogeneous device modeling, such as the system channel. For example, one common approach to checking and fixing causality issues of an S-parameter is vector fitting and conversion to rational function form.

Simulator development: Abstraction

In a previous post regarding simulator development, we mentioned that a simulator at its core is linear algebra (with or without relaxation) solving matrices formed to describe the netlist’s nodal cutsets (KCL) and mesh loops (KVL). We also mentioned that the “hot loop” of circuit simulation consists of the “solve” and “stamp” routines, i.e., each device solves its modeling equations at that particular DC point or time step, then puts the contents into the aforementioned matrix. So a lot of thinking needs to go into formulating these two steps such that the developed simulator is stable, maintainable and extensible.

On p. 170 of the classic “Computer Methods for Circuit Analysis and Design” book, the stamps for basic elements are listed:

[Figure: stamp entries for basic elements]

So the first level of abstraction, also the main work of the “solve” routine within each device, is to solve the modeling equations using the terminal conditions (i.e., voltage or current) of a particular iteration, then translate them into corresponding I, V, Y (admittance) and G (conductance) values and put them into the circuit matrix for the simulator to solve. In addition, because Newton’s method requires the first derivative to progress and find the next possible root, the “partial” values (first derivatives) also need to be computed by the device model and provided to the simulator. Without this, the simulator would need to perform numerical differentiation (“auto-partial”), calling the “solve” and “stamp” routines multiple times in order to find the derivatives dV/dI or dI/dV, etc., at those particular terminal conditions. This results in slowness and instability of the circuit simulation.
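For the simplest case, the by-the-book conductance stamp looks like this (a two-node sketch with hypothetical element values):

```python
import numpy as np

# Two-node sketch (hypothetical element values): the by-the-book conductance
# stamp that the "stamp" routine performs for each linearized device.

n = 2
Y = np.zeros((n, n))     # nodal admittance matrix
rhs = np.zeros(n)        # right-hand side (injected currents)

def stamp_conductance(Y, i, j, G):
    # i or j == -1 denotes the ground node, which has no row/column.
    if i >= 0: Y[i, i] += G
    if j >= 0: Y[j, j] += G
    if i >= 0 and j >= 0:
        Y[i, j] -= G
        Y[j, i] -= G

stamp_conductance(Y, 0, 1, 1.0 / 50.0)     # 50-ohm resistor between nodes 1-2
stamp_conductance(Y, 1, -1, 1.0 / 100.0)   # 100-ohm resistor, node 2 to ground
rhs[0] += 1e-3                             # 1 mA source into node 1
v = np.linalg.solve(Y, rhs)                # nodal voltages for this iteration
print(v)
```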

Each device, regardless of how non-linear it is, may be “linearized” into a simple equivalent circuit under certain conditions. That “certain condition” can be a fixed time point, or a fixed terminal condition like a supply voltage. That is, within each Newton iteration (and thus belonging only to that iteration and that time point), one may transform a non-linear device model into a simple linear equivalent, mostly using the Norton or Thevenin theorems:

[Figure: Norton and Thevenin equivalent circuits]

How to represent a device model as such a linear circuit depends on the device’s physics. In my mind, this is where (device) physics comes into play in simulator development.
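A classic textbook example is the diode: at each Newton iteration its exponential IV relation is replaced by the tangent line at the present voltage, giving the Norton pair below (assumed parameters, for illustration only).

```python
import numpy as np

# Classic textbook example (assumed diode parameters): replace the exponential
# IV relation with its tangent at this Newton iteration's voltage v_k, giving
# a Norton companion i(v) ~ G*v + Ieq that stamps like a resistor plus source.

Is, Vt = 1e-14, 0.025   # saturation current and thermal voltage

def diode_companion(v_k):
    i_k = Is * (np.exp(v_k / Vt) - 1.0)   # device current at v_k
    G = Is * np.exp(v_k / Vt) / Vt        # the "partial": dI/dV at v_k
    Ieq = i_k - G * v_k                   # source so the line passes (v_k, i_k)
    return G, Ieq

G, Ieq = diode_companion(0.6)
print(G, Ieq)
```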

For example, for simple elements like R, V and I, one can simply “stamp” entries by the book. Controlled sources can also be considered (controlled) conductances or admittances and “stamped” accordingly. Complicated devices such as transistors, transmission lines or S-parameters certainly need some mathematical derivation in advance. The Katzenelson algorithm may be used under some conditions to solve a PWL network. Even devices like L and C need special consideration, such as numerical integration and prediction.

[Figure: derivative forms of the L and C devices]

The derivative forms for C and L above show that there is no direct solution for L’s and C’s conductance or admittance in the time domain. So a numerical differentiation approach like Backward Euler may be used to calculate the conductance value based on circuit history, i.e., the results of previous time steps. If this is a fixed-time-step simulator, the task is easier. A variable-time-step simulator certainly needs much more consideration: how big a time step can be taken, what the integration or resulting differentiation error is, etc. Even at the device level, devices such as the transmission line and S-parameter must keep track of their own past history so that the reflection from the other end happens at the right time. From the discussion up to this point, it should be convincing that simulator development requires multiple disciplines.

[Figure: capacitor companion-model code]
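For example, a Backward Euler companion model for a capacitor reduces to a conductance plus a history current source; below is a sketch with assumed values, mirroring what such capacitor code computes.

```python
# Sketch with assumed values: the Backward Euler companion model that such
# capacitor code computes, i = C*dv/dt ~ (C/dt)*(v_n - v_prev), i.e. a
# conductance Geq with a history current source Ieq.

C = 1e-12         # capacitance
dt = 1e-12        # present time step
v_prev = 0.8      # solved voltage from the previous time step

Geq = C / dt                 # stamped like a conductance
Ieq = -(C / dt) * v_prev     # history source, constant within this step
# i_n = Geq * v_n + Ieq is then stamped as a resistor in parallel with
# a current source.
print(Geq, Ieq)
```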

The second level of abstraction happens at the architecture level. There are only 26 English letters, but there are many more device types… some may even be custom made, like antennas or macro circuits. So while elementary devices like R, L, C, I, V, E, F, G, H, etc. all have their own prefixes in the netlist, a mechanism must be in place such that the simulator can support more (future) devices. This is mostly done using dynamic link libraries with predefined interfaces and access. The example below is an API from Berkeley SPICE:

[Figure: port type definitions from the Berkeley SPICE API]

By defining these port types and access functions, the simulator limits the device’s access to its internal structures and matrices, and thus can be more stable. At the same time, the defined interface also allows extensibility for future devices. It is then up to the device designer to map the device under modeling onto the limited, predefined interfaces such that the data can be used by the simulator at the top to simulate accordingly.

After all these are sorted out, the remaining part of simulator development is to figure out the physics of the devices, construct a model, realize that model using numerical techniques, solve for and extract the equivalent’s values at that particular time and iteration, pass the data back to the main caller functions, and then wait to see whether the solution converges for this iteration. If so, perform bookkeeping, either for future time-step reference or to predict the maximum time step the simulator can take based on the device’s limitations (e.g., breakpoints in PWL sources or the transmission line delay).

In SPISim’s SSolver, we reference the Berkeley architecture heavily and focus more on device modeling and integration. The original SPICE does not support the devices required for system analysis, such as IBIS, lossy coupled transmission lines, and S-parameters. Even S-parameter extraction was originally very tedious and limited to only two ports. While our past experience enables us to grasp the simulator architecture easily and build up functionality quickly, we still hold great respect when reading the related documents and source code that the Berkeley team produced when they developed this simulator in the first place, several decades ago.