Published: Wednesday, 26 September 2018 05:12

Liberty file format (.lib): These are standard files for representing timing info for stdcells such as gates, flops, etc. They contain all timing arcs for all stdcells, as well as the functionality of these stdcells. This functionality information is what lets synthesis tools map RTL to gates. The tools also use the timing info from these files to figure out the optimal gates to meet timing.

The most common liberty files in use are the ones for older, larger-geometry tech nodes (>22nm). These specify simple look-up table (LUT) delays for all cells. This is the conventional NLDM (non-linear delay model) format. The other, more accurate format is the CCS (composite current source) model, which is employed for tech nodes of 22nm and below to give accuracy within 2% of SPICE simulations. CCS will be discussed later.

syntax:

A very good resource is the official Liberty user guide and reference manual uploaded here: liberty.pdf

General syntax of a test.lib file is as follows:

1st stmt names the library. The stmts that follow are library-level attributes that apply to the whole lib, such as tech type, definitions, defaults, etc. Then every cell in the lib has a separate cell description.

stmts are building blocks of lib. The main stmt kinds, along with some commonly used groups and attributes, are:
1. group stmt: {} used to enclose contents of group. Ex:
pin(A) {
related_pin: B; //pin group stmt

cap1_rise (cap_template) { index_1 ("..."); values (" ..."); } // groups may be nested recursively here
}

2. Attribute stmt: attribute_name: attribute_value; => attribute value sometimes enclosed in double quotes. Attributes explained in detail later.
pin (A){
direction : output;
function : "X+Y"; => this is used by synthesis tool, to figure out which gate to use for given RTL logic.
}

3. define stmt: to create new attribute. syntax is: define (attr_name, group_name, attr_type);
Ex: to define a new string attribute called bork, which is valid in a pin group, use
define (bork, pin, string) ;
You give the new attribute a value using the simple attribute syntax:
bork : "nimo";

4. wire load: define the estimated wire length as a function of fanout. You can also define scaling factors to derive wire resistance, capacitance, and area from a given length of wire.
wire_load("3K_2LM") { //name => implies it's for 2 metal layers and for a design whose size is < 3K.
    resistance : 0; //res in ohms/unit length. Res=0 implies no resistance.
    capacitance : 1; //cap in cap_unit/unit length. Cap unit here is pf, so cap = 1pf per unit length.
    area : 0; //area/unit length
    slope : 0.0118413; //characterizes linear fanout length behavior beyond the scope of the longest length described by the fanout_length attributes.
    fanout_length( 1, 0.005469 ) ; //for fanout=1, estimated wire length is 0.005 units
    fanout_length( 2, 0.00943588 ) ;
    ....
    fanout_length( 19, 0.259363 ) ; //for fanout=19, estimated wire length is 0.26 units. Scaling the FO=1 length linearly would give 0.005469*19=0.104, so the tabulated wire length is higher than a linear scaling of the FO=1 value.
}

wire_load("3K_3LM") {//name => implies it's for 3 metal layer and for design whose size is < 3K. similarly for 3K<6K, 6K<16K, so on.
         resistance : 0;
       ...
}

#wire load selection criteria is given below which selects from one of the wire load models above
wire_load_selection (2LM) {
    wire_load_from_area (0, 3000, "3K_2LM" ); => specifies that if 0 < area_of_design < 3000, choose the 3K_2LM wire load model.
    wire_load_from_area (3000, 6000, "6K_2LM" ); => choose 6K_2LM for 3000 < area_of_design < 6000. 6K_2LM usually has a longer length for a given FO than 3K_2LM, since the bigger the design, the longer the nets for a particular FO. Similarly, 6K_3LM has a shorter length for a given FO compared to 6K_2LM, as 3 metal layers provide more routing resources, so longer wires are not needed.
}
wire_load_selection (3LM) {
       ...
}

default_wire_load      : "6K_3LM"; => by default, wire_load(6K_3LM) is chosen.
default_wire_load_selection   : 3LM ; => by default, wire_load_selection(3LM) is chosen, and within this 6K_3LM is chosen.
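
The wire load lookup described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how any particular tool implements it. The function names are made up for this sketch, the table holds only the three fanout_length entries shown above (the elided ones are left out), the linear interpolation between tabulated fanouts is an assumption, and so is the treatment of the area range boundaries.

```python
def wire_length(fanout, fanout_length, slope):
    """Estimate wire length for a fanout from a wire_load table.

    fanout_length: dict {fanout: length}, as in the fanout_length() entries.
    slope: extra length per fanout beyond the largest tabulated fanout.
    """
    points = sorted(fanout_length.items())
    max_fo, max_len = points[-1]
    if fanout >= max_fo:
        # Beyond the table: extend linearly using the slope attribute.
        return max_len + slope * (fanout - max_fo)
    for (f0, l0), (f1, l1) in zip(points, points[1:]):
        if f0 <= fanout <= f1:
            # Assumption: linear interpolation between tabulated fanouts.
            return l0 + (l1 - l0) * (fanout - f0) / (f1 - f0)
    return points[0][1]  # below the smallest tabulated fanout


def wire_rc(fanout, model):
    """Return (resistance, capacitance) of the estimated wire."""
    length = wire_length(fanout, model["fanout_length"], model["slope"])
    return model["resistance"] * length, model["capacitance"] * length


def select_wire_load(area, selection):
    """Pick a wire load name from wire_load_from_area style (min, max, name) rows."""
    for lo, hi, name in selection:
        if lo <= area < hi:  # assumption: lower bound inclusive, upper exclusive
            return name
    return None


model_3k_2lm = {
    "resistance": 0.0,
    "capacitance": 1.0,
    "slope": 0.0118413,
    "fanout_length": {1: 0.005469, 2: 0.00943588, 19: 0.259363},
}
selection_2lm = [(0, 3000, "3K_2LM"), (3000, 6000, "6K_2LM")]

print(wire_rc(2, model_3k_2lm))
print(wire_length(21, model_3k_2lm["fanout_length"], model_3k_2lm["slope"]))
print(select_wire_load(2500, selection_2lm))
```

With resistance set to 0, only the wire cap contributes, so the estimated net load is just cap per unit length times the estimated length.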

5. include_file(file_name); => This includes that file from the dir specified in search path.

6. fanout_load: this specifies the fanout load for each i/p pin of a cell. If not specified, default_fanout_load defined at the top of the lib file is used.
This is typically a number such as 1 for the smallest gate (invx1), with larger values defined as appropriate for bigger gates. The synthesis tool uses it when we specify a max fanout limit: the fanout_load values of all i/p pins attached to an o/p pin are added up to get the total fanout load on that o/p pin, which is then checked against the limit.
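
As a rough sketch of the fanout check just described (the function names and the dict layout for pins are invented for this illustration, and real tools do considerably more), the total fanout load on a driver is the sum of fanout_load over its sink pins, falling back to default_fanout_load where a pin has none:

```python
def total_fanout_load(sink_pins, default_fanout_load=1.0):
    """Sum fanout_load over all i/p pins driven by one o/p pin."""
    return sum(pin.get("fanout_load", default_fanout_load) for pin in sink_pins)


def violates_max_fanout(sink_pins, max_fanout, default_fanout_load=1.0):
    """True if the summed fanout load exceeds the max fanout limit."""
    return total_fanout_load(sink_pins, default_fanout_load) > max_fanout


# Hypothetical net: one x1 gate, one bigger gate, one pin with no explicit value.
sinks = [{"fanout_load": 1}, {"fanout_load": 2}, {}]
print(total_fanout_load(sinks))
print(violates_max_fanout(sinks, max_fanout=20))
```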

7. function: used to represent function of o/p pins of a cell
function : "A&B"; => rep that o/p pin is AND of i/p pins.
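
To make the function string concrete, here is a small Python sketch that evaluates such an expression for given pin values. It handles only the operators used in these notes (! for NOT, & for AND, + for OR, plus parentheses); other operator spellings that Liberty allows are deliberately left out, and the helper name eval_function is made up for this example.

```python
import re


def eval_function(expr, pins):
    """Evaluate a Liberty-style function string for the given pin values.

    Supports only !, &, + and parentheses. pins maps pin name -> 0/1.
    """
    py_tokens = []
    for tok in re.findall(r"[A-Za-z_]\w*|[!&+()]", expr):
        if tok == "!":
            py_tokens.append("not")
        elif tok == "&":
            py_tokens.append("and")
        elif tok == "+":
            py_tokens.append("or")
        elif tok in ("(", ")"):
            py_tokens.append(tok)
        else:
            # Pin name: substitute its logic value.
            py_tokens.append(str(bool(pins[tok])))
    # Python's not/and/or precedence matches !, &, + here.
    return int(eval(" ".join(py_tokens), {"__builtins__": {}}))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, eval_function("A&B", {"A": a, "B": b}))
```

A synthesis tool does the reverse job: it looks for a cell whose function matches the logic it needs to implement.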

Note that simple combinational gates can be represented by the "function:" stmt, but for seq logic it's not that easy. For latches/flops, we use special keywords. In a .lib file, the "latch" group is used to describe latches and the "ff" group is used to describe flops. In GTECH (during synthesis in Synopsys DC), both registers and latches are represented by a SEQGEN cell, which has many i/p and o/p pins. Any type of flop/latch can be configured from this SEQGEN cell by tying its various inputs and outputs.

Example.lib: The lib example below can be applied to any cell: a std cell, or an IP, e.g. a memory module. Just as for a std cell we specify setup/hold arcs or delay arcs for all i/p pins, we do the same in an IP lib file for all its i/p pins. When lib files are created for an IP, they are called ETM (extracted timing model). These ETM hide the internal details of an IP, and just show the arcs on all i/p and o/p pins. ETM are also used in big SOC: there we run timing at block level, and then when moving to a higher level, we generate ETM models of these lower level blocks. That way STA runs much faster at the higher module level. We take this approach all the way to chip level, where all top modules are ETM. In some cases where SOC have 10B+ transistors, it's not even possible to run STA flat on the chip level gate netlist, since it would take weeks to complete. On the other hand, top chip level runs with ETM of lower level blocks can finish in less than a day.

library (LIB_W_150_2.5_STDCELL.db) { /* name of library. name can be with .db or w/o it. entire lib desc, lib level attr desc below */

/* general lib attr */
technology (cmos); /* tech tools used, default name is cmos*/
delay_model : table_lookup; /* which delay model to use in delay calc. generic_cmos is default, which is the simplest model. 4 others are table_lookup, piecewise_cmos, dcm, polynomial. table_lookup is most common. table_lookup is aka non-linear delay model (NLDM), and this is the one shown in the example below */
bus_naming_style : "Bus%sPin%d"; /* naming convention for buses */
routing_layers ("routing_layer_1, routing_layer_2"); /* all routing layers available for PnR */

/* delay and slew attr */
//define various slew and delay attr like thresholds for measuring delay and slew ..
input_threshold_pct_fall : 46; // threshold of 46% fall at i/p pin of receiver for measuring delay
input_threshold_pct_rise : 46;
output_threshold_pct_fall : 46; // threshold of 46% fall at o/p pin of driver for measuring delay
output_threshold_pct_rise : 46;
slew_lower_threshold_pct_fall : 20; //slew starting point is at 20% rise/fall
slew_lower_threshold_pct_rise : 20;
slew_upper_threshold_pct_fall : 80; //slew ending point is 80% rise/fall. These start/end points are used to get the linear slope of the waveform
slew_upper_threshold_pct_rise : 80;

/* define units */
time_unit: "10ps"; /* to identify physical time unit in lib. most common is 1ns*/
voltage_unit: "100mv"; /* to scale i/p, o/p voltage groups. most common is 1V*/
current_unit: "1mA"; /* drive current unit generated by o/p pads, or pull-up/pull-down transistor */
pulling_resistance_unit: "10ohm"; /* res for pull-up/pull-down transistor */
capacitive_load_unit (1,pf); /* unit for all caps*/
leakage_power_unit: 100uW; /* unit of power values. Power units are usually not reported, and are calculated from V, I, C. However, lkg is added for Synopsys DesignPower*/

voltage_map (VDD, 0.5); //This maps var VDD to 0.5V. Similarly map other voltages such as VPP, VBB, VSS. These mappings are needed if these vars are used later.
...

default values /* env defn*/
nom_process : 3;
nom_temperature : 150;
nom_voltage : 2.5;
default_fanout_load : 1; => by default, each i/p pin assigned a fanout load of 1. we override this by assigning explicit FO on each i/p pin of all cells (by using fanout_load : 1;)
default_max_fanout : 20; => max_fanout set at 20 for all o/p pins. we don't specify it explicitly for o/p pins except for tie_hi/tie_lo pins of TIE cell.
default_input_pin_cap : 1; => default is 1 unit. however, each i/p pin assigned explicit cap (by using capacitance : 0.004)
default_inout_pin_cap : 1;
default_output_pin_cap : 0; => default is 0 unit. however, each o/p pin assigned explicit cap (which is again very close to 0, as src/drn cap is negligible)

operating_conditions (W_150_2.5) { //just one op cond specified for this particular lib. name is W_150_2.5 (W=weak, T=150C, V=2.5V) but can be anything, such as "SlowSlow_0p9v_m25c". Here op cond is WCCOM (worst case cond). Usually there is only 1 op cond specified in a single lib, but there may be multiple too, in which case we choose the one we want. Other op cond such as BCCOM (best case cond) may be defined in some other lib. This section is used in PT/synthesis to set the operating condition. More details on "set_operating_conditions" are in the "PT - OCV" section.
    process : 3; => Process is usually defined as a number where some process number=nom. Any number below nom is considered fast process, while number above nom is considered slow process.
    temperature : 150; => This defines Temperature for this lib
    voltage : 2.5; => This defines voltage for this lib
    tree_type : "balanced_tree"; //interconnect model for calc interconnect delay. During Synthesis, "compile" cmd uses the model from here to select a formula for calc interconnect delays. 3 models available: best_case_tree (load assumed right next to the driver, so wire resistance is taken as 0), worst_case_tree (all loads see the full wire resistance) and balanced_tree (all loads share wire resistance evenly). Here, model is "balanced_tree".
}

voltage_map(VDD_HIGH, 0.540); => This is a newer liberty cmd that is used to map voltage for a PG (power/ground) pin of a block. This is the voltage that this PG pin is mapped to for this corner. The flow defaults to this voltage when no other voltage is set on this pin. It also issues an error/warning when the voltage set on this pin is not within a certain range of this voltage. We can specify a voltage map for all PG pins of this block. NOTE: operating_conditions also specifies voltage for a block, but it specifies it for the whole block, not for each individual power pin of the block. More usage of this is explained in the "PT - DSLG flow" section.

lu_table_template (name) { //name may be delay_template_5x6 or something descriptive. There may be multiple of these lut templates for power, driver_waveform, etc
    //lookup table template info. Below info says that when the table is 2D with 2 indices, the 1st index is o/p net cap, while the 2nd index is i/p slew, and the value reported in the lut is the "delay" value corresponding to this o/p cap and this i/p slew.
    index_1 ("1,2,3,4,5"); //index values here may be real values too
    variable_1 : total_output_net_capacitance; //o/p net cap used for table look up. variable_1 corresponds to index_1
    index_2 ("1,2,3,4,5,6");
    variable_2 : input_net_transition; //i/p net transition on that pin used for table look up. variable_2 corresponds to index_2
    //NOTE: each row in the 2D table reported later is for index_1 (so 5 rows), while each column in the 2D table refers to index_2 (so 6 columns) for the entries that we see in the table. var1 and var2 may be the other way around too
}

//wire load models
wire_load("3K_2LM") { ...}
wire_load("3K_3LM") { ...}

wire_load("zwlm") { resistance: 1; capacitance:0; fanout_length(1,0) ... } //this is the zero wire load model, which says that res=1ohm and cap=0pf per unit length of wire, and for fanout=1 assume length to be 0, for FO=2 assume length to be 0, and so on until FO=20. So, essentially, RC delay is going to be 0 for all wires, as wire length is assumed to be 0 for all connections
...
wire_load_selection (2LM) { wire_load_from_area(0, 110300, "zwlm"); } //each of these wire load selection chooses a particular wire load model from above based on area of design. here it says that if area of design is between 0 and 110300 units then choose zwlm.
wire_load_selection (3LM) { ... }

default_wire_load : "6K_3LM";
default_wire_load_selection : 3LM ;
default_wire_load_mode: segmented ;

//////// All cells power/delay data ////////
cell (name1) { /* cell defn */
    //general info for each cell. All these attributes are defined by liberty syntax. We can have as many attributes for each cell as we want.
    version : 1.0;
    cell_leakage_power : 4.579760E+01;
    area : 1.25;
    cell_footprint : AN2;

    pg_pin (VDD) { pg_type: primary_power; voltage_name:VDD; related_bias_pin: VPP; } => optional. All pg_pins as VDD, VSS, VPP, VBB specified here

    //optional: lkg pwr for each combo of i/p pins. default lkg pwr is the one above (when none of below conditions occur). This is needed only if we want to model very accurate leakage power (<22nm)
    leakage_power () {
      value : 4.112860E+01;
      when : "A&!B"; //lkg pwr when A=1,B=0. similarly we define for other combo of A,B

      related_pg_pin: VDD; //if we have multiple power pins, then we can define power consumption for each pin separately. If we have a bias voltage for nwell such as VPP, then we have separate lkg power related to that pin.
    }

    //info for each i/p pin.
    pin (A) { //similarly for pin B and other i/p pin
      capacitance : 0.0027; //i/p cap on pin A (may have rise_cap and fall_cap also listed separately for low tech node (<22nm), however, rise/fall cap are very close to regular cap of pin)

      receiver_capacitance () { //apart from the simple values above, we can specify i/p slew dependent cap values to be used in the receiver model in CCS. More details in CCS section.
        when: "!B&SI"; //cap can differ based on the state of other i/p pins, so we can specify condition based cap
        receiver_capacitance1_rise (receiver_cap_template_8x8) { //we have 4 such tables: cap1_rise, cap1_fall, cap2_rise and cap2_fall
          index_1 ("0.00340741, 0.0126433, 0.031115, 0.0681482, 0.142125, 0.290168, 0.586164, 1.17816"); //i/p slew
          values ( \
            "0.000288178, 0.000321005, 0.000330344, 0.000333917, 0.000335357, 0.000336019, 0.000336375, 0.000336634" \
          ); //cap1_rise values for different i/p slew rates
        }
      }

      max_transition : 4.00; //max transition tolerated on i/p pin A. This max transition is there since the timing table for the o/p pin has look up values up to a tran time of 4ns. Any transition greater than 4ns has to be extrapolated by the timing tool to come up with the delay for the cell, which may be inaccurate.
      direction : input;
      fanout_load : 1; //this pin assigned FO=1. This number is used by tool to estimate wireload for net connecting to this pin. This FO is also used to calc total FO load on each net for max FO Design rule violation. For bigger gates, we may assign FO=2,3,etc.

related_power_pin: VDD; //when we have pwr/gnd pins, we assign related power, gnd and bias pins (3 separate stmt)

      //internal_power arcs for i/p pin usually don't exist, since internal pwr is already captured in o/p pin. But when we have multiple i/p pins, it's possible that some internal pwr gets consumed, when i/p pin changes even when o/p pin doesn't change. This happens due to redistribution of cap on internal nodes, due to i/p pin switching. Note that if o/p pin toggles due to i/p pin toggling, then it gets reported as internal pwr on o/p pin. Pwr consumed here is small, so most libs do not care about internal pwr on i/p pins of stdcells. Only used for lower nm tech where we want to model power accurately
      internal_power () { //when pin A is toggling. similarly for pin B.
        when : "!B"; //this when condition is necessary, since this internal pwr only gets consumed for NAND gate when other pin=0. This forces o/p pin to 1. So, pin A toggling doesn't cause o/p to change in this case, resulting in internal pwr on pin A only
        related_pg_pin: "VDD"; //if we have multiple pwr pins like VDD, VPP, then we define pwr separately for each pg_pin, so that we can separate out current thru each of these pins. So, for 2 PG pins as VDD, VPP, we repeat this internal power table for pin A 2 times
        rise_power (inpower_template_8x1) { //there is only index_1 which has i/p transition time on it. There is no index for cap here
         }
        fall_power (inpower_template_8x1) {
        }
      }
    }

    //info for o/p pin
    pin (Y) {
      //general info
      capacitance : 0.0000; //drn cap on o/p pin is 0 (we may also omit this)
      max_capacitance : 0.15; //max cap tolerated on o/p pin Y. This max cap is there since the timing table for the o/p pin has look up values up to a max cap of 150ff. Any cap load greater than 150ff has to be extrapolated by the timing tool to come up with the delay for the cell, which may be inaccurate. We may also specify a min_cap, which refers to the smallest cap present in the LUT
      direction : output;
      function : "A&B"; //used by tools to know functionality of cells !=NOT, +=OR, &=AND
      power_down_function: "!VDD + !VPP + VSS + VBB"; // This says that the cell is powered down when VDD=0 or VPP=0 or VSS=1 or VBB=1 (0=not present, 1=present)

related_bias_pin: VPP; //when we have pwr/gnd pins, we assign related power, gnd and bias pins (3 separate stmt)

      //timing arcs for o/p pin rise/fall delay and rise/fall transition wrt to all i/p pins
      timing () { //timing wrt to i/p pin A. similarly for timing wrt i/p pin B
        transport : "NO";
        related_pin : "A";
        timing_type : combinational;
        timing_sense : positive_unate;//+ve means o/p goes in same dirn as i/p

        when: "A1&!A2"; //optional, specifies that this timing arc is to be used when A1=1,A2=0. We also specify an sdf condition (sdf_cond: "A1==1'b1 && A2==1'b0") that is used when generating the sdf file.
        mode(my_mode, "scan_2"); //optional. We can specify each timing arc to be valid for specific conditions only. We achieve this via the "mode" attribute. A mode attribute pertains to an individual timing arc. We specify a mode_name and mode_value, and this timing arc is active only when the mode is set to that value. Here, we set our variable "my_mode" to value="scan_2", so this timing arc will be picked only when "my_mode" is set to "scan_2" mode. Here my_mode is not just a variable that can be set via "set my_mode scan_2", but rather a mode variable, set via PT cmd "set_mode" in synthesis/STA scripts. See details of this cmd in PT cmds section.
        rise_transition (transitiondelayload6slew7_6x7) { } //NLDM LUT for o/p slew, similarly for fall_transition
        cell_rise (celldelayload6slew7_6x7) { } //NLDM LUT for o/p delay, similarly for cell_fall
      }
      timing () { //timing wrt i/p pin B
      }

      //power arcs for o/p pin rise/fall wrt to all i/p pins. arcs similar to those of timing
      //assumption is that both pins will never change at exactly the same time. so we can calc power wrt 1 pin toggling, then wrt other pin toggling
      internal_power () { //when pin A is toggling. similarly for pin B.
        related_pin : "A";
        related_pg_pin: "VDD"; //if we have multiple pwr pins like VDD, VPP, then we define pwr separately for each pg_pin, so that we can separate out current thru each of these pins. So, for 2 PG pins as VDD, VPP, we repeat this internal power table for pin A 2 times
        rise_power (outputpower_cap4_trans5) {
        }
        fall_power (outputpower_cap4_trans5) {
        }
      }
      internal_power () { //power when pin B is toggling
      }
    }
}
      //internal power can be for i/p pins as well as o/p pins as we saw above. For std IP as SRAM, etc we have internal power for i/p pins instead of o/p pins as power for IP varies based on whether it's enabled, and whether in rd/wrt mode. This power number accounts for all the power for that IP in various modes
ex:
   pin (CLK) { ...
       internal_power() {
         power_level : "VDD";
         when : "(WZ&!EZ)"; //similarly power for other modes such as write=(!WZ&!EZ), idle=(EZ)
         power(inputpower_slew3){
           index_1("0.008,0.1500,0.600"); //i/p pin "CLK" slew rate (0.6ns is max slew rate)
           values(\
             "49.327, 49.315, 49.325"); //energy in pJ for whole IP when in rd=(WZ&!EZ)
         }
       }
   }

//cell info for other cells
cell (name2) { /* cell defn */
   cell2 info
}
type (name) {
   bus type name
}
input_voltage (name) {
   input voltage information
}
output_voltage (name) {
   output voltage information
}
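
The NLDM tables in the example above (cell_rise, rise_transition, etc.) are indexed by o/p load and i/p slew, and a tool has to interpolate when the actual load/slew falls between index points. Below is a minimal Python sketch of plain bilinear interpolation over such a table. It assumes rows follow index_1 and columns follow index_2, as noted for the template above. The function names and the tiny 2x2 table are hypothetical, and real timing tools may use a different interpolation or extrapolation scheme.

```python
from bisect import bisect_right


def _segment(axis, x):
    """Index i such that axis[i]..axis[i+1] is used for x.

    Values outside the axis reuse the first/last segment, i.e. they are
    linearly extrapolated (which is why max_transition/max_capacitance exist).
    """
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def nldm_lookup(index_1, index_2, values, x1, x2):
    """Bilinear interpolation in a 2D lookup table.

    values[i][j] corresponds to index_1[i] (row) and index_2[j] (column).
    """
    i = _segment(index_1, x1)
    j = _segment(index_2, x2)
    t = (x1 - index_1[i]) / (index_1[i + 1] - index_1[i])
    u = (x2 - index_2[j]) / (index_2[j + 1] - index_2[j])
    return ((1 - t) * (1 - u) * values[i][j]
            + (1 - t) * u * values[i][j + 1]
            + t * (1 - u) * values[i + 1][j]
            + t * u * values[i + 1][j + 1])


# Hypothetical 2x2 delay table: index_1 = load (pf), index_2 = slew (ns).
load_axis = [0.01, 0.05]
slew_axis = [0.1, 0.5]
delay = [[0.10, 0.20],
         [0.30, 0.40]]

print(nldm_lookup(load_axis, slew_axis, delay, 0.03, 0.3))
```

At a grid point the lookup returns the tabulated value itself; halfway between all four corners it returns their average.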

INTERNAL PIN: apart from i/p and o/p pins, we can define internal pins also. This is needed in cases where there's a complex IP that has clocks generated internally which time its i/p and o/p ports. In such cases, we define an internal pin, which is some divided version of the i/p clk, and characterize its timing based on i/p clk rise/fall. We can have all timing arcs here, such as setup/hold and delay arcs as well as min_pulse_width and min_period arcs, etc.

ex:

    pin("clk_pll_checkpin_int") {
      direction : internal ;
      clock : true ;
      capacitance : 0.000000 ;
      timing() {
        related_pin : "clk1_ext" ; //this is the i/p clk pin of the IP, which serves as the master source of this internal clock pin. We define timing for gen clk wrt master clk, so that gen clk can be timed correctly based on master clk i/p slew
        timing_type : combinational ;
        cell_rise (....);
      }

      timing() {
        related_pin : "clk_pll_checkpin_int" ; //this is related to itself, as min_pulse_width/min_period types are defined on the pin itself
        timing_type : min_pulse_width ;
        rise_constraint (....);
      }
    }

pin("IN1") { direction: input; ... timing() {
related_pin :"clk_pll_checkpin_int"; //Here i/p port IN1 has timings related to the internal clk defined above.

...... }

We can also use "generated_clock" directive to define internal generated clocks. This is so that we don't have to write "create_generated_clock" cmd ourselves to create internal generated clocks. This may be useful in some cases. However, more often we remove these internal clocks inside the phy, and it's preferred to write your own "create_generated_clock" cmd to create clks inside the phy. That way we have more control on what we want.

ex: generated_clock(my_int_clk) { /* This internal clk is defined as div by 2 of master clk, and it's defined on "port1_clk" pin of the IP. There still needs to be a path via "arcs" from gen_clk to master clk, for this gen clk to be created, else PT will give PTE-075 error "gen clk has no path to master clk"*/
      clock_pin : port1_clk ;
      master_pin : ext_800m_clk ;
      divided_by : 2 ;
    }

CHECKPIN: Timing tools such as PT create their own internal pins for certain arcs even when the .lib being read doesn't have any internal pins with that name. PT creates an internal pin with name "*checkpin*" whenever a pin has both a combinational and a sequential delay timing arc. This is done to separate the two types of arcs. For ex: consider a cell which has a clk->q arc and a combo clk->gated_clk arc. Here the 1st arc is seq, while the 2nd arc is combo. We could have written both arcs with related pin as "clk". But PT chooses to create a "checkpin" for the seq arc, where clk->q is now referenced as clkcheckpin1->q, along with other setup/hold seq arcs also referenced wrt the checkpin. clk->gated_clk is still referenced wrt the original "clk". A new combo arc from clk->clkcheckpin1 is created with 0 delay. All of this internal "checkpin" creation is done when reading in the .lib or .db. So, don't be surprised if you see arcs referencing a checkpin when you have no such internal clks. It's something peculiar to PT only.

Details of this is on solvnet => https://solvnetplus.synopsys.com/s/article/Internal-Checkpins-Created-in-Some-Library-Cells-1576002481225

Attributes:

As we saw above, we have various attributes for cells, pins, etc. One of the most important attributes in the "timing" group is the "timing_type" attribute. It's used by timing tools to determine timing paths. The timing_sense attribute is used along with this. Also, we have the related and constrained pin concept that these attr apply to:

Constrained pin: This is the pin which is being constrained. When we write timing arcs, this is the -to pin. For ex: EN pin of a clk gater is a constrained pin. This is the pin that you will see in .lib as "pin(PIN_1) { ... }".

Related pin: Any constrained pin may be constrained wrt multiple pins. When we write timing arcs, this is the -from pin. For ex: EN pin of a clk gater may be constrained wrt clk pin, wrt clear pin, wrt set pin, etc. All of these pins such as clk, clear, set, etc are called related pins. These are the pins that appear within the constrained pin section in .lib as "timing() { related_pin : "clk"; ... } timing() { related_pin : "set"; ... }" etc.

1. timing_sense attribute can be unate or non_unate. Unate is when o/p dirn is determined by i/p dirn (e.g. inverter o/p always moves opposite to inverter i/p). Non_unate is when o/p dirn has no fixed relationship to i/p dirn (e.g. flop o/p pin Q can be rising or falling with no relationship to i/p pin D dirn). This attr is needed since timing tools can't determine the sense themselves, as they can't see the guts of the logic. Unate can be +ve unate or -ve unate.

positive_unate: if rising/falling change on i/p causes o/p to rise/fall (same polarity).

negative_unate: if rising/falling change on i/p causes o/p to fall/rise (opposite polarity).
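
For a purely combinational cell, the sense of an arc can be worked out from the truth table, which is a handy way to sanity check a timing_sense value. The sketch below is illustrative only (the helper name and the way the cell function is passed in as a Python callable are assumptions of this example): it toggles one input for every combination of the other inputs and records which way the output moves.

```python
from itertools import product


def timing_sense(func, inputs, pin):
    """Classify the arc from `pin` to the output of a combinational function.

    func: callable taking the input pins as keyword args and returning 0/1.
    Returns "positive_unate", "negative_unate", "non_unate", or None if the
    output never depends on `pin`.
    """
    others = [p for p in inputs if p != pin]
    same_dirn = opposite_dirn = False
    for bits in product([0, 1], repeat=len(others)):
        env = dict(zip(others, bits))
        out_lo = func(**env, **{pin: 0})
        out_hi = func(**env, **{pin: 1})
        if out_hi > out_lo:
            same_dirn = True      # i/p rise causes o/p rise
        elif out_hi < out_lo:
            opposite_dirn = True  # i/p rise causes o/p fall
    if same_dirn and opposite_dirn:
        return "non_unate"
    if same_dirn:
        return "positive_unate"
    if opposite_dirn:
        return "negative_unate"
    return None


print(timing_sense(lambda A, B: A & B, ["A", "B"], "A"))        # AND
print(timing_sense(lambda A, B: 1 - (A & B), ["A", "B"], "A"))  # NAND
print(timing_sense(lambda A, B: A ^ B, ["A", "B"], "A"))        # XOR
```

XOR is the usual combinational example of a non_unate arc: whether the output follows or opposes input A depends on the value of B.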

2. timing_type attribute: distinguishes b/w comb and seq cell. If this attr is not defined, cell is considered combinatorial. values defined for following timing arcs:

I. comb arc: timing arc attached to an o/p pin, and related pin is either i/p or o/p pin. timing arc has rise/fall_transition and cell_rise/fall for o/p pin wrt each i/p pin. It's used for all combo gates as AND, OR, etc. An arc from Clk to Q pin of a flop is NOT a combo arc (explained in seq arc)
- A. combinational: means o/p can rise or fall. for positive_unate, arc is for R->R,F->F. for negative_unate, arc is for R->F,F->R. for non_unate, arc is for {R,F}->{F,R}
- B. combinational_rise: rise means o/p is rising only. +ve_unate(R->R), -ve_unate(F->R), non_unate({R,F}->R)
- C. combinational_fall: fall means o/p is falling only. +ve_unate(F->F), -ve_unate(R->F), non_unate({R,F}->F)
II. seq arc: It's either delay arc (clk and o/p data) or constraint arc (clk and i/p data). It's used for flops/latches, etc. The seq arc is from "related" pin to the "constrained" pin.
- A. rising/falling_edge: arc whose o/p pin is sensitive to a rising/falling signal at the i/p pin. An ex is the CLK->Q arc of a flop. Here when clk rises, o/p pin may rise or fall. It looks like a combo arc (i.e. delay from i/p to o/p), but it's actually a seq arc, as the timing path breaks here: a new timing path starts from the clk pin to the q pin. Another reason it's not a combo arc is that the o/p value changes only on the +ve edge of clk and not on the -ve edge (for a +ve flop). So, to differentiate this CLK->Q arc from a pure combo clk->gclk arc, we write it as a seq arc.
- B. preset/clear: arc affects only the rise/fall arrival time of the o/p pin. logic 1/0 is asserted on the o/p pin. EX: SR latch has a clear arc on "Q" pin wrt "RZ" pin, and a preset arc on "Q" wrt "SZ" pin.
- C. hold_rising/falling: designates rising/falling edge of related pin for hold check.
- D. setup_rising/falling: designates rising/falling edge of related pin for setup check.
- E. recovery_rising/falling: uses rising/falling edge of related pin for recovery check. clk is rising/falling edge triggered.
- F. removal_rising/falling: used when the cell is a low-enable latch or rising-edge triggered FF (for removal_rising) or the cell is a high-enable latch or falling-edge triggered FF (for removal_falling). intrinsic_rise/fall attr are used along with this.
- G. min_pulse_width: together with the minimum_period value, specifies min pulse width for clk pin. Can also be specified for other pins such as set/reset, etc. Both *_high/low are defined for clk pins, while *_high is defined for active high set/reset pins and *_low for active low set/reset pins. Both high and low pulses need to have a min width for clk, since either pulse may be missed if it's too narrow (low pulse while clk is high, or high pulse while clk is low). If we want min_pulse_width to be specified in the same format as other timing attributes, then we need to have related_pin set to the pin itself, and timing_type as "min_pulse_width". Then to specify min_pulse_width_high, we use rise_constraint, and can have different values of high pulse width for different rising transition times of the pin. Similarly, fall_constraint means min_pulse_width_low. Usually min pulse width should be greater than a gate delay in that tech, since the clk pulse passes thru several gates inside the flop, so a pulse narrower than a gate delay may be swallowed by the gate itself (i.e. the pulse may start dying before it even rises to 100%, since the delay is more than the pulse width)
III. nonseq arc: when setup/hold are specified on a data pin with a non-clk pin as the related pin. The signal on a pin must be stable for a specified period of time before and after another pin of the same cell changes state, for the cell to function as expected. Called nonseq since the related pin is not clk. 4 possible arcs are non_seq_setup/hold_rising/falling. rising/falling edge refers to the related pin. These are called data to data paths.
- Ex: SR latch has non_seq_setup/hold_rising arcs on "RZ"(data) rising wrt "SZ"(clk as related pin) rising and vice versa. This arc exists since when both RZ/SZ go inactive, o/p Q is uncertain depending on which pin went inactive first. Similar arcs for clrz wrt prez and vice versa for all flops/latches which have clrz and prez pins on them.
IV. nochange arc: used for latch devices with latch enable signals. 4 possible arcs of nochange_high/low_high/low indicate +ve/-ve pulse on constrained pin and +ve/-ve pulse on related pin.
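
The combinational arc types in section I above reduce to a small rule: timing_type picks which o/p edges the arc covers, and timing_sense picks which i/p edges can cause them. A short Python sketch of that rule follows (the function name is made up here, and only the three combinational timing types are covered):

```python
def arc_edges(timing_type, timing_sense):
    """List (input_edge, output_edge) pairs covered by a combinational arc.

    Edges are "R" (rise) or "F" (fall).
    """
    output_edges = {
        "combinational": "RF",
        "combinational_rise": "R",
        "combinational_fall": "F",
    }[timing_type]
    pairs = []
    for out_edge in output_edges:
        for in_edge in "RF":
            same = in_edge == out_edge
            if timing_sense == "positive_unate" and not same:
                continue
            if timing_sense == "negative_unate" and same:
                continue
            # non_unate keeps both input edges for each output edge
            pairs.append((in_edge, out_edge))
    return pairs


print(arc_edges("combinational", "positive_unate"))
print(arc_edges("combinational_rise", "negative_unate"))
print(arc_edges("combinational_fall", "non_unate"))
```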

stdcells and their .lib arcs:

In PT, we can see all the arcs for a particular cell by typing: report_lib <args> (see in PT_ETS.txt for more details). We'll use this cmd when looking at arcs for cells below. This will ensure our cell timing arc understanding is consistent with what Timing tool sees. Below are different kind of stdcells discussed, along with their timing arcs.

1. comb logic: combinational gates such as AND, OR, etc. Arcs are for the o/p pin with a related i/p pin: o/p pin rise/fall wrt each i/p pin. positive_unate/negative_unate indicates how the o/p dirn relates to the i/p pin dirn. 3 kinds:
A. Data path: Adders, comparators, etc. AD2 (half adder, S=A^B, CO=A&B), AD3 (full adder, S=A^B^CI, CO=A&B+A&CI+B&CI), SU2 (subtractor/comparator)
B. Gates: AN21/NA21 (2/3/4 i/p and/nand gate), BF09/BH03 (2 to 7 i/p Boolean functions), EN21 (2 i/p EX-NOR), EX22 (2/3 i/p EX OR), BU10/IV10 (buffers, tri-state buffers, inverters), OR31/NO31 (2/3/4 i/p or/nor gate)
C. Multiplexer: MU111 (multiplexer). If the multiplexer is implemented using pass gates, then it's no longer purely comb, so special attributes have to be placed for such a 1 hot mux.

Example arc for NAND gate: NOTE: AND has 2 gates in it (nand followed by inv). So, better to look at an nand.
cell (NA210) {
    version : 1.0;
    cell_leakage_power : 3.75; //avg (default) lkg power in pW (unit defined in top)
    area : 1.40;
    cell_footprint : AN2;

    leakage_power () {//lkg power for A=1, B=0
      value : 6;
      when : "A&!B";
    }
    leakage_power () {//lkg power for A=0, B=1
      value : 7;
      when : "!A&B";
    }
   pin (A) {
      capacitance : 0.0065;//cap in pf. 0.0065pf=6.5ff
      max_transition : 3.50; //max slew rate allowed on i/p pin is 3.5ns (for all cells)
      direction : input;
      fanout_load : 1; //fanout load defined as 1 for i/p pin (for all cells). this fanout load is used when calc FO at any o/p pin (FO load for all i/p pins at receiver added to get FO load at o/p of driver)
    }
    pin (B) { //for i/p pin B
      capacitance : 0.0063;
      max_transition : 3.50;
      direction : input;
      fanout_load : 1;
    }
    pin (Y) { //for o/p pin Y
      capacitance : 0.0000;
      max_capacitance : 0.11; //max cap allowed on pin Y is set to 110ff. assume pmos/nmos same size = x. So, i/p cap for EFO purpose = 1/1.5(n)+1(p)=1.66*6ff/2=5ff. max EFO=110/5=22. it's same as for invx1, as all x1 gates have same driving strength. When we go to size x2, max cap is set to 0.22 (since i/p drv strength is twice [i/p cap is 12ff], so max EFO is still 22)
      direction : output;
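      function : "A&B"; //NOTE: "A&B" (together with cell_footprint AN2 and positive_unate below) is the AND function; a true NAND would have function "(A&B)'" and timing_sense negative_unate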
      timing () {
        transport : "NO";
        related_pin : "A"; => related pin says with respect to which i/p pin is o/p delay based on. For flops with pin D, related pin would be CLK for setup or hold checks.
        timing_type : combinational; => refers to related pin dirn (for ex, if it's hold_rising, then rising refers to pin "A" dirn)
        timing_sense : positive_unate;

        rise_transition (transitiondelayload5slew6) { //o/p slew rate
          index_1 ("0.0054,0.0162,0.0324,0.0486,0.0864");//o/p load in pf. NOTE: max cap in table here is 86.4ff, while max cap is set to 110ff. So, extrapolation is done.
          index_2 ("0.04,0.1,0.4,0.8,1.5,3.5");//i/p slew in ns (max i/p slew is 3.5ns)
          values (\
                  "0.1665, 0.1666, 0.1707, 0.1765, 0.1864, 0.2188",\ => 1st row is for index_1, entry 1
                  "0.3182, 0.3189, 0.3203, 0.3243, 0.3283, 0.3477",\ => each column is index_2 entry 1-6
                  "0.5517, 0.5515, 0.5516, 0.5549, 0.5566, 0.5669",\
                  "0.7859, 0.7859, 0.7844, 0.7864, 0.7886, 0.7946",\
                  "1.3322, 1.3305, 1.3300, 1.3316, 1.3326, 1.3363"); => 5th row is for index_1, entry 5
        }
        cell_rise (celldelayload5slew6) { //delay thru cell
          index_1 ("0.0054,0.0162,0.0324,0.0486,0.0864");
          index_2 ("0.04,0.1,0.4,0.8,1.5,3.5");
          values (\
                  "0.2670, 0.2874, 0.3779, 0.4501, 0.5348, 0.6757",\
                  "0.3734, 0.3939, 0.4843, 0.5573, 0.6425, 0.7908",\
                  "0.5278, 0.5482, 0.6387, 0.7130, 0.7971, 0.9461",\
                  "0.6806, 0.7011, 0.7922, 0.8666, 0.9508, 1.0993",\
                  "1.0361, 1.0568, 1.1484, 1.2227, 1.3087, 1.4557");
        }
        fall_transition (transitiondelayload5slew6) { ... }
        cell_fall (celldelayload5slew6) { ... }
      } //end of timing group for related pin A
      //similarly for pin B
      timing () { ... }

      internal_power () {
        related_pin : "A";
        rise_power (outputpower_cap3_trans4) { //pwr in pW when o/p pin Y is rising
          index_1 ("0.0108,0.0432,0.0864");
          index_2 ("0.1000,0.5000,1.2000,3.8000");
          values (\
                  "0.0246, 0.0242, 0.0270, 0.0409",\
                  "0.0255, 0.0244, 0.0257, 0.0367",\
                  "0.0257, 0.0248, 0.0251, 0.0338");
        }
        fall_power (outputpower_cap3_trans4) { //pwr when o/p pin Y is falling
          index_1 ("0.0108,0.0432,0.0864");
          index_2 ("0.1000,0.5000,1.2000,3.8000");
          values (\
                  "0.0077, 0.0013, 0.0024, 0.0157",\
                  "0.0087, 0.0063, 0.0032, 0.0109",\
                  "0.0090, 0.0077, 0.0062, 0.0084");
        }
      }
      internal_power () { ... } //similarly for pin B.
    }
}
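
The lookup that a timing tool performs on such a LUT can be sketched in Python. This is only a minimal illustration, assuming plain bilinear interpolation between the 4 surrounding table entries and linear extrapolation from the last segment when the point lies outside the table (real tools may use other schemes). The function names are made up for this note; the numbers are the cell_rise table of the example above.

```python
from bisect import bisect_right

# cell_rise table of NA210, copied from the example above
LOAD_PF = [0.0054, 0.0162, 0.0324, 0.0486, 0.0864]   # index_1: o/p load in pf
SLEW_NS = [0.04, 0.1, 0.4, 0.8, 1.5, 3.5]            # index_2: i/p slew in ns
CELL_RISE = [
    [0.2670, 0.2874, 0.3779, 0.4501, 0.5348, 0.6757],
    [0.3734, 0.3939, 0.4843, 0.5573, 0.6425, 0.7908],
    [0.5278, 0.5482, 0.6387, 0.7130, 0.7971, 0.9461],
    [0.6806, 0.7011, 0.7922, 0.8666, 0.9508, 1.0993],
    [1.0361, 1.0568, 1.1484, 1.2227, 1.3087, 1.4557],
]

def segment(axis, x):
    """Index i of the segment [axis[i], axis[i+1]] used for x (end segments are reused outside the table)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def lookup(table, axis1, axis2, x1, x2):
    """Bilinear interpolation (or linear extrapolation) of table at (x1, x2)."""
    i, j = segment(axis1, x1), segment(axis2, x2)
    t1 = (x1 - axis1[i]) / (axis1[i + 1] - axis1[i])
    t2 = (x2 - axis2[j]) / (axis2[j + 1] - axis2[j])
    row_lo = table[i][j] + t2 * (table[i][j + 1] - table[i][j])
    row_hi = table[i + 1][j] + t2 * (table[i + 1][j + 1] - table[i + 1][j])
    return row_lo + t1 * (row_hi - row_lo)

# delay (ns) for 24.3ff load and 0.1ns i/p slew, halfway between load entries 2 and 3
delay = lookup(CELL_RISE, LOAD_PF, SLEW_NS, 0.0243, 0.1)
# 110ff (the max_capacitance) lies outside the table, so the last load segment is extrapolated
delay_at_max_cap = lookup(CELL_RISE, LOAD_PF, SLEW_NS, 0.11, 0.1)
```

At a grid point the function returns the table entry itself; in between, the result is a weighted mix of the 4 neighbouring entries.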

2. seq logic: Flops and latches. The name of flop/latches in libraries is such that it allows to distinguish b/w scan/no_scan, +ve/-ve, Clrz/Prez/both pins. as an example name XYZ=> X=D(no scan),T(scan). Y=N(-ve),T(+ve), Z=B(both),C(clr),P(preset),N(none). clr/preset are active low.

A. no scan flops: DNB10/DTB10(-ve/+ve, clr/preset), DNC10/DTC10(-ve/+ve, clr), DNN10/DTN10(-ve/+ve, none), DTP10(+ve, preset).

ex: Negative edge triggered D-FF, async active low clear, both Q and QZ outputs, 4X Drive
cell (DNC40) {
...//ff group: describes either a single stage or master-slave Flip Flop. ff_bank used to rep multi-bit flip-flop.
ff ("IQ","IQZ") { => IQ defines state of non-inverting o/p, while IQZ defines inverting output state (internal states of cross coupled inverters within the flop). These can be named anything except name of a pin in the cell being described.
      next_state : "D"; => required, it's a logic eqn written in terms of i/p pins or 1st state variable (IQ)
      clocked_on : "CLK'"; => required, identify active edge of clock signal (here CLK' indicates it's -ve edge triggered device). all pins listed here are treated as clocks by DC. For ex, for ff with CE pin, we can write clocked_on: "CLK & CE", but then we define clock attribute as true for CLK and false for CE.
      clear : "CLRZ'"; => optional, gives active value for clear input. here it's CLRZ' => clrz bar
      preset : "xx"; =>optional, gives active value for preset input
      clear_preset_var1 : L; => this is there if both clrz,prez pins there. implies IQ=L if both clrz,prez active.
      clear_preset_var2 : L; => this is there if both clrz,prez pins there. implies IQZ=L if both clrz,prez active.
    }
pin (CLK) {
      min_pulse_width_high : 0.9572;
      min_pulse_width_low : 0.7352;
      capacitance : 0.0152;
      max_transition : 4.10;
      direction : input;
      fanout_load : 1;
      clock : true; => clock attribute needs to be set to true, so that DC treats this as clock.
...
}
pin (CLRZ) {
      min_pulse_width_low : 0.6865; //clrz low pulse can't be < 0.68ns. This translates into $width check when running PT/gate_sims. No check for high pulse as high is inactive, so even if there's a high glitch, it's ok as o/p will still be low.
      capacitance : 0.0129;
      max_transition : 4.10;
      direction : input;
      fanout_load : 1;
...
      //timing: 4 arcs = recovery_falling/removal_falling wrt CLK (implies clk falling edge), non_seq_setup_rising/non_seq_hold_rising wrt PREZ. see top of this file for details on various arcs for all cells. Since related pin is "CLK" so timing arc is -from "CLK" pin -to specified pins (i.e -to CLRZ/PREZ etc). This is how seq timing arcs are written. They are always from "related" pin to "constrained" pin.
      timing() { //timing for removal_falling related to clk pin (clk pin falling since it's -ve edge flop)
        related_pin : "CLK"; //since related pin is CLK, arc is: -from CLK -to CLRZ
        timing_type : removal_falling;
        rise_constraint (constraint_slewref_6slewdata_6) { //note that i/p pins use word "constraint" for timing arcs instead of cell_rise, etc as used for o/p pins. This has rise_constraint only as recovery/removal are for active to inactive edge only
        }
      }
      timing() { //timing for recovery_falling related to clk pin
        related_pin : "CLK";
        timing_type : recovery_falling;
        rise_constraint (constraint_slewref_6slewdata_6) { //note this has rise_constraint only
        }
      }
      timing () { //CLRZ rising (rise_constraint) should setup some time before PREZ rising (non_seq_setup_rising)
        related_pin : "PREZ"; //since related pin is PREZ, arc is: -from PREZ -to CLRZ
        timing_type : non_seq_setup_rising; //setup arc
        rise_constraint (constraint_slewref_6slewdata_6) {
        }
      }
      timing () { //CLRZ rising (rise_constraint) should hold for some time after PREZ rising (non_seq_hold_rising)
        related_pin : "PREZ";
        timing_type : non_seq_hold_rising; //hold arc
        rise_constraint (constraint_slewref_6slewdata_6) {
        }
      }
}
pin (PREZ) {   //similar arcs for PREZ as for CLRZ
}

pin (D) {
      capacitance : 0.0054;
      max_transition : 4.10;
      direction : input;
      fanout_load : 1;
...
      //pin D has 2 arcs, setup/hold wrt clk falling
      timing () { //pin D needs to setup with clk falling
        related_pin : "CLK"; //since related pin is CLK, arc is: -from CLK -to D
        timing_type : setup_falling;
        rise_constraint (constraint_slewref_6slewdata_6) { //pin D rising edge setup. setup/hold arcs are dependent on D and CLK pin slew rates, and do not have dependence on o/p load. So, 2D table has index1 as clk_slew and index2 as data_slew
   }
        fall_constraint (constraint_slewref_6slewdata_6) { //pin D falling edge setup
   }
      }
      timing () { //pin D needs to hold with clk falling
        related_pin : "CLK";
        timing_type : hold_falling;
        rise_constraint (constraint_slewref_6slewdata_6) { //same for hold
   }
        fall_constraint (constraint_slewref_6slewdata_6) {
   }
      }

}
pin (Q) {
      capacitance : 0.0000;
      max_capacitance : 0.77;
      direction : output;
      function : "IQ";
...
      //Q pin has 4 arcs: delay arcs wrt PREZ falling (Q rising), CLRZ falling (Q falling), CLRZ rising (Q rising), and CLK falling (Q rising/falling)
      timing () { //Q rising wrt PREZ falling
        transport : "NO";
        related_pin : "PREZ";
        timing_type : preset;
        timing_sense : negative_unate;
        rise_transition (transitiondelayload6slew7) {
   }
        cell_rise (celldelayload6slew7) {
   }
      }

      timing () {//Q falling wrt CLRZ falling
        transport : "NO";
        related_pin : "CLRZ";
        timing_type : clear;
        timing_sense : positive_unate;
        fall_transition (transitiondelayload6slew7) {
   }
   cell_fall (celldelayload6slew7) {
   }
      }

      timing () {//Q rising wrt CLRZ rising (positive_unate). this happens since clrz has priority, so when both clrz,prez are low, then Q=L. But if clrz goes high, then Q goes high as prez is still active.
        transport : "NO";
        related_pin : "CLRZ";
        timing_type : preset;
        timing_sense : positive_unate;
        rise_transition (transitiondelayload6slew7) {
   }
   cell_rise (celldelayload6slew7) {
   }
      }

      timing () {//Q rise/fall wrt clk falling
        transport : "NO";
        related_pin : "CLK";
        timing_type : falling_edge;
        rise_transition (transitiondelayload6slew7) {
   }
   fall_transition (transitiondelayload6slew7) {
   }
   cell_fall (celldelayload6slew7) {
   }
   cell_rise (celldelayload6slew7) {
   }
     }
}
pin (QZ) { //same arcs as those of Q
      capacitance : 0.0000;
      max_capacitance : 0.77;
      direction : output;
      function : "IQZ";
..
}
} => end of cell
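
The behaviour that the ff group describes can be written as a small Python model. This is a hedged sketch, not tool code: it assumes clocked_on "CLK'" (falling edge), clear "CLRZ'" and a preset of "PREZ'" (the example above leaves preset as "xx", so the preset expression is an assumption), and clear_preset_var1/var2 = L. The function name ff_step is made up.

```python
LEVEL = {"L": 0, "H": 1}

def ff_step(iq, iqz, d, clk_prev, clk, clrz, prez, var1="L", var2="L"):
    """Return the new (IQ, IQZ) of a -ve edge flop with active low clear/preset."""
    clear = clrz == 0            # clear : "CLRZ'" is true when CLRZ is low
    preset = prez == 0           # assumed preset : "PREZ'"
    if clear and preset:         # both active: clear_preset_var1/var2 decide
        return LEVEL[var1], LEVEL[var2]
    if clear:
        return 0, 1
    if preset:
        return 1, 0
    if clk_prev == 1 and clk == 0:   # clocked_on : "CLK'" => falling edge of CLK
        return d, 1 - d              # next_state : "D"
    return iq, iqz                   # otherwise hold the state

# D=1 is captured on the falling edge of CLK, ignored on the rising edge
captured = ff_step(0, 1, d=1, clk_prev=1, clk=0, clrz=1, prez=1)
ignored = ff_step(0, 1, d=1, clk_prev=0, clk=1, clrz=1, prez=1)
```

With var1=var2=L, both IQ and IQZ go low when clrz and prez are active together, which is why Q follows clrz in that case.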

In PT, for a regular flop with D, CLK and Q pins, we see these 5 arcs: setup/hold on D, 2 pulse width checks on CP, and the CP->Q delay arc. NOTE that all arcs are "-from" CP pin (related pin) "-to" the D, CP or Q pin. Always keep that in mind when considering arcs.

pt_shell> report_lib -timing TSM_LIB {DFLOP_SVT}
****************************************

                            Arc                   Arc Pins
   Lib Cell Attributes    # Type/Sense      From        To         When
   ----------------------------------------------------------------------------
                 s         0 hold_clk_rise   CP          D
                           1 setup_clk_rise CP          D
                           2 clock_pulse_width_high
                                              CP          CP         D
                           3 clock_pulse_width_low
                                              CP          CP         D
                           4 rising_edge     CP          Q

B. scan flops: TNB11/TDB11(-ve/+ve, clr/preset), TNC10/TDC10(-ve/+ve, clr), TNN10/TDN10(-ve/+ve, none), TNP/TDP(-ve/+ve, preset). All these scan flops have test_cell group to identify them as scan cells.
arcs for TDB11 are for:
I. prez pin: 4 arcs. 2 arcs are with clk as related pin, recovery_rising/removal_rising(implies clk rising) for prez rising. no falling edge arc as recovery/removal checks are only for async signal going from active to inactive. other 2 arcs are with clrz as related pin, non_seq_setup/hold_rising(implies clrz rising) for prez rising. again no falling edge arcs here.
II. clrz pin: 4 arcs same as for prez pin. recovery_rising/removal_rising with clk as related pin, and non_seq_setup/hold_rising with prez as related pin.
III. Data pin: 4 arcs. with clk as related pin, setup/hold_rising(implies clk rising) for data pin rising and falling.
IV: Q pin: 5 arcs. 1 arc is with prez as related pin, "preset" arc for Q rising. 2 arcs are with clrz as related pin, "clear" arc for Q falling, and "preset" arc for Q rising. Note that for clrz related pin, we have "preset" arc also. this is because clrz has priority over prez, so when both clrz/prez are low, and then clrz goes high, then Q goes high. so, we have "preset" arc for Q rising with clrz as related pin. 2 arcs with clk as related pin, "rising_edge"(implies clk rising) for Q rising/falling.
V: SD pin: 4 arcs. with clk as related pin, setup/hold_rising(implies clk rising) for SD rising/falling. same as Data pin arcs.
VI: SCAN pin: 4 arcs. with clk as related pin, setup/hold_rising(implies clk rising) for SCAN rising/falling. same as Data pin arcs.
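
All the constraint arcs listed above (setup/hold, recovery/removal, non_seq_setup/hold) amount to the same kind of window check around the edge of the related pin. A minimal Python sketch, with made-up numbers and a made-up function name:

```python
def violates_window(constrained_edge_ns, related_edge_ns, before_ns, after_ns):
    """True if the constrained pin toggles inside the forbidden window.

    before_ns is the setup (or recovery / non_seq_setup) value,
    after_ns is the hold (or removal / non_seq_hold) value.
    """
    window_start = related_edge_ns - before_ns
    window_end = related_edge_ns + after_ns
    return window_start < constrained_edge_ns < window_end

# hypothetical flop: clk rising edge at 10ns, setup = 0.1ns, hold = 0.05ns
setup_fail = violates_window(9.95, 10.0, 0.1, 0.05)   # D changes too late before the edge
setup_ok = violates_window(9.80, 10.0, 0.1, 0.05)     # D changes well before the edge
hold_fail = violates_window(10.02, 10.0, 0.1, 0.05)   # D changes too soon after the edge
```

In the .lib the two window values are not constants; each comes from a 2D table indexed by the slews of the related and constrained pins.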

ex: Scan flop
cell (TDN10) { ...
ff ("IQ","IQZ") {
next_state : " (D SCAN') + (SD SCAN) "; => states that next state is D when scan=0, and SD when scan=1
clocked_on : "CLK";
}

//test_cell group: added to the cell desc to identify it as scan cell. this group defines only the non-test mode fn of scan cell.
test_cell () { => identifies this cell as scan cell
      ff ("IQ","IQZ") { => model only the non-test cell behaviour here.
        next_state : "D"; => in no-test, next state=D
        clocked_on : "CLK";
      }
      pin (D) {
        direction : input;
      }
      pin (CLK) {
        direction : input;
      }
      pin (SD) {
        direction : input;
        signal_type : test_scan_in; => scan_data_in
      }
      pin (SCAN) {
        direction : input;
        signal_type : test_scan_enable; => scan_enable
      }
      pin (Q) {
        function : "IQ";
        direction : output;
        signal_type : test_scan_out; => scan_data_out
      }
      pin (QZ) {
        function : "IQZ";
        direction : output;
        signal_type : test_scan_out_inverted; => inverted scan_data_out
      }
}
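
The two next_state expressions can be compared with a few lines of Python (a sketch only; function names are made up). The full cell muxes between D and SD, while the test_cell group describes the same flop with the scan mux ignored.

```python
def next_state_full(d, sd, scan):
    """next_state : "(D SCAN') + (SD SCAN)" of the cell level ff group."""
    return (d & (1 - scan)) | (sd & scan)

def next_state_test_cell(d):
    """next_state : "D" of the ff group inside test_cell (non-test mode only)."""
    return d

# with SCAN=0 both descriptions agree for every D/SD combination
agree_in_functional_mode = all(
    next_state_full(d, sd, 0) == next_state_test_cell(d)
    for d in (0, 1) for sd in (0, 1)
)
```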

C. latch (no scan): LAB10( nand SR latch), LAL10/LAH10(active low/high), LAH27(active high with clr/preset), LAH2B(active high with clr)
arcs for LAH27 are for: (clk pin has no arc but has "min_pulse_width_high" check and is tagged as "clock : true"). Note, an active high latch essentially behaves as -ve flop, so all arcs same as those for flop, except for Q pin comb arc from D->Q.
I. prez pin: 4 arcs. 2 arcs are with clk as related pin, recovery_falling/removal_falling(implies clk falling) for prez rising. clk falling edge taken as latch turns off at falling edge of clk. 2 arcs are with clrz as related pin, non_seq_setup/hold_rising(implies clrz rising) for prez rising.
II. clrz pin: 4 arcs same as for prez pin. recovery_falling/removal_falling with clk as related pin, and non_seq_setup/hold_rising with prez as related pin.
III. Data pin: 4 arcs. with clk as related pin, setup/hold_falling(implies clk falling) for data pin rising and falling.
IV: Q pin: 8 arcs. 2 arcs with prez as related pin, "preset" arc for Q rising (prez falling) and "clear" arc for Q falling (prez rising). 2 arcs with clrz as related pin, "clear" arc for Q falling (clrz falling) and "preset" arc for Q rising (clrz rising). Note that prez has priority over clrz here, so with prez as related pin "clear" arc exists for Q falling. But irrespective of that, whenever clrz or prez go high (while clk is high), then i/p Data will flow to Q, so with clrz rising, "preset" arc exists for Q rising, and for prez rising, "clear" arc exists for Q falling. So, with prez as related pin, "clear" arc exists for Q falling in 2 ways:
    A. clrz=0, clk=0or1, and prez rises => Q falls (case of prez having priority)
    B. clrz=1, clk = 1, and prez rises => Q falls (case of D->Q path while clk active)
2 arcs with clk as related pin, "rising_edge"(implies clk rising) for Q rising/falling. 2 arcs with Data as related pin, "combinational" for Q rising/falling.

ex: active high D-latch, async active low clear/preset, both Q and QZ outputs., 4X Drive
cell (LAH21) {
...//latch group below: describes level sensitive storage device. latch_bank used to rep multi-bit latch.
latch ("IQ","IQZ") { => IQ defines state of non-inverting o/p, while IQZ defines inverting output state (internal states of cross coupled inverters within the latch). These can be named anything except name of a pin in the cell being described.
      enable: "CLK"; => optional. specify enable (active high)
      data_in: "D"; => optional, data
      preset : "PREZ'"; => preset is active low (note ' at end of PREZ to indicate bar)
      clear : "CLRZ'"; => clr is active low
      clear_preset_var1 : H; => IQ (var1) =H when both preset and clear are active
      clear_preset_var2 : H; => IQZ (var2) =H when both preset and clear are active
    }
pin (CLK) { .... clock: true; => clock attribute needs to be set to true, so that DC treats this as clock. No timing arcs.
pin (D) { .. } => 2 timing arcs, setup_falling/hold_falling wrt CLK falling and D pin rise/fall constraint
pin (CLRZ) or (PREZ) => these don't have any special attr. just treated as normal pins. They have 4 arcs: recovery_falling/removal_falling wrt CLK pin falling and CLRZ rising (rise_constraint), and non_seq_setup/hold_rising for pin CLRZ wrt pin PREZ rising (or for PREZ pin: non_seq_setup/hold_rising for pin PREZ wrt pin CLRZ rising)
pin (Q) { ...//arcs wrt 4 related pins: clrz rise/fall, prez rise/fall, clk rise and combinatorial arc for D rise/fall (8 arcs in total, same as listed for LAH27 above).
function: "IQ"; => Q has same value as var IQ above. IQ=H when both clrz/prez active, so prez has priority
pin (QZ) { ...
function: "IQZ"; => QZ has same value as var IQZ above. IQZ=H when both clrz/prez active, so clrz has priority
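
The two ways in which Q can fall when prez rises (cases A and B above) can be reproduced with a small Python model of the latch group. This is a sketch under the attributes shown above (enable "CLK", clear "CLRZ'", preset "PREZ'", clear_preset_var1 = H); the function name latch_q is made up.

```python
LEVEL = {"L": 0, "H": 1}

def latch_q(q, d, clk, clrz, prez, var1="H"):
    """Return the new value of IQ (pin Q) for an active high latch."""
    clear = clrz == 0
    preset = prez == 0
    if clear and preset:      # both active: clear_preset_var1 decides (H => prez wins on Q)
        return LEVEL[var1]
    if clear:
        return 0
    if preset:
        return 1
    if clk == 1:              # latch is transparent, D flows to Q
        return d
    return q                  # latch is closed, hold

# case A: clrz=0, prez rises => Q falls because clear takes over (clk can be 0 or 1)
a_before = latch_q(0, d=1, clk=0, clrz=0, prez=0)
a_after = latch_q(a_before, d=1, clk=0, clrz=0, prez=1)

# case B: clrz=1, clk=1, D=0, prez rises => Q falls because D flows to Q
b_before = latch_q(0, d=0, clk=1, clrz=1, prez=0)
b_after = latch_q(b_before, d=0, clk=1, clrz=1, prez=1)
```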

D. latch(with scan): ADD DETAILS

3. clock cells: cells on clk path. CGN4/CGP4 (clk gaters), CTB20 (clk tree buffer)
arcs for CGP40 are for: (CG* cells have statetable instead of function, and then o/p pin uses "state_function" to define functionality)
I. EN: 4 arcs with CLK as related pin, setup/hold_rising(clk rising) for EN rising and falling. clk rising since active low latch present. Note that arc has to consider path upto the "and" gate to calc setup/hold, since just meeting setup/hold to the latch i/p doesn't guarantee that EN signal will meet setup/hold to "and" gate.
II. GCLK: state_function: "CLK * ENL", where CLK and ENL(internal node) values are in statetable. 2 "comb" arcs with clk as related pin, for o/p rise/fall.

ex: clk tree buffer
cell (CTB70) { ...
cell_footprint : CTNIBUF; //Use this attribute to assign the same footprint class to all cells that have the same layout boundary. Cells with the same footprint class are considered interchangeable and can be swapped during in-place optimization. Cells without cell_footprint attributes are not swapped during in-place optimization. NOTE that all CTB are assigned same footprint, even though they have different layout boundary. similarly for CG*, AN2*, etc. all cells from same class are assigned a footprint in TI lib files.
    dont_touch : true; //marked as don't touch, so that some opt step doesn't touch/remove it
    dont_use : true; //marked as don't use so that they are not used during normal logic design (use only for clk tree)
...}

NOTE: cell_footprint is set to "NIBUF" (non inverting buf) for all buffers (BU110, BU120, etc) and set to "DELAYBUF" for all delay cells (BU112, BU113, BU116, etc). Tool identifies buffers/delay cells by looking at function stmt of cell which is "function : "A";". All delay cells are marked as "dont_use", so normal logic design doesn't use these delay cells to fix hold time.

ex: clk gating cell: CGP10 (passes EN when CLK is Low)
cell (CGP10) {
    version : 1.0;
    cell_leakage_power : 2.204898E+01;
    area : 4.00;
    dont_use : true;
    dont_touch : true;
    cell_footprint : CGP;
    clock_gating_integrated_cell : "latch_posedge"; => this attr tells the synthesis tool that it's an integrated clk gating cell.

    statetable (" CLK EN","ENL") { //("i/p node names", "internal node names")CLK, EN are input pins, ENL is defined as internal node. statetable is used to define fn of complex seq cells
      table : "L L   : - : L ,\ => "i/p values : current internal value : next internal values". When clk=L, EN=L, ENL current value is - (whatever it's supposed to be), and ENL next value is L.
               L H   : - : H ,\ => here also ENL is same as EN (as CLK is Low=active)
               H -   : - : N "; => no change in ENL
    }

    pin (ENL) { //internal node ENL used to define statetable above
      direction : internal;
      internal_node : "ENL";
    }

    pin (CLK) {
      ...
      clock : true;
      clock_gate_clock_pin : true; //clk gating attr defined
      internal_power () { .... }
    }

    pin (EN) {
      ...
      clock_gate_enable_pin : true; //clk gating attr defined
      internal_power () { ... }
      //2 timing arcs: setup and hold for EN pin wrt CLK rising (note: arcs are for when clk goes inactive).
      timing () { //hold check for EN rise/fall
        related_pin : "CLK";
        timing_type : hold_rising;
        rise_constraint (constraint_slewref_7slewdata_7) { ... }
        fall_constraint (constraint_slewref_7slewdata_7) { ... }
      }
      timing () { //setup check for EN rise/fall
        related_pin : "CLK";
        timing_type : setup_rising; ...
      }
    }

    pin (GCLK) {
      capacitance : 0.0000;
      max_capacitance : 0.19;
      direction : output;
      clock_gate_out_pin : true; //clk gating attr defined
      state_function : " CLK * ENL "; //o/p is product of internal node ENL (defined above) and CLK. When CLK=0, o/p=0, but when CLK=1, o/p=ENL

      timing () { //c2q delay
        transport : "NO";
        related_pin : "CLK";
        timing_type : combinational;
        timing_sense : positive_unate;
        rise_transition (transitiondelayload8slew9) { ... }
        fall_transition (transitiondelayload8slew9) { ... }
        cell_fall (celldelayload8slew9) { ... }
        cell_rise (celldelayload8slew9) { ... }
      }
      internal_power () { ... }
   }
}
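
The statetable and state_function of this clk gater can be simulated in a few lines of Python. A minimal sketch (function names made up): ENL follows EN while CLK is low, holds while CLK is high, and GCLK = CLK * ENL.

```python
def cg_step(enl, clk, en):
    """Apply one row of the statetable and return (next ENL, GCLK)."""
    if clk == 0:          # rows "L L" and "L H": ENL takes the value of EN
        enl = en
    # row "H -": no change in ENL
    return enl, clk & enl  # state_function : "CLK * ENL"

def simulate(stimulus, enl=0):
    """stimulus is a list of (CLK, EN) samples; returns the GCLK value for each sample."""
    gclk_trace = []
    for clk, en in stimulus:
        enl, gclk = cg_step(enl, clk, en)
        gclk_trace.append(gclk)
    return gclk_trace

# EN toggles while CLK is high in samples 3 and 6: the running pulse is not cut short,
# and no partial pulse is created
trace = simulate([(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (1, 1)])
```

This is why setup/hold of EN are checked wrt the rising edge of CLK: EN only matters while the latch is transparent (CLK low).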

4. special cells:
A. PB110 (3 state bus holder) => no function specified as attribute "driver_type: bus_hold" is defined, indicating it's bi-dir pin, and it holds the last logic value when no-one is driving.
B. TO010 (tie-off cell) : used to tie nets to constant values. tie-off cells are identified by looking at function : "0" or function : "1" in the pin group.
DC will tie any constant net to this cell unless "set_direct_power_rail_tie" is used for that particular net. Then, that net will be left floating during synth, but will be connected directly to vdd/vss during PnR.

cell (TO010) {
      area : 1.75;
      cell_footprint : TO010;
      pin(LO) {
          max_fanout : 50;
          max_capacitance : 100.04;
          direction : output ;
          function : " 0 " ; => this identifies it as tieoff cell for constant logic "0"
         }
      pin(HI) {
          max_fanout : 50;
          max_capacitance : 100.04;
          direction : output ;
          function : " 1 " ; => this identifies it as tieoff cell for constant logic "1"
         }
}

5. missing cells: antenna, decap, filler, tap cells.

A. decoupling cells, filler cells and tap cells: decap cells, are cells that have a capacitor placed between the power rail and the ground rail to overcome dynamic voltage drop; filler cells are used to connect the gaps between the cells after placement; and tap cells are physical-only cells that have power and ground pins and do not have signal pins. Tap cells are well-tied cells that bias the silicon infrastructure of n-wells or p-wells (to connect body/substrate of all devices). All of these are identified by using these attributes for cells:
cell (cell_name) {
...
is_decap_cell : <true | false>;
is_filler_cell : <true | false>;
is_tap_cell : <true | false>;
...
}

NOTE: since these are physical only cells (no logic function or timing), we usually don't put these cells in .lib file. They only exist in *.lef file. Some Synopsys tools will complain about this, since they don't find the correct attribute on the cell (as it's missing in .lib). However, we can create a physical only .lib, and we can put all these cells in there (especially the decap cells). Then we don't see the warnings. Or, we should not put these cells in netlist during synthesis.
ex: decap cell in *PHYS.lib
cell (SPAREMOSCAP) {
area : 0.75; //no other attribute besides area.
}

B. antenna cells => used to fix antenna violations. It just has an nmos whose gate is tied to vss, and src/drn are tied to i/p A.
NOTE: function is not defined for Antenna Protection cell.
cell (AP001) {
    version : 1.0;
    cell_leakage_power : 3.828184E+00;
    area : 1.00;
    dont_use : true;
    dont_touch : true;
    cell_footprint : DIODE;

    leakage_power () {
      value : 3.884488E+00;
      when : "A";
    }

    leakage_power () {
      value : 3.771880E+00;
      when : "!A";
    }

    pin (A) {
      capacitance : 0.0028;
      direction : input;
      fanout_load : 1;

      internal_power () {
        rise_power (inputpower_trans5) {
          index_1 ("0.0100,0.2000,1.0000,2.0000,4.0000");
          values ("-0.0001, -0.0001, -0.0001, -0.0001, -0.0001");
        }
        fall_power (inputpower_trans5) {
          index_1 ("0.0100,0.2000,1.0000,2.0000,4.0000");
          values ("0.0001, 0.0001, 0.0001, 0.0001, 0.0001");
        }
      }
    }
}

----------------

delay models:

To calculate any delay thru a path, the timing tool must accurately calculate the delay and slew (transition time) at each stage of each timing path. A stage consists of a driving cell, the annotated RC network at the output of the cell, and the capacitive load of the network load pins. Models are employed for the driver, the wire network and the receiver load. The driver model models any cell as a driver (which may be a current or voltage source). The wire network is modeled as a reduced RC network. The reduced RC network should behave the same as the original RC network at all frequencies, but needs far less computation to calculate delays (PT uses the Arnoldi reduction method). The receiver is simply a capacitance. However, the cap may vary depending on rise/fall transition on the receiver, min/max condition, miller effect (cap changing due to coupling b/w i/p and o/p, where the o/p is changing simultaneously while the input is changing), etc. To account for this, a receiver model is also used to capture this cap as accurately as possible. 2 delay models are widely in use:

1. NLDM: (non linear delay model)

For the simple NLDM model, the driver is a linear voltage ramp in series with a resistor. This is captured via a lookup table, instead of equations, which are more time consuming to evaluate. The simple LUT model (aka NLDM) employed above works for 22nm tech and above. It specifies delay at the midpoint (at 50% rise or fall). We specify o/p delay + o/p transition time for different i/p slew rates and different o/p loads, via a LUT. So, slew rate (b/w 20% to 80% rise or fall with linear slope) and delay (b/w 50% rise/fall to 50% rise/fall) are the 2 important parameters that define the shape of the o/p waveform (o/p load and i/p slope are used as indexes). However in this simple table, we do not capture the exact waveform at the input or output of the cell. It's a fixed o/p transition slew rate. This starts adding inaccuracies in delays when compared to spice models. Using a more complex CCS model allows us to capture the waveform more accurately, which is needed for tech < 22nm to get timing results within 2%-5% of spice results. The receiver model NLDM uses is a single cap value for a given timing path. However, cap values may be different based on rise/fall or min/max conditions.
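
Since NLDM keeps only 2 numbers per table entry, the o/p waveform can only be rebuilt as a straight ramp. A minimal Python sketch of that reconstruction, assuming the 20%-80% slew thresholds mentioned above (the actual thresholds are set in the library) and using the first cell_rise/rise_transition entries of the NA210 example; function names are made up.

```python
def ramp_from_nldm(t_in_50, delay, slew_20_80):
    """Return (t_start, t_end) of the linear 0%->100% o/p ramp."""
    t_out_50 = t_in_50 + delay        # delay is measured 50% i/p to 50% o/p
    full_ramp = slew_20_80 / 0.6      # 20%->80% covers 60% of the full swing
    return t_out_50 - full_ramp / 2, t_out_50 + full_ramp / 2

def ramp_voltage(t, t_start, t_end, vdd=1.0):
    """Voltage of a rising linear ramp at time t."""
    if t <= t_start:
        return 0.0
    if t >= t_end:
        return vdd
    return vdd * (t - t_start) / (t_end - t_start)

# i/p crosses 50% at t=0; delay=0.2670ns and slew=0.1665ns from the tables above
t_start, t_end = ramp_from_nldm(0.0, 0.2670, 0.1665)
v_at_50 = ramp_voltage(0.2670, t_start, t_end)
```

Any curvature of the real waveform (slow tail, miller kink) is lost in this ramp, which is the inaccuracy that CCS addresses.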

2. CCS: (composite current source model)

CCS model was developed to reduce inaccuracies at 20nm and lower tech. It models the driver as a time varying current source rather than as a voltage ramp. It can handle highly resistive nets (driven by fast drivers), which is a problem for NLDM. CCS receiver model uses 2 cap values for each timing arc. It uses cap C1 for the receiver voltage going up to the midpoint of VDD, and then uses C2 for going from the midpoint of VDD to the end. This models miller cap more accurately. For receiver cap, we specify 2D tables for both rise/fall at i/p of receiver. 2D tables are for 4 parameters: receiver_capacitance1_rise, receiver_capacitance1_fall, receiver_capacitance2_rise, receiver_capacitance2_fall. We see at 7nm and below that C1 and C2 themselves differ by upto 20%, and they also vary by as much as 50% across different i/p slew rates and o/p loads. So, that signifies the importance of having these receiver models in CCS across diff slew rates and loads.

Representing Composite Current Source (CCS) Driver Information: In the Liberty syntax, using CCS model, you can represent nonlinear delay information at the pin level by specifying a current lookup table at the timing group level that is dependent upon input slew and output load. CCS describes each CCS driver switching current waveform by adaptively sampling data points. So basically we take the 2D lookup table from NLDM, and instead of specifying single transition time for each i/p slew and o/p load, we provide current value at different points in time.
To define your lookup tables, use the following groups and attributes:
1. output_current_template group in the library group level
2. output_current_rise and output_current_fall groups in the timing group level
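
To see why current samples carry more information than a single transition time, the o/p voltage can be rebuilt by integrating the current into the load cap. A minimal Python sketch with made-up sample values (times in ns, current in mA, cap in pf, so charge is in pC and voltage in V); a lumped cap load is assumed and the function names are hypothetical.

```python
def voltage_waveform(times, currents, cap):
    """Trapezoidal integration of i(t): V(t) = (1/C) * integral of i dt."""
    volts = [0.0]
    for k in range(1, len(times)):
        dq = 0.5 * (currents[k] + currents[k - 1]) * (times[k] - times[k - 1])
        volts.append(volts[-1] + dq / cap)
    return volts

def crossing_time(times, volts, level):
    """Time at which a rising waveform crosses 'level' (linear between samples)."""
    for k in range(1, len(times)):
        if volts[k - 1] < level <= volts[k]:
            frac = (level - volts[k - 1]) / (volts[k] - volts[k - 1])
            return times[k - 1] + frac * (times[k] - times[k - 1])
    return None

# hypothetical o/p current samples for one (i/p slew, o/p load) point
T_NS = [0.0, 0.1, 0.2, 0.3, 0.4]
I_MA = [0.0, 0.2, 0.2, 0.0, 0.0]
volts = voltage_waveform(T_NS, I_MA, cap=0.04)
t50 = crossing_time(T_NS, volts, 0.5)
```

The delay and slew of NLDM are then just 2 points read off this waveform, while the full shape stays available for resistive nets and noise analysis.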

Example of cell:

cell (AOI21_LVT) {

pin(A) { //group for i/p pin. similarly for all other i/p pins

direction: input; // many other attributes defined

receiver_capacitance () { ... } => tables for different index, and for different cond (when: "!A1&A2")

internal_power () { ... } => tables

}

pin(Z) { //group for o/p pin

direction: output; // many other attributes defined as function, etc

internal_power () { ... } => tables for each related i/p pin for diff condition

timing () { //for each related_pin, there may be more timing groups for each condition

related_pin: "A"; //similarly for related_pin B, etc

when: "!A&B"; //similarly for diff condition

cell_rise (delay_8x8) { ... } // similarly for cell_fall, rise_transition, fall_transition

ocv_sigma_cell_rise (delay_8x8) { sigma_type: early; ... } //EARLY: similarly for ocv cell_fall, rise_transition, fall_transition

ocv_sigma_cell_rise (delay_8x8) { sigma_type: late; ... } //LATE:

ccsn_first_stage () {

stage_type: both; //many more attr as "when", etc

dc_current (ccsn_dc_template) { ... }

output_voltage_fall () { vector (template1) { ... } vector (template1) { ... } ...} //similarly for o/p voltage rise

propagated_noise_high () { vector (template1) { ... } vector (template1) { ... } ...} //similarly for noise_low

}

receiver_capacitance1_rise () { ... } //similarly for cap1_fall, cap2_rise, cap2_fall
output_current_fall () { vector (template1) { ... } vector (template1) { ... } ...} //similarly for o/p current rise. These tables are big as they have current values for lot of time samples for each i/p slew and o/p load

} //end of timing group

timing () {

related_pin: "B";

Example of lib:
library (new_lib) {
...
output_current_template (CCT) { //template for CCS => o/p current waveform wrt 3 var below
variable_1: input_net_transition;
variable_2: total_output_net_capacitance;
variable_3: time;
}

lu_table_template (ccsn_prop_template) { //template for noise
    variable_1 : input_noise_height;
    variable_2 : input_noise_width;
    variable_3 : total_output_net_capacitance;
    variable_4 : time;
}

dynamic_current () { => this models dynamic current at power pins (VDD/VSS) of a gate (here inverter) with both rise/fall at i/p. This can be used to calculate dynamic peak IR more accurately. In absence of this, we use "fixed current" at power pins throughout the switching, which is not so accurate.

    related_inputs : "I";
    related_outputs : "Z";
    switching_group () {
      input_switching_condition (fall);
      output_switching_condition (rise); //o/p is rising, so current waveform is primarily thru VDD as it charges cap, however some short circuit current also flows thru VSS
      pg_current (VDD) {
        vector (ccsp_template2) {
          reference_time : 0.00138;
          index_1 ("0.0023"); //slew rate at i/p of gate
          index_2 ("0.00023"); //load on o/p of gate
          index_3 ("0, 0.0005314, 0.001445, 0.002875, 0.00306608, 0.00314519, 0.00327175, 0.00976063, 0.0144103, 0.0188013, 0.0204694, 0.0249671, 0.0381518, 1.44866"); // these are time delay from reference point of 0.00138 units
          values ( \
            "8.7941e-07, 0.0714941, 0.0554155, 0.109314, 0.0744151, 0.0740751, 0.0750564, 0.0518026, 0.0122771, 0.00195741, 0.000952697, 0.000121253, 1.55532e-07, 1.73258e-06" \ //as can be seen, current is almost 0 at start and end, but goes thru a peak in between. +ve values imply current is getting pulled out of VDD.
          );
        }

vector (ccsp_template2) { // we repeat above table multiple times for different slew rates and loads. They may end up with different reference times depending on delay thru cell
}

       pg_current (VSS) { //similarly for VSS pin. NOTE that for VSS, current values are -ve (implying current is pushed into VSS), and they are of much smaller magnitude than VDD current, as it's only small amount of short circuit current
        vector (ccsp_template2) {
          reference_time : 0.00138;
          index_1 ("0.0023");
          index_2 ("0.00023");
          index_3 ("0, 0.000670518, 0.00175046, 0.002875, 0.00300388, 0.00345785, 0.00367968, 0.00392382, 0.00409207, 0.00437108, 0.00472153, 0.00507438, 0.00524245, 0.00675625, 0.00699046, 0.00995021, 0.0131331, 0.0144103, 0.015075, 0.0167913, 0.0188013, 0.0204694, 0.0225021, 0.0249671, 0.0467915, 1.44078, 1.44866");
          values ( \
            "-8.82073e-07, 0.0923592, 0.0406992, 0.0189937, -0.00807851, -0.0116778, -0.0111549, -0.012948, -0.0129895, -0.0129045, -0.0145467, -0.0128898, -0.0144827, -0.0128716, -0.0132787, -0.00995613, -0.00380439, -0.00217233, -0.00167868, -0.000805295, -0.000346744, -0.000158811, -7.18249e-05, -1.5272e-05, 5.02506e-06, -8.37263e-06, 5.12667e-06" \
          );
        }

   switching_group () { //repeat above group for other dirn, i.e rise at i/p
      input_switching_condition (rise);
      output_switching_condition (fall);
      pg_current (VDD) { ... } //NOTE: current values for VDD are -ve here, while VSS are -ve too (implying current is pushed into both VDD and VSS here, maybe because of ripple at o/p which causes o/p voltage to be higher than VDD)
      pg_current (VSS) { .. } //similarly for VSS. VSS current lot higher than VDD current as only small amount of short circuit current flows thru VDD

}

}//end of dynamic current section
...

pin(Z) { ...
timing() { //For CCS, timing section has extra CCS LUT

cell_rise (delay_tem...) { .... } //regular NLDM LUT is also present here, so that NLDM will be used if specified in the tool

   ccsn_first_stage () { //This specs CCS for first stage of gate (channel connected block or CCB) if gate has multiple stages inside it. For ex, AND gate has nand followed by inverter. So, we repeat this section for last_stage too.
        is_inverting : true;
        is_needed : true;

        when: "A&!SE|SD"; //all CCS values below can be defined condition based
        miller_cap_fall : 0.000207711;
        miller_cap_rise : 0.000205185;
        stage_type : both;
        dc_current (ccsn_dc_template) { //2D dc current table which lists the DC current measured at CCB o/p node, with indexes specifying i/p node and o/p node voltage
          index_1 ("-0.95, -0.475, -0.19, -0.095, 0, 0.0475, 0.095, 0.1425, 0.19, 0.2375, 0.285, 0.3325, 0.38, 0.4275, 0.475, 0.5225, 0.57, 0.6175, 0.665, 0.7125, 0.76, 0.8075, 0.855, 0.9025, 0.95, 1.045, 1.14, 1.425, 1.9"); //i/p voltage
          index_2 ("-0.95, -0.475, -0.19, -0.095, 0, 0.0475, 0.095, 0.1425, 0.19, 0.2375, 0.285, 0.3325, 0.38, 0.4275, 0.475, 0.5225, 0.57, 0.6175, 0.665, 0.7125, 0.76, 0.8075, 0.855, 0.9025, 0.95, 1.045, 1.14, 1.425, 1.9"); //o/p voltage
          values ( "0.436551, 0.363591, 0.349409, 0.343731, 0.337281, 0.333664, ... ", ) //and so on ..

}

output_voltage_rise() { //voltage waveforms are not important in CCS, as currents are used to come up with delay and slew at o/p (I=Cdv/dt, So, deltaV can be calculated from i(t) and C). So, we see very few vectors for voltage waveform, but a lot for current waveform

          vector (ccsn_vout_template) {
            index_1 ("0.02306"); => i/p tran
            index_2 ("0.0018245"); => o/p cap
            index_3 ("0.0215692, 0.0266808, 0.03194, 0.0380183, 0.0470364"); => time
            values ( \
              "0.095, 0.28, 0.475, 0.66, 0.82" \ => provides sample points of o/p voltage. voltage is 0.095V at about 21.6ps, then 0.28V at about 26.7ps, and so on ..
            );
          }

vector (ccsn_vout_template) { ... } //this is repeated for diff slew rates and load
}

output_voltage_fall() { ... }

output_current_rise() { //most important section for CCS. It provides detailed current waveform at each characterized i/p slew and o/p load point. So, for 7x8 NLDM LUT, there would be about 56 (7*8) vectors here. So, this section is usually long
   vector(CCT) {
    reference_time : 0.05; =>
    index_1(0.1); => i/p tran
    index_2(2.1); => o/p cap
    index_3("1.0, 1.5, 2.0, 2.5, 3.0"); => time
    values("0.0003, 0.007, 0.022, 0.027, 0.028" ); => current values of the driver model for current rising at o/p. NOTE: current is not in shape of bell curve here, not sure why, maybe the rise time is very sharp, so not captured here
    }

vector(next1) { .. } //for other slew rates and load

}
}
}

output_current_fall() { ... }

       propagated_noise_high () { // This is to be able to run noise runs. It propagates noise thru the cell, and shows how o/p waveform looks for different i/p waveform
          vector (ccsn_prop_template) { //similarly for other vectors
            index_1 ("0.595548"); => i/p noise height
            index_2 ("0.283096"); => i/p noise width
            index_3 ("0.0018245"); => o/p cap
            index_4 ("0.141002, 0.154822, 0.183612, 0.20954, 0.226069"); => time
            values ( \
              "0.810785, 0.727257, 0.671571, 0.727257, 0.810785" \ => waveform of o/p noise sampled at various times. At t=0.14, V=0.8V(which is =VDD), then it dips a little, then goes back to VDD. For noise_low, it will be bump from VSS, back to VSS
            );
          }
    propagated_noise_low () { ... } //for low noise

receiver_capacitance1_rise (delay_template_7x7_0) { //NOTE: these cap values are for o/p pins, not sure why we need for o/p pins, when we have it for i/p pins
        index_1 ("0.00205853, 0.00859214, 0.0216594, 0.0477043, 0.0998837, 0.204153, 0.412781");
        index_2 ("0.00023, 0.00081, 0.00196, 0.00426, 0.00887, 0.01807, 0.03649");
        values ( \
          "0.000400186, 0.000424999, 0.000440668, 0.000447652, 0.000451636, 0.000453669, 0.000454712", \
          ....
          "0.000527481, 0.000507608, 0.000493188, 0.000483650, 0.000477769, 0.000473495, 0.000472111" \
        );
      }
receiver_capacitance2_rise (delay_template_7x7_0) { .. }

receiver_capacitance1_fall (delay_template_7x7_0) { .. }

receiver_capacitance2_fall (delay_template_7x7_0) { .. }

} //end of ccsn_first stage

ccsn_last_stage () { .... } //repeat whole section above for last stage if more than 1 stage present in stdcell. NOTE: last stage is the important one for any stdcell, as we care about what comes at the o/p of cell, and not much about what happens on internal nodes. Usually, if stdcell has only 1 stage, we only have values for ccsn_first_stage (which is actually the last stage). If stdcell has multiple stages, then ccsn_first_stage is very small (has only voltage and noise waveforms, no other groups)

internal_power () {

      related_pin : "I";
      related_pg_pin : VDD;
      rise_power (power_template_7x7_0) { .. } //tables for both rise and fall power. Only shown for VDD pin, as power is delivered via VDD only
      fall_power (power_template_7x7_0) { .. }
    ...
   }
}
}
}

NOTE: there may be too many such arcs to rep current adequately at each slew rate and load. So, we also have compact CCS rep in .lib, so that .lib file doesn't grow tremendously.
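The I = C.dV/dt relation mentioned in the output_voltage_rise comment above can be sketched numerically. Below is a minimal Python sketch (not what any timing tool actually does internally): it takes the sample output_current_rise vector shown above, integrates the current over time with the trapezoidal rule, and divides by the load cap to get the change in o/p voltage. Units are whatever the library declares, and the function name integrate_voltage is made up for illustration.

```python
# Sketch: recover the o/p voltage change from a CCS current vector using
# I = C*dV/dt  =>  deltaV(t) = (1/C) * integral of i(t) dt.
# Numbers are copied from the sample output_current_rise vector above.

def integrate_voltage(times, currents, cap):
    """Return cumulative voltage change at each time point (trapezoidal rule)."""
    volts = [0.0]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        charge = 0.5 * (currents[k] + currents[k - 1]) * dt  # charge delivered in this step
        volts.append(volts[-1] + charge / cap)
    return volts

times = [1.0, 1.5, 2.0, 2.5, 3.0]                # index_3 (time)
currents = [0.0003, 0.007, 0.022, 0.027, 0.028]  # values (current)
load_cap = 2.1                                   # index_2 (o/p cap)

delta_v = integrate_voltage(times, currents, load_cap)
for t, v in zip(times, delta_v):
    print(f"t={t:.1f}  deltaV={v:.5f}")
```

Since only 5 time samples are given in this vector, the result is coarse. A real vector has many more samples.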

Variations in process parameters: To account for this, new extensions added to liberty

Liberty Variation Format (LVF): This is an extension to the lib format. It is used to specify variation parameters which are needed for OCV timing analysis. Many new groups are defined for LVF. We can use these groups in regular .lib files, as long as the tools support reading these LVF groups.

timing () {

cell_rise (delay_temp_8x8) { //regular cell delay for rise

       index_1 ("0.0019, 0.0058, 0.0137, 0.0295, 0.061, 0.1241, 0.2502, 0.5025");
        index_2 ("0.00016, 0.00088, 0.00232, 0.00519, 0.01093, 0.02241, 0.04538, 0.09131");
        values ( \
          "0.00814129, 0.0102127, 0.0141374, 0.0218088, 0.0370626, 0.067524, 0.128439, 0.250268", \
          ...
          "0.17358, 0.190633, 0.214257, 0.245593, 0.287164, 0.341309, 0.429989, 0.585219" \
        );
      }

ocv_sigma_cell_rise (delay_temp_8x8) { //sigma values for cell delay rise. Each value specifies 1 sigma delta from nominal delay value above. Used in POCV analysis. Here sigma value is different for different slew/load.

sigma_type : early;
        index_1 ("0.0019, 0.0058, 0.0137, 0.0295, 0.061, 0.1241, 0.2502, 0.5025");
        index_2 ("0.00016, 0.00088, 0.00232, 0.00519, 0.01093, 0.02241, 0.04538, 0.09131");
        values ( \
          "0.000311555, 0.000401142, 0.000580422, 0.000937877, 0.00165292, 0.00308312, 0.00594014, 0.011653", \ => Here, 0.0003 is the 1 sigma offset from mean of 0.0081 specified above for given load/slew rate. So, offset is about 4% of the mean, which can be significant when added across multiple gates. Also, note that sigma offset as a % of mean delay is diff for diff load/slew rate, so having single sigma offset value would have given inaccuracies.
     ....
          "0.0101789, 0.0101964, 0.0102317, 0.0103037, 0.0104537, 0.0107767, 0.0143185, 0.021458" \
        );
      }
      ocv_sigma_cell_rise (delay_template_8x8) {
        sigma_type : late;
        index_1 ("0.0019, 0.0058, 0.0137, 0.0295, 0.061, 0.1241, 0.2502, 0.5025");
        index_2 ("0.00016, 0.00088, 0.00232, 0.00519, 0.01093, 0.02241, 0.04538, 0.09131");
        values ( \
          "0.000391953, 0.000508076, 0.000740406, 0.00120356, 0.00212998, 0.00398288, 0.00765485, 0.0149973", \
        ...
          "0.0104575, 0.010507, 0.0106064, 0.0108063, 0.0112123, 0.0120466, 0.0167537, 0.026188" \
        );
      }

... }
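To see how the sigma tables relate to the nominal delay table, here is a small Python sketch using only the first row of the cell_rise, early sigma and late sigma tables above. It computes sigma as a % of nominal delay for each load point, and then early/late delays at an assumed 3 sigma. The 3 sigma multiplier and the simple "nominal +/- N*sigma" formula are assumptions for illustration. The actual multiplier is a tool setting, not something stored in the .lib.

```python
# Sketch: 1-sigma offset as a % of nominal delay, using the first row of the
# cell_rise and ocv_sigma_cell_rise tables above (same slew, 8 load points).

nominal = [0.00814129, 0.0102127, 0.0141374, 0.0218088,
           0.0370626, 0.067524, 0.128439, 0.250268]
sigma_early = [0.000311555, 0.000401142, 0.000580422, 0.000937877,
               0.00165292, 0.00308312, 0.00594014, 0.011653]
sigma_late = [0.000391953, 0.000508076, 0.000740406, 0.00120356,
              0.00212998, 0.00398288, 0.00765485, 0.0149973]

def percent_of_mean(sigmas, means):
    """Sigma expressed as a percentage of the nominal (mean) delay."""
    return [100.0 * s / m for s, m in zip(sigmas, means)]

early_pct = percent_of_mean(sigma_early, nominal)
late_pct = percent_of_mean(sigma_late, nominal)

N_SIGMA = 3  # assumed multiplier, normally chosen in the STA tool
early_delay = [m - N_SIGMA * s for m, s in zip(nominal, sigma_early)]
late_delay = [m + N_SIGMA * s for m, s in zip(nominal, sigma_late)]

for m, e, l in zip(nominal, early_pct, late_pct):
    print(f"nominal={m:.5f}  early sigma={e:.2f}%  late sigma={l:.2f}%")
```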

Details: Published: Friday, 21 September 2018 18:10; Hits: 1097

gnuplot: It is an open source software to plot graphs of functions and data. It's one of the oldest and easiest plotting programs to use. Although it has gnu in its name, it's not associated with the GNU project. It doesn't use the GNU license; it has its own license. The source code is copyrighted, but freely distributed.

Official website is: http://gnuplot.info/

Very good documentation is available here in this pdf file, which is also distributed with the software: http://gnuplot.info/docs_5.0/gnuplot.pdf

This is a very large doc of 250 pages. Probably 100 or so pages are relevant. A concise tutorial is here: http://people.duke.edu/~hpgavin/gnuplot.html

Another one here: http://hirophysics.com/gnuplot/gnuplot.html

gnuplot can plot in 2D and 3D, and can save plots in many popular formats such as jpeg, png, pdf, etc.

Installation:

Install gnuplot on CentOS by typing below cmd on terminal:

sudo yum install gnuplot => Once done, gnuplot executable should reside in /usr/bin/gnuplot

To see version of gnuplot, type on cmd terminal:

gnuplot --version => On my system, it shows "gnuplot 4.6 patchlevel 2". This is gnuplot 4, while doc above is for gnuplot 5. So, there may be some differences.

Alternatives to Gnuplot:

I've always used gnuplot in past, just because I found it installed on my Linux systems at work. It's very easy to use, and easy to install. Works flawlessly.

However, now there are many more plotting programs available. Within python, you have multiple libraries that you can import to do 2D/3D plots. Some of the popular ones are:

1. pygnuplot: Within python, there is python gnuplot (called pygnuplot). We can use this to drive gnuplot from python, and then use all the capabilities of gnuplot inside of python.

Official website: https://pypi.org/project/py-gnuplot/

2. matplotlib: matplotlib is another library in python, that can draw complex 2D/3D plots. Its plotting interface is modeled on MATLAB, and most of the cmds are similar. See in python matplotlib section.

syntax:

- gnuplot is case sensitive.

- All command names may be abbreviated as long as the abbreviation is not ambiguous.

- Any number of commands may appear on a line, separated by semicolons(;).

- Strings may be set off by either single or double quotes, although there are some subtle differences.

- # anywhere in the line indicates comment, and rest of the line is ignored

- Commands may extend over several input lines by ending each line but the last with a backslash "\".

commands:

Type gnuplot on terminal to bring up gnuplot shell, where you can type gnuplot cmds. It shows up as "gnuplot>" on the screen. This is an interactive session. If you want gnuplot to run on a file in which you already have all gnuplot cmds, you can use a "batch" session by typing "gnuplot input1.txt" where input1.txt has all gnuplot cmds.

There are a lot of cmds that are gnuplot specific that you can type to plot graphs, set var, make loops (for), conditional stmt (if-else), etc. Many of linux cmds (such as cd, history, clear, etc) are also supported in gnuplot. Part III of gnuplot doc (page 66 onwards) details these cmds. Following are the imp cmds you can use for basic working:

1. various output formats:

gnuplot supports many different graphics devices. Use set terminal to tell gnuplot what kind of output to generate. Gnuplot supports a large number of output formats (jpeg, gif, pdf, X11, windows, etc). These are selected by choosing the appropriate terminal type, possibly with additional modifying options. See page 193 of gnuplot pdf doc above.

The gui box where all plots are drawn is called the canvas. Plot is drawn on this canvas, and can be chosen to be of any size.

set term <terminal_type> size <XX>, <YY> => This specifies the o/p type and size of canvas

set size <XX>,<YY> => scales the plot itself relative to the size of the canvas. By default, plot will fill the entire canvas. Scale values less than 1 will cause the plot to not fill the entire canvas.

Typing "set term" shows all available terminals for particular installed version of gnuplot. "show term" shows current terminal. By default, it shows x11 terminal. x11 is X11 windowing system for bitmap displays. More info in "X window system" section. So, the canvas that you see is in same X windows format as your desktop gui.

ex: set terminal jpeg => This generates output in jpeg format. PNG, JPEG and GIF images are created using the external library libgd. In most cases, PNG is to be preferred for single plots, and GIF for animations. Both are loss-less image formats, and produce better image quality than the lossy JPEG format.

NOTE: once we set terminal to something other than x11 (lower case x), then the X11 window no longer shows or updates, since the terminal is set to "jpeg" or whatever we set it to. In order to see plots interactively, we have to set term back to x11 by running cmd "set term x11".

Use set output to redirect that output to a file or device. If we don't do this, the o/p is still generated in that format, but by default it is written to the terminal where gnuplot is running, where it shows up as garbage characters. Only x11 o/p can be displayed directly by gnuplot. All other formats need to be redirected to a file, which we can then open with an image viewer for that format.

ex: Following 3 cmds typed on gnuplot sets the o/p type to "png" and directs the o/p to file "test.png". Now the 1st plot that we do following the "set output" cmd will be saved to "test.png". For any subsequent plots to be saved, we have to again do "set output" cmd with appr file name.

gnuplot > set terminal png size 600,400

gnuplot > set size 0.5,0.5

gnuplot > set output "test.png"

gnuplot > plot 3*x+2 #only this plot will be saved to test.png.

NOTE: by default, we don't need to use any of the above cmds in interactive mode, since we want to see the plots interactively. Only once we are done with interactive plotting, we can go ahead and save those plots by using cmds above (may be in batch session)
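For the batch session mentioned earlier, the cmds above can be put in a file and passed to gnuplot. Below is a minimal Python sketch that writes such a batch file and runs it only if a gnuplot executable is found on the system. The helper name make_script and the use of a temporary directory are choices made for this illustration, not anything gnuplot requires.

```python
# Sketch: build a gnuplot batch file from Python and run it if gnuplot is installed.
import shutil
import subprocess
import tempfile
from pathlib import Path

def make_script(png_name, expr, width=600, height=400):
    """Return the text of a gnuplot batch file that saves one plot as PNG."""
    lines = [
        f"set terminal png size {width},{height}",
        f'set output "{png_name}"',
        f"plot {expr}",
        "set output",  # with no file name, this closes the current o/p file
    ]
    return "\n".join(lines) + "\n"

work_dir = Path(tempfile.mkdtemp())      # scratch directory for this example
script_path = work_dir / "input1.txt"
script_text = make_script("test.png", "3*x+2")
script_path.write_text(script_text)

gnuplot_exe = shutil.which("gnuplot")
if gnuplot_exe:
    # Same as typing "gnuplot input1.txt" inside work_dir
    subprocess.run([gnuplot_exe, script_path.name], cwd=work_dir, check=False)
else:
    print("gnuplot not found; batch file written to", script_path)
```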

3. set => to assign values to variables which are predefined. We need to set few settings, before we plot 2D or 3D plots.

title: ex: gnuplot> set title "Some Math Functions" => This sets title on top of graph

output: ex: set output "my_fig.png" => This saves the plot with given name

gridlines: This is to set gridlines on the plot, so that it's easy to see the points

set ytics 10 # y scale marks and values will be at every 10th unit, So, on y axis, we'll see 10, 20, 30 and so on, and there will be a horizontal grid line at each of these

set xtics 2 # x scale marks and values will be at every 2nd unit, So, on x axis, we'll see 0, 2, 4 and so on, and there will be a vertical grid line at each of these

Now, we run "set grid" and above xtics and ytics will go into effect.

However once we have defined xtics and ytics, we can use below lines to further customize grid lines.

set grid ytics lt 1 lw 2 lc rgb "#bbbbbb"
set grid xtics lt 1 lw 2 lc rgb "#bbbbbb"

Where:
lt means line type (0 for dashed line, 1 for solid line)
lw means line width
lc means line color

These settings can be in any order. So, for example, the following settings are also valid:
set grid ytics lc rgb "#bbbbbb" lw 1 lt 0
set grid xtics lc rgb "#bbbbbb" lw 1 lt 0

ex: following cmds customize plot and then we plot the graph
set title "Some Math Functions"
set xrange [-10:10] => sets range of x on X axis
set yrange [-2:2] => sets range of y on Y axis

set xtics 2; set ytics 10; set grid => If we have already plotted graphs before w/o the grid, we can just type replot. then all current plots will be replotted with grid.

2. plot/splot => plot/splot is the most used cmd. There are 4 plotting cmds:

plot: to plot anything in 2D (most imp cmd)
- ex: gnuplot> plot (sin(2*x)/sin(x)) => plots graph of function given vs "x". x is std variable. If we use any var other than x, then it errors out.
splot: to plot 3D. splot means "surface plot" which shows the surface of plot.
- ex: gnuplot> splot sin(x)*cos(y) => plots graph of func in z dir with x and y as var on the 2 axis. Here, "x" and "y" are std var. Using any var other than x,y errors out.
replot, refresh: These replot previous plot/splot cmds.

2D plots using plot: There are various settings that can be used to control how to show our 2D plot.

set samples 10000 => this is used to increase resolution by inc # of samples (default is 100). This is needed to make 2D plots smooth, but it does take longer to plot.
plot log10(abs(sin(10*x)/sin(x))) => plots bode plot (log(x) plots log with base e). Note: No parenthesis around plot cmd, as it's optional

Multiplot graphs on same plot by specifying more than one plot:

gnuplot > plot 3*x+5, x**2+8, 1/x + 8*x => parenthesis are optional, multiple graphs separated by comma

We can also define a function, so that we can plot it easily w/o rewriting it everytime. We can specify x range of plot, and y range is chosen automatically to fit the graph.

gnuplot> f(x) = exp(-x**2 / 2) => we can also define func with more var, i.e, f(x,a,b)=1/(1+a*exp(-(b*x))), then plot f(x,1,2)

gnuplot> plot [t=-4:4] f(t), t**2 / 16 => NOTE: we changed the var from x to t. We are plotting 2 graphs here with t ranging from -4 to +4

3D plots using splot: There are various settings that can be used to control how to show our 3D plot. Below settings are recommended to be set before using splot cmd, as they make 3D plots more meaningful.

gnuplot> set hidden3d #This causes the surface to be opaque. It's hard to look at 3D plots with transparent surface, so we make it opaque.
gnuplot> set pm3d #this makes 3D palette mapped, i.e different values on z-axis are shown with different colors, so it's easy to see how 3D plot values are changing with change in x,y.
gnuplot> set contour both #this options draws contour lines on base (i.e X,Y plane) and on surface too. contour lines are lines where z values are same across any x,y. So, another easy way to see how z values are changing

gnuplot> set isosamples 100 #this defines the sampling rate (similar to sampling rate in 2D using samples), i.e how often x,y are sampled. By default, they are sampled every 10 isolines per u,v axis, which produces very crude 3D surfaces. setting it to 100 makes these 3D graphs very smooth, though it will take longer to plot 3D graphs with higher setting of isosamples.

gnuplot> splot 1/(1+exp(-(x+y))) #Now, after setting above 4 settings, we can plot this 3D plot of sigmoid function.

gnuplot> f1(x,y,a,b)=1/(1+exp(-(a*x+b*y))) => This defines func f1 in var x,y,a,b.

gnuplot> splot 20*f1(x,y,1,100) + 4*f1(x,y,2,1) => Here we plot func sum of f1 with different constant a,b. Resulting plot is 3D plot in terms of x,y
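To sanity check what the surface above should look like before plotting it, the same function can be evaluated at a few points. This is a small Python sketch that mirrors the gnuplot definition of f1. The sample points are arbitrary and kept small, since this simple version of the sigmoid overflows in Python for large negative a*x+b*y.

```python
# Sketch: evaluate the sigmoid-based surface from the splot example at a few points.
import math

def f1(x, y, a, b):
    """Same as the gnuplot definition f1(x,y,a,b)=1/(1+exp(-(a*x+b*y)))."""
    return 1.0 / (1.0 + math.exp(-(a * x + b * y)))

def surface(x, y):
    """The expression passed to splot: 20*f1(x,y,1,100) + 4*f1(x,y,2,1)."""
    return 20 * f1(x, y, 1, 100) + 4 * f1(x, y, 2, 1)

for x, y in [(-1, -0.1), (0, 0), (1, 0.1)]:
    print(f"x={x:5.1f} y={y:5.1f} z={surface(x, y):.4f}")
```

Because b=100 in the first term, the surface changes very sharply along y. That is why a high isosamples setting helps for this plot.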

4. plot from user provided file: Both 2D/3D plots can be plotted via user provided file.

plot "abc.data" => plots data from abc.data, where abc.data is a single column file, with each row containing data corresponding to that line number (the line number becomes "x" on X axis).

However, if we have multiple cols, and we want to plot data for all cols, we specify col number. col 0 is a pseudo col that translates to line number (or sample number), col 1 is 1st col of your file and so on

plot "abc.data" using 2 => plots graph with line number on x axis and col values for 2nd col on y axis

If we want to plot multiple cols on same plot, we can repeat above cmd multiple times but that will create multiple plots, each being a plot for only 1 col. To resolve that, we can do this:

plot "abc.data" using 2, "" using 3, "" using 4 => this says plot the same file on same plot using col=2,3,4

We can also use a for loop to be efficient (usually for files having lot of cols)

plot for [col=1:4] "abc.data" using col => plots for all col 1 thru 4

To have lines instead of points, we can do

plot for [col=1:4] "abc.data" using col with lines

plot for [col=2:3] "retire.data" using 0:col with lines
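To try the multi column cmds above, we need a data file. Below is a minimal Python sketch that writes a small whitespace separated file with 4 columns and builds the matching "plot for" cmd as a string. The file name, the column contents and the helper names are made up for illustration.

```python
# Sketch: create a 4-column data file and the gnuplot cmd that plots cols 2 thru 4.
import math
import tempfile
from pathlib import Path

def write_table(path, n_rows=20):
    """Write rows of: t, sin(t), cos(t), t^2/100 separated by spaces."""
    with open(path, "w") as fh:
        fh.write("# t sin cos t2\n")  # lines starting with # are skipped by gnuplot
        for i in range(n_rows):
            t = i * 0.5
            fh.write(f"{t:.2f} {math.sin(t):.5f} {math.cos(t):.5f} {t * t / 100:.5f}\n")

def plot_command(file_name, first_col, last_col):
    """Build a 'plot for' cmd using the sample number (col 0) on the x axis."""
    return f'plot for [col={first_col}:{last_col}] "{file_name}" using 0:col with lines'

data_path = Path(tempfile.mkdtemp()) / "abc.data"
write_table(data_path)
cmd = plot_command(data_path.name, 2, 4)
print(cmd)
```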

Details: Published: Friday, 05 January 2018 18:47; Hits: 587

Digital refers to high (power supply) and low (ground) signals. high is indicated by 1 and low by 0. Any circuit that is designed to operate on these high/low signals as inputs and generate high/low signals as outputs is referred as digital logic.

Using transistors, it's possible to design digital circuits which will compute AND, OR, XOR and many other complicated digital functions. Before getting into logic details, let's look at some basic digital concepts.

boolean algebra: A and B are digital (0 or 1)

AND = A.B

OR = A + B

NOT = ~A

4 important logic reduction equations:

1. A + A.B = A => A.(1+B) = A

2. A + ~A.B = A+B => A+A.B+~A.B = A+(A+~A).B = A+B

3. (A+B).(A+C) = A+B.C => A.A+A.C+A.B+B.C = A.(1+C+B)+B.C = A+B.C

4. (A+B).(~A+C) = A.C + ~A.B => A.~A+B.~A+A.C+B.C = ~A.B+A.C+B.C.(A+~A) = ~A.B.(1+C)+A.C.(1+B) = A.C + ~A.B
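Since there are only 3 variables, these reduction equations can be checked by brute force over all 8 input combinations. Below is a minimal Python sketch that does this, writing NOT A as 1 - A for 0/1 values, and taking the fourth equation in the form (A+B).(~A+C) = A.C + ~A.B.

```python
# Sketch: exhaustively verify the 4 logic reduction equations over all inputs.
from itertools import product

# Each entry: (description, left-hand side, right-hand side) over bits A, B, C.
# & is AND, | is OR, and 1 - A is NOT A for values 0/1.
IDENTITIES = [
    ("A + A.B = A",
     lambda A, B, C: A | (A & B),
     lambda A, B, C: A),
    ("A + ~A.B = A + B",
     lambda A, B, C: A | ((1 - A) & B),
     lambda A, B, C: A | B),
    ("(A+B).(A+C) = A + B.C",
     lambda A, B, C: (A | B) & (A | C),
     lambda A, B, C: A | (B & C)),
    ("(A+B).(~A+C) = A.C + ~A.B",
     lambda A, B, C: (A | B) & ((1 - A) | C),
     lambda A, B, C: (A & C) | ((1 - A) & B)),
]

def holds(lhs, rhs):
    """True if both sides agree for every combination of A, B, C."""
    return all(lhs(*bits) == rhs(*bits) for bits in product((0, 1), repeat=3))

results = {name: holds(lhs, rhs) for name, lhs, rhs in IDENTITIES}
for name, ok in results.items():
    print(f"{name:30s} {'holds' if ok else 'FAILS'}")
```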

De Morgan's law: (can be extended to any number of variables, not limited to 2 variables)

1. ~(A+B+C+...X) = ~(A) . ~(B). ~(C). ... ~(X)

2. ~(A.B.C....X) = ~(A) + ~(B) + ~(C) + ... + ~(X)
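The claim that De Morgan's law extends to any number of variables can be checked the same way for small n. A minimal Python sketch, using max() as an n-input OR and min() as an n-input AND on 0/1 values:

```python
# Sketch: check both forms of De Morgan's law for n variables by brute force.
from itertools import product

def de_morgan_holds(n):
    """True if both De Morgan forms hold for all 2^n combinations of n bits."""
    for bits in product((0, 1), repeat=n):
        inverted = [1 - b for b in bits]
        nor_lhs = 1 - max(bits)    # ~(A+B+...+X)
        nor_rhs = min(inverted)    # ~A.~B. ... .~X
        nand_lhs = 1 - min(bits)   # ~(A.B. ... .X)
        nand_rhs = max(inverted)   # ~A+~B+...+~X
        if nor_lhs != nor_rhs or nand_lhs != nand_rhs:
            return False
    return True

checked = {n: de_morgan_holds(n) for n in range(2, 7)}
print(checked)
```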

Any logical eqn can be written as POS (product of sums) or SOP (sum of products). In canonical form, SOP is a sum of minterms while POS is a product of maxterms.

SOP (AND-OR): F = x1.x2 + x3.x4

POS (OR-AND): F = (x1+x2).(x3+x4)

Shannon's expansion theorem: useful to compare logical equations

F(x1,x2,...,xn) = ~x1.F(0,x2,...,xn) + x1.F(1,x2,...,xn) = (~x1+F(1,x2,...,xn)).(x1+F(0,x2,...,xn))
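Both forms of the expansion can be checked on a concrete function. The Python sketch below expands around the first variable, using a 3 input majority function as an arbitrary example (the function choice and helper names are just for illustration).

```python
# Sketch: check Shannon's expansion (SOP and POS forms) around x1.
from itertools import product

def shannon_sop(f, x1, rest):
    """~x1.F(0,...) + x1.F(1,...)"""
    return ((1 - x1) & f(0, *rest)) | (x1 & f(1, *rest))

def shannon_pos(f, x1, rest):
    """(~x1 + F(1,...)).(x1 + F(0,...))"""
    return ((1 - x1) | f(1, *rest)) & (x1 | f(0, *rest))

def majority(a, b, c):
    """Example function: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)

expansion_ok = all(
    majority(x1, x2, x3) == shannon_sop(majority, x1, (x2, x3)) == shannon_pos(majority, x1, (x2, x3))
    for x1, x2, x3 in product((0, 1), repeat=3)
)
print("Shannon expansion matches:", expansion_ok)
```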

Shannon's thm gives rise to canonical form, where there exists only one form for each eqn. This is useful for comparing various eqn.

Minterm canonical representation: (SOP form)

F(x1,x2,...,xn) = F(0,0,...0).~x1.~x2...~xn + (It has 2^n minterms)

F(0,0,...1).~x1.~x2...xn +

........ +

F(1,1,...1).x1.x2...xn

Maxterm canonical representation: (POS form)

F(x1,x2,...,xn) = [F(0,0,...0) + x1 + x2 + ... + xn]. (It has 2^n maxterms)

[F(0,0,...1) + x1 + x2 + ... + ~xn].

........

[F(1,1,...1) + ~x1 + ~x2 + ... + ~xn]
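The two canonical forms can be built directly from a truth table: the SOP form keeps one minterm for every row where F=1, and the POS form keeps one maxterm for every row where F=0 (rows where F=1 give a term that is always 1). Below is a minimal Python sketch of this, using 3 input XOR as an arbitrary example function. All helper names are made up for illustration.

```python
# Sketch: build minterm (SOP) and maxterm (POS) canonical forms from a truth table.
from itertools import product

def minterm_rows(f, n):
    """Input rows where F=1. Each row corresponds to one minterm."""
    return [bits for bits in product((0, 1), repeat=n) if f(*bits) == 1]

def maxterm_rows(f, n):
    """Input rows where F=0. Each row corresponds to one maxterm."""
    return [bits for bits in product((0, 1), repeat=n) if f(*bits) == 0]

def eval_sop(rows, bits):
    """A minterm is 1 only on its own row, so the OR of minterms is a row lookup."""
    return int(any(tuple(bits) == row for row in rows))

def eval_pos(rows, bits):
    """A maxterm is 0 only on its own row, so each one is 1 when any bit differs."""
    return int(all(any(b != r for b, r in zip(bits, row)) for row in rows))

def xor3(a, b, c):
    return a ^ b ^ c

ones = minterm_rows(xor3, 3)
zeros = maxterm_rows(xor3, 3)
forms_ok = all(
    xor3(*bits) == eval_sop(ones, bits) == eval_pos(zeros, bits)
    for bits in product((0, 1), repeat=3)
)
print("minterm rows:", ones)
print("maxterm rows:", zeros)
```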

Digital basic blocks:

NAND gate: Y = ~(A.B)

NOR gate: Y = ~(A+B)

NOT gate: Y = ~A

NAND, NOR and NOT are called fundamental gates, as any logic function can be built using these 3 kinds of gates (in fact, NAND gates alone or NOR gates alone are sufficient).

XOR gate: Y = A ⊕ B = A.~B + ~A.B

Adder: Adders are one of the fundamental logic blocks that are found in digital library along with other logic gates. Reason is because adders are used widely, so they are optimized and put in library

Half Adder (HA): adds 2 input bits and produces 2 outputs, Sum and Carry

S=A ⊕ B

C=A.B

Full Adder (FA): adds 3 input bits and produces 2 outputs, Sum and Carry

S=A ⊕ B ⊕ Cin

C=A.B + Cin.(A+B)
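The half adder and full adder equations can be checked against plain integer addition. The Python sketch below implements them with bitwise operators (^ for XOR, & for AND, | for OR) and chains full adders into a ripple carry adder. The ripple_add helper and its 4 bit default width are assumptions made for this example.

```python
# Sketch: half adder, full adder, and a ripple carry adder built from full adders.

def half_adder(a, b):
    """S = A xor B, C = A.B"""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """S = A xor B xor Cin, C = A.B + Cin.(A+B)"""
    s = a ^ b ^ cin
    c = (a & b) | (cin & (a | b))
    return s, c

def ripple_add(x, y, width=4):
    """Add two unsigned numbers bit by bit, carry out included in the result."""
    carry = 0
    total = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total | (carry << width)

print(full_adder(1, 1, 1))
print(ripple_add(5, 6))
```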

optimizing logic functions:

There are combinatorial and sequential logic in any design. When we talk about optimizing design, it means reducing the overall cost of design. Cost may be defined as area, speed, power, etc.

For any given logic function, firstly truth tables are determined which describe what the output of a logic function is given its inputs. These truth tables can be converted into equations with AND, OR etc implementing the functionality (e.g. Y = A.B.C.D + ~A.(C+E) + C.F). We can use logic reduction techniques (Karnaugh maps) to implement the function using minimal logic. However K-maps are manual tools which are not very efficient for large logic equations. There are automated tools called synthesis tools, which produce such reduced logic. These tools determine prime implicants (PI) of the logic function, and then using Quine's Prime Implicant Theorem, they select a subset of prime implicants that gives minimal cost. If the number of PI is very high, then heuristics are used to reduce run time. This may not give the lowest cost, but is pretty close to the lowest cost solution for that function. Synthesis tool "Espresso" uses PI technique to optimize logic.

Optimization of logic can be done by reducing eqn to 2 level logic either in SOP form (1st layer of AND gates followed by 2nd layer of OR gate) or in POS form (1st layer of OR gates followed by 2nd layer of AND gate). Although 2 level logic minimization is easy, it may not give optimal cost, so multilevel logic minimization is also done by tools.

ex: F = s.t.v+s.t.w.x+s.t.y.z+u.v+u.w.x+u.y.z => This is 2 level logic minimization. Can't be reduced any further in 2 levels. It uses 6 AND gates and 1 OR gate. However if factoring is done, we get F = (s.t+u).(v+w.x+y.z) => This is 3 level logic (AND at 1st level, OR at 2nd level and AND at 3rd level). It uses 4 AND gates and 2 OR gates. So, it uses 1 less gate, and fanin of gates is also lower, resulting in smaller area, delay and power. However, we would not have arrived at this solution if we confined ourselves to 2 level logic. This multilevel optimization is done via factoring (where prime divisors are found and boolean division is done)
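That the factored form computes the same function can be checked by brute force over all 256 input combinations. The Python sketch below takes the 2 level expression as F = s.t.v + s.t.w.x + s.t.y.z + u.v + u.w.x + u.y.z and the factored form as F = (s.t+u).(v+w.x+y.z), which is what multiplying out the factored form gives.

```python
# Sketch: check that the factored (3 level) form equals the 2 level SOP form.
from itertools import product

def two_level(s, t, u, v, w, x, y, z):
    """6 AND terms feeding 1 OR gate."""
    return ((s & t & v) | (s & t & w & x) | (s & t & y & z)
            | (u & v) | (u & w & x) | (u & y & z))

def factored(s, t, u, v, w, x, y, z):
    """(s.t+u).(v+w.x+y.z): 4 AND gates and 2 OR gates."""
    return ((s & t) | u) & (v | (w & x) | (y & z))

equivalent = all(
    two_level(*bits) == factored(*bits)
    for bits in product((0, 1), repeat=8)
)
print("factored form equivalent:", equivalent)
```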

After optimizing combinatorial logic, we need to optimize sequential logic (flops, latches) too. However, there isn't much to optimize there as the number of flops is fixed by design requirement. The only place where flops can be optimized is in finite state machine (FSM). Synthesis tools optimize this by choosing lowest number of flops and gates to implement FSM.

Details: Published: Thursday, 28 December 2017 20:20; Hits: 1490

Verilog

Verilog is the most popular HDL. Unlike software programming languages, where new ones keep cropping up, an HDL can't change that easily as it has to be supported by all the CAD tools. So, you should learn Verilog if you are ever going to work in VLSI/Hardware. It's a very simple language, and easy to follow (very much a C style language). We have a separate section for SystemVerilog, which is an extension of Verilog geared mainly towards testing rather than design.

Syntax:

Verilog is case sensitive.(different than vhdl which is case insensitive). All verilog keywords are lowercase. wire is verilog keyword. Wire and WIRE are totally different names (not keyword since their case is different than verilog keyword "wire" which is all small case).
Verilog file is composed of stmt, where each stmt starts with a keyword.
Verilog file syntax is: comment | module ... endmodule | primitive ... endprimitive | compiler_directive
- comment: // => for 1 line, /* ... */ => for multiline (same as C)
- module: module ... endmodule
- primitive: primitive ... endprimitive
- Compiler directives: can be put anywhere in verilog file and are handled by preprocessor part of compiler. All compiler directives are preceded by ` (back quote). Their scope is until the point where next compiler directive changes previous directive. Below are few common Compiler directives:
  - `include "file1.v"
  - `define WORDSIZE 64
  - `timescale 1ns/1ps => 1 ns is the time unit used for delays and reporting, while 1ps is the precision used internally for resolution.
  - `ifdef `else `endif
  - `default_nettype trireg => default is wire
  - `resetall => resets all compiler directives back to original default values

So, in short, everything in verilog file has to be within "module ... endmodule" keywords (with exception of comments and compiler directives). There is one more exception "primitive ... endprimitive" keyword, which is similar to "module ... endmodule", but it describes logic behaviour in terms of truth table, instead of in terms of logic operations. This is useful in modeling certain logic. Since primitive is only used for modeling latches/flops, etc, and not used in real design, we will not worry about it.

Keywords:

Every language has reserved keywords that are used by compiler to make sense of the program. Look at pg 17 of "verilog book" by vivek sagdeo.

gates: (total 26 primitives provided from which larger structural models may be built). Mostly used in models of gate stdcells. UDP are used to model more complex sequential elements. These gates may take i/p as 0,1,x,z and give o/p as 0,1,x. Usually effect of z and x on i/p are the same, as they give same o/p.
- A. logic gates: and/nand, or/nor, xor/xnor. when instantiating, 1st parameter is o/p, rest all are i/p. delays may be specified for propagation time, while strength may be specified on outputs. o/p can't be z (only 0,1,x).
  NOTE: there's no mux defined, so we define mux using udp (primitive). muludp (Q,S,A,B) used in model of MUX2, etc.
  ex: nand #1 g_mynand (myout, myin1, ~myin2); => typically used in gate models of NA210, etc.
- B. buf/inv gates: buf/bufif0/bufif1, not/notif0/notif1: buf/not have 1 o/p and 1 data i/p while *if* have 1 extra ctl i/p to model 3 state drivers. o/p is z when ctl line is inactive.
- C. mos gates: nmos/pmos, rnmos/rpmos: these model transmission gates. "r" versions model transistors with very high resistivity when conducting. These have 1 o/p(src/drn), 1 data i/p(drn/src) and 1 ctl i/p(gate). o/p is z when ctl line is inactive.
- D. Bidir gates: tran/tranif1/tranif0, rtran/rtranif1/rtranif0: these model true bidir transmission gates. tran/rtran have 2 inout terminals, while *tranif* have extra ctl i/p.
- E. pullup/pulldown gates: pullup/pulldown: these have a single o/p, and drive a pull strength value (pull1/pull0) onto the o/p net. strength may be specified
strength: weak0/weak1, pull0/pull1, strong0/strong1, supply0/supply1, highz0/highz1
data types: wire/wand/wor/tri/tri0/tri1/trior/triand/trireg, reg, time, integer, real, input, output, inout, event, parameter. We define variables to be one of these data types. We'll look thru all data types later. Variable names can be any set of characters, but for special characters, we have to use "escape character \". Also, var can't start with numeric digits, so \ should be used there too, where var name starts with digits 0-9. escaped identifier terminate with white space.
- ex: wire \reset* ; => this creates a var named "reset*". IF there was no whitespace (wire \reset*;) then var would be "reset*;", which would be syntax error since ";" is missing. other ex: wire \123name1 ; => name starts with digit, so "\" used.
- ex: Let's say netlist is flat, and has net names of form "a/b.c". So, we'll need to use special char "\a/b.c". But then a space is needed after that, otherwise tool won't know where that name ends. Note the space below
  veridian_tb.u_dig_top.\u_efuse_wrap/u_efuse_top.u_rom0 .mem[8][31] = 1'b0; //NOTE: "u_efuse_wrap/u_efuse_top.u_rom0" is name of 1 instance in verilog netlist. If name itself started with \, then we need \\. i.e: \efuse/i.b => \\efuse/i.b (NOTE: always a space at end)
- sometimes, we need everything to be treated as one name, and space messes that up. In that case, we can use double quotes.
  ncsim> tcheck -off "veridian_tb.u_dig_top.\u_efuse_wrap/u_efuse_top.u_rom0 .mem[8].u_sync_flop"
Stmt: case/casex/casez/default/endcase, if/else, while, for/forever, repeat, (In SV, we can use foreach, see in sv.txt file)
Blocks: begin/end, module/endmodule, specify/endspecify, primitive/endprimitive, function/endfunction, task/endtask, table/endtable,
Others: always, assign/deassign, initial, fork/join, edge/posedge/negedge, force/release, defparam/specparam, wait

Data Types:

Any language has data types that var are assigned to. Verilog has 2 diff kind of data types: structural and Behavioral.

Structural data type: This data type is used to describe the structure of design. 2 primary structural data types are reg and wire. These refer to physical nets in design. In real hardware, any data type can have only 2 values - 0 or 1 (since these are digital logic that can be either at VDD or GND). However, in Verilog, we allow every reg/wire to take 4 values - 0, 1, X (unknown) or Z (floating). X and Z are extra values that help us in simulating the design and point to errors. Reg and wire look similar, however there are subtle diff b/w them. Initial value of reg is "x", while that of wire is "z". Declaration for wire/reg is done inside a module, but outside of initial/always blocks.
Behavioural data types:

Assignment:

Let's say we have var defined (i.e var "a") and assigned it to correct data type (i.e type wire). Now we need to assign values to those var (i.e a=1). This needs to mimic how values on nets/wires etc are assigned in real hardware. For this, Verilog has 2 kinds of assignment statements. Each assignment needs to be either one of the 2 kinds:

Procedural assign stmt: used within procedural blocks (always, initial). initial is used for tb only. Here we can directly assign values to a reg, ex. a=2'b00; or digtop_tb.b=0 or a<=b&c;
- NOTE: any variables being assigned here have to be "reg" and not "wire". This is just a verilog syntax thing. Usually we get an error "A net is not a legal value in this context" when we directly try to assign some port of dut in the tc file. We have to declare it as a reg in digtop_tb file. However "force a = 0" is allowed inside "initial block" even if a is a wire, as force doesn't care.
Continuous assign stmt: used with assign stmt. ex: assign a=0; If we write a=0; then it has to be within always or initial block (i.e make it a procedural assign stmt).

So, within a module, every stmt has to be within procedural block (always, initial) or has to be a continuous assign stmt. Gate primitives(i.e and, or, etc), module instantiations, udp instantiation and specify_blocks can also be used.

Verilog Process:

One important place where HDL differ from Regular software languages is "Process". Process is a special block to model logic running concurrently in hardware. Only 2 ways to declare a verilog process: always & initial.

A. always:

always stmt keeps on running for ever. It's an infinite loop similar to while (1) {...}. It starts executing stmt that are within the always until it reaches the end, then it starts again, and keeps on repeating. It stops only on sensitivity list or on encountering # delay. We can have always block with sensitivity list or without sensitivity list.

1. w/o sensitivity list: used to implement clk osc. ex: always begin ... end.
ex: below code implements a clk with period of 2 time units. It starts with clk=x at time=0, then goes to clk=0 at time 1, then clk=1 at time 2, then comes to the beginning of loop and again gets clk=0 at time=3 and so on.
reg clk = 0; //This stmt added so that at time=0, clk=0 and not x. It starts with x and then switches to 0 at time 0. We don't see an x in sims at time 0, as both events happen at time0, but there's a -ve edge at time 0 which is seen by simulator.
always begin
#1 clk = 1'b0; //delay of #1 causes the simulator to move this event to queue to be evaluated 1ns later, and go to other always block
#1 clk = 1'b1; //same as above. after this, it goes to beginning of this loop, and again encounters #1 causing it to wait.
end

ex: always a = 1'b0; => this will cause simulator to hang, as this loop will run forever. Once control comes to this loop, simulator can never exit this always stmt, as there's no delay or sensitivity list to exit this.
ex: always a = b; => this will cause simulator to hang as explained above. To prevent this, we need always @* a = b; This causes loop to get executed only when b changes.
ex: always a = #10 b; => this will run: "b" is sampled at 0ns, 10ns, 20ns and so on, and "a" gets updated 10ns after each sample (at 10ns, 20ns, ...) with whatever "b" was 10ns earlier. It doesn't have anything to do with "b" changing, as it doesn't have @*.

2. with sensitivity list: Wait for event operator "@" can be used to provide sensitivity list. @ is an edge sensitive event control ("wait" provides level sensitive event control). If the sensitivity list contains all the inputs of the always block, then the code in always block will execute whenever any of its i/p changes. This models combinatorial logic.
V1995: always @a begin stmt end; => wait for +ve or -ve edge of a, and when that event happens, execute all the stmt within "begin end" block. Then return back to start of the block at always and keep waiting until "a" changes. The block won't look at "a" while it's within the "begin end" block. This is OK for comb logic, as it will look at it when it comes at start of always block, as long as there are no delays inside the block.
V1995: always @(a or b or c) begin .. end => whenever any of a,b,c changes.
V2001: always @* => implies that whenever any i/p changes, execute the block
V2001: always @(a,b,c) => comma allowed instead of or.

NOTE: For the clk generation example shown above, if we use always @*, then clk will always remain at x, since there is no input inside the block (it's tied off values 0 or 1 which don't change).
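
A minimal tb sketch of the clk osc described above, written as a single toggling stmt instead of two assignments. Module name clkgen_tb and the 10ns run length are assumptions made for illustration only.

```verilog
`timescale 1ns/1ps
module clkgen_tb;              // hypothetical tb name
  reg clk = 1'b0;              // init in declaration so clk is never x

  // always w/o sensitivity list: the #1 delay is what lets sim time advance
  always #1 clk = ~clk;        // toggles every 1ns => period of 2ns

  initial begin
    $monitor("t=%0t clk=%b", $time, clk);
    #10 $finish;               // stop the otherwise endless always loop
  end
endmodule
```

Removing the #1 would give the hang described in the "always a = 1'b0;" ex, since nothing would ever let time advance.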

B. Initial:

initial-block is a special-case of the always-block, one which terminates after it completes for the first time. however keyword "forever" can be used to keep on running it for ever.

always is used to model both seq (NBA <=) and comb (BA =). Comb can also be specified using cont assign stmt (assign). assign is preferred for comb. always is used for comb when using case, if-else, loops (for,while,forever) or when we want to have large comb logic inside a block as an entity. Any var assigned within always should be a reg, while that in assign should be wire. By default a var is wire, so no need to define it explicitly. NBA should be used wherever there are storage elements involved.

always stmt models flop or latch:
1. flop: always @(posedge clk or negedge reset).
2. latch: always @* begin if(en) Q = D; end // whenever an if-else/case stmt is incompletely specified, latch is built. We could have also coded it as "always @(en,D) begin .. end".
3. comb logic:
A: assign x=b&c; //assign stmt builds comb
B: always @* if (b) x=c; else x=a; //whenever if-else/case stmt is completely specified, comb is built. In the sensitivity list, no edges can be specified for a comb logic. If any of the i/p are omitted from sensitivity list (i.e when @* is not used), then it doesn't model a comb logic.
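
The 3 coding styles above can be put side by side in one module. This is only a sketch: module name proc_styles and all port names are made up for illustration.

```verilog
module proc_styles (                 // hypothetical module and port names
  input  wire clk, reset_n, en, d, a, b, c,
  output reg  q_flop, q_latch, x_comb,
  output wire y_comb
);
  // 1. flop with async active-low reset: edges in sensitivity list, NBA
  always @(posedge clk or negedge reset_n)
    if (!reset_n) q_flop <= 1'b0;
    else          q_flop <= d;

  // 2. latch: no else branch, so q_latch has to hold its value when en=0
  always @*
    if (en) q_latch = d;

  // 3A. comb logic using continuous assign
  assign y_comb = b & c;

  // 3B. comb logic in always: every branch assigns x_comb, so no latch
  always @*
    if (b) x_comb = c;
    else   x_comb = a;
endmodule
```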

--------------
NBA vs BA: all BA assgn can be replaced with NBA, and all sims will work fine, but will take more time, as NBA use 2 queues (evaluate and update). NBA are necessary for sim purpose. synthesis tools don't diff b/w BA and NBA, and will generate the same o/p netlist.

ex:
always @(posedge clk)
regb = rega;
always @(posedge clk)
rega = data;

This is a shift reg when written correctly using NBA (<=). It is OK even with BA, provided first always block executes before second always block.
In pre-synthesis simulation, the second always block may execute before the first always block. In that case, data will run through to regb on the same posedge of the clock. However, this will synthesize to a shift register, where data will not run through to regb on the same posedge of the clock. So, sim o/p will differ b/w rtl vs gate.
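
For reference, a sketch of the same shift reg coded with NBA, which removes the dependence on block execution order. Module name shift2 is an assumption.

```verilog
module shift2 (                      // hypothetical module name
  input  wire clk,
  input  wire data,
  output reg  regb
);
  reg rega;

  // With NBA, both RHS values are sampled first and updates happen later,
  // so the result is the same whichever always block runs first.
  always @(posedge clk) regb <= rega;
  always @(posedge clk) rega <= data;
endmodule
```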

statements:

1. if-else: if (expr) statement_or_null [else statement_or_null]. if-else can only be used inside a procedural block (i.e always, initial). It can't be used with continuous assign stmt (i.e assign). else is optional and is paired with nearest if clause. expr of 0,x,z is treated as false, while any other value is treated true. NOTE: if multiple "if stmt" (with no else clause) assign the same var in one procedural block, then the last one whose cond is true overwrites all previous ones. Ex:
always @(posedge ....) begin
if (a==1'b1) x<= c&d;
if (b==1'b1) x<= e&f; => This gets priority as it's last one simulated. So on +ve edge of clk, if both a and b are 1, then x gets value of e&f; It's same scenario as we do in state machine, where we assign default, and then overwrite it only in states where we need to. However this kind of code should be avoided as it can have 0 width glitch as x will change from "c&d" to "e&f" in zero time.
end
2. ?: conditional operator: expr ? expr1 : expr2. Used in place of if-else stmt. One diff b/w ?: and if-else is that since this is an operator, it can be in any expr that is either a part of procedural assign stmt or continuous assign stmt.
3. Loops: 4 stmt: repeat, for, while, forever. These can only be used in procedural assign stmt.
4. case: case expr comparison is effective when all compared bits are identical. comparison is done using 4 valued logic (0,1,x,z), so 2 bit case cond can evaluate to 16 diff values. Therefore, special types of case stmt are provided, which can contain don't-care values. casex treats both "x" and "z" as don't care, while casez treats only "z" as don't care. "?" is treated as "don't care" value for comparison purpose.
ex: reg a;
casez (a) //here if a=1'bz or 1'b?, then stmt1 is executed since then it's don't care. after stmt1 matches, stmt4 is never looked at.
1'b0 : statement1;
1'b1 : statement2;
1'bx : statement3;
1'bz : statement4;
endcase
ex:
case (1'b1) //stmt1 executed only if a=1'b1
a : statement1;
endcase
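
A sketch showing where casez don't-cares and the "case (1'b1)" form are typically useful: a small priority decoder. Module name case_demo and the req/grant signals are hypothetical. Both always blocks are meant to give the same result when req has only 0/1 bits.

```verilog
module case_demo (                   // hypothetical module and port names
  input  wire [3:0] req,
  output reg  [1:0] grant_z,
  output reg  [1:0] grant_1
);
  // casez: "?" bits in the case items are don't care, first match wins
  always @* begin
    casez (req)
      4'b1???: grant_z = 2'd3;
      4'b01??: grant_z = 2'd2;
      4'b001?: grant_z = 2'd1;
      default: grant_z = 2'd0;
    endcase
  end

  // case (1'b1): each item is compared against constant 1, first match wins
  always @* begin
    case (1'b1)
      req[3]:  grant_1 = 2'd3;
      req[2]:  grant_1 = 2'd2;
      req[1]:  grant_1 = 2'd1;
      default: grant_1 = 2'd0;
    endcase
  end
endmodule
```

The default item keeps both blocks completely specified, so comb logic is built and not a latch.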

if-else-if vs case stmt:
---
1. if-else is more general as any set of comparison expr may be used in if-else-if, while in case, all case expr are compared against a common controlling expr.
2. if stmt comparison involving x/z results in x/z which is interpreted as false (unless case equality === is used). case stmt compares with x/z, and match happens only when all x/z match. So, comp involving x/z is interpreted as not matching to anything (if only 0/1 bits are there) and results in going to default. So, comp expr in case may include x/z which is useful in debug.

--------------

5. function and task: explained in testbench section below.

Module definition/instantiation:

Module definition:

In v2001, we specify port dirn and type in one stmt within brackets. This is aka "ANSI C style for module ports definitions". Recommended to have 1 input or 1 output per line

---------
module patgen ( //declarations are contained in parentheses itself
input wire [7:0] a,b, // no need to specify wire, as it's understood by default
output c,
output reg [3:0] y //note last entry doesn't have a comma after it
);

//internal wires, reg should be declared next. Names must differ from port names, as ports are already declared above
reg [1:0] p;
wire q;
reg clk = 0; //in v2001, we can init var at time 0 within the declaration, instead of having separate initial stmt. Note if we have, reg p=top.b; => it will assign p with whatever top.b signal is at time=0, which would be x(if top.b hasn't been init). to assign p continuously, we should instead do: always @* p = top.b;

//now have always/initial/assign blocks
initial begin ... end
assign q=clk;
always @* p = {q,~clk};

endmodule

---------

or we can define module ports as below: //not preferred as it's not as compact as above. It's NOT ANSI C style.
module patgen (a,b,c,y); //declarations not contained in parentheses
input wire [7:0] a,b;
output wire c;
output reg [3:0] y;
...
endmodule

In v1995, we had to specify port dirn and type separately in 2 stmts outside the brackets with port names within brackets. This is what we see in the ex below. This was a necessity in V1995. With V2001, we don't need to do this anymore.
module patgen (a,b,y);
input a,b;
output y;
reg y;
// wire a,b; //a,b data type not defined as it's understood to be wire by default
...
reg clk;
initial
clk = 0; //in v1995, we had to use initial to init a var
...
endmodule

NO ports: when module doesn't have any ports, we can define like this
module Adder; //or as module Adder();
initial begin ...end ...
endmodule

module instantiation:

module name followed by instance name and pins
-------------------
Ex:
module Adder ( .... define adder ...)
Adder I_adder(A,B,S,C); //instantiation of Adder
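
A small self-contained sketch of both instantiation styles: by position (as above) and by name. The half-adder body and the wrapper name adder_top are assumptions, since the original Adder definition is elided.

```verilog
// Hypothetical half adder standing in for the elided Adder definition
module Adder (input wire A, input wire B, output wire S, output wire C);
  assign S = A ^ B;                  // sum
  assign C = A & B;                  // carry
endmodule

module adder_top;                    // hypothetical wrapper, no ports
  reg  a, b;
  wire s1, c1, s2, c2;

  // by position: order must match the port list of Adder
  Adder I_adder_pos   (a, b, s1, c1);

  // by name: order doesn't matter, each pin is named explicitly
  Adder I_adder_named (.C(c2), .S(s2), .B(b), .A(a));
endmodule
```

Connection by name is less error prone when a module has many pins or when its port list changes.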

Parameter/localparam:

parameter: To set value to a particular parameter within a module. They are written within module body, but NOT within always or assign block. It can be overridden by using defparam, or during instantiation.
ex: module tfilter ( ...);
parameter SREG_SIZE = 2; //sets default value to 2
parameter real r=5.0 from [-5:+5] exclude 0; => specifies real param with range of -5 to +5 (both included) and excluding 0. []=>include range, ()=>exclude range. NOTE: this "from/exclude" range syntax is Verilog-A/AMS, it's not part of plain Verilog.
endmodule

1. override using defparam:
defparam a_cont_deg.SREG_SIZE = 3; //overrides parameter value of 2 in a_cont_deg inst of tfilter module. can't be inside always or assign block.
tfilter a_cont_deg (.reset(..), ...);

2. override during instantiation: If an override value is not specified, default value is used.
ex: tfilter #(3,6,7 ..) a_cont_deg (.reset(..), ...); //if multiple parameters, they are overridden in the order they are defined in module. The ones that are not defined use default values. To avoid confusion, we can do explicit assignment (added in V2001) as shown below (recommended):
ex: tfilter #(.SREG_SIZE(3),.MSIZE(4),...) a_cont_deg (.reset(..), ...); //being explicit (V2001)

3. on irun cmd line:
+defparam+veridian_tb.u_rom0.PRELOADFILE="efuse.img" \ => works. But if full path to efuse.img provided, then you need \ before "
+defparam+veridian_tb.u_rom0.PRELOADFILE=\"${DVWORK}/efuse/efuse.img\" \ => works
-defparam veridian_tb.u_rom0.PRELOADFILE="${DVWORK}/efuse/efuse.img" \ => doesn't work??

localparam: was added in V2001. It doesn't allow the parameter value to be overridden. Trying to overwrite it gives an error.
module tfilter ( ...);
localparam SREG_SIZE = 2; //sets value to 2. It can't be changed from anywhere.
endmodule
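
The override methods above can be tried out with a sketch like this. The shift-register body of tfilter, the tb name param_tb and the LAST localparam are assumptions for illustration; only SREG_SIZE comes from the ex above.

```verilog
module tfilter #(parameter SREG_SIZE = 2)      // default value is 2
  (input wire clk, input wire din, output wire dout);
  localparam LAST = SREG_SIZE - 1;             // derived value, can't be overridden
  reg [SREG_SIZE-1:0] sreg;
  always @(posedge clk) sreg <= {sreg, din};   // shift left, top bit drops off
  assign dout = sreg[LAST];
endmodule

module param_tb;                               // hypothetical tb name
  reg  clk = 1'b0, din = 1'b0;
  wire d0, d1, d2;

  tfilter                  u_default  (.clk(clk), .din(din), .dout(d0)); // SREG_SIZE stays 2
  tfilter #(.SREG_SIZE(3)) u_named    (.clk(clk), .din(din), .dout(d1)); // explicit override
  tfilter                  u_defparam (.clk(clk), .din(din), .dout(d2));
  defparam u_defparam.SREG_SIZE = 4;           // override using defparam

  initial
    $display("sizes: %0d %0d %0d",
             u_default.SREG_SIZE, u_named.SREG_SIZE, u_defparam.SREG_SIZE);
endmodule
```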

Time scale:

Timescale directives should be given before each module in each file separately. Since it's a compiler directive, it's processed at time of compilation and stays in force for all modules compiled after it, until overridden by the next such directive. So, a module with no timescale directive of its own gets the timescale of whatever was compiled before it (this follows compile order, not design hierarchy). If nothing defined at all, then timescale directive from simulator cmd line option used (irun ... +nctimescale+1ns/1ps ..) => equiv to defining timescale at beginning of each source file. So, depending on order of compilation, modules with no timescale directive might get timescale values from different files, which might cause different simulators to yield different results. That's why it's recommended to provide Timescale directive at beginning of each file or module.
`timescale time_unit/time_precision
Delays are multiples of time_unit rounded to time_precision. #x => x*time_unit rounded to time_precision. On nWave, it shows up as that number in units you choose.
i.e.
`timescale 10ns/1ns => #1 is 10ns.
#1.55 a = b;
'a' gets 'b' after 16 ns because 10ns*1.55 = 15.5 ns = 16ns rounded to nearest 1ns

`timescale 1ns/1ps => #1 is 1ns.
#1.00055 a = b;
'a' gets 'b' after 1.001 ns because 1ns*1.00055 = 1.00055ns = 1000.55ps = 1001ps rounded to the nearest pico second

values of 1,10,100 are allowed in time_unit with unit of meas as s,ms,us,ns,ps,fs

NOTE: in verilog, time units (ns,ms,etc) are not allowed to be part of time value (ie #10, but not #10ms, etc). time units are inferred from timescale directive. However, SV allows us to specify #10ns, thus allowing us to remove ambiguity as to what delay #10 represents. It also saves typing when we have to specify large units as seconds (in SV, we write #10s; while in verilog we have to write #10_000_000_000;).
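
A quick sketch to see the rounding in a sim, using the 10ns/1ns ex above. Module name ts_demo is an assumption; how the time is printed depends on the simulator's time format settings, so no expected o/p is shown.

```verilog
`timescale 10ns/1ns
module ts_demo;                      // hypothetical module name
  reg a, b;

  initial begin
    a = 1'b0;
    b = 1'b1;
    #1.55 a = b;                     // 1.55 * 10ns = 15.5ns, rounded to 16ns
    $display("a=%b at time %0t", a, $realtime);
  end
endmodule
```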

Delays:

Used in 3 different kinds of modelling: gate level, dataflow, behavioural.

I. Gate modelling: models gate delays

3 values of delays allowed which is min delay, typ delay and max delay [for each of rising, falling and turnoff (Z or floating) event, 0->Z or 1->Z]. turnoff event only applies to bufif0/1, notif0/1.
4 types of delay:

Rise delay: 0,x,z->1
Fall Delay: 1,x,z->0
Turn-Off Delay: 0,1,x->z
Change-to-unknown: 0,1,z->x (delay value is taken to be the minimum of the above three)

The 3 entries separated by comma are each for rising, falling and turnoff(if applicable). Within each entry, the triplet are for min,typ and max delay. If only 1 delay value specified, it's taken for all 3(rise,fall,turn-off). If 2 delay values specified, they are taken for rise and fall resp and min of these is taken for turn-off.

Ex: bufif1 #(1:2:3,4:5:6,7:8:9) (bus, out,dir) => #(R_Min:R_typ:R_max, F_Min:F_typ:F_max, Z_Min:Z_typ:Z_max)
Ex: nand #(12,15) g(out,a,b) => nand gate has rising delay of 12 and falling delay of 15 time units (o/p is the 1st terminal). This applies for all 3: min,typ and max.
Ex: buf #1 I_buf (Y,A); => in gate level models, we've stmt like this. this causes a delay of 1 unit on Y, but the flow continues, so the other rtl stmt following this don't see this delay. This is because these are module inst, which are separate continuous blocks by themselves.

The delay model above is used for distributed delay (every element of ckt is assigned delay, and those are added up). Other delay model being used is lumped delay, where complete modules instead of gates are specified delays. The third option is path delay or pin-to-pin delay, where delays specified from pin to pin using specify.
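
A sketch of distributed gate delays using the two delay specs from the ex above. The tb name, stimulus times and signal names are assumptions. Which of min/typ/max gets used is normally selected by a simulator option.

```verilog
`timescale 1ns/1ps
module gate_delay_tb;                // hypothetical tb name
  reg  a, b, en;
  wire y_nand, y_tri;

  // rise delay 12, fall delay 15 (same for min/typ/max)
  nand   #(12,15) g_nand (y_nand, a, b);

  // rise, fall and turn-off delays, each as min:typ:max
  bufif1 #(1:2:3, 4:5:6, 7:8:9) g_tri (y_tri, a, en);

  initial begin
    $monitor("t=%0t a=%b b=%b en=%b y_nand=%b y_tri=%b",
             $time, a, b, en, y_nand, y_tri);
    a = 1'b0; b = 1'b0; en = 1'b1;
    #50 a  = 1'b1; b = 1'b1;         // y_nand falls, y_tri rises
    #50 en = 1'b0;                   // y_tri turns off (goes to z)
    #50 $finish;
  end
endmodule
```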

Path delay:

To specify delays to paths across a module (for ex for a flop, we might want to specify D->Q pin delay) we can use specify block inside a module to specify timing b/w module's i/p and o/p. specify specparam ... endspecify (see in verilog book, pg 155). this is called path delay mode as the total delay from i/p to o/p is specified, and runs faster in simulation than distributed delay mode where delay is assigned to each gate in the module. (for TI_functiononly, we assign #0 gate delay to primitive gate). specify is useful when we do sdf annotation, as annotator will disregard delays in specify section, but will keep delay values specified as # delay. All our timing checks (setup,hold,etc) are also put in specify section for the same reason, so that these values will be disregarded when sdf values are present. However, the arcs in specify section have to match 1-to-1 with arcs in sdf file, else we'll get warnings that sdf annotation didn't happen for those pins.

2 methods to describe module path delays: one using "=>" and other using "*>"

=>: establishes parallel conn b/w src i/p bits and dest o/p bits. Each bit in src connects to corresponding bit in dest. Ex: In models file in AN210.v, we have (A +=> Y) = 0.01 ; => A=>Y would imply rising/falling delay from A to Y is 0.01 time units (upto 9 values can be specified for delays as explained for delay ex above). However with +=> it means i/p A is not inverted internally in the module, -=> means i/p A is inverted internally in the module (i.e there is a bubble on i/p A) . So, -=> will apply to cells with input AZ (ex: NO311F). This polarity token is ignored by most logic simulators, but may be used by timing analyzers.
*>: establishes full conn b/w src i/p bits and dest o/p bits. Each bit in src connects to every bit in dest. Ex: in DTP20.v, we have ( CLK,PREZ *> Q,QZ) = (0.100000:0.100000:0.100000 , 0.100000:0.100000:0.100000); => delay from clk->Q, clk->QZ, prez->Q, prez->QZ are all 0.1 unit delay for rise/fall (min,typ,max)
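
A sketch of a cell model using path delays in a specify block. The cell name AN2_model, the specparam names and the delay values are made up for illustration and are not taken from any real library.

```verilog
`timescale 1ns/1ps
module AN2_model (output Y, input A, input B);   // hypothetical 2 i/p AND cell
  and g1 (Y, A, B);                  // function only, no # delay on the gate

  specify
    specparam tplh = 0.01, tphl = 0.02;   // assumed rise/fall delays in ns
    (A +=> Y) = (tplh, tphl);        // parallel conn, non-inverting path
    (B +=> Y) = (tplh, tphl);
  endspecify
endmodule
```

With sdf annotation, these two arcs are the ones the annotator would try to match and overwrite.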

II. Dataflow modelling: to model delays for nets or assign stmt

all 3 ex below are equiv. Any changes of the signals being assigned to the o/p net will only be propagated after the specified delay. These delay are called inertial delay, as any changes in i/p which are less than the delay specified are ignored (i.e glitches < delay are not propagated). inertial delays are easy for the simulator to implement.
In ex below: If either of the values of in1 or in2 should happen to change before the assignment to out has taken place, then the previous assignment will not be carried out, and new assign will be substituted for the old assgn with new timing, as input pulses shorter than the specified delay are filtered out.

ex 1: net declaration delay - this delay is associated with net (propagation delay) and is added to the gate delay of driver, if any. This models wire with delay of 10.
wire #10 out;
assign out = in1 & in2;

ex 2: Regular Assignment Delay - this models delay from when RHS changes to when LHS changes.
wire out;
assign #10 out = in1 & in2;

ex 3: Implicit Continuous Assignment - this is same as ex 2, but 2 stmt combined into 1.
wire #10 out = in1 & in2;
wire a = 1'b1; => equiv to forcing a to 1.

ex 4: Incorrect (this is inter-assignment delay with incorrect syntax)
wire out;
#10 assign out = in1 & in2; => here, this continuous assign is outside always/initial block. So, delay of 10 should cause all stmt after it to be delayed, but continuous assign are all running in parallel, so invalid. To make it work, we've to put it in always block as shown below in "inter-assignment delay"

III. behavioural modelling: These delays can only be used in always/initial blocks in RTL for reg type.

1. inter-assignment delay (inertial delay), aka regular delays or delayed assignment (most common type used in testbench). This is used with either continuous assgn (as with wire above in ex 1-3) or within always/initial block. Used most commonly with blocking assgn (a = b etc). This is inertial delay, so glitches shorter than the propagation delay are filtered out. SDF file treats delays as inertial delays, so glitches get filtered out.
ex:
reg q; //note: q is defined as reg since it's used within always block
always @* #10 q=x+y; //waits for 10 time steps before executing the cmd. So, this causes a delay of 10 for all subsequent rtl stmt within that always block.
This delayed assgn above is equiv to this code:
begin // Equivalent to delayed assignment above.
#10; // Delay.
q = x+y; // Assign q. Overall same as #10 q = x+y. Note: no temporary storage of result(x+y). Whatever is the current value of x,y is used.
end

2. intra-assignment delay (transport delay). used most commonly with non-blocking assgn (a <= b etc). This is non-inertial delay, so glitches not filtered out. These are used to model delay for that piece of logic only, as it would be in real gate delay. Using BA to model transport delay (as shown below) doesn't work (have to use NBA)
ex:
reg q;
always @* q = #10 x+y; //value of x + y is stored in tmp var at the time that the assignment is executed, but this value is not assigned to q until after the delay period, regardless of whether or not x or y have changed during that time. After storing it in tmp var, sim waits for #10 (execution moves to other always/initial blocks), and then copies tmp to q after delay of 10. execution doesn't move forward for this block, as it's BA. Any intermediate changes to x,y are lost, as it doesn't come back to start of this block. It comes to start of "always @" waiting for any signal change on x,y only after all stmts are done in this block. So, glitches are filtered here (inertial delay). This causes a delay of 10 not only for this stmt, but for any rtl code after this stmt within that always block, as it's BA. Stmt in other initial/always block still move forward, as #delay moves control to other blocks.
This intra assgn delay above is equiv to this code:
begin // Equivalent to intra-assignment delay.
hold = x+y; // Sample and hold x+y immediately.
#10; // Delay. NOTE: this delay applies to all stmt after this. At this point, execution jumps to other blocks, and comes back here after #10 delay has occurred.
q = hold; // Assignment to q. Overall same as q = #10 x+y.
end

General rules for assigning delays:

Assign of delay to LHS or RHS of BA to model combo logic is flawed.
- 1. LHS delay: always @* #10 q=a+b; => bad for modeling, ok for tb. will cause q to update with latest value of a+b. Supposed to model inertial delay but flawed. use "assign" instead of "always @*".
- 2. RHS delay: always @* q=#10 a+b; => bad for modeling, bad for tb. will cause q to update with sampled value of a+b, and missing out on any changes of a,b within that #10. So, value of q will be wrong for a while until a,b changes again after #10. supposed to model transport delay but flawed. use NBA instead of BA. @* is imp here. If we use "always q=#10 a+b;" then q will get updated every #10 irrespective of changes on a or b (see detailed explanation above).
Assign of delay to LHS of NBA to model combo logic is flawed, but to RHS of NBA is good (models non-inertial(transport) delay).
- 1. LHS delay: always @* #10 q<=a+b; => bad for modeling, bad for tb (as it's inefficient compared to BA, so use BA instead). will cause q to update with latest value of a+b. same behaviour as with LHS delay on BA.
- 2. RHS delay: always @* q<=#10 a+b; => good for modeling, ok for tb. will cause q to update with sampled value of a+b. since it's NBA, it will go back to beginning of "always @" and wait for any new event. So any changes of a,b within that #10 will again be captured. So, value of q will be updated after #10 for any change on a,b. So, it models transport delays accurately.
Assign of delay to LHS of "assign" is good and models inertial delays
- 1. assign #5 q=a+b; => q is scheduled to be updated #5 later. However, if a,b updates within that #5, then that future scheduled event is killed and replaced with newer event later. So, changes on a,b < #5 are killed, so it's inertial delay. So, assign is not exactly similar to "always @". assign don't queue up o/p assgn, they only keep track of next o/p value and when it will occur. It's like "always @* #5 q<=a+b;" but it doesn't wait for #5 before looking at i/p again. It continuously looks at i/p.
- 2. assign q=#5 a+b; => not valid (RHS delay not allowed on assign)
- 3. assign q<=a+b; => not valid (NBA not valid with assign)

NUTSHELL: use assign with BA to model inertial delay, and RHS delay with NBA to model transport delay.
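
The NUTSHELL above can be checked with a sketch that drives a pulse shorter than the delay into both styles. tb name, signal names and stimulus times are assumptions.

```verilog
`timescale 1ns/1ps
module delay_tb;                     // hypothetical tb name
  reg  in;
  wire y_inertial;
  reg  y_transport;

  // inertial: i/p pulses shorter than 5 get filtered out
  assign #5 y_inertial = in;

  // transport: every change on in shows up 5 later (RHS delay with NBA)
  always @* y_transport <= #5 in;

  initial begin
    $monitor("t=%0t in=%b y_inertial=%b y_transport=%b",
             $time, in, y_inertial, y_transport);
    in = 1'b0;
    #10 in = 1'b1;                   // start of a 2ns pulse
    #2  in = 1'b0;                   // end of pulse, shorter than the #5 delay
    #20 $finish;
  end
endmodule
```

The 2ns pulse should be seen on y_transport 5ns later, while y_inertial should never show it.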

ex:
always @* q <= @(posedge clk) D; D is evaluated whenever it changes, but is not assigned to q until after clk posedge. => transport delay
always @(posedge clk) q <= D; Here on posedge of clk, instantaneous value of D is assigned to q (D is not held anywhere) => inertial delay

ex: Here clk starts with x.
initial #2 clk = 1'b0;
always #1 clk = ~clk; => at #1, RHS clk=x,so LHS clk=x; at #2, RHS clk=0 or x depending on whether initial executed first or always executed first. Then LHS clk=1 if initial executed first. If always executes first, then LHS clk=x, but then initial will execute and clk will be 0. So, b/w #2 to #3 clk=1 or 0, depending on what stmt executes first.
From the sims, looks like initial executed first, resulting in clk=1 at #2. however, by changing clk<=1'b0, we can make always execute first, resulting in clk=0 at #2.

NOTE: By default, verilog gate level models and interconnect delays are always simulated as transport delays, but they look as if they are simulated as pure inertial delays (since they don't allow glitches shorter than prop delay to pass thru). This is because, by default, pulse_r and pulse_e are set to 100% in simulators. See simulation section for more details.

Operators: operator precedence decides what the final value is (pg 310 in verilog book). All operator treat Reg/wire as unsigned (unless specified explicitly as signed in V2001) and real/integer as signed or unsigned (depending on what they are specified as).

--------
logical:   !, &&, || => logical negation,and,or. two sides are logical T or F values. True is "1" or anything non-zero, while false is "0".
ex: a=4'b1100; => here a is non-zero, so a=True, so !a=false or 0. If a=4'b0000; then a=False, so !a=True or 1.
Bitwise: ~, &,   |,   ^, ^~ (or ~^) => bitwise negation, and, or, xor and xnor. Each bit in that operand is operated on.
ex: a=8'b1010xzxz; => ~a=8'b0101xxxx;
Unary reduction: & (nand = ~&), | (nor = ~|), ^ (xnor = ~^ or ^~) => unary reduction produces the single bit operation on all of the bits of the operand. Unary reduction and bitwise operation are distinguished by syntax.
Ex: any_val = |(val[7:0]); => or of all 8 bits of val.
Ex: !(reset_n) => when reset_n is not high/true (i.e low/false). similar to ~reset_n

comparison:
==, != :logical equality/inequality, result will be unknown if x or z in the input.
Ex: if (A==1) then stmt1 else stmt2; => if A=x, then this will result in x, which is "false" so stmt2 will execute. So, with ==, comparison with "x" or "z" is always evaluated to false. To do comparison with "x" or "z", we use ===.
===, !== :logical equality/inequality including x and z
>, >=, <, <=, >>, << :greater than/less than (with or without equal), right/left shift

arithmetic: +, -, *, /, %(modulus).

concatenation: {} => joins together bits from 2 or more comma separated expr. It can also have a repetition multiplier right before {.
Ex: {c, b, {2{a,b}}} => {c, b, a, b, a, b}
ex: wire [3:0] c = {4{a}}; => NOTE: curly brackets are required outside of 4 too. Just 4{a} won't work, as 4 copies of a need to be concatenated with outer brackets. ex: c = 4{a} => c = a,a,a,a => incorrect. {4{a}} => c={a,a,a,a} which does the correct concatenation

NOTE: in SV, we can use these operators in C style. i,e a=a+b can be written as a += b;
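
A sketch that prints results for a few of the operators above, using the same values as the ex. Module name op_demo is an assumption. Values are assigned inside the initial block so that they are set before the $display calls run.

```verilog
module op_demo;                      // hypothetical module name
  reg [3:0] a;
  reg [7:0] v;

  initial begin
    a = 4'b1100;
    v = 8'b1010xzxz;
    $display("!a=%b ~a=%b", !a, ~a);                  // logical vs bitwise: 0 and 0011
    $display("~v=%b", ~v);                            // x and z bits both become x: 0101xxxx
    $display("|a=%b &a=%b ^a=%b", |a, &a, ^a);        // unary reduction: 1, 0, 0
    $display("{4{1'b1}}=%b", {4{1'b1}});              // replication: 1111
    $display("==: %b  ===: %b", 1'bx == 1'b1, 1'bx === 1'bx); // x and 1
  end
endmodule
```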

structural data type: Wire or reg:
------------
2 primary data types are reg and wire. declaration for wire/reg is done inside a module, but outside of initial/always blocks. Initial value of reg is "x", while that of wire is "z".

In V95, reg, wire, ports were all unsigned only. The only data type in V95 that could be signed was integer data type. so, if we wanted to do signed operations, we had to define that var as integer which limited that number to 32 bits. Or, we could sign extend the msb of reg, and operate on those extended reg. Remember that any reg[x:0] has x+1 bits and is interpreted as +ve number 0 to 2^(x+1)-1 in V95. + just does a binary addition (numbers can be -ve too), while - does a 2's complement of the number to subtract and then adds it to the original number. However in V2001, reg, wire, ports could be defined as signed in which case, verilog internally sign extends them to get the correct result.
NOTE that result of any operation is dependent on RHS (operator and operands, number of bits in those, etc) and not on LHS.

NOTE: In verilog, any -ve number is internally rep in 2's complement format. We can do any arithmetic on any signed/unsigned number and we'll always get the correct result, if the result is within the range of numbers that can be rep. For ex in 4 bits, unsigned numbers are 0 to +15, while signed numbers are -8 to +7. If we subtract 0101 from 1111, then result would be 15-5=1010=10 (if 1111 is considered unsigned=15) or -1-5=1010=-6 (if 1111 is considered signed=-1). So result is correct in both cases. If result falls out of range, then we need extra bits to rep it. For ex: in 4 bit rep: 2-5=0010-0101=0010+1011=1101 (rep correctly as -3 in 2's complement since it's within range, but incorrectly as +13 in unsigned rep, since -3 is outside the valid unsigned range). In that case, we have to sign extend the operands by extra bits so that we get the correct result with the extra msb. If we don't sign extend, then look at this ex in 3 bits: if A=-3=101, B=+2=010 => Sum=101+010=111 (stored in 4 bits w/o sign extension, it's 0111 which is +7 => not correct. In 3 bits, it's still rep correctly as -1). However, if we sign extend A and B to 4 bits, then Sum=1101+0010=1111 which is -1 => correct. We sign extend, since adding 2 n-bit values results in an (n+1) bit value (we can sign extend any number indefinitely and the number will still be the same: that's a characteristic of 2's complement).

ex: reg [3:0] cntr; cntr <= cntr - 5; => this takes a 2's complement of 5, reps it in same no. of bits as other operand (which is 4 bits in this ex) and then adds it to cntr. This will work correctly as long as the result is 0 or +ve. As soon as cntr value becomes -ve, it won't be rep correctly as unsigned number. As a signed number, cntr can still be rep correctly in 4 bits down to -8, but after that more msb bits are needed to rep it correctly.

V1995: ex: to add signed numbers (in V95) => here input should be provided in 2's complement format in order for this to work.
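module add_signed_1995 (input [2:0] A, input [2:0] B, output [3:0] Sum); //port style shown is the V2001 one; strict V95 declares ports separately
assign Sum = {A[2],A} + {B[2],B}; //manual sign extension of A and B to 4 bits
endmodule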

V2001: ex: to add signed numbers (in V2001) => we declare ports as type signed:
module add_signed_2001 (input signed [2:0] A, input signed [2:0] B, output signed [3:0] Sum);
assign Sum = A + B; //here sign extension done automatically
endmodule

NOTE: in v2001, be careful when any of the operands is unsigned i.e Sum=A+B+Carry; where Carry is an unsigned 1 bit carry. Verilog states that if any operand is unsigned, then the result is unsigned. Signed operands are sign extended to the expr width only when all operands are signed. So, here A and B won't be sign extended (they are zero extended), as the whole summation is treated as unsigned, so msb of Sum won't be correct for -ve numbers (same as in V95 where we don't sign extend). During synthesis, we'll see a warning "signed to unsigned conversion occurs (VER-318)". If we declare Carry to be signed, then Carry would just be sign-extended, so if Carry=1, it would be sign extended to 4'b1111. This is incorrect as this would subtract 1 instead of adding 1. Correct way to do it would be to keep Carry as unsigned and then do a $signed conversion after prefixing a 0 to it i.e Sum=A+B+$signed({1'b0,Carry});
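
ex (a minimal sketch of the carry issue above; module add_carry_demo and the values -3, 1, 1 are assumptions picked for illustration): the same sum is done once with the unsigned Carry added directly, and once with the $signed({1'b0,Carry}) fix.

```verilog
module add_carry_demo;
  reg signed [2:0] A, B;
  reg              Carry;            // unsigned 1 bit carry in
  reg signed [3:0] sum_bad, sum_ok;
  initial begin
    A = -3; B = 1; Carry = 1'b1;
    // Carry is unsigned, so the whole expr is unsigned: A and B are zero extended
    sum_bad = A + B + Carry;                    // 0101 + 0001 + 0001 = 0111
    // {1'b0,Carry} is a 2 bit +ve value; $signed keeps all operands signed
    sum_ok  = A + B + $signed({1'b0, Carry});   // 1101 + 0001 + 0001 = 1111
    $display("sum_bad=%0d sum_ok=%0d", sum_bad, sum_ok);   // sum_bad=7 sum_ok=-1
  end
endmodule
```

Expected math is -3+1+1=-1, so only sum_ok is right.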

-------------------

Procedural assignments (any assignment in a procedure as always,initial,task,function) are used to model seq logic (and can also model combinatorial logic, as with always @*), while assign is used to model combinatorial logic only. The reg variables store the last value that was procedurally assigned to them whereas the wire variables represent physical connections between structural entities such as gates. wires don't store any value. (NOTE: net data type can be wire, wand (wired and), wor(wired or), tri).
whenever you use assign statement (combinatorial) to a o/p port, use output wire. We can omit "wire" from port type, since by default it's assumed to be wire.
whenever you use something in always or initial statement or tasks or functions (all these are procedural assignments) to a o/p port, use output reg. (so, in our *_tc.v, all references or assignments (digtop_tb.nRST = 1'b0) should have variables declared as reg in digtop_tb).

For assign, we need to declare a var as wire and then assign it.
wire a;
assign a = b&c; //we can't write a=b&c; without an assign stmt. But if a was declared as reg, we could have done that as long it was done within an "always" block or in an "initial" block.

However, to save typing, we can do:
wire a = b&c; => does same job as above

NOTE: Reg can be used as a sequential element. But wire cannot be. Reg can be driven from an initial/always block.
1. Output/Inout port of an instantiated module can be connected to a wire only (not reg), as the net is just connecting the o/p port to some other port so it can't be driven from an initial/always block (ex: digtop dut (.Yout(y_out)). Here even though Yout might be a reg, y_out must be declared a wire).
2. Also, any signal that we are assigning value to in an initial block has to be a reg, since wires can't be assigned values in procedural blocks (ex: initial #1 digtop_tb.a_net=1'b0. here a_net should be declared a reg and NOT a wire as it's being assigned value from within initial block).

So, reg can be assigned only from procedural blocks (initial/always/task/function), while wire can only be driven by an assign stmt or by the o/p port of an instance. This is just a syntax thing from verilog. From simulation point, there's one subtle difference:
The RHS of a cont assgn to a wire is re-evaluated whenever any signal on the RHS changes, whereas the reg is evaluated only when there is a change in any of the signals in the sensitivity list. That is why combo logic implemented using assign vs procedural blocks is functionally the same, but has the above difference (ex: when the sensitivity list is incomplete). Also, because of the above behavior we say that reg needs to store the value it has until there is a change in the sensitivity list.

This gives rise to issues (as discussed in wire/reg section above), where reg a=top.b gets assigned only once at time 0, while wire a=top.b gets a cont assgn with top.b. So, with reg, we've to use "always @* a=top.b" to mimic behaviour of wire.
Starting from sv (system verilog), we use logic for both reg and wire. logic behaves same way as reg, so we need to use always @*.

Behavioral Data Types: integer, real, and time. num can be int or real (signed or unsigned)
---------
These data types used for testbench only and not for synthesis, as structural data type wire/reg are the only ones that synthesize efficiently. Integers can be displayed using %d, while real with %f, and time with %t.
const numbers can be rep in decimal, binary, octal or hex. -ve numbers rep in 2's complement format.
_ is legal anywhere in num, except as first char. It's ignored in the value, and is there only for readability.

1. integer num: In verilog, keyword "integer" is used to define integers. In SV, "int" is used. They can be sized (as specified) or unsized numbers (Unsized size is 32 bits).
-----------
Syntax: <size>'<radix><value>; => size is specified as decimal number in number of bits. default size is 32 bits and default radix is decimal. This is valid for integer as well as reg.

When <size> is smaller than <value>, then leftmost bits of <value> are truncated
When <size> is larger than <value>, then leftmost bits are filled with 0,Z,X(if 0/1,Z,X in leftmost bit in <value>).
1 => stored as 00000000000000000000000000000001 (32 bit decimal num)
6'b10_0011 => stored as 100011 (only 6 bits used to store it, as 6 bit size is specified)
2'd1 => stored as 01 (as 2 implies 2 bits to store the value of decimal 1. since leftmost bit is 1, so remaining left most bits are filled with 0)
2'b1 => stored as 01 (as 2 implies 2 bits to store the value of binary 1. since leftmost bit of value is 1, so remaining left most bits are filled with 0). To store both bits as 1, do 2'b11 or {2{1'b1}} or {1'b1,1'b1}. If you do 'b1, it will store it as 00000000000000000000000000000001, so not what we wanted.
'hF => stored as 00000000000000000000000000001111 (as no size specified so 32 bits)

ex: reg [63:0] a; a='d200999000777000; //this will cause an overflow, and a will get assigned incorrect value (even though "a" is 64 bit, number on RHS is by default stored as 32 bit, so some truncated 32 bit num will get assigned to 64 bit var). overflow warning will be displayed. To prevent overflow, do: a=64'd200999000777000; => This 64'd forces the number to be stored in 64 bit format, and then it gets correctly assigned to LHS.
NOTE: any number is stored as 32 bit except explicitly asked to store in more bits. ex: 17689999001 + 3457788890 will give incorrect result as both numbers are stored in 32 bit, which will cause overflow.
ex: integer a; a=2'd3;
Note: If we define int x=3; Then x[31:0]=00000000000000000000000000000011. We can access any bits of x. Ex: 8th bit of x is x[7]=0. To access 4 lsb of x, x[3:0]=0011. This is helpful when we are looping in "for" loop using x as variable, then we can specify individual bits of x to some internal register to be written.

ex: reg [1:0] a=13; //this is same as "reg [1:0] a=2'b01" as reg stores 2 lsb of 32 bit integer "13"=4'b1101. So, 01 is stored in a[1:0]. Note: every number is default to decimal radix, and is internally stored as 32 bit int.
ex: reg [3:0] a=1111; This assigns a=4'b0111 as 'd1111 = 'b100_0101_0111. So, 4 lsb bits are 0111.
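
ex (sketch; module literal_demo is a made-up name): the truncation and zero fill rules above, checked by printing the stored bits.

```verilog
module literal_demo;
  reg [1:0] a;
  reg [3:0] b;
  reg [7:0] c;
  initial begin
    a = 13;      // 13 = ...1101, only the 2 lsb are kept
    b = 1111;    // decimal 1111 = 'b100_0101_0111, only the 4 lsb are kept
    c = 2'b1;    // 2'b1 = 01, then zero extended to 8 bits
    $display("a=%b b=%b c=%b", a, b, c);   // a=01 b=0111 c=00000001
  end
endmodule
```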

2. real num: keyword "real" is used to define real. either decimal (<value>.<value>) or scientific (<mantissa>E<exponent>)
------------
1.2 => +ve real num 1.2
3.5E6 => 3.5*(10^6) = 3,500,000.0
10e-1 => 10 *(10^-1) = 1 (e/E already implies a 10)
ex: real a; a=5.2;

Any number that does not have negative sign prefix is a positive number. -ve num rep as 2's complement.
32'hDEAD_BEEF => Unsigned (or signed positive) number rep internally as 32'hDEAD_BEEF
-32'hDEAD_BEEF => -ve num rep internally in 2's complement form as 32'h21524111
14'h1234 => Unsigned (or signed positive) number, which is 32'h00001234 when evaluated in 32 bits
-14'h1234 => -ve num rep in 2's complement form, which is 32'hffffedcc when evaluated in 32 bits

3. time: holds sim time which is returned from system fn $time. size is 64 bit. If min timing resolution is 1fs(10^-15sec), then 50 bits are enough to rep 1 sec (2^50 fs = ~1.1 sec). So, 64 bits can rep about 18,000 sec (2^64 fs).

--------------------

Data Structures: Arrays, vectors, memories.
---------------
NOTE: see system_verilog.txt for packed vs unpacked arrays.

Array: used to hold several objects of the same type.
----
ex: integer i[3:0]; //integer array with a length of 4. i[0],i[1],i[2],i[3] are each an integer with 32 bits
ex: reg     r[7:0]; //scalar reg array with length of 8. r[0] to r[7] are 8 distinct reg.

We can also have arrays of instances (added in v1995).
ex: Adder I_adder[3:0] (sum, {c_out, carry[3:1]}, a, b, {carry[3:1], c_in});
=> is equiv to
Adder I_adder[0] (sum[0], carry[1], a[0], b[0], c_in)
Adder I_adder[1] (sum[1], carry[2], a[1], b[1], carry[1]) and so on ..

generate:
--------
In V2001, generate stmt was added to achieve this and much more. generate was taken from vhdl. During elaboration, the compiler replaces stmt inside generate with multiple copies of those stmt, so it saves typing. 3 kinds of generate stmt:
1. for loop generate 2. if-else generate 3. case generate
These "generate for loops" are different than "normal for loops" since normal for loops are never replaced with multiple copies, but rather are executed at run time. generate stmt are synthesizable.
ex: generate stmt to have 4 copies of memory inst U, named MEM[0].U to MEM[3].U.
genvar i; => new variable type that can only be used inside generate. It's a local +ve integer and can be declared inside or outside generate stmt.
generate
for (i=0; i < 4; i=i+1) begin : MEM
memory U (read, write, data_in[(i*8)+7:(i*8)], address,data_out[(i*8)+7:(i*8)]);
end
endgenerate => no semicolon (generate/endgenerate keywords are optional from V2005 onwards)

ex:
generate
for (i=0; i < 2; i=i+1) begin : MEM
always @(reset, sig[i]) begin //sig is a placeholder name here, as "reg" is a keyword and can't be used as a signal name
   case (i)
    'd0: reg_a[i]=8'h01;
    'd1: reg_a[i]=8'h10;
   endcase
end
end
endgenerate

ex:
generate //genvar i has to be declared. If i has already been declared and used in some other for loop, then some other genvar has to be defined here, or else it gives an error
for (i=0; i < 2; i=i+1) begin : MEM //for loops should be outside always stmt, else it gives an error
   always @(posedge clk or negedge rst[i])
     if(~rst[i]) A[i+1] <= 1'b0;
     else        A[i+1] <= data[i+5];
end
endgenerate
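
ex (a self-contained sketch; module gen_demo, block name BIT and the signals are assumptions, not from the notes above): a generate for loop that gets unrolled into 4 separate cont assgn at elaboration.

```verilog
module gen_demo;
  reg  [3:0] a, b;
  wire [3:0] y;
  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : BIT
      assign y[i] = a[i] & b[i];   // elaborated as 4 separate assigns, BIT[0] to BIT[3]
    end
  endgenerate
  initial begin
    a = 4'b1100; b = 4'b1010;
    #1 $display("y=%b", y);   // y=1000
  end
endmodule
```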

Vectors: used to represent multi-bit busses. These are multi-bit words of type reg or wire
------
ex: reg [7:0] MultiBitWord1;    // 8-bit reg vector with MSB=7 LSB=0. This is diff than "reg r[7:0]" as there we can't reference r as a whole, we can only ref r[0],r[1] etc as 1 bit each. However, when we define "reg [7:0] r", then we can ref r, as that implies word r, which is 8 bits in length.
ex: reg a;                      // single bit vector often referred to as a scalar

ref vectors:
ex: a = MultiBitWord1; //if a is also 8 bit wide then all 8 bits of MultiBitWord1 assigned to a
ex: bitslice = MultiBitWord1[3:0]; //applies the 3-0 bits of MultiBitWord1 to bitslice

Memories: array of vector reg
----------
reg [7:0] ram; // This is a 8 bit register vector.

to build mem array, do as below
reg [15:0] ram[255:0]; // This says there is an array of ram from 255 to 0, where each entry is 16 bits wide. so ram[0] has 16 bits, ram[1] has 16 bits and so on. For ex. 3rd bit of 2nd word is ram[1][2]. In SV, an 8 bit wide mem could also be written as "byte ram[255:0]" (for 16 bit wide, "shortint ram[255:0]"; both are signed 2-state types).
input [7:0] addr; //this is the 8 bit addr to array of ram
input [15:0] wrt_data;
output [15:0] rd_data;

So to read:
assign rd_data = ram[addr]; // This is equiv to rd_data[15:0] = ram[addr[7:0]][15:0] same as rd_data[15:0] = ram[8'bxxxx_xxxx] , so rd 16 bits from that addr location.
NOTE: Sometimes , we don't have all bits defined(i.e some bits may be x) for ram[addr], so in those cases we use case stmt:
always @(*) begin
case(addr)
ADDR1 : rd_data = {1'b0,ram[addr][6:0]};
ADDR2 : rd_data = {VAL1, ram[addr][6],VAL2[5:0]};
default: rd_data = 8'b00;
endcase
end

To write: (to write into latch, where all ram locations are built using latch and not flop)
always @(*)
if(wrt_en) ram[addr] = wrt_data; // Writes data[15:0] to that ram addr.

----
Verilog 2001 added support for indexed part select, where a fixed width slice is picked starting from a variable base index. The offset direction indicates if the width expression is added to or subtracted from the base expression.
[base_expr +: width_expr] //positive offset
[base_expr -: width_expr] //negative offset

ex: wire [7:0] byteN=rd_data[8*count +:8]; => if count=4, then rd_data[39:32] assgn to byteN. base_expr=32, width=8, so limit_expr=32 + width of 8 - 1 = 39. If it was -:, then it would be [32:25]. to get [31:24], we should do rd_data[(8*count-1) -:8];
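
ex (sketch; module partsel_demo, the 64 bit width of rd_data and the data pattern are assumptions picked so that each byte shows its own position): both offset directions with count=4.

```verilog
module partsel_demo;
  reg [63:0] rd_data;
  integer    count;
  reg [7:0]  up, down;
  initial begin
    rd_data = 64'h0807_0605_0403_0201;   // byte n holds value n+1
    count   = 4;
    up   = rd_data[8*count +: 8];        // bits [39:32]
    down = rd_data[(8*count-1) -: 8];    // bits [31:24]
    $display("up=%h down=%h", up, down); // up=05 down=04
  end
endmodule
```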

-------------
----------------------------------
latches:
--------
To implement latch, we could have written the code as above:
----
always @(*)
if(en) q = data;
----

with reset:
----
always @(*)
if (~reset) q = 1'b0;
else if (en) q = data;

however, instead of =, preferred way would be <=
---
always @(*)
if (~reset) q <= 1'b0;
else if (en) q <= data;

Here, since reset is async reset for a latch, there is no way for the tool to figure out that it needs to use a async latch. It just sees it as a normal if else combinational logic (latch followed by 2 muxes). So, to infer an async latch, we have to say
// synopsys async_set_reset "reset"
This says to the synopsys tool that use a async latch (instead of combinational logic for reset) with async signal tied to reset.

Note: instead of *, we could use all i/p in the sensitivity list of always @. But * automatically implies that all i/p are in sensitivity list in V2001 and beyond.
always @ ( en or reset or data) //or always @(en,reset,data) or always @(*) or always @*
if (~reset) q <= 1'b0;
else if (en) q <= data;

-----------------------------------

Coding style
------------
always @(posedge clk) begin
a<= 1;
a<= 0;
end

The above code may simulate correctly overwriting previous value of a with final assigned value. So, a will be 0 at every clk edge. However Lint (syntax checking for verilog and VHDL) will catch this.

The below code is correct but Lint will report it. It's equiv to assigning a to 1 in cases 00,01 and 11, and a to 0 in case 10. However it saves typing if we assign it at start. Then it gets overwritten in the case of 10.

always @(posedge clk) begin
a<= 1;
case (b)
   2'b00: y<=c;
   2'b01: y<=d;
   2'b10: begin a<=0; y<=e; end //more than 1 stmt in a case item needs begin/end
   2'b11: y<=f;
endcase
end

-------------------------

synthesis imposed coding style:
----------------------------
for a set/reset flip-flop: This style requires that all signals in a sequential-logic sensitivity list be specified with an edge (posedge or negedge).
ex: always @(posedge clk or negedge n_reset or negedge n_set) begin
      if (!n_reset) q_out <= 1'b0; //reset has priority over set
      else if (!n_set) q_out <= 1'b1;        // set all bits to one
      else q_out <= data_in;   // normal data capture on clk edge
    end

However, this causes simulation to be different than synthesized FF, as synthesized FF are level sensitive to set/reset.
Problem scenario: when n_reset goes low, then n_set goes low, and then n_reset goes high. Above code will have q_out=0 until the next clock, even though n_set is still active. Actual h/w is level sensitive, so it will cause q_out=1 as soon as n_reset goes high. To match actual h/w in sim, we need to add this to sensitivity list:
Fixed: always @(posedge clk or negedge n_reset or negedge n_set or posedge (n_reset & ~n_set))

------------------------
Compilation directive: `elsif and `ifndef were added in v2001.
`ifdef, `else, `elsif, `ifndef, `endif

ex1:
`ifdef TYPE_1
$display(" TYPE_1 message ");
`else
`ifdef TYPE_2
$display(" TYPE_2 message ");
   `endif
`endif

Compile with: +define+TYPE_1
Then simulate,RESULT: TYPE_1 message

Compile with    +define+TYPE_2
Then simulate,RESULT: TYPE_2 message

ex2:
`ifdef U00
reg f01;
`elsif D01
reg t01;
`elsif D01
   `ifdef D01
   reg t040;
   `elsif D00
   reg f040;
   `endif
`else
reg t03;
`endif

***************
verilog TestBench
***************

-------
DUT connected to testbench: 3 modules are there. DUT is original design RTL, Test is module in each testcase separately (module name Test should be the same for all testcases so that TOP_TB can just inst Test and not change name for each diff testcase) and TOP_TB is toplevel testbench that instantiates and connects both DUT and Test. Test has the logic to drive DUT i/p signals. Test can also be written as task() instead of as module.

module DUT (input logic A, output logic B, ...);   always @ .... endmodule => these are all sv files, so logic used
module Test (input logic B, output logic A, ...);   initial begin ... $finish; end endmodule => NOTE: A,B dirn are reversed from DUT
module TOP_TB;
   logic A,B,..;
   logic clk=0, reset; //internal clk, reset signals
    always #5 clk = ~clk; //clk osc with clk=0 at time=0
   DUT dut (.A(A), .B(B), .CLK(clk), ..); //instance name (dut here) is required
   Test test (.B(B), .A(A), ...);
endmodule

-------------------
Procedural timing control: stmt following this don't execute until condition satisfied.
1. delay control: stmt delayed in its execution
ex: #((d+e)/2) rega=regb; // after delay of (d+e)/2, rega equals regb (d and e need to be defined as parameters or variables. The outer parentheses are needed around the whole delay expr)

2. event control: using implicit event or declared event
A. implicit event: value changes on nets and variable used as events to trigger the execution of a statement.
negedge is detected on 1->x, 1->z, 1->0, x->0, z->0.
posedge is detected on 0->x, 0->z, 0->1, x->1, z->1.
ex:
@r rega = regb; // controlled by any value change in the reg r
@(posedge clock) rega = regb; // controlled by posedge on clock
forever @(negedge clock) rega = regb; // controlled by negative edge
@(posedge clock); // wait until posedge of clk appears

B. explicit event: A new data type, in addition to nets and variables, called event can be declared. An identifier declared as an event data type is called a named event. An event name shall be declared explicitly before it is used. Event is abstract in nature and doesn't require any port connection (thus it can be used across module boundary). event can be triggered at any place in code, and then those stmt that are waiting on this event get executed.
NOTE: For a trigger to unblock a process waiting on an event, the waiting process must execute the @ statement before the triggering process executes the trigger operator, ->. If the trigger executes first, then the waiting process remains blocked. So, event value (trigger) must be changed by a separate process (trigger followed by control in same process will not work, and pgm will keep on waiting indefinitely for unblock to happen).

ex: event e; => Event declaration. variable e is of data type event
initial begin => Event triggering.
#10;
-> e; => This can be put in various places of testbench code
... => these stmts get executed nevertheless. They don't wait for event e or anything.
d=1;
-> e; => calls event e again, which changes value of variable e. This triggers the below always block again.
end

always @e d = 0; => This stmt waits for a change on variable e. variable e changes on ->e, so at that time d gets assigned 0. Then this process is done, and it keeps on waiting for another change in variable e. When that happens in above code, this assignment is again done.
wait (expr); => this is another way of blocking a process. stmt following this will execute only when expr is true. This is level sensitive (in contrast to @ which is edge sensitive), so if expr=0, then it keeps on waiting. Useful for syncing various processes. We can use any reg,wire,etc in expr. In verilog, a named event can't be used directly as an arg to wait. In SV, wait(e.triggered) is used for that. wait(0) blocks forever, while wait(1) passes thru immediately.
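
ex (a runnable sketch of the named event flow above; module event_demo, the delays and the message text are assumptions): the always block is already waiting on @e when the initial block fires -> e from a separate process.

```verilog
module event_demo;
  event e;
  reg   d;
  // waiting process: wakes up on every trigger of e, then waits again
  always @e begin
    d = 1'b0;
    $display("t=%0t: event e seen, d cleared", $time);
  end
  // triggering process: a separate process from the waiting one
  initial begin
    d = 1'b1;
    #10 -> e;        // 1st trigger at time 10
    #10 d = 1'b1;
    -> e;            // 2nd trigger at time 20
    #1 $finish;
  end
endmodule
```

The message should show up twice, once for each trigger.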

-------------------

fork-join : causes processes to run in parallel
--------
fork
begin proc1 end //can also be a single stmt like: repeat (16) @(negedge digtop_tb.clk_osc);
begin proc2 end
join
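
ex (a minimal runnable version of the skeleton above, assuming the 2 processes are just delays of 5 and 10 time units; module fork_demo is a made-up name): the stmt after join runs only once the slower process is done.

```verilog
module fork_demo;
  initial begin
    fork
      #5  $display("t=%0t: proc1 done", $time);   // finishes at time 5
      #10 $display("t=%0t: proc2 done", $time);   // finishes at time 10
    join
    // reached only after both forked stmts complete, i.e at time 10
    $display("t=%0t: after join", $time);
  end
endmodule
```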

However join happens only when both processes complete. If one of the processes doesn't complete, then join never happens. To prevent this in testbenches where some event may never happen in case of a fault, we use join_any (SV construct). However, these forked processes are still running and will run until they complete. It's just that the pgm can continue forward. If it finds a $finish, then it terminates these forked processes and causes simulator to stop.

//when either 100 edges of ecp_clk OR 500 edges of clk_osc happen, join happens
fork
repeat (100) @(negedge digtop_tb.ecp_clk);
repeat (500) @(posedge digtop_tb.clk_osc);
join_any

join: there is no separate join_all keyword in SV. Plain join in SV is the same as join in verilog, as it waits for all forked processes to complete before proceeding.
join_none: in SV allows the pgm to move forward without waiting for any forked processes to complete. However, these forked processes are running in background till they complete. This is helpful by putting these in tasks, as this proc will run forever, and the pgm can continue forward to next line from where the task was called.

In verilog, sim terminate when all forked proc complete. But in SV, since we have variations of join, there are 2 variation of fork:
1. wait fork; => This allows fork to wait (before proceeding) until all forked child proc have finished. This is used in cases, where before terminating the pgm, we want to ensure that all forked processes have completed. Ex: we use it after join_any or join_none, but right before $finish
2. disable fork; => This allows all forked processes to be terminated. This is useful, when we want to kill all forked proc, when any of the forked proc complete. Ex: we use it after "join_any" to kill all other forked processes on completion of any 1 forked process.

ex:
task a();
   fork begin // outer fork
      fork         //2 forked proc below. we wait for event to happen. If signal_1 changes within that time, then it's an ERROR
         begin
            @(posedge signal_1);
            $display("ERROR: Unexpected change in output signal at %t", $time);
         end
         begin
            @stop_trig; //wait for event to happen via ->stop_trig;
         end
      join_any    //join if any of them finish
      disable fork; //stop the other forked proc, if any of them finish
   end join_none // This is outer fork, and keeps internal 2 forked proc running. It gets out of this task to next line in pgm, as "join_none" allows it to move forward.
endtask

verilog task and functions: they are defined within a module.
--------------------------
task:
----
like s/w procedure. task call is separate procedural stmt. It can't be called from cont assgn or be used in an expr. o/p of a task is contained in o/p port.
NOTE: variables defined outside the task (which are not local to the task, called global var) can be accessed within the task, but variables defined within the task (called local var) can't be accessed outside the task defn.
The above var type, local/global refers to scope of var - i.e where it can be accessed. There is other var type auto/static which refers to storage duration of var - i.e when a var is created/destroyed. "static" or "automatic (auto)" type can be local or global. Auto var lose their value when execution leaves their scope, and are recreated when scope is entered (this is the default in C, and for var in SV class methods). When defined as "static" (the default for var in verilog modules, tasks and functions), variables remain allocated in the memory throughout the life of the program irrespective of whatever function/task. i.e values are retained from one call of the function to another.
ex: function automatic int count_calls(); //fn is named count_calls here, so that it doesn't clash with var "a"
     static int a =0; //if static is not specified in an automatic fn, then var is auto.
     a= a +1; //with each call, value of "a" goes 1,2,3,4,.... If it wasn't defined as static, then it would be re-init to 0 on each call, so val would always be 1.
     return a;
    endfunction

NOTE: task/function can also be defined as auto or static. In pgm languages as C, they are auto i.e re-entrant - items declared within the task are dynamically allocated (on stack) rather than shared between different invocations of the task (static storage). In verilog, unlike C, args and local var are static and NOT auto by default. Meaning they are stored in a fixed location. This caused confusion in verilog-1995, as various calls to same task/fn from several places in program caused the value to be indeterminate. So, "automatic" keyword was added from V2001 onwards to force simulator to use stack for local var/args. To do this put word "automatic" in task or fn stmt (in SV, also in program or module stmt).
ex: program automatic test; ... task ... endtask ... endprogram => all var in task stored on stack.
This allows us to write recursive functions in verilog. It's good coding style to put task/function as auto, since if task is defined as "non-auto" and if same task is called at same time with diff values, returned val might be indeterminate.
ncvlog/NOAUTO = In Verilog, variables declared in an automatic task or function are automatic. In SystemVerilog, there are more complex rules for which variables are automatic. Automatic variables are deallocated when execution leaves their scope. variables declared in a static task or function are static = i.e they are initialized only once. Ex: function static display(); //all var declared within this fn are now static.
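
ex (sketch of a recursive fn, which only works with "automatic"; module auto_demo and fn factorial are made-up names): each call gets its own copy of n, so the nested calls don't overwrite each other.

```verilog
module auto_demo;
  // automatic: args and locals are allocated per call, so recursion is safe
  function automatic integer factorial;
    input integer n;
    begin
      if (n <= 1) factorial = 1;
      else        factorial = n * factorial(n - 1);
    end
  endfunction
  initial $display("5! = %0d", factorial(5));   // 5! = 120
endmodule
```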

Ex: task defn and call
module acc;
reg var_acc;
...
//task call
task_name(a_1, b_1[7:0]); => i/p passed thru a_1, o/p copied into b_1[7:0], These ports a_1,b_1 are local to the task, and are overwritten with each task call.
...
//task defn: There is no "always" block inside a task, as it runs only when called.
task task_name (input a, output reg [7:0] b); => i/p, o/p can only be passed thru these ports. These port defn is same as those for module defn (where V2001 takes newer concise format, while previous version take older style). Also in SV, ports are defaulted to input logic. So, for port a, we can just use "a" instead of "input a"
reg [1:0] tmp; => local variables declared whose scope is local to the task (name must differ from the port names). If these var need to be initialized, they can be init here in SV (in next stmt do tmp=2'b00;). However in verilog, you've to do init within begin/end stmt.
begin ... var_acc = 1'b1; ... end => variables defined outside the task can be read as well as be written here.
endtask
...
endmodule

NOTE:
1. If we don't define a "type" for i/p, o/p port variables, then default data type "reg" (logic in SV) is picked up, since task ports are variables and not nets (unlike module ports, which default to wire). we could have written task as:
task task_name(input integer a, output reg [7:0] b); to force certain type.
2. In verilog 95, these i/o ports have to be defined separately,
eg: task task_name; input a; output [7:0] b; => similar to how we did it for module defn
3. Cadence IUS doesn't support unpacked arrays for i/o ports of task, module,etc. unpacked arrays have to be declared internally within module or task defn. It gives Error like (assuming rdata defined as port "output reg [7:0] rdata[7:0]"):
ncvlog: *E,MEMDIO : Memory 'rdata' previously declared as input/output/inout.
To get rid of this error, declare rdata as "output reg [63:0] rdata", and then declare rev_data within the module or task (outside begin/end stmt) as "reg [7:0] rev_data[7:0]; begin rev_data[0]=rdata[7:0]; rev_data[1]=rdata[15:8]; ..."

function:
--------
like s/w function (fn). called from within an expr or cont assgn. o/p of a fn is contained in name of fn itself. so, during execution of fn, value must be assgn to fn name.
Ex:
function reg [7:0] calc_parity(); //here o/p value being returned is 8 bit reg. If no return type is specified, default is 1 bit. If nothing is to be returned, use "void" (SV only)
calc_parity = data[0] ^ data[1]; //o/p value is returned in calc_parity function name itself
endfunction : calc_parity //name of function here is optional

diff b/w task/fn:
---------------
task can contain timing/event ctl stmt (#,@,wait), but a function can't. i.e task can consume time.
task may have 0 or more input,output or inout ports, while function has 1 or more input ports (no output or inout ports for fn)
task may call other tasks and fn, but fn may only call other fn and not other task. fn can be called from other task and fn.
fn must have return value, and value must be used as in assgn stmt. return value can be ignored by casting result to void

verilog system task:
--------------------
all verilog system tasks are preceded by $. They are procedural stmt, so they have to be in "initial begin .. end" or in "always @ ...". That is why we put all $display stmt in *_tc.v module as it has an "initial" block, within which all $display stmt go.

1A. $display: to format, capital or small letters are both equiv. i.e %d or %D are both fine for displaying decimal.
-----------
ex: $display("hello a=%0d \t b=%d at time=%t\n",a,b,$time);
[0]=> its optional to place 0. when used, it prints w/o any leading zeroes or spaces.
real: %[w.d]e, %[w.d]f, %[w.d]g = scientific/decimal/short form
decimal: %[0]d (%b/%o/%h for binary/octal/hex)
char: %[0]c (%s for string) => If we use %s to display 8 bit hex value, then ASCII equiv of that hex is displayed.
For ex, if reg [7:0] a=8'h61; then $display("%s",a); will display "a".
time: %[0]t for current format time. time is 64 bit. If timescale is 1ns/1ps, then it displays in ps. If we use %d for time, then it displays in ns.
names: %[0]m,%[0]M = display hier name. This is helpful in displaying "from what modules is some lower level module called from". Ex: if some synchronizer is called from multiple places, we can have an initial stmt in lower level module with display statement. Ex: sync_2ff.v
module sync_2ff ( .... )
// synopsys translate_off => used for non-synthesizable code, it turns synthesis off
initial
begin
$display("sync_2ff: %m"); => this displays the full hier name of every instance of this module at start of sim.
end
// synopsys translate_on => it turns synthesis back on following this stmt
always @....
endmodule

ex: reg [255:0] string1; => string defined as reg since there's no string type in verilog. SV has string type.
In Verilog: string1={"/path_to","/","file",".txt"}; $display("%s",string1); => concatenation works for string
In SV: string str1; str1 = "first string";

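1A. $sformat: task that formats its args into a string variable (1st arg). SV adds $sformatf, which is a fn returning the formatted string; that fn form is used extensively in uvm_error to format anything to string.
ex: $sformat(str,"%s %d %s",str1,num,str2); => would produce string with these 3 var concatenated and with space in between. str can now be passed anywhere i.e $system(str); => will run str cmd (i.e if str = "ls -lrt", then that would be run)
ex: `uvm_error("ERROR", $sformatf("ckm error at time %t",$time)) => $sformatf is needed here, since $sformat is a task and returns nothing.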

1B. $write: $write is same as $display except that $display always adds a newline at end, while $write doesn't.

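1C. $strobe: same args as $display, but it prints at the end of current time step, after all events for that time step (including NBA updates) have settled.

1D. $monitor: continuous monitoring. It prints whenever any of its args (other than $time) changes, at end of that time step. Only one $monitor can be active at a time; $monitoron/$monitoroff turn it on/off.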

1E. dump vcd files: This is std task in verilog, so is supported by all simulators.
   - $dumpfile("tmp.vcd"); => task specifies which file to dump the variables in. If name not provided then IEEE std default is dump.vcd (some simulators use verilog.dump instead)
   - $dumpvars(level, list_of_variables_or_modules); => this task specifies which variables should be dumped. If no args provided, then all var at all levels dumped. level condition:
        - If level = 0, then all variables within the modules from the list will be dumped. If any module from the list contains module instances, then all variables from these modules will also be dumped. So, basically all hier starting from parent is dumped.
        - If level = 1, then only listed variables and variables of listed modules will be dumped.
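
A minimal sketch of how these 2 tasks are typically placed in a testbench. Module name dump_demo and the small counter are assumptions, added only to have something to dump.

```verilog
`timescale 1ns/1ps
// Sketch only: dumps all signals of this module (and anything below it) to tmp.vcd
module dump_demo;
  reg       clk = 0;
  reg [3:0] cnt = 0;

  always #5 clk = ~clk;                   // 10ns clk
  always @(posedge clk) cnt <= cnt + 1;   // something to look at in the waveform

  initial begin
    $dumpfile("tmp.vcd");        // file to dump into
    $dumpvars(0, dump_demo);     // level 0 => this module and all hier below it
    #100 $finish;
  end
endmodule
```

Using $dumpvars(1, dump_demo) instead would dump only the signals declared directly in dump_demo.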

2. $finish: to finish simulation. $finish(0)=>prints nothing, $finish(1 or 2)=>prints some info. $finish is equiv to $finish(1). $stop stops sim and returns control back to simulator's cmd line interpreter.

3. $random: generates random num. random num generated are from predetermined seq, and seed controls initial starting value of that seq.
---------
xmit_start = $random; => generates a signed 32 bit random no b/w -(2**31) and (2**31 - 1). To generate +ve random number, use braces.
xmit_start = {$random}; => result of concatenation is unsigned, so this generates +ve random no from 0 to (2**32 - 1)
xmit_start = {$random} % (2**3); => generates random no from 0 to 7. NOTE: ** is the power operator. ^ is XOR in verilog, so 2^3 would evaluate to 1.
$random(seed) => generates random number with seed(integer seed;) specified. Otherwise default seed is 0, and it will always generate the same seq of random num. 1st random num gen would be with seed 0 (if not specified), and then seed (specified or not) would be modified by $random to create new seed to be used for next call of $random.
NOTE: seed is an inout value, so it's passed as i/p at start of $random, but is returned back a modified value as o/p at end of $random. So, using $random(5) is illegal as arg here is const 5, instead of variable. seed is not a special keyword, and we could have used $random(a) also. default value of any integer is 0, so $random starts with seed as 0, but then for next call, a gets modified to some other random value, which is used for next call of $random(a). So, seed is just controlling the starting point of the random seq.

SV has $urandom and $urandom_range which are more flexible (generates unsigned num):
a = $urandom; => This fn has exactly same syntax as $random. Using this fn allows us to control seed from irun cmd line. We should see a msg saying "SVSEED set from command line: 2069655130" implying seed was taken from cmd line. With $random, we don't see this msg.
b=$urandom(seed); //seed is optional, and can be controlled from within pgm too.
ex: irun -svseed 145 => simulator assigns 145 as seed to urandom
ex: irun -svseed random => simulator assigns seed to random num using current time and process Id. So, this is most preferred way.

a = $urandom_range(255,1); => returns a random num b/w max=255 and min=1. Seed comes from cmd line or else defaults as 0. If min_val not provided, then min taken as 0. $urandom_range(1,255) behaves same as $urandom_range(255,1).
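
A small sketch putting the random fns above together. Module name random_demo and the seed value 5 are arbitrary choices for illustration. It has to be compiled as SystemVerilog, since $urandom and $urandom_range are SV fns. Actual numbers printed depend on the simulator, so none are shown.

```verilog
// Sketch only: compile as SystemVerilog because of $urandom / $urandom_range
module random_demo;
  integer    seed;   // must be a variable, since $random writes the new seed back
  integer    i;
  reg [31:0] r;
  initial begin
    seed = 5;
    for (i = 0; i < 3; i = i + 1) begin
      r = {$random(seed)} % 8;      // braces make it unsigned => range is 0 to 7
      $display("r=%0d, seed is now %0d", r, seed);
    end
    r = $urandom_range(255, 1);     // 1 to 255, both inclusive
    $display("urandom_range=%0d", r);
    r = $urandom;                   // unsigned 32 bit num
    $display("urandom=%0d", r);
  end
endmodule
```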

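4. $time: returns simulation time as a 64 bit integer, scaled (and rounded) to time unit of the module. $realtime returns time as a real num in the same unit, so fractional part is kept (ex: with timeunit 1ns; timeprecision 1ps; $realtime => 3.114, while $time => 3)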

5. file system tasks: In V95, file IO was limited to reading hex files into memory array using $readmemb/$readmemh (data in the file could be binary/hex numbers only, separated by white space) and writing file using $display and $monitor. But in V2001, system tasks were added to do C-type file operations.
$fopen: opens a file, $fclose: closes a file,
$fread: reads binary data from the file into a register or memory.
$fwrite: writes data in given format to file
$fscanf: reads characters from the file, interprets them according to a format, and stores results in its arguments.
$fgetc: reads character at a time, $fgets: reads a line at a time.
ex:
integer fd, dnum; => fd to store file descriptor. It's a 32 bit file descriptor.
fd=$fopen("image.bin", "rb"); => rb or r => rd. wb or w => wrt. ab or a => append. b=> binary file. fd is the file descriptor returned. It's 0, if cmd is unsuccessful.
reg [63:0] data; => this reg is used to store read data.
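$fwrite(fd,"%b\n",data); => writes data in binary format to file (file has to be opened in w or a mode for this)
dnum=$fread(data,fd); => reads binary data. Data is read byte by byte in big endian manner, so 1st byte read fills the most significant byte, i.e data[63:56]=1st byte, data[7:0]=8th byte. For a memory, $fread by default stores data from the first location through the final location. dnum stores the number of bytes read from the file. 0 is returned for error, so dnum can be used to check for errors.
dnum = $fscanf(fd,"%h %h\n",din[31:16],din[15:0]); => Reads file characters in hex format, and stores them in 2, 16 bit reg. Return value is number of items matched (-1 i.e EOF, if end of file is hit before any match).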
r = $fscanf(fd, "%d", CMD ); => reads file and stores char as decimal number in integer variable "CMD".
$fclose(fd); => we close the file once finished reading/writing
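
The tasks above can be exercised with a write followed by a read back of the same file. This is a sketch under assumptions: file name demo.txt, module name fileio_demo and the 2 hex values are all made up. If all goes well, $fscanf should match both items, so n would be 2.

```verilog
// Sketch only: write 2 hex values to a text file, then read them back with $fscanf
module fileio_demo;
  integer    fd, n;
  reg [15:0] hi, lo;
  initial begin
    fd = $fopen("demo.txt", "w");
    if (fd == 0) begin
      $display("could not open demo.txt for write");
      $finish;
    end
    $fwrite(fd, "%h %h\n", 16'hdead, 16'hbeef);
    $fclose(fd);                       // close, so that data is flushed to disk

    fd = $fopen("demo.txt", "r");
    if (fd == 0) begin
      $display("could not open demo.txt for read");
      $finish;
    end
    n = $fscanf(fd, "%h %h\n", hi, lo);  // n = number of items matched
    $display("items=%0d hi=%h lo=%h", n, hi, lo);
    $fclose(fd);
    $finish;
  end
endmodule
```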

readmemh/readmemb: h or b refers to input file format. Data stored in array is always binary. readmem* always works, while fscanf etc may not always work.
--------
reg [31:0] arr[0:7]; //make sure word width [31:0] is at least as large as each value in file, else readmem will not load values into array and give an error "size too large". NOTE: array range arr[lsb:msb] should be provided else we get error.
initial $readmemh("include_files/otp.img",arr); => here file is first divided into values based on white space or newline character. Then bits are read with arr[0][0] storing LSB of 1st value, arr[0][31] storing MSB of 1st value (i.e 9C gets stored as arr[0][7:0]=10011100). NOTE: nums are still stored as binary even though file is read in hex. This is since they are stored in reg which can only be binary. They can be displayed in any format, hex, dec, etc.
for(l=0;l<=7;l=l+1) begin => to display all contents read (arr has only 8 locations, 0 to 7)
$display("otp.img: location = %d content = %h",l,arr[l][7:0]);
end

$readmemb => reads binary file (which contains only 0 and 1).
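
A self-contained sketch of $readmemh, which first creates its own input file so that it doesn't depend on any existing image. File name demo.img, module name readmem_demo and the 4 hex words are assumptions for illustration.

```verilog
// Sketch only: create a small hex image, then load it with $readmemh
module readmem_demo;
  reg [31:0] arr [0:3];   // 4 words, each 32 bit wide
  integer    fd, i;
  initial begin
    fd = $fopen("demo.img", "w");
    $fwrite(fd, "0000009C\nDEADBEEF\n12345678\nCAFEF00D\n");
    $fclose(fd);

    $readmemh("demo.img", arr);   // 1st value goes to arr[0], 2nd to arr[1], ...
    for (i = 0; i <= 3; i = i + 1)
      $display("arr[%0d]=%h low byte=%b", i, arr[i], arr[i][7:0]);
  end
endmodule
```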

force/release, deposit: (can be used in input.tcl (irun cmd line) or in testcase)
------------
force/release:
ex: force digtop.freq_check = 1'b1; //in any verilog file
    #100 release digtop.freq_check; //releases the force so that value is driven by logic again
ex: force digtop.bus[7:0] = 8'hFF; //in input.tcl

deposit:
ex: deposit veridian_tb.CLKOUT = 1'b0; (in input.tcl)
ex: #20 $deposit(tb.CLKOUT, 1'b0); (in any verilog file, task, initial, etc)
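
A minimal sketch of force/release on a net, assuming a made up module force_demo. On release, a net goes back to whatever its driver is driving.

```verilog
// Sketch only: force overrides the driver of a net, release gives it back
module force_demo;
  reg  a = 0;
  wire y;
  assign y = a;          // normal driver of y

  initial begin
    #1 $display("driven   y=%b", y);   // prints y=0
    force y = 1'b1;                    // overrides the continuous assign
    #1 $display("forced   y=%b", y);   // prints y=1
    release y;                         // net returns to driven value
    #1 $display("released y=%b", y);   // prints y=0
  end
endmodule
```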

timing check system tasks: there are 12 timing checks in verilog.
-------------
timing check related ones are setup/hold, skew, removal/recovery, period and width. These may only be used in specify blocks, so that back annotated values can work when using sdf files. In sdf files, we have TIMINGCHECKS section, which has SETUPHOLD, RECREM, etc which has exact values for these, instead of some arbitrary values being used in verilog timing checks inside specify blocks.

1. $setup(data_line, clk_line, limit[, notifier]); => limit is period before the event on the clk_line (normally a rising edge) during which the data_line signal is not allowed to change. If the signal breaks this constraint, an error is generated. +ve limit specifies data should change before clk.
2. $hold(clk_line, data_line, limit[, notifier]); => limit here specifies period after an event on the clk_line. Note: here 1st 2 args are in opposite order. +ve limit specifies data should change after clk.
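3. $recovery(reference_event, data_event, recovery_limit [,notifier]); => specifies time constraint b/w async ctl signal and clk signal. async ctl signal (ref) should be released at least recovery_limit before clk (data).
ex: $recovery( posedge set, posedge clk, 10 ); => viol reported if posedge clk (data) happens within 10 units after posedge set (ref)
ex: $recovery( posedge set, posedge clk, recovery_param );
4. $removal(reference_event, data_event, removal_limit [,notifier]); => removal is counterpart of recovery (same as hold is to setup). viol reported if clk (data) happens within removal_limit before async ctl signal (ref) edge.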
5. $width(reference_event, limit [,threshold, notifier]); => specifies min pulse width from one edge transition to opposite edge transition.
ex: $width (posedge clk, 5); => if pulse width from posedge clk to negedge clk is < 5, it reports violation.
6. $period(reference_event, limit [,notifier]); => specifies min period, i.e min time from one edge transition to next transition of the same edge.
7. $skew(reference_event, data_event, limit [,[notifier]]); => specifies max delay allowable b/w 2 signals. $timeskew and $fullskew also available.
ex: specparam skew_param=14; $skew(posedge clk1, negedge clk2, skew_param);
8. $nochange(ctl_port, data_port, start_edge_offset, end_edge_offset) => This check involves 3 transitions rather than 2 associated with all other timing checks. It checks if the data signal is stable in an interval of start_edge_offset and end_edge_offset of ctl signal being high or low. ctl signal has to be edge specified, while data signal doesn't have to. data must be stable "start_edge_offset" before specified edge of ctl port and must remain stable "end_edge_offset" after next edge of ctl port. So, it checks for 3 edges. data should setup before specified edge of control signal and hold after opposite edge of control signal.
ex: $nochange (posedge clock, d_input, 3, 5); => report a violation if "d_input" changes in the period of 3 time units before +ve edge of clk and 5 time units after -ve edge of the clock. In the whole time while clock is high, d_input shouldn't change (-3 from +ve edge and +5 from -ve edge of clk).
//below 2 stmt could be combined into 1. Not sure, why 2 separate ones provided in lib. probably because +ve and -ve RET have diff values in lib, so combining into 1 in verilog model will not work (as it won't be able to annotate)
ex: $nochange(negedge CLK, posedge RET, 0.005: 0.005: 0.005, 0.005: 0.005: 0.005, GVCnotifier5); //5ps setup/hold wrt -ve CLK (only for +ve RET). As per this check, -ve RET can still happen in this time window.
ex: $nochange(negedge CLK, negedge RET, 0.005: 0.005: 0.005, 0.005: 0.005: 0.005, GVCnotifier5); //this checks for -ve RET.

NOTE: setup/hold can be combined in 1. similarly for recrem. This completes all 12 timing checks for verilog.
$setuphold (reference_event, data_event, setup_limit, hold_limit [, notifier] [, tstamp_cond] [, tcheck_cond] [, delayed_clk] [, delayed_data]); => [...] mean optional. NOTE: SETUPHOLD in sdf file has data first and then clk, while $setuphold in verilog model file has clk first and data later (similar to $hold, see the syntax). So, be careful when comparing arcs.

1. notifier: reg variable used as a flag. When a timing violation occurs, the model functionality can use the notifier flag to modify the model outputs. So, we can generate an x or whatever message we want to o/p in such a case. notifier toggles (x->0 or 1, 0->1, 1->0) whenever a timing violation occurs. This is passed into the UDP primitive for flop to generate an x for o/p. Notifier reg are not initialized, since that may cause UDP to goto x state at time 0, depending on the order in which the UDP received its i/p.
2. tstamp_cond: Places a condition on the <control_event> and the <clk_event>, if both <setup_limit> and <hold_limit> are positive values. Places a condition only on the <control_event> if the <setup_limit> is negative. Places a condition only on the <clk_event> if the <hold_limit> is negative.
3. tcheck_cond: Places a condition on the <control_event> and the <clk_event> if both <setup_limit> and <hold_limit> are positive values. Places a condition only on the <clk_event> if the <setup_limit> is negative. Places a condition only on the <control_event> if the <hold_limit> is negative.
4. delayed_clk: Delayed signal value for <clk_event> when one of the limits is negative.
5. delayed_data: Delayed signal value for <data_event> when one of the limits is negative.
These delayed copies of clk/data are used as inputs to udp, instead of using CLK or D. This is because if UDPs latch data on a clk, and setup or hold times are -ve, then event driven simulators will give incorrect results unless udp inputs are delayed accordingly. clk and/or data are delayed so that there is +ve setup and hold time. (Then at edge of clk, simulators can see if D changed correctly before clk changed. If D setup time was -ve and it could change after clk change, then simulator has no way to check for future event). This doesn't change anything, only the clk or data is delayed by an amount setup+hold+delta.
Simulators may give "non-convergence warning" (ncelab: *W, NTCNNC) in cases where delayed signal still gives -ve setup or hold time. This may happen when there are diff constraints for +ve and -ve data wrt clk, or when violation regions created by timing checks do not overlap. If any of setup or hold is -ve in sdf file, delays are added. If after delaying, we can make all setup and hold numbers to be +ve, then algorithm converges. If we can't make all of them +ve even after delaying, then all -ve limits are set to 0 to make algorithm converge. If violation regions do not overlap, then all -ve limits are set to 0.
By setting -ve limits to 0, we are more pessimistic and are making violation window larger. This is OK as design is still guaranteed to meet setup/hold if it passes this more rigorous 0 limit. By adding delays, and making algo converge w/o forcing any -ve limit to 0, we keep the original sdf limits, so that's preferred option.
Look in Ncsim doc(incisive_sim_overview.pdf and negative_timing_check_NTCNNC_AppNote.pdf ) for detail.
NOTE:
1. sum of setup+hold should be always +ve. +ve setup means before the clk, while +ve hold means after the clk.
2. Adding delay on clk line, dec setup time while inc hold time. Adding delay on data line, does opposite. However, violation window (sum of setup+hold) remains same.

ex: (from verilog models file SDC20.v)

$setuphold(posedge CLK, posedge D, 0.01: 0.01: 0.01, 0.01: 0.01: 0.01, GVCnotifier1_zd ,,TCHKON_AND_GVC_S_NOT1_CLRZ_NOT0_ != 0, GVC_CLK_CLK, GVC_D_D ); => 10ps is specified as setup/hold time for pos D wrt pos clk. similarly for neg D. GVCnotifier1_zd is the notifier which toggles if a violation occurs. GVCnotifier1_zd goes from x to 0, whenever a violation occurs. This switching causes la_nudp primitive to o/p an "x". Before toggling, it does tcheck_cond (TCHKON_AND_GVC_S_NOT1_CLRZ_NOT0_ != 0). GVC_CLK_CLK is an exact delayed copy of CLK, while GVC_D_D is an exact delayed copy of D.

$recrem(posedge CLRZ, posedge CLK &&& SLEEPMODE, 0.01: 0.01: 0.01, 0.01: 0.01: 0.01, GVCnotifier1_zd ,,TCHKON_AND_GVC_SD_NOT0_D_NOT0_S_WHATEVER_ != 0, GVC_CLRZ_CLRZ , GVC_CLK_CLK); //here (COND SLEEPMODE) is added in sdf file to recrem arc

$width(negedge CLRZ &&& TCHKON_AND_Q != 0 ,0.01 : 0.01 : 0.01 ,0, GVCnotifier2_zd) ;
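
To see where these checks sit in a model, here is a much simplified behavioral flop with a specify block. It is only a sketch and not how the library models above are built: module name dff_tchk, all limit values, and the plain always block that turns q to x (real stdcell models use a UDP for that) are assumptions for illustration.

```verilog
`timescale 1ns/1ps
// Sketch only: behavioral flop with timing checks and a notifier (not a library model)
module dff_tchk (output reg q, input d, input clk, input rst_n);
  reg notifier;   // left uninitialized; simulator toggles it on any violation

  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else        q <= d;

  // simplified stand-in for the UDP: corrupt q whenever a timing check fires
  always @(notifier) q <= 1'bx;

  specify
    // illustrative limits in ns; an sdf file would normally back annotate these
    specparam t_su = 0.10, t_hd = 0.05, t_rec = 0.10, t_rem = 0.05, t_pw = 0.20;
    $setuphold(posedge clk, d, t_su, t_hd, notifier);              // ref=clk, data=d
    $recrem   (posedge rst_n, posedge clk, t_rec, t_rem, notifier); // ref=async ctl, data=clk
    $width    (negedge rst_n, t_pw, 0, notifier);                  // min low pulse on rst_n
    $period   (posedge clk, 1.0, notifier);                        // min clk period
  endspecify
endmodule
```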

--------------------------
verilog simulation details:
---------------------------
At beginning of sim, time T (tracks sim time in timesteps) is set to 0, nets set to z, variables set to x. All procedural blocks (initial and always blocks) then become active. In Verilog-2001, variables may be initialized in their respective declarations and this initialization is permitted either before or after the procedural blocks become active at time 0.

These active events (evaluate or update) are put in active event queue. They are then evaluated or updated. update and evaluate events can happen in any order, depending on their position in active event queue.
If it's update event, then specific objects are updated, and any evaluation events resulting from these are added to active event queue.
If it's evaluate event, then specific processes are evaluated and any update events resulting from these are then added to active event queue. Note that Blocking assgn (BA) are evaluated but have no update events.

A. We keep going thru this active event queue. active events such as blocking assignments and continuous assignments can trigger additional assignments and procedural blocks causing more active events and NBA update events to be scheduled in the same time step. Under these circumstances, the new active events would be executed, until queue gets empty.
B. Then inactive events (as #0 blocking assignments) are activated causing them to move from "inactive event queue" to active event queue. They may cause more events to be activated and this continues, until "active event queue" gets empty.
C. Then we activate all NBA update events => NBA update events are put in active event queue from NBA update event queue. When these activated events are executed (causing LHS to be updated), they may cause additional processes to trigger and cause more active events and more nonblocking update events to be scheduled in the same time step. Activity in the current time step continues to iterate until all events in the current time step have been executed and no more processes, that could cause more events to be scheduled, can be triggered.
D. At this point, all of the $monitor and $strobe commands will get moved from "monitor event queue" to "active event queue". This causes them to get executed and display their respective values. Then the simulation time T can be advanced to the next time step. Then we activate all inactive events for time T.

Queues: there are 4 queues that are kept for current sim time, and many others for future sim time:
1. active event queue: most verilog events scheduled here: BA, evaluate RHS of NBA, continuous assgn, $display stmt, evaluate i/p and update o/p of primitives. These events can happen in any order.
2. inactive event queue: all #0 BA kept here. Note, these are evaluated when there are no more active events in the current time T. These #0 stmt should NOT be used.
3. NBA update event queue: updates LHS of NBA and keeps the update event here.
4. monitor event queue: $monitor and $strobe stmt.
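
The order of these queues can be observed with a few lines of code. This is a sketch with a made up module sched_demo: $display runs in active queue, #0 pushes the rest of the block to inactive queue, NBA update comes after that, and $strobe runs last.

```verilog
// Sketch only: shows active -> inactive -> NBA update -> monitor/strobe ordering
module sched_demo;
  reg a, b;
  initial begin
    a = 0; b = 0;
    #1;
    a  = 1;   // BA: a is updated right away
    b <= 1;   // NBA: RHS evaluated now, b is updated later in NBA update queue
    $display("display : a=%b b=%b", a, b);      // prints a=1 b=0
    $strobe ("strobe  : a=%b b=%b", a, b);      // prints a=1 b=1, at end of time step
    #0 $display("after #0: a=%b b=%b", a, b);   // prints a=1 b=0, NBA not updated yet
  end
endmodule
```

The 3 lines come out in the order display, after #0, strobe, even though $strobe appears before the #0 stmt in the code.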

Ex of code of RTL simulating: in snug_2002_cec_verilog.pdf page 8-10 (section 4.0)
module tb;
reg a=0; reg b=0; //assigns a to 0 at time T=0

tb techniques:
--------------
assgn at time 0:
-----
at time 0, either always block or initial block may become active first (IEEE std says all procedural blocks become active, but doesn't specify the order). That may cause race condition in sim if the initial block becomes active first, as first edge of signal in initial block may not be seen by always blocks. All vendors have implemented Verilog simulators to activate all always blocks before activating initial blocks, which means that the always blocks are ready for the edge signal before the edge signal is defined in an initial block. We can't count on this, so we use NBA for signals in initial block.

In the reset ex below, If the initial block becomes active before the always block, the always block will not recognize reset until the next detected posedge clk or the next assertion of reset. So, we can either use NBA or put a delay for reset so that it asserts 1-2 clk cycles after sim starts

//reset coding
-------------
initial begin
rst_n <= 0; // NBA will force the reset signal to be executed at the end of time step 0, after all of the always blocks have become active. This will force the always blocks to trigger again when the reset is updated, still at time 0.
//#5 rst_n = 0; //this will also work, as rst_n remains x for 5 delay units.
...
end

always @(posedge clk or negedge rst_n) ...

//clk osc coding
----------------
`define CLK_PRD 10
initial begin
clk <= 0; //NBA forces signal to go low at end of time 0, after all seq processes have become active. This ensures that any procedural block that might be sensitive to a negedge clk will be triggered at time 0. We chose 0 as initial value, since most of the designs are posedge clk based, so we avoid +ve edge at time 0 by having this. For negedge clk based design, we could have clk <= 1.
forever #(`CLK_PRD/2) clk = ~clk; //this is more sim efficient BA (blocking assignment) inside forever stmt. We could have also written the whole procedure separately as "always #(`CLK_PRD/2) clk = ~clk;" and setting clk to 0 at time 0 by doing "reg clk = 0;".
end

ex: always clk <= #50 ~clk; => This has NBA with intra-assignment delay and no other timing control, so ~clk is stored in tmp0 with the update scheduled #50 later, but the stmt itself doesn't block. Sim comes back to beginning of same stmt at the same sim time and again stores a copy of ~clk in tmp1. It keeps on doing it w/o time ever advancing, till it runs out of memory. So, use BA (clk = #50 ~clk;) or put the delay in front of the stmt (#50 clk = ~clk;), so that the stmt blocks and time can advance.

##we can also model clk as follows
initial begin clk = 0; end //this causes a -ve edge on clk from x to 0
always @clk clk <= #10 ~clk; //this is executed on 1st x->0 edge (from above initial stmt). clk will get stored in tmp0 and get assigned to 1 after #10. loop comes back to beginning as it's NBA. It waits for next edge of clk which happens #10 later.

##generally, we see clk osc modeled as follows: (This is OK except that clk -ve transition from X to 0 at time 0 may not be seen by all blocks)
initial forever begin
clk = 0;
#5 clk = 1;
#5;
end

#this also models clk in just 1 line
ex: wire #5 clk = clk === 0; => wire starts as z, so (clk === 0) is 0 and clk goes to 0 at #5. Then (clk === 0) is 1, so clk changes to 1 at #10, and then it repeats.

NOTE: we can also build clk osc using clk osc example shown in "always" notes above.

//timeout coding => to kill test when sim has run for a large time, and hasn't ended normally (put this in digtop_tb.v)
`define SIM_TIMEOUT 500_000_000
initial begin
#`SIM_TIMEOUT
$display ("** TEST KILLED ** (Time:%d)", $time);
$finish(2);
end

//flag "x" and "glitches" on any o/p pin => since they indicate some underlying ckt problem. include this file in digtop_tb.v
time time_prev; //64 bit int num used
always @(out[1]) begin //shown only for 1 o/p port. Repeat this for all o/p ports
if (out[1]===1'bx) $display("X-state: at time %t",$time); //x-state
if ($time-time_prev < glitch_limit_out[1]) $display("Glitch: at time %t",$time); //glitch if signal changed again within the limit
time_prev=$time; //this stores time when signal changed.
end

//state machine coding
2 ways: one where we separate out combo and seq flops, and other where we directly code it as one.
1. coded separately: next state coded as combo logic (preferred as per LINT tools)
ex: always @* begin //combo logic coded here
     sm_state_a = sm_state_r; //_a refers to i/p of flop (combo=next state), while _r refers to o/p of flop (seq=current state). This stmt is needed as it says that incase nothing is assigned to sm_state_a below, assign flop o/p sm_state_r to it. If we don't do this, then a latch may be inferred for those cases in this combo logic. This stmt may also be put below in if-else stmt instead of putting it here, but it's safer here as it guarantees that old value will be kept incase no new value is assigned in code below, we don't need to check for all if-else conditions below to find out if latch is inferred or not.
     sm_data_a = sm_data_r;
     if (...) sm_state_a = 3'b000; else if (...) sm_state_a = 3'b111;
     else begin
       case (sm_state_r) //NOTE: It has current state from flop. usual state machines are written starting from here as above stmt only assign defaults in case they are not assigned below.
         SM_STATE0: begin sm_state_a = SM_STATE1; sm_data_a = 'hF; end
         SM_STATE1: begin ... end
         default : begin ... end //default is not needed as we have sm_state_a = sm_state_r at top to take care of any default cases
        endcase
      end
     end

     always @(posedge CLK, negedge XRST) begin //seq logic coded here
       if (~XRST) begin sm_state_r <= 0; sm_data_r <= 0; ... end
       else       begin sm_state_r <= sm_state_a; sm_data_r <= sm_data_a; ... end
     end

2. coded as one: next state coded on RHS while current state coded on LHS of flop assignment, all inside one clocked always block
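
A sketch of the "coded as one" style, with assumed names (fsm_one_always, go, IDLE/RUN/DONE) that are not from any real design. There is no separate _a signal here: the flop o/p appears on LHS and the next state expression directly on RHS.

```verilog
// Sketch only: state machine coded in a single clocked always block
module fsm_one_always (
  input            clk,
  input            xrst,   // active low async reset
  input            go,
  output reg [1:0] state,
  output reg [3:0] data
);
  localparam IDLE = 2'd0, RUN = 2'd1, DONE = 2'd2;

  always @(posedge clk or negedge xrst) begin
    if (!xrst) begin
      state <= IDLE;
      data  <= 4'h0;
    end else begin
      case (state)
        IDLE   : if (go) state <= RUN;   // no assignment => flop holds its value
        RUN    : begin data <= 4'hF; state <= DONE; end
        DONE   : state <= IDLE;
        default: state <= IDLE;          // recover from unused encoding 2'd3
      endcase
    end
  end
endmodule
```

Since everything is in a clocked block, a missing assignment just means the flop keeps its old value, so there is no risk of inferring a latch in this style.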

-----------------------------------------------
verilog simulator: see in simulation.txt file for info.
************
NcSim: type "quit" at command line if it keeps on running

--------------------------

crypto openssl code:
-------
test.v:
always @(trig) begin
      fm = $fopen("file_msg","w");
      $fwrite(fm,"%h%h%h%h",msg4,msg3,msg2,msg1);   //msg, key, tmp, etc are defined as int, so written as hex text here
      $fclose(fm); //close file so that data is flushed before external cmd reads it
      $system("xxd -r -p file_msg file.bin"); //-r=reverse (hexdump to binary) converts hex msg to bin msg
      $sformat(str,"openssl aes-128-ecb -e -in file.bin -out file.aes -K %h%h%h%h -iv 0 -nopad",key4,key3,key2,key1);
      //str = "openssl aes-128-ecb -e -in file.bin -out file.aes -K f6e1a2ed6bd2ebd7f98854f35e0c0fbc -iv 0 -nopad"
      $system(str);         //execute above openssl cmd (str stores the cmd)
      $system("xxd -p file.aes > file_out"); //-p=binary to plain hexdump. dump o/p in hexdump (shell creates file_out, so no $fopen is needed for it)
      $readmemh("file_out",arr); //store enc msg contents in memory "reg [127:0] arr [0:0]" (readmemh needs a memory, not a plain reg)
      tmp4 = arr[0][127:96];
      tmp3 = arr[0][95:64];
      tmp2 = arr[0][63:32];
      tmp1 = arr[0][31:0];
      $display(" Msg values are: msg4=%h msg3=%h msg2=%h msg1=%h",msg4,msg3,msg2,msg1);
      $display(" Key values are: key4=%h key3=%h key2=%h key1=%h",key4,key3,key2,key1);
      $display(" Enc values are: tmp4=%h tmp3=%h tmp2=%h tmp1=%h",tmp4,tmp3,tmp2,tmp1);
end

------------------------------

Details: Published: Thursday, 28 December 2017 12:37; Hits: 2633

This section deals with all aspects of hardware design.

Open source tools:

There has been a lot of development in open source CAD tools for VLSI design. Though these are not state of the art, they are good enough to fabricate multi million gate designs in 14nm and above. A lot of developments have taken place, and as of 2023, tons of chips have been fabricated relying entirely on open source tools. FOSSi (Free and Open Source Silicon) Foundation ( https://fossi-foundation.org ) is behind this development too.

Check this link on youtube for latest developments: https://www.youtube.com/watch?v=OmEbzRp_NGg

In keeping with the philosophy of open source, I'm going to list the most relevant open source tools that can be used to design hardware. In the past, you had to assemble all these open source tools yourself, and then use them together to start from RTL and get to the final gds. However, as of 2023, there are multiple flows available that take these separate open source tools and put them in one package, so you just download the flow kit. This has made things a lot easier.

For anyone to be able to fabricate the final chip, we need 3 components:

PDK (Process Design Kit) from fab: There are tons of fabs in US and abroad that take your design in GDS format, and print it in silicon. TSMC is the most well known, and is a pure-play silicon foundry. Samsung, Intel, etc have their own foundry for chips that they design in house.
- SkyWater Tech (https://www.skywatertechnology.com): SkyWater Tech is the only US based pure-play silicon foundry, based in Bloomington, Minnesota. It was formed in 2017 and went public in 2021. It has collaborated with Efabless and Google to create the first open source chip manufacturing program. They fabricate chips in 90nm and 130nm CMOS process. They have open sourced Sky 130nm (S130) PDK, which is available on github. Link: https://github.com/google/skywater-pdk. This was the last hurdle for open source tool chain to clear, as prior to this, there was no real world open source PDK. There were some experimental PDKs, but not ones that could take your design and fabricate silicon.
- Efabless (https://efabless.com): Efabless is another Fab Company which is a crowdsourcing design platform for custom silicon. They have Multi Project wafer (MPW) shuttles that anyone can get their silicon design on. They use
Use tool chain flow from RTL to GDS
Flow for submitting the GDS to fab
Librarystart from RTL to

These are the 2 flowchain (or tool chain) that I'm going to talk about.

Open Road:

This is the latest open source toolflow chain that has industry heavyweights behind it. It was launched in 2018 with DARPA. UC San Diego is leading the effort with involvement from companies as Qualcomm, ARM, etc.

Official website: https://theopenroadproject.org/

Resource link on above site talks about all the steps needed => https://theopenroadproject.org/resources/

All User guides on installation/setup/running are here (all relevant docs are here): https://openroad.readthedocs.io/en/latest/

Start from here which starts with flow-scripts: https://openroad-flow-scripts.readthedocs.io/en/latest/

Ubuntu and CentOS are both supported. I'll go with Ubuntu, since CentOS isn't supported officially anymore (as of 2023). So lot of stuff in CentOS breaks with installation on newer laptops (due to older drivers not working any more). We'll go with local installation of open road on our Linux system.

Steps here: https://openroad-flow-scripts.readthedocs.io/en/latest/user/BuildLocally.html

Qflow:

This was a flow that was developed around 2018. The website is http://opencircuitdesign.com. It has links to all open source tools that can actually be used to design and simulate real circuit. The founder of this website, Tim Edwards, has actually designed and fabricated a microcontroller using only the open source tools (in 2018), and it was a first time silicon success (i.e no bugs in fabricated chip). Some of the open source tools have been written by him, while some he got from others. But he combined all of them, and put them in a neat flow that can take an RTL and generate gds. No doubt, he has done an amazing job. I've followed his instructions step by step, and have been able to get the whole toolflow working on Linux OS "CentOS Linux release 7.5.1804". Below, I'm going to show step by step instructions on how to get started with his toolflow, called "qflow".

Before we go into the toolflow, we need design of transistors, gates etc that can be fabricated in fabs. OSU (Oklahoma State University) provides all of this at this link: https://vlsiarch.ecen.okstate.edu/flow/. These are included as part of tool flow "qflow", so you do not need to download anything from here, but it's good to keep a copy of the material in a separate directory on your machine.

The download link at bottom of this (http://vlsiarch.ecen.okstate.edu/flows) page provides all design related files needed for different cad tool flows (synopsys and cadence and mosis). Using these cad tool flows, full designs can be done and chips fabricated in different fabs (AMI = American Microsystems Inc (purchased by OnSemi) and TSMC). These open design related files were developed by Prof. James Stine. He was initially working at IIT (Illinois Institute of Tech), and then moved to OSU (Oklahoma State University). So, you will find material from both places, but OSU stuff is latest, so use that for design flow.

If you follow the link, you will see 3 dir: These dir have tech files related to these nodes => AMI 0.5um, AMI 0.35um, TSMC 0.25um, TSMC 0.18um, FreePDK 45nm. FreePDK 45nm has been designed jointly with North Carolina State University, and is the most advanced node currently supported. The 3 dir you see are:

1. FreePDK_SRC => It has 45nm design files. In this there is *.tar.gz. Download that file and extract it in a dir named "FreePDK_SRC" (or any other name, doesn't matter). It will create a dir named "OSU_FreePDK". Inside it are 2 *.tar.gz. Extract both of them to create 2 new dir: OSU_FreePDK_Tech and osu_freepdk_1.0. We will not bother with 45nm design at all, as it's very advanced for our experimental purpose.

2. MOSIS_SCMOS => This has MOSIS (Metal Oxide Semiconductor Implementation Service) SCMOS (scalable CMOS) design files. MOSIS is a middle man foundry service that provides fab access to TSMC, GF (Global Foundries), AMS and AMI (now part of OnSemi). It has IIT and OSU stdcell libraries. We will not bother with IIT libs, as they are older. We will only deal with these 2 dir (download these 2 tar.gz files in a dir named "MOSIS_SCMOS" (or any other name, doesn't matter)):

A. osu_soc_v2.7 => Inside this is a tar.gz file. Download and extract it in a dir named "osu_soc_v2.7" (or any other name, doesn't matter). It will create 2 subdirs, "cadence" and "synopsys", as shown in the link. This is the version that is used by "qflow".

B. osu_stdcells_v2.4 => Inside this there are 3 tar.gz files. Download and extract them in a dir named "osu_stdcells_v2.4" (or any other name, doesn't matter). After extracting all 3 of them, it will create 3 subdirs, "flow", "lib" and "ref_designs", as shown in the link.

3. stdcell_datasheet => This has datasheets for all stdcells in the different techs (AMI 0.6um, AMI 0.5um, AMI 0.35um, TSMC 0.25um, TSMC 0.18um, FreePDK 45nm). We do not need to download anything from this dir, as it is for informational purposes only. We will need to refer to these datasheets from time to time though, so it will be nice to keep this link bookmarked.
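The extract steps above can also be scripted. Below is a minimal Python sketch, assuming the *.tar.gz files have already been downloaded by hand into some local dir. The function name extract_all and the dir names used with it are my own choices for illustration; they are not part of the OSU material or of qflow.

```python
import tarfile
from pathlib import Path


def extract_all(archive_dir, dest_dir):
    """Extract every *.tar.gz found directly in archive_dir into dest_dir.

    Returns the sorted list of archive file names that were extracted.
    (Hypothetical helper, not part of the OSU flows or qflow.)
    """
    archive_dir = Path(archive_dir)
    dest_dir = Path(dest_dir)
    # Create the target dir (e.g. "osu_stdcells_v2.4") if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(archive_dir.glob("*.tar.gz")):
        # "r:gz" opens a gzip-compressed tar archive for reading.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest_dir)
        extracted.append(archive.name)
    return extracted
```

For example, extract_all("downloads/osu_stdcells_v2.4", "MOSIS_SCMOS/osu_stdcells_v2.4") would unpack all 3 archives of item B in one go (both paths here are assumed, adjust to your own layout). Archives nested inside an extracted dir, as in the FreePDK case, need a second call on that dir.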

In my case, after downloading and extracting everything, dir structure looks like this:

/home/vlsi/osu_flows

FreePDK_SRC => It has files for FreePDK 45nm
- OSU_FreePDK
  - OSU_FreePDK_Tech
    - cdssetup
    - lib
    - techfile
  - osu_freepdk_1.0
    - flow
    - lib
    - ref_design
MOSIS_SCMOS => It has files for AMI 0.5um, AMI 0.35um, TSMC 0.25um, TSMC 0.18um
- osu_stdcells_v2.4
  - flow
  - lib
  - ref_design
- osu_soc_v2.7
  - cadence
    - flow
    - lib
    - ref_design
  - synopsys
    - flow
    - lib
    - ref_design
flow => flow dir has techfiles, tcl scripts
lib => lib dir has .v files, .lef files
ref_design => ref_design dir has a reference design of a mips processor, that can be used as a sample design to work with.
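To check that the extraction produced the layout shown above, a small Python sketch like the following can be used. The list of paths is copied from the tree above; the function name missing_dirs is hypothetical, and the root dir is whatever you chose on your machine.

```python
from pathlib import Path

# Sub-directories expected under the root dir, taken from the tree above.
EXPECTED_DIRS = [
    "FreePDK_SRC/OSU_FreePDK/OSU_FreePDK_Tech",
    "FreePDK_SRC/OSU_FreePDK/osu_freepdk_1.0",
    "MOSIS_SCMOS/osu_stdcells_v2.4",
    "MOSIS_SCMOS/osu_soc_v2.7/cadence",
    "MOSIS_SCMOS/osu_soc_v2.7/synopsys",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root.

    (Hypothetical helper for illustration; an empty list means the
    layout matches the tree above.)
    """
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Calling missing_dirs("/home/vlsi/osu_flows") returns an empty list when every dir is in place, and otherwise lists the ones still to be downloaded or extracted.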

First, we need to understand chip (IC) design, and then use this understanding for system design, so that we can have hardware that can actually do something. Chip design will be explained in a separate section. Here I'll go through the "qflow" toolflow instructions and setup. You will need to know the design process before you go into this toolflow section.
