Probability Distribution
We looked at the pdf (probability density function) earlier. A probability distribution can be either univariate or multivariate.
Normal Distribution:
There are many kinds of univariate/multivariate distribution functions, but we'll mostly talk about the "Normal Distribution", aka the "Gaussian distribution" (or bell-shaped distribution). The normal distribution is what you will encounter in almost all practical examples in semiconductors, AI, etc., so it makes sense to study the normal dist in detail. You can read about many other kinds of dist on wikipedia.
1. Univariate normal distribution:
https://en.wikipedia.org/wiki/Normal_distribution
pdf function is:
f(x) = 1/(σ√(2π)) * exp(-1/2*((x-μ)/σ)^2) => Here μ = mean, σ = std deviation (σ^2 = variance). We divide by σ√(2π) so that the integral of f(x) is 1.
The standard normal distribution is the simplest normal dist, with μ=0, σ=1.
The way we write that a random var X follows a normal distribution is via this notation:
X ~ N(μ, σ^2) => Here N means normal distribution; the mean and variance are provided.
We often hear the terms 1σ, 2σ, etc. These refer to σ in the normal dist. If we draw the pdf for a normal distribution and calculate how many samples lie within +/- 1σ, we see that 68% of the values are within 1σ or 1 std deviation. Similarly for 2σ it's 95%, while for 3σ it's 99.7%. 3σ is often referred to as 1 out of 1000 outside the range, so 3σ is roughly taken as 99.9% even though it's 99.7% when solved exactly.
As 3σ is taken as 1 out of 10^3, or a 10^-3 event, 4σ is taken as a 10^-4 event, 5σ as 10^-5 and 6σ as 10^-6. So, 6σ implies only a 1 out of 1M chance of the sample being outside the range. 6σ is used very commonly in industry. Many products have a requirement of 6σ defects, i.e. 1 ppm defect (only 1 out of 1M parts is allowed to be defective). In semiconductors, a 3σ defect rate is targeted for a lot of parameters.
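As a quick sanity check of the 1σ/2σ/3σ coverage numbers above, here is a minimal Python sketch (just an illustration, not part of the original material) using the identity P(|X-μ| < kσ) = erf(k/√2):
from math import erf, sqrt
# fraction of a normal distribution lying within +/- k sigma of the mean
for k in (1, 2, 3):
    inside = erf(k / sqrt(2))
    print("%d sigma: %.4f inside, %.4f outside" % (k, inside, 1 - inside))
# prints ~0.6827, 0.9545, 0.9973 inside => the 68/95/99.7 rule quoted above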
2. Multivariate normal distribution: It's a generalization of the one-dimensional univariate normal dist to higher dimensions.
https://en.wikipedia.org/wiki/Multivariate_normal_distribution
A random vector X = (X1, X2, ..., Xn) is said to have a multivariate normal dist if every linear combination of its components is normally distributed.
A general multivariate normal dist is hard to visualize, and not that common. A more common case is the bivariate normal dist, which is a normal dist with dimension = 2.
Bivariate normal distribution: Given 2 random variables X and Y, the bivariate pdf is:
f(x,y) = 1/(2π σx σy √(1-ρ^2)) * exp( -1/(2(1-ρ^2)) * [ ((x-μx)/σx)^2 - 2ρ((x-μx)/σx)((y-μy)/σy) + ((y-μy)/σy)^2 ] ) => Here μ = mean, σ = std deviation (σ^2 = variance). We define a new term rho (ρ), which is the Pearson correlation coefficient between X and Y. It's the same Pearson coeff that we saw earlier in the stats section. rho (ρ) captures the dependence of Y on X. If Y is independent of X, then ρ=0, while if Y is completely dependent on X, then ρ=1. We will see more examples of this later. We divide the expression by the complex-looking term 2πσxσy√(1-ρ^2) so that the 2D integral of f(x,y) is 1.
2D plot of f(x,y): We will use gnuplot to plot these.
This is the gnuplot program (look in the gnuplot section for cmds and usage). f_bi is the final function for the bivariate normal dist.
gnuplot> set pm3d
gnuplot> set contour both
gnuplot> set isosamples 100
gnuplot> set xrange[-1:1]
gnuplot> set yrange[-1:1]
gnuplot> f1(x,y,mux,muy,sigx,sigy,rho)=((x-mux)/sigx)**2 - 2*rho*((x-mux)/sigx)*((y-muy)/sigy) + ((y-muy)/sigy)**2
gnuplot> f_bi(x,y,mux,muy,sigx,sigy,rho)=1/(2*pi*sigx*sigy*(1-rho**2)**0.5)*exp((-1/(2*(1 - rho**2)))*f1(x,y,mux,muy,sigx,sigy,rho))
1. f(x,y) with ρ=0 => Let's say we have a sample of people where X is their height and Y is their IQ. We don't expect any dependence between the two. So here f(X) on the X axis is the height of people, which is a 1D normal distribution around some mean. Similarly f(Y) on the Y axis is the IQ of people, which is again a 1D normal distribution around some mean. If we plot the 2D pdf of this, we are basically multiplying the probability of X with the probability of Y to get the probability at point (x,y). Superimposing f(X) and f(Y) gives circular contours, since X = mean+sigma or X = mean-sigma yields the same distribution for Y: the probability of Y doesn't change based on what X is. In fact this is the property and definition of independence => if f(x,y) = f(x).f(y), then X and Y are independent. We can see that setting ρ=0 yields exactly that. Below is the gnuplot function and the plot.
gnuplot> splot f_bi(x,y,0,0,0.4,0.4,0.0)
2. f(x,y) with ρ=0.5 => Here we can consider the same sample of people as above, but plot weight Y vs height X. We expect to see some correlation. What this means is that pdf(Y) varies depending on which point X is chosen. So, if we are at X = mean, then pdf(Y) has some shape, and if we choose X = mean + sigma, then pdf(Y) has some other shape (but both shapes are normal). So, pdf(Y) plotted independently on the Y axis as f(Y) is for a particular X. We have to find pdf(Y) for each value of X, and then draw the 2D plot for all such X. This data is going to come from field observation, and the 2D plot that we get will determine what the value of ρ is. Here the contour plot starts becoming an ellipse instead of a circle. You can find a proof on the internet that this eqn indeed becomes an ellipse (a circle is a special case of an ellipse, where the major and minor axes are the same). There is one such proof here: https://www.michaelchughes.com/blog/2013/01/why-contours-for-multivariate-gaussian-are-elliptical/
In this case when we draw pdf(X) and pdf(Y) on the 2 axes, these are the marginal pdfs, the same shape as in case 1 above. You can think of it as the pdf of height X irrespective of what the weight Y is. Of course the pdf of height X is different for different weights Y, but we are drawing the global (marginal) pdf distribution, the same as we drew in case 1 above. Similarly for the pdf of weight Y. So, remember this distinction - the pdf plots on the X and Y axes in case 2 are still the pdf plots from case 1 above. Only when we start plotting the 2D points do we know whether it's an ellipse or a circle, which gives us the value of ρ.
gnuplot> splot f_bi(x,y,0,0,0.4,0.4,0.5)
3. f(x,y) with ρ=0.95 => Here the correlation goes to an extreme. We can consider the same sample of people as above, but with the Y axis as "score in Algebra2" and the X axis as "score in Algebra1". We expect to see a very strong correlation, as someone who scores well in Algebra1 has a high probability of scoring well in Algebra2. Similarly, someone who scored badly in Algebra1 has a high probability of scoring badly in Algebra2 as well. The plot here becomes a narrow ellipse, and in the extreme case of ρ=1 becomes a 1D slanted copy of the pdf of X. What that means is that Y doesn't even have a distribution given X, i.e. if we are told that the score is X=57, then Y is fixed to be Y=59 => Y doesn't have a distribution anymore given a particular X. In real life, Y will likely still have a distribution, e.g. from Y=54 to Y=60 (-3σ to +3σ range). This data is again going to come from field observation.
gnuplot> splot f_bi(x,y,0,0,0.4,0.4,0.95)
Let's see the example in detail once again for all values of ρ => If there are 5 kids with Algebra1 scores of (8,11,6,9,10) at -3σ, then going and looking at the Algebra2 scores of these 5 kids will tell us the value of ρ. If the scores in Algebra2 are all over the place from 0 to 100 (e.g. 89, 9, 50, 75, 32), then we have no dependence and the 2D contour plot looks like a circle. However, if we see that the Algebra2 scores for these 5 kids are in a narrow range like (7, 10, 6, 11, 12), then there is a high dependence and the 2D contour plot looks like a narrow ellipse. This indicates a high value of ρ.
Also, we observe that as ρ goes from ρ=0 (plot 1) to ρ=1 (plot 3), the circle moves inwards and gets squeezed into an ellipse. So points with some probability on plot 1 (let's say 0.01 is the combined pdf for point A on the circle) have moved inwards for the same probability on plot 2, and further in for plot 3. Also, the height of the 3D plot goes up, as the total pdf has to remain 1 for any curve. It's not too difficult to visualize this. Consider a (-3σ, -3σ) point on the X and Y axes. This point has a probability of 0.003*0.003 = 0.00001 for plot 1 where ρ=0 (i.e. X and Y are independent). Now with ρ=1 (plot 3), the -3σ point on the X axis has a probability of 0.003, but the -3σ point on the Y axis has a probability of 1 (since with full correlation, Y has 100% probability of being at -3σ when X is at -3σ). So, the probability of the (-3σ, -3σ) point is 0.003*1 = 0.003. So, this point now moves inward into the ellipse. The original point of 0.00001 probability is not the (-3σ, -3σ) point anymore. It looks like the (-4σ, -4σ) point now lies closer to that original point, since its probability is 10^-4*1 = 0.0001. Even this is higher; maybe it's more like the (-4.5σ, -4.5σ) point that lies there. So, we see how the correlation factor moves the σ points inwards.
2D plot for different samples:
In all the above plots we considered a sample of people and plotted different attributes of that same sample of people. However, if we are plotting attributes of different samples, then it gets tricky. For ex, let's say we plot height of women vs height of men. What does it mean? Given the pdf of height of men and the pdf of height of women, what does the combined pdf mean? Does it mean => given men of height=5ft with prob=0.1, and women of height=4ft with prob=0.2, what is the combined probability of finding a man of height=5ft AND a woman of height=4ft? The best we can say is that they are independent, so the combined prob = 0.1*0.2 = 0.02. So, we expect to see a plot similar to plot 1 above (with ρ=0). But how do we get field data for this sample to draw a 2D plot? Do we choose a man, and then choose a woman? The combined 2D pdf doesn't make sense, as men and women are 2 different samples.
However, we know that in a population where people are shorter, both men and women tend to be shorter, and in a population where people are taller, both men and women tend to be taller. So, if we take a sample of people where men's height varied from 6ft to 7ft, and plotted women's height from that community, we might see that their height varies from 5.5ft to 6.5ft. Similarly, for a population where men's height varied from 5ft to 6ft, women's height might vary from 4.5ft to 5.5ft. These are local variations within a subset, instead of global variation. If we take all of these local plots and combine them into a global plot, then we can get the dependence data; they suggest some correlation. If we plot all of these on our 2D plot, we may see that ρ≠0. We will see an ellipse instead of a circle for the iso contours of this 2D plot. These kinds of plots are very common in semiconductors, as we will see later.
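To make this concrete, here is a small hypothetical numpy simulation (the community means, offsets and noise values are made up for illustration): a shared per-community baseline produces ρ≠0 even though the two samples are different people.
import numpy as np
rng = np.random.default_rng(0)
# hypothetical: 50 communities, each with its own baseline height (in ft)
base  = rng.normal(5.5, 0.4, size=50)
men   = base + 0.3 + rng.normal(0, 0.1, size=50)   # men track the community baseline
women = base - 0.2 + rng.normal(0, 0.1, size=50)   # women track the same baseline
rho = np.corrcoef(men, women)[0, 1]
print("estimated rho =", round(rho, 2))            # close to 1, since both follow the baseline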
Properties: A lot of cool properties of normal distribution appear, if we take the random variables to be independent, i.e ρ=0. Let's look at some of these properties:
Sum of independent Normal RVs: The below is true ONLY for independent RVs. If the RVs have ρ≠0, then the property below no longer holds.
If X1, X2, ..., Xn are independent normal random variables with means μ1, μ2, ..., μn and standard deviations σ1, σ2, ..., σn, then their sum X1 + X2 + ... + Xn is also normally distributed, with mean μ1 + μ2 + ... + μn and variance σ1^2 + σ2^2 + ... + σn^2.
A proof exists here: https://online.stat.psu.edu/stat414/lesson/26/26.1
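A quick numerical check of this property (a simulation sketch, not the formal proof from the link above): draw two independent normal RVs with made-up means/std deviations and verify the mean and variance of their sum.
import numpy as np
rng = np.random.default_rng(1)
# X1 ~ N(2, 3^2), X2 ~ N(-1, 4^2), independent
x1 = rng.normal(2.0, 3.0, size=1_000_000)
x2 = rng.normal(-1.0, 4.0, size=1_000_000)
s = x1 + x2
print(s.mean())   # ~ 2 + (-1) = 1
print(s.var())    # ~ 3^2 + 4^2 = 25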
Probability & Statistics
These 2 go together. Probability is the basic foundation of statistics. A basic knowledge of both is needed for a couple of things that we do in AI and in VLSI.
Basic probability:
https://en.wikipedia.org/wiki/Probability_theory
Probability is a number from 0 to 1 => 0 means 0% probability and 1 means 100% probability. The probability of an event is represented by the letter "P" => P(event). The sum or integral of the probabilities of all possible outcomes will always be 1.
Discrete Probability Distribution: This is for events that are countable, e.g. rolling a die, tossing a coin, etc.
P(X) = 0.4 => Probability of event "X" happening is 40%.
If we roll a die, then the probability of any number 1 to 6 showing up is 1/6: P(die=1)=1/6, ..., P(die=6)=1/6
Continuous Probability Distribution: This is for events that occur in a continuous space, e.g. the temperature of water, etc.
PDF: Probability density function: When we have a continuous random variable, then instead of a discrete probability number we have a continuous probability density function. This is called the pdf. The integral of the pdf over all possible outcomes will be 1 (just as in the discrete case, where the sum was 1).
P(x1<x<x2) = ∫ f(x)dx, where f(x) is the pdf, and the integral is taken over the limits x1 to x2
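For example, taking f(x) to be the standard normal pdf, a minimal Python sketch of P(x1 < x < x2) by numerically integrating the pdf over the limits (the limits -1 and 2 are just made-up values):
import numpy as np
# standard normal pdf
def pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
x1, x2 = -1.0, 2.0                      # hypothetical limits
x = np.linspace(x1, x2, 10001)
print(np.trapz(pdf(x), x))              # ~0.8186 = P(-1 < X < 2)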
Factorial:
Factorial is defined as the multiplication of all positive integers less than or equal to that number. It's denoted by the ! mark. So, 3! = 3*2*1. 1! = 1. n! = n*(n-1)*...*2*1
n! = n*(n-1)!.
We define 0! as 1, as that keeps it consistent with the other mathematical formulas used in Permutation and Combination shown below. It may seem like 0! should be 0, but keeping it 1 allows it to blend nicely with the Permutation formula for non-zero numbers. We'll see that below.
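A quick illustration in Python (math.factorial follows the same convention, returning 1 for 0!):
import math
print(math.factorial(3))   # 6 = 3*2*1
print(math.factorial(1))   # 1
print(math.factorial(0))   # 1 => 0! is defined as 1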
Permutation and Combination:
The most important concept related to probability is figuring out the number of outcomes favorable to the event asked about, and dividing it by the number of all possible outcomes. As an ex, if the probability of getting a 7 on throwing 2 dice is to be calculated, we can calculate it as follows:
Number of ways 7 is possible E(sum=7)= (1,6), (2,5), (3,4), (4,3), (5,2) and (6,1) = 6 ways
Total number of possibilities of any number E(any sum) = 6 possibilities of 1st dice (1..6) * 6 possibilities of 2nd dice (1..6) = 6*6 = 36 ways
So, probability of getting 7 = E(sum=7)/E(sum=anything) = 6/36 = 1/6
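The same answer can be obtained by brute-force enumeration; a tiny Python sketch:
# enumerate all 36 outcomes of rolling 2 dice and count the ones summing to 7
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
favorable = [o for o in outcomes if sum(o) == 7]
print(len(favorable), len(outcomes), len(favorable) / len(outcomes))  # 6 36 0.1666...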
Another general probability question is when we have to choose a few things out of a given set of things, and we want to know all the different ways of doing it. This is where Permutation/Combination comes in. There are 2 handy formulas that we can use.
This link has very good explanation with the formula at the end: https://www.mathsisfun.com/combinatorics/combinations-permutations.html
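The 2 handy formulas are presumably the standard ones: nPr = n!/(n-r)! (order matters) and nCr = n!/(r!(n-r)!) (order doesn't matter). A minimal Python check using the standard library helpers:
import math
n, r = 5, 2
print(math.perm(n, r), math.factorial(n) // math.factorial(n - r))                        # 20 20
print(math.comb(n, r), math.factorial(n) // (math.factorial(r) * math.factorial(n - r)))  # 10 10
# note: with r = n, the n!/(n-r)! form needs 0! = 1, which is why 0! is defined that way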
Problems: Permutation + Combination
One of the biggest confusions in solving permutation/combination problems is to figure out whether the problem is a permutation problem or a combinatorial one. Many times it's not clear, and sometimes it's a mix of the 2. We'll look at some common problems below.
Basic Statistics:
https://en.wikipedia.org/wiki/Mathematical_statistics
Statistics is widely used in AI. There is a channel called "StatQuest" on Youtube that I found very helpful for learning basic statistics:
https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw
For any sample X, where x1, x2, ..., xn are the individual samples in X, we define various terms that are very important in stats. Let's review these terms:
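As a quick illustration of a few of the common terms (mean, variance, std deviation, and the Pearson correlation coefficient mentioned earlier), here is a minimal numpy sketch on a made-up sample; the exact set of terms covered is an assumption here:
import numpy as np
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # hypothetical sample
y = 2 * x + np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0, -0.3])
print("mean =", x.mean())
print("variance =", x.var(ddof=1))              # sample variance (divide by n-1)
print("std dev =", x.std(ddof=1))
print("pearson r =", np.corrcoef(x, y)[0, 1])   # close to 1, since y ~ 2x + small noise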
Central Limit Theorem (CLT):
It is one of the great results of mathematics. It's used both in probability and statistics. It's not going to be used anywhere in our material, but it's good to know. It establishes the importance of the "Normal Distribution". The theorem is stated in the link below:
https://en.wikipedia.org/wiki/Central_limit_theorem
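A quick simulation sketch of what the theorem says: the mean of many samples drawn from a non-normal (here uniform) distribution is itself approximately normally distributed.
import numpy as np
rng = np.random.default_rng(2)
# 100000 experiments, each averaging 50 uniform(0,1) samples
means = rng.uniform(0, 1, size=(100_000, 50)).mean(axis=1)
print(means.mean())   # ~0.5 (mean of uniform(0,1))
print(means.std())    # ~sqrt((1/12)/50) ~ 0.041
# a histogram of 'means' looks like a bell curve even though the underlying samples are uniform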
ETS:
ETS is Encounter Timing System, an STA (static timing analysis) tool from Cadence. It's similar to PT.
Steps: Below are the steps to run ETS.
dir: /db/proj_ch/design1p0/HDL/ETS/digtop
cmd: ets -10.1_USR1_s096 -nowin -f scripts/check_timing_mmmc.tcl | tee logs/run_et_mmc_timing.log => -nowin means no window, else gui window comes up
File: scripts/check_timing_mmmc.tcl:
----
setDesignMode -process 250 => sets process tech to 250nm. For 180nm, use 180. For 150nm, use 150.
#read min/max lib
read_lib -max /db/../synopsys/src/PML30_W_150_1.65_CORE.lib /db/../synopsys/src/PML30_W_150_1.65_CTS.lib
read_lib -min /db/.../synopsys/src/PML30_S_-40_1.95_CORE.lib /db/.../synopsys/src/PML30_S_-40_1.95_CTS.lib
#read verilog
read_verilog ../../FinalFiles/digtop/digtop_final_route.v
set_top_module digtop => only when we run this are all the max/min libs and netlist files analyzed.
source scripts/create_views.tcl => source views file, same as from Autoroute dir
set_analysis_view -setup {func_max func_min scan_max scan_min} -hold {func_max func_min scan_max scan_min}
#set propagated clk by entering interactive mode
set_interactive_constraint_modes [all_constraint_modes -active]
set_propagated_clock [all_clocks]
set_clock_propagation propagated
set_interactive_constraint_modes {}
#read min/max qrc spef files
read_spef -rc_corner max_rc ../../FinalFiles/digtop/digtop_qrc_max_coupled.spef
read_spef -rc_corner min_rc ../../FinalFiles/digtop/digtop_qrc_min_coupled.spef
#wrt sdf
write_sdf -min_view func_min -max_view func_max -edges check_edge ./sdfs/digtop_func.sdf => min and max come from different views (func_min/func_max) in the same file
write_sdf -min_view func_max -max_view func_max -edges check_edge ./sdfs/digtop_func_max.sdf => min/max are both equal to max
write_sdf -min_view func_min -max_view func_min -edges check_edge ./sdfs/digtop_func_min.sdf => min/max are both equal to min
NOTE: The synopsys sdc file used in create_views.tcl is used to gen the delays in the sdf file. set_load units may not get set appropriately, causing a mismatch in delay numbers for all o/p buffers b/w the SNPS sdf and the CDNS sdf. Look in PnR_VDI.txt for more info.
#set_analysis_mode: sets analysis mode for timing analysis.
#-analysisType => single: based on one op cond from single lib, bcwc: uses max delay for all paths during setup checks and min delay for all paths during hold check from min/max lib, onChipVariation: for setup, uses max delay for data path, and min delay for clk path, while for hold, uses min delay for data path, and max delay for clk path. default is onChipVariation.
#-cppr < none | both | setup | hold > => removes pessimism from the common portion of clock paths. none: disables removal of cppr, while both enables cppr for both setup and hold modes. default is none.
set_analysis_mode -analysisType bcWc -cppr both
#setAnalysisMode => this is the equiv cmd in vdio as set_analysis_mode in ETS. diff is that default analysisType is set to single (if only 1 lib is provided) or bcwc (if 2 lib are provided).
#set_delay_cal_mode -engine aae -SIAware true => this is used to set timing engine, as well as to specify if we want SI(similar to PTSI)
#log file in log dir to report timing, clk, violations, etc. If this file is clean, no need to look at files in rpts dir.
check_timing -verbose >> $check_timing_log
report_analysis_coverage >> $check_timing_log
report_case_analysis >> $check_timing_log
report_clocks >> $check_timing_log
report_constraint -all_violators >> $check_timing_log => If we see any max_cap violation, we can run "report_net_parasitics -rc_corner " to find out the cap of that net. We can get the gate cap of all loads attached from the .liberty files. NOTE: wire/via cap (in rc table) and gate cap (in stdcell lib) change with corners.
report_inactive_arcs >> $check_timing_log => This is put at end of file since it's very long.
# generate separate reports for setup and hold for func/scan
# report_timing: reports timing (used in vdio/ets)
#-early/-late => use -early for hold paths and -late for setup paths.
#-path_type full_clock => shows full expanded path (in PT, we use full_clock_expanded to get same effect)
#-max_paths => max # of worst paths irrespective of end points (i.e paths with same end points will show up multiple times here). If we do not want to see multiple paths with same end point, we can exclude those by using -max_points. In this case, it shows only 1 worst path to each end point. If we want to see specific # of paths to each end point, use -nworst option along with -max_points. We can only use one of the 2 options => max_paths or max_points.
#-net => adds a row for net arc. This separates net delay from cell delay (else by default: net delay is added to the next cell delay)
#-format { .. } => default format is: {instance arc cell delay arrival required}. With -net option, it shows net also. net delay is shown with i/p pins (A,B,C), while cell delay is shown for o/p pins (Y). additional options as load, pin_load, wire_load are also helpful.
#-view => By default, the command reports the worst end-point(s) across all views. if we want to view results for a particular view. use that view. The view should have already been created using "create_analysis_view" and set using "set_analysis_view". i.e:
=> create_analysis_view -name func_max -delay_corner max_delay_corner -constraint_mode functional
=> create_analysis_view -name func_min -delay_corner min_delay_corner -constraint_mode functional
=> set_analysis_view -setup {func_max func_min} -hold {func_max func_min} => now, we can run setup or hold analysis on both func_max and func_min. For this run, we already set view to "-setup {func_max func_min scan_max scan_min} -hold {func_max func_min scan_max scan_min}"
report_timing -from -to -path_type full_clock -view func_max -early => reports a particular hold path for view func_max. NOTE that this will work only if hold is calc for analysis_view "func_max".
#func setup/hold at func_min/func_max
#if we do not specify -view below, then all views set currently will be used. So, for "early", all views "func_max, func_min, scan_max, scan_min" will be used and shown in the single report. Each path will show a view, so it's easy to see which view was used for that particular path. However, it's better to separate out the func view and scan view reports. We could have also set_analysis_view to just func_max/func_min for this run, and then for the 4 scan reports, we could have set views to just scan_max/scan_min. It's the same either way.
report_timing -path_type full_clock -view func_max -max_paths 2000 -early -format {instance cell arc load slew delay arrival required} >> $func_rptfilename
report_timing -path_type full_clock -view func_max -max_paths 2000 -late -format {instance cell arc load slew delay arrival required} >> $func_rptfilename
report_timing -path_type full_clock -view func_min -max_paths 2000 -early -format {instance cell arc load slew delay arrival required} >> $func_rptfilename
report_timing -path_type full_clock -view func_min -max_paths 2000 -late -format {instance cell arc load slew delay arrival required} >> $func_rptfilename
#scan setup/hold at scan_min/scan_max
report_timing -path_type full_clock -view scan_max -max_paths 2000 -early -format {instance cell arc load slew delay arrival required} >> $scan_rptfilename
report_timing -path_type full_clock -view scan_max -max_paths 2000 -late -format {instance cell arc load slew delay arrival required} >> $scan_rptfilename
report_timing -path_type full_clock -view scan_min -max_paths 2000 -early -format {instance cell arc load slew delay arrival required} >> $scan_rptfilename
report_timing -path_type full_clock -view scan_min -max_paths 2000 -late -format {instance cell arc load slew delay arrival required} >> $scan_rptfilename
exit
---------
Timing reports in ETS:
ex: recovery check
Path 1842: MET Recovery Check with Pin Iregfile/tm_bank3_reg_6/C
Endpoint: Iregfile/tm_bank3_reg_6/CLRZ (^) checked with trailing edge of 'clk_latch_reg' => generated clk(div by 2 of osc_clk). so waveform is 1 101 201
Beginpoint: Iclk_rst_gen/n_reset_neg_sync_reg/Q (^) triggered by trailing edge of 'osc_clk' => created clk with waveform 1 51 101
Analysis View: func_max => shows view, doesn't show path group
Other End Arrival Time 104.067 => denotes capture clk timing
- Recovery 0.592 => recovery time for LAH1B from lib (+ve number in lib). +ve means it should setup sometime before the clk edge. So, we subtract recovery time from clk path delay. For setup, we subtract setup time, while for hold we add hold time.
+ Phase Shift 0.000 => This is the clock period for setup(for 10MHz clk, it's 100ns phase shift added for next clk edge)
+ CPPR Adjustment 0.000
= Required Time 103.475 => clk path delay
- Arrival Time 61.653 => data path delay
= Slack Time 41.822
=> start of data path (launch path)
Clock Fall Edge 51.000 => start point of clk fall
+ Drive Adjustment 0.041 => adjusted by driver for clk (invx1 or so), this number is added within clk/data path for PT, after the source latency number.
= Beginpoint Arrival Time 51.041
Timing Path: => data path as in PT
------------------------------------------------------------------------------------------------------------
Instance Cell Arc Load Slew Delay Arrival Required
Time Time
------------------------------------------------------------------------------------------------------------
clkosc v 3.312 0.064 51.041 92.863 => clk fall at 51.04ns
clkosc__L1_I0 CTB02B A v -> Y v 53.942 0.224 0.308 51.349 93.171 => clktree latency
clkosc__L2_I3 CTB45B A v -> Y v 62.517 0.346 0.488 51.837 93.659 => clktree latency
Iclk_rst/n_sync_reg DNC12 CLK v -> Q ^ 72.593 2.889 2.026 53.863 95.685
Iregfile/U171 NO211 A ^ -> Y ^ 55.505 4.133 2.999 56.862 98.684
Iregfile/FE_OFC0_n12 BU110J A ^ -> Y ^ 77.212 3.081 2.460 59.322 101.144
Iregfile/FE_OFC1_n12 BU110J A ^ -> Y ^ 72.938 2.919 2.235 61.557 103.379
Iregfile/tm_reg_6 LAH1B CLRZ ^ 72.938 2.927 0.096 61.653 103.475 => final arrival time of clrz
------------------------------------------------------------------------------------------------------------
=> start of clk path (capture path)
Clock Rise Edge 1.000 => start point of clk rise
+ Drive Adjustment 0.082
# + Source Insertion Delay -1.267 => insertion delay added if indicated in constraints (usually not present)
= Beginpoint Arrival Time 1.082 => final clk after adjusting for driver
Other End Path: => clk path as in PT
-----------------------------------------------------------------------------------------------------------------------------------
Instance Cell Arc Load Slew Delay Arrival Required Generated Clock
Time Time Adjustment
-----------------------------------------------------------------------------------------------------------------------------------
clkosc ^ 3.312 0.154 1.082 -40.740 => clk rise at 1.08ns
clkosc__L1_I0 CTB02B A ^ -> Y ^ 53.942 0.266 0.300 1.383 -40.440 => clktree latency
clkosc__L2_I4 CTB45B A ^ -> Y ^ 58.511 0.390 0.409 1.791 -40.031 => clktree latency
Ireg/wr_stb_sync_reg DTCD2 CLK ^ -> Q v 5.878 0.250 0.564 102.355 60.533 clk_latch_reg Adj. = 100.000 => falling edge of latch clk is setup edge for data. So, clk adjustment is done by 100ns (1/2 clk_latch_reg cycle). In PT, this adjustment is done in start of clk path.
Ireg/U176 AN2D0 B v -> Y v 46.242 1.848 1.702 104.057 62.235
Ireg/tm_bank3_reg_6 LAH1B C v 46.242 1.847 0.011 104.067 62.245
-----------------------------------------------------------------------------------------------------------------------------------
Ex: For SR latches, data to data checks done:
#Path 4: VIOLATED Data To Data Setup Check with Pin Imtr_b/itrip_latch_00/SZ => indicates SZ is clk
#Endpoint: Imtr_b/itrip_latch_00/RZ (^) checked with leading edge of 'clk_latch_reg' => indicates RZ is data (endpoint of data is RZ). clk_latch_reg refers to clk of SZ pin.
#Beginpoint: mtr_b_enbl (v) triggered by leading edge of 'osc_clk' => indicates startpoint of data is mtr_b_enbl pin. osc_clk refers to the clk firing the mtr_b_enbl signal. So, osc_clk is also the clk firing RZ, as there's only comb logic b/w mtr_b_enbl signal and RZ pin.
#Path Groups: {in2reg}
#Other End Arrival Time 1.861 => this is clk delay for SZ starting from created or generated clk. here, it's gen clk "clk_latch_reg".
#- Data Check Setup 0.036 => this is internal data setup req of latch wrt clk. here, RZ should come 0.036ns before SZ, so subtracted
#+ Phase Shift -100.000 => now, actual phases of clks taken into account (in PT, phase shifts are part of data/clk delays, but not in ETS). here, osc_clk has period of 100ns, while clk_latch_reg has period of 200ns. since SZ(clk) comes from clk_latch_reg, it may change at 0ns or 200ns or 400ns and so on, while RZ(data) coming from osc_clk may change at 0ns or 100ns or 200ns and so on. For data to data setup, we try to meet data setup wrt first clk edge. First SZ +ve edge is at 0ns, while worst case RZ +ve edge occurs at 100ns (if RZ +ve edge at 0ns chosen, then easy to meet timing, also if RZ +ve edge at 200ns chosen, then 2nd +ve edge of SZ would be chosen, which makes this pattern repeat, so we choose worst possible setup which is 100ns in this case). Phase shift is added to clk.
#= Required Time -98.175
#- Arrival Time 22.125 => this is data delay for RZ rising starting from mtr_b_enbl pin
#= Slack Time -120.299 => final slack
PrimePower (PP):
The Synopsys PrimePower product family analyzes the power consumption of a design at various stages, starting from RTL all the way to the final PnR netlist. PrimePower provides vector-free and vector-based peak power and average power analysis capabilities for RTL and gate-level designs. It calculates the power for a circuit at the cell level and reports the power consumption at the chip, block, and cell levels. Supported power analysis modes include average power, peak power, glitch power, clock network power, dynamic and leakage power, and multivoltage power.
There are 2 flavors of PrimePower:
Using PrimePower:
PrimePower may be used standalone, or may be invoked from within other Synopsys tools such as PrimeTime. Also, PT may be invoked from within PP. Both tools share many of the same libraries, databases, and commands, and support power and timing analysis. We need separate licenses for PrimePower and PrimeTime irrespective of which way they are invoked. These are the 2 ways:
PrimePower (PP) and PrimeTime-PX (PT-PX):
When Synopsys initially came out with their power tool in the 2000's, it used the 2nd option above (i.e. the power tool was invoked from within PT). They called this tool PT-PX or PT with Power Analysis. Even though it was invoked from within PT_SHELL, it required a separate license of PT-PX to run power. This tool calculated power only at the gate level. Later they added the capability to calculate power at the RTL level. This required the power tool to be invoked separately, so they introduced PrimePower (PP) as a standalone tool for power analysis. PP could be invoked for both RTL and gate level power. PT-PX was rebranded as belonging to the "PrimePower family". For our purpose, PT-PX is treated the same as PP.
Startup File: When PP or PTPX is invoked, we can have an optional synopsys startup file that will be sourced on startup. It's similar to the PT startup file: .synopsys_pt.setup
PT-PX combines simulation time window to report power within a window.
All the options and cmds are almost the same for PP and PT-PX. Inputs and outputs are the same too.
Inputs:
Outputs
PP:
PP can be invoked for both RTL and gate. When we say PP, we mean gate level power runs. Only when we say PP-RTL do we refer to PP running on RTL.
Steps:
Following are the steps to invoke PP:
0. Invoke pwr_shell normally => As you would for PT timing runs
pwr_shell -2012.12-SP3 -f scripts/run_power.tcl | tee logs/run_power.log => can be invoked in gui mode too. run_power.tcl has cmds for running the power flow, so that we can run in batch mode (i.e. automated). If we don't want it automated, we can type the cmds in pwr_shell too. By default, all cmds processed by the tool, including those in the setup file, are dumped in pwr_shell_command.log.
GUI: To start gui, type "gui_start" from pwr_shell window.
1. set library, read gate level verilog netlist and spef file => same as in normal PT flow. pwr is calc for chosen PVT corner.
set search_path "$search_path /db/pdkoa/1533e035/current/diglib/pml48h/synopsys/bin"
set target_library PML48H_W_85_3_CORE.db
set link_library {* PML48H_W_85_3_CORE.db}
read_verilog /db/ATAGO/.../FinalFiles/digtop_final_route.v => read final routed netlist
current_design digtop
link
set_app_var power_limit_extrapolation_range true => By default, PP extrapolates indefinitely if the data point for internal power lookup is out of range. When set to true, the tool limits the extrapolation. By default this is false, which means infinite extrapolation.
set_operating_conditions
2. Invoke PT or restore PT session
update_timing
restore_session
check_power
read_parasitics /db/ATAGO/.../FinalFiles/digtop_final_route_max.spef => read max spef file
3. set power analysis so that PTPX license is invoked
set power_enable_analysis true => This is what invokes PP from within PT.
set power_analysis_mode averaged | time_based
check_activity
4. Read VCD file from one of the simulations (it needs to be a gate level VCD file with back annotation of parasitics)
read_vcd /sim/ATAGO/.../sim1_max.vcd.gz -strip_path digtop_tb/IDUT/spi_regs -time {100489 800552} => strips module of interest so that pwr is reported starting from that module as top level. time is in ns.
#report_switching_activity > reports/power_swtching.rpt => to examine tr/sp (see below) and vcd file syntax
#write_activity_waveforms => generates activity waveforms from the activity file.
5. report power
#check_power -verbose => prior to analysis, verifies that analysis i/p are valid
#update_power => This is needed for RTL VCD or when no vcd provided to propagate activity to nets/registers not annotated from RTL VCD file.
#report_switching_activity => to examine propagated values of tr/sp
#create_power_waveforms -cycle_accurate => to show pwr waveform
report_power > ./reports/power_summary.rpt
report_power -hier > ./reports/power_hierarchy.rpt
#report_power -cell -flat -net -hier -verbose -nosplit > power_detail.rpt
save_session =>
exit
PT-PX:
When Synopsys initially came out with their power tool in the 2000's, it used the 2nd option above (i.e. the power tool was invoked from within PT). They called this tool PT-PX or PT with Power Analysis. Even though it was invoked from within PT_SHELL, it required a separate license of PT-PX. PT-PX combines the simulation time window to report power within a window.
Steps:
Following are the steps to invoke PT-PX:
0. Invoke pt_shell normally => As you would for PT timing runs
pt_shell -2012.12-SP3 -f scripts/run_power.tcl | tee logs/run_power.log => can be invoked in gui mode too. run_power.tcl has cmds for running the power flow, so that we can run in batch mode (i.e. automated). If we don't want it automated, we can type the cmds in pt_shell too.
run_power.tcl script above has following cmds:
1. set library, read gate level verilog netlist and spef file => same as in normal PT flow. pwr is calc for chosen PVT corner.
set search_path "$search_path /db/pdkoa/1533e035/current/diglib/pml48h/synopsys/bin"
set target_library PML48H_W_85_3_CORE.db
set link_library {* PML48H_W_85_3_CORE.db}
read_verilog /db/ATAGO/.../FinalFiles/digtop_final_route.v => read final routed netlist
current_design digtop
link
read_parasitics /db/ATAGO/.../FinalFiles/digtop_final_route_max.spef => read max spef file
2. set power analysis so that PP license is invoked
set_app_var power_enable_analysis true => This is what enables Power Analysis from within PT.
set power_analysis_mode averaged
3. Read VCD file from one of the simulation (it needs to be gate level VCD file with back annotation of parasitics)
read_vcd /sim/ATAGO/.../sim1_max.vcd.gz -strip_path digtop_tb/IDUT/spi_regs -time {100489 800552} => strips module of interest so that pwr is reported starting from that module as top level. time is in ns.
#report_switching_activity > reports/power_swtching.rpt => to examine tr/sp (see below) and vcd file syntax
4. report power
#check_power -verbose => prior to analysis, verifies that analysis i/p are valid
#update_power => This is needed for RTL VCD or when no vcd provided to propagate activity to nets/registers not annotated from RTL VCD file.
#report_switching_activity => to examine propagated values of tr/sp
#create_power_waveforms -cycle_accurate => to show pwr waveform
report_power > ./reports/power_summary.rpt
report_power -hier > ./reports/power_hierarchy.rpt
#report_power -cell -flat -net -hier -verbose -nosplit > power_detail.rpt
exit
PP:
Report: power summary report:
1. static power: Cell leakage power. It's the leakage in the cell from VDD to VSS when the cell i/p is at 0 or 1 (subthreshold lkg from src to drn, since transistors never turn off completely). It includes gate lkg also (gate lkg is captured only for the i/p pins of each transistor, as the o/p pin will finally connect to the i/p pin of some other transistor. gate lkg is just the current flowing into the gate when the i/p of the gate is at 0 or 1). The cell lkg pwr number comes from the *.lib file. Pwr(lkg) = V*I(subthreshold_lkg) + V*I(gate_lkg).
It has a default lkg pwr number for each cell, as well as different lkg pwr numbers depending on diff i/p values. ex:
cell (AN210_3V) {
cell_leakage_power : 1.731915E+00; => default lkg pwr
leakage_power () { => we can have many of these conditions for each cell
value : 1.718650E+00; => lkg pwr = 1.7pW when A=1 and B=0. pwr unit is defined as pW by "leakage_power_unit : "1pW";" in the .lib file
when : "A&!B";
}
2. dynamic power: 2 components to this:
A. internal pwr: This includes short ckt pwr when the cell o/p is switching, as well as pwr due to charging of internal nodes in the cell (due to src/drn cap on all o/p nodes and gate cap on internal nodes). The cell int pwr number comes from the *.lib file. Pwr(int) = Eint*Tr, where Tr = number of toggles/time.
Just like the timing() section, we have an internal_power() section for the o/p pin. It shows int pwr for each combination of i/p values (as pwr will change due to short ckt current and drn/src cap changing). ex:
cell (AN210_3V) {
pin (Y) { => pwr is always for the o/p pin, since i/p pin pwr is calculated separately as switching pwr.
internal_power () { => pwr unit is in pJ = power unit(pW) * time_unit(s) (it's energy, not power).
related_pin : "A"; => this is when o/p changes due to i/p pin A changing
rise_power (outputpower_cap4_trans5) { ... 34.39 .. } => pwr under diff cap load on o/p pin, and diff slew on i/p pin
fall_power (outputpower_cap4_trans5) { ... 34.39 .. } => fall_power is when o/p pin falls due to pin A rising/falling
}
internal_power () {
related_pin : "B"; => this is when o/p changes due to i/p pin B changing
rise_power (outputpower_cap4_trans5) { ... 34.39 .. } => rise_power is when o/p pin rises due to pin B rising/falling
fall_power (outputpower_cap4_trans5) { ... 40 .. } => 40pJ energy per toggle. Since the toggle rate is per ns, pJ/ns works out to mW.
}
}
}
B. switching pwr: This is due to charging/discharging of all the o/p load in design. This includes wire cap and gate cap on i/p pins which switch whenever o/p pin of any gate switches. Pwr(sw)=0.5*C*V^2*Tr. Tr=number of toggles/time.
Total_pwr = Pwr(lkg) + Pwr(int) + Pwr(sw) = Pwr(lkg) + Eint*Tr + 0.5*C*V^2*Tr (Pwr(lkg) and Eint come from the .lib).
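As a back-of-the-envelope illustration of this formula (the numbers below are made up, not from any .lib), a small Python sketch; with energy in pJ and toggle rate in toggles/ns, the dynamic terms come out directly in mW:
# minimal sketch with hypothetical numbers (units chosen so pJ * toggles/ns = mW)
E_int = 0.04          # internal energy per toggle, pJ
C     = 0.01          # switched output load, pF
V     = 1.8           # supply voltage, V
Tr    = 0.01          # toggle rate, toggles per ns
P_lkg = 1.73e-9       # leakage power, mW (i.e. 1.73 pW, like the .lib value above)
P_int = E_int * Tr               # pJ/ns = mW
P_sw  = 0.5 * C * V**2 * Tr      # pF*V^2/ns = pJ/ns = mW
print("total power (mW) =", P_lkg + P_int + P_sw)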
To calc avg pwr, the static probability (Sp) of each node being at 1 or 0 is calculated for all the nodes. This is then used to calc lkg pwr for each cell. The toggle rate is calculated for each node to calc dynamic pwr.
To calc peak pwr, a vcd file is required to analyze events. It's useful for dynamic IR drop. If a vcd file is not provided, then the tool doesn't know the sequence of events; the toggle rate alone doesn't tell it whether all nodes toggle at the same time or not.
When a VCD file is not provided, default Tr/Sp is applied to the starting points (PI, black box o/p). The default Tr/Sp can be modified using (power_default_toggle_rate, power_default_static_probability).