Linear Functions:

Before we get into best fit functions, let's look at linear functions. Linear functions are functions that satisfy these 2 requirements:

1. f(a*x) = a*f(x)

2. f(x+y) = f(x) + f(y)

These 2 requirements can be combined into one as f(a*x+b*y) = a*f(x)+b*f(y)

Linear functions are important because any scaling and summation of linear functions is also linear, and can be computed easily by separating the terms out. The 1st degree function f(x) = m*x is linear, while f(x) = m*x + b satisfies the 2 requirements only when b=0 (strictly speaking it's an affine function, though we commonly call it linear since its graph is a straight line). Polynomials of higher order such as f(x) = a*x^2 + b*x + c aren't linear either. And not all functions that look linear are linear. We'll see examples below.
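As a sanity check on these definitions, here's a small numerical test in python (a sketch; the function choices are just for illustration):

    import numpy as np

    # Check the 2 requirements, f(a*x) = a*f(x) and f(x+y) = f(x) + f(y),
    # on randomly drawn a, x, y.
    def is_linear(f, trials=1000):
        rng = np.random.default_rng(0)
        for _ in range(trials):
            a, x, y = rng.uniform(-10, 10, 3)
            if not (np.isclose(f(a * x), a * f(x)) and np.isclose(f(x + y), f(x) + f(y))):
                return False
        return True

    print(is_linear(lambda x: 3 * x))      # True:  f(x) = m*x passes both requirements
    print(is_linear(lambda x: 3 * x + 2))  # False: the +b offset breaks both requirements
    print(is_linear(abs))                  # False: the bend at x=0 breaks additivity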

Best Fit Functions:

AI is all about finding a best fit function for any set of data. We saw in an earlier article that for Logistic Regression, the sigmoid function is a good function for best fit. However, there is nothing special about a sigmoid function. From the Fourier theorem, we know that a sum of sine/cosine functions can represent any function f(x) (with some limitations, but we'll ignore those). In fact, any function f(x) can be represented as an infinite summation of polynomials of x (again with some limitations, but we'll ignore those). Sine/cosine functions can themselves be represented as infinite summations of polynomials of x, which is why they too are able to represent any function f(x). Since any function can be represented as a polynomial of x (Taylor's theorem), that implies that any function f(x) can be represented as a summation of any other function g(x) that can itself be represented as an infinite summation of polynomials.

What about functions g(x) that are not infinite summations of powers of x? Let's say g(x) = 4 + 2*x. Will g(x) be able to represent any function f(x)? Since any function f(x) is an infinite summation of polynomials, it can be approximated as a finite sum of polynomials too. Of course, the fewer polynomial terms we keep in the summation, the less accurate the representation of f(x). Let's see this with an example:

ex: f(x) = 3 + 7*x + 4*x^2 + 9*x^3 + .....

If g(x) = 4+2*x, then we can write f(x) ≈ A*g(x). If we choose A=3, then 3*g(x) = 12+6*x, which approximates f(x), though not exactly. Not only are the higher powers of x missing, but even the 1st 2 terms of f(x) don't match A*g(x) exactly. No matter how many linear combinations of g(x) we use, we can't match the 1st 2 terms of f(x).

i.e. f(x) = A1*g(x) + A2*g(x) = (A1+A2)*g(x), which is the same as B*g(x). So, we don't achieve anything better by summing the same function g(x) with different coefficients.

However, if we define 2 different 1st degree functions, g1(x) and g2(x), where g1(x) = 4+2*x and g2(x) = 1+3*x, then A1*g1(x) + A2*g2(x) can be made to represent 3+7*x by choosing A1=1/5, A2=11/5. Thus we are able to match the 1st 2 terms of f(x) exactly.
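We can verify A1 and A2 by equating the constant and x coefficients of A1*g1(x) + A2*g2(x) with 3 + 7*x, and solving the resulting 2x2 system (a quick check with numpy):

    import numpy as np

    # constant: 4*A1 + 1*A2 = 3
    # x term:   2*A1 + 3*A2 = 7
    M = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    rhs = np.array([3.0, 7.0])
    A1, A2 = np.linalg.solve(M, rhs)
    print(A1, A2)   # 0.2 2.2, i.e. A1 = 1/5, A2 = 11/5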

However, if we had flexibility in choosing g(x), we would choose g(x) = 3+7*x. Then the 1st 2 terms of f(x) would match exactly with g(x), using just 1 function g(x).

Similarly, if g(x) is chosen to be a 2nd degree polynomial, i.e. g(x) = 1+2*x+3*x^2, then we can choose g1(x), g2(x), g3(x) to be 3 different 2nd degree polynomials, and approximate f(x) = A1*g1(x) + A2*g2(x) + A3*g3(x). Or, if we had flexibility in choosing g(x), we would choose g(x) = 3+7*x+4*x^2. Then the 1st 3 terms of f(x) would match exactly with g(x).

Continuing the same way, the higher the order of g(x), the closer a linear summation of g(x) functions can approximate f(x).
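A quick illustration of this with numpy's least-squares polynomial fit (the target f(x) = e^x is an arbitrary choice here):

    import numpy as np

    # Fit f(x) = e^x on [-1, 1] with best-fit polynomials of increasing degree;
    # the fit error drops as the degree (order of g(x)) goes up.
    x = np.linspace(-1, 1, 201)
    f = np.exp(x)
    for deg in (1, 2, 4, 8):
        coeffs = np.polyfit(x, f, deg)   # least-squares polynomial of this degree
        err = np.max(np.abs(np.polyval(coeffs, x) - f))
        print(deg, err)                  # error shrinks rapidly as degree increases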

X as a multi-dimensional vector:

Now let's consider equations in n dimensions, where f is not a function of a single variable "x", but of "n" variables x1, x2, ..., xn. i.e. we define f(X) where X = (x1 x2 x3 .... xn).

Let's stick to the 1st degree function g(x) = m*x + c. We define g1(x1) = m1*x1+c1, g2(x2) = m2*x2+c2, ..., gn(xn) = mn*xn+cn.

Then f(x1,x2,...,xn) = g1(x1) + g2(x2) + ... + gn(xn) = m1*x1+c1 + m2*x2+c2 + ... + mn*xn+cn = m1*x1 + m2*x2 + ... + mn*xn + (c1 + c2 + ... + cn) = m1*x1 + m2*x2 + ... + mn*xn + b (where b = c1+c2+...+cn)

So, for n dimensional space, if we choose g(x1,x2,...,xn) = m1*x1 + m2*x2 + ... + mn*xn + b, then we can get a best fit n dimensional plane to the function f(x1,x2,...,xn). However, the approximation function is a 1st degree polynomial, so it doesn't have any curves or bends (just a flat plane). This is what we call a linear (strictly, affine) function.
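Here's what that best-fit plane looks like as a least-squares problem (a sketch; the coefficients and noise are made up):

    import numpy as np

    rng = np.random.default_rng(1)
    n, samples = 3, 500
    X = rng.uniform(-1, 1, (samples, n))
    # unknown f: a plane 2*x1 - 1.5*x2 + 0.5*x3 + 4, plus a little noise
    f = 2*X[:, 0] - 1.5*X[:, 1] + 0.5*X[:, 2] + 4 + 0.01*rng.standard_normal(samples)

    A = np.column_stack([X, np.ones(samples)])   # columns: x1..xn, then the constant
    coeffs, *_ = np.linalg.lstsq(A, f, rcond=None)
    print(coeffs)   # ~[2, -1.5, 0.5, 4]: recovered m1..mn and b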

Linear functions with bends:

What if we were able to introduce a bend in the linear function g(x), so that it's not a straight line anymore? If we then add up these functions with bends, we can have any kind of bend desired at any point. Then we may be able to approximate any function f(x) by summing up a lot of these g(x) functions with bends.

Let's see this in 3D, since higher dimensions are difficult to visualize. We write the above f(x) in 2 variables as:

f(x,y)=m1*x + m2*y = 2*x+5*y

gnuplot> splot 2*x+5*y => As seen below, this plot is a plane

 

Now, we take a simple function called the absolute value function. It has a bend, and the slopes of the 2 line segments for x<0 and x>0 are negatives of each other.

gnuplot> splot abs(x) => As seen below, this plot has a bend at x=0

 

 

Now, we plot the same function as the first one, but this time with the abs function applied to x and y. As you can see, we now have bends, so we can generate planes at different angles to fit complex curves.

gnuplot> splot 2*abs(x)+5*abs(y)

 

Is the abs() function linear? It does look linear, but it has a bend (so it's 2 linear pieces over 2 ranges).

Let's pick 2 points: x1=1 and x2=-1. Then abs(x1+x2) = abs(1-1) = abs(0) = 0. However, abs(x1) = abs(1) = 1 and abs(x2) = abs(-1) = 1, so abs(x1) + abs(x2) = 2, which is not the same as abs(x1+x2) = 0. So, the abs() function is not linear. Similarly, any 1st order equation with a bend is not linear.

Taylor's theorem tells us that well-behaved functions can be expanded into an infinite polynomial series. abs(x) isn't differentiable at x=0, so it has no Taylor series around 0, but we can still get a polynomial series for it via the binomial expansion below.

Note: f(x) = abs(x) = √(x^2) = √(1+(x^2 - 1)) = √(1+t) where t = (x^2 - 1)

√(1+t) is a binomial series which can be expanded into Taylor series as explained here: https://en.wikipedia.org/wiki/Binomial_series#Convergence

(1+t)^1/2 = 1 + (1/2)t - (1/(2*4))t^2 + ((1*3)/(2*4*6))t^3 - ...

So, f(x) = abs(x) = 1 + (x^2-1)/2 - (1/(2*4))(x^2-1)^2 + ((1*3)/(2*4*6))(x^2-1)^3 - ... = [1-1/2-1/(2*4)-...] + x^2*[1/2+1/4+...] + x^4*[-1/(2*4)+....] + x^6*[...] + ...

Thus we see that we get a series expansion of abs(x) as a summation of even powers of x. So, it is indeed not a linear equation. As it's an infinite summation, it can be used to represent any function, as explained at the top of this article.
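We can check this series numerically. Below is a sketch of the partial sums (the series converges for |t| = |x^2 - 1| <= 1, i.e. |x| <= sqrt(2)):

    import numpy as np

    # Partial sums of abs(x) = (1 + t)^(1/2) with t = x^2 - 1 (binomial series)
    def abs_series(x, n_terms):
        t = x**2 - 1.0
        coeff, total = 1.0, 0.0
        for k in range(n_terms):
            total += coeff * t**k
            coeff *= (0.5 - k) / (k + 1)   # next binomial coefficient C(1/2, k+1)
        return total

    x = np.linspace(-1, 1, 5)
    for n in (2, 8, 64):
        print(n, np.max(np.abs(abs_series(x, n) - np.abs(x))))
    # error shrinks as n grows (slowly near x=0, where the bend is)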

ReLU function:

Just as the absolute value function has a bend and is not linear, many other linear-looking functions can be formed which have a bend but are not linear. One such function that is very popular in AI is ReLU (Rectified Linear Unit). Here, instead of having slope -1 for x<0, we make the slope 0 for x<0. This function is defined as below:

Relu(x) = x for x > 0, = 0 for x <= 0

gnuplot> f2(x)=(x>0) ? x : 0 #this is the eqn to get a ReLu func in gnuplot
gnuplot> splot f2(x)

The above plot looks similar to how the abs(x) function looked, except that it's 0 for all x <= 0.

Now, let's plot a function which is the difference of 2 Relu plots.

gnuplot> splot f2(x+5)-f2(x-5)

The Relu plot above ( Relu(x+5) - Relu(x-5) ) now has 2 knees, at x=-5 and x=+5. It actually resembles a sigmoid function (explained below), though it doesn't have the smooth edges of a sigmoid. Since the sigmoid function can fit any function, a linear sum of Relu functions can also fit any function. The advantage with Relu is that it's close to linear (it's linear in 2 separate regions, although it's not linear overall), so derivatives are straightforward.
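Here's a small sketch of that idea: fitting a curve with a linear sum of shifted Relu functions (the target x^2 and the knot positions are arbitrary choices):

    import numpy as np

    relu = lambda z: np.maximum(z, 0.0)

    # Approximate f(x) = x^2 on [-3, 3] with a weighted sum of shifted Relus
    x = np.linspace(-3, 3, 601)
    knots = np.linspace(-3, 3, 13)                    # bend locations
    basis = np.column_stack([relu(x - k) for k in knots] + [np.ones_like(x)])
    w, *_ = np.linalg.lstsq(basis, x**2, rcond=None)  # least-squares weights
    approx = basis @ w
    print(np.max(np.abs(approx - x**2)))              # small; shrinks with more knots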

There is a very good link here on why Relu functions work so well in curve fitting (and how they are non-linear in spite of giving the impression of a linear equation):

https://towardsdatascience.com/if-rectified-linear-units-are-linear-how-do-they-add-nonlinearity-40247d3e4792

 

Sigmoid function:

The sigmoid function, being an exponential function, has higher powers of x in its expansion (i.e. x, x^2, x^3, etc.), instead of just "x".

i.e. σ(z) = 1 / (1 + e^(-z)) = A1 + A2*z + A3*z^2 + ... (Taylor expansion)

Sigmoid functions can fit better than the Relu functions above, as they have higher orders of x (so they have smooth edges). However, they are also more compute intensive, and so are not used except when necessary.
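For reference, the first few Taylor terms of the sigmoid around z=0 are 1/2 + z/4 - z^3/48 + ...; a quick numerical check:

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    z = np.linspace(-1, 1, 5)
    taylor = 0.5 + z/4 - z**3/48
    print(np.max(np.abs(sigmoid(z) - taylor)))   # ~2e-3: close with just 3 terms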

Let's plot a 2D sigmoid function, where z = a*x + b*y. We use gnuplot to plot the functions below:

 f1(x,y,a,b)=1/(1+exp(-(a*x+b*y)))

Plot 0:

gnuplot> splot f1(x,y,1,4) => As seen below, this is a smooth function varying from 0 to 1. Looks kind of similar to difference of Relu function plotted above.

Plot 1:

gnuplot> splot f1(x,y,2,1) => As seen below, plot is same as that above, except that the slope direction is different

Plot 2:

gnuplot> splot (2*f1(x,y,1,4) + 4*f1(x,y,2,1)) => Here we multiply the above 2 plots by different weights and add them up. So, resulting plot is no longer b/w 0 and 1, but varies from 0 to 6.

 

We define another sigmoid function, which is in 1 dimension

 g(x) = 1/(1+exp(-(x)))

Plot 3:

gnuplot> splot g(2*f1(x,y,1,4) + 4*f1(x,y,2,1)) => Here we took sigmoid of above plot, so resulting plot is confined to be b/w 0 and 1. However, because of the weights we chose, resulting plot ranges from 0.5 to 1, instead of ranging from 0 to 1.

Plot 4:

gnuplot> splot g(-2*f1(x,y,1,4) + 4*f1(x,y,2,1)) => almost same plot as above, except that z range here is from 0.1 to 1 (by changing weight to -ve number)

 

Summary:

Now we know that Relu and sigmoid functions are not linear; their expansions contain polynomial terms of higher degree. As such, they can be used to represent any function, by using enough linear combinations of these functions. So, they can be used as fitting functions to fit any n dimensional function. They are used very frequently in AI to fit our training data. We will look at their implementation in the AI section.

Course 1 - week 4 - Deep Neural Network:

This is week 4 of Course 1. Here we generalize NN from 1 hidden layer to any number of hidden layers. The math gets more complicated, but it repeats the same steps as in a 2 Layer NN. A 2 layer NN has 1 hidden layer and 1 output layer. An L layer neural network has (L-1) hidden layers and 1 output layer. We don't count the input layer in the number of layers.

There are a few formulas here for forward and backward propagation; these form the backbone of DNNs. These formulas are summarized here:

https://www.coursera.org/learn/neural-networks-deep-learning/supplement/E79Uh/clarification-about-what-does-this-have-to-do-with-the-brain-video

There is a very good derivation of all these equations here:

https://medium.com/@pdquant/all-the-backpropagation-derivatives-d5275f727f60

There are 2 programming assignments this week: first we build a 2 layer NN and then an L layer NN to predict cat vs non-cat in given pictures. The 2 Layer NN is just a repeat of last week's exercise, while the L layer NN is a generalization of the 2 layer NN.

Programming Assignment 1: Here we build helper functions to help build a deep NN. We also build helper functions for a 2 layer NN separately.

Here's the link to the pgm assignment:

Building_your_Deep_Neural_Network_Step_by_Step_v8a.html

This project has 3 python pgms that we need to understand.

A. testCases_v4a.py => There are a bunch of testcases here to test your functions as you write them. In my pgm, I've turned them on.

testCases_v4a.py

B. dnn_utils_v2.py => this is a pgm that defines a couple of functions.

dnn_utils_v2.py

These functions are:

  • sigmoid(): This calculates sigmoid for a given Z (Z can be a scalar or an array). Output returned is both A (which is sigmoid of Z) and cache (which is the same as i/p Z)
  • sigmoid_backward(): This calculates dZ given dA and Z. dZ = dA*σ(Z)*[1-σ(Z)]. We stored Z in cache (in sigmoid() above)
  • relu(): This calculates relu for a given Z (Z can be a scalar or an array). Output returned is both A (which is relu of Z) and cache (which is the same as i/p Z)
  • relu_backward(): This calculates dZ given dA and Z. dZ = dA for Z > 0, else dZ = 0. We stored Z in cache (in relu() above). See the sketch below.
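A minimal numpy sketch of these 4 helpers (matching the behavior described above, though not necessarily identical line-for-line to the course file):

    import numpy as np

    def sigmoid(Z):
        A = 1.0 / (1.0 + np.exp(-Z))
        return A, Z                      # cache = Z

    def sigmoid_backward(dA, cache):
        Z = cache
        s = 1.0 / (1.0 + np.exp(-Z))
        return dA * s * (1 - s)          # dZ = dA * σ(Z) * (1 - σ(Z))

    def relu(Z):
        return np.maximum(0, Z), Z       # cache = Z

    def relu_backward(dA, cache):
        Z = cache
        dZ = np.array(dA, copy=True)
        dZ[Z <= 0] = 0                   # dZ = dA where Z > 0, else 0
        return dZ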

We'll import this file in our main pgm.

C. test_cr1_wk4_ex1.py => This pgm just defines the helper functions that we'll call in the 2 layer and L layer NN models we define in assignment 2. Below is the whole pgm:

test_cr1_wk4_ex1.py

Below are the functions defined in our pgm:

  • initialize_parameters() => This function is exactly the same as previous week's function for a 2 Layer NN. Input to the func is the size of the i/p layer, hidden layer and output layer. It initializes the W1,b1 and W2,b2 arrays. W1, W2 are initialized with random values (very important to have random values instead of 0), while b1,b2 are initialized to 0. It puts these 4 arrays in dictionary "parameters" and returns that. NOTE: To be succinct, we will use w,b to mean W1,b1,W2,b2 going forward.
  • initialize_parameters_deep() => This initializes w,b for an L layer NN (same as for the 2 layer NN, but extended to L layers). i/p is an array containing the sizes of all the layers, while o/p is initialized W1,b1, W2,b2, ..., WL,bL for the L layer NN. All weights and biases are stored in dictionary "parameters".
  • Forward functions: These are functions for forward computation:
    • linear_forward() => It computes output Z, given i/p A, W, b. Z = np.dot(W,A)+b. this is calculated for a single layer, using i/p A (which is the o/p from previous layer) and computing Z. It returns Z and linear_cache which is a tuple containing (Aprev,W,b), where Aprev is for previous layer, while W, b are for current layer.
    • linear_activation_forward() => This computes activation A for Z that we calculated above for layer "l". The reason we separated out the 2 functions for computing Z and A, is because A requires 2 diff functions, sigmoid or relu for computing A (depending on which one we want to use for current layer. sigmoid is used for output layer, while relu is used for all other layers). This keeps code clean.
      • We call following functions:
        • linear_forward() => returns Z, linear_cache
        • sigmoid() => returns A, activation_cache
        • relu() => returns A, activation_cache
      • We store all relevant values in tuple cache:
        • linear_cache => stores tuple (Aprev,W,b), where Aprev is for previous layer, while W, b are for current layer.
        • activation_cache => stores computed Z for current layer
        • cache => stores tuple (linear_cache, activation_cache) = (Aprev, W, b, Z). In previous week example, we used cache to store A, Z for both layers (A1, Z1, A2, Z2), but here we store W, b too for each layer on top of A (for previous layer) and Z (for current layer) in tuple cache.
      • The function finally returns A for current layer and cache. So, we end up returning (Aprev, W, b, Z, A), where Aprev is for previous layer, while W, b, Z, A are for current layer.
    • L_model_forward() => This function does the forward computation starting from i/p X, and generating o/p Y hat (i.e. output AL for the last layer L). This is the same as the forward_propagation() function that we used in last week's example. It's just more complicated now, since it involves L layers instead of just 2. We define tuple "caches", which is just all the cache tuples appended. See the sketch after this list.
      • From layer 1 to layer (L-1) (hidden layers), we call function linear_activation_forward()  with "Relu" function in a for loop (L-1) times
        • In each loop, cache and A are returned for that layer. A is used in the next iteration, while cache is appended to tuple "caches"
      • For last layer L (o/p layer), we again call function linear_activation_forward(), but this time with "sigmoid" function
        • cache and AL are returned for the last layer. AL is going to be used in the compute_cost() function (defined below), while cache is appended to tuple "caches"
  • compute_cost() => computes the cost, which is the cross-entropy (log) loss: cost = -(1/m) * Σ [ Y*log(AL) + (1-Y)*log(1-AL) ].
  • Backward functions: These are functions for backward computation. They are the same as their forward counterparts, just going backward from layer L to layer 1.
    • linear_backward() => This is the backward counterpart of linear_forward() func. Given i/p cache and dZ for a given layer, it computes gradients dW, db, dA. Input cache stores tuple (Aprev, W, b). NOTE: dW computation requires A from previous layer
      •     A_prev, W, b = cache 
      •     dW = 1/m * np.dot(dZ,A_prev.T)
      •     db = 1/m * np.sum(dZ,axis=1,keepdims=True)
      •     dA_prev = np.dot(W.T,dZ)
    • linear_activation_backward() => This is the backward counterpart of the linear_activation_forward() func. Instead of computing A from Z, this computes dA for the previous layer given dA for the current layer (from which dZ is computed).
      • We call following functions (same as what used in linear_activation_forward(), but now in backward dirn):
        • sigmoid_backward() => returns dZ given dA for sigmoid func
        • relu_backward() => returns dZ given dA for relu func
        • linear_backward()=> using dZ returned by sigmoid/relu backward func above, it computes dA_prev, which is dA for previous layer (since we are going in reverse dirn)
      • The function finally returns dA for previous layer and dW, db for current layer.
    • L_model_backward() => This is the backward counterpart of L_model_forward(). This function does the backward computation starting from o/p Y hat (i.e. output AL for the last layer L) and going all the way to the input X. It returns dictionary "grads" containing dW, db, dA.
      • dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
      • dA{L-1}, dWL, dbL => computed using func linear_activation_backward() for layer L. Uses dAL from above as i/p to this func
      • Now, we run a loop from layer L-1 to layer 1 to compute dA, dW, db for each layer "l"
        • dA{l-1}, dWl, dbl => computed using func linear_activation_backward() for layer "l". Uses dAl from the prev iteration as i/p to this func to compute dA{l-1}. It uses dA{L-1} from above for l=L-1 to compute dA{L-2}, and then keeps iterating backward.
      • Finally it returns dictionary grads containing dW, db, dA for each layer
  • update_parameters() => This function is the same as in previous week's exercise. It computes new w,b given old w,b and dw,db, using the learning rate provided. This is done for w,b for all layers 1 to L (i.e. W1 = W1 - learning_rate*dW1, b1 = b1 - learning_rate*db1, ..., WL = WL - learning_rate*dWL, bL = bL - learning_rate*dbL)
  • 2_layer_model()/L_layer_model() => These are the main funcs, but they are not called here. They are part of assignment 2.
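As referenced in the L_model_forward()/L_model_backward() bullets above, here's a condensed sketch of how the L layer loops tie the helpers together (it assumes the linear_activation_* functions described above; key names follow the assignment's W1/b1 convention):

    import numpy as np

    def L_model_forward(X, parameters):
        caches, A = [], X
        L = len(parameters) // 2                 # number of layers (one W,b pair each)
        for l in range(1, L):                    # hidden layers use relu
            A, cache = linear_activation_forward(A, parameters['W'+str(l)],
                                                 parameters['b'+str(l)], 'relu')
            caches.append(cache)
        AL, cache = linear_activation_forward(A, parameters['W'+str(L)],
                                              parameters['b'+str(L)], 'sigmoid')  # o/p layer
        caches.append(cache)
        return AL, caches

    def L_model_backward(AL, Y, caches):
        grads, L = {}, len(caches)
        dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
        grads['dA'+str(L-1)], grads['dW'+str(L)], grads['db'+str(L)] = \
            linear_activation_backward(dAL, caches[L-1], 'sigmoid')
        for l in reversed(range(L - 1)):         # walk back thru the relu layers
            grads['dA'+str(l)], grads['dW'+str(l+1)], grads['db'+str(l+1)] = \
                linear_activation_backward(grads['dA'+str(l+1)], caches[l], 'relu')
        return grads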

 

Programming Assignment 2: Here we use the helper functions defined in assignment 1 to build a 2 Layer shallow NN and an L layer deep NN. We find optimal weights using the training data and then apply those weights on test data to predict whether a picture has a cat or not.

Here's the link to the pgm assignment:

Deep+Neural+Network+-+Application+v8.html

This project has 2 python pgms that we need to understand.

A. dnn_app_utils_v3.py => this is a pgm that defines all the functions we defined in assignment 1 above (from both dnn_utils_v2.py and test_cr1_wk4_ex1.py). So, we can either use our functions from assignment 1 or use the functions here. If you wrote all the functions in assignment 1 correctly, they should match the functions in this pgm (except for the few differences noted below).

dnn_app_utils_v3.py

The few differences to note in the above pgm are:

  • load_data() function: This function is extra here. It is exactly same as load_dataset() that we used in week2 assignment to load cat vs no cat dataset. Here too we load the same cat vs no cat dataset that's in h5 file.
  • predict(): This prints accuracy for any i/p set X (which can have multiple pictures in it). It uses w,b and generates output y hat for the given X. If y hat > 0.5, it predicts cat, else non-cat. It then compares the results to the actual y values, and prints accuracy. It only calls 1 function => L_model_forward(). It returns prediction array "p" for all pictures. I added an extra var "probas" (which is the output value y hat), so that we can see how close or far off the different predictions were, whether correct or wrong. This gives us a sense of how our algorithm is doing.
  • print_mislabeled_images(): This takes as i/p dataset X,Y along with the predicted Y hat, and plots all images whose label isn't the same as what was predicted (i.e. wrongly classified)
  • IMPORTANT: initialize_parameters_deep() function: This function is the same as what we wrote in assignment 1 above, with a subtle difference. Here we use a different number to initialize w. Instead of multiplying the random number by 0.01, we multiply it by 1 / np.sqrt(layer_dims[l-1]) for a given layer l (see the sketch below). As you will see, this makes a lot of difference in getting the cost low. With 0.01, our cost starts at 0.693148, and remains at 0.643985 at iteration 2400. Accuracy for training data remains low at 0.65. However, using the new sqrt multiplier, our cost starts at 0.771749, and goes down to 0.092878 at iteration 2400, giving us a training data accuracy of 0.98.
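A sketch of that modified initialization (layer_dims is the list of layer sizes; the seed is illustrative):

    import numpy as np

    def initialize_parameters_deep(layer_dims):
        np.random.seed(1)
        parameters = {}
        for l in range(1, len(layer_dims)):
            # scale by 1/sqrt(n_prev) instead of 0.01: keeps activations from
            # shrinking/saturating as the network gets deeper
            parameters['W'+str(l)] = (np.random.randn(layer_dims[l], layer_dims[l-1])
                                      / np.sqrt(layer_dims[l-1]))
            parameters['b'+str(l)] = np.zeros((layer_dims[l], 1))
        return parameters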

We'll import this file in our main pgm below

B. test_cr1_wk4_ex2.py => This pgm calls functions in dnn_app_utils_v3.py. Here, we define our 2 layer NN and L layer NN models by calling the functions defined above. We find optimal weights by training on the training data. We then apply those weights on test data to see how well our NN predicts cat vs non-cat. Below is the whole pgm:

test_cr1_wk4_ex2.py

Below are the functions defined in our pgm:

  • two_layer_model() => This function implements a 2 layer NN. It is mostly the same as previous week's function for a 2 Layer NN, which was called nn_model(). The big difference is that we used the tanh() function for the hidden layer there, while here we use the relu function. Input to the func is the size of the i/p layer, hidden layer and output layer. On top of that we provide i/p dataset X, o/p dataset Y and a learning rate. The function returns optimal W1,b1,W2,b2. These are the steps in this function:
    • calls func initialize_parameters() to init w,b
    • It then iterates thru cost function to find optimal values of w,b that gives the lowest cost. It forms a "for" loop for predetermined number of iterations. Within each loop, it calls these functions:
      • linear_activation_forward() => Given values of X,W1,b1, it calls func linear_activation_forward() with relu to get A1. It then calls linear_activation_forward() again with A1,W2,b2 and sigmoid to get A2 (i.e. Y hat). It returns A2 and cache.
      • compute_cost() => Given A2,Y,  it computes cost
      • Then it calc initial back propagation for dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
      • linear_activation_backward() => Given dA2 and cache, it calls linear_activation_backward() to get dA1, dW2, db2. It then calls linear_activation_backward() again with dA1 and cache to get dA0, dW1, db1. It stores dW1,db1,dW2,db2 in dictionary grads.
      • update_parameters() => This computes new values of parameters using old parameters and gradients from grads.
    • In the beginning, w and b are initialized. We start the loop, and in the first iteration we run the 4 functions listed above to get new w,b based on dw, db, and the learning rate chosen. Then we start the next iteration, where we repeat the process with the newly computed values of w,b fed into the 4 functions to get even newer dw, db, and update w,b again. We keep repeating this process for "num_iterations", until we get optimal w,b which hopefully give a lot lower cost than what we started with.
    • It then returns dictionary "parameters" containing optimal W1,b1,W2,b2
  • L_layer_model() => This function implements an L layer NN. It's just an extension of the 2 layer NN. Input to the func is the sizes of the i/p layer, hidden layers and output layer. On top of that we provide i/p dataset X, o/p dataset Y and a learning rate. The function returns optimal W1,b1,...,WL,bL. These are the steps in this function:
    • calls func initialize_parameters_deep() to init w,b
    • It then iterates thru cost function to find optimal values of w,b that gives the lowest cost. It forms a "for" loop for predetermined number of iterations. Within each loop, it calls these functions:
      • L_model_forward() => Given X and the parameters, it calls func linear_activation_forward() with relu for layers 1 to L-1, and then with sigmoid for the last layer L, computing AL (i.e. Y hat). It returns AL and caches.
      • compute_cost() => Given A2,Y,  it computes cost
      • Then it calcs the initial back propagation term dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
      • L_model_backward() => Given dAL and caches, it calls linear_activation_backward() for each layer going backward, to get dA, dW, db for every layer. It stores dW1,db1,...,dWL,dbL in dictionary grads.
      • update_parameters() => This computes new values of parameters using old parameters and gradients from grads.
    • The iteration flow is the same as described for two_layer_model() above: initialize w,b, then repeatedly run the 4 functions above for "num_iterations" to keep updating w,b until the cost is hopefully a lot lower than what we started with.
    • It then returns dictionary "parameters" containing optimal W1,b1,...,WL,bL

Below is the explanation of the main code (after we have defined our functions as above):

  1. We load our dataset X,Y using func load_data(). We then flatten X and normalize it (by dividing it by 255).
  2. We then run 2 NNs on our data: one is a 2 layer NN, while the other is an L layer NN. We can choose which one to run by setting the appropriate variable. The size of the i/p layer for both examples below is fixed at 12288 (64*64*3, the total number of data points associated with 1 picture). The size of the o/p layer is fixed at 1 (since our o/p contains just 1 entry: 0 or 1 for cat vs non-cat). The size of the hidden layers is what we can play with, since it can be varied to any number we want.
    1. 2 layer NN:
      1. We call two_layer_model() on this X,Y training dataset. We give the dims of the i/p layer, hidden layer and output layer, and set the number of iterations to 2500. Hidden layer size is set to 7.
      2. Then we call predict() to print accuracy on both training data and test data, which is pretty low as expected.
      3. Then we print mislabeled images by calling func print_mislabeled_images.
    2. L layer NN:
      1. We call function L_layer_model() with i/p X,Y training dataset and the number of hidden layers set to 3 (so it's a 4 layer NN).
      2. Then we call predict() to print the accuracy of the L layer NN on both training data and test data, which is a lot higher than the 2 layer NN.
      3. Then we print mislabeled images by calling func print_mislabeled_images.
  3. Then we run the NN (2 Layer or L layer, depending on which one is chosen) on the 10 picture dataset that I downloaded from the internet (same as what we used in the course 1, week 2 example). These are all cat pictures. In predict(), we return "y hat" also, so we are able to see all the predicted values. A condensed sketch of this main flow is below.
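A condensed sketch of the main flow (function names as described above; L_layer_model() is defined in this same pgm):

    import numpy as np
    from dnn_app_utils_v3 import load_data, predict

    # Load and flatten: each 64x64x3 image becomes one 12288-entry column, normalized by 255
    train_x_orig, train_y, test_x_orig, test_y, classes = load_data()
    train_x = train_x_orig.reshape(train_x_orig.shape[0], -1).T / 255.
    test_x  = test_x_orig.reshape(test_x_orig.shape[0], -1).T / 255.

    layers_dims = [12288, 20, 7, 5, 1]   # the 4 layer NN sizes used here
    parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations=2500)
    pred_train = predict(train_x, train_y, parameters)   # prints training accuracy
    pred_test  = predict(test_x, test_y, parameters)     # prints test accuracy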

Results:

On running the above pgm, we see these results:

2 layer NN: It achieves 99.9% accuracy on training data, but only 72% on test data.

 Cost after iteration 0: 0.693049735659989

...

Cost after iteration 2400: 0.048554785628770226
Accuracy: 0.9999999999999998
Accuracy: 0.72

When I run it thru my 10 random cat pictures downloaded from the internet, accuracy is very low, at 60%. Below are the A (y hat) values and the final predicted values. Even for the ones that were predicted correctly, the y hat activation values are not close to 1 for all of them.

Accuracy: 0.6
prediction A [[0.2258492  0.88753723 0.04103057 0.97642935 0.87401607 0.85904489 0.49342905 0.99138362 0.96587573 0.3834667 ]]
prediction Y [[0. 1. 0. 1. 1. 1. 0. 1. 1. 0.]]

4 layer NN: It achieves 99% accuracy on training data and 80% accuracy on test data. For the 1st layer, size=20; 2nd layer, size=7; 3rd layer, size=5; and 4th layer, size=1 (since it's the o/p layer). The size of the i/p layer is 12288.

Cost after iteration 0: 0.771749

.........

Cost after iteration 2400: 0.092878
Accuracy: 0.9856459330143539
Accuracy: 0.8

When we run the 4 layer NN thru the same 10 random cat pictures, I get 90% accuracy, which is a lot higher than the 2 layer NN. Below are the A (y hat) values and the final predicted values. As can be seen, even though accuracy is 90%, the algorithm completely failed for picture 10, which is reported as 0.22 even though it's a perfect cat picture (maybe the background color made all the difference; will need to check it with a different background color to see if it makes any difference). The other picture that is right on the borderline is the 6th picture (0.50). Here, maybe too much background noise (things around the cat) is causing the issue. Will need to check with a different background to see if that helps.

Accuracy: 0.9

prediction A [[0.99930858 0.97634997 0.96640157 0.9999905  0.95379876 0.5026841 0.92857836 0.99693246 0.99285584 0.21739979]]
prediction Y [[1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]]

Initialization of w,b: If we use the initialization multiplying factor of 0.01 instead of 1 / np.sqrt(layer_dims[l-1]), we get very bad accuracy: 65% on the training set and 34% on the test set. Even worse is the fact that on our 10 random cat images, we get 0% accuracy. All this from just using a different initialization number for different layers. Perhaps this will be explored in the next lecture series.

This is what the initialization multiplying factor is for different layers (instead of using a constant 0.01 for all layers, this value increases as the size of the previous layer decreases):

l= 1 => 1 / np.sqrt(layer_dims[l-1]) = 1 / np.sqrt(12288) = 0.009

l=2 => 1 / np.sqrt(layer_dims[l-1]) = 1 / np.sqrt(20) = 0.22

l=3 => 1 / np.sqrt(layer_dims[l-1]) = 1 / np.sqrt(7) = 0.38

l=4 => 1 / np.sqrt(layer_dims[l-1]) = 1 / np.sqrt(5) = 0.45

NOTE: Do NOT forget to change this multiplying factor of 0.01 if you plan to use your own functions from assignment 1 above.

Summary:

Here we built a 2 layer NN (with 1 hidden layer) as well as an L layer NN (with L-1 hidden layers). We can play around with a lot of parameters here to see if our L layer NN (here we chose L=4) performs better with more layers, or more hidden units in each layer, or with different initialization values, or different learning rates, etc. It's hard to say which of these values will give us the optimal results without trying them out. This will be the topic of the Course 2 series.

 

all_* cmds:

These all_* cmds return a collection of objects of that type. Various options specify what attrs we want for that object collection.

all_clocks => Creates a collection of all clocks in the design. Fast and easy way to see all clks. No args or options for this cmd. We generally use the get_clocks cmd instead; see the "clk cmd" section for details. ex: all_clocks => returns all clks in design

all_inputs => Creates a collection of all input ports.

ex: all_inputs -clock CLK1 => returns only those i/p ports that are clocked by CLK1

all_outputs => Creates a collection of all output ports.

ex: all_outputs -clock CLK1 => returns only those o/p ports that are clocked by CLK1

all_registers => Creates a collection of register (flip flop or latch) cells or pins. This is a very useful cmd to trace all flops/latches fired by a particular clk. Particularly helpful during clk tree debug, as it shows only the sink endpoints. The only endpoint missing from this collection is any output port connected to the clock. Lots of arguments are possible.

syntax: all_registers <options>

options:

  • -clock <clk_name> => only returns reg clocked by given clk. We can provide only 1 clk name here as providing multiple clks will error out (SEL-006 Error). To see only flops or latches, use option <-edge_triggered | -level_sensitive>. For flops, we may also specify -rise_clock <clk_name> or -fall_clock <clk_name> to see only flops which are triggered by either rising or falling edge of given clk.
  • <-clock_pins | -data_pins | -output_pins | -async_pins> => By default, reg names are shown (or use option -cells), but we can report the corresponding pins of the cells by specifying these options. The -clock_pins option is the most useful.
  • <-no_hierarchy> => Considers only the current instance; does not descend the hierarchy. This is useful to isolate regs in different modules

ex: all_registers -clock CLK1 => -clock returns only those regs that are clocked by CLK1. Without the -clock option, all regs are shown (irrespective of whether they are clocked in the design or not), but by adding -clock CLK1, only regs which are actively driven by CLK1 (i.e. not disabled or tied off) are shown. The resulting collection can be passed thru a foreach collection loop.

all_fanin => Creates a collection of pins, ports, or cells in the timing fanin of the specified objects (pins, ports or cells), specified via -to. The fanin stops at the timing startpoints (clock pins of registers, or PI). Since most of the time we are interested only in the startpoints and not the whole path, we use the option "-startpoints_only". There are many other options as follows:

 

  • -from/-through can be used to restrict the fanin thru specified pins, ports or cells.
  • -only_cells includes cells only (and not pins/ports) in timing fanin.
  • -flat should be used to traverse fanin across hier, else by default fanin doesn't cross hier.
  • -trace_arcs may be used to control what kind of combinational arcs to trace. By default (or with "-trace_arcs timing"), only valid timing arcs are traversed (disabled arcs + invalid case analysis arcs are not traversed), but by using "-trace_arcs enabled", invalid case analysis paths are also traced (disabled arcs are still not traced). By using "-trace_arcs all", both disabled arcs as well as invalid case analysis paths are traced.
  • -levels allows us to stop traversal on reaching a depth of a certain number of levels from the objects in the -to list. So, "-levels 1" will go only 1 level deep. This allows us to see paths one depth at a time.
  • -continue_trace generated_clock_source => This option is very useful for traversing clock network paths, as it allows tracing thru the source pin of generated clocks, instead of stopping at the seq pin of the gen clk source. In most cases, you will want to use this option.
    • IMP: For a clk gater cell, the generated clk is sometimes defined at the o/p pin or clk pin of the clk gater. This is done in cases where we want the generated clk to be defined as async to the parent clk (as an example, bist clks are defined on the clk gaters, and then the bist clks are declared async to the func clk). In such cases, all_fanin will stop at the generated clk pin, as that's a timing startpoint. If we define the gen clk on the Q pin and do all_fanin, then the fanin will stop at the Q pin. There will be no fanin from Q to the CP pin, unless we use this option (or we define the gen clk on the CP pin).

ex: all_fanin -flat -startpoints_only -to mod1/..reg_2/D => shows startpoints only (not the whole path) of all fanin to the D pin of this reg. Startpoints may be PI, clk pins of other flops, or Q pins of clk gaters.

 

report_transitive_fanin => This is a reporting cmd, but is included here since it's very similar to all_fanin. Produces a report showing the transitive fanin (not the timing fanin of all_fanin) of the specified objects (pins, ports or nets), specified via -to. We can provide -from/-through to constrain the fanin. A pin is considered to be in the transitive fanin of an object if there is a timing path through combinational logic from the pin to that object. So, not sure how it's different than the timing fanin of the all_fanin cmd. We can use the -trace_arcs option as in the all_fanin cmd. The fanin stops at the clock pins of registers, or PI. Fanin is provided within the current instance, so if we want to see all fanin, the current instance should be set to the top module. NOTE: this is a reporting cmd, so it can't be used in scripts (as it doesn't o/p a collection)

ex: report_transitive_fanin -to FF1/D => Shows the driver of the i/p pin of the flop (FF1/D), then the drivers of the i/p pins of that driver, and so on until it gets to PI or clock pins of regs.

all_fanout => same as all_fanin, except that it reports objects in the timing fanout. Here, -from specifies the objects whose fanout we want (for fanin, we used -to). The fanout stops at timing endpoints (D or other i/p pins of registers, or PO). Again, option "-endpoints_only" may be used to report only the endpoints, instead of the whole path. There's a -clock_tree option to constrain the search to objects in the clock network only (-clock_tree and -from are exclusive; only one of them can be used). All other options are the same as all_fanin.

ex: all_fanout -flat -endpoints_only -from mod1/..or2/Y => shows endpoints only for all fanout from the Y pin of this OR gate.

report_transitive_fanout => this is similar to report_transitive_fanin, except that it gives a fanout report. However, there is an additional option "-clock_tree", as in all_fanout.

ex: report_transitive_fanout -from FF1/Q => Shows the loads of the o/p pin of the flop, then the o/p pin of each load cell, then the loads connected to that pin, and so on.

all_connected => Creates a collection of objects connected to a specified net, pin, or port object (or a collection of exactly 1 net, pin or port object). The -leaf option, when used with a net, returns leaf pins instead of hierarchical pins. This is very useful to see all the objects connected to a given net, and then trace thru a given path.

ex: all_connected [get_nets CLOCK] => shows all objects connected to net "CLOCK"

Get all connected pins of a net: There are 2 ways: one is shown under the "PT - object access functions" section, using "get_pins -of_objects ... -leaf"; the other is "all_connected ... -leaf".

  • ex: all_connected mod1/IO_port6 -leaf => Here it shows all leaf pins of gates connected to this port of the module (ports of modules are actually pins, since ports exist only at the top level)
    {"mod2/GATE_and2_0/ZN", "mod1/mod3/I_OR3/A", "mod4/I_DFF/CP"}

NOTE: The above cmds, along with report_transitive_fanin/report_transitive_fanout, are used to debug and trace timing paths when we want to see the logic structure. We can see the logic structure by bringing up a gate level schematic of the netlist in another tool (such as Verdi), but the advantage here is the ability to show only valid timing paths after accounting for inactive case_analysis and disabled timing arcs. This helps to find out where case_analysis may not have been set correctly, or why some timing path abruptly ends.

 

object access cmds:

There are a lot of cmds in CAD tools to access objects. These functions in Synopsys/Cadence tools return a collection of objects. Ex of such cmds are the get_* and all_* cmds (i.e. all_clocks, get_ports, etc.). All these cmds below which return a collection of objects have some common options that can be provided. A collection returned by these cmds may be parsed thru a for loop to access each object of the collection. These cmds/collections, being part of SDC, can be applied to both Synopsys and Cadence tools.

ex: foreach_in_coll myobj [get_cell .... ] { get_att $myobj clock ... } => Here each object of collection is accessed individually inside the loop.

NOTE: Not all cmds below have exactly the same syntax or the same behaviour across vendors, even though they are SDC cmds. I've explained the cmds and corresponding results below as per the PrimeTime manual.

Syntax: The syntax of an object access function is as shown below:

syntax: cmd <options> patterns => all options are optional

patterns => the result of the cmd is passed to the pattern filter. There it is compared against these patterns for a match, and only matching items are returned. Patterns can be plain patterns, lists or collections. Only one pattern argument is supported (i.e. "get_cells a_1 b_2" will return an error as there's more than 1 arg). Though you are not required to put quotes around patterns, it's good practice to put patterns in " ". This way you can provide more than 1 pattern (get_cells "a_1 b_2" will work fine, as the quotes make that whole string 1 arg).

Wildcards * and ? are supported, where * matches 0 or more characters, whereas ? matches exactly 1 character. Note that these match only until the separator "/" is found, so * and ? do not match by descending into the hierarchy. Ex: get_cells *cell1* will match only if cell1 is at the top level, i.e. it matches abcell12, cell1, but won't match A/cell1 or A/cell12 or C/D/abcell12, etc. If we want to match a cell in module A/B, then we have to provide the full path, as in "get_cells A/B/*cell1*" => This will go into hier A/B and look for *cell1* over there. However, it will still not descend the hier, i.e. it won't go to hier A/B/C to find cell1. If we want to go one level down, we can use something like this: "get_cells A/B/*/*cell1*" => This will descend 1 level down in hier from A/B, and in all those hier modules it will search for cells *cell1*, i.e. it will find cells A/B/C/cell1 and A/B/D/my_cell1, but not cell A/B/cell1 or A/B/C/E/cell12. So, we need the full hier of the cell that we are looking for, i.e. the number of / in that hier should be known, as * will not jump across /. Ex: get_cells *A*/I_cell will only look one hier down, in *A* modules (i.e. I_A, A_tmp, I_A_B, etc.). It won't look in tmp/A/I_cell or A/B/I_cell, as that would require crossing hier. Since this may become cumbersome for searching thru all of the design (as we may not know the levels of hier), option -hier (explained below) is provided, which matches across the hierarchy.

NOTE: get_cells "top/a_1 top/b_1" will work, as it has the full names of 2 cells. However, if we have multiple collections (instead of multiple names or patterns), then we have to combine all collections into 1 collection before using it with the cmd (as collections are just pointers). ex: get_cells "$a_1 $b_1" will error out, as both a_1 and b_1 are collections. We have to use => get_cells [add_to_collection $a_1 $b_1] => Here $b_1 is added to the $a_1 collection, and that is passed as the arg to get_cells, which works !!

options =>

  • -hier => this option allows us to search for patterns across the hierarchy, starting from the current instance. It's similar to the unix "find" cmd. However, we don't provide a hier path in the pattern; we just provide the final object name. Ex: get_cells -hier *cell1* will match cell1 across all hier, i.e. A/cell1, A/cell12, etc. However, with this option, we cannot provide a hier path such as A/B/*cell1*. It will error out with "Error: Nothing matched". This is because, just like the "find" cmd, it looks for an exact pattern match across all hier. The cell name in the object database is cell1, and NOT A/B/cell1, even though cell1 exists in module A/B and gets reported as A/B/cell1. So, it's important to understand the diff b/w patterns when using -hier and when not. When not using -hier, we can provide the full hier path to descend to, but when using -hier, we can only provide the final net or cell name, not the hier. "-hier" matches any matching pattern at any hier, i.e. mod12 will match A/mod12 even though A/mod12 has further modules inside it, as A/mod12/B. If we want to limit our search to a specific hier such as A/B, we should use the -filter option (-filter "full_name =~ */A/B/*"). There is no other way to do it.
    • IMP: However, what I've seen is that the pattern that you can provide when you use -hier option is dependent on the cmd itself. For ex: get_cells, get_pins and get_nets accept different kind of patterns when used with -hier. See in their respective cmd description below.
  • -regexp -nocase => These are used when more complex pattern matching is required. * and ? match simple characters, but if we want to match, let's say, 2 digits followed by 1 letter, then we need to use a regular expression. -regexp allows us to use regular expressions for matching (i.e. \d+). Using \d+ with -regexp will look for 1 or more decimal digits, but if we do not use -regexp, then it will look for the literal string \d+ in the objects for a match. By default, upper/lower case is matched; -nocase means case insensitive matching. The regexp syntax is the same as tcl regexp. Use rigid quoting with { } around patterns, as in tcl regexp. If the -filter option is used (explained below), then -regexp also modifies the filter operators to match regexp instead of simple wildcard patterns. There are some interesting cases with regex. Ex below:
    • Ex 1: get_cells -regexp {my_mod/m[0-9]+_sys/cpu/mem/a_m[0-9]+} => Here we enclosed the pattern in {...}. That way the pattern is passed untouched to the get_cells cmd as a regex. Here it matches names with numbers 0-9 in them. This is the most common and preferred way to use regex.
    • Ex 2: get_cells -regexp my_mod/m[0-9]+_sys/cpu/mem/a_m[0-9]+ => Here we did NOT enclose the pattern in {...}. So, the tcl interpreter interprets this pattern before the regex gets into the picture. Since it has [..], which executes a tcl cmd, it starts executing [0-9], which it finds is an invalid cmd. So, we get the error "Error: unknown command '0-9' (CMD-005)"
    • Ex 3: get_cells -regexp my_mod/m\[0-9\]+_sys/cpu/mem/a_m\[0-9\]+ => Same as above, except that we preceded "[" with the backslash metacharacter, which makes the tcl interpreter treat "[" as a literal instead of a metacharacter. So, the pattern passed to the regex is "my_mod/m[0-9]+_sys/cpu/mem/a_m[0-9]+", which is exactly what we achieved with curly braces in the 1st ex above.
    • Ex 4: get_cells -regexp {my_mod/m\[0-9\]+_sys/cpu/mem/a_m\[0-9\]+} => This is the same as above, except that we added { ... }. Compared to the 1st ex, the only diff is that we now have a backslash "\" with "[", which should cause "[" to be treated as a literal and not as a metachar. So, this should not have found any cell, as there are no cells with names "...m[0-9]..". To my surprise, it gives exactly the same answer as ex 1 above.
    • Above we proposed the use of { ... } for regexp patterns. However, when we have vars that need to be expanded, {..} becomes a problem, as the var to be substituted may have its own curly braces. In such cases, we have to get rid of the curly braces used with the regexp. See the ex below.
      • Ex: get_cells -regexp {${mod_name}/m[0-9]+_sys/cpu/mem/a_m[0-9]+} => This searches for the literal "${mod_name}/m...". It doesn't expand $mod_name with its value. We get the warning/error "Warning: No cell objects matched '${mod_name}/m...". Even if we put backslashes, as in "\$\{mod_name\}" or $mod_name, it keeps giving the same error. The only way to resolve it is to get rid of the outer {...} used with the regex. We also have to put "\" before [ and ] (as in Ex 3 above). Then the tcl interpreter is able to expand ${mod_name} and also treat [..] correctly. So, the fixed version is: get_cells -regexp ${mod_name}/m\[0-9\]+_sys/cpu/mem/a_m\[0-9\]+
  • -exact => This matches the pattern exactly, without even substituting the simple wildcards * and ?. This is helpful when * or ? is part of the name of the object itself, and you do not want it used as a wildcard.
  • -filter <expression> => the result returned is filtered against the criteria here. The expression should be put in quotes. This is different than pattern matching, as here we can match against an object's attributes. When -regexp is used, it allows regexp to be applied only to patterns (NOT to the expr in -filter). If you have to use regex in the expr of -filter, then use the "filter_collection" cmd from the SDC section. If the expr evaluates to true, the result is returned. Filter expressions are a series of relations (a relation is an attribute name compared with a value thru a relational operator) joined together with AND/OR operators. The relational operators are == (equal), != (not equal), =~ (matches pattern), !~ (doesn't match pattern), >, <, >=, <=. The existence operators are defined, undefined.
    • NOTE: =~ should be used for patterns while == is used for exact match. So, full_name == *reg* will exact match "*reg*" (won't consider * as special char, same as using option "-exact"), while full_name =~ *reg* will match for pattern "reg" (i.e my_reg, reg_1, etc will match).
    • ex: -filter {"is_hierarchical == true AND undefined(sdf) && full_name =~ *\/mod1\/*reg1*"} => matching patterns here match across hier, i.e. across "/"
    • ex: -filter "full_name =~ *mod1*reg1*" will match across hier as mod1/reg1 or names as mod2/mod1_reg1_3 etc. having backslash char before regular forward slash is not really reqd, i.e "full_name =~ */mod1/*reg1*" is equally valid syntax. curly braces is optional, but it helps to keep everything inside filter as 1 block. This way of filtering is very powerful as this is the only cmd that works to find a cell/net, etc across specific hier in a design, when you don't know the levels of hier.
  • -quiet => Suppresses warning and error messages if no objects match. Syntax error messages are not suppressed. Should not use -quiet, as it could hide some design issue.
  • -of_objects <objects> => we can also use the shorthand -of instead of -of_objects. The cmd applies only to these objects. Depending on the cmd, it will look for the collection of cells, etc. connected to the objects specified here (objects can be either names or collections of ports, pins, gates, etc.). NOTE: This is the most important option for tracing a design.

Ex: get_cells * -filter "is_hierarchical == true" => same as the filter_collection cmd above. This is more efficient, as objects are filtered out even before they are included in the collection.

A lot of commands in CAD tools return collections, while a lot of commands expect collections as arguments. Cmds that expect collections as arguments cannot be given patterns or names, or else they will error out.

Many of the above options can be used across multiple cmds in timing, synthesis and PnR tools. All options above can be used as args in below cmds.

1. DESIGN get_* cmds => get_cells, get_clocks, get_nets, get_designs, get_pins, get_ports. These cmds find these objects relative to the current instance (i.e. current hierarchy) in the current design (i.e. instances in the design, not lib cells).

  • get_cells => useful to find all cells or particular matching cells in the design, i.e. find all flops, or gates with a certain drive strength, etc. It reports all cells (i.e. modules in the design; they don't have to be leaf cells).
    • ex: get_cells top/cell1* => {"top/cell1", "top/cell12"} => this gets cells present in "top" module only. Doesn't match "A/top/cell1" as explained above.
    • ex: get_cells -hier * => gets a collection of all cells across all hier. -hier => traverses thru all hier; otherwise it searches only in the current hier and doesn't descend down. NOTE: it shows modules, etc. too.
    • ex: get_cells -hier cell1* => {"top1/cell1", "top1/cell1", "top2/mod3/cell12", ...} => this gets cells matching cell1* across all hier of design. NOTE: we can't do "get_cells -hier top1/cell1*" as explained above in common section.
    • ex: get_cells -hier {U_and2 U_or2} => returns cell {"chip/mod1/U_and2", "chip/U_and2, "chip/U_or2"} => Here it looks for exact cell name match as there are no wild card char. We can provide multiple cell names in the list, and it will look for any of the matching ones.
    • ex: get_cells -hier cell1* -filter "full_name =~ *top2/mod3/*" => {"top2/mod3/cell12", "A/B/top2/mod3/C/cell1"} => Above -hier option doesn't allow us to select specific modules to look in for that cell. This is one of the ways to get all matching cells in a certain hier. This just searches for matching name in the full name of cell returned, and if it matches, it returns those cells (* are needed in full_name pattern, else exact names will be searched for). This is the most popular form of using get_cells cmd. i.e always use "full_name" filter cmd as it always works, and is less error prone than writing complex patterns.
    • ex: get_cells -hier -filter "full_name =~ *top2/mod3/*cell1*" => This has no search pattern for cells, so searches for all cells. However, the search is limited to filter expr "*top2/mod3/*cell1*".
    • Finding all leaf cells within all hier of the design: This is tricky, as there is no "-leaf" option (unlike get_pins, which has one). By default, get_cells reports all modules/sub-modules along with leaf cells, which is undesirable. Using -filter "is_hierarchical == false" gets leaf cells only.
      • get_cells -hier * -filter "is_hierarchical == false" => this returns a collection of all leaf cells across all hier.
    • Finding specific cells within all hier of design:
      • ex: get_cells -hier * -filter "ref_name =~ SDF*" => this returns a collection of all cells which have reference cell name starting with SDF (i.e scan data flops). Basically, it gets all scannable flops in design, across all hier.
    • Finding specific cells within a given hier of design:
      • ex: get_cells -hier * -filter "ref_name =~ *SYNC_LIB* && full_name =~ I_top/pad/u_pcie/*" => This gets all cell under hier "I_top/pad/u_pcie/" whose ref name has SYNC_LIB in it. This will find cells as I_top/pad/u_pcie/cell1, I_top/pad/u_pcie/mod2/cells2, etc. i.e anything within the hier mentioned.
    • Finding all instances of a given lib cell: Ex is under  "get_lib_cells" section below.
  • get_pins => used to find all pins in the design. Pins are the names assigned to I/O ports of modules or pins of library cells. These same names are carried to pins of instances (i.e. the D pin of the "MSFLOP" library cell is also named the D pin of the "reg1" flop, which is an instance of MSFLOP). The -leaf option is unique to "get_pins". It returns only leaf (non hier) pins. Hier boundaries are crossed automatically to find such pins, so the -hier option is not allowed with the -leaf option. The pins reported are 1 driver pin and multiple load pins. If -leaf is not used, then pin names of modules and lower level modules are shown, which is similar to a net name and not of much use to us. The -leaf option is widely used to find the driver of a net. See ex below:
    • ex: get_pins o*/CP => {"o_reg1/CP", "o_reg2/CP", "o_reg3/CP"} => same pattern style as in get_cells
    • ex: get_pins "chip/mod1/U_and2/*" => returns 3 pins {"chip/mod1/U_and2/A1", "chip/mod1/U_and2/A2", "chip/mod1/U_and2/ZN"}
    • ex: get_pins -hier A1 => errors out with "Error: Nothing matched". This should have matched, as in the above ex we see there is a pin A1 on cell "chip/mod1/U_and2". Similarly, "get_cells -hier cell1" gave us valid cell names. The reason is that pins are stored along with the name of their cell as one entity, so just a pin name by itself won't match anything. So pin Z is actually I_and2/Z. So, we have to provide the matching cell name that the pin is attached to.
      • ex: get_pins -hier U_and2/A1 => this returns the valid pin {"chip/mod1/U_and2/A1"}. So, this matches our understanding that cell_name/pin_name is considered one entity for pin pattern matching. "get_pins -hier *U_and2/A1" also returns correct results, as does "get_pins -hier *A1" or "get_pins -hier */A1". This is because they all match the object "U_and2/A1". The hier before the name of the cell (U_and2) doesn't matter, as that is not considered for matching purposes; it's only the name of the immediate cell that the pin is attached to, plus the pin, that is considered. So, "get_pins -hier */U_and2/A1" or "get_pins -hier chip/mod1/U_and2/A1" returns "Error: Nothing matched", as the pin object name is U_and2/A1, so any other pattern fails matching (even though the "chip/mod1/U_and2/A1" pin exists). "get_pins -hier *U_and2*" matches all pins in all hier for cells *U_and2*, so top/I_U_and2/A1, top/I_U_and2_0/A2, U_and2/Z, etc. will match.
      • ex: get_pins -hier *DATA[*] => Here DATA[*] is a port of a lower level module. Ports of a module are considered pins (Ports are ONLY the top level input/output ports of the design). So, this will report the DATA[*] ports of all submodules. We may include the submodule name also, as part of the pin, since the pin is attached to the submodule. ex: get_pins -hier *module1/DATA[*] => returns Top/I_module1/DATA[0], I_top/.../module1/DATA[1], etc.
      • ex: get_pins -hier gate_or2/A1 -filter "full_name=~*top1*" => This is similar usage as in get_cells ex above. Matching with "full_name" allows us to select specific modules, here top1. This matches all pins in any module where the path contains "top1", i.e A/c_top11_m/gate_or2/A1 matches and is returned in results. This is useful in cases where we want pins only from a certain hier.
      • ex: get_pins -hier gate_or2/A1 -filter "full_name=~*" => This matches cell "gate_or2/A1" in any module in any hier. So, it's similar to "get_pins -hier gate_or2/A1". We could have also used *gate_or2/A1 which would have matched blah_gate_or2/A1, but we could not use */gate_or2/A1
    • get_pins -quiet -of_object $latch_coll -filter "full_name=~ */Q and defined(clocks)" => Here we are getting Q pins of all clocked latches in a collection. This is an easy way to extract any pin of cells from a given cell collection.
  • get_nets => creates collection of nets. Since the same net can have multiple names at different levels of hier, 3 additional options are provided to get the name of the net in any way desired (i.e -top_net*, -segments and -boundary_type). "-boundary_type <type>" allows 3 options of lower, upper or both (lower returns the lower level net, i.e the net inside the hier block; upper returns the higher level net, i.e the net outside the hier block; while both returns both nets). -of_objects option is required when using "-boundary_type" option. By default, the upper or higher level net is returned on a hier pin. See PT manual for more details on options.
    • ex: get_nets block1/NET* => {"block1/NET1QNX", "block1/NET2QNX"}
    • ex: get_nets top/*A/* => returns all nets in top/*A module (i.e top/MA/net1, top/NO_A/net2, etc)
    • ex: get_nets chip/mod1/U_and2/* => This errors out "Nothing matched" as leaf cells (i.e gates, IP or any lib cells) just have pins and no nets defined (assuming U_and2 is a lib cell i.e AND2_NOM_*).
    • get_nets -hier NET1* => gets NET1* net in any hier of design. So, -hier option in get_nets behaves same way as in "get_cells -hier" cmd.
    • get_nets -of_objects [get_pins $pin_collection] -top_net_of_hierarchical_group -segments => This returns the top hier net name that all the specified pins are connected to. This is useful in correlating net names in gate level to net names in RTL, as the top level net most likely has the same name in both RTL and gates. This is particularly helpful in tracing when certain pins in gate level have set_case_analysis of 0 or 1, and we want to trace that net in RTL.

NOTE: all 3 cmds above, get_cells, get_pins and get_nets, when used with option "-of_objects", can be combined to navigate a design, and are very powerful for tracing connectivity. Given any object we can trace its drivers or loads (i.e if we want to trace back the connection of a net, we can find the o/p pin connected to that net, then find the cell for that pin, then continue recursively by finding the i/p pins of that cell and the nets connected to those pins, then the pins of the next driving cell, and keep going back). A sketch of one such trace-back step is below.
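
A minimal Tcl sketch of one such backward-trace step, assuming a hypothetical net "top/net1" (it uses the same -leaf idiom shown in the examples above and below):

# find the driver pin of the net, the driving cell, then the nets on its i/p pins
set net    [get_nets top/net1]
set drv    [get_pins -of_objects $net -leaf -filter "pin_direction==out"]
set cell   [get_cells -of_objects $drv]
set inpins [get_pins -of_objects $cell -filter "pin_direction==in"]
foreach_in_collection p $inpins {
  set n [get_nets -of_objects $p]   ;# repeat the same steps on $n to keep tracing back
  puts "[get_object_name $p] <- [get_object_name $n]"
}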

Get all pins of cell: get_pins -of [get_cells [get_attribute $cell full_name]] => Here we get pins of the given cell. In short form, we can also write: get_pins -of [get_cells $cell]

Get all connected pins of a net: There are 2 ways: one shown under "PT - all cmd" section, which uses "all_connected ... -leaf". The other is the one shown here, using "get_pins -of ... -leaf".

ex: get_pins -of_objects [get_nets top/net1]  -leaf => This returns output in form {"top/inv/Z", "top/and2/A", "mod2/phy/IN1"} where one of them is driver, others are all load pins. pins may be ports on PHY or BlackBox also (as ports of PHY/BlackBox are eventual driver or load pins). If we don't use -leaf option, then only pins at hier module level are shown.

  • Get Drivers only: If we restrict ourselves to driver only, we use option: -filter "pin_direction==out".
  • Get Loads only: To restrict ourselves to load only, we use option: -filter "pin_direction==in". 
  • Get cells instead of pins: If we want to find cells instead of pins, we just do get_cells -of [get_pins $pin_collection], where $pin_collection is the collection of all pins we got above. Or we can directly write => get_cells -of [get_pins -of ... -leaf]. We can't get cells directly by using "get_cells -of_objects [get_nets top/net1]" => this will show the next level cell, which may be a module and NOT the leaf cell. So, we have to go thru finding leaf pins, and then find leaf cell names via that route.

Get all nets on a pin: set net_name [get_attribute [get_net [all_connected [get_pin $full_pin_name]]] full_name] => assigns the full name(s) of the net(s) connected to the pin to "net_name"

Finding Object type: The above cmds can be used to find out if a given object is a cell, pin, port or net (i.e if "get_cells obj_name" != "", then it's a cell). A small sketch is below.
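
A hedged sketch of such a type check (the proc name "object_type" is made up here); -quiet makes a non-match return an empty collection instead of erroring out:

proc object_type {name} {
  # probe each object class in turn; the first cmd that matches tells us the class
  foreach kind {cells pins ports nets} {
    if {[sizeof_collection [get_$kind -quiet $name]] > 0} {
      return $kind
    }
  }
  return "unknown"
}
# ex: object_type top/net1 => nets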

  • get_ports => displays all ports of current design or current instance. Ports are I/O ports of design at current level. This cmd doesn't have -hier option, as there are no ports at lower hierarchical level for a design.
  • get_clocks => creates collection of clocks. ex: set clock_list [get_clocks "PHI*" -filter "period==15"] => collects all PHI* clocks whose period is 15.
  • get_timing_arcs => Creates a collection of timing arcs. This is same as get_lib_timing_arcs explained below, except that here it reports arcs for cells in design. Arcs would be the same as those reported for lib cells by using get_lib_timing_arcs. This cmd is useful in situations where we want to see the arcs on instance itself, instead of trying to find the lib cell for that instance.
  • get_timing_paths => creates collection of paths for custom reporting. It has a lot of options, which differ from the std options for the get_* cmds above; most of them are the same as those for the report_timing cmd. We most commonly use report_timing to report paths satisfying certain criteria; however, to automate operations on these paths, it is not easy to work with report_timing output. get_timing_paths allows any kind of custom manipulation/reporting for any of these paths, and lets us get any attribute of these paths (i.e slack, etc). Widely used when automating scripts for large designs to report paths.

2. LIB get_lib_* cmds: These cmds are the same as the design cmds in 1 above, except they find this info from inside the libraries (libs loaded into the CAD tool, irrespective of whether they are used or not), and not from the design. -hier option is not supported for any get_lib_* cmd. We can use option -of <object_name> to get lib data of specified objects (this may be the name of a libcell in a library, or an instance name, depending on the cmd), or just provide the pattern w/o -of, in which case it'll return the libs/lib_cells matching the pattern (directly from the .lib file)

  • get_libs => Creates a collection of libraries in memory whose name matches the pattern. These library names are the ones mentioned in .db files under "library(...)". "-of <lib_cell_name>" option provides the lib corresponding to the given lib cell.
    • ex: get_libs => {"TSM_W_125_2.5V", "TSM_S_-25_3.6V", ...} => NOTE: lib names may also come with a .db extension (depending on the name provided in library(...) of the .db file). This .db doesn't mean it's the file name; it's still the name of the library.
      • Cadence Genus: get_libs
    • get_libs -of */tsmc_INV_D2 => This will give us the library of tsmc_INV_D2 stdcell. o/p will be {"TSM_W_125_2.5V"} or whatever libs we have which have cell "tsmc_INV_D2" in them
    • ex: get_libs *S_-25* => This will list all libs matching this name. NOTE: libraries are listed in o/p, but we have to query the collection to display all libraries. To display all libs directly (for viewing purpose), use list_libs.
    • list_libs => lists all libs read into PT. This is output as a list, so easier to be read. No options provided except -only_used which lists only those libs that are linked to design. M=>max lib, m=>min lib, *=>main lib.
      • ex: list_libs -only_used => Lib1 /db/.../lib1_PVT.db: Lib1    Lib2 ...   and so on. So, it shows all libs along with paths, as well as which are min/max. So, more commonly used.
  •  get_lib_cells (or get_lib_cell) => Creates a collection of library cells from libraries in memory whose name matches the pattern. These lib cells are the ones with string "cell(<cell_name>) { ... }". This string is present within library section (shown in get_libs above) of .db file. It is similar to get_cells, except that it displays cell definition (cell library name), and not cell instances (instance name used in design). Usually get_cells cmd is used as an input to get_lib_cells cmd, so that it's in proper format.
    • ex: get_lib_cells CGP* =>  prints all cells starting with CGP for all libraries => o/p in DC/EDI = TSM_W_125_2.5_CTS.db/CGPT80    TSM_S_-40_3.6_CTS.db/CGPT80 ....
      • Sometimes "get_lib_cells *" says "no matching cells found" as it's looking for cells not inside .db files but at top hierarchy. In order to get cells inside .db lib, we have to specify the library name. See next example:
    • ex: get_lib_cells TSM_W*/CGP* => prints all cg cells in all libraries starting with TSM_W =>o/p is diff for Synopsys vs Cadence tools
      • o/p in DC/EDI = TSM_W_125_2.5_CTS.db/CGPT80 TSM_W_125_2.5_CTS.db/CGPT40 TSM_W_125_2.5_CTS.db/CGP80 TSM_W_125_2.5_CTS.db/CGP40,
      • o/p in RC = /libraries/TSM_W_125_2.5_CTS.db/libcells/CGP40 ...
    • ex: get_lib_cells */*myphy* => This shows all *myphy* cells in any library. 1st * refers to any library(...)
    • ex: get_lib_cell -of_object Top/abc/my_an2 => When we use -of, we get the lib_cell corresponding to that object (the object should be a cell instance name). This shows the lib cell used for that gate instance. o/p is {"TSM_W_125_2.5_stdcell/AN2"}. This is a very powerful cmd, used in any design to find out which libcell is being used for gates reported in timing paths. "-of" or "-of_objects" is required, else it errors out with "nothing matched", as it's then looking for lib cells with that name (in the absence of -of, it becomes like ex 1, 2 and 3 above). We can use quotes too for the object, i.e "Top/abc/my_an2"
    • ex: get_lib_cells -of [get_cells -hier *phy*] => same as above, except here we used -hier option to look in all of design for that phy cell. This is helpful when we just know the instance name of the phy, but don't know where in design is it. get_cells gets the o/p in proper format to be passed to get_lib_cells.
    • Finding all instances of a given lib cell: ex: get_cells -of TSM_2.5/AN2 => This shows all cell instances in design which are using lib cell TSM_2.5/AN2. So, this is reverse of what we use get_lib_cells for. get_lib_cells gives lib_cell for a given instance, while get_cells gives all instances in design for a given lib cell.
  • get_lib_pins => Creates a collection of library cell pins from libraries in memory whose name matches the pattern. It is similar to get_pins, except that it displays cell library pins, and not cell instance pins.
    • ex: get_lib_pins -of_objects o_reg1/Q => returns {"misc_cmos/FD2/Q"} => this shows how lib pin is used by pin in netlist. As can be seen, it's same pin name "Q" in both lib cell and instantiated cell
    • ex: get_lib_pins misc_cmos/AN2/* => returns {"misc_cmos/AN2/A", "misc_cmos/AN2/B", "misc_cmos/AN2/Z"}. NOTE: /* is needed at end to get all pins. If we just do "get_lib_pins misc_cmos/AN2" then we'll get an error "can't find lib pin misc_cmos/AN2" since it's a cell and not a pin.
  • get_lib_nets => There is no counterpart of "get_nets" cmd above, for libcells, as lib cells don't have nets accessible inside them.
  • get_lib_timing_arcs => Creates a collection of library arcs. This is very important to check that all arcs present in .lib or .db show up here correctly for all cells. Every arc reported here is an arc with "timing_type" under the timing section of that pin. So, if we find that the D pin of a Flop has 8 arcs in the .lib file, we should see 8 arcs in this collection for the D pin. The options supported here are slightly different from the standard options above. They are -to, -from, -of_objects, -filter, -quiet.

Ex: Since get_* cmds give collections, we will need to process the collection to display attributes of interest. The below proc (taken from the Synopsys website) gets all relevant info from each arc to display it nicely. Attributes store the arc info. The various attributes of lib timing arcs are: from_lib_pin, to_lib_pin, sense, sdf_cond, mode, object_class (=lib_timing_arc for all lib arcs), is_disabled, is_user_disabled. sdf_cond and mode may not exist in all libraries, so comment them out if not needed (else you may see a lot of warnings)

proc show_lib_arcs {args} {
  # pass all proc args straight to get_lib_timing_arcs
  set lib_arcs [eval [concat get_lib_timing_arcs $args]]
  # header: 6 columns to match the data row printed below (last one is is_disabled)
  echo [format "%15s %-15s %18s %18s %18s %18s" "from_lib_pin" "to_lib_pin" "sense" "sdf_cond" "mode" "is_disabled"]
  echo [format "%15s %-15s %18s %18s %18s %18s" "------------" "----------" "-----" "--------" "-----" "-----------"]
  foreach_in_collection lib_arc $lib_arcs {
    set fpin [get_attribute $lib_arc from_lib_pin]
    set tpin [get_attribute $lib_arc to_lib_pin]
    set sense [get_attribute $lib_arc sense]
    set is_disabled [get_attribute $lib_arc is_disabled]
    set sdf_cond [get_attribute $lib_arc sdf_cond]   ;# comment out if lib has no sdf_cond
    set mode [get_attribute $lib_arc mode]           ;# comment out if lib has no mode
    set from_lib_pin_name [get_attribute $fpin base_name]
    set to_lib_pin_name [get_attribute $tpin base_name]
    echo [format "%15s -> %-15s %18s %18s %18s %18s" $from_lib_pin_name $to_lib_pin_name $sense $sdf_cond $mode $is_disabled]
  }
}

pt_shell> get_lib_cell -of_object Top/u_SPI_PAD => {"tsm_1p6v_m25c/LOW_PAD_V"} => This gives the PAD cells being used in design.
pt_shell> show_lib_arcs -of_objects tsm_1p6v_m25c/SDF_SVT => getting lib arcs for SVT flop: 2 arcs on D pin for setup/hold, 2 min_pulse_width arcs for clk pin, and 1 c2q arc for Q pin. "from_lib_pin" is the related_pin in the lib, while "to_lib_pin" is the real pin whose arc is defined wrt the related_pin.

   from_lib_pin -> to_lib_pin   sense                    sdf_cond             mode
   ------------    ----------   -----                    --------             ----
   CP -> D         hold_clk_rise            cond1                etm  => n such arcs if n diff sdf conditions and/or modes exist. "timing_type" is "hold_rising" on D pin (related_pin is CP) in .lib file
   CP -> D         setup_clk_rise           ncond1               etm  => timing_type : setup_rising
   CP -> CP        clock_pulse_width_high   cond2                etm  => n such arcs if n diff sdf conditions and/or modes exist. timing_type : min_pulse_width, rise_constraint
   CP -> CP        clock_pulse_width_low    cond2                etm  => timing_type : min_pulse_width, fall_constraint
   CP -> Q         rising_edge              D==1'b1&&SE==1'b1    etm  => It's "timing_sense : negative_unate" and "timing_type : rising_edge" in .lib. n such arcs if n diff sdf conditions and/or modes exist

pt_shell> show_lib_arcs -of_objects tsm_1p6v_m25c/INV_LVT => getting lib arcs for LVT INV

   from_lib_pin -> to_lib_pin   sense                    sdf_cond             mode
   ------------    ----------   -----                    --------             ----
   I -> ZN         negative_unate           n/a                  n/a  => Here sdf_cond and mode are not defined for the inverter cell. There is only 1 arc for an inv in .lib: "timing_sense : negative_unate" and "timing_type : combinational"

NOTE: see report_lib cmd below, which is an alternative cmd to display the same info in user readable format.


report_* cmds: Since the get_* cmds above return collections, it's not easy to view them unless we write a proc to go thru the collection. So, we have corresponding report_* cmds to display the output in user readable format. Option "-nosplit" is supported for all of these report_* cmds; it prevents line splitting, so reports are easier to read and easier for other tools to parse.

3. Design report_* cmds: These are report cmds for the corresponding design get_* cmds. report_cell and report_hierarchy are the most used ones, as they give info about all cells along with the references used.

  • report_cell => This cmd shows cell info like reference lib cell, the library it comes from (though it doesn't show the path of the library file), area, min/max OC (operating conditions) used, rail voltage, etc. By default, it shows info for the current design. However, we can specify an instance name to generate info for only that instance. The lib shown is the link_lib, while the OC shown is from inside the link lib. However, sometimes the lib shown may not be the link lib, although the OC shown may be from the link_lib (this happens in the PT DSLG scaling flow where multiple libs are loaded; look into that section). A few more report_cell_* cmds are supported.
    • ex: report_cell => reports info of all cells in current design. It doesn't go down the hier. Most of the times, our "current_design" is top level, so it gives info for top level design and blocks/cells at top level
    • ex: report_cell I_top/X/an2 => It shows ref cell, library and Max/Min OC, etc for cell specified. More than 1 cell can be specified via double quotes "I_a/an2 I_3/Y/nr3 ...".
    • ex: report_cell -verbose -connections => shows all above info for current design + connections of pins of cells and pins/nets they connect to. So, the driver/load pins of other cells connecting to cells of current design are shown. 
  • report_reference => displays all references in the current design (whatever current_design is set to). It doesn't go down the hier. This doesn't have any more options (except -nosplit). This cmd is not used much, as "report_cell" already does more than what this cmd does. NOTE: the equiv DC cmd "report_reference" has different syntax and allows -hier option. Look in DC note.
  • report_hierarchy -full => displays references for all leaf cells in the current design (it goes down the hier). So, this cmd is very useful to see the whole hierarchy and the ref cells being used at the lowest leaf cells. Option -noleaf omits reference cell info for leaf cells.
  • report_design => Displays attributes of the current design. Very useful cmd to see all libs, op cond etc used for current design.
  • report_port => Displays summarized info (maxcap, maxload, maxfanout, drive res, i/p transition time, driving cell, i/o-o/p delay, etc) about all ports in the current design. Lots of options supported
  • report_net => Just as in report_port, it reports info about all nets in current design. Lots of options supported
  • report_clock => shows all clock info. Explained in "PT clk cmd" section. Few more report_clock_* cmds supported.

4. Lib report_* cmds: report_lib and report_lib_groups are the only 2 cmds supported here. These are report cmds for the corresponding get_lib_* cmds. As mentioned above, the get_lib_timing_arcs cmd is cumbersome for displaying all arcs. An easier way is to use report_lib.

  • report_lib lib_name => this displays all lib cells for that library (We provide the name listed in library section of .lib file). To get more info about few lib cells, provide names of cells too.
    • ex: report_lib TSM_W_5Vt {AN2_SVT OR2_ULVT} => reports detailed info about these 2 cells from named library. Omitting names of lib_cells, reports all cells.
    •  ex: report_lib -timing_arc tsm_1p6v_m25c {SDF_SVT} => To get details on timing or power arcs, use -timing_arcs (or -timing) or -power_arcs (or -power)

                              Arc                      Arc Pins
   Lib Cell  Attributes    #  Type/Sense               From   To    When
   ----------------------------------------------------------------------------
   SDF_SVT
                 s         0  hold_clk_rise            CP     D     !SE&SDI => multiple arcs may exist for multiple sdf conditions
                           2  setup_clk_rise           CP     D     !SE&SDI
                          12  clock_pulse_width_high   CP     CP    SE&SDI  => for timing_type "min_pulse_width" rise constraint
                          27  clock_pulse_width_low    CP     CP    SE&SDI  => for timing_type "min_pulse_width" fall constraint
                          31  rising_edge              CP     Q     !SE

 

  • ex: report_lib -timing TSM_W_150_1.75_CORE.db {DTB10} => cell DTB10 is a flop and it has both PREZ/CLRZ with Q/QZ o/p
    Total 22 arcs (#0 thru #21) below:
                                Arc                   Arc Pins
       Lib Cell  Attributes    #  Type/Sense      From        To         When
       ----------------------------------------------------------------------------
       DTB10         s     0  clock_pulse_width_low            => min_pulse_width_low on pin (CLK)
                                                  CLK         CLK
                               1  clock_pulse_width_high        => min_pulse_width_high on pin (CLK)
                                                  CLK         CLK
                               2  removal_rise_clk_rise        =>  removal_rising on pin=PREZ, related pin=CLK
                                                  CLK         PREZ    
                               3  recovery_rise_clk_rise          =>  recovery_rising on pin=PREZ, related pin=CLK
                                                  CLK         PREZ
                               4  hold_clk_rise   CLK         D     => hold_rising on pin=D, related pin=CLK
                               5  setup_clk_rise  CLK         D     => setup_rising on pin=D, related pin=CLK
                               6  removal_rise_clk_rise          => same as for PREZ pin
                                                  CLK         CLRZ
                               7  recovery_rise_clk_rise        => same as for PREZ pin
                                                  CLK         CLRZ
                               8  rising_edge     CLK         QZ    => rising_edge on pin=QZ, related pin=CLK
                               9  rising_edge     CLK         Q     => same as for QZ above
                               10 clock_pulse_width_low              => min_pulse_width_low on pin=PREZ
                                                  PREZ        PREZ
                               11 nonseq_hold_rise_clk_rise        => non_seq_hold_rising on pin PREZ, related pin=CLRZ
                                                  PREZ        CLRZ
                               12 nonseq_setup_rise_clk_rise    => non_seq_setup_rising on pin PREZ, related pin=CLRZ
                                                  PREZ        CLRZ
                               13 preset_high     PREZ        QZ    => preset type on pin QZ, related pin=PREZ (positive unate)
                               14 clear_low       PREZ        QZ    => clear type on pin QZ, related pin=PREZ (positive unate)
                               15 preset_low      PREZ        Q    => preset type on pin Q, related pin=PREZ (negative unate)
                               16 nonseq_hold_rise_clk_rise        => same as for PREZ pin
                                                  CLRZ        PREZ
                               17 nonseq_setup_rise_clk_rise    => same as for PREZ pin
                                                  CLRZ        PREZ
                               18 clock_pulse_width_low        => same as for PREZ pin
                                                  CLRZ        CLRZ
                               19 preset_low      CLRZ        QZ    => preset type on pin QZ, related pin=CLRZ (negative unate). same as 20 except it's on QZ pin
                               20 preset_high     CLRZ        Q    => preset type on pin Q, related pin=CLRZ (positive unate). Here, PREZ and CLRZ are both low, giving Q=0 (assuming CLRZ has higher priority). If CLRZ goes high, then Q goes high (due to PREZ being low)
                               21 clear_low       CLRZ        Q    => clear type on pin Q, related pin=CLRZ (positive unate)

 

5. misc report_* cmds: There are 100's of misc report_* cmds to report almost anything about the design. Look in the DC or PT manual for syntax of these cmds. Few imp ones:

  • report_timing => most useful cmd. It's explained under it's own section "PT report_timing cmd"
  • report_activity* => reports various activity

6. attribute related cmds: These are report_attribute, get_attribute, etc. These are explained in "SDC" section.


PT path timing cmds:

Most important info that we get from running a timing tool on the design is whether all the paths in the design meet timing requirements. There are different types of timing checks that need to be met for each path, i.e setup, hold, recovery/removal, etc. We have 2 kinds of cmds to show us the timing paths. We saw under "PT: object access functions" section that get_* and report_* are the 2 kinds of cmds that allow us to access and report objects. For timing paths, we have both cmds available:

1. report_timing cmd: This is for reporting path timing. It is meant for visual reporting; its text output is not convenient for use inside scripts.

2. get_timing_paths cmd: This gets the timing paths as a collection of objects, which can then be used inside a script to extract parameters of interest.

report_timing => most powerful cmd to see details of a timing path. This is the cmd you will use the most in any timing tool to debug paths and check timing, so it's covered in detail in its own section. Max 2M paths are shown (if you specify a value >2M in any of the options below, it will error out or won't be honored).

Syntax: report_timing <options> <path_collection>

Options: There are lots of options available. See PT manual for full syntax. Here's the important ones.

  • -from/-to/-through: Options here same as in timing exception cmds (see that section) where we specify start/end point (pins, ports, nets, seq cells or clocks) of the path.  Can also specify direction as -rise_from, -fall_from, -rise_to, -fall_to, -rise_through, -fall_through.
    • If multiple objects are specified in -through option, then all paths passing thru any of the objects are reported. Ex: report_timing -from A1 -through {B1 B2} -through C1 -to D1 => this cmd gets a path that starts at A1, then passes through either B1 or B2, then passes through C1, and ends at D1.
  • -path_type < summary | end | short | full | full_clock | full_clock_expanded > => summary shows only startpoint, endpoint and slack; useful for a quick overview of a path. The full_* options are the same as those for the deprecated -path option below.
    • This old format is deprecated: -path < full | full_clock | full_clock_expanded > => default is full (meaning full clock paths are not shown). Use full_clock_expanded to see the full clk path (including the generated clk point). -full_clock and -full_clock_expanded differ only when the clk is a generated clock. Since -path is deprecated, use -path_type shown above.
  • -delay_type min|max => min=hold, max=setup (min_max reports both min and max timing). -delay (used previously) looks deprecated as of PT 2018, so use -delay_type.
    • Default is to show "max" (i.e setup) paths if no -delay_type option is provided. So, if we want to see hold paths, we have to specify "-delay_type min" (and "-delay_type max" for setup paths). Use -delay_type min_max to show both setup and hold arcs in the same report. Values max_rise/max_fall/min_rise/min_fall can be used to report paths which are only rising/falling at the data path endpoint.
  • -nets -cap(or -capacitance) -tran(or -transition_time)=> to show these values in timing reports. -voltage shows voltage (when we have Multi Voltage design, it's useful). -derate, -variation are used to report when we have derate/variations applied to PT runs.
  • -input_pins / -include_hierarchical_pins => Option -input_pins shows input pins in the path report; by default, the report shows only output pins. This is sometimes helpful when we want to know the i/p pin as well as the o/p pin of a gate, to check the arc in .lib. -include_hierarchical_pins shows hier pins crossed, as well as all leaf pins in the path (these show as 0 incremental delay).
  • -sort_by group|slack => By default, paths are sorted by group, and within each group, by slack (i.e if multiple groups present, all paths for grp 1 will be shown first, then paths for group 2 and so on). 
  • -nworst : number of paths to report per endpoint (default=1). We need this option when we do report_timing to a particular endpoint and want to see all failing paths thru it. We usually set -nworst to something large so that we can see all the worst violating paths, irrespective of whether they are to the same endpoint or not. Otherwise we'll only see 1 failing path to each endpoint (even if paths come from many different startpoints).
  • -max_paths : number of paths to report per path group. (default=1)
  • -start_end_pair => by default, only 1 worst path is reported (as max_paths=1 by default). This option instead reports the single worst path for each startpoint-endpoint pair. It can result in a large num of timing paths reported, so use -slack_lesser_than to limit paths. We can't use this option with -nworst, -max_paths, -unique_pins or -slack_greater_than (as -start_end_pair is meant to be used when we just want to do report_timing and don't want to clutter the report with similar paths).
  • -unique_pins => This reports only the single worst timing path thru any given sequence of pins. As an ex, we might have the same seq of pins repeated for diff paths, as rise/fall may change on different pins, so multiple such paths are possible (I think only 2 paths are possible for a unique start and end point: one with flop o/p rising at start and the other with flop o/p falling at start. All other rise/fall directions are determined by gate polarity). This option avoids displaying such duplicate paths which differ only in unateness. It is especially useful when we have non-unate logic such as an XOR gate, since that can result in a large number of rise/fall combinations, causing report_timing to show 100's of paths (since an XOR gate's o/p depends on the value of its other pins too, it may act as inverting or non-inverting, so all possible cases have to be considered).
  • -false => usually we provide list of false paths to PT to remove those paths from timing consideration. By using this option, we can also have PT automatically detect false paths based on the logic configuration of the design. This option is rarely used, as we don't want to rely on the tool to figure out FP for us.
  • -exceptions all => reports timing exceptions that apply to that path (to see why a certain path is unconstrained etc). An exception may be a false_path, multicycle path or min/max delay. This is helpful when you have unconstrained endpoints in the design: use this option on that path to see why the path is unconstrained. Other choices are "dominant" and "overridden". We use "all" as that shows all exceptions that made the path unconstrained.
  • -slack_greater_than/-slack_lesser_than => used to show paths within particular slack range. Default is to show paths with slack < 0. To show paths with all slacks, use "-slack_lesser_than infinity"
  • -include_hier => reports timing with all hier included for a given net. Normally, if a net traverses 3 hier, we see only 1 line in the report; with this option the net shows up 3 times, with each hier shown on a separate line. This is sometimes easier to debug, as we may know the net name at a certain hier only; in such a case, printing all hier helps.
  • -group {*gating} => shows paths only for that path group. Use -group [get_path_groups *] to get reports for all path groups, where worst path for each path group is reported. If we don't use this option, then by default only the worst path in the whole design is shown. Usually we use it when we have multiple clock groups, and we want to see path associated with particular clock, but we don't know the start or end point.
  • -crosstalk_delta => reports annotated delta delay and delta transition time which are computed during crosstalk SI analysis. Using this option does not initiate crosstalk analysis.
  • -nosplit => this option is same as that used in other PT cmds, where it prevents line splitting. This way everything is printed in 1 line (by default, PT creates new line when text can't fit in 1 line, which becomes an issue for scripts that parse reports)
  • -normalized_slack => Sometimes we may want to report paths by normalized slack instead of raw slack. This option allows us to do that. To enable normalized slack analysis, set the timing_enable_normalized_slack variable to true before running timing analysis. The rationale behind normalized slack is this: say we have a few paths on a clk running at 100MHz and a few other paths on a clk at 1GHz. If both paths are failing by 0.1ns, it's a lot more expensive to fix the 1GHz path, as its 1ns period would have to be relaxed by 10%, while the 10ns period of the 100MHz path needs to be relaxed by only 1%. So, instead of ordering paths by the raw slack number, it's more beneficial to list them by a slack that takes the impact of clk freq into account. PT reports normalized slack = slack/allowed_path_delay. Allowed path delay (aka normalized delay) is 1 cycle of clk for a full cycle path, 0.5 cycle for half cycle paths, n cycles for an MCP, etc. Normalized slack is a decimal from 0 to 1; e.g., 0.2 implies that the clk period will need to be relaxed by 20% to pass timing.
  • -pba_mode < none | path | exhaustive > => specifies path based timing analysis modes. There are 2 kinds of timing analysis (Link: https://solvnet.synopsys.com/retrieve/012134.html):
    1. Graph based (GBA, default): looks at worst case i/p edge rate and load on a cell, and picks up the appropriate delay to use. Option <none> enables this mode.
      • Why do we even have GBA in STA tools? The answer is => for faster run times. Let's take an ex: if we have a nand gate, we should time both the paths from A1->ZN and B1->ZN with their own slew rate and delays. This is the correct thing to do. Then at o/p ZN, we will get 2 different values of slew and delay, depending on whether the path came from pin A1 or pin B1. Now let's assume that downstream of Nand gate, we have a buffer, which also has an arc from I->Z. Now pin I of this buffer can have 2 possible slew values depending on whether it came from A1 pin of Nand gate or B1 pin of Nand gate. Then o/p Z of buffer can also have 2 possible slew values and 2 possible delay values (corresponding to each slew). If the upstream nand gate was 100 i/p gate, then this buffer would have 100 possible slew/delay values (one for each path). If STA tools started storing slew/delay values for each upstream path for a given arc, then we would have 1000's of values stored per arc, and STA will take forever to run. To help STA run faster, we store only 1 slew and delay value for each arc (in reality, we store 2 values: 1 for min corner, and 1 for max corner). We stamp these min/max values for each arc in design. 1 slew value will give only 1 delay value, since delay of a timing arc is dependent on i/p slew and o/p load, both of which are fixed (i.e have only 1 value). In order to ensure that the design will time conservatively, we choose the worst case slew at the input pins of a gate, and stamp that worst slew on all i/p pins. That gives us a single o/p slew rate, that we use to propagate downstream. In ex above, we stamp these worst case min/max slew values on buffer/nand_gate timing arc. These worst slews are now propagated for all i/p pins of a gate, even though in silicon, these worst slews won't happen for all the paths. But that's the price we pay to speed up STA.
    2. Path based (PBA): This looks at all the paths in question, and figures out which path gives the worst timing. In this case, if path x has larger delay but better edge rate than path y, which has smaller delay but much worse edge rate, PBA will analyze both paths separately, and pick the worst one before moving to the next cell. In Graph based mode, to analyze path x, it would just have picked the worst edge rate from path y (and stamped that on to the i/p pin of the cell for path x), and the larger delay from path x (using that slew and delay to calculate delay thru the cell), even though this combination may never happen. Option values <path> and <exhaustive> select this mode. <path> performs path-based analysis on paths after they have been gathered by graph-based analysis; it's faster, but the worst case may not be reported. <exhaustive> performs an exhaustive path-based analysis to determine the truly worst-case paths in the design (after doing the recalc). This is the most accurate and most computation-intensive mode. You cannot use the exhaustive mode together with the -start_end_pair, -cover_design, or path_collection options. We always run with -pba_mode path first, and if that makes the path pass timing, then we are good. If it still fails, then we use exhaustive mode.
      • NOTE: For Cadence ETS, cmd for pba mode is "report_timing -retime path_slew_propagation"

Ex:

  • report_timing -from A/CP -to B/D -delay_type min => report timing for the top failing hold path.
  • report_timing -group [get_path_groups *] -path_type summary => This reports 1 path for each path group in STA with summary only. i.e it only shows SP, EP and Slack for top path in each path grp.
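A few more typical combinations of the options above (pin/instance names are hypothetical):

  • report_timing -delay_type min_max -nets -input_pins -tran -cap -nosplit => worst setup and hold paths in one report, with nets, i/p pins, transitions and caps annotated. -nosplit keeps each line whole for parsing.
  • report_timing -to [get_pins B/flop2/D] -nworst 20 -slack_lesser_than infinity => up to 20 worst paths ending at this D pin, reported even if they have +ve slack.
  • report_timing -from [get_pins A/flop1/CP] -pba_mode path => recalculates the GBA-collected worst path using path based analysis (see -pba_mode above).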

Path collection: This is used with the -from/-to/-through options, where we specify start, thru or end points. As we saw in the section on "PT timing exception cmd", we can specify cell names for start/end points. Then, all paths from all pins of the start cell to all pins of the end cell are considered. We know that the valid start point is the CLK pin of the launch flop and the valid end point is the Data pin of the capture flop for a reg-to-reg data path. But a cell has 4-5 pins, e.g. Clk, Data, Set, Reset, Out pin Q, etc. Since only one of these pins is a valid start/end point for a given flop, PT warns that there are multiple invalid start/end points and that it'll be ignoring those.
Ex: report_timing -from sync_reg -to tsd_latch => PT warns that of 5 pins in start_point of sync_reg (Dff10 has PREZ,CLK,D,Q,QZ pins), 4 are invalid start points. CLK of Dff10 is the only valid start point. PT also warns that of 4 pins in end_point of tsd_latch (LAT10 has SZ,RZ,Q,QZ pins), 2 are invalid end points. SZ/RZ of LAT10 are the only valid end points. For PT false paths, start point should always be CLK and endpoint should always be D.

IMP: Synthesis tools such as DC and VDIO don't warn about invalid start/end points when cells are provided in the collection list. They just ignore the constraint if it doesn't conform to a valid start/end point. VDIO/ETS reports may show startpoints as Q or D, but when false pathing, we should always write them as from CLK, or they might get ignored.

 

Correlate timing shown in report_timing with that in lib files: Sometimes we want to check if the delay reported by report_timing for a certain cell matches what is expected from the delay of that cell in the .lib file. To do this, we do "report_timing" for the path of interest, then run the following cmds:

  • Get name of lib cell for cell of interest: use get_lib_cells cmd.
    • pt_shell> get_lib_cells -of [get_cells chip/gen/U134] => {"TSM_SS/OAI21_LVT"} => This is the lib cell used.
  • Get name of .lib file where that lib cell is: use get_attr cmd.
    • pt_shell> get_attr [get_lib TSM_SS] source_file_name => /home/.../TSM_SS.db => This is the name of the file where that lib cell resides.
  • Report pin to pin delay for the cell
    • pt_shell> report_delay_calculation -from chip/mod1/A1 -to chip/mod1/Z => this shows details of how cell delay was calculated (at given i/p transition and o/p load)

 

report_clock_timing => This cmd is specifically for showing detailed rpt on clks. It shows clock timing info summary, which lists max/min of skew, latency and transition time over given clk n/w.

  • report_clock_timing -type summary -clock [get_clocks *] => lists clock timing info summary, which lists max/min of skew, latency and transition time over given clk n/w.
  • report_clock_timing -type skew -setup -verbose -clock [get_clocks *] => This gives more detailed info about given clk attr (over here for skew). By default, the report displays the values of these attributes only at sink pins (that is, the clock pins of sequential devices) of the clock network. Use the -verbose option to display source-to-sink path traces.
  • report_constraints -min_period => This cmd also reports all min period viols. Put this cmd in some PT section. FIXME
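  • report_constraints -all_violators => reports all constraint violations (setup/hold as well as design rules like max_transition/max_capacitance) in one summary; handy as a quick health check of the whole design.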


PT reporting style:

PT reports timing for clk and data path in 2 separate sections. 1st section "data_arrival_time" refers to data path from start point, while 2nd section "data_required_time" refers to clk path of end point. 1st section shows path from clk to data_out of seq element and then thru the combinational path all the way to data_in of next seq element, while 2nd section shows primarily the clk path of final seq element, ending at clk pin. Towards the end of 2nd section, it shows the final "data check setup time" inferred from .lib file for that cell.

Reports are shown per stage. A stage consists of a cell together with its fanout net. So, the transition time reported is at the i/p of the next cell, and the delay shown is the combined delay from the i/p of a cell, thru the cell to its o/p, and thru the net to the i/p of the next cell. & in the report indicates annotated parasitic data.

Ex: report_timing -from reg0 -to reg1 => a typical path from one flop to another flop
Point Incr Path
------------------------------------------------------------------------------
clock clk_800k (rise edge) 1.00 1.00 => start point of 1st section
clock network delay (propagated) 3.41 4.41
.....
Imtr_b/itrip_latch_00/SZ (LAB10) 0.00 7.37 r
data arrival time 7.37
-----
clock clk_800k (rise edge) 101.00 101.00 => start point of 2nd section (usually starts at 1 clk cycle delay, 100 ns is the cycle time here)
clock network delay (propagated) 3.85 104.85
.....
data check setup time -0.04 105.76 => setup time implies wrt clk, data has to setup. So, we subtract setup time from .lib file to get data required time (as +ve setup time means data should come earlier)
data required time 105.76
------------------------------------------------------------------------------
data required time 105.76
data arrival time -7.37
------------------------------------------------------------------------------
slack (MET/VIOLATED) 98.39

 

Async paths in DC vs PT:

Async paths are paths ending at a clear or preset pin instead of the D pin. These paths are covered differently than regular data-clock paths. DC (Design Compiler from Synopsys, used for synthesizing RTL) and PT treat these paths differently when reporting.

2 kinds of Async paths:

  • recovery/removal checks: PT performs this check by default, but DC neither analyzes nor optimizes these paths.
    • DC: To analyze and opt these paths in DC, use this: set enable_recovery_removal_arcs true
    • PT: To disable these paths in PT, use this: set timing_disable_recovery_removal_checks true
  • Timing paths thru asynchronous pins (i.e paths flowing thru set/reset pins to Q/QZ o/p pin of the flop and then setting up to clk of next flop as D pin, these are clear/preset arcs in .lib file) : by default neither PT nor DC report these paths.
    • DC: To report these paths in DC, use this: -enable_preset_clear_arcs (in report_timing cmd). Even with this option, we can only view these paths during reporting; DC never optimizes these paths.
    • PT: To report these paths in PT, use this: set timing_enable_preset_clear_arcs true (default is false)

NOTE: For recovery/removal paths, use Q of 1st flop as startpoint and CLRZ/PREZ of next flop as end point. For some reason, using CLK of 1st flop as startpoint doesn't work.
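
A hedged ex of reporting such a path (instance names regA/regB are hypothetical):

  • report_timing -from [get_pins regA/Q] -to [get_pins regB/CLRZ] -delay_type min => reports the removal check path ending at the async pin. Use -delay_type max for the recovery check.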

Latch based borrowing:

In latch based paths, borrowing occurs if data arrives in the transparency window. See PT doc (page 36-38). So startpoints may be from D, Q or CLK pins of Latch. CLK and Q startpoints are treated as starting from the clk of latch, while D is treated as starting from the D i/p of latch and going thru the latch to Q o/p pin of latch. Note, this behaviour is different when VDIO/ETS is used to report such paths. In VDIO/ETS path from startpoint D is still the same, but paths from CLK and Q startpoints are treated as worst case slack paths from D, CLK or Q.

To report timing for such paths and see the latch borrowing, use -trace_latch_borrow (otherwise only the startpoint delay is shown at D, without the actual path to D)
Ex: report_timing -from ... -to ... -trace_latch_borrow

Ex: of time borrowing path
Point Incr Path
---------------------------------------------------------------
clock spi_stb_clk (rise edge) 55.00 55.00 => start point
...
data arrival time 56.55 => this is data path delay from 1st flop.
---
clock clk_latch_reg (rise edge) 1.00 1.00 => end point
...
Iregfile/tm_bank0_reg_9/C (LAH1B) 3.66 r => this is total delay to clk of latch
time borrowed from endpoint 52.89 56.55 => since data comes much later than rising edge of clk, we borrow the difference (56.55-3.66=52.89) from this latch, so that next path from latch o/p will consider that D->Q delay is 52.89ns.
data required time 56.55
---------------------------------------------------------------
data required time 56.55
data arrival time -56.55
---------------------------------------------------------------
slack (MET) 0.00

Time Borrowing Information
---------------------------------------------------
clk_latch_reg nominal pulse width 100.00 => this is width of clk pulse = clk_period/2
clock latency difference -0.36 => this is the reduction of pulse width due to the difference in latency of the rising and falling edges
library setup time -0.26 => this is setup time of latch (wrt falling edge of clk) which needs to be subtracted, as max time available is reduced by this
---------------------------------------------------
max time borrow 99.37 => final max time available to borrow
actual time borrow 52.89 => actual time borrowed < max available. so timing met
---------------------------------------------------


get_timing_paths => A direct counterpart of report_timing is get_timing_paths. This cmd creates a collection of timing paths for custom reporting or other processing. All options are the same as for report_timing. The order in which paths are returned from get_timing_paths matches the reported path order of report_timing.

Syntax: get_timing_paths <options> <path_collection> => Same as in report_timing cmd.

NOTE: the collection returned above will always show only one path unless we use the option "-max_paths". By default, max_paths is 1 (as in report_timing), so only 1 path will be reported, and size of collection will always be 1. Even with "-max_paths" set, we still have to provide option "-slack_lesser_than infinity" to see all paths, otherwise only paths with -ve slack will be put into the collection.

Ex below:

  • pt_shell> set mypaths [get_timing_paths -nworst 4 -max_paths 20] => It gets the collection of paths returned by get_timing_paths (with 20 such paths) and sets "mypaths" var to that collection for later processing.
  • pt_shell> sizeof_collection $mypaths => returns 20
  • pt_shell> report_timing $mypaths => 20 path timing reports displayed

Ex: report_timing [filter_collection [get_timing_paths -max 10000 -slack_lesser_than -10 -slack_greater_than -25] "dominant_exception == min_max_delay"] -path_type summary -max_paths 10 => Here we passed get_timing collection to report_timing to report timing for selected paths only. 

To get info about paths (i.e start/end points, slack, etc) we have to use get_attributes cmd. But for that we need to know what all attr are available for paths. We can use our familiar list_attributes cmd (explained in PT cmds section). define_user_attribute and set_user_attribute can be used to define our own attr and set it on paths of interest.

list_attributes -application -class timing_path => Most useful attr for paths are startpoint, startpoint_clock, endpoint, endpoint_clock, slack (slack of path), points (points correspond to pins or port along the path), arrival (arrival time of signal at a point in path)

Most widely used attr of a timing path is the "points" collection. A point corresponds to a pin or port along the path. Iterate through these points and collect attributes on them.

Ex: report arrival time at each point in the specified path. The path has a launch flop, a nand2 gate and a capture flop. For simplicity, we just assume one path; if there are multiple such paths, the outer foreach loop will iterate thru all of them.

foreach_in_collection path [get_timing_paths -from A/flop1 -to B/flop2 -max_paths 100 -slack_lesser_than infinity] {
  set path_points [get_attribute $path points]      ;# pins/ports along the path
  set startpoint  [get_attribute $path startpoint]
  set endpoint    [get_attribute $path endpoint]
  set slack       [get_attribute $path slack]

  puts "startpoint -> [get_attribute $startpoint full_name]"
  foreach_in_collection point ${path_points} {
    set arrival [get_attribute $point arrival]      ;# arrival time at this point
    puts " point -->  [get_object_name [get_attribute $point object]] arrival -> $arrival"
  }
  puts "endpoint -> [get_attribute $endpoint full_name]"
  puts "slack -> $slack"
}

Output from above code => It iterates thru the given path and reports the arrival time at each pin or port along the path.

startpoint -> A/flop1/CP

point -->  A/flop1/CP arrival -> 0.00
point -->  A/flop1/Q arrival -> 0.22

point -->  A/I_nand2/A arrival -> 0.22

point -->  A/I_nand2/Z arrival -> 0.29
point -->  B/flop2/D arrival -> 0.29

endpoint -> B/flop2/D

slack -> -0.11

 ----------------

Instead of going thru the loop for all points in the path, we can also get attr directly for all points on the path.

ex:

  • get_attribute [get_timing_paths -from x1 -to x2] launch_clock_paths => gets the launch clk path, i.e from the launch clk source to the clk pin of the launching flop
  • set my_points [get_attribute [get_attribute [get_timing_paths -from x1 -to x2] launch_clock_paths] points] => Here we get all the points on the launch clk path shown above.
  • set my_obj [get_attribute [get_attribute [get_attribute [get_timing_paths -from x1 -to x2] launch_clock_paths] points] object] OR set my_obj [get_attribute $my_points object] => Here we get the "object" attr of all points on the path. We need the object attr in order to get object names.
  • get_object_name $my_obj => This shows names of all points, i.e CLK_PAD I_BUF1 I_AND2 ... I_FLOP/CLK (the clk launch path starts from clk pad pin and ends at clk pin of the flop)


Reporting logic depth of any path:

The above cmd can be used to find logic depth of any path in PT. Solvnet already has a proc here: https://solvnetplus.synopsys.com/s/article/Find-the-Logic-Depth-of-a-Timing-Path-in-PrimeTime-1576092703524

The proc is meant for PT, but a similar proc exists for the Synthesis tool too (in the same link above). It goes thru the timing path, getting all the objects in the timing path. Then it gets all the input pins of those objects (it should report only 1 i/p pin per gate in the path, assuming 2 i/p pins are not shorted to the same signal). Then it gets the cells of all such pins and counts them. You can exclude buf/inv by using -exclude_unary (then it counts only those cells which don't have 2 pins, since inv/buf are the only gates with exactly 2 pins).

Put this proc in a file named get_logic_depth.tcl, source that file and run cmd as below

pt_shell> get_logic_depth [get_timing_paths -slack_lesser_than 0] => reports "15" as the logic depth for the top path with slack < 0. Only 1 path is reported for get_timing_paths, so only top paths logic depth reported.

proc get_logic_depth {my_path {exclude_unary ""}} {
  # collect all combinational leaf cells that have an i/p pin on the path
  set my_cells [get_cells -quiet -of \
                [get_pins -quiet \
                [get_attr -quiet \
                [get_attr $my_path points] object] \
                -f "pin_direction==in"] \
                -f "is_combinational==true && defined (lib_cell)"]
  if {$exclude_unary == "-exclude_unary"} {
    # inv/buf are the only gates with exactly 2 pins, so drop them
    set my_cells [filter_collection $my_cells "number_of_pins!=2"]
  }
  return [sizeof_collection $my_cells]
}

define_proc_attributes get_logic_depth -info "Find Logic Depth of a Timing Path" \
  -define_args {
      {path "A single path collection" path list required} \
      {-exclude_unary "Exclude Buffers/Inverters along the path" \
      "\b" string optional}
}