Delay thru the Transistor:

A very important metric in digital design is the delay thru any gate. It determines the speed of the chip: the lower the delay thru a transistor, the faster the gate, and the less time it takes for a signal to go from one flop to the next, resulting in a chip that can run faster.

In the previous section on "solid state devices", we saw the eqn for transistor current. Since this current determines the delay thru a transistor, a change in any of the parameters of that eqn could change the current, and hence the delay. These are the 3 input conditions that could affect transistor delay:

  1. Process: Any change in the process parameters µ, Cox, W, L or VTH could change the delay thru the transistor. These 5 process parameters vary depending on the fabrication process used. The first 4 affect the delay linearly, while Threshold Voltage enters as the square of the gate overdrive (Vgs - VTH) and so has a more pronounced effect. A "fast" or "hot" process corner is one where these parameters shift in a way that makes the transistor run faster; the converse is true for a "slow" or "cold" process corner. Fabs can usually target their process to their customer's needs.
  2. Voltage: Any change in the voltage applied to the terminals of the transistor could change the delay. Here we show Vgs only, but Vds matters too. Both of these voltages ultimately depend on the supply voltage (VDD), so the supply voltage seen at the transistor terminals impacts the delay thru it. The higher the voltage, the higher the transistor current and the lower the delay.
  3. Temperature: From the above eqn, it's not apparent that Temperature could change the current. But if we look carefully, we notice that some process parameters are themselves dependent on Temperature. Two such params are Mobility (µ) and Threshold Voltage (VTH).
    1. Mobility (µ): Mobility of electrons or holes measures how fast they are able to move thru a medium. As we saw in the Resistance and Capacitance section, it's the change in average drift speed per unit change in Electric field. Recall that the mobility of a charged particle is q*τ/m (τ being the mean time between collisions), so the longer a carrier travels before colliding with anything, the higher the speed it reaches, and hence the higher its mobility. In the lattice structure of a compound or element, how far these electrons or holes travel depends on the lattice structure and the size of the atoms around them. In general, as Temp increases, the carriers get more energy and are more agitated, so they travel at higher speed, but they also hit the lattice more often. The net effect is that mobility decreases at higher Temperature, current decreases with lower mobility, and so the transistor gets slower with higher temperature.
    2. Threshold Voltage (VTH): The Threshold Voltage of a transistor was explained in the "solid state device" section. It's basically the barrier that electrons/holes have to clear to get into the conduction band. As temp increases, more electron/hole pairs make it into the conduction band due to their higher energy. This lets more carriers cross the barrier, resulting in higher current, or effectively a lower Threshold Voltage. In general, as Temp increases, the threshold voltage drops as the carrier concentration increases. Current increases with lower VTH, since current has a square dependency on the gate overdrive voltage (Vgs - VTH). So, the transistor gets faster with higher temperature.
    3. Net Effect: So, these 2 mechanisms oppose each other as Temp increases: falling mobility slows the transistor down, while falling Threshold Voltage speeds it up. The net effect is hard to gauge without knowing the exact relation of these 2 factors with Temp. In the past, for designs at 180nm and above, increasing temperature made transistors slower, meaning mobility won over Threshold Voltage. We can see from the transistor current eqn that if VDD is a lot more than VTH, then even with the square dependence, the effect of a VTH change is muted. As an ex, consider the case where VTH is 10% of VDD. As Temp goes up, VTH comes down; a 20% reduction in VTH (from 0.1*VDD to 0.08*VDD) changes the drive current by ((VDD-0.08VDD)/(VDD-0.1VDD))^2 = (0.92/0.9)^2 = 1.045, i.e. about a 4% increase (assuming VTH falls linearly with Temp). However, a 20% reduction in µ causes a full 20% decrease in drive current (assuming µ also falls linearly with Temp) over the same Temp increase. Multiplying the two, 0.8 * 1.045 = 0.84, so the net effect is that the transistor gets about 16% slower as Temp increases over that range. This was always the expected behaviour.
      • Temperature Inversion: We saw above that an increase in Temp slowed transistors down at 180nm and above. However, with sub-180nm designs, the trend started inverting, and transistors started running faster at higher temperatures, especially at lower voltages. This was because VDD came down significantly with scaling, but VTH came down only a little, so VTH was now about 50% of VDD. With increasing Temp, a 20% reduction in VTH (from 0.5*VDD to 0.4*VDD) changes the drive current by ((VDD-0.4VDD)/(VDD-0.5VDD))^2 = (0.6/0.5)^2 = 1.44, i.e. a 44% increase (again assuming VTH falls linearly with Temp). However, a 20% reduction in µ still causes the same 20% decrease in drive current over the same Temp increase. Multiplying the two, 0.8 * 1.44 = 1.15, so the net effect is that the transistor gets about 15% faster as Temp increases over that range. This phenomenon of transistors getting faster at higher temp was an anomaly and came to be known as temperature inversion.
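The two arithmetic examples above can be checked with a short script. This is only a sketch using the square-law current model I_D ∝ µ*(VDD - VTH)^2; the 20% drops in µ and VTH over a temperature rise are the text's illustrative assumptions, not measured data.

```python
# Toy check of the drive-current examples above (square-law model).
# Assumes a 20% drop in both mobility and VTH over the same temp rise.

def drive_current_ratio(vth_frac, mu_drop=0.20, vth_drop=0.20):
    """Hot-to-cold drive current ratio for VTH = vth_frac * VDD."""
    vdd = 1.0
    vth_cold = vth_frac * vdd
    vth_hot = vth_cold * (1 - vth_drop)    # VTH falls with temperature
    mu_ratio = 1 - mu_drop                 # mobility falls with temperature
    overdrive_ratio = ((vdd - vth_hot) / (vdd - vth_cold)) ** 2
    return mu_ratio * overdrive_ratio

# Old nodes: VTH ~ 10% of VDD -> mobility loss wins, transistor slows down
print(round(drive_current_ratio(0.1), 3))   # ~0.836 -> ~16% less current

# Low-voltage nodes: VTH ~ 50% of VDD -> overdrive gain wins (inversion)
print(round(drive_current_ratio(0.5), 3))   # ~1.152 -> ~15% more current
```

The crossover point where the two effects cancel depends on how steeply µ and VTH actually fall with temperature for a given process, which is why the net trend flipped as VDD scaled down faster than VTH.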

Delay thru R and C:

The above 3 conditions not only affect the delay thru a transistor, but also the delay thru wires, which have resistance and capacitance of their own. Thus we have to consider the effect of PVT on Resistance (R) and Capacitance (C) too. When a process shift makes a transistor weaker, there's no rule that says R and C will get slower too (i.e. higher resistance and higher capacitance). We have to look at the equations for R and C to see their dependency on Process, Voltage and Temperature.

  1. Process: Process impacts R and C both ways, but their precise correlation with the transistor is hard to gauge. We usually get a range for R and C and use those limits to bound the box for R, C. Note that R and C usually move in opposite directions. For ex, a process shift that increases R by making the wires thinner will decrease C, as the wires end up with more distance between them. So, the product R*C may not change much across process variations.
    1. With lower nm tech, variations in metal/via R and C are significant. There are also a lot more R,C process corners than just Rmin,Cmin and Rmax,Cmax. With 2 or more masks on the same metal layer (in FinFets <16nm), the variations are even more pronounced, as the 2 masks may shift relative to each other on the same layer. Most of the time, it's not possible to run timing tools for all R,C corners. So, we pick a few R,C corners and then apply a BEOL margin to account for the other corners which we may not have run, but which may show worse performance. This margin is only applied for hold timing, as hold is more critical (failing hold timing results in a chip that doesn't work, and it can't be fixed by lowering the frequency).
  2. Voltage: Voltage has negligible impact on R and C to first order. Need to have an equation FIXME ?
  3. Temperature: Resistance increases with Temperature. Capacitance, however, doesn't have a clear relation with Temperature and will go up or down depending on the dielectrics involved. Need to find more about C vs T ? FIXME ?
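The "resistance increases with Temperature" point can be sketched with the standard first-order linear model R(T) = R0 * (1 + α*(T - T0)). The α below is the well-known temperature coefficient of copper (~0.0039 per °C); the base resistance and temperature points are made-up illustrative values, not from the text.

```python
# First-order metal resistance vs temperature: R(T) = R0 * (1 + alpha*(T - T0)).
# alpha ~ 0.0039/degC for copper; R0 = 100 ohm is an invented example value.

def wire_resistance(r0_ohm, t_celsius, t0_celsius=25.0, alpha=0.0039):
    return r0_ohm * (1 + alpha * (t_celsius - t0_celsius))

r_cold = wire_resistance(100.0, -40.0)   # ~74.7 ohm at the cold extreme
r_hot  = wire_resistance(100.0, 150.0)   # ~148.8 ohm at the hot extreme
print(r_cold, r_hot)
```

So across the full -40C to +150C range, a copper wire's resistance can swing by very roughly 2x, which is why wire delay must be checked at both temperature extremes.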

Final Delay through a path involving Transistors and Wires:

The final delay thru a path depends on P, V, T. For a "weak" P, the transistors get weak, and R and C are assumed to get weak (i.e. slower) too. We don't mention R and C separately, as it's assumed that an N (normal) process means typical transistor, typical R, and typical C. However, in reality we may want to consider variants where, for a strong process, the transistor is strong but R and C are not as strong.

PVT ranges:

The 3 PVT inputs that affect the delay of circuits are very important in determining proper functioning of circuits. In digital circuits, they are used to check that all paths meet timing: we run timing tools on our design at various PVT corners. More details are in the STA section.

We run timing at the extreme PVT corners that our design could possibly be exposed to. There is also a typical corner that the design sees in a typical environment, but we usually don't run STA on it. Let's see the range of these PVT corners:

Process: For process we define a fast process corner and a slow process corner. The fast corner is where all transistors are supposed to run faster, while the slow corner is where they all run slower. But how fast is the fast corner really? For that we use a metric called 3-sigma variation. We plot the transistors from many dies, with drive current on the X axis and number of transistors on the Y axis. This gives us a gaussian plot, and from it we take the 3-sigma variation from the mean: the -3 sigma point gives us the slow corner, while the +3 sigma point gives us the fast corner. 99.7% of the transistors lie within the -3 sigma to +3 sigma range, so we are willing to sacrifice the remaining 0.3% of the chips if they don't work in real silicon. Since we have both PMOS and NMOS, we define fast and slow for PMOS and NMOS separately. So, we have 4 combinations:

  1. fast fast (FF): This is the corner where both NMOS and PMOS are fast
  2. slow slow (SS): This is the corner where both NMOS and PMOS are slow
  3. fast slow (FS): This is the corner where NMOS is fast but PMOS is slow. This doesn't really happen in real silicon by itself, though it's sometimes done on purpose.
  4. slow fast (SF): This is the corner where NMOS is slow but PMOS is fast. This doesn't really happen in real silicon by itself, though it's sometimes done on purpose.
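As a quick sanity check on the 99.7% figure: for a gaussian distribution, the fraction of samples within ±n sigma is erf(n/√2). This is a sketch of that identity only; real transistor parameter distributions are only approximately gaussian.

```python
# Fraction of a gaussian population within +/- n sigma of the mean.
import math

def within_sigma(n):
    return math.erf(n / math.sqrt(2))

print(f"{within_sigma(3):.4%}")      # ~99.73% of transistors in [-3s, +3s]
print(f"{1 - within_sigma(3):.4%}")  # ~0.27% of parts we accept losing
```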

Voltage: When we run STA at a certain voltage, we always mean the voltage at the transistor pins, not at the chip pins. For smaller chips, or ones whose digital block doesn't draw a lot of current, the difference in voltage b/w chip pins and transistor pins is small and can be ignored. However, for digital SOCs which have billions of transistors and run at 1V or below, the difference can be substantial. We usually run some sims to figure out the voltage at the transistor pins. Once we know it, we apply some margin for PMU voltage overshoot and undershoot. Chip pins are usually driven by a PMU, whose whole job is to keep the voltage fixed at the specified level; even then we account for some overshoot/undershoot. As a rule of thumb, we apply +/-10% for chips that have a small digital core with less than a million transistors and run at > 1V. This +/-10% also accounts for the IR drop that may occur on chip. The 10% rule of thumb holds only for small digital cores; for large digital SOCs, we run more detailed simulations.
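The ±10% rule of thumb is just arithmetic on the nominal voltage. The 1.0 V nominal below is an invented example; real numbers come from the PMU spec and IR-drop sims.

```python
# +/-10% voltage margin rule of thumb (nominal voltage is a made-up example).
nominal_v = 1.0    # voltage at the transistor pins (illustrative)
margin = 0.10      # overshoot/undershoot + on-chip IR drop allowance

v_max = nominal_v * (1 + margin)   # used for the min-delay (fast) corner
v_min = nominal_v * (1 - margin)   # used for the max-delay (slow) corner
print(v_min, v_max)                # 0.9 1.1
```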

Temperature: For temperature, we usually consider a range of -40C to +150C, depending on what kind of temperature extremes we think the chip may be exposed to. The ambient temperature (temperature of the environment) may not go to such extremes, but the temperature of the transistor itself may. -40C to +150C gives us enough buffer for such extremes. -25C to +85C is another range, seen in smaller chips that don't consume much power (i.e. embedded chips), so a smaller range suffices for those. The low end is limited by ambient temp, since the chip can't go below ambient (chips usually generate heat). But for the high end, we go much higher than ambient, which guarantees that nothing will break on the chip at higher Temps. Of course, for people living in very cold climates, there's no guarantee that the chip will work :(

PVT Corners: We define 3 PVT corners.

1. typ: This is the TYP corner, where PVT is at its typical value. So, Process = TT, which means NMOS and PMOS are at their typical process value (i.e. typical speed), Voltage = the typical voltage the design is supposed to run at, and Temperature = typical room temperature, taken as 27C. Here we take R and C at their typical values, even though we know that if NMOS/PMOS are at their typ values, R and C may not necessarily be at theirs.

2. min: This is the MIN delay corner, where transistors are supposed to be at their minimum delay (i.e. fastest). So, Process = FF, which means NMOS and PMOS are at their fast process value (i.e. fast speed), Voltage = the maximum voltage the design could be exposed to (maximum PMU voltage overshoot), and Temperature = the lowest temperature, taken as -40C. However, for lower nm nodes (<180nm) operating at very low voltages (< 1V), temperature inversion may occur. Since the min corner is run at the highest voltage, it's possible that temperature inversion doesn't kick in there, so the lowest temp may still be OK for getting min delay. However, the behaviour differs for different Vth transistors, so some paths may hit min delay at one temp while others hit it at another (depending on the High Vth and Low Vth mix of cells in the path). So, a set of temperatures should be used at the highest voltage to make sure all possible extremes of min delay are captured. Here we take R and C at their min values (even though R and C may not necessarily be at their min).

3. max: This is the MAX delay corner, where transistors are supposed to be at their maximum delay (i.e. slowest). This is just the opposite of the MIN corner. So, Process = SS, which means NMOS and PMOS are at their slow process value (i.e. slow speed), Voltage = the minimum voltage the design could be exposed to (maximum PMU voltage undershoot), and Temperature = the highest temperature, taken as +150C. Again, there is the temperature inversion and voltage dependency problem discussed above. Since we are at the lowest voltage, temp inversion is very likely to happen, so the lowest temp should be used here (under inversion, transistors are slowest at low temp). So, for both the min and max delay corners, we may end up using the lowest Temp. Here we take R and C at their max values (even though R and C may not necessarily be at their max).

Temperature turned out to be not so straightforward at low nm tech. With further scaling to <14nm, the temperature inversion trend gets even more muddled: depending on the voltage and the Vth of the transistors (high Vth or low Vth), transistors may get faster or slower at lower temperatures. So, there is no longer a clear trend on what temperatures to use. The best approach is to run the max and min delay corners across a set of temperatures.
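The "sweep a set of temperatures" idea can be sketched as follows. The delay model here is a deliberately crude stand-in for real .lib/SPICE characterization data: its coefficients are invented, and the sign flip of the temperature coefficient at low voltage is only a cartoon of temperature inversion.

```python
# Sketch: evaluate a toy delay model at every (voltage, temperature) point
# and keep the extremes, instead of trusting a single "worst" temperature.

def delay_model(v, t):
    # Invented toy model: delay falls with voltage; the temperature
    # coefficient flips sign at low voltage (crude temperature inversion).
    temp_coeff = 0.001 if v > 0.9 else -0.001
    return 1.0 / v + temp_coeff * (t - 25)

voltages = [0.72, 1.1]                 # undershoot / overshoot corners
temperatures = [-40, 0, 25, 85, 125]   # the temperature set to sweep

points = [(v, t, delay_model(v, t)) for v in voltages for t in temperatures]
v_min_d, t_min_d, min_d = min(points, key=lambda p: p[2])
v_max_d, t_max_d, max_d = max(points, key=lambda p: p[2])
print(f"min delay at V={v_min_d}, T={t_min_d}")
print(f"max delay at V={v_max_d}, T={t_max_d}")
```

With this toy model, both extremes land at the lowest temperature (min delay at high voltage, max delay at low voltage under inversion), matching the behaviour described above; with a real library, the sweep is what tells you where they actually land.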

 

Global variation vs local variation:

When we talked about PVT corners above, we assumed that the same PVT corner applies to all transistors on a single die; a different die would be at a different PVT corner. The assumption is that across multiple wafers, and multiple dies on each wafer, all dies are bounded by the max and min PVT corners. So, when we run STA at the max and min corners, we have essentially guaranteed that timing will be met for all these dies, no matter what the process, voltage or temperature is. If Process is fast, Voltage is low, and Temperature is high, that particular PVT point is bounded by our max and min PVT corners, and so will pass timing as long as max and min timing are passing.

However, a question that immediately comes to mind is: what about PVT variations across transistors within a die? For ex, on a given die, not all transistors will be fast at the same speed. They will have local variations: some transistors will be slower than the "fast" corner, while some might be even faster. Similarly for voltage, not all transistors on the same die see exactly the same voltage; some see a little higher and some a little lower, depending on IR drop. The same goes for temperature. Since the temperature of a transistor is heavily affected by its surroundings, transistors that are ON most of the time and running at high frequency may see a higher temperature than transistors that are OFF most of the time. This affects the delay of each transistor differently, and depending on the path, the timing needs to be recalculated with these more precise values of PVT. This is called on-chip variation (OCV) and will be discussed in the "OCV section".

What if we don't want to deal with OCV, since we have no clue how to measure these PVT variations within a die? In that case, we could use the max corner for one path and the min corner for the other path on the same die. This guarantees that our chip will meet timing no matter what. However, this is way more pessimistic than what real silicon would see, so we end up putting a lot of unnecessary margin in the design, which wastes area and power. We'll study all of this in the "OCV section", which is the next one.

VLSI Introduction:

VLSI: Very Large Scale Integration. This is the field of Electrical/Electronic Engineering that deals with the science of designing circuits and building them on chips.

There are many preliminary courses that you will need to take, before you can start designing chips.

Circuits are built using passive components: R, L, C.

On top of the above 3, we have an active component known as the "transistor" that brought about the revolution in electronics. A transistor is basically an "electronic switch" that you can turn ON or OFF using a voltage signal (by contrast, a manual switch at home requires physical force to turn it on or off). When we talk about VLSI or solid state, we are almost exclusively talking about transistors. Transistors are what made all modern chips possible, making them one of the greatest inventions that gave us all modern electronics today.

VLSI History:

Until the 1950's, all these passive elements used to be stand-alone devices. They were big, and making any circuit out of them required considerable space. Then came the active element, the transistor. Transistors used to be stand-alone devices too, similar to resistors and capacitors. However, researchers started looking into ways of making transistors smaller and etching many of them onto a single base, and silicon became the material of choice. This gave birth to the integrated circuit in the late 1950's. We started building transistors on silicon wafers and integrating many of them on the same wafer. This greatly reduced the size of each transistor and allowed thousands of transistors to be placed close together, connected by miniature wires which were themselves etched on the silicon. The number of transistors etched on a single wafer kept increasing, which gave birth to LSI and eventually VLSI. A very good history of VLSI, and of how small the transistor dimensions were at each technology node, can be found at this link:

https://en.wikichip.org/wiki/technology_node

FEOL vs BEOL:

FEOL (Front End of Line) refers to the steps associated with transistor fabrication. Transistor fabrication on silicon involves 10-50 steps for modern CMOS tech. It needs the most advanced and cutting-edge tools to build the smallest transistors, and it takes 30 days or more to complete all the fab steps associated with the transistor layers. Photomasks for transistors are also the most expensive ones, as they need very high accuracy for the small size of the transistors.

BEOL (Back End of Line) refers to the steps after the transistor layers are done. This is where the interconnects to the transistors are built using metal layers. Fabricating these metal layers isn't as complex as fabricating the transistors; once the transistor layers are deposited properly and the transistors are functioning, building metal layers on top goes faster. It's 2 masks for each metal layer (one for the metal routing itself and one for the vias that connect one layer to the next). These masks are cheaper, as metal widths aren't as small as the transistor gate length. Each metal layer takes about a day to go thru the fab, so 10 metal layers take about 10 days. As FEOL dimensions shrink, BEOL dimensions also need to shrink, so that the overall gain in density can be achieved.

Full node (FN) vs Half node (HN):

Transistor size can shrink by any amount from one node to the next. However, it would be very expensive to introduce a new node for only a small size reduction, since shrinking transistors usually means the fab tools have to be changed for the smaller size, which is very costly. A rule that has been followed in semiconductor fabrication is that a shrink which gives 2X the transistor density is worth the cost: halving the chip area cuts the chip cost roughly in half, which absorbs the extra cost of fab retooling. To get 2X density, both the X and Y dimensions of the chip have to shrink by 0.7X (since 0.7x * 0.7y = 0.49xy => about half the area of the original chip). So, we not only reduce the length of the transistor by 0.7X, but also its width by 0.7X. This is known as a full node. Going from a 1um node to a 0.7um node is a full node transition, which is very costly. Full node transitions have been happening every 2 years, implying transistor density doubles every 2 years. This is known as Moore's Law, after Gordon Moore, who famously predicted it in 1965. However, companies don't want to sit idle for 2 full years without showing any improvement. To address this, chip companies introduced the half node: a ~10% shrink of the transistors, done entirely in the fab without changing the tools completely, so the existing design could be reused. This gave some incremental improvement without redesigning the circuit or fully retooling the fab, and it came in between 2 full nodes. So, a half node gives a ~20% reduction in IC size (since 0.9x * 0.9y = 0.81xy), while a full node gives a ~50% reduction.

Every 4 years, the transistor length reduces to about 1/2 of its current value (every 2 years it goes down by 0.7X, so every 4 years it's 0.7*0.7 = 0.49, or almost half). That is why FN tech goes like this: 1um -> 500nm -> 250nm -> 130nm -> 65nm -> 32nm -> 16nm -> 7nm -> 3nm, and so on.
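The 0.7X-per-full-node rule can be checked against the marketed node ladder. The list below includes the intermediate full nodes (700nm, 350nm, 180nm, 90nm, 45nm, 22nm); marketed names are rounded, and at the smallest nodes they drifted away from any physical dimension, so the ratios are only roughly 0.7.

```python
# Check that consecutive full nodes scale by roughly 0.7x in linear
# dimension (i.e. roughly 0.5x in area). Node names are marketed values.

marketed = [1000, 700, 500, 350, 250, 180, 130, 90, 65, 45, 32, 22, 16, 10, 7, 5, 3]
for prev, nxt in zip(marketed, marketed[1:]):
    scale = nxt / prev
    print(f"{prev}nm -> {nxt}nm (linear {scale:.2f}, area {scale**2:.2f})")
```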

Tech Node:

A technology node of a certain um or nm usually refers to the smallest dimension that can be etched out on the silicon. Usually it's the gate length of the transistor that has the smallest dimension. The gate length of a transistor also has an inverse relationship to performance, as a shorter gate length implies a higher current, and hence faster speed. From the time transistors were invented, tech nodes referred to transistor gate length (so a 2um node meant that the transistors on this node have a 2um gate length, with slight variations; you can't etch out a gate with length < 2um on this node). Along with this we also got a 2X increase in density with every full node. But as transistor lengths got smaller, it was observed that just reducing the gate length didn't guarantee a 2X increase in density; density was limited by how closely you could place transistors to each other. If you couldn't halve that spacing every 2 years, you wouldn't get 2X scaling. So, a 2X density improvement became the new definition of a full node. The nm or um number of a tech node, which used to refer to gate length, didn't necessarily refer to it anymore, though it stayed close. Some people started calling the half pitch the better definition for a tech node. Pitch is the distance between 2 adjacent gates; min pitch is the closest you can get 2 gates to each other while still having space to make contacts to both gates and the active source/drain regions. Half pitch is half of this distance, and it turns out that half pitch is a very relevant number when talking about density improvement. So, a lot of later tech nodes since 2000 use their "nm" to refer to the half pitch, the gate length, or something smaller than both. That "nm" number is more of a marketing ploy now, so keep that in mind when going thru the tech nodes below.

Tech node timeframe is as below. I'm showing full node process only.

  • 50um - 10um => 50um was the first process, developed in the mid 1960's for building transistors on wafers. The typical wafer size (diameter) was < 1 inch (only 22mm). 50um is the typical thickness of a human hair (100um is 1/10th of a mm), so transistors of this size could possibly be seen by the naked eye (though a microscope would be necessary, as these 50um lines would be very close to each other and hence difficult to distinguish). Going from 50um to 10um, wafer size increased to 2 inches. 10um was being actively developed during the early 1970's; Intel's 8008 was developed on 10um tech.
  • 10um - 1um => From the mid 1970's to the late 1980's, transistor size kept decreasing, while wafer size kept increasing (from 1 inch all the way to 6 inch or 150mm), resulting in even more transistors per wafer. Bulk CMOS tech was being used, with voltages at 5V. Only one metal layer was used for interconnect, though 2 metal layers started getting used at 2um and below. Intel's 8086 series was developed during this time, using tech nodes close to 1um.
  • 1 um => Introduced in the late 1980's. Intel's 80386 and 80486 were based on this. 1um tech was a big step, as transistors of this size were considered infeasible just a decade or two earlier.
  • 700 nm => Introduced in early 1990's, it was a full node followup to 1um tech. 3 metal layers were being used here. Intel's Pentium Pro was built on this process node.
  • 500 nm => Commercial ICs started getting produced using 0.5um tech in 1993. It was called a half micron process. 4 metal layers of Al-Cu (Aluminum Copper) started getting used. Oxide thickness was reduced to about 10nm. The process typically had a Threshold voltage of 0.5V and a supply voltage of 3.3 V.
  • 350 nm => Commercial production using 350nm started in late 1995. Number of metal layers went to 5 with oxide thickness further reduced to 6nm. Intel's Pentium and Pentium II were built on this.
  • 250 nm => Also known as the "quarter micron" process. Intel, along with other leading semiconductor companies, entered the 0.25um process in 1997. Intel's process used 200 mm wafers, SiO2 dielectric, polysilicon electrodes and Aluminum interconnects. Intel also made a smaller chip using a 5% shrink of the original design rules. Gate pitch and interconnect pitch were about 500nm-700nm.
  • 180 nm => This was introduced in 1999 by Intel, TI, IBM and TSMC. Number of metal layers went to 7. Gate pitch and interconnect pitch were about 450nm-500nm.
  • 130 nm => This was introduced in 2001 by Intel, TI, IBM and TSMC. Number of metal layers went to 8. Gate pitch and interconnect pitch were about 350nm. The SOI (silicon on insulator) process started getting used instead of Bulk at AMD, IBM, etc; it allowed the body to float instead of being tied to the power supply as in bulk tech.
  • 90 nm => This was introduced in 2003. Gate pitch and interconnect pitch were about 250nm. At 90nm, 300 mm (12 inch) wafers started getting used, which was a big step from the 200mm or 8 inch wafers that were being used before that.
  • 65 nm => This was introduced in 2006. Gate pitch and interconnect pitch were about 200nm.
  • 45 nm => Commercial manufacturing using the 45 nm process began in 2007. Intel's 45 nm process was the first time high-k + metal gate transistors were used in a high-volume manufacturing process. Before this, poly was used for the gate, which had very high resistivity. Gate pitch and interconnect pitch were about 160nm-180nm.
  • 32nm => 32nm manufacturing began in 2010. The number of metal layers went to 9-11. 193nm Immersion Lithography was used for 32nm. Gate length was 30nm, even though the node was called 32nm. Gate pitch and interconnect pitch were about 100nm-130nm. Supply voltages came down to 1V or below for the first time. 28nm was a half node introduced a year later, and was a stop gap b/w 32nm and 22nm.
  • 22 nm => 22nm manufacturing for processors began in 2012, although memories had been built on this node since 2008. Until 22nm, we were using planar transistors, the conventional transistors used since CMOS tech came into existence. However, it was becoming harder to scale planar transistors as sizes kept shrinking, and companies were looking for alternatives. Fin-based 3D transistors were being actively researched and showed promise. 22 nm became Intel's first generation of Tri-gate FinFET transistors, and the first such transistor on the market. Other companies didn't jump on the FinFET bandwagon yet and continued with planar transistors. Intel's Core i3, i5 and i7 were built on this new tech. Gate pitch and interconnect pitch were about 80nm-100nm. Supply voltages came further down to 0.7V-0.8V. 20nm was a half node introduced in 2014, followed by a 16nm full node in late 2015; 20nm was still planar technology, while 16nm moved to FinFet.
  • 16nm / 14nm => 16nm was the first time the industry moved away completely from planar transistors to FinFet transistors. Things got more confusing, as "nm" no longer referred to gate length, and different companies adopted whatever naming convention they chose. There is also confusion as to whether 16nm or 14nm is the full node; based on the math, ~15nm should be the full node (22nm * 0.7 = 15.4nm). Some companies went with 16nm as their full node, while others went with 14nm. Manufacturing using 16nm/14nm began in 2014/2015. Both 14nm and 16nm were still based on 193nm Immersion Lithography. Supply voltages stayed around 0.7V.
    • 16nm: TSMC introduced their first 16nm FinFet process known as 16FF, followed by later revisions as shown below. Gate length was 34 nm (not 16nm), with Fin pitch at 48nm and Gate pitch at 90nm.
      • 16FF => 16nm FinFet process. It used the same 20nm BEOL.
      • 16FF+ => Improved 16nm FF process to give 10-15% perf improvement
      • 16FFC => 16nm FinFet Compact was the refined version of earlier process which reduced cost by using less masks, and used half the power.
    • 14nm: Intel introduced their 14nm process as P1272/P1273. Samsung introduced 14LPE (Low Power Early), 14LPP (Low Power Performance), 14LPC and 14LPU. IBM's 14HP process (IBM's fabs were sold to GlobalFoundries in 2014) started manufacturing a bit later in 2017, and UMC also started mass manufacturing of their 14nm process in 2017. All of these had smaller gate lengths, varying from 20nm-30nm (nowhere close to 14nm). Fin pitch was 42nm, gate pitch was 70nm-80nm, and minimum metal pitch was 50nm-60nm. Fin width was 8nm, with Fin height at 40nm. Intel's 14nm process had further refinements with 14nm+ and 14nm++, which yielded up to 50% less power and 30%-40% higher drive current. Intel's 14nm process was the densest, with 1.5X raw logic density compared to other leading Fabs.
  • 10nm
  • 7nm
  • 5nm
  • 3nm
  • 2nm

 

Diff node Scaling:

Pitch (in nm)             | N7                                    | N5
--------------------------|---------------------------------------|------------------------------------------------------
Poly Length               | 11 nm                                 | 6 nm
Cell Height               | 240 (4*M0 + VDD + VSS = 6*M0)         | 210 (5*M0 + VDD + VSS = 7*M0)
Cell Width                | 3*CPP (1 extra CPP due to PODE dummy) | 2*CPP
Cell Area (invX1)         | 0.24*0.16 = 0.038 um^2                | 0.21*0.1 = 0.021 um^2
Cell density (nd2x1/mm^2) | ~20M/mm^2                             | ~40M/mm^2
CPP (poly)                | 57                                    | 51
M0 (H)                    | 40 (< CPP), W=30                      | 28 (< CPP), W=28 (low pitch due to double patterning)
M1 (V)                    | 57 (= CPP), W=30                      | 34 (< CPP), W=28
M2 (H)                    | 40                                    | 35
M3 (V)                    | 44                                    | 42
M4-M8                     | 76 (~2X of min)                       | M4=44, rest=76
M9-M10                    | 126 (~4X of min)                      | 76
M11-M12                   | 720 (~20X of min)                     | 126
M13-M14                   | N/A                                   | 720
M15-M17                   | N/A                                   | N/A

(N3 and N2 columns still to be filled in.)

 

VLSI Topics:

This is the sequence of topics we'll cover in VLSI section:

1. R, L, C: A lot of simple circuits may be made using R, L and C. These are simple to understand.

2. Solid state devices: Transistors, diodes. These are more difficult to understand.

3. Digital Library cells:

4. Digital Hardware language:

5. CAD tools

6. Analog design

7.

Links:

1. Sunburst: One of the best places to learn digital vlsi design: http://www.sunburst-design.com/

They offer a lot of paid training here, but you don't need to take any paid courses. They also have a lot of free papers with plenty of useful info: http://www.sunburst-design.com/papers/

I'll list these papers in different sections as we talk about the various topics.

2. Teamvlsi: I saw a few good topics covered here. They also have a youtube channel with good videos. Link: https://teamvlsi.com

3.

 

Restaurants:

This section includes restaurant chains where you can get food at a decent price. As of 2022, prices are going up for all restaurants, so pricing may be outdated. Below are some of the chains that provide value for your money. If you are looking for fast food options, please check the "fast food" section.

 

Olive Garden:

This chain serves good Italian food for a very reasonable price. It's not fast food; it feels like a higher-end restaurant with meals served at the table. All their entrees include unlimited soup, salad and breadsticks, which itself is worth $5 or more. Their lunch entrees are $8 and regular entrees are $15. Quantity is good enough for 2 people. Many times you can find their gift cards on sale ($40 for a $50 GC).

A few vegetarian dishes here that I like:

  1. Eggplant Parmigiana => This is a favorite among veggie Indians. It has eggplant stuffed with things, tastes nice.
  2. Stuffed Fettuccine Alfredo: This is a nice option. It's stuffed with cheese.
  3. Five Cheese Ziti al Forno => I don't remember how it tasted. Will update later.

Here's a link from slickdeals with various options to try: https://daily.slickdeals.net/food/olive-garden-special-meal-deals/

  • Monday-Friday lunch specials for $8-$10. Includes unlimited soup and salad.
  • For every dine in entree, you can get to take home an entree for $5 more. There are 3 options for take home entree = Fettuccine Alfredo, Five Cheese Ziti al Forno and Spaghetti with Meat Sauce. Lunch specials don't qualify for the take home.

 

 


 

DEALS:

 

All Gift Card deals for fast food are in gift card section. Consider buying those GC where possible and then get these deals.

 

2023:

 

 


 

09/27/23: Olive Garden - Unlimited Pasta for $14 - limited time:

Good offer that comes once in a while: https://slickdeals.net/f/16947016-olive-garden-never-ending-pasta-bowl-w-soup-salad-and-breadsticks-14-dine-in-only-at-participating-locations

 

Semiconductor Memory:

Processor and memory are the 2 most important components of any digital chip. Just as transistors are used to build logic functionality on a chip (such as AND, OR gates to build an adder, etc), the same transistors are used to serve as memory to store bits. Memory can store bit 0 (voltage = 0 Volts) and bit 1 (voltage = VDD Volts).

There are 2 kinds of memory:

1. Volatile Memory:

These are the memories that lose their contents when power is turned off. In your laptop, you have a hard drive, which is non volatile memory. It keeps the contents even when power is turned off. The CPU transfers programs from hard drive to a volatile memory, and accesses it from there. That makes the programs run faster, as there is significantly lower delay accessing contents from this faster volatile memory. There are 2 kinds of Volatile memories: 

A. SRAM (static random access memory): This is usually seen on a processor, integrated with other logic. Any circuit that has 2 back to back inverters can serve as a memory. So, we could use flops, latches, etc to serve as memory. However, flops and latches use a large number of gates (usually 8 or more), which is very costly in terms of area. Early on, engineers started making custom versions of these latches so that they could be packed closer together and need fewer transistors. They came up with the idea of using 6 transistors to make a memory cell (very similar to a latch, but with fewer transistors). They also reduced the size of the transistors, and optimized the layout and decode logic, to build compact memory modules. This memory is called 6T SRAM and is used in all logic chips to make register files, caches, etc. These memories are fast, but costly in terms of area, so they are usually limited in size to 64MB or so. They are used in caches and other memories on microprocessors. These are not sold as standalone memories.

B. DRAM (dynamic random access memory): This is the memory that is usually built and sold separately. It's not integrated into the processor, but sits right next to it. It is slower than SRAM, as it sits further away and has a lot more packed in. However, it requires only 1 transistor and 1 capacitor to build 1 memory cell. This makes it much smaller than SRAM. However, in the absence of back to back inverters, there is no feedback loop to hold the bit value at 0 or 1. So, periodic refreshing of the value is needed, which slows DRAM further. Since it needs to be refreshed periodically, it's called dynamic. All the memory that you hear about in news, journals, etc is this DRAM. This memory is the one used on external memory modules that you buy from BestBuy, Amazon, etc (known as DDR memory cards). They can go as large as 128GB or more (Samsung has already reported 512GB DRAM memory modules). DRAM started out as SDR DRAM, and then moved to DDR style DRAM. DRAM is also known as SDRAM (synchronous DRAM), as all signals are driven synchronous to a clock. NOTE: SDRAM and SDR DRAM refer to 2 different things.

 

Next we look at each of these memories in detail:

A. SDR

B. DDR

DDR1

DDR2

DDR3

DDR4

DDR5

LP DDR

 

Foundations of CNN - Course 4 week 1

This course goes over the basics of CNN. Detecting edges is the basic motivation behind CNN. In any picture, we want to detect horizontal and vertical edges, so that we can identify boundaries of different things in the picture.

We construct a filter (or a kernel) with some dimension, and then convolve it with a picture to get an output. The convolution operator is denoted by an asterisk (*), which is the same symbol used for multiplication. This causes confusion, but that's the notation used for convolution in digital signal processing, so we use the same notation here. In python, the function "conv_forward" does convolution, while in TF, tf.nn.conv2d does the job.

Convolution just applies the operation for a given filter on all parts of the picture, one part at a time. When convolving, we multiply each entry of the filter element-wise with the corresponding entry of the picture, and sum them up to get a single number. See the example explained in lecture.

Ex: A 6x6 matrix convolved with a 3x3 filter gives a 4x4 matrix.

Edge Detectors:

An example of a vertical edge detector would be a 3x3 filter with the 1st column all 1s, the 2nd column all 0s, and the 3rd column all -1s. This detects edges, if we associate +ve numbers with whiteness, -ve numbers with darkness, and 0 being in b/w white and black (i.e gray). We can also make a horizontal detector by switching rows with columns, i.e 1st row all 1s, 2nd row all 0s, and 3rd row all -1s.
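To make this concrete, here's a minimal numpy sketch of valid convolution with the vertical edge detector described above (function names are mine, not from the course; as in most DL courses, "convolution" here is really cross-correlation, with no kernel flip):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1): slide the kernel over the
    image, multiply element-wise, and sum to get one number per position."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

# Vertical edge detector: 1st col all 1s, 2nd col all 0s, 3rd col all -1s
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]])

# 6x6 picture: bright (10) on the left half, dark (0) on the right half
image = np.zeros((6, 6))
image[:, :3] = 10

out = conv2d(image, vertical)  # 4x4; every row comes out [0, 30, 30, 0]
```

The strong 30s in the middle columns mark the vertical edge between the bright and dark halves of the picture.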

Instead of hard coding these 9 values in an edge detector filter, we can define them as 9 parameters, w1 to w9, and let the NN pick the most optimal numbers. Back propagation is used to learn these 9 parameters. This gives the best results.

Padding and Striding:

Valid conv: No padding is used, so the o/p matrix shape is not the same as the i/p matrix shape.

An nxn picture convolved with an fxf filter gives a matrix with dimension (n-f+1) x (n-f+1). That's why the 6x6 matrix convolved with a 3x3 filter gave a 4x4 o/p (as n=6, f=3, so o/p = 6-3+1=4).

Same conv: To keep the dimension of the o/p the same as that of the i/p pic, we can use padding, where we pad the pic with extra pixels on its border. This involves adding rows or cols of 0 (or occasionally some other value). We choose the padding number p such that the o/p matrix dim remains the same as that of the i/p pic.

With padding p, an n x n picture (padded with p pixels on each side of the border) convolved with an f x f filter gives a matrix with dimension (n+2p-f+1) x (n+2p-f+1). That's why a 6x6 matrix (with p=1) convolved with a 3x3 filter gives a 6x6 o/p (as n=6, p=1, f=3, so o/p = 6+2-3+1=6). So, the o/p matrix retains the same shape as the i/p matrix.

For any general shape of i/p matrix, we have to choose p such that o/p matrix shape is same as i/p matrix shape. For that to happen, n+2p-f+1 = n => p=(f-1)/2. So, for filter of size=3, we have to choose p=(3-1)/2=1.

With padding, we increase the size of the o/p matrix. Striding does the opposite: it reduces the size of the o/p matrix. Striding is where we jump by more than 1 when calculating the conv for adjoining boxes. So far, we used a stride of 1 for all our convs, but we could use any stride number such as 2, 3, etc. We do this stride (or skipping) in both horizontal and vertical directions.

With stride s, an n x n picture (padded with p pixels on each side of the border) convolved with an f x f filter gives a matrix with dimension floor((n+2p-f)/s+1) x floor((n+2p-f)/s+1). We use the floor function in case the numbers don't divide to give an integer.
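This formula is easy to wrap in a small helper (a sketch; the function name is mine):

```python
import math

def conv_output_dim(n, f, p=0, s=1):
    """o/p size of a conv: floor((n + 2p - f)/s + 1)."""
    return math.floor((n + 2 * p - f) / s + 1)

print(conv_output_dim(6, 3))            # valid conv: 4
print(conv_output_dim(6, 3, p=1))       # same conv: 6
print(conv_output_dim(7, 3, p=0, s=2))  # strided conv: 3
```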

By using padding and striding together, we can do "same conv".

Convolution over Volume:

So far we have been doing conv over a 2D matrix. We can extend this concept to do conv over volume (i.e a 3D matrix). In such a case, the i/p matrix is 3D (where the 3rd dimension is for channel, i.e one 2D matrix for each color R, G, B). The filter is also 3D. The o/p matrix in such a case is still 2D with the same dim as before, (n-f+1) x (n-f+1) (assuming p=0 and s=1).

Conv over volume is same as that over area: multiplication and addition is done over all elements including the 3rd dim. So, o/p returned for each conv operation is still a single value for one given box.

However, if we have more than 1 filter for conv operation (i.e one filter is for vertical edge detection, while other filter is for horizontal edge detection, and so on), then the o/p matrix becomes a 3D matrix.

For N filters being applied on i/p pic with dim n x n x nc and filter with dim f x f x nc , the o/p matrix shape would be (n-f+1) x (n-f+1) x N.

Note that nc which is the number of channels in the i/p has to be the same for the filter.

Ex: An i/p pic of 6x6x3 conv with 2 filters of shape 3x3x3 gives o/p matrix of shape 4x4x2 (since n=6, f=3, nc=3 and N=2)
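As a sanity check on these shapes, here's a minimal numpy sketch of conv over volume (a naive loop implementation of my own; the course uses conv_forward / tf.nn.conv2d for this):

```python
import numpy as np

def conv_over_volume(image, filters):
    """image: (n, n, nc); filters: (N, f, f, nc). Each of the N filters
    multiplies-and-sums over all nc channels, giving one number per position,
    so the o/p shape is (n-f+1, n-f+1, N)."""
    n, _, nc = image.shape
    N, f, _, _ = filters.shape
    out = np.zeros((n - f + 1, n - f + 1, N))
    for k in range(N):
        for i in range(n - f + 1):
            for j in range(n - f + 1):
                out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * filters[k])
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))       # 6x6 RGB pic (nc=3)
filters = rng.random((2, 3, 3, 3))  # N=2 filters, each 3x3x3
out = conv_over_volume(image, filters)  # shape (4, 4, 2), as in the example
```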

1 Layer of CNN:

For CNN also, we have multiple layers as in Deep NN. In Deep NN, for each layer, we compute activation func a[l]=g(z[l]) where g is the function used for that layer and z[l] = w[l] *a[l-1] + b[l] (* here means matrix multiplication).

In CNN, for each layer, we compute convolution instead of matrix multiplication. So, for i/p layer a[0], z[1] = w[1]*a[0] + b[1], where w[1] is the filter matrix and b[1] is the offset added as before. Here the asterisk * refers to the convolution operation. Then we use an activation function such as ReLU, sigmoid, etc to compute the o/p matrix a[1]=g(z[1]). This holds even if we have more than 1 filter; the weight matrix just has one extra dim for the number of filters.

In general for each layer "l" , we have following relation:

f[l] = filter size

p[l] = padding size

s[l] = stride size

nc[l] = number of filters. Each filter is of dim f[l] x f[l] x nc[l-1] 

dim for "l"th i/p layer a[l-1]  = nh[l-1]  x nw[l-1]  x nc[l-1] where nh = number of pixels across height of pic, nw = number of pixels across width of pic, nc = number of color channels of pic (for RGB, we have 3 channels), 

dim for "l"th o/p layer a[l-1] = nh[l]  x nw[l]  x nc[l] where nh[l]  = floor( (nh[l-1]  + 2p[l] - f[l])/s[l] + 1 ) , nw[l]  = floor( (nw[l-1]  + 2p[l] - f[l])/s[l] + 1 ) 

For m examples, A[l-1] = m x nh[l]  x nw[l]  x nc[l] 

dim of weight matrix w[l] = f[l] x f[l] x nc[l-1] x nc[l], where nc[l] is the number of filters in layer "l"

dim of bias matrix b[l] = 1 x 1 x 1 x nc[l] => bias is a single number for each filter, so for nc[l] filters, we have nc[l] parameters.
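These shape and parameter-count relations can be sketched as a small helper (function and variable names are mine; p=0 and s=1 by default):

```python
import math

def conv_layer_shapes(nh_prev, nw_prev, nc_prev, f, nc, p=0, s=1):
    """o/p dims and parameter count for one conv layer "l".
    nc is the number of filters; each filter is f x f x nc_prev."""
    nh = math.floor((nh_prev + 2 * p - f) / s + 1)
    nw = math.floor((nw_prev + 2 * p - f) / s + 1)
    n_weights = f * f * nc_prev * nc  # dim of w[l]
    n_biases = nc                     # 1 bias per filter (b[l])
    return (nh, nw, nc), n_weights + n_biases

shape, n_params = conv_layer_shapes(6, 6, 3, f=3, nc=2)
# shape = (4, 4, 2); n_params = 3*3*3*2 + 2 = 56
```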

Example of Conv NN: provided in lecture

3 types of layers in a conv NN: Just using convolution layers may suffice to give us good results, but in practice, supplementing CONV layers with POOL layers and FC layers gives better results.

  1. convolution layer (CONV): This is about using the convolution operator.
  2. Pooling layer (POOL): This is about using max or avg of a subset of matrix, so as to reduce the size of matrix.
  3. Fully connected layer (FC): This is similar to conventional NN, where we connect each i/p entry to each o/p entry which results in a lot of weights being used. But since we use the FC feature in the last few stages of the NN, the size of matrix is greatly reduced by that time, resulting in fewer entries in weight matrix.
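A minimal numpy sketch of the POOL layer described above (max pooling; function name is mine, with the common choice f=2, s=2 as defaults):

```python
import numpy as np

def max_pool(a, f=2, s=2):
    """Max pooling: take the max over each f x f window (per channel), moving
    with stride s. Shrinks an (nh, nw, nc) volume to
    (floor((nh-f)/s)+1, floor((nw-f)/s)+1, nc)."""
    nh, nw, nc = a.shape
    oh, ow = (nh - f) // s + 1, (nw - f) // s + 1
    out = np.zeros((oh, ow, nc))
    for i in range(oh):
        for j in range(ow):
            out[i, j, :] = np.max(a[i*s:i*s+f, j*s:j*s+f, :], axis=(0, 1))
    return out

a = np.arange(16, dtype=float).reshape(4, 4, 1)  # 4x4 pic, 1 channel
pooled = max_pool(a)  # shape (2, 2, 1); values [[5, 7], [13, 15]]
```

Avg pooling is the same idea with np.mean in place of np.max. Note that pooling has no parameters to learn.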

Reasons for using Conv NN: (see in last lecture)

  1. Parameter sharing: Same conv filter can be used at multiple places in the image
  2. Sparsity of connections: Not every i/p needs to be connected to every o/p, since most of the o/ps only depend on a subset of the i/p matrix.

 

 Finding optimal values of Weights:

We use the same technique of gradient descent to find the lowest value of the cost across different weight matrices and filters. The derivation is not shown in the programming assignment, but look in my hand written notes.

 

 Assignment 1:

 

Assignment 2: