Course 2 - Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

 

This course builds on deep NNs. It covers various techniques to tune our NN so that it predicts better. Without the right hyperparameters, your NN may not even work. The course can be finished at a good speed. It has multiple Python exercises, which should be completed. It has 3 sections:

1. Practical Aspects of Deep Learning: This talks about how to adjust settings like initialization values, and how to choose initial values that will make our NN work. Can be watched in fast mode. However, the 3 exercises should be finished. They don't take too much time.

2. Optimization algorithms: This goes over how to optimize the algo that finds the lowest cost. It talks about variants of gradient descent (gd): mini-batch gd, gd with momentum, gd with RMSprop, gd with Adam, and learning rate decay. There is a programming assignment to apply these techniques on a NN and observe the impact.

3. Hyperparameter tuning, Batch Normalization and Programming frameworks: This introduces Google's framework called TensorFlow, where we write a program to classify hand-sign digits from 0 to 5.
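Two of the update rules named in section 2 can be sketched in a few lines. This is only a minimal illustration, not the course's assignment code: it minimizes a hypothetical one-parameter cost J(w) = (w - 3)^2, and the learning rate, beta values and step counts are assumed values picked for the example.

```python
import math


def grad(w):
    """Gradient of the toy cost J(w) = (w - 3)**2 (hypothetical example cost)."""
    return 2.0 * (w - 3.0)


def gd_momentum(w=0.0, lr=0.1, beta=0.9, steps=300):
    """Gradient descent with momentum: step along a running average of gradients."""
    v = 0.0
    for _ in range(steps):
        v = beta * v + (1.0 - beta) * grad(w)
        w -= lr * v
    return w


def adam(w=0.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam: momentum on the gradient plus RMSprop-style scaling, with bias correction."""
    m = 0.0  # running average of gradients (momentum term)
    v = 0.0  # running average of squared gradients (RMSprop term)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias-corrected averages
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


if __name__ == "__main__":
    print("momentum:", gd_momentum())
    print("adam:    ", adam())
```

Both runs should end close to the minimum at w = 3. Mini-batch gd uses the same updates, only with the gradient computed on a subset of the training examples at each step.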

 

PT flow:

Any STA signoff tool needs to be run separately for all the timing corners on the final netlist, after synthesis and PnR are done. This ensures that timing is met across all corners, including those that haven't been checked in synthesis or the PnR flow. Here we detail the flow for running PrimeTime (PT) from Synopsys.

Ex Dir=> /project/Primetime/digtop

Setup:

Before running PT, we need to have .synopsys_pt.setup in this dir. This is similar to .synopsys_dc.setup that we use for DC. The only difference is that the link library now has *_CTS.db, as the gate level netlist has CTS cells. CTS cells are clock cells; they may be named differently in other libs. This file has lib paths, etc (look in "synthesis DC" for more details). We don't need to specify target_library and link_library in this file, as we set them when running PT for each corner (we have to use different .db files for different corners).
1. search_path: set to path where all .db files are located.
set search_path "$search_path /home/pdk/lib270/r3.0.0/synopsys/bin \
../../Memories/sram220x64/cad_models"
2. target_library: set to min corner library
set target_library MSL270_W_125_2.5_CORE.db => optional as it's specified again when running diff corners.
3. link_library (or link_path may also be used): set to mem and target library.
set link_library { * PML30_W_150_1.65_CORE.db PML30_W_150_1.65_CTS.db } => optional as it's specified again when running diff corners.

So, only step 1 is needed. Steps 2 and 3 are optional.

Invoking PT:

to run PT in cmd line, type

$ pt_shell -2011.12 => If we don't provide a version number (-2011.06 or -2011.12 etc), it picks up the default version set by the sysadmin. Once pt_shell comes up, we can type PT cmds at its prompt.
pt_shell> source file_name => executes scripts in file_name.

To execute a cmd script upon startup, start pt shell as shown below. In this case, the script runs automatically w/o any manual cmds from the user. This is the preferred way of running PT once you have debugged the script and it's working correctly.
$ pt_shell -2011.12 -f file_name

GUI mode:

Apart from writing cmds or running script in pt_shell, you can also invoke gui from within PT. Once pt shell comes up, type below cmd

pt_shell> gui_start

Now we can select a cell, pin, wire etc on the schematic, and we'll see this cmd indicating our selection on the shell

pt_shell> change_selection [get_pins mod1/sync_2/out1] => This way it's a lot easier to get pin names, etc instead of figuring them out by looking at the design

pt_shell> get_selection => This will show whatever was selected, i.e. for the above selection, it'll show "mod1/sync_2/out1"

pt_shell> report_timing -thr [get_selection] => This will report timing through the pins selected.

Running PT cmd script:

There are 2 PT scripts we run. One for running design across various PVT corners, and the other for generating SDF files. First let's look at run_pt_all script which runs STA across all corners. Next we'll run sdf generation script. sdf files are needed only for running gate level sims (GLS). So, if you don't plan to run GLS, then you can skip generation of SDF.

1. run_pt_all:  We run "run_pt_all" script, which calls PT shell for all runs. PT is run across various PVT corners. For 130nm or higher tech nodes, just running it across the fastest PVT corner and the slowest PVT corner suffices. However, for lower node designs (ones below 130nm), we need to run PT across a lot more corners, as just the fastest and slowest PVT corners may not capture all timing paths in the design (due to large variations in transistors across the chip, some worst case paths may show up in intermediate PVT corners). Generally PT is run for 2 modes based on functionality, plus an optional third:

  1. Functional mode (No scan): This is the normal functional run of chip. Here "scan_mode" is set to 0.
  2. Scan mode (includes Scan Shift and Scan Capture): This is when the part is put in scan mode to test scan chains. Here "scan_mode" is set to 1.
  3. Scan_Vbox mode (optional): This runs scan mode on chip, the same as 2 above, but here we apply much larger voltage than the PVT max corner and much lower voltage than PVT min corner. These voltages are supposed to be bounding boxes for our PVT corner. We run these ultra high or ultra low voltages for scan mode only (not functional), as we want to see if the chip still functions in scan mode.

PT is run for 6 cases: noscan(min/max), scan(min/max), scan_vbox(hi/lo). max refers to the max delay lib being used, while min refers to the min delay lib being used for that run. Below we show which reports are generated for each mode.


A. NO SCAN: scan_mode_in set to 0 (in case_analysis.tcl). so normal clks used. If we don't set scan_mode to 0, then there will be too many paths to analyze as both scan_mode=0 and scan_mode=1 timing analysis is run. So, we separate them out.


1. nonscan_max: max delay lib being used
rpts/digtop.max_timing_post_noscan.max.rpt => setup check with max PVT delay (W, 150C, 1.65V) and max interconnect delay (max.spef) slow corner (PCR=max)
rpts/digtop.min_timing_post_noscan.max.rpt => hold check with max PVT delay (W, 150C, 1.65V) and max interconnect delay (max.spef) slow corner (PCR=max)
rpts/digtop.post_noscan.max.rpt => comprehensive report combining both setup and hold checks for slow corner.


2. nonscan_min: min delay lib being used
rpts/digtop.max_timing_post_noscan.min.rpt => setup check with min PVT delay (S, -40C, 1.95V) and min interconnect delay (min.spef) fast corner (PCR=min)
rpts/digtop.min_timing_post_noscan.min.rpt => hold check with min PVT delay (S, -40C, 1.95V) and min interconnect delay (min.spef) fast corner (PCR=min)
rpts/digtop.post_noscan.min.rpt => comprehensive report combining both setup and hold checks for fast corner.

B. SCAN: scan_mode_in set to 1 (in scan.sdc). In scan, spi_clk is used as the clock for shift and capture. scan.sdc has the clk defn for spi_clk, case_analysis to set scan_mode to 1, and all IO delays set wrt spi_clk. It should not have any false paths, as all of the digital logic is run by spi_clk. spi_clk is run at a lower freq.
IMP NOTE: scan_enable pin IO delay should be matched to real Tetramax delay. Otherwise even if scan_en is not buffered appr and has large transition delays, it may still be able to meet timing wrt rising edge of clk. Thus it may not be captured as a violation here but may show up in Tetramax gate level sims. scan_en is the only IO pin that has real timing path to CLK in DUT. So, it should never be tied to 0 or 1 (by setting set_case_analysis), as that will cause constant propagation, so we will not get a path with rising/falling edge of scan_en pin (it will be reported as unconstrained path in PT). That may hide real failure on this path.

1. scan_max: max delay lib being used

2. scan_min: min delay lib being used

C. VBOX: vbox tests run during scan to see if the chip still functions (simple Iddq patterns run to see if lkg is within limits). We choose just 2 corners: vbox_hi=strong tran at high voltage (min delay), and vbox_lo=weak tran at low voltage (max delay), to see if setup, hold etc passes.
#NOTE: We mostly care about vbox_hi rpts to be clean, as that indicates that there is enough hold slack (setup rpts will be clean at vbox_hi anyway as it's run at much faster corner). vbox_lo will be mostly clean for hold, but may have violations for setup as it's a very slow corner. In nutshell, there should be no hold viol at any of vbox_lo/hi, but only setup viol at vbox_lo (assuming design is barely meeting setup timing).


1. scan_vbox_hi: digtop.post_scan.vbox_hi.rpt: high voltage vbox conditions with min PVT delay (S, 25C, 3.2V) and min interconnect delay (min.spef) fast corner at 25C (PCR=vbox_hi). Run at normal freq (12MHz)


2. scan_vbox_lo: digtop.post_scan.vbox_lo.rpt: low voltage vbox conditions with max PVT delay (W, 25C, 0.95V) and max interconnect delay (max.spef) slow corner at 25C (PCR=vbox_lo). Run at 1/2 normal freq (here it's 6.25MHz, as high freq may not be supported at such low voltages)

Details of run_pt_all: run_pt_all script calls pt_shell 6 times as shown above for 6 different corners. We only show the script for noscan_max (case A, bullet 1 above) and scan_max (case B, bullet 1 above). We have similar scripts for noscan_min, scan_min, scan_vbox_hi and scan_vbox_lo.
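The 6 runs can be enumerated mechanically. The sketch below is only an illustration in Python of what a run_pt_all wrapper might loop over; it is not the actual run_pt_all script. The command line for nonscan_max and scan_max follows the ones shown in these notes, while the script and log names for the min and vbox runs are assumed by analogy and are hypothetical.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    mode: str    # "nonscan" or "scan"
    corner: str  # "max", "min", "vbox_hi" or "vbox_lo"


def build_runs() -> List[Run]:
    """Functional and scan modes at max/min corners, plus the two scan-only vbox corners."""
    runs = [Run(mode, corner) for mode in ("nonscan", "scan") for corner in ("max", "min")]
    runs += [Run("scan", "vbox_hi"), Run("scan", "vbox_lo")]
    return runs


def command(run: Run, version: str = "2010.06") -> str:
    """Build the pt_shell command line for one run (file names are assumed)."""
    tag = f"{run.mode}_{run.corner}"
    return (f"pt_shell -{version} -f scripts/check_timing_post_{tag}.tcl"
            f" | tee logs/run_pt_post_{tag}.log")


if __name__ == "__main__":
    for run in build_runs():
        print(command(run))
```

Printing the commands first, before executing them, is a cheap way to confirm that every corner has a script and a separate log file.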

All scripts below run these basic steps:

  1. Read Library: set target_lib and link_lib to appr PVT corner
  2. Read Netlist and SPEF: read verilog gate netlist (from PnR tool), and spef file (from QRC extraction for appr PVT corner).
  3. Read constraints: read constraints for clk,false_paths,case_analysis,etc (func has func.sdc while scan has scan.sdc)
  4. Report timing: report_timing for both max_delay(setup) and min_delay(hold)

no scan max script (check_timing_post_nonscan_max.tcl): Runs functional PT run for max delay corner (i.e worst PVT, that gives max delay). This is for case A, bullet 1 above.

pt_shell -2010.06 -f scripts/check_timing_post_nonscan_max.tcl | tee logs/run_pt_post_nonscan_max.log  => This script sources below 2 scripts:
source -echo scripts/import_post_max.tcl => This runs step 1 and 2 (Read lib, netlist and spef)
source -echo scripts/constrain_post_nonscan.tcl => This runs step 3 and 4 (Read constraints, report timing)

1. import_post_max.tcl => This script is called for 2 max corners above. We have similar import_post_min.tcl for 2 min corners above

#1. Read Library: Read max delay lib. For min corner, we read min delay lib.

set target_library { LIB_W_150_1.65_CORE.db LIB_W_150_1.65_CTS.db } => IMP: whenever the list continues on a new line, the previous line should end with "\". If the opening/closing braces { } are not on the same line as the file names, we should also use "\" to continue on the next line, else we get a linking error.
set link_library { * LIB_W_150_1.65_CORE.db LIB_W_150_1.65_CTS.db } => IMP: same as above. "\" should be used when starting a new line, or we get a linking error.

NOTE: target_library above refers to max delay library. This max delay lib is used for both setup/hold runs. If we want to use max delay lib for setup and min delay lib for hold, we should do this:
set_min_library LIB_W_150_1.65_CORE.db -min_version LIB_S_-40_1.95_CORE.db => specifies max lib, and corresponding min lib by using -min_version.
#For OCV runs, where we want to have min/max library for data/clk path for both setup/hold, we should specify the max and min libraries to use in set_operating_conditions.

We can also use following 2 cmds instead of previous 2 cmds:
#set link_path "* LIB_W_150_1.65_CORE.db LIB_W_150_1.65_CTS.db"
#set default_lib "LIB_W_150_1.65_CORE.db LIB_W_150_1.65_CTS.db"

#To make sure, that when any link is unresolved, we get appr error, set the var below in .synopsys_pt.setup
set link_create_black_boxes false => this prevents PT from creating a blackbox for an unlinked ref and declaring linking as successful

#2A. Read netlist: read gate netlist generated from PnR tool
read_verilog /db/.../final_pnr_files/digtop_final_route.v

set TOP "digtop"
current_design $TOP => working design for PT.
link => link design to resolve all references in design. shows all lib that are being used to link design. No module/gate should be unresolved here, as that would mean missing module/gate defn

#additional cmds for debug
list_libraries => lists all libraries and their paths
report_design => lists attr of current design, incl min/max op cond used, WLM, design rules, etc.
report_cell => This reports all cells in current design, and .lib they refer to. If current_design is set to DIG_TOP, then it only shows cells for DIG_TOP and NOT for sub-modules within it. This is helpful to find out which .lib is used for timing run (especially if multiple .lib have been loaded in memory)
report_cell "dig_top/cell1" => This reports cell1, its ref, its area and other attr.

report_reference => This shows .lib references for cells in current design

#2B. Read spef: read max spef file (generated from rc extractor from within PnR tool)
read_parasitics -format spef /db/.../final_pnr_files/digtop_qrc_max_coupled.spef

2. constrain_post_nonscan.tcl: This file contains all the constraints for functional mode runs. This sources functional sdc file.

#these below settings can be done in .synopsys_pt.setup too.
set timing_self_loops_no_skew true
set timing_disable_recovery_removal_checks "false"

current_design $TOP

#3A. Read constraints: We have 2 options for importing constraints in PT

#OPTION 1:  we source all constraints files individually that we used in DC (instead of using autogenerated file in DC). We don't source env_constraints.tcl (that was used in DC) in PT as we don't want "set_operating_conditions" and set wire load model directives in this file to be used in PT. 
source -echo /db/Synthesis/digtop/tcl/clocks.tcl => clks defined (no scan clk in this)
source -echo /db/Synthesis/digtop/tcl/constraints.tcl => all i/o delays + environment specified. Use the same values as used in synthesis. See sdc section (Synopsys (standard) design constraints) for details of these cmds. The units are not specified in the cmds below, but are instead based on the "set_units" cmd or the cap units from the lib that was the last one to be loaded.

  • set_driving_cell => to set i/p driver cell. If driving cell is not set, then we need to set i/p tran time via: set_input_transition 100 [get_ports IN_*] => sets tran time to 100 units. In this case, it's 100ps as lib has time_unit : "1ps".
  • set_load => to set specified load on ports and nets. We set it on ports only (as we don't want to specify our own cap on internal nets, we let tool calc cap on nets). -max or -min options specify max or min cap to be used in max or min corner runs (as we may not want to use same cap for both max and min runs, applicable only when running timing in max/min mode)
    • ex: set_load 5 [get_ports OUT_*] => sets load of 5 units on all ports OUT_*. Units are based on lib units loaded. In this case, it's 5 ff as lib has "capacitive_load_unit (1.000000, ff);" defined
    • report_port => this cmd is used to report cap + other attr on all ports (or on specified ports, i.e report_port [get_ports OUT_*])
  • set_input_delay / set_output_delay => sets i/p, o/p delay on ports


source -echo /db/Synthesis/digtop/tcl/false_paths.tcl => false paths
source -echo /db/Synthesis/digtop/tcl/multicycle_paths.tcl
source -echo /db/Synthesis/digtop/tcl/case_analysis.tcl => scan mode is set to 0. other flops set for func mode. case_analysis.tcl has following:

  • set_case_analysis 0 scan_mode_in => turn OFF scan mode.

#OPTION 2: Instead of sourcing all the above constraints files, we can use constraints.sdc file that is autogenerated by DC to get all the constraints.
source -echo /db/Synthesis/digtop/tcl/constraints.sdc => It has env_constraints (pin loads, driving cells), clks, i/o delays, false paths, case-analysis, dont_touch, etc (basically all the constraints in option 1 above, plus the env_constraints). Since the env constraints are also present in this file, we get a section that has "set_operating_conditions" set for the 1 PVT corner for which we ran DC (it comes from env_constraints.tcl file that was used for DC). So when running PT for other corners, we'll get errors like "Error: Nothing matched for lib (SEL-005)". So, we'll need to comment out that line when running PT with the above autogenerated file. Or we can comment out the whole env_constraints section.

#after applying path exceptions, do report_exceptions to see list of timing exceptions applied
report_exceptions => -ignored will show those cmds too that are fully ignored. This is important to do as it will tell us if any of the FP/MCP are getting applied or dropped due to syntax errors, path not existing, etc.

#3B. Other constraints: We set clk to propagated, and analysis type to ocv.

#propagate clks: clk is propagated here. In DC, we didn't use propagated clk, so clk was treated as ideal, which means gate delays in the clk path weren't included anywhere in timing reports. We treat clk as ideal in DC because buffers are going to be inserted later during PnR, so we don't want DC fixing clk paths with its own buffers. With this setting, all gate+buffer delays are included in clk delay when running timing.
set_propagated_clock [all_clocks] => Very imp to set this, else timing reports will be all incorrect.

#We set analysis type to OCV even when running it in single mode (specifying only max library for max run, and only min library for min run). So, in reality it's not running ocv here, as we have only one lib loaded to run on all paths. OCV is necessary for PBTA (path based timing analysis, to be discussed later) to work.

set_operating_conditions -analysis_type on_chip_variation => This cmd explained in detail in "PT - OCV" section.

#4: Reports: Timing reports generated below. report_timing is the main cmd that causes PT engine to run timing.

#rpt file for setup (-delay max)
set rptfilename [format "%s%s" $mspd_rpt_path $TOP.max_timing_post_noscan.$PCR.rpt]
redirect $rptfilename {echo "digtop constrain_post_noscan.tcl run : [date]"}
redirect -append $rptfilename {report_timing -delay max -path full_clock_expanded -max_paths 100} => provides timing, most powerful cmd in PT.

#rpt file for hold (-delay min)
set rptfilename [format "%s%s" $mspd_rpt_path $TOP.min_timing_post_noscan.$PCR.rpt]
redirect $rptfilename {echo "digtop constrain_post_noscan.tcl run : [date]"}
redirect -append $rptfilename {report_timing -delay min -path full_clock_expanded -max_paths 100}

#rpt file for all violations
set rptfilename [format "%s%s" $mspd_rpt_path $TOP.post_noscan.$PCR.rpt]
redirect $rptfilename {echo "digtop constrain_post_noscan.tcl run : [date]"}
redirect -append $rptfilename {report_clocks }
redirect -append $rptfilename {check_timing -verbose} => check_timing checks for constrain problems.
redirect -append $rptfilename {report_disable_timing} => You can also eliminate paths from timing consideration by using the set_disable_timing command. report_disable_timing reports such paths. It shows disabled timing arcs for all the cells. Most of them are "u=user-defined" paths. Spare cell paths are reported as "p=propagated constant" since all i/p pins are tied, so no paths exist. Some paths get reported with "c=case-analysis" since case analysis ties some pins.
ex:

Cell or Port                From    To      Sense                 Flag  Reason
--------------------------------------------------------------------------------
Mod1/req/u4             A1      ZN      positive_unate     C     A2 = 0 => Arc from pin A1 to pin ZN of cell u4 is disabled.

redirect -append $rptfilename {report_constraint -all_violators} => reports the results of constraint checking done by PrimeTime. -all_violators reports all violations incl setup, hold, max cap, max FO, max transition time (slew rate or transition time is measured 10/90 or 20/80 or whatever based on the characterized lib and slew derate factor), min pulse width, clk gating checks, recovery checks and removal checks. {report_constraint -all_vio -verb} gives verbose info about violations

#report_global_timing -group [get_path_groups CLK*] => generates a top-level summary of the timing for the design. Here it generates a report of violations in the current design for path groups whose name starts with 'CLK'. If we run this in a for loop with all path_groups, then we get separate reports for each group.
ex: foreach_in_collection path_group [get_path_groups] { report_global_timing -group $path_group >> viol_summary.rpt } => reports timing for groups = **async_default**, **clock_gating_default**, **default**, CLK10, SYSCLK20, and other clocks in design.

#report_analysis_coverage => Generates a report about the coverage of timing checks
#report_analysis_coverage -status_details {untested} -check_type {setup} => Once we see coverage missing, we can get detailed report about status of untested, violated or met checks. Check types can be "setup, hold, recovery, removal, clock_gating_setup, clock_gating_hold, min_pulse_width, min_period, nochange".

#report_clock_timing -type summary -clock [get_clocks *] => lists clock timing info summary, which lists max/min of skew, latency and transition time over given clk n/w.
# report_clock_timing -type skew -setup -verbose -clock [get_clocks *] => This gives more detailed info about given clk attr (over here for skew). By default, the report displays the values of these attributes only at sink pins (that is, the clock pins of sequential devices) of the clock network. Use the -verbose option to display source-to-sink path traces.

#PBTA: pbta is path based timing analysis. By default, pba_mode is set to "none" => pbta is not applied (gba is applied). It's useful as PBA doesn't have pessimism of GBA, so we always use PBA though at expense of runtime.
#report_timing -pba_mode path => pba is applied to paths after collecting them, but the true worst case may not be reported. It takes the worst slack path and just recalculates it, but it's possible that the next worst case path right behind it doesn't get as much improvement from pba, and so might become the worst case path, but we never analyzed this next worst case path.
#report_timing -pba_mode exhaustive => provides worst case path after recalc. It looks at all the paths to a particular endpoint, and applies pba on each path to that endpoint. Path with worst slack after applying recalc to all paths to each endpoint is shown. so, the optimism inherent with "-pba_mode path" is not present anymore.
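The difference between the two pba modes can be shown with a toy model in Python. This is only a sketch of the behaviour described above, not PT's actual algorithm: the path names and slack numbers are made up, and "path mode" is modelled as recalculating only the worst GBA path.

```python
# Hypothetical paths to one endpoint: (name, GBA slack, slack after PBA recalculation).
PATHS = [
    ("path_a", -0.30, 0.10),   # worst in GBA, but most of it was GBA pessimism
    ("path_b", -0.25, -0.20),  # second worst in GBA, gains little from PBA
    ("path_c", -0.05, 0.02),
]


def pba_path_mode(paths, max_paths=1):
    """Collect the max_paths worst GBA paths, recalculate only those, report the worst."""
    collected = sorted(paths, key=lambda p: p[1])[:max_paths]
    return min(collected, key=lambda p: p[2])


def pba_exhaustive(paths):
    """Recalculate every path to the endpoint, then report the worst."""
    return min(paths, key=lambda p: p[2])


if __name__ == "__main__":
    print("path mode reports :", pba_path_mode(PATHS))
    print("exhaustive reports:", pba_exhaustive(PATHS))
```

In this toy data, path mode reports path_a as passing after recalculation, while exhaustive mode finds that path_b still violates.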

#rpt file for pbta setup (-delay max)
set rptfilename [format "%s%s" $mspd_rpt_path $TOP.pbta_max_timing_post_nonscan.$PCR.rpt]
redirect -append $rptfilename {report_timing -pba_mode exhaustive -crosstalk_delta -transition_time -delay max -path full_clock_expanded -nworst 10 -max_paths 500 -slack_lesser 1.0} => -crosstalk_delta reports delta delay and delta transition time, which were calculated during crosstalk SI analysis (provided PTSI is enabled). -transition_time reports transition time which is helpful to figure out nets which have very slow transition due to crosstalk.

#rpt file for pbta hold (-delay min)
set rptfilename [format "%s%s" $mspd_rpt_path $TOP.pbta_min_timing_post_nonscan.$PCR.rpt]
redirect -append $rptfilename {report_timing -pba_mode exhaustive -crosstalk_delta -transition_time -delay min -path full_clock_expanded -nworst 10 -max_paths 500 -slack_lesser 1.0}

scan max script (check_timing_post_scan_max.tcl): Runs scan PT run for max delay corner (i.e worst PVT, that gives max delay). This is for case B, bullet 1 above.

pt_shell -2010.06 -f scripts/check_timing_post_scan_max.tcl | tee logs/run_pt_post_scan_max.log  => This script sources below 2 scripts:
source -echo scripts/import_post_max.tcl => This runs step 1 and 2 (Read lib, netlist and spef)
source -echo scripts/constrain_post_scan.tcl => This runs step 3 and 4 (Read constraints, report timing)

1. import_post_max.tcl => This script is same as what we used in func mode above

2. constrain_post_scan.tcl: This file is used for scan runs (and is diff than func script above). This sources scan sdc file. No other constraints files are sourced when running scan. This is because the constraints for scan are totally different than ones in func mode.
source -echo /db/.../scan.sdc
#scan.sdc has following:
create_clock -name spi_clk -period 66 -waveform { 0 33 } [get_ports {spi_clk}] => create clk for PT and specify its characteristics. rising edge at 0ns and falling edge at 33ns.
set_propagated_clock [get_clocks {spi_clk}]
set_case_analysis 1 [get_ports {scan_mode_in}] => turn ON scan mode.
set_input_delay 10 [all_inputs ] => i/p timing conditions
set_output_delay 10 [all_outputs ] => o/p timing requirements
set_dont_touch scan_inp_iso
set_driving_cell -lib_cell IV110 [all_inputs]
set_load 4.2 [all_outputs]
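The numbers in scan.sdc above can be sanity-checked with quick arithmetic. The Python sketch below assumes the time unit is ns (as in the create_clock note) and ignores setup time and clock latency, so the budget it prints is only a rough upper bound; the function names are made up for this example.

```python
def clock_summary(period_ns, rise_ns, fall_ns):
    """Return (frequency in MHz, duty cycle) for a clock waveform given in ns."""
    freq_mhz = 1000.0 / period_ns
    duty = (fall_ns - rise_ns) / period_ns
    return freq_mhz, duty


def internal_budget_ns(period_ns, io_delay_ns):
    """Time left inside the chip for an i/p or o/p path once the external delay is removed."""
    return period_ns - io_delay_ns


if __name__ == "__main__":
    freq, duty = clock_summary(66, 0, 33)
    print(f"spi_clk: {freq:.2f} MHz, duty cycle {duty:.0%}")
    print("budget for an input path:", internal_budget_ns(66, 10), "ns")
```

With a period of 66 and edges at 0 and 33, spi_clk is about 15.15 MHz with a 50% duty cycle, and an i/o delay of 10 leaves 56 of the 66 units for logic inside the chip.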

 

2. SDF generation script: Once we have run PT across all 6 corners as shown above, we use PT to generate SDF files. These are delay files that will be used in gate level simulations. If we don't plan to run GLS, then we don't need to run this step.

sdf file generation requires:

  • .lib files => for all cells/macros as they have cell delays, setup/hold timing checks, c2q arcs etc.
  • gate level verilog netlist => netlist is needed so that all nodes of the netlist are annotated with the appr cell delay and net delay. Nets which are not driven by anything are reported as driverless nets
  • spef file => has R,C info for all nets (no dly info for nets and cells)

NOTE: we don't require verilog model files for cells, macros, etc as we are just generating delay file. All arc info comes from .lib files. Delays for sdf are calculated from R,C in spef file and cell delay (with appr load for that cell coming from spef file) from .lib file.

we've 2 scripts for generating max and min sdf. Each net and instance in verilog is matched with a net parasitic in spef file, and a cell timing in .lib file. target library and link library are set to *.db (liberty files for gates) for that particular corner, just as we do in DC.


1. max sdf: (for max delay, so worst timing corner)
> pt_shell -2010.06 -f scripts/MaxSDF.tcl | tee logs/run_GenMaxSDF.log

#MaxSDF.tcl has following: read .lib files, verilog file and max.spef file (generated from EDI). write_sdf writes final sdf file, other options in write_sdf needed to have aligned sdf, otherwise when we do sdf annotation, we may not get sdf file arcs aligned with verilog model file arcs. sdf file arcs come from .lib arcs, while during annotation, we check it against arcs in verilog model files.

set target_library { PML48_W_125_1.35_COREL.db \
PML48_W_125_1.35_CTSL.db \
felb2x01024064040_W_125_1.35.db }
set link_library { * \
PML48_W_125_1.35_COREL.db \
PML48_W_125_1.35_CTSL.db \
felb2x01024064040_W_125_1.35.db }
echo $target_library

read_verilog /db/NIGHTWALKER/design1p0/HDL/FinalFiles/digtop/digtop_final_route.v
current_design digtop
link

read_parasitics -format spef /db/NIGHTWALKER/design1p0/HDL/FinalFiles/digtop/digtop_qrc_max_coupled.spef

#timing checks
check_timing -verbose => report shows all endpoints as unconstrained as no sdc file provided. OK
report_timing => report shows no constrained paths, as no clk provided. OK
report_annotated_parasitics -max_nets 150 -list_not_annotated => Provides a report of nets annotated with parasitics in the current design for both internal and port nets. -list_not_annotated lists nets that are not back annotated.
write_sdf -version 3.0 \ => SDF version 1.0, 2.1 or 3.0
-exclude {default_cell_delay_arcs} \ => specifies which timing values are to be excluded from sdf file. default_cell_delay_arcs indicates that all default cell delay arcs are to be omitted from the SDF file if conditional delay arcs are present. If there are no conditional delay arcs, the default cell delay arcs are written to the SDF file. NOTE: This may be an issue when running gatesims, as verilog models will have default_delay_arcs, while those will be missing from sdf files, so annotation will have "missing annotation" warnings.
-include {SETUPHOLD RECREM} \ => SETUPHOLD:combine SETUP and HOLD constructs into SETUPHOLD. RECREM:combine RECOVERY and REMOVAL constructs into RECREM.
-context verilog \ => context for writing bus names for verilog, vhdl or none, so that [], () are not escaped.
-no_edge \ => SDF should not include any edges (posedge or negedge) for both comb and seq IOPATHs. It takes the worst of posedge/negedge values and assigns it to the delay arc
-no_negative_values {cell_delays net_delays} \ => Specifies a list of timing value types whose negative values are to be zeroed out when writing to the SDF file. Allowed values are timing_checks, cell_delays and net_delays.
sdf/digtop_max.pt.sdf
quit

#look in logs/run_GenMaxSDF.log for errors.
A. Look at the "report_annotated_parasitics" section for detailed info. Here all internal nets (nets connected only to cell pins) and boundary/port nets (nets connected to any I/O port of top level design; this should match the number of I/O ports in design) are reported. These nets are classified into pin to pin nets, driverless nets and loadless nets. All nets should connect from pin to pin, unless they are either floating o/p (loadless nets) or floating i/p (driverless nets). Nets which have no driver and no load are counted as driverless nets. Nets are everything reported as wires in digtop_final_route.v file. Any wire in this netlist that the tool can't find associated with any cell is reported as a "floating net". NOTE: parasitics from the spef file are only annotated on nets, and not on cells. These parasitics cause INTERCONNECT delay on nets, and also affect the loading on cells. The cell arc from the .lib file is then used to find the real cell delay based on the cell o/p pin load.
B. Look at warnings in the "write_sdf" section. The most common one is "The sum of the setup and hold values in the cell 'soc_top/i2cBlk/i2cLink/sdaOut_reg' for the arc between pins 'CLK' and 'SCAN' is negative, which is not allowed. To make it positive, the minimum hold value has been adjusted from -0.613501 to -0.594033. (SDF-036)". This is because setup+hold should be > 0
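Going through the log by hand gets tedious when there are many SDF-036 warnings. Below is a small Python sketch that pulls them out of the log text. The regular expression is built only from the one example message quoted above, so the exact wording and line wrapping in a real log are assumptions; extract_sdf036 is a made-up helper name.

```python
import re

# Pattern derived from the single SDF-036 example quoted in these notes.
SDF036 = re.compile(
    r"in the cell '(?P<cell>[^']+)' for the arc between pins "
    r"'(?P<from_pin>[^']+)' and '(?P<to_pin>[^']+)' is negative.*?"
    r"adjusted from (?P<old>-?\d+\.\d+) to (?P<new>-?\d+\.\d+)\. \(SDF-036\)",
    re.DOTALL,
)


def extract_sdf036(log_text):
    """Return one dict per SDF-036 warning: cell, pins, old and adjusted hold value."""
    found = []
    for m in SDF036.finditer(log_text):
        found.append({
            "cell": m.group("cell"),
            "pins": (m.group("from_pin"), m.group("to_pin")),
            "old_hold": float(m.group("old")),
            "new_hold": float(m.group("new")),
        })
    return found


if __name__ == "__main__":
    sample = (
        "The sum of the setup and hold values in the cell "
        "'soc_top/i2cBlk/i2cLink/sdaOut_reg' for the arc between pins 'CLK' and 'SCAN' "
        "is negative, which is not allowed. To make it positive, the minimum hold value "
        "has been adjusted from -0.613501 to -0.594033. (SDF-036)"
    )
    for warning in extract_sdf036(sample):
        print(warning)
```

Sorting the result by the size of the adjustment (new_hold - old_hold) is one way to see which cells were changed the most.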

2. Min sdf: for min delay, so fastest timing corner chosen.
> pt_shell -2010.06 -f scripts/MinSDF.tcl | tee logs/run_GenMinSDF.log

#MinSDF.tcl has following: only diff is min delay lib, and min spef chosen

set target_library { PML48_S_-40_1.65_COREL.db \
PML48_S_-40_1.65_CTSL.db \
felb2x01024064040_S_-40_1.65.db }
set link_library {* \
PML48_S_-40_1.65_COREL.db \
PML48_S_-40_1.65_CTSL.db \
felb2x01024064040_S_-40_1.65.db }

echo $target_library

read_verilog /db/NIGHTWALKER/design1p0/HDL/FinalFiles/digtop/digtop_final_route.v
current_design digtop
link

read_parasitics -format spef /db/NIGHTWALKER/design1p0/HDL/FinalFiles/digtop/digtop_qrc_min_coupled.spef

check_timing -verbose
report_timing
report_annotated_parasitics -max_nets 150 -list_not_annotated
write_sdf -version 3.0 \
-exclude {default_cell_delay_arcs} \
-include {SETUPHOLD RECREM} \
-context verilog \
-no_edge \
-no_negative_values {cell_delays net_delays} \
sdf/digtop_min.pt.sdf
quit

 

PT reports:
----------
PT reports timing for clk and data path. 1st section "data_arrival_time" refers to data path from start point, while 2nd section "data_required_time" refers to clk path of end point. 1st section shows path from clk to data_out of seq element and then thru the combinational path all the way to data_in of next seq element, while 2nd section shows primarily the clk path of final seq element, ending at clk pin. In the 2nd section, it shows the final "data check setup time" inferred from .lib file for that cell.

Reports are shown per stage. A stage consists of a cell together with its fanout net. So, the transition time reported is at the i/p of the next cell. The delay shown is the combined delay from the i/p of the cell to the o/p of the cell, going thru the net to the i/p of the next cell. & in the report indicates parasitic data.

Ex: a typical path from one flop to other flop
Point Incr Path
------------------------------------------------------------------------------
clock clk_800k (rise edge) 1.00 1.00 => start point of 1st section
clock network delay (propagated) 3.41 4.41
.....
Imtr_b/itrip_latch_00/SZ (LAB10) 0.00 7.37 r
data arrival time 7.37
-----
clock clk_800k (rise edge) 101.00 101.00 => start point of 2nd section (usually starts at 1 clk cycle delay, 100 ns is the cycle time here)
clock network delay (propagated) 3.85 104.85
.....
data check setup time -0.04 105.76 => setup time implies wrt clk, data has to setup. So, we subtract setup time from .lib file to get data required time (as +ve setup time means data should come earlier)
data required time 105.76
------------------------------------------------------------------------------
data required time 105.76
data arrival time -7.37
------------------------------------------------------------------------------
slack (MET/VIOLATED) 98.39
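
The final slack line is plain arithmetic on the two sections above: data required time minus data arrival time. A minimal Python sketch of that check, using the numbers from this example report (the function name `setup_slack` is made up for illustration and is not a PT command):

```python
def setup_slack(data_required_time, data_arrival_time):
    """Setup slack in ns: positive means data arrives before it is required."""
    return data_required_time - data_arrival_time

# Values copied from the example report above (ns)
slack = round(setup_slack(105.76, 7.37), 2)
status = "MET" if slack >= 0 else "VIOLATED"
print(f"slack ({status}) {slack:.2f}")  # slack (MET) 98.39
```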

Course 1 - week 3 - Shallow Neural Network:

This course introduces 2 layer Neural networks. NN was introduced in previous lecture, but it was mostly logistic regression. In logistic regression, we took a linear function f(x), assigned weights to various pixels, and computed if the picture can be classified as cat or not. It was single layer, as input X passed thru only one function f(x) = σ(w1*x1+w2*x2+...+wn*xn + b).

In a multi layer NN, we pass input X thru 2 functions f(x) and g(x), which may be the same or different. If we choose f(x) as the func above, then f(x) returns a single value, and passing it thru another function g(x) doesn't give anything new, i.e g(x) and f(x) could be combined into one function h(x). So, in the above example, we can combine the sigmoid function with g(x) to give a new function h(x)=g(σ(x)). This arrangement implies that instead of choosing sigmoid as the activation function, we chose some other function h(x) as the activation function. So, we just replaced one function with another, and the 2 layer result could have been achieved with one layer.

What if we allow a combination of f(x) functions to get more curves on the surface that's trying to fit our data set (in case of the cat picture, it's fitting our pixels better)? Let's try to make various combinations of f(x) as f1(x), f2(x), etc. Then we can combine these f1(x), f2(x), ... with varying weights and feed that combination to g(x). So, this is what it would look like:

f1(x) = σ(w11*x1+w12*x2+...+w1n*xn + b1)

f2(x) = σ(w21*x1+w22*x2+...+w2n*xn + b2)

..

fk(x) = σ(wk1*x1+wk2*x2+...+wkn*xn + bk)

Now, we define g(x) the same way as f(x), but now the inputs are the outputs of above functions. Here we assign weights to functions f1(x), f2(x), ... and pass it thru sigmoid func to get g(x)

g(x) = σ(v1*f1(x)+v2*f2(x)+...+vk*fk(x) + c)

It turns out that this gives a better fit than the logistic regression fit that we attained in the week 2 example. The reason is that g(x) in logistic regression was of the form g(x) = σ(v1*x1+v2*x2+...+vn*xn + c), but now instead of having x1,x2,... as its input, it has functions of x1,x2,.. as its input (i.e f1(x1,x2,...), f2(x1,x2,...),...). This allows it to take more complicated shapes and fit the given data better.
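
As a rough illustration of the composition above, here is a minimal NumPy sketch that evaluates f1(x)..fk(x) and then g(x) for one input. The sizes (n=3 inputs, k=2 hidden functions) and all weight values are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_output(x, W, b, v, c):
    """g(x) = sigmoid(v . f(x) + c), where f_i(x) = sigmoid(W[i] . x + b[i])."""
    f = sigmoid(W @ x + b)      # the k hidden functions f1(x)..fk(x)
    return sigmoid(v @ f + c)   # weighted combination fed thru sigmoid

# Hypothetical values: n = 3 inputs, k = 2 hidden functions
x = np.array([1.0, 0.5, -1.0])
W = np.array([[0.2, -0.4, 0.1],
              [0.7, 0.3, -0.5]])   # row i holds wi1..win
b = np.array([0.0, 0.1])
v = np.array([1.5, -2.0])
c = 0.3

y_hat = two_layer_output(x, W, b, v, c)
print(y_hat)  # a single value between 0 and 1
```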

2 Layer NN:

The above scheme becomes a 2 layer NN. It's called a shallow NN, as it has very few layers (in our example, only 2 layers). We can extend this concept from 2 layers to any number of layers, and surprisingly (or maybe not so after all), the fit keeps on becoming better. This is because we have more and more degrees of freedom in playing with variables to get a better fit. We may be able to achieve higher accuracy with logistic regression, but it would need an infinitely large number of weights to fit the curve. And still, it won't be able to fit the data, as it can't generate any curves with a linear function.

Let's revisit the section on "Best Fit Function". There we saw that sigmoid functions can be linearly added and fed into a sigmoid function to generate complex shapes. We saw plots for 2 dimensional i/p (i.e x,y), but it can be generalized to any number of inputs. By using appropriate weights and adding sigmoid functions, we were able to generate complex shapes.

ReLU or any other non linear functions can also be used instead of sigmoid functions.

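NOTE: One very important thing to keep in mind is that the weights W need to be initialized to random values, instead of being initialized to 0. With identical starting weights, every hidden unit computes the same function and receives the same update, so the units never become different from each other. The lecture explains why.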
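
The effect of the starting weights can be checked numerically. Below is a small sketch (not the assignment's code) that computes the gradient of the hidden layer weights for a tanh hidden layer and sigmoid output, with biases left at 0. The helper name `hidden_weight_gradient`, the data and the layer sizes are made up for the example.

```python
import numpy as np

def hidden_weight_gradient(W1, W2, X, Y):
    """Return dW1 for a tanh-hidden, sigmoid-output network (biases taken as 0)."""
    m = X.shape[1]
    A1 = np.tanh(W1 @ X)                     # hidden activations, shape (k, m)
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1)))    # output, shape (1, m)
    dZ2 = A2 - Y                             # gradient of the log loss w.r.t. Z2
    dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)     # back thru tanh
    return dZ1 @ X.T / m

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))              # 2 inputs, 5 samples
Y = np.array([[0, 1, 0, 1, 1]])

# All zeros: the gradient for W1 is exactly zero, so W1 never moves.
dW1_zero = hidden_weight_gradient(np.zeros((3, 2)), np.zeros((1, 3)), X, Y)

# All equal: every hidden unit gets the same gradient row, so they stay identical.
dW1_same = hidden_weight_gradient(np.full((3, 2), 0.5), np.full((1, 3), 0.5), X, Y)

# Small random values: each hidden unit gets its own gradient.
dW1_rand = hidden_weight_gradient(rng.standard_normal((3, 2)) * 0.01,
                                  rng.standard_normal((1, 3)) * 0.01, X, Y)
print(dW1_zero, dW1_same, dW1_rand, sep="\n\n")
```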

Programming Assignment 1: This is a simple 2 layer NN. It tries to predict if a given dot is red or blue given its location coordinate (x,y). Since the shape is in the form of a flower, the 1 layer NN with its linear equation can never form a boundary that can separate out the blue and red petals (as a linear eqn can't form a complex surface). Only a 2 layer NN and higher layers can form a complex surface that can separate out various regions. We'll run our pgm thru both the 1 layer NN and the 2 layer NN.

Here's the link to the pgm assignment:

Planar_data_classification_with_onehidden_layer_v6c.html

This project has 3 python pgm, that we need to understand.

A. testCases_v2.py => There are a bunch of testcases here to test your functions as you write them. In my pgm, I have them turned off.

testCases_v2.py

B. planar_utils.py => this is a pgm that defines a couple of functions.

planar_utils.py

These functions are:

  • load_planar_dataset(): This function builds coordinates x1,x2 and the corresponding color y (red=0, blue=1). The array X=(x1,x2) and Y for all the points is returned. So, no database is loaded here from any h5 file. It's built within the function.
  • load_extra_datasets(): This loads other optional datasets such as blobs, circles, etc. These are in the same style as the petals, where a linear logistic regression can never achieve high enough accuracy.
  • plot_decision_boundary(): This plots the 2D contour of the boundary where the function changes value from 0 to 1 or vice versa. However, this boundary is better visualized in 3D. So, I added options for 3D contour, 3D surface and 3D wireframe (on top of default 2D contour). I've set 3D surface as default, as that gives the best visual representation.
  • sigmoid(): This calculates sigmoid for a given x (x can be scalar or an array)

We'll import this file in our main pgm.

C. test_cr1_wk3.py => This pgm calls functions in planar_utils. Here, we define our algorithm for a 2 layer NN to find optimal weights, by trying out the algorithm on training data. We then apply those weights on the training data itself to predict whether the dots were red or blue. There is no separate testing data. We just want to see how well our surface fits the training data. Below is the whole pgm:

test_cr1_wk3.py

Below are the functions defined in our pgm:

  • layer_sizes() => Given X,Y as i/p array, it returns size of input layer, hidden layer and output layer
  • initialize_parameters() => initializes W1,b1 and W2,b2 arrays. W1, W2 are init with random values (Very important to have random values instead of 0), while b1,b2 are init to 0. It puts these 4 arrays in dictionary "parameters" and returns that. NOTE: To be succinct, we will use w,b to mean W1,b1,W2,b2, going forward.
  • forward_propagation() => It computes output Y hat (i.e output A2). Given X and parameters (parameters has all w,b), this func calculates Z1, A1, Z2, A2 which are stored in dictionary "cache" and returned. NOTE: here we didn't use the sigmoid func for both layers. Instead we used the tanh function for the 1st layer (hidden layer), and sigmoid for the next layer (output layer). The lectures explain why.
  • compute_cost() => computes cost (which is the log function of A2,Y).
  • backward_propagation() => This computes gradients dw1, db1, dw2, db2 by using the formulas in lecture. It stores dw1, db1, dw2, db2 in dictionary "grads". It returns dictionary "grads". NOTE: above 3 functions were combined into one as propagate() in the previous exercise from week2, but here they are separated out for clarity.
  • update_parameters() => This function computes new w,b given old w,b and dw,db. It doesn't iterate here, rather iteration is done in nn_model() below
  • nn_model() => This is the main func that will be called in our pgm. We provide the training data array (both X,Y) as i/p to this func. It then returns to us the optimal parameters (w,b). It calls above functions as shown below:
    • calls func initialize_parameters() to init w,b,
    • It then iterates thru cost function to find optimal values of w,b that gives the lowest cost. It forms a "for" loop for predetermined number of iterations. Within each loop, it calls these functions:
      • forward_propagation() => Given values of X,w,b, it computes A2(i.e Y hat). It returns A2 and cache.
      • compute_cost() => Given A2,Y, parameters (w,b), it computes cost
      • backward_propagation => Given X,Y, parameters (w,b) and cache (which stores intermediate Z and A), it computes dw,db and stores it in grads.
      • update_parameters() => This computes new values of w,b using old w,b and gradients dw,db. New "parameters" dictionary is returned.
    • In beginning, w and b are initialized. We start the loop and in first iteration, we run the 4 functions listed above to get new w,b based on dw, db, and learning rate chosen. Then we start with next iteration. In next iteration, we repeat the process with newly computed values of w,b fed into the 4 functions to get even newer dw, db, and update w,b. We keep on repeating this process for "num_iterations", until we get optimal w,b which hopefully give lot lower cost than what we started with.
    • It then returns dictionary "parameters" containing optimal W1,b1,W2,b2
  • predict() => Given input coordinate array X and weights w,b, it predicts Y (i.e whether the point is blue or not). It uses w,b calculated using the nn_model() function. It calls the forward_propagation() func to get A2 (i.e Y hat). If A2>0.5, it sets predictions to "1" else sets it to 0, and returns array "predictions".
  • Accuracy is then reported for all coordinates on what color they actually were vs what our pgm predicted.
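
The function list above can be sketched end to end as follows. This is a condensed illustration, not the assignment's code: the toy data (points inside a unit circle labelled 1), the learning rate of 0.5, the iteration count and the clipping inside compute_cost() are all assumptions made for the example.

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y, seed=2):
    rng = np.random.default_rng(seed)
    return {"W1": rng.standard_normal((n_h, n_x)) * 0.01, "b1": np.zeros((n_h, 1)),
            "W2": rng.standard_normal((n_y, n_h)) * 0.01, "b2": np.zeros((n_y, 1))}

def forward_propagation(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])                    # hidden layer: tanh
    A2 = 1.0 / (1.0 + np.exp(-(p["W2"] @ A1 + p["b2"])))   # output layer: sigmoid
    return A2, {"A1": A1, "A2": A2}

def compute_cost(A2, Y):
    A2 = np.clip(A2, 1e-12, 1 - 1e-12)                     # avoid log(0)
    return float(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))

def backward_propagation(p, cache, X, Y):
    m = X.shape[1]
    dZ2 = cache["A2"] - Y
    dZ1 = (p["W2"].T @ dZ2) * (1 - cache["A1"] ** 2)       # derivative of tanh
    return {"dW1": dZ1 @ X.T / m, "db1": dZ1.sum(axis=1, keepdims=True) / m,
            "dW2": dZ2 @ cache["A1"].T / m, "db2": dZ2.sum(axis=1, keepdims=True) / m}

def update_parameters(p, grads, learning_rate):
    return {k: p[k] - learning_rate * grads["d" + k] for k in p}

def nn_model(X, Y, n_h, num_iterations=2000, learning_rate=0.5):
    p = initialize_parameters(X.shape[0], n_h, Y.shape[0])
    costs = []
    for _ in range(num_iterations):
        A2, cache = forward_propagation(X, p)
        costs.append(compute_cost(A2, Y))
        grads = backward_propagation(p, cache, X, Y)
        p = update_parameters(p, grads, learning_rate)
    return p, costs

def predict(p, X):
    A2, _ = forward_propagation(X, p)
    return (A2 > 0.5).astype(int)

# Toy data (not the flower set): points inside the unit circle are labelled 1
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))
Y = (np.sum(X ** 2, axis=0, keepdims=True) < 1.0).astype(int)

parameters, costs = nn_model(X, Y, n_h=4)
accuracy = float(np.mean(predict(parameters, X) == Y))
print(f"cost: {costs[0]:.3f} -> {costs[-1]:.3f}, training accuracy: {accuracy:.2f}")
```

Changing `n_h` in the call to nn_model() is the knob that the hidden layer size experiment varies.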

Below is the explanation of main code (after we have defined our functions as above):

  1. We get our dataset X,Y from any of the multiple sets available. We have our petal flower set (which is the default set). We can also choose optional noisy_circles, noisy_moons, blobs, gaussian_quantiles. We use func load_planar_dataset() to load the petal dataset, while we use load_extra_datasets() to load the other 4 datasets. We plot the data X,Y in a scatter plot.
  2. We then run 2 classifiers on our data: 1 is logistic regression, while other is 2 layer NN:
    1. Logistic regression:
      1. Here we run logistic regression classifier on this X,Y dataset. Instead of building our own logistic regression classifier (as we did in week 2 exercise), we use sklearn's inbuilt classifier on X,Y set.
      2. We then use func plot_decision_boundary() to plot the 2D/3D decision boundary (or predicted Y values, i.e Y hat values) to check what the fitting surface looks like with the logistic regression classifier. It's a single sigmoid function as expected (with a straight line seen in the 2D contour)
      3. Then we print accuracy of logistic regression which is pretty low as expected.
    2. Two layer NN:
      1. Here we run our 2 layer NN. We call function nn_model() with i/p X,Y and the hidden layer size (number of hidden units) set to 4.
      2. Next, we use func plot_decision_boundary() to plot 2D/3D decision boundary (the same way as in regression classifier)
      3. Then we print accuracy of NN which is lot higher than logistic regression.
  3. In the above exercise, we used a fixed size of "4" for our hidden layer. We would like to explore what increasing the hidden layer size does to the accuracy of prediction. So, we repeat the same exercise as we did in the 2 layer NN, but now we vary the hidden layer size from 1 to 50. As expected, the larger the hidden layer, the more surfaces we have to play with, and hence the better the fit we can achieve. So, prediction accuracy goes to 90%.

Below are the plots for different hidden layer size (sizes ranging from 1 to 20). NOTE: number of layers is still 2.

1. Petal data: First we show plots for Petal data set

A. below is how petal data looks like. Here o/p Y is the color, while i/p X are the coordinates (x1,x2)

 

B. When we run logistic regression on above data to get best fit, this is how logistic regression final output Y plot looks like:

 

 

C. Now, we run the same dataset on optimal w,b calculated in our pgm above, but with different sizes of hidden layer ranging from 1 to 20. Here we plot A2 (not Y, but Y hat), so that we can see what values these sigmoid plots range over (i.e did they go all the way to 0 or 1, or were they stuck at in-between values). If we plot the final Y (predicted values), then we lose this info. As can be seen, we get more and more tanh plots to arrange and get a better fit, as we increase hidden layer size. Hidden layer size of 1 means only 1 tanh function, size=2 means 2 tanh functions, size=3 means 3 tanh functions, and so on. So, for size=3, output A2=σ(C1*tanh+C2*tanh+C3*tanh) can generate a lot more surfaces (about 3+3+1=7 possible surfaces).

 

 

2. noisy circles data: Next we show data for Noisy circles data set

A. below is how noisy circles data looks like. Here o/p Y is the color, while i/p X are the coordinates (x1,x2).

B. When we run logistic regression on above data to get best fit, this is how logistic regression final output Y plot looks like:

 

 

C. Now, we run the same dataset on optimal w,b calculated in our pgm above, but with different sizes of hidden layer ranging from 1 to 20. As in the petals case, we plot A2 (not Y, but Y hat). Results show the same thing as the petals case: we get a better fit as we increase hidden layer size. Here blue and red dots are more randomly spread, so more tanh functions need to be added together to separate out the red and blue dots. So, a larger hidden layer size helps.

 

Summary:

By finishing this exercise, we learnt how to build a 2 layer NN and figure out optimal weights for coordinates (x,y) so that it can predict blue vs red dot. We played around with different sizes of hidden layer, and saw that the larger the hidden layer, the better the fit, though beyond a certain optimal number, increasing the size of the hidden layer doesn't add any extra value. We compared results to those predicted by logistic regression. Logistic regression (which is basically a single layer NN) could never match the accuracy provided by the NN with 2 layers.

Course 1 - week 2 - Neural Network Basics:

This is the first technical introduction to NN. Well, the material for this week doesn't really talk about NN, it talks about regression, and how to do a linear and logistic regression. But in later weeks, you will see that these regressions are the simplest kind of NN. Logistic regression is a concept from statistics, but this defines the building block for AI.

For Linear and Logistic regression, see the AI section on "Statistics - Regression". This is all this week's lecture is about. Trying to do binary classification on a picture with nx pixels, to find out if it's a cat or not. First, we give m such training pictures to our regression engine, let it find weights which gives it the lowest cost, and then use the weights to predict on a test picture. If our weights are optimal, and the test picture is close to our training set picture, then our regression algo would do a good job in classifying the picture correctly.

However, just from common sense it looks like this approach of simple regression will never work, as cats can come in any color, shape, position, background, etc. Regression is just matching pixels and trying to minimize distance. It has no spatial information (i.e if there are 10 pixels next to each other to form an eye, then our logistic regression model doesn't care if these 10 pixels are in 10 different corners of the picture, or next to each other).

As an example, consider 8X8 pixel black and white picture. Each pixel can have 2 values: 0 for black and 1 for white. So, total possibilities of all pictures possible is 2^(8*8)=2^64 unique pictures possible. Our regression analysis is trying to go thru limited set of such possible combinations and predict what each picture is going to be. It's impossible to do that even for 8x8 pixel black and white picture. Just imagine how to do that for 64x64 colored picture !! And then for even larger pictures. It's just not possible by brute force "least error" regression technique. Something better has to be done. That's for later courses !!
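
The counts above are easy to check in Python. The 8x8 figure comes straight from the text. For the 64x64 colored picture, the sketch assumes 3 channels with 256 levels each (8-bit RGB), which the text does not specify.

```python
import math

# 8x8 picture, each pixel black (0) or white (1)
pictures_8x8_bw = 2 ** (8 * 8)
print(pictures_8x8_bw)  # 18446744073709551616

# 64x64 colored picture; assuming 3 channels with 256 levels each (8-bit RGB)
bits = 64 * 64 * 3 * 8
decimal_digits = int(bits * math.log10(2)) + 1
print(f"2^{bits} distinct pictures, a number with {decimal_digits} decimal digits")
```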

 This week has a programming assignment, that is an absolute must to be completed, if you want to learn AI. It helps you go thru simplest NN that's possible, which is actually logistic regression. All new concepts are developed. Take your time to finish this assignment.

Programming Assignment 1: This is a simple image recognition pgm. It reads a file of images to get trained (using whatever algorithm we use, here we use logistic regression), and then we run the pgm on test images to see how well our algorithm works.

Here's the link to the pgm assignment:

Logistic_Regression_with_a_Neural_Network_mindset_v6a.html

This project has 2 python pgm, that we need to understand.

A. lr_utils.py => this is a pgm that defines a function "load_dataset". We'll import this file in our main pgm. However, instead of writing it as a separate pgm, I copied the function defined in this file in the main python pgm.

The function load_dataset() reads 2 files: test data and training data. Below are the two h5 files that contain our training data and test data. Feel free to download the 2 files by right clicking and choosing "save link as" (If you directly click on the link below, it will open the h5 file in the browser itself, which will look garbage as it's not a text file that browser knows how to display):

train_catvnoncat.h5

test_catvnoncat.h5

1. training data: This data is used to train our algo. It has 209 training pictures with label="train_set_x". Each of the 209 2D pictures is 64x64 pixels, and each pixel has a triplet of R,G,B values.

2. testing data: This data is used to test our algo. It has 50 testing pictures with label="test_set_x". Each of the 50 2D pictures is 64x64 pixels, and each pixel has a triplet of R,G,B values.

 Below I'm writing the function "load_dataset" from lr_utils.py

import numpy as np
import h5py
    
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features. We store this data into an array of 209X64X64X3
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels. This stores the type=0 for non cat and 1 for cat corresponding to 209 pictures.It's a 1D array with 209 elements, but since it's 1D, we convert it to 2D array as shown later

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features. Similarly for test set, we have 50 pictures, array is 50X64X64X3
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels. This stores the type for these 50 pictures

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes

    print("train = ",train_dataset, "test = ",test_dataset, "classes = ",classes,classes.shape)

    print("OLD", train_set_x_orig.shape, train_set_y_orig.shape, test_set_x_orig.shape, test_set_y_orig.shape)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    print("NEW", train_set_x_orig.shape, train_set_y_orig.shape, test_set_x_orig.shape, test_set_y_orig.shape)
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

 

result:

train =  <HDF5 file "train_catvnoncat.h5" (mode r)> test =  <HDF5 file "test_catvnoncat.h5" (mode r)> classes =  [b'non-cat' b'cat'] (2,) => train_dataset, test_dataset are just pointers. classes is a 1D array with just 2 string values [non-cat cat]

OLD (209, 64, 64, 3) (209,) (50, 64, 64, 3) (50,) => The y labels are 1D array here
NEW (209, 64, 64, 3) (1, 209) (50, 64, 64, 3) (1, 50) => The y labels have been converted into 2D arrays here (the X arrays are still 4D)

 

B. test_cr1_wk2.py => This pgm calls func load_dataset() defined in lr_utils, and we define our algorithm for logistic regression here to find optimal weights, by trying out the algorithm on training data. We then apply those weights on test data to predict whether the picture has a cat or not.

Below is the whole pgm, including the function defined in lr_utils.py

test_cr1_wk2.py

Below are the functions defined in our pgm:

  • sigmoid() => defines sigmoid func for any input z
  • initialize_with_zeros() => initializes w,b arrays with 0
  • propagate() => computes total cost. Given X, w, b, this func calculates activation A (which is the sigmoid function of linear eqn w1*x1+... wn*xn +b) and then computes cost (which is the log function of A,Y). Then it computes gradients dw, db. It stores dw, db in dictionary "grads". It returns scalar "cost" and dictionary "grads"
  • optimize() => This function iterates thru the cost function to find optimal values of w,b that give the lowest cost. It forms a "for" loop for a predetermined number of iterations. Within each loop, it calls function propagate() with given values of X,w,b. In the beginning, w and b are 0. propagate() returns new dw,db. Then it updates w,b with new values based on dw, db, and the learning rate chosen. Then it starts the next iteration. In the next iteration, it feeds the newly computed values of w,b into propagate() to get even newer dw, db, and updates w,b. It keeps on repeating this process for "num_iterations", until it gets to w,b which hopefully give a lot lower cost than what we started with.
  • predict() => Given input picture array X, it predicts Y (i.e whether pic is cat or not). It uses w,b calculated using optimize function. We can provide a set of "n" pictures here in single array X (we don't need to provide each pic individually as an array). This is done for efficiency purpose, as Prof Andrew explains multiple times in his courses.
  • model() => This is the main func that will be called in our pgm. We provide both training and test pictures as 2 big arrays as i/p to this func. It calls above functions as shown below:
    • calls func initialize_with_zeros() to init w,b,
    • then calls optimize() to optimize w,b to give lowest cost across the training set.
    • It then calls predict() to predict on any picture. predict is called twice for both training set and test set to predict cat vs non cat.
    • Accuracy is then reported for all pictures on what they actually were vs what our pgm predicted.
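
The functions listed above fit together roughly as in the following sketch. It is a condensed illustration, not the assignment's code: the toy data (one feature, 4 samples), the learning rate and the iteration count are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initialize_with_zeros(dim):
    return np.zeros((dim, 1)), 0.0

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                      # activations, shape (1, m)
    cost = float(-np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m)
    grads = {"dw": X @ (A - Y).T / m, "db": float(np.sum(A - Y) / m)}
    return grads, cost

def optimize(w, b, X, Y, num_iterations, learning_rate):
    costs = []
    for _ in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]       # gradient descent step
        b = b - learning_rate * grads["db"]
        costs.append(cost)
    return w, b, costs

def predict(w, b, X):
    # All samples in X are handled in one vectorized call
    return (sigmoid(w.T @ X + b) > 0.5).astype(int)

# Toy data: one feature per sample, 4 samples
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0, 0, 1, 1]])

w, b = initialize_with_zeros(X.shape[0])
w, b, costs = optimize(w, b, X, Y, num_iterations=1000, learning_rate=0.1)
print(predict(w, b, X))  # [[0 0 1 1]]
```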

Below is the explanation of main code (after we have defined our functions as above):

  1. We load the dataset X,Y for m pictures stored in h5 files.
  2. Then we enter in a loop, where we can repeat running this program as many times as we want for whatever reason. NOT really needed.
  3. Inside the loop, we flatten and normalize array X that we read from the dataset in the h5 file. We flatten the array of R,G,B pixels for each picture into shape (nx*nx*3,1). This flattening is done since our weight array is also flattened. We want one weight for each pixel value, so both weight and pixel value have to be 1D arrays, so that we can just multiply them directly as w1*x1+w2*x2+...+wn*xn. In our implementation of this in numpy, we make them 2D arrays, but they still have only 1 row or col filled (i.e they behave like 1D).
  4. Now we run function model() on array X (which already has m training pics in it), and find optimal w,b by running it on the training set. Function model() then runs predict() and reports prediction accuracy for both training and test set.
  5. Then we have a choice of trying various learning rates, and see the effect on minimal cost achieved by our pgm. Learning rates matter a lot, as we see by trying small/large rates.
  6. Then finally we have a choice of trying 10 diff random images (these images are in all_data dir), which are predicted by calling predict(). Prediction value for each image is reported. We see that accuracy is bad (about 50%). Here we used Image module from PIL library. I couldn't get "imread" from matplotlib to work.
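
The flatten and normalize step can be sketched as below. Random integers stand in for the pictures read from the h5 file, and dividing by 255 assumes 8-bit pixel values, which is an assumption of this example.

```python
import numpy as np

# Stand-in for train_set_x_orig: m pictures of 64x64 RGB pixels with values 0..255
rng = np.random.default_rng(0)
m = 5
images = rng.integers(0, 256, size=(m, 64, 64, 3))

flat = images.reshape(m, -1).T   # shape (64*64*3, m): one column per picture
X = flat / 255.0                 # scale pixel values into [0, 1]

w = np.zeros((X.shape[0], 1))    # one weight per pixel value
b = 0.0
Z = w.T @ X + b                  # w1*x1 + ... + wn*xn + b for every picture at once
print(X.shape, Z.shape)          # (12288, 5) (1, 5)
```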

Summary:

By finishing this exercise, we learnt how to do logistic regression to figure out optimal weight for each pixel of a picture so that it can predict a cat vs non cat picture.

 

Intro to Deep Learning: Course 1 - Week 1

This is very introductory material.

A Neural Network (NN) is just taking a dataset and fitting it with an eqn, i.e given input features X1, X2, ... Xn, and an output Y, we try to fit the data with a complex eqn Y = F(X1,X2,...,Xn). Once we find this best fit eqn F, we use it to predict Y given X1,X2,...Xn.

Process of getting this eqn is called network training. The term neural network came into being, since this complex eqn that we get resembles a chain of neurons passing information from one to the next, until we get to the output stage. From statistics, we know how to find best fit, but those eqn are flat (i.e Y=A*X+b*X^2+C*X^3...,). However, they never worked well on fitting new data, but these neural network based fitting eqn work well on new data too. They are very good with unstructured data (i.e identifying cat from a picture), while conventional fitting algorithms were good with only structured data (i.e predicting price of a house based on age, size, location, etc).

Diff kind of NN:

1. Standard NN

2. Convolutional NN

3. Recurrent NN

Deep Learning (DL): NN are called deep when they have a lot of layers. The reason DL is getting so popular is that it works amazingly well. It works so well because deep neural networks keep improving their prediction accuracy with more and more data, while earlier methodologies saturated and their prediction accuracy didn't improve even if they were loaded with more data.

 DL is very compute intensive since it needs to run thru large number of layers on lots of data.