GDP = Gross Domestic Product. Now you know it !!

We hear this term so often in everyday life, yet I knew nothing about it. So I started researching it to find out what it is, and whether it has any relevance at all.

Wages and salaries: If we add up everyone's wages and salaries in a given year, that gives us some idea of how much more money people are making compared to the previous year. However, even if a person makes more money this year than last year, it doesn't mean much unless the bigger paycheck lets him buy more things.

Let's say with $100 in wages last year, a person was able to buy 20 kg of rice (at $5/kg). This year, let's say he made $110, but the price of rice rose to $5.50/kg, so he could still buy only 20 kg. He didn't really make any more money, since his wages buy the same amount of rice as last year and nothing more.
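The rice example can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are made up for illustration.

```python
# Figures from the rice example; variable names are illustrative.
wage_last_year, price_last_year = 100.0, 5.00   # dollars, dollars per kg
wage_this_year, price_this_year = 110.0, 5.50

# Purchasing power: how many kg of rice the wage buys in each year.
rice_last_year = wage_last_year / price_last_year
rice_this_year = wage_this_year / price_this_year

nominal_growth = wage_this_year / wage_last_year - 1   # growth in dollars
real_growth = rice_this_year / rice_last_year - 1      # growth in kg of rice

print(rice_last_year, rice_this_year)
print(f"nominal growth: {nominal_growth:.1%}, real growth: {real_growth:.1%}")
```

The wage went up 10% in dollars, but the amount of rice it buys did not change, so the growth in purchasing power is zero.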

Historical GDP for each country can be found here: https://data.oecd.org/gdp/gross-domestic-product-gdp.htm

World GDP is about $80T: USA=$20T, China=$12T, Japan=$5T, Germany=$4T. Next are India, UK, France and Brazil. Just the top 15 countries, with 50% of world population, account for 75% of world GDP. World GDP grows at a rate of about 3% per year in real terms; in nominal dollars it has grown roughly 50 fold, from $1.4T in 1960. No matter which part of the world you live in, you need to spend about $1K/year on food and basic supplies to survive. So, 8B people would imply a minimum of $8T in GDP. Of course, the top 10% of the world spend more than that on a phone every year, so the rest of the GDP comes from those rich people. About 2.5B people in the world are very poor, do not get enough to eat and are malnourished. 70% of world population lives on less than $10/day. Only 7% or 500M people live on > $50/day. Most of these rich people are in USA, Canada, UK, Australia and western European countries.

Usually countries with high population also have large GDP. Since GDP is closely tied to increasing population (more people, more consumption, more GDP), countries with fast increase in population will have higher GDP growth every year.

USA GDP:

USA GDP data is more reliable than world GDP data. Since we'll be looking at US economy data in more detail, it's better to look at USA GDP. I'll mostly be talking about nominal GDP and not real GDP.

Nominal GDP = GDP as in current US dollars

Real GDP = Nominal GDP adjusted for inflation (for this we pick some particular year as a baseline, and express every year's GDP in that year's prices, so that different years can be compared).
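In practice the inflation adjustment is usually done by dividing nominal GDP by a price index, rather than by a literal subtraction. A minimal sketch, assuming a price index scaled so that the base year equals 100. All the numbers below are made up for illustration and are NOT actual US data.

```python
# Hypothetical figures, for illustration only (not actual US data).
nominal_gdp = {2020: 1000.0, 2021: 1100.0}   # GDP in current dollars
price_index = {2020: 100.0, 2021: 105.0}     # price level, base year 2020 = 100

# Real GDP: nominal GDP expressed in base-year prices.
real_gdp = {year: nominal_gdp[year] / (price_index[year] / 100.0)
            for year in nominal_gdp}

nominal_growth = nominal_gdp[2021] / nominal_gdp[2020] - 1
real_growth = real_gdp[2021] / real_gdp[2020] - 1

print(real_gdp)
print(f"nominal growth: {nominal_growth:.2%}, real growth: {real_growth:.2%}")
```

With 10% nominal growth and 5% inflation, real growth comes out a little under 5%, because the adjustment is a division and not a subtraction.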

US nominal GDP: https://fred.stlouisfed.org/series/NGDPNSAXDCUSQ


NumPy: Numerical Python. A very popular Python library for working with arrays. Python's native lists can serve as arrays, but they are slow for numerical work because every element is a separate Python object. NumPy stores elements of one type together in a block of memory and runs its operations in compiled code, so it is much faster. It also has a lot of functions for working with arrays. It is the fundamental package for scientific computing with Python, and is used heavily in ML/AI, so we need to have it installed. All exercises in AI use NumPy.

Official numpy website with good intro material is: https://numpy.org/doc/stable/

A good tutorial is here: https://www.geeksforgeeks.org/python-numpy

Installation:

CentOS: We install it using pip.

 $ sudo python3.6 -m pip install numpy => installs numpy on any Linux OS. We can also run "sudo python3 -m pip install numpy",

Arrays:

Basics of Array: The number of square brackets [ ... ] at the beginning or end determines the dimension of the array. So, [ ... ] is 1 dimensional, while [ [ ... ] ] is 2 dimensional and so on, as you will see below.

1 Dimensional array is an array which needs only 1 index to locate any element. ex: arr_1D = [ 1 2 3 4 5 ] => This is a 1D array with 5 elements. arr_1D[0]=1, arr_1D[1]=2, ...

2 Dimensional array is an array which has 2 indices to find out any element. So, we have 2 square brackets here.

ex: arr_2D = [ [1 2 3 ]  [7 8 9]  [4 5 6]  [2 4 6] ] => Here the outer array has 4 elements (similar to a 1D array), but each element of this outer array is itself an array. So, if we print an element of the outer array, it prints a whole inner array. ex: arr_2D[0] = [1 2 3], arr_2D[1] = [7 8 9], and so on. To get a final value stored in an inner array, we have to provide that index too, i.e arr_2D[1][2] = 9 => here arr_2D[1] points to array [7 8 9], and then we index into it. So, if var=arr_2D[1] = [7 8 9], then var[0]=7, var[1]=8, var[2]=9. Since var is arr_2D[1], arr_2D[1][2] picks the 2nd inner array and the 3rd entry in it. The 1st index runs from 0 to 3 and the 2nd index runs from 0 to 2.

Sometimes writing 2D array in other way is more visual. Writing above array in row/col format, we now see that there are 4 rows and 3 cols. So, it's 4X3 matrix array, i.e outermost has 4 elements and each of that contains 3 elements.

[ [ 1 2 3 ] 

  [7 8 9] 

  [4 5 6] 

  [2 4 6] ]

ex: arr_2D = [ [1] [2] [3] ] => Each element is 1D array. So, it's a 3X1 matrix, i.e outermost has 3 elements and each of that contains 1 element. So, shape is 3X1, and dimension is 2.

[ [1]

 [2]

 [3] ]

ex: arr_2D = [ [1 2 3] ] =>Each element is 1D array with 1X3 matrix. So, shape is 1X3, and dimension is 2.

3 Dimensional array is an array which has 3 indices to find out any element.

ex: arr_3D = [ [ [1 2 ] [3 4] ] [ [5 6] [7 8] ]  [ [9 0] [1 4] ] ]. Here the outer array has 3 elements, all 3 of which are 2D arrays. Each 2D array is 2X2. So, the 1st index runs from 0 to 2, and the 2nd and 3rd indices run from 0 to 1. It's a 3X2X2 array, i.e the outermost has 3 elements, each of which contains 2 elements, and each of those 2 elements finally contains 2 values. The innermost brackets give the last number in the shape; moving outward gives the earlier numbers.

[ [ [1 2 ] [3 4] ]

 [ [5 6] [7 8] ]

 [ [9 0] [1 4] ] ]
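As a quick check of the 2D and 3D examples above, here is a small sketch that builds them as NumPy arrays (np.array and the import are explained in the sections below) and reads back their dimension, shape and a few elements.

```python
import numpy as np

# The 2D and 3D examples from the text, written as NumPy arrays.
arr_2d = np.array([[1, 2, 3], [7, 8, 9], [4, 5, 6], [2, 4, 6]])
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 0], [1, 4]]])

print(arr_2d.ndim, arr_2d.shape)   # 2 (4, 3)
print(arr_2d[1])                   # [7 8 9]
print(arr_2d[1][2])                # 9
print(arr_3d.ndim, arr_3d.shape)   # 3 (3, 2, 2)
print(arr_3d[2][0][1])             # 0
```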

Usage of Numpy:

We saw array module in python section to create arrays. However, it's highly preferred to use numpy module to work on arrays, instead of using array module, that's included in python by default.

Import numpy module:

First we need to import numpy module in our python script in order to use it:

ex: import numpy => imports numpy. Now, we can call numpy functions as numpy.array, etc.

NumPy is usually imported under the np alias, so that we can use the short name np instead of the longer name numpy.

ex: import numpy as np

Creating numpy array:

After importing numpy module, we can use array( ) function in numpy module to create numpy array object. The class of this array object is ndarray (it will be seen as "numpy.ndarray" object in pgm). See in "python: Object Oriented" section on how classes are created.

array() function: Input to the array function can be Python objects such as lists, tuples, etc. See the Python section for the definitions of list and tuple. These are converted into a numpy array object of class "ndarray" by the array() func. An array in NumPy is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. The type of the array is figured out automatically from the input contents (i.e if the list has int type, then the array created has int type). If the contents are mixed, NumPy picks one common type that can hold all of them (for example, ints mixed with floats become floats, and numbers mixed with strings become strings). We can also explicitly define a type for the ndarray object, as we'll see later.

import numpy as np
arr = np.array( [1, 2, 3, 4, 5] ) #here input is a python list, with all integers. An ndarray object is created with all integer elements, i.e arr = [1 2 3 4 5]

arr = np.array((1, 2, 3, 4, 5)) # here tuple is provided as an input to array() function.

Print: can be used to print elements of array
print(arr) => prints array elements [1 2 3 4 5]. "arr" is ndarray object. It has no commas when it's printed. We don't know what form is ndarray object stored internally, but "print" func prints it in this form. This is 1D array. arr[0] = 1, arr[4]=5, and so on

NOTE: In above ex, the input list, tuple etc, has elements which are separated by a comma (as per the syntax of list, tuple, etc), and they get printed the same way with commas. However, the output of array() func is ndarray object, which is printed with no commas. i.e arr = [1 2 3 4 5]. However, [1 2 3 4 5] is not ndarray object (it's just the printed o/p), arr is the ndarray object. If we try to apply any numpy func on [1 .. 5], we'll get an error: i.e. arr=np.array([1 2 3 4 5]) gives syntax error.

We can't create numpy array by just assigning a python "list" to a var.

arr= [ [1,2], [3,4],[5,6] ] => This assigns the "list" to var "arr". Since it's not a numpy array (we didn't use the numpy.array() function on it), we might not expect any numpy function to work on this list. However, a lot of functions do work, i.e np.squeeze(arr) will work even though arr is a list (and NOT an ndarray object). This is because most numpy functions accept any "array_like" input (lists, tuples, etc) and convert it to an ndarray internally before operating on it. ndarray attributes and methods such as arr.shape are not available on the list itself, though. Best thing to do is to convert the list/tuple into a numpy "ndarray" object using the np.array() func, and then work on it. Later, we'll see many other functions to create numpy arrays (besides the array() func).
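A small sketch of the list vs ndarray difference: NumPy functions accept a plain list and hand back an ndarray, while the list itself has no ndarray attributes. The variable names are illustrative.

```python
import numpy as np

nested_list = [[1, 2], [3, 4], [5, 6]]   # plain Python list, not an ndarray

squeezed = np.squeeze(nested_list)       # NumPy function accepts the list
as_array = np.asarray(nested_list)       # explicit conversion to ndarray

print(type(nested_list).__name__)        # list
print(type(squeezed).__name__)           # ndarray
print(hasattr(nested_list, "shape"))     # False: lists have no shape attribute
print(as_array.shape)                    # (3, 2)
```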

Data types (dtype): NumPy has the basic Python types plus a few more. They are written as int32, float64, etc, or in short form as a single char followed by the number of bytes, i.e int8 is "i1", 32 bit float is "f4", boolean is "?" (note that "b" means a signed byte, not boolean), a byte string of length 2 is "S2", etc. Instead of S type, we normally use U (unicode) type strings in Python 3. See details for unicode in the regular python section.

We don't have a separate type for each element of an ndarray object, as an ndarray can have elements of only one type. As we saw above, the numpy array object takes its type from the contents of the list/tuple. This type becomes the data type of the whole array. It's available as the attribute "dtype" of the array object.

print(arr.dtype) => attribute "dtype" gives the data type of an array. Here it prints "int64", since the data is integer stored in 64 bits (on some platforms the default integer can be int32 instead).

When declaring array using array() func, we may specify dtype explicitly. Then those array contents are converted to that data type and stored (if possible)

ex: arr=np.array(['2', '72', 'a'], dtype='int64') => Here it errors out since the 3rd entry 'a' can't be converted to int type. '2' and '72' are OK to be converted even though they are strings. If 'a' is replaced by 823, then arr would be [2 72 823], i.e an array with int64 elements and NOT strings.

arr=np.array(['23', 'cde', 71],dtype='S2') => Here we are creating an array of 3 elements with dtype as string of 2 byte. So, numpy converts 71 (which is without quotes, and so an int) to a string too. However, 'cde' needs 3 bytes, but since we are forcing it to 2 bytes, 'e' is dropped and only 'cd' is stored

print(arr.dtype, arr) => prints => |S2 [b'23' b'cd' b'71'] => S2 means the dtype is a byte string of length 2. b'23' means string "23" is stored as bytes. Here the array got printed with these b' prefixes, which we may not want. To get a regular string, we can decode the bytes as utf-8 using the decode method, i.e arr[1].decode("utf-8") returns the unicode string "cd"

ex: arr=np.array(['2', '32', 7],dtype='i4'); print(arr.dtype, arr) => returns => int32 [ 2 32  7] as dtype is int32 and array elements are converted to 4 byte integer, so string '2' and '32' become integer 2 and 32.

array with multiple data types: ex: np.array( ['as', 2, "me", 4.457] ) => here the 4 elements of the list are of diff data types. This is valid: since an array can hold only one type, NumPy converts every element to a unicode string, so the dtype is U (Unicode) with a length large enough to hold any of the converted values (typically reported as <U32 when a float is present). So, all elements of this array are strings irrespective of whether they started as string, int or float. Note that arithmetic no longer behaves numerically: arr[1] + arr[3] joins the strings '2' and '4.457' into '24.457' instead of adding numbers.
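The dtype rules above can be tried out directly. This is a minimal sketch; the exact default integer type and the exact unicode width depend on the platform and NumPy version, so the code only prints them.

```python
import numpy as np

ints = np.array([1, 2, 3])                           # dtype inferred from the list
floats = np.array([1, 2, 3], dtype='f4')             # forced to 32 bit float
byte_strings = np.array(['23', 'cde'], dtype='S2')   # 'cde' is cut to 2 bytes
mixed = np.array(['as', 2, 'me', 4.457])             # everything becomes a string

print(ints.dtype)                        # platform default integer, e.g. int64
print(floats.dtype)                      # float32
print(byte_strings)                      # [b'23' b'cd']
print(byte_strings[1].decode('utf-8'))   # cd
print(mixed.dtype, mixed)                # a unicode (U) dtype, all strings
print(np.dtype('?'), np.dtype('b'))      # bool int8
```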

shape: A tuple of integers giving the size of the array along each dimension is known as shape of the array, i.e the shape of an array is the number of elements in each dimension.

print(arr.shape) => returns a tuple whose entries give the number of elements along each dimension. For a 1D array of 5 elements it returns (5,). For a 2D array such as np.array([[1, 2, 3], [4, 5, 6]]) it returns (2,3), meaning the array is 2 dimensional with 2 inner arrays of 3 elements each, so it's a 2X3 array.

Since shape is a tuple, we can access each element of this tuple via index, i.e for the 2X3 array, shape[0] returns 2 (num of inner arrays), while shape[1] returns 3 (elements in each inner array)

Dimension (ndim): This shows dimension of an array as 1D, 2D and so on. In Numpy, number of dimensions of the array is also called rank of the array (i.e 2D array has rank of 2).

print(arr.ndim) => "ndim" attr returns the number of dimension of an array. since arr has 1 dimension, this returns 1

0D array => a_0D = np.array(2) => This is an array holding just one value with no brackets around it. So, it's not really an array, but a scalar. It shows ndim=0. It shows an empty tuple for shape, i.e a_0D.shape = ( )

1D array => b_1D = np.array( [2] ) => By adding square brackets, we convert the 0D array into a 1D array. It shows ndim=1, and b_1D.shape = (1,). We might have expected it to show (1,1) for 1 row and 1 col, but a 1D array has only one axis: it has no separate notion of rows and columns (if it did, it would be a 2D array). So the shape tuple has a single entry, the number of elements. This is called a rank 1 array. Because it's neither a row vector nor a col vector (explained below), operations such as transpose or dot can give surprising results in matrix-style AI computations. So a common recommendation is to avoid rank 1 arrays there, and use the reshape function (explained later) to turn them into a 2D row or column vector.

ex: b_1D = np.array( [2, 3, 5] ) => This shows shape as (3,) since it has 3 elements.

2D array => c_2D = np.array( [[1, 2, 3], [4, 5, 6]] ) => This is 2D array with 1st row [1 2 3] and 2nd row [4 5 6]. c_2D.ndim=2, c_2D.shape = (2,3) since there are 2 rows and 3 columns. NOTE: there are comma in between elements and in between arrays.

print( c_2D[0]) => prints 1st element of array c_2D which is "[1 2 3]", c_2D[1]=[4 5 6], c_2D[1,2] = 6

arr_2D = np.array( [[1, 2, 3]] ) => This is a 2D array which has only 1 row, which is a 1D array with 3 elements. So, arr_2D.ndim=2, arr_2D.shape=(1,3). NOTE: this 2D array has 1 row and 3 columns, unlike the 1D array of shape (3,), which has just a single axis of 3 elements.

row vector: These are 2D array of shape (1,n) i.e they have a single row. ex: [ [ 1 2 3 ] ]

column vector: These are 2D array of shape (m,1) i.e they have a single col. ex: [ [1] [2] [3] ]


3D array => d_3D = np.array( [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]] ) => 3D array can be seen as each row itself being 2D array. d_3D.ndim=3, d_3D.shape = (2,2,3) since it has 2 outermost entries, then each of these 2 entries has 2 array, and each of these 2 have 3 elements each.
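The ndim and shape values quoted above can be printed for each case. A short sketch; the row and column vectors are built with reshape, which is covered later in these notes.

```python
import numpy as np

a_0d = np.array(2)                       # scalar-like, no brackets
b_1d = np.array([2, 3, 5])               # rank 1 array
row = b_1d.reshape(1, -1)                # row vector, shape (1, 3)
col = b_1d.reshape(-1, 1)                # column vector, shape (3, 1)
c_2d = np.array([[1, 2, 3], [4, 5, 6]])
d_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for name, arr in [("a_0d", a_0d), ("b_1d", b_1d), ("row", row),
                  ("col", col), ("c_2d", c_2d), ("d_3d", d_3d)]:
    print(name, "ndim =", arr.ndim, "shape =", arr.shape)

print(c_2d[1, 2])                        # 6
```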

Axis of Numpy array: In numpy, each dimension of an array is called an axis, i.e a 3D array is an array with 3 axes. The 1st axis or axis=0 is the outermost array. The 2nd axis or axis=1 is the next inner array and so on.

For an N dim array of shape (N1, N2, ... Nn) => There are N1 data-points of shape (N2, N3 .. Nn) along axis-0. Applying a function across axis-0 means you are performing the computation between these N1 data-points. Each data-point along axis-0 has N2 data-points of shape (N3, N4 .. Nn). These N2 data-points are considered along axis-1. Applying a function across axis-1 means you are performing the computation between these N2 data-points. N3 data-points are considered along axis-2, and so on. For reducing functions such as sum, the axis that was operated on disappears from the result, so the dimension of the array goes down by 1 for each such axis.

As an ex: For a 2D array, Let's try computing across the 2 axis. ex: data = numpy.array([[1, 2, 3], [4, 5, 6]]);

1. axis=0: adding across the 1st axis or axis=0 means adding the rows together, i.e summing vertically down each column.

ex: result = data.sum(axis=0); print(result) => prints [1+4, 2+5, 3+6] = [5 7 9] => This is a 1D array now instead of a 2D array.

2. axis=1: adding across the 2nd axis or axis=1 means adding the columns together, i.e summing horizontally across each row.

ex: result = data.sum(axis=1); print(result) => prints [1+2+3, 4+5+6] = [6 15] => this is again a 1D array
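To see which axis disappears, it helps to look at the shapes of the results. A small sketch using the same 2D data plus a made-up 3D array of shape (2, 3, 4).

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

col_sums = data.sum(axis=0)              # rows collapse: one sum per column
row_sums = data.sum(axis=1)              # columns collapse: one sum per row
print(col_sums)                          # [5 7 9]
print(row_sums)                          # [ 6 15]
print(data.sum())                        # 21, all axes summed

# For a 3D array the summed axis is removed from the shape.
cube = np.arange(24).reshape(2, 3, 4)
print(cube.sum(axis=0).shape)            # (3, 4)
print(cube.sum(axis=1).shape)            # (2, 4)
print(cube.sum(axis=2).shape)            # (2, 3)

# keepdims=True keeps the summed axis with length 1.
print(data.sum(axis=1, keepdims=True).shape)   # (2, 1)
```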

More ways to generating a new array: there are many functions in numpy to generate a new array with any given shape, and inititalize it with values.

1. arange: arange function returns an ndarray object containing evenly spaced values within a given range (i.e arange = array range). The array is 1D and its size is the number of values that fit in that range.

Syntax: numpy.arange(start, stop, step, dtype) => "stop" is required and is excluded from the result (with integer arguments and step=1, the final element is stop-1), all others are optional. By default start=0, step=1 and dtype is inferred from the arguments, so if stop is a float, then the type is float too.

x = np.arange(5) => returns [0 1 2 3 4]. Here range is defined as 0 to 4 with step of 1. data type=integer since 5 is integer.

x = np.arange(10,20,2) => returns [10 12 14 16 18]. It's 1D array with 5 elements in it.

2. linspace: Similar to arange. It returns ndarray object with evenly spaced numbers over a specified interval.

Syntax: numpy.linspace(start, stop, num) => start, stop are required. num=50 by default. Unlike arange, stop is included in the result by default.

np.linspace(2.0, 3.0, num=5) => returns array([2.  , 2.25, 2.5 , 2.75, 3.  ]) => Here 5 samples are included b/w 2 and 3 with equally spaced values.

3. zeros/ones: These are 2 other functions that init an array with zeros or ones.

Syntax: numpy.zeros(shape, dtype) => Returns an array containing all 0 with given shape. dtype is optional and is float by default.

x = np.zeros(2) => returns [0. 0.]. 2 implies 1 dim array with shape of (2,)

y = np.ones((3,2), dtype=int) => returns a 2 dim array of shape (3,2) filled with integer 1, so each entry is printed as 1 and NOT as the float 1.

[[1 1]
 [1 1]
 [1 1]]
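The array creation functions above (arange, linspace, zeros, ones) can be compared side by side. A minimal sketch using the same arguments as the examples in the text, plus one float-step arange.

```python
import numpy as np

evens = np.arange(10, 20, 2)             # stop (20) is excluded
quarters = np.arange(0, 1, 0.25)         # float step gives a float array
samples = np.linspace(2.0, 3.0, num=5)   # stop (3.0) is included
zeros = np.zeros(2)                      # float by default
ones = np.ones((3, 2), dtype=int)        # integer ones, shape (3, 2)

print(evens)                             # [10 12 14 16 18]
print(quarters)
print(samples)
print(zeros.dtype, zeros.shape)          # float64 (2,)
print(ones.dtype.kind, ones.shape)       # i (3, 2)
```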

4. random: There is a random module in NumPy to generate random data. It has lots of methods which are very useful in AI and ML for generating random dataset.

from numpy import random => this line is optional. NumPy has its own random module, separate from Python's inbuilt random module. When we import numpy as np, the NumPy one is already available as np.random, so we could write np.random everywhere. If we instead write "import random", the name random refers to Python's inbuilt module. Adding "from numpy import random" makes the plain name random refer to NumPy's module, which is more convenient than typing np.random (but it hides Python's inbuilt random module under that name, so don't mix the two in one script).

Seed: All random numbers are generated from a given seed. The seed is the i/p to the pseudo random num generator, which then produces a fixed sequence of random numbers for that seed. Different seeds cause numpy to generate diff sequences of random numbers.

np.random.seed(1) => this will generate pseudo random numbers for all random functions using seed=1. We could use any integer number as seed. We don't really need to provide this seed at all, since by default, numpy chooses a random seed and generates random num corresponding to that seed. But then our seq of random numbers generated will be diff for each run of pgm, which will be difficult to debug or reproduce. so, we usually assign a seed, when coding our program the 1st time. Once we have debugged the pgm with couple of seeds, we can get rid of this seed function.
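The effect of the seed is easy to demonstrate: the same seed reproduces the same numbers, while a different seed gives a different sequence. A short sketch using np.random.seed as described above.

```python
import numpy as np

np.random.seed(1)
first = np.random.rand(3)

np.random.seed(1)                        # same seed again
second = np.random.rand(3)

np.random.seed(2)                        # different seed
third = np.random.rand(3)

print(np.array_equal(first, second))     # True: same seed, same numbers
print(np.array_equal(first, third))      # False: different seed
```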

randint():

ex: x = random.randint(100) => randint method says to generate integer random number, and arg=100 says the range is from 0 to 100-1 (i.e 0 to 99). Note, we could have written np.random.randint(100) too, but we don't need that since "from numpy import random" imports random into current workspace.

To generate 1D or 2D random numbers, we can specify size.

ex: random.randint(50, size=(3,5)) => generates a 2D array of size=3X5, with each element being a random int from 0 to 49

rand():

ex: random.rand(3) => the "rand()" method returns random floats b/w 0 and 1. The number inside it is the size of the array, i.e 3 means it's a 1D array of size 3. Note that rand takes the dimensions directly as positional args (rand(3) or rand(3, 5)), and NOT through a size=... argument the way randint does.

ex: x = random.rand(3, 5) => returns 2D array with matrix=3X5.

[[0.14252791 0.44691071 0.59274288 0.73873487 0.22082345]

[0.00484242 0.36294206 0.88507594 0.56948479 0.15075563]

[0.69195833 0.75111379 0.92780785 0.57986471 0.6203633 ]]

randn(): returns samples from the standard normal distribution, i.e the gaussian distribution with mean 0 and spread (sigma) 1. So here instead of having equal probability for different numbers, the probability is highest for numbers close to the mean, and keeps falling as you move away from the mean. About 99.7% of the values lie within 3 sigma of the mean. We provide the shape of the array as i/p.

ex: x=random.randn(3,4,5) => returns 3D array of shape=(3,4,5)with random float in it which have mean=0, sigma=1.

To get values corresponding to another mean and sigma, multiply by sigma and add the mean:

ex: Two-by-four array of samples from N(3, 6.25): Here mean=3, variance=6.25, so sigma=√6.25 = 2.5

3 + 2.5 * np.random.randn(2, 4) => about 68% of numbers will be within 1 sigma of the mean, i.e b/w 3-2.5 and 3+2.5, or 0.5 to 5.5
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random

ex: np.random.randn() => returns single random float "2.192...".  Here only float is returned since no array shape specified. 
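With a large sample, the mean, sigma and the "within 1 sigma" fraction can be checked empirically. A rough sketch; the sample size of 100,000 and the seed are arbitrary choices made here for illustration.

```python
import numpy as np

np.random.seed(0)                        # arbitrary seed, for repeatability
mean, sigma = 3.0, 2.5

# Scale and shift standard normal samples to get N(3, 2.5**2).
samples = mean + sigma * np.random.randn(100_000)

within_one_sigma = np.mean(np.abs(samples - mean) < sigma)
within_three_sigma = np.mean(np.abs(samples - mean) < 3 * sigma)

print(samples.mean(), samples.std())     # close to 3 and 2.5
print(within_one_sigma)                  # close to 0.68
print(within_three_sigma)                # close to 0.997
```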


Operations on Array:

array slicing => array[start:stop:step] => the slice runs from index start up to, but NOT including, index stop. If omitted, start=0, step=1, stop=length of array (so the slice runs through the last element). For nD array, we can slice each axis of the array.

IMP: The stop index is NOT included, i.e the last index included is stop-1. So, arr[2:4] will have arr[2], arr[3], but not arr[4]. This is the same behaviour that we saw with lists/tuples/arrays in python. One other thing to note is that numpy multi dimensional arrays allow complex slicing that is not possible on lists/tuples/arrays in python. This is where numpy turns out to be much more powerful in terms of operations on arrays. Also, in numpy, we access elements of an array via arr[2,3,0], while the same element in a nested list/tuple is accessed via arr[2][3][0] (i.e indices are separated by commas in numpy). The python list syntax works for numpy too, i.e arr[2][3][0] is equally valid in numpy, but the comma form is preferred.

np_arr = np.array([[300, 200,100,700,212], [600, 500, 400,900,516], [21, 23,45,67,45]])

ex: np_arr[0] => returns the entry at index=0, which is itself an array, so it returns all of that array => [300 200 100 700 212]

ex: np_arr[1,2] => returns 400 (since it's index=1 for the outer array and index=2 for the inner array). So, for multi dim arrays, we specify indices separated by commas. np_arr[1][2] also works, though as explained above, that's not the preferred way.

ex: np_arr[0:2:1] = Here we provided the outermost array index (since there are no commas for inner indices). The range is from 0 to 1 with increment of 1. So, this returns as below:

[[300 200 100 700 212]
 [600 500 400 900 516]]

ex: d_3D[1,0,1:3] = [8 9] => here d_3D is the 3D array defined earlier. We take index=1 for axis=0, which is [[7 8 9] [10 11 12]], then index=0 for axis=1, which is [7 8 9], then the slice 1:3 of this final one, which is [8 9]. Each integer index removes one dimension, so the result is a 1D array. (To keep all 3 dimensions, use slices everywhere: d_3D[1:2,0:1,1:3] = [ [ [8 9] ] ].)

x=np_arr[0:2,3:1:-1]  => Here we provide an index range for both dimensions of the array. axis=0 goes from 0 to 1 (since range 0:2 implies 0 and 1), while axis=1 goes from index 3 to 2 in the reverse direction. (If we do 1:3:-1, this would return an empty array, since index 3 can never be reached from index 1 by going in reverse. This is important to remember.) NOTE: array entries are now reversed, i.e the 1st row of x is [700 100] instead of [100 700] as in the original array. Array "x" still remains a 2D array.

x =

[[700 100]
 [900 400]]

ex: arr = [1 2 3 4 5]; arr[: : 2] = [1 3 5] => takes every other element (since start and stop are not specified, start=0 and stop=length of array).

ex: arr = np.array([[[1, 2], [3, 4]], [[5, 6],[2,3]]])

ex: print(arr[:]) => prints all elements of array since no start/end specified. All 3D elements printed. Same as what would have been printed with print(arr)

ex: print(arr[0,:]) => This prints all elements at index=0 for axis=0, i.e [[1 2] [3 4]]. The same o/p is printed with arr[0][:] (i.e list/tuple format in python). NOTE: the result is a 2D array now.

print(arr[:,0]) => prints [[1 2] [5 6]]. This says that for axis=0, slice everything since no range specified, so the whole array is returned. Then 0 says that for axis=1 return index=0. Array for axis=1 is [ [1 2] [3 4] ] and [ [5 6] [2 3] ]. index=0 is [1 2] from 1st one and [5 6] from 2nd one.
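The slicing examples above can be run as one script. A minimal sketch using the same np_arr values; the empty-slice case shows what happens when the step direction does not match the start and stop.

```python
import numpy as np

np_arr = np.array([[300, 200, 100, 700, 212],
                   [600, 500, 400, 900, 516],
                   [21, 23, 45, 67, 45]])

print(np_arr[0])                 # [300 200 100 700 212]
print(np_arr[1, 2])              # 400
print(np_arr[0:2])               # first two rows
print(np_arr[:, 0])              # first column: [300 600  21]

x = np_arr[0:2, 3:1:-1]          # rows 0..1, columns 3 then 2
print(x)                         # rows [700 100] and [900 400]

empty = np_arr[0:2, 1:3:-1]      # cannot go from 1 to 3 with a negative step
print(empty.shape)               # (2, 0)

print(np_arr[0, ::2])            # every other element of row 0: [300 100 212]
```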

reshape: Reshaping means changing the shape of an array. reshape(m,n) changes an array into m arrays with n elements each (i.e turns the array into a 2D array), provided the total number of elements is m*n, else it raises an error. Similarly, reshape(p,q,r) changes an array into a 3D array with p arrays that each contain q arrays, each with r elements.

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3) => this changes the above 1D array into 2D array with 4 arrays and each having 3 elements. So, newarr.ndim=2, newarr.shape=(4,3) since it has 4 arrays with 3 elements in each.

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


ex: n_arr = arr.reshape((1, arr.shape[0])) => Since arr is a 1D array, shape=(12,), i.e a tuple with the single entry 12. To convert it into a 2D array, we use the reshape method as shown. Since shape[0] returns 12, this becomes n_arr=arr.reshape(1,12). The result is a 2D array with 1 row of 12 elements. So arr=[1 2 .. 12] while n_arr = [ [ 1 2 ... 12] ]. NOTE: 2 square brackets in n_arr, as compared to single brackets in arr (since it's a 2D array now). n_arr.ndim=2, n_arr.shape=(1,12)

assert (n_arr.shape == (1,12)) => This asserts or checks for the condition that the shape of array n_arr is (1,12). This is helpful to catch bugs in code, since if the shape is not as expected, this will throw an AssertionError.

newarr.reshape(12) => When only 1 integer provided, then result is 1D array of that length. So, this returns [1 2 ... 12]. We can also provide -1 as the length of array to get same result.

newarr.reshape(-1) => flattens the array, i.e converts any array into 1D array. So, this returns [1 2 ... 12]. However, if we provide other integers for new shape along with -1 as last integer, then array is converted into required shape, with other values inferred.


newarr.reshape(d_3D.shape[0],-1) => Here 1st value is 2 (from above example). So tuple is (2,-1) meaning it's 2X6 (since 6 is inferred automatically. -1 implies flatten other dimension, so 6 is the only other value). result is: 

[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
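The reshape variants above, collected into one runnable sketch. The last part shows the error raised when the requested shape does not match the number of elements.

```python
import numpy as np

arr = np.arange(1, 13)                   # [1 2 ... 12], shape (12,)

newarr = arr.reshape(4, 3)               # 4 rows of 3
n_arr = arr.reshape(1, arr.shape[0])     # row vector, shape (1, 12)
flat = newarr.reshape(-1)                # back to 1D, shape (12,)
two_rows = newarr.reshape(2, -1)         # second size inferred as 6

print(newarr.shape, n_arr.shape, flat.shape, two_rows.shape)
print(two_rows)

try:
    arr.reshape(5, 3)                    # 12 elements cannot fill 5 x 3
except ValueError as err:
    print("reshape failed:", err)
```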

Other way to flatten an array is by using func ravel() or method ravel. It's same as reshape(-1).

ex: np.ravel(newarr) => converts newarr array into flattened 1D array. We could also apply method ravel on newarr as newarr.ravel()

We usually want a 2D array with one row (a row vector), instead of a 1D array. It's easier to work with the 2D array in matrix operations. NOTE: They hold the same values; the visible difference is that there are 2 square brackets in the 2D array, while only 1 square bracket in 1D.

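new_arr = arr.reshape(arr.shape[0], -1).T => Here we convert an array of shape (m,n,p,q) into an array of shape (n*p*q, m), i.e we convert a 4D array into a 2D array where the outer m entries are not mixed together, but everything inside each of them is flattened into one column. NOTE: reshaping directly to (arr.shape[1]*arr.shape[2]*arr.shape[3], arr.shape[0]) gives the same shape but scrambles the values, since reshape only re-reads the elements in their stored order; so we first flatten to (m, n*p*q) and then transpose.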
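One way to do this flattening while keeping each outer entry's values together is reshape(shape[0], -1) followed by a transpose. The sketch below uses small made-up sizes (m=2, n=3, p=4, q=5) and also shows that reshaping directly to (n*p*q, m) gives the right shape but the wrong contents.

```python
import numpy as np

m, n, p, q = 2, 3, 4, 5                  # made-up sizes for illustration
batch = np.arange(m * n * p * q).reshape(m, n, p, q)

# Flatten each of the m entries, then put them in columns.
flat = batch.reshape(batch.shape[0], -1).T
print(flat.shape)                        # (60, 2)

# Column 0 holds exactly the values of the first outer entry.
print(np.array_equal(flat[:, 0], batch[0].ravel()))    # True

# A direct reshape has the same shape but mixes the entries together.
naive = batch.reshape(n * p * q, m)
print(naive.shape)                       # (60, 2)
print(np.array_equal(naive[:, 0], batch[0].ravel()))   # False
```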

squeeze: this func removes axes of length 1 from the shape of the given array. It is used in the opposite scenario, e.g. where a 2D array with a single row is converted to a 1D array. Any axis to be squeezed must have length=1. By default, all axes of length 1 are squeezed. We can also specify which axis to squeeze.

ex: y=np.squeeze(x) => x is 3D array with shape (1,3,3) while y now becomes 2D array with (3,3), i.e axis0 is squeezed

X = [[[0 1 2] [3 4 5] [6 7 8]]]

Y = [[0 1 2] [3 4 5] [6 7 8]]

The shapes of X and Y array: (1, 3, 3) (3, 3)

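A short sketch of squeeze on the (1, 3, 3) example, plus a made-up (1, 3, 1) array to show the difference between squeezing every length-1 axis and squeezing one chosen axis.

```python
import numpy as np

x = np.arange(9).reshape(1, 3, 3)
y = np.squeeze(x)
print(x.shape, y.shape)                  # (1, 3, 3) (3, 3)

z = np.zeros((1, 3, 1))
print(np.squeeze(z).shape)               # (3,): every length-1 axis removed
print(np.squeeze(z, axis=0).shape)       # (3, 1): only axis 0 removed

try:
    np.squeeze(x, axis=1)                # axis 1 has length 3, not 1
except ValueError as err:
    print("cannot squeeze:", err)
```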
r_: This is a simple way to build up arrays quickly. Translates slice objects to concatenation along the first axis.
ex:
np.r_[np.array([1,2,3]), 0, 0, np.array([4,5,6])] => returns array([1, 2, 3, 0, 0, 4, 5, 6]) => This ex concatenates 1D array then 0, 0, then another 1D array. It concatenates along axis=0, still returns 1D array.
ex:
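np.r_['1,2,0', [1,2,3], [4,5,6]] => the string before the arrays specifies how to concatenate. Here the 1st number, 1, says to concat along axis=1 (2nd axis), the 2nd number, 2, says to bring each input up to at least 2 dimensions, and the 3rd number, 0, says which axis of the upgraded shape holds the original data. Result is array([[1, 4], [2, 5], [3, 6]])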
 
c_: Translates slice objects to concatenation along the second axis.
np.c_[np.array([[1,2,3]]), 0, 0, np.array([[4,5,6]])] => returns array([[1, 2, 3, 0, 0, 4, 5, 6]]) => This ex concatenates 2D array along axis=1 (2nd axis).
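The r_ and c_ helpers can be compared on the same inputs. A minimal sketch; the slice example np.r_[0:5] is added here to show what "translates slice objects" means.

```python
import numpy as np

left = np.array([1, 2, 3])
right = np.array([4, 5, 6])

joined = np.r_[left, 0, 0, right]        # 1D concatenation along axis 0
from_slice = np.r_[0:5]                  # a slice becomes a range of values
as_columns = np.c_[left, right]          # 1D inputs become columns
stacked = np.r_['1,2,0', [1, 2, 3], [4, 5, 6]]   # same result via the string form

print(joined)                            # [1 2 3 0 0 4 5 6]
print(from_slice)                        # [0 1 2 3 4]
print(as_columns)                        # rows [1 4], [2 5], [3 6]
print(np.array_equal(as_columns, stacked))       # True
```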


Matrix Operations:

matrix transpose: This is one of the useful functions to find the transpose of a matrix. The transpose of a 2D array is easy to see: its rows and columns are swapped, so rows become columns and columns become rows (i.e a 3X4 matrix becomes a 4X3 matrix, w/o any change to the contents). You can transpose any n Dim array too, and specify how the axes should be reordered. By default for an n Dim array, the order of the axes is reversed, i.e a 2X3X4 array becomes a 4X3X2 array.

ex: np.transpose(newarr) => changes newarr from 4X3 to 3X4 array

Instead of using function, we can also use method to transpose.

ex: y=newarr.T => Here we are reading the "T" attribute (T is short for transpose) of the newarr object. NOTE: no parentheses, since T is an attribute and not a method. Result is same as the transpose function above.
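A few transpose cases in one place. The sketch rebuilds the 4X3 array, shows the default axis reversal for a 3D array, an explicit axis order, and the fact that transposing a rank 1 array changes nothing.

```python
import numpy as np

newarr = np.arange(1, 13).reshape(4, 3)
t = newarr.T
print(newarr.shape, t.shape)             # (4, 3) (3, 4)
print(t[0])                              # first column of newarr: [ 1  4  7 10]

cube = np.zeros((2, 3, 4))
print(cube.T.shape)                      # (4, 3, 2): axes reversed
print(np.transpose(cube, (1, 0, 2)).shape)   # (3, 2, 4): explicit axis order

rank1 = np.array([1, 2, 3])
print(rank1.T.shape)                     # (3,): unchanged for a 1D array
```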

matrix dot operation: To find the dot product of 2 matrices, we use the dot function. NOTE: the dot operation is different from the multiplication operation. Mult just multiplies each element of one array with the corresponding element of the other array, while the dot operation multiplies rows of one array with columns of the other and adds up the products. You can find more details of the dot operation on matrices in the "high school maths" section. For 2-D arrays, it is equivalent to matrix multiplication. For 1-D arrays, it is the inner product of the vectors. For N-dimensional arrays, it is a sum product over the last axis of a and the second-last axis of b. The dimensions of the two matrices have to be compatible for the dot operation, else we'll get an error. Instead of using the dot function, we can write a for loop that iterates over each element of the 2 arrays and sums the products appr. However, such a for loop takes a long time to run, as every step goes through the Python interpreter and can't use parallel instructions such as SIMD (single inst multiple data). The dot function in numpy runs in optimized compiled code (typically a BLAS library) that can use these SIMD inst on the CPU, which significantly speeds up the multiplication/addition part. (NumPy itself does not run on a GPU; that needs a separate library.) Replacing explicit loops with whole-array operations like dot is called vectorization, and in AI related courses we'll hear this term all the time, where we'll be asked to vectorize our code (meaning put the data in array form and then use array functions such as dot to do the work).

a = np.array([[1,2],[3,4]]) => 2D array of 2x2
b = np.array([[11,12],[13,14]]) => 2D array of 2x2
np.dot(a,b)
This produces below 2D array of 2x2 which is calculated as follows =>
[[1*11+2*13, 1*12+2*14],[3*11+4*13, 3*12+4*14]]
[[37  40] 
 [85  92]] 
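To make the dot vs loop comparison concrete, here is a sketch with a hand-written triple loop next to np.dot. The helper name dot_with_loops is made up for this example; it is only there to show what dot computes, not how NumPy implements it.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

def dot_with_loops(x, y):
    """Matrix product of two 2D arrays using explicit Python loops."""
    rows, inner = x.shape
    inner_y, cols = y.shape
    assert inner == inner_y, "inner dimensions must match"
    out = np.zeros((rows, cols), dtype=x.dtype)
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += x[i, k] * y[k, j]
    return out

print(np.dot(a, b))                      # rows [37 40] and [85 92]
print(dot_with_loops(a, b))              # same values, computed one by one
print(a * b)                             # elementwise: rows [11 24] and [39 56]
```

For 2X2 arrays the speed difference is not noticeable; it shows up when the arrays have thousands of rows and columns.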

matrix add/sub/mult/div operations: Other operations such as add, subtract, multiply, divide, abs, log, etc can be done with numpy functions that work on whole arrays element by element, instead of using a for loop.

ex: c=np.add(a,b) => adds 2 matrices a and b. Each element of matrix a is added to the corresponding element of matrix b. Similarly for np.subtract(a,b)

ex: c=np.divide(a,b) => divides 2 matrices a and b. Each element of matrix a is divided by the corresponding element of matrix b. Similarly for np.multiply(a,b)

Other misc operations: Many other operations are defined that work on a single array.

log: ex: c=np.log(a) => computes the natural log of each element of array "a"

abs: ex: c=np.abs(a) => computes the absolute value of each element of array "a"

sum: There is another function "sum" (NOT add) which adds up the elements of an array along a given axis, e.g. down each column or across each row of a 2D array to return a 1D array.

ex: A = [ [300, 200,100], [600, 500, 400] ]
C=np.sum(A,axis=0) => adds each col (since axis=0) and returns 1D array with shape=(3,). result=[900 700 500]

C=np.sum(A,axis=1) => adds each row (since axis=1) and returns 1D array with shape=(2,). result=[600 1500]

C=np.sum(A) => adds all rows and cols (since no axis is specified, it adds across all axes) and returns a scalar 2100.

Broadcasting: Array broadcasting is a concept in NumPy, where we can perform matrix operations even when the shapes of the matrices don't match exactly. NumPy behaves as if the required rows or columns were duplicated until the shapes match. Certain rules apply as follows:

Rule 1. matrix of dim=mXn operated with matrix of dim=1Xn (1 row only) or with matrix of dim=mX1 (1 col only) => operations are +, -, *, /. The 1Xn or mX1 matrix is converted into an mXn matrix by duplicating its row or col, and then the operation is performed.

ex: A = [ [200, 100] , [300, 400] ] , B = [ [1, 2] ] => Here A is 2X2 matrix, while B is 1X2 matrix.

C= np.add(A,B) => Here, B is broadcast to 2X2 matrix, by duplicating 1st row. so, result is C = [[201 102]  [301 402]]

Rule 2: matrix of dim 1Xn or of dim mX1 => We can do operations of +, -, *, / on these matrices with a real number. The real number is converted into a 1Xn or mX1 matrix and then the operation is performed.

ex: B = [ [1, 2, 3] ] => This is 1X3 matrix. If we add real number 2 to this matrix, then the number 2 is converted to [ [ 2, 2, 2 ] ] and then the addition is performed.

C=np.add(B,2) => [ [1, 2, 3] ]  + 2 = [ [3 4 5] ]
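The two rules above can be tried out with a short sketch like the one below. The arrays col and bad are made up for this illustration; the last part shows the error we get when the shapes can't be broadcast.

```python
import numpy as np

A = np.array([[200, 100], [300, 400]])   # shape (2, 2)
B = np.array([[1, 2]])                   # shape (1, 2): one row only

# Rule 1: the single row of B is reused for every row of A
C = np.add(A, B)                         # [[201, 102], [301, 402]]

# Rule 1 with one column only: the column is reused for every column of A
col = np.array([[10], [20]])             # shape (2, 1)
D = A * col                              # [[2000, 1000], [6000, 8000]]

# Rule 2: a real number is stretched to the shape of the array
E = np.add(np.array([[1, 2, 3]]), 2)     # [[3, 4, 5]]

# Shapes (2, 2) and (1, 3) are not compatible, so NumPy raises an error
bad = np.array([[1, 2, 3]])
try:
    np.add(A, bad)
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)

print(C, D, E, sep="\n")
```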

Other operations on array: iteration over elements of array, join, split, search, sort, etc are miscellaneous functions provided to work on arrays.

 

HDF5 => HDF5 file stands for Hierarchical Data Format 5. It's also called h5 in short. The h5py package is a Pythonic interface to this HDF5 binary data format.

It is an open-source file format which comes in handy to store large amounts of data. As the name suggests, it stores data in a hierarchical structure within a single file. So if we want to quickly access a particular part of the file rather than the whole file, we can easily do that using HDF5. This functionality is not seen in normal text files.

HDF5 files are the ones used in AI projects, since they can store TBs of data, and can easily be sliced as if they were NumPy arrays.

We'll need to install the HDF5 module (h5py) in Python. To use HDF5, numpy also needs to be imported. Look in the numpy section for its installation.

Installation:

CentOS: We install it using pip.

sudo python3.6 -m pip install h5py => installs HDF5 for python 3.6

HDF5 Format:

Very good tutorial on HDF5 is on this link: https://twiki.cern.ch/twiki/pub/Sandbox/JaredDavidLittleSandbox/PythonandHDF5.pdf

or from local link HDF5

HDF5 includes only two basic structures: a multidimensional array of record structures, and a grouping structure. H5py uses straightforward NumPy array and python dictionary syntax. For example, you can iterate over datasets in HDF5 file, or check out the .shape or .dtype attributes of datasets.

HDF5 files are organized in a hierarchical structure, with two primary structures: groups and datasets.

  • HDF5 group: a grouping structure containing instances of zero or more groups or datasets, together with supporting metadata. A group has two parts:
    • A group header, which contains a group name and a list of group attributes.
    • A group symbol table, which is a list of the HDF5 objects that belong to the group.
  • HDF5 dataset: a multidimensional array of data elements, together with supporting metadata. A dataset is stored in a file in two parts: a header and a data array.
    • dataset header: The header contains information that is needed to interpret the array portion of the dataset, as well as metadata (or pointers to metadata) that describes or annotates the dataset. Header information includes the name of the object, its dimensionality, its number-type, information about how the data itself is stored on disk, and other information used by the library to speed up access to the dataset or maintain the file's integrity.
    • data array: Data array is where actual data is stored.
Ex of HDF5 file: trefer1.h5

HDF5 "trefer1.h5" {
GROUP "/" {
   DATASET "Dataset3" {
      DATATYPE { H5T_REFERENCE }
      DATASPACE { SIMPLE ( 4 ) / ( 4 ) }
      DATA {
         DATASET 0:1696, DATASET 0:2152, GROUP 0:1320, DATATYPE 0:2268
      }
   }
   GROUP "Group1" {
      DATASET "Dataset1" {
         DATATYPE { H5T_STD_U32LE }
         DATASPACE { SIMPLE ( 4 ) / ( 4 ) }
         DATA {
            0, 3, 6, 9
         }
      }
      DATASET "Dataset2" {
         DATATYPE { H5T_STD_U8LE }
         DATASPACE { SIMPLE ( 4 ) / ( 4 ) }
         DATA {
            0, 0, 0, 0
         }
      }
      DATATYPE "Datatype1" {
         H5T_STD_I32BE "a";
         H5T_STD_I32BE "b";
         H5T_IEEE_F32BE "c";
      }
   }
}
}

Usage:

Most of the time when doing an AI project, we won't be doing anything more than reading or writing HDF5. Let's look at these 2 operations.

ex: reading an h5 file

import numpy as np

import h5py
test_dataset = h5py.File('dir1/test.h5', "r") #opens the file in read mode
test_set_x = np.array(test_dataset["test_x"][:]) # get all of array from beginning to end
 
ex: writing an h5 file

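f = h5py.File("testfile.hdf5", "w") # opens the file in write mode (creates it, or overwrites an existing file)

arr = np.ones((5,2))

f["my dataset"] = arr # this stores the 5X2 array into file testfile.hdf5

f.close() # flush the data to disk and close the file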
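To tie the group/dataset structure described earlier to the read and write operations, here is a minimal round-trip sketch. The file name example.h5, the temporary directory and the "units" attribute are assumptions made up for this illustration; the group and dataset names just mirror the trefer1.h5 dump above.

```python
import os
import tempfile

import h5py
import numpy as np

# Write to a throwaway location so that no existing file gets overwritten
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    grp = f.create_group("Group1")                    # a group (like a folder)
    data = np.arange(0, 12, 3, dtype="uint32")        # 0, 3, 6, 9
    dset = grp.create_dataset("Dataset1", data=data)  # a dataset inside the group
    dset.attrs["units"] = "counts"                    # metadata kept with the dataset

with h5py.File(path, "r") as f:
    names = list(f["Group1"].keys())      # groups behave like python dictionaries
    dset = f["Group1/Dataset1"]           # address a dataset by its path
    shape, dtype = dset.shape, dset.dtype
    first_two = dset[:2]                  # only this slice is read from disk
    units = dset.attrs["units"]

print(names, shape, dtype, first_two, units)
```

Using "with" closes the file automatically, so the data is flushed to disk even if an error happens in between.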

 

SAT = Scholastic Aptitude Test

SAT is a standardized test used by most colleges in USA for undergraduate admission in any department. If you want to apply to any college in USA, your chances of getting accepted are greatly improved if you have a high score in SAT. However, SAT is just one component. Your GPA in school and recommendations from your high school teachers carry a lot more value than SAT scores, especially for high ranked colleges. More info here on wiki: https://en.wikipedia.org/wiki/SAT

Your kid will be taking the SAT exam in his high school, if he wants to attend a college after his high school. Some colleges don't require SAT at all, while most colleges accept either ACT or SAT exam (both are of similar difficulty).

SAT is a 3 hour long exam, and has four sections: Reading, Writing and Language, Math (no calculator), and Math (calculator allowed). The optional essay writing 5th section is not really required. The total score possible is 1600 (400 from Reading, 400 from Writing and Language, and 800 from Maths)

1. Reading: It has one section with 52 multiple-choice questions and a time limit of 65 minutes. There are 5 passages to read, each followed by 10-11 questions related to that passage. The passages are from 4 different fields, and do not require any prior knowledge, except the ability to read and infer correctly.

2. Writing and Language: It has one section with 44 multiple-choice questions and a time limit of 35 minutes. Not sure how many passages are supposed to be there, but I've seen 4 passages with 11 questions in each. The passages here are similar to ones in Reading section, but they focus more on the writing side, i.e. suggesting corrections, punctuation, improving sentence structure, etc.

3. Maths: Maths portion is divided in 2 sections. It has total 58 questions (45 are multiple choice, while 13 require you to write an answer) and a time limit of 80 minutes.

  • The Math Test – No Calculator section has 20 questions (15 multiple choice and 5 grid-in) and lasts 25 minutes.
  • The Math Test – Calculator section has 38 questions (30 multiple choice and 8 grid-in) and lasts 55 minutes.

There are many sample papers in link below in Resources. One such sample paper is here: SAT_practise.pdf

Percentile performance:

Depending on the score, you will get a percentile score, and that decides how well you performed. The avg score for SAT is 1060 out of 1600.  A score of over 1500 out of 1600 is considered very good, and will place you in the 98th to 99th percentile (i.e. the top 1% - 2%) of the kids who took the test nationally. The wiki link shows what your percentile is for different scores. Just over 2.2M students from the class of 2019 took the SAT. That means high school students graduating in 2019 took the SAT at some point in 2017, 2018 or 2019, and the total number was 2.2M. Most students take the SAT in 11th grade (since 12th grade gets too busy applying for colleges). Since number of high school kids is about 18M as seen in section "USA basic facts", that means there are about 4M kids graduating every year from high school. So, out of these, more than half end up taking the SAT exam, and these are the students who are serious about going to college. A significant portion of students who take the SAT exam apply for colleges. Even then, note that only 30% of the US workforce has a 4 year college degree, so even though about 50% of the students take the exam, only about half of these kids end up completing 4 years of college and getting a degree; the rest end up dropping out of college.

Resource:

1. college board: SAT is wholly owned by collegeboard.org, which is a non profit. It has a lot of free resources and sample papers to help you practice for the exam.

https://collegereadiness.collegeboard.org/sat/practice/full-length-practice-tests

2. Khan Academy: This is a wonderful resource for SAT, and I don't see a reason as to why you will ever want to get paid services for SAT preparation. In fact, starting 2015, college board has partnered with Khan Academy to provide free SAT preparation. Here's a link to get started:

https://www.khanacademy.org/mission/sat

3. mometrix: A guy named George sent me an email with the link to the website www.mometrix.com, and I really liked the free sample papers available on this website. Thanks George for the contribution !!  I've included the link below. It has free sample papers for all 3 subjects.

https://www.mometrix.com/academy/sat-practice-test/

I'll keep adding more links as I find them .....

 

 

Retirement Plans:

There are a host of retirement plans available in USA to save you taxes. You can save money yourself, and invest in any investment of your choice, instead of going thru the hassle of retirement accounts, but then you don't get any tax break. These specifically designed retirement accounts save you on taxes. Of all the plans out there, 401K retirement plans are the best retirement plans to shove all your money in. Only when you have maxed out your 401K plan, should you consider other plans. Let me give you a brief introduction of various plans available to you. All retirement accounts are per individual, so if you are married, then both you and your spouse can each have your own retirement accounts.

For all the retirement accounts, you are only allowed to contribute if you are working (i.e. have earned income, such as W2 wages). If you are not working and only have passive income from CD, stocks, etc, you are not allowed to contribute to any retirement account. Also, retirement accounts do not have a concept of family, they are meant for the individual person, and that person needs to have earned income in order to contribute to his/her retirement account. There is however one exception to this: A family where one of the spouses is not employed is still allowed to have a spousal retirement account. Since one of the spouses doesn't have an income, the other partner is allowed to contribute to the retirement account of the spouse, provided the other partner has earned income. The rationale behind allowing a spousal retirement account is that the non working spouse is taking care of the family/kids, and would otherwise be left with nothing in retirement just because he/she is not working.

 

various retirement plans

1. 401K plan:


The retirement plan got its name from its section number and paragraph in the Internal Revenue Code -- section 401, paragraph (k). It was enacted into law in 1978. Final regulations were published for it in 1991. This plan is offered by your employer. You can enroll in this plan when you join a job, as most of the employers offer this plan. Once you enroll in the plan, a set amount of money (whatever you indicate on your enrollment form) gets deducted from every paycheck. Your employer usually hires a Trustee to hold the assets of 401K plan. (for ex fidelity may be one such trustee). You usually don't pay any fees to the trustee, since your employer pays for any kind of fees. However, recently I've started seeing some of these trustees deducting plan management fees from their clients' accounts (usually $100/year or something like that for basically doing nothing, with your employer forcing you into having an account with that trustee).  In such cases, complain to your employer and ask them to pay for any such fees, or allow you an option to choose a trustee of your choice.

There used to be only 1 kind of 401K plan until 2006. It used to be called 401K, or traditional 401K. But since 2006, govt has allowed another type of 401K plan called the Roth 401K plan. I'll discuss only traditional 401K for now, and discuss Roth 401K later. All info below is for traditional 401K plan unless noted otherwise.

Qualifications:

1. Employed: The most important requirement for this plan is that you have to be employed and be receiving a paycheck large enough to cover your contributions to your 401K plan. For ex, if you want to contribute $1K per paycheck, you should have an income in that paycheck for more than $1K, and do a direct deposit from your paycheck to your 401K account. You can't contribute to 401K any other way (i.e transferring money from your bank account, or sending a check). However, on the positive side, you can change your contribution to 401K account any time. So, if you feel that you want to contribute more in the second half of the year because you contributed too low in the first half of the year, just change the amount of contribution on your trustee's website. If you are without a job for let's say 2 months, you cannot make any contribution to your 401K while you are out of job, but once you get your job back, you can start contributing again.

2. Limit.  There is no income limit for contributing to 401K. i.e. you can make millions of dollars and still contribute the maximum amount allowed. Most of the other retirement plans have an income threshold beyond which you are not allowed to contribute to that plan, but NOT 401K. That is why 401K is the first choice for a retirement account. However, there's an annual limit on the amount of money you can contribute. It generally goes up over the years and is announced by IRS in advance. For 2009, it was $16,500. For 2020, it's $19,500. If you are age 50 or older by year end, then you are eligible for an additional catch up contribution for that year (for year 2020, this additional limit is $6500, so anyone age 50 or older can contribute a total of $19,500+$6500=$26K for year 2020). This money is contributed before income tax (or in other words, it's deductible from your salary to determine taxable salary). That means your salary is reduced by your 401K contribution amount, before any federal/state taxes are calculated on it. However, you still pay social security and medicare taxes on this amount (7.65% of your contribution amount for most employees: 6.2% social security plus 1.45% medicare). One side advantage of putting money in 401K is that your Adjusted Gross Income (AGI) gets reduced by the amount you contributed to 401K. Your AGI determines your eligibility for many kinds of tax credits and tax breaks, so by putting yourself in a lower income range, you may avail of many other tax savings. That's an indirect benefit of contributing to a 401K account.

  • 2023 Limits = $22.5K  (+ $7.5K catch up for age 50 or older).
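As a rough sketch of the arithmetic in the Limit section above: the function name and the 25% income tax rate below are assumptions for illustration only. It also assumes a flat tax rate on the whole contribution and wages below the social security wage cap, which is a simplification.

```python
def traditional_401k_tax_effect(contribution, income_tax_rate, fica_rate=0.0765):
    """Return (income tax saved now, social security + medicare tax still paid).

    Simplified: one flat income tax rate applies to the whole contribution,
    and fica_rate (6.2% + 1.45%) applies to every dollar contributed.
    """
    income_tax_saved = contribution * income_tax_rate
    fica_still_paid = contribution * fica_rate
    return income_tax_saved, fica_still_paid


# 2020 limit of $19,500, for someone assumed to be in the 25% tax bracket
saved, fica = traditional_401k_tax_effect(19_500, 0.25)
print(f"income tax saved now: ${saved:,.2f}")
print(f"social security/medicare tax still paid: ${fica:,.2f}")
```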

Benefits:

Employer match: To top it, your employer also matches a certain portion of your contribution to your 401K account. Typically, employers match anywhere from 50% to 100% of your contribution up to a maximum amount. So, let's say if you contribute 10% of your salary to 401K, the employer may match 50% of your contribution for the first 6% => i.e. putting an extra 3% of your salary in your 401K account. This is absolutely free money. However, it's done for every paycheck, so if you missed getting 3% from your employer for a certain paycheck (because you just joined a job and 401K is not set up, or you lowered your contribution, so that you are not able to maximize your employer's contribution), then you cannot do a catch up for your employer contribution. Many times people contribute too much to their 401K account in the first few months, and then in the last few months, they can't contribute anything because it would go over the contribution limit. In such a case, you lose your employer's match for those months. In all such cases, that employer contribution is lost forever for those few months. There is nothing anyone can do retroactively. So, be careful with 401K accounts whenever you change your contribution amounts. Make sure that you can maximize your employer contribution (since it's free money). I've lost this free match for multiple months, whenever I readjusted my contribution (since sometimes I get a bonus which also gets contributed to 401K, resulting in my contribution going over the limit. By the time I realize this, it's too late). Just recently, I came to know that there is an option on some of these retirement trustee websites, where you can choose the option to "maximize employer contribution", as well as "maximize my contribution". That way, it will always readjust to make sure you never lose your employer contribution, no matter how wildly your paycheck varies from month to month.
Never choose a "fixed % contribution" such as 6%, 10%, 15% etc. if your paycheck varies, as that can land you in NOT getting the "maximum employer match".
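Here is a small sketch of how the match gets lost when you hit the limit too early. All the numbers (a $10K paycheck, 12 paychecks a year, 50% match on the first 6%) and the function name yearly_match are assumptions for illustration, and it assumes the employer matches strictly per paycheck with no year-end catch up.

```python
def yearly_match(pay_per_period, periods, contribution_rate, annual_limit,
                 match_rate=0.5, match_cap=0.06):
    """Return (employee contribution, employer match) for one year.

    Assumes a strict per-paycheck match: match_rate of whatever you put in
    that paycheck, counting only the first match_cap of that paycheck.
    """
    contributed = 0.0
    matched = 0.0
    for _ in range(periods):
        wanted = pay_per_period * contribution_rate
        room_left = max(0.0, annual_limit - contributed)
        actual = min(wanted, room_left)      # contributions stop at the annual limit
        contributed += actual
        matched += match_rate * min(actual, pay_per_period * match_cap)
    return contributed, matched


limit_2020 = 19_500

# Spread evenly: 16.25% of every paycheck reaches the limit in the last month
even = yearly_match(10_000, 12, 0.1625, limit_2020)

# Front-loaded: 39% of every paycheck reaches the limit after 5 paychecks
front_loaded = yearly_match(10_000, 12, 0.39, limit_2020)

print("even:         contributed %.0f, match %.0f" % even)          # match is about 3600
print("front-loaded: contributed %.0f, match %.0f" % front_loaded)  # match is about 1500
```

Both strategies put the same $19,500 into the account, but under these assumptions the front-loaded one gets a match in only 5 of the 12 paychecks.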

Just as you as an employee have a contribution limit, your employer also has a contribution limit. It varies from year to year. For 2020, the maximum contribution limit to 401K from all sources is $57K (or $63.5K if age 50 or above). Since your contribution limit is $19.5K, that means your employer can contribute only up to $57K - $19.5K = $37.5K. I don't know of any employer matching more than 6%, and at a 6% match the employer's share reaches $37.5K only at a base salary of about $625K ($37.5K / 0.06). For comparison, at $300K, you contribute 6% and your employer contributes 6% for a total of $36K, which is well under $57K. So, for most of the employed out there, you will never exceed this $57K threshold, so don't even bother about it.

Tax free Growth:

Your contribution as well as your employer's contribution goes into your 401K account, and the money grows tax free till it is withdrawn. Your withdrawals are taxed at ordinary income tax rate at the time of withdrawal, provided you have reached 59 1/2 years of age. If you withdraw any of this money before reaching 59 1/2 yrs, you have to pay an additional 10% penalty.

Changing employers:

I've gotten bitten by it, so I should share this. When you change employer (due to job change), you can either leave your 401K retirement account at the old trustee, or you can move it to the new trustee (whoever is the trustee with your new employer), or you can just open a new Rollover IRA account at any brokerage firm, and transfer your 401K portfolio to the brokerage firm. I would advise moving your 401K account to a Rollover IRA account at any brokerage firm of your choice. There are multiple advantages to this as noted below:

  • Almost always, your company trustee charges you a yearly maintenance fee for keeping your account. In many cases your employer pays that fee and you don't see it (in some cases, you have to pay that fee as your employer may not be paying that fee). But when you leave your current employer, the trustee will start charging you that fee, and may even raise that fee. You are at the mercy of that trustee. There is no reason to pay any fees to these trustees.
  • When you move your 401K account to a brokerage firm, they will usually give you a handsome bonus for bringing the 401K to them. They also don't charge any fee of any kind. So, you make money 2 ways - first by getting the bonus, and secondly by not paying any fees.
  • You can keep changing brokerage every few years to take advantage of bonus amounts that they give. This can easily net you about $1K per year in bonuses, just moving your roll over IRA accounts.
  • With discount brokerage firms, you have the option to invest in any stock or fund you want. With your company's trustee, you may have very limited choices, and they may even charge you higher expense ratio for these funds. With discount brokerage, you pay no transaction fee, and expense ratio for ETF are as low as 0.02%.
     

2. IRA (Individual retirement account):


These are called individual accounts because any individual can open one and contribute to it. (these are outside of any corporate retirement plan). These accounts are owned by you, and you decide where to open these accounts, and you can move them to another bank/brokerage at will at any time. These accounts can be opened on top of 401K accounts discussed above. You are free to invest the money in any way you like in these IRA accounts. You can set up this IRA account at any brokerage firm (e.g. Schwab, etc) or you can open a savings IRA account or a CD at any bank. You should always open an IRA account at a brokerage firm, as they charge you no fees of any kind, and will generally give you a bonus for opening or transferring your IRA account to them. Then you can buy stocks in these IRA accounts. You can open any number of IRA accounts, but the total amount of money that you can put in all of them combined is limited as per IRS limits for that year. For year 2009, the limit was $5K (+$1K additional catch up for age 50, for a total of $6K). For year 2020, the limit was $6K (+$1K additional catch up for age 50, for a total of $7K). You can contribute up to the maximum amount, but you may not be able to deduct all of it. You may be able to deduct all or part of it in your tax returns depending on your participation in 401K and your income.

If you are not contributing to a workplace 401K account, then you can deduct (in your tax return) up to the contribution limit irrespective of your income. However if you or your spouse is participating in a workplace 401K account, then there is an income limit. If you are above the income limit, then neither you nor your spouse can deduct the full contribution amount from federal tax returns. The limit is based on MAGI (modified adjusted gross income). MAGI is adjusted gross income with some deductions and exclusions added back in. For a married couple filing income tax jointly in 2020, the MAGI limit was $104K. That means if your joint tax return had a MAGI of < $104K, then both you and your spouse can contribute and deduct up to $6K (or $7K if over age 50). Then two of you combined can contribute $12K (or $14K if over age 50). Assuming you are in 25% tax bracket, you can save about $3K on taxes. However, the amount you save in taxes is dependent on type of IRA plan. If your MAGI is in between $104K to $124K, then you can deduct your contribution only partially, while if your combined MAGI is > $124K, then you can deduct nothing. These limits apply only if both spouses are working and are participating in 401K plan.

However, if one of the spouses is not working (and so is not covered by a workplace plan), then higher MAGI limits are allowed. For 2020, this MAGI limit was $196K. That means if your joint tax return had a MAGI of < $196K, then both you and your spouse can contribute and deduct up to $6K (or $7K if over age 50). Then two of you combined can contribute $12K (or $14K if over age 50). If your MAGI is in between $196K to $206K, then you can deduct your contribution only partially, while if your combined MAGI is > $206K, then you can deduct nothing.
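The "deduct only partially" part can be sketched as below, using the 2020 ranges quoted above. The function name is made up, and it assumes the deduction shrinks in a straight line across the MAGI range; the actual IRS worksheet has its own rounding rules which are not modeled here, so treat this only as an approximation.

```python
def deductible_ira_amount(magi, contribution_limit, phase_start, phase_end):
    """Approximate deductible IRA contribution for a given MAGI.

    Full deduction at or below phase_start, nothing at or above phase_end,
    and an assumed straight-line reduction in between.
    """
    if magi <= phase_start:
        return float(contribution_limit)
    if magi >= phase_end:
        return 0.0
    fraction_left = (phase_end - magi) / (phase_end - phase_start)
    return contribution_limit * fraction_left


# 2020, married filing jointly, contributor covered by a workplace plan
print(deductible_ira_amount(100_000, 6_000, 104_000, 124_000))  # below the range: full
print(deductible_ira_amount(114_000, 6_000, 104_000, 124_000))  # halfway: about half
print(deductible_ira_amount(130_000, 6_000, 104_000, 124_000))  # above the range: nothing

# 2020, higher range when one spouse is not working
print(deductible_ira_amount(201_000, 6_000, 196_000, 206_000))  # halfway: about half
```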

There are 11 flavors of IRA plan. However, we'll discuss 2 most common ones.

A. Traditional IRA: With a Traditional IRA, you don't pay federal/state taxes right now on this contribution, but pay them at time of withdrawal. The withdrawals are taxed as ordinary income. Besides, there's a 10% penalty if withdrawn before age 59 1/2 yrs. You can set up a tax deductible or non-tax deductible traditional IRA. Non-deductible traditional IRA are pretty much useless, so we'll only talk about tax deductible traditional IRA.

B. Roth IRA: Under this IRA, any contributions you make are not tax deductible. You contribute money to this plan after taxes have already been deducted. However, you can withdraw the money tax free after age 59 1/2. So, it's kind of the opposite of traditional IRA. In traditional IRA, you weren't taxed upfront, but were taxed on withdrawal. For Roth IRA, you are taxed upfront, but are not taxed on withdrawals. For year 2010, income limits were around $120K for singles, and $180K for married couples to be eligible to contribute to a Roth IRA.

NOTE: you can open any number of the above IRA accounts, but the total amount you can contribute and deduct is limited to the contribution limit for that year. Many people open both Traditional and Roth IRA and contribute to both of them, but total contribution can't exceed $6K for 2020 across all these IRA accounts combined. You may contribute up to that limit, but the deduction you can take for contributing to these accounts is based on your MAGI. If your MAGI exceeds a certain threshold, then there's no point contributing to these IRA accounts, as you won't get any tax break. In such a case, you should just invest any savings by putting it in your personal brokerage account. The deduction limit goes up over the years depending on inflation, although much more slowly than the limit for 401K account. Married people can have slightly higher tax deductible contribution.

 


 

Should you invest in retirement plan and if so, which retirement account?

401K: The top choice for retirement plan is 401K plan. You should contribute to a 401k plan up to the max limit allowed. Why is that beneficial? Because you don't pay any taxes on those amounts now, you pay taxes when you withdraw money in retirement. This equates to 0% long term tax rate on those earnings. If you were to take out that money and invest it in non-retirement account, you would get hit with 15% long term tax. Let's see how:

Let's consider an example where you have $10,000 to invest either in 401K or pay taxes and take it out, and invest it in personal non retirement account. Remember, that in retirement account, all of your money including your capital gain is taxed at your personal income tax rate (no long term capital gain tax rate allowed), while the money kept in your personal non retirement account is subject to lower long term capital gain tax rate (if held for > 1 yr). So, that helps, but not enough in most cases to offset the retirement account benefit.

CASE A: Assume you are in 25% tax bracket now, and in retirement too. There is 15% long term capital gain tax rate in effect.

scenario 1: Money invested in 401K in XYZ fund, and then taxed when withdrawn (assume no penalty as it's withdrawn after 60)

money to invest | after taxes (tax 0%) | money growth in XYZ fund (8%/yr) | money after 30 yrs | tax on withdrawing (25%) | net money after 30 yrs
$10,000 | $10,000 | $90,000 | $100,000 | $25,000 | $75,000

scenario 2: Money taken out and invested yourself in XYZ fund. You have to pay taxes before you can invest that money, as there's no deduction. So, money invested is lower at the beginning because of Uncle Sam's tax portion.

money to invest | after taxes (tax 25%) | money growth in XYZ fund (8%/yr) | money after 30 yrs | tax on withdrawing (15%) | net money after 30 yrs
$10,000 | $7,500 | $67,500 | $75,000 | $67,500*0.15=$10,125 | $64,875 (about $65,000)

As you can see in tables above, scenario 1 results in 15% more money than scenario2, as scenario1 resulted in effective 0% long term capital gain tax rate, while scenario 2 resulted in 15% long term capital gain tax rate.

 

CASE B: If we assume that we move into a higher tax bracket in retirement, then the difference gets smaller. If the long term capital gain tax rate also gets smaller in future (<15%), then scenario 2 can even give a better return.

scenario 1: Money invested in 401K in XYZ fund, and then taxed when withdrawn (assume no penalty as it's withdrawn after 60). Here we assume that we go in 35% tax bracket in retirement.

money to invest | after taxes (tax 0%) | money growth in XYZ fund (8%/yr) | money after 30 yrs | tax on withdrawing (35%) | net money after 30 yrs
$10,000 | $10,000 | $90,000 | $100,000 | $35,000 | $65,000

 scenario 2: Money taken out and invested yourself in XYZ fund. You have to pay taxes before you can invest that money, as there's no deduction. Here we assume that our Long term capital gain tax rate is reduced to 10%.

money to invest | after taxes (tax 25%) | money growth in XYZ fund (8%/yr) | money after 30 yrs | tax on withdrawing (10%) | net money after 30 yrs
$10,000 | $7,500 | $67,500 | $75,000 | $67,500*0.1=$6,750 | $68,250

Here, we see that scenario 2 gives us more return. The reason that happens is because the 0% tax rate became an effective long term tax rate of 10% since we moved into higher tax bracket.
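The tables in CASE A and CASE B can be reproduced with a few lines of code, which also makes it easy to try your own tax rates. The function names are made up for this sketch. Like the tables, it rounds 30 years of 8% growth (1.08^30, about 10.06) to 10 times the invested amount, and it ignores any yearly tax on dividends in the non retirement account.

```python
GROWTH = 10  # the tables round 1.08**30 (about 10.06) down to 10x


def net_from_401k(pre_tax_money, growth_multiple, retirement_tax_rate):
    """Scenario 1: invest pre-tax, pay ordinary income tax on the whole withdrawal."""
    return pre_tax_money * growth_multiple * (1 - retirement_tax_rate)


def net_from_taxable(pre_tax_money, growth_multiple, current_tax_rate, capital_gain_rate):
    """Scenario 2: pay income tax first, then long term capital gain tax on the growth only."""
    invested = pre_tax_money * (1 - current_tax_rate)
    final_value = invested * growth_multiple
    gain = final_value - invested
    return final_value - gain * capital_gain_rate


# CASE A: 25% bracket now and in retirement, 15% long term capital gain rate
case_a = (net_from_401k(10_000, GROWTH, 0.25),
          net_from_taxable(10_000, GROWTH, 0.25, 0.15))

# CASE B: 35% bracket in retirement, long term capital gain rate cut to 10%
case_b = (net_from_401k(10_000, GROWTH, 0.35),
          net_from_taxable(10_000, GROWTH, 0.25, 0.10))

print("CASE A: 401K %.0f vs non retirement %.0f" % case_a)
print("CASE B: 401K %.0f vs non retirement %.0f" % case_b)
```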

Conclusion for 401K:

Let's assume that long term capital gain tax rate will be 15% for people in bottom 99%. It's hard to see how it can go much higher than this, when it's stayed at this level from 2003-2020. Also, maximum tax rate has been < 40% for people in bottom 99%. So, assuming worst case scenario where you move from 25% tax bracket in present to 40% tax bracket in future, and long term capital gain tax rate of 15%, both scenario 1 and scenario 2 will give you almost the same return. So, the chances are very high that you will end up getting more return by putting that money in some form of retirement account than in your non retirement account. Some may argue that in scenario 2, since you are in higher tax bracket, you will pay higher tax on withdrawal. However remember that all your capital gains are considered long term gains (if you hold your stocks for more than a year) and are taxed at a maximum of 15% (for year 2003-2020). Also all your dividends are considered qualified dividends (if you held the stocks for more than 2 months) and are taxed at a maximum of 15% (for year 2003-2020). The Congress may go back to a maximum of 20% tax rate for long term capital gains and qualified dividends (as has happened since 2013), but it will always be for top 1% of people. Also, you will always have the opportunity to sell them anytime, if they do announce that tax rates are going to go higher. Also you may sell them in a year when you have lower income, so that you will end up paying a much lower tax rate than 15%. People do sell their long term holdings in pieces every year to minimize paying long term capital gains tax => by spreading out the long term income, you may even pay 0% taxes on these long term profits.

 


 

IRA: After you have exhausted your 401K limits, that is when you should look into contributing to your IRA accounts, assuming you have money left for investment and you are still eligible. Now, which account to invest in: Traditional or ROTH? Let's do some analysis:

Let's take the same example as above.

CASE A: Assume you are in 25% tax bracket now, and in retirement too. There is 15% long term capital gain tax rate in effect.

scenario 1: Money invested in Traditional IRA in XYZ fund, and then taxed when withdrawn (assume no penalty as it's withdrawn after 60)

money to invest | after taxes (tax 0%) | money growth in XYZ fund (8%/yr) | money after 30 yrs | tax on withdrawing (25%) | net money after 30 yrs
$10,000 | $10,000 | $90,000 | $100,000 | $25,000 | $75,000

scenario 2: Money invested in Roth IRA in XYZ fund. It's taxed now, but not taxed when withdrawn .

money to invest | after taxes (tax 25%) | money growth in XYZ fund (8%/yr) | money after 30 yrs | tax on withdrawing (0%) | net money after 30 yrs
$10,000 | $7,500 | $67,500 | $75,000 | $0 | $75,000

scenario 3: Money taken out and invested yourself in XYZ fund. You have to pay taxes before you can invest that money, as there's no deduction.

money to invest | after taxes (tax 25%) | money growth in XYZ fund (8%/yr) | money after 30 yrs | tax on withdrawing (15%) | net money after 30 yrs
$10,000 | $7,500 | $67,500 | $75,000 | $67,500*0.15=$10,125 | $64,875 (about $65,000)

As you can see in the tables above, scenario 1 and scenario 2 result in exactly the same return. If we believe that we are going to be in a higher tax bracket in old age, then scenario 2 would give the better return, while if we believe that we are going to be in a lower tax bracket in old age, then scenario 1 would give the better return. Scenario 3 is still worse than the other 2 scenarios.
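The three tables can be reproduced, and re-run with other tax brackets, using a short sketch like the one below. It uses the same simplifications as the tables: a single $10,000 of pre-tax money, 10x growth over 30 years (8%/yr rounded), and one tax event at withdrawal. The function names are just for illustration.

```python
def traditional_ira(pre_tax, multiple, tax_later):
    """Invest pre-tax money; the whole balance is taxed on withdrawal."""
    return pre_tax * multiple * (1 - tax_later)


def roth_ira(pre_tax, multiple, tax_now):
    """Pay income tax now; growth and withdrawal are tax free."""
    return pre_tax * (1 - tax_now) * multiple


def taxable_account(pre_tax, multiple, tax_now, ltcg):
    """Pay income tax now; only the gain is taxed, at the long term rate."""
    invested = pre_tax * (1 - tax_now)
    final = invested * multiple
    return final - (final - invested) * ltcg


def compare(tax_now, tax_later, ltcg=0.15, pre_tax=10_000, multiple=10):
    """Net money after 30 years for scenarios 1, 2 and 3."""
    return {
        "scenario 1 (Traditional)": traditional_ira(pre_tax, multiple, tax_later),
        "scenario 2 (Roth)": roth_ira(pre_tax, multiple, tax_now),
        "scenario 3 (taxable)": taxable_account(pre_tax, multiple, tax_now, ltcg),
    }


case_a = compare(tax_now=0.25, tax_later=0.25)
for name, net in case_a.items():
    print(f"{name}: ${net:,.0f}")
```

Calling compare(0.25, 0.40) or compare(0.25, 0.15) shows how the ranking of Traditional and Roth flips with the tax bracket you expect in retirement.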

Conclusion for IRA: Let's assume that 30 years from now, our tax bracket will not be higher than the one we are in now. Then Traditional and Roth both give you the same return. Personally, I would go with a Traditional IRA for the following reasons:

  • You save money in taxes now (real savings). These are your prime money making years, so most likely you will save a lot more in taxes now. There may be some extra tax rebates that you qualify for by reducing your AGI. In retirement, your income may fall off a cliff, and you may move into a lower tax bracket anyway.
  • In retirement you decide how much money you want to withdraw from retirement accounts, and hence what tax bracket you want to put yourself in. So, you can control how much tax you pay on Traditional IRA/401K withdrawals. We have this option to play with in retirement when living off a Traditional IRA/401K, so that our taxes are minimized.

If you are really lured towards scenario 2, I would ask you to look into scenario 3 => invest the money in a non retirement account and pay the 15% long term capital gain tax rate on withdrawal. That way you can withdraw the money any time you want or, even better, not withdraw the money at all and pass it on to your kids, so that they don't become future slaves. In that case the long term capital gain tax is effectively 0%, since the cost basis of inherited stock is stepped up to its value at the time of inheritance. (There is a federal estate tax on passing wealth to kids, but it applies only when the wealth is above a threshold of at least $5M.)

 


 

Where should you invest your money?

Investment: Now comes the fun part of investing your money. Your employer generally offers 10 to 20 different investment options through the trustee, and you can choose a mix of these in the 401K account that you have with the trustee. You can always change the mix of investments, and usually there is no charge for doing that. This can be done online with the account you have with the trustee. The different kinds of investment options offered are:

  • A. Stock mutual funds
  • B. Bond mutual funds
  • C. Stable value accounts
  • D. Money market accounts.

Note that you can't keep cash in a 401K account at any time, as any money you put in a 401K is supposed to be invested at all times (or at least that's what my employer's trustee told me, and there's no option for cash parking). Money market accounts and stable value accounts usually consist of certificates of deposit and U.S. Treasury securities. They are very secure, but offer a very small return. From 2009-2022, the returns on these were close to 0%, because interest rates were close to 0%. Even though these treasuries or CDs offer little return, the funds themselves have a very high expense ratio (usually more than 0.5%), so after deducting this expense, the net return to the customer is still about 0 (fortunately, these accounts don't give you a negative return yet as of 2022). So, any money you put in these is money lost.

Bond mutual funds are pooled amounts of money invested in bonds. With interest rates close to 0%, bonds pay very little, and if rates rise from here, the price of existing bonds falls. So, in my view, you are very likely to lose money in bonds, and you should not invest a dime of your money in them.

So, the only option left is "stock mutual funds". Stock mutual funds are portfolios of company stocks. They are considered risky. But in my view they are the least risky of all these investments over the long run, because the FED, the entity that prints money, has repeatedly stepped in to support the markets. Stocks can still fall hard in any given year, so this is a bet on the long term. For a stock mutual fund, just invest in something similar to an S&P500 index fund, or something with a very low expense ratio (0.1% or less). For more information about these, read our stocks/bonds section.
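To get a feel for why the expense ratio matters so much, here is a rough sketch. It treats the expense ratio as a straight subtraction from the yearly return, which is a simplification, and the 8% return, $10,000 starting amount and 30 year horizon are illustrative assumptions.

```python
def final_value(start, gross_return, expense_ratio, years):
    """Grow `start` for `years` at the gross return minus the expense ratio.

    Simplification (assumption): the fee is modeled as a flat reduction
    of the yearly return.
    """
    net_return = gross_return - expense_ratio
    return start * (1 + net_return) ** years


# Same 8%/yr stock fund, with a cheap versus an expensive expense ratio.
cheap = final_value(10_000, 0.08, 0.001, 30)
pricey = final_value(10_000, 0.08, 0.005, 30)

# A money market fund yielding 0.5% with a 0.5% expense ratio nets nothing.
money_market = final_value(10_000, 0.005, 0.005, 30)

print(f"0.1% expense ratio: ${cheap:,.0f}")
print(f"0.5% expense ratio: ${pricey:,.0f}")
print(f"money market after fees: ${money_market:,.0f}")
```

Under these assumptions the 0.5% fund ends up with roughly a tenth less money than the 0.1% fund after 30 years, and the money market fund goes nowhere.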

 

Is it bad to have too much money in retirement account?

You do not want to end up with too much money in your retirement accounts. Why? Because the IRS mandates an RMD (Required Minimum Distribution). Once you reach the required age, it forces you to withdraw roughly 4% of your retirement account money every year, and the percentage rises as you get older. So, you end up paying tax on that money. Also, retirement accounts can't be passed on to your kids w/o incurring a tax hit (the regular income tax rate applies to all of the money in your retirement account, since this tax was never paid in the first place). So, no matter how cleverly you plan, you will have to pay 25% or more in taxes on all of the money in your retirement accounts.
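As a rough illustration of where a figure like 4% comes from: the RMD is generally the account balance divided by a life expectancy divisor taken from an IRS table. The divisor of 26.5 below is an assumed, illustrative value for someone in their early 70s and is not from this article, so check the current IRS table before relying on it. The $1M balance and 25% tax rate are also just example inputs.

```python
def required_minimum_distribution(balance, divisor):
    """RMD for one year: account balance divided by a life expectancy divisor."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return balance / divisor


ASSUMED_DIVISOR = 26.5   # illustrative assumption, look up the real IRS value
balance = 1_000_000      # example balance in a pre-tax retirement account
tax_rate = 0.25          # example income tax rate in retirement

rmd = required_minimum_distribution(balance, ASSUMED_DIVISOR)
tax_due = rmd * tax_rate

print(f"forced withdrawal: ${rmd:,.0f} ({rmd / balance:.1%} of balance)")
print(f"tax on that withdrawal: ${tax_due:,.0f}")
```

A smaller divisor at older ages means a larger forced withdrawal, which is why the percentage creeps up over time.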

Let's say you contribute $20K/year to your 401K plan and $5K/year to your IRA plan. Then you will have contributed about $25K*30=$750K to your retirement accounts in 30 years. Assuming 8% return per year, you will end up with about $3M in your retirement accounts. If your spouse also contributed the same way, then you will have a combined retirement wealth of about $6M. At a tax rate of 25% or more, you will end up paying about $1.5M-$2M just in taxes.
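The compounding behind a projection like this can be checked with a short sketch. It assumes a flat $25K contributed at the end of each year and a constant 8% return, which are simplifications. With these assumptions the result comes out a bit under $3M per person. Contributing at the start of each year, or raising contributions over time, would push it higher.

```python
def future_value_of_contributions(annual, rate, years):
    """Balance after `years` of equal end-of-year contributions."""
    balance = 0.0
    for _ in range(years):
        # Last year's balance grows, then this year's contribution is added.
        balance = balance * (1 + rate) + annual
    return balance


annual = 25_000   # $20K to the 401K plus $5K to the IRA
years = 30
per_person = future_value_of_contributions(annual, 0.08, years)
couple = 2 * per_person

print(f"total contributed per person: ${annual * years:,.0f}")
print(f"balance per person after {years} yrs: ${per_person:,.0f}")
print(f"couple: ${couple:,.0f}, tax at 25%: ${couple * 0.25:,.0f}")
```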

Paying so much in taxes is not all gloom and doom. It's a problem I'd love to have !!!

According to an Employee Benefits Research Institute analysis of Federal Reserve data, as of 2024, about 2% of US households have $2M in retirement a/c and 1% have > $3M. This is insane money in retirement a/c, and it has been made possible entirely by the gangbuster returns of the stock market from 2009-2024. You do want to be in the top 1% of US households, and the way to get there is to put the max money in your retirement a/c, all invested in a stock index fund. There's nothing more to do, you will automatically end up in the top 1%. Link => https://www.msn.com/en-us/money/retirement/what-percentage-of-retirees-have-2-5-million/ar-AA1BEqQb

UPDATE Q2 2025: With stock markets at new insane highs ($65T USA stock market cap), the number of 401K millionaires crossed 0.5M just for Fidelity a/c, which covers 25M 401K a/c and 8M IRA a/c. Link => https://finance.yahoo.com/news/robust-returns-and-steady-saving-yield-record-number-of-401k-millionaires-211842500.html

 

Final Thoughts:

So, the gist of this article is:

  • You should definitely invest your money in a 401K when there's one available. Go for the maximum limit allowed by the IRS.
  • Even after doing this, if you still have money left for investment, open a Traditional IRA account and contribute the maximum amount allowed.
  • If you still have any money left for investment, you should invest in your personal brokerage account (non retirement account).
  • Most importantly, you should never invest a single dime of your retirement money in anything other than stocks/funds. Don't follow the bond/stock mixture crap. Always invest in the market index ETF with the lowest expense ratio (such as S&P500, Wilshire5000, etc). See the stock section for more details.

Don't listen to the so-called investment experts when it comes to your hard earned retirement money. Investment firms are there to maximize their own profits, and governments are there to maximize their taxes. Invest wisely and be on the safe side by participating in the government's ponzi scheme of the stock market. Good Luck !!