Programming Frameworks:
When learning neural networks, we wrote many of our own functions for finding optimal weights. There are too many parameters to tune and too many algorithms to chase, and writing each of them from scratch for every project isn't very productive. So, the idea is to put these routines into reusable Python modules, and that's exactly what AI/ML programming frameworks do. Programming frameworks provide all these functions pre-written in a compact library, with a lot of additional features for speed, efficiency, etc. The most popular AI frameworks are PyTorch, TensorFlow, Keras, etc. These are all open source.
TensorFlow:
TensorFlow (tf) is one of the programming frameworks used in AI. It was developed by Google and is now open source. The TensorFlow framework provides a collection of libraries to develop and train ML models using programming languages such as Python and JavaScript. We'll concentrate on using TensorFlow in Python only, since tf is most widely used with Python. Its flexible architecture allows for easy deployment of computation across a variety of platforms (i.e. CPUs, GPUs, TPUs).
The official website for TensorFlow is https://www.tensorflow.org.
This guide is a good place to get started with basic syntax and installation:
https://www.tensorflow.org/guide
Gotchas of TensorFlow:
Caution: if you start learning TensorFlow, there's actually no clear tutorial or simple documentation for it, so you learn by examples. You write some cryptic-looking code, and it does the job. It's very hard to see why it works, how it works, and how to debug it if it fails. In raw Python, you can debug by adding enough "print" statements to see where things went wrong. In tf, a lot of steps are combined into one cryptic function, and if it doesn't return the desired result, the help you get is equally cryptic. There's TensorBoard, which supposedly helps with this debug process; I haven't tried it yet. A lot of AI folks hate TensorFlow for its obscure programming style. One such rant here:
http://nicodjimenez.github.io/2017/10/08/tensorflow.html
A lot of these complaints are about the initial version of TF, known as TensorFlow 1. So, Google came out with a new revision of TensorFlow, called TensorFlow 2, which is supposedly better than the earlier version. More details below.
Installation:
TensorFlow is installed as a Python module, just like any other module. The tf packages are available as TensorFlow 1 and TensorFlow 2: tf 1 is the older release, while tf 2 is the newer one.
TensorFlow 1: This is the original TensorFlow package (the one with lots of complaints). Version 1.0 of TensorFlow was released in 2017; the final release of TensorFlow 1 is 1.15. For TensorFlow 1.x, CPU and GPU packages are separate:
tensorflow==1.15 — release for CPU only
tensorflow-gpu==1.15 — release with GPU support
TensorFlow 2: When Facebook released their own ML framework, PyTorch, it immediately started gaining ground against TF, and by 2018 the popularity of TensorFlow had started declining. PyTorch seemed more intuitive to people. So, Google made a major version release of TensorFlow, named TensorFlow 2, which was released in 2019. TensorFlow 2.0 introduced many changes, the most significant being eager execution, which changed the automatic differentiation scheme from the static computational graph to the "define by run" scheme originally made popular by Chainer and later PyTorch. Here the CPU and GPU packages are combined into one.
Migration from TF1 to TF2:
We can write our code in TF 1 and then migrate that code to run in TF 2 by applying very few changes to the TF 1 code. This link shows how:
https://www.tensorflow.org/guide/migrate
Install TensorFlow 1:
TensorFlow gets installed by installing "tensorflow" using pip3. The documentation doesn't say which major version gets installed by running the install cmd. It looks like tf 2 gets installed by default, provided your system meets the requirements. However, tf 2 needs newer python and pip3 versions. A basic tf installation needs Python 3.5 or greater and pip3; for tf 2 we need Python 3.8 or greater (not sure??) and pip3 version 19.0 or greater. First check the versions to find out whether tf can even be installed, and if so, which major version.
$ python3 --version => returns Python 3.6.8 on my local m/c
$ pip3 --version => returns "pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)" on my local m/c
TensorFlow can be installed on CentOS 7 via following cmd in a terminal (assuming pip is installed).
sudo python3.6 -m pip install tensorflow => Type exactly as is. If you omit any of the options (i.e. not using sudo or not using python3.6), the cmd will give you a lot of errors and won't be able to install tensorflow for Python 3.6. After installing, check in the python3.6 dir to make sure the package is there:
$ ls /usr/local/lib64/python3.6/site-packages/ => shows the following tensorflow-related new dirs. As can be seen, TensorFlow 1.14 got installed (probably because the python3 and pip3 requirements for tf 2 are not met). Even though the last release for TensorFlow 1 is 1.15, we see that 1.14 got installed (and not the latest 1.15); this may be due to some other system requirement not being met.
tensorflow-1.14.0.dist-info/
keras_applications/
Keras_Applications-1.0.8.dist-info/
keras_preprocessing/
Keras_Preprocessing-1.1.2.dist-info/
TensorFlow 1 vs TensorFlow 2: TensorFlow 2 is a radical departure from TensorFlow 1, and if you are planning to learn TensorFlow, don't even bother with tf 1. Just learn tf 2 and you will be saved from a lot of grief. However, we are going to learn tf 1, as that is what gets used in the Coursera Deep Learning courses. If we use tf 2, we may not be able to get our programming assignments to work (as they are written for tf 1), since tf is still cryptic enough that any bug is not easy to debug.
NOTE: The documentation on Google's site doesn't mention what version exactly gets installed when you install tensorflow as above, nor is there any documentation to help us understand how to install only tf 1 or tf 2. It just happened that tf 1 (v1.14) got installed for me. From the tons of warnings that I receive from the installed version, it looks like not everything got installed the right way for tf 1. However, the warnings seemed benign, so I continued on. I didn't try to update python3 and pip3 to see if tf 2 would get installed. The latest python3 version may not be available on many Linux distros, and updating may even break your other Python applications. It's always a major risk to update python3 and pip3, so I would be careful doing that on my system. So, let's live with tf 1 for now.
Syntax: Even though we have installed tf 1, all documentation on tensorflow.org refers to tf 2. My notes below are relevant for tf 1, but I'll highlight tf 2 wherever applicable.
TensorFlow is like any other module in Python. So, all Python commands remain the same, except that we call functions/methods in tensorflow as "tf" followed by a dot and the function/method name (i.e. tf.add(), just as we do for any other module in Python). Some of these functions/methods have to be run a certain way though, which is the start of the mystic coding style of tf. We'll see details later.
After installing tf, run a quick pgm to see if everything got installed correctly. Name the file "test_tf.py".
#!/usr/bin/python3.6
import math
import numpy as np
import tensorflow as tf

print(tf.version) # print version
We see that on running the above pgm, we get tons of warnings as below; those are OK. Also, the version gets printed with the file name and module v1, which indicates it's tf 1. For tf 2, we would see v2 instead of v1 (my hunch??).
/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. => These are the warnings ...
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
.... and so on ....
<module 'tensorflow._api.v1.version' from '/usr/local/lib/python3.6/site-packages/tensorflow/_api/v1/version/__init__.py'> => this line shows "v1" implying it's tensorflow 1. So, our TF 1 got installed correctly !!
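If you want just the version string rather than the module object, a minimal sketch (standard Python module convention, works in both TF 1 and TF 2) is:
import tensorflow as tf
print(tf.__version__) # prints the plain version string, e.g. "1.14.0"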
Tensor Data Structure
TensorFlow is "flow of tensors". Tensors are used as the basic data structures in TensorFlow language. Tensors represent the connecting edges in any flow diagram called the Data Flow Graph. Tensors are defined as multidimensional array or list with a uniform type (called a dtype). If you're familiar with NumPy, tensors are (kind of) like np.arrays. Tensor objects are rep as tf.Tensor.
You can see all supported dtypes at tf.dtypes.DType(). Some of the dtype objects are float, int, bool and string (i.e. tf.float16/32/64, tf.int16/32/64, tf.uint16/32/64, tf.bool, tf.string and a few more). These look the same as numpy dtypes (array1.dtype), so there's no difference between the two, except that one is for numpy while the other is for tf. Not sure why they had to define their own data types when they are the same as numpy data types.
NOTE: All tensors are immutable like Python numbers and strings: you can never update the contents of a tensor, only create a new one.
When writing a TensorFlow program, the main object that is manipulated and passed around is the tf.Tensor. A tf.Tensor object has a shape/rank (dimensions as rows, columns, etc. same as shape in numpy) and dtype (data type of tensor elements, all of which need to be of the same data type). We operate on these tensor objects, i.e add, multiply, etc (just like what we do for any arrays).
Tensors can be of any rank (i.e dimension). See in python - numpy section for details on arrays.
Rank 0 Tensor: Rank 0 means it's a scalar and not an array. Ex: 4. Rank 1 tensors are vectors; higher ranks are matrices and higher-dimensional arrays.
Rank 1 Tensor: Rank 1 is an array with 1 dim. Ex: [2, 3, 4]
Rank 2 Tensor: Rank 2 is an array with 2 dim. Ex: [ [2.1, 3.4], [3.5, 4.0] ]
And so on for higher dim Tensors.
NOTE: commas are needed to separate individual elements, as in numpy arrays. Numpy arrays can be used in many TensorFlow functions/methods. To carry out any computation on these tensors, such as matrix multiplication or matrix addition, we use tf functions/methods instead of numpy functions/methods. Just as in numpy, where all elements of a given array have the same data type, the values present in a tensor hold an identical data type, with known dimensions. So, tensors are the same as numpy arrays for all practical purposes.
ex: tensor_2d = np.array([(1,2,3,4),(4,5,6,7),(8,9,10,11),(12,13,14,15)]) => declares a 2D tensor, and can be used in tensor operations. NOTE: tensor_2d is just a 2D numpy array.
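For example, a minimal sketch (TF 1 style; the Session mechanics used here are explained later in this page) of passing plain numpy arrays into tf functions:
import numpy as np
import tensorflow as tf

a = np.array([[1, 2], [3, 4]])   # plain numpy arrays ...
b = np.array([[10, 20], [30, 40]])
s = tf.add(a, b)                 # ... are automatically converted to Tensors by tf ops
p = tf.matmul(a, b)              # matrix multiply also accepts numpy arrays
sess = tf.Session()
print(sess.run(s))               # [[11 22] [33 44]]
print(sess.run(p))               # [[ 70 100] [150 220]]
sess.close()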
Specialized Tensors: Constants, Variables, and Placeholders: We'll learn how to create Tensors of different data types and different ranks. There are many functions available to do this, but we'll look at the 3 most important ones.
1. tf constants: A TensorFlow constant is the simplest category of Tensor. It is not trainable and can be of any dimension. It is used to store constant values. The "constant" function is used to declare constants of any rank.
syntax: constant(value, dtype=None, shape=None, name='Const', verify_shape=False) => where value is the constant value that will be used; dtype is the data type of the value (float, int, etc.); shape defines the shape of the constant (it's optional); name defines the optional name of the tensor, and verify_shape is a Boolean value that will verify the shape.
ex: L=tf.constant(10, name="length", dtype=tf.int32) => Defines constant 10, with name "length" and of type int32.
print("L=",L) => prints L= Tensor("length_1:0", shape=(), dtype=int32) => This shows that the object is a Tensor object with shape blank (since it's a scalar) and type int32. NOTE: it doesn't display the value of constant. In tf 1, values are computed when session is run. We'll learn running sessions later. In TF 2, looks like the data is printed right here, even w/o running the session (that is what google tensorflow tutorials show).
ex: c = tf.constant([[4.0, 5.0, 1.2], [10.0, 1.0, 4.3]]) => Defines a rank 2 tensor. Since the type is not specified, it's automatically inferred from the contents of the tensor; here the type is float32.
print("c=",c) => prints c= Tensor("Const_2:0", shape=(2,3), dtype=float32) => This shows object is a Tensor of rank=2 with shape=(2,3). As expected, type is assigned as float, even though we never explicitly assigned the type. Name here is "Const_2.0", since we didn't assign a specific name.
2. tf placeholders: A TensorFlow placeholder is used to feed data into the computation graph at runtime; it takes an input parameter during runtime. We need to use the feed_dict argument to feed data into these tensors when the session is run (how to do this is explained later). The "constant" function discussed above had a constant value assigned at declaration time, but here we assign the value when running the session. A TensorFlow placeholder is declared via the function "placeholder".
syntax: placeholder(dtype, shape=None, name=None) => Here, dtype is the data type of the tensor; shape defines the shape of the tensor, and name will have the optional name of the tensor.
ex: L2= tf.placeholder(tf.float32) => placeholder of type float32. Here we didn't define the shape, so any shape tensor can be stored into it.
print("L2=",L2) => prints L2= Tensor("Placeholder:0", dtype=float32). => Note: name here is "Placeholder:0" since we didn't specify a name.
ex: sess.run(L2, feed_dict = {L2: 3}) => This assigns a value of 3 to L2 during session run time. NOTE: We have to put L2 as 1st arg in sess.run to actually run Tensor "L2". If we don't do that, it will error out.
ex: L2= tf.placeholder(tf.float32, shape=(2,3)) => Here it's an array of rank 2. NOTE: the order of positional args has to be the same as defined in the syntax above, else it errors out.
print("L2=",L2) => prints L2= Tensor("Placeholder:0", shape=(2, 3), dtype=float32)
ex: sess.run(L2, feed_dict = {L2: [[2, 3, 1],[1, 2, 1]]}) => This assigns array values as shown to L2 during session run time.
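A minimal runnable sketch of the placeholder flow (TF 1 only; placeholders were removed in TF 2):
import tensorflow as tf

L2 = tf.placeholder(tf.float32, shape=(2, 3))
double = 2 * L2                       # an operation defined on the placeholder
sess = tf.Session()
# the placeholder gets its value only at run time, via feed_dict
print(sess.run(double, feed_dict={L2: [[2, 3, 1], [1, 2, 1]]}))  # [[4. 6. 2.] [2. 4. 2.]]
sess.close()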
3. tf Variables: These are used to store values that can change during operation. We can assign initial values, and they can store other values later. These are similar to variables in other languages. We use the function "Variable" to define a var. tf Variables act and feel like Tensors and are backed by tf.Tensor. Like tensors, they have a dtype and a shape. For all practical purposes, we can treat them as Tensors.
ex: Here we define a constant, and then create a var using that constant as the initial value. We don't define type and shape as they are automatically inferred. We don't define a name either.
my_tensor = tf.constant([[1.0, 2.0], [3.0, 4.0]]) => Here we define a constant
my_var = tf.Variable(my_tensor) => Here we defined a variable "my_var" which has the initial value defined by the constant above. Its shape is automatically inferred to be (2,2) and its type as float32.
print("my_var = ",my_var) => prints my_var = <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32_ref> => NOTE: here it doesn't print "Tensor" but instead prints "tf.Variable" as Varaibles are not Tensor objects, but are backed by tf.Tensor. We have to explicitly convert them to tensors (by using tf.convert_to_tensor func explained later), if any function requires a Tensor as i/p.
ex: my_var = tf.Variable([[1.0, 2.0], [3.0, 4.0]]) => This is exactly same as above. We just put the initial value of variable into the function itself.
ex: Var= tf.Variable(tf.zeros((1,2)), dtype=tf.float32, name=”Var1”) => Here we create 2D tensor with shape=(1,2) named "Var" that we init to 0. See syntax of tf.zeros later.
print("var1 = ",Var1) => prints var1 = <tf.Variable 'Var:0' shape=(1, 2) dtype=float32_ref>
Non-trainable variables: Although variables are important for differentiation, some variables will not need to be differentiated. You can turn off gradients for a variable by setting trainable to False at creation. An example of a variable that would not need gradients is a training step counter.
ex: step_counter = tf.Variable(1, trainable=False) # initial value is assigned to 1. However by declaring it as non-trainable, we prevent it from differentiation.
ex: Var = tf.Variable(tf.add(x, y), trainable=False) => Here we define variable "Var", which is the sum of tensors/arrays x and y. We don't explicitly initialize it to anything, because the initial values are picked up from x and y.
Irrespective of whether we initialized variables or not, the actual initialization of these variables does NOT take place at definition time, but when we run the func "global_variables_initializer()".
ex: init= tf.global_variables_initializer() #Here we assigned this func to "init". Now, initialization takes place when we run init as "sess.run(init)"
tf.get_variable() => Gets an existing variable with these parameters or creates a new one. Not sure about the difference between tf.Variable() and tf.get_variable(); maybe here we get many more options for initialization, regularization, etc.
Syntax: tf.get_variable(name, shape=None, dtype=None, initializer=None, regularizer=None, ... many more options)
ex: tf.get_variable("W1", [2,3], initializer = tf.contrib.layers.xavier_initializer(seed = 1)) => This creates a var "W1" with shape=(2,3) and initializer set to Xavier initialization.
Difference b/w constants, placeholders and variables: constants are easy: their value remains fixed. Placeholders are like constants, but they allow us to change their values at run time so that we can run the pgm with many different values. Variables are like variables in any other pgm language: they allow us to store the results of any computation.
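A minimal sketch contrasting the three in one place (the Session and initialization steps used here are explained in the next section):
import tensorflow as tf

c = tf.constant(5.0)             # constant: value fixed at declaration
p = tf.placeholder(tf.float32)   # placeholder: value supplied at run time via feed_dict
v = tf.Variable(0.0)             # variable: value can change during the run
out = v.assign(c * p)            # store the result of a computation in the variable
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(out, feed_dict={p: 3.0}))  # 15.0
sess.close()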
shape, type, numpy: We can get the shape or type of any Tensor by using Tensor.shape, Tensor.dtype, etc. (i.e. my_var.shape, my_var.dtype). We can also convert Tensors to numpy by using Tensor.numpy() (i.e. my_var.numpy() would print the array [[1.0, 2.0], [3.0, 4.0]]). However, in my installation, this gives an error => AttributeError: 'RefVariable' object has no attribute 'numpy'. (Tensor.numpy() works in TF 2 / eager mode.)
TensorFlow programs: Once we have created tensors (constants, placeholders, variables), we can use them in TensorFlow programs. Making a tf pgm involves three components:
- Graph: It is the basic building block of TensorFlow that helps in understanding the flow of operations.
- Tensor: It represents the data that flows between the operations. Tensors are the constants/variables that we created above; operations are add, multiply, etc. In the data flow graph, nodes are the mathematical operations and the edges are the data in the form of tensors, hence the name Tensor-Flow.
- Session: A session is used to execute the operations. Session is the most important and odd concept in TF. More details later.
Writing and running programs in TensorFlow has the following steps:
- Create Tensors (constants, placeholders, variables, as shown above) that are not yet executed/evaluated.
- Write operations between those Tensors (i.e. multiply, add, etc). These operations can be done via tf functions such as add, multiply, etc., or by using plain +, *, etc., as these operators are overloaded in tf (since they already exist in Python). We can also use numpy arrays as i/p to Tensor operators, as the arrays will automatically be converted into Tensors.
When you specify the operations needed for a computation, you are telling TensorFlow how to construct a computation graph. We put them in computation graph, but we haven't run them yet. This is different than what we do in conventional programming, where computation is carried out, as soon as we write the operation. The computation graph can have some placeholders whose values you will specify only later.
- Initialize your Tensors. Constants are already initialized via func "constant()", but variables need to be initialized using func "global_variables_initializer()" shown above.
- Create a Session using the function "Session()". A Session object encapsulates the environment in which Operation objects are executed and Tensor objects are evaluated.
- syntax: tf.Session(target='', graph=None, config=None). Usually no args are provided, so just call Session() and assign the handle to a var.
- ex: sess = tf.Session()
- Run the Session, using the method "run()" on that session. By running the session we can get values of Tensor objects and results of operations. This will run the operations you'd written above. You have to specify the function/method inside run() to run that particular func. If you have defined placeholders, you need to assign their values here. When you run the session, you are telling TensorFlow to execute the computation graph.
- syntax: run(fetches, feed_dict=None, options=None, run_metadata=None) => Runs operations and evaluates tensors in fetches. This method runs one "step" of TensorFlow computation, by running the necessary graph fragment to execute every Operation and evaluate every Tensor in fetches, substituting the values in feed_dict for the corresponding input values. options and run_metadata are not used for our purposes.
- The fetches argument may be a single graph element, or an arbitrarily nested list, tuple or dict containing graph elements at its leaves. A graph element can be one of the following types (there are a few more, but we list the 2 that we mostly use: Operation and Tensor):
- A tf.Operation. The corresponding fetched value will be None.
- A tf.Tensor. The corresponding fetched value will be a numpy ndarray containing the value of that tensor. It is important to note that the fetched value is not a Tensor but a numpy ndarray.
- The optional feed_dict argument allows the caller to override the value of tensors in the graph. Each key in feed_dict can be a tf.Tensor, the value of which may be a Python scalar, string, list, or numpy ndarray that can be converted to the same dtype as that tensor. Each value in feed_dict must be convertible to a numpy array of the dtype of the corresponding key.
- The value returned by run() has the same shape as the fetches argument, where the leaves are replaced by the corresponding values returned by TensorFlow.
- ex: a = tf.constant([10, 20]); b = tf.constant([1.0, 2.0])
- v = sess.run(a) => Here a is evaluated. Since the "fetches" arg is a single graph element of type Tensor, the return value is the numpy array [10, 20].
- v = sess.run([a, b]) => Here a and b are evaluated. Since the "fetches" arg is a list of 2 graph elements of type Tensor, the return value is a list with 2 numpy arrays: [10, 20] and [1.0, 2.0].
- Close the session, using method "close()" on that session. A session may own resources, so by closing the session, we release the resources.
So, why do we do these complicated steps of making a graph and then running it via a session? Most likely, this is to map these computations to different nodes of CPUs/GPUs/TPUs, etc. We keep defining the various operations of the final graph (i.e. add, mul, etc. to calculate the cost function), which then get mapped to various nodes of the GPU/TPU. Once these are mapped, in the very last step we simply provide i/p values to the i/p nodes, and the processor can easily compute the node value for all nodes in the graph.
ex: Below is an example where we multiply 2 constants to get the result.
a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b) # we can also write c=a*b, as operators are overloaded
print(c) # You will not see result for c=20, but instead get this Tensor => "Tensor("Mul:0", shape=(), dtype=int32)".
print("res =", sess.run(c**2) => We could write any operation, i.e c**2 and it would compute c^2=400. This prints 400.
|
ex: Here we solve the eqn y = m*X + C, where m and C are tf Variables with fixed initial values and X is a placeholder fed with values from 10 to 40.
m = tf.Variable([2.7], dtype=tf.float32)   # define m as a var with initial value 2.7
C = tf.Variable([-2.0], dtype=tf.float32)  # define C as a var with initial value -2.0
X = tf.placeholder(tf.float32)             # X is defined as a placeholder, as its value is going to be assigned at session runtime later
Y = m*X + C                                # we write the eqn directly instead of using func add, mul, as this involves single value computation
sess = tf.Session()                        # create session
init = tf.global_variables_initializer()   # func to initialize all the variables, as var initialization takes place only via this func
sess.run(init)                             # running session for init
print(sess.run(Y, feed_dict={X: [10, 20, 30, 40]}))  # Running session to compute Y. feed_dict is used to feed X data. O/p: [ 25.  52.  79. 106.]
sess.close()
In both the examples above, we see a lot of warnings related to many of these names being deprecated. This is because TensorFlow 2 (v2) is now released, so the earlier TensorFlow 1 (v1) names have been moved to tf.compat.v1.* (compatibility version v1) so as to not cause confusion. I see the warnings below; adding compat.v1 to the calls gets rid of them. UPDATE: on trying that, a lot of other things broke, so it's not worth fixing these warnings.
WARNING:tensorflow:From ./test_tf.py:14: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From ./test_tf.py:25: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From ./test_tf.py:28: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
Running Sessions: We ran session above using one of the ways to run sessions. There are actually 2 ways to run sessions:
Method 1: This is the method we used above
sess = tf.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session
Method 2: This is the method that is more concise (requires less lines of code)
with tf.Session() as sess:
# run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
# This takes care of closing the session for you :)
TensorFlow Functions: We'll look at a few important functions used in tf. Many functions take i/p as Tensors or numpy ndarrays with no issues.
tf.one_hot() => This returns a one-hot tensor (a tensor being an array or list). One-hot encoding is used very widely in AI for multi-class classification. Here we have an o/p vector Y which has a classification for each i/p vector. As an example, consider a picture which can be cat, dog, mouse, or others. So, given a picture, it can be any of these 4 classes. We give these classes numbers as cat=0, dog=1, mouse=2, others=3. In AI, we write the o/p vector for 6 different pictures as Y=[1 3 0 2 0 2] => This implies the 1st picture is a dog (class=1), the 2nd picture is others (class=3), and so on. However, we can't use this Y vector directly in NN equations; we need to write it in a form which says whether each picture is cat/not-cat, dog/not-dog, mouse/not-mouse, others/not-others. This is the same form as what we wrote for 2-class classification, which said whether a picture is cat or not-cat.
To write it in the above form, we need to have 4 cols for each picture, each of which says whether it's cat/not-cat, dog/not-dog, mouse/not-mouse, others/not-others.
So, Y(1-hot) =
[ [ 0 1 0 0 ] => 1st row is for picture 1, says that picture is not-cat, is dog, is not-mouse and is not-others (implying it's a dog picture, but written in 1 hot form. It's 1 for dog, and 0 for others)
[ 0 0 0 1 ]
[ 1 0 0 0 ]
[ 0 0 1 0 ]
[ 1 0 0 0 ]
[ 0 0 1 0 ] ] => 6th row is for picture 6, says that picture is not-cat, is not-dog, is mouse and is not-others (implying it's a mouse picture, but written in 1 hot form. It's 1 for mouse, and 0 for others)
syntax: tf.one_hot(indices, depth, on_value=None, off_value=None, axis=None, dtype=None, name=None) => Of all the args, the important ones are indices and depth. indices is a tensor (or an array or list); the locations represented by the values in indices take value on_value (default=1), while all other locations take value off_value (default=0). depth is a scalar (i.e. a single number) that defines the depth of the one-hot dimension.
If indices is a scalar, the output shape will be a vector of length depth.
If the input indices is of rank N, the output will have rank N+1. The new axis is created at dimension axis (default is -1, which means the new axis is appended at the end). If indices is 1D (i.e. rank=1), then for axis=-1 the shape of the output is (length_of_indices x depth), while for axis=0 the shape of the output is (depth x length_of_indices).
ex:
indices = np.array([1,2,3,0,2,1]) #Here indices is a 1D array with rank=1
depth=4 #depth is needed since we don't know how many total classes we have for classification. indices may not contain all the classes.
one_hot = tf.one_hot(indices, depth) #one_hot is a 2D tensor of shape (indices, depth) => since axis is not specified, it's set to default of -1.
print("one_hot = \n", one_hot) => This prints the Tensor w/o computing it. So, it prints => Tensor("one_hot:0", shape=(6, 4), dtype=float32). We need to run session in order to compute the graph.
sess = tf.Session() #create session
one_hot = sess.run(tf.one_hot(indices, depth))
sess.close() #session can be closed once computation graph has run.
print ("one_hot = \n", one_hot) => This prints the one_hot tensor vector as below. one_hot is a 2D tensor, with shape (6,4)
one_hot =
[[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]
[1. 0. 0. 0.]
[0. 0. 1. 0.]
[0. 1. 0. 0.]]
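To see the axis argument in action, a small self-contained sketch (same indices and depth as above, but with axis=0):
import numpy as np
import tensorflow as tf

indices = np.array([1, 2, 3, 0, 2, 1])
one_hot_t = tf.one_hot(indices, 4, axis=0)   # new axis created at dimension 0
sess = tf.Session()
print(sess.run(one_hot_t).shape)             # (4, 6) => (depth x length_of_indices) instead of (6, 4)
sess.close()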
tf.zeros() / tf.ones() => These functions initialize a tensor to zeros or ones. They take a shape and return an array of that shape full of zeros or ones respectively. Here shape is a list of integers, a tuple of integers, or a 1-D Tensor. These are the same as numpy's np.zeros / np.ones, except that in numpy the shape is given as a tuple (a,b), while in tf the shape can also be given as a list or 1D tensor [a,b].
ex: tf.zeros((1,2)) => Here we create 2D tensor with shape=(1,2). NOTE: we specified shape in tf.zeros as (1,2) which is same syntax as numpy. However, we can specify shape as array too, i.e tf.zeros([1,2]). This is what you will see used more commonly.
ex: tf.ones([2, 3], tf.int32) => This returns 2D tensor of shape(2,3) = [[1, 1, 1], [1, 1, 1]]
ex: tf.zeros([3]) => This returns 1D tensor of shape (3,) = [0. 0. 0.]
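A quick runnable check of both (TF 1 style):
import tensorflow as tf

z = tf.zeros([3])               # 1D tensor of shape (3,)
o = tf.ones([2, 3], tf.int32)   # 2D tensor of shape (2,3)
sess = tf.Session()
print(sess.run(z))  # [0. 0. 0.]
print(sess.run(o))  # [[1 1 1] [1 1 1]]
sess.close()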
tf.convert_to_tensor() => This converts Python objects of various types into Tensor objects. It accepts Tensor objects, numpy arrays, Python lists, and Python scalars. A "tf.Variable", which is not a Tensor object, is converted into Tensor type by using this func.
syntax: tf.convert_to_tensor(value, dtype=None, dtype_hint=None, name=None) => This converts "value" into a tensor. dtype is the element type for the returned tensor. If missing, the type is inferred from the type of value.
ex: W1 = tf.convert_to_tensor([[1.0, 2.0], [3.0, 4.0]]) => This converts the array into a Tensor object.
print("W1=", W1) => prints W1= Tensor("Const_4:0", shape=(2, 2), dtype=float32) => this shows that it's a Tensor now, with type float32 inferred from the i/p type.
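A short sketch converting a tf.Variable (which is not itself a Tensor) into a Tensor:
import tensorflow as tf

my_var = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
t = tf.convert_to_tensor(my_var)   # now a tf.Tensor backed by the variable's value
print(t)                           # Tensor(..., shape=(2, 2), dtype=float32)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(t))                 # [[1. 2.] [3. 4.]]
sess.close()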
tf.train.GradientDescentOptimizer(learning_rate = 0.005).minimize(cost) => tf.train.GradientDescentOptimizer is an Optimizer that implements the gradient descent algorithm, using the specified learning rate. It has a method "minimize" which adds operations to minimize the loss by updating var_list. The minimize() method simply combines calls to compute_gradients() and apply_gradients(). This whole function with its method is called in TensorFlow to do back propagation and the parameter update for 1 iteration on the "loss" equation. We iterate over it multiple times to get the optimal "weights" that give the lowest loss.
syntax of minimize: minimize(loss, var_list=None) => loss is a Tensor containing the value to minimize. var_list is an optional list or tuple of "tf.Variable" objects to update to minimize loss. It defaults to the list of variables collected in the graph under the key GraphKeys.TRAINABLE_VARIABLES.
For Adam optimizer, we can use AdamOptimizer.
ex: optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
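A minimal end-to-end sketch (a made-up one-parameter cost, not the Coursera assignment) showing how the optimizer is run inside a session, one iteration per sess.run call:
import tensorflow as tf

w = tf.Variable(0.0, dtype=tf.float32)       # the parameter to learn
cost = w**2 - 10.0*w + 25.0                  # cost = (w-5)^2, minimum at w=5
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.05).minimize(cost)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(500):
    sess.run(optimizer)                      # each run does one backprop + parameter update step
print(sess.run(w))                           # close to 5.0, the value of w that minimizes the cost
sess.close()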
tf.nn.softmax_cross_entropy_with_logits => This function computes softmax cross entropy between logits and labels. Logits are the o/p of the last NN layer, before it feeds into the exponential function. So, Z[L] is the logit. It's a matrix of shape (c,m), where c=num of classes and m=num of examples. It feeds into the sigmoid function for a binary classifier to yield a[L]. For a multi-class classifier, it feeds into the exponent (softmax) function to yield a[L], which is a matrix of the same shape as Z[L]. Labels is a matrix of the same shape as Z[L]. Softmax cross entropy is the loss function defined in the AI section, i.e. Loss(Y, Yhat) = - ∑ Yj * loge(Yhat(j)), where Yhat = a[L] and Y is the output labels vector. Backpropagation will happen into both logits and labels.
syntax: tf.nn.softmax_cross_entropy_with_logits(labels, logits, axis=-1, name=None) => It returns a Tensor that contains the softmax cross entropy loss. Its type is the same as logits and its shape is the same as labels, except that it does not have the last dimension of labels. So, the loss returned is a vector where each entry is for one example.
ex: Here logits and labels are (2,3) matrices. As per the syntax, logits and labels are transposed relative to our Z[L] convention, so the shape of logits and labels feeding into this function is (m,c). So, below we have data for 2 examples and 3 classes. Labels don't need to be one-hot; they can be probability values that add up to 1.
logits = [[4.0, 2.0, 1.0], [0.0, 5.0, 1.0]] => Here Z[L] for the 1st example is [4,2,1], while for the 2nd example it's [0,5,1]
labels = [[1.0, 0.0, 0.0], [0.0, 0.8, 0.2]] => Here the probabilities of the 3 classes for the 1st example are 1, 0, 0, while for the 2nd example they are 0, 0.8, 0.2.
print(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)) => prints Tensor("softmax_cross_entropy_with_logits_sg/Reshape_2:0", shape=(2,), dtype=float32) => o/p shape is a 1D vector with 2 entries, 1 for each example
print("y=", sess.run(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))) => prints y= [0.16984604 0.82474494] => This is the loss value computed for the 2 examples.
This is how it's computed. NOTE: log here is base e (i.e. it's ln, NOT log base 10):
Loss for 1st example = -( 1.0*ln(e^4/(e^4+e^2+e^1)) + 0.0*ln(e^2/(e^4+e^2+e^1)) + 0.0*ln(e^1/(e^4+e^2+e^1)) ) = -( 1*ln(54.6/(54.6+7.4+2.7)) + 0 + 0 ) = -ln(54.6/64.7) = -(-0.17) = 0.17
Loss for 2nd example = -( 0.0*ln(e^0/(e^0+e^5+e^1)) + 0.8*ln(e^5/(e^0+e^5+e^1)) + 0.2*ln(e^1/(e^0+e^5+e^1)) ) = -( 0 + 0.8*ln(148.4/(1+148.4+2.7)) + 0.2*ln(2.7/(1+148.4+2.7)) )
= -( 0.8*ln(148.4/152.1) + 0.2*ln(2.7/152.1) ) = -( 0.8*(-0.025) + 0.2*(-4.03) ) = 0.826
This computation matches closely with what's computed by the softmax function.
tf.reduce_mean: This computes the mean across all entries of an array (same as numpy's np.mean). It is used in conjunction with the above softmax function to calculate the final cost. The final cost is the mean of all the costs (i.e. the sum of all the costs for "m" examples, divided by m).
ex: y = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)) => running this via sess.run(y) returns y= 0.4972955, which is the mean of the entries in the array returned above = (0.17+0.82)/2 = 0.49
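Putting the two together, a runnable sketch of the typical cost computation (using the same logits/labels as above):
import tensorflow as tf

logits = [[4.0, 2.0, 1.0], [0.0, 5.0, 1.0]]
labels = [[1.0, 0.0, 0.0], [0.0, 0.8, 0.2]]
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
sess = tf.Session()
print(sess.run(cost))  # ~0.497, the mean of the per-example losses [0.1698, 0.8247]
sess.close()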
--------