VNC: Virtual Network Computing

VNC is a graphical desktop sharing system used to remotely control another PC. It is similar to other remote-control software such as Chrome Remote Desktop, TeamViewer, etc. VNC is very popular among enterprises, and the original code is open source under the GNU GPL. VNC was originally developed in the UK. Many other commercial and open source products have been developed from the original VNC source code. In 2002, the VNC R&D center was closed. Its developers formed RealVNC, which developed an open source as well as a commercial product under the same name. Most of the time when people say VNC, they mean RealVNC.

Intro material on Wiki: https://en.wikipedia.org/wiki/Virtual_Network_Computing

 

VNC server/client model:

VNC software has 2 parts: a server software, and a client software. You install the server software on the desktop which you want to control. You install the client software on the desktop from which you want to control it. The client software knows how to connect to the server software, and displays the desktop screen of the remote machine which is running the VNC server.

Installation:

Install RealVNC from here: You will need to install both RealVNC server and RealVNC client. They will need to be installed on different computers. Server on the computer to be controlled, and client on the computer that controls the server computer. Choose appropriate OS and then download it.

RealVNC server: https://www.realvnc.com/en/connect/download/vnc/

RealVNC client: https://www.realvnc.com/en/connect/download/viewer/

Running VNC:

Once installed, you can start the VNC server on the server desktop by clicking on the RealVNC icon or typing "vncserver" in a terminal. Once set up, the VNC server starts automatically on powerup. When vncserver is running, it shows the IP address of the computer on which it is running. The message looks something like this:

$vncserver

......

New desktop is raspberrypi (192.168.1.109)

....

Now on the client machine (where you have the VNC client software installed), you enter this address in the address section (here, it's 192.168.1.109). Once entered, it brings up an icon; click on it, and you can see the remote desktop screen (where VNC server is running). Now if you work on this screen, it seems as if you are directly working on the remote desktop. The screen refreshes amazingly fast: keystrokes from client to server, as well as pixels from server to client, are transmitted quickly, especially if both client and server are on a high speed network.

vncserver command on Linux terminal can be used with a lot of options to set the display options. One helpful option is:

vncserver -geometry 2560x1024 -depth 24

NOTE: when we say display, we mean the physical screen that is on the monitor of the remote desktop. However, the pixels of that display are stored in memory, and the monitor is just displaying whatever is stored in that memory. So, we can have another display which has pixels stored in memory only and doesn't go to any monitor. This is called a "virtual display". VNC allows these virtual displays to be created on the server machine, and then be accessed using VNC viewer. Thus we can have 10's of displays on a single server desktop, where one of them is the real display connected to a physical monitor, while all the others are virtual displays.

Headless servers, i.e servers which don't have any monitor connected, don't start the GUI program for the display. In such cases, VNC server has nothing to display since it shows the physical display by default. So, in turn, the VNC server program doesn't start at startup. In such cases, we start VNC server by logging into the server machine via ssh. Then VNC server creates a virtual display, and this virtual display can be seen via a VNC client.

On the top center of the VNC session, we have a way to kill VNC or set many options. Look thru them, if you need to set anything else. For ex, if you have 2 monitors, and it's not working, try setting "UseAllMonitors to true" over there.

NOTE: we have .vnc dir in the home dir. Inside this dir, is a config file, which controls how the vnc desktop should look. To get full screen extended on desktop,  we add these lines to config file, so that we don't have to type it every time on cmd line:
-geometry 2560x1024
-depth 24

Guest Access: In VNC, allowing guest access to others is easy. Steps:

  • Run "vncconfig &" on cmd line. This cmd has to be run on lindesk terminal and not on lsf terminal
  • On pop up box, click commands->options. On new pop up box, choose advanced.
  • Change guest access to "Interactive" and click apply.
  • On main pop-up box, if we click on options, we should see a "tick mark" on Guest Login. If not, tick that by clicking.
  • Now, anyone can connect using login "guest" and no password.
  • When user requests access, a new box appears on bottom. click on "accept" to allow guest access to your vnc m/c.

 

Putty:

 If we are on a windows machine, and don't have a terminal to connect from, we can use a program called PuTTY, which supports several protocols such as SSH, Telnet, etc. It has a GUI, and is a lot easier to use.

First download PuTTY. Then use PuTTY to SSH to the above machine. That brings up a terminal on the remote machine, in which you work in the usual way. When done, log out and close the PuTTY window.


-----------------------------------------------------

 

ALL JUNK BELOW. NEED TO MOVE ELSEWHERE FIXME ???

LSF:


Any jobs can now be run only in lsf machine. So open xterm on lsf
Open Xterm on an LSF machine: bsub -Is -R "linux&&bit64" "xterm" & => OBSOLETE
Open Konsole on an LSF machine: bsub -Is -R "linux&&bit64" "konsole" & => OBSOLETE
Open Konsole on an LSF machine on RHEL6 OS: bsub -Is -R "select[ws60]" "konsole" & => use ws40 for RHEL4 (ws60 is latest). ws60 provides latest AME tools.
Open Konsole on an LSF machine on SUSE11 OS: bsub -Is -R "select[sles11]" "konsole" & => this was needed to get latest AME tools, but not anymore. SUSE not used anymore
Run <tool_name> -ame on both OS to see which gives you latest tools. Some newer versions may be available on 1 OS and not on the other.

OS for Artisan:
Artisan 5.2.1 and earlier will only run on the legacy SuSE11 OS.
Artisan 5.3 will run on both SuSE11 and RHEL6.
The upcoming Artisan 5.4 will only run on RHEL6.

Run icfb on suse m/c: bsub -R "select[sles11]" -Is icfb -artisan-2.91p1 &

NOTE: to get around check and save issues, run icfb on suse m/c: icfb -artisan-5.2.1


For LSF jobs submitted, if we want to know what OS job got submitted on, look in the log file (i.e irun.log) to find name of lsf m/c. Then run:
ex: /home/kagrawal/ > lshosts machine1.com => last 2 RHS entries show OS

HOST_NAME      type    model  cpuf ncpus maxmem maxswp server RESOURCES
dlewz2732.d LIN_X64 p4x_3400 417.0    12 262047M 262145M    Yes
(bit64 cs dc X64 linux srvClass01 maxmem32G linux26 maxmem64G p4x maxmem128G warm maxmem256G sles suse sles11p2 !sles11) => OS is sles11.2
#Preventing jobs from getting killed in lsf:
jobexclude --add <jobid> => to add a job
jobexclude --list => to list all added jobs


#snapshot
In any dir, there is a .snapshot dir, within which there are dirs named by timestamp. Just cd into the appropriate dir, and cp the stuff that is to be retrieved.

dssc cmds:
---------
dssc -help => lists all options
dssc <cmd_name> -help => lists syntax of a specific cmd

checkin:
checkin for 1st time: dssc ci -new <file/dir> -com "comments_here" -rec => -rec needed for recursive
checkin a file after editing with lock: dssc ci <filename>
dssc ls -report status -rec => shows revs of all checked out files/dir (current rev vs .)
dssc diff "file1.v" "file1.v;Latest" => diffs b/w current modified checked out file against one in repository.
dssc retire -force => to completely remove it from database.
dssc ci -new <file/dir> -skip => use ci with -skip after retiring a file in db. Else, old retired file will be checked in.

checkout:
checkout all files recursively in read mode: dssc pop -rec => does it starting from current dir
checkout a file in readmode: dssc co <filename>
checkout a file in editmode with lock: dssc co -lock <filename>
checkin a file after editing with lock: dssc ci <filename>

dssc cancel:
dssc cancel -force <filename> => this is to cancel checked out file (even with edits), and to repop with original version.
checkout a file in editmode without lock: dssc co -get <filename> => This gives unlocked copy which can be modified. Do a chmod to 664 or 755.
Then to populate the original file (and discard the current modified file), do:
dssc pop -force <filename>
dssc unlock <filename> => To unlock files (i.e remove lock). This can be useful when files can't be checked out or something else gone bad.

C++ programming lang:

C++ is largely backward compatible with C, that means you can use almost all your C code in your C++ pgm, and pgm will still compile fine (a few things differ, e.g. C++ needs an explicit cast on the result of malloc). So, you already can write C++ pgm, if you know C. In fact most C pgms that you have can be renamed as C++. However, a lot of object oriented features were added in C++, which programmers take advantage of, by modifying existing C code. So, C++ allows you to take incremental steps from a C pgm. All C functions like printf, malloc, etc are available in C++ too.


http://www.learncpp.com/

sample pgm: hello.cpp (C++ files can also be named as .cxx, it's an old style extension that's still used)
----
#include <iostream> //needed for std io func, newer system header files do not have .h extension
#include "myfile.h" //for user defined header files, use " "

void printA(int x, int y) //each func needs to be defined separately
{
    std::cout << "A" << x << y << std::endl; //std:: prefix says that cout is in std namespace
}
 //If func printA is defined after main(), then we need to do forward declaration
void printA(int x, int y); //using func prototype for forward declaration. also used if printA is defined in separate file by itself

// Definition of main()
int main()
{
  int a;
  std::cout << "Enter number " ; //cout to print out on screen. << indicates RHS is transferred to LHS. string is transferred to cout
  std::cin >> a; //cin to take i/p from keyboard. >> indicates LHS is transferred to RHS. cin transfers val to a
  std::cout << "Hello World num is " << a << std::endl; //endl to put newline. multiple prints can be in same stmt as long as separated by <<
    std::cout << "Starting main()" << std::endl;
    printA(1,a);
    std::cout << "Ending main()" << std::endl;
    return 0; //any non-zero value returned signals an error
}


compile C++:

g++ ~/hello.cpp
execute: ./a.out

Enter number 1
Hello World num is 1
Starting main()
A11
Ending main()

------------------------

Keywords : C++ reserves a set of 84 words (as of C++14) for its own use.
ex: char, int, for, case, while, struct, void, ...

identifiers: The name of a variable, function, type, or other kind of object in C++ is called an identifier. The identifier can only be composed of letters (lower or upper case), digits, and the underscore character, and cannot start with a digit. Identifiers are case sensitive.

Literals: A literal is a fixed value that has been inserted (hardcoded) directly into the source code, such as 5, or 3.14159. Literals always evaluate to themselves.

operands: Literals, variables, and function calls that return values are all known as operands.

operators: Operators tell the expression how to combine one or more operands to produce a new result.

data types:
boolean: bool (true or false). true is stored as int 1, false as int 0
character: char, char16_t (C++11 onward), char32_t (C++11 onward). char16_t and char32_t store a char in 16 or 32 bits as a UTF-16 or UTF-32 Unicode char. ex: 'c'. char is stored as a 1 byte int (usually signed). ASCII char codes are b/w 0 and 127, so they fit in 8 bits. Codes 0 to 31 are control chars, some of which are written with escape sequences, e.g. \n=newline, \t=tab. char code 27 is the ESC (escape) char; the backslash \ used in escape sequences is code 92.
floating point: float, double, long double. signed by default
integer: short, int, long, long long (C++11 only). signed by default (avoid unsigned)
void: for functions that do not take any param or return a value. In C++, empty param are allowed
 ex: int Value(void) { ...} same as int Value() {...}
sizeof(char) => returns size of char in bytes (always 1). C++ guarantees a min size for each integer data type; actual size may be bigger. int is min 2 bytes, while float is typically 4 bytes. Fixed size int types were added in C++11 in header <cstdint>, inside std namespace, i.e int8_t, uint8_t, int16_t, ...

3 ways to init a var
A. int nval = 5; bool b1=true; //copy initialization
B. int nval(5); //direct initialization
C. int nval{5}; //uniform (brace) initialization, works for all data types, but only from C++11 onward. Note: curly braces instead of parentheses. recommended to use this style.
 int value{}; // value initialization: value is set to 0

var assgn (not init):
int nValue;
nValue = 5; // copy assignment (no way to do direct or uniform assignment)
const double gravity { 9.8 }; => assigns const val. can't be changed
const int maxNameLength { 30 }; => assigned const 30
-----------

preprocessor:
ex: #define NUM 7
ex: (each preprocessor directive must be on its own line)
#ifdef PRINT_J
std::cout << "joe";
#endif
ex: header guards
#ifndef SQUARE_H
#define SQUARE_H
....
#endif

---------
namespace:
sample pgm:
ex: constants.h
namespace constants
{
    const double pi(3.14159);
    const double avogadro(6.0221413e23);
    // ... other related constants
}
ex: myfile.cpp
#include "constants.h"
double circumference = 2 * radius * constants::pi;

-------------

C programming lang: Most popular language of last 50 years. Must learn language which can be used to write any simple or complex pgm


C library ref guide: http://www.acm.uiuc.edu/webmonkeys/book/c_guide/index.html

C language has its own syntax, which may not seem very easy for a person who is used to more modern languages like python. We'll start with a very basic "Hello World" pgm which prints "Hello, world!" on screen. This pgm is explained under "gcc" section, but I'm repeating it here. Name the pgm hello.c

#include <stdio.h> //include files: explained under gcc article

int main (void) { //main function explained below

  printf ("Hello, world!\n");

  return 0;

}

 
main function: Every C pgm needs a main function. This is where the pgm starts executing from line by line.

arguments to main() function are optional. argc and argv variables store the parameters passed in cmd line of the shell when running the pgm. On POSIX compliant systems, char *envp[] contains vector of program's environment var (i.e SHELL environment var that are passed to the program).

argc=num of arguments in cmd line including cmd name,

argv[]= vector storing arguments in cmd line, argv[0]=cmd_name (1st argument), argv[1]=2nd argument (1st option), and so on

envp[] = vector storing ENVIRONMENT variables

argc, argv[] and envp[] can be any names, i.e main(int cnt, char * arr_option[], char *arr_env[]); is equally valid, though using argc, argv and envp is standard

ex: a.out 12 myname 17cd => argc=4 (4 arguments on cmd line), argv[] = {"a.out", "12", "myname", "17cd"}; Note that argv is an array of strings (char *), so even numbers are stored as strings. The string "12" can be converted to a number using atoi() (returns int) or atol() (returns long). i.e y=atoi(argv[1]); => This assigns integer 12 to y. Plain casting using (int) will not work, as (int)"12" converts the addr of the string (a pointer) to an integer, not the digits it contains.
 
int main (int argc, char *argv[]) { // int before main refers to the return value of main. In early days of C, there was no int before main as this was implied. Since C99, omitting it is an error (even if main doesn't return any value).

int main (void) { => no arguments to main
...
exit (1); //return value from main (exit needs stdlib.h). Can also do: return 1;
}

Compile C pgm: We can't run C pgm directly by typing hello.c on terminal. We need to compile it first, which generates an executable "a.out", and then run "a.out" on terminal. To compile above pgm, we use gcc compiler. Type below cmd on terminal. Look in "gcc" section for details.

gcc hello.c => generates a new file a.out. Now type ./a.out on terminal, and it runs the pgm

Syntax of C pgm:

1. comments start with // for single line comment, For multi line comment, use /* comment */

2. Each stmt ends with semicolon ;. We don't need semicolons for blocks

3. main() function is needed. We can define other functions as needed.

4. All variables need to be defined before using them. We define the var and specify their type. Types are explained below.

5. For loops, if-else, etc are supported. Many reserved keywords are defined for these, and they can't be used as variable names.

Std Input/Output functions:

printf and scanf are 2 std functions that are going to be used the most. They are inbuilt library functions in C programming language which are available in C library by default. These functions are declared and related macros are defined in “stdio.h” which is a header file in C language. That's why we have to include “stdio.h” file.

1. printf: printf() function is used to print the (“character, string, float, integer, octal and hexadecimal values”) onto the output screen.

ex:

int var1=5; // var1 used in printf below. We define it to be of integer type and assign it a value of 5.

printf ("Value is = %d \n",var1); // %d specifies that var1 is int type, and should be printed in decimal format. var1 needs to be defined before it can be used. var1 is put outside " .. ", while %d is inside " ...". \n is used to print a newline.

2. scanf: scanf() function is used to read character, string, numeric data from keyboard.

ex:

char ch; //var ch is defined of type "character", and is used to store the i/p entered by the user.

printf("Enter any character \n"); //This printf is same as before. We can't print anything from within scanf. It can only take input. So, we usually precede scanf with printf.

scanf("%c", &ch); //Same as in printf, we have to specify the type of i/p. Here %c says it's character type, and whatever character user enters on prompt is stored in var "ch". Here var "ch" has to be defined as a "char type" before using it. We use an & with ch. That is needed, and will be explained in pointer section below. No & is needed for var in printf function.



Type:  In C pgm, we have to define data type of all the variables, else C compiler will error out. There are 2 categories of data types for any var: primary data types and derived data types.
-----

I. Primary data type: These are fundamental data types in C namely integer(int), floating point(float), character(char) and void.
arithmetic type: 4 basic types: char, int, float, double and 4 optional type specifiers (signed, unsigned, short, long). void is a data type for nothing.

1. char: 8 bits.
 char, signed char, unsigned char => signed char stored as signed byte. If we use "char" to store character then ascii code for that character is stored. In that case signed or unsigned doesn't matter. However, if we use char to store 8 bit number, then signed unsigned matters, as it represents 8 bit signed or unsigned integer. As "int" stores integers in 16 bit or larger, the only way to store 8 bit integers is by using signed/unsigned char. signed char is from -128 to +127, while unsigned char is from 0 to +255.
 Normal char (no signed/unsigned) are represented in ASCII codes. See in ASCII part of "bash shell scripting language" section.

2. int: 16 bits to 64 bits.
 A. short integer type. atleast 16 bits in size => short, short int, signed short, signed short int, unsigned short, unsigned short int.
 B. basic integer type. atleast 16 bits in size => int, signed, signed int, unsigned, unsigned int. usually 32 bits.
 C. long integer type. atleast 32 bits in size => long, long int, signed long, signed long int, unsigned long, unsigned long int. usually 64 bits on 64 bit Linux, but 32 bits on embedded arm uP (and on 64 bit Windows).
 D. long long integer type. atleast 64 bits in size => long long, long long int, signed long long, signed long long int, unsigned long long, unsigned long long int. Usually 64 bits, including on embedded arm uP.

3. float: IEEE 754 single precision floating point format (32 bit: 1 sign bit, 8 exponent bit and 24 significand precision(23 explicitly stored since 1 bit is hidden)).

4. double: IEEE 754 double precision floating point format (64 bit: 1 sign bit, 11 exponent bit and 53 significand precision(52 explicitly stored since 1 bit is hidden)). "long double" is extended precision floating-point type. It can be 80 bit floating point format or some non-IEEE format.
 
NOTE:  The above types don't have particular size as part of C lang, as it's target processor dependent. So, their size is provided via macro constants in two headers: limits.h header defines macros for integer types and float.h header defines macros for floating-point types.
limits.h: (look in http://www.acm.uiuc.edu/webmonkeys/book/c_guide/2.5.html)
-------
In cortex M0, char size is defined in /apps/arm/rvds/4.0-821/RVCT/Data/4.0/400/include/unix/limits.h as follows:
#define CHAR_BIT 8 => Number of bits in a byte. The actual values depend on the implementation, but user should not define these to be lower than what's put here.
#define CHAR_MIN 0 => minimum value for an object of type char
#define CHAR_MAX 255 => maximum value for an object of type char

similarly signed char is defined as -128 to +127 (8 bits), unsigned char is 0 to 255, signed short int or int is -32768 to +32767 (16 bits), signed long int is -2147483648 to +2147483647(32 bits).

float.h: (look in http://www.acm.uiuc.edu/webmonkeys/book/c_guide/2.4.html)
--------
for float(32 bits), double(64 bits), long double(80 bits).
ex: In cortex M0 float.h, we have:
#define FLT_MAX  3.40282347e+38F

Compiler uses these files to determine size of these primary data types. So, the same pgm may compile differently on different m/c if these files have diff values.

II. Derived data types: Derived data types are nothing but primary datatypes grouped together like array, structure, union and pointer.

Boolean type: _Bool (true/false). It's an unsigned byte. False has value 0 while true has value 1. It can be seen as derived from int.

In cortex M0 stdbool.h, bool is defined as _Bool, true as 1 and false as 0. _Bool was added in the C99 std, and stdbool.h provides the friendlier names; pre C99 compilers have neither. C++ has a built-in bool type instead of _Bool.
#define bool _Bool
#define true 1
#define false 0

pointer type:

Pointer type var are a new class of var that store addr. Addr could have been stored as type int, but a special "pointer" type was declared to store the addr. For every type T (i.e char, int, etc) there exists a type pointer to T. Variables can be declared as being pointers to values of various types, by means of the * type declarator.
eg:
char v; declare var v as of type char. To get addr location of any variable, use &. So, addr of v is &v
char *pv; (can also be written as char* pv; or char * pv;)=> declares a variable "pv" which is a pointer to value of type char. pv doesn't store a char, but rather an addr. That addr stores a char var. *pv refers to contents stored at memory pointed to by this addr (the addr stored in var pv). &pv is the addr of this var pv, just like &v is the addr of var v. Since var v stores a char, it is just 1 byte in length, while var pv stores an addr, which is 32 bits on a 32 bit system. We can initialize pv to any addr, including special value "NULL" (pv=NULL;):
char *pv=0x1000; => assigns pv to addr value 0x1000. This is similar to char *pv;pv=0x1000; However, this will error out with this msg: "a value of type "int" cannot be used to initialize an entity of type char *". Reason is that although C language allows any integer type to be converted to a pointer type, such conversions cannot be done implicitly; an explicit cast is required to do this non-portable conversion. So, we have to do: char *pv = (char *)0x1000; => This casts 0x1000 to pointer type pointing to a char var, which is exactly what pv is, so both sides become same type.

char v='a';
char *pv=&v; => assigns pv to addr loc of variable v. If we print various values of v and pv, this is what we might get:

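printf ("v=%c, &v=%p, pv=%p, &pv=%p, *pv=%c",v,(void *)&v,(void *)pv,(void *)&pv,*pv); => v and *pv both store char, so we use %c, while pv, &v and &pv store addr, so we use %p (with a cast to void *), which is the portable way to print an addr

v=a, &v=f98ca5cf, pv=f98ca5cf, &pv=f98ca5c0, *pv=a => addr values differ from run to run (only the low 32 bits are shown here); &v and pv always match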


pv = &v; => assigns addr loc of v to pv. If pv wasn't declared as a pointer, we could not do this.
pv = (char *)0x1000; => pv is assigned addr value of 1000. The addr 0x1000 stores a char type.
*pv = 0; => store 0 in contents stored at addr pointed by pv.

initializing ptr var: We can initialize ptr var to any addr val. However we can also initialize the contents of ptr var via this:

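char *p="Name"; => Here p is created as a ptr that holds the addr of the string literal "Name". So, *p='N', *(p+1)='a', and so on to *(p+4)='\0' (the terminating null char, not a newline). *(p+5) is beyond the end of the string, and reading it is undefined behavior. NOTE: a string literal must not be modified via p; declaring it as const char *p lets the compiler catch such attempts.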


Below 2 assignments are used extensively for rd/wrt of reg mem locations:
1. Rd from given mem loc: 0x1000 or "#define addr 0x1000"
data = *((char *)0x1000); => grab contents of addr loc 0x1000 and assign it to data.
2. Wrt to given mem loc: wrt 1 or "0x01" (since char, so 1 byte only, so 0x01. If short int, use "0x0001")
*((char *) 0x1000) = 1; => This stores 0x01 at addr 0x1000 (Note: it is not char '1' as char '1' is ascii value 0x31). Here we cast 0x1000 to type "pointer of type char" and then "*" in front of it refers to contents stored at addr 0x1000. So, we store "1" at that addr. We can omit extra brackets and do: *(char *) 0x1000 = 1; NOTE: if we try to do: *0x1000=1; then we get this error "operand of "*" must be a pointer" as 0x1000 is an int and not a pointer, so explicit conversion is required.

double pointer: When the pointer stores addr of other pointer (instead of storing addr of char, int, etc), it's pointer to a pointer or double pointer

ex:

int **dpv; => here dpv is double ptr to type int. dpv stores addr of pointer pv for ex, where pv pointer points to data of type int

int *pv; => pv is the ptr pointing to data type int.

int v; => regular var declared as int

pv = &v; => single ptr pv is assigned addr of var "v".

dpv = &pv; => double ptr dpv is assigned addr of ptr pv. If dpv wasn't declared a double ptr, then &pv couldn't be assigned to dpv. If dpv was declared a single ptr (i.e int *dpv), then dpv could store addr of regular var "v" and not addr of single ptr pv.

use of double ptr: We use it as an arg of functions, when we want to change a pointer var which is outside the function (i.e change where it points) from inside the function.

1. We use it in arg of main func:

ex: int main(int argc, char **argv) => here we could use char **argv or char *argv[]. It represents an array of char sequences.

2. We use it  in arg of regular func:

 

Array: An array is defined as finite ordered collection of homogenous data, stored in contiguous memory locations.

Array types are introduced along with pointers because in C the two are closely related: in most contexts, an array name "decays" to a pointer to the first element of the array. In simple words, array names are converted to pointers. That's the reason why you can use a pointer to manipulate elements of the array.

ex: char p[10] = "Name"; => This declares an array of 10 char type var = p[0] to p[9]. It is initialized to value "Name", so p[0]='N', p[1]='a',..p[4]='\0', and p[5] to p[9] are also set to '\0' (leftover elements of a partly initialized array are zeroed). p[0] to p[9] are stored in contiguous mem locations, and *p or p[0] refers to first element of array, *(p+1) or p[1] to second element of array and so on. The way compiler translates array is that it uses "p" as ref addr, and then uses it to figure out p[0] = *p, p[1] = *(p+1) and so on. This is similar to char *p="Name"; except that here the chars live in the array itself and can be modified.

ex: printf("ptr: p=%p &p=%p &p[0]=%p *p=%c &p[1]=%p *(p+1)=%c\n",(void *)p,(void *)&p,(void *)&p[0],*p,(void *)&p[1],*(p+1));

o/p (from one run; addr values vary, and only the low 32 bits are shown here) => ptr: p=23fccc60 &p=23fccc60 &p[0]=23fccc60 *p=N &p[1]=23fccc61 *(p+1)=a //as can be seen, p, &p and &p[0] all give the addr of start of array p (though &p has a different type: pointer to the whole array). p is used as a ptr to start of array p[0] to p[9].

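There are some special cases when an array doesn't decay into a pointer: when it is the operand of sizeof or of unary &, and when a string literal is used to initialize a char array. ex: char str[] = "Name"; Here the literal initializes the array itself; the size is not explicitly specified, so the compiler sets it to 5 (4 chars plus the terminating '\0'), and sizeof(str) is 5, not the size of a pointer.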

2D array: 2D arrays are closely related to double pointers, though the two are not the same thing. To see it, consider 1D array, where first element of array is referred via ptr *p. ptr p stores the addr of p[0]. Now consider 2D array. So, p[0][0] to p[0][n] is the first row of 2D array, p[1][0] to p[1][n] is the second row of 2D array and so on. So, to store each row of array, we can have a ptr *p for 1st row, ptr *q for next row and so on. Or we can have an array of ptr *p[] that will store 1st row, 2nd row and so on. So, ptr p[0] points to 1st element of 1st row, ptr p[1] points to 1st element of 2nd row and so on. So, it becomes 1D array of pointers, declared as char *p[]. But any 1D array can be represented by a pointer to its first element, and here the elements p[0], p[1] etc are themselves pointers to char. So we specify it as a double ptr, char **p, or as an array of single pointers, char *p[]. NOTE: a true 2D array declared as char p[3][4] is one contiguous block of 12 chars with no row pointers stored anywhere; it decays to "pointer to a row" (char (*)[4]), not to char **, so it can't be passed where char ** is expected.

struct:

Struct type is a collection of different kinds of data which we want stored as a single entity. For ex a person's account may contain name, age, income, etc. We can obviously make 3 different arrays: char type for name, int type for age, float type for income, etc. But that would not be easy to manage. However, we can group all these items together in a structure named "account", and then access each of these. That way, we have to create an array of just this "account" type, and it's much easier to manage data.


struct pointer:
------
struct reg { int a; char ch;};
struct reg arr[] = {{1,'s'},{2,'f'}}; //defines arr[0],arr[1] with values of reg type
struct reg *my0 = &arr[0]; //my0 is ptr to type "reg" and is assigned to addr of arr[0] which is of type "reg" too.
struct reg *my1 = &arr[1];

function pointers: http://denniskubes.com/2013/03/22/basics-of-function-pointers-in-c/
-----------------
function names actually refer to addr of func. So, we can have pointers to functions by using func name as addr (no need to use & with func name for addr).
char *pv=&v; => as explained above, this assigns pv to addr loc of variable v. *pv refers to contents of pv.
but for func, we need to have parentheses around pointer. Also, instead of char, we need to have return val of func, and also need to provide all args of func (If no args, we can write "void" or leave it empty). i.e
void (*FuncPtr)() = HelloFunc; //Hellofunc is function defined with no i/p or o/p. OR
void (*FuncPtr)(void) = HelloFunc;
or:
void (*FuncPtr)(void); => define the ptr function
FuncPtr = HelloFunc; => assign ptr to addr of HelloFunc.

Now to reference any pointer value, we use *pv to read contents. (a=*pv reads contents of pv and stores in a). For function, we do the same, but we need to have *pv inside parentheses to indicate it's a func. Also, we need (), since () operator is needed to call a function (with args in it if any). i.e
(*FuncPtr)(); //This calls the func HelloFunc() => equiv to calling HelloFunc(); directly.
FuncPtr(); //This also works and is exactly equiv to above line. This is since a func name (label) is converted to a pointer to itself. This means that func names can be used where function pointers are required as input. &func and *func are same and refer to func name, which is a pointer to itself. So, instead of using &func or *func, we should use func or func() directly

ex: (with args)
int (*FuncCalcPtr)(int, int) = FuncCalc; //or *FuncCalc, &FuncCalc, **FuncCalc all are same. FuncCalc fn subtracts 2 int and returns an int.
int y = (*FuncCalcPtr)(10,2); //or FuncCalcPtr(10,2) is the same. returns 10-2=8, and stores it in y.

Func ptr can also be used as parameters to pass into another func. This is primary use of func ptr. ex:
int domath(int (*mathop)(int, int), int x, int y) {
  return (*mathop)(x, y);
}
in  main, do: int a = domath(add, 10, 2); //this calls domath func with ptr to "add" func.

typedef with pointers:
----
Above ex needed extra typing everytime fn ptr was defined. typedef can simplify this.
ex: Using typedef with pointers:
typedef int *intptr; => intptr is a new alias for the pointer type int*. "typedef int dist" creates "dist" as a synonym for int. similarly "intptr" is a synonym for "pointer pointing to int"
intptr ptr; => this defines a var "ptr" with type intptr (i.e int*). So, ptr is a pointer which can point to a memory with int type. We could have also written it as "int *ptr".

ex: Using typedef with fn pointers:
typedef int (*FuncCalcPtr_def)(int, int); => creates "FuncCalcPtr_def" as a synonym for a pointer to a fn of 2 int args that returns an int
FuncCalcPtr_def FuncCalcPtr; => this defines a var "FuncCalcPtr" with type "FuncCalcPtr_def", which is a ptr to fn. We could have written this as "int (*FuncCalcPtr)(int, int);"

--------------------
size type:  http://www.acm.uiuc.edu/webmonkeys/book/c_guide/2.11.html
----------
size_t and ptrdiff_t were defined as separate types, since they are mem related. existing arithmetic types were deemed insufficient, since their size is defined according to the target processor's arithmetic capabilities, not the memory capabilities, such as available address space. Both of these types are defined in the stddef.h header as typedefs of integer types chosen by the implementation.

size_t: used to represent the size of any object (including arrays) in the particular implementation. It is the result type of the sizeof operator and is an unsigned integer type (often unsigned int or unsigned long, depending on the implementation).
ptrdiff_t:  used to represent the difference between pointers.

extended integer data type: *_t.
-------------------
to make code portable across diff OS, since existing int type have various sizes depending on system. The new types are especially useful in embedded environments where hardware supports usually only several types and that support varies from system to system. For ex: int N; may be 16 bit with certain compiler/processor, while it may be 32 bit with other. Generally we don't care, but if these are being used in structure (as in MMIO), and we use a pointer to refer to various elements of that structure, then size of int does matter. One solution is to define new type as this:
typedef unsigned short uint16; => Here, we still need to know size of char,short,int,long on that compiler/processor and modify this defn as needed.

an example depicting this problem is this:
typedef struct {
 volatile uint16_t CTRL1;           // control register
 volatile uint16_t CTRL2; }  CWT_TypeDef;
#define CWT                     ((volatile CWT_TypeDef*)  0x50000000)

Now in main.c, we do:
CWT->CTRL1=0x0012; CWT->CTRL2=0xFF55; => since compiler understands uint16_t to be of 2 bytes, it adds 2 to base addr to store CTRL2.
STRH     r0,[r1,#0];   => stores 0x0012 at base addr
STRH     r0,[r1,#0x2]; => stores 0xFF55 at base_addr+2

However, suppose we had used a hand-written typedef such as "typedef unsigned short uint16" and compiled the same source for some arch which had "unsigned short" implemented as 32 bits. Then each member would occupy 4 bytes, so the write to CTRL1 would cover the addresses of both 16 bit registers (base_addr to base_addr+3), and the write to CTRL2 would land at base_addr+4 instead of base_addr+2. To get rid of this problem, we only have to change the typedef (in one header) to the correct 16 bit integer type for that arch, and then we don't need to change our C code anywhere else. With the stdint.h types, the compiler vendor has already done this for us. We recompile and generate correct binary for new arch. It saves us from changing code in multiple places.

C99 std of ANSI-C, defined all additional data types as ending in _t, and user is asked not to define new types ending in _t. All new types are to be defined in inttypes.h header and also in stdint.h header by the compiler vendors. We can define types for exact width, least/max width type, etc. Exact width integer types are guaranteed to have the same number N of bits across all implementations. Included only if it is available in the implementation = intN_t, uintN_t. eg: uint8_t = unsigned 8-bit, uint32_t, etc.
sizeof operator can be used to find out the size (in bytes) of int, short, uint32_t, etc.

In cortex M0 stdint.h, int16_t is defined as type of "signed short int" =>  typedef signed short int int16_t;
similarly for uint8_t: typedef unsigned char uint8_t; => Note that in C, short and int are guaranteed to be at least 16 bits, so the only way to rep an integer in 8 bits is by using signed/unsigned char.

---------------------
keywords & variables: http://www.acm.uiuc.edu/webmonkeys/book/c_guide/1.2.html#variables
---------------------
1. keywords: reserved keywords that can't be used as a variable identifier. ex: for, char, const, extern, etc.
---------
char short int long float double signed unsigned => type and type specifier
void
volatile const => type qualifier


2. variables: used to store values of a particular type
--------------------
names of identifiers: Identifiers beginning with two underscores (`__') are reserved for the compiler's internal use according to the ANSI-C standard. Underscores (`_') are often used at the start of names of library functions (such as "_main" and "_exit") and are reserved for libraries. In order to avoid collisions, do not begin an identifier with an underscore. Having __ both before and after variable name (eg __Symbol__) almost guarantees that there would be no name collision, as such identifiers with double underscores are extremely rare in user code.

A variable is defined by the following: <storage-class-specifier> <type-qualifier> <type-specifier> <type> variable-names,...
ex: extern const volatile unsigned long int rt_clk; => defines real time clk variable rt_clk

I. storage-class-specifier: storage class reflects data's lifespan during program execution.
1. typedef: The symbol name "variable-name" becomes a type-specifier of type "type-specifier". No variable is created.
ex: typedef long int mytype_t; => declares a new type mytype_t to be long int. From here on, we can use mytype_t instead of long int.
typedef most commonly used with struct to reduce cumbersome typing.
ex:
struct MyStruct {
    int data1;
    char data2;
};
with no typedef, we define var "a" of type MyStruct struct as follows: struct MyStruct a;
however with typedef, we can just define a new type as follows: typedef struct MyStruct newtype;
or we can directly do typedef with struct defn as follows:
typedef struct  MyStruct {  
    int data1;
    char data2;
} newtype;
then we can define var "a" as being of type "newtype" as follows: newtype a;

2. extern: Indicates that the variable is defined outside of the current file. This brings the variables scope into the current scope. No variable is created.
3. static (permanent): Causes a variable that is defined within a function to be preserved in subsequent calls to the function. Variables declared outside the body of any function have global scope and static duration (for ex var declared outside main() are static, as they are not within any function). Initial values may be assigned to global var; if none is given, they are initialized to 0. Since main() itself is a function, all var defined in it are local and auto, so we use "static" for these var to make them permanent. variables declared outside main() are global for all functions in that file. static var are not released from mem on exit of function, so they consume mem space for the whole run of the pgm.
NOTE: we sometimes define function itself as "static".  In C, a static function is not visible outside of its translation unit, which is the object file it is compiled into. In other words, making a function static limits its scope. You can think of a static function as being "private" to its *.c file.
4. auto (temporary): Causes a local variable to have a local lifetime (default). Any variables declared within body of a function, including main(), have local scope and auto duration.
5. register: Requests that the variable be accessed as quickly as possible. This request is not guaranteed. Normally, the variable's value is kept within a CPU register for maximum speed.

II. type-qualifier: any declaration, including those of variables, struct/union, enum, etc can also have type-qualifier (volatile, const) which qualifies the decl, instead of specifying it.
1. volatile: added to the C language later. It causes the value to be fetched from memory every time it's referenced. It tells the compiler that the object is subject to sudden change for reasons which cannot be predicted from a study of the program itself, and forces every reference to such an object to be a genuine reference. This is used for defining variables stored in peripherals, so that uP doesn't reuse a stale copy of the variable which it might have stored in a CPU register a while back.
ex: volatile int j;

2. const: const means that something is not modifiable, so a data object that is declared with const as a part of its type specification must not be assigned to in any way during the run of a program.
ex: const int ci = 123; => declares a simple constant ci which always has a value of 123.
ex: const int *cpi; => declares a pointer "cpi" to a constant. cpi is an ordinary, modifiable pointer, but the thing that it points to must not be modified through it. compiler will give an error for any assignment made via cpi (i.e *cpi = 5).
ex: int *const cpi; => declares a pointer "cpi" which is constant. It means that cpi is not to be modified, although whatever it points to can be modified, i.e the pointer is constant, not the thing that it points to.

III. type-specifier: void, all arithmetic types, boolean type, struct, etc.

3. enumerated tags:
-----------------

4. Arrays:
--------
array is defined as: <type-of-array> <name-of-array> [<number of elements in array>];
ex: int arr[10]; => defines an array of 10 integers. Array elements are arr[0], arr[1], ...  arr[9]. Each integer is 4 bytes (let's assume), so total size of array=4*10=40 bytes.

to initialize arrays:
we can either use for loop in main pgm, or init it at time of declaration.
int arr[] = {1,2,3,4,5}; => this init an array of 5 integers as follows: arr[0]=1, arr[1]=2 and so on. NOTE: there is no need to mention any value in the subscripts []. The size will automatically be calculated from the number of values. In this case, the size will be 5. Writing '1' in quotes would instead store the character code of '1' (49 in ASCII), not the number 1.
to init array with string 2 ways:
A. char arr[] = {'c','o','d','e','\0'};
B. char arr[] = "code"; => equiv to ex A above. We don't need an explicit null char here, since double quotes do that for us.

To access values in an array:
int j=arr[2]; => assigns j to arr[2] value which is 3.

We can also define array of structures, as well as array elements within structures.

5. structures and union:
----------------------
struct st{
    int a;
    char c[5];
};
int main()
{
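    struct st st_arr[3]; // Declare an array of 3 structure objects
    struct st st_any[] = { {0,"c"},{1,"f"}}; //declares an array of 2 struct obj with values assigned (c is a char array, so init it with a string)
    struct st st_obj0; // first structure object
    st_obj0.a = 0;
    st_obj0.c[0] = 'a'; // an array can't be assigned as a whole, so assign individual elements (or use strcpy)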
}

6. const:

7. strings: simply an array of characters encapsulated in double quotes. At the end of the string a null character is appended.
ex: char x[]="\x41"; or char x[]="A"; are the same string. x[0]='A', x[1]=null character.

8. define, ifdef: preprocessor directives. Portions of code are kept or removed (before actual compilation) depending on the directive
#define NEW 0
#ifdef NEW => since NEW is defined this portion is kept by compiler
#define var ab
#else => this portion is removed by compiler
#define var cd
#endif

ex: this very commonly used in defines, so that we don't redefine something in multiple files
#ifndef CHAR_T => this piece of code can be placed in multiple header files, but CHAR_T gets defined only once in each .c file being compiled (the 1st time it is seen).
#define CHAR_T 0x45
#endif
 
-----------------------------------

random number gen:
------------------
srand((unsigned) time(&t)); => inits rand num gen with time (in sec since epoch)
rand() % 50; => generates rand num b/w 0 to 49 using above seed. otherwise rand() will always gen same rand num seq since it will always use same seed (if we don't use srand).

Python:

Python is a general-purpose interpreted, interactive, object-oriented, and high-level programming language. Similar to perl.
Python provides interfaces to all major commercial databases. Python supports GUI applications, and these can be ported to many operating systems, libraries and windowing systems. It can be easily integrated with C, C++, Java, etc.

Python has so much support from community, that almost everything can be done by using python's vast library of modules. You can build a website, write a program for raspberry pi, build games with gui, etc. In fact, it's one language that you can learn and get by for most tasks without learning any more languages. I've avoided python in the past because of its huge confusion over python2 vs python3, but now this issue looks settled.

 

Python 2 Vs Python 3:


Python 3 is latest version, python 2 reached EOL on Jan 1, 2020. So, switch to python 3 for all coding. There are significant differences b/w python2 and python3, so don't waste your time learning python2. However, we may still want to install both python2 and python3, as many pgms are still written in python2. So, not having python2 will cause those pgms to error out. When we install python 2, it is installed as both python (python is just a soft link to python2) and python2, while python 3 is installed as python3. Since python is just a soft link, we can change it at any time to point to either python2 or python3. We might be tempted to change "python" soft link to point to python3. However, it's very risky, as there are many system programs that might be written in python2, and they rely on the python link pointing to python2. They may suddenly start failing if the link is changed. So, it's advisable to leave soft links for python, python2 and python3 as is, and call python3 explicitly. If needed, change the "python" link to point to python3 only for a short term.

python3 itself has several versions as python3.4, python3.6, etc. Latest stable version is python3.8 as of 2020. 

NOTE: On any linux distro, after installing latest version of python3, change the soft link of python3 to point to python3.8 or whatever is the latest version. That way, your latest python version would be available for your programs, by just typing python3.

cmd: cd /usr/bin; rm python3; ln -s python3.8 python3; => now calling python3 calls python3.8

All the discussion below is for python3, unless explicitly specified. I'll point out the differences where applicable. Be careful when searching for python on internet, as many of them are for python2, and may not work for python3.

Official doc is on python.org site: https://docs.python.org/

geeksforgeeks also has very elaborate and fantastic tutorial here: https://www.geeksforgeeks.org/python-programming-language/

 

Python installation:

On any Linux distro, to install python3, there are 2 ways: one thru package mgmt, while other by manually downloading and installing.

1. Package mgmt:

CentOS: For CentOS, we install using yum. Below are the ways to install python2 and python3.

A1. rpm pkg for python2: sudo yum install python => by default, it installs python2. It specifically installs python 2.7 in /usr/bin/python2.7. A soft link is created to python2 in /usr/bin dir (/usr/bin/python2 -> python2.7). Another soft link "python" is made to python2 (/usr/bin/python -> python2).

A2.  rpm pkg for python 3.4: sudo yum install python34 => installs python 3.4 in /usr/bin/python3.4. A soft link is created to python3 in /usr/bin dir (/usr/bin/python3 -> python3.4). python and python2 are soft links to python2.7 already installed

A3. rpm pkg for python 3.6: sudo yum install python36 => installs python 3.6 in /usr/bin/python3.6. A soft link is created to python3 in /usr/bin dir (/usr/bin/python3 -> python3.6). python and python2 are soft links to python2.7 already installed

A4: rpm pkg for python 3.7: sudo yum install python37 => Although latest version of python is python3.7, yum repo still doesn't have it, and gives an error that "no such package found". Run "yum info python37" to find out if python 3.7 available or not.

NOTE: one very important thing to note is that "yum" is written in python2. So if you change soft link of python to change to python3 (after installing python3), then yum will not work and will throw this error:
  File "/usr/bin/yum", line 30
    except KeyboardInterrupt, e:
 SyntaxError: invalid syntax

To fix this, do one of 2 things:

1. change python version being called in yum to python2: In /usr/bin/yum file, change first line from "#!/usr/bin/python" to "#!/usr/bin/python2". This will force python2 soft link to be used, instead of using python link.

2. change softlink in python to python2. This will cause yum to still use python2 as softlink python is pointing to python2. However, this may cause other pgms to fail, which may rely on python3, and need python link to point to python3. To fix this, for any pgm that needs python3, change first line in that pgm to point to python3 instead of python

First choice is preferred, as python3 is the step forward, so keeping soft link python pointing to python3 is going to work for most pgms.

Linux Mint: On LinuxMint, we install using apt. Latest python is 3.8 as of June 2021.

A1. sudo apt install python3.8 => This installs python version 3.8. Look in /usr/bin/ dir to make sure you see python3.8 over there.

 

2. manual: not tried yet. It's not recommended way, as it requires lot more effort, and there's no reason to do it (as all linux distro allow you to install via pkg mgmt)

 

Python syntax:

1. comment: Python comment is anything after a # up to the end of that line (the # may be at start of line or after a stmt). Multiline comments are not supported, but can be mimicked by putting the text within triple quotes, i.e """ .... multi line comment """ (this is really just a string that is never used).

ex: a=2 #comment


2. case sensitive: Python is case sensitive. So, var and Var are different.


3. End of line: Each stmt ends with newline \n. So, no ; needed (this is in contrast to other languages which use ; etc to indicate end of line). However for multi stmt in single line, we need ; b/w them. In cases where we need a stmt to continue on next line, we use line continuation char \ as the very last char on the line. The \ hides the newline immediately following it, so the python interpreter treats both lines as one stmt.
ex:
total = item_one + \
        item_two + \
        item_three


4. Blocks of code: no braces provided, instead all statements within the block must be indented the same amount. This is a distinctive feature of python, and can be confusing at first, as most other languages use brackets or keywords to mark begin and end of block and don't rely on indentation. By convention, 4 spaces signify a block. 8 spaces signify another block nested within the outer parent block. Similarly 12 spaces signify yet another nested block within outer 2 blocks and so on. Any consistent amount works (even 1 space, or a tab), but for readability, we keep it as 4 spaces (most editors can be set to insert 4 spaces when tab is pressed). Python3 gives an error if tabs and spaces are mixed inconsistently within a block. All consecutive lines of code with same number of spaces at start of line are considered part of one block. NOTE: we can't have 0 spaces to identify a block, as that will error out. We do need some indentation.


ex:
if True: => header line begin stmt with keyword (if, else, while, etc), terminate with : and are followed by suite
    print("True") => this group of stmt in single code block called suite. This is indented by 4 spaces, so it's part of if block

    print("I'm here") => This is part of if block too, as it's same indentation.
else:
    print("False") => this is part of else block, as it's indented by the same amount

print("end") => this is not part of if-else block as it's not indented at all.

NOTE: these are 2 of the most distinct departures from other languages.

  1. One is the absence of end of line character (i.e no semicolon etc, just a newline marks end of cmd in a line). We can always add a semicolon at end of line and python will work just fine, but correct way is to not put a semicolon.
  2. Second is the use of tabs or spaces to identify blocks of code. Usually high level languages don't rely on spaces for correct functionality, but python is all about spaces. Most other languages use curly braces  { ... }  to define scope of loops, functions, etc.
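A minimal python3 sketch of nested blocks, to go with the indentation rules above. The function name classify is made up for illustration:

```python
def classify(n):
    # body of the function is one block (4 spaces)
    if n < 0:
        # nested block inside the if (8 spaces)
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

# back at 0 indentation, so this is outside the function body
result = [classify(-5), classify(0), classify(7)]
print(result)
```

Moving the last 2 lines in by 4 spaces would make them part of the function body, which changes what the pgm does without any syntax error.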

 

5. reserved keywords: Like any other pgm lang, python has reserved keywords, which can't be used as var names or for any other purpose. ex:

1. if else,
2. and, not, or
3. for, while, break, continue
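4. try, return, assert, class. NOTE: print and exec were keywords in python2, but in python3 they are ordinary built-in functions and not keywords. print function is most used function and is explained later under "Functions" section.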

6. quotes: single quotes and double quotes have same meaning, and so are interchangeable in python. We use one or the other when it's absolutely needed, i.e use double-quotes if your string contains a single-quote

Running python:

We can run python interactively or run a python pgm via a cmd line

python --version => returns version num. If it's 2.x it's older, if 3.x it's newer. We can also run "python -V" to get version num.

1. interactively:

typing python brings up python shell. Prompt is >>>. We can type any python cmd in it. Type "Ctrl + D" to exit shell.

>>> print("Hello")

prints Hello on screen

2. via cmd line:

file: test.py => here we are specifying python3 as interpreter instead of python (since python is usually set as a soft link to python2)

#!/usr/bin/python3
print ("Hello, Python!") #this is a comment:
# single line comment


> type ./test.py to run above file. (do chmod 755 test.py to make it a executable file).

> python3 test.py => This also runs the above python file. We could do "python test.py" too. This will work as long as syntax in test.py is python2 syntax.

 

Data Types and variables:

 

I. Variables:

As in any programming language, we need to define variables which store data, which may be of multiple data types supported by the language. var do not need explicit declaration of data type. This declaration happens automatically when you assign a value to a variable using = sign (i.e var2=1.1 => assigns float num 1.1 to var2)

variable names or Identifiers: start with a letter (a-z, A-Z) or _, followed by letters, digits or _. Variables are not declared beforehand to be of a particular type (as in C); this is common practice in most scripting languages. The type is figured out by python during assignment.

II. Data types: Python is strongly typed language, meaning we would need to convert one data type to other to use it, else it will give error.
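A small sketch of both points (type is picked up at assignment, and types are not converted silently). All var names here are made up for illustration:

```python
var = 10                        # var refers to an int
kind_before = type(var).__name__
var = "ten"                     # the same name can later refer to a str
kind_after = type(var).__name__

try:
    total = "5" + 1             # str + int is not converted implicitly
except TypeError as err:
    total = None
    print("TypeError:", err)

fixed = int("5") + 1            # explicit conversion to int
joined = "5" + str(1)           # explicit conversion to str
print(kind_before, kind_after, fixed, joined)
```

So the name itself has no fixed type, but each value does, and mixing types needs an explicit int(), str(), float(), etc.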

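A. primitive data types: In python, we have the following primitive data types:

1. numbers: numbers may be of 3 types in python3 (python2 had a 4th type, long):

 A. int (signed integers of unlimited size, can also be written in oct or hex) ex: var1=10; var3=0xDEADBEEF
 B. long (python2 only: long int written with a trailing L, ex: var2=-579678L. In python3 this is a syntax error, as int already covers it)
 C. float (fp real) ex: var4=15.2; var5=32.3e18
 D. complex (complex num) ex: 3.14j, 4.5-3j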
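A quick python3 check of the number types (var names are made up for illustration):

```python
big = 2 ** 100              # int has arbitrary precision in python3, no long needed
hexval = 0xDEADBEEF         # hex literal, still a plain int
fp = 32.3e18                # float
z = (4.5 - 3j) * 2          # complex arithmetic

print(big, hexval, type(fp).__name__, z.real, z.imag)
```

z.real and z.imag give the 2 parts of a complex num as floats.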

2. Strings: strings are continuous set of char in double quotes "...." or single quotes '....'. In both of these quotes, values are not substituted but printed as is. There are special formatting available that allow substitution within single or double quotes that is explained later. This is different from other scripting languages and common programming languages which treat single and double quotes differently. There is also a triple quote in python that allows string to span multiple lines (so newline, tab etc are treated as part of string).

ex: var1 = "my name"

ex: due to triple quotes, everything between them is part of string including newlines. print(address) will print all 3 lines as is.

address = '''my house is at
1207 goog ln,
los angeles'''


Subsets of strings can be taken using the slice operator ([ ] and [:] ) with indexes starting at 0 in the beginning of the string and working their way from -1 at the end.
The plus (+) sign is the string concatenation operator and the asterisk (*) is the repetition operator.
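ex: s='My world'; print(s[0]) => prints M; print(s[2:5]) => prints " wo" (index 2 is the space, and stop index 5 is excluded); print(s[3:8]) => prints world; print(s+"TEST") => prints My worldTEST. NOTE: avoid naming the var str, as that hides the built-in str type.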
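The slice and operator rules above, as a runnable python3 sketch (var names are made up for illustration):

```python
s = 'My world'

first = s[0]            # 'M'
middle = s[2:5]         # ' wo' : indices 2,3,4 (stop index 5 is excluded)
last = s[-1]            # 'd'   : negative index counts from the end
tail = s[3:]            # 'world' : blank stop means up to the end
combined = s + "TEST"   # concatenation
repeated = "ab" * 3     # repetition

print(first, repr(middle), last, tail, combined, repeated)
```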

There are 2 types of string in python2: traditional str type (which is 1 byte or 8 bit char), and newer unicode type. There we put a char "u" in front of the string to indicate that it's unicode type. In python3, every str is already unicode, so the u or U prefix is accepted (from python 3.3 onwards) but has no effect, and raw bytes have their own separate bytes type (written with prefix b, ex: b'abc'). UTF-8 is one way of encoding unicode text into bytes, where each char takes a variable length from 1 byte to 4 bytes. (UTF-8 is used widely now, since 1 byte could only store ASCII char and can't handle the many other char out there. UTF-8 is compatible with 1 byte ASCII code)

There are many other prefixes besides "u" to indicate how the string is going to be interpreted. "r" means raw string type (so that anything inside the string is going to be treated as literal and not interpreted). ex: r"me\n" => this is not going to treat \n as new line but instead as 2 literals \ and n.

str=u'Me .op' => in python2, this string is unicode type (since u precedes the string), and u'text' is roughly a shortcut for calling unicode('text'). In python3, it's the same as 'Me .op'.
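A short python3 sketch of the u and r prefixes, and of going b/w str and bytes via UTF-8. The sample char (e with acute accent, written as an escape) is chosen just for illustration:

```python
text = u'Me .op'                    # u prefix is accepted in python3, but changes nothing
same = (text == 'Me .op')

accented = "\u00e9"                 # 1 unicode char (e with acute accent)
data = accented.encode("utf-8")     # str -> bytes, this char needs 2 bytes in UTF-8
back = data.decode("utf-8")         # bytes -> str

raw = r"me\n"                       # raw string: \ and n stay as 2 separate chars

print(same, len(accented), len(data), back == accented, len(raw))
```

len() of a str counts characters, while len() of bytes counts bytes, which is why the 2 lengths differ for the accented char.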

formatted strings: In version 3.6 of python, formatted string literals were introduced. So far, no substitutions happened for any characters inside strings, but with formatted string (or f-string), we can provide replacement fields by enclosing them within { ... }. Any python expr is allowed within these curly braces.

ex: name = "Fred"

a = f"He said his name is {name}." => This substitutes name with variable "name"., since we have f in front of string.

a = "He said his name is {name}."=> no substitution occurs
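A few more f-string cases (needs python 3.6 or later). The var age and its value are made up for illustration:

```python
name = "Fred"
age = 41   # made-up value

a = f"He said his name is {name}."
b = f"Next year he will be {age + 1}."       # any expression is allowed inside { }
c = f"Pi to 2 places: {3.14159:.2f}"         # a format spec may follow a colon
d = "He said his name is {name}."            # no f prefix, so braces stay as is

print(a, b, c, d, sep="\n")
```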

NOTE: char: There is no char var type in python. char are rep with string with length of one.

There are many string methods available to operate on strings. Look in python link for such methods.

ex: str.upper() returns a copy of string, with all letters uppercased. "My name".upper() returns string "MY NAME"

3. boolean: 2 values: True and False.

 

B. Compound data types:

4. List: most versatile. similar to arrays in C, except that items belonging to list can be of diff data types. NOTE: there is no built-in array data type in Python. Lists are a superset of arrays, so we use list in its place. Lists have syntax same as that of array. On internet, lot of articles talk about array in python. In reality they are not talking about array, but list.

list contains items separated by commas and enclosed within [].

1D Lists:

ex: mylist = [] => this defines an empty list (since [] used, it implies a list). The list has no entries yet. Lists grow and shrink as needed, so there's no need to state up front if the list will have 10 entries or 100 entries.
ex: list1 = ['A',1,"john", 23]; print(list1[1:3]) => prints [1, 'john'] => Here, we specify entries of the list, and the interpreter allocates memory for them. NOTE: the range specified includes item with index=1,2 but NOT index=3, as range is up to index-1. Also, commas preserved when we print the list, and strings are printed in single quotes.
ex: list1[3]=102 => this updates value 23 with new value 102, not possible with tuple since it's read only
ex: for x in [1, 2, 3]: print(x, end=' ') => prints 1 2 3

Assigning values to list: We saw one way to assign initial values to list. Let's see if we can assign initial values to a list in other way.

my_list[0]=4;=> Here my_list is defined for the 1st time with 0th entry having value 4. Previously, we assigned list values as my_list=[4] which worked. This will give a Name Error: "NameError: name 'my_list' is not defined". This is because we are accessing indices of list, and python doesn't know what indices it has. So, let's define an empty list.

my_list = [];  my_list[0]=4; my_list[1]=2; => This will give an Error: "IndexError: list assignment index out of range". This is because the list is empty, so index 0 doesn't exist yet, and assigning to an index never grows a list. If we assigned values to this list as my_list = [4,2] => then python knows the size of list as 2, and assigns value as my_list[0]=4 and my_list[1]=2. Then we can access value as my_list[0]. To grow a list one entry at a time, use my_list.append(4).

One way to resolve above issue is to define the list with size specified. ex: my_list = [0]*4; => This defines a list with 4 elements [0,0,0,0]. Now we can do my_list[0]=4. The elements don't have to be of same type for this to work, i.e [0,'a']*2 gives [0,'a',0,'a'].
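The ways of filling a list discussed above, as a python3 sketch (var names are made up for illustration):

```python
my_list = []                # empty list, no index exists yet
try:
    my_list[0] = 4          # assigning to a missing index fails
    error_name = None
except IndexError as err:
    error_name = type(err).__name__

my_list.append(4)           # append grows the list by one entry
my_list.append(2)

prefilled = [0] * 4         # 4 slots that can then be assigned by index
prefilled[0] = 4

squares = [i * i for i in range(5)]   # list comprehension: build and fill in one step

print(error_name, my_list, prefilled, squares)
```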

2D lists:

2D lists are an extension of 1D lists, i.e each element of a 2D list is in itself a 1D list.

ex: my_arr = [ [300, 200,100, 900], [600, 500, 400, 700] ]; => This is a 2D list, where each list element is 1D list.

Accessing list elements: We access it the same way as in 1D list, except that we provide the index of 1D list also.

print(my_arr[1][0:2]) => prints [600, 500]. This is called slicing of array/list/tuple. format is [start_index:stop_index:increment of index], where the item at stop_index itself is not included. See in numpy module section for more details. so, my_arr[0][3:1:-1] =  [900, 100]

ex: print(my_arr[:]) => This prints entire 2D array since blank start means start from 1st index and blank end means stop at last index. So o/p is: [[300, 200, 100, 900], [600, 500, 400, 700]]. This applies to any dimension array. arr[:] will return all elements of the array. Slicing across multiple dimensions at once doesn't work with lists, since each [ ] is applied to the result of the previous one: my_arr[1:3] returns a list of rows [[600, 500, 400, 700]], and [0:5] then slices that list of rows again, so my_arr[1:3][0:5] returns [[600, 500, 400, 700]] and not a range of columns.
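A python3 sketch of slicing on the 2D list above. The list comprehension at the end is one possible way (not the only one) to pick columns from every row with plain lists:

```python
my_arr = [[300, 200, 100, 900], [600, 500, 400, 700]]

row_part = my_arr[1][0:2]             # row 1, columns 0 and 1
reversed_part = my_arr[0][3:1:-1]     # row 0, index 3 down to 2 (stop index 1 is excluded)
chained = my_arr[1:3][0:5]            # both slices act on the list of rows
first_two_cols = [row[0:2] for row in my_arr]   # slice each row separately

print(row_part, reversed_part, chained, first_two_cols)
```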

We define a 2D list same way as 1D list. i.e list_2D = []. However, we can't do something like list_2D[0][0]=5 without having this list already specified with values (for same reasons as 1D list above), ex: list_2D=[[67,34],[35,67]]. Now we can do: list_2D[0][0]=5.

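We can initialize 2D list as: my_list = [[0]*2]*3; => This will create 2D list of 3 rows x 2 columns with all values as 0, i.e [[0,0],[0,0],[0,0]]. NOTE: all 3 rows here refer to the same inner list, so my_list[0][0]=5 changes every row. To get independent rows, use: my_list = [[0]*2 for i in range(3)]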
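A python3 sketch showing what the * operator does to the inner list, compared with building each row separately (var names are made up for illustration):

```python
shared = [[0] * 2] * 3                       # 3 references to the SAME inner list
shared[0][0] = 5                             # shows up in every row

independent = [[0] * 2 for _ in range(3)]    # a new inner list per row
independent[0][0] = 5                        # only row 0 changes

print(shared)
print(independent)
print(shared[0] is shared[1], independent[0] is independent[1])
```

The outer *3 copies the reference to the inner list, not the inner list itself, so the list comprehension form is usually the safer way to init a 2D list.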

list operators: There are multiple functions and methods for manipulating lists. Some of them are: len(list3); list4.sort(); sorted(list4); list1 == list2 (cmp(list1, list2) existed only in python2).

Arrays: Lists behave almost same as arrays, but are not efficient. Lists are more generic than array (in that they allow multiple data types, while array allow same data type only), but they also get less efficient for storing/retrieving, computing, etc. Most scientific computations can be easily carried out with arrays, since they usually work on only one kind of data (i.e int, float, etc). Python doesn't enforce typing (i.e one particular type of data as int, etc), so they never created an array data type in Python. For most basic uses, lists serves our purpose, and we don't care about speed. However, if performance becomes critical because of large amount of data to work with, then Arrays are needed.

We said previously that python doesn't have arrays. However, python supports modules which allow us to use arrays. 2 ways to create arrays in Python

A. array module: Python has module "array" that can be imported to get arrays.  We specify the type of data elements, and all elements of array have to be of that data type. There are many functions available to operate on array. This method is not recommended method for creating arrays, use 2nd method using numpy module.

ex: import array as arr => We don't need to install any module for this. More details about array module can be found on internet

my_array = arr.array('i', [2, 4, 6]); print(my_array) => prints array('i', [2, 4, 6]) => NOTE: everything in array including data type is printed. Again, commas preserved while printing array (same way as in lists)
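A python3 sketch of the array module, showing that the type code is enforced (the values appended here are made up for illustration):

```python
import array as arr

my_array = arr.array('i', [2, 4, 6])    # 'i' = signed int elements
my_array.append(8)                       # fine, 8 is an int

try:
    my_array.append(1.5)                 # a float is rejected for type code 'i'
    rejected = False
except TypeError:
    rejected = True

print(my_array, my_array[1], len(my_array), rejected)
```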

B. numpy module: There is NumPy module that can be used to create arrays. It's not included by default with Python distribution, so will need to be installed (see in NumPy section). This is the recommended method for creating arrays.

5. tuples: similar to lists, specified using ( ). However, they cannot be updated (i.e they are read only). We can apply slicing to tuples also. Used very rarely in simple code.
ex: tuple1 = ('ab', 2.7)

6. sets: sets are similar to sets in maths where we can take union, intersection, etc. Sets defined using curly braces { .. }. They contain any number of objects, and of different types. Sets are unordered: the original order, as specified in the definition, is not necessarily preserved. Additionally, duplicate values are only represented in the set once. set elements must be immutable. For example, a tuple may be included in a set, as it's immutable. However lists and dictionaries are mutable, so they can’t be set elements. Other way to create set is using the set() function.

ex: x = {'foo', 'bar', 'baz', 'foo', 'qux', 12, (1,2), None}

print(x) => {None, 'foo', 12, (1,2), 'bar', 'baz', 'qux'} => NOTE: duplicate entries are removed, and order of elements is not preserved (the order shown here is just one possibility)

Many operators as union, intersection, difference, |, &, ^, etc are allowed on sets. sets are also very rarely used in simple programs.
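The set operators mentioned above can be sketched as follows (the sets "a" and "b" are just sample data):

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b            # same as a.union(b)
common = a & b           # same as a.intersection(b)
only_a = a - b           # same as a.difference(b)
either = a ^ b           # in one set or the other, but not both

print(union, common, only_a, either)

empty = set()                           # {} would create an empty dictionary
unique = set(['foo', 'bar', 'foo'])     # duplicates are dropped
print(len(empty), sorted(unique))       # 0 ['bar', 'foo']
```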


7. dictionary: They are like hashes in perl. They are also known as associative arrays. They consist of key-value pairs. Values can be any data type, while keys must be immutable (hashable) types such as strings, numbers or tuples.
Dictionaries are enclosed by { }, and individual elements are accessed and assigned using [ ... ]. Since both sets and dictionaries use { }, we distinguish b/w the two via presence of ":". Since { } is used to represent an empty dictionary, we can't use { } to represent an empty set (since then the python interpreter has no way of knowing if the object is a set or a dictionary). In that case, we use set() func with no args to create an empty set. We use ":" to separate key and value in each key:value pair

1D dictionary: Just like 1D list, we have 1D dictionary:

ex:

tinydict = {'name': 'john','code':6734, 'dept': 'sales'} => Assigns key value pair as follows: name->john, code->6734, etc. print(tinydict.keys()) prints dict_keys(['name', 'code', 'dept']) while print(tinydict.values()) prints dict_values(['john', 6734, 'sales']). From python 3.7 onwards, keys are kept in insertion order. In python2, plain lists are returned, in arbitrary order.

tinydict['name'] gives "john", tinydict['code'] gives 6734 and so on

Assigning values to dict: There are 2 ways to assign dict key/value pair.

A. We can assign dict key/value pair as we did in 1D list, and as shown in ex above.

ex: tinydict = {"name": "john",5:9}

B. We can also assign dict values one key at a time, in array form, as shown below. This is different than a 1D list, where we aren't allowed to do my_list[0]=5 on an empty list.
my_dict = {} => initialize dict. This is needed, as w/o this the name my_dict doesn't exist yet, and my_dict[0]=1 raises a NameError. It also tells python that the object is a dictionary and not a list. NOTE: avoid using "dict" itself as the var name, as that hides the builtin dict() func.

my_dict[0]=5 => Now we are allowed assignments like these. NOTE: 0 is a key here, and not an index number. It just happens to be an integer key here, as 0 is not enclosed in quotes. The value is also integer as it's not enclosed in quotes.
my_dict['one'] = "This is one" => print(my_dict['one']) prints "This is one". Here both key and value are strings.
my_dict[2]     = "This is two"

2D dictionary: Just like 2D list, we can have higher dim dict as 2D, 3D, etc. However for an empty 2D dict, we can't directly do something like dict_2D['a']['b']='xyz'. The reason is that dict_2D['a'] is looked up first, and since key 'a' doesn't exist yet, a KeyError is raised. So, the inner 1D dict has to exist before we can assign to its elements.

ex: dict1D={}; dict1D['age']=35; dict1D['salary']=300;

dict2D={}; dict2D['ramesh']=dict1D => Now dict2D['ramesh']['age']=35, dict2D['ramesh']['salary']=300 and so on. dict2D['mohan']={'age':50,'salary':500}. So a 2D dict is just a dict whose values are 1D dicts.

So, 2D dict are a little cumbersome to write, as the inner 1D dict has to be created before its elements can be assigned. Once dict2D['ramesh'] exists, direct assignment as dict2D['ramesh']['age']=36 works fine.
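One way to get closer to direct 2D assignment, sketched here with the dict method setdefault() and with collections.defaultdict from the standard library. The names dict2D and auto2D are only for illustration:

```python
from collections import defaultdict

# Plain dict: setdefault() creates the inner dict the first time a key is seen.
dict2D = {}
dict2D.setdefault('ramesh', {})['age'] = 35
dict2D.setdefault('ramesh', {})['salary'] = 300
print(dict2D)     # {'ramesh': {'age': 35, 'salary': 300}}

# defaultdict(dict): a missing outer key automatically gets an empty inner dict.
auto2D = defaultdict(dict)
auto2D['mohan']['age'] = 50
auto2D['mohan']['salary'] = 500
print(dict(auto2D))   # {'mohan': {'age': 50, 'salary': 500}}
```

The trade-off with defaultdict is that merely reading a missing key also creates it, so a typo in a key name silently adds an empty entry.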


Operators:

Just like in other lang, we have various operators to operate on variables. Mostly operators are used for number data type (int, float, etc), but some of them can be used on other data types too. How the operator behaves depends on the data type of its operands.

1. arithmetic: +, -, *, /, etc. ex: a+b. + and * are used in strings to concatenate or repeat strings.
2. comparison: ==, !=, >, >=, etc ex: (a<b)
3. assignment: =, +=, -=, etc ex: c=a+b;
4. bit wise : &, |, ^, ~, <<, >>, etc ex: a=16, b=2, a&b => gives 0, as 16 (10000) and 2 (00010) have no bits in common
5. logical: not, or, and
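The 5 operator groups above can be tried out with a small sketch like this one (the tuple names are made up just to collect the results):

```python
a, b = 16, 2

# 1. arithmetic (/ always gives a float in python3, // is integer division)
arith = (a + b, a - b, a * b, a / b, a // b, a % b, a ** b)
print(arith)        # (18, 14, 32, 8.0, 8, 0, 256)

# + and * on strings concatenate and repeat
strings = ("ab" + "cd", "ab" * 3)
print(strings)      # ('abcd', 'ababab')

# 2. comparison
compare = (a == b, a != b, a > b, a <= b)
print(compare)      # (False, True, True, False)

# 3. assignment
c = a + b
c += 1
print(c)            # 19

# 4. bit wise
bits = (a & b, a | b, a ^ b, ~a, a << 1, a >> 2)
print(bits)         # (0, 18, 18, -17, 32, 4)

# 5. logical
logic = (a > b and b > 0, a < b or b == 2, not a == b)
print(logic)        # (True, True, True)
```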

Control statements:

1. if elif else: This is same as if stmt in other languages. elif is a substitute for "else if". Both elif and else are optional. An if ... elif ... elif ... sequence is a substitute for the switch or case statements found in other languages.

ex:Below if .. elif .. else stmt needs appr tab spaces for each block of code. NOTE: if, elif and else are at start of line with no tab.

if x < 0:

  x = 0

  print('Negative changed to zero')

elif x == 0:

  print('Zero')

else:

  print('More')

ex: if ( var == 100 ) : print ("val is 100") #for single line suite, it can be on same line


2. for: for stmt differ from that in C. There is no start, end or iteration index. Python’s for statement iterates over the items of any sequence (a list or a string), in the order that they appear in the sequence.

ex: below iterates over the list and prints each word and length

words = ['cat', 'window', 'defenestrate']

for w in words:

  print(w, len(w))

ex: to iterate over a seq of numbers just as we do in for loop in C pgm, we can use range() function. syntax of range is (start,stop,step), where stop is a required parameter, while start/step are optional. stop value is not included in range (i.e range is upto stop-1). range(10) generates 10 values, from 0 to 9 (doesn't include 10). range(5,9) generates 4 values = 5,6,7,8. range(0,10,3) indicates step value of 3, so it generates 4 values = 0, 3, 6, 9. So, by using range() function, we can achieve what we do using for loops in C pgm.

for i in range(5):

  print(i) => prints 0,1,2,3,4

ex: To iterate over the indices of a sequence, you can combine range() and len() as follows:

for i in range(len(words)):

  print(i, words[i]) => This prints index 0,1,2 and prints the 3 words
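A quick sketch to check the range() values described above. It also shows the builtin enumerate(), which is not covered in these notes but gives the same (index, item) pairs as the range(len(...)) pattern:

```python
print(list(range(10)))         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(5, 9)))       # [5, 6, 7, 8]
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]

words = ['cat', 'window', 'defenestrate']

# Index-based loop using range() and len()
indexed = []
for i in range(len(words)):
    indexed.append((i, words[i]))

# enumerate() produces the index and the item together
enumerated = list(enumerate(words))

print(indexed == enumerated)   # True
```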

3. while: The while statement is used for repeated execution as long as an expression is true.

ex: infinite loop below since expr is set to "True"

while True:

  print("infinite loop")

4. break, continue, else: break, continue and else clauses can be used in loops as "for" and "while". "break" breaks out of the innermost enclosing for or while loop, while "continue" skips to the next iteration of the loop. A loop's else clause runs when the loop finishes w/o hitting a break. Look for more details in the python website link above.
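A small sketch of break, continue and the loop else clause working together. The function name first_even is made up for this illustration:

```python
def first_even(numbers):
    """Return the first non-negative even number, or None if there is none."""
    for n in numbers:
        if n < 0:
            continue          # skip negative numbers, go to next iteration
        if n % 2 == 0:
            result = n
            break             # found one, leave the loop early
    else:
        result = None         # runs only if the loop ended without break
    return result

print(first_even([3, -4, 7, 8, 10]))   # 8
print(first_even([1, 3, 5]))           # None
```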


Functions: Function syntax is similar to those of other lang. All functions require parenthesis and optional args inside it.

1. Builtin: Python provides many builtin functions as print(), int(), abs(), open(), sorted(), etc.

A. print( ) : print function is one of the most used functions to o/p something on screen. It wasn't a function in python2 (it was just a statement), so no ( ) were required with print, but it's a function in python3, so it needs ( ). i.e: print("Hello, Python!"); With a single arg, ( ) works in python2 also. So, it's preferred to use print as a func with parenthesis ( .... )

Python2: print "The answer is", 2*2, "My name=", name, "var=", 2*var1

Python3: print("The answer is", 2*2, "My name=", name, "var=",2*var1) => NOTE: with multiple args, python2 treats the parenthesis as a tuple and prints ('The answer is', 4, ...), unless "from __future__ import print_function" is used. Anything within quotes is printed as literal string, anything outside quotes is computed if it can be computed based on data types, or the value is just printed if it's a var. A newline is appended by default. In python2, a comma at the end of the print stmt suppresses the newline, while in python3 we pass end="" as the last arg (i.e print("abc", end="")).

We can use strings, list, var, etc to be printed using print. With List and tuples, full list will be printed, w/o requiring us to iterate over each element of the list.

ex: formatted string and other string type can be used inside print

name = "Fred"; print(f"He said his name is {name}." ) => This substitutes {name} with the value of variable "name", since we have f in front of the string. o/p is => He said his name is Fred.

% operator for strings: String objects have one unique built-in operation: the % operator (modulo). This is also known as the string formatting or interpolation operator. Given format % values (where format is a string), % conversion specifications (as d, s, f, etc) in format are replaced with zero or more elements of values. The effect is similar to using the sprintf() in the C language.

ex: name="afg"; age=2;

my_format = "his name is %s, his age is %2d"; my_values =  (name, age) => NOTE: my_values need parenthesis since they are tuples (not curly braces or square brackets)

print(my_format % my_values) => Here %s and %2d in format string are replaced with values in var "name" and "age".NOTE: the whole thing here can be treated as a string, that is put inside print function. Whatever is the o/p of this formatting operator is passed to print func as an argument.

o/p is => his name is afg, his age is  2

ex: print( ' %(language)s has %(number)03d quote types.' % {'language': "Python", "number": 2}) => outputs " Python has 002 quote types." (with a leading space, as the format string starts with a space). Here "s" after %(language) is a conversion spec saying convert 'language' object into a string using function str(). Similarly 03d spec asks it to convert "number" into a signed integer, zero padded to 3 digits. Here values are not a tuple, but a dictionary, so curly braces are used. NOTE: there is no comma after single or double quotes of string, as it's "format % value" that is being used inside print function, and not the typical "string followed by variable" syntax

ex:  We can use % operator on string inside print func, along with other regular args, as strings, var, etc to be printed. The whole format string is just another string arg to print func.

import math; var2=23; var3 = "my stuff"
print('The value of pi is approximately %5.3f.' % math.pi, var2, "good", var3) => Here math.pi is formatted with %5.3f, which means min width=5 char and precision=3 digits after the decimal point. The math module needs to be imported for math.pi.

o/p is => The value of pi is approximately 3.142. 23 good my stuff

format method: above are older ways of formatting print o/p. The format method of strings is a newer way, and formatted strings (f"...", shown above, available from python 3.6) are the newest.

ex: print('{0} and {1}'.format('Geeks', 'Portal'))=> {0} is replaced by string in 0 position which is 'Geeks' and {1} is replaced by string in position 1 which is 'Portal', so o/p is => Geeks and Portal. NOTE: there is no comma here after single or double quotes but a dot, since we are using the method on print argument, so it's not typical print variable.
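The 3 formatting styles discussed above (% operator, format method, f-string) can be compared side by side. This sketch reuses the name/age values from the earlier example; the result var names are made up:

```python
name, age = "afg", 2

old_style = "his name is %s, his age is %2d" % (name, age)
format_style = "his name is {}, his age is {:2d}".format(name, age)
f_style = f"his name is {name}, his age is {age:2d}"

print(old_style)      # his name is afg, his age is  2
print(old_style == format_style == f_style)   # True

# Positional indexes in format() can reorder or repeat the args
print('{1} and {0}'.format('Geeks', 'Portal'))   # Portal and Geeks
```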

B. input( ): input function is other widely used function to get input from user. There are diff b/w how this func behaved in python2 vs python3.

Python 2:

python2: str = raw_input("Enter your input: "); => raw_input() reads 1 line from std i/p and returns it as string w/o the newline
python2: str = input("Enter your cmd: "); => same as above except that the line is evaluated as a python expr, and the result of the expr is returned (so the result is not necessarily a string).
  Enter your cmd: [x*5 for x in range(2,10,2)]
  Received input is :  [10, 20, 30, 40] => str stores this list

Python 3:

python3: raw_input() function from python2 has been renamed to input(). It always returns a string, so no python expr can be provided.

python3: input() function from python2 has been removed, and instead stmt eval(input()) must be used to get same behaviour as input() func of python2. We don't use this stmt much (eval runs whatever the user types, which is unsafe), instead input() func above is used.

With input() in python3 (and raw_input() in python2), the result is stored as string, so in order to do numeric computation, we have to do data conversion using func below. Also, no expr are allowed, i.e expr will be treated as strings, and won't be computed.

ex: here 2 numbers are provided as i/p, but have to be converted to int in order to add them

num1=input("1st number")

num2=input("2nd number")

sum=int(num1)+int(num2)

print("Sum is", sum); #Here if i/p is 1 and 2, then o/p is 3. If we just did "sum=num1+num2", then it would concatenate the 2 strings and print "12"

C1. type(): type is an inbuilt func to find data type of any var or object (in case of OOP discussed later):

ex: age=50; print(type(age)) => prints <class 'int'> in python3 (<type 'int'> in python2).

ex: type_var = type(tinydict) => assigns the class dict (not a string) to type_var (as tinydict defined above is of type "dict"). type_var.__name__ gives the string "dict".


C2. data conversion: data can be converted from one type to other by casting. Some of the casting functions are:
ex: int(x), str(x), list(y), hex(x), dict(d)

ex: python3: var_int = int(input("Enter any number: ")); var1=var_int+1; => here, var_int stores integer (i.e any number entered is a string, but then int() func converts it to int, so that we can do arithmetic computation on it).
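A short sketch of the casting functions listed above, including what happens when a string can't be converted (the var names are made up for this illustration):

```python
text = "42"                            # what input() would hand back

as_int = int(text)                     # 42
as_float = float("3.5")                # 3.5
as_text = str(as_int + 1)              # '43'
as_list = list("abc")                  # ['a', 'b', 'c']
as_hex = hex(255)                      # '0xff'
as_dict = dict([("a", 1), ("b", 2)])   # {'a': 1, 'b': 2}

try:
    int("4.7")                         # not a valid integer string
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(as_int, as_float, as_text, as_list, as_hex, as_dict, conversion_failed)
```

Since user i/p can be anything, wrapping int() or float() in try/except ValueError is a common way to handle bad entries.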

C3. isinstance(): The isinstance() function returns True if the specified object is of the specified type, otherwise False.

ex: if (isinstance("Hello", str) ): print("true") => This checks if "Hello" is of type string. It returns True since anything within ".." is a string

ex: my_num=4.7; var1=isinstance(my_num, (str,list,dict,tuple)); print(var1) => this prints "False", since my_num is of type "float", while allowed types that this func is checking for are str,list,dict and tuple.
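type() and isinstance() can be compared with a sketch like this (the var names are only for illustration):

```python
age = 50
my_num = 4.7

print(type(age))                       # <class 'int'>
type_name = type(age).__name__         # the string 'int'
is_int = type(age) is int              # exact type check

is_number = isinstance(my_num, (int, float))                  # any listed type matches
is_container = isinstance(my_num, (str, list, dict, tuple))   # none of them match

# isinstance() also accepts subclasses: bool is a subclass of int
bool_is_int = isinstance(True, int)

print(type_name, is_int, is_number, is_container, bool_is_int)
```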

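D. Maths: abs(), max(), min() and pow() are builtin. Most other math functions and constants come from the "math" module (import math), while random numbers come from the "random" module (import random).
ex: abs(x); math.log(x); max(x1,x2,...); pow(x,y);
ex: random.random()
ex: math.cos(x); math.radians(x);
ex: constants: math.pi, math.e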
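A minimal sketch showing which of these functions are builtin and which need the math or random module:

```python
import math
import random

# builtin, no import needed
print(abs(-3.5), max(2, 9, 4), pow(2, 5))    # 3.5 9 32

# from the math module
natural_log = math.log(math.e)               # natural log of e
cos_180 = math.cos(math.radians(180))        # radians() converts degrees to radians
print(natural_log, cos_180, math.pi)

# from the random module: a float in the range [0.0, 1.0)
sample = random.random()
print(0.0 <= sample < 1.0)                   # True
```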

E. File functions: Python has file functions for reading/writing files just as in other lang.

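file read/write ex shown below:
fo = open("foo.txt", "w+") => opens file for both rd/wrt; existing contents are truncated, ptr at start of file. w=wrt_only, r=rd_only, a=append_mode (ptr at end of file)
fo.write("Python is a great language.\nYeah its great!!\n")
fo.seek(0) => moves ptr back to start of file. After the write, the ptr is at end of file, so w/o this a read would return an empty string
str = fo.read(10) => reads 10 characters from file, if no arg provided, then reads whole file until EOF
print("Read String is : ", str)
fo.close() => parenthesis are needed, else the close func is not actually called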
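The same write-then-read sequence, sketched with a "with" block so the file is closed automatically. The scratch path built with tempfile is an assumption made only to keep the example self-contained:

```python
import os
import tempfile

# Hypothetical scratch file, so that no real file gets overwritten
path = os.path.join(tempfile.mkdtemp(), "foo.txt")

with open(path, "w+") as fo:           # file is closed when the block ends
    fo.write("Python is a great language.\nYeah its great!!\n")
    fo.seek(0)                         # rewind before reading
    first_ten = fo.read(10)            # first 10 characters
    rest = fo.read()                   # everything up to EOF

print(repr(first_ten))                 # 'Python is '
print(fo.closed)                       # True
```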

exception: when a script encounters a situation that it cannot cope with, it raises an exception. An exception is a Python object that represents an error. An exception must be handled, else the pgm terminates.
ex:
try:
   fh = open("testfile", "r")
   fh.write("This is my test file for exception handling!!") => trying to wrt to a rd only file raises an exception (io.UnsupportedOperation, which is a subclass of OSError)
except IOError: => std exception raised when an I/O operation fails (in python3, IOError is just another name for OSError). A missing file is also caught here.
   print("Error: can't find file or write data") => This gets printed when IOError exception happens in try block
except ... => some other exception code can be put here for a diff exception raised. There are many builtin exception types that we can specify

except: => "except" stmt w/o any exception name catches any exception not caught by the clauses above. It has to be the last except clause.
else:
   print("Written content in the file successfully") => If no exception, then run this block
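A runnable sketch of try/except/else for the two failure cases mentioned above (missing file, and writing to a file opened read only). The scratch path and the "events" list are assumptions made only for this illustration:

```python
import os
import tempfile

# Hypothetical scratch path, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "testfile")
events = []

# Case 1: the file does not exist yet, so open() itself fails
try:
    fh = open(path, "r")
except FileNotFoundError:          # a subclass of OSError (IOError)
    events.append("missing")
else:
    events.append("opened")
    fh.close()

# Case 2: the file exists but is opened read only, so write() fails
with open(path, "w") as fh:
    fh.write("some text\n")
try:
    with open(path, "r") as fh:
        fh.write("This is my test file for exception handling!!")
except OSError as err:             # io.UnsupportedOperation is an OSError
    events.append("write failed")
else:
    events.append("written")

print(events)                      # ['missing', 'write failed']
```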

Assert: An assertion is a sanity-check. The expression inside an assert stmt is tested, and if the result comes up false, an AssertionError exception is raised. Assertions were added in Python 1.5. They are usually placed inside a function definition to check for valid inputs or valid outputs. AssertionError exceptions can be caught and handled like any other exception using the try-except statement, but if not handled, they will terminate the program and produce a traceback. Assertions are very useful in exposing bugs during development. NOTE: assert stmts are skipped when python is run with the -O option, so they should not be the only check on user input.

assert (Temperature >= 0),"Colder than absolute zero!" => This checks that the Temperature variable is not -ve. If it is -ve, then an AssertionError is raised with the text following the comma ("Colder ...") and the pgm terminates.

assert(isinstance(b, float) or isinstance(b, int)) => Here on failure of assertion (i.e b is neither float nor int), no stmt is printed, but pgm terminates with traceback. If there are many assertions in pgm, it may be tedious to figure out which assertion failed, so it's good practice to have "text" following assert keyword.
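The two assertions above can be combined into one small function. This is only a sketch; the function name kelvin_to_celsius and the messages are made up for illustration:

```python
def kelvin_to_celsius(temperature):
    # Check the input type first, then the valid range, each with its own text
    assert isinstance(temperature, (int, float)), "temperature must be a number"
    assert temperature >= 0, "Colder than absolute zero!"
    return temperature - 273.15

print(kelvin_to_celsius(273.15))       # 0.0

message = None
try:
    kelvin_to_celsius(-5)
except AssertionError as err:          # handled like any other exception
    message = str(err)
print(message)                         # Colder than absolute zero!
```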

 

2. User defined: Besides the built in functions provided by python, we may define our own function also. There are 2 kinds of function defined in python:

A. Normal function: These are regular function definition (as is common in other pgm lang)

defining a func:
def functionname( parameters ): => i/p param or args
   "function_docstring" => optional: explains what this func does
   function_suite
   return [expression] => If no expr provided, it returns None

ex:
def printme( str ):
   "This prints a passed string into this function"
   print(str)
   return

printme("I'm first call to user defined function!") => calls printme func

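NOTE: Arguments in Python are passed as references to objects (sometimes called pass by object reference). If the function changes a mutable object in place (i.e appends to a list), the caller sees the change. If the function assigns a new object to the parameter name, only the local name changes, and the caller's variable is unaffected.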

If var are assigned within func, then they are local to func, and are diff from same var declared outside the func.
total = 0 # This is global variable.
def sum( arg1, arg2=10 ): # default val of arg2 is 10. NOTE: the name "sum" hides the builtin sum() func
   total = arg1 + arg2 # Here total is local variable.
   return total # here 30 is stored in local total and returned.

# Now you can call sum function
total1 = sum( arg1=10, arg2=20 ) # here total1 is 30. We use arg1 to specify that 10 is for arg1, so on. This allows to place args out of order
print(total) => here total is printed as 0, as it's global var
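The two points above (how args are passed, and local vs global var) can be sketched as follows. The function names rebind, mutate and add are made up for this illustration:

```python
def rebind(items):
    items = [0]            # new object bound to the local name only

def mutate(items):
    items.append(99)       # in-place change of the caller's object

data = [1, 2]
rebind(data)
after_rebind = list(data)  # still [1, 2]
mutate(data)
after_mutate = list(data)  # now [1, 2, 99]
print(after_rebind, after_mutate)

total = 0                  # outer variable

def add(arg1, arg2=10):
    total = arg1 + arg2    # local variable, hides the outer total
    return total

result = add(arg2=20, arg1=10)    # keyword args can be given in any order
default_result = add(5)           # arg2 falls back to its default of 10
print(result, default_result, total)   # 30 15 0
```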

Passing func as an arg: We can also pass a func as an arg to another func

ex:

def shout(text): 
    return text.upper() 
def greet(func1): => Arg of greet function is func1
    greeting = func1("hi") => func1 is called with arg specified
    print(greeting)
  
greet(shout) => This calls greet func with arg "shout", which is itself a func. shout gets called with arg "hi", so o/p returned is HI.

B. anonymous function: These are functions w/o a name, and are faster way of implementing simple one line functions. "lambda" keyword is used to create anonymous functions. This function can have any number of arguments but only one expression, which is evaluated and returned. It's also called as lambda func and can also have another function as an argument. 

ex: square = lambda x1:x1 * x1 => Here, we define square as lambda func with one arg "x1". It computes the square. Here the lambda func is assigned to a var "square", which points to the lambda func

print(square(5)) => This calls the var pointing to func "square" with arg =5. It returns 25.

ex: cube = lambda func1:func1**3 => here func1 is just the name of the arg of the lambda func. It receives a value (not a func), and the lambda returns that value cubed.

print(cube(square(2))) => here square(2) is evaluated first and returns 4. This 4 is then passed to cube, which returns 64
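Lambda functions are most often passed directly as args to other functions. A small sketch using the builtin sorted() and map() (the list contents are sample data):

```python
square = lambda x1: x1 * x1
cube = lambda x1: x1 ** 3

print(square(5))            # 25
print(cube(square(2)))      # 64

# A lambda as the sort key: order words by their length
words = ['window', 'cat', 'defenestrate']
by_length = sorted(words, key=lambda w: len(w))
print(by_length)            # ['cat', 'window', 'defenestrate']

# Passing a function (here the lambda held in "square") to map()
squares = list(map(square, [1, 2, 3]))
print(squares)              # [1, 4, 9]
```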

More Topics: More advanced topic are in next section.

Perl

perl stands for practical extraction and report language. It is designed for tasks that are too heavy for shell, and too complicated to code in C.

perl is highly portable. It runs on any unix like system that has a C compiler. It runs on most platforms, since the package comes with a configuration script that pokes around directories looking for things it requires, and adjusts include files and defined symbols accordingly. Perl originated in the late 1980's (Perl 1.0 was released in 1987), became very popular in the 1990's, but now is getting overshadowed by python. Before perl came, awk and sed scripting languages were used. Perl was a big improvement over these, which contributed to its rise. However, syntax wise, python is easier for beginners than perl. I've included perl on this site, as many legacy programs at various companies are still written in perl, which you may need to debug, so knowing a little perl is going to be useful. However, if you are looking to learn a new scripting language, move to python. Python has a lot more support than perl, and is increasingly preferred for future scripts.

Unlike shell pgm, perl is not a true interpreter . It compiles the file, before executing any of it. Thus it is compiler and interpreter (just like awk and sed).

Link for beginners (There are lot of other useful link for beginners on this site.): http://perl-begin.org/tutorials/perl-for-newbies/ 

Official perl documentation: https://perldoc.perl.org/

perl version: very important to verify version before starting to work, as syntax/features changes a lot b/w versions. Perl version 5 and beyond have lot more changes compared to earlier versions.

perl -v => returns v5.18.4 on centOS 6 running on my machine.

perl -V => (note capital V). This shows lot more details as compiler, library, flags, etc used for perl on this system

simple perl pgm: test.pl => we name the file with extension .pl as a convention. Unix doesn't care about file extensions, as it's not used for anything.

#!/usr/bin/perl
use strict;
use warnings;

print "Hello ARGS = @ARGV $0 \n";

Save file above as test.pl, then type:

chmod 755 test.pl => This makes the file executable.

./test.pl cat dog => this gives "Hello ARGS = cat dog ./test.pl"

Basic Syntax:

Just like any other programming language, perl has variables to store diff data types, has reserved keywords or commands and special characters. These allow the language to do all sorts of tasks.


1. comments: start with # till the end of line. No multi line comments


2. semicolon ; => all stmt terminated by ;


3. whitespace (spaces, tabs, newline, returns) => optional. whitespace is mandatory only if putting 2 tokens together can be mistaken for another token, else not needed. However, as we have seen with other scripting languages, we should always put whitespace to avoid ambiguity

4. curly braces {} => curly braces are used to group bunch of stmt into 1 block. Mostly used with control stmt.


5. parenthesis () => parenthesis for built in functions like print are optional. ex: print ("Hello\n"); print "Hello";

6. use <mod_name> <LIST>; => This function imports all the functions exported by MODULE, or only those referred to by LIST, into the name space of the current package. LIST is optional, but saves time and memory, when all functions in MODULE are not needed

ex: use Cwd qw(cwd chdir); => imports functions "cwd" and "chdir" from module Cwd

ex: use Time::HiRes "gettimeofday"; => imports function "gettimeofday" from module Time::HiRes

use strict; => this is a perl pragma that requires all var to be declared before being used (var need to be declared with "my" or "our", else it generates an error). A pragma is a directive to the compiler/interpreter on how to process its i/p (sort of cmd line options). "use" stmt applies the pragma at compile time, even before the pgm starts running. strict is enabled implicitly only when the script asks for perl 5.12 or later (i.e "use v5.12;"), otherwise it has to be coded explicitly.
ex: my $x=2; => "my" makes the var local (lexical) to the enclosing block or file, so it is not visible once code is out of this scope. This helps prevent conflicts from having the same name in multiple places. This is useful in subroutines, as all var are global by default. Since we used "strict" above, if "my" wasn't used to declare $x, then any reference to $x would be an error (i.e $x=2 is an error). This helps to find typing errors. NOTE: $a and $b are special var used by sort, and are exempt from this check, so avoid them as names for your own var.

ex: our $q; => an "our" var is a package var, accessible from any code that uses or requires that file/package by prepending the appr namespace, i.e $pkg1::q (if we used my $q, then $q won't be accessible outside the file/block where it was declared)

use warnings; => This turns on warnings, and got introduced in perl 5.6. It has a similar effect to -w on 1st line of perl (#! ... -w), except that it applies only to the enclosing file or block.
reserved words are almost always lowercase. So, use uppercase for user defined var. var names are case sensitive. so var and VAR are 2 diff names.

Data types:

Perl has 3 data types: scalar, array of scalar and hash of scalar. Any var not assigned has undef value (treated as 0 for num and empty string for string). print function can be used to print scalar and array data type directly, while a hash is usually printed one element at a time.

1. scalar: preceded by $ sign, and then a letter followed by letters,digits,_. It's single unit of data which may be int, float, char, string or a reference.  Data itself is scalar, so var storing it is scalar.
operators (as + or concatenate) can be used on scalars to yield scalars.
ex: $salary=25.1; $name="John Adf"; $num = 5+ 4.5; #perl interprets scalars to get correct computation. Note: 5+4.5 with no spaces also works.
ex: $camels = '123'; print $camels + 1; => prints 124 as scalars are interpreted automatically depending on operator

Scalar comes in 3 different flavors: number, string or reference.

A. Numbers: Numbers can be int or float. Internally perl stores a number as a native integer or as a double precision float, whichever is needed, and converts b/w them automatically. number literal can be 1.25, -12e24, 3485, etc. These are called as constants in other pgm languages. perl supports octal and hex literals also. Numbers starting with 0 are octal, while those starting with 0x are hex.

$num ="129.7"; => here even though number is in double quotes and a string, it will be converted to number if numeric operator (i.e +) used


B. Strings: seq of char, where each char is 8 bit value from entire 256 character set. string literals can be any char in these 2 flavors:
  I. single quoted strings: anything inside ' ... ' is treated as it is (similar to bash where it hides all special char from shell interpretation), except for 2 exceptions = backslash followed by single quote and backslash followed by backslash. backslash followed by anything else is still treated as it is.
    ex: 'don\'t' => this is converted to string don't. 'don't' gives a syntax error, as it treats don as a string and sees t' later which is not valid token.
    ex: 'hello\\n' => this is treated as hello\n. 'hello\n' will be treated as it is.
    ex: $cwd = 'pwd' => here $cwd holds the literal string pwd, and not the current dir, as single quotes never run a cmd. To run the pwd cmd and capture its o/p, backticks are used instead: $cwd = `pwd`
  II. double quoted strings: acts like c string. It is similar to bash where all whitespace char are hidden from shell, but all other special char are still interpreted. Here backslash takes full power to specify special char as \n=newline, \x7f=hex 7f, etc. Also, variables as $x are interpolated in "...", while they aren't in ' .. '.
    ex: "coke\tsprite" => coke tab sprite => tab space is added b/w coke and sprite

NOTE: print function can have both single quotes or double quotes, and they are treated same way as above.

ex: $a='my name'; print $a; => prints var a, which is "my name"

ex: print "$a";=> will print "my name" as substitution done within " ... "

ex: print '$a'; => will print "$a" as special char are treated as is within ' ... '

C. Reference: This is explained below after array and hash.


2. array: preceded by @ sign and stores ordered list of scalars. array list @var also accessed via $var[0], $var[1], etc
ex: @ages = (25,30,40); print "$ages[0] $ages[1] $ages[2]" => 25 30 40
ex: $#ages => gives index value of last element of @ages, in this case 2 (since 0,1,2)
ex: $ages = @ages; => since $ages is scalar, the array is evaluated in scalar context and its length is assigned to $ages, which is 3. So, scalar(@ages)=$#ages+1 => always true since scalar() returns length of array. NOTE: $ages = (25,30,40); is different: on a literal list, the comma operator in scalar context returns the last value, so $ages is 40.
ex: @names = ("john W", @ages, "amy r12", 1); => @ages is flattened into the list, so @names has 6 elements: "john W", 25, 30, 40, "amy r12", 1
ex: print "@names"; => This will print the whole array (no need to separate it out into individual elements (as $names[0], etc)
ex: ($me,$lift)=@names; => sets $me="john W", $lift=25 (the first 2 elements of @names). Remaining elements are ignored.
ex: ($alpha, $omega) = ($omega, $alpha); => this swaps the 2 values, occurs in parallel (not like C)

various builtin functions available for array:
A. push/pop, shift/unshift, reverse, sort
ex: sort("Fred", "Ben", "Dino") => returns Ben, Dino, Fred.
ex: @guys=("Fred", "Ben"), others is a func that returns Dino. sort(@guys, others()) => returns same sorted list as above.
B. chomp(@ages); => removes the trailing newline (if any) from each element of array. To remove the last char whatever it is, use chop.

C. qw => quote word function creates a list from non whitespace parts b/w (). Instead of bracket, we can also use other delimiters as {..}, /../, etc.
ex: @names = qw(john amy beth); => creates list "john","amy","beth". no " ... " or , required. list built by removing whitespaces.

D. q => this returns single quoted string, no whitespace separation or interpolation of any var done. The whole string is returned as if it were in single quotes.

ex: $a = q(I am good $NAME is); => $a = 'I am good $NAME is' ($NAME is not substituted)

E. scalar(@_); => returns num of elements in array. See ex above.

3. hash: preceded by % sign, and used to store sets of key/value pair. key/value pair may have single quotes, double quotes or no quotes as needed.
ex: %data = ('john p', 1, 'lisa', 25); print "$data{'john p'}"; => prints value 1. "john p" is the key.
    @dat1 = %data; => This assigns hash data to array dat1. So @dat1 = ('john p', 1, 'lisa', 25), though the order of the pairs is not guaranteed, as a hash is unordered. We can also assign one hash to other: %dat2=%data. %data=@dat1 converts array to hash (1st, 3rd, ... entries are keys, while 2nd, 4th, ... entries are values)
ex: %map=(); => clear the hash
ex: %map = (red=>0xff, green=>0x07, ...); #other way of assigning hash values
ex: $global{Name} = "Ben"; => Name is key, while Ben is value
ex: $GLOBAL{"db_path"} = "$GLOBAL{db_root}/$GLOBAL{version}/verification"; => substitution happens to provide complete path

ex: print %map; => This prints all the keys and values run together with no separator and in no particular order, which is rarely useful. print "%map"; prints just the text %map, as hashes are not substituted inside " ... ". To print a hash readably, we use "each" (with a while loop) or "foreach" as below. If a hash is passed into a sub, it gets flattened to the array @_, and printing "@_" prints the list (as arrays are substituted inside " ... ").

various builtin functions available for hash:
1. keys: keys(%ages) => returns all keys. i.e returns all odd numbered elements of the list used to build the hash (1st,3rd,5th,etc)
2. values: @num = values %ages; => returns all values, i.e all even numbered elements. note no brackets as they are always optional
3. each: returns the next key/value pair of the hash on each call
ex: while (($name, $age) = each(%data)) { print "$name $age\n"; } => each key/val pair assigned to var.
ex: foreach $key (keys %ages) {print $ages{$key};} => other way to access key/val pair

NOTE: => is used in hash, but one other use is as fat comma. It's a replacement for comma.

ex: Readonly my $foo => "my car"; #here Readonly module (an alternative to constant) is called. Its syntax is 'Readonly(my $foo, "my car")' to assign "my car" to $foo as constant. Since ( ) around args in sub are optional, this can be written as 'Readonly my $foo, "my car"'. Here , can be replaced with =>, resulting in 'Readonly my $foo => "my car"'.


NOTE: hash can be converted to array which can be converted to scalar($age). Array is just collection of scalars stored in index 0 onwards ($age[0]), while hash is collection of scalars stored in array whose index values are arbitrary scalars($age{john}).
perl maintains every var type in a separate namespace. So, $foo, @foo and %foo are stored as 3 different var, so no conflict.

typeglob: Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a * , because it represents all types. This used to be the preferred way to pass arrays and hashes by reference into a function, but now that we have real references (see below in subroutine section), this is seldom needed.

The main use of typeglobs in modern Perl is to create symbol table aliases.

ex: *this = *that; => this makes $this an alias for $that, @this an alias for @that, %this an alias for %that, &this an alias for &that, etc. Much safer is to use a reference, as shown below.

ex: *var1{ARRAY} is same as \@var1, as both are ref to array @var1. *var1 by itself stands for all var named var1 ($var1, @var1, %var1, &var1, etc).

Another use for typeglobs is to pass filehandles into a function or to create new filehandles. If you need to use a typeglob to save away a filehandle, do it this way:

$fh = *STDOUT; #here we get ref to STDOUT by prefixing it with *, and store that ref in scalar $fh. We can also do it as a real reference like this: $fh = \*STDOUT; Now, $fh can be used instead of STDOUT, i.e

ex: print $fh "print this line"; #instead of "print STDOUT "print ...""

ex: use LogHandle; $fh = LogHandle->hijack(\*STDOUT); $fh->mute(); *fh->autoflush(); #here we are passing ref to STDOUT to LogHandle::hijack module. We get as return value a scalar $fh. We can call functions using $fh or *fh. This is perfectly valid.

1. Scalar: we talked about numbers and strings in scalars, but there's a third kind of scalar called reference.

C. Reference: references are similar to pointers in C. They hold the location of another value, which could be a scalar, array, hash or sub. Because a reference is itself a scalar (an addr is a scalar quantity), it is stored as $var and can be used anywhere a scalar can be used. In these notes, references are split into static and dynamic (perl docs call the second kind references to anonymous data).

1. static reference is one where changes made via the reference change the original var. To create a static reference to any var, precede that var by backslash (\).

$scalarref = \$foo; => Here we create ref (addr) for var $foo, by preceding it with \. Now, $scalarref has addr of $foo

$arrayref = \@ARGV; => for array

$hashref = \%ENV; => for hash

$coderef = \&handler; => function/subroutine

$globref = \*foo; => globref for foo (foo may be scalar, array, hash, func, etc.)

Function reference: ex: sub print_m { ... }; $print_ref = \&print_m; &$print_ref(%hash); => calling func by ref. Useful in OO pgm.

Function ref(): This returns the type of thing a reference points to. So, ref($mapref) returns HASH (since $mapref is referencing a hash). It can return SCALAR, ARRAY, HASH, CODE, etc. If arg is not a reference, then it returns an empty string (which is FALSE).

2. Dynamic reference is one that points to a new anonymous copy of the data instead of a named var, so changes made via this reference don't change any original var. To create one, enclose the list within [ .. ] for array and within { .. } for hash. Mostly used with constants (i.e when we don't have a var assigned to store these constants)

array: Use [ ... ]

@ages = (25,30,40);=> stores array in var @ages.

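$ages = @ages; => array in scalar context gives its size. So, $ages=3. NOTE: $ages = (25,30,40); does NOT give the size: the comma operator in scalar context returns the last value, so $ages=40.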

$agesref = [25,30,40, ['q','a']]; => Since square brackets used, it creates copy of this and stores ref of that array in $agesref

$agesref = [ @ages ]; => this creates dynamic ref to a new array holding a copy of the elements of @ages

hash: Use { ... }

%data = ('john p', 1, 'lisa', 25); => stores hash in var %data.

$dataref = {'john p', 1, 'lisa', 25}; => Since curly brackets used, it stores ref of this hash in $dataref

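%map = (red=>0xff, green=>0x07, blue=>{magenta=>45, ...}, ..); => other way to store hash. A nested hash has to go in { ... } (a hashref), since a nested ( ... ) list would just be flattened into the outer list.

my $mapref = {red=>0xff, green=>0x07, blue=>{...}, other=>[ ...], ...}; => Since curly brackets used, it creates a new anonymous hash and stores ref to it in $mapref. Note that inside we can have multi level hash/array (i.e other is an arrayref in this ex, since it has array in [ ... ]).

$hashref = { %data }; => this creates dynamic ref to a copy of hash %data (curly brackets, not square ones: [ %data ] would give an arrayref holding the flattened keys and values)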

DeReference of var: Dereferencing means getting the var back from the addr. It's same for both static or dynamic ref. Use $, @ or % in front of ref var. We can use { ... } around the ref var for clarity or when the expression inside them is complex.

$scalarderef = $$scalarref; or ${$scalarref}; => putting a $ in front of the ref var gets the value pointed to by that ref var.

@arrayderef = @$arrayref; or @{$arrayref}; => printing @arrayderef prints the whole array (though with no spaces). Assigning to a scalar instead ($n = @$arrayref) gives the number of elements.

%hashderef = %$hashref; or %{$hashref}; => printing %hashderef prints the whole hash key+value (though with no spaces)

&$coderef(args); => function call using reference. To call function via indirect way, we do "&handler(args)". See in Function section below.
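
A minimal sketch of the reference rules above. The variable names and values ($foo, @ages, %data) are made up for illustration. It shows that a ref made with \ changes the original, while a ref made with [ ... ] works on a copy.

```perl
use strict;
use warnings;

my $foo  = "car";
my @ages = (25, 30, 40);
my %data = (john => 1, lisa => 25);

# References to existing variables: changes through the ref hit the original.
my $scalarref = \$foo;
my $arrayref  = \@ages;
my $hashref   = \%data;

$$scalarref = "bike";          # $foo is now "bike"
push @$arrayref, 50;           # @ages now has 4 elements
$hashref->{lisa} = 26;         # $data{lisa} is now 26

# Reference to a copy: the original array is left alone.
my $copyref = [ @ages ];
$copyref->[0] = 99;            # $ages[0] is still 25

my $count     = @$arrayref;    # scalar context gives the element count
my @all       = @$arrayref;    # list context gives the elements
my $kind      = ref($hashref); # "HASH"
my $not_a_ref = ref($foo);     # empty string for a non-reference

print "foo=$foo count=$count first=$ages[0] kind=$kind\n";
```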

arrow operator ->: An arrow operator is used in C pgm to access individual elements of struct pointer (reference to struct). i.e for struct person *p_ptr with element age, we do "p_ptr->age". We use similar concept in perl to access elements of array or hash reference.

1. array: use -> followed by [ .. ]. Inside [ ], enter index "n", which gets the value of the (n+1)th element of array, since index starts from 0.

ex: $cont = [ 1,2,"ab","cd"]; $cont->[3] refers to 4th element = cd

2. hash: use -> followed by { ... }. Inside { }, enter the "key", which will get the value corresponding to the key

ex: $cont = {"title me"=>"a", name=>"john c", addr=>{city=>"aus", zip=>12231} }; $cont->{"title me"} gets value "a". $cont->{addr}->{city} gets value "aus"

3. mixture of array and hash: use [ ]  for array and { }  for hash. For multilevel, we may omit subsequent -> after the first one.

ex: $cont = {"title me"=>"a", name=>"john c", addr=>{city=>"aus", zip=>12231, addr=>[{street=>"main"}, {house=>201}] } }; print "$cont->{addr}->{addr}->[1]->{house}" prints value "201". Same thing can be written as $cont->{addr}{addr}[1]{house}
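
The nested lookup above can be tried out with a short sketch like this one. It uses the same data as the example, with the string values quoted so that it runs under "use strict".

```perl
use strict;
use warnings;

my $cont = {
    "title me" => "a",
    name       => "john c",
    addr       => {
        city => "aus",
        zip  => 12231,
        addr => [ { street => "main" }, { house => 201 } ],
    },
};

my $title = $cont->{"title me"};            # key with a space needs quotes
my $city  = $cont->{addr}->{city};          # explicit arrow at every level
my $house = $cont->{addr}{addr}[1]{house};  # arrows between subscripts omitted

# Dereference the inner arrayref to get the whole array back.
my @inner       = @{ $cont->{addr}{addr} };
my $inner_count = @inner;

print "city=$city house=$house entries=$inner_count\n";
```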

4. class/subroutine (or object/method): use -> followed by method name and ( ... ) => We use ( ) for args of method, and not for method itself. This is used in OOP section later. The thing on left of -> is either a class (package) name or an object, which is a reference to data that is associated with a class.

ex: $class1->new("matt",10); #Here $class1 is a scalar holding a package name (we could also write the package name directly, as in class1->new("matt",10)). We are calling subroutine named "new" in this class.

ex: $obj->{name}; #here $obj is an object of that class, stored as ref to hash. So, it's similar to case 2 above, where we get the value corresponding to key "name". NOTE: { ..} used here since it's referring to hash object.


operators: these operate on scalar or list, and return scalar or list. () defines precedence of operations in case of ambiguity.

scalar operators:
1. for numbers: arithmetic: +,-,*,/,**(exponent),%, comparison (returns true/false): <,>,<=,>=,==,!=
2. for strings:
   A. concatenation (.). ex: "hello"."world" => "helloworld"
   B. comparison: eq, ne, lt, gt, le, ge. ex: 7 lt 30 gives false, as 7 and 30 are treated as strings, and string "30" comes before string "7" as 3 has lower ascii code than 7. If numeric operator < was used, then it would return true as literals would be converted to numbers, and 7<30 is true.
   C. string repetition: consists of single lowercase letter x. ex: "fred" x 3 => "fredfredfred"
      ex: $s = (3+2) x 4; => "5555" as 3+2=5 is treated as string since there's a string operator on it. (In list context, a left operand in ( ) is repeated as a list instead: (5,5,5,5).)

perl converts numbers to strings or viceversa depending on operator. If operator is numeric, operands are converted to numbers, and if operator is string, then numbers are converted to string. If a string doesn't look like a number, it is treated as 0 by a numeric operator, and a warning is printed if warnings are ON (-w or "use warnings"). So, even though perl doesn't have types for scalar, it uses operator type to figure out whether to treat an operand as number or string.
ex: $name="john"; print $name+1; => here john can't be converted to number, so it's treated as 0 and 1 is printed, with warning "Argument "john" isn't numeric in addition (+)"
ex: $name="123"; print $name + 1; => here 123 can be converted to numeric, so + is carried out and 124 printed. NOTE: spaces don't matter
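
A small sketch of how the operator decides between number and string treatment, using the values from the examples above. The "no warnings" block is only there to keep the expected "isn't numeric" warning off the screen.

```perl
use strict;
use warnings;

my $num_cmp = (7 <  30) ? "true" : "false";  # numeric compare: 7 < 30
my $str_cmp = (7 lt 30) ? "true" : "false";  # string compare: "7" sorts after "30"

my $repeat = "fred" x 3;     # "fredfredfred"
my $fives  = (3 + 2) x 4;    # scalar context: string "5555"
my @fives  = (3 + 2) x 4;    # list context: (5, 5, 5, 5)

my $digits = "123";
my $sum    = $digits + 1;    # "123" converts cleanly, so 124

my $name = "john";
my $bad;
{
    no warnings 'numeric';   # silence "Argument "john" isn't numeric"
    $bad = $name + 1;        # "john" counts as 0, so 1
}

print "num=$num_cmp str=$str_cmp fives=$fives sum=$sum bad=$bad\n";
```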

= is also an operator.  $a=17; => $a gets value 17, but this whole expression is also given value of $a (which is 17)
 ex: $a= ($b=15); => b is assigned 15, but then a is assigned value of ($b=15) which is $b which is again 15.

shorthand operators:
$a += 3; => $a=$a+3;
$str .= "me"; => $str = $str . "me";
$e = ++$a; => $a=$a+1;$e=$a; => prefix version of autoincrement, $e gets incremented value
$e = $a++; =>  $e=$a;$a=$a+1; => postfix version of autoincrement, $e gets non-incremented value

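defined operator: scalar can be defined or undefined. defined($x) returns TRUE unless $x is undef. This is different from Boolean test of the scalar itself: undefined scalar, scalar with null string ("" , i.e nothing within the string), number 0 and string "0" are all interpreted as FALSE when scalar is used in Boolean expr, while anything else is treated as TRUE. So "" and 0 are defined, but FALSE.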

ex: if (defined($args)) { ... } #We can omit brackets around args of function. so "if (defined $args)" is also valid


control stmt: control expr is evaluated as string.  If empty "" or "0" string, treated as false, everything else is true

1. if/else:
if ($ready) { $a=1;}
elsif (...) { ...}
else { ... }

2. while/until: while => repeat while expr is true. until => repeat while expr is false (i.e until it becomes true)
while ($tcks <100) { $sum += ... }
while (@ARGV) { process(shift @ARGV); }

3. do/while: with while, if cond is false, loop will not execute even once. do/while causes loop to execute at least once
do { ... } while ($cnt <100);

4. unless:
unless ($dest eq $home) {print ...;}

5. for/foreach: These can be converted into equiv while stmt.
for ($sold=0; $sold<100; $sold++) { ... }
foreach $user (@name) { if ($user ...) { ... } } => here each element of @name is assigned to $user and loop run for each element. Modifying $user modifies original list (since $user is an alias to the element and NOT a copy)

6. next/last: next allows to skip to next iteration, while last allows to skip to end of block, outside of loop
foreach $user (@user) {
  if ($user eq "root") {next;} #skip to next iteration
  if (...)             {last;} #comes out of loop
}
If we name a loop with a label, then we can specify which loop to break out of by giving the label name.
LINE: while ($line = <FILE1>) { # this loop is named LINE
       last LINE if $line eq "\n"; => we get out of loop LINE when we encounter 1st blank line
       next LINE if $line =~ /^#/; => skip comment line

      do something .....
      }
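
Here is a runnable sketch of the labelled loop above. Instead of reading from FILE1, it loops over a made-up list of lines, so that it can be run without any input file.

```perl
use strict;
use warnings;

# Stand-in for lines read from a file (made-up data).
my @lines = ("# header comment\n", "alpha\n", "beta\n", "\n", "gamma\n");
my @kept;

LINE: foreach my $line (@lines) {
    last LINE if $line eq "\n";   # stop at the first blank line
    next LINE if $line =~ /^#/;   # skip comment lines
    chomp(my $text = $line);      # drop the trailing newline
    push @kept, $text;
}

print "kept: @kept\n";
```

"gamma" is never reached, since "last" leaves the loop at the blank line before it.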

7. goto: ex: goto LINE;

8. switch: For switch cmd to work, "Switch" module needs to be used, which requires some other modules to be installed. syntax same as in other languages.

use Switch;

switch(arg) {

 case "a"  {print "name"; .... }

 case /\w+/ {print "..."}

 else { print ...}

}


Built in functions: perl provides a lot of built in functions that are very helpful. Most of the time you can use these functions to write more complex ones

1. chop($x) => it takes a scalar var, and removes last char from string value of that var
ex: $x="hello"; $y=chop($x); => $x becomes hell. $y gets assigned the chopped char "o".

2. chomp($x); => removes only the newline char at end if present, else does nothing.

3. print/printf: printf is C like providing formatted o/p
ex: printf "a=%15s b=%5d c=%10.2f \n",$a,$b, $c; => string $a is printed in 15 char field, decimal number $b in 5 char field, fp num $c in 10 char field with 2 decimal places

4. split/join:
ex:@fields = split(/:/,$line); => split $line using : as delimiter and assign it to $fields[0], etc.
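
A short sketch of split together with join, which is not shown above. The input line is a made-up passwd style record.

```perl
use strict;
use warnings;

my $line   = "root:x:0:0:root:/root:/bin/bash";  # made-up sample record
my @fields = split(/:/, $line);                  # break the line on ":"

my $nfields = @fields;                  # number of fields
my $user    = $fields[0];               # first field
my $shell   = $fields[-1];              # index -1 is the last field
my $joined  = join(",", @fields[0, 2]); # glue fields 0 and 2 back with ","

print "fields=$nfields user=$user shell=$shell joined=$joined\n";
```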

5. system cmds: any linux cmd can be executed using system or backtick

A. system cmd: running unix cmds using "system" should be avoided where possible, as it makes perl unportable and may break perl script on other machines. This is because a cmd string with shell metachar is handed to the system shell (/bin/sh on unix), and the external cmd itself may be missing or be a different version on another machine. Also, the return status of system cmd is 0 on success (any non zero value indicates a failure, which is opposite of how most perl functions behave). Other problem is that system cmd has more than one form (single string vs list of args), and depending on which one is used, it may behave differently. So, avoid "system" cmd where perl has an equivalent. Instead use perl built in functions as mkdir, chdir, etc.


system("date"); #
$status=system($cmd); #runs whatever $cmd is. $status is assigned 0 on success
system "grep fred in.txt >output";
system "cc -o @options $files"; #var substitution occurs

B. backtick or qx: any cmd inside backtick or qx is executed. backtick is an operator.
my $output = `script.sh --option`; #using backtick, cmds within `` are executed and whatever they print to STDOUT is returned (in this case stored in $output)
my $output = qx/script.sh --option/; #similar to above as qx/.../ same as ``

We have system cmds for cd, pwd, etc that we can execute using "system" or backtick. However, it's preferred to use perl provided modules for doing this, as they work across all platforms. These modules eventually end up making the system call, but do it cleanly.

1. getcwd: This gets current working dir. same as unix "pwd" cmd.

use Cwd qw(getcwd);

$cur_dir = `pwd`; => this returns unix pwd but has a trailing newline at end. This is stored in var $cur_dir

$cur_dir = getcwd; => same as above, except no newline at end

2. chdir: This changes dir to specified dir. same as unix "cd" cmd.
use Cwd qw(chdir);

$save_pwd_dir = `pwd`;

chomp $save_pwd_dir;

$status=chdir($save_pwd_dir);=> since `pwd` above returns newline at end, using it directly in chdir will return status of 0 (i.e error, so no cd happens). Only once we remove the newline by using "chomp" will chdir work and return status of 1 (i.e success)

$cur_dir= getcwd;

chdir($cur_dir); => This cmd works and returns status of 1, since there's no newline in $cur_dir (since getcwd cmd was used)

chdir("/home/ajay"); => Here we change to given dir, by directly specifying the name

6. here: It's not a function. It's the same "here" document as in bash script. syntax is "<<IDENTIFIER; .... Any Stmts .... IDENTIFIER". The same effect can be achieved with print stmt, but that will need multiple print cmds, one for each line. To interpolate variables in stmts use double quotes around "IDENTIFIER" (this is also the default when no quotes are used), else use single quotes 'IDENTIFIER'.

ex: below ex in perl script will print the 2 lines of text on screen, since default for print is STDOUT. newlines if present in text are automatically printed.

print <<Foo;

My name is

You are ill)

Foo

ex:below will print the stmt in $file1 handle which is opened in write mode

open(my $file1, '>', "file.txt") or die $!;

print $file1 <<My_text;

this is test;

My_text
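
The difference between double quoted and single quoted identifiers can be seen in this sketch. The name in $name and the END_TEXT identifier are made up for illustration. The here text is stored in a var instead of being printed directly, which works the same way.

```perl
use strict;
use warnings;

my $name = "Ajay";   # made-up value

# Double quoted identifier: $name is interpolated.
my $interp = <<"END_TEXT";
My name is $name
END_TEXT

# Single quoted identifier: text is taken literally.
my $literal = <<'END_TEXT';
My name is $name
END_TEXT

print $interp;
print $literal;
```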

 
subroutines:

Declare: ex: sub NAME1; #forward declaration of a subroutine NAME1. If we have a prototype, use sub NAME1(PROTO); All sub have default arg list stored in @_ array, which can be used inside body of sub (@_ stores args, i.e $_[0], $_[1]). Each invocation of a sub gets its own @_, so nested sub can be called w/o these values getting overwritten. NOTE: elements of @_ are aliases to the caller's args, so assigning to $_[0] changes the caller's var; copy them into my vars to work on local copies.
To declare and define all in one place, just add the block to it. i.e sub NAME1 {BLOCK } => NOTE: body is in { ... }. There is no named arg list in a traditional sub defn, as @_ will store the args for any sub (a ( ... ) after the name is a prototype, and not a list of arg names). ( ) brackets are optional only when calling a sub that is already declared, so "NAME1 arg1, arg2;" is a valid call.


ex: sub say_hello { print "hello $what"; return $a+$b; } => any var used within sub are global by default (diff than conventional C pgm). To make a var local, declare it with my() i.e: my($sum, @arr, %a); my($n,@values)=0; my $a; "local" can also be used, but it only gives a temporary value to a global var for the duration of the block (and anything called from it), so my is preferred. Return value is what's specified, or the last expression evaluated. We can return any data type, i.e. scalar, array, hash. If no return value provided, then the last calc performed becomes the return value (if print is the last calc done, then 1 is the return value on success). If we return more than 1 array or hash or a combo, then they are flattened into 1 list and their separate identities are lost. In such cases we use references.

To call subroutine, 2 ways
1. direct calling: Here we call by directly providing name of sub with optional args. ex: NAME1; NAME1(list); NAME1 LIST; => any of these 3 ways is fine, but the forms w/o ( ) work only if the sub is declared before the call.
ex: $a=3+say_hello(); => here sub returns value of $a+$b
ex:
sub bg { my(@values) = @_; my @result; foreach $_ (@values) { push @result, $_; } return @result; }
@val = bg(1,2,3); => any number of args can be provided as @_ stores them in array (as many as needed). Note, sub above doesn't have any arg list in its defn (it's implied). Return value is stored in array @val.
 
my $cont = get_contents(); => this stores return value from func in scalar $cont. If return value is array or hash, then conversion happens, as explained in array/hash section above.


2. indirect calling: Here we call with & in front of the name, or via a reference to function (i.e pointer to addr of func). The & form was needed in old versions of perl (Perl 4), but is not recommended now. ex: &NAME1; (NOTE: &NAME1; with no ( ) passes the current @_ on to NAME1)
ex: &bg(1,2,3); => same o/p as above, except that func is called with &.

ex: $func_ref = \&bg; &$func_ref(1,2,3); => same o/p as above. Here addr of func "bg" is passed on to $func_ref. Now we access "bg" by dereferencing $func_ref. $func_ref->(1,2,3) does the same thing.

Passing args to functions: Args can be any data type, and they can be passed via value or via reference. We pass them via reference, when we want to alter the original arg itself.

ex: $tailm = my_pop(\@a, \@b); Here arrays @a, @b are passed by reference, so whatever my_pop does to the arrays via these refs (which it gets in @_) modifies @a and @b too.
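
The body of my_pop is not given in these notes, so the sketch below is just one possible version, written for illustration: it pops the last element off every array passed by reference and returns the popped values. It shows that the caller's arrays get modified.

```perl
use strict;
use warnings;

# Hypothetical my_pop: takes array refs, pops the last element of each.
sub my_pop {
    my @popped;
    foreach my $aref (@_) {
        push @popped, pop @$aref;   # pop works on the caller's array
    }
    return @popped;
}

my @a = (1, 2, 3);
my @b = (10, 20);

my @tails = my_pop(\@a, \@b);       # arrays passed by reference

print "tails=@tails a=@a b=@b\n";
```

Had we called my_pop(@a, @b) instead, both arrays would have been flattened into one list in @_, and the sub could not tell where @a ends and @b starts.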

 

module / package:

  1. module: A Perl module is a reusable collection of related variables and subroutines that perform a set of programming tasks. There are a lot of Perl modules (>100K) available  on the Comprehensive Perl Archive Network (CPAN). You can find various modules in a wide range of categories such as network, XML processing, CGI, databases interfacing, etc. Each perl module put in it's own separate file called as file1.pm, having same syntax as perl file. It can be loaded by other pgm or modules, by using do, require or use.
  2. Package: Packages are perl term for namespaces. Namespaces enable the programmer to declare several functions or variables with the same name, and use all of them in the same code, as long as each one was declared in a different namespace. Packages are the basis for Perl's objects system (explained later). Our main perl script itself is in "main" package (so all var can be referenced as main::a, or just plain "a"). We switch package using "package" keyword. Then our namespace changes to package_name (until the end of file). Now we can use var in this package using new package namespace.

Diff b/w module and package: Although package and module are used interchangeably, they are completely different. package is a container (a separate namespace), while module is a perl file that can contain any number of namespaces. It doesn't need to have any kind of pkg declaration inside it. To load a module in another file, we use any of do/require/use keyword. "use dir1::File1" just loads a file named dir1/File1.pm. To remove confusion b/w these, perl programmers obey these 2 laws, so that package and module can be treated as same thing:

  1. A Perl script (.pl file) must always contain exactly zero package declarations.
  2. A Perl module (.pm file) must always contain exactly one package declaration, corresponding exactly to its name and location. So, every module goes with same package name.

I. writing your own module: Filelog.pm => pm means perl module

package Filelog; => makes Filelog  module a package. We adhere to law 2 above (name of module file exactly same as package name, or else it will error out). So, now namespace is "Filelog" instead of main or anything else. So, we don't have to worry about using my() for each var. all var/sub from here on will be in namespace Filelog

use strict;

my $LEVEL = 1; #set file level var $LEVEL to 1, so that any subroutine in this file can access it

sub open_my{ .... $a = shift; ... } => write subroutines for diff functions to do

1; => this is required to return a true value from this module to the calling pgm. Newer versions of perl do not require this. We keep it for backward compatibility

II. Using above module in other pgm: pgm1.pl (we do not need separate file for package, we can put all code for package "Filelog" in pgm1.pl too)
    
#!/usr/bin/perl
use strict;
use warnings;
 
use Filelog; =>load Filelog module (name has to match file name Filelog.pm exactly, including case).  we could use any 1  of these 3 stmt: do, require, use. Since there is also a package declaration with same name, new namespace "Filelog" can be used.
 
Filelog::open_my("logtest.log"); #sub in modules called by using namespace separator (::). args within brackets passed to subroutine "open_my"
 
Filelog::log(1,"This is a test message"); #sub "log" in namespace "Filelog" with 2 args

$STDERR = LogHandle->hijack(\*STDERR); #this is other way of calling sub in package "LogHandle". See in package section later
 

Read cmd line args: All languages have a way of reading cmd line args. We can write our own code to get args or use perl module for that. Getopt is a very popular module to get args of a cmd line.

1. Regular way: All cmd line args in perl are stored in @ARGV array (script name is not included). $#ARGV is the subscript of the last element of the @ARGV array, so num of args = $#ARGV+1. $0 stores the name of the script that we are running

ex: ./test.pl cat dog => here @ARGV stores "cat dog" array. so, $ARGV[0]=cat, $ARGV[1]=dog and so on. $#ARGV=1 (since num of args=2). $0 stores ./test.pl

2. Using Getopt: test.pl

use Getopt::Std; #load Getopt/Std.pm module

my %options=(); => we declare empty hash "options"
getopts("hj:", \%options); => We store args in hash "options", passed by ref. Here we are capturing arg values specified via flags -h -j. : indicates that -j takes a value after it, while -h is a plain flag. So our cmd line is something like this "./test.pl -h -j my_help". There are many different ways of storing args via getopts. Look in perl doc.
print "options $options{h} , $options{j}\n";

run: ./test.pl -h -j amit => prints "options 1 , amit"
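
To try getopts without typing a cmd line, we can fill @ARGV by hand, as in this sketch. The extra arg "extra" is made up, to show that anything that is not an option is left behind in @ARGV.

```perl
use strict;
use warnings;
use Getopt::Std;

# Simulate the cmd line "./test.pl -h -j amit extra" (made-up args).
@ARGV = ("-h", "-j", "amit", "extra");

my %options = ();
getopts("hj:", \%options) or die "bad options";

my $h_flag = $options{h};   # 1, since -h is a plain flag
my $j_val  = $options{j};   # "amit", since -j takes a value
my @rest   = @ARGV;         # getopts removes the options it has parsed

print "options $h_flag , $j_val , rest=@rest\n";
```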

Signal trap pragma:

use sigtrap qw(handler my_handler normal-signals); => This pragma is a simple i/f for installing signal handlers, so that when the program is abruptly killed, we can do graceful exit, by having a sub execute on receiving the signal. Here "my_handler" sub is called on getting a signal. There are many signals as INT, ABRT, TRAP, etc that cause perl script to terminate. The last arg "normal-signals" says to employ this handler only for normal signals, i.e INT, TERM, PIPE and HUP, and not for other signals.

sub my_handler {

   my $signal = shift; #gets the signal causing the pgm to terminate

   die " Pgm killed with signal $signal";

}

special code blocks:

There are five specially named code blocks that are executed at the beginning and at the end of a running Perl program, if present in the pgm. These are the BEGIN, UNITCHECK, CHECK, INIT, and END blocks. These code blocks are not subroutines, even though they look like it. "BEGIN" is executed as soon as it is compiled, before the rest of script runs, while "END" block is run at the very end, just before the interpreter exits. Multiple BEGIN, END, etc blocks can be in same pgm. BEGIN and INIT blocks are executed in the order in which they appear in code, while UNITCHECK, CHECK and END blocks are executed in reverse order. Usually 1 BEGIN, 1 END block suffices.

ex:

END {
  my $program_exit_status = $?; #Inside END block, $? contains the value that the program is going to pass to exit()

  print "Exit status is: $program_exit_status"; #we can have an stmt here that we want to be executed at end

}


Format: Perl supports formatting so that scripting languages "sed" and "awk" may no longer be needed, as perl supports more complex formatting.

format => defines a format, and writes data in that format
ex: defining a format. keyword format NAME = <some format> . => . at end is important
format LABEL1 =
 ==========
 | @<<<<< | => @<<<<< specifies a left justified text field with 6 char (the @ and each < count as 1 char each)
 $name
 | @< |
 $state
 ==========
 .
open(LABEL1, ">file.txt"); => filehandle name needs to be same as format name
($name,$state) = ...;
write(LABEL1); => this writes into file.txt

Regular expressions: These are same as ERE we studied in Linux section. However, perl RE have slight variation from POSIX ERE. Perl RE have become so widely used, that when people say RE, they usually mean Perl RE. Perl RE basics are best explained here: https://perldoc.perl.org/perlre

Perl RE are way of describing a set of strings w/o having to list all strings in the set. All ERE regex still valid. Following are the Perl RE metacharacters:

  • dot . => matches any single char except newline
  • * => matches 0 or more of preceeding char
  • + => matches 1 or more of preceeding char
  • ? => matches 0 or 1 of preceding char.
  • \ => backslash to escape next metachar
  • ^, $ => matches beginning or end of line
  • (), {}, [] => () is for grouping subexpressions, {m,n} and [abc], same as in ERE. These are treated as metachar, so use backslash to use them as literals
  • <> => not metachar on their own. Used inside () for named capture grps, i.e (?<name>...). This is different than ERE as ERE doesn't use this (BRE uses this but for different purpose)
  • | => Or or alternation. Used inside (), but may be used without () too.
  • - => used to indicate range inside []
  • # => comment, but only when x modifier is used (i.e /.../x)

Above metachar are used for pattern matching, substitution, spliting, etc
ex: /foo/ => // is pattern matching operator looking for foo
while ($line = <FILE2>) {
 if ($line =~ /http:/) { print $line; } => matches pattern for http:. =~ is pattern binding operator  asking it to do this
}
while (<FILE1>) { print if /http:/ ;} => our default is $_. pattern binding operator =~ automatically applied to $_. o/p exactly same as above

quantifier:
{min,max} => preceeding item can match min number of times upto max number of times
+ => {1,} matches one or more of preceeding items
* => {0,} matches zero or more of preceeding items
? => {0,1} matches zero or one of preceeding items

common patterns:
/[a-zA-Z]+/ => matches one or more of alphabets
/[ \t\n\r\f]/ => matches any of space, tab, newline etc. Instead of this, we can also use /\s/
/[0-9]/ => matches any digit. Same as /\d/. /\d+/ matches one or more digits
/\d{7,11}/ => matches min 7 digits but no more than 11 digits. ex: telephone number
/[a-zA-Z_0-9]/ => matches any single word char. equiv to /\w/. /\w+/ matches an entire word
/./ => matches any char whatsoever (except a newline). needs to be atleast 1 char
/a./ => matches a followed by . => a followed by any char after that => matches all strings that have a in them, and "a" is not the last char
/\S\W\D/ => uppercase provides negation. \D means any non digit char
/(\d+)/=> match as many digits as possible and put it in var $1. If more (), they are stored in $2,$3 etc.
/\bFred\b/ => \b matches at word boundary. So, this matches "the Fred Linc", but not "Fredricks"
/^Fred/ => matches lines beginning with Fred. ^ is anchor for beginning of line, while $ for end of line
/Fred|Wilma|Bren/ => matches any of 3 names
/(..):(..)/ => matches 2 colon separated fields each of which is 2 char long

pattern matching/substitution:

m/pattern/gimosx =>  m=matching, m is optional when / is the delimiter, as pattern matching is implied. gimosx are modifiers. g=match globally (find all occurrences), i=case insensitive matching.
ex: ($key,$val) = m/(\w+) = (\w+)/ => extracts key value pair from $_. In list context, match returns the list of captured grps. To match against some other var, use ($key,$val) = $line =~ m/(\w+) = (\w+)/

s/pattern/replacement/egimosx => s=substitution
ex: $paragraph =~ s/Miss\b/Mrs/g => substitute Miss with Mrs globally in $paragraph. W/O the =~ binding, it works on $_

ex: $MAILBOX->{'ScriptName'} =~ s/.*\/// => it substitutes the script path name with just the script name (strips out everything before the last /). So, "../dir1/./file.txt" will return "file.txt". Useful when trying to get name of script from cmd line.
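
The matching and substitution examples above can be checked with a sketch like this. The input strings are made up. Each substitution is done on a copy, so that the original string stays around for comparison.

```perl
use strict;
use warnings;

# Match in list context returns the captured groups.
my $line = "count = 42";
my ($key, $val) = $line =~ m/(\w+) = (\w+)/;

# Global substitution; \b keeps "Missy" from being touched.
my $paragraph = "Miss Smith met Miss Jones and Missy";
(my $fixed = $paragraph) =~ s/Miss\b/Mrs/g;

# Strip everything up to the last "/" to get the file name.
my $path = "../dir1/./file.txt";
(my $base = $path) =~ s/.*\///;

# With g modifier in list context, all matches are returned.
my @numbers = "a1b22c333" =~ /(\d+)/g;

print "key=$key val=$val base=$base numbers=@numbers\n";
print "$fixed\n";
```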


Files:
----
To read/write files, we need to create an IO channel called filehandle. 3 automatic file handles provided: STDIN, STDOUT, STDERR corresponding to 3 std IO channels.
open (HANDLE1, "file1.txt") or die "Cannot open for Read: $! \n"; => Read file. $! contains error msg returned by OS
open (HANDLE1, "<$file1"); => same as above. Read file
open (HANDLE1, ">$file1"); => create file and write to it
open (HANDLE1, ">>$file1"); => append to existing file
close (HANDLE1); => need to close file

File test operator: these are unary operators that take a file name (or filehandle) as operand. -e is one such operator
-e $a => true if file named in $a exists
-r $a => true if file named in $a is readable, -w=writable, -x=executable, -d=is_directory

ex:
$name="index.html";
if(-e $name) {print "EXISTS";} else {print "ABSENT";};

<> = Line reading operator
print STDOUT "type number";
$num = <STDIN>; => reads complete text line from std i/p upto first newline. That string is assigned to $num (including \n). <STDIN> returns undef when there's no more data to read (as in end of file). If STDIN is omitted, i.e <>, then lines are read from files named on cmd line, or from STDIN if no files are given
chomp($num); print STDOUT "num is $num"; => \n is removed by chomp before printing.
chomp($num = <STDIN>); => this also works, as chomp acts on the var on LHS of = operator
@num = <STDIN>; => This stores all lines of input in array until CTRL+D is pressed (i,e EOF). Each line is stored separately in $num[0], $num[1] and so on ..

ex:
while (<>) { print $_; } => $_ is the default storage var, when no var specified. When reading from STDIN, this is equiv to
while (defined($_ = <STDIN>)) { .. } => At end of file when there are no more lines to read <> returns undef

ex:
#! /usr/local/bin/perl -w => -w for turning ON warning
$num_args = $#ARGV + 1;
if ($num_args != 1) {
  print "\nUsage: def_report_nets.pl  name_of_def_file\n";
  exit;
}
open (DEF, "$ARGV[0]") || die "Cannot open $ARGV[0] for Read ...";
while (<DEF>) { #or while ($_ = <DEF>)
  if (/count : (\d+) ;/) {
    $count = $1; #$1 is assigned to whatever matches in first (). Here $1 is the text matched by (\d+)
    $count_sum += $count;
    print DEF1 "count = $count, sum=$count_sum"; => write this into DEF1 file handle (assuming it's open for edit)
 }
}
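
A self contained sketch of the same read loop, written with lexical filehandles and 3 arg open, which is the form usually preferred in newer perl code. So that it runs without any input file, it first writes a made-up sample to a temporary file using the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a temporary input file with made-up contents.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
print {$tmp_fh} "count : 3 ;\ncount : 4 ;\nother line\n";
close($tmp_fh) or die "Cannot close $file: $!";

# Read it back line by line and add up the counts.
open(my $in, '<', $file) or die "Cannot open $file for Read: $!";
my $count_sum = 0;
my $matched   = 0;
while (my $line = <$in>) {
    if ($line =~ /count : (\d+) ;/) {
        $count_sum += $1;   # $1 holds the digits matched by (\d+)
        $matched++;
    }
}
close($in);

print "matched=$matched sum=$count_sum\n";
```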

Object Oriented (OO):

perl is both a procedural language as well as an OO language. OO is not the best soln for every problem. It's particularly useful in cases where system design is already OO, and is very large and expected to grow. OO concept in perl is similar to those in other languages. OO system is either prototype based (as in Javascript) or class based (as in most other languages, as C++, Java, Python, etc). Inheritance, overloading, polymorphism, garbage collection are all provided in perl OO similar to other languages. Perl built in OO is very limited, and many OO systems have been built on top of this, which are typically used (Moose is one such ex). For our purpose, built in OO for perl is good enough. class, method, object and attributes are 4 concepts related to OO. I've put this OO section to get some basics, but if you need to do OO, python is preferred (python is preferred in general over perl)

class: Class is a name for a category (like phones, files, etc). package explained above declares a class (i.e package Person; declares class "Person"). In Perl, any package can be a class. The difference between a package which is a class and one which isn't is based on how the package is used.

attribute: these are data var associated with the class. An instantiation of class, known as object, assigns values to these attributes.

method: This class (package) has var and sub that work on these var. Sub used within this package are called methods.

class and method in OO term are thus package and subroutine that we studied earlier.

object: Let's create an instance of this class, which is known as object. When we create an object, we actually create a reference to attr/method in the class. All objects belong to a specific class (i.e we can define an object "LG_phone" belonging to class "phone"). We can have multiple objects for a given class. An object is a data structure that bundles together data and subroutines which operate on that data. An object's data is called attributes, and its subroutines are called methods. You can use any kind of Perl variable (scalar, array, hash) as an object in Perl. Most Perl programmers choose either references to arrays or hashes (ref to hashes are most common).

ex: Person.pm => this *.pm file name has to be same as package name, as package is searched for looking for file with name = file_name.pm

package Person; #creates class "Person"

sub my_new { #sub for creating an instance of this class. This is called a constructor, and is usually named "new", but can be anything. This constructor is just like any other method (most OOP languages have special syntax for constructors, but not for perl)

my $class = shift; #First arg passed to any method call is the method's invocant. When we call Person->my_new(...), perl automatically passes class name "Person" as first arg. So, $class holds the name of the class that the new object will belong to.

my $self = { Name => shift, ssn => shift }; # this class has 2 attr: Name and ssn. Here the object of this class is of ref to hash data type, but could have been any type = scalar, array, hash, etc. These 2 attr are the args to this method call. $self is a scalar storing ref to hash

#instead of shift, we could have also used @_. my ($class, $name, $ssn) = @_; my $self = { Name => $name, ssn => $ssn }; 

print "class is $class\n"; print "Name is $self->{Name}\n"; print "SSN is $self->{ssn}\n"; #prints values: class=Person, Name=Mary, ssn=22345 for the call in test.pl below

bless $self, $class; #Turning a plain data structure into an object is done by blessing that data structure using Perl's bless function. W/O this, data structure won't become an obj. 1st arg to bless func is the refrence to data, while 2nd arg is class. So, ref $self is blessed to of class "Person". Otherwise $self remains ref to hash data, just like any regular hash ref.

#we can also combine $self and bless in the same line as below (this replaces both the "my $self = ..." line and the bless line)
#my $self = bless { Name => shift, ssn => shift }, $class;

return $self; #we return the blessed ref to hash. $self is a scalar holding that ref. This becomes the ref of the new object being created.

}

sub setName {

my ( $self, $Name ) = @_; #1st arg is always object ref, so we store in $self

$self->{Name} = $Name if defined($Name); #Now, we can access attr of object

return $self->{Name};

}

sub getName {

my( $self ) = @_;

return $self->{Name};

}

1;

test.pl => this file uses the above package

use Person; #Person package is now included
 
my $object = Person->my_new("Mary",22345); #we are passing args as a plain list of values and not as a reference. Because of the "Person->" syntax, "Person" is also passed as first arg so that the object created is associated with class "Person". $object is now a reference to hash data type containing 2 key/value pairs. It's associated with class "Person", so it is a little different than ref to regular hash data type, but can still be treated as "reference to hash data type" for most purposes.

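#my $object = my_new Person("Mary",22345); #this is another way to create object (called indirect object syntax). It's discouraged as perl can misparse it, so prefer the Person->my_new(...) form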

my $name = $object->{Name}; => This references the object "$object" and gets the value for key "Name" which is "Mary"

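print $name; => prints Mary. Note that print "$object->{Name}" also prints Mary, since hash element access is expanded inside double quotes. A method call is not expanded: print "$object->getName" only expands $object which is an addr, so it prints something like "Person=HASH(0x3f4578A0)->getName". So call methods outside the quotes.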

$name = $object->setName("James"); => sets $object->{Name} to "James". Could have done directly too via: $object->{Name} = "James", however using subroutines that are part of the class is preferred, as it keeps the object organized, by having everything related to an object in 1 place.
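
The pieces above can be tried out in a single file. Below is a minimal sketch, assuming the package and the test code live in the same script (so no Person.pm and no "use Person;" are needed) and that the constructor gets the conventional name "new" instead of my_new. The sample values are the ones used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Person;

# Constructor. The name "new" is an assumption for this sketch
# (the notes above call it my_new); any name works.
sub new {
    my ($class, $name, $ssn) = @_;              # invocant (class name) comes first
    my $self = { Name => $name, ssn => $ssn };  # plain hash ref
    return bless $self, $class;                 # hash ref now belongs to $class
}

# Setter: changes Name only if a defined value is passed, returns current Name.
sub setName {
    my ($self, $name) = @_;
    $self->{Name} = $name if defined $name;
    return $self->{Name};
}

# Getter: returns current Name.
sub getName {
    my ($self) = @_;
    return $self->{Name};
}

package main;

my $object = Person->new("Mary", 22345);
print ref($object), "\n";          # prints Person (class the ref was blessed into)
print $object->getName(), "\n";    # prints Mary
$object->setName("James");
print $object->getName(), "\n";    # prints James
```

ref() returns the class name only because of the bless call; for an unblessed hash ref it would return "HASH".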

Inheritance: Inheritance is a common concept in OOP, where a class can be derived from another class. This is useful if we want to add a few more data or subs to an existing class. Instead of modifying an existing class or duplicating everything in the existing class to create a new class, we inherit the old class, and just add new code in the new class. The @ISA array achieves that: it lists the parent classes of a package.

package Bar; => new package Bar declared

use foo; => existing class foo

our @ISA=qw(foo); => inherit foo into this package Bar ("our" is needed if "use strict" is on)

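sub my_add { .... }; => we now add new subroutines to package Bar. All methods of original package "foo" can now be called on Bar and its objects (perl looks for a method in Bar first, then in the classes listed in @ISA). Package variables of "foo" are not inherited; they still have to be referred to by full name, i.e $foo::var.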

1; => a module file must end with a true value, otherwise "use"/"require" of that module fails
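
To see how @ISA drives method lookup, here is a minimal single-file sketch. The classes Animal and Dog and their methods are hypothetical names made up for illustration (they are not from the notes above). Dog adds one method of its own and gets everything else from Animal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Animal;                      # hypothetical base class

sub new {
    my ($class, $name) = @_;
    return bless { Name => $name }, $class;   # $class is "Dog" when called as Dog->new
}

sub name  { my ($self) = @_; return $self->{Name}; }
sub speak { my ($self) = @_; return $self->name() . " makes a sound"; }

package Dog;                         # hypothetical derived class
our @ISA = ('Animal');               # Dog inherits from Animal

# New method that exists only in Dog; it uses the inherited name() method.
sub fetch { my ($self) = @_; return $self->name() . " fetches"; }

package main;

my $d = Dog->new("Rex");             # new() is not defined in Dog, so perl finds it in Animal
print ref($d), "\n";                 # prints Dog
print $d->speak(), "\n";             # prints Rex makes a sound (inherited)
print $d->fetch(), "\n";             # prints Rex fetches (defined in Dog)
print $d->isa('Animal') ? "is an Animal\n" : "not an Animal\n";   # prints is an Animal
```

Newer code often writes "use parent -norequire, 'Animal';" instead of assigning @ISA directly; it sets @ISA in the same way (without -norequire it also loads Animal.pm).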

Misc modules:

1. reading excel files: This is very commonly used to import excel sheet data into perl program:
ex: read excel sheet from libre office .xlsx files => this makes use of OOP. Uses the Spreadsheet::XLSX module, which is not one of the std modules shipped with perl and has to be installed from CPAN.

#!/apps/perl/5.14.2/bin/perl
use lib "/apps/perl/modules-1503/lib"; # this adds lib path to existing lib paths (@INC) to search for modules
use Spreadsheet::XLSX; # here module Spreadsheet::XLSX (file Spreadsheet/XLSX.pm) is loaded (perl module is reusable package defined in a file)

my $spreadsheet = "$ENV{VERIFICATION}/my_testlist"; # here my_testlist is open office excel sheet
if (! -e $spreadsheet) { # checking for existence of spreadsheet
    print "Spreadsheet $spreadsheet not found. Please try again.\n";
    exit 1; # non-zero exit status signals failure
}

my $excel = Spreadsheet::XLSX -> new ($spreadsheet); # an optional 2nd arg is a Text::Iconv converter object for character set conversion
foreach my $sheet (@{$excel -> {Worksheet}}) {
    printf("Sheet: %s\n", $sheet->{Name});
    foreach my $row (($sheet -> {MinRow} +1) .. $sheet -> {MaxRow}) { # skipping 1st row=title row
        my $testname           = ($sheet -> {Cells} [$row][0]) -> {Val}; #0 means 1st col
        my $rtl_count          = ($sheet -> {Cells} [$row][4]) -> {Val}; #4 means 5th col
        # ... do more processing
    }
}

Some useful perl cmds:

1. perl cmd to substitute one pattern with some other pattern in multiple files (below cmds can be run on cmd line in a shell, as long as perl is installed). The s/// expression is quoted so that the shell doesn't interpret special characters in the patterns:
perl -pi -e 's/old_pattern/new_pattern/g' dir1/subdir1/*.tcl => does it for one dir only
perl -e 's/old_pattern/new_pattern/g' -pi.backup $(find dir1 -type f) => does it for all files in dir1 and its subdirectories. (-pi with .backup keeps a copy of each original file with .backup extension). $(find dir1 -type f) is command substitution syntax, which works in bash and other POSIX shells (sh, ksh, zsh) but not in csh/tcsh. It also breaks on file names containing spaces.
ex: perl -pi -e 's/1p0/2p0/g' $(find . -type f) => replaces 1p0 with 2p0 in all files in current dir and all its subdirs. Needs bash or other POSIX shell.
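
The same in-place substitution can also be done from inside a perl script, which avoids depending on the shell for the file list. Below is a minimal sketch, assuming a throwaway directory created with File::Temp and two made-up sample files (a.tcl, sub/b.tcl) so that it can be run anywhere. It uses File::Find in place of the find cmd, and the $^I variable, which is what the -i switch sets.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a small sample tree (hypothetical file names, for illustration only).
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "mkdir: $!";
for my $file ("$dir/a.tcl", "$dir/sub/b.tcl") {
    open my $fh, '>', $file or die "open $file: $!";
    print $fh "set corner 1p0\n";
    close $fh;
}

# Collect all regular files under $dir (same job as: find dir1 -type f).
my @files;
find(sub { push @files, $File::Find::name if -f $_ }, $dir);

# Same effect as: perl -pi.backup -e 's/1p0/2p0/g' FILES
{
    local $^I   = '.backup';   # in-place edit, originals kept as *.backup
    local @ARGV = @files;      # files that <> will read one by one
    while (<>) {
        s/1p0/2p0/g;
        print;                 # written to the edited file, not to the terminal
    }
}

# Show the first line of each edited file.
for my $file (sort @files) {
    open my $fh, '<', $file or die "open $file: $!";
    my $line = <$fh>;
    close $fh;
    print "$file: $line";
}
```

Since the file list never goes through the shell, file names with spaces are handled correctly here.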