Deep Learning with Neural Networks - Class Notes

Day #10 - Deep Learning, Neural Networks, Time Series
In [1]:
import tensorflow as tf

# a placeholder just tells the system the type of the variable which will
a = tf.placeholder('int32')
b = tf.placeholder('int32')

# the mul is the multiplication
y = tf.multiply(a,b)

# start the session
sess = tf.Session()

# feed the placeholders in the model and have it run
sess.run(y,feed_dict={a:2,b:5})
Out[1]:
10

The Tensor Data Structure

It can be identified by 3 parameters: RANK, SHAPE, TYPE

  • RANK: the number of dimensions of the tensor
    • n-dimensional array or list
    • As examples: Rank 0 (scalar), Rank 1 (a 1 by n vector), Rank 2 (n by m matrix), Rank 3 (n by m by q matrix)
  • SHAPE: the number of rows and columns the tensor has
  • TYPE: the data type
    • Examples: tf.int8, tf.string, tf.bool,...
In [2]:
import numpy as np
tensor_1d = np.array([1.3, 1, 4.0, 23.99])
print (tensor_1d)
[  1.3    1.     4.    23.99]
In [3]:
print (tensor_1d[0])
print (tensor_1d[2])
1.3
4.0
In [4]:
# the indexing is the same as python
tensor_1d.ndim # number of dimensions, this is like the rank
Out[4]:
1
In [5]:
tensor_1d.shape
Out[5]:
(4,)
In [6]:
tensor_1d.dtype
Out[6]:
dtype('float64')
In [7]:
import numpy as np
my_numbers = np.array([1,2,3,4,5])
my_numbers.dtype
Out[7]:
dtype('int32')
In [8]:
# convert the array to TF tensor
import tensorflow as tf

tf_tensor = tf.convert_to_tensor(tensor_1d,dtype=tf.float64)

print (tf_tensor)
Tensor("Const:0", shape=(4,), dtype=float64)
In [9]:
# running the session we can then visualize the tensor and it's elements
with tf.Session() as sess:
    print (sess.run(tf_tensor))
    print (sess.run(tf_tensor[0]))
    print (sess.run(tf_tensor[2]))
[  1.3    1.     4.    23.99]
1.3
4.0

Tensor Handling

In [10]:
# let's build 2 integer arrays
matrix1 = np.array([(2,2,2),(2,2,2),(2,2,2)],dtype='int32')
matrix2 = np.array([(1,1,1),(1,1,1),(1,1,1)],dtype='int32')

# visualize them:
print ('matrix1 = ')
print (matrix1)

print ('matrix2 = ')
print (matrix2)
matrix1 =
[[2 2 2]
 [2 2 2]
 [2 2 2]]
matrix2 =
[[1 1 1]
 [1 1 1]
 [1 1 1]]
In [11]:
# defining matrix operations which will not be run until the run is called

matrix_product = tf.matmul(matrix1,matrix2) # matrix multiply
print (matrix_product)

matrix_sum = tf.add(matrix1,matrix2)
print (matrix_sum)
Tensor("MatMul:0", shape=(3, 3), dtype=int32)
Tensor("Add:0", shape=(3, 3), dtype=int32)
In [12]:
# new matrix to be used to compute a matrix determinant
matrix3 = np.array([(2,7,2),(1,4,2),(9,0,2)],dtype='float32')
print ('matrix3 = ')
print (matrix3)

matrix_det = tf.matrix_determinant(matrix3)
matrix_det
matrix3 =
[[ 2.  7.  2.]
 [ 1.  4.  2.]
 [ 9.  0.  2.]]
Out[12]:
<tf.Tensor 'MatrixDeterminant:0' shape=() dtype=float32>
In [13]:
with tf.Session() as sess:
    result1 = sess.run(matrix_product)
    result2 = sess.run(matrix_sum)
    result3 = sess.run(matrix_det)
In [14]:
# print the results
print ('matrix1 * matrix2 = ')
print (result1)

print ('matrix1 + matrix2 = ')
print (result2)

print ('matrix3 determinant = ')
print (result3)
matrix1 * matrix2 =
[[6 6 6]
 [6 6 6]
 [6 6 6]]
matrix1 + matrix2 =
[[3 3 3]
 [3 3 3]
 [3 3 3]]
matrix3 determinant =
56.0

Tensor Board

  • A visualization tool
  • Aims at analyzing data flow graph
  • Helps understanding machine learning tools
  • Can be somehow confusing

Neural Networks and Deep Learning

  • Biologically inspired programming paradigm which enables a computer to learn from observational data
  • DEEP LEARNING is a powerful set of techniques that is based on the way human brain processes imformation and learns, responding to external stimuli
    • It consists of a machine learning model at several levels of representations in which the deeper levels take as input the outputs of the previous levels, transforming them and always abstracting more
  • Neural networds and deep learning currently provide the best solutions to many problems in image recognition, speech recognition and NLP

Artificial Neural Networks (ANN)

  • Information processing system whose operating mechanism is inspired by biological circuits
  • Generalizations of mathematical models of human cognition of neaural biology
    • Information is processed at many nodes called neurons
    • Signals are transfurred from one neauron to another via a link
    • Each connection link has an associated weight
    • Each neuron applies an activation function to the net input they are receiving
  • Nodes - elements in a layer
  • Weights - strength of connections between nodes of layers
  • Layers
    • Input - The input nodes
    • Hidden - Every Interior layer in the network
    • Output - The result
  • Activation functions
    • Input 1 strength of connection 1 + input 2 strength of connection 2 ----- apply function ------> output
    • Identity function: f(x) = x
    • Step function with a theshold T:
      • f(x) = 1 if x is bigger or equal to T
      • f(x) = 0 if x is less than T
    • Logistic of sigmoid fucntion between 0 and 1
    • Hyperbolic tanget (-1 to 1 values)
    • Relu()

The first run throughout the Net is called feed forward and it will give a bad result usually. Then it takes the output and

Types of Neural Networks

  • Perceptron
  • Feed Forward
  • Radial Basis Network
  • Recurrent Neaural Network
  • Convolution Neural Network

Where are Neural Networks Being Used?

  • Pattern recognition
  • Signal Processing
  • Medicine
  • Speech recognition
  • Business

Strenghts and Weaknesses

  • Strenghts:
    • Relatively simple learning algorithm (SGD and backprop)
    • Can almost learn any function
    • Scales well to large datasets
    • Can significantly out-perform other models when the right conditions are met
  • Weaknesses:
    • Hard to interpret the model
    • NNs are a black box
    • Do not perform as well on small data sets

Perceptron

  • Single Layer Perceptron
  • Multi Layer Perceptron
  • Multi Layer Perceptron Classification
  • Multi Layer Perceptron function approximation

Basic Steps of Training:

  • The weights are initialized with random values at the beginning of the training
  • For each element of the training error is calculated, that is the difference between the desired output and the actual output. This error is used to adjust
  • The process is repeated, resubmitting to the networ, in a random error, all the examples of the training set until the error made on the entire training set is less than a certain threshold, or until the maximum number of iterations is reached

MNIST with Multi Layer Perceptron

  • Images are 28x28 or 784 input nodes
  • Predict number solely on the image data in a form of array
In [15]:
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/',one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

Data Format

The data is stored in a vector format, although the original data was a 2d matrix with values representing how much pigment was at a certain location

In [16]:
type(mnist) # to find out the type of mnist
Out[16]:
tensorflow.contrib.learn.python.learn.datasets.base.Datasets
In [17]:
mnist.train.images.shape #to find the number of rows and cols we have in 
Out[17]:
(55000, 784)
In [18]:
sample = mnist.train.images[0].reshape(28,28)
In [19]:
%matplotlib inline
plt.imshow(sample,cmap='Greys')
Out[19]:
<matplotlib.image.AxesImage at 0x18be56393c8>

Parameters

We need to define (w) parameters from the stochastic gradient descent

- Learning Rate - how quickly to adjust the cost function, or how quickly the network forgets about older information
- Training Epochs - how many training cycles to go through (example, going through the 55K images above is an epoch)
- Batch Size - size of "batches" of training data (most of the Time all data will not fit into memory so it needs to be split in batches)
In [20]:
# parameters - they needs to be set up for

learning_rate = 0.001
training_epochs = 35
batch_size = 100

Network Parameters

  • Neaural Network Dependent
  • Input data your data looked like
  • What kind of net you would want to build
  • Learning Rate:
    • Small learning rate leads to slow and very lengthy learning
    • Large learning rate may:
      • Lead to a very long training time
      • Output saturation
  • Weights:
    • Make sure to avoid larger weight at they can lead to a saturation of the output of the first layer
    • They should be small weights (-0.5 and 0.5)
In [21]:
# Due to the fact that we are working with images and images are stored in 
# Choose 256 but you can choose whatever you want

n_hidden_1 = 256 # first layer number of features
n_hidden_2 = 256 # second layer number of features
n_input = 784 # MNIST data input (image shape 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
n_samples = mnist.train.num_examples # the number of images in the dataset (55000)

Build Multilayer Mode

  • We first receive the input data array (784 pixels) and then send it to the first hidden layers
  • The data will begin to have a weight (w) attached to it between layers (this is initially a random value) and then sent to a node to undergo an activiation function (along with a bias (b))
  • Then it will continue on to the next hidden layer and so on until the final output layer
  • In our case we will do just 2 hidden layers, the more you use the slower the calculations but the higher the chance of getting better results

Build Multilayer Mode - Cost (Lower Error)

  • Apply an optimization function to minimize the cost; this is done by adjusting weight values accordingly across the network
  • We will use the Adam Optimizer
  • Adjust the optimizer by changing the learning rate parameter
  • The lower the rate the higher the possibility for accurate training results
  • Use the RELU activation function

Cost function at output (Adam Optimizer)

Creating Multilayer Perceptron

We will crete our model, we'll start with 2 hidden layers, whcih use the RELU activation function

In [22]:
def multilayer_perceptron(x, weights, biases):
    '''
    x: place Holder for Data Input
    weights: Dictionary of weights
    biases: Dictionary of biases
    '''

    layer_1 = tf.add(tf.matmul(x,weights['h1']),biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    layer_2 = tf.add(tf.matmul(layer_1,weights['h2']),biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    out_layer = tf.matmul(layer_2,weights['out']+biases['out'])

    return out_layer

Weights and Biases

  • In order for the tf model to work we need to create 2 dicts containing our weight and bias objects for the model tf.variable
  • A variable maintain state in the graph across calls to run()
In [23]:
# In this case we are initializaing the weights with random ones
weights = {
    'h1': tf.Variable(tf.random_normal([n_input,n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1,n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2,n_classes]))
}

# Same for the biases
biases = {
    'b1':tf.Variable(tf.random_normal([n_hidden_1])),
    'b2':tf.Variable(tf.random_normal([n_hidden_2])),
    'out':tf.Variable(tf.random_normal([n_classes]))
}

Construct Model & Cost Optimization Functions

In [24]:
# We are defining the input and output as placeholders
x = tf.placeholder('float',[None,n_input])
y = tf.placeholder('float',[None,n_classes])
In [25]:
# initialize the pred variable and pass into it the function we've defined
pred = multilayer_perceptron(x,weights,biases)
print (pred)
Tensor("MatMul_3:0", shape=(?, 10), dtype=float32)
In [26]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred,labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
In [27]:
init = tf.global_variables_initializer()

Illustrative Purpose:

In [28]:
# We are grabbing the first batch
Xsamp,ysamp = mnist.train.next_batch(1)

plt.imshow(Xsamp.reshape(28,28),cmap='Greys')

# Remember indexing starts at zero!
print (ysamp)
[[ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]]

Running the Session, we will start the interactive session

In [29]:
sess = tf.InteractiveSession()

sess.run(init)

for epoch in range(training_epochs):

    avg_cost = 0.0

    total_batch = int(n_samples/batch_size)

    for i in range(total_batch):
        batch_x, batch_y = mnist.train.next_batch(batch_size)

        _, c = sess.run([optimizer,cost],feed_dict={x:batch_x,y:batch_y})

        avg_cost += c/total_batch

    print ('Epoch: {} cost= {:.4f}'.format(epoch+1,avg_cost))

print ('Model has completed {} Epochs of Training'.format(training_epochs))
Epoch: 1 cost= 1621.5909
Epoch: 2 cost= 21.9404
Epoch: 3 cost= 5.1439
Epoch: 4 cost= 3.0614
Epoch: 5 cost= 2.5197
Epoch: 6 cost= 2.2830
Epoch: 7 cost= 2.1415
Epoch: 8 cost= 2.0404
Epoch: 9 cost= 1.9527
Epoch: 10 cost= 1.8770
Epoch: 11 cost= 1.8121
Epoch: 12 cost= 1.7474
Epoch: 13 cost= 1.6893
Epoch: 14 cost= 1.6277
Epoch: 15 cost= 1.5745
Epoch: 16 cost= 1.5317
Epoch: 17 cost= 1.4611
Epoch: 18 cost= 1.3815
Epoch: 19 cost= 1.3201
Epoch: 20 cost= 1.2615
Epoch: 21 cost= 1.1663
Epoch: 22 cost= 1.0351
Epoch: 23 cost= 0.9568
Epoch: 24 cost= 0.8641
Epoch: 25 cost= 0.7890
Epoch: 26 cost= 0.7407
Epoch: 27 cost= 0.6983
Epoch: 28 cost= 0.6677
Epoch: 29 cost= 0.6415
Epoch: 30 cost= 0.6119
Epoch: 31 cost= 0.5628
Epoch: 32 cost= 0.5350
Epoch: 33 cost= 0.4947
Epoch: 34 cost= 0.4713
Epoch: 35 cost= 0.4503
Model has completed 35 Epochs of Training

Model Evaluations

  • TF comes with some built infunctions to help evaluate our model, including tf.equal and tf.cast with tf.reduce_me
  • Predictions == y_test. In our case, since we know the format of the labels is a 1 in an array of zeros, we can compare argmax() location of that 1
  • Remember that y here is still
In [30]:
correct_predictions = tf.equal(tf.argmax(pred,1),tf.argmax(y,1))
In [31]:
print (correct_predictions[0])
Tensor("strided_slice_2:0", shape=(), dtype=bool)
In [32]:
correct_predictions = tf.cast(correct_predictions,'float')
print (correct_predictions[0])
Tensor("strided_slice_3:0", shape=(), dtype=float32)

Now we use the tf.reduce to get the prediction accuracy

In [33]:
accuracy = tf.reduce_mean(correct_predictions)
In [34]:
type(accuracy)
Out[34]:
tensorflow.python.framework.ops.Tensor
In [35]:
mnist.test.labels
Out[35]:
array([[ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       ...,
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
In [36]:
mnist.test.images
Out[36]:
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ...,
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32)
In [37]:
print ('Accuracy: ',accuracy.eval({x:mnist.test.images,y:mnist.test.labels}))
Accuracy:  0.8491
In [ ]:

rss facebook twitter github youtube mail spotify lastfm instagram linkedin google google-plus pinterest medium vimeo stackoverflow reddit quora quora