
Neural networks are one of the biggest black boxes, yet they are present in every corner of our lives. In essence, they are functions that can learn wacky patterns. But what makes them a challenge to understand? Is it the heavy calculus involved? "There are so many derivatives to compute." Okay, but if you know this one thing called the chain rule, you have the weapon you need to defeat the mini-boss that is calculus.

If you don't know how to do matrix multiplication or dot products, I suggest you head over to MIT OpenCourseWare and guzzle their content down, because that is an excellent way to build enough confidence to enter the vector hell that is neural networks. So if you are confident with vectors, dot products, and matrix multiplication, congratulations: you hold the key to level 1, a 2-layer perceptron (composed of a hidden layer and an output layer).

Let's say each sample in our training data has 3 features and we have 100 samples. Our hidden layer has 5 neurons, and our output layer has 1 neuron. One thing to keep in mind is that **each neuron produces one output and one output only.**
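To make that concrete, here is a minimal sketch of what a single hidden neuron computes: a dot product of its weight row with one sample, plus its bias. The array values here are made up purely for illustration.

```python
import numpy as np

# one sample with 3 features (made-up values)
x = np.array([0.2, 0.5, 0.1])

# one neuron: a row of 3 weights and a single bias (made-up values)
w = np.array([0.4, -0.3, 0.9])
b = 0.05

# the neuron's single output before activation: a scalar
z = w @ x + b
print(z)
```

Stacking 5 such rows into a matrix is exactly what the layer below does, computing all 5 neurons at once.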

```python
import numpy as np

# initialize random data: 3 features x 100 samples
X = np.random.random((3, 100))

# initialize the first layer: 5 neurons, each with 3 weights and 1 bias
W1 = np.random.random((5, 3))
B1 = np.random.random((5, 1))
```

There are 5 neurons in our hidden layer **stacked vertically**. The weights that we initialized are also 5 **vertically stacked rows** of weights, so each **row represents one neuron**. Consequently, there is 1 bias per neuron/row. You might also be wondering why the data is 3 × 100 and not 100 × 3. In the visual above, the input features are stacked vertically, so each column is one sample. This means you will often want to transpose your data when you first obtain it raw, because in most datasets each sample is a row. *Luckily, transposing is super easy in NumPy (read the docs).*
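Here is a quick sketch of that transpose; `raw_data` is a made-up stand-in for a dataset where each of the 100 samples is a row:

```python
import numpy as np

# a made-up raw dataset: 100 samples as rows, 3 features as columns
raw_data = np.random.random((100, 3))

# transpose so that each sample becomes a column, matching our layout
X = raw_data.T
print(X.shape)  # (3, 100)
```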

Next, we need an activation function for this layer to introduce non-linearity. We will use the sigmoid activation function. It is applied element-wise, so it does not change the output's dimensions.

```python
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```

Now we will initialize the second/output layer.

```python
W2 = np.random.random((1, 5))
B2 = np.random.random((1, 1))
```

So in this case, we have one neuron in our output layer. Thus, it has a single row of 5 weights, one for each of the 5 outputs from the previous layer. Now that we have initialized our weights, we can start forward propagating.

```python
# layer 1
Z1 = W1 @ X + B1
A1 = sigmoid(Z1)

# layer 2
Z2 = W2 @ A1 + B2
A2 = sigmoid(Z2)
```
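As a sanity check on the shapes, the snippet below re-runs the same forward pass on freshly initialized arrays: `W1 @ X` is (5, 3) @ (3, 100) = (5, 100), the (5, 1) bias broadcasts across the 100 columns, and layer 2 collapses everything to (1, 100), one output per sample.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.random.random((3, 100))
W1, B1 = np.random.random((5, 3)), np.random.random((5, 1))
W2, B2 = np.random.random((1, 5)), np.random.random((1, 1))

Z1 = W1 @ X + B1   # (5, 3) @ (3, 100) + (5, 1) -> (5, 100)
A1 = sigmoid(Z1)   # element-wise, still (5, 100)
Z2 = W2 @ A1 + B2  # (1, 5) @ (5, 100) + (1, 1) -> (1, 100)
A2 = sigmoid(Z2)   # (1, 100): one output per sample

print(Z1.shape, A2.shape)  # (5, 100) (1, 100)
```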

# Complete Code

```python
import numpy as np

# define the activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# initialize random data
X = np.random.random((3, 100))

# initialize the first layer
W1 = np.random.random((5, 3))
B1 = np.random.random((5, 1))

# initialize the second layer
W2 = np.random.random((1, 5))
B2 = np.random.random((1, 1))

# layer 1
Z1 = W1 @ X + B1
A1 = sigmoid(Z1)

# layer 2
Z2 = W2 @ A1 + B2
A2 = sigmoid(Z2)
```
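Since the final sigmoid squashes every value into (0, 1), `A2` can be read as 100 probability-like scores. A common next step for binary classification, sketched below as an assumption rather than part of the original walkthrough, is to threshold those scores at 0.5:

```python
import numpy as np

# made-up scores standing in for A2, a (1, 100) sigmoid output
A2 = np.random.random((1, 100))

# threshold at 0.5 to get one binary prediction per sample
predictions = (A2 > 0.5).astype(int)
print(predictions.shape)  # (1, 100)
```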
