Hi James,

Great article, thanks for putting this out there! I had a couple of questions about the backpropagation function. In the video you included, the delta for the second weight is as follows:

dC0/dW(L) = 2(a(L) - y) * sigmoid_derivative(z(L)) * a(L-1)

However, the code that you included for the corresponding weight updates is written as follows, where self.output corresponds to a(L):

#in feedforward function

self.output = sigmoid(np.dot(self.layer1, self.weights2))

#in backprop function

d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))

The discrepancy is that the code includes the sigmoid derivative of self.output (or a(L)) while the equation includes the sigmoid derivative of z(L). If the equation was derived correctly, shouldn’t the code look something like this:

#Changes to feedforward function

z2 = np.dot(self.layer1, self.weights2)

self.output = sigmoid(z2)

#Changes to backprop function

d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(z2)))
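
Whether the original line is actually wrong depends on how sigmoid_derivative is defined, which isn't shown in the snippets above. The sketch below is only an illustration under assumed definitions: sigmoid_derivative_of_z and sigmoid_derivative_of_a are hypothetical names (not from the article) for the two common conventions, one taking the pre-activation z and one taking the activation a = sigmoid(z).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative_of_z(z):
    # Hypothetical helper: derivative written in terms of the pre-activation z.
    s = sigmoid(z)
    return s * (1.0 - s)

def sigmoid_derivative_of_a(a):
    # Hypothetical helper: derivative written in terms of the activation a = sigmoid(z).
    return a * (1.0 - a)

z = np.array([-2.0, 0.0, 1.5])
a = sigmoid(z)

# The two conventions agree when each is given the input it expects.
same = np.allclose(sigmoid_derivative_of_z(z), sigmoid_derivative_of_a(a))

# Passing the activation to the z-based version gives a different result.
mismatch = np.allclose(sigmoid_derivative_of_z(z), sigmoid_derivative_of_z(a))

print(same, mismatch)
```

So if sigmoid_derivative expects z, passing self.output to it does not compute the derivative in the equation; if it expects the activation, the original line already does. Under the z-based definition, the rewritten line above is the form that matches the equation.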

If so, then the deltas for the first set of weights should also be updated accordingly:

#Changes to feedforward function

z1 = np.dot(self.input, self.weights1)

self.layer1 = sigmoid(z1)

#Changes to backprop function

d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * sigmoid_derivative(z2), self.weights2.T) * sigmoid_derivative(z1)))
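
One way to test these expressions is to compare them against a finite-difference gradient. The following is a minimal sketch, not the article's code: it assumes a sum-of-squares loss, a sigmoid_derivative defined in terms of z, and small random arrays with made-up shapes; forward, proposed_deltas, loss, and numerical_grad are hypothetical helper names. Because the code uses (y - output) rather than (output - y), the deltas should come out as the negative of the loss gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # Assumption: takes the pre-activation z, not the activation.
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(x, w1, w2):
    z1 = np.dot(x, w1)
    layer1 = sigmoid(z1)
    z2 = np.dot(layer1, w2)
    output = sigmoid(z2)
    return z1, layer1, z2, output

def proposed_deltas(x, y, w1, w2):
    # Same expressions as in the comment, written without self.
    z1, layer1, z2, output = forward(x, w1, w2)
    delta2 = 2 * (y - output) * sigmoid_derivative(z2)
    d_weights2 = np.dot(layer1.T, delta2)
    d_weights1 = np.dot(x.T, np.dot(delta2, w2.T) * sigmoid_derivative(z1))
    return d_weights1, d_weights2

def loss(x, y, w1, w2):
    # Assumed cost: sum of squared errors over all samples.
    return np.sum((y - forward(x, w1, w2)[3]) ** 2)

def numerical_grad(f, w, eps=1e-6):
    # Central differences, perturbing one entry of w at a time (in place).
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = f()
        w[idx] = old - eps
        minus = f()
        w[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.random((4, 3))   # 4 samples, 3 input features (arbitrary sizes)
y = rng.random((4, 1))
w1 = rng.random((3, 4))
w2 = rng.random((4, 1))

d_w1, d_w2 = proposed_deltas(x, y, w1, w2)
num_w1 = numerical_grad(lambda: loss(x, y, w1, w2), w1)
num_w2 = numerical_grad(lambda: loss(x, y, w1, w2), w2)

# The deltas use (y - output), so they should equal minus the loss gradient.
w1_ok = np.allclose(d_w1, -num_w1, atol=1e-6)
w2_ok = np.allclose(d_w2, -num_w2, atol=1e-6)
print(w1_ok, w2_ok)
```

If both checks pass, the z-based expressions are consistent with the chain rule for this assumed loss, and adding the deltas to the weights moves them in the descent direction.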

Is my thinking correct here? Any information you could provide would be very helpful.

Thanks!

Brian