Description
In the "Put it all together" section from the lecture for module 1, there's the following code:

```python
# Weighted sum of inputs / weights
weighted_sum = np.dot(inputs, weights)
# Activate!
activated_output = sigmoid(weighted_sum)
# Calc error
error = correct_outputs - activated_output
adjustments = error * sigmoid_derivate(activated_output)
```
The last line should be `adjustments = error * sigmoid_derivate(weighted_sum)`. Intuitively, this is because we want the derivative of the activation function evaluated at its input, the weighted sum, not at its output. It is the activation function we're taking the derivative of, after all.
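Here's a minimal, self-contained sketch of the corrected training loop. The OR-gate data, bias column, and iteration count are my own assumptions for illustration; only the corrected `adjustments` line comes from the issue:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivate(x):
    # Derivative of sigmoid, evaluated at its raw (pre-activation) input x
    s = sigmoid(x)
    return s * (1 - s)

# Hypothetical toy data: OR gate, with a bias column of ones appended
inputs = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]], dtype=float)
correct_outputs = np.array([[0], [1], [1], [1]], dtype=float)
weights = np.zeros((3, 1))

for _ in range(10000):
    weighted_sum = np.dot(inputs, weights)
    activated_output = sigmoid(weighted_sum)
    error = correct_outputs - activated_output
    # Corrected line: derivative at the weighted sum, not the activation
    adjustments = error * sigmoid_derivate(weighted_sum)
    weights += np.dot(inputs.T, adjustments)
```

With the derivative evaluated at `weighted_sum`, the update rule matches the chain rule and the loop converges on this toy problem.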
The `NeuralNetwork` class constructed throughout module 2 applies `sigmoidPrime` to the activated output.

```python
self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
```

should be

```python
self.z2_delta = self.z2_error * self.sigmoidPrime(self.hidden_sum)
```

and

```python
self.o_delta = self.o_error * self.sigmoidPrime(o)
```

should be

```python
self.o_delta = self.o_error * self.sigmoidPrime(self.output_sum)
```
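For context, here is a minimal sketch of what the corrected class could look like. The attribute names (`hidden_sum`, `activated_hidden`, `output_sum`, `o_error`, etc.) follow the issue; the layer sizes, initialization, and method structure are my assumptions and may differ from the course repo:

```python
import numpy as np

class NeuralNetwork:
    """Sketch of the module-2 network with the corrected deltas."""

    def __init__(self, n_in=2, n_hidden=3, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(size=(n_in, n_hidden))
        self.w2 = rng.normal(size=(n_hidden, n_out))

    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))

    def sigmoidPrime(self, s):
        # Correct definition: derivative at the raw input s
        sig = self.sigmoid(s)
        return sig * (1 - sig)

    def feed_forward(self, X):
        self.hidden_sum = np.dot(X, self.w1)
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        self.output_sum = np.dot(self.activated_hidden, self.w2)
        self.activated_output = self.sigmoid(self.output_sum)
        return self.activated_output

    def backward(self, X, y, o):
        self.o_error = y - o
        # Corrected: derivative at the pre-activation sums
        self.o_delta = self.o_error * self.sigmoidPrime(self.output_sum)
        self.z2_error = np.dot(self.o_delta, self.w2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.hidden_sum)
        self.w1 += np.dot(X.T, self.z2_delta)
        self.w2 += np.dot(self.activated_hidden.T, self.o_delta)
```

With both the correct `sigmoidPrime` and the correct arguments, the deltas are the true chain-rule gradients, so any activation could be substituted without breaking backpropagation.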
By sheer coincidence, this error cancels out with the error in #134. As it happens, the difference between the incorrect and correct `sigmoidPrime` implementations is exactly one application of `sigmoid`, and the difference between `hidden_sum` and `activated_hidden` is also one application of `sigmoid`. That means `self.sigmoidPrime(self.activated_hidden)` (using the incorrect definition of `sigmoidPrime`) is equivalent to `self.sigmoidPrime(self.hidden_sum)` (using the correct definition). As a result, the `NeuralNetwork` implementation would break if `sigmoid` were swapped for just about any other activation function, but it works with sigmoid, seemingly by accident.
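The cancellation can be checked numerically. Here I assume, based on the description above and #134, that the incorrect `sigmoidPrime` was `s * (1 - s)` (which implicitly expects an already-activated argument), while the correct one is `sigmoid(s) * (1 - sigmoid(s))`:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoidPrime_correct(z):
    # Correct: expects the raw weighted sum z
    return sigmoid(z) * (1 - sigmoid(z))

def sigmoidPrime_incorrect(a):
    # Incorrect (per #134): implicitly expects an already-activated value a
    return a * (1 - a)

hidden_sum = np.linspace(-4, 4, 9)
activated_hidden = sigmoid(hidden_sum)

# The two mistakes cancel: both reduce to sigmoid(z) * (1 - sigmoid(z))
assert np.allclose(sigmoidPrime_incorrect(activated_hidden),
                   sigmoidPrime_correct(hidden_sum))
```

The extra `sigmoid` baked into the wrong argument exactly supplies the `sigmoid` missing from the wrong derivative, which is why the bug is invisible for sigmoid but would surface with any other activation.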