[
{
"question": "What is the purpose of a hidden layer in a multi-layer network?",
"options": ["To store the training data", "To transform inputs into new feature representations that enable nonlinear decision boundaries", "To reduce the number of parameters", "To apply the loss function"],
"correct": 1,
"explanation": "Hidden layers apply weights, biases, and activation functions to create new feature representations. These intermediate features allow the network to learn nonlinear mappings that a single layer cannot.",
"stage": "pre"
},
{
"question": "What does the forward pass do in a neural network?",
"options": ["Updates the weights using gradients", "Computes gradients for backpropagation", "Pushes input through each layer to produce an output", "Shuffles the training data"],
"correct": 2,
"explanation": "The forward pass is pure computation: for each layer, multiply by weights, add bias, apply activation, and pass the result to the next layer. No learning happens during the forward pass.",
"stage": "pre"
},
{
"question": "For a layer with 3 neurons receiving input from 2 neurons, what is the shape of the weight matrix?",
"options": ["(2, 3)", "(3, 2)", "(3, 3)", "(2, 2)"],
"correct": 1,
"explanation": "The weight matrix has shape (neurons_in_current_layer, neurons_in_previous_layer) = (3, 2). Each row contains one neuron's weights across all inputs.",
"stage": "post"
},
{
"question": "What does the Universal Approximation Theorem guarantee?",
"options": ["Any network can be trained in polynomial time", "A single hidden layer with enough neurons can approximate any continuous function", "Deeper networks always outperform shallow ones", "Neural networks can solve any computational problem"],
"correct": 1,
"explanation": "Cybenko (1989) proved that a network with one hidden layer and sufficient neurons can approximate any continuous function to any desired accuracy. In practice, deeper networks achieve the same with fewer total parameters.",
"stage": "post"
},
{
"question": "Why does the sigmoid activation function make learning possible, unlike the step function used in perceptrons?",
"options": ["Sigmoid outputs larger values", "Sigmoid is faster to compute", "Sigmoid is smooth and differentiable everywhere, so gradients exist for backpropagation", "Sigmoid outputs negative values"],
"correct": 2,
"explanation": "The step function has zero gradient almost everywhere and undefined gradient at the threshold. Sigmoid is smooth with a well-defined derivative (s*(1-s)) at every point, which is essential for gradient-based learning.",
"stage": "post"
}
]