
  • x = features
  • f(x) = the function that makes a prediction/classification based on the inputs x
  • Need a loss function
  • Need an optimization function
  • Pink: input
  • Blue: H1
  • Yellow: H2
  • Green: Output
  • Need to make sure that the shapes of the matrices are correct
  • The output matrix of the first step is 1x3
  • Often the goal is a probability distribution over the outputs
  • Sigmoid will give a value between 0 and 1 for each output
  • Softmax will give values that all sum to 1
    • Max of 32k, so English text needs to be broken down into tokens
      • Means you will likely train faster because you are working with tokenized values
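The shape bookkeeping and the sigmoid/softmax distinction above can be sketched in a few lines of numpy. The layer sizes here (4 inputs, two hidden layers of 3 units, 2 outputs) are made up to match the 1x3 shape mentioned in the notes; they are an assumption, not the course's actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 input features, hidden layers H1 and H2 of
# 3 units each, 2 output units. Matrix shapes must line up to multiply.
x  = rng.normal(size=(1, 4))   # 1x4 input row vector
W1 = rng.normal(size=(4, 3))   # input -> H1
W2 = rng.normal(size=(3, 3))   # H1 -> H2
W3 = rng.normal(size=(3, 2))   # H2 -> output

h1 = np.tanh(x @ W1)           # (1x4)(4x3) -> 1x3, as in the notes
h2 = np.tanh(h1 @ W2)          # (1x3)(3x3) -> 1x3
logits = h2 @ W3               # (1x3)(3x2) -> 1x2

def sigmoid(z):
    # Squashes each element independently into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Exponentiates (shifted for numerical stability), then normalizes
    # so the values form a probability distribution summing to 1
    e = np.exp(z - z.max())
    return e / e.sum()

print(sigmoid(logits))         # each value in (0, 1); need not sum to 1
print(softmax(logits))         # values sum to 1
```

If any shape were wrong (say W2 were 4x3), the `@` would raise an error immediately, which is why checking shapes up front matters.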

Batching

  • We will be using backpropagation
    • We will calculate a loss based on how far off our answers were from the correct ones
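A minimal sketch of one backpropagation step, with made-up data, weights, and learning rate: compute a loss from how far the prediction is from the correct answer, take the gradient via the chain rule, and step the weights against it.

```python
import numpy as np

# Hypothetical single linear unit with a squared-error loss
x, y_true = np.array([1.0, 2.0]), 3.0
w = np.array([0.5, -0.5])

y_pred = w @ x                       # forward pass: prediction
loss = (y_pred - y_true) ** 2        # how far off we were
grad = 2 * (y_pred - y_true) * x     # d(loss)/dw via the chain rule
w = w - 0.1 * grad                   # optimizer step: move against the gradient

new_loss = ((w @ x) - y_true) ** 2
print(loss, new_loss)                # loss shrinks after the update
```

Real networks repeat this over batches and chain the gradient back through every layer, but each step is this same pattern: forward pass, loss, gradient, update.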

Cross Entropy

  • Entropy
    • Uncertainty
      • An unfair coin (two heads) has an entropy of 0
        • Outcome is certain
      • A fair coin has an entropy of 1
        • 50/50
      • A 1000-sided die
        • The outcome is much less certain (1 in 1000), so entropy is higher
    • D is the (KL) divergence value: the extra loss from the predicted distribution differing from the true one
    • Loss will never be exactly 0; if it is, you likely did something wrong
      • There will always be some non-zero ambiguity with text
      • Today will be ______ ___ (could be day, weather, event, etc.)
    • Useful for sparse categorical targets (integer class labels rather than one-hot vectors)
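The coin and die examples above, and the cross-entropy loss itself, can be checked directly. The 3-class probabilities at the end are a hypothetical model output, not from the course.

```python
import math

def entropy(p):
    # Shannon entropy in bits: expected surprise over a distribution
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([1.0, 0.0]))        # two-headed coin: 0 bits, outcome certain
print(entropy([0.5, 0.5]))        # fair coin: 1 bit
print(entropy([1/1000] * 1000))   # fair 1000-sided die: ~10 bits, far less certain

def sparse_cross_entropy(probs, true_index):
    # Loss is -log of the probability assigned to the correct class;
    # it hits 0 only if the model put probability 1.0 on the right answer,
    # which real text never allows ("Today will be ___" stays ambiguous).
    return -math.log(probs[true_index])

# Hypothetical softmax output over 3 classes; correct class is index 0
print(sparse_cross_entropy([0.7, 0.2, 0.1], 0))   # small but non-zero loss
```

"Sparse" here means the label is passed as the integer index of the correct class, so no one-hot vector needs to be built.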