What did I learn today

  1. RNN
    (figure: diagram of the RNN recurrence)
    • h_{t-1}: prior hidden-state vector
    • x_t: input vector at time t
    • h_t: current hidden-state vector
    • f_W: the RNN function (sketched in code after this list)
    • y_t: output vector at time t
    • whether y_t appears at every time step depends on the task. Sentiment analysis? -> y_t only at the last time step.
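A minimal sketch of the recurrence above, assuming a plain NumPy tanh RNN; the names and toy dimensions are illustrative, not from the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 3             # toy sizes (hypothetical)
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden weights
W_hy = rng.normal(size=(output_dim, hidden_dim)) * 0.1  # hidden-to-output weights

def rnn_step(h_prev, x_t):
    """One time step of h_t = f_W(h_{t-1}, x_t)."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)           # current hidden-state vector
    y_t = W_hy @ h_t                                    # output vector at time t
    return h_t, y_t

h = np.zeros(hidden_dim)                                # initial hidden state
for x_t in rng.normal(size=(5, input_dim)):             # toy 5-step input sequence
    h, y = rnn_step(h, x_t)
```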
  2. Types of RNN
    • One to Many: apart from the single time step that receives the real input, the input is a zero vector!
    • Many to One: no output except at one time step (typically the last)!
    • Many to Many 1: e.g. after reading the sequence input, output the translation of the given sentence
    • Many to Many 2: while receiving the input, output a result on the spot at every time step! (the output patterns are sketched in code after this list)
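A sketch of how the same recurrence realizes the different shapes, reusing the hypothetical `rnn_step` from the sketch in section 1:

```python
import numpy as np

def many_to_one(xs, h0):
    """Read the whole sequence, keep only the final output (e.g. sentiment analysis)."""
    h = h0
    for x_t in xs:
        h, y = rnn_step(h, x_t)
    return y

def many_to_many(xs, h0):
    """Emit an output on the spot at every time step."""
    h, ys = h0, []
    for x_t in xs:
        h, y = rnn_step(h, x_t)
        ys.append(y)
    return ys

def one_to_many(x0, h0, steps):
    """Single real input at t = 0; afterwards the input is a zero vector."""
    h, ys = h0, []
    for t in range(steps):
        h, y = rnn_step(h, x0 if t == 0 else np.zeros_like(x0))
        ys.append(y)
    return ys

# Many to Many 1 (e.g. translation) chains a many-to-one encoder with a one-to-many decoder.
```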
  3. RNN Character-level Language Model
    (figure: character-level language model example)
    • Straightforward!
    • The larger the hidden layer dimension is, the more information the network retains!
    • Very time- and resource-consuming, because the network has to forward the entire sequence to get the loss and then backpropagate to compute the gradient. -> break the sequence into smaller chunks and train on those (truncated backpropagation through time; see the sketch after this list).
    • Why we don't use plain RNNs: the vanishing / exploding gradient problem.
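A minimal sketch of preparing character-level data and splitting it into chunks for truncated backpropagation through time; the corpus, chunk length, and helper names are made up for illustration:

```python
import numpy as np

text = "hello world, hello rnn"                      # toy corpus (hypothetical)
chars = sorted(set(text))
char_to_ix = {c: i for i, c in enumerate(chars)}

def one_hot(ix, size):
    v = np.zeros(size)
    v[ix] = 1.0
    return v

# input at each step is the current character, target is the next character
xs = [one_hot(char_to_ix[c], len(chars)) for c in text[:-1]]
ys = [char_to_ix[c] for c in text[1:]]

chunk_len = 8                                        # truncated-BPTT window (assumption)
chunks = [(xs[i:i + chunk_len], ys[i:i + chunk_len])
          for i in range(0, len(xs), chunk_len)]
# each chunk is forwarded and backpropagated on its own; the hidden state is
# carried across chunks, but gradients do not flow across chunk boundaries
```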
  4. LSTM
    • Solves the vanishing gradient problem!
    • Overall Model
      (figure: LSTM cell diagram)
      • c_{t-1} (the upper inflow from the previous cell, the cell state) carries more information than the hidden-state vector.
      • h_{t-1} (the lower inflow from the previous cell, the hidden state) mainly carries information relevant to the output of the next layer and is used as an input for computing that output.
    • Variables
      • Sigmoid: values between 0 and 1. Decides what fraction of the original value to preserve.
      • tanh: values between -1 and 1. Used when conveying information.
    • Forget Gate
      • the elements of f_t are between 0 and 1. Decides the ratio of information retained from the previous cell state c_{t-1}.
    • Gate Gate
      • why is i_t multiplied with the tanh output g_t? To remove excess information before it is added to the cell state!
    • Output
      • Use tanh on the cell state to present the result as information
      • Use only part of that information as the output
      • c_t has all the information, even parts that the model does not need at the present time step.
      • By scaling with the output gate o_t, the model gets h_t, which carries only the information needed at the present time step. (All four gates appear together in the sketch after this section.)
    • Why use tanh? To give non-linearity!
    • Why tanh specifically? To prevent the vanishing gradient problem (its gradient is larger than the sigmoid's)!
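Putting the four gates above together, a minimal sketch of one LSTM step, assuming NumPy, separate weight matrices per gate, and omitted bias terms (layout and names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W):
    """One LSTM step; W is a dict of per-gate weight matrices (hypothetical layout)."""
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W["i"] @ z)        # input gate: how much new info to write
    f = sigmoid(W["f"] @ z)        # forget gate: how much of c_prev to keep
    o = sigmoid(W["o"] @ z)        # output gate: how much of the cell to expose
    g = np.tanh(W["g"] @ z)        # "gate gate": candidate new information
    c_t = f * c_prev + i * g       # cell state: addition, not repeated multiplication
    h_t = o * np.tanh(c_t)         # hidden state: only the part needed right now
    return h_t, c_t

hidden_dim, input_dim = 8, 4                         # toy sizes (hypothetical)
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(hidden_dim, hidden_dim + input_dim)) * 0.1
     for k in ["i", "f", "o", "g"]}
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, c = lstm_step(rng.normal(size=input_dim), h, c, W)
```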
  5. GRU
    • Cell state and hidden state combined
    • The larger the input (update) gate z_t, the more information is lost from the prior hidden state, and vice versa (see the sketch after this section).
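A matching sketch of one GRU step, again assuming NumPy with biases omitted; z_t is the update ("input") gate from the note above and r_t the reset gate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W):
    """One GRU step; W is a dict of weight matrices (hypothetical layout)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W["z"] @ hx)                                       # update ("input") gate
    r = sigmoid(W["r"] @ hx)                                       # reset gate
    h_tilde = np.tanh(W["h"] @ np.concatenate([r * h_prev, x_t]))  # candidate state
    # the larger z is, the more of the prior hidden state is replaced by h_tilde
    return (1.0 - z) * h_prev + z * h_tilde

hidden_dim, input_dim = 8, 4                              # toy sizes (hypothetical)
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(hidden_dim, hidden_dim + input_dim)) * 0.1
     for k in ["z", "r", "h"]}
h = gru_step(rng.normal(size=input_dim), np.zeros(hidden_dim), W)
```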
  6. Why do GRU and LSTM not have the gradient vanishing & exploding problem?
    • They use addition, not repeated multiplication (see the gradient sketch below)
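A sketch of why the additive update helps, using the LSTM cell-state path and ignoring the indirect dependence of the gates on c_{t-1}:

```latex
c_t = f_t \odot c_{t-1} + i_t \odot g_t
\quad\Rightarrow\quad
\frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t),
\qquad
\frac{\partial c_T}{\partial c_t} \approx \prod_{k=t+1}^{T} \operatorname{diag}(f_k)
```

So the long-range factor is an elementwise product of forget-gate values, which the network can keep close to 1, instead of repeated multiplication by the same W_hh matrix (and tanh derivatives) as in a vanilla RNN.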

Peer Session