- Motivation: handle sequential data, where inputs vary in order and length
- No recurrence, attention only: recurrent connections are replaced by attention
- Encoder (attention)
- Not autoregressive: the encoder attends to all positions in parallel
- Create a query vector, key vector, and value vector for each word
- Take the dot product of each query with every key vector
- Scale by the square root of the key dimension, then apply softmax to get attention weights
- The weighted sum of the value vectors is the encoder output for that position
- So the encoder output carries information about the relationships between the word vectors
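The steps above can be sketched in NumPy; this is a minimal illustration (function and variable names are my own, and the random matrices stand in for learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v), rows are per-word vectors."""
    d_k = K.shape[-1]
    # dot product of queries and keys, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax over key positions: each row of weights sums to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # output is the weighted sum of the value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 words, dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row mixes all value vectors, which is how relationship information between words enters the representation.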
- Multi-head attention: perform the same attention computation multiple times in parallel, each head with its own learned projections, then concatenate the results
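A rough sketch of running several heads in parallel and concatenating them (per-head projection lists and the output projection `Wo` are hypothetical stand-ins for learned weights):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq_list, Wk_list, Wv_list, Wo):
    """X: (seq_len, d_model); each W*: (d_model, d_k); Wo: (h*d_k, d_model)."""
    heads = []
    for Wq, Wk, Wv in zip(Wq_list, Wk_list, Wv_list):
        # each head projects the input differently, then attends
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(Wk.shape[1])
        heads.append(softmax(scores) @ V)
    # concatenate all heads and project back to d_model
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))  # 5 words, d_model = 8
Wq_list = [rng.normal(size=(8, 4)) for _ in range(2)]  # 2 heads, d_k = 4
Wk_list = [rng.normal(size=(8, 4)) for _ in range(2)]
Wv_list = [rng.normal(size=(8, 4)) for _ in range(2)]
Wo = rng.normal(size=(8, 8))
Y = multi_head_attention(X, Wq_list, Wk_list, Wv_list, Wo)
```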
- Positional Encoding: the attention output is just a weighted sum of value vectors, independent of sequence order, so position information must be added to the inputs
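One common way to inject order, assuming the sinusoidal scheme (PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos of the same angle), sketched in NumPy:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Returns a (seq_len, d_model) matrix added to the input embeddings."""
    pos = np.arange(seq_len)[:, None]          # positions 0..seq_len-1
    i = np.arange(0, d_model, 2)[None, :]      # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dims get sine
    pe[:, 1::2] = np.cos(angles)  # odd dims get cosine
    return pe

pe = positional_encoding(10, 8)
```

Because each position gets a distinct pattern, the otherwise order-blind weighted sum can distinguish where each word sits in the sequence.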