1. Mitigating Training Bias

  • Bias is widespread in training data; models can pick it up, e.g., gender bias

a. Training Bias in the Reader Model

  • Reader models are typically trained on only a limited, fixed set of documents
  • If the reader model is given text on a relatively new subject, it will have a hard time producing the answer
  • Solutions (sketches for both follow this list):
    • For the reader model: by enabling the model to output a “no answer” prediction, we can train it on larger sets of documents, including ones that contain no answer
    • For the retriever model: train the DPR with hard negatives, i.e., negative samples that have high BM25 scores but do not contain the answer
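
A minimal sketch of the “no answer” idea for the reader, assuming a SQuAD 2.0-style extractive setup with Hugging Face transformers (the bert-base-uncased checkpoint is just a placeholder): unanswerable examples are labeled by pointing both span endpoints at the [CLS] token, so the reader can also be trained on passages that contain no answer.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

# A passage that does NOT contain the answer to the question.
question = "Who wrote Hamlet?"
context = "Madrid is the capital and most populous city of Spain."
inputs = tokenizer(question, context, return_tensors="pt")

# SQuAD 2.0 convention: an unanswerable example is labeled by pointing
# both span endpoints at the [CLS] token (index 0), the "no answer" slot.
outputs = model(
    **inputs,
    start_positions=torch.tensor([0]),
    end_positions=torch.tensor([0]),
)
outputs.loss.backward()  # gradient for one no-answer training example
```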
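And a minimal sketch of mining BM25 hard negatives for DPR training, assuming the rank_bm25 package (the tiny corpus and helper names are illustrative): pick passages that BM25 ranks highly for the question but that do not contain the gold answer.

```python
import re
from rank_bm25 import BM25Okapi  # one of several BM25 implementations

passages = [
    "Hamlet is a tragedy written by William Shakespeare around 1600.",
    "Many critics wrote essays about the themes of Hamlet.",
    "Madrid is the capital and most populous city of Spain.",
]

def tokenize(text):
    return re.findall(r"\w+", text.lower())

bm25 = BM25Okapi([tokenize(p) for p in passages])

def mine_hard_negative(question, answer, top_k=50):
    """Return a passage BM25 ranks highly that does not contain the answer."""
    scores = bm25.get_scores(tokenize(question))
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    for i in ranked[:top_k]:
        if answer.lower() not in passages[i].lower():
            return passages[i]  # lexically close to the question, yet a true negative
    return None

print(mine_hard_negative("Who wrote Hamlet?", "William Shakespeare"))
# -> "Many critics wrote essays about the themes of Hamlet."
```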

2. Annotation Bias from Datasets

  • If the dataset creator already knows the answer to a question, the question can contain too many hints, making the retrieval task artificially easy
  • To avoid this bias, we can adopt natural questions that real users have actually posed on the web (the approach behind the Natural Questions dataset)
  • The ground truth may not include all valid answers, penalizing a model that actually produced a correct one; the sketch below shows how matching against a set of answer aliases softens this
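
A minimal sketch of one common mitigation on the evaluation side, using SQuAD-style answer normalization (the helper names are my own): score a prediction as correct if it matches any of a set of gold aliases, which softens, but does not eliminate, the incomplete-ground-truth penalty.

```python
import re
import string

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation, articles, extra spaces."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    # Comparing against a *set* of gold aliases softens the penalty when the
    # annotation lists only some valid phrasings; it cannot recover answers
    # that are missing from the set entirely.
    return any(normalize(prediction) == normalize(g) for g in gold_answers)

print(exact_match("The United States", ["USA", "United States"]))  # True
```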