What I Learned Today

  1. Pre-processing
    • Often takes about 80% of a project's time
    • Image Dataset
      • Bounding box: cropping to the object's bounding box can remove background noise
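Bounding-box annotations can be used to crop away the background around the object before training. A minimal sketch, assuming images are NumPy arrays in H x W x C layout and boxes are pixel coordinates (x_min, y_min, x_max, y_max) with exclusive max values. The box format and the `crop_to_bbox` helper are illustrative assumptions; real datasets differ (some use center/width/height or normalized coordinates).

```python
import numpy as np


def crop_to_bbox(image: np.ndarray, bbox: tuple[int, int, int, int]) -> np.ndarray:
    """Crop an H x W x C image to a pixel bounding box (x_min, y_min, x_max, y_max).

    Assumed box format (hypothetical): x_max and y_max are exclusive,
    matching Python slicing.
    """
    x_min, y_min, x_max, y_max = bbox
    h, w = image.shape[:2]
    # Clamp the box to the image so a negative index cannot wrap around
    x_min, x_max = max(0, x_min), min(w, x_max)
    y_min, y_max = max(0, y_min), min(h, y_max)
    if x_min >= x_max or y_min >= y_max:
        raise ValueError(f"empty bounding box {bbox} for image of size {w}x{h}")
    # Rows are indexed by y, columns by x
    return image[y_min:y_max, x_min:x_max]


# Usage: a dummy 100x200 RGB image with the object in the middle
image = np.zeros((100, 200, 3), dtype=np.uint8)
cropped = crop_to_bbox(image, (50, 20, 150, 80))
print(cropped.shape)  # (60, 100, 3)
```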
  2. Generalization of Dataset
    • Underfitting (High Bias) vs Good Fit vs Overfitting (High Variance)
    • Train / Validation
      • Why a validation set?: the model has already been fit to the train set, so its score there says little about unseen data. A separate held-out set checks whether training is heading in the right direction (e.g. it reveals overfitting).
    • Data Augmentation
      • Increases the diversity of cases and states the model sees during training
      • torchvision.transforms.Compose([ … ])
      • Albumentations (generally faster than torchvision.transforms and offers a wider range of transforms)
    • Importance of experiments: almost no technique always improves performance. Always try it and validate.
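The train/validation split and data augmentation fit together like this: hold out a validation split the model never trains on, and apply random augmentation only to the training split so validation scores stay comparable between runs. A minimal sketch in plain Python and NumPy, assuming images are H x W x C uint8 arrays. `Compose` here is a hypothetical re-implementation of the idea behind `torchvision.transforms.Compose` (apply transforms in order), not the library class; `train_val_split`, `random_hflip`, `to_float`, the 80/20 ratio and the seeds are all illustrative assumptions. In practice torchvision or Albumentations would supply the transforms.

```python
import random

import numpy as np


class Compose:
    """Hypothetical stand-in for torchvision.transforms.Compose: apply transforms in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for transform in self.transforms:
            img = transform(img)
        return img


def random_hflip(p=0.5, rng=None):
    """Return a transform that flips an H x W x C image left-right with probability p."""
    if rng is None:
        rng = random.Random()

    def flip(img):
        # Reversing the width (column) axis mirrors the image horizontally
        return img[:, ::-1] if rng.random() < p else img

    return flip


def to_float(img):
    """Scale uint8 pixels in [0, 255] to float32 in [0, 1]."""
    return img.astype(np.float32) / 255.0


def train_val_split(n_samples, val_ratio=0.2, seed=42):
    """Shuffle indices once and hold out a fixed fraction for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * val_ratio)
    return indices[n_val:], indices[:n_val]


# Dummy dataset: 100 tiny 4x4 RGB images
images = [np.full((4, 4, 3), i, dtype=np.uint8) for i in range(100)]
train_idx, val_idx = train_val_split(len(images))

# Random augmentation for training only; validation gets only deterministic steps
train_tf = Compose([random_hflip(p=0.5, rng=random.Random(0)), to_float])
val_tf = Compose([to_float])

train_samples = [train_tf(images[i]) for i in train_idx]
val_samples = [val_tf(images[i]) for i in val_idx]
print(len(train_samples), len(val_samples))  # 80 20
```

Fixing the split seed keeps the same samples in the validation set across experiments, so results from different augmentation choices can be compared fairly.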
  3. Data Generation
    • CPU bottleneck: the CPU cannot prepare and feed data to the GPU fast enough
    • e.g. heavy data augmentation running on the CPU
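    • Defining a custom Dataset
       from torch.utils.data import Dataset
       class MyData(Dataset):
           def __init__(self, data):
               # store the samples (e.g. a list of file paths or tensors)
               self.data = data
           # support indexing: dataset[index]
           def __getitem__(self, index):
               return self.data[index]
           # support len(dataset)
           def __len__(self):
               return len(self.data)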
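    • DataLoader code
       from torch.utils.data import DataLoader
       # dataset: a Dataset instance such as MyData; argument values are examples
       loader = DataLoader(dataset, batch_size=32, shuffle=True,
                           num_workers=4)  # worker processes load batches in parallel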
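A DataLoader collates individual samples into batch tensors and, via `num_workers`, can spread loading across worker processes, which is the usual remedy for the CPU bottleneck. A minimal sketch, assuming random tensors wrapped in `torch.utils.data.TensorDataset` in place of a real dataset; the batch size of 32 is an arbitrary example, and `num_workers=0` (loading in the main process) keeps the snippet runnable everywhere.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative data: 100 samples with 3 features each, plus integer labels
features = torch.randn(100, 3)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

loader = DataLoader(
    dataset,
    batch_size=32,    # samples per batch
    shuffle=True,     # reshuffle the order every epoch
    num_workers=0,    # >0 loads batches in parallel worker processes
    drop_last=False,  # keep the final, smaller batch (100 = 32 + 32 + 32 + 4)
)

# The default collate function stacks samples: x is [B, 3], y is [B]
batch_sizes = [x.shape[0] for x, y in loader]
print(batch_sizes)  # [32, 32, 32, 4]
```

In a real training script, raising `num_workers` lets several processes run `__getitem__` (including any CPU-side augmentation) in parallel. On platforms that start workers with spawn (Windows, macOS), keep the training loop under `if __name__ == "__main__":`.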

