What I Learned Today
- Pre-processing
- Takes up roughly 80% of a project's time
- Image Dataset
- Bounding box: sometimes used to remove noise
- Generalization of Dataset
- Underfitting (High Bias) vs Good vs Overfitting (High Variance)
- Train / Validation
- Why do we need a validation set? To validate the training process. The model has already seen the train set, so a separate set is needed to check whether training is heading in the right direction
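A minimal sketch of holding out a validation set, using only the standard library. The helper name `train_val_split`, the 20% ratio, and the integer stand-in samples are assumptions made for illustration; they are not from the original notes.

```python
import random

def train_val_split(samples, val_ratio=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, val) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is reproducible
    n_val = int(len(shuffled) * val_ratio)  # number of held-out samples
    return shuffled[n_val:], shuffled[:n_val]

samples = list(range(10))  # stand-in for image paths or (image, label) pairs
train, val = train_val_split(samples, val_ratio=0.2)
print(len(train), len(val))
```

The model trains only on `train`; `val` is never used for weight updates, so its score reflects how well the model generalizes. In a PyTorch project, `torch.utils.data.random_split` can play the same role directly on a `Dataset`.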
- Data Augmentation
- Diversity of cases and states
- torchvision.transforms.Compose([ … ])
- Albumentations (faster and more diverse than the torch library)
- Importance of experiments: almost no technique always improves performance. Always try it and validate
- Data Generation
- CPU bottleneck (slow feeding speed)
- e.g. data augmentation
- Dataset definition code
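from torch.utils.data import Dataset

class MyData(Dataset):
    def __init__(self, data):
        # `data` is an assumed in-memory container of samples (e.g. a list)
        self.data = data

    # support indexing: dataset[i]
    def __getitem__(self, index):
        return self.data[index]

    # support len(dataset); must return an int, not None
    def __len__(self):
        return len(self.data)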
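- DataLoader code
from torch.utils.data import DataLoader

loader = DataLoader(
    train_set,          # a Dataset instance, e.g. MyData
    batch_size=bs,      # samples per batch
    num_workers=num,    # worker processes that load data in parallel
    drop_last=True,     # drop the final incomplete batch
)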
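To see the two pieces work together, here is a self-contained sketch. `ToyData`, its random contents, and the batch size are hypothetical and exist only for illustration. `num_workers=0` keeps loading in the main process so the snippet runs anywhere.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyData(Dataset):
    """Hypothetical dataset: n random feature vectors with integer labels."""
    def __init__(self, n=10):
        self.x = torch.rand(n, 4)
        self.y = torch.arange(n)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)

# 10 samples with batch_size=4 and drop_last=True: the last 2 samples are dropped
loader = DataLoader(ToyData(), batch_size=4, num_workers=0, drop_last=True)
batches = [(xb.shape, yb.shape) for xb, yb in loader]
print(len(batches))
```

When augmentation makes the CPU the bottleneck, raising `num_workers` is the usual first thing to try, since batches are then prepared in parallel worker processes.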
Today's To-Do