What I studied today:
- How Kernerls Work
- Input size, kernel size, output size
- Knowing only two above the above, can tell the size of the other
- The Larger the Parameter Size, the Less Robust the Model Is
- Padding
- Enlarging the input size with additional slots e.g. zero padding
- Stride
- The space that the kernel takes when it slides over the input
- 1 x 1 Convolution
- Reducing Dimension
- Decreasing the number of parameters and increasing the depth
- e.g. 128x128x128 –(Kenerl:1x1x128x16)–> 128x128x16
- Trend of CNN
- Lesser parameter, deeper layers, better performance
- AlexNet:
- 11*11 Kernel. Larger receptive field, but leads to too large parameter
- VGGNet:
- 3x3 Kernel. Why? Two 3x3 layers = One 5x5 layer. Two 3x3 layer has less parameter. Larger receptive field with less parameter
- GoogLeNet:
- Inception Block (1x1 Kernel). Performs channel-wise dimension reduction. Performing 1x1 kernel convolution before another convolution as such 3x3 kernel operation can reduce the number of parameters significantly
- Deeper layer and significantly ligther
- ResNet:
- Deeper layers did not train after certain number of operations
- Channeling the input to the output channel directly with or without inception block. The layer aims to train on x + f(x)
- DenseNet:
- DenseNet is similar with ResNet, but it concatenates instead of adding. The channel increases geometrically, thus need to use 1x1 kernel convolution
- Segmentation
- Fully Convolutional Network (convolutionalization):
- Removing the dense layer and replacing it with convolutional layer
- Number of the pararmeters are the same
- Result layer small. Need to size up (deconvolution)
- Not the other way around of deconvolution because the lost information due to small scale does not come back
- R-CNN:
- Many random bounding box + CNN
- Run the nn for every bounding box, slow
- SPPNet:
- Run the nn only one time for one image. Use on;y the corresponding weights of that bounding box
- Fast R-CNN:
- SSPNet + bouding box regression (how to move the bb)
- Faster R-CNN:
- Network also selects the bounding box
- Region Proposal Network + Fast R-CNN -> Faster R-CNN
- Region Proposal Network?
- Anchor Boxes: presized detection boxes
- Anchor Boxes: the nn decides what size range, how much the size should change, and whether the box is useful
- YOLO
- extremely fast object dectection (bounding box)
- No discrete region proposal network, thus faster
- Divides the input image into NxN grid
- Each bounding box is reponsible for selecting five cells and classifying the class
- Output: NxNx5xC