Number of citations: 224 (last checked 23/08/09)

paper - https://arxiv.org/pdf/2104.05704.pdf

code - https://github.com/SHI-Labs/Compact-Transformers


Before Review

Since this is my first paper review, the real challenge is simply to finish it. To make that easier, I chose a paper purely out of personal interest; without that interest, I suspect it would be hard to complete the review.

This paper discusses how to train a Transformer on small datasets.

In practice, data is expensive and hard to collect, and even more so when it contains personal information. So I believe training datasets should be kept as small as possible.

That's why this paper caught my interest.

0. Background Knowledge & Keywords

1 - Inductive bias

2 - Raster scan

3 - Residual Connection

1. Introduction

Convolutional neural networks (CNNs) have been the standard for computer vision since the success of AlexNet.

Features and advantages of convolution and pooling for computer vision

  1. Sparse interaction

    In a convolution, each output value is affected only by a specific local region of the input, determined by the filter. So unlike an FC layer, a CNN has locality and translation invariance.

  2. Weight sharing

    By using filters, the number of weights can be drastically reduced, since a filter is very small compared to the image.

  3. Equivariant representations

    Convolution is equivariant to shifts: when the image is shifted left, the output of the convolution shifts left by the same amount.
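
The three properties above can be sketched in a tiny 1-D example (a minimal sketch, not from the paper; all array sizes and the 2-tap filter are illustrative choices):

```python
import numpy as np

# Input padded with zeros at the edges so boundary effects don't interfere.
x = np.array([0., 1., 3., 2., 0., 0., 0., 0., 0., 0.])
k = np.array([1., -1.])                # one shared 2-tap filter

y = np.convolve(x, k, mode='valid')    # output length: x.size - k.size + 1 = 9

# 1. Sparse interaction: y[0] depends only on x[0:2], so perturbing a
#    distant input value leaves it unchanged.
x_far = x.copy()
x_far[9] = 5.0
y_far = np.convolve(x_far, k, mode='valid')
print(y_far[0] == y[0])                # True

# 2. Weight sharing: the convolution reuses k.size weights at every
#    position, while a fully connected layer mapping x to y would
#    need x.size * y.size weights.
print(k.size, x.size * y.size)         # 2 vs. 90

# 3. Equivariance: shifting the input right by one position shifts the
#    output right by the same amount.
y_shift = np.convolve(np.roll(x, 1), k, mode='valid')
print(np.allclose(np.roll(y, 1), y_shift))   # True
```

The same reasoning carries over to 2-D images: a 3x3 filter slid over a 224x224 image uses 9 shared weights, whereas a dense layer between two feature maps of that size would need (224*224)^2 parameters.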