Su-minn edited this page Apr 24, 2022 · 8 revisions
  • ResNet was the first model to stack more than 100 layers, overcoming the limit where performance stopped improving as layers got deeper (the degradation problem)
  • It surpassed human-level performance and took 1st place not only in ImageNet classification but also in localization, detection, and segmentation

Model Keypoints

  • Architecture
  • Shortcut connection

Architecture

[Figure: overall ResNet architecture]

  • Stem (initial part)

    • One 7x7 conv layer
    • He initialization
      • An initialization well suited to ResNet
      • With a generic (non-He) initialization,
        the values added through the skip connections grow large from the very start
  • Residual Block part

    • Stack residual blocks
    • Every residual block has two 3x3 conv layers
      • Every layer inside a residual block is a 3x3 conv
      • This keeps the parameter count from growing sharply, which is why the computation stays relatively fast
    • Batch norm after every conv layer
    • Doubling the number of filters and spatially down-sampling by stride 2 instead of spatial pooling
      • Each time the network moves to the next residual stage,
        down-sampling halves the spatial size and the number of channels doubles
  • Final output

    • Only a single FC layer for output classes
    • After average pooling, the output is produced by a single FC layer
  • cf) Why down-sample spatially with stride 2

    • downsampling: the process of reducing the spatial size of an image

    • In convolutional neural networks, the most common ways to halve a feature map's resolution are stride-2 convolution or max/average pooling

    • Reducing resolution with a stride-2 convolution layer adds learnable parameters, so the down-sampling itself becomes learnable,
      but the parameter count and computation grow accordingly

    • Pooling reduces resolution with a fixed rule (max or average) and no learnable parameters, so it is detached from training; computation and training cost drop, but it is known to perform worse than strided convolution

    • Reference: Stride와 Pooling의 비교 - gaussian37

  • cf) He initialization

  • cf) Batch normalization

    • Batch normalization, like careful weight initialization, is one approach to the gradient vanishing and gradient exploding problems
    • Effects of batch normalization
      • Improves training speed (a higher learning rate can be used)
      • Reduces dependence on the choice of initial weights (activations are normalized at every training step)
      • Can lower the risk of overfitting (can substitute for techniques such as dropout)
      • Mitigates the gradient vanishing problem
  • Reference: 문과생도 이해하는 딥러닝 (10) - 배치 정규화 (tistory.com)
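The stage layout above (stride-2 down-sampling that halves the spatial size while the filter count doubles) can be sketched as simple shape arithmetic. This is a toy sketch assuming the ResNet-34 block layout of [3, 4, 6, 3] and a 224x224 input; the function name is ours:

```python
# Sketch of how spatial size and channel count evolve through the stages
# of a ResNet-34-style network (illustrative arithmetic only).
def resnet_stage_shapes(input_size=224, stage_blocks=(3, 4, 6, 3)):
    size = input_size // 2   # stem: 7x7 conv, stride 2 -> 112
    size //= 2               # 3x3 max pool, stride 2 -> 56
    channels = 64
    shapes = []
    for stage in range(len(stage_blocks)):
        if stage > 0:
            size //= 2       # stride-2 conv at the stage entry halves H and W
            channels *= 2    # while the number of filters doubles
        shapes.append((channels, size, size))
    return shapes

print(resnet_stage_shapes())
# [(64, 56, 56), (128, 28, 28), (256, 14, 14), (512, 7, 7)]
```

The final 512x7x7 feature map is what the average pooling and single FC layer above consume.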

Shortcut connection

  • Also called a skip connection
  • Even if the gradient along the main path vanishes, the gradient carried by the shortcut (identity) connection survives,
    so the gradient vanishing problem is alleviated
  • This makes it possible to stack layers deeper, resolving the degradation problem
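The surviving-gradient claim can be checked with a toy scalar model, where `F(x) = w * x` stands in for the conv branch (an assumed setup, not the actual network):

```python
# Toy scalar residual unit: H(x) = F(x) + x with F(x) = w * x.
# Even if the residual branch is "dead" (w = 0), dH/dx stays at 1,
# so the gradient still flows through the identity shortcut.
def residual_forward(x, w):
    return w * x + x

def grad_wrt_input(x, w, eps=1e-6):
    # central finite difference of H with respect to x
    return (residual_forward(x + eps, w) - residual_forward(x - eps, w)) / (2 * eps)

print(grad_wrt_input(1.0, 0.0))  # ~1.0: the shortcut keeps the gradient alive
```

Analytically dH/dx = w + 1, so the identity term contributes a constant 1 no matter how small the residual branch's gradient becomes.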

[Figure: plain layer vs. residual block]

  • Plain Layer

    • As the layers get deeper, it is hard to learn good $H(x)$ directly
    • When learning the mapping $H(x)$,
      trying to learn the relation from $x$ to $H(x)$ directly through a tall stack of layers
      is hard because the mapping is complex
  • Residual block

    • The block is changed to model and learn only the residual part,
      i.e. what remains beyond the identity input $x$
    • Target function : $H(x) = F(x) + x$
    • Residual function : $F(x) = H(x) - x$
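A minimal forward pass for the formulation above, with a toy fully connected stack standing in for the block's two 3x3 convs (names, shapes, and the FC substitution are ours):

```python
import numpy as np

def residual_block(x, w1, w2):
    # F(x): two toy "layers" in place of the block's 3x3 conv + BN stack
    f = np.maximum(0.0, x @ w1) @ w2
    # H(x) = F(x) + x, followed by the block's final ReLU
    return np.maximum(0.0, f + x)

# When the residual weights are all zero, F(x) = 0 and the block
# reduces to the identity (for non-negative inputs):
x = np.array([[1.0, 2.0, 3.0, 4.0]])
print(residual_block(x, np.zeros((4, 4)), np.zeros((4, 4))))  # [[1. 2. 3. 4.]]
```

This is why residual blocks are easy to optimize: learning the identity mapping only requires driving $F(x)$ toward zero.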

Degradation problem

  • As the network depth increases, accuracy gets saturated, then degrades rapidly
  • It was expected that a model with more parameters would simply be vulnerable to overfitting,
    but the deeper network (56-layer) showed worse training and test error than the shallower one (20-layer),
    leading to the conclusion that this is a degradation (optimization) problem, not overfitting
    • cf) If overfitting were the problem, the deeper network would have had lower training error
      but higher test error

[Figure: training/test error of 20-layer vs. 56-layer plain networks]

Analysis of residual connection

  • $2^n$ input-output paths are created that the gradient can travel through
    • The analysis is that complex mappings can be learned via this large variety of paths
    • Every additional residual block doubles the number of paths
  • Residual networks have $O(2^n)$ implicit paths connecting input and output,
    and adding a block doubles the number of paths
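The path count can be made concrete by enumerating routes through a few blocks (a toy enumeration; 'I' marks a block's identity branch, 'F' its residual branch):

```python
from itertools import product

def enumerate_paths(num_blocks):
    # each residual block lets the signal take either its identity
    # branch ('I') or its residual branch ('F'), so paths double per block
    return [''.join(p) for p in product('IF', repeat=num_blocks)]

print(len(enumerate_paths(3)))  # 8 = 2^3 distinct input-output paths
```

Adding one more block pairs every existing path with both branch choices of the new block, which is exactly the doubling noted above.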

[Figure: unraveled view of the paths through a residual network]
