UPerNet

Kim Na Young edited this page May 2, 2022 · 2 revisions

UPerNet proposes a framework that solves the scene recognition, object detection, texture recognition, and material recognition tasks simultaneously.

Model Keypoint

  • FPN
  • PPM
  • Head

Architecture

[Figure: screen capture 2022-05-02 152451]
  1. FPN (Feature Pyramid Network)

    • A feature extractor that uses multi-level feature representations
    • Pyramidal hierarchy structure
    • Top-down architecture + lateral connections -> fuses high-level semantic information with mid- or low-level information
  2. PPM (Pyramid Pooling Module)

    • Added on the last layer of the backbone network
    • Placed before the feature map enters the top-down branch of the FPN
    • Provides effective global prior representations
  3. Head

    • Designed so that the scene recognition, object detection, texture recognition, and material recognition tasks can be solved simultaneously
      • A single network can parse and unify visual attributes that exist at multiple levels
    • Scene Head / Object Head / Part Head / Material Head / Texture Head
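
The way the PPM sits on the last backbone layer and feeds the FPN top-down branch can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the reference implementation: the class names `PPM` and `FPNNeck`, the channel widths, and the pooling bin sizes `(1, 2, 3, 6)` are assumptions chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PPM(nn.Module):
    """Pyramid pooling on the last backbone map (bin sizes are an assumption)."""

    def __init__(self, in_ch, out_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        # One branch per bin size: pool to bin x bin, then reduce channels.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(in_ch, out_ch, 1), nn.ReLU())
            for b in bins
        ])
        self.fuse = nn.Conv2d(in_ch + len(bins) * out_ch, out_ch, 3, padding=1)

    def forward(self, x):
        size = x.shape[-2:]
        # Upsample every pooled branch back to the input size and concatenate.
        pooled = [
            F.interpolate(b(x), size=size, mode="bilinear", align_corners=False)
            for b in self.branches
        ]
        return self.fuse(torch.cat([x] + pooled, dim=1))


class FPNNeck(nn.Module):
    """Top-down pathway with lateral connections; PPM handles the top level."""

    def __init__(self, in_chs=(64, 128, 256, 512), ch=64):
        super().__init__()
        self.ppm = PPM(in_chs[-1], ch)
        self.lateral = nn.ModuleList([nn.Conv2d(c, ch, 1) for c in in_chs[:-1]])
        self.smooth = nn.ModuleList([nn.Conv2d(ch, ch, 3, padding=1) for _ in in_chs[:-1]])

    def forward(self, feats):
        # feats: backbone maps ordered from highest to lowest resolution.
        top = self.ppm(feats[-1])
        outs = [top]
        for i in range(len(feats) - 2, -1, -1):
            lat = self.lateral[i](feats[i])
            # Upsample the coarser map and add it to the lateral projection.
            top = lat + F.interpolate(top, size=lat.shape[-2:], mode="bilinear",
                                      align_corners=False)
            outs.insert(0, self.smooth[i](top))
        return outs


# Random tensors stand in for backbone features (hypothetical sizes).
feats = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 256, 512), (64, 32, 16, 8))]
pyramid = FPNNeck()(feats)
print([tuple(p.shape) for p in pyramid])
```

Every output level has the same channel width, so a head can either pick one level or resize and combine several of them.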

Scene Head

[Figure: screen capture 2022-05-02 155056]
  • Composed of Conv 3x3 -> GAP (global average pooling) -> Classifier
  • Scene recognition needs the highest-level, image-level information, so GAP is applied
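
A minimal PyTorch sketch of the Conv 3x3 -> GAP -> Classifier layout is shown below. The class name `SceneHead`, the channel width, and the number of scene classes are placeholders for illustration, and the input is assumed to be a coarse, highly semantic feature map.

```python
import torch
import torch.nn as nn


class SceneHead(nn.Module):
    """Conv 3x3 -> global average pooling -> linear classifier (sizes assumed)."""

    def __init__(self, in_ch=64, num_scenes=10):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU())
        self.gap = nn.AdaptiveAvgPool2d(1)  # collapses H x W to 1 x 1
        self.classifier = nn.Linear(in_ch, num_scenes)

    def forward(self, x):
        x = self.gap(self.conv(x))            # (N, C, 1, 1)
        return self.classifier(x.flatten(1))  # (N, num_scenes): one label per image


logits = SceneHead()(torch.randn(2, 64, 8, 8))
print(logits.shape)  # torch.Size([2, 10])
```

Because GAP removes the spatial dimensions, the head outputs one score vector per image rather than per pixel.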

Object Head / Part Head

[Figure: screen capture 2022-05-02 155104]
  • Composed of Conv 3x3 -> Classifier
  • In the experiments, fusing all FPN feature maps gave better results than using only the highest-resolution feature map
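
One way to fuse all FPN feature maps before the Conv 3x3 -> Classifier stage is to resize every level to the finest resolution and concatenate along channels. The sketch below assumes that fusion scheme; the class name `FusedSegHead`, the channel width, and the class count are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedSegHead(nn.Module):
    """Fuse all pyramid levels, then Conv 3x3 -> per-pixel classifier."""

    def __init__(self, ch=64, num_levels=4, num_classes=5):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch * num_levels, ch, 3, padding=1), nn.ReLU())
        self.classifier = nn.Conv2d(ch, num_classes, 1)

    def forward(self, pyramid):
        # pyramid: FPN outputs ordered from highest to lowest resolution.
        size = pyramid[0].shape[-2:]
        resized = [pyramid[0]] + [
            F.interpolate(p, size=size, mode="bilinear", align_corners=False)
            for p in pyramid[1:]
        ]
        return self.classifier(self.conv(torch.cat(resized, dim=1)))


# Random tensors stand in for FPN outputs (hypothetical sizes).
pyramid = [torch.randn(1, 64, s, s) for s in (32, 16, 8, 4)]
print(FusedSegHead()(pyramid).shape)  # torch.Size([1, 5, 32, 32])
```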

Material Head

[Figure: screen capture 2022-05-02 155104]
  • Material recognition needs both context information and local features
  • Instead of fusing all feature maps, it uses the highest-resolution feature map
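
A head that reads only the highest-resolution FPN map might look like the following sketch. The Conv 3x3 -> Classifier layout is borrowed from the other heads as an assumption, and the class name `MaterialHead` and all sizes are illustrative.

```python
import torch
import torch.nn as nn


class MaterialHead(nn.Module):
    """Per-pixel classifier on the finest pyramid level only (layout assumed)."""

    def __init__(self, ch=64, num_materials=6):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.classifier = nn.Conv2d(ch, num_materials, 1)

    def forward(self, pyramid):
        # pyramid[0] is the highest-resolution map; coarser levels are ignored.
        return self.classifier(self.conv(pyramid[0]))


pyramid = [torch.randn(1, 64, s, s) for s in (32, 16, 8, 4)]
print(MaterialHead()(pyramid).shape)  # torch.Size([1, 6, 32, 32])
```

In an FPN, the finest level has already received context through the top-down pathway, so it carries both local detail and higher-level information.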

Texture Head

[Figure: screen capture 2022-05-02 155114]
  • Built by adding several convolutional layers
  • Predicts a texture label for every pixel
  • The gradient of the texture branch is not back-propagated -> only the Texture Head is trained
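
Blocking the gradient so that only the Texture Head is trained can be expressed in PyTorch with `Tensor.detach()`. The sketch below is an illustration under assumptions: the layer count and widths are arbitrary, and a single convolution stands in for the backbone.

```python
import torch
import torch.nn as nn


class TextureHead(nn.Module):
    """A few conv layers predicting a texture label per pixel (sizes assumed)."""

    def __init__(self, in_ch=64, mid_ch=32, num_textures=4):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(mid_ch, num_textures, 1),
        )

    def forward(self, feat):
        # detach() cuts the autograd graph, so the texture loss
        # cannot update the layers that produced `feat`.
        return self.layers(feat.detach())


backbone = nn.Conv2d(3, 64, 3, padding=1)  # hypothetical stand-in for the backbone
head = TextureHead()
logits = head(backbone(torch.randn(1, 3, 16, 16)))  # (1, 4, 16, 16)
logits.sum().backward()

print(backbone.weight.grad is None)            # True: backbone receives no gradient
print(head.layers[0].weight.grad is not None)  # True: the head is trained
```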
