Jeongsu Lee edited this page Jun 6, 2022 · 2 revisions
  • BEiT is a model that applies Masked Language Modeling, the pre-training method of BERT (the most influential model in NLP), to image tasks.
  • It tokenizes image patches with a pre-trained DALL-E tokenizer and trains the model in the BERT style.
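The two bullets above can be sketched end-to-end. This is a minimal illustration with hypothetical sizes (ViT-Base-style 14×14 patch grid, 8192-token dVAE vocabulary, 40% masking ratio), not the official implementation; the tokenizer and backbone are replaced by stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: a 224x224 image split into 14x14 = 196 patches,
# and a DALL-E dVAE vocabulary of 8192 visual tokens.
num_patches = 196
vocab_size = 8192

# 1) Tokenize: the frozen dVAE tokenizer maps each patch to a
#    discrete visual token (random ids stand in for the real tokenizer).
visual_tokens = rng.integers(0, vocab_size, size=num_patches)

# 2) Mask roughly 40% of the patch positions.
num_masked = int(0.4 * num_patches)
masked_positions = rng.choice(num_patches, size=num_masked, replace=False)

# 3) The ViT backbone (omitted) sees patch embeddings with masked
#    positions replaced by a learnable [MASK] embedding, and is trained
#    with softmax cross-entropy to predict the visual token at each
#    masked position, just as BERT predicts masked words.
targets = visual_tokens[masked_positions]
```

The prediction targets are discrete token ids, so the pre-training loss is an ordinary classification loss over the visual vocabulary rather than a pixel-regression loss.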

Model Key Points

  • DALL-E tokenizer
  • BERT

Architecture

DALL-E tokenizer
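The core of the tokenizer is vector quantization: each patch encoding is snapped to its nearest codebook entry, and the entry's index is the "visual token" that MIM later predicts. The sketch below uses hypothetical small sizes (512-entry codebook, 16-dim embeddings) and random encodings; the real dVAE learns the codebook jointly with a decoder that reconstructs the image from the tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: codebook of 512 visual tokens, 16-dim embeddings,
# one image encoded into 196 patch vectors in the same space.
vocab_size, dim, num_patches = 512, 16, 196
codebook = rng.standard_normal((vocab_size, dim))
patch_encodings = rng.standard_normal((num_patches, dim))

# Quantization: replace each patch encoding with the index of its
# nearest codebook entry (squared Euclidean distance).
dists = ((patch_encodings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
visual_tokens = dists.argmin(axis=1)
```

Only the token ids are needed for BEiT pre-training; the decoder half of the dVAE is not used once the tokenizer is frozen.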

Decoder

Pre-Training BEiT: Masked Image Modeling (MIM)
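BEiT does not mask patches independently; it uses blockwise masking, repeatedly sampling rectangular blocks on the patch grid until roughly 40% of patches are covered. A minimal sketch (block-size range is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 14x14 patch grid, mask until ~40% is covered.
grid = 14
target = int(0.4 * grid * grid)
mask = np.zeros((grid, grid), dtype=bool)

while mask.sum() < target:
    h = rng.integers(2, 6)               # block height (assumed range 2..5)
    w = rng.integers(2, 6)               # block width
    top = rng.integers(0, grid - h + 1)  # top-left corner of the block
    left = rng.integers(0, grid - w + 1)
    mask[top:top + h, left:left + w] = True

# Flat indices of masked patches, replaced by [MASK] embeddings
# before being fed to the Transformer.
masked_positions = np.flatnonzero(mask)
```

Masking contiguous blocks forces the model to use longer-range context instead of trivially interpolating from immediate neighbors.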

Experiments

แ„‰แ…ณแ„แ…ณแ„…แ…ตแ†ซแ„‰แ…ฃแ†บ 2022-05-02 แ„‹แ…ฉแ„’แ…ฎ 3 35 53

Reference
  • BEiT paper: https://arxiv.org/pdf/2106.08254.pdf
  • Review (Korean): https://velog.io/@rucola-pizza/%EB%85%BC%EB%AC%B8%EB%A6%AC%EB%B7%B0BEIT-Pre-Training-of-Image-Transformer
