Training fails on small vocabulary (V<8192) #778
austinleedavis asked this question in Q&A
Problem:
When training GPT2 with a vocab < 8192 (= 128*64), the process freezes upon entering fused_classifier_kernel5 at the start of training.
Goal:
I'm trying to pre-train a GPT2 with a custom tokenizer (vocab_size=72) on a custom dataset.
Tests:
I'm running on a single RTX 3060 Mobile GPU. I have no problem tokenizing the dataset; however, I hit the freeze described above when running the training code. I've tested combinations of vocab_size (V) and padded_vocab_size (PV), starting from the defaults V=50257; PV=50304, then subtracting multiples of 128 until I could go no lower, stopping at V=PV=8192. Any combination of V and PV below this threshold causes the process to freeze.
System Specs:
Host OS: Ubuntu
Processor: AMD Ryzen 9 5900HS
CUDA Version: 12.6
Display Driver Version: 560.35.03
GPU: NVIDIA GeForce RTX 3060 Laptop GPU