Training fails on small vocabulary (V<8192) #778
austinleedavis asked this question in Q&A
Problem:
When training GPT2 with a vocab < 8192 (= 128*64), the process freezes upon entering fused_classifier_kernel5 at the start of training.
Goal:
I'm trying to pre-train a GPT2 with a custom tokenizer (vocab_size=72) on a custom dataset.
Tests:
I'm running on a single RTX 3060 Mobile GPU. I have no problem tokenizing the dataset; however, I hit the freeze described above when running the training code. I've tested combinations of vocab_size (V) and padded_vocab_size (PV), starting from the defaults V=50257; PV=50304, then subtracting multiples of 128 until I could go no lower, stopping at V=PV=8192. Any combination of V and PV below this threshold causes the process to freeze.
System Specs:
Host OS: Ubuntu
Processor: AMD Ryzen 9 5900HS
CUDA Version: 12.6
Display Driver Version: 560.35.03
GPU: NVIDIA GeForce RTX 3060 Laptop GPU