How to use the library on multiple GPUs #1036
Replies: 4 comments 4 replies
-
No part of the package uses multiple GPUs at the moment. However, training
an NER model takes about an hour on a 2080ti, which doesn't seem too long.
How long is it taking you?
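For what it's worth, a quick way to confirm how many GPUs PyTorch can see, and to pin the run to a single card, is a check like the sketch below. This is generic PyTorch rather than anything Stanza-specific, and the choice of GPU 0 is just an example.

```python
import os

# Restrict the process to a single card before CUDA is initialized
# (only applies if the variable isn't already set). "0" is an arbitrary
# example; pick whichever GPU you want the training run to use.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import torch

print("visible GPUs:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("training would run on:", torch.cuda.get_device_name(0))
else:
    print("no GPU visible; training would fall back to CPU")
```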
…On Tue, May 24, 2022, 5:39 AM Chunontherocks ***@***.***> wrote:
Hi,
I have recently been testing Stanza 1.3.0 and have a few questions about it.
First, are there any methods or examples showing how to use the library on
multiple GPUs with Stanza 1.3.0 and torch 1.11.0+cu113?
If that is not available, I would also like to know how we can reduce the
NER model training time.
Thanks!
-
Those sizes are in bytes? As in, 1.3MB? It seems very problematic that it
would take a week to train a model of that size! Is this a public dataset
or something you can share with us so we can take a look? I wonder if
there's something degenerate happening which we need to check for, such as
everything merged into one long sentence, for example.
For reference, one of the datasets I recently trained in an hour or two
is Germeval2014, which is 4MB as a .bio file or 20MB in our .json format.
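If you want to rule out the merged-sentence problem yourself, a quick look at the sentence lengths in the training file usually makes it obvious. The sketch below assumes the .json file is a JSON list of sentences, each sentence being a list of token entries; adjust the loading if your data is laid out differently.

```python
import json
import sys

# Path is just an example; point it at your own train.json
path = sys.argv[1] if len(sys.argv) > 1 else "train.json"
with open(path, encoding="utf-8") as fin:
    sentences = json.load(fin)

lengths = sorted((len(sentence) for sentence in sentences), reverse=True)
print(f"{len(sentences)} sentences")
print(f"average length: {sum(lengths) / len(lengths):.1f} tokens")
print("ten longest:", lengths[:10])
# A handful of enormous "sentences" here is a strong sign that sentence
# splitting failed and large chunks of text were merged together.
```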
…On Wed, May 25, 2022 at 5:30 AM Chunontherocks ***@***.***> wrote:
Hi there,
I spent more than a week training the model on an A5000, and when I checked,
GPU utilization was only around 30%.
Before training, I prepared the data: dev.json (411K), test.json (411K), and
train.json (1.3M).
I don't know whether that training time is reasonable, or whether the data is
large enough that training simply takes that long.
Since we can't use multiple GPUs, I would like to know if there are other
ways to reduce the training time, and what I should check again.
Thanks.
-
Without being able to see what's going wrong, I'm just guessing. However, one guess that comes to mind is that unusually long sentences might cause a slowdown. The model itself would train the same as long as the sentences fit on the GPU, but it would loop over the same data many more times before hitting the triggers that check for the end of training. There should be a block of output printed when you first start training.
What does it say for your dataset? Another possibility is that you could send me the dataset offline, and I'll delete it after taking a look.
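To make the "loop over the same data many more times" point concrete, here is a rough back-of-envelope sketch. The batch size and check interval below are made-up numbers for illustration, not the actual values in the NER trainer: with the same text packed into far fewer, much longer sentences, each pass over the data contains far fewer optimizer steps, so many more passes are needed before the periodic check that can stop training.

```python
# Illustrative arithmetic only; the real batch size and stopping check in
# the NER trainer may differ from these made-up numbers.
batch_size = 32            # sentences per optimizer step (hypothetical)
check_every_steps = 500    # how often training checks whether to stop (hypothetical)

def passes_before_check(num_sentences):
    steps_per_pass = max(1, num_sentences // batch_size)
    return check_every_steps / steps_per_pass

print(passes_before_check(10_000))  # well-split data: ~1.6 passes per check
print(passes_before_check(200))     # same text merged into long sentences: ~83 passes per check
```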
-
You should be able to see it on my profile
…On Sun, May 29, 2022 at 8:01 PM Chunontherocks ***@***.***> wrote:
It was similar to what was shown on the command line when training started.
I would like to know which email address I can send the datasets to. Thanks!
-
Hi,
I have recently been testing Stanza 1.3.0 and have a few questions about it.
First, are there any methods or examples showing how to use the library on multiple GPUs with Stanza 1.3.0 and torch 1.11.0+cu113?
If that is not available, I would also like to know how we can reduce the NER model training time.
Thanks!