GenBank Submit Tool #5403
Replies: 37 comments
-
Our records were just a test, not actually submitted, and we do need a real test case and documentation.
On Fri, Sep 3, 2021, 6:07 PM Carla Cicero wrote:
@dustymc @campmlc @acdoll I want to try using the GenBank submit tool for ~1000 Steller's Jay sequences. Mariel or Andy, I see you have some batches there as tests. Has anyone actually submitted to GenBank successfully using this? I am unclear how this works, and we don't seem to have documentation (which I can write if I can figure it out).
Dusty, initial questions/comments:
1. The People fields: you choose an Arctos agent, then need to manually add first_name, middle_initial, last_name. Is that necessary, and can't it be done magically by choosing the Arctos agent? What are those separate fields used for?
2. Agent Role: are 'sequence author' and 'reference author' defined somewhere? And how are these fields used?
   I assume that 'sequence author' is the person who generated the sequences, but I'm not sure; in this case, lots of different students helped with that but are not co-authors. I assume that 'reference author' is a co-author on the publication. If you choose a role of 'sequence author', does that mean the person can't also be a 'reference author'?
3. Under Sequences, I get this ERROR: syntax error at or near "jion" Position: 223. I have no idea what that means, but I have not tried doing anything yet with the actual sequences.
4. Where do I go from here? I've added the Batch Name, Contact Agent Details, ref_title, and People. Now what do I do???
batch_id=245095978
-
That's my plan, but I have no idea how this is supposed to work. I have real data and am happy to document; I just need some guidance from @dustymc.
-
GenBank wants strange formats; you can save the address in that format (as https://arctos.database.museum/info/ctDocumentation.cfm?table=ctaddress_type#formatted_json), so it should be a one-time thing.
That's just what GB accepts; if they have docs, I don't know where they are.
Patched, so you can now add sequences. I think I'll have to reinstall some software before you can package.
-
@dusty, I have >1000 sequences. Do I need to enter them one by one? What would it take to be able to upload a batch of sequences in FASTA format? This is from the GenBank tool:
-
Send me your data.
-
Yes, I was just trying to see how this works. There isn't anything to load there, but I don't see a way to delete. @dustymc, can you wipe mine out of there?
-
@dustymc OK, will do. I am double-checking and need to get the file in FASTA format. Are you going to use that as a test for uploading a FASTA file, or ??? We could first do a test with just a few sequences entered manually, and then try the FASTA upload. Or what are you thinking/planning? Thanks.
-
I'm not thinking anything; I'm just trying to understand. I know very little about how any of this works, or why you might want to use this thing over direct submissions, or, well, anything! Data sometimes help with that. And yes, if you have individual files, I just need a few.
-
I'm not sure either what the advantage is vs. direct submission, but I'll send you the full file and then we can discuss. I suppose it would be useful if it automatically made the GenBank links, but often the GenBank file will include both Arctos and non-Arctos specimens, so we'd have to do it in both places anyway? I'd like to get input from @campmlc. Maybe we can discuss briefly tomorrow at the AWG meeting?
-
@dustymc I created loan 9126 (transaction_id=21127705) with 1055 sequences associated with MVZ specimens. I will send you the FASTA file separately. I double-checked, and the number of sequences in the FASTA file should match the specimens in the loan. Sequences are labeled something like '>MX_Chiapas_C1056_MVZ188144', so you can extract the MVZ numbers from the sequence label. Feel free to work some magic and let me know if you think this has possibilities that would be useful to Arctos. If we are going to do this, I have some initial suggestions on the form based on going through the first few BankIt steps in GenBank:
Thanks!
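Pulling the catalog number out of a header like '>MX_Chiapas_C1056_MVZ188144' can be done with a regular expression. This is a minimal Python sketch, assuming (hypothetically) that the voucher is always the last underscore-separated token of the header and is written as "MVZ" followed by digits; `voucher_from_header` is an illustrative name, not an existing Arctos function. Headers that don't follow that convention return `None`, so they can be set aside for manual matching.

```python
import re
from typing import Optional, Tuple

# Hypothetical convention: the voucher is the last underscore-separated token
# of the header, written as "MVZ" followed by the catalog number digits.
VOUCHER_PATTERN = re.compile(r"(?:^|_)(MVZ)(\d+)$")


def voucher_from_header(header: str) -> Optional[Tuple[str, str]]:
    """Return (institution, catalog_number) from a FASTA header, or None if absent."""
    # Drop surrounding whitespace and the leading '>' that marks a FASTA header.
    label = header.strip().lstrip(">").strip()
    match = VOUCHER_PATTERN.search(label)
    if match is None:
        return None  # header does not follow the assumed convention
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(voucher_from_header(">MX_Chiapas_C1056_MVZ188144"))  # ('MVZ', '188144')
    print(voucher_from_header(">MX_Chiapas_C1056"))  # None
```

Counting the non-`None` results over every header gives a quick check that the file has one voucher per loan item (1055 here), though it says nothing about which part of the specimen was sampled.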
-
@ccicero I'll try to look at this during tomorrow's test outage. I think I can get to data from this, but I don't think that's something I can push to a UI. I don't know how FASTA works, but if something like
or
or some form of explicit/predictable "headers" is possible/practical, then I think I could probably accept that as a file. (I suppose that would need to be part of loan instructions or something???) One obvious problem is that there's no good way to identify the parts in these data. "Some subsample of MVZ104097" could probably be stuffed in a remark somewhere and would probably get a knowledgeable person where they need to be, but "MVZ104097" is not, and can't be used as, the actual part identifier. #3630 provides a solution to that, but @campmlc's latest comment precludes implementation, so I sort of think that idea is just dead. So, the packager can...
Is there any value in that, or is there some other use case where there is value? Should we keep this tool alive?
-
@dustymc FASTA format requires one header line for each sequence, so your first example won't work, but the second would be OK. However, there is no predictability to what people put there. I also think that there is no point in us reinventing what GenBank does, and the types of data uploaded to GenBank are going to change with new (and current) genomic methods. So I'm not sure that it's worth pursuing a way of batch loading data to GenBank from Arctos. That said, I think we should think about option #2. What about something like this as a use case:
???
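To illustrate the constraint being described: in FASTA, each record starts with exactly one header line beginning with '>', followed by one or more sequence lines, and everything after the '>' is free text. The minimal Python sketch below is an illustrative reader (not part of Arctos or any NCBI tool, and `parse_fasta` is a hypothetical name); it shows that only the record boundaries are predictable, while the header content is whatever the submitter typed. The sequences in the demo are toy data.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Each record begins with a single '>' header line; the sequence that
    follows may be wrapped over any number of lines.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)  # join wrapped sequence lines
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


if __name__ == "__main__":
    example = """>MX_Chiapas_C1056_MVZ188144
ACGTACGT
ACGT
>unlabeled_sample_2
GGCC
"""
    for name, seq in parse_fasta(example.splitlines()):
        print(name, len(seq))
    # MX_Chiapas_C1056_MVZ188144 12
    # unlabeled_sample_2 4
```

Nothing in the format forces the second header to carry a voucher, which is the unpredictability problem: a loader can split records reliably but cannot rely on what the headers say.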
-
How do you transmit specimen_voucher to GenBank? You've registered MVZ:Bird (https://handbook.arctosdb.org/documentation/genbank.html), they know what to do with that, but I don't think they can dig it out of the sequence name either.
That needs an issue so it can have focused discussion, but the technical bits are likely trivial (and it sounds like a pretty great idea to me).
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctloan_status#returned
-
How do you transmit specimen_voucher to GenBank? You've registered MVZ:Bird (https://handbook.arctosdb.org/documentation/genbank.html), they know what to do with that, but I don't think they can dig it out of the sequence name either.

--> Correct. It's put in a separate field as part of the upload process. I'm thinking maybe I should upload these via the normal process but take screenshots/documentation along the way so you can see the process. Would that help?

Closed is currently "All items and products such as reprints and measurements are returned and accounted for." But it is often years before we get a publication, GenBank links, or any other loan conditions. One thing that would help is to track whether the signed invoice was returned other than in remarks. Adding a field for 'Signed invoice returned' [enter date] would indicate that the transaction was successful (i.e., shipment not lost) and allow us to search for all loans lacking a signed invoice, while still keeping the loan open if we're waiting on 'products' from the research.

Another issue is that when we do get pubs or GenBank numbers, they don't include the loan number, and it's not always easy to track what loan(s) they came from. Maybe we should add the loan number to the ID bulkloader CSV generated from a loan? In fact, we could also use this tool to generate a citation bulkloader to help with the issue of people not citing our specimens. If we generate both a citation bulkloader and an ID (NCBI) bulkloader directly from the loan and send them to the recipient along with the invoice, maybe it will push people to properly cite our specimens?!
-
AND link the GenBank accession and citation to the loan and to the parts in the loan via a part identifier.
-
What is this, and how does a collection get one? I think @ccicero was looking into this and discovered it was no longer a thing? We removed the "register with GenBank" step from data migration because of that discussion, which I cannot find.
-
That's a question for GenBank, but I can't imagine how a few things could work without it.
-
Do we have documentation on how to get a collection registered at GenBank?
-
No, we don't, and as many times as I've asked, I have never figured out exactly what that means or how to get it done....
-
I don't know what the process is either, but I'm happy to help figure it out if there's a collection in need of being registered.
-
Looks like a lot of collections, from your list.
-
You might contact the Curator of Biosciences at NCBI; that's who has recently handled collection registration and institutional info updates, etc., for me.
Shobha Sharma, PhD
Staff Scientist
GenBank Taxonomy
NCBI/NLM/NIH
***@***.***
-
What about updating collection information on GenBank? I just checked the UWYMV, and while our main collections are there, our contact information is out of date.
-
I don't know of any way I can update either - you can try emailing linkout at ncbi.nlm.nih.gov or John's contact above.
-
Go directly to Shobha for institutional updates and registering collections:
Shobha Sharma, PhD
Staff Scientist
GenBank Taxonomy
NCBI/NLM/NIH
***@***.***
-
This issue has split into two: the GenBank submit tool, and registering with GenBank. Re: the submit tool, I recently went through a GenBank submission for >1000 sequences, and I don't really see how we can develop a tool for that. Moreover, what are we trying to accomplish? It's a complicated process, with lots of different options depending on the type of data. I tried to take screenshots (see below). I think the tools we have for looking for GenBank numbers that might match Arctos records are great, but I can't see what we want to try to do beyond that. My two cents, but happy to discuss further.
-
@jrdemboski, can you send me her address by email so that I have it to give to new collections that want to do this?
-
I wonder if we can somehow work this into Manage Collection: have options for 'Register with GenBank' and whatever is required to do that, and somehow get this information to GenBank (Shobha for now, but something more generic so that it's not dependent on a single person)? It would be helpful to know what they need for this.
-
Yes, it would.
-
Agreed. I found Shobha's email.
-