Code Table Request - Genome ID #5570

campmlc · 2021-06-09T21:12:46Z

campmlc
Jun 9, 2021
Maintainer

[ Code Table Documentation is https://handbook.arctosdb.org/how_to/How-to-Use-Code-Tables.html ]

Goal
Make it possible to find genomes through a search of OtherIDs = Genome ID

Context
The genomics research community has no centralized repository for whole genomes, and currently genome data may be entered and accessed through a variety of different portals with differing levels of consistency and permanency in their URLs. These include NCBI Assemblies, Biosamples, and other resources. Quotes from researchers asked about this:
"NCBI is a pain, but if I were to be searching for a reference genome I would search in the assemblies database as these are unique to an individual sample and experiment."
"I'd used NCBI Assembly, NCBI BioSample, and NCBI BioProject as key terms for NCBI-associated genomic data. Honestly I archive my data with NCBI through SRA, but I use ENA to query/search for genomes and they use "Study", "Experiment", "Run", "Submission", "Accession", and "Taxon" IDs to identify genomes. You could integrate those labels as "ENA Study #", "ENA Experiment #" etc. or just link to "Genomic reads" or "Complete or partial genome assemblies". Raw reads are typically more valuable for reproducing or extending genomic research, whereas assembled genomes are used for reference-guided mapping assemblies. NCBI SRA numbers are included in ENA as "Submission" IDs. Here's an example and the reads for that example."

Given the current confusion, Arctos could provide identifiers for each of these links independently, but a researcher would have to know a priori which to search on, or search for an increasingly long list of potential URLs.
We should certainly add these as OtherIDs - later issue.
But this request is to add an identifier = Genome ID where any possible link to genomic data could be entered, and which would allow researchers to search on a single identifier to locate any possible genomic info across a variety of platforms.
This would have to be free text, and would of course be prone to error, which is why adding the other identifiers with real linkable URLs to the record is advisable. This ID is primarily a search tool or flag that such info exists.

Table
https://arctos.database.museum/Admin/CodeTableEditor.cfm?action=editCollOIDT&tbl=ctcoll_other_id_type

Value
Genome ID

Definition
An identifier, preferably a url, which references the external repository for genomic data for this record.

Collection type
Mamm, Bird, Herp, Amph, Rept, Fish, Ento, Inv, Para, Env, Herb, Mala, Zoo

Attribute data type
free text

Part tissue flag
yes

Priority
Very High

Jegelewicz · 2021-06-09T21:31:37Z

Jegelewicz
Jun 9, 2021

You know what would be nice? To have this ID be magically populated by any other "genome" ID that gets added....OR maybe we just need a flag in the code table "this other ID is a genome" so that anyone could search across all of them?

Sorry to throw a wrench in! None of the above makes this addition a bad idea - just thinking that it could be magical....


campmlc · 2021-06-09T21:33:15Z

campmlc
Jun 9, 2021
Maintainer Author

I absolutely agree with a "genome" flag . . . possible? Or should we just move forward with this for now to make something that can work.


campmlc · 2021-06-09T21:33:33Z

campmlc
Jun 9, 2021
Maintainer Author

But we still need a way in the interface to search for "genomic data".


Jegelewicz · 2021-06-09T21:40:50Z

Jegelewicz
Jun 9, 2021

But we still need a way in the interface to search for "genomic data"

I think the ID proposed in the issue would do that, but only IF it is consistently applied (EVERY record with a current GenBank ID ALSO gets one of these), which seems like duplication of effort, AND IF people searching KNOW to search for that particular OtherID, which is highly unlikely. If we can just flag otherIDs in the code table as "genomic", then the work is done for us and IDs only need to be recorded once. Add "only search records with genomic identifiers" (like the require tissues button) and you get what you want.
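
The flag idea above might look roughly like the following sketch. Only the table name `ctcoll_other_id_type` appears in this thread; the `is_genomic` column, the `other_id` table, the record ids, and the identifier values are all assumptions made for illustration, and which types would actually be flagged is exactly what the thread goes on to debate.

```python
import sqlite3

# Hypothetical, simplified schema: only the table name ctcoll_other_id_type
# comes from the thread. Every column, and the other_id table, is an
# assumption made for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ctcoll_other_id_type (
        other_id_type TEXT PRIMARY KEY,
        is_genomic    INTEGER NOT NULL DEFAULT 0  -- the proposed flag
    );
    CREATE TABLE other_id (
        record_id     TEXT NOT NULL,
        other_id_type TEXT NOT NULL REFERENCES ctcoll_other_id_type,
        id_value      TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO ctcoll_other_id_type VALUES (?, ?)",
    [("GenBank", 1), ("NCBI BioSample", 1), ("collector number", 0)],
)
conn.executemany(
    "INSERT INTO other_id VALUES (?, ?, ?)",
    [
        ("REC:1", "GenBank", "EXAMPLE-1"),
        ("REC:1", "collector number", "123"),
        ("REC:2", "collector number", "456"),
        ("REC:3", "NCBI BioSample", "EXAMPLE-2"),
    ],
)


def records_with_genomic_ids(connection):
    """Return record ids carrying at least one identifier of a flagged type."""
    rows = connection.execute("""
        SELECT DISTINCT o.record_id
        FROM other_id AS o
        JOIN ctcoll_other_id_type AS t USING (other_id_type)
        WHERE t.is_genomic = 1
        ORDER BY o.record_id
    """)
    return [row[0] for row in rows]


print(records_with_genomic_ids(conn))
```

Because the flag lives on the identifier type, each identifier is recorded once and a newly flagged type is picked up by the same query without touching individual records.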


campmlc · 2021-06-09T21:59:35Z

campmlc
Jun 9, 2021
Maintainer Author

Can we put "Find all records with tissues", "Find all records with genomic data", "Find all records with sequence data" into some obvious search place, like in the Catalog Record box on search, but visible without "show more options"? Not just a tiny little check box hiding at top of page only for people who know where to look?


dustymc · 2021-06-10T15:20:01Z

dustymc
Jun 10, 2021
Maintainer

The objectives are not clear, or perhaps have shifted. I'm not sure if this is a UI issue or a data issue.

One of the proposed solutions is not consistent with #3593, while it seems that the data are mostly identical (there's an external resource of a certain type but in no particular place or format indicating a particular type of usage).

https://www.ncbi.nlm.nih.gov/genome/ exists but I have no idea how it ties in here.

I am adamantly opposed to any denormalization. "EVERY record with a current GenBank ID ALSO gets one of these" will simply not happen, cannot be necessary, and inevitably results in users finding only partial datasets.


KyndallH · 2021-06-24T18:03:04Z

KyndallH
Jun 24, 2021

I agree with having a flag that tags individuals with genetic data. I do not want "genomic id" as an ID since that is so vague. I know it adds more on the id list but I want "Genbank", "NCBI BioSample", "BoLD", "Sequence Read Archive", and all the future ways they identify genetic information on outside databases.


dustymc · 2021-06-24T18:16:23Z

dustymc
Jun 24, 2021
Maintainer

but I want "Genbank",

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#genbank

"NCBI BioSample",

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#biosample

BoLD

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#bold_barcode_id

"Sequence Read Archive"

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#ncbi_sequence_read_archive_run_id

One possibly stupid idea: group those by adding some common prefix ("GenBank" becomes "genetic junk: GenBank"). We've done something similar with other data (e.g. https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#cmnh__carnegie_museum_of_natural_history), so that's not an entirely new flavor of weird. The search is (and probably will remain) a select multiple, so users can just pick all the options they're interested in. (They can do that now, but the options are scattered.)

Alternate maybe equally stupid idea: The code table has a sort order column, it could also group those things - but the not-so-alphabetical sort makes me twitchy.
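
As a rough sketch of the prefix idea in this comment: the labels below are illustrative only, and the "genetic junk: " prefix is the tongue-in-cheek example from the comment, not an existing Arctos value.

```python
# Illustrative labels only; none of these prefixed values exist in Arctos.
ID_TYPES = [
    "collector number",
    "genetic junk: NCBI BioSample",
    "ear tag",
    "genetic junk: GenBank",
    "genetic junk: BoLD barcode ID",
]


def types_with_prefix(id_types, prefix):
    """Return the identifier types whose label starts with the given prefix."""
    return sorted(t for t in id_types if t.startswith(prefix))


# A case-insensitive alphabetical sort keeps the prefixed labels adjacent
# in a multi-select pick list, with no separate sort-order column needed.
pick_list = sorted(ID_TYPES, key=str.casefold)
genetic = types_with_prefix(ID_TYPES, "genetic junk: ")

print(pick_list)
print(genetic)
```

The grouping here is carried entirely by the label text, which is why a plain alphabetical sort is enough to keep related types together.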


KyndallH · 2021-06-24T18:23:10Z

KyndallH
Jun 24, 2021

Oh, I know they already exist! And we use them! What I'm understanding from the discussion is that they want to get rid of those for a "Genomic ID" identifier to make searching for the data easier. I prefer the more descriptive identifiers.


dustymc · 2021-06-24T18:25:22Z

dustymc
Jun 24, 2021
Maintainer

get rid of those for a "Genomic ID" identifier to make searching for the data easier.

Oh - yea, that would make things like creating the reciprocals on genbank somewhere between painful and impossible, I'm not a fan.


Jegelewicz · 2021-06-24T19:33:07Z

Jegelewicz
Jun 24, 2021

Alternate maybe equally stupid idea: The code table has a sort order column, it could also group those things - but the not-so-alphabetical sort makes me twitchy.

Radical idea - add a column to the code table, "Other ID group". I bet that there are other things that could be grouped together for purposes like this. For instance, MSB could group NK with all of their other "MSB" type identifiers.


campmlc · 2021-06-24T20:35:42Z

campmlc
Jun 24, 2021
Maintainer Author

I agree we need some way to "require genomes" in the same way we "require tissues" or find vouchers.


acdoll · 2021-06-24T21:01:12Z

acdoll
Jun 24, 2021
Maintainer

add a column to the code table, "Other ID group".

I like this idea. That Other Identifier Type list is getting pretty unwieldy (fortunately most of mine are near the top).
Group ideas:
General object IDs (collector number, field number, ear tag...)
Arctos Institution IDs (internal IDs used by our collections) - would this need a subgroup for each institution?
Extraneous Institution IDs (IDs used by non-Arctos institutions: rehab centers, government agency IDs...)
Online data repositories/aggregators (GBIF, Dryad, Genbank...)


Jegelewicz · 2021-06-24T21:03:32Z

Jegelewicz
Jun 24, 2021

Arctos Institution IDs (internal IDs used by our collections) - would this need a subgroup for each institution?

I would skip this and just set up the institutional groups.

Online data repositories/aggregators (GBIF, Dryad, Genbank...)

defeats the purpose of putting all of the "genome" ids together but maybe we need to be able to assign IDs to multiple groups? Are we going overboard there?
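
If identifier types could belong to more than one group, the relationship becomes many-to-many rather than a single "Other ID group" column. A minimal sketch, assuming hypothetical group names and assignments that echo the ideas floated in this thread:

```python
from collections import defaultdict

# Hypothetical assignments: each identifier type maps to a set of groups,
# so one type can sit in both a repository group and a genetic group.
TYPE_GROUPS = {
    "GenBank": {"online repository", "genetic"},
    "NCBI BioSample": {"online repository", "genetic"},
    "Dryad": {"online repository"},
    "NK": {"MSB"},
}


def types_by_group(type_groups):
    """Invert the type -> groups mapping into group -> sorted list of types."""
    inverted = defaultdict(list)
    for id_type, groups in type_groups.items():
        for group in groups:
            inverted[group].append(id_type)
    return {group: sorted(types) for group, types in inverted.items()}


by_group = types_by_group(TYPE_GROUPS)
print(by_group["genetic"])
print(by_group["online repository"])
```

In a relational code table this would presumably mean a separate type-to-group link table rather than one extra column, which is part of the "are we going overboard" question.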


dustymc · 2021-06-25T14:21:44Z

dustymc
Jun 25, 2021
Maintainer

add a column

Given the uses of this, how's that functionally different than an embedded prefix?

Or are there uses beyond "pick from the list..."?

Are we saying that identifier types are somehow data objects in their own right, or is this some UI-thing, or ????

General object IDs (collector number, field number, ear tag...)

I'd not lump field number in there - it's (usually) for a different kind of thing (lot, sorta-I-think, rather than item).

(internal IDs used by our collections) - would this need a subgroup for each institution?

Some of those are functionally pre-printed collector numbers
Some of them find their way across institutions (due to collaborative projects and etc.)

I'm not seeing clear categories in the data, adding arbitrary classifications seems like it would just add confusion. "This is an MSB number" and it's attached to a DMNS record and users pull their hair out and run away screaming.....


campmlc · 2021-06-25T16:56:30Z

campmlc
Jun 25, 2021
Maintainer Author

It would be good if identifiers had determiners and dates and remarks, for example, to tie a GenBank number to a particular citation. Or even better, if we could tie the GenBank number to the actual tissue part that was sampled - which means cataloging MaterialSamples

On Fri, Jun 25, 2021 at 10:11 AM Teresa Mayfield-Meyer wrote:

Which is maybe this?

identifier types are somehow data objects in their own right

I believe so, but I could be convinced that I am wrong. I think we are asking a lot of these things and maybe we should be looking at them as more complex entities than we do now.

See also #2847 #2216 #1902 and maybe some I am missing?


dustymc · 2021-06-25T16:59:48Z

dustymc
Jun 25, 2021
Maintainer

Some of those discussions are in regards to assertions that use the types, and some in regards to the types themselves. I think those are very different things (think taxonomy vs. identification) and that those discussions should not be confounded with each other, but I'm also open to the idea that we should be doing something radically different.

That's probably best discussed in a new/dedicated issue, BUT....

I'm sorta wondering if we need types at all. GenBank number (and maybe lots more) isn't necessarily a homogeneous thing, it's just a common place (url, API endpoint, format, etc. - maybe those are attributes of the type after all....) to store a fairly broad category of data. I'm not sure that any label we apply to the type (and so to all assertions using the type) can be adequate. Maybe we need some way to say "the data are at GenBank, and this is a [mitochondrial | whole genome | whatever] sequence" or "this NK number is a squirrel that some grad student ran over and dumped in the local museum, it has nothing to do with MSB or New Mexico or karyotypes."
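
One way to picture the alternative raised here, where an identifier records where the data live and what kind of data they are instead of relying on a single type label, is the sketch below. The class, its field names, the record ids, and the values are all hypothetical; nothing here reflects the actual Arctos model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifierAssertion:
    """Hypothetical shape for one identifier attached to one record."""
    record_id: str
    issuer: str      # where the data live, e.g. "GenBank"
    value: str       # placeholder identifier value
    data_kind: str   # what the data are, e.g. "whole genome"


assertions = [
    IdentifierAssertion("REC:1", "GenBank", "EXAMPLE-1", "mitochondrial"),
    IdentifierAssertion("REC:2", "GenBank", "EXAMPLE-2", "whole genome"),
    IdentifierAssertion("REC:3", "ENA", "EXAMPLE-3", "whole genome"),
]


def records_with_kind(items, data_kind):
    """Return record ids having any identifier of the given data kind."""
    return sorted({a.record_id for a in items if a.data_kind == data_kind})


print(records_with_kind(assertions, "whole genome"))
```

Under this shape, "find everything with a whole genome" is a query on the assertion itself and works across repositories, while two GenBank numbers can still describe different kinds of data.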

That discussion really needs to start with big-picture goals; I'm not sure that sniping at the current model is going to lead anywhere useful. What, not how (for now), do we want to do with identifiers?

tie a GenBank number to a particular citation

#1257

which means cataloging MaterialSamples

No....


KyndallH · 2021-06-25T17:42:25Z

KyndallH
Jun 25, 2021

I think a flag would be ideal plus a heck of a lot simpler than grouping all the different identifiers we have. The point is to be able to search for specimens with genetic data (has a Genbank, BioSample, BoLD number, etc.) without having to select all the different IDs under Other ID.

@campmlc Though are you wanting to find records that just have GENOMIC data (whole genome sequencing) or any genetic data (partial cyt_b_)?


campmlc · 2021-06-25T17:50:10Z

campmlc
Jun 25, 2021
Maintainer Author

The original request was to be able to find specimens with whole or partial genomes. Whatever tool we end up with could also be used to flag specimens that have CT scans, for example, or other future data categories that may have multiple different identifiers or URLs to related repositories.


dustymc · 2021-06-25T17:55:49Z

dustymc
Jun 25, 2021
Maintainer

(has a Genbank, BioSample, BoLD number, etc.)

That could be "just UI" - it's not great (updating the list used in the query and updating the identifiers would be completely separate, for example) but I think it's workable.
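
The drawback mentioned here, that a list baked into the search UI is maintained separately from the identifier types themselves, can be illustrated with a small sketch. The set of types, the function name, and the renamed value are all hypothetical.

```python
# Hypothetical hard-coded list that a "require genetic data" search option
# might match against; not taken from Arctos code.
UI_GENETIC_TYPES = {"GenBank", "NCBI BioSample", "BoLD barcode ID"}


def stale_ui_types(ui_types, code_table_types):
    """Return UI-listed types that no longer exist in the code table."""
    return sorted(ui_types - set(code_table_types))


# Suppose one type is later renamed in the code table and nobody updates
# the UI list: the search option silently stops matching that type.
code_table = ["GenBank", "NCBI BioSample", "BOLD barcode ID", "collector number"]

print(stale_ui_types(UI_GENETIC_TYPES, code_table))
```

A check like this would only detect the drift; keeping the two lists in step would still be a manual task.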

flag specimens that have CT scans,

#3652 (comment)


KyndallH · 2021-06-25T17:58:26Z

KyndallH
Jun 25, 2021

"whole or partial genomes" so in my opinion, this request would exclude Genbank numbers. Yes or no?

"just UI" makes it sound easy.


dustymc · 2021-06-25T18:03:24Z

dustymc
Jun 25, 2021
Maintainer

"just UI" makes it sound easy.

Yep, I'd just need

the list of ID types to match, and
a plan (label, object type, position) for adding it to the UI


campmlc · 2021-06-25T18:06:05Z

campmlc
Jun 25, 2021
Maintainer Author

The idea was that a genome flag would be distinct from a GenBank flag. But if we have the option of a variety of flags, we could have an "ncbi" flag or even more specific - nucleotide, protein, or even specific gene, cytb or CO1. We could have "any genetic info" flag . . . That is, if it is easy to do this in the UI


dustymc · 2021-06-25T18:14:08Z

dustymc
Jun 25, 2021
Maintainer

nucleotide, protein, or even specific gene, cytb or CO1. We could have "any genetic info" flag . . .

That's not UI, that's something on the order of #3652 (comment)


Jegelewicz · 2021-06-25T21:18:48Z

Jegelewicz
Jun 25, 2021

tie the GenBank number to the actual tissue part that was sampled - which means cataloging MaterialSamples

see #3630 (comment)


dustymc · 2022-02-16T16:59:21Z

dustymc
Feb 16, 2022
Maintainer

Maybe this is better addressed in #4101? #3630 is definitely related. Both seem abandoned.

If there's something actionable in this, please clarify. If not, please close.


Jegelewicz · 2022-08-18T15:34:45Z

Jegelewicz
Aug 18, 2022

This kinda seems like a saved search? Which identifiers make something have a "Genome ID"? Create the Arctos wide search, save it with a name, modify it if new identifiers show up, somehow share it from the main search page?


campmlc · 2023-01-31T01:29:55Z

campmlc
Jan 31, 2023
Maintainer Author

Can we add a genetics/genomics flag to other ID metadata, even if behind the scenes? We'll also need this for other types of IDs, e.g. isotopic.
And tying the sequence to the part is really important.


dustymc · 2023-08-24T21:08:19Z

dustymc
Aug 24, 2023
Maintainer

Closing, this is all irrelevant (identifiers don't need to be typed) or trivial (add a 'whatever thing' AKA to issuers of 'whatever thing' identifiers) in the current model.


campmlc · 2023-08-24T21:17:51Z

campmlc
Aug 24, 2023
Maintainer Author

So how do we find all cataloged items with genomic sequence data, when these data may be scattered over multiple repositories with different urls?


Arctos DB

Replies: 36 comments