Code Table Request - Genome ID #5570
Replies: 36 comments
-
You know what would be nice? To have this ID be magically populated by any other "genome" ID that gets added....OR maybe we just need a flag in the code table "this other ID is a genome" so that anyone could search across all of them? Sorry to throw a wrench in! None of the above makes this addition a bad idea - just thinking that it could be magical.... |
-
I absolutely agree with a "genome" flag . . . possible? Or should we just
move forward with this for now to make something that can work.
-
But we still need a way in the interface to search for "genomic data"
-
I think the ID proposed in the issue would do that IF it is consistently applied (EVERY record with a current GenBank ID ALSO gets one of these). Which seems like duplication of effort. AND people searching KNOW to search for that particular OtherID, which is highly unlikely. If we can just flag otherIDs in the code table as "genomic", then the work is done for us and IDs only need to be recorded once. Add "only search records with genomic identifiers" (like the require tissues button) and you get what you want. |
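A minimal sketch of the flag idea, assuming a simplified schema: `ctcoll_other_id_type` is the code table named in this request, but the `is_genomic` column, the `record_other_id` table, and all of the sample identifiers are made up for illustration and are not the actual Arctos structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for the code table and the identifier table.
    CREATE TABLE ctcoll_other_id_type (
        other_id_type TEXT PRIMARY KEY,
        is_genomic    INTEGER NOT NULL DEFAULT 0  -- the proposed flag (assumed column name)
    );
    CREATE TABLE record_other_id (
        record_id     TEXT NOT NULL,
        other_id_type TEXT NOT NULL REFERENCES ctcoll_other_id_type(other_id_type),
        display_value TEXT NOT NULL
    );
""")

# Flag each identifier type once, in the code table.
conn.executemany("INSERT INTO ctcoll_other_id_type VALUES (?, ?)", [
    ("GenBank", 1),
    ("NCBI BioSample", 1),
    ("collector number", 0),
])

# Identifiers are recorded once per record; the values here are placeholders.
conn.executemany("INSERT INTO record_other_id VALUES (?, ?, ?)", [
    ("REC:1", "GenBank", "XX000001"),
    ("REC:1", "collector number", "123"),
    ("REC:2", "collector number", "456"),
    ("REC:3", "NCBI BioSample", "SAMN00000000"),
])

def records_with_genomic_ids(conn):
    """Return records holding at least one identifier whose type is flagged genomic."""
    rows = conn.execute("""
        SELECT DISTINCT i.record_id
        FROM record_other_id AS i
        JOIN ctcoll_other_id_type AS t ON t.other_id_type = i.other_id_type
        WHERE t.is_genomic = 1
        ORDER BY i.record_id
    """).fetchall()
    return [row[0] for row in rows]

print(records_with_genomic_ids(conn))  # ['REC:1', 'REC:3']
```

Under these assumptions, a "require genomic identifiers" checkbox would only need to add the join and the `is_genomic = 1` condition to the search; nothing is duplicated on the records themselves.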
-
Can we put "Find all records with tissues", "Find all records with genomic
data", "Find all records with sequence data" into some obvious search
place, like in the Catalog Record box on search, but visible without "show
more options"? Not just a tiny little check box hiding at top of page only
for people who know where to look?
-
The objectives are not clear, or perhaps have shifted. I'm not sure if this is a UI issue or a data issue. One of the proposed solutions is not consistent with #3593, while it seems that the data are mostly identical (there's an external resource of a certain type but in no particular place or format indicating a particular type of usage). https://www.ncbi.nlm.nih.gov/genome/ exists but I have no idea how it ties in here. I am adamantly opposed to any denormalization. "EVERY record with a current GenBank ID ALSO gets one of these" will simply not happen, cannot be necessary, and inevitably results in users finding only partial datasets. |
-
I agree with having a flag that tags individuals with genetic data. I do not want "genomic id" as an ID since that is so vague. I know it adds more on the id list but I want "Genbank", "NCBI BioSample", "BoLD", "Sequence Read Archive", and all the future ways they identify genetic information on outside databases. |
-
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#genbank
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#biosample
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#bold_barcode_id
One possibly stupid idea: group those by adding some common prefix ("GenBank" becomes "genetic junk: GenBank"). We've done something similar with other data (eg https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#cmnh__carnegie_museum_of_natural_history), so that's not an entirely new flavor of weird. The search is (and probably will remain) a select multiple, users can just pick all options they're interested in. (They can do that now, but they're scattered out.) Alternate maybe equally stupid idea: The code table has a sort order column, it could also group those things - but the not-so-alphabetical sort makes me twitchy. |
-
Oh, I know they already exist! And we use them! What I'm understanding from the discussion is that they want to get rid of those in favor of a "Genomic ID" identifier to make searching for the data easier. I prefer the more descriptive identifiers.
-
Oh - yea, that would make things like creating the reciprocals on genbank somewhere between painful and impossible, I'm not a fan. |
-
Radical idea - add a column to the code table, "Other ID group". I bet that there are other things that could be grouped together for purposes like this. For instance, MSB could group NK with all of their other "MSB" type identifiers. |
-
I agree we need some way to " require genomes" in the same way we " require
tissues" or find vouchers.
-
I like this idea. That Other Identifier Type list is getting pretty unwieldy (fortunately most of mine are near the top).
-
I would skip this and just set up the institutional groups.
defeats the purpose of putting all of the "genome" ids together but maybe we need to be able to assign IDs to multiple groups? Are we going overboard there? |
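If identifier types did need to sit in more than one group, one way to picture it is a separate membership table rather than a single "Other ID group" column. This is only a sketch: the `other_id_type_group` table, its columns, and the group names are hypothetical and not part of the current code table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE other_id_type_group (
        other_id_type TEXT NOT NULL,
        id_group      TEXT NOT NULL,
        PRIMARY KEY (other_id_type, id_group)  -- a type may appear in many groups
    )
""")

# Hypothetical memberships: GenBank sits in both a "genetic" and an "ncbi" group.
conn.executemany("INSERT INTO other_id_type_group VALUES (?, ?)", [
    ("GenBank", "genetic"),
    ("GenBank", "ncbi"),
    ("NCBI BioSample", "genetic"),
    ("NCBI BioSample", "ncbi"),
    ("NK", "msb"),
])

def types_in_group(conn, id_group):
    """List every identifier type assigned to one group."""
    rows = conn.execute(
        "SELECT other_id_type FROM other_id_type_group "
        "WHERE id_group = ? ORDER BY other_id_type",
        (id_group,),
    ).fetchall()
    return [row[0] for row in rows]

def groups_for_type(conn, other_id_type):
    """List every group one identifier type belongs to."""
    rows = conn.execute(
        "SELECT id_group FROM other_id_type_group "
        "WHERE other_id_type = ? ORDER BY id_group",
        (other_id_type,),
    ).fetchall()
    return [row[0] for row in rows]

print(types_in_group(conn, "genetic"))   # ['GenBank', 'NCBI BioSample']
print(groups_for_type(conn, "GenBank"))  # ['genetic', 'ncbi']
```

A single group column would be simpler, but it forces each type into exactly one group, which is the trade-off raised above.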
-
Given the uses of this, how's that functionally different than an embedded prefix? Or are there uses beyond "pick from the list..."? Are we saying that identifier types are somehow data objects in their own right, or is this some UI-thing, or ????
I'd not lump field number in there - it's (usually) for a different kind of thing (lot, sorta-I-think, rather than item).
I'm not seeing clear categories in the data, adding arbitrary classifications seems like it would just add confusion. "This is an MSB number" and it's attached to a DMNS record and users pull their hair out and run away screaming..... |
-
It would be good if identifiers had determiners and dates and remarks, for
example, to tie a GenBank number to a particular citation. Or even better,
if we could tie the GenBank number to the actual tissue part that was
sampled - which means cataloging MaterialSamples
On Fri, Jun 25, 2021 at 10:11 AM Teresa Mayfield-Meyer wrote:
> Which is maybe this?
>
> > identifier types are somehow data objects in their own right
>
> I believe so, but I could be convinced that I am wrong.
>
> I think we are asking a lot of these things and maybe we should be looking
> at them as more complex entities than we do now. See also #2847, #2216,
> #1902, and maybe some I am missing?
-
Some of those discussions are in regards to assertions that use the types, and some in regards to the types themselves. I think those are very different things (think taxonomy vs. identification) and that those discussions should not be confounded with each other, but I'm also open to the idea that we should be doing something radically different. That's probably best discussed in a new/dedicated issue, BUT.... I'm sorta wondering if we need types at all. GenBank number (and maybe lots more) isn't necessarily a homogeneous thing, it's just a common place (url, API endpoint, format, etc. - maybe those are attributes of the type after all....) to store a fairly broad category of data. I'm not sure that any label we apply to the type (and so to all assertions using the type) can be adequate. Maybe we need some way to say "the data are at GenBank, and this is a [mitochondrial | whole genome | whatever] sequence" or "this NK number is a squirrel that some grad student ran over and dumped in the local museum, it has nothing to do with MSB or New Mexico or karyotypes." That discussion really needs to start with big-picture goals; I'm not sure that sniping at the current model is going to lead anywhere useful. What, not how (for now), do we want to do with identifiers?
No.... |
-
I think a flag would be ideal plus a heck of a lot simpler than grouping all the different identifiers we have. The point is to be able to search for specimens with genetic data (has a Genbank, BioSample, BoLD number, etc.) without having to select all the different IDs under Other ID. @campmlc Though are you wanting to find records that just have GENOMIC data (whole genome sequencing) or any genetic data (partial cyt b)?
-
The original request was to be able to find specimens with whole or partial
genomes. Whatever tool could also be used to flag specimens that have CT
scans, for example, or other future data categories that may have multiple
different identifiers or urls to related repositories.
-
That could be "just UI" - it's not great (updating the list used in the query and updating the identifiers would be completely separate, for example) but I think it's workable.
-
"whole or partial genomes" so in my opinion, this request would exclude Genbank numbers. Yes or no? "just UI" makes it sound easy. |
-
Yep, I'd just need
-
The idea was that a genome flag would be distinct from a GenBank flag. But
if we have the option of a variety of flags, we could have an "ncbi" flag
or even more specific - nucleotide, protein, or even specific gene, cytb or
CO1.
We could have "any genetic info" flag . . .
That is, if it is easy to do this in the UI
-
That's not UI, that's something on the order of #3652 (comment) |
-
see #3630 (comment) |
-
Maybe this is better addressed in #4101? #3630 is definitely related. Both seem abandoned. If there's something actionable in this, please clarify. If not, please close. |
-
This kinda seems like a saved search? Which identifiers make something have a "Genome ID"? Create the Arctos wide search, save it with a name, modify it if new identifiers show up, somehow share it from the main search page? |
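Roughly what that could look like, as a sketch only: the `SavedSearch` class, the record IDs, and the in-memory `records` mapping are all hypothetical, and nothing here reflects how Arctos actually stores saved searches.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class SavedSearch:
    """A named list of identifier types; a record matches if it has any of them."""
    name: str
    id_types: Set[str] = field(default_factory=set)

    def matches(self, record_id_types):
        return not self.id_types.isdisjoint(record_id_types)

def run(search, records):
    """Return the IDs of all records matched by the saved search, sorted."""
    return sorted(rid for rid, types in records.items() if search.matches(types))

# Hypothetical records, each mapped to the identifier types it carries.
records = {
    "REC:1": {"GenBank", "collector number"},
    "REC:2": {"collector number"},
    "REC:3": {"Sequence Read Archive"},
}

genome_search = SavedSearch("Genome ID", {"GenBank", "NCBI BioSample"})
print(run(genome_search, records))  # ['REC:1']

# A new identifier type shows up: modify the saved search, not the records.
genome_search.id_types.add("Sequence Read Archive")
print(run(genome_search, records))  # ['REC:1', 'REC:3']
```

The list of types lives with the search rather than with the identifiers, so the two have to be maintained separately, which is the drawback of the "just UI" route noted earlier in the thread.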
-
Can we add a genetics/genomics flag to other ID metadata, even if behind the scenes? We'll also need this for other types of IDs, e.g. isotopic.
-
Closing, this is all irrelevant (identifiers don't need to be typed) or trivial (add a 'whatever thing' AKA to issuers of 'whatever thing' identifiers) in the current model. |
-
So how do we find all cataloged items with genomic sequence data, when these data may be scattered over multiple repositories with different urls? |
-
[ Code Table Documentation is https://handbook.arctosdb.org/how_to/How-to-Use-Code-Tables.html ]
Goal
Make it possible to find genomes through a search of OtherIDs = Genome ID
Context
The genomics research community has no centralized repository for whole genomes, and currently genome data may be entered and accessible through a variety of different portals with differing levels of consistency and permanency in their urls. These include NCBI Assemblies, Biosamples, and other resources. Quotes from researchers asked about this:
"NCBI is a pain, but if I were to be searching for a reference genome I would search in the assemblies database as these are unique to an individual sample and experiment."
"I'd used NCBI Assembly, NCBI BioSample, and NCBI BioProject as key terms for NCBI-associated genomic data. Honestly I archive my data with NCBI through SRA, but I use ENA to query/search for genomes and they use "Study", "Experiment", "Run", "Submission", "Accession", and "Taxon" IDs to identify genomes. You could integrate those labels as "ENA Study #", "ENA Experiment #" etc. or just link to "Genomic reads" or "Complete or partial genome assemblies". Raw reads are typically more valuable for reproducing or extending genomic research, whereas assembled genomes are used for reference-guided mapping assemblies. NCBI SRA numbers are included in ENA as "Submission" IDs. Here's an example and the reads for that example."
Given the current confusion, Arctos could provide identifiers for each of these links independently, but a researcher would have to know a priori which to search on or search for an increasingly longer list of potential urls.
We should certainly add these as OtherIDs - later issue.
But this request is to add an identifier = Genome ID where any possible link to genomic data could be entered, and which would allow researchers to search on a single identifier to locate any possible genomic info across a variety of platforms.
This would have to be free text, and of course prone to error, which is why adding the other identifiers with real linkable urls to the record is advisable. This ID is primarily a search tool or flag that such info exists.
Table
https://arctos.database.museum/Admin/CodeTableEditor.cfm?action=editCollOIDT&tbl=ctcoll_other_id_type
Value
Genome ID
Definition
An identifier, preferably a url, which references the external repository for genomic data for this record.
Collection type
Mamm, Bird, Herp, Amph, Rept, Fish, Ento, Inv, Para, Env, Herb, Mala, Zoo
Attribute data type
free text
Part tissue flag
yes
Priority
Very High