[Bug]: DwCA-Import - imports existing CEs as new records #4101
Comments
@LocoDelAssembly I'm not sure if we presently match on FieldNumber, probably only EventID?
The importer matches by the identifier field of the Identifier (for both eventID and FieldNumber).
Sandwich:
irb(main):005> Identifier::Local::FieldNumber.where(cached: "BLF19068", project_id: 69)
=>
[#<Identifier::Local::FieldNumber:0x00007fc1a2dbcbd8
id: 2802131,
identifier: "19068", # <<< HERE
type: "Identifier::Local::FieldNumber",
created_at: Sat, 02 Apr 2022 20:14:18.895292000 UTC +00:00,
updated_at: Sat, 02 Apr 2022 20:14:18.895292000 UTC +00:00,
project_id: 69,
cached: "BLF19068",
identifier_object_id: 778200,
identifier_object_type: "CollectingEvent",
relation: nil,
position: 1,
cached_numeric_identifier: nil>]
Production:
irb(main):014> Identifier::Local::FieldNumber.where(cached: "BLF19068")
=>
[#<Identifier::Local::FieldNumber:0x00007f0a6f5b9110
id: 4446960,
identifier: "BLF19068", # <<< HERE
type: "Identifier::Local::FieldNumber",
created_at: Wed, 22 Nov 2023 00:48:51.624364000 UTC +00:00,
updated_at: Mon, 04 Nov 2024 22:33:43.235800000 UTC +00:00,
namespace_id: 1016,
project_id: 76,
cached: "BLF19068",
identifier_object_id: 553640,
identifier_object_type: "CollectingEvent",
relation: nil,
position: 1,
cached_numeric_identifier: nil>]
The dataset (assuming it is "Technomyrmex import4") has non-prefixed values as fieldNumber. Not sure if the data above was the initial state, since it looks like the unification tool was already used. Probably a virtual vs. non-virtual namespace? I suppose that it is not actually possible to either create a duplicate FieldNumber on the same Namespace (accounting for
If the importer actually missed an existing CE, it should have failed to create a new one, because it would have attempted to duplicate identifiers and would have failed with "already taken".
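To make the mismatch above concrete, here is a minimal plain-Ruby sketch (no Rails, not the importer's actual code). It assumes the matching rule is a plain equality check on the identifier column; `FieldNumberRow`, `match_by_identifier`, and `match_by_cached` are hypothetical names used only for illustration. The row values are copied from the two console outputs above.

```ruby
# Hypothetical, simplified stand-in for an Identifier row (not the TaxonWorks model).
FieldNumberRow = Struct.new(:id, :identifier, :cached, :collecting_event_id, keyword_init: true)

# Values copied from the console outputs above.
SANDBOX_ROWS = [
  FieldNumberRow.new(id: 2802131, identifier: "19068", cached: "BLF19068", collecting_event_id: 778200)
].freeze
PRODUCTION_ROWS = [
  FieldNumberRow.new(id: 4446960, identifier: "BLF19068", cached: "BLF19068", collecting_event_id: 553640)
].freeze

# Assumed rule: compare the dataset value against the `identifier` column only.
def match_by_identifier(rows, value)
  rows.select { |row| row.identifier == value }
end

# For contrast: compare against the rendered `cached` string instead.
def match_by_cached(rows, value)
  rows.select { |row| row.cached == value }
end

dataset_field_number = "19068" # non-prefixed, as in the import dataset

# Sandbox stores the bare number, so the bare dataset value finds the CE.
p match_by_identifier(SANDBOX_ROWS, dataset_field_number).map(&:collecting_event_id)

# Production stores the prefixed string in `identifier`, so nothing matches.
p match_by_identifier(PRODUCTION_ROWS, dataset_field_number).map(&:collecting_event_id)

# Both rows share the same `cached` value, so a cached lookup does find it.
p match_by_cached(PRODUCTION_ROWS, "BLF19068").map(&:collecting_event_id)
```

Under that assumption, the same dataset value resolves to an existing CE on one server and to nothing on the other, even though both rows render the same cached string.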
I did fix this: I added the correct CEs and deleted the duplicates with no associated CO record. And yes, the non-prefixed fieldNumber in Technomyrmex import4 is how I imported them. It worked that way on a previous import. I also unified (I think) only 2 CEs and updated the rest by hand.
Experimented with a backup prior to this import.
Before import:
irb(main):013> Identifier.joins(:namespace).where("cached like '%06693%'").merge(Namespace.where(short_name: 'BLF_AntWeb'))
=> []
After import of second record:
irb(main):014> Identifier.joins(:namespace).where("cached like '%06693%'").merge(Namespace.where(short_name: 'BLF_AntWeb'))
=>
[#<Identifier::Local::FieldNumber:0x00007f25f2323358
id: 6769612,
identifier: "06693",
type: "Identifier::Local::FieldNumber",
created_at: Tue, 05 Nov 2024 18:13:07.463666000 UTC +00:00,
updated_at: Tue, 05 Nov 2024 18:13:07.463666000 UTC +00:00,
namespace_id: 917,
created_by_id: 3,
updated_by_id: 3,
project_id: 76,
cached: "BLF06693",
identifier_object_id: 735688,
identifier_object_type: "CollectingEvent",
relation: nil,
position: 1,
cached_numeric_identifier: 6693.0>]
When searching by cached (all namespaces):
irb(main):020> Identifier.where(project_id: 76).where(cached: 'BLF06693')
=>
[#<Identifier::Local::FieldNumber:0x00007f25e7787440
id: 4292804,
identifier: "BLF06693",
type: "Identifier::Local::FieldNumber",
created_at: Tue, 21 Nov 2023 18:49:53.694374000 UTC +00:00,
updated_at: Tue, 21 Nov 2023 18:49:53.694374000 UTC +00:00,
namespace_id: 1016,
created_by_id: 2478,
updated_by_id: 2478,
project_id: 76,
cached: "BLF06693",
identifier_object_id: 533914,
identifier_object_type: "CollectingEvent",
relation: nil,
position: 1,
cached_numeric_identifier: nil>,
#<Identifier::Local::FieldNumber:0x00007f25e7787300
id: 4113688,
identifier: "BLF06693",
type: "Identifier::Local::FieldNumber",
created_at: Tue, 21 Nov 2023 07:40:14.091645000 UTC +00:00,
updated_at: Tue, 21 Nov 2023 07:40:14.091645000 UTC +00:00,
namespace_id: 1014,
created_by_id: 2478,
updated_by_id: 2478,
project_id: 76,
cached: "BLF06693",
identifier_object_id: 516759,
identifier_object_type: "CollectingEvent",
relation: nil,
position: 1,
cached_numeric_identifier: nil>,
#<Identifier::Local::FieldNumber:0x00007f25e77871c0 #### JUST CREATED BY IMPORT (SAME AS CODE BLOCK ABOVE)
id: 6769612,
identifier: "06693",
type: "Identifier::Local::FieldNumber",
created_at: Tue, 05 Nov 2024 18:13:07.463666000 UTC +00:00,
updated_at: Tue, 05 Nov 2024 18:13:07.463666000 UTC +00:00,
namespace_id: 917,
created_by_id: 3,
updated_by_id: 3,
project_id: 76,
cached: "BLF06693",
identifier_object_id: 735688,
identifier_object_type: "CollectingEvent",
relation: nil,
position: 1,
cached_numeric_identifier: 6693.0>]
irb(main):021>
If the importer was supposed to re-use the CE with id=533914, there were two problems, one is that Identifier::FieldNumber
One thing that confuses me, however, is that the existing identifiers are in namespaces that were meant to be used for eventID. Was a script run at some point to re-type them as FieldNumber, @mjy? The auto-generated namespaces were created before FieldNumber existed.
irb(main):024> Identifier.where(project_id: 76).where(cached: 'BLF06693').map(&:namespace)
=>
[#<Namespace:0x00007f25e7789ec0
id: 1016,
institution: nil,
name: "eventID namespace for \"specimen formicidae\" dataset in \"AntWeb\" project [d8386021]",
short_name: "eventID-d8386021",
created_at: Tue, 21 Nov 2023 16:00:18.539462000 UTC +00:00,
updated_at: Tue, 10 Sep 2024 21:56:20.426410000 UTC +00:00,
created_by_id: 2478,
updated_by_id: 2483,
verbatim_short_name: "eventID",
delimiter: ":",
is_virtual: true>,
#<Namespace:0x00007f25e7789d80
id: 1014,
institution: nil,
name: "eventID namespace for \"specimen create names\" dataset in \"AntWeb\" project [2a48b12b]",
short_name: "eventID-2a48b12b",
created_at: Tue, 21 Nov 2023 07:23:22.255120000 UTC +00:00,
updated_at: Tue, 10 Sep 2024 21:51:47.133804000 UTC +00:00,
created_by_id: 2478,
updated_by_id: 2483,
verbatim_short_name: "eventID",
delimiter: ":",
is_virtual: true>,
#<Namespace:0x00007f25e7789c40
id: 917,
institution: "Brian L. Fisher",
name: "Brian L. Fisher [AntWeb]",
short_name: "BLF_AntWeb",
created_at: Mon, 20 Nov 2023 23:29:41.348662000 UTC +00:00,
updated_at: Mon, 20 Nov 2023 23:29:41.348662000 UTC +00:00,
created_by_id: 2478,
updated_by_id: 2478,
verbatim_short_name: "BLF",
delimiter: "NONE",
is_virtual: false>]
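The three identifiers and their namespaces above suggest why no "already taken" error was raised. The following plain-Ruby sketch is an illustration, not the TaxonWorks implementation. It assumes (a) a virtual namespace contributes nothing to cached, (b) a non-virtual namespace prepends its verbatim_short_name plus delimiter, with "NONE" meaning no delimiter, and (c) uniqueness is checked on the namespace and identifier pair rather than on cached. `NamespaceRow`, `IdentifierRow`, `build_cached`, and `uniqueness_key` are hypothetical names; the values are copied from the console output.

```ruby
# Hypothetical, simplified stand-ins for the records shown above.
NamespaceRow  = Struct.new(:id, :verbatim_short_name, :delimiter, :is_virtual, keyword_init: true)
IdentifierRow = Struct.new(:id, :identifier, :namespace, keyword_init: true)

NS_1016 = NamespaceRow.new(id: 1016, verbatim_short_name: "eventID", delimiter: ":",    is_virtual: true)
NS_1014 = NamespaceRow.new(id: 1014, verbatim_short_name: "eventID", delimiter: ":",    is_virtual: true)
NS_917  = NamespaceRow.new(id: 917,  verbatim_short_name: "BLF",     delimiter: "NONE", is_virtual: false)

IDENTIFIER_ROWS = [
  IdentifierRow.new(id: 4292804, identifier: "BLF06693", namespace: NS_1016),
  IdentifierRow.new(id: 4113688, identifier: "BLF06693", namespace: NS_1014),
  IdentifierRow.new(id: 6769612, identifier: "06693",    namespace: NS_917) # created by the import
].freeze

# Assumed rendering rule, inferred only from the values shown in this thread.
def build_cached(row)
  ns = row.namespace
  return row.identifier if ns.is_virtual

  delimiter = ns.delimiter == "NONE" ? "" : ns.delimiter.to_s
  "#{ns.verbatim_short_name}#{delimiter}#{row.identifier}"
end

# Assumed uniqueness scope: one identifier value per namespace.
def uniqueness_key(row)
  [row.namespace.id, row.identifier]
end

# All three rows render the same cached string ...
p IDENTIFIER_ROWS.map { |row| build_cached(row) }.uniq

# ... yet every (namespace, identifier) pair is distinct, so no collision is detected.
p IDENTIFIER_ROWS.map { |row| uniqueness_key(row) }.uniq.size
```

If those assumptions hold, the import could create identifier 6769612 without tripping a uniqueness check, even though its cached value duplicates two existing FieldNumbers in the project.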
Steps to reproduce the bug
Screenshot
Expected behavior
When I import CO records using existing CE records, I would expect them to link to all of the existing CE data instead of creating a new CE record.
If I Unify the 2 CE records as a temporary solution (screenshot 1), both field numbers show on Edit (screenshot 2) until I delete one of the identifiers (screenshot 3 and 4). Might need to submit a different ticket for this.
Additional Screenshots
SCREENSHOT 1
SCREENSHOT 2
SCREENSHOT 3 & 4
Environment
Production
Sandbox Used
No response
Version
v0.45.0
Browser Used
Chrome Version 130.0.6723.92