[Bug]: DwCA-Import - imports existing CEs as new records #4101

Open
AntWeb-org opened this issue Nov 4, 2024 · 4 comments
Labels
bug An existing function is broken.

Comments

@AntWeb-org

Steps to reproduce the bug

Imported 20 brand new CO records.
All CE records linked to new COs already exist in the database.
Instead, every CE record was imported as a brand-new record with 1 use. The new CEs are not linked to any existing collecting event or GA data (see the screenshot below).
 
I used the following fields for the CE record import, since all collecting event and GA data already exist:
-fieldNumber
-TW:Namespace:FieldNumber
-TW:CollectingEvent:verbatim_field_number
-TW:DataAttribute:CollectingEvent:VerbatimCollectionCode

Tested on sandwich first and it worked fine: all CEs linked to the existing collecting event and GA data. It did not work on the production version.

Screenshot

Screenshot 2024-11-04 142630

Expected behavior

When I import CO records using existing CE records, I would expect them to link to all of the existing CE data instead of creating a new CE record.

If I Unify the 2 CE records as a temporary solution (screenshot 1), both field numbers show on Edit (screenshot 2) until I delete one of the identifiers (screenshots 3 and 4). This might need a separate ticket.

Additional Screenshots

SCREENSHOT 1
image

SCREENSHOT 2
image

SCREENSHOT 3 & 4
image
image

Environment

Production

Sandbox Used

No response

Version

v0.45.0

Browser Used

Chrome Version 130.0.6723.92

@mjy
Member

mjy commented Nov 4, 2024

@LocoDelAssembly I'm not sure if we presently match on FieldNumber, probably only EventID?

@LocoDelAssembly
Contributor

The importer matches on the identifier field of the Identifier, for both eventID and FieldNumber.

Sandwich:

irb(main):005> Identifier::Local::FieldNumber.where(cached: "BLF19068", project_id: 69)
=> 
[#<Identifier::Local::FieldNumber:0x00007fc1a2dbcbd8
  id: 2802131,
  identifier: "19068", # <<< HERE
  type: "Identifier::Local::FieldNumber",
  created_at: Sat, 02 Apr 2022 20:14:18.895292000 UTC +00:00,
  updated_at: Sat, 02 Apr 2022 20:14:18.895292000 UTC +00:00,
  project_id: 69,
  cached: "BLF19068",
  identifier_object_id: 778200,
  identifier_object_type: "CollectingEvent",
  relation: nil,
  position: 1,
  cached_numeric_identifier: nil>]

Production:

irb(main):014> Identifier::Local::FieldNumber.where(cached: "BLF19068")
=> 
[#<Identifier::Local::FieldNumber:0x00007f0a6f5b9110
  id: 4446960,
  identifier: "BLF19068", # <<< HERE
  type: "Identifier::Local::FieldNumber",
  created_at: Wed, 22 Nov 2023 00:48:51.624364000 UTC +00:00,
  updated_at: Mon, 04 Nov 2024 22:33:43.235800000 UTC +00:00,
  namespace_id: 1016,
  project_id: 76,
  cached: "BLF19068",
  identifier_object_id: 553640,
  identifier_object_type: "CollectingEvent",
  relation: nil,
  position: 1,
  cached_numeric_identifier: nil>]

The dataset (assuming it is "Technomyrmex import4") has non-prefixed values as fieldNumber. I'm not sure whether the data above reflects the initial state, since it looks like the unification tool was already used.

Probably a virtual vs non-virtual namespace issue? I suppose it is not actually possible either to create a duplicate FieldNumber in the same Namespace (considering the identifier field only, not cached, which is what is actually displayed) or to have the same identifier on two CEs.

If the importer actually missed an existing CE, it should have failed to create a new one because it would have attempted to duplicate identifiers and would have failed with "already taken".
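
To make the "already taken" reasoning concrete, here is a minimal plain-Ruby sketch of a uniqueness check keyed on namespace and identifier. Registry, IdentifierRow, ce_id, and the error text are hypothetical stand-ins, not the TaxonWorks validation, and the exact scope of the real check is an assumption.

```ruby
# Hypothetical in-memory stand-in for the identifiers table.
IdentifierRow = Struct.new(:namespace_id, :identifier, :ce_id, keyword_init: true)

class Registry
  def initialize
    @rows = []
  end

  # Assumed rule: the pair (namespace_id, identifier) must be unique.
  # The cached value is deliberately not part of the check.
  def create!(namespace_id:, identifier:, ce_id:)
    if @rows.any? { |r| r.namespace_id == namespace_id && r.identifier == identifier }
      raise ArgumentError, 'Identifier already taken'
    end

    row = IdentifierRow.new(namespace_id: namespace_id, identifier: identifier, ce_id: ce_id)
    @rows << row
    row
  end
end

registry = Registry.new
registry.create!(namespace_id: 1016, identifier: 'BLF19068', ce_id: 553640)

# Same namespace and identifier on a second (made-up) CE id: rejected.
begin
  registry.create!(namespace_id: 1016, identifier: 'BLF19068', ce_id: 999)
rescue ArgumentError => e
  puts e.message
end
```

Under this assumed rule, an unprefixed identifier stored in a different namespace would not collide, even if both rows end up displaying the same cached value.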

@AntWeb-org
Author

AntWeb-org commented Nov 5, 2024

I fixed this by adding the correct CEs and deleting the duplicates that had no associated CO record. And yes, the non-prefixed fieldNumber in the Technomyrmex import4 is how I imported them. It worked that way on a previous import.

I also unified only 2 CEs (I think); the rest I updated by hand.

@LocoDelAssembly
Contributor

Experimented with a backup prior to this import.

Before import:

irb(main):013> Identifier.joins(:namespace).where("cached like '%06693%'").merge(Namespace.where(short_name: 'BLF_AntWeb'))
=> []

After import of second record:

irb(main):014> Identifier.joins(:namespace).where("cached like '%06693%'").merge(Namespace.where(short_name: 'BLF_AntWeb'))
=> 
[#<Identifier::Local::FieldNumber:0x00007f25f2323358
  id: 6769612,
  identifier: "06693",
  type: "Identifier::Local::FieldNumber",
  created_at: Tue, 05 Nov 2024 18:13:07.463666000 UTC +00:00,
  updated_at: Tue, 05 Nov 2024 18:13:07.463666000 UTC +00:00,
  namespace_id: 917,
  created_by_id: 3,
  updated_by_id: 3,
  project_id: 76,
  cached: "BLF06693",
  identifier_object_id: 735688,
  identifier_object_type: "CollectingEvent",
  relation: nil,
  position: 1,
  cached_numeric_identifier: 6693.0>]

When searching by cached (all namespaces):

irb(main):020> Identifier.where(project_id: 76).where(cached: 'BLF06693')
=> 
[#<Identifier::Local::FieldNumber:0x00007f25e7787440
  id: 4292804,
  identifier: "BLF06693",
  type: "Identifier::Local::FieldNumber",
  created_at: Tue, 21 Nov 2023 18:49:53.694374000 UTC +00:00,
  updated_at: Tue, 21 Nov 2023 18:49:53.694374000 UTC +00:00,
  namespace_id: 1016,
  created_by_id: 2478,
  updated_by_id: 2478,
  project_id: 76,
  cached: "BLF06693",
  identifier_object_id: 533914,
  identifier_object_type: "CollectingEvent",
  relation: nil,
  position: 1,
  cached_numeric_identifier: nil>,
 #<Identifier::Local::FieldNumber:0x00007f25e7787300
  id: 4113688,
  identifier: "BLF06693",
  type: "Identifier::Local::FieldNumber",
  created_at: Tue, 21 Nov 2023 07:40:14.091645000 UTC +00:00,
  updated_at: Tue, 21 Nov 2023 07:40:14.091645000 UTC +00:00,
  namespace_id: 1014,
  created_by_id: 2478,
  updated_by_id: 2478,
  project_id: 76,
  cached: "BLF06693",
  identifier_object_id: 516759,
  identifier_object_type: "CollectingEvent",
  relation: nil,
  position: 1,
  cached_numeric_identifier: nil>,
 #<Identifier::Local::FieldNumber:0x00007f25e77871c0 #### JUST CREATED BY IMPORT (SAME AS CODE BLOCK ABOVE)
  id: 6769612,
  identifier: "06693",
  type: "Identifier::Local::FieldNumber",
  created_at: Tue, 05 Nov 2024 18:13:07.463666000 UTC +00:00,
  updated_at: Tue, 05 Nov 2024 18:13:07.463666000 UTC +00:00,
  namespace_id: 917,
  created_by_id: 3,
  updated_by_id: 3,
  project_id: 76,
  cached: "BLF06693",
  identifier_object_id: 735688,
  identifier_object_type: "CollectingEvent",
  relation: nil,
  position: 1,
  cached_numeric_identifier: 6693.0>]
irb(main):021> 

If the importer was supposed to re-use the CE with id=533914, there were two problems. First, the Identifier::FieldNumber identifier is fully qualified because the namespace is virtual (see below), so the unprefixed fieldNumber in the dataset does not match it. Second, the importer expected to find the identifier in the BLF_AntWeb namespace, but it was in an importer auto-generated namespace (from a much earlier import).
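
The two mismatches can be sketched in plain Ruby. find_ce below is a hypothetical stand-in for the importer's lookup (the real matching code is not shown in this thread); it assumes the lookup filters on namespace and on the raw identifier value rather than on cached.

```ruby
# Rows copied from the console output above (relevant columns only).
Row = Struct.new(:id, :namespace_id, :identifier, :cached, :ce_id, keyword_init: true)

EXISTING = [
  Row.new(id: 4292804, namespace_id: 1016, identifier: 'BLF06693', cached: 'BLF06693', ce_id: 533914),
  Row.new(id: 4113688, namespace_id: 1014, identifier: 'BLF06693', cached: 'BLF06693', ce_id: 516759)
].freeze

# Hypothetical lookup: match on namespace + identifier, return the CE id or nil.
def find_ce(rows, namespace_id:, identifier:)
  rows.find { |r| r.namespace_id == namespace_id && r.identifier == identifier }&.ce_id
end

# What the import supplied: unprefixed fieldNumber in BLF_AntWeb (id 917).
p find_ce(EXISTING, namespace_id: 917, identifier: '06693')     # => nil
# Right namespace, but the value is still unprefixed.
p find_ce(EXISTING, namespace_id: 1016, identifier: '06693')    # => nil
# Only when both namespace and fully qualified value line up is the CE found.
p find_ce(EXISTING, namespace_id: 1016, identifier: 'BLF06693') # => 533914
```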

One thing that confuses me, however, is that the existing identifiers are in namespaces that were meant to be used for eventID. Was a script run at some point to re-type them as FieldNumber, @mjy? The auto-generated namespaces were created before FieldNumber existed.

irb(main):024> Identifier.where(project_id: 76).where(cached: 'BLF06693').map(&:namespace)
=> 
[#<Namespace:0x00007f25e7789ec0
  id: 1016,
  institution: nil,
  name: "eventID namespace for \"specimen formicidae\" dataset in \"AntWeb\" project [d8386021]",
  short_name: "eventID-d8386021",
  created_at: Tue, 21 Nov 2023 16:00:18.539462000 UTC +00:00,
  updated_at: Tue, 10 Sep 2024 21:56:20.426410000 UTC +00:00,
  created_by_id: 2478,
  updated_by_id: 2483,
  verbatim_short_name: "eventID",
  delimiter: ":",
  is_virtual: true>,
 #<Namespace:0x00007f25e7789d80
  id: 1014,
  institution: nil,
  name: "eventID namespace for \"specimen create names\" dataset in \"AntWeb\" project [2a48b12b]",
  short_name: "eventID-2a48b12b",
  created_at: Tue, 21 Nov 2023 07:23:22.255120000 UTC +00:00,
  updated_at: Tue, 10 Sep 2024 21:51:47.133804000 UTC +00:00,
  created_by_id: 2478,
  updated_by_id: 2483,
  verbatim_short_name: "eventID",
  delimiter: ":",
  is_virtual: true>,
 #<Namespace:0x00007f25e7789c40
  id: 917,
  institution: "Brian L. Fisher",
  name: "Brian L. Fisher [AntWeb]",
  short_name: "BLF_AntWeb",
  created_at: Mon, 20 Nov 2023 23:29:41.348662000 UTC +00:00,
  updated_at: Mon, 20 Nov 2023 23:29:41.348662000 UTC +00:00,
  created_by_id: 2478,
  updated_by_id: 2478,
  verbatim_short_name: "BLF",
  delimiter: "NONE",
  is_virtual: false>]
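
For reference, the three namespaces above suggest why all three identifiers display the same cached value. This is a minimal sketch inferred only from the records in this thread: Ns and build_cached are hypothetical, not TaxonWorks code, and treating a delimiter of "NONE" as "no delimiter" is an assumption based on namespace 917.

```ruby
# Hypothetical stand-in for a Namespace row (not the TaxonWorks model).
Ns = Struct.new(:verbatim_short_name, :delimiter, :is_virtual, keyword_init: true)

# Assumed rule: a virtual namespace contributes nothing to cached, so the
# identifier must already be fully qualified. Otherwise the prefix and
# delimiter are prepended, with "NONE" meaning no delimiter at all.
def build_cached(namespace, identifier)
  return identifier if namespace.is_virtual

  delimiter = namespace.delimiter == 'NONE' ? '' : namespace.delimiter.to_s
  "#{namespace.verbatim_short_name}#{delimiter}#{identifier}"
end

blf_antweb = Ns.new(verbatim_short_name: 'BLF', delimiter: 'NONE', is_virtual: false)
event_ns   = Ns.new(verbatim_short_name: 'eventID', delimiter: ':', is_virtual: true)

puts build_cached(blf_antweb, '06693')  # non-virtual: prefix is prepended
puts build_cached(event_ns, 'BLF06693') # virtual: identifier is used as-is
```

Both calls print BLF06693, which matches the cached column in the query results even though the stored identifier values differ.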
