Person model, add "name" attribute [Was: [Bug]: Unable to put initials only for identifiedBy when using DwC-A import] #4063

Open
creplog opened this issue Sep 20, 2024 · 8 comments
Labels
enhancement Suggest an improvement to an existing function. model Issue pertains to an app/model.

Comments

creplog commented Sep 20, 2024

Steps to reproduce the bug

1. Have a .csv file containing collection objects, including the field identifiedBy.
2. For a given collection object, populate identifiedBy with only the person's initials (example: P., A.V.). The collection object's determiner put only their initials on the label (screenshot 1).
3. Use the DwC-A import workbench and upload the .csv file.
4. The import fails for the collection object with initials only for identifiedBy. Status: Error last_name can't be blank (screenshot of the workbench with the error included).
5. I tried to edit and rewrite the initials for the collection object within the created import workbench UI; the record still errors.
6. I can create a Person record with only initials (shown in the last screenshot), but when I try to connect it to a record, it will not save (I saved the record after this screenshot and then reloaded the page, and the determiner P., A.V. I added was not present).
...

Screenshot

[Six screenshots attached to the original issue; images not included here.]

Expected behavior

No response

Additional Screenshots

No response

Environment

Production

Sandbox Used

No response

Version

v0.44.0

Browser Used

firefox

@creplog creplog added the bug An existing function is broken. label Sep 20, 2024
@LocoDelAssembly
Contributor

Looks like the name is parsable as-is:

3.3.4 :001 > DwcAgent.parse("P., A.V.")
 => [#<struct Namae::Name family="P.", given="A.V.", suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>] 
3.3.4 :002 > DwcAgent.parse("D., C.J.")
 => [#<struct Namae::Name family="D.", given="C.J.", suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>] 
3.3.4 :003 > 

Maybe recordedBy is the problem? Can you share the entire contents of the offending dataset row?

@LocoDelAssembly
Contributor

Completely unparsable text results in the field being interpreted as blank. Not sure if we want to change that at the expense of more frequent errored records?

I couldn't reproduce the problem with either P., A.V. or D., C.J.; both import fine for me in my local env.

@LocoDelAssembly
Contributor

Sorry, I can actually reproduce the problem!

After parsing the name, the importer also cleans it with this third-party code: https://github.com/bionomia/dwc_agent/blob/6c87e49ff877afdf9fddffd21c0794e9acec719c/lib/dwc_agent/cleaner.rb#L27

    # Cleans the passed-in namae object from the parse method and
    # re-organizes it to better match expected Darwin Core output.
    #
    # @param parsed_namae [Namae::Name] a Namae object
    # @return Namae::Name [Object] a new Namae object

I don't feel confident just removing this cleaner (@mjy?). The minimum requirement for parsed names is that the family name be complete; given names can be just initials.
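
To make the failure mode concrete, here is a minimal sketch of the rule described above. It is not dwc_agent's cleaner: `NameParts`, `initials_only?`, and `clean_like` are hypothetical stand-ins, and treating a family name as "incomplete" when it consists only of initials is an assumption made for illustration.

```ruby
# Hypothetical stand-in for a parsed name; the real importer works with Namae::Name.
NameParts = Struct.new(:family, :given)

# Assumption: a family name counts as "incomplete" when it is made up
# solely of initials such as "P." or "A.V.".
def initials_only?(text)
  text.to_s.match?(/\A(?:[[:alpha:]]\.\s*)+\z/)
end

# Simplified illustration of the rule: keep the name only if the family
# name is complete; given names may be initials.
def clean_like(parts)
  return NameParts.new(nil, nil) if parts.family.nil? || initials_only?(parts.family)

  parts
end

# Family "P." is initials only, so the whole name is blanked.
p clean_like(NameParts.new("P.", "A.V."))

# A complete family name survives even though the given names are initials.
p clean_like(NameParts.new("Smith", "A.V."))
```

Under these assumptions, a blanked family name is what later surfaces in the importer as "last_name can't be blank".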

@mjy
Member

mjy commented Sep 20, 2024

@LocoDelAssembly Right, there is much more benefit in keeping dwc_agent in the loop. The real long-term solution is to include a name field for Person and use that when we can't reconstruct a parsing.

@dshorthouse
Contributor

Should it help, a newer version of dwc_agent has a utility method you might use to check if a parsed string once cleaned produces all nil attributes:

if cleaned_name != DwcAgent.default
  # do something; otherwise store the unparsed input
end
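
One way to read this suggestion together with the proposed name attribute is sketched below. `BLANK_NAME` stands in for `DwcAgent.default`, and `ParsedName`, `person_attributes`, and the `:name` key are hypothetical; none of this is TaxonWorks importer code.

```ruby
# Stand-in for Namae::Name, reduced to the two fields used here.
ParsedName = Struct.new(:family, :given)

# Stand-in for DwcAgent.default: a name whose attributes are all nil.
BLANK_NAME = ParsedName.new(nil, nil).freeze

# Hypothetical helper: build Person attributes from a cleaned name,
# falling back to the verbatim string when cleaning left nothing usable.
def person_attributes(verbatim, cleaned_name)
  if cleaned_name != BLANK_NAME
    { last_name: cleaned_name.family, first_name: cleaned_name.given }
  else
    { name: verbatim } # assumed new attribute on Person
  end
end

p person_attributes("Smith, A.V.", ParsedName.new("Smith", "A.V."))
p person_attributes("P., A.V.", BLANK_NAME)
```

The design choice here is that the importer never discards the input: a record either carries parsed parts or carries the original string.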

@mjy mjy added enhancement Suggest an improvement to an existing function. model Issue pertains to an app/model. and removed bug An existing function is broken. labels Oct 30, 2024
@mjy mjy changed the title [Bug]: Unable to put initials only for identifiedBy when using DwC-A import Person model add "name" attribute [Was: [Bug]: Unable to put initials only for identifiedBy when using DwC-A import] Oct 30, 2024
@mjy mjy changed the title Person model add "name" attribute [Was: [Bug]: Unable to put initials only for identifiedBy when using DwC-A import] Person model, add "name" attribute [Was: [Bug]: Unable to put initials only for identifiedBy when using DwC-A import] Oct 30, 2024
@mjy
Member

mjy commented Oct 30, 2024

There are many consequences of adding name to Person. It essentially sub-classes our understanding of Person and adds detection and parsing needs to the UI.

  • People with a name but no parsed values will interfere with CSL rendering; we'll have to make this clear
  • A UI task that helps break down name into components when possible
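
The CSL concern can be sketched as follows. CSL-JSON name objects accept either parsed parts (`family`, `given`) or a single `literal` string, and a literal is rendered as-is, so name-part formatting such as initializing given names does not apply to it. `PersonRecord` and `csl_name` are hypothetical and only illustrate the two cases.

```ruby
# Hypothetical minimal Person: parsed parts plus the proposed free-text name.
PersonRecord = Struct.new(:last_name, :first_name, :name, keyword_init: true)

# Build a CSL-JSON style name hash. Parsed parts are preferred; the
# unparsed name is emitted as "literal".
def csl_name(person)
  if person.last_name.to_s.strip.empty?
    { "literal" => person.name }
  else
    # compact drops "given" when no first name is recorded
    { "family" => person.last_name, "given" => person.first_name }.compact
  end
end

p csl_name(PersonRecord.new(last_name: "Smith", first_name: "A.V."))
p csl_name(PersonRecord.new(name: "P., A.V."))
```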

@dshorthouse
Contributor

dshorthouse commented Oct 30, 2024

Indeed, there's a delicate balancing act to accommodate storage, rendering, and search while not confusing users when all three may have competing needs or rules. For what it's worth, it appears Wikidata has also grappled with this while they also layer on the challenge of localization and language. See https://www.wikidata.org/wiki/Help:Default_values_for_labels_and_aliases for their proposal to use mul for "multiple languages" (whatever that means). There is an admitted mess of terminology and logic in Bionomia (mostly hidden from users) that consists of oddities like "fullname", "fullname_reverse", and "label" in addition to the parsed bits. The purpose of these is to help tinker with and refine the search index rather than to maintain any sort of rigour in the semantics of data storage, which is bound to fail because of all the exceptions. Perhaps it might help to consider how far you can push your use cases through combinations of index or query analyzers while minimizing the impact on the backend design. If, however, you don't use Elasticsearch in TaxonWorks, then ignore all this and file it in the "maybe" folder.

@dshorthouse
Contributor

dshorthouse commented Oct 31, 2024

You may not need any more rationale for a "name" attribute in your Person model, but here is an ORCID profile that was just recently added to Bionomia: https://orcid.org/0000-0002-5373-2585. It looks as if the journals in which they and their colleagues have published their work have attempted various gymnastics to accommodate mononymous names.

4 participants