Align to the production UN Vocab #536

Closed
nissimsan opened this issue Aug 18, 2022 · 13 comments

@nissimsan
Collaborator

The UN/CEFACT LD vocabulary is due to be bumped to version 1.0, which is expected in fall 2022.

The main updates that will be required on our side include:

I advise that we don't start this work until the UN vocab is published.

@nissimsan
Collaborator Author

@nissimsan
Collaborator Author

@VladimirAlexiev, @brownoxford, FYI ^

@BenjaminMoe
Contributor

@nissimsan any updates?

@nissimsan
Collaborator Author

Yes, it's basically done. We're currently waiting for the vocabulary.uncefact.org DNS to propagate. It's taking oddly long; something must have gone wrong.

This is what v1 will look like, though: dmvc7xzscpizo.cloudfront.net

@nissimsan changed the title from "Align to UN vocab v1.0" to "Align to the production UN Vocab" on Dec 6, 2022
@nissimsan
Collaborator Author

The much improved production vocabulary.uncefact.org is live now. We should switch our pointers from the draft URIs to this.
For example:
https://service.unece.org/trade/uncefact/vocabulary/uncefact/#consigneeParty
should be changed to
https://vocabulary.uncefact.org/consigneeParty
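A minimal sketch of that pointer switch, assuming every draft term URI has the form shown above (the draft base followed by `#localName`) and that the local name carries over unchanged into the production namespace. That holds for the `consigneeParty` example but is an assumption for other terms. The names `DRAFT_BASE`, `PROD_BASE` and `to_production_uri` are hypothetical, not part of either vocabulary.

```python
from urllib.parse import urlsplit

# Hypothetical constants taken from the example URIs in this comment.
DRAFT_BASE = "https://service.unece.org/trade/uncefact/vocabulary/uncefact/"
PROD_BASE = "https://vocabulary.uncefact.org/"


def to_production_uri(draft_uri: str) -> str:
    """Map a fragment-style draft UN/CEFACT term URI to a production URI.

    Assumes the fragment (local name) is identical in both namespaces.
    """
    if not draft_uri.startswith(DRAFT_BASE):
        raise ValueError(f"not a draft UN/CEFACT URI: {draft_uri}")
    local_name = urlsplit(draft_uri).fragment
    if not local_name:
        raise ValueError(f"draft URI has no term fragment: {draft_uri}")
    return PROD_BASE + local_name


print(to_production_uri(
    "https://service.unece.org/trade/uncefact/vocabulary/uncefact/#consigneeParty"))
```

Raising on anything outside the draft base, rather than passing it through, makes unexpected URIs show up during a bulk rewrite instead of silently surviving.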

@nissimsan
Collaborator Author

We did this!

@mgh128

mgh128 commented Oct 29, 2023

@nissimsan - re "We did this!", I'm wondering what exactly we did.

I've just been looking at https://vocabulary.uncefact.org/UnitMeasureCode

Hyperlinks such as https://vocabulary.uncefact.org/UnitMeasureCode#KGM go nowhere and provide no further details, and I don't see any details about conversion factors when I check the source code for the page.

I also tried reloading the page after setting the HTTP header Accept: application/ld+json, but that just produced a 404 error page with this rather unfriendly message:

404 Not Found
Code: NoSuchKey
Message: The specified key does not exist.
Key: UnitMeasureCode.jsonld
RequestId: KQDKKWY3RJQ844FQ
HostId: CFF2nAkAhmrgaZgiqdJx8hrd7djYmRIsn9XamQ/2YY3MrwyjNkyG1tt43OxWM+p5kbp9hyRNIiY=
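For anyone wanting to repeat this check from a script rather than a browser, here is a minimal sketch using only Python's standard `urllib`. It sends the same `Accept: application/ld+json` header and reports whichever status code comes back. The helper names `build_jsonld_request` and `fetch_status` are hypothetical, and what the server returns today may differ from the 404 above.

```python
import urllib.error
import urllib.request


def build_jsonld_request(url: str) -> urllib.request.Request:
    # Ask for JSON-LD instead of HTML via HTTP content negotiation.
    return urllib.request.Request(url, headers={"Accept": "application/ld+json"})


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status the server answers with (e.g. 200 or 404)."""
    try:
        with urllib.request.urlopen(build_jsonld_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; err.code holds the status code.
        return err.code
```

Usage: `fetch_status("https://vocabulary.uncefact.org/UnitMeasureCode")` returns the status code. A 404 with the S3-style `NoSuchKey` body would suggest that no `.jsonld` object exists behind the negotiated path.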

Conversion factors are still present in the older JSON-LD file at https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld

However, that file does not use the 2-3 character alphanumeric codes for its @id values, so you can find details for https://service.unece.org/trade/uncefact/vocabulary/rec20#kilogram but not for a URI ending in /KGM (or #KGM, though ideally /KGM).

In comparison, https://qudt.org/vocab/unit/KiloGM provides plenty of data about kilograms, including a triple that links via qudt:uneceCommonCode to "KGM". It would be even better if each UN ECE Rec20 unit code had a corresponding URI such as https://vocabulary.uncefact.org/UnitMeasureCode/KGM that provided similar information about conversion factors, so that QUDT could link to a Web URI within https://vocabulary.uncefact.org rather than to a dumb string such as "KGM".

@nissimsan
Collaborator Author

Hi @mgh128,

What we did was switch from the draft to production UN/CEFACT term definitions. (#726) So we now reference for example https://vocabulary.uncefact.org/consigneeParty.

Good catch that the conversion factors are now missing from https://vocabulary.uncefact.org/UnitMeasureCode#KGM. That data was clearly available before, so we must have dropped it along the way. Note that this work is done on the UN side, not in this repo. I will bring it up with the team; your clear requirements are a great help.

I agree completely that QUDT should link to a real URI. Might that be something you could bring up there, changing KGM to https://vocabulary.uncefact.org/UnitMeasureCode#KGM?

@mgh128

mgh128 commented Oct 30, 2023

Hi @nissimsan

Many thanks in advance for alerting the UN CEFACT team about the missing conversion factors. Unlike many other code lists, the code list(s) for unit of measure do require more than a code value and a description - so either there should be more 'columns' in the displayed table - or clicking on a link such as https://vocabulary.uncefact.org/UnitMeasureCode#KGM would result in a different page view with further details (including conversion factors) or perhaps expand an 'accordion' (e.g. using HTML <details> and <summary> to show further details without switching to a different page view).

I also noticed that within the list of code lists at https://vocabulary.uncefact.org/code-lists there is not only the main unit of measure code list https://vocabulary.uncefact.org/UnitMeasureCode but also some additional code lists such as:

unece:AirFlowUnitMeasureCode
unece:DurationUnitMeasureCode
unece:FileSizeUnitMeasureCode
unece:LinearUnitMeasureCode
unece:TemperatureUnitMeasureCode
unece:VolumeUnitMeasureCode
unece:WeightUnitMeasureCode

Unfortunately, this means that a unit of measure such as KGM for kilogram now appears in more than one code list, e.g.

unece:WeightUnitMeasureCode#KGM
AND
unece:UnitMeasureCode#KGM

or

unece:LinearUnitMeasureCode#MTR
AND
unece:UnitMeasureCode#MTR
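To make the duplication concrete, a small sketch that inverts code lists into a code-to-lists index and reports every code appearing in more than one list. The data is an illustrative subset using only codes named in this thread, not the published lists. `duplicated_codes` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative subsets only (codes mentioned in this thread), not the real lists.
code_lists = {
    "unece:UnitMeasureCode": {"KGM", "MTR", "KEL"},
    "unece:WeightUnitMeasureCode": {"KGM"},
    "unece:LinearUnitMeasureCode": {"CMT", "FOT", "INH", "MTR"},
    "unece:TemperatureUnitMeasureCode": {"CEL", "FAH"},
}


def duplicated_codes(lists: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each code value to the sorted names of the lists containing it,
    keeping only codes that occur in more than one list."""
    where = defaultdict(set)
    for list_name, codes in lists.items():
        for code in codes:
            where[code].add(list_name)
    return {code: sorted(names) for code, names in where.items() if len(names) > 1}


for code, names in sorted(duplicated_codes(code_lists).items()):
    print(code, names)
```

Run against the full datasets, the same inversion would list every term that has two URIs for one unit.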

Furthermore, the specialised code lists for unit of measure (e.g. https://vocabulary.uncefact.org/LinearUnitMeasureCode ) do not contain the complete set of units for that dimension or type of measurement.
For example, https://vocabulary.uncefact.org/LinearUnitMeasureCode includes values for centimetre (CMT), foot (FOT), inch (INH) and metre (MTR) but excludes values for yard (YRD), millimetre (MMT), micrometre (micron) (4H), kilometre (KMT), nautical mile (NMI), etc.

Similarly, https://vocabulary.uncefact.org/TemperatureUnitMeasureCode includes code values for degree Celsius (CEL) and degree Fahrenheit (FAH) but does not even include the SI base unit, kelvin (KEL), which only appears in the main code list as unece:UnitMeasureCode#KEL.
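The coverage gaps described in the two paragraphs above come down to a set difference between what a specialised list publishes and what one would expect for that dimension. In this sketch, the "expected" sets are just the codes named in those paragraphs, not a complete derivation from Rec 20. `missing_codes` is a hypothetical helper.

```python
# Illustrative data: codes named in this comment, not the full Rec 20 coverage.
linear_published = {"CMT", "FOT", "INH", "MTR"}
linear_expected = linear_published | {"YRD", "MMT", "4H", "KMT", "NMI"}

temperature_published = {"CEL", "FAH"}
temperature_expected = temperature_published | {"KEL"}


def missing_codes(published: set[str], expected: set[str]) -> list[str]:
    """Codes expected for a dimension but absent from its specialised list."""
    return sorted(expected - published)


print(missing_codes(linear_published, linear_expected))
print(missing_codes(temperature_published, temperature_expected))
```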

I hope that you can also raise this issue with the UN CEFACT team.

Of course I'd be happy to discuss this with the QUDT folks and prepare a pull request once we've agreed what the QUDT property should be for pointing to corresponding Web URIs based on https://vocabulary.uncefact.org/UnitMeasureCode as a URI stem. Before I spend any time on that, however, I'd want to see https://vocabulary.uncefact.org/UnitMeasureCode updated to show the conversion factors that are already present in the older dataset at https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld, and I'd also like to see a visible link to the RDF dataset (in Turtle and JSON-LD) for https://vocabulary.uncefact.org/UnitMeasureCode and/or content negotiation working, rather than a 404 Page Not Found error.

If I could actually see the RDF dataset behind https://vocabulary.uncefact.org/UnitMeasureCode, then I could (1) easily detect whether the conversion factors are missing from the dataset or just not shown in the user interface and (2) offer to add any missing conversion-factor triples to the dataset (via a SPARQL query with that dataset and https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld as the data sources).
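The join described in (2) can be sketched in plain Python as a stand-in for the SPARQL query: match old records to new codes on the common code and emit one new triple per conversion factor. Everything here is an assumption for illustration. The record keys (`commonCode`, `conversionFactor`), the `conversionFactor` predicate, the `#metre` record and the factor values are hypothetical and are not taken from rec20.jsonld. Only the `rec20#kilogram` @id and the code stem come from this thread.

```python
NEW_BASE = "https://vocabulary.uncefact.org/UnitMeasureCode#"

# Hypothetical, simplified records shaped loosely like the older rec20.jsonld;
# key names and values are illustrative assumptions.
old_records = [
    {"@id": "https://service.unece.org/trade/uncefact/vocabulary/rec20#kilogram",
     "commonCode": "KGM", "conversionFactor": "1 kg"},
    {"@id": "https://service.unece.org/trade/uncefact/vocabulary/rec20#metre",
     "commonCode": "MTR", "conversionFactor": "1 m"},
]

# Codes present in the new UnitMeasureCode dataset (illustrative subset).
new_codes = {"KGM", "MTR", "KEL"}


def missing_conversion_triples(old, codes):
    """Yield (subject, predicate, object) triples to add, joined on the common code."""
    for rec in old:
        code = rec.get("commonCode")
        factor = rec.get("conversionFactor")
        if code in codes and factor is not None:
            yield (NEW_BASE + code, "conversionFactor", factor)


for triple in missing_conversion_triples(old_records, new_codes):
    print(triple)
```

Codes with no old record (KEL here) are simply skipped, so the output lists only the triples that can be recovered from the older dataset.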

We certainly appreciate the efforts of your team and the UN CEFACT team in making the code list for units of measure finally available as Linked Data rather than just an Excel spreadsheet and with the suggested improvements noted above, I think it will be a useful resource for everyone, including everyone in the GS1 community.

@nissimsan
Collaborator Author

nissimsan commented Oct 31, 2023

Excellent @mgh128 - cheers!

We can confirm that the conversion factors went missing when we switched from Excel to a newer JSON Schema data source. The issue above is the first step: getting them included upstream.

The term duplication you point out is the result of how the source data is modeled, using endless extensions rather than inheritance. This has been the main challenge of the project; there was no way around case-by-case decisions and rules.

Tagging @kshychko re: our conversation on Slack yesterday. There are two things here: a) adding conversions and b) fixing duplicates. The former has a dependency and is, IMO, the more critical, as we need those conversions no matter how and when we might change the modeling in the future.

@mgh128, zooming out, I can't help pondering whether the world actually needs two code lists. UN/CEFACT has traditionally defined everything liberally. In the modern world, this has led to significant term duplication, which is an anti-pattern (my opinion). Units seem like another case of this, and as much as I love and am proud of the QUDT-UN cross-linking, I feel that in an ideal world the UN would just adopt all of QUDT's terms where there is overlap. I'm curious whether you see any arguments against this: is there a reason why the world needs both? And how should I think about choosing a QUDT unit URI over a UN one?

@mgh128

mgh128 commented Oct 31, 2023

Hi @nissimsan
Do you really mean CQRS or did you actually mean QUDT in your previous comment? If CQRS, please provide a link to where it specifies unit of measure codes because I think you might have picked the wrong 4-letter abbreviation containing a Q.

Regarding two systems for units of measure, I'd note that the UN CEFACT Rec20 unit codes are widely referenced throughout GS1 standards, so as a result, they are widely used in EDI messages, traceability data and master data, at least in the fast moving consumer goods sector and other industry sectors that GS1 supports, including healthcare, apparel and technical industries.

Having said that, in addition to UN/CEFACT Rec20 and QUDT, there is also UCUM, the Unified Code for Units of Measure (https://ucum.org/ucum). Unlike both QUDT and UN/CEFACT Rec20, it attempts to take a highly systematic approach to how its unit codes are created, rather than making choices that appear somewhat arbitrary. However, not all UCUM unit codes are URI-friendly, especially those that use square brackets (for units outside the SI system) or a forward slash (even in SI units such as m/s, metres per second). That's a downside for a semantic ontology of units of measure, and as far as I'm aware, UCUM is not yet published as a semantic ontology or Web vocabulary, whereas QUDT and UN/CEFACT Rec20 are. There is a corresponding dataset for UCUM; see https://ucum.nlm.nih.gov/ucum-lhc/
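To illustrate the URI-friendliness point, here is what UCUM codes look like once percent-encoded into a single URI path segment with Python's standard `urllib.parse.quote`. No UCUM URI namespace exists, so any base one might append these to would be hypothetical. `ucum_to_uri_segment` is likewise a hypothetical helper. `[in_i]` is UCUM's code for the international inch.

```python
from urllib.parse import quote


def ucum_to_uri_segment(code: str) -> str:
    # safe="" escapes '/' as well, so the code stays one path segment
    # instead of being read as a path separator.
    return quote(code, safe="")


for code in ["m/s", "[in_i]", "kg"]:
    print(code, "->", ucum_to_uri_segment(code))  # e.g. m/s -> m%2Fs
```

The encoded forms are valid, but `m%2Fs` and `%5Bin_i%5D` are far less readable than Rec20's alphanumeric codes such as `KGM`.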

I'm fairly sure that QUDT provides links to UCUM unit codes but unfortunately only as string values because UCUM doesn't publish a Linked Data ontology as far as I know.

I am aware that in some cases, UCUM code values have been used within GS1's GDSN data model to fill in gaps in the coverage offered by UN/CEFACT Rec20 unit codes. I'm not convinced that using two distinct UoM code lists to populate a single unitCode property is good practice but they didn't ask my advice before taking that decision!

@nissimsan
Collaborator Author

Argh - QUDT!! 🤦‍♂️ ... The other four letter acronym with a Q in it! Updated.

@mgh128

mgh128 commented Oct 31, 2023

@nissimsan - yes, QUDT, not to be confused with the much older system SPQR which definitely didn't publish a Linked Data ontology ;-)


3 participants