Here some details about how the various files has been retrieved (or reprocessed, or created).

Vocabularies

In the vocabularies/ directory you can find the files ready to be loaded into CKAN. They may be either the original official files, or a processed version that only includes entries fitting DCATAPIT specs.

You will also find copies of the original files in a Release entry in github (in order not to add unused heavy files in the repository checkout), and the scripts for processing the original files in vocabularies/scripts.

In order to be able to run the trimming scripts, you need to install the xmlstarlet package.

Data themes

Official page:
https://op.europa.eu/it/web/eu-vocabularies/at-dataset/-/resource/dataset/data-theme/

Download command:

wget "https://op.europa.eu/o/opportal-service/euvoc-download-handler?cellarURI=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fcellar%2F2c758808-fdd6-11ea-b44f-01aa75ed71a1.0001.02%2FDOC_1&fileName=data-theme-skos.rdf" -O official/data-theme-skos.rdf

Processing:
- Script scripts/data-theme-trim.sh:
  - Read official/data-theme-skos.rdf
  - Filter out unused languages
- Final processed file: data-theme-filtered.rdf

Places

Official page:
https://op.europa.eu/it/web/eu-vocabularies/at-dataset/-/resource/dataset/place/

Download command:

wget "https://op.europa.eu/o/opportal-service/euvoc-download-handler?cellarURI=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fcellar%2F87ec948c-581c-11ec-91ac-01aa75ed71a1.0001.04%2FDOC_1&fileName=places-skos.rdf" -O official/places-skos.rdf

Processing:
- Script scripts/places-trim.sh:
  - Read official/places-skos.rdf
  - Filter out non Italian places
  - Filter out unused languages
- Final processed file places-filtered.rdf

Languages

Official page:
https://op.europa.eu/it/web/eu-vocabularies/at-dataset/-/resource/dataset/language/

Download command:

wget "https://op.europa.eu/o/opportal-service/euvoc-download-handler?cellarURI=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fcellar%2F87f03e0d-581c-11ec-91ac-01aa75ed71a1.0001.05%2FDOC_1&fileName=languages-skos.rdf" -O official/languages-skos.rdf

Processing:
- Script scripts/languages-trim.sh:
  - Read official/languages-skos.rdf
  - Filter out unused languages
  - Filter out unused elements
- Final processed file languages-filtered.rdf

Frequencies

Official page:
https://op.europa.eu/it/web/eu-vocabularies/at-dataset/-/resource/dataset/frequency/

Download command:

wget "https://op.europa.eu/o/opportal-service/euvoc-download-handler?cellarURI=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fcellar%2Fe20301fe-928e-11e9-9369-01aa75ed71a1.0001.02%2FDOC_1&fileName=frequencies-skos.rdf" -O official/frequencies-skos.rdf

Processing:
- Script scripts/frequencies-trim.sh:
  - Read official/frequencies-skos.rdf
  - Filter out unused languages
  - Filter out unused elements
- Final processed file frequencies-filtered.rdf

File types

Official page:
https://op.europa.eu/it/web/eu-vocabularies/at-dataset/-/resource/dataset/file-type/

Download command:

wget "https://op.europa.eu/o/opportal-service/euvoc-download-handler?cellarURI=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fcellar%2F7c112635-581c-11ec-91ac-01aa75ed71a1.0001.04%2FDOC_1&fileName=filetypes-skos.rdf" -O official/filetypes-skos.rdf

Processing
- Script scripts/filetypes-trim.sh:
  - Read official/filetypes-skos.rdf
  - Filter out unused languages
  - Filter out unused elements
  - Add missing it de fr es entries
- Final processed file frequencies-filtered.rdf

Subtheme mapping

Subthemes mapping file is not from op.europa.eu and it does not need any further processing.

Download command:

wget https://github.com/italia/daf-ontologie-vocabolari-controllati/raw/master/VocabolariControllati/theme-subtheme-mapping/theme-subtheme-mapping.rdf

EUROVOC

We're using a subset of EUROVOC concepts for extracting localized label for subthemes, which are only provided in it and en in the theme-subtheme-mapping.rdf file.

Official page: https://op.europa.eu/it/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/eurovoc#

Download command:

wget "https://op.europa.eu/o/opportal-service/euvoc-download-handler?cellarURI=http%3A%2F%2Fpublications.europa.eu%2Fresource%2Fcellar%2Fbcd714b1-5f05-11ec-9c6c-01aa75ed71a1.0001.05%2FDOC_1&fileName=eurovoc_skos.zip" -O official/eurovoc_skos.zip

Processing
- Unzip the file, you'll get the file eurovoc-skos-ap-eu.rdf
```
unzip  official/eurovoc_skos.zip  -d official/
```
- Call script scripts/eurovoc-trim.sh official/eurovoc-skos-ap-eu.rdf
  It's a big file, so it will take a bit.
- Final processed file eurovoc-skos-filtered.rdf (about 1% of the original size)

Licenses

Download command:

wget https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/VocabolariControllati/licences/licences.rdf

Examples

In the examples/ directory there are some sample files.

Tests

In the tests/files/ directory there are overly simplified files used in tests.

/ckanext/dcatapit/tests/files/data-theme-skos.rdf:

 xmlstarlet ed -d "//*[@xml:lang][not(contains('it en fr de', @xml:lang))]" vocabularies/data-theme-skos.rdf > ckanext/dcatapit/tests/files/data-theme-skos.rdf

/ckanext/dcatapit/tests/files/eurovoc_filtered.rdf
This file contains the labels for only the used subthemes.
In order to create this file you need an usable EUROVOC file (the eurovoc-filtered.rdf described above is perfectly fine) and the subtheme mapping.
You can recreate this file using the create_eurovoc_for_test.py script:
```
./create_eurovoc_for_test.py ../../../../vocabularies/theme-subtheme-mapping.rdf  ../../../../vocabularies/eurovoc-filtered.rdf 
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.files.md

README.files.md

Vocabularies

Data themes

Places

Languages

Frequencies

File types

Subtheme mapping

EUROVOC

Licenses

Examples

Tests

Files

README.files.md

Latest commit

History

README.files.md

File metadata and controls

Vocabularies

Data themes

Places

Languages

Frequencies

File types

Subtheme mapping

EUROVOC

Licenses

Examples

Tests