Converts bibliographic records to Linked Open Data

bib2lod

What is bib2lod?

bib2lod is a full-record MARC-to-bibliotek-o converter that will:

  • Accept any valid MARC record (or set of valid MARC records) as input.
  • Convert each input record to RDF in the bibliotek-o framework. The bibliotek-o framework includes:
    • The bibliotek-o ontology (an extension to BIBFRAME).
    • Defined fragments of BIBFRAME and other external ontologies.
    • An application profile specifying rules for bibliographic metadata modeling using these ontologies.
    • Mappings from MARC to the bibliotek-o target ontologies, against which the converter is being developed. The LD4L Labs/LD4P ontology mapping group is still working on these mappings.
  • Convert all fields except local (9xx) fields to RDF.

In its initial implementation, it:

  • Converts each record as a self-contained unit, with no attempt to reconcile URIs either locally across an entire data set or to external URIs. New URIs are minted using the configured local namespace and a random local name generator.
  • Supports only file IO.
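
The URI minting described above can be pictured with a minimal sketch. It is not taken from the bib2lod source: the class name `UriMintingSketch`, the `mint` method, the example namespace, and the choice of `java.util.UUID` as the random local name generator are all assumptions made for illustration.

```java
import java.util.UUID;

// Illustrative sketch only; names and the UUID-based generator are assumptions.
final class UriMintingSketch {

    // Appends a random local name to the configured local namespace.
    static String mint(String localNamespace) {
        // A UUID with the hyphens removed gives a 32-character hexadecimal local name.
        String localName = UUID.randomUUID().toString().replace("-", "");
        return localNamespace + localName;
    }

    public static void main(String[] args) {
        String namespace = "http://data.example.org/"; // hypothetical local namespace
        // Each call yields a fresh, unrelated URI.
        System.out.println(mint(namespace));
        System.out.println(mint(namespace));
    }
}
```

Because nothing is reconciled, an entity that appears in two records would receive two different URIs under a scheme like this one.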

News

  • 2017-03-31 Release 0.1 pushed to master.
    • Converts a minimal MARCXML record to N-TRIPLES.
    • Most of the architecture is in place.

Quick Start

  • git clone git@github.com:ld4l-labs/bib2lod.git
  • cd bib2lod
  • mvn install
  • mkdir output
  • java -jar target/bib2lod.jar -c src/main/resources/example.config.json
  • more output/102063.min.nt

Build

  • Clone the repository from https://github.com/ld4l-labs/bib2lod
  • Run mvn install
  • Copy the executable jar from target/bib2lod.jar to your preferred work location.
  • Copy the example configuration file from src/main/resources/example.config.json to your preferred work location and rename it appropriately, for example first.config.json.

Configure

  • Edit the configuration file to set the appropriate input source and output destination.
  • Within InputService, change the source attribute to point either to a single file of MARCXML, or to a directory containing MARCXML files.
    • Each input file must have a filename extension of .xml
    • A sample minimal record is in sample-data/marcxml-to-ld4l/cornell/102063-min/102063.min.xml.
  • Within OutputService, change the destination attribute to point to your desired output directory.
    • You must create this directory before running the program.
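
The two requirements above (input files end in .xml, and the output directory exists before the run) can be checked ahead of time. The following is a hedged sketch of such a check, not part of bib2lod: `PreflightSketch`, `xmlInputs`, and `ensureDirectory` are hypothetical names, and the sketch assumes a flat, case-sensitive scan of the source directory, which this README does not confirm.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative pre-run check; all names here are hypothetical.
final class PreflightSketch {

    // Returns the .xml files found at the source, which may be a file or a directory.
    static List<Path> xmlInputs(Path source) {
        try (Stream<Path> candidates =
                Files.isDirectory(source) ? Files.list(source) : Stream.of(source)) {
            return candidates
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates the output directory (and any missing parents) if it does not exist yet.
    static Path ensureDirectory(Path destination) {
        try {
            return Files.createDirectories(destination);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path source = Paths.get(args.length > 0 ? args[0] : "sample-data");
        Path destination = Paths.get(args.length > 1 ? args[1] : "output");
        System.out.println("Inputs: " + xmlInputs(source));
        System.out.println("Output directory: " + ensureDirectory(destination));
    }
}
```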

Run

  • Execute the jar file, referencing the configuration file on the command line:
    • java -jar bib2lod.jar -c first.config.json
  • Output will be written in N-Triples format to the directory specified in the configuration file.
    • One output file will be created for each input file.
    • The name of the output file will be the same as the corresponding input file, but the extension will be .nt.
  • A log directory will be created as target/logs in your work location directory.
    • A log file of the run will be created as target/logs/bib2lod.log
    • An existing log file will not be overwritten, but will be renamed with a timestamp, such as bib2lod-2017-03-31-14-38-47-1.log
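
The output naming rule above (same file name, .nt extension) can be expressed as a short sketch. This is an illustration under stated assumptions rather than bib2lod's own code: the class `OutputNameSketch` and the method `outputFor` are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch of the input-to-output file name mapping; names are hypothetical.
final class OutputNameSketch {

    // Maps an input file such as 102063.min.xml to <destination>/102063.min.nt.
    static Path outputFor(Path inputFile, Path destinationDir) {
        String name = inputFile.getFileName().toString();
        if (!name.endsWith(".xml")) {
            throw new IllegalArgumentException("Input files must end in .xml: " + name);
        }
        // Replace only the final .xml extension; earlier dots in the name are kept.
        String base = name.substring(0, name.length() - ".xml".length());
        return destinationDir.resolve(base + ".nt");
    }

    public static void main(String[] args) {
        Path input = Paths.get("sample-data/marcxml-to-ld4l/cornell/102063-min/102063.min.xml");
        System.out.println(outputFor(input, Paths.get("output")));
    }
}
```

For the sample record, this produces the same file name that the Quick Start inspects, 102063.min.nt in the output directory.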

Command line options

As illustrated above, the command to run the converter looks like this:

java -jar bib2lod.jar [options]

Where options are:

  • -c path, --config path

    • Specify the path to the configuration file. The path may be relative or absolute.
  • -a spec=value, --add spec=value

    • Add a value to the configuration at the specified location. For example, if the configuration file contains this section:

          "OutputService": {
            "class": "org.ld4l.bib2lod.io.FileOutputService",
            "format": "N-TRIPLES"
          },
      

      Then this command line option

          --add OutputService:destination=./output
      

      will have this effect on the relevant section:

          "OutputService": {
            "class": "org.ld4l.bib2lod.io.FileOutputService",
            "format": "N-TRIPLES",
            "destination": "./output"
          },
      

      Note that --add cannot be used to replace an existing value in the configuration file. It will merely add an additional value at the same location. To replace an existing value, use --set.

  • -d spec, --drop spec

    • Remove a value from the configuration at the specified location. For example, if the configuration file contains this section:

          "OutputService": {
            "class": "org.ld4l.bib2lod.io.FileOutputService",
            "format": "N-TRIPLES",
            "destination": "./output"
          },
      

      Then this command line option

          --drop OutputService:destination
      

      will have this effect on the relevant section:

          "OutputService": {
            "class": "org.ld4l.bib2lod.io.FileOutputService",
            "format": "N-TRIPLES"
          },
      

      If there are multiple values at the specified location, --drop will remove all of them. If there are no such values, --drop will have no effect.

  • -s spec=value, --set spec=value

    • Set a value into the configuration at the specified location, replacing any existing value. For example, if the configuration file contains this section:

          "OutputService": {
            "class": "org.ld4l.bib2lod.io.FileOutputService",
            "format": "N-TRIPLES",
            "destination": "./output"
          },
      

      Then this command line option

          --set OutputService:destination=newOutput
      

      will have this effect on the relevant section:

          "OutputService": {
            "class": "org.ld4l.bib2lod.io.FileOutputService",
            "format": "N-TRIPLES",
            "destination": "newOutput"
          },
      

      If there are multiple values at the specified location, --set will replace all of them. If there are no such values, --set behaves like --add.
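
The differences between --add, --drop, and --set can be summarized in a small model. This sketch is a simplification and not bib2lod's implementation: it stores values in a flat map keyed by the spec string instead of a JSON tree, and the class `ConfigOverrideSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the three override options; all names are hypothetical.
final class ConfigOverrideSketch {
    // Each location (for example "OutputService:destination") may hold several values.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // --add: append a value, keeping any values already at that location.
    void add(String spec, String value) {
        values.computeIfAbsent(spec, key -> new ArrayList<>()).add(value);
    }

    // --drop: remove every value at that location; no effect if there are none.
    void drop(String spec) {
        values.remove(spec);
    }

    // --set: replace every existing value; behaves like add if there are none.
    void set(String spec, String value) {
        drop(spec);
        add(spec, value);
    }

    List<String> get(String spec) {
        return Collections.unmodifiableList(
                values.getOrDefault(spec, Collections.emptyList()));
    }

    // Applies one option and its argument, e.g. ("--add", "OutputService:destination=./output").
    void apply(String option, String argument) {
        if (option.equals("-d") || option.equals("--drop")) {
            drop(argument);
            return;
        }
        int eq = argument.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("Expected spec=value but got: " + argument);
        }
        String spec = argument.substring(0, eq);
        String value = argument.substring(eq + 1);
        if (option.equals("-a") || option.equals("--add")) {
            add(spec, value);
        } else if (option.equals("-s") || option.equals("--set")) {
            set(spec, value);
        } else {
            throw new IllegalArgumentException("Unknown option: " + option);
        }
    }

    public static void main(String[] args) {
        ConfigOverrideSketch config = new ConfigOverrideSketch();
        config.apply("--add", "OutputService:destination=./output");
        config.apply("--add", "OutputService:destination=./other");
        System.out.println(config.get("OutputService:destination")); // [./output, ./other]
        config.apply("--set", "OutputService:destination=newOutput");
        System.out.println(config.get("OutputService:destination")); // [newOutput]
        config.apply("--drop", "OutputService:destination");
        System.out.println(config.get("OutputService:destination")); // []
    }
}
```

Implementing set as drop followed by add mirrors the descriptions above: existing values are all replaced, and when none exist the result is the same as --add.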

License

Copyright 2017 Cornell University

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
