NewspaperWorks is a gem (Rails "engine") for Hyrax-based digital repository applications to support ingest, management, and display of digitized newspaper content.
NewspaperWorks is not a stand-alone application. It is designed to be integrated into a new or existing Hyrax (v2.5-v2.9.4) application, providing content models, ingest workflows, and feature-rich UX for newspaper repository use cases.
NewspaperWorks supports:
- models for Title, Issue, Page, and Article
- batch ingest via command line
- OCR and ALTO creation
- newspaper-specific metadata fields
- full-text search
- calendar-based issue browsing
- advanced search
- OCR keyword match highlighting
- viewer with page navigation and deep zooming
A complete list of features can be found here.
Documentation to help you learn more about and deploy NewspaperWorks can be found on the Project Wiki, including a PCDM model diagram, metadata schema, batch ingest instructions, and more details on installing, developing, and testing the code.
- Ruby >=2.4
- Rails ~>5.0
- Bundler
- Hyrax v2.5-v2.9.4
  - ...and various Samvera dependencies that entails.
- A Hyrax-based Rails application
- FITS
- Tesseract-ocr
- LibreOffice
- ghostscript
- poppler-utils
- ImageMagick
  - ImageMagick policy XML may need to be more permissive in both resources and source media types allowed. See template policy.xml.
- libcurl3
- libgbm1
See the wiki for more details on how to install and configure dependencies.
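The version constraints in the list above can be checked programmatically. The following is a minimal sketch, not part of NewspaperWorks itself: the `unsupported_components` helper and the shape of its input hash are hypothetical, and the Rails and Hyrax version strings are example values you would replace with those from your own application.

```ruby
require 'rubygems'

# Version constraints copied from the prerequisites list above.
REQUIREMENTS = {
  'ruby'  => Gem::Requirement.new('>= 2.4'),
  'rails' => Gem::Requirement.new('~> 5.0'),
  'hyrax' => Gem::Requirement.new('>= 2.5', '<= 2.9.4')
}.freeze

# Returns the names of components whose version is missing or falls
# outside the supported range. `versions` maps a component name to a
# version string (hypothetical input shape, chosen for this sketch).
def unsupported_components(versions)
  REQUIREMENTS.reject do |name, requirement|
    version = versions[name]
    version && requirement.satisfied_by?(Gem::Version.new(version))
  end.keys
end

installed = {
  'ruby'  => RUBY_VERSION,
  'rails' => '5.2.4', # example value
  'hyrax' => '2.9.4'  # example value
}

problems = unsupported_components(installed)
puts problems.empty? ? 'All versions supported' : "Unsupported: #{problems.join(', ')}"
```

Using `Gem::Requirement` keeps the comparison consistent with how Bundler interprets the same constraints, so `~> 5.0` accepts any Rails 5.x release and rejects 6.0 and later.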
NewspaperWorks easily integrates with your Hyrax 2.x applications.
- Add gem 'newspaper_works' to your Gemfile.
- Run bundle install
- Run rails generate newspaper_works:install
- Set config options as indicated below...
- In app/controllers/catalog_controller.rb, the config.search_builder_class is set to a new CustomSearchBuilder to support newspaper search features.
- Additional facet fields for newspaper metadata are added to app/controllers/catalog_controller.rb.
- Newspaper resource types are added to config/authorities/resource_types.yml.
(It may be helpful to run git diff after installation to see all the changes made by the installer.)
- set config.geonames_username
  - Enables geolocation tagging of content
  - how to create a Geonames username
- set config.work_requires_files = false
- set config.iiif_image_server = true
- set config.fits_path = /location/of/fits.sh
- set config.public_file_server.enabled = true
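As a rough sketch, the settings above might look like the following in a Hyrax application. The file locations are assumptions (the Hyrax options in the Hyrax initializer, the Rails option in an environment file), as are the GEONAMES_USERNAME environment variable and its fallback value; check the wiki for the authoritative placement.

```ruby
# config/initializers/hyrax.rb (assumed location for the Hyrax settings)
Hyrax.config do |config|
  # Geonames account name; reading it from ENV is an illustrative choice.
  config.geonames_username = ENV.fetch('GEONAMES_USERNAME', 'your_geonames_username')
  config.work_requires_files = false
  config.iiif_image_server = true
  # Placeholder path from the list above; point it at your FITS install.
  config.fits_path = '/location/of/fits.sh'
end

# config/environments/production.rb (assumed location for the Rails setting)
Rails.application.configure do
  config.public_file_server.enabled = true
end
```

The two blocks belong in separate files inside a Rails application and are shown together here only for brevity.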
NewspaperWorks supports a range of different ingest workflows:
- single-item ingest via the UI
- batch ingest of NDNP materials (page-level digitization) via command line
- batch ingest of PDF issues via command line
- batch ingest of TIFF or JP2 master files via command line
The ingest process creates a full complement of derivatives for each Page object, including:
- TIFF
- JP2
- OCR text
- word-coordinate JSON
For more information on derivatives, see the wiki.
Detailed information on setting up and configuring development and testing environments can be found here.
A Vagrant VM is available for users and developers to quickly and easily deploy the latest NewspaperWorks codebase using Vagrant and VirtualBox. See samvera-newspapers-vagrant for more.
Additionally, the NewspaperWorks Demo Site is available for those interested in testing out NewspaperWorks as deployed in a vanilla Hyrax application. (NOTE: The demo site may not be running the latest release of NewspaperWorks.)
If you're working on a PR for this project, create a feature branch off of main.
This repository follows the Samvera Community Code of Conduct and language recommendations. Please do not create a branch called master for this repository or as part of your pull request; the branch will either need to be removed or renamed before it can be considered for inclusion in the code base and history of this repository.
We encourage anyone who is interested in newspapers and Samvera to contribute to this project. How can I contribute?
This gem is part of a project developed in a collaboration between The University of Utah, J. Willard Marriott Library and Boston Public Library, as part of a "Newspapers in Samvera" project grant funded by the Institute for Museum and Library Services.
The development team is grateful for input, collaboration, and support we receive from the Samvera Community, related working groups, and our project's advisory board.
- Samvera Newspapers Group - The Samvera Newspapers Interest Group meets on the first Thursday of every month to discuss the Samvera newspapers project and general newspaper topics.
- Newspapers in Samvera IMLS Grant (formerly Hydra) - The official grant award for the project.
- National Digital Newspaper Program (NDNP)
Contact any contributors above by email, or ping us on the Samvera Community Slack channel(s).
This software has been developed by and is brought to you by the Samvera community. Learn more at the Samvera website.