initial commit
bacor committed Jul 12, 2020
0 parents commit 18ee317
Showing 11 changed files with 1,417 additions and 0 deletions.
11 changes: 11 additions & 0 deletions .gitignore
@@ -0,0 +1,11 @@
.vscode
tmp
__pycache__/
.DS_Store
__pycache__

dist/*/csv
scrape/

env/
*.log
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.7.6
59 changes: 59 additions & 0 deletions README.md
@@ -0,0 +1,59 @@
CantusCorpus
============

The CantusCorpus is a corpus of plainchant intended specifically for
computational research. It is essentially a research-friendly dump of
the [Cantus database](http://cantus.uwaterloo.ca/). The database was scraped
using its API and converted to easy-to-use CSV files. For many chants,
transcriptions in the Volpiano format are included. These can be loaded into
[music21](https://web.mit.edu/music21/), a Python toolkit for computational
musicology, using the library `chant21`.

*Note: Even the latest version of the corpus will generally be outdated, as*
*the Cantus database is updated continuously. CantusCorpus is intended only for*
*computational studies, where this is less of a problem. If you require*
*up-to-date information, please do not use this corpus, but use Cantus directly.*

Usage
-----

TODO: example of using the corpus with chant21
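
Until the chant21 example is written, here is a minimal sketch that uses only
the Python standard library. It assumes a corpus CSV file with a header row and
a column named `volpiano` holding the transcription. The column name, the
helper `load_chants_with_volpiano`, and the sample rows are hypothetical and
are not taken from the corpus documentation.

```python
import csv
import io


def load_chants_with_volpiano(csv_file, volpiano_column="volpiano"):
    """Return the rows of an open CSV file whose Volpiano field is non-empty.

    `volpiano_column` is an assumed column name; adjust it to the actual header.
    """
    reader = csv.DictReader(csv_file)
    return [row for row in reader if (row.get(volpiano_column) or "").strip()]


# Tiny in-memory stand-in for a corpus CSV file (made-up rows, not real data).
sample = io.StringIO(
    "id,incipit,volpiano\n"
    "chant_1,Example incipit,1---f--g--h---\n"
    "chant_2,Another incipit,\n"
)

chants = load_chants_with_volpiano(sample)
print(len(chants), "chant(s) with a transcription")
```

With a downloaded release, the same function can be given a real file opened
with `open(path, newline="", encoding="utf-8")`, where `path` points to one of
the corpus `.csv` files.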

Citation
--------

If you use this corpus in your research, please cite the Cantus Database
as [suggested on its website](http://cantus.uwaterloo.ca/citations):

Cantus: A Database for Latin Ecclesiastical Chant -- Inventories of Chant
Sources. Directed by Debra Lacoste (2011-), Terence Bailey (1997-2010), and
Ruth Steiner (1987-1996). Web developer, Jan Koláček (2011-). Available
from <http://cantus.uwaterloo.ca/>. [date accessed].

Further, please cite the paper describing the CantusCorpus:

todo

Versions
--------

As Cantus is updated continuously, we plan to occasionally release new
versions of the CantusCorpus as well. Each release will be explicitly versioned,
downloadable from GitHub, and citable via a Zenodo DOI.

Licence
-------

The CantusCorpus (the collection of `.csv` files) is released under a
[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license,
just like the Cantus Database itself. The Python code used to generate the
corpus (the code in the `src/` directory) is released under an MIT license.

Generating the corpus
---------------------

The CantusCorpus is created automatically after scraping the Cantus API.
If you just want to use the corpus, you don't have to regenerate it yourself:
simply download one of the releases and you're good to go.
But if you want to generate the corpus yourself, you can of course do so.
All code and some instructions are available in the `src` directory.
241 changes: 241 additions & 0 deletions dist/cantuscorpus-v0.1/README.md

Large diffs are not rendered by default.

Binary file added dist/cantuscorpus-v0.1/cantuscorpus-v0.1.zip
Binary file not shown.
2 changes: 2 additions & 0 deletions src/changelog.csv
@@ -0,0 +1,2 @@
version,changes
0.1,Test