Keywords: Australian law; Legal cases; Federal Court of Australia
This dataset contains all legal cases from the Federal Court of Australia (FCA) for the years 2006 to 2009: a textual corpus of 4000 legal cases for automatic summarisation and citation analysis, downloaded from AustLII.
For each document, the dataset provides catchphrases, citation sentences, citation catchphrases, and citation classes:
- Catchphrases are found in the document. They serve as the gold standard for the summarisation experiments.
- Citation sentences are found in later cases that cite the present case. They are used as input for summarisation.
- Citation catchphrases are the catchphrases (where available) of both later cases that cite the present case, and older cases cited by the present case.
- Citation classes are indicated in the document. They describe the type of treatment given to the cases cited by the present case.
The original data was downloaded from AustLII and is licensed under CC BY 4.0. It is also available from the UCI Machine Learning Repository (Legal Case Reports) and Kaggle (Legal Case Reports in Australia).
The cleaned dataset, created by Dale Batkhuu, can be found under `AustLII_clean`.
Under `AustLII-Legal-Case-Reports`, after unzipping `legal_case_reports.zip`, the dataset `corpus` contains:
- `readme.txt`: An overview of the dataset.
- `citations_class`: 2754 `.xml` files that contain citation class elements for each case.
- `citations_summ`: 3891 `.xml` files that contain citation elements for each case.
- `fulltext`: 3890 `.xml` files that contain full text and the catchphrases of all the cases from the FCA.
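As a quick sanity check after unzipping, the file counts listed above can be compared against what is actually on disk. The following is a minimal sketch, not part of this repository: the function names and the `AustLII-Legal-Case-Reports/corpus` path are assumptions made for illustration, and only the three folder names and their counts come from the description above.

```python
from pathlib import Path

# Expected number of .xml files per subfolder, as stated in the list above.
EXPECTED_XML_COUNTS = {
    "citations_class": 2754,
    "citations_summ": 3891,
    "fulltext": 3890,
}


def count_xml_files(corpus_dir):
    """Return {subfolder: number of .xml files} for the three corpus subfolders.

    A missing subfolder is reported as 0 rather than raising an error.
    """
    corpus = Path(corpus_dir)
    counts = {}
    for name in EXPECTED_XML_COUNTS:
        folder = corpus / name
        counts[name] = len(list(folder.glob("*.xml"))) if folder.is_dir() else 0
    return counts


def find_mismatches(counts, expected=EXPECTED_XML_COUNTS):
    """Return {subfolder: (found, expected)} for every count that differs."""
    return {
        name: (counts.get(name, 0), want)
        for name, want in expected.items()
        if counts.get(name, 0) != want
    }


if __name__ == "__main__":
    # Hypothetical location of the unzipped corpus; adjust to your checkout.
    found = count_xml_files("AustLII-Legal-Case-Reports/corpus")
    for folder_name, (have, want) in find_mismatches(found).items():
        print(f"{folder_name}: found {have} .xml files, expected {want}")
```

An empty result from `find_mismatches` means every subfolder holds the number of files described above.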
You can download the dataset directly from the UCI Machine Learning Repository or Kaggle. Alternatively, you can download it with `curl` from your terminal:
source utils/download.sh [<save_path>]
If `<save_path>` is not provided, `legal_case_reports.zip` will be downloaded to the current directory.
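Once the archive has been downloaded, it can be extracted with Python's standard `zipfile` module instead of a separate unzip tool. This is a small sketch under stated assumptions: `extract_archive` is a hypothetical helper, not something provided by this repository, and it makes no claim about the archive's internal layout beyond reporting the top-level entries it finds.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir and return the archive's top-level entry names, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Member names inside a zip always use "/" as the separator.
        top_level = {name.split("/", 1)[0] for name in archive.namelist()}
    return sorted(top_level)
```

For example, `extract_archive("legal_case_reports.zip")` would unpack the archive into the current directory, which matches the default download location described above.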
This repository is licensed under MIT.