southern-cross-ai/AustLII-Legal-Case-Report

AustLII Legal Case Report

Overview

Keywords: Australian law; Legal cases; Federal court of Australia

This dataset contains all legal cases from the Federal Court of Australia (FCA) from 2006 to 2009. It is a textual corpus of approximately 4000 legal cases, downloaded from AustLII, intended for automatic summarisation and citation analysis.

For each document, the dataset provides catchphrases, citation sentences, citation catchphrases, and citation classes:

  • Catchphrases are found in the document. The dataset uses the catchphrases as the gold standard for the summarisation experiments.
  • Citation sentences are found in later cases that cite the present case. The dataset uses citation sentences for summarisation.
  • Citation catchphrases are the catchphrases (where available) of both later cases that cite the present case, and older cases cited by the present case.
  • Citation classes are indicated in the document. They indicate the type of treatment given to the cases cited by the present case.
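As a rough illustration of how these four kinds of information relate to a single case, the sketch below groups them into one record. The class name `CaseRecord`, its field names, and the `gold_summary` helper are hypothetical and chosen only for this example; they are not part of the dataset or of this repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CaseRecord:
    """Hypothetical container for the per-case information described above."""

    case_id: str
    # Catchphrases found in the document itself (gold standard for summarisation).
    catchphrases: list[str] = field(default_factory=list)
    # Sentences from later cases that cite this case.
    citation_sentences: list[str] = field(default_factory=list)
    # Catchphrases of later citing cases and of older cited cases, where available.
    citation_catchphrases: list[str] = field(default_factory=list)
    # Maps a cited case to the type of treatment it received (assumed layout).
    citation_classes: dict[str, str] = field(default_factory=dict)

    def gold_summary(self) -> str:
        """Join the document's own catchphrases into one reference summary string."""
        return "; ".join(self.catchphrases)
```

A record built this way keeps the gold-standard catchphrases separate from the citation-derived text, which makes it harder to mix the two up in a summarisation experiment.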

Data Source

The original data was downloaded from AustLII and is licensed under CC BY 4.0. It is also available from UCI Machine Learning Repository - Legal Case Reports and Kaggle - Legal Case Reports in Australia.

The cleaned dataset, created by Dale Batkhuu, can be found under AustLII_clean.

Dataset Structure

Under AustLII-Legal-Case-Reports, after unzipping legal_case_reports.zip, the dataset corpus contains:

  • readme.txt: An overview of the dataset.
  • citations_class: 2754 .xml files that contain citation class elements for each case.
  • citations_summ: 3891 .xml files that contain citation elements for each case.
  • fulltext: 3890 .xml files that contain full text and the catchphrases of all the cases from the FCA.
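One way to check an unzipped copy against the file counts listed above is sketched below. It assumes only that the archive contains folders with the three names given in the list; the function name `count_xml_files` and the `extract_to` argument are hypothetical, and the folders are searched at any depth because the exact nesting inside the archive is not stated here.

```python
import zipfile
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_FOLDERS = ("citations_class", "citations_summ", "fulltext")


def count_xml_files(zip_path, extract_to):
    """Extract the archive and count the .xml files in each expected folder."""
    extract_to = Path(extract_to)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_to)

    counts = {}
    for folder in EXPECTED_FOLDERS:
        # Search at any depth, since the top-level layout is an assumption.
        matches = [p for p in extract_to.rglob(folder) if p.is_dir()]
        counts[folder] = sum(1 for m in matches for _ in m.glob("*.xml"))
    return counts
```

Calling `count_xml_files("legal_case_reports.zip", "data")` returns a dictionary keyed by folder name, which can then be compared with the counts stated above.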

Download

You can download the dataset directly from UCI Machine Learning Repository or Kaggle. Alternatively, you can download it with curl from your terminal:

source utils/download.sh [<save_path>]

If <save_path> is not provided, legal_case_reports.zip will be downloaded to the current directory.
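The default described above can be mirrored in Python when locating the archive afterwards. This is a minimal sketch, not the logic of utils/download.sh itself: the helper name `resolve_archive_path` is hypothetical, and it assumes that `<save_path>` names a directory rather than a file.

```python
from pathlib import Path

ARCHIVE_NAME = "legal_case_reports.zip"


def resolve_archive_path(save_path=None):
    """Return where the archive is expected to be.

    Assumes save_path is a directory; falls back to the current
    directory when it is not given, matching the default described above.
    """
    base = Path(save_path) if save_path else Path.cwd()
    return base / ARCHIVE_NAME
```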

License

This repository is licensed under the MIT License.