- This project started its life as a Java project and was converted to Scala
- Crawling all documents from the 4 courts of appeal, downloading them, and converting them from HTML to TXT (a conversion sketch follows this list)
- Parsing the documents to produce summary statistics
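The converter itself lives in this repo; purely as an illustration of the HTML-to-TXT step, here is a minimal sketch that assumes jsoup and Scala 2.13 (both assumptions; the README does not name the parser the project actually uses):

```scala
import org.jsoup.Jsoup
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._

// Minimal sketch: strip markup from downloaded court decisions and write plain text.
// Assumes jsoup on the classpath and Scala 2.13; the real converter may differ.
object HtmlToTxtSketch {
  def toPlainText(html: String): String = Jsoup.parse(html).text()

  def convertDir(inDir: String, outDir: String): Unit = {
    Files.createDirectories(Paths.get(outDir))
    Files.list(Paths.get(inDir)).iterator().asScala
      .filter(_.toString.endsWith(".html"))
      .foreach { p =>
        val txt  = toPlainText(new String(Files.readAllBytes(p), "UTF-8"))
        val name = p.getFileName.toString.stripSuffix(".html") + ".txt"
        Files.write(Paths.get(outDir, name), txt.getBytes("UTF-8"))
      }
  }
}
```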
To run, start sbt. Once inside the sbt shell, do this:
sbt> run <your parameters>
When prompted on the 'run' choice, choose '1' for Application
Parameter examples:
For all data:
-i ../Greg_Data/NY_Court_All -o ../Greg_Data/NY_Court_All_Parsed
For a small test:
-i test-data/ny-appeals -o output
The launcher will ask you which application to run and pass your parameters on to it
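The actual launcher is part of this project; the following is only a minimal sketch, under the assumption that the -i / -o flags shown above map to an input and an output directory:

```scala
// Minimal sketch of how the -i / -o flags above could be consumed.
// The real launcher in this repo may parse them differently.
object LauncherSketch {
  case class Config(input: String = "", output: String = "")

  @annotation.tailrec
  def parseArgs(args: List[String], cfg: Config = Config()): Config = args match {
    case "-i" :: in  :: rest => parseArgs(rest, cfg.copy(input = in))
    case "-o" :: out :: rest => parseArgs(rest, cfg.copy(output = out))
    case Nil                 => cfg
    case other :: _          => sys.error(s"Unknown argument: $other")
  }

  def main(args: Array[String]): Unit = {
    val cfg = parseArgs(args.toList)
    println(s"input = ${cfg.input}, output = ${cfg.output}")
  }
}
```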
Please note that all software documentation is found in the 'doc' folder in this project
All input data:
../Greg_Data/NY_Court_All
Location on Google Drive:
https://drive.google.com/drive/u/1/folders/0B2J7VZ46YxKBR0RTc2FGWjlMOHM
The paths below assume you have downloaded the data next to your project code (a quick counting sketch follows the list):
- ../Greg_Data/NY_Court_2017-09-10/courtdoc_2017.zip contains 12371 text files
- ../Greg_Data/NY_Court_2017-09-10/courtdoc2015_2016.zip contains 36135 text files
- ../Greg_Data/NY_Court_2003-2015/txt contains 111019 text files
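If you want to sanity-check the figures above, a quick count along these lines works (the zip archives must be extracted first; the paths are the ones this README assumes):

```scala
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._

// Quick sanity check of the file counts listed above.
// Extract the zip archives first, then pass the directories as arguments.
object CountTxtFiles {
  def countTxt(dir: String): Int =
    Files.walk(Paths.get(dir)).iterator().asScala.count(_.toString.endsWith(".txt"))

  def main(args: Array[String]): Unit =
    args.foreach(d => println(s"$d -> ${countTxt(d)} .txt files"))
}
```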
(The first two archives are from Google Drive; the third one is from S3.)
S3:
About 100K appeal documents scraped from the NY State Court of Appeals are found in S3 here
The (hopefully) latest results of processing, extracted with the CourtDoc regexes, are here
(Kept as an archive; probably to be removed)
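The CourtDoc regexes themselves are defined in this repo's source; purely as an illustration of the kind of extraction involved, here is a hypothetical pattern for a New York slip opinion citation (the real patterns are different):

```scala
import scala.util.matching.Regex

// Hypothetical illustration only: the real CourtDoc regexes live in this repo's source.
object RegexSketch {
  // Assumed citation shape, e.g. "2015 NY Slip Op 01234".
  val slipOp: Regex = raw"(\d{4}) NY Slip Op (\d{5})".r

  def main(args: Array[String]): Unit = {
    val sample = "Decided on March 3, 2015.  2015 NY Slip Op 01234  Appellate Division."
    slipOp.findFirstMatchIn(sample).foreach(m => println(s"year=${m.group(1)}, opinion=${m.group(2)}"))
  }
}
```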
For packaging:
Use the package plugin. [Export the IntelliJ configuration for sharing by saving it in .idea in git.]
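The README does not name the packaging plugin; if it refers to sbt-assembly (an assumption, not confirmed here), the setup would look roughly like this:

```scala
// project/plugins.sbt -- only if the "package plugin" above means sbt-assembly (not confirmed here)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")
```

With that plugin, `sbt assembly` builds a single runnable jar; the built-in `sbt package` task works without any plugin but does not bundle dependencies.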