Yet another attempt at creating a full-stack parsing/analysis framework. We do it for fun anyway.
Datasets
The pipeline runs in several steps:
- Obtain the URLs
- Save the articles in a folder
- Parse the articles
- Squash the results into one big JSON
- Analyze the full data set
- Create a visualization
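As a rough illustration of the "squash" step, here is a minimal sketch that merges per-article JSON files into one array. The helper name squash_results, the directory layout, and the one-JSON-file-per-article format are assumptions made for this example, not necessarily how the project stores its parser output.

```ruby
require "json"

# Hypothetical helper: collect every *.json file under `dir` (recursively),
# parse each one, and write them all out as a single JSON array.
# Returns the number of articles that were merged.
def squash_results(dir, out_path)
  out_abs = File.expand_path(out_path)

  paths = Dir.glob(File.join(dir, "**", "*.json"))
             .reject { |path| File.expand_path(path) == out_abs } # skip a previous output file
             .sort                                                 # stable, reproducible order

  articles = paths.map { |path| JSON.parse(File.read(path)) }

  File.write(out_path, JSON.generate(articles))
  articles.size
end
```

Calling squash_results("parsed", "all.json") would then leave one big file for the analysis step to load. Reading everything into memory is fine for a sketch; a multi-gigabyte data set would likely need streaming instead.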
Run make run, wait around 6 hours, and you'll have everything.
You'll also need around 7GB of disk space.
Use rake tasks to trigger fetchers and parsers.
For example: rake fetch:publika or rake parse:unimedia
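For readers unfamiliar with rake namespaces, the sketch below shows one way tasks such as fetch:publika and parse:unimedia could be defined. It is not the project's actual Rakefile: the SOURCES list, the lowercase names protv and timpul, and the run_step placeholder are assumptions for illustration.

```ruby
require "rake"

# Assumed task names, one per supported source.
SOURCES = %w[protv unimedia publika timpul].freeze

# Hypothetical placeholder for the real fetcher/parser code.
def run_step(kind, source)
  "#{kind} #{source}"
end

SOURCES.each do |source|
  namespace :fetch do
    desc "Fetch articles from #{source}"
    task source do
      puts run_step("fetch", source)
    end
  end

  namespace :parse do
    desc "Parse saved articles from #{source}"
    task source do
      puts run_step("parse", source)
    end
  end
end
```

With definitions like these in a Rakefile, rake -T lists every fetch and parse task, and each source can be run on its own.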
Currently in stock:
- ProTv
- Unimedia
- Publika
- Timpul