This project extracts and gathers information about the behaviour and misconduct of leaders/representatives, as described by a set of predefined adjectives, by continuously scraping news websites.
We have converted the output to a JSON file.
To install the dependencies, run:
sudo pip install -r requirements.txt
I have scraped the Times of India website specifically for this purpose.
This dataset contains the details of the scraped articles. The next steps are to extract the article text, get the names mentioned in it, and then match the predefined adjectives against those names.
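The matching step could look roughly like the sketch below. It is a minimal illustration, not the project's own code: the adjective list `ADJECTIVES` and the helpers `split_sentences` and `match_adjectives` are hypothetical, and a name is linked to an adjective only when both occur in the same sentence.

```python
import re
from collections import defaultdict

# Hypothetical adjective list; the project's real predefined list is not shown here.
ADJECTIVES = {"corrupt", "negligent", "arrogant", "incompetent"}


def split_sentences(text):
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    # Note that it also breaks on initials such as "A. Sharma".
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def match_adjectives(text, names, adjectives=ADJECTIVES):
    """Map each name to the sorted adjectives found in the same sentence."""
    hits = defaultdict(set)
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        found = set(re.findall(r"[a-z]+", lowered)) & adjectives
        if not found:
            continue
        for name in names:
            if name.lower() in lowered:
                hits[name] |= found
    return {name: sorted(adjs) for name, adjs in hits.items()}


if __name__ == "__main__":
    article = ("The opposition called minister Ravi Sharma corrupt and negligent. "
               "Anil Verma inaugurated a new bridge.")
    print(match_adjectives(article, ["Ravi Sharma", "Anil Verma"]))
```

Sentence-level co-occurrence is a crude proxy: it cannot tell whether the adjective actually describes the named person, so results are best treated as leads to review rather than conclusions.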
The dataset is located at:
LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/newsTOI.sqlite
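Since the table layout of `newsTOI.sqlite` is not documented here, a quick way to inspect it is to list its tables and columns with Python's standard `sqlite3` module. This is a sketch; the helper name `describe_database` is hypothetical.

```python
import sqlite3


def describe_database(path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    # Open read-only so a mistyped path raises an error instead of
    # silently creating a new, empty database file.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [col[1] for col in cols]
        return schema
    finally:
        conn.close()
```

For example, `describe_database("LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/newsTOI.sqlite")` returns the table names and their columns, which tells you which column holds the article text.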
Scraped names of members of the US Congress:
LeaderBehaviour/getUSNames/getUSNames/spiders/getUSNames.json
Scraped names of Members of Parliament in India:
LeaderBehaviour/getIndianPolNames/getIndianPolNames/spiders/getIndianPolNames.json
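Both name files can be loaded into plain name lists for matching. The sketch below assumes they are Scrapy JSON feed exports (a list of item objects) and that each item stores the name under a `"name"` key; that key, and the helper `load_names`, are assumptions, so check the spiders for the real field.

```python
import json


def load_names(path, field="name"):
    """Load a sorted, de-duplicated list of names from a Scrapy JSON feed.

    Assumes the file is a JSON list of item dicts and that `field`
    (hypothetical default "name") holds either a string or a list of strings.
    """
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    names = set()
    for item in items:
        value = item.get(field)
        # Older spiders using .extract() often store lists instead of strings.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, str) and v.strip():
                names.add(" ".join(v.split()))  # collapse stray whitespace/newlines
    return sorted(names)
```

For example, `load_names("LeaderBehaviour/getUSNames/getUSNames/spiders/getUSNames.json")` would return the US names, provided the assumed key matches.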
* Custom request headers and a user-agent are set in Scrapy.
* To do: route requests through a proxy or integrate with Tor so the scraper's origin is harder to trace.
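One possible setup for both points is sketched below. `USER_AGENT`, `DEFAULT_REQUEST_HEADERS` and `DOWNLOADER_MIDDLEWARES` are standard Scrapy settings, and Scrapy's built-in `HttpProxyMiddleware` honours `request.meta["proxy"]`. The middleware class `TorProxyMiddleware`, its module path, the `TOR_HTTP_PROXY` setting, and the header values are hypothetical. Scrapy's proxy support handles HTTP(S) proxies rather than SOCKS, so this assumes a local HTTP proxy such as Privoxy (default port 8118) forwarding to Tor. A proxy hides the scraper's IP address but does not make it completely untraceable.

```python
# settings.py (sketch): standard Scrapy setting names, example values.
USER_AGENT = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
# Hypothetical module path. A number below 750 makes this run before
# Scrapy's built-in HttpProxyMiddleware, which then applies the proxy.
DOWNLOADER_MIDDLEWARES = {
    "leaderBehaviour.middlewares.TorProxyMiddleware": 350,
}
# Hypothetical custom setting: local HTTP proxy (e.g. Privoxy) that forwards to Tor.
TOR_HTTP_PROXY = "http://127.0.0.1:8118"


# middlewares.py (sketch)
class TorProxyMiddleware:
    """Downloader middleware that routes every request through one HTTP proxy."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # Read the custom setting, falling back to the assumed local proxy.
        return cls(crawler.settings.get("TOR_HTTP_PROXY", "http://127.0.0.1:8118"))

    def process_request(self, request, spider):
        # Keep a proxy already set on the request; otherwise use ours.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # continue normal downloading
```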
Candidate names extracted from the article text (the script, then its output):
LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/extractNamesTOI.py
LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/probable_names_extracted.json
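The method used in `extractNamesTOI.py` is not described here, so the following is only an illustrative alternative with hypothetical helper names. `candidate_names` flags runs of two to four capitalised words as possible names, and `find_known_names` counts exact, case-insensitive mentions of names from the scraped politician lists.

```python
import re
from collections import Counter

# Two to four consecutive capitalised words, e.g. "Ravi Sharma".
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+){1,3}\b")


def candidate_names(text):
    """Count capitalised word runs; noisy (it also catches e.g. "On Monday")."""
    return Counter(m.group(0) for m in CANDIDATE.finditer(text))


def find_known_names(text, known_names):
    """Count whole-word, case-insensitive mentions of each known name."""
    counts = Counter()
    for name in known_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[name] = hits
    return counts
```

The regex heuristic is a cheap way to surface names that are missing from the lists, while the list lookup is more precise but only finds people who were already scraped.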
Go to the real_shit directory, copy scrapTOI.sqlite into it, and run:
python get_neg.py