Document Search Engine Tool
Python wrapper for the MediaWiki API to access and parse data from Wikipedia
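Under the hood, such a wrapper just issues HTTP requests against the public MediaWiki Action API. A minimal sketch of fetching an article's plain-text intro (this is the public API's endpoint and parameters, not the wrapper's own interface):

    import requests

    API_URL = "https://en.wikipedia.org/w/api.php"  # public MediaWiki Action API

    def fetch_summary(title: str) -> str:
        """Fetch the plain-text introduction of a Wikipedia article."""
        params = {
            "action": "query",
            "format": "json",
            "prop": "extracts",      # TextExtracts extension, enabled on Wikipedia
            "exintro": True,         # intro section only
            "explaintext": True,     # strip wiki markup
            "titles": title,
        }
        data = requests.get(API_URL, params=params, timeout=10).json()
        page = next(iter(data["query"]["pages"].values()))
        return page.get("extract", "")

    print(fetch_summary("Python (programming language)")[:200])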
A crawler for Wikipedia (currently English-language pages only)
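A minimal sketch of such a crawler, assuming requests and BeautifulSoup (names and limits are illustrative, not this repo's code): a breadth-first traversal that only follows /wiki/ article links on en.wikipedia.org.

    import collections
    import urllib.parse

    import requests
    from bs4 import BeautifulSoup

    def crawl(seed: str, limit: int = 50) -> set:
        """Breadth-first crawl restricted to English Wikipedia article pages."""
        queue, seen = collections.deque([seed]), {seed}
        while queue and len(seen) < limit:
            html = requests.get(queue.popleft(), timeout=10).text
            for a in BeautifulSoup(html, "html.parser").select("a[href^='/wiki/']"):
                href = a["href"]
                if ":" in href:   # skip Special:, File:, Talk: and similar pages
                    continue
                url = urllib.parse.urljoin("https://en.wikipedia.org", href)
                if url not in seen:
                    seen.add(url)
                    queue.append(url)
        return seen

    print(len(crawl("https://en.wikipedia.org/wiki/Web_crawler")))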
A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from one of the seed Wiki pages and uses Elasticsearch as its full-text search engine.
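The Elasticsearch side reduces to indexing each scraped page as a document and running ranked match queries. A hedged sketch, assuming a local cluster and the 8.x Python client (index and field names are illustrative):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")   # assumed local cluster

    # Index one scraped page as a document.
    es.index(index="wiki", document={
        "title": "Information retrieval",
        "body": "Information retrieval is the activity of obtaining ...",
    })

    # Keyword query; results are ranked by BM25 relevance by default.
    resp = es.search(index="wiki", query={"match": {"body": "ranked retrieval"}})
    for hit in resp["hits"]["hits"]:
        print(hit["_score"], hit["_source"]["title"])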
Custom implementation of Chord DHT and analysis of its operations
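Chord's core rule: keys and nodes hash onto the same identifier ring, and a key is owned by its successor, the first node clockwise from the key's id. A toy sketch of that rule (real Chord uses finger tables for O(log N) lookups; the linear scan here is only for clarity):

    import hashlib

    M = 8                                  # bits in the identifier space

    def chord_id(name: str) -> int:
        """Hash a key or node name onto the 2**M ring."""
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

    def successor(node_ids, key_id: int) -> int:
        """The first node clockwise from key_id owns the key."""
        for nid in sorted(node_ids):
            if nid >= key_id:
                return nid
        return min(node_ids)               # wrap around the ring

    nodes = [chord_id(f"node-{i}") for i in range(5)]
    print(successor(nodes, chord_id("some-key")))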
The IR-system part of the Innopolis IR 2016 course semester project.
A Wikipedia web crawler written in Python with Scrapy. The ETL process extracts specific data from multiple Wikipedia pages and links using Scrapy, organizes it into a structured format with Scrapy items, and saves the extracted data as JSON for further analysis and integration into MySQL Workbench.
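A minimal sketch of that pipeline's extract step, assuming Scrapy (the spider, item, and selectors are illustrative, not this repo's):

    import scrapy

    class PageItem(scrapy.Item):
        """Structured container for the extracted fields."""
        title = scrapy.Field()
        url = scrapy.Field()

    class WikiSpider(scrapy.Spider):
        name = "wiki"
        start_urls = ["https://en.wikipedia.org/wiki/Web_crawler"]

        def parse(self, response):
            # Organize raw HTML into a structured item.
            yield PageItem(
                title=response.css("h1#firstHeading ::text").get(),
                url=response.url,
            )

    # Saving to JSON is handled by Scrapy's feed exports:
    #   scrapy runspider spider.py -o pages.json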
A Python web crawler that tests the theory that repeatedly clicking the first link on a Wikipedia page eventually leads, from ~97% of pages, to the page for knowledge 📡
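A sketch of the first-link walk, assuming requests and BeautifulSoup (the usual formulation also skips links in parentheses and italics; this simplification just takes the first article link in the body text):

    import requests
    from bs4 import BeautifulSoup

    def first_link_walk(title: str, steps: int = 30) -> list:
        """Follow the first in-article link until a loop or the step limit."""
        url, seen = "https://en.wikipedia.org/wiki/" + title, []
        for _ in range(steps):
            if url in seen:
                break                      # loop detected, stop
            seen.append(url)
            soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
            body = soup.select_one("div#mw-content-text")
            link = next((a for p in body.select("p")
                         for a in p.select("a[href^='/wiki/']")
                         if ":" not in a["href"]), None)
            if link is None:
                break
            url = "https://en.wikipedia.org" + link["href"]
        return seen

    print(first_link_walk("Epistemology")[-1])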
A program that maps out the shortest path between two Wikipedia pages.
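Since every link has equal cost, breadth-first search over the page-link graph yields a minimum-hop path. A sketch using the MediaWiki links API (one result batch per page, ignoring continuation, purely for brevity):

    import collections
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def links(title: str) -> list:
        """Titles linked from an article (first API batch only)."""
        params = {"action": "query", "format": "json",
                  "prop": "links", "pllimit": "max", "titles": title}
        data = requests.get(API, params=params, timeout=10).json()
        page = next(iter(data["query"]["pages"].values()))
        return [l["title"] for l in page.get("links", [])]

    def shortest_path(src: str, dst: str):
        """BFS over page links; the first time dst is reached is optimal."""
        parents, queue = {src: None}, collections.deque([src])
        while queue:
            cur = queue.popleft()
            if cur == dst:
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = parents[cur]
                return path[::-1]
            for nxt in links(cur):
                if nxt not in parents:
                    parents[nxt] = cur
                    queue.append(nxt)
        return None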
A Wikipedia crawler that finds the worst-translated page reachable from an English starting page by following hypertext links.
Web scraping is a data-scraping technique used to extract data from websites.
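In practice that means turning unstructured HTML into structured records. A tiny illustration with BeautifulSoup on an inline page (a stand-in for a fetched one):

    from bs4 import BeautifulSoup

    html = """<html><body>
      <h1>Example Product</h1><span class="price">$9.99</span>
    </body></html>"""                     # stand-in for a downloaded page

    soup = BeautifulSoup(html, "html.parser")
    record = {                            # unstructured HTML -> structured data
        "name": soup.h1.get_text(strip=True),
        "price": soup.select_one("span.price").get_text(strip=True),
    }
    print(record)   # {'name': 'Example Product', 'price': '$9.99'}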
[READ-ONLY] A word extractor for Wikipedia articles.
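Word extraction from article text is essentially tokenization plus counting; a minimal sketch (the regex and interface are illustrative, not this project's):

    import re
    from collections import Counter

    def extract_words(text: str) -> Counter:
        """Lowercased word tokens and their frequencies."""
        return Counter(re.findall(r"[a-z']+", text.lower()))

    print(extract_words("Wikipedia is a free online encyclopedia.").most_common(3))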