
wikipedia-crawler

Here are 12 public repositories matching this topic...

Wikipedia web crawler written in Python and Scrapy. The ETL process extracts specific data from multiple Wikipedia pages and links with Scrapy, organizes it into a structured format using Scrapy items, and saves the result as JSON for further analysis and integration into MySQL Workbench.

  • Updated Sep 12, 2023
  • Python
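The extract-transform-load flow described above can be sketched with the standard library alone; this is a simplified stand-in (the repository itself uses Scrapy spiders and Scrapy items, and the `HeadingExtractor`/`extract_records` names here are hypothetical):

```python
import json
from html.parser import HTMLParser

class HeadingExtractor(HTMLParser):
    """Collects the text of <h2> headings -- a stand-in for the
    page-specific fields a Scrapy spider would extract."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.headings.append(data.strip())

def extract_records(html, url):
    """Extract and transform: parse one page into a structured record."""
    parser = HeadingExtractor()
    parser.feed(html)
    return {"url": url, "sections": parser.headings}

# Load: serialize to JSON for later import into a database such as MySQL.
page = "<html><body><h2>History</h2><p>...</p><h2>Usage</h2></body></html>"
record = extract_records(page, "https://en.wikipedia.org/wiki/Example")
print(json.dumps(record))
```

In the real project each record would be a Scrapy item populated in a spider's `parse` callback, with the JSON written by a feed export rather than a manual `json.dumps`.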

A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from one of the seed Wikipedia pages and uses Elasticsearch as the full-text search engine.

  • Updated Jan 13, 2023
  • JavaScript
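Ranked keyword retrieval of the kind this repo delegates to Elasticsearch can be illustrated with a toy TF-IDF scorer; this sketch only shows the idea of scoring and ranking, and the `content` field name in the sample query body is an assumption, not taken from the project:

```python
import math
from collections import Counter

def rank(query, docs):
    """Rank documents by a simple TF-IDF score for the query terms --
    a toy stand-in for the scoring Elasticsearch performs internally."""
    terms = query.lower().split()
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for t in terms:
            df = sum(1 for d in tokenized if t in d)  # document frequency
            if df:
                score += tf[t] * math.log(1 + n / df)
        scores.append((score, i))
    scores.sort(reverse=True)
    return [i for s, i in scores if s > 0]

docs = [
    "wikipedia is a free online encyclopedia",
    "a crawler downloads web pages",
    "elasticsearch indexes wikipedia pages for search",
]
print(rank("wikipedia search", docs))  # → [2, 0]

# Against a real cluster, the equivalent full-text query body would be:
es_query = {"query": {"match": {"content": "wikipedia search"}}}
```

Elasticsearch uses BM25 rather than this raw TF-IDF variant, but the shape is the same: per-term weights summed per document, then sorted descending.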
