
wikipedia-crawler

Here are 12 public repositories matching this topic...

Wikipedia web crawler written in Python and Scrapy. The ETL process extracts specific data from multiple Wikipedia pages and links with Scrapy, organizes it into a structured format using Scrapy items, and saves the result as JSON for further analysis and integration into MySQL Workbench.

  • Updated Sep 12, 2023
  • Python
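The extract-transform-load flow described above can be sketched with the standard library alone; this is a simplified stand-in (the repository itself uses Scrapy spiders and Scrapy items, and the `HeadingExtractor`/`extract_records` names here are hypothetical):

```python
import json
from html.parser import HTMLParser

class HeadingExtractor(HTMLParser):
    """Collects the text of <h2> headings -- a stand-in for the
    page-specific fields a Scrapy spider would extract."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.headings.append(data.strip())

def extract_records(html, url):
    """Extract and transform: parse one page into a structured record."""
    parser = HeadingExtractor()
    parser.feed(html)
    return {"url": url, "sections": parser.headings}

# Load: serialize to JSON for later import into a database such as MySQL.
page = "<html><body><h2>History</h2><p>...</p><h2>Usage</h2></body></html>"
record = extract_records(page, "https://en.wikipedia.org/wiki/Example")
print(json.dumps(record))
```

In the real project each record would be a Scrapy item populated in a spider's `parse` callback, with the JSON written by a feed export rather than a manual `json.dumps`.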

A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from one of the seed Wikipedia pages and uses Elasticsearch as the full-text search engine.

  • Updated Jan 13, 2023
  • JavaScript
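Ranked keyword retrieval of the kind this repo delegates to Elasticsearch can be illustrated with a toy TF-IDF scorer; this sketch only shows the idea of scoring and ranking, and the `content` field name in the sample query body is an assumption, not taken from the project:

```python
import math
from collections import Counter

def rank(query, docs):
    """Rank documents by a simple TF-IDF score for the query terms --
    a toy stand-in for the scoring Elasticsearch performs internally."""
    terms = query.lower().split()
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for t in terms:
            df = sum(1 for d in tokenized if t in d)  # document frequency
            if df:
                score += tf[t] * math.log(1 + n / df)
        scores.append((score, i))
    scores.sort(reverse=True)
    return [i for s, i in scores if s > 0]

docs = [
    "wikipedia is a free online encyclopedia",
    "a crawler downloads web pages",
    "elasticsearch indexes wikipedia pages for search",
]
print(rank("wikipedia search", docs))  # → [2, 0]

# Against a real cluster, the equivalent full-text query body would be:
es_query = {"query": {"match": {"content": "wikipedia search"}}}
```

Elasticsearch uses BM25 rather than this raw TF-IDF variant, but the shape is the same: per-term weights summed per document, then sorted descending.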
