Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Updated Nov 12, 2024 (Python)
Extract the main article from a given URL with Node.js
Readability / HTML Content / Article Extractor & Web Scraping library written in PHP
SmartReader is a library to extract the main content of a web page, based on a port of the Readability library by Mozilla
Parse a Markdown article, download its images, and replace the image URLs with local paths
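A minimal sketch of the workflow described in the entry above (parse a Markdown article, download its images, rewrite the image URLs to local paths), using only the Python standard library. It is not that project's code. It assumes inline images of the form `![alt](url)` or `![alt](url "title")` only (reference-style images are not handled), and the function name `localize_images`, the hash-based file naming, and the injectable `fetch` parameter are hypothetical choices for illustration.

```python
import hashlib
import re
import urllib.request
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import urlparse

# Sketch only: matches inline Markdown images, ![alt](url) or ![alt](url "title").
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\(\s*(\S+?)(\s+"[^"]*")?\s*\)')


def download(url: str) -> bytes:
    """Default fetcher: a plain HTTP(S) GET with the standard library."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def localize_images(markdown: str, dest_dir: str = "images",
                    fetch: Optional[Callable[[str], bytes]] = None) -> str:
    """Download remote inline images and point their links at the local copies."""
    fetch = fetch or download
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = {}  # url -> local link, so a repeated image is fetched only once

    def replace(match: re.Match) -> str:
        alt, url, title = match.group(1), match.group(2), match.group(3) or ""
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            return match.group(0)  # already local or unsupported: keep as-is
        if url not in saved:
            ext = Path(parsed.path).suffix or ".img"
            # Hypothetical naming scheme: a short hash of the URL keeps names unique.
            name = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12] + ext
            (out_dir / name).write_bytes(fetch(url))
            saved[url] = (out_dir / name).as_posix()
        return f"![{alt}]({saved[url]}{title})"

    return IMAGE_RE.sub(replace, markdown)
```

Passing a custom `fetch` callable lets the URL rewriting run offline (for example in tests), while the default uses `urllib.request`. Keeping a `saved` map means an image referenced several times is downloaded once and every reference points at the same local file.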
Reddit bot to preview and post hyperlinks as comments
NLP Web Service
Laravel wrapper for common NLP tasks
The best HTML-to-Markdown library: an ESM-native package of useful utilities that is simple, lightweight, and of epic quality.
Extract an article or news story from a URL or HTML, parse the title and content, and output it in Markdown format.
Involution King Fun Book (IKFB, Chinese: 快卷, 卷王快乐本) is an integrated management system for papers and literature. Powered by Electron.
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.
[Spring Boot Hands-On Development] Quickly build your own technical article blog in 10 minutes
This is a small and easy-to-use desktop application that allows exporting Web of Science API Expanded and InCites API data in Excel/CSV/JSON/XML with a configurable and flexible data export structure.
A web page content extractor
📚 A collection of useful Natural Language Processing tools: detecting the language of a text, splitting text into sentences, and extracting the main content from an HTML document
A client-side post search tool for DCInside (디시인사이드).
Extract the main body text from HTML, intended for news web pages
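The main-content extraction that the previous entry (pulling the body text out of news-page HTML) and several others in this list describe can be approximated with a simple paragraph-density heuristic. The sketch below is a simplified stand-in, not Readability's scoring algorithm and not the method of any project listed here: it assumes the article body is the element whose direct `<p>` children hold the most text, and the names `extract_main_text` and `ParagraphCollector` and the set of skipped tags are hypothetical choices.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Assumption: text inside these containers is page chrome, never article body.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphCollector(HTMLParser):
    """Record the text of each <p> together with the id of its parent element."""

    def __init__(self):
        super().__init__()
        self.stack = []       # open elements as (tag, node_id)
        self.next_id = 0
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self.current = None   # (p_node_id, parent_id, text_parts) for the open <p>
        self.paragraphs = []  # finished (parent_id, text) pairs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "p" and any(t == "p" for t, _ in self.stack):
            self.handle_endtag("p")  # a new <p> implicitly closes an open one
        node_id = self.next_id
        self.next_id += 1
        if tag == "p" and self.skip_depth == 0:
            parent_id = self.stack[-1][1] if self.stack else None
            self.current = (node_id, parent_id, [])
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        self.stack.append((tag, node_id))

    def handle_endtag(self, tag):
        if not any(t == tag for t, _ in self.stack):
            return  # stray closing tag: ignore it
        # Pop until the matching element closes (tolerates unclosed children).
        while self.stack:
            open_tag, node_id = self.stack.pop()
            if open_tag in SKIP_TAGS:
                self.skip_depth -= 1
            if self.current is not None and node_id == self.current[0]:
                self.finish_paragraph()
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.current is not None and self.skip_depth == 0:
            self.current[2].append(data)

    def finish_paragraph(self):
        if self.current is not None:
            _, parent_id, parts = self.current
            text = " ".join("".join(parts).split())  # collapse whitespace
            if text:
                self.paragraphs.append((parent_id, text))
            self.current = None


def extract_main_text(html: str) -> str:
    """Return the paragraphs of the element that holds the most <p> text."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    collector.finish_paragraph()  # flush a <p> left open at the end of input
    scores = defaultdict(int)
    for parent_id, text in collector.paragraphs:
        scores[parent_id] += len(text)
    if not scores:
        return ""
    best = max(scores, key=scores.get)
    return "\n\n".join(t for pid, t in collector.paragraphs if pid == best)
```

Scoring parents rather than individual paragraphs keeps a long article together even when each paragraph is short, while navigation, sidebars, and footers lose because their text is either skipped or spread over small containers. Production extractors add many more signals (link density, class and id hints, boilerplate lists), so this sketch will misfire on pages whose body is not built from `<p>` elements.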
Scrapes article content from the web, given either a single URL or a text file containing a list of URLs. The project uses the newspaper3k and python-docx libraries and outputs the article contents as a neatly formatted Word document (.docx).