Natural Language Processing related notebooks (Machine Learning)
Updated Aug 20, 2021 - Jupyter Notebook
A simple plugin to rewrite multiple domains or subdomains to a single domain in the site's HTML output.
HTTP URL normalization for web crawlers
A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.
URL normalizer to canonicalize (standardize) the text representation of a URL to determine if differently-formatted URLs are identical
Allows you to remove ad/tracking query params from a given URL in Scala
🔗🧹 Normalize URLs to a standardized form. HTTPS by default, flexible configuration, custom protocols, domain extraction, URL humanization, and punycode support. Both CJS & ESM modules available.
Get a stable, canonical version of any URL, with DNS and HTTPS checks, redirects, tracker stripping, and canonical link extraction!
Remove clutter from URLs and return a canonicalized version
Extract and decompose URLs (including emails, which are conceptually a part of URLs) with robust patterns.
Normalize a URL
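Several of the projects above describe the same core steps: lowercasing the scheme and host, dropping default ports and fragments, and stripping tracking parameters so that differently formatted URLs compare equal. A minimal sketch of those steps using only Python's standard `urllib.parse` might look like the following. It is not taken from any of the listed repositories; the function name `normalize_url`, the tracking-parameter list, and the choice to sort query parameters and discard userinfo are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical, deliberately short list; real tools maintain much longer ones.
TRACKING_PARAMS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical text form of an absolute http(s) URL (illustrative only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()

    # SplitResult.hostname is already lowercased and has IPv6 brackets removed.
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # restore brackets for IPv6 literals

    # Keep the port only when it differs from the scheme's default.
    # Assumption: userinfo (user:password@) is discarded.
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"

    # An empty path and "/" refer to the same resource for http(s).
    path = parts.path or "/"

    # Drop tracking parameters, then sort the rest so ordering does not matter.
    # Assumption: the target server does not depend on parameter order.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    query = urlencode(sorted(pairs))

    # The fragment is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, query, ""))


if __name__ == "__main__":
    a = normalize_url("HTTP://Example.COM:80/a?utm_source=x&b=2&a=1#frag")
    b = normalize_url("http://example.com/a?a=1&b=2")
    print(a)       # http://example.com/a?a=1&b=2
    print(a == b)  # True
```

Sorting query parameters and removing the fragment are aggressive choices: they suit deduplication in a crawler, but a tool that must preserve exact request semantics would likely make them configurable.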