A simple plugin to rewrite multiple domains or subdomains to a single domain in the site's HTML output.
-
Updated
Jan 22, 2020 - PHP
A simple plugin to rewrite multiple domains or subdomains to a single domain in the site's HTML output.
Natural Language Precessing related notebooks (Machine Learning)
http url normalization for web crawlers
Normalize a URL
URL normalizer to canonicalize (standardize) the text representation of a URL to determine if differently-formatted URLs are identical
Allows you to remove ad/tracking query params from a given URL in Scala
Remove clutter from URLs and return a canonicalized version
Extract and decompose URLs (including emails, which are conceptually a part of URLs) with robust patterns.
Get a stable, canonical version of any URL, with DNS and HTTPS checks, redirects, tracker stripping, and canonical link extraction!
🔗🧹 Normalize URLs to a standardized form. HTTPS by default, flexible configuration, custom protocols, domain extraction, humazing URL, and punycode support. Both CJS & ESM modules available.
A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.
Add a description, image, and links to the url-normalization topic page so that developers can more easily learn about it.
To associate your repository with the url-normalization topic, visit your repo's landing page and select "manage topics."