Skip to content

Latest commit

 

History

History
157 lines (108 loc) · 5.85 KB

readme.md

File metadata and controls

157 lines (108 loc) · 5.85 KB

@tuplo/fletcher

HTTP request library, focused on web scraping.

Install

$ npm install @tuplo/fletcher

# or with yarn
$ yarn add @tuplo/fletcher

Usage

Fetch a HTML page and parse it using cheerio.

import fetch from '@tuplo/fletcher';

const $page = await fetch.html('https://foo.com/page.html');
const heading = $page.find('body > h1');

Fetch a JSON file and parse it.

const { foo } = await fetch.json('https://foo.com/page.html');

Find a script on a page and evaluate it.

const { foo } = await fetch.script('https://foo.com/page.html', {
  scriptPath: 'script:nth-of-type(3)',
});

Find JSON-LD metadata on a page.

const [jsonld] = await fetch.jsonld('https://foo.com/page.html');

Work with the raw Response.

const res = await fetch.response('https://foo.com');
console.log(res.headers);
console.log(await res.text());

Work with Puppeteer for headless browser automation.

const client = fetch.create({
  browserWSEndpoint: 'ws://localhost:3000',
});
const $page = await client.browser.html('https://foo.com');
const { foo } = await client.browser.json('https://foo.com', /ajax-list/);

Options

Option Description Default
browserWSEndpoint Puppeteer web socket address
cache Caches requests in memory false
delay Introduce a delay before the request (ms) 1_000
formData Object with key/value pairs to send as form data
encoding The encoding used by the source page, will be converted to UTF8
headers A simple multi-map of names to values
jsonData Object with key/value pairs to send as json data
log Should log all request URLS to stderr false
onAfterRequest Callback to be called right after request is resolved
proxy Proxy configuration (host, port, username, password)
retry Retries failed responses async-retry
scriptFindFn A function to find a script element on the page, execute and return it
scriptPath A CSS selector to pick a script element on the page, execute and return it
scriptSandbox An object to use as base on an execution of a piece of code found on the page
urlSearchParams A key-value object listing what parameters to add to the query string of url
userAgent Set a custom user agent
validateStatus A function to decide if the response status is an error and should be thrown

API

fletcher(url: string, options?: FletcherOptions) => http.Response

Generic utility to return a HTTP Response

fletcher.html(url: string, options?: FletcherOptions) => Cheerio<AnyNode>

Requests a HTTP resource, parses it using Cheerio and returns its

const $page = await fletcher.html('https://foo.com/page.html');
const heading = $page.find('body > h1');
fletcher.script<T>(url: string, options?: FletcherOptions) => T

Requests a HTTP resource, finds a script on it, executes and returns its global context.

const { foo } = await fletcher.script('https://foo.com/page.html', {
	scriptPath: 'script:nth-of-type(3)',
});
fletcher.text(url: string, options?: FletcherOptions) => string

Requests a HTTP resource, returning it as a string

fletcher.cookies(url: string, options?: FletcherOptions) => CookieJar

Requests a HTTP resources, returning the cookies returned with it.

fletcher.json<T>(url: string, options?: FletcherOptions) => T

Requests a HTTP resource, returning it as a JSON object

fletcher.jsonld(url: string, options?: FletcherOptions) => unknown[]

Requests a HTTP resource, retrieving all the JSON-LD blocks found on the document

fletcher.response(url: string, options?: FletcherOptions) => Response

Requests a HTTP resource, returning the full HTTP Response object

fletcher.browser.html(url: string) => Cheerio<AnyNode>

Requests a HTTP resource using Puppeteer/Chrome, parses it using Cheerio and returns its.

fletcher.browser.json<T>(url: string, requestUrl: string | RegExp) => T

Requests a HTTP resource using Puppeteer/Chrome, intercepts a request made by that page and returns it as a JSON object

fletcher.create(options: FletcherOptions) => Object

Creates a new instance of fletcher with a custom config

const instance = fletcher.create({ headers: { foo: 'bar' } });
await instance.json('http://foo.com');