api.md


Main

Central API elements, used in geocoding and indexing

Geocoder

index.js:55-375

Geocoder is an interface used to submit a single query to multiple indexes, returning a single set of ranked results.

Parameters

Geocoder#geocode

index.js:461-467

  • See: geocode for more details, including options properties.

Main entry point for geocoding API. Returns results across all indexes for a given query.

Parameters

Geocoder#index

index.js:486-492

  • See: index for more details, including options properties.

Main entry point for indexing. Index a stream of GeoJSON docs.

Parameters

CarmenSource

index.js:55-375

An interface to the underlying data that a Geocoder instance is indexing and querying. In addition to the properties described below, instances must satisfy interface requirements for Tilesource and Tilesink. See tilelive API Docs for more info. Currently, carmen supports the following tilelive modules:

Type: function

Properties

  • getGeocoderData function (index, shard, callback) get carmen record at shard in index and call callback with (err, buffer)
  • putGeocoderData function (index, shard, buffer, callback) put buffer into a shard with index index, and call callback with (err)
  • geocoderDataIterator function (type) custom method for iterating over documents in the source.
  • getIndexableDocs function (pointer, callback) get documents needed to create a forward geocoding datasource. pointer is an optional object that has different behavior depending on the implementation. It is used to indicate the state of the database, similar to a cursor, and can allow pagination, limiting, etc. callback is called with (error, documents, pointer) in which documents is a list of objects.
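The geocoder-specific methods listed above can be sketched with a small in-memory object. This is a hypothetical stand-in, not one of the tilelive modules carmen supports: the class name `SketchSource`, the `"<index>/<shard>"` key format, and the `{ offset, limit }` shape of `pointer` are all assumptions made for illustration, and the Tilesource/Tilesink methods are left out entirely.

```javascript
// Hypothetical in-memory stand-in for the geocoder-specific part of a
// CarmenSource. It does NOT implement the tilelive Tilesource/Tilesink API.
class SketchSource {
    constructor(docs) {
        this.shards = new Map(); // key "<index>/<shard>" -> Buffer (assumed layout)
        this.docs = docs || [];
    }

    // Calls back with (err, buffer). Callbacks are invoked synchronously to
    // keep the sketch short; a real source would normally be asynchronous.
    getGeocoderData(index, shard, callback) {
        // Assumption: a missing shard yields undefined rather than an error.
        callback(null, this.shards.get(`${index}/${shard}`));
    }

    // Stores buffer under (index, shard) and calls back with (err).
    putGeocoderData(index, shard, buffer, callback) {
        this.shards.set(`${index}/${shard}`, buffer);
        callback(null);
    }

    // pointer behaves like a cursor; its { offset, limit } shape is assumed.
    // Calls back with (error, documents, pointer).
    getIndexableDocs(pointer, callback) {
        const { offset = 0, limit = 2 } = pointer || {};
        const documents = this.docs.slice(offset, offset + limit);
        callback(null, documents, { offset: offset + documents.length, limit });
    }
}
```

Passing the pointer returned by one `getIndexableDocs` call into the next call pages through the documents; an empty `documents` array signals the end in this sketch.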

MemSource

lib/sources/api-mem.js:27-47

An in-memory tilelive source/sink. Used primarily for testing purposes when instantiating a geocoder. Satisfies API constraints documented here. If there are three arguments:

Parameters

  • docs Array an array of GeoJSON features to be indexed, or an object with index metadata
  • info object an object with index metadata, or a callback function
  • callback function a callback function

MemSource

lib/sources/api-mem.js:27-47

(See above). If there are two arguments:

Parameters

  • docs object an object with index metadata (gets re-assigned to info)
  • info function a callback function (gets re-assigned to callback)
  • callback undefined undefined
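The two call forms can be handled by shifting arguments when the second one is a function. The helper below is a hypothetical illustration of that pattern, not MemSource's actual code; in particular, setting `docs` to `null` in the two-argument form is an assumption, since the documentation only says which values are re-assigned.

```javascript
// Hypothetical helper showing the argument shuffle described above.
function normalizeMemSourceArgs(docs, info, callback) {
    if (typeof info === 'function') {
        // Two-argument form: (info, callback)
        callback = info; // info gets re-assigned to callback
        info = docs;     // docs gets re-assigned to info
        docs = null;     // assumption: no documents supplied in this form
    }
    return { docs, info, callback };
}
```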

PatternReplaceMap

lib/text-processing/token.js:260-275

A mapping from patterns (keys) to replacements (values).

  • The patterns are used in createGlobalReplacer to create XRegExps (case-insensitive)
  • Patterns can match anywhere in a string, regardless of whether the match is a full word, multiple words, or part of a word.
  • Matching substrings are replaced with the associated replacement

This map is used on input strings at both query and index time.

Example use case: Abbreviating multiple words:

There are a lot of different ways to write "post office box" in US Addresses. You can normalize them with a pattern replace entry:

const patternReplaceMap = {
    "\\bP\\.?\\ ?O\\.? Box ([0-9]+)\\b": " pob-$1 "
};

// "P.O. Box 985" -> "pob-985"
// "PO Box 985"   -> "pob-985"
// "p.o. box 985" -> "pob-985"

Type: Object<string, string>
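As a rough illustration of how such a map behaves, the sketch below applies each pattern case-insensitively and globally. It uses the built-in `RegExp` rather than XRegExp, and the helper name `applyPatternReplaceMap` and the final whitespace clean-up are assumptions for the example, not carmen's implementation.

```javascript
const patternReplaceMap = {
    '\\bP\\.?\\ ?O\\.? Box ([0-9]+)\\b': ' pob-$1 '
};

// Hypothetical helper: built-in RegExp stands in for XRegExp here.
function applyPatternReplaceMap(map, text) {
    let out = text;
    for (const [pattern, replacement] of Object.entries(map)) {
        // 'g' replaces every match, 'i' makes the match case-insensitive
        out = out.replace(new RegExp(pattern, 'gi'), replacement);
    }
    // Assumption: collapse the padding spaces that replacements introduce.
    return out.replace(/\s+/g, ' ').trim();
}

console.log(applyPatternReplaceMap(patternReplaceMap, 'Send to PO Box 12 today'));
```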

Geocoding

Functions in use when querying indexes. Most commonly used in a production setting.

geocode

lib/geocoder/geocode.js:42-170

Main interface for querying an index and returning ranked results.

Parameters

  • geocoder Geocoder the geocoder itself
  • query string a query string. If the query appears to be a longitude, latitude pair (eg "-75.1327,40.0115"), it is assumed to be a reverse query.
  • options Object options
    • options.proximity Array<number>? a [ lon, lat ] array to use for biasing search results. Features closer to the proximity value will be given priority over those further from the proximity value.
    • options.types Array<string>? an array of string types. Only features matching one of the types specified will be returned.
    • options.language Array<string>? One or more ISO 639-1 codes, separated by commas, in which results should be displayed. Only the first language code is used when prioritizing forward geocode results to be matched. If carmen:text_{lc} and/or geocoder_format_{lc} are available on a feature, the response will be returned in that language and appropriately formatted.
    • options.languageMode string? If set to "strict", the returned features will be filtered to only those with text matching the language specified by the language option.
    • options.bbox Array<number>? a [ w, s, e, n ] bbox array to use for limiting search results. Only features inside the provided bbox will be included.
    • options.limit number Adjust the maximum number of features returned. (optional, default 5)
    • options.allow_dupes boolean If true, carmen will allow features with identical place names to be returned. (optional, default false)
    • options.debug boolean If true, the carmen debug object will be returned as part of the results and internal carmen properties will be preserved on feature output. (optional, default false)
    • options.stats boolean If true, the carmen stats object will be returned as part of the results. (optional, default false)
    • options.indexes boolean If true, indexes will be returned as part of the results. (optional, default false)
    • options.autocomplete boolean If true, partial (prefix) matches of the query are allowed, as in an autocomplete search box. (optional, default true)
    • options.reverseMode string Choices are 'distance', 'score'. Affects the way that a result's context array is built. (optional, default 'distance')
    • options.routing boolean If true, routable_points will be returned as part of the results for features whose sources are flagged as geocoder_routable in the tile json. (optional, default false)
  • callback function a callback function
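The documented defaults and the reverse-query convention can be summarized in a short sketch. Both helpers are hypothetical and exist only for illustration: carmen's own detection of a "lon,lat" query and its handling of defaults may differ.

```javascript
// Defaults as documented in the options list above.
const GEOCODE_DEFAULTS = {
    limit: 5,
    allow_dupes: false,
    debug: false,
    stats: false,
    indexes: false,
    autocomplete: true,
    reverseMode: 'distance',
    routing: false
};

// Hypothetical helper, not part of carmen: caller options override defaults.
function withGeocodeDefaults(options) {
    return Object.assign({}, GEOCODE_DEFAULTS, options);
}

// Hypothetical check for a "lon,lat" query such as "-75.1327,40.0115".
function looksLikeReverseQuery(query) {
    const parts = query.split(',').map((s) => s.trim());
    if (parts.length !== 2) return false;
    if (!parts.every((p) => /^-?\d+(\.\d+)?$/.test(p))) return false;
    const [lon, lat] = parts.map(Number);
    return Math.abs(lon) <= 180 && Math.abs(lat) <= 90;
}
```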

phrasematch

lib/geocoder/phrasematch.js:20-196

phrasematch

Parameters

  • source Object a Geocoder datasource
  • query Array a list of terms composing the query to Carmen
  • options Object passed through the geocode function in geocode.js
  • callback Function called with (err, phrasematches, source)

spatialmatch

lib/geocoder/spatialmatch.js:27-138

spatialmatch determines whether indexes can be spatially stacked and discards indexes that cannot be stacked together

Parameters

  • query Array a list of terms composing the query to Carmen
  • phrasematchResults Array for subquery permutations generated by ./lib/phrasematch
  • options Object passed in with the query
  • callback function callback called with indexes that could be spatially stacked

verifymatch

lib/geocoder/verifymatch.js:30-93

verifymatch verifies the results from spatialmatch by querying real geometries in vector tiles.

Parameters

  • query Array a list of terms composing the query to Carmen
  • stats Object ?
  • geocoder Object a geocoder datasource
  • matched Object resultant indexes that could be spatially stacked
  • options Object passed through the geocode function in geocode.js
  • callback Function callback function which is called with the verified indexes in the correct hierarchical order

verifyContext

lib/geocoder/verifymatch.js:396-540

This function adjusts a result context's relevance score. There are several bits of business logic here that can boost or penalize the original relevance. The nicknames for these are squishy and backy.

squishy

The squishy logic checks for nested, identically-named features in indexes with geocoder_inherit_score enabled. When they are encountered, avoid applying the gappy penalty and combine their scores on the smallest feature.

This ensures that a context of "New York, New York, USA" will return the place rather than the region when a query is made for "New York USA". In the absence of this check the place would be gappy-penalized and the region feature would be returned as the first result.

backy

The backy logic checks to see if the matching substrings of the query are in a single ordering, going from low-to-high or high-to-low, as specified by the geocoder's index hierarchy. It's helpful to walk through this with an example. Here's a result context where the target feature is an address (this is pseudocode for simplicity's sake).

[
  "123 Main St" (address),
  "02169" (postcode),
  "Quincy" (city),
  "Massachusetts" (state),
  "United States" (country)
]

Here are three different query strings that could return that context, and how the backy penalty would apply (or not).

| query string | direction | backy penalty? |
| --- | --- | --- |
| 123 Main St, Quincy MA | ascending | no |
| MA Quincy 123 Main St | descending | no |
| 123 Main St, MA Quincy | ascending | yes |

The first two examples would not be penalized, because each one follows a licit ordering with respect to the hierarchy of the geocoder. Not so for the third query.

  • The first one is ascending, going from hierarchically low (street) to hierarchically high (state).
  • The second one is descending, going from high (state) to low (street).
  • The third one, however, begins by ascending from street to state, but then descends again, going from state to city. Therefore, a penalty is applied.

It's possible for a layer to be exempted from the backy penalty. This affordance is built in because, in some places, the hierarchical order does not match the conventional way of writing a full address. For instance, postcodes in the United States are often written at the end of an address, even though they're hierarchically positioned lower than cities or states.

If a CarmenSource index has geocoder_ignore_order=true, then the backy penalty is withheld for that layer (but could still apply to other layers).
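The ordering check can be sketched as a test for a change of direction in the sequence of hierarchy levels. This is a simplified, hypothetical model and not the logic in verifyContext: the `hierarchy` array mirrors the example context, and the `ignoreOrder` set stands in for layers whose index has geocoder_ignore_order=true.

```javascript
// Hierarchy from the example context, lowest layer first.
const hierarchy = ['address', 'postcode', 'city', 'state', 'country'];

// Returns true when the matched layers change direction, i.e. when a
// backy-style penalty would apply. Hypothetical; not carmen's verifyContext.
function changesDirection(matchedLayers, ignoreOrder = new Set()) {
    const levels = matchedLayers
        .filter((layer) => !ignoreOrder.has(layer)) // exempt layers are skipped
        .map((layer) => hierarchy.indexOf(layer));
    let direction = 0; // 0 = not yet known, 1 = ascending, -1 = descending
    for (let i = 1; i < levels.length; i++) {
        const step = Math.sign(levels[i] - levels[i - 1]);
        if (step === 0) continue;
        if (direction !== 0 && step !== direction) return true;
        direction = step;
    }
    return false;
}

// "123 Main St, MA Quincy": street -> state -> city reverses direction.
console.log(changesDirection(['address', 'state', 'city']));
```

With `postcode` in the exempt set, a query written as street, city, state, postcode is treated as purely ascending in this model, which matches the United States example above.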

Parameters

  • context Array created in loadContexts. The target feature to be returned is context[0], and context[1:] are the features in which the target is contained, ordered by the hierarchy of their layers.
  • peers Object A mapping from carmen:tmpids to features, used when applying the "squishy" logic for nested, identically-named features.
  • strict Object A mapping from carmen:tmpids to covers matched by some substring of the query
  • loose Object A mapping from carmen:tmpids to the cover with that tmpid whose relev value is greatest (across all result contexts).
  • indexes Object the geocoder's indexes
  • options Object optional arguments
  • geocoder Geocoder the carmen Geocoder instance

Returns number the adjusted relevance of the supplied result

context

lib/geocoder/context.js:28-87

Returns a hierarchy of features ("context") for a given lon, lat pair. This is used for reverse geocoding: given a point, it returns possible regions that contain it.

Parameters

Indexing

Functions dedicated toward building and manipulating indexes.

index

lib/indexer/index.js:30-97

The main interface for building an index

Parameters

analyze

lib/util/analyze.js:23-55

Generate summary statistics about a source. Used by bin/carmen-analyze.js.

Summary stats include:

| statistic | type | description |
| --- | --- | --- |
| total | number | the total number of grids |
| byScore | Object<string, number> | grid counts, grouped by grid score value (from "1" to "6") |
| byRelev | Object<string, number> | grid counts, grouped by grid relevance. Group labels reflect the relevance value, rounded to the nearest tenth ("0.4", "0.6", "0.8", "1.0") |
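The shape of these statistics can be illustrated with a small aggregation. The function below is a hypothetical sketch, not the code in lib/util/analyze.js: it assumes each grid is a plain object of the form `{ score, relev }`, which is not stated in this document.

```javascript
// Hypothetical input shape: each grid is { score: number, relev: number }.
function summarize(grids) {
    const stats = { total: 0, byScore: {}, byRelev: {} };
    for (const grid of grids) {
        stats.total++;
        const scoreKey = String(grid.score);
        // Round relevance to the nearest tenth and keep one decimal ("1.0")
        const relevKey = (Math.round(grid.relev * 10) / 10).toFixed(1);
        stats.byScore[scoreKey] = (stats.byScore[scoreKey] || 0) + 1;
        stats.byRelev[relevKey] = (stats.byRelev[relevKey] || 0) + 1;
    }
    return stats;
}
```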

Parameters

Returns function (object) output of callback(stats)

Text Processing

The various utility functions for preparing text while indexing and querying.

ReplaceRule

lib/text-processing/token.js:31-137

An individual pattern-based replacement configuration.

Type: Object

Properties

  • named boolean does the pattern use a named capturing group?
  • from RegExp pattern to match in a string
  • to string replacement string (possibly including group references)

createGlobalReplacer

lib/text-processing/token.js:260-275

Create an array of ReplaceRules from a PatternReplaceMap.

Parameters

Returns Array<ReplaceRule> an array of rules for replacing substrings
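To make the relationship between a PatternReplaceMap and the resulting ReplaceRule objects concrete, here is a minimal sketch. It is a hypothetical re-implementation using the built-in `RegExp` instead of XRegExp; the function names `buildReplaceRules` and `applyRules`, and the way `named` is detected, are assumptions for illustration.

```javascript
// Hypothetical sketch: turn each map entry into a { named, from, to } rule.
function buildReplaceRules(patternReplaceMap) {
    return Object.entries(patternReplaceMap).map(([pattern, to]) => ({
        // true when the pattern contains a named group such as (?<num>...);
        // lookbehinds like (?<= or (?<! are not counted
        named: /\(\?<[A-Za-z_$]/.test(pattern),
        from: new RegExp(pattern, 'gi'),
        to
    }));
}

// Apply the rules in order, feeding each result into the next rule.
function applyRules(rules, text) {
    return rules.reduce((out, rule) => out.replace(rule.from, rule.to), text);
}
```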

createReplacer

lib/text-processing/token.js:31-137

Create a per-token replacer

Parameters

  • tokens object tokens
  • inverseOpts object options for inverting token replacements
    • inverseOpts.includeUnambiguous object options for

Returns Array<ReplaceRule> an array of replace rules