Central API elements, used in geocoding and indexing
Geocoder is an interface used to submit a single query to multiple indexes, returning a single set of ranked results.
- indexes (Object<string, CarmenSource>): A one-to-one mapping from index layer name to a CarmenSource.
- options (Object): options
  - options.tokens (PatternReplaceMap): A PatternReplaceMap used to perform custom string replacement at index and query time.
  - options.geocoder_inverse_tokens (Object<string, (string | Function)>): Used for reversing abbreviations. Each key is replaced with the stipulated string value, or passed to a function that returns a string. See Text Processing for details.
Main entry point for geocoding API. Returns results across all indexes for a given query.
- query (string): a query string, e.g. "Chester, NJ"
- options (Object): options
- callback (function): a callback function, passed on to geocode

See: geocode for more details, including options properties.
Main entry point for indexing. Index a stream of GeoJSON docs.
- from (stream.Readable): a readable stream of GeoJSON features
- to (CarmenSource): the interface to the index's destination
- options (Object): options
  - options.zoom (number): the max zoom level for the index
  - options.output (stream.Writable): the output stream for
  - options.tokens (PatternReplaceMap): a pattern-based string replacement specification
- callback (function): a callback function, passed on to index

See: index for more details, including options properties.
An interface to the underlying data that a Geocoder instance is indexing and querying. In addition to the properties described below, instances must satisfy interface requirements for Tilesource and Tilesink. See tilelive API Docs for more info. Currently, carmen supports the following tilelive modules:
Type: function

- getGeocoderData (function (index, shard, callback)): get the carmen record at shard in index, and call callback with (err, buffer)
- putGeocoderData (function (index, shard, buffer, callback)): put buffer into shard in index, and call callback with (err)
- geocoderDataIterator (function (type)): custom method for iterating over documents in the source
- getIndexableDocs (function (pointer, callback)): get the documents needed to create a forward geocoding datasource. pointer is an optional object whose behavior depends on the implementation. It indicates the state of the database, similar to a cursor, and can allow pagination, limiting, etc. callback is called with (error, documents, pointer), in which documents is a list of objects.
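To make the callback conventions above concrete, here is a minimal sketch of an object exposing getGeocoderData, putGeocoderData and getIndexableDocs. It is not a real tilelive module and omits the Tilesource/Tilesink methods. The factory name createSketchSource, the `${index}/${shard}` storage key and the { offset, limit } pointer shape are all assumptions made for illustration.

```js
// Hypothetical sketch, not a real tilelive module: only the three
// CarmenSource data methods are shown. The storage key format and the
// { offset, limit } pointer shape are assumptions.
function createSketchSource(docs) {
  const shards = new Map(); // `${index}/${shard}` -> Buffer

  return {
    // Fetch the record stored for (index, shard); callback(err, buffer).
    // Callbacks fire synchronously here to keep the sketch short.
    getGeocoderData(index, shard, callback) {
      callback(null, shards.get(`${index}/${shard}`));
    },

    // Store buffer for (index, shard); callback(err).
    putGeocoderData(index, shard, buffer, callback) {
      shards.set(`${index}/${shard}`, buffer);
      callback(null);
    },

    // Return one page of documents plus a pointer to the next page.
    // An empty page signals that every document has been read.
    getIndexableDocs(pointer, callback) {
      const offset = (pointer && pointer.offset) || 0;
      const limit = (pointer && pointer.limit) || 2;
      const page = docs.slice(offset, offset + limit);
      callback(null, page, { offset: offset + page.length, limit });
    }
  };
}

// Usage: page through all documents by feeding each pointer back in.
const source = createSketchSource([{ id: 1 }, { id: 2 }, { id: 3 }]);
function readAll(pointer, seen) {
  source.getIndexableDocs(pointer, (err, documents, next) => {
    if (err) throw err;
    if (documents.length === 0) return console.log(seen.map((d) => d.id));
    readAll(next, seen.concat(documents));
  });
}
readAll(null, []);
```

Passing the returned pointer back in is what lets a caller paginate without the source holding per-caller state.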
An in-memory tilelive source/sink, used primarily for testing purposes when instantiating a geocoder. It satisfies the CarmenSource API constraints documented above.

If there are three arguments:
- docs (Array): an array of GeoJSON features to be indexed, or an object with index metadata
- info (object): an object with index metadata, or a callback function
- callback (function): a callback function

If there are two arguments:
- docs (object): an object with index metadata (gets re-assigned to info)
- info (function): a callback function (gets re-assigned to callback)
- callback (undefined): undefined
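The two calling conventions amount to shifting the arguments along when the last one is missing. The sketch below shows only that shuffling. The function name normalizeMemSourceArgs, the maxzoom metadata key, and defaulting docs to an empty array are assumptions, not documented carmen behavior.

```js
// Hypothetical helper mirroring the argument handling described above;
// the name normalizeMemSourceArgs is illustrative, not part of carmen.
function normalizeMemSourceArgs(docs, info, callback) {
  // Two-argument form: (info, callback) arrive in the docs/info slots.
  if (callback === undefined && typeof info === 'function') {
    callback = info;
    info = docs;
    docs = []; // assumption: no documents means an empty source
  }
  return { docs, info, callback };
}

const done = (err) => { if (err) throw err; };
console.log(normalizeMemSourceArgs({ maxzoom: 6 }, done).info); // { maxzoom: 6 }
```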
lib/text-processing/token.js:260-275
A mapping from patterns (keys) to replacements (values).
- The patterns are used in createGlobalReplacer to create case-insensitive XRegExps.
- Patterns can match anywhere in a string, regardless of whether the match is a full word, multiple words, or part of a word.
- Matching substrings are replaced with the associated replacement.

This map is used on input strings at both query and index time.
Example use case: abbreviating multiple words.

There are a lot of different ways to write "post office box" in US addresses. You can normalize them with a pattern replace entry:

```js
const patternReplaceMap = {
  "\\bP\\.?\\ ?O\\.? Box ([0-9]+)\\b": " pob-$1 "
};
// "P.O. Box 985" -> "pob-985"
// "PO Box 985" -> "pob-985"
// "p.o. box 985" -> "pob-985"
```
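The sketch below shows how such a map could be applied to an input string. It uses native JavaScript RegExp objects with the g and i flags as a stand-in for the XRegExp-based createGlobalReplacer. The helper names and the final whitespace cleanup are assumptions, added so that the padded " pob-$1 " replacement produces the outputs listed in the comments above.

```js
// Hypothetical sketch: native RegExp stands in for XRegExp here.
function buildReplacers(patternReplaceMap) {
  return Object.keys(patternReplaceMap).map((pattern) => ({
    from: new RegExp(pattern, 'gi'), // case-insensitive, every match
    to: patternReplaceMap[pattern]
  }));
}

function applyReplacers(replacers, text) {
  let out = text;
  for (const { from, to } of replacers) {
    out = out.replace(from, to);
  }
  // Assumption: collapse and trim the padding spaces the replacement adds.
  return out.replace(/\s+/g, ' ').trim();
}

const replacers = buildReplacers({
  '\\bP\\.?\\ ?O\\.? Box ([0-9]+)\\b': ' pob-$1 '
});
console.log(applyReplacers(replacers, 'P.O. Box 985')); // pob-985
```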
Functions in use when querying indexes. Most commonly used in a production setting.
lib/geocoder/geocode.js:42-170
Main interface for querying an index and returning ranked results.
- geocoder (Geocoder): the geocoder itself
- query (string): a query string. If the query appears to be a longitude, latitude pair (e.g. "-75.1327,40.0115"), it is assumed to be a reverse query.
- options (Object): options
  - options.proximity (Array<number>, optional): a [lon, lat] array used to bias search results. Features closer to the proximity value are given priority over those farther away.
  - options.types (Array<string>, optional): an array of string types. Only features matching one of the specified types are returned.
  - options.language (Array<string>, optional): one or more ISO 639-1 codes, separated by commas, for the languages to display. Only the first language code is used when prioritizing which forward geocode results are matched. If carmen:text_{lc} and/or geocoder_format_{lc} are available on a feature, the response is returned in that language and formatted appropriately.
  - options.languageMode (string, optional): if set to "strict", the returned features are filtered to only those with text matching the requested language.
  - options.bbox (Array<number>, optional): a [w, s, e, n] bbox array used to limit search results. Only features inside the provided bbox are included.
  - options.limit (number, optional, default 5): the maximum number of features returned.
  - options.allow_dupes (boolean, optional, default false): if true, carmen allows features with identical place names to be returned.
  - options.debug (boolean, optional, default false): if true, the carmen debug object is returned as part of the results and internal carmen properties are preserved on feature output.
  - options.stats (boolean, optional, default false): if true, the carmen stats object is returned as part of the results.
  - options.indexes (boolean, optional, default false): if true, indexes are returned as part of the results.
  - options.autocomplete (boolean, optional, default true): if true, results that start with the query string (prefix matches) are included, not only exact matches.
  - options.reverseMode (string, optional, default 'distance'): one of 'distance' or 'score'. Affects how a result's context array is built.
  - options.routing (boolean, optional, default false): if true, routable_points are returned as part of the results for features whose sources are flagged as geocoder_routable in the TileJSON.
- callback (function): a callback function
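Two of the query rules above can be illustrated with small standalone helpers: recognizing a "lon,lat" query as a reverse query, and keeping only features inside a [w, s, e, n] bbox. These are hypothetical sketches, not carmen's internals. The exact number formats carmen accepts and the center property used in the bbox test are assumptions.

```js
// Hypothetical helpers; carmen's own parsing rules may be stricter or looser.
// Treat "lon,lat" strings (optional whitespace) as reverse queries.
function parseReverseQuery(query) {
  const m = /^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$/.exec(query);
  if (!m) return null;
  const lon = Number(m[1]);
  const lat = Number(m[2]);
  // Reject pairs outside valid longitude/latitude ranges.
  if (Math.abs(lon) > 180 || Math.abs(lat) > 90) return null;
  return [lon, lat];
}

// Keep only features whose [lon, lat] center lies inside [w, s, e, n].
function filterByBbox(features, bbox) {
  const [w, s, e, n] = bbox;
  return features.filter(({ center: [lon, lat] }) =>
    lon >= w && lon <= e && lat >= s && lat <= n);
}

console.log(parseReverseQuery('-75.1327,40.0115')); // [ -75.1327, 40.0115 ]
console.log(parseReverseQuery('Chester, NJ'));       // null
```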
lib/geocoder/phrasematch.js:20-196
phrasematch
- source (Object): a Geocoder datasource
- query (Array): a list of terms composing the query to Carmen
- options (Object): passed through the geocode function in geocode.js
- callback (Function): called with (err, phrasematches, source)
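spatialmatch consumes phrasematch results for subquery permutations. As a rough illustration of what such permutations can look like, the sketch below enumerates every contiguous run of query terms. This is a hypothetical simplification. Carmen's actual rules for building and matching subqueries are not described here and may differ.

```js
// Hypothetical sketch: list every contiguous run of terms in the query.
// Each entry records where the run starts, how many terms it spans, and
// the joined text a phrase lookup could be made against.
function contiguousSubqueries(terms) {
  const out = [];
  for (let start = 0; start < terms.length; start++) {
    for (let end = start + 1; end <= terms.length; end++) {
      out.push({ start, length: end - start, text: terms.slice(start, end).join(' ') });
    }
  }
  return out;
}

console.log(contiguousSubqueries(['chester', 'nj']).map((s) => s.text));
// [ 'chester', 'chester nj', 'nj' ]
```

A query of n terms yields n(n+1)/2 contiguous runs, which is why a real implementation needs to prune candidates early.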
lib/geocoder/spatialmatch.js:27-138
spatialmatch determines whether indexes can be spatially stacked, and discards indexes that cannot be stacked together.
- query (Array): a list of terms composing the query to Carmen
- phrasematchResults (Array): phrasematch results for the subquery permutations generated by ./lib/phrasematch
- options (Object): passed in with the query
- callback (function): called with the indexes that could be spatially stacked
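One way to picture spatial stacking is this: a feature from a more specific layer stacks under a feature from a broader layer only if their covers overlap. The sketch below models covers as slippy-map tile cells {z, x, y}. A broader cell contains a finer one when the finer cell's coordinates, shifted down to the broader zoom, match. It is a hypothetical simplification for illustration, not carmen's spatialmatch code, and the cover format is an assumption.

```js
// Hypothetical sketch: covers are tile cells { z, x, y }.
// A cell at zoom pz contains a cell at zoom cz >= pz when the child's
// x and y, divided by 2^(cz - pz), equal the parent's x and y.
function cellContains(parent, child) {
  if (child.z < parent.z) return false;
  const shift = child.z - parent.z;
  return (child.x >> shift) === parent.x && (child.y >> shift) === parent.y;
}

// Two matches can stack when some cover of the specific (lower) match
// falls inside some cover of the broader (higher) match.
function canStack(lowerCovers, higherCovers) {
  return lowerCovers.some((c) => higherCovers.some((p) => cellContains(p, c)));
}

// Discard candidate pairs that cannot be stacked together.
function stackable(pairs) {
  return pairs.filter(({ lower, higher }) => canStack(lower.covers, higher.covers));
}

console.log(canStack([{ z: 4, x: 5, y: 6 }], [{ z: 2, x: 1, y: 1 }])); // true
```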
lib/geocoder/verifymatch.js:30-93
verifymatch: results from spatialmatch are verified by querying real geometries in vector tiles.
- query (Array): a list of terms composing the query to Carmen
- stats (Object)
- geocoder (Object): a geocoder datasource
- matched (Object): the resultant indexes that could be spatially stacked
- options (Object): passed through the geocode function in geocode.js
- callback (Function): called with the verified indexes in the correct hierarchical order
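Verifying a match against real geometry typically comes down to a point-in-polygon test. The sketch below uses the standard even-odd ray-casting method on a single polygon ring. The function names, and the idea of checking a candidate's center against its parent's outer ring, are assumptions for illustration, not carmen's verifymatch implementation.

```js
// Even-odd ray casting: cast a horizontal ray from the point and count
// how many ring edges it crosses; an odd count means the point is inside.
function pointInRing([px, py], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      (yi > py) !== (yj > py) &&
      px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Hypothetical check: keep a candidate only if its center lies inside
// the outer ring of the GeoJSON Polygon it was stacked under.
function verifyStack(candidate, parent) {
  return pointInRing(candidate.center, parent.geometry.coordinates[0]);
}

const square = [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]];
console.log(pointInRing([5, 5], square)); // true
```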
lib/geocoder/verifymatch.js:396-540
This function adjusts a result context's relevance score. Several bits of business logic here can boost or penalize the original relevance. The nicknames for these are squishy and backy.

The squishy logic checks for nested, identically-named features in indexes with geocoder_inherit_score enabled. When such features are encountered, the gappy penalty is not applied and their scores are combined on the smallest feature.
This ensures that a context of "New York, New York, USA" will return the place rather than the region when a query is made for "New York USA". In the absence of this check the place would be gappy-penalized and the region feature would be returned as the first result.
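Here is a minimal sketch of the squishy grouping, under stated assumptions. Each context entry carries a name, a relev, and an inheritScore flag standing in for its index's geocoder_inherit_score setting, and the combined score is a simple sum. The real adjustment also interacts with the gappy penalty, which is not modeled here.

```js
// Hypothetical sketch of squishy grouping. context[0] is the target
// feature; later entries are its ancestors, smallest first.
function squish(context) {
  const target = context[0];
  const name = target.name.toLowerCase();
  let relev = target.relev;
  let squished = 0;
  for (let i = 1; i < context.length; i++) {
    const feature = context[i];
    // Stop at the first ancestor that is differently named or whose
    // index does not enable score inheritance.
    if (!feature.inheritScore || feature.name.toLowerCase() !== name) break;
    relev += feature.relev; // assumption: scores combine by summation
    squished++;
  }
  return { relev, squished };
}

const nyc = squish([
  { name: 'New York', relev: 0.5, inheritScore: true },
  { name: 'New York', relev: 0.5, inheritScore: true },
  { name: 'United States', relev: 0.5, inheritScore: false }
]);
console.log(nyc); // { relev: 1, squished: 1 }
```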
The backy logic checks to see if the matching substrings of the query are in a single ordering, going from low-to-high or high-to-low, as specified by the geocoder's index hierarchy. It's helpful to walk through this with an example. Here's a result context where the target feature is an address (this is pseudocode for simplicity's sake).
[
"123 Main St" (address),
"02169" (postcode),
"Quincy" (city),
"Massachusetts" (state),
"United States" (country)
]
Here are three different query strings that could return that context, and how the backy penalty would apply (or not).
query string | direction | backy penalty? |
---|---|---|
123 Main St, Quincy MA | ascending | no |
MA Quincy 123 Main St | descending | no |
123 Main St, MA Quincy | ascending, then descending | yes |
The first two examples would not be penalized, because each one follows a licit ordering with respect to the hierarchy of the geocoder. Not so for the third query.
- The first one is ascending, going from hierarchically low (street) to hierarchically high (state).
- The second one is descending, going from high (state) to low (street).
- The third one, however, begins by ascending from street to state, but then descends again, going from state to city. Therefore, a penalty is applied.
It's possible for a layer to be exempted from the backy penalty. This affordance is built in because, in some places, the hierarchical order does not match the conventional way of writing a full address. For instance, postcodes in the United States are often written at the end of an address, even though they're hierarchically positioned lower than cities or states.
If a CarmenSource index has geocoder_ignore_order=true, then the backy penalty is withheld for that layer (but could still apply to other layers).
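The ordering rule can be sketched as a monotonicity check over hierarchy levels. In this hypothetical version, each matched query substring is represented by its layer's level (lower means more specific), listed in the order it appears in the query, and layers whose source sets geocoder_ignore_order are skipped. The level numbers are illustrative, and carmen's real implementation and penalty size are not reproduced here.

```js
// Hypothetical sketch of the "backy" ordering check.
// Returns true when the (non-exempt) levels are neither purely
// ascending nor purely descending, i.e. when a penalty would apply.
function isBacky(matches) {
  const levels = matches.filter((m) => !m.ignoreOrder).map((m) => m.level);
  let ascending = true;
  let descending = true;
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] < levels[i - 1]) ascending = false;
    if (levels[i] > levels[i - 1]) descending = false;
  }
  return !(ascending || descending);
}

// Illustrative hierarchy levels, low (specific) to high (broad).
const LEVEL = { address: 0, postcode: 1, city: 2, state: 3, country: 4 };
const m = (layer, ignoreOrder = false) => ({ level: LEVEL[layer], ignoreOrder });

console.log(isBacky([m('address'), m('city'), m('state')])); // false
console.log(isBacky([m('state'), m('city'), m('address')])); // false
console.log(isBacky([m('address'), m('state'), m('city')])); // true
// A trailing postcode is ignored when its layer opts out of ordering.
console.log(isBacky([m('address'), m('city'), m('state'), m('postcode', true)])); // false
```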
- context (Array): created in loadContexts. The target feature to be returned is context[0], and context[1:] are the features in which the target is contained, ordered by the hierarchy of their layers.
- peers (Object): a mapping from carmen:tmpids to features, used when applying the "squishy" logic for nested, identically-named features.
- strict (Object): a mapping from carmen:tmpids to covers matched by some substring of the query.
- loose (Object): a mapping from carmen:tmpids to the cover with that tmpid whose relev value is greatest (across all result contexts).
- indexes (Object): the geocoder's indexes
- options (Object): optional arguments
- geocoder (Geocoder): the carmen Geocoder instance

Returns number: the adjusted relevance of the supplied result
Returns a hierarchy of features ("context") for a given lon, lat pair. This is used for reverse geocoding: given a point, it returns possible regions that contain it.
- geocoder (Object): geocoder instance
- position (Array): [lon, lat]
- options (Object): optional options object
- callback (Function)
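As a rough sketch of building a reverse-geocoding context, the function below keeps the candidate features whose [w, s, e, n] bbox contains the point and orders them from most to least specific. Bounding boxes, the level field, and the explicit candidate list are simplifying assumptions, and the boxes in the usage example are rough illustrative values. The real function works against the geocoder's indexes and actual geometries.

```js
// Hypothetical sketch: return the candidates containing [lon, lat],
// most specific (lowest level) first, mirroring context[0] = target.
function reverseContext([lon, lat], candidates) {
  return candidates
    .filter(({ bbox: [w, s, e, n] }) => lon >= w && lon <= e && lat >= s && lat <= n)
    .sort((a, b) => a.level - b.level);
}

// Rough, illustrative boxes only.
const context = reverseContext([-71.0, 42.25], [
  { name: 'United States', level: 4, bbox: [-125, 24, -66, 50] },
  { name: 'Quincy', level: 2, bbox: [-71.07, 42.2, -70.95, 42.3] },
  { name: 'Massachusetts', level: 3, bbox: [-73.5, 41.2, -69.9, 42.9] }
]);
console.log(context.map((f) => f.name)); // [ 'Quincy', 'Massachusetts', 'United States' ]
```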
Functions dedicated to building and manipulating indexes.

The main interface for building an index.
- geocoder (Geocoder): a Geocoder instance
- from (stream.Readable): a stream of GeoJSON features
- to (CarmenSource): the interface to the index's destination
- options (Object): options
  - options.zoom (number): the max zoom level for the index
  - options.output (stream.Writable): the output stream for
  - options.tokens (PatternReplaceMap): a pattern-based string replacement specification
- callback (function): a callback function
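To illustrate the stream-in, callback-out shape of this interface, here is a hypothetical sketch that reads GeoJSON features from a Node.js object-mode Readable and stores them in a Map. Real indexing also applies options.tokens, builds data up to options.zoom, and writes to a CarmenSource. None of that is shown, and the function name indexFeatures is illustrative.

```js
const { Readable } = require('stream');

// Hypothetical sketch: consume features from `from`, keep those that are
// GeoJSON Features, and report how many were stored via callback(err, count).
// `options` is accepted for parity with the real signature but unused here.
function indexFeatures(from, store, options, callback) {
  let count = 0;
  let finished = false;
  const finish = (err) => {
    if (finished) return; // never call back twice
    finished = true;
    callback(err || null, count);
  };
  from.on('data', (feature) => {
    if (!feature || feature.type !== 'Feature') return; // skip non-features
    store.set(feature.id, feature);
    count++;
  });
  from.on('error', finish);
  from.on('end', () => finish());
}

const store = new Map();
indexFeatures(
  Readable.from([
    { type: 'Feature', id: 1, properties: { 'carmen:text': 'Chester' }, geometry: null },
    { type: 'Feature', id: 2, properties: { 'carmen:text': 'Quincy' }, geometry: null }
  ]),
  store,
  {},
  (err, count) => {
    if (err) throw err;
    console.log(`indexed ${count} features`); // indexed 2 features
  }
);
```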
Generate summary statistics about a source. Used by bin/carmen-analyze.js.

Summary stats include:

statistic | type | description |
---|---|---|
total | number | the total number of grids |
byScore | Object<string, number> | grid counts, grouped by grid score value (from "1" to "6") |
byRelev | Object<string, number> | grid counts, grouped by grid relevance. Group labels reflect the relevance value, rounded to the nearest tenth ("0.4", "0.6", "0.8", "1.0") |

- source (CarmenSource): a source whose indexing is complete
- callback (function): a callback function

Returns function (object): the output of callback(stats)
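The summary in the table above can be sketched as a single pass over grid records. The { relev, score } record shape and the function name summarizeGrids are assumptions. Carmen's analyzer reads grids from the source's index, which is not modeled here.

```js
// Hypothetical sketch: count grids overall, by score, and by relevance
// rounded to the nearest tenth.
function summarizeGrids(grids) {
  const stats = { total: 0, byScore: {}, byRelev: {} };
  for (const { relev, score } of grids) {
    stats.total++;
    const scoreKey = String(score);
    stats.byScore[scoreKey] = (stats.byScore[scoreKey] || 0) + 1;
    // toFixed(1) rounds to one decimal and keeps the trailing zero ("1.0").
    const relevKey = relev.toFixed(1);
    stats.byRelev[relevKey] = (stats.byRelev[relevKey] || 0) + 1;
  }
  return stats;
}

console.log(summarizeGrids([
  { relev: 1, score: 3 },
  { relev: 0.8, score: 3 },
  { relev: 1, score: 6 }
]));
```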
The various utility functions for preparing text while indexing and querying.
lib/text-processing/token.js:31-137
An individual pattern-based replacement configuration.

Type: Object
- named (boolean): does the pattern use a named capturing group?
- from (RegExp): pattern to match in a string
- to (string): replacement string (possibly including group references)
lib/text-processing/token.js:260-275
Create an array of ReplaceRules from a PatternReplaceMap.
- tokens (PatternReplaceMap): a pattern-based string replacement specification

Returns Array<ReplaceRule>: an array of rules for replacing substrings
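As a sketch of how a PatternReplaceMap could become ReplaceRule-shaped objects, the function below builds { named, from, to } entries. It uses native RegExp rather than XRegExp. Deriving named by looking for "(?<" not followed by = or ! (which would be a lookbehind) is an assumption about how that flag might be computed. This is not carmen's createGlobalReplacer.

```js
// Hypothetical sketch: build ReplaceRule-shaped objects from a
// PatternReplaceMap using native RegExp instead of XRegExp.
function toReplaceRules(patternReplaceMap) {
  return Object.entries(patternReplaceMap).map(([pattern, replacement]) => ({
    // "(?<" starts a named group unless followed by "=" or "!" (lookbehind).
    named: /\(\?<(?![=!])/.test(pattern),
    from: new RegExp(pattern, 'gi'),
    to: replacement
  }));
}

const rules = toReplaceRules({
  '\\bSaint\\b': 'St',
  '\\b(?<num>[0-9]+)(?:st|nd|rd|th)\\b': '$<num>'
});
console.log(rules.map((r) => r.named)); // [ false, true ]
console.log('42nd Street'.replace(rules[1].from, rules[1].to)); // 42 Street
```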
lib/text-processing/token.js:31-137
Create a per-token replacer.
- tokens (object): tokens
- inverseOpts (object): options for inverting token replacements
  - inverseOpts.includeUnambiguous (object): options for

Returns Array<ReplaceRule>: an array of replace rules