Merge pull request #305 from python-jsonschema/explicit-base-uri
Add support for passing an explicit base URI
sirosen authored Aug 25, 2023
2 parents 86d4b00 + 1695d47 commit 2260cb8
Showing 5 changed files with 152 additions and 70 deletions.
138 changes: 74 additions & 64 deletions docs/usage.rst
@@ -40,8 +40,22 @@ Detailed helptext is always available interactively via
the error and exit. Use ``--traceback-mode full`` to request the full traceback
be printed, for debugging and troubleshooting.

Other Schema Options
--------------------
Environment Variables
---------------------

The following environment variables are supported.

.. list-table:: Environment Variables
:widths: 15 30
:header-rows: 1

* - Name
- Description
* - ``NO_COLOR``
- Set ``NO_COLOR=1`` to explicitly turn off colorized output.

Schema Selection Options
------------------------

No matter what usage form is used, a schema must be specified.

@@ -113,68 +127,6 @@ The following options control caching behaviors.
- The name to use for caching a remote schema.
Defaults to using the last slash-delimited part of the URI.

Environment Variables
---------------------

The following environment variables are supported.

.. list-table:: Environment Variables
:widths: 15 30
:header-rows: 1

* - Name
- Description
* - ``NO_COLOR``
- Set ``NO_COLOR=1`` to explicitly turn off colorized output.

Parsing Options
---------------

``--default-filetype``
~~~~~~~~~~~~~~~~~~~~~~

The default filetype to assume on instance files when they are detected neither
as JSON nor as YAML.

For example, pass ``--default-filetype yaml`` to instruct that files which have
no extension should be treated as YAML.

By default, this is not set and files without a detected type of JSON or YAML
will fail.

``--data-transform``
~~~~~~~~~~~~~~~~~~~~

``--data-transform`` applies a transformation to instancefiles before they are
checked. The following transforms are supported:

- ``azure-pipelines``:
"Unpack" compile-time expressions for Azure Pipelines files, skipping them
for the purposes of validation. This transformation is based on Microsoft's
language-server for VSCode and how it handles expressions

- ``gitlab-ci``:
Handle ``!reference`` tags in YAML data for gitlab-ci files. This transform
has no effect if the data is not being loaded from YAML, and it does not
interpret ``!reference`` usages -- it only expands them to lists of strings
to pass schema validation

``--fill-defaults``
-------------------

JSON Schema specifies the ``"default"`` keyword as potentially meaningful for
consumers of schemas, but not for validators. Therefore, the default behavior
for ``check-jsonschema`` is to ignore ``"default"``.

``--fill-defaults`` changes this behavior, filling in ``"default"`` values
whenever they are encountered prior to validation.

.. warning::

There are many schemas which make the meaning of ``"default"`` unclear.
In particular, the behavior of ``check-jsonschema`` is undefined when multiple
defaults are specified via ``anyOf``, ``oneOf``, or other forms of polymorphism.

"format" Validation Options
---------------------------

@@ -253,3 +205,61 @@ follows:
always passes. Otherwise, check validity in the python engine.
* - python
- Require the regex to be valid in python regex syntax.

Other Options
--------------

``--default-filetype``
~~~~~~~~~~~~~~~~~~~~~~

The default filetype to assume on instance files when they are detected neither
as JSON nor as YAML.

For example, pass ``--default-filetype yaml`` to instruct that files which have
no extension should be treated as YAML.

By default, this is not set and files without a detected type of JSON or YAML
will fail.

``--data-transform``
~~~~~~~~~~~~~~~~~~~~

``--data-transform`` applies a transformation to instancefiles before they are
checked. The following transforms are supported:

- ``azure-pipelines``:
"Unpack" compile-time expressions for Azure Pipelines files, skipping them
for the purposes of validation. This transformation is based on Microsoft's
language-server for VSCode and how it handles expressions

- ``gitlab-ci``:
Handle ``!reference`` tags in YAML data for gitlab-ci files. This transform
has no effect if the data is not being loaded from YAML, and it does not
interpret ``!reference`` usages -- it only expands them to lists of strings
to pass schema validation

``--fill-defaults``
~~~~~~~~~~~~~~~~~~~

JSON Schema specifies the ``"default"`` keyword as potentially meaningful for
consumers of schemas, but not for validators. Therefore, the default behavior
for ``check-jsonschema`` is to ignore ``"default"``.

``--fill-defaults`` changes this behavior, filling in ``"default"`` values
whenever they are encountered prior to validation.

.. warning::

There are many schemas which make the meaning of ``"default"`` unclear.
In particular, the behavior of ``check-jsonschema`` is undefined when multiple
defaults are specified via ``anyOf``, ``oneOf``, or other forms of polymorphism.

``--base-uri``
~~~~~~~~~~~~~~

``check-jsonschema`` defaults to using the ``"$id"`` of the schema as the base
URI for ``$ref`` resolution, falling back to the retrieval URI if ``"$id"`` is
not set.

``--base-uri`` overrides this behavior, setting a custom base URI for ``$ref``
resolution.
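
As a rough illustration of why the base URI matters, the sketch below resolves the same relative ``$ref`` against two different base URIs. It uses only ``urllib.parse.urljoin`` from the standard library and is not how ``check-jsonschema`` performs resolution internally; ``resolve_ref`` is a hypothetical helper, and the URIs are borrowed from the acceptance test in this commit.

```python
from urllib.parse import urljoin


def resolve_ref(base_uri: str, ref: str) -> str:
    """Resolve a relative $ref against a base URI (hypothetical helper)."""
    return urljoin(base_uri, ref)


# URIs taken from the acceptance test in this commit.
retrieval_uri = "https://example.org/retrieval-and-in-schema-only/schemas/main"
explicit_base_uri = "https://example.org/schemas/main"
ref = "./title_schema.json"

# Without --base-uri, the "$id" (here equal to the retrieval URI) is the base.
default_target = resolve_ref(retrieval_uri, ref)

# With --base-uri, the explicit value is the base instead.
overridden_target = resolve_ref(explicit_base_uri, ref)

print(default_target)
print(overridden_target)
```

Under these assumptions, the relative reference points at a different document depending on which base URI is in effect, which is the situation ``--base-uri`` is meant to address.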
21 changes: 18 additions & 3 deletions src/check_jsonschema/cli/main_command.py
@@ -94,6 +94,14 @@ def pretty_helptext_list(values: list[str] | tuple[str, ...]) -> str:
"it will be downloaded and cached locally based on mtime."
),
)
@click.option(
"--base-uri",
help=(
"Override the base URI for the schema. The default behavior is to "
"follow the behavior specified by the JSON Schema spec, which is to "
"prefer an explicit '$id' and fall back to the retrieval URI."
),
)
@click.option(
"--builtin-schema",
help="The name of an internal schema to use for '--schemafile'",
@@ -212,6 +220,7 @@ def main(
*,
schemafile: str | None,
builtin_schema: str | None,
base_uri: str | None,
check_metaschema: bool,
no_cache: bool,
cache_filename: str | None,
@@ -230,6 +239,7 @@ def main(
args = ParseResult()

args.set_schema(schemafile, builtin_schema, check_metaschema)
args.base_uri = base_uri
args.instancefiles = instancefiles

normalized_disable_formats: tuple[str, ...] = tuple(
@@ -264,13 +274,18 @@ def main(

def build_schema_loader(args: ParseResult) -> SchemaLoaderBase:
if args.schema_mode == SchemaLoadingMode.metaschema:
return MetaSchemaLoader()
return MetaSchemaLoader(base_uri=args.base_uri)
elif args.schema_mode == SchemaLoadingMode.builtin:
assert args.schema_path is not None
return BuiltinSchemaLoader(args.schema_path)
return BuiltinSchemaLoader(args.schema_path, base_uri=args.base_uri)
elif args.schema_mode == SchemaLoadingMode.filepath:
assert args.schema_path is not None
return SchemaLoader(args.schema_path, args.cache_filename, args.disable_cache)
return SchemaLoader(
args.schema_path,
args.cache_filename,
args.disable_cache,
base_uri=args.base_uri,
)
else:
raise NotImplementedError("no valid schema option provided")

1 change: 1 addition & 0 deletions src/check_jsonschema/cli/parse_result.py
@@ -19,6 +19,7 @@ def __init__(self) -> None:
# primary options: schema + instances
self.schema_mode: SchemaLoadingMode = SchemaLoadingMode.filepath
self.schema_path: str | None = None
self.base_uri: str | None = None
self.instancefiles: tuple[str, ...] = ()
# cache controls
self.disable_cache: bool = False
22 changes: 19 additions & 3 deletions src/check_jsonschema/schema_loader/main.py
@@ -61,11 +61,13 @@ def __init__(
schemafile: str,
cache_filename: str | None = None,
disable_cache: bool = False,
base_uri: str | None = None,
) -> None:
# record input parameters (these are not to be modified)
self.schemafile = schemafile
self.cache_filename = cache_filename
self.disable_cache = disable_cache
self.base_uri = base_uri

# if the schema location is a URL, which may include a file:// URL, parse it
self.url_info = None
@@ -104,7 +106,10 @@ def get_schema_retrieval_uri(self) -> str | None:
return self.reader.get_retrieval_uri()

def get_schema(self) -> dict[str, t.Any]:
return self.reader.read_schema()
data = self.reader.read_schema()
if self.base_uri is not None:
data["$id"] = self.base_uri
return data

def get_validator(
self,
@@ -145,18 +150,29 @@ def get_validator(


class BuiltinSchemaLoader(SchemaLoader):
def __init__(self, schema_name: str) -> None:
def __init__(self, schema_name: str, base_uri: str | None = None) -> None:
self.schema_name = schema_name
self.base_uri = base_uri
self._parsers = ParserSet()

def get_schema_retrieval_uri(self) -> str | None:
return None

def get_schema(self) -> dict[str, t.Any]:
return get_builtin_schema(self.schema_name)
data = get_builtin_schema(self.schema_name)
if self.base_uri is not None:
data["$id"] = self.base_uri
return data


class MetaSchemaLoader(SchemaLoaderBase):
def __init__(self, base_uri: str | None = None) -> None:
if base_uri is not None:
raise NotImplementedError(
"'--base-uri' was used with '--metaschema'. "
"This combination is not supported."
)

def get_validator(
self,
path: pathlib.Path,
40 changes: 40 additions & 0 deletions tests/acceptance/test_remote_ref_resolution.py
@@ -141,3 +141,43 @@ def test_ref_resolution_does_not_callout_for_absolute_ref_to_retrieval_uri(
assert result.exit_code == 0, output
else:
assert result.exit_code == 1, output


# this test ensures that `$id` is overwritten when `--base-uri` is used
@pytest.mark.parametrize("check_passes", (True, False))
def test_ref_resolution_with_custom_base_uri(run_line, tmp_path, check_passes):
retrieval_uri = "https://example.org/retrieval-and-in-schema-only/schemas/main"
explicit_base_uri = "https://example.org/schemas/main"
main_schema = {
"$id": retrieval_uri,
"$schema": "http://json-schema.org/draft-07/schema",
"properties": {
"title": {"$ref": "./title_schema.json"},
},
"additionalProperties": False,
}
title_schema = {"type": "string"}

responses.add("GET", retrieval_uri, json=main_schema)
responses.add(
"GET", "https://example.org/schemas/title_schema.json", json=title_schema
)

instance_path = tmp_path / "instance.json"
instance_path.write_text(json.dumps({"title": "doc one" if check_passes else 2}))

result = run_line(
[
"check-jsonschema",
"--schemafile",
retrieval_uri,
"--base-uri",
explicit_base_uri,
str(instance_path),
]
)
output = f"\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
if check_passes:
assert result.exit_code == 0, output
else:
assert result.exit_code == 1, output
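
The test above depends on the loader overwriting ``$id`` when a base URI is given. A minimal sketch of that override follows, assuming a hypothetical standalone helper named ``with_base_uri`` that is not part of check-jsonschema. Unlike the ``get_schema`` methods in this commit, which assign ``data["$id"]`` in place, this variant returns a shallow copy so the caller's dict is left untouched; that is a deliberate swap for illustration, not the project's behavior.

```python
from __future__ import annotations

import copy
from typing import Any


def with_base_uri(schema: dict[str, Any], base_uri: str | None) -> dict[str, Any]:
    """Return the schema with "$id" replaced by base_uri (hypothetical helper).

    When base_uri is None the schema is returned unchanged, mirroring the
    "is not None" check in the commit's get_schema methods.
    """
    if base_uri is None:
        return schema
    # A shallow copy is enough here: only the top-level "$id" key changes.
    updated = copy.copy(schema)
    updated["$id"] = base_uri
    return updated


# Schema and URIs borrowed from the acceptance test in this commit.
main_schema = {
    "$id": "https://example.org/retrieval-and-in-schema-only/schemas/main",
    "$schema": "http://json-schema.org/draft-07/schema",
    "properties": {"title": {"$ref": "./title_schema.json"}},
    "additionalProperties": False,
}

overridden = with_base_uri(main_schema, "https://example.org/schemas/main")
print(overridden["$id"])
```

Returning a copy would matter mostly if the same schema object were reused across calls; whether that applies to the loaders in this commit is not something the diff shows.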
