Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obscure schema validation error in case of missing required properties #242

Closed
kshpytsya opened this issue Feb 23, 2023 · 2 comments
Closed
Labels
upstream-schema-issue An issue with a schema or schema provider

Comments

@kshpytsya
Copy link

Consider the following azure-pipelines.yml file which passes validation:

stages:
- stage: build

  jobs:
  - job: build

    steps:

    - task: HelmDeploy@0
      inputs:
        azureSubscriptionForACR: ''
        azureResourceGroupForACR: ''
        azureContainerRegistry: ''

Now, remove one of those inputs, e.g. azureContainerRegistry. Instead of a error hinting that a required property is missing the following completely arcane error which includes entire input yaml formatted as json is reported:

Schema validation errors were encountered.
  azure-pipelines.yml::$: {'stages': [{'stage': 'build', 'jobs': [{'job': 'build', 'steps': [{'task': 'HelmDeploy@0', 'inputs': {'azureSubscriptionForACR': '', 'azureResourceGroupForACR': ''}}]}]}]} is not valid under any of the given schemas
  Underlying errors caused this.
  Best Match:
    $: {'stages': [{'stage': 'build', 'jobs': [{'job': 'build', 'steps': [{'task': 'HelmDeploy@0', 'inputs': {'azureSubscriptionForACR': '', 'azureResourceGroupForACR': ''}}]}]}]} is not of type 'string'

As a developer I do understand that this may be what the underlying schema validation library reports but it can hardly be called user friendly. I believe either some workaround in this tool or, better yet, an issue opened against validation library is due.

@sirosen sirosen added the upstream-schema-issue An issue with a schema or schema provider label Feb 23, 2023
@sirosen
Copy link
Member

sirosen commented Feb 23, 2023

Hi there! Your intuition on this is correct: check-jsonschema is "just" reporting a validation error from the schema.
I'll also go ahead and say that I agree, the error we get is less-than-helpful.

However, this isn't really something for check-jsonschema to handle since we're just trying to run validation and report back the error we get, and the underlying jsonschema library already has some issues regarding how to figure out a better "best match" error for some bad cases. So we have some awareness around this problem from the CLI and library in question, but theres's a third player here: the Azure Pipelines schema. That schema is a bit of a beast, and Microsoft hasn't been very receptive to feedback on ways to improve it. Of course, the error is driven by what's in that schema, so... what to do? It's not totally clear.

One thing that you can do today is use the -v/--verbose flag for check-jsonschema. That will report more errors, not only the "best match" guess at what will help you, but all of the errors reported by evaluating the schema. The structure of the Azure Pipelines schema is such that you will see many errors reported, as various branches and conditions in the schema fail to match on your pipeline config.

I take this, from the check-jsonschema perspective, at least partly as a documentation issue.
I'll think about ways of getting users seeing issues like this directed to -v/--verbose more quickly. Any thoughts you might have on how to better document this are very much welcome!

If you look at the --verbose output, you'll find the useful error you're looking for among all of the errors reported. I encourage you to check out that output -- it will help you see how this case looks to me, maintaining the CLI, and how it looks to the jsonschema library, which walks the schema and reports all of the errors.

We can specify an alternative "best match" algorithm.
I'll think about whether or not we should implement an alternative best-match mode (something like "longest error path" to look deep into the instance doc), and possibly make that the default for the azure pipelines pre-commit hook.


I don't think there's a singular clear improvement here, but I'll continue to think about what we can do.

@sirosen
Copy link
Member

sirosen commented Aug 15, 2023

I just took a crack at adding something on this front which shows the "best deep match". You can see an example of the results here: #303

I tried a number of ways of ranking the errors and it turns out that "best error" is just a very hard thing to characterize in a generic way.

The best error is the one which relates to the user's intent. That is, suppose a schema gives options for DeployX and DeployY, then even if your input is most similar to the "DeployX" shape, if you're trying to write a "DeployY" document, your preferred errors are those from the "DeployY" branch of the schema.

Sometimes, ranking required ahead of pattern is better, sometimes it's worse. etc etc. All depending on how the schema was written and what the user is trying to do.
For now, the basic approach is to show a single "deep error" from a long path in the document, and to make sure that there's a message about the number of other errors and --verbose to see them all.

This should be shipping out in the next check-jsonschema release.

@sirosen sirosen closed this as completed Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
upstream-schema-issue An issue with a schema or schema provider
Projects
None yet
Development

No branches or pull requests

2 participants