-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update EDTF regex to be more lenient, or use custom format instead #421
Comments
@retorquere - sorry, I should have moved the thread here earlier.
This makes sense probably. In any case, a regex doesn't seem practical. Here's what it would like in the schema, except "edtf-string": {
"title": "EDTF date string",
"description": "CSL input supports EDTF, levels 0 and 1.",
"type":"string",
"format": "edtf/level_0+1"
} It's seems like this just shouldn't go in the schema, and validation can be recommended independently somehow. Any suggestions @inukshuk? |
This may not be the ideal long-term solution, but a regex isn't well-suited to validating an edtf string. In the future, we may want to use a dedicated 'format' property here. See #421
This may not be the ideal long-term solution, but a regex isn't well-suited to validating an edtf string. In the future, we may want to use a dedicated 'format' property here. See #421
Then it looks like it's out of spec: Additional schema keywords and schema vocabularies MAY be defined by any entity. Save for explicit agreement, schema authors SHALL NOT expect these additional keywords and vocabularies to be supported by implementations that do not explicitly document such support. Implementations SHOULD ignore keywords they do not support. |
so I'd say it'd be a good thing to ship a schema with this extension, certainly if we provide a reference implementation that does do validation. |
A problem I've noticed in my dabbling with the JSON schema ecosystem is it's not very consistent. So it's not always enough to do things according to spec. Some questions we'll want to answer:
On 1, at least https://www.jsonschemavalidator.net/ ignores the "format" property. In any case, I'll leave this issue open and we can figure out WDT as we get closer to a release (including agreement among developers about 1.1 in general). I agree, though, we ideally want this:
... which would also mean adding back that "format" property, once we settle what it's value should be, etc.. |
for ajv, this would work. I could look at other json-schema implementations. This implementation could be shipped as a module that adds the validator to ajv by simply requiring the module, it's a common way to ship ajv extensions.
|
What would that look like? Care to submit a PR? |
I was wrong on the way to require it in for ajv, but I've included tests which show how to apply the formats for 4 json-schema validators. There are a ton on npmjs, I can add more implementations, but these 4 seemed to be the fastest so they should be popular. |
alternately this could be spun off as it's own npm module, or offered to @inukshuk for inclusion in EDTF.js |
BTW, I just tested the "format" property with this supposedly fast and compliant go validator, and it correctly ignored it. |
This looks nice! Where you thinking of bundling a As for the validation, the levels 0-2 are strictly in order; in the few cases where grammar rules of different levels can both produce a given string, the parser will always report the lower level. That is to say, I don't think it's necessary to define level ranges like 0+1, just specifying the highest level should work as well. Similarly, if you set the parser to a certain level, e.g. Finally, there was the old issue of season intervals. Those are currently not supported by the spec, but edtf.js supports them, specifically because @retorquere and others rightly argued that they are important for bibliographic use. Season intervals are defined at level 3 so we'd have to make up our mind how to handle those during validation. |
I think that would be the best option, really. There's currently 4 supported json-schema libraries that could mostly transfer to edtf.js as-is, and adding new ones looks to be trivial.
I've updated the implementation to support I'd be happy to submit a PR on EDTF.js. The test script (or something like it) on this repo could perhaps remain to show how to use it for this schema.
It's for this reason I've included the |
Oh, on re-reading your comment: something further, I was thinking to move the ready-to-run schema format functions now in the PR to EDTF.js |
Sounds good, let's do that! |
Is there an easy way to say "level 1 with season-range support, but not level 2"? Right now I'm walking the parse results to test for level detected, but I'm very likely re-doing work already done in EDTF.js. |
In terms of efficiency, that's not a problem. EDTF.js uses an Earley parser which can produce multiple valid derivations for the same string. So what the Since 'Level 3' doesn't actually exist I wonder if we shouldn't use a different syntax to say that we want to either include extensions or just season intervals in particular? I.e., the validation itself would stay the same for the time being, just that we use a different format to express 'edtf level 1 + season intervals'. |
Just "edtf/level-0+level-1+seasons"? So +-delimited flags, which can either be full levels, or feature components of those levels? PS - seasons are definitely important for this use case. |
Technically possible, but it'd mean implementing all format permutations. I reckon 0, 0+season-ranges, 1, 1+season-ranges, 2, 2+season-ranges would suffice for json schema formats, and for the parser, something like { level: 1, seasonRanges: true} should do. |
The EDTF spec leaves some room when describing compliance. E.g., "Level 1 is supported, and in addition the following features of level 2 are supported (list features)." so I think this is definitely preferable over using the non-existent level 3. However, I'd definitely go with "edtf/level-1+seasons-intervals" because level 1 includes level 0 already, there's no reason to declare it explicitly. Perhaps we can use something shorter than "season-intervals"; just "seasons" might be misleading, because I might want to specify something like "edtf/level-0+seasons" (since seasons is a level 1 feature). |
This way of encoding it in the format string is not the best way though. The implementations I know support type string+a fixed number of explicit formats, or just a new keyword, and then something like |
Does that mean one can use an object for the |
Oh yes, passing it as an extra option would be much preferable. |
I don't think string+format allows for format to be an object. We'd be looking at a custom keyword, which has varying support in validator libraries. |
This allow use of an edtf.js-based validator on the dates. https://github.com/retorquere/json-schema-edtf The json schema spec says tools should ignore the property if they don't know what to do with its value. Close #421
This allow use of an edtf.js-based validator on the dates. https://github.com/retorquere/json-schema-edtf The json schema spec says tools should ignore the property if they don't know what to do with its value. Close #421
While working on #420, I realize the EDTF regex is too strict. In that case, I noticed it doesn't allow date-times, but I'm sure there are others, and my regex skills are too limited to feel comfortable tackling it on that PR.
See also citation-style-language/documentation#145
See #420 (comment)
So really the question is what, more lenient, regex will work here?
The text was updated successfully, but these errors were encountered: