-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect validation of documents with arrays of strings in multi-fields #1971
Comments
Is that a snippet from |
When synthetic is enabled |
IINM, fields always returns an array, even if the source document was sent with a scalar value. Also, maybe we should consistently have the same behavior regardless of whether synthetic _source is used and always fetch from |
The error is a bit misleading. It really comes from |
No, this isn't coming from LogsDB. I suspect this is coming from the |
Ah yes, could be, another difference on the failing build is that all packages are tested against 8.15.0-SNAPSHOT, and then they use |
Yes, I can confirm the problem is in the It defines a number of keyword fields with the
But the multifield is not defined in ECS:
So I think we should:
|
Updated description |
We have found that some cases are caused by an issue on how we handle definitions of nested fields: elastic/package-spec#784 |
There are some errors that look like they come from when validating fields in transforms. Currently, that validation always runs with normalization: elastic-package/internal/testrunner/runners/system/tester.go Lines 1838 to 1842 in 7b03a66
Example of errors found in
Those error validating for As a note, Created PR to disable normalization when synthetics is enabled when validating fields in transforms: #2011 |
For the OTOH, one could argue that we're adding unnecessary overhead by adding a text multi-field to fields that don't require one. And deciding whether or not the tradeoff to add a subfield is worth should be made in the ECS repo. Maybe, what we should have done differently, is to actually propose changing ECS itself to be more consistent so that all
Such a test already exists: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/stack/src/javaRestTest/java/org/elasticsearch/xpack/stack/EcsDynamicTemplatesIT.java However, that test doesn't fail if there are additional multi-fields compared to what's defined in ECS: https://github.com/elastic/elasticsearch/blob/b7b1872dfac518a36fad97cf824b2348ed815ff7/x-pack/plugin/stack/src/javaRestTest/java/org/elasticsearch/xpack/stack/EcsDynamicTemplatesIT.java#L454-L456
Could we instead have a similar logic in the tests that allows additional |
Tested to run the same Elastic stack (8.15.0-SNAPSHOT) with and without enabling LogsDB , and validation for this field just fails with logsdb when checking the documents from the transform. Checking the documents returned in elastic-package by the preview api, that field just appears when enabling logsdb. So according to that, it is expected, that this is not failing when logsdb is disabled (that field is not present in the doc). I don't know the cause those fields are just written with logsdb enabled. But it looks like packages should add that field definition in the transform fields folder. |
+1, I think we should have consistency on this. Otherwise this question will re-appear from time to time.
Ah, it looks great. As it was not detecting this case I thought there was not such a test.
We could, though I still think we should try to find consistency between ECS and |
I've created an ECS issue to discuss how we should best resolve the discrepancy: elastic/ecs#2353 |
The way I see it, a mapping is "ECS-compliant" if all fields that it contains that are also defined in ECS get the correct mapping according to ECS. So mapping Therefore, what I think is:
|
Doing a local test, that If synthetics is not enabled during system tests, that explains why it does not fail in that scenario because For scenarios with synthetic enabled, that validation would fail (that field is present in the documents) if the field definition is not present in the package (data streams and transforms). |
I want to emphasize again that I really think we should always perform the same validation logic, based on |
The problem is that we cannot perform the same validation if presence of fields in |
In elastic/integrations#10353 (comment) we found documents that were causing issues about unexpected arrays of objects, but the value is a valid array of strings. An array of strings should be considered as valid if defined as a string type.
The issue seems to happen with multi-fields and synthetic source, what seems to be frequent when logsdb mode is used.
For example:
Seems to produce this error:
Workaround in the meantime
As workaround in the meantime, package developers need to import the definition for these fields from ECS using
external: ecs
.Tasks
Tasks (see #1971 (comment))
The text was updated successfully, but these errors were encountered: