-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement borrowing #3044
implement borrowing #3044
Conversation
The documentation is now already upstreamed(seeing as you can apparently not create a pr on the docs, but edit them directly) Would still be nice if someone reviewed it to check that there are no issues with it |
@@ -33,7 +33,8 @@ struct jv_refcnt; | |||
Really. Do not play with them. */ | |||
typedef struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot use bitfields in a public struct; their order is not defined; the compiler can order however it wants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If libjq
and the program that uses libjq
are compiled with different compilers there are no guarantees of what bits will be used, so this is likely to not work.
Even if they are compiled with the same compiler, it may still not work because the order of bit-fields needs to only be consistent in individual compilation units; the compiler may feel like ordering them differently for optimisation reasons.
At most you can have a public struct with bit-fields, and required that the user does not touch those fields manually, so they are only used by libjq
functions, but that is an error-prone, and confusing way to define the layout of a public struct in my opinion. (Also you would have to make sure that all the functions that use those bit-fields are in the same compilation unit.)
And in this case it is not possible since there are static
functions defined in jv.h
that access bit-fields directly without calling to libjq
.
Bit-fields don't just subdivide the bits of a struct in the specified order, like regular fields do for bytes: they are different, and their order is not enforced or stable. Only really suitable for internal structs. They are not a very useful C feature because of this.
The only way to do something similar to this with a public struct is with manual masking, maybe facilitated by macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that all compilers I tried were suspiciously all using the same layout for bit-fields.
I just learned that apparently the SysV ABI (3.1.2; for amd64) specifies how bit-fields must be layed out in memory for public structs; they need to be layed out in order like for regular fields dividing the byte from most significant bit to least significant bit. Interesting!
Still you probably should avoid using bit fields in a public struct since standard C does not support them, and libjq
does not only target amd64
on UNIX-like systems (I didn't bother to check whether other standard ABIs support bit-fields in structs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot use bitfields in a public struct; their order is not defined; the compiler can order however it wants.
C99 says, § 6.7.2.1, constraint 10:
- An implementation may allocate any addressable storage unit large enough to hold a bit-
field. If enough space remains, a bit-field that immediately follows another bit-field in a
structure shall be packed into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is
implementation-defined. The order of allocation of bit-fields within a unit (high-order to
low-order or low-order to high-order) is implementation-defined. The alignment of the
addressable storage unit is unspecified.
So, yes, we can't have bit-fields in public structures when either those bits are meant to be accessed by the application or by inline
functions or macros exported by the header. But, as you note, platforms often add their own specifications around some of these issues in their ABIs. SysV, for example, does, and so does Windows:
Bit fields defined as
int
are treated assigned
. A Microsoft extension to the ANSI C standard allowschar
andlong
types (bothsigned
andunsigned
) for bit fields. Unnamed bit fields with base typelong
,short
, orchar
(signed
orunsigned
) force alignment to a boundary appropriate to the base type.
Bit fields are allocated within an integer from least-significant to most-significant bit.
The Windows ABI convention packs bit fields into single storage integers, and doesn't straddle storage units.
Which takes care of all concerns here for Windows. I believe we'll find the same is true for Solaris/Illumos, Linux, and OS X. I believe all of those follow SVID to some degree in their ABIs, except maybe OS X.
That said, I've no problem with using a bit of pad_
w/o using bit fields just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, but what about non-amd64 architectures?
Anyway, those bit-fields should be replaced with masking with get/set macros, yes.
Also, I suggest using the most signficant bit for the boolean flag and the least significant bits for the integer instead of the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get the rest of this PR in good enough shape to rebase-and-merge and then we can resolve this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If bitfields are an issue we could use mod operations instead and perhaps rename the pad_ var to bitfield
I would like to avoid that solution though as a.borrowed is more intuitive than a.bitfield % 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw that mongoose uses bitfields as well. Seeing as mongoose is quite widely used and they showcase it being used on different architectures, I do not think bitfields will be an issue.
static inline jv jv_copy(jv a){if(a.borrowed) return a; return jv__copy(a);} | ||
static inline void jv_free(jv a){if(!a.borrowed) jv__free(a);} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making those functions static wrechs the api, now the jv_copy
and jv_free
symbols don't exist anymore.
Also likely to be broken since a static
function imported into different compilation units is accessing a bitfield without calling libjq
to let it do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making these static inline
was my suggestion. We can still export the original jv_copy()
/jv_free()
from libjq using a bit of pre-processor conditionals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to address this. When this PR is good shape I'll propose a patch.
The commit needs a longer message showing what is the intended use of this. |
Thanks by the way. This will make |
Hmm, this feature should be awesome, but we need to be very careful. Consider this: o = jv_object_set(jv_copy(o), jv_copy(k), jv_copy(v)); What happens if all of Well, we do get a new object, so if after the borrow is undone the original I strongly suspect that this just isn't going to work unless we use a variant of
|
I guess jv__copy() could be used in such cases, but maybe it would be better to rename the original jv__copy() function to something more descriptive Should I add a jv_unborrow() function that increments the reference count and returns a jv with the borrow flag unset? This would behave like jv_copy() only that it also resolves the issue regarding borrowed elements being added to an unborrowed jv |
Yes, I think that's a good name. The rule would be that any time you want a dynamic scope reference you use |
@nicowilliams would the borrow issue just apply to jv_object_set() or also to other functions |
I did not find jv_copy() in jv_object_set() Here are all the functions that return jv or some form of jvp that contain jv_copy() where it could be relevant static jv jvp_object_unshare(jv);
jv jv_array_get(jv, int);
static jv* jvp_array_write(jv*, int);
jv jv_invalid_get_msg(jv); |
are there any functions outside jv.c that need jv_unborrow? |
It would apply to all container mutation functions. There's only arrays and objects as containers. You should arrange that the values stored by |
Until we start making use of |
See earlier comment. As long as you store unborrowed values in containers, then you don't have to alter the container read/iteration paths. (Ah, yes, of course, |
what edge-cases do you think are relevant to test for in the test. Of course we should test the main use-case(skipping jv_copy() and jv_free()), as well as perhaps jv_object_set() and jv_array_set(). Is there anything I'm missing? |
I think you want to write a test where you do build objects/arrays/invalids-with-messages with keys/values that are borrowed, then later free everything, and when run with ASAN/valgrind this should yield no errors. You should also pass borrowed values to other |
I think the following functions should be checked jv_sort()
jv_cmp() //unsure
JV_ARRAY()
jv_invalid_get_msg() //unsure
jv_invalid_with_msg() //unsure
jv_equal()
jv_getpath()
jv_keys()
jv_setpath()
jv_array_append()
jv_array_concat()
jv_array_get()
jv_array_length() //unsure
jv_object_get() Are there any other functions (particularly jv_array_ / object functions that should be tested)? |
|
well it seem as if to integrate Or jv_unborrow() could just consume its input. Meaning wherever in the source code there is a function like this jv func(jv i){
//modify i
return i;
}
|
The test should definitely be refactored |
Perhaps
need to also be adjusted |
Actually,
|
I don't think so. Those ultimately use arrays/objects/invalids to store their values. |
Also what checks tests should be in the final pr?
|
I rewrote the check now to check the following functions comprehensively
|
I think this PR probably should be officially NACKed. While I still think there might be performance benefits to not writing / reading a pointer all the time, this proposal adds way too much complexity to be merged into the project. |
This commit adds borrowing to the jv c-api
As discussed this adds a borrowed flag in the jv struct.
Adds the following functions
jv jv_borrow(jv)
jv jv_is_borrowed(jv)
Changes jv_copy() and jv_free() to return early if the jv is borrowed
still need to add documentation for the two functions