Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement borrowing #3044

Closed
wants to merge 20 commits into from
Closed

implement borrowing #3044

wants to merge 20 commits into from

Conversation

MaxBrandtner
Copy link

This commit adds borrowing to the jv c-api

As discussed this adds a borrowed flag in the jv struct.

Adds the following functions
jv jv_borrow(jv)
jv jv_is_borrowed(jv)

Changes jv_copy() and jv_free() to return early if the jv is borrowed

still need to add documentation for the two functions

@MaxBrandtner
Copy link
Author

The documentation is now already upstreamed(seeing as you can apparently not create a pr on the docs, but edit them directly)

Would still be nice if someone reviewed it to check that there are no issues with it

@@ -33,7 +33,8 @@ struct jv_refcnt;
Really. Do not play with them. */
typedef struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot use bitfields in a public struct; their order is not defined; the compiler can order however it wants.

Copy link
Member

@emanuele6 emanuele6 Feb 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If libjq and the program that uses libjq are compiled with different compilers there are no guarantees of what bits will be used, so this is likely to not work.
Even if they are compiled with the same compiler, it may still not work because the order of bit-fields needs to only be consistent in individual compilation units; the compiler may feel like ordering them differently for optimisation reasons.

At most you can have a public struct with bit-fields, and required that the user does not touch those fields manually, so they are only used by libjq functions, but that is an error-prone, and confusing way to define the layout of a public struct in my opinion. (Also you would have to make sure that all the functions that use those bit-fields are in the same compilation unit.)
And in this case it is not possible since there are static functions defined in jv.h that access bit-fields directly without calling to libjq.

Bit-fields don't just subdivide the bits of a struct in the specified order, like regular fields do for bytes: they are different, and their order is not enforced or stable. Only really suitable for internal structs. They are not a very useful C feature because of this.
The only way to do something similar to this with a public struct is with manual masking, maybe facilitated by macros.

Copy link
Member

@emanuele6 emanuele6 Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that all compilers I tried were suspiciously all using the same layout for bit-fields.
I just learned that apparently the SysV ABI (3.1.2; for amd64) specifies how bit-fields must be layed out in memory for public structs; they need to be layed out in order like for regular fields dividing the byte from most significant bit to least significant bit. Interesting!
Still you probably should avoid using bit fields in a public struct since standard C does not support them, and libjq does not only target amd64 on UNIX-like systems (I didn't bother to check whether other standard ABIs support bit-fields in structs).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot use bitfields in a public struct; their order is not defined; the compiler can order however it wants.

C99 says, § 6.7.2.1, constraint 10:

  1. An implementation may allocate any addressable storage unit large enough to hold a bit-
    field. If enough space remains, a bit-field that immediately follows another bit-field in a
    structure shall be packed into adjacent bits of the same unit. If insufficient space remains,
    whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is
    implementation-defined. The order of allocation of bit-fields within a unit (high-order to
    low-order or low-order to high-order) is implementation-defined. The alignment of the
    addressable storage unit is unspecified.

So, yes, we can't have bit-fields in public structures when either those bits are meant to be accessed by the application or by inline functions or macros exported by the header. But, as you note, platforms often add their own specifications around some of these issues in their ABIs. SysV, for example, does, and so does Windows:

Bit fields defined as int are treated as signed. A Microsoft extension to the ANSI C standard allows char and long types (both signed and unsigned) for bit fields. Unnamed bit fields with base type long, short, or char (signed or unsigned) force alignment to a boundary appropriate to the base type.

Bit fields are allocated within an integer from least-significant to most-significant bit.

The Windows ABI convention packs bit fields into single storage integers, and doesn't straddle storage units.

Which takes care of all concerns here for Windows. I believe we'll find the same is true for Solaris/Illumos, Linux, and OS X. I believe all of those follow SVID to some degree in their ABIs, except maybe OS X.

That said, I've no problem with using a bit of pad_ w/o using bit fields just in case.

Copy link
Member

@emanuele6 emanuele6 Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, but what about non-amd64 architectures?

Anyway, those bit-fields should be replaced with masking with get/set macros, yes.
Also, I suggest using the most signficant bit for the boolean flag and the least significant bits for the integer instead of the other way around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get the rest of this PR in good enough shape to rebase-and-merge and then we can resolve this comment.

Copy link
Author

@MaxBrandtner MaxBrandtner Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If bitfields are an issue we could use mod operations instead and perhaps rename the pad_ var to bitfield

I would like to avoid that solution though as a.borrowed is more intuitive than a.bitfield % 2

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw that mongoose uses bitfields as well. Seeing as mongoose is quite widely used and they showcase it being used on different architectures, I do not think bitfields will be an issue.

Comment on lines +58 to +59
static inline jv jv_copy(jv a){if(a.borrowed) return a; return jv__copy(a);}
static inline void jv_free(jv a){if(!a.borrowed) jv__free(a);}
Copy link
Member

@emanuele6 emanuele6 Feb 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making those functions static wrechs the api, now the jv_copy and jv_free symbols don't exist anymore.
Also likely to be broken since a static function imported into different compilation units is accessing a bitfield without calling libjq to let it do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making these static inline was my suggestion. We can still export the original jv_copy()/jv_free() from libjq using a bit of pre-processor conditionals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to address this. When this PR is good shape I'll propose a patch.

src/jv.h Outdated Show resolved Hide resolved
@nicowilliams
Copy link
Contributor

The commit needs a longer message showing what is the intended use of this.

@nicowilliams
Copy link
Contributor

Thanks by the way. This will make jv functions more ergonomic in out-of-tree applications.

@nicowilliams
Copy link
Contributor

Hmm, this feature should be awesome, but we need to be very careful. Consider this:

  o = jv_object_set(jv_copy(o), jv_copy(k), jv_copy(v));

What happens if all of o, k, and v are borrowed and have reference count == 1?

Well, we do get a new object, so if after the borrow is undone the original o is jv_free()ed elsewhere, that's not so bad, but if you look at jvp_object_unshare() you'll see that we do additional jv_copy()s there, and so...

I strongly suspect that this just isn't going to work unless we use a variant of jv_copy()/jv_free() that ignores the borrowed flag and use those judiciously. Basically, whenever we take a reference but don't actually store the referenced copy anywhere that will outlive the local scope, then we can use the modified jv_copy()/jv_free(), but whenever we'll store the "copy" in any slot that will outlive the current local scope then we must actually increment the reference count. And that brings us to:

  • this PR needs a test
  • and changes to jv_object_set()/jv_object_delete() (or the corresponding jvp_object_*() functions), and jv_array_set() to make this safe

@MaxBrandtner
Copy link
Author

MaxBrandtner commented Feb 20, 2024

I strongly suspect that this just isn't going to work unless we use a variant of jv_copy()/jv_free() that ignores the borrowed flag and use those judiciously. Basically, whenever we take a reference but don't actually store the referenced copy anywhere that will outlive the local scope, then we can use the modified jv_copy()/jv_free(), but whenever we'll store the "copy" in any slot that will outlive the current local scope then we must actually increment the reference count. And that brings us to:

I guess jv__copy() could be used in such cases, but maybe it would be better to rename the original jv__copy() function to something more descriptive

Should I add a jv_unborrow() function that increments the reference count and returns a jv with the borrow flag unset? This would behave like jv_copy() only that it also resolves the issue regarding borrowed elements being added to an unborrowed jv

@nicowilliams
Copy link
Contributor

Should I add a jv_unborrow() function that increments the reference count and returns a jv with the borrow flag unset? This would behave like jv_copy() only that it also resolves the issue regarding borrowed elements being added to an unborrowed jv

Yes, I think that's a good name. The rule would be that any time you want a dynamic scope reference you use jv_copy(), and any time you want an indefinite lifetime reference you use jv_unborrow().

@MaxBrandtner
Copy link
Author

@nicowilliams would the borrow issue just apply to jv_object_set() or also to other functions

@MaxBrandtner
Copy link
Author

I did not find jv_copy() in jv_object_set()

Here are all the functions that return jv or some form of jvp that contain jv_copy() where it could be relevant

static jv jvp_object_unshare(jv); 
jv jv_array_get(jv, int);
static jv* jvp_array_write(jv*, int);
jv jv_invalid_get_msg(jv);

@MaxBrandtner
Copy link
Author

are there any functions outside jv.c that need jv_unborrow?

@nicowilliams
Copy link
Contributor

@nicowilliams would the borrow issue just apply to jv_object_set() or also to other functions

It would apply to all container mutation functions. There's only arrays and objects as containers. You should arrange that the values stored by jv_object_set() and jv_array_set() are always non-borrowed, which means they'll automatically always be "copied" when you jv_object_get(), jv_array_get(), or iterate them.

@nicowilliams
Copy link
Contributor

are there any functions outside jv.c that need jv_unborrow?

Until we start making use of jv_borrow() in src/execute.c, then no, I don't think so. If we start making use of jv_borrow() in src/execute.c then everything to do with setting $bindings in the VM will have to unborrow.

@nicowilliams
Copy link
Contributor

I did not find jv_copy() in jv_object_set()

Here are all the functions that return jv or some form of jvp that contain jv_copy() where it could be relevant

static jv jvp_object_unshare(jv); 
jv jv_array_get(jv, int);
static jv* jvp_array_write(jv*, int);
jv jv_invalid_get_msg(jv);

See earlier comment. As long as you store unborrowed values in containers, then you don't have to alter the container read/iteration paths.

(Ah, yes, of course, jv_invalid_with_msg() is a container too, albeit of just one value.)

@MaxBrandtner
Copy link
Author

what edge-cases do you think are relevant to test for in the test. Of course we should test the main use-case(skipping jv_copy() and jv_free()), as well as perhaps jv_object_set() and jv_array_set(). Is there anything I'm missing?

@nicowilliams
Copy link
Contributor

what edge-cases do you think are relevant to test for in the test. Of course we should test the main use-case(skipping jv_copy() and jv_free()), as well as perhaps jv_object_set() and jv_array_set(). Is there anything I'm missing?

I think you want to write a test where you do build objects/arrays/invalids-with-messages with keys/values that are borrowed, then later free everything, and when run with ASAN/valgrind this should yield no errors. You should also pass borrowed values to other jv_*() functions, like jv_sort(), jv_cmp(), jv_array/object_get(), etc, and again, there should be no memory errors.

@MaxBrandtner
Copy link
Author

MaxBrandtner commented Feb 26, 2024

I think the following functions should be checked

jv_sort()
jv_cmp() //unsure
JV_ARRAY()
jv_invalid_get_msg() //unsure
jv_invalid_with_msg() //unsure
jv_equal()
jv_getpath()
jv_keys()
jv_setpath()
jv_array_append()
jv_array_concat()
jv_array_get()
jv_array_length() //unsure
jv_object_get()

Are there any other functions (particularly jv_array_ / object functions that should be tested)?

@MaxBrandtner
Copy link
Author

jv_array_concat() and jv_array_append() produce borrowed outputs. jv_setpath() produces memory errors

@MaxBrandtner
Copy link
Author

well it seem as if to integrate jv_borrow() instead of just returning the input as the output(and basically assuming the input to be consumed), at any point were that occurs jv_unborrow() needs to be called if the jv is borrowed and return the value directly otherwise.

Or jv_unborrow() could just consume its input. Meaning wherever in the source code there is a function like this

jv func(jv i){
  //modify i
  return i;
}

return jv_unborrow(i); should be called instead

src/jv.c Outdated Show resolved Hide resolved
src/jv.c Outdated Show resolved Hide resolved
src/jv.c Outdated Show resolved Hide resolved
src/jv.c Outdated Show resolved Hide resolved
src/jv.c Show resolved Hide resolved
src/jv.c Outdated Show resolved Hide resolved
src/jv.c Outdated Show resolved Hide resolved
src/jv_aux.c Outdated Show resolved Hide resolved
src/jv_aux.c Outdated Show resolved Hide resolved
src/jv_aux.c Outdated Show resolved Hide resolved
@MaxBrandtner
Copy link
Author

The test should definitely be refactored

@MaxBrandtner
Copy link
Author

Perhaps

  • jq_set_attrs()
  • jq_set_attr()
  • jq_get_attr()
  • jq_halt()
  • jq_start()

need to also be adjusted

@MaxBrandtner
Copy link
Author

Actually, jv_return() only needs to be used in a very limited number of circumstances

  • when using jvp
  • when returning a jv* to which the borrowed jv was added
  • when returning an struct to which the borrowed jv was added
  • when returning the borrowed jv

@nicowilliams
Copy link
Contributor

Perhaps

* jq_set_attrs()

* jq_set_attr()

* jq_get_attr()

* jq_halt()

* jq_start()

need to also be adjusted

I don't think so. Those ultimately use arrays/objects/invalids to store their values.

@MaxBrandtner
Copy link
Author

MaxBrandtner commented Feb 29, 2024

Also what checks tests should be in the final pr?
I think the following would be good to check

  • setting jv_array_set / jv_object_set
  • borrow safety(aka assigning a jv to an array or object, then freeing it and seeing if the value is still accessible in the array/object)
  • I guess we could check for every function that does some borrow safety stuff whether it is still borrow safe, but that would make the check quite long

@MaxBrandtner
Copy link
Author

I rewrote the check now to check the following functions comprehensively

  • jv_copy

  • jv_borrow

  • jv_free

  • jv_object_set

  • jv_array_set

  • jv_invalid_with_msg

  • jv_string_concat

  • jv_string_append_buf

@MaxBrandtner
Copy link
Author

I think this PR probably should be officially NACKed. While I still think there might be performance benefits to not writing / reading a pointer all the time, this proposal adds way too much complexity to be merged into the project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants