-
-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Placement New #17057
base: master
Are you sure you want to change the base?
add Placement New #17057
Conversation
Thanks for your pull request, @WalterBright! Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub run digger -- build "master + dmd#17057" |
29a5064
to
4db86a4
Compare
311f2cb
to
fd23a02
Compare
I got it to work for some basic cases. Next comes extending it to more complex ones. |
6456c9a
to
c1d3813
Compare
Placement new for structs seem to be working now! |
5f00e37
to
49f4a4c
Compare
If i'm reading this right, this will allow classes in betterC, using a kind of "allocator argument" like in $OTHER_LANGUAGES? Once upon a time I made my own internal fork of dmd that forced betterC on all D code but of course that didn't last long because it couldn't compile Phobos. (I should clarify, my usecase is making a kernel with betterC and so far I have had to use a hacky mixin and alias this in order to use inheritance. Also, will this unlock the door for interfaces?) |
16cdbd6
to
448abed
Compare
448abed
to
4087cfb
Compare
4087cfb
to
902a3a9
Compare
Placement works for class objects now! |
Did you tweak it to receive pointers as arguments? I've sent several emails recently and they've all gone silent; I'm wondering if they're getting lost. I also tried commenting on the code in GitHub. |
I haven't received emails about this. I will check my spam filter. I don't know why you cannot comment on github? I thought it was you who had gone silent! Glad that's not the case. I stewed about the placement expression as pointer for a while, and figured that making it essentially a Of course, placement new will always be unsafe, but at least we can prevent memory overruns with ref parameters. |
Weird, I've sent several messages in various ways. Nar, it's really not appropriate as a ref. I feel it's absolutely inappropriate to hand it a ref to a T to be initialized... that certainly implies that the T pre-exists. And also the asymmetry with the class case really bugs me. Your error message "new must receive an lvalue" gives the wrong impression; an lvalue in my mind suggests that the value exists, but before the new, there is no value, just vacant memory. I think I've convinced myself while writing this that even T* which I initially felt attracted to is wrong, and placement new should only receive void* as input... that correctly expresses the operation. |
Glad your review is getting through this time. Some good thoughts!
|
P.S. lifetime analysis of a pointer to an object is more difficult than lifetime analysis of an object, because of the possibility of multiple pointers to the same object. |
There is no lifetime before the call to new. The lifetime begins with new, and before new, there is only a pointer to bytes in memory. The placement new expression might in the future have integrated into it some initiation of lifetime for any future lifetime analysis; it's one of the reasons I think it's important to introduce it and concentrate initialisation in an intrinsic expression. Your particular case with a union is obviously legit, but I assure you it's an edge case; every other case where you need placement new is when calling malloc() (or some similar allocator) to get memory, and then emplacing runtime values into that allocation. If you're not comfortable with void*, then consider void[] which you can use for bound checking. Your comment about lifetime analysis on any object leading IN to the new() expression is not something that makes sense to me... there is no lifetime of anything to speak of prior to the new... that's exactly what new means; it takes raw memory and turns it into live values. The union case is actually a sort of aberration if anything. Personally I don't much like unions fundamentally from a conceptual point of view; because they express an aggregate of values despite all, or all-but-one of those values are completely invalid. So, assuming If it receives new(&myUnion.member) T(); Where the pointer will naturally cast to The solution that gives you bounds checking is to define new((&myUnion.member)[0..1]) T(); So, I'm taking the address of the union member, and from that pointer taking a 1-length slice. That slice will naturally casts to I grant that looks a tad awkward, but I think it's pretty much only the union case which is subject to this weird contortion, and my feeling is that it's appropriate to place that burden on the union case since they're already an unsavoury and weird case themselves. I would load the union case with the unnatural thunk, rather than requiring all the natural cases to be burdened with an unnatural thunk. The natural cases look like this: T* newObject = new(allocateMem(T.sizeof)) T; Given some allocator Even CRT malloc always seems to be wrapped in some function like I think that's the way forward. |
Actually, thinking a little more, I realised one more really common case: void opAssign(ref typeof(this) rh)
{
this.destroy();
new(&this) typeof(this)(rh);
} It's common to fabricate a copy assignment from a copy constructor using that pattern. And so this is another case where It's always sat a little awkwardly with me that you have some T, you may destroy it and then pass that same T reference forward into a reconstruction... there's a period between destruction and reconstruction where you're handing a reference to raw and invalid memory, but it's still typed like a valid T. From a purist perspective, it would be nice if the language were specified that any time you are holding a T*, then you are holding a valid and live value. That may even be a convenient piece of language spec in the event you were planning to implement lifetime analysis for instance. Okay, so here's my idea; we have This actually feels quite natural to me; if you destroy a value, that terminates the value's lifetime, there is no more value and so there is no more T* object = ...;
void[] mem = destroy(object); // returns the former values buffer
T* newObject = new(mem) T; // recycle the buffer directly Or simply: void opAssign(ref typeof(this) rh)
{
new(this.destroy) typeof(this)(rh);
} That reads nicely, the types all flow... but what's really nice is that because destroy returns the sized void buffer, you are no longer encouraged to handle the typed pointer post-destruction! Terrible idea? After a call to destroy where you receive the value's former buffer as result; it is UB or even a compile error to touch the dead-but-still-typed pointer from that event onwards. In the future; ideally, destroy would be intrinsic too, and not just call the destructor but also implement the termination of the value's lifetime. Maybe you should recycle |
Yeah, but tough; placement new is FOR dealing with freely allocated objects. Your union case is an edge case, and it's also a language aberration. For what it's worth; I find that I rarely use unions in my own code. I usually place an embedded buffer and implement emplace operations and casts in the methods that access what might have otherwise been different members in a union. |
This code in dmd's constfold.d:
(*) would be:
It looks nice, and no ugly casting or use of &. Compile time checking of sizes, instead of runtime. The "joints" between untyped data and typed data are where user error creeps in. Minimizing this is good. This is why I want to try this approach. I've made some progress in the dmd source with eliminating the use of the & operator. The results are appealing, more readable, and safer. I know I haven't addressed all your points, more later. |
I forgot to respond to one of your points:
I just want to point out here, that the idea of the lifetime of something being unceremoniously terminated is definitely not something that should be happening in any case ever. Cases of buffer recycling should have been destroyed prior. The proposal I made above where destroy returns you the buffer that you need for a memory recycling operation feels like the natural way to deal with this situation; it naturally avoids this "unceremoniously terminating" live values, since they are not valid function arguments to begin with. In fact, I would say that the pattern where destroy returns a |
I'm sorry, but using a union as primary (singular!) case-study is completely out of order. You can write this: new((&ue)[0..1]) IntegerExp(loc, ~e1.toInteger(), type); Yeah, it's not perfect, but the union is actually the problem... don't wreck everything for this one degenerate case. My wall of text above should be seriously considered... I think it's a really strong design, and solves a whole lot of problems that C++ (and D) has with lifetime flow, it makes the start/end of lifetimes feel natural while also making the typed variables naturally match the lifetimes of the values they represent, and it also solves convenient recycling of memory. |
Given:
how does this look:
? Where:
Not bad! Only one dirty cast for all the new's, no &, and the user needn't use a |
|
This is what a function called void[] mallocate(size_t);
ref T mallocate(T)()
{
return *new(mallocate(T.sizeof)) T;
} No casts. It's completely @safe. No function that returns a T should return an invalid/uninitialised T ever. |
My whole point is, there's an opportunity here to eliminate UB, invalid memory references, and unsafe casts associated with lifetime management from the language specification entirely... and that's a HUGE deal. If you haven't understood the point of what I'm suggesting, then we need to get on a call and talk. |
We both want the same thing. But realistically, there's no way to not have UB when allocating and initializing memory. That's why at some point it's going to have to not be verifiably safe by the compiler. As for void initialized objects,
we already have that in the language, and it is supported. But only in system code. Casting an array of void into a real object is always going to be system code. I appreciate your efforts to make this as safe as practical. But I also want the union case to be easy - this also applies to option types and sum types. And I want things to be correct without adding runtime checks. So here goes, I verified that it works:
I changed your proposal of a slice
Efficient, compact, and no monkey business. |
This said though; you keep using And sorry, let me be more clear; yes I understand there will be unsafe escape hatches (union for instance, that's unsafe by definition), and some low-level coercions are still available and useful, but what I'm trying to achieve is where the common case (allocate and initialise) have no casts, no weird shit, so that flow can actually be completely safe. I thought about static array briefly too. It seems workable, but it's got some ergonomic problems. You can't pass fixed arrays easily, because you need a different overload of a function for every array size, which means this will always be wrapped in something that; receives a dynamic array, does a bounds check, and then slices the appropriately bounded static array, then pass to a call like you show above. The nice thing about a fixed array is that you can elide runtime bounds checking, which is nice, but I think the ergonomic problem carries more weight. Memory allocation is NEVER a high-frequency operation; I can't imagine any situation where I wouldn't tolerate a runtime bounds check in the Perhaps you'd consider accepting BOTH fixed array and also dynamic array; if a dynamic array was supplied, perform a runtime bounds check, static array, static bounds check? I want to re-enforce though that my proposal is designed to work very nicely if we also make |
It doesn't work because it returns a
Placement new has a major use case for unions, option types, and sum types, and these can be very high frequency. See CTFE, which I sped up quite a bit by using emplace on a stack allocated union. I've also written a Javascript interpreter - the operand stack is a critical high-frequency piece of code, and the operands get overwritten like any stack. As I've shown, the static array method generates optimal code, so the user doesn't need to hack around it.
Not a problem, it's still castable to a pointer to a static array, since the length is known in advance.
I haven't implemented placement new for variable sized arrays yet. The proposed solution uses a template to take care of placement new expressions for types of fixed size. I don't think it is a big deal, it's just a few instructions.
I expect a different implementation will be required for new'ing variable length arrays, meaning a
Placement new is never going to be safe. Prefixing a destroy() means that void-initialized objects will also be destroyed, and that's UB for sure. |
This implements Placement New, described in https://github.com/WalterBright/documents/blob/master/placementnew.md