Blockexchange uses merkle root and index to fetch blocks #566
Conversation
Just a quick preliminary review.
Adding some more comments as I move along.
One thing I'm seeing right now is that we have a …
@@ -37,91 +39,44 @@ type
    cid*: Cid
I think this can simplify things a lot, if we make the cid a BlockAddress?
Block can have different addresses, but if you're looking for an .address property that's based on cid, there's already a helper proc:
proc address*(b: Block): BlockAddress =
  BlockAddress(leaf: false, cid: b.cid)
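For reference, a plausible shape of the address type being discussed, written as an object variant; the field names here are assumptions taken from the snippets in this thread (and it relies on the codex/libp2p Cid type), so the actual definition in the PR may differ:
type
  BlockAddress* = object
    case leaf*: bool
    of true:
      # a leaf addressed by the merkle root CID plus its position in the tree
      treeCid*: Cid
      index*: Natural
    of false:
      # a whole block addressed by its own CID
      cid*: Cid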
@dryajov I agree with your comment. In order not to introduce this additional flow, I would do the following:
I think this is a consequence/limitation of the current multihash interfaces: there is no reason why we could not have a digest for merkleized structures which, instead of performing block-oriented hashing, uses a merkle tree/trie as its underlying digest method. I think the cleanest way to approach it would be to add a digest for our merkleized structures to multihash in libp2p. This isn't hard, but it requires moving some of our code around, including extracting the merkle tree implementation.
I'll need to think whether there is an intermediate way that allows us to handle this more gracefully, without having to modify too many dependencies. One way would be to forego the pure multihash interfaces and put them behind yet another abstraction that uses either multihash or a custom function; this might be useful for the poseidon hash as well. It would be a temporary solution, as eventually we want to consolidate these things under multiformats, but it might be too many dependencies to modify right now.
So, a good intermediate solution could be to:
I'm not settled on it yet, but it might be one pragmatic approach. Another one would be to simply bite the bullet, extract the merkle tree into its own package and add everything under libp2p; we already kinda have to work with a fork either way, at least for the next couple of months.
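To make the "yet another abstraction" idea a bit more concrete, here is a rough sketch of a digest wrapper that can dispatch to either a plain multihash or a locally supplied (merkle/poseidon-style) function. All names here are hypothetical; nothing in this sketch exists in this PR or in libp2p:
type
  HashKind = enum
    hkMultihash              # delegate to the existing multihash tables
    hkCustom                 # use a locally supplied digest function

  DigestFn = proc(data: seq[byte]): seq[byte] {.nimcall.}

  Hasher = object
    case kind: HashKind
    of hkMultihash:
      codec: string          # e.g. "sha2-256", resolved via multihash elsewhere
    of hkCustom:
      digestFn: DigestFn     # e.g. a merkle-tree or poseidon digest

proc digest(h: Hasher, data: seq[byte]): seq[byte] =
  case h.kind
  of hkMultihash:
    # the real thing would call into the multihash implementation here
    newSeq[byte]()
  of hkCustom:
    h.digestFn(data)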
I'm not sure if this is required with what I described above. I really like the idea of storing merkle trees as blocks, as it keeps the underlying flow pretty consistent; I'd like to avoid special flows as much as possible and consolidate everything under the same underlying primitives.
I think what we lack at the repo/blockstore right now is the ability to stream a single block from disk/network in case it is too large, say several megabytes in size, but this is easily fixed by extending the interface. It does require moving from protobuf to something that a) offers binary consistency (protobuf does not; different implementations will produce different byte-by-byte outputs) and b) allows streaming the contents, which again protobuf doesn't support at all. There is a wealth of serde formats out there that we eventually want to look into more closely, but for now protobuf is OK, sans the limitations.
All that said, I'm not yet settled on any specific approach, but one thing is clear: we need to get rid of the duplicated identifiers.
I merged #593, which simply removes the storageproofs directory and tests for now. If you can't get it to merge cleanly, just delete the directory and tests...
I rebased locally and it rebases cleanly, so it should work without issues.
A few comments as I go through.
iterator items*(self: BlocksIter): Future[?Cid] =
  while not self.finished:
    yield self.next()

method getBlock*(self: BlockStore, treeCid: Cid, index: Natural): Future[?!Block] {.base.} =
I'm thinking that Natural isn't a great choice for index, since it wouldn't support negatives and potentially BackwardsIndex.
Also, is there any reason not to use BlockAddress and unify the interface?
Ad1. But backwards index is a different access pattern and hence a different parameter type. This is what we have in AsyncHeapQueue:
proc `[]`*[T](heap: AsyncHeapQueue[T], i: Natural): T {.inline.} =
  ## Access the i-th element of ``heap`` by order from first to last.
  ## ``heap[0]`` is the first element, ``heap[^1]`` is the last element.
  heap.queue[i]

proc `[]`*[T](heap: AsyncHeapQueue[T], i: BackwardsIndex): T {.inline.} =
  ## Access the i-th element of ``heap`` by order from first to last.
  ## ``heap[0]`` is the first element, ``heap[^1]`` is the last element.
  heap.queue[len(heap.queue) - int(i)]
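As a small usage sketch (assuming a populated AsyncHeapQueue named heap), the two access patterns stay visibly distinct at the call site:
let first = heap[0]    # Natural: counts from the front
let last = heap[^1]    # BackwardsIndex: ^1 is the last element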
Ad2. There's a method to get a block by address (line 46), but in some cases we don't have an address object in scope; instead we have separate treeCid and index variables. We could allocate the memory and create the BlockAddress object, but: 1. that seems a bit wasteful and 2. it's an additional inconvenience to the client.
Ad1. But backwards index is a different access pattern and hence a different parameter type. This is what we have in AsyncHeapQueue
OK, not a big deal - we can't use the [] operator with async and we're unlikely to use a BackwardsIndex either way.
Ad2. There's a method to get a block by address (line 46), but in some cases we don't have an address object in scope; instead we have separate treeCid and index variables. We could allocate the memory and create the BlockAddress object, but: 1. that seems a bit wasteful and 2. it's an additional inconvenience to the client.
Hmm, that seems like a problem of not using BlockAddress consistently rather than anything else?
Hmm, that seems like a problem of not using BlockAddress consistently rather than anything else?
So in a couple of places we have the following pattern:
for i in 0..<manifest.blocksCount:
  await getBlock(manifest.treeCid, i)
Would you like it to be replaced by the following code?
for i in 0..<manifest.blocksCount:
  let address = BlockAddress.init(manifest.treeCid, i)
  await getBlock(address)
If so, I could get rid of the getBlock(treeCid, index) method, but in my opinion the second version is simply worse because it's:
- less efficient (new object created on each iteration)
- more verbose than necessary
Yeah, I understand the reasoning, but IMO having two slightly distinct interfaces will just make the codebase less consistent, and consistency trumps everything else.
We could add an initializer from a tuple, for example:
init(_: type (Cid, Natural)): BlockAddress = ...
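One possible reading of that suggestion, sketched out (the constructor body and the leaf field names are assumptions, not code from this PR):
proc init*(_: type BlockAddress, address: (Cid, Natural)): BlockAddress =
  BlockAddress(leaf: true, treeCid: address[0], index: address[1])

# call sites would then stay close to the two-argument form, e.g.
# await getBlock(BlockAddress.init((manifest.treeCid, i)))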
The point is that we stick with one convention and follow that, otherwise we'll end up with (slightly) distinct looking flows, which will make the codebase harder to read and maintain.
Alright. I understand the reasoning and I agree that unification has certain benefits (if at some point we remove fetching individual blocks by just cid, we will have less to refactor).
Also, there's more to unify if we go down this path:
method ensureExpiry*(self: BlockStore, cid: Cid, expiry: SecondsSince1970)
method delBlock*(self: BlockStore, cid: Cid)
method delBlock*(self: BlockStore, treeCid: Cid, index: Natural)
method hasBlock*(self: BlockStore, cid: Cid)
method hasBlock*(self: BlockStore, tree: Cid, index: Natural)
proc contains*(self: BlockStore, blk: Cid)
proc contains*(self: BlockStore, address: BlockAddress)
I could do this as a part of this PR or in a separate one. Wdyt?
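For illustration, a fully unified surface might look roughly like this (a sketch only; return types are omitted just as in the list above):
method ensureExpiry*(self: BlockStore, address: BlockAddress, expiry: SecondsSince1970)
method delBlock*(self: BlockStore, address: BlockAddress)
method hasBlock*(self: BlockStore, address: BlockAddress)
proc contains*(self: BlockStore, address: BlockAddress)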
p.blocks.withValue(bd.address, blockReq):
  trace "Resolving block", address = bd.address

  if not blockReq.handle.finished:
You want to use the deref operator, [], for ref/pointer types; Nim will try to infer the type, but it doesn't always work...
But it works in this case, should I add [] anyway?
Yeah, it doesn't always work, and it's also easy to miss that we're really handling a pointer when reading the code.
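A small standalone illustration of the explicit deref, using std/tables directly rather than the codex types (hypothetical example, not code from this PR):
import std/tables

var counts = {"a": 1}.toTable

# withValue injects `entry` as a ptr to the stored value; writing entry[] makes
# it explicit that we are going through a pointer rather than a copy.
counts.withValue("a", entry):
  entry[] = entry[] + 1

assert counts["a"] == 2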
Looking much better. I'll try to go through it once more, but so far I think this is mergeable.
@CodiumAI-Agent /review
@CodiumAI-Agent /improve --extended
I see that the integration tests are not passing; I will try to fix it on Monday.
It seems to be failing on marketplace-related stuff, which has been very flaky lately.
let iter = Iter.fromSlice(0..<manifest.blocksCount)
  .map((i: int) => node.blockStore.getBlock(BlockAddress.init(manifest.treeCid, i)))

for batchNum in 0..<batchCount:
This can potentially throw; we probably want this to be in the try.
Let's get this merged; we can address outstanding issues in subsequent PRs.
Resolves #511
Resolves #498