Some ideas on the problem of "old" changes #35
Replies: 3 comments 6 replies
-
Another twist is that we have to define what it means to say that someone's data is "out of date" when there's no central authority, and we have to decide who needs to catch up with who. Consider a couple of scenarios, with a 1-week calendar time cutoff.
I think there would be enough information in the sync status tables to figure out who needs to rebase just by comparing two peers' tables, since these include unfakable evidence that a peer has synced, directly or indirectly, with other peers at specific times. So perhaps we just count the number of peers each one has synced within the time limit (in this case within the last week), and whoever has been in sync with more people wins. Tiebreaker would just be something arbitrary + deterministic, like a hash of the whole sync table. |
Beta Was this translation helpful? Give feedback.
-
The admin role has unlimited powers. It's all or nothing. Perhaps "owner" or "superadmin" would be a better term. The idea is that when the founder of a group grants admin authority to another member, she is giving them complete control over the group, up to and including the right to remove any other admins, even the founder herself.* One of the basic assumptions I've made about these teams is that all authority flows from the founder. They're not democracies: They're ruled unilaterally. If the founder delegates her power to another admin, she's 100% trusting them to act on her behalf. This assumption allows us to avoid the complexity of voting or consensus mechanisms. We still need to come up with sensible mechanics for resolving situations where different admins do certain things concurrently, but that shouldn't lead us to conclude that admins are somehow untrusted. Anything an admin chooses to do is acceptable by definition. And while the logic for determining "who needs to catch up with whom" might involve something that looks like a vote, it's not that — it's not so much a security mechanism as it is a way of choosing the least surprising outcome in (hopefully) rare edge cases. *I'd eventually like to be able to make this less of an all-or-nothing thing, allowing you to create roles that have some admin powers but not all. |
Beta Was this translation helpful? Give feedback.
-
I've been thinking about this some more lately, in particular the idea of limiting old updates by calendar time, which seems like it would make the most sense from a user's point of view. In that context I noticed a problem that I don't know how to solve; maybe you have thought about it? Let's say we want to reject updates that are more than one month out of date. Even if we use a clock sync protocol to limit the clock skew between devices, we could end up in the following situation:
Now Bob and Charlie are in inconsistent states because they have accepted different sets of updates. If we can tightly bound the clock skew between devices we can reduce the probability of this situation occurring, but we cannot rule it out entirely when updates have an age that is close to the cut-off. Maybe it's okay as long as this inconsistency is only temporary, but then there needs to be some sort of process by which Bob and Charlie eventually come to agreement on which updates to accept and which to reject. None of the algorithms I can think of are really convincing, though:
Do you have any ideas on how to resolve this? |
Beta Was this translation helpful? Give feedback.
-
The problem
I’ve been thinking about the general problem of merging in old changes to a CRDT, like where a device goes offline for a year and then shows up with changes that are “concurrent” with all the activity that's happened since then.
Two examples, one without malicious intent and the other a deliberate exploit:
Alice makes a change on her phone while disconnected. Before she ever connects, she gets a new phone and puts the old phone in a drawer. A year later, she pulls out the old phone and it reconnects. Now her old change is concurrent with a year's worth of changes, and could override any number of those changes.
Eve has authorized a "sleeper" device to be used at an opportune moment. Bob removes Eve from the admin role. On her device, Eve removes Bob from the group altogether. The change shows up as concurrent with Bob's action, and Eve wins the conflict and stays in the group.
(Both examples assume well-behaved client applications. A member using a modified client would have even more ways to cause mischief.)
This is a security concern in the context of localfirst/auth, but in general it’s also a big usability problem for distributed apps. If a long-dormant device can come online and introduce a single operation that overturns months' worth of activity, people will perceive the app as unstable — even if there's no malice and no security issues involved.
And of course it's the possibility of a change being introduced anywhere in the DAG that forces us to lug around our entire history of changes until the end of time, rather than taking a snapshot at a point in time and discarding older history.
Martin @ept tells me that this is an active area of research under the label of "causal stability". As in: Application state up to and including a given action
A0
is "causally stable" if we know that we'll never see a new change that's earlier than or concurrent toA0
. That could be because we know every device has seen changeA0
, or it could be because we know that any new changes based on anything earlier thanA0
will be rejected.Collecting and sharing sync status
To know when we have causal stability, we'll need to keep track of each device's sync status and share that information as part of the sync process.
In lf/auth, every sync message that Alice sends already includes Alice's current
head
. We can add to that a timestamp (more about that later), and sign those two things. So something like... where
signature
is Alice's signature of the object{ head, timestamp }
.When Alice and Bob finish syncing — which happens when they both have the same head, and they both know they have the same head — Alice records Bob's head, timestamp, and signature under Bob's name in a dictionary that she stores alongside the chain:*
Every time we sync, we will share our entire sync status table once, perhaps as part of the initial sync message. When Alice receives Bob's sync status table, she goes through and updates each of her entries as needed: If the head Bob has listed for Charlie is more recent than what Alice has in her table, she updates hers, otherwise she ignores it.
We can trust this information 100%, even if we receive it second-hand: Bob's self-reported head has to be accurate or he'd never be able to complete the sync process. And if we learn about Charlie's status through Bob, we know that Bob hasn't modified it because it comes with Charlie's signature.
If our sync status table includes every device used by every member of the team, then
A1
is the earliest action listed there, andA0
is the first predecessor ofA1
that has no concurrent actions.We can now snapshot the application state as of A0 and discard everything in the chain up to and including
A0
. And we don't need to worry about a backdating attack prior to that point.Limiting old updates by calendar time
That is good and well, but it wouldn't prevent either of the scenarios described above. We might notice that all but one device is relatively up-to-date, but we can't do anything about it.
It would be great for an app to have the option of limiting how far out of date a device can be before we require it to catch up before submitting changes. That limit
L
could be a day, or a month, or a year — whatever it was, the idea would be that you couldn't base a change on a head that's older than that. Instead you'd have to catch up with the latest information, and then rebase your change onto the current head.In general we know that we can't trust people's self-reported timestamps for the purposes of determining the correct order of actions.
However, when two devices are syncing in real time, it would be reasonable to require them both to be showing approximately the same time. So when Alice receives a timestamped sync message from Bob, if his self-reported time is off by some margin — say, 5 minutes, or even an hour, doesn't need to be very precise for this purpose — she refuses to connect.
Now, we also have a timestamp attached to each action, which we haven't used for anything thus far. We still can't trust that timestamp for purposes of conflict resolution.
However, if make sure when syncing that we never accept an action timestamped in the future, we can now place an upper bound on an action's time: It can't have happened after it was first seen by someone other than the author, or after a dependent action by a different author. And a lower bound: It can't have happened before a preceding action by a different author.
This allows us to create a straightforward way to limit updates by calendar time. Suppose we set a limit of 1 month (
L
= 2629800000 ms). We identifyA1
, the latest action that has no concurrent actions and has a timestamp older thanL
. We then identifyA0
, the first action precedingA1
by a different author thanA1
. We will now refuse to accept new actions based on actions that precedeA0
. If you're syncing, and you have new changes, and you don't know aboutA0
, you will need to sync first, and then rebase your changes onA0
or a later change.Limiting old updates by logical time
An alternative, simpler, way to define
A0
would be to ignore timestamps and calendar time altogether, and set a limit based on the absolute number of changes elapsed —- sayN
= 100 changes, or 1000 changes, or what have you. Here the algorithm would be something like: Working backwards from the head, count the number of actions. When that count is greater or equal toN
and we're on an action with no concurrent actions, we've foundA0
.One objection to this is that Eve could make a change and then stuff the chain with
N
inconsequential actions in order to force others to rebase on her change. To prevent that we could decide to count runs of actions by the same author as a single increment; soN
would be redefined as the number of actions whose parents include an action by a different author.At any rate, my guess is that a calendar-based limit will make more intuitive sense to most users.
Beta Was this translation helpful? Give feedback.
All reactions