X-Git-Url: https://git.ozlabs.org/?p=ccan;a=blobdiff_plain;f=ccan%2Fcrypto%2Fshachain%2Fdesign.txt;h=b4639763c94b3f0122759ba846bf63bb1f035826;hp=02f4c6f6f698b49f5ccffb6beae0b3813944dc3e;hb=d23fb57c8e276090325162966b094ebf71e73e68;hpb=bb88463014092a5550ea4c51a037e12e49446722;ds=sidebyside diff --git a/ccan/crypto/shachain/design.txt b/ccan/crypto/shachain/design.txt index 02f4c6f6..b4639763 100644 --- a/ccan/crypto/shachain/design.txt +++ b/ccan/crypto/shachain/design.txt @@ -16,99 +16,107 @@ A simple system is a hash chain: we select a random seed value, the hash it 1,000,000 times. This gives the first "random" number. Hashed 999,999 times gives the second number, etc. ie: - R(1,000,000) = seed - R(N-1) = SHA256(R(N)) + R(0) = seed + R(N+1) = SHA256(R(N)) This way the remote node needs only to remember the last R(N) it was -given, and it can calculate any R for N-1 or below. +given, and it can calculate any R for N+1 or above. However, this means we need to generate 1 million hashes up front, and then do almost as many hashes to derive the next number. That's slow. -A More Complex Solution ------------------------ - -Instead of a one-dimensional chain, we can use two dimensions: 1000 -chains of 1000 values each. Indeed, we can set generate the "top" of -each chain much like we generated a single chain: - - Chain 1000 Chain 999 Chain 998 ...........Chain 1 - seed SHA256(C1000) SHA256(C999) ....... SHA256(C2) - -Now, deriving chain 1000 from seed doesn't quite work, because it'll -look like this chain, so we flip the lower bit to generate the chain: - - Chain 1000 Chain 999 Chain 998 ...........Chain 1 -1000 seed^1 SHA256(C1000)^1 SHA256(C999)^1...... SHA256(C2)^1 - 999 SHA256(above) SHA256(above) SHA256(above) ..... SHA256(above) - 998 SHA256(above) SHA256(above) SHA256(above) ..... SHA256(above) - ... - -Now, we can get the first value to give out (chain 1, position 1) with -999 hashes to get to chain 1, and 999 hashes to get to the end of the -chain. 2000 hashes is much better than the 999,999 hashes it would -have taken previously. - -Why Stop at 2 Dimensions? -------------------------- - -Indeed, the implementation uses 64 dimensions rather than 2, and a -chain length of 2 rather than 1000, giving a worst-case of 63 hashes -to derive any of 2^64 values. Each dimension flips a different bit of -the hash, to ensure the chains are distinct. - -For simplicity, I'll explain what this looks like using 8 dimensions, -ie. 8 bits. The seed value always sits the maximum possible index, in -this case index 0xFF (b11111111). - -To generate the hash for 0xFE (b11111110), we need to move down -dimension 0, so we flip bit 0 of the seed value, then hash it. To -generate the hash for 0xFD (b11111101) we need to move down dimension -1, so we flip bit 1 of the seed value, then hash it. - -To reach 0xFC (b11111100) we need to move down dimension 1 then -dimension 0, in that order. - -Spotting the pattern, it becomes easy to derive how to reach any value: - - hash = seed - for bit in 7 6 5 4 3 2 1 0: - if bit not set in index: - flip(bit) in hash - hash = SHA256(hash) - -Handling Partial Knowledge --------------------------- - -How does the remote node, which doesn't know the seed value, derive -subvalues? - -Once it knows the value for index 1, it can derive the value for index -0 by flipping bit 0 of the value and hashing it. In effect, it can -always derive a value for any index where it only needs to clear bits. - -So, index 1 gives index 0, but index 2 doesn't yield index 1. When -index 3 comes along, it yields 2, 1, and 0. - -How many hash values will we have to remember at once? The answer is -equal to the number of dimensions. It turns out that the worst case -for 8 dimensions is 254 (0b11111110), for which we will have to -remember the following indices: - - 127 0b01111111 - 191 0b10111111 - 223 0b11011111 - 239 0b11101111 - 247 0b11110111 - 251 0b11111011 - 253 0b11111101 - 254 0b11111110 - -127 lets us derive any hash value for index <= 127. Similarly, 191 -lets us derive anything > 127 but <= 191. 254 lets us derive only -itself. - -When we get index 255 this collapses, and we only need to remember -that one index to derive everything. +A Tree Solution +--------------- + +A better solution is to use a binary tree, with the seed at the root. +The left child is the same as the parent, the right child is the +SHA256() of the parent with one bit flipped (corresponding to the +height). + +This gives a tree like so: + + seed + / \ + / \ + / \ + / \ + seed SHA256(seed^1) + / \ / \ + seed SHA256(seed^2) SHA256(seed^1) SHA256(SHA256(seed^1)^2) +Index: 0 1 2 3 + +Clearly, giving R(2) allows you to derive R(3), giving R(1) allows you +to derive nothing new (you still have to remember R(2)), and giving +R(0) allows you to derive everything. + +In pseudocode, this looks like the following for a 64 bit tree: + +generate_from_seed(index): + value = seed + for bit in 0 to 63: + if bit set in index: + flip(bit) in value + value = SHA256(value) + return value + + +The Receiver's Tree +------------------- + +To derive the value for a index N, you need to have the root of a tree +which contains it. That is the same as needing an index I which is N +rounded down in binary: eg. if N is 0b001100001, you need 0b001100000, +0b001000000 or 0b000000000. + +Pseudocode: + +# Can we derive the value for to_index from from_index? +can_derive(from_index, to_index): + # to_index must be a subtree under from_index; this is the same as + # saying that to_index must be the same as from_index up to the + # trailing zeros in from_index. + for bit in count_trailing_zeroes(from_index)..63: + if bit set in from_index != bit set in to_index: + return false + return true + +# Derive a value from a lesser index: generalization of generate_from_seed() +derive(from_index, to_index, from_value): + assert(can_derive(from_index, to_index)) + value = from_value + for bit in 0..63: + if bit set in to_index and not bit set in from_index: + flip bit in value + value = SHA256(value) + return value + +If you are receiving values (in reverse order), you need to remember +up to 64 of them to derive all previous values. The simplest method +is to keep an array, indexed by the number of trailing zeroes in the +received index: + +# Receive a new value (assumes we receive them in order) +receive_value(index, value): + pos = count_trailing_zeroes(index) + # We should be able to generate every lesser value, otherwise invalid + for i in 0..pos-1: + if derive(index, value, known[i].index) != known[i].value: + return false + known[pos].index = index + known[pos].value = value + return true + +To derive a previous value, find an element in that array from which +you can derive the value you want, eg: + +# Find an old value +regenerate_value(index): + for i in known: + if can_derive(i.index, index): + return derive(i.index, i.value, index) + fail + +You can see the implementation for more optimized variants of the +above code. Rusty Russell