git.ozlabs.org Git - ccan/blob - ccan/crypto/shachain/design.txt

   1 Efficient Chains Of Unpredictable Numbers
   2 =========================================
   3
   4 The Problem
   5 -----------
   6
   7 The Lightning Network wants a chain of (say 1 million) unguessable 256
   8 bit values; we generate them and send them one at a time to a remote
   9 node.  We don't want the remote node to have to store all the values,
  10 so it's better if they can derive them once they see them.
  11
  12 A Simple Solution
  13 -----------------
  14
  15 A simple system is a hash chain: we select a random seed value, the
  16 hash it 1,000,000 times.  This gives the first "random" number.
  17 Hashed 999,999 times gives the second number, etc. ie:
  18
  19         R(1,000,000) = seed
  20         R(N-1) = SHA256(R(N))
  21
  22 This way the remote node needs only to remember the last R(N) it was
  23 given, and it can calculate any R for N-1 or below.
  24
  25 However, this means we need to generate 1 million hashes up front, and
  26 then do almost as many hashes to derive the next number.  That's slow.
  27
  28 A More Complex Solution
  29 -----------------------
  30
  31 Instead of a one-dimensional chain, we can use two dimensions: 1000
  32 chains of 1000 values each.  Indeed, we can set generate the "top" of
  33 each chain much like we generated a single chain:
  34
  35      Chain 1000      Chain 999        Chain 998 ...........Chain 1
  36      seed            SHA256(C1000)    SHA256(C999) ....... SHA256(C2)
  37
  38 Now, deriving chain 1000 from seed doesn't quite work, because it'll
  39 look like this chain, so we flip the lower bit to generate the chain:
  40
  41      Chain 1000      Chain 999        Chain 998 ...........Chain 1
  42 1000 seed^1          SHA256(C1000)^1  SHA256(C999)^1...... SHA256(C2)^1
  43  999 SHA256(above)   SHA256(above)    SHA256(above)  ..... SHA256(above)
  44  998 SHA256(above)   SHA256(above)    SHA256(above)  ..... SHA256(above)
  45  ...
  46
  47 Now, we can get the first value to give out (chain 1, position 1) with
  48 999 hashes to get to chain 1, and 999 hashes to get to the end of the
  49 chain.  2000 hashes is much better than the 999,999 hashes it would
  50 have taken previously.
  51
  52 Why Stop at 2 Dimensions?
  53 -------------------------
  54
  55 Indeed, the implementation uses 64 dimensions rather than 2, and a
  56 chain length of 2 rather than 1000, giving a worst-case of 63 hashes
  57 to derive any of 2^64 values.  Each dimension flips a different bit of
  58 the hash, to ensure the chains are distinct.
  59
  60 For simplicity, I'll explain what this looks like using 8 dimensions,
  61 ie. 8 bits.  The seed value always sits the maximum possible index, in
  62 this case index 0xFF (b11111111).
  63
  64 To generate the hash for 0xFE (b11111110), we need to move down
  65 dimension 0, so we flip bit 0 of the seed value, then hash it.  To
  66 generate the hash for 0xFD (b11111101) we need to move down dimension
  67 1, so we flip bit 1 of the seed value, then hash it.
  68
  69 To reach 0xFC (b11111100) we need to move down dimension 1 then
  70 dimension 0, in that order.
  71
  72 Spotting the pattern, it becomes easy to derive how to reach any value:
  73
  74          hash = seed
  75          for bit in 7 6 5 4 3 2 1 0:
  76              if bit not set in index:
  77                 flip(bit) in hash
  78                 hash = SHA256(hash)
  79
  80 Handling Partial Knowledge
  81 --------------------------
  82
  83 How does the remote node, which doesn't know the seed value, derive
  84 subvalues?
  85
  86 Once it knows the value for index 1, it can derive the value for index
  87 0 by flipping bit 0 of the value and hashing it.  In effect, it can
  88 always derive a value for any index where it only needs to clear bits.
  89
  90 So, index 1 gives index 0, but index 2 doesn't yield index 1.  When
  91 index 3 comes along, it yields 2, 1, and 0.
  92
  93 How many hash values will we have to remember at once?  The answer is
  94 equal to the number of dimensions.  It turns out that the worst case
  95 for 8 dimensions is 254 (0b11111110), for which we will have to
  96 remember the following indices:
  97
  98         127 0b01111111
  99         191 0b10111111
 100         223 0b11011111
 101         239 0b11101111
 102         247 0b11110111
 103         251 0b11111011
 104         253 0b11111101
 105         254 0b11111110
 106
 107 127 lets us derive any hash value for index <= 127.  Similarly, 191
 108 lets us derive anything > 127 but <= 191.  254 lets us derive only
 109 itself.
 110
 111 When we get index 255 this collapses, and we only need to remember
 112 that one index to derive everything.
 113
 114 Rusty Russell <rusty@rustcorp.com.au>