X-Git-Url: https://git.ozlabs.org/?a=blobdiff_plain;f=ccan%2Ftdb2%2Fdoc%2Fdesign.lyx;h=ca17f8fece511ee9e31d1fb7b7280f11429273fc;hb=539f1af037858b905c50c560f2a608555d8457ff;hp=276832ead7bd9582b94cf94f32fc982b4cbf7d8d;hpb=95458bafc9dc99ac8fcd68aa8f48a9fc564e6a31;p=ccan diff --git a/ccan/tdb2/doc/design.lyx b/ccan/tdb2/doc/design.lyx index 276832ea..ca17f8fe 100644 --- a/ccan/tdb2/doc/design.lyx +++ b/ccan/tdb2/doc/design.lyx @@ -53,8 +53,8 @@ Rusty Russell, IBM Corporation \change_deleted 0 1283307542 26-July -\change_inserted 0 1283307544 -1-September +\change_inserted 0 1284423485 +14-September \change_unchanged -2010 \end_layout @@ -476,6 +476,17 @@ The tdb_open() call was expanded to tdb_open_ex(), which added an optional \begin_layout Subsubsection Proposed Solution +\change_inserted 0 1284422789 + +\begin_inset CommandInset label +LatexCommand label +name "attributes" + +\end_inset + + +\change_unchanged + \end_layout \begin_layout Standard @@ -835,6 +846,18 @@ Internal locking is required to make sure that fcntl locks do not overlap \begin_layout Standard The aim is that building tdb with -DTDB_PTHREAD will result in a pthread-safe version of the library, and otherwise no overhead will exist. + +\change_inserted 0 1284016998 + Alternatively, a hooking mechanism similar to that proposed for +\begin_inset CommandInset ref +LatexCommand ref +reference "Proposed-Solution-locking-hook" + +\end_inset + + could be used to enable pthread locking at runtime. +\change_unchanged + \end_layout \begin_layout Subsection @@ -1183,6 +1206,165 @@ reference "TDB_CLEAR_IF_FIRST-Imposes-Performance" \end_inset . +\change_inserted 0 1284015637 + +\end_layout + +\begin_layout Subsection + +\change_inserted 0 1284015716 +Extending The Header Is Difficult +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284015906 +We have reserved (zeroed) words in the TDB header, which can be used for + future features. + If the future features are compulsory, the version number must be updated + to prevent old code from accessing the database. + But if the future feature is optional, we have no way of telling if older + code is accessing the database or not. +\end_layout + +\begin_layout Subsubsection + +\change_inserted 0 1284015637 +Proposed Solution +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284016114 +The header should contain a +\begin_inset Quotes eld +\end_inset + +format variant +\begin_inset Quotes erd +\end_inset + + value (64-bit). + This is divided into two 32-bit parts: +\end_layout + +\begin_layout Enumerate + +\change_inserted 0 1284016149 +The lower part reflects the format variant understood by code accessing + the database. +\end_layout + +\begin_layout Enumerate + +\change_inserted 0 1284016639 +The upper part reflects the format variant you must understand to write + to the database (otherwise you can only open for reading). +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284016821 +The latter field can only be written at creation time, the former should + be written under the OPEN_LOCK when opening the database for writing, if + the variant of the code is lower than the current lowest variant. +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284016803 +This should allow backwards-compatible features to be added, and detection + if older code (which doesn't understand the feature) writes to the database. +\change_deleted 0 1284016101 + +\end_layout + +\begin_layout Subsection + +\change_inserted 0 1284015634 +Record Headers Are Not Expandible +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284015634 +If we later want to add (say) checksums on keys and data, it would require + another format change, which we'd like to avoid. +\end_layout + +\begin_layout Subsubsection + +\change_inserted 0 1284015634 +Proposed Solution +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284422552 +We often have extra padding at the tail of a record. + If we ensure that the first byte (if any) of this padding is zero, we will + have a way for future changes to detect code which doesn't understand a + new format: the new code would write (say) a 1 at the tail, and thus if + there is no tail or the first byte is 0, we would know the extension is + not present on that record. +\end_layout + +\begin_layout Subsection + +\change_inserted 0 1284422568 +TDB Does Not Use Talloc +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284422646 +Many users of TDB (particularly Samba) use the talloc allocator, and thus + have to wrap TDB in a talloc context to use it conveniently. +\end_layout + +\begin_layout Subsubsection + +\change_inserted 0 1284422656 +Proposed Solution +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284423065 +The allocation within TDB is not complicated enough to justify the use of + talloc, and I am reluctant to force another (excellent) library on TDB + users. + Nonetheless a compromise is possible. + An attribute (see +\begin_inset CommandInset ref +LatexCommand ref +reference "attributes" + +\end_inset + +) can be added later to tdb_open() to provide an alternate allocation mechanism, + specifically for talloc but usable by any other allocator (which would + ignore the +\begin_inset Quotes eld +\end_inset + +context +\begin_inset Quotes erd +\end_inset + + argument). +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284423042 +This would form a talloc heirarchy as expected, but the caller would still + have to attach a destructor to the tdb context returned from tdb_open to + close it. + All TDB_DATA fields would be children of the tdb_context, and the caller + would still have to manage them (using talloc_free() or talloc_steal()). +\change_unchanged + \end_layout \begin_layout Section @@ -1760,7 +1942,7 @@ status open \begin_layout Plain Layout -\change_inserted 0 1283310945 +\change_inserted 0 1284424151 Using \begin_inset Formula $2^{16+N*3}$ \end_inset @@ -1771,6 +1953,8 @@ means 0 gives a minimal 65536-byte zone, 15 gives the maximal byte zone. Zones range in factor of 8 steps. + Given the zone size for the zone the current record is in, we can determine + the start of the zone. \change_unchanged \end_layout @@ -2215,6 +2399,8 @@ TDB Does Not Have Snapshot Support \begin_layout Subsubsection Proposed Solution +\change_deleted 0 1284423472 + \end_layout \begin_layout Standard @@ -2227,7 +2413,23 @@ use a real database \begin_inset Quotes erd \end_inset + +\change_inserted 0 1284423891 + +\change_deleted 0 1284423891 . + +\change_inserted 0 1284423901 + (but see +\begin_inset CommandInset ref +LatexCommand ref +reference "replay-attribute" + +\end_inset + +). +\change_unchanged + \end_layout \begin_layout Standard @@ -2250,6 +2452,8 @@ This would not allow arbitrary changes to the database, such as tdb_repack \begin_layout Standard We could then implement snapshots using a similar method, using multiple different hash tables/free tables. +\change_inserted 0 1284423495 + \end_layout \begin_layout Subsection @@ -2269,6 +2473,18 @@ Proposed Solution \end_layout \begin_layout Standard + +\change_inserted 0 1284424201 +None (but see +\begin_inset CommandInset ref +LatexCommand ref +reference "replay-attribute" + +\end_inset + +). + +\change_unchanged We could solve a small part of the problem by providing read-only transactions. These would allow one write transaction to begin, but it could not commit until all r/o transactions are done. @@ -2454,6 +2670,53 @@ At some later point, a sync would allow recovery of the old data into the free lists (perhaps when the array of top-level pointers filled). On crash, tdb_open() would examine the array of top levels, and apply the transactions until it encountered an invalid checksum. +\change_inserted 0 1284423555 + +\end_layout + +\begin_layout Subsection + +\change_inserted 0 1284423617 +Tracing Is Fragile, Replay Is External +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284423719 +The current TDB has compile-time-enabled tracing code, but it often breaks + as it is not enabled by default. + In a similar way, the ctdb code has an external wrapper which does replay + tracing so it can coordinate cluster-wide transactions. +\end_layout + +\begin_layout Subsubsection + +\change_inserted 0 1284423864 +Proposed Solution +\begin_inset CommandInset label +LatexCommand label +name "replay-attribute" + +\end_inset + + +\end_layout + +\begin_layout Standard + +\change_inserted 0 1284423850 +Tridge points out that an attribute can be later added to tdb_open (see + +\begin_inset CommandInset ref +LatexCommand ref +reference "attributes" + +\end_inset + +) to provide replay/trace hooks, which could become the basis for this and + future parallel transactions and snapshot support. +\change_unchanged + \end_layout \end_body