\change_deleted 0 1283307542
26-July
-\change_inserted 0 1284016854
-9-September
+\change_inserted 0 1284423485
+14-September
\change_unchanged
-2010
\end_layout
\begin_layout Subsubsection
Proposed Solution
+\change_inserted 0 1284422789
+
+\begin_inset CommandInset label
+LatexCommand label
+name "attributes"
+
+\end_inset
+
+
+\change_unchanged
+
\end_layout
\begin_layout Standard
\begin_layout Standard
-\change_inserted 0 1284016847
+\change_inserted 0 1284422552
We often have extra padding at the tail of a record.
If we ensure that the first byte (if any) of this padding is zero, we will
have a way for future changes to detect code which doesn't understand a
new format: the new code would write (say) a 1 at the tail, and thus if
there is no tail or the first byte is 0, we would know the extension is
not present on that record.
+\end_layout
+
+\begin_layout Subsection
+
+\change_inserted 0 1284422568
+TDB Does Not Use Talloc
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284422646
+Many users of TDB (particularly Samba) use the talloc allocator, and thus
+ have to wrap TDB in a talloc context to use it conveniently.
+\end_layout
+
+\begin_layout Subsubsection
+
+\change_inserted 0 1284422656
+Proposed Solution
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284423065
+The allocation within TDB is not complicated enough to justify the use of
+ talloc, and I am reluctant to force another (excellent) library on TDB
+ users.
+ Nonetheless a compromise is possible.
+ An attribute (see
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "attributes"
+
+\end_inset
+
+) can be added later to tdb_open() to provide an alternate allocation mechanism,
+ specifically for talloc but usable by any other allocator (which would
+ ignore the
+\begin_inset Quotes eld
+\end_inset
+
+context
+\begin_inset Quotes erd
+\end_inset
+
+ argument).
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284423042
+This would form a talloc hierarchy as expected, but the caller would still
+ have to attach a destructor to the tdb context returned from tdb_open to
+ close it.
+ All TDB_DATA fields would be children of the tdb_context, and the caller
+ would still have to manage them (using talloc_free() or talloc_steal()).
\change_unchanged
\end_layout
\begin_layout Plain Layout
-\change_inserted 0 1283310945
+\change_inserted 0 1284424151
Using
\begin_inset Formula $2^{16+N*3}$
\end_inset
byte zone.
Zones range in factor of 8 steps.
+ Given the zone size for the zone the current record is in, we can determine
+ the start of the zone.
\change_unchanged
\end_layout
\begin_layout Subsubsection
Proposed Solution
+\change_deleted 0 1284423472
+
\end_layout
\begin_layout Standard
\begin_inset Quotes erd
\end_inset
+
+\change_inserted 0 1284423891
+
+\change_deleted 0 1284423891
.
+
+\change_inserted 0 1284423901
+ (but see
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "replay-attribute"
+
+\end_inset
+
+).
+\change_unchanged
+
\end_layout
\begin_layout Standard
\begin_layout Standard
We could then implement snapshots using a similar method, using multiple
different hash tables/free tables.
+\change_inserted 0 1284423495
+
\end_layout
\begin_layout Subsection
\end_layout
\begin_layout Standard
+
+\change_inserted 0 1284424201
+None (but see
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "replay-attribute"
+
+\end_inset
+
+).
+
+\change_unchanged
We could solve a small part of the problem by providing read-only transactions.
These would allow one write transaction to begin, but it could not commit
until all r/o transactions are done.
free lists (perhaps when the array of top-level pointers filled).
On crash, tdb_open() would examine the array of top levels, and apply the
transactions until it encountered an invalid checksum.
+\change_inserted 0 1284423555
+
+\end_layout
+
+\begin_layout Subsection
+
+\change_inserted 0 1284423617
+Tracing Is Fragile, Replay Is External
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284423719
+The current TDB has compile-time-enabled tracing code, but it often breaks
+ as it is not enabled by default.
+ In a similar way, the ctdb code has an external wrapper which does replay
+ tracing so it can coordinate cluster-wide transactions.
+\end_layout
+
+\begin_layout Subsubsection
+
+\change_inserted 0 1284423864
+Proposed Solution
+\begin_inset CommandInset label
+LatexCommand label
+name "replay-attribute"
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284423850
+Tridge points out that an attribute can be later added to tdb_open (see
+
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "attributes"
+
+\end_inset
+
+) to provide replay/trace hooks, which could become the basis for this and
+ future parallel transactions and snapshot support.
+\change_unchanged
+
\end_layout
\end_body
-head 1.9;
+head 1.10;
access;
symbols;
locks; strict;
comment @# @;
+1.10
+date 2010.09.14.00.33.57; author rusty; state Exp;
+branches;
+next 1.9;
+
1.9
date 2010.09.09.07.25.12; author rusty; state Exp;
branches;
@
-1.9
+1.10
log
-@Extension mechanism.
+@Tracing attribute, talloc support.
@
text
@#LyX 1.6.5 created this file. For more info see http://www.lyx.org/
\change_deleted 0 1283307542
26-July
-\change_inserted 0 1284016854
-9-September
+\change_inserted 0 1284423485
+14-September
\change_unchanged
-2010
\end_layout
\begin_layout Subsubsection
Proposed Solution
+\change_inserted 0 1284422789
+
+\begin_inset CommandInset label
+LatexCommand label
+name "attributes"
+
+\end_inset
+
+
+\change_unchanged
+
\end_layout
\begin_layout Standard
\begin_layout Standard
-\change_inserted 0 1284016847
+\change_inserted 0 1284422552
We often have extra padding at the tail of a record.
If we ensure that the first byte (if any) of this padding is zero, we will
have a way for future changes to detect code which doesn't understand a
new format: the new code would write (say) a 1 at the tail, and thus if
there is no tail or the first byte is 0, we would know the extension is
not present on that record.
+\end_layout
+
+\begin_layout Subsection
+
+\change_inserted 0 1284422568
+TDB Does Not Use Talloc
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284422646
+Many users of TDB (particularly Samba) use the talloc allocator, and thus
+ have to wrap TDB in a talloc context to use it conveniently.
+\end_layout
+
+\begin_layout Subsubsection
+
+\change_inserted 0 1284422656
+Proposed Solution
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284423065
+The allocation within TDB is not complicated enough to justify the use of
+ talloc, and I am reluctant to force another (excellent) library on TDB
+ users.
+ Nonetheless a compromise is possible.
+ An attribute (see
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "attributes"
+
+\end_inset
+
+) can be added later to tdb_open() to provide an alternate allocation mechanism,
+ specifically for talloc but usable by any other allocator (which would
+ ignore the
+\begin_inset Quotes eld
+\end_inset
+
+context
+\begin_inset Quotes erd
+\end_inset
+
+ argument).
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284423042
+This would form a talloc hierarchy as expected, but the caller would still
+ have to attach a destructor to the tdb context returned from tdb_open to
+ close it.
+ All TDB_DATA fields would be children of the tdb_context, and the caller
+ would still have to manage them (using talloc_free() or talloc_steal()).
\change_unchanged
\end_layout
\begin_layout Plain Layout
-\change_inserted 0 1283310945
+\change_inserted 0 1284424151
Using
\begin_inset Formula $2^{16+N*3}$
\end_inset
byte zone.
Zones range in factor of 8 steps.
+ Given the zone size for the zone the current record is in, we can determine
+ the start of the zone.
\change_unchanged
\end_layout
\begin_layout Subsubsection
Proposed Solution
+\change_deleted 0 1284423472
+
\end_layout
\begin_layout Standard
\begin_inset Quotes erd
\end_inset
+
+\change_inserted 0 1284423891
+
+\change_deleted 0 1284423891
.
+
+\change_inserted 0 1284423901
+ (but see
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "replay-attribute"
+
+\end_inset
+
+).
+\change_unchanged
+
\end_layout
\begin_layout Standard
\begin_layout Standard
We could then implement snapshots using a similar method, using multiple
different hash tables/free tables.
+\change_inserted 0 1284423495
+
\end_layout
\begin_layout Subsection
\end_layout
\begin_layout Standard
+
+\change_inserted 0 1284424201
+None (but see
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "replay-attribute"
+
+\end_inset
+
+).
+
+\change_unchanged
We could solve a small part of the problem by providing read-only transactions.
These would allow one write transaction to begin, but it could not commit
until all r/o transactions are done.
free lists (perhaps when the array of top-level pointers filled).
On crash, tdb_open() would examine the array of top levels, and apply the
transactions until it encountered an invalid checksum.
+\change_inserted 0 1284423555
+
+\end_layout
+
+\begin_layout Subsection
+
+\change_inserted 0 1284423617
+Tracing Is Fragile, Replay Is External
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284423719
+The current TDB has compile-time-enabled tracing code, but it often breaks
+ as it is not enabled by default.
+ In a similar way, the ctdb code has an external wrapper which does replay
+ tracing so it can coordinate cluster-wide transactions.
+\end_layout
+
+\begin_layout Subsubsection
+
+\change_inserted 0 1284423864
+Proposed Solution
+\begin_inset CommandInset label
+LatexCommand label
+name "replay-attribute"
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1284423850
+Tridge points out that an attribute can be later added to tdb_open (see
+
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "attributes"
+
+\end_inset
+
+) to provide replay/trace hooks, which could become the basis for this and
+ future parallel transactions and snapshot support.
+\change_unchanged
+
\end_layout
\end_body
@
+1.9
+log
+@Extension mechanism.
+@
+text
+@d56 2
+a57 2
+\change_inserted 0 1284016854
+9-September
+d479 11
+d1303 1
+a1303 1
+\change_inserted 0 1284016847
+d1310 56
+d1945 1
+a1945 1
+\change_inserted 0 1283310945
+d1956 2
+d2402 2
+d2416 4
+d2421 12
+d2455 2
+d2476 12
+d2673 47
+@
+
+
1.8
log
@Remove bogus footnote
Rusty Russell, IBM Corporation
-9-September-2010
+14-September-2010
Abstract
argument. Additional arguments to open would require the
introduction of a tdb_open_ex2 call etc.
-2.1.1 Proposed Solution
+2.1.1 Proposed Solution<attributes>
tdb_open() will take a linked-list of attributes:
the tail, and thus if there is no tail or the first byte is 0, we
would know the extension is not present on that record.
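The tail-byte check described above can be sketched in C as follows. This is only an illustration under stated assumptions: struct rec_tail, rec_has_extension() and rec_clear_padding() are hypothetical names, and the layout shown is not TDB's real record format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a record's tail padding; NOT TDB's real layout. */
struct rec_tail {
    const unsigned char *pad; /* first byte of the padding */
    size_t pad_len;           /* number of padding bytes, may be 0 */
};

/* True only if padding exists and its first byte is non-zero,
 * i.e. newer code has marked an extension on this record. */
static bool rec_has_extension(const struct rec_tail *t)
{
    if (t->pad_len == 0)
        return false; /* no tail: extension cannot be present */
    return t->pad[0] != 0;
}

/* What code unaware of any extension would do when writing a record:
 * zero the padding so its first byte (if any) is 0. */
static void rec_clear_padding(unsigned char *pad, size_t pad_len)
{
    if (pad_len > 0)
        memset(pad, 0, pad_len);
}
```

A record with no padding, or with zeroed padding, reports no extension; only a writer that understands the new format would set the first padding byte to a non-zero value such as 1.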
+2.17 TDB Does Not Use Talloc
+
+Many users of TDB (particularly Samba) use the talloc allocator,
+and thus have to wrap TDB in a talloc context to use it
+conveniently.
+
+2.17.1 Proposed Solution
+
+The allocation within TDB is not complicated enough to justify
+the use of talloc, and I am reluctant to force another
+(excellent) library on TDB users. Nonetheless a compromise is
+possible. An attribute (see [attributes]) can be added later to
+tdb_open() to provide an alternate allocation mechanism,
+specifically for talloc but usable by any other allocator (which
+would ignore the “context” argument).
+
+This would form a talloc hierarchy as expected, but the caller would still
+would still have to attach a destructor to the tdb context
+returned from tdb_open to close it. All TDB_DATA fields would be
+children of the tdb_context, and the caller would still have to
+manage them (using talloc_free() or talloc_steal()).
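One possible shape for such an allocation attribute is sketched below. Every name here (struct alloc_attr, alloc_fn, free_fn, attr_alloc, attr_free) is an assumption made for illustration; the text above only says that an attribute could be added later, not what it would look like. The malloc-backed callbacks show how a non-talloc allocator would simply ignore the context argument.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator attribute; none of these names exist in TDB. */
struct alloc_attr {
    void *(*alloc_fn)(void *context, size_t len);
    void (*free_fn)(void *context, void *ptr);
    void *context; /* e.g. a talloc parent; unused by plain malloc */
};

/* malloc-backed callbacks that ignore the context argument. */
static void *plain_alloc(void *context, size_t len)
{
    (void)context;
    return malloc(len);
}

static void plain_free(void *context, void *ptr)
{
    (void)context;
    free(ptr);
}

static const struct alloc_attr plain_malloc_attr = {
    plain_alloc, plain_free, NULL
};

/* How library code might allocate: use the attribute if one was
 * supplied, otherwise fall back to malloc/free. */
static void *attr_alloc(const struct alloc_attr *a, size_t len)
{
    return a ? a->alloc_fn(a->context, len) : malloc(len);
}

static void attr_free(const struct alloc_attr *a, void *ptr)
{
    if (a)
        a->free_fn(a->context, ptr);
    else
        free(ptr);
}
```

A talloc user would supply callbacks that wrap the talloc allocation and free calls and pass their parent as the context, which is what would make returned data children of that context.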
+
3 Performance And Scalability Issues
3.1 <TDB_CLEAR_IF_FIRST-Imposes-Performance>TDB_CLEAR_IF_FIRST
random zone”, but that's less common). It could be done with as
few as 4 bits from the record header.[footnote:
Using 2^{16+N*3} means 0 gives a minimal 65536-byte zone, 15 gives
-the maximal 2^{61} byte zone. Zones range in factor of 8 steps.
+the maximal 2^{61} byte zone. Zones range in factor of 8 steps.
+Given the zone size for the zone the current record is in, we can
+determine the start of the zone.
]
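The zone arithmetic in the footnote can be checked with a short C sketch. The function names are hypothetical, and zone_start() assumes that every zone begins at a file offset that is a multiple of its own size; the footnote implies the start is derivable from the size but does not spell out the alignment rule.

```c
#include <stdint.h>

/* Zone size in bytes for a 4-bit zone number n (0..15): 2^(16 + 3n). */
static uint64_t zone_size(unsigned n)
{
    return (uint64_t)1 << (16 + 3 * n);
}

/* Start of the zone containing offset off.
 * ASSUMPTION: zones are aligned to a multiple of their own size,
 * so masking off the low bits yields the zone start. */
static uint64_t zone_start(uint64_t off, unsigned n)
{
    return off & ~(zone_size(n) - 1);
}
```

With these definitions, zone number 0 gives 65536 bytes, 15 gives 2^{61} bytes, and each increment of the zone number multiplies the size by 8.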
3.6 <sub:TDB-Becomes-Fragmented>TDB Becomes Fragmented
3.9.1 Proposed Solution
-None. At some point you say “use a real database”.
+None. At some point you say “use a real database” (but see
+[replay-attribute]).
But as a thought experiment, if we implemented transactions to
only overwrite free entries (this is tricky: there must not be a
3.10.1 Proposed Solution
-We could solve a small part of the problem by providing read-only
-transactions. These would allow one write transaction to begin,
-but it could not commit until all r/o transactions are done. This
-would require a new RO_TRANSACTION_LOCK, which would be upgraded
-on commit.
+None (but see [replay-attribute]). We could solve a small part of
+the problem by providing read-only transactions. These would
+allow one write transaction to begin, but it could not commit
+until all r/o transactions are done. This would require a new
+RO_TRANSACTION_LOCK, which would be upgraded on commit.
3.11 Default Hash Function Is Suboptimal
levels, and apply the transactions until it encountered an
invalid checksum.
+3.15 Tracing Is Fragile, Replay Is External
+
+The current TDB has compile-time-enabled tracing code, but it
+often breaks as it is not enabled by default. In a similar way,
+the ctdb code has an external wrapper which does replay tracing
+so it can coordinate cluster-wide transactions.
+
+3.15.1 Proposed Solution<replay-attribute>
+
+Tridge points out that an attribute can be later added to
+tdb_open (see [attributes]) to provide replay/trace hooks, which
+could become the basis for this and future parallel transactions
+and snapshot support.
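As a rough illustration of what a replay/trace hook attribute might look like, here is a minimal C sketch. All names (enum trace_op, struct trace_attr, trace_emit, count_hook) are hypothetical and are not taken from TDB or ctdb; the real attribute, if added, may differ entirely.

```c
#include <stddef.h>

/* Hypothetical operation codes a trace hook might be told about. */
enum trace_op { TRACE_STORE, TRACE_DELETE, TRACE_FETCH };

/* Hypothetical trace attribute: a callback plus private data. */
struct trace_attr {
    void (*hook)(void *priv, enum trace_op op,
                 const void *key, size_t keylen);
    void *priv;
};

/* Called by library code at each operation; does nothing when no
 * hook was supplied, so tracing is always compiled in but costs
 * only a pointer test when unused. */
static void trace_emit(const struct trace_attr *t, enum trace_op op,
                       const void *key, size_t keylen)
{
    if (t && t->hook)
        t->hook(t->priv, op, key, keylen);
}

/* Example hook: counts operations by type. */
struct trace_counts {
    unsigned n[3];
};

static void count_hook(void *priv, enum trace_op op,
                       const void *key, size_t keylen)
{
    struct trace_counts *c = priv;
    (void)key;
    (void)keylen;
    c->n[op]++;
}
```

Because the hook path is ordinary code rather than a compile-time option, it would be exercised by every build, which addresses the fragility described in this section. A replay wrapper could record the same callbacks to a log instead of counting them.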
+