Rusty Russell [Mon, 4 Jul 2011 07:27:03 +0000 (16:57 +0930)]
tap: WANT_PTHREAD not HAVE_PTHREAD
I'm not sure that a "pthread-safe" tap library is very useful; how many
people have multiple threads calling ok()?
Kirill Shutemov noted that it gives a warning with -Wundef; indeed, we
should ask in this case whether they want pthread support, not whether the
system has pthread support to offer.
Russell Steicke [Fri, 17 Jun 2011 07:42:13 +0000 (15:42 +0800)]
antithread: patch to antithread arabella example
I've been using the antithread arabella example to generate some
"arty" portraits for decoration. I've made a few changes to it
(triangle sizes and number of generations before giving up), and may
send those as patches later.
Because some of the images I'm generating have taken quite a while
(many days) I've needed to restart the run after rebooting machines
for other reasons, and noticed that arabella restarted the generation
count from zero. I wanted to continue the generation count, so here's
a patch to do just that.
Rusty Russell [Fri, 20 May 2011 06:23:12 +0000 (15:53 +0930)]
tdb2: fix O_RDONLY opens.
We tried to get a F_WRLCK on the open lock; we shouldn't do that for a
read-only tdb. (TDB1 gets away with it because a read-only open skips
all locking).
We also avoid leaking the fd in two tdb_open() failure paths revealed
by this extra testing.
Rusty Russell [Tue, 10 May 2011 01:37:21 +0000 (11:07 +0930)]
tdb2: check pid before unlock.
The original code assumed that unlocking would fail if we didn't have a lock;
this isn't true (at least, on my machine). So we have to always check the
pid before unlocking.
Rusty Russell [Wed, 27 Apr 2011 13:41:02 +0000 (23:11 +0930)]
tdb2: use direct access functions when creating recovery blob
We don't need to copy into a buffer to examine the old data: in the
common case, it's mmaped already. It's made a bit trickier because
the tdb_access_read() function uses the current I/O methods, so we
need to restore that temporarily.
The difference was in the noise, however (the sync no doubt
dominates).
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m45.021s
user 0m16.261s
sys 0m2.432s
-rw------- 1 rusty rusty 364469344 2011-04-27 22:55 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 13:40:24 +0000 (23:10 +0930)]
tdb2: enlarge transaction pagesize to 64k
We don't need to use 4k for our transaction pages; we can use any
value. For the tools/speed benchmark, any value between about 4k and
64M makes no difference, but that's probably because the entire
database is touched in each transaction.
So instead, I looked at tdbtorture to try to find an optimum value, as
it uses smaller transactions. 4k and 64k were equivalent. 16M was
almost three times slower, 1M was 5-10% slower. 1024 was also 5-10%
slower.
There's a slight advantage of having larger pages, both for allowing
direct access to the database (if it's all in one page we can sometimes
grant direct access even inside a transaction) and for the compactness
of our recovery area (since our code is naive and won't combine one
run across pages).
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m47.127s
user 0m17.125s
sys 0m2.456s
-rw------- 1 rusty rusty 366680288 2011-04-27 21:34 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 13:26:27 +0000 (22:56 +0930)]
tdb2: try to fit transactions in existing space before we expand.
Currently we use the worst-case-possible size for the recovery area.
Instead, prepare the recovery data, then see whether it's too large.
Note that this currently works out to make the database *larger* on
our speed benchmark, since we happen to need to enlarge the recovery
area at the wrong time now, rather than the old case where it's already
hugely oversized.
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m50.366s
user 0m17.109s
sys 0m2.468s
-rw------- 1 rusty rusty 564215952 2011-04-27 21:31 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 12:17:58 +0000 (21:47 +0930)]
tdb2: reduce transaction before writing to recovery area.
We don't need to write the whole page to the recovery area if it
hasn't all changed. Simply skipping the start and end of the pages
which are similar saves us about 20% on growtdb-bench 250000, and 45%
on tdbtorture. The more thorough examination of page differences
gives us a saving of 90% on growtdb-bench and 98% on tdbtorture!
And we do win a bit on timings for transaction commit:
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 1m4.844s
user 0m15.537s
sys 0m3.796s
-rw------- 1 rusty rusty 626693096 2011-04-27 21:28 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Thu, 21 Apr 2011 01:46:35 +0000 (11:16 +0930)]
tdb2: handle non-transaction-page-aligned sizes in recovery.
tdb1 always makes the tdb a multiple of the transaction page size,
tdb2 doesn't. This means that if a transaction hits the exact end of
the file, we might need to save off a partial page.
So that we don't have to rewrite tdb_recovery_size() too, we simply do
a short read and memset the unused section to 0 (to keep valgrind
happy).
Rusty Russell [Wed, 27 Apr 2011 12:14:16 +0000 (21:44 +0930)]
tdb2: use counters to decide when to coalesce records.
This simply uses a 7-bit counter which gets incremented on each addition
to the list (but not decremented on removals). When it wraps, we walk the
entire list looking for things to coalesce.
This causes performance problems, especially when appending records, so
we limit it in the next patch:
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m59.687s
user 0m11.593s
sys 0m4.100s
-rw------- 1 rusty rusty 752004064 2011-04-27 21:14 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 12:12:58 +0000 (21:42 +0930)]
tdb2: overallocate the recovery area.
I noticed a counter-intuitive phenomenon as I tweaked the coalescing
code: the more coalescing we did, the larger the tdb grew! This was
measured using "growtdb-bench 250000 10".
The cause: more coalescing means larger transactions, and every time
we do a larger transaction, we need to allocate a larger recovery
area. The only way to do this is to append to the file, so the file
keeps growing, even though it's mainly unused!
Overallocating by 25% seems reasonable, and gives better results in
such benchmarks.
The real fix is to reduce the transaction to a run-length based format
rather than the naive block system used now.
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m57.403s
user 0m11.361s
sys 0m4.056s
-rw------- 1 rusty rusty 689536976 2011-04-27 21:10 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 12:13:23 +0000 (21:43 +0930)]
tdb2: don't start again when we coalesce a record.
We currently start walking the free list again when we coalesce any record;
this is overzealous, as we only care about the next record being blatted,
or the record we currently consider "best".
We can also opportunistically try to add the coalesced record into the
new free list: if it fails, we go back to the old "mark record,
unlock, re-lock" code.
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 1m0.243s
user 0m13.677s
sys 0m4.336s
-rw------- 1 rusty rusty 683302864 2011-04-27 21:03 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 12:09:27 +0000 (21:39 +0930)]
tdb2: expand more slowly.
We took the original expansion heuristic from TDB1, and they just
fixed theirs, so copy that.
Before:
After:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
growtdb-bench.c: In function ‘main’:
growtdb-bench.c:74:8: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result
growtdb-bench.c:108:9: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result
real 1m0.243s
user 0m13.677s
sys 0m4.336s
-rw------- 1 rusty rusty 683302864 2011-04-27 21:03 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Thu, 7 Apr 2011 01:29:45 +0000 (10:59 +0930)]
tdb2: allow transaction to nest.
This is definitely a bad idea in general, but SAMBA uses nested transactions
in many and varied ways (some of them probably reflect real bugs) and it's
far easier to support them inside tdb2 with a flag.
We already have part of the TDB1 infrastructure in place, so this patch
just completes it and fixes one place where I'd messed it up.
Rusty Russell [Wed, 27 Apr 2011 11:18:39 +0000 (20:48 +0930)]
tdb2: allow multiple chain locks.
It's probably not a good idea, because it's a recipe for deadlocks if
anyone else grabs any *other* two chainlocks, or the allrecord lock,
but SAMBA definitely does it, so allow it as TDB1 does.
Rusty Russell [Wed, 27 Apr 2011 13:51:32 +0000 (23:21 +0930)]
tdb2: TDB_ATTRIBUTE_STATS access via tdb_get_attribute.
Now we have tdb_get_attribute, it makes sense to make that the method
of accessing statistics. That way they are always available, and it's
probably cheaper doing the direct increment than even the unlikely()
branch.
Rusty Russell [Wed, 6 Apr 2011 23:00:39 +0000 (08:30 +0930)]
tdb2: don't cancel transaction when tdb_transaction_prepare_commit fails
And don't double-log. Both of these cause problems if we want to do
tdb_transaction_prepare_commit non-blocking (and have it fail so we can
try again).
Rusty Russell [Thu, 7 Apr 2011 04:21:54 +0000 (13:51 +0930)]
tdb2: open hook for implementing TDB_CLEAR_IF_FIRST
This allows the caller to implement clear-if-first semantics as per
TDB1. The flag was removed for good reasons: performance and
unreliability, but SAMBA3 still uses it widely, so this allows them to
reimplement it themselves.
(There is no way to do it without help like this from tdb2, since it has
to be done under the open lock).
Rusty Russell [Tue, 10 May 2011 01:45:04 +0000 (11:15 +0930)]
tdb2: cleanups for tools/speed.c
1) The logging function needs to append a \n.
2) The transaction start code should be after the comment and print.
3) We should run tdb_check to make sure the database is OK after each op.
Rusty Russell [Tue, 19 Apr 2011 13:34:13 +0000 (23:04 +0930)]
tdb2: fix tdb_summary reports
1) Fix the bogus reporting of uncoalesced runs: there has to be more than 1
free record to make a "run", and any other record interrupts the run.
2) Fix the bogus data count in the top line (which was number of records,
not bytes).
3) Remove the count of free buckets: it's now a constant.
Rusty Russell [Thu, 28 Apr 2011 03:52:07 +0000 (13:22 +0930)]
compiler: don't override existing definitions.
It's common when integrating CCAN into existing projects that they define
such constants for themselves. In an ideal world, the entire project
switches to one set of definitions, but for large projects (eg SAMBA)
that's not realistic and conflicts with the aim of making CCAN modules
easy to "drop in" to existing code.
(This is a generalization of Douglas Bagnell's patch sent in Message-ID:
<4DB8D00D.8000800@paradise.net.nz>).
Andreas Schlick [Tue, 26 Apr 2011 15:07:49 +0000 (17:07 +0200)]
ciniparser: Add a check that len remains within bounds.
If the line is made solely of whitespace, it is all removed and we end up
with len set to -1. Therefore we need an additional check to avoid
dereferencing line[-1].
Rusty Russell [Wed, 6 Apr 2011 22:54:41 +0000 (08:24 +0930)]
typesafe_cb: simplify, preserve namespace.
Get rid of many variants, which were just confusing for most people.
Keep typesafe_cb(), typesafe_cb_preargs() and typesafe_cb_postargs(),
and rework cast_if_type() into typesafe_cb_cast() so we stay in our
namespace.
I should have done this as soon as I discovered the limitation that
the types have to be defined if I want const-taking callbacks.
Andreas Schlick [Sat, 2 Apr 2011 01:30:36 +0000 (12:00 +1030)]
asort: Remove the trampoline version.
Most Linux systems use glibc and therefore have qsort_r(); an
implementation is provided for the others, so there is no need for a
third case in asort.
Rusty Russell [Mon, 28 Mar 2011 03:58:22 +0000 (14:28 +1030)]
failtest: continue (without failing) if we run off end of --failpath=
This is important because we tell people to use --failpath to reproduce a
failure, but the fail path we list only goes up to the last failure injection
if the child dies, for example.
Thanks to David Gibson for prodding me to fix this...
Rusty Russell [Mon, 28 Mar 2011 03:48:49 +0000 (14:18 +1030)]
tdb2: make head of free list point to last entry.
This allows for access to the end of the list, in case we need it
later (eg. moving free list entries to the end of the list after
failed coalescing?).
Here's the effect (average of 10) on x86-64 with gcc 4.1.2, with an earlier
tree and 1000000:
Before:
Adding 830.1ns
Finding 428.3ns
Missing 250.3ns
Traversing 387.4ns
Deleting 588.8ns
Re-adding 737.2ns
Appending 1141.4ns
Churning 1792ns
Rusty Russell [Fri, 25 Mar 2011 05:39:05 +0000 (16:09 +1030)]
tdb2: fix traversal bug in free list lock_and_alloc()
We keep looking even if the current best is exactly right. This is
really bad, because our smaller free lists hold exactly one size: with
this bug we iterate to the end of the list before hitting the end and
deciding we can use it after all.
Rusty Russell [Thu, 24 Mar 2011 03:40:22 +0000 (14:10 +1030)]
tdb2: fix use after free on error message
We use "r" after we call tdb_access_release() when we find corruption
in the free list. "r" may be a pointer into malloced memory, freed
by tdb_access_release().
Rusty Russell [Thu, 24 Mar 2011 01:12:21 +0000 (11:42 +1030)]
tdb2: fix two transaction bugs.
One bug where we didn't update the map_size if we expanded the
transaction but didn't create a new recovery area (most easily
reproduced by setting the TDB_NOSYNC flag).
Another out-by-one bug in transaction_direct where we would give
read-access to the underlying file despite the last block having been
modified.
Rusty Russell [Thu, 24 Mar 2011 01:24:43 +0000 (11:54 +1030)]
tdb2: check PID if we are holding a lock.
tdb1 has tdb_reopen/tdb_reopen_all, which you are supposed to call
after a fork(). But we can detect PID changes whenever we grab a lock,
which is to say, before we do anything.
We currently assume that pread and pwrite work, so the only problem
with fork() is if we are holding locks and expect them to still be
there in the child. This is unusual, but most likely caused by a
fork() inside a transaction. In such cases, allow the child to unlock
without complaint; they can then use the TDB as expected.
Performance before and after shows no measurable speed difference:
Total of 5 runs of tools/speed 100000:
Before:
Adding: 18899ns
Finding: 7040ns
Missing: 5802ns
Traversing: 6466ns
Deleting: 12607ns
Re-adding: 14185ns
Appending: 20770ns
Churning: 93845ns
Rusty Russell [Fri, 25 Mar 2011 03:21:20 +0000 (13:51 +1030)]
ccanlint: fix listing of dependencies
gcc gave a warning:
tools/ccanlint/ccanlint.c:230:19: error: ‘c’ may be used uninitialized in this function
Which indicated that test dependency printing was broken: we need to
loop through the tests! Also, we haven't parsed options yet, so
verbose is never true: move it to later and make it depend on -vvv.
David Gibson [Fri, 18 Mar 2011 08:29:16 +0000 (19:29 +1100)]
Add configurator test for memmem()
memmem(), which searches for one bytestring in a longer
bytestring is one of those functions that should be standard, but
isn't. This patch adds a ccan configurator test for the function
so that modules can use or replace it as necessary.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>