Rusty Russell [Mon, 22 Aug 2011 03:17:08 +0000 (12:47 +0930)]
tdb2: fix intermittent failure in run-50-multiple-freelists-fail.c
layout.c's TDB creation functions were incorrect in case of a hash
collision, causing occasional failure. Make it always use the
(previously-failing) seed value, and fix it.
Joey Adams [Mon, 15 Aug 2011 06:36:33 +0000 (02:36 -0400)]
btree: Changed license from BSD-3 to MIT, and set version to 0.2
NOTE: btree was originally copyright 2010, and has not been
touched by me since then. I don't know if changing the license
to something more permissive requires updating the copyright year
or not.
Joey Adams [Mon, 15 Aug 2011 06:06:17 +0000 (02:06 -0400)]
block_pool: Changed license from BSD-3 to MIT, and set version to 0.1
NOTE: block_pool was originally copyright 2009, and has not been
touched by me since then. I don't know if changing the license
to something more permissive requires updating the copyright year
or not.
Douglas Bagnall [Sat, 13 Aug 2011 12:19:59 +0000 (21:49 +0930)]
opt: incidental comment and whitespace repair
This comment occurred in a couple of places:
/* Set an integer value, various forms. Sets to 1 on arg == NULL. */
One instance was clearly spurious, while the other was misleading.
Another resolution to this mismatch would be to add
"if (arg == NULL) { *l = 1; return NULL; }" somewhere, but I suspect
it may have been left out/removed because someone thought better.
Douglas Bagnall [Sat, 13 Aug 2011 12:19:59 +0000 (21:49 +0930)]
opt: add integer helpers that accept k, M, G, T, P, E suffixes
These functions come in two flavours: those ending with "_si", which
have 1000-based interpretations of the suffixes; and those ending with
"_bi", which use base 1024. There are versions for signed and
unsigned int, long, and long long destinations, with tests for all 12
new functions. The tests get a bit repetitive, I am afraid.
As an example, if the -x option were using the opt_set_intval_bi
function, then all of these would do the same thing:
quite what that thing is depends on the size of your int -- people
with 16 bit ints would see an "out of range" error message.
The arithmetic for unsigned variations is actually done using signed
long long integers, so the maximum possible value is LLONG_MAX, not
ULLONG_MAX. This follows the practice of existing functions, and
avoids tedious work.
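A rough sketch of the base-1024 suffix handling described above, for illustration only. `parse_bi` is a hypothetical helper, not one of the functions added to ccan/opt, and rejecting negative input is a simplification made here. Like the description above, it does its arithmetic in signed long long, so the ceiling is LLONG_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the ccan/opt code: parse a non-negative
 * integer with an optional k/M/G/T/P/E suffix, base 1024.
 * Returns 0 on success, -1 on a malformed or out-of-range value. */
int parse_bi(const char *arg, long long *out)
{
	static const char suffixes[] = "kMGTPE";
	const char *p;
	char *end;
	long long val;
	int shift;

	assert(arg && out);

	errno = 0;
	val = strtoll(arg, &end, 0);
	if (end == arg || errno == ERANGE || val < 0)
		return -1;

	if (*end != '\0') {
		/* Exactly one trailing suffix character is accepted. */
		p = strchr(suffixes, *end);
		if (!p || end[1] != '\0')
			return -1;

		/* k = 2^10, M = 2^20, ... E = 2^60. */
		shift = 10 * (int)(p - suffixes + 1);

		/* Signed long long arithmetic: refuse anything that
		 * would exceed LLONG_MAX after scaling. */
		if (val > (LLONG_MAX >> shift))
			return -1;
		val *= 1LL << shift;
	}

	*out = val;
	return 0;
}
```

With base 1024, "5M", "5120k" and "5242880" all parse to the same value; a 1000-based ("_si") variant would scale by powers of 1000 instead.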
Rusty Russell [Fri, 22 Jul 2011 12:13:39 +0000 (21:43 +0930)]
cast: downgrade license from LGPL3+ to LGPLv2.1+
Kirill A. Shutemov asked for libgit. I would say they should upgrade their
license, but libhx on which these are based is also LGPLv2.1 or later, so
I prefer to match that.
Rusty Russell [Thu, 21 Jul 2011 05:20:00 +0000 (14:50 +0930)]
isaac, crcsync: acknowledge licensing issues.
The recently added ccanlint licensing checks revealed several cases
where the published license of a module is misleading: a dependency of
that module has a stricter license (eg. a public domain module which
depends on a GPL one).
Where these are my modules, I've fixed them. Otherwise I'm overriding
the checks for the moment, and asking the authors what they want to do.
Rusty Russell [Thu, 21 Jul 2011 05:14:49 +0000 (14:44 +0930)]
htable: relicense under LGPL
Various LGPL components depend on it, via ccan/likely. ccan/likely
really only needs it when CCAN_LIKELY_DEBUG is set, but making it a
conditional dependency is a bit nasty if defining that changes the
license.
So this is the simplest fix. I might relicense under PD or BSD later,
since the likely module should probably have an even more liberal
license.
Rusty Russell [Thu, 21 Jul 2011 04:59:06 +0000 (14:29 +0930)]
md4: fix license
As ccanlint now says:
Source files don't contain incompatible licenses (license_file_compat): FAIL
/home/rusty/devel/cvs/ccan/ccan/md4/md4.c:Found boilerplate for license 'GPLv2+' which is incompatible with 'LGPLv2+'
Rusty Russell [Thu, 21 Jul 2011 04:59:03 +0000 (14:29 +0930)]
ccanlint: check for incompatible license boilerplates within subfiles.
This checks to make sure you're not accidentally relicensing code;
eg. it's OK (though a bit impolite) to turn a BSD-licensed file into a
GPL module, but not the other way around.
Rusty Russell [Thu, 21 Jul 2011 03:32:27 +0000 (13:02 +0930)]
ccanlint: add simple check for comment referring to LICENSE file.
After discussion with various developers (particularly the Samba
team), there's a consensus that a reference to the license in each
source file is useful. Since CCAN modules are designed to be cut and
paste, this helps avoid any confusion should the LICENSE file go
missing.
We also detect standard boilerplates, in which case a one-line summary
isn't necessary.
Rusty Russell [Thu, 21 Jul 2011 03:32:04 +0000 (13:02 +0930)]
noerr: relicense to public domain.
We really want everyone to be using these; establishing conventions
helps all code, so make it the most liberal license possible. It's
all my code, so I can do this unilaterally.
Rusty Russell [Thu, 21 Jul 2011 03:31:45 +0000 (13:01 +0930)]
short_types: relicense to public domain.
We really want everyone to be using these; establishing conventions
helps all code, so make it the most liberal license possible. It's
all my code, so I can do this unilaterally.
Rusty Russell [Thu, 21 Jul 2011 03:31:39 +0000 (13:01 +0930)]
compiler: relicense to public domain.
We really want everyone to be using these; establishing conventions
helps all code, so make it the most liberal license possible. It's
all my code, so I can do this unilaterally.
Rusty Russell [Tue, 19 Jul 2011 08:02:40 +0000 (17:32 +0930)]
various: make the _info License: wording uniform for GPL variants.
GPL versions 2 and 3 both specifically mention "any later version" as
the phrase which allows the user to choose to upgrade the license.
Make sure we use that phrase, and make the format consistent across
modules.
Rusty Russell [Mon, 4 Jul 2011 07:27:03 +0000 (16:57 +0930)]
tap: WANT_PTHREAD not HAVE_PTHREAD
I'm not sure that a "pthread-safe" tap library is very useful; how many
people have multiple threads calling ok()?
Kirill Shutemov noted that it gives a warning with -Wundef; indeed, we
should ask in this case whether they want pthread support, not whether the
system has pthread support to offer.
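A minimal sketch of the distinction, not the tap source itself. `count_test`, `tests_run` and `test_lock` are made-up names. Testing the macro with `#ifdef` means an undefined WANT_PTHREAD is never evaluated in an `#if` expression, which is the case -Wundef complains about.

```c
/* WANT_PTHREAD is a request from whoever builds the library,
 * not a statement about what the system provides. */
#ifdef WANT_PTHREAD
#include <pthread.h>
static pthread_mutex_t test_lock = PTHREAD_MUTEX_INITIALIZER;
#define LOCK()   pthread_mutex_lock(&test_lock)
#define UNLOCK() pthread_mutex_unlock(&test_lock)
#else
#define LOCK()   ((void)0)
#define UNLOCK() ((void)0)
#endif

static unsigned int tests_run;

/* Hypothetical stand-in for a test-counting function such as ok():
 * bump the shared counter, under the mutex only if one was requested. */
unsigned int count_test(void)
{
	unsigned int n;

	LOCK();
	n = ++tests_run;
	UNLOCK();
	return n;
}
```

Building with -DWANT_PTHREAD (and linking against pthreads) turns the locking on; without it the macros expand to nothing.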
Russell Steicke [Fri, 17 Jun 2011 07:42:13 +0000 (15:42 +0800)]
antithread: patch to antithread arabella example
I've been using the antithread arabella example to generate some
"arty" portraits for decoration. I've made a few changes to it
(triangle sizes and number of generations before giving up), and may
send those as patches later.
Because some of the images I'm generating have taken quite a while
(many days) I've needed to restart the run after rebooting machines
for other reasons, and noticed that arabella restarted the generation
count from zero. I wanted to continue the generation count, so here's
a patch to do just that.
Rusty Russell [Fri, 20 May 2011 06:23:12 +0000 (15:53 +0930)]
tdb2: fix O_RDONLY opens.
We tried to get a F_WRLCK on the open lock; we shouldn't do that for a
read-only tdb. (TDB1 gets away with it because a read-only open skips
all locking).
We also avoid leaking the fd in two tdb_open() failure paths revealed
by this extra testing.
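A sketch of the underlying constraint, assuming the read-only case takes a shared lock instead (the message above does not spell out the replacement). `lock_open_byte` is a hypothetical helper, not a tdb2 function. POSIX fcntl() refuses F_WRLCK on a descriptor that is not open for writing, so the lock type has to follow the open mode.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not the tdb2 code: take a one-byte lock at
 * `off`, choosing the lock type from the flags the file was opened
 * with.  Non-blocking: returns 0 on success, -1 on failure. */
int lock_open_byte(int fd, int open_flags, off_t off)
{
	struct flock fl = {
		/* A write lock needs a writable descriptor, so a
		 * read-only open can only ask for a read lock. */
		.l_type = ((open_flags & O_ACCMODE) == O_RDONLY)
			? F_RDLCK : F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = off,
		.l_len = 1,
	};

	assert(fd >= 0);
	return fcntl(fd, F_SETLK, &fl);
}
```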
Rusty Russell [Tue, 10 May 2011 01:37:21 +0000 (11:07 +0930)]
tdb2: check pid before unlock.
The original code assumed that unlocking would fail if we didn't have a lock;
this isn't true (at least, on my machine). So we have to always check the
pid before unlocking.
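One way to picture the check, as a sketch only. `struct held_lock` and `checked_unlock` are hypothetical and do not reflect tdb2's real bookkeeping. The idea is to record the owning pid when the lock is taken and to compare it before unlocking, rather than relying on the unlock call to fail.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one held byte-range lock. */
struct held_lock {
	int fd;
	off_t off;
	pid_t owner;	/* getpid() at the time the lock was taken */
};

/* Returns 0 on success, -1 if this process is not the recorded owner
 * or the fcntl() call itself fails. */
int checked_unlock(const struct held_lock *l)
{
	assert(l);

	/* As noted above, unlocking may succeed even when we hold
	 * nothing, so the pid has to be checked explicitly. */
	if (l->owner != getpid())
		return -1;

	struct flock fl = {
		.l_type = F_UNLCK,
		.l_whence = SEEK_SET,
		.l_start = l->off,
		.l_len = 1,
	};
	return fcntl(l->fd, F_SETLK, &fl);
}
```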
Rusty Russell [Wed, 27 Apr 2011 13:41:02 +0000 (23:11 +0930)]
tdb2: use direct access functions when creating recovery blob
We don't need to copy into a buffer to examine the old data: in the
common case, it's mmaped already. It's made a bit trickier because
the tdb_access_read() function uses the current I/O methods, so we
need to restore that temporarily.
The difference was in the noise, however (the sync no doubt
dominates).
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m45.021s
user 0m16.261s
sys 0m2.432s
-rw------- 1 rusty rusty 364469344 2011-04-27 22:55 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 13:40:24 +0000 (23:10 +0930)]
tdb2: enlarge transaction pagesize to 64k
We don't need to use 4k for our transaction pages; we can use any
value. For the tools/speed benchmark, any value between about 4k and
64M makes no difference, but that's probably because the entire
database is touched in each transaction.
So instead, I looked at tdbtorture to try to find an optimum value, as
it uses smaller transactions. 4k and 64k were equivalent. 16M was
almost three times slower, 1M was 5-10% slower. 1024 was also 5-10%
slower.
There's a slight advantage of having larger pages, both for allowing
direct access to the database (if it's all in one page we can sometimes
grant direct access even inside a transaction) and for the compactness
of our recovery area (since our code is naive and won't combine one
run across pages).
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m47.127s
user 0m17.125s
sys 0m2.456s
-rw------- 1 rusty rusty 366680288 2011-04-27 21:34 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 13:26:27 +0000 (22:56 +0930)]
tdb2: try to fit transactions in existing space before we expand.
Currently we use the worst-case-possible size for the recovery area.
Instead, prepare the recovery data, then see whether it's too large.
Note that this currently works out to make the database *larger* on
our speed benchmark, since we happen to need to enlarge the recovery
area at the wrong time now, rather than the old case where it's already
hugely oversized.
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m50.366s
user 0m17.109s
sys 0m2.468s
-rw------- 1 rusty rusty 564215952 2011-04-27 21:31 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 12:17:58 +0000 (21:47 +0930)]
tdb2: reduce transaction before writing to recovery area.
We don't need to write the whole page to the recovery area if it
hasn't all changed. Simply skipping the start and end of the pages
which are similar saves us about 20% on growtdb-bench 250000, and 45%
on tdbtorture. The more thorough examination of page differences
gives us a saving of 90% on growtdb-bench and 98% on tdbtorture!
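The simpler of the two reductions (skipping the unchanged start and end of a page) can be sketched as follows. `diff_span` is a made-up name and this is not the tdb2 code; the more thorough comparison would also split out unchanged runs in the middle of the page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compare the old and new images of one page and
 * report the smallest span [*start, *start + *len) that covers every
 * differing byte.  *len is 0 when the images are identical. */
void diff_span(const unsigned char *old_page, const unsigned char *new_page,
	       size_t pagesize, size_t *start, size_t *len)
{
	size_t head = 0, tail = pagesize;

	assert(old_page && new_page && start && len);

	/* Skip the identical bytes at the start of the page... */
	while (head < pagesize && old_page[head] == new_page[head])
		head++;

	/* ...and the identical bytes at the end. */
	while (tail > head && old_page[tail - 1] == new_page[tail - 1])
		tail--;

	*start = head;
	*len = tail - head;
}
```

Only the bytes inside the reported span would then need to go to the recovery area.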
And we do win a bit on timings for transaction commit:
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 1m4.844s
user 0m15.537s
sys 0m3.796s
-rw------- 1 rusty rusty 626693096 2011-04-27 21:28 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Thu, 21 Apr 2011 01:46:35 +0000 (11:16 +0930)]
tdb2: handle non-transaction-page-aligned sizes in recovery.
tdb1 always makes the tdb a multiple of the transaction page size,
tdb2 doesn't. This means that if a transaction hits the exact end of
the file, we might need to save off a partial page.
So that we don't have to rewrite tdb_recovery_size() too, we simply do
a short read and memset the unused section to 0 (to keep valgrind
happy).
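A sketch of the short-read-plus-memset idea, written against an in-memory image of the file to keep it self-contained. `copy_page_padded` and its parameters are hypothetical, not tdb2 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill `page` (pagesize bytes) from the file
 * image starting at `off`.  If the file ends before the page does,
 * copy what exists and zero the remainder so that no uninitialized
 * bytes are left in the buffer. */
void copy_page_padded(const unsigned char *file, size_t filelen, size_t off,
		      unsigned char *page, size_t pagesize)
{
	size_t avail;

	assert(file && page && off <= filelen);

	/* The "short read": at most pagesize bytes, fewer at the end. */
	avail = filelen - off;
	if (avail > pagesize)
		avail = pagesize;

	memcpy(page, file + off, avail);
	memset(page + avail, 0, pagesize - avail);
}
```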
Rusty Russell [Wed, 27 Apr 2011 12:14:16 +0000 (21:44 +0930)]
tdb2: use counters to decide when to coalesce records.
This simply uses a 7 bit counter which gets incremented on each addition
to the list (but not decremented on removals). When it wraps, we walk the
entire list looking for things to coalesce.
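The counter logic might look roughly like this. It is a sketch with made-up names (`struct free_list_sketch`, `note_add`) and does not show where tdb2 actually stores the counter.

```c
#include <assert.h>
#include <stdbool.h>

#define COALESCE_COUNTER_BITS 7
#define COALESCE_COUNTER_MASK ((1u << COALESCE_COUNTER_BITS) - 1)

/* Hypothetical free-list header holding the 7-bit counter. */
struct free_list_sketch {
	unsigned int add_count;
};

/* Call on every addition to the list (removals never decrement).
 * Returns true when the counter wraps back to zero, i.e. once every
 * 128 additions, which is when the caller would walk the whole list
 * looking for records to coalesce. */
bool note_add(struct free_list_sketch *fl)
{
	assert(fl);

	fl->add_count = (fl->add_count + 1) & COALESCE_COUNTER_MASK;
	return fl->add_count == 0;
}
```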
This causes performance problems, especially when appending records, so
we limit it in the next patch:
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m59.687s
user 0m11.593s
sys 0m4.100s
-rw------- 1 rusty rusty 752004064 2011-04-27 21:14 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 12:12:58 +0000 (21:42 +0930)]
tdb2: overallocate the recovery area.
I noticed a counter-intuitive phenomenon as I tweaked the coalescing
code: the more coalescing we did, the larger the tdb grew! This was
measured using "growtdb-bench 250000 10".
The cause: more coalescing means larger transactions, and every time
we do a larger transaction, we need to allocate a larger recovery
area. The only way to do this is to append to the file, so the file
keeps growing, even though it's mainly unused!
Overallocating by 25% seems reasonable, and gives better results in
such benchmarks.
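As a sketch of the sizing rule, assuming an existing area is reused whenever it is already large enough (`recovery_area_size` is a hypothetical name, not a tdb2 function):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decide how large the recovery area should be.
 * `current` is the size of the existing area, `needed` the size this
 * transaction requires. */
uint64_t recovery_area_size(uint64_t current, uint64_t needed)
{
	/* Guard against overflow in the 25% headroom calculation. */
	assert(needed <= UINT64_MAX - needed / 4);

	/* The existing area is big enough: nothing to append. */
	if (needed <= current)
		return current;

	/* Otherwise a new area gets appended to the file; ask for 25%
	 * more than needed so that slightly larger transactions later
	 * on do not force yet another append. */
	return needed + needed / 4;
}
```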
The real fix is to reduce the transaction to a run-length based format
rather than the naive block system used now.
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 0m57.403s
user 0m11.361s
sys 0m4.056s
-rw------- 1 rusty rusty 689536976 2011-04-27 21:10 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 12:13:23 +0000 (21:43 +0930)]
tdb2: don't start again when we coalesce a record.
We currently start walking the free list again when we coalesce any record;
this is overzealous, as we only care about the next record being blatted,
or the record we currently consider "best".
We can also opportunistically try to add the coalesced record into the
new free list: if it fails, we go back to the old "mark record,
unlock, re-lock" code.
Before:
$ time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
real 1m0.243s
user 0m13.677s
sys 0m4.336s
-rw------- 1 rusty rusty 683302864 2011-04-27 21:03 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Wed, 27 Apr 2011 12:09:27 +0000 (21:39 +0930)]
tdb2: expand more slowly.
We took the original expansion heuristic from TDB1, and they just
fixed theirs, so copy that.
Before:
After:
time ./growtdb-bench 250000 10 > /dev/null && ls -l /tmp/growtdb.tdb && time ./tdbtorture -s 0 && ls -l torture.tdb && ./speed --transaction 2000000
growtdb-bench.c: In function ‘main’:
growtdb-bench.c:74:8: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result
growtdb-bench.c:108:9: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result
real 1m0.243s
user 0m13.677s
sys 0m4.336s
-rw------- 1 rusty rusty 683302864 2011-04-27 21:03 /tmp/growtdb.tdb
testing with 3 processes, 5000 loops, seed=0
OK
Rusty Russell [Thu, 7 Apr 2011 01:29:45 +0000 (10:59 +0930)]
tdb2: allow transaction to nest.
This is definitely a bad idea in general, but SAMBA uses nested transactions
in many and varied ways (some of them probably reflect real bugs) and it's
far easier to support them inside tdb2 with a flag.
We already have part of the TDB1 infrastructure in place, so this patch
just completes it and fixes one place where I'd messed it up.
Rusty Russell [Wed, 27 Apr 2011 11:18:39 +0000 (20:48 +0930)]
tdb2: allow multiple chain locks.
It's probably not a good idea, because it's a recipe for deadlocks if
anyone else grabs any *other* two chainlocks, or the allrecord lock,
but SAMBA definitely does it, so allow it as TDB1 does.
Rusty Russell [Wed, 27 Apr 2011 13:51:32 +0000 (23:21 +0930)]
tdb2: TDB_ATTRIBUTE_STATS access via tdb_get_attribute.
Now we have tdb_get_attribute, it makes sense to make that the method
of accessing statistics. That way they are always available, and it's
probably cheaper doing the direct increment than even the unlikely()
branch.
Rusty Russell [Wed, 6 Apr 2011 23:00:39 +0000 (08:30 +0930)]
tdb2: don't cancel transaction when tdb_transaction_prepare_commit fails
And don't double-log. Both of these cause problems if we want to do
tdb_transaction_prepare_commit non-blocking (and have it fail so we can
try again).
Rusty Russell [Thu, 7 Apr 2011 04:21:54 +0000 (13:51 +0930)]
tdb2: open hook for implementing TDB_CLEAR_IF_FIRST
This allows the caller to implement clear-if-first semantics as per
TDB1. The flag was removed for good reasons (performance and
unreliability), but SAMBA3 still uses it widely, so this allows them to
reimplement it themselves.
(There is no way to do it without help like this from tdb2, since it has
to be done under the open lock).
Rusty Russell [Tue, 10 May 2011 01:45:04 +0000 (11:15 +0930)]
tdb2: cleanups for tools/speed.c
1) The logging function needs to append a \n.
2) The transaction start code should be after the comment and print.
3) We should run tdb_check to make sure the database is OK after each op.