monotone Mtn Change Log

Age Message
12 years 9 months Skip calling select again if we processed some data from the remote
side. Somehow the call to select with a 1us timeout ends up waiting a
bunch of the time, leading to idleness in the client (on a pull).
Oddly, making this change significantly increases the amount of user
time; haven't investigated why as the tradeoff for reduced wall clock
time is a win. Best guess is that because the client is running
faster, it makes more recv calls for less data.

netsync.cc: move the handling of armed above the probe; go back
around the loop if we successfully processed something.
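The reordering in netsync.cc can be illustrated with a small C++ sketch. It assumes three hypothetical hooks (process_armed, wait_for_io, done) standing in for netsync's real session and socket code; only the control flow is the point: when buffered data was processed, the loop goes straight back around instead of calling select() again.

```cpp
#include <cassert>
#include <functional>

// Hypothetical hooks standing in for netsync's real I/O plumbing.
struct LoopHooks {
    std::function<bool()> process_armed;  // handle already-buffered data; true if any work was done
    std::function<void()> wait_for_io;    // e.g. select()/poll() with a short timeout
    std::function<bool()> done;           // true once the session has finished
};

// Runs the loop and returns how many times it had to wait for I/O.
int run_session(const LoopHooks& h) {
    assert(h.process_armed && h.wait_for_io && h.done);
    int waits = 0;
    while (!h.done()) {
        // If buffered data was processed, go straight back around the loop
        // instead of paying for another select() call.
        if (h.process_armed())
            continue;
        h.wait_for_io();
        ++waits;
    }
    return waits;
}
```

With this ordering the wait is entered only when there is nothing left to process, which is what removes the idle time on the client.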

original:
real 10m27.918s user 6m19.432s sys 0m6.404s
real 10m21.848s user 6m17.436s sys 0m6.284s

patched:
real 8m10.310s user 6m49.878s sys 0m11.125s
real 8m16.353s user 6m57.018s sys 0m11.473s
Commit 5cc1ed0346c5129ddd77ae895985bf743f349cae, by anderse-monotone@cello.hpl.hp.com
12 years 9 months adler32.hh, xdelta.cc: Cosmetic fixes to whitespace and expansion of comments.
Commit 40accd16c45a4195e207b91c1d9dec6d6a8170cb, by anderse-monotone@cello.hpl.hp.com
12 years 9 months Improvements to the xdelta code in order to take advantage of the fact
that we are always using a relatively small window for the adler32
hash, and that when we skip forward, we normally skip forward by a lot,
so it's faster to just recompute the rolling checksum on the new data
than to actually move the rolling checksum forward. 1.03x improvement in
CPU usage on the client, 1.12x improvement in CPU usage on the server.

adler32.hh: turn a bunch of things into constants, and migrate
to just doing a direct cast of the character rather than the widen()
stuff because the widen does a bunch of unnecessary masking. Code
added to xdelta.cc to verify correctness of this transform. Optimize
the insertion of a bunch of values for the case where the count is
small; just put an invariant in place to require that condition as
it's the only one used. For the small update case, we can skip a lot
of the masking that we would otherwise have to do as nothing can
overflow.

xdelta.cc: mark the blocksz as constant (it is), remove the unused hi
variable from compute_delta_insns. Specialize the xdelta advancement
code to handle the case where we advance by a lot by skipping over the
intermediate characters rather than pulling the rolling checksum
forward the entire way. Add in some simple code to verify that the
widen transform we made is valid.
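A minimal C++ sketch of the two ideas above: a rolling checksum over a fixed window that reads bytes with a direct unsigned char cast, and an advance() that recomputes the sums from scratch when the skip is at least a window long. The class name is hypothetical and, as an assumption for brevity, the sums are kept modulo 2^16 rather than Adler-32's 65521, which lets unsigned 32-bit wraparound replace per-step masking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Adler-32-style rolling checksum over a fixed-size window (sketch).
// Assumption: sums modulo 2^16, masked only when the digest is taken.
class RollingChecksum {
public:
    RollingChecksum(const std::string& data, std::size_t pos, std::size_t window)
        : data_(data), pos_(pos), n_(window) {
        assert(window > 0 && pos + window <= data.size());
        recompute();
    }

    // Slide the window by one byte: drop data_[pos_], add data_[pos_ + n_].
    void roll_one() {
        assert(pos_ + n_ < data_.size());
        std::uint32_t out = static_cast<unsigned char>(data_[pos_]);
        std::uint32_t in = static_cast<unsigned char>(data_[pos_ + n_]);
        s1_ = s1_ - out + in;
        s2_ = s2_ - static_cast<std::uint32_t>(n_) * out + s1_ - 1;
        ++pos_;
    }

    // Advance by k bytes. For a large skip, rebuilding the sums from the new
    // window (n_ bytes of work) is cheaper than rolling k times.
    void advance(std::size_t k) {
        if (k >= n_) {
            pos_ += k;
            assert(pos_ + n_ <= data_.size());
            recompute();
        } else {
            for (std::size_t i = 0; i < k; ++i) roll_one();
        }
    }

    std::uint32_t digest() const {
        return ((s2_ & 0xffffu) << 16) | (s1_ & 0xffffu);
    }

private:
    void recompute() {
        s1_ = 1;
        s2_ = 0;
        for (std::size_t i = 0; i < n_; ++i) {
            s1_ += static_cast<unsigned char>(data_[pos_ + i]);  // direct cast, no widen()
            s2_ += s1_;
        }
    }

    const std::string& data_;
    std::size_t pos_, n_;
    std::uint32_t s1_ = 1, s2_ = 0;
};
```

Rolling costs a few operations per byte while a recompute costs one window's worth, so once a skip reaches the window size, recomputing is never more work.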

Statistic                ref-2245     xdelta
pull-avg-resident-MiB       46.69 ?    46.78
pull-avg-size-MiB           52.21 ?    52.20
pull-max-resident-MiB       65.32 <    65.89
pull-max-size-MiB           72.01 ?    72.22
pull-system-time            20.44 >    19.67
pull-user-time             676.79 >   657.13
pull-wall-time             919.58 >   895.51
server-avg-resident-MiB     53.41 ?    53.32
server-avg-size-MiB         58.66 ?    58.57
server-max-resident-MiB     59.56 ?    59.43
server-max-size-MiB         65.02 ?    64.98
server-system-time          19.55 <    20.61
server-user-time           190.12 >   166.96
server-wall-time           922.73 >   898.66
Commit 5dc45de4c282a4e93ad1384a68b38740728cd0cb, by anderse-monotone@cello.hpl.hp.com
12 years 9 months Modify the whitespace trimming used to make cert IDs so that it appends to an
existing string rather than constructing a new string and appending
the new string to the existing one. 1.01x CPU reduction on client,
1.01x CPU reduction on server.

cert.cc: use the new function.
simplestring_xform.cc: define a function that appends a whitespace-stripped
string to another string, and modify the existing whitespace-stripping
function to use it.
simplestring_xform.hh: define new function.
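A sketch of the append-in-place variant, using hypothetical names (append_without_whitespace, remove_whitespace), since the commit does not name its functions here. The string-returning form becomes a thin wrapper, so a caller that already has a destination string, such as code building a cert ID, avoids the temporary copy.

```cpp
#include <cctype>
#include <string>

// Append `in` to `out`, skipping whitespace characters (hypothetical name).
void append_without_whitespace(std::string& out, const std::string& in) {
    out.reserve(out.size() + in.size());  // at most in.size() bytes are added
    for (char c : in) {
        // Cast to unsigned char: passing a negative char to isspace is undefined.
        if (!std::isspace(static_cast<unsigned char>(c)))
            out.push_back(c);
    }
}

// The original string-returning form, now built on the append variant.
std::string remove_whitespace(const std::string& in) {
    std::string out;
    append_without_whitespace(out, in);
    return out;
}
```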

Statistic                ref-e1a7   whitespace cleanup
pull-avg-resident-MiB       46.81 ?              46.59
pull-avg-size-MiB           52.25 ?              52.03
pull-max-resident-MiB       65.59 ?              65.98
pull-max-size-MiB           72.21 ?              72.59
pull-system-time            22.18 >              21.09
pull-user-time             691.40 >             687.00
pull-wall-time             931.83 ?             929.23
server-avg-resident-MiB     53.13 ?              53.32
server-avg-size-MiB         58.36 ?              58.56
server-max-resident-MiB     59.54 ?              59.43
server-max-size-MiB         65.09 ?              64.82
server-system-time          22.02 ?              21.16
server-user-time           193.65 >             192.19
server-wall-time           934.98 ?             932.38
Commit 22457095fd36ea02a44652d45b5e7a6788cdea06, by anderse-monotone@cello.hpl.hp.com
12 years 9 months Remove zeroing of memory used by Botan to do compression. 1.06x
reduction in client time, 1.02x in server time.

gzip.cpp: if we are not paranoid, don't use the paranoid memory
allocation functions.

init.{cpp,h}: define a variable for how paranoid we are going to be
about zeroing memory.

monotone.cc: set that we are not paranoid.
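A sketch of a paranoia switch for allocation helpers, assuming a hypothetical global g_paranoid_memory in place of the variable from init.{cpp,h} and hypothetical comp_alloc/comp_free helpers. A real zlib zfree callback receives only the pointer, not the size, so a wrapper along these lines would have to record block sizes itself. g_bytes_wiped exists only to make the effect observable.

```cpp
#include <cstddef>
#include <cstdlib>

bool g_paranoid_memory = true;   // hypothetical; mirrors the flag from init.{cpp,h}
std::size_t g_bytes_wiped = 0;   // for illustration only: total bytes cleared

void* comp_alloc(std::size_t n) {
    return std::malloc(n);
}

// Paranoid mode wipes each block before freeing it so that no data lingers
// in freed memory; when nothing secret is stored there, that is pure overhead.
void comp_free(void* p, std::size_t n) {
    if (p != nullptr && g_paranoid_memory) {
        // volatile keeps the compiler from eliding a wipe of memory about to be freed
        volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
        for (std::size_t i = 0; i < n; ++i) bytes[i] = 0;
        g_bytes_wiped += n;
    }
    std::free(p);
}
```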

Statistic                ref-4d38   zlib-patch
pull-avg-resident-MiB       46.64 ?      46.81
pull-avg-size-MiB           52.03 ?      52.25
pull-max-resident-MiB       65.93 ?      65.59
pull-max-size-MiB           72.46 ?      72.21
pull-system-time            23.04 >      22.18
pull-user-time             734.72 >     691.40
pull-wall-time             976.10 >     931.83
server-avg-resident-MiB     53.00 ?      53.13
server-avg-size-MiB         58.23 ?      58.36
server-max-resident-MiB     59.20 ?      59.54
server-max-size-MiB         64.73 ?      65.09
server-system-time          21.44 ?      22.02
server-user-time           198.78 >     193.65
server-wall-time           979.25 >     934.98
Commit e1a721eb1b1bf8d64229419ac1f73bda0a855590, by anderse-monotone@cello.hpl.hp.com
12 years 9 months Move the verify function to be inline so that it disappears from callgrind
output, making it easier to find real problems. A non-statistically
significant reduction in user time on client and server.

Mean1 from results-pull/pull-ref-464e-memtime/stats.csv
Mean2 from results-pull/pull-verify-memtime/stats.csv
Statistic                   Mean1      Mean2
pull-avg-resident-MiB       46.82 ?    46.64
pull-avg-size-MiB           52.24 >    52.03
pull-max-resident-MiB       65.63 ?    65.93
pull-max-size-MiB           71.92 ?    72.46
pull-system-time            23.17 ?    23.04
pull-user-time             735.76 ?   734.72
pull-wall-time             976.53 ?   976.10
server-avg-resident-MiB     53.06 ?    53.00
server-avg-size-MiB         58.34 ?    58.23
server-max-resident-MiB     59.76 ?    59.20
server-max-size-MiB         65.47 ?    64.73
server-system-time          21.45 ?    21.44
server-user-time           198.95 ?   198.78
server-wall-time           979.69 ?   979.25
Commit 4d389c13b3bb1235c720b8392f9574f1ddb72d13, by anderse-monotone@cello.hpl.hp.com
12 years 9 months * Patch to add in binary rosters; substantial (1.2x) speed
improvement for the client on pull, some speed improvement on
annotate (only informally tested; it matters much more when annotating
a file near the end of the roster than the beginning). A wash on the
server, although I haven't tested serving with an all-binary roster
database.

cmd_merging.cc: call the ASCII version of the roster_print routine for the
regression test.

fake_pthread.c: fixups to eliminate compiler warnings

roster.cc: All sorts of changes to deal with the binary version of rosters.
The format for the binary version is almost the same as for ASCII, but the
netio routines are used rather than the printing routines, the markers
for different types are single characters, and a length is included in
each record so that it is easy to skip through records when scanning
for a particular one. An earlier attempt put an index at the end of the
roster, and while this also made it quick to skip through, it had the
downside that the database increased in size by ~10%.
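A C++ sketch of why a per-record length helps: the scanner reads the one-byte marker and the length, and jumps over any record it is not interested in without parsing it. The layout (1-byte marker, 4-byte little-endian length, payload) and the function names are assumptions for illustration, not monotone's actual binary roster encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical record layout: [1-byte type marker][4-byte LE payload length][payload]
void append_record(std::string& buf, char type, const std::string& payload) {
    buf.push_back(type);
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    for (int i = 0; i < 4; ++i)
        buf.push_back(static_cast<char>((len >> (8 * i)) & 0xff));
    buf += payload;
}

// Return the payload of the first record with marker `type` whose payload
// starts with `key`; every other record is skipped using its length.
std::optional<std::string> find_record(const std::string& buf, char type,
                                       const std::string& key) {
    std::size_t pos = 0;
    while (pos + 5 <= buf.size()) {
        char marker = buf[pos];
        std::uint32_t len = 0;
        for (int i = 0; i < 4; ++i)
            len |= static_cast<std::uint32_t>(
                       static_cast<unsigned char>(buf[pos + 1 + i])) << (8 * i);
        std::size_t body = pos + 5;
        if (body + len > buf.size())
            return std::nullopt;  // truncated or malformed buffer
        if (marker == type && key.size() <= len &&
            buf.compare(body, key.size(), key) == 0)
            return buf.substr(body, len);
        pos = body + len;  // jump straight to the next record
    }
    return std::nullopt;
}
```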

roster.hh: binary version of the printing routines and the explicit ASCII
version.

Statistic                   Mean1      Mean2
pull-avg-resident-MiB       49.58 ?    49.29
pull-avg-size-MiB           55.08 <    56.65
pull-max-resident-MiB       73.31 <    74.61
pull-max-size-MiB           79.20 <    80.91
pull-system-time            18.44 <    19.44
pull-user-time             737.24 <   879.79
pull-wall-time             969.47 <  1110.88
server-avg-resident-MiB     53.47 ?    53.66
server-avg-size-MiB         58.77 ?    58.82
server-max-resident-MiB     59.29 ?    59.31
server-max-size-MiB         64.69 ?    64.83
server-system-time          27.34 ?    28.28
server-user-time           207.30 ?   206.19
server-wall-time           972.64 <  1114.76
Commit 464e510af4959231ff63352c902c689b0f1687aa, by anderse-monotone@cello.hpl.hp.com
12 years 9 months merge of '4d7d4d7b60a52b709486285c830a32e47825fd45'
and 'd6ac464bec394bf665ed8207a169c9ecdb7bbc05'
Commit dfe4be7fb03de1066790da0095fb2abe3f3017e6, by anderse-monotone@cello.hpl.hp.com
12 years 9 months * Add in the fake-pthread hack to fake up pthread calls with no-ops so that
programs that don't really need pthreads, but are forced to link against them by a
shared library dependency, don't suffer. 1.20x performance improvement.

fake_pthread.c: All the faked up calls
Makefile.am, configure.ac: Enable use of fake-pthread with --enable-fakepthread

Statistic                ref-d1e5   fake-pthread
pull-avg-resident-MiB       49.07 >        48.49
pull-avg-size-MiB           56.41 >        55.82
pull-max-resident-MiB       74.79 ?        74.66
pull-max-size-MiB           81.15 ?        81.00
pull-system-time            11.36 ?        11.48
pull-user-time            1109.36 >       922.06
pull-wall-time            1619.89 >      1429.07
server-avg-resident-MiB     54.02 ?        53.51
server-avg-size-MiB         59.29 ?        58.78
server-max-resident-MiB     59.73 ?        59.19
server-max-size-MiB         65.06 >        64.28
server-system-time          26.02 ?        25.94
server-user-time           228.76 >       216.60
server-wall-time          1623.11 >      1432.27
Commit d6ac464bec394bf665ed8207a169c9ecdb7bbc05, by anderse-monotone@cello.hpl.hp.com
12 years 9 months * Allow the size of the vcache to be set by a lua hook so that people
can choose their tradeoff between memory and CPU usage.

app_state.cc: call db.set_vcache_max_size() after we load the rc files.
Can't call it when we init the db thing because rc files haven't been
loaded yet.

constants.{cc,hh}: make db_version_cache_sz not const

database.{cc,hh}: function to set the max vcache size

lru_cache.h: allow us to set the max vcache size

lua_hooks.{cc,hh}: function to get the lua max vcache size. Perhaps
this should take some argument to say what operation we are doing.
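A sketch of an LRU cache whose byte limit can be changed after construction, as needed when the limit comes from a lua hook that can only be consulted once the rc files are loaded. LruCache and set_max_size are hypothetical names, not the interface of monotone's lru_cache.h; in this sketch, shrinking the limit evicts least recently used entries immediately.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal byte-bounded LRU cache with a run-time adjustable limit (sketch).
class LruCache {
public:
    explicit LruCache(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    void set_max_size(std::size_t max_bytes) {
        max_bytes_ = max_bytes;
        evict();  // a smaller limit takes effect right away
    }

    void insert(const std::string& key, std::string value) {
        erase(key);
        used_ += value.size();
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
        evict();
    }

    const std::string* find(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second);  // mark most recently used
        return &it->second->second;
    }

    std::size_t used() const { return used_; }

private:
    using Entry = std::pair<std::string, std::string>;

    void erase(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return;
        used_ -= it->second->second.size();
        order_.erase(it->second);
        index_.erase(it);
    }

    void evict() {
        while (used_ > max_bytes_ && !order_.empty()) {
            Entry& victim = order_.back();  // least recently used
            used_ -= victim.second.size();
            index_.erase(victim.first);
            order_.pop_back();
        }
    }

    std::size_t max_bytes_;
    std::size_t used_ = 0;
    std::list<Entry> order_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```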

Statistic                 default      32MiB     128MiB
pull-avg-resident-MiB       49.59 <    89.50 <   228.81
pull-avg-size-MiB           56.93 <   102.10 <   258.91
pull-max-resident-MiB       74.40 <   114.19 <   268.95
pull-max-size-MiB           80.58 <   125.55 <   308.79
pull-system-time            21.18 >    19.75 >    19.14
pull-user-time            1126.80 >  1112.72 <  1124.99
pull-wall-time            1436.74 >  1385.77 ?  1397.35
server-avg-resident-MiB     53.96 <    86.14 <   196.27
server-avg-size-MiB         59.19 <    91.41 <   201.59
server-max-resident-MiB     59.94 <   102.96 <   223.86
server-max-size-MiB         65.58 <   108.37 <   229.77
server-system-time          29.11 >     9.26 >     3.71
server-user-time           239.75 >   167.52 >   125.67
server-wall-time          1439.97 >  1389.00 ?  1400.62
Commit 27c06ef5b20ade167011e489bc2e5333eed00faf, by anderse-monotone@cello.hpl.hp.com
12 years 9 months xdelta.cc: handle the space doubling for appending a single character at
a time, since both libstdc++6-4.0 and libstdc++5-3.3 get this wrong.
Calculate the space needed by a merged string and set the output string
to that size to avoid unnecessary copies during space doubling.
The combination of these two happens to slightly reduce memory usage,
probably mostly as a result of reduced fragmentation. Negligible benefit
on the client, 1.12x CPU usage reduction on the server.
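Both changes can be sketched in C++ as follows; append_char and merge_pieces are hypothetical names. The first grows capacity geometrically by hand when appending one character at a time, and the second sums the piece lengths and reserves that exact size once, so no reallocation copies happen while the merged string is built. The commit sets the output string's size up front; reserve() is used here as a similar way to avoid the intermediate copies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Append one character, doubling capacity ourselves rather than relying on
// the standard library's growth policy for this case.
void append_char(std::string& s, char c) {
    if (s.size() == s.capacity())
        s.reserve(s.capacity() < 16 ? 16 : 2 * s.capacity());
    s.push_back(c);
}

// Size the output once for the whole merge, then append the pieces.
std::string merge_pieces(const std::vector<std::string>& pieces) {
    std::size_t total = 0;
    for (const auto& p : pieces) total += p.size();
    std::string out;
    out.reserve(total);
    for (const auto& p : pieces) out += p;
    return out;
}
```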

Benchmark pull(monotone.db)

Statistic                ref-997a       test
pull-avg-resident-MiB       49.29 ?    49.30
pull-avg-size-MiB           56.78 ?    56.65
pull-max-resident-MiB       74.96 >    74.52
pull-max-size-MiB           83.79 >    80.86
pull-system-time            19.42 ?    19.09
pull-user-time            1027.69 ?  1022.54
pull-wall-time            1310.81 >  1301.62
server-avg-resident-MiB     55.25 >    54.18
server-avg-size-MiB         61.21 >    59.35
server-max-resident-MiB     62.08 >    60.14
server-max-size-MiB         74.04 >    65.46
server-system-time          25.71 ?    26.58
server-user-time           248.87 >   218.63
server-wall-time          1314.01 >  1304.81
Commit d1e563447ab7d92bb965eb330988c637f3c22d14, by anderse-monotone@cello.hpl.hp.com
12 years 9 months Enable optional compilation with OpenSSL libcrypto to use its optimized SHA1
hash.

Makefile.am: add -DWITH_CRYPTO if we are compiling with libcrypto
mdx_hash.h: make add_data, final_result virtual so we can reduce
unnecessary copies and get the max benefit from libcrypto
sha160.cpp: Two implementations in one file, one WITH_CRYPTO, one without
sha160.h: if WITH_CRYPTO, override add_data, final_result.
configure.ac: test for with-crypto, default is no.

Benchmark: Pull(monotone.db), 1 repeat on the Pentium-M, 3 repeats on the Xeon.
                         1.7 GHz Pentium-M       2.8 GHz Xeon
Statistic                   botan   libcrypto       botan   libcrypto
pull-system-time             6.46 >      6.21       14.61 ?      6.35
pull-user-time             411.58 >    381.80      724.30 >    301.48
pull-wall-time            1530.08 >   1468.54     1491.42 >    948.05
server-system-time         125.96 <    127.72      170.94 ?    133.92
server-user-time           845.42 >    822.07      864.89 >    644.80
server-wall-time          1533.32 >   1471.84     1494.64 >    950.00
Commit 997a677db676734acc0d098979d2a9cee8765ec9, by anderse-monotone@cello.hpl.hp.com
12 years 9 months * Patch to limit the amount of memory in any pending writes.

constants.{cc,hh}: add in db_max_pending_writes_bytes constant.
database.cc: move global_slow_assertions_version_check into here so
that the regression tests can compile properly. Update the functions
that add or remove pending writes to properly track the number of bytes
pending. Update schedule_write to flush out the pending writes if the
current size exceeds the maximum pending size. Note that this doesn't
commit the changes, it just flushes them out to disk.
database.hh: pending_writes_size class variable
monotone.cc: remove global_slow_assertions_version_check so that
regression tests compile properly.
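A sketch of byte-bounded pending writes, with a hypothetical PendingWrites class and an in-memory map standing in for the database. Replacing or cancelling a pending entry adjusts the byte count, and schedule_write() flushes everything once the total exceeds the cap; as in the commit, a flush only writes data out and does not commit a transaction.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical write-back buffer bounded by total pending bytes.
class PendingWrites {
public:
    explicit PendingWrites(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    void schedule_write(const std::string& key, const std::string& data) {
        cancel_write(key);  // replacing a pending entry must not double-count it
        pending_[key] = data;
        pending_bytes_ += data.size();
        if (pending_bytes_ > max_bytes_)
            flush();
    }

    void cancel_write(const std::string& key) {
        auto it = pending_.find(key);
        if (it == pending_.end()) return;
        pending_bytes_ -= it->second.size();
        pending_.erase(it);
    }

    // Write every pending entry to the backing store (a map stands in for disk).
    void flush() {
        for (auto& kv : pending_)
            store_[kv.first] = std::move(kv.second);
        pending_.clear();
        pending_bytes_ = 0;
    }

    std::size_t pending_bytes() const { return pending_bytes_; }
    std::size_t stored_count() const { return store_.size(); }

private:
    std::size_t max_bytes_;
    std::size_t pending_bytes_ = 0;
    std::map<std::string, std::string> pending_;
    std::map<std::string, std::string> store_;
};
```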
Commit eddb7e59361efeb8d9300ba0ddd7483272565097, by anderse-monotone@cello.hpl.hp.com
12 years 9 months * Performance tuning of annotate -- create a special path for roster
parsing to make annotate go faster, and disable some of the database
cross checks; it still spends a lot of time parsing, but is overall
5-20x faster.

net.venge.monotone.contrib.benchmark ;; examples/annotate.sh

               Makefile.am          paths.hh         xdelta.cc       database.cc
              opt      ref      opt      ref      opt      ref      opt      ref
system      2.852    3.128    1.600    1.700    2.656    2.868   12.021   12.329
user        6.396  192.884    9.933  125.232   26.542  193.132   47.763  231.726
wall       10.149  205.530   14.129  130.573   32.234  210.039   67.273  259.602

annotate.cc: add a function that just gets the info for a particular
file in a particular revision. Retains the ability to do a cross
check with the old method, although this is disabled at compile
time. Fixup do_annotate_node to use this function.

cmd_files.cc: disable the version_check assertion if we are doing
an annotation.

database.cc: optionally skip the database version check.

monotone.cc: new global to determine if we are doing the version check
assertion.

roster.cc: Fast parsing code that skips to the beginning of a record
trying to find the appropriate one for a given file_id. It should
fail if the roster is of a different format than expected, but there
may exist some rosters that it can't parse that can be parsed by the
default code.
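A sketch of the fast-path-with-fallback pattern, under an assumed text layout (blank-line-separated stanzas in which a file id appears as [id]); this is a simplification, not monotone's exact roster format. fast_find_stanza() searches for the id and backs up to the enclosing stanza; if that fails it returns std::nullopt, and find_stanza() hands off to the caller-supplied full parser.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Fast path: locate the stanza mentioning file_id without parsing the others.
std::optional<std::string> fast_find_stanza(const std::string& text,
                                            const std::string& file_id) {
    const std::string needle = "[" + file_id + "]";
    std::size_t hit = text.find(needle);
    if (hit == std::string::npos)
        return std::nullopt;  // not found this way: let the slow parser decide

    // Stanza boundaries are assumed to be blank lines ("\n\n").
    std::size_t start = text.rfind("\n\n", hit);
    start = (start == std::string::npos) ? 0 : start + 2;
    std::size_t end = text.find("\n\n", hit);
    if (end == std::string::npos)
        end = text.size();
    return text.substr(start, end - start);
}

// Try the fast path first, otherwise fall back to the general parser.
template <typename SlowParser>
std::string find_stanza(const std::string& text, const std::string& file_id,
                        SlowParser slow) {
    if (auto s = fast_find_stanza(text, file_id))
        return *s;
    return slow(text, file_id);
}
```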
Commit 4e99cc37f548b5884d63c48bc486dfe98c8d0bd2, by anderse-monotone@cello.hpl.hp.com