monotone

Issue 86: fatal `merge` error: I(right_uncommon_ancestors.find(right_rid) != right_uncommon_ancestors.end())

Reported by Douglas Dickinson, Oct 6, 2010

See shell log below, and attached _MTN/debug.
Is there anything else that I can provide to help track this down?

My next steps will be
* save copies of the workspace & database
* turn off inodeprints
* create a fresh DB on the same box, synced from a different host
* sync and see if the weirdness spreads to other hosts.
* try the latest 0.48

Searching for similar issues I found:
* bug #23349 closed in 0.45-2 as not reproducible
* and some older references:
http://lists.nongnu.org/archive/html/monotone-devel/2006-04/msg00020.html
http://lists.nongnu.org/archive/html/monotone-devel/2007-10/msg00155.html

#### Shell log:

$ mtn status
Current revision: 0062536569ea03ca240be11162c03e65698d74c5
Current branch: ddd.ds
Changes against parent 01097bd9412a12c71a00c846d0ab31252dc0f9c2
  no changes

$ mtn heads
mtn: branch 'ddd.ds' is currently unmerged:
01097bd9412a12c71a00c846d0ab31252dc0f9c2 douglasdd@google.com 10/04/2010 14:19:50
1bf1c99573748c3b51beabc91bbbaa9b39bc96fe douglasdd@google.com douglasdd@douglasdd.kir.corp.google.com 10/04/2010 12:53:34 10/04/2010 12:54:17

$ mtn merge
mtn: 2 heads on branch 'ddd.ds'
enter passphrase for key ID [douglasdd@google.com] (cc46df37...):
mtn: merge 1 / 1:
mtn: calculating best pair of heads to merge next
mtn: [left]  01097bd9412a12c71a00c846d0ab31252dc0f9c2
mtn: [right] 1bf1c99573748c3b51beabc91bbbaa9b39bc96fe
mtn: fatal: error: roster.cc:2086: I(right_uncommon_ancestors.find(right_rid) != right_uncommon_ancestors.end())
mtn: this is almost certainly a bug in monotone.
mtn: please send this error message, the output of 'mtn version --full',
mtn: and a description of what you were doing to monotone-devel@nongnu.org.
mtn: wrote debugging log to /Users/douglasdd/ds/_MTN/debug
mtn: if reporting a bug, please include this file

$ mtn version --full
monotone 0.47 (base revision: 58eca89fab6322a14c219fb377eae54e21311986)
Running on          : Darwin 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
C++ compiler        : GNU C++ version 4.0.1 (Apple Inc. build 5493)
C++ standard library: GNU libstdc++ version 20050421
Boost version       : 1_42
SQLite version      : 3.6.23.1 (compiled against 3.6.23.1)
Lua version         : Lua 5.1
PCRE version        : 7.9 2009-04-11 (compiled against 7.9)
Botan version       : 1.8.8 (compiled against 1.8.8)
Changes since base revision:
format_version "1"

new_manifest [69563b5f6cb9e1af6a87aebd6d672e721f646026]

old_revision [58eca89fab6322a14c219fb377eae54e21311986]

  Generated from data cached in the distribution;
  further changes may have been made.

$ uname -a 
Darwin douglasdd-macbookpro 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
# ...aka Mac OS X 10.5.8

Comment 1 by Thomas Keller, Oct 7, 2010

Hrm... interesting. The 2007 workspace bug is similar insofar as 
Nathaniel mentions the "no committed changes" merge problem in 
http://lists.nongnu.org/archive/html/monotone-devel/2007-10/msg00165.html, 
and the revision in question here also looks very suspicious in that 
respect:

format_version "1"

new_manifest [69932b4f4598082f8010d28d0d9ef4f349db34e8]

old_revision [01097bd9412a12c71a00c846d0ab31252dc0f9c2]

old_revision [1bf1c99573748c3b51beabc91bbbaa9b39bc96fe]

delete "todo/todo.tmp"

patch "env/setup.sh"
 from [abbe35eadaaadb152d1af0628d3413b4744bda3b]
   to [c1533dedd84fc37600d561eb6a72d279f949bdb5]
...

This looks like [1bf1c99573748c3b51beabc91bbbaa9b39bc96fe] has no 
recorded changes (or at least no changes which should be applied to 
[01097bd9412a12c71a00c846d0ab31252dc0f9c2] because the latter is an 
ancestor of the former)
and thus shouldn't even exist. What does `mtn au get_revision 
1bf1c99573748c3b51beabc91bbbaa9b39bc96fe` print?

Sorry for fishing a little in the dark here - others might be of 
more help...
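
For readers unfamiliar with the failed check, a toy model may help. This is only a sketch, not monotone's actual roster.cc code. It assumes that each side's "uncommon ancestors" are the revisions reachable from that head, including the head itself, that are not reachable from the other head. Under that assumption a genuine head always appears in its own set, and the check can only fail when the supposed right head is really an ancestor of the left one. The revision names and the `parents` map are hypothetical.

```python
def ancestors(parents, rev):
    """Return rev plus every revision reachable through parent links."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def uncommon_ancestors(parents, left, right):
    """Split the ancestors of two revisions into left-only and right-only sets."""
    left_anc = ancestors(parents, left)
    right_anc = ancestors(parents, right)
    return left_anc - right_anc, right_anc - left_anc


# Hypothetical history: root -> a -> b, plus a side line root -> c.
parents = {"root": [], "a": ["root"], "b": ["a"], "c": ["root"]}

# Two genuine heads: each one is found in its own uncommon set.
left_only, right_only = uncommon_ancestors(parents, "b", "c")
genuine_ok = "c" in right_only

# A stale "head": "a" is an ancestor of "b", so its uncommon set is empty
# and the toy equivalent of the I(...) check would fail.
left_only_stale, right_only_stale = uncommon_ancestors(parents, "b", "a")
stale_ok = "a" in right_only_stale
```

In this model `genuine_ok` is true and `stale_ok` is false, which mirrors the shape of the invariant failure in the log without claiming to reproduce its cause.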

Comment 2 by Douglas Dickinson, Oct 7, 2010

> What does `mtn au get_revision
> 1bf1c99573748c3b51beabc91bbbaa9b39bc96fe` print?

$ mtn au get_revision 1bf1c99573748c3b51beabc91bbbaa9b39bc96fe
format_version "1"

new_manifest [ce8b79ac00bf432cbff130a27d86c7c0367ace7b]

old_revision [29b175db52cc213d24e21123fb7381f1763c1901]

patch "env/bashrc"
 from [7f49a269b79d77aa0ad34251f3a5f0d33314c26a]
   to [019ad922fd77f976cdfd8e94e2c85d427ba03e4a]

patch "env/setup-google.sh"
 from [feb5dd1df011f1859d8538e939d6d64a661f8b2c]
   to [1bbc2211d4d470b120ccb798c1fe59c2fc70fbbe]

patch "env/setup.sh"
 from [9fa58266ae12836cccf38f0bfb07ff2effa823ee]
   to [abbe35eadaaadb152d1af0628d3413b4744bda3b]

old_revision [2bedf3c8839ab8f6e501f6c21dd52f6915c83183]

add_dir "todo"

add_file "bin/todo.sh"
 content [2b8d615bcaaaebe50fc01ebfc8e47e36c02fed52]

add_file "todo/config"
 content [543fc050007349ab09147215b0f74ca731a98356]

add_file "todo/done.txt"
 content [da39a3ee5e6b4b0d3255bfef95601890afd80709]

add_file "todo/report.txt"
 content [da39a3ee5e6b4b0d3255bfef95601890afd80709]

add_file "todo/todo.tmp"
 content [da39a3ee5e6b4b0d3255bfef95601890afd80709]

add_file "todo/todo.txt"
 content [da39a3ee5e6b4b0d3255bfef95601890afd80709]

patch "env/setup.sh"
 from [085c143b04ec8488032789d9f20e53c602e8d0c2]
   to [abbe35eadaaadb152d1af0628d3413b4744bda3b]

  set "bin/todo.sh"
 attr "mtn:execute"
value "true"

Comment 3 by Douglas Dickinson, Oct 7, 2010

> My next steps will be
> * save copies of the workspace & database
> * turn off inodeprints

...did not help, same problem.

> * sync and see if the weirdness spreads to other hosts.

Synced to 2 other hosts, could not reproduce it on either one:
* Mac 10.5.8 PPC (mtn 0.48)
* Linux x86_64   (mtn 0.48)

> * try the latest 0.48

No change w/ 0.48:

$ /tools/bin/mtn version --full
monotone 0.48 (base revision: 844268c137aaa783aa800a9c16ae61edda80ecea)
Running on          : Darwin 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
C++ compiler        : GNU C++ version 4.0.1 (Apple Inc. build 5493)
C++ standard library: GNU libstdc++ version 20050421
Boost version       : 1_40
SQLite version      : 3.6.11 (compiled against 3.6.11)
Lua version         : Lua 5.1
PCRE version        : 7.8 2008-09-05 (compiled against 7.8)
Botan version       : 1.8.1 (compiled against 1.8.1)
Changes since base revision:
format_version "1"

new_manifest [86bede3ba4251594f3a0f7e0c31560f9f8ce3744]

old_revision [844268c137aaa783aa800a9c16ae61edda80ecea]

$ /tools/bin/mtn merge
mtn: 2 heads on branch 'ddd.ds'
mtn: merge 1 / 1:
mtn: calculating best pair of heads to merge next
mtn: [left]  01097bd9412a12c71a00c846d0ab31252dc0f9c2
mtn: [right] 1bf1c99573748c3b51beabc91bbbaa9b39bc96fe
mtn: fatal: error: ../roster.cc:2080: I(right_uncommon_ancestors.find(right_rid) != right_uncommon_ancestors.end())
mtn: this is almost certainly a bug in monotone.
mtn: please send this error message, the output of 'mtn version --full',
mtn: and a description of what you were doing to monotone-devel@nongnu.org.
mtn: wrote debugging log to /Users/douglasdd/work/monotone.merge-error/ds/_MTN/debug
mtn: if reporting a bug, please include this file

> * create a fresh DB on the same box, synced from a different host

... that plus a checkout from that freshly+synced DB could not 
reproduce the issue.
But it does have me up and running again!

Comment 4 by Stephen Leake, Oct 7, 2010

This is a known bug, and it is fixed in 0.48, but you need to do 
'mtn db regenerate_caches' to fix the corrupted database, then do 
'merge'.

The bug is that the local database cache of heads of branches gets 
confused by certain sync cases. That is fixed in 0.48, but the 
database cache is not automatically fixed when you upgrade; you need 
to manually run 'regenerate_caches'.
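
A minimal sketch of why a stale heads cache matters, assuming a simplified model in which a head is any revision of the branch that has no child on that branch. The function and revision names are hypothetical, and this is not monotone's implementation. It only shows how recomputing heads from the revision graph drops an entry that a stale cache still lists.

```python
def branch_heads(parents, branch_revs):
    """Return the revisions in branch_revs that have no child in branch_revs."""
    branch_revs = set(branch_revs)
    has_child = set()
    for rev in branch_revs:
        for parent in parents.get(rev, ()):
            if parent in branch_revs:
                has_child.add(parent)
    return branch_revs - has_child


# Hypothetical linear history on one branch: root -> a -> b.
parents = {"root": [], "a": ["root"], "b": ["a"]}

# A stale cache that still lists "a" as a head next to the real head "b".
stale_cached_heads = {"a", "b"}

# Recomputing from the graph leaves only the real head.
recomputed_heads = branch_heads(parents, parents.keys())
bogus_heads = stale_cached_heads - recomputed_heads
print(sorted(bogus_heads))  # prints ['a']
```

With the cache rebuilt, a merge in this model has a single head to consider, so there is nothing left to merge.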

Comment 5 by Thomas Keller, Oct 7, 2010

Ah, was it that one? Many thanks for digging this up Stephen! Too 
much in my head right now...

@Douglas: Let us know if it works out for you, so we can close this 
ticket.

Comment 6 by Douglas Dickinson, Oct 7, 2010

> but you need to do 'mtn db regenerate_caches'
> to fix the corrupted database, then do 'merge'.

Worked perfectly, Thanks!!

Aside:

Perhaps `db regenerate_caches` should be made part of `db migrate`? Or 
maybe migrate could suggest running `db check`?

Although I am a longtime user (since ~2003), `migrate` is the only 
database maintenance that I have ever performed.  I suspect this is 
also true for many non-mtn-developer users. (just my $0.02)

Happy Hacking,
./ddd

$ mtn db regenerate_caches
mtn: regenerating cached rosters and heights
mtn: regenerated
mtn:   3320/3320
mtn: finished regenerating cached rosters and heights
mtn: regenerating cached branches
mtn: finished regenerating cached branches

$ mtn merge
mtn: branch 'ddd.ds' is already merged

$ mtn up
mtn: updating along branch 'ddd.ds'
mtn: already up to date at 01097bd9412a12c71a00c846d0ab31252dc0f9c2

$ mtn --version
monotone 0.48 (base revision: 844268c137aaa783aa800a9c16ae61edda80ecea)

Comment 7 by Thomas Keller, Oct 7, 2010

> Perhaps `db regenerate_caches` should be made part of `db
> migrate`? Or maybe migrate could suggest running `db
> check`?

Actually regenerate_caches is called automatically after migrations 
that need it, but we probably forgot to mention in the NEWS file for 
0.48 that it has to be run in order to fix the bug. The bug fix did 
not require a schema change, and therefore nothing triggered the 
call automatically.
Status: Duplicate

Created: 14 years 1 month ago by Douglas Dickinson

Updated: 14 years 1 month ago

Labels:
Type:Defect
Priority:Medium
