Reported by Judson Lester, May 27, 2011
Steps to reproduce the problem:
1. Add a print statement to your rcfile: print('hello out there')
2. Start usher such that that rcfile will load
3. Attempt to sync with usher

Expected result:
Ideally: usher forks monotone and sync begins.
Less ideally: usher reports somewhere, anywhere, that the output "hello out there" is not the expected "beginning service".

Actual result:
"Received warning from usher: Cannot fork server."

This is caused by server::fork checking only the first line for 'beginning service', based on the commented assumption that otherwise monotone's stderr is reporting that it failed to launch. A side effect of this is that adding -v to the monotone args will also silently crash Usher.
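To illustrate the "less ideal" behaviour, here is a minimal sketch of a check that scans every line of the child's output for the banner rather than only the first. The helper name saw_service_banner and its parameters are hypothetical; this is not usher's actual code, just one way the first-line assumption could be relaxed.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of usher): read the child's output line by
// line and report whether any line contains the expected banner.
bool saw_service_banner(std::istream& child_output,
                        const std::string& banner = "beginning service")
{
    std::string line;
    while (std::getline(child_output, line)) {
        // Unrelated output (rcfile prints, -v chatter) is skipped, not
        // treated as a launch failure.
        if (line.find(banner) != std::string::npos)
            return true;
    }
    // The stream ended without the banner ever appearing.
    return false;
}
```

On a real pipe this would still block if the server stays up but never prints the banner, so a timeout or a liveness check would be needed alongside it.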
Comment 1 by Richard Levitte, Jun 1, 2011
This issue has really been around for some time, just not addressed before now. I just had a look at the code, and I believe it should be possible to remove all the code that looks for "beginning service", and instead add some code in server::connect that loops around sock::connect as long as the monotone server that was just forked is still up. Something like that... I'll experiment a bit and see what I can come up with.
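A rough sketch of that idea, written as a standalone function so it can be read without the usher sources. The name connect_while_alive, the two callbacks, the delay and the attempt cap are all assumptions made for illustration; the real server::connect and sock::connect may look quite different.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch: keep retrying the connection for as long as the
// freshly forked server process is still running.
bool connect_while_alive(const std::function<bool()>& try_connect,
                         const std::function<bool()>& server_alive,
                         std::chrono::milliseconds delay = std::chrono::milliseconds(100),
                         int max_attempts = 50)
{
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_connect())
            return true;            // the server is accepting connections
        if (!server_alive())
            return false;           // the server exited, so stop retrying
        std::this_thread::sleep_for(delay);   // give it time to start listening
    }
    return false;                   // still up but never reachable: give up
}
```

On POSIX, server_alive could plausibly be built on waitpid(pid, &status, WNOHANG) returning 0 for the forked child. The attempt cap is there so a server that stays up but never listens cannot stall the caller forever.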
Comment 2 by Richard Levitte, Jun 1, 2011
I'd suggest trying on revision 1336ca3b4c1b316eeeec33f333f7506dcc40a858 and see if that makes life better...
Comment 3 by Thomas Keller, Jun 5, 2011
This works fine for me with "-v" enabled, though I get a different problem here (not sure if this is related or not; it might also be something OSX specific). If I kill and start a specific server instance through the admin interface shortly after one another, the admin interface hangs and usher twice issues

    Could not initialize admin port: cannot bind to address: Address already in use

If I debug into usher with gdb, I only get back

    #0 0x00007fff85064e52 in select$DARWIN_EXTSN ()
    #1 0x000000010001f157 in main (argc=2, argv=0x7fff5fbff730) at src/usher.cc:160

and if I continue and break a little later, I get

    #0 0x00007fff8509ee72 in accept ()
    #1 0x000000010001e092 in sock::accept (this=<value temporarily unavailable, due to optimizations>) at src/sock.cc:52
    #2 0x000000010001f565 in main (argc=2, argv=0x7fff5fbff730) at src/usher.cc:170

The temporary solution for me is now to add a short delay between the stop and the restart. 2 seconds helped in my case.
Comment 4 by Richard Levitte, Jun 5, 2011
Thomas, that sounds like a different issue... what happens is that administrator::reload_conffile is called (it calls administrator::initialize, which gives you that error message), and as far as I can tell from the source, that only happens when usher gets a SIGHUP.