Issue 175: Usher hates Monotone saying anything other than "beginning service"

Reported by Judson Lester, May 27, 2011

Steps to reproduce the problem:
1. Add a print statement to your rcfile: print('hello out there')
2. Start usher such that that rcfile will load
3. Attempt to sync with usher

Expected result:

Ideally: usher forks monotone and sync begins.

Less ideally: usher reports somewhere, anywhere, that the output 
"hello out there" is not the expected "beginning 
service".

Actual result:

"Received warning from usher: Cannot fork server."

This is caused by server::fork checking only the first line of output 
for 'beginning service', based on the assumption (stated in a code 
comment) that any other first line means monotone is reporting on 
stderr that it failed to launch.

A side effect of this is that adding -v to the monotone args will 
also silently crash Usher.
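To illustrate the "less ideally" behaviour asked for above, here is a 
minimal sketch of a more tolerant check. It is not usher's actual 
server::fork code: the struct and function names are hypothetical, it 
reads from a std::istream for simplicity (usher reads the child's output 
from a file descriptor), and it assumes the child either prints 
"beginning service" or closes its output. A real version would also 
need a timeout.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical summary of what a freshly forked monotone printed
// before (or instead of) its "beginning service" line.
struct startup_scan {
    bool started = false;            // true once "beginning service" was seen
    std::vector<std::string> other;  // lines seen before it, in order
};

// Read lines until one contains "beginning service" or the stream ends.
// Unlike a first-line-only check, extra output (an rcfile print(), -v
// chatter) is kept so the caller can report it, instead of the first
// such line being treated as a launch failure.
startup_scan scan_startup_output(std::istream& in) {
    startup_scan result;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("beginning service") != std::string::npos) {
            result.started = true;
            break;
        }
        result.other.push_back(line);
    }
    return result;
}

// Convenience overload for output that has already been captured.
startup_scan scan_startup_output(const std::string& text) {
    std::istringstream in(text);
    return scan_startup_output(in);
}
```

A caller could log every entry of `other` when startup fails, which 
would have surfaced "hello out there" rather than only "Cannot fork 
server."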

Comment 1 by Richard Levitte, Jun 1, 2011

This issue has really been around for some time, just not addressed 
before now.

I just had a look at the code, and I believe it should be possible 
to remove all the code that looks for "beginning service" 
and instead add some code in server::connect that loops around 
sock::connect for as long as the monotone server that was just forked 
is still up.  Something like that...
I'll experiment a bit and see what I can come up with.
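The idea above can be sketched roughly as follows. This is only an 
illustration, not the code in the revision mentioned in Comment 2: 
`connect_while_alive` and `child_still_running` are hypothetical names, 
the `try_connect` callback stands in for a call to sock::connect, and 
the attempt limit and delay are arbitrary. The liveness check uses 
POSIX waitpid with WNOHANG so that it does not block.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>

// True while the child `pid` has not exited. waitpid(..., WNOHANG)
// returns 0 for a still-running child; if it returns the pid, the child
// has exited and is reaped here, and -1 (error) is treated as "gone".
bool child_still_running(pid_t pid) {
    int status = 0;
    return waitpid(pid, &status, WNOHANG) == 0;
}

// Keep calling try_connect until it succeeds, the server process is
// gone, or max_attempts is used up. Both checks are injected so the
// loop does not depend on any particular socket or process API.
bool connect_while_alive(const std::function<bool()>& try_connect,
                         const std::function<bool()>& still_alive,
                         int max_attempts,
                         std::chrono::milliseconds delay) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_connect()) return true;    // server is accepting
        if (!still_alive()) return false;  // server died: stop retrying
        std::this_thread::sleep_for(delay);
    }
    return false;  // gave up while the server was still starting
}
```

With this approach monotone's output no longer decides anything: a 
server that accepts the connection counts as started and a server that 
exits counts as failed, whatever either of them printed.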

Status: Started
Owner: levitte

Comment 2 by Richard Levitte, Jun 1, 2011

I'd suggest trying revision 
1336ca3b4c1b316eeeec33f333f7506dcc40a858 to see if that makes life 
better...

Comment 3 by Thomas Keller, Jun 5, 2011

This works fine for me with "-v" enabled, though I get a 
different problem here (not sure whether it is related; it might 
also be something OS X-specific):

If I kill and start a specific server instance through the admin 
interface in quick succession, the admin interface hangs and usher 
prints the following error twice:

Could not initialize admin port: cannot bind to address: Address already in use


If I attach gdb to usher, the backtrace only shows
#0  0x00007fff85064e52 in select$DARWIN_EXTSN ()
#1  0x000000010001f157 in main (argc=2, argv=0x7fff5fbff730) at src/usher.cc:160

and if I continue and break a little later, I get

#0  0x00007fff8509ee72 in accept ()
#1  0x000000010001e092 in sock::accept (this=<value temporarily unavailable, due to optimizations>) at src/sock.cc:52
#2  0x000000010001f565 in main (argc=2, argv=0x7fff5fbff730) at src/usher.cc:170

My workaround for now is to wait a little between the stop and the 
restart; 2 seconds was enough in my case.

Comment 4 by Richard Levitte, Jun 5, 2011

Thomas, that sounds like a different issue...  what happens is that 
administrator::reload_conffile is called (it calls 
administrator::initialize, which gives you that error message), and 
as far as I can tell from the source, that only happens when usher 
gets a SIGHUP.
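For reference, a common way for a daemon to turn SIGHUP into a 
configuration reload is for the signal handler to do nothing but set a 
flag, and for the main loop to do the actual work (in usher's case, 
that work would be administrator::reload_conffile). The sketch below is 
generic and hypothetical, not usher's own handler; it only shows why a 
reload, and therefore a rebind of the admin port, should happen only 
after a SIGHUP has been delivered.

```cpp
#include <csignal>

// Written from the signal handler; a volatile std::sig_atomic_t is the
// only kind of object a handler may portably assign to.
volatile std::sig_atomic_t reload_requested = 0;

// Hypothetical handler: record the request and return immediately.
extern "C" void on_sighup(int) { reload_requested = 1; }

void install_sighup_handler() { std::signal(SIGHUP, on_sighup); }

// Polled by the main loop (for example after select() returns).
// Returns true at most once per delivered SIGHUP and clears the flag.
bool take_reload_request() {
    if (!reload_requested) return false;
    reload_requested = 0;
    return true;
}
```

If the admin port really is being rebound during a restart, something 
may be sending usher a SIGHUP at that point (for instance a `kill -HUP` 
in a restart script); that would be worth ruling out.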