Issue 154: Localized output of mtn manpage leads to corrupted characters

Reported by Thomas Keller, Mar 22, 2011

Steps to reproduce the problem:

Call `mtn manpage` or `mtn manpage | nroff -man` in a non-UTF-8, 
non-English locale.

Expected result:

Proper display of characters outside the ASCII range.

Actual results:

The individual bytes of each UTF-8 sequence, shown as separate characters.

Output of `mtn version --full`:


Note that none of these work either:

   `mtn manpage | nroff -man | less`
   `mtn manpage | nroff -Tutf8 -man | less`
   `LANG=de_DE.ISO-8859-1 mtn manpage`

but this does for some weird reason:

   `LANG=de_DE.ISO-8859-1 mtn manpage | nroff -man | less`

I think the problem is that nroff (groff actually) doesn't really 
notice that it should run in UTF-8 mode (this should be the default 
nowadays) and that it refuses to do so even if we tell it explicitly 
with -Tutf8.

Comment 1 by Richard Levitte, Mar 22, 2011

The conclusion is incorrect.  What happens is that nroff/groff does 
not recognise UTF-8 input.  For groff, the default input encoding is 
iso-8859-1, so it interprets each byte of a UTF-8 sequence as a 
separate iso-8859-1 character and then happily converts those 
characters to UTF-8 (because the locale says it should), hence the 
weird display.
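
The double conversion described above can be reproduced without groff. The following is only a sketch that uses iconv to stand in for what groff does internally; the variable names are made up for illustration:

```sh
# "ä" is the two bytes 0xC3 0xA4 in UTF-8.
utf8_a_umlaut=$(printf '\303\244')

# Misread those bytes as two iso-8859-1 characters and re-encode them
# as UTF-8, which mirrors the behaviour described in this comment.
garbled=$(printf '%s' "$utf8_a_umlaut" | iconv -f ISO-8859-1 -t UTF-8)

# Dump the result as hex: c3 83 c2 a4, which a UTF-8 terminal
# displays as the two characters "Ã¤" instead of "ä".
printf '%s' "$garbled" | od -An -tx1
```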

With groff, it's possible to have it filter the input with another 
program, preconv.  It does so if you call it with -k or -K, or if 
the environment variable GROFF_ENCODING is set to our input 
encoding.

The absolutely easiest way to deal with this is to change the 
current nroff call in std_hooks.lua to the following:

    GROFF_ENCODING=`locale charmap` nroff -man -rLL=%d | less -R
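
Outside of std_hooks.lua, the same idea can be tried from a shell. This is a rough sketch, not the code that went into monotone: the function names input_charset and render_manpage are hypothetical, the UTF-8 fallback is an assumption, and the -rLL width argument from the line above is left out for brevity.

```sh
# Hypothetical helper: print the character set of the current locale
# as reported by `locale charmap` (for example UTF-8 or ISO-8859-1).
input_charset() {
    cs=$(locale charmap 2>/dev/null)
    # Assumption: fall back to UTF-8 if the locale tool reports nothing.
    printf '%s\n' "${cs:-UTF-8}"
}

# Hypothetical helper: format man source read from stdin, telling groff
# (via GROFF_ENCODING) which encoding preconv should expect as input.
render_manpage() {
    GROFF_ENCODING=$(input_charset) nroff -man | less -R
}
```

It would be used as `mtn manpage | render_manpage`. As with the hook change, this only helps when nroff is really groff.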

I can't quite grasp why 'LANG=de_DE.ISO-8859-1 mtn manpage' wouldn't 
work for you, Thomas.  The following works perfectly for me, and 
makes sense:

    LANG=sv_SE.ISO-8859-1 mtn manpage

Anyhow, there's a choice.  Either we can modify std_hooks.lua, which 
is really the simplest, or we can modify scr/, which isn't 
hard, really, it just requires a few more lines of code (more to 
type ;-)).
Status: Accepted
Owner: levitte

Comment 2 by Richard Levitte, Mar 22, 2011

Revision 40ca0bb7367dc73f024da6ded659be69cfac1144 contains a 
solution for groff users.  That doesn't solve the issue for those 
who do not use groff, but it won't harm them either.

Created: 13 years 2 months ago by Thomas Keller

Updated: 13 years 2 months ago

Status: Accepted

Owner: Richard Levitte

Type: Incorrect Behavior
