Mined Unicode Howto

Environment setup and usage of mined for Unicode text



Environment setup:

  • Mined is a text-mode editor. Its UTF-8 support works, for example, with newer versions of xterm (145 or later recommended), with mlterm, or on the Linux console, each running in UTF-8 mode.
    • If you don't have a recent version of xterm on your system, compile one yourself; configure xterm with the option "--enable-wide-chars" or use this xterm configuration script. Then invoke "make".
  • Install Unicode fonts for your X server.
  • Invoke xterm in UTF-8 mode and configure it to use fonts sufficient to display the text you want to edit.
    • You can do this by resource configuration or command parameters. I recommend invoking xterm with my Unicode xterm invocation script uterm.
      Since mined 2000.8, UTF-8 mode is auto-detected, so mined will work even if your locale environment is not configured correctly. For hints on how to configure the environment explicitly so that other applications work with UTF-8 too, see the mined manual page (LC_CTYPE and other environment variables).
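
If you want to check whether your locale environment already announces UTF-8 to other applications, the following is a minimal sketch. It only inspects the standard POSIX variables LC_ALL, LC_CTYPE and LANG in their usual order of precedence; the function name locale_charset_is_utf8 is made up for this illustration and this is not how mined itself detects the terminal mode.

```python
import os


def locale_charset_is_utf8(env=None):
    """Return True if the POSIX locale variables select a UTF-8 charset.

    Hypothetical helper for illustration; not part of mined.
    """
    env = os.environ if env is None else env
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # an empty value counts as unset
            # Locale names look like "en_US.UTF-8" or "de_DE.utf8@euro".
            codeset = value.partition(".")[2].partition("@")[0]
            return codeset.lower().replace("-", "") == "utf8"
    return False


if __name__ == "__main__":
    print("UTF-8 locale:", locale_charset_is_utf8())
```

If this reports False, applications that rely on the locale may not use UTF-8 even though mined, thanks to its auto-detection, still does.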

  • How to use UTF-8 modes with mined:

    Screen handling
    If you have set up an environment that indicates a UTF-8 terminal (as suggested above), mined sets up its display mode accordingly. In addition, mined auto-detects UTF-8 terminal mode and various UTF-8 features of the terminal, so it adjusts automatically to whichever of the following are available: different versions of the width data, and handling of double-width, combining, and joining characters.
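
To see why width data and combining characters matter for screen handling, here is a rough sketch of how the number of screen columns of a string can be estimated. It uses the Unicode data bundled with Python, which may differ from the width data of your terminal; display_width is a hypothetical name, and the sketch ignores control characters and joining.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal columns `text` occupies (simplified)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the preceding character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth characters take two columns
        else:
            width += 1
    return width
```

With this sketch, a CJK ideograph counts as two columns, and "e" followed by U+0301 (combining acute accent) counts as one.
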
    Character encoding
    By default, mined automatically detects whether the text in an edited file is UTF-8 encoded (Unicode character set) or not (either 8-bit encoded or CJK encoded).
    It also detects UTF-16 (16-bit Unicode representation with surrogate pairs for a 21-bit character set) and transforms it automatically into UTF-8.
    UTF-8 is the internal representation of mined's Unicode editing. Mined also handles illegal UTF-8 sequences transparently, so if you accidentally open a Latin-1 file in UTF-8 mode, or a file with mixed parts, you can still edit the contents and will not lose any information. You can switch the interpretation while editing by clicking on the encoding indication in the flags area or by opening the encoding menu from the eXtra menu. The encoding indication shows L1 for Latin-1 8-bit encoding and U8 for UTF-8 encoding.
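
The idea of encoding detection and of editing illegal UTF-8 sequences without losing information can be illustrated with a small sketch. This is not mined's actual algorithm (its heuristics are not described here, and CJK detection is left out); it assumes a byte order mark for UTF-16 and uses Python's "surrogateescape" error handler to carry illegal bytes through unchanged. All function names are hypothetical.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Return 'utf-16', 'utf-8' or 'latin-1' for a byte string (heuristic)."""
    # Assumption: UTF-16 input starts with a byte order mark.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("utf-8")  # strict decoding fails on any illegal sequence
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"  # fallback: every byte string is valid Latin-1


def load_lossless(data: bytes) -> str:
    """Decode as UTF-8, keeping illegal bytes as lone surrogate code points."""
    return data.decode("utf-8", errors="surrogateescape")


def save_lossless(text: str) -> bytes:
    """Inverse of load_lossless: restores the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")
```

A file that mixes UTF-8 and Latin-1 bytes then survives a load and save cycle byte for byte, which is the property described above.
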
    Manual mode specification is also available.
    To enable detection and handling of Unicode line ends (line separator and paragraph separator), invoke mined as: mined -uu [«filenames ...»]
    Please consult the manual page for further options.
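
For readers unfamiliar with Unicode line ends: these are the characters U+2028 (line separator) and U+2029 (paragraph separator). The following sketch shows what treating them as line ends means; it is an illustration only, says nothing about how mined implements the -uu option, and split_unicode_lines is a hypothetical name.

```python
import re

# U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR, captured so that
# re.split keeps them in its result.
_UNICODE_LINE_ENDS = re.compile("([\u2028\u2029])")


def split_unicode_lines(text):
    """Split text at U+2028/U+2029, returning (line, terminator) pairs."""
    parts = _UNICODE_LINE_ENDS.split(text)
    # With one capture group, re.split alternates text and separators.
    lines = parts[0::2]
    ends = parts[1::2] + [""]  # the last piece has no terminator
    return list(zip(lines, ends))
```

Keeping the terminator next to each line makes it possible to write the text back with the same kind of line end it had.
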
    Unicode display on non-Unicode terminal
    If a UTF-8 file is edited in a Latin-1 terminal environment, characters outside of the Latin-1 range (greater than 0xFF) are displayed as a block symbol ¤ with special indications for wide and combining characters. The Euro symbol is displayed as E. Please consult the manual page for further details.
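
The substitution described above can be sketched as follows. The sketch covers only the two rules stated here (¤ for characters beyond 0xFF, E for the Euro symbol) and leaves out the special indications for wide and combining characters, which are documented in the manual page; latin1_fallback is a hypothetical name, not a mined function.

```python
def latin1_fallback(text):
    """Render `text` for a Latin-1 terminal, substituting what it cannot show."""
    out = []
    for ch in text:
        if ch == "\u20ac":
            out.append("E")       # the Euro symbol is displayed as E
        elif ord(ch) > 0xFF:
            out.append("\u00a4")  # ¤ stands in for anything beyond Latin-1
        else:
            out.append(ch)
    return "".join(out)
```

The result contains only characters up to 0xFF, so it can always be encoded as Latin-1 for output.
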
    CJK display on Unicode terminal
    As a related topic, note that mined can also handle major CJK encodings in a UTF-8 terminal; see the mined features page.


Bidirectional display:

  • Run mined in a bidirectional terminal (e.g. mlterm).
  • Invoke mined with the parameter +UU to tell it that the terminal handles bidirectional display. This is not necessary if the terminal also applies Arabic ligature joining (LAM/ALEF), as mlterm does, because mined auto-detects that.
    In this mode, scrollbar display is also suppressed, because it would interfere with the terminal's bidi algorithm.
    • I recommend invoking mlterm with my mlterm invocation script mterm.

  • Mined homepage and download.
    Thomas Wolff