Mined Unicode Howto
Environment setup and Usage of mined for Unicode text
- Screenshot with an open pop-up menu and Unicode contents.
UTF-8 encoded Unicode support and features:
- These are described on the mined features page.
Environment setup:
- Mined is a text-mode editor. Its UTF-8 support is available, for
example, with newer versions of xterm (>= 145 recommended), with
mlterm, or on the Linux console, each in UTF-8 mode.
- If you don't have a recent version of xterm on your system, compile
one yourself: configure xterm with the option "--enable-wide-chars" or
use this xterm configuration script, then invoke "make".
- Install Unicode fonts for your X server.
- Invoke xterm in UTF-8 mode and configure it to use fonts sufficient
to display the text you want to edit.
- You can do this by resource configuration or command parameters.
I recommend invoking xterm with my Unicode xterm invocation script
uterm.
Since mined 2000.8, UTF-8 mode is auto-detected, so mined works
even if your locale environment is not configured correctly.
For hints on how to configure the environment explicitly so that other
applications work with UTF-8 too, see the mined manual page
(LC_CTYPE and other environment variables).
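As a rough illustration of what "a locale environment that indicates UTF-8" means for other applications, the following Python sketch checks the usual POSIX locale variables in their order of precedence (LC_ALL, then LC_CTYPE, then LANG). The function name locale_indicates_utf8 is made up for this example. This is not how mined itself detects UTF-8 mode; mined's auto-detection works independently of these variables.

```python
import os

def locale_indicates_utf8(env=None):
    """Return True if the effective LC_CTYPE locale name mentions UTF-8.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    (Hypothetical helper for illustration; not part of mined.)
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:  # the first non-empty variable decides
            # Accept both spellings: "UTF-8" and "utf8"
            return "UTF8" in value.upper().replace("-", "")
    return False

if __name__ == "__main__":
    print("Locale indicates UTF-8:", locale_indicates_utf8())
```

If this reports False in a terminal that actually runs in UTF-8 mode, setting LC_CTYPE to a UTF-8 locale available on your system is usually what other applications need.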
How to use UTF-8 modes with mined:
- Screen handling
- If you have arranged (as suggested above) an environment setting
that indicates a UTF-8 terminal, mined sets up its display mode
accordingly.
Mined also auto-detects UTF-8 terminal mode and various UTF-8
features, so it automatically adjusts to the availability of any
of the following: different width data versions, handling of
double-width characters, and combining and joining characters.
- Character encoding
- By default, mined automatically detects whether the text in an edited
file is UTF-8 encoded (Unicode character set) or not (either
8-bit encoded or CJK encoded).
It also detects UTF-16 (16-bit Unicode representation with surrogate
pairs for a 21-bit character set) and transforms it automatically
into UTF-8.
UTF-8 is the internal representation of mined's Unicode editing.
Mined also handles illegal UTF-8 sequences transparently, so
if you accidentally open a Latin-1 file in UTF-8 mode, or a file
with mixed parts, you can still edit the contents and will not
lose any information. You can switch the interpretation while
editing by clicking on the encoding indication in the flags area
or by opening the encoding menu from the eXtra menu.
The encoding indication shows L1 for Latin-1 8-bit encoding and U8
for UTF-8 encoding.
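To make the idea of detection plus lossless handling concrete, here is a small Python sketch. It is only an analogy and does not reproduce mined's internal algorithm: the UTF-16 check looks for a byte order mark only, CJK encodings are not modelled, and the function names are invented for this example. Illegal bytes are preserved using Python's "surrogateescape" error handler, which is one way to achieve the "no information lost" property described above.

```python
import codecs

def guess_encoding(data):
    """Very rough guess: 'utf-16', 'utf-8' or 'latin-1' (hypothetical helper)."""
    # Assumption: UTF-16 is recognised by its byte order mark only.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("utf-8")      # strict decoding fails on illegal sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"          # stands in for "8-bit encoded" here

def utf16_to_utf8(data):
    """Transform UTF-16 input (with BOM) into UTF-8 bytes."""
    return data.decode("utf-16").encode("utf-8")

def load_lossless(data):
    """Decode as UTF-8, keeping each illegal byte as an escape code point."""
    return data.decode("utf-8", errors="surrogateescape")

def save_lossless(text):
    """Inverse of load_lossless: escaped bytes are written back unchanged."""
    return text.encode("utf-8", errors="surrogateescape")

if __name__ == "__main__":
    # A file with mixed parts: UTF-8 "é" followed by Latin-1 "é"
    mixed = "\u00e9".encode("utf-8") + "\u00e9".encode("latin-1")
    text = load_lossless(mixed)
    print(guess_encoding(mixed), save_lossless(text) == mixed)
```

The round trip through load_lossless and save_lossless returns the original bytes even for the mixed input, which is the behaviour that matters when a Latin-1 file is opened in UTF-8 mode by accident.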
- Manual mode specification is also available
- To enable detection and handling of Unicode line ends
(line separator U+2028 and paragraph separator U+2029), invoke mined with
mined -uu [«filenames ...»]
Please consult the manual page for further options.
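The effect of treating the Unicode separators as line ends can be sketched in Python as follows. The function split_lines and its unicode_line_ends parameter are invented for this example (the parameter plays the role of the -uu option); the sketch only shows the difference in line splitting, not mined's actual implementation.

```python
import re

LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR

def split_lines(text, unicode_line_ends=False):
    """Split text into lines.

    With unicode_line_ends=False only "\n" ends a line; with True,
    U+2028 and U+2029 are treated as line ends as well.
    (Hypothetical helper for illustration.)
    """
    pattern = "\n"
    if unicode_line_ends:
        pattern = "\n|" + LS + "|" + PS
    return re.split(pattern, text)

if __name__ == "__main__":
    sample = "one" + LS + "two" + PS + "three\nfour"
    print(len(split_lines(sample)))                          # lines without the option
    print(len(split_lines(sample, unicode_line_ends=True)))  # lines with the option
```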
- Unicode display on non-Unicode terminal
- If a UTF-8 file is edited in a Latin-1 terminal environment,
characters outside of the Latin-1 range (greater than 0xFF)
are displayed as a block symbol ¤, with special indications for
wide and combining characters.
The Euro symbol is displayed as E.
Please consult the manual page for further details.
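The substitution rule just described can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function name latin1_display is invented, the placeholder is taken to be the ¤ character (U+00A4) as printed above, and the special indications for wide and combining characters are not modelled at all.

```python
def latin1_display(text):
    """Map text to what a Latin-1 terminal could show (simplified sketch)."""
    out = []
    for ch in text:
        if ch == "\u20ac":         # EURO SIGN is shown as "E"
            out.append("E")
        elif ord(ch) > 0xFF:       # outside the Latin-1 range
            out.append("\u00a4")   # placeholder symbol
        else:
            out.append(ch)         # Latin-1 characters are shown as they are
    return "".join(out)

if __name__ == "__main__":
    shown = latin1_display("caf\u00e9 5\u20ac \u4e2d")
    # The result is always encodable for a Latin-1 terminal
    print(shown.encode("latin-1"))
```

Only the display is affected by such a mapping; the edited text itself keeps its original characters.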
- CJK display on Unicode terminal
- As a related topic, note that mined can also handle
major CJK encodings in a UTF-8 terminal;
see the mined features page.
Handling combined characters:
You may enter combining characters in the text or on the prompt
line (for search expressions or file names). Unless you have any
assigned to your keyboard (which could be configured with xmodmap),
you may use coded or mnemonic input support.
For editing combined characters there are two modes, indicated
by a flag next to the encoding indication flag in the flags area
(right part of the top screen line):
- ē indicates that the combined display
mode is active
- “ indicates that the separated display
mode is active
Clicking on this flag toggles between the modes; alternatively,
toggle "combined display" in the eXtra menu.
- Combined editing mode (flag ē)
- Combined characters are displayed as intended (i.e., combined).
The cursor can be moved into a combined character with
ctrl-left-arrow or ctrl-right-arrow, provided these cursor keys are
configured to emit distinct escape sequences when the Control key is
held. ^V-left-arrow and ^V-right-arrow also work. You can determine
the exact position of the cursor if permanent character info is
switched on (by HOP ESC u or with HOP "toggle char info" in the eXtra menu).
- Partially editing combined characters:
- If the cursor is on a combined character, delete next character
will delete the whole combined character, with all combining accents.
- If the cursor is within a combined character, delete next
character will delete the current combining accent only.
- You can also position the cursor as described above and use
copy-and-paste operations.
- Separated editing mode (flag “)
- Combined characters are separated into base character and
combining character(s) for display and editing.
Edit the separated characters as usual.
- In separated display mode, all cursor and text modification
operations work on the combining parts as displayed.
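The difference between deleting a whole combined character and deleting a single combining accent can be modelled with a short Python sketch. It is an approximation, not mined's code: combining characters are identified by the Unicode general categories Mn and Me (mined's own classification may differ), and the function names and the (group, offset) cursor model are invented for this example.

```python
import unicodedata

def combined_characters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.category(ch) in ("Mn", "Me"):
            groups[-1] += ch       # attach the accent to the preceding base
        else:
            groups.append(ch)      # start a new (possibly combined) character
    return groups

def delete_next(text, group_index, offset=0):
    """Model of "delete next character" in combined display mode.

    offset == 0: cursor on the combined character, delete all of it.
    offset == k: cursor within it, on the k-th combining accent,
                 delete only that accent.
    """
    groups = combined_characters(text)
    group = groups[group_index]
    if offset == 0:
        groups[group_index] = ""
    else:
        groups[group_index] = group[:offset] + group[offset + 1:]
    return "".join(groups)

if __name__ == "__main__":
    text = "e\u0304\u0301x"        # e + macron + acute accent, then x
    print(combined_characters(text))
    print(list(text))              # separated view: one entry per code point
```

In this model the separated display mode corresponds to working on list(text), where every base character and every accent is an individual item.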
Bidirectional display:
- Run mined in a bidirectional terminal (e.g. mlterm).
- Invoke mined with the parameter +UU to tell it that the terminal
handles bidirectional display; this is not necessary if the
terminal also applies Arabic ligature joining (LAM/ALEF), which
is auto-detected by mined (e.g. mlterm).
In this mode, scrollbar display is also suppressed (it would
interfere with the terminal's bidi algorithm).
- I recommend invoking mlterm with my mlterm invocation script
mterm.
Mined homepage and download.
Thomas Wolff