Unicode Features
This section explains how to use Epsilon to edit text containing
non-English characters such as ê or å.
Epsilon supports Unicode, as well as many 8-bit national character
sets such as ISO 8859-1 (Latin 1).
Under Unix, full Unicode support is available only when Epsilon runs
under X11 and a font using the iso10646 character set is in use.
See http://www.lugaru.com/links.html#unicode for Unicode font
sources. To select a Unicode font, first select iso10646-1 in the
list of character sets on the Filter pane of the font selection
dialog.
Under Windows, full Unicode support is only available under Windows NT
and later versions. For Unicode support in the Win32 Console version,
see the console-ansi-font variable. Also see DOS/OEM Character Set Support
for more information on the DOS/OEM encoding used by default in the
Win32 console version.
In this release, Epsilon doesn't display Unicode characters outside the
basic multilingual plane (BMP), or include any of the special
processing needed to handle complex scripts, such as scripts written
right-to-left.
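As an illustration of the BMP limit, the following Python sketch lists
the characters in a string whose code points lie above U+FFFF, the top
of the basic multilingual plane. The function name outside_bmp is
hypothetical and is not part of Epsilon; the sketch only shows the
general rule, not how Epsilon itself checks characters.

```python
def outside_bmp(text: str) -> list:
    """Return the characters of text that lie outside the BMP.

    Hypothetical helper for illustration only. The BMP covers code
    points U+0000 through U+FFFF; anything above that needs two
    16-bit units (a surrogate pair) in UTF-16.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


# The characters ê and å are in the BMP; U+1F600 (an emoji) is not.
in_bmp = outside_bmp("ê å")
beyond = outside_bmp("a\U0001F600")
```

A file whose text yields an empty list here contains only BMP
characters.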
Epsilon knows how to translate between its native Unicode format and
dozens of encodings and character sets (such as UTF-8, ISO-8859-4, or
KOI-8).
Epsilon autodetects the encoding for files that start with a Unicode
signature ("byte order mark"), and for many files that use the UTF-8
encoding. To force translation from a particular encoding, provide a
numeric argument to a file reading command like
find-file. Epsilon will then prompt for the name of the
encoding to use. Press "?" when prompted for an encoding to see a
list of available encodings. The special encoding "raw" reads and
writes 8-bit data without any character set translation.
Epsilon uses the buffer's current encoding when writing or rereading a
file. Use the set-encoding command to set the buffer's encoding.
The unicode-convert-from-encoding command makes Epsilon translate an
8-bit buffer in a certain encoding to its 16-bit Unicode version. The
unicode-convert-to-encoding command does the reverse.
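To show what such a pair of conversions does to the data, here is a
small Python sketch using Python's own codecs rather than Epsilon's
converters. The helper names from_encoding and to_encoding are
hypothetical; "koi8_r" is Python's name for the KOI8-R encoding, which
may not match the name Epsilon uses.

```python
def from_encoding(raw: bytes, encoding: str) -> str:
    """Translate 8-bit data in the given encoding to Unicode text."""
    return raw.decode(encoding)


def to_encoding(text: str, encoding: str) -> bytes:
    """Translate Unicode text back to 8-bit data in the given encoding."""
    return text.encode(encoding)


# Round trip: six Cyrillic letters become six KOI8-R bytes and back.
text = "\u043f\u0440\u0438\u0432\u0435\u0442"
raw = to_encoding(text, "koi8_r")
restored = from_encoding(raw, "koi8_r")

# Decoding the same bytes with the wrong encoding gives different text,
# which is why the buffer's encoding has to match the data.
misread = from_encoding(raw, "latin-1")
```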
You can add a large set of additional converters to Epsilon by
downloading a file. Mostly these converters add support for various
Far East languages and for EBCDIC conversions. See
http://www.lugaru.com/encodings.html for details.
Internally, buffers with no character codes outside the range 0-255
are stored with 8 bits per character; other buffers are stored with 16
bits per character. Epsilon automatically converts formats as needed.
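The storage rule can be stated compactly in code. The following Python
sketch is only an illustration of the rule as described above; the
function name bits_per_character is hypothetical, and Epsilon's
internal representation is not exposed this way.

```python
def bits_per_character(text: str) -> int:
    """Return the storage width the rule above implies for text.

    Hypothetical helper: 8 bits when every character code is in the
    range 0-255, otherwise 16 bits.
    """
    return 8 if all(ord(ch) <= 0xFF for ch in text) else 16


# ê (U+00EA) and å (U+00E5) fit in 8 bits; Greek omega (U+03A9) does not.
latin_width = bits_per_character("\u00ea \u00e5")
greek_width = bits_per_character("\u03a9")
```

Under this rule, inserting a single character above 255 into an 8-bit
buffer is enough to require the 16-bit format.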
The detect-encodings variable controls whether Epsilon tries to
autodetect certain UTF-8 and UTF-16 files. The
default-read-encoding variable says which encoding to use when
autodetecting doesn't select an encoding. The
default-write-encoding variable sets which encoding Epsilon
uses to save a file with 16-bit characters and no specified encoding,
in a context where prompting wouldn't be appropriate, such as when
auto-saving.
To type arbitrary Unicode characters, see the insert-ascii command in
Inserting and Deleting. To see which specific characters are present
(if the current font doesn't make that clear enough), use the
show-point command.
Copyright (C) 1984, 2020 by Lugaru Software Ltd. All rights reserved.