Unicode Features

This section explains how to use Epsilon to edit text containing non-English characters such as ê or å.

Epsilon supports Unicode, as well as many 8-bit national character sets such as ISO 8859-1 (Latin 1).

Under Unix, full Unicode support is available only when Epsilon runs under X11 with a font that uses the iso10646 character set. See http://www.lugaru.com/links.html#unicode for Unicode font sources. To select a Unicode font, first select iso10646-1 in the list of character sets on the Filter pane of the font selection dialog.

Under Windows, full Unicode support is only available under Windows NT and later versions. For Unicode support in the Win32 Console version, see the console-ansi-font variable. Also see DOS/OEM Character Set Support for more information on the DOS/OEM encoding used by default in the Win32 console version.

In this release, Epsilon doesn't display Unicode characters outside the basic multilingual plane (BMP), or include any of the special processing needed to handle complex scripts, such as scripts written right-to-left.
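To make the BMP limit concrete, here is a minimal Python sketch (not Epsilon code) that tests whether a character lies inside the basic multilingual plane, which covers code points U+0000 through U+FFFF. The function name is_in_bmp is hypothetical and used only for illustration.

```python
def is_in_bmp(ch: str) -> bool:
    """Return True if the single character ch is in the basic multilingual plane."""
    # The BMP is the first Unicode plane: code points 0x0000 through 0xFFFF.
    return ord(ch) <= 0xFFFF


# U+00EA (e with circumflex) and U+0416 (Cyrillic Zhe) are BMP characters.
# U+1F600 (an emoji) lies outside the BMP.
samples = ["\u00ea", "\u0416", "\U0001F600"]
results = [is_in_bmp(ch) for ch in samples]
```

Characters for which this check fails are the ones this release does not display.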

Epsilon knows how to translate between its native Unicode format and dozens of encodings and character sets (such as UTF-8, ISO-8859-4, or KOI-8).
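As a rough illustration of what such a translation involves, the following Python sketch converts bytes from one encoding to another by passing through Unicode. It uses Python's built-in codecs as a stand-in for Epsilon's converters; the transcode function is hypothetical and is not part of Epsilon.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from the source encoding to the target encoding.

    The bytes are first decoded to Unicode text, then encoded again,
    so Unicode acts as the common intermediate format.
    """
    text = data.decode(source)   # 8-bit bytes -> Unicode text
    return text.encode(target)   # Unicode text -> bytes in the new encoding


# A Russian word stored as KOI8-R uses one byte per character.
koi8_bytes = "Привет".encode("koi8-r")

# The same word in UTF-8 needs two bytes per Cyrillic character.
utf8_bytes = transcode(koi8_bytes, "koi8-r", "utf-8")
```

A character that the target encoding cannot represent makes the encode step raise UnicodeEncodeError in this sketch; how Epsilon handles that case is not covered here.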

Epsilon autodetects the encoding for files that start with a Unicode signature ("byte order mark"), and for many files that use the UTF-8 encoding. To force translation from a particular encoding, provide a numeric argument to a file reading command like find-file. Epsilon will then prompt for the name of the encoding to use. Press "?" when prompted for an encoding to see a list of available encodings. The special encoding "raw" reads and writes 8-bit data without any character set translation.
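The general idea behind this kind of autodetection can be sketched in Python as follows. This is an assumption-laden illustration, not Epsilon's actual algorithm: it checks for a UTF-8 or UTF-16 signature, then tests whether the data is valid UTF-8, and otherwise falls back to a caller-supplied default. The function name detect_encoding and its default parameter are hypothetical.

```python
import codecs


def detect_encoding(data: bytes, default: str = "latin-1") -> str:
    """Guess a Python codec name for data (illustrative heuristic only)."""
    # A signature ("byte order mark") at the start identifies the encoding.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"       # this codec strips the signature on decode
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"          # this codec reads the BOM to pick a byte order
    # No signature: accept UTF-8 only if the whole input decodes cleanly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return default           # assumed fallback, chosen by the caller


with_signature = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
encoding = detect_encoding(with_signature)
text = with_signature.decode(encoding)
```

Note that plain ASCII data is also valid UTF-8, so this sketch reports it as "utf-8".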

Epsilon uses the buffer's current encoding when writing or rereading a file. Use the set-encoding command to set the buffer's encoding.

The unicode-convert-from-encoding command makes Epsilon translate an 8-bit buffer in a certain encoding to its 16-bit Unicode version. The unicode-convert-to-encoding command does the reverse.

You can add a large set of additional converters to Epsilon by downloading a file. These converters mostly add support for various Far Eastern languages and for EBCDIC conversions. See http://www.lugaru.com/encodings.html for details.

Internally, buffers with no character codes outside the range 0-255 are stored with 8 bits per character; other buffers are stored with 16 bits per character. Epsilon automatically converts formats as needed.
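The rule for choosing a storage width can be sketched in Python as shown below. This only models the rule stated above and says nothing about how Epsilon actually lays out its buffers; the names bits_per_char and pack are hypothetical.

```python
def bits_per_char(text: str) -> int:
    """Return 8 if every character code is in the range 0-255, otherwise 16."""
    return 8 if all(ord(ch) <= 0xFF for ch in text) else 16


def pack(text: str) -> bytes:
    """Store text using the narrowest of the two widths (BMP text assumed)."""
    if bits_per_char(text) == 8:
        # Latin-1 maps code points 0-255 directly to single bytes.
        return text.encode("latin-1")
    # Two bytes per character for BMP text; characters outside the BMP
    # would need surrogate pairs, which this sketch does not address.
    return text.encode("utf-16-le")


narrow = pack("\u00eatre")   # all codes at or below 255: one byte each
wide = pack("Привет")        # Cyrillic codes exceed 255: two bytes each
```

Inserting a single character above 255 into narrow text changes the result of bits_per_char from 8 to 16, which corresponds to the automatic format conversion described above.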

The detect-encodings variable controls whether Epsilon tries to autodetect certain UTF-8 and UTF-16 files. The default-read-encoding variable says which encoding to use when autodetection doesn't select one. The default-write-encoding variable sets the encoding Epsilon uses to save a file that contains 16-bit characters and has no specified encoding, in contexts where prompting wouldn't be appropriate, such as auto-saving.

See the insert-ascii command in Inserting and Deleting to type arbitrary Unicode characters, and the show-point command to see what specific characters are present (if the current font doesn't make that clear enough).
