Unicode Features

This section explains how to use Epsilon to edit text containing non-English characters such as ê or å.

Epsilon supports Unicode, as well as many 8-bit national character sets such as ISO 8859-1 (Latin 1).

In Unix, Unicode support is only available when Epsilon runs under X11, and when a font using the iso10646 (Unicode) character set is in use. See https://www.lugaru.com/links.html#unicode for Unicode font sources. Epsilon includes a shell script named get_unicode_core_x11_fonts that can install Unicode-based fonts in various sizes. By default it's in /opt/epsilon14.04/bin on Linux and FreeBSD, and /Applications/Epsilon 14.04.app/Contents/Resources on macOS. Under Unix, Epsilon displays all characters using a glyph width determined by the widest character in the font.

Epsilon for Windows shows characters in a font using their specified individual widths, but only in a full-width window. If you instead create side-by-side windows, Epsilon will ignore the special width rules of zero-width characters and extra-wide characters, among other things, putting each character into a same-width cell. Text with such characters should be edited in full-width windows for the best display. (See the change-show-spaces command to make zero-width characters visible.)

To enable Unicode display in Epsilon for Windows Console, see the console-ansi-font variable. Also see DOS/OEM Character Set Support for more information on the DOS/OEM encoding used by default in the Windows Console version.

In this release, Epsilon doesn't display Unicode characters outside the basic multilingual plane (BMP), or include any of the special processing needed to handle complex scripts, such as scripts written right-to-left. Each character outside the BMP is handled as a pair of surrogate characters. While Epsilon cannot display their glyphs, the show-point command will report the Unicode character name of the one at point.
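
As a general illustration of what "a pair of surrogate characters" means, the following Python sketch applies the standard UTF-16 rule for splitting a code point above U+FFFF into two 16-bit units. The function name to_surrogate_pair is hypothetical and this is not Epsilon's own code; it only shows the arithmetic behind the pairs you will see in such a buffer.

```python
import unicodedata


def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point outside the BMP into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))                 # 0xd83d 0xde00
print(unicodedata.name(chr(0x1F600)))      # GRINNING FACE
```

The character name printed on the last line is the kind of information show-point reports for the character at point.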

Epsilon knows how to translate between its native Unicode format and dozens of encodings and character sets (such as UTF-8, ISO-8859-4, or KOI-8).

Epsilon autodetects the encoding for files that start with a Unicode signature ("byte order mark"), and for many files that use the UTF-8 encoding. To force translation from a particular encoding, provide a numeric argument to a file reading command like find-file. Epsilon will then prompt for the name of the encoding to use. Press "?" when prompted for an encoding to see a list of available encodings. The special encoding "raw" reads and writes 8-bit data without any character set translation.
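
To show what a Unicode signature looks like at the byte level, here is a minimal Python sketch that checks the start of a file's contents for the UTF-8 and UTF-16 byte order marks. The function name sniff_signature and the encoding labels it returns are assumptions made for this example; Epsilon's actual detection logic and encoding names may differ.

```python
import codecs

# Byte order marks and the encoding each one implies.
_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_signature(data: bytes):
    """Return (encoding, signature_length), or (None, 0) if no signature."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


encoding, skip = sniff_signature(b"\xef\xbb\xbfhello")
print(encoding, skip)  # utf-8 3
```

A file with no signature returns (None, 0), which is the case where a forced encoding or a default has to take over.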

Epsilon uses the buffer's current encoding when writing or rereading a file. Use the set-encoding command to set the buffer's encoding.

The unicode-convert-from-encoding command makes Epsilon translate an 8-bit buffer in a certain encoding to its 16-bit Unicode version. The unicode-convert-to-encoding command does the reverse.
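
The effect of these two conversions can be pictured with Python's built-in codecs. This is only an analogy under the assumption that the buffer holds ISO 8859-1 bytes; the codec names are Python's, not Epsilon's.

```python
# Two bytes as they would appear in an 8-bit ISO 8859-1 buffer.
raw = bytes([0xEA, 0xE5])

# "From encoding": interpret the bytes and obtain Unicode characters.
text = raw.decode("iso-8859-1")
print(text)                        # êå

# "To encoding": turn the Unicode characters back into 8-bit bytes.
back = text.encode("iso-8859-1")
print(back == raw)                 # True

# The same bytes mean different characters under a different encoding,
# which is why the conversion needs to know which encoding applies.
print(raw.decode("koi8_r") == text)  # False
```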

You can add a large set of additional converters to Epsilon by downloading a file. Mostly these converters add support for various Far East languages and for EBCDIC conversions. See https://www.lugaru.com/encodings.html for details.

Internally, buffers with no character codes outside the range 0-255 are stored with 8 bits per character; other buffers are stored with 16 bits per character. Epsilon automatically converts formats as needed.
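
The storage rule above can be sketched in a few lines of Python. The helper names bits_per_character and pack are hypothetical, and the byte layouts chosen here (Latin-1 and little-endian UTF-16) are assumptions for illustration, not a description of Epsilon's internal format.

```python
def bits_per_character(text: str) -> int:
    """Return 8 if every character code is in 0-255, otherwise 16."""
    return 8 if all(ord(ch) <= 0xFF for ch in text) else 16


def pack(text: str) -> bytes:
    """Store text with one byte per character when possible, else two."""
    if bits_per_character(text) == 8:
        return text.encode("latin-1")    # codes 0-255 map to single bytes
    return text.encode("utf-16-le")      # 16-bit units, no signature


print(bits_per_character("caf\u00e9"), len(pack("caf\u00e9")))    # 8 4
print(bits_per_character("\u03a9mega"), len(pack("\u03a9mega")))  # 16 10
```

Inserting a single character above 255 into the first string would move it from the narrow form to the wide one, which corresponds to the automatic conversion described above.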

The detect-encodings variable controls whether Epsilon tries to autodetect certain UTF-8 and UTF-16 files. The default-read-encoding variable says which encoding to use when autodetecting doesn't select an encoding. The default-write-encoding variable sets which encoding Epsilon uses to save a file with 16-bit characters and no specified encoding, in a context where prompting wouldn't be appropriate, such as when auto-saving.
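
The way detection and the read default interact can be sketched as a fallback chain. In the Python example below, the parameters detect and default are hypothetical stand-ins for detect-encodings and default-read-encoding; their types, the "latin-1" default, and the validity test used for UTF-8 are all assumptions, and signature checks are omitted for brevity.

```python
def choose_read_encoding(data: bytes, detect: bool = True,
                         default: str = "latin-1") -> str:
    """Pick an encoding: try detection first, then fall back to the default."""
    if detect:
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            pass                      # not valid UTF-8, so fall through
        else:
            # Pure ASCII decodes the same either way; only claim UTF-8
            # when at least one multi-byte sequence is present.
            if any(byte >= 0x80 for byte in data):
                return "utf-8"
    return default


print(choose_read_encoding("\u00e9".encode("utf-8")))                # utf-8
print(choose_read_encoding(b"\xe9"))                                 # latin-1
print(choose_read_encoding("\u00e9".encode("utf-8"), detect=False))  # latin-1
```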

See the insert-ascii command in Inserting and Deleting to type arbitrary Unicode characters, and the show-point command to see what specific characters are present (if the current font doesn't make that clear enough).
