Lugaru's Epsilon Programmer's Editor 14.04





Character Encoding Conversions

char *encoding_to_name(int enc)
int encoding_from_name(char *name)

When Epsilon reads or writes a file, it converts text between the Unicode character representation it uses internally and one of various file encodings. Epsilon represents each possible encoding with a number.

These numbers may change from one version of Epsilon to the next, so any encoding setting that must be saved should be recorded by name, not by number. A few encodings have permanent codes: the encoding "auto-detect" is always numbered 0, and the encoding "raw" is always numbered 1.

The encoding_from_name( ) primitive returns the number of an encoding given its name. It returns -1 if the encoding name is unknown.

The encoding_to_name( ) primitive returns the name of an encoding given its number. It returns a NULL pointer if the encoding number is unknown. Many encodings have more than one name, but this primitive treats each name as a separate encoding, even if it's an alias of another encoding.
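The name/number contract can be illustrated with a small standalone C model. This is not EEL and not Epsilon's actual table: only the codes for "auto-detect" (0) and "raw" (1) come from the text above, while the other entries, the case-sensitive comparison, and the model_ function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Epsilon's encoding table. Only the first two
   positions are documented as fixed; the rest are illustrative and their
   numbers could change between versions. */
static const char *const encoding_names[] = {
    "auto-detect",  /* always 0 */
    "raw",          /* always 1 */
    "utf-8",        /* hypothetical entry */
    "utf-8-alias",  /* hypothetical alias: it still gets its own number */
};
#define NUM_ENCODINGS (int)(sizeof encoding_names / sizeof encoding_names[0])

/* Model of encoding_from_name(): returns -1 for an unknown name.
   ASSUMPTION: exact, case-sensitive matching. */
int model_encoding_from_name(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < NUM_ENCODINGS; i++)
        if (strcmp(encoding_names[i], name) == 0)
            return i;
    return -1;
}

/* Model of encoding_to_name(): returns NULL for an unknown number. */
const char *model_encoding_to_name(int enc)
{
    if (enc < 0 || enc >= NUM_ENCODINGS)
        return NULL;
    return encoding_names[enc];
}
```

A setting saved as the string returned by the to-name function survives a renumbering, because the number is looked up again from the name each time it is loaded.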

int file_convert_write(char *file, int trans,
                       struct file_info *f_info)
int save_remote_file(char *fname, int trans,
                     struct file_info *finfo)
buffer char *(*file_io_converter)();
char *oem_file_converter(int func)
zeroed char *(*new_file_io_converter)();
zeroed buffer char file_write_newfile;

The do_save_file( ) subroutine uses the file_convert_write( ) subroutine to write the file. Like new_file_write( ), file_convert_write( ) takes a file name, a line translation code as described under translation-type, and a structure that Epsilon fills with information on the file's write date, file type, and so forth. See do_save_file( ) above for details.

Unlike primitives such as new_file_write( ), the file_convert_write( ) subroutine knows how to handle URL files by calling the save_remote_file( ) subroutine.
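The dispatch can be pictured with a minimal standalone C sketch. It assumes, purely for illustration, that a name containing "://" counts as a URL; Epsilon's real test is not described here, and both writer functions are hypothetical stand-ins for save_remote_file( ) and new_file_write( ).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes used only by this sketch. */
enum { WROTE_LOCAL = 1, WROTE_REMOTE = 2 };

/* ASSUMPTION: a name is treated as a URL if it contains "://". */
static int looks_like_url(const char *fname)
{
    return strstr(fname, "://") != NULL;
}

/* Stand-ins for save_remote_file() and new_file_write(). */
static int sketch_save_remote(const char *fname) { (void)fname; return WROTE_REMOTE; }
static int sketch_write_local(const char *fname) { (void)fname; return WROTE_LOCAL; }

/* Route URL names to the remote writer, everything else to the local one. */
int sketch_convert_write(const char *fname)
{
    assert(fname != NULL);
    if (looks_like_url(fname))
        return sketch_save_remote(fname);
    return sketch_write_local(fname);
}
```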

In addition to the built-in conversion codes described above, Epsilon also supports user-defined EEL conversion routines. These are currently used only for DOS/OEM files read using the find-oem-file command and similar commands. The file_convert_write( ) subroutine handles writing these. It looks for a buffer-specific variable file_io_converter. This variable can be null, meaning no special translation, or it can contain a function pointer. For OEM files, for example, it points to the subroutine oem_file_converter( ).
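A buffer-specific function pointer that may be null maps naturally onto a struct field in C. The standalone sketch below models that idea; the struct, the converter_mode_name( ) helper, and the toy OEM converter are hypothetical, and only the " OEM" mode name and the null-means-no-translation rule come from the manual.

```c
#include <assert.h>
#include <stddef.h>

/* Modeled on "char *(*file_io_converter)()": the int carries the action code. */
typedef char *(*io_converter)(int func);

/* Hypothetical buffer record holding the buffer-specific variable. */
struct sketch_buffer {
    io_converter file_io_converter;  /* NULL: no special translation */
};

/* Toy converter: always reports its minor-mode name. */
static char *sketch_oem_converter(int func)
{
    static char mode_name[] = " OEM";
    (void)func;
    return mode_name;
}

/* Returns the mode-line text for a buffer, or "" when no converter is set. */
const char *converter_mode_name(const struct sketch_buffer *b, int func)
{
    assert(b != NULL);
    if (b->file_io_converter == NULL)
        return "";
    return b->file_io_converter(func);
}
```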

Any such subroutine will be called with a code indicating the desired action. The codes are defined in eel.h. The code FILE_CONVERT_READ tells the subroutine to translate the text in the current buffer as appropriate when reading a file. The code FILE_CONVERT_WRITE tells the subroutine to translate the buffer as appropriate when writing a file.

Before actually performing a conversion, Epsilon will call the subroutine to ask if the conversion is safe (reversible), by passing the FILE_CONVERT_ASK flag in addition to one of the above codes. A conversion is reversible, and therefore safe, if the conversion followed by the opposite conversion (for instance, ANSI => OEM => ANSI) yields the original text. If the conversion isn't safe, the subroutine should ask the user for permission to proceed.
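The reversibility test can be expressed as a round trip over every character. The C sketch below uses a deliberately simplified toy code-page pair, not the real ANSI/OEM tables: bytes below 128 survive, and every higher byte has no equivalent and becomes '?'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy mapping for illustration only; NOT the real ANSI/OEM tables. */
static unsigned char toy_to_oem(unsigned char c)   { return c < 128 ? c : '?'; }
static unsigned char toy_from_oem(unsigned char c) { return c; }

/* Safe (reversible) means converting and converting back reproduces
   every byte of the original text. */
bool conversion_is_reversible(const unsigned char *text, size_t len)
{
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (toy_from_oem(toy_to_oem(text[i])) != text[i])
            return false;
    return true;
}
```

Plain ASCII passes, while text containing a byte such as 0xE9 fails, which is the case where a converter should ask the user before proceeding.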

The converter should then return a null pointer to cancel the read or write operation, or any other value to let it proceed. If the caller also adds the FILE_CONVERT_QUIET flag, the converter won't ask the user for confirmation; it merely returns a value indicating whether the conversion would be safe.

Whenever the FILE_CONVERT_ASK flag isn't present, the subroutine should return the name of its minor mode, which Epsilon displays in the mode line. The OEM converter returns " OEM".
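Putting the protocol together, a converter branches first on the ASK flag, then on the direction. The standalone C sketch below shows that shape. The flag values are hypothetical (the real ones live in eel.h), the two globals stand in for the buffer's safety status and the user's answer to a prompt, and the actual text translation is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL values; real code uses the definitions in eel.h. */
#define FILE_CONVERT_READ  0x1
#define FILE_CONVERT_WRITE 0x2
#define FILE_CONVERT_ASK   0x4
#define FILE_CONVERT_QUIET 0x8

/* Simulated state for this sketch only. */
static bool buffer_is_safe;   /* would the round trip preserve the text? */
static bool user_says_yes;    /* simulated answer to a confirmation prompt */

static char mode_name[] = " SKETCH";
static char ok_token[] = "ok"; /* any non-null value lets the operation proceed */

char *sketch_converter(int func)
{
    assert(func & (FILE_CONVERT_READ | FILE_CONVERT_WRITE));
    if (func & FILE_CONVERT_ASK) {
        if (buffer_is_safe)
            return ok_token;
        if (func & FILE_CONVERT_QUIET)
            return NULL;              /* report "unsafe" without prompting */
        return user_says_yes ? ok_token : NULL;
    }
    if (func & FILE_CONVERT_READ) {
        /* translate the buffer after reading (omitted in this sketch) */
    } else {
        /* translate the buffer before writing (omitted in this sketch) */
    }
    return mode_name;                 /* text for the mode line */
}
```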

When creating a new buffer, file-reading subroutines initialize the file_io_converter variable by copying the value of new_file_io_converter. Commands like find-oem-file temporarily set new_file_io_converter so that the file they read gets OEM translation.
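The "copy a default, set it temporarily, restore it" pattern looks like this in a standalone C model. The struct, init_new_buffer( ), read_with_oem( ), and the toy converter are hypothetical; only the copying of new_file_io_converter into each new buffer comes from the text.

```c
#include <assert.h>
#include <stddef.h>

typedef char *(*io_converter)(int func);

/* Global default, modeled on the zeroed new_file_io_converter (NULL). */
static io_converter new_file_io_converter;

struct sketch_buffer { io_converter file_io_converter; };

/* What a file-reading subroutine does for a freshly created buffer. */
void init_new_buffer(struct sketch_buffer *b)
{
    assert(b != NULL);
    b->file_io_converter = new_file_io_converter;
}

static char oem_name[] = " OEM";
static char *toy_oem_converter(int func) { (void)func; return oem_name; }

/* A find-oem-file style command: set the default, create the buffer,
   then restore the previous default so later buffers are unaffected. */
void read_with_oem(struct sketch_buffer *b)
{
    io_converter saved = new_file_io_converter;
    new_file_io_converter = toy_oem_converter;
    init_new_buffer(b);   /* the real command would also read the file here */
    new_file_io_converter = saved;
}
```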

The file_convert_write( ) subroutine performs one more function. It checks the variable file_write_newfile. If this variable is nonzero, it passes the FILE_IO_NEWFILE code to new_file_write( ), so the attempt to write the file fails with an error code if the file already exists.
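Standard C offers a close analogy to FILE_IO_NEWFILE: since C11, appending "x" to a "w" fopen mode makes the open fail if the file already exists. The sketch below uses that flag; it illustrates the behavior only and says nothing about how Epsilon implements FILE_IO_NEWFILE. The write_text( ) helper is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Writes text to path. With newfile set, fails if the file already
   exists (C11 "x" mode). Returns 0 on success, -1 on failure. */
int write_text(const char *path, const char *text, bool newfile)
{
    assert(path != NULL && text != NULL);
    FILE *fp = fopen(path, newfile ? "wx" : "w");
    if (fp == NULL)
        return -1;
    int ok = fputs(text, fp) >= 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```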

int perform_unicode_conversion(int buf, int from, int to,
                               int flags, char *encoding)

The perform_unicode_conversion( ) primitive converts between 16-bit Unicode characters and various 8-bit encodings such as UTF-8. It converts characters in the range from...to in the specified buffer buf in place.

By default, the primitive converts from 16-bit Unicode characters to the named 8-bit encoding. The CONV_TO_16 flag makes it convert in the opposite direction, from the specified 8-bit encoding to 16-bit characters.

The primitive returns the code EBADENCODE if it doesn't recognize the encoding name. It returns ETOOBIG when converting from 8-bit characters if one of the characters is outside the range 0-255. It returns 0 on success. The primitive moves point to the end of the buffer.
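Both directions can be illustrated for UTF-8 with a standalone C sketch. It treats each 16-bit unit as one code point (surrogate pairs are not combined) and uses hypothetical return codes in place of Epsilon's own ETOOBIG. It does not reproduce the primitive's in-place buffer handling or its movement of point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL codes; Epsilon defines its own ETOOBIG and EBADENCODE. */
#define SKETCH_OK       0
#define SKETCH_ETOOBIG  (-2)

/* Default direction: 16-bit characters -> UTF-8 bytes. out must have room
   for 3 * n bytes. Returns the number of bytes written. */
size_t to_utf8(const uint16_t *in, size_t n, unsigned char *out)
{
    assert(in != NULL || n == 0);
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t c = in[i];
        if (c < 0x80) {
            out[o++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}

/* CONV_TO_16 precondition: the input must be bytes, so any character
   above 255 makes the conversion impossible. */
int check_bytes_only(const uint16_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (buf[i] > 255)
            return SKETCH_ETOOBIG;
    return SKETCH_OK;
}
```

For example, the three characters U+0041, U+00E9, and U+20AC encode to the six bytes 41 C3 A9 E2 82 AC.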

If the specified encoding has a defined signature (a byte order mark), and an entire buffer was converted, not just part of one, Epsilon adds the signature when converting to the encoding, and removes the signature, if there is one, when converting from the encoding.
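For UTF-8, the signature is the three bytes EF BB BF. The standalone C sketch below shows the whole-buffer rule: add the signature after converting to the encoding, and strip it, if present, before converting from it, but only when the entire buffer is involved. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static const unsigned char UTF8_BOM[3] = { 0xEF, 0xBB, 0xBF };

/* After converting TO UTF-8: prepend the signature only for a whole-buffer
   conversion. buf needs room for len + 3 bytes. Returns the new length. */
size_t add_signature(unsigned char *buf, size_t len, bool whole_buffer)
{
    assert(buf != NULL);
    if (!whole_buffer)
        return len;
    memmove(buf + 3, buf, len);
    memcpy(buf, UTF8_BOM, 3);
    return len + 3;
}

/* Before converting FROM UTF-8: remove a leading signature, if there is
   one, only for a whole-buffer conversion. Returns the new length. */
size_t remove_signature(unsigned char *buf, size_t len, bool whole_buffer)
{
    assert(buf != NULL);
    if (!whole_buffer || len < 3 || memcmp(buf, UTF8_BOM, 3) != 0)
        return len;
    memmove(buf, buf + 3, len - 3);
    return len - 3;
}
```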

int buffer_flags(int buf)

Internally, Epsilon stores the text of a buffer with 8 bits for each character, unless it contains some characters outside the range 0-255. In that case it uses 16 bits for each character. A buffer that once contained such characters but no longer does may still be stored as 16 bits per character. Epsilon transparently handles all needed translations between the two formats (for instance, when you copy text from one buffer to another), but it's occasionally useful to tell which format Epsilon is using.

The buffer_flags( ) primitive returns a bit mask. Check the bit represented by the BF_UNICODE macro; if it's present, the specified buffer buf is stored in 16-bit format internally. If buf is omitted or zero, the primitive checks the current buffer.
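In EEL the check reads `buffer_flags(buf) & BF_UNICODE`. The standalone C sketch below models both halves of the description with a hypothetical bit value: a test of whether any character forces 16-bit storage, and a single-bit test on a flags mask. As the text notes, a buffer can stay in 16-bit format after its wide characters are deleted, so the first function only tells when 16-bit storage is required, not what Epsilon currently uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL bit value; real EEL code uses BF_UNICODE from eel.h. */
#define SKETCH_BF_UNICODE 0x1

/* 8-bit storage suffices only when every character is in 0-255. */
bool needs_wide_storage(const unsigned int *chars, size_t n)
{
    assert(chars != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (chars[i] > 255)
            return true;
    return false;
}

/* Tests one bit of a flags mask, as done with BF_UNICODE. */
bool is_stored_16bit(int flags)
{
    return (flags & SKETCH_BF_UNICODE) != 0;
}
```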





Lugaru Epsilon Programmer's Editor 14.04 manual. Copyright (C) 1984, 2021 by Lugaru Software Ltd. All rights reserved.