Lugaru's Epsilon Programmer's Editor 14.04


### Character Types

```
int isspace(int ch)
int isdigit(int ch)
int isalpha(int ch)
int islower(int ch)
int isupper(int ch)
int iscntrl(int ch)
int isgraph(int ch)
int ispunct(int ch)
int isprint(int ch)
int isxdigit(int ch)
int isalnum(int ch)  /* basic.e */
int isident(int ch)  /* basic.e */
int any_uppercase(char *p)
```
Epsilon has several primitives that are helpful for determining if a character is in a certain class. The isspace( ) primitive tells if its character argument is a space, tab, or newline character. It returns `1` if it is, otherwise `0`.

In the same way, the isdigit( ) primitive tells if a character is a digit (one of the characters `0` through `9`), and the isalpha( ) primitive tells if the character is a letter. The islower( ) and isupper( ) primitives tell if the character is a lower case letter or upper case letter, respectively.

The iscntrl( ) primitive tells if a character is a control character, isgraph( ) if a character is a graphical character (has a printed representation, not a space or a control character), ispunct( ) if a character is a punctuation character, isprint( ) if a character is printable (not a control character), and isxdigit( ) if a character is a hex digit.

The isalnum( ) subroutine returns nonzero if the specified character is alphanumeric: either a letter or a digit. The isident( ) subroutine returns nonzero if the specified character is an identifier character: a letter, a digit, or the `_` character.
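The two definitions above can be modeled outside Epsilon in a few lines. The following is a minimal sketch in plain C built on the standard `<ctype.h>` functions, not the EEL source from basic.e. The names `model_isalnum` and `model_isident` are hypothetical, and the model deliberately covers only ASCII codes.

```c
#include <ctype.h>

/* Hypothetical stand-in for isalnum( ): nonzero for a letter or a digit.
   This model only looks at codes 0..127 and does not attempt
   Epsilon's Unicode handling. */
int model_isalnum(int ch)
{
    if (ch < 0 || ch > 127)
        return 0;
    return isalpha(ch) || isdigit(ch);
}

/* Hypothetical stand-in for isident( ): a letter, a digit, or '_'. */
int model_isident(int ch)
{
    return model_isalnum(ch) || ch == '_';
}
```

With this model, `model_isident('_')` is nonzero while `model_isalnum('_')` is zero, which is the only difference between the two classes.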

All functions in this section also handle Unicode characters appropriately: the isspace( ) primitive, for instance, also returns `1` for characters with a Unicode category of Z, meaning separators, and primitives that report on lowercase letters understand any Unicode character that's a lowercase letter.

The any_uppercase( ) subroutine returns nonzero if there are any upper case characters in its string argument `p`.
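As a rough illustration of what such a check involves, here is a sketch in plain C. It is an assumption about the behavior described above, not Epsilon's implementation: `model_any_uppercase` is a hypothetical name, and the standard `isupper()` used here recognizes only ASCII letters in the default C locale.

```c
#include <ctype.h>

/* Hypothetical model of any_uppercase( ): returns 1 if the
   NUL-terminated string p contains at least one upper case letter,
   otherwise 0. */
int model_any_uppercase(const char *p)
{
    for (; *p != '\0'; p++) {
        /* Cast to unsigned char so negative char values are safe
           to pass to isupper(). */
        if (isupper((unsigned char)*p))
            return 1;
    }
    return 0;
}
```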

```
int tolower(int ch)
int toupper(int ch)
```
The tolower( ) primitive converts an upper case letter to the corresponding lower case letter, and returns any other character unchanged. The toupper( ) primitive converts a lower case letter to its upper case equivalent, and likewise leaves other characters unchanged.

```
int set_character_property(int ch, int propcode, int value)
```
You can alter the rules Epsilon uses for determining if a particular character is alphabetic, uppercase, or lowercase, and how Epsilon case-folds when searching, sorting or otherwise comparing text, using the set_character_property( ) primitive. It takes the numeric code of the character whose properties you want to modify, a property code indicating which of its properties to access, and a new value for that property.

The property code `CPROP_CTYPE` sets whether the isalpha( ), isupper( ), islower( ), and isdigit( ) primitives consider a character alphabetic, uppercase, lowercase, or a digit, respectively. These attributes are independent, though there are conventions for their use. (For instance, only alpha characters generally have a case, no character is both uppercase and lowercase, and so forth.) The bits `C_ALPHA`, `C_LOWER`, `C_UPPER`, and `C_DIGIT` represent these attributes. The bits also control whether the regular expressions `<digit>`, `<alpha>`, `<alphanum>`, and `<word>` match these characters; see Character Classes.
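Because the attributes are independent bits, nothing stops a caller from building a mask that breaks the usual conventions. The sketch below, in plain C, shows one way such a mask could be checked before it is applied. The `MODEL_C_*` values are hypothetical placeholders (the real `C_ALPHA`, `C_LOWER`, `C_UPPER`, and `C_DIGIT` constants are defined by Epsilon and may differ), and `model_ctype_is_conventional` is an illustrative helper, not part of Epsilon.

```c
/* Hypothetical bit values for illustration only; Epsilon's actual
   C_ALPHA, C_LOWER, C_UPPER and C_DIGIT constants may differ. */
enum {
    MODEL_C_ALPHA = 1,
    MODEL_C_LOWER = 2,
    MODEL_C_UPPER = 4,
    MODEL_C_DIGIT = 8
};

/* Returns 1 if a ctype mask follows the two conventions named in the
   text: a character is never both uppercase and lowercase, and a case
   bit appears only on alphabetic characters. Returns 0 otherwise. */
int model_ctype_is_conventional(int mask)
{
    int has_case = mask & (MODEL_C_LOWER | MODEL_C_UPPER);

    if ((mask & MODEL_C_LOWER) && (mask & MODEL_C_UPPER))
        return 0;
    if (has_case && !(mask & MODEL_C_ALPHA))
        return 0;
    return 1;
}
```

Since the text says these are conventions rather than requirements, a check like this is advisory: a mask it rejects may still be one a caller deliberately wants.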

Similarly, with the `CPROP_CTYPE` property code, the bits `C_CNTRL`, `C_GRAPH`, `C_PUNCT`, and `C_XDIGIT` may be used to modify the results of the iscntrl( ), isgraph( ), ispunct( ), and isxdigit( ) primitives, respectively.

The property code `CPROP_TOLOWER` controls what value the tolower( ) primitive returns for the specified character, and the property code `CPROP_TOUPPER` controls what value the toupper( ) primitive returns for it.

The property code `CPROP_FOLD` controls how Epsilon case-folds that character during searching, sorting, and similar functions, whenever case folding is in use. It specifies a replacement character to be used in place of the original during comparisons. The complete set of case-folding properties must follow two rules: if some character X folds to Y, then Y must fold to itself, and character codes below 256 must never fold to a value greater than or equal to 256. (If a particular group of characters should be treated as equal when searching, setting the case folding property of each to the code of the lowest-numbered one is sufficient to comply with these rules.)
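The two rules can be checked mechanically against a candidate folding table. The sketch below is plain C that models the table as an array, where `fold[c]` is the replacement for character code `c`. Both helper names are hypothetical and nothing here calls Epsilon. The second helper applies the lowest-numbered-member approach described in the parenthetical above.

```c
#include <stddef.h>

/* Returns 1 if the table obeys both folding rules, 0 otherwise.
   n is the number of character codes modeled by the table. */
int model_fold_table_ok(const int *fold, size_t n)
{
    size_t c;

    for (c = 0; c < n; c++) {
        int y = fold[c];

        if (y < 0 || (size_t)y >= n)
            return 0;   /* target lies outside the modeled table */
        if (fold[y] != y)
            return 0;   /* rule 1: if X folds to Y, Y folds to itself */
        if (c < 256 && y >= 256)
            return 0;   /* rule 2: codes below 256 stay below 256 */
    }
    return 1;
}

/* Makes every member of a group fold to the group's lowest-numbered
   member, including that member itself. */
void model_fold_group(int *fold, const int *group, size_t count)
{
    size_t i;
    int lowest;

    if (count == 0)
        return;
    lowest = group[0];
    for (i = 1; i < count; i++) {
        if (group[i] < lowest)
            lowest = group[i];
    }
    for (i = 0; i < count; i++)
        fold[group[i]] = lowest;
}
```

Folding to the lowest code satisfies both rules at once: the chosen target folds to itself, and a group that contains any code below 256 necessarily picks a target below 256.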

The primitive returns the previous value of the specified property of that character. If the new value is out of range for the property (such as a negative value), it will be ignored, and the primitive will just return the current value. You can use this to retrieve the current properties of a character without changing them.
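The set-and-return-previous convention, with an out-of-range value acting as a pure query, can be pictured with a toy model. The sketch below is plain C operating on a single property slot. It does not call Epsilon, the function names are hypothetical, and it assumes that only negative values count as out of range (the text gives a negative value as one example, and real properties may have other limits).

```c
/* Toy model of one property slot. Stores value and returns the
   previous contents; a negative value is ignored, so the call then
   reports the current contents without changing them. */
int model_set_property(int *slot, int value)
{
    int previous = *slot;

    if (value >= 0)     /* assumption: only negatives are out of range */
        *slot = value;
    return previous;
}

/* Reads the slot by passing an out-of-range value. */
int model_get_property(int *slot)
{
    return model_set_property(slot, -1);
}
```

The same shape also supports a save-and-restore pattern: keep the value returned by the setting call and pass it back later to undo the change.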

The special property code `CPROP_DISP_WIDTH` may be used to retrieve the width in columns of any character in the current font in Epsilon for Windows. It returns 1 for normal characters, 0 for zero-width characters, and 2 for those characters treated as doublewidth (if Epsilon's experimental doublewidth feature to better support Chinese, Japanese and Korean text has been enabled; it's disabled by default). It returns -1 if called during startup before Epsilon has selected a font, and in non-Windows versions (where all characters have a font display width of 1).

The special property code `CPROP_DEF_REPL` may be used to retrieve replacement Unicode characters for those in the range 128-255 in Epsilon for Windows using the current language settings. Some Extended ASCII characters in this range have different character numbers in Unicode, so Epsilon uses this function to map undisplayable characters in that range to displayable characters (for display purposes only). The function returns the original character for any values outside its range.

Epsilon doesn't store current character properties in its state file. If you want to use non-default properties all the time, write a startup function that calls this primitive. See Starting and Finishing.

Epsilon always starts with character classifications based on standard Unicode properties, except for the Windows Console version. That version, when running with a DOS/OEM character set (see the console-ansi-font variable), begins with its classifications for 8-bit characters set to match the current OEM font.

```
int get_direction()         /* window.e */
```
The get_direction( ) subroutine converts the last key pressed into a direction. It understands arrow keys, as well as the equivalent control characters. It returns `BTOP`, `BBOTTOM`, `BLEFT`, `BRIGHT`, or `-1` if the key doesn't correspond to any direction.
