Defining Language Modes

Defining a new mode takes several steps. To get started, copy the file samplemode.e, included with Epsilon's EEL source code files.

Suppose you wish to define a mode called reverse-mode in which typing letters inserts them backwards, so typing "abc" produces "cba", and yanking characters from a kill buffer inserts them in reverse order. First, define a key table for the mode with the keytable keyword, and put the special definitions for that mode in the table:

keytable rev_tab;

command reversed_normal_character()
{
    normal_character();
    point--;
}

when_loading()
{
    int i;

    for (i = 'a'; i <= 'z'; i++)
        rev_tab[toupper(i)] = rev_tab[i] = (short) reversed_normal_character;
}

command yank_reversed() on rev_tab[CTRL('Y')]
{
    ...
}
Now define a command whose name is that of the mode. It should set mode_keys to the new table and major_mode to the name of the mode, and then call the subroutine make_mode( ) to update the mode line:

command reverse_mode()
{
    mode_keys = rev_tab;    /* use these keys */
    major_mode = strkeep("Reverse");
    make_mode();
}
Using strkeep( ) for the mode name ensures that it remains valid even if the reverse-mode command is redefined later. Since some buffers may continue to point to it, it's important that the pointer remains valid. (Alternatively, you could define a character array variable with the mode name, and set major_mode to that.) The mode name in major_mode, with the addition of "-mode", should be the name of a command that goes into that mode.

If you want Epsilon to go into that mode automatically when you find a file with the extension .rev (as it goes into C mode with .c files, for instance), define a function named suffix_rev() which calls reverse_mode(). The EEL subroutine find_it( ) defined in files.e automatically calls a function named suffix_ext (where ext is the file's extension) whenever you find a file, if a function with that name exists. It tries to call the suffix_none( ) function if the file has no suffix. If it can't find a function with the correct suffix, it will try to call the suffix_default( ) function instead.

suffix_rev() { reverse_mode(); }
The source file samplemode.e defines a sample mode you can use as a template for your modes. Make a copy of the file, replace all references to "sample" with the name of your mode, and modify it as needed for your language's syntax.

Language modes may wish to define a compilation command. This tells the compile-buffer command on Alt-F3 how to compile the current buffer. For example, compile_asm_cmd is defined as ml "%r". (Note that " characters must be quoted with \ in strings.) Use one of the % sequences shown in File Name Templates in the command to indicate where the file name goes, typically %f or %r.

The mode can define coloring rules. See Code Coloring Internals for details. Often, you can copy existing syntax coloring routines like those for .asm or .html files and modify them. They typically consist of a loop that searches for the next "interesting" construct (like a comment or keyword), followed by a switch statement that provides the coloring rule for each construct that could be found. Usually, finding an identifier calls a subroutine that does some additional processing (determining if the identifier is a keyword, for instance).

A language mode should set comment variables like comment-start. This tells the commenting commands (see Commenting Commands) how to search for and create legal comments in the language.

The comment commands look for comments using regular expression patterns contained in the buffer-specific variables comment-pattern (which should match the whole comment) and comment-start (which should match the sequence that begins a comment, like "/*"). When creating a comment, comment commands insert the contents of the buffer-specific variables comment-begin and comment-end around the new comment.

Showing matching delimiters

Commands like forward-level that move forward and backward over matching delimiters will (by default) recognize (, [, and { delimiters. They won't know how to skip delimiters inside quoted strings, or how to handle similar language-specific features. A language mode can define a replacement delimiter movement function. See Other Movement Functions for details.

To let Epsilon automatically highlight matching delimiters in the language when the cursor appears on them, a language mode uses code like this to set the auto-show-matching-characters variable:

if (auto_show_asm_delimiters)
    auto_show_matching_characters = asm_auto_show_delim_chars;
where references to "asm" are of course replaced by the mode's name. The language mode should define the two variables referenced above:

user char auto_show_asm_delimiters = 1;
user char asm_auto_show_delim_chars[20] = "{[]}";
The list of delimiters should contain an even number of characters, with all left delimiters in the left half and right delimiters in the right half. (A delimiter that's legal on the left or right should appear in both halves; then the language must provide a mode_move_level definition that can determine the proper search direction itself. See Other Movement Functions.)
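
The layout rule for the delimiter list can be illustrated with a small sketch in plain C (not EEL). The helper name delimiter_sides and the SIDE_LEFT and SIDE_RIGHT constants are hypothetical, introduced only for this illustration; how Epsilon itself reads the list may differ in detail.

```c
#include <string.h>

#define SIDE_LEFT  1   /* character appears in the left half  */
#define SIDE_RIGHT 2   /* character appears in the right half */

/* Hypothetical helper: report which half (or halves) of an
   even-length delimiter list contains ch.  Returns 0 if ch is not
   in the list, or if the list has an odd number of characters. */
static int delimiter_sides(const char *list, char ch)
{
    size_t len = strlen(list), half = len / 2, i;
    int sides = 0;

    if (len % 2 != 0)
        return 0;
    for (i = 0; i < len; i++)
        if (list[i] == ch)
            sides |= (i < half) ? SIDE_LEFT : SIDE_RIGHT;
    return sides;
}
```

With the list "{[]}", the character { is reported as a left delimiter and ] as a right delimiter. A character listed in both halves yields both bits, which is the case where the search direction cannot be inferred from the list alone.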

Sometimes a mode may wish to highlight delimiters more complicated than single characters, such as BEGIN and END keywords. To do this, the mode should define a function such as mymode_auto_show_delimiter() and then set the buffer-specific function pointer variable mode_auto_show_delimiter to point to it in that buffer.

Epsilon will then call that function when idle to highlight delimiters. It should return 0 if no highlighting should be done, 1 to make Epsilon try to use the auto_show_matching_characters setting described above for simple highlighting, 2 to indicate mismatched delimiters, or 3 to indicate matched delimiters. In the latter two cases it should also display the highlighting, by setting two arrays to mark the appropriate buffer regions, as shown in the example. This sample only demonstrates how to control the highlighting; a typical mode would use smarter rules for finding the matching keywords (ignoring nested pairs, skipping over keywords in comments or strings, and so forth).

#include "eel.h"
#include "colcode.h"

int mymode_auto_show_delimiter()
{
    save_var point, case_fold = 1;
    save_var matchstart, matchend, abort_searching = 0;

    init_auto_show_delimiter();    // Must do this first.
    point -= parse_string(-1, "[a-z0-9_]+");
    *highlight_area_start[0] = point;
    if (parse_string(1, "</word>begin</word>")) {
        *highlight_area_end[0] = matchend;
        if (!re_search(1, "</word>end</word>"))
            return 2;
    } else if (parse_string(1, "</word>end</word>")) {
        *highlight_area_end[0] = matchend;
        if (!re_search(-1, "</word>begin</word>"))
            return 2;
    } else
        return 1;
    *highlight_area_start[1] = matchstart;    // Mark the far end.
    *highlight_area_end[1] = matchend;
    modify_region(SHOW_MATCHING_REGION, MRTYPE, REGNORM);    // Make the highlighting visible.
    return 3;
}

A language mode may also want to set things up so typing a closing delimiter momentarily moves the cursor back to show its matching pair. Binding keys like ] and ) to the command show-matching-delimiter will accomplish this.

Displaying the current function's name

c_func_name_finder()    // Sample C mode func name finder.
char display_func_name[];
char must_find_func_name;
int start_of_function;
set_display_func_name()
get_func_name(int idle)
A language mode may want to arrange for the name of the current function or similar to appear in the mode line, subject to the display-definition variable.

To do this, it must define a function named modename_func_name_finder, where modename is the mode's name as recorded in the major_mode variable. The function should write the current function's name to the display_func_name variable and return 1, also setting the start_of_function variable to a buffer position representing the start of the function, if it can. If point is not in a function, set display_func_name to an empty string and return 1.

Epsilon normally runs this function during idle time. If the user presses a key during this function, and the must_find_func_name variable is zero, it should stop any slow parsing if it can and return 0.

There are two subroutines that take advantage of such functions. The set_display_func_name( ) subroutine is what Epsilon calls when idle, to update the displayed function name.

The get_func_name( ) subroutine allows EEL code to take advantage of a mode's function name finder at other times. Pass a nonzero value for idle if you want it to give up and return should the user press a key. It returns 0 if Epsilon's idea of the function name in display_func_name was already up to date, 1 if it wasn't, but it now is, 2 if it couldn't be computed, or 3 if the function gave up to handle a waiting key. Both these functions set start_of_function to the start of the current function in this buffer if they can, or -1 otherwise.

Helpful subroutines

Some subroutines help with mode-specific tasks.

int call_by_suffix(char *file, char *pattern)
int get_mode_variable(char *pat)
char *get_mode_string_variable(char *pat)
int get_mode_based_index(char *pat)
The call_by_suffix( ) subroutine constructs a function name based on the extension of a given file (typically the file associated with the current buffer). It takes the file name, and a function name with %s where the extension (without its leading ".") should be. For example, call_by_suffix("file.cpp", "tag-suffix-%s") looks for a subroutine named tag-suffix-cpp. (If the given file has no extension, the subroutine pretends the extension was "none".)

If there's no subroutine with the appropriate name, call_by_suffix( ) then replaces the %s with "default" and tries to call that function instead. The call_by_suffix( ) subroutine returns 1 if it found some function to call, or 0 if it couldn't locate any suitable function.
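
The lookup order described above can be sketched in plain C (not EEL). This is only a model under stated assumptions: known_functions stands in for Epsilon's name table, and resolve_by_suffix, fill_pattern, and function_exists are hypothetical names. The model reports which function name would be chosen instead of calling it, and it treats only / as a directory separator.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for Epsilon's name table: the functions
   that "exist" in this model. */
static const char *known_functions[] = {
    "tag-suffix-c", "tag-suffix-default", "color-suffix-none", NULL
};

static int function_exists(const char *name)
{
    int i;

    for (i = 0; known_functions[i]; i++)
        if (strcmp(known_functions[i], name) == 0)
            return 1;
    return 0;
}

/* Copy pattern to out, replacing its first %s with word. */
static void fill_pattern(char *out, size_t n, const char *pattern,
                         const char *word)
{
    const char *p = strstr(pattern, "%s");

    if (p)
        snprintf(out, n, "%.*s%s%s", (int)(p - pattern), pattern,
                 word, p + 2);
    else
        snprintf(out, n, "%s", pattern);
}

/* Model of the lookup order: try the extension-specific name, then
   the "default" name.  Writes the last name tried to found and
   returns 1 if that name exists, 0 otherwise. */
static int resolve_by_suffix(const char *file, const char *pattern,
                             char *found, size_t n)
{
    const char *base = strrchr(file, '/');
    const char *dot;

    base = base ? base + 1 : file;      /* skip any directory part */
    dot = strrchr(base, '.');
    fill_pattern(found, n, pattern, dot ? dot + 1 : "none");
    if (function_exists(found))
        return 1;
    fill_pattern(found, n, pattern, "default");
    return function_exists(found);
}
```

In this model, "file.c" with the pattern "tag-suffix-%s" resolves to tag-suffix-c, while "file.cpp" falls back to tag-suffix-default because no tag-suffix-cpp entry exists in the table.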

The get_mode_variable( ) subroutine searches for a function or variable with a name based on the current mode. Its parameter pat must be a printf-style format string, with a %s where the current mode's name should appear. The subroutine will look for a function or variable with the resulting name. A variable by that name must be numeric; the subroutine will return its value. A function by that name must take no parameters and return a number; this subroutine will call it and return its value. In either case it will set the got_bad_number variable to zero. If get_mode_variable( ) can't locate a suitable function or variable, it sets got_bad_number nonzero.

The get_mode_string_variable( ) subroutine retrieves the value of a string variable whose name depends on the current mode. The name may also refer to a function; its value will be returned. It constructs the name by using sprintf( ); pat should contain a %s and no other % characters; the current mode's name will replace the %s. If there's no such variable or function with that name, it returns NULL. The subroutine sets the got_bad_number variable nonzero to indicate that there was no such name, or zero otherwise.

The get_mode_based_index( ) subroutine looks for a name table entry of any sort (a function, variable, key table, etc.) with a name built by replacing the %s sequence in the specified pattern with the name of the current mode. If there is none, it substitutes "default" for the mode name and tries again. It returns the name table index of the entry it found, or zero if none.

int guess_mode_without_extension(char *res, char *pat)
The guess_mode_without_extension( ) subroutine tries to determine the correct mode for a file without an extension, or with an extension Epsilon doesn't recognize, by examining its text or complete file name. It can detect some Perl and C++ header files that lack any extension, as well as makefiles (based simply on the file's name) and various other sorts of files. If it can determine the mode, it uses pat as a pattern for sprintf( ) (so pat should contain one %s and no other %'s), stores the result in res (that is, pat with its %s replaced by the mode name), and returns 1. If it can't guess the mode, it returns 0.

mode_default_settings()
The mode_default_settings( ) subroutine resets a number of mode-specific variables to default settings. A command that establishes a mode can call this subroutine, if it doesn't want to provide explicit settings for all the usual mode-specific variables, such as comment pattern variables.

zeroed buffer (*buffer_maybe_break_line)();
int example_maybe_break_line(int type)
int generic_maybe_break_line(int type)
zeroed buffer int (*mode_restrict_break)();
int example_mode_restrict_break(int pos)
The auto-fill minor mode normally calls a function named maybe_break_this_line( ) to break lines. A major mode may set the buffer-specific function pointer buffer_maybe_break_line to point to a different function; then auto-fill mode will call that function instead, for possibly breaking lines as well as for turning auto-fill on or off, or testing its state.

A buffer_maybe_break_line function will be called with one numeric parameter. If the parameter is 0 or 1, the function is being told to turn auto-fill off or on, respectively. The function may interpret this request to apply only to the current buffer, or to all buffers in that mode. It should return 0.

If its parameter is 2, it's being asked whether auto-fill mode is on. It should return a nonzero value to indicate that auto-fill mode is on.

If its parameter is 3, it's being asked to perform an auto-fill, if appropriate, triggered by the key in the variable key, which has not yet been inserted in the buffer. It may simply return 1 if the line is not wide enough yet, or after it has broken the line. Epsilon will then insert the key that triggered the filling request. If it returns zero, Epsilon will skip inserting the key that triggered the filling.
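
The four request types can be summarized with a sketch in plain C (not EEL). Everything here is a model under stated assumptions: fill_enabled, line_width, margin, and breaks_done are hypothetical stand-ins for editor state, and model_maybe_break_line is a hypothetical name, not a function supplied by Epsilon.

```c
/* Hypothetical stand-ins for editor state in this model. */
static int fill_enabled = 0;    /* is auto-fill on for this mode?   */
static int line_width = 0;      /* width of the current line        */
static int margin = 70;         /* width at which lines get broken  */
static int breaks_done = 0;     /* number of lines broken so far    */

/* Model of a buffer_maybe_break_line-style function. */
static int model_maybe_break_line(int type)
{
    switch (type) {
    case 0:                     /* request: turn auto-fill off */
    case 1:                     /* request: turn auto-fill on  */
        fill_enabled = type;
        return 0;
    case 2:                     /* query: is auto-fill on? */
        return fill_enabled;
    case 3:                     /* fill request before the key is inserted */
        if (fill_enabled && line_width >= margin) {
            breaks_done++;      /* a real mode would break the line here */
            line_width = 0;
        }
        return 1;               /* nonzero: let the triggering key be inserted */
    }
    return 0;
}
```

This model always returns 1 for a type 3 request, so the triggering key is always inserted; a mode that wanted to swallow the key after filling would return 0 in that case instead.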

Many language modes set buffer_maybe_break_line to point to the generic_maybe_break_line( ) function, which breaks within comments by using variables like comment-start, and doesn't break long lines outside comments. It works in languages that use simple one-line comments.

Even if a mode uses the standard maybe_break_this_line( ) subroutine to handle its line breaking, it can still limit where breaks may occur by setting the buffer-specific function pointer mode_restrict_break to point to a restriction function. A restriction function takes a parameter specifying the position of a space or tab character in the current buffer, and returns 1 if it's OK to break a line at that position, or 0 if it's not. The function must preserve matchstart, matchend, and point. In buffers where mode_restrict_break is zero, any space or tab character is a valid breaking position.
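
As an illustration of what a restriction function might decide, here is a sketch in plain C (not EEL) that refuses to break a line inside a double-quoted string. The rule itself, the name model_restrict_break, and the buffer_text variable standing in for the current buffer are all assumptions made for this example. The model only reads the text, which mirrors the requirement that a restriction function leave matchstart, matchend, and point unchanged.

```c
/* Hypothetical stand-in for the current buffer's text. */
static const char *buffer_text = "";

/* Model of a restriction function: pos is the offset of a space or
   tab.  Returns 1 if breaking there is OK, or 0 if pos lies inside
   a double-quoted string on its line.  Simplification: a quote
   preceded by a backslash is treated as escaped, with no check for
   an escaped backslash before it. */
static int model_restrict_break(int pos)
{
    int i = pos, in_string = 0;

    while (i > 0 && buffer_text[i - 1] != '\n')    /* find the line start */
        i--;
    for (; i < pos; i++)
        if (buffer_text[i] == '"'
            && (i == 0 || buffer_text[i - 1] != '\\'))
            in_string = !in_string;                 /* toggle at each quote */
    return !in_string;
}
```

Because the scan restarts at the beginning of each line, an unterminated quote on an earlier line does not block breaks on later lines in this model.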
