Code Coloring Internals

Epsilon's code coloring routines use the character coloring primitives above to do code coloring for various languages like C, TeX, and HTML. There are some general purpose code coloring functions that manage code coloring and decide what sections of a buffer need to be colored. Then, for each language, there are functions that know how to color text in that language.

The general purpose section maintains information on what parts of each buffer have already been colored. It divides each buffer into sections that are already correctly colored, and sections that may not be correctly colored. When the buffer changes, it moves its divisions so that the modified text is no longer marked "correctly colored". Whenever Epsilon displays part of a buffer, this part of code coloring recolors sections of the buffer as needed, and marks them so they won't be colored again unless the buffer changes. Epsilon only displays the buffer after the appropriate section has been correctly colored. This part also arranges to color additional sections of the buffer whenever Epsilon is idle, until the buffer has been completely colored.

The other part of code coloring does the actual coloring of C, TeX, and HTML buffers. You can write new EEL functions to tell Epsilon how to color other languages, and use the code coloring package's mechanisms for remembering which parts of the buffer have already been colored, and which need to be recolored. This section describes how to do this. (Also see Defining Language Modes.)

    buffer int (*recolor_range)();       // how to color part of this buffer
    buffer int (*recolor_from_here)();   // how to find a good starting pos
    int color_c_range(int from, int to)  // how to color part of C buffer
    int color_c_from_here(int safe)      // how to find starting pos in C buffer
    buffer char coloring_flags;
    #define COLOR_DO_COLORING           1
    #define COLOR_IN_PROGRESS           2
    #define COLOR_MINIMAL               4
    #define COLOR_INVALIDATE_FORWARD    8
    #define COLOR_INVALIDATE_BACKWARD   16
    #define COLOR_INVALIDATE_RESETS     32
    #define COLOR_RETAIN_NARROWING      64
    #define COLOR_IGNORE_INDENT         128
    int must_color_through;

You must first write two functions and make the buffer-specific function pointers refer to them, in each buffer you want to color. For C/C++/EEL buffers, the c-mode command takes care of setting the function pointers. It also contains the lines

    if (want_code_coloring)
        when_setting_want_code_coloring();

to actually turn on code coloring for the buffer if necessary.

The first function, which must be stored in the buffer-specific recolor_range variable, does the actual coloring of a part of the buffer. It takes two parameters, from and to, specifying the range of the buffer that needs coloring. It colors at least the specified range, but it may go past to and color more of the buffer. It returns the buffer position it reached, indicating that all characters between from and its return value are now correctly colored. In C buffers, the recolor_range function is named color_c_range( ).

The recolor_range function may decide to mark some characters in the range "uncolored", by calling set_character_color( ) with a color class of -1. Or it may assign particular color classes to all parts of the range to be colored. But either way, it should make sure all characters in the given range are correctly colored. Typically, a function begins by setting all characters between from and to to a default color class, then searching for elements which should be colored differently. Be sure that if you extend the range past to, you color all the characters between to and your new stopping point.
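The contract described above can be modeled outside Epsilon. The following is a minimal sketch in plain C, not EEL: the toy language (a '#' starts a comment that runs to the end of the line), the color class numbers, the colors array, and the names set_color and color_toy_range are all assumptions made for illustration. It sets the requested range to a default class first, then recolors comments, and returns a position past to when a comment extends beyond the requested range.

```c
#include <assert.h>

/* Hypothetical color classes for the toy language. */
enum { CLASS_UNCOLORED = -1, CLASS_TEXT = 0, CLASS_COMMENT = 1 };

/* Stand-in for set_character_color(): assign one class to [from, to). */
static void set_color(int *colors, int from, int to, int cls)
{
    for (int i = from; i < to; i++)
        colors[i] = cls;
}

/* Colors at least [from, to) of a toy language in which '#' starts a
   comment that runs to the end of the line. Assumes `from` is safe
   (not inside a comment). Returns the position reached, which may be
   past `to`; every position in [from, result) has been colored. */
int color_toy_range(const char *text, int len, int *colors, int from, int to)
{
    assert(0 <= from && from <= to && to <= len);
    set_color(colors, from, to, CLASS_TEXT);    /* default class first */
    int pos = from;
    while (pos < to) {
        if (text[pos] == '#') {
            int end = pos;
            while (end < len && text[end] != '\n')
                end++;                          /* may run past `to` */
            set_color(colors, pos, end, CLASS_COMMENT);
            pos = end;
        } else {
            pos++;
        }
    }
    return pos;
}
```

Because a comment is always colored through to its end of line, the returned position never lies inside a comment, so in this toy language it is a position from which coloring can resume.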

Epsilon remembers which parts of the buffer require coloring by using a tagged region (see Character Coloring) named "needs-color". A coloring routine may decide, while parsing a buffer, that some later or earlier section of the buffer requires coloring; if so, it can set the needs-color attribute of that section to -1 to indicate this, and Epsilon will recolor that section of the buffer the next time it's needed. Or it can declare that some other section of the buffer is already properly colored by setting that section's attribute to 0. It may also decide to examine the must_color_through variable, a buffer position marking the end of the region that really requires coloring right now. (Ordinarily, Epsilon expands this region to include entire color blocks.)
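To make the -1 and 0 attribute values concrete, here is a rough C model of the bookkeeping. It is only a sketch: Epsilon stores this information in a tagged region, whereas this model uses a plain per-position array, and the names set_needs_color and first_needing_color are hypothetical.

```c
#include <assert.h>

/* Attribute values described above: -1 means the text requires
   coloring, 0 means it is already properly colored. */
enum { NEEDS_COLOR = -1, COLORED = 0 };

/* Set the modeled needs-color attribute for positions [from, to). */
void set_needs_color(int *attr, int from, int to, int value)
{
    assert(from <= to);
    for (int i = from; i < to; i++)
        attr[i] = value;
}

/* Return the first position in [from, to) that still requires
   coloring, or `to` if the whole range is already colored. */
int first_needing_color(const int *attr, int from, int to)
{
    int pos = from;
    while (pos < to && attr[pos] != NEEDS_COLOR)
        pos++;
    return pos;
}
```

A coloring routine that notices an earlier section has become stale would call the equivalent of set_needs_color on that section with -1, and the next search for work would then find it.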

When the buffer is modified, some of its coloring becomes invalid, and must be recomputed the next time it's needed. Normally Epsilon invalidates a few lines surrounding the changed section. Some language modes tell Epsilon to automatically invalidate more of the buffer by setting flags in the buffer-specific coloring_flags variable. (Other flags in this variable aren't normally set by language modes; code coloring uses them for bookkeeping purposes.)

COLOR_INVALIDATE_FORWARD indicates that after the user modifies a buffer, any syntax highlighting information after the modified region should be invalidated. COLOR_INVALIDATE_BACKWARD indicates that syntax highlighting information before the modified region should be invalidated.

COLOR_INVALIDATE_RESETS tells Epsilon that whenever it invalidates syntax highlighting in a region, it should also set the color of all text in that region to the default of -1. COLOR_RETAIN_NARROWING indicates that coloring should respect any narrowing in effect (instead of looking outside the narrowed area to parse the buffer in its entirety). COLOR_IGNORE_INDENT says that a simple change of indentation shouldn't cause any recoloring. Languages with no column-related highlighting rules may set this for better performance.
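The flags combine as ordinary bit masks. The sketch below is plain C rather than EEL and uses the flag values listed earlier in this section. The helper apply_mode_flags is hypothetical, and its split between mode-settable bits and bookkeeping bits is an assumption based on which flags this section documents for language modes. The model uses an unsigned value so that the 128 bit is handled without sign issues.

```c
#include <assert.h>

/* Flag values as listed earlier in this section. */
#define COLOR_DO_COLORING           1
#define COLOR_IN_PROGRESS           2
#define COLOR_MINIMAL               4
#define COLOR_INVALIDATE_FORWARD    8
#define COLOR_INVALIDATE_BACKWARD   16
#define COLOR_INVALIDATE_RESETS     32
#define COLOR_RETAIN_NARROWING      64
#define COLOR_IGNORE_INDENT         128

/* Hypothetical helper: merge the flags a language mode asks for into
   the current value, leaving the remaining (assumed bookkeeping) bits
   exactly as they were. */
unsigned apply_mode_flags(unsigned current, unsigned requested)
{
    const unsigned mode_bits = COLOR_INVALIDATE_FORWARD
                             | COLOR_INVALIDATE_BACKWARD
                             | COLOR_INVALIDATE_RESETS
                             | COLOR_RETAIN_NARROWING
                             | COLOR_IGNORE_INDENT;
    assert((current & ~0xFFu) == 0);    /* value must fit in 8 bits */
    return current | (requested & mode_bits);
}
```

For example, a mode for a language with multi-line comments and no column-sensitive rules might request COLOR_INVALIDATE_FORWARD | COLOR_IGNORE_INDENT.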

For many languages, starting to color at an arbitrary place in the buffer requires a lot of unnecessary work. For example, the C language has comments that can span many lines. A coloring function must know whether it's inside a comment before it can begin coloring. Similarly, a coloring function that began looking from the third character in the C identifier id37 might decide that it had seen a numeric constant, and incorrectly color the buffer.

To simplify this problem, the coloring routines ensure that coloring begins at a safe place. We call a buffer position safe if the code coloring function can color the buffer beginning at that point, without looking at any earlier characters in the buffer.

When Epsilon calls the function in recolor_range, the value of from is always safe. Epsilon expects the function's return value to be safe as well; it must be OK to continue coloring from that point. For C, this means the returned value must not lie inside a comment, a keyword, or any other lexical unit. Moreover, inside the colored region, any boundary between characters set to different color classes must be safe. If the colored region contains a keyword, for example, Epsilon assumes it can begin recoloring from the start of that keyword. (If this isn't true for a particular language, its coloring function can examine the buffer itself to determine where to begin coloring.)

When Epsilon needs to color more of the buffer, it generally starts from a known safe place: either a value returned by the buffer's recolor_range function, or a boundary between characters of different colors. But when Epsilon first begins working on a part of the buffer that hasn't been colored before, it must determine a safe starting point. The second function you must provide, stored in the recolor_from_here buffer-specific function pointer, picks a new starting point. In C buffers, the recolor_from_here function is named color_c_from_here( ).

The buffer's recolor_from_here function looks backward from point for a safe position and returns it. This may involve a search back to the start of the buffer. If Epsilon knows of a safe position before point in the buffer, it passes this as the parameter safe. (If not, Epsilon passes 0, which is always safe.) The function should respect the value of the color-look-back variable to limit searching on slow machines.
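As an illustration, consider a toy language in which every construct ends at a newline, so the start of any line is safe. The plain C sketch below (not EEL) models a recolor_from_here function for it. The name toy_from_here and the explicit text and point parameters are assumptions, and the sketch does not model the color-look-back limit.

```c
#include <assert.h>

/* Search backward from `point` for a safe position and return it.
   In this toy language the start of any line is safe. `safe` is a
   known safe position at or before `point` (0 if none is known), so
   the search never needs to go back past it. */
int toy_from_here(const char *text, int point, int safe)
{
    assert(0 <= safe && safe <= point);
    int pos = point;
    while (pos > safe && text[pos - 1] != '\n')
        pos--;
    return pos;
}
```

Passing a nonzero safe value shortens the backward search, which is the point of the parameter: the function can stop as soon as it reaches a position already known to be safe.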

Epsilon provides two standard recolor_from_here functions that coloring extensions can use. The recolor_by_lines( ) subroutine is good for buffers where coloring is line-based, such as dired buffers. In such buffers the coloring needed for a line doesn't depend at all on the contents of previous lines. The recolor_from_top( ) subroutine has just the opposite effect; it forces Epsilon to start from the beginning of the buffer (or an already-colored place). This may be all that's needed if a mode's coloring function is very simple and quick.

Epsilon runs the code coloring functions while it's refreshing the screen, so running the EEL debugger on code coloring functions is difficult, since the debugger itself needs to refresh the screen. The best way to debug such functions is to test them out by calling them explicitly, using test-bed functions like these:

    command debug_color_region()
    {
        fix_region();
        set_character_color(point, mark, color_class default);
        point = color_algol_range(point, mark);
    }

    command debug_from_here()
    {
        point = color_algol_from_here(point);
    }

The first command above tries to recolor the current region, and moves past the region it actually colored. It begins by marking the region with a distinctive color (using the default color class), to help catch missing coloring. The second command helps you test your from_here function. It moves point backwards to the nearest safe position. Once you're satisfied that your new code-coloring functions work correctly, you can then set the recolor_range and recolor_from_here variables to refer to them.

To help find bugs in a code-coloring function, you can have Epsilon check whether it has colored every character in the range. This requires two steps. First, define DEBUG_COLORING in your EEL file before it includes eel.h. (This makes it intercept all your set_character_color( ) calls and send them to a debugging version.) Second, set the want-debug-coloring variable nonzero.

Bits in the want-debug-coloring variable enable features. The 1 bit makes Epsilon simply check if a coloring function colored every character. That is, if it was given a range start...end and returned the value ret (which must be >= end), it checks whether set_character_color( ) was called to assign a color on every character in the range start...ret. If it skips some, Epsilon will display a message and add a report to a debug buffer whose name is the original buffer's name with "debug-color-" prefixed. The 2 bit makes color debugging apply a distinctive color class to all the missed characters.

The 4 bit makes the debug buffer's report list not only every region that was missed, but also every region that had a color applied to it, and the 8 bit generates such a report every time your coloring function is invoked. Finally, the 16 bit has the debug buffer list every single call to set_character_color() by your coloring function. (These bits can be helpful in simply seeing what regions Epsilon is asking for, and what your function does in response.)
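The check enabled by the 1 and 2 bits can be pictured with a small model. This is a plain C sketch of the idea, not Epsilon's implementation: the touched array (recording which positions received a set_character_color( ) call), the CLASS_MISSED value, and the name check_coverage are hypothetical, and the sketch assumes the 2 bit only has an effect when the 1 bit is also set.

```c
#include <assert.h>

#define DEBUG_CHECK_COVERAGE  1   /* the 1 bit described above */
#define DEBUG_MARK_MISSED     2   /* the 2 bit described above */

enum { CLASS_MISSED = 99 };       /* hypothetical distinctive color class */

/* touched[i] is nonzero if the coloring function assigned a color to
   position i. Counts the positions in [start, ret) that it skipped
   and, if requested, gives them the distinctive class. */
int check_coverage(int debug_bits, const unsigned char *touched,
                   int *colors, int start, int ret)
{
    assert(start <= ret);
    if (!(debug_bits & DEBUG_CHECK_COVERAGE))
        return 0;
    int missed = 0;
    for (int i = start; i < ret; i++) {
        if (!touched[i]) {
            missed++;
            if (debug_bits & DEBUG_MARK_MISSED)
                colors[i] = CLASS_MISSED;
        }
    }
    return missed;
}
```

Note that the range checked ends at the coloring function's return value, not at the end of the range it was asked to color.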

Detecting uncolored-region bugs only works on those coloring subroutines that are designed to color every character in their given range (plus any additional range they report they've colored, when they return a value greater than the range start...end they were given). But some modes' coloring functions (for example, HTML mode) keep track of which buffer regions require coloring in a different way, and other modes intentionally only apply colors to certain parts of the buffer. (For example, process mode uses this method only for user input, and colors process output as it arrives, based on any ANSI escape sequences it contains.) Color debugging isn't useful for spotting bugs in such modes, but its logging ability can be useful.

    buffer int (*when_displaying)();
    recolor_partial_code(int from, int to)
    char first_window_refresh;
    add_buffer_when_displaying(int buf, int (*func)())
    delete_buffer_when_displaying(int buf, int (*func)())
    default_when_displaying(int from, int to)
    drop_all_colored_regions()
    drop_coloring(int buf)

Epsilon calls the EEL subroutine pointed to by the buffer-specific function pointer when_displaying as it displays a window on the screen. It calls this subroutine once for each window, after determining which part of the buffer will be displayed, but before putting text for that window on the screen.

Epsilon sets the first_window_refresh variable prior to calling the when_displaying subroutine to indicate whether or not this is the first time a particular buffer has been displayed during a particular screen refresh. When a buffer appears in more than one window, Epsilon sets this variable to 1 before calling the when_displaying subroutine during the display of the first window, and sets it to zero before calling that subroutine during the display of the remaining windows. Epsilon sets the variable to 1 if the buffer only appears in one window. The value is valid only during a call to the buffer's when_displaying subroutine.

In a buffer with code coloring turned on, the when_displaying variable points to a subroutine named recolor_partial_code( ). Epsilon passes two values to the subroutine that specify the range of the buffer that was modified since the last time the buffer was displayed. The standard recolor_partial_code( ) subroutine provided with Epsilon uses this information to discard any saved coloring data for the modified region of the buffer in the data structures it maintains. It then calls the two language-specific subroutines described at the beginning of this section as needed to color parts of the buffer.

You can tell Epsilon to run a function at display time by calling the add_buffer_when_displaying( ) subroutine. It arranges for the specified function to be called after code coloring has been done when displaying any window showing the specified buffer. The function will be called with no parameters. The delete_buffer_when_displaying( ) subroutine removes the specified function from that buffer's list of functions to be called at display time.

The recolor_partial_code( ) subroutine calls the default_when_displaying( ) function, which calls each such function set by add_buffer_when_displaying( ). In most buffers without code coloring turned on, the when_displaying variable points to the default_when_displaying( ) function directly. Other functions assigned to when_displaying should call default_when_displaying( ) too.
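The relationship between these subroutines resembles a per-buffer list of callbacks. The following plain C sketch models it under several assumptions: the hook_list structure, its fixed capacity, the names add_hook, delete_hook, and run_hooks, and the suppression of duplicate registrations are all hypothetical and are not taken from Epsilon's source.

```c
#include <assert.h>

#define MAX_HOOKS 8                 /* hypothetical fixed capacity */

typedef int (*display_hook)(void);  /* hooks take no parameters */

struct hook_list {
    display_hook funcs[MAX_HOOKS];
    int count;
};

/* Models add_buffer_when_displaying(): register func unless it is
   already present. Returns 1 if func is registered afterwards. */
int add_hook(struct hook_list *list, display_hook func)
{
    for (int i = 0; i < list->count; i++)
        if (list->funcs[i] == func)
            return 1;
    if (list->count == MAX_HOOKS)
        return 0;
    list->funcs[list->count++] = func;
    return 1;
}

/* Models delete_buffer_when_displaying(): remove func if present. */
void delete_hook(struct hook_list *list, display_hook func)
{
    for (int i = 0; i < list->count; i++) {
        if (list->funcs[i] == func) {
            for (int j = i + 1; j < list->count; j++)
                list->funcs[j - 1] = list->funcs[j];
            list->count--;
            return;
        }
    }
}

/* Models the part of default_when_displaying() that calls each
   registered function. Returns how many were called. */
int run_hooks(const struct hook_list *list)
{
    for (int i = 0; i < list->count; i++)
        list->funcs[i]();
    return list->count;
}

/* Sample hook: counts how many times it has been run. */
int display_count = 0;
int count_display(void)
{
    display_count++;
    return 0;
}
```

In this model, a replacement when_displaying function that forgot to call run_hooks would silently disable every registered function, which is why other functions assigned to when_displaying should call default_when_displaying( ) too.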

The drop_all_colored_regions( ) subroutine discards coloring information collected for the current buffer. The next time Epsilon needs to display the buffer, it will begin coloring the buffer again. The drop_coloring( ) subroutine is similar, but lets you specify the buffer number. It also discards some data structures, so it's more suitable when the buffer is about to be deleted.

    recolor_buffer_range(start, end)
    int get_character_syntax_color(int pos)

You can call the recolor_buffer_range( ) subroutine to make Epsilon apply appropriate colors to a range of the current buffer. This can be useful if you want to copy some of the buffer with its correct coloring, even if it's not currently displayed in any window. Similarly, the get_character_syntax_color( ) subroutine returns the color class of a character in the buffer, after using recolor_buffer_range( ) to apply coloring to it.
