Coroutines in JOE
JOE 3.6 now uses co-routines to help reduce the amount of event-driven
code which has to be written. All previous versions of JOE were entirely in
the event driven coding style, which made user dialog code a chore to write.
Now, with the help of co-routines, dialog code can be written as old
fashioned blocking code which is much cleaner, even though JOE can have
multiple un-finished dialogs on the screen.
The coroutine library has only three (now four) functions:
|int co_call(int (*func)(va_list args),...);||Call a function
as a co-routine.|
|int co_yield(Coroutine *t, int val);||Suspend current
co-routine and return to caller with given return value.|
|int co_resume(Coroutine *t, int val);||Suspend current
co-routine and resume specified one with given return value.|
|int co_suspend(Coroutine *u, int val);||Suspend
current co-routine (remembering its suspended invoking coroutines) and
resume the top-level. The current coroutine is stored in u. When u is
resumed and returns, the suspended chain is resumed (the resumer is
continued when the chain tries to continue the top level.|
co_call() invokes 'func' as a co-routine- it will run on its own stack.
'func' is called immediately upon the call to co_call. All of the arguments
after func are passed as a va_list to the function. co_call calls va_start
and va_end, 'func' should call va_arg once for each passed argument (see
stdarg.h). The co-routine should not yield before it has finished reading
all of the arguments, otherwise va_end will be called early. When 'func'
returns, the co-routine is destroyed and co_call() returns with the return
value of 'func'. If the co-routine yields by calling co_yield, co_call
returns with the return value specified in the call to co_yield. Note that
co_call only ever returns once for a particular call.
co_yield() suspends the current co-routine and returns with the specified
return value to the caller (either to co_call or co_resume). co_yield()
writes information about the co-routine to the 'Coroutine' structure whose
address is passed to co_yield. The Coroutine structure's address is used as
a handle for the co-routine and can be passed to co_resume().
co_resume() suspends the current co-routine (henceforth called the
original co-routine) and resumes the specified co-routine. When the
specified co-routine returns (when the function given in the co_call that
created the co-routine returns), the original co-routine is continued and
co_resume() returns with the function's return value. If the specified
co-routine yields, the original co-routine is continued and co_resume()
returns with the return value specified in the call to co_yield.
co_suspend() is like co_resume, except that the top-level task is resumed
instead of the calling co-routine. This has the effect of suspending the
entire chain of co-routines which invoked the caller of co_suspend.
Co_suspend stores the suspended co_routine in another co_routine. That
co_routine will resume this one when it finally returns.
The co-routine library is structured this way (which is different from
the usual way where you get a co-routine handle immediately upon creation,
for example see Knuth) to encourage proper stack un-winding. Basically, a
co-routine can not be forcibly destroyed. Instead, each created co-routine
is only destroyed when the initial function returns.
The co-routine library is based on ucontext.h and the getcontext,
swapcontext, makecontext and setcontext system calls. If the system does
not have these calls, it uses setjmp/longjmp tricks in a highly portable
way: the stacks for each co-routine are allocated off the top of the main
stack. The main stack grows after each allocation. The stacks are never
freed, but they are reused. This works for any machine with a single normal
stack. It will not work on IA64 (two stacks) or Cray (the stack is really a
I was originally going to use simple cooperative multi-threading, with
only two basic calls: sleep_on() and wake_up() (inspired by old versions of
the Linux kernel) and a top-level scheduler. Sleep_on would suspend the
current thread, and create a new thread which runs just the scheduler.
Wake_up would mark a thread as ready to run, and on the next return to the
scheduler, it would be run.
This turns out to not be powerful enough to implement JOE's keyboard
macro language, at least without much rewriting and inelegance. It could be
done, but the keyboard macro player would have to feed events into the
top-level scheduler or would have to be integrated with it, whereas in older
versions of JOE (even without co-routines) the macro player just calls the
edit functions (more or less) directly. The problem is that sleep_on
(called by a function asking for user input) would suspend the task all the
way up to the scheduler including the macro player, thus preventing the
macro player from simulating user input.
On the other hand with real co-routines, the macro player will call each
edit function as a co-routine (which I like to think of as a task with a
very clear return point). If the edit function has to yield, it returns
immediately to the macro player for the next macro step. If the yield is
due to a request for input, the macro player supplies it by calling the
recorded edit functions. If the macro wants to allow the user to provide
input (using the "query" function, also known as an interactive macro), it
can itself yield (since it is also running in a coroutine) back to the
top-level edit loop (in actuality, the macro player calls a function which
calls co_suspend() to do this). The macro player is continued when the user
finishes supplying the input (hits the return key at a prompt). Several
interactive macros can be running at once with this scheme.
JOE 3.6 uses "semi-automatic" variables for strings. There is a global
"object-stack", accessed through these functions:
|void *obj_malloc(int size);||Allocate a block of memory.|
|void *obj_realloc(void *ptr, int size);||Change allocation size.|
|void obj_free(void *ptr);||Free a specified block and all newer blocks.|
|void obj_perm(void *ptr);||Mark a block as permanent, which in effect moves the block from the stack to the heap.|
|int obj_len(void *ptr);||Access hidden "length" part of block (string length).|
|int obj_size(void *ptr);||Access hidden "size" part of block (allocation size).|
These functions remember the order of allocation. When an allocated
block is freed, it and all newer blocks are freed as well. For example, in:
unsigned char *a = obj_malloc(10);
unsigned char *b = obj_malloc(10);
unsigned char *c = obj_malloc(10);
unsigned char *d = obj_malloc(10);
b, c, and d are all freed, but a remains allocated. The top-level edit
loop is structured to call obj_free after every editor command:
unsigned char *gc = obj_malloc(1);
execmd(...); /* Execute edit commands */
obj_free(gc); /* Collect garbage */
Thus, you rarely have to call obj_free- the only time you might want to do
it is if you are calling obj_malloc in a loop. This could cause a large
amount of memory to be allocated, so obj_free() should be called at the
end of each loop iteration.
obj_perm() marks an object as a heap object instead of a stack object.
When a heap object is freed, only itself is freed- the stack is not
modified. Also when the stack is freed, heap objects are not touched.
JOE uses dynamic strings built on top of the global object stack
functions. These strings automatically resize themselves to fit their
contents. They also know their own length, which allows you to have NULs in
the middle of a string. They also always have a terminating NUL to make
them compatible with C-strings. In fact they have (almost) the same type as
C-strings: unsigned char *. The length is hidden in the obj_malloc data
structure. These functions are provided:
|unsigned char *vsmk(int prealloc);||Allocate a zero length string, but with enough space to grow to the specified length.|
|int vslen(unsigned char *s);||Return string length or 0 if s is NULL.|
|unsigned char *vsensure(unsigned char *s, int len);||Make sure there is enough space for a string of the specified length.|
|unsigned char *vstrunc(unsigned char *s, int len);||Set length of string: the string is truncated or extended with spaces to the specified length.|
|unsigned char *vsncpy(unsigned char *s,int offset,unsigned char *block, int block_size);||Copy a memory block to a string. The string is extended with spaces if offset > string length.|
|unsigned char *vsdupz(unsigned char *s);||Duplicate a z-string to a variable string.|
|unsigned char *vsdup(unsigned char *s);||Duplicate a variable string.|
|unsigned char *vscpyz(unsigned char *s,unsigned char *z);||Copy a z-string to a variable string.|
|unsigned char *vscpy(unsigned char *s,unsigned char *block,int size);||Copy a memory block to a variable string.|
|unsigned char *vscatz(unsigned char *s,unsigned char *z);||Append a z-string to a variable string.|
|unsigned char *vscat(unsigned char *s,unsigned char *block,int size);||Append a memory block to a variable string.|
|unsigned char *vsadd(unsigned char *s,unsigned char c);||Append a single character to a string.|
|unsigned char *vsfmt(unsigned char *s,int offset,const char *format,...);||Sprintf() to a variable string.|
|unsigned char *vsgets(unsigned char **sp,FILE *f);||fgets() to a variable string (returns NULL on EOF, does not put '\n' in the string).|
Note that it is OK to pass NULL in place of 's' in any of the above
functions. In this case, a new string is allocated.
Most of these function return the address of the variable string, in case
it was modified. One exception is vsgets()- it takes an address of a
variable holding the string. It returns the string address, but it also
writes it to the variable directly.
Many of the above functions take a memory block, which is specified by an
address and a length. These macros can be used to help construct these
|#define sc(s) (s), (sizeof(s)-1)||For string constants.|
|#define sz(s) (s), zlen(s)||For C-strings (zlen is the same as strlen).|
|#define sv(s) (s), vslen(s)||For variable strings.|
So for example, vscatz is defined as:
unsigned char *vscatz(unsigned char *s, unsigned char *z);
return vsncpy(sv(s), sz(z));
Variable arrays of variable strings
JOE also semi-automatic variable arrays of variable strings. These
functions are provided:
|unsigned char *vamk(int prealloc);||Allocate a zero length array, but with enough space to grow to the specified length.|
|void varm(unsigned char **a);||Free an array.|
|int valen(unsigned char *a);||Return array length or 0 if a is NULL.|
|unsigned char *vaensure(unsigned char *s, int len);||Make sure there is enough space for an array of the specified length.|
|unsigned char *vatrunc(unsigned char *s, int len);||Set length of array: the array is truncated or extended with NULLs to the specified length.|
|unsigned char *vaadd(unsigned char **a,unsigned char *s);||Append a single string to an array.|
|unsigned char *vasort(unsigned char **a,int len);||Sort an array.|
|unsigned char *vawords(unsigned char **a,unsigned char *s,int len, unsigned char *sep, int seplen);||Convert a word list in s (words separated by characters given in sep/seplen) into an array of strings.|
|void vaperm(unsigned char **a);||Mark an array as preferment (heap instead of stack).|
When an array is marked as permanent, any strings in the array also
marked as permanent, and any string later added to the array are also marked
Look at the comments in b.h for more information.
B *bfind(unsigned char *name);
Load disk file into memory buffer 'B'.
bsave(P *p,unsigned char *name,int size);
Write size bytes from buffer beginning at p to disk file
brm(b); Free data structure
Once you have a B you can access the characters in it via P pointers (which
are like C++ STL iterators).
B *b = bfind("foo"); Load file into memory
P *p = pdup(b->bof); Get pointer to beginning of file (duplicate
b->bof which is a P).
prm(p); Free p when we're done with it.
int c=brch(p); Get character at p.
int c=pgetc(p); Get character at p and advance it.
int c=prgetc(p); Retract p, then return character at it.
- These return -1 (NO_MORE_DATA) if you try to read end of file or
before beginning of file.
- A pointer can point to any character of the file and right after the
end of the file.
- For UTF-8 files, character can be between 0 and 0x7FFFFFFF
Publicly readable members of P:
p->byte The byte offset into the buffer
p->line The line number
p->xcol If P is the cursor, this is the column number
where the cursor will be displayed on the screen
(which does not have to match the column of the
character at P).
pisbof(p); True if pointer is at beginning of buffer
piseof(p); True if pointer is at end of buffer
pisbol(p); True if pointer is at beginning of line
piseol(p); True if pointer is at end of line
pisbow(p); True if pointer is at beginning of a word
piseow(p); True if pointer is at end of a word
More information about character at p:
piscol(p); Get column number of character at p.
Some other ways of moving a P through a B:
pnextl(p); Go to beginning of next line
pprevl(p); Go to end of previous line
pfwrd(p,int n); Move forward n characters
pbkwd(p,int n); Move backward n characters
pset(p,q); Move p to same position as q.
pline(p,n); Goto to beginning of a specific line.
pgoto(p,n); Goto a specific byte position.
pfind(P,unsigned char *s,int len);
Fast Boyer-Moore search forward.
prfind(P,unsigned char *s,int len);
Fast Boyer-Moore search backward.
These are very fast- they look at low level
data structure and don't go through pgetc(). Boyer-Moore
allows you to skip over many characters without reading
them, so you can get something like O(n/len).
Local operations are fast: pgetc(), prgetc().
Copy is fast: pset().
pline() and pgoto() are slower, but look for the closest existing
P to start from.
The column number is stored in P, but it is only updated if
it is easy to do so. If it's hard (like you crossed a line
boundary backward) it's marked as invalid. piscol() then has
to recalculate it.
Modifying a buffer:
binsc(p,int c); Insert character at p.
bdel(P *from,P *to); Delete character between two Ps.
Note that when you insert or delete, all of the Ps after the insertion/
deletion point are adjusted so that they continue to point to the same
character before the insert or delete.
Insert and Delete create undo records.
Insert and Delete set dirty flags on lines which are currently being
displayed on the screen, so that when you return to the edit loop, these
lines automatically get redrawn.
An edit buffer is made up of a doubly-linked list of fixed sized (4 KB)
gap buffers. A gap buffer has two parts: a ~16 byte header, which is always
in memory, and the actual buffer, which can be paged out to a swap file (a
vfile- see vfile.h). A gap buffer consists of three regions: text before
the gap, the gap and text after the gap (which always goes all the way to
the end of buffer). (hole and ehole in the header indicate the gap
position). The size of the gap may be 0 (which is the case when a file is
first loaded). Gap buffers are fast for small inserts and deletes when the
cursor is at the gap (for delete you just adjust a pointer, for insert you
copy the data into gap). When you reposition the cursor, you have to move
the gap before any inserts or deletes occur. If you had only one window and
a single large gap buffer for the file, you could always keep the gap at the
cursor- the right-arrow key copies one character across the gap.
Of course for edits which span gap buffers or which are larger than a gap
buffer, you get a big mess of gap buffer splitting and merging plus
doubly-linked list splicing.
Still, this buffer method is quite fast: you never have to do large memory
moves since the gap buffers are limited in size. To help search for line
numbers, the number of newlines '\n's contained in each gap buffer is stored
in the header. Reads are fast as long as you have a P at the place you
want to read from, which is almost always the case.
It should be possible to quickly load files by mapping them directly into
memory (using mmap()) and treating each 4KB page as a gap buffer with 0 size
gap. When page is modified, use copy-on-write to move the page into the
swap file (change pointer in header). This is not done now. Instead the
file is copied when loaded.
There is a tiny object-oriented windowing system built into JOE. This is
the class hierarchy:
A optimizing terminal screen driver (very similar to 'curses').
has a pointer to a CAP, which has the terminal capabilities read
from termcap or terminfo.
writes output to screen with calls to the macro ttputc(). (tty.c is the
actual interface to the tty device).
cpos() - set cursor position
outatr() - draw a character at a screen position with attributes
eraeol() - erase from some position to the end of the line
Contains list of windows on the screen (W *topwin).
Points to window with focus (W *curwin).
Contains pointer to a 'SCRN', the tty driver for the particular terminal
A window on a screen.
Has position and size of window.
void *object- pointer to a structure which inherits window (W should
really be a base class for these objects- since C doesn't have this
concept, a pointer to the derived class is used instead- the derived
class has a pointer to the base class: it's called 'parent').
Currently this is one of:
BW * a text buffer window (screen update code is here.)
QW * query window (single character yes/no questions)
MENU * file selection menu
BW * is inherited by (in the same way that a BW inherits a W):
PW * a single line prompt window (file name prompts)
TW * a text buffer window (main editing window).
WATOM *watom- Gives type of this window. The WATOM structure has
pointers to virtual member functions.
KBD *kbd- The keyboard handler for this window. When window has
focus, keyboard events are sent to this object. When key sequences
are recognized, macros bound to them are invoked.
Some window are operators on others. For example ^K E, load a file into a
window prompt operates on a text window. If you hit tab, a file selection
menu which operates on the prompt window appears below this. When a window
is the target of operator windows is killed, the operators are killed also.
Currently all windows are currently the width of the screen (should be fixed
in the future). The windows are in a big circular list (think of a big loop
of paper). The screen is small window onto this list. So unlike emacs, you
can have windows which exist, but which are not on the screen.
^K N and ^K P move the cursor to next or previous window. If the next
window is off the screen it is moved onto the screen, along with any
operator windows are target it.
- add something here.
- add something here.
There are two parts to the syntax highlighter: the parser (in syntax.c)
and the line attribute database (in lattr.c). The parser is simple enough:
it is a finite state machine interpreter. You supply a line of text and the
starting state, and it returns the starting state of the next line and an
array containing the color to be used for each character. Each state has a
color. The line is scanned left to right, character by character: the state
and the character index a table, which gives an action, which includes the
The action specifies the next state, and also indicates whether some
previous N characters (including the current) should be "recolored" with the
color of the next state, or whether the color of the current state should be
used for the character. The action also indicates whether to move on to the
next character, or to stay put and re-evaluate the current character with
the new state. This leads to simpler hand-coding of the state machine, but
can create an infinite loop. The action also indicates if the character
should be added to a string buffer. The string buffer can be used (when
specified by an action) as a key for a hash table look-up (typically filled
with key-words). If there is a match, the hash-table entry supplies the
action instead of the state table.
This parser is fast and simple and powerful enough to lexically analyze
more than 40 languages. However a few enhancements were made to improve
both the class of languages which can be highlighted and to improve the ease
of writing the state machine. To support "here" documents in perl and sh,
as well as to simplify matching delimiters (like open with close
parenthesis), a string (from the string buffer) can be stored along with the
state. This can be matched against the string buffer. So first the
terminating word from the here document is stored in the string, and then
the state machine looks forward until it finds this string to determine the
end of the here document.
The state is only a single integer, not an entire stack. The parser can
therefore not handle recursive grammars. Unfortunately, some languages have
recursive elements in their lexical structure. One way to deal with this is
to support only a very limited recursion depth by tediously repeating a set
of states in the hand written state machine N times, where N is your maximum
recursion depth. To ease writing this type of thing, and also to allow
state machine to be reused in cases where one language is embedded within
another (PHP in HTML, for example), a macro system was added to the state
machine loader. Recursion is allowed in the macro calls, but is limited to
a depth of 5.
Line attribute cache
The second part of the syntax highlighter is the line attribute cache.
In the original implementation, the state of the first line of each window
is maintained and the parser is invoked during the screen update algorithm
to highlight all of the lines on the screen. This works well when the
window remains in place or when it scrolls forward through the file.
However, if the window scrolls backward you may have to reparse the entire
file or from the nearest previous window if there happens to be one. This
can be slow for large files, and certainly wastes a lot of CPU time when
scrolling backward through a file.
One attempt to solve this was to instead of starting at the beginning of
the file, to instead assume that the line 200 lines (a configurable option)
back was at the initial state, parse forward from there and hope for the
best. Sometimes the state machine would find itself back in the proper
state, but not always- to the point where most syntaxes had specified to
just always reparse from the beginning of the file.
Finally, the line attribute cache was added. It stores the parser state
at the beginning of each line, from the beginning of the file to last line
actually displayed on the screen. This solves the scrolling problem, but
introduces a new one: how to keep the line attribute cache up to date. This
turns out to not be so difficult. First of all, the states are stored in a
gap-buffer, which makes it easy to perform insertions and deletions.
Whenever lines are inserted or deleted in the file, the line attribute cache
is expanded or contracted to keep in sync. Now, when any change is made to
the file, we reparse from just before the start of the change until the
state of the beginning of a line again matches the corresponding entry in
the cache. If a match is found, the rest of the cache is assumed to be
correct. Some changes will require a large amount of reparsing, but many
will not. In any case, the the highlighting on the screen is always
|b.c||Text buffer management|
|blocks.c||Library: fast memcpy() functions (necessary on really old versions of UNIX).|
|bw.c||A class: text buffer window (screen update code is here)|
|charmap.c||UNICODE to 8-bit conversion functions|
|cmd.c||Table of user edit functions|
|dir.c||Directory reading functions (for old UNIXs).|
|hash.c||Library: automatically expanding hash table functions.|
|help.c||Implement the on-line help window|
|i18n.c||Unicode character type information database|
|kbd.c||Keymap data structure (keysequence to macro bindings).|
|lattr.c||Line attribute cache|
|macro.c||Keyboard and joerc file macros|
|main.c||Has main() and top level edit loop|
|menu.c||A class: menu windows|
|obj.c||Semi-automatic objects, plus dynamic string and array library|
|path.c||Library: file name and path manipulation functions|
|poshist.c||Cursor position history|
|pw.c||A class: prompt windows|
|queue.c||Library: doubly linked lists|
|qw.c||A class: single-key query windows|
|rc.c||Joerc file loader|
|scrn.c||Terminal update functions (like curses)|
|selinux.c||Secure linux (for RedHat) functions|
|tab.c||Tab completion for file selection prompt|
|termcap.c||Load terminal capabilities from /etc/termcap file or terminfo database|
|tw.c||A class: main text editing window|
|ublock.c||User edit functions: block moves|
|uedit.c||User edit functions: basic edit functions|
|uerror.c||User edit functions: parse compiler error messages and goto next error, previous error|
|ufile.c||User edit functions: load and save file|
|uformat.c||User edit functions: paragraph formatting, centering|
|uisrch.c||User edit functions: incremental search|
|umath.c||User edit functions: calculator|
|usearch.c||User edit functions: search & replace|
|ushell.c||User edit functions: subshell|
|utag.c||User edit functions: tags file search|
|utf8.c||UTF-8 to unicode coding functions|
|vfile.c||Library: virtual memory functions (allows you to edit files larger than memory)|
|w.c||A class: base class for all windows|