The JKForth Hand-Compiled Kernel

The JKForth Hand-Compiled Kernel

Most of the core JKForth code consists of hand-compiled Forth words. That may sound complicated, but given the structure of forth words, it's really not.

Recall that a Forth word consist of a name field, a link field, a code header, and a list of the word addresses comprising the definition. In the main JKForth source file tfdict.s (Trivial Forth Dictionary :), the macro WORD_DEF is used to set up the name and link fields for a hand compiled word. The DO_JKF_ENTER macro adds the code header, and the EXE macro precedes each word in the word-list. For example, the word "1+", which increments the value on top of the stack, is defined thus:

	WORD_DEF PLUS1,"1+ ",2,JKF_NOP
	DO_JKF_ENTER
	EXE NUMBER ; Push 1 onto the stack.
	DW 1
	EXE PLUS   ; Leave the sum of the two top items on the stack.
	EXE JKF_EXIT

Expressed in Forth source code, this would be:

: 1+   1 + ;
Given that definition, "2+" could be defined to call "1+" twice:
	WORD_DEF PLUS2,"2+ ",2,JKF_NOP
	DO_JKF_ENTER
	EXE PLUS1
	EXE PLUS1
	EXE JKF_EXIT
This is not too different from writing actual Forth code. In fact, with a little more pre-processing it could be pretty much exactly like writing Forth code, at which point we'd have a Forth compiler...

The WORD_DEF macro works as follows:

  • PLUS1 becomes a label attached to the beginning of the word's executable code.
  • "1+ " is an ASCII string containing the first three characters of the word name, padded with spaces if necessary. The third argument, 2, is the length of the word name. The bytes '1', '+', ' ', and 0x02 are concatenated into a four-byte field to form the word name. From Forth's point of view, a word is identified by its first three characters and its length; thus the words ROCKY and ROCKS are indistinguishable. In practice, this rarely causes problems. The first-three-letters-and-length construct is called the "dictionary hash" of a word.
  • JKF_NOP is the label of the code header field of the previous word in the dictionary; it is used to populate the link field of the PLUS1 word.

The DO_JKF_ENTER macro expands into a snippet of assembly code that calls the JKF_ENTER routine to start the word's execution.

The EXE macro simply expands into the NASM "DW" (Define Word) directive: it reserves a 16-bit memory location initialized to the given value. In this case, the values are just pointers to the code-address labels of other words in the dictionary.

The Dictionary Searcher

The word FIND expects to find the address of a string on the top of the stack. It computes the dictionary hash of the string, and searches the dictionary for a matching definition. It leaves the address of the matching word on the stack, or 0 if no match was found. This is the fundamental operation needed to implement the Forth text interpreter.

The Text Interpreter

The text interpreter is what you interact with at the JKForth command prompt. It is implemented by the hand-compiled word INTERPRET. All it does is read the next whitespace-delimited word from the input stream (using the GETNEXTWORD word), FINDs the word in the dictionary, and EXECutes it. (The EXEC word jumps to the address on the top of the stack.)

Actually it is a little more complicated than that. Numeric values are permitted in the input stream, and when encountered they are pushed onto the stack. Thus, if INTERPRET does not find a word in the dictionary, it must then attempt to convert it to a number using the word NUMBERCONVERT (which, incidentally, can work in any base from 2 through 36). Only if number conversion fails does it give up and issue an error message. Note that this means that (a) number conversion is certain to be slower than dictionary lookup, and (b) dictionary entries that happen to be numeric names will be found and executed before number conversion is done. Thus, some Forth systems define commonly-used numeric values (1, 0, -1) as dictionary entries (JKForth does not do this).

The Compiler

The compiler's job is almost as simple as the interpreter's, except that instead of executing the input words, it must compile them into the current definition. What does that mean exactly?

The "current definition" is just the word at the end of the dictionary (highest in RAM). The Forth word HERE places the address of the current end of the dictionary on the stack. "Compiling" a word address into the current definition simply means appending the address to the end of the dictionary and incrementing the HERE value! For instance, confronted with the code:

: DUCK   BEND_LEGS LOWER_HEAD COVER_HEAD_WITH_HANDS ;
the compiler (implemented, believe it or not, by the word COMPILER) would take the following actions:
  1. Create a new word entry at the end of the dictionary called "DUC/4", populate its link field with the previous word, and increment HERE appropriately. The word CREATE performs this task.
  2. Add the code header to the end of the dictionary. Increment HERE appropriately. The word COMPILE_ENTRY_CODE performs this task.
  3. Find BEND_LEGS (using the FIND word). Append its code address to the dictionary, and increment HERE. The COMPILE1 word performs this task.
  4. FIND LOWER_HEAD, COMPILE1.
  5. FIND COVER_HEAD_WITH_HANDS, COMPILE1.
  6. FIND JKF_EXIT (indicated by the ";"), COMPILE1.

And that's all there is to it! Now you are a Forth expert, and you can die happy :-)

OK, it is slightly more complicated. For one thing, just as in the INTERPRET word, there is some special code to handle numbers. A numeric string appearing in the definition gets compiled into the sequence NUMBER, [value]. The NUMBER word increments the virtual instruction pointer and pushes [value] onto the stack, which means that at runtime [value] will appear on the stack when the word containing the NUMBER call is executed.

Second, there is special handling for "immediate" words: they are executed rather than being compiled into the dictionary. A word is marked as immediate by having the high-order bit in the first byte of its dictionary hash set to 1. COMPILE checks each word it finds for the immediate bit, and executes the word if necessary. Among other things, looping constructs are typically implemented using immediate words. Also, it is possible to build data objects with runtime behavior by using the word DOES>, which is a bit more detail than I want to get into here (especially since JKForth's implementation of DOES> is rather hairy). Use the Source.


Last changed:
06-28-06 15:46:00


Questions and comments to Joseph Knapka.