Low-Level JKForth Machinery

Low-Level JKForth Machinery

The Address Interpreter

When the bootloader finishes loading the JKForth kernel, it jumps to the ABORT word, which initializes the data and return stack pointers, and then begins interpreting Forth code using the "address interpreter". I will now explain what that means :)

Forth "words" (subroutines) have the following structure:

  • Word name
  • Link field
  • x86 code header
  • List of words comprising the definition

The Forth "dictionary" (the repository of all word definitions) consists very simply of the concatenation of all the word definitions (the words are organized as a singly-linked list on the link field - each word's link field points to the previous word's name field). The "list of words" comprising the definition is just an array containing the code-header addresses of the words in the definition, in order. The tfdict.s source file directly reflects this dictionary structure.

The source code of a Forth word consists of a definition that might look like this:

: SAY_HELLO   GET_NAME WRITE_HELLO WRITE_NAME ;

This code says: "to execute SAY_HELLO, first execute GET_NAME, then execute WRITE_HELLO, then execute WRITE_NAME". When the compiler sees this definition, it simply creates a definition containing the address of the words GET_NAME, WRITE_HELLO, and WRITE_NAME, in that order. But how does the address interpreter execute this thing?

The key lies in the code header that appears at the beginning of each word. In JKForth, that header consists of a single instruction: an indirect CALL (not JMP) to the JKF_ENTER routine. Of course, the CALL instruction pushes the address following onto the x86 stack. In this case, the address placed on the stack is the address of the first word in the definition being executed. So the JKF_ENTER routine simply pops that address off the stack and jumps (JMP) there!

Naturally, there must be an end to this madness: somewhere we must execute some actual machine code! That happens when the call chain (or "thread") reaches a "primitive" word. Primitive words live in the dictionary just like compiled Forth words, but instead of a code header and word address list, their executable portion consists simply of a sequence of x86 machine instructions. Thus, the most essential pieces of the JKForth kernel (arithmetic functions, stack manipulation) are implemented as primitive words.

Now you may be asking yourself, how does a called word know how to get back to the caller when it's finished executing all the words in its definition? The answer is that all compiled words end with the address of the word JKF_EXIT. JKF_EXIT has only one duty: it figures out where the current word was called from (that is, the address containing the address of the word currently executing), and calls the word indicated by the address following that one - that is, the next address in the caller's definition. [Note: a diagram would be good here.]

How does JKF_EXIT know where the current word was called from? Easy: it uses the saved value of the virtual instruction pointer. The virtual IP always points to the current position in the word list of the word currently being executed. JKF_ENTER pushes that address onto the x86 stack before jumping to the beginning of each new word. Thus, JKF_EXIT merely pops the x86 stack, increments the value obtained, and jumps to the given address. Simple, no? However, primitive words can't use JKF_EXIT directly, because they did not begin with a JKF_ENTER call and thus they have no saved virtual IP. That's OK though - the virtual IP is still valid, and it points just before the next word to be executed in the caller. Thus, primitive words end with a jump to the label __P_JKF_EXIT, which is a location inside the JKF_EXIT code that performs the increment-and-jump on the virtual IP, but skips the stack pop.

Stack Manipulation Primitives

The x86 stack is used to keep track of the calling sequence of Forth words. In Forth parlance, it is the "return stack". There is also a "data stack" which is used for manipulating data. (Actually the programmer has some direct control over the return stack as well, but it is restricted by the fact that the return stack must be left in a consistent state when a word completes. Still, it is perfectly feasible - though not recommended - to do tricks like forcing one's caller to abort by popping its return address.)

In JKForth, the return stack is implemented by the x86 stack pointer, and grows downward from address 2000:FFFE. The data stack, on the other hand, grows upward from address 2000:0000, and is implemented by one of the x86 general-purpose registers. The primitive Forth operations PUSH (add an item to the stack), POP (remove an item from the stack), DUP (push a copy of the top item on the stack), ROT (rotate the top three stack items), and so forth are implemented as primitive words. So are the R> and R< words, which move an item from the top of the data stack to the top of the return stack and vice-versa.

Both stacks hold only 16-bit values, so there is never any question about what size object to push or pop.

Looping primitives

Only one looping primitive is absolutely necessary: the GOTO. JKForth has a GOTO primitive called ABRANCH, which can be used to directly change the virtual instruction pointer to any other address. In source code it would be used as follows:

(* Quack continuously *)
: SILLY_WORD    QUACK ABRANCH [ HERE 4 - ] , ;
(* Note: the phrase "[ HERE 4 - ] ," compiles the address of the QUACK
   call within SILLY_WORD into the definition of SILLY_WORD.
   Don't worry about it. *)

It compiles into the sequence

[address of ABRANCH code]
[target address]
When the ABRANCH code is entered, it increments the virtual IP to point at the target address, then loads the data at that address and moves it to the virtual IP register, thus executing an unconditional absolute branch.

The Forth LOOP construct is also implemented as a primitive.

Conditionals

Forth's primary conditional construct is the "[flag] IF [true-code] THEN" phrase. The word IF compiles a conditional jump based on the flag value on the top of the stack; the jump is to address 0. The word THEN searches the definition for the most recent IF with a 0 address and patches the target address there.

The astute reader will notice that there must be at least two different kinds of words: those that get compiled when they appear in a definition (eg ABRANCH), and those that get executed when they appear in a definition (eg THEN). That is correct. Words that are executed by the compiler, rather than simply being compiled into the current definition, are known as "immediate" words. Programmers can add new immediate words to the dictionary, and can thus directly add functionality to the compiler; this is part of what gives Forth its power. In JKForth, the more usual loop constructs (WHILE, UNTIL) are implemented in this manner, as regular Forth source code.

Console I/O

The EMIT word interprets the top stack item as an ASCII character code (throwing away the high-order byte) and prints the character on the screen using PC BIOS calls.

The GETC word reads a key from the keyboard and pushes the corresponding scan code (high-order byte) and ASCII code (low-order byte) onto the stack.

Disk Access

At the very front of the dictionary there appear a few assembly routines that are used to load and save disk blocks. These routines use the PC BIOS to do their job. I will not say much about them here; you will be able to figure them out if you have any decent BIOS reference.


Last changed:
03-02-06 22:56:46


Questions and comments to Joseph Knapka.