Through the Looking Glass

Write You a Forth, Part 0x05

Posted on February 27, 2018

NB: Today’s update was pretty large, so I don’t show all of the code; this is what git is for.

A bit of break; I took a motorcycle course this weekend and was sick, so I didn’t get much done until last night.

Today I need to start actually doing things with tokens. This requires two things:

  1. Some idea of what a word is, and
  2. A dictionary of words

I started taking some notes on this previously, and I think there are a few kinds of words that are possible:

  1. Numbers (e.g. defining a variable)
  2. Built-in functions
  3. Lambda functions (that is, user-defined functions).

Stage 1 really only needs to incorporate #2, so that’s what I’ll focus on for now. However, to prepare for the future, I’m going to define a Word base class and inherit from there. This interface is going to need to be stack-aware, so what I’ve done is define a System struct in system.h:

#ifndef __KF_CORE_H__
#define __KF_CORE_H__

#include "defs.h"
#include "stack.h"

typedef struct _System {
        Stack<KF_INT>        dstack;
        IO                  *interface;
} System;


#endif // __KF_CORE_H__

This will let me later add in support for the return stack and other things that might be useful. Other ideas: adding something like an errno equivalent to indicate the last error, and storing a dictionary of words. This will need some restructuring, though. I’ve already moved the I/O into the system as well. This took some finangling in kforth.cc; I’m eliding the diff here because it’s so long, but it’s basically a sed -i -e 's/interface./sys->interface..

The Word interface

Now I can start defining a Word.h. Maybe this is a case of when you have an object-oriented language, everything looks like a class, but I decided to design an abstract class for a Word and implement the first concrete class, Builtin. What I came up with was:

class Word {
public:
        virtual ~Word() {};

The eval method takes a System structure and executes some function.

virtual bool  eval(System *) = 0;

The dictionary is a linked list, so next is used to traverse the list.

virtual Word *next(void) = 0;

The match method is used to determine whether this is the word being referred to.

virtual bool  match(struct Token *) = 0;

Finally, getname will fill in a char[MAX_TOKEN_SIZE] buffer with the word’s name.

  virtual void  getname(char *, size_t *) = 0;
};

With the interface defined, I can implement Builtins (I’ve elided the common interface from the listing below to focus on the implementation):

class Builtin : public Word {
public:
        ~Builtin() {};
        Builtin(const char *name, size_t namelen, Word *head, bool (*fun)(System *));

private:
        char         name[MAX_TOKEN_LENGTH];
        size_t       namelen;
        Word        *prev;
        bool        (*fun)(System *);
};

The bool works as a first pass, but I think I’m going to have add some notion of system conditions later on to denote why execution failed. One thing that both pforth and gforth do that I don’t yet do is to clear the stack when there’s an execution failure. At least, they clear the stack with an unrecognised word. The implementation is pretty trivial:

#include "defs.h"
#include "parser.h"
#include "system.h"
#include "word.h"

#include <string.h>


Builtin::Builtin(const char *name, size_t namelen, Word *head, bool (*target)(System *))
        : prev(head), fun(target)
{
        memcpy(this->name, name, namelen);
        this->namelen = namelen;
}

bool
Builtin::eval(System *sys)
{
        return this->fun(sys);
}

Word *
Builtin::next()
{
        return this->prev;
}

bool
Builtin::match(struct Token *token)
{
        return match_token(this->name, this->namelen, token->token, token->length);
}

void
Builtin::getname(char *buf, size_t *buflen)
{
        memcpy(buf, this->name, this->namelen);
        *buflen = namelen;
}

Right. Now to do something with this.

The system dictionary

The dictionary’s interface is minimal:

// dict.h
#ifndef __KF_DICT_H__
#define __KF_DICT_H__

#include "defs.h"
#include "parser.h"
#include "system.h"
#include "word.h"

typedef enum _LOOKUP_ : uint8_t {
        LOOKUP_OK = 0,       // Lookup executed properly.
        LOOKUP_NOTFOUND = 1, // The token isn't in the dictionary.
        LOOKUP_FAILED = 2    // The word failed to execute.
} LOOKUP;

void    init_dict(System *);
LOOKUP  lookup(struct Token *, System *);

#endif // __KF_DICT_H__

There’s a modicum of differentiation between “everything worked” and “no it didn’t,” and by that I mean the lookup can tell you if the word wasn’t found or if there was a problem executing it.

I added a struct Word *dict field to the System struct, since we’re operating on these anyways. I guess it’s best to start with the lookup function first so that when I started adding builtins later it’ll be easy to just recompile and use them. :

LOOKUP
lookup(struct Token *token, System *sys)
{
        Word    *cursor = sys->dict;
        KF_INT   n;

I seem to recall from Programming a Problem-Oriented Language that Chuck Moore advocated checking whether a token was a number before looking it up in the dictionary, so that’s the approach I’ll take:

if (parse_num(token, &n)) {
        if (sys->dstack.push(n)) {
                return LOOKUP_OK;
        }
        return LOOKUP_FAILED;
}

The remainder is pretty much bog-standard linked list traversal:

  while (cursor != nullptr) {
          if (cursor->match(token)) {
                  if (cursor->eval(sys)) {
                          return LOOKUP_OK;
                  }
                  return LOOKUP_FAILED;
          }
          cursor = cursor->next();
  }

  return LOOKUP_NOTFOUND;
}

This needs to get hooked up into the interpreter now; this is going to require some reworking of the parser function:

static bool
parser(const char *buf, const size_t buflen)
{
        static size_t       offset = 0;
        static struct Token token;
        static PARSE_RESULT result = PARSE_FAIL;
        static LOOKUP       lresult = LOOKUP_FAILED;
        static bool     stop = false;

        offset = 0;

        // reset token
        token.token = nullptr;
        token.length = 0;

        while ((result = parse_next(buf, buflen, &offset, &token)) == PARSE_OK) {
                lresult = lookup(&token, &sys);
                switch (lresult) {
                case LOOKUP_OK:
                        continue;
                case LOOKUP_NOTFOUND:
                        sys.interface->wrln((char *)"word not found", 15);
                        stop = true;
                        break;
                case LOOKUP_FAILED:
                        sys.interface->wrln((char *)"execution failed", 17);
                        stop = true;
                        break;
                default:
                        sys.interface->wrln((char *)"*** the world is broken ***", 27);
                        exit(1);
                }

                if (stop) {
                        stop = false;
                        break;
                }
        }

        switch (result) {
        case PARSE_OK:
                return false;
        case PARSE_EOB:
                sys.interface->wrbuf(ok, 4);
                return true;
        case PARSE_LEN:
                sys.interface->wrln((char *)"parse error: token too long", 27);
                return false;
        case PARSE_FAIL:
                sys.interface->wrln((char *)"parser failure", 14);
                return false;
        default:
                sys.interface->wrln((char *)"*** the world is broken ***", 27);
                exit(1);
        }
}

Now I feel like I’m at the part where I can start adding in functionality. The easiest first builtin: addition. Almost impossible to screw this up, right? :

static bool
add(System *sys)
{
        KF_INT  a = 0;
        KF_INT  b = 0;
        if (!sys->dstack.pop(&a)) {
                return false;
        }

        if (!sys->dstack.pop(&b)) {
                return false;
        }

        b += a;
        return sys->dstack.push(b);
}

Now this needs to go into the init_dict function:

void
init_dict(System *sys)
{
        sys->dict = nullptr;
        sys->dict = new Builtin((const char *)"+", 1, sys->dict, add);
}

And this needs to get added into the main function:

int
main(void)
{
        init_dict(&sys);
#ifdef __linux__
        Console interface;
        sys.interface = &interface;
#endif
        sys.interface->wrbuf(banner, bannerlen);
        interpreter();
        return 0;
}

The moment of truth

Hold on to your pants, let’s see what breaks:

$ ./kforth
kforth interpreter
<>
? 2 3 +
ok.
<5>

Oh hey, look, it actually works. Time to add a few more definitions for good measure:

These are all pretty simple, fortunately. A few things that tripped me up, though:

The XOR by 0x20 is just a neat trick for inverting the case of a letter.

                if ((a[i] ^ 0x20) == b[i]) {
                        continue;
                }

                if (a[i] == (b[i] ^ 0x20)) {
                        continue;
                }

                return false;
        }
        return true;
}
$ ./kforth
kforth interpreter
<>
? 2 5040 /
ok.
<��>

It turns out that in write_num, the case where n = 0 results in nothing happening, and therefore the buffer just being written. This is a dirty thing, but I edge cased this:

$ git diff io.cc
diff --git a/io.cc b/io.cc
index 77e0e2a..a86156b 100644
--- a/io.cc
+++ b/io.cc
@@ -24,6 +24,10 @@ write_num(IO *interface, KF_INT n)
                        n++;
                }
        }
+       else if (n == 0) {
+               interface->wrch('0');
+               return;
+       }

        while (n != 0) {
                char ch = (n % 10) + '0';

May the compiler have mercy on my soul and whatnot.

For you sports fans keeping track at home, here’s the classic bugs I’ve introduced so far:

  1. bounds overrun
  2. missing case in a switch statement

But now here I am with the interpreter in good shape. Now I can start implementing the builtins in earnest!

As before, get the snapshot here.