Through the Looking Glass

Write You a Forth, Part 0x04

Posted on February 23, 2018

So, I lied about words being next. When I thought about it some more, what I really need to do is start adding the stack in and adding support for parsing numerics. I’ll start with the stack, because it’s pretty straightforward.

I’ve added a new definition: constexpr uint8_t STACK_SIZE = 128. This goes in the linux/defs.h, and the #else in the top defs.h will set a smaller stack size for other targets. I’ve also defined a type called KF_INT that, on Linux, is a uint32_t:

index 4dcc540..e070d27 100644
--- a/defs.h
+++ b/defs.h
@@ -3,6 +3,9 @@

#ifdef __linux__
#include "linux/defs.h"
+#else
+typedef int KF_INT;
+constexpr uint8_t STACK_SIZE = 16;
#endif

constexpr size_t       MAX_TOKEN_LENGTH = 16;
diff --git a/linux/defs.h b/linux/defs.h
index 57cdaeb..3740f5a 100644
--- a/linux/defs.h
+++ b/linux/defs.h
@@ -4,4 +4,7 @@
#include <stddef.h>
#include <stdint.h>

+typedef int32_t KF_INT;
+constexpr uint8_t      STACK_SIZE = 128;
+
#endif
\ No newline at end of file

It seems useful to be able to adapt the kind of numbers supported; an AVR might do better with 16-bit integers, for example.

stack.h

The stack is going to be templated, because we’ll need a double stack later for floating point and a return address stack later. This means everything will go under stack.h. This is a pretty simple implementation that’s CS 101 material; I’ve opted to have the interface return =bool=s for everything to indicate stack overflow and underflow and out of bounds:

#ifndef __KF_STACK_H__
#define __KF_STACK_H__

#include "defs.h"

template <typename T>
class Stack {
public:
        bool   push(T val);
        bool   pop(T &val);
        bool   get(size_t, T &);
        size_t size(void) { return this->arrlen; };
private:
        T arr[STACK_SIZE];
        size_t arrlen;
};

// push returns false if there was a stack overflow.
template <typename T>
bool
Stack<T>::push(T val)
{
        if ((this->arrlen + 1) > STACK_SIZE) {
                return false;
        }

        this->arr[this->arrlen++] = val;
        return true;
}

// pop returns false if there was a stack underflow.
template <typename T>
bool
Stack<T>::pop(T &val)
{
        if (this->arrlen == 0) {
                return false;
        }

        val = this->arr[this->arrlen - 1];
        this->arrlen--;
}

// get returns false on invalid bounds.
template <typename T>
bool
Stack<T>::get(size_t i, T &val)
{
        if (i > this->arrlen) {
                return false;
        }

        val = this->arr[i];
        return true;
}

#endif // __KF_STACK_H__

I’ll put a Stack<KF_INT> in kforth.cc later on. For now, this gives me an interface for the numeric parser to push a number onto the stack.

parse_num

It seems like the best place for this is in parser.cc — though I might move into a token processor later. The definition for this goes in parser.h, and the body is in parser.cc:

// parse_num tries to parse the token as a signed base 10 number,
// pushing it onto the stack if needed.
bool
parse_num(struct Token *token, Stack<KF_INT> &s)
{
        KF_INT  n = 0;
        uint8_t i = 0;
        bool    sign = false;

It turns out you can’t parse a zero-length token as a number… :

if (token->length == 0) {
        return false;
}

I’ll need to invert the number later if it’s negative, but it’s worth checking the first character to see if it’s negative. :

if (token->token[i] == '-') {
        i++;
        sign = true;
}

Parsing is done by checking whether each character is within the range of the ASCII numeral values. Later on, I might add in separate functions for processing base 10 and base 16 numbers, and decide which to use based on a prefix (like 0x). If the character is between those values, then the working number is multiplied by 10 and the digit added. :

while (i < token->length) {
        if (token->token[i] < '0') {
                return false;
        }

        if (token->token[i] > '9') {
                return false;
        }

        n *= 10;
        n += (uint8_t)(token->token[i] - '0');
        i++;
}

If it was a negative number, then the working number has to be inverted:

if (sign) {
        n *= -1;
}

Finally, return the result of pushing the number on the stack. One thing that might come back to get me later is that this makes it impossible to tell if a failure to parse the number is due to an invalid number or due to a stack overflow. This will be a good candidate for revisiting later. :

  return s.push(n);
}

io.cc

Conversely, it’ll be useful to write a number to an IO interface. It seems more useful right now to just provide a number → I/O function, but that’ll be easily adapted to a number → buffer function later. This will add a real function to io.h, which will require a corresponding io.cc (which also needs to be added to the Makefile):

#include "defs.h"
#include "io.h"

#include <string.h>

void
write_num(IO &interface, KF_INT n)
{

Through careful scientific study, I have determined that most number of digits that a 32-bit integer needs is 10 bytes (sans the sign!). This will absolutely need to be changed if KF_INT is ever moved to 64-bit (or larger!) numbers. There’s a TODO in the actual source code that notes this. :

char buf[10];
uint8_t i = 10;
memset(buf, 0, 10);

Because this is going out to an I/O interface, I don’t need to store the sign in the buffer itself and can just print it and invert the number. Inverting is important; I ran into a bug earlier where I didn’t invert it and my subtractions below were correspondingly off. :

if (n < 0) {
        interface.wrch('-');
        n *= -1;
}

The buffer has to be filled from the end to the beginning to do the inverse of the parsing method:

while (n != 0) {
        char ch = (n % 10) + '0';
        buf[i--] = ch;
        n /= 10;
}

But then it can be just dumped to the interface:

  interface.wrbuf(buf+i, 11-i);
}

kforth.cc

And now I come to the fun part: adding the stack in. After including stack.h, I’ve added a stack implementation to the top of the file:

// dstack is the data stack.
static Stack<KF_INT>    dstack;

It’s kind of useful to be able to print the stack:

static void
write_dstack(IO &interface)
{
        KF_INT  tmp;
        interface.wrch('<');
        for (size_t i = 0; i < dstack.size(); i++) {
                if (i > 0) {
                        interface.wrch(' ');
                }

                dstack.get(i, tmp);
                write_num(interface, tmp);
        }
        interface.wrch('>');
}

Surrounding the stack in angle brackets is a cool stylish sort of thing, I guess. All this is no good if the interpreter isn’t actually hooked up to the number parser:

// The new while loop in the parser function in kforth.cc:
while ((result = parse_next(buf, buflen, &offset, &token)) == PARSE_OK) {
        interface.wrbuf((char *)"token: ", 7);
        interface.wrbuf(token.token, token.length);
        interface.wrln((char *)".", 1);

        if (!parse_num(&token, dstack)) {
                interface.wrln((char *)"failed to parse numeric", 23);
        }

        // Temporary hack until the interpreter is working further.
        if (match_token(token.token, token.length, bye, 3)) {
                interface.wrln((char *)"Goodbye!", 8);
                exit(0);
        }
}

But does it blend?

Hopefully this works:

~/code/kforth (0) $ make
g++ -std=c++14 -Wall -Werror -g -O0   -c -o linux/io.o linux/io.cc
g++ -std=c++14 -Wall -Werror -g -O0   -c -o io.o io.cc
g++ -std=c++14 -Wall -Werror -g -O0   -c -o parser.o parser.cc
g++ -std=c++14 -Wall -Werror -g -O0   -c -o kforth.o kforth.cc
g++  -o kforth linux/io.o io.o parser.o kforth.o
~/code/kforth (0) $ ./kforth
kforth interpreter
<>
? 2 -2 30 1000 -1010
token: 2.
token: -2.
token: 30.
token: 1000.
token: -1010.
ok.
<2 -2 30 1000 -1010>
? bye
token: bye.
failed to parse numeric
Goodbye!
~/code/kforth (0) $

So there’s that. Okay, next time for real I’ll do a vocabulary thing.

As before, get the snapshot here.

Part B

So I was feeling good about the work above until I tried to run this on my Pixelbook:

$ ./kforth
kforth interpreter
<>
? 2
token: 2.
ok.
<5>

WTF‽ I spent an hour debugging this to realise it was a bounds overflow in write_num. This led me to checking the behaviour of the maximum and minimum values of KF_INT which led to me revising io.cc:

#include "defs.h"
#include "io.h"

#include <string.h>

static constexpr size_t nbuflen = 11;

void
write_num(IO &interface, KF_INT n)
{

    // TODO(kyle): make the size of the buffer depend on the size of
    // KF_INT.
    char buf[nbuflen];
    uint8_t i = nbuflen;
    memset(buf, 0, i);
    bool neg == n < 0;

    if (neg) {
        interface.wrch('-');
        n = ~n;
    }

    while (n != 0) {
        char ch = (n % 10) + '0';
        if (neg && (i == nbuflen)) ch++;

This was the source of the actual bug: buf[i] where i == nbuflen was stomping over the value of n, which is stored on the stack, too. :

    buf[i-1] = ch;
    i--;
    n /= 10;
  }

  uint8\_t buflen = nbuflen - i % nbuflen; interface.wrbuf(buf+i,
  buflen);
}

A couple of things here: first, the magic numbers were driving me crazy. It didn’t fix the problem, but I changed all but one of the uses of them at one point and forgot one. So, now I’m doing the right thing (or the more right thing) and using a constexpr. Another thing is changing from n * -1= to n = ~n. This requires the check for neg && (i = nbuflen)= to add one to get it right, but it handles the case where n = -2147483648:

(gdb) p -2147483648 * -1
$1 = 2147483648
(gdb) p ~(-2147483648)
$2 = 2147483647

Notice that $1 will overflow a uint32_t, which means it will wrap back around to -2147483648, which means negating it this way has no effect. ~n + 1 is a two’s complement.

Finally, I made sure to wrap the buffer length so that we never try to write a longer buffer than the one we have.

I feel dumb for making such a rookie mistake, but I suppose that’s what happens when you stop programming for a living. The updated code is here.