Jul 8, 2020

11 min read

Introduction

I was always intrigued by the Esoteric programming languages, programming languages designed to test the boundaries of computer programming language design, to serve as a proof of concept or just as a software art! One of the more popular Esoteric programming languages is the Branfuck (BF in the rest of the article), a language which uses only 8 characters. Developers challenge themselves to create a BF interpreter. That’s exactly what I’ve been wanting to make for a long time. I’ve set some rules for myself for this codding challenge: I can Google only Golang specific questions (as my Go knowledge is limited), but I MUST NOT search or look at any other BF interpreter implementation.

Join me in this article as we explore how to achieve such a goal.

Choosing a language

I am mostly using Java for development, and naturally, that is the right language to choose for this task. Or maybe not! This is a great opportunity to try a new language.

I chose Go. I’ve passed all tests on Codecademy, which covers very basic stuff. This is perfect, as a BF interpreter is relatively easy to write.

Brainfuck language design

The language consists of 8 commands, listed below. A BF program is a sequence of these commands, possibly interspersed with other characters (which are ignored).

The BF language uses a simple machine model consisting of the program and instruction pointer, as well as an array of at least 30,000 byte cells initialized to zero; a movable data pointer (initialized to point to the leftmost byte of the array); and two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII character encoding).

The eight language commands each consist of a single character:

Character Meaning
> increment the data pointer (to point to the next cell to the right)
< decrement the data pointer (to point to the next cell to the left)
+ increment (increase by one) the byte at the data pointer
- decrement (decrease by one) the byte at the data pointer
. output the byte at the data pointer
, accept one byte of input, storing its value in the byte at the data pointer
[ if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching ] command
] if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command

Coding

Let’s start with a simple hello world program.

package main  
  
import "fmt"  
  
func main() {  
    fmt.Print("Hello!")  
}

First, I want to read contents of a file (containing a BF program) and store it in a variable.

package main  
  
import (  
    "fmt"  
    "io/ioutil"
    "regexp"
)  
  
func check(e error) {  
    if e != nil {  
        fmt.Println(e)  
        panic(e)  
    }  
}  
  
func main() {  
    file, err := ioutil.ReadFile("programs/helloworld.bf")  
    check(err)  
  
    code := string(file)  
    re := regexp.MustCompile(`\r?\n| |[a-zA-Z0-9]`)  
    code = re.ReplaceAllString(code, "")  
  
    fmt.Println(code)  
}

The code reads contents of a file helloworld.bf, removes all spaces, new lines, numbers and letters (as they are not valid BF command) and prints it to the standard output.

I’ll create a new type Program, that will store the address array of bytes, as well as the size of an array and a current pointer position.

type Program struct {  
    size         int  
    instructions []byte  
    at           int  
}

Next, I’ll create a function that takes a BF code as an argument, creates a new Program and interprets the code.

func execute(code string) {  
    var program = new(Program)  
    program.size = 30000  
    program.instructions = make([]byte, program.size, program.size)  
    program.at = 0  
  
    // insert interpreter code here  
  
    fmt.Println()  
    fmt.Println("(END)")  
}

Because of the fact that there are loops and nested loops in BF, I’ll create a recursive function that will be executing BF code for the existing Program instruction array. The function will iterate through every rune (character or char) in code and execute a specific set of instructions for each BF command.

func executeWith(program *Program, code string) {  
  
    for pos, char := range code {  
        if char == '+' {  
            // add 1  
        } else if char == '-' {  
            // sub 1  
        } else if char == '>' {  
            // move right  
        } else if char == '<' {  
            // move left  
        } else if char == '.' {  
            // print  
        } else if char == ',' {  
            // input  
        } else if char == '[' {  
            // loop start  
        } else if char == ']' {  
            // loop end if current value is zero  
        }
    }
}

The first 6 commands are straight forward (+, -, >, <, ., ,).

if char == '+' {
    program.instructions[program.at] += 1
} else if char == '-' {
    program.instructions[program.at] -= 1
} else if char == '>' {
    if program.at == program.size-1 {
        program.at = 0
    } else {
        program.at += 1
    }
} else if char == '<' {
    if program.at == 0 {
        program.at = program.size - 1
    } else {
        program.at -= 1
    }
} else if char == '.' {
    fmt.Printf("%c", rune(program.instructions[program.at]))
} else if char == ',' {
    fmt.Print("input: ")
    reader := bufio.NewReader(os.Stdin)
    input, _, err := reader.ReadRune()
    check(err)
    program.instructions[program.at] = byte(input)
}

The program.instructions is a byte array and program.at is a current array index pointing to one cell in instruction array.

Coding loops

This was the challenging part. I’ve decided to have a flag that I will raise when the loop start [ command occurs. When the flag is raised, the program will be ignoring all other loop start commands and will try to find it’s matching pair of closing bracket ]. When it finds it’s closing bracket, the code within the loop will be extracted and then the recursive function executeWith will be called for that code, until the current cell is 0.

First, the necessary flags and variables

var loopStart = -1
var loopEnd = -1
var ignore = 0
var skipClosingLoop = 0

If current command is [, raise that flag and remember the index of an array where that loop started.

else if char == '[' {
    loopStart = pos + 1
    ignore = 1
}

Next is the “ignore” logic. If the program encounters another [ command, add 1 to the skipClosingLoop variable. This variable will be decremented every time a ] command is encountered, until the final ] is found. When the matching closing bracket is found, the code inside the loop will be extracted and executed. There is a check if loopStart == loopEnd in case there is a [] in code.

...
for pos, char := range code {  
    if ignore == 1 {
        if char == '[' {
            skipClosingLoop += 1
        } else if char == ']' {
            if skipClosingLoop != 0 {
                skipClosingLoop -= 1
                continue
            }
            loopEnd = pos
            ignore = 0
            if loopStart == loopEnd {
                loopStart = -1
                loopEnd = -1
                continue
            }
            loop := code[loopStart:loopEnd]
            for program.instructions[program.at] > 0 {
                executeWith(program, loop)
            }
        }
        continue
    }
...

And we’re done! Last thing I want to add is passing a file in program arguments.

func main() {  
    if len(os.Args) > 1 {  
        file, err := ioutil.ReadFile(os.Args[1])  
        check(err)  
  
        code := string(file)  
        re := regexp.MustCompile(`\r?\n| |[a-zA-Z0-9]`)  
        code = re.ReplaceAllString(code, "")  
  
        fmt.Println(code)  
        fmt.Println()  
  
        execute(code)  
    } else {  
        log.Fatal("You must specify a file to execute")  
    }  
}

Let’s run the hello world BF program.

$ go run main.go programs/helloworld.bf
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

Hello World!

(END)

Source code

Source code can be found on GitHub

Conclusion

The program lacks in error handling and error checking. But, this was really fun to implement and good time to practice a new programming language. I challenge you to try and implement the BF interpreter in a language you’re learning.


Comments