Sunday, 11 June 2017

Fun with C++ (Expert)

Or; C++ bytecode interpreter snippet with fused operations

Or; yeah, I'll add comments...

I program. A lot. I started with a little-known computer known as the Commodore *VIC-20(c), which was (to quote Wikipedia) "... the first computer of any description to sell one million units." It was succeeded shortly thereafter, with total sales of 2,500,000 units, by the Commodore 64(c), which went on to sell a staggering **17,000,000 units. In the early-to-mid 1980s.

Which isn't all that relevant today, except to establish some background info for those who normally see tabletop gaming stuff on here (on the rare occasions I post).

And today, I'm still programming. Including to relax.

Warning: There's a lot I don't know about interpreters and bytecode. I have no professional training here. I was just messing around, came up with something that looked neat, and decided to post it.

So I stuck it under the MIT license, and you can find it below the breakpoint.



//Copyright(c) 2017 MouseProduced Games
//
//Permission is hereby granted, free of charge, to any person obtaining a copy
//of this software and associated documentation files(the "Software"), to deal
//in the Software without restriction, including without limitation the rights
//to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
//copies of the Software, and to permit persons to whom the Software is
//furnished to do so, subject to the following conditions :
//
//The above copyright notice and this permission notice shall be included in all
//copies or substantial portions of the Software.
//
//THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
//IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
//FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.IN NO EVENT SHALL THE
//AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
//LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
//OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
//SOFTWARE.

#include <array>
#include <iostream>
#include <vector>
#include <stdint.h>

using namespace std;

typedef uint8_t byte;

// One stack for registers,
// The register stack is filled from bottom-up.
array<byte, 2> b_reg;
// and one stack for memory.
// Fortunately, we know ahead of time exactly how large each needs to be.
// Also, we prefill the stack memory, since we don't have a compiler.
// This is, after all, just a snippet.
// The stack grows from bottom-up.
array<byte, 3> stack_mem{ (byte)3, (byte)2, (byte)1 };

// If you've ever not wanted C++ enums cluttering your global namespace,
// this is how to do it. The enum is accessible by prefacing bytecode:::
// abd not otherwise.
struct bytecode
{
    enum code : byte
    {
        nop = 0,
        // Read a byte from the stack and push it to a register.
        read,
        // Pop a byte from a register and write it to the stack.
        // Not currently used.
        write,
        // Pop the left and right bytes from the register, add them,
        // and push the result to the register.
        add,
        // Pop the left and right bytes from the register, multiply them,
        // and push the result to the register.
        mul,
        // Pop a byte from the register, convert it to size_t, and cout it.
        // Then cout << endl;
        print
    };
};

// If your machine is big-endian, then you'll need to reverse these.
// OTOH, if your machine is big-endian, and you know what that means,
// you probably already knew you'd need to reverse these.
struct fused16_bytecode
{
    enum code : uint16_t
    {
        // As far as bytes go, it's 0x0301
        fused_add8 = bytecode::read | (bytecode::add << 8),
        // As far as bytes go, it's 0x0401
        // You might be thinking, "why is the read byte last?"
        // x86 is little-endian, which means that the first byte goes last.
        // Since read is done before mul, that means we need to put it last.
        fused_mul8 = bytecode::read | (bytecode::mul << 8),
    };
};

struct fused32_bytecode
{
    enum code : uint32_t
    {
        // And this is bytes 0x03010401
        fused_mul8Add8 = fused16_bytecode::fused_mul8 | (fused16_bytecode::fused_add8 << 16)
    };
};

// Final output: 7
int main()
{
    array<bytecode::code, 6> prog { bytecode::read, bytecode::read, bytecode::mul, bytecode::read, bytecode::add, bytecode::print };

    size_t ip = 0;
    size_t sp = 0 - 1;
    size_t rp = 0 - 1;

    // Suggested reading format for comments:
    // Last switch statement to first switch statement.
    while (ip < prog.size())
    {
        switch (*((fused32_bytecode::code*)(prog._Elems + ip)))
        {
        case fused32_bytecode::fused_mul8Add8:
            // Here we replace two register pops with two adds from the stack,
            // starting with one value in the register, and ending with
            // one value in the register.
            b_reg[rp] = (stack_mem[sp + 1] * b_reg[rp]) + stack_mem[sp + 2];
            sp += 2;
            ip += 4;
            continue;
        }

        switch (*((fused16_bytecode::code*)(prog._Elems + ip)))
        {
            // Where's our register pop? We don't need one.
            // Since we're fusing a register push and an add,
            // we can just add the value directly from the stack.
            // Starting with one value in the register,
            // ending with one value in the register, to clarify.
        case fused16_bytecode::fused_add8: b_reg[rp] = stack_mem[++sp] + b_reg[rp]; ip += 2; continue;
            // Clarification: The exact same thing as add, only multiply.
        case fused16_bytecode::fused_mul8: b_reg[rp] = stack_mem[++sp] * b_reg[rp]; ip += 2; continue;
        }

        switch (prog[ip])
        {
            // Since we need at least two values in the register,
            // and will end up with one less value in the register,
            // we decrement the register pointer (believe me, I checked;
            // despite how unintuitive it is, "--rp" happens first)
            // and then grab the two values, add them, and set our new top value.
        case bytecode::add: b_reg[--rp] = b_reg[rp + 1] + b_reg[rp]; ++ip; break;
            // Clarification: The exact same thing as add, only multiply.
        case bytecode::mul: b_reg[--rp] = b_reg[rp + 1] * b_reg[rp]; ++ip; break;
        case bytecode::print: cout << (size_t)b_reg[rp--] << endl; ++ip;  break;
            // Read from stack, push to register.
        case bytecode::read: b_reg[++rp] = stack_mem[++sp]; ++ip; break;
            // Pop from register, read to stack.
        case bytecode::write: stack_mem[++sp] = b_reg[rp--]; ++ip; break;
        }
    }

    cout << "Press enter to exit" << endl;
    getchar();

    return 0;
}

* Vic-1001 in Japan; VC-20 in Germany due to "Vic" sounding somewhat vulgar in German.
** The Commodore 64 was, for a long time, the (edit: second-best; the SharpX68000 sold 18 million; the PC-98 sold 16 million) best-selling computer system of all time.

No comments:

Post a Comment