Custom Hobo is built on a bespoke C++ game engine. We don’t use Unity, Unreal, or Godot, for better or for worse. What started out as an exercise in programming turned into a full-blown rendering pipeline and level editor (more on that later!). This post covers the three things that are most central to how the engine works: CHRenderTarget, the game update loop, and the OpenGL rendering pipeline that turns scene graph data into pixels.


CHRenderTarget: The Basic Unit of Everything

The core abstraction in our engine is CHRenderTarget. The definition is simple: a render target is anything that can be updated and/or drawn. Every visible thing in the game is a CHRenderTarget: sprites, physics objects, UI panels, particle emitters, text labels, actors, the HUD. (Other game engines might call this a Game Object or Node.)

Two virtual methods define the contract:

virtual void update(float dt);
virtual void render();

update(float dt) is called with a fixed timestep every game tick. render() is called once per frame to submit draw data to the renderer. Subclasses override both to do their work. A sprite implements render() by submitting its vertex data. An actor implements update(dt) by running its AI brain and physics integration.
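As a toy sketch of the contract — the Projectile class and its fields are hypothetical, and the base class here is a minimal stand-in for the real CHRenderTarget:

```cpp
#include <cstdio>

// Minimal stand-in for CHRenderTarget, just enough to show the contract.
class CHRenderTarget
{
public:
    virtual ~CHRenderTarget() = default;
    virtual void update(float dt) {}
    virtual void render() {}
};

// Hypothetical projectile: update() advances simulation state each fixed
// tick; render() submits draw data once per frame.
class Projectile : public CHRenderTarget
{
public:
    float x = 0.0f;
    float vx = 120.0f; // units per second

    void update(float dt) override
    {
        x += vx * dt; // fixed-timestep integration
    }

    void render() override
    {
        // The real engine would submit vertex data to the renderer here.
        std::printf("draw projectile at x=%.1f\n", x);
    }
};
```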

Because CHRenderTarget is a base class rather than an interface, it handles shared concerns: transform management, child/parent graph membership, deletion lifecycle, tweens, sleep/wake state, draw step assignment, shader uniforms, and opacity. Subclasses inherit that for free and only override what’s relevant.


The Scene Graph

Render targets form a tree. Any CHRenderTarget can hold child render targets:

virtual void addChild(const std::shared_ptr<CHRenderTarget>& child, bool isAutoChild = false);

Children marked as “auto” children are automatically updated and rendered by their parent, and automatically removed when they flag themselves for deletion. Non-auto children are managed manually.

This hierarchy has a practical implication: a child’s local transform is always expressed relative to its parent. A light inside a lamp is a child of the lamp’s sprite. Move the lamp, and the light moves with it. Rotate the lamp, and the light rotates around the lamp’s origin. None of that requires explicit bookkeeping in the lamp’s code.

The applyToChildren function lets callers propagate operations down the entire subtree:

virtual void applyToChildren(std::function<void(CHRenderTarget*)> functor);

This is how things like opacity, shader assignments, and draw step changes flow through a hierarchy without every object needing to manually forward them.
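Both mechanisms can be sketched with a cut-down stand-in class (this is not the engine's actual implementation, and whether applyToChildren visits the node itself is an assumption here):

```cpp
#include <functional>
#include <memory>
#include <vector>

// Cut-down stand-in for the scene-graph part of CHRenderTarget; the real
// class also handles transforms, deletion lifecycle, tweens, and more.
class CHRenderTarget
{
public:
    virtual ~CHRenderTarget() = default;

    virtual void addChild(const std::shared_ptr<CHRenderTarget>& child,
                          bool isAutoChild = false)
    {
        if (isAutoChild)
            m_autoChildren.push_back(child); // ticked by the parent
        else
            m_children.push_back(child);     // managed manually by the caller
    }

    // Auto children are updated by their parent; the real engine also
    // removes them automatically when they flag themselves for deletion.
    virtual void update(float dt)
    {
        for (auto& c : m_autoChildren)
            c->update(dt);
    }

    // Apply an operation to this node and every descendant.
    // (Visiting the node itself first is an assumption of this sketch.)
    virtual void applyToChildren(std::function<void(CHRenderTarget*)> functor)
    {
        functor(this);
        for (auto& c : m_autoChildren)
            c->applyToChildren(functor);
        for (auto& c : m_children)
            c->applyToChildren(functor);
    }

    float opacity = 1.0f;

private:
    std::vector<std::shared_ptr<CHRenderTarget>> m_autoChildren;
    std::vector<std::shared_ptr<CHRenderTarget>> m_children;
};
```

Fading out a whole subtree then becomes a one-liner: `root->applyToChildren([](CHRenderTarget* t) { t->opacity = 0.5f; });`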


Transforms and CHTransformGraphNode

Every object in the scene needs a position, a scale, and a rotation. CHRenderData3D stores these properties directly:

struct CHRenderData3D
{
    float x, y, z;
    float scaleX, scaleY;
    float rotation;
    b2Vec2 origin;
};

x and y are the object’s position in the world. z controls layering: a higher z value puts an object in front of a lower one. The engine is technically rendering in 3D — objects have real z coordinates in world space — but the camera uses an orthographic projection aimed straight down the z axis, so depth never produces any perspective distortion. Everything looks flat. z becomes purely a layering tool rather than a depth cue.

The 2D world is rendered orthographically in 3D
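A self-contained sketch of why that works: under an orthographic projection, screen x and y depend only on world x and y, and z feeds nothing but the depth value. (Hand-rolled math, not the engine's camera code; the depth remap here is a simplified linear one.)

```cpp
// Orthographic projection to normalized device coordinates for a view
// volume [l,r] x [b,t] x [n,f]. Note that x and y never reference z:
// there is no perspective divide, so depth can't distort anything.
struct NDC { float x, y, depth; };

NDC orthoProject(float x, float y, float z,
                 float l, float r, float b, float t, float n, float f)
{
    return {
        2.0f * (x - l) / (r - l) - 1.0f, // screen x: world x only
        2.0f * (y - b) / (t - b) - 1.0f, // screen y: world y only
        (z - n) / (f - n)                // depth: used only for layering
    };
}
```

Two points that differ only in z land on the exact same pixel; z decides nothing but which one is in front.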

scaleX and scaleY stretch the object along each axis. rotation is an angle in radians. origin is the pivot point that rotation and scale are applied around.

That’s readable enough on its own. The problem shows up the moment objects start parenting each other. If a sprite is a child of a physics object, and the object moves and rotates, where does the sprite end up? Computing that by hand with raw floats gets messy fast. This is where a matrix representation pays off.

The same position, scale, rotation, and origin can be encoded as a single 4×4 matrix — the local transform. With a matrix, composing two transforms (parent then child) is just a multiply. The result is the world transform: where the object actually sits in absolute world coordinates.
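The payoff is easiest to see with plain 2D affine transforms — a simplified stand-in for the engine's 4×4 matrices, with a mutating combine to mirror how CHTransformGraphNode uses it:

```cpp
#include <cmath>

// Simplified 2D affine transform: a rotation followed by a translation.
// The engine uses full 4x4 matrices; the composition rule is the same.
struct Transform2D
{
    float tx = 0.0f, ty = 0.0f; // translation
    float angle = 0.0f;          // rotation in radians

    // Concatenate `other` in this transform's local space (mutates in place,
    // like the engine's combine). Rotation must use the pre-update angle.
    void combine(const Transform2D& other)
    {
        const float c = std::cos(angle);
        const float s = std::sin(angle);
        const float nx = tx + other.tx * c - other.ty * s;
        const float ny = ty + other.tx * s + other.ty * c;
        tx = nx;
        ty = ny;
        angle += other.angle;
    }
};
```

A parent at (10, 0) rotated 90° with a child at local (5, 0) puts the child at world (10, 5) — no manual trigonometry at the call site, just `world = parentWorld; world.combine(local);`.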

Each CHRenderTarget owns a CHTransformGraphNode that tracks both:

Local is built from this object’s CHRenderData3D properties. It represents where the object sits relative to its parent.

World is the combined result of all ancestors’ transforms concatenated with the local transform. When you ask a render target where it is in the game world, you get its world transform.

void CHTransformGraphNode::forceUpdateWorld() const
{
    if (m_parent == nullptr)
    {
        m_world = getLocal();
    }
    else
    {
        m_world = m_parent->getTransformNode().getWorld();
        m_world.combine(getLocal());
    }
}

Both transforms are lazy. They’re only recomputed when something marks them dirty. Setting a position marks the local transform dirty and propagates a “parent is dirty” notification to all children. The world transform is not recomputed until someone calls getWorld(), at which point the entire dirty chain resolves in one traversal.

The result is that transforms are cheap to set and computed on demand only when needed, such as when a render target submits its draw data.
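The lazy scheme boils down to a cached value guarded by a dirty flag. A toy version (the real node also chains the dirty notification through parents and children):

```cpp
// Toy lazy-evaluation node: setting state is cheap, computing is deferred.
class LazyTransform
{
public:
    void setPosition(float x)
    {
        m_x = x;
        m_dirty = true; // just mark; the engine also notifies children
    }

    float getWorld() const
    {
        if (m_dirty)
        {
            m_world = m_x; // stand-in for the real matrix concatenation
            m_dirty = false;
            ++m_recomputeCount;
        }
        return m_world;
    }

    int recomputeCount() const { return m_recomputeCount; }

private:
    float m_x = 0.0f;
    mutable float m_world = 0.0f; // mutable: cache updated from const reads,
    mutable bool m_dirty = true;  // which is why forceUpdateWorld can be const
    mutable int m_recomputeCount = 0;
};
```

Three setPosition calls followed by one getWorld perform a single recomputation: writes stay cheap, and the cost is paid once at read time.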


The Fixed-Timestep Game Loop

The game loop in CHGame::runFrame follows a fixed-timestep accumulator pattern:

void CHGame::runFrame(CHTimer& gameClock, float& accumulator)
{
    const float elapsedTime = gameClock.restartAndGetSeconds();
    accumulator += elapsedTime;

    while (accumulator > TIMESTEP)
    {
        accumulator -= TIMESTEP;
        doFixedUpdate(TIMESTEP);
    }

    submitRenderTask();
}

TIMESTEP is 10ms. The accumulator tracks how much real-world time has gone unaccounted for. Each time through the loop, one 10ms tick is paid off. If a frame takes 30ms, three physics ticks run in that frame. If a frame takes 5ms, the loop body never runs and the 5ms carries over into the next frame’s accumulator.

The 10ms step is a deliberate tradeoff. It caps game logic at a maximum of 100 updates per second — rendering faster than 100fps is possible, but the game state won’t advance any faster. A 5ms timestep would allow higher-rate logic, but at the cost of spending twice as much CPU time on physics and game updates every frame, on every player’s machine.

This decouples physics simulation from frame rate. Physics always advances in discrete 10ms steps, regardless of whether the renderer ran fast or slow. Collisions, joint forces, projectile trajectories: none of it is affected by frame rate variation. This is important to keep gameplay consistent and avoid certain issues from large timesteps such as tunneling.
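The accumulator arithmetic can be checked in isolation. Here ticksThisFrame is a hypothetical helper, not engine code — it just mirrors the loop in CHGame::runFrame:

```cpp
// How many fixed ticks does one frame pay off? Leftover time stays in the
// accumulator and is carried into the next frame.
int ticksThisFrame(float elapsedSeconds, float& accumulator)
{
    const float TIMESTEP = 0.010f; // 10ms, as in the engine
    accumulator += elapsedSeconds;
    int ticks = 0;
    while (accumulator > TIMESTEP)
    {
        accumulator -= TIMESTEP;
        ++ticks;
    }
    return ticks;
}
```

A 5ms frame pays off zero ticks and banks its 5ms; a 30ms frame arriving next pays off three ticks from the combined 35ms. Physics never sees anything but exact 10ms steps.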

doFixedUpdate is where the world advances. It steps the Box2D physics simulation, calls updateLevel(dt, paused) on the current level (which cascades update through every CHRenderTarget in the level), updates UI, camera, audio, particles, and profiles.

After the accumulator loop exhausts itself, the game thread assembles one render request and sends it to the render thread. Update and render are cleanly separated: updates happen in a tight loop until caught up, then one render task goes out. (Render threading is covered in depth in an earlier post.)


Draw Batching

Every OpenGL draw call carries significant per-call overhead: state validation, buffer uploads, driver bookkeeping. The goal is to minimize the total number of draw calls dispatched per frame, even when thousands of render targets are on screen. We do this by grouping render targets that share the same GPU state into a single draw call — a draw batch.

CHRenderTarget::render() doesn’t directly call any OpenGL functions. Instead, it submits vertex data and a batch key to CHDrawRequest::submitDraw:

template <typename VertexType>
static void submitDraw(const CHBatchKey& key, VertexType* vertexData, size_t vertexCount, const CHTransform& transform);

The CHBatchKey is the draw call fingerprint. Two render targets with the same batch key get merged into a single OpenGL draw call:

struct CHBatchKey
{
    CHDrawStepEnum drawStep;  // Geometry, lighting, postprocess, UI...
    CHShaderEnum shaderId;
    CHDrawModeEnum drawMode;  // Triangle fan, quads, sprite quads, lines...
    CHZBatch zBatch;          // Background, scenery, objects, transparent...
    std::optional<CHWildcardTextureHandle> textureKey;
    std::optional<CHUniformCollectionHandle> uniformsKey;
};

Batch keys are sorted by an ordered hash. Lower draw steps execute first. Within a draw step, lower z-batches execute first. Within a z-batch, shader, draw mode, texture, and uniforms determine the final order.
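That ordering amounts to a lexicographic comparison over the key fields, highest priority first. A sketch — the field names follow CHBatchKey, but the enum values are illustrative and the engine's actual ordered hash may pack everything into one integer instead:

```cpp
#include <tuple>

// Illustrative subsets of the engine's enums.
enum class DrawStep { Geometry, Lighting, Postprocess, UI };
enum class ZBatch { Background, Scenery, Objects, Transparent };

// Simplified batch key: compare drawStep first, then zBatch, then the
// state that merely groups compatible draws.
struct BatchKey
{
    DrawStep drawStep;
    ZBatch zBatch;
    int shaderId;
    int textureId;

    bool operator<(const BatchKey& other) const
    {
        // std::tie gives lexicographic comparison in priority order.
        return std::tie(drawStep, zBatch, shaderId, textureId)
             < std::tie(other.drawStep, other.zBatch,
                        other.shaderId, other.textureId);
    }
};
```

Sorting submissions by this key automatically yields geometry before lighting, and background before the objects standing on it.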

The ordering matters for correctness, not just performance. GEOMETRY must complete before LIGHTING can run — the lighting shader reads from the G-buffer that geometry writes, so running them out of order would mean lighting pixels that haven’t been drawn yet. Within GEOMETRY, z-batches enforce painter’s-order layering: background scenery renders before the objects standing on top of it. UI renders last so HUD elements always sit unambiguously over the world.

As a concrete example: two sprites sharing the same shader and texture get merged into one draw call. A background prop and a foreground actor, even sharing a shader, belong to different z-batches and stay in separate draw calls that execute in the correct order. This is how thousands of objects on screen produce only a handful of actual GPU dispatches per frame.

The CHDrawBatcher accumulates vertex data for each unique batch key during the game thread’s render collection phase. When the render thread takes over, it processes each key in order, uploads its vertex data to the GPU, binds the appropriate shader and textures, and dispatches a draw call.
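The accumulation step can be sketched as a map from batch key to a growing vertex list — a hypothetical simplification of CHDrawBatcher, using an int as a stand-in key:

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct Vertex { float x, y, u, v; };

// Toy batcher: submissions with the same key append to one vertex list,
// so each unique key later becomes exactly one draw call.
class DrawBatcher
{
public:
    void submitDraw(int key, const Vertex* data, std::size_t count)
    {
        auto& bucket = m_batches[key];
        bucket.insert(bucket.end(), data, data + count);
    }

    // One GPU dispatch per unique key, regardless of submission count.
    std::size_t drawCallCount() const { return m_batches.size(); }

private:
    std::map<int, std::vector<Vertex>> m_batches; // iterated in key order
};
```

Two sprites sharing a key cost one draw call between them; a third sprite with a different key adds exactly one more.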

The world transform from the scene graph is passed directly to the batcher as a matrix. The batcher applies it as a per-instance transform on the GPU side, so thousands of objects with different transforms can share the same draw call as long as they share the same batch key.


The OpenGL Rendering Pipeline

The renderer uses a multi-pass deferred pipeline. Each frame processes every CHDrawStepEnum in order:

for (auto step : magic_enum::enum_values<CHDrawStepEnum>())
{
    doDrawStep(step, &gameState->batcher, windowDimensions, renderingResolution);
}

The steps in order:

GEOMETRY. Most render targets write to the G-buffer: color, world position, material properties, and emissivity. Four color attachments, one depth buffer. This is where sprites and physics objects land.

LIGHTING. A full-screen pass reads the G-buffer and applies directional lights, point lights, and spot lights. Light data is uploaded to the GPU via Uniform Buffer Objects before the draw steps begin. The result goes to the light buffer, which also carries a separate glow channel for emissive contributions.

PING_PONG_GAUSSIAN. Gaussian blur runs on the glow channel over multiple passes. Two framebuffers alternate as read/write targets, since sampling a texture while it is bound for writing is undefined behavior in OpenGL.

GLOW. The blurred glow is composited back onto the lit scene, producing the bloom effect. Damage effects also land here.

POSTPROCESS. Render targets that need to sample the already-lit scene draw here. This stage is for effects where context matters — a window with a blur effect needs to read the unblurred scene before it can produce the blur, for example.

TONE_MAP. HDR output from the lighting stages is compressed into a standard RGBA range for display. Color grading is also applied here.

UI. Text, HUD elements, and anything that should sit unambiguously on top of the world render here.

RENDER_TO_WINDOW. A full-screen quad blits the completed offscreen buffer to the window.


How It Fits Together

Everything that appears on screen follows the same path:

  1. A CHRenderTarget subclass is constructed and positioned in the scene.
  2. Each game tick, update(dt) is called, advancing the object’s state. This may run multiple times before a single render() — a slow frame catches up with multiple fixed updates before assembling one render.
  3. Each frame, render() is called, which walks the transform graph to compute a world transform and submits vertex data + a batch key to CHDrawRequest.
  4. The render thread picks up the assembled game state, runs through the draw steps in order, and dispatches one draw call per unique batch key.
  5. The result is displayed.

The design is deliberately boring at the boundary. CHRenderTarget doesn’t know what shader will process its pixels. CHRenderer doesn’t know what generated the vertex data. The batch key is the only negotiation between them: a shared description of what GL state is needed to draw this thing correctly.

Most objects in the game don’t need to know any of this. They set a position, set a sprite, and call it done. The graph, the transforms, the batching, and the passes handle the rest.