You Never Forgot What It Felt Like
That moment when you first saw the Amiga's copper bars rippling down the screen. The way a game's parallax layers scrolled past each other with that impossible smoothness. The warm glow of a Trinitron monitor, scanlines and all. The feeling that the machine was alive — that you could reach inside and bend it to your will in ways a PC never let you.
You didn't imagine it. That was real. And it's been missing ever since.
Introducing FireStorm
The display heart of the Ant64 is FireStorm — a GoWin GW5AT-138 FPGA with 138,000 logic cells, 340 block RAM modules, and 298 DSP units, running at up to 380MHz. It doesn't emulate classic hardware. It is classic hardware — rebuilt from first principles, freed from every constraint that held the originals back.
This is what happens when someone who shipped games on the Amiga in the 90s gets to design the display chip they always wished existed.
The Outputs
Four simultaneous display outputs, each independently configurable, each rendering from the same source data in real time.
DisplayPort — HBR2, four lanes, 5.4Gbps each. 4K at 60Hz, with VRR adaptive sync stretching up to 75Hz. Drop to 1440p and you get 144Hz. 1080p gives you 240Hz. These are not theoretical numbers. This is what the hardware does.
HDMI — 1080p or 4K, broad compatibility, clean digital signal. Your TV in the living room, your capture card, your second monitor.
VGA — yes, VGA. Because some of us still have a CM8833 in the garage and we'd like to use it. And because the analogue signal path does things a digital display physically cannot.
LVDS — an internal video bus carrying the composited pixel stream to the SG2000. The SG2000's hardware VO compositor takes that feed as a video layer and adds its own CPU-generated graphics overlay on top — OS UI, boot status, touch menus — before driving an embedded panel via MIPI DSI. The same feed also goes to the hardware MJPEG encoder: record or stream your output with zero CPU overhead. FireStorm's external outputs are completely unaffected.
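The DisplayPort figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not a link-budget tool: the 8b/10b coding overhead is standard for HBR2, but the 24 bits per pixel and the choice to ignore blanking intervals are assumptions made here for simplicity, and the function names are illustrative.

```python
# Rough DisplayPort HBR2 budget check.
# Assumptions: 24 bits per pixel, blanking ignored, so real link usage
# is somewhat higher than the figures this produces.
LANES = 4
LANE_RATE_GBPS = 5.4          # HBR2 raw rate per lane
CODING_EFFICIENCY = 8 / 10    # 8b/10b line coding

MODES = {
    "4K @ 60": (3840, 2160, 60),
    "4K @ 75": (3840, 2160, 75),
    "1440p @ 144": (2560, 1440, 144),
    "1080p @ 240": (1920, 1080, 240),
}


def payload_gbps() -> float:
    """Usable link bandwidth after line coding, in Gbps."""
    return LANES * LANE_RATE_GBPS * CODING_EFFICIENCY


def mode_gbps(width: int, height: int, hz: int, bpp: int = 24) -> float:
    """Active-pixel data rate for a display mode, in Gbps."""
    return width * height * hz * bpp / 1e9


if __name__ == "__main__":
    budget = payload_gbps()
    for name, (w, h, hz) in MODES.items():
        print(f"{name}: {mode_gbps(w, h, hz):.2f} of {budget:.2f} Gbps")
```

Under those assumptions every listed mode fits inside the four-lane HBR2 payload, which is consistent with the claim that the numbers are not theoretical.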
The Inputs
FireStorm doesn't just push pixels out. It also takes content in, from the Colony network and from the SG2000 itself, and composites it into the layer stack like any other layer. The Copper controls that content per scanline, and it appears on all outputs simultaneously.
Colony video (Ant64 and Ant64C — not Ant64S) — connect one or two HDMI frame grabbers to the Colony RX ports and FireStorm receives live video from any HDMI source as standard compositor layers. Console output, PC footage, a camera feed — any HDMI source, arriving over the network, composited in hardware over your generated content. Two grabbers means two simultaneous streams. Picture-in-picture. Chroma key. Copper-driven transitions between them. FireStorm sprites flying over live footage from a PlayStation. The frame grabbers can sit anywhere in the Colony string — they don't need to be physically attached to the machine.
SG2000 MIPI layer — the SG2000 has its own MIPI display output, and FireStorm has a hardware MIPI RX input. The SG2000 can push its own CPU-generated content — OS overlays, decoded video frames, debug panels, anything the general-purpose CPU produces — directly into FireStorm's compositor as a layer. From FireStorm's perspective it's just another layer: priority register, colour key, Copper-controllable per scanline, visible on all outputs. The SG2000 handles what CPUs are good at. FireStorm handles the compositing. Neither gets in the other's way.
The Philosophy: Render Small, Output Big
The Ant64 has two primary native resolutions. Standard at 480×270 closely matches the pixel size of the Amiga's low res mode — that classic chunky pixel feel, exactly as you remember it. Hires at 960×540 is exactly double in both axes, the same relationship the Amiga had between low res and high res. Except the Ant64 doesn't need interlace to get there, and it doesn't flicker.
At Standard, every native pixel becomes a perfect 8×8 block on the 4K output. At Hires, a 4×4 block. No interpolation. No blur. Pure integer scaling, pixel-perfect, exactly as intended.
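The scaling arithmetic is easy to verify. A minimal sketch, assuming a 3840×2160 output; the function name is illustrative and not part of any Ant64 API:

```python
OUTPUT_4K = (3840, 2160)


def integer_scale(native, output=OUTPUT_4K):
    """Return the (x, y) pixel replication factors for a native resolution.

    Returns None when either axis does not divide the output exactly,
    which is the case where pure integer scaling is not possible.
    """
    native_w, native_h = native
    out_w, out_h = output
    if out_w % native_w or out_h % native_h:
        return None
    return out_w // native_w, out_h // native_h
```

Calling `integer_scale((480, 270))` gives `(8, 8)` and `integer_scale((960, 540))` gives `(4, 4)`. Because the axes are checked separately, a 320-wide, 270-tall layer also scales cleanly, at 12 horizontally and 8 vertically.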
These aren't the only options — horizontal and vertical axes are fully independent, and any valid combination from the resolution tables works. Run 320 wide for Mega Drive vibes, 640 wide for Amiga hires, mix different resolutions on different layers simultaneously. The hardware doesn't care. Standard and Hires are named reference points, not constraints.
The Layers
FireStorm composites multiple independent layers in hardware on every scanline. No CPU cost for the final mix. No frame buffer round-trip. Just the beam and the data.
Each layer has its own independent native resolution. Not a shared global resolution that everything must fit — each layer picks its own, and the blender composites them all to the output resolution simultaneously. A game background at 320 wide. Sprites at 480. A UI overlay at 960 for sharp text. A Colony video stream at whatever resolution it carries. All on screen at once, blended in hardware, at the output pixel clock.
Each layer also has its own position. An X offset and Y offset place it anywhere on the output. A width and height clip its active rectangle. Outside those bounds, the layer is transparent and the layer below shows through.
A concrete example. You want a game with a sharp high-resolution score panel at the top and a rich high-colour lo-res tilemap game underneath. On the Amiga this was a genuine problem — hi-res for the status bar meant hi-res for everything, burning bitplane DMA bandwidth on the game area. Lo-res for the game meant a lo-res status bar. HAM gave you colour but killed sprites in that region. Every choice was a compromise the hardware forced on you.
On the Ant64:
- Score panel — a 960×60 Hires layer, Y_offset = 0, sharp pixel-perfect text and numbers at the top of the screen
- Game tilemap — a 320×210 Standard layer behind it, 256-colour palette, full Mega Drive-style per-column vertical scroll, sprites in their own layer on top
- The two layers overlap freely — the tilemap scrolls underneath the score panel, the panel stays fixed, zero compromise on either
The score panel only needs updating when the score changes. It isn't part of the game framebuffer. It doesn't share bitplane bandwidth with the game. It just sits in its own BRAM, composited in hardware, costing nothing at runtime. The hardware does what you always wanted it to do.
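The score panel example can be modelled in a few lines. This is a software sketch of the idea only, not the hardware's actual register layout: the `Layer` fields, the use of output-pixel offsets, the tilemap's Y offset of 0, and the "higher priority wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Layer:
    name: str
    native_w: int
    native_h: int
    scale_x: int    # output pixels per native pixel, horizontally
    scale_y: int    # output pixels per native pixel, vertically
    x_offset: int   # position on the output, in output pixels (assumed unit)
    y_offset: int
    priority: int   # assumed convention: higher value is drawn on top

    def sample(self, px: int, py: int) -> Optional[Tuple[int, int]]:
        """Native (x, y) under output pixel (px, py), or None outside the layer."""
        lx, ly = px - self.x_offset, py - self.y_offset
        if 0 <= lx < self.native_w * self.scale_x and 0 <= ly < self.native_h * self.scale_y:
            return lx // self.scale_x, ly // self.scale_y
        return None


def topmost(layers: Sequence[Layer], px: int, py: int):
    """Name and native coordinate of the highest-priority layer covering a pixel."""
    for layer in sorted(layers, key=lambda l: l.priority, reverse=True):
        hit = layer.sample(px, py)
        if hit is not None:
            return layer.name, hit
    return None  # no layer here: whatever lies below shows through


# The two layers from the example, placed on a 3840x2160 output.
SCORE_PANEL = Layer("score_panel", 960, 60, 4, 4, 0, 0, priority=2)
TILEMAP = Layer("tilemap", 320, 210, 12, 8, 0, 0, priority=1)
```

Each layer resolves an output pixel to its own native coordinate at its own scale, which is why the panel can stay sharp while the tilemap stays chunky. Colour key transparency and alpha are left out of this sketch.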
The Amiga used the Copper to drag screens of different resolutions up and down the display. Those regions were vertically exclusive — lo-res and hi-res content could be stacked but never overlap, always with a hard horizontal boundary between them. Move the boundary with the Copper, but never cross it.
The Ant64 has no such boundary. Any layer can overlap any other layer freely. A Hires UI strip docked to the bottom of a Standard game layer. A popup panel floating over both. A HAM24 photograph fading in over a tilemap via alpha blend where their rectangles intersect. The Copper can animate any layer's X or Y position every frame — layers sliding across the screen, no CPU rendering involved. It is the Amiga screen-drag mechanic generalised to every layer independently, extended to allow full overlap, with no remaining constraints.
Each layer also has its own palette and its own scroll registers. Layers composite in hardware priority order, with colour key transparency and per-palette alpha, and the Copper can change any of it at any pixel position on any scanline.
Tilemap layers support per-tile palette selection, H and V flip per tile, and a scroll system that shames the Mega Drive. Per-tile-row horizontal scroll is driven by the Copper — one register, updated at each tile row boundary, costing nothing in silicon. Per-tile-column vertical scroll uses a configurable register file with adjustable granularity: set the step count to fit your tile configuration, and the feature degrades gracefully rather than switching off. Variable tile sizes from 4×4 to 128×128, non-square supported, power-of-two for zero-cost address calculation.
Sprite layers use a double-buffered pipeline that sorts and fetches one native row ahead of the beam, giving the full vertical repeat window for prefetch. With the BSRAM running at 380MHz the sprite budget is in the hundreds per scanline — more than you will ever use. Sprites are split between a user region and a system region: DeMon owns the top of the table for the cursor and OS overlays, and that ownership is enforced in silicon. No software bug can steal your cursor.
Each layer also has its own clip table — a memory block of left/right boundary pairs, one per scanline, that defines exactly which pixels of the layer are visible. A single entry covering the full screen is no clip at all. A table with one entry per scanline can describe any shape — diagonal splits, curves, portals, anything expressible as a left and right edge per line. The FireStorm EE builds the table each frame for dynamic clips. The Copper swaps the pointer mid-frame to change clip regions instantly.
This is how you make a Lego-style dynamic split-screen. Each player's view is its own layer with its own clip table. The tables are complementary — one player's right boundary is the other's left. The join follows whatever geometry the game computes. When the players are close enough, swap both tables for single full-screen entries and the split disappears in one frame. No transition, no artefact — both layers always contain valid full-screen content.
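A diagonal split can be sketched as two complementary tables. The representation below is an assumption for illustration: each entry is a half-open `(left, right)` pair, and a table with a single entry is treated as applying to every scanline. The real table format is not described here.

```python
def diagonal_split_tables(width, height, top_x, bottom_x):
    """Build complementary per-scanline clip tables for two layers.

    The join runs in a straight line from (top_x, 0) to (bottom_x, height - 1).
    A pixel x on a line is visible when left <= x < right.
    """
    table_a, table_b = [], []
    for y in range(height):
        t = y / (height - 1) if height > 1 else 0.0
        join = round(top_x + (bottom_x - top_x) * t)
        join = max(0, min(width, join))
        table_a.append((0, join))       # player A: left of the join
        table_b.append((join, width))   # player B: right of the join
    return table_a, table_b


def is_visible(table, x, y):
    """Test one pixel against a clip table (single-entry tables cover all lines)."""
    left, right = table[y] if len(table) > 1 else table[0]
    return left <= x < right
```

Because one table's right edge is the other's left edge on every line, each output pixel belongs to exactly one player's view. Swapping both tables for a single `(0, width)` entry models the moment the split disappears.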
Beyond the hardware tilemaps and sprites, the FireStorm Blitter can render bitmap layers — intermediate framebuffers filled with primitives before the scanline mixer composites them. Sprites with rotation and scaling. Mode 7-style floor planes. Textured triangles. Lines, shapes, flood fills, text. The blitter runs a job queue defined by the FireStorm EE, working through it while the EE continues executing. Double and triple buffering keep the display fed while the blitter prepares the next frame. If nothing changed in a layer, the blitter skips it entirely — a static HUD costs nothing until it needs updating.
Texture data for any primitive — sprites, triangles, tilemap tiles — can come from the Graphics SRAM: fast pipeline SRAM dedicated to the blitter and audio DSP, 4.5MB with deterministic access and burst mode. No page misses, no refresh, no wait states. The audio DSP shares this bus for wavetables and FM voice data alongside the texture atlas. The graphics/audio set and the EE set each have their own dedicated bus, so neither ever contends with the other. The memory's data width also enables pixel formats that standard 32-bit memory can't — including high-precision intermediate buffers for bloom and compositing. Full details in the Memory Architecture reference.
A typical scene's sprite sheets, texture atlas, and intermediate buffers fit comfortably within 4.5MB — meaning a full frame renders with no DDR3 involvement at all. For assets beyond that, FireStorm's own DDR3 backs the system with the full art library. The SG2000 has its own separate 512MB DDR3 for AntOS and applications — the two memory worlds are not shared. Small data and register writes cross via the QSPI FRAM bridge; bulk asset uploads use the MIPI link from SG2000 into FireStorm at up to 2.5Gbps, and the LVDS link carries data back the other way at high speed.
The FRAM bridge is worth understanding because it shapes how programming the Ant64 actually feels. From the perspective of the SG2000, Pulse, and DeMon, FireStorm is simply a block of memory-mapped addresses. Writing to an address updates a palette entry, moves a sprite, changes a layer register, or queues a blitter job. Reading from an address returns the current scanline counter, blitter status, or a value from Graphics SRAM.
There is no driver. No kernel call. No ioctl, no DMA descriptor ring, no abstraction layer negotiating on your behalf with hardware that doesn't trust you. Just a memory address and a value.
If you wrote games or demos in the 80s or 90s, you already know this feeling. POKE $D000 on the C64 and a sprite moved. Write to $DFF180 on the Amiga and a colour changed on the next scanline. Prod the SID registers and sound came out. The machine did what you told it, immediately, because you spoke to it directly. There was nothing between you and the hardware.
That's what the FRAM model gives you on the Ant64. Poke a memory location to move a sprite. Poke another to fire a blitter job. Poke another to change the CRT simulation mode. The FPGA controls what each chip's window exposes and what it maps to internally, and each of the three interfaces — SG2000, Pulse, DeMon — is independent. Each chip reaches into FireStorm's register file through its own window without disturbing the others.
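To make the "address and a value" model concrete, here is a small simulation of one chip's window. Everything specific in it is hypothetical: the register offsets, the 32-bit little-endian register width, and the class and function names are invented for this sketch and are not the Ant64 register map.

```python
import struct

# Hypothetical register offsets, invented for this sketch only.
REG_SPRITE0_X = 0x0000
REG_SPRITE0_Y = 0x0004
REG_SCANLINE = 0x0008


class RegisterWindow:
    """Stand-in for one chip's memory-mapped window into FireStorm."""

    def __init__(self, size: int = 0x1000) -> None:
        self._mem = bytearray(size)

    def poke(self, offset: int, value: int) -> None:
        """Write a 32-bit little-endian value at a byte offset (assumed width)."""
        struct.pack_into("<I", self._mem, offset, value & 0xFFFFFFFF)

    def peek(self, offset: int) -> int:
        """Read a 32-bit little-endian value from a byte offset."""
        return struct.unpack_from("<I", self._mem, offset)[0]


def move_sprite(window: RegisterWindow, x: int, y: int) -> None:
    """Moving a sprite is two plain writes, with nothing in between."""
    window.poke(REG_SPRITE0_X, x)
    window.poke(REG_SPRITE0_Y, y)
```

Giving each chip its own `RegisterWindow` instance mirrors the independence described above: one chip's writes never pass through another chip's window.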
Modern machines forgot this. The Ant64 remembers.
Pulse, the audio and control subsystem, adds a dedicated 4-wire high-speed PIO link to FireStorm alongside its FRAM interface. The RP2350's PIO blocks drive a custom firmware-defined protocol for audio DMA and timing-critical MIDI and control surface data at rates QSPI cannot sustain. DeMon, the supervisor, holds JTAG connections to the SG2000 and the FPGA, and reaches the FPGA flash through the FPGA's JTAG chain. That lets it reprogram the entire system from one chip, including the bitstream, the SG2000 firmware, and the flash contents, over a connection that survives even a completely dead OS.
A separate identical set of SRAM — another 4.5MB on its own dedicated bus — is reserved exclusively for the FireStorm EE's instruction store. This bus belongs to the EE alone. Code fetch and texture fetch are on physically separate buses with no arbitration between them. The EE streams its instruction prefetch while the blitter scans a texture line, simultaneously, at full speed. The throughput figures are what you actually get — not theoretical peaks that evaporate the moment two things happen at once.
The FireStorm also has a flash chip for read-only system assets — built-in wavetables, FM voice data, the standard system font, default palettes. These are available to any subsystem without consuming a byte of Graphics SRAM or DDR3. Static assets that never change live in flash; the fast RAM is reserved for scene data that does.
The blitter also has bloom lines and bloom particles — antialiased lines and point sprites with phosphor glow and additive blending, designed specifically for colour vector game aesthetics. Tempest. Asteroids. Star Wars. Each primitive has its own intensity and falloff. Line crossings get brighter. Particle clusters bloom together. The bitmap saturates at hot spots. It looks like a vector CRT because the maths is the same maths. Named presets for the classic arcade titles are built in. A full Tempest-style frame — web wireframe, enemy lines, shot particles, explosion burst — renders in well under 500 microseconds.
FireStorm also has dedicated hardware for ray casting, BSP rendering, and voxel rendering — the techniques behind Wolfenstein 3D, Doom, Quake, Comanche, and Minecraft. A pool of parallel ray DDA units casts all 480 screen columns simultaneously, each stepping through the grid at full clock rate and returning hit distance, wall face, and texture offset in one hardware pass. A BSP traversal engine walks the scene tree from the viewpoint and outputs visible sectors in front-to-back order to a FIFO the EE drains. A fixed-point reciprocal unit converts distances to column heights in 2–4 cycles rather than the 16–32 a software divide costs.
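For readers who have not met grid ray casting before, the work of a single DDA unit can be sketched in software. This is the textbook grid-stepping algorithm, offered as an illustration of the technique rather than a description of FireStorm's circuit; the function names and the 270-line screen height default are assumptions, and fish-eye correction is omitted.

```python
import math


def cast_ray(grid, px, py, angle, max_steps=64):
    """Step one ray through a 2D grid of cells (non-zero means wall).

    Returns (distance, face, texture_offset) for the first wall hit,
    or None if the ray leaves the grid or runs out of steps.
    Assumes a non-negative start position inside an empty cell.
    """
    dx, dy = math.cos(angle), math.sin(angle)
    map_x, map_y = int(px), int(py)
    delta_x = abs(1 / dx) if dx else math.inf   # ray length between x grid lines
    delta_y = abs(1 / dy) if dy else math.inf   # ray length between y grid lines
    step_x = 1 if dx >= 0 else -1
    step_y = 1 if dy >= 0 else -1
    side_x = ((map_x + 1 - px) if dx >= 0 else (px - map_x)) * delta_x
    side_y = ((map_y + 1 - py) if dy >= 0 else (py - map_y)) * delta_y
    for _ in range(max_steps):
        if side_x < side_y:                     # next crossing is an x grid line
            dist, face = side_x, "x"
            side_x += delta_x
            map_x += step_x
        else:                                   # next crossing is a y grid line
            dist, face = side_y, "y"
            side_y += delta_y
            map_y += step_y
        if not (0 <= map_y < len(grid) and 0 <= map_x < len(grid[0])):
            return None
        if grid[map_y][map_x]:
            hit = (py + dist * dy) if face == "x" else (px + dist * dx)
            return dist, face, hit - math.floor(hit)
    return None


def column_height(distance, screen_h=270):
    """Wall column height from hit distance: one reciprocal and one multiply."""
    if distance <= 0:
        return screen_h
    return min(screen_h, int(screen_h / distance))
```

The software version casts one ray at a time in a loop. The point of the hardware pool is that every column's ray steps in parallel, and the divide in `column_height` is the operation the reciprocal unit replaces.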
FireStorm also supports a layered approach to shadows and lighting without full ray tracing. Shadow maps render the scene from the light's point of view into a depth buffer in Graphics SRAM — a standard blitter pass — then a depth comparator tests each pixel during the main render, adding directional shadows at minimal hardware cost. The same DDA units used for ray casting can march screen-space rays through the depth buffer for ambient occlusion, contact shadows, and reflections — zero additional silicon, high visual impact. A BVH traversal engine (a small generalisation of the SVO engine) and a cross product unit open the path to proper ray-tested shadows for polygon geometry. The combination of shadow maps and screen-space techniques produces results most people perceive as ray-traced without the cost of a full path tracer.
For voxel rendering there are three dedicated paths. A height-field mode (Comanche-style) adds a height map sampler that reads (x,y) height and colour from Graphics SRAM in one pipeline cycle and a vertical span renderer that fills the column as higher voxels appear — the 2D DDA units and reciprocal unit handle the rest. For dense 3D voxel grids the DDA units extend to three axes. For sparse voxel octrees a dedicated SVO traversal engine walks the tree using a slab test unit — ray versus axis-aligned bounding box in 3–4 cycles — outputting hit voxels to a FIFO in front-to-back order without involving the EE in the recursion. An SVO node packs exactly into one 36-bit Graphics SRAM word.
The hardware handles the arithmetic. The EE handles the scheduling. The blitter draws the textured spans. The scanline mixer composites hardware sprites on top at zero additional render cost. The 480×270 Standard resolution is in the same class as the resolutions these engines originally ran at (Wolfenstein 3D and Doom rendered at 320×200), and it lands pixel-perfect on the output.
FireStorm includes a complete isometric rendering system — the techniques behind Populous, Theme Hospital, and RollerCoaster Tycoon, done properly. An isometric scanline mapper and diamond span unit render tiles scanline by scanline rather than tile by tile, so the CPU has zero per-tile rendering cost. A hardware depth buffer eliminates painter's algorithm sorting entirely — tiles and sprites draw in any order and the depth buffer arbitrates occlusion per-pixel automatically. A character walking behind a building is correctly occluded without split-sprite calculations, stencil masks, or any of the tricks those games used on period hardware. The SG2000 spends its time on AI, pathfinding, and game logic — not pixel pushing. An RCT-scale scene of 50×50 visible tiles and 500 dynamic sprites runs comfortably within a single frame's blitter budget.
The Copper is a general register-write engine that fires at any beam position. Every FireStorm register is a valid Copper target. Palette entries, scroll offsets, layer positions, clip table pointers, layer enables, tile sets, simulation modes — all of it, per scanline, with no CPU involvement. If you wrote Copper lists on the Amiga, you already know how to use this. If you didn't — you're about to discover what all the fuss was about.
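If you never wrote a Copper list, the idea fits in a short simulation. The instruction format, the register name `BG_COLOUR`, and the function names below are hypothetical, chosen only to show the principle of register writes keyed to beam position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopperWrite:
    line: int       # scanline at which the write fires
    x: int          # horizontal beam position within that line
    register: str   # hypothetical register name
    value: int


def run_copper(copper_list, registers, lines):
    """Replay a Copper list and return the register state at the end of each line.

    This sketch only snapshots once per scanline, so x is used purely to
    order writes that share a line.
    """
    ordered = sorted(copper_list, key=lambda w: (w.line, w.x))
    state = dict(registers)
    snapshots = []
    i = 0
    for line in range(lines):
        while i < len(ordered) and ordered[i].line <= line:
            state[ordered[i].register] = ordered[i].value
            i += 1
        snapshots.append(dict(state))
    return snapshots


def raster_bars(lines, step=2):
    """A classic effect: change the background colour every few scanlines."""
    return [CopperWrite(y, 0, "BG_COLOUR", y) for y in range(0, lines, step)]
```

The CPU builds the list once. After that the register changes happen at the right beam positions on every frame with no further involvement, which is the whole appeal.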
HAM. But Actually Good.
The Amiga's Hold-And-Modify mode was an act of desperation that accidentally produced something beautiful. By encoding each pixel as "modify one colour channel of the previous pixel," it delivered photographic images from hardware that had no business showing them. The fringing artefacts were the price.
FireStorm has HAM24. Same principle. 8-bit channel precision instead of 4 or 6. 1024-entry direct palette. And a colour space option that changes everything.
Switch to YCbCr mode and the modify operations target luminance, blue chroma, and red chroma separately — exactly the same separation that JPEG and H.264 use. Human vision is far less sensitive to chroma errors than luma errors. The fringing that defined the Amiga HAM aesthetic — the colour bleeding at transitions — drops to the point of invisibility. At 4K native, it simply isn't there.
HAM24 is a per-layer property. The Copper can switch it mid-frame. A HAM24 photograph in the top half of the screen, a tilemap game in the bottom half, exactly as Amiga coders split between HAM and normal modes with the Copper thirty years ago.
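The hold-and-modify principle is simple enough to decode in software. The bit layout below is an assumption, not a documented format: it supposes a 12-bit pixel made of a 2-bit control field and a 10-bit data field (enough to index 1024 palette entries), with the control codes ordered as on the Amiga. Only the RGB mode is sketched.

```python
# Assumed control codes, mirroring the Amiga's HAM ordering.
SET_FROM_PALETTE = 0
MODIFY_BLUE = 1
MODIFY_RED = 2
MODIFY_GREEN = 3


def decode_ham_line(words, palette, start=(0, 0, 0)):
    """Decode one scanline of assumed 12-bit HAM pixels into (r, g, b) tuples.

    Each pixel either loads a full colour from the palette or holds the
    previous pixel's colour and replaces a single 8-bit channel.
    """
    r, g, b = start
    pixels = []
    for word in words:
        control = (word >> 10) & 0x3
        data = word & 0x3FF
        if control == SET_FROM_PALETTE:
            r, g, b = palette[data]
        elif control == MODIFY_BLUE:
            b = data & 0xFF
        elif control == MODIFY_RED:
            r = data & 0xFF
        else:  # MODIFY_GREEN
            g = data & 0xFF
        pixels.append((r, g, b))
    return pixels
```

A sharp colour change needs up to three modify pixels to settle, one per channel, and that settling run is the fringing. In YCbCr mode the same mechanism would modify luma and chroma instead, so a single luma write corrects most of what the eye notices.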
The Palette
256 palette descriptors. Each is a base offset into a flat 16,384-entry RGBA32 palette RAM. The address space is reserved for 65,536 entries when you need to expand.
A 2-colour sprite uses 2 palette entries. A 256-colour background uses 256. Two palettes can share entries — point two descriptors at the same base offset and they share colours. Overlap them deliberately for gradient windows, sprite recolouring, shared transparency. The hardware just does the addition. One adder. Essentially free.
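The descriptor lookup really is one addition. A minimal sketch, assuming that an out-of-range sum wraps within the 16,384-entry RAM; the wrap behaviour and the function name are assumptions:

```python
PALETTE_RAM_ENTRIES = 16_384   # flat palette RAM size
DESCRIPTOR_COUNT = 256


def resolve(descriptors, palette_id, colour_index):
    """Palette RAM address for a pixel: descriptor base plus colour index.

    Wrapping with a mask is an assumption about out-of-range behaviour.
    """
    return (descriptors[palette_id] + colour_index) & (PALETTE_RAM_ENTRIES - 1)
```

Point descriptors 1 and 2 at base 100 and they share every colour. Point descriptor 3 at base 104 and its colour 0 is descriptor 1's colour 4, which is how a deliberate overlap gives you recolouring and gradient windows.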
The palette RAM has a second address window that presents the same data in HSV. Write hue, saturation, value — the hardware converts to RGB before storing. Read from the HSV window — the hardware converts back. The RAM always holds RGB. HSV is a view.
This means palette rotation, hue shifting, and brightness cycling are single-register writes. The Copper can shift the entire palette around the colour wheel every scanline. Raster bar effects, colour cycling, team colour swaps — all trivial.
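The HSV view can be modelled with Python's standard `colorsys` module. In this sketch the palette RAM is a plain list of 8-bit RGB tuples and hue, saturation and value are floats from 0 to 1; the hardware's actual HSV register encoding is not specified here, so that representation is an assumption.

```python
import colorsys


def write_hsv(ram, index, h, s, v):
    """Write through the HSV view: convert to RGB, then store RGB."""
    r, g, b = colorsys.hsv_to_rgb(h % 1.0, s, v)
    ram[index] = (round(r * 255), round(g * 255), round(b * 255))


def read_hsv(ram, index):
    """Read through the HSV view: the RAM holds RGB, so convert back."""
    r, g, b = ram[index]
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)


def rotate_hue(ram, start, count, delta):
    """Shift a run of palette entries around the colour wheel."""
    for i in range(start, start + count):
        h, s, v = read_hsv(ram, i)
        write_hsv(ram, i, h + delta, s, v)
```

Rotating pure red by a third of the wheel lands on pure green. In software this is a loop over entries. The claim for the hardware is that the equivalent shift is a register write the Copper can issue per scanline.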
The CRT Simulation
This is where it gets personal.
FireStorm applies CRT simulation in the output pipeline — after pixel replication, before the encoder — independently on each output. DP can run one mode, HDMI another, VGA a third. Simultaneously.
The simulation pipeline has six stages: bloom pre-pass at native resolution, pixel replication, row brightness mask (scanlines), column brightness mask (pixel boundaries), phosphor aperture mask, and final blend. Every stage is configurable. Every stage can be bypassed.
Seven named monitor profiles are built in:
Commodore 1084S — warm shadow mask, thick scanlines, medium bloom. The C64 in your bedroom, 1987.
Philips CM8833 — tighter shadow mask, sharper scanlines, cooler colour temperature. The Amiga demo scene monitor. If you had one of these you were serious.
Sony Trinitron — aperture grille, vertical RGB stripes, thin damper wires at a third and two-thirds of screen height. Saturated, sharp, slightly clinical. The professional's choice.
Sony PVM/BVM — Trinitron technology, broadcast-grade electronics. Razor scanlines, minimal bloom, clinically accurate. The monitor that makes pixel art look like the artist intended. The one the retro gaming community obsesses over.
Mitsubishi Diamondtron — Trinitron's warmer rival. Slightly wider stripe pitch, slightly more glow. The monitor you had if you couldn't find a Trinitron and secretly liked it better anyway.
Generic SCART TV — wide shadow mask, heavy bloom, the warm fuzzy softness of a 28-inch Matsui in the front room. Not accurate. Not sharp. Exactly how most people actually experienced these games.
JAMMA Arcade — slot mask, seven-eighths fill, hard-driven phosphors, a whisper of green tint. The fish and chip shop at the end of the road.
Call display.setMode("pvm") from your application. That's it. DeMon handles the rest.
Tate Mode
One more thing. A single bit in the simulation register rotates the entire CRT pipeline 90°. The dark gaps between scanlines run vertically instead of horizontally. The phosphor mask transposes. The bloom is unaffected.
This is Tate mode — named from the Japanese 縦 (tate, vertical). It exists for vertical arcade games on a horizontal screen. A shoot-em-up, a platformer, Donkey Kong. Pillarboxed on a widescreen display, with rotated CRT simulation active, it looks like a real upright arcade cabinet rather than a game squeezed into a letterbox.
At 4K, two Tate mode games fit side by side in a perfect 50/50 split. The 270×480 native resolution — tall pixels, PAR 1:2 — gives exactly 1920 output pixels per half. Two completely independent vertical games, each with their own layers, sprites, palettes, and rotated scanline simulation, composited on one screen by one Ant64.
display.setMode("tate_arcade") — left game. display.setMode("tate_arcade") — right game. Different X offsets. That's it.
VGA — The Unexpected Hero
The VGA output runs through a 5-bit-per-channel DAC — 32 levels per channel, 32,768 colours. On paper, worse than everything else.
In practice, it has two tricks nobody else is doing.
Spatial Bayer dithering spreads quantisation error across adjacent pixels so gradients appear continuous. The analogue signal path blurs the dither pattern naturally. 192 bits of lookup table. Essentially free.
Temporal half-bit dithering alternates the least significant bit on a 2×2 frame-and-line pattern. Even frame, even line: base value. Even frame, odd line: base+1. Odd frame, even line: base+1. Odd frame, odd line: base value. The eye integrates the average. You get half a DAC step that the hardware cannot physically produce.
This is the same trick used on the Amiga to get effective 5-bit output from a 4-bit DAC. Applied here it takes the VGA output from effective 5-bit to effective 6-bit per channel — 262,144 perceived colours from hardware rated at 32,768. On a CRT with phosphor persistence blending adjacent lines naturally, it is invisible as a trick and visible only as quality.
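The temporal pattern described above reduces to one XOR. A minimal sketch, assuming the pipeline hands the dither stage a 6-bit value per channel and that the result is clamped at the top of the 5-bit range; the function names are illustrative:

```python
DAC_MAX = 31  # highest code a 5-bit DAC can output


def temporal_dither(value6: int, frame: int, line: int) -> int:
    """Map a 6-bit channel value to a 5-bit DAC code for one frame and line.

    The low bit of the input is the "half step". It is added on
    (even frame, odd line) and (odd frame, even line), and dropped otherwise.
    """
    base = value6 >> 1
    half_step = value6 & 1
    toggle = (frame ^ line) & 1
    return min(DAC_MAX, base + (half_step & toggle))


def perceived_level(value6: int) -> float:
    """Average DAC code over the 2x2 frame-and-line pattern."""
    samples = [temporal_dither(value6, f, l) for f in (0, 1) for l in (0, 1)]
    return sum(samples) / len(samples)
```

For an input of 21 the DAC alternates between 10 and 11 and the average is 10.5, a level the DAC has no code for. The clamp means the very top input value cannot gain its half step, which is a consequence of this sketch's assumption rather than a documented behaviour.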
The analogue signal path also provides free horizontal bandwidth limiting: a partial simulation of limited CRT horizontal resolution at no computational cost. The "old" output may produce the most authentically retro image of the three external outputs.
The Numbers
| Feature | Specification |
|---|---|
| Native resolutions | 480×270 Standard (Amiga low res equivalent), 960×540 Hires (Amiga hires equivalent), and many others — axes independent |
| Primary output | DisplayPort HBR2, 4K/60 VRR to 75Hz |
| HDMI | 1080p/240Hz, 4K/75Hz |
| VGA | Analogue, 5-bit DAC, 6-bit effective with temporal dither |
| Palette RAM | 16,384 RGBA32 entries, 256 descriptors |
| Palette access | Dual window: RGBA and HSV |
| HAM mode | HAM24 — 12bpp, 8-bit channel precision, RGB or YCbCr |
| Sprite budget | Hundreds per scanline at 380MHz BSRAM |
| Scroll system | Per-tile-row H (Copper), per-tile-column V (register file, configurable step) |
| CRT profiles | 7 named monitors, all parameters individually overridable |
| Copper | General register-write engine — any register, any beam position |
| CPU display cost | Zero — FireStorm handles all composition autonomously |
What the Amiga Taught Us
The Amiga was great not because of its specs — though those were remarkable for 1985 — but because of its philosophy. The hardware trusted the programmer. The Copper trusted you to know what you were doing. The custom chips did real work so the CPU didn't have to. You could feel the machine under your hands.
Modern machines have forgotten this. They hide their hardware behind seventeen layers of abstraction, drivers, APIs, and operating systems that get between you and the metal. The result is powerful and impenetrable.
The Ant64 remembers. It's not a retro machine. It runs a modern RISC-V CPU, a 138K-LUT FPGA, and AntOS — a real operating system with networking, scripting, and a proper filesystem. But it thinks the way the Amiga thought. The hardware works for you. The display system does the heavy lifting. The Copper is yours.
You write a Copper list. The beam follows your instructions. The screen does what you tell it.
Just like it used to.