Ant64 Memory Architecture
Overview
The Ant64 family uses a federated memory architecture — each subsystem has access to the memory type best suited to its workload, on its own dedicated bus. This eliminates the bus contention that plagues unified-memory designs and allows the blitter, EE, and audio DSP to run at sustained throughput simultaneously rather than sharing a peak figure between them.
The Ant64S has BSRAM, FPGA flash, FireStorm DDR3, and the SG2000's own DDR3. The Ant64 and Ant64C add the Graphics SRAM and EE Code SRAM banks.
On-Chip BSRAM (all models)
GoWin GW5AT-138: 340 BSRAM blocks × 18Kbits = ~765KB total. Dual-port, up to 380MHz, 72-bit data width, giving ~3.4GB/s per block at full speed. Used for: sprite line buffers, tilemap data, clip tables, palette RAM, CRT simulation LUTs, texture cache hot set, Copper lists.
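As a quick check of these figures, the arithmetic can be reproduced in a few lines. This is a sketch only: it assumes 18 Kbit per block (the block size that yields the quoted ~765KB total), takes KB as 1024 bytes and GB as 10^9 bytes, and uses illustrative constant names.

```python
BLOCKS = 340
BITS_PER_BLOCK = 18 * 1024        # assumed 18 Kbit per block
PORT_WIDTH_BITS = 72              # widest data port quoted above
CLOCK_HZ = 380_000_000            # maximum BSRAM clock quoted above

# Total on-chip capacity in KB (1 KB = 1024 bytes).
total_kb = BLOCKS * BITS_PER_BLOCK / 8 / 1024

# Peak throughput of a single block in GB/s (1 GB = 10**9 bytes).
per_block_gb_s = PORT_WIDTH_BITS / 8 * CLOCK_HZ / 1e9

print(f"{total_kb:.0f} KB total, {per_block_gb_s:.2f} GB/s per block")
```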
Graphics SRAM — 2× IS61LPS51236B (Ant64 and Ant64C only)
Two IS61LPS51236B pipeline synchronous SRAM chips with chip select lines, mapped sequentially in address space to form a single 1M × 36-bit bank. The set has its own dedicated bus, completely independent from the EE Code SRAM set:
| Parameter | Per chip | Set total |
|---|---|---|
| Organisation | 512K × 36-bit | 1M × 36-bit |
| Capacity | 2.25MB | 4.5MB |
| Clock speed | 200MHz | 200MHz |
| Bandwidth | — | ~900MB/s |
| Access mode | Pipeline, 1-cycle latency, burst | Pipeline, 1-cycle latency, burst |
| Data bus width | 36-bit | 36-bit (shared by both chips) |
| Wait states | None | None |
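One way to picture the sequential mapping of the two chips is as a simple address split, where the upper half of the 1M-word range selects the second chip. The sketch below illustrates that assumption only; the actual chip-select wiring is not specified here, and the function name `decode` is hypothetical.

```python
WORDS_PER_CHIP = 512 * 1024           # 512K words of 36 bits per chip
BANK_WORDS = 2 * WORDS_PER_CHIP       # 1M words across the set

def decode(word_addr):
    """Split a bank word address into (chip index, word address within chip)."""
    if not 0 <= word_addr < BANK_WORDS:
        raise ValueError("address outside the 1M-word bank")
    return word_addr // WORDS_PER_CHIP, word_addr % WORDS_PER_CHIP

# The last word of chip 0 and the first word of chip 1 are adjacent addresses.
print(decode(WORDS_PER_CHIP - 1), decode(WORDS_PER_CHIP))
```

Because 512K is a power of two, the same split in hardware would amount to using the top address bit as the chip select.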
Pipeline mode: address on cycle N, data valid on cycle N+1. Burst mode: sequential addresses stream at full 200MHz clock rate after the initial pipeline fill — ideal for line buffer reads and sequential texture streaming. No wait states, no refresh, no page boundaries, no bus turnaround penalty.
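The ~900MB/s figure and the cost of the initial pipeline fill follow directly from the clock rate and word width. The small model below assumes exactly one fill cycle per burst, as described above, and takes MB as 10^6 bytes; the function names are illustrative.

```python
CLOCK_HZ = 200_000_000
WORD_BITS = 36
FILL_CYCLES = 1        # address on cycle N, data valid on cycle N+1

def peak_mb_per_s():
    """Streaming bandwidth once the pipeline is full."""
    return CLOCK_HZ * WORD_BITS / 8 / 1e6

def burst_cycles(words):
    """Total cycles for one burst, including the initial fill."""
    return FILL_CYCLES + words

def burst_efficiency(words):
    """Fraction of cycles in a burst that deliver data."""
    return words / burst_cycles(words)

print(peak_mb_per_s(), burst_cycles(40), round(burst_efficiency(40), 3))
```

The longer the burst, the closer the efficiency gets to 1, which is why sequential line-buffer and texture reads suit this memory well.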
The 36-bit data width is not accidental. 36 bits is evenly divisible by 3 — the number of channels in RGB — enabling pixel formats that 32-bit-wide memory cannot accommodate cleanly:
| Format | Pixels per 36-bit word | Colours | Use case |
|---|---|---|---|
| 3bpp (8-colour) | 12 pixels | 8 | Text screen, attribute modes |
| 4bpp (16-colour) | 9 pixels | 16 | Classic sprite/tilemap art |
| 12bpp (R4G4B4) | 3 pixels | 4,096 | Compact true-colour tiles |
| 18bpp (R6G6B6) | 2 pixels | 262,144 | Near-true-colour |
| R12G12B12 | 1 pixel | 68 billion | High-precision intermediate buffer |
| R9G9B9+A9 | 1 pixel + alpha | 512 per channel | Precision blending intermediate |
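To make the table concrete, here is a minimal packing sketch for formats whose pixel size divides 36 evenly. The bit ordering (pixel 0 in the least significant bits) is an assumption for illustration, as are the function names `pack` and `unpack`; the hardware's actual layout is not specified here.

```python
WORD_BITS = 36

def pack(pixels, bpp):
    """Pack a full word's worth of bpp-bit pixels into one 36-bit integer."""
    if WORD_BITS % bpp:
        raise ValueError("bpp must divide 36 evenly")
    if len(pixels) != WORD_BITS // bpp:
        raise ValueError("wrong number of pixels for this format")
    word = 0
    for i, p in enumerate(pixels):
        if not 0 <= p < (1 << bpp):
            raise ValueError("pixel value out of range")
        word |= p << (i * bpp)
    return word

def unpack(word, bpp):
    """Recover the list of bpp-bit pixels from a 36-bit word."""
    mask = (1 << bpp) - 1
    return [(word >> (i * bpp)) & mask for i in range(WORD_BITS // bpp)]

text_pixels = [0, 1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4]   # twelve 3bpp pixels
word = pack(text_pixels, 3)
print(hex(word), unpack(word, 3) == text_pixels)
```

The same two functions cover the 3, 4, 12, and 18bpp rows; with `bpp=9` they pack the four 9-bit fields of the R9G9B9+A9 format.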
High-colour intermediate buffers: The blitter can render to Graphics SRAM in R12G12B12 — 12 bits per channel, one pixel per 36-bit word. Bloom accumulation, alpha compositing, and colour correction operate at 12-bit precision before the blitter converts to the output pixel format. No premature clamping — precision where effects need it, compact format at output.
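The benefit of keeping intermediates at 12 bits per channel shows up when many small contributions are summed. The sketch below assumes an 8-bit-per-channel output format and a plain truncating conversion; both are illustrative choices, not a description of the blitter's actual conversion logic.

```python
def to_out8(v12):
    """Convert a 12-bit channel value to 8 bits by dropping the low 4 bits."""
    return min(v12, 0xFFF) >> 4

# Four faint bloom contributions, each smaller than one 8-bit step
# (one 8-bit step equals 16 in 12-bit units).
layers = [10, 10, 10, 10]

late = to_out8(sum(layers))               # accumulate at 12 bits, convert once
early = sum(to_out8(v) for v in layers)   # convert each layer first, then add

print(late, early)
```

Converting once at the end keeps a visible result (2), while converting each layer first discards all four contributions (0).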
8-colour text screen: At 12 pixels per 36-bit word, a Standard-resolution scanline (480 pixels wide) is 40 words — one burst of 40 cycles at 200MHz = 0.2μs per line.
Line buffers: a 480-pixel line in 12bpp (3 pixels per word) takes 160 burst cycles, or 0.8μs. Graphics SRAM serves as a colour-format staging area between rendering passes at speeds that don't constrain the blitter.
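The per-line timings can be derived for any of the packed formats. This sketch counts only the streaming cycles (it ignores the single pipeline-fill cycle, as the figures above do) and uses a hypothetical helper named `line_burst`.

```python
CLOCK_HZ = 200_000_000
LINE_PIXELS = 480          # Standard-resolution scanline width

def line_burst(pixels_per_word, line_pixels=LINE_PIXELS):
    """Return (words, microseconds) needed to stream one scanline."""
    # Round up: a partly filled final word still costs a full cycle.
    words = (line_pixels + pixels_per_word - 1) // pixels_per_word
    return words, words / CLOCK_HZ * 1e6

for name, ppw in [("3bpp", 12), ("4bpp", 9), ("12bpp", 3), ("18bpp", 2)]:
    words, us = line_burst(ppw)
    print(f"{name}: {words} words, {us:.2f} us")
```

Note that 480 is not a multiple of 9, so a 4bpp line needs 54 words with the last one only partly used.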
Shared with audio DSP: The Graphics SRAM bus is also used by the audio DSP for wavetables and FM voice data. Both workloads are naturally burst-sequential; they time-slice the bus without significant conflict in practice.
Primary use: active texture atlas, sprite sheets, tilemap tile data, high-colour intermediate buffers, audio wavetables and FM data. A typical scene fits comfortably within 4.5MB, allowing a complete frame to render with no DDR3 involvement once assets are loaded.
EE Code SRAM — 2× IS61LPS51236B (Ant64 and Ant64C only)
An identical second set of IS61LPS51236B chips with chip select lines, mapped sequentially as 1M×36-bit, on their own dedicated bus — exclusively for FireStorm Execution Engine instruction storage:
| Parameter | Per chip | Set total |
|---|---|---|
| Organisation | 512K × 36-bit | 1M × 36-bit |
| Capacity | 2.25MB | 4.5MB |
| Bandwidth | — | ~900MB/s |
| Access mode | Pipeline, 1-cycle latency, burst | Pipeline, 1-cycle latency, burst |
| Data bus width | 36-bit | 36-bit (shared by both chips) |
The 36-bit word width matches the EE's 36-bit instruction word exactly — one instruction per SRAM word, no packing or unpacking. 4.5MB is ample for all EE system tasks, Copper handlers, blit controllers, audio synthesis code, and application programs.
This bus is exclusively the EE's. No other device shares it. The EE fetches instructions at full burst bandwidth regardless of what the blitter, audio DSP, or any other subsystem is doing. This is the unconditional guarantee of the architecture.
Why Separate Banks Matter — Bus Isolation and Burst
A unified memory architecture — where EE instruction fetch, blitter texture reads, audio DMA, and CPU transfers all share the same DDR3 bus — requires an arbiter. Every device that wants the bus must queue. At 200MHz with multiple simultaneous requestors, wait states accumulate and theoretical bandwidth figures become peaks that no individual device ever actually sustains. The Ant64S, with DDR3 as its primary working memory, lives with this trade-off.
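A toy model shows why a shared bus turns peak bandwidth into a per-client fraction. It assumes a strict round-robin arbiter with every client requesting on every cycle. That policy is hypothetical and far simpler than a real DDR3 arbiter, which also deals with priorities, burst lengths, and page hits.

```python
from collections import Counter

def round_robin(requestors, cycles):
    """Grant the bus to one requestor per cycle, in strict rotation."""
    grants = Counter()
    for cycle in range(cycles):
        grants[requestors[cycle % len(requestors)]] += 1
    return grants

clients = ["EE fetch", "blitter", "audio DMA", "CPU"]
CYCLES = 1000
grants = round_robin(clients, CYCLES)

# Fraction of bus cycles each client actually receives.
share = {name: grants[name] / CYCLES for name in clients}
print(share)
```

With four always-busy clients, each one sustains a quarter of the bus, however high the headline figure is.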
The Ant64 and Ant64C do not. The memory subsystem is split across physically separate buses, and each subsystem has high-speed dedicated paths for bulk data transfer:
FireStorm FPGA (memory buses)
├── BSRAM (on-chip, 380MHz)   ← blitter destination, line buffers, palette
├── Graphics SRAM bus         ← blitter (textures, intermediate buffers)
│                               audio DSP (wavetables, FM voice data)
├── EE Code SRAM bus          ← EE instruction fetch ONLY
├── FPGA flash bus            ← read-only system assets (all subsystems)
├── FireStorm DDR3 bus        ← art library, audio sample banks,
│                               framebuffers; blitter + audio DSP only
└── (external interfaces, see below)

SG2000 ↔ FireStorm
├── QSPI FRAM                 ← mapped memory window, SG2000 side (slow/control)
├── SG2000 MIPI TX →          ← bulk data INTO FireStorm at D-PHY speeds
│   FireStorm MIPI RX           (display overlay layer + arbitrary bulk transfer)
└── FireStorm LVDS TX →       ← bulk data OUT of FireStorm to SG2000
    bridge → SG2000             (composited pixel stream + data return path)
    Sub-LVDS RX

Pulse (RP2350) ↔ FireStorm
├── QSPI FRAM                 ← mapped memory window, Pulse side (independent
│                               from SG2000's FRAM interface)
└── 4-wire PIO link           ← high-speed dedicated transfer via RP2350
                                programmable PIO (audio DMA, MIDI timing data)

DeMon (RP2350) ↔ FireStorm
├── QSPI FRAM                 ← mapped memory window, DeMon side (independent)
└── JTAG                      ← FPGA programming + debug; chain extends to
                                FPGA flash via FPGA JTAG

DeMon ↔ SG2000
└── JTAG                      ← SG2000 programming + debug

SG2000 DDR3 (512MB)           ← AntOS, apps, SG2000 framebuffer, TPU
                                (separate from FireStorm DDR3, not shared)
The EE instruction bus is fully isolated — no other device shares it.
The Graphics SRAM bus is shared between the blitter and the audio DSP. Both are FPGA-internal clients, so arbitration is lightweight. Audio accesses wavetable and FM data in short, predictable bursts at sample rate, leaving the majority of each sample period available to the blitter.
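A rough estimate gives a sense of how small the audio share of the Graphics SRAM bus can be. The workload numbers below (48kHz sample rate, 64 voices, one 36-bit word fetched per voice per sample) are hypothetical, chosen only for illustration, and the estimate ignores arbitration and pipeline-fill overhead.

```python
BUS_HZ = 200_000_000       # one 36-bit word per cycle at full burst rate

def audio_bus_share(sample_rate_hz, voices, words_per_voice_sample):
    """Fraction of bus cycles consumed by audio fetches."""
    words_per_second = sample_rate_hz * voices * words_per_voice_sample
    return words_per_second / BUS_HZ

# Hypothetical workload: 48kHz, 64 voices, 1 word per voice per sample.
share = audio_bus_share(48_000, 64, 1)
print(f"audio uses about {share:.1%} of the bus")
```

Under these assumptions audio takes roughly 1.5% of the available cycles and the rest remain for the blitter.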
FireStorm's DDR3 is not accessible to the SG2000 directly. The SG2000 has its own 512MB DDR3. The QSPI FRAM bus is the narrow control/small-data bridge. For bulk data transfer in either direction, the MIPI and LVDS links are the fast paths — orders of magnitude faster than QSPI for large asset uploads or data retrieval.
Burst mode makes sequential access efficient across all clients:
- EE prefetch window — instruction stream advances linearly; burst fills the window in a sequential run, zero fetch stalls until it empties
- Blitter line buffer reads — purely sequential; a 480-pixel Standard line at 12px/word = 40 burst cycles = 0.2μs
- Audio wavetable reads — sequential phase accumulation through a wavetable is a textbook burst access pattern; pipeline latency paid once per note envelope segment
- Tilemap tile and compositing passes — sequential within each tile row and scanline
The architecture gives each client the right bus for its workload. The bandwidth figures are sustained figures, not theoretical peaks shared between competing devices.
FireStorm DDR3 (all models)
FireStorm's DDR3 is the FPGA subsystem's bulk store — art library, large audio sample banks, streamed audio, bitmap framebuffers. ~2.5GB/s sequential bandwidth, variable latency. Shared between the blitter and the audio DSP via the FireStorm DDR3 arbiter. Not suited to random-access or low-latency synthesis; Graphics SRAM and FPGA flash handle those roles on Ant64/Ant64C.
The SG2000 does not have direct access to FireStorm's DDR3. FireStorm's DDR3 arbiter serves only the blitter and audio DSP.
SG2000 DDR3 — 512MB (all models)
The SG2000 has 512MB of its own DDR3, separate from FireStorm's DDR3. This is the memory AntOS runs in — applications, file system cache, the SG2000's own video buffers, TPU working memory. The SG2000 cannot directly address FireStorm's DDR3; it sees only what FireStorm maps into the FRAM window.
FPGA Flash (all models)
The FireStorm FPGA has access to a flash chip used as a read-only asset store for built-in system data that never changes at runtime:
- Built-in wavetables — standard waveforms (sine, square, saw, triangle, noise variants) for the audio synthesiser
- FM voice data — operator configurations, envelope presets, algorithm tables for the FM synthesis engine
- Standard font bitmaps — the system font and built-in character sets, available to any rendering subsystem without consuming Graphics SRAM or DDR3 capacity
- Default palette data — factory palettes, CRT profile LUTs, and other read-only display parameters
- Boot graphics — splash screen, AntOS boot assets
Flash read bandwidth is lower than SRAM but adequate for these workloads — font data and FM tables are small enough to be copied to BSRAM or Graphics SRAM at boot time for faster runtime access if needed. The key advantage is that none of this consumes Graphics SRAM or DDR3 capacity: static assets that never change live in flash; the fast RAM is reserved for scene data that does.
DeMon has access to the FPGA flash via the FPGA's JTAG chain — it can reprogram flash contents without the FPGA bitstream being involved.
The FRAM Model — FireStorm as a Memory-Mapped Peripheral
From the perspective of every external chip — SG2000, Pulse, DeMon — FireStorm is simply a block of memory-mapped addresses accessible over QSPI. There is no special protocol, no command layer, no driver handshake. Writing to an address updates a FireStorm register or writes into SRAM. Reading from an address returns a register value or SRAM contents. It is exactly the same programming model as any other memory-mapped peripheral on an embedded system — the FPGA just has a very large and very capable register file.
This means:
- The SG2000 writes a palette entry by writing a 32-bit value to the palette RAM address in its FRAM window. The palette updates on the next scanline. No IPC, no kernel call, no DMA setup.
- DeMon writes a display mode register to force a specific CRT profile at boot — one 16-bit write to the right address.
- Pulse writes audio parameter registers — volume, pitch, envelope — as ordinary memory writes, timed by its own PIO logic.
- Any chip can read back status registers, scanline counters, blitter job IDs, or completion flags as simple memory reads.
The FPGA controls what address ranges are visible in each chip's FRAM window and what they map to internally. A window can expose palette RAM, sprite attribute tables, layer registers, the Copper list, audio registers, blitter job queues, or any region of Graphics SRAM or FireStorm DDR3. Each of the three FRAM interfaces (SG2000, Pulse, DeMon) is independent — they can have different mappings, different access permissions, and never contend with each other.
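The window model can be mimicked in software to show how per-chip mappings and permissions might behave. Everything in this sketch is hypothetical: the class name `FramWindow`, the addresses, the region names, and the little-endian byte order are illustrative choices and are not taken from the Ant64 register map.

```python
class FramWindow:
    """Simulated FRAM window: address ranges mapped onto backing storage."""

    def __init__(self):
        self._regions = []    # (base, size, name, backing bytes, writable)

    def map_region(self, base, size, name, writable=True):
        self._regions.append((base, size, name, bytearray(size), writable))

    def _find(self, addr, length):
        for base, size, _name, mem, writable in self._regions:
            if base <= addr and addr + length <= base + size:
                return addr - base, mem, writable
        raise ValueError(f"address {addr:#x} is not mapped in this window")

    def write32(self, addr, value):
        off, mem, writable = self._find(addr, 4)
        if not writable:
            raise PermissionError(f"address {addr:#x} is read-only")
        mem[off:off + 4] = value.to_bytes(4, "little")

    def read32(self, addr):
        off, mem, _ = self._find(addr, 4)
        return int.from_bytes(mem[off:off + 4], "little")

# Hypothetical SG2000-side window: writable palette RAM, read-only status.
sg2000 = FramWindow()
sg2000.map_region(0x0000, 1024, "palette")
sg2000.map_region(0x1000, 16, "status", writable=False)

sg2000.write32(0x0004, 0x00FF8800)      # "poke" palette entry 1
print(hex(sg2000.read32(0x0004)))
```

Giving each external chip its own `FramWindow` instance mirrors the independence of the three interfaces: each can have its own mapping and permissions.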
In principle, FireStorm's EE internal registers could also be mapped into a FRAM window — making D0–D31 of whatever task is currently running appear as memory locations to the SG2000 or DeMon. This is not recommended. An EE core context-switches in ~2 cycles; by the time a QSPI read completes, the register values belong to a different task. The feature exists as a potential debug curiosity, not as a programming model.
If you wrote games or demos in the 80s or 90s, you already know this feeling. POKE $D000 on the C64 and a sprite moved. Write to $DFF180 on the Amiga and a colour changed on the next scanline. The machine did what you told it, immediately, because you spoke to it directly. The FRAM model gives you that on the Ant64 — poke a memory location to move a sprite, poke another to fire a blitter job, poke another to change the CRT simulation mode.
Inter-Subsystem Data Paths
SG2000 → FireStorm (bulk data in): The SG2000 MIPI TX drives FireStorm's MIPI RX hardcell at up to 2.5Gbps. Primarily used to push the SG2000's CPU-rendered overlay layer into the FireStorm compositor, but the same physical link can carry arbitrary bulk data — art assets, compressed textures, pre-computed data — into FireStorm memory at D-PHY speeds. Orders of magnitude faster than FRAM for large transfers.
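For a sense of scale, the time to move a Graphics SRAM's worth of assets over a link can be estimated from the raw bit rate. The sketch treats the 2.5Gbps figure as the aggregate payload rate and ignores protocol overhead, which is a simplifying assumption; the helper name `transfer_ms` is illustrative. Any other link can be compared by passing its own bit rate.

```python
def transfer_ms(num_bytes, link_bits_per_s):
    """Idealised transfer time in milliseconds, with no protocol overhead."""
    return num_bytes * 8 / link_bits_per_s * 1000

GRAPHICS_SRAM_BYTES = int(4.5 * 2**20)     # 4.5MB, taken as 4.5 * 2**20 bytes
MIPI_BITS_PER_S = 2.5e9                    # "up to 2.5Gbps" from the text

mipi_ms = transfer_ms(GRAPHICS_SRAM_BYTES, MIPI_BITS_PER_S)
print(f"4.5MB over MIPI: about {mipi_ms:.1f} ms")
```

At that rate a full 4.5MB upload takes on the order of 15ms, roughly one frame period at 60Hz.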
FireStorm → SG2000 (bulk data out): The FireStorm LVDS TX drives the SG2000's Sub-LVDS RX via the bridge chip. Primarily the composited pixel stream (display output + hardware MJPEG encoder feed + TPU vision input), but the physical link provides a high-bandwidth return path from FireStorm to the SG2000 when needed.
Pulse ↔ FireStorm: Pulse has two paths to FireStorm. The QSPI FRAM interface is the control path — register reads and writes, parameter updates, short commands. The 4-wire PIO link is the fast path — the RP2350's programmable PIO implements a custom high-speed serial protocol that can drive audio DMA transfers, MIDI timing-critical data, and jog dial / control surface events at rates the QSPI bus cannot sustain. Being PIO-based, the protocol is defined in firmware and can be changed without hardware modification.
DeMon ↔ FireStorm / SG2000 / FPGA flash: DeMon's QSPI FRAM interface gives it register-level access to FireStorm for system control, monitoring, and configuration. DeMon's JTAG connections give it programming and debug access to the SG2000, the FPGA bitstream, and — via the FPGA's internal JTAG chain — the FPGA flash. DeMon can reprogram the entire system: SG2000 firmware, FPGA bitstream, and FPGA flash contents, all from one supervisor chip over JTAG.
Memory Map Summary
| Memory / Bus | Capacity | Bandwidth | Ant64S | Ant64/C | Accessible to |
|---|---|---|---|---|---|
| BSRAM (on-chip) | ~765KB | ~3.4GB/s/block | ✓ | ✓ | Blitter, display pipeline, audio buffers |
| Graphics SRAM | 4.5MB | ~900MB/s | — | ✓ | Blitter + audio DSP |
| EE Code SRAM | 4.5MB | ~900MB/s | — | ✓ | EE only |
| FPGA flash | varies | lower than SRAM (read-only) | ✓ | ✓ | All FireStorm subsystems; DeMon via JTAG |
| FireStorm DDR3 | varies | ~2.5GB/s | ✓ | ✓ | Blitter + audio DSP |
| SG2000 DDR3 | 512MB | — | ✓ | ✓ | SG2000 only |
| FRAM (SG2000) | window | QSPI speed | ✓ | ✓ | SG2000 ↔ FireStorm mapped window |
| FRAM (Pulse) | window | QSPI speed | ✓ | ✓ | Pulse ↔ FireStorm mapped window |
| FRAM (DeMon) | window | QSPI speed | ✓ | ✓ | DeMon ↔ FireStorm mapped window |
| MIPI TX (SG2000→) | — | up to 2.5Gbps | ✓ | ✓ | SG2000 → FireStorm bulk data in |
| LVDS (FireStorm→) | — | high speed | ✓ | ✓ | FireStorm → SG2000 bulk data out |
| PIO link (Pulse↔) | — | high speed | ✓ | ✓ | Pulse ↔ FireStorm fast transfer |
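For tooling or documentation scripts, the memory rows of the summary can be held as plain data and queried per model. This is a sketch: the `Memory` type and `available` helper are hypothetical names, and only the six memory rows (not the link rows) are included.

```python
from typing import NamedTuple

class Memory(NamedTuple):
    name: str
    capacity: str
    on_ant64s: bool
    on_ant64_and_c: bool

# Transcribed from the memory rows of the summary table.
MEMORIES = [
    Memory("BSRAM (on-chip)", "~765KB", True, True),
    Memory("Graphics SRAM", "4.5MB", False, True),
    Memory("EE Code SRAM", "4.5MB", False, True),
    Memory("FPGA flash", "varies", True, True),
    Memory("FireStorm DDR3", "varies", True, True),
    Memory("SG2000 DDR3", "512MB", True, True),
]

def available(model):
    """List the memories present on a given model."""
    if model == "Ant64S":
        return [m.name for m in MEMORIES if m.on_ant64s]
    if model in ("Ant64", "Ant64C"):
        return [m.name for m in MEMORIES if m.on_ant64_and_c]
    raise ValueError(f"unknown model: {model}")

print(available("Ant64S"))
```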