FireStorm™ Execution Engine

The FireStorm Execution Engine is not a RISC-V, ARM, or 68k CPU. It is a hardware-scheduled task engine with a compact, expressive ISA designed specifically for real-time multimedia workloads.

The Execution Engine (EE) is a deterministic, multi-core task engine embedded inside the FireStorm FPGA. It exists to run graphics, audio, copper, blitter, and system micro-tasks faster and more predictably than a general-purpose CPU.


The FireStorm EE Advantage

FireStorm EE code is always relocatable. No absolute jumps, no relocation tables, no GOT/PLT, no veneer thunks, no "linker magic". Position-independent code is a core design principle.

  • Instructions are always one word
  • Using Tri-Ops™, the EE can execute up to 3 operations per cycle/instruction
  • EE cores are closer to four synchronised DSPs than a traditional SMP CPU

Core Architectural Differences

| Feature | 68K | ARM32 | FireStorm EE |
|---|---|---|---|
| Data registers | 8 (D0–D7) | 16 GPRs | 32 (Ant64S: 2× D0–D15) / 64 (Ant64/C: 2× D0–D31) |
| Address registers | 8 (A0–A7) | Same as data | 32 (Ant64S: 2× A0–A15) / 64 (Ant64/C: 2× A0–A31) |
| Register pairing | Implicit (mul/div) | Awkward | Explicit (.q) everywhere |
| Register blitting | No | No | Any data register |
| Program counter | Exception, can misalign | Exception, can misalign | Offset in opcode words, no misalignment |
| Stack register | Fixed A7 | Fixed SP | Hardware SP, plus any A-reg can be a stack |
| Base register | | | Points to the base of task data memory |

The FireStorm EE's separate data and address register files are very intentionally 68k-inspired, and it matters.


Instruction Set Expressiveness

Capability 68K ARM32 FireStorm EE
Conditional execution Limited Good (pre-Thumb) Full CC field on 99% of the base ISA
Operand order source → dest dest → source 68k-style source → dest
Addressing modes Rich Minimal Rich plus RISC three/four operands
Address vs data regs Yes No Yes
Code addressing Mixed Mixed All relative
Relocatable code Optional / manual Toolchain-dependent Always
PC-relative branches Yes Yes Native
Absolute jumps Common Common Not allowed
Jump tables Absolute / PC-rel Absolute / PC-rel Relative tables
Stack dependency Strong Strong Optional / register based
Self-contained tasks Partial Rare Yes
Copy and execute code blocks Risky Cache-sensitive Trivial
Post/pre inc addressing Yes No Yes
Scaled indexed addressing Yes Limited Yes
DBRA loop Yes No Yes (dbra, dbraz)
MOVEM Yes No Yes (alt ISA)
XCHG / SWAP Limited No Yes
Native fixed-point support Partial (manual) Partial (DSP extensions) Yes (architectural)
Fixed-point MAC No (68881 / DSP needed) Only with DSP extensions Yes
Saturation and rounding No Optional (SIMD/DSP) Built-in
Deterministic latency Mostly No (pipeline variance) Yes
Rotate + shift for scaling Multi-cycle Variable Single-cycle barrel
Fixed-point friendly ISA Accidental Secondary Designed for it
Audio-safe math Risky Cache-dependent Yes
64-bit ops Awkward Limited Native via .q

The FireStorm EE deliberately feels familiar to 68k programmers, but executes more like a modern RISC core.


Example: Memory Copy Loop

68K:

    move.l  #500-1, d0
    lea     src, a0
    lea     dst, a1
loop:
    move.l  (a0)+, (a1)+
    dbra    d0, loop

ARM32:

    ldr     r0, =src
    ldr     r1, =dst
    mov     r2, #500
loop:
    ldr     r3, [r0], #4
    str     r3, [r1], #4
    subs    r2, r2, #1
    bne     loop

FireStorm EE:

    move.l  #500, d0
    move.l  src, a0
    move.l  dst, a1
    copy.l  (a0)+, (a1)+, d0    ; starts DMA copy in background

The FireStorm EE version is clearer than the 68k one, takes one fewer instruction (no loop branch needed), runs the copy in the background, and executes on a modern FPGA clock with no bus arbitration penalty.


Memory and Hardware Integration

68K:

  • Unified address space
  • Self-modifying code possible
  • Hard to cache safely
  • Memory-mapped hardware
  • CPU must wait on everything
  • Copper/blitter are separate but dumb

ARM32:

  • Harvard-ish
  • Heavy cache management
  • Self-modifying code expensive
  • Everything goes through the CPU
  • Peripherals are slow MMIO
  • DMA exists but is OS-heavy

FireStorm EE:

  • Lives inside the FPGA
  • Zero-cost access to hardware
  • Can be started by hardware (Copper, etc.)
  • Can launch hardware (blits, audio)
  • Code and data are logically separate
  • Code memory is word-addressed
  • Writes to code space are rare and controlled
  • Prefetch window can be safely invalidated
  • Safe background prefetching

This is not a CPU talking to hardware. This is hardware talking to hardware, with the FireStorm EE as the glue logic.


Execution Model (the hidden superpower)

Feature 68K ARM32 FireStorm EE
Cores 1 1–4 2 (Ant64S) / 4 (Ant64/C)
Instruction fetch Raw memory Cache/predictor BRAM prefetch window
Backward loop behaviour Decent Predictor dependent Explicit "behind window"
Context switch cache penalty Huge Huge Usually none
Hardware scheduling + cache stickiness No No Yes
Task switch Slow Slow ~2 cycles (reg/state swap)
Scheduler OS OS Hardware
Idle cores Waste Waste Repurposed instantly
Reaper reclaimer No No Redundant tasks/jobs removed
Repeat without fetch No No Up to 16 instructions in length

FireStorm EE cores are:

  • Not tied to copper, blitter, or audio
  • General-purpose, with fast context switching
  • Schedulable by hardware events

Register Blitting

Register Blitting is a FireStorm EE feature you won't find on other CPUs. Instead of forcing every small effect through a separate graphics unit, data registers themselves can be attributed with store-time behaviour — swizzling formats, masking writes, XOR or alpha-controlled drawing, and read-modify-write accumulation.

Combined with large register files and tight counted loops, this lets the CPU handle fine-grained graphics and effects directly, while the hardware blitter focuses on bulk work. It's a deliberate design for games, demos, and creative computing.

Example: Alpha-Controlled Masked Merge

The slot 0 store swizzles ARGB→BGRA and does a read-modify-write (RMW) merge keyed on source alpha:

  • If A==0: new = old
  • Else: new = (old & ~MASK) | (swz(src) & MASK)
  • MASK selects which bytes/bits are affected when A!=0
  • Example: MASK = 0x00FFFFFF updates RGB and keeps the destination top byte

Performance (cycles/pixel): 68K 150–180 · ARM32 15–30 · FireStorm EE typically 6–12, best case 3
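
The merge rule above can be modelled in a few lines of Python. This is only a sketch of the stated semantics, not the hardware implementation: it assumes alpha is the top byte of the source word, and it assumes the swizzle is the same byte swap (bits 23:16 with bits 7:0) that the 68K and ARM32 reference loops perform. The names `swizzle_rb` and `alphakey_merge` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def swizzle_rb(pixel: int) -> int:
    """Swap bits 23:16 with bits 7:0 (assumed swizzle, matching the reference loops)."""
    return (pixel & 0xFF00FF00) | ((pixel & 0x00FF0000) >> 16) | ((pixel & 0x000000FF) << 16)

def alphakey_merge(old: int, src: int, mask: int) -> int:
    """Return the new destination word for one alpha-keyed masked store."""
    alpha = (src >> 24) & 0xFF          # assumption: alpha is the top byte of the source
    if alpha == 0:
        return old & MASK32             # A == 0: destination is left unchanged
    return ((old & ~mask) | (swizzle_rb(src) & mask)) & MASK32

opaque = alphakey_merge(0x11223344, 0xFFABCDEF, 0x00FFFFFF)       # 0x11EFCDAB
transparent = alphakey_merge(0x11223344, 0x00ABCDEF, 0x00FFFFFF)  # 0x11223344
```

With MASK = 0x00FFFFFF the destination top byte (0x11) survives in both cases, which is the behaviour the example bullet describes.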

68K:

loop:
    move.l  (a0)+, d0
    cmp.l   #$00FFFFFF, d0      ; alpha (top byte) == 0?
    bls.s   skip
    move.l  d0, d1
    and.l   #$FF00FF00, d1
    move.l  d0, d2
    and.l   #$00FF0000, d2
    lsr.l   #16, d2
    move.l  d0, d3
    and.l   #$000000FF, d3
    lsl.l   #16, d3
    or.l    d2, d1
    or.l    d3, d1
    move.l  (a1), d4
    and.l   #$00FFFFFF, d1
    and.l   #$FF000000, d4
    or.l    d1, d4
    move.l  d4, (a1)
skip:
    addq.l  #4, a1
    dbra    d7, loop

ARM32:

loop:
    ldr     r3, [r0], #4
    movs    r4, r3, lsr #24
    beq     transparent
    and     r5, r3, #0x0000FF00     @ keep G; alpha is masked off below (0xFF00FF00 is not encodable in ARM mode)
    and     r6, r3, #0x00FF0000
    lsr     r6, r6, #16
    and     r7, r3, #0x000000FF
    lsl     r7, r7, #16
    orr     r3, r5, r6
    orr     r3, r3, r7
    ldr     r4, [r1]
    and     r3, r3, #0x00FFFFFF
    bic     r4, r4, #0x00FFFFFF
    orr     r4, r4, r3
    str     r4, [r1]
transparent:
    add     r1, r1, #4
    subs    r2, r2, #1
    bne     loop

FireStorm EE:

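rbsetilo  #lo16(CFG_SWZ_ARGB_TO_BGRA | CFG_RMW_ALPHAKEY), slot0   ; slot 0 control, low word
rbsetihi  #hi16(CFG_SWZ_ARGB_TO_BGRA | CFG_RMW_ALPHAKEY), slot0   ; slot 0 control, high word
rbsetmlo  #$FFFF, slot0                                           ; slot 0 mask = $00FFFFFF (RGB only)
rbsetmhi  #$00FF, slot0
rbsel     slot0, D0                                               ; bind D0 to slot 0
rbenable  #$01, D0                                                ; bit 0 = store enable
loop:
    move  (A0)+, D0         ; load source pixel
    move  D0, (A1)+         ; store applies swizzle and alpha-keyed merge
    dbra  D2, loop
rbenable  #0, D0                                                  ; switch blitting off again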

Tri-Ops™: Parallelism Without the Pain

Most CPUs chase performance with wider pipelines and complex schedulers. Ant64 takes a different path.

Tri-Ops™ allow three simple operations to execute together in a single cycle, using a compact, deterministic encoding that's easy for hardware — and for humans. No speculative execution. No hidden stalls. Just more work done per cycle.

Tri-Ops excel at pointer arithmetic, loop control, and bit-level work — keeping hot loops small, readable, and fast.

Forwarding Model

Each slot runs conceptually in parallel, but with an explicit forwarding path from earlier slots to later slots:

S0 → S1    (Slot1 can see Slot0's written value)
S0 → S2    (Slot2 can see Slot0's written value)
S1 → S2    (Slot2 can see Slot1's written value)

Tri-Ops Examples

; Example 1 — D0 &= (D0 - 1)
mov d0, d3 : dec d3 : and d3, d0
; D3 = D0
; D3 = D3 - 1
; D0 = D0 & D3

; Example 2 — 64-bit add: (D1:D0) += (D3:D2)
add d2, d0 : adc d3, d1 : nop
; Slot0: D0 = D0 + D2   (sets C)
; Slot1: D1 = D1 + D3 + C
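
The forwarding model can be checked with a small Python simulation of the two examples. This is a behavioural sketch under stated assumptions, not a description of the real pipeline: slots are simply evaluated in order (which gives the S0 → S1 → S2 forwarding), only slot 0 updates the carry (per the "sets flags by default" note in the TriOps format), and the `run_triop` helper and its tuple format are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def run_triop(regs: dict, slots: list) -> int:
    """Run up to three (op, src, dst) slots in order on 32-bit registers.

    Later slots see values written by earlier slots. Returns the carry
    produced by slot 0 (assumption: only slot 0 writes flags).
    """
    carry = 0
    for index, (op, src, dst) in enumerate(slots):
        if op == "mov":
            regs[dst] = regs[src]
        elif op == "dec":
            regs[dst] = (regs[dst] - 1) & MASK32
        elif op == "and":
            regs[dst] &= regs[src]
        elif op in ("add", "adc"):
            total = regs[dst] + regs[src] + (carry if op == "adc" else 0)
            regs[dst] = total & MASK32
            if index == 0:
                carry = total >> 32
        elif op != "nop":
            raise ValueError(f"unsupported op: {op}")
    return carry

# Example 1: clear the lowest set bit, D0 &= (D0 - 1)
r1 = {"d0": 0b1011000, "d3": 0}
run_triop(r1, [("mov", "d0", "d3"), ("dec", None, "d3"), ("and", "d3", "d0")])

# Example 2: 64-bit add, (D1:D0) += (D3:D2)
r2 = {"d0": 0xFFFFFFFF, "d1": 0x1, "d2": 0x1, "d3": 0x2}
run_triop(r2, [("add", "d2", "d0"), ("adc", "d3", "d1"), ("nop", None, None)])
```

After the first call `r1["d0"]` is 0b1010000; after the second, D1:D0 holds 0x4:0x00000000, the sum of 0x1_FFFFFFFF and 0x2_00000001.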

Instruction Timing

Instruction class 68K ARM32 FireStorm EE
ALU ops 2–8 cycles 1 cycle 1 cycle
Shift/rotate Variable 1–3 cycles 1 (barrel)
Mul 38+ cycles 3–5 cycles 1–3 cycles (non-blocking)
Div Up to 158 cycles 8–21 cycles 16–32 cycles (non-blocking)
Branch Expensive Pipeline flush Cheap (no predictor)

FireStorm EE trades speculation, caches, and out-of-order madness for predictability and bandwidth — which is exactly what demo code wants.


Architectural Overview

Feature Ant64S Ant64 / Ant64C
EE cores 2 4
Pipeline depth 3 stages 4 stages
Instruction width 32 bits 36 bits
Registers 64 (2× 16D + 2× 16A) 128 (2× 32D + 2× 32A)
Alternate ISA
Tri-Ops™
Fixed point
Float assist

Each EE core:

  • Executes one task at a time
  • Has private register state
  • Can be scheduled or descheduled in ~2 cycles
  • Is returned to the scheduler via ENDTASK

Execution Model

The FireStorm EE uses a hardware task scheduler.

Tasks are launched by: Copper · Blitter · Audio engine · FireStorm control logic

Tasks run until they execute ENDTASK or are pre-empted by the scheduler.

Context switching swaps the entire register file + flags in hardware — no software save/restore.


Syntax Rules

  • Operand order: source, destination
  • Only round brackets
  • No square brackets (reserved)
  • No PC-relative data

MOVE D0, D1
ADD  #4, A0
MOVE (A0)+, D0
MOVE D1, -(A1)

Register Model

Ant64S — Base ISA — 64 registers (2× 32)

  • Data registers: D0–D15 (register blitting capable)
  • Address registers: A0–A15
  • Both banks can be swapped with an alternate bank

Ant64 / Ant64C — Base ISA — 128 registers (2× 64)

  • Data registers: D0–D31 (register blitting capable)
  • Address registers: A0–A31
  • Both banks can be swapped with an alternate bank

Base ISA semantics are identical between Ant64S and Ant64/C. Ant64/C simply exposes more registers through the V3–V0 extension bits.


Flags — BIRDXNZVC

Flag Name Description
B Blitting Register blitting effects enabled
I Interrupts Interrupts enabled
R Reserved
D Division Division by zero
X eXtended Sticky carry
N Negative Result is negative
Z Zero Result is zero
V oVerflow Arithmetic overflow
C Carry Carry out

Base ISA Instruction Format

Ant64/Ant64C adds 4 extension bits [35:32]:

Name Bits Meaning
V3 [35:35] Extend RD or variant 3 (Ant64/C only)
V2 [34:34] Extend RA or variant 2 (Ant64/C only)
V1 [33:33] Extend RB or variant 1 (Ant64/C only)
V0 [32:32] Extend RC or variant 0 (Ant64/C only)
Alt [31:31] Alternative ISA
OPC [30:27] Opcode group (0–15)
SUB [26:22] Sub-op within group (0–31)
SZ [21:20] Size: 00=8, 01=16, 10=32, 11=variant/64-bit
CC [19:16] Condition code
RD [15:12] Destination register index
RA [11:8] Register A
RB [7:4] Register B / EA2b
RC [3:0] Register C / EA2c

Effective register index:

RD_eff = (V3<<4) | RD
RA_eff = (V2<<4) | RA
RB_eff = (V1<<4) | RB
RC_eff = (V0<<4) | RC

Variant-bit rule: If a sub-op does not use one of RD/RA/RB/RC, its corresponding V bit becomes a free variant bit (Ant64/C only). Ant64S has no such variants.
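
As a rough illustration of the field layout and the effective-index formulas, here is a Python decoder sketch. The bit positions come from the format table above; the function name `decode_base` and the returned dictionary keys are hypothetical. It always treats V3–V0 as register extensions, so it does not model the variant-bit rule, and a 32-bit Ant64S word simply decodes with all V bits as zero.

```python
def decode_base(word: int) -> dict:
    """Split a base-ISA instruction word (32 or 36 bits) into its fields."""
    def bits(hi: int, lo: int) -> int:
        return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

    v3, v2, v1, v0 = bits(35, 35), bits(34, 34), bits(33, 33), bits(32, 32)
    return {
        "alt": bits(31, 31),
        "opc": bits(30, 27),
        "sub": bits(26, 22),
        "sz": bits(21, 20),
        "cc": bits(19, 16),
        "rd_eff": (v3 << 4) | bits(15, 12),   # RD_eff = (V3<<4) | RD
        "ra_eff": (v2 << 4) | bits(11, 8),    # RA_eff = (V2<<4) | RA
        "rb_eff": (v1 << 4) | bits(7, 4),     # RB_eff = (V1<<4) | RB
        "rc_eff": (v0 << 4) | bits(3, 0),     # RC_eff = (V0<<4) | RC
    }

# Hand-built word: V3 and V0 set, OPC=2, SUB=1, SZ=2, RD=5, RA=3, RB=1, RC=7
word = (1 << 35) | (1 << 32) | (2 << 27) | (1 << 22) | (2 << 20) | (5 << 12) | (3 << 8) | (1 << 4) | 7
fields = decode_base(word)
```

Here `fields["rd_eff"]` is 21 and `fields["rc_eff"]` is 23, while RA and RB stay in the low bank at 3 and 1.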


Immediate Encoding Table

✔ = part of immediate; otherwise column shows what that field is used for

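| # | Name | Bits | SZ | CC | RD | RA | RB | RC |
|---|---|---|---|---|---|---|---|---|
| 0 | NA | 0 | SZ | CC | RD | RA | RB | RC |
| 1 | IMM2 | 2 | ✔ | CC | RD | RA | RB | RC |
| 2 | IMM4 | 4 | SZ | CC | RD | RA | RB | ✔ |
| 3 | IMM4b | 4 | SZ | ✔ | RD | RA | RB | RC |
| 4 | IMM6 | 6 | ✔ | CC | RD | RA | RB | ✔ |
| 5 | IMM6b | 6 | ✔ | ✔ | RD | RA | RB | RC |
| 6 | IMM8 | 8 | SZ | CC | RD | RA | ✔ | ✔ |
| 7 | IMM8b | 8 | SZ | ✔ | RD | RA | RB | ✔ |
| 8 | IMM10 | 10 | ✔ | CC | RD | RA | ✔ | ✔ |
| 9 | IMM10b | 10 | ✔ | ✔ | RD | RA | RB | ✔ |
| 10 | IMM12 | 12 | SZ | CC | RD | ✔ | ✔ | ✔ |
| 11 | IMM12b | 12 | SZ | ✔ | RD | RA | ✔ | ✔ |
| 12 | IMM14 | 14 | ✔ | CC | RD | ✔ | ✔ | ✔ |
| 13 | IMM14b | 14 | ✔ | ✔ | RD | RA | ✔ | ✔ |
| 14 | IMM16 | 16 | SZ | CC | ✔ | ✔ | ✔ | ✔ |
| 15 | IMM16b | 16 | SZ | ✔ | RD | ✔ | ✔ | ✔ |
| 16 | IMM18 | 18 | ✔ | CC | ✔ | ✔ | ✔ | ✔ |
| 17 | IMM18b | 18 | ✔ | ✔ | RD | ✔ | ✔ | ✔ |
| 18 | IMM20 | 20 | SZ | ✔ | ✔ | ✔ | ✔ | ✔ |
| 19 | IMM22 | 22 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |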

Addressing Modes

EA4 — 4-bit Effective Address

EA4 EA Meaning
0 data Rn Data register direct
1 (Rn) Address register indirect
2 (Rn)+ Address register indirect with postincrement
3 -(Rn) Address register indirect with predecrement
4 addr Rn Address register direct
5 (Rn+) Indirect next, doesn't change register
6 (-Rn) Indirect last, doesn't change register
7 ((Rn)) Address register indirect indirect
8 data Rn Alt data register direct
9 addr Rn Alt address register direct
10 (Rn) Alt address register indirect
11 (Rn)+ Alt address register indirect with postincrement
12 -(Rn) Alt address register indirect with predecrement
13 (Rn+) Alt indirect next
14 (-Rn) Alt indirect last
15 ((Rn)) Alt address register indirect indirect

EA2a — 2-bit Effective Address (2× fit in bits 0–3)

EA2a EA Meaning
0 data Rn Data register direct
1 (Rn) Address register indirect
2 (Rn)+ Address register indirect with postincrement
3 -(Rn) Address register indirect with predecrement

EA2b — Scale replaces RB (bit 0–2 = scale, bit 3 = +)

EA2b EA Meaning
0 (addr RA, #IMM12b), data RD Address indirect with 12-bit displacement to data
1 (addr RA, data RC<<scale)[+], data RD Address indirect with index [,scale] [,postincrement] to data
2 data RD, (addr RA, #IMM12b) Data to address indirect with 12-bit displacement
3 data RD, (addr RA, data RC<<scale)[+] Data to address indirect with index [,scale] [,postincrement]

EA2c — Scale replaces CC (bit 0–2 = scale, bit 3 = +)

EA2c EA Meaning
0 (addr RA, #IMM8b) Address indirect with 8-bit displacement
1 (addr RA, data RC<<scale)[+] Address indirect with index [,scale] [,postincrement]
2 ((addr RA), #IMM8b) Address indirect indirect with 8-bit displacement
3 ((addr RA), data RC<<scale)[+] Address indirect indirect with index [,scale] [,postincrement]

MAC Options

IMAC Option Meaning
Bit 0 RND Rounding
Bit 1 SAT Saturation
Bit 2 RSVD Reserved
Bit 3 RSVD Reserved

Conditional Branch Types

bra / bt beq bne bhi
bls bcc / bhs bcs / blo bpl
bmi bvc bvs bge
blt bgt ble bf
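
The page does not spell out how each branch type reads the flags, so the following Python sketch assumes the classic 68k definitions, which match the bcc/bhs and bcs/blo pairings in the grid above. The function name `condition_true` is hypothetical, and the mapping of these mnemonics onto 4-bit CC values is deliberately left out because it is not given here.

```python
def condition_true(name: str, n: bool = False, z: bool = False,
                   v: bool = False, c: bool = False) -> bool:
    """Evaluate a branch condition from the N, Z, V, C flags (68k-style, assumed)."""
    table = {
        "bra": True, "bt": True, "bf": False,
        "beq": z, "bne": not z,
        "bhi": (not c) and (not z),   # unsigned higher
        "bls": c or z,                # unsigned lower or same
        "bcc": not c, "bhs": not c,
        "bcs": c, "blo": c,
        "bpl": not n, "bmi": n,
        "bvc": not v, "bvs": v,
        "bge": n == v,                # signed greater or equal
        "blt": n != v,                # signed less than
        "bgt": (not z) and (n == v),
        "ble": z or (n != v),
    }
    return bool(table[name])

taken = condition_true("bge", n=True, v=True)   # True: N and V agree
```

If the EE ends up using an ARM-style carry (carry = no borrow), the bcc/bcs family would flip, so treat this table as a reading aid rather than a specification.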

Reserved Vector Table

  • Words 0–31 — Interrupt vectors. Flags and PC pushed to hardware SP. Word 0 = BREAK, rest are user interrupts. Return with RTI.
  • Words 32–63 — "LineA" OS functions. 13-bit immediate loaded into D0. Return with RTS.
  • Words 64–127 — "LineF" user functions. 12-bit immediate loaded into D0. Return with RTS.

64-bit operand pairs use the "next in bank" rule — second reg is next within the 16-reg bank (wraps).
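
The "next in bank" rule can be written as a one-line index calculation. This Python sketch assumes register indices 0–31 where bit 4 selects the 16-register bank; the helper name `pair_partner` is made up, and which register of the pair holds the high half is not stated here, so the sketch does not assume it.

```python
def pair_partner(reg: int) -> int:
    """Return the second register of a 64-bit pair: next index, wrapping within the 16-reg bank."""
    return (reg & ~0xF) | ((reg + 1) & 0xF)

partners = [pair_partner(r) for r in (3, 15, 16, 31)]   # [4, 0, 17, 16]
```

So D15 pairs with D0 and D31 pairs with D16: the pair never crosses from one bank of 16 into the other.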


Base ISA Instruction Set

Group 0 — System

OPC SUB Name Operands Description
0 0 (Register shuffle 1) One register Bits 6–0 = minor #. Bit 7 = RD pair present
0 Minor 0 END [RD] End task
0 Minor 1 YIELD [RD] Voluntarily yield. RD pair = MS to yield (positive) or frames to wait (negative)
0 Minor 2 PAUSE RD Pause job ID. Bit 7 = unpause
0 Minor 3 WAIT [RD] Wait for remaining copies/tasks. If RD provided, check ID in RD
0 Minor 4 STATUS [RD] Check job ID in RD, set/clear Z if not complete
0 Minor 5 INTERRUPT [RD], #IMM6 Trigger interrupt [RD] + #IMM6. INTERRUPT #0 = BREAK
0 Minor 5 VBLANK [RD], #IMM6 Yield to next ([RD] + #IMM6) vblank
0 Minor 5 HBLANK [RD], #IMM6 Yield to next ([RD] + #IMM6) hblank
0 Minor 32 MOVE RD, SP, #IMM6 Move addr RD + signed #IMM6 to SP
0 Minor 33 MOVE SP, RD, #IMM6 Move SP + signed #IMM6 to addr RD
0 Minor 34 MOVE RD, PC, #IMM6 Move addr RD + signed #IMM6 to PC
0 Minor 35 MOVE PC, RD, #IMM6 Move PC + signed #IMM6 to addr RD
0 Minor 36 MOVE RD, F Move data register RD to Flags
0 Minor 37 MOVE F, RD Move Flags to data register RD
0 Minor 38 PUSH F, (SP/RD)+ Push flags to address RD or SP
0 Minor 39 POP -(SP/RD), F Pop flags from address RD or SP
0 Minor 63 BANK ADDR, DATA Bit 7 = Address bank Hi, Bit 6 = Data bank Hi
0 Minor 64 SWAP RD Swap hi/lo parts of register. Bit 7 = A or D
0 1 (Register shuffle 2) Two registers Bits 3–0 = minor #
0 Minor 0 SETPRI #imm8 Set task priority
0 Minor 1 REPEAT RA, #imm4 Repeat #imm4 instructions RA times. CC to exit early
0 Minor 8 XCHG RA, RD Exchange registers
0 Minor 9 FINDNEXT RA, RD Find next
0 2 START RD, #IMM8 Start task at code address RC, priority #imm8. Returns job # in RD (0 = fail)
0 3 PREFETCH #imm18 Set prefetch forward to #imm18+1 words
0 4 REPEAT #imm14, #imm4 Repeat #imm4 instructions #imm14 times. 0 = max
0 5 NEG.sz [X], EAMOD(RD) Negate. X = bit 0
0 6 EXT.sz EAMOD(RD) Sign extend
0 7 ZERO RA, RB, RD Zero up to 3 registers. SZ bits enable 2nd/3rd operand
0 10 LINEA #IMM5, #IMM13 BSR/JSR to 32+#IMM5, D0 = #IMM13
0 11 ZERO RA, RB, RD Zero up to 3 registers (no SZ)
0 12 INC RA, RB, RD Increment up to 3 registers
0 13 DEC RA, RB, RD Decrement up to 3 registers
0 14 ROTATE RA, RB, RD Rotate 3 regs: temp=RD, RD=RB, RB=RA, RA=temp
0 15 LINEF #IMM6, #IMM12 BSR/JSR to 64+#IMM6, D0 = #IMM12
0 16 MIN.sz RA, RB, RD RD = min(RA, RB)
0 17 MAX.sz RA, RB, RD RD = max(RA, RB)
0 18 CLAMP.sz RA, RB, RD Clamp RD: RA = min, RB = max
0 19 SIGN.sz RA, RB, RD RD = sign(RA, RB)
0 20 MOVE #IMM22, SP Move immediate to stack pointer

Group 1 — Load / Store / Immediate

OPC SUB Name Operands Description
1 0 LD.sz EA2b[0] Address indirect with 12-bit displacement to data
1 1 LD.sz EA2b[1] Address indirect with index [,scale] [,postincrement] to data
1 2 LD.sz EA2c[0] Address indirect with 8-bit displacement to data
1 3 LD.sz EA2c[1] Address indirect with index [,scale] [,postincrement] to data
1 4 LD.sz EA4[RA], data RB, data RD Do EA4[RA] to RB then repeat to RD
1 5 LD.sz (base, #IMM12), data RD Load from base + #imm12
1 6 LD (base, #IMM14), data RD Load from base + #imm14
1 7 LD (base, #IMM18b), data RD Load from base + #imm18b
1 12 MOVE.sz #IMM12, data RD Move signed #IMM12 into RD
1 13 MOVE #IMM14, data RD Move signed #IMM14 into RD
1 14 MOVE #IMM18b, data RD Move signed #IMM18b into RD
1 15 COPY.sz EA4(RA), EA4(RD), RC No CC. Copy from EA4(RA) to EA(RD), RC times
1 16 ST.sz EA2b[2] Data to address indirect with 12-bit displacement
1 17 ST.sz EA2b[3] Data to address indirect with index [,scale] [,postincrement]
1 18 ST.sz EA2c[2] Data to address indirect indirect with 8-bit displacement
1 19 ST.sz EA2c[3] Data to address indirect indirect with index [,scale] [,postincrement]
1 20 ST.sz data RB, data RD, EA4[RA] Store RB to EA4[RA] then repeat with RD
1 21 ST.sz data RD, (base, #IMM12) Store to base + #imm12
1 22 ST data RD, (base, #IMM14) Store to base + #imm14
1 23 ST data RD, (base, #IMM18b) Store to base + #imm18b
1 27 MOVE.sz EA2c, EA4[RD] No CC. EA to EA
1 28 MOVE.sz EA4[RD], EA2c No CC. EA to EA
1 29 MOVE.sz EA4(RA), EA4(RD) Move EA4(RA) to EA4(RD)
1 30 MOVE.sz EA2b, EA2b 2nd EA2b uses RD rather than RA
1 31 COPY.sz EA2a(RA), EA2a(RD), data RB Copy RB units from EA2a(RA) to EA2a(RD)

Any memory read/write that doesn't use a 3rd register AND still has the CC field can set V0 for "strict" mode. Assembler syntax: ! after opcode, e.g. st! D0,(A0). Ignored on Ant64S.

Group 2 — ALU

OPC SUB Name Operands Description
2 0 ADD.sz EA2a(RA), data RB, EA2a(RD) Load from EA2(RA), add RB, result to EA2(RD)
2 1 ADD.sz data RA, #imm8, data RD Load RA, add #imm8, result in RD
2 2 ADD.sz #imm12, data RD Add #imm12 to RD
2 3 ADD.sz #imm16, data RD No CC. Add #imm16 to RD
2 4 ADD.sz EA2b
2 8 ADC.sz EA2a(RA), data RB, EA2a(RD) Add with carry
2 9 ADC.sz data RA, #imm8, data RD
2 10 ADC.sz #imm12, data RD
2 11 ADC.sz #imm16, data RD No CC
2 12 ADC.sz EA2b
2 16 SUB.sz EA2a(RA), data RB, EA2a(RD) Subtract
2 17 SUB.sz data RA, #imm8, data RD
2 18 SUB.sz #imm12, data RD
2 19 SUB.sz #imm16, data RD No CC
2 20 SUB.sz EA2b
2 24 SBC.sz EA2a(RA), data RB, EA2a(RD) Subtract with carry
2 25 SBC.sz data RA, #imm8, data RD
2 26 SBC.sz #imm12, data RD
2 27 SBC.sz #imm16, data RD No CC
2 28 SBC.sz EA2b

Group 3 — Mul / Div / MAC / Fixed Point

OPC SUB Name Operands Notes
3 0–2 MULS.sz EA2a / #imm8 / #imm12 Signed multiply
3 4–6 MULU.sz EA2a / #imm8 / #imm12 Unsigned multiply
3 8–10 DIVS.sz EA2a / #imm8 / #imm12 Signed divide
3 12–14 DIVU.sz EA2a / #imm8 / #imm12 Unsigned divide
3 16–18 MACS.sz EA2a / #imm8 / #imm12 Signed MAC (uses MACMOD)
3 20–22 MACU.sz EA2a / #imm8 / #imm12 Unsigned MAC (uses MACMOD)
3 24–26 REMU.sz EA2a / #imm8 / #imm12 Unsigned remainder
3 28–30 REMS.sz EA2a / #imm8 / #imm12 Signed remainder

Group 4 — Shifts / Rotates

OPC SUB Name Operands Notes
4 0 LSL.sz EA2a(RA), RB count, EA2a(RD) Also ASL
4 1 LSL.sz data RA, #imm8, data RD Also ASL
4 4 ROL.sz EA2a(RA), RB count, EA2a(RD)
4 5 ROL.sz data RA, #imm8, data RD
4 8 ROXL.sz EA2a(RA), RB count, EA2a(RD) Uses X as in/out bit
4 9 ROXL.sz data RA, #imm8, data RD Uses X as in/out bit
4 12 LSR.sz EA2a(RA), RB count, EA2a(RD)
4 13 LSR.sz data RA, #imm8, data RD
4 16 ROR.sz EA2a(RA), RB count, EA2a(RD)
4 17 ROR.sz data RA, #imm8, data RD
4 20 ROXR.sz EA2a(RA), RB count, EA2a(RD) Uses X as in/out bit
4 21 ROXR.sz data RA, #imm8, data RD Uses X as in/out bit
4 24 ASR.sz EA2a(RA), RB count, EA2a(RD)
4 25 ASR.sz data RA, #imm8, data RD

Group 5 — Logic

OPC SUB Name Operands Description
5 0 AND.sz EA2a(RA), data RB, EA2a(RD)
5 1 AND #imm12, RD No SZ. V0/V1 extend to #imm14
5 2 AND #imm16, RD No SZ, no CC
5 4 NOT.sz EA2a(RA), data RB, EA2a(RD)
5 5 NOT #imm12, RD No SZ
5 6 NOT #imm16, RD No SZ, no CC
5 8 OR.sz EA2a(RA), data RB, EA2a(RD)
5 9 OR #imm12, RD No SZ
5 10 OR #imm16, RD No SZ, no CC
5 12 EOR.sz EA2a(RA), data RB, EA2a(RD)
5 13 EOR #imm12, RD No SZ
5 14 EOR #imm16, RD No SZ, no CC
5 20 SHLADD #imm5, data RB, RA, RD RD = RA + (RB << #imm5). Shift 0 = 32
5 21 SHLSUB #imm5, data RB, RA, RD RD = RA - (RB << #imm5)
5 24 CMP.sz EA2a(RA), RB, data RD Load from EA2a(RA), shift by RB, CMP data RD
5 25 CMP.sz #imm16, RD No CC
5 26 CMP #imm17, RD No SZ, no CC

Group 6 — Move / Copy

OPC SUB Name Operands Description
6 0 move.sz (addr RA)+, (addr RD)+, #imm8 Copy #imm8 units from (RA)+ to (RD)+
6 1 move.sz EA2a(RA), RB, EA2a(RD) Load to EA2a, 4-bit signed shift using RB bits
6 2 move.sz EA2a(RA), lsl #0-3, EA2a(RD) 2-bit shift left on data
6 3 move.sz (addr RA)+, (addr RD)+, #imm12 No CC. Copy #imm12 units
6 4 move EA2a(RA), shift, EA2a(RD) No SZ. 6-bit signed shift using RB+SZ
6 16 LDLO [Z], #imm16b, RD No CC. Sign/zero extend to 32. Bit 21 = RD is A or D
6 17 LDHI [Z], #imm16b, RD No CC. Write high 16 into Dn
6 18 COPY.sz (addr RA)[+], (addr RD)[+], #imm, [WAIT] No CC. DMA copy. Bit 7 = wait, bit 6 = fixed value
6 19 COPY.sz (addr RA)[+], (addr RD)[+], RB, [WAIT] DMA copy. RD = job ID (0 = fail), Z set on fail

Group 7 — Bit Test / Clear

OPC SUB Name Operands Description
7 0 BTST.sz #IMM8|data RA, EA4(RD) Bit test
7 8 BSET.sz #IMM8|data RA, EA4(RD) Bit test and set
7 16 BCLR.sz #IMM8|data RA, EA4(RD) Bit test and clear
7 24 BCHG.sz #IMM8|data RA, EA4(RD) Bit test and change
7 28 TST.sz EA4(RD) Test from register
7 29 TSTF #IMM16 Flags RDXNZVC bit test
7 30 SETF #IMM16 Flags RDXNZVC bit set
7 31 CLRF #IMM16 Flags RDXNZVC bit clear

Group 8 — Language Assist

OPC SUB Name Operands Description
8 0 SCAN RA, RB, (RD)+, #imm5 Scan characters. Pattern = 5 bits (alpha, number, whitespace, null, eol…)
8 1 PARSENUM.sz (RA)+, RB, RD, #imm2 base, #imm2 type Parse number at (RA). Base = binary/octal/decimal/hex

Group 9 — Branching / Flow

OPC SUB Name Operands Description
9 0 SYS #imm14, addr RD Set PC to #imm14
9 1 SYS data RA, addr RD Set PC to contents of RA
9 2 BRA rel #IMM18 Branch relative
9 3 BRA rel #IMM22 No CC. Branch relative
9 4 BRA (addr RD, #IMM14) Branch to (RD) + signed #IMM14
9 5 BRA (addr RD, #IMM18b) No CC. Branch to contents of return address
9 6 BRA (rel #IMM14, addr RD) Branch to #IMM14 + RD
9 7 BRA (rel #IMM18b, addr RD) No CC. Branch to #IMM18b + RD
9 8 BSR rel #IMM14, addr RD Branch-with-link. PC → RD
9 9 BSR rel #IMM18b, addr RD Branch-with-link. PC → RD
9 9 JSR rel #IMM22 JSR. PC → -(SP)
9 10 DBRA data RD, rel #IMM14 Decrement and branch if Dn >= 0
9 11 DBRAZ data RD, rel #IMM14 Decrement and branch if Dn != 0

Group 10 — Register Blitter System / Control

OPC SUB Name Operands Description
10 0 RBENABLE #IMM8, RD Bit 0 = store enable, 1 = load enable, 2 = inverse on load
10 1 RBSEL slot, Dn Bind Dn to slot
10 2 RBSETILO #IMM16, slot No CC. Set slot control low to #IMM16, high bits zeroed
10 3 RBSETIHI #IMM16, slot No CC. Set slot control high word, leave low as-is
10 4 RBSETMLO #IMM16, slot No CC. Set slot mask low to #IMM16, high bits zeroed
10 5 RBSETMHI #IMM16, slot No CC. Set slot mask high word, leave low as-is
10 6 RBSET RA, RD, slot Set slot mask to RA, control to RD

Register Blitter Control Word

Bits Field Values / Description
31:28 SWZ_EXT 0 = none; 1–15 reserved
27:24 FORMAT RGB565, ARGB1555, etc. (metadata)
23:16 ALPHA_THR 0–255. Write if A > thr. Default 0 = A==0 transparent
15:12 RMW_OP See table below
11 RMW_EN 0 = no RMW; 1 = enable RMW read
10 ALPHA_SRC 0 = source pixel A byte; 1 = reserved
9:8 SWZ 0 = none; 1 = ARGB→BGRA; 2 = BGRA→ARGB; 3 = reserved
7:6 WIDTH 0 = 32-bit; 1 = 16-bit; 2 = 8-bit; 3 = reserved
5 AFFECTS_A 0 = leave dest alpha alone; 1 = allow alpha update
4:0 Misc See below
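
To make the bit layout concrete, here is a hedged Python sketch that packs a control word from the fields in the table above and splits it into the two halves that RBSETILO and RBSETIHI take. The helper names (`make_control`, `lo16`, `hi16`) are illustrative. It also assumes that a constant such as CFG_RMW_ALPHAKEY would set both RMW_EN and RMW_OP = 6; the page does not define the CFG_ constants.

```python
# (name, shift, width) for each control-word field, taken from the table above
FIELDS = [
    ("swz_ext", 28, 4), ("fmt", 24, 4), ("alpha_thr", 16, 8), ("rmw_op", 12, 4),
    ("rmw_en", 11, 1), ("alpha_src", 10, 1), ("swz", 8, 2), ("width", 6, 2),
    ("affects_a", 5, 1), ("misc", 0, 5),
]

def make_control(**values: int) -> int:
    """Pack named field values into a 32-bit control word; unspecified fields are 0."""
    known = {name for name, _, _ in FIELDS}
    unknown = set(values) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, shift, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
    return word

def lo16(word: int) -> int:
    return word & 0xFFFF

def hi16(word: int) -> int:
    return (word >> 16) & 0xFFFF

# ARGB→BGRA swizzle (SWZ=1) with alpha-keyed merge (RMW_EN=1, RMW_OP=6)
control = make_control(swz=1, rmw_en=1, rmw_op=6)   # 0x00006900
```

For this word `lo16(control)` is 0x6900 and `hi16(control)` is 0, so under these assumptions the RBSETILO write alone would be enough, since it zeroes the high bits.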

RMW_OP Values

Value Operation Description
0 NONE Plain store (may still swizzle)
1 MASKMERGE new = (old & ~MASK) | (src & MASK)
2 XOR new = old ^ src
3 OR new = old | src
4 ADD new = old + src (wrap)
5 SUB new = old - src (wrap)
6 ALPHAKEY_MERGE If A==0 keep old, else masked-merge
7 ALPHAMASK_MERGE Use A as per-byte mask level
8–15 Reserved
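
The simpler RMW_OP values translate directly into code. The Python sketch below models ops 0–5 only, assumes a 32-bit WIDTH so that "wrap" means modulo 2^32, and leaves out the two alpha-driven ops because their exact per-byte behaviour is not fully described here. The function name `rmw_store` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rmw_store(op: int, old: int, src: int, mask: int = MASK32) -> int:
    """Return the value written to memory for RMW_OP 0..5 (32-bit width assumed)."""
    if op == 0:                                   # NONE: plain store
        new = src
    elif op == 1:                                 # MASKMERGE
        new = (old & ~mask) | (src & mask)
    elif op == 2:                                 # XOR
        new = old ^ src
    elif op == 3:                                 # OR
        new = old | src
    elif op == 4:                                 # ADD (wrap)
        new = old + src
    elif op == 5:                                 # SUB (wrap)
        new = old - src
    else:
        raise ValueError(f"RMW_OP {op} is not modelled in this sketch")
    return new & MASK32

wrapped_add = rmw_store(4, 0xFFFFFFFF, 2)                     # 0x00000001
merged = rmw_store(1, 0x11223344, 0xAABBCCDD, 0x0000FFFF)     # 0x1122CCDD
```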

Misc Bits 4:0

Bit Description
0 Strict alignment required
1 Forbid MMIO regions
2 Treat A==0 as transparent
3 Invert alphakey (A==0 = write, else keep)
4 Reserved for saturation

Alternate ISA: TriOps

Ant64/Ant64C adds 4 extension bits [35:32]:

Name Bits Meaning
!FLAGS [35:35] Don't set flags for slot 0 (Ant64/C only)
EXT-S0 [34:34] Extend SRC0 (Ant64/C only)
EXT-S1 [33:33] Extend SRC1 (Ant64/C only)
EXT-S2 [32:32] Extend SRC2 (Ant64/C only)
1 [31:31] Alternative ISA
1 [30:30] TriOps
SLOT0 [29:20] Slot 0 (sets flags by default)
SLOT1 [19:10] Slot 1
SLOT2 [9:0] Slot 2

Slot Decoding

Name Bits Meaning
OPC [9:6] Opcode
SRC [5:4] Data register 0–3
AD [3:3] Address or Data for DST
DST [2:0] Data/Address register 0–7
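
A Tri-Ops word can be pulled apart with a few shifts. The Python sketch below follows the two tables above and returns raw field values only: it does not interpret META opcodes, the EXT bits, or which AD value means "address" versus "data", since those details are not given here. The helper names are illustrative.

```python
def encode_slot(opc: int, src: int, ad: int, dst: int) -> int:
    """Pack one 10-bit slot: OPC [9:6], SRC [5:4], AD [3], DST [2:0]."""
    return (opc << 6) | (src << 4) | (ad << 3) | dst

def decode_slot(slot: int) -> dict:
    return {"opc": (slot >> 6) & 0xF, "src": (slot >> 4) & 0x3,
            "ad": (slot >> 3) & 0x1, "dst": slot & 0x7}

def decode_triops(word: int) -> list:
    """Return [slot0, slot1, slot2] for a word with bits 31 and 30 both set."""
    if (word >> 30) & 0b11 != 0b11:
        raise ValueError("not a TriOps word")
    return [decode_slot((word >> shift) & 0x3FF) for shift in (20, 10, 0)]

# Raw encoding of three slots: OPC 1 (MOV), OPC 0 (META, SRC field = 2), OPC 4 (AND)
word = (0b11 << 30) | (encode_slot(1, 0, 0, 3) << 20) \
       | (encode_slot(0, 2, 0, 3) << 10) | encode_slot(4, 3, 0, 0)
slots = decode_triops(word)
```

The three decoded slots line up with Example 1 from the Tri-Ops section (mov, dec via META, and), assuming AD = 0 selects a data register.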

TriOps Slot Opcodes

OPC Name Operands Description
0 META See Meta table
1 MOV SRC, DST dst = src
2 ADD SRC, DST dst = dst + src
3 SUB SRC, DST dst = dst - src
4 AND SRC, DST dst = dst & src
5 OR SRC, DST dst = dst | src
6 XOR SRC, DST dst = dst ^ src
7 NOT SRC, DST dst = ~src
8 SHIFT SRC, DST Logical shift: << positive, >> negative
9 CTZ SRC, DST Count trailing zeros
10 CLZ SRC, DST Count leading zeros
11 ADC SRC, DST dst = dst + src + c
12 SBC SRC, DST dst = dst - src - (1-c)
13 MIN SRC, DST dst = min(src, dst)
14 MAX SRC, DST dst = max(src, dst)
15 CMP SRC, DST flags = dst - src (dst unchanged)

Meta Opcodes (SRC used = 4/8 opcodes)

OPC Name Operands Description
0 META 2 See Meta 2
1 INC DST dst += 1
2 DEC DST dst -= 1
3 ADD4 DST dst += 4
4 SUB4 DST dst -= 4 (36-bit only)
5 ADD8 DST dst += 8 (36-bit only)
6 SUB8 DST dst -= 8 (36-bit only)
7 CLR DST dst = 0 (36-bit only)

Meta 2 Opcodes (A/D + DST used = 16 opcodes)

OPC Name Description
0 NOP No operation
1 CLRF Clear flags
2 SETC Set carry
3 CLRC Clear carry
4 CLV Clear overflow
5 CLRNZ Clear N and Z
6 EI Enable interrupts
7 DI Disable interrupts
8 RTS (SP)+ copies to PC
9 RTI Restore flags and PC from (SP)+
10–15 Reserved

Alternate ISA: Broadcast Ops

Ant64/Ant64C adds 4 extension bits [35:32]:

Name Bits Meaning
SRC-AD [35:35] Src is A/D (Ant64/C only)
SRC-HI [34:34] Extend source register (Ant64/C only)
BANK [33:33] Use alt register banks (Ant64/C only)
HI [32:32] Dest is registers 16–31 (Ant64/C only)
1 [31:31] Alternative ISA
0 [30:30] Not TriOps
MODE [29:28] Mode 0–3. Broadcast Ops = 0
DST-AD [27:27] Dest is A/D
OPC [26:20] Opcode (0–127)
SRC [19:16] Source register
MASK [15:0] Dest register mask
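
The idea behind a broadcast op is that one source value is applied to every destination register selected by the 16-bit mask. The Python sketch below models that behaviour only. It assumes bit n of MASK selects register n of the chosen bank of 16, which the table does not state explicitly, and the `broadcast` helper is invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def broadcast(regs: list, op, src_value: int, mask: int) -> list:
    """Apply op(dst, src) to each of 16 registers whose mask bit is set (bit n = register n, assumed)."""
    for index in range(16):
        if mask & (1 << index):
            regs[index] = op(regs[index], src_value) & MASK32
    return regs

bank = list(range(16))                                   # D0..D15 hold 0..15
broadcast(bank, lambda dst, src: dst + src, 10, 0b1010)  # ADD 10 to D1 and D3
```

After the call D1 holds 11 and D3 holds 13, and every other register is untouched.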

Broadcast Opcodes

OPC Name Operands Description
0 RESERVED SRC, DST-LIST Reserved
1 MOV SRC, DST-LIST dst = src
2 ADD SRC, DST-LIST dst = dst + src
3 SUB SRC, DST-LIST dst = dst - src
4 AND SRC, DST-LIST dst = dst & src
5 OR SRC, DST-LIST dst = dst | src
6 XOR SRC, DST-LIST dst = dst ^ src
7 NOT DST-LIST dst = ~dst
8 CLR SRC, DST-LIST
9 SET SRC, DST-LIST
10 NEG SRC, DST-LIST
11 PUSH DST-LIST, -(SRC) (inverts SRC-AD flag)
12 POP (SRC)+, DST-LIST (inverts SRC-AD flag)
13 MIN SRC, DST-LIST dst = min(src, dst)
14 MAX SRC, DST-LIST dst = max(src, dst)
15 PUSH DST-LIST, -(SP) BANK flag = bit 16, both A and D = bit 17
16 POP (SP)+, DST-LIST BANK flag = bit 16, both A and D = bit 17
17 MUL SRC, DST-LIST dst = dst * src
18 MULHIU SRC, DST-LIST dst = high(dst * src) unsigned
19 MULHIS SRC, DST-LIST dst = high(dst * src) signed
20 ABS DST-LIST dst = abs(dst)
21 SHIFT SRC, DST-LIST Shift dst. Signed src: negative = right
22 ROTATE SRC, DST-LIST Rotate dst. Signed src: negative = right
23 CMPEQ SRC, DST-LIST dst = (dst == src) ? -1 : 0
24 CMPLT SRC, DST-LIST dst = (dst < src) ? -1 : 0
25 CMPGT SRC, DST-LIST dst = (dst > src) ? -1 : 0

Important: The Ant64 family of home computers is at an early design/prototype stage. Everything you see here is subject to change.