DMA transfer
Graphics are large and can take a while to write to video memory, which can be a problem for performance. You can write directly to video memory using the 68000, but there's a faster way: DMA ("Direct Memory Access", referring to the fact that the hardware accesses memory directly without the CPU's help).
The Mega Drive's VDP supports multiple DMA operations, but the most common is "DMA transfer" (used to load graphics into video memory), which is what we'll cover on this page.
- How it works
- Set up source address and length
- Execute the transfer
- 128KB boundary bug
- Using a DMA queue
How it works
The idea behind a DMA transfer is that the 68000 tells the VDP to read data from ROM or RAM and write it somewhere in video memory. The 68000 tells the VDP what to look for and then the VDP takes over and copies the data as fast as it can.
The VDP needs to know three things:
- Source address (ROM or RAM address)
- Destination address (video address)
- Length (how much data to copy)
Set up source address and length
The source address and length are set by five VDP registers (three for the source and two for the length). These registers were already defined in the VDP setup page, but here they are again:
VDPREG_DMALEN_L: equ $9300 ; DMA length (low)
VDPREG_DMALEN_H: equ $9400 ; DMA length (high)
VDPREG_DMASRC_L: equ $9500 ; DMA source (low)
VDPREG_DMASRC_M: equ $9600 ; DMA source (mid)
VDPREG_DMASRC_H: equ $9700 ; DMA source (high)
First of all: source address and length are measured in words, not bytes. This means that we need to drop the bottom bit, so shift right by one if you haven't already. Addresses are usually passed in one of the a0~a7 registers, but those can't be shifted, so you'll need to move the source address to one of the d0~d7 registers instead.
Another thing worth noting is that the source address is 23-bit. This matters, since the 24th bit is used for other DMA operations, so you must make sure to not leave any garbage in the upper bits (we can get rid of them with and if needed).
; length = d0
; source = a0 -> d7
lsr.w #1, d0
move.l a0, d7
lsr.l #1, d7
and.l #$7FFFFF, d7
Now you need to write all five VDP registers (each register holds one byte). The obvious way to do this is to write one register at a time and shift by 8 for each byte:
lea (VdpCtrl), a0
; Write DMA length
move.w #VDPREG_DMALEN_L, d6
move.b d0, d6
move.w d6, (a0)
lsr.w #8, d0
move.w #VDPREG_DMALEN_H, d6
move.b d0, d6
move.w d6, (a0)
; Write DMA source
move.w #VDPREG_DMASRC_L, d6
move.b d7, d6
move.w d6, (a0)
lsr.l #8, d7
move.w #VDPREG_DMASRC_M, d6
move.b d7, d6
move.w d6, (a0)
lsr.l #8, d7
move.w #VDPREG_DMASRC_H, d6
move.b d7, d6
move.w d6, (a0)
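To make the byte split concrete, here is what those five writes boil down to for one made-up transfer: $0800 bytes starting at source address $012340 (both values are chosen only for illustration). In words that is a length of $0400 and a source of $0091A0, so the registers would receive the following values:

```asm
    lea     (VdpCtrl), a0

    ; Length = $0800 bytes = $0400 words
    move.w  #$9300, (a0)    ; DMA length (low)  = $00
    move.w  #$9404, (a0)    ; DMA length (high) = $04

    ; Source = $012340 bytes = $0091A0 words
    move.w  #$95A0, (a0)    ; DMA source (low)  = $A0
    move.w  #$9691, (a0)    ; DMA source (mid)  = $91
    move.w  #$9700, (a0)    ; DMA source (high) = $00
```

Hardcoding the values like this only works when the source and length are known at assembly time. Otherwise you need the shifts shown above.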
In practice the above can feel cumbersome with all the bit shifts. If you prefer to use the movep trick instead, it can look like this (note that the source address is written first: we only need three of its bytes, and the length write then stomps over the excess byte written by movep).
If you aren't sure which one is better, you may want to stick with the shifts (see above) for now.
lea (VdpCtrl), a0
lea -10(sp), sp
move.l sp, a6
move.l #(VDPREG_DMALEN_H<<16)|VDPREG_DMALEN_L, (a6)+
move.l #(VDPREG_DMASRC_H<<16)|VDPREG_DMASRC_M, (a6)+
move.w #VDPREG_DMASRC_L, (a6)+
movep.l d7, -7(a6)
movep.w d0, -9(a6)
move.l (sp)+, (a0)
move.l (sp)+, (a0)
move.w (sp)+, (a0)
Execute the transfer
Now that the source and its length are set, we can set the destination address (where in video memory), which will also start the DMA transfer. The command we write is similar to the one used to set up the address to write to video memory.
The commands used to set the video address normally are like this:
VRAM_ADDR_CMD: equ $40000000
CRAM_ADDR_CMD: equ $C0000000
VSRAM_ADDR_CMD: equ $40000010
The commands to start the DMA transfer are similar, but they're OR'd with $80:
VRAM_DMA_CMD: equ $40000080
CRAM_DMA_CMD: equ $C0000080
VSRAM_DMA_CMD: equ $40000090
- If the Z80 accesses the 68000 memory at the same time as DMA begins, there's risk that the data will not be transferred properly (which will show up as glitchy or flashing graphics). Unless you absolutely know what you're doing, the best way to work around it is to pause the Z80 while the DMA is running.
- Official documentation states that you should do the writes from RAM and that they should be two words instead of one long. We aren't sure what this actually means (it may have been a red herring related to the bug above), but for now we'll just go with it.
Given all the above, making the DMA transfer begin would be as follows (assuming a transfer to VRAM, otherwise change the value in the or.l instruction):
; d1 = destination address
; a0 = VdpCtrl (as before)
; Convert destination address
; into the VDP command
and.l #$FFFF, d1
lsl.l #2, d1
lsr.w #2, d1
swap d1
or.l #VRAM_DMA_CMD, d1
; Start the transfer
move.l d1, -(sp)
FastPauseZ80
move.w (sp)+, (a0)
move.w (sp)+, (a0)
ResumeZ80
If you aren't using the Z80 yet you can remove the FastPauseZ80 and ResumeZ80 macros for now, but it's recommended to add them as soon as possible to avoid forgetting them in the future.
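If the address conversion looks opaque, it may help to trace it by hand. The sketch below runs the same instructions on a made-up destination address ($D234, picked only because it has both of its top bits set), with the contents of d1 shown after each step:

```asm
    move.l  #$D234, d1          ; d1 = $0000D234
    and.l   #$FFFF, d1          ; d1 = $0000D234 (upper word cleared)
    lsl.l   #2, d1              ; d1 = $000348D0 (top two bits now in the upper word)
    lsr.w   #2, d1              ; d1 = $00031234 (lower 14 bits back in place)
    swap    d1                  ; d1 = $12340003
    or.l    #VRAM_DMA_CMD, d1   ; d1 = $52340083
```

The lower 14 bits of the address end up in the first word of the command and the upper two bits in the second word, alongside the $80 flag that marks the command as a DMA.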
128KB boundary bug
This is an extremely annoying limitation regarding DMA: the source address can't cross a 128KB boundary (e.g. cross from $01FFFF to $020000, from $03FFFF to $040000, etc.). This is not an issue when transferring graphics from RAM (since it never crosses a 128KB boundary), but it is an issue when using ROM.
There are two solutions to this problem:
- Make sure that the graphics never cross a 128KB boundary.
- Check if a boundary is crossed and split it into two transfers.
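The second option could be sketched roughly as follows. Both DmaTransferSafe and DmaTransfer are hypothetical names: DmaTransfer stands for a subroutine of your own that performs the steps from the previous sections, and it is assumed to clobber its registers. The sketch also assumes the length fits in 16 bits, so a transfer can cross at most one boundary.

```asm
; Hypothetical wrapper (not an official routine).
; input  a0.l = source address in bytes (even)
;        d0.l = length in bytes (even, non-zero, less than $10000)
;        d1.l = destination address in video memory
; breaks d2, d3, and whatever DmaTransfer breaks

DmaTransferSafe:
    move.l  a0, d2
    and.l   #$1FFFF, d2         ; d2 = offset within the 128KB block
    move.l  #$20000, d3
    sub.l   d2, d3              ; d3 = bytes left before the boundary
    cmp.l   d3, d0
    bls     DmaTransfer         ; It fits: do a single transfer

    ; First half: from the source up to the boundary
    movem.l d0-d1/d3/a0, -(sp)
    move.l  d3, d0
    bsr     DmaTransfer
    movem.l (sp)+, d0-d1/d3/a0

    ; Second half: whatever is left past the boundary
    sub.l   d3, d0              ; Remaining length
    adda.l  d3, a0              ; Source now starts at the boundary
    add.l   d3, d1              ; Destination advances by the same amount
    bra     DmaTransfer
```

The check costs a few instructions per transfer, so if you can guarantee alignment when laying out the ROM, the first option is cheaper at run time.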
Using a DMA queue
The above explanation gives the gist of how DMA works, but in practice it's wasteful: you can transfer much more data to video memory during vertical blank so you should keep all your transfers there if possible, and you don't want to waste this time processing what needs to be transferred.
The solution to this is to use a queue: every frame you build up the DMA commands and store them in this queue, then when the frame is over and vertical blank begins you immediately send everything in the queue to the VDP. Note that you need to keep the graphics around until the queue has been processed (not an issue for ROM, may be an issue for RAM).
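A minimal sketch of such a queue is shown below, reusing the shift-based code from earlier. DmaQueue (a buffer in RAM) and DmaQueuePtr (a long pointing at the next free entry, initialized to DmaQueue) are hypothetical labels you would need to define yourself, as are the two routine names. Each entry takes 14 bytes: five register writes plus the two command words. There is no overflow check, and FastPauseZ80 is assumed to leave a0~a2 alone.

```asm
; input  a0.l = source (bytes), d0.w = length (bytes)
;        d1.l = destination in VRAM
; breaks d0, d1, d6, d7, a1

QueueDmaToVram:
    move.l  (DmaQueuePtr), a1
    lsr.w   #1, d0                  ; Length in words
    move.l  a0, d7
    lsr.l   #1, d7                  ; Source in words
    and.l   #$7FFFFF, d7

    move.w  #VDPREG_DMALEN_L, d6    ; Store the register writes
    move.b  d0, d6                  ; instead of sending them
    move.w  d6, (a1)+
    lsr.w   #8, d0
    move.w  #VDPREG_DMALEN_H, d6
    move.b  d0, d6
    move.w  d6, (a1)+
    move.w  #VDPREG_DMASRC_L, d6
    move.b  d7, d6
    move.w  d6, (a1)+
    lsr.l   #8, d7
    move.w  #VDPREG_DMASRC_M, d6
    move.b  d7, d6
    move.w  d6, (a1)+
    lsr.l   #8, d7
    move.w  #VDPREG_DMASRC_H, d6
    move.b  d7, d6
    move.w  d6, (a1)+

    and.l   #$FFFF, d1              ; Build the DMA command
    lsl.l   #2, d1
    lsr.w   #2, d1
    swap    d1
    or.l    #VRAM_DMA_CMD, d1
    move.l  d1, (a1)+

    move.l  a1, (DmaQueuePtr)
    rts

; Call this at the start of vertical blank
; breaks a0, a1, a2

FlushDmaQueue:
    lea     (VdpCtrl), a0
    lea     (DmaQueue), a1
    move.l  (DmaQueuePtr), a2       ; a2 = end of the queued entries
    FastPauseZ80
    bra.s   FlushDmaQueue_Check
FlushDmaQueue_Loop:
    move.l  (a1)+, (a0)             ; DMA length (low and high)
    move.l  (a1)+, (a0)             ; DMA source (low and mid)
    move.w  (a1)+, (a0)             ; DMA source (high)
    move.w  (a1)+, (a0)             ; Command, first word
    move.w  (a1)+, (a0)             ; Command, second word (starts the DMA)
FlushDmaQueue_Check:
    cmpa.l  a2, a1
    bne.s   FlushDmaQueue_Loop
    ResumeZ80
    move.l  #DmaQueue, (DmaQueuePtr) ; Queue is empty again
    rts
```

Since the entries are read from RAM and the command is sent as two separate words, this also happens to follow the official advice mentioned earlier. Pausing the Z80 once around the whole flush, rather than once per transfer, is a simplification made for this sketch.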