DMA transfer

Graphics are large and can take a while to write to video memory, which can be a problem for performance. You can write directly to video memory using the 68000, but there's a faster way: DMA ("Direct Memory Access", referring to the fact that the hardware accesses memory directly without the CPU's help).

The Mega Drive's VDP supports several DMA operations, but the most common is the "DMA transfer" (used to load graphics into video memory), which is what this page covers.

How it works

The idea behind a DMA transfer is that the 68000 tells the VDP to read data from ROM or RAM and write it somewhere in video memory. The 68000 tells the VDP what to copy, then the VDP takes over and copies the data as fast as it can (the 68000 is halted in the meantime, since the VDP is using its bus).

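The VDP needs to know three things: the source address (where to read the data from), the length (how much to copy) and the destination address (where in video memory to write it).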

Set up source address and length

The source address and length are set by five VDP registers (three for the source and two for the length). These registers were already defined in the VDP setup page, but here they are again:

VDPREG_DMALEN_L:  equ $9300  ; DMA length (low)
VDPREG_DMALEN_H:  equ $9400  ; DMA length (high)
VDPREG_DMASRC_L:  equ $9500  ; DMA source (low)
VDPREG_DMASRC_M:  equ $9600  ; DMA source (mid)
VDPREG_DMASRC_H:  equ $9700  ; DMA source (high)

First of all: the source address and length are measured in words, not bytes, so we need to drop the bottom bit by shifting right by one (unless your values are already in words). Addresses are usually passed in one of the address registers (a0~a7), so you'll need to move the source address into a data register (d0~d7) first.

Another thing worth noting is that the source address is 23-bit. This matters because the 24th bit is used to select other DMA operations, so you must make sure not to leave any garbage in the upper bits (an and takes care of that if needed). For example, a RAM address held as $FFFF0000 would still have that bit set after the shift.

; d0 = length (in bytes)
; a0 = source address (copied to d7)

    lsr.w   #1, d0          ; length in words
    move.l  a0, d7
    lsr.l   #1, d7          ; source in words
    and.l   #$7FFFFF, d7    ; keep only the 23 address bits
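If you want to check the conversion on a PC, here's a minimal C model of the code above. It's only a sketch: the function names are made up for this example, and it mirrors the 68000 instructions rather than any real SDK API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling lsr.w #1, d0 (length in bytes -> words). */
static uint16_t dma_length_words(uint16_t length_bytes) {
    assert((length_bytes & 1) == 0);  /* DMA works in whole words */
    return (uint16_t)(length_bytes >> 1);
}

/* Hypothetical helper modelling move.l a0, d7 / lsr.l #1, d7 /
   and.l #$7FFFFF, d7 (source in bytes -> words, 23 bits only). */
static uint32_t dma_source_words(uint32_t source_bytes) {
    return (source_bytes >> 1) & 0x7FFFFFu;
}
```

Both 0x00FF0000 and 0xFFFF0000 come out as $7F8000, which is exactly what the and.l is for: without it, the second one would leave the mode bit set.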

Now you need to write all five VDP registers (each register holds one byte). The obvious way to do this is to write one register at a time and shift by 8 for each byte:

    lea     (VdpCtrl), a0
    ; Write DMA length
    move.w  #VDPREG_DMALEN_L, d6
    move.b  d0, d6
    move.w  d6, (a0)
    lsr.w   #8, d0
    move.w  #VDPREG_DMALEN_H, d6
    move.b  d0, d6
    move.w  d6, (a0)
    ; Write DMA source
    move.w  #VDPREG_DMASRC_L, d6
    move.b  d7, d6
    move.w  d6, (a0)
    lsr.l   #8, d7
    move.w  #VDPREG_DMASRC_M, d6
    move.b  d7, d6
    move.w  d6, (a0)
    lsr.l   #8, d7
    move.w  #VDPREG_DMASRC_H, d6
    move.b  d7, d6
    move.w  d6, (a0)
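To see which values actually end up on the control port, here's a C sketch that builds the same five words as the shift-based code, in the same order. The helper name is hypothetical. Each word is just the register's base value with the matching byte in the low half.

```c
#include <assert.h>
#include <stdint.h>

/* Register base values, as defined in the text */
#define VDPREG_DMALEN_L 0x9300u
#define VDPREG_DMALEN_H 0x9400u
#define VDPREG_DMASRC_L 0x9500u
#define VDPREG_DMASRC_M 0x9600u
#define VDPREG_DMASRC_H 0x9700u

/* Hypothetical helper: fills out[] with the control-port words in the same
   order as the 68000 code (length low/high, then source low/mid/high).
   len_words and src_words must already be in words. */
static void dma_register_words(uint16_t len_words, uint32_t src_words,
                               uint16_t out[5]) {
    assert(src_words <= 0x7FFFFFu);  /* upper bits must already be cleared */
    out[0] = (uint16_t)(VDPREG_DMALEN_L | (len_words & 0xFFu));
    out[1] = (uint16_t)(VDPREG_DMALEN_H | (len_words >> 8));
    out[2] = (uint16_t)(VDPREG_DMASRC_L | (src_words & 0xFFu));
    out[3] = (uint16_t)(VDPREG_DMASRC_M | ((src_words >> 8) & 0xFFu));
    out[4] = (uint16_t)(VDPREG_DMASRC_H | ((src_words >> 16) & 0xFFu));
}
```

For a 2KB transfer from RAM at $FF0000 (0x400 words, source $7F8000 in words), this gives $9300, $9404, $9500, $9680 and $977F.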

If all the bit shifts in the register writes feel cumbersome, you can use the movep trick instead. movep writes every other byte, which matches the low byte of each register command word. Note that the source address is written first: movep.l writes four bytes but we only need three, so the length write that follows overwrites the extra byte.

If you aren't sure which one is better, you may want to stick with the shifts (see above) for now.

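    lea     (VdpCtrl), a0
    ; Build the five register writes in a 10-byte buffer on the stack
    lea     -10(sp), sp
    move.l  sp, a6
    move.l  #(VDPREG_DMALEN_H<<16)|VDPREG_DMALEN_L, (a6)+
    move.l  #(VDPREG_DMASRC_H<<16)|VDPREG_DMASRC_M, (a6)+
    move.w  #VDPREG_DMASRC_L, (a6)+
    ; Fill in the low byte of each word (source first, then length)
    movep.l d7, -7(a6)
    movep.w d0, -9(a6)
    ; Send everything to the VDP, freeing the buffer as we go
    move.l  (sp)+, (a0)
    move.l  (sp)+, (a0)
    move.w  (sp)+, (a0)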
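The movep version is easy to get wrong, so here's a C model of the 10-byte stack buffer that checks it produces the right register values. movep writes the bytes of a register, most significant first, to every other byte in memory. The helper names below are made up for this sketch. The words come out in a different order ($94, $93, $97, $96, $95) than in the shift version, which doesn't matter since each write targets a different register.

```c
#include <assert.h>
#include <stdint.h>

/* Model of movep.l: 4 bytes, most significant first, to every other byte. */
static void movep_l(uint32_t value, uint8_t *dst) {
    dst[0] = (uint8_t)(value >> 24);
    dst[2] = (uint8_t)(value >> 16);
    dst[4] = (uint8_t)(value >> 8);
    dst[6] = (uint8_t)value;
}

/* Model of movep.w: 2 bytes, most significant first, to every other byte. */
static void movep_w(uint16_t value, uint8_t *dst) {
    dst[0] = (uint8_t)(value >> 8);
    dst[2] = (uint8_t)value;
}

/* Hypothetical helper: returns the five words in the order they are
   written to the control port. */
static void movep_buffer_words(uint16_t len_words, uint32_t src_words,
                               uint16_t out[5]) {
    assert(src_words <= 0x7FFFFFu);
    uint8_t buf[10] = {
        0x94, 0x00, 0x93, 0x00,  /* move.l #(DMALEN_H<<16)|DMALEN_L */
        0x97, 0x00, 0x96, 0x00,  /* move.l #(DMASRC_H<<16)|DMASRC_M */
        0x95, 0x00               /* move.w #DMASRC_L */
    };
    movep_l(src_words, &buf[3]);  /* movep.l d7, -7(a6): bytes 3, 5, 7, 9 */
    movep_w(len_words, &buf[1]);  /* movep.w d0, -9(a6): bytes 1, 3 */
    for (int i = 0; i < 5; i++)
        out[i] = (uint16_t)((buf[2 * i] << 8) | buf[2 * i + 1]);
}
```

Byte 3 is written twice: first with the (always zero) top byte of the source, then with the low byte of the length, which is the "stomping" described above.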

Execute the transfer

Now that the source and length are set, we can set the destination address (where in video memory the data goes), which also starts the DMA transfer. The command we write is similar to the one used to set the address for normal writes to video memory.

Normally, the commands used to set the video memory address look like this:

VRAM_ADDR_CMD:  equ $40000000
CRAM_ADDR_CMD:  equ $C0000000
VSRAM_ADDR_CMD: equ $40000010

The commands to start the DMA transfer are similar, but they're OR'd with $80:

VRAM_DMA_CMD:   equ $40000080
CRAM_DMA_CMD:   equ $C0000080
VSRAM_DMA_CMD:  equ $40000090

Putting it all together, starting the DMA transfer looks like this (this assumes a transfer to VRAM; for CRAM or VSRAM, change the value in the or.l instruction):

; d1 = destination address
; a0 = VdpCtrl (as before)

    ; Convert destination address
    ; into the VDP command
    and.l   #$FFFF, d1
    lsl.l   #2, d1
    lsr.w   #2, d1
    swap    d1
    or.l    #VRAM_DMA_CMD, d1
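    ; Start the transfer: write the command one word at a time from
    ; RAM (the stack), as is usually done for the write that triggers
    ; a DMA from 68000 memory. The second word starts the DMA.
    move.l  d1, -(sp)
    move.w  (sp)+, (a0)
    move.w  (sp)+, (a0)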

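The code above doesn't show it, but if your program uses the Z80 you should pause it while the transfer runs (with the FastPauseZ80 and ResumeZ80 macros), since the DMA takes over the 68000 bus that the Z80 may also try to access. If you aren't using the Z80 yet you can leave them out for now, but it's recommended to add them as soon as possible so you don't forget in the future.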
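The destination conversion is the least obvious part, so here's a C sketch of the same bit shuffle (and.l/lsl.l/lsr.w/swap/or.l). Bits 13-0 of the address end up in bits 29-16 of the command, and bits 15-14 end up in bits 1-0. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_DMA_CMD  0x40000080u
#define CRAM_DMA_CMD  0xC0000080u
#define VSRAM_DMA_CMD 0x40000090u

/* Hypothetical helper: destination address -> 32-bit DMA command. */
static uint32_t dma_dest_command(uint16_t dest, uint32_t base_cmd) {
    assert(base_cmd & 0x80u);  /* must be one of the *_DMA_CMD values */
    uint32_t cmd = ((uint32_t)(dest & 0x3FFF) << 16)  /* bits 13-0 -> 29-16 */
                 | (uint32_t)(dest >> 14);            /* bits 15-14 -> 1-0 */
    return cmd | base_cmd;
}
```

For example, VRAM address $C000 becomes $40000083 and $B800 becomes $78000082. On hardware the high word of the result is written first and the low word second; the second write is the one that starts the transfer.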

128KB boundary bug

This is an extremely annoying limitation of DMA: the source can't cross a 128KB boundary (e.g. from $01FFFF to $020000, from $03FFFF to $040000, etc.). If it does, the source address wraps around to the start of the same 128KB block instead of moving on, so the rest of the transfer copies the wrong data. This is not an issue when transferring graphics from RAM (which never crosses a 128KB boundary), but it is an issue when using ROM.

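There are two solutions to this problem: split the transfer in two at the boundary, or arrange the data in ROM (for example by padding or aligning it) so that no single transfer ever crosses one.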
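Splitting a transfer can be sketched like this in C. The sketch assumes a single transfer is never longer than what the length registers can hold ($FFFF words), so it crosses at most one boundary. The type and function names are hypothetical. Each resulting chunk would then go through the register and command steps from this page as its own DMA transfer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one DMA transfer, all in bytes. */
typedef struct {
    uint32_t source;  /* ROM/RAM address */
    uint32_t length;  /* length in bytes */
    uint16_t dest;    /* video memory address */
} DmaChunk;

/* Splits a transfer so that no chunk crosses a 128KB ($20000-byte)
   boundary. Returns the number of chunks written to out (1 or 2). */
static int dma_split_128k(uint32_t source, uint32_t length, uint16_t dest,
                          DmaChunk out[2]) {
    assert((length & 1) == 0 && length <= 0x1FFFEu);
    uint32_t next_boundary = (source & ~0x1FFFFu) + 0x20000u;
    uint32_t room = next_boundary - source;  /* bytes before the boundary */
    if (length <= room) {
        out[0] = (DmaChunk){source, length, dest};
        return 1;
    }
    out[0] = (DmaChunk){source, room, dest};
    out[1] = (DmaChunk){next_boundary, length - room, (uint16_t)(dest + room)};
    return 2;
}
```

For example, 8KB starting at $01F000 becomes 4KB from $01F000 followed by 4KB from $020000, with the second chunk's destination moved forward by 4KB.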

Using a DMA queue

The above explanation gives the gist of how DMA works, but in practice doing transfers this way is wasteful. The VDP can transfer much more data during vertical blank than during active display, so you should do all your transfers there if possible, and you don't want to waste that time working out what needs to be transferred.

The solution is to use a queue: during the frame you build up the DMA commands and store them in the queue, then as soon as vertical blank begins you send everything in the queue to the VDP. Note that the source data must stay in place until the queue has been processed (not an issue for ROM, but it may be one for RAM buffers that get reused).
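Here's a minimal sketch of such a queue in C. It assumes each entry stores the seven control-port words (five register writes plus the two command words), computed when the transfer is queued. The names, the queue size and the port_log parameter are all made up for this example. port_log stands in for the VDP control port so the sketch can run on a PC; on hardware each word would be written to VdpCtrl instead.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_QUEUE_MAX 32  /* hypothetical size; pick what fits your game */

typedef struct {
    uint16_t words[7];  /* $93xx, $94xx, $95xx, $96xx, $97xx, command hi/lo */
} DmaEntry;

typedef struct {
    DmaEntry entries[DMA_QUEUE_MAX];
    int count;
} DmaQueue;

/* During the frame: do all the conversion work now. Returns 0 if full. */
static int dma_queue_push(DmaQueue *q, uint32_t source, uint16_t length,
                          uint16_t dest, uint32_t base_cmd) {
    assert((length & 1) == 0);
    if (q->count >= DMA_QUEUE_MAX)
        return 0;
    uint16_t *w = q->entries[q->count++].words;
    uint16_t len = (uint16_t)(length >> 1);    /* words */
    uint32_t src = (source >> 1) & 0x7FFFFFu;  /* words, 23 bits */
    uint32_t cmd = ((uint32_t)(dest & 0x3FFF) << 16)
                 | (uint32_t)(dest >> 14) | base_cmd;
    w[0] = (uint16_t)(0x9300 | (len & 0xFF));
    w[1] = (uint16_t)(0x9400 | (len >> 8));
    w[2] = (uint16_t)(0x9500 | (src & 0xFF));
    w[3] = (uint16_t)(0x9600 | ((src >> 8) & 0xFF));
    w[4] = (uint16_t)(0x9700 | ((src >> 16) & 0xFF));
    w[5] = (uint16_t)(cmd >> 16);
    w[6] = (uint16_t)cmd;  /* this write starts the DMA */
    return 1;
}

/* At the start of vertical blank: just copy the words out, no math.
   Returns the number of words written, then empties the queue. */
static int dma_queue_flush(DmaQueue *q, uint16_t *port_log) {
    int n = 0;
    for (int i = 0; i < q->count; i++)
        for (int j = 0; j < 7; j++)
            port_log[n++] = q->entries[i].words[j];
    q->count = 0;
    return n;
}
```

Storing ready-made words keeps the vertical blank work down to copying, at the cost of 14 bytes of RAM per queued transfer.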