If a driver must store some context data that will only be used for a
single request but that may be accessed by each of the smaller DMA
operations within the request (current offset from the beginning of
the buffer for instance), doesn't there need to be a memory barrier
immediately before each write to the register which initiates DMA?

(since after the DMA begins it may interrupt on a different CPU to
signal its completion, and the different CPU must see the most recent
offset from the beginning of the buffer in order to use it for the
next segment of the request)

This seemed like an odd case when I thought about it. A value (the
offset) is being accessed by multiple CPUs (each running an interrupt
handler and/or DPC), but it doesn't immediately seem necessary to grab
a lock (which would imply a memory barrier) because the code that
handles each segment of the request on a particular CPU will finish
what it's doing before causing the next segment to run. There's no
issue with concurrency, but there may be an issue with stale data in
the CPU's pipeline.

Right?
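
The ordering in question can be modeled in portable C11 as a rough
sketch. This is a user-mode model, not driver code: the struct, the
function names, and the atomic variable standing in for the device's
"start DMA" register are all hypothetical. In a real Windows driver the
register write would go through the HAL register routines, and
KeMemoryBarrier would be the nearest kernel analogue of the fence
semantics shown here (that mapping is an assumption, not something
stated in this thread).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-request context shared by every segment of one request. */
struct dma_request {
    size_t offset;   /* bytes of the buffer already handed to the device */
    size_t total;    /* total bytes in the request */
    size_t segment;  /* maximum bytes per DMA operation */
};

/* Stand-in for the device register that initiates DMA (illustrative only). */
static _Atomic uint32_t dma_start_reg;

/* Update the context, then start the next segment.
 * Returns the segment length, or 0 when the request is finished. */
size_t start_next_segment(struct dma_request *req)
{
    assert(req->offset <= req->total);

    size_t remaining = req->total - req->offset;
    size_t len = remaining < req->segment ? remaining : req->segment;
    if (len == 0)
        return 0;

    req->offset += len;  /* plain store to the shared context */

    /* Release store: the context update above becomes visible before
     * the write that lets the device run (and later interrupt). */
    atomic_store_explicit(&dma_start_reg, (uint32_t)len, memory_order_release);
    return len;
}

/* Completion path, possibly running on a different CPU. */
size_t on_segment_complete(struct dma_request *req)
{
    /* Acquire load: pairs with the release store above, so the offset
     * read inside start_next_segment() is the most recent one. */
    (void)atomic_load_explicit(&dma_start_reg, memory_order_acquire);
    return start_next_segment(req);
}
```

With a 10-byte request and 4-byte segments, the first call starts 4
bytes and the following completions start 4, then 2, then report 0
because the request is done.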

Re: Memory barriers and split DMA context

BubbaGump
Fri Oct 13 14:55:04 CDT 2006

On Fri, 13 Oct 2006 15:15:04 -0400, BubbaGump wrote:

>If a driver must store some context data that will only be used for a
>single request but that may be accessed by each of the smaller DMA
>operations within the request (current offset from the beginning of
>the buffer for instance), doesn't there need to be a memory barrier
>immediately before each write to the register which initiates DMA?
>
>(since after the DMA begins it may interrupt on a different CPU to
>signal its completion, and the different CPU must see the most recent
>offset from the beginning of the buffer in order to use it for the
>next segment of the request)
>
>This seemed like an odd case when I thought about it. A value (the
>offset) is being accessed by multiple CPUs (each running an interrupt
>handler and/or DPC), but it doesn't immediately seem necessary to grab
>a lock (which would imply a memory barrier) because the code that
>handles each segment of the request on a particular CPU will finish
>what it's doing before causing the next segment to run. There's no
>issue with concurrency, but there may be an issue with stale data in
>the CPU's pipeline.
>
>Right?

Actually, I guess if KeSynchronizeExecution is used to program the
device's registers (as I've seen suggested in the WDM book and the
DDK), then the spinlock it acquires and releases will act as a memory
barrier before the next interrupt handler runs, and everything will be
okay. Is it convention to use KeSynchronizeExecution for programming
device registers? I'd gotten the impression it was necessary only for
devices that can interrupt at any moment, and unnecessary for devices
where the driver knows when to expect an interrupt and so knows
whether the interrupt handler might access registers at the same
time. I hadn't even dreamed of KeSynchronizeExecution being useful as
a memory barrier.
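
The idea that a lock acquire/release pair doubles as a barrier can be
sketched in user-mode C11. Everything here is a hypothetical stand-in:
synchronize_execution only imitates the shape of
KeSynchronizeExecution (the real routine also raises IRQL and uses the
interrupt object's spinlock), and irq_lock, xfer_context, and start_reg
are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the interrupt spinlock. */
struct irq_lock {
    atomic_flag held;
};

typedef bool (*sync_routine)(void *context);

/* Run `routine` with the lock held. The acquire on entry and the
 * release on exit order every access made inside the routine. */
bool synchronize_execution(struct irq_lock *lock, sync_routine routine,
                           void *context)
{
    while (atomic_flag_test_and_set_explicit(&lock->held,
                                             memory_order_acquire))
        ;  /* spin until the current holder releases the lock */

    bool result = routine(context);

    atomic_flag_clear_explicit(&lock->held, memory_order_release);
    return result;
}

/* Hypothetical per-request context plus a fake "start DMA" register. */
struct xfer_context {
    size_t offset;       /* bytes already handed to the device */
    size_t total;        /* total bytes in the request */
    size_t segment;      /* maximum bytes per DMA operation */
    unsigned start_reg;  /* models the register write */
};

/* Updates the offset and "writes the register" under the lock.
 * Returns false once the whole request has been issued. */
bool program_next_segment(void *context)
{
    struct xfer_context *x = context;
    assert(x->offset <= x->total);

    size_t remaining = x->total - x->offset;
    size_t len = remaining < x->segment ? remaining : x->segment;
    if (len == 0)
        return false;

    x->offset += len;
    x->start_reg = (unsigned)len;
    return true;
}
```

Because both the offset update and the register write happen inside
the locked routine, any later holder of the same lock (for example the
next interrupt handler in this model) observes the updated offset
without a separate explicit barrier.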