If a driver must store some context data that will only be used for a
single request but that may be accessed by each of the smaller DMA
operations within the request (for instance, the current offset from the
beginning of the buffer), doesn't there need to be a memory barrier
immediately before each write to the register that initiates DMA?
After the DMA begins, the device may interrupt on a different CPU to
signal completion, and that CPU must see the most recent offset from
the beginning of the buffer in order to use it for the next segment of
the request.
This seemed like an odd case when I thought about it. A value (the
offset) is accessed by multiple CPUs (each running an interrupt
handler and/or DPC), but it doesn't immediately seem necessary to take
a lock (which would imply a memory barrier), because the code that
handles each segment of the request on a particular CPU finishes
what it's doing before causing the next segment to run. There's no
concurrency issue, but there may be an issue with stale data: the new
offset written on one CPU might not yet be visible to the other CPU
(still held in the first CPU's pipeline or store buffer).
Right?