I'm concerned with compiler variable caching in general and
out-of-order stores on multiprocessor systems. Recently I started
thinking about this in relation to filling in the initial values for
data that is operated on by interrupt handlers and DPCs. An IRP
contains a set of these initial values. An IRP is created by the I/O
manager (or another driver) and then passed to the dispatch routine of
a driver. This probably all happens in one thread context on one CPU.

If the IRP is then pended by calling IoMarkIrpPending and returning
STATUS_PENDING from the dispatch routine, then the IRP will eventually
be completed in another thread context, possibly on a different CPU.

If the IRP is completed by a worker thread, then there will have been
an implicit compiler and CPU memory barrier in signaling the event
that woke up the thread to handle the IRP. This case should be okay.

If the IRP is completed by a DPC, then on a uniprocessor machine the
DPC may see the old values of some of the fields in the IRP that the
compiler did not yet update, and on a multiprocessor machine the DPC
may see the old values that the other CPU did not yet update. Doesn't
there need to be a memory barrier to flush any out-of-order stores
immediately before writing the device register that will initiate the
operation that will eventually cause an interrupt and the DPC handler
to run?
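
To make the question concrete, here is a minimal user-mode sketch of the
ordering I have in mind, written with C11 atomics rather than real kernel
calls. Everything in it is an assumption for illustration: struct request
stands in for the IRP fields, doorbell stands in for the device register,
and start_request / complete_request stand in for the dispatch routine and
the DPC. In an actual driver I would expect something like KeMemoryBarrier
followed by WRITE_REGISTER_ULONG in place of the fence and the store, but
I have not verified that this is the recommended pattern.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields filled in before starting I/O. */
struct request {
    uint32_t length;
    uint32_t offset;
};

/* Hypothetical stand-in for the device register that starts the operation. */
static atomic_uint doorbell;
static struct request pending;

/* Dispatch side: fill in the request, then ring the doorbell. */
static void start_request(uint32_t length, uint32_t offset)
{
    pending.length = length;
    pending.offset = offset;

    /* Release fence: neither the compiler nor the CPU may move the
       plain stores above past the doorbell store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&doorbell, 1u, memory_order_relaxed);
}

/* Completion side: returns 1 and copies the request if the doorbell
   has been rung, otherwise returns 0 and leaves *out untouched. */
static int complete_request(struct request *out)
{
    if (atomic_load_explicit(&doorbell, memory_order_relaxed) == 0u)
        return 0;

    /* Acquire fence: pairs with the release fence above, so the
       reads below observe the fully initialized request. */
    atomic_thread_fence(memory_order_acquire);
    *out = pending;
    return 1;
}
```

Without the release fence, nothing in the language rules would stop the
doorbell store from becoming visible before the two field stores, which
is exactly the case I am worried about.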

I get the impression that sometimes KeAcquireInterruptSpinLock and
KeSynchronizeExecution are used while programming device registers,
and the interrupt spinlock that is held would provide the necessary
barrier, but it was not clear to me that this was the purpose of these
routines. Besides, does everyone even use them?