Hi,

I just tried to run my driver on a multi-cpu platform - and unfortunately
this results in a non-responding system (possibly due to a deadlock).
I've reviewed my code several times, but cannot find a possible reason for
this behaviour, so:

- Is there a good technique or tool that can help me with this? I've tried
Driver Verifier, but it doesn't catch the deadlock with a bluescreen.

- Is there any other possible reason for this behaviour than deadlocks?

- Just to make sure: Is it possible to hold more than one (different!)
spinlock at the same time? e.g.
NdisAcquireSpinLock(&a);
NdisAcquireSpinLock(&b);
NdisReleaseSpinLock(&b);
NdisReleaseSpinLock(&a);


Best wishes and thanks for any help,

Peter

Re: Detecting deadlocks by cristalink

cristalink
Wed Feb 08 15:32:28 CST 2006

"Peter Schmitz" <PeterSchmitz@discussions.microsoft.com> wrote in message
news:8F79812C-7564-4E92-8399-F268D00480F8@microsoft.com...
> Hi,
>
> I just tried to run my driver on a multi-cpu platform - and unfortunately
> this results in a non-responding system (possibly due to a deadlock).
> I've reviewed my code several times, but cannot find a possible reason for
> this behaviour, so:
>
> - Is there a good way / a software that can help me on this? I've tried
> driver verifier, but it doesn't block the deadlock by bluescreen?
>
> - Is there any other possible reason for this behaviour than deadlocks?
>
> - Just to make sure: Is it possible to acquire more than one different (!)
> spinlocks? e.g.
> NdisAcquireSpinLock(&a);
> NdisAcquireSpinLock(&b);
> NdisReleaseSpinLock(&b);
> NdisReleaseSpinLock(&a);

In theory, yes, you can do the above. Just make sure you never acquire
the same spinlock twice on one code path without releasing it in between.

The following will instantly lock up a multiprocessor (MP) system:

void lock()
{
    NdisAcquireSpinLock(&a);
    NdisAcquireSpinLock(&a);  /* never returns: a is already held here */
}

Also make sure no other part of your code acquires the same two locks in
the opposite order, b first and then a:

NdisAcquireSpinLock(&b);
NdisAcquireSpinLock(&a);
NdisReleaseSpinLock(&a);
NdisReleaseSpinLock(&b);

Check that your spinlocks are properly initialized. Don't touch anything
pageable while you hold a spinlock. Don't do anything that might cause
recursive calls back into the locked code. Don't call any external functions
while you hold a spinlock.
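
For the double-acquire case, a debug wrapper that remembers the owner can turn a silent hang into something you can see. Here is a rough user-mode sketch in C, not driver code: `dbg_lock_t`, `dbg_acquire`, `dbg_release` and the explicit `cpu` argument are hypothetical names made up for illustration, and the lock itself is a C11 `atomic_flag` standing in for a real spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical user-mode stand-in for a spinlock with owner tracking. */
typedef struct {
    atomic_flag held;   /* the lock bit itself */
    atomic_int  owner;  /* id of the CPU holding the lock, or NO_OWNER */
} dbg_lock_t;

#define DBG_LOCK_INIT { ATOMIC_FLAG_INIT, NO_OWNER }

/* Returns false, instead of spinning forever, if 'cpu' already owns the lock. */
static bool dbg_acquire(dbg_lock_t *l, int cpu)
{
    if (atomic_load(&l->owner) == cpu)
        return false;                        /* recursive acquire caught */
    while (atomic_flag_test_and_set(&l->held))
        ;                                    /* spin until the holder releases */
    atomic_store(&l->owner, cpu);
    return true;
}

static void dbg_release(dbg_lock_t *l, int cpu)
{
    assert(atomic_load(&l->owner) == cpu);   /* only the owner may release */
    atomic_store(&l->owner, NO_OWNER);
    atomic_flag_clear(&l->held);
}

/* Demo: the second acquire on the same "CPU" is reported, not hung on. */
bool demo_double_acquire_is_caught(void)
{
    dbg_lock_t a = DBG_LOCK_INIT;
    bool first  = dbg_acquire(&a, 0);        /* succeeds */
    bool second = dbg_acquire(&a, 0);        /* would deadlock on a plain lock */
    dbg_release(&a, 0);
    return first && !second;
}
```

In a driver the same idea would presumably wrap the real acquire and release calls in checked builds only, with the current processor number as the owner id; that part is an assumption and is not shown here.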

--
http://www.cristalink.com





RE: Detecting deadlocks by pavel_a

pavel_a
Wed Feb 08 16:24:04 CST 2006

Have you tried selecting the Driver Verifier options manually?
One of them is deadlock detection.

--PA

Re: Detecting deadlocks by Maxim

Maxim
Wed Feb 08 20:08:07 CST 2006

Run !process 0 7 in WinDbg, and you will see the stacks of the deadlocked
threads clearly. Note: the command is very slow.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com


Re: Detecting deadlocks by Maxim

Maxim
Wed Feb 08 20:08:51 CST 2006

> Have you tried to manually select verifier options?
> There is the deadlock detection.

On ERESOURCE locks only IIRC.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com


Re: Detecting deadlocks by Doron

Doron
Wed Feb 08 23:44:05 CST 2006

It works on spin locks as well.

d

--
Please do not send e-mail directly to this alias. this alias is for
newsgroup purposes only.
This posting is provided "AS IS" with no warranties, and confers no rights.


"Maxim S. Shatskih" <maxim@storagecraft.com> wrote in message
news:epQrs0RLGHA.3984@TK2MSFTNGP14.phx.gbl...
>> Have you tried to manually select verifier options?
>> There is the deadlock detection.
>
> On ERESOURCE locks only IIRC.
>
> --
> Maxim Shatskih, Windows DDK MVP
> StorageCraft Corporation
> maxim@storagecraft.com
> http://www.storagecraft.com
>



Re: Detecting deadlocks by Bill

Bill
Thu Feb 09 23:36:44 CST 2006

Peter Schmitz <PeterSchmitz@discussions.microsoft.com> wrote:

> Hi,
>
> I just tried to run my driver on a multi-cpu platform - and unfortunately
> this results in a non-responding system (possibly due to a deadlock).
> I've reviewed my code several times, but cannot find a possible reason for
> this behaviour, so:
>
> - Is there a good way / a software that can help me on this? I've tried
> driver verifier, but it doesn't block the deadlock by bluescreen?
>
> - Is there any other possible reason for this behaviour than deadlocks?

If you raise the IRQL of one of the processors to DISPATCH_LEVEL for
some other reason and there's a logic bug in your code that causes it
to get stuck in an infinite loop, you will also wedge one of the CPUs.

Another possibility is if you botch your interrupt handling: if for some
reason you fail to properly acknowledge or mask an interrupt event,
your hardware may keep the interrupt line asserted, causing an endless
stream of calls to your ISR.
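
A rough sketch of the acknowledge step, written as user-mode C so it can run anywhere. Everything in it is made up for illustration: `fake_dev_t`, its `status` field and the way a bit is cleared are assumptions, not a real device or an NDIS API. In a real driver the read and the acknowledge would go to your hardware's own registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: the interrupt line stays asserted
 * for as long as any bit in 'status' is set. */
typedef struct {
    volatile uint32_t status;   /* pending interrupt causes */
    unsigned          handled;  /* how many events the ISR serviced */
} fake_dev_t;

static bool line_asserted(const fake_dev_t *dev)
{
    return dev->status != 0;
}

/* ISR sketch: claim the interrupt only if the device raised it, and
 * acknowledge every cause that was read before returning. */
bool fake_isr(fake_dev_t *dev)
{
    assert(dev != NULL);
    uint32_t pending = dev->status;
    if (pending == 0)
        return false;           /* not our interrupt (shared line) */
    dev->status &= ~pending;    /* acknowledge: this deasserts the line */
    dev->handled++;
    return true;
}

/* Counts ISR invocations until the line drops, giving up at 'limit'. */
unsigned isr_calls_until_quiet(fake_dev_t *dev, unsigned limit)
{
    unsigned calls = 0;
    while (line_asserted(dev) && calls < limit) {
        fake_isr(dev);
        calls++;
    }
    return calls;
}
```

If you delete the acknowledge line, `isr_calls_until_quiet` runs all the way to `limit`, which is the small-scale version of the endless stream of ISR calls described above.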

Or, to quote a friend of mine, it could be communist gnomes.

> - Just to make sure: Is it possible to acquire more than one different (!)
> spinlocks? e.g.
> NdisAcquireSpinLock(&a);
> NdisAcquireSpinLock(&b);
> NdisReleaseSpinLock(&b);
> NdisReleaseSpinLock(&a);

Yes, it is possible. However, you _must_ be sure to always acquire
the locks in the same order. That is, you can't have one part of your
code that does this:

NdisAcquireSpinLock(&a);
NdisAcquireSpinLock(&b);
NdisReleaseSpinLock(&b);
NdisReleaseSpinLock(&a);

and another part that does this:

NdisAcquireSpinLock(&b);
NdisAcquireSpinLock(&a);
NdisReleaseSpinLock(&a);
NdisReleaseSpinLock(&b);

If you do this, and the two pieces of code manage to run on different
CPUs, you will trigger a deadlock due to lock order reversal. Say you
have a system with two processors. The sequence would be something like
this:

- CPU0 and CPU1 are both currently running threads containing your
  driver code. By chance, both CPUs arrive at the different blocks
  of code that manipulate locks A and B at about the same time.
- CPU0 wants locks A and B, in that order.
- CPU1 wants locks B and A, in that order.
- CPU0 acquires lock A.
- CPU1 acquires lock B.
- CPU0 now wants lock B, but CPU1 is already holding it.
- CPU1 now wants lock A, but CPU0 is already holding it.
- CPU0 can't release lock A until it acquires lock B.
- CPU1 can't release lock B until it acquires lock A.
- Both CPUs want something the other has, and won't release.
- The system is now wedged, and you must press the reset button to recover.

If your code is very complicated, lock order reversals can sneak up
on you very easily. They're also very difficult to reproduce and diagnose,
since they depend on both processors executing just the right code at just
the right time. You could go for months without ever seeing the deadlock,
or it may happen right away.
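
One way to keep the order consistent without relying on everybody's memory is to funnel every two-lock acquisition through a single helper. The sketch below picks the order by address, which is just one possible convention (a documented "always A before B" rule works equally well). It is user-mode C with a C11 `atomic_flag` standing in for the real spinlock; `sim_lock_t`, `first_of`, `acquire_pair` and `release_pair` are hypothetical names, not NDIS calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-mode stand-in for a spinlock. */
typedef struct { atomic_flag held; } sim_lock_t;

#define SIM_LOCK_INIT { ATOMIC_FLAG_INIT }

static void sim_acquire(sim_lock_t *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ;   /* spin until the holder releases */
}

static void sim_release(sim_lock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* The lock to take first: always the one at the lower address. */
sim_lock_t *first_of(sim_lock_t *x, sim_lock_t *y)
{
    return ((uintptr_t)x < (uintptr_t)y) ? x : y;
}

/* Acquire two distinct locks in one global order, no matter which
 * order the caller names them in. */
void acquire_pair(sim_lock_t *x, sim_lock_t *y)
{
    assert(x != y);   /* the same lock twice would self-deadlock */
    sim_lock_t *first  = first_of(x, y);
    sim_lock_t *second = (first == x) ? y : x;
    sim_acquire(first);
    sim_acquire(second);
}

/* Release order does not matter for deadlock avoidance. */
void release_pair(sim_lock_t *x, sim_lock_t *y)
{
    sim_release(x);
    sim_release(y);
}
```

With this, `acquire_pair(&a, &b)` and `acquire_pair(&b, &a)` take the locks in the same order, so the CPU0/CPU1 scenario above cannot happen between callers of the helper. It does nothing for code that still acquires the locks directly.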

On a single-CPU system, Windows setup installs the uniprocessor
version of ntoskrnl.exe, in which acquiring a lock just raises the IRQL to
DISPATCH_LEVEL: the actual atomic test and set of the lock is skipped as
a form of optimization, since it serves no purpose when there's just one
CPU. That means if you accidentally create a lock order reversal in your
code, you'll never notice it when testing on a single CPU machine: no
matter which spinlock you acquire first, you'll always just be doing
KeRaiseIrql(DISPATCH_LEVEL) twice in a row, so the coding error will
be hidden.
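
To make the difference concrete, here is a toy user-mode model of the two behaviours. It is only a sketch of the idea described above, not the kernel's actual code: `model_cpu_t`, `model_lock_t`, `up_try_acquire` and `mp_try_acquire` are hypothetical names, the IRQL values are simplified, and the MP version returns false where a real acquire would spin, so the example cannot hang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { PASSIVE = 0, DISPATCH = 2 };   /* simplified IRQL values */

typedef struct {
    atomic_flag bit;    /* only touched by the MP model */
} model_lock_t;

typedef struct {
    int irql;           /* current IRQL of the modeled CPU */
} model_cpu_t;

/* UP model: raising IRQL is the whole "lock"; the bit is never tested. */
bool up_try_acquire(model_cpu_t *cpu, model_lock_t *lock)
{
    assert(cpu != NULL);
    (void)lock;                  /* deliberately ignored */
    cpu->irql = DISPATCH;
    return true;                 /* always "succeeds" */
}

/* MP model: raise IRQL, then test-and-set the lock bit. A real acquire
 * would spin on failure; this sketch reports it instead. */
bool mp_try_acquire(model_cpu_t *cpu, model_lock_t *lock)
{
    assert(cpu != NULL && lock != NULL);
    cpu->irql = DISPATCH;
    return !atomic_flag_test_and_set(&lock->bit);
}
```

Calling `up_try_acquire` twice on the same lock succeeds both times, while the second `mp_try_acquire` on a held lock fails. A lock order reversal is hidden on the UP model for the same reason: the lock bit is never looked at.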

Luckily, you had the presence of mind to actually test your code on an
SMP machine. :)

-Bill

--
=============================================================================
-Bill Paul (510) 749-2329 | Senior Engineer, Master of Unix-Fu
wpaulATNOSPAMPLEASEwindriverDHATcom | Wind River Systems
=============================================================================
"Ignorance may be bliss, but delusion is ecstasy!" -Perki
=============================================================================