Hello,

I have run into some strange behaviour, and I would like to ask about
your experience with it.

Currently, I'm working on a driver which is not WDM, but legacy NT4. It
is installed and uninstalled via the service control manager (SCM).

As the driver verifier does some tests when a driver is unloaded, I have
the following procedure in a test application for now:

1. Start the driver (equivalent to "net start <drivername>")
2. Open the device and do what I want
3. ...
4. Close the device
5. Stop the driver (equivalent to "net stop <drivername>")

No. 1 and 5 will not be part of the final version of the driver, but
they are there for now.


So far, everything works, and even the driver verifier does not
complain. One of the people I sent the driver to for additional testing
did the following on his machine: he starts several instances (6, to be
precise) of the test application above, to make sure that this case is
handled correctly. Here, the "..." above (no. 3) is especially short,
so the driver does not perform any work at all, but quits rather
quickly.

On my machine (uniprocessor), everything works fine, no problems at all.
To my surprise, on his machine (hyper-threading) as well as on another
machine (SMP), the system bugchecks. Most often, it bugchecks with
DRIVER_UNLOADED_WITHOUT_CANCELLING_PENDING_OPERATIONS; sometimes, it is
an IRQL_NOT_LESS_OR_EQUAL instead.

Unfortunately, I do not have an HT or an SMP machine here. Anyway, from
the kernel dumps both testers sent me, I found out that the driver was
executing something in the DriverEntry() routine at the time. The
bugcheck happens at a different place every time, but it is always in
DriverEntry() or in functions which are only called from there.

If no. 1 and 5 above (starting and stopping the driver) are omitted,
everything works perfectly.

So, to me, it seems that no. 5 (stopping the driver) in one instance of
the test application and no. 1 (starting the driver) in another instance
can occur simultaneously on HT or SMP machines. Is this true? Is there
anything I can do to prevent this?
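
If the hypothesis above is right (a start request arriving while a stop
is still in progress), one test-side mitigation would be to poll the
service state and start the driver only after it is reported as stopped.
The following is a minimal, portable sketch of such a polling loop. It is
an assumption, not a confirmed fix: svc_state_t, wait_until_stopped() and
fake_query() are hypothetical names. On Windows the query callback might
wrap QueryServiceStatus() and compare against SERVICE_STOPPED.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for this sketch; not the Win32 constants. */
typedef enum { SVC_STOPPED, SVC_STOP_PENDING, SVC_RUNNING } svc_state_t;

/* Caller-supplied query function; ctx is passed through unchanged. */
typedef svc_state_t (*svc_query_fn)(void *ctx);

/* Poll until the service reports SVC_STOPPED or max_polls is used up.
 * Returns true only if the stopped state was actually observed. */
bool wait_until_stopped(svc_query_fn query, void *ctx, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (query(ctx) == SVC_STOPPED)
            return true;
        /* A real test application would sleep briefly here. */
    }
    return false;
}

/* Stand-in for a real query, for demonstration only: reports
 * SVC_STOP_PENDING for the number of polls stored in *ctx,
 * then SVC_STOPPED. */
svc_state_t fake_query(void *ctx)
{
    unsigned *remaining = ctx;
    if (*remaining == 0)
        return SVC_STOPPED;
    (*remaining)--;
    return SVC_STOP_PENDING;
}
```

A test instance would call wait_until_stopped() before its start request
and treat a false return as "try again later" rather than proceeding.
This only narrows the window on the application side; it does not
replace a correct Unload routine in the driver.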

From my understanding, starting a driver via SCM will return with an
error message if the driver is already loaded. Thus, if stopping has not
yet taken place, another instance of the test app cannot start the
driver, thus, the driver would never reach the DriverEntry() routine.
But, from the crash dump, it must have reached the DriverEntry()
routine.

If the bugcheck occurred while operating the driver (no. 2-4), I would
think it is a design flaw of mine. But this does not happen!



Currently, to me, it seems like a design flaw of the OS, but I hope that
I'm mistaken here. ;-)

Any hints, suggestions, and remarks are welcome!

Regards,
Spiro.

--
Spiro R. Trikaliotis
http://www.trikaliotis.net/

Re: Start and stop via SCM not multiprocessor-safe? by nospam

nospam
Mon Nov 08 13:37:23 CST 2004

I would think it's your driver that is not multiprocessor-safe. Most
likely you are not cleaning up properly on exit.

--
http://www.firestreamer.com

Re: Start and stop via SCM not multiprocessor-safe? by Maxim

Maxim
Mon Nov 08 13:47:51 CST 2004

Probably your bug: your Unload routine may forget to cancel outstanding
I/O or to wait until the IRPs sent down have completed.
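
The idea of not unloading while operations are outstanding can be
modelled in plain user-mode C. This is only a sketch under assumptions:
the names (rundown_t, rundown_acquire(), and so on) are hypothetical and
not taken from the driver in question, and the model is single-threaded.
A real driver would have to guard the counter with a spin lock or
interlocked operations and block in its Unload routine, for example with
KeWaitForSingleObject() on an event that is set when the count reaches
zero.

```c
#include <stdbool.h>

/* Hypothetical user-mode model of "wait for pending work before unload".
 * Not thread-safe: a real driver needs a spin lock or interlocked ops. */
typedef struct {
    int  pending;    /* operations started but not yet completed */
    bool unloading;  /* set once unload has begun; new work is rejected */
} rundown_t;

void rundown_init(rundown_t *r)
{
    r->pending = 0;
    r->unloading = false;
}

/* Call before starting an operation (e.g. before sending an IRP down).
 * Returns false once unload has begun, so no new work is started. */
bool rundown_acquire(rundown_t *r)
{
    if (r->unloading)
        return false;
    r->pending++;
    return true;
}

/* Call when an operation has completed or has been cancelled. */
void rundown_release(rundown_t *r)
{
    if (r->pending > 0)
        r->pending--;
}

/* Call at the start of unload: stop accepting new operations. */
void rundown_begin_unload(rundown_t *r)
{
    r->unloading = true;
}

/* Resources may be freed only when this returns true. */
bool rundown_can_unload(const rundown_t *r)
{
    return r->unloading && r->pending == 0;
}
```

In this model, an unload that begins while one operation is pending must
not free anything until rundown_release() has been called for that
operation, which is the condition the bugcheck name
DRIVER_UNLOADED_WITHOUT_CANCELLING_PENDING_OPERATIONS points at.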

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com

Re: Start and stop via SCM not multiprocessor-safe? by Spiro

Spiro
Tue Nov 09 02:55:53 CST 2004

Hello,

Maxim S. Shatskih <maxim@storagecraft.com> wrote:

> Probably your bug: your Unload routine may forget to cancel
> outstanding I/O or to wait until the IRPs sent down have completed.

Well, this is what I thought myself, too. The only thing that confuses
me is that it crashes in the DriverEntry() routine, not in the Unload()
routine or somewhere in the kernel. From my understanding, the
DriverEntry() routine should not even be called, should it?

Ok, I'll try to investigate further.

Regards,
Spiro.

--
Spiro R. Trikaliotis
http://www.trikaliotis.net/

Re: Start and stop via SCM not multiprocessor-safe? by Calvin

Calvin
Tue Nov 09 07:34:03 CST 2004

What does the call stack look like?

Re: Start and stop via SCM not multiprocessor-safe? by Spiro

Spiro
Sun Nov 14 08:26:12 CST 2004

Hello,

Calvin Guan <cguan@pleasenospams.ati.com> wrote:

> What does the call stack look like?

Sorry for the late answer, but there was some other work to be done on
that driver. As always, one has to work on different things
concurrently. :-(


I see the following call stack:

1: kd> kv
ChildEBP RetAddr Args to Child
f7b79b5c 805296be 00000050 f1121a89 00000000 nt!KeBugCheckEx+0x1b
f7b79bac 804e0f07 00000000 f1121a89 00000000 nt!MmAccessFault+0x77e
f7b79bac f1121a89 00000000 f1121a89 00000000 nt!KiTrap0E+0xd0 (FPO: [0,0] TrapFrame @ f7b79bc4)
WARNING: Frame IP not in any known module. Following frames may be wrong.
f7b79c34 00000000 f7b79c54 f11234a0 f8146000 <Unloaded_DRIVER.sys>+0x2a89

Hm, the last entry does not look that promising. Ok, let's look by hand
at the frames:

1: kd> dd f7b79b5c
f7b79b5c f7b79bac 805296be 00000050 f1121a89
f7b79b6c 00000000 f7b79bc4 00000000 f1121a89
f7b79b7c e29586a0 f7b79c58 00000000 83bc65f0
f7b79b8c 00000000 00000000 00000000 00000000
f7b79b9c 00000000 00000000 00000260 00000000
f7b79bac f7b79bc4 804e0f07 00000000 f1121a89
f7b79bbc 00000000 f7b79bc4 f7b79c58 f1121a89
f7b79bcc badb0d00 311f0010 0010000e 8059d980
1: kd> dd f7b79bac
f7b79bac f7b79bc4 804e0f07 00000000 f1121a89
f7b79bbc 00000000 f7b79bc4 f7b79c58 f1121a89
f7b79bcc badb0d00 311f0010 0010000e 8059d980
f7b79bdc 00000008 00000002 80000f98 f7b79c18
f7b79bec 8000148c 80000bb8 80000000 80000023
f7b79bfc 00130023 311f0010 31200011 00000000
f7b79c0c 00000000 f7b79dcc 00000030 fb495d28
f7b79c1c e29586a0 00000000 f7b79c58 00000000
1: kd> dd f7b79bc4
f7b79bc4 f7b79c58 f1121a89 badb0d00 311f0010
f7b79bd4 0010000e 8059d980 00000008 00000002
f7b79be4 80000f98 f7b79c18 8000148c 80000bb8
f7b79bf4 80000000 80000023 00130023 311f0010
f7b79c04 31200011 00000000 00000000 f7b79dcc
f7b79c14 00000030 fb495d28 e29586a0 00000000
f7b79c24 f7b79c58 00000000 f1121a89 00000008
f7b79c34 00210246 00000000 f7b79c54 f11234a0
1: kd> kv = f7b79bc4
ChildEBP Ret