Re: problem with isprint(), isspace() and the likes... by Dodo
Dodo
Wed Dec 17 00:22:39 CST 2003
"Doug Harrison [MVP]" <dsh@mvps.org> wrote in message
news:l0nvtv8t892vh36bblejolja4nnmrokttj@4ax.com...
> Dodo wrote:
>
> >Ok, here is the problem.
> >---
> >char a[] = "\336";
> >if(isprint(a[0]))
> > do_something();
> >---
> >The code snippet above will intermittently GPF or yield
> >wrong results. The is*() functions are declared as
> >int is*(int c). When a[0] is passed, the compiler performs an
> >implicit conversion: isprint((int)a[0]). Plain char is signed, so
> >c becomes -34 (the compiler uses the movsx instruction
> >to load eax with FFFFFFDE).
> >
> >So far, so good. The trouble is that the Microsoft implementation
> >of is*() uses c as a direct index into a lookup table and
> >returns the value found there. Very optimized, but... no range
> >checking, no nothing... :-)
> >
> >And now you are playing minesweeper with whatever memory
> >happens to precede the lookup table. If you are lucky you will
> >get a GPF. If you are unlucky you will get intermittently wrong results.
> >
> >MS's implementation is buggy and should perform some
> >range checking (at least). Do I have to tell them what to do
> >with a lookup index out of range?...:-)
> >
> >BTW, the problem is observed in VC6 (latest SP) and VC7.1
> >and has still not been corrected. Neither gcc's nor Intel's libraries
> >have this problem (and VC's shouldn't either).
> >If somebody from M$ can take note and fix it in the
> >next SP, "it would be just great".
>
> This is not a bug. Per the C Standard, the <ctype.h> functions are defined
> only on those int values representable in unsigned char, plus EOF. By
> default, plain char is signed in VC, and that together with integral
> promotion accounts for your problem. The portable way to pass a plain char
> to a <ctype.h> function is to cast it to unsigned char, e.g.
>
> isupper((unsigned char) c)
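
A minimal C sketch of Doug's point (not part of the original thread). The helper names promoted_value, ctype_value and portable_isprint are hypothetical, and 0xDE is just an example of a byte above 127. Where plain char is signed, the implicit conversion to int keeps the negative value, while going through unsigned char yields a value that <ctype.h> is defined for.

```c
#include <ctype.h>

/* What happens implicitly when a plain char is passed to an int
   parameter: the conversion preserves the value, so a byte above 127
   stays negative wherever plain char is signed. */
int promoted_value(char ch)
{
    return ch;
}

/* The value <ctype.h> expects: converting to unsigned char first maps
   every char onto 0..UCHAR_MAX, whatever the signedness of plain char. */
int ctype_value(char ch)
{
    return (unsigned char)ch;
}

/* Portable replacement for isprint(a[0]) when a[0] is a plain char. */
int portable_isprint(char ch)
{
    return isprint((unsigned char)ch);
}
```

On a typical two's-complement compiler with signed plain char (VC's default, and gcc's on x86), promoted_value((char)0xDE) gives -34 and ctype_value((char)0xDE) gives 222. With unsigned plain char, both give 222.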

Thanks for the prompt reply. I guess I should have read
the standard more carefully :-/. I realize MS has followed
it strictly, and the behavior for an argument that is neither
EOF nor representable as an unsigned char is literally
undefined. Well, still... a bit of defensive programming
wouldn't hurt. But that's just wishful thinking. The standard
does say the behavior is undefined, so what the hell?
I just finished porting a very popular UNIX utility to Windows
that had a snippet like the one above. It worked fine on UNIX
but broke when compiled under VC because of VC's conformance
to the standard :-))) It just sounds funny.
Anyway, thanks again.
George.
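
The "bit of defensive programming" wished for above can live in a porting layer instead of the C library. The following is a hypothetical sketch: checked_isprint is an illustrative name, not a CRT function. It refuses to hand isprint() anything outside the range the Standard defines (EOF and 0..UCHAR_MAX).

```c
#include <ctype.h>
#include <limits.h>

/* Range-checked front end for isprint(). EOF is negative and is never a
   printing character, so every negative argument (including a
   sign-extended plain char such as -34) can be answered with 0 without
   touching the library's lookup table. Values above UCHAR_MAX are
   rejected the same way. Treating them as "not printable" is an
   arbitrary policy chosen for this sketch. */
int checked_isprint(int c)
{
    if (c < 0 || c > UCHAR_MAX)
        return 0;
    return isprint(c);
}
```

Returning 0 only makes the failure benign. A sign-extended byte such as -34 is still not classified as character 222, so the (unsigned char) cast at the call site remains the actual fix.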