Alex
Sun Jun 25 10:19:12 CDT 2006
cristi wrote:
> I get compiler warnings for the following lines:
>
> printf( "4. size : %d\n", (int)sizeof('\u01F3'));
> printf( "5. size : %d\n", (int)sizeof('\uFB94'));
> printf( "6. size : %d\n", (int)sizeof('\U0001FB94'));
>
> and here are the warnings:
>
> warning C4566: character represented by
> universal-character-name '\u01F3' cannot be represented
> in the current code page (1250)
> warning C4566: character represented by
> universal-character-name '\uFB94' cannot be represented
> in the current code page (1250)
> warning C4566: character represented by
> universal-character-name '\UD83EDF94' cannot be
> represented in the current code page (1250)
>
> We get the same warning for all the above lines. I am a
> Unicode novice and playing with the compiler. My
> intention was *not* to use wide character literals. The
> standard mentions that ordinary character literals
> containing a single c-char are of type char. For the
> first two lines the behaviour is standard. Even if the
> characters specified by the universal character names do
> not fit in a char, the standard behaviour is preserved -
> that is, the type is char - and we get a size of 1 (of
> course, the value is implementation defined). But for the
> last line the type is changed. It seems very strange to
> me.
It's implementation-defined. Here's what paragraph 2.13.2/5
of the C++ standard says:
"A universal-character-name is translated to the encoding,
in the execution character set, of the character named. If
there is no such encoding, the universal-character-name is
translated to an implementation-defined encoding."
The execution character set for the MS compiler is ASCII,
extended with the characters of the current system locale's
code page (1250 in your case).
See here for more info:
"Phases of Translation"
http://msdn2.microsoft.com/en-us/library/bxss3ska.aspx
Why the MS compiler translates \uNNNN into a single-byte
character but \UNNNNNNNN into an int is beyond me.