At the bottom of this message is an example of what I believe is a bug
in the behavior of __forceinline when used with SSE intrinsics in
Visual Studio 2005. The summary is it seems that when __forceinline
is used on functions that contain SSE intrinsics, the functions are
not inlined after a very small depth (something under 10 levels of
functions). If __forceinline is replaced with __inline, the functions
are all inlined. From what I've read, there is no way __forceinline
should cause less inlining than __inline, but that's the behavior I'm
seeing. Also, if the SSE intrinsics are replaced with regular C, the
problem goes away as well.

I know the functions aren't being inlined because warning C4714 is
raised at warning level 4 (not level 1, as MSDN seems to suggest), and
because the generated assembly clearly shows function calls.
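
For anyone who wants to see C4714 without building everything at /W4, one option is to promote that single warning with a pragma. This is only a minimal sketch: the pragma is MSVC-specific, the non-MSVC branch is an assumption so the snippet compiles elsewhere, and add_one / sanity_check are hypothetical names that are not part of the repro below.

```cpp
#include <cassert>

#if defined(_MSC_VER)
// C4714: "function marked as __forceinline not inlined".
// Promote it to level 1 so it is reported at any /W level.
#pragma warning(1 : 4714)
#define FORCEINLINE __forceinline
#else
// Assumption: GCC/Clang fallback, only so the sketch builds elsewhere.
#define FORCEINLINE inline __attribute__((always_inline))
#endif

// Hypothetical helper that gives the attribute something to act on.
FORCEINLINE int add_one(int x)
{
    return x + 1;
}

// Small sanity check that exercises the helper.
inline void sanity_check()
{
    assert(add_one(41) == 42);
}
```

Whether the pragma changes where the warning is reported in every build configuration is not something this sketch verifies.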

In the code below, the template parameter to the inlinetest function
determines how many levels of function calls there are before the call
to vset, which contains the SSE intrinsics. On my system, only very
small values (around 5) are fully inlined; the exact threshold has
changed a few times and I'm not sure why. Replacing __forceinline with
__inline, or changing vset to not use intrinsics, inlines successfully
for much larger values (I've tested with 100).

If anyone has any input on this problem - something I'm doing wrong or
ways to proceed, I'd appreciate it very much. Thanks,

-stephen diverdi
-stephen.diverdi@gmail.com


#include <stdio.h>
#include <intrin.h>

#define FORCEINLINE __forceinline

// Fills four floats with x. Change "#if 1" to "#if 0" for the plain C version.
FORCEINLINE void vset ( float * const _a, float x )
{
#if 1
    // Note: writing through __m128 assumes _a is 16-byte aligned.
    __m128 &a = *( __m128 * )_a;
    a = _mm_set1_ps( x );
#else
    for( int i = 0 ; i < 4 ; ++i )
        _a[ i ] = x;
#endif
}

// N is the number of nested calls before vset is reached.
template < int N >
FORCEINLINE void inlinetest ( float * const a, int i )
{
    return inlinetest< N - 1 >( a, i + 1 );
}

template < >
FORCEINLINE void inlinetest< 1 > ( float * const a, int i )
{
    vset( a, ( float )i );
}

int main ()
{
    float a[ 4 ];
    inlinetest< 100 >( a, 0 );
    fprintf( stderr, "%f\n", a[ 0 ] );
    getc( stdin );
    return 0;
}
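
A side note on the repro: vset writes through a __m128 reference, which assumes the pointer is 16-byte aligned, and a plain float a[ 4 ] on the stack is not guaranteed to be. Below is a sketch of a variant that avoids that assumption by using _mm_storeu_ps. It is not a fix for the inlining behavior. The name vset_unaligned, the HAVE_SSE macro, and the non-MSVC branches are assumptions added for illustration.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#define FORCEINLINE __forceinline
#else
// Assumption: GCC/Clang fallback so the sketch builds outside MSVC.
#define FORCEINLINE inline __attribute__((always_inline))
#endif

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Fill four floats with x. _mm_storeu_ps has no alignment requirement,
// unlike a store through a __m128 reference.
FORCEINLINE void vset_unaligned(float* a, float x)
{
    assert(a != 0);
#if HAVE_SSE
    _mm_storeu_ps(a, _mm_set1_ps(x));
#else
    for (int i = 0; i < 4; ++i)
        a[i] = x;  // scalar fallback when SSE is unavailable
#endif
}

// N levels of nested calls before reaching vset_unaligned.
template <int N>
FORCEINLINE void inlinetest(float* a, int i)
{
    inlinetest<N - 1>(a, i + 1);
}

template <>
FORCEINLINE void inlinetest<1>(float* a, int i)
{
    vset_unaligned(a, static_cast<float>(i));
}
```

Calling inlinetest< 8 >( a, 0 ) on a four-element float array should leave every element equal to 7, since i is incremented once per level before the final call.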

Re: __forceinline and SSE bug in VS2005? by stephen

stephen
Tue Oct 16 18:23:33 PDT 2007

> The summary is it seems that when __forceinline
> is used on functions that contain SSE intrinsics, the functions are
> not inlined after a very small depth (something under 10 levels of
> functions). If __forceinline is replaced with __inline, the functions
> are all inlined. From what I've read, there is no way __forceinline
> should cause less inlining than __inline, but that's the behavior I'm
> seeing. Also, if the SSE intrinsics are replaced with regular C, the
> problem goes away as well.
>
As I've continued working on this problem, I've noticed another
interesting result - the problem is only present when whole program
optimization is turned on (and the warnings are generated in the link
step). If whole program optimization is turned off in the compiler
and linker settings, the functions are all inlined correctly.

-stephen diverdi
-stephen.diverdi@gmail.com


Re: __forceinline and SSE bug in VS2005? by Carl

Carl
Tue Oct 16 19:22:42 PDT 2007

stephen.diverdi@gmail.com wrote:
>> The summary is it seems that when __forceinline
>> is used on functions that contain SSE intrinsics, the functions are
>> not inlined after a very small depth (something under 10 levels of
>> functions). If __forceinline is replaced with __inline, the
>> functions are all inlined. From what I've read, there is no way
>> __forceinline should cause less inlining than __inline, but that's
>> the behavior I'm seeing. Also, if the SSE intrinsics are replaced
>> with regular C, the problem goes away as well.
>>
> As I've continued working on this problem, I've noticed another
> interesting result - the problem is only present when whole program
> optimization is turned on (and the warnings are generated in the link
> step). If whole program optimization is turned off in the compiler
> and linker settings, the functions are all inlined correctly.

Sounds like a bug. Have you filed a bug report on
http://connect.microsoft.com?

-cd



Re: __forceinline and SSE bug in VS2005? by stephen

stephen
Wed Oct 17 10:56:32 PDT 2007

>
> Sounds like a bug. Have you filed a bug report on http://connect.microsoft.com?
>
Yeah, I did at the end of the day yesterday.

It's too bad - with forceinline in the right places, I see a ~90%
speed increase, and with whole program optimization I get around 5%,
but if I use both, I don't get 95%, instead I get more like 50%
(because of the small functions that don't get inlined as a result of
the bug).

-stephen diverdi
-stephen.diverdi@gmail.com


Re: __forceinline and SSE bug in VS2005? by Alexander

Alexander
Wed Oct 17 11:16:20 PDT 2007

Your math is off :) - combining the two should net you a 90.5%
speed increase - 5% of 10% (100% - 90%) is 0.5%, totalling
90.5% when you combine the two. That's assuming the two are
independent, of course - you may get even less than 0.5% in reality,
but you may also get a bit more.

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnickolov@mvps.org
MVP VC FAQ: http://vcfaq.mvps.org
=====================================

<stephen.diverdi@gmail.com> wrote in message
news:1192643792.799919.306220@i13g2000prf.googlegroups.com...
>>
>> Sounds like a bug. Have you filed a bug report
>> on http://connect.microsoft.com?
>>
> Yeah, I did at the end of the day yesterday.
>
> It's too bad - with forceinline in the right places, I see a ~90%
> speed increase, and with whole program optimization I get around 5%,
> but if I use both, I don't get 95%, instead I get more like 50%
> (because of the small functions that don't get inlined as a result of
> the bug).
>
> -stephen diverdi
> -stephen.diverdi@gmail.com
>



Re: __forceinline and SSE bug in VS2005? by stephen

stephen
Wed Oct 17 15:49:21 PDT 2007

>
> Your math is off :) - combining the two should net you a 90.5%
> speed increase - 5% of 10% (100% - 90%) is 0.5%, totalling
> 90.5% when you combine the two. That's assuming the two are
> independent, of course - you may get even less than 0.5% in reality,
> but you may also get a bit more.
>
Ah, I realize now the confusion is that I was measuring performance in
frames per second. So for a program that gets 100fps, a 90% increase
is +90fps, and a 5% increase is +5fps, so combined that would be
195fps, or a 95% increase. By "90% speed increase", I meant nearly
twice as fast, not ten times as fast, lest I oversell the advantages
of inlining. =)
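
To make the two readings concrete, here is a small sketch that converts between a frame-rate gain and the per-frame time saved. The function names are hypothetical, and the numbers are just the example figures from this thread.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
inline double frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Frame rate after a percentage increase, e.g. 100 fps + 90% -> 190 fps.
inline double fps_after_gain(double fps, double gain_percent)
{
    return fps * (1.0 + gain_percent / 100.0);
}

// Percentage of per-frame time saved when going from old_fps to new_fps.
inline double time_saved_percent(double old_fps, double new_fps)
{
    return 100.0 * (1.0 - frame_ms(new_fps) / frame_ms(old_fps));
}
```

With these, going from 100fps to 190fps saves roughly 47% of the time per frame, while a 90% reduction in frame time would correspond to going from 100fps to 1000fps. That is the gap between "nearly twice as fast" and "ten times as fast".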

-stephen diverdi
-stephen.diverdi@gmail.com


Stephen Diverdi, why are you drawing 195 fps ? Isn't 24 fps enough ? by Jeff_Relf

Jeff_Relf
Wed Oct 17 22:11:25 PDT 2007

Stephen Diverdi, why are you drawing 195 fps ? Isn't 24 fps enough ?


Re: Stephen Diverdi, why are you drawing 195 fps ? Isn't 24 fps enough ? by Ben

Ben
Thu Oct 18 07:37:24 PDT 2007


"Jeff Relf" <Jeff_Relf@Yahoo.COM> wrote in message
news:Jeff_Relf_2007_Oct_17__10_11_PY@Cotse.NET...
> Stephen Diverdi, why are you drawing 195 fps ? Isn't 24 fps enough ?
>

Jeff, most people can detect any frame rate less than 30fps as being jerky.

Also, if a higher frame rate is attainable, then if only a lower frame rate
is needed you have spare cycles, which could translate to background work or
longer battery life.



Re: Stephen Diverdi, why are you drawing 195 fps ? Isn't 24 fps enough ? by Alexander

Alexander
Thu Oct 18 11:38:01 PDT 2007

Actually, for smooth gameplay you want 60fps on average.
This is the refresh rate on a typical LCD monitor. And it _is_
perceptible in highly dynamic games like first person shooters.
And let's not forget the other very important characteristic -
the minimum frame rate. The higher your average frame rate,
the lower the probability of the minimum frame rate dipping
below the playability threshold (30fps). Admittedly, 100fps
on average should be sufficient, but the less time the CPU takes
on graphics tasks the better for the overall game performance...

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnickolov@mvps.org
MVP VC FAQ: http://vcfaq.mvps.org
=====================================

"Ben Voigt [C++ MVP]" <rbv@nospam.nospam> wrote in message
news:%23Lv4gRZEIHA.5856@TK2MSFTNGP04.phx.gbl...
>
> "Jeff Relf" <Jeff_Relf@Yahoo.COM> wrote in message
> news:Jeff_Relf_2007_Oct_17__10_11_PY@Cotse.NET...
>> Stephen Diverdi, why are you drawing 195 fps ? Isn't 24 fps enough ?
>>
>
> Jeff, most people can detect any frame rate less than 30fps as being
> jerky.
>
> Also, if a higher frame rate is attainable, then if only a lower frame
> rate is needed you have spare cycles, which could translate to background
> work or longer battery life.
>



Re: Stephen Diverdi, why are you drawing 195 fps ? Isn't 24 fps enough ? by stephen

stephen
Thu Oct 18 13:41:13 PDT 2007

> > > Stephen Diverdi, why are you drawing 195 fps ? Isn't 24 fps enough ?

Oh, I'm not talking about running something at 195fps. First, I was
just using those numbers as an example. If you'd prefer, I could have
spoken about 6fps vs 11.5fps. Second, if a scene runs at 195fps,
that's great - it means you can run scenes of significantly higher
complexity at interactive rates. Finally, I'm just looking for a
performance measure here. If I had said the code runs in 10ms vs.
5ms, would you ask why I want my code to run in 5ms, and whether 10ms
isn't enough? No, because clearly, faster = lower CPU burden = better.

-stephen diverdi
-stephen.diverdi@gmail.com