GLSL fragment program limits on GMA950 (MacBook)

Member
Posts: 26
Joined: 2006.09
Post: #1
Hi,

I just wanted to try out some GLSL magic on my MacBook and I was super disappointed after the third texture fetch. It seems that the GMA950 supports only 3 texture reads in a fragment shader. Or at least in my case it reverts to software rendering (or something else that runs at about 2fps) if I try to read the texture a fourth time. Does someone know more about the GMA950 fragment program specs?

I tried to browse some docs on Intel's site but I could not get further than their volumetric PR BS.

[edit] According to this: http://homepage.mac.com/arekkusu/bugs/GLInfo.html I should be able to do 32 texture reads...
Moderator
Posts: 1,140
Joined: 2005.07
Post: #2
Yes, but do you use the results of your previous lookups as coordinates for your new lookups? If so, then you need to look at your texture indirection limits.
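
For illustration, a minimal sketch of a lookup that depends on an earlier one. The sampler names lut and image are made up for this example and are not from the shader under discussion.

Code:
// Hypothetical samplers, for illustration only.
uniform sampler2D lut;
uniform sampler2D image;
void main()
{
    // First lookup: the coordinate comes straight from a varying.
    vec4 offset = texture2D(lut, gl_TexCoord[0].st);

    // Second lookup: the coordinate is computed from the first lookup's
    // result, so it is a dependent read and counts against the
    // texture indirection limit.
    gl_FragColor = texture2D(image, gl_TexCoord[0].st + offset.xy);
}

A shader that only reads interpolated coordinates never chains lookups like this, but coordinates computed in the shader can still count, depending on how the instructions are ordered.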
Luminary
Posts: 5,143
Joined: 2002.04
Post: #3
FWIW, the GMA 950's limits are the same as the Radeon 9600-9800-X600, except that it only has 8 temporaries instead of 16.
Member
Posts: 26
Joined: 2006.09
Post: #4
The maximum number of texture indirections should be 4 on this chip. My fragment program looks like this (that double negation is of course stupid, but it should not break it :) ):

Code:
uniform sampler2D tex;
void main()
{
    vec2 tc0 = gl_TexCoord[0].st - vec2(-0.01,0);
    vec2 tc1 = gl_TexCoord[0].st - vec2(-0.02,0);
    vec2 tc2 = gl_TexCoord[0].st - vec2(-0.03,0);
    vec2 tc3 = gl_TexCoord[0].st - vec2(-0.04,0);

    vec4 t0 = texture2D(tex, tc0);
    vec4 t1 = texture2D(tex, tc1);
    vec4 t2 = texture2D(tex, tc2);
    vec4 t3 = texture2D(tex, tc3);

    gl_FragColor = (t0+t1+t2+t3)/4.0;
}

The vertex shader is just transforming the vertex and passing the texcoord through. If I remove the last lookup (or remove the usage of it in the last line) it'll work just fine. That does not look like indirect texture access to me; I'm just sampling the same tex multiple times.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #5
Tried binding the same texture to four different texture units and using four different uniforms? Perhaps there's some limit on the number of times you can sample from a single texture or something.

It could also be that you're running out of temporaries, I suppose. I don't think the register allocator in Apple's GLSL implementation does a very good job of reuse... can you write an ARB_fragment_program that's equivalent and runs in hardware?
Member
Posts: 26
Joined: 2006.09
Post: #6
I tried binding the same texture to 4 units, duplicating the texcoords in the vertex program, and then just accessing and adding up the texture values in the fragment shader. Same problem. Next I think I'll test how many textures I can use with the fixed function pipeline.
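
A sketch of what such a four-unit fragment shader might look like. The uniform names tex0 to tex3 are placeholders and this is not the exact code that was tested. It assumes the vertex shader writes the same coordinate to gl_TexCoord[0] through gl_TexCoord[3] and the application sets each sampler uniform to a different texture unit.

Code:
// Placeholder sampler names; each one is assumed to be bound to its own unit.
uniform sampler2D tex0;
uniform sampler2D tex1;
uniform sampler2D tex2;
uniform sampler2D tex3;
void main()
{
    // Every lookup reads its coordinate directly from its own varying,
    // so no coordinate math happens in the fragment shader at all.
    vec4 t0 = texture2D(tex0, gl_TexCoord[0].st);
    vec4 t1 = texture2D(tex1, gl_TexCoord[1].st);
    vec4 t2 = texture2D(tex2, gl_TexCoord[2].st);
    vec4 t3 = texture2D(tex3, gl_TexCoord[3].st);

    gl_FragColor = (t0 + t1 + t2 + t3) * 0.25;
}

With the offsets moved out of the fragment shader, a variant like this at least rules out the coordinate arithmetic as the cause.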

This finding correlates with what I see happening in the GLSLShowPiece. The shaders that access more textures (like life) are super slow, whereas the more procedural shaders run just fine. I tried adding one more texture read to the Earth sample and it went slow too. It looks like they supported just enough to enable diffuse+bump+shadow maps. Damn Intel, damn Apple!
Moderator
Posts: 1,140
Joined: 2005.07
Post: #7
You're already using 8 temporaries directly. It's also going to need a temporary behind the scenes for each of your 4 subtractions, and 3 more for your 4-way average (one for each addition). That's a total of 15, so it's easy to see how it could run out if it doesn't re-use temporaries efficiently enough. If you do this in assembly shaders, then you could probably find a way to have it fit into few enough temporaries to work in hardware.
Moderator
Posts: 1,140
Joined: 2005.07
Post: #8
I took the liberty of writing a fragment program in shader assembly that uses the fewest resources possible. I got it down to 2 temporaries, 4 parameters, 8 ALU instructions, and 4 texture lookups. If this won't work in hardware, then there's no hope.

Code:
!!ARBfp1.0

PARAM offset1 = {-0.01, 0, 0, 0};
PARAM offset2 = {-0.02, 0, 0, 0};
PARAM offset3 = {-0.03, 0, 0, 0};
PARAM offset4 = {-0.04, 0, 0, 0};

TEMP tc;
TEMP currentColor;
SUB tc, fragment.texcoord[0], offset1;
TEX currentColor, tc, texture[0], 2D;

SUB tc, fragment.texcoord[0], offset2;
TEX tc, tc, texture[0], 2D;
ADD currentColor, tc, currentColor;

SUB tc, fragment.texcoord[0], offset3;
TEX tc, tc, texture[0], 2D;
ADD currentColor, tc, currentColor;

SUB tc, fragment.texcoord[0], offset4;
TEX tc, tc, texture[0], 2D;
ADD currentColor, tc, currentColor;

MUL result.color, currentColor, 0.25;

END
Member
Posts: 26
Joined: 2006.09
Post: #9
Thanks akb825!

I tried the above code in Shader Builder using Basics.shdr as a basis, and it indeed runs like a slide show for me too. If I remove the last tex access it's fine again.

For some reason I could not get the second texture to work with texture environment, but I suspect my texenv skillz are a bit rusty.

I knew the HW was bad when I got this machine, but I don't really respect the fact that they are lying in the driver's caps that the HW is capable of doing more than it actually can. Actually... it really pissed me off.

I think I'll close this thread with the famous words of Mark Rein: "Intel is killing PC gaming" :)
Oldtimer
Posts: 832
Joined: 2002.09
Post: #10
FWIW, I've filed a Radar bug on this under "Performance" and it got upclassed to "Serious bug" right away.
Moderator
Posts: 522
Joined: 2002.04
Post: #11
Thanks for filing it and for the update! But it makes me feel stupid for not filing it last December, when I was apparently running into this same problem while making motion blur effects in Unity. Hopefully I'll learn :)

-Jon
Sage
Posts: 1,199
Joined: 2004.10
Post: #12
Every time I see a MacBook I wish I could have bought it instead of my MBP, being cheaper, smaller and better looking. But then, I remember: GMA950. I'm sorry for your pain, here.
Sage
Posts: 1,232
Joined: 2002.10
Post: #13
Bumping this thread now that Leopard is out.

First, to understand what's going on here you need to read the ARB_fragment_program spec, in particular
Issue (24) What is a texture indirection, and how is it counted?

Now that you know what an indirection is, you can see how to stay within the limit-- calculate texcoords in a batch, then do a bunch of samples in a batch. Any time you use a temporary variable (as opposed to a varying) as a texture coordinate, you are entering another indirection phase, and GMA 950 (and ATI cards) only support four phases.
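
To make the counting concrete, here is a rough sketch contrasting the two orderings for a 4-tap filter. The function names interleaved and batched are made up for this example. The phase counts in the comments follow my reading of Issue 24 and assume the compiler translates the GLSL more or less literally, which is not guaranteed.

Code:
uniform sampler2D tex;

// Interleaved: tc is rewritten right before each lookup that reads it.
// Translated literally, every lookup uses a temporary written in the
// current phase, so each one starts a new phase: 1 + 4 = 5 phases.
vec4 interleaved(vec2 base)
{
    vec2 tc;
    vec4 sum;
    tc = base + vec2(0.01, 0.0);  sum  = texture2D(tex, tc);
    tc = base + vec2(0.02, 0.0);  sum += texture2D(tex, tc);
    tc = base + vec2(0.03, 0.0);  sum += texture2D(tex, tc);
    tc = base + vec2(0.04, 0.0);  sum += texture2D(tex, tc);
    return sum;
}

// Batched: all coordinates first, then all lookups.
// Only the first lookup starts a new phase: 2 phases in total.
vec4 batched(vec2 base)
{
    vec2 tc0 = base + vec2(0.01, 0.0);
    vec2 tc1 = base + vec2(0.02, 0.0);
    vec2 tc2 = base + vec2(0.03, 0.0);
    vec2 tc3 = base + vec2(0.04, 0.0);
    return texture2D(tex, tc0) + texture2D(tex, tc1)
         + texture2D(tex, tc2) + texture2D(tex, tc3);
}

void main()
{
    gl_FragColor = batched(gl_TexCoord[0].st) * 0.25;
}

The interleaved form has the same shape as the two-temporary assembly program posted earlier in this thread (SUB, TEX, SUB, TEX, ...), which by this count would need five phases and so would not fit in four.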

So you could get more than 3 lookups on a GMA 950 in Tiger if you used ARB_fragment_program, carefully structuring your program into indirection stages. But, it turns out that with GLSL ARB_fragment_shader, there was a bug which resulted in texture samples being counted as indirections when they shouldn't be.

The good news is that this is fixed in Leopard. Now on GMA 950 you can write a shader like this:

Code:
// 1D motion blur using 31-sample box filter

// this shader is carefully constructed to run on limited hardware.
// it will just barely fit on GMA 950-- 62/64 ALU, 31/32 TEX, 4/4 indirections.

varying vec2 texcoord;
uniform sampler2DRect tex;

void main (void)

{
    vec2 add1, add2, add3, add4, add5;
    vec2 sub1, sub2, sub3, sub4, sub5;
    vec4 samp;

    // indirection phase 1 (always have 1 phase)
    add1 = texcoord + vec2(1, 0);
    add2 = texcoord + vec2(2, 0);
    add3 = texcoord + vec2(3, 0);
    add4 = texcoord + vec2(4, 0);
    add5 = texcoord + vec2(5, 0);
    sub1 = texcoord - vec2(1, 0);
    sub2 = texcoord - vec2(2, 0);
    sub3 = texcoord - vec2(3, 0);
    sub4 = texcoord - vec2(4, 0);
    sub5 = texcoord - vec2(5, 0);
    
    // indirection phase 2 -- use temps as coords
    samp = texture2DRect(tex, texcoord);
    samp += texture2DRect(tex, add1);
    samp += texture2DRect(tex, add2);
    samp += texture2DRect(tex, add3);
    samp += texture2DRect(tex, add4);
    samp += texture2DRect(tex, add5);
    samp += texture2DRect(tex, sub1);
    samp += texture2DRect(tex, sub2);
    samp += texture2DRect(tex, sub3);
    samp += texture2DRect(tex, sub4);
    samp += texture2DRect(tex, sub5);

    add1 = texcoord + vec2( 6, 0);
    add2 = texcoord + vec2( 7, 0);
    add3 = texcoord + vec2( 8, 0);
    add4 = texcoord + vec2( 9, 0);
    add5 = texcoord + vec2(10, 0);
    sub1 = texcoord - vec2( 6, 0);
    sub2 = texcoord - vec2( 7, 0);
    sub3 = texcoord - vec2( 8, 0);
    sub4 = texcoord - vec2( 9, 0);
    sub5 = texcoord - vec2(10, 0);

    // indirection phase 3 -- use updated temps as coords
    samp += texture2DRect(tex, add1);
    samp += texture2DRect(tex, add2);
    samp += texture2DRect(tex, add3);
    samp += texture2DRect(tex, add4);
    samp += texture2DRect(tex, add5);
    samp += texture2DRect(tex, sub1);
    samp += texture2DRect(tex, sub2);
    samp += texture2DRect(tex, sub3);
    samp += texture2DRect(tex, sub4);
    samp += texture2DRect(tex, sub5);

    add1 = texcoord + vec2(11, 0);
    add2 = texcoord + vec2(12, 0);
    add3 = texcoord + vec2(13, 0);
    add4 = texcoord + vec2(14, 0);
    add5 = texcoord + vec2(15, 0);
    sub1 = texcoord - vec2(11, 0);
    sub2 = texcoord - vec2(12, 0);
    sub3 = texcoord - vec2(13, 0);
    sub4 = texcoord - vec2(14, 0);
    sub5 = texcoord - vec2(15, 0);

    // indirection phase 4 -- use updated temps as coords
    samp += texture2DRect(tex, add1);
    samp += texture2DRect(tex, add2);
    samp += texture2DRect(tex, add3);
    samp += texture2DRect(tex, add4);
    samp += texture2DRect(tex, add5);
    samp += texture2DRect(tex, sub1);
    samp += texture2DRect(tex, sub2);
    samp += texture2DRect(tex, sub3);
    samp += texture2DRect(tex, sub4);
    samp += texture2DRect(tex, sub5);

    gl_FragColor = samp * (1.0 / 31.0);
}

And it will be hardware accelerated.

Note that you still have to be careful about how you write your shader-- follow the same guidelines as for ARB_fragment_program and group texture lookups into indirection phases. Also, as noted in the spec, some additional operations such as swizzling can cause an indirection because they implicitly use a temporary variable.
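
As a hedged illustration of the swizzle case: whether the second lookup below actually costs an extra indirection depends on the driver and hardware, so treat the comments as an assumption rather than a measured result.

Code:
uniform sampler2D tex;
void main()
{
    // Coordinate taken from the varying in its natural component order.
    vec4 a = texture2D(tex, gl_TexCoord[0].st);

    // Reordered components: the compiler may have to move the swizzled
    // value into a temporary before the lookup, and a lookup that reads
    // a freshly written temporary can start a new indirection phase.
    vec4 b = texture2D(tex, gl_TexCoord[0].ts);

    gl_FragColor = 0.5 * (a + b);
}

If a shader is close to the limit, one option is to do the reordering in the vertex shader and pass the result in a separate varying, so the fragment shader can use it unmodified.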