Real Time Simulation Of Fluids

Member
Posts: 142
Joined: 2002.11
Post: #1
Hello, just wanted to drop by and show off a video of what I've been working on for the past few weeks. Inspired by Plasma Pong, I decided to investigate real-time fluid dynamics. The video below is a short recording of my program in action. The mouse is moving around, jostling the fluid about, and a few little balls are circling around, emitting ink into the fluid. Everything is being run on the GPU using FBOs and GLSL.

Click here for video

I would post the actual executable, but it's very shader intensive and uses GL_RGBA_FLOAT16_APPLE, so it won't run on a lot of systems, and still needs mechanisms to detect when it can and cannot run, and in what situations it can make a compromise.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #2
Looks awesome :)

I'd be really interested to see the source for this :D
DoG
Moderator
Posts: 869
Joined: 2003.01
Post: #3
Ditto, looks quite spiffy.
Member
Posts: 142
Joined: 2002.11
Post: #4
Does anybody know a place I can go to get a comprehensive view of how various cards perform using fragment shaders? Graphics card reviews tend to only compare direct competitors -- I want to know how graphics cards from NVidia and ATI perform on shaders from the low end all the way to the high end.
Sage
Posts: 1,232
Joined: 2002.10
Post: #5
Define "comprehensive". This is an infinite domain.
Member
Posts: 142
Joined: 2002.11
Post: #6
arekkusu Wrote: Define "comprehensive". This is an infinite domain.

With regards to ATI, perhaps from the Radeon 9600 to the X1950. I'm not as familiar with NVidia's offerings, but maybe the 6 to 8 series?
Sage
Posts: 1,232
Joined: 2002.10
Post: #7
What I meant is, what do you want to know from the infinite domain of "how various cards perform using fragment shaders"? Which shaders? You can break this down into how every possible language construct translates into native GPU instructions, and then build an infinite number of shader permutations from that. Various cards will perform differently for different instructions (simple math, swizzles, transcendentals, looping, texture fetches, framebuffer writes). The various cards also have different max limits for different types of instructions, which, when exceeded, will cause the shader to fall back to software rasterization (slow).

In very gross terms, you can break the cards into three sets:

Pretty good: [Radeon X1600-X1900, Geforce 6600-7800]
Not as good: [GMA 950, Radeon 9550-X850, Geforce 5200]
Can't use fragment shaders: [Rage128-Radeon 9200, Geforce2MX-Geforce 4Ti]

If you want to look at raw capabilities, check this table for rows like MAX_PROGRAM_INSTRUCTIONS_ARB, MAX_PROGRAM_NATIVE_TEMPORARIES_ARB, MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB, etc. Speaking in gross generalities:
The GMA 950 is roughly equivalent to a Radeon 9600, and
Nvidia cards can generally handle larger, more complex shaders than their same-timeframe ATI counterparts.

If you want to know about performance, narrow your search criteria to some specific shaders and start benchmarking.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #8
You can also ask other meaningful questions like "can it do dynamic branching and looping" (Radeon X1600+, GeForce 6600+ at a penalty) and "can it do dynamic indexing of uniform arrays" (GeForce 8600+ maybe?)
Member
Posts: 142
Joined: 2002.11
Post: #9
OneSadCookie Wrote: You can also ask other meaningful questions like "can it do dynamic branching and looping" (Radeon X1600+, GeForce 6600+ at a penalty) and "can it do dynamic indexing of uniform arrays" (GeForce 8600+ maybe?)

I tested my program on a Radeon X1600 mobility today (MacBook Pro). It ran 10x slower than on my X1900XT (Mac Pro) :(
Luminary
Posts: 5,143
Joined: 2002.04
Post: #10
I guess that's the right ballpark if you're relying heavily on rapidly-changing functions for loop counters and branch conditions.

I suspect you may cry if you try to run it on a GeForce < 8600 ;)
Member
Posts: 142
Joined: 2002.11
Post: #11
OneSadCookie Wrote: I guess that's the right ballpark if you're relying heavily on rapidly-changing functions for loop counters and branch conditions.

I suspect you may cry if you try to run it on a GeForce < 8600 ;)

There are no branches or loops, actually.
Sage
Posts: 1,232
Joined: 2002.10
Post: #12
Without looking at the shader it's hard to say what the bottleneck is. You mentioned you're using GL_RGBA_FLOAT16_APPLE; maybe you are texture fetch limited.
Member
Posts: 142
Joined: 2002.11
Post: #13
arekkusu Wrote: Without looking at the shader it's hard to say what the bottleneck is. You mentioned you're using GL_RGBA_FLOAT16_APPLE; maybe you are texture fetch limited.

Code:
/*
    Jacobi iteration
*/

uniform sampler2D x; // x vector (Ax = b)
uniform sampler2D b; // b vector (Ax = b)
uniform float alpha;
uniform float r_beta; // reciprocal of beta
uniform vec2 units_per_pixel;

void main(void) {

    vec2 coords = gl_TexCoord[0].xy;
        
    vec4 xL = texture2D(x, coords - vec2(units_per_pixel.x, 0.0));
    vec4 xR = texture2D(x, coords + vec2(units_per_pixel.x, 0.0));
    vec4 xB = texture2D(x, coords - vec2(0.0, units_per_pixel.y));
    vec4 xT = texture2D(x, coords + vec2(0.0, units_per_pixel.y));
    
    // b sample, from the center
    vec4 bC = texture2D(b, coords);
    
    gl_FragColor = (xL + xR + xB + xT + alpha * bC) * r_beta;

}

This is where the main action is going on. This fragment shader gets executed about 6.5 million times per frame (512x512 pixels times 25 iterations). Obviously any way to make it go faster would greatly speed up the program. Any ideas? I've looked at it pretty thoroughly and I can't think of any way to speed it up. Other than the texture lookups, which can't be avoided, it's not really doing very much.
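
For anyone who wants to check results off the GPU, here is a minimal CPU sketch of what one pass of the shader above computes, written in Python rather than GLSL so it can run standalone. It works on scalar grids instead of vec4 texels. The names `jacobi_step` and `solve` and the clamp-to-edge border handling are assumptions for illustration; the real border behavior depends on the texture wrap mode, which isn't stated in this thread.

```python
def jacobi_step(x, b, alpha, r_beta):
    """Return a new grid after one Jacobi sweep (same arithmetic as the shader)."""
    h, w = len(x), len(x[0])

    def at(i, j):
        # Assumption: clamp-to-edge lookups at the borders.
        return x[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    # Every output cell reads only the OLD grid, like rendering from one
    # texture into another (ping-pong) on the GPU.
    return [
        [
            (at(i, j - 1) + at(i, j + 1) + at(i - 1, j) + at(i + 1, j)
             + alpha * b[i][j]) * r_beta
            for j in range(w)
        ]
        for i in range(h)
    ]


def solve(x, b, alpha, r_beta, iterations=25):
    """Apply jacobi_step repeatedly; 25 matches the iteration count quoted above."""
    for _ in range(iterations):
        x = jacobi_step(x, b, alpha, r_beta)
    return x
```

Each call to `jacobi_step` corresponds to one full-screen pass, so `solve` with 25 iterations on a 512x512 grid touches 512 * 512 * 25 = 6,553,600 cells, which is the per-frame shader invocation count.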
Sage
Posts: 1,232
Joined: 2002.10
Post: #14
It looks likely that you're fetch limited.

As an experiment, try reducing your texture to 1x1 and see what happens to your frame rate. You're taking the same number of samples, but they should all come from the HW's texture cache.

You could also experiment with 512x512, RGBA8 and see what impact that has (ignore wrong results-- just see if the reduced fetch bandwidth makes a speed difference.)

And of course, the min/mag filter directly impacts the number of fetches required (although on X1600, FLOAT16 will always be nearest filtered, this isn't true of other GPUs.)
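
To put a rough number on the fetch traffic, here is a back-of-the-envelope sketch in Python. It assumes 5 fetches per fragment (the four neighbors plus the center sample in the posted shader), that both textures use the same format, and that nothing is served from the texture cache, so it is an upper bound rather than a measurement. The function name is hypothetical.

```python
def fetched_bytes_per_frame(width, height, iterations,
                            fetches_per_fragment, bytes_per_texel):
    """Upper bound on texel bytes requested per frame, ignoring any caching."""
    return width * height * iterations * fetches_per_fragment * bytes_per_texel


RGBA_FLOAT16_BYTES = 4 * 2  # four channels, 16 bits each
RGBA8_BYTES = 4 * 1         # four channels, 8 bits each

float16_bytes = fetched_bytes_per_frame(512, 512, 25, 5, RGBA_FLOAT16_BYTES)
rgba8_bytes = fetched_bytes_per_frame(512, 512, 25, 5, RGBA8_BYTES)

print(float16_bytes / 2**20, "MiB per frame with float16 texels")
print(rgba8_bytes / 2**20, "MiB per frame with RGBA8 texels")
```

Neighboring fragments share most of their samples, so a working texture cache should bring the real traffic well below these figures. The point is only that halving the texel size halves the worst case, which is what the RGBA8 experiment probes.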
Member
Posts: 142
Joined: 2002.11
Post: #15
arekkusu Wrote: It looks likely that you're fetch limited.

As an experiment, try reducing your texture to 1x1 and see what happens to your frame rate. You're taking the same number of samples, but they should all come from the HW's texture cache.

You could also experiment with 512x512, RGBA8 and see what impact that has (ignore wrong results-- just see if the reduced fetch bandwidth makes a speed difference.)

And of course, the min/mag filter directly impacts the number of fetches required (although on X1600, FLOAT16 will always be nearest filtered, this isn't true of other GPUs.)

Yeah, I noticed that FLOAT16 textures are always nearest filtered on my card (an X1900). Being able to use linear filtering would actually have simplified my code elsewhere (in a velocity backtrace).

Switching from GL_RGBA_FLOAT16_APPLE to RGBA8, my framerate goes up from 92 fps to 156 fps.

A quick calculation, under the assumption that a float16 texel takes twice as long to fetch as an RGBA8 texel, suggests that I'm spending about 5/6 of my time fetching textures ...
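
Spelled out, the estimate splits the frame time into a fetch part and everything else, and assumes only the fetch part shrinks when the texel format changes. A small Python sketch of that arithmetic follows. The function name and the `speedup` parameter are hypothetical, and the factor of 2 is the assumption stated above, not a measured value.

```python
def fetch_fraction(fps_slow, fps_fast, speedup=2.0):
    """Fraction of the slow frame time spent fetching, under a two-part model.

    Model: t_slow = other + fetch
           t_fast = other + fetch / speedup
    """
    t_slow = 1.0 / fps_slow
    t_fast = 1.0 / fps_fast
    # Subtracting the two model equations isolates the fetch time.
    fetch = (t_slow - t_fast) / (1.0 - 1.0 / speedup)
    return fetch / t_slow


print(fetch_fraction(92.0, 156.0))
```

With 92 fps and 156 fps this comes out to roughly 0.82, in line with the 5/6 figure. If the true cost ratio between the formats is smaller than 2, the same model gives an even larger fetch share.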