## Vector (Normal) Map blending operations?

Member
Posts: 469
Joined: 2002.10
Post: #1
So, I'm toying with the idea of using a big normal map to hold a 2D vector force field. I was thinking of just storing the x/y components of each force vector in the 8 bits of R and G, with the magnitude in the 8 bits of A (OpenGL, of course).

My question: Can I use this method to "draw" my vector forces into the field? And if so, what blending modes do I use to ensure that the vectors are added correctly?

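For example:
If I use 8 bits for x, then I'd have a [0, 255] value representing a component force in [-127, 128] (stored value = force + 127). If I add two forces of -8 each with plain additive blending, I get 119 + 119 = 238, which is wrong: it decodes to +111. I expect -16, which is stored as 111.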
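To make the arithmetic concrete, here is a minimal Python sketch of the biased encoding described above. The bias of 127 is inferred from the numbers in the post (-8 stored as 119); the function names are hypothetical and only illustrate why adding the stored bytes directly double-counts the bias.

```python
BIAS = 127  # assumed from the post: stored byte = force + 127


def encode(force: int) -> int:
    """Map a signed force component in [-127, 128] to an unsigned byte."""
    if not -127 <= force <= 128:
        raise ValueError("force out of range")
    return force + BIAS


def decode(byte: int) -> int:
    """Map a stored byte back to the signed force component."""
    return byte - BIAS


def naive_add(a: int, b: int) -> int:
    """What plain additive blending does: add the stored bytes and clamp to 255."""
    return min(a + b, 255)


def biased_add(a: int, b: int) -> int:
    """Add the underlying signed values: the bias is counted twice in a + b,
    so it has to be subtracted once before clamping to [0, 255]."""
    return max(0, min(a + b - BIAS, 255))


a = b = encode(-8)                   # both forces are stored as 119
print(decode(naive_add(a, b)))       # the naive sum decodes to the wrong value
print(decode(biased_add(a, b)))      # the bias-corrected sum decodes to -16
```

The correction step (subtracting the bias once per addition) is the part that fixed-function additive blending cannot express on its own, which is why the question of blend modes comes up.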

I need to do this operation fast. I want to draw hundreds (if not thousands) of force areas into the field every frame. Is there any way to get higher precision?

Oh yeah, and here's a demo I whipped up with a much simpler vector field that doesn't add many forces: SwarmDemo

---Kelvin--
15.4" MacBook Pro revA
1.83GHz/2GB/250GB
Member
Posts: 469
Joined: 2002.10
Post: #2
I have a feeling I'm going to need to use Fragment Shaders.
Am I correct in assuming that all I need is OpenGL 1.5 (for ARB_fragment_shader)?
i.e. if I limit my target platform to 1.5 capable machines, it'll be fine? Any negatives?

Oldtimer
Posts: 834
Joined: 2002.09
Post: #3
If I understand you correctly, you want to piggyback on the GPU to draw vector field "sprites" that add to each other in a framebuffer, and then read that back and use it to do maths?

My suggestion would be to skip that, since the readback will be costly, and you can probably do it fast enough on the CPU with the Accelerate framework.

OTOH, this would probably be well suited for a pixel shader, come to think of it. That would allow you to do signed operations. You would still draw your "textured" quads, then let the pixel shader accumulate them into another texture, which you could probably read back at a decent speed if you use some extensions.

[Edit: you beat me to the fragment program suggestion.]
Member
Posts: 469
Joined: 2002.10
Post: #4
Basically right. I want to use the GPU to sum all my forces per-pixel. And yes, I'm thinking I'll have to use shaders and render to texture for read-back.

As for performance: is it really faster to do thousands of blits (64x64 sprites) over a 1024x1024 area on the CPU than on the GPU? I was thinking I could just load one texture, draw however many thousand quads I need, then read back. Doing the same work on the CPU seems costly.
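For reference, the per-pixel summation being discussed can be sketched on the CPU as below. This is only an illustration in plain Python, not the vImage or GPU version: the field layout (two lists of rows, one per component) and the names `make_field` and `stamp` are assumptions made for the example.

```python
def make_field(width: int, height: int):
    """Create a zeroed force field as two row-major grids: (fx, fy)."""
    fx = [[0.0] * width for _ in range(height)]
    fy = [[0.0] * width for _ in range(height)]
    return fx, fy


def stamp(field, sprite, x0: int, y0: int) -> None:
    """Add a sprite of (vx, vy) pairs into the field with its top-left
    corner at (x0, y0). Texels that fall outside the field are skipped."""
    fx, fy = field
    height, width = len(fx), len(fx[0])
    for sy, row in enumerate(sprite):
        y = y0 + sy
        if not 0 <= y < height:
            continue
        for sx, (vx, vy) in enumerate(row):
            x = x0 + sx
            if 0 <= x < width:
                fx[y][x] += vx  # signed values, so opposing forces cancel
                fy[y][x] += vy


field = make_field(4, 4)
sprite = [[(-8.0, 1.0)] * 2 for _ in range(2)]  # a 2x2 force area
stamp(field, sprite, 1, 1)
stamp(field, sprite, 1, 1)                      # overlapping forces sum
print(field[0][1][1], field[1][1][1])
```

Because the accumulator holds signed floats, no bias correction is needed here; the cost is the inner loop, which runs once per covered texel for every force area.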

Luminary
Posts: 5,143
Joined: 2002.04
Post: #5
GPU readbacks are not fast, certainly. If you don't need the results immediately (i.e. before rendering the next frame), you can do them asynchronously, however, which should give adequate performance.

To use GLSL fragment shaders, you need a Mac which exports the ARB_shader_objects, ARB_fragment_shader and ARB_shading_language_100 extensions, or OpenGL 2.0. To use ARB fragment programs, you need a Mac which exports the ARB_fragment_program extension. The former requires Mac OS X 10.4.3 or later on a Radeon 9500+, GeForce 5200+, or GMA 910+. The latter requires Mac OS X 10.3.4 or so on a Radeon 9500+, GeForce 5200+ or GMA 910+. Either way, you will find OS releases and hardware which do not work well. Of particular concern is that the PowerPC OpenGL drivers have not been updated since 10.4.3, so GLSL is possibly unusably buggy for PowerPC machines on 10.4.x.
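The extension requirements above can be turned into a small capability check. The sketch below is hypothetical: it assumes you have already fetched the space-separated extension string and the version string from the driver (what `glGetString(GL_EXTENSIONS)` and `glGetString(GL_VERSION)` return), where extension names carry a `GL_` prefix. The function name `shader_path` and its return values are made up for illustration.

```python
# Extension names as they appear in the GL_EXTENSIONS string (with GL_ prefix).
GLSL_EXTENSIONS = {
    "GL_ARB_shader_objects",
    "GL_ARB_fragment_shader",
    "GL_ARB_shading_language_100",
}
ARB_FP_EXTENSIONS = {"GL_ARB_fragment_program"}


def shader_path(extensions: str, gl_version: str) -> str:
    """Pick a shader path from the driver's extension and version strings.

    Returns "glsl", "arb_fragment_program", or "none" (hypothetical labels).
    """
    available = set(extensions.split())
    # The version string starts with "<major>.<minor>", e.g. "1.5 ..." or "2.0 ...".
    major = int(gl_version.split(".")[0])
    if major >= 2 or GLSL_EXTENSIONS <= available:
        return "glsl"
    if ARB_FP_EXTENSIONS <= available:
        return "arb_fragment_program"
    return "none"


print(shader_path("GL_ARB_fragment_program GL_ARB_multitexture", "1.5 vendor"))
```

Passing this check only says the driver advertises the feature; as noted above, some OS releases and hardware advertise it and still do not work well.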
Member
Posts: 469
Joined: 2002.10
Post: #6
Welps, after some preliminary tests with Accelerate.framework, I think a fragment program will probably be faster. I whipped up this test program that does the vector field addition with vImage. I'd appreciate some numbers from anyone with a few minutes to spare. (My iMac G5 and MBP numbers are included for reference.)
-> Test program <-

Right now, with 1500 32x32 vector force areas, I can get 60 "fps", but this is only the addition! I'd imagine drawing 1500 textured quads would run a lot faster than 60 fps regardless of the size, even with the render-to-texture and readback. I'm thinking I could probably push it to upwards of 5k quads drawn on the GPU, but I don't have hard numbers. Can anyone tell me if this sounds right?
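To put the two workloads side by side, here is a rough count of texel additions, ignoring overlap, clipping and memory effects. The 64x64 size for the 5k-quad case is an assumption taken from the sprite size mentioned earlier in the thread, and `texel_adds` is a hypothetical helper.

```python
def texel_adds(stamps: int, size: int, fps: int = 60):
    """Return (adds per frame, adds per second) for square force areas
    of `size` x `size` texels, counting one add per covered texel."""
    per_frame = stamps * size * size
    return per_frame, per_frame * fps


cpu_frame, cpu_second = texel_adds(1500, 32)   # the vImage test above
gpu_frame, gpu_second = texel_adds(5000, 64)   # hoped-for load, assumed 64x64
print(cpu_frame, cpu_second)
print(gpu_frame, gpu_second)
print(gpu_frame / cpu_frame)                   # how much more work per frame
```

These are counts of work, not predictions of frame rate; whether the GPU sustains the larger figure depends on fill rate and on how the readback is handled.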

Luminary
Posts: 5,143
Joined: 2002.04
Post: #7
As I said, it depends if you need the results of the previous frame to calculate the next, or whether you can wait a frame or two to get the results back.

If you need the results immediately, the CPU and GPU can't work in parallel, so you can expect poor performance. It might still be better than the CPU alone though!
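The "wait a frame or two" option amounts to consuming each result a fixed number of frames after it was requested. The sketch below models only that bookkeeping in plain Python; it does not touch OpenGL, and the class name `DelayedReadback` is hypothetical.

```python
from collections import deque


class DelayedReadback:
    """Hands back each submitted result `latency` frames after submission."""

    def __init__(self, latency: int):
        self.latency = latency
        self.pending = deque()

    def submit(self, result):
        """Queue this frame's result and return the one submitted
        `latency` frames ago, or None while the pipeline is still filling."""
        self.pending.append(result)
        if len(self.pending) > self.latency:
            return self.pending.popleft()
        return None


readback = DelayedReadback(latency=2)
for frame in range(5):
    ready = readback.submit(f"field for frame {frame}")
    print(frame, ready)
```

The simulation step for frame N then uses the field from frame N minus the latency, which is only acceptable if the forces change slowly relative to the frame rate.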
Member
Posts: 469
Joined: 2002.10
Post: #8
Does readback performance depend on the area being read back? Because I really don't need to read back the whole thing. My algorithm only requires that I read back 2 to 5 points per particle. So if I can get away with only reading back ~6k pixels (vs 1M), that'd be fine.

I'm thinking I could probably just compute the points I need in a second-pass program on the GPU and dump them by index into a vertex buffer.

If I go CPU, I'm going to get abysmal performance if I use the area sizes I want. *sigh* I may have to get creative here...

Luminary
Posts: 5,143
Joined: 2002.04
Post: #9
The area being read is less important than you'd hope. If you can read one rectangle of 6000 pixels it will be faster than reading the whole screen, but if you have to do several reads to reduce the volume, it may well be slower.
Member
Posts: 469
Joined: 2002.10
Post: #10
I was thinking it might be feasible to just sample the 6k points into a smaller array and read that back once. Since the state of all the particles is static while doing the math, I could read back, in one go, all the points needed to do the calculation for every particle. This will require generating all the target locations in the GPU program, but from what I can tell, this is feasible, right?
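The gather step described here can be sketched as follows. This is a CPU stand-in for what the second GPU pass would do, with an assumed field layout (a list of rows) and a hypothetical `gather` function; clamping out-of-range points to the field edge is also an assumption.

```python
def gather(field, points):
    """Copy the field values at the given (x, y) points into one flat list,
    in the same order as `points`. Out-of-range points clamp to the edge."""
    height, width = len(field), len(field[0])
    out = []
    for x, y in points:
        cx = min(max(x, 0), width - 1)
        cy = min(max(y, 0), height - 1)
        out.append(field[cy][cx])
    return out


# A tiny 4x3 field whose value at (x, y) is simply the pair (x, y).
field = [[(x, y) for x in range(4)] for y in range(3)]
samples = gather(field, [(0, 0), (3, 2), (9, -1)])
print(samples)
```

The result is one contiguous array whose length is the number of sample points, so the single read afterwards covers roughly 6k values instead of the full 1024x1024 field.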

Luminary
Posts: 5,143
Joined: 2002.04
Post: #11
Yes, that's probably feasible.