iDevGames Forums
Cross-platform vectorization. - Printable Version

Cross-platform vectorization. - Skorche - Feb 7, 2007 03:04 PM

Right now, my physics library implements its own 2D vector operations. In short, what are the options for vectorizing this code in a cross-platform manner? I'm not even sure what to Google for at this point.

Cross-platform vectorization. - OneSadCookie - Feb 7, 2007 03:38 PM

Crap all.

If you don't mind the license (which you do) and using C++ (which you do), then there's MacSTL, which is, contrary to the name, cross-platform.

If you don't mind being Mac-only (which you do), there's some Apple-provided stuff in the Accelerate framework.

Otherwise, I'm not aware of anything.

Not, I suspect, that vectorizing Chipmunk would work particularly well, anyway.

Cross-platform vectorization. - Skorche - Feb 7, 2007 04:36 PM

Kinda what I figured.

I know that it spends a large percentage of time doing vector manipulation when solving impulses. Though I'm not sure how much of a benefit you would get from vectorizing 2D operations.

Maybe I'll look into vectorizing the OS X version later then. I guess I'm happy enough with the performance now.

Cross-platform vectorization. - TomorrowPlusX - Feb 8, 2007 06:23 AM

I have to say I'm very impressed with your performance right now. Running your box-stacking demos, Activity Monitor showed the CPU at damn near zero. I'm very impressed.

Cross-platform vectorization. - Skorche - Feb 8, 2007 11:11 AM

Yeah, I'm pretty proud of it. It's mostly based on Erin Catto's contact persistence idea, but is implemented from the ground up to store the contacts in a different way. I've also been tuning my spatial hashing code for about a year now.

Running Shark on the box demo again, the main impulse solver function uses 30% of the CPU time, and the functions that apply the impulses are using almost. fminf() and fmaxf() are using 12% (plus 6% more for dyld_stub_fmaxf()). I declared the impulse application functions as extern inline, but they've never seemed to work. What's the deal? Also, aren't fminf() and fmaxf() supposed to be built-in functions? I suppose I could just write my own and inline them.

Cross-platform vectorization. - Skorche - Feb 8, 2007 11:32 AM

Wow.

I created my own static inline versions of the min/max functions, which shaved off 3%-ish. Not bad for a 1-minute fix: ≈6600ms to ≈6400ms for 5000 frames of the box demo.
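
A minimal sketch of what such static inline replacements might look like. The names are hypothetical and not taken from the library. Note that these simple versions do not treat NaN the way the standard fminf()/fmaxf() do, so they are only a drop-in replacement when the inputs are known to be ordinary numbers.

```c
/* Hypothetical helpers, invented for illustration. */
static inline float vec_fminf(float a, float b)
{
    return (a < b) ? a : b;
}

static inline float vec_fmaxf(float a, float b)
{
    return (a > b) ? a : b;
}

/* Clamp f to the range [lo, hi] using the helpers above. */
static inline float vec_fclampf(float f, float lo, float hi)
{
    return vec_fminf(vec_fmaxf(f, lo), hi);
}
```

Because the definitions are visible at every call site, the compiler can inline them and no call through a dynamic-linker stub is needed.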

Then I moved the impulse functions into the header as static inline functions. It jumped from a 3% to a 24% improvement! ≈6600ms -> ≈5000ms for 5000 frames. Barely more than 1ms per frame!

I suppose that I could just leave the impulse-applying functions in the header, since they are only two lines long, but I'd like to know why the extern inline declaration didn't work.

Cross-platform vectorization. - OneSadCookie - Feb 8, 2007 12:45 PM

Functions can only be inlined if their definition occurs in the same compilation unit as their use. That generally means "static inline" in a header, for C.
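
As a rough illustration of the "static inline in a header" pattern, here is a sketch of a header that defines a small impulse-application function. The file name, types, and function names are all assumptions made up for this example; the physics is the textbook rigid-body update, not necessarily what the library does.

```c
/* vect_sketch.h: hypothetical header, names invented for illustration. */
#ifndef VECT_SKETCH_H
#define VECT_SKETCH_H

typedef struct { float x, y; } vect2;

typedef struct {
    float inv_mass;     /* 1 / mass */
    float inv_inertia;  /* 1 / moment of inertia */
    vect2 vel;          /* linear velocity */
    float ang_vel;      /* angular velocity */
} body2;

/* 2D cross product (the z component of the 3D cross product). */
static inline float vect2_cross(vect2 a, vect2 b)
{
    return a.x * b.y - a.y * b.x;
}

/* Apply impulse j at offset r from the body's center of gravity. */
static inline void body2_apply_impulse(body2 *b, vect2 j, vect2 r)
{
    b->vel.x += j.x * b->inv_mass;
    b->vel.y += j.y * b->inv_mass;
    b->ang_vel += b->inv_inertia * vect2_cross(r, j);
}

#endif /* VECT_SKETCH_H */
```

Every .c file that includes the header gets its own private copy of the definition, so the compiler sees the function body wherever it is called and is free to inline it.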

The GCC folks are working on link-time optimization that'll allow you to be lazy, but who knows when we'll see the results.

Cross-platform vectorization. - Skorche - Feb 8, 2007 02:48 PM

OSC: Ah, I see.

After adding a midphase AABB check I was able to take off another couple percent, bringing it to a 30% drop in CPU use. Not bad for less than an hour of work. I didn't expect to get that much more out of it even with vectorization. Maybe I was just barking up the wrong tree.
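
For readers unfamiliar with the term, a midphase check is a cheap test run on a candidate pair before the exact collision routine. A minimal sketch of an axis-aligned bounding box overlap test, with a hypothetical struct layout and names:

```c
/* Axis-aligned bounding box: left, bottom, right, top.
   Hypothetical layout, invented for illustration. */
typedef struct { float l, b, r, t; } aabb2;

/* Returns nonzero if the two boxes overlap or touch. */
static inline int aabb2_intersects(aabb2 a, aabb2 b)
{
    return a.l <= b.r && b.l <= a.r && a.b <= b.t && b.b <= a.t;
}
```

Pairs that fail this test can skip the expensive narrow-phase collision code entirely.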

However, now the function that calculates the collision impulses is using 70% of the CPU in the box stacking example. Unsurprisingly, it does a lot of vector operations. If vectorizing the vector operations even speeds them up by 10%, that would still probably lead to another 5% overall speedup.
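
The estimate above can be checked with Amdahl's law. This sketch assumes "speeds them up by 10%" means the affected code runs 1.1 times faster; the function name is made up for the example.

```c
/* Fraction of total run time saved when a part that takes `fraction`
   of the time is made `speedup` times faster (Amdahl's law). */
static double overall_time_saved(double fraction, double speedup)
{
    double new_time = (1.0 - fraction) + fraction / speedup;
    return 1.0 - new_time;
}
```

If the whole 70% were vector math, overall_time_saved(0.7, 1.1) gives a saving of a little over 6%. Since only part of that function is vector math, a figure around 5% is a plausible guess.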

Oh, and what ever happened to auto-vectorization? I thought that was supposed to be a big deal.

Cross-platform vectorization. - OneSadCookie - Feb 8, 2007 03:06 PM

What happened to it? It exists in the compiler; you can turn it on if you like (Xcode has a checkbox, or you can add -ftree-vectorize to your CFLAGS).

The auto-vectorizer in GCC 4.0 (that Apple uses) is next to useless. There's so little code it can vectorize, and when it does, it's just as likely to slow it down as speed it up. The auto-vectorizers in GCC 4.1 and the soon-to-be-released 4.2 are much better, but who knows when Apple'll upgrade.
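
For reference, auto-vectorizers generally do best on simple counted loops over arrays that are known not to overlap. A small sketch of that kind of loop follows; the function name is invented, and whether a particular compiler version actually vectorizes it is not guaranteed.

```c
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n).
   `restrict` promises the compiler that x and y do not overlap. */
void scale_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

This needs C99 mode (for example -std=c99), plus -ftree-vectorize and an optimization level such as -O2. Depending on the target, a flag that enables the vector unit (such as -maltivec or -msse2) may also be required. Code that works on one 2D vector at a time, like most of a 2D physics solver, looks quite different from this loop, which may be part of why the vectorizer finds little to do.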

Cross-platform vectorization. - Skorche - Feb 8, 2007 04:09 PM

Despite the bleak description of the -ffast-math option in the GCC man page, I tried it. It gives another sizable gain in performance on my G5 (another 6-10%). It doesn't even seem to affect the simulations at all either. The gain is minimal if not nonexistent on my MacBook though.

I'm a bit confused though. According to the man page, it assumes that I'm not using a lot of things that I am (Infs, etc.), and warns that the results aren't exact. Yet it still works verbatim (near as I can tell) on both my PPC and Intel machines. Strange then that the PPC and Intel versions run slightly differently, but using -ffast-math doesn't seem to have any effect.

Should I be worried that this is going to cause problems in the future? It sounds like the optimizations should cause the math I'm doing to not work at all.
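
One way to read the man page warning: -ffast-math turns on -ffinite-math-only, which permits (but does not force) the compiler to assume that no value is a NaN or an Inf. The sketch below, with hypothetical names, shows the kind of code that depends on infinities and is therefore at risk. It behaves as commented when built without -ffast-math; with it, the result of the isinf() test is no longer something to rely on.

```c
#include <math.h>

/* Smallest positive entry in t[0..n-1], or INFINITY if there is none.
   Relies on INFINITY behaving as an ordinary IEEE value. */
static float min_positive(const float *t, int n)
{
    float best = INFINITY;
    for (int i = 0; i < n; i++)
        if (t[i] > 0.0f && t[i] < best)
            best = t[i];
    return best;
}

/* Returns 1 if min_positive() found something. Under
   -ffinite-math-only the compiler may assume isinf() is never true. */
static int found_positive(float best)
{
    return !isinf(best);
}
```

Code that merely stores and compares an infinity can keep working in practice, which may be why nothing visibly changed; explicit isinf()/isnan() checks are the more likely place for trouble to show up.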