Cross-platform vectorization.

Sage
Posts: 1,482
Joined: 2002.09
Post: #1
Right now, my physics library implements its own 2D vector operations. In short, what are the options for vectorizing this code in a cross-platform manner? I'm not even sure what to Google for at this point.

Scott Lembcke - Howling Moon Software
Author of Chipmunk Physics - A fast and simple rigid body physics library in C.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #2
Crap all.

If you don't mind the license (which you do) and using C++ (which you do), then there's MacSTL, which, contrary to the name, is cross-platform.

If you don't mind being Mac-only (which you do), there's some Apple-provided stuff in the Accelerate framework.

Otherwise, I'm not aware of anything.

Not, I suspect, that vectorizing Chipmunk would work particularly well, anyway.
Sage
Posts: 1,482
Joined: 2002.09
Post: #3
Kinda what I figured.

I know that it spends a large percentage of its time doing vector manipulation when solving impulses, though I'm not sure how much benefit you would get from vectorizing 2D operations.

Maybe I'll look into vectorizing the OS X version later then. I guess I'm happy enough with the performance now.

Sage
Posts: 1,199
Joined: 2004.10
Post: #4
I have to say I'm very impressed with your performance right now. While I was running your box-stacking demos, Activity Monitor showed the CPU at damn near zero.
Sage
Posts: 1,482
Joined: 2002.09
Post: #5
Yeah, I'm pretty proud of it. It's mostly based on Erin Catto's contact persistence idea, but is implemented from the ground up to store the contacts in a different way. I've also been tuning my spatial hashing code for about a year now.

Running Shark on the box demo again shows that the main impulse solver function uses 30% of the CPU time, and the functions that apply the impulses are using almost. fminf() and fmaxf() are using 12% (plus another 6% for dyld_stub_fmaxf()). I declared the impulse application functions as extern inline, but they never seem to actually get inlined. What's the deal? Also, aren't fminf() and fmaxf() supposed to be built-in functions? I suppose I could just write my own and inline them.

Sage
Posts: 1,482
Joined: 2002.09
Post: #6
Wow.

I created my own static inline versions of the min/max functions, and that shaved off about 3%. Not bad for a one-minute fix: ≈6600 ms down to ≈6400 ms for 5000 frames of the box demo.
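For reference, here is a minimal sketch of what such hand-rolled helpers might look like, assuming single-precision floats. The names fast_min, fast_max and fast_clamp are hypothetical, not Chipmunk's. Because they are defined static inline where they are used, the compiler can inline them instead of calling through the dyld stub.

```c
/* Hypothetical replacements for fminf()/fmaxf(); names are illustrative.
 * Caveat: fminf()/fmaxf() return the non-NaN argument when exactly one
 * argument is NaN. These plain comparisons return b whenever the
 * comparison involves a NaN. */
static inline float fast_min(float a, float b) { return (a < b) ? a : b; }
static inline float fast_max(float a, float b) { return (a > b) ? a : b; }

/* Clamp x into [lo, hi], e.g. to keep an accumulated impulse in range. */
static inline float fast_clamp(float x, float lo, float hi)
{
    return fast_min(fast_max(x, lo), hi);
}
```

They are only a drop-in replacement if NaNs never reach them.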

Then I moved the impulse functions into the header as static inline functions, and the improvement jumped from 3% to 24%! ≈6600 ms down to ≈5000 ms for 5000 frames. Barely more than 1 ms per frame!

I suppose I could just leave the impulse-applying functions in the header, since they are only two lines long, but I'd like to know why the extern inline declaration didn't work.

Luminary
Posts: 5,143
Joined: 2002.04
Post: #7
Functions can only be inlined if their definition occurs in the same compilation unit as their use. That generally means "static inline" in a header, for C.
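A minimal sketch of that pattern follows, assuming a simple 2D vector struct. Vec2, Body and the vec2_*/body_apply_impulse functions are hypothetical stand-ins, not Chipmunk's actual API. Every .c file that includes such a header gets its own inlinable copy of each function.

```c
/* Hypothetical header contents; all names are illustrative. */
typedef struct { float x, y; } Vec2;

static inline Vec2 vec2_add(Vec2 a, Vec2 b) { Vec2 r = { a.x + b.x, a.y + b.y }; return r; }
static inline Vec2 vec2_scale(Vec2 v, float s) { Vec2 r = { v.x * s, v.y * s }; return r; }
/* 2D cross product: the z component of (a.x, a.y, 0) x (b.x, b.y, 0). */
static inline float vec2_cross(Vec2 a, Vec2 b) { return a.x * b.y - a.y * b.x; }

/* Hypothetical rigid body storing inverse mass and inverse moment. */
typedef struct {
    Vec2 v;      /* linear velocity */
    float w;     /* angular velocity */
    float m_inv; /* 1 / mass */
    float i_inv; /* 1 / moment of inertia */
} Body;

/* Apply impulse j at offset r from the center of mass. */
static inline void body_apply_impulse(Body *b, Vec2 j, Vec2 r)
{
    b->v = vec2_add(b->v, vec2_scale(j, b->m_inv));
    b->w += b->i_inv * vec2_cross(r, j);
}
```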

The GCC folks are working on link-time optimization that'll allow you to be lazy, but who knows when we'll see the results.
Sage
Posts: 1,482
Joined: 2002.09
Post: #8
OSC: Ah, I see.

After adding a midphase AABB check, I was able to take off another couple of percent, bringing it to a 30% drop in CPU use. Not bad for less than an hour of work. I didn't expect to get that much more out of it even with vectorization. Maybe I was just barking up the wrong tree.
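A midphase check like that might look roughly like the following sketch. It assumes the spatial hash hands back candidate pairs and each shape caches a bounding box. The AABB type and aabb_overlaps() are hypothetical names. Boxes that merely touch count as overlapping here, which errs on the side of running the full test.

```c
#include <stdbool.h>

/* Hypothetical axis-aligned bounding box: left, bottom, right, top. */
typedef struct { float l, b, r, t; } AABB;

/* If two shapes' boxes don't overlap, the more expensive narrow-phase
 * collision routine can be skipped for that pair. */
static inline bool aabb_overlaps(AABB a, AABB b)
{
    return a.l <= b.r && b.l <= a.r && a.b <= b.t && b.b <= a.t;
}
```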

However, the function that calculates the collision impulses now uses 70% of the CPU in the box-stacking example. Unsurprisingly, it does a lot of vector operations. If vectorizing the vector operations speeds them up by even 10%, that would still probably lead to another 5% overall speedup.
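The back-of-the-envelope math here is Amdahl's law. The following is a sketch. p is the fraction of total run time spent in the code being optimized, and r is the fraction of that code's time saved. new_time_fraction() is a hypothetical helper.

```c
/* Fraction of the original total run time left after the optimization:
 * the untouched part (1 - p) plus the optimized part p scaled by (1 - r). */
static inline double new_time_fraction(double p, double r)
{
    return (1.0 - p) + p * (1.0 - r);
}
```

With p = 0.7 and r = 0.1, this gives 0.3 + 0.63 = 0.93, which is at most a 7% saving, and that only holds if all of that 70% were vector math. The function also does other work, so a figure around 5% is a reasonable guess.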

Oh, and whatever happened to auto-vectorization? I thought that was supposed to be a big deal.

Luminary
Posts: 5,143
Joined: 2002.04
Post: #9
What happened to it? It exists in the compiler; you can turn it on if you like (Xcode has a checkbox, or you can add -ftree-vectorize to your CFLAGS).

The auto-vectorizer in GCC 4.0 (which Apple uses) is next to useless. There's so little code it can vectorize, and when it does, it's just as likely to slow the code down as speed it up. The auto-vectorizers in GCC 4.1 and the soon-to-be-released 4.2 are much better, but who knows when Apple will upgrade.
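To show what the vectorizer looks for, here is a sketch of a loop shape it can typically handle. integrate_positions() and the separate x/y arrays are assumptions for illustration. The code needs C99 (-std=c99 or -std=gnu99) for restrict and the loop-scoped counter. Compile with something like -O2 -ftree-vectorize. Depending on the target, a SIMD flag such as -maltivec or -msse2 may also be needed.

```c
/* A simple counted loop over contiguous float arrays. The restrict
 * qualifiers promise the arrays don't alias, which the vectorizer needs.
 * Separate x and y arrays (structure-of-arrays) keep every access
 * unit-stride. Interleaved {x, y} structs can be harder to vectorize,
 * especially for older compilers. */
static void integrate_positions(int n, float *restrict px, float *restrict py,
                                const float *restrict vx, const float *restrict vy,
                                float dt)
{
    for (int i = 0; i < n; i++) {
        px[i] += vx[i] * dt;
        py[i] += vy[i] * dt;
    }
}
```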
Sage
Posts: 1,482
Joined: 2002.09
Post: #10
Despite the bleak description of the -ffast-math option in the GCC man page, I tried it. It gives another sizable performance gain on my G5 (another 6-10%), and it doesn't seem to affect the simulations at all. The gain is minimal, if not nonexistent, on my MacBook, though.

I'm a bit confused, though. According to the man page, it assumes that I'm not using a lot of things that I am (Infs, etc.) and warns that the results aren't exact. Yet, as near as I can tell, the results are identical with and without it on both my PPC and Intel machines. It's strange, then, that the PPC and Intel versions run slightly differently from each other, while using -ffast-math doesn't seem to have any effect.

Should I be worried that this is going to cause problems in the future? It sounds like the optimizations should cause the math I'm doing to not work at all.
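One plausible explanation, offered as a hedged sketch: -ffast-math mostly changes what the compiler may assume and how it may rearrange arithmetic, while the hardware still produces IEEE infinities at run time. Code that merely produces and propagates Infs, such as giving a static body infinite mass, can therefore keep working. Explicit tests for Inf or NaN are where it is most likely to break. The MassInfo type and both functions below are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical mass data for a body; names are illustrative. */
typedef struct { float mass, m_inv; } MassInfo;

static MassInfo mass_info_make(float mass)
{
    /* IEEE 754: 1 / INFINITY == 0, so an infinite-mass body gets zero
     * inverse mass and impulses never move it. The division happens at
     * run time on the FPU. */
    MassInfo mi = { mass, 1.0f / mass };
    return mi;
}

static bool body_is_static(MassInfo mi)
{
    /* -ffast-math implies -ffinite-math-only, under which GCC may assume
     * no infinities exist and fold this test to false. */
    return isinf(mi.mass);
}
```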
