Accelerate and working with Integer Arrays SIMD
Hey guys,
I recently watched the WWDC session on the Accelerate framework and how to use it to speed up vector arithmetic on the iPhone.
To fill my interleaved vertex arrays I have to do some vector calculations, and I thought I could improve my fill rates that way.
Sadly, BLAS only seems to support floating-point operations.
But for my element array, which tells OpenGL which triangles to connect, I need vector operations on integers.
Basically, I want to add an "unsigned int" componentwise to a vector of "unsigned int"s (in MATLAB-like pseudo-code):
bigArray[n:n+83] = smallArray[0:83] + v * ones(84);
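A minimal C sketch of the pseudo-code above, assuming 32-bit indices. The helper name `add_offset_u32` is made up for illustration. A plain loop like this is something compilers can often auto-vectorize. Accelerate's vDSP also appears to offer an integer vector-plus-scalar routine (`vDSP_vsaddi`, which works on `int`); whether it fits unsigned indices is an assumption to check against the headers.

```c
#include <stddef.h>

/* Hypothetical helper: dst[i] = src[i] + offset for i in [0, count).
   dst and src must not overlap; restrict tells the compiler so,
   which makes vectorizing the loop easier. */
static void add_offset_u32(unsigned int *restrict dst,
                           const unsigned int *restrict src,
                           unsigned int offset,
                           size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i] + offset;
    }
}
```

With the names from the pseudo-code, the call would be `add_offset_u32(bigArray + n, smallArray, v, 84);`.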
EDIT: For the general interleaved Vertex Array of type:
Code:
typedef struct {
    float x;
    float y;
    float z;
} Vertex3;

typedef struct _iVertex3D
{
    unsigned int color;
    Vertex3 v;
    Vertex3 n;
    float uv[2];
} iVertex3D;

Using BLAS vector operations and some strides, I can overwrite all the float fields in this struct, but I'm having a hard time with the integer color component.
If I have to use a for-loop again to set the color, I would lose all the benefit of SIMD I was aiming for in the first place.
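To make the stride idea concrete for the color field: consecutive `color` members are exactly `sizeof(iVertex3D)` bytes apart, starting at `offsetof(iVertex3D, color)`. The sketch below writes one color into every vertex using that byte stride. The helper name `fill_color_strided` is hypothetical, and this is still a scalar loop, not SIMD. vDSP seems to have an integer fill that takes a stride (`vDSP_vfilli`), which might replace the loop, but that is an assumption to verify.

```c
#include <stddef.h>
#include <string.h>

typedef struct { float x; float y; float z; } Vertex3;

typedef struct {
    unsigned int color;
    Vertex3 v;
    Vertex3 n;
    float uv[2];
} iVertex3D;

/* Hypothetical helper: store `color` into the color field of `count`
   consecutive vertices, stepping through memory by the struct size. */
static void fill_color_strided(iVertex3D *verts, unsigned int color, size_t count)
{
    unsigned char *base = (unsigned char *)verts + offsetof(iVertex3D, color);
    for (size_t i = 0; i < count; ++i) {
        /* memcpy keeps the store well-defined regardless of aliasing rules. */
        memcpy(base + i * sizeof(iVertex3D), &color, sizeof color);
    }
}
```

Expressed in units of `unsigned int` rather than bytes, the same stride is `sizeof(iVertex3D) / sizeof(unsigned int)`, which is the kind of value a strided vector routine would expect.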
Side note: unsigned int != GLubyte[4]. If you're going to pack colors like that, please be aware of endianness.
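A small sketch of the endianness point, assuming the color is uploaded as four `GL_UNSIGNED_BYTE` components in R, G, B, A order. Building the value from a byte array fixes the in-memory order on any CPU, whereas shifting components into an integer gives a memory layout that depends on byte order. The helper name `pack_rgba_bytes` is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns a 32-bit value whose bytes in memory are
   always r, g, b, a, independent of the CPU's endianness. */
static uint32_t pack_rgba_bytes(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    const uint8_t bytes[4] = { r, g, b, a };
    uint32_t packed;
    memcpy(&packed, bytes, sizeof packed);
    return packed;
}
```

The numeric value of the result differs between little-endian and big-endian machines, but the byte sequence that OpenGL reads stays the same.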
arekkusu Wrote: Side note: unsigned int != GLubyte[4]. If you're going to pack colors like that, please be aware of endianness.
Well, I'm using "GL_UNSIGNED_BYTE" and that works quite well.
The whole thing has been running for quite a while now, and I just thought I could save some battery power and speed it up a little by using vector operations to fill my interleaved array.
Is anybody here not using for-loops to add vertices to an interleaved array?
It might be possible to save quite a lot of floating-point operations here, if done correctly. I can't be the only one wondering about this!
I managed to get this working and posted a new blog entry with a detailed speed comparison of OpenGL interleaved array fill rates:
http://tacticadev.wordpress.com/2010/07/...p-vs-vdsp/
I'm not exactly sure whether compilers optimize memcpy, but even so, it would be nice to see the SIMD example compared against memcpy and a for-loop like this:
Code:
//Assuming you're using triangles here
iVertex3D* dest = _interleavedVerts+_vertexCount;
memcpy(dest, basicScrub, scrubVertexCount*sizeof(iVertex3D));
//versus (unrolled by 3, one triangle per iteration)
for (int k=0; k<scrubVertexCount; k+=3) {
    dest[k+0]=basicScrub[k+0];
    dest[k+1]=basicScrub[k+1];
    dest[k+2]=basicScrub[k+2];
}

It's just that I've only ever seen memcpy implemented as a straight byte-by-byte loop-and-assign.
(Please prove me wrong here.)
"iVertex3D" is the data structure for one vertex, that is a point in 3D.
So I tried the following sequential code:
Code:
iVertex3D* dest = _interleavedVerts+_vertexCount;
for (int k=0; k<scrubVertexCount; k++) {
    dest[k]=basicScrub[k];
}

Since I don't have my iPod cable with me, I could only test on the Simulator, where the loop is about 10% slower than memcpy.
But since I time the whole drawing process for one scrub, 10% of the whole thing seems to be quite a big chunk for memcpy alone. I'll recheck on the iPod once I get home.

Update: It's roughly the same on the iPod, also about +10% time using the loop.
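For anyone who wants to repeat the memcpy-versus-loop comparison, here is a rough timing sketch. It is not the harness used for the numbers above; all function names are hypothetical, and `clock()` only measures CPU time at coarse resolution. An optimizing compiler may also turn the loop into a `memcpy` call or drop repeated copies, so results should be treated with care.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef struct { float x; float y; float z; } Vertex3;

typedef struct {
    unsigned int color;
    Vertex3 v;
    Vertex3 n;
    float uv[2];
} iVertex3D;

/* Variant 1: one bulk copy. */
static void copy_with_memcpy(iVertex3D *dest, const iVertex3D *src, size_t count)
{
    memcpy(dest, src, count * sizeof(iVertex3D));
}

/* Variant 2: element-wise struct assignment, as in the sequential loop. */
static void copy_with_loop(iVertex3D *dest, const iVertex3D *src, size_t count)
{
    for (size_t k = 0; k < count; ++k) {
        dest[k] = src[k];
    }
}

typedef void (*copy_fn)(iVertex3D *, const iVertex3D *, size_t);

/* Returns the CPU seconds spent running `fn` `repeats` times. */
static double time_copy(copy_fn fn, iVertex3D *dest, const iVertex3D *src,
                        size_t count, int repeats)
{
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r) {
        fn(dest, src, count);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_copy(copy_with_memcpy, ...)` and `time_copy(copy_with_loop, ...)` with the same buffers and a large repeat count gives two numbers to compare; measuring on the device rather than the Simulator matters, as the update above suggests.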

