glDrawElements vs. glDrawArrays - The numbers are in!

Member
Posts: 145
Joined: 2002.06
Post: #1
As promised in a thread probably 2 people read, I wrote a little tester app to evaluate the comparative speed of glDrawElements (wrapped in array locks) and glDrawArrays. It tests each with three array setups: packed discrete arrays, discrete arrays aligned to double-word boundaries, and interleaved arrays. It also does a depth buffer clear and swap about 30 times a second (a bit less in reality). Some results:

iMac (G3/333MHz) "five flavors", Rage Pro OpenGL Engine (OS 9)
winner: glDrawElements (around 112kpolys/sec; aligned discrete arrays seemed to have an almost unmeasurable advantage)

iMac (G3/400MHz), Rage 128 OpenGL Engine (OS X)
winner (by a nose): glDrawElements (no clear best array format) (around 125kpolys/sec)

Power Macintosh (G3/350MHz) "blue & white", Rage 128 OpenGL Engine (OS 9)
no clear winner. (around 150kpolys/sec)

Power Macintosh (G4/400MHz) not sure what model, Rage 128 OpenGL Engine (OS 9)
no clear winner. (around 160kpolys/sec, with iTunes running)

Power Macintosh (G4/533MHz x2) "digital audio", NVIDIA GeForce2 MX OpenGL Engine (OS X)
winner: glDrawElements (no clear best array format) (around 1330kpolys/sec!!!)

This test was not designed to judge FILL RATE, and so real-world poly counts will be SIGNIFICANTLY lower.

Here is the app, data, and source code. I had to hack the code out of my current project, so if I missed anything just let me know.

"He who breaks a thing to find out what it is, has left the path of wisdom."
- Gandalf the Gray-Hat

Bring Alistair Cooke's America to DVD!
Member
Posts: 145
Joined: 2002.06
Post: #2
Just did a few more tests with lighting disabled and a color array.

You get much higher poly counts with lighting off. My GeForce2 MX crossed the 5Mpoly/sec mark! The Rage Pro iMac I mentioned got almost 200kpolys/sec, and the blue & white got 620kpolys/sec. The peak poly rates on all of them definitely came from double-word aligned discrete arrays drawn with glDrawElements, although all glDrawElements paths were relatively fast. In the case of the GF2MX, glDrawElements is over 4 times faster than glDrawArrays.

Translation: if you can store static lighting (or quickly generate dynamic lighting) for some part of your scene, do it.

This is all with glColorPointer(4, GL_UNSIGNED_BYTE, 0, *) or glInterleavedArrays(GL_T2F_N3F_V3F, 0, *) BTW.
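For readers trying to picture the three array setups, here is a minimal sketch of what the element layouts could look like. The struct names are made up for illustration and are not taken from the tester app. The padded layout assumes the "aligned" arrays use a fourth, ignored float. The interleaved field order (texcoord, normal, vertex) is what GL_T2F_N3F_V3F specifies.

```cpp
#include <cstddef>

// Packed discrete array element: 3 floats, 12 bytes, no padding.
struct PackedVec3 { float x, y, z; };

// Aligned discrete array element (assumed layout): padded to 16 bytes so
// each element starts on a double-word (8-byte) boundary, provided the
// array itself does. 'pad' is never read by GL.
struct AlignedVec3 { float x, y, z, pad; };

// Color element matching glColorPointer(4, GL_UNSIGNED_BYTE, ...).
struct ColorRGBA { unsigned char r, g, b, a; };

// Interleaved element in GL_T2F_N3F_V3F order:
// 2 texcoord floats, then 3 normal floats, then 3 position floats.
struct InterleavedT2N3V3 {
    float u, v;
    float nx, ny, nz;
    float x, y, z;
};

// Compile-time checks that the layouts have the sizes described above.
static_assert(sizeof(PackedVec3) == 12, "packed vec3 should be 12 bytes");
static_assert(sizeof(AlignedVec3) == 16, "padded vec3 should be 16 bytes");
static_assert(sizeof(ColorRGBA) == 4, "RGBA bytes should be 4 bytes");
static_assert(sizeof(InterleavedT2N3V3) == 32, "T2F_N3F_V3F is 8 floats");
```

With a padded element, the stride passed to glVertexPointer or glNormalPointer would be sizeof(AlignedVec3). A stride of 0 tells GL the elements are tightly packed.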
Nimrod
Unregistered
 
Post: #3
Wow, thanks Ian, that's helped me quite a bit! Two questions:

So what is the fastest way of drawing dynamic (animated) meshes?

And have you tried using the VAR extension? (have Apple even written the drivers for it yet?)

Thanks :)
Member
Posts: 145
Joined: 2002.06
Post: #4
Quote:Originally posted by Nimrod
So what is the fastest way of drawing dynamic (animated) meshes?
Code:
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
if (useGLLighting) {
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnable(GL_LIGHTING); // might want to switch this around to assume lighting is on.
} else {
    glEnableClientState(GL_COLOR_ARRAY); // pre-lit colors stand in for the normals
}
glEnableClientState(GL_VERTEX_ARRAY);


glTexCoordPointer(2, GL_FLOAT, 0, texCoords);
    // texCoords is array of struct{GLfloat u, v;};
if (useGLLighting)
    glNormalPointer(GL_FLOAT, sizeof(normals[0]), normals);
        // normals is array of struct{GLfloat x,y,z,ignore;};
        // the stride must cover the padding float; 0 would mean tightly packed
else
    glColorPointer(4, GL_UNSIGNED_BYTE, 0, colors);
        // colors is array of struct{GLubyte r,g,b,a;};
glVertexPointer(3, GL_FLOAT, sizeof(vectors[0]), vectors);
        // vectors is array of struct{GLfloat x,y,z,ignore;};


glLockArraysEXT(0, numberOfVertices);
glDrawElements(GL_TRIANGLES, group->numFaces*3, GL_UNSIGNED_SHORT, indices);
// GL_UNSIGNED_INT may be faster - untested!
glUnlockArraysEXT();

glDisableClientState(GL_TEXTURE_COORD_ARRAY);
if (useGLLighting) {
    glDisableClientState(GL_NORMAL_ARRAY);
    glDisable(GL_LIGHTING); // switch this around if you assume lighting is on.
} else {
    glDisableClientState(GL_COLOR_ARRAY);
}
glDisableClientState(GL_VERTEX_ARRAY);
Calculating the normal vectors on the fly is beyond the scope of this post.

Your indices should be arranged so that you're drawing triangle strips (generalized triangle strips may be good enough, not sure) to help older cards which can't cache your entire vertex set. I found a great thesis on generating optimal tristrips (1MB, 85pg.) that went completely over my head, so I used the ACTC library I found instead (sortaBSD license - included in the archive).

That make sense?

Quote:And have you tried using the VAR extension? (have Apple even written the drivers for it yet?)
Which one?
Nimrod
Unregistered
 
Post: #5
Thanks!

VAR (Vertex Array Range) is an NVIDIA-specific extension, available on GeForce cards and upwards. I hear it brings speed gains that make it worth using, even though it won't work on ATI stuff.

http://developer.nvidia.com/view.asp?IO=...L_NV_fence
http://developer.nvidia.com/view.asp?IO=vardemo

These 2 links should be helpful; there might be other stuff if you poke around NVIDIA's site.

This thread would suggest there aren't Mac drivers for it yet, though (and you might want to read this, because I notice you also use GLUT). IIRC Jaguar will bring driver support for such features.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #6
It's common knowledge (on the mac-opengl mailing list at any rate :)) that 10.2 brings GL_APPLE_vertex_array_range, GL_APPLE_fence and GL_APPLE_vertex_array_object, allowing AGP & VRAM storage of vertices.

It looks as if the extensions work slightly differently from their proprietary counterparts, but should provide good speedups.

Display lists are also improved by the new additions, and will probably become the fast path for OSX, rather than the slow one they currently are :)
Feanor
Unregistered
 
Post: #7
Do you think using Compiled Vertex Arrays (on cards that support it) would make a notable difference? Many devs on mac-opengl have talked about them. The main consideration is that the arrays are limited to 2048 vertices apiece. Not a subject I've had much luck finding info on.

-- Fëanor
Member
Posts: 145
Joined: 2002.06
Post: #8
OK, so I did more testing with the colorized (rather than lit) geometry, including comparing optimized (run through ACTC) to unoptimized geometry. Here are the results:

G4/533x2, OSX, GeForce2MX:
optimized: 5550 kpolys/sec
unoptimized: 4580 kpolys/sec

G3/350, OS9, Rage128:
optimized: 560 kpolys/sec
unoptimized: 360 kpolys/sec

And now the reason for the subject line:

G3/333, OS9, RagePro:
optimized: 188 kpolys/sec
unoptimized: 195 kpolys/sec
This is reliable, not a freak occurrence. These numbers were repeated through several repetitions of the loop.

[edit] Check the end of this thread for a theory on why the unoptimized method is faster[/edit]

App and data.
Nimrod
Unregistered
 
Post: #9
I'm very happy this thread exists, 'cos I'd been meaning to ask about this. I've just been playing about with rendering a mesh with 1682 polies, 901 vertices, normals for every vertex (which I calculated when I exported the 3ds file to my own format IIRC), and a 256*256 texture, and OpenGL lighting with one light (this is on a non TnL card).

This is all done on a G4 350MHz with Rage128, OS X 10.1.5.

Up till now this was done in immediate mode, where I was getting on average 70 FPS. Now I'm using glDrawElements() and getting almost 120 FPS. I think this comes out at about 175 - 180 thousand polies / sec. This is faster than on Ian's G4 400 for some reason (OSX vs OS9?). If the thread at macscene.org (linked to above) is to be believed, perhaps it's because I'm using Carbon and not GLUT.
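As a rough check on the arithmetic above, throughput is just triangles per frame times frames per second. This is only a sketch, and the function names are made up here for illustration.

```cpp
// Triangles drawn per second for a mesh that is redrawn once per frame.
double polysPerSecond(int polysPerFrame, double framesPerSecond) {
    return polysPerFrame * framesPerSecond;
}

// Frame rate implied by a given triangle throughput for the same mesh.
double impliedFps(double polysPerSec, int polysPerFrame) {
    return polysPerSec / polysPerFrame;
}
```

For the 1682-poly mesh, 70 FPS works out to about 118 thousand polys/sec and a full 120 FPS to about 202 thousand. The quoted 175 - 180 thousand corresponds to roughly 104 - 107 FPS.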

Cool speed boost though!

This is without triangle strips, which I intend to look into next. Part of the speedup is probably also down to me previously using wrappers to get access to the vertices, indices etc., which were non-inlined; I don't know how much of a slowdown they would be.

Ian, do you know of the STRIPE algorithm for creating tri-strips? It works best on quads by triangulating them, but it decides on the optimal way to put the diagonal. Thanks for the link to ACTC.

EDIT:

I also meant to say that I had been put off trying glDrawElements, because I use STL vectors for storing all my data. I assume it works because the internal format of the data in a vector is no different from a plain array. But can I guarantee that this will always be the case?
Member
Posts: 145
Joined: 2002.06
Post: #10
Quote:Originally posted by Nimrod
Ian, do you know of the STRIPE algorithm for creating tri-strips? It works best on quads by triangulating them, but it decides on the optimal way to put the diagonal. Thanks for the link to ACTC.

I found them, but ACTC works well enough, and is free. As I just stated, it appears that on the Rage Pro, which is the minimum config we plan to support, RANDOMIZING the polygons may produce the best results.

edit: BTW, I would be interested in what numbers these apps get on a Radeon rig; I have yet to find one I can test on.
Member
Posts: 145
Joined: 2002.06
Post: #11
OK, I re-ran the tests after randomly shuffling the triangles in the input. As expected, the Rage128 and GeForce2 got lower poly rates for the unoptimized geometry. Surprisingly, the Rage Pro got higher poly rates with the unoptimized geometry. At the peak it hit 207kpolys/sec; with tri-strip-optimized geometry its max was 189kpolys/sec.

Tip: check if you're running on a Rage Pro (strcmp((const char *)glGetString(GL_RENDERER), "Rage Pro OpenGL Engine") == 0; a plain == would only compare pointers). If you are, shuffle the order of the polygons in your models around a bit.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #12
It's really strange that randomizing the triangles' order would be consistently better. Maybe you've got something else going on (if it were Radeon/GF3, I'd say Z-buffer, but that doesn't seem to make sense for Rage Pro).

-----

CVA is supported on all hardware under OSX.

It doesn't provide significant benefits unless you can interleave your CPU work with your glDrawElements calls sufficiently. Basically, it seems like VAR/Fence is at least as good in the worst case, and significantly better in the best case.
henryj
Unregistered
 
Post: #13
Nimrod:
Quote:I also meant to say that I had been put off trying glDrawElements, because I use STL vectors for storing all my data. I assume it works because the internal format of the data in a vector is no different from a plain array. But can I guarantee that this will always be the case?

STL vectors guarantee that they can be passed to functions that expect C-style arrays, so you will be fine. On the other hand, you should take care using them because they can allocate and deallocate at inopportune times, and when they do it's not cheap. You can avoid this if you take care, for example by reserving capacity up front.
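A small sketch of both points, with made-up names (Vertex, makeLine, rawPointer) used purely for illustration. Contiguous storage is spelled out in the C++ standard from C++03 onwards, so &v[0] can be handed to functions that expect a plain array. Calling reserve() once keeps push_back from reallocating, and from moving the data, until the reserved capacity is exceeded.

```cpp
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Fill a vertex vector with a single allocation: reserve() sets the
// capacity up front, so the push_back calls below never reallocate.
std::vector<Vertex> makeLine(std::size_t count) {
    std::vector<Vertex> verts;
    verts.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        verts.push_back(Vertex{static_cast<float>(i), 0.0f, 0.0f});
    }
    return verts;
}

// Pointer suitable for a C-style array parameter (e.g. a gl*Pointer call).
// An empty vector has no element 0, so return null in that case.
const Vertex *rawPointer(const std::vector<Vertex> &verts) {
    return verts.empty() ? nullptr : &verts[0];
}
```

The pointer is only valid until the vector next reallocates, so it is safest to fetch it right before the draw call rather than caching it.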
henryj
Unregistered
 
Post: #14
A couple of things about compiled vertex arrays...

On OSX you are limited to 2048 indices. Any more than this and OpenGL reverts to the non-compiled path.

CVA is only going to benefit you if you are touching the same geometry multiple times, e.g. doing multiple passes for lightmapping, or if you have some objects that somehow share exactly the same vertices. You should be calling glDrawElements lots between your lock calls. This...

Code:
glLockArraysEXT( 0, size);
    
glDrawElements( GL_TRIANGLES, indices->Size(), GL_UNSIGNED_INT, indices->Data());
    
glUnlockArraysEXT();

is a waste of time.
Jeff Binder
Unregistered
 
Post: #15
It may also be worth a shot if you're using indexed vertex arrays (i.e. glDrawElements()), if you're using any vertices more than once.
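One way to see how much vertex sharing a mesh has is to compare the index count with the number of distinct vertices referenced. The helper below is a hypothetical sketch, not code from anyone's app in this thread.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Average number of times each distinct vertex is referenced by the
// index list. 1.0 means no sharing; higher means more reuse.
// Returns 0.0 for an empty list.
double vertexReuse(const std::vector<unsigned short> &indices) {
    if (indices.empty()) {
        return 0.0;
    }
    const std::set<unsigned short> unique(indices.begin(), indices.end());
    return static_cast<double>(indices.size()) /
           static_cast<double>(unique.size());
}
```

A quad drawn as two triangles (indices 0,1,2, 2,1,3) gives 1.5. For the mesh mentioned earlier in the thread, 1682 triangles over 901 vertices works out to about 5.6 references per vertex.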