textures vs lines and polygons

Luminary
Posts: 5,143
Joined: 2002.04
Post: #16
henryj, I suggested vertex arrays right back at the beginning.

If the CGLMacro stuff had made it fast enough, though, it's a much easier solution.

There is virtually nothing card-specific in Q3A. That's Doom III you're thinking of.

Quake III is so fast because it only draws 5-10K polys/frame, doesn't do very much blending, and uses glDrawElements and CVA for all the rendering.
henryj
Unregistered
 
Post: #17
Quote:henryj, I suggested vertex arrays right back at the beginning.


Sorry, missed that.

FYI Quoting Mr Carmack...
Quote:Optimizing OpenGL drivers for Quake3

This is intended mostly for people working on 3D drivers for Linux, and is basically the same information we have provided to the Windows and Mac driver coders.

First off, if your driver is communicating over any standard communication pipe (like X), you are pretty much SOL. The data traffic is so high that a good framerate is going to be almost impossible to achieve. A direct rendering model is needed to get reasonable performance.

Next, if your driver is directly writing to a small command FIFO on the chip, you will be limited to about 2/3 or less of the framerate you could get with a fully decoupled DMA buffer approach. It is possible to get a playable game with a directly writing driver, but it won't be running with the best of them.

If the hardware is capable of it, supporting ARB_multitexture gives a significant performance boost.

Quake3's rendering architecture has been defined with the primary goal of minimizing API calls and focusing as much work as possible in a single place to make optimization more productive.

During gameplay, 99.9% of all primitives go through a single API point:

glDrawElements( GL_TRIANGLES, numIndexes, GL_UNSIGNED_INT, indexes );

GL_VERTEX_ARRAY is always enabled, and each vertex will be four floats. The fourth float is just for padding purposes so that each vertex will exactly fill an aligned 16 byte block suitable for SIMD optimizations.
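As a rough illustration of that layout (not code from the quoted note), the padded vertex can be sketched in C as below. The names PaddedVertex and padded_vertex_stride are hypothetical, and the sketch assumes a 4-byte float.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: the quoted text only says "four floats" per vertex. */
typedef struct {
    float xyz[3]; /* position, the three floats OpenGL actually reads */
    float pad;    /* unused, rounds the vertex up to a 16 byte block */
} PaddedVertex;

/* Stride in bytes between consecutive vertexes in the array. */
static size_t padded_vertex_stride(void)
{
    assert(sizeof(float) == 4); /* assumption: usual desktop targets */
    return sizeof(PaddedVertex);
}
```

A stride of 16 is what the glVertexPointer calls later in the quote pass as their third argument.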

GL_TEXTURE_COORD_ARRAY is always enabled for the base texture unit, and points at pairs of floats.

If ARB_multitexture is available, GL_TEXTURE_COORD_ARRAY may or may not be enabled for the second texture unit.

GL_COLOR_ARRAY is always enabled and pointing at four unsigned chars in the current release, but we can expose a path where the color is constant for all vertexes and the color array is disabled. We removed this at the last minute because of some driver problems, but we may be putting it back in later. The push for this option is that a multitexture vertex is an odd 36 bytes if color is included, but a very comfortable 32 bytes if a constant color was set ahead of time. On some older cards that require manual setup, knowing that the color gradients don't need to be calculated is also a speed win.
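The 36 versus 32 byte figures can be checked by adding up the arrays described so far. This is only a back-of-the-envelope sketch; vertex_bytes is a hypothetical helper, and it assumes 4-byte floats and 1-byte chars.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of array data a driver has to fetch for one vertex. */
static size_t vertex_bytes(int texture_units, int has_color_array)
{
    assert(texture_units >= 1);
    size_t bytes = 4 * sizeof(float);                   /* padded position */
    bytes += (size_t)texture_units * 2 * sizeof(float); /* s,t per unit */
    if (has_color_array)
        bytes += 4 * sizeof(unsigned char);             /* RGBA bytes */
    return bytes;
}
```

With two texture units, vertex_bytes(2, 1) gives 16 + 16 + 4 = 36 and vertex_bytes(2, 0) gives 32, matching the numbers in the quote.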

If EXT_compiled_vertex_array is not present, we set up the same vertex arrays, but we do strip finding ourselves and issue glBegin() / glArrayElement() / ... / glEnd(). This is faster than the discrete triangle path for most drivers that don't have compiled vertex arrays (because they don't retransform every vertex), but results in a lot more API overhead and limits batch processing. You can change between this behavior and the single draw elements call with the variable "r_drawstrips 0/1". The optimal path is to have compiled vertex arrays and take it as one big glDrawElements call.

So, for a single texturing card with the current (1.03) Quake3 release, there is one single set of conditions to optimize: completely full featured (vertex, color, texture coord) discrete triangles going through the DrawElements path.

A multitexture driver will also see the case where two texture units are active, which requires a different code path.

Note that the 2D overlay graphics, including the console text, currently go through standard glBegin / glTexCoord / glVertex / glEnd paths, but they don't amount to many triangles during gameplay. If you are profiling the startup or connection process, this might confuse your data.

While the array primitives are discrete triangles, they are arranged so that the triangles actually neighbor each other in tristrip order when possible. You can just send them all to the card as completely separate triangles, but to optimize the bus bandwidth utilization you can compare the indexes of the current triangle with the previous triangle to see if there are shared vertexes. Exactly how this needs to be done is hardware dependent. The easiest case is hardware that just has three or more vertex registers, where you can change any given one of them and the others stay the same. Hardware that requires separate begin_tri_strip type commands will require a bit more work to take advantage of. This type of optimization work will only matter after all the other stuff is done.
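One way to picture the index comparison described above is the sketch below. It is a simplified model, not driver code: it assumes a vertex can be reused only if it appeared in the immediately preceding triangle, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many indexes of triangle `tri` also appear in the one before it. */
static int shared_with_previous(const unsigned int *indexes, size_t tri)
{
    assert(tri >= 1);
    const unsigned int *prev = indexes + 3 * (tri - 1);
    const unsigned int *cur = indexes + 3 * tri;
    int shared = 0;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            if (cur[i] == prev[j]) {
                shared++;
                break;
            }
        }
    }
    return shared;
}

/* Vertexes that go over the bus if shared ones are not resent. */
static size_t vertexes_to_send(const unsigned int *indexes, size_t num_indexes)
{
    assert(num_indexes % 3 == 0);
    size_t num_tris = num_indexes / 3;
    if (num_tris == 0)
        return 0;
    size_t sent = 3; /* the first triangle is always sent in full */
    for (size_t t = 1; t < num_tris; t++)
        sent += 3 - (size_t)shared_with_previous(indexes, t);
    return sent;
}
```

For an index list in tristrip order such as 0,1,2, 2,1,3, 2,3,4 this sends 5 vertexes instead of 9. How the reuse is actually expressed to the card is hardware dependent, as the quote says.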

Ideally, drivers should support EXT_compiled_vertex_array, which allows us to explicitly tell you that we aren't going to change the vertex values after we have specified them, so you can batch process the entire load. There are two levels of benefit from this: shared vertexes in a single DrawElements call and shared vertexes across multiple rendering passes on the same geometry. Some drivers get the first benefit even without the compiled vertex arrays by scanning the indexes before processing the triangles, but to save the work across multiple rendering passes the extension is necessary.
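The two levels of benefit can be put into numbers with a simple counting model. This is an idealized sketch that ignores caches and hardware limits; TransformCounts and count_transforms are hypothetical names, not anything from Quake3 or a real driver.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t per_index;     /* transform every index on every pass */
    size_t per_draw_call; /* share vertexes within each DrawElements call */
    size_t locked;        /* transform once, reuse across all passes */
} TransformCounts;

/* Vertex transforms needed under each of the three strategies. */
static TransformCounts count_transforms(size_t num_vertexes,
                                        size_t num_indexes,
                                        size_t passes)
{
    assert(passes >= 1);
    TransformCounts c;
    c.per_index = num_indexes * passes;
    c.per_draw_call = num_vertexes * passes;
    c.locked = num_vertexes;
    return c;
}
```

For a quad (4 vertexes, 6 indexes) drawn in 2 passes the model gives 12, 8 and 4 transforms respectively.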

If a given set of triangles is only going to need a single pass of rendering, we will set up all the vertex arrays before issuing the lock arrays. This allows color and texcoord data to be munged if necessary, but the performance benefits are minor compared to the work saved by the vertex arrays.

glColorPointer( 4, GL_UNSIGNED_BYTE, 0, tess.svars[0].colors );

glTexCoordPointer( 2, GL_FLOAT, 0, tess.textureSt );

glVertexPointer (3, GL_FLOAT, 16, input->xyz);

glLockArraysEXT(0, input->numVertexes);

<set some rasterization state>

glDrawElements(GL_TRIANGLES, input->numIndexes, GL_UNSIGNED_INT, input->indexes);

glUnlockArraysEXT();

If the triangles are going to need to be rendered in multiple passes we only lock the vertex array, then change the color and texcoord arrays on each pass. This allows you to cache the vertex data, but not the color or texcoord data.

glVertexPointer (3, GL_FLOAT, 16, input->xyz);

glLockArraysEXT(0, input->numVertexes);

<set some rasterization state>

glColorPointer( 4, GL_UNSIGNED_BYTE, 0, tess.svars[0].colors );

glTexCoordPointer( 2, GL_FLOAT, 0, tess.textureSt );

glDrawElements(GL_TRIANGLES, input->numIndexes, GL_UNSIGNED_INT, input->indexes);

<set some rasterization state>

glColorPointer( 4, GL_UNSIGNED_BYTE, 0, tess.svars[0].colors );

glTexCoordPointer( 2, GL_FLOAT, 0, tess.textureSt );

glDrawElements(GL_TRIANGLES, input->numIndexes, GL_UNSIGNED_INT, input->indexes);

glUnlockArraysEXT();


The only weird thing we do with geometry is enabling a single user clip plane when looking through a portal in the game. This usually punts drivers to an unoptimized path, so we don't use it very often.

There are a couple common optimizations that I would recommend avoiding, due to artifacts they introduce.

In many cases triangle back face culling can be done more efficiently by a fast CPU than by the graphics card, especially if the card is taking discrete triangles instead of strips. The problem is that the CPU and card will have slightly different computations, and triangles that are very near edge-on may be considered culled by one and not the other. The result is a brief crack between polygons when a polygon goes edge on.
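For reference, a CPU-side back face test is typically a signed-area check like the sketch below. This is a generic formulation, not code from the quote; ScreenPoint and both function names are hypothetical, and it assumes counter-clockwise front faces with y pointing up.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } ScreenPoint; /* hypothetical type */

/* Twice the signed area of the triangle; positive when counter-clockwise. */
static float signed_area2(const ScreenPoint tri[3])
{
    assert(tri != NULL);
    return (tri[1].x - tri[0].x) * (tri[2].y - tri[0].y)
         - (tri[2].x - tri[0].x) * (tri[1].y - tri[0].y);
}

/* Zero area (exactly edge-on) is treated as culled here; that choice is arbitrary. */
static int is_back_facing(const ScreenPoint tri[3])
{
    return signed_area2(tri) <= 0.0f;
}
```

For a triangle that is nearly edge-on the area is close to zero, so its sign depends on rounding and on the order of operations. That is where a CPU and a card computing it differently can disagree.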

Guard band clipping is another optimization that usually leads to tiny cracks between polygons. The idea behind guard band clipping is that triangles that poke some distance off the screen are more efficiently handled by letting the hardware scissor them instead of manually clipping them. Only triangles that extend far off the screen or cross the near clip plane are actually clipped by the CPU. The problem is that when two triangles share an edge that hits the screen bounds and one of them stays within the guard band and the other doesn't, the clipped triangle will get a slightly different edge slope if it is clipped to the screen bounds while the other triangle scissors off the edge. This can be solved by clipping to the guard band edge instead of the screen edge, but on current hardware that can exact a fairly high pixel cost, blunting the benefit of the saved clipping. Plus, there are all sorts of other common bugs with drivers that try guard band clipping.
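The guard band decision can be sketched as a three-way classification. This is a simplified 2D model under assumptions of my own: it ignores the near clip plane, and the types, names and rectangle sizes are hypothetical.

```c
#include <assert.h>

typedef struct { float x, y; } Point2;
typedef struct { float min_x, min_y, max_x, max_y; } Rect;

typedef enum {
    TRI_INSIDE_SCREEN, /* no clipping or scissoring needed */
    TRI_SCISSOR,       /* pokes off screen but stays inside the guard band */
    TRI_CLIP           /* leaves the guard band, clipped on the CPU */
} TriClass;

static int inside(Rect r, Point2 p)
{
    return p.x >= r.min_x && p.x <= r.max_x &&
           p.y >= r.min_y && p.y <= r.max_y;
}

static TriClass classify_triangle(const Point2 tri[3], Rect screen, Rect guard)
{
    /* The guard band must enclose the screen. */
    assert(guard.min_x <= screen.min_x && guard.max_x >= screen.max_x);
    assert(guard.min_y <= screen.min_y && guard.max_y >= screen.max_y);
    int in_screen = 1, in_guard = 1;
    for (int i = 0; i < 3; i++) {
        in_screen &= inside(screen, tri[i]);
        in_guard &= inside(guard, tri[i]);
    }
    if (in_screen)
        return TRI_INSIDE_SCREEN;
    return in_guard ? TRI_SCISSOR : TRI_CLIP;
}
```

Two triangles that share an edge crossing the screen border can land in different classes, one TRI_SCISSOR and one TRI_CLIP. That is the situation in which the quote says the shared edge ends up with two slightly different slopes.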

Basically, what this says is: if you want Quake 3 to run fast on your driver, do this, because that's what Carmack is using. By extension, as a developer, if you want your own code to go fast, do as Quake does.