Faster OpenGL in Cocoa

Krevnik
Unregistered
 
Post: #1
I have been hammering away at finding a better way to continuously render OpenGL frames in Cocoa than NSTimer, which tends to dominate and gets 'ruthlessly' suggested at every turn. The thing is, on certain machines (*cough*Pismo*cough*) it just doesn't work. For example, NeHe Lesson 37 (cel-shading) renders at roughly 5 frames per second on a Pismo using NSTimer, but jumps to 30-60 when the outline is disabled. Something isn't right when drawing the outline makes that much of a difference. I found the solution in OmniGroup's 2001 talk, Porting to MacOS X. It covered how to port games to the Mac OS X platform, including OpenGL under Cocoa, and showed a method of setting up a custom event loop (one that sat on top of the NSApplication event loop, which isn't always a good thing).

The slides and example code can be found at:
http://www.omnigroup.com/developer/gamedevelopment/

I found this rather intriguing, as someone else mentioned that they use a custom run method in NSApplication for their OpenGL Cocoa app. To be honest, a smooth game has too much to do to leave scheduling to Cocoa's NSTimer. It can be pretty inaccurate, and if you set the interval too low it just LAGS the system, to the point where on older machines it gets called an absurd number of times per frame. Oni pushes more polygons than a lot of shareware/freeware games (Amiju Super Golf, Neverball), yet those games seriously lag my Pismo because they use NSTimer instead of a custom event loop. While many are saying 'why bother?', I feel this is important: even if you aim higher than I do on system requirements, the CPU cycles freed up (at the expense of CPU usage and battery life) mean more polygons, better texturing, and better quality on the same machines you are already developing for.

A simple example of a custom run method using pseudo-code:

Code:
- (void)run {
  while( running ) {
   // A custom loop has to manage its own autorelease pool
   NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

   // Call whatever updates physics/input/etc
   updatePhysics();

   // Update the world
   updateWorld();

   // Render a frame
   renderWorld();

   // Bring in and handle one event, no waiting (nil if the queue is empty)
   NSEvent *event = [self nextEventMatchingMask: <...> untilDate:[NSDate distantPast]
                                         inMode:NSDefaultRunLoopMode dequeue:YES];
   // If the event wasn't handled personally, pass it off so the proper delegate gets it
   if( event != nil && handleEvent(event) != HANDLED )
     [self sendEvent:event];

   // Do a single run of the RunLoop, no waiting
   [[NSRunLoop currentRunLoop] runUntilDate:[NSDate distantPast]];

   [pool release];
  }
}

Now, obviously this can be made a little faster and reorganized a little, but it basically does what NSApplication already does, plus a little more (it makes sure a frame gets rendered, etc). What it is missing is code to pass around how much time has elapsed per loop, which is pretty self-explanatory. Just sub-class NSApplication, write up the custom code that fleshes this out (- (void)finishLaunching is a candidate for OpenGL context initialization and setting up the scene), and modify the NIB file in IB to use the custom sub-class.
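The elapsed-time piece that the pseudo-code leaves out could look something like the plain C sketch below, which also compiles as part of an Objective-C file. The names FrameClock, now_seconds and frame_clock_tick are made up for illustration and are not from the original post. It assumes a POSIX clock_gettime is available.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical helper: remembers when the previous loop iteration ran. */
typedef struct {
    double last;    /* timestamp of the previous tick, in seconds */
    int    started; /* 0 until the first tick has been recorded */
} FrameClock;

/* Current monotonic time in seconds (assumes POSIX clock_gettime). */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Returns the seconds elapsed since the previous tick.
   The first tick returns 0, and a timestamp that goes backwards is clamped to 0. */
static double frame_clock_tick(FrameClock *clock, double now) {
    assert(clock != NULL);
    double dt = clock->started ? now - clock->last : 0.0;
    if (dt < 0.0) dt = 0.0;
    clock->last = now;
    clock->started = 1;
    return dt;
}
```

At the top of each pass through the loop, something like `double dt = frame_clock_tick(&clock, now_seconds());` would give a value to hand to the physics and world update calls.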

To give an idea, while I haven't fleshed this method out completely yet (I just have an event loop running on top of NSApplication right now), NeHe Lesson 37 rendered faster with full outlines and cel-shading on using this method than it did with NSTimer and outlines turned off. That is a SERIOUS improvement.

Comments/etc are welcome.
Moderator
Posts: 608
Joined: 2002.04
Post: #2
Would you be willing to post the project so I could check it out?
Krevnik
Unregistered
 
Post: #3
Once I have the event loop moved into an NSApplication sub-class and cleaned up (man, does it need cleaning up from the delegate's event loop method I am using!), I can post a modified version of NeHe Lesson 37. It probably won't be squeaky clean or optimal, but it will prove the point. I will post it once I am happy with the final modified NSApplication design and implement it (shouldn't be later than Sunday or so).

I actually developed this while working on my own 3D engine in Objective-C(++) on a Pismo, where speed/timing is pretty vital. The goal is to actually create a uDG entry based off this engine, but this little tidbit was too important to keep secret.
Moderator
Posts: 608
Joined: 2002.04
Post: #4
Cool. Thanks for sharing this with everyone!
nabobnick
Unregistered
 
Post: #5
I've been doing this for a while and posted the source code too. I guess I didn't explain myself as well as you did to convince people. Everywhere you turn in PC games they use an event loop, so why can't we?

If you look at Apple's GLUT source code they do the same thing.

NutzyGLEngine
neverever
Unregistered
 
Post: #6
Jeff Binder used that method in his projects that he released. He also makes the context draw directly to the screen, which "Allows the full potential for your graphics card to kick in, plus eliminates the overhead of the window server."

http://home.fuse.net/obvious/developer.html

It's called cocoaopenglfullscreen or something.
Krevnik
Unregistered
 
Post: #7
Ugh, I felt like not doing any actual building work on this tonight, but I couldn't resist. I have a fully working NSApplication sub-class which so far pulls off rendering of the scene through a delegate. NSTimers for animation still work as expected, but now without the large impact on rendering speed. I modified NeHe's Lesson 37 (using the available Cocoa code) to use the custom NSApplication class, and have attached it. A quick description of how it works is included, and the sad thing is that this isn't even optimal yet, just a kludge thrown together in about 30 minutes. An RTF file in the zip explains exactly what was changed to switch over (it is pretty painless to switch to, interestingly).

You can download my example at:
http://home.comcast.net/files/CustomGL.zip

Oh, and yeah... nabobnick, you were the guy I saw before, but I never found the link in the forums during my searching. Thanks for that link as well. I will probably learn how to hone my version a little from yours. Your post inspired me to hunt down this route and make sure I did it, which led me to the OmniGroup site. And you are right, why can't we use an event loop? Timers can cause a whole MESS of problems and wasted cycles (as in, cycles sitting idle when you could be using them) that are just not needed for rendering loops. A good VBL sync and your render loop is good to go.
Sage
Posts: 1,232
Joined: 2002.10
Post: #8
I agree NSTimer is bogus (inaccurate, sometimes firing after the desired time) but you don't necessarily have to have a custom event loop. Especially if you want to easily switch between fullscreen & windowed modes.

I've found the best thing to do is simply render as fast as possible and let VBL sync frame limit you to the display refresh rate. Of course, this requires your physics & animation to be time-based...
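To illustrate what "time-based" means here, a minimal C sketch: movement is scaled by the elapsed time of each frame, so the object covers the same distance per second whether the machine manages 5 frames or 60. Body, body_step and simulate are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical moving object: position in units, velocity in units per second. */
typedef struct {
    double position;
    double velocity;
} Body;

/* Advance the body by dt seconds. Scaling by dt is what makes it time-based. */
static void body_step(Body *body, double dt) {
    assert(body != NULL && dt >= 0.0);
    body->position += body->velocity * dt;
}

/* Simulate `seconds` of movement split evenly over `frames` frames
   and return the final position. */
static double simulate(double velocity, double seconds, int frames) {
    assert(frames > 0);
    Body body = { 0.0, velocity };
    double dt = seconds / frames;
    for (int i = 0; i < frames; i++) {
        body_step(&body, dt);
    }
    return body.position;
}
```

Calling simulate(2.0, 1.0, 60) and simulate(2.0, 1.0, 5) should land on the same position up to floating-point rounding, which is the property a frame-rate-independent game relies on.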
Krevnik
Unregistered
 
Post: #9
Yeah, that is another method, although it uses the App's delegate. The problem with this is that it thickens the call stack needlessly, and adds a layer of complexity that has to be managed in a place where it really shouldn't.

Both nabobnick and I have similar solutions (imagine that). I take a different tack and limit it to 1 event handled per frame, rather than emptying the queue every frame. At the very least there should be a limit, as too many events can hurt consistency in the frame refresh, and it is best to spread those spikes out over a few frames, IMO. Also, it appears the posted source leaks, as it is terminating from an autoreleased object, but I am not sure how it can be fixed. Wish I could help with that, nabobnick.
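The difference between the two approaches can be sketched in C with a stand-in queue. This is not either poster's actual code: EventQueue and process_events are hypothetical, and the queue is reduced to a counter so that only the per-frame budget is on display.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for the application's event queue: only the backlog size is tracked. */
typedef struct {
    int pending;
} EventQueue;

/* Handle at most `budget` queued events this frame and return how many were handled.
   A budget of 1 gives one event per frame; INT_MAX empties the queue every frame. */
static int process_events(EventQueue *queue, int budget) {
    assert(queue != NULL && budget >= 0);
    int handled = 0;
    while (queue->pending > 0 && handled < budget) {
        queue->pending--;   /* stands in for dequeueing and dispatching one event */
        handled++;
    }
    return handled;
}
```

With a burst of 10 events, a budget of 1 spreads the work over 10 frames, while a budget of INT_MAX handles all 10 in the first frame and leaves later frames free. Which is better depends on whether input latency or frame-time consistency matters more.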
Krevnik
Unregistered
 
Post: #10
arekkusu Wrote:I agree NSTimer is bogus (inaccurate, sometimes firing after the desired time) but you don't necessarily have to have a custom event loop. Especially if you want to easily switch between fullscreen & windowed modes.

I've found the best thing to do is simply render as fast as possible and let VBL sync frame limit you to the display refresh rate. Of course, this requires your physics & animation to be time-based...
The only problem with this is that on older machines it tends to saturate the run loop with triggers that overlap each other in time, and that winds up making the rendering speed worse, even though I was only drawing 30 polygons. The reason I looked elsewhere is that the timer method is just plain bad. You either get inaccurate timers, or you saturate the run loop and get a huge overhead spike of some sort on certain machines. Rendering as fast as possible and letting VBL do its job (through an NSTimer) only works when your framerates are already above refresh. On older hardware like mine, it seems to hurt performance more than it helps.

A custom event loop is not really anything to fear, and the benefits are pretty good. This is the way things were done on the Mac before Cocoa, and the way they are done on the PC, because removing overhead lets you push more polygons, better textures, or better physics. If I am getting double or triple the performance on cel-shading from doing this, it is the better solution when you approach the limits of the hardware.
Sage
Posts: 1,232
Joined: 2002.10
Post: #11
There's no NSTimer. Just VBL sync. If you're not rendering at >= the display refresh rate, your thread won't block on buffer swap, and you waste 0 cycles. I went through this on a rev A Wallstreet, btw. No GL acceleration under X at all.
nabobnick
Unregistered
 
Post: #12
That code leaks due to a bug in NSApplication stop that I mentioned here also. I tried various workarounds but couldn't find anything that works. bug.tar.gz is the quickest example I could come up with to prove that it's Apple's fault.

Just a note: I went the route of reading all possible events in a frame because many of the PC-specific sites on game loops that I looked at used that technique. I've forgotten why, so I couldn't explain it now, but they convinced me that it was a good thing to do. Also, if I remember right, Apple's GLUT code does it this way as well (but don't quote me on that).
Krevnik
Unregistered
 
Post: #13
arekkusu Wrote:There's no NSTimer. Just VBL sync. If you're not rendering at >= the display refresh rate, your thread won't block on buffer swap, and you waste 0 cycles. I went through this on a rev A Wallstreet, btw. No GL acceleration under X at all.
Then how do you render as fast as possible and let VBL sync do its thing without a timer or custom event loop involved? If there is a way to insert the scene rendering into the main loop I would love to know, as it would be less messy.
Moderator
Posts: 608
Joined: 2002.04
Post: #14
Krevnik: your link is a 404.
Sage
Posts: 1,232
Joined: 2002.10
Post: #15
Multithread your app; stick the rendering in a thread that goes as fast as possible and use locks to communicate event changes from the main thread's event loop. You're going to have to multithread to get best performance on dual-CPU machines anyway.

But, I should also say that I don't see any of the event saturation you're talking about. In Shoot Things, I used a 1000Hz NSTimer if the current display has a 60Hz mode, and VBL sync to frame limit. The timer won't fire until the run loop is idle again, so on slow machines the behavior you get is:
1) timer fires at time A
2) render until A+d1 (where d1 > 1/60th sec depending how slow the machine is)
3) if we've run out of buffers (conceptually double, but could be more depending on the hardware/driver) then block thread waiting for VBL at A+d1+mod(1/60). In a single-threaded app, this is dead time.
4) timer fires at A+d1+mod(1/60)+d2 (where d2 is ~1/1000 sec)
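The timeline in steps 1 to 4 can be worked through numerically. The C sketch below is one reading of it rather than measured behaviour. It assumes the buffer swap always blocks until the next refresh, that refreshes fall on exact multiples of the period starting at time 0, and that all times are in microseconds. The function names are hypothetical.

```c
#include <assert.h>

/* First refresh boundary at or after time t (integer ceiling to a multiple of period). */
static long next_vbl(long t, long period) {
    assert(t >= 0 && period > 0);
    return ((t + period - 1) / period) * period;
}

/* Given a timer fire at time a, a render cost d1 and a timer latency d2,
   return when the timer fires next, assuming the swap blocks until the next VBL. */
static long next_timer_fire(long a, long d1, long d2, long period) {
    long render_done = a + d1;                       /* step 2: render until A+d1 */
    long swap_done = next_vbl(render_done, period);  /* step 3: block until the VBL */
    return swap_done + d2;                           /* step 4: timer fires d2 later */
}
```

For example, with a 16667 microsecond period (60Hz), a 20000 microsecond render and a 1000 microsecond timer latency, a fire at 0 is followed by fires at 34334 and 67668, so frames settle at about two refresh periods apart, or roughly 30 per second.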