Introduction and some performance questions

Apprentice
Posts: 5
Joined: 2010.07
Post: #1
Hi everyone!

Recently I decided to get into (iOS-based) game development. Before that I knew Java, some C and C++, OpenGL and some computer graphics / gamedev basics.

So I bought an iPod Touch (2G), learned Objective-C, and started to develop a 3D game engine. Took me a lot of work, but I think I'm almost at a stage where I can start programming the actual game instead of just working on the engine.

The only tangible result I have right now is a test scene with two simple objects moving around, two buttons and a virtual joystick (no lighting, few textures). The performance is pretty bad: it was around 40 fps before I had finished the controls. After that I had to limit it to 30 fps so that an input doesn't take several seconds to reach the game logic. I also added a trick I found on this forum to wait a few ms for system events.

I'm a little freaked out by that, because I'm still just drawing really simple stuff. And I'm already following most of the advice from the performance guide: I remove logging in release mode and let the compiler optimize for speed.

I hope you guys can help me find out if I did something really wrong!

Here's my engine in a nutshell:
  • I'm using Objective-C for everything. Only the (vector-) math stuff is done in pure C.
  • I only use floats, no doubles or fixed-point numbers.
  • Almost no matrices; transforms consist of a quaternion and a translation vector. No model/view scaling.
  • There's an abstraction layer between the graphics subsystem and OpenGL ES 1.1. It's for reducing redundant GL calls as well as making sure I can implement an ES 2.0 version if I want to, without doing anything to the ES 1.1 version.
  • No sound or physics yet.
  • Grossly over-engineered resource system with smart pointers. Loading geometry is pretty fast when I use my own binary format.
  • Input system that tries to unify all kinds of inputs, also a little over-engineered.
  • Data structures for 3d scenes are designed to allow octrees, frustum culling and whatnot, but the current implementation is pretty "greedy" (doesn't make a difference yet since everything that's there is also visible all the time).
  • There are render queues for opaque z-buffered objects, alpha-tested objects and alpha-blended objects, rendered in this order (as suggested by the performance guide). The queues are sorted by states/textures or by z coordinate. State sorting can be improved, but that wouldn't make a difference yet.
  • The game logic goes into "GameState" and "Entity" objects... not sure if those are very common concepts, but I think the names speak for themselves. Entities can be connected to each other using signals/slots (wasn't exactly easy to do that using obj-c), inactive ones are pooled, groups of entities can be serialized/deserialized and so on.
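
To make the render-queue point in the list concrete, here is a minimal sketch in plain C. It is not the poster's engine code: `DrawItem`, `by_texture`, `back_to_front` and `count_texture_binds` are made-up names, and it assumes each queue is a flat array. It sorts the opaque queue by texture, sorts the blended queue far-to-near, and counts how many texture binds remain once redundant binds are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical draw item; the fields are illustrative only. */
typedef struct {
    unsigned texture; /* texture id the item needs bound */
    float depth;      /* view-space distance from the camera */
} DrawItem;

/* Opaque queue: group items that share a texture (for use with qsort). */
static int by_texture(const void *a, const void *b) {
    const DrawItem *x = a, *y = b;
    return (x->texture > y->texture) - (x->texture < y->texture);
}

/* Blended queue: farthest first, so nearer surfaces blend over farther ones. */
static int back_to_front(const void *a, const void *b) {
    const DrawItem *x = a, *y = b;
    return (x->depth < y->depth) - (x->depth > y->depth);
}

/* Counts the texture binds a queue needs when a bind is skipped
   whenever the wanted texture is already bound. */
static int count_texture_binds(const DrawItem *items, size_t n) {
    assert(items != NULL || n == 0);
    int binds = 0;
    unsigned bound = 0; /* 0 = nothing bound yet; assumes real ids are nonzero */
    for (size_t i = 0; i < n; i++) {
        if (items[i].texture != bound) {
            bound = items[i].texture;
            binds++;
        }
    }
    return binds;
}
```

A queue whose textures alternate needs one bind per item; after `qsort(queue, n, sizeof queue[0], by_texture)` it needs one bind per distinct texture. With only six objects the saving is tiny, which matches the remark that state sorting wouldn't make a difference yet.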

For my first game, I want to do a simple rail-shooter (think Starfox). I'd like it to be playable on a device using the PowerVR MBX (like my iPod) and run very smoothly on faster ones.

So, what can I do to achieve better framerates? Where/how should I look for bottlenecks? Do you think I'm doing something fundamentally wrong, based on the list? Is there some software that could help me?

(by the way, English is not my native language, so please excuse grammatical errors and weird phrasings)
Luminary
Posts: 5,143
Joined: 2002.04
Post: #2
Not answering your question, since I don't think there's enough information here to give an answer. What does Instruments think is slow? Are you using texture atlases? How many draw calls per frame? How large are the textures?

(Jul 12, 2010 11:18 AM)Netcob Wrote:  Entities can be connected to each other using signals/slots (wasn't exactly easy to do that using obj-c)

I don't see how this can be hard. The target/action paradigm from AppKit, and Cocoa Bindings, are exactly this. You shouldn't have to do anything at all to support it...
Sage
Posts: 1,482
Joined: 2002.09
Post: #3
When it comes to performance, Shark and Instruments are your best friends. Shark is Apple's profiler; it will give you performance information down to the function and even the line. Instruments will sample your program in just about every other way.

In my experience, both tools have a lot more bugs and issues when working with the iPhone. So they may also be your worst enemies.

Scott Lembcke - Howling Moon Software
Author of Chipmunk Physics - A fast and simple rigid body physics library in C.
Apprentice
Posts: 5
Joined: 2010.07
Post: #4
(Jul 12, 2010 11:25 AM)OneSadCookie Wrote:  Not answering your question, since I don't think there's enough information here to give an answer. What does Instruments think is slow? Are you using texture atlases? How many draw calls per frame? How large are the textures?

Instru...what? OK thanks, I'll look into that :)

I'm using one 256x256 texture for one of the two 3D objects and one 512x512 texture for the controls. 6 objects in total, one draw call each.

I have a bad habit of re-inventing the wheel when I code, so that might explain why I implemented signals/slots before looking for a ready solution... damn.
(Jul 12, 2010 11:34 AM)Skorche Wrote:  When it comes to performance, Shark and Instruments are your best friends. Shark is Apple's profiler; it will give you performance information down to the function and even the line. Instruments will sample your program in just about every other way.

In my experience, both tools have a lot more bugs and issues when working with the iPhone. So they may also be your worst enemies.

I'll give them a try!
Luminary
Posts: 5,143
Joined: 2002.04
Post: #5
Too many draw calls, and those textures are pretty darn huge unless the objects are absolutely filling the screen...

But, Instruments/Shark.
Sage
Posts: 1,482
Joined: 2002.09
Post: #6
6 is too many draw calls? We had to have done like 100 per frame in Twilight Golf, and that game burned fillrate for lunch. We still could get 40-60 fps on a 1G iPod touch depending on the level.

Basically a draw call for every light * the number of shadow casting entities. Each sprite or particle was a draw call, and each tickmark drawn on the predicted path was a draw call.

Luminary
Posts: 5,143
Joined: 2002.04
Post: #7
Well, Apple's official recommendation (see WWDC vid) is ~10 draw calls per frame.

That's not to say that some apps can't do better, or some apps aren't better with more, or whatever. That's just to say that 6 draw calls for next-to-no content is a lot by that standard.
Apprentice
Posts: 5
Joined: 2010.07
Post: #8
Are we maybe talking about different things? :)

I was referring to glDrawElements / glDrawArrays. Not sure how to keep that strictly below 10 without hacking like crazy...

As I've said, I'm very new to Objective-C and the iPhone SDK, so I might have misunderstood the term "draw call".
Luminary
Posts: 5,143
Joined: 2002.04
Post: #9
We're talking about the same thing.

The Apple example in question only had one moving object, so they had an easy time. Still, you shouldn't need more than (say) 3 draw calls for your static geometry unless you desperately need mipmapping...
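
One way to get static geometry down to a handful of draw calls is to merge meshes into a single vertex/index array ahead of time. The sketch below is only an illustration in plain C. The `Vertex`, `Mesh`, `Batch` and `batch_append` names are made up, and it assumes the merged meshes share one texture (for example via an atlas), have vertices already in world space, and use 16-bit indices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { float x, y, z, u, v; } Vertex;

/* Hypothetical source mesh: read-only vertex and index arrays. */
typedef struct {
    const Vertex *vertices;
    size_t vertex_count;
    const unsigned short *indices;
    size_t index_count;
} Mesh;

/* Hypothetical batch: caller-owned storage that meshes are merged into. */
typedef struct {
    Vertex *vertices;
    size_t vertex_count;
    size_t vertex_capacity;
    unsigned short *indices;
    size_t index_count;
    size_t index_capacity;
} Batch;

/* Appends a mesh to the batch, rebasing its indices so they point at the
   copied vertices. Returns 1 on success, 0 if the mesh does not fit. */
static int batch_append(Batch *b, const Mesh *m) {
    assert(b != NULL && m != NULL);
    if (b->vertex_count + m->vertex_count > b->vertex_capacity) return 0;
    if (b->index_count + m->index_count > b->index_capacity) return 0;
    if (b->vertex_count + m->vertex_count > 65536) return 0; /* 16-bit index limit */

    unsigned short base = (unsigned short)b->vertex_count;
    memcpy(b->vertices + b->vertex_count, m->vertices,
           m->vertex_count * sizeof(Vertex));
    for (size_t i = 0; i < m->index_count; i++)
        b->indices[b->index_count + i] = (unsigned short)(base + m->indices[i]);

    b->vertex_count += m->vertex_count;
    b->index_count += m->index_count;
    return 1;
}
```

The whole batch can then be submitted with a single `glDrawElements(GL_TRIANGLES, ...)` call using `GL_UNSIGNED_SHORT` indices, instead of one call per mesh.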
Member
Posts: 87
Joined: 2006.08
Post: #10
(Jul 12, 2010 07:05 PM)OneSadCookie Wrote:  We're talking about the same thing.

The Apple example in question only had one moving object, so they had an easy time. Still, you shouldn't need more than (say) 3 draw calls for your static geometry unless you desperately need mipmapping...

Either way, sticking to arbitrarily defined rules when you have the capacity to gather actual data is not a productive thing to do.

The draw-call suggestions are really about reducing the amount of CPU time spent in the GL driver. You can run Instruments to determine whether the CPU resources are exhausted, and also to determine the portion of time spent in GL vs. elsewhere. If you can prove that you have CPU time to spare, then optimizing draw call counts isn't the best thing to be doing with your time (or, alternatively, you may prove that you really do need more aggressive batching). Specifically, you're looking for the proportion of CPU time spent beneath glDrawArrays/glDrawElements, vs. spent elsewhere or idle.
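
The arithmetic behind reading such a profile is simple enough to sketch in C. The function names are made up, and the millisecond figures would have to come from your own Instruments measurements; nothing here measures anything by itself.

```c
#include <assert.h>

/* Milliseconds available per frame at a target frame rate. */
static double frame_budget_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Fraction (0..1) of the busy CPU time that was spent in GL calls. */
static double gl_share(double gl_ms, double other_ms) {
    double busy = gl_ms + other_ms;
    return busy > 0.0 ? gl_ms / busy : 0.0;
}

/* Fraction (0..1) of the frame budget the CPU spends idle;
   0 means the CPU is exhausted at this frame rate. */
static double idle_fraction(double gl_ms, double other_ms, double fps) {
    double busy = gl_ms + other_ms;
    double budget = frame_budget_ms(fps);
    return busy >= budget ? 0.0 : (budget - busy) / budget;
}
```

A large idle fraction suggests the CPU is not the limit, so batching is unlikely to help much. A small idle fraction combined with a high GL share points at draw-call overhead.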

Other than that, use the OpenGL ES tool in Instruments to monitor the GPU utilization percentages (Tiler Utilization % and Renderer Utilization %). This will indicate if the GPU itself is potentially a bottleneck.
Apprentice
Posts: 5
Joined: 2010.07
Post: #11
What about Objective-C itself, are there some common pitfalls?
Sage
Posts: 1,482
Joined: 2002.09
Post: #12
Well, you need to count more than just glDrawArrays/glDrawElements. The rendering is queued up and geometry is copied then, but the actual rendering is done when you flip or flush the buffer.

By having so many draw calls in Twilight Golf that generally only draw a few triangles each, we are probably just wasting CPU time on the code that sets up communication with the GPU. Though the game is fillrate bound due to the shadows, and CPU bound from the physics on a few levels. Reducing the shadow quality further let us get a pretty solid 60 fps, but it looked terrible. So while I thought about optimizing the draw calls by batching them together and transforming the sprite vertexes myself, it didn't seem worth it.
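
For anyone wondering what "batching them together and transforming the sprite vertexes myself" could look like, here is a rough sketch in plain C. This is not the Twilight Golf code: `Affine2`, `Sprite` and `append_sprite_quads` are made-up names, texture coordinates are left out for brevity, and 16-bit indices are assumed (so at most 16384 sprites per batch).

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;

/* 2D affine transform: x' = a*x + c*y + tx, y' = b*x + d*y + ty. */
typedef struct { float a, b, c, d, tx, ty; } Affine2;

/* Hypothetical sprite: a transform plus half extents of its quad. */
typedef struct {
    Affine2 m;
    float half_w, half_h;
} Sprite;

/* Transforms every sprite's four corners on the CPU and writes them into
   one shared vertex array (4 per sprite) plus an index array (6 per sprite,
   two triangles). Returns the number of sprites written. */
static size_t append_sprite_quads(const Sprite *sprites, size_t count,
                                  Vec2 *verts, unsigned short *indices,
                                  size_t max_sprites) {
    assert(sprites != NULL && verts != NULL && indices != NULL);
    static const Vec2 corner[4] = {{-1, -1}, {1, -1}, {1, 1}, {-1, 1}};
    size_t n = count < max_sprites ? count : max_sprites;
    for (size_t s = 0; s < n; s++) {
        const Sprite *sp = &sprites[s];
        for (int c = 0; c < 4; c++) {
            float lx = corner[c].x * sp->half_w;
            float ly = corner[c].y * sp->half_h;
            verts[s * 4 + c].x = sp->m.a * lx + sp->m.c * ly + sp->m.tx;
            verts[s * 4 + c].y = sp->m.b * lx + sp->m.d * ly + sp->m.ty;
        }
        unsigned short base = (unsigned short)(s * 4);
        unsigned short *i = &indices[s * 6];
        i[0] = base;
        i[1] = (unsigned short)(base + 1);
        i[2] = (unsigned short)(base + 2);
        i[3] = base;
        i[4] = (unsigned short)(base + 2);
        i[5] = (unsigned short)(base + 3);
    }
    return n;
}
```

All sprites that share a texture can then go out in one draw call with an identity modelview matrix. The trade-off is the one described above: the CPU does the vertex transforms in exchange for fewer trips into the GL driver.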

One thing to watch out for with Objective-C is the cost of calling methods. Method calls are always dynamic, and are always handled by the function objc_msgSend. Good to know when it shows up at the very top of the profile. It's always pure overhead that can be reduced by reducing the number of method calls. Though again, don't waste too much time if you are only going to gain a couple of percent from it.

Apprentice
Posts: 5
Joined: 2010.07
Post: #13
(Jul 13, 2010 06:47 AM)Skorche Wrote:  Reducing the shadow quality further let us get a pretty solid 60 fps, but it looked terrible. So while I thought about optimizing the draw calls by batching them together and transforming the sprite vertexes myself, it didn't seem worth it.

By the way... you did shadow mapping on the PowerVR MBX and got 60 fps? That sounds pretty awesome!
I haven't even dared to implement that (yet) and I thought it would make more sense to make that effect exclusive to the ES 2.0 version.
Moderator
Posts: 452
Joined: 2008.04
Post: #14
(Jul 14, 2010 08:20 AM)Netcob Wrote:  By the way... you did shadow mapping on the powervr mbx and got 60 fps? That sounds pretty awesome!
I haven't even dared to implement that (yet) and I thought it would make more sense to make that effect exclusive to the es 2.0 version.

Yep, and SHADE, the shadow engine that Skorche developed, is available for sale; just contact us from our contracting page.

Sales pitch aside, yeah, it worked pretty well, even on the first-generation devices. For the iPhone 3G and later we add additional resolution to the shadow map, but it looks quite good in either case at >30 fps.

Howling Moon Software - CrayonBall for Mac and iPhone, Contract Game Dev Work
Member
Posts: 166
Joined: 2009.04
Post: #15
(Jul 14, 2010 08:20 AM)Netcob Wrote:  
(Jul 13, 2010 06:47 AM)Skorche Wrote:  Reducing the shadow quality further let us get a pretty solid 60 fps, but it looked terrible. So while I thought about optimizing the draw calls by batching them together and transforming the sprite vertexes myself, it didn't seem worth it.

By the way... you did shadow mapping on the powervr mbx and got 60 fps? That sounds pretty awesome!
I haven't even dared to implement that (yet) and I thought it would make more sense to make that effect exclusive to the es 2.0 version.

At least judging from the video of the game, that's entirely different from what people generally refer to as shadow mapping in 3D.
I think it is geometry-based (calculated on the CPU) and uses pregenerated I8 images (gradients) to generate the shadows.

It looks very cool though.