Postmortem: MotoGP '06

pipo

Veteran
http://gamasutra.com/features/20060808/motogp_01.shtml

It was in early January 2005 that we received our first Xbox 360 development kit from Microsoft and were tasked with moving the MotoGP series onto the next generation of hardware. Over the previous 5 years we’d developed three versions of the game on Xbox and PC, but this was the first time the game was going to receive the radical overhaul needed when jumping a generation.

Our Core Technology Group had been writing our tools in preparation for the next generation of consoles for up to two years, but most of the game team were coming from PS2 and Xbox projects with little idea of what to expect.

Next-gen buzzwords were everywhere: normal maps, HDTV and HDR were all new and all being touted as the next big thing.

...

All the MotoGP games by Climax have to run at 60fps. It’s a completely non-negotiable part of the project. And so when, in January 2005, we sat down and hammered out the feature set for MotoGP’06 with THQ, the requirement of 60fps was writ large.

Getting launch or near-launch titles to run at 60fps is always going to be challenging. You start development on hardware (or sometimes an emulator) that bears scant resemblance to the final product – final hardware usually only shows up very late in the cycle, and even then you’re lucky if you get more than a handful of kits.

In the beginning we planned fastidiously to hit the magical 60 mark, but at that stage we had little idea of the final hardware, so our only option was to make all our assets scalable. We’re lucky that all our tools are built around modelling higher-order surfaces, so at the touch of a button we can change a bike from 1,000,000 polygons to 1,000 polygons. The same goes for the environment. And our exporters will scale the textures or vegetation to whatever limits we desire. Basically, we thought we had all angles covered.

Now, I’m a console programmer as are most of my colleagues. It’s been 10 years since I released a PC game. This lack of PC experience led us to overlook something that would have been obvious to a PC coder. The single biggest performance drain on MotoGP’06 wasn’t the number of vertices or textures but the number of draw calls.

In November 2005 our game was running at 12fps, with the render loop taking nearly 2 frames.

:LOL:

Sure puts the other launch games in some perspective.

It's a pretty good read. They have some nice technology over there. Their modelling software looks clever too...
 
Thanks for the link! I loved the article. It's very interesting to hear someone talk about a project with such openness and detail.
 
an awesome article, especially for someone with relatively little 3D graphics programming knowledge. it never really went over my head. I'd loooove to get a closer look at that tomcat modeler. :9~
 
Even assuming a perfect data read rate, it would take about 32 seconds to fill 512 MB of memory. In practice you need to factor in seek times as the game loads different files, and so MotoGP’06 takes about 40 seconds to load a level. And 40 seconds is a long time.

There are many different ways of speeding up load times, from fully streamable worlds to keeping as much data as possible resident in memory. The old MotoGPs employed none of these techniques. They’d never needed to. They could fill the Xbox 1’s memory in 12 seconds and, better than that, they could dump all that data to its internal hard drive so that next time round it loaded 10x faster.

On the 360 we were aware of our shortcomings but the engineering effort required to rectify them was so huge, and the launch window so close, that they never got addressed.
40 secs!!! Is it possible for xb360 users with HDs to have the game read levels from the hard disk?
 
This lack of PC experience led us to overlook something that would have been obvious to a PC coder. The single biggest performance drain on MotoGP’06 wasn’t the number of vertices or textures but the number of draw calls.

In November 2005 our game was running at 12fps, with the render loop taking nearly 2 frames.

That month news came through from Microsoft that the changes to the Xbox360 SDK that would allow us to circumvent these draw commands wouldn’t be ready in time for our launch. We were in very serious trouble indeed.
A while ago, I asked this :
Since the XBox OS is a derivative of Windows NT/2000/XP, will its development follow alongside the desktop OSes and get some of the feature/performance enhancements of Vista? Specifically, can we expect it to, or has it already, dropped legacy GDI and added the WGF framework and its primary enhancements to reduce draw call overhead, add texture array support, support predicated rendering, reduce state change impact, etc.? I'm not familiar with console development, and not sure to what extent the OS and its core APIs might change over a console's lifespan.
To which ERP responded no, but according to Climax, it sounds like the 360 is getting some performance enhancements similar to DX10.
 
I think ERP's answer is correct. The dev kit will be enhanced anyway, and since its drivers really don't have any meaningful relationship to Vista (as far as I can tell) it's a no. I don't think it 'follows' the desktop OS at all.

The impression I got is it's more or less written from the ground up anyway. More on that here: http://www.beyond3d.com/forum/showthread.php?t=28327
 
Over the previous 5 years we’d developed three versions of the game on Xbox and PC but this was the first time the game was going to receive the radical overhaul needed when jumping a generation.

I must have missed that "radical overhaul" because it sure feels like a straight port of MotoGP 3 with only slight graphics upgrades.
 
I must have missed that "radical overhaul" because it sure feels like a straight port of MotoGP 3 with only slight graphics upgrades.

They state explicitly that the team left the gameplay relatively untouched, and focused primarily on graphics. So the major overhaul was the graphical one.
 
A while ago, I asked this :

To which ERP responded no, but according to Climax, it sounds like the 360 is getting some performance enhancements similar to DX10.



The answer to your original question is no.

What Climax is referring to is the evolution of the Xbox API; even at launch a lot of features were not exposed. MS had to prioritise features and get them into developers' hands in some stable fashion.

When you make a draw call it basically does evaluation of the lazy state and copies command packets into a FIFO; unfortunately, "next gen" consoles are probably only about 2x as fast as, say, an Xbox at doing this. So if you make a LOT of draw calls they become dominant. The usual way to fix this is to batch up multiple draw calls into a single call by building the display list and just pointing at it. Support for this went in very late in the API (months after the 360 shipped), probably because there are issues getting it to work with things like automatic tiling and the FX libraries.

The issue in DX9 drivers on PCs is more insidious: a DX9 driver builds a display list in a platform-independent format, and the kernel driver then translates that and does a second copy into the hardware-specific format. As far as I understand it, Vista drivers remove the translation step.
 
Support for this went in very late in the API (months after the 360 shipped), probably because there are issues getting it to work with things like automatic tiling and the FX libraries.

The issue in DX9 drivers on PCs is more insidious: a DX9 driver builds a display list in a platform-independent format, and the kernel driver then translates that and does a second copy into the hardware-specific format. As far as I understand it, Vista drivers remove the translation step.

I'm probably just not reading this right, but does the developer have to make use of multiple draw calls or is that something that is now done automatically?
 
I'm probably just not reading this right, but does the developer have to make use of multiple draw calls or is that something that is now done automatically?

Don't understand the question, but I'll take a shot at the answer.

Imagine you are drawing an object that contains two different materials: for each material I have to set the state and then make a DrawPrim call, so I need 2 draw calls. What a precompiled display list does is let me write the set of hardware instructions that would be copied into the push buffer into a piece of memory. I can then insert a "call" command into the pushbuffer to execute that set of instructions, instead of having to copy all the instructions into the pushbuffer. This greatly reduces the CPU overhead.

Anytime you change state while rendering you will need to make another draw call, and a common mistake is to not batch large enough pieces of geometry to reduce overhead to the point it's acceptable.

Let's say I was drawing a cafe: you might be tempted to use the same model for all of the chairs outside, since the only difference is the transform matrix. However, that would require lots of draw calls (one per chair); having a single model with all of the chairs in it would be much more efficient, at the cost of the extra memory.

If you're talking about the DX9 statement, it's all just an implementation detail; applications know nothing about it other than that DrawPrim is very expensive.
 
Let's say I was drawing a cafe: you might be tempted to use the same model for all of the chairs outside, since the only difference is the transform matrix. However, that would require lots of draw calls (one per chair); having a single model with all of the chairs in it would be much more efficient, at the cost of the extra memory.

If you're talking about the DX9 statement, it's all just an implementation detail; applications know nothing about it other than that DrawPrim is very expensive.
When you say it's expensive, how so? What is the exact cost? Idle GPU while the objects to render are set up? In what way are draw calls balanced, so that if a draw call is a cost of 1,000 cycles, you can determine how many you want per frame and balance your rendering to fit?
 
Don't understand the question, but I'll take a shot at the answer.

Thanks, I think I understand now. :)

Sorry, I seem to be having trouble with making clear questions these days. I was just curious about what you meant by support for "batching multiple draw calls being exposed later in the 360 API" and how that applies to the development side of things in terms of explicit implementation or not - whether it would be something that already released games could make use of without further developer action.
 
When you say it's expensive, how so? What is the exact cost? Idle GPU while the objects to render are set up? In what way are draw calls balanced, so that if a draw call is a cost of 1,000 cycles, you can determine how many you want per frame and balance your rendering to fit?

It's expensive from a CPU standpoint; it's extremely easy to become CPU bound, leaving the GPU idle.
Generally you massage the data to balance the load. I guess you could artificially limit the number of draw calls based on some metric, but I really don't do a lot of PC work, so I'm not in a position to answer there.

On our Xbox 1 racing title we used to aim for <200 draw calls a frame, but Xbox 1 has somewhat different issues. We'd often see 10,000-plus draw calls a frame taking the artist-generated data straight out of Maya; then we'd do a pass to group objects. Doing this pass on the data would raise the frame rate from single digits to 60fps, because we'd no longer be CPU limited.

I haven't honestly spent a lot of time benchmarking X360 stuff, so I can't give a reasonable batch number. My guess would be that your best bet is to use precompiled lists organised spatially, to provide the best chance of early cull in the tiling predication. Personally I'd just measure with whatever data is being used.
 