Most games are 80% general purpose code and 20% FP...

Discussion in 'Console Technology' started by Oda, Jun 29, 2005.

  1. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    If you look at any application, you see that it consists of lots and lots and lots of logic and error checking and correcting, to determine what actions have to be performed. Like a tree, that branches out to some relatively small functions (and collects the required parameters on the way). And then those functions are actually going to do the work that has to be done.

    In current games, it is the same. First you collect the states and inputs, fall very fast through a branch of the logic tree, start crunching away, and repeat until done.

    So, while the layout of the program resembles lots and lots of simple, general purpose (mostly integer) logic, the actual work is done in only a small piece of the code. And that is mostly floating point streams and data structures, for current 3D games.

    Like, when you want to optimize your program, you don't care at all about the efficiency of all that logic. Quite the opposite, actually: slower but more readable logic is much easier to understand and maintain. Instead, you spend your time trimming a few clocks off each iteration of those few core functions.
     
  2. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA

    For the most part that used to be true.

    My recent experience tells me that it's getting less so. The problem is that even as little as 5 years ago FP performance was expensive and dominant. However, as FP performance has increased, memory latencies have increased even more. In most application code today, walking a data structure is MUCH more expensive than working on it.

    Now Cell does have an advantage here, it encourages splitting data into chunks and working on them in local memory, and you could set up X360 with the same paradigm. However this relies on the fact that restructuring your data doesn't add a lot of extra work to the process.

    An example: a simple search algorithm.

    I can write a red/black tree and search in O(log N) time, or I can trivially search every element serially in an array in O(N) time. The latter algorithm is trivially done in stream processing, but at the cost of significantly more memory visits. However, since the memory accesses are serial, you can actually get much better bandwidth to the memory when doing the search. My guess is that on both Cell and X360 the second algorithm would actually be faster for even relatively large (100s and possibly 1000s of entries) datasets. At some point, though, the tree wins because it does less work.

    The problem is that for a lot of programmers the above is counterintuitive, and they will probably never profile both approaches. Running over complex data structures with the X360's processor is probably going to be faster.
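
    A minimal sketch of the two searches, counting how many elements each one touches. Python will not show the cache or bandwidth effects, so treat this purely as an illustration of the access patterns. The balanced binary tree is a stand-in for a red/black tree, and all names here are made up for the example.

```python
from array import array


class Node:
    """One tree node; in a real tree these are scattered around memory."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def build_balanced(sorted_keys, lo=0, hi=None):
    """Build a balanced binary search tree (stand-in for a red/black tree)."""
    if hi is None:
        hi = len(sorted_keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(sorted_keys[mid])
    node.left = build_balanced(sorted_keys, lo, mid)
    node.right = build_balanced(sorted_keys, mid + 1, hi)
    return node


def tree_search(root, key):
    """Pointer chasing: few visits, but each one depends on the previous."""
    visits = 0
    node = root
    while node is not None:
        visits += 1
        if key == node.key:
            return True, visits
        node = node.left if key < node.key else node.right
    return False, visits


def linear_search(keys, key):
    """Serial scan: many visits, but strictly sequential and streamable."""
    visits = 0
    for k in keys:
        visits += 1
        if k == key:
            return True, visits
    return False, visits


# 1000 sorted even integers stored contiguously.
keys = array("i", range(0, 2000, 2))
root = build_balanced(keys)

found_tree, tree_visits = tree_search(root, 1998)
found_scan, scan_visits = linear_search(keys, 1998)
```

    The tree touches about log2(N) nodes and the scan touches up to N elements, which is the "significantly more memory visits" cost. Which one wins on real hardware depends on how expensive each dependent tree visit is compared with a streamed one, so it has to be profiled on the target.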

    Less and less of a game is really "optimised"; it simply can't be. I've worked on a couple of projects recently with >500K lines of code, and no one on the project will see all of it.

    It's trivial to split any game engine into 2 threads (I've done it over a weekend), and it's relatively trivial to make graphics a parallel problem. Going beyond that is a hard problem.
     
  3. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    True. But the problem and the solution are the same in both cases: reduce random memory access as much as possible and use your data stream to steer the logic, instead of doing it the other way around. Which, I agree, is counterintuitive to most programmers.

    But it has to be done for multi-core processors and multi-threaded / stream-centric applications anyway. Random accesses to all data structures at random times are exactly why multi-threaded applications are so hard to get right. You cannot do that, as it will trash your game logic for all threads that depend on it.
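
    One way to read "use your data stream to steer the logic", as a rough sketch: bucket the records by kind first, then run one tight loop per bucket, so the data decides which code runs instead of the logic hopping between objects. The entity kinds, update rules and names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mixed records: (kind, value).
entities = [
    ("particle", 1.0),
    ("enemy", 5.0),
    ("particle", 2.0),
    ("enemy", 7.0),
    ("particle", 3.0),
]


def update_particle_batch(values, dt):
    """Tight loop over one homogeneous stream (made-up rule: add 10 * dt)."""
    return [v + 10.0 * dt for v in values]


def update_enemy_batch(values, dt):
    """Tight loop over another homogeneous stream (made-up rule: subtract 2 * dt)."""
    return [v - 2.0 * dt for v in values]


# The kind of each stream selects the kernel that processes it.
KERNELS = {
    "particle": update_particle_batch,
    "enemy": update_enemy_batch,
}


def bucket_by_kind(records):
    """One sequential pass that sorts mixed records into per-kind streams."""
    streams = defaultdict(list)
    for kind, value in records:
        streams[kind].append(value)
    return streams


def run_streams(streams, dt):
    """Run each kernel once over its whole stream."""
    return {kind: KERNELS[kind](values, dt) for kind, values in streams.items()}


updated = run_streams(bucket_by_kind(entities), 0.5)
```

    Each kernel only ever sees its own contiguous stream, so the streams could be handed to different cores without sharing anything.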
     
  4. Fox5

    Veteran

    Joined:
    Mar 22, 2002
    Messages:
    3,674
    Likes Received:
    5
    Probably part of the reason they went Intel as well; they wanted someone who could supply enough chips and was willing to aggressively price their chips.
    Nvidia also had more high-end chips under their belt than PowerVR and was actively working on a 6 month product cycle; PowerVR's next part may have been a while off.

    I don't even remember hearing PowerVR being considered, though Gigapixel was...
     
  5. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    Or from a hardware standpoint, reduce cache latencies, and put in some mechanism to hide them.

    Both MS and Sony have chosen to remove the mechanisms that hide these latencies in favor of better peak FP performance at a reasonable cost.

    This may or may not be the right call.
     
  6. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    But hiding the latencies does nothing for read/write locks, synchronizing and serialization of your threads and streams. So I think what they did is the best way to go, as hiding the latencies doesn't help you use the other processor cores.
     
  7. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    That's your opinion. Mine is that Sony is just continuing down the design path they started with the EE, and MS feels it has to follow to be competitive.

    I don't think either party is basing its designs on what's best for the software.
     
  8. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    So, what are the specs of the processor that you feel would be best?
     
  9. seismologist

    Regular

    Joined:
    Apr 25, 2005
    Messages:
    659
    Likes Received:
    8
    Isn't the idea behind Cell that you won't have to worry about data locking?
    Only the PPE would be used for memory management and running general purpose code.

    Then you fire off computations to be run in parallel on the SPE array. The local memory for each SPE is only to be used as a sort of scratch pad.
    All of that FP power would allow you to do some pretty advanced physics calculations.
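
    A rough sketch of how I picture that, with Python threads standing in for the SPE array. This only shows the shape of the idea, not real Cell code: the chunk size, the worker count and every name are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_ITEMS = 4   # hypothetical: how many items fit in a worker's local memory
NUM_WORKERS = 3   # hypothetical stand-in for the number of SPEs


def integrate_chunk(chunk, dt):
    """Worker side: copy the chunk in, crunch it, hand the results back."""
    local = list(chunk)  # private copy, playing the role of the scratch pad
    return [(pos + vel * dt, vel) for pos, vel in local]


def split_into_chunks(items, size):
    """Controller side: cut the data into non-overlapping pieces."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def farm_out(items, dt):
    """Dispatch the chunks and gather the results in the original order."""
    chunks = split_into_chunks(items, CHUNK_ITEMS)
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        results = pool.map(integrate_chunk, chunks, [dt] * len(chunks))
        return [item for chunk in results for item in chunk]


bodies = [(float(i), 2.0) for i in range(10)]  # (position, velocity) pairs
stepped = farm_out(bodies, 0.5)
```

    No locks are needed here because no two workers ever touch the same chunk, and only the controller reads or writes the full list.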
     
  10. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    Exactly.
     
  11. seismologist

    Regular

    Joined:
    Apr 25, 2005
    Messages:
    659
    Likes Received:
    8
    That makes sense then. So in your game code example you would traverse the list, grabbing all of the data, then burst it out to the SPE array? Sounds like it should work well.

    Xenon is a different story though. I'm still having a hard time seeing the benefit there.
     
  12. jboldiga

    Newcomer

    Joined:
    Jun 11, 2005
    Messages:
    5
    Likes Received:
    0
    Apple was pissed with IBM because they wanted faster clock speeds and IBM said they couldn't... then turned around and released a 3.2 GHz PPC to Sony and M$. IBM simply didn't make enough money from Apple to be concerned with them. It is also true that IBM could not fab enough.

    As for the general purpose code thing from M$... heh, it's true, yes, that general purpose code is important for games. The problem is M$ isn't any better at it than Sony...
     
  13. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    This is one (misguided IMO) view of the SPEs. Everybody looks at the FLOPS too much. Just for a second ignore the float units, and look at Cell again. Cell has 8 processors, each one about 10 times faster than the PS2 main core. That's a lot of power!
    Of course it's nice that there are also 9 float SIMD units at 3.2 GHz, for when you want to burn some FLOPs, but you're not forced to use them.

    Actually getting good performance out of the architecture is a separate issue.
     
  14. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    It'll be nice when the first real snippets of code and algorithms for Cell appear, so people can get a handle on what it can and can't do. This idea that the SPEs are glorified FPUs is pervasive, but what they can and can't do is relatively unknown, at least to the masses. What sort of apps can an SPE run effectively? A typical Java or Macromedia Flash based web-game perhaps? A Spectrum emulator? Word or Excel? Or only FFTs and vertex transforms, i.e. data crunching processes to feed the PPE?
     
  15. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    Good point. But that would be about the use for physics and the like, and what kind of tasks you want them to do in general, if I get what you mean. It doesn't change what would probably be the best strategy to get the most out of them, whatever it is you want them to do. And I agree, it is a lot of power.

    But I think that the current single-thread paradigm calls for an extension, not a radically different way to program. And I think it might be easier to use the stream / micro-thread view than to try and break a current game loop into multiple independent threads.

    While the latter might seem much easier to do at first, it doesn't change or solve anything. It mostly complicates matters. Using the PPU for management and game logic, and spawning streams to other units that are available, would probably be easier and "feel" much more like the way things work now, with the added benefit of a pool of speedy processors at your disposal that can be as custom as you want them to be.
     
  16. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    FLOPS are virtually free on both CELL and XeCPU. To the point where performance is going to be determined by other factors, like control logic complexity, memory latency etc.

    It'll be interesting to see which is better: the traditional load/store architecture, caches and all, from M$, or the radically different Sony system that HAS to be programmed with coarser memory transactions.

    Cheers
    Gubbi
     
  17. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    ERP/DeanoC - just curious - have you guys experimented with van Emde Boas layout, or min-max cost layout tailored to cache-line/page size?
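
    For anyone who hasn't run into the first of those, a minimal sketch of the ordering for a complete binary tree. Nodes are named by their breadth-first index (root = 1, children 2i and 2i+1). Splitting the height with the smaller half on top is just one common convention, assumed here.

```python
def veb_order(root, height):
    """Return the BFS indices of a complete binary tree in van Emde Boas order.

    The tree of the given height is cut into a top subtree and a row of
    bottom subtrees; the top is laid out first, then each bottom subtree
    left to right, applying the same rule recursively.
    """
    if height == 1:
        return [root]
    top_h = height // 2          # assumed convention: smaller half on top
    bottom_h = height - top_h
    order = veb_order(root, top_h)
    # Bottom subtree roots are the descendants of `root` top_h levels down.
    first = root << top_h
    for r in range(first, first + (1 << top_h)):
        order.extend(veb_order(r, bottom_h))
    return order


layout = veb_order(1, 4)
# layout[k] is the BFS index of the node stored at position k in memory.
position = {node: k for k, node in enumerate(layout)}
```

    For a height-4 tree this gives 1, 2, 3, 4, 8, 9, 5, 10, 11, ... so every small subtree sits in adjacent slots, and a root-to-leaf walk tends to land in fewer cache lines than the plain breadth-first layout.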
     
  18. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    Think about this: there are very many successful stream architectures in use (most at the API level) at the moment, while symmetric multiple thread architectures still suffer from a lot of serious problems.

    But I'm really curious what paradigm will take off as well. This is an interesting time for that. :D
     
  19. seismologist

    Regular

    Joined:
    Apr 25, 2005
    Messages:
    659
    Likes Received:
    8
    Is this really true? There must be a point where computation becomes a bottleneck. Sure, it holds for modern-day games where 80% of the interactions are scripted (i.e. general purpose code).
    But how about when you're running a real time physics simulation of say a plane crashing through a building. Things might start to get bogged down a bit.
     
  20. archie4oz

    archie4oz ea_spouse is H4WT!
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,608
    Likes Received:
    30
    Location:
    53:4F:4E:59
    I have... And I can think of a whole bunch of G4 programmers who have too :p The only difference is what drove them there... ;)
     