AnandTech article up (X360 vs PS3)

jvd said:
JVD: there is only one PPU.
But the board configurations will be different, just like video cards.

We may get a budget one with 64 megs, one with 128, and one with 256 megs. As I said, it's hard to know which one.

Cell has 256 megs, but you probably won't put all of that toward physics, and the OS has its reserved share.
 
Cell has 256 megs, but you probably won't put all of that toward physics, and the OS has its reserved share.

I don't think so. Unless you believe the RSX is only going to use 256 megs for textures and framebuffers and whatnot.
 
Fafalada said:
As far as branching alone goes, SPE could potentially outperform a PPE in situations like tree-traversal - so long as the data structures fit inside local store that is.

Even if they don't, tree search is inherently, embarrassingly parallel. Hell, even alpha-beta game tree searches have been parallelized, e.g. Deep Blue's chess search. You could pipeline subtree loads serially as needed, or load them on multiple SPEs. Also, when performing searches in a geometry database, since branches are usually NOT taken (the comparison fails much more often than it succeeds), you can pretty much achieve maximum performance by letting the CPU assume that a comparison fails and start executing instructions to recurse down the tree.
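
To make that concrete, here's a toy sketch of the "assume the comparison fails" layout; purely illustrative, not anyone's real traversal code:

struct Node {
    int   key;
    Node* left;
    Node* right;
};

// The "is this the node we want?" test fails at every level except the last,
// so with the descent on the fall-through path, an in-order core that
// statically assumes "not taken" guesses right almost every time.
const Node* Find(const Node* n, int key)
{
    while (n)
    {
        if (n->key == key)   // rarely true, so rarely taken
            return n;        // the out-of-line "hit" path

        // Common case falls straight through: keep walking down the tree.
        n = (key < n->key) ? n->left : n->right;
    }
    return 0;                // not found
}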

In fact, I think the XB360's problem might turn out to be its small caches, as a cache miss is going to be very expensive, and it lacks the control over cache algorithms that the SPE developer has available, since coordinating the cache behavior of 6 concurrent threads sharing one cache is going to be more difficult for the compiler.
 
Unless you believe the rsx is only going to use 256 megs for textures and framebuffers and what not

64 megs of frame buffer for post-processing, the rest for textures, yes, that's OK. Stream for more.

I just said 256 as a max (though perhaps it could use some of the RSX memory, I don't know), but your game will probably want only 4 megs, or 128 megs, for physics alone. Well, I guess developers have the freedom to balance that.

I have no problem if someone wants to split the whole system memory into 80% physics, 5% textures and 15% base code, since (IMHO) it doesn't forbid fun and entertainment, which is the whole point.
 
DemoCoder said:
In fact, I think the XB360's problem might turn out to be its small caches, as a cache miss is going to be very expensive, and it lacks the control over cache algorithms that the SPE developer has available, since coordinating the cache behavior of 6 concurrent threads sharing one cache is going to be more difficult for the compiler.
What cache control systems has the XeCPU got relative to Cell? Devs are recommended to manage the cache to get the best performance, are they not, so there must be some level of control and prefetching(?)
 
Cache is one of Xenon's strengths... the SPE cache situation is one of Cell's weaknesses (unless you are into "kinky" VU-style programming like some of these guys who don't know any better).

Generally it's a divergence continuing from last generation - PS3 is even harder to code from a pure C++ angle, Xbox 360 is even easier. PS3 will again be only utilized at first by Japanese companies who throw tons of bodies on the graphic engine, Xbox 360 will look pretty darn good out of the gate, even if you're a 10 person team working on the graphics and engine.

The word on the street is that you will need twice the programmer staff to work on a PS3 title compared to your Xbox 360 SKU. If you don't believe me go look at all these companies scrambling to hire up for next gen.

Which companies will bite the bullet and say "Screw that, we can profit more easily from Xbox only"?
 
Programmers don't scale that way; that's the mythical man-month. Adding more coders to a project doesn't make it go faster, and it doesn't make more technically complex code any easier.
 
Embedded Sea said:
Cache is one of Xenon's strengths... the SPE cache situation is one of Cell's weaknesses (unless you are into "kinky" VU-style programming like some of these guys who don't know any better).

Generally it's a divergence continuing from last generation - PS3 is even harder to code from a pure C++ angle, Xbox 360 is even easier. PS3 will again be only utilized at first by Japanese companies who throw tons of bodies on the graphic engine, Xbox 360 will look pretty darn good out of the gate, even if you're a 10 person team working on the graphics and engine.


http://arstechnica.com/articles/paedia/cpu/xbox360-2.ars/3?56749


Aren't the Xbox 360's cores also 2-3 issue, in-order processors that will suffer from the same situation the PS3 is facing?

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2379&p=6

According to AnandTech, automated caching is actually a very tricky deal for in-order cores. A cache miss will stall processing because the instructions need to be executed in order. I'm assuming that in the Xbox 360's case the programmer will not be able to manage the cache as easily. All three cores, two threads each, will be hitting the same cache. Is there a way to ensure that thrashing and cache misses are kept to a minimum?
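
The one thing I do know of is explicit software prefetching - GCC's __builtin_prefetch, which on PowerPC should boil down to a dcbt "cache block touch" instruction (whether the 360 toolchain exposes the same thing, I don't know). Rough sketch, types made up:

struct Particle { float x, y, z; };     // placeholder type
void Transform(Particle& p);            // placeholder for the per-element work

void TransformAll(Particle* particles, int count)
{
    for (int i = 0; i < count; ++i)
    {
        // Touch data a few iterations ahead so it's (hopefully) already in
        // cache by the time the in-order pipe actually needs it.
        if (i + 8 < count)
            __builtin_prefetch(&particles[i + 8]);

        Transform(particles[i]);
    }
}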

http://research.scea.com/research/html/CellGDC05/15.html
http://research.scea.com/research/html/CellGDC05/16.html

In the PS3's case the control is in the hands of the programmer. Each processor has its own local memory and two pipes. I see a really simple and fairly efficient way to use the PS3's situation: split your local memory in half, 128K for current execution and 128K for prefetching data from main memory. (In real life it would more likely be 100K for execution, 100K for prefetching, and 56K for control flow.) This would mask the 500-cycle "cache miss" that would otherwise cripple the processor while it waits for data.
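
Something like this, very roughly (dma_get/dma_wait are made-up stand-ins for the real DMA interface, not actual Cell SDK calls):

#include <stdint.h>
#include <utility>

// Stand-ins, NOT the real SDK API -- just placeholders for the sketch.
void dma_get(void* local, uint64_t ea, int size, int tag);  // start a transfer
void dma_wait(int tag);                                     // block until it completes
void Crunch(char* data, int size);                          // whatever the SPE job is

const int kChunk = 16 * 1024;            // example chunk size
static char bufA[kChunk], bufB[kChunk];  // the two halves of local store

void ProcessStream(uint64_t ea, int chunks)
{
    char* work  = bufA;                  // half being processed
    char* fetch = bufB;                  // half being filled by DMA

    dma_get(work, ea, kChunk, 0);        // prime the first chunk
    dma_wait(0);

    for (int i = 0; i < chunks; ++i)
    {
        // Kick off the transfer for the NEXT chunk before touching this one,
        // so the DMA overlaps with the computation instead of stalling.
        if (i + 1 < chunks)
            dma_get(fetch, ea + uint64_t(i + 1) * kChunk, kChunk, 1);

        Crunch(work, kChunk);            // work on the current half

        if (i + 1 < chunks)
            dma_wait(1);                 // make sure the next chunk has landed
        std::swap(work, fetch);          // ping-pong the halves
    }
}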

AFAIK, because the 360 is also in-order, this problem is going to exist there too and will require intervention on the part of the programmer. The 360 also has 128 128-bit VMX/AltiVec registers, which need custom coding to take advantage of.
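
For what that custom coding looks like in practice, here's plain AltiVec intrinsic code (the 360's VMX128 extensions reportedly add more on top, which I haven't touched):

#include <altivec.h>

// dst[i] += a[i] * b[i], four floats at a time.
// Assumes count is a multiple of 4 and all pointers are 16-byte aligned.
void MulAdd(float* dst, const float* a, const float* b, int count)
{
    for (int i = 0; i < count; i += 4)
    {
        vector float va = vec_ld(0, a + i);
        vector float vb = vec_ld(0, b + i);
        vector float vd = vec_ld(0, dst + i);
        vec_st(vec_madd(va, vb, vd), 0, dst + i);   // fused multiply-add
    }
}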

I don't think the 360 is going to be simple to code for. The single cache serving three cores might be a nightmare, or it could be a walk in the park, I don't know. I would think that each core needs its own cache.

Embedded Sea said:
The word on the street is that you will need twice the programmer staff to work on a PS3 title compared to your Xbox 360 SKU. If you don't believe me go look at all these companies scrambling to hire up for next gen.

If you need twice the programming staff (going from 50 to 100 instead of from 2 to 4) then you are doing it wrong. What you need is 3-6 low-level gurus who are good at working together and can code up an abstraction layer or an intermediate language that everyone else codes to. If you have 50 people all working on the hardware, you're just begging for an internal meltdown. Step 1) Solidify an abstraction layer. Step 2) 3-6 guys reduce the abstraction layer to machine code, working around all the quirks of the hardware and optimizing the code wherever possible. Step 3) 10-50 guys write the rest of the application on top of the abstraction layer.
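
To be concrete about the shape of that abstraction layer (names invented, just an illustration):

struct Mesh;    // placeholder types -- whatever your engine actually uses
struct Matrix;
struct Light;

// The interface the 10-50 application programmers code against.
class Renderer
{
public:
    virtual ~Renderer() {}
    virtual void SubmitMesh(const Mesh& mesh, const Matrix& world) = 0;
    virtual void SetLight(int slot, const Light& light) = 0;
    virtual void Present() = 0;
};

// The 3-6 low-level gurus each own one of these and bury the hardware
// quirks and per-platform optimization inside it.
class RendererPS3  : public Renderer { /* Cell/RSX-specific guts */ };
class RendererX360 : public Renderer { /* Xenon/Xenos-specific guts */ };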

I am not a game programmer, so maybe I'm wrong. If I am, let me know and explain why so I can adjust my views accordingly. :)

This really should be a case of divide and conquer. The abstraction layer in the middle is where the problem can be split in half. Video game programming can still be about making games and getting good hardware performance. Investing in a good abstraction layer up front will smooth things out greatly. (Of course, the downside is that you have to have good intuition and foresight as to what the abstraction layer should look like.)

AR
 
Fafalada said:
certainly better than branch hinting.
Depends on who is doing the hinting. :p

The problem with dynamic hinting in tree traversal is that, as far as I can see, you have to do the same amount of work to compute the hint as you do to compute the branch, so I'm not sure how it saves you in this case.

Of course it's also true that the branch penalty is significantly less than a cache miss. But IME tree searches in applications tend to have good temporal coherency frame to frame and the L2 cache bails you out.
 
Even if they don't, tree search is inherently, embarrassingly parallel. Hell, even alpha-beta game tree searches have been parallelized, e.g. Deep Blue's chess search. You could pipeline subtree loads serially as needed, or load them on multiple SPEs. Also, when performing searches in a geometry database, since branches are usually NOT taken (the comparison fails much more often than it succeeds), you can pretty much achieve maximum performance by letting the CPU assume that a comparison fails and start executing instructions to recurse down the tree.

In fact, I think the XB360's problem might turn out to be its small caches, as a cache miss is going to be very expensive, and it lacks the control over cache algorithms that the SPE developer has available, since coordinating the cache behavior of 6 concurrent threads sharing one cache is going to be more difficult for the compiler.

I agree for the most part, although I'm not as pessimistic about the cache behavior of the 360 as you are. PowerPC implementations have been getting progressively richer in the cache manipulation area, and in the 360's case you have a lot more fine-grained control over the caches. Granted, while having gobs of register space helps, the L2 latencies aren't exactly stellar, and with an execution unit that can swallow 96-128 bytes a cycle, hiding ~500ish clocks to main memory requires a healthy amount of cache to lock down (at least 64KB; at ~128 bytes a cycle, 500 clocks works out to roughly 64KB of data in flight), but it's certainly doable... In a way it'll be a lot like optimizing on a high-clock 744x, only instead of working around FSB constraints, it'll be memory latency instead (although given my choice I'd rather work around the former than the latter)...
 
Embedded Sea said:
Generally it's a divergence continuing from last generation - PS3 is even harder to code from a pure C++ angle, Xbox 360 is even easier.

I got lost here. Xbox 360 is even easier... compared to what? C++ programming? :?:
 
ERP said:
The problem with dynamic hinting in tree traversal is that, as far as I can see, you have to do the same amount of work to compute the hint as you do to compute the branch, so I'm not sure how it saves you in this case.
Depends on the type (and amount) of work we're doing at each node I guess.

and the L2 cache bails you out.
Even when the L2 latency is more than 2x that of a branch mispredict? :p
I think we already know from experience what effect latencies of roughly this size have on an in-order CPU if you don't take special care with the code. And that was with much shorter pipelines to boot...
 
ERP said:
Last I looked Havok was using a KdTree.


But yes, BSP trees have their place.
You may well be right; that's the whole point of middleware, I can forget the implementation details ;-)
 
Isn't this whole 'PS3 will require more man hours and expenses to program for' the same bunk that we heard before this gen? It's like deja vu all over again -- I feel like I'm continually reading the same stuff I did 6 years ago.

Everyone cried out that the PS2's VUs would wreak havoc on devs, and that all the small studios would migrate to the other systems. As we all know, this didn't happen. Hell, I can give plenty of examples of titles made for the PS2 by extremely small teams.

Drakengard was developed by a 17 person team from Cavia. And one of the first visually stunning titles for the PS2, The Bouncer, was made by only 26 guys from Dream Factory.

And correct me if I'm wrong, but aren't the SPEs in Cell sort of similar to the VUs in the PS2? If so, that means the devs going into this next gen will have more experience than last time. Wasn't the PS2 only really codeable in assembly, whereas the PS3 can do C and C++? Wouldn't these factors make initial development for the PS3 easier than it was for the PS2?

One of the main weaknesses of the PS2 was the difficulty in porting PC titles, as the publishers wanted to keep ports as cheap as possible, but PS2 would need a rather significant code rewrite (is this correct?). nVidia's inclusion of OpenGL and the simple fact that the PS3 can handily run UE3 should drastically change this, no?

So then, why should any of us believe that these dev horror stories will come true, if they didn't last gen?
 
DeanoC said:
ERP said:
Last I looked Havok was using a KdTree.


But yes, BSP trees have their place.
You may well be right; that's the whole point of middleware, I can forget the implementation details ;-)

Yep you can forget them right up to the point where they screw you.... :/
 