AnandTech article up (X360 vs PS3)

jvd said:
JVD: there is only one PPU.
But the board configurations will be different, just like video cards.

We may get a budget one with 64 megs, one with 128, and one with 256 megs. As I said, it's hard to know which one.

Cell has 256 megs, but you probably won't put all of that toward physics, and the OS has its reserved share.
 
Cell has 256 megs, but you probably won't put all of that toward physics, and the OS has its reserved share.

I don't think so. Unless you believe the RSX is only going to use 256 megs for textures and framebuffers and whatnot.
 
Fafalada said:
As far as branching alone goes, SPE could potentially outperform a PPE in situations like tree-traversal - so long as the data structures fit inside local store that is.

Even if they don't, tree search is inherently, embarrassingly parallel. Hell, even alpha-beta game tree searches have been parallelized, e.g. Deep Blue's chess search. You could pipeline subtree loads serially as needed, or load them on multiple SPEs. Also, when performing searches in a geometry database, since branches are usually NOT taken (the comparison fails much more often than it succeeds), you can pretty much achieve maximum performance by letting the CPU assume that a comparison fails and start executing instructions to recurse down the tree.
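
To make that concrete, here's a toy sketch of the "assume the comparison fails" layout; purely illustrative, not anyone's real traversal code:

struct Node {
    int   key;
    Node* left;
    Node* right;
};

// The "is this the node we want?" test fails at every level except the last,
// so with the descent on the fall-through path, an in-order core that
// statically assumes "not taken" guesses right almost every time.
const Node* Find(const Node* n, int key)
{
    while (n)
    {
        if (n->key == key)   // rarely true, so rarely taken
            return n;        // the out-of-line "hit" path

        // Common case falls straight through: keep walking down the tree.
        n = (key < n->key) ? n->left : n->right;
    }
    return 0;                // not found
}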

In fact, I think the XB360's problem might turn out to be its small caches, as a cache miss is going to be very expensive, and it lacks the control over cache algorithms that the SPE developer has available, since coordinating the cache behavior of 6 concurrent threads sharing one cache is going to be more difficult for the compiler.
 
Unless you believe the rsx is only going to use 256 megs for textures and framebuffers and what not

64 megs of frame buffer for post-processing, the rest for textures, yes, that's OK. Stream for more.

I just said 256 as a max (though perhaps it could use some of the RSX memory, I don't know), but your game will probably want only 4 megs, or 128 megs, for physics alone. Well, I guess developers have the freedom to balance that.

I have no problem if someone wants to split the whole system memory into 80% physics, 5% textures and 15% base code, since (IMHO) it doesn't forbid fun and entertainment, which is the whole point.
 
DemoCoder said:
In fact, I think the XB360's problem might turn out to be its small caches, as a cache miss is going to be very expensive, and it lacks the control over cache algorithms that the SPE developer has available, since coordinating the cache behavior of 6 concurrent threads sharing one cache is going to be more difficult for the compiler.
What cache control systems has the XeCPU got relative to Cell? Devs are recommended to manage the cache to get the best performance, are they not, so there must be some level of control and prefetching(?)
 
Cache is one of Xenon's strengths... the SPE cache situation is one of Cell's weaknesses (unless you are into "kinky" VU-style programming like some of these guys who don't know any better).

Generally it's a divergence continuing from last generation - PS3 is even harder to code from a pure C++ angle, Xbox 360 is even easier. PS3 will again be only utilized at first by Japanese companies who throw tons of bodies on the graphic engine, Xbox 360 will look pretty darn good out of the gate, even if you're a 10 person team working on the graphics and engine.

The word on the street is that you will need twice the programmer staff to work on a PS3 title compared to your Xbox 360 SKU. If you don't believe me go look at all these companies scrambling to hire up for next gen.

Which companies will bite the bullet and say "Screw that, we can profit more easily from Xbox only"?
 
Programmers don't scale that way; that's the mythical man-month. Adding more coders to a project doesn't make it go faster, and it doesn't make more technically complex code any easier.
 
Embedded Sea said:
Cache is one of Xenon's strengths... the SPE cache situation is one of Cell's weaknesses (unless you are into "kinky" VU-style programming like some of these guys who don't know any better).

Generally it's a divergence continuing from last generation - PS3 is even harder to code from a pure C++ angle, Xbox 360 is even easier. PS3 will again be only utilized at first by Japanese companies who throw tons of bodies on the graphic engine, Xbox 360 will look pretty darn good out of the gate, even if you're a 10 person team working on the graphics and engine.


http://arstechnica.com/articles/paedia/cpu/xbox360-2.ars/3?56749


Aren't the Xbox 360's cores also 2-3 issue, in-order processors that will suffer from the same situation the PS3 is facing?

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2379&p=6

According to AnandTech, automated caching is actually a very tricky deal for in-order cores. A cache miss will stall processing because the instructions need to be executed in order. I'm assuming that in the Xbox 360's case the programmer will not be able to manage the cache as easily. All three cores, two threads each, will be hitting the same cache. Is there a way to ensure that thrashing and cache misses are kept to a minimum?
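
The one thing I do know of is explicit software prefetching - GCC's __builtin_prefetch, which on PowerPC should boil down to a dcbt "cache block touch" instruction (whether the 360 toolchain exposes the same thing, I don't know). Rough sketch, types made up:

struct Particle { float x, y, z; };     // placeholder type
void Transform(Particle& p);            // placeholder for the per-element work

void TransformAll(Particle* particles, int count)
{
    for (int i = 0; i < count; ++i)
    {
        // Touch data a few iterations ahead so it's (hopefully) already in
        // cache by the time the in-order pipe actually needs it.
        if (i + 8 < count)
            __builtin_prefetch(&particles[i + 8]);

        Transform(particles[i]);
    }
}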

http://research.scea.com/research/html/CellGDC05/15.html
http://research.scea.com/research/html/CellGDC05/16.html

In the PS3's case the control is in the hands of the programmer. Each processor has its own local memory and two pipes. I see a really simple and fairly efficient way to use the PS3's situation: split your local memory in half, 128K for current execution and 128K for prefetching data from main memory. (In real life it would more likely be 100K for execution, 100K for prefetching, and 56K for control flow.) This would mask the 500-cycle "cache miss" that would otherwise cripple the processor while it waits for data.
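
Something like this, very roughly (dma_get/dma_wait are made-up stand-ins for the real DMA interface, not actual Cell SDK calls):

#include <stdint.h>
#include <utility>

// Stand-ins, NOT the real SDK API -- just placeholders for the sketch.
void dma_get(void* local, uint64_t ea, int size, int tag);  // start a transfer
void dma_wait(int tag);                                     // block until it completes
void Crunch(char* data, int size);                          // whatever the SPE job is

const int kChunk = 16 * 1024;            // example chunk size
static char bufA[kChunk], bufB[kChunk];  // the two halves of local store

void ProcessStream(uint64_t ea, int chunks)
{
    char* work  = bufA;                  // half being processed
    char* fetch = bufB;                  // half being filled by DMA

    dma_get(work, ea, kChunk, 0);        // prime the first chunk
    dma_wait(0);

    for (int i = 0; i < chunks; ++i)
    {
        // Kick off the transfer for the NEXT chunk before touching this one,
        // so the DMA overlaps with the computation instead of stalling.
        if (i + 1 < chunks)
            dma_get(fetch, ea + uint64_t(i + 1) * kChunk, kChunk, 1);

        Crunch(work, kChunk);            // work on the current half

        if (i + 1 < chunks)
            dma_wait(1);                 // make sure the next chunk has landed
        std::swap(work, fetch);          // ping-pong the halves
    }
}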

AFAIK, because the 360 is also in-order, this problem is going to exist there too and will require intervention on the part of the programmer. The 360 also has 128 128-bit VMX/AltiVec registers, which need custom coding to take advantage of.
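
For what that custom coding looks like in practice, here's plain AltiVec intrinsic code (the 360's VMX128 extensions reportedly add more on top, which I haven't touched):

#include <altivec.h>

// dst[i] += a[i] * b[i], four floats at a time.
// Assumes count is a multiple of 4 and all pointers are 16-byte aligned.
void MulAdd(float* dst, const float* a, const float* b, int count)
{
    for (int i = 0; i < count; i += 4)
    {
        vector float va = vec_ld(0, a + i);
        vector float vb = vec_ld(0, b + i);
        vector float vd = vec_ld(0, dst + i);
        vec_st(vec_madd(va, vb, vd), 0, dst + i);   // fused multiply-add
    }
}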

I don't think the 360 is going to be simple to code for. The single cache serving three cores might be a nightmare, or it could be a walk in the park, I don't know. I would think that each core needs its own cache.

Embedded Sea said:
The word on the street is that you will need twice the programmer staff to work on a PS3 title compared to your Xbox 360 SKU. If you don't believe me go look at all these companies scrambling to hire up for next gen.

If you need twice the programming staff (going from 50 to 100 instead of from 2 to 4) then you are doing it wrong. What you need is 3-6 low-level gurus who are good at working together and can code up an abstraction layer or an intermediate language that everyone else codes to. If you have 50 people all working on the hardware, you're just begging for an internal meltdown. Step 1) Solidify an abstraction layer. Step 2) 3-6 guys reduce the abstraction layer to machine code, working around all the quirks of the hardware and optimizing the code wherever possible. Step 3) 10-50 guys write the rest of the application on top of the abstraction layer.
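
To be concrete about the shape of that abstraction layer (names invented, just an illustration):

struct Mesh;    // placeholder types -- whatever your engine actually uses
struct Matrix;
struct Light;

// The interface the 10-50 application programmers code against.
class Renderer
{
public:
    virtual ~Renderer() {}
    virtual void SubmitMesh(const Mesh& mesh, const Matrix& world) = 0;
    virtual void SetLight(int slot, const Light& light) = 0;
    virtual void Present() = 0;
};

// The 3-6 low-level gurus each own one of these and bury the hardware
// quirks and per-platform optimization inside it.
class RendererPS3  : public Renderer { /* Cell/RSX-specific guts */ };
class RendererX360 : public Renderer { /* Xenon/Xenos-specific guts */ };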

I am not a game programmer, so maybe I'm wrong. If I am, let me know and explain why so I can adjust my views accordingly. :)

This really should be a case of divide and conquer. The abstraction layer in the middle is where the problem can be split in half. Video game programming can still be about making games and getting good hardware performance. Investing in a good abstraction layer up front will smooth things out greatly. (Of course, the downside is that you have to have good intuition and foresight as to what the abstraction layer should look like.)

AR
 
Fafalada said:
certainly better than branch hinting.
Depends on who is doing the hinting. :p

The problem with dynamic hinting in tree traversal is that, as far as I can see, you have to do the same amount of work to compute the hint as you do to compute the branch, so I'm not sure how it saves you in this case.

Of course it's also true that the branch penalty is significantly less than a cache miss. But IME tree searches in applications tend to have good temporal coherency frame to frame and the L2 cache bails you out.
 
Even if they don't, tree search is inherently, embarrassingly parallel. Hell, even alpha-beta game tree searches have been parallelized, e.g. Deep Blue's chess search. You could pipeline subtree loads serially as needed, or load them on multiple SPEs. Also, when performing searches in a geometry database, since branches are usually NOT taken (the comparison fails much more often than it succeeds), you can pretty much achieve maximum performance by letting the CPU assume that a comparison fails and start executing instructions to recurse down the tree.

In fact, I think the XB360's problem might turn out to be its small caches, as a cache miss is going to be very expensive, and it lacks the control over cache algorithms that the SPE developer has available, since coordinating the cache behavior of 6 concurrent threads sharing one cache is going to be more difficult for the compiler.

I agree for the most part, although I'm not as pessimistic about the cache behavior of the 360 as you are. PowerPC implementations have been getting progressively richer in the cache manipulation area, and in the 360's case you have a lot more fine-grained control over the caches. Granted, while having gobs of register space helps, the L2 latencies aren't exactly stellar, and with an execution unit that can swallow 96-128 bytes a cycle, hiding ~500ish clocks to main memory requires a healthy amount of cache to lock down (at least 64KB; at ~128 bytes a cycle, 500 clocks works out to roughly 64KB of data in flight), but it's certainly doable... In a way it'll be a lot like optimizing on a high-clock 744x, only instead of working around FSB constraints, it'll be memory latency instead (although given my choice I'd rather work around the former than the latter)...
 
Embedded Sea said:
Generally it's a divergence continuing from last generation - PS3 is even harder to code from a pure C++ angle, Xbox 360 is even easier.

I got lost here. Xbox 360 is even easier... compared to what? C++ programming? :?:
 
ERP said:
The problem with dynamic hinting in tree traversal is that, as far as I can see, you have to do the same amount of work to compute the hint as you do to compute the branch, so I'm not sure how it saves you in this case.
Depends on the type (and amount) of work we're doing at each node I guess.

and the L2 cache bails you out.
Even when the L2 latency is more than 2x that of a branch mispredict? :p
I think we already know from experience what effect latencies of roughly this size have on an in-order CPU if you don't take special care with the code. And that was with much shorter pipelines to boot...
 
ERP said:
Last I looked Havok was using a KdTree.


But yes, BSP trees have their place.
You may well be right; that's the whole point of middleware, I can forget the implementation details ;-)
 
Isn't this whole 'PS3 will require more man hours and expenses to program for' the same bunk that we heard before this gen? It's like deja vu all over again -- I feel like I'm continually reading the same stuff I did 6 years ago.

Everyone cried out that the PS2's VUs would wreak havoc on devs, and that all the small studios would migrate to the other systems. As we all know, this didn't happen. Hell, I can give plenty of examples of titles made for the PS2 by extremely small teams.

Drakengard was developed by a 17 person team from Cavia. And one of the first visually stunning titles for the PS2, The Bouncer, was made by only 26 guys from Dream Factory.

And correct me if I'm wrong, but aren't the SPEs in Cell sort of similar to the VUs in the PS2? If so, that means the devs going into this next gen will have more experience than last time. Wasn't the PS2 only really codeable in assembly, whereas the PS3 can do C and C++? Wouldn't these factors make initial development for the PS3 easier than it was for the PS2?

One of the main weaknesses of the PS2 was the difficulty in porting PC titles, as the publishers wanted to keep ports as cheap as possible, but PS2 would need a rather significant code rewrite (is this correct?). nVidia's inclusion of OpenGL and the simple fact that the PS3 can handily run UE3 should drastically change this, no?

So then, why should any of us believe that these dev horror stories will come true, if they didn't last gen?
 
DeanoC said:
ERP said:
Last I looked Havok was using a KdTree.


But yes, BSP trees have their place.
You may well be right; that's the whole point of middleware, I can forget the implementation details ;-)

Yep you can forget them right up to the point where they screw you.... :/
 