will PS3's GPU be more modern than PS2's GS for its time?

Jaws said:
Panajev2001a said:
....
With this said, I still say that IMHO the PlayStation 3 GPU is not CELL based, it does not have the SPUs/APUs.
....

It would seem to me a not bad idea to assign all the Vertex Shading work to the CELL based CPU:
....

To me the above two statements seem to contradict each other.. :?

Explain how.

Vertex Shading done by units like the SPUs/APUs or the VUs in the EE can be done very effectively, while Pixel Shading might not be (nVIDIA does not believe in using the same hardware for both, and for now they might be right).

Splitting the VS and PS work between the CPU and the GPU is what some PS-heavy games on Xbox 2/Xenon will do: all the unified shading units would be dedicated to Pixel Shading work, and Vertex Shading would then be done on the enhanced VMX units of the Xbox 2/Xenon's CPU.
 
1. Basically, the way I see it is this... if you're gonna have vertex shading on the CELL CPU, then they're gonna be VS CELL threads (aka software Cells), no?

If Hofstee was talking about two-way comms between CPU<=>GPU, then these VS CELL threads should run on the GPU also, no?

Ideally that is what they would want, which is why I talked about the direction for CELL 2.0, for example. IBM, with CELL, sees a point in pushing in the same direction ATI is pushing with unified shading hardware.

It makes sense: two tightly integrated ICs (with a very fast connection between CPU and GPU) in which the workload can be easily balanced between them, leveraging CELL technology.

This is not the way nVIDIA sees things, not for the short-to-medium term at least.

Two-way communication between CPU and GPU is not CELL dependent (although I can see how having a CELL based CPU and a CELL based GPU could work very well here): you will see it in Xbox 2/Xenon (with the GPU being able to read/write a portion of the CPU L2 cache) and future PCI-Express systems.
 
Qroach said:
Here's my prediction on what has happened thus far with PS3 development


1. Many people working on PS3 hardware didn't know or expect this to happen.

2. Nvidia's involvement in PS3 did not start until one or two years before this announcement. Nvidia's involvement was minimal, nothing more than NV execs talking on a high level to Sony execs, and sending out graphics cards for Sony to evaluate performance. Sony had been evaluating Nvidia tech for a while with zero cash exchanged between both parties. Nvidia didn't change anything in their roadmap to accommodate Sony up until this announcement.

3. Toshiba was left out in the cold with this change and didn't know it was going to happen. The work on the GS for PS3 will continue, but that graphics hardware won't be used in the PS3.

4. One of Sony's offices led the charge to make this change, as issues were raised internally with the expected performance of the new GS and/or the expected feature set/delivery time frame.

5. The Nvidia licensing deal will cost Sony more money, but allow them to compete better on the graphics technology side and avoid a later launch. The trade-off was cost vs. performance and delivery time. Nvidia will incorporate support for the graphics API that Sony is jointly developing with other partners.

6. Nvidia will supply the entire GPU with minor modifications. It will be a custom version of a future Nvidia graphics chip (along the lines of the Xgpu), however Sony will be in charge of fabbing (as we already know). The Nvidia processor "can" handle all vertex and pixel shading, or the Cell CPU can be used for vertex shading. They will not be incredibly integrated together.

7. Cell is a really good CPU, but NOT the second coming of christ.

8. There are few to zero actual Cell workstations available for developers to get cranking on. Sony is currently using emulators and PCs to perform their development, and the final devkits won't arrive until Nvidia has delivered their custom GPU late next year.

9. You will not see any realtime raytracing, radiosity lighting, or any advanced rendering techniques beyond what is readily achievable with the graphics hardware found on PCs of the same time frame. You might see real support for subdivision surfaces in hardware, but they will only be used on characters.

Feel free to quote me on this or post this when we see real PS3 specs or actual performance out of developers. Grab and save this post if you want, I don't F'ing care. I think there are some people around here living in an imaginary dreamworld with regards to Sony and PS3 and their plans for it. It's a game console; no company is invincible (MS could fuck up easily, just as Sony or Nintendo could) and sometimes tradeoffs are made.

good post Qroach. I think I agree with all of it. I think most or many of your predictions & assumptions are most likely on target.
 
I agree with QRoach in this thread. I think Sony did their best internally, but came up a little short on time or performance and went with NVidia to get something that will do PS3 justice. Certainly not the end of the world for Sony, but a small setback in terms of PS3 cost.
 
Panajev2001a said:
Splitting the VS and PS work between the CPU and the GPU is what some PS-heavy games on Xbox 2/Xenon will do: all the unified shading units would be dedicated to Pixel Shading work, and Vertex Shading would then be done on the enhanced VMX units of the Xbox 2/Xenon's CPU.

Just to interject here (on a slightly unrelated note). It's my belief that MS expects the majority of VS work to be done via the CPU (in fact I believe their initial tender guidelines suggested that they wanted no VS capabilities); having unified shaders just offers a convenient method of giving all the functionality and more flexibility as to where things are processed.
 
Johnny Awesome said:
I agree with QRoach in this thread. I think Sony did their best internally, but came up a little short on time or performance and went with NVidia to get something that will do PS3 justice. Certainly not the end of the world for Sony, but a small setback in terms of PS3 cost.
How so?
I don't think Sony has ever made a console all by themselves; they have always had others help design them and license technology: Toshiba, MIPS, Rambus, IBM.
I don't see how the deal with nVidia would be any different cost-wise to Sony than the deals with other partners.

Had the graphics part been designed by Toshiba, would that have been more cost effective?
 
Panajev2001a said:
Vertex Shading done by units like the SPUs/APUs or the VUs in the EE can be done very effectively, while Pixel Shading might not be (nVIDIA does not believe in using the same hardware for both, and for now they might be right).
But programmable vertex shading is only one component in the middle of a lot of fixed-function geometry processing. This stuff, while not being 'sexy', can be extremely expensive without custom hardware.
A simple example: vertex indexation. A modern GPU vertex processor reads an index list, and that index list is then used to read the actual vertex data from vertex pools.
In pseudocode, a simplified triangle assembly is
Code:
Let I be a register with 3 16 bit tuples identified as 0,1,2
Let A be the address of the index stream
Let B be the address of the vertex stream
Let n be the current triangle number
Load I from n[A]
Load R0 from I.0[B]
Load R1 from I.1[B]
Load R2 from I.2[B]
VertexShade( R0 )
VertexShade( R1 )
VertexShade( R2 )
If we assume a memory latency of 200 cycles (that's very generous...) and that we can process 3+ memory requests at once, we have 400 cycles to lose. Load I from n[A] is relatively easy: a predictor will notice the linear read and prefetch 200 cycles early, so when the load happens it's hopefully already there. But the Load Rx from I.x is much harder... I can't predict until the index register is filled, so I have to stall the shader for 200 cycles. The only serious option is a thread context switch.

Modern GPUs have this entire thing hidden; it just works. Unless a CPU is going to be fitted with a GPU-style memory fetch system, how is it going to hide it?

Of course you can rework your pipeline not to require things like this, but it comes up several times (bone indices are the next obvious one).
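
To make the workaround concrete, here is a minimal sketch, assuming a GCC-style CPU target, of the kind of software prefetching a CPU-side geometry pipe could lean on: indices are prefetched far ahead (the stream is linear and predictable), and the dependent vertex fetches are prefetched at a shorter distance, once their indices should already be resident. This is an illustration, not anything from DeanoC's post; the names, the distances and the __builtin_prefetch intrinsic are assumptions, and it only helps when there is enough independent work per triangle, which is exactly why GPUs fall back on many thread contexts instead.
Code:
// Hypothetical sketch: softening the indexed vertex fetch latency on a CPU
// with staggered software prefetches (GCC/Clang __builtin_prefetch).
// Not real PS2/PS3 code; all names and distances are illustrative.
#include <cstddef>
#include <cstdint>

struct Vertex { float pos[4]; float nrm[4]; float uv[2]; };

static void shade_vertex(const Vertex& v) {
    // Stand-in for the real vertex shader work (transform, light, ...).
    volatile float sink = v.pos[0] + v.nrm[0] + v.uv[0];
    (void)sink;
}

void process_triangles(const uint16_t* indices,   // 3 indices per triangle
                       const Vertex*   vertices,
                       std::size_t     triangle_count)
{
    const std::size_t IDX_AHEAD = 16;  // prefetch indices this far ahead (linear, easy)
    const std::size_t VTX_AHEAD = 8;   // prefetch vertices once their indices are cached

    for (std::size_t n = 0; n < triangle_count; ++n) {
        if (n + IDX_AHEAD < triangle_count)
            __builtin_prefetch(&indices[3 * (n + IDX_AHEAD)]);   // linear, predictable
        if (n + VTX_AHEAD < triangle_count) {
            const uint16_t* fi = &indices[3 * (n + VTX_AHEAD)];  // hopefully already cached
            __builtin_prefetch(&vertices[fi[0]]);                // the dependent fetches
            __builtin_prefetch(&vertices[fi[1]]);                //   the post is talking about
            __builtin_prefetch(&vertices[fi[2]]);
        }
        const uint16_t* i = &indices[3 * n];
        shade_vertex(vertices[i[0]]);
        shade_vertex(vertices[i[1]]);
        shade_vertex(vertices[i[2]]);
    }
}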
 
DaveBaumann said:
Panajev2001a said:
Splitting the VS and PS work between the CPU and the GPU is what some PS-heavy games on Xbox 2/Xenon will do: all the unified shading units would be dedicated to Pixel Shading work, and Vertex Shading would then be done on the enhanced VMX units of the Xbox 2/Xenon's CPU.

Just to interject here (on a slightly unrelated note). It's my belief that MS expects the majority of VS work to be done via the CPU (in fact I believe their initial tender guidelines suggested that they wanted no VS capabilities); having unified shaders just offers a convenient method of giving all the functionality and more flexibility as to where things are processed.

I disagree. There is a lot of magic behind the scenes keeping the vertex throughput up; coming up with replacements on the CPU will be hard, and I'd imagine that's why the idea was shot down by ATI.

Having had to write CPU geometry pipes, there is no doubt that FLOPs are largely irrelevant to geometry throughput. Almost every time you are memory bound, so you either come up with wacky bandwidth-saving methods or just do more vertex work... because you will have to have fairly complex vertex shaders to hide the cost of memory latency.
 
DeanoC said:
If we assume a memory latency of 200 cycles (that's very generous...) and that we can process 3+ memory requests at once, we have 400 cycles to lose....[cut]
That's why we pre-encode data packets to fit in local memory on the PS2 and that's why we'll probably do the same even on the PS3.
Indexed primitives on the PS2 are not a problem at all when we need them (and most of the time we don't need them).
No predictor would be effective in a situation like that, except in some very rare cases (and you probably don't need indexed primitives in those cases).
I know you were just showing an example of the difficulties that can arise on non-customized hw, but it's also true that one can think of different ways to achieve the same goal, and sometimes exotic solutions work even better ;)
(sorry DeanoC, I know you know, it's just a ps2 pride post.. :) )

ciao,
Marco
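
A minimal sketch of what the "pre-encode the data packets" approach can look like, purely illustrative and not Marco's code: resolve the indices once at export or load time and emit flat, fixed-size packets of already de-indexed vertices, so that at runtime the data is only ever streamed linearly (e.g. DMA'd into VU/APU local memory) and no dependent indexed read is left to stall on. All names and sizes below are assumptions.
Code:
// Hypothetical export/load-time step: turn an indexed triangle list into
// linear packets that can be streamed as-is.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float pos[4]; float nrm[4]; float uv[2]; };

struct Packet {
    static constexpr std::size_t MAX_VERTS = 60;  // sized so a packet fits local memory
    Vertex      verts[MAX_VERTS];
    std::size_t count = 0;
};

std::vector<Packet> build_packets(const std::vector<uint16_t>& indices,
                                  const std::vector<Vertex>&   vertices)
{
    std::vector<Packet> packets(1);
    for (std::size_t n = 0; n + 2 < indices.size(); n += 3) {
        if (packets.back().count + 3 > Packet::MAX_VERTS)
            packets.emplace_back();                     // start a new packet
        Packet& p = packets.back();
        p.verts[p.count++] = vertices[indices[n + 0]];  // indices resolved here,
        p.verts[p.count++] = vertices[indices[n + 1]];  //   once, ahead of time
        p.verts[p.count++] = vertices[indices[n + 2]];
    }
    return packets;
}
// At runtime each Packet is streamed/DMA'd as one linear block and shaded in place.

The trade-off is that vertices shared between triangles get duplicated across packets: memory and bandwidth are spent to buy a purely linear access pattern.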
 
DeanoC said:
Panajev2001a said:
Vertex Shading done by units like the SPUs/APUs or the VUs in the EE can be done very effectively, while Pixel Shading might not be (nVIDIA does not believe in using the same hardware for both, and for now they might be right).
But programmable vertex shading is only one component in the middle of a lot of fixed-function geometry processing. This stuff, while not being 'sexy', can be extremely expensive without custom hardware.
A simple example: vertex indexation. A modern GPU vertex processor reads an index list, and that index list is then used to read the actual vertex data from vertex pools.
In pseudocode, a simplified triangle assembly is
Code:
Let I be a register with 3 16 bit tuples identified as 0,1,2
Let A be the address of the index stream
Let B be the address of the vertex stream
Let n be the current triangle number
Load I from n[A]
Load R0 from I.0[B]
Load R1 from I.1[B]
Load R2 from I.2[B]
VertexShade( R0 )
VertexShade( R1 )
VertexShade( R2 )
If we assume a memory latency of 200 cycles (that's very generous...) and that we can process 3+ memory requests at once, we have 400 cycles to lose. Load I from n[A] is relatively easy: a predictor will notice the linear read and prefetch 200 cycles early, so when the load happens it's hopefully already there. But the Load Rx from I.x is much harder... I can't predict until the index register is filled, so I have to stall the shader for 200 cycles. The only serious option is a thread context switch.

Modern GPUs have this entire thing hidden; it just works. Unless a CPU is going to be fitted with a GPU-style memory fetch system, how is it going to hide it?

Of course you can rework your pipeline not to require things like this, but it comes up several times (bone indices are the next obvious one).


Well, what do you do on PlayStation 2? I do not expect the SPUs/APUs to be any worse at Vertex processing than the EE's VUs, or any less flexible.

Also, we should keep in mind that the other solution was supposed to be PS-only, so it was understood that the SPUs/APUs would be working on Geometry Processing/Vertex Shading: I think Sony/SCE made sure IBM knew from the start what kind of code they would see running most of the time on the SPUs/APUs, and that means also Geometry Processing/Vertex Shading.

A GPU doing PS only work was basically the idea since almost the start I believe (at least since the "other option" was taken): I do think Sony/SCE had the idea of maybe pushing more programmable shaders on the GPU, but to keep Vertex Processing and Pixel Processing resources separate.

I would think that they thought about these kinds of problems.

What you are describing is basically SoEMT (switch-on-event multi-threading), which GPUs do automatically: if what they look for is not in the caches (Texture or Vertex caches [pre- or post-T&L]) they switch to a different thread of execution (a different vertex).

You are presenting an interesting problem, which we have asked ourselves a lot: what will the SPUs/APUs do when the current thread stalls (for any reason)?

Are SPUs/APUs in-order processors with no MT capability, or are they multi-threaded?

I think context-switching and MT can be "pushed" upon the SPUs/APUs from the PU side: the patents do mention how the PU can stop SPU/APU execution, load a new Program Counter and a new context in the LS, and re-start execution on a new program using an APU RPC. I would assume that when the PU "forces its will" on the APUs, the APUs do save their context and can go back to it.

Maybe we will see this kind of multi-threading handled by the CELL OS transparently enough for developers using the provided libraries (unless they want to go down to the metal).
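
To make that mechanism slightly more concrete, here is a tiny sketch of a PU-side switch-on-stall scheduler in the spirit of that patent description. It is a hypothetical illustration only: the context layout, the apu_* calls and the stall notification are all assumptions, not the real CELL interface.
Code:
// Hypothetical PU-driven switch-on-stall scheduling. None of these types or
// calls are real CELL APIs; they stand in for the "APU RPC" mechanism above.
#include <cstdint>
#include <queue>
#include <vector>

struct ApuContext {
    uint32_t             pc;              // program counter to resume at
    uint8_t              regs[128 * 16];  // 128 x 128-bit register image
    std::vector<uint8_t> local_store;     // whatever part of the 128 KB LS is per-context
};

// Stubs standing in for the hypothetical APU RPC primitives.
static void apu_stop(int) {}
static void apu_save_context(int, ApuContext&) {}
static void apu_load_context(int, const ApuContext&) {}
static void apu_start(int) {}

struct PuScheduler {
    std::queue<ApuContext> runnable;   // threads ready to run
    std::queue<ApuContext> blocked;    // threads waiting on a memory transfer

    // Called on the PU when an APU signals that its current thread stalled.
    void on_apu_stall(int apu) {
        ApuContext stalled;
        apu_stop(apu);
        apu_save_context(apu, stalled);   // this save/restore is the cost estimated
        blocked.push(stalled);            //   a few posts further down the thread
        if (!runnable.empty()) {
            apu_load_context(apu, runnable.front());
            runnable.pop();
            apu_start(apu);               // the APU resumes a different thread
        }
    }
    // When the pending transfer completes, the blocked context is moved back
    // to the runnable queue (bookkeeping omitted).
};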

IMHO, several things have changed since Sony first sought to use CELL technology to ease developers' work.

I do believe that the PU might have changed in complexity and performance compared to earlier concepts.

Something I hear PlayStation 2 developers really hating is not VU programming or the GS having a limited feature-set or having to handle the DMAC to make sure VUs are working alongside the RISC core and the GS, etc...

What a lot of people hate is the fact that there is a big need for people who take tons and tons of lines of general-purpose C/C++ code (for physics, A.I. or other tasks) and convert it into R5900i-friendly code, working around the tiny D-Cache and using the SPRAM as much as possible: all things that PlayStation 2's GCC is not able to do.

We could have a simple and not-too-fast PU that does not recognize the limits of compiler technology, or does not do much to work better with compilers, and thus pushes people once again to re-write tons of lines of C/C++ code in ASM for all kinds of code.

Yes, I see the challenge... I see the glitter in your eyes when thinking of fighting the machine with the strength of your ASM knowledge, but this is not where game development is going IMHO: just like we stopped writing large portions of OSes in ASM (I seriously doubt that, for example, Longhorn is being written directly in ASM, or even a significantly large portion of it), we will stop writing large portions of general-purpose code in ASM for game programming.

Take something like the Athlon 64 and the MSVS 7.x compiler: I doubt it is worth writing huge portions of a PC game directly in ASM, or converting large portions of existing C/C++ code into x86-optimized ASM, as if the code MSVS 7.x generates were way too crappy for the CPU it runs on.

Sure, an ASM programmer in most cases can still produce faster and more efficient code, but we have to ask ourselves: how much faster is it? How much time did it take?

Modern micro-processors are designed to meet compilers halfway (an extreme case of this is IA-64/EPIC): they contain larger on-chip caches (SPRAM is a very nice concept which I like, but it requires the programmer to manually take care of it and use it, which means hand-writing ASM code), and they are able to look at relatively large portions of the code and re-organize instructions to improve performance, hide latencies due to memory fetches or other kinds of pipeline stalls, switch thread of execution, or even execute more than one thread at once.

If you compare how standard C/C++ code compiled by the same version of GCC runs on PlayStation 2 and on Xbox, you will likely see a difference in performance that goes beyond the MHz difference.

In the patents I have seen recently we have shifted from e-DRAM on the CPU, and in general from the use of large Work RAM buffers, to caches, plus there are rumors of a very fast PU core and a single PE with a certain number of APUs (instead of multiple PEs with smaller and less complex PU cores): the PU went from maybe L1 cache only to having an L2 cache which can be read/written by the SPUs/APUs, and the e-DRAM on the CPU seems to have been taken out; instead of it we have, for the SPUs/APUs, a shared L1 cache.

IMHO, for the APUs having a shared L1 cache is much better than having e-DRAM. Why? Because the caches, being transparent to the applications, help performance without requiring major changes to the compiler and the other high-level libraries used, or re-writing large portions of code in ASM.

A cache-less system with lots of readable and writable storage might be fast, but it would have to receive well-optimized and hand-coded instruction streams, and it would perform like crap if you just passed it code produced by GCC or MSVS's compiler.
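
As a toy illustration of that point (not from the original post): the same loop written for a cache-backed core, where plain C/C++ just works, versus for a manually managed SPRAM/local-store core, where the programmer has to stage the data explicitly. The dma_get/dma_wait primitives are hypothetical stand-ins, stubbed with a plain copy so the sketch is self-contained.
Code:
#include <cstddef>
#include <cstring>

// Hypothetical transfer primitives for the scratchpad version.
static void dma_get(float* scratchpad_dst, const float* main_mem_src, std::size_t bytes) {
    std::memcpy(scratchpad_dst, main_mem_src, bytes);   // stand-in for a real DMA
}
static void dma_wait() {}   // the real call would block until the transfer lands

// Cache-backed core: just touch memory; the hardware does the staging,
// so the compiler can emit this from plain C/C++.
float sum_cached(const float* data, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += data[i];   // misses handled by the cache
    return s;
}

// SPRAM/local-store core: the programmer streams blocks in by hand.
float sum_scratchpad(const float* data, std::size_t n) {
    static float spram[1024];                            // the manually managed buffer
    float s = 0.0f;
    for (std::size_t base = 0; base < n; base += 1024) {
        std::size_t block = (n - base < 1024) ? (n - base) : 1024;
        dma_get(spram, data + base, block * sizeof(float));
        dma_wait();                                      // no double-buffering, kept simple
        for (std::size_t i = 0; i < block; ++i) s += spram[i];
    }
    return s;
}

A real version of the second loop would also double-buffer the transfers, which is exactly the kind of hand-tuning the argument above says should not be required for general-purpose code.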

The amount of responsibility given to the PU has risen now (at least in PlayStation 3... the PU can be changed in performance, ISA, etc... if needed), and so have its complexity, frequency and overall performance IMHO.

We expect the CELL OS and high-level libraries to handle more and more things transparently, and we want to write more and more code using high-level languages: all of this requires a different idea of the central CPU core than the one present in the minds of the Emotion Engine's architects.
 
DeanoC said:
Code:
Let I be a register with 3 16 bit tuples identified as 0,1,2
Let A be the address of the index stream
Let B be the address of the vertex stream
Let n be the current triangle number
Load I from n[A]
Load R0 from I.0[B]
Load R1 from I.1[B]
Load R2 from I.2[B]
VertexShade( R0 )
VertexShade( R1 )
VertexShade( R2 )
If we assume a memory latency of 200 cycles (that's very generous...) and that we can process 3+ memory requests at once, we have 400 cycles to lose. Load I from n[A] is relatively easy: a predictor will notice the linear read and prefetch 200 cycles early, so when the load happens it's hopefully already there. But the Load Rx from I.x is much harder... I can't predict until the index register is filled, so I have to stall the shader for 200 cycles. The only serious option is a thread context switch.


i'm afraid a context switch may not help either and you'd end up with things "just not working" even on a gpu if your indexing locality throughout the buffer is piss poor. so i'll second marco in this and say that you don't attack brick walls head-on, you usually try to circumvent them : )
 
I don't deny it can be made to work, but my point is that ease of use just got thrown out the window.

The idea that a CPU is as good as a hardware vertex unit is false IMO. It may do the job, but getting VS3.0 functionality out of a CPU is hard, not because of FLOPs or functionality but because of memory latency.

As for Cell context switching, I'm not convinced. We have to assume the 128K local RAM is shared among contexts (no way it could be saved/restored in time). We still have 2K of registers to save/restore; assuming 128 bits per cycle read/write (to L1 cache I suppose), that's 32 cycles (16 per way), unless we have hardware contexts. That's assuming hardware switching; if it's actually software involving a PU interrupt, then we have at least 10 cycles of latency for the PU to react.
 
darkblu said:
i'm afraid a context switch may not help either and you'd end up with things "just not working" even on a gpu if your indexing locality throughout the buffer is piss poor. so i'll second marco in this and say that you don't attack brick walls head-on, you usually try to circumvent them : )

At least one architecture has made the problem largely go away (of course there are pathological cases, but for most uses of index buffers and dependent texture reads it copes).

You just need lots and lots of contexts (and a seriously good memory system). You have to stop thinking like a CPU, where 8 contexts would be considered a lot.
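
A quick back-of-the-envelope, with assumed numbers rather than anything DeanoC gave: the number of in-flight contexts has to cover the miss latency divided by the independent work each context has between dependent loads.
Code:
#include <cstdio>

int main() {
    const int miss_latency_cycles   = 200;  // the figure used earlier in the thread
    const int work_per_fetch_cycles = 20;   // assumed shader work between dependent loads
    // One context is executing while the others wait on their fetches.
    const int contexts_needed = miss_latency_cycles / work_per_fetch_cycles + 1;
    std::printf("contexts needed to hide the latency: %d\n", contexts_needed);  // prints 11
    return 0;
}

With shorter shaders or longer latencies the count climbs quickly past anything a CPU keeps in hardware, which is why 8 contexts is a lot for a CPU and nowhere near enough here.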
 
DeanoC said:
I don't deny it can be made to work, but my point is that ease of use just got thrown out the window.

The idea that a CPU is as good as a hardware vertex unit is false IMO. It may do the job, but getting VS3.0 functionality out of a CPU is hard, not because of FLOPs or functionality but because of memory latency.

True, but I expect developers to do some pre-packaging work on their data and to work around problems like this while still using a high-level graphics library, rather than going to a PlayStation 2 kind of approach.
 
DeanoC said:
You just need lots and lots of contexts (and a seriously good memory system). You have to stop thinking like a CPU, where 8 contexts would be considered a lot.

Like what? 32 or 64 or ... ?
 
DeanoC said:
I don't deny it can be made to work, but my point is that ease of use just got thrown out the window.

The idea that a CPU is as good as a hardware vertex unit is false IMO. It may do the job, but getting VS3.0 functionality out of a CPU is hard, not because of FLOPs or functionality but because of memory latency.


True, but the APUs in CELL run at 4.6 GHz, while the VS units in a GPU run at around 800 MHz. CELL will be faster at vertex work, and with more pixel shaders the GPU will be faster than ATI's GPU at pixel work.

Sony saves transistor count with this method.
 
Here's my prediction on what has happened thus far with PS3 development

1. Many people working on PS3 hardware didn't know or expect this to happen.

Companies always have Plan B, C, D, E....

2. Nvidia's involvement in PS3 did not start until one or two years before this announcement. Nvidia's involvement was minimal, nothing more than NV execs talking on a high level to Sony execs, and sending out graphics cards for Sony to evaluate performance. Sony had been evaluating Nvidia tech for a while with zero cash exchanged between both parties. Nvidia didn't change anything in their roadmap to accommodate Sony up until this announcement.

That's the case too IMO. As we know, their concentration is on Cell; the pixel engine was not something they made a big deal about. Even in the Cell patent it seems to be tacked on.

Anyway, if they had done something, you would have already seen this announcement earlier.

3. Toshiba was left out in the cold with this change and didn't know it was going to happen. The work on the GS for PS3 will continue, but that graphics hardware won't be used in the PS3.

Toshiba needs to develop their own pixel engine, or license one from somewhere, to be paired with their own Cell. I doubt they were working specifically for PS3, and their exchange on this could be similar to what you said about NVIDIA. They would certainly have made some kind of announcement earlier if they had done otherwise.

4. One of Sony's offices led the charge to make this change, as issues were raised internally with the expected performance of the new GS and/or the expected feature set/delivery time frame.

That could be the case. Though it wasn't time frame or feature set, it's most likely cost.

5. The Nvidia licensing deal will cost Sony more money

Executives will not allow this. Executives will force the engineers to downgrade the spec, either on the NV part or the Cell part, to meet the budget. The NV license deal was just better value for money; that's why they chose it over their other options.

If I am speculating: for all we know, Sony's own solution could be superior to NV's, but it broke the bank, and downgrading their solution made the NV solution a better performer. That could be why they went with NV.

But Sony would have a fixed range for the PS3 budget that they wouldn't break. That's not speculation, that's just the real world. Sony is building the PS3, not a bridge.
 
Look, sometimes budgets are increased if there is the need.

If you planned to spend $35,000 on a BMW 3 Series, but the model you want costs $1,000 more and the alternative is a Yugo, you spend the extra $1,000.

According to your reasoning, V3, PSP would never have been launched in Japan this December; they would have delayed the launch as the budget they thought was good for it had to be majorly revised. PSP was pushed through the fabs in December because Sony wanted to hit the launch date even if it would cost them a lot of money (some hardware bugs had been recently fixed, and pushing that amount of units through their manufacturing plants cost a premium in that period: the volume was quite low).

Sony will not lose an infinite amount of money on their hardware, but they can still adjust their decisions on how much they plan to lose per unit and how soon after launch these losses can be lowered.

Also, V3... neither Toshiba nor nVIDIA is providing just ROPs/Pixel Engines to Sony: both of them were racing for a full GPU contract, which includes Shaders, Texture Units, etc...
 
Look, sometimes budgets are increased if there is the need.

If you planned to spend $35,000 on a BMW 3 Series, but the model you want costs $1,000 more and the alternative is a Yugo, you spend the extra $1,000.

No, not in this case. Consumers aren't in it to make money :), Sony is. Anyway, in business, luxury items are normally the first thing to get cut. When you can't afford it, you can't afford it. Even the extra $1,000.

According to your reasoning, V3, PSP would never have been launched in Japan this December; they would have delayed the launch as the budget they thought was good for it had to be majorly revised. PSP was pushed through the fabs in December because Sony wanted to hit the launch date even if it would cost them a lot of money (some hardware bugs had been recently fixed, and pushing that amount of units through their manufacturing plants cost a premium in that period: the volume was quite low).

No, they had already budgeted for that. Last-minute hardware bugs are emergencies that sometimes happen. But those are bugs.

Sony will not lose an infinite amount of money on their hardware, but they can still adjust their decisions on how much they plan to lose per unit and how soon after launch these losses can be lowered.

No, those ranges were fixed when they made the investment and the plans were laid out. Options like abandoning Cell, going with Toshiba, NV or ATI, or downgrading or upgrading the spec are not fixed, though.

Also, V3... neither Toshiba nor nVIDIA is providing just ROPs/Pixel Engines to Sony: both of them were racing for a full GPU contract, which includes Shaders, Texture Units, etc...

That's what a GPU is, isn't it? Pixel Engine is just Sony's fancy name.

Look, I am not saying PS3 is low budget or anything (it's probably higher than the other two for all I know), but they won't increase its cost just for the sake of an increase in performance; that's futile in the eyes of consumers. If it's for other features that may be visible, like Blu-ray, maybe they'll make an allowance. But going to NV because their solution is expensive and will put the project over budget, just because it gives better performance, that's just silly. They just won't go over budget. There is too much risk involved in doing that.

Like I said before, going to NV is definitely the cheapest solution for Sony. Sony's Plan A solution broke the bank so it wasn't feasible; Plan B was to look elsewhere for solutions that meet the cost. NV won the contract because it meets their cost, where others might not.
 
V3,

It is not going over budget; as you yourself said, they already left wiggle room in the budget ;).

No, those ranges were fixed when they made the investment and the plans were laid out.

See ;) ?



That's what a GPU is, isn't it? Pixel Engine is just Sony's fancy name.

If you think that nVIDIA is in this to provide only Pixel Engines/ROPs, then you are mistaken.

Look at NV40: there is a lot more to a GPU than just Pixel Engines/ROPs.

Like I said before, going to NV is definitely the cheapest solution for Sony. Sony's Plan A solution broke the bank so it wasn't feasible; Plan B was to look elsewhere for solutions that meet the cost. NV won the contract because it meets their cost, where others might not.

That is a fancy way of looking at things: it is not that nVIDIA's solution is superior, it is the most affordable.

Plan A might not have passed budgeting (internal... the Visualizer), but I am not aware that plan B (which had been the official plan for quite a while) went over budget, as the contractors for plan B seemed to be a bit surprised at the coup that nVIDIA managed.

I think nVIDIA's solution might have out-performed and out-featured plan B, but Toshiba had the lower cost (being Sony/SCE's partner and having already done work with XDR, Redwood and CELL, they could afford to push the envelope without needing too much R&D money [less than what Sony/SCE deemed the maximum R&D budget for the GPU]).

I also think that nVIDIA might have convinced Sony/SCE because of their change on the royalties issue: it might have been what Sony/SCE needed (plus a nice big argument by nVIDIA about where the future of GPUs was headed and how the GPU they had selected so far was nowhere near, according to nVIDIA of course, what Sony/SCE needed for PlayStation 3) to finally change their minds.

nVIDIA did not even seem a likely winner of the contract in the two weeks before the announcement was made, not at all.
 