nVidia shader patent (REYES/Raytracing/GI) destined for PS3?

Li Mu Bai wrote:

Why is it that every patent filed by Toshiba/Sony/IBM somehow relates directly to the PS3, especially when Cell has much further-reaching technical implications for all parties involved? Reyes has been Pana's love child ever since the first Cell patent was filed or the technology discussed. We all know that this amalgamation of companies didn't join together solely to create the PS3. I realize there's some merit in well-grounded, intelligent speculation, but when the PS3 spec sheet is finally revealed I can already foresee disappointment due to this type of thing. It's already been proven that the PS3 is not the behemoth that it was initially touted to be, but a beast nonetheless. Also, how many tech patents are out there that have yet to come to fruition? (Engineers will face headaches even in an initial Reyes implementation, not to mention the external memory requirements for what is essentially a stream processor, à la the PS3 & its subpar random memory access for this type of full application.) Reyes has some time yet before it arrives & creates a paradigm shift in console or PC GPUs. (Many of the developers are one and the same, & this generation will see even more PC developers doing cross-platform work, thanks to the more powerful console architectures & advanced feature sets identical to those of the high-end cards, even surpassing them for a time.) PS4, XBX2, & the NES6 are far more feasible for full Reyes utilization.

It will indeed be some time before the x86 domination of PCs comes to an end as well, unless you're underestimating the time frame in which MS wants/dictates the direction to advance or change. Unfortunately, they also currently possess the power & consumer mindshare to marginalize almost any OS application, even if it is more efficient. Gasp, I'm tired of reading these Reyes-centric threads. I can still remember when all of those early adherents of Sony's 1 teraflop power claims were defending its technical validity in a multitude of threads. Rolling Eyes Let's wait for more tangible evidence that this is indeed a real possibility, & not simply the first intersection rays, if that.


I tend to agree with some of the things you've said here, although there are other things you said that I don't understand. But let's focus on where we agree.

I believe that a lot of the things that some people were hoping for on PS3 will not happen until PS4, such as full Reyes rendering, raytracing, GI, etc.

It's already been proven that the PS3 is not the behemoth that it was initially touted to be, but a beast nonetheless.

I agree that will probably end up being true.

Some of the patents filed will probably not be implemented until PS4. I don't think we'll see much, if any, raytracing until PS4, and maybe not even on Xbox3 if X3 comes out by 2009-2010. I don't think we will see even moderate amounts of medium-quality raytracing on complex objects and in environments until well into the next decade. Even that might be too soon, but I would hope not. I'd like to see this stuff before I'm too old & grey to see videogames 8)

PS3 will indeed be a beast, but not a GodStation (tm) or RayStation (tm)
:LOL:
 
What would you consider the Gekko in GameCube?
There's not much to consider - Gekko is a customized version of the PPC750cx core. The docs are crystal clear about it, just as they are about the PSP CPU being a modified R4000.
(It's actually instruction-level backwards compatible with the 7xx series, so in theory it could be possible to make GCN run older versions of MacOS.)
 
Megadrive1988 said:
I don't think we will see even moderate amounts of medium-quality raytracing on complex objects and in environments until well into the next decade. Even that might be too soon, but I would hope not. I'd like to see this stuff before I'm too old & grey to see videogames 8)
Seeing how relatively well realtime software raytracing works on a bog-standard PC, I would think that at least shadow raytracing would be doable next-gen.
 
Fafalada said:
What would you consider the Gekko in GameCube?
There's not much to consider - Gekko is a customized version of the PPC750cx core. The docs are crystal clear about it, just as they are about the PSP CPU being a modified R4000.
(It's actually instruction-level backwards compatible with the 7xx series, so in theory it could be possible to make GCN run older versions of MacOS.)

So xbox2's cpu won't be compatible with the PowerPC line then? How similar will the final cpu be to the dev kits?
 
Fox5 said:
Fafalada said:
What would you consider the Gekko in GameCube?
There's not much to consider - Gekko is a customized version of the PPC750cx core. The docs are crystal clear about it, just as they are about the PSP CPU being a modified R4000.
(It's actually instruction-level backwards compatible with the 7xx series, so in theory it could be possible to make GCN run older versions of MacOS.)

So xbox2's cpu won't be compatible with the PowerPC line then? How similar will the final cpu be to the dev kits?

How did you reach that conclusion based on Faf's statement?

The fact that the CPU core going into the next XBox *might be* a clean design doesn't mean that it doesn't implement the PowerPC ISA.

Considering the dev kits are PPC970 based I think it's fairly certain that the end product will be similar in features. In particular the SIMD extensions (Altivec, VMX, whatever) are well designed (elegant and powerful).

Cheers
Gubbi
 
Gubbi said:
Fox5 said:
Fafalada said:
What would you consider the Gekko in GameCube?
There's not much to consider - Gekko is a customized version of the PPC750cx core. The docs are crystal clear about it, just as they are about the PSP CPU being a modified R4000.
(It's actually instruction-level backwards compatible with the 7xx series, so in theory it could be possible to make GCN run older versions of MacOS.)

So xbox2's cpu won't be compatible with the PowerPC line then? How similar will the final cpu be to the dev kits?

How did you reach that conclusion based on Faf's statement?

The fact that the CPU core going into the next XBox *might be* a clean design doesn't mean that it doesn't implement the PowerPC ISA.

Considering the dev kits are PPC970 based I think it's fairly certain that the end product will be similar in features. In particular the SIMD extensions (Altivec, VMX, whatever) are well designed (elegant and powerful).

Cheers
Gubbi

So its relation to PPC970 might be more like athlon to a 386 than xcpu to pentium 3?
 
Fox5 said:
So its relation to PPC970 might be more like athlon to a 386 than xcpu to pentium 3?

Exactly. It'll implement the PPC Instruction Set Architecture (ISA), but might be a completely new core (altivec and all).

Cheers
Gubbi
 
Li Mu Bai said:
It's already been proven that the PS3 is not the behemoth that it was initially touted to be, but a beast nonetheless....

...I can still remember when all of those early adherents of Sony's 1 teraflop power claims were defending its technical validity in a multitude of threads. Rolling Eyes

Hate to nit-pick, but this is incorrect. All we know is that the PE concept is valid as many here have been stating and that they've exceeded their clock estimates by a significant margin when fabricated on 90nm sSOI.

Basically, we know that the modular PE is correct; what's still an unknown is how concurrent they can fabricate it on 65nm -- which will ultimately determine the processing capabilities. I wouldn't be surprised if it came out during an ISSCC presentation.
 
Hate to nit-pick, but this is incorrect. All we know is that the PE concept is valid as many here have been stating and that they've exceeded their clock estimates by a significant margin when fabricated on 90nm sSOI.

Basically, we know that the modular PE is correct; what's still an unknown is how concurrent they can fabricate it on 65nm -- which will ultimately determine the processing capabilities. I wouldn't be surprised if it came out during an ISSCC presentation.

I do respect your opinion, Vince, but I still don't see the PS3 attaining that kind of computing power. I've conceded that it will indeed be powerful, though until it actually *achieves* 10 to the 12th power floating-point operations per second benchmarked, I must remain skeptical. IIRC, these same presentations, papers, etc. had the PS2 capable of accomplishing things no one has seen yet, & no developer has been able to extract. And I suppose that the PS4 will achieve petaflop speeds? ;)
 
Gubbi said:
Considering the dev kits are PPC970 based I think it's fairly certain that the end product will be similar in features. In particular the SIMD extensions (Altivec, VMX, whatever) are well designed (elegant and powerful).

VMX is o.k. but there are better SIMD architectures on the way ;)
 
DeanoC said:
Gubbi said:
Considering the dev kits are PPC970 based I think it's fairly certain that the end product will be similar in features. In particular the SIMD extensions (Altivec, VMX, whatever) are well designed (elegant and powerful).

VMX is o.k. but there are better SIMD architectures on the way ;)

APUs/SPUs or the beefed-up VMX units of Xenon/Xbox 2's CPU ;) ?
 
though until it actually *achieves* 10 to the 12th power floating-point operations per second benchmarked, operating under real-world gaming complexity scenarios, I must remain skeptical
Programmable FLOPS never refer to some nebulous fantasy of "real-world" usage - it's a simple measure of the fastest operations the computation units can do.
Any kind of real code will always be lower than that, even with zero latency and theoretically infinite bandwidth available to the unit.

Because nowadays most vector units are pretty similar (1-cycle dot product or muladd), directly comparing FLOP numbers across platforms actually has some meaning - so long as we stick to comparing non-hardwired units.
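
To make the "paper FLOPS" point concrete, here is a minimal sketch of how such a peak figure is usually put together: units x lanes x FLOPs per lane per cycle x clock. The unit count, lane width and clock below are made-up illustrative numbers, not figures for any real or announced chip, and the function names are just for this example.

```python
def peak_flops(units, lanes, flops_per_lane_per_cycle, clock_hz):
    """Theoretical peak: every lane of every unit issues its fastest op each cycle."""
    return units * lanes * flops_per_lane_per_cycle * clock_hz


def units_needed(target_flops, lanes, flops_per_lane_per_cycle, clock_hz):
    """Smallest whole number of units whose combined peak reaches target_flops."""
    per_unit = lanes * flops_per_lane_per_cycle * clock_hz
    return -(-target_flops // per_unit)  # integer ceiling division


# Hypothetical figures, not taken from any announced hardware:
# 8 vector units, 4 lanes each, one muladd (counted as 2 FLOPs) per lane per cycle, 4 GHz.
CLOCK_HZ = 4_000_000_000
peak = peak_flops(units=8, lanes=4, flops_per_lane_per_cycle=2, clock_hz=CLOCK_HZ)
needed = units_needed(10**12, lanes=4, flops_per_lane_per_cycle=2, clock_hz=CLOCK_HZ)

print(f"peak: {peak / 1e9:.0f} GFLOPS")
print(f"units needed for 1 TFLOPS at the same clock: {needed}")
```

Nothing in this arithmetic knows about memory latency or bandwidth, which is why real code always lands below the number it produces.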
 
Fafalada said:
though until it actually *achieves* 10 to the 12th power floating-point operations per second benchmarked, operating under real-world gaming complexity scenarios, I must remain skeptical
Programmable FLOPS never refer to some nebulous fantasy of "real-world" usage - it's a simple measure of the fastest operations the computation units can do.
Any kind of real code will always be lower than that, even with zero latency and theoretically infinite bandwidth available to the unit.

Because nowadays most vector units are pretty similar (1-cycle dot product or muladd), directly comparing FLOP numbers across platforms actually has some meaning - so long as we stick to comparing non-hardwired units.

Although it doesn't measure the practicalities of feeding the units in various usage scenarios or the difficulties in hiding latency incurred in using the units, or memory.
 
DeanoC said:
Gubbi said:
Considering the dev kits are PPC970 based I think it's fairly certain that the end product will be similar in features. In particular the SIMD extensions (Altivec, VMX, whatever) are well designed (elegant and powerful).

VMX is o.k. but there are better SIMD architectures on the way ;)

Do you have a concrete example? Better how?

Inquiring minds want to know :)

Cheers
Gubbi
 
Pure speculation, but how about a superscalar SIMD design with OOOE and SMT (4 threads, say)? Drop the PPC FPU, keep the integer units, and add multiple VMX units, allowing for dispatch of multiple SIMD ops out of order.
 
ERP said:
Although it doesn't measure the practicalities of feeding the units in various usage scenarios or the difficulties in hiding latency incurred in using the units, or memory.
Of course, but that would be much harder to clearly present with mere paper specs.

Gubbi said:
Do you have a concrete example? Better how?
Actually, while we're at it, I would argue that the PSP's VFPU is already a step forward from VMX (for the FPU portion, that is; there's no integer SIMD there though :( ).
 
Fafalada said:
Gubbi said:
Do you have a concrete example? Better how?
Actually, while we're at it, I would argue that the PSP's VFPU is already a step forward from VMX (for the FPU portion, that is; there's no integer SIMD there though :( ).

Do you have a link to some information on the CPU/VFPU? Googling turns up a sea of press releases but very little info regarding architecture and instruction sets.

Edit: Is it like PS2's VU0/1? In that case it's really not a SIMD extension but rather a stand-alone SIMD vector unit, right?

Cheers
Gubbi
 
Gubbi said:
Do you have a concrete example? Better how?

Inquiring minds want to know :)

Any details and I'd have to kill you :(

But taking a general tack, it's fairly easy to see how we can improve a SIMD unit for the modern CPU landscape.

Problem A: RAM speeds
A SIMD unit can use a lot of RAM (a 4x4 float matrix takes 64 bytes, i.e. 1/2 Kbit). RISC memory access is too slow (load/store as separate instructions); what we want is old-fashioned CISC operating directly to/from memory. Of course we actually want a small pool of very fast RAM. Let's call that the "register pool", which saves any embarrassment from RISC fans :)
So the solution to problem A is to have so many registers that the register file uses about as much memory as some 8-bit computers used to have. For the Cell SPU, 128 128-bit registers have been mentioned (2KB, i.e. 16 Kbit), which sounds like a good figure.

Problem B: RAM speeds
OK, even with lots of registers I have to read/write stuff sometimes. If I'm going to, it would be good to compress everything, say using a decoder like the one fitted to every vertex shader unit (including PS2's) to unpack/pack data.
So the solution to problem B is to have dedicated instructions/units for packing/unpacking in the formats most likely to be encountered by GPUs and CPUs.

Problem C: RAM speeds
Still, sometimes we are going to stall due to memory latency, so if that happens let's make sure we have some thread contexts we can switch to, to see if they could be doing something useful.
So the solution to problem C is to have multiple thread contexts per core. If one thread stalls, switch to another and do some useful work.

Problem D: We need to pretend that FLOPS figures are really important.
SIMD ALUs are cheap, so let's have a few. It makes the paper figures look good, even though the real problems are A, B and C.
So the solution to problem D is to have N SIMD cores.
Note: I'm being overly sarcastic ;-) There are lots of good reasons why having multiple cores is a good thing. It's just that finding more than about 2 non-graphical, math-intensive tasks (physics and sound are the obvious candidates) gets real hard quickly.

A good SIMD unit will address at least 2 of these, a really good one will address all 4... The last two are really CPU architecture issues, but the SIMD units have to be integrated into the thread architecture to get good performance.
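
For anyone who wants to poke at the numbers, here is a rough sketch of the storage arithmetic behind Problem A and the pack/unpack idea behind Problem B. It assumes 32-bit floats, and the signed-normalized 16-bit format and the function names are chosen purely for illustration; it is not a description of what any particular vertex unit or SIMD instruction set actually does.

```python
import struct

# Sizes behind Problem A, assuming 32-bit (4-byte) floats.
MATRIX_BYTES = 4 * 4 * 4              # 4x4 float matrix -> 64 bytes
REGISTER_POOL_BYTES = 128 * 128 // 8  # 128 registers x 128 bits -> 2048 bytes


def pack_snorm16(values):
    """Quantize floats in [-1, 1] to little-endian signed 16-bit integers."""
    quantized = [max(-32767, min(32767, round(v * 32767))) for v in values]
    return struct.pack(f"<{len(quantized)}h", *quantized)


def unpack_snorm16(blob):
    """Expand packed signed 16-bit integers back to floats in [-1, 1]."""
    count = len(blob) // 2
    return [q / 32767 for q in struct.unpack(f"<{count}h", blob)]


vec = [0.5, -0.25, 1.0, 0.0]              # illustrative 4-component vector
full = struct.pack("<4f", *vec)           # uncompressed: 4 x 32-bit floats
packed = pack_snorm16(vec)                # compressed: 4 x 16-bit integers
restored = unpack_snorm16(packed)
max_error = max(abs(a - b) for a, b in zip(vec, restored))

print(MATRIX_BYTES, REGISTER_POOL_BYTES)  # 64 2048
print(len(full), len(packed))             # 16 8
```

The packed form moves half the bytes across the memory bus at the cost of a small quantization error, which is the trade a dedicated pack/unpack instruction would make cheap.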
 