RSX and PR bull?

Nemo80 said:
You are forgetting that RSX probably lacks the whole MPEG decoder logic, which is about 20 million transistors alone (plus some other PC-only features).

Despite that, it has many more transistors than the current G70. :rolleyes:

Who's forgetting that? It's just not relevant here, is all. And actually, I thought the transistor counts came out just about even in the initial claims: both are claimed to be ~300 million. Certainly 20 million might be soaked up by the new I/O interface (and perhaps turbo cache?) scheme alone.
 
The "built-in redundancy" could mean they added a quad to the G70. It could also mean that it's closer to the 7800 GT than the 7800 GTX... that is, with part of the G70 being turned off.
 
Jaws said:
REYES pipeline!

We know the 136 inst/cycle matches with E3 but the DOTs/cycle don't. There are NV patents for geometry shaders/programmable primitive processors etc. The 24 PS units will largely remain unmodified but the vertex/geometry pipeline/triangle setup will get modified so that micro-polygons can get shaded like conventional fragments using the PS units!

*Warning: Extreme speculation!*

16 SIMDs * 8 render processors + 8 SPEs = 136 inst/cycle :D
 
By redundancy RSX will either have more pipes than the 7800 GTX for minimum performance, or the same number with some disabled. I dunno how big a quad is in transistors. Could be the spare room (video circuitry etc.) is taken up with a redundant quad? Or the quad takes up only 10% of that and the rest is 'special sauce'? Or the real-estate given over to pixel shaders is the same as G70 and there's two helpings of 'special sauce'! Really we need a proper transistor count for RSX to make any such guesses. But curses, Sony aren't as inept as MS at keeping secrets! (Okay, no-one's as inept as MS at keeping secrets. Or even development hardware!)
 
xbdestroya said:
Who's forgetting that? It's just not relevant here, is all. And actually, I thought the transistor counts came out just about even in the initial claims: both are claimed to be ~300 million. Certainly 20 million might be soaked up by the new I/O interface (and perhaps turbo cache?) scheme alone.

One thing that is probably the easiest to do is to slap on 2 quads and 2 VS so you have 10 VS / 8 fragment quads. I think that would be the most "logical" choice.

Wait and see, I guess.
 
Jaws said:
REYES pipeline!

We know the 136 inst/cycle matches with E3 but the DOTs/cycle don't. There are NV patents for geometry shaders/programmable primitive processors etc. The 24 PS units will largely remain unmodified but the vertex/geometry pipeline/triangle setup will get modified so that micro-polygons can get shaded like conventional fragments using the PS units!

*Warning: Extreme speculation!*

The 136 number matches the G70, but either way nVidia wouldn't talk about any future parts. So in the end the DOT figure and all the rest is pretty pointless because it doesn't tell us anything specific. If MS had stated 192 ops for the ATI part, I'm sure they would either have said nothing or come up with some other "magic" formula numbers.

I know the rest of what you said was bullshit, but I answered the part that wasn't. :)
 
overclocked said:
One thing that is probably the easiest to do is to slap on 2 quads and 2 VS so you have 10 VS / 8 fragment quads. I think that would be the most "logical" choice.

Wait and see, I guess.

Well if we're taking it in the number of quads and VS pipes direction... I'm just wondering if we have any idea what the transistor count is for the various G70 sub-components, like the ROPs, quads, and VS's. That might allow us to venture forth some reasonable guesses in terms of a 'beefier' RSX, or it might help to rule it out.
 
xbdestroya said:
Well if we're taking it to the number of quads and VS pipes direction, I'm just wondering if on the side we have any idea what the transistor count is for the various G70 sub-components, like ROP, quad, and VS. That might allow us to venture some reasonable guesses forth in terms of a 'beefier' RSX, or it might rule it out in a way.

Many have speculated, as you know, but no "certain" figure for the various elements is known, even if some people have made pretty good estimates.
But it would have a smaller die than G70 for certain.
Then there's the economics, and I would guess they could very well afford a big die, as their 90nm process is very mature.

Also, how many units Sony expects to sell in the first year, before going to 65nm in '07, would play a part too, I think.
 
I agree, Sony's 90nm process is very mature, and the fact that NVidia seems to get decent-ish yields at 110nm with an otherwise huge die size is encouraging. Still with cost considerations I'm wondering if they would actually push it as far as they could on a technical level. 65nm may even come on as early as next year for Sony though - so they might be willing to swallow otherwise outsized losses on the RSX for the launch, viewing it as a short term situation.
 
overclocked said:
The 136 number matches the G70, but either way nVidia wouldn't talk about any future parts.

But they did, they were talking about the future RSX.

overclocked said:
So in the end the DOT figure and all the rest is pretty pointless because it doesn't tell us anything specific.

Dots/cycle will tell you the 'number' of 'execution units' capable of Dot products/cycle, i.e. vec2 (Dot2), vec3 (Dot3), vec4 (Dot4) etc...

overclocked said:
If MS had stated 192 ops for the ATI part, I'm sure they would either have said nothing or come up with some other "magic" formula numbers.

They never stated 192 ops, but they stated 48 billion 'shader' ops/sec -> 96 shader ops/cycle -> 96 inst./cycle for the shader units.
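
As a rough check of that conversion, here is a minimal Python sketch. It assumes a 500 MHz shader clock for the ATI part, which is not stated in this thread; the function name `ops_per_cycle` is made up for illustration.

```python
# Assumption (not from this thread): the ATI part runs its shaders at 500 MHz.
ASSUMED_CLOCK_HZ = 500e6

# The "48 billion shader ops/sec" figure quoted above.
SHADER_OPS_PER_SEC = 48e9


def ops_per_cycle(ops_per_sec: float, clock_hz: float) -> float:
    """Convert a per-second throughput claim into a per-clock figure."""
    return ops_per_sec / clock_hz


if __name__ == "__main__":
    print(ops_per_cycle(SHADER_OPS_PER_SEC, ASSUMED_CLOCK_HZ))
```

With a different assumed clock the per-cycle number changes accordingly, which is part of why these headline figures are hard to compare directly.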
 
He's going with an RSX theory that puts 8 SPE's actually *on* the RSX chip itself, kind of an NVidia/'Visualiser' hybrid. Total zaniness, but typical Version fare. :)
 
Jaws said:
REYES pipeline!

We know the 136 inst/cycle matches with E3 but the DOTs/cycle don't. There are NV patents for geometry shaders/programmable primitive processors etc. The 24 PS units will largely remain unmodified but the vertex/geometry pipeline/triangle setup will get modified so that micro-polygons can get shaded like conventional fragments using the PS units!

*Warning: Extreme speculation!*


Reyes isn't just about micropolygons... First you bound every object in camera space, then its size is tested and if it's too big, it gets split. Any primitive that's outside of the view frustum gets culled immediately. This loop continues until a given limit is reached, then each small primitive enters the pipeline independently. This is where it gets converted into a grid, which is diced into micropolygons. So at this point a large amount of the scene has already been thrown out.

PRMan then shades every vertex of the grid using a SIMD approach, and only after this will the grid undergo hidden surface evaluation. It's performed in this order because a displacement shader usually moves vertices, and thus the results from hiding before shading could be wrong. Micropolygons are tested individually, so you'll have to split the grid apart for this. This approach to visibility testing also means that geometry AA is decoupled from shading AA (which is dependent on the actual size of the micropolygons, i.e. grid vertex density).
Motion blur is performed by shading the primitive's vertices at the starting point of its motion, and the movement is always linear. This creates a result that's physically wrong, but the speed hit for it is very small.
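
The bound / cull / split / dice loop described above can be sketched roughly like this in Python. It is a toy under loud assumptions: primitives are flat screen-space rectangles (the hypothetical `Patch` class) rather than real surfaces bounded in camera space, and shading and hiding are left out entirely. It is not how PRMan is implemented.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical stand-in for a primitive: an axis-aligned screen-space box."""
    x0: float
    y0: float
    x1: float
    y1: float

    def width(self) -> float:
        return self.x1 - self.x0

    def height(self) -> float:
        return self.y1 - self.y0

    def split(self):
        """Split along the longer axis into two halves."""
        if self.width() >= self.height():
            mx = (self.x0 + self.x1) / 2
            return [Patch(self.x0, self.y0, mx, self.y1),
                    Patch(mx, self.y0, self.x1, self.y1)]
        my = (self.y0 + self.y1) / 2
        return [Patch(self.x0, self.y0, self.x1, my),
                Patch(self.x0, my, self.x1, self.y1)]


def on_screen(p: Patch, screen_w: float, screen_h: float) -> bool:
    """Crude frustum test: does the bound overlap the screen rectangle?"""
    return p.x1 > 0 and p.y1 > 0 and p.x0 < screen_w and p.y0 < screen_h


def dice(p: Patch, mp_size: float = 1.0):
    """Return the (nx, ny) micropolygon grid dimensions for a small patch."""
    nx = max(1, math.ceil(p.width() / mp_size))
    ny = max(1, math.ceil(p.height() / mp_size))
    return nx, ny


def bound_and_split(prims, screen_w, screen_h, max_grid=16.0):
    """Cull off-screen patches, split big ones, dice the rest into grids."""
    grids = []
    stack = list(prims)
    while stack:
        p = stack.pop()
        if not on_screen(p, screen_w, screen_h):
            continue  # culled before any dicing or shading work is spent
        if p.width() > max_grid or p.height() > max_grid:
            stack.extend(p.split())  # too big: split and re-test the halves
        else:
            grids.append(dice(p))  # small enough: becomes a grid
    return grids
```

The `max_grid` and `mp_size` values are arbitrary placeholders; in a real renderer they would be tuning parameters.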

I'm not a coding expert, but I can see the following possible issues with a hw-based reyes renderer:
The efficiency is optimized for huge scene sizes and HOS geometry. Small, simple scenes actually render pretty slowly.
PRMan relies heavily on background storage and caching for both geometry and texture data. The way the pipeline works means that you can stream data through it from disk, and throw away a lot of data as soon as it's been processed.
But you do have to keep one piece of information for each pixel, and that is a list of micropolygon vertices that cover any sampling points under that pixel. That's because primitives are streamed in no particular order and the pixel cannot be computed until all the geometry has been processed. This is a HUGE amount of data. PRMan overcomes this problem by bucket rendering, which is basically tiling; so primitives will also get sorted when they're bound in the first stage of the pipeline. This also means that the whole geometry database has to be kept in memory, as a tradeoff for huge lists of visible points.
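
To make the bucket idea concrete, here is a small Python sketch of sorting bounded primitives into screen tiles. It is a simplification: each primitive is listed in every bucket its bound overlaps, which is an assumption for illustration and not a description of PRMan's actual bookkeeping. The function name and the 16-pixel bucket size are made up.

```python
from collections import defaultdict


def assign_to_buckets(bounds, bucket_size=16):
    """Map each (x0, y0, x1, y1) screen-space bound to the buckets it overlaps.

    Returns a dict {(bucket_x, bucket_y): [primitive indices]}.
    """
    buckets = defaultdict(list)
    for idx, (x0, y0, x1, y1) in enumerate(bounds):
        bx0, by0 = int(x0 // bucket_size), int(y0 // bucket_size)
        bx1, by1 = int(x1 // bucket_size), int(y1 // bucket_size)
        for by in range(by0, by1 + 1):
            for bx in range(bx0, bx1 + 1):
                buckets[(bx, by)].append(idx)
    return dict(buckets)
```

Each bucket can then be rendered on its own, so the per-pixel visible point lists only need to exist for one tile at a time, while the list of primitives per bucket has to stay around until that bucket is processed.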

Implementing Reyes in hardware means that it most likely has to be some sort of deferred renderer. With offline rendering, PRMan already has the scene data written out to disk in RIB files (which can take more than 50% of total rendering time to create from Maya scenes or whatever), but in a realtime environment you have to capture it.
Also, dicing up primitives into grids and then into micropolygons cannot be done with a vertex shader AFAIK, and Cell's SPEs probably don't have the bandwidth to feed the RSX with micropolygons, which would then have to be written out into some memory once the shading has been done. Working with tiles, this memory should be located on the chip - but can it store enough data for nextgen complexity?

All in all, Reyes is highly unlikely IMHO, perhaps even for the PS4/Xwhatever.
 
FWIW I don't believe that Reyes lends itself to a good hardware solution for the sorts of scenes we render in realtime.

A lot of its architecture is built to allow the efficient streaming of assets from disk, which is important in the offline world, not so much in realtime where you have access to everything you need to draw the next frame.

It's also completely possible to do anything a Reyes renderer can do on a conventional pipeline, so I don't really understand the attraction. The only thing we've really been missing up to now is adaptive tessellation, and you can do that without resorting to micropolygons for everything.
 
Jaws said:
But they did, they were talking about the future RSX.



Dots/cycle will tell you the 'number' of 'execution units' capable of Dot products/cycle, i.e. vec2 (Dot2), vec3 (Dot3), vec4 (Dot4) etc...



They never stated 192 ops, but they stated 48 billion 'shader' ops/sec -> 96 shader ops/cycle -> 96 inst./cycle for the shader units.

1. Yes and no, I would say. They were talking about the 7800 GTX, but in the various statements from different people at nVidia AFTER they launched the GTX, they haven't made any such statement regarding Sony.

2. I'm not going to dig deeper or pretend I'm a guru at programming, because I'm a stinker at it (my thing is pre-artwork), but common sense, at least for me, says you can't trust those numbers because they can be twisted in so many ways.
Numbers on consoles ARE PR-based and as such should be taken with a big grain of your favorite spice. :)

3. No, they didn't, but if they had done just that, all I'm saying is that I'm quite sure we couldn't be having this discussion right now, as there wouldn't be any numbers to base anything on.
Or it could have been higher, and then the whole discussion since E3 would have been different just because of it.

I'm not trying to say that you can't come to accurate conclusions, and it's fun to play with numbers, but it's based on PR and that's my point really.
 
ERP said:

Yeah, I've been thinking about the same things.
Besides, content creation for a reyes-based architecture would be a nightmare, not to mention developing for it until there's final hardware ;)
 
Since we're speculating, what if the RSX is based on NV's Quadro GPUs? Isn't it a HW accelerator for the REYES-based Gelato renderer from NV, though non-realtime? Out of curiosity, what are the main 'architecture' differences between Quadro and GeForce GPUs?
 
Jaws said:
Since we're speculating, what if the RSX is based on NV's Quadro GPUs? Isn't it a HW accelerator for the REYES-based Gelato renderer from NV, though non-realtime? Out of curiosity, what are the main 'architecture' differences between Quadro and GeForce GPUs?

Well, I haven't kept in the loop for a while, but up until the NV3x's there were NO hardware differences between Quadros and gaming cards. Apart from a switch that "told" the driver you had a Quadro, obviously.
That's why it was relatively easy to softmod gaming cards into Quadros. My 5900 was working fine as a Quadro for quite a while, then I reverted it back to normal for gaming reasons.

Not sure if it's still the case with newer cards though.
 
Gelato isn't reyes-based anyway. Larry Gritz and co. wrote a Renderman standard renderer called Entropy before, that was probably reyes-like - but as it was a serious competitor to PRMan, Steve Jobs sued them out of business...
 
Jaws said:
Since we're speculating, what if the RSX is based on NV's Quadro GPUs? Isn't it a HW accelerator for the REYES-based Gelato renderer from NV, though non-realtime? Out of curiosity, what are the main 'architecture' differences between Quadro and GeForce GPUs?

Quadros are basically slower versions of the mainstream cards with unique drivers that have better support for content creation apps. They also accelerate AA lines.
 