More info about RSX from NVIDIA

About normal compression

" Normal maps can be compressed 2:1 using the V8U8 compression technology, and applications using the 3DC compression format will transparently work on the GeForce 7800 GTX." U8V8 is a signed two-component format that creates slightly larger textures than regular DXTC texture compression, but can be decompressed a bit more quickly. We haven't heard of any games that are using it, and don't know how its quality compares to ATI's 3DC texture compression for normal maps. What we do know is this: The "transparent" support of 3DC does not mean the 7800 GTX can read 3DC textures. If a game uses a compressed normal map in 3DC format, Nvidia's drivers use the CPU to decompress it at load time and recompress it to U8V8 compressed format before loading it into the card's local RAM. "
 
I'm a little confused. G70 doesn't seem to have 128-bit framebuffers or 128-bit blending on the framebuffer for FP32 HDR. Isn't that one of the things NVidia focussed on at the PS3 conference? Perhaps I'm confusing something somewhere..
 
Rockster said:
So, if G70 = RSX, then total programmable flops per clock is:

RSX - 464 (no tex) or 272 (with 24 tex), max 8 vertex fetches & 16 pixels w/ 2xAA
Xenos - 480 (with 16 tex), max 16 vertex fetches & 8 pixels w/ 4xAA
...

Xenos can only issue 96 instructions/cycle ~ 48 vec4 + 48 scalar. It doesn't have any further instructions available per cycle for any other execution units? :?

Xenos ~ 480 Flops/cycle + NOTHING ELSE?

G70 can issue 136 instructions per cycle.

e.g. 64 instructions on 56 vec4 + 8 scalar ~ 464 FLOPS/cycle AND it still has 72 UNUSED instructions per cycle.

G70 ~ 464 Flops/cycle + 72 instructions/cycle on further operations.

Otherwise you're breaking the shader ALUs instruction limits it seems...

EDIT: I know the Xenos texture ALUs are decoupled, but something doesn't add up with this 'peak' use of instructions/cycle...
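For what it's worth, the per-clock flop figures being thrown around can be reconstructed from the commonly assumed unit counts. The sketch below is just that arithmetic; the unit counts and the 8-flops-per-vec4-MADD / 2-flops-per-scalar convention are assumptions, not confirmed specs:

#include <cstdio>

// Rough reconstruction of the "flops per clock" figures quoted above,
// using commonly assumed unit counts (assumptions, not confirmed specs).
// A vec4 MADD counts as 8 flops, a scalar MADD as 2.
int main()
{
    // G70/RSX guess: 24 pixel pipes x 2 vec4 ALUs, plus 8 vertex shaders
    // each with a vec4 MADD + scalar unit.
    int g70_pixel  = 24 * 2 * 8;     // 384 flops/clock
    int g70_vertex = 8 * (8 + 2);    // 80 flops/clock
    printf("G70 no tex: %d\n", g70_pixel + g70_vertex);   // 464
    // With 24 textures fetched, one pixel ALU per pipe does texture addressing.
    printf("G70 24 tex: %d\n", 24 * 1 * 8 + g70_vertex);  // 272
    // Xenos: 48 unified ALUs, each vec4 MADD + scalar MADD.
    printf("Xenos:      %d\n", 48 * (8 + 2));             // 480
    return 0;
}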
 
Titanio said:
I'm a little confused. G70 doesn't seem to have 128-bit framebuffers or 128-bit blending on the framebuffer for FP32 HDR. Isn't that one of the things NVidia focussed on at the PS3 conference? Perhaps I'm confusing something somewhere..

Maybe that's where the trannies went. Take the unnecessary PC junk out of G70 (DVD/MPEG4 decode assist, for example), and replace it with 128 bit HDR stuff. They're still both around 300M, but that provides a feature difference for RSX that Sony could tout over G70.
 
Don't you think that those mini-alus (2 flops each) should be counted too.

They are. Please understand that the pixel shader ALUs have 4 inputs and 4 outputs. So if you use the mini-ALU, it means you aren't using one of the add and mul units inside the main ALU. It's just exchanging 2 flops in the mini-ALU for 2 flops in the main ALU, so the total is still the same. There is no way to use both the mini-ALU and all of the main ALU at the same time. Hopefully that makes sense.

The change in philosophy doesn't change the basic building blocks and supporting structures, just their arrangement and implementation. There are more reasons to keep the things we are talking about in the architecture, than there are for taking them out. There isn't an argument for not normalizing them at this point.

In terms of transistor count, it's important to note that Xenos lacks many of the extraneous components a PC chip has (all the ROPs, 2D, display TMDS, video engine, etc.), and that vendors don't always count the same way.
 
Though 128 bit HDR sounds a ridiculous waste to me. Doubt there'll be any visible bonus - at least nothing tangible to Joe Public. I'm wondering if RSX incorporates some degree of GS emulation hardware to aid BC?
 
:? Doubling the colour resolution/data size, at its cost, to communicate with Cell, which works at single precision? Can't see any sense in that. What am I missing?
 
Jaws, those instruction counts are not maxes across the entire chip as you suggest. Clearly they relate to the maximum number of execution units active per clock, but which units those are hasn't been clearly defined. The 136 likely includes things like norm, fog, etc. Surely you aren't suggesting that ALUs must sit idle for fog or vertex fetch because of a lack of instruction slots.
 
Shifty Geezer said:
Though 128 bit HDR sounds a ridiculous waste to me. Doubt there'll be any visible bonus - at least nothing tangible to Joe Public.

The speculation is that it would be included so that the results of ops output at 128-bit precision could be shared seamlessly with Cell. Although I'm not sure how using 64-bit precision would make that any more difficult (perhaps some of your register width on Cell would go to waste with 64-bit results, if you couldn't then pack two into one register?).

edit - nao beat me to it.
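Just to put rough numbers on the storage side of the 128-bit question: a four-channel FP32 target is 16 bytes per pixel versus 8 for FP16, so the render target roughly doubles in size. A back-of-envelope sketch (resolution chosen for illustration, not a statement about what RSX actually supports):

#include <cstdio>

// Back-of-envelope storage cost of FP16 (64-bit) vs FP32 (128-bit)
// RGBA render targets at 1280x720. Illustrative only.
int main()
{
    const long pixels = 1280L * 720L;
    printf("FP16 RGBA: %.1f MB\n", pixels * 8  / (1024.0 * 1024.0)); // ~7.0 MB
    printf("FP32 RGBA: %.1f MB\n", pixels * 16 / (1024.0 * 1024.0)); // ~14.1 MB
    return 0;
}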
 
Shogmaster said:
Maybe that's where the trannies went. Take the unnecessary PC junk out of G70 (DVD/MPEG4 decode assist, for example), and replace it with 128 bit HDR stuff. They're still both around 300M, but that provides a feature difference for RSX that Sony could tout over G70.

MPEG4 decode can't be used? It will no doubt be used in Blu-Ray discs.

So the Cell will be more than enough to process all the codecs which might be used in Blu-Ray discs?

People who've tried to play back 1080p movie trailers using QuickTime 7 noted they choked their high-end CPUs (Athlon-64s).
 
wco81 said:
MPEG4 decode can't be used? It will no doubt be used in Blu-Ray discs.

So the Cell will be more than enough to process all the codecs which might be used in Blu-Ray discs?

People who've tried to play back 1080p movie trailers using QuickTime 7 noted they choked their high-end CPUs (Athlon-64s).

Cell is perfect for that kind of work, and thoroughly outperforms the latest desktop chips in those areas. IIRC, Sony mentioned the PS3 CPU could decode 12 hi-def streams simultaneously (don't know if that was 720p or 1080p, but even if it were the former, I guess that'd suggest perhaps up to 6 1080p streams simultaneously), so yeah, that logic on the GPU could be superfluous.
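The "perhaps up to 6" guess is just pixel-count scaling: 1080p has 2.25x the pixels of 720p, so 12 720p streams correspond to roughly 5-6 1080p streams by raw throughput (a rough sketch; real decode cost isn't strictly proportional to resolution):

#include <cstdio>

// Scaling the "12 HD streams" claim by raw pixel count. Decode cost is
// not strictly proportional to resolution, so treat this as a rough guess.
int main()
{
    const double p720  = 1280.0 * 720.0;
    const double p1080 = 1920.0 * 1080.0;  // 2.25x the pixels of 720p
    printf("Equivalent 1080p streams: %.1f\n", 12.0 * p720 / p1080); // ~5.3
    return 0;
}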
 
wco81 said:
Shogmaster said:
Maybe that's where the trannies went. Take the unnecessary PC junk out of G70 (DVD/MPEG4 decode assist, for example), and replace it with 128 bit HDR stuff. They're still both around 300M, but that provides a feature difference for RSX that Sony could tout over G70.

MPEG4 decode can't be used? It will no doubt be used in Blu-Ray discs.

So the Cell will be more than enough to process all the codecs which might be used in Blu-Ray discs?

People who've tried to play back 1080p movie trailers using QuickTime 7 noted they choked their high-end CPUs (Athlon-64s).

Why bother with GPU based "assist" when the Cell can decode video so damn well by itself? That's a damn waste, no? Cell is more or less built for streaming media anyways.
 
MPEG4 decode can't be used? It will no doubt be used in Blu-Ray discs.
I would HOPE they removed it - it's a completely useless waste of transistors. Actually there's other useless legacy crap in G70 that has no real place in RSX (Fog ALUs, HELLO? :? )... but oh well...

So the Cell will be more than enough to process all the codecs which might be used in Blu-Ray discs?
Even the EE is enough to decode 1080p codecs; decoding wouldn't even give Cell a proper workout.
 
Yeah it's just astounding that these high-priced PC CPUs can't do it so they will need a video card that does H.264 hardware decoding.

EE can decode 1080p in software?

One of the things MS boasts about VC-1 compared to H.264 is that it doesn't chew up CPU as much.

Guess it must be true but there doesn't seem to be as much VC-1 content in 1080p. At least not those as widely distributed as the movie trailers Apple is hosting.
 
http://www.4gamer.net/news.php?url=/news/history/2005.06/20050622060100detail.html

machine translation:

Looking at the performance difference in single-card operation, the "Far Cry" frame rates (graph 5) show that while the 6800 Ultra drops off sharply when full-screen anti-aliasing (FSAA) and the like are applied at high resolution, the 7800 GTX shows no such extreme dip. In particular, even with 8x FSAA and 16x anisotropic filtering applied at 1280×1024, its scores are barely different from those at 1024×768, which is impressive.
Likewise, the 7800 GTX's performance in "DOOM 3" at high resolution is excellent (graph 6). The numbers do fall once filtering is applied, but even with 8x FSAA and 16x anisotropic applied simultaneously at 1024×768 it still manages a playable 57.8 fps.
In Unreal Tournament 2003 and Unreal Tournament 2004 (graphs 7 and 8) there is not much difference between the 7800 GTX and the 6800 Ultra, because the games are light when only the resolution is changed; depending on the map, the 6800 Ultra even scores higher.
In short, the 7800 GTX is a graphics chip that pulls well ahead of the 6800 Ultra when advanced filtering is applied in high-resolution environments. With a reference card design that has 512MB of graphics memory capacity in view, it can be called a product suited to the ever higher resolutions of the display market and to the needs of high-end gamers.


[benchmark graph image: 20050622060100_33.jpg]
 
wco81 said:
Yeah it's just astounding that these high-priced PC CPUs can't do it so they will need a video card that does H.264 hardware decoding.

EE can decode 1080p in software?

One of the things MS boasts about VC-1 compared to H.264 is that it doesn't chew up CPU as much.

Guess it must be true but there doesn't seem to be as much VC-1 content in 1080p. At least not those as widely distributed as the movie trailers Apple is hosting.

Both the EE and certainly the CELL are FP monsters, which is exactly what decoding needs. Add in the fact that PCs, besides not being particularly good at floating point, have a huge ass OS and tons of 'background' processes running, and that the decoding applications have to be written to generic hardware and OS software specifications, and you can see why they struggle to do it well.

If a decoder could be written and got the CPU dedicated to that process, it would most likely be able to handle it.
 
ERP said:
Yes, I did this math about a year ago, but to put that in perspective, write a shader that does subsurface scattering, self shadowing, parallax mapping etc. and look at the ALU op count. It's REALLY easy to get to 100+ ops/pixel without trying hard.
I agree, but we should look at this from a different standpoint: RSX is not alone, CELL could help with a lot of this kind of stuff.
Shading can be partially 'moved' from pixels to vertices, or the SPEs can be used to post-process shaded fragments. I've no doubt a lot of people will come up with things we never thought of before ;)
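To put the "100+ ops/pixel" figure in context, here's a rough per-pixel cycle budget. The clock, pipe count, resolution and overdraw below are all assumptions for illustration, not confirmed RSX specs:

#include <cstdio>

// Rough per-pixel cycle budget. Clock speed, pipe count, overdraw and
// resolution are all assumptions for illustration, not confirmed specs.
int main()
{
    const double clock_hz = 550e6;   // assumed RSX clock
    const double pipes    = 24;      // assumed fragment pipes
    const double pixels   = 1280.0 * 720.0 * 60.0 * 2.0; // 720p60, ~2x overdraw
    printf("Cycles available per pixel: %.0f\n", clock_hz * pipes / pixels); // ~119
    return 0;
}

Under those guesses a ~100-instruction shader already eats most of the budget, which is exactly why offloading work to Cell/SPEs looks attractive.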
 