More info about RSX from NVIDIA

jvd said:
The RSX will never have access to 57 GB/s of bandwidth, unless of course Cell is disabled in games
RSX can use all 35 GB/s from/to CELL without hitting XDR RAM
 
nAo said:
I believe most NG games will use a first pass to lay down the Z-buffer to remove opaque overdraw.
Now RSX has 24 pixel pipelines clocked at 550 MHz and we have to use them to shade 100 Mpixel/s (a 1080p frame buffer @ 50 FPS).
24*550/100 = 132 clocks per pixel.
More than 200 dot4 and 100 nrm ops per pixel (efficiency is not 100%, of course!)... well, it seems good to me ;)
Even with a lot of texture layers, the available programmable flop count is still very high

Yes, I did this math about a year ago, but to put that in perspective, write a shader that does subsurface scattering, self-shadowing, parallax mapping, etc. and look at the ALU op count. It's REALLY easy to get to 100+ ops per pixel without trying hard.
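(A minimal back-of-envelope sketch of the per-pixel budget above, in Python, using the thread's figures: 24 pipelines at 550 MHz, two 4-wide MADD ALUs per pipeline, a 1080p buffer at 50 FPS. The variable names are illustrative only.)

```python
# Back-of-envelope shader budget, using the figures quoted in the thread.
# All numbers are assumptions from the discussion, not official specs.

PIXEL_PIPES = 24                     # RSX/G70 pixel pipelines (as stated above)
CLOCK_HZ = 550e6                     # 550 MHz core clock
PIXELS_PER_SEC = 1920 * 1080 * 50    # 1080p frame buffer @ 50 FPS (~104 Mpixel/s)
ALUS_PER_PIPE = 2                    # two 4-wide MADD-capable ALUs per pipeline

clocks_per_pixel = PIXEL_PIPES * CLOCK_HZ / PIXELS_PER_SEC
dot4_per_pixel = clocks_per_pixel * ALUS_PER_PIPE   # upper bound at 100% efficiency

print(f"clocks per pixel: {clocks_per_pixel:.0f}")          # ~127 (the post rounds to 100 Mpixel/s -> 132)
print(f"dot4 ops per pixel (ideal): {dot4_per_pixel:.0f}")  # ~255 ideal; 'more than 200' once efficiency is factored in
```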
 
Hmm, strange wording.

"Supports" != Requires

I think common sense tells you that if this were true, Sony would've announced it in a more visible way, to make a big deal of the fact that their games will all be 1080p, not some strangely worded press release.
 
nAo said:
jvd said:
The RSX will never have access to 57 GB/s of bandwidth, unless of course Cell is disabled in games
RSX can use all 35 GB/s from/to CELL without hitting XDR RAM
And what is it going to do with that 35 GB/s? Is it going to store its framebuffer there? Fetch textures from the Cell? What exactly?

If it does either of those, it's still going to have to hit XDR RAM through the Cell, thus either slowing the Cell's access to the XDR RAM or having its own access to the XDR RAM slowed down.
 
scooby_dooby said:
Hmm, strange wording.

"Supports" != Requires

I think common sense tells you that if this were true, Sony would've announced it in a more visible way, to make a big deal of the fact that their games will all be 1080p, not some strangely worded press release.

As I stated earlier, I don't see how them saying "considered the standard" changes anything from what we knew yesterday about this possibility.

We all know it will support it, and I can "consider" myself to be the brightest guy on the board ;) but we all know that's only true at the times when I'm the only one logged in, not all the time. :p


I still don't see where it says "no game will be approved without it", the way MS has told devs about 720p.
 
scooby_dooby
Oh, you mean required as in, for instance, "MS requires anti-aliasing utilization and a high-definition standard across all of their products"? If so, then yeah, I haven't heard a similar statement from Sony alluding to 1080p's status either, and it's a long shot for that to actually happen, though there is plenty of time until 2006 rolls around...

Titanio
Well, novelty is a good feature, right? And undeniably, G70 is a fast chip; actually, it's a top performer that is beating everything to a pulp at this point. BUT, as a forward-looking, novelty-seeking product, it undermines a lot of the rumours about it and the RSX. It doesn't seem to incorporate any of the new ideas that ATI have already included in Xenos, which is why I'm asking if by chance the chips are pretty much identical (highly doubtful).
 
A couple of Qs - How are you figuring these numbers out, exactly? I'm guessing you're mapping the texture address functionality in Xenos to flops, but... how?
Not counting TMUs in the flops. I have taken out all the fixed-function flops that Nvidia is counting, like norm/scale/bias etc., because Xenos seems to have that separate functionality as well. Because each pixel shader ALU in the G70 can perform 2 ops (i.e. a multiply and an add) on up to 4 components, this provides you with 8 flops per ALU. Take 24 pipelines * 2 ALUs per pipe * 8 flops = 384 flops in the pixel shader array. The vertex shader ALU can do 2 ops on up to 5 components, resulting in 10 flops per ALU. 8 vertex shaders * 10 flops = 80 flops in the vertex shader array. Total is 80 + 384 = 464. I also showed "or 272 when texturing" because half of the pixel shader ALUs are also used in texturing. Xenos has 48 ALUs that can do 2 ops on up to 5 components, resulting in 480 flops per clock. These can all be used while texturing, so I indicated the number of texture samples possible in the same clock.

All these numbers are pretty meaningless in and of themselves, but there seems to be a general impression that the G70/RSX has more raw power, while Xenos is more efficient. I have seen numbers like 1.8 TFlops vs 1 TFlops, etc., when in terms of raw power they are very close.
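(A small Python sketch of the per-clock bookkeeping described above, using the post's own assumptions about ALU counts and widths; none of these figures come from vendor documentation.)

```python
# Per-clock programmable flop counts, following the bookkeeping in the post above.
# All counts are assumptions taken from the thread, not vendor figures.

# G70/RSX pixel shader array: 24 pipes, 2 ALUs per pipe, 2 ops on 4 components each
g70_pixel_flops = 24 * 2 * (2 * 4)                          # = 384
# G70/RSX vertex shader array: 8 units, 2 ops on 5 components each
g70_vertex_flops = 8 * (2 * 5)                              # = 80
g70_total = g70_pixel_flops + g70_vertex_flops              # = 464
# With half the pixel ALUs busy texturing, only one ALU per pipe shades
g70_while_texturing = 24 * 1 * (2 * 4) + g70_vertex_flops   # = 272

# Xenos: 48 unified ALUs, 2 ops on 5 components each, usable while texturing
xenos_total = 48 * (2 * 5)                                  # = 480

print(g70_total, g70_while_texturing, xenos_total)          # 464 272 480
```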
 
Not counting TMUs in the flops. I have taken out all the fixed-function flops that Nvidia is counting, like norm/scale/bias etc., because Xenos seems to have that separate functionality as well.
Where did you get that information/hint? I've never seen it mentioned that there's some kind of significant fixed functionality of that kind on R500, while Nvidia often talks about the free normals and SFU units (which is what I'm assuming you are referring to) as a significant part of their functionality. I understand they are not programmable, but it's something that has to be used all the time anyway, and it saves a lot of cycles.

Also, why are they counting FP16 functionality together with FP32? Can those two be used together (for different purposes)?
 
Where did you get that information/hint?
I quoted this earlier from Wavey's article: "Additional to the 48 ALUs is specific logic that performs all the pixel shader interpolation calculations, which ATI suggests equates to about an extra 33% of pixel shader computational capability."

It's a pretty standard thing. And if you look at benchmarks between Nvidia and ATI PC architectures, you don't see any drastic, unexplained per-clock advantage.
 
> " And what is it going to do with that 35gbs ? Is it going to store its framebuffer there ? feature textures from the cell ? What exactly ?"

Well, your vertex info can come straight from the SPEs' local SRAM stores, and you can store stuff in the XDR DRAM, and of course GPU RAM. You're not tied down to a single bus; there is flexibility there for developers to exploit. I have no problem stating 57 GB/sec, if one considers the memory layout. Sure, it's different than having 57 GB/sec to GPU memory, but not by much. SLI dual-renders using dual buses, so why can't RSX do the same for games that are bandwidth limited, using both buses to generate different sections of the scene?
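(A rough Python tally of the 57 GB/s framing above. The 20 + 15 GB/s FlexIO split and the 22.4 GB/s GDDR3 figure are assumptions based on commonly cited PS3 specs, not numbers given in this thread.)

```python
# Rough tally of the buses RSX can draw on, per the 57 GB/s framing above.
# The FlexIO split (20 + 15 GB/s) and the 22.4 GB/s GDDR3 figure are assumptions
# from commonly cited PS3 specs, not numbers stated in the thread itself.

flexio_cell_to_rsx_gbps = 20.0   # CELL -> RSX (assumed split of the 35 GB/s link)
flexio_rsx_to_cell_gbps = 15.0   # RSX -> CELL
gddr3_gbps = 22.4                # local GDDR3 pool (128-bit @ 700 MHz effective, assumed)

total = flexio_cell_to_rsx_gbps + flexio_rsx_to_cell_gbps + gddr3_gbps
print(f"aggregate bandwidth RSX can touch: ~{total:.1f} GB/s")  # ~57.4 GB/s
```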
 
Rockster said:
A couple of Qs - How are you figuring these numbers out, exactly? I'm guessing you're mapping the texture address functionality in Xenos to flops, but... how?
Not counting TMUs in the flops. I have taken out all the fixed-function flops that Nvidia is counting, like norm/scale/bias etc., because Xenos seems to have that separate functionality as well. Because each pixel shader ALU in the G70 can perform 2 ops (i.e. a multiply and an add) on up to 4 components, this provides you with 8 flops per ALU. Take 24 pipelines * 2 ALUs per pipe * 8 flops = 384 flops in the pixel shader array. The vertex shader ALU can do 2 ops on up to 5 components, resulting in 10 flops per ALU. 8 vertex shaders * 10 flops = 80 flops in the vertex shader array. Total is 80 + 384 = 464. I also showed "or 272 when texturing" because half of the pixel shader ALUs are also used in texturing. Xenos has 48 ALUs that can do 2 ops on up to 5 components, resulting in 480 flops per clock. These can all be used while texturing, so I indicated the number of texture samples possible in the same clock.

All these numbers are pretty meaningless in and of themselves, but there seems to be a general impression that the G70/RSX has more raw power, while Xenos is more efficient. I have seen numbers like 1.8 TFlops vs 1 TFlops, etc., when in terms of raw power they are very close.

Sorry, I was looking at your figures and thinking they were per second, not per clock - that makes a little more sense now. However, I think you're working off incomplete information regarding what could and couldn't be taken out of the G70 flop figure, which indeed does make that kind of comparison meaningless for now.
 
Almost forgot your second question. Nvidia is adding the norm, which can only be single-cycle as an FP16 op, to the flop count, alongside the ALUs, which are single-cycle in either FP16 or FP32. That means the FP32 flop count should be lower, but I throw out the norm anyway.

I should mention that there is lots of fixed-function stuff all around modern GPUs. Xenos has hardware tessellation but we don't count that. The G70 has a separate video processing unit, which we ignore as well. The main focus these days is shader processing speed.
 
"Additional to the 48 ALUs is specific logic that performs all the pixel shader interpolation calculations, which ATI suggests equates to about an extra 33% of pixel shader computational capability."
Hmm, OK. I thought he was just referring to the logic integrated in the eDRAM module.

It's a pretty standard thing. And if you look at benchmarks between Nvidia and ATI PC architectures, you don't see any drastic, unexplained per-clock advantage.
I know, but so far their architectures have been fairly similar, so that was to be expected. Not so with R500, though.
 
Because each pixel shader ALU in the G70 can perform 2 ops (i.e. a multiply and an add) on up to 4 components, this provides you with 8 flops per ALU
Don't you think that those mini-ALUs (2 flops each) should be counted too, for a total of 20 programmable flops per pipe? Or are they the same thing as the SFU units, which can't be counted as programmable?
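(A tiny Python sketch of the per-pipe count implied by the question above, assuming two full ALUs at 8 flops each plus two mini-ALUs at 2 flops each; the mini-ALU count and width are the question's assumption, not a confirmed spec.)

```python
# Per-pipe programmable flops if the mini-ALUs are counted, as the question suggests.
# Counts are the thread's assumptions, not vendor-confirmed figures.
full_alus_per_pipe = 2
flops_per_full_alu = 2 * 4     # MADD on 4 components
mini_alus_per_pipe = 2
flops_per_mini_alu = 2

per_pipe = full_alus_per_pipe * flops_per_full_alu + mini_alus_per_pipe * flops_per_mini_alu
print(per_pipe)                # 20
```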

So you disagree, and think there is a big disparity in raw math performance? And if so, why?
I don't know where you're getting that from my posts. I was saying that just because current GPUs have lots of hardwired functionality doesn't mean R500 must have it too; it just looks like a different design philosophy, so comparing them on a per-clock basis probably doesn't mean much.

If you want some kind of direct real-life comparison with current/upcoming traditional GPUs, the only one I can think of would be that R500 was demoed at E3 running the R520 Ruby demo in 720p at (a bit unstable) 30 FPS and no AA. Of course, I have no idea how the same demo would run on R520, or how Ruby demos typically run on the hardware they are made for.

The new 'traditional' cards like G70 and R520 seem to use a lot more transistors for the core logic than R500 does, so I'd think they would perform better in shader-heavy scenarios, whereas R500 spends those transistors on a major bandwidth-saving measure and AA.
 
Just a niggling request for people to stop calling ATi's XB360 chipset 'R500' and to call it either 'C1' or 'Xenos', as there is no R500 chipset and, now that we know it's a totally unrelated part, we shouldn't artificially lump it in with ATi's existing architecture numbering and confuse matters for people who are none the wiser.

Thank you for your cooperation, and we now return to normal scheduling...
 
Shifty Geezer said:
Just a niggling request for people to stop calling ATi's XB360 chipset 'R500' and to call it either 'C1' or 'Xenos', as there is no R500 chipset and, now that we know it's a totally unrelated part, we shouldn't artificially lump it in with ATi's existing architecture numbering and confuse matters for people who are none the wiser.

Thank you for your cooperation, and we now return to normal scheduling...

Wait a minute, where did we learn that R500 is not related to Xenos or C1? All I remember learning is that Xenos is the name for public consumption and that C1 was the internal project name. I still thought R500 was the valid chip name designator though.
 
xbdestroya said:
Shifty Geezer said:
Just a niggling request for people to stop calling ATi's XB360 chipset 'R500' and to call it either 'C1' or 'Xenos', as there is no R500 chipset and, now that we know it's a totally unrelated part, we shouldn't artificially lump it in with ATi's existing architecture numbering and confuse matters for people who are none the wiser.

Thank you for your cooperation, and we now return to normal scheduling...

Wait a minute, where did we learn that R500 is not related to Xenos or C1? All I remember learning is that Xenos is the name for public consumption and that C1 was the internal project name. I still thought R500 was the valid chip name designator though.

Try telling that to the ATI guys. They nearly tore my head off (figuratively) for insisting on calling it "R500" at E3.
 