Complete Details on Xenos from E3 private showing!

ERP said:
nAo said:
PC-Engine said:
Didn't SONY claim 48GB/s bandwidth for the eDRAM in GS when the EE to GS bandwidth stank?
Sony didn't claim the bandwidth between EE and GS was 48 GB/s, so what's the problem?
They (MS + ATI) are claiming (or implying) that GPU core <-> edram BW is 256 GB/s.
Is it true? I don't think so.


Ok, let me try to justify MS's claim in the form of a question.

Let's say you split the chip in two: one part with the vertex shaders for T&L and one with the pixel shaders. If the second one had EDRAM, would you quibble about the available bandwidth?

All ATI has done is split the chip later in the pipeline, just before the AA expansion, Z logic and the ROPs. They split it there because there is less info to pass backwards and forwards; the fact remains the Z units and ROPs have a real 256 GB/s of bandwidth available.

Yes, there is a bus between the two chips that carries less than 256 GB/s, but that would also be true of a chip that split out the vertex work.

The only real problem I have with using this number is that it's only really meaningful if you actually are using all 256GB/s. If you are sending 32GB/s of data at the edram, but it still never surpasses say 128GB/s internally, it really doesn't matter that it can internally process 256GB/s because it will never realize that potential. The really important number in that case is the 32GB/s link to edram. The same goes for the PS3.

For this reason, I think trying to sum bus throughput (no matter which system you are talking about) is basically useless and misleading. Say the GPU had a 5 terabyte/s link to edram, but only specifically for the framebuffer, while it still had a 22GB/s link to main memory. Would we still try to make some kind of generic claim that the GPU is capable of transferring 5.022 terabytes of data per second, even though the number is basically meaningless?
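A rough sketch of this point in code (the expansion factors here are purely illustrative, not measured figures from any title): the headline internal number only matters if the traffic actually arriving over the external link expands enough inside the eDRAM to use it.

```python
# Illustrative only: shows why quoting the internal eDRAM figure in isolation
# (or summing it with other buses) can be misleading.

GPU_TO_EDRAM_GBPS = 32.0     # external link from the GPU core into the eDRAM die
EDRAM_INTERNAL_GBPS = 256.0  # internal bandwidth of the eDRAM's logic

def realized_internal_bw(incoming_gbps: float, expansion_factor: float) -> float:
    """Internal traffic generated by the incoming stream (AA expansion,
    Z read-test-write, blending), capped by the external link and by what
    the eDRAM logic can sustain."""
    incoming = min(incoming_gbps, GPU_TO_EDRAM_GBPS)
    return min(incoming * expansion_factor, EDRAM_INTERNAL_GBPS)

# A workload that only expands 4x internally never touches the 256 GB/s figure;
# the 32 GB/s link is what actually bounds it.
print(realized_internal_bw(32.0, 4.0))  # 128.0
# A worst-case 8x expansion (e.g. 4xAA with read-test-write) does saturate it.
print(realized_internal_bw(32.0, 8.0))  # 256.0
```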

Nite_Hawk
 
Nite_Hawk said:
The only real problem I have with using this number is that it's only really meaningful if you actually are using all 256GB/s. If you are sending 32GB/s of data at the edram, but it still never surpasses say 128GB/s internally, it really doesn't matter that it can internally process 256GB/s because it will never realize that potential. The really important number in that case is the 32GB/s link to edram. The same goes for the PS3.

It depends on where in the rendering pipeline the bottleneck is in a traditional GPU architecture. If the bottleneck is the framebuffer bandwidth for the majority of the time then ATI's solution will be superior.
 
nAo said:
ERP, I completely understand what ATI has done and I'm not criticizing it (hell, I think R500 is much more interesting than RSX).
What I'm saying is that they shouldn't imply the GPU <-> edram bandwidth is 256 GB/s, because it's not. That's all.

I'm going to disagree here.
You're right of course, the interconnect between the two chips isn't 256 GB/s, but the GPU in my book still contains things like the Z logic and combiners, and those do have 256 GB/s to the EDRAM. In fact those are the only parts that can access the EDRAM.

I think the biggest problem here is the term "smart memory", when really it's the back end of the chip with EDRAM.

Personally I think it's an interesting design.

The unified shaders are more interesting to me. I know >50% of our vertex time (and an unknown amount of our pixel time) went unused on NV2A; I'm interested to know how much of that we get back on a unified architecture.
 
PC-Engine said:
The only real problem I have with using this number is that it's only really meaningful if you actually are using all 256GB/s. If you are sending 32GB/s of data at the edram, but it still never surpasses say 128GB/s internally, it really doesn't matter that it can internally process 256GB/s because it will never realize that potential. The really important number in that case is the 32GB/s link to edram. The same goes for the PS3.

It depends on where in the rendering pipeline the bottleneck is in a traditional GPU architecture. If the bottleneck is the framebuffer bandwidth for the majority of the time then ATI's solution will be superior.

But that also depends on how well you've implemented other features. If nVidia has a better solution for reducing overdraw, then the edram solution might be overkill for their graphics system. Simply upping the throughput of the video memory might achieve the same effect at a lower cost in this case. This is where it gets tricky. What might be the best solution for ATI might not be the best solution for nVidia and vice versa. We need more information about what the real throughput requirements for the framebuffer are after overdraw, blending, AA filtering, and compression are factored in. Is the edram solution limited by its internal or external throughput?
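As a back-of-the-envelope way of framing that question, here is a sketch of how one might estimate raw framebuffer traffic; every parameter is a hypothetical knob, not a measured figure from any real title.

```python
def framebuffer_gbps(width, height, fps, overdraw, aa_samples,
                     bytes_per_sample=8, rmw_factor=2.0):
    """Rough GB/s of framebuffer traffic: each visible pixel is touched
    'overdraw' times, each touch involves 'aa_samples' color+Z samples of
    'bytes_per_sample' bytes, and Z-testing/blending implies a
    read-modify-write ('rmw_factor')."""
    samples_per_second = width * height * fps * overdraw * aa_samples
    return samples_per_second * bytes_per_sample * rmw_factor / 1e9

# 720p at 60fps, 3x overdraw, 4xAA -> roughly 10-11 GB/s of raw traffic with
# these particular numbers, before compression or early-Z rejection; heavy
# alpha blending and particle effects push it much higher.
print(framebuffer_gbps(1280, 720, 60, overdraw=3, aa_samples=4))
```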

Nite_Hawk
 
PC-Engine said:
The only real problem I have with using this number is that it's only really meaningful if you actually are using all 256GB/s. If you are sending 32GB/s of data at the edram, but it still never surpasses say 128GB/s internally, it really doesn't matter that it can internally process 256GB/s because it will never realize that potential. The really important number in that case is the 32GB/s link to edram. The same goes for the PS3.
It depends on where in the rendering pipeline the bottleneck is in a traditional GPU architecture. If the bottleneck is the framebuffer bandwidth for the majority of the time then ATI's solution will be superior.
No-one's arguing whether it's a good method or not! I think everyone who understands it agrees this is a good bandwidth-saving device. The complaint is with its description, and with MS including a summation that has no real-world interpretation at all; even then their summation was very incomplete. Even if taken as a graphical-pipeline measurement of bandwidth that excludes the local bandwidth available to the CPUs, if the Cell's SPEs are capable of contributing to the graphical rendering along with RSX, then by MS's measurements their local bandwidths should be added as well.

It's a stupid figure. MS (and ATI?) should have described it another way, e.g. "Use of eDRAM this way eliminates a 35/56 GB/s drain on main system RAM bandwidth" or whatever its advantage would be.
 
ERP said:
I'm going to disagree here.
You're right of course, the interconnect between the two chips isn't 256 GB/s, but the GPU in my book still contains things like the Z logic and combiners, and those do have 256 GB/s to the EDRAM. In fact those are the only parts that can access the EDRAM.
I don't understand what you're disagreeing with me on if you then say I'm right :)

Personally I think it's an interesting design.
I think this too.
The unified shaders are more interesting to me. I know >50% of our vertex time (and an unknown amount of our pixel time) went unused on NV2A; I'm interested to know how much of that we get back on a unified architecture.
Dunno about NV2A, but I believe an ideal, perfectly auto-balanced architecture would spank a traditional design with the same ALU count.
What we don't know is how much bigger a unified shading GPU is than a traditional GPU ;)
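A toy illustration of that claim, with made-up workload numbers and ALU counts (nothing here is taken from a real GPU): with a fixed vertex/pixel split, whichever pool a frame doesn't need sits idle, while a unified pool of the same total ALU count keeps everything busy.

```python
def frame_time_split(vertex_work, pixel_work, vertex_alus, pixel_alus):
    # The frame is done when the slower of the two fixed pools finishes.
    return max(vertex_work / vertex_alus, pixel_work / pixel_alus)

def frame_time_unified(vertex_work, pixel_work, total_alus):
    # An ideally auto-balanced unified pool just divides the total work.
    return (vertex_work + pixel_work) / total_alus

# Pixel-heavy frame: an 8+16 split vs 24 unified ALUs (same total count).
print(frame_time_split(vertex_work=100, pixel_work=1000, vertex_alus=8, pixel_alus=16))  # 62.5
print(frame_time_unified(vertex_work=100, pixel_work=1000, total_alus=24))               # ~45.8
```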
 
Shifty Geezer said:
It's a stupid figure. MS (and ATI?) should have described it another way, e.g. "Use of eDRAM this way eliminates a 35/56 GB/s drain on main system RAM bandwidth" or whatever its advantage would be.

Well, if M$ came out and said that the EDRAM saves 56GB/s of bandwidth, and pointed out that the system RAM supports 22.4GB/s, then people would subtract 56 from 22.4, get a negative number and go:

:!: UH :?: :!:

Bad marketing :devilish:

Jawed
 
nAo said:
ERP said:
I'm going to disagree here.
You're right of course, the interconnect between the two chips isn't 256 GB/s, but the GPU in my book still contains things like the Z logic and combiners, and those do have 256 GB/s to the EDRAM. In fact those are the only parts that can access the EDRAM.
I don't understand what you're disagreeing with me on if you then say I'm right :)

That's the problem with semantic arguments really.
The problem is we're arguing about language, not functionality ;)
The only point of contention is if the logic on the second chip can be considered part of the GPU or whether it's just clever RAM.
 
ERP said:
nAo said:
ERP said:
I'm going to disagree here.
You're right of course, the interconnect between the two chips isn't 256 GB/s, but the GPU in my book still contains things like the Z logic and combiners, and those do have 256 GB/s to the EDRAM. In fact those are the only parts that can access the EDRAM.
I don't understand what you're disagreeing with me on if you then say I'm right :)

That's the problem with semantic arguments really.
The problem is we're arguing about language, not functionality ;)
The only point of contention is if the logic on the second chip can be considered part of the GPU or whether it's just clever RAM.

Or somewhere in between.

Nite_Hawk
 
Jawed, wouldn't you add the bandwidth saved, not subtract it? i.e. saving 56GB/sec + 22.4GB/sec means you have the equivalent of 78.4GB/sec of bandwidth.
 
Rockster said:
Jawed, wouldn't you add the bandwidth saved, not subtract it? i.e. saving 56GB/sec + 22.4GB/sec means you have the equivalent of 78.4GB/sec of bandwidth.

You and I would, but your man in the street would go "yeah, wow, 56GB/s was saved out of a total of 22.4GB/s - oh hang on a second..."

When you "save" something like this, it's presumed it was using bandwidth out of the original 22.4GB/s. "Save" just implies a straight-forward subtraction.

"Hey, we saved 10GB/s out of the 22.4GB/s we had." Now that is confusing.

Jawed
 
PC-Engine said:
It depends on where in the rendering pipeline the bottleneck is in a traditional GPU architecture. If the bottleneck is the framebuffer bandwidth for the majority of the time then ATI's solution will be superior.
By the way, where is the main bottleneck for current PC graphics processors? It seems that if ATI's solution were really speeding things up in real-world apps, they would be trumpeting it, since they have beta kits out now.
 
ralexand said:
PC-Engine said:
It depends on where in the rendering pipeline the bottleneck is in a traditional GPU architecture. If the bottleneck is the framebuffer bandwidth for the majority of the time then ATI's solution will be superior.
By the way, where is the main bottleneck for current PC graphics processors? It seems that if ATI's solution were really speeding things up in real-world apps, they would be trumpeting it, since they have beta kits out now.

Based on what Dave was saying, MS embargoed all of their partners from talking about the technology in the X360 because... well... it isn't their technology, it's MS's, but even they don't really understand it!
 
Nite_Hawk said:
The only real problem I have with using this number is that it's only really meaningful if you actually are using all 256GB/s. If you are sending 32GB/s of data at the edram, but it still never surpasses say 128GB/s internally, it really doesn't matter that it can internally process 256GB/s because it will never realize that potential.

16 zixels x 4xFSAA x 2 (read-test-write) uses 256GB/s (4 bytes/subpixel @ 500MHz).
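Spelled out (the 500 MHz clock and 4 bytes/subpixel are from the line above; the 16 zixels/clock corresponds to the commonly quoted 8 ROPs running at double-Z rate):

```python
zixels_per_clock = 16       # 8 ROPs at double-Z rate
aa_samples = 4              # 4x FSAA expansion
accesses = 2                # read-test-write
bytes_per_sample = 4
clock_hz = 500e6

print(zixels_per_clock * aa_samples * accesses * bytes_per_sample * clock_hz / 1e9)
# -> 256.0 GB/s
```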

Interestingly, the X360 is set up for worst-case bandwidth usage, much like the current PS2, whereas the PS3 is set up for average bandwidth usage, like the current XBOX.

Cheers
Gubbi
 
All of the ROP units are set for 4x FSAA, so they are all quadrupled up (including the double Z, so there are 64 Z samples per clock with 4x MSAA).
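For reference, a quick breakdown of that figure (assuming the commonly quoted 8 ROPs and 500 MHz core clock for Xenos):

```python
rops = 8
double_z = 2        # Z-only rate is doubled
msaa_samples = 4    # 4x MSAA expansion
clock_hz = 500e6

z_per_clock = rops * double_z * msaa_samples
print(z_per_clock)                     # 64 Z samples per clock
print(z_per_clock * clock_hz / 1e9)    # 32.0 -> the 32 gigazixels/s quoted later in the thread
```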
 
All that edram bandwidth is only helpful if you're actually going to use it. It will only be an advantage with lots of alpha-blended polygons and short shaders. If the shader is of any decent length, you won't ever use that much bandwidth.

Recall that the X360 GPU's edram is also a disadvantage. A 720p image with 2xAA will NOT fit, so it can't be rendered in one pass. The scene has to be tiled and resubmitted. There is some built-in hardware support to assist with this task, but it's definitely far from "free".
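A quick size check of that claim (the 10 MB eDRAM figure is the publicly quoted Xenos spec; 8 bytes per sample for color plus Z/stencil is an assumption here):

```python
def framebuffer_mib(width, height, aa_samples, bytes_per_sample=8):
    # Color + Z/stencil for every AA sample has to live in the eDRAM.
    return width * height * aa_samples * bytes_per_sample / (1024 * 1024)

print(framebuffer_mib(1280, 720, aa_samples=2))  # ~14.1 MiB -> doesn't fit in 10 MB
print(framebuffer_mib(1280, 720, aa_samples=1))  # ~7.0 MiB  -> fits without AA
```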
 
fresh said:
All that edram bandwidth is only helpful if you're actually going to use it. It will only be an advantage with lots of alpha-blended polygons and short shaders. If the shader is of any decent length, you won't ever use that much bandwidth.
I'd rather have too much than too little!
 
Shifty Geezer said:
fresh said:
All that edram bandwidth is only helpful if you're actually going to use it. It will only be an advantage with lots of alpha-blended polygons and short shaders. If the shader is of any decent length, you won't ever use that much bandwidth.
I'd rather have too much than too little!

Even if it means fewer resources (in some abstract sense) for other things?

Nite_Hawk
 
DaveBaumann said:
All of the ROP units are set for 4x FSAA, so they are all quadrupled up (including the double Z, so there are 64 Z samples per clock with 4x MSAA).

Is this confirmed? It's quite an interesting bit of info. 32 gigazixels/s is quite impressive. But if a quadrupled-up ROP can write 4 AA samples per clock (each having its own Z), how do you arrive at 8 Z samples per clock when color writes are disabled? I assume somehow the color write logic is borrowed to write the extra Z, but why isn't it 128 Z per clock then?
 