Next NV High-end

caboosemoose said:
Interesting. Well, in that case, perhaps a G70 clocked over 500MHz might be on the cards.

That's what way too many thought about the NV40.
 
Ailuros said:
Then look back on page 2 at caboosemoose's comment.

The elliptical unstated purpose of the exercise was to wonder if the R580 some of us are wish-fulfilling on might be too big for 90nm at this stage in its life-cycle as a process. But I seem to remember you are a proponent of that point, so hopefully you got that.
 
Xenos fits what looks like 64 unified shader pipes into 232m transistors.

[Image: b3d34.jpg]

Each array of 16 pipes seems to be split into two halves. Either side of the register file and sequencer (+ other gubbins)?

The assumption that 1 array is given up for yield seems reasonable. The area I've outlined above for the US (should have called them USA, now I think about it, Unified Shader Array) is about 32% of the die, so one US is about 8%.
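The area split above can be written out as a quick back-of-the-envelope sketch. The 32% figure and the 16-pipe arrays come from the post; the count of four equal arrays and the helper name `area_per_array` are assumptions made only for illustration.

```python
# Figure from the post: the outlined unified shader area is ~32% of the die.
SHADER_AREA_FRACTION = 0.32
# Assumption: four equal arrays of 16 pipes each (64 pipes on the die).
ARRAY_COUNT = 4
PIPES_PER_ARRAY = 16


def area_per_array(total_fraction: float, arrays: int) -> float:
    """Split the outlined area evenly across the arrays."""
    return total_fraction / arrays


per_array = area_per_array(SHADER_AREA_FRACTION, ARRAY_COUNT)
# One array given up for yield leaves three active arrays.
active_pipes = (ARRAY_COUNT - 1) * PIPES_PER_ARRAY

print(f"one array is roughly {per_array:.0%} of the die")
print(f"active pipes with one array disabled: {active_pipes}")
```

With one array disabled, the active pipe count comes out at the 48 quoted in the specs.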

As far as I can tell, each of those little black rectangles (e.g. above "3D Core" in two areas of 4x4) is 16KB.

Jawed
 
Perhaps, but they also offload all of their ROP hardware onto another chip, and the Xenos' pipelines appear to be much simpler than today's pipelines.

That said, you can do the splitting somewhat differently from what you have there and see three groups on the left side, with the two on the right interpreted as the texture pipelines.
 
If you count 20m transistors for ROP functions on the other die, for 8 ROPs, double that, and add a little more to include compression hardware - say 60m transistors - then Xenos's total logic for 64 pipes, 16 texture pipes and a hypothetical 16 ROPs would be in the region of 300m transistors.
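As a rough sketch, the tally above works out as follows. All inputs are the guesses quoted in the thread (232m for the main die, 20m per 8 ROPs, 60m for 16 ROPs plus compression); the function name `hypothetical_total` is made up for this example.

```python
# All figures in millions of transistors, taken from the thread's estimates.
XENOS_MAIN_DIE = 232        # 64 shader pipes + 16 texture pipes
ROPS_8 = 20                 # guess for the 8 ROPs on the other die
ROPS_16_WITH_COMPRESSION = 60  # "double that, and add a little more"


def hypothetical_total(main_die: int, rops_8: int, rops_16_total: int) -> int:
    """Main die plus a hypothetical 16-ROP block with compression hardware."""
    doubled = 2 * rops_8
    compression_extra = rops_16_total - doubled  # the "little more" in the post
    return main_die + doubled + compression_extra


total = hypothetical_total(XENOS_MAIN_DIE, ROPS_8, ROPS_16_WITH_COMPRESSION)
print(f"hypothetical total: about {total}m transistors")
```

That lands a little under 300m, which is consistent with "in the region of 300m".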

US pipes in Xenos are simpler than conventional pipes because of the separation of texturing functionality - so assuming that R520/580 use the same separation - then an SM3 pipeline in Xenos is prolly about the same size as an SM3 fragment shader pipeline in R520/580.

I estimate there's about 16-18m transistors in each USA in Xenos.
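One way to sanity-check that figure is to scale the 232m transistor count by the roughly 8% of die area per array mentioned earlier in the thread. This is only a sketch: it assumes transistor density is uniform across the die, which is not stated anywhere in the thread and is unlikely to hold exactly (memory and logic pack differently).

```python
# Figures from the thread.
TOTAL_TRANSISTORS_M = 232   # millions, whole Xenos main die
ARRAY_AREA_FRACTION = 0.08  # one unified shader array, share of die area


def transistors_for_area(total_m: float, area_fraction: float) -> float:
    """Assumption: transistors are spread evenly over the die area."""
    return total_m * area_fraction


per_array_m = transistors_for_area(TOTAL_TRANSISTORS_M, ARRAY_AREA_FRACTION)
print(f"roughly {per_array_m:.1f}m transistors per array")
```

Under that assumption the result is in the same ballpark as the 16-18m estimate, sitting slightly above its upper end.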

Presumably as the number of USAs scales, so does the register file and scheduler complexity (because the GPU needs to handle more objects in flight to stay efficient)...

So roughly equating a USA in Xenos to an array in R520, and being generous - I'm guessing an extra 70m transistors for R580.

Jawed
 
Chalnoth said:
That said, you can do the splitting somewhat differently from what you have there and see three groups on the left side, with the two on the right interpreted as the texture pipelines.
Can't see it.

Jawed
 
Jawed said:
Can't see it.

Jawed
The second one would be half above the IO interface, half below. Another way to look at it is that you can split the shader area you highlighted into six equal pieces. Put two into each shader unit and voila.
 
Jawed said:
So roughly equating a USA in Xenos to an array in R520, and being generous - I'm guessing an extra 70m transistors for R580.

Thanks. One of the things I really appreciate about your participation here is you're always willing to take a knowledgeable swing at the high hard one (err, baseball reference) based on the best information available to us as you understand it. :D

Edit: <looking down thread> Well, I didn't include "never wrong" in your virtues! :LOL:
 
Jawed said:
If you count 20m transistors for ROP functions on the other die, for 8 ROPs, double that, and add a little more to include compression hardware - say 60m transistors - then Xenos's total logic for 64 pipes, 16 texture pipes and a hypothetical 16 ROPs would be in the region of 300m transistors.
Ah, but ATI was able to make those ROPs simpler than the ones that you see in today's hardware, because they have much higher bandwidth to the memory that they talk to, so they don't have to worry about any sort of compression, for example.

US pipes in Xenos are simpler than conventional pipes because of the separation of texturing functionality - so assuming that R520/580 use the same separation - then an SM3 pipeline in Xenos is prolly about the same size as an SM3 fragment shader pipeline in R520/580.
It's not just that, though. It appears that they have done away with the ALU + mini ALU structure (though I'm not 100% certain on that; either way, the next part is true), and their pipelines are certainly less complex, ALU-wise, than nVidia's G70, which has two full ALUs in each pipeline.

And finally, if the "3" in the description of the R580's pipelines is an increased number of ALUs per pipeline, then there really isn't any basis for comparison between the Xenos and R580.
 
Jawed said:
Yeah I'm convinced, Chalnoth. Brilliant.

Jawed

Based on your dissection of the Xenos diagram, that would give the Xenos 64 pipes instead of the 48 reported in the specs?

Is there a redundant array to improve yields?

J
 
As I said earlier:

The assumption that 1 array is given up for yield seems reasonable. The area I've outlined above for the US (should have called them USA, now I think about it, Unified Shader Array) is about 32% of the die, so one US is about 8%.

Jawed
 
Jawed said:
As I said earlier:

The assumption that 1 array is given up for yield seems reasonable. The area I've outlined above for the US (should have called them USA, now I think about it, Unified Shader Array) is about 32% of the die, so one US is about 8%.

Jawed

Apologies, didn't read back that far; got too interested in a 64-pipe Xenos :)

Is there a chance that, with the 'amazing' yields ATI has been bragging about, we could see a 64-pipe Xenos? Or is it usual practice to design the rest of the chip to only account for/support 48?

J
 
I suspect M$ would choose yield. The extra 16 pipes prolly wouldn't balance very well if the architecture is scaled for 48.

Jawed
 
Jawed said:
I suspect M$ would choose yield. The extra 16 pipes prolly wouldn't balance very well if the architecture is scaled for 48.

Jawed

That's what I figured. Though if the rest of the chip could support it, a 64-pipe Xenos running at 550MHz sounds like fun, doesn't it? Something like a 40% increase in shader ops. :)
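A quick sketch of that shader-op comparison, treating throughput as simply pipes times clock. The 48 and 64 pipe counts and the 550MHz clock are from the thread; the 500MHz baseline clock is an assumption (the thread does not state it), and the function name is hypothetical.

```python
BASE_PIPES = 48
BASE_CLOCK_MHZ = 500   # assumption: baseline clock, not given in the thread
WHAT_IF_PIPES = 64
WHAT_IF_CLOCK_MHZ = 550


def relative_shader_ops(pipes: int, clock_mhz: float,
                        base_pipes: int, base_clock_mhz: float) -> float:
    """Ratio of peak shader ops, assuming ops scale with pipes * clock."""
    return (pipes * clock_mhz) / (base_pipes * base_clock_mhz)


ratio = relative_shader_ops(WHAT_IF_PIPES, WHAT_IF_CLOCK_MHZ,
                            BASE_PIPES, BASE_CLOCK_MHZ)
pipes_only = relative_shader_ops(WHAT_IF_PIPES, BASE_CLOCK_MHZ,
                                 BASE_PIPES, BASE_CLOCK_MHZ)

print(f"pipes only: {pipes_only - 1:.0%} more peak shader ops")
print(f"pipes and clock: {ratio - 1:.0%} more peak shader ops")
```

Under these assumptions the extra pipes alone give about a third more peak ops, and adding the clock bump pushes it to roughly 47%, so "something like 40%" is the right neighbourhood. This ignores whether the rest of the chip could keep the extra pipes fed.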

J
 
Jawed said:
Yeah I'm convinced, Chalnoth. Brilliant.
Well, just bear in mind that all I'm trying to say is that there are so many variables here, that you should expect to be spectacularly wrong about at least one of the speculations you've made in this thread :)
 
I'm quite happy to be spectacularly wrong. The difference is, you're not contributing to the discussion - you're just half-reading what I'm saying and making gestures of an argument.

Jawed
 
The other possibility is that one of those highlighted US areas is dedicated to parameter interpolation. A diagram I saw somewhere indicated that the setup unit passes barycentric coordinates to the interpolation unit (which provides 16 interpolated attributes per clock).
Since interpolation with barycentric coordinates is basically just dot-products, the ALU structure for the shader units and interpolation unit could well be the same. That would make the large cache (or register?) blocks above the "3D Core" text the vertex cache.
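To illustrate why barycentric interpolation maps onto the same ALU structure as shading, here is a minimal sketch in which interpolating one scalar attribute is a single 3-component dot product. The function names are made up for this example, and perspective correction is deliberately left out.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, the operation a shader ALU is built around."""
    return sum(x * y for x, y in zip(a, b))


def interpolate(bary: Sequence[float], attr_at_vertices: Sequence[float]) -> float:
    """Interpolate one scalar attribute across a triangle.

    bary holds the barycentric weights (b0, b1, b2), which sum to 1.
    attr_at_vertices holds the attribute value at each of the three vertices.
    """
    return dot(bary, attr_at_vertices)


# At vertex 0 the weights are (1, 0, 0), so the result is that vertex's value.
print(interpolate((1.0, 0.0, 0.0), (5.0, 7.0, 9.0)))
# An interior point blends all three vertex values.
print(interpolate((0.25, 0.25, 0.5), (4.0, 8.0, 2.0)))
```

A vector attribute such as a colour or texture coordinate would just repeat the same dot product per component.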

Also, given the large area of one of the blocks marked US, doesn't it make more sense to provide redundancy within each block? If you count the small rectangles (they look like one of the larger rectangles split in two) inside each US block, you'll notice there are 9. Assuming that each of these smaller rectangles is 2 register files, perhaps each US shader unit actually has 18 pipes.
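The counting argument above, written out as a sketch. The 9 rectangles and 2 register files per rectangle are the post's reading of the die shot; the 16 active pipes per block and the three blocks (48 pipes in the specs) are assumptions carried over from the rest of the thread.

```python
RECTANGLES_PER_BLOCK = 9     # small rectangles counted in each US block
FILES_PER_RECTANGLE = 2      # assumption from the post: 2 register files each
ACTIVE_PIPES_PER_BLOCK = 16  # assumption: pipes that need to work per block
BLOCKS = 3                   # assumption: 48 spec pipes / 16 per block


def physical_pipes(rectangles: int, files_per_rectangle: int) -> int:
    """One pipe per register file, following the post's guess."""
    return rectangles * files_per_rectangle


per_block = physical_pipes(RECTANGLES_PER_BLOCK, FILES_PER_RECTANGLE)
spares_per_block = per_block - ACTIVE_PIPES_PER_BLOCK

print(f"physical pipes per block: {per_block}")
print(f"spare pipes per block: {spares_per_block}")
print(f"physical pipes on die: {per_block * BLOCKS}, "
      f"active: {ACTIVE_PIPES_PER_BLOCK * BLOCKS}")
```

On this reading each block carries two spare pipes, rather than the whole chip carrying one spare array.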

[edited for typos]
 