AMD/ATI Evergreen: Architecture Discussion

Notice that a second card (for both NV and ATI) scales almost exactly 2x compared to its "base" card. I can't say what that tells us, but the fact that both vendors' single cards are bottlenecked at precisely the same score, and both vendors' X2 cards at precisely 200% of that score, has to be an obvious pointer to something...
Obviously that tells you that the OS/benchmark/CPU are not limits in any way.

The 4890 and 5870 have the same clock speed. These tests have a throughput limit somewhere in the geometry pipeline. That the GTX285 is the same speed as those two despite having a lower core clock is just a coincidence, as there are all sorts of things like vertex/primitive cache sizes/policies that can hurt one architecture or another. The 3DMark Vantage GPU Physics test is another one where the bottleneck is not in the stream processors but in the data flow around them and the fixed-function units.
 
I expect everyone is doing rasterization through parallel intersection tests of tiles nowadays, only using the traditional scanline algorithm to determine covered tiles.
Yup. If a GPU processes two such tiles in parallel, it doesn't have two rasterizers, it's just an implementation detail of a single, faster throughput rasterizer. It still only rasterizes at a max rate of one triangle per clock.

I think this is where all the confusion is coming from.
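
For readers who want the idea above in concrete form, here is a minimal sketch of tile-based rasterization with per-pixel edge tests. It is not how any particular GPU does it: the 8x8 tile size, the bounding-box tile walk (standing in for the scan-conversion step) and all function names are assumptions made for illustration, and no top-left fill rule is applied.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function: positive when p lies on one side of a->b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covered_tiles(tri: Triangle, tile: int = 8) -> List[Tuple[int, int]]:
    """Conservative list of candidate tiles (bounding box only; an assumption,
    real hardware walks the triangle to find covered tiles)."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    tx0, tx1 = math.floor(min(xs) / tile), math.floor(max(xs) / tile)
    ty0, ty1 = math.floor(min(ys) / tile), math.floor(max(ys) / tile)
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def rasterize_tile(tri: Triangle, tx: int, ty: int, tile: int = 8) -> List[Tuple[int, int]]:
    """Test every pixel centre in one tile against the three edge equations.
    Each test is independent, so hardware can evaluate them in parallel."""
    v0, v1, v2 = tri
    area = edge(v0, v1, v2)
    if area == 0:
        return []          # degenerate triangle covers nothing
    if area < 0:
        v1, v2 = v2, v1    # normalise winding so "inside" means all edges >= 0
    hits = []
    for y in range(ty * tile, (ty + 1) * tile):
        for x in range(tx * tile, (tx + 1) * tile):
            p = (x + 0.5, y + 0.5)
            if edge(v0, v1, p) >= 0 and edge(v1, v2, p) >= 0 and edge(v2, v0, p) >= 0:
                hits.append((x, y))
    return hits


def rasterize(tri: Triangle, tile: int = 8) -> List[Tuple[int, int]]:
    """One triangle in, covered pixels out. Tiles could be handed to any number
    of parallel testers without changing the result."""
    pixels = []
    for tx, ty in covered_tiles(tri, tile):
        pixels.extend(rasterize_tile(tri, tx, ty, tile))
    return pixels
```

In this sketch, handing the tiles from `covered_tiles` to two testers instead of one changes how fast the pixels come out, not which triangle is being processed, which is the distinction being argued over in this thread.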
 
Rys, a ray triangle intersection. It basically comes down to the Pixel Planes method (although they don't specifically call out the equivalence with ray triangle intersection tests, but it's there).

BTW are you saying there are not two distinct/separated areas on the die doing rasterization? (Moved this from the other thread.)
 
Cool, I'll have a read (surprised I've never read it before actually :oops:). And yeah, for Cypress I'm saying there's just one block/area/unit/rasteriser.
 
Dave is just pretty much insisting on dual rasterizers... But do we also have a claim from AMD of being able to consume more than one (already set up) triangle, or is it just some made-up explanation for it?

Maybe there ARE two distinct areas working on different tiles, if the thing about the two dispatch processors etc is correct. But purely for layout reasons; to keep the "fragment producers" close to where their output is needed. So from a functional viewpoint just one rasterizer (with fragments ending up in two different buckets).

mfa, maybe there was one between the fermi papers from nvidia :runaway:
 
The thing is, there is a lot of data expansion in the rasterizer ... with two rows of SIMD engines it makes an awful lot of sense to divide up the rasterizer in two, one for each row, and save yourself a whole lot of wires.
 
Euh, you want to work on quads, not rows. There's an ATI patent somewhere for walking a triangle along its longest dimension, either horizontal or vertical, two rows/columns wide.

Jawed
 
Oh hey, Espacenet finally allows you to conveniently download PDFs (probably years ago, but I have been avoiding them for ages). Guess it's time to stop getting datamined on the other free patent sites.
 
:LOL: I switched to EP years ago because of the PDFs - though it lags the US for US documents a little bit.

Psycho, :LOL::LOL: that's funny, whoops.

Jawed
 
It has to be said, you're all scoring very low marks for paying attention! :p

http://forum.beyond3d.com/showpost.php?p=1343440&postcount=436
That's not two rasterizers. That's two out of three parts of a rasterizer moved to different parts of a chip. Scan conversion of a triangle to generate tiles and preparing the edge equations (as in the paper MfA pointed to) are both part of rasterization and are part of the same one tri per clock unit. Splitting the inherently parallel tile rasterization stage of an 8-quad rasterizer into two 4-quad groups does not mean Cypress has two rasterizers. If it did, you could call it 8 rasterizers if you wanted to.

It rasterizes one triangle at a time at a max rate of one triangle per clock and 8-quads per clock. Thus it's a single rasterizer.
 
No, it has a single primitive engine that feeds into two separate (physical) rasterisers. They can be operating on two different tris because of buffering.
 
According to Scott Wasson on the subject -

"Sharp-eyed readers may recall that AMD claimed it had dual rasterizers upon the launch of the Cypress GPU in the Radeon HD 5870. Based on that, we expected Cypress to be able to exceed the one polygon per cycle limit, but its official specifications instead cite a peak rate of 850 million triangles per second - one per cycle at its default 850MHz clock speed. We circled back with AMD to better understand the situation, and it's a little more complex than was originally presented. What Cypress has is dual scan converters, but it doesn't have the setup or primitive interpolation rates to support more than one triangle per cycle of throughput. As I understand it, the second scan converter is an optimization that allows the GPU to push through more pixels, in cases where the polygons are large enough."
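
The arithmetic behind that quote can be sketched as a toy throughput model. The 850MHz clock and the one triangle per clock setup limit come from the quote; the 4 quads per clock per scan converter figure is taken from the "two 4-quad groups" description earlier in the thread and is an assumption here, as are the function names and the idea that a triangle simply costs `ceil(quads / quads_per_clock)` cycles.

```python
import math

CLOCK_HZ = 850e6      # default Cypress clock, from the quoted text
TRIS_PER_CLOCK = 1    # setup/primitive limit, from the quoted text
QUADS_PER_SC = 4      # assumed: 8 quads/clock split across two scan converters


def peak_tri_rate(clock_hz: float = CLOCK_HZ) -> float:
    """Upper bound on triangles per second imposed by setup alone."""
    return clock_hz * TRIS_PER_CLOCK


def cycles_per_triangle(quads: int, scan_converters: int) -> int:
    """Toy model: a triangle takes at least one clock (setup limit), and longer
    if it covers more quads than the scan converters can emit per clock."""
    return max(1, math.ceil(quads / (QUADS_PER_SC * scan_converters)))


def tri_rate(quads: int, scan_converters: int, clock_hz: float = CLOCK_HZ) -> float:
    """Triangles per second for triangles that each cover `quads` quads."""
    return clock_hz / cycles_per_triangle(quads, scan_converters)


if __name__ == "__main__":
    for quads in (1, 4, 8, 64):
        one = tri_rate(quads, scan_converters=1)
        two = tri_rate(quads, scan_converters=2)
        print(f"{quads:3d} quads/tri: 1 SC {one / 1e6:7.1f} Mtri/s, 2 SC {two / 1e6:7.1f} Mtri/s")
```

Under these assumptions a second scan converter does nothing for triangles of 4 quads or fewer, which stay pinned at the setup limit, and doubles the rate once triangles are large, which matches the "in cases where the polygons are large enough" wording.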
 
Where does the scan conversion of a triangle into tiles occur? In the rasterizer or the primitive engine?
It's actually the other way around. Cypress has two Scan Conversion blocks and the rasterisation occurs in those. Both SCs sit after the Primitive Assembler.
 
Okay, let me be a bit more clear. Are the SC blocks fed tiles and edges to test against, or just raw (duplicated) triangles?
 
The diagrams for Cypress make me wonder if there is a pathological case where a screen can be filled with tiny triangles that fall on every other screen tile. If the rasterizers serve alternating tiles, and each rasterizer feeds a separate dispatch processor that controls one bank of SIMDs, it could be possible to cut shader throughput in half.

There's a write crossbar on the shader output portion, so I'm not sure how ROP utilization may be affected.
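
The pathological case speculated about above can be illustrated with a small load-balance model. Everything here is hypothetical: the checkerboard mapping of screen tiles to rasterizer/SIMD banks is an assumption about what "alternating tiles" means, each tiny triangle is assumed to land in exactly one tile, and ROP behaviour is not modelled at all.

```python
from typing import Iterable, List, Tuple


def bank_for_tile(tx: int, ty: int, banks: int = 2) -> int:
    """Assumed checkerboard mapping of screen tiles to rasterizer/SIMD banks."""
    return (tx + ty) % banks


def bank_load(tiles: Iterable[Tuple[int, int]], banks: int = 2) -> List[int]:
    """Count how many tiny triangles (one per listed tile) each bank receives."""
    load = [0] * banks
    for tx, ty in tiles:
        load[bank_for_tile(tx, ty, banks)] += 1
    return load


def utilization(load: List[int]) -> float:
    """Fraction of total bank capacity used while the busiest bank is working."""
    busiest = max(load)
    if busiest == 0:
        return 0.0
    return sum(load) / (len(load) * busiest)


if __name__ == "__main__":
    grid = [(tx, ty) for ty in range(8) for tx in range(8)]
    every_tile = grid
    every_other_tile = [(tx, ty) for tx, ty in grid if (tx + ty) % 2 == 0]
    print(bank_load(every_tile), utilization(every_tile and bank_load(every_tile)))
    print(bank_load(every_other_tile), utilization(bank_load(every_other_tile)))
```

With triangles on every tile both banks get equal work; with triangles only on tiles of one checkerboard colour, one bank gets everything and the model reports half utilization, which is the halved shader throughput described in the post.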
 