AMD: R9xx Speculation

Interesting, so 32nm cancellation mainly affected Caymans?

The problems with 40G (TSMC warned about them in early 2008, according to Anand's article) and the cancellation of 32nm must have changed both IHVs' roadmaps quite a bit.

Who knows whether AMD cut Cypress down somewhere in 2008 to end up at 334mm2; its successor on a hypothetical 32nm node might have been another notch above where Cayman is today. But those are all theoretical "might have beens"; either way, we have to live with whatever is possible today.
 
It shouldn't be, no. But Cypress had features cut due to the 40nm fuck-up...
Read my post a little more carefully, Jawed. Load balancing DS across SIMDs is virtually identical to what Xenos onwards does with regular geometry and the VS. The only difference is fetching of control points.

So what is the mechanism by which an off-chip buffer increases tessellation performance (or allows it to increase along with other changes for better scaling in the architecture)?
It bypasses the bottleneck of the on-chip pipeline, whatever that is.

because a single SIMD is the only place those vertices can go (according to the locked HS/DS theory).
And that theory seems sketchy to me, because:
1. It runs counter to everything that the unified shader architecture is about
2. The solution is so simple: Write control points to GDS, and now you can use any free SIMD.
3. Simple tests like Damien's wouldn't result in so few triangles per clock

We can test it pretty easily, though. Run a test with a high TF, and use a long DS. If you're right, then performance will be very predictable as the number of verts/clk output by one SIMD running the DS.
6 clocks (you later adjusted to 6.5 clocks) is so slow, even at LOD 25, that it can't get any slower? A comparison with Juniper and Redwood could be useful, as only math would vary.
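The single-SIMD prediction above can be put as a tiny throughput model. This is only a sketch under assumed Cypress parameters (64-thread wavefronts on a 16-wide VLIW5 SIMD, so each instruction bundle occupies the ALUs for 4 clocks); `ds_bundles`, the DS length in VLIW bundles, is a hypothetical input, not a measured value:

```python
# If only one SIMD can run the DS, domain-shader vertex throughput is capped by
# that SIMD alone. Assumed Cypress parameters: 64-thread wavefront, 16 VLIW5
# units per SIMD, so a wavefront takes 4 clocks per instruction bundle.
WAVEFRONT = 64
CLOCKS_PER_BUNDLE = 4  # 64 threads / 16 VLIW units

def single_simd_verts_per_clk(ds_bundles):
    """Predicted DS vertices per clock if one SIMD handles the whole DS load."""
    return WAVEFRONT / (ds_bundles * CLOCKS_PER_BUNDLE)

# A longer DS should cut throughput in exact proportion, which is the testable part:
for n in (4, 8, 16, 32):   # hypothetical DS lengths in VLIW bundles
    print(f"{n:2d} bundles -> {single_simd_verts_per_clk(n):.2f} verts/clk")
```

If the locked-SIMD theory is right, measured verts/clk should track this inverse-linear curve as the DS grows; if DS work is balanced across SIMDs, it shouldn't.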
I was asking if you had any data performance vs. TF.
 
I know people here aren't that interested in the lower-end offerings, but are there any leaks regarding the 56xx/55xx, known as Turks?
I'm interested in those lower-end derivatives; if the naming is consistent, their performance should come close to the current HD57xx.

I wonder if those cards could be closer to Cayman than Barts.
 
I know people here aren't that interested in the lower-end offerings, but are there any leaks regarding the 56xx/55xx, known as Turks?
I'm interested in those lower-end derivatives; if the naming is consistent, their performance should come close to the current HD57xx.

I wonder if those cards could be closer to Cayman than Barts.
I think it's quite likely these will also be VLIW-4 and have Cayman's architecture changes (whatever those are...). Why? Two reasons, mostly: 1) they launch later, and 2) Barts is a somewhat downscaled Cypress with cut features and some tweaks (talking about the shader core). But downscaling doesn't really make sense for the other chips: they lack some of the features that were cut from Cypress in the first place (like DP and the second CrossFire connector), and just cutting SIMDs would only make them slower without making them a whole lot smaller (they're pretty small anyway). So they'd be pretty much the same chips apart from the display/UVD stuff, which sounds a bit pointless to me. I haven't seen any real rumors about these chips, though - then again, I haven't seen anything reliable for Cayman either...
 
If Nvidia plans to launch the 580 this month (rumored to be Nov 9) shouldn't we be seeing benchmark leaks of the 6970 soon?
 
If Nvidia plans to launch the 580 this month (rumored to be Nov 9) shouldn't we be seeing benchmark leaks of the 6970 soon?

Maybe AMD can push 5970s against the GTX580? After all, that card was devised to fight the 512CC GTX380 way back when.
 
Read my post a little more carefully, Jawed. Load balancing DS across SIMDs is virtually identical to what Xenos onwards does with regular geometry and the VS. The only difference is fetching of control points.
And there's no evidence that Cypress is load-balancing DS dynamically. Cypress easily has the theoretical capability to exceed GF100 on DS math in absolute terms, particularly since DS is easily vec2 at minimum, where NVidia's "scalar sequential" ALU architecture holds no advantage.
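As a sanity check on the "absolute terms" claim, here's a back-of-envelope sketch, assuming the reference clocks (850 MHz HD5870 shader clock, 1401 MHz GTX480 hot clock) and counting one ALU op per lane per clock:

```python
# Back-of-envelope peak ALU throughput for the DS-math comparison. Assumed
# reference clocks: HD5870 shader clock 850 MHz, GTX480 hot clock 1401 MHz.
cypress_lanes, cypress_clk = 1600, 850e6   # 20 SIMDs x 16 VLIW5 units x 5 lanes
gf100_lanes, gf100_clk = 480, 1401e6       # 480 scalar CUDA cores

cypress_peak = cypress_lanes * cypress_clk   # ops/s at full VLIW packing
gf100_peak = gf100_lanes * gf100_clk         # scalar issue, ~full utilisation

# VLIW slot utilisation Cypress needs just to match GF100's peak:
break_even_util = gf100_peak / cypress_peak
print(f"Cypress peak {cypress_peak/1e9:.0f} Gops/s, GF100 peak {gf100_peak/1e9:.0f} Gops/s")
print(f"Cypress matches GF100 at ~{break_even_util:.0%} VLIW utilisation")
```

Under these assumptions Cypress only needs the compiler to fill about half the VLIW5 slots (roughly 2.5 of 5, so vec2-ish packing) to match GF100's peak, and full packing gives about 2x.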

It bypasses the bottleneck of the on-chip pipeline, whatever that is.
Buffering or some other bottleneck?

And that theory seems sketchy to me, because:
1. It runs counter to everything that the unified shader architecture is about
The design is a kludge founded upon R600, when TF went up to 16, in an era when there were only 4 SIMDs. "DS" (VS, effectively) back then wouldn't have occupied all the SIMDs - it probably only occupied 1.
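For scale on why TF=16 mattered, the amplification per patch can be sketched as follows (assuming uniform integer partitioning on a triangle domain, where the counts are the standard formulas for a uniformly subdivided triangle):

```python
# Amplification from uniformly tessellating a triangle patch (assuming integer
# partitioning with the same factor on all edges): a patch with factor f expands
# into f^2 triangles and (f+1)(f+2)/2 domain-shader vertices.
def tri_patch_amplification(f):
    verts = (f + 1) * (f + 2) // 2
    tris = f * f
    return verts, tris

for f in (1, 4, 16):
    v, t = tri_patch_amplification(f)
    print(f"TF={f:2d}: {v:4d} DS verts, {t:4d} triangles per patch")
```

At TF=16 a single patch already yields 153 DS vertices and 256 triangles, so one SIMD's worth of DS throughput would have looked ample back in the 4-SIMD era.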

2. The solution is so simple: Write control points to GDS, and now you can use any free SIMD.
Barts supposedly uses "Improved thread management and buffering" for its tessellation performance gain over Evergreen. I've not found any detailed explanation of that. Perhaps those changes are what you're suggesting. They'd match the "refresh" nature of Barts.

3. Simple tests like Damien's wouldn't result in so few triangles per clock
I don't understand those tests.

We can test it pretty easily, though. Run a test with a high TF, and use a long DS. If you're right, then performance will be very predictable as the number of verts/clk output by one SIMD running the DS.
Yes, some more tests based upon TF would be very welcome.

I was asking if you had any data performance vs. TF.
http://www.ixbt.com/video3/gf100-2-part2.shtml

PN-Triangles near the bottom of the page shows tests for varying TF. TF=19 gives GTX480 a 6.8x advantage over HD5870. According to AMD's marketing, at TF=19, HD6870 is about 20% faster than HD5870 - presumably that's not normalised per clock (HD6870 is 6% higher-clocked).
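The per-clock question is just arithmetic; a quick sketch, assuming the reference clocks (900 MHz HD6870, 850 MHz HD5870) and AMD's claimed 20%:

```python
# Rough per-clock normalisation of AMD's claimed HD6870 advantage at TF=19.
# Assumed reference clocks: HD6870 = 900 MHz, HD5870 = 850 MHz.
hd6870_clk = 900e6
hd5870_clk = 850e6

claimed_speedup = 1.20                    # AMD marketing: ~20% faster at TF=19
clock_ratio = hd6870_clk / hd5870_clk     # the "6% higher-clocked" figure

per_clock_speedup = claimed_speedup / clock_ratio
print(f"clock ratio: {clock_ratio:.3f}")
print(f"per-clock speedup: {per_clock_speedup:.3f}")
```

So if the marketing number isn't already normalised, the per-clock gain would be roughly 13%, not 20%.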

Maybe there's more up-to-date results somewhere. I've got no idea if Cypress tessellation performance has varied with driver (or GF100's).
 
Maybe AMD can push 5970s against the GTX580? After all, that card was devised to fight the 512CC GTX380 way back when.

I've heard that the 580 won't do well against the 5970. However, that's not what people want to see (they're looking for the 6970). It's going to be interesting to see reviews using the 5970 as a comparison point for both the 580 and the 6970. Hopefully many will include the 5970 instead of just the 5870.
 
I expect we'll see the HD5970 more often in HD6970 reviews than in GTX580 reviews. Only a few reviews compared the GTX480 to the HD5970, and I don't expect that to change with the GTX580.
 
If Nvidia plans to launch the 580 this month (rumored to be Nov 9) shouldn't we be seeing benchmark leaks of the 6970 soon?

You would think so, but given AMD's recent ninja-like performance at keeping benchmarks, specs and everything else hidden in a whirlwind of misinformation, I'm actually expecting no leaks right up until the last minute.
 
You would think so, but given AMD's recent ninja-like performance at keeping benchmarks, specs and everything else hidden in a whirlwind of misinformation, I'm actually expecting no leaks right up until the last minute.
I guess it will depend on whether AMD feels threatened by the GTX580 or not. If GF100b is a hard launch and (almost) beats the 5970, AMD might want to spoil the launch with a leaked benchmark or two. If AMD isn't threatened (i.e. the GTX580 is paper-launched), they might keep tight control of information, as with the 6800.
 
You know these things? :p

Barts did not exist on the roadmap in 32nm. Barts in 40nm turned up before the 32nm cancellation.

Which now suggests that Barts was indeed 40nm and using older tech (newer than Cypress, of course), while Cayman was designed for 32nm and is newer/different tech than Barts.

Thx Dave. :)

I take it two different design teams were working on both chips.
 
I take it two different design teams were working on both chips.

I imagine design teams can work on multiple projects at the same time, but you still need concurrency to deliver four ASICs in a six-month launch window (Evergreen). I suspect there are different teams for each major ASIC, plus others working on secret sauce and plan Bs. And you can form "special forces" teams using key people from existing teams, getting more functional logical teams out of the same people.
 
Maybe AMD can push 5970s against the GTX580? After all, that card was devised to fight the 512CC GTX380 way back when.

Maybe, but I'd take a pair of HD 6800s over a 5970 any day! Right now, that option is actually cheaper, so…
 
And there's no evidence that Cypress is load-balancing DS dynamically.
There's no evidence against it, either. Time to generate some one way or the other...
Buffering or some other bottleneck?
That's my point. We don't know.
The design is a kludge founded upon R600, when TF went up to 16, in an era when there were only 4 SIMDs. "DS" (VS, effectively) back then wouldn't have occupied all the SIMDs - it probably only occupied 1.
R600 VS ability was an order of magnitude faster than R580's five Vec5 VS engines, so it's pretty safe to assume that a VS could occupy many SIMDs if the load demanded it.

http://www.ixbt.com/video3/gf100-2-part2.shtml

PN-Triangles near the bottom of the page shows tests for varying TF. TF=19 gives GTX480 a 6.8x advantage over HD5870. According to AMD's marketing, at TF=19, HD6870 is about 20% faster than HD5870 - presumably that's not normalised per clock (HD6870 is 6% higher-clocked).
The framerates are really high at lower TF, though, so driver overhead and fillrate will become a problem. I'll try to do some coding.

I just got a sweet Acer with Redwood (under 4 lbs! :cool:), so I'll try doing some experiments there in a week or two.
 
I imagine design teams can work on multiple projects at the same time, but you still need concurrency to deliver four ASICs in a six-month launch window (Evergreen). I suspect there are different teams for each major ASIC, plus others working on secret sauce and plan Bs. And you can form "special forces" teams using key people from existing teams, getting more functional logical teams out of the same people.

Yeah, makes sense, thanks.
 