Interesting, so 32nm cancellation mainly affected Caymans? You know these things?
Barts did not exist on the roadmap in 32nm. Barts in 40nm turned up before the 32nm cancellation.
> Seconded. Typically there should be 1.5 million tris created per frame, not counting the planes, the trees and the buildings, so they say. I guess they're referring to full HD res.

Nah, came from an email from NV.
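A quick sanity check on that figure, assuming "full HD" means 1920x1080 and taking the 1.5 million number at face value (both are assumptions, not from the post):

```python
# Back-of-envelope only; assumes "full HD" = 1920x1080 and the quoted
# 1.5M triangles/frame figure (terrain only, per the post).
pixels_per_frame = 1920 * 1080        # ~2.07 million pixels
tris_per_frame = 1.5e6
print(f"{tris_per_frame / pixels_per_frame:.2f} triangles per pixel")   # ~0.72
print(f"{tris_per_frame * 60 / 1e6:.0f} M triangles/s at 60 fps")       # 90
```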
> Read my post a little more carefully, Jawed. Load balancing DS across SIMDs is virtually identical to what Xenos onwards does with regular geometry and the VS. The only difference is fetching of control points.

It shouldn't be, no. But Cypress had features cut due to the 40nm fuck-up...
> It bypasses the bottleneck of the on-chip pipeline, whatever that is.

So what is the mechanism by which an off-chip buffer increases tessellation performance (or allows it to increase along with other changes for better scaling in the architecture)?
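For what it's worth, here is a toy producer/consumer model of why buffer depth between the tessellator and the DS can matter at all: with a shallow FIFO, the next patch's setup can't overlap with draining the previous patch's vertices. Every number in it is made up for illustration; it is not a model of Cypress, Cayman or GF100.

```python
# Toy model, not any real GPU: a tessellator spends `setup` cycles per patch,
# then emits `verts` vertices into a FIFO that the DS drains at `drain`
# verts/cycle. A deep FIFO lets the next setup overlap with the drain;
# a shallow one serializes the two phases.
def cycles_per_patch(fifo_depth, setup=40, verts=64, drain=1.0):
    if fifo_depth >= verts:
        # whole burst fits: the next patch's setup hides behind the drain
        return max(setup, verts / drain)
    # leftover verts trickle out at the drain rate before setup can restart
    leftover = verts - fifo_depth
    return leftover / drain + max(setup, fifo_depth / drain)

for depth in (4, 16, 64, 4096):   # 4096 standing in for a large off-chip buffer
    print(f"FIFO depth {depth:>4}: {cycles_per_patch(depth):5.1f} cycles/patch")
```

In this toy, anything deep enough to cover one patch's burst behaves the same, which is more or less the open question in the thread: whether the on-chip buffering was actually the limiter.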
> And that theory seems sketchy to me, because:

Because a single SIMD is the only place those vertices can go (according to the locked HS/DS theory).
> I was asking if you had any data on performance vs. TF.

6 clocks (you later adjusted to 6.5 clocks) is so slow, even at LOD 25, that it can't get any slower? A comparison with Juniper and Redwood could be useful, as only the math throughput would vary.
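Just to put those clocks-per-triangle figures into absolute terms, assuming they refer to an HD5870 at its stock 850 MHz engine clock:

```python
# Rough conversion; assumes the figure applies to an HD5870 at 850 MHz.
clock_hz = 850e6
for clocks_per_tri in (6.0, 6.5):
    rate = clock_hz / clocks_per_tri / 1e6
    print(f"{clocks_per_tri} clocks/tri -> ~{rate:.0f} M triangles/s")
```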
> I think it's quite likely these will also be VLIW-4 and have the architecture changes of Cayman (whatever those are...). Why? Well, two reasons mostly: 1) they are launched later, and 2) Barts is a somewhat downscaled Cypress with cut-off features and some tweaks (talking about the shader core). But downscaling doesn't really make sense for the other chips - they don't have some of the features which were cut from Cypress in the first place (like DP or the second CrossFire connector), and just cutting SIMDs will make them slower without making them a whole lot smaller (they are pretty small anyway). So it would be pretty much the same chips except for the display/UVD stuff, which sounds a bit pointless to me. Haven't seen any real rumors wrt these chips though - well, haven't seen anything reliable for Cayman either...

I know people here aren't that interested in the lower-end offerings, but are there any leaks regarding the 56xx/55xx, known as Turks?
I'm interested in those lower-end derivatives: if the naming is consistent, their performance should come close to the actual HD57xx.
I wonder if those cards could be closer to Cayman than Barts.
If Nvidia plans to launch the 580 this month (rumored to be Nov 9) shouldn't we be seeing benchmark leaks of the 6970 soon?
> And there's no evidence that Cypress is load-balancing DS dynamically. Cypress easily has the theoretical capability to exceed GF100 on DS math in absolute terms, particularly since DS is easily vec2 at minimum, where NVidia's "scalar sequential" ALU architecture holds no advantage.

Read my post a little more carefully, Jawed. Load balancing DS across SIMDs is virtually identical to what Xenos onwards does with regular geometry and the VS. The only difference is fetching of control points.
> Buffering or some other bottleneck?

It bypasses the bottleneck of the on-chip pipeline, whatever that is.
> The design is a kludge founded upon R600, when TF went up to 16, in an era when there were only 4 SIMDs. "DS" (VS, effectively) back then wouldn't have occupied all the SIMDs - it probably only occupied 1.

And that theory seems sketchy to me, because:
1. It runs counter to everything that the unified shader architecture is about
> Barts supposedly uses "Improved thread management and buffering" for its tessellation performance gain over Evergreen. I've not found any detailed explanation of that. Perhaps those changes are what you're suggesting. They'd match the "refresh" nature of Barts.

2. The solution is so simple: write control points to GDS, and now you can use any free SIMD.
> I don't understand those tests.

3. Simple tests like Damien's wouldn't result in so few triangles per clock.
> Yes, some more tests based upon TF would be very welcome.

We can test it pretty easily, though. Run a test with a high TF and use a long DS. If you're right, then performance will be very predictable: it will track the number of verts/clk that the one SIMD running the DS can output.
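Roughly what that prediction looks like if the locked-SIMD theory holds: with a long DS, output scales with one SIMD's ALU throughput rather than the whole chip's. The 16-lane SIMD width and the 20-SIMD count are Cypress's; the 40-bundle DS cost below is just an illustrative assumption.

```python
# Illustrative prediction for the proposed test. Cypress has 20 SIMDs of 16
# VLIW lanes; the 40-bundle DS cost is an assumption, not a measured value.
def ds_verts_per_clock(simds, lanes_per_simd=16, ds_vliw_bundles=40):
    # each SIMD processes lanes_per_simd work-items' VLIW bundles per clock,
    # so one SIMD completes lanes_per_simd / ds_vliw_bundles vertices per clock
    return simds * lanes_per_simd / ds_vliw_bundles

print("DS locked to one SIMD:", ds_verts_per_clock(1))    # 0.4 verts/clk
print("DS across all 20     :", ds_verts_per_clock(20))   # 8.0 verts/clk
```

If the locked theory is right, the measured triangle rate at high TF should sit near the first number regardless of how many SIMDs the chip has; if DS is load-balanced, it should scale toward the second.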
> http://www.ixbt.com/video3/gf100-2-part2.shtml

I was asking if you had any data on performance vs. TF.
Maybe AMD can push 5970s against GTX580? After all, that card was devised to fight the 512CC GTX380 way back when.
> I guess it will depend on whether AMD feels threatened by GTX580 or not. If GF100b is a hard launch and (almost) beats the 5970, AMD might want to spoil the launch with a leaked benchmark or two. If AMD won't be threatened (i.e. GTX580 is a paper launch), they might keep tight info control like with the 6800.

You would think so, but given AMD's recent ninja performance at keeping benchmarks, specs and everything else in a whirlwind of misinformation, I'm actually expecting no leaks right up until the last minute.
> You know these things?

> Barts did not exist on the roadmap in 32nm. Barts in 40nm turned up before the 32nm cancellation.
I take it two different design teams were working on both chips.
> There's no evidence against it, either. Time to generate some one way or the other...

And there's no evidence that Cypress is load-balancing DS dynamically.
> That's my point. We don't know.

Buffering or some other bottleneck?
> R600 VS ability was an order of magnitude faster than R580's five Vec5 VS engines, so it's pretty safe to assume that a VS could occupy many SIMDs if the load demanded it.

The design is a kludge founded upon R600, when TF went up to 16, in an era when there were only 4 SIMDs. "DS" (VS, effectively) back then wouldn't have occupied all the SIMDs - it probably only occupied 1.
> The framerates are really high at lower TF, though, so driver overhead and fillrate will become a problem. I'll try to do some coding.

http://www.ixbt.com/video3/gf100-2-part2.shtml
PN-Triangles near the bottom of the page shows tests for varying TF. TF=19 gives GTX480 a 6.8x advantage over HD5870. According to AMD's marketing, at TF=19, HD6870 is about 20% faster than HD5870 - presumably that's not normalised per clock (HD6870 is 6% higher-clocked).
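Normalising that marketing claim per clock, using only the figures quoted above:

```python
# Per-clock normalisation of the quoted figures (20% faster, 6% higher clock).
speedup = 1.20
clock_ratio = 1.06
print(f"per-clock gain: {speedup / clock_ratio - 1:.1%}")   # ~13.2%
```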
I imagine design teams can work on multiple projects at the same time, but you still need concurrency to deliver four ASICs in a six-month launch window (Evergreen). I suspect different teams for each major ASIC, plus others working on secret sauce and plan Bs. And you can make "special forces" teams using key people from existing teams, to get more functional logical teams out of the same people.