NVIDIA Kepler speculation thread

If silent_guy doesn't like Kepler speculation in a Kepler Speculation thread, then he doesn't have to read it ...
- perhaps he should start a 'Kepler Known Facts' thread, and have a nice quiet time there!
:rolleyes:

I think you're missing the point. You aren't speculating at all; you're throwing nonsense at the wall and hoping some of it sticks. The numbers you have created have no basis in anything except some imaginary math that doesn't compute the way you think it does.

So, rather than getting upset at the fact that your arbitrary non-logic is being called into question, how about instead trying to learn WHY your speculation is utterly absurd? You'll make more friends (or, if you don't care about that, you'll last longer on this forum) if you can follow along intelligently.

If you can't follow along intelligently, it becomes time to stop posting about it :) You don't see ME in here writing speculation, because it's beyond my current abilities and I don't have the time to get myself up to speed on all the new architectural nuances of either platform.
 
The point being that since there are 33% more shaders, and a total of 62% more transistors, it's obvious that the CU uses significantly more transistors per ALU....
It's not so obvious this is due to dropping VLIW, actually, as a lot more changed than just the VLIW arrangement: double the LDS per CU, read/write caches, a wider memory interface, ECC memories, etc.
 
If silent_guy doesn't like Kepler speculation in a Kepler Speculation thread, then he doesn't have to read it ...
- perhaps he should start a 'Kepler Known Facts' thread, and have a nice quiet time there!
:rolleyes:
Instead of belaboring that point, let's try to do something constructive.

(But before that: anyone willing to buy a Cayman- and a Tahiti-based GPU and run them through an X-ray scanner, or destroy them, to finally get a decent die shot? Same thing for a GF114. For science! If it weren't for the fact that I don't even know where to start to do this, I'd honestly consider it. ;))

Quantifying:
Cayman has 1536 ALUs using 2.64B Transistors
Tahiti has 2048 ALUs using 4.31B Transistors
(substitute your own transistor counts if you disagree with these)
Comparing VLIW4 with GCN, then
--> 63% more transistors yields 33% more ALUs
--> therefore the per-ALU transistor overhead of GCN vs VLIW4 is ~22%
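(For reference, a minimal sketch of the arithmetic behind that 22% figure, using the quoted counts, which are themselves only approximate:)

```python
# Rough per-ALU transistor overhead of GCN (Tahiti) vs VLIW4 (Cayman),
# using the approximate public counts quoted above.
cayman_alus, cayman_transistors = 1536, 2.64e9
tahiti_alus, tahiti_transistors = 2048, 4.31e9

transistor_ratio = tahiti_transistors / cayman_transistors  # ~1.63 -> 63% more transistors
alu_ratio = tahiti_alus / cayman_alus                        # ~1.33 -> 33% more ALUs

per_alu_overhead = transistor_ratio / alu_ratio - 1          # ~0.22 -> ~22% more transistors per ALU
print(f"{per_alu_overhead:.0%}")
```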
RV770 is the last AMD die for which we have a nice die shot. I'm using the one on TechReport. Fire up your favorite image editing tool and you end up with the shaders taking up 29%, tex units 12%, and MC/ROP/geometry combined 33% of the die.

Using those numbers, try to fit everything for a Cypress: apply linear scaling to the number of individual units, correct for process, apply a whole bunch of fudge factors everywhere until you match the known die size. (Potential error introduced: large, but hopefully better than not modeling anything at all.)

As a final step, use the same fudge factors for Tahiti but apply linear scaling again. You'll end up with a chip size of ~300mm^2 and a shader core taking around 31% of the die, or ~92mm^2. We know that Tahiti is in reality 365mm^2. If all of this extra area is due to GCN, we end up with GCN costing 70% over a Cypress with a similar number of units.
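(A minimal sketch of that last attribution step; the ~300mm^2 estimate and the 31% shader-core share are themselves outputs of the fudge factors above, so the resulting 70% is no more solid than they are:)

```python
# Attribute all of Tahiti's "excess" area over the linearly scaled, Cypress-style
# estimate to GCN, and express it as a cost over the estimated shader core.
estimated_die_mm2 = 300.0                        # fudge-factor + linear-scaling estimate
shader_core_mm2 = 0.31 * estimated_die_mm2       # ~92 mm^2
actual_die_mm2 = 365.0                           # known Tahiti die size

excess_mm2 = actual_die_mm2 - estimated_die_mm2  # ~65 mm^2
gcn_cost = (shader_core_mm2 + excess_mm2) / shader_core_mm2 - 1
print(f"{gcn_cost:.0%}")                         # ~70%
```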

Quite a difference compared to your 22%. Are my numbers in any way closer to the truth than yours? I have absolutely no idea, which is kinda my whole point. :LOL:

I don't account for the L2 caches. I fudged the fact that the number of MCs doesn't track the number of ROPs anymore. I have no way to estimate the size of the uncore and geometry (the same is true for Cypress, of course). What's the impact of doing away with the tex-specific L1 cache? Etc.

The hysterical laughter you hear is AMD folks reading this post...
 
As a final step, use the same fudge factors for Tahiti but apply linear scaling again. You'll end up with a chip size of ~300mm^2 and a shader core taking around 31% of the die, or ~92mm^2. We know that Tahiti is in reality 365mm^2. If all of this extra area is due to GCN, we end up with GCN costing 70% over a Cypress with a similar number of units.

Quite a difference compared to your 22%. Are my numbers in any way closer to the truth than yours? I have absolutely no idea, which is kinda my whole point. :LOL:

So, you are supposing that with GCN, the Core takes up a much greater % of the die area than with previous generations?
- so, my original point still stands
- i.e. with this generation, we're seeing AMD & NV come much closer together architecturally, and the perf/mm^2 advantage that AMD has enjoyed in recent generations will be substantially reduced.
;)

The point of my simplistic numbers was that the effective cost of adding GCN is not the cost of the GCN cores themselves, but the impact these cores have on the overall die size.
As we can see, Tahiti has 22% less peak shader performance than a similar number of transistors would have provided using VLIW4/5
- since I don't think the uncore/core ratio will have changed much...
(at the very least, the MC & GDDR5 interface are +50% over Cayman)

You're arguing that the effective cost of GCN is actually greater, because Tahiti has a greater % of die area dedicated to the core.
- I'm not going to argue with your superior knowledge on the matter!
 
Why don't you compare Juniper and Cape Verde? They are a bit closer together, with at least the same number of memory controllers, ROPs and front ends; Cape Verde also has the cheapest possible DP implementation for GCN, at 1/16th rate.
 
You're arguing that the effective cost of GCN is actually greater, because Tahiti has a greater % of die area dedicated to the core.
- I'm not going to argue with your superior knowledge on the matter!

Maybe English isn't your first language and his point gets lost along the way, but he isn't arguing what you think he is at all...
 
Maybe English isn't your first language and his point gets lost along the way, but he isn't arguing what you think he is at all...

He's arguing on a speculation thread that we shouldn't speculate about things we don't know about... because that would just be too speculative ...

Look, anyway, I'm not going to troll here. I think I made it clear what my opinion is about the way AMD & NV are converging on their architectures
- and that the perf/mm^2 advantage that AMD had will disappear, since they have had to adopt the same kind of compromises that NV has chosen to make for the last several generations (since the G80, IIRC).

You wanted some numbers, so I plucked some out of my ass, but I actually don't think they're all that unreasonable, given that they were highly speculative...
 
CarstenS said:
Why don't you compare Juniper and Cape Verde? They are a bit closer together, with at least the same number of memory controllers, ROPs and front ends; Cape Verde also has the cheapest possible DP implementation for GCN, at 1/16th rate.
Yeah, that's an exercise for the reader (and I thought about it too late). Maybe an idea for an article in your magazine? ;)
 
He's arguing on a speculation thread that we shouldn't speculate about things we don't know about... because that would just be too speculative ...
No, I'm arguing against stupidity.

You wanted some numbers, so I plucked some out of my ass, but I actually don't think they're all that unreasonable given that they were highly speculative...
From a macro perspective, you could say something like '4.1 billion transistors for so many FLOPS'. There is at least some amount of value in that.

But this is what triggered my initial response:
... --> therefore no increase in GPGPU burden, but ditching the hot-clock gives massive benefit in terms of ALU density....
(emphasis mine.)
You use some rumored numbers (nothing wrong with that). Your number of transistors is a wild guess (±20%?) and the die size was estimated by Charlie to be between 320 and 360mm^2. Who knows if the number of shaders is correct. The total hot-clock area is probably only ~20% (±5 percentage points), already within the error variance of your starting numbers. Yet using only those fuzzy macro numbers, you make a sweeping statement about what's a fairly small part of the die: "massive benefit in terms of density".

Given a hot-clocked ALU and an equivalent non-hot-clocked ALU, what do you think the difference in area is going to be? 30% smaller? After all, you're mostly going to save 50% (or less!) on flip-flops, and much less on other gates. Apply that 30% to an area that's only ~20% (± a few points) of the total die to begin with. There is no way the area difference will fall outside the error margins of your original numbers. It's like arguing over 1% differences in election polls that have a 5% margin of error.
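(To put rough numbers on that, using the ±-laden guesses above:)

```python
# Rough check: how much die area could dropping the hot clock plausibly save,
# versus the uncertainty already baked into the rumored starting numbers?
hot_clock_share = 0.20     # hot-clocked logic as a fraction of the die (+/- 5 points)
area_saving = 0.30         # generous guess: a non-hot-clocked equivalent is ~30% smaller

die_saving = hot_clock_share * area_saving      # ~0.06 -> ~6% of total die area
transistor_guess_error = 0.20                   # the +/-20% wild guess on the transistor count

print(f"~{die_saving:.0%} of the die saved vs ~{transistor_guess_error:.0%} input error")
```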
 
Quantifying:
--> therefore no increase in GPGPU burden, but ditching the hot-clock gives massive benefit in terms of ALU density....

You're making a lot of assumptions, but the largest assumption is that Kepler shaders are, clock for clock, about as powerful as those in Fermi. It's entirely possible that Nvidia managed to cram in so many more shaders not just because they dropped the hot clock, but because their new shaders have been simplified to be less powerful than those in Fermi.
 
No, I'm arguing against stupidity.
No, you're arguing from stupidity!
:rolleyes:

From a macro perspective, you could say something like '4.1 billion transistors for so many FLOPS'. There is at least some amount of value in that.

Gee whiz, you don't say?

blah blah blah
pointless election analogy

Look, this is a speculation thread.

There is absolutely no value in you coming on here saying we don't know what the shader count is, or the die size, and therefore we cannot speculate.

At this stage, in the 4-6 week window before release, the numbers usually firm up pretty well.
- at the moment, in terms of rumors, there is only one spec that everyone is repeating - which is 1536 SPs, a 950MHz clock (no hot clock), a 256-bit bus, 5GHz GDDR5, a Charlie die size of 320-360mm^2, and performance around the HD 7950
- so, any or all of these factoids could be right or wrong, but if you're participating in a speculation thread, then it's obviously got to be fair game to speculate about the most viable current rumor.

So, if you accept these factoids as a group, then it's perfectly reasonable of me to say
"but ditching the hot-clock gives massive benefit in terms of ALU density."
- i.e. there is no hot clock, and the number of SPs has effectively doubled from where you would expect it to be from, say, a straight 2x 28nm version of GF114.
- sure, there are other factors - such as the memory bus staying at 256-bit, and therefore being a smaller proportion of the die...
- but I don't see how you can call
"but ditching the hot-clock gives massive benefit in terms of ALU density."
anything but the bleeding obvious.

And in terms of the ALU, i.e. the SP, it's a multiply-add block; it can't get much simpler without ceasing to be an SP.
- and if it were significantly less powerful, then 1536 new SPs @ 950MHz wouldn't be able to give an HD 7950 a run for its money (see the rough peak-FLOPS comparison below).
- so we can safely assume, for the purposes of a speculation thread, that these are your typical, common-or-garden SPs.
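(A rough peak-FLOPS sanity check of that claim; the HD 7950 and GTX 560 Ti (GF114) numbers are the public specs, while everything for the Kepler part is the rumor above:)

```python
# Peak single-precision throughput: ALUs * 2 FLOPs per clock (multiply-add) * clock.
def peak_tflops(alus, clock_ghz):
    return alus * 2 * clock_ghz / 1000.0

rumored_gk104 = peak_tflops(1536, 0.95)   # rumored spec above       -> ~2.92 TFLOPS
hd7950 = peak_tflops(1792, 0.80)          # Tahiti Pro, public spec  -> ~2.87 TFLOPS
gtx560ti = peak_tflops(384, 1.645)        # GF114, hot clock         -> ~1.26 TFLOPS

print(rumored_gk104, hd7950, gtx560ti)
```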

If you don't enjoy the speculation of others, then you should look elsewhere for your entertainment.
 
It still doesn't weigh in at anywhere near 4B transistors. And yes, there are many ways to make any kind of unit "simpler" (and, by extension, its surrounding logic), be it an SP, TMU, ROP or whatever else; N units != N units unless you know that both have exactly the same capabilities.
 
@whitetiger
A little speculation is fine, but you're drawing conclusions from values you derived from other values which you pulled from your nether regions. It's not a useful starting point for meaningful conversation.
 
For those of us a little... out of the loop and fuzzy... could someone give/link to an Nvidia code-name overview of market segments for the current chips and the last-generation architecture? From the above picture it looks like, for Kepler:

GK110 = High End
GK104 = Performance
GK106 = Mainstream
GK107 = Mainstream (lower?)

You would think they would flow either up or down, but it seems the model numbers don't correspond linearly, and IIRC this is a common thing with Nvidia. If I am wrong, I blame my headache ;)

@ silent_guy: Thank you.
 
whitetiger said:
If you don't enjoy the speculation of others, then you should look elsewhere for your entertainment.
If you had used speculative numbers to conclusively prove that hot-or-not made a major impact on ALU density, I'd be totally fine with it. Hurray for speculation!
Unfortunately, you didn't, and I pointed out that the current numbers don't warrant any such conclusion. That's not a crusade against speculation. It's a crusade against not going the extra mile. But I'm not going to convince you of that point, so I'll leave it at that.
 