Nvidia BigK GK110 Kepler Speculation Thread

Anyway, what would AMD need to close the gap? About 25% more SP throughput, which would bring their GPU to 5.12 TFLOPS SP (at 1000 MHz) or ~5.4 TFLOPS SP / ~1.3 TFLOPS DP (at 1050 MHz)..

40 CUs, 1050 MHz, a 384-bit bus and 3 GB of memory... up from the 32-CU 7970..
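A quick back-of-envelope sketch of that scenario in Python, assuming GCN's 64 ALUs per CU, 2 FLOPs per clock (FMA) and Tahiti's 1/4-rate double precision:

```python
# Peak throughput for a hypothetical 40-CU GCN part.
# Assumes 64 ALUs/CU and 2 FLOPs/clock (FMA), as on Tahiti,
# with DP at 1/4 the SP rate.

def gcn_tflops(cus, mhz, alus_per_cu=64, flops_per_clock=2):
    return cus * alus_per_cu * flops_per_clock * mhz / 1e6

for mhz in (1000, 1050):
    sp = gcn_tflops(40, mhz)
    print(f"{mhz} MHz: {sp:.2f} TFLOPS SP / {sp / 4:.2f} TFLOPS DP")

# 1000 MHz: 5.12 TFLOPS SP / 1.28 TFLOPS DP
# 1050 MHz: 5.38 TFLOPS SP / 1.34 TFLOPS DP
```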
 
Wouldn't heat and power become a problem at that point?

AMD would have to go with a non-reference-style cooler, and that would add dramatically to the cost of the card.

I don't think chips are free to increase size/components while maintaining the clocks of their smaller brethren. Look at GK104 vs. GK110: there had to be a substantial compromise. Same with GF104 vs. GF110, etc. They might have this luxury if they were moving to a new node, but we know that isn't happening soon.

GK110 already uses about the same amount of power as a GHz Edition (especially when you consider Kepler has twice the memory, which should equalize much of the difference in the reviews where Kepler draws more power). If AMD makes its chip bigger and adds more components, it's going to have to reduce clocks or increase power consumption. This isn't a simple respin of the same chip (e.g. HD 4870 -> HD 4890, or GTX 460 -> GTX 560), so clocks aren't necessarily going to get higher and efficiency isn't necessarily going to increase. They will go through some of the growing pains Nvidia has had to go through when making a bigger chip, something AMD has less experience with.

The chip you imagine would be a brand-new chip considering the changes, and on the same process. They might get a manufacturable chip after a couple of revisions, but off the bat I don't see how they maintain the same clocks while making the chip dramatically bigger. Add in that AMD might need to widen the memory bus to 512 bits to feed the improvements you envision, since AMD's architecture seems more sensitive to bandwidth than Nvidia's this generation, and you are looking at something that isn't as simple as slapping on more shaders, ROPs, etc., keeping the same clocks and expecting it to beat Titan.

And the specs you mentioned might not be enough. A regular 7970 has about 23% more TFLOPS than the GTX 680, yet if we ignore memory-bound situations (which we should, unless AMD increases its bus to 512 bits), the 680 is faster in gaming. So take GK110's TFLOPS (4.5) and multiply by at least 1.23, and you get roughly 5.5 TFLOPS, more than the 5.2 proposed.
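A minimal sketch of that calculation, using stock peak-FLOPS figures (shaders × 2 FLOPs/clock × base clock) and treating the 680-vs-7970 gaming gap as a fixed per-TFLOP efficiency factor, which is obviously a simplification:

```python
# Normalize by Kepler's higher per-TFLOP gaming efficiency.

gtx680 = 1536 * 2 * 1006 / 1e6   # ~3.09 TFLOPS
hd7970 = 2048 * 2 * 925  / 1e6   # ~3.79 TFLOPS
titan  = 2688 * 2 * 837  / 1e6   # ~4.50 TFLOPS

efficiency_gap = hd7970 / gtx680   # ~1.23: extra TFLOPS GCN needs per Kepler TFLOP
needed = titan * efficiency_gap    # ~5.5 TFLOPS
print(f"GCN TFLOPS needed to match Titan: ~{needed:.1f}")
```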

AMD has one more problem: the W9000 is basically a failure. You don't hear anything about its sales, and in the reviews out there (HotHardware and Tom's Hardware; AMD made AnandTech postpone its review) it basically loses or ties against last gen's Quadro 6000. If AMD is serious about getting into the professional workstation market, it's going to have to do something serious about the design of GCN, and that is tied to designing a big chip. To justify the manufacturing cost and R&D, something that big and competing with Titan not only needs to match it in gaming performance, but in application performance (not OpenCL benchmarks). With this I can imagine the chip getting even bigger.
 
Power will be a limiting factor of course, and I can't tell whether anything can be done about it. But 40 CUs instead of 32 is just an increase of 8×64 "SPs" (of course other parts of the chip will need to grow to stay balanced with the CU count, on the wavefront scheduling side etc.). Given the size of a CU within the complete GCN die, the increase is not that big. There are many tweaks still available in GCN; many things were left out of the 7970 due to the timeline, and surely a lot can be done to increase the efficiency of the microarchitecture.

I don't think AMD has decided to release the next series with only 2 more CUs anyway.

At the same time, I don't see why AMD should go for a 512-bit bus; that would be a massive error. A 512-bit bus with 4 GB or 8 GB would take too much die area and become a real problem for balancing the chip (on the ROP side, indeed). That was one of the errors of the R600: a big 512-bit "ring bus" but without enough ROPs or enough memory to be efficient. (I remember how the card was a beast without AA and suddenly dropped in performance with AA enabled. Anyway, the card was a monstrous beast for benchmarking and overclocking; there the 2900 XTX was finally able to use its full potential.)

How far is AMD under Titan? Depending on the review, 20-25 to 30%, with the 680 at around 40%. I could be wrong, but the evolution of GCN plus at least 25% more shaders should attain this without problems. GCN already has a 384-bit bus and the same memory bandwidth, fewer ROPs, fewer TUs... increase the efficiency of the architecture (which is already really good), push in more shaders, and the gap should be easy to close.
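A rough sanity check of that argument, with the assumptions labeled: near-linear scaling from the extra shaders, plus a placeholder 5% uplift from architectural tweaks (both guesses, not measurements):

```python
# Does +25% shaders plus some GCN tweaks cover the review spread above?

shader_gain = 1.25   # +25% shaders, assuming near-linear scaling
arch_uplift = 1.05   # hypothetical efficiency improvement

projected = shader_gain * arch_uplift   # ~1.31x over the current 7970

for titan_lead in (1.20, 1.25, 1.30):
    verdict = "closes" if projected >= titan_lead else "falls short of"
    print(f"Titan at {titan_lead:.2f}x: projected {projected:.2f}x {verdict} the gap")
```

With shader scaling alone (no uplift), the 30% case falls short, which is where the efficiency tweaks would have to come in.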

As I don't think Nvidia will release a GTX 780 faster than Titan, I use it as the baseline (lol, it would be criminal to sell the Titan at $1000 and then release a GTX 780 at $500-600 that is faster, even if they can argue about the DP performance).
 
There are things you learn about your architecture and the process only by doing. Tahiti was the first of a new arch on a new process. They learned things about the process that they were able to incorporate into the rest of the lineup. The same can be said for optimizing the architecture and making tweaks to boost efficiency.

Without a die shrink I can think of two situations that would be relatively similar:
R520 -> R580; if you weren't around for that, it's roughly the GF100 -> GF110 situation with an arch tweak.
RV670 -> RV770; 50% more transistors, a 33% larger die, and a 50-60% increase in performance for a 40 W increase in TDP.
 
RV670 -> RV770 is an extreme example, because the R600 generation had fundamental flaws and a balance based on mispredictions. It is very unlikely that a similar jump in performance and efficiency will occur at any HW vendor going from a working-as-expected line of products to its successor.
 
I agree, that's why I said "relatively similar situation."

The point is, sheepdog's implication that AMD is at a brick wall until 20nm is false.
 
AMD would have to go with a non-reference-style cooler, and that would add dramatically to the cost of the card.

Well, they've got $550 to work with, so I don't think a non-reference cooler would be a problem.

GK110 already uses about the same amount of power as a GHz Edition (especially when you consider Kepler has twice the memory, which should equalize much of the difference in the reviews where Kepler draws more power).

Most of that 6 GB will be idle, so really you're only talking about 2 GB in use, and the other 4 GB will add only a tiny amount of idle power.

Regardless, I don't see AMD even attempting to beat it with a single-GPU 7-series part.
 
I agree, that's why I said "relatively similar situation."

The point is, sheepdog's implication that AMD is at a brick wall until 20nm is false.

Not a brick wall, but not a simple task either, and I think people are underestimating it.

Between those generations (RV670 and RV770), the architecture itself was in its infancy (the unified shader design), and as a result AMD had the most to gain at the time. Basically it was fixing a broken architecture by increasing the number of shaders it possessed, as it was completely underpowered in that sense.

There's a problem with doing that now. Both companies have balanced their architectures to the point where simply adding more shaders won't do much by itself. Look at how closely a GTX 680 and 670 perform at identical clocks, same with the 7970 and 7950. They perform almost identically, with AMD's case being the closer of the two.

With both companies you have relatively big chips (RV670 was tiny); with so many shaders already, adding more doesn't increase performance linearly. They have already reached the point of diminishing returns, unlike the RV670 generation. Adding more gains very little, if anything.
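To put numbers on that: both pairs differ by about 14% in shader count yet perform nearly identically at equal clocks. Shader counts below are the stock specs; the scaling exponent is a toy illustration of sublinear scaling, not a fitted value:

```python
# Diminishing returns: ~14% more shaders buys only a few percent.

pairs = {
    "GTX 680 vs 670": (1536, 1344),
    "HD 7970 vs 7950": (2048, 1792),
}

alpha = 0.3   # hypothetical scaling exponent (1.0 would be perfectly linear)

for name, (big, small) in pairs.items():
    unit_ratio = big / small                 # ~1.14 for both pairs
    perf_ratio = unit_ratio ** alpha         # ~1.04 under the toy model
    print(f"{name}: +{unit_ratio - 1:.0%} shaders -> ~+{perf_ratio - 1:.0%} perf")
```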

As a result, to eke out more performance both companies have had to do one of two things. The first is changing the design of the shaders and the rest of the chip (e.g. VLIW5, VLIW4 and now GCN) to increase utilization, which probably won't happen unless AMD intends to bring out its 20nm parts very late. And if AMD intends to make GCN more efficient and more compute-focused to tackle the professional market, expect power consumption to go up as the complexity of the shaders goes up, similar to what happened with VLIW4. GCN's power savings are primarily down to 28nm.

Or they balance the rest of the chip out and bump up the specs of everything, like Nvidia has done with all its monolithic chips, which means a drastic change in size. This forces AMD to make a drastically larger chip. The risk is that clock speeds typically fall the bigger you make the chip, and yields get worse. This was the story of Fermi's life, and of Nvidia's earlier monolithic designs. AMD's power consumption when adding clocks this generation is frightening. Increasing the die significantly is likely to force AMD to decrease clocks or go through a couple of revisions, which I don't think AMD has planned for. Designing a big chip isn't easy; AMD never designed a chip as big as R600 again because of this difficulty. AMD probably never prepared a super-sized chip because they probably never thought GK104 would best their 7970.
 
I agree, that's why I said "relatively similar situation."

The point is, sheepdog's implication that AMD is at a brick wall until 20nm is false.

I don't think it's a brick wall, but probably a wall AMD does not deem worth the effort to climb.
 
RV670 -> RV770 is an extreme example, because the R600 generation had fundamental flaws and a balance based on mispredictions. It is very unlikely that a similar jump in performance and efficiency will occur at any HW vendor going from a working-as-expected line of products to its successor.
There is a significant delta in power draw even from Tahiti to Pitcairn & Verde due to the maturity of the design rules. Processes themselves do not stay static either, even within the same node.
 
OK, so that was an oversimplification, but it's not like there is 6 GB of data constantly being written to memory in the average game, at least not at 2560x1600 and lower. It's no more valid than blaming Tahiti's 3 GB for its worse power draw than the 680.
 