NVIDIA Kepler speculation thread

Can probably get close with less bandwidth.

That's what "scratches its back" actually implies.

Then again you'd wonder why if the chip had more potential they'd cripple it so.

Probably in order to fit into a specific price/performance league. In such a case it's not "crippled" at all, especially if the rumored MSRP for it is correct. A $300 SKU obviously isn't high end, since high end will be in a completely different price/performance league.

Counter-question: the 7970 looks like an outstanding overclocker. Any idea why AMD didn't clock it at, say, 1.0 or even 1.1GHz? You'd take part of the vendors' business out of their hands, overclockers wouldn't have as much of a joyride as they do right now, and the performance gap to the 7990 would inevitably shrink. Is the TahitiXT "crippled" in that regard? IMHO it's clocked at its sweet spot in the majority of aspects encountered.
 
Ailuros said:
The ever-repeating rumors want GK104 to scratch Tahiti's back, and the first question that pops into my mind is how that's possible, since the former most likely has less bandwidth
Dally is a pretty big fan of bandwidth...
 
Dally is a pretty big fan of bandwidth...

Which raises the question, given the obvious time constraints, of how much influence Mr. Dally really could have had on Kepler after all, and, even before that, whether more bandwidth even makes sense at the given price point.
 
He will have been at Nvidia 3 years in 2 days (time flies). I'd say that is long enough to have some influence on Kepler....
 
Obviously. But then again I didn't say that he might not have any influence at all, just asked how much influence he could have had.
 
Didn't he leave Nvidia already some time ago and was succeeded by Cray's CTO as Chief Scientist? Or is he still a fellow?
 
Anyway, if anyone had asked me for a guess not too long ago about the different upcoming NV architectures, I would have said that the two-level/register cache ideas are for Maxwell and hotclocks gone for Echelon/Einstein.

The RF cache and scheduler hierarchy actually seem to be a straightforward evolutionary enhancement to Fermi and fit perfectly with the current architecture. It already has the operand collectors and schedulers in place. I wouldn't be surprised to see both developments in Kepler. After all the RF cache is of little benefit without the smaller active thread pool so they will most likely arrive together.
 
Not sure if this has been posted

gtx680gpu-za54qo.jpg


 
I'm not so sure. It looks more real than those slides which said GK110 had 2048 shaders. Honestly it looks like something Nvidia would do, 512bit bus, 1024 shaders, we need to see the ROPs but if it is 128, then it is basically 2x GF110, which would make some kind of sense. If they got an ideal shrink at 28nm, that would give a GF110 equivalent a size of around 290mm^2, double that is 580mm^2, add in some efficiency savings, cut a bit of superfluous stuff from Fermi and get that die to around 500mm^2 and it could be the real deal. Obviously without seeing the clock speeds and whatnot we can't make a judgement on the performance, but if they have got 2x GF110 in terms of performance then it will definitely be an interesting battle.

It also means a GK104 with 768 shaders and 96 ROPs would be competitive with 7970 and come in at around or under 400mm^2.

I'm not saying it's real, but it has a whiff of not being fake about it. That stuff about 2048 shaders and 4GB GDDR5 didn't make sense. Nvidia have never doubled up like ATi so 4GB on a 512bit bus never made sense. 2GB makes a lot more sense, and for all of the talk about Nvidia not going big die, blah blah, we had no evidence for that assertion. I think this has a better chance of being the real deal than anything else we've seen so far.
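
For anyone who wants to check the die-size arithmetic in the post above, here is a rough sketch. The 290mm^2, 580mm^2 and 500mm^2 figures come straight from the post; the function names are made up for illustration, and the ideal-shrink factor assumes area scales with the square of the feature size, which real processes rarely achieve.

```python
def ideal_shrink_factor(old_nm: float, new_nm: float) -> float:
    """Area scaling for a perfect shrink (assumption: area ~ feature size squared)."""
    return (new_nm / old_nm) ** 2


def scaled_up_area(base_mm2: float, copies: int = 2, savings: float = 0.0) -> float:
    """Area of `copies` times a base design, minus a fractional saving (0.0 to 1.0)."""
    return base_mm2 * copies * (1.0 - savings)


def savings_needed(naive_mm2: float, target_mm2: float) -> float:
    """Fraction of the naive area that must be trimmed to hit the target area."""
    return 1.0 - target_mm2 / naive_mm2


if __name__ == "__main__":
    # Figures taken from the post: a 290mm^2 shrunk GF110 equivalent, doubled.
    naive = scaled_up_area(290.0)
    print("40nm -> 28nm ideal factor:", round(ideal_shrink_factor(40, 28), 2))
    print("doubled area (mm^2):", naive)
    print("savings needed for 500mm^2:", round(savings_needed(naive, 500.0), 3))
```

With the post's numbers, doubling 290mm^2 gives 580mm^2, and landing at around 500mm^2 means trimming roughly 14% of that.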
 
There's a whiff of something all right....

Both those pictures are old and have been proven fake separately. Putting them together in a random blog post doesn't change that.
 
I'm not so sure. It looks more real than those slides which said GK110 had 2048 shaders. Honestly it looks like something Nvidia would do, 512bit bus, 1024 shaders, we need to see the ROPs but if it is 128, then it is basically 2x GF110, which would make some kind of sense. If they got an ideal shrink at 28nm, that would give a GF110 equivalent a size of around 290mm^2, double that is 580mm^2, add in some efficiency savings, cut a bit of superfluous stuff from Fermi and get that die to around 500mm^2 and it could be the real deal. Obviously without seeing the clock speeds and whatnot we can't make a judgement on the performance, but if they have got 2x GF110 in terms of performance then it will definitely be an interesting battle.

Or the GTX680/2GB is just a GK104 after all. The naming scheme isn't a serious enough indication, IMO, to immediately think that it'll be a high end SKU. A 2GB framebuffer can fit just as well on a 256bit as on a 512bit bus. The biggest joke about that one is the hypothetical transistor count.

It also means a GK104 with 768 shaders and 96 ROPs would be competitive with 7970 and come in at around or under 400mm^2.
Wait, you're suggesting above (unless I've misunderstood you) that the GK110 might have 64 ROPs; on a theoretical 512bit bus (which is so far just an assumption based on the nonsense that floats around) that makes sense. Why would the GK104 all of a sudden have 96 ROPs? Unless you mean 16 ROPs per ROP partition for both? The first question would be what exactly for (ironically I asked the very same question about the initially rumored 64 ROPs for Tahiti before its launch), and since that many ROPs per partition sounds like quite a waste of transistors, I'd rather think of something like single cycle 8xMSAA instead, while sticking with 8 ROPs/partition as up to now.
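
The ROP arithmetic behind that objection can be sketched roughly like this. It assumes one ROP partition per 64-bit slice of the memory bus with 8 ROPs each, as on Fermi; the bus widths plugged in for Kepler parts are rumors from this thread, not known specs, and the function names are hypothetical.

```python
PARTITION_WIDTH_BITS = 64  # assumption: one ROP partition per 64-bit memory controller


def partition_count(bus_width_bits: int) -> int:
    """Number of ROP partitions for a given memory bus width."""
    if bus_width_bits % PARTITION_WIDTH_BITS != 0:
        raise ValueError("bus width must be a multiple of 64 bits")
    return bus_width_bits // PARTITION_WIDTH_BITS


def rop_count(bus_width_bits: int, rops_per_partition: int = 8) -> int:
    """Total ROPs for a bus width at a given number of ROPs per partition."""
    return partition_count(bus_width_bits) * rops_per_partition


def rops_per_partition_needed(bus_width_bits: int, total_rops: int) -> float:
    """ROPs each partition must hold to reach a rumored total."""
    return total_rops / partition_count(bus_width_bits)


if __name__ == "__main__":
    print("512bit, 8 per partition:", rop_count(512))
    print("512bit, 16 per partition:", rop_count(512, 16))
    print("96 ROPs on 256bit needs per partition:", rops_per_partition_needed(256, 96))
```

Under those assumptions a 512bit bus gives 64 ROPs at 8 per partition and 128 at 16, while 96 ROPs would need 16 per partition on a 384bit bus or 24 per partition on a 256bit one.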

I'm not saying it's real, but it has a whiff of not being fake about it. That stuff about 2048 shaders and 4GB GDDR5 didn't make sense. Nvidia have never doubled up like ATi so 4GB on a 512bit bus never made sense. 2GB makes a lot more sense,
Assuming that the top dog has a 512bit bus after all, then I don't see why 4GB wouldn't make sense. If it's a 384bit bus after all, would you say it'll still contain only 1.5GB? With a 512bit bus there's no specific need to go for very high memory frequencies, which makes the RAM somewhat cheaper in such a case.

I'd personally be quite surprised if their performance chip had a 1GB rather than a 2GB framebuffer, and their high end solution just 1.5 or 2GB of RAM (depending on final bus width).
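
A quick sketch of how framebuffer size and bandwidth fall out of the bus width, for the combinations discussed above. It assumes one 32-bit GDDR5 chip per channel (no clamshell mode) and 1Gbit or 2Gbit chips; the data rates used in the example are illustrative picks, not figures from this thread, and the helper names are invented.

```python
CHIP_INTERFACE_BITS = 32  # assumption: each GDDR5 chip occupies a 32-bit slice of the bus


def chip_count(bus_width_bits: int) -> int:
    """Number of memory chips needed to populate the bus, one chip per 32 bits."""
    return bus_width_bits // CHIP_INTERFACE_BITS


def capacity_gb(bus_width_bits: int, chip_density_gbit: int) -> float:
    """Framebuffer size in GB for a given bus width and per-chip density in Gbit."""
    return chip_count(bus_width_bits) * chip_density_gbit / 8


def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a bus width and effective per-pin data rate."""
    return bus_width_bits * gbps_per_pin / 8


if __name__ == "__main__":
    for bus in (256, 384, 512):
        print(bus, "bit:", capacity_gb(bus, 1), "GB or", capacity_gb(bus, 2), "GB")
    # Illustrative data rates: a wide bus reaches similar bandwidth at lower clocks.
    print("512bit @ 4.0 Gbps:", bandwidth_gb_s(512, 4.0), "GB/s")
    print("384bit @ 5.5 Gbps:", bandwidth_gb_s(384, 5.5), "GB/s")
```

Under those assumptions 256bit gives 1 or 2GB, 384bit gives 1.5 or 3GB, and 512bit gives 2 or 4GB. A 512bit bus at 4.0 Gbps already reaches 256 GB/s, which a 384bit bus would need roughly 5.3 Gbps to match.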

and for all of the talk about Nvidia not going big die, blah blah, we had no evidence for that assertion. I think this has a better chance of being the real deal than anything else we've seen so far.

I must have missed that first nonsense. If it were true, both AMD and Intel would have a joyride in the HPC market. I'm willing to take one bet that I'm absolutely sure I won't lose: none of the speculations so far for GK110 are even in the slightest correct.
 
The RF cache and scheduler hierarchy actually seem to be a straightforward evolutionary enhancement to Fermi and fit perfectly with the current architecture. It already has the operand collectors and schedulers in place. I wouldn't be surprised to see both developments in Kepler. After all the RF cache is of little benefit without the smaller active thread pool so they will most likely arrive together.

I could be wrong but I think the RF cache work is too recent to have been included in Kepler.
 