NVIDIA Kepler speculation thread

Please tell me exactly how many of these magic 7970 dies can be produced per wafer vs. standard 7970 dies?
Based on many reviews and forum threads, the stock clock and voltage on the HD7970 are quite conservative. Almost all mention clocks higher than 1.1GHz being achievable at stock voltage, and quite a few reach more than 1.2GHz at stock voltage. There have also been some impressive undervolting reports at stock clocks.
Essentially, many of the cards shipped to reviewers and the public already contain 'magic dies'.

If a significant number of these magic dies could have been produced, the 7970 would have been clocked higher than 975MHz.
The default clock of the HD7970 is 925MHz.

Increase frequency and power/temps go up, while the number of good dies goes down.
True, but those 'bad' dies can still be used in HD7970s and HD7950s.

And again, if AMD goes down this road, so can Nvidia. Quid pro quo.
Possibly, although the HD7970's level of overclockability at stock voltage has seldom been seen on modern non-cut-down GPUs.
My suspicion is that the GTX 680 already is such a 'magic die' part. That would explain the naming confusion over the past months (if the 670 Ti was to be the top-end part, and the 680 is a higher-clocked 670 Ti quickly introduced when Nvidia realised such a part could beat the 7970), as well as the rumoured specifications showing the 670 without any units fused off, unlike most salvage parts.
 
Didn't Dave already confirm that, on Tahiti/the 7900 series, timing was everything rather than getting the "best performance reasonably possible", and that Tahiti is being "looked at again", hinting at a higher-clocked "GHz edition" or "7980"?
 
RF cache doesn't reduce the number of registers needed. It just changes their access pattern. It's a cache after all.

I know what a cache is. Again, I thought one of the options in the paper was to make the two-level register hierarchy explicit, in which case it should be possible to arrange for at least some values with short live ranges never to occupy space in the main RF.

Does using faster GDDR5 mean that latency for off-chip memory access shrinks significantly, or does it just give you a bandwidth boost? It could just be that fewer threads are needed to hide memory access latency.
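One way to frame the latency-vs-bandwidth question is Little's law: the data that must be in flight equals bandwidth times latency. The sketch below uses made-up bandwidth, latency and request-size numbers (none come from the thread), and `bytes_in_flight`/`requests_needed` are hypothetical helpers.

```python
# Little's law sketch: bytes that must be outstanding = bandwidth x latency.
# All numbers are hypothetical and only illustrate the relationship.

def bytes_in_flight(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Bytes that must be outstanding to sustain the given bandwidth."""
    # 1 GB/s is 1e9 bytes/s, i.e. 1 byte/ns, so the product is in bytes.
    return bandwidth_gb_s * latency_ns

def requests_needed(bandwidth_gb_s: float, latency_ns: float,
                    bytes_per_request: float) -> float:
    """Outstanding memory requests needed to cover the latency."""
    return bytes_in_flight(bandwidth_gb_s, latency_ns) / bytes_per_request

if __name__ == "__main__":
    latency = 500.0   # ns, hypothetical DRAM round-trip latency
    request = 128.0   # bytes per warp-wide request, hypothetical
    for bw in (150.0, 192.0):  # GB/s, hypothetical slower vs faster GDDR5
        print(f"{bw:.0f} GB/s -> {requests_needed(bw, latency, request):.0f} requests in flight")
```

Under this model, fewer threads suffice only if latency itself drops; a pure bandwidth boost at fixed latency calls for more requests in flight, not fewer.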
 
Speaking of bigK, do you guys think it's going to be a "straight" upscale, with 1.5x the shaders and almost 2x the bandwidth (speculated 512-bit)? The cache and register capacity seem good enough for gaming, but will they be enough for compute? I wonder if they'll keep the dual-dispatch schedulers for the big chip too, so maybe the SMX architecture will be a bit different?
 
2304 SPs were often rumored for GK110.

Maybe:
- 4 GPCs (12 rasterized pixels each?)
- 3 SMX each (192 SPs, 6 warp schedulers)
- extended caches

GK110 should be capable of DP @ 1/2 SP.
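The speculated layout above multiplies out to the rumoured 2304 SPs. This is only arithmetic on the rumour; every per-unit figure is speculative, and reading "DP @ 1/2 SP" as half as many DP operations per clock is an assumption.

```python
# Speculative GK110 layout from the list above (rumour, not a confirmed spec).
GPCS = 4
SMX_PER_GPC = 3
SPS_PER_SMX = 192
SCHEDULERS_PER_SMX = 6      # "6 warp schedulers" per SMX, as speculated
PIXELS_PER_GPC = 12         # "12 rasterized pixels each?", also speculative

smx_count = GPCS * SMX_PER_GPC                  # 12 SMX
sps = smx_count * SPS_PER_SMX                   # 2304 SPs, matching the rumour
schedulers = smx_count * SCHEDULERS_PER_SMX     # 72 warp schedulers
pixels_per_clock = GPCS * PIXELS_PER_GPC        # 48 rasterized pixels per clock
# "DP @ 1/2 SP": assume half as many DP operations as SP operations per clock.
dp_ops_per_clock = sps // 2                     # 1152

print(smx_count, sps, schedulers, pixels_per_clock, dp_ops_per_clock)
```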
 


Actually, there are four ASIC quality bins, each with its own fixed stock vcore.
Three have been reported on different forums (the fourth should be under 1112mV):

- 1107mV??? (nobody has seen one yet, but they are reported by TSMC, and some even lower)
- 90% ASIC = 1112mV: max reported OC at 1200mV
- 80 to 88% = 1115mV: max reported OC at 1225mV (a rare one; I believe they get pushed into the 1117mV line)
- 75 to 80% = 1117mV: max reported OC limited by AB's (MSI Afterburner's) vcore cap with stock cooling (under 80% ASIC quality); the one to choose now for OC
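The reported bins can be written as a simple lookup. This is a sketch of forum reports, not AMD data; because the listed ranges overlap at 80% and leave 88-90% unassigned, treating each bin's lower edge as inclusive is an assumption, and the unconfirmed 1107mV bin is left out. `stock_vcore_mv` is a hypothetical helper.

```python
from typing import Optional

# Reported ASIC-quality bins (forum reports). Lower edges are treated as
# inclusive, which is an assumption: the reported ranges overlap at 80%
# and leave 88-90% unassigned (such parts fall into the 1115mV bin here).
BINS = [
    (90.0, 1112),  # ~90% ASIC quality -> 1112 mV
    (80.0, 1115),  # 80-88%            -> 1115 mV
    (75.0, 1117),  # 75-80%            -> 1117 mV
]

def stock_vcore_mv(asic_quality: float) -> Optional[int]:
    """Return the reported stock vcore (mV) for an ASIC quality %, or None."""
    for lower_edge, mv in BINS:
        if asic_quality >= lower_edge:
            return mv
    return None  # below 75%: no bin reported in the thread

print(stock_vcore_mv(91), stock_vcore_mv(85), stock_vcore_mv(78), stock_vcore_mv(70))
```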

All of them go above 1100MHz at stock voltage and can be set to 1150MHz without tweaks. The 1117mV parts achieve the best tweaked overclock (1250MHz+ on stock cooling), while the 1112mV (90% ASIC quality) parts reach the highest clocks at stock voltage but can't take a voltage increase as well as the others on stock cooling.

I have two cards here: an early Sapphire at 1112mV and a HIS at 1117mV. On stock cooling that's a problem, with one doing 1275+MHz and the other stuck at 1200-1225MHz, but the difference disappears under watercooling, with both going to 1300+MHz.


As for the 7970 vs 680 fight... I'm not even sure a better part is needed yet; there are already 1000MHz models available (DirectCU II, MSI, etc.) (the XFX is a bad example).

If their numbers are true (a BF3 delta of 1.7 fps, with an admitted 5% difference due to Turbo Boost), that leaves LP2 and 3DMark11... and 3DMark11 clearly wasn't run on the same system: the 7970's score is too low for an i7 3960X (6 cores vs 4 = a 5000-point difference on the physics test; 6 cores make an extremely big difference in 3DMark11, Vantage and 3DMark06).

We will need a complete review and benchmarks...
 
Well let's take a game at 40 fps over a SLS, then drop each peripheral screen to 25 fps while boosting the main screen to 60 fps. That's lower fps overall but if Nvidia could convince enough game designers that the main screen was what counts...
I doubt that 40-ish fps would be VSynced. What could be nice is a 120Hz screen in the center and two 60Hz displays on either side, all VSynced (or 60/30Hz for low-fps games).
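A rough model of why 40-ish fps doesn't sit well with VSync on a 60Hz panel, assuming double buffering and a constant frame time (real frame times vary, and triple buffering behaves differently). `vsynced_fps` is a hypothetical helper.

```python
import math

def vsynced_fps(raw_fps: float, refresh_hz: float) -> float:
    """Displayed frame rate under double-buffered VSync.

    Assumes a constant frame time and that each finished frame waits for
    the next refresh, so it is held for a whole number of refresh intervals.
    """
    if raw_fps >= refresh_hz:
        return refresh_hz
    intervals = math.ceil(refresh_hz / raw_fps)
    return refresh_hz / intervals

for hz in (120, 60):
    print(f"40 fps on a {hz}Hz panel -> {vsynced_fps(40, hz):.0f} fps displayed")
```

Under these assumptions, 40 fps maps cleanly onto a 120Hz panel (every third refresh) but drops to 30 fps on a 60Hz one, which is the 60/30Hz case mentioned above.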
 
It seems really unbalanced in terms of compute/(cache + reg file) vs GF104. There is 4x more compute per core and only 2x more reg file.
And there is no hot clock, so relative to GF104, Kepler has the same amount of register file per SP (256 KB per 192 SPs, vs 128 KB per 48 SPs @ hot clock = 96 @ base clock in GF104).
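The two posts above can be reconciled with a little arithmetic, using only the per-core figures quoted there: the raw 4x compute ratio becomes 2x once Fermi's hot clock is folded in, matching the 2x register file.

```python
# Figures quoted above: GK104 SMX vs GF104 SM.
GK104_SPS, GK104_RF_KB = 192, 256
GF104_SPS, GF104_RF_KB = 48, 128
HOT_CLOCK_RATIO = 2  # Fermi shaders run at twice the base clock; Kepler has no hot clock

# Raw per-core ratios, as in the "4x compute, only 2x reg file" remark.
raw_compute_ratio = GK104_SPS / GF104_SPS       # 4.0
rf_ratio = GK104_RF_KB / GF104_RF_KB            # 2.0

# Normalised to base clock, each GF104 SM behaves like 96 SPs.
gf104_base_clock_sps = GF104_SPS * HOT_CLOCK_RATIO               # 96
clock_adjusted_compute_ratio = GK104_SPS / gf104_base_clock_sps  # 2.0

kb_per_sp_gk104 = GK104_RF_KB / GK104_SPS
kb_per_sp_gf104 = GF104_RF_KB / gf104_base_clock_sps
print(f"{kb_per_sp_gk104:.3f} vs {kb_per_sp_gf104:.3f} KB per base-clock SP")
```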
 
Sorry for OT...
But why is the XFX a bad example?


It would need to be tested again, but it seems the memory wasn't clocked high enough; the results look bottlenecked by it, especially compared with the Asus DirectCU II or the upcoming MSI Lightning.
('Bad example' is a bit harsh; let's say that for the overclock applied, the gain should be a bit higher.)
The XFX tests were done at the 7970 launch, with an early driver (8.921), so that may have skewed the results more than it seems.
 
For those wishing for a 7970 1GHz (1.2GHz) edition to compete against the GTX 680, I really don't see the purpose.

-----------------

Let's say AMD produces a factory-overclocked 1.2GHz 7970 to outperform the GTX 680.

A 1050 or 1100MHz card would be enough to compete with the GTX 680, assuming the recently linked site isn't bogus. That alone bumps performance up by ~20%.
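For reference, here is the core-clock increase each of these overclocks represents over the 925MHz reference clock. It is a best case that assumes performance scales linearly with core clock; memory bandwidth and other limits will usually keep the real gain lower. `clock_uplift` is a hypothetical helper.

```python
REFERENCE_MHZ = 925  # HD 7970 reference core clock

def clock_uplift(oc_mhz: float, base_mhz: float = REFERENCE_MHZ) -> float:
    """Fractional core-clock increase over the reference clock."""
    return oc_mhz / base_mhz - 1

for mhz in (1050, 1100, 1200):
    print(f"{mhz} MHz: +{clock_uplift(mhz):.1%} core clock")
```

On this measure, the ~20% figure corresponds roughly to the 1100MHz case under perfect scaling.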

Wouldn't the TDP of 250 watts have to be raised?

Not at the clocks I just mentioned. Those don't require an increase in voltage. And from reviews it results in a very minor increase in board power consumption. Basically still well under 225 watts.

Now what about pricing?
It would have to be higher than the current $549, and that means even fewer units sold.

I'd imagine the same price as the GTX 680. At worst, the base 7970 would be moved down or just replaced entirely by the new card. And even if that happens, early adopters still got better value than they did back in the GeForce 6800 Ultra and GeForce 7800 GTX days, when card prices generally fell as soon as 2-4 weeks after launch. Plus, more below...

And since AMD owners claim they can overclock the 7970 to that same 1.2GHz, why would they ever buy the factory-overclocked 1.2GHz version?

Easy. All currently clocked 7970s go EOL. If AMD are feeling generous, they could potentially release a flashable BIOS that users could use to flash their cards to the "GHz" version. And even if they don't, it's not like it wouldn't be easily obtainable over the net within minutes of someone getting a "GHz" 7970.

Or failing that just OC your current card.

And what about the thermals a 1.2GHz card would produce? That would mean more/bigger/faster fans and lots of noise.

From what I've seen, a measly 1-2 degree Celsius increase in load temperature at the clocks I mentioned, for ~20% more performance. Aggressive binning could make 1.2GHz viable at current voltages, but IMO it isn't really needed.

And after all this, all Nvidia has to do is either release a GTX 685 with higher clocks or release the upcoming GK110.

Hard to say since not only do we still not know how it really performs, we have no idea how much overclocking headroom there is.

So all in all, I do not see AMD releasing a factory-overclocked 7970.

I agree, but not for the reasons you stated. I agree, because it is kind of pointless with all the AIB OC'd cards available.

What AMD basically has to do is the same thing Nvidia has done in the past.

Convince review sites to benchmark the GTX 680 against AIB factory-OC'd 7970s, just like Nvidia has done for years every time ATI/AMD has launched a new card. I expect lots of people who formerly said it was OK when Nvidia did it to say it's unfair if AMD does it. :p Personally, I don't like it when either vendor does it.

Regards,
SB
 
Not at the clocks I just mentioned. Those don't require an increase in voltage. And from reviews it results in a very minor increase in board power consumption. Basically still well under 225 watts.

And even if it did, it's just a number on a spec sheet. The cooling solution has lots of untapped potential, and if they want to, AMD can always use PowerTune to enforce whatever TDP figure they like.
 
A Fermi SIMD is physically 16-wide. It only needs 48 input and 16 output regs per hot-clock. If the regfile was providing twice that per clock then it was running at core clock, not hot.
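The operand counts in the post above follow from a three-input FMA (a*b + c) on each of the 16 lanes; the per-core-clock line simply doubles them because the hot clock runs at twice the core clock.

```python
# Operand traffic for one physically 16-wide Fermi SIMD issuing FMA (a*b + c).
LANES = 16
INPUTS_PER_FMA = 3
OUTPUTS_PER_FMA = 1
HOT_CLOCK_RATIO = 2  # shader (hot) clock is 2x the core clock

reads_per_hot_clock = LANES * INPUTS_PER_FMA     # 48 input registers
writes_per_hot_clock = LANES * OUTPUTS_PER_FMA   # 16 output registers

# A register file clocked at core clock must deliver twice as much per clock.
reads_per_core_clock = HOT_CLOCK_RATIO * reads_per_hot_clock    # 96
writes_per_core_clock = HOT_CLOCK_RATIO * writes_per_hot_clock  # 32

print(reads_per_hot_clock, writes_per_hot_clock, reads_per_core_clock, writes_per_core_clock)
```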
 
Is there any word yet on compute abilities, or more specifically, DP speed? Does it have the "midrange ratio", or one similar to Tahiti's (which was a 1:2 DP:SP ratio, limited to 1:4 on consumer boards?)
 
A Fermi SIMD is physically 16-wide. It only needs 48 input and 16 output regs per hot-clock. If the regfile was providing twice that per clock then it was running at core clock, not hot.

You've got a point there. :)

that of Tahiti (which was 1:2 dp:sp ratio, limited to 1:4 on consumer boards?)
You know, I'd really love to see a link. All I've been told is that it's physically 1:4, with nothing throttled.
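Whichever DP:SP ratio is right, the peak-throughput arithmetic is the same: shaders x 2 FLOPs (one FMA) x clock x ratio. The sketch uses the HD 7970's 2048 stream processors and the 925MHz reference clock; it does not settle which ratio Tahiti has, it only shows what each claim implies. `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(shaders: int, clock_mhz: float, dp_ratio: float = 1.0) -> float:
    """Peak GFLOPS, counting one FMA as 2 FLOPs per shader per clock."""
    return shaders * 2 * clock_mhz * dp_ratio / 1000

SHADERS, CLOCK_MHZ = 2048, 925  # HD 7970 reference configuration

print(f"SP -> {peak_gflops(SHADERS, CLOCK_MHZ):.0f} GFLOPS")
for ratio in (1 / 2, 1 / 4):
    print(f"DP at 1:{round(1 / ratio)} -> {peak_gflops(SHADERS, CLOCK_MHZ, ratio):.0f} GFLOPS")
```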
 



Heaven benchmark with everything maxed: above 29 fps in Surround at 3x1080p.

http://www.overclock.net/t/584302/ocn-water-cooling-club-and-picture-gallery/18520#post_16733663
 