#1 | |
|
Member
Join Date: Feb 2010
Posts: 327
|
The Big Kepler adds 3.5 billion transistors to the GK104.
Some of the improvements are:
- reorganized processing cores with new instructions
- an improved memory system with faster atomic processing and low-overhead ECC

So what are the additional changes that Nvidia has added to the Big Kepler that could use lots of transistors?
------------------------
Quote:
Change Drop Down date to Wednesday 5/16 |
|
|
|
|
|
|
#2 |
|
Member
Join Date: Jul 2010
Location: United States of America
Posts: 310
|
Well there's this set of rumors/speculation from 3DCenter (translated) saying 3072 CCs, so that would use lots of transistors. According to that rumor, GK110 seems close to an overall doubled GK104 in terms of basic specs.
|
|
|
|
|
|
#3 |
|
Member
Join Date: Aug 2011
Posts: 371
|
Big K isn't going to just be a doubled GK104. For nVidia, the HPC/workstation segment is bigger than the high-end gpu one. So Big K will likely emphasize 64-bit throughput, with a healthy helping of caches.
|
|
|
|
|
|
#4 |
|
Member
Join Date: Jul 2010
Location: United States of America
Posts: 310
|
I was thinking along the lines of CC count and bus width, but yeah you're right.
But if the 3072 CC stuff is true, then I'm interested to know how they could squeeze that many CCs into GK110, especially considering the additional compute features would presumably make the die bigger for the same CC count. |
|
|
|
|
|
#5 |
|
Member
Join Date: Dec 2009
Posts: 591
|
512-bit memory bus?
|
|
|
|
|
|
#6 | |
|
Senior Member
Join Date: Apr 2007
Posts: 1,394
|
Quote:
More interesting questions are power consumption and the possibility of partly deactivated units on the top SKU.
Last edited by AnarchX; 21-Apr-2012 at 08:37.
|
|
|
|
|
|
#7 |
|
Senior Member
Join Date: Sep 2010
Posts: 1,036
|
How big will its die be?
If they keep the same transistor density of ~12.04 MTr/mm², then this 7000 M transistor beast will need around 580 mm².
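The estimate above can be checked with a quick back-of-the-envelope sketch. The GK104 figures (3540 M transistors, 294 mm²) are its published specs and are an addition here, not something stated in the post; the 7000 M count for GK110 is the thread's unconfirmed rumour, and the function names are made up for illustration.

```python
# GK104 values are its published transistor count and die area.
# The GK110 transistor count is the rumour discussed in this thread.
GK104_TRANSISTORS_M = 3540   # million transistors
GK104_DIE_MM2 = 294          # mm^2
GK110_TRANSISTORS_M = 7000   # million transistors (rumoured)


def density_mtr_per_mm2(transistors_m: float, area_mm2: float) -> float:
    """Transistor density in million transistors per mm^2."""
    return transistors_m / area_mm2


def area_at_density(transistors_m: float, density: float) -> float:
    """Die area in mm^2 needed for a transistor count at a given density."""
    return transistors_m / density


gk104_density = density_mtr_per_mm2(GK104_TRANSISTORS_M, GK104_DIE_MM2)
gk110_area = area_at_density(GK110_TRANSISTORS_M, gk104_density)

print(f"GK104 density: {gk104_density:.2f} MTr/mm^2")
print(f"GK110 area at the same density: {gk110_area:.0f} mm^2")
```

This only scales area linearly with transistor count; real dies mix logic, cache and I/O at different densities, so the result is a rough guide at best.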
|
|
|
|
|
#8 |
|
Member
Join Date: Mar 2012
Location: Switzerland
Posts: 660
|
It's because the 7 billion transistors (7000M) are not confirmed yet...
I really doubt Nvidia, with their experience of 28nm, will end up with a 550mm² chip. And don't forget we are absolutely not speaking about Kepler. We are speaking about a card that could see the light of day in 5-6 months.
|
|
|
|
|
#9 |
|
Regular
|
merge this pointless thread back with the main one
__________________
Can it play WoW? |
|
|
|
|
|
#10 |
|
Member
Join Date: Feb 2010
Posts: 327
|
You mean make this thread disappear in the useless noise of Bitcoin mining, how much tax the EU adds vs the USA, Physics jobs in Germany vs USA, etc., etc., etc.
If anything, the other thread is the bloated, pointless thread, especially in relation to the Tesla line. Having a thread specifically on the BigK GK110 Tesla/HPC GPU without the above-mentioned useless posts is useful. I expect that the GK110 will be fully dedicated to the professional market, and I would like to see what others expect the additional 3.5 billion transistors have added over the GK104 GPU. And if you really like the other thread so much, you can stay and post on that one and ignore this one.

Back to the speculation on what is added to make up the +3.5 billion transistors. Here are the guesses so far:
- 3072 CCs
- 64-bit throughput
- healthy helping of caches
- 512-bit memory bus
|
|
|
|
|
#11 |
|
Specious Misanthrope
Join Date: May 2003
Location: Treading Water
Posts: 7,467
|
So should we expect gk110 to be a lot better at bitcoin mining per transistor?
|
|
|
|
|
|
#12 |
|
Member
Join Date: Sep 2009
Posts: 135
|
|
|
|
|
|
|
#13 |
|
Senior Member
|
3072 ALUs
-> 6x GPCs (512 SPs each)
--> 4 SMK to each GPC, 128 ALUs/SMK
--> each SMK has
---> 4 groups of 32 ALUs
----> two of which are 64 Bit capable, re-using data-paths from the other ALUs
----> two groups share a quad TMU
----> 4x 32 KiB L1-Cache shared among the ALU blocks, configurable as scratchpad memory in block sizes of 32 KiB.

512 Bit MI
-> 8x 64-Bit memory partitions
-> 4 GiB default memory size for gaming cards, twice for Tesla, Quadro
-> (probably) 2048, rather still 1024 KiB L2-Cache

850 MHz core clock plus advanced turbo (independently clockable GPCs?) and probably 1.40ish GHz GDDR5 speed, not pushing the envelope here as much.

Making close to 7 bln transistors and 550 mm² die size as agreed upon here. Hm?

Plus as Big-K special sauce:
- one dedicated physx processor per SMK
- a broken and unfixable design
*SCNR*
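To see how the numbers in this speculated layout fit together, here is a small sketch that multiplies out the hierarchy. Everything in it is the post's guesswork, not a confirmed spec; the class and function names are hypothetical, and the bandwidth figure assumes the quoted GDDR5 speed is meant as a command clock of roughly 1.4 GHz.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeculatedGK110:
    """Unit counts as speculated in the post (not confirmed specs)."""
    gpcs: int = 6
    smk_per_gpc: int = 4
    alu_groups_per_smk: int = 4
    alus_per_group: int = 32
    fp64_groups_per_smk: int = 2
    l1_blocks_per_smk: int = 4
    l1_block_kib: int = 32
    mem_partitions: int = 8
    partition_bits: int = 64

    @property
    def smk_total(self) -> int:
        return self.gpcs * self.smk_per_gpc

    @property
    def alus_total(self) -> int:
        return self.smk_total * self.alu_groups_per_smk * self.alus_per_group

    @property
    def fp64_alus_total(self) -> int:
        return self.smk_total * self.fp64_groups_per_smk * self.alus_per_group

    @property
    def l1_total_kib(self) -> int:
        return self.smk_total * self.l1_blocks_per_smk * self.l1_block_kib

    @property
    def bus_bits(self) -> int:
        return self.mem_partitions * self.partition_bits


def gddr5_bandwidth_gb_s(bus_bits: int, command_clock_ghz: float) -> float:
    # GDDR5 moves 4 bits per pin per command-clock cycle; divide by 8 for bytes.
    return bus_bits * command_clock_ghz * 4 / 8


chip = SpeculatedGK110()
print("SMKs:", chip.smk_total)
print("ALUs:", chip.alus_total, "of which FP64-capable:", chip.fp64_alus_total)
print("L1 total (KiB):", chip.l1_total_kib)
print("Bus width (bit):", chip.bus_bits)
print("Bandwidth (GB/s):", gddr5_bandwidth_gb_s(chip.bus_bits, 1.4))
```

The totals come out at 24 SMKs, 3072 ALUs and a 512-bit bus, matching the headline figures in the post, so the layout is at least internally consistent.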
__________________
English is not my native tongue. Before flaming please consider the possibility that I did not mean to say what you might have read from my posts.
Work | Recreation
Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration!
|
|
|
|
|
#14 |
|
Darlek ******
Join Date: Jun 2004
Posts: 9,501
|
I thought PhysX was adapted to run on standard shaders, hence there is no dedicated PhysX unit (unless they've put the Ageia stuff on-chip)
scnr ???
__________________
Guardian of the Most holy Two Terabytes of Gaming Goodness™ |
|
|
|
|
|
#15 | |
|
Senior Member
|
Quote:
In other words, the PhysX part was a joke.
__________________
"Well, you mentioned Disneyland, I thought of this porn site, and then bam! A blue Hulk." —The Creature My (currently dormant) blog: Teχlog |
|
|
|
|
|
|
#16 | ||
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
|
Quote:
Quote:
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
||
|
|
|
|
|
#17 | |
|
Senior Member
|
Quote:
WRT advanced GPU-Boost: depending on how high you could go when enough GPCs idle, I think this could make a difference for serial performance. In other words, the more power limited Big-K turns out to be, the higher your possible gains for compiler-identifiable latency-dominated tasks.
|
|
|
|
|
|
#18 |
|
Senior Member
Join Date: Apr 2007
Posts: 1,394
|
An independent clock for all GPCs or for each GPC?
With ~850MHz base clock, GK110 could offer a much higher Boost, in cases when the performance is limited by the GPCs. On the other hand NV could use this and present a < 3072SPs GeForce version, with ~1GHz clock, since gaming performance favors a faster front-end. |
|
|
|
|
|
#19 |
|
Senior Member
|
What I meant was a common clock throughout each GPC, but individually adjustable, possibly based on available power and maybe even on thread priority or type.
In any case, Nvidia would need to cut down on something if they are going to stay within a 300 watt power budget.
|
|
|
|
|
#20 | |
|
Regular
Join Date: Mar 2007
Posts: 8,988
|
Quote:
Likewise, the same could be applied to what CarstenS is suggesting with individually clocked GPCs. Don't compute-oriented workloads generally push all compute units relatively uniformly? Hence even the current turbo on GK104 might be determined to be not needed and hence a waste of transistors. IMO, for big Kepler, compute performance will matter most, with gaming performance being secondary, unlike GK104, where game performance was king and compute performance secondary. Regards, SB
|
|
|
|
|
|
#21 | |
|
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,768
|
Quote:
__________________
People are more violently opposed to fur than leather; because it's easier to harass rich ladies than motorcycle gangs. |
|
|
|
|
|
|
#22 | |
|
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,768
|
Quote:
If the information so far is accurate, GK110 might have a slightly higher transistor density than GK104.
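As a rough check of that density remark, the sketch below takes the roughly 7 billion transistors and 550 mm² floated earlier in the thread (both unconfirmed) and compares the implied density with GK104. The GK104 figures of 3540 M transistors on 294 mm² are its published specs and are an addition here.

```python
# Rumoured GK110 figures from the thread versus published GK104 figures.
gk110_density = 7000 / 550    # million transistors per mm^2 (rumoured inputs)
gk104_density = 3540 / 294    # million transistors per mm^2

increase_pct = (gk110_density / gk104_density - 1) * 100

print(f"GK110 (rumoured): {gk110_density:.2f} MTr/mm^2")
print(f"GK104:            {gk104_density:.2f} MTr/mm^2")
print(f"Difference:       {increase_pct:.1f} %")
```

Under those assumptions the gap is only a few percent, which fits "slightly higher".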
|
|
|
|
|
|
#23 | |
|
yes, i'm drunk
|
Quote:
2 groups share a quad TMU
> 1 SMK has 8 TMUs
4 SMK = 32 TMUs / GPC
6 GPC = 192 TMUs

Yeah, I don't see that happening, especially considering that IIRC they're next to useless on the GPGPU front?
The second, probably wrong, way I read it would give 48 TMUs, which is surely even more off.

Any way one looks at it, I don't see how they could, even in theory, fit a "double GK104" with added GPGPU + FP64 capabilities into a chip twice the size of GK104.
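The first reading of the TMU count can be multiplied out as follows. The constants are taken from the speculated layout under discussion; the comparison value of 128 TMUs for GK104 is its published spec and is an addition here. The second reading (48 TMUs) is not reproduced because the post does not say how it is derived.

```python
# Speculated layout: 4 ALU groups per SMK, every 2 groups share one quad TMU.
GROUPS_PER_SMK = 4
GROUPS_PER_QUAD_TMU = 2
TMUS_PER_QUAD = 4
SMK_PER_GPC = 4
GPCS = 6
GK104_TMUS = 128  # published GK104 figure, used only for comparison

tmus_per_smk = GROUPS_PER_SMK // GROUPS_PER_QUAD_TMU * TMUS_PER_QUAD
tmus_per_gpc = tmus_per_smk * SMK_PER_GPC
tmus_total = tmus_per_gpc * GPCS

print(tmus_per_smk, tmus_per_gpc, tmus_total)
print("Relative to GK104:", tmus_total / GK104_TMUS)
```

So 192 TMUs would be one and a half times GK104's count rather than double it.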
__________________
I'm nothing but a shattered soul... Been ravaged by the chaotic beauty... Ruined by the unreal temptations... I was betrayed by my own beliefs... |
|
|
|
|
|
|
#24 |
|
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,768
|
For one it's not twice the amount of units on all fronts (50% more raster/trisetups, 50% more TMUs etc.) and as a close second 7b transistors are almost twice as much as there are on GK104.
In fact there's nothing much that speaks against it considering a hypothetical 550mm2@28nm die; the only other case in point is that this time the desktop high-end consumer has to pay for far more transistors than in the past which are HPC related and therefore not invested in 3D performance.
With Fermi/GF110 it was roughly 35% more transistors compared to GF114, where the performance difference between the two was give or take 40%. If GK110 is now, let's say, 50% faster than GK104 (which isn't absurd at all assuming those hypothetical specs are true, especially considering the shitload of added bandwidth a 512-bit bus grants even with relatively low GDDR5 frequencies) but at the cost of almost twice the transistors, it's a totally different chapter, possibly also affecting power consumption.
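The comparison in this post boils down to gaming performance gained per transistor spent. A minimal sketch, using the post's own percentages as inputs: the 7000 M count is the thread's rumour, the 3540 M count for GK104 is its published spec, and the helper name is hypothetical.

```python
def perf_per_transistor(perf_ratio: float, transistor_ratio: float) -> float:
    """Relative performance per transistor of the big chip versus the small one."""
    return perf_ratio / transistor_ratio


# Fermi, per the post: ~40% faster for ~35% more transistors (GF110 vs GF114).
fermi = perf_per_transistor(1.40, 1.35)

# Kepler, hypothetical: ~50% faster for 7000 M vs 3540 M transistors.
kepler = perf_per_transistor(1.50, 7000 / 3540)

print(f"GF110 vs GF114: {fermi:.2f}x performance per transistor")
print(f"GK110 vs GK104: {kepler:.2f}x performance per transistor")
```

With these inputs the big Fermi chip roughly held its efficiency against its smaller sibling, while the hypothetical GK110 would deliver about three quarters of GK104's gaming performance per transistor.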
|
|
|
|
|
#25 |
|
yes, i'm drunk
|
GF114 wasn't anywhere close to as stripped of GPGPU capabilities as GK104 is. There are far more things GK110 needs to add over GK104 than GF110 had over GF114, just for the GPGPU speed.
|
|
|