Nvidia BigK GK110 Kepler Speculation Thread

Unless someone releases a game that utilises that amount of memory, it's a waste, and it will most probably prove to be one, given that within the lifecycle of such a product you will not see any games use it.

Once the next gen consoles come out and you want to run those ports in surround/eyefinity, you'll appreciate the 6GB.
 
But are all of these determined the same way? I mean, the 580 and 480 at least had no issues going past their TDP in gaming, of all things, while in most cases video cards tend to keep a healthy margin below their TDP while gaming.

No, on average they didn't.
 
So, assuming there isn't a (large) stupid error :)

Take a 6000x1000 framebuffer as an approximation of eyefinity: 6MB to simplify, if it were 8-bit and single-buffered.
Multiply by 4 (32-bit) and then by (4 + 3 + 1): 4x MSAA + triple buffer + Z-buffer. About 200MB.

Double that for FP16 everything (roughly, because the Z-buffer should be 32-bit and even the final output should be tone-mapped to 32-bit): 400MB.

For three 2560x1600 displays, double that: 800MB.
For three 4K displays, 1600MB, but there's no way you'll drive that with this card :)
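If it helps, here is that back-of-envelope math as a quick Python sketch (the helper name and its defaults are just my reading of the assumptions above; MB means 10^6 bytes, as in the 6000x1000 = 6MB step):

```python
# Rough VRAM estimate for the eyefinity framebuffers discussed above.
def framebuffer_mb(width, height, bytes_per_pixel=4,
                   msaa_samples=4, swap_buffers=3, z_buffers=1):
    """(MSAA color + swap chain + Z) 32-bit buffers, in MB (10^6 bytes)."""
    buffers = msaa_samples + swap_buffers + z_buffers
    return width * height * bytes_per_pixel * buffers / 1e6

base = framebuffer_mb(6000, 1000)  # 192 -> "about 200MB"
fp16 = 2 * base                    # 384 -> "400MB" for FP16 everything
three_1600p = 2 * fp16             # 768 -> "800MB"; 3x 2560x1600 is ~2x the pixels
three_4k = 2 * three_1600p         # 1536 -> "1600MB"; 3x 4K is ~2x that again
print(base, fp16, three_1600p, three_4k)
```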

Maybe your game has resolution-dependent buffers, render targets of sorts; I don't know how to quantify that. But consoles have bandwidth constraints, so their usage won't explode.

Anyway, it seems to me that a game that runs on a 2GB card at 1080p will definitely run with 3GB in an eyefinity setup, and may even have more memory left over for textures.

Feel free to future-proof like mad with 6GB. It's more about keeping the card for a long time (still gaming on it 5 years later), running crazy mods, SVO tech demos, etc. than about high resolutions, I think.
 
Take a 6000x1000 framebuffer as an approximation of eyefinity: 6MB to simplify, if it were 8-bit and single-buffered.
Multiply by 4 (32-bit) and then by (4 + 3 + 1): 4x MSAA + triple buffer + Z-buffer. About 200MB.

You need an MSAA Z-buffer as well, so it's (4 + 4 + 3), or ~260MB. And it's around ~450MB with 8x AA.
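In sketch form (same assumptions as the snippet earlier in the thread, with the corrected buffer count):

```python
# Same rough estimate, but counting MSAA depth samples too:
# (color samples + depth samples + swap chain) instead of (4 + 3 + 1).
def framebuffer_msaa_mb(width, height, samples, swap_buffers=3, bpp=4):
    return width * height * bpp * (samples + samples + swap_buffers) / 1e6

print(framebuffer_msaa_mb(6000, 1000, 4))  # (4+4+3) -> 264, i.e. "~260MB"
print(framebuffer_msaa_mb(6000, 1000, 8))  # (8+8+3) -> 456, i.e. "~450MB"
```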
 
Anyway, it seems to me that a game that runs on a 2GB card at 1080p will definitely run with 3GB in an eyefinity setup, and may even have more memory left over for textures.

Feel free to future-proof like mad with 6GB. It's more about keeping the card for a long time (still gaming on it 5 years later), running crazy mods, SVO tech demos, etc. than about high resolutions, I think.

The next PlayStation seems to have 4GB of GDDR5, of which 3.5GB is available to games. Although it's split between the CPU and GPU, it's a huge upgrade from the PS3. Even current, slightly enhanced console ports can be made to use quite a bit of memory on the PC, and I'm thinking this will go up as soon as you increase settings beyond next-gen console settings. 3GB would be a compromise here, and at the suggested price point there just isn't any reason to go 3GB.
 
Is it possible that instead of 3 or 6 GB they put in 4 or 5 GB? I think it was proven with lower-end cards that it's possible to put 1 GB or so on a 192-bit memory interface...
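For what it's worth, capacity follows from chip count times chip density, since each GDDR5 chip has a 32-bit interface. A rough sketch (the mixed-density example at the end is the GTX 550 Ti-style arrangement I assume the post is alluding to):

```python
# Each GDDR5 chip has a 32-bit interface, so the bus width fixes the
# chip count, and capacity = chips * density. Totals like 4 or 5 GB on
# a 384-bit bus would need mixed chip densities, which leaves part of
# the memory behind a narrower effective bus.
def capacity_gb(bus_bits, gbit_per_chip):
    chips = bus_bits // 32
    return chips * gbit_per_chip / 8  # Gbit -> GB

print(capacity_gb(384, 2))  # 12 x 2Gbit = 3.0 GB
print(capacity_gb(384, 4))  # 12 x 4Gbit = 6.0 GB
# 1 GB on a 192-bit bus (6 chips) mixes densities: 4x 1Gbit + 2x 2Gbit
print((4 * 1 + 2 * 2) / 8)  # = 1.0 GB
```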
 
For many compute tasks, more on-board memory is immensely beneficial, e.g. path-traced (PT) final rendering for large productions/projects.
 
Ah, I was wondering when Nvidia was going to get around to trying to hit the 1,000 USD price point with a single GPU again. And this seems to be it, although it'll likely be the OC versions of the card hitting 999+ USD.

Regards,
SB
 
I think they confused it with K20X.
Nvidia must have gotten at least some 15 SMX dies. Seeing the price and name, I would guess they use a full GK110.
Given NVIDIA's apparently increased emphasis on Tesla for GK110 compared to GF1x0 etc., I'm guessing that any 15 SMX GK110s are reserved for a future Tesla instead.

The source doesn't even give any specs of the GeForce Titan (unless Google Translate is really messing up), it might even have fewer than 14 SMXes and less than 6 GB of VRAM (or it may have 15 SMXes as you suspect). But since it's supposed to have 85% the performance of the GTX 690, it should have a lot of performance in any case.

No mention of multiple variants of GeForce Titan is given, but I'd be a bit surprised if there will be only one variant through the end of 2013.
 
In many different forums, it looks like some people have jumped on the specifications given by Sweclockers, thinking the card was coming with them.

I ran the article through Google Translate, and it looks like Sweclockers is describing the K20X found in the Oak Ridge Titan, not the specs of the GeForce Titan:

"The Nvidia Tesla K20X card, based on GK110, with 2688 CUDA cores, a 384-bit memory bus, and 6 GB of GDDR5 memory. The chip actually contains 2880 units, but NVIDIA has disabled one cluster (SMX), probably for production reasons. Clock frequencies are kept relatively low: 732 MHz for the GPU and 5.2 GHz for the GDDR5."

Nowhere does the article say the "GeForce Titan" uses those specifications; Sweclockers is just describing the K20X.
 
WRT 6 vs. fewer GiB:
Unless Nvidia wants to go single-flagship, i.e. no castrated versions, you need a differentiator inside the GeForce GK110 line-up as well. Depending on die selection, 3 vs. 6 GiByte could also keep the lesser version inside 225 watts, if Nvidia chooses so. With Kepler they seem to be able to control their cards' power in a pretty fine-grained way.

WRT # of SMs: Compared to GTX 680 at nominal boost, even a fully enabled K20X only has a ~22% advantage due to its low clocks. Now, GeForces most of the time boast higher clock rates, but you cannot cut your flagship down so much that no one wants to pay the price premium any more.
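As a sanity check on that figure (peak single-precision rate = cores x clock x 2 flops for FMA; 1058 MHz is the GTX 680's nominal boost clock):

```python
# Peak single-precision throughput in GFLOPS: cores * MHz * 2 (FMA) / 1000.
def peak_gflops(cores, mhz):
    return cores * mhz * 2 / 1000

k20x   = peak_gflops(2688, 732)    # ~3935 GFLOPS
gtx680 = peak_gflops(1536, 1058)   # ~3250 GFLOPS at nominal boost
print(f"{k20x / gtx680 - 1:.0%}")  # ~21%, the "~22%" advantage above
```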

My take as of now:
1 flagship model with all SMX enabled, 6 GiB of GDDR5 RAM, and a TDP of >225 watts. I'd think that, with OC editions, you'd be able to go near 300W. If 300 watts is the target, I'd expect about 1 GHz; otherwise (235 watts of K20X + 15 watts for a throttled-down fan) probably 850 MHz.

1 harvest model with 13 SMX, 3 GiB, and a TDP of <225 watts, probably enforced by 2x 6-pin connectors. Clocks in the same range as above, for marketing reasons probably a little lower.
 
I don't know; the bigger problem I see is the rumored price. It makes sense if you think of 85% of the performance of the 690, but not so much in terms of a "single-GPU card".
It's too close to the dual-GPU price. And what will be the price of a lower GeForce version of this "Titan"? A starting price of $700?

- GTX 690 sold for ~1000 USD
- K20: 3400 USD (workstation compatible)
 
Isn't GK110 an HPC-only chip with no tessellator, TMUs, or ROPs?
I also read that the next chip should be GK114 and is only 25% faster than GK104, so it should have 2048 FPUs, while GK110 has 2880 FPUs.
 
No, GK110 is the identical ASIC whether it's used in GeForce, Quadro, or Tesla products.
 
Aren't TMUs and ROPs essential? They're the places you use to input and output your data. Especially the TMUs: I thought pixel and other shaders used them to access gobs of data. Removing them would mean doing a quite different architecture instead of leveraging the guts of what you use in gaming GPUs.
 
I also read that the next chip should be GK114 and is only 25% faster than GK104, so it should have 2048 FPUs, while GK110 has 2880 FPUs.

Well, an SMX is 192 ALUs. You have 8 SMX in a GK104, and four GPCs, a GPC being a "rasterizer".

GK114 (or maybe it's called GK204) may have 10 SMX, so 1920 ALUs.
If it has two more, that would be 2304 ALUs.

In between, an odd 2112-ALU chip would be possible (with three GPCs "managing" three SMX each and one managing two, or four GPCs managing two SMX each and one managing three).
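Spelled out, with 192 ALUs per SMX (the chip attributions are just this post's guesses):

```python
ALUS_PER_SMX = 192  # Kepler SMX width

for smx, chip in [(8, "GK104"), (10, "GK114/GK204 guess"),
                  (11, "odd in-between config"), (12, "10 SMX + two more"),
                  (15, "full GK110")]:
    print(f"{smx:2d} SMX -> {smx * ALUS_PER_SMX} ALUs ({chip})")
```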

What's the GPC-to-SMX relationship? On GK10x it seems you have two SMX per GPC, except for GK106, which has a third GPC that goes with a lone SMX.
With GK110, you have six GPCs and 15 SMX, I think, so each GPC has two or three SMX under its "command". I have trouble understanding the role of a GPC, by the way. Do they have a "geometry" role only (doing triangle setup and such)? Are they completely unneeded when you do computation only, with no rendering?
 
GK110 has 5 GPCs (raster/trisetup units) according to NV's CUDA programming guide (thanks, Carsten), and for the typical hairsplitting, each SMX/cluster has 192 SPs, or 6*SIMD32. As for the GPCs themselves, it's my understanding that both the raster and trisetup units are dedicated fixed-function units.
 
As for the GPCs themselves, it's my understanding that both the raster and trisetup units are dedicated fixed-function units.
The primitive setup units are dedicated blocks. And most likely NV doubled the scan-out rate, from 8 to 16 fragments per setup unit, to go with the vastly increased processing rate.
 