Nvidia Pascal Announcement

Nvidia has already listed in some SEC documents that they have some chips in production at Samsung. That should be enough of a motivation for TSMC to offer competitive wafer pricing.

Well they didn't exactly say that they have chips in production at Samsung. This was the SEC filing from March 2015 - "We do not manufacture the silicon wafers used for our GPUs and Tegra processors and do not own or operate a wafer fabrication facility. Instead, we are dependent on industry-leading foundries, such as Taiwan Semiconductor Manufacturing Company Limited and Samsung Electronics Co. Ltd., to manufacture our semiconductor wafers using their fabrication equipment and techniques."

AFAIK they didn't have any chips in production at Samsung at that time...

Also..while of course they would play off each foundry against the other..they are heavily competing with Apple and Qualcomm for wafer capacity.
Nvidia has much higher volumes overall, so that's a factor also.

And the smaller die makes for better yields. It's of course possible that GF has a lower defect density than TSMC.

But even if the die price is the same, Nvidia still has the benefit of needing fewer DRAM chips, at much higher volume. And a cheaper power solution.

In a price war, AMD would lose.

Yep..pretty much agree with you on all this..the only slight benefit AMD may have is lower overhead.
 
You're seriously saying that even the initial batches were priced lower than the FE in Germany? Either you got majorly screwed by "higher-than-rest-of-the-world" FE pricing, or you were the only country with cheaper AIBs when they first arrived; the US, UK, Finland, Sweden etc. all had their first AIBs priced cheaper than FEs at the time.
I am seriously saying what I said. I don't know where you're from, but here, prices for endusers mandatorily include VAT/sales tax whatever. Only shipping fees can apply. Maybe that makes a difference to what you're used to?

Here's one of the FEs (typical), starting from May 20th at 789 EUR and recently come down to 720-730 EUR.
http://geizhals.de/?phist=1441847

And here the first (IIRC) AIB card, 659 EUR since its first listing on May 30th:
http://geizhals.de/?phist=1449278
(not in stock currently, but I know for a fact that there were some at different shops in the past.)

Aaargh.. FE bullshit again..
It's pretty much like saying "hey, the price is actually $300 but don't worry because it may go down to $250 eventually in some months' time".
I wouldn't like it either if someone forced me at gunpoint to go and buy these things. Lucky me, that's not the case.

The RX 480 is going through the exact same thing. $230 was announced for the 8GB model, and the cheapest I can find is over 300€ in local stores (and even Amazon has raised their prices?)..
Same story as for 1080/1070 here: Readily available here in GER after applicable taxes etc. for 260-270 since launch.
Heck, even a few select shops had limited stock of the 4-GiByte models (of one of which I am the proud owner now) for 219 EUR. Granted, maybe it is even true that AMD seeded these in order to fulfill its promise of a 199 US-$ starting price (+taxes).
 
I am seriously saying what I said. I don't know where you're from, but here, prices for endusers mandatorily include VAT/sales tax whatever. Only shipping fees can apply. Maybe that makes a difference to what you're used to?
Finland here, VAT (24%) always included in prices, no extra fees on top of that.
Cheapest Founders Editions were 799€ on launch http://hintaseuranta.fi/tuote/msi-g...edition-8-gb-pci-e-naytonohjain/4685625#trend
Cheapest AIBs (like Asus Strix linked here) same 799€ on their respective launch http://hintaseuranta.fi/tuote/asus-geforce-gtx-1080-gaming-8-gb-pci-e-naytonohjain/4685614#trend

Some AIB versions that came a week or more later started under that, but the first round of AIBs were all there. Currently the cheapest Founders is 769€ and the cheapest AIB of all 729€.
 
I'm still sort of confused... I can't see how it wouldn't be faster to just do this in "software". Even quantizing after every instruction should only be ~2-4x slower, and even handling details like specials and denorms I'm sure you could do it faster than the 1/64 rate. Why even bother with the hardware at all?
It's all for compatibility purposes. They implement a hardware FP16x2 unit on GP104 exactly as there is on GP100, so that GP100 CUDA programs can be written and debugged on GP104. But they don't make it fast enough that you'd actually want to deploy your application on a GeForce instead of a Tesla.

A software solution would be faster (relatively speaking) and could behave slightly differently from GP100, two things that NVIDIA does not want. It is very, very well executed market segmentation. And NVIDIA sees that as more beneficial than enabling fast FP16 performance on the desktop for games.
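For what it's worth, a minimal sketch of what such a software emulation could look like (plain Python rather than GPU code; the function names are made up, and overflow/denormal handling is ignored, so this is just the "quantize after every instruction" idea from above, not a real implementation):

```python
import struct

def to_fp16(x: float) -> float:
    # Round a float to the nearest IEEE-754 half-precision value; struct's
    # 'e' format performs the round-to-nearest-even conversion for us.
    return struct.unpack('e', struct.pack('e', x))[0]

def fp16_mul_add(a: float, b: float, c: float) -> float:
    # Emulated fp16 multiply-then-add: quantize the inputs and every
    # intermediate, mimicking separate fp16 mul and add instructions.
    # (Values beyond fp16 range would raise OverflowError here.)
    a, b, c = to_fp16(a), to_fp16(b), to_fp16(c)
    return to_fp16(to_fp16(a * b) + c)
```

Running the math in fp32 and only rounding between instructions like this gives fp16-accurate results for most inputs, which is exactly why it would compete with a deliberately slow hardware path.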
 
So, GTX 1060 pricing has been announced for Europe including taxes etc.
Germany: 279 EUR for AIB (AIB: 234 EUR excl. taxes), 319 EUR for SLFE which is only available through the nv-online-shop in UK, GER & FRA.
 
A software solution would be faster (relatively speaking) and could behave slightly differently from GP100, two things that NVIDIA does not want. It is very, very well executed market segmentation. And NVIDIA sees that as more beneficial than enabling fast FP16 performance on the desktop for games.
Yeah, I get that; it's just that even if you wanted bit-accurate results you could almost certainly still do it faster in software than the 1/64 rate. I guess it's maybe a tradeoff where adding one unit per SM or whatever is cheap enough to avoid the software hassle in the first place, though.

In any case it still sort of sucks that you get segmentation in an area that would actually benefit lower power stuff more. Doubles aren't a big deal because only HPC folks who don't know how to code really need them (I tease as someone who used to do that stuff so I've earned the right :)).
 
The early posts on the FP16x2 functionality indicated that in GP104 the instructions were emitted with barrier flags as if they were being handled similarly to SFU instructions or other shared hardware.
That's not going to change the computational results, but if true it's a difference. Aside from the area cost and segmentation, it might have benefits to put a generally non-standard instruction type on a port that already handles the more varied behaviors compared to the SIMD issue ports.
Given GP104's divergence in other parts of its configuration that leave some similarity to Maxwell, perhaps it saves some implementation effort by providing ISA support while leaving the more important SIMD execution loop of the SM undisturbed?
 
I am interested in upgrading from a 980 Ti to a 1080 Ti, but I get conflicting views as to what a 1080 Ti actually is..?

GP100 vs GP102... is the number difference just to denote an HBM vs non-HBM memory controller?
Has Nvidia ever produced and maintained two separate lines of 'big' GPU?

Because GP100 has low boost clocks, around the same as GM200, and we've seen that Pascal relies heavily on a ~2 GHz boost. If the 1080 Ti is to be ~40% faster than the 1080, then it needs GP100's CUDA core count and the 1080's boost clocks; otherwise the additional cores at 1.4 GHz get restricted..

Does this mean that HBM is limiting high boost clocks? Fury did suffer from that.
A GP102 with a GDDR5X controller could suddenly boost the cores to 2 GHz to give us that premium high-end GPU.
 
GP100 vs GP102...is the number difference just to denote a HBM and non-HBM memory controller?
Has Nvidia ever produced and maintained 2 separate lines of 'Big' gpu?

Some sites claim there will be a Titan P based on the GP100 with 16 GB HBM2.
A 1080 Ti would be based on a GP102 with 12 GB GDDR5X.
 
GP100 vs GP102...is the number difference just to denote a HBM and non-HBM memory controller?
Has Nvidia ever produced and maintained 2 separate lines of 'Big' gpu?.
Yes...
For "Big Kepler" they had two GPUs:
The Titan, 780 Ti, Quadro K6000 and Tesla K20/K40 use different revisions of GK110.
The Tesla K80 uses GK210, which doubles the register file and shared memory.
 
So, anyway, the GTX 1060 looks like the card that the RX 480 should have been, as far as day-one reviews will play out.

I do wonder whether just over 4 TFLOPS is a cut too far. I expect most review sites will be picking their games carefully to hide the compute shortfall.

We shall see.
 
They're basically trying to piss off RecessionCone.
Google's TPU accelerator works with 8-bit integer operations, not FP16, and is only usable for inference, not training. But it's probably also much higher volume.

Those who can't afford custom silicon can use P100s for training with double-rate FP16, and order(s) of magnitude more GP104s with quad-rate 8-bit for inference.

I can't speak for RecessionCone, but that doesn't seem to be such a bad trade-off. The presence of quad-rate 8-bit on cheap silicon may be a bigger plus than the lack of double-rate FP16 is a minus.
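As a rough illustration of what inference with 8-bit integers involves (a toy Python sketch; the symmetric quantization scheme and the scale parameters are assumptions for the example, not how any particular accelerator does it):

```python
def int8_dot(a, b, scale_a, scale_b):
    # Symmetric int8 quantization: real value ≈ q * scale, with q clamped
    # to the signed 8-bit range. The scales are caller-chosen assumptions.
    qa = [max(-128, min(127, round(x / scale_a))) for x in a]
    qb = [max(-128, min(127, round(x / scale_b))) for x in b]
    # The dot product accumulates in plain (wide) integers; a single
    # float rescale at the end recovers the approximate real result.
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * scale_a * scale_b
```

The expensive inner loop is pure integer multiply-accumulate, which is exactly what a quad-rate 8-bit path (or a TPU) accelerates; the float rescale happens once per output.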

I expect most review sites will be picking their games carefully to hide the compute shortfall.
Seriously???
 
Stupid question: why does training require more precision?
My very amateur explanation: training is all about backpropagation, in which an error at the output ripples back towards the input and is used to make minute changes to the coefficients. You're basically doing gradient descent.

Coefficients start out with completely random values and are adjusted over millions of iterations.

8 bits are too coarse to do this. But once the parameters are set, the accuracy actually doesn't matter a whole lot.
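A toy sketch of that intuition (plain Python; the grid spacings are illustrative stand-ins for real number formats, not exact fp16 or 8-bit specs): near 1.0, a small update survives on a fine grid but rounds away entirely on a coarse one.

```python
def quantize(x, step):
    # Snap x to a grid with spacing `step`: a crude stand-in for the
    # local resolution of a low-precision number format around x.
    return round(x / step) * step

w_fine, w_coarse = 1.0, 1.0
update = 0.003  # one small gradient-descent step

for _ in range(100):
    # Near 1.0, fp16 has a spacing of 2**-10; a hypothetical 8-bit
    # format would be far coarser, e.g. 2**-7 (illustrative values).
    w_fine = quantize(w_fine + update, 2 ** -10)     # update survives
    w_coarse = quantize(w_coarse + update, 2 ** -7)  # update rounds away
```

After the loop the fine-grained weight has moved by roughly 100 updates, while the coarse one is still exactly 1.0: every individual update was smaller than half the grid spacing, so gradient descent never makes progress.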
 