Going from 8 to 15 billion transistors for only 512 more cores?
"I don't think they will do that; they tend not to switch architectures over one line of products. Plus there might not be a need to, because the DP units are the same units doing SP now..."
Nope, they are dedicated. And I guess Nvidia will iterate the consumer (desktop/mobile) SKUs into their own compute capability versions (6.1 and 6.2) with some additional ISA changes.
That's a pretty high Turbo clock for the big Pascal: 1480 MHz. I can only imagine how high the smaller consumer SKUs will reach.
2.5x registers
1.7x shared memory
1920 additional FP64 cores
512 additional FP32 cores
+NVLink
+HBM2
"Hopefully, they drop the DP stuff for the consumer models and add more shaders instead."
The rumored GP102 makes sense. GP100 is too HPC-oriented to be viable as a consumer product. It also means that GP100 is the first GPU exclusively dedicated to HPC. With such a performance leap from the DGX-1 in the trendy deep learning market, they must have a long queue of customers waiting for this new toy.
"So it seems they are using the majority of the new node's benefit to serve the professional market rather than the gaming market. Disappointing."
Seems rather rash to judge an entire generation's gaming performance based upon the specs of a single chip aimed at the professional market... but that's just me.
"Add to that 32 extra texture units, and no word on ROPs."
The render backend configuration for P100 is probably 128/1024 color/depth samplers, judging from the MC and L2 design.
It's definitely made for compute first. Kind of like first-gen Maxwell.
From the dev blog:
"Tesla P100 accelerators have four 4-die HBM2 stacks, for a total of 16 GB of memory, and 720 GB/s peak bandwidth"
So the memory speed would be 1.4 Gbps.
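The 1.4 Gbps figure can be checked from the quoted bandwidth. A quick sketch, assuming each HBM2 stack exposes a 1024-bit interface (that width is not stated in the thread, and the function name is just illustrative):

```python
# Per-pin data rate implied by the quoted peak bandwidth.
# Assumption (not from the thread): each HBM2 stack has a 1024-bit interface.
STACKS = 4                   # from the dev blog quote
BITS_PER_STACK = 1024        # assumed HBM2 interface width per stack
PEAK_BANDWIDTH_GB_S = 720    # GB/s, from the dev blog quote


def per_pin_rate_gbps(bandwidth_gb_s, stacks, bits_per_stack):
    """Return the per-pin data rate in Gbit/s for an aggregate bandwidth in GB/s."""
    bus_width_bits = stacks * bits_per_stack
    return bandwidth_gb_s * 8 / bus_width_bits


rate = per_pin_rate_gbps(PEAK_BANDWIDTH_GB_S, STACKS, BITS_PER_STACK)
print(f"{rate:.2f} Gbps per pin")
```

Under that assumption the result comes out at roughly 1.4 Gbps per pin, which matches the number above.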
"The number of SMs has been doubled in GP100; it has 2x the registers and 1.5x the shared memory per lane"
Great news. ALUs are great for marketing, but big+fast register files and LDS (including fast LDS atomics since Maxwell) are more important for actual compute performance.
"What is interesting is that with a 25% increase in core count they are getting a 74% increase in SP performance..."
Clock speed.
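To put a number on that: a rough sketch, assuming peak SP throughput scales as cores times clock with the same per-core work per clock on both chips (my simplification, not something stated in the thread; the function name is illustrative):

```python
# Numbers from the post above: +25% cores, +74% SP throughput.
# Assumption: peak throughput ~ cores * clock, identical per-core work per clock.
def implied_clock_ratio(core_ratio, perf_ratio):
    """Clock ratio needed to explain perf_ratio given core_ratio."""
    return perf_ratio / core_ratio


ratio = implied_clock_ratio(1.25, 1.74)
print(f"clock must rise by about {100 * (ratio - 1):.0f}%")
```

So under that model the clock has to account for a bit under 40% of the gain, which is why the core count alone looks too small.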
"Shared memory no longer just being a slice of L1? About time for that..."
Huh? This has been the case since Maxwell v1.
Means Pascal might actually allow mixed graphics/compute loads now.
"Great news. ALUs are great for marketing, but big+fast register files and LDS (including fast LDS atomics since Maxwell) are more important for actual compute performance."
Yep, the Kepler design was very inefficient, particularly the shared memory bank organisation, resulting in a record low 32% efficiency. Maxwell improved both the throughput and latency of the shared memory by leaps and bounds on top of the overall SMM re-design.
As game workloads are shifting more to compute shaders, it is good that NVIDIA's focus has also shifted towards compute once again. NVIDIA's graphics frontend has been way ahead of AMD's for a long time (and is still improving), but they haven't managed to beat GCN in compute. Maxwell and Pascal are both great for compute (huge improvements over Kepler).
"Well, last time I heard, Nvidia gets most of its revenue from the embedded and HPC markets."
The numbers are all available in the CFO notes. Tesla sales accounted for ~$100M of revenue last quarter, out of a total of $1.2B. Embedded is something similar. Doesn't come close to GeForce, which is at $800M or $900M or so.