Nvidia Pascal Announcement

Right, but before, you didn't have an extraordinarily expensive memory configuration in the Tesla series either. That Titan / 1080Ti is clearly still a consumer product, given the GDDR5X memory type. Plus, it likely shares the same shader configuration as the GP104 feature-wise, rather than the improved half precision rate sported by the GP100. It also only provides a 1/32 DP to FP ratio this time.

All in all, it looks as if Nvidia decided to ditch the "professional" label originally assigned to the Titan series, and stepped down to marketing it only as the fastest consumer card, but nothing more.

Guess if AMD really wanted to hurt Nvidia, all they would need to do is give Polaris and Vega the improved half precision rate already, and promote its use aggressively. Because after all, that DOES mean doubled performance in all aspects when you don't need full SP precision. If I'm not mistaken, it would allow them to keep up with up to ~50% stronger hardware just by reducing the effective computational cost.
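To put a rough number on that: a minimal sketch, assuming FP16 math runs at exactly twice the FP32 rate and that the rest of the workload is unaffected (an Amdahl-style estimate, nothing vendor-specific). The function name and the example fractions are made up for illustration.

```python
def fp16_speedup(fp16_fraction: float, fp16_rate: float = 2.0) -> float:
    """Overall speedup when `fp16_fraction` of the FP32 runtime is moved
    to FP16 math running `fp16_rate` times faster (assumed 2x here)."""
    if not 0.0 <= fp16_fraction <= 1.0:
        raise ValueError("fp16_fraction must be between 0 and 1")
    # The untouched part keeps its cost; the FP16 part shrinks by fp16_rate.
    return 1.0 / ((1.0 - fp16_fraction) + fp16_fraction / fp16_rate)


if __name__ == "__main__":
    for fraction in (0.25, 0.5, 2.0 / 3.0, 1.0):
        print(f"{fraction:.0%} of work in FP16 -> {fp16_speedup(fraction):.2f}x")
```

Under these assumptions, matching ~50% stronger hardware needs about two thirds of the shader time to be FP16-friendly, and the full 2x only shows up if everything is.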

Polaris has an improved FP16 over previous GCN?
Ah that is interesting and needed by them as well IMO.

Regarding the 1080ti leak-rumour: by breaking the model, I also mean they have increased the CUDA cores on the Pascal Titan by 10% over the P100, and even the 1080ti according to them is only just behind the P100 in number of cores.
I really cannot see that happening; where is the room for Quadro cards in the tiers now, as that puts it even closer to competing with some aspects of Tesla?
Cheers
 
i29Oj7y.jpg

I give it a 99% fake score, since I saw someone doing rough calculations for GP102 on a forum (maybe even here?) a few days/weeks ago and those were the exact values used, including die size. Someone from Chiphell probably saw it and ran with it.
 
Three GP104 variants is also very curious and pointing towards a fake but you never know
 
I thought the GP104-150 was sort of confirmed; it's just that no one is sure which specific product model it refers to (the speculation is that it will be the 1060, but there is also the recent news about Asus accidentally leaking a laptop with a GP104 die).
Cheers
 
So basically this generation of Pascal consists of a true Pascal GP100, which has the much hyped "Pascal" features like HBM2/NVLINK/FP16/New SM, whilst the "gaming" GPUs like GP102/104 are merely tweaked (Pentium4-ed) Maxwell on a 16nm node.
 
And yet I still don't know who exactly will be selling Pascal at the lower MSRP. It's obvious Nvidia won't with FE. Did Nvidia force their AIB partners to sell at least one SKU at the lower MSRP? If not, I see no sane reason why any partner would sell their custom boards below the FE price point.

No real need to sell them, they can just show it on their website... this would even justify the price of the other models. That said, I have seen photos of a cheap Zotac with a black cooler similar to the "old" reference one, only plastic.

To be honest, I'm not quite sure it is in the interest of AIBs to flood the market with those MSRP GPUs.
 
Regarding that Chiphell chart: The GTX 1070 in the table has the wrong clock speeds. The GTX 1070 actually has a base clock of 1506 MHz and a boost clock of 1683 MHz.

The GTX 1060 specs also seem way too low. Given the variant number GP104-150, I expect specs along the lines of these:
  • 1664-1792 SPs, 1.4-1.5 GHz core clock, 256-bit bus, 7 Gbps memory, or
  • 1920 SPs, 1.3-1.5 GHz core clock, 192-bit bus, 8 Gbps memory [there are rumors of a 192-bit bus].
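For comparison, a quick back-of-the-envelope sketch of what those two guesses would work out to. It assumes the usual rules of thumb (bandwidth = bus width x per-pin data rate / 8, and one FMA = 2 FLOPs per SP per clock) and ignores boost behaviour; the helper names are made up for this example.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8.0


def fp32_tflops(shader_count: int, clock_ghz: float) -> float:
    """Peak FP32 throughput in TFLOPS, assuming 2 FLOPs (one FMA) per SP per clock."""
    return 2.0 * shader_count * clock_ghz / 1000.0


if __name__ == "__main__":
    # The speculated configurations from the list above (not confirmed specs).
    print("256-bit @ 7 Gbps:", memory_bandwidth_gb_s(256, 7.0), "GB/s")
    print("192-bit @ 8 Gbps:", memory_bandwidth_gb_s(192, 8.0), "GB/s")
    print("1664 SPs @ 1.4 GHz:", round(fp32_tflops(1664, 1.4), 2), "TFLOPS")
    print("1792 SPs @ 1.5 GHz:", round(fp32_tflops(1792, 1.5), 2), "TFLOPS")
    print("1920 SPs @ 1.5 GHz:", round(fp32_tflops(1920, 1.5), 2), "TFLOPS")
```

So the 192-bit option would trade some bandwidth (192 vs 224 GB/s) for more shader throughput at the top of its clock range.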
 
So wait, wasn't Nvidia saying they were using top-shelf materials for the FE? And this guy starts by pointing out they cheaped out on a 6-phase VRM/MOSFET to reduce cost :LOL:

It really is an overpriced reference card, isn't it?
The 980 was only 4-phase (2 gaps) and it looks like the caps/FETs are different as well, so NVIDIA can technically say they have improved. Cheeky to add an extra $100 for FE though, when the 980ti had some improvements over the 980 as well (including a very similar blower + vapor chamber) :)
6+2 phase was the 980ti or Titan X for the board.. Cannot remember if it was both or just one *shrug*
Cheers
 
FWIW, the blower fan has (had) the same model number since the original Titan-no-X.
It's a Delta BFB0712HF, rated at 1.8 ampere.

The cooler itself looks like a carbon copy of the design used in the 980Ti/TitanX. It has been ever so slightly altered from what was used in the Titan-no-X.
 
So basically this generation of Pascal consists of a true Pascal GP100, which has the much hyped "Pascal" features like HBM2/NVLINK/FP16/New SM, whilst the "gaming" GPUs like GP102/104 are merely tweaked (Pentium4-ed) Maxwell on a 16nm node.

Eh, don't buy it.

Do normal users need NVLink? HBM thus far has been underwhelming in terms of performance. Sure, some power savings, but performance-wise, not much. Etc.
 
Do you have results with Fiji GPU and GDDR5 somewhere no-one else knows about?
The fact is, we don't know if there would have been a notable performance difference, and thus can't draw any conclusions on HBM performance. Also worth noting is that slapping HBM2 on a low-end GPU, for example, wouldn't give you anything; the chip needs to be able to harness the bandwidth, too, and we don't know if Fiji had the oomph to actually take serious advantage of HBM.
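One way to picture "the chip needs to be able to harness the bandwidth" is a simple roofline estimate. This is only a sketch: the peak figures below are the commonly quoted approximate specs for the Fury X (Fiji) and 290X (Hawaii), the arithmetic intensities are arbitrary illustration values, and the model ignores caches, compression and everything else that matters in practice.

```python
def attainable_tflops(peak_tflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline estimate: throughput is capped by either the ALUs or by
    memory bandwidth times arithmetic intensity, whichever is lower."""
    bandwidth_bound_tflops = bandwidth_gb_s * flops_per_byte / 1000.0
    return min(peak_tflops, bandwidth_bound_tflops)


if __name__ == "__main__":
    # Approximate public peak numbers (assumed here, not measured).
    cards = {
        "Fiji (Fury X)": (8.6, 512.0),
        "Hawaii (290X)": (5.6, 320.0),
    }
    for intensity in (4.0, 16.0, 32.0):  # FLOPs per byte, illustrative only
        for name, (peak, bandwidth) in cards.items():
            result = attainable_tflops(peak, bandwidth, intensity)
            print(f"{intensity:>4} FLOP/B  {name}: {result:.2f} TFLOPS")
```

In this toy model the extra bandwidth only shows up for workloads that are bandwidth-bound to begin with; once a workload is ALU-bound, faster memory does nothing, which is why a real Fiji-with-GDDR5 comparison would be needed to say anything definite.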
 
we don't know if Fiji had the oomph to actually take serious advantage from HBM
Well, we do know that there were applications which could in fact put the additional memory bandwidth to good use. But that's not helping much when the majority of applications is tuned to be as conservative with memory bandwidth as possible. And we won't see a change of that paradigm unless the ratio of memory size to memory bandwidth has shifted towards the latter one on a broad hardware base.
 
About HBM, it's true that on consumer cards we don't have data on how good or bad it is. But if the prices allow it, I'd like to see it on at least mid-range GPUs for the power efficiency.
 
The irony...
Expecting a similar slide from AMD at Polaris launch :)

I'm not sure why AMD would be different. It could possibly be "worse", unless there's a change in the measurement method or performance counters.
The 1080's clocks are more erratic, but generally do not deviate as much relative to the average clock and generally have longer stretches of consistent clocks.

The older 780 vs 290x graph is an interesting PR exercise.
The 290X graph looks much twitchier, but the flip side is that it demonstrated the ability to swing clocks by that much so readily. The way the 780 steps up and down might be spun as the result of a longer-latency DVFS scheme, which I think was actually the case.

The 1080 seems to be more reactive than the 780, but not as much as the 290X.


Well, we do know that there were applications which could in fact put the additional memory bandwidth to good use. But that's not helping much when the majority of applications is tuned to be as conservative with memory bandwidth as possible. And we won't see a change of that paradigm unless the ratio of memory size to memory bandwidth has shifted towards the latter one on a broad hardware base.

It would take more than just making software more wasteful of memory bandwidth. The bandwidth scaling behavior with read, write, and copy at different stride lengths shows that Fiji has difficulty distinguishing itself from Hawaii until after the on-die memory subsystem is thrashed. It's not just the software; the whole architecture strives to amplify external bandwidth and maintain locality with the on-die network, and even with HBM I am skeptical that this situation has reversed.

It probably would have done better if delta color compression hadn't been brought in, but I think that's a win in general that shouldn't take a back seat to a currently niche standard.
 
So basically this generation of Pascal consists of a true Pascal GP100, which has the much hyped "Pascal" features like HBM2/NVLINK/FP16/New SM, whilst the "gaming" GPUs like GP102/104 are merely tweaked (Pentium4-ed) Maxwell on a 16nm node.
I've seen these Pentium4 references at other places. They're a bit puzzling and show a profound lack of perspective: the Intel Prescott CPUs ran at close to 4GHz clock speeds on a 180nm process. Pascal is running at 2GHz on a 16nm process, at a time when 2GHz is commonplace for low power mobile phone SOCs.

If anything, what's strange is that it took so long for GPUs to break this, by today's process standards, relatively pedestrian clock speed.
 
Small typo: Prescott was a 90 nm design. Willamette was the 180 nm P4, but it didn't run at 4 GHz.
 
I saw some controversy on whether Prescott maintained the double-pumped ALUs from the rest of the lineage, although even without it, Northwood would have driven a chunk of the pipeline over 6 GHz.

The GPUs have historically been more than capable of maxing out their power budgets running in the GHz range, and the 1080 is not on the expected W/transistor curve one would get when trying to fully leverage the power improvements of 16nm.
The mobile variants would likely dial clocks back further.

Additionally, I am curious if there is an element of legacy hardware in the fixed-function or command processor sections. The front ends might have been tracking more with where smaller RISC controllers were scaling.
 