Recent content by constant

  1. Nvidia Pascal Announcement

    Looks to me like the R&D spend over the last 3 years averaged around 300 M / quarter, or 1200 M per year. So that would be 3600 M over 3 years of development; it seems fairly reasonable that Pascal could suck up a good 2000-2500 M.
  2. Nvidia Pascal Announcement

    Thanks Steven, those numbers were about 2x above my expectations as well. I guess it goes to show how Nvidia has been dominating the HPC accelerator market. So for the whole of 2016, gaming revenue was a total of 2818 M$ and HPC was 339 M$, so that would constitute about 12.02 % of the gaming...
  3. Nvidia Pascal Announcement

    Well, it seems you misunderstood my point: it's not enough for Intel to dominate the HPC accelerator market with KNL (which hasn't happened yet); selling a few 100 K units per year is nowhere near enough for them to make a profit on it. The processor doesn't have a day job to pay for its existence...
  4. Nvidia Pascal Announcement

    FPGAs have for some time now held a tiny, insignificant niche within HPC. HPC is not where the FPGA bread and butter is; it is in tiny embedded circuits, where one would not even dream of putting a power-sucking Intel x86 CPU... So you see, there is no "day job" (money maker) for x86 CPU +...
  5. Nvidia Pascal Announcement

    I sort of doubt that GP100 throwing on extra FP16/FP64 capabilities constitutes that big of a differentiation on the HW side that the research for HPC features would start weighing down on the company. FP16 is bound to become standard for gaming as well. Besides, on the GP102...
  6. Nvidia Pascal Announcement

    Careful there, GPUs in HPC and GPUs in gaming go hand in hand, i.e. the same architectures have been reused over the years with little HW segmentation. Just like Intel Xeons are able to reuse functionality from the consumer Core i7 components. The point is that both GPUs in HPC (Tesla) and...
  7. Nvidia Pascal Announcement

    Regarding the expense of trying to support multiple precisions for things like Deep Learning + general HPC: in Deep Learning even FP16 is no longer relevant; they're able to do it all with INT8 operations (supported in the next release of cuDNN). The new vec4 INT8 operations are supported by all... (see the INT8 sketch after this list)
  8. Nvidia Pascal Announcement

    Honestly, claiming that FP16 throughput is crippled to 1/64th rate is slightly disingenuous, as in practice you would still be able to get full FP32 throughput WITH the benefit of truncated storage. Just do: load, __half2float conversion, compute as if FP32, __float2half conversion, store (see the FP16 conversion sketch after this list).
  9. NVIDIA Tegra Architecture

    Nvidia seems to be making the most efficient SoC with regard to the GPU (claims of 2x performance/watt in benchmarks), but they need to push harder to get their Denver CPUs out as well. My theory is that while Denver is very efficient with its innovative instruction profiling & caching, it pays a...
  10. NVIDIA Tegra Architecture

    Given that it's able to enter the CC4 power-saving state in ~150 us (compared to tens of ms for similar states on other CPUs), it's very likely that it could do that during a major cache stall. CC4 has the advantage of not flushing the caches or the register files.
  11. NVIDIA Tegra Architecture

    Yes, the cache will very likely be filled with data for processing after the instruction that was just stalled upon. Very clever indeed. I guess you might call this some sort of prefetching?
  12. Is everything on one die a good idea?

    The future is definitely heterogeneous. The large dGPU will live on, but will instead be a heterogeneous system with ARM+GPU and/or x86+GPU. This is basically already a reality with AMD's APUs (although they're not really in dGPU form factor... yet). Nvidia has plans to...
  13. Nvidia Pascal Speculation Thread

    Agreed. I wonder if the motherboard makers can implement multi-GPU NVLink connectivity themselves, perhaps without fast main memory access? Also, it would be especially feasible on dual-GPU systems where the two chips are connected via a PCIe switch today. There NVLink really makes...
  14. Nvidia Pascal Speculation Thread

    They can't differentiate gamer and pro too much, because the professional market alone isn't enough to support the huge R&D costs required. The pro market is only riding piggyback on the gamers, hence only a handful of extra features will be removed for the gamer versions. There will also be a...
  15. NVIDIA Maxwell Speculation Thread

    I also concur that the 2 MB L2 is helping Maxwell a great deal despite lower bandwidth. However, when you get into high-resolution settings in games and serious GPU compute applications, you'll still be bound by the bandwidth to RAM. My question is then why people find the rumours...
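Referenced from post 7 above: a minimal CUDA sketch of what the vec4 INT8 path looks like, assuming the __dp4a intrinsic (compute capability 6.1+, e.g. GP102/GP104). The kernel name, buffer setup, and test values are illustrative placeholders, not anything from cuDNN or the original thread.

```cuda
// Sketch: four-way signed INT8 dot product with 32-bit accumulation via __dp4a.
// Build with: nvcc -arch=sm_61 dp4a_sketch.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dp4a_kernel(const int* a, const int* b, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __dp4a(a[i], b[i], 0);  // four int8 multiply-adds per instruction
}

int main()
{
    const int n = 256;
    int *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(int));
    cudaMallocManaged(&b, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) { a[i] = 0x01010101; b[i] = 0x02020202; }  // four 1s, four 2s per word
    dp4a_kernel<<<(n + 127) / 128, 128>>>(a, b, out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %d\n", out[0]);  // 4 * (1 * 2) = 8
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```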
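And the load/convert/compute/convert/store pattern from post 8, written out as a compilable CUDA sketch. The kernel name and the scale-by-a operation are placeholder choices; the point is only FP16 in memory with FP32 arithmetic via __half2float / __float2half.

```cuda
#include <cuda_fp16.h>

// FP16 storage, FP32 compute: load, convert up, do the math at full FP32
// throughput, convert back down, store.
__global__ void scale_half(const __half* in, __half* out, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = __half2float(in[i]);   // load + half -> float conversion
        float y = a * x;                 // compute as if FP32
        out[i] = __float2half(y);        // float -> half conversion + store
    }
}
```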