Recent content by Nakai

  1. Nvidia Pascal Announcement

    Nope. As mentioned before, for GP100 NV tried to increase the register count and the size of the caches. This comes in very handy, as Deep Learning networks could fit onto a single GPU. Each TPC consists of a PolyMorph Engine and a number of SMs. G80 had two SMs per TPC. GT200 => 3. GF100 switched to a 1:1...
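
    Just to put rough numbers on the "more registers and cache" point, a small back-of-envelope in Python, using the SM count and per-SM sizes from NVIDIA's public GP100 whitepaper (those figures are my assumption here, not something stated in the post):

    ```python
    # On-chip storage of a full GP100, per the public whitepaper figures
    # (assumed here for illustration only).
    SMS            = 60      # 6 GPCs x 5 TPCs x 2 SMs
    RF_PER_SM_KB   = 256     # register file per SM
    SMEM_PER_SM_KB = 64      # shared memory per SM
    L2_KB          = 4096    # unified L2 cache

    print(f"register files: {SMS * RF_PER_SM_KB / 1024:.1f} MiB")    # ~15 MiB
    print(f"shared memory : {SMS * SMEM_PER_SM_KB / 1024:.2f} MiB")  # ~3.75 MiB
    print(f"L2 cache      : {L2_KB / 1024:.1f} MiB")                 # 4 MiB
    ```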
  2. Nvidia Pascal Announcement

    That's very plausible. So there are not 2x8 FP64 units per scheduler, but 16 FP64? Then this makes some sense.
  3. Nvidia Pascal Announcement

    I hope deep linking on computerbase is allowed. But this is the slide I was referring to. So how does that fit together?
  4. Nvidia Pascal Announcement

    I really don't get how NV did FP64 on Pascal. According to some presentations, Pascal can issue a pair of FP16 ops per clock, an FP32 op per clock and an FP64 op every two clocks. The devblog still states that there are dedicated FP64 units on Pascal. If Pascal has only half the FP64 units, and can...
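
    A tiny worked example of why "half the FP64 units" and "an FP64 op every two clocks" amount to the same statement, assuming the split of 32 FP32 and 16 FP64 lanes per scheduler partition that the GP100 whitepaper shows (that split is my assumption, not something from the post):

    ```python
    # Clocks a scheduler needs to issue one warp-wide instruction per unit type.
    WARP_SIZE  = 32
    FP32_LANES = 32   # assumed FP32 lanes per scheduler partition
    FP64_LANES = 16   # assumed FP64 lanes per scheduler partition

    print(f"FP32 warp op: {WARP_SIZE / FP32_LANES:.0f} clk")  # 1 clk
    print(f"FP64 warp op: {WARP_SIZE / FP64_LANES:.0f} clk")  # 2 clk -> half rate
    ```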
  5. Nvidia Pascal Speculation Thread

    I don't want a 232mm² chip with around 1W/mm². :razz:
  6. Nvidia Pascal Speculation Thread

    Of course, it is possible to run another ANN with a different input size. A CNN usually consists of multiple different kinds of layers. There are so-called Convolutional Layers (CL), Pooling Layers (PL) and, at the end of the network, Fully-Connected Layers (FL). I've made a post regarding the structure...
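
    A minimal sketch of that layer sequence (CL -> PL -> FL), just tracking tensor shapes; the sizes are made-up examples, not taken from the referenced post. It also shows why the fully-connected layer at the end is what ties a network to one fixed input size.

    ```python
    # Shape bookkeeping for a toy CNN: Conv -> Pool -> Conv -> Pool -> FC.
    def conv2d_shape(h, w, c_in, c_out, k, stride=1, pad=0):
        """Output shape of a convolutional layer (CL)."""
        oh = (h + 2 * pad - k) // stride + 1
        ow = (w + 2 * pad - k) // stride + 1
        return oh, ow, c_out

    def pool2d_shape(h, w, c, k=2, stride=2):
        """Output shape of a pooling layer (PL)."""
        return (h - k) // stride + 1, (w - k) // stride + 1, c

    h, w, c = 32, 32, 3                            # example input image
    h, w, c = conv2d_shape(h, w, c, 16, 3, pad=1)  # CL -> 32x32x16
    h, w, c = pool2d_shape(h, w, c)                # PL -> 16x16x16
    h, w, c = conv2d_shape(h, w, c, 32, 3, pad=1)  # CL -> 16x16x32
    h, w, c = pool2d_shape(h, w, c)                # PL -> 8x8x32
    print("FL input size:", h * w * c)             # 2048, fixed by the input resolution
    ```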
  7. Nvidia Pascal Speculation Thread

    Neural Networks are always deterministic. The problem is always the quality of the underlying training algorithm and methods. It is necessary to make sure that your training algorithm and training examples cover unlikely inputs and outputs. The most commonly used training algorithm is the...
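
    A toy illustration of the "deterministic" point: with fixed weights (and a fixed seed while training), the same input always produces the same output. The two-layer net below is made up for the example, not anything from the thread.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)          # fixed seed -> reproducible weights
    W1 = rng.standard_normal((4, 8))
    W2 = rng.standard_normal((8, 1))

    def forward(x):
        """Plain feed-forward pass: matmul -> tanh -> matmul."""
        return np.tanh(x @ W1) @ W2

    x = np.array([[0.5, -1.0, 2.0, 0.1]])
    assert np.array_equal(forward(x), forward(x))  # identical on every call
    print(forward(x))
    ```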
  8. Nvidia Pascal Speculation Thread

    There are plenty of deep learning and machine learning algorithms for different use cases. Popular approaches are SLAM, HOG and CNNs. SLAM: Location and orientation in 3D space. HOG: Feature detection, i.e. whether a feature is present or not. CNN: Recognition and differentiation of multiple features...
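
    For the HOG entry, a very reduced sketch of the core idea (an orientation histogram over one cell of gradients, weighted by gradient magnitude). Real HOG adds block normalisation and overlapping blocks; this is only an illustration, not the reference algorithm.

    ```python
    import numpy as np

    def cell_histogram(patch, bins=9):
        """Orientation histogram for one cell, weighted by gradient magnitude."""
        gy, gx = np.gradient(patch.astype(float))
        mag = np.hypot(gx, gy)
        ang = np.rad2deg(np.arctan2(gy, gx)) % 180            # unsigned orientation
        hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
        return hist

    patch = np.arange(64).reshape(8, 8)                       # dummy 8x8 cell
    print(cell_histogram(patch))
    ```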
  9. AMD Boltzmann Initiative: C++ Compiler w/CUDA Interop For AMD GPUs

    I am very curious about this matter. AMD could break open the GPU HPC market with this. A sober point of view is always appreciated, especially when it is Nvidia we are talking about. Their market presence is very pervasive, meaning NV will definitely try to counter this with multiple strategies...
  10. AMD Boltzmann Initiative: C++ Compiler w/CUDA Interop For AMD GPUs

    Many people think that AMD hijacked CUDA with that. In my opinion this is an exaggeration. It is not a sign that AMD is trying to overcome CUDA, although that is a nice side effect. It is more a sign that dedicated compute APIs like CUDA and OpenCL will come to an end... ...instead of focusing...
  11. Nvidia Pascal Speculation Thread

    Of course the ops need to be the same, since these are SIMD units. And of course these kinds of splits should be implemented via VLIW-style ops. Does Maxwell have an array of dedicated FP64 units, or are these "enhanced" FP32 arrays with the possibility to execute FP64 ops (with lower...
  12. Nvidia Pascal Speculation Thread

    Let's wait... a good night's sleep always helps. With NV it's always like this: if they don't do any marketing about their next-gen DP performance, their next gen will simply lack DP performance. I think they will feature enough DP performance to be competitive and to provide a...
  13. Nvidia Pascal Speculation Thread

    Multiple-Precision FPU
    The problem with Kepler was that it had limited register bandwidth. The sustained throughput for FP32 ops is 128 FMAs/clk per SM, although one Kepler SM could achieve 192 FMAs/clk at most. https://forum.beyond3d.com/posts/1644206/ The register bandwidth was doubled from Fermi...
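
    A back-of-envelope on the 128 vs. 192 FMAs/clk point: an FMA (d = a*b + c) reads three register operands per lane, so if the register file can only source operands for 128 lanes per clock, the remaining ALUs can only be fed via operand reuse. The per-clock operand budget below is an assumption for illustration, not a documented figure.

    ```python
    ALUS             = 192        # FP32 ALUs per Kepler SMX
    OPERANDS_PER_FMA = 3          # a, b, c
    OPERAND_BUDGET   = 128 * 3    # assumed register-file reads per clock

    sustainable = OPERAND_BUDGET // OPERANDS_PER_FMA
    print(f"sustainable FMAs/clk: {sustainable} of {ALUS} "
          f"({sustainable / ALUS:.0%} of peak)")
    ```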
  14. Nvidia Pascal Speculation Thread

    Well, Nvidia already had FP32:FP64 = 2:1 with Fermi. AMD had it with Hawaii. NV had 3:1 with GK100/110. NV has FP16:FP32 = 2:1 with GM20B (the Tegra X1 GPU). The most interesting fact is that Fermi didn't have dedicated FP64 units. For instance, if GP110 has 6144 SPs and only an FP32:FP64 ratio of 4:1, it...
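
    Rough numbers for the hypothetical GP110 mentioned above (6144 FP32 SPs) at a few FP32:FP64 ratios. The 1.0 GHz clock is purely my assumption to turn unit counts into FLOPS (an FMA counts as 2 FLOPs).

    ```python
    SPS       = 6144
    CLOCK_GHZ = 1.0   # assumed clock

    for ratio in (2, 3, 4):
        fp64_units = SPS // ratio
        dp_tflops  = 2 * fp64_units * CLOCK_GHZ / 1000
        print(f"FP32:FP64 = {ratio}:1 -> {fp64_units} FP64 units, "
              f"{dp_tflops:.2f} TFLOPS DP @ {CLOCK_GHZ} GHz")
    ```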
  15. Nvidia Pascal Speculation Thread

    The chance is pretty high, as 16FF and HBM2 are new technologies for NV. If GP100 features over 17 billion transistors, the density (transistors per mm²) will be higher than expected. Because HBM2 allows for smaller PHYs (and NV's MI PHYs were always big), the overall density could be higher than...
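
    To make the density argument concrete: if GP100 really carried 17+ billion transistors, here is what transistors/mm² would look like for a few assumed die sizes, next to GM200 (8.0 billion transistors on roughly 601 mm² at 28 nm). The candidate die sizes are assumptions for illustration only.

    ```python
    GM200_DENSITY = 8.0e9 / 601            # ~13.3 MTr/mm² on 28 nm

    for die_mm2 in (500, 550, 600):        # assumed 16FF die sizes
        density = 17e9 / die_mm2
        print(f"{die_mm2} mm²: {density / 1e6:5.1f} MTr/mm² "
              f"({density / GM200_DENSITY:.1f}x GM200)")
    ```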