AMD: Speculation, Rumors, and Discussion (Archive)

If they hit 2x perf/watt of Fiji, they will be fine. They don't really need a revolutionary architecture, just GCN 1.4 or something with more power-saving tools and maybe more cache on 14/16nm.

If they could somehow hit 2x perf/watt of the Nano, then it would be pretty insane.
 
Hmm, no, architecturally something has to change. NV has the same advantage as AMD going to 16/14 nm, so you can't just base performance per watt on the node change.

Right, and they'd need double-rate FP16 to be competitive in deep learning applications and future DX12 games designed with FP16, plus support for Feature Level 12.1.

Beyond this, power-efficiency, power-efficiency, and then some more power-efficiency.
 
If they hit 2x perf/watt of Fiji, they will be fine. They don't really need a revolutionary architecture, just GCN 1.4 or something with more power-saving tools and maybe more cache on 14/16nm.
I don't think the Nano is the result of binning. Much more likely that they've done a serious voltage reduction and thus clock reduction as well. That's trading off perf/mm2 for perf/W. If they want to be at par again, they need both.
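A rough back-of-envelope on why that trade works (my own illustrative numbers, assuming dynamic power scales roughly with f·V² and ignoring leakage): drop voltage by ~10% and clocks by ~10% and dynamic power falls to about 0.9 × 0.9² ≈ 0.73 of the original, while throughput only drops ~10%, so perf/W improves by roughly 0.9 / 0.73 ≈ 1.23x. The die is the same size, though, which is exactly the perf/mm² hit.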
 
Right, and they'd need double-rate FP16 to be competitive in deep learning applications and future DX12 games designed with FP16, plus support for Feature Level 12.1.
At the moment NVIDIA are trailing AMD with high performance FP16 support on consumer or professional PC graphics...
 
At the moment NVIDIA are trailing AMD with high performance FP16 support on consumer or professional PC graphics...

As far as I know (and please correct me if I'm wrong), Tonga and Fiji can store FP16 values in 16-bit registers, but they can't compute twice as many operations on FP16 operands as they can on FP32 ones, i.e. peak throughput is the same in both cases, although the lower register pressure with FP16 can lead to higher effective performance.

Meanwhile, Tegra X1 has double-rate FP16 support (wherever that chip may be) and more importantly, NVIDIA will bring that to Pascal. So it seems to me that AMD ought to have it too.
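To put a rough number on the register-pressure point (GCN-style figures from memory, purely illustrative): with 256 VGPRs available per SIMD, a shader that needs 120 registers per thread fits only 2 waves in flight, while one that packs enough values as FP16 pairs to get down to 80 registers fits 3, so there's more latency hiding even though the peak ALU rate is unchanged.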
 
Half-precision (FP16) and lower-precision (FP10) support is completely USELESS for PC gaming; it is meant to be used on mobile to reduce power and improve battery life. DirectX graphics also do NOT require true FP16/FP10 support, i.e. an FP32 computation can simply be truncated to produce an FP16 value. FP16/FP10 can be useful on PC for development purposes only. Current PC GPUs with FP16 support should just do a simple FP32 computation and then truncate the mantissa.
 
This is probably wrong. Even if the ALU throughput is the same, you still have less register pressure and therefore a higher compute unit occupancy.

But one could also imagine higher throughput with FP16 in some future GPUs, even on the desktop. Some circuitry scales non-linearly with operand bit width, so FP16 units take even less than half the space of an FP32 unit.
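A quick sketch of that non-linear scaling (back-of-envelope, ignoring the exponent logic, rounding and the adder): the multiplier array grows roughly with the square of the significand width, and FP16 has an 11-bit significand (10 stored bits plus the hidden bit) versus 24 bits for FP32, so the multiplier alone lands around (11/24)² ≈ 0.21 of the FP32 area, i.e. well under half.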

I don't want to go into details, but for the products that I worked on, we would have been happy to invest some time for FP16, if that meant less register usage. And even more so if it would have meant higher throughput. That would have been a really low hanging fruit compared to some other optimizations.
 
Half the space with half the precision. Just because you do not usually consider all 23 bits of the FP32 mantissa, it doesn't mean those bits are useless, since they help produce more precise results in computations. With true FP16 hardware support, a noticeable precision error can occur more frequently.
Personally, I would be more interested to see which hardware still does not support FP64 on D3D12, since with the last driver I installed even my Surface Pro 3 now supports double-precision floats... Good FP64 support on IGPs could be ideal for GPGPU physics acceleration...
It would also require a complete GPU redesign from scratch. Another interesting format to support is integer; some GPUs still use the FP32 and FP64 units to handle 24-bit and 32-bit integers...
 
How would true FP16 support impact the rest of the GPU design? I mean, how would FP16 hardware support impact FP32, FP64 and integer performance? Also, current consumer GPUs share something like 99% (maybe a little less?) of the workstation and server GPU design, and I don't know how useful FP16 would be on those GPUs.
Finally, I still dream of 10-bit-per-colour-channel screens on consumer monitors (i.e. 10-bit-per-channel textures, i.e. no more r8g8b8a8 on render targets..), but that's still a dream.. ): Too many panels are still 6-bit + dithering instead of "true" 8-bit, and widespread 10-bit panels are still far away.
 
How would true FP16 support impact the rest of the GPU design? I mean, how would FP16 hardware support impact FP32, FP64 and integer performance? Also, current consumer GPUs share something like 99% (maybe a little less?) of the workstation and server GPU design, and I don't know how useful FP16 would be on those GPUs.
At least the way nvidia seems to implement it, there's very little impact on the rest of the GPU design. All the registers used are still 32-bit, they just hold 2x fp16 values, so outside the ALUs (which process these 2 values simd-style) there are no changes needed at all. I don't know how useful fp16 would be in other markets, but presumably it doesn't cost all that much area, so it shouldn't be a big deal.
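A minimal CUDA sketch of that packed layout, just to make it concrete (assuming the half2 intrinsics from cuda_fp16.h on an sm_53+ part like Tegra X1; whether Pascal exposes exactly the same thing is still speculation):

Code:
#include <cuda_fp16.h>

// Each 32-bit register holds a pair of FP16 values; a single __hmul2
// multiplies both halves SIMD-style, so the register file, scheduler
// and load/store paths can stay plain 32-bit.
__global__ void scale_fp16x2(const __half2 *in, __half2 *out, float scale, int n2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {                               // n2 = number of half2 pairs
        __half2 s = __float2half2_rn(scale);    // broadcast the scale into both 16-bit lanes
        out[i] = __hmul2(in[i], s);             // two FP16 multiplies in one instruction
    }
}

The only real caveat is that the data has to be laid out as half2 pairs up front; the launch itself looks like any other kernel.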
 
At least the way nvidia seems to implement it, there's very little impact on the rest of the GPU design. All the registers used are still 32-bit, they just hold 2x fp16 values, so outside the ALUs (which process these 2 values simd-style) there are no changes needed at all. I don't know how useful fp16 would be in other markets, but presumably it doesn't cost all that much area, so it shouldn't be a big deal.
So, the SIMD vector units will still be 32-bit, used as 2x16 instead of "true" 16-bit hardware/register units? Why not apply this design to FP64 too, i.e. 64-bit registers used to hold 1x64-bit, 2x32-bit or 4x16-bit values? (probably because then they would no longer sell Tesla GPUs? DX )
 
So, the SIMD vector units will still be 32-bit, used as 2x16 instead of "true" 16-bit hardware/register units? Why not apply this design to FP64 too, i.e. 64-bit registers used to hold 1x64-bit, 2x32-bit or 4x16-bit values? (probably because then they would no longer sell Tesla GPUs? DX )
Because 32-bit is what you mostly want; you don't want to design your architecture around FP64, which you'll hardly ever need (especially for consumer gpus). And there's a lot of value in having a scalar design at that level - you just take the inefficiencies this "mild" 2x simd gets you for fp16, precisely because it's still the same 32-bit-reg-based architecture. That should get you some/most of the benefits a "pure" fp16 scalar design would, without having to invest heavily in more complex register fetch/store, twice the instruction issue throughput, etc., which would be totally wasted when not using fp16 (note though I don't actually know if Pascal implements it the same way, but I'd assume so).
 
So, the SIMD vector units will still be 32-bit, used as 2x16 instead of "true" 16-bit hardware/register units? Why not apply this design to FP64 too, i.e. 64-bit registers used to hold 1x64-bit, 2x32-bit or 4x16-bit values? (probably because then they would no longer sell Tesla GPUs? DX )

As far as I understand, this is pretty much how Hawaii works, which is fine for a dual-purpose GPU (FirePro/Radeon) but wasteful for a purely gaming-oriented design such as Fiji.
 
Why not apply this design to FP64 too, i.e. 64-bit registers used to hold 1x64-bit, 2x32-bit or 4x16-bit values? (probably because then they would no longer sell Tesla GPUs? DX )
Probably because you lose quite a bit of flexibility in terms of register usage: you can only achieve 2 ops in parallel if both halves of the double-wide register are used. If not, half of the data that you fetched goes to waste. Your compiler would need to be much better at scheduling things exactly right.
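A tiny CUDA-flavoured illustration of that constraint (my own example, nothing vendor-specific): the doubled rate only shows up when values are actually paired up in one 32-bit register; a lone FP16 value leaves the other lane idle.

Code:
#include <cuda_fp16.h>

// Paired: one __hfma2 issues two FP16 fused multiply-adds (sm_53+),
// so both 16-bit lanes of the 32-bit register do useful work.
__device__ __half2 axpy_pair(__half2 a, __half2 x, __half2 y)
{
    return __hfma2(a, x, y);
}

// Unpaired: still consumes a full instruction slot and a full 32-bit
// register, but only one FP16 FMA's worth of work gets done.
__device__ __half axpy_lone(__half a, __half x, __half y)
{
    return __hfma(a, x, y);
}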
 
I'm not sure if that's also what your post implied, silent_guy, but I don't think current architectures can fetch an additional half-register (16 bits) to fully populate a 32-bit register that's already half used with 16-bit data. So you'd have to re-fetch everything, which would negate the power-saving effect of 16-bit usage. Without any hard data, I can imagine that being able to do that (fetching and populating 16-bit portions of registers independently) would make your data paths and control quite a bit more complex.
 