AMD: Volcanic Islands R1100/1200 (8/9 series) Speculation/ Rumour Thread

AnarchX · Sep 2, 2014

Where is HDMI 2.0? You can buy some cheap Samsung 4K TV with it, but no GPU to feed.

Rurouni · Sep 2, 2014

Dave Baumann said:
From a consumer standpoint: higher performance, TrueAudio, better encode/decode, XDMA, latest IP, FreeSync capabilities, etc.

Since I have a Kaveri rig, can TrueAudio be used if I don't use the IGP (that is, while using a dGPU without TrueAudio)? I also know there is a demo of some sort that uses the IGP for compute and the dGPU for graphics. Realistically, when will games start to use it like that? Do I need to wait for Mantle to blossom, or for DX12? I'm thinking of NVIDIA-style PhysX, where you can assign a GPU to PhysX (if you have two or more NVIDIA GPUs). Otherwise, when I do buy a dGPU, the IGP would be wasted.
And this bandwidth-saving compression should have been in Kaveri in the first place. It's definitely bottlenecked by RAM speed.

Anyway, I'm hoping that AMD introduces a non-TrueAudio variant of the dGPU that could hopefully be cheaper, because otherwise the TrueAudio on the APU is wasted.

mczak · Sep 2, 2014

Rurouni said:
And this bandwidth saving compression should be in Kaveri in the first place. It's definitely bottlenecked by RAM speed.

That much is true, but Kaveri is an older GCN generation. Carrizo should have it, though (and in fact I'm pretty sure just about all the performance improvement there on the graphics side will come from higher bandwidth efficiency). It could also mean that, unlike with Kaveri (at least when not equipped with DDR3-2400), you'd see some actual perf difference again between the 8 CU and 6 CU parts.

Kaotik · Sep 2, 2014

Rurouni said:
Since I have a Kaveri rig, can TrueAudio be used if I don't use the IGP (as in using a dGPU without TrueAudio)?

I'm not gonna put my head on this one, but IIRC someone at MuroBBS tested this and he could enable TrueAudio with Kaveri while using discrete GPU for graphics

CarstenS · Sep 2, 2014

Kaotik said:
I'm not gonna put my head on this one, but IIRC someone at MuroBBS tested this and he could enable TrueAudio with Kaveri while using discrete GPU for graphics

That would line up with what was promised at the Kaveri Launch.

edit:
WRT tessellation performance: AMD can influence it heavily by tweaking parameters in their drivers. Between Cat 13.5b2 and now, tessellation performance of the R9 280 (then known as HD 7950 Boost) is down by 50% at exactly the tessellation factor where Tonga excels the most (10). Just look at the curves with different drivers for the R9 280 compared to the R9 285 and R9 290X with the 285 launch driver.
http://www.pcgameshardware.de/Grafi...80/Tests/AMD-Radeon-R9-285-Test-1134146/4/#a4
Apart from that, Tonga seems like a nice refinement, but it would have made a better impact were it not for the ridiculously low-priced R9 280 that's still on the market.

My personal guess is that AMD hoped to clear the channel, but then felt it had to make its move before the impending release of Nvidia's 2nd Maxwell chip.

trinibwoy · Sep 2, 2014

silent_guy said:
Encode/decode: is this still a thing? I've long cancelled my DVD Netflix subscription in favor of streaming. Maybe still important in non-USA regions, but even there, aren't we in the territory of 32x vs. 40x speed CDROM drives?

Don't Netflix, Hulu et al benefit from hardware decode? H.264 has probably been optimized up the wazoo by now though. No points for anything less than H.265 acceleration.

silent_guy · Sep 2, 2014

trinibwoy said:
Don't Netflix, Hulu et al benefit from hardware decode

I assume so, but for real-time playback, 1x decode speed should be enough.

pjbliverpool · Sep 2, 2014

mczak said:
That much is true, but well Kaveri is older GCN gen. Carrizo should have it though (and in fact I'm pretty sure just about all the performance improvement there on the graphics side will be due to higher bandwidth efficiency). Though it could mean that unlike Kaveri (at least if not equipped with ddr3-2400) you'd see some actual perf difference again between the 8 CU and 6 CU parts.

Unfortunately I don't think we'll see it with Carrizo, but a 12 CU APU based on this IP plus quad-channel DDR4 support would be extremely competitive with the Xbox One performance-wise.

I'd assume that's well within AMD's technical capabilities.

Malo · Sep 2, 2014

Dave Baumann said:
Would you expect anyone to? 7870 (or equivalent <$300 card) is a different thought process though,

I have a 7870 and the only thing that would make me upgrade from that list would be 3 or 4 GB of VRAM for my Eyefinity setup... oh wait, never mind.

CarstenS · Sep 2, 2014

Damien's piece is online:
http://www.hardware.fr/articles/926-1/amd-radeon-r9-285-tonga-sapphire-dual-x-oc-test.html
Interestingly, single-cycle FP16-blending in the ROPs - or is that just a byproduct of "lossless delta color compression"?

Malo · Sep 2, 2014

Is this anything like 3dfx's 22bit color?

Dave Baumann · Sep 2, 2014

Malo said:
I have a 7870 and the only thing that would make me upgrade from that list would be 3 or 4 GB of VRAM for my Eyefinity setup... oh wait, never mind.

4GB will be available. It's only a change of RAM on the BOM.

mczak · Sep 2, 2014

CarstenS said:
Damien's piece is online:
http://www.hardware.fr/articles/926-1/amd-radeon-r9-285-tonga-sapphire-dual-x-oc-test.html
Interestingly, single-cycle FP16-blending in the ROPs - or is that just a byproduct of "lossless delta color compression"?

I'm pretty sure AMD GPUs could do single-cycle FP16 blending since just about forever (at least since Southern Islands). Bonaire's result is well over half rate already (OK, that's Sea Islands). Pretty interesting choice FWIW, as it's sort of the opposite of what Nvidia does (half-rate fp16 ROP blend, full-rate fp16 TMU filter - though IIRC Nvidia can "combine" channels, so if it's just an R16G16 format they can do full-rate blend too).
So, this ought to be a result of color compression.

Alexko · Sep 2, 2014

Rurouni said:
Since I have a Kaveri rig, can TrueAudio be used if I don't use the IGP (as in using a dGPU without TrueAudio)? And I know there is a demo of some sort that use the IGP for compute and dGPU for graphics. Realistically, when do games start to use it like that? Do I need to wait for Mantle to blossom or DX12 for it to happen? I'm thinking of NVIDIA style physx where you could assign a gpu for physx (if you use 2 or more NVIDIA GPU). Otherwise, when I do buy a dGPU, the IGP would be wasted.
And this bandwidth saving compression should be in Kaveri in the first place. It's definitely bottlenecked by RAM speed.

Anyway, I'm hoping that AMD introduce a non TrueAudio variant of dGPU that hopefully could be cheaper because otherwise the TrueAudio on the APU is wasted.

TrueAudio will gradually make it into every AMD GPU and probably every APU as well. It's already in Bonaire (which is fairly cheap!), Tonga and Hawaii. It will almost certainly be in Iceland and Fiji too. I don't expect AMD to enable that sort of hybrid feature for TrueAudio; it's just not worth it. By the time they got it to work, their entire discrete graphics lineup would support it anyway.

fellix · Sep 2, 2014

mczak said:
I'm pretty sure AMD GPUs could do single-cycle FP16 blending since just about forever (at least since Southern Islands).

Hmm, not really. Tahiti "inherited" the ROPs from Cayman with virtually no upgrades and kept the half-rate FP16 blending.

Ethatron · Sep 2, 2014

Malo said:
Is this anything like 3dfx's 22bit color?

Certainly not. The slide states it's lossless, which means it must use a variable coding that allows for fully incompressible data in a buffer. It's literally the same case as for depth buffers, where a very long bitfield says yes/no for whether each fixed-size block is compressed (in some specific way) or not. The block size is assumed to be 8x8 based on patents. The largest continuous 2D RT buffer you can use is 8kx8k, so that's 1M blocks and therefore 1M bits, or 128k bytes. This is memory which is guaranteed to be available just for this purpose; the same goes for the depth buffer.
The uncompressed blocks are in RAM; the compressed ones are often assumed to be on the chip, either in dedicated storage or, nowadays, possibly in an allocated block of common cache. The access pattern of ROPs is very predictable, so streaming compressed tiles between cache and memory shouldn't be a big deal.
Compressed blocks then have a structure similar to lossy block compression (formerly DXT, with smaller 4x4 blocks), just that no loss happens in the encoding. The requirements for the codings of both compression technologies are identical: low power, fixed rate, in-place decompression. The fun comes from finding efficient codings.
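
To make the "one flag per block, fall back to raw when it doesn't fit" idea concrete, here is a toy sketch. It is not AMD's scheme: the block layout (64 single-channel 8-bit values), the 4-bit delta width and the function names are all assumptions picked for illustration. The only points it shows are that the compressed form has a fixed size and that the uncompressed fallback keeps the whole thing lossless.

```python
# Toy illustration only, NOT the hardware's actual coding. Block size,
# delta width and payload layout are assumptions made for this example.

BLOCK_PIXELS = 64   # assumed 8x8 block of single-channel 8-bit values
DELTA_BITS = 4      # assumed signed delta width of the compressed form


def encode_block(pixels):
    """Return (compressed_flag, payload) for one block of 8-bit values."""
    base = pixels[0]
    deltas = [p - base for p in pixels[1:]]
    lo = -(1 << (DELTA_BITS - 1))
    hi = (1 << (DELTA_BITS - 1)) - 1
    if all(lo <= d <= hi for d in deltas):
        return True, (base, deltas)   # fits: base value plus small deltas
    return False, list(pixels)        # fallback: store raw, still lossless


def decode_block(flag, payload):
    """Invert encode_block exactly."""
    if flag:
        base, deltas = payload
        return [base] + [base + d for d in deltas]
    return list(payload)


def payload_bits(flag):
    """Stored size in bits: fixed for either form, chosen by the flag bit."""
    if flag:
        return 8 + (BLOCK_PIXELS - 1) * DELTA_BITS
    return BLOCK_PIXELS * 8


smooth = [100 + (i % 8) for i in range(BLOCK_PIXELS)]    # gentle gradient
noisy = [(i * 37) % 256 for i in range(BLOCK_PIXELS)]    # high-contrast data

for name, block in (("smooth", smooth), ("noisy", noisy)):
    flag, payload = encode_block(block)
    assert decode_block(flag, payload) == block          # lossless round trip
    print(name, "compressed" if flag else "raw", payload_bits(flag), "bits")
```

Under these assumptions the gradient block shrinks from 512 to 260 bits while the noisy block stays at 512 bits, and the per-block flag is what tells the decoder which form it is looking at.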

mczak · Sep 2, 2014

fellix said:
Hmm, not really. Tahiti "inherited" the ROPs from Cayman with virtually no upgrades and kept the half-rate FP16 blending.

Sure of that? The problem with fp16 blending usually is that it's so heavily bandwidth limited that you can't tell either way from benchmarks (for AMD SI cards, typically even the fp16 non-blend case is already completely bandwidth limited, and that is true even for cards which have massive bandwidth per ROP, like Tahiti). But even R600 had full-rate fp16 blend according to this Rys article - http://www.beyond3d.com/content/reviews/16/10 - granted, it also had full-rate fp16 texture filtering, which was abandoned... I guess if you really wanted to figure it out, you could downclock the core to one fourth or so (while keeping the mem clock the same); then you should start to see a difference...
In any case, I didn't hear anything about Sea Islands having a redesigned ROP blend either, but Bonaire is definitely above half rate.
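
A rough back-of-the-envelope model of that downclocking experiment, as a sketch only. It assumes RGBA FP16 (8 bytes per pixel), that a blend reads and writes the destination once each, and no help from caches or compression. The Tahiti-like figures (32 ROPs, 925 MHz core, 264 GB/s peak) and the function name are assumptions for illustration.

```python
def fp16_blend_fill_rate(rops, core_mhz, bandwidth_gbs, blend_rate):
    """Return (Gpixels/s, limiting factor) for RGBA FP16 blending.

    blend_rate is pixels per ROP per clock (1.0 = full rate, 0.5 = half rate).
    Simplified model: the result is the lower of the ROP and bandwidth limits.
    """
    bytes_per_pixel = 8                       # 4 channels x 16 bits
    traffic_per_pixel = 2 * bytes_per_pixel   # blend reads and writes dest
    rop_limit = rops * core_mhz * 1e6 * blend_rate / 1e9
    bw_limit = bandwidth_gbs / traffic_per_pixel
    if rop_limit <= bw_limit:
        return rop_limit, "ROP"
    return bw_limit, "bandwidth"


# Assumed Tahiti-like figures: 32 ROPs, 925 MHz core, 264 GB/s peak bandwidth.
for core_mhz in (925, 925 / 4):
    for rate in (1.0, 0.5):
        gpix, limit = fp16_blend_fill_rate(32, core_mhz, 264, rate)
        print(f"{core_mhz:7.2f} MHz, rate {rate}: {gpix:5.2f} Gpix/s ({limit})")
```

With these numbers the stock-clock cases come out at 16.5 vs 14.8 Gpix/s, close enough that a card achieving less than peak bandwidth would blur the two together. At a quarter of the core clock both cases become ROP limited (7.4 vs 3.7 Gpix/s), so full rate and half rate should be a clean 2x apart.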

mczak · Sep 2, 2014

Ethatron said:
Certainly not. The slide states it's lossless, which means it must use a variable coding that allows for fully incompressible data in a buffer. It's literally the same case as for depth buffers, where a very long bitfield says yes/no for whether each fixed-size block is compressed (in some specific way) or not. The block size is assumed to be 8x8 based on patents. The largest continuous 2D RT buffer you can use is 8kx8k, so that's 1M blocks and therefore 1M bits, or 128k bytes. This is memory which is guaranteed to be available just for this purpose; the same goes for the depth buffer.

That's not quite an accurate description. There are several compression ratios available (since r3xx I think, for depth, and I doubt color has only one either), so blocks can be compressed by 1:2, 1:4 and so on (not sure exactly which ratios are available, probably more than these two). Hence you need more bits per block to identify the compression scheme: 2 bits would be just enough for two ratios, as you need to encode fast cleared, uncompressed, ratio 1 and ratio 2.
Also, I don't think this buffer is really loaded as a whole nowadays. For color this would be very problematic, as you'd waste _a lot_ of transistors: essentially you should be able to hold that information for 8 color buffers of 16kx16k (which is the max size with d3d11, not 8kx8k), and that is 8MB. That figure assumes 2 bits per block and your 8x8 block size, which I don't think is quite accurate either, since IIRC nowadays this is really done per "memory block", so the number of pixels covered differs depending on the buffer format. Sure, you could say you only support it when there's just one color buffer or some such, meaning you miss it when you need that feature the most... It should be more efficient to just hold that information like other data, though this would increase latency in the (hopefully rare) case where the block information data itself isn't yet in the cache.
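
For reference, the arithmetic behind both the 128k figure in the quote and the 8MB figure here can be checked with a few lines. This is just a sketch of the calculation: the helper names are made up, and the 8x8 block size and bits-per-block values are the assumptions already stated in the posts, not confirmed hardware details.

```python
def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def block_metadata_bytes(width, height, bits_per_block,
                         num_targets=1, block_w=8, block_h=8):
    """Bytes of per-block state needed to cover num_targets render targets."""
    blocks = ceil_div(width, block_w) * ceil_div(height, block_h)
    return ceil_div(blocks * bits_per_block * num_targets, 8)


KIB = 1024
MIB = 1024 * 1024

# One 8k x 8k target, 1 bit per 8x8 block (the figure from the quoted post).
single = block_metadata_bytes(8192, 8192, bits_per_block=1)
print(single // KIB, "KiB")

# Eight 16k x 16k targets, 2 bits per 8x8 block (the figure from this post).
eight = block_metadata_bytes(16384, 16384, bits_per_block=2, num_targets=8)
print(eight // MIB, "MiB")
```

Changing `block_w`, `block_h` or `bits_per_block` shows how quickly the on-chip cost moves if the real block size or number of states differs from these guesses.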

fellix · Sep 3, 2014

mczak said:
Sure of that? The problem with fp16 blending usually is that it's so heavily bandwidth limited that you can't tell either way from benchmarks (for AMD SI cards, typically even the fp16 non-blend case is already completely bandwidth limited, and that is true even for cards which have massive bandwidth per ROP, like Tahiti). But even R600 had full-rate fp16 blend according to this Rys article - http://www.beyond3d.com/content/reviews/16/10 - granted, it also had full-rate fp16 texture filtering, which was abandoned... I guess if you really wanted to figure it out, you could downclock the core to one fourth or so (while keeping the mem clock the same); then you should start to see a difference...
In any case, I didn't hear anything about Sea Islands having a redesigned ROP blend either, but Bonaire is definitely above half rate.

Indeed, R600 (and RV670) were an exception in this regard, as a "native" 16-bit architecture design, but that wasn't scalable and later with RV770 they backtracked on full-rate FP16 blending and filtering to shift resources for more parallelism.

Tridam · Sep 3, 2014

http://www.hardware.fr/articles/926-24/tonga-vs-tahiti.html

I've just added some extra numbers: Tonga - 28 CU @ 918 MHz - 163.9 GiB/s (256-bit @ 1375 MHz) vs. Tahiti - 28 CU @ 918 MHz - 163.9 GiB/s (384-bit @ 917 MHz)

Of course nothing is perfect and 384-bit @ 917 MHz is not exactly the same thing as 256-bit @ 1375 MHz, but it still helps a lot to compare Tonga to Tahiti in a more direct way. Tonga seems to be at its best when a lot of tessellation happens.
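
As a quick sanity check on how those two memory configurations end up with matching bandwidth, here is a small sketch. It assumes the quoted MHz figures are GDDR5 command clocks with four data transfers per pin per clock; the function name is made up for the example.

```python
def gddr5_bandwidth(bus_bits, mem_clock_mhz, transfers_per_clock=4):
    """Peak bandwidth in bytes/s.

    Assumes GDDR5-style signalling: 4 data transfers per pin per command clock.
    """
    return bus_bits / 8 * mem_clock_mhz * 1e6 * transfers_per_clock


GIB = 2 ** 30

tonga = gddr5_bandwidth(256, 1375)    # 256-bit @ 1375 MHz
tahiti = gddr5_bandwidth(384, 917)    # 384-bit @ 917 MHz

for name, bw in (("Tonga", tonga), ("Tahiti", tahiti)):
    print(f"{name}: {bw / 1e9:.1f} GB/s = {bw / GIB:.1f} GiB/s")
```

Under that assumption both come out at roughly 176 GB/s, or about 164 GiB/s, within a small fraction of a percent of each other. The tiny gap is only because 917 MHz is a rounded clock.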
