AMD: Speculation, Rumors, and Discussion (Archive)

Status
Not open for further replies.
I am not AT nor affiliated with them, though I know Ryan as well and am in the same kind of business. Let me say this: this year's been extremely busy and hectic with all those "rolling thunder" approaches PR seems to like so much lately. You just cannot focus on a project any more without constantly being interrupted by some "preview-trailer date announcement" or whatever it is.

If you're in charge, like Ryan is, it's probably even more difficult, because he has to organise everything and make lots of trips to editors'/tech days, more informal briefings and of course the big IT fairs.

On top of that, at least for us Euro-guys, AMD and Nvidia have not been very open about their latest architectures, clouding everything in fancy words for what is really just adding another dimension to vertex handling, for example. Or making meaningless comparisons in their press material that gets leaked on some Asian site, and then you have to explain far and wide to your audience why this or that comparison does or does not make sense. It does not help that some people tend to believe everything that's on a slide even if it is against all common sense. And that's true for all IHVs and their respective fanboys.

So don't be mad at Ryan or AT, and do not think he/they have lost focus on their core audience!

Sorry, just had to interject here.

I'm not mad at anyone, nor do I see any reason to believe so in my post.
I'm simply pointing out what I think is not going so well with my favorite hardware website, and I will continue to do so respectfully, as a means of providing audience feedback.

And while we're at it, the criticism is not purely about the number of articles being released, but also about methodology, focus and timing.
The GTX1080 review + GTX1070 review + Pascal overview took 1.5 months between the 1080's release and AT's article. Then the GTX 1060 took two weeks between release and AT's article.
For Polaris, it's been 2.5 months since the RX480 release and 1 month since the RX470/460's release. All we've got is a 4-page performance preview of the RX480. The RX 480 released 3 weeks before the GTX 1060, yet the latter has a month-old full article at AT.


If I was a part of AMD's PR, I wouldn't be very happy with this: having the people in my company work their asses off to release our GPUs before the competition at their price range and make them available to the press on time, only to see the reviews being postponed indefinitely while the competition releases their cards and gets their reviews before ours.
Not only do I not see the point of that from AT's perspective, I also wonder whether, if AMD's PR decides to somehow act on it, they'll be regarded as the bad guys as usual. There's just no possible victory for them, is there?
 
I tried going through the wiki listings to find an exception, but for GCN GPUs I did not find any ROP counts that weren't powers of 2 per engine.

So 4, 8 or 16 then. That's interesting, and might help to explain what appears to be the shortage of ROPs on Polaris 10.

Compared to the "equally flopped" 390X, CU utilisation is down despite the improvements to the geometry engine. And while there's a lot less bandwidth (480 is 67% of 390X BW), the colour compression is supposed to allow up to 1.4X the effective BW. Taken at face value, that should mean up to 93% of the effective transfer rate of the 390X for non-compute workloads.

ROPs, on the other hand, are down to 32 from 64, and the modest clock bump means it still has only 60% of the fillrate. 12 ROPs per shader engine would take the 480 back up to around 90% fill, in what might appear to be a good match for the bandwidth and colour compression, at least going by the more highly utilised and much older 390X.

It starts to look like current GDDR5 and core clocks may have created a situation where neither 8 nor 16 ROPs per shader engine is optimal, and there isn't the flexibility to have 12.
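
For anyone who wants to check the percentages above, here is a rough sketch of the arithmetic. The clocks and raw bandwidth figures are my own assumptions (1266 MHz and 256 GB/s for the 480, 1050 MHz and 384 GB/s for the 390X), not numbers from the posts, and the 1.4X colour compression gain is the "up to" best case taken at face value. Four shader engines is inferred from 32 ROPs at 8 per engine.

```python
# Back-of-envelope ratios, RX 480 relative to 390X.
# ASSUMED specs (not stated in the thread): clocks in MHz, bandwidth in GB/s.
RX480_CLOCK_MHZ = 1266
R390X_CLOCK_MHZ = 1050
RX480_BW_GBS = 256
R390X_BW_GBS = 384
R390X_ROPS = 64
SHADER_ENGINES = 4     # inferred: 32 ROPs at 8 per engine
DCC_BEST_CASE = 1.4    # "up to 1.4X" effective bandwidth, best case only


def fill_ratio(rx480_rops: int) -> float:
    """Peak fillrate of a 480 with `rx480_rops` ROPs, relative to the 390X."""
    return (rx480_rops * RX480_CLOCK_MHZ) / (R390X_ROPS * R390X_CLOCK_MHZ)


raw_bw_ratio = RX480_BW_GBS / R390X_BW_GBS
effective_bw_ratio = raw_bw_ratio * DCC_BEST_CASE

print(f"raw bandwidth vs 390X:       {raw_bw_ratio:.0%}")
print(f"effective bandwidth (w/ DCC): {effective_bw_ratio:.0%}")

# 12 per engine is the hypothetical configuration discussed above.
for rops_per_engine in (8, 12, 16):
    total_rops = SHADER_ENGINES * rops_per_engine
    print(f"{rops_per_engine:2d} ROPs/engine ({total_rops} total): "
          f"{fill_ratio(total_rops):.0%} of 390X fillrate")
```

These are peak numbers only; they say nothing about how often a real workload is actually fill or bandwidth limited.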

There's actually something that doesn't seem right about the RX 480 diagram with regards to that.
Traditionally, there are controllers that each manage two 32-bit channels, which in the Polaris 10 diagram means there are too many DRAM channels drawn.
The Polaris 11 diagram has the relationship between controllers and channels set as expected.

So probably just a diagram error. If they're lazy like me, perhaps they started out by editing something older and failed to catch that.

The coherent vector memory/texture path goes through the L1-L2 hierarchy, which is separate from the incoherent ROP path.

Cheers. I probably should have worked that out ....

I think that is the case. The Tonga picture made it look like the region next to the CUs was arranged in a similar manner to Tahiti.

While I'm sure that AMD ran the numbers thoroughly, my armchair chip peeping makes me wish they'd kept such an arrangement for Polaris 10, and kept the 64 ROPs of the 390X. This would have allowed them to ship a 384-bit 480 and a 256-bit 470.

The 480 can be several % ahead of the 390X at very low resolutions (so clearly there are improvements in there) but falls increasingly behind at higher ones. With a different ROP and memory arrangement, I think it could clearly exceed the 390X and also the 1060, particularly at higher resolutions (where the 390X already does this).
 
The RX 480 released 3 weeks before the GTX 1060, yet the latter has a month-old full article at AT.
AT has to prioritize based on what their readers want to read, given their limited resources.

They wrote up 3 full review articles on Fiji (Fury X, Fury, Fury Nano), but not a single review of GM206, despite the fact that GM206 sold in far greater volume than Fiji.

I think this is because Fiji was a far more interesting story than GM206, given the HBM, water cooling, small FF board.

I wish that AT had more resources to publish more reviews. But the truth is that they can only do so much, especially given the terrible state of online advertising these days.
 
[...]what appears to be the shortage of ROPs on Polaris 10.
One might argue that Polaris 10 should have had 3 shader engines, each with 16 ROPs. 12 CUs per shader engine would have resulted in the same compute. This would have meant less geometry throughput. I imagine it would have been a smaller die.

If AMD was aiming for 32 ROPs, they could have done this with 2 shader engines. But there could only be 32 CUs. That would be a yet smaller die in theory. It would need higher clocks though. And clock speed is current-GCN's enemy.

So that would have even worse geometry throughput. The geometry improvements in Polaris architecture would have been "lost" with only two shader engines and it seems unlikely it would have been "VR Ready". It's unclear to me which is dominant for VR: geometry or rasterisation/ROP.
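
To keep the three layouts straight, a small sketch that just totals them up. The shipped layout (4 engines, 9 CUs and 8 ROPs each) is inferred from the 36 CU / 32 ROP figures in this thread, and counting one geometry front-end per shader engine is my simplifying assumption, not a statement of how the hardware is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    shader_engines: int
    cus_per_engine: int
    rops_per_engine: int

    @property
    def total_cus(self) -> int:
        return self.shader_engines * self.cus_per_engine

    @property
    def total_rops(self) -> int:
        return self.shader_engines * self.rops_per_engine


# Shipped layout is inferred; the other two are the hypotheticals above.
shipped = Layout("Polaris 10 as shipped", 4, 9, 8)
three_engine = Layout("3 engines x 16 ROPs", 3, 12, 16)
two_engine = Layout("2 engines x 16 ROPs", 2, 16, 16)

for layout in (shipped, three_engine, two_engine):
    # Assumption: geometry throughput per clock scales with engine count.
    print(f"{layout.name}: {layout.total_cus} CUs, "
          f"{layout.total_rops} ROPs, "
          f"{layout.shader_engines} geometry front-ends")
```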
 
Brings back happy memories of B3D arguments of ROP shortages on R600, RV770 and friends. RV770 was more competitive than Polaris 10 though.
 
HIS RX 480 IceQ X2 Roaring Turbo: A true king of the jungle?
Now let’s get to the point. I’ve successfully overclocked this card to 1430 MHz. It was stable configuration that allowed me to finish all synthetic tests. However, once we got into gaming tests, it wasn’t so stable anymore, as some tests wouldn’t even start. It got to the point when our ASUS motherboard’s circuit protection would shut down whole PC, as too much power was being drained by the GPU. According to GPU-Z the card was consuming 203W of power, that’s really close to 225W we could get through 8pin connector and PCIe slot…
http://videocardz.com/review/his-radeon-rx-480-iceq-x2-roaring-turbo-8gb-review

HIS RX 480 IceQ X2 Roaring Turbo 8GB Video Card Review
http://www.madshrimps.be/articles/a...0-IceQ-X2-Roaring-Turbo-8GB-Video-Card-Review
 
You have to realize that most of the cooling in my case is passive. The one intake fan I have runs at 500 RPM, and I only have that because I overclock the CPU a bit (2500K at 4.1 to 4.4 GHz depending on core activity). If I didn't overclock and didn't need a GPU, my system would be fully passively cooled. The only component that wouldn't be affected is the PSU, which in this case is located at the bottom of the case and vents its air out of the side of the case. If I had an RV05, I could likely go with a fully passively cooled PSU, as it's oriented such that passive cooling is much more feasible. Unfortunately, the internal storage options for that case are abysmal.

Anyway, having a GPU dumping heat into my system would require 1 or more of the following.
  1. Increased RPM on the intake fan.
  2. Adding an additional intake fan.
  3. Adding a fan to the CPU cooler.
All of which would increase the overall noise profile of the system in exchange for a quieter GPU. It's not worth it. While the blower fan on the GPU contributes a significant amount of noise when gaming, that's still less than using an axial fan and having to add additional cooling to the case to deal with it.

Most people are going to have far noisier systems than mine with an overabundance of fans. I've taken the opposite philosophy in designing my system to be as quiet as possible while still being able to do gaming. An axial fan cooling system dumping the heat into the case is counterproductive to my having a quieter system.

Regards,
SB

But when you look at how loudness physically works, you will learn that many sources of little noise are less disturbing than one loud source. 4 fans at 1.0 sone are quieter than one at 2.5 sone.

For example, say your RX 480 is doing 42 dB, your PSU 32 dB and your case fan 30 dB. The sum is 42.66 dB.

Now we add another 30 dB case fan and a 32 dB CPU fan, and swap the GPU cooler for one at 38 dB. The sum is 40.60 dB.
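
The sums above come from adding the sources on a power basis rather than adding the dB values directly. A minimal sketch, assuming uncorrelated sources measured at the same position (the function name is just for illustration):

```python
import math


def combined_db(levels_db):
    """Combine uncorrelated noise sources given as levels in dB.

    Each level is converted to relative power, the powers are added,
    and the total is converted back to dB.
    """
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)


# Blower GPU at 42 dB, PSU at 32 dB, one case fan at 30 dB.
blower_setup = [42, 32, 30]

# Quieter GPU cooler at 38 dB, PSU, two 30 dB case fans, 32 dB CPU fan.
axial_setup = [38, 32, 30, 30, 32]

print(f"blower setup: {combined_db(blower_setup):.2f} dB")
print(f"axial setup:  {combined_db(axial_setup):.2f} dB")
```

That gives roughly 42.7 dB and 40.6 dB. It deliberately ignores frequency content and how the ear perceives different noise characters, which is where the sone figures would come in.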
 
One might argue that Polaris 10 should have had 3 shader engines, each with 16 ROPs. 12 CUs per shader engine would have resulted in the same compute. This would have meant less geometry throughput. I imagine it would have been a smaller die.

Can GCN support a non-power-of-two number of shader engines (I'm thinking of 3dilettante's observations about ROPs in shipped AMD products never being 12 per shader engine)?

Do we have any idea how big a geometry processor or render backend (4 ROPs) is in terms of die area? CUs you can work out if you can get a die shot, but I'm not so hot at identifying other units.

If AMD was aiming for 32 ROPs, they could have done this with 2 shader engines. But there could only be 32 CUs. That would be a yet smaller die in theory. It would need higher clocks though. And clock speed is current-GCN's enemy.

This makes me think of a "super 460". And that has 16 ROPs, compared to the PS4's 32.

Interestingly, the PS4 GPU loses 10~40 GB/s to the CPU depending on what the CPU's doing, according to Sony developer slides. So if we were to (very) crudely compare PS4 and the 460:
- PS4: 136 ~ 166 GB/s, 32 ROPs, fillrate of 25.6 Gpixels/s. Fill / flops: 25.6 / 1.84 = 13.9 pixels per kiloflop
- 460: 112 GB/s, 16 ROPs, fillrate of 19.2 Gpixels/s. Fill / flops: 19.2 / 2.2 = 8.7 pixels per kiloflop

Three things to note: Sony specifically requested 32 ROPs; devs say that while the PS4 can be BW limited for colour, there are situations where it gives a noticeable boost over the 16 ROP (but higher clocked, greater bandwidth) X1; and while the 460 has less memory bandwidth than the PS4, the CPU on the PS4 takes a good chunk and the 460 has DCC that (supposedly) effectively increases transfers by up to 40%.

In short, I think the 460 could do with more ROPs too, and as with the 480, 12 ROPs per shader engine might be a good fit but might not be possible. The X1 can be held back by its ROPs relative to the PS4, and the 460 has a worse fill / compute ratio than even the X1.
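
For reference, the crude comparison above as a few lines of Python. All inputs are the figures quoted in this post; Gpixels/s divided by TFLOPS works out to pixels per thousand flops, and the 1.4 factor is the "up to 40%" DCC best case, so the upper bandwidth bound for the 460 is optimistic.

```python
# Figures as quoted in the post above; nothing here is measured.
GPUS = {
    "PS4": {"fill_gpix": 25.6, "tflops": 1.84, "bw_gbs": (136.0, 166.0)},
    "460": {"fill_gpix": 19.2, "tflops": 2.2, "bw_gbs": (112.0, 112.0 * 1.4)},
}


def fill_per_kiloflop(fill_gpix: float, tflops: float) -> float:
    """Peak pixels written per 1000 flops (Gpixels/s over TFLOPS)."""
    return fill_gpix / tflops


RATIOS = {
    name: fill_per_kiloflop(gpu["fill_gpix"], gpu["tflops"])
    for name, gpu in GPUS.items()
}

for name, gpu in GPUS.items():
    low, high = gpu["bw_gbs"]
    print(f"{name}: {RATIOS[name]:.1f} pixels per kiloflop, "
          f"{low:.0f} to {high:.0f} GB/s available to the GPU")
```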
 
A new graphics card is normally fast enough to play all the current games well. It's how it'll play them in a couple of years that's going to matter most - to me anyway. But then again, I say this as a tight arse who's still playing on a 680 aka 770. Should've bought a 290 instead ....

AMD's hardware seems to suit the console space better than the desktop market - at least at the moment. Hardware that is still competitive after 2 (or 4, or 6) years of software advances is a positive there.
 
A new graphics card is normally fast enough to play all the current games well. It's how it'll play them in a couple of years that's going to matter most - to me anyway. But then again, I say this as a tight arse who's still playing on a 680 aka 770. Should've bought a 290 instead ....

AMD's hardware seems to suit the console space better than the desktop market - at least at the moment. Hardware that is still competitive after 2 (or 4, or 6) years of software advances is a positive there.


Well, you can't always see that coming. In any case, I upgrade my card every gen, so it doesn't matter to me. I do look at features that might be useful in the future (dev stuff), but for my gaming needs, I just want the fastest right now for the games that will be coming out during the life span of that card.
 
That "Vega 20" info comes just a little too close to Global Foundries announcing their plans for 7 nm for me to be comfortable.

They don't appear to have 14 nm nailed yet, not sure when a 7 nm graphics card would be likely to be realistic.
 
That "Vega 20" info comes just a little too close to Global Foundries announcing their plans for 7 nm for me to be comfortable.

They don't appear to have 14 nm nailed yet, not sure when a 7 nm graphics card would be likely to be realistic.

If Vega 10/11 are 14nm at GF and Polaris is 14nm at GF, it makes sense that the next new chip would be 7nm at GF. Imho this might indicate that AMD is no longer doing a real, fully fledged refresh line-up on an existing process, but has gone with a small chip and a big chip for each process. Considering the limited resources it makes sense, but keeping up with NV, who still seem willing to do full refresh generations on each process, will be a challenge, even more so if GF is lagging behind TSMC or Samsung.
 