AMD: Speculation, Rumors, and Discussion (Archive)

Status
Not open for further replies.
[chart: polygon throughput comparison (b3d-poly.png)]


Still really behind on polygon throughput.

4.7/4.9 vs the 3.9/2.3 of the Fury + Fury X:



http://techreport.com/review/30281/nvidia-geforce-gtx-1080-graphics-card-reviewed/6

Certainly a large improvement from their last gen but severely lagging behind their competitor. I wonder how it'll scale to their larger GPUs and where the 1060/Ti will be. 5.7-6.2 perhaps?
 
Is there a specific slide or reference to real time operation from AMD? I don't recall the context, or mention of a watchdog timer in the command processor.


That kind of runs against Polaris extending into a decent range of real-time industrial work, where controllers can give cycle, microsecond, or nanosecond time frames.
It's only mentioned on a single slide, as a single bullet point among the new GCN4 features. Further inquiry yielded the details. And yes, I'm aware of what I'm comparing Polaris against, and no, I didn't expect any GPU to be able to provide that.

The MEC-related parts (high-priority queue, integrated clock) are just a microcode/firmware thing. Most of it was actually even backported to Fiji. Well, not the improved scheduling capabilities, obviously.
 
Again, not my point... guess I should spell it out: given the performance characteristics MS has claimed, one may reasonably conclude that the power consumption of such a device will be higher than the RX 480's. This is not really conducive to producing a cool, compact, quiet box....

The power characteristics of the RX 480 have little bearing on how much the console version will consume. For example, a GPU similar to Pitcairn/Bonaire was used in the PS4/XBO, yet both consoles feature total system power usage lower than that of the comparable video card alone on PC. One reason is that the console version is often clocked lower. Another is that a console SoC doesn't need as many redundant chips and circuits as a PC does (GPU card + CPU/MB; many resources can be shared when both are on the same chip).

Hence, PS4 Neo isn't likely to consume much more power than the PS4 despite using Polaris. It's clocked lower than an RX 480.

Hence, XBO is likely to use a lower clocked Vega design than any shipping product.

If we assume that the RX 480 is clocked past the knee of the power curve, which it appears it might be, then clocking it lower could potentially lower its power consumption significantly. Look at the Fury X (clocked beyond the knee of the power curve) compared to the R9 Nano (clocked well before the knee): identical ROPs, CUs, etc. The only thing that changed significantly was the clock speed target (more accurately, the power limit target).
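The knee-of-the-curve argument can be sketched numerically. This is a toy model, not measured data: it assumes dynamic power P ≈ C·V²·f, with a flat minimum voltage below the knee and voltage rising linearly with clock past it; every constant here is made up for illustration.

```python
def dynamic_power(freq_ghz, v_knee=0.9, knee_ghz=1.0, dv_per_ghz=0.4, c=1.0):
    """Toy CMOS dynamic-power model: P ~ C * V^2 * f.

    Below the knee the chip runs at a flat minimum voltage, so power
    scales linearly with clock; past the knee, voltage must rise with
    clock, so power grows super-linearly. All constants are
    illustrative, not taken from any real GPU.
    """
    v = v_knee + max(0.0, freq_ghz - knee_ghz) * dv_per_ghz
    return c * v * v * freq_ghz

base = dynamic_power(1.0)    # clocked right at the knee
fast = dynamic_power(1.25)   # pushed 25% past the knee
slow = dynamic_power(0.8)    # backed off 20% below the knee

print(fast / base)  # ~1.54: a 25% clock bump costs ~54% more power
print(slow / base)  # ~0.80: below the knee, power tracks clock linearly
```

Same silicon, very different perf/W depending on which side of the knee the power-limit target lands — which is the Fury X vs Nano situation in miniature.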

What this does mean, however, is that all the people wishfully thinking that Sony is going to significantly increase the shipping clocks for PS4 Neo are going to be disappointed, as doing so would likely greatly increase the device's power consumption.

Heck, even the console based Jaguar cores are lower clocked than any shipping non-console Jaguar parts. ~800 MHz for console parts versus 1.0 - 2.2 GHz for non-console parts.

TL;DR - the reduced clock speed of console-based SoCs goes a long way towards drastically reducing power consumption.

Regards,
SB
 

Those are high settings
In February, at High settings on Ashes the 980 Ti > 390X. But going to Crazy settings drastically reversed the situation in AMD's favor, with the 390X > 980 Ti by 30%.
Once we drop the quality mode to CRAZY we see something seriously interesting, AMD takes a strong lead. We even see a GTX 980 Ti dropping below the 390X which tells me that Nvidia has some work to do with optimizations or enabling a thing or two, DX12 ASYNC Compute perhaps? In regards to how relevant the CRAZY settings really is for gaming right now, that remains topic of discussion and debate of course.
- http://www.guru3d.com/articles-pages/ashes-of-singularity-directx-12-benchmark-ii-review,7.html

It is likely PCGamer used Crazy settings, in which case we could see how 480 > 980 Ti.

Very odd that Crazy benches seemed to be missing from many reviews.
 
Hence, XBO is likely to use a lower clocked Vega design than any shipping product.
That's going to be a big die... and yeah you can save some power with lower clocks, but...

It's clocked lower than an RX 480.
Well Rx480 min clock is apparently 910MHz... so not that much...

I mean one can wear whatever shade of glasses he or she pleases, but it is not going to change reality....
 
C'mon Dave, you can't expect a logical comparison like price to determine its value?
Yeah, odd comparison that. What these figures further don't show is the efficiency of the engines, which is where a lot of work has gone in the various revs of GCN. I've been experimenting with the CAD cases in SPECviewperf and see that in a number of cases we have very linear performance scaling across the number of geometry engines in the chip, and this is with real-world, complex geometry cases.
 
Heck, even the console based Jaguar cores are lower clocked than any shipping non-console Jaguar parts. ~800 MHz for console parts versus 1.0 - 2.2 GHz for non-console parts.

TL: DR - the reduced clockspeed of console based SOCs goes a long way towards drastically reducing power consumption.

Reduced voltage too..but of course you meant that :)

Slight correction on the Jaguar cores: console parts are clocked at 1.6 GHz, but there are 8 of them. The console GPUs run at ~800 MHz vs ~1 GHz for non-console.
 
So, 32 ROPs appears to be quite a tight constraint on performance in "traditionally rendered" games, as it appears to be at least 35% faster (often a lot more) than 380X, rather than 50%+ faster. The primitive discard accelerator may be helping here, but not enough.

On the bright side, for only 40 GP/s of fillrate, the game performance is pretty impressive.
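That 40 GP/s figure follows directly from the ROP count and clock, since peak pixel fillrate is one pixel per ROP per cycle. A quick back-of-the-envelope check, using the RX 480's advertised 1.266 GHz boost clock (theoretical peak, not measured throughput):

```python
def peak_fillrate_gpix_s(rops, clock_ghz):
    # Peak pixel fillrate: one pixel per ROP per clock cycle,
    # so GPix/s = ROP count * clock in GHz.
    return rops * clock_ghz

# RX 480: 32 ROPs at its 1.266 GHz advertised boost clock
print(peak_fillrate_gpix_s(32, 1.266))  # ~40.5 GPix/s, the "40 GP/s" above
```

The same arithmetic shows why 32 ROPs is a ceiling: matching a wider card on fillrate would require a clock bump far past what the process allows.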

Regardless, new games that we're going to see more of are going to use more compute and less brute force bullshit with overdrawn triangles.

Conservative rasterisation and other niceties really do seem to be missing. If this card does achieve substantial market penetration, then that's going to hold back the use of these features.

---

Gibbo at OcUK said that they had sold over 700 cards by 6pm. Weirdly despite having loads of 4GB cards to sell and selling lots of them, OcUK has been told that AMD is discontinuing the 4GB cards :confused:
 
Logical comparison will be GTX 1060.
We don't know its price yet, in fact we know next-to-nothing about it. If they choose to release it at say $259 due to performance difference, then again 480 falls short of price segmentation. If it's smaller, faster and has better power profile then Nvidia has definitely succeeded again and may indeed choose to price it higher because they can.
 
I've been experimenting with the CAD cases in SPECviewperf and see that in a number of cases we have very linear performance scaling across the number of geometry engines in the chip, and this is in realworld, complex geometry cases.
And here we have AMD in a nutshell over the last several years.... failure to capitalize. If they scale very well and there are real-world gains to be had, THEN DOUBLE DOWN ON THEM. Tilted beyond belief.
 
So, 32 ROPs appears to be quite a tight constraint on performance in "traditionally rendered" games, as it appears to be at least 35% faster (often a lot more) than 380X, rather than 50%+ faster. The primitive discard accelerator may be helping here, but not enough.

On the bright side, for only 40 GP/s of fillrate, the game performance is pretty impressive.

Regardless, new games that we're going to see more of are going to use more compute and less brute force bullshit with overdrawn triangles.

Conservative rasterisation and other niceties really do seem to be missing. If this card does achieve substantial market penetration, then that's going to hold back the use of these features.

This is why I'm waiting for the RX 470 review. If it has the same number of ROPs but fewer CUs, the difference between actual game performance and theoretical ALU performance will be interesting.
Gibbo at OcUK said that they had sold over 700 cards by 6pm. Weirdly despite having loads of 4GB cards to sell and selling lots of them, OcUK has been told that AMD is discontinuing the 4GB cards :confused:

Well... you can bet the difference in BoM cost for 4 GB of GDDR5 is nowhere close to $40... so naturally they'd be promoting the 8 GB cards.
 
So, 32 ROPs appears to be quite a tight constraint on performance in "traditionally rendered" games, as it appears to be at least 35% faster (often a lot more) than 380X, rather than 50%+ faster. The primitive discard accelerator may be helping here, but not enough.
Also maybe an L2 bandwidth shortfall? I'm wondering if the architectural diagram is in error regarding the number of MC blocks, since a similar one for Polaris 11 has the expected number.

Raster/ROP throughput, number of L2 slices, and raw bandwidth seem to be the only notable areas where it shouldn't best the 390X.

Regardless, new games that we're going to see more of are going to use more compute and less brute force bullshit with overdrawn triangles.
Hopefully by that time, memory compression isn't solely limited to the color pipeline.

Conservative rasterisation and other niceties really do seem to be missing. If this card does achieve substantial market penetration, then that's going to hold back the use of these features.
That might have meant a driver/benchmark fight where conservative rasterization was pitted against the discard accelerator, assuming a feature tier that doesn't discard degenerate triangles.
 

Gibbo at OcUK said that they had sold over 700 cards by 6pm. Weirdly despite having loads of 4GB cards to sell and selling lots of them, OcUK has been told that AMD is discontinuing the 4GB cards :confused:

Can only find the 8GB version of the card at one of the biggest webshops here locally. No sign of the 4GB whatsoever.
 
Yeah with you on this.
Also KitGuru, hardware.fr, Techspot, and TomsHardware have the 480 below or close to the 980, and not challenging the 980 Ti - strange results from PCGamer.
In all those reviews the 980 is close to but just below a 390, and for some reason the 480 is slower than the 390.
I could also mention HardwareCanucks, but they use PresentMon rather than the internal benchmark, although the positions are still similar.
This is at 1080p rather than 1440p.

Cheers
 