AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

The Tech Report conclusion is incorrect. Nvidia had a faster culling rate prior to the tiled rasterizer.
If you look at PCGamesHardware, the results are the same as TechReport's: at culled polygon output there is no difference between gp102 and gp104.
 

"AMD did not respond to a request for comment prior to the writing and publication of this article."

And this is the part where things start to get out of hand. This is not a pre-release leak someone broke an NDA on, where a company gets to play coy and do the "we don't comment on unreleased hardware" hand-wave. This is an actual, retail product your customers paid you money for. Once that happens, and they raise issues with what you sold them, you don't get to climb up into your tree fort and maintain radio silence. "Why does this do what it does, and when/whether is it going to do something different?" are perfectly legitimate questions for paying retail card owners to demand answers to; if you can afford high-concept glossy marketing videos about changing the world, you can probably afford to provide some, you know, customer service to your actual customers.
 
Wait - is that the benchmark where Vega FE scores 114-ish at GamersNexus? And downclocked to Fury X level it's at 97-ish, while Fury X is at 70-ish? Don't you think something's fishy here?
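For scale, taking those rough scores at face value (my arithmetic, not GamersNexus'):

$$\frac{97}{70} \approx 1.39, \qquad \frac{114}{70} \approx 1.63$$

i.e. even at Fury X clocks it would be ~39% ahead in that test, and ~63% ahead at stock.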
Hi, first post.
Vega FE in gaming workloads at Fury X clocks seems to perform almost exactly like the Fury X.
In SPECviewperf, GamersNexus showed great improvements in FPS even when downclocked to match the Fury X.

There are rumors going around reddit and elsewhere, based on statements NVIDIA made about HBM2 power consumption, that the HBM2 might actually be consuming large amounts of power. But it seems HBM2 power consumption is under 40 W, based on Vega FE's memory VRM - an OnSemi NTMFD4C86N, which is only going to do about 25-30 amps @ 1.2-1.3 V.
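Back-of-the-envelope from those VRM figures:

$$P = IV: \quad 25\,\mathrm{A} \times 1.2\,\mathrm{V} = 30\,\mathrm{W}, \qquad 30\,\mathrm{A} \times 1.3\,\mathrm{V} = 39\,\mathrm{W}$$

so the memory rail tops out just under 40 W either way.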

I am not a programmer, but I did get *rekt* on reddit after looking at the Linux source code (which obviously doesn't include a lot of raster/render code) and making observations on it, after Rys' comments about the "Fiji" meme.
To me, it seems there are some sort of translation tables(?) within the source code for GFX9, where a lot of functions(?) have been renamed while retaining similar functionality and adding new functionality.
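Purely to illustrate what I mean by a "translation table" - a made-up sketch, not actual amdgpu code; every name and offset below is invented:

// Hypothetical sketch only: a per-generation table mapping a common logical
// register name to a GFX9-specific offset, so carried-over (renamed) entries
// keep their old role while new entries expose new functionality.
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>

struct RegMapping {
    uint32_t offset;     // invented GFX9 register offset
    bool     newInGfx9;  // true if there is no GFX8 equivalent
};

static const std::unordered_map<std::string, RegMapping> kGfx9RegMap = {
    {"GFX_RASTER_CFG",  {0x00D4, false}},  // renamed carry-over from GFX8
    {"GFX_BINNER_CNTL", {0x00F4, true}},   // new-in-GFX9 functionality
};

// Resolve a logical register name to the generation-specific offset.
uint32_t gfx9RegOffset(const std::string& name) {
    return kGfx9RegMap.at(name).offset;
}

int main() {
    std::printf("GFX_BINNER_CNTL @ 0x%04X\n",
                (unsigned)gfx9RegOffset("GFX_BINNER_CNTL"));
}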
 
Hi, first post.
Vega FE in gaming workloads at Fury X clocks seems to perform almost exactly like the Fury X.
In SPECviewperf, GamersNexus showed great improvements in FPS even when downclocked to match the Fury X.
Welcome!
And even that Maya score gets handily beaten by a meager GTX 1080, as the THG benchmark Tottentranz referred to shows. Hence: fishy.
 
Graphics and compute preemption as reported to DirectX is currently the same for Vega and Polaris: primitive/DMA-buffer boundary.
Note that this can change (and has changed) with driver revisions.
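For anyone who wants to check their own box: a minimal sketch of how that value surfaces through DXGI 1.2. It just prints whatever granularity the installed driver reports (the enum values correspond to "primitive boundary", "DMA-buffer boundary", etc.); error handling kept to a minimum.

// Query the preemption granularities each adapter's driver reports.
#include <dxgi1_2.h>
#include <wrl/client.h>
#include <cstdio>
#pragma comment(lib, "dxgi.lib")

using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<IDXGIFactory1> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory)))) return 1;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        ComPtr<IDXGIAdapter2> adapter2;
        if (SUCCEEDED(adapter.As(&adapter2))) {
            DXGI_ADAPTER_DESC2 desc = {};
            adapter2->GetDesc2(&desc);
            // e.g. DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY (1) and
            // DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY (0) for
            // Vega/Polaris per the post above.
            wprintf(L"%s: graphics=%d compute=%d\n", desc.Description,
                    (int)desc.GraphicsPreemptionGranularity,
                    (int)desc.ComputePreemptionGranularity);
        }
        adapter.Reset();
    }
    return 0;
}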
 
Is this where they said it wouldn't be available that week? Because, if so, he was talking about Computex.
lol, I think it might have been. I thought I checked my reference, but apparently I didn't. Here's hoping for RX Vega ordering after SIGGRAPH! I wonder if that means reviewers will be getting their cards before or during SIGGRAPH?
 
Ever since they've had distributed setup, to be exact (with the "PolyMorph Engine", starting with Fermi). (The tiled rasterizer wouldn't help with that in any case.)
FWIW, gp102 is a bit of an anomaly, as it shows no scaling over gp104 in the culled-polygon throughput test (which I think is what TechReport must have been using). The theoretical culled throughput is nominally just 1/3 tri per clock per SM, which suggests it hits another limit on gp102.
I suspect a global primitive distributor at the front of the pipe didn't scale. This likely fetches indices and forms primitives. Also, Nvidia has claimed 1/2 a tri per SM for some parts, so apparently it can be 1/3 or 1/2.
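To put rough numbers on that (taking a full GP104 at 20 SMs against a 28-SM GP102 part, and the nominal 1/3 tri per clock per SM):

$$\frac{20}{3} \approx 6.7\ \mathrm{tri/clk} \quad \mathrm{vs} \quad \frac{28}{3} \approx 9.3\ \mathrm{tri/clk}$$

That's a theoretical 1.4x gap, so if the measured culled rates come out essentially equal, something upstream of the SMs has to be the limiter.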

If you look at PCGamesHardware, the results are the same as TechReport's: at culled polygon output there is no difference between gp102 and gp104.
I wasn't commenting on the performance results. Only the conclusion about the tiled rasterizer being relevant.
 
I suspect a global primitive distributor at the front of the pipe didn't scale. This likely fetches indices and forms primitives. Also, Nvidia has claimed 1/2 a tri per SM for some parts, so apparently it can be 1/3 or 1/2.
Do you remember which ones had 2 cycles per VTF? I was only aware of one VTF every 3 cycles per SM.
 
Do you remember which ones had 2 cycles per VTF? I was only aware of one VTF every 3 cycles per SM.
I remember Kepler being 2-cycle and first-gen Maxwell (750 Ti) being 3-cycle. It seems to me that 2nd-gen Maxwell went back to 2-cycle. Without locking the clocks it's tough to know the clock rate in synthetics, and thus tough to estimate how many operations are performed per clock.
 
I remember Kepler being 2-cycle and first-gen Maxwell (750 Ti) being 3-cycle. It seems to me that 2nd-gen Maxwell went back to 2-cycle. Without locking the clocks it's tough to know the clock rate in synthetics, and thus tough to estimate how many operations are performed per clock.
I always thought Fermi was 1 tri every 4 cycles, Kepler 1 tri every 2, and Maxwell/Pascal 1 tri every 3. That said, I got that from what Damien wrote, and indeed starting with 2nd-gen Maxwell the chips seem to exceed that rate. The theoretical rate wasn't mentioned in the GTX 980 article, but there was no hint it would be different from first-gen Maxwell (OK, the marketing actually says Maxwell 1 is PolyMorph Engine 2.0, same as Kepler, which doesn't make much sense, whereas gm2xx is PolyMorph Engine 3.0, but I wouldn't give those marketing terms much credibility).
In any case, whatever the rate is, the important thing is really the near-perfect scaling with SM count (usually, with gp102 and to some extent gm200 being the exceptions).
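Put generally, the expected culled rate is just

$$T \approx N_{\mathrm{SM}} \times r \times f$$

where r is the per-SM rate (1/2 or 1/3 tri per clock) and f the actual clock. For example, a GTX 980 (16 SMs) at 1/3 tri/clk and an assumed ~1.2 GHz boost would land around 16/3 × 1.2 GHz ≈ 6.4 Gtris/s; the clock is exactly the part that's hard to pin down without locking it, as said above.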
 
I can't; chunks of the framework it uses are licensed, and the license I have doesn't let me do that. I'll start a new thread for it nearer the time.

Roger. I've been trying to find it for a few years, ever since I saw it used on TechReport's site, but could never find it anywhere, so now at least I know why :). Hope to see it in the future! :)
 

The card throttles on both temperature and power. At 1600 MHz it quickly reaches 375 W power consumption and throttles down to a lower clock to reduce power, which probably means it would need even more than that to sustain 1600 MHz for longer periods. Increasing the clocks also gives diminishing returns.

375 W at 1600? Holy...

I wouldn't pay much attention to the power draw. As he said, the voltage wasn't working properly, and because he had to put it at 50% to get the clocks to stick properly, his stock settings are faster and use way less power.

Stock (1440?): 6701 @ 235 W

Manual OC (1400): 6650 @ 346 W

That's 111 W more for a lower clock and a lower score. So clearly his OC settings are causing the massive power draw, not the card's normal operation.
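Running points-per-watt on those two results makes the same point:

$$\frac{6701}{235\,\mathrm{W}} \approx 28.5\ \mathrm{pts/W} \qquad \mathrm{vs} \qquad \frac{6650}{346\,\mathrm{W}} \approx 19.2\ \mathrm{pts/W}$$

The broken voltage control costs roughly a third of the card's efficiency at those settings.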
 