AMD Vega Hardware Reviews

Sounds like a fair bit of work.
Also, as someone who has been involved in benchmarking since before the inauguration of SPEC, it seems to me as if these practices make valid comparative benchmarking very difficult.
Optimizations do require a fair amount of work, though the benefit extends beyond current games: by understanding how games work, it's possible to make more educated improvements to future hardware.

I disagree that app-specific optimizations make comparative benchmarking difficult in an absolute sense. It depends on the goal of the benchmark. If the goal is to provide readers of an article with information to guide their purchase decision, it's valid. If the goal is to compare how a product runs a specific piece of code, then optimizations make this more difficult.

I agree. That's why you need two things IMO:
a) a large variety of games and engines to test with.
b) avoid canned benchmarks and built-in benchmark functions as much as you can.

Where a) just makes single data points pretty much invalid, because it is very prone to cherry-picking in one direction or the other, or just plain unintentional bias (does anyone really believe that benchmarks just "leak" and no vendor can do anything about it? If so, why is it mostly AotS and 3DMark, with their auto-upload features, that leak?), b) is worse. I've seen cases for both vendors where built-in benchmarks or purpose-built benchmark versions tell a sometimes very different story than playing the actual game - and not only a single scene in a game, but several. Hell, at some point even using a cracked .exe for a game invalidated some optimizations and perf was lower.
I think benchmarks do leak, and the steps it would take to prevent leaks aren't worth the lost freedom. A lot of cards are distributed around a company, and locking down all of the computers slows development. Maybe there's software I'm unaware of that interrupts and says "you're trying to upload results to Futuremark. Are you sure you want to do this?", but if the solution is an onerous firewall, engineers rebel. Plus, someone could always take a card home unless you have a lot of security.
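For what it's worth, the blunt version of that interrupting software doesn't need anything exotic; a per-executable outbound firewall rule already stops the auto-upload. A minimal sketch, assuming a Windows machine and the standard netsh firewall interface - the install path and rule name below are placeholders, not known defaults:

```python
# Minimal sketch: block all outbound traffic for a benchmark executable via
# the Windows Firewall, so results cannot auto-upload. Needs admin rights.
import subprocess

BENCHMARK_EXE = r"C:\Program Files\3DMark\3DMark.exe"  # assumed path, adjust

def block_outbound(exe_path: str, rule_name: str = "Block benchmark uploads") -> None:
    """Add an outbound-block firewall rule for the given executable."""
    subprocess.run(
        [
            "netsh", "advfirewall", "firewall", "add", "rule",
            f"name={rule_name}",
            "dir=out",
            "action=block",
            f"program={exe_path}",
        ],
        check=True,  # raise if netsh refuses (e.g. not elevated)
    )

if __name__ == "__main__":
    block_outbound(BENCHMARK_EXE)
```

Whether engineers would put up with even that much is another question.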

Some leaks are accidental too, as not all engineers are as savvy about how the benchmarks work as they could be.

Unfortunately it's a necessity for IHVs to optimize for canned benchmarks, even though fewer review sites use them, because OEMs look at things like 3DMark results. My opinion, based on third-hand data, is that OEMs don't seem to put in the same amount of effort with benchmarking that sites like yours do.
 
Optimizations do require a fair amount of work, though the benefit extends beyond current games: by understanding how games work, it's possible to make more educated improvements to future hardware.

I disagree that app-specific optimizations make comparative benchmarking difficult in an absolute sense. It depends on the goal of the benchmark. If the goal is to provide readers of an article with information to guide their purchase decision, it's valid. If the goal is to compare how a product runs a specific piece of code, then optimizations make this more difficult.
I disagree with your disagreement. :)
For the purposes of comparing performance of a piece of code, well that's obviously impossible if the code changes from device to device.
For the purposes of comparing performance on a specific software title, it gets trickier. If the only code I want to run on the card is GTA5 for instance, then the performance on that title is what matters, regardless of whether the code is replaced with something IHV specific. At least as long as the optimizations are global, and not local to the canned benchmark or the typical benchmark setting. (Which Carsten pointed out that he had observed.)
For the purposes of predicting "general performance" the proposition gets more difficult again. Out of my 200 Steam titles, how many got the golden treatment? Seeing an average of commonly benchmarked titles is of little help, because they are guaranteed not to be representative of the general case; I can neither know how many of the benchmarked titles are affected, nor to what degree.
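To put a toy number on that: if the review suite oversamples the tuned titles, its average tells you little about the library as a whole. A quick sketch with entirely made-up figures - the 170/30 split and the 15% uplift are assumptions for illustration only:

```python
# Toy illustration of sampling bias: a review suite that leans on commonly
# optimized titles overstates the library-wide average. All numbers invented.
import random

random.seed(42)

# Assume a 200-title library where only 30 titles got driver optimizations
# worth a 15% uplift; the rest perform at parity (1.00).
library = [1.15] * 30 + [1.00] * 170

# A typical review suite draws mostly from the optimized, much-benchmarked set.
optimized = [p for p in library if p > 1.00]
plain = [p for p in library if p == 1.00]
review_suite = random.sample(optimized, 8) + random.sample(plain, 2)

print(f"library-wide mean: {sum(library) / len(library):.3f}")           # ~1.02
print(f"review-suite mean: {sum(review_suite) / len(review_suite):.3f}")  # 1.12
```

The suite average lands around 1.12 while the whole library sits near 1.02, and nothing in the published number tells you which of the two you are looking at.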

While I take your point that some benefits go beyond the specific app, the damn thing is that I suspect the return on investment is quite good when it comes to tweaking benchmark performance. For example, if nVidia can improve the average benchmark performance of the GP106 to be 5% higher than Polaris as opposed to 5% lower, how does that affect the public perception and price points? Or if AMD can move the average benchmark score of Vega from between the GTX1070 and GTX1080 to between the GTX1080 and GTX1080Ti, how will that affect the number of cards they can sell, and at what price?
Extrapolating from such minor differences in benchmark scores to any other given app is a fool's errand, but that doesn't stop either editors or consumers from doing so. "Firmly trounced", "crushes the opposition" and so on, based on single-digit percentage differences. A cost/benefit analysis for either nVidia or AMD is likely to give a clear answer when it comes to benchmark app optimizations.

Oh well. I just can't help tilting at that particular windmill.

In this case though, my original concern was that this makes it difficult to evaluate the benchmark scores we'll see from Vega next week, even if we limit ourselves to comparing only to AMD's own previous products. The driver team has to produce drivers that give the correct output even for the corner cases, and perform as well as possible doing so, which is demanding enough. Have they additionally gotten around to tweaking all the pertinent titles? To what extent? Impossible to know. But I couldn't help wondering if this was part of the difference in timing between Vega's professional and gaming products. That they just had to get at least some of the most common benchmark apps looked at for competitive purposes.

I appreciate your input.
I'm much more familiar with these issues when it comes to CPU benchmarking.
 
Why do the games say "rest of the world" and then they list Germany, etc?
Originally, IIRC, the monitor bundle was US plus a couple of other countries only; the rest of the world got Prey+Wolfenstein and the CPU/mobo rebate, and Ze Germans got Sniper Elite 4 instead of Wolfenstein. Maybe AMD does not know that there's a "Germany-safe" version without swastikas. *shrugs*

Not sure about Austria and Switzerland though; I thought the content in question, which would make it illegal in Germany, was maybe frowned upon there, but not illegal.

Because Germany. Can't have any games involving Nazis.
Actually, it's about swastikas (Hakenkreuz), not the Nazis themselves, as long as they are the evil guys. You can have swastikas in pieces of art like movies, but video games (no matter if rated R or not) are regulated as toys in Germany, and you cannot have swastikas there. Importing and selling toys with swastikas would be automatically illegal.
 
I think benchmarks do leak, and the steps it would take to prevent leaks aren't worth the lost freedom. A lot of cards are distributed around a company, and locking down all of the computers slows development. Maybe there's software I'm unaware of that interrupts and says "you're trying to upload results to Futuremark. Are you sure you want to do this?", but if the solution is an onerous firewall, engineers rebel. Plus, someone could always take a card home unless you have a lot of security.
I disagree. And I am not pointing at in-house benchmarks done by the IHVs themselves and their engineers. They can be and are being controlled pretty effectively. But being in a position where you happen to know when most editors get their review samples and the respective driver drops, it's very obvious to see a correlation when the leaks start appearing, like the last wave of 3DMark results for example.

There are easy fixes for that. For one, there are only so many reviewers, and they get briefed regularly. I have never heard anyone during those briefings urge reviewers to make sure the auto-submission feature for 3DMark is turned off (there is a switch for it in 3DMark), nor to unplug the LAN cable or disconnect Wi-Fi when running the beloved AotS benchmark. In the case of 3DMark specifically, the IHVs are already heavily sponsoring and influencing Futuremark, and if they wanted, they could easily have them integrate an update where the auto-submission asks for confirmation before sending the results into the database. Or make Futuremark part of the NDA by giving them device IDs that would not show up in the database before the NDA expires. None of this seems like rocket science to me.
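The confirmation step wouldn't be much code either. A minimal sketch of that gate, where submit_results is a hypothetical stand-in - Futuremark's actual upload path isn't public and isn't modelled here:

```python
# Sketch of an auto-submission gate: nothing leaves the machine without an
# explicit, affirmative answer. submit_results() is a hypothetical stand-in.

def submit_results(results: dict) -> None:
    """Hypothetical stand-in for the real result upload."""
    print(f"Uploading {len(results)} result fields...")

def confirm_and_submit(results: dict) -> bool:
    """Ask first; default to keeping the results local."""
    answer = input("Upload these results to the public database? [y/N] ")
    if answer.strip().lower() != "y":
        print("Upload cancelled; results kept local.")
        return False
    submit_results(results)
    return True

if __name__ == "__main__":
    confirm_and_submit({"graphics_score": 21437, "overall_score": 18652})
```

The device-ID embargo would admittedly take more plumbing on Futuremark's side, but a prompt like this alone would stop the accidental leaks.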

Unfortunately it's a necessity for IHVs to optimize for canned benchmarks, even though fewer review sites use them, because OEMs look at things like 3DMark results. My opinion, based on third-hand data, is that OEMs don't seem to put in the same amount of effort with benchmarking that sites like yours do.
Guys making purchase decisions at OEMs really only want an easy, single number by which they can "compare" respective offerings, I agree. Maybe removing canned benchmarks and purpose-built benchmarks like 3DMark would force them to do their job more carefully … But I was not only talking about 3DMark, but rather about the integrated benchmark functions of many games.
 
I disagree with your disagreement. :)
For the purposes of comparing performance of a piece of code, well that's obviously impossible if the code changes from device to device.
For the purposes of comparing performance on a specific software title, it gets trickier. If the only code I want to run on the card is GTA5 for instance, then the performance on that title is what matters, regardless of whether the code is replaced with something IHV specific. At least as long as the optimizations are global, and not local to the canned benchmark or the typical benchmark setting. (Which Carsten pointed out that he had observed.)
For the purposes of predicting "general performance" the proposition gets more difficult again. Out of my 200 Steam titles, how many got the golden treatment? Seeing an average of commonly benchmarked titles is of little help, because they are guaranteed not to be representative of the general case; I can neither know how many of the benchmarked titles are affected, nor to what degree.

While I take your point that some benefits go beyond the specific app, the damn thing is that I suspect the return on investment is quite good when it comes to tweaking benchmark performance. For example, if nVidia can improve the average benchmark performance of the GP106 to be 5% higher than Polaris as opposed to 5% lower, how does that affect the public perception and price points? Or if AMD can move the average benchmark score of Vega from between the GTX1070 and GTX1080 to between the GTX1080 and GTX1080Ti, how will that affect the number of cards they can sell, and at what price?
Extrapolating from such minor differences in benchmark scores to any other given app is a fool's errand, but that doesn't stop either editors or consumers from doing so. "Firmly trounced", "crushes the opposition" and so on, based on single-digit percentage differences. A cost/benefit analysis for either nVidia or AMD is likely to give a clear answer when it comes to benchmark app optimizations.

Oh well. I just can't help tilting at that particular windmill.

In this case though, my original concern was that this makes it difficult to evaluate the benchmark scores we'll see from Vega next week, even if we limit ourselves to comparing only to AMD's own previous products. The driver team has to produce drivers that give the correct output even for the corner cases, and perform as well as possible doing so, which is demanding enough. Have they additionally gotten around to tweaking all the pertinent titles? To what extent? Impossible to know. But I couldn't help wondering if this was part of the difference in timing between Vega's professional and gaming products. That they just had to get at least some of the most common benchmark apps looked at for competitive purposes.

I appreciate your input.
I'm much more familiar with these issues when it comes to CPU benchmarking.
It's always said that Nvidia has more resources than AMD, but I doubt even they can optimize for anywhere close to 200 titles by the launch of a new architecture. Therefore, how many of those 200 titles have been optimized for depends on the time frame over which the titles launched and on how relevant optimizations from previous architectures remain.

If reviewers want to remove the effect of app-specific optimizations, they likely need to consistently bring new titles into their review suite. Some core titles that readers expect to see should remain constant so there are easy comparison points over time. Titles that are used by many review sites are naturally going to be more of a focus for driver optimizations, as are titles that are big sellers.
 
And I am not pointing at in-house benchmarks done by the IHVs themselves and their engineers.
Then I misunderstood the leaks you were referring to. I wasn't thinking about press leaks. I don't like these leaks, as I prefer to wait until launch day to read reviews, and I would like to see them stopped, but I'm not a marketing person, so what do I know. :D
 
So what are the chances that samples in each of those fancy boxes filled with assorted press ego-boosting trinkets are not hand-picked silicon lottery winners?
 
So what are the chances that samples in each of those fancy boxes filled with assorted press ego-boosting trinkets are not hand-picked silicon lottery winners?
Zero? This is a chip that would be fairly competitive with Maxwell. Unfortunately Pascal exists and Volta will soon enough. AMD must do everything it can to minimize the suck factor.
 
So what are the chances that samples in each of those fancy boxes filled with assorted press ego-boosting trinkets are not hand-picked silicon lottery winners?

Buyer beware if you buy based on OC numbers before the store-bought reviews come out.
 
Damn, I was looking to pick up Wolfenstein, not at all interested in Sniper Elite :/

I'm not even aware of any censorship laws in Israel that would cause them to do that, which is what makes it weird.
Could just as easily be an image problem for marketing the bundles, even if not illegal.
 