Are actually teraflops the correct metric to measure GPU compute power on consoles?

Cyan

I remember ERP's words on efficiency and the fact that he said Teraflops aren't the only performance metric to measure raw power. I've been discussing this with a fellow forumer today and I drew the same conclusion as the aforementioned developer. :smile:

So after all this time thinking the matter over, I did my research and my procrastinator side, the techie one, came up with this.

I guess that if the rumours are to be believed, the problems in the yields could be caused by the speed of the eSRAM, I mean, the 1,600MHz cache clock of this memory inside the Xbox One. (This post was meant to be in the eSRAM thread, since I began to type these words there, but I decided it would be a better idea for it to have its own thread.)

Now back on the efficiency thing and how Teraflops can be irrelevant in some cases, this is what I've found.

As we already know, Xbox One's GPU is basically a Bonaire 7790 HD with a few tweaks, or rather, Bonaire is Xbox One's GPU with some touches here and there. :smile2:

Bonaire 7790 HD has truly amazing efficiency without crippling compute. xD

Knowing that Bonaire 7790 HD is a major indication of what's inside the Xbox One, after doing some research I found some tests and articles comparing the Bonaire to other GPUs, which are allegedly more powerful on paper. All theory it seems!

For instance, I LOVE how the 7790 HD Bonaire is about as powerful as the AMD 6870, and even more powerful sometimes, but consumes 60% less power. :smile2:

So what do we have here?

Yes, the Xbox One connection. :smile2:

Here is a quote from an article on the AMD 6870.

http://www.pcmag.com/article2/0,2817,2371271,00.asp

The card marshals two teraflops of compute power, a 900-MHz core clock, and 1,120 stream processors. As far as its frame buffer, it's got 1GB of GDDR5 memory, and a 256-bit memory path operating at 4.2 Gbps.
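Those quoted figures hang together arithmetically. As a quick back-of-envelope sketch (my own check, not from the article): peak single-precision FLOP/s is stream processors x 2 ops per clock (multiply-add) x core clock, and peak bandwidth is bus width x effective data rate divided by 8:

```python
# Sanity check of the HD 6870 specs quoted above.

def peak_gflops(stream_processors, core_clock_ghz, ops_per_clock=2):
    """Peak single-precision GFLOP/s: SPs x 2 (multiply-add) x clock."""
    return stream_processors * ops_per_clock * core_clock_ghz

def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) x rate / 8."""
    return bus_width_bits * data_rate_gbps / 8

# HD 6870: 1120 SPs @ 0.9 GHz, 256-bit bus @ 4.2 Gbps
print(round(peak_gflops(1120, 0.9)))       # 2016 GFLOP/s, i.e. ~2 teraflops
print(round(bandwidth_gbs(256, 4.2), 1))   # 134.4 GB/s
```

Both results match the article's "two teraflops" and the 134.4GB/s figure discussed below.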

What does this lead us to conclude?

The conclusion is that a more expensive and theoretically more powerful graphics card -the 6870- doesn't perform quite as well in games as the Bonaire 7790 HD.

But first goes the theory.

Compared to the AMD 6870, the 7790 HD Bonaire consumes about 60% less energy, has half the ROPs -16 vs 32-, features 896 stream processors vs the 6870's 1120, has a reduced 128-bit memory bus vs the 256-bit memory bus of the 6870, and its max bandwidth is 96GB/s vs the 134.4GB/s of the 6870.
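For reference, the 7790's paper numbers follow from the same arithmetic, assuming the card's stock 1.0GHz core clock and 6Gbps effective GDDR5 rate (my assumptions, but they line up with the ~1.79 TFLOPS figure cited later in the thread):

```python
# Back-of-envelope paper specs for the HD 7790 (Bonaire).
# Assumes the stock 1.0 GHz core clock and 6 Gbps effective GDDR5 rate.

sps, clock_ghz = 896, 1.0
bus_bits, rate_gbps = 128, 6.0

peak_gflops = sps * 2 * clock_ghz          # 2 ops/clock (multiply-add)
bandwidth_gbs = bus_bits * rate_gbps / 8   # bits -> bytes

print(peak_gflops)     # 1792.0 -> the ~1.79 TFLOPS often quoted
print(bandwidth_gbs)   # 96.0 GB/s
```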

What's the mystery then? I guess it's all in the efficiency of the new design, and in the 7790 HD most probably being based upon a console chip.

Hence the 7790 HD can run rings round the AMD 6870.

Here you can find some benchmarks that show how it fares in comparison to other GPUs out there:

http://www.neogaf.com/forum/showthread.php?t=527976

You might wonder what the special sauce is, then. Well, being the basis of the Xbox One GPU, the secret is that it combines the greater geometry processing power (2 primitives per clock) of the 7900 -Tahiti- and 7800 -Pitcairn- cores with a reduced 128-bit memory bus.

It was the first time ever someone had tried something like this.

The leaked Xbox Durango development kit backs this up - Durango is apparently able to issue two primitives per clock (like the Tahiti and Pitcairn cores) but only has a 128-bit memory bus. Until the HD 7790 was announced, no such combination existed.

http://www.expertreviews.co.uk/graphics-cards/1298821/amd-launches-radeon-hd-7790-architecture-links-it-to-xbox-720-gpu

This is the theory, but here we have the results, the fruits of AMD's labour on this, and how efficiency and smart console-like design defeat a power-hungry beast. Here is a performance summary:

[Performance summary charts: HD 7790 vs HD 6870]


http://gpuboss.com/gpus/Radeon-HD-7790-vs-Radeon-HD-6870

In real terms this means that paper specs say nothing if we don't back them up with actual results, and that, imho, teraflops aren't the only indication of how capable the hardware is.
 
Are teraflops the correct metric to measure GPU compute power on consoles?

In general they aren't the best measure, but when both consoles being compared use almost identical GCN 1.x GPU building blocks then comparing teraflops can give us a reasonable idea of their relative compute power, provided there are no severe bottlenecks or overheads elsewhere in the system (CPU, memory, virtualisation, etc.).
 
I remember ERP's words on efficiency and the fact that he said Teraflops aren't the only performance metric to measure raw power...
Mod: No need to quote the whole thing for a one line reply.

And what of the PS4 GPU?
 
Radeon HD 6870 is a VLIW4-based architecture (AMD Terascale 2), and Radeon HD 7790 is a scalar vector architecture (AMD Graphics Core Next, GCN). GCN also was the first AMD architecture with proper general purpose cache hierarchy.

FLOP/s:
VLIW5 has around 3.5/5 lane occupancy for a well optimized core (70% efficiency). VLIW4 efficiency is slightly better; let's say it is roughly 3/4 (75%). GCN on the other hand is a scalar engine, and has no efficiency loss because of VLIW execution. HD 6870 is rated at 2016 peak GFLOP/s. Multiply that by the approximated VLIW efficiency, and you get 1512 GFLOP/s. In comparison the 7790 scalar engine peak is 1790 GFLOP/s. This means that the new chip should generally be able to execute more arithmetic instructions per second.
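sebbbi's arithmetic above can be written out explicitly. A sketch using his estimated occupancy figure, which is a rough assumption rather than a measured value:

```python
# Effective ALU throughput = peak GFLOP/s x estimated lane occupancy.
# The 3/4 occupancy figure is a rough estimate from the post above.

hd6870_peak = 2016          # GFLOP/s, paper spec
vliw4_efficiency = 3 / 4    # ~75% lane occupancy estimate
hd7790_peak = 1790          # GFLOP/s; scalar GCN, so no VLIW penalty

hd6870_effective = hd6870_peak * vliw4_efficiency
print(hd6870_effective)     # 1512.0 -> below the 7790's 1790
```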

Memory BW:
134.4 GB/s (6870) vs 96 GB/s (7790). That is a 28.6% deficit from the old model. It's not a huge amount, but still a noticeable difference for two pieces of hardware that have equal gaming performance. AMD improved the cache architecture radically for GCN. L1 caches are now general purpose, and contain compressed texture data (more efficient cache usage). A big (768 KB in Tahiti) general purpose L2 cache was also added. These improvements reduce the bandwidth usage of the GCN architecture. It's hard to estimate how much these improvements help in general, as each game has a different memory access pattern. However I think these two cards should perform pretty close to each other in bandwidth limited cases.
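The 28.6% figure is simply the relative bandwidth gap between the two cards:

```python
# Relative bandwidth deficit of the HD 7790 vs the HD 6870.
hd6870_bw, hd7790_bw = 134.4, 96.0   # GB/s
deficit = (hd6870_bw - hd7790_bw) / hd6870_bw
print(f"{deficit:.1%}")              # 28.6%
```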

Fill rate:
The older HD 6870 card has almost twice the theoretical fill rate. That's a big difference in theory, but in practice most shaders in recent games are not fill bound. Extra fill rate only helps if the shader is fill bound; otherwise it gives exactly zero benefit. Complex shaders are never fill bound; simple shaders are. The most common case that requires lots of fill rate is shadow map rendering (no pixel shader at all, or a very simple linear depth shader). Particle rendering also tends to have simple pixel shaders and lots of overdraw. Huge fill rate mainly helps with these parts of the scene rendering process, and gives little or no advantage for other parts of the rendering. It's a general misconception that a resolution increase just requires more fill rate. If your shader is ALU/TEX bound, increasing the resolution will not make it any more fill bound (as there will be exactly the same amount of extra ALU/TEX work as ROP processing, because every pixel requires a separate pixel shader invocation, and those scale linearly with the pixel count). Increased shadow map resolution on the other hand requires almost pure extra fill rate.
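The point about resolution not shifting the bottleneck can be illustrated with a toy cost model. All the rates and instruction counts below are made-up illustrative numbers; the only point is that both ALU time and ROP time scale linearly with pixel count, so their ratio is resolution-independent:

```python
# Toy model: per-frame ALU time vs ROP (fill) time as resolution scales.
# Rates and per-pixel instruction counts are hypothetical.

def frame_costs(pixels, alu_instr_per_pixel=100, alu_rate=1e12, fill_rate=1e10):
    """Seconds of ALU work and ROP (fill) work for one frame."""
    alu_time = pixels * alu_instr_per_pixel / alu_rate
    rop_time = pixels / fill_rate
    return alu_time, rop_time

for pixels in (1280 * 720, 1920 * 1080):
    alu, rop = frame_costs(pixels)
    print(pixels, round(alu / rop, 6))  # same ALU:ROP ratio at both resolutions
```

Whatever is the bottleneck at 720p is still the bottleneck at 1080p; only something with a different scaling law (like shadow map fill) changes the picture.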

In general you could say that old games tend to be more fill rate (ROP) bound. This is because these games use simpler pixel shaders. Thus 6870 would be better for older games, and 7790 would be better suited for future games.

Another thing that balances the 7790's ROP deficit is full fill rate support for 64 bit per pixel formats (4x16 bit float and 4x16 bit integer). 6000 series ROPs required two cycles to output to 64 bit per pixel formats. Thus in HDR rendering (4x16 bit floats) and in g-buffer rendering the 7790 matches the 6970 in pure fill rate. See tip 6 in this optimization guide regarding packing g-buffer data into 4x16 bit integer formats: http://developer.amd.com/wordpress/media/2013/05/GCNPerformanceTweets.pdf
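As an illustration of that packing trick (a hypothetical sketch, not AMD's actual code): two 8-bit material values can be packed into one 16-bit channel of a 4x16 bit integer target, so a single RGBA16U write carries data that would otherwise need extra render targets:

```python
# Pack two 8-bit g-buffer values into one 16-bit unsigned channel,
# in the spirit of the GCN tip on 4x16 bit integer render targets.

def pack_8_8(hi, lo):
    """Pack two 0..255 values into a single 16-bit value."""
    assert 0 <= hi <= 255 and 0 <= lo <= 255
    return (hi << 8) | lo

def unpack_8_8(packed):
    """Recover the two 8-bit values from a packed 16-bit channel."""
    return (packed >> 8) & 0xFF, packed & 0xFF

channel = pack_8_8(200, 17)
print(channel)              # 51217
print(unpack_8_8(channel))  # (200, 17)
```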
 
Hence the 7790 HD can run rings round the AMD 6870.
And to back that statement up you show a GPUBoss chart that shows that not to be the case (about a 5% difference)?
 
Time will tell if AMD starts to implement some form of RAM into their APUs, like Haswell (I know that's not strictly the case).

Or how close multiplats will be between X1 and PS4.
 
It's pretty "late" here where I live so I feel a bit tired, but I will try to type something to you good people.

In general they aren't the best measure, but when both consoles being compared use almost identical GCN 1.x GPU building blocks then comparing teraflops can give us a reasonable idea of their relative compute power, provided there are no severe bottlenecks or overheads elsewhere in the system (CPU, memory, virtualisation, etc.).
Well, in this case this is probably the best approach -especially after reading sebbbi's post-, although architectural differences, where they exist, seem to play a role.

Those differentiating factors you mention -bottlenecks, overheads, latency differences, etc.- are also important, and I can't wait for next-gen Digital Foundry face-off articles and the like.

And what of the PS4 GPU?
The findings in the VGLeaks article and the articles in the media suggest that the PS4 GPU is some hybrid between the AMD 7850 and AMD 7870, when it comes to specifications.

What puzzles me the most is that Sony know they have the superior specs, yet they didn't reveal them.

To tell you the truth, I think console makers aren't as open about those things as they used to be in the previous generation golden days.

Additionally, I have mostly been an Xbox user, so that's why I focused on the 7790 HD Bonaire, and because the specs don't favour the Xbox One when it comes to flops I thought it was an interesting point to consider.

Radeon HD 6870 is a VLIW4-based architecture (AMD Terascale 2), and Radeon HD 7790 is a scalar vector architecture (AMD Graphics Core Next, GCN). GCN also was the first AMD architecture with proper general purpose cache hierarchy.

FLOP/s:
VLIW5 has around 3.5/5 lane occupancy for a well optimized core (70% efficiency). VLIW4 efficiency is slightly better; let's say it is roughly 3/4 (75%). GCN on the other hand is a scalar engine, and has no efficiency loss because of VLIW execution. HD 6870 is rated at 2016 peak GFLOP/s. Multiply that by the approximated VLIW efficiency, and you get 1512 GFLOP/s. In comparison the 7790 scalar engine peak is 1790 GFLOP/s. This means that the new chip should generally be able to execute more arithmetic instructions per second.

Memory BW:
134.4 GB/s (6870) vs 96 GB/s (7790). That is a 28.6% deficit from the old model. It's not a huge amount, but still a noticeable difference for two pieces of hardware that have equal gaming performance. AMD improved the cache architecture radically for GCN. L1 caches are now general purpose, and contain compressed texture data (more efficient cache usage). A big (768 KB in Tahiti) general purpose L2 cache was also added. These improvements reduce the bandwidth usage of the GCN architecture. It's hard to estimate how much these improvements help in general, as each game has a different memory access pattern. However I think these two cards should perform pretty close to each other in bandwidth limited cases.

Fill rate:
The older HD 6870 card has almost twice the theoretical fill rate. That's a big difference in theory, but in practice most shaders in recent games are not fill bound. Extra fill rate only helps if the shader is fill bound; otherwise it gives exactly zero benefit. Complex shaders are never fill bound; simple shaders are. The most common case that requires lots of fill rate is shadow map rendering (no pixel shader at all, or a very simple linear depth shader). Particle rendering also tends to have simple pixel shaders and lots of overdraw. Huge fill rate mainly helps with these parts of the scene rendering process, and gives little or no advantage for other parts of the rendering. It's a general misconception that a resolution increase just requires more fill rate. If your shader is ALU/TEX bound, increasing the resolution will not make it any more fill bound (as there will be exactly the same amount of extra ALU/TEX work as ROP processing, because every pixel requires a separate pixel shader invocation, and those scale linearly with the pixel count). Increased shadow map resolution on the other hand requires almost pure extra fill rate.

In general you could say that old games tend to be more fill rate (ROP) bound. This is because these games use simpler pixel shaders. Thus 6870 would be better for older games, and 7790 would be better suited for future games.

Another thing that balances the 7790's ROP deficit is full fill rate support for 64 bit per pixel formats (4x16 bit float and 4x16 bit integer). 6000 series ROPs required two cycles to output to 64 bit per pixel formats. Thus in HDR rendering (4x16 bit floats) and in g-buffer rendering the 7790 matches the 6970 in pure fill rate. See tip 6 in this optimization guide regarding packing g-buffer data into 4x16 bit integer formats: http://developer.amd.com/wordpress/media/2013/05/GCNPerformanceTweets.pdf
What can I say sebbbi, I am eternally grateful for having you here. I am so happy you replied to me in such a professional and educational manner, I couldn't ask for more.

Thanks for your openness, for sharing, and for being so awesome. I always learn something new when I read one of your posts, and it keeps my feet on the ground. :D

Console gamers and our discussions are a bit peculiar.

When the inevitable graphics comparisons appear after a game comes out, and we start counting the seconds until the Digital Foundry or Lens of Truth article appears, we are as happy as a lark when they tell us that our favourite console runs the game at 26fps on average, lulz, compared to the 24fps of the rival version :eek:, or that our version can pull off a couple of extra weeds, and the moustache of the main character shows 3 more hairs, all of them in greater detail.

To make a long story short, as usual, infinite THANKS!!!!!!!!


And to back that statement up you show a GPUBoss chart that shows that not to be the case (about a 5% difference)?
Well, yeah, I chose that article because I liked how they showed the differences not only using spec numbers on paper but also transforming the raw numbers into percentages. But I understand what you mean.

Thankfully a fellow forumer provided you with a more detailed link on the actual difference, which is striking tbh.


Time will tell if AMD starts to implement some form of RAM into their APUs, like Haswell (I know that's not strictly the case).

Or how close multiplats will be between X1 and PS4.
It's hard for console specs not to try to "distance" themselves from each other -people usually like to have a capable machine- and we shall see when they come face to face with real-life examples of games.

Those games are out there -just wait a few months-, they are real, and they annoy me because I want to play them already.

It's a bad benchmark. Probably CPU limited. The difference is rather larger on Anandtech benches: http://anandtech.com/bench/Product/780?vs=776
Thanks for the link tunafish, it saved me some valuable minutes, especially now that I can't take any more news because of E3. There is too much news! :oops:

The 6870 is actually VLIW5. This changes your numbers, but not your point.
Yup, sebbbi's point is pretty clear. I am actually surprised by the actual results though. 2 TFLOPS that actually perform like 1.5 TFLOPS in real terms. I thought such a thing didn't exist in GPUs, just in the case of CPUs and their IPC.

Cheers.

-Cose
 
The 6870 is actually VLIW5. This changes your numbers, but not your point.
Oh yeah... AMD's silly naming conventions :). The 6870 was based on the 5000 series VLIW5 architecture (not the new VLIW4 architecture they had in the 6900 series). Not a biggie, just knock another 5%-10% of ALU efficiency off the VLIW4 figures.
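Redoing the earlier estimate with the VLIW5 occupancy figure from the original post (3.5 of 5 lanes, i.e. 70%, still just an assumed estimate) gives:

```python
# Revised effective throughput for the HD 6870 using the VLIW5
# occupancy estimate (3.5/5 lanes = 70%) instead of VLIW4's 75%.

hd6870_peak = 2016            # GFLOP/s, paper spec
vliw5_efficiency = 3.5 / 5    # 70% occupancy estimate

effective = hd6870_peak * vliw5_efficiency
print(round(effective, 1))    # 1411.2 -> even further below the 7790's ~1790
```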
I am actually surprised by the actual results though. 2 TFLOPS that actually perform like 1.5 TFLOPS in real terms. I thought such a thing didn't exist in GPUs, just in the case of CPUs and their IPC.
That isn't a new discussion. ATI vs Nvidia FLOP/s efficiency has been long debated in reviews and forums. This was discussed in 5800 (VLIW5) reviews and 6900 (VLIW4) reviews, and both were compared against Fermi. Fermi is a scalar architecture, like Kepler and GCN. First 7970 (GCN) reviews also stated big ALU efficiency gains over 6970 (seen clearly in synthetic benchmarks).

Kepler and GCN are pretty much tied in ALU efficiency. These architectures are similar in many ways. But of course there are differences as well. See this thread for more info: http://beyond3d.com/showthread.php?t=63517
 
interesting and great thread.
Thanks, you are welcome.

Oh yeah... AMD's silly naming conventions :). The 6870 was based on the 5000 series VLIW5 architecture (not the new VLIW4 architecture they had in the 6900 series). Not a biggie, just knock another 5%-10% of ALU efficiency off the VLIW4 figures.

That isn't a new discussion. ATI vs Nvidia FLOP/s efficiency has been long debated in reviews and forums. This was discussed in 5800 (VLIW5) reviews and 6900 (VLIW4) reviews, and both were compared against Fermi. Fermi is a scalar architecture, like Kepler and GCN. First 7970 (GCN) reviews also stated big ALU efficiency gains over 6970 (seen clearly in synthetic benchmarks).

Kepler and GCN are pretty much tied in ALU efficiency. These architectures are similar in many ways. But of course there are differences as well. See this thread for more info: http://beyond3d.com/showthread.php?t=63517
I have been reading the thread, and while I have had a hard time understanding some terms -especially when reading Gipsel's posts- or the odd mention of "wavefronts" -3dilettante-, I am trying to assimilate all the info. I'd have a lot of questions, but of course this is not the thread for that.

On a different note, I've seen all the presentations and events during E3 -at least the main ones- and I am rather impressed (it certainly struck me as something unexpected) by the Xbox One's performance level. :oops:

Additionally, taking into account the specs advantage the PS4 has, I can honestly say that the games that impressed me the most were Xbox One's -save the odd one here and there-, whether they were exclusive titles or 3rd party games.

The Xbox One has 4 move engines, which allow for fast direct memory access, afaik, and then there is the low latency thing, but I wonder if you are as surprised as me by the technological level displayed in those demos.

I am big enough to contain my enthusiasm and thought CoD: Ghosts was okay until the moment I saw all those games...

Did you expect that kind of performance or... are you actually surprised too, sebbbi? :eek:
 
The Xbox One has 4 move engines, which allow for fast direct memory access, afaik, and then there is the low latency thing, but I wonder if you are as surprised as me by the technological level displayed in those demos.

Considering that 2 DMA units are par for the course on GCN, I really wouldn't put the move engines down as a 'performance' advantage; they are just strange Microsoft-speak for the usual DMA + compression hardware.
 
Since there's a nextgen Trials in development, I'd say there's a good chance sebbbi knows a lot more than he's allowed to say at this time ;)
 
Since there's a nextgen Trials in development, I'd say there's a good chance sebbbi knows a lot more than he's allowed to say at this time ;)


What a coincidence, i just read this

http://www.videogamer.com/news/xbox_one_and_ps4_have_no_advantage_over_the_other_says_redlynx.html
Xbox One and PS4 have no advantage over the other says RedLynx

Trials creative director Antti Ilvessuo believes both next-generation consoles to be on par in terms of power.

"The Xbox One and PS4 are on a equal footing according to Trials creative director Antti Ilvessuo.
Speaking to VideoGamer.com at E3, Ilvessuo said: " Obviously we have been developing this game for a while and you can see the comparisons. I would say if you know how to use the platform they are both very powerful. I don't see a benefit over the other with any of the consoles."
Ilvessuo also explained that "It all depends on how you use the platform and how you use it right. That's the thing."
On the subject of used games, the man behind Trials also believed that 'developers can't really worry about that [side of things]' and that it was a matter for 'different people' to sort out."



When I was reading, I thought that if anyone was going to feel comfortable with the X1 design (eSRAM + move engines, etc.) it would be them ;) after reading the Trials engine stuff here.
 
Considering that 2 DMA units are par for the course on GCN, I really wouldn't put the move engines down as a 'performance' advantage; they are just strange Microsoft-speak for the usual DMA + compression hardware.

The DMEs in the XOne are separate from the typical DMA hardware common in AMD's products. ERP addressed that a while back. So the DMEs are in addition to that.
 
Since there's a nextgen Trials in development, I'd say there's a good chance sebbbi knows a lot more than he's allowed to say at this time ;)

Yeah, he is under nda atm so he can only comment on publicly available info from the manufacturers themselves. So until the NDA is lifted, he can't talk about much and I respect that.
 
The DMEs in the XOne are separate from the typical DMA hardware common in AMD's products. ERP addressed that a while back. So the DMEs are in addition to that.

I don't know why we keep calling them something else; call them what everyone but Microsoft would: DMA. That's what they are, DMA units with fixed-function compression/decompression stuck on the end.

Here's a good diagram from VGLeaks explaining it.

[Diagram: Xbox One move engines, from VGLeaks]



As you can see, the top two 'DMEs' contain a DMA unit and a tile/untile unit; these two are what is present in ALL GCN cards (and probably older ones as well).

The two below contain the previous two's functionality but also add the fixed-function compression/decompression.
 
Since there's a nextgen Trials in development, I'd say there's a good chance sebbbi knows a lot more than he's allowed to say at this time ;)
Yes, Trials Fusion was announced at E3. It will be released for XB1, PS4, PC and Xbox 360 :)

It goes without saying that I will not comment on any unofficial rumors/leaks/speculation about either one of the new consoles.
 
It goes without saying that I will not comment on any unofficial rumors/leaks/speculation about either one of the new consoles.
Of course, we wouldn't expect you to.

But how's about just leaking the technical documents and specs? :mrgreen:
 