Are teraflops actually the correct metric for measuring GPU compute power on consoles?

Uhhhhh... There's nothing to really leak on the Sony spec side.

Didn't they officially release their specs a while ago & at E3?
 
We don't know anything about the audio DSP side of things. And with full tech specs, we could compare the CPUs and GPUs for any differences. There might be minor (or major!) tweaks that haven't been discussed in the broad overview documents.
 

Kepler and GCN are pretty much tied in ALU efficiency.

Well, maybe this is a "you know something I don't" thing, but from the figuring I've done, Nvidia does seem to maintain perhaps a 10-20% advantage per flop with its latest GPUs. Among similarly performing GPUs with similar bandwidth, Nvidia usually has somewhat fewer raw flops.

But that gap has of course shrunk dramatically since AMD moved away from VLIW, as you'd expect. It's now more of a footnote.
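For what it's worth, the "figuring" here amounts to nothing more than normalising a benchmark result by the card's theoretical peak flops. A minimal Python sketch of that, using made-up placeholder numbers rather than real review data:

```python
# Rough "performance per theoretical TFLOP" comparison. The scores and
# TFLOPS figures below are placeholders, not measured data.

def perf_per_tflop(benchmark_score: float, theoretical_tflops: float) -> float:
    """Normalise an aggregate benchmark score by theoretical peak TFLOPS."""
    return benchmark_score / theoretical_tflops

# Hypothetical example: two similarly performing cards with similar bandwidth,
# where the card with fewer raw flops ends up ahead per flop.
cards = {
    "hypothetical NV card":  {"score": 100.0, "tflops": 2.5},
    "hypothetical AMD card": {"score": 100.0, "tflops": 3.0},
}

for name, c in cards.items():
    print(f"{name}: {perf_per_tflop(c['score'], c['tflops']):.1f} points per TFLOP")
```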
 
I don't know why we keep calling them something else. Call them what everyone but Microsoft would: DMA. That's what it is, DMA with fixed-function compression/decompression stuck on the end of it.

Here's a good diagram from VGLeaks explaining it.

[Diagram of the Durango move engines, via VGLeaks]



As you can see, the top two 'DMEs' contain a DMA unit and a tile/untile unit; these two are what's present in ALL GCN cards (and probably older ones as well).

The other two below contain the same functionality but also add the fixed-function decompression/compression.

I am not saying that it is not DMA; what I am saying is that it is separate from the DMA found in all GCN parts.
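To make the distinction concrete, here's a toy descriptor for such an engine. This is not any real driver interface, just an illustration of a plain DMA copy with an optional tile/untile conversion, plus the fixed-function (de)compression that only two of the four engines carry:

```python
from dataclasses import dataclass
from enum import Enum

class Layout(Enum):
    LINEAR = "linear"
    TILED = "tiled"

class Codec(Enum):
    NONE = "none"
    COMPRESS = "compress"       # only on the two "extended" engines
    DECOMPRESS = "decompress"

@dataclass
class MoveJob:
    """Hypothetical descriptor for a DMA-style copy between memory pools."""
    src_pool: str               # "DRAM" or "ESRAM"
    dst_pool: str
    src_layout: Layout          # tile/untile happens when these two differ
    dst_layout: Layout
    codec: Codec = Codec.NONE
    num_bytes: int = 0

def validate(job: MoveJob, engine_has_codec: bool) -> None:
    """A codec job may only be issued to an engine with the extra hardware."""
    if job.codec is not Codec.NONE and not engine_has_codec:
        raise ValueError("this engine is plain DMA + tile/untile only")

# A plain tiling copy that any of the four engines could service:
validate(MoveJob("DRAM", "ESRAM", Layout.LINEAR, Layout.TILED, num_bytes=1 << 20),
         engine_has_codec=False)
```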
 
Now, back to the efficiency thing and how teraflops can be irrelevant in some cases; this is what I've found.

...

The Bonaire HD 7790 has truly amazing efficiency without crippling compute. xD

...

For instance, I LOVE how the Bonaire HD 7790 is about as powerful as the AMD HD 6870, and sometimes even more powerful, but consumes 60% less power. :smile2:

This is a completely bizarre comparison. You're comparing a previous-generation GPU built on a 40nm process to a current-generation GPU built on a 28nm process. And you're surprised the newer one is more efficient?

This has nothing to do with "console efficiency" and everything to do with GPUs getting more efficient between generations and obviously more power-efficient across node transitions.

So what do we have here?

Yes, the Xbox One connection. :smile2:

No, you don't. You have the connection of one AMD architecture progressing to a newer, more efficient AMD architecture with the help of a node change. This is the same thing that happens with every new GPU generation, whether or not there's a console based on the new architecture.

The conclusion is that a more expensive, (theoretically) more powerful graphics card -the HD 6870- doesn't perform quite as well in games as the Bonaire HD 7790.

As one could reasonably expect when comparing an older generation GPU with a newer one.

What's the mystery then? I guess it's all in the efficiency of the new design, and that the HD 7790 is most probably based upon a console chip.

Oh geez, you've got to be kidding. So your theory is that the 7790 - and by extension the rest of the GCN family, which shares almost identical efficiency - is only as efficient as it is because "it's based on a console chip"?

Never mind the fact that this architecture was on the market nearly two years before it arrives in console form?

Tell me, how did Nvidia achieve similar efficiency with Kepler without the benefit of it being based on a console chip?

You might wonder what the special sauce is, then. Well, being the basis of the Xbox One GPU, the secret sauce is that it combines the greater geometry processing power (2 primitives per clock) of the 7900 -Tahiti- and 7800 -Pitcairn- cores with a reduced 128-bit memory bus.

It was the first time anyone had tried something like this.

No, this isn't special sauce. It's simply another configuration of a highly modular architecture. So what if the 7790 is the first implementation of two setup engines plus a 128-bit bus? Tahiti was the first to do it with a 384-bit bus, and Pitcairn was the first to do it with a 256-bit bus. Are those equally endowed with secret sauce, or does only the 128-bit bus qualify as the magic ingredient?
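And bus width is only one factor in the bandwidth anyway; the per-pin data rate matters just as much. A quick sketch of that arithmetic, using roughly the retail-card GDDR5 data rates (treat them as ballpark figures):

```python
# Memory bandwidth = bus width (in bytes) * per-pin data rate.
# Data rates below are approximately the retail GDDR5 clocks for these cards.

def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits / 8 * data_rate_gbps

configs = [
    ("Bonaire HD 7790, 128-bit @ ~6.0 Gbps",  128, 6.0),
    ("Pitcairn HD 7870, 256-bit @ ~4.8 Gbps", 256, 4.8),
    ("Tahiti HD 7970, 384-bit @ ~5.5 Gbps",   384, 5.5),
]

for name, width, rate in configs:
    print(f"{name}: ~{bandwidth_gbs(width, rate):.0f} GB/s")
```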

This is the theory, but here we have the results, the fruits of AMD's labour on this, and how efficiency and smart, console-like design defeat a power-hungry beast.

Again, no. GCN is not a "smart console-like design". It is AMD's current-generation (currently around 1.5 years old) PC architecture, which has been leveraged for use in consoles (partly) because of its high efficiency.

It only defeats a "power hungry beast" that is built on a larger node and a much older design. Compared to a "power hungry beast" on the same node and with a modern design like Tahiti or Kepler, it's clearly much slower.

In real terms this means that paper specs say nothing if we don't back them up with actual results, and that, imho, teraflops alone aren't a sufficient indication of how capable the hardware is.

Actually, paper specs will tell you pretty much everything you need to know about a system's performance, as long as you account for ALL aspects of the system and know what workload you're using to compare.

TFLOPS alone obviously aren't a great metric, but when we're comparing across the same architecture, as you would be when comparing X1 with PS4 or indeed a GCN-powered PC, then 50% higher TFLOPS (not to mention texture processing) means exactly 50% higher throughput in shader- and texture-limited scenarios. You obviously have to compare other aspects of the system, like memory configuration, fill rate, geometry throughput and CPU performance, to get a more accurate picture of the systems' overall comparative capabilities. And even once you know all that, without knowing which elements of the system the workload is going to stress most, you're still going to struggle to understand which will perform better.

All you can really say is that one has x% more potential in that type of workload provided no other bottlenecks are encountered first. Then wait for the games to come out and see if they take advantage of that potential or not.
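To put numbers on the "same architecture" point, this is the usual back-of-the-envelope peak-FLOPS calculation with the leaked 12 vs 18 CU counts and the commonly rumoured 800 MHz GPU clock (none of which is officially confirmed):

```python
# Peak FLOPS for a GCN-style GPU:
#   CUs * 64 lanes per CU * 2 ops per clock (fused multiply-add) * clock.
# CU counts come from the leaks discussed in this thread; the 800 MHz clock
# is a rumoured figure, not an official spec.

def peak_tflops(compute_units: int, clock_ghz: float) -> float:
    return compute_units * 64 * 2 * clock_ghz / 1000.0

x1  = peak_tflops(12, 0.8)   # ~1.23 TFLOPS
ps4 = peak_tflops(18, 0.8)   # ~1.84 TFLOPS
print(f"X1 ~{x1:.2f} TF, PS4 ~{ps4:.2f} TF, ratio {ps4 / x1:.2f}x")
```

The ratio only holds in shader- and texture-limited scenarios, exactly as stated above.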
 
Actually, paper specs will tell you pretty much everything you need to know about a system's performance, as long as you account for ALL aspects of the system and know what workload you're using to compare.

But the part I bolded is extremely hard to do without using a system, because many things simply are not documented.
Seemingly innocuous things like changing how frequently the DRAM in a system is refreshed can have a 5-10% performance impact.
Your assumptions can be completely flawed.
You might assume that you will be GPU bound, but later discover you are actually CPU bound, or vice versa. Historically, for games, it's been far more common for the CPU to be the limiting factor, not the GPU.
And it's not just about hardware; there is software between you and the system that you have little or no control over.
Game software isn't a trivial demo; it's complicated, and there are a lot of moving parts.

You will be ALU bound in some circumstances, and at those points flops are all that matter.
If your geometry carries too many attributes, those ALUs will be massively underutilized when processing vertices.
When you are rendering shadows, you will be fill or possibly bandwidth constrained.
The same goes for the first pass of a deferred renderer.
Full-screen effects are probably memory limited, but could be ALU bound depending on complexity.
Non-trivial compute jobs are usually memory bound.

Flops are a useful metric, but only in context. I just hate boiling performance down to a single number, because I don't believe that you can.
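One way to see why a single number fails is a toy per-pass model, where each pass takes as long as whichever resource it saturates first; the frame ends up being a sum of different limits, not a multiple of any one spec. Every number below is invented purely for illustration:

```python
# Toy model: each rendering pass is limited by its most-stressed resource.
# All hardware and workload numbers here are invented for illustration only.

GPU = {"tflops": 1.8, "bandwidth_gbs": 170.0, "gpixels_s": 25.0}

passes = [
    # (name, ALU work in TFLOP, data moved in GB, pixels written in Gpix)
    ("shadow maps",       0.001, 0.25, 0.060),
    ("g-buffer",          0.004, 0.60, 0.030),
    ("deferred lighting", 0.009, 0.30, 0.002),
    ("post processing",   0.004, 0.50, 0.010),
]

frame_ms = 0.0
for name, tflop, gb, gpix in passes:
    times = {"ALU":       tflop / GPU["tflops"],
             "bandwidth": gb / GPU["bandwidth_gbs"],
             "fill":      gpix / GPU["gpixels_s"]}
    limiter = max(times, key=times.get)
    t_ms = times[limiter] * 1000.0
    frame_ms += t_ms
    print(f"{name:17s} {t_ms:5.2f} ms ({limiter}-limited)")

print(f"frame total ~{frame_ms:.1f} ms")
```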

From the leaked specs it would be my best guess that PS4 in most GPU limited situations would have an advantage performance wise, and certainly it has an advantage from a development standpoint.
What I would not want to guess at is how big that advantage is in real terms. I certainly don't think it will be as apparent as the 12 vs 18 numbers would seem to indicate.
 
Considering that 2 DMA units are par for the course on GCN, I really wouldn't count the move engines as a 'performance' advantage; they are just strange Microsoft-speak for the usual DMA + compression hardware.
As I said, if I understood it correctly, the console has 4 move engines, and this allows for fast direct memory access to take place.


The true purpose of the move engines -excuse me if I am wrong- is to take workloads off of the rest of the architecture while still yielding positive results at a very low cost. Perhaps the so called *secret sauce*

From VGLeaks:

The four move engines all have a common baseline ability to move memory in any combination of the following ways:
  • From main RAM or from ESRAM
  • To main RAM or to ESRAM
  • From linear or tiled memory format
  • To linear or tiled memory format
  • From a sub-rectangle of a texture
  • To a sub-rectangle of a texture
  • From a sub-box of a 3D texture
  • To a sub-box of a 3D texture
(....)

All of the engines share a single memory path, so the peak throughput for all of the engines together is the same as it would be for one engine alone.

(....)

The great thing about the move engines is that they can operate at the same time as computation is taking place. While the GPU is doing computations, move engine operations are still available, and even while the GPU is consuming bandwidth, move engine operations can still proceed so long as they use different pathways.

http://www.vgleaks.com/world-exclusive-durangos-move-engines/
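As a rough illustration of the "single memory path" point in that description: splitting a transfer across several engines doesn't buy any extra throughput, it just issues more descriptors. The bandwidth figure below is an assumed placeholder, not taken from the article:

```python
# The engines share one memory path, so aggregate throughput is the same
# whether one engine or all four are used. Bandwidth value is a placeholder.

MOVE_PATH_GBS = 25.0  # assumed aggregate bandwidth of the shared path

def copy_time_ms(megabytes: float, engines_used: int) -> float:
    assert 1 <= engines_used <= 4
    # engines_used is deliberately absent from the formula: the shared path
    # caps the total, so more engines don't make a single big copy faster.
    return megabytes / 1024.0 / MOVE_PATH_GBS * 1000.0

print(copy_time_ms(32.0, engines_used=1))  # e.g. a 32 MB render target: ~1.25 ms
print(copy_time_ms(32.0, engines_used=4))  # same time, just split into more jobs
```

The benefit described in the quote is overlap with GPU work, not faster individual copies.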


:-D Maybe it does, maybe it doesn't.
Since there's a next-gen Trials in development, I'd say there's a good chance sebbbi knows a lot more than he's allowed to say at this time ;)
Yup, some questions that could likely work with him would be something like, for instance:


Sebbbi, what's your favourite number? Your least favourite?

Do you look up topics online (off Beyond 3d) for personal enjoyment/fun? If so, what?

Do you think life would have any meaning without death? Why? Conversely, why not?

But this is not the topic for that, of course.

Of course, we wouldn't expect you to.
But how about just leaking the technical documents and specs? :mrgreen:

Shifty just spoiled my secret question; he has been a visionary. :smile: As with his work as a moderator, I wondered whether he would be either too strict or too lenient, but I guess he would be too straightforward. You need to be more tactful with such a sensitive developer as sebbbi. heh, I'd deem Shifty's approach too progressive.

Probably more informative just to ask him which version of the game he'd buy....
Nicely said. I wonder if it would work. Your idea is a conundrum itself.

he he
 
As I said, if I understood it correctly, the console has 4 move engines, and this allows for fast direct memory access to take place.

The true purpose of the move engines -excuse me if I am wrong- is to take workloads off of the rest of the architecture while still yielding positive results at a very low cost. Perhaps the so called *secret sauce*


What is described by the 'Move Engines' is literally just DMA (with decompression/compression hardware). DMA is in everything; DMA is in the PS4, and it's in the Nintendo 64.

DMA allows you to move memory from one spot to another without involving another processor (generally the CPU) in the task. It's used heavily in desktops, generally by high-performance devices (such as network cards) from what I can tell.

I'm sorry, but this is not 'secret sauce', nor will it give any large benefit. But it sure would be cheap silicon-wise.
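For anyone who hasn't run into the concept, the point of DMA is simply this pattern: kick off a bulk copy, keep doing real work, then wait on a completion signal. A Python thread stands in for the copy engine below, purely as an analogy (real DMA wouldn't spend any processor cycles on the copy at all):

```python
# DMA-style overlap, with a background thread standing in for the copy engine.
import threading

def bulk_copy(dst: bytearray, src: bytes, done: threading.Event) -> None:
    dst[:] = src       # the "engine" doing the move
    done.set()         # completion signal (a fence/interrupt on real hardware)

src = bytes(16 * 1024 * 1024)   # 16 MB source buffer
dst = bytearray(len(src))
done = threading.Event()

threading.Thread(target=bulk_copy, args=(dst, src, done)).start()

# ... the "processor" keeps doing useful work instead of shuffling bytes ...
busy_work = sum(range(1_000_000))

done.wait()
print("copy finished while other work ran:", dst[:4] == src[:4])
```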
 
From the leaked specs it would be my best guess that PS4 in most GPU limited situations would have an advantage performance wise, and certainly it has an advantage from a development standpoint.

Why do you say that?

Do you know your tools are better or more complete than MS's or are you referring to being unable to talk directly to the hardware on XB1?
...

Sebbi, can you comment on what your creative director said? Do you agree with his statement or not?
He's under NDA too so if he can make such statements I'm sure you can clarify.
 
Well, maybe this is a "you know something I don't" thing, but from the figuring I've done, Nvidia does seem to maintain perhaps a 10-20% advantage per flop with its latest GPUs. Among similarly performing GPUs with similar bandwidth, Nvidia usually has somewhat fewer raw flops.

But that gap has of course shrunk dramatically since AMD moved away from VLIW, as you'd expect. It's now more of a footnote.
I think AMD's technology is better at efficiency nowadays. I was SO happy when the first rumours said MS would go AMD this gen...

Even so, I remember people tearing into Sony for choosing ATi and not NVIDIA and being sore about Sony going AMD, but the only reason for that was that it pretty much sealed the deal on hardware backwards compatibility being a non-option.

This is a completely bizarre comparison. You're comparing a previous-generation GPU built on a 40nm process to a current-generation GPU built on a 28nm process. And you're surprised the newer one is more efficient?

This has nothing to do with "console efficiency" and everything to do with GPUs getting more efficient between generations and obviously more power-efficient across node transitions.

No, you don't. You have the connection of one AMD architecture progressing to a newer, more efficient AMD architecture with the help of a node change. This is the same thing that happens with every new GPU generation, whether or not there's a console based on the new architecture.

As one could reasonably expect when comparing an older generation GPU with a newer one.

Oh geez, you've got to be kidding. So your theory is that the 7790 - and by extension the rest of the GCN family, which shares almost identical efficiency - is only as efficient as it is because "it's based on a console chip"?

Never mind the fact that this architecture was on the market nearly two years before it arrives in console form?

Tell me, how did Nvidia achieve similar efficiency with Kepler without the benefit of it being based on a console chip?

No, this isn't special sauce. It's simply another configuration of a highly modular architecture. So what if the 7790 is the first implementation of two setup engines plus a 128-bit bus? Tahiti was the first to do it with a 384-bit bus, and Pitcairn was the first to do it with a 256-bit bus. Are those equally endowed with secret sauce, or does only the 128-bit bus qualify as the magic ingredient?

Again, no. GCN is not a "smart console-like design". It is AMD's current-generation (currently around 1.5 years old) PC architecture, which has been leveraged for use in consoles (partly) because of its high efficiency.

It only defeats a "power hungry beast" that is built on a larger node and a much older design. Compared to a "power hungry beast" on the same node and with a modern design like Tahiti or Kepler, it's clearly much slower.

Actually, paper specs will tell you pretty much everything you need to know about a system's performance, as long as you account for ALL aspects of the system and know what workload you're using to compare.

TFLOPS alone obviously aren't a great metric, but when we're comparing across the same architecture, as you would be when comparing X1 with PS4 or indeed a GCN-powered PC, then 50% higher TFLOPS (not to mention texture processing) means exactly 50% higher throughput in shader- and texture-limited scenarios. You obviously have to compare other aspects of the system, like memory configuration, fill rate, geometry throughput and CPU performance, to get a more accurate picture of the systems' overall comparative capabilities. And even once you know all that, without knowing which elements of the system the workload is going to stress most, you're still going to struggle to understand which will perform better.

All you can really say is that one has x% more potential in that type of workload provided no other bottlenecks are encountered first. Then wait for the games to come out and see if they take advantage of that potential or not.
:eek: Well, it's the tendency I have to compare top-of-the-line PCs to consoles - it's past being a simple habit, even. I might have to look into it.

It's just that on consoles we are used to running games at 30 fps or less, comparing a couple of spare weeds and hairs here and there, and then a PC gamer appears and tells us that he is running the game at 60 fps with high-res textures, and you know the rest...

As for Kepler and console efficiency, that's a good question. The HD 7790 is an AMD card and its performance is what I would call amazing efficiency without crippling compute X-D (hear this, nVidia?)

It is also interesting to note that the 7790 is an 896-SP part clocked around 1 GHz with the aforementioned 128-bit bus, and it seems to be roughly half as powerful as the Tahiti GPU. That also lines up with the rumoured 1792-SP/256-bit-bus part for the 8870, which should match the Tahiti line in performance if the chip is configured the same way as this excellent Bonaire.

Also, they say Tahiti is not as efficient a gaming chip as Bonaire due to the other compute goodies on board, so it's truly good to see that AMD figured it out with their console chips and their new GPUs; hopefully this will be reflected in the higher-end chips which are coming out.

Yay AMD! :smile:

If you take into account the alleged teraflop count of the X1's GPU alone, Xbox One games looked out of this world.
 
Why do you say that?

Do you know your tools are better or more complete than MS's or are you referring to being unable to talk directly to the hardware on XB1?

Neither. I'm just talking about the unified fast memory.
Obviously I've written no XB1 code, but you are going to have to decide what you want in the ESRAM for any given point in your rendering.
You have to decide if copying something between the pools is worthwhile, and if it's something you choose to do, you have to decide how to overlap the copy with whatever else you are going to do.
The data flow requires a lot more thought, there is a lot more to screw up.
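A toy version of that "is the copy worthwhile?" decision might look like the sketch below. The bandwidth figures are assumed placeholders, not real console numbers, and it ignores the fact that the copy can often be overlapped with other work, which is exactly the kind of scheduling judgement being described:

```python
# Crude cost model for deciding whether to stage a resource in the fast pool.
# All bandwidth figures are assumed placeholders, not real console numbers.

DRAM_GBS  = 68.0    # assumed main-memory bandwidth
ESRAM_GBS = 102.0   # assumed fast-pool bandwidth
COPY_GBS  = 25.0    # assumed bandwidth of the copy path between pools

def worth_copying(resource_mb: float, reads_per_frame: int) -> bool:
    """Copy pays off when the per-frame read savings exceed the copy cost."""
    gb = resource_mb / 1024.0
    copy_cost  = gb / COPY_GBS
    cost_dram  = reads_per_frame * gb / DRAM_GBS
    cost_esram = reads_per_frame * gb / ESRAM_GBS
    return copy_cost < (cost_dram - cost_esram)

print(worth_copying(16.0, reads_per_frame=1))   # read once: not worth it
print(worth_copying(16.0, reads_per_frame=12))  # read many times: worth it
```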
 
Yes, Trials Fusion was announced at E3. It will be released for XB1, PS4, PC and Xbox 360 :)
It goes without saying that I will not comment on any unofficial rumors/leaks/speculation about either one of the new consoles.
Well, mine would be a very innocent question, actually, as I rather prefer to be as tactful as possible with you, because of NDA and stuff.

I mean... Xbox One games looked simply astonishing. :eek: I was surprised and wondered if you felt the same, sebbbi.

Out of pure curiosity: Do you think Xbox One games looked as one should expect from the console? Or are you too surprised about how good games looked? Why or why not?
 
One question that I have related to the XBO: if it's confirmed that the E3 games from both consoles were played on devkits, and the devkits at the moment don't have real hardware - real silicon exactly like what will be in the final consoles - how can you implement (better: emulate) the behaviour of ESRAM without ESRAM? The devkit for PS4 is easier to imagine: a GPU with GDDR5 and a PC CPU with software-regulated specs. But what could you do to emulate ESRAM? The data flow between memory pools in the XBO is more complicated, and with no real hardware in its devkit I suppose it could be really difficult to tune to replicate.

That's a point that makes me think it's more legitimate to expect differences in performance (for better or worse, I don't know) between the XBO devkits and the real console than between the PS4 devkit and the real thing.

What do you think?
 
Nvidia does seem to maintain perhaps a 10-20% advantage per flop with its latest GPUs.
I was just talking about pure ALU efficiency, not the efficiency of the full architecture. In general, when running well-behaving code (no long, complex data dependency chains) the ALUs should be very close in efficiency (assuming all data is in registers or L1). Of course, when we add the differences in memory architecture and caches (sizes/associativities/latencies) there will be bigger real-world differences. And when we add the fixed-function graphics hardware (triangle setup, ROPs, TMUs, etc.) there will be even more potential bottlenecks that might affect the game frame rate. Actual GPU performance is hard to estimate from the theoretical numbers alone. These are very complex devices, and performance is often tied to bottlenecks. One piece of hardware might, for example, be much faster in shadow map rendering (fill rate, triangle setup), but be bottlenecked by ALU in the deferred lighting pass. Still, the frame rate could be comparable to other hardware with completely different bottlenecks. And in another game performance might be completely different, since the other game might use completely different rendering methods (that are bottlenecked by different parts of the GPUs). Good balance is important when designing a PC GPU. PC GPUs must both be able to run current games fast and be future-proof for new, emerging rendering techniques.
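A crude way to put the "tied to bottlenecks" point into code is a roofline-style check: a pass counts as ALU-bound or bandwidth-bound depending on how many flops it performs per byte it touches, relative to the machine's flops-to-bandwidth ratio. The numbers below are illustrative, not measurements of any particular GPU or game:

```python
# Roofline-style classification. All figures are illustrative only.

PEAK_TFLOPS = 1.8
PEAK_GBS    = 170.0
machine_balance = PEAK_TFLOPS * 1e12 / (PEAK_GBS * 1e9)   # flops per byte

workloads = {
    "shadow map rendering": 1.0,    # few flops per byte written
    "deferred lighting":    40.0,   # lots of math per G-buffer byte
    "post-process blur":    6.0,
}

for name, flops_per_byte in workloads.items():
    bound = "ALU-bound" if flops_per_byte > machine_balance else "bandwidth-bound"
    print(f"{name:22s} {flops_per_byte:5.1f} flop/byte -> {bound}")
```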
Out of pure curiosity: Do you think Xbox One games looked as one should expect from the console? Or are you too surprised about how good games looked? Why or why not?
I expected next generation games to look very good. That has always happened (PS1->PS2 / PS2->PS3 / XBox->x360). Why would it be any different this time?
 