Are teraflops actually the correct metric for measuring GPU compute power on consoles?

I think AMD's technology is better at efficiency nowadays. I was SO happy when the first rumours said MS would go AMD this gen...

From the mid range downwards you're probably right, although it's pretty close. While it's a little faster than the 650 Ti and has a lower rated TDP, from what I've seen the measured power draw of the 650 Ti is pretty much equal.

AMD's compute advantage may turn out to be a big plus moving forwards though. From what I've seen and heard, Kepler is competitive in graphics-based compute work - although still a little slower in DirectCompute, which is what would be used, at least in the PC world - however it can be vastly slower in OpenCL. That isn't necessarily down to it being rubbish, and more about it needing to be very carefully tuned for maximum performance; i.e. it has a lot of easy-to-encounter performance cliffs which the regular (unoptimised) benchmarks are probably falling off left, right and centre. Consoles wouldn't suffer from that as much.
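To give a concrete (if simplified) picture of what one of those cliffs looks like, here's a toy occupancy calculation. The register and wavefront limits are roughly GCN-like but should be read as assumptions, not any particular chip's documented figures:

```python
# Toy occupancy calculation showing a register-pressure "cliff".
# Limits below are roughly GCN-like, but treat them as assumptions.
REGS_PER_SIMD = 16384     # assumed 32-bit VGPRs available per SIMD
WAVE_WIDTH = 64           # threads per wavefront
MAX_WAVES_PER_SIMD = 10   # assumed cap on resident wavefronts

def resident_waves(regs_per_thread):
    """How many wavefronts can stay resident for latency hiding."""
    by_registers = REGS_PER_SIMD // (regs_per_thread * WAVE_WIDTH)
    return min(by_registers, MAX_WAVES_PER_SIMD)

for regs in (25, 26, 36, 37):
    print(f"{regs} regs/thread -> {resident_waves(regs)} waves resident")
# 25 -> 10, 26 -> 9, 36 -> 7, 37 -> 6: one extra register per thread
# can cost a whole wavefront's worth of latency hiding.
```

A naive port that spills a couple of extra registers falls off exactly this kind of cliff, while a console title tuned against fixed hardware generally won't.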

It is also interesting to note that the 7790 is an 896 SP part clocked around 1 GHz with the aforementioned 128-bit bus, and it seems to be exactly half as powerful as the Tahiti GPU. That also lines up with the rumoured 1792 SP / 256-bit bus part for the 8870, which should match the Tahiti line in performance if the chip is configured the same way as this excellent Bonaire.

I don't agree with this. There's nothing fundamentally more performant about Bonaire compared to any other GCN GPU, including Tahiti. It uses more or less exactly the same functional units, just in a different configuration. It does seem more power efficient, but that's likely down to the more refined process it's built on, similar to NV's 7xx series.

There aren't many benchmarks out there comparing the two, but those that I've seen do put the 7970 GHz at only a little more than twice as fast. However they're only comparing at modest settings, and thus the higher end card isn't being pushed to its full potential. What if the comparisons were being made at 5760x1080? Do you think the 7790 would still be half as fast? And of course in some respects the 7970 GHz IS only twice as fast as the 7790 (setup rate, fill rate), so it's unreasonable to expect more than double the performance in every situation.

Bottom line, a 1792 SP / 256-bit bus "Bonaire style" part at 1 GHz isn't going to be faster than Tahiti. It may be competitive at lower settings but as settings are dialled up, Tahiti will pull away.
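To put rough numbers on why "half as fast" depends on which unit you're counting, here are the paper-spec ratios using the commonly quoted desktop figures for the 7790 and 7970 GHz (treat them as approximate):

```python
# Paper-spec ratios, 7970 GHz (Tahiti) vs 7790 (Bonaire).
# Specs are the commonly quoted desktop figures and may be slightly off.
bonaire = {"sps": 896,  "ghz": 1.00, "rops": 16, "bw_gbs": 96}
tahiti  = {"sps": 2048, "ghz": 1.05, "rops": 32, "bw_gbs": 288}

def tflops(gpu):
    return gpu["sps"] * 2 * gpu["ghz"] / 1000.0   # 2 FLOPs per SP per clock (FMA)

print(f"ALU:       {tflops(tahiti) / tflops(bonaire):.2f}x")          # ~2.4x
fill_ratio = (tahiti["rops"] * tahiti["ghz"]) / (bonaire["rops"] * bonaire["ghz"])
print(f"Fill rate: {fill_ratio:.2f}x")                                # ~2.1x
print(f"Bandwidth: {tahiti['bw_gbs'] / bonaire['bw_gbs']:.2f}x")      # 3.0x
```

So whether Tahiti looks "only twice as fast" or "well over twice as fast" depends entirely on which of those limits the test is hitting.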

Also, they say Tahiti is not as efficient a gaming chip as Bonaire due to its other compute goodies on board, so it's truly good to see that with their console chips and their new GPUs AMD have figured it out, and hopefully this will be reflected in the higher end chips which are going to come out.

This is efficiency in terms of performance per transistor and per watt. That's different to what you're implying Bonaire improves, which is actual performance versus theoretical performance. I see no evidence that Bonaire has improved the latter - see 7790 vs 7770 benchmarks.

If you take into account the alleged teraflop count of the X1's GPU alone, Xbox One games looked out of this world.

I'm not sure I agree with that either. The games look great of course, but why would we expect anything less from a system with easily 6-8x the performance of the Xbox 360? Just look at the miracles they pulled off with an inefficient 240 GFLOP GPU. It's not hard to imagine that they could do vastly greater things with a highly efficient 1200 GFLOP GPU. I expect the games to end up looking much better than what we're currently seeing.
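The back-of-the-envelope version of that claim, with the efficiency gain as a loose assumption rather than a measured figure:

```python
xenos_gflops = 240        # the ~240 GFLOP figure quoted above for the 360 GPU
xb1_gflops = 1200         # the ~1.2 TFLOPS figure discussed for the new GPU

raw_ratio = xb1_gflops / xenos_gflops            # 5.0x on paper alone
for efficiency_gain in (1.2, 1.6):               # assumed gain per theoretical flop
    print(f"{raw_ratio * efficiency_gain:.1f}x effective")   # 6.0x .. 8.0x
```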
 
But the part I bolded is extremely hard to do without using a system, because many things simply are not documented.
Seemingly innocuous things like changing how frequently the DRAM in a system is refreshed can have a 5-10% performance impact.
Your assumptions can be completely flawed.
You might assume that you will be GPU bound, but later discover you are actually CPU bound, or vice versa. Historically for games it's been far more common for the CPU to be the limiting factor, not the GPU.
And it's not just about hardware; there is software between you and the system that you have little or no control over.
Game software isn't a trivial demo, it's complicated, there are a lot of moving parts.

You will be ALU bound in some circumstances and at those points flops are all that matter.
If your geometry carries too many attributes, those ALUs will be massively underutilized when processing vertices.
When you are rendering shadows you will be fill or possibly bandwidth constrained
The same when doing a first pass for a deferred renderer
Full screen effects are probably memory limited, but could be ALU bound depending on complexity.
Non-trivial compute jobs are usually memory bound

Flops are a useful metric, but only in context. I just hate boiling performance down to a single number, because I don't believe that you can.
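A toy roofline-style estimate makes the same point; every number here is invented purely for illustration:

```python
# Attainable throughput is capped by either ALU peak or memory bandwidth,
# depending on arithmetic intensity (FLOPs per byte fetched). Toy numbers.
PEAK_GFLOPS = 1200.0   # hypothetical ALU peak
PEAK_BW_GBS = 100.0    # hypothetical memory bandwidth

def attainable_gflops(flops_per_byte):
    return min(PEAK_GFLOPS, PEAK_BW_GBS * flops_per_byte)

for fpb, workload in [(0.5, "shadow / g-buffer fill"),
                      (4.0, "typical shading"),
                      (32.0, "ALU-heavy post process")]:
    print(f"{workload:<24} {attainable_gflops(fpb):6.0f} GFLOPS")
```

Same GPU, same headline flops, anywhere from a few percent of peak to all of it depending on the workload.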

From the leaked specs it would be my best guess that PS4 in most GPU limited situations would have an advantage performance wise, and certainly it has an advantage from a development standpoint.
What I would not want to guess at is how big that advantage is in real terms. I certainly don't think it will be as apparent as the 12 vs 18 numbers would seem to indicate.

I think we're on the same page there, you just said it better ;)

I just wanted to illustrate the point that completely ignoring "paper specs" (i.e. the actual hardware) makes no sense since they do actually tell you everything about the performance of a system if you know them in sufficient detail. As you say, knowing in sufficient detail across architectures is extremely difficult if not impossible.

However I think it would be fair to say that when comparing between identical architectures it does become plausible to make basic comparisons, as long as you account for all known differences. So in the case of the two consoles using the same GPU and CPU architectures at the same clock speeds, you'd need to account for the following:

  1. Difference in number of functional units after system reservation
  2. Difference in software interface to the hardware
  3. Effect of low latency ESRAM vs a single pool of slightly faster GDDR5
  4. Effect of any additional units (SHAPE, move engines etc..)
All else seems to be pretty much even, so on that basis, if comparing the two consoles I'd say that comparing "TFLOPs" (number of CUs) is pretty relevant indeed, as long as you're taking account of the other functional unit differences (and similarities) at the same time, i.e. ROPs, CPU cores, setup engines etc...
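As a sanity check on treating "TFLOPs" as a CU count, the headline figures drop straight out of the unit counts at the rumoured shared clock (12 vs 18 CUs at 800 MHz, per the leaks; none of this is official):

```python
CLOCK_GHZ = 0.8      # rumoured shared GPU clock
SP_PER_CU = 64       # GCN: 64 stream processors per CU
FLOPS_PER_SP = 2     # FMA = 2 FLOPs per SP per clock

def tflops(cu_count):
    return cu_count * SP_PER_CU * FLOPS_PER_SP * CLOCK_GHZ / 1000.0

print(f"12 CUs: {tflops(12):.2f} TFLOPS")   # ~1.23
print(f"18 CUs: {tflops(18):.2f} TFLOPS")   # ~1.84
```

Which is exactly why the CU count and the TFLOP figure tell you the same thing here; it's the ROP, bandwidth and reservation differences that need separate accounting.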
 
But the devkits do have ESRAM

Ok, I didn't know that. But does the devkit have an APU inside? Or does it simply have an ESRAM memory pool on a PCIe card or a dedicated bus?

Is the architecture of the devkit's motherboard custom, or is it a PC motherboard?

I thought that before the final devkits the previous versions were PC-ish...

Thank you for the explanation.
 

At this point in development devkits will have something very similar to final silicon, and if MS is true to form, probably the final PCB layout or close to it.
They will be based on early spins of the final chips, possibly with bugs.
It's likely been that way since Jan or Feb, though they may not have been available in quantity in that timeframe.

Earlier devkits were likely just PC's running a version of the OS.
 
Ok, thanks.

So it's legitimate to think that the overall performance and programming tricks that will apply to the future XBO are already achievable with the current devkits.

That's a question that has made me think, because I remember Hideo Kojima complaining about how different the PS3 devkits were compared with the final hardware, and my conclusion was that the more complex an architecture is, the more difficult it is to accurately replicate it in software, even using powerful hardware.

If it's a fact that XBO devkits have "almost" final silicon, or even prototype PCB layouts, then it's quite reasonable to consider the look and feel of the games shown at E3 legitimate, at least as far as the final XBO hardware is concerned.

If the devkit has a different architecture because of the absence of final silicon, it will be more likely that we'll find differences in performance, and that could make the final look of the games somewhat unexpected (maybe even for the better).

No doubt the first batch of games running on real consoles will attract a lot of nitpicking. That will be as intellectually engaging as it is interesting.

I'm really excited about the final performance levels, and how far the consoles will be from standard PC setups.
 
Yeah, the beta kits started going out in Dec-Jan and had all the extra componentry - ESRAM, move engines, SHAPE etc. - so they were probably using the APU silicon.
 
:smile: Pillin. Agreed. It is going to be a very interesting time finding out how these new consoles actually perform, and a much more interesting generation than the previous one in terms of ports, graphics technology, resolutions, etc.

After watching the Xbox One in action I have a feeling that teraflops aren't a good indicator of how much performance games are going to need to add so much detail. :p

Heck, or hell, or heavens, or whatever... some games are a perfect example of why I think the engineers at AMD wanted to avoid bottlenecks and instead made sure their GPUs for the next gen consoles are well balanced.

Besides that, I remember sebbbi's words :p from quite a few months ago, saying that top of the line PCs were 10x more powerful than the PS3 and X360.

In terms of teraflops this means 2.5 teraflops. :cool:

But Microsoft said the other day that the Xbox One is 10 times more powerful than the X360, regardless of the fact that it is a 1.2 TFLOPS console in terms of raw power.

This means that the PS4 and the Xbox One will be, in maths terms, equivalent to a high-end PC. :eek:
 
I expected next generation games to look very good. That has always happened (PS1->PS2 / PS2->PS3 / XBox->x360). Why would it be any different this time?
Was I hallucinating when I read your thread, or did you just answer my question with another question? This is a known trait people have here where I live. :smile:


I expected that kind of jump as well. I wonder how long it'll be before someone cries that I am a terribly dull sort or what have you. But I am still curious, I gotta keep asking sebbbi! :eek:


I mean, how were those graphics so cute? And did you actually expect that? Not just a jump in graphics quality, but simply....


Were those the kind of graphics you expected from these consoles knowing the specs?
 
My 5850 is a 2 TFLOPS GPU (VLIW5, like the 6870), but it can be slower than a 7750 (0.82 TFLOPS) in newer games, I think simply because it's slow at tessellation. On TessMark the 7750 is also absurdly faster than the 6870 (and the Nvidia GPUs do even better). How heavily will next gen console games rely on tessellation?

http://anandtech.com/bench/Product/512?vs=535
(look at civ5 and batman)
 
Theoretical flops are pointless and always have been. It's why CPUs aren't measured in MIPS or flops any more. Efficiency plays a big part in any architecture, and the flops are just there as an absolute ceiling on performance. Likely the PS4 will never be using the 1.84 TF at any point in its lifetime, because you will never be able to keep every single ALU fed 24/7 no matter what your memory bandwidth or latency is. If the working data stays inside the GPU's L2 then performance could be more efficient, but even then you would not hit max compute flops.

Even with the same GCN cores in the next gen consoles, the general performance depends on memory bandwidth and memory latency. An L3 cache in a CPU can increase core performance by up to 20% on its own. For a compute-driven task that has to move data between CPU and GPU, it's likely Microsoft's approach is much more efficient. The PS4, however, has 4 GCN compute units reserved for compute tasks according to VGleaks. This will likely result in better compute performance if they can find a way to bypass memory bottlenecks due to high latency GDDR5. Communication between CUs and CPU modules will likely have a higher cost on the PS4 due to the GDDR5 latency, but the excess resources there will allow for more performance when there is no latency bottleneck.

Xbox One is an interesting architecture, and I do think it will be easier to get efficiency out of it than the PS4, at least earlier in the generation, when doing GPU-oriented compute tasks. That doesn't mean it would be as powerful at the end of the gen, though devs will likely keep finding new uses for the ESRAM. The PS4, however, will always have more compute units and more potential performance, and most likely better GPU-oriented performance as a result. The whole story won't be told until devs can start talking about this stuff.
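For the "keeping every ALU fed" point, here's a back-of-the-envelope latency-hiding estimate; the latency figure is a placeholder, not a measured GDDR5 or ESRAM number:

```python
# With W wavefronts resident per SIMD, each wavefront needs roughly
# latency / W cycles of independent ALU work per outstanding memory access
# to keep the SIMD busy. The latency value below is purely illustrative.
MEM_LATENCY_CYCLES = 400   # assumed round trip to main memory

for resident_waves in (4, 8, 10):
    alu_cycles_needed = MEM_LATENCY_CYCLES / resident_waves
    print(f"{resident_waves} waves resident -> ~{alu_cycles_needed:.0f} "
          f"ALU cycles per load to hide the latency")
```

Lower-latency on-chip memory, or simply more maths per byte fetched, both mean more of the paper flops actually show up.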
 
Flops + IPC is perhaps a meaningful combination for CPUs. Flops plus a measure of ALU saturation is probably quite good for GPUs, although you'd want several metrics to cover different workloads, as sebbbi has spelled out.
 
Some of the results are staggering taking into account such a difference in hardware specs, and even where the outcome of the comparison is to be expected, the 2x+ difference in power never translates into 2x the framerate; sometimes the difference is negligible.

As for your question, I can't say because I am not a developer.

Games like The Witcher 3 are going to be launched on both systems, and they are going to be a perfect benchmark for both consoles.

The Witcher 3 relies heavily on tessellation. Some of the technologies used in the game are featured in this astonishing video. :oops: A must see:


There are some techniques, like the fur simulation (after the 22-second mark the fur waves like the real thing :oops:) and the tessellation itself, which are truly incredible overall.
 
Like Shifty said, it's all about what the usable and unusable levels of the hardware are, taking into account the peculiarities of each machine's architecture.

Taking into account that the PS4 is the more powerful hardware (more ROPs, more CUs), it will be great to see how far developers can push both machines and where all the architectural differences kick in.

Digital Foundry articles and the like are going to take the guesswork out of picking between the two machines and help people understand the variance in scene complexity. This generation is going to be more varied than the previous one, because the difference seems more pronounced.

Lastly, as I said in the previous post, games like The Witcher 3 seem to be an excellent test, since they don't target specific hardware and the engine is very advanced.
 