Digital Foundry Article Technical Discussion Archive [2013]

I think you're getting confused between balance and efficacy.

For example, the infamous Saab 9-3 Viggen is a textbook example of an unbalanced car, with a powerful turbo engine married to a chassis that simply could not cope with all the torque.

However, it'd definitely be faster around a track than the regular, 'balanced' 9-3 models.

In the same way, despite the PS4 possibly being a less 'balanced' design, it should still outperform the XB1.


We should also keep in mind that "balance" is influenced by the workload required, and the proper balance for one game may be very different from another game.
PS2 should be a pretty good example.

Overbuilding one part of a system makes it unbalanced, but in most cases having too much of something isn't going to hurt your performance; it just means parts of the system are "twiddling their thumbs" for a large portion of the time.

To understand the performance of the system, what we should be looking at is the bottleneck and how much throughput that bottleneck can sustain. It doesn't matter if your engine can power you to 500 km/h if your chassis can only withstand 100 km/h. You're still going to be limited to 100 km/h.

If a game is CPU bound, then we shouldn't expect much difference between the two consoles, as they use essentially the same CPU.
If, on the other hand, the game is bandwidth or GPU bound, and we take the relatively safe assumption that the PS4 outperforms the Xbox One in those two areas in an absolute sense, it's a safe bet we'll notice it.
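
To put toy numbers on that (every figure below is invented purely for illustration, not a real spec): the pipeline as a whole only runs as fast as its slowest stage.

```python
# Toy bottleneck model: the frame rate you actually get is set by the
# slowest stage, no matter how much headroom the other stages have.
# All numbers below are invented for illustration only.

def achievable_fps(stage_capacity_fps):
    """Each value is how many frames per second a stage could sustain
    on its own; the pipeline as a whole is limited by the minimum."""
    return min(stage_capacity_fps.values())

system_a = {"cpu": 45, "alu": 90, "bandwidth": 60, "rops": 120}
system_b = {"cpu": 45, "alu": 60, "bandwidth": 50, "rops": 80}

print(achievable_fps(system_a))  # 45 -> CPU bound, the extra ALU sits idle
print(achievable_fps(system_b))  # 45 -> same result despite a weaker GPU
```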
 
I don't understand why you think that Sony carelessly designed an unbalanced system. I am convinced they did their homework and knew what they wanted to achieve with the PS4.
Of course, there are scenarios where the CPU will be the limiting factor, but as I understand it, that can be compensated for with compute.
As to the claim that the PS4 should still outperform the XBone: it is not a matter of "should" - it will. The processing power is there, the ROPs are there, the texture units are there and the bandwidth is there.
And that, in my opinion, makes the DF comparison's conclusion incorrect, because they ignored most of the relevant factors.
 
The article was intended to illustrate the 14+4 thing / diminishing returns from additional ALU resources as much as the real-world difference between the two consoles.

Besides the question mark over the performance of the XB1's memory setup (which Richard mentioned), I don't see how their comparison is 'incorrect'; they showed that having 50% more GPU resources on paper gives you far less of a real-world performance advantage.

ERP, for one, said the exact same thing back in February (he also estimated the real-world performance advantage would be closer to 20-30%, which Richard's findings seem to indicate).
 
Aren't 16 ROPs adequate for 1080p?
There is no direct correlation between ROPs and resolution. More ROPs only help shaders that are ROP bound. Only simple shaders are ROP bound. Most shaders are either ALU or TEX bound (depending on whether the shader has more math or more texture accesses). A shader can also be bandwidth bound (if the render target bit depth is high and/or the shader is sampling lots of high bit depth textures).

Common examples of ROP bound shaders: shadow map rendering (depth only, or a simple linear / exponential depth write from the pixel shader), simple blit/fill shaders (copy / move / UI rendering), and particle rendering (usually a simple enough shader to be ROP bound, though not always). In some cases the deferred rendering pass can also be ROP bound (as the fill rate is divided by the number of MRTs); however, the more MRTs you have, the more textures you usually sample in the deferred pass, and thus the more likely it is TEX bound.

The most important thing to understand is: if your pixel shader is not ROP bound, adding extra ROPs does not increase performance at all. Not a single bit. Increasing resolution doesn't matter, since all the other operations the pixel shader does are multiplied by the same amount. Resolution doesn't change pixel shading bottlenecks (except for making the texture cache access pattern a little more linear, and thus slightly reducing cache stalls).

Of course you might argue that if you are targeting 1080p in the first place, you might want to use simpler shaders that are less ALU/TEX bound, and in that case your shaders are more likely to be ROP bound. This is why extra ROPs often boost the performance of older games (especially when run at higher resolutions, as that minimizes the front-end & CPU effect).
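
A rough sketch of the same idea, with invented per-pixel costs and hardware rates (not real console figures): a shader only speeds up from extra ROPs if ROPs were its limiter in the first place.

```python
# Rough illustration: a shader's throughput is capped by whichever unit it
# saturates first. All per-pixel costs and hardware rates below are invented
# round numbers, not real console specs.

def pixels_per_clock(shader, hw):
    """Return the limiting unit and the pixels/clock it allows."""
    limits = {
        "ALU": hw["alu_ops_per_clk"] / shader["alu_ops"] if shader["alu_ops"] else float("inf"),
        "TEX": hw["tex_per_clk"] / shader["tex_fetches"] if shader["tex_fetches"] else float("inf"),
        "ROP": hw["rops"] / shader["writes"],
        "BW":  hw["bytes_per_clk"] / shader["bytes"],
    }
    unit = min(limits, key=limits.get)
    return unit, limits[unit]

hw_16_rops = {"alu_ops_per_clk": 768, "tex_per_clk": 48, "rops": 16, "bytes_per_clk": 64}
hw_32_rops = dict(hw_16_rops, rops=32)

shadow_map = {"alu_ops": 4,   "tex_fetches": 0, "writes": 1, "bytes": 2}   # trivial shader
lighting   = {"alu_ops": 300, "tex_fetches": 8, "writes": 1, "bytes": 16}  # math heavy

for name, shader in [("shadow", shadow_map), ("lighting", lighting)]:
    print(name, pixels_per_clock(shader, hw_16_rops), pixels_per_clock(shader, hw_32_rops))
# shadow:   ROP bound, 16 -> 32 pixels/clk (extra ROPs help)
# lighting: ALU bound, ~2.6 pixels/clk either way (extra ROPs change nothing)
```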
 
I don't understand why you think that Sony carelessly designed an unbalanced system. I am convinced they did their homework and knew what they wanted to achieve with the PS4.
Of course, there are scenarios where the CPU will be the limiting factor, but as I understand it, that can be compensated for with compute.
As to the claim that the PS4 should still outperform the XBone: it is not a matter of "should" - it will. The processing power is there, the ROPs are there, the texture units are there and the bandwidth is there.
And that, in my opinion, makes the DF comparison's conclusion incorrect, because they ignored most of the relevant factors.

"Should" is the fairer word to use when we still know so little about the systems, especially the Xbox One, since Microsoft is not so open to talking tech.

And in the end, whether the "should" turns into "will" is up to the developers.

As for who designed the better-balanced system, who cares? Price and thermals should then also be accounted for.

My personal opinion is that not only is the PS4 looking more powerful, it also looks simpler, and it seems easier to scale down in price.
 
But Digital Foundry drew their conclusions based on compute alone. They completely ignored the ROP advantage (which may or may not be relevant depending on the scenario) and the texture unit advantage.


All in all, I think this kind of comparison is simply premature - we know the figures, but we still don't know which system will be, for example, more bandwidth limited in real-world situations. There is no simple analogy here.
 
they showed that having 50% more GPU resources on paper gives you far less of a real-world performance advantage.

I think this is a bit misleading. 50% more GPU resources on paper will give you 50% more real-world performance, provided you are talking about ALL GPU resources being 50% greater and the game is not at all CPU/system bound.

What the article actually demonstrated is that if you increase compute and texture resources by 50% without correspondingly increasing memory bandwidth, ROP throughput, setup rate, and to a lesser extent CPU speed, then you will not see a 50% real-world performance increase. Common sense should have told us that already, but at least the article proves it and to some degree quantifies it - at least in the games tested, which, as we all know, won't be a particularly useful proxy for console software.

To give an example, take a 7970 with 32 CUs - well in excess of the apparent 12-CU sweet spot of the XB1. Now add a second 7970 to the same system, thus doubling every aspect of the GPU. Provided you have a sufficiently powerful CPU and your game is GPU bound, you will often see greater than 80% real-world performance increases. The fact that it doesn't go all the way to 100% can mostly be accounted for by the software overhead of Crossfire and CPU bottlenecks - even with a very powerful CPU in a mostly GPU-bound game.
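
As an Amdahl-style sketch of that argument (the frame-time breakdown below is hypothetical, chosen only to show the shape of the result): scaling only the ALU/TEX-bound parts of the frame by 1.5x buys noticeably less than 1.5x overall.

```python
# Amdahl-style sketch: assume (purely for illustration) that a frame spends
# given fractions of its time limited by different units. Speeding up a unit
# only shrinks the portions that unit dominates.

def speedup(time_fractions, scale_factors):
    """time_fractions: share of the baseline frame bound by each unit.
    scale_factors: how much faster each unit is in the scaled design."""
    new_time = sum(frac / scale_factors.get(unit, 1.0)
                   for unit, frac in time_fractions.items())
    return 1.0 / new_time

# Hypothetical frame breakdown (fractions sum to 1.0).
frame = {"alu": 0.45, "tex": 0.15, "bandwidth": 0.25, "rop_setup": 0.10, "cpu": 0.05}

# 50% more CUs scales ALU and TEX, but not bandwidth, ROPs/setup or CPU.
print(speedup(frame, {"alu": 1.5, "tex": 1.5}))   # ~1.25x, not 1.5x
# Scaling every unit by 50% recovers the full on-paper advantage.
print(speedup(frame, {u: 1.5 for u in frame}))    # ~1.5x
```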
 
But Digital Foundry drew their conclusions based on compute alone. They completely ignored the ROP advantage (which may or may not be relevant depending on the scenario) and the texture unit advantage.

They didn't ignore the texture advantage. Texture units scale with CUs in GCN, so the test was representative of the difference there. ROPs and setup rate were definitely not accounted for, though, and the big one - memory performance - is still largely unknown, so the XB1 was given the benefit of the doubt there, but the article did specifically point that out.
 
So a system with a HD7850 paired with a Core i3 would be unbalanced.

To be pedantic, an i3 would actually be nearly a perfect match for a 7850. It's basically a solid but low-end CPU with a solid but low-end GPU. But of course we understand your point.


What's interesting is how similar the 1080p vs 1000p screens of Crysis 3 look. I bet that if you did a double-blind test from normal viewing distances and asked people to choose which was higher res, they wouldn't be able to tell you.

Yes, I immediately noticed that as well! At 16X9 I thought I could tell some difference, but surprisingly little, and at 17X10 virtually none.

That was surprising, because I was against 16X9 before; I would rather have suffered a deficit at full 1080p. But after seeing that comparison, I'm not so sure.

It may be that at these higher resolutions to begin with, it will be more difficult to tell the difference.

At first glance that would bode well for the X1, since for a long time a common thought has been "well, the X1 version may just drop to 16X9 with all effects". The "issue", of course, is that there's nothing stopping Sony/others from also dropping to 16X9 on the PS4, and thus regaining the effects edge.


The Move Engines are simply DMA backed by compression hardware. DMA is present in all modern video cards and nearly every modern real-time computing device. It is strange that Microsoft decided to rename this, but not that strange (they did it with everything else too), so in fact the PS4 does contain most of the Move Engines, and I also wouldn't be surprised if they worked at the same speed too.

You continually state this as fact; where is the proof that they are the same DMA units (+compression) as in AMD GCN parts?
 
You continually state this as fact; where is the proof that they are the same DMA units (+compression) as in AMD GCN parts?

Because they move data from point A to point B without the assistance of the main CPU. This is pretty much the definition of a DMA controller, and we know that GCN contains two DMA controllers, so it would make sense to base them off those.

From wiki.

To carry out an input, output or memory-to-memory operation, the host processor initializes the DMA controller with a count of the number of words to transfer, and the memory address to use. The CPU then sends commands to a peripheral device to initiate transfer of data. The DMA controller then provides addresses and read/write control lines to the system memory. Each time a word of data is ready to be transferred between the peripheral device and memory, the DMA controller increments its internal address register until the full block of data is transferred.

http://en.wikipedia.org/wiki/Direct_memory_access
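
A toy model of the sequence that paragraph describes (not any real hardware, just the idea that the CPU only programs the controller and the transfer then proceeds without it):

```python
# Toy model of the Wikipedia description: the CPU programs the controller
# with an address and a word count, then the transfer itself proceeds word
# by word without further CPU involvement. Purely illustrative.

class ToyDMAController:
    def __init__(self, memory):
        self.memory = memory            # shared "system memory" (a list of words)

    def program(self, dest_address, word_count):
        # Done by the host CPU: hand over a start address and a count.
        self.address = dest_address
        self.remaining = word_count

    def on_word_ready(self, word):
        # Driven by the peripheral, not the CPU: write the word and advance
        # the internal address register until the full block is transferred.
        if self.remaining > 0:
            self.memory[self.address] = word
            self.address += 1
            self.remaining -= 1

memory = [0] * 8
dma = ToyDMAController(memory)
dma.program(dest_address=2, word_count=3)
for word in (0xAA, 0xBB, 0xCC):         # peripheral delivers data over time
    dma.on_word_ready(word)
print(memory)                           # [0, 0, 170, 187, 204, 0, 0, 0]
```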
 
Because they move data from point A to point B without the assistance of the main CPU. This is pretty much the definition of a DMA controller, and we know that GCN contains two DMA controllers, so it would make sense to base them off those.

From wiki.



http://en.wikipedia.org/wiki/Direct_memory_access

The ones on the X1 sit in a different position from the ones generally found in Radeon products, so at the very least they serve a different purpose. ERP commented on it a while back.
 
The Move Engines are simply DMA backed by compression hardware. DMA is present in all modern video cards and nearly every modern real-time computing device. It is strange that Microsoft decided to rename this, but not that strange (they did it with everything else too), so in fact the PS4 does contain most of the Move Engines, and I also wouldn't be surprised if they worked at the same speed too.

This is not correct at all, at least based on the info at our disposal.

The PS4 could have the "standard" GCN DMAs: limited to two in number, and able to saturate up to 16 GB/s.

The X1 has four DMEs (more flexibility), located in a completely different position than in standard GCN, and they can saturate up to 25.6 GB/s.
The differences don't end there:
Two Move Engines consist of DMA + tile/untile.
A third one has DMA + tile/untile + LZ decode + JPEG decode.
The fourth one has DMA + tile/untile + LZ encode.

DF also pointed out that the last two Move Engines could be a real hint of the X1's cloud predisposition.
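
For what it's worth, the 25.6 GB/s figure lines up with 32 bytes per cycle at the commonly assumed 800 MHz GPU clock (that clock is my assumption; this is just a sanity check of the numbers quoted above):

```python
# Sanity check of the quoted bandwidth figure, assuming an 800 MHz GPU clock.
# 25.6 GB/s works out to 32 bytes (256 bits) moved per GPU clock.

gpu_clock_hz = 800e6       # assumed clock
bytes_per_clock = 32       # 256-bit path per cycle

peak_bandwidth_gbs = gpu_clock_hz * bytes_per_clock / 1e9
print(peak_bandwidth_gbs)  # 25.6
```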
 
The Move Engines are simply DMA backed by compression hardware. DMA is present in all modern video cards and nearly every modern real-time computing device. It is strange that Microsoft decided to rename this, but not that strange (they did it with everything else too), so in fact the PS4 does contain most of the Move Engines, and I also wouldn't be surprised if they worked at the same speed too.

Have we got any more detailed info on these? I remember seeing a patent a while back - I'm not even sure it was really about the same DMEs Durango has - which stated that while they would serve as DMA, they could also perform a few operations that regular DMAs don't. I think I remember a few of those outlined in the patent:

- They could bypass memory and move data directly from one device to another (a device being the CPU, GPU, HDD, Kinect, network, etc.)
- They could bypass CPU caches.
- They can be commanded directly by the CPU writing into their registers (which would be the DMA functionality), but they can also be set to read instructions from memory independently of the CPU.
- The executing software can raise some sort of flags that give them hints about what kind of operations are going to be done, so they have a chance to move the data before it is needed.
- They can direct the CPU to perform operations on data in order to create a new stream. For example, say you need a depth-map stream with the background removed: the DME reads the depth stream from Kinect, asks the CPU to perform background removal on the stream it is receiving, and uses the CPU's result to create a new stream that is the depth map with the background removed. (That's not an actual example from the patent.)

In short, they seem to be there to move data around, but they also seem essential for the ESRAM to actually be useful: splitting data between ESRAM and main RAM so the system can read and write from both pools at the same time.
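
As a toy illustration of that last point (using the commonly quoted 68 GB/s DDR3 and 102.4 GB/s ESRAM figures, purely for illustration): reading both pools at once only approaches their combined bandwidth if the data is split roughly in proportion to each pool's bandwidth.

```python
# Toy model: both pools are read in parallel, so whichever pool finishes
# last determines the effective bandwidth. Figures are the commonly quoted
# DDR3/ESRAM numbers, used only for illustration.

def effective_bandwidth(fraction_in_esram, esram_gbs=102.4, ddr3_gbs=68.0):
    """Effective GB/s when a given fraction of the working set lives in ESRAM."""
    time_esram = fraction_in_esram / esram_gbs
    time_ddr3 = (1.0 - fraction_in_esram) / ddr3_gbs
    return 1.0 / max(time_esram, time_ddr3)

print(effective_bandwidth(0.0))   # 68.0  -> everything in DDR3
print(effective_bandwidth(1.0))   # 102.4 -> everything in ESRAM
print(effective_bandwidth(0.6))   # ~170  -> a proportional split approaches the sum
```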
 
This is not correct at all, at least based on the info at our disposal.

The PS4 could have the "standard" GCN DMAs: limited to two in number, and able to saturate up to 16 GB/s.

The X1 has four DMEs (more flexibility), located in a completely different position than in standard GCN, and they can saturate up to 25.6 GB/s.
The differences don't end there:
Two Move Engines consist of DMA + tile/untile.
A third one has DMA + tile/untile + LZ decode + JPEG decode.
The fourth one has DMA + tile/untile + LZ encode.

DF also pointed out that the last two Move Engines could be a real hint of the X1's cloud predisposition.

Tile/untile is standard in all DMA and memory hardware, at least on graphics cards; that's what leads me to think that the two without extras are at least derived from the GCN DMA. The last two don't really lend themselves to cloud computing at all, IMO - LZ is obvious (a popular game data format) and JPEG comes cheaply once you have LZ.
 
To give an example, take a 7970 with 32 CUs - well in excess of the apparent 12-CU sweet spot of the XB1. Now add a second 7970 to the same system, thus doubling every aspect of the GPU. Provided you have a sufficiently powerful CPU and your game is GPU bound, you will often see greater than 80% real-world performance increases. The fact that it doesn't go all the way to 100% can mostly be accounted for by the software overhead of Crossfire and CPU bottlenecks - even with a very powerful CPU in a mostly GPU-bound game.

If you're talking about PC card benchmarks, you need to keep in mind that the gain as you add CUs usually shows up at higher resolutions. When PC enthusiast sites test Crossfire and high-end video cards, they include resolutions upwards of 2560x1440 and/or show frame rates as high as they'll go (60, 80, 100, etc.). In the PC space this matters more, as resolutions are custom and the reader is interested in the cards' relative performance and in the cards as an investment.

When talking about the PS4/XBO, it's important to remember that frame rate and resolution have a ceiling of 1080p/60; anything more than that won't be visible to the user. While sebbbi pointed out the disconnect between ROPs and resolution, at some point these added resources (CUs or otherwise) are also wasted above 1080p/60.
 
Most of the benchmarks run on PC (in-game or not) are at 1080p, and there is still a large difference between a Radeon HD 7770 and an HD 7850.
 
If you're talking about PC card benchmarks, you need to keep in mind that the gain as you add CUs usually shows up at higher resolutions. When PC enthusiast sites test Crossfire and high-end video cards, they include resolutions upwards of 2560x1440 and/or show frame rates as high as they'll go (60, 80, 100, etc.). In the PC space this matters more, as resolutions are custom and the reader is interested in the cards' relative performance and in the cards as an investment.

When talking about the PS4/XBO, it's important to remember that frame rate and resolution have a ceiling of 1080p/60; anything more than that won't be visible to the user. While sebbbi pointed out the disconnect between ROPs and resolution, at some point these added resources (CUs or otherwise) are also wasted above 1080p/60.

Also, for all practical purposes, with the average TV set size in the US being around 40-42" and people sitting 8-9 feet away, you will not be able to tell the difference between 720p and 1080p in motion. You need to get up to 60" for it to make a difference (and that's if you're looking for it).

The fact of the matter is that resolution will be sacrificed for frame rate and effects.
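
A quick back-of-envelope check of that viewing-distance claim, using the common rule of thumb that 20/20 vision resolves roughly one arcminute per pixel (about 60 pixels per degree); screen size and distance follow the post above:

```python
# Back-of-envelope check: how many pixels per degree of visual angle do 720p
# and 1080p give on a 42" 16:9 set viewed from about 8.5 feet? The ~60 px/deg
# acuity figure is the usual 20/20 rule of thumb.

import math

def pixels_per_degree(diag_inches, horiz_pixels, distance_feet, aspect=16/9):
    width = diag_inches * aspect / math.hypot(aspect, 1)      # screen width, inches
    distance = distance_feet * 12                             # viewing distance, inches
    fov = 2 * math.degrees(math.atan(width / 2 / distance))   # horizontal field of view
    return horiz_pixels / fov

for pixels in (1280, 1920):
    print(pixels, round(pixels_per_degree(42, pixels, 8.5), 1))
# 720p lands near the ~60 px/deg acuity threshold at this size and distance,
# 1080p well above it, so the extra resolution is hard to see from the couch.
```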
 
If you're talking about PC card benchmarks, you need to keep in mind that the gain as you add CUs usually shows up at higher resolutions. When PC enthusiast sites test Crossfire and high-end video cards, they include resolutions upwards of 2560x1440 and/or show frame rates as high as they'll go (60, 80, 100, etc.). In the PC space this matters more, as resolutions are custom and the reader is interested in the cards' relative performance and in the cards as an investment.

When talking about the PS4/XBO, it's important to remember that frame rate and resolution have a ceiling of 1080p/60; anything more than that won't be visible to the user. While sebbbi pointed out the disconnect between ROPs and resolution, at some point these added resources (CUs or otherwise) are also wasted above 1080p/60.

The only reason such high resolutions are required on the PC to notice large performance gains from dual-GPU setups is that the graphics engines aren't particularly GPU heavy (by high-end standards). They therefore become CPU bound very quickly with multiple high-end GPUs, and you need to dial up the resolution to bring the bottleneck back to the GPU again.

For example, I'll bet that if you take a next-gen game like Battlefield 4 and run it at 1080p on a 7770 (10 CUs), and then again on two 7770s in Crossfire (20 CUs) with an extremely powerful CPU (let's say a 4 GHz Haswell-E for the fun of it), you'll see very good scaling - assuming, of course, the game properly supports dual GPUs, which many don't.
 
I have a question. I get the many comparisons of the Xbox One to a 7770 or 7790 because of the number of CUs, but wouldn't it make more sense to compare it to a 7870/7970 because of the Xbox One's bandwidth? With the Xbox One having 264 GB/s of bandwidth (the ESRAM's 196 GB/s + DDR3's 68 GB/s), it seems like it would make more sense to compare it to a higher-end card. And then there may be a clock increase, right? So wouldn't that put the ESRAM even higher?
 