SUBSTANCE ENGINE

So the Substance Engine is scaling across an additional core, *if* the PS4 CPU really is locked at 1.6GHz, which would explain the 14MB/s figure. You can't arrive at that number unless an additional core is in use....

So we are "hypothetically" looking at something like this....


It could be that PS4 games are just getting more out of the 1.6GHz cores than Xbox One is out of the 1.75GHz cores.
 
My funky maths:

XB1 = 12 MB/s ÷ 6 cores ÷ 1750 MHz = 0.001143 MB/s per core per MHz

PS4 = 14 MB/s ÷ 7 cores ÷ 1600 MHz = 0.00125 MB/s per core per MHz

That means PS4 is achieving more per clock than XB1. Maybe that's due to compilers? :???: In this example, per-core throughput on PS4 (2 MB/s at 1.6GHz) matches that of the higher-clocked XB1 (2 MB/s at 1.75GHz).
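A quick sketch of that normalisation, for anyone who wants to play with the numbers (the 12/14 MB/s figures are read off the marketing graph, and the 6 vs 7 usable cores are assumptions, not confirmed specs):

```python
# Rough sketch of the per-core, per-MHz normalisation above.
# The 12 and 14 MB/s figures are read off the marketing bar graph;
# the 6 vs 7 usable cores are assumptions, not confirmed specs.

def per_core_per_mhz(throughput_mb_s, cores, clock_ghz):
    """Throughput normalised to a single core at 1 MHz."""
    return throughput_mb_s / cores / (clock_ghz * 1000.0)

xb1 = per_core_per_mhz(12.0, 6, 1.75)   # ~0.001143 MB/s per core per MHz
ps4 = per_core_per_mhz(14.0, 7, 1.60)   # ~0.00125  MB/s per core per MHz

print(f"XB1: {xb1:.6f}  PS4: {ps4:.6f}  PS4/XB1 per-clock ratio: {ps4 / xb1:.3f}")
```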

Yes I see, good one. :smile:

It could be that PS4 games are just getting more out of the 1.6GHz cores than Xbox One is out of the 1.75GHz cores.

If everything is equal on the CPU side between the two (PS4/XB1) other than clock speeds... and this being strictly CPU-based performance... then something within the XB1 CPU is hindering it, even with the higher clock. Honestly, I don't think anything is hindering the XB1 CPU; for God's sake, it's an x86-64 product with very good compilers. The only explanation that washes at the "moment" is that PS4 is getting more out of its CPU by having an additional core available for developers.
 
Could be different code generated by the compiler.

Could be a timing issue with the memory controller.

Could be overhead introduced by the hypervisor.

It's just a single data point. Without some understanding of why it's the case, which would probably require running a number of tests, it just is what it is.
 
From the video linked in the Orbis tech thread for their texture tech, they mention that using multiple cores scales well, but not linearly. That was from last year though. Still, if the performance increase is not linear, it won't be simple division maths if we're talking about just more cores, as opposed to clock rates. But like ERP says, it's probably a combination of multiple factors.
 
I assume the 6/7 thread discrepancy is down to the OS being able to take away a core at will, with the game having to yield it. The same was rumored about the RAM reservation on PS4. I mean, it would make sense for the OS not to permanently reserve 2 cores, as long as it's only using a fraction of them at any given time during a game, but instead to be able to take a core back when it needs one, e.g. drawing the home screen... if the API is sound... why not?
 
I assume the 6/7 thread discrepancy is down to the OS being able to take away a core at will, with the game having to yield it. The same was rumored about the RAM reservation on PS4. I mean, it would make sense for the OS not to permanently reserve 2 cores, as long as it's only using a fraction of them at any given time during a game, but instead to be able to take a core back when it needs one, e.g. drawing the home screen... if the API is sound... why not?

Context Switching?
 
I assume the 6/7 thread discrepancy is down to the OS being able to take away a core at will, with the game having to yield it. The same was rumored about the RAM reservation on PS4. I mean, it would make sense for the OS not to permanently reserve 2 cores, as long as it's only using a fraction of them at any given time during a game, but instead to be able to take a core back when it needs one, e.g. drawing the home screen... if the API is sound... why not?
That really comes down to how much multitasking you're doing. XB1 can run other apps snapped to the side of the screen. AFAIK PS4 doesn't do anything like that other than tasks in the background, so the need for CPU reservation should be less. Once you press a button to interrupt the game, like switching to the home screen, it doesn't really matter how many cores the OS uses because the game will be on hold. So the only consideration of importance is how much CPU time a running game has. When the game is on hold, the OS can nick 7 cores without it being an issue. Unless there's a fancy background mode where the game can continue to work on stuff while the OS is up, e.g. bring up the home screen and the game in the background gets to work building procedural assets so it won't have to during gameplay, for a little extra fluidity. But that'd be weird. ;)
 
Could be different code generated by the compiler.

Could be a timing issue with the memory controller.

Could be overhead introduced by the hypervisor.

It's just a single data point. Without some understanding of why it's the case, which would probably require running a number of tests, it just is what it is.

Or it could simply be that it was run on an X1 devkit that wasn't upclocked.

As the guy said, this wasn't run for a CPU analysis.
 
I'm really surprised by "the number of available cores" explanation and related math.

In reality most practical software involving memory and synchronization does not scale linearly with the number of cores. In fact the Substance video clearly states so (1 core -> 2 cores ~90%, 2 -> 4 ~70%, thus 1 -> 4 roughly a 200% speed-up).

With those numbers, I'd expect a 5-9% speed-up from the 7th core instead of the 16.7% (1/6) that's being implicitly assumed here.

Whatever the case, it's a pretty sure bet it's going to be much smaller than 16.7%.
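For what it's worth, here's a back-of-the-envelope sketch of that extrapolation. The 1->2 (~90%) and 2->4 (~70%) efficiencies are the rough figures quoted from the video; fitting Amdahl's law to them is my own assumption, and that particular model puts the gain from a 7th core at roughly 10%, still well below the linear 16.7%:

```python
# Back-of-the-envelope: how much can a 7th core add if scaling is sub-linear?
# The 1->2 (~90%) and 2->4 (~70%) efficiencies are the rough figures quoted
# from the Substance video; fitting Amdahl's law to them is my own assumption.

def amdahl_speedup(n_cores, parallel_fraction):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

# ~1.9x for two cores, then a further ~1.7x going to four => ~3.2x at 4 cores.
s4 = 1.9 * 1.7
# Invert Amdahl's law at n = 4 to recover the implied parallel fraction.
p = (1.0 - 1.0 / s4) / (1.0 - 1.0 / 4.0)

s6 = amdahl_speedup(6, p)
s7 = amdahl_speedup(7, p)
print(f"implied parallel fraction: ~{p:.2f}")
print(f"6 -> 7 cores adds ~{100.0 * (s7 / s6 - 1.0):.1f}% (vs 16.7% if scaling were linear)")
```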
 
It's the most logical explanation given what we understand of the hardware differences, which are basically that they are exactly the same CPU only with more cores available for PS4 and a higher clock for XB1.

There's also no reason for a non-linear scale up in this case, IMO. Because if the second core nets you 90% of the performance of the first core running the job split across both, just run the code in isolation on the second core to create a different material and you get exactly 2x the performance by running two jobs. It'd mean time to create a single material won't decrease across cores, but time to create several materials will be faster than if you run each material in parallel across cores. In such cases, the very peak throughput possible will be performance per core (2 MB/s) x number of cores (6/7). In the case of procedural content creation, creating multiple assets simultaneously seems viable to me.
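To put some toy numbers on that throughput argument: the 2 MB/s per-core rate is taken from the maths above, while the 90% per-extra-core efficiency for a split job is just an illustrative assumption, not a measurement.

```python
# Toy comparison: splitting each material across all cores (sub-linear scaling)
# vs. running one independent material job per core (linear aggregate throughput).
# The 2 MB/s per-core rate comes from the maths above; the 90% per-extra-core
# efficiency for a split job is an illustrative assumption, not a measurement.

PER_CORE_MB_S = 2.0
CORES = 6

def split_job_rate(cores, efficiency=0.9):
    """Effective MB/s when a single material is split across all cores."""
    return PER_CORE_MB_S * (1.0 + efficiency * (cores - 1))

split_rate = split_job_rate(CORES)          # one material at a time: ~11 MB/s
independent_rate = PER_CORE_MB_S * CORES    # one material per core:   12 MB/s

print(f"split across {CORES} cores: {split_rate:.1f} MB/s aggregate")
print(f"one job per core:       {independent_rate:.1f} MB/s aggregate")
```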
 
I noticed that if the test is processed linearly by the cores and the numbers are a bit rounded up/down then it is possible to have those results (14/12) if:

- PS4 has 7.5 cores available at 1.6GHz
- X1 has 6 cores at 1.75GHz

It wasn't possible with only 7 cores even with the numbers being rounded up/down, but with just half of a core more it became perfectly possible.
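A quick brute-force check of that rounding argument, under the same "both CPUs deliver identical throughput per core per GHz" premise (the swept rate k is the unknown):

```python
# Brute-force check: assuming both CPUs deliver the same throughput per core
# per GHz (k, in MB/s), which PS4 core counts can round to the reported 14/12?
# The shared per-core-per-GHz rate is the assumption under test.

for ps4_cores in (7.0, 7.5):
    matches = []
    for i in range(1000, 1300):          # sweep k from 1.000 to 1.299 MB/s
        k = i / 1000.0
        xb1 = k * 6 * 1.75               # 6 cores at 1.75 GHz
        ps4 = k * ps4_cores * 1.6        # PS4 cores at 1.6 GHz
        if round(xb1) == 12 and round(ps4) == 14:
            matches.append(k)
    if matches:
        print(f"{ps4_cores} cores: works for k in [{matches[0]:.3f}, {matches[-1]:.3f}]")
    else:
        print(f"{ps4_cores} cores: no k rounds to 14/12")
```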
 
It's the most logical explanation given what we understand of the hardware differences, which are basically that they are exactly the same CPU only with more cores available for PS4 and a higher clock for XB1.

There's also no reason for a non-linear scale up in this case, IMO. Because if the second core nets you 90% of the performance of the first core running the job split across both, just run the code in isolation on the second core to create a different material and you get exactly 2x the performance by running two jobs. It'd mean time to create a single material won't decrease across cores, but time to create several materials will be faster than if you run each material in parallel across cores. In such cases, the very peak throughput possible will be performance per core (2 MB/s) x number of cores (6/7). In the case of procedural content creation, creating multiple assets simultaneously seems viable to me.

I noticed that if the test is processed linearly by the cores and the numbers are a bit rounded up/down then it is possible to have those results (14/12) if:

- PS4 has 7.5 cores available at 1.6GHz
- X1 has 6 cores at 1.75GHz

It wasn't possible with only 7 cores even with the numbers being rounded up/down, but with just half of a core more it became perfectly possible.


It could be that the Xbox One CPU just isn't performing as well as it should at certain tasks vs the PS4 CPU because of the OS, or maybe the PS4 CPU has some help from a co-processor or accelerator.

PS4 still has that chip connected to the main APU that no one has come up with an answer for yet.
 
The rough approximation of the marketing bar graph is one source of error; the uncertain wording around the CPU, in terms of single- or multi-core, is another; the unknown system environment, unknown code paths, unknown tool sets, and ill-defined clock speeds are others.

To go even further and inject a coprocessor would render the comparison utterly pointless. We couldn't even be sure that the test was fair, because it might have ignored an Xbox One magical coprocessor that can detect previously nonexistent procedural generation software engines.

Where is the benefit of Sony adding a whole chip and board space to provide a little over 15% in a random software engine, and how would this chip even know to inject itself into a CPU test?
 
The rough approximation of the marketing bar graph is one source of error; the uncertain wording around the CPU, in terms of single- or multi-core, is another; the unknown system environment, unknown code paths, unknown tool sets, and ill-defined clock speeds are others.

To go even further and inject a coprocessor would render the comparison utterly pointless. We couldn't even be sure that the test was fair, because it might have ignored an Xbox One magical coprocessor that can detect previously nonexistent procedural generation software engines.

Where is the benefit of Sony adding a whole chip and board space to provide a little over 15% in a random software engine, and how would this chip even know to inject itself into a CPU test?

That's not a little over 15%; it's closer to a 30% boost relative to what we thought the PS4 CPU performance would be compared to the Xbox One CPU (1.6GHz vs 1.75GHz).


Why have a co-processor? Maybe because it could perform some tasks better than the CPU using less power & creating less heat.
 
That's not a little over 15%; it's closer to a 30% boost relative to what we thought the PS4 CPU performance would be compared to the Xbox One CPU (1.6GHz vs 1.75GHz).
What everyone with scant knowledge of the details pretends to know doesn't have much bearing here, and one can magic up tens of percent of error with each of the items I listed. There is way too much that isn't disclosed or controlled for.


Why have a co-processor? Maybe because it could perform some tasks better than the CPU using less power & creating less heat.
A coprocessor that spins off a few tens of percent might be nice to have on-die; however, this external chip is not on-die, which negates at least some of the power savings and severely constrains its ability to participate in computation alongside a CPU in what is supposed to be a CPU measurement.

I didn't see an answer as to how this chip is supposed to ninja itself into the code flow. How does it identify what arbitrary code is running that it can offload?
If it's not able to auto-detect, the comparison is very dishonest.
 
What everyone with scant knowledge of the details pretends to know doesn't have much bearing here, and one can magic up tens of percent of error with each of the items I listed. There is way too much that isn't disclosed or controlled for.



A coprocessor that spins off a few tens of percent might be nice to have on-die; however, this external chip is not on-die, which negates at least some of the power savings and severely constrains its ability to participate in computation alongside a CPU in what is supposed to be a CPU measurement.

I didn't see an answer as to how this chip is supposed to ninja itself into the code flow. How does it identify what arbitrary code is running that it can offload?
If it's not able to auto-detect, the comparison is very dishonest.

1.6GHz is the number that we have heard over & over, and it's even been said in the PS4 audio presentation, so until we see different we can only assume that it's 1.6GHz. Using the 1.6GHz clock rate on its own would have the Xbox One CPU ~10% faster, but the benchmark is showing the PS4 with a ~17% advantage; that's about a 30% boost over what we expected.

For now we don't have a real clue where the 30% boost is coming from, or even if there is a boost at all, because it could just be a fault in the Xbox One design.
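For reference, the arithmetic behind that ~30% figure (using the same 1.6/1.75GHz clocks and the 14/12 MB/s graph readings assumed throughout the thread) lands in the high twenties:

```python
# Where the ~30% figure comes from: combine the expected XB1 clock advantage
# with the PS4 advantage actually shown on the graph. The 1.6/1.75 GHz clocks
# and the 14/12 MB/s readings are the same assumptions used throughout the thread.

expected_xb1_lead = 1.75 / 1.6      # ~1.09: XB1 expected ~9-10% faster per core
observed_ps4_lead = 14.0 / 12.0     # ~1.17: PS4 ~17% faster in the test

swing = expected_xb1_lead * observed_ps4_lead   # ~1.28
print(f"expected XB1 lead: {expected_xb1_lead:.3f}")
print(f"observed PS4 lead: {observed_ps4_lead:.3f}")
print(f"total swing vs expectation: ~{100.0 * (swing - 1.0):.0f}%")
```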
 
1.6GHz is the number that we have heard over & over, and it's even been said in the PS4 audio presentation, so until we see different we can only assume that it's 1.6GHz. Using the 1.6GHz clock rate on its own would have the Xbox One CPU ~10% faster, but the benchmark is showing the PS4 with a ~17% advantage; that's about a 30% boost over what we expected.

For now we don't have a real clue where the 30% boost is coming from, or even if there is a boost at all, because it could just be a fault in the Xbox One design.

There is a range of probabilities along which various options can be sorted.
The absence of enough evidence to make definitive statements on even the most mainstream or easily verifiable options doesn't make the burden of proof lower for methods that require additional complexity, or that already require a level of justification beyond the ordinary.
Differences in the environment or tools, or even a moderate change in the clock multipliers in late-stage system tweaks are reasonably straightforward and don't require heroics.

A coprocessor that can automatically inject itself into what appears to be a CPU measurement would fall into the out-of-the-ordinary category.
For one thing, it negates the point of the comparison, since it isn't measuring a CPU.
Another is that a coprocessor capable of detecting arbitrary code and offloading it is a very, very significant advance.
If the engine is instead explicitly targeting the coprocessor, it destroys the integrity of the measurement.

However, adding a discrete and exotic component to a board like this generally does not start to justify itself until it can yield performance improvement on the order of many hundreds or a thousand percent. A few tens of percent might make more sense if it were on-die, but the overhead of off-die synchronization and control is going to work against it in the general case.
 
It's the most logical explanation given what we understand of the hardware differences, which are basically that they are exactly the same CPU only with more cores available for PS4 and a higher clock for XB1.
It may come down to memory management if the clocks are close enough.
There's also no reason for a non-linear scale up in this case, IMO. Because if the second core nets you 90% of the performance of the first core running the job split across both, just run the code in isolation on the second core to create a different material and you get exactly 2x the performance by running two jobs. It'd mean time to create a single material won't decrease across cores, but time to create several materials will be faster than if you run each material in parallel across cores. In such cases, the very peak throughput possible will be performance per core (2 MB/s) x number of cores (6/7). In the case of procedural content creation, creating multiple assets simultaneously seems viable to me.
Even if the cores have little data to share, they end up sharing the memory bus, which is more costly than computation on current manufacturing processes. Plus, I see little reason not to advertise close-to-linear or strong scaling if isolated texture generation is as efficient as you assume, as opposed to what's being said in the video.

In fact, considering almost all scenes have multiple materials and textures to generate, it would be weird to parallelize within a single material. Job-based scheduling/parallelization makes more sense if memory access is not a bottleneck.

Why have a co-processor? Maybe because it could perform some tasks better than the CPU using less power & creating less heat.
Co-processor meaning the ARM chip?

I think low power (especially for standby and possibly for background tasks) and security together make a good reason.
 
PS4 still has that chip connected to the main APU that no one has come up with a answer to what it is yet.
1) There's been some sound speculation (centered on IO). 2) If it was anything of substance, Sony would have talked about it, or it'd appear in the leaks. 3) A little general purpose coprocessor doesn't make sense in context of the system architecture where Sony could instead up the CPU clocks a bit, add cores, add GPU cores, etc. 4) A specialist coprocessor doesn't make sense because there's no knowing what the workloads will be, and Sony's emphasis has been on GPGPU and ease of development.

Why have a co-processor? Maybe because it could perform some tasks better than the CPU using less power & creating less heat.
You'd be looking at something like a computation engine. If Sony wanted to do that, why not put in 4 SPEs on a chip? That'd be a great little chip, and also a PITA for developers who'd have to wrestle with another ISA.

Furthermore, the graph lists CPUs, not anything else. If the engine were more exotic, I expect it'd use GPGPU too. So since it's referencing CPU performance rather than system performance, this coprocessor would have to be a transparent CPU augmentation. And why stick that out in the middle of the board instead of inside the actual SoC?

At this point, there's no substantial evidence that the unknown chip is anything special, and discussion of the systems should assume it's something like an IO controller. Only in a rumour thread or PS4 hardware thread should the topic of the 'third' chip be covered.
 
It's the most logical explanation given what we understand of the hardware differences, which are basically that they are exactly the same CPU only with more cores available for PS4 and a higher clock for XB1.

There's also no reason for a non-linear scale up in this case, IMO. Because if the second core nets you 90% of the performance of the first core running the job split across both, just run the code in isolation on the second core to create a different material and you get exactly 2x the performance by running two jobs. It'd mean time to create a single material won't decrease across cores, but time to create several materials will be faster than if you run each material in parallel across cores. In such cases, the very peak throughput possible will be performance per core (2 MB/s) x number of cores (6/7). In the case of procedural content creation, creating multiple assets simultaneously seems viable to me.

That's not logical to me, as it seems to take a rather huge leap to go from "one CPU" to an explanation that requires exploiting all available cores.

Given that texture generation, manipulation and compression performance is much higher on an i7, is Allegorithmic going to be encouraged to max out performance on the XB1 or PS4? It's a middleware tool for developers, and Substance has a broad client base of big-name publishers. Outside of small indies, most will probably run Substance on hardware with more powerful CPUs.
 