SUBSTANCE ENGINE

A single Haswell core at 3.2GHz is 3-4 times faster than a single Jaguar core at 1.6GHz, full stop. These numbers just don't make any sense for a single core unless the implementation is absurdly inefficient on the i7.
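
As a rough sanity check on that ratio (using commonly quoted IPC figures, not measurements): the clock difference alone is 3.2 GHz / 1.6 GHz = 2x, and Haswell's wider out-of-order core is generally credited with roughly 1.5-2x the IPC of a 2-issue Jaguar, which multiplies out to the 3-4x range.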

Maybe it's the low-voltage flavor of the i7 they used in the benchmark, and not the desktop part.
 
Seems implausible though. A 2GHz announcement would be free PR; they gain nothing from being silent about that investment.
I don't get it either, but Sony states the 800MHz GPU clock everywhere and never the CPU clock, not even in their toolchain slideshow posted in the nextgen compiler thread.

Perhaps Sony's chip has a lower clock but the CPU has been tweaked more than has been published.
 
Is it a real-world usage scenario to dedicate 4-8 cores to one task such as this?

Probably not, but it's still a more representative way to compare performance than using 1/8th of the consoles' CPUs, 1/4 (or maybe 1/2) of the i7, 1/4 of the Tegra 4, 1/2 of the iPhone 5, etc.
 
First of all, generalizing that Jaguar is 2x slower than Ivy or Haswell for all code is not right. There are algorithms where you can't extract much more IPC no matter how many resources you throw at them, and on those the 2-issue Jaguar will be close to wider architectures.
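
As a hypothetical illustration of that point: in a loop-carried dependency chain like the sketch below, every iteration waits on the previous result, so extra issue width buys nothing, and a 2-issue Jaguar should run it at essentially the same IPC as a 4-wide Haswell.

    #include <cstddef>

    // Hypothetical sketch: a loop-carried dependency chain that caps IPC near 1.
    // Each multiply-add needs the previous value of acc, so a 4-issue core
    // cannot issue these any faster than a 2-issue one, and the compiler
    // cannot vectorize it either without relaxed FP rules.
    double chain(const double* data, std::size_t n) {
        double acc = 1.0;
        for (std::size_t i = 0; i < n; ++i)
            acc = acc * data[i] + 1.0;  // serialized on acc
        return acc;
    }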

Secondly, that slide might be comparing CPUs with similar clock speeds, let's say up to 2GHz.

Thirdly, can the PS4's 5500MHz GDDR5 memory interface improve the performance of this algorithm (despite the CPU not being able to access all of its bandwidth) compared to the XBONE's 2133MHz DDR3?
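
For reference, the peak numbers behind that question (taking both as 256-bit buses at the quoted effective data rates): 5500 MT/s x 32 bytes ≈ 176 GB/s for the PS4's GDDR5, versus 2133 MT/s x 32 bytes ≈ 68.3 GB/s for the XBONE's DDR3. Whether any of that headroom actually reaches the CPU is another matter, given the much smaller CPU-side bandwidth caps discussed below.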
 
I don't get it either, but Sony states the 800MHz GPU clock everywhere and never the CPU clock, not even in their toolchain slideshow posted in the nextgen compiler thread.

Perhaps Sony's chip has a lower clock but the CPU has been tweaked more than has been published.

From that PDF

"Massive real-time graphics-intensive 3D simulations
And they call them games…
Vectors (LOTS of vectors) and not just GPU stuff
Piles of shaders (GPU kernels)
Data build (assets) much bigger than code"


I wonder what exactly they mean by this?
 
First of all, generalizing that Jaguar is 2x slower than Ivy or Haswell for all code is not right. There are algorithms where you can't extract much more IPC no matter how many resources you throw at them, and on those the 2-issue Jaguar will be close to wider architectures.

Secondly, that slide might be comparing CPUs with similar clock speeds, let's say up to 2GHz.

Thirdly, can the PS4's 5500MHz GDDR5 memory interface improve the performance of this algorithm (despite the CPU not being able to access all of its bandwidth) compared to the XBONE's 2133MHz DDR3?

That is an interesting theory, but it would be very surprising, because the Xbox One is supposed to have more bandwidth dedicated to its CPU (30GB/s versus the PS4's 20GB/s, though the low throughput shown makes this argument moot), and also DDR3 RAM should have lower, or at least the same, latency as GDDR5, not the other way around.

So you're saying the X1's memory controllers somehow severely hamper the performance of its CPU? But in what way exactly? If that's the case, it would be a serious defect in the hardware. Shouldn't the 7 cores versus 6 cores theory be a simpler and more logical explanation?

But even that is not enough to explain the difference: even if the numbers had been rounded down (from 12.4) and up (from 13.5), we would need at least a slight overclock on the PS4 CPU, though not necessarily up to 1.75GHz.
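
Working that through with those hypothetical unrounded figures: 12.4 MB/s over 6 cores at 1.75 GHz is about 1.18 MB/s per core-GHz, and at that same per-clock rate, 13.5 MB/s over 7 PS4 cores would need 13.5 / (7 x 1.18) ≈ 1.63 GHz, i.e. only a hair above 1.6 GHz.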

Whatever the reasons, I am still surprised by those results and by the confirmation of those benchmarks by one multiplatform dev insider at NeoGAF.
 
But Cell had 8 threads: 2 on the PPE and 6 on the SPEs.

The PPE was strictly used for orchestration jobs / sync purposes between the SPEs. The Jaguar cores are far more robust in design (better prediction, larger caches, etc.) at handling "both" duties of orchestration and game code than the split used on the Cell. So six threads were probably all that was needed on the PS4, rather than adding more time and energy (coding, optimizing) trying to scale across additional cores - not worth it, especially on an older engine framework that did well on an older processor.
 
If code is parallel enough, it should almost automatically scale across more cores.
It's a thread-based system, so it hands out code jobs [as in Frostbite, for example] to free CPU threads.
They already had a job system in the February demo, with one core handling job distribution across the other cores. Adding support for one or two more cores should be just a single code change.
They even talk about how they optimized their system to be as multithreaded as possible [slide 10 in Valient's presentation from February]:
"Focused optimization on going 'wide'
Almost all code is multithreaded/jobified"

Not using the one or two additional cores that were available just does not make sense for the November build. Of course, it could be that they were never CPU-bound so they didn't care, but per their presentation they actually run very heavy systems on the CPUs, like their particle system.
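
For what it's worth, here is a minimal sketch of the kind of job system being described (a hypothetical illustration, not Guerrilla's or DICE's actual code). The point is that the worker count is a single constructor argument, which is why scaling from 6 to 7 cores can in principle be a one-line change:

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Hypothetical minimal job system: N workers pull jobs off a shared
    // queue, so using one more core is just passing 7 instead of 6 here.
    class JobSystem {
    public:
        explicit JobSystem(unsigned workers) {
            for (unsigned i = 0; i < workers; ++i)
                workers_.emplace_back([this] { run(); });
        }
        ~JobSystem() {
            { std::lock_guard<std::mutex> lock(mutex_); done_ = true; }
            cv_.notify_all();
            for (auto& w : workers_) w.join();
        }
        void submit(std::function<void()> job) {
            { std::lock_guard<std::mutex> lock(mutex_); jobs_.push(std::move(job)); }
            cv_.notify_one();
        }
    private:
        void run() {
            for (;;) {
                std::function<void()> job;
                {
                    std::unique_lock<std::mutex> lock(mutex_);
                    cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                    if (done_ && jobs_.empty()) return;
                    job = std::move(jobs_.front());
                    jobs_.pop();
                }
                job();  // run outside the lock so workers stay busy in parallel
            }
        }
        std::vector<std::thread> workers_;
        std::queue<std::function<void()>> jobs_;
        std::mutex mutex_;
        std::condition_variable cv_;
        bool done_ = false;
    };

A real engine job system adds dependencies, priorities, and lock-free queues on top of this, but the scaling knob stays the same.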
 
Scaling (especially porting Cell code across to Jaguar) is by no means linear. All engines aren't equal, nor universal in how they are ported across systems. Essentially, Jaguar has a lot more headroom (and is easier to work with) compared to the Cell/SPE combo. GG had a timeframe, a launch timeframe, for getting that product out. So spending additional hours/days/money on possibly getting an extra 10% of headroom from the additional cores isn't worth it if you're barely hitting 65-70% CPU usage across the six. The only thing GG needed to do (or did) with the older engine was improve the older art/texture assets, add better lighting, increase geometry where needed, etc... and the GPU took care of that.

Not saying my scenario is correct... I'm just wary of forming assumptions based off little information, especially when comparing one game engine to another. The base engine might use seven cores or possibly eight - we just don't know for sure, other than it having more headroom on the PS4 CPU compared to the XB1. Wasn't the assumption that the XB1 CPU should be more robust than the PS4's? Hence the reason I hate assumptions...
 
@Shortbread

I'm not 100% sure, but I think I heard Repi say that BF4 is using 6 cores on next-gen consoles, and this is an engine that by default scales across 8 threads.
 
@Shortbread

I'm not 100% sure, but I think I heard Repi say that BF4 is using 6 cores on next-gen consoles, and this is an engine that by default scales across 8 threads.

That's because at minimum BF4 can run on a decent quad-core computer, with six cores being an optimal setup and eight cores being supported (however, no significant return or benefit has been shown using eight). Also, BF4 has been optimized more for AMD GPUs, giving more credence to BF4 being GPU-intensive rather than CPU-bound. Plus, I would think parity between PS4/XB1 is in play to a certain degree... but then again, BF4 is a buggy mess on PS4.

But like I stated before, I'm not too much into assumptions being labeled as facts. Could there be a possibility that the PS4 doesn't reserve any cores per se, but an allotted percentage of time across one, or two, or all cores? We just don't know enough... and judging by the Substance Engine, core scaling seems to be one of the key factors in the PS4 having a bit more CPU grunt.

Lastly, I would think Sony/AMD engineers probably discussed the best possible/optimal CPU configuration so that the PS4 GPU isn't CPU-limited/bound. I would think the PS4 GPU would require more CPU headroom compared to the WiiU/XB1, just by the fact that it has more bells and whistles.
 
14 Nov 2013:

"Since the beginning of this year when we saw leaks [about the specs] of next-gen platforms, we immediately knew since the tech specs on PS4 were accurate that the Xbox specifications were likely accurate as well," Yoshida told GamesIndustry.biz.

CPU:

  • Orbis contains eight Jaguar cores at 1.6 GHz, arranged as two “clusters”
  • Each cluster contains 4 cores and a shared 2MB L2 cache
  • 256-bit SIMD operations, 128-bit SIMD ALU
  • SSE up to SSE4, as well as Advanced Vector Extensions (AVX)
  • One hardware thread per core
  • Decodes, executes and retires at up to two instructions/cycle
  • Out of order execution
  • Per-core dedicated L1-I and L1-D cache (32KB each)
  • Two pipes per core yield 12.8 GFlops performance
  • 102.4 GFlops for system
http://www.vgleaks.com/world-exclusive-orbis-unveiled-2/
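
Those GFlops figures are at least internally consistent at 1.6 GHz (assuming single precision and one 4-wide op per pipe per cycle): 2 pipes x 4 floats x 1.6 GHz = 12.8 GFlops per core, and 8 cores x 12.8 = 102.4 GFlops for the system.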

Also, at APU13 the presenter confirmed 1.6GHz for the PS4 CPU.

http://forum.beyond3d.com/showpost.php?p=1805886&postcount=3778
 
So, the Substance Engine is scaling across an additional core "if" the PS4 CPU is officially locked at 1.6GHz, which makes sense given the 14MB/s figure. You can't arrive at that figure unless an additional core is in use...

I would have thought so too, although it points to a substantial difference that doesn't tally with other info. We have heard two reserved cores on PS4, and two on XB1. Actually, perhaps this is the best evidence that the PS4 reserves only one core? That'd tie in perfectly with the above numbers: 12 MB/s from 6 cores on XB1, and 14 MB/s from 7 cores on PS4, at 2 MB/s from each Jaguar core. At which point, they'd have to be clocked the same.

The current theory has XB1's CPU clocked faster though, 1.75 GHz vs. 1.6. Well, the figures could be rounded, but my calculator can't get 14 from 7 x 1.6 GHz and 12 from 6 x 1.75 GHz.
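
Spelling the mismatch out: 14 / 12 ≈ 1.17, while (7 x 1.6) / (6 x 1.75) = 11.2 / 10.5 ≈ 1.07, so no amount of rounding lets 7 cores at 1.6 GHz beat 6 cores at 1.75 GHz by that margin unless per-clock efficiency also differs.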

*hypothetically* *this is not fact* *just some funky thoughts*

PS4: 2 MB/s per core (14 MB/s over 7 cores) x 1,600 MHz = 3,200 per core (22,400 total)

XB1: 2 MB/s per core (12 MB/s over 6 cores) x 1,750 MHz = 3,500 per core (21,000 total)

So obviously, from my funky math, the XB1 CPU gets through more per core because of its higher clock rate; however, the PS4 CPU is capable of more total throughput, if the seven cores are available to developers.

Anyhow, scaling across the cores seems to be a better fit for these systems, IMHO, rather than risking the unwanted effects of increasing CPU clock speeds in small enclosed boxes.

So we are "hypothetically" looking at something like this....
 
So we are "hypothetically" looking at something like this....
My funky maths:

XB1 = 12 MB/s ÷ 6 cores ÷ 1,750 MHz = 0.001143 MB/s per core per MHz

PS4 = 14 MB/s ÷ 7 cores ÷ 1,600 MHz = 0.00125 MB/s per core per MHz

That means the PS4 is achieving more per clock than the XB1. Maybe that's due to compilers? :???: In this example, per-core PS4 CPU performance comes out equal to that of the higher-clocked XB1 CPU.
 