PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

In my personal experience, compromising TrustZone is not as hard as it may seem. In many cases, TZ services are poorly coded/secured.

TrustZone isn't a magic bullet and there's little skill in breaking a broken system, but I would expect a console to be using an established tried-and-trusted implementation.

As long as that console isn't made by Nintendo.
 
Obviously not. For example, gcc usually optimizes code better than MSVC (it has, e.g., better constant propagation/folding), among other interesting amenities, if someone cares to test.
There are many cases where MSVC has an advantage in my experience (and I haven't used it in years, but I still see the same deficits in current gcc versions). I don't know which one would produce faster code in most cases; I'm just saying it's too one-sided to quickly call one the "better" compiler.

Anyway, Sony is using LLVM + Clang, and it is still behind both of those compilers: http://llvm.org/devmtg/2013-11/slides/Robinson-PS4Toolchain.pdf

Fact is, MSVC has a long-term team behind it, one with quite a grasp of CPU microarchitecture, especially AMD's (and Intel's).
Sony, obviously, has not been developing an x86/x64 compiler for two decades.
Nope, but they bought some talent (SN Systems, referenced in the slides).

Right now I would be surprised if the Sony toolchain were faster, but I believe LLVM/Clang is a lot easier to work with for both generic and hardware-specific improvements.
 
There are many cases where MSVC has an advantage in my experience
hm... an example? Just to understand exactly what you are referring to.

I don't know which one would produce faster code in most cases; I'm just saying it's too one-sided to quickly call one the "better" compiler.
I know the assembly output of MSVC quite well, on average (mostly I've seen gcc output on ARM, but on x86 too). By comparison, I find gcc to produce slightly better asm output, with some notable exceptions, mainly related to the linker's codegen. It won't change your life in 99.99% of cases, and it is my personal feeling, based on 20+ years of deep asm knowledge, but still.
ASM tricks aside, I was referring to the high-level compiler optimizations, the ones you do before you move to the backend; you could call them 'neutral'. Gcc can crunch through complex optimization tasks like constant folding much better than MSVC, to my surprise; I tested it.
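
To give a concrete (made-up) example of the kind of test I mean: with good constant propagation/folding, the whole call chain below should collapse into "return 42;" with no arithmetic left at runtime, and how fully a compiler manages that is easy to check in the asm output:
Code:
// Hypothetical test case for constant folding: every value here is
// known at compile time, so a good optimizer reduces bar() to a
// single constant (5 * 8 + 2 = 42).
static inline int scale(int x, int factor)
{
	return x * factor + 2;
}

int bar()
{
	return scale(5, 8);
}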

Nope, but they bought some talent (SN Systems, referenced in the slides).
...of course, they cannot do it from nothing :)


Right now I would be surprised if the Sony toolchain were faster, but I believe LLVM/Clang is a lot easier to work with for both generic and hardware-specific improvements.
I would be surprised too; that is why I believe Sony's compiler will gain more in performance over time than Microsoft's.
I totally agree that the Sony toolchain is very likely much easier and handier for developers to use.
 
I am not discussing the toolchain's status, rather the compiler's backend.
My conclusion comes simply from the fact that MS is/should be highly advantaged in developing an optimized backend for AMD: they just need to add a custom tuning profile for those parts, alongside the existing ones. Code generation is otherwise the same, given it is x64.
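
For illustration only (my own example, nothing from MS or Sony, and the file name is made up): for gcc/clang such a profile already exists upstream, since Jaguar is the btver2 target, while MSVC only exposes a coarse 64-bit vendor preference:
Code:
# gcc/clang: tune code generation for AMD Jaguar cores
g++ -O2 -mtune=btver2 game.cpp
# MSVC: only a coarse vendor preference on x64
cl /O2 /favor:AMD64 game.cpp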

Since both Microsoft and LLVM have publicly available x64 compilers, there's no need to guess at which produces better code. Anyone with the time and inclination can go test it.

But I think the "doing it longer" theory holds little weight. Just look at how rapidly clang has surpassed gcc.

As for speed, one shouldn't be surprised if the Sony toolchain is faster, since Microsoft platforms have notoriously slow linking. Microsoft has not put nearly the resources into C/C++ that people seem to think.
 
hm... an example? Just to understand exactly what you are referring to.
I'm treading on thin ice; I haven't done comparisons for quite a while. I found MSVC to be a lot better at finding common subexpressions in loops and switch statements, such as finding a common tail and moving it to the top of the loop.
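A made-up sketch of the pattern I mean (not an actual measured case):
Code:
// Both arms of the switch end with the same statement; a compiler
// doing tail merging (cross-jumping) can emit the shared "v *= 3"
// once instead of duplicating it in every case.
int adjust(int mode, int v)
{
	switch (mode) {
	case 0:  v += 16; v *= 3; break;
	default: v -= 16; v *= 3; break;
	}
	return v;
}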
Also, MSVC can combine stores (talking about valid cases, of course) where gcc always fails; I just tested this with gcc 4.7.2:
Code:
// An int at offset 0, then two shorts at offsets 4 and 6: 8 bytes total.
struct MyStruct { int a; short b; short c; };

void foo(MyStruct &inst)
{
	// Three adjacent assignments, all zero: a candidate for store merging.
	inst.a = 0; inst.b = 0; inst.c = 0;
}
gcc generates three stores here, where two 32-bit stores would suffice.
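
For what it's worth, here is roughly what the merged version amounts to if written by hand (my sketch, assuming the usual 4/2/2 layout of MyStruct above with no padding between b and c):
Code:
#include <cstdint>
#include <cstring>

// Hand-written equivalent of the two 32-bit stores (uses MyStruct from
// the snippet above; assumes b and c are adjacent at offsets 4 and 6).
void foo_merged(MyStruct &inst)
{
	inst.a = 0;                               // first 32-bit store
	const std::uint32_t zero = 0;
	std::memcpy(&inst.b, &zero, sizeof zero); // second 32-bit store, covering b and c
}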

I agree that constant propagation is ace in gcc, but features like LTO were only introduced there recently, while MSVC has had them for more than 10 years.
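
For context (my own illustration of the feature in question, with made-up file names): gcc only gained LTO in 4.5, around 2010, while MSVC's equivalent, /GL plus /LTCG, goes back to Visual C++ 2002:
Code:
# gcc: optimize across translation units at link time
g++ -O2 -flto a.cpp b.cpp -o prog
# MSVC: whole-program optimization + link-time code generation
cl /O2 /GL a.cpp b.cpp /link /LTCG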

You can PM me or start your own thread if you want to discuss this further.
 
I would be surprised too; that is why I believe Sony's compiler will gain more in performance over time than Microsoft's.
I totally agree that the Sony toolchain is very likely much easier and handier for developers to use.

It seems like you really have no idea what you're talking about. Have you actually compared the toolchains?
 
Does it really take over a minute (discounting non-skippable splash screens) to go from the UI to a loaded saved game - in this case AC4 - on the PS4?

http://www.tomshardware.co.uk/ps4-hard-drive-upgrade,review-32907-3.html

I haven't read the link you posted and am simply responding to your question directly. I decided to fire up ACIV and test it. Here's what I found:

From OS to initial AC UI (up to the point where it says "Press X"): 34s
From pressing X on the initial UI to connecting to Uplay and reaching the Main Menu: 53s (!)
From pressing X on "Continue Save Game" in the Main Menu to gameplay: 22s

So, to answer your specific question: no. At least it doesn't for me. But from start (OS) to finish (gameplay) it was 1:56 total (that includes some quick menu navigation, if you're wondering about the time discrepancy).

Drive: Samsung 830 SSD (256 GB - MZ-7PC256N/AM)

EDIT

Did a quick test of KZ:SF; I really like the way it handles initial loading:

OS to Main Menu: 15s

Henceforth I shall call removing all, and I do mean all, initial loading screens "Killzoning the Intros". :p
 
I'd assume it's unskippable splash screens. The Uplay log-on itself should only take around 3 seconds if it's similar to the PC.

That's not what happened in the timing I noted above. The unskippable splash screens took 34s. The Uplay logon took 53s. Although I don't remember it taking that long before (it's been a while since I fired up ACIV).
 
Microsoft has not put nearly the resources into C/C++ that people seem to think.

They use it to compile Windows. Actually, I don't remember the exact version, but they did switch to a new one around Vista, if I remember correctly.

For sure it is not as user-friendly as Borland/Embarcadero, yet I think they have at least invested in their backend code generator... no?
In the end, any advantage there is an advantage in overall Windows platform speed.

npl, I PM'ed you - I do not want to deviate further from the thread's subject.
 
That's not what happened in the timing I noted above. The unskippable splash screens took 34s. The Uplay logon took 53s. Although I don't remember it taking that long before (it's been a while since I fired up ACIV).

Probably just a one-off long log-on time then. I get those occasionally on PC too, but usually it's just a few seconds. The log-on process works a little differently on PC, too, as you log on within Uplay before launching the game (or skip it if you prefer and play offline). But from manually logging on in Uplay, then launching the game, skipping through the splash screens, and loading a game to the point it was playable, it took 36 seconds for me.

Add about another 14 seconds to power up the PC from sleep, navigate to Black Flag in the start menu, and then open up Uplay.
 
EDIT

Did a quick test of KZ:SF; I really like the way it handles initial loading:

OS to Main Menu: 15s

My general observation is that the PS4 is usually very quick to get into the game, and also get playing in the game. Installing everything helps :)
 
Wow, those L2 latencies (from one cluster to another) are dreadful. 190 cycles?
That is awful; maybe not that important in a console, but it tells a lot about how "not scalable" the design is.
 
Wow, those L2 latencies (from one cluster to another) are dreadful. 190 cycles?
That is awful; maybe not that important in a console, but it tells a lot about how "not scalable" the design is.
That depends on your definition of scalable. If you are writing highly parallelised code, as I do, that runs on as few as a dozen cores or as many as hundreds of cores, but whose individual jobs do not need 'cross-cluster' communication very often, then 190 cycles isn't an issue.

If you do need to do this at scale, you use a platform with a lot of fast L3 cache, or L4 cache if your platform is multi-architecture (CPU/GPU).
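
To make those numbers tangible, here is a minimal sketch (mine, not from any article; thread pinning is OS-specific and omitted) of the usual ping-pong test for core-to-core latency. Pinning the two threads to cores in different Jaguar clusters is what would expose the ~190-cycle hop:
Code:
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Bounce one cache line between two threads; the average round trip
// approximates twice the core-to-core transfer latency.
std::atomic<int> flag{0};

int main()
{
	const int iters = 1000000;
	std::thread responder([&] {
		for (int i = 0; i < iters; ++i) {
			while (flag.load(std::memory_order_acquire) != 1) { } // wait for ping
			flag.store(0, std::memory_order_release);             // pong
		}
	});
	// NOTE: pin main and responder to cores in different clusters
	// (e.g. pthread_setaffinity_np / SetThreadAffinityMask) to get
	// the cross-cluster figure; affinity calls are omitted here.
	auto t0 = std::chrono::steady_clock::now();
	for (int i = 0; i < iters; ++i) {
		while (flag.load(std::memory_order_acquire) != 0) { }     // wait for pong
		flag.store(1, std::memory_order_release);                 // ping
	}
	responder.join();
	auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
	              std::chrono::steady_clock::now() - t0).count();
	std::printf("avg round trip: %.1f ns\n", double(ns) / iters);
}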
 
That depends on your definition of scalable. If you are writing highly parallelised code, as I do, that runs on as few as a dozen cores or as many as hundreds of cores, but whose individual jobs do not need 'cross-cluster' communication very often, then 190 cycles isn't an issue.
I meant scalable under a "wide" selection of workloads. AMD spoke of releasing server chips based on Jaguar compute clusters; now that Avoton is out, it is going to be an even harder sell.
If you do need to do this at scale, you use a platform with a lot of fast L3 cache, or L4 cache if your platform is multi-architecture (CPU/GPU).
Thanks for the information, though Jaguar supports neither an L3 nor an L4.
Still, that figure is dreadful; for the sake of having a reference, I may look up what kind of latency penalties one incurred with those old Pentiums (2 chips/cores on the same package).
 
I meant scalable under a "wide" selection of workloads. AMD spoke of releasing server chips based on Jaguar compute clusters; now that Avoton is out, it is going to be an even harder sell.

I guess in a server context they could be virtualized on a per-cluster basis. Each VM wouldn't even know there was a second L2 cache that's slow to access.
 