PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

John Carmack hinted at some intelligent design choice(s) in Orbis.

I don't see why that has to mean anything other than the current rumours, i.e. an x86-based APU with GCN CUs, an HSA architecture and a massive pool of unified, fast memory. Sounds like a brilliant design choice to me. Why would anything more be needed for it to qualify?
 
Carmack's statement was deliberately vague and may have been at a higher level concerning the platform and its organization.
Taking it as a sign that there are special reserved resources would first require assuming that carving something special out of a design that can already do all of that with no additional modification is somehow intelligent.
There are some possible reasons why making some CUs somewhat less available for graphics work may be worthwhile, but not with 4 magically different CUs or the like.

The mangled translation of Cerny's interview was dominated by data movement optimizations and cache management, not reserved resources.

Ah, I don't think Carmack meant more power/resources. That's just the peak power. ^_^

Cerny's comments are more revealing, in the area of Onion+ and cache management. His 1.84 TFLOPS announcement settles the peak compute power of the GPU. The rest is in the details of how tasks can be scheduled, laid out, synchronized and fed. I think those details impact development more.


I don't see why that has to mean anything other than the current rumours, i.e. an x86-based APU with GCN CUs, an HSA architecture and a massive pool of unified, fast memory. Sounds like a brilliant design choice to me. Why would anything more be needed for it to qualify?

Why not? As a developer, he would be interested in the details. He would probably applaud Sony's decision to simplify development. Getting specific comments from him about what exactly he likes in Orbis is not a bad thing at all. He did say he's not free to talk about it though.



For Orbis, he postulated how low-level access coupled with a real-time scheduler could be used for a more responsive use of the GPU.
We now know that the PS4's OS is FreeBSD-based, but I'm not certain about the real-time aspect.

Yes, Tim was commenting on how much flexibility low-level access would enable. He laid out each part (ACEs, DMA, ...) of a typical GCN GPU and described how it would benefit him as a developer, given low-level access as opposed to the standard GL/DX layers.
 
"Resources" could mean quite a lot of things. Just additional cache to store two shader programs (compute and graphics) simultaneously, working on one at a time, would constitute more resources IMO. (I really must read up on APU design at some point!)

That's garbage, quite frankly. For a CU to run graphics code and compute at the same time, it'd effectively have to be twice as big, with twice as many computation units. Can we please apply just a modicum of common sense for once when dealing with rumours and speculation?

Again, how on earth can a processor process two workloads at once with no loss? The only way is to have twice as much logic, so basically 36 CUs with half dedicated to compute and half to graphics. We don't need official documentation from Sony to know that's utter twaddle. Basic understanding of hardware allows us to interpret the rumours and PR releases without succumbing to the allure of unrealistic hopes of magical hardware performance.

What if each CU was given a little extra hardware? I don't mean doubled, but let's say enough, in total, to equal four complete CUs.
 
Wouldn't the extra h/w cause the total computational power to exceed the 1.84 TFLOPS number Cerny declared?

In one of the versions of the articles I read, the 1.84 TFLOPS was the graphics power. I have not seen a specific denial of it having compute power in addition to that. In fact, there have been hints and rumors of additional ALUs and so forth.
 
In one of the versions of the articles I read, the 1.84 TFLOPS was the graphics power. I have not seen a specific denial of it having compute power in addition to that. In fact, there have been hints and rumors of additional ALUs and so forth.

In GCN, graphics power = compute power. There's no reason to separate them. If Orbis had 1.84 TFLOPS of graphics power and x TFLOPS of compute power, it would actually have 1.84 + x TFLOPS of graphics power.
 
Wouldn't the extra h/w cause the total computational power to exceed the 1.84 TFLOPS number Cerny declared?

The VGLeaks paper said that 4 of the CUs had extra ALUs for compute.



So maybe the case is 18 CUs, but 4 of them have extra ALUs for compute.


[Image: GCN Compute Unit block diagram]



So maybe another ALU alongside the scalar ALU.


Edit: kinda like Larrabee.


[Image: Intel Larrabee core block diagram]
 
It sounds like the PS4 GPU will be efficient when utilizing its CUs for both rendering and compute... perhaps more efficient than when rendering alone.
 
The VGLeaks paper said that 4 of the CUs had extra ALUs for compute.
Can you show me how you are parsing that (badly written) sentence fragment?
I don't think it says that unless the reader wants it to.

So maybe the case is 18 CUs, but 4 of them have extra ALUs for compute.
How does the math work, exactly?
18 regular CUs at 800 MHz give the total confirmed GFLOP count.
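
(For reference, a quick sanity check of that arithmetic, assuming the standard GCN layout of 64 ALUs per CU and two FLOPs per ALU per clock for fused multiply-add:)

```python
# 18 standard GCN CUs at 800 MHz already account for the full headline figure,
# leaving no room for "extra" ALUs inside the 1.84 TFLOPS number.
cus             = 18
alus_per_cu     = 64    # 4 SIMD-16 units per CU
flops_per_clock = 2     # one FMA counts as two FLOPs
clock_ghz       = 0.8

print(cus * alus_per_cu * flops_per_clock * clock_ghz / 1000)  # 1.8432 TFLOPS
```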

So maybe another ALU alongside the scalar ALU.
How so? What is the math that is consistent with that?


No, it would alleviate stress from and for multitasking.
How do you mean, and how would that work?


It sounds like the PS4 GPU will be efficient when utilizing its CUs for both rendering and compute... perhaps more efficient than when rendering alone.
This would depend on what measure of efficiency is being used. In terms of cycles where ALUs and data paths would otherwise be left idle, this would normally be the case, since there's more work to fill them with.

That doesn't mean the individual kernels are running faster than if they had the GPU wholly to themselves, however.
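
A toy model of that distinction, with made-up numbers rather than anything from real hardware: filling a frame's idle ALU cycles with compute raises overall utilization, but the compute kernel finishes much later than it would with the GPU to itself.

```python
# Toy model with made-up numbers: concurrency raises ALU utilization,
# but the compute kernel's completion time gets worse, not better.
FRAME_MS          = 10.0  # time the graphics work occupies the GPU
GRAPHICS_ALU_BUSY = 0.6   # fraction of ALU cycles graphics actually keeps busy
COMPUTE_ALU_MS    = 4.0   # ALU time the compute kernel needs on the full GPU

# Serial: render the frame, then run compute with the whole machine.
serial_total = FRAME_MS + COMPUTE_ALU_MS                               # 14 ms
serial_util  = (FRAME_MS * GRAPHICS_ALU_BUSY + COMPUTE_ALU_MS) / serial_total

# Concurrent: compute soaks up the frame's idle ALU cycles.
idle_alu_ms      = FRAME_MS * (1 - GRAPHICS_ALU_BUSY)                  # 4 ms to spare
concurrent_total = FRAME_MS + max(0.0, COMPUTE_ALU_MS - idle_alu_ms)   # 10 ms
concurrent_util  = (FRAME_MS * GRAPHICS_ALU_BUSY + COMPUTE_ALU_MS) / concurrent_total

print(serial_util, concurrent_util)      # ~0.71 vs 1.0 -> "more efficient"
print(COMPUTE_ALU_MS, concurrent_total)  # 4 ms alone vs 10 ms when sharing the GPU
```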
 
3dilettante, if we look at the GPU alone, the internal states and flow will be consistent. What happens if the CPU tries to read/write the GPU's internal state? (I don't quite know what this means because the freaking translated article s*cks.)

Based on the poorly translated article, when the CPU tries to sneak a compute job to the CUs directly, the default cache behavior is not ideal (e.g., the GPU will flush the cache). What situation exactly is Cerny talking about there?

The Orbis diagram we saw only shows the queues and ring buffers. I don't see any way for the CPU to access the CUs directly, bypassing the cache.
 
How do you mean, and how would that work?

It's about efficiency. We talk about this GPU that has a "theoretical" peak of 1.84 TFLOPS; how do engineers make it possible for devs to touch that performance ceiling? The situation here is that there's a lot of supposedly unique stuff in Liverpool that's meant to help with apparent bottlenecks, whether they're due to time, cost or whatever. The speculated extra components are not for making the console do more than it can, just all that it can. On the dev side, they can also reduce the difficulty of optimization and tweaking.
 
According to DICE, BF4 was running at 3K resolution and 60 fps on an HD 7990.


I imagine he was playing on a high-end PC?

PB: It was a decent PC, yes. But you can still buy it, so it's not like alien technology or anything. It's still a PC that you can buy and it's still unoptimised code. Another thing that you might not have realised is that we ran it at 3K resolution at 60fps, and you don't do that. Nobody talked about pixels because at that size you couldn't see them.
http://m.computerandvideogames.com/...-had-to-build-battlefield-4-from-gut-feelings

1080p is 33.33% of 3K res. 1.8 TF is around 20.93% of the HD 7990's 8.6 TF.
So how will the PS4 run BF4, and which will be sacrificed: resolution or frame rate?
 
The VGLeaks paper said that 4 of the CUs had extra ALUs for compute.

So maybe the case is 18 CUs, but 4 of them have extra ALUs for compute.

Or perhaps the vague bullet points filtered through random people aren't totally correct. I'm sure things have been said that contradict each other at some point so I'm also pretty sure you're cherry picking what you like.

So maybe another ALU alongside the scalar ALU.

So what you're saying is that for 4 out of 18 CUs instead of having a 64 + 1 ALU arrangement it's 64 + 1 + 1? And that this is done for the benefit of compute?

I think you must not understand why that scalar unit is there. Accelerating its computational throughput is just not worth it. If you have a good reason for using the CU in the first place, you'll generally spend at least as many instructions on the SIMDs as you need for scalar input, which is mostly used for flow control or for managing some wide shared state like TMU bindings. And those scalar instructions tend to be more latency-dependent in nature, in which case it's even harder to utilize a second unit. Not worth the overhead in scheduling, register file access, etc. And it doesn't sound like a good idea to introduce asynchronous CUs in the first place.
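
A crude Amdahl-style sketch of why, with made-up instruction counts (and it even ignores that GCN issues scalar and vector work in parallel, which would shrink the benefit further):

```python
# Crude estimate: if a kernel issues far more vector (per-lane) instructions
# than scalar (per-wavefront) ones, doubling scalar throughput gains almost nothing.
WAVEFRONT = 64  # lanes per GCN wavefront, executed 16-wide over 4 cycles

def issue_cycles(vector_insts, scalar_insts, scalar_alus=1):
    vector_cycles = vector_insts * (WAVEFRONT // 16)  # 4 cycles per SIMD instruction
    scalar_cycles = scalar_insts / scalar_alus        # best case: perfect scaling
    return vector_cycles + scalar_cycles

base    = issue_cycles(vector_insts=200, scalar_insts=20, scalar_alus=1)
doubled = issue_cycles(vector_insts=200, scalar_insts=20, scalar_alus=2)
print(base / doubled)  # ~1.01x: not worth the scheduling/register-file overhead
```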

Edit: kinda like Larrabee.

GCN is already "kinda like Larrabee" in all ways pertinent to this discussion.
 
According to DICE, BF4 was running at 3K resolution and 60 fps on an HD 7990.



http://m.computerandvideogames.com/...-had-to-build-battlefield-4-from-gut-feelings

1080p is 33.33% of 3K res. 1.8 TF is around 20.93% of the HD 7990's 8.6 TF.
So how will the PS4 run BF4, and which will be sacrificed: resolution or frame rate?

As you quoted, the code is still unoptimised; also, 1080p is actually about 39% of 3K resolution.
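
(Quick check of that figure, assuming "3K" here means 3072x1728, which seems the most common reading of the term:)

```python
# Pixel-count comparison under the assumption that "3K" = 3072x1728.
pixels_1080p = 1920 * 1080       # 2,073,600
pixels_3k    = 3072 * 1728       # 5,308,416
print(pixels_1080p / pixels_3k)  # ~0.39
```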

The PS4 GPU has been customised and is in a console environment. Do we know how much of the 7990 they were using? 100%? 75%? 10%?

Right now we don't have enough information.
 
1080p is 33.33% of 3K res. 1.8 TF is around 20.93% of the HD 7990's 8.6 TF.
So how will the PS4 run BF4, and which will be sacrificed: resolution or frame rate?
That's a big bunch of bad math, my friend... :) 1, the 7990 is a CrossFire-on-a-card solution, so efficiency is poor; double up on the GPUs and you almost never double up on actual performance. 2, who says the BF4 demo maxed out the 7990? If it runs at a steady 60 fps, it most likely didn't, as you'd risk dropping frames all the time if you were right at the edge of what the hardware is capable of.
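
For what it's worth, if you treat the quoted 8.6 TFLOPS as the sum of two GPUs on one card (a rough assumption, and CrossFire scaling is far from perfect anyway), the per-GPU comparison already looks quite different:

```python
# Rough per-GPU comparison, assuming the 8.6 TFLOPS figure is two GPUs combined.
ps4_tflops    = 1.84
hd7990_tflops = 8.6
per_gpu       = hd7990_tflops / 2   # ~4.3 TFLOPS per GPU
print(ps4_tflops / per_gpu)         # ~0.43, versus ~0.21 against the combined number
```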
 
Just across the wire from @digitalfoundry:

A 1 GHz GPU clock was never realistic. Also, unless there was more to the tweet, there is nothing that suggests a CPU frequency, so the CPU clock is still up in the air, and it's the thing that has always seemed the most subject to variation.

~1.8 GHz CPU is my bet.
 
Where did that FXAA creator go? He removed his tech-speculation blog post, but I remember he had some of his own ideas in it.

You can still find some of his thoughts on NeoGAF:

Timothy Lottes said:
Working assuming the Eurogamer Article is mostly correct with the exception of maybe exact clocks, amount of memory, and number of enabled cores (all of which could easily change to adapt to yields)....
PS4
The real reason to get excited about a PS4 is what Sony as a company does with the OS and system libraries as a platform, and what this enables 1st party studios to do, when they make PS4-only games. If PS4 has a real-time OS, with a libGCM style low level access to the GPU, then the PS4 1st party games will be years ahead of the PC simply because it opens up what is possible on the GPU. Note this won't happen right away on launch, but once developers tool up for the platform, this will be the case. As a PC guy who knows hardware to the metal, I spend most of my days in frustration knowing damn well what I could do with the hardware, but what I cannot do because Microsoft and IHVs won't provide low-level GPU access in PC APIs. One simple example, drawcalls on PC have easily 10x to 100x the overhead of a console with a libGCM style API....

I could continue here, but I'm not, by now you get the picture, launch titles will likely be DX11 ports, so perhaps not much better than what could be done on PC. However if Sony provides the real-time OS with libGCM v2 for GCN, one or two years out, 1st party devs and Sony's internal teams like the ICE team, will have had long enough to build up tech to really leverage the platform.

I'm excited for what this platform will provide for PS4-only 1st party titles and developers who still have the balls to do a non-portable game this next round....
Xbox720
Working here assuming the Eurogamer Article is close to correct. On this platform I'd be concerned with memory bandwidth. Only DDR3 for system/GPU memory paired with 32MB of "ESRAM" sounds troubling....If this GPU is pre-GCN with a serious performance gap to PS4, then this next Xbox will act like a boat anchor, dragging down the min-spec target for cross-platform next-generation games.

My guess is that the real reason for 8GB of memory is because this box is a DVR which actually runs "Windows" (which requires a GB or two or three of "overhead"), but like Windows RT (Windows on ARM) only exposes a non-desktop UI to the user. There are a bunch of reasons they might ditch the real-time console OS, one being that if they don't provide low level access to developers, that it might enable a faster refresh on backwards compatible hardware. In theory the developer just targets the box like it was a special DX11 "PC" with a few extra changes like hints for surfaces which should go in ESRAM, then on the next refresh hardware, all prior games just get better FPS or resolution or AA. Of course if they do that, then it is just another PC, just lower performance, with all the latency baggage, and lack of low level magic which makes 1st party games stand out and sell the platform.



A fast GDDR5 will be the desired option for developers. All the interesting cases for good anti-aliasing require a large amount of bandwidth and RAM. A tiny 32MB chunk of ESRAM will not fit that need even for forward rendering at 1080p. I think some developers could hit 1080p@60fps with the rumored Orbis specs even with good AA. My personal project is targeting 1080p@60fps with great AA on a 560ti which is a little slower than the rumored Orbis specs. There is no way my engine would hit that target on the rumored 720 specs. Ultimately on Orbis I guess devs target 1080p/30fps (with some motion blur) and leverage the lower latency OS stack and scan out at 60fps (double scan frames) to provide a really great lower-latency experience. Maybe the same title on 720 would render at 720p/30fps, and maybe Microsoft is dedicating a few CPU hardware threads to the GPU driver stack to remove the latency problem (assuming this is a "Windows" OS under the covers).

I'm highly interested in this double-scan frame method he mentioned. How likely is it that we'll see it in next-gen games?
 