PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

Digital Foundry provides a much better record of the GDC presentation, including the full version of a quote that's been confusing some people:

They're talking about handling threads, or compute jobs, and the GPU's ability to run compute in parallel with graphics which is going to be using some CUs in compute and the rest for graphics. There's no running both across the full GPU simultaneously. Whether that's the old 14+4 idea or the scheduler managing jobs across CU clusters is unknown at this point, but I'd put money on the latter.
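To make the two readings above concrete, here's a purely hypothetical toy sketch in Python (nothing to do with real GCN dispatch hardware; the 14+4 numbers are just the old rumour): a fixed split dedicates CUs to each workload, while a shared scheduler lets compute and graphics jobs compete for all 18 CUs.

```python
# Hypothetical toy model, nothing confirmed about Orbis: contrasts a fixed
# 14+4 CU split with a scheduler that can place either kind of job on any CU.
TOTAL_CUS = 18  # rumoured CU count

def fixed_split(graphics_jobs, compute_jobs, graphics_cus=14):
    """Graphics work only ever sees 14 CUs; compute work only sees the other 4."""
    return {
        "graphics": dict(cus=graphics_cus, jobs=list(graphics_jobs)),
        "compute": dict(cus=TOTAL_CUS - graphics_cus, jobs=list(compute_jobs)),
    }

def shared_scheduler(graphics_jobs, compute_jobs):
    """Any CU can take either kind of job; both workloads compete for all 18 CUs."""
    jobs = [("graphics", j) for j in graphics_jobs] + [("compute", j) for j in compute_jobs]
    assignment = {cu: [] for cu in range(TOTAL_CUS)}
    for i, job in enumerate(jobs):
        assignment[i % TOTAL_CUS].append(job)  # crude round-robin stand-in for dispatch
    return assignment

print(fixed_split(["gbuffer", "lighting"], ["physics"]))
print(shared_scheduler(["gbuffer", "lighting"], ["physics"]))
```

Either way the total amount of hardware is the same; the question is only whether the partition is fixed or decided job by job.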

What if the PS4 can run both compute jobs and graphics on all CUs?

Maybe that was the whole customization.

From what I read, everything points at running both at once, which would be an incredible gain if they can pull it off.
 

On a cursory look, it doesn't seem to be used in the PS4.


What if the PS4 can run both compute jobs and graphics on all CUs?

Maybe that was the whole customization.

From what I read, everything points at running both at once, which would be an incredible gain if they can pull it off.

In that case, both compute and graphics jobs will each use part of the GPU, unless a job is explicitly stalled or stopped temporarily (e.g., while waiting for data), allowing others to "switch in".

Together, they will still use up to the same peak FLOP count (about 1.84 TFLOPS).
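For reference, a quick back-of-the-envelope check of where that ~1.84 TFLOPS budget comes from, assuming the commonly reported configuration of 18 CUs at 800 MHz with 64 ALU lanes per CU and an FMA counted as two ops (treat these as rumoured figures, not official):

```python
# Rough sanity check, assuming rumoured figures: 18 CUs, 64 lanes/CU, 800 MHz, FMA = 2 FLOPs.
cus       = 18
lanes     = 4 * 16        # 4 SIMDs x 16 lanes each
clock_hz  = 800e6
flops_op  = 2             # fused multiply-add counted as two FLOPs

peak_tflops = cus * lanes * clock_hz * flops_op / 1e12
print(peak_tflops)        # -> 1.8432; graphics and compute both draw from this one budget
```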
 
I don't know the low-level allocation logic for GCN, but nothing in the architectural slides or the ISA document says that a graphics task's wavefront can't share a CU with a wavefront from a compute kernel.

There's nothing custom about a CU being able to run whatever combination of kernels can fit.
 

How about no. 1866MT/s is the highest anything officially supports, and that's under 30GB/s on the 128-bit bus that's standard for PCs. 2133MT/s, the highest JEDEC standard, only gives you about 34.1GB/s. Sure, you can get RAM that unofficially clocks higher and CPUs/motherboards that will let you clock that high, but that is definitely not something most current PCs are doing. They could be using an SB-E or similar with 256-bit memory, but that's even less likely. Most current PCs are probably only at 1333 or 1600MT/s.

That part of the article was silly anyway, since it's comparing GDDR5 and DDR3 in totally different bus width configurations. If you use 256-bit DDR3 you can go well beyond 40GB/s with standard RAM, like what MS is purported to be doing with Durango.
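If it helps, the arithmetic behind those figures is just transfer rate times bus width; a quick sketch (the speeds are plain JEDEC DDR3 grades, nothing console-specific, and the 256-bit Durango setup is still only purported):

```python
# Peak DDR bandwidth = transfers/s x bus width in bytes.
def ddr_bandwidth_gbs(transfers_per_sec, bus_bits):
    return transfers_per_sec * (bus_bits / 8) / 1e9

print(ddr_bandwidth_gbs(1866e6, 128))  # ~29.9 GB/s: fastest officially supported dual-channel DDR3
print(ddr_bandwidth_gbs(2133e6, 128))  # ~34.1 GB/s: fastest JEDEC DDR3 speed, dual channel
print(ddr_bandwidth_gbs(2133e6, 256))  # ~68.3 GB/s: 256-bit DDR3, the purported Durango-style setup
```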
 
Here is my opinion.

The PS4 is probably one third as powerful, in raw specs, as a GTX 680 or a high-end AMD card. When you factor in that it is a closed system, it might be half as powerful or a bit more. The massive amount of RAM probably gives it another boost. If it can run compute without losing any graphics power, it might be close to a single 680. However, what we need is official documentation from Sony.
 
How about no. 1866MT/s is the highest anything officially supports, and that's under 30GB/s on the 128-bit bus that's standard for PCs. 2133MT/s, the highest JEDEC standard, only gives you about 34.1GB/s. Sure, you can get RAM that unofficially clocks higher and CPUs/motherboards that will let you clock that high, but that is definitely not something most current PCs are doing. They could be using an SB-E or similar with 256-bit memory, but that's even less likely. Most current PCs are probably only at 1333 or 1600MT/s.

That part of the article was silly anyway, since it's comparing GDDR5 and DDR3 in totally different bus width configurations. If you use 256-bit DDR3 you can go well beyond 40GB/s with standard RAM, like what MS is purported to be doing with Durango.

I thought that he was suggesting that it was above 40GB/s.
 
I thought it was already confirmed at 720p @ 60fps?

PS4 sports roughly 1/4 the raw power of a 7990, so something's got to give. And this early in the generation the console-specific advantages will be minimal at best.
What do you base that speculation on?
 
The author's conclusion is a reach, given the data he is using as justification. Running compute and graphics simultaneously suggests nothing about reserving resources.
"Resources" could mean quite a lot of things. Just additional cache to store two shader programs (compute and graphics) simultaneously, working on one at a time, would constitute more resources IMO. (I really must read up on APU design at some point!)

What if the PS4 can run both compute jobs and graphics on all CUs?

Maybe that was the whole customization.

From what I read, everything points at running both at once, which would be an incredible gain if they can pull it off.
That's garbage, quite frankly. For a CU to run graphics code and compute at the same time, it'd effectively have to be twice as big, with twice as many computation units. Can we please apply just a modicum of common sense for once when dealing with rumours and speculation?

If it can run compute without losing any graphics power...
Again, how on earth can a processor process two workloads at once with no loss? The only way is to have twice as much logic, so basically 36 CUs with half dedicated to compute and half to graphics. We don't need official documentation from Sony to know that's utter twaddle. Basic understanding of hardware allows us to interpret the rumours and PR releases without succumbing to the allure of unrealistic hopes of magical hardware performance.
 
It's tied to the question of efficiency. If there is unused capacity because of generic coding across different brands of GPUs, stalling and other overhead, the developer can code for it differently.

If facilities are provided to allow compute tasks to run well without disturbing graphics work, then the developer will have more room to optimize.
 
"Resources" could mean quite a lot of things. Just additional cache to store two shader programs (compute and graphics) simultaneously, working on one at a time, would constitute more resources IMO. (I really must read up on APU design at some point!)
The statement was that there were special CU resources for each type of workload. While not impossible, it isn't necessary.

That's garbage, quite frankly. For a CU to run graphics code and compute at the same time, it'd effectively have to be twice as big, with twice as many computation units.
A compute program and a graphics shader are just instructions as far as the CU is concerned. The graphics pipeline is just a client, albeit one with some funky add-ons.
Each CU has four SIMDs, each of which can host up to 10 wavefronts. The SIMDs are designed to run for 4 cycles per issue, so a CU that isn't stalling is at a minimum running 4 different wavefronts at the same time.
Where those wavefronts come from isn't restricted, although the optimal mix would depend on occupancy and the tendency for contention.
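As a purely illustrative sketch of that occupancy picture (the 4 SIMDs and 10 wavefronts per SIMD come from the public GCN documentation; the packing policy below is made up for illustration):

```python
# Toy model of resident wavefronts on one GCN CU: 4 SIMDs x 10 wavefront slots each.
SIMDS_PER_CU   = 4
WAVES_PER_SIMD = 10
SLOTS_PER_CU   = SIMDS_PER_CU * WAVES_PER_SIMD  # 40 resident wavefronts max

def fill_cu(graphics_waves, compute_waves):
    """Pack whatever mix fits; a slot doesn't care whether its wave is graphics or compute."""
    resident = []
    for kind, count in (("graphics", graphics_waves), ("compute", compute_waves)):
        take = min(count, SLOTS_PER_CU - len(resident))
        resident += [kind] * take
    return resident

mix = fill_cu(graphics_waves=25, compute_waves=25)
print(len(mix), mix.count("graphics"), mix.count("compute"))  # 40 25 15: one CU, mixed workload
```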
 

No. 25.6GB/s is the standard for virtually all high-end Intel and AMD CPUs. The latest APUs go a little higher, but not as high as 40GB/s. Only Sandy Bridge-E gets higher than 40GB/s, and that's hardly the typical PC.

Not that the comparison made any sense whatsoever considering it compared unified system+graphics memory to system memory only.
 
The statement was that there were special CU resources for each type of workload. While not impossible, it isn't necessary.

John Carmack hinted at some intelligent design choice(s) in Orbis.

Mark Cerny spoke about some extensions in a mangled Japanese interview.

Would be nice if someone can tell what exactly they were referring to. :p

Where did that FXAA creator go? He removed his tech speculation post, but I remember he had some of his own ideas in it.
 
What do you base that speculation on?

AMD have said it will be the fastest single graphics card available. Obviously that has to be taken with a HUGE grain of salt, but if true, then in order to beat the 690 it would have to be composed of something very close to 2x 7970 GEs. One 7970 GE sports 4.3TFLOPS, so 2x that would be 8.6TFLOPS. Slightly in excess of 4x Orbis, which would be 7.36TFLOPS. Texture throughput would be at a similar ratio, but memory bandwidth would only be 3.3x higher and both fill rate and geometry setup only 2.5x higher. So I took 4x as a rough ballpark.
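Reproducing that arithmetic with the numbers quoted above and the ~1.84 TFLOPS Orbis figure from earlier in the thread (all of it still rumour-grade):

```python
# Paper-spec ratios only; real-world performance won't scale this cleanly.
tflops_7970ge = 4.3
tflops_7990   = 2 * tflops_7970ge   # 8.6 TFLOPS if it really is ~2x 7970 GE
tflops_orbis  = 1.84                # rumoured Orbis figure

print(tflops_7990 / tflops_orbis)   # ~4.7x on raw FLOPS
print(4 * tflops_orbis)             # 7.36 TFLOPS, so the 7990 would sit slightly above 4x Orbis
```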
 
John Carmack hinted at some intelligent design choice(s) in Orbis.

Mark Cerny spoke about some extensions in a mangled Japanese interview.

Would be nice if someone can tell what exactly they were referring to. :p
Carmack's statement was deliberately vague and may have been at a higher level concerning the platform and its organization.
To take that as a sign that there are special resources, you'd first have to assume that carving special-purpose units out of a design that can already do all of this without modification is somehow the intelligent choice.
There are some possible reasons why making some CUs somewhat less available for graphics work may be worthwhile, but not with 4 magically different CUs or the like.

Cerny's mangled translation was dominated by data movement optimizations and cache management, not reserved resources.

Where did that FXAA creator go? He removed his tech speculation post, but I remember he had some of his own ideas in it.
For Orbis, he postulated how low-level access coupled with a real-time scheduler could be used for a more responsive use of the GPU.
We now know that the PS4's OS is based on FreeBSD, but I'm not certain about the real-time aspect.
 