Technical Comparison: Sony PS4 and Microsoft Xbox One

Why would the GPU work on the Kinect depth buffer? AFAICT they have offloaded all the Kinect chores to dedicated silicon.
I suspect they haven't exactly:

1) Take the very high-resolution IR image from the camera sensor and convert it into a lower-resolution depth image. This previously occurred on the PrimeSense chip in Kinect 1.

2) Take that depth image, locate 'points of interest/shapes/edges/whatever', and turn that into 'skeletal models'.

3) From that, work out the motion from the previous frame and attempt to match the motion to 'commands'.

Step 1, along with aligning the video feed to the depth buffer, any compression, and generation of an IR stream, occurs inside Kinect 2.

The rest of it, given the resources required (and the flexibility of changing code), seems more likely to run inside the Xbox One.
 
I've got no idea about the HDMI input, but I'd be surprised if the Kinect data wasn't present in ESRAM before it gets processed by the GPU (the depth buffer may only be around 600 KB [16-bit 640x480?]).
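To put a rough number on that guess (pure arithmetic; the 640x480 / 16-bit figures are just the estimate above, not confirmed Kinect 2 specs):

```python
# Back-of-the-envelope size of the guessed depth buffer.
# 640x480 at 16 bits per pixel is the guess above, not a confirmed spec.
width, height = 640, 480
bytes_per_pixel = 2                     # 16-bit depth value

size_bytes = width * height * bytes_per_pixel
print(f"{size_bytes} bytes = {size_bytes / 1024:.0f} KiB")  # 614400 bytes = 600 KiB
```

So a single depth frame would be around 600 KB, which fits comfortably inside the 32 MB of ESRAM.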

Unless there is some huge advantage to having it there (it needs massive bandwidth or gets a tremendous win from the low latency), which I doubt given the type of processing you are likely to do on it, they almost certainly do any GPU work on it from main memory and probably to main memory. It's just not worth the copy to put it there unless you get something out of it.

My understanding is that the "Kinect" skeletal algorithm is much more akin to a search algorithm than to an image-processing algorithm. I would guess they do some sort of dimensionality reduction followed by a search; the former might be faster on the GPU, but that doesn't mean it's necessarily done there.
Of course I'm just speculating.
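Purely to illustrate what "dimensionality reduction followed by a search" could look like in the abstract, here is a toy Python sketch. Everything in it (the feature sizes, the random projection, the brute-force nearest-neighbour match) is made up for illustration and is not claimed to be what the actual Kinect pipeline does:

```python
# Toy "reduce then search" pose matcher. Purely illustrative, not MS's algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Pretend database of 1,000 known poses, each described by a 1024-dim depth feature.
pose_db = rng.standard_normal((1_000, 1024)).astype(np.float32)

# Dimensionality reduction: a random projection down to 32 dims
# (a stand-in for whatever learned projection might really be used).
projection = rng.standard_normal((1024, 32)).astype(np.float32)
db_reduced = pose_db @ projection            # this kind of matmul maps well to a GPU

def match_pose(depth_features: np.ndarray) -> int:
    """Return the index of the closest known pose to the observed frame."""
    query = depth_features @ projection      # reduce the live frame the same way
    dists = np.linalg.norm(db_reduced - query, axis=1)
    return int(np.argmin(dists))             # brute-force nearest-neighbour search

frame = rng.standard_normal(1024).astype(np.float32)
print("best matching pose:", match_pose(frame))
```

The reduction step is embarrassingly parallel, which is why it might be attractive on the GPU, while the search part could just as easily live on the CPU.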
 
I'm not sure the eSRAM will provide any bandwidth benefits (outside of its actual bandwidth) to the XBONE in comparison to the PS4. Latency, sure, but I don't see how having a low-latency cache can give it more bandwidth than it already has (unless I'm misunderstanding the question).

....
The g-buffer writes require 12 bytes of bandwidth per pixel, and all that bandwidth is fully provided by EDRAM. For each rendered pixel we sample three textures. Textures are block compressed (2xDXT5+1xDXN), so they take a total of 3 bytes per sampled texel. Assuming a coherent access pattern and trilinear filtering, we multiply that cost by 1.25 (25% extra memory touched by trilinear), and we get a texture bandwidth requirement of 3.75 bytes per rendered pixel. Without EDRAM the external memory bandwidth requirement is 12+3.75 bytes = 15.75 bytes per pixel. With EDRAM it is only 3.75 bytes. That is a 76% saving.
....

From what I understand, once you have the data in the EDRAM, you don't have to tap into the system RAM; you are actually using the bandwidth of the EDRAM, so that will lessen the strain on the system RAM.
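For anyone who wants to check it, the arithmetic in the quoted example works out like this (just reproducing the numbers given in the quote):

```python
# Per-pixel bandwidth arithmetic from the quoted post.
gbuffer_bytes    = 12.0     # G-buffer writes per rendered pixel
texture_bytes    = 3.0      # 2x DXT5 + 1x DXN, 1 byte per sampled texel each
trilinear_factor = 1.25     # ~25% extra memory touched by trilinear filtering

tex_bw        = texture_bytes * trilinear_factor   # 3.75 bytes/pixel
without_edram = gbuffer_bytes + tex_bw             # 15.75 bytes/pixel to external memory
with_edram    = tex_bw                             # G-buffer traffic stays on-chip

saving = 1.0 - with_edram / without_edram
print(f"external bandwidth: {without_edram} -> {with_edram} bytes/pixel "
      f"({saving:.0%} saving)")                    # prints a 76% saving
```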
 
I'm not sure which switch penalty would be four cycles. Four cycles is the round-robin delay for instruction issue amongst the SIMDs, which is part of steady-state execution.

Wavefronts don't really switch out until they are done executing, and I'd imagine the resource release and startup of a new wavefront should take longer.

Is it possible/feasible to "overload" the threads? E.g. in SPU programming, stalling is also a problem. As I understand it, developers overload the SPUlet with multiple jobs, so that if one is waiting for data the SPUlet simply picks another one to work on.

In that sense, another wavefront would get scheduled if the system or the developer knows things will stall, based on earlier runs.
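For what it's worth, the SPU "overloading" pattern described above can be sketched in a few lines: each job yields whenever it would have to wait on data, and the worker just picks up another ready job. This is a toy cooperative model, not actual SPU code:

```python
# Toy sketch of "overload the worker with multiple jobs": a job yields when it
# would stall on a data fetch, and the worker switches to another queued job.
import collections

def job(name, steps):
    for i in range(steps):
        # ... do some work on data already in local store ...
        print(f"{name}: step {i}")
        yield                        # pretend we now wait on a DMA/memory fetch

def worker(jobs):
    queue = collections.deque(jobs)
    while queue:
        current = queue.popleft()
        try:
            next(current)            # run the job until it "stalls"
            queue.append(current)    # park it and switch to another job
        except StopIteration:
            pass                     # job finished, drop it

worker([job("A", 3), job("B", 2), job("C", 1)])
```

GPUs do essentially the same thing in hardware: as long as other resident wavefronts are ready, the SIMD keeps issuing from them while the stalled one waits.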
 
I suspect they haven't exactly:

1) Take the very high-resolution IR image from the camera sensor and convert it into a lower-resolution depth image. This previously occurred on the PrimeSense chip in Kinect 1.
Kinect 2 operates in a completely different way. The sensor measures time of flight for each pixel. It effectively means you read depth data directly out of the depth sensor.

The depth buffer is lower resolution than the image sensor and only a small factor bigger than K1 on paper, but has much higher fidelity.
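To make the time-of-flight idea concrete, distance falls straight out of how long the light takes to come back (a rough sketch; real ToF sensors measure the phase shift of modulated IR and do this conversion in dedicated hardware):

```python
# Per-pixel time-of-flight: distance = speed of light * round-trip time / 2.
# Illustrative only; the actual sensor works on phase shift, not a stopwatch.
C = 299_792_458.0                       # speed of light, m/s

def tof_depth_m(round_trip_seconds: float) -> float:
    return C * round_trip_seconds / 2   # the light travels out and back

# e.g. a ~13.3 ns round trip corresponds to roughly 2 m
print(f"{tof_depth_m(13.3e-9):.2f} m")
```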

2) Take that depth image, locate 'points of interest/shapes/edges/whatever', and turn that into 'skeletal models'.
This is where I believe there is dedicated silicon involved.

3) From that, work out the motion from the previous frame and attempt to match the motion to 'commands'.
That is simply gesture recognition. The biggest computational task by far is in your point 2) above.

Cheers
 
I really want to know: is the ARM core going to free up the PS4's 8 cores and leave them just for gaming, or is this just wrong, made-up BS by fanboys?
 
I saw the Wired demo of the XB1 Kinect, which showed a Kinect Fusion-like feature. I was pretty impressed compared to videos of Kinect Fusion using the Kinect for PC camera from a few months back. What sticks out is the hardware requirements for Kinect Fusion on PC: it requires a 3 GHz multicore CPU and a 2 GB DX11 GPU, preferably an HD 7850 or a GTX 680.

I wonder if it's the XB1's SoC design with its 30 GB/s interconnect that allows for lesser hardware. Or if the GPU requirement is simply a reality of the large amount of video RAM needed. Or if MS spent most of its time catering to the XB1 hardware and its solution simply doesn't port seamlessly to a traditional PC setup. Maybe a combination?

Seems like a computationally hungry feature; I wonder what MS wants to do with it outside of niche markets. HD 7850-like performance is not going to be standard on consumer PCs anytime soon. And 3D voxel modelling doesn't seem very applicable to a gaming console.
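Part of why it is so hungry is simply the dense voxel volume. As a rough illustration (the 512^3 grid and 4 bytes per voxel are assumed "typical" Kinect Fusion figures, not anything MS has confirmed for the XB1 feature):

```python
# Why dense voxel reconstruction eats video memory: a Kinect Fusion-style
# TSDF volume stores a distance value (plus a weight) for every voxel.
resolution      = 512     # voxels per side of the cubic volume (assumed)
bytes_per_voxel = 4       # e.g. 16-bit distance + 16-bit weight (assumed)

volume_bytes = resolution ** 3 * bytes_per_voxel
print(f"{volume_bytes / 2**20:.0f} MiB")   # 512 MiB for the volume alone
```

Half a gigabyte just for the volume, before any working buffers, goes a long way towards explaining the 2 GB GPU recommendation.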
 
I really want to know: is the ARM core going to free up the PS4's 8 cores and leave them just for gaming, or is this just wrong, made-up BS by fanboys?
The nature of any ARM cores, and their usage, has not been fully defined.
It is possible there's at least one for security offload, although Sony was rumored at one point to be going its own way in that regard.

The background processor may be an ARM, but the usage model given so far doesn't require it being tightly integrated with the x86 cores or needing to interact with them at a low level. The Killzone demo shows that up to 2 x86 cores were not being used by the job system, and the discussion of system mutexes being fast and directly affecting the x86 cores probably means that the x86 cores are handling low-level OS functions for the game. I don't know whether an ARM core could participate in low-level OS functionality as a peer to the x86 cores outside of TrustZone, and I'd expect it to be more of a hindrance.

That doesn't stop an ARM core or two or more being used in the various subsystems in the console. That's actually the case for many slave devices and secondary processors.
 
You are correct, the GPU will switch to another wavefront (a 64-thread work unit) on a memory stall. To make efficient use of the GPU you need enough wavefronts available so that memory stalls don't impact you too badly. Switching wavefronts has a cost though (something around 4 cycles, I think). A memory stall can cost upwards of a thousand cycles that you have to fill with other wavefronts. Like I said, a big GPU cache will not help for large streaming jobs, only for small, diverse jobs. It remains to be seen if it will make any practical difference once developers start coding to it.
Switching wavefronts doesn't have a direct cycle cost. The cost is indirect via cache/memory contention.
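Either way, the practical question is how many resident wavefronts you need so that other work covers the stall. A rough occupancy calculation (the ~1000-cycle stall comes from the post above; the amount of ALU work per wavefront between misses is an arbitrary assumption for illustration):

```python
# Rough latency-hiding arithmetic: enough other wavefronts must exist whose
# combined ALU work covers a memory stall.
import math

stall_cycles              = 1000   # rough cost of a miss to memory (from the post above)
alu_cycles_between_misses = 100    # assumed useful work per wavefront per miss

wavefronts_needed = 1 + math.ceil(stall_cycles / alu_cycles_between_misses)
print(wavefronts_needed)           # ~11 wavefronts resident to hide the stall
```

That is in the same ballpark as the 10 wavefronts a GCN SIMD can keep resident, which is why ALU-light, bandwidth-heavy shaders struggle to hide misses.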
 
From what I understand, once you have the data in the EDRAM, you don't have to tap into the system RAM; you are actually using the bandwidth of the EDRAM, so that will lessen the strain on the system RAM.

But the PS4 still has the total bandwidth advantage. What you were saying would only make sense if the ESRAM provided more memory bandwidth than the PS4's GDDR5.
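For reference, the peak numbers that were widely reported at the time (treat them as approximate; the eSRAM figure in particular was only a reported number, and the split pools aren't directly additive in practice):

```python
# Peak bandwidth from bus width x data rate, using the widely reported figures.
def bus_gb_s(width_bits: int, data_rate_gt_s: float) -> float:
    return width_bits / 8 * data_rate_gt_s

ps4_gddr5 = bus_gb_s(256, 5.5)     # ~176 GB/s (256-bit GDDR5 at 5.5 Gbps/pin)
xb1_ddr3  = bus_gb_s(256, 2.133)   # ~68 GB/s (256-bit DDR3-2133)
xb1_esram = 102.0                  # GB/s, the commonly reported eSRAM figure

print(f"PS4 GDDR5: {ps4_gddr5:.0f} GB/s, "
      f"XB1 DDR3 + eSRAM: {xb1_ddr3 + xb1_esram:.0f} GB/s")
```

Even adding the two XB1 pools together (which overstates what a real workload sees), the totals land in the same ballpark, so the eSRAM is better thought of as recovering bandwidth than as adding a surplus over the PS4.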
 
Enough with the rumors and back to the Technical Comparisons, after all that's what the thread title says.
 
And the reading of the register file.

What switching are you talking about? Each SIMD in a CU can juggle 10 threads, and when a SIMD comes up for scheduling (once every 4 clocks), it can choose any of those 10 without any extra cost. There is no extra cost related to switching and the register file. Hiding memory latency only becomes a problem when all 10 threads are constantly waiting for memory. As far as I understand GCN scheduling, if there is an unfinished memory op in a SIMD, there is no way for that workload to be evicted from it, so there cannot be any higher switching costs.
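To make the round-robin picture concrete, here is a toy model of the issue pattern described above (4 SIMDs per CU, up to 10 resident wavefronts each, the scheduler visiting one SIMD per clock). It is illustrative only, not a cycle-accurate GCN model:

```python
# Toy model of CU instruction issue: round-robin over 4 SIMDs, each picking
# any resident wavefront that isn't waiting on memory. Not cycle-accurate.
import random

NUM_SIMDS      = 4
WAVES_PER_SIMD = 10

# ready[simd][wave] is True when that wavefront has no outstanding memory op
ready = [[True] * WAVES_PER_SIMD for _ in range(NUM_SIMDS)]

def issue(cycle: int) -> str:
    simd = cycle % NUM_SIMDS                 # each SIMD gets a slot every 4 clocks
    candidates = [w for w, ok in enumerate(ready[simd]) if ok]
    if not candidates:
        return f"cycle {cycle}: SIMD{simd} idles (all waves waiting on memory)"
    wave = random.choice(candidates)         # any ready wave, no extra switch cost
    return f"cycle {cycle}: SIMD{simd} issues wave {wave}"

for c in range(8):
    print(issue(c))
```

The point the model makes is the one above: picking a different resident wavefront on a SIMD's turn costs nothing extra; trouble only starts when none of a SIMD's waves are ready.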
 
XB1 downclock rumour discussion moved to the XB1 thread. This thread is only for considering design differences on a technical level. If the downclock rumour proves true, we can factor that into the discussion of how MS's choices have affected the hardware compared to Sony's.
 
Doesn't the PS4 landing at 399 with (allegedly) more power say that every design decision MS made was wrong?

OK, we don't know the BOM, and what the BOM is in year 1 may not be at all the same as in, say, year 4 (I suspect the Xbone SoC will cost less over time, while GDDR5 will stay expensive).

BUT STILL
 
Doesn't the PS4 landing at 399 with (allegedly) more power say that every design decision MS made was wrong?

OK, we don't know the BOM, and what the BOM is in year 1 may not be at all the same as in, say, year 4 (I suspect the Xbone SoC will cost less over time, while GDDR5 will stay expensive).

BUT STILL


Pricing is more of a business decision. I'm quite sure that Sony is subsidizing the console, while Microsoft may be close to break-even or, heck, selling it with a thin margin.
 
Pricing is more of a business decision. I'm quite sure that Sony is subsidizing the console, while Microsoft may be close to break-even or, heck, selling it with a thin margin.

Still, the design decisions made would make sense to me if the Xbone was at 399 and the PS4 at 499, as I expected, not vice versa.

That's what I would have expected. Then you could say: hey, that Xbone design may not have the most brute power, but it gained them something.
 
ESRAM costs maybe?

Like Markoit says, the BOM of both consoles should be close and it's just a business strategy. Also, MS has Kinect as an extra cost.
 
Still, the design decisions made would make sense to me if the Xbone was at 399 and the PS4 at 499, as I expected, not vice versa.

That's what I would have expected. Then you could say: hey, that Xbone design may not have the most brute power, but it gained them something.

Well, if MS is looking at the two machines and what they both "do" out of the box, the XBO has a long list of features now absent from the PS4, in both hardware and software (now that PSEye is not in the box, no HDMI in, etc.). Aside from "power", what bullet-point feature does the PS4 have over the XBO?

We've spent a lot of time bickering about hardware, but if yesterday was any indication, perceiving any power disparity will be very difficult for 97% of the world. This may change over time, but early on I think that much was clear.
 