PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

Status
Not open for further replies.
AMD tended to increase memory latency in tandem with increases in the OoO and latency-hiding capabilities of the cores, with varying levels of success.

Jaguar's resources are more limited, which means it shouldn't be expected to compensate for the same kind of latencies that Bulldozer or Trinity are expected to deal with.
The cycle numbers for Durango's L2 to L2 cache hits point to a memory pipeline that is likely at least somewhat longer in latency than Trinity's, though it isn't known whether the remote snoop is additive to main memory latency.

It may be that the much lower clocks are what AMD is counting on to help mitigate the IPC losses due to trips to memory.

As for the comment concerning how AMD arbitrates under load: I haven't run across a review that runs a cache/mem latency benchmark while the GPU is also running at speed. The latency benchmarks for APUs tend to measure CPU memory latency when the GPU isn't fighting for access, and even those results are mediocre.
 
prefetching across the range of modern cores is good enough to make software prefetching a net loss in the general case.
I have come to the same conclusions in my own tests. Software prefetching only increases performance in very specific use cases on modern CPUs. But good data structure design is more important than ever. For example, using bucketed lists instead of simple pointer lists yields up to 10x gains even on Sandy/Ivy Bridge CPUs. It's still up to the programmer to keep the payload per cache miss as high as possible. The more instructions you have between two potential cache misses, the better.
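To make the bucketed list idea concrete, here is a minimal sketch of what such a structure might look like (C++14). This is not the code from the tests mentioned above: the class name `BucketedList`, the bucket size of 16 and the method names are all made up for illustration. The point is only that each pointer hop (one potential cache miss) brings in a whole bucket of payload instead of a single element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical illustration of a "bucketed" (unrolled) singly linked list.
// Each node stores up to kBucketSize values contiguously, so one pointer
// dereference is amortized over many elements. T must be default-constructible.
template <typename T, std::size_t kBucketSize = 16>
class BucketedList {
public:
    ~BucketedList() {
        // Unlink nodes one by one to avoid deep recursive destruction.
        while (head_) head_ = std::move(head_->next);
    }

    void push_back(const T& value) {
        // Start a new bucket only when there is none or the last one is full.
        if (!tail_ || tail_->count == kBucketSize) {
            auto node = std::make_unique<Node>();
            Node* raw = node.get();
            if (tail_) tail_->next = std::move(node);
            else head_ = std::move(node);
            tail_ = raw;
            ++buckets_;
        }
        tail_->items[tail_->count++] = value;
        ++size_;
    }

    // Visits every element in insertion order; the inner loop runs over
    // contiguous memory inside one bucket.
    template <typename Fn>
    void for_each(Fn&& fn) const {
        for (const Node* n = head_.get(); n; n = n->next.get())
            for (std::size_t i = 0; i < n->count; ++i)
                fn(n->items[i]);
    }

    std::size_t size() const { return size_; }
    std::size_t bucket_count() const { return buckets_; }

private:
    struct Node {
        std::array<T, kBucketSize> items{};
        std::size_t count = 0;
        std::unique_ptr<Node> next;
    };

    std::unique_ptr<Node> head_;
    Node* tail_ = nullptr;
    std::size_t size_ = 0;
    std::size_t buckets_ = 0;
};
```

With a bucket size of 16, walking N elements costs roughly N/16 pointer hops instead of N. The actual gain depends on element size, cache line size and access pattern, so the bucket size is something to measure rather than assume.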
It may be that the much lower clocks are what AMD is counting on to help mitigate the IPC losses due to trips to memory.
Bobcat/Jaguar have a roughly 2x lower clock ceiling compared to BD/PD. This should definitely help with memory latency (for the same latency in nanoseconds, the latency in cycles is halved). I remember that in some old latency chart Bobcat was scoring comparable memory latency results (in cycles) to Sandy Bridge (Sandy of course had around 2x higher clocks).
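The "latency in cycles is halved" point is just unit conversion, sketched below. The function name and the example figures (100 ns, 1.5 GHz, 3.0 GHz) are hypothetical and are not measurements of any of the chips discussed here.

```cpp
#include <cassert>

// Converts a memory latency in nanoseconds into core clock cycles.
// A core at f GHz executes f cycles per nanosecond, so cycles = ns * GHz.
double latency_cycles(double latency_ns, double clock_ghz) {
    assert(latency_ns >= 0.0 && clock_ghz > 0.0);
    return latency_ns * clock_ghz;
}
```

For example, a made-up 100 ns trip to memory costs `latency_cycles(100.0, 1.5)` = 150 cycles on a 1.5 GHz core, but `latency_cycles(100.0, 3.0)` = 300 cycles on a 3.0 GHz core. The slower core loses half as many issue slots per miss, even though the wall-clock wait is the same.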
 
Do you like Jaguar cores? In some of your old posts I think it looked like that. Maybe you can't even talk about this, but in case you can...
 
Do you like Jaguar cores? In some of your old posts I think it looked like that.
Jaguar is a huge jump in performance compared to its predecessor (Bobcat) and compared to the current lineup of (in-order) Atom chips. It's a tiny modern CPU core with very good performance per watt. AMD has gotten a lot of design wins lately (lots of laptops and tablets with Kabini/Temash SoCs inside them). However, we have to wait for Intel's forthcoming (Q4?) Silvermont Atom in order to draw long-lasting conclusions about Jaguar's relative strengths (and weaknesses) in the market. Haswell is also pushing down the TDP of Intel's main line of processors (at 15W nothing can compete against it in performance). ARM is scaling up performance at a rapid pace as well. A15 is a very nice core. Its NEON SIMD hardware is four wide and it processes most vector instructions at full rate (including multiply-add). And it has all the modern CPU core goodies (deep OoO execution, automatic cache prefetch, robust store forwarding, etc). My previous experience with ARM cores was the Nokia N-Gage, and it didn't even have a hardware floating point unit :)

After spending almost 6 years in in-order PPC programming, it feels refreshing to program all these modern out-of-order cores. I don't have full control anymore, but at least I don't have to analyze other people's code and solve their LHS stalls and cache misses :). Writing efficient code is so much easier on these modern CPU cores.

At the beginning I was actually quite surprised how well poorly optimized code ran on my new Intel based workstation. With these new CPUs and new compilers you can actually use function calls (small functions without inlining, virtual functions), exceptions (not throwing), actual loops (no unrolling) and (predictable) branches with practically no performance cost. These new Xeons have 8 cores (16 threads), huge L3 caches and quad channel memory controllers, so there's plenty of raw performance under the hood. And that's very good for productivity. Optimizing your content production tools is as important as optimizing the final game product.
 
Awesome description ;)
 
Thank you for your viewpoint!
 
Sebbi, what do you think of them inside the consoles though? I know Sony/MS didn't have much of a choice really, but from a performance point of view what are your thoughts?

For a pure gaming application in an embedded system (with 8 cores), will the consoles have legs?
 
Some slides from a presentation Cerny gave this morning at Gamelab 2013 in Barcelona.

Development time needed to get an engine up and running, compared:
[slide image: 9k8rFRt.jpg]


The 'other' PS4 :)
[slide image: ZbyUT5D.png]


He mentioned how they could've made the PS4 more powerful but felt they would be approaching PS3 levels of overbuilt territory.
 
Yeah, I am more interested in the developer's view of the system. So hopefully, the Develop keynote has more juicy details. :)
 
Cerny's presentation was very good. He talked about how Yoshida has helped him constantly for over 20 years: he gave Cerny the first US-bound PS2 devkit, allowed him to establish the ICE Team [Sony ninjas!], got him a chair on the PS3 design team, and supported him when he wanted to become system architect of the PS4. Yosp owns.

Cerny described himself, Andrew House and Yoshida as a great team [Three Musketeers] that has worked closely together at Sony for 20 years, all with aligned goals for gaming. He expressed the wish that the three of them will someday become as successful as the core leadership team that made Nintendo so great.



Btw, an interesting piece of news from GAF insider zomgbbqftw:
I don't think this is thread worthy so I'm posting it here.

Had a conversation with a source in semiconductors this morning, who said that one of the major memory manufacturers is currently testing 8Gbit GDDR5 chips of the same specification that would be required for 8GB at 176GB/s. The source said that the chips would be ready for commercial devices by the middle of 2014 and would lower the cost for Sony by around a third if they use these chips. They added that it seems like these chips are built to order for one major buyer (Sony, I'm guessing).
http://www.neogaf.com/forum/showthread.php?p=66682231#post66682231
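For anyone wanting to sanity-check the numbers in that quote, the arithmetic can be sketched like this. The function names are made up, and the 256-bit bus width is an assumption taken from the publicly reported PS4 specification; it is not stated in the quoted post.

```cpp
#include <cassert>

// Number of memory chips needed to reach a total capacity.
// Capacity is in gigabytes, chip density in gigabits (8 bits per byte).
int chips_needed(int capacity_gbyte, int chip_density_gbit) {
    assert(chip_density_gbit > 0);
    return capacity_gbyte * 8 / chip_density_gbit;
}

// Per-pin data rate in Gbit/s for a given total bandwidth (GB/s)
// spread over a bus that is bus_width_bits wide.
double per_pin_gbps(double bandwidth_gbyte_s, int bus_width_bits) {
    assert(bus_width_bits > 0);
    return bandwidth_gbyte_s * 8.0 / bus_width_bits;
}
```

Under these assumptions, `chips_needed(8, 8)` gives 8 chips where `chips_needed(8, 4)` gives 16 with 4Gbit parts, and `per_pin_gbps(176.0, 256)` works out to 5.5 Gbit/s per pin. Halving the chip count is at least consistent with the cost reduction the source describes, though the "around a third" figure itself can't be derived from this.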

:)
 
Wouldn't be surprised if the OK for 8GB was contingent on manufacturers signing up to make these possible. I'm guessing shrinks will be less painful too and we'll see 20nm production APUs in consoles by 2015.

Sony subsidizing enthusiast graphics card price reduction :D
 
Why would SoC shrinks be less painful based on whether there's going to be 8Gbit GDDR5? Those operate in different manufacturing realms.

The high-density GDDR5 could make compact form factor SoCs with decent bandwidth and good-enough capacity possible, but the economics are uncertain for SoCs beyond the PS4.
 