Predict: The Next Generation Console Tech

Status
Not open for further replies.
Hey guys,

I have a few questions about the rumoured APU and GPU combination in the new PlayStation, if that is agreeable. Maybe someone can bring a little light into my darkness.

I stumbled upon this slide from the Fusion Developer Summit, which took place in June 2012. The slide deals with GPGPU algorithms in video games. There are a couple of details that are probably interesting when speculating about a next-generation gaming console.

As far as I understand, AMD argues that today GPGPU algorithms are used for visual effects only, for example physics computations for fluids or particles. That is because developers face a serious bottleneck on systems where CPU and GPU have separate memory pools, which AMD calls the copy overhead: the copying between the CPU and the GPU can easily take longer than the processing itself. Because of this, game developers only use GPGPU algorithms for visual effects whose results don't need to be sent back to the CPU. AMD's solution to this bottleneck is a unified address space for CPU and GPU, along with other features announced for the upcoming 2013 APUs Kabini (and Kaveri).
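To make the copy-overhead argument concrete, here is a toy Python model of one GPGPU dispatch over a PCIe-style bus. The bus speed, buffer size and compute time are invented for illustration, not measured values:

```python
# Toy model of AMD's "copy overhead" argument: without a unified address
# space, every GPGPU job pays bus-transfer costs on top of the compute.
# All numbers are hypothetical, chosen only to illustrate the effect.

def gpgpu_time_ms(data_mb, compute_ms, bus_gb_s=8.0, copy_back=True):
    """Total wall time for one GPGPU job: upload + compute (+ download).

    MB divided by GB/s conveniently yields milliseconds (1 GB/s = 1 MB/ms).
    """
    copy_ms = data_mb / bus_gb_s
    total = copy_ms + compute_ms
    if copy_back:
        total += copy_ms  # results have to travel back to the CPU
    return total

# A 64 MB particle buffer with only 2 ms of actual GPU work:
with_readback = gpgpu_time_ms(64, 2.0, copy_back=True)    # 18 ms total
no_readback = gpgpu_time_ms(64, 2.0, copy_back=False)     # 10 ms total
print(with_readback, no_readback)
```

In this model the copies take 16 of the 18 ms when results must return to the CPU, which is why (per the slide) developers restrict GPGPU to visual effects that skip the readback, and why a unified address space, which removes both copy terms, matters.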

But these features alone only eliminate the copy overhead. Developers still have to deal with another bottleneck, namely a saturated GPU. This problem is critical for GPGPU in video games, since the GPU has to handle both game code and GPGPU algorithms at once. I'm not sure whether this bottleneck only exists for thick APIs like DirectX or whether it also limits an APU that is coded directly to the metal. In any case, AMD claims that a saturated GPU makes it hard for developers to write efficient GPGPU code. To eliminate this bottleneck AMD mentions two solutions: either wait for a 2014 HSA feature called Graphics Pre-Emption, or use an APU for the GPGPU algorithms and a dedicated GPU for graphics rendering. The latter is what AMD explicitly recommends for video gaming, and they even bring up the similarities to the PlayStation 3, which famously uses its SIMD co-processors for all kinds of tasks.
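The saturated-GPU trade-off can be sketched with a toy frame-budget model. Without pre-emption a single GPU runs graphics and compute back to back; with a separate APU the two workloads overlap. All timings are hypothetical:

```python
# Sketch of the "saturated GPU" argument at 60 fps. The workload
# durations below are invented for illustration, not real profiles.

FRAME_MS = 1000 / 60  # ~16.7 ms per frame

def fits_single_gpu(graphics_ms, compute_ms):
    # One device must run both workloads serially within the frame.
    return graphics_ms + compute_ms <= FRAME_MS

def fits_apu_plus_gpu(graphics_ms, compute_ms):
    # Dedicated GPU renders while the APU runs compute in parallel,
    # so only the longer of the two workloads matters.
    return max(graphics_ms, compute_ms) <= FRAME_MS

print(fits_single_gpu(14.0, 5.0))    # 19 ms serial: misses the budget
print(fits_apu_plus_gpu(14.0, 5.0))  # 14 ms overlapped: fits
```

Graphics Pre-Emption would attack the same problem differently, by letting compute work interleave with rendering on one device instead of requiring a second one.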


I would like to know what you guys think about these slides.


What if AMD were building a 28nm APU for Sony focused solely on GPGPU, for example four big Steamroller cores with very fast threads in conjunction with a couple of MIMD engines? Combine it with a dedicated GPU and a high-bandwidth memory solution and you have a pretty decent next-gen console.

I would also like to know if an APU + GPU + RAM system in package is possible with 2.5D stacking, which was forecasted by Yole Development for the Sony PlayStation 4, for IBM Power8 and Intel Haswell.

And since Microsoft is rumoured to have a heavily customized chip with a "special sauce", could that mean they paid AMD to integrate the 2014 feature Graphics Pre-Emption in the XBox processor, so they can go with one single ultra-low latency chip instead of a FLOP-heavy system in package?

This is interesting, especially this special sauce speculation [Graphics Pre-Emption].
 
What's with all the steamroller talk for the CPU cores in Orbis? Sweetvar specifically mentioned that Sony switched over to Jaguar.
 

If you wanted to leak some hints without making it too obvious, would you consider just making it look as if you were asking some innocent (but conspicuously informative) questions?

:devilish:
 
This might be interesting for those discussing AMD FX CPU power consumption in relation to consoles.

http://www.xtremesystems.org/forums/showthread.php?284577-8350-Power-Consumption

Just a quick summary of The Stilt's great work:

Vishera (4 modules / 8 cores, 16 MB L2+L3 cache, high-speed memory interface and interconnect interface):
- 45W TDP @ 2.7GHz, 0.925V
- 25W TDP @ 1.8GHz, 0.7875V (this one was passively cooled)

So there is hope for even better results than that when using more power-efficient Jaguar cores on a better 28nm process, designed specifically to meet game-console demands. Of course, the above results, even though fully stable, are hand-tuned! Mass-produced chips would lose 5%-15% of that efficiency to yield and manufacturing margins.
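As a sanity check, The Stilt's two operating points can be compared against the classic dynamic-power relation P ≈ C·f·V² (the capacitance term cancels in the ratio). Leakage and uncore power are ignored, so this is only a rough consistency check, not a prediction:

```python
# Compare the measured 45W -> 25W drop against what frequency and
# voltage scaling alone would predict via P ~ f * V^2.

def power_ratio(f1, v1, f2, v2):
    """Predicted P2/P1 from dynamic power scaling alone."""
    return (f2 / f1) * (v2 / v1) ** 2

predicted = power_ratio(2.7, 0.925, 1.8, 0.7875)  # ~0.48
measured = 25 / 45                                 # ~0.56
print(round(predicted, 2), round(measured, 2))
```

The measured ratio sits a bit above the pure-dynamic prediction, which is what you would expect: leakage and fixed uncore power don't scale down with frequency, so the low-clock point saves slightly less than f·V² alone suggests.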
 
He hasn't been online since an hour after his post. No one quoted or mentioned his post in that time.
Best and sharpest first post ever by a person with a very telling nickname. There's more insight in his one (and only) post than in the last dozen pages of this thread.

Just my opinion, of course. You are entitled to a different one.
 
Anyway, AMD claims that a saturated GPU makes it hard for developers to write efficient GPGPU code. To eliminate this bottleneck AMD mentions two solutions: either wait for a 2014 HSA feature called Graphics Pre-Emption, or use an APU for the GPGPU algorithms and a dedicated GPU for graphics rendering. The latter is what AMD explicitly recommends for video gaming, and they even bring up the similarities to the PlayStation 3, which famously uses its SIMD co-processors for all kinds of tasks.

Down the road AMD wants you to buy an APU and a GPU, rather than only a GPU for your existing PC or only an APU.
They sell more hardware this way :)

AMD could say HSA features aren't quite there yet, and that you have to wait for Steamroller to get the single address space feature on a gaming desktop (unless Richland brings it?); in the meantime we advise you to buy an Intel CPU, an AMD GPU and a low-end Nvidia GPU for PhysX. But that wouldn't be good commercial sense :oops:
 

Good post with interesting ideas; in a short time you're already considered the source of a new leak on NeoGAF :LOL:.
The approach taken by the PS4 seems better. I do not understand how MS wants to compete. If the rumour of a 1.2 TFLOPS GPU is true, this "special sauce" Graphics Pre-Emption could definitely improve efficiency and so on, but it can't do miracles.
Why should MS spend so much on customization only to end up with such a weak GPU? :?:
 
Why should MS spend so much on customization only to end up with such a weak GPU? :?:

MS has the better HSA now, with its Jaguar-derived CPU.
The PS4 may be in "trouble" if it has to use Richland instead of Steamroller, but I don't know Richland's feature level.

Hence the trouble with the slides. They are well written, understandable by anyone, and give examples of things that can be done (and that we don't necessarily think about, as current GPGPU is mostly useless for them).
But the dates in the roadmap have slipped, at least for Steamroller (Kaveri), which moved from 2013 to 2014. And what was planned for 2014, I don't know where it is now.

The idea of MS getting a "special sauce" is funny; it would mean the Xbox could be well more "HSA" than the PS4, gaining more flexibility and easier-to-achieve efficiency, which is not a bad thing. But I don't think it's very likely.

To make a parallel with the early 2012 roadmap, the next Xbox will likely be at the "2013" level, and the PS4 at "let's hope it ends up at 2013, but it could be 2012".
Only a future PC would be at "2014" (and maybe out in 2015). If AMD could really accelerate the next GPU architecture to include it in the next Xbox, or do an architecture halfway between the Radeon 8000s and 9000s that has graphics pre-emption, MS would benefit, and AMD would too, as it leads to more GPGPU adoption in gaming. The Xbox would be at "2014", and a PC at the "2013" or "2014" level would be desirable. But I doubt AMD was able to deliver this.
 
Why should MS spend so much on customization only to end up with such a weak GPU? :?:

I wouldn't judge a GPU by flops. RSX is supposed to have something like 50% more shader flops than Xenos.

God damn, Wikipedia has the wrong clock speed for RSX. I thought they fixed that.

Anyway, two GPUs, one with 2.5 TFLOPS and the other with 3.8. Which is faster?

They're both about the same - 670 vs 7970.
 
More correctly you should just say, it depends. At some tasks the 7970 will absolutely crush it (bitcoin) at others they are similar.
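For reference, those 2.5 vs 3.8 TFLOPS figures fall out of the usual peak-throughput formula, shader count × 2 FMA ops × clock, using the public reference specs of the two cards (core counts and base clocks as published by Nvidia and AMD):

```python
# Theoretical peak single-precision throughput: each shader can issue
# one fused multiply-add (2 flops) per clock.

def peak_tflops(shaders, clock_ghz):
    return shaders * 2 * clock_ghz / 1000

gtx_670 = peak_tflops(1344, 0.915)   # GK104 Kepler, 915 MHz base clock
hd_7970 = peak_tflops(2048, 0.925)   # Tahiti GCN, 925 MHz
print(round(gtx_670, 2), round(hd_7970, 2))
```

The point of the comparison stands: peak flops is an upper bound on arithmetic throughput, and real game performance also depends on architecture, utilization and bandwidth, which is how a 2.46 TFLOPS card can match a 3.79 TFLOPS one.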
 
I wouldn't judge a GPU by flops. RSX is supposed to have something like 50% more shader flops than Xenos.

RSX was also not unified. A lot less efficient potentially.
 
Anyway, two GPUs, one with 2.5 TFLOPS and the other with 3.8. Which is faster?

They're both about the same - 670 vs 7970.

Can you explain that a tiny bit more. Most interesting thing I have read in weeks.
 
The difference being, when they are both made by the same company the variation is likely going to be less. Dave seems to be dropping hints that GDDR5 isn't out of the question as an option. Now, GDDR5 on an interposer would give godly bandwidth, but I'm not sure why you'd go that route, as it seems to combine the cost disadvantages of both worlds. So it's more likely to be stacked RAM or conventional GDDR5.
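The bandwidth trade-off can be put in rough numbers with the standard formula, bandwidth = bus width in bytes × per-pin data rate. The bus widths and data rates below are hypothetical configurations chosen for illustration, not leaked specs:

```python
# Back-of-the-envelope memory bandwidth for the options mentioned.

def bandwidth_gb_s(bus_bits, gbps_per_pin):
    """Peak bandwidth in GB/s for a bus of the given width and speed."""
    return bus_bits / 8 * gbps_per_pin

gddr5_256 = bandwidth_gb_s(256, 5.5)   # conventional 256-bit GDDR5
wide_io = bandwidth_gb_s(1024, 1.0)    # very wide, slow stacked RAM bus
print(gddr5_256, wide_io)
```

This shows why stacked RAM on an interposer is attractive even at modest per-pin speeds: the interposer makes extremely wide buses cheap per pin, whereas GDDR5 gets its bandwidth from high clocks on a narrow bus. Putting high-clock GDDR5 on an interposer would indeed pay for both the fast signaling and the interposer at once.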
 
It looks like power consumption and die size remain the most accurate metrics for expected performance ;)
Except if the architecture is revolutionary, but if they are both from AMD, what are the chances that one gets an amazing new thing and not the other?
 