Predict: The Next Generation Console Tech

Because a beefy cooling solution and custom motherboard would have to be made to support it, along with embedded GDDR5. In a console this is a given, but on PC it's not worth the hassle because the benefits are basically zero. While other PCs out there still use a discrete CPU and GPU, games cannot be designed to take advantage of such a system.

I doubt a fully-fledged GCN GPU would ever be used, that is Tahiti or its successor; more likely the cut-down Pitcairn lacking DP compute capability. But it's still a fair possibility that this is what the final product will be - an ultra-high-end APU.

I'm not really sure which is more likely, though: the above, or a custom APU similarly specced to current ones coupled with a discrete, customised mid-to-high-end GPU.
Yeah I see your point, however for me I'm actually not abandoning the prospect of seeing a Sea Islands GPU at the time of launch. It may not be more powerful, even though there's a high chance of it being so, but the advantages of a cooler and more efficient 8000-series chip sound like a more logical choice by all means. I just don't know if Sony has the timing right for this in correlation to the launch.
 
On the other hand, the worldwide technology executive from Square Enix is preparing to showcase the Agni demo on at least one next-gen console in June 2013. This kinda puts the weak vanilla A10 APU to rest, doesn't it, since we're talking about something close to a GTX 680 in power.

Well it could be another console and not the PS4 .... ;)
 
ultragpu said:
10 bucks says you're wrong. ;)
Seriously, it could be both the 720 and the PS4. Again, with the sheer market size PlayStation has in Japan, SE wouldn't dare in a million years to leave Sony out.

And I don't think Sony would be stupid enough to ignore SE's requests for performance, especially when it holds some of the most important franchises under its hood that require high performance ;)

Unless they want FF to be exclusive to a competing console, or have a disappointing-looking version on PS4.
 
They perhaps don't Crossfire well, but what if the GPU renders the world, and the APU renders characters, certain objects and other doodads, smoke, haze, particle effects and so on... Or the GPU renders everything, and the APU handles post-processing like bloom, tone-mapping, depth of field, FXAA, and possibly physics workloads.

In a deferred renderer they could render different passes quite comfortably.

Also, something that I was thinking about over the last few years is using one chip to do all of the lighting / ray-tracing & other things, & leaving the other chip free to do as much as it can with the rendering, because it doesn't have to worry about all the other tasks that the other chip can take off its hands.
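To put a rough number on what the pass-split idea above would cost in inter-chip traffic, here is a small back-of-the-envelope sketch in Python. The 1080p target, the four-render-target G-buffer layout, and the 30 fps figure are all assumptions for illustration, not anything confirmed about the hardware being discussed.

[CODE]
# Rough estimate of the per-frame data that would have to move between chips
# if one GPU builds the world / G-buffer and the other consumes it for
# post-processing or compositing. All figures are illustrative assumptions.

WIDTH, HEIGHT = 1920, 1080   # assumed 1080p render resolution
BYTES_PER_PIXEL = 4          # assumed 32-bit per render target
NUM_RENDER_TARGETS = 4       # assumed G-buffer layout (albedo, normals, depth, misc.)
FPS = 30                     # assumed frame-rate target

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL * NUM_RENDER_TARGETS
frame_mb = frame_bytes / (1024 ** 2)
link_gb_per_s = frame_bytes * FPS / (1000 ** 3)

print(f"G-buffer per frame: {frame_mb:.1f} MB")
print(f"Sustained inter-chip traffic at {FPS} fps: {link_gb_per_s:.2f} GB/s")
# Roughly 32 MB per frame and about 1 GB/s of traffic - comfortably below the
# bandwidth of the chip-to-chip links discussed later in the thread, which is
# why a pass split between the two GPUs looks plausible on paper.
[/CODE]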
 
Well, maybe the GPU in the APU is an AMD version of a "console PhysX" implementation, which will work in tandem with the discrete GPU?

In a way, the GPU in the APU is some sort of Cell SPE, but far more flexible and able to do much better and more than is currently possible on Cell.
 
A pair of APUs with 2 full-width coherent HT links could provide each die with around the same peak bandwidth from the remote memory pool as from its own, obviously with latency penalties and minus whatever share of that bandwidth the other chip is using.

The interface between the two could be smaller, but if you want the simplest relationship for the software to deal with for non-AFR, it can't be too narrow.
The GPUs would need to be designed to allow them to readily work together in this fashion.
That HT interface would need to be awfully wide. We are not speaking about the DDR2/3 interfaces of CPUs, where HT was barely able to keep pace. Look at the bandwidth numbers of GPUs in the performance region we are talking about, let alone a possible eDRAM solution (should some kind of dual-ported eDRAM sit in between the APUs?). That doesn't look like a good option to me. So, is it possible to implement? Sure. Does it make sense? In my opinion, no.
 
Also, something that I was thinking about over the last few years is using one chip to do all of the lighting / ray-tracing & other things, & leaving the other chip free to do as much as it can with the rendering, because it doesn't have to worry about all the other tasks that the other chip can take off its hands.
With GCN, one can basically partition a single GPU (assuming you were talking about GPU resources). That's way more flexible and should give higher performance from the same total number of CUs.
 
They perhaps don't Crossfire well, but what if the GPU renders the world, and the APU renders characters, certain objects and other doodads, smoke, haze, particle effects and so on... Or the GPU renders everything, and the APU handles post-processing like bloom, tone-mapping, depth of field, FXAA, and possibly physics workloads.

It sounds possible to get it working like that, and that's probably what they had in mind. My comments are based on my gaming experience using an APU+GPU, with one GPU "slightly" overpowering the other.

From what I've tested, once an additional GPU is detected and enabled, they instantly work together on rendering a single frame, or do alternate frame rendering, with no specific tasks assigned. Even physics calculations are shared along with rendering graphics. Now of course, PC gaming is still not the best way to test hardware utilization correctly, so...

When support for APUs goes up, I'm sure developers will be able to make better use of it. Right now APUs+GPUs don't share tasks the way you would imagine for games (whether manually or developer-wise).
 
Perhaps the reason they are APUs is that AMD wanted to sell them APUs and made the deal reflect this fact? Hasn't IBM, for instance, benefited greatly from the console ecosystem, in that there are significantly more developers familiar with PPC than there would have been had these consoles not existed? If we follow the same kind of thinking for AMD, they probably want developers to USE the APU's resources in a way which benefits future development on AMD hardware. Perhaps AMD didn't want them to develop a standard GPU + CPU model and would only sell them hardware which included an APU.
 
It sounds possible to get it working like that, and that's probably what they had in mind. My comments are based on my gaming experience using an APU+GPU, with one GPU "slightly" overpowering the other.

From what I've tested, once an additional GPU is detected and enabled, they instantly work together on rendering a single frame, or do alternate frame rendering, with no specific tasks assigned. Even physics calculations are shared along with rendering graphics. Now of course, PC gaming is still not the best way to test hardware utilization correctly, so...

When support for APUs goes up, I'm sure developers will be able to make better use of it. Right now APUs+GPUs don't share tasks the way you would imagine for games (whether manually or developer-wise).

Also something to think about when looking at APU + GPU in PC gaming is the fact that when they are working together, the GPU becomes limited to the speed of the main RAM in the computer & the speed of the GDDR5 basically goes out the window.


So I don't think the APU & GPU will be running in Crossfire unless the PS4 is going to have main memory that's just as fast as the VRAM, & in that case there really wouldn't be a point in having the VRAM on the GPU because they could just have one pool of RAM.

So I think the GPU in the APU will more than likely work as a co-processor handling its own compute tasks.
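To put some rough numbers behind that, here is a small sketch comparing typical dual-channel DDR3 bandwidth with the GDDR5 bandwidth of a mid-range card. The specific speeds (DDR3-1866 and a 128-bit, 4.5 Gbps GDDR5 card in the HD 7770 class) are assumed figures for illustration only.

[CODE]
# Back-of-the-envelope comparison of the bandwidth a discrete GPU sees when it
# has to work out of system RAM versus its own GDDR5.
# Assumed, typical 2012-era figures - not the specs of any rumored console.

# Dual-channel DDR3-1866: 1866 MT/s * 8 bytes per channel * 2 channels
ddr3_gb_s = 1866e6 * 8 * 2 / 1e9

# Mid-range GDDR5 card (HD 7770 class): 4.5 Gbps per pin * 128-bit bus
gddr5_gb_s = 4.5e9 * (128 / 8) / 1e9

print(f"Dual-channel DDR3-1866 : {ddr3_gb_s:.1f} GB/s")
print(f"128-bit 4.5 Gbps GDDR5 : {gddr5_gb_s:.1f} GB/s")
print(f"GDDR5 advantage        : {gddr5_gb_s / ddr3_gb_s:.1f}x")
# Any workload that ties the discrete GPU to system RAM gives up roughly that
# factor, which is the argument above for treating the APU's GPU as a
# co-processor rather than a Crossfire partner.
[/CODE]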
 
Perhaps we will see both consoles with split embedded RAM with the ROPs (similar to Xenos).

I know it's been discussed before, but maybe this setup does make sense. It would certainly be easier than manufacturing one giant APU or an APU + discrete GPU. Perhaps keeping all the shaders and CPU cores together and splitting out the ROPs has some benefit as well.

APU: 4/8-core CPU + 16-20 CU GPU (sub-200 mm² chip)
Daughter die: 32 MB of embedded RAM and 16-24 ROPs (probably around ~100 mm²)

The interconnect between the two would have to be really fast, but if they can get an interposer working that would handle it.
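As a rough sketch of why parking the ROPs next to the embedded RAM pays off, here is a small calculation of the read-modify-write traffic the ROPs generate versus what has to cross the interposer link. The ROP count, clock and per-pixel byte counts are assumptions chosen only to be in a plausible ballpark.

[CODE]
# Rough comparison of ROP blend/depth traffic (kept local to the embedded RAM
# on the daughter die) versus the shaded-fragment traffic that must cross the
# link from the APU. All numbers are illustrative assumptions.

ROPS = 16                        # assumed ROP count on the daughter die
CLOCK_GHZ = 0.8                  # assumed GPU clock
BYTES_RW_PER_PIXEL = 16          # assumed colour read+write plus depth read+write
BYTES_OVER_LINK_PER_PIXEL = 8    # assumed shaded colour + depth shipped per pixel

pixels_per_s = ROPS * CLOCK_GHZ * 1e9
local_gb_s = pixels_per_s * BYTES_RW_PER_PIXEL / 1e9
link_gb_s = pixels_per_s * BYTES_OVER_LINK_PER_PIXEL / 1e9

print(f"Peak ROP read-modify-write traffic (stays on daughter die): {local_gb_s:.0f} GB/s")
print(f"Peak fragment traffic over the interposer link            : {link_gb_s:.0f} GB/s")
# The heavy blend/depth traffic stays next to the embedded RAM, Xenos-style,
# and if MSAA samples were expanded and resolved on the daughter die as well,
# the gap between the two figures would widen further.
[/CODE]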
 
Could an APU help with having a feature like picture-in-picture? For example, playing a game while video chatting in the background, with both running smoothly?
Edit: assuming there is also a separate GPU.
 
Could an APU help with having a feature like picture-in-picture? For example, playing a game while video chatting in the background, with both running smoothly?
Edit: assuming there is also a separate GPU.

Yes, but if what Proelite says is true about the PS4 having an ARM/PowerVR SoC for the OS & apps, then it will probably be better to just have video chat & things like that handled by the SoC as part of the OS.
 
That HT interface would need to be awfully wide. We are not speaking about the DDR2/3 interfaces of CPUs, where HT was barely able to keep pace. Look at the bandwidth numbers of GPUs in the performance region we are talking about, let alone a possible eDRAM solution (should some kind of dual-ported eDRAM sit in between the APUs?). That doesn't look like a good option to me. So, is it possible to implement? Sure. Does it make sense? In my opinion, no.
I'm using the alleged A10 APU as a starting point.
Opterons have 4 HT links with 16 bits in each direction. A single 16-bit HT link at max speed in a single direction is about what a single DDR3-1600 channel provides, with some quibbles given the overhead of HT's protocol.
Two HT links would pair with chips with dual-channel memory, which is what the A10 is.

If the RAM is faster, giving each APU an Opteron's I/O would provide for 50 GB/s in chip-to-chip bandwidth in each direction and allow a setup with an aggregate memory bandwidth of over 100 GB/s.
This would be between the Radeon 7770 and 7870 in terms of bandwidth, although a chip with an Opteron's I/O is probably going to have some spare die area thanks to all the extra perimeter it needs.
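For anyone who wants to check the arithmetic, here is a quick sketch of the figures above. The HT 3.1 maximum rate and the "faster RAM" assumption of roughly 50 GB/s of local bandwidth per APU are assumptions, not confirmed specs.

[CODE]
# Back-of-the-envelope check of the HyperTransport vs. memory-channel figures.
# The HT speed grade and the per-APU local bandwidth are assumptions.

# One DDR3-1600 channel: 1600 MT/s * 8 bytes
ddr3_1600_channel_gb_s = 1600e6 * 8 / 1e9        # ~12.8 GB/s

# One 16-bit HT 3.1 link, one direction: 3.2 GHz clock, DDR -> 6.4 GT/s * 2 bytes
ht_link_gb_s = 6.4e9 * 2 / 1e9                   # ~12.8 GB/s

# Opteron-style I/O: 4 such links between the two chips
chip_to_chip_gb_s = 4 * ht_link_gb_s             # ~51 GB/s in each direction

# "Faster RAM" scenario: assume each APU's local pool delivers about 50 GB/s
local_pool_gb_s = 50.0
aggregate_gb_s = 2 * local_pool_gb_s

print(f"DDR3-1600 channel, one direction  : {ddr3_1600_channel_gb_s:.1f} GB/s")
print(f"16-bit HT 3.1 link, one direction : {ht_link_gb_s:.1f} GB/s")
print(f"4 links chip-to-chip, each way    : {chip_to_chip_gb_s:.1f} GB/s")
print(f"Aggregate of the two local pools  : {aggregate_gb_s:.0f}+ GB/s")
[/CODE]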
 
I'm using the alleged A10 APU as a starting point.
Opterons have 4 HT links with 16 bits in each direction. A single 16-bit HT link at max speed in a single direction is about what a single DDR3-1600 channel provides, with some quibbles given the overhead of HT's protocol.
Two HT links would pair with chips with dual-channel memory, which is what the A10 is.

If the RAM is faster, giving each APU an Opteron's I/O would provide for 50 GB/s in chip-to-chip bandwidth in each direction and allow a setup with an aggregate memory bandwidth of over 100 GB/s.
This would be between the Radeon 7770 and 7870 in terms of bandwidth, although a chip with an Opteron's I/O is probably going to have some spare die area thanks to all the extra perimeter it needs.
How do those HyperTransport links fare with regard to pin count / physical I/O?
If it's not much of a bother, I could see 2 HyperTransport 3.1 links being enough.

I threw out the idea of 2 APUs foremost for economic reasons (you develop, test and produce only one chip), and from the software POV it is not really different from an APU + GPU set-up.
Given a reasonably fast interconnect, the set-up you mention may offer a reasonable amount of bandwidth; something like DDR3-1866/1600 may remain in production and cheap for a few years (if they could aim for cheap and "slow" (not slower than the aforementioned RAM) DDR4 it would be even better).

I would expect devs and the API used by the system to come up with something better than AFR, but as a starting point it would not be that bad.
In the console realm lots of games run at low frame rates and are v-synced; along with optimizations, would micro-stuttering be that much of an issue?
A nice side effect is that before more clever uses of the dual-GPU set-up are ready, each GPU has to render only 15 FPS; in effect each GPU would be provided with twice the bandwidth per frame that an A10 rendering at 30 fps or more can rely on (it is kind of an obvious argument, and it also applies to the shader cores, texture units, rasterizer, tessellator, etc.).
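A quick worked version of that bandwidth-per-frame argument, using an assumed 30 GB/s per APU purely for illustration:

[CODE]
# Why alternate frame rendering effectively doubles per-frame bandwidth:
# each GPU only has to deliver every other frame.
# The 30 GB/s per-APU figure is an assumption purely for illustration.

per_apu_bandwidth_gb_s = 30.0
display_fps = 30

# A single APU rendering every frame at 30 fps:
single_gb_per_frame = per_apu_bandwidth_gb_s / display_fps

# Two APUs in AFR: each renders only 15 of the 30 frames per second,
# so each frame can soak up twice as much of that chip's bandwidth.
afr_gb_per_frame = per_apu_bandwidth_gb_s / (display_fps / 2)

print(f"Single APU : {single_gb_per_frame:.2f} GB of traffic available per frame")
print(f"AFR pair   : {afr_gb_per_frame:.2f} GB of traffic available per frame, per GPU")
# The same per-frame doubling applies to shader throughput, texture units,
# rasterizer, tessellator, etc., as noted above.
[/CODE]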

Anyway, I hope the rumors about the A10 are wrong, but I hope Sony has something cheap and efficient; cheaper and more efficient (in power and area) than Piledriver cores, which Jaguar cores seem to deliver.

Actually I started with a quad-core Jaguar + Cape Verde type of GPU as a basis, but AMD might be able to do better / balance things better. It looks like Trinity is already bandwidth-limited and its GPU's performance scales almost linearly with faster RAM. Depending on the RAM Sony would use and the data AMD must have about where and when Trinity is bottlenecked, they may come up with something tinier that would perform mostly the same.
I read (extremely fast though) a review of an HD 7750 linked to DDR3 and the card took a huge hit in performance. 16 ROPs would be a waste; actually even 8 ROPs as in Trinity could prove overkill (they are underfed, as shown by the increase in performance when the APU is matched with faster / more expensive RAM).

Any idea what AMD engineers could do to alleviate a bit the bandwidth constraints such a GPU (GPUs, in fact) would face? Could oversized L2, texture caches, and local store help a tad?
In the hypothetical case that there are two GPUs, could some parts of the GPUs be scaled down without much of an impact (things like the geometry engine, etc.)? The whole idea is really to hunt for as much saving as possible.

I want my "Wii core"; MSFT running after multiple goals at the same time might not deliver that. After 7 years and counting I could see quite some people jumping on something cheap with a free-to-play network. Pricing is more relevant than ever in this still-bad economic situation.
 
How do those HyperTransport links fare with regard to pin count / physical I/O?
HT uses differential pairs. In terms of data lines, a 16-bit bidirectional link will have 16 signal pairs in each direction for data.
There's a fair number of additional pins for command and clock, plus some other signal types.
In terms of pins, an Opteron with 4 links has double the data pinout of the memory controller, although a die shot shows that the overall area is pretty close. The electrical demands for the DDR3 bus probably play into this.
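A rough data-pin tally behind that "double the data pinout" remark; the counts below cover data lines only and ignore clock, control, address, power and ground pins, so they are only an approximation.

[CODE]
# Approximate data-pin comparison between four 16-bit HT links and a
# dual-channel DDR3 interface. Only data lines are counted here.

# HT is differential: a 16-bit link has 16 pairs in each direction for data.
ht_data_pins_per_link = 16 * 2 * 2      # 16 pairs * 2 pins/pair * 2 directions = 64
ht_data_pins_total = 4 * ht_data_pins_per_link

# DDR3 data lines are single-ended: 64 DQ pins per channel.
ddr3_data_pins = 64 * 2                 # dual channel = 128

print(f"4x 16-bit HT links, data pins : {ht_data_pins_total}")
print(f"Dual-channel DDR3, data pins  : {ddr3_data_pins}")
# Roughly 2:1, matching the 'double the data pinout of the memory controller'
# observation, before counting DDR3 strobes/address/command or HT clock and
# control pairs.
[/CODE]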


I would expect devs and the API used by the system to come up with something better than AFR, but as a starting point it would not be that bad.
The question would be whether AMD would have modified the GPU and uncore of the APU to better manage something besides AFR. One thing about having a heavy interconnect is that the uncore of each chip would be expected to carry a lot more traffic than before, and there had better be features in the GPU to better distribute work.
AMD's been content to just let AFR be the common use case so far.


Any idea what AMD engineers could do to alleviate a bit the bandwidth constraints such a GPU (GPUs, in fact) would face? Could oversized L2, texture caches, and local store help a tad?
The sort of numbers that get bandied about to make a serious impact on things like texturing, above the caches already present, are big. It would need to be some kind of RAM on an interposer or some other expenditure of cash.
 
Just out of curiosity, how much memory do the PS3 and 360 devkits have? And how much did the PS2 and Xbox devkits have?
There are two versions of the 360 devkits. One version has 512 MB, and the other has 1 GB. I used both at work. I wouldn't put too much stock in the "devkits always have twice as much RAM" theory.
 