Predict: The Next Generation Console Tech

http://www.computerandvideogames.co...-as-good-as-ps3-but-its-still-not-as-capable/

Wii U CPU-bound code has trouble matching current-gen at this time. PS3/360 even process all sounds on the CPU, unlike the Wii U.

I'm sure this article has been beaten to death elsewhere, but here goes:

A) Anonymous developer contradicts what has been stated by named developers in interviews.
B) The middleware is likely not optimized to use the DSP and I/O controller.
C) What version of the dev kit is this developer using?

This whole article is lacking any context. We don't know what the developer is using, whether it's an in-house engine and no middleware, or early unoptimized middleware included in the SDK. They could very well have a V2 dev kit from the middle of last year. In any of those cases, I would expect that they would definitely be having issues. But I think blaming the hardware instead of the middleware is stupid.


ModEdit: OT removed
 
For starters, you'll be lucky to get a bit more than half of that 32GB/s on 128-bit DDR3. Second, that has to be shared with the CPU. I'm quite sure it'll be cheaper, and with far smaller overhead, to just go with x MB RAM + 2-4x MB VRAM, or just one huge shared memory pool.

If those devkit specs are even remotely true then I'm almost certain that the final box will only have an APU without a discrete GPU, and it'll likely be using GDDR5 instead of DDR3 in a single unified memory pool.
Well, that just doesn't answer my question, no disrespect or anything, but look at the 360: the device got away with 22GB/s of bandwidth to the main RAM where the textures were stored. That bandwidth was shared by the GPU and the CPU, and for moving data from eDRAM to main RAM.

If I look at the HD6570, it uses DDR3 @ 900MHz, which apparently must be called DDR3 1866 (the naming is so confusing...); that's 28.8 GB/s of bandwidth according to AMD's own data.

Using hardware.fr data, the CPU in a peak situation will reach ~20 GB/s while reading and ~16 GB/s while writing. Those are peak figures; whether reading or writing, I would assume that 20 GB/s is the absolute limit for the CPU's bandwidth usage.

That leaves ~9 GB/s for textures in the worst-case scenario. I still have no idea if that would be a bottleneck.
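
Just to put the arithmetic in one place, here's a rough back-of-the-envelope sketch (Python; the transfer rate and the hardware.fr CPU figure are taken as given, not measured by me):

# Rough bandwidth budget for a 128-bit DDR3 interface, using the figures above.
bus_width_bytes = 128 / 8                        # 128-bit bus -> 16 bytes per transfer
transfer_rate_gts = 1.8                          # DDR3 @ 900MHz -> 1.8 GT/s effective
peak_bw = bus_width_bytes * transfer_rate_gts    # ~28.8 GB/s, matching AMD's number

cpu_peak_read = 20.0                             # GB/s, hardware.fr peak read figure
left_for_textures = peak_bw - cpu_peak_read      # ~8.8 GB/s left over in the worst case
print(peak_bw, left_for_textures)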

Anyway, in Llano the same bandwidth feeds the CPU, the texture reads and the frame buffer operations. I'm almost ready to assert that texture reads from another GPU would not be a problem.

I don't expect this to work in a dual-graphics fashion, i.e. alternate frame rendering. I expect both GPUs to work on different tiles. Ultimately the "definitive" frame buffer would be in VRAM or RAM. You may not want the link to be a bottleneck when you move part of your frame buffer, but having more bandwidth than to main RAM may be overkill.
If the CPU sends data to the GPU it can be bursty, I don't know. PCI Express x16 may be enough, but more... is always tempting :LOL:

-------------------------------
How would you go about getting a unified memory pool?

A while ago, speaking of the next Xbox (not the 360), I thought that the second GPU would be "shader cores only", with all the ROPs on the APU. I don't know if it's doable; the Xenos ROPs were simplistic, while the HD6000 ROPs are tied to the L2, which is tied to the memory controller... that means the L2 would be off-chip, etc.
I don't know either whether it's doable for the APU's GPU to render straight into VRAM.

It sounds like a complicated proposal to me; each GPU is likely to render straight into the memory pool it is connected to. So having a unified memory space sounds really complicated.

Another thing is that both GPUs will want to read textures; would you want to waste memory and keep duplicates of the same texture in RAM and VRAM?

For the kind of resolutions that kind of card can push, I feel like 512MB is really enough; more would be overkill. Let's consider this a tiny pool of fast memory intended only for frame buffer operations :)
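
For a rough sense of scale (a sketch only; the buffer counts and formats here are my own assumptions, nothing from a devkit):

# Rough frame buffer footprint: 32-bit colour + 32-bit depth/stencil per sample.
def fb_mb(width, height, msaa=1, buffers=2):
    bytes_per_sample = 4 + 4                     # colour + depth/stencil
    per_buffer = width * height * msaa * bytes_per_sample
    return buffers * per_buffer / (1024 ** 2)

print(fb_mb(1280, 720, msaa=4))    # ~56 MB, 4x MSAA, double buffered
print(fb_mb(1920, 1080, msaa=4))   # ~127 MB, 4x MSAA, double buffered

Even with some fat render targets on top of that, 512MB as a framebuffer-only pool looks comfortable.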
 
Well, that just doesn't answer my question, no disrespect or anything, but look at the 360: the device got away with 22GB/s of bandwidth to the main RAM where the textures were stored. That bandwidth was shared by the GPU and the CPU, and for moving data from eDRAM to main RAM.
The XB360 also had eDRAM that was used as a write-only framebuffer with an extra 32GB/s of bandwidth. Also, how old is that thing now? And you say that a next-gen system coming out in about 1.5 years will barely have comparable bandwidth (GDDR + eDRAM on the 360)? I find that rather horrible if true.
Using hardware.fr data, the CPU in a peak situation will reach ~20 GB/s while reading and ~16 GB/s while writing. Those are peak figures; whether reading or writing, I would assume that 20 GB/s is the absolute limit for the CPU's bandwidth usage.

That leaves ~9 GB/s for textures in the worst-case scenario. I still have no idea if that would be a bottleneck.
Pretty sure the reason the CPU can't use the whole 28-ish GB/s of throughput to the RAM is the memory controller's inefficiency, and it's physically impossible to get more out of it by having the GPU use the RAM by going through the CPU.
Anyway, in Llano the same bandwidth feeds the CPU, the texture reads and the frame buffer operations. I'm almost ready to assert that texture reads from another GPU would not be a problem.
It probably won't be a huge issue, but it's still an extra bit of used resources that WILL cause extra latency on the CPU side as well, and I hope you know how bad it will be when an already expensive cache miss on the CPU has to wait because the GPU is using the DDR at that moment.
How would you go about getting a unified memory pool?
Simple, I do it by replacing the APU+discrete GPU with a stronger APU and no discrete GPU :)

Yes, it'll be a pretty low-end machine but it'll be far easier to use for devs and will have less resources wasted on useless stuff like trying to balance two differently performing GPUs + CPU with two separate memory pools.
Another thing is that both GPUs will want to read textures; would you want to waste memory and keep duplicates of the same texture in RAM and VRAM?
The same thing will happen with split memory pools and two GPUs, just that instead of pulling the texture in for one GPU you'll have to have it in two places. Same goes for all geometry and any dynamic textures. A multi-GPU solution is just plain horrible when you can get the same thing on a single GPU at next to no cost.
 
I'm sure this article has been beaten to death elsewhere, but here goes:

A) Anonymous developer contradicts what has been stated by named developers in interviews.

Multiple anonymous developers contradict vague and carefully worded niceties by people who don't want to piss off a platform vendor. I wouldn't want to put my money on either but I'm happy to reflect upon the 45nm CPU, 4cm fan and Nintendo badge on the case.


ModEdit: OT removed
 
The XB360 also had eDRAM that was used as a write-only framebuffer with an extra 32GB/s of bandwidth. Also, how old is that thing now? And you say that a next-gen system coming out in about 1.5 years will barely have comparable bandwidth (GDDR + eDRAM on the 360)? I find that rather horrible if true.
The link between Xenos and its daughter die can't be considered extra bandwidth; it's just a means to send data from the shaders to the ROPs. In a standard design that would be an internal bus.
For the purpose I'm speaking about (reading textures), 28.8GB/s may not be an issue.

The 360 has 22.2 GB/s of bandwidth to main RAM; all the rest doesn't add up on top of that. It also has 256 GB/s for frame buffer operations, and those operations only. You can't aggregate bandwidth like that; it's like adding apples to oranges.
Pretty sure the reason the CPU can't use the whole 28-ish GB/s of throughput to the RAM is the memory controller's inefficiency, and it's physically impossible to get more out of it by having the GPU use the RAM by going through the CPU.
Controller efficiency and perhaps a limit on the CPU side too. We have no way to know for sure.
If you're right, the same issue could apply to a lot of designs, whether they use GDDR5 or DDR3.

It probably won't be a huge issue, but it's still an extra bit of used resources that WILL cause extra latency on the CPU side as well, and I hope you know how bad it will be when an already expensive cache miss on the CPU has to wait because the GPU is using the DDR at that moment.
I don't expect that to happen; GPUs are less latency sensitive than CPUs, so it's pretty much a given that the memory controller prioritizes CPU requests. It might still not be as good as having the CPU alone, but I'm not sure it's a significant difference. GDDR5 comes with higher latency anyway, and a round trip to main RAM is ~1000 cycles on AMD CPUs (the latest Intel is between 700 and 800); at this stage it's already a performance disaster, so in this worst-case scenario a few extra cycles won't make that much of a difference.
Simple, I do it by replacing the APU+discrete GPU with a stronger APU and no discrete GPU :)
That's if you can; there are power, heat and $ (chips per wafer, yields) considerations.
Yes, it'll be a pretty low-end machine but it'll be far easier to use for devs and will have less resources wasted on useless stuff like trying to balance two differently performing GPUs + CPU with two separate memory pools.
I do agree with that. But if it doesn't happen I guess they will have decided that at least one of the factors above (power, heat, $) was out of their design goals.

The same thing will happen with split memory pools and two GPUs, just that instead of pulling the texture in for one GPU you'll have to have it in two places. Same goes for all geometry and any dynamic textures. A multi-GPU solution is just plain horrible when you can get the same thing on a single GPU at next to no cost.
Well, I make the assumption that actually reading textures may not be that much of a bandwidth-intensive task. As I say, Llano on its inefficient 28.8 GB/s link has "enough" bandwidth for the CPU, the texture unit reads and the frame buffer operations. The latter is most likely the most bandwidth-hungry of the three, yet Llano still runs some games nicely at sane resolutions.
 
But you *do* need colours to make a Call of Duty game, so that part isn't true, while, so far, you haven't needed complex physical simulation to make a Mario game (as far as I'm aware). It doesn't read like he's trying to slag off Mario games, or even the money-printing machine that is Nintendo; it reads like he's looking for a justification for the level of performance he or his team are encountering at this stage of development. He's entitled to be disappointed by it, just as Nintendo are entitled to put that hardware out.

It's fair to point out that the number of "anonymous" voices reported on by well known gaming news sites that don't think the WiiU is a step above PS360 is starting to add up.

Go play Mario Galaxy...

Folks, this isn't a thread about journalism.

True 'nuff. My apologies for derailing the thread.
 
As I wrote over on GAF, those PS4 specs (whether or not there is any shred of truth to them) actually aren't horrible.

7670=6670=480 SP's (worth noting IGN also claims 1ghz clock, better than 6670's 800mhz, of course any clocks might be subject to change)

3850 APU=400 SP's. Ability to crossfire with a discrete GPU built in.

That's 880 SP's, which is not bad. More than my HD 4890 that runs BF3 campaign @1080P mostly ultra settings at ~30 FPS. And of course you can about triple that with console optimization.

It's interesting how many of these rumors overlap, though; I'm thinking some wires may be crossed. E.g.:

Xb720 has two GPUs. Xb720 has a 6670... both would fit this PS4 rumor perfectly... or maybe the machines are very similar?
 
As I wrote over on GAF, those PS4 specs (whether or not there is any shred of truth to them) actually aren't horrible.

7670=6670=480 SP's (worth noting IGN also claims 1ghz clock, better than 6670's 800mhz, of course any clocks might be subject to change)

3850 APU=400 SP's. Ability to crossfire with a discrete GPU built in.

That's 880 SP's, which is not bad. More than my HD 4890 that runs BF3 campaign @1080P mostly ultra settings at ~30 FPS. And of course you can about triple that with console optimization.

Ok, the flip side.

The AMD Fusion A8-3850 has 400 "Radeon Cores" (SPs) in the APU topping out at 600MHz (480 GFLOPs), 20 TMUs (12 GT/s), and 8 ROPs (4.8 GP/s), and is connected to slow system memory. (Just to note, this is a 100W TDP chip at 2.9GHz; the 2.5 (2.8) GHz 3820 is 65W TDP. The additional performance comes at a high price in power which could be spent elsewhere.)

The AMD Radeon HD 6670 has 480 SPs at 800MHz (768 GFLOPs), 24 TMUs (19.2 GT/s), and 8 ROPs (6.4 GP/s), and is connected to a 64GB/s 128-bit bus.

So, some issues. First, it is heterogeneous GPUs with vastly different performance. You tossed out the 480 and 400 SP numbers for the 6670 and A8-3850, respectively, but did not factor in their frequency, memory architecture, or the other graphics pipeline features like TMUs. The Radeon 6670 is almost twice as fast as the Fusion chip. So you have a pretty big gulf there in performance. I would hate to think what that would do to Crossfire.
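
To make the arithmetic behind those figures explicit (a rough sketch; I'm counting each VLIW5 SP as 2 FLOPs/clock for a MAD, and the unit counts are the ones quoted above):

# Peak-rate arithmetic: units x clock (2 FLOPs per SP per clock for a MAD).
def peak_rates(sps, tmus, rops, clock_ghz):
    gflops  = sps * 2 * clock_ghz     # GFLOPs
    gtexels = tmus * clock_ghz        # GTexels/s
    gpixels = rops * clock_ghz        # GPixels/s
    return gflops, gtexels, gpixels

print(peak_rates(400, 20, 8, 0.6))    # A8-3850 IGP: 480 GFLOPs, 12 GT/s, 4.8 GP/s
print(peak_rates(480, 24, 8, 0.8))    # Radeon HD 6670: 768 GFLOPs, 19.2 GT/s, 6.4 GP/s

Same ROP count, but the 6670's higher clock and extra SPs/TMUs put it well ahead before you even get to the memory subsystem.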

The next issue is microstuttering. Get 30Hz framerate that feels like 15Hz. Yuck.

The following issue is how you are going to effectively coordinate these things. Not only are they different GPUs, they are also going to be accessing wildly different memory architectures. Even worse: memory duplication. Your 1GB video framebuffer is now also 1GB duplicated in system memory for your other GPU for Crossfire.

And I noted in the PS4 thread: a 6670 isn't some huge leap over Xenos. 2-4x performance jump.

Personally, I consider that horrible unless these things ship Day #1 at $199 and not a penny more (budget hardware=budget price). And even worse: this sort of console could ship in 2012. The fact we may not see something like this until 2013 or 2014 is laughable.

It is funny: with Pitcairn coming out at the end of 2011/early 2012, we see a GPU with a sub-last-gen area budget and the GCN architecture, and it supposedly won't be viable for 2013 consoles, yet a VLIW5 2009 architecture at 118mm^2 and about half/a third (!!) of Pitcairn's performance is being actively tossed around.

Really, they have 18 months from the release of Pitcairn (early Winter 2012) to mass production ramp for a 2013 launch (Summer 2013) to get something like Pitcairn into a console and yet the rumors are everyone is running toward 2009 products with woeful performance compared to where a "modest" area/power budget will be in 2013.

Either everyone is cheaping out, which is what I think is happening (based on bk's and db's recent comments; bk's about hot and expensive hardware, db's about services being the differentiators, i.e. they will be very similar), and someone is missing a golden opportunity to spend a little more and BLOW AWAY the competition in realistic metrics, or there is a lot of false stuff going out.

AlStrong has been whipping me to expect a 6670 range GPU, so it must be true. The up side? We all get Waggle this gen out of the box! :D
 
And I noted in the PS4 thread: a 6670 isn't some huge leap over Xenos. 2-4x performance jump.

If it's running at 1GHz it's ~4x just in raw shader perf, probably 5+ in terms of usable performance.

I find it interesting that this is the second time we've heard about a 6670 and more than one GPU, though; I wonder if it was true all along...? I could see an APU in Crossfire (or whatever they call that APU/discrete crossover tech) with a 6670 being 7-8x Xenos performance.
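
For what it's worth, the "~4x in raw shader perf" roughly matches the ALU arithmetic (a sketch; I'm treating Xenos as 48 five-lane ALUs at 500MHz, the commonly quoted figure):

# Rough peak shader-ALU comparison (MAD = 2 FLOPs per lane per clock).
xenos_gflops  = 48 * 5 * 2 * 0.5     # Xenos: 48 ALUs x 5 lanes x 500MHz = 240 GFLOPs
hd6670_gflops = 480 * 2 * 1.0        # 6670-class part at 1GHz          = 960 GFLOPs
print(hd6670_gflops / xenos_gflops)  # ~4x in raw shader throughput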
 
You know the rumors about MS and/or Sony showing devs their plans and the devs freaking out about how unacceptably low they were? These APU + discrete rumors make me wonder if one or both companies was going to try to start the next gen with a single-chip design but was forced to tack on a discrete GPU to mollify developers.

It is very strange that these PS4 rumors coincide so closely with so many of the XB3 rumors already in the ether. APU + discrete was the only way to even make sense of yesterday's "dual GPU" rumor that "doesn't work like traditional SLI or CF". And the 6670/7670 is the exact model previously rumored for the next Xbox.
 
You know the rumors about MS and/or Sony showing devs their plans and the devs freaking out about how unacceptably low they were? These APU + discrete rumors make me wonder if one or both companies was going to try to start the next gen with a single-chip design but was forced to tack on a discrete GPU to mollify developers.

It is very strange that these PS4 rumors coincide so closely with so many of the XB3 rumors already in the ether. APU + discrete was the only way to even make sense of yesterday's "dual GPU" rumor that "doesn't work like traditional SLI or CF". And the 6670/7670 is the exact model previously rumored for the next Xbox.
Maybe the media are trying to say that last year's microsoft-sony.cm and sony-microsoft.com thing was true :LOL:
 
Are we sure that's a given?

Sorry I meant to say the Power PC architecture in general

I would very much expect MS *not* to go with Power7.
A PPC470 would buy them a little less than half the single-threaded brunt in dramatically (think a tenth, not half) less space and heat, which they could spend on more cores or on beefing up the GPU.

Ok, so given the same transistor/heat/die-size etc. budget, PPC will still give you better bang for your buck than x86? Would the only real advantages of x86 be ease of programming and more accessible performance (and also ease of porting to/from PC)?

It would be even more ironic if the Xbox 3 had more of the PS4 CPUs... again :D

Yeah, exactly. Do you think the three console manufacturers know pretty much exactly what the others are up to (given AMD/IBM in common), and so (for MS and Sony at least) design their machines to match each other (things like identical RAM this gen, for example)?

For example, do you think MS knew that the Xenon cores were originally the PPC core for Cell (with the enhanced VMX units)?
 
It is funny: with Pitcairn coming out at the end of 2011/early 2012, we see a GPU with a sub-last-gen area budget and the GCN architecture, and it supposedly won't be viable for 2013 consoles, yet a VLIW5 2009 architecture at 118mm^2 and about half/a third (!!) of Pitcairn's performance is being actively tossed around.

I think Pitcairn, with all its great qualities, still consumes more power than Xenos or RSX did at launch. It is possible that they don't want that. Those desktop Llanos already have pretty high TDPs on the stronger models. Still, you'd think that a GCN-based APU and a Cape Verde GPU would be possible and a better fit for something launching next year?
 
It really makes little sense when there are faster and more efficient GPUs available. A 7750 is something like 50% faster and uses about 75% of the power. And by 2013 I doubt there would be a real cost advantage.

Exactly; those dev kit specifications, if there is any truth to them, would certainly change by fall 2013.

so according to rumors we have:
- the next Xbox CPU having a lot more cores and being much more powerful than the PS4 APU's 4-core AMD CPU.
- both consoles having the same weak 6670-architecture GPU (is that a coincidence?)
- the next Xbox rumored to have more RAM than the PS4, which made developers angry, so Sony would very likely change this.
- no rumors at all on RAM bandwidth, nor on the exact quantity of RAM in either console, despite the importance of this information (after all, why put an expensive GPU in there if it is bandwidth limited?)

and all of this doesn't make any sense in 2013. :LOL:

If that is true then Sony and Microsoft missed a huge opportunity to launch their consoles in 2012, one year before the competition, and crash the Nintendo Wii U party. LOL. I understand now why the Epic guys aren't at all satisfied with what is happening with next-gen consoles... and why they delayed their Unreal Engine 4 plans... lol
 
The link between Xenos and its daughter die can't be considered extra bandwidth; it's just a means to send data from the shaders to the ROPs. In a standard design that would be an internal bus.
For the purpose I'm speaking about (reading textures), 28.8GB/s may not be an issue.
Cutting out all FB write operations from the 22.2GB/s link between GPU and vram is a pretty major thing.
Controller efficiency and perhaps a limit on the CPU side too. We have no way to know for sure.
If you're right, the same issue could apply to a lot of designs, whether they use GDDR5 or DDR3.
Considering those were synthetic benchmarks it's pretty much as high as the memory controller can get. The CPU itself can issue far more fetches but it's impossible to get near theoretical peak in real-world settings.
I don't expect that to happen; GPUs are less latency sensitive than CPUs, so it's pretty much a given that the memory controller prioritizes CPU requests.
Maybe, but even that prioritizing will mean there is a bit of overhead that cuts down on usable bandwidth for both. Also, GPUs won't like latency either once you start using GPGPU.
That's if you can; there are power, heat and $ (chips per wafer, yields) considerations.
Just cutting out the channel between the GPU and CPU and the extra memory controller in the GPU will give you quite a bit of extra power budget to use in an APU. Also, the APU used was relatively old, and a new one is coming up in the next few months; I'm sure there will be a variant with roughly similar TDP and performance to the devkit's CPU+GPU. Also, two separate low-end chips will not give you much of a cost benefit over a single midrange one if you can cut out a ton of fluff by going with a single chip. There is a reason why every console moves to an SoC ASAP.
Well, I make the assumption that actually reading textures may not be that much of a bandwidth-intensive task. As I say, Llano on its inefficient 28.8 GB/s link has "enough" bandwidth for the CPU, the texture unit reads and the frame buffer operations. The latter is most likely the most bandwidth-hungry of the three, yet Llano still runs some games nicely at sane resolutions.
Intel adding a HUGE cache to Haswell just for the GPU tells me that it's far from negligible.
If it's running at 1GHz it's ~4x just in raw shader perf, probably 5+ in terms of usable performance.
... and only about 3x increased memory bandwidth, less when we include edram in XB360.
 
and only about 3x increased memory bandwidth, less when we include edram in XB360.

Yeah, but that's assuming it's an actual 6670 board with its RAM, which I doubt. I'm taking it as a 28/32nm shrink of the same chip, and they would probably couple it with faster RAM. They're only using 4GHz GDDR5 on the 6670; surely they can use at least 5GHz and probably even 6GHz. That puts you up to almost 5x the bandwidth as well.
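
Rough numbers behind that, assuming a 128-bit bus in every case (a sketch using the effective transfer rates quoted in this exchange, not official board specs):

# 128-bit bus: bandwidth = 16 bytes per transfer x effective transfer rate (GT/s).
def bw_gbs(effective_gts):
    return 16 * effective_gts

print(bw_gbs(1.4))   # XB360 GDDR3 @ 700MHz -> ~22.4 GB/s
print(bw_gbs(4.0))   # stock 6670 GDDR5     ->  64 GB/s (~2.9x the 360)
print(bw_gbs(6.0))   # 6 GT/s GDDR5         ->  96 GB/s (~4.3x the 360)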

It's also assuming it doesn't have its own eDRAM; there were rumors Sony was looking at some kind of interposer a while back, so it could be a good chunk on-die via interposer.
 
Interesting: an AMD A8-3850 is $120 at retail and an AMD Radeon HD 6670 1GB GDDR5 model can be found for $70. 4GB of memory isn't even $20. A 12x Blu-ray player can be found for under $60 at retail. I have seen a 64GB SSD get down to about $70. An FM1 motherboard is going to run $55. Pick up a generic crap case with a stock PSU for $20. That is $415 at retail.
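
A quick tally of those retail figures (just the prices quoted above, USD, early 2012):

# Quick tally of the retail prices quoted above.
parts = {
    "A8-3850 APU":        120,
    "HD 6670 1GB GDDR5":   70,
    "4GB DDR3":            20,
    "12x Blu-ray drive":   60,
    "64GB SSD":            70,
    "FM1 motherboard":     55,
    "case + PSU":          20,
}
print(sum(parts.values()))   # 415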

Put this into perspective for a console: $415 after the retailer takes a healthy cut (consoles usually have thin margins, with software having a bigger margin at retail) as well as a cut for certain manufacturers (e.g. Sapphire for graphics, Gigabyte for the MB; i.e. they purchase the components from AMD, Samsung, etc. and assemble the product, market it, distribute it, and then take their own cut). Then there are redundancies, as the video card alone is almost a complete MB, and the MB itself has a lot of excess bells and whistles that would be cut for a console. Further, they aren't going to get an SSD but are going to bulk-contract a single-platter HDD that is less than $30. So two things grab me from this.

The first is that as of right now, early 2012, an AMD A8-3850 and Radeon HD 6670 GPU system looks and sounds like an easily sub-$300 console to manufacture. By fall 2013 or later, once you cut out all the hands this sort of retail hardware passes through, with the extra layers of marketing, distribution, management, etc., and go with a console-specific design with hefty supply-contract discounts, it is inconceivable that this sort of console could not be made extremely cheap.

The second is that if, truly, both MS/Sony went this route consumers could easily in 18 months (Fall 2013) assemble (probably even just plain old buy a prefab) a PC in the $400-$500 price range that could have a better CPU, a GPU on order of 2x-3x faster, twice as much RAM, a SSD, and so forth that would allow them to get console ports all generation long that ran better on a launch-day PC than the consoles.

I guess you could argue that Kinect2/Move2+EyeToy3 are going to eat up a huge chunk of the console budget and you would possibly miss out on some of that. Anyways, I just find the rumored specs for a 2013+ platform launch just shockingly lower than I was anticipating even with more constrained budgets in view. If this all comes to pass I wonder if AMD made a really hard sell on their APUs because they know that getting every console game developer to basically do R&D into maximizing their APUs helps them against Intel and Nvidia.
 
Interesting: an AMD A8-3850 is $120 at retail and an AMD Radeon HD 6670 1GB GDDR5 model can be found for $70. 4GB of memory isn't even $20. A 12x Blu-ray player can be found for under $60 at retail. I have seen a 64GB SSD get down to about $70. An FM1 motherboard is going to run $55. Pick up a generic crap case with a stock PSU for $20. That is $415 at retail.
And with this config, will it be possible to achieve games with Samaritan-level graphics, without quality compromises, at 720p 30fps?
 