Predict: The Next Generation Console Tech

Kaotik · Jan 27, 2012

jonabbey said:
Random, off-topic thought:

One of the things that the 360 and PS3 had in common last generation was the use of specialized high speed ram for the entire system, unlike PC's which have mass quantities of slower ram and dedicated high speed ram for the GPU.

It seems like having some slower ram in bulk could be useful for caching of content from optical disc / hdd, but nobody went that way last generation.

Is the cost differential between the fancy ram and the normal stuff too small to make it worthwhile using both types in the next generation? It seems like plentiful caching is only going to get more important as content size increases on storage subsystems that can't get much faster than they were last generation.

For what I've read, devs love the fact XB360 has unified memory pool, and the general thought is that all the next gen consoles will use one, so unlikely.

Ninjaprime · Jan 27, 2012

AlStrong said:
8ROPs would be pretty awful (even with high GPU clocks) as the WiiU has to support both the television and the controller screen. There's no way 8ROPs will be able to handle 1080p+480p (whatever the res of the controller screen is) with current gen graphics requirements. Hell, it would barely handle 1080p unless it were a simple game with next to no blending and really easy culling scenarios.

Without MSAA, the bandwidth requirements aren't that high for pixel throughput unless you're also doing a lot of blending, but then you'd probably be fill limited anyway.

That depends on if they are actually doing 1080p, which I would suspect not. I'm guessing it will mostly be 720p, or maybe 1080i scaled. Whos to say the controller needs to have 3d graphics in the first place for most games anyway? For menu systems the controller screen could be 480p with 24 bit color, or even 16 bit color.

On raw performance side, that is true but I think that in the long run, using GNC would be a much better choice, even if they go the low-power route.

Better on what metric? Seems like for packing as much bang for buck/die area/transistor VLIW5 is the way to go, its ~75% more dense per FLOP. Might have to use more effort to get peak out of it, but thats all you do in a console anyway, right?

torque · Jan 28, 2012

Hi all. This is amazing and very interesting stuff. Probably learned more here in 5 minutes that most tech forums in 3 weeks.

I have a quick question.
Is there anything stopping a company like MS from using something like a Bulldozer CPU/GPU, or 2 combined in some way instead of a discrete solution with a GPU and CPU separated? I see that the Bulldozer uses a good deal of watts under power, probably too much, but just wondering if this solution would ever be...feasible versus all the CPU/GPU combos I see being discussed?

I know there are smart ass answers to this

But I am actually legitimately wondering.
Thanks all. Great forum.

Brad Grenz · Jan 28, 2012

torque said:
Hi all. This is amazing and very interesting stuff. Probably learned more here in 5 minutes that most tech forums in 3 weeks.

I have a quick question.
Is there anything stopping a company like MS from using something like a Bulldozer CPU/GPU, or 2 combined in some way instead of a discrete solution with a GPU and CPU separated? I see that the Bulldozer uses a good deal of watts under power, probably too much, but just wondering if this solution would ever be...feasible versus all the CPU/GPU combos I see being discussed?

I know there are smart ass answers to this But I am actually legitimately wondering.
Thanks all. Great forum.

The rumors already seem to be pointing to a single chip, CPU/GPU combo, only with PowerPC cores instead of Bulldozer or K10 cores like AMD is creating for the PC market. When you see someone say SoC or System on a Chip, that's exactly what they are talking about. AMD has been selling such chips for a while. Ones based on the Bulldozer architecture are coming out this year. There's absolutely no obstacle to Microsoft choosing that direction. Using two of these combo chips would not make much sense, though. The current state of the "Fusion" technology hasn't reached a point where the integration gives enough of a performance advantage that a traditional 2 chip discrete CPU and GPU system wouldn't faster and easier to use.

TheChefO · Jan 28, 2012

torque said:
Hi all. This is amazing and very interesting stuff. Probably learned more here in 5 minutes that most tech forums in 3 weeks.

I have a quick question.
Is there anything stopping a company like MS from using something like a Bulldozer CPU/GPU, or 2 combined in some way instead of a discrete solution with a GPU and CPU separated? I see that the Bulldozer uses a good deal of watts under power, probably too much, but just wondering if this solution would ever be...feasible versus all the CPU/GPU combos I see being discussed?

I know there are smart ass answers to this But I am actually legitimately wondering.
Thanks all. Great forum.

Pros and cons for using a BD solution, but at this point the cons significantly outweigh the pros so not likely:

-watts-heat
-trans budget and die size per performance
-complete break of code compatibility
-doubtful ability to "buy" the design by MS or Sony

There are much better solutions out there than x86 for the console sector which MS/Sony found when they started doing research for this gen machines. Assuming of course that intel won't be selling their sandy/ivybridge core designs to MS/Sony and letting them manufacture in their plants.

It's all about bang for the buck and at this point, Bulldozer is pretty low on that metric.

TheChefO · Jan 28, 2012

Ninjaprime said:
Have you ever considered for inflation?

Wafer costs have gone up 25% (28nm).

And we currently see a $300 kinect xbox360 on retail shelves ...

So yeah I'm not expecting a $300 xb720 core to hit retail.

I fully expect that the price of entry for nextgen will be $400, not $300.

Having said that, there are significant streams of cash now coming from XBL for MS which CAN be used to offset the BOM if MS chooses to use it that way.

I don't think they will as they will want to capture the high end market, as they always have. The current gen xb360 will slide into the lower tier and fill that void well while xb720 can safely sit at the top end and not be concerned with dominating the sales charts immediately.

XBL is profitable, xb360 is profitable, kinect is profitable, and the software chain is profitable. There is no need to kill this venture. Only to expand it.

With that, the premium $400-600 price segment can be added to the current lineup.

This pricing segment affords some pretty aggressive hardware which is nothing like the xb360+ that some are speculating.

At least as aggressive as the HW was at the time in 2005/2006:

~500mm2 die budget (GPU, CPU, ?EDRAM?)
>=2GB ram
~200W

For those concerned on price affecting adoption:

Ipad $500-700
Iphone $600+ (w/o contract)

Price isn't an issue. Even in this economy. People buy the gadgets they want. Just a matter of making it desirable. Gimped hardware isn't the way to create this desire.

Offering a full Windows 8 for increased functionality turning the xbox into a wbox... that might be.

It's all about value-add, and with Apples latest sales quarter, some of you people really need to adjust your expectations for expensive gadget acceptability. Along those lines, MS/Sony may just decide to bundle motion controls from the start and move the entry price to $500.

_______________

One other thing I'll address is those thinking that services will be the feature that will move the new consoles and therefore, gimped hardware will be ok. Problem with that theory: The existing consoles are already service/feature rich!

Why upgrade to ps4 for netflix when it's available now on the cheaper box?
Why upgrade to xb720 for facebook/twitter/social integration when it's available now on the cheaper box?

TV channels on live/psn? no need for new box
Motion controls? already here
Digital ecosystem? check
3D? lame, but check

The only thing the next generation will be doing that this gen can't/doesn't already is doing what this gen does, but better. That's only possible through better guts... bigger dies (on the same process node), more ram, better architectures, and in some cases, better storage capacity.

So to take away or diminish the one thing that would separate the nextgen consoles from the current gen is to limit the viability of the nextgen and encourage competition on a weak offering.

Again, assuming a rapid replacement ideology isn't in place here along the lines of IOS devices.

I'm fully expecting the spec framework above because frankly, anything less is foolish from the standpoint of delivering on the intended goal of getting these devices into as many homes as possible.

That only happens when enough people WANT the device in the first place.

TheChefO · Jan 28, 2012

liolio said:
At this point I don't think that ms can go with multiple sku...

I'm not sure I'll be able to fully grasp your reasoning for stating this, but I have to ask if you can clarify your stance on why MS would not be able to go with Multiple SKUs.

I can see no reason why they would not and in fact, I'd be SHOCKED if they didn't.

I could see them consolidate the xb360 SKU's as it is a bit ridiculous right now with 4, and that is a bit much to ask retailers to then accept 4 additional SKU's on top of it, but to assume that MS will not want to take advantage of buyers which enjoy buying "elite" is ...unreasonable to put it mildly.

Especially seeing Apple being able to get an additional $100 for 16GB of Flash and another $100 for 32GB.

MS and Sony will both be taking advantage of this "elite" market that enjoys having the "best of the best" even if it means the value of the up-sell isn't there.

Now how far up they choose to go in the tiered sku segment will be interesting to see, but I fully expect both to be there with multiSKU's and milking the top for all it's worth. Just as most other companies are doing these days.

The interesting thing will be if the "elite" approach is fully reciprocated by Sony/MS and they truly offer a better console experience at the top end with a better GPU/CPU/RAM which actually effects the end result on the screen. In other words, an SKU that does more than swap in more Flash or an over-priced HDD.

THAT would be interesting and would follow the concept introduced of forward/upward compatible.

In such an arrangement, the gimped .. er... "entry level" console could incorporate a Turks 6670 118mm2 @ 40nm joke class GPU and the Elite console could offer a Tahiti 7970 352mm2 @28nm bigboy class GPU.

This would afford those that like their new console in a tiny Wii-sized case which is whisper quiet to be able to choose that direction, and those that prefer cutting edge graphics to be fulfilled as well.

IllusionistK · Jan 28, 2012

6 x 165(Xenon) million transistors = 1bln transistor CPU for Xbox 3
6 x 200(Xenos) million transistors = 1.2bln transistor GPU for Xbox 3

Those are all on the 45nm lithography, how much would they change on 32, 28, 20, 22nm, etc.

I put my money on a four-core OoO Power7 and a Pitcairn Pro(1536SPs) if launching at 28nm.

20nm tape-out Q4 this year, with production silicon ready for 2013. It may be a stretch to asked for a 20nm launch, but I hope its possible. They may save 20nm for a cost-cutting node.

Brad Grenz · Jan 28, 2012

I think Pitcairn Pro would be a good choice for next gen, but I'm sure the transistor count is closer to 2.5 billion.

liolio · Jan 28, 2012

I was reacting to Patsu's joke on another thread yesterday night but there is a weird way to put all the rumors/talk here heard so far together.

We've heard:
*6 cores and 2 gpus
*PS4>720
*Simulation at the heart of next-gen experience, middle-wares vendor supposedly already informed
*it's a SoC
*GPU is akin to 6670
*the system will~6 time more powerful

Some APU in Xfire mode would fit the bill.

Almost every body here expect the next-box to use 32nm and 28nm, so using this there are way to pull something cheap and if not over the top more acceptable to core gamers than HD 6670 level on performances

which OT performs imo impressively well it's possibly the first low-end card to do this well with the game of its time but anyway even to me for a 2013 release it's a bit short

.
That combination sounds pretty complicated and one could say why not simply put all the GPU silicon budget on one die and call it a day?

because some dudes on internet said otherwise

I will answer at the end.

Back to my hypothesis. Using 28nm process a HD6670 class of gpu will be "freaking" tiny, basically it would be close to the size of the daughter die in nowadays 360 may be tinier, it could still consume an hefty amount of power as the hd6670 as a whole so with ram, etc. has a TDP of 60 Watts but more on that later.

Now for the SoC, putting all the rumors together I still feel a sense of "cheap", in no way I've the impression that MS is to pursue an as aggressively specs system as the 360 was. So whereas in may dream I would like to see a big SoC oozing with power and bandwidth, I'll be more conservative and stuck the rumors (no matter how believable they are) we heard.

I assume MS wants something cheap to produce in fact really close to nowadays 360, so it has to be pretty tiny so they get good yields and plenty of chip per wafer. They will also want something easy to cool so not too hot.
Taking that in account I can't see the chip being bigger than valhalla. EDRAM could help to keep the chip tiny or to get it tinier, it's more expansive to produce, considering cost alone I can't say if that would be a win or not but taking in account the rumors about in which foundry it is produce I'll assume that it use it (and more see below).
The SoC would be around Valhalla size
Packing 6 cores and a relevant amount of GPU power in an area comparable to valhalla... is not an easy game, especially as IO wont scale down @32nm. To me it has various implications:
1) CPU are unlikely to benefit from any kind of aggressive OoO execution implementation. Something akin to ARM cortex A9 supporting 2 way SMT is a best case scenario. The worst case scenario would be POWER A2 close parents.
2) I would dismiss wider SIMD (as they would not get feed properly).
3) The use of as much EDRAM as possible, for the L2 and L3 (if there is one) as well as in the gpu.
4) To make a long story short expecting thing to double is not sane.

Putting the whole thing together, one could say that MS is trying to put together a system that is a lot more powerful than the 360 but has the same "footprint". By footprint I mean, silicon budget, power dissipation, may be even the number of memory chip and mobo size. That's not an insane requirement, the 360s is still a beefy piece of electronic. MS may want to simply replace the 360 current SKU (so for 400$ you got kinect 2, the console an HDD).

To me the cheapest option would be something like this:

* Both the SoC and the GPU are on the same pad as Valhalla and nowadays daughter die.

* MS use GDDR5 for the RAM and only x4 2Gb modules (straight replacement of GDDR3 modules)

* Flash storage is for all SKU, 8GB is the bottom line with a part reserved for caching.

* MS replaces the x12 DVD drive by a x4 BRD one.

* MS keeps its multi SKU policy

* looking at the inner of the system without info about the hardware wnat would say that it looks a lot like the 360 last revision.

------------------------------------------------------------
A more precise view of the system.
CPU (part of the SoC)
6 cores @2.4 GHz, bastard child of Xenon and POWER A2, most likely In Order.
3MB of L2 cache (edram)
64GB/s link to the north bridge (on SoC)

GPU 1 (part of the SoC)
320 SP (4 SIMD arrays)
16 texture units
6 Color ROP Units
24 Z/Stencil ROP Units
64GB/s link to the north bridge(on SoC)
@600 MHz

GPU 2
480 SP (6 SIMD arrays)
24 texture units
8 Color ROP Units
32 Z/Stencil ROP Units
@600 MHz
64 GB/s link to the north bridge

SoC & IO and RAM
The north bridge is connected to 128-bit GDDR5 memory interface.
It attributes dynamically the bandwidth (76GB/s) to the different units, up to 64GB/s to a given unit.
Fast link between the SoC and the GPU2 (provides 64GB/s of bandwidth).
x4 2Gb GDDR5 memory chip @1200 MHz => 76GB/s

------------------------------------------------------------
One will notice that I've clocked down both the CPU cores and the GPU 2. I did so to remove pressure on the cooling. HD6670 TDP is way to high ( still would be @28nm) to be stuck next to the SoC. It's also clear that 6 cores running at xenon speed along with a faster bigger GPU that Xenos (+the ROPs) would be significantly warmer even using 32nm process => the CPUs take the hit in clock speed.

One could also notice that there is not much point to do what I did and I would agree, I may defend my-self my saying that I've been pretty bored lately (still trying to make a place from my-self in great USA).

After watching the result AMD dual graphic (APU in Xfire with low end GPUs) my belief is that the thing could prove surprisingly up to the task. The only that would really hurt the system would be some arbitrary enforcement like 1080p+AA.
I believe that making flash memory standard (so including model with HDD) is not that bad of a trade off to the low amount of RAM as most think. Along with the generalization of virtual texturing it could do surprisingly well (BF3 is somehow already proof of the concept).
As far as BC is concerned the system would emulate what it can that's it.
For some games (think XBLA, indies) I could see the GPU 2 being turned off. So it will in BC mode.
The same is likely to happen to some CPU cores (depending on the needs). I expect the GPU 2 to be dead during all set top box operations.
I think that I would also be possible to disable GPU1, so in the beginning one could use the system as a traditional single CPU set-up (and so using the most powerful GPU, the GPU 2).

The most interesting part: peak FLOPS figures.
6 (cores) x 2 (FLOPS per cycle) x 4 (SIMD width) x 2.4 (GHz) = 115 GFLOPS
768 (hd6670 GFLOPS figure) x 10/6 (taking in account 4 extra SIMD arrays) x 6/8 (taking in account lower clock speed) = 768x5/4= 960 GFLOPS
All together it's 1.104 TFLOPS EDIT that's a bottom line the clock could be higher if MS were to invest into a proper cooling system (water based or not not my point, I'm something serious as in the PC realm).

Might be underwhelming to some but it has advantage you have 2 GPUs, that open a lot of options and has some benefits:
You have twice the fixed functions (rasterizer, geometry engine, etc). It should impact positively thing like tessellation performance.
You can devote a gpu to physics for example ( I would expect some people to do that at first then they would move to finer grained approaches).

Disclaimer:
To make it clear it's like I believe it will happen more a fun exercise (at least to me it was): put together all those rumors and try to do so with sense, nothing more nothing less.

fehu · Jan 28, 2012

Can I do a shark jump?
what about an almost traditional gpu, and an ibm cpu with a good amount of L3 edram + rop and necessary logic?

Code:

                -------------
-------        |     EDRAM  |
| GPU |  <==>  |     ROP    |
-------        | 6 CORE CPU |
                -------------

pros
- only two chips connected with low pin serial
- the edram scales in size at ibm's speed
- cpu can share data with the gpu faster

cons
- it's a little mad

Helmore · Jan 28, 2012

I kinda like your idea fehu, just because it's a little crazy

.

Personally I'd go for a SoC with a 5 core Power7 CPU, of which only 4 cores are accessible by games, with 8 MB of L3 cache. Your 4 main cores will be running at around 3 GHz while your 5th core can run at a lower clockspeed. As for the GPU I'd take a cut down AMD Tahiti graphics core, with 1536 Shader ALUs, 24 ROPS, 96 TFUs, still 2 Geometry engines and rasterizers though running at roughly 750 MHz. The SoC has a 256-bit connection to 4 GBs of GDDR5 memory running at around 5.5 GHz. I'd wager that something like that on IBM's 32 nm process would be around 450 mm^2 if it isn't a little bigger than that, but we're talking about a mature process and you'll be able to shrink your chip to a 22 nm process in less than 6 months after release. As for power consumption, I think something like that will be possible within 200 Watt for the whole system. On 22 nm your SoC would be between 200 to 250 mm^2, which is pretty much the smallest you can go if you want to keep your 256-bit memory controller.

I know, may be a little extreme, but I like it

.

liolio · Jan 28, 2012

I don't like the idea (lot of edram on the cpu core especially as it is unclear how much data the shader cores would have to send or read from the cpu/edram) it would cost a lot. A massive investement in silicon and a chip most likely way bigger than anything we saw last gen.
EDIT
Honest question how the amount of data tranfered to the rops?
I would think that it scales with the number of render targets their resolution and the precision one is using ( fp16 would so double the bandwidth requirement), is that right?

IllusionistK · Jan 29, 2012

Brad Grenz said:
I think Pitcairn Pro would be a good choice for next gen, but I'm sure the transistor count is closer to 2.5 billion.

You're right. It's 2.5bln for the 28nm process node.

torque · Jan 29, 2012

TheChefO said:
Pros and cons for using a BD solution, but at this point the cons significantly outweigh the pros so not likely:

-watts-heat
-trans budget and die size per performance
-complete break of code compatibility
-doubtful ability to "buy" the design by MS or Sony

There are much better solutions out there than x86 for the console sector which MS/Sony found when they started doing research for this gen machines. Assuming of course that intel won't be selling their sandy/ivybridge core designs to MS/Sony and letting them manufacture in their plants.

It's all about bang for the buck and at this point, Bulldozer is pretty low on that metric.

Thank you and the other poster as well. So we are looking at a situation where this may actually be the case but just not using BD. What are the pros and cons of a SOC and the split CPU/CPU. It seems like the SOC is just a far far better option.

french toast · Jan 29, 2012

TheChefO said:
In general I agree.

However, if the die budget is comprised of a rather large GPU (400mm2) and a much smaller CPU (<100mm2), then the minimum size of a component would already be an issue with yield so not much more prohibitive to have the cpu on die.

Still, a disadvantage, but significantly less so than a balanced gpu/cpu split.

The advantage of a SoC from day one would be a simpler/cheaper transition of die shrinks and hypothetically would allow for very high bandwidth interconnecting the cpu and gpu which would be great for GPGPU.

Maybe if they are thinking of either designing kinect into the 720 or putting the extra hardware functions it would need ie dedicated processors..they could build the whole thing onto one SOC and drive costs down that way..whilst also making the kinect itself cheaper to manufacture and smalller than it is now?

All this talk of the 720 only being 6 times the power doesn't make sense to me, i would be thinking 10 times would be the minimum they were aiming for, technology has moved on substantially since 03/04 when 360 was being taped out.

I just checked microsofts xbox division quaterly results, they made a mind boggling 3.5+ BILLION$ ..thats a hell of alot of money, with xbox live now mature, and more apps/paid for services sure to be added they could easilly afford a technologically advanced console and make their money back quicker than Sony or Ninty, plus they are already in a more favourable finantial position to begin with.

Microsoft could easilly afford to turn the screw here, it might be too har to resist

liolio · Jan 29, 2012

french toast said:
All this talk of the 720 only being 6 times the power doesn't make sense to me, i would be thinking 10 times would be the minimum they were aiming for, technology has moved on substantially since 03/04 when 360 was being taped out.

Ain't that a bit of an arbitrary statement? Only transistors size is cut in two every 18 months, power consumption and the watts dissipated per mm2 scale way slower but I guess it' unimportant.

i
I just checked microsofts xbox division quaterly results, they made a mind boggling 3.5+ BILLION$ ..thats a hell of alot of money, with xbox live now mature, and more apps/paid for services sure to be added they could easilly afford a technologically advanced console and make their money back quicker than Sony or Ninty, plus they are already in a more favourable finantial position to begin with.

Microsoft could easilly afford to turn the screw here, it might be too har to resist

Well something to consider is that MS has a more important war to fond in the mobile realm. Anyway as a cosutmer I want MS to bleed itself for my sake, dreaming is free.

Honestly those rumors might be bs I don't know and nobody here seems to have insider information, I find the whole thing remant of x4 cell talk back in 2005 a bit too much of wishful thinking imho.

fehu · Jan 29, 2012

liolio said:
I don't like the idea (lot of edram on the cpu core especially as it is unclear how much data the shader cores would have to send or read from the cpu/edram) it would cost a lot. A massive investement in silicon and a chip most likely way bigger than anything we saw last gen.

in my (mad) idea the cpu had something as 1MB of L3 reserved, and a minimum bandwidth to and from the gpu
its a soc split in two... ehm...
and designing the cpu so close to the gpu it can help more in some graphical task as raytracing
if you put them on the same substrate you can have a very fast connection, and down the road glue them in a real cheap soc

I'm making it all up, but its funny XD

almighty · Jan 29, 2012

Upscaling fro 720p or rendering at sub-HD and upscaling to 1080p is not acceptable this time around.

These things should all be aiming for native 1920x1080 rendering, Upscaled games look horrible...

jlippo · Jan 29, 2012

almighty said:
Upscaling fro 720p or rendering at sub-HD and upscaling to 1080p is not acceptable this time around.

These things should all be aiming for native 1920x1080 rendering, Upscaled games look horrible...

It really depends on how it is implemented, there is a lot of research to decouple resolution from shading and methods like this would improve the overall quality/speed.

I'm quite sure that we will see dynamic framebuffers with FXAA/SMAA edge information used during scaling to get 'pixel perfect' edges on scaled image.

Perhaps some games will not use linear sampling of the screen at all, basically just go for 1-3M samples in screen and place them intelligently.
There are lot's of strange possibilities and the tech of next generation will be very fun to follow.

Predict: The Next Generation Console Tech

Kaotik

Drunk Member

Ninjaprime

torque

Brad Grenz

Philosopher & Poet

TheChefO

TheChefO

TheChefO

IllusionistK

Brad Grenz

Philosopher & Poet

liolio

Aquoiboniste

fehu

Helmore

liolio

Aquoiboniste

IllusionistK

torque

french toast

liolio

Aquoiboniste

fehu

almighty

jlippo

Similar threads