Predict: The Next Generation Console Tech

I really have a growing disagreement with some people here; we're completely losing touch with reality. Sorry, it's not a completely tech-oriented post, but come on: you have a company, Sony, that as a whole is still losing billions, is suffering from the ongoing currency war, etc. etc.
Their only good business now is PlayStation-based, and yet you think it would be suicidal for MS (who still makes billions overall) to launch whatever product because Sony will bleed itself dry to annihilate MS and Nintendo. Sorry, it's a hardware prediction thread, but some of this talk just doesn't make sense.
Let me ask a question related to this next Xbox rumor. Suppose the 6670 is your starting point: it's a 716M transistor chip, clocked at 800MHz with 480 shaders, 24 TMUs, and 8 ROPs. It's based on the VLIW5 architecture.

How would you modify/customize this part in order to provide a good upgrade for the next gen? (Without saying "just use a 6850 or higher".)

My guess is that the silicon budget for the GPU is around 800-1000M transistors (in a hypothetical 1.5B transistor SoC), so that would limit the shader count to the ballpark of the 6670, and I would guess the clock limits would be similar as well.

Do you think it's a good idea to stay with a VLIW5 architecture? Does going to VLIW4 or GCN shaders buy anything in terms of performance, specifically, for a gaming console (i.e. 5 VLIW5 vs ~7-8 VLIW4 vs ~7-8 GCN CUs)?
Well, the old VLIW5 architecture offers great bang for the buck as far as compute density is concerned.
We never learned for sure how much room the move from VLIW5 to VLIW4 saved AMD; they spoke of 10%, but then they also introduced modifications to their ALUs, etc. For some calculations VLIW5 is better/faster. The real discussion could be between more SIMD arrays clocked lower vs 6 @ 800MHz.
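Just to put rough numbers on that "more SIMD arrays clocked lower vs 6 @ 800MHz" trade-off, here's a quick back-of-the-envelope sketch (peak rates only, counting 2 FLOPs per ALU per clock; the 8-SIMD configs are hypothetical, and real throughput obviously depends on how well the slots get filled):

Code:
# Rough peak-throughput comparison of VLIW5 SIMD counts vs clock.
# Assumes Turks-style SIMDs (16 lanes x 5 slots = 80 ALUs each) and
# 2 FLOPs per ALU per clock (MADD). Peak numbers only.

def peak_gflops(simds, alus_per_simd, clock_mhz):
    alus = simds * alus_per_simd
    return alus * 2 * clock_mhz / 1000.0  # GFLOPS

print(peak_gflops(6, 80, 800))   # 6 SIMDs @ 800 MHz -> ~768 GFLOPS (Turks-like)
print(peak_gflops(8, 80, 600))   # 8 SIMDs @ 600 MHz -> ~768 GFLOPS, same peak
print(peak_gflops(8, 80, 700))   # 8 SIMDs @ 700 MHz -> ~896 GFLOPS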

If I were to mod an HD 6670 into something "akin" (IGN's wording) to an HD 6670, I would:
x2 the ROPs, and use better ones (it's unclear which ROPs the Turks products use; they haven't been tested much).
Use a greatly improved geometry engine (like twice the performance).
Use faster RAM.
Maybe use better texturing units.
 
It's initially targeting professional markets, so that rules out inexpensive.
It may be too late and unproven to be considered for this upcoming generation.
 
Probably nothing, because AMD is providing the GPU for all 3 consoles next gen.


But there's another thread with an Apple console rumor, so there's always the possibility of something unexpected.

Also, I'm pretty sure that OpenRL is just their ray tracing API; it runs on all hardware, and it's going to take significantly longer before they can get the Caustics RT accelerator integrated with their own GPU.
 
You're doing selective reading like some other people. The writer said a developer told them late last year that it was 2x more powerful. However, IGN's article confirms what they were told more recently about the dev kits. Read the last part of the article again; his comment reaffirms that.

EDIT: Upon further review I'm just going to say it was bad writing for now.

You don't need to be a rocket scientist to tell that 5x360 is not anywhere near a possibility for Wii U. That's over 1k SPUs, and Wii U is lucky to have close to half of that.
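For what it's worth, the arithmetic behind the "over 1k SPUs" figure is easy to sanity-check. A rough sketch (Xenos' ~240 GFLOPS peak is the usual public figure; the Wii U clock values are just assumptions):

Code:
# How many unified shader ALUs would "5x Xenos" need at a given clock?
# Xenos: 240 ALUs @ 500 MHz, 2 FLOPs/ALU/clock -> ~240 GFLOPS peak.
xenos_gflops = 240 * 2 * 500 / 1000.0    # 240 GFLOPS
target = 5 * xenos_gflops                # 1200 GFLOPS

for clock_mhz in (500, 600, 700):        # hypothetical Wii U GPU clocks
    alus_needed = target * 1000 / (2 * clock_mhz)
    print(clock_mhz, round(alus_needed)) # 1200 / 1000 / ~857 ALUs

So unless the GPU is clocked well above 600MHz, "5x" really does imply a four-digit ALU count.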
 
Inexpensive Ray Tracing technology is now a reality -

What does this mean for next gen consoles?
Don't get your hopes up. ;) 'Realtime raytracing' could mean anything, from rendering a simple cube at 60 fps to rendering a photorealistic scene at 2 fps. Professional ray tracing still takes an age, and an accelerator to speed that up, even to non-gaming speeds, is still very desirable. In a gaming environment, the application of RT hardware might be for specific jobs: perhaps RT reflections and refractions, or even combining it with the physics engine. It'd be an interesting addition, but don't expect raytracing to feature prominently in realtime graphics for the next 5 years. Unless this Caustics tech really is the most amazing tech since the invention of the transistor! ;)
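To give a feel for why "realtime raytracing" can mean almost anything, here's a crude ray-budget calculation; the per-pixel ray counts are illustrative assumptions, not anything Caustics has actually claimed:

Code:
# Rays per second needed for a few hypothetical realtime scenarios.
def rays_per_second(width, height, fps, rays_per_pixel):
    return width * height * fps * rays_per_pixel

# Primary rays only, 720p @ 30 fps
print(rays_per_second(1280, 720, 30, 1))     # ~27.6 million rays/s
# A few secondary rays (reflection/refraction/shadow) per pixel
print(rays_per_second(1280, 720, 30, 4))     # ~110 million rays/s
# A "photorealistic" path-traced frame might want hundreds of rays per pixel
print(rays_per_second(1280, 720, 2, 256))    # ~472 million rays/s, and that's at 2 fps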
 
Suppose the 6670 is your starting point: it's a 716M transistor chip, clocked at 800MHz with 480 shaders, 24 TMUs, and 8 ROPs. It's based on the VLIW5 architecture.

How would you modify/customize this part in order to provide a good upgrade for the next gen? (Without saying "just use a 6850 or higher".)

Double the ROPs, quadruple the Z/Stencil output per clock and increase texture cache sizes for starters.

Update the feature support to whatever DX spec.
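Just to quantify the ROP and Z/stencil suggestions at Turks' 800MHz (peak figures only; the 2x/4x multipliers are the proposal above, not a known part):

Code:
# Fill-rate impact of doubling ROPs and quadrupling Z/stencil at 800 MHz.
clock_ghz = 0.8

def fill_rates(rops, z_per_clock):
    pixel_gpix = rops * clock_ghz          # Gpixels/s of colour fill
    z_gsamples = z_per_clock * clock_ghz   # Gsamples/s of Z/stencil
    return pixel_gpix, z_gsamples

print(fill_rates(8, 32))    # stock Turks:    (6.4 Gpix/s, 25.6 Gsamples/s)
print(fill_rates(16, 128))  # 2x ROPs, 4x Z: (12.8 Gpix/s, 102.4 Gsamples/s)
# For scale, final 1080p60 output is only ~0.12 Gpix/s per pass, but
# overdraw, MSAA and shadow-map rendering chew through Z/stencil rate fast.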

My guess is that the silicon budget for the GPU is around 800-1000M transistors (in a hypothetical 1.5B transistor SoC), so that would limit the shader count to the ballpark of the 6670, and I would guess the clock limits would be similar as well.
If we take a billion as the upper limit, you'd probably be looking at a 5770.
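One crude way to sanity-check that is to scale by transistors per ALU in the existing parts (Juniper's ~1.04B transistor count is from public specs, not this thread; whole-chip counts lump in ROPs, TMUs and memory I/O, so this is only a ballpark):

Code:
# Very rough "transistors per ALU" scaling from known 40nm parts.
parts = {
    "Turks":   (716e6, 480),    # transistors, ALUs
    "Juniper": (1.04e9, 800),
}
for name, (trans, alus) in parts.items():
    per_alu = trans / alus
    print(name, round(per_alu / 1e6, 2), "M transistors per ALU")
    for budget in (800e6, 1.0e9):
        print("  ", int(budget / 1e6), "M budget ->", int(budget / per_alu), "ALUs")
# -> roughly 530-770 ALUs depending on which part you scale from,
#    i.e. somewhere between a 6670 and a 5770.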

Just for comparison's sake:

Juniper
850MHz
800 ALUs
16 ROPs
40 texels per clock (half-rate for FP16)
64 Z/stencil per clock
128-bit GDDR5
166mm^2 @ 40nm
108W TDP


Turks (as you noted above)
800MHz
480 ALUs
8 ROPs
24 texels per clock (half-rate for FP16)
32 Z/Stencil per clock
128-bit GDDR5
118mm^2 @ 40nm
66W TDP

Keeping the die sizes in mind, Turks is really quite pathetic, even more so if you consider that 28nm will be the process node of choice.
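Putting the two side by side in perf-per-mm^2 and perf-per-watt terms makes the gap clearer (peak ALU throughput only, straight from the numbers above):

Code:
# Peak GFLOPS per mm^2 and per watt for the two 40nm parts listed above.
parts = {
    #          ALUs, MHz, mm^2, TDP (W)
    "Juniper": (800, 850, 166, 108),
    "Turks":   (480, 800, 118, 66),
}
for name, (alus, mhz, area, tdp) in parts.items():
    gflops = alus * 2 * mhz / 1000.0
    print(name, round(gflops), "GFLOPS,",
          round(gflops / area, 1), "GFLOPS/mm^2,",
          round(gflops / tdp, 1), "GFLOPS/W")
# Juniper: ~1360 GFLOPS, ~8.2 GFLOPS/mm^2, ~12.6 GFLOPS/W
# Turks:    ~768 GFLOPS, ~6.5 GFLOPS/mm^2, ~11.6 GFLOPS/W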

If the next console had launched on 40nm, Juniper-class hardware would probably have been close to ideal if you consider the Xenos mother die being 182mm^2 at launch.

Although the load power consumption of Xenos is unknown since we only know the total power being drawn, it would be worth noting that 108W for Juniper would represent over 60% of what the original 360 drew from the wall (175W). However, it's a bit interesting to note as well that the 360S SoC presentation indicated that the highest heat spot is between CPU0 and CPU1 for the 45nm chip, indicating that perhaps the CPU is the greater power hog.


Do you think it's a good idea to stay with a VLIW5 architecture? Does going to VLIW4 or GCN shaders buy anything in terms of performance, specifically, for a gaming console (i.e. 5 VLIW5 vs ~7-8 VLIW4 vs ~7-8 GCN CUs)?
It's a bit of a loaded question because it will really come down to how developers are writing shaders and ultimately what hardware they target in the next round of hardware.

We'd probably need a really hardcore in-depth analysis of shader performance on the three different architectures to gain some insight. A proper comparison would mean normalizing a few parameters such as ALU counts and core clocks (for starters). I'd definitely like to see shader analysis for more recent games that use DX11, but the problem is that all these games originally targeted DX9-class hardware to begin with.
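As a toy illustration of why that normalization matters, effective throughput can swing a "same peak" comparison quite a bit. The utilization figures below are pure placeholders, not measured data; getting real ones is exactly the kind of in-depth analysis described above:

Code:
# Toy model: effective throughput = peak * average slot/lane utilization.
# The utilization numbers are illustrative guesses only.
configs = {
    #                  ALUs, MHz, assumed avg utilization
    "VLIW5 (6 SIMDs)": (480, 800, 0.65),   # e.g. ~3.25 of 5 slots filled
    "VLIW4 (8 SIMDs)": (512, 800, 0.80),   # e.g. ~3.2 of 4 slots filled
    "GCN (8 CUs)":     (512, 800, 0.90),   # fewer issue restrictions
}
for name, (alus, mhz, util) in configs.items():
    peak = alus * 2 * mhz / 1000.0
    print(name, round(peak), "GFLOPS peak ->", round(peak * util), "GFLOPS effective")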

----------------

Other GPUs

Barts

900MHz
1120 ALUs
32 ROPs
56 texels per clock (half-rate for FP16)
128 Z/Stencil per clock
256-bit GDDR5
255mm^2 @ 40nm
151W TDP

Cayman
880MHz
1536 ALUs
32 ROPs
96 texels per clock (half-rate for FP16)
128 Z/Stencil per clock
256-bit GDDR5
389mm^2 @ 40nm
250W TDP

Tahiti
925MHz
2048 ALUs
32 ROPs
128 texels per clock (half-rate for FP16)
384-bit GDDR5
365mm^2 @ 28nm
250W TDP

-------------

Supposedly...

Pitcairn
???MHz
1408 ALUs
24 ROPs
88 texels per clock (half-rate for FP16)
256-bit GDDR5
245mm^2 @ 28nm

Cape Verde
1GHz
896 ALUs
16 ROPs
56 texels per clock (half-rate for FP16)
128-bit GDDR5
164mm^2 @ 28nm

(Corrections on specs are welcome...)

-----

I'm rather curious whether there will be a refresh part for the 7xxx series using 24 CUs (1536 ALUs, 96 tex/clk), as that'd pretty much be a direct way of comparing against Cayman's VLIW4.

At any rate, comparisons to the 360 die budget kinda put it close to the low end of today's GPUs (Cape Verde). I mentioned it moons ago, but a die-shrunk Barts would be pretty darn close as well...
 
And yes, I do realize I am ignoring eDRAM entirely. Ultimately, it's just a separate chip and whether or not that is cost effective for bandwidth is up in the air. High end GDDR5 isn't exactly inexpensive either. It also wouldn't be right to combine the two dice for an equivalent die budget because yield is non-linear with die size.
 
You don't need to be a rocket scientist to tell that 5x360 is not anywhere near a possibility for Wii U. That's over 1k SPUs, and Wii U is lucky to have close to half of that.

Considering the early dev kit GPU had at least 640, there's no reason to believe the final will be at or under 500. Anyway, personally I'm not even a fan of the "x times more powerful" labels; they're too subjective IMO. Until we see actual games, I prefer looking at what the specs could hypothetically achieve over "Console A is x times more powerful than Console B based on such and such information."
 
Do we know what resolution Nintendo is aiming at? If it's 720p, then I don't see a need to go "5x" faster than current-gen consoles.
 
Do we know what resolution Nintendo is aiming at? If it's 720p, then I don't see a need to go "5x" faster than current-gen consoles.

The Wii U may need to drive the pad in addition to the HDTV, especially if it can take more than one Wii U pad.

On the input side, it will also need to take data from all the existing Wii accessories, plus potentially new sensors in the Wii U.

Not saying 5x is real, or enough, or too much, but there are always plenty of needs for memory and computing power. I suppose the workflow and development work can be simplified too if the machine itself is powerful enough. I think improvement in that area may be more welcome to developers.
 
Do we know what resolution Nintendo is aiming at? If it's 720p, then I don't see a need to go "5x" faster than current-gen consoles.

If all 3 next gens don't target 1080p as the primary resolution then I'll skip them all...

I really want to get back into console gaming but looking at the way things are going it's not going to happen.

I'll happily stick to my PC for another 5-6 years.
 
Anyone have any idea how difficult it would be to integrate molybdenum disulfide into traditional foundries?

If that could be integrated, both traditional GPUs and consoles would seemingly experience vast improvements in short order, as heat constraints would largely drop out of the equation.
 

Unfortunately, these sorts of research discoveries will take years not only for integration (this isn't a simple job of "add salt to soup, voila"), but also for the legal hurdles (patents come to mind). The fabrication equipment may not even support it in the general case as we have no information about this discovery or the lab conditions. You're also talking about fixing up all the fab plants to support it practically on a whim - one research study does not automatically translate to being feasible in industry for consumption. Fabs will want to do their own R&D, not just take someone else's work for granted. There's just a whole slew of testing that needs to be done given the considerably wide variety of devices that can be fabricated on a particular process node. For instance, we don't know what the research finding tested on or sampled. Did they use a complex IC or just a dummy test... etc.

And who is to say that the fabs don't already know? A lot of what goes on behind the scenes is trade secret.

Just some things to consider...


I wouldn't hold your breath for it to appear anytime soon. At this point it has about as much chance as using carbon nanotubes for heatsinks.
 
I wouldn't say the situation is as bad as nanotubes; this is a simple 2D layer of material, not a 3D molecular structure.

Further research yielded the following:
Despite molybdenite's potential, the researchers say it will be at least 10 to 20 years before it enters commercial use.

It appears it might be available the gen after next-gen, unless the mobile industry's hunger for energy efficiency, and its spending, manage to expedite its arrival (3 companies have already shown interest in the research).
 
At any rate, comparisons to the 360 die budget kinda put it close to the low end of today's GPUs (Cape Verde). I mentioned it moons ago, but a die-shrunk Barts would be pretty darn close as well...

Why should we be comparing the die size of Xenos alone when Xenos also had dedicated eDRAM for the sole purpose of assisting graphics throughput?

182mm2 Xenos
80mm2 eDRAM

262mm2 Dedicated graphics die size for Xbox 360
240mm2 Dedicated graphics die size for RSX

In context, I think it's safe to say ~250mm2 was the graphics budget this gen.

Projecting the future GPU budget from this number also assumes that the overall die size budget will scale equally for GPU and CPU, which for many reasons I don't think is a reasonable expectation.

GPGPU will see more CPU functions cast off the CPU and onto the GPU. Also, from discussions that have taken place regarding current and future CPU projections, the ability to cram more into a CPU die budget reaches diminishing returns more quickly than expanding the GPU die budget, which scales more linearly.

Therefore, I think it is safe to assume that whatever the overall die budget is, a larger percentage of that budget will be dedicated to the GPU than the PS3's/XB360's 51-60%.
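For reference, that 51-60% range works out roughly as follows if you take ballpark 90nm launch die sizes for the CPUs (~168mm2 for Xenon, ~235mm2 for Cell; both approximate and not from this thread):

Code:
# Rough GPU share of the combined CPU + graphics die budget at launch.
# CPU die sizes are approximate 90nm launch figures and may be off a bit.
xb360_gpu = 182 + 80     # Xenos mother die + eDRAM daughter die (mm^2)
xb360_cpu = 168          # Xenon, approximate
ps3_gpu   = 240          # RSX
ps3_cpu   = 235          # Cell, approximate

print("360:", round(100 * xb360_gpu / (xb360_gpu + xb360_cpu)), "% graphics")  # ~61%
print("PS3:", round(100 * ps3_gpu / (ps3_gpu + ps3_cpu)), "% graphics")        # ~51%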


Perhaps there will be a shrinking of the overall die budget to allow the standard inclusion of other items/features (Kinect 2 / Move 2), but I can't see a good reason for Sony/MS to significantly undermine overall system performance and potential sales to save a few dollars in die space.

Especially with increased competition in the console sector.


1400-2000 ALUs
250-350mm2 w/ motion control as standard
300-400mm2 w/o motion control as standard
80-120mm2 CPU
 
Why should we be comparing the die size of Xenos alone when Xenos also had dedicated eDRAM for the sole purpose of assisting graphics throughput?
I explained it in the follow-up post.

Yield is not linear with the die size so you can't just blindly add them together. Naturally, two smaller dice are easier to produce than a monolithic die that's equal in area.
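A standard way to see the non-linearity is a simple Poisson defect-density model; the defect density below is just an illustrative number, not a real 40nm/65nm figure, and the eDRAM die would have its own process and defect rate anyway:

Code:
import math

# Simple Poisson yield model: yield = exp(-die_area * defect_density).
D0 = 0.4   # assumed defects per cm^2, purely illustrative

def yield_estimate(area_mm2):
    return math.exp(-(area_mm2 / 100.0) * D0)

print(round(yield_estimate(182), 2))   # Xenos mother die alone    -> ~0.48
print(round(yield_estimate(80), 2))    # eDRAM daughter die alone  -> ~0.73
print(round(yield_estimate(262), 2))   # hypothetical 262mm^2 die  -> ~0.35
# In this toy model 0.48 * 0.73 ~= 0.35, but a defect in either small die
# only scraps that die's area, and the two can be binned and costed
# separately, which is why two smaller dice are still easier to produce.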

I think it's safe to say ~250mm2 was the graphics budget this gen.
Eh, coincidence. eDRAM is also a different manufacturing process from standard logic (different cost structure as well). You can't just equate 80mm^2 of (mostly) eDRAM to CMOS logic in terms of cost, TDP*, power consumption, and board complexity.

RSX was also significantly bloated compared to G71 (191mm^2), in no small part due to XDR I/O and redundant hardware being shoved in there to increase yields.


The whole point of looking at the ballpark 180mm^2 was to look for a base die upon which there's the option of eDRAM. Again, as I already mentioned and you hastily skipped, we don't know what the high bandwidth options will be.


-------------

You might even think about how ditching the eDRAM and merging the ROPs and Z/stencil back into the mother die would still have produced a chip significantly smaller than RSX, and yet it would still have had the shading efficiency and geometry setup advantages and single-cycle 4xMSAA. Replace the eDRAM I/O with another 128-bit bus.
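Just for scale, here's what that second 128-bit bus buys in raw bandwidth, using the 360's actual 700MHz GDDR3 as the baseline (the eDRAM figures are the commonly quoted ones):

Code:
# Bandwidth of a hypothetical second 128-bit bus vs. the eDRAM arrangement.
def bus_bandwidth(bus_bits, data_rate_gbps):
    return bus_bits / 8 * data_rate_gbps   # GB/s

gddr3_128 = bus_bandwidth(128, 1.4)        # 700 MHz GDDR3 -> 22.4 GB/s
print("one 128-bit bus: ", gddr3_128)
print("two 128-bit buses:", 2 * gddr3_128) # 44.8 GB/s shared by everything
# vs. Xenos as shipped: 22.4 GB/s of GDDR3 plus a 32 GB/s link to the
# daughter die, with ~256 GB/s available to the ROPs inside the eDRAM.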

*If you consider the TDP as well, I'm not so sure MS would have asked for a 250mm^2 part if they didn't have eDRAM. eDRAM thermals aren't going to be anywhere near that of the main processor logic - the heatmaps of the 65nm eDRAM and 45nm CGPU indicate as much if you really want something tangible, but it should be obvious.

There are going to be more considerations than just overall die area.
 