Predict: The Next Generation Console Tech

NRP · Sep 26, 2012

Are people seriously pissed that the 720/PS4 might ship with a 6870/6950-level gpu? Those are pretty good gpus. In a closed environment programmed to the metal, that gpu could rock harder than Yngwie Malmsteen.

Acert93 · Sep 26, 2012

JEDEC published DDR4 standard.

Ruskie · Sep 26, 2012

NRP said:
Are people seriously pissed that the 720/PS4 might ship with a 6870/6950-level gpu? Those are pretty good gpus. In a closed environment programmed to the metal, that gpu could rock harder than Yngwie Malmsteen.

No, its just that people want see GPUs that are not 2 years old. Something of similar characteristics, but newer and more efficient. There is no reason MS would go with something like that since its too big and too hot in comparison with better deals you can get now (and even better by the end of 13').

Than again, I don't know if those Alpha kits where even legit. Yes they got "MS Asset" tags on, but again, who knows...

(((interference))) · Sep 26, 2012

ERP said:
While I agree that it's a reasonable spec for a NG console, I wouldn't draw any conclusions from it.

The processors in the 360 alpha kits were MUCH faster (excluding FP code) that the final 360 processors, bit a lot of teams in the ass even though it was obvious if you gave it any thought.
.

I didn't know the G5s in the alpha kits were faster than Xenon - I thought Xenon was pretty good for its time and I remember reading a dev (maybe on B3D) saying how it has held up better over the years than any other consumer CPU available at the 360's launch would have.

This seems to be borne out in this DF interview with the devs of Metro 2033 who suggest the Xenos is far weaker compared to modern GPUs than Xenon is to modern CPUs.

Digital Foundry: How would you characterise the combination of Xenos and Xenon compared to the traditional x86/GPU combo on PC? Surely on the face of it, Xbox 360 is lacking a lot of power compared to today's entry-level "enthusiast" PC hardware?

Oles Shishkovstov: You can calculate it like this: each 360 CPU core is approximately a quarter of the same-frequency Nehalem (i7) core. Add in approximately 1.5 times better performance because of the second, shared thread for 360 and around 1.3 times for Nehalem, multiply by three cores and you get around 70 to 85 per cent of a single modern CPU core on generic (but multi-threaded) code.

Bear in mind though that the above calculation will not work in the case where the code is properly vectorised. In that case 360 can actually exceed PC on a per-thread per-clock basis. So, is it enough? Nope, there is no CPU in the world that is enough for games!

http://www.eurogamer.net/articles/digitalfoundry-tech-interview-metro-2033?page=4

360 and Xbox before it shipped with state of the art GPU's, Xbox near the transition to programmable shaders and 360 at the transition to Unified shaders, so not surprisingly yes the alpha kits had weaker GPU's, again I wouldn't call it a trend.

But since MS has a history of shipping GPUs that are better than devkit hardware, the GPUs in the leaked devkit aren't outrageous choices for a next gen machine and since the PS4 spec leak has it's GPU around the same power; I think it would be reasonable to assume that we'll see similar (or perhaps better) performance in the final hardware.
Certainly that seems more likely than the converse being true.

Blazkowicz · Sep 26, 2012

In its early life the Xenon was panned for being slow.
Carmack's comment was, half a pentium 4 3.2 GHz, which is very harsh. Of course this is missing the six threads and vector units, but if you come from a "safe" G5 where your "naive" code ran great then see your performance crumble on the final unit, that must have been quite a moment.

McHuj · Sep 26, 2012

Acert93 said:
JEDEC published DDR4 standard.

Interesting. Now, I'm a little hesitant that they will use it. A techspot article claims it really won't make it to PC's until 2015. So MS and some server markets would be the early adopter of this.

No as far as bus speeds go, we're looking at 2.133 Gb/s at the start with a max of 3.2 Gb/s eventually. 2.133 Gb/s would only give you around ~68 GB/s on a 256-bit bus. If MS goes this route, then this eSRAM is a must. Even if they managed the 3.2 Gb/s variety next year, that's only about 102 GB/s. (7770 for comparison has a 72 GB/s BW on 128-bit GDDR5 bus).

The other interesting thing seems that modules will be spec'ed for upto only 16-bits of IO. So a 256-bit bus would require 16-chips, regardless of their densities. JEDEC does mention that you could stack them, up to 8 "high" so 8 devices could share a pin. Maybe long term, MS's cost and power reduction plan is to go from 16-devices on a 256-bus to 2 "8 stacks" running a 32-bit bus (or some combination of).

Sorry, one more thing. There have been rumors of Durango going with 6 or 8 GB of RAM.

8 GB of DDR4 Ram with 2Gb chips = Total 32 chips, bus: 512 bits with a BW 136 GB/s (using 2.1333Gb/s)
8 GB with 4Gb chips = Total 16 chips, bus: 256-bits 68 GB/s
8 GB with 8Gb chips = Total of 8 chips, bus: 128-bits 34 GB/s
6 GB with 2Gb chips = Total 24 chips, bus: 384-bits 102 GB/s
6 GB with 4Gb chips = Total 12 chips, bus: 192-bits 51 GB/s

IMO, none of these options look great or cost effective, but I would say 8 GB with 4Gb chips is most likely.

Earendil · Sep 26, 2012

Ruskie said:
No, its just that people want see GPUs that are not 2 years old. Something of similar characteristics, but newer and more efficient. There is no reason MS would go with something like that since its too big and too hot in comparison with better deals you can get now (and even better by the end of 13').

Than again, I don't know if those Alpha kits where even legit. Yes they got "MS Asset" tags on, but again, who knows...

I could put an "MS Asset" tag on my ass, but that doesn't make it a dev kit.

Rangers · Sep 26, 2012

Blazkowicz said:
Yes it's short-lived. It was meant to live on a cancelled 32nm process if I'm not mistaken. The radeon 6970 as it came out was sort of a backup plan. In the end it will be most significant on the Trinity APU, here it's a stop-gap until gcn 2.0 (or gcn 1.1 if you're cynical)

VLIW 4 is a tweak on VLIW 5 architecture, it remains entirely similar. We may joke that AMD had more work updating its compiler.
A 5D vector was good for basic rendering operations you do over and over again but with shaders getting longer, more instructions would readily fit in 4D. So switching to VLIW 4 allowed better utilization.

GCN is a whole lot better for GPGPU and damn good for gaming, whatever cruft is added from the complexity doesn't matter much when we see the result.

I still have a question in my mind whether it would be better to have, well for example the HD7770 and HD6870 are pretty similar in transistor count, but one is VLIW5 and clocks 2 teraflops, the other is GCN and clocks only 1.28 teraflops.

Which is better in a console environment?

I have a feeling very few people in the world can answer that...

XpiderMX · Sep 26, 2012

Earendil said:
I could put an "MS Asset" tag on my ass, but that doesn't make it a dev kit.

Developers don't will say that your ass is a devkit

Mobius1aic · Sep 26, 2012

Rangers said:
I still have a question in my mind whether it would be better to have, well for example the HD7770 and HD6870 are pretty similar in transistor count, but one is VLIW5 and clocks 2 teraflops, the other is GCN and clocks only 1.28 teraflops.

Which is better in a console environment?

I have a feeling very few people in the world can answer that...

Barts on 40 nm = 255 mm²
Juniper on 40 nm = 166 mm²
Cape Verde on 28 nm = 123 mm²

The game would change if Barts and Juniper were on 28 nm, but the benefits of GCN in terms of efficiency, greater tessellation capability, ability to carry out GPGPU better, and the fact it's on 28 nm today would make me go for it. If you're willing to pay for a Barts sized die at 40 nm, you could go with Pictairn at 212 mm² to get the GCN feature set and similar GFLOPS to Barts yet enjoy Pictairn's greater efficiency.

Obviously, I really like Cape Verde. It's as if it was designed for a console in some respects, with very high GFLOPS per Watt, and would've made a great GPU for a new console in late 2010 or 2011, but these days I'd spring directly for Pictairn. However, I'm really attracted to building a high efficiency, low profile system with an i3 and one of those low profile 7750s from either Sapphire or HIS.

Blazkowicz · Sep 26, 2012

I like to think about a custom Haswell console, if Intel would make a dual core variant with the top end GPU or even a bigger one, with a bigger attached memory - 256MB at around 250GB/s, with dual channel ddr4 3200.
The GPU wouldn't be that great but hell it would be power efficient.

Thanks $deity Intel does not want to go that route and merely wants to rule the laptop and the server - where they want to sell a lot of cores, regular Xeon without GPU, or actual many-core CPUs.

Rangers · Sep 26, 2012

Mobius1aic said:
Barts on 40 nm = 255 mm²
Juniper on 40 nm = 166 mm²
Cape Verde on 28 nm = 123 mm²

The game would change if Barts and Juniper were on 28 nm, but the benefits of GCN in terms of efficiency, greater tessellation capability, ability to carry out GPGPU better, and the fact it's on 28 nm today would make me go for it. If you're willing to pay for a Barts sized die at 40 nm, you could go with Pictairn at 212 mm² to get the GCN feature set and similar GFLOPS to Barts yet enjoy Pictairn's greater efficiency.

Well, the whole point of pointing out similar transistor count was to assume similar die size and cost. So at the same cost which would be preferable?

It looks like CV is 128 and Barts is 256 bus though, ruining the comparison to some extent as the latter could be more costly over time.

liolio · Sep 26, 2012

NRP said:
Are people seriously pissed that the 720/PS4 might ship with a 6870/6950-level gpu? Those are pretty good gpus. In a closed environment programmed to the metal, that gpu could rock harder than Yngwie Malmsteen.

Yngwie J Malmsteen

Beware the guy has a strong ego, a good front kick and mean boots...

My hope is that AMD can pull somehow a Bart out of the 7870 (and the hd7970).
The hd 5870 had 20 SIMD and 32 ROPs, the HD 7870 is pretty much the same though a different architecture. From this AMD ended with with 14 SIMD and 32 ROPs and a bit simpler memory controller and slower memory. This diet saved quiet some silicon as the went from 334 mm^2 to 255 mm^2.
It saved power too as the chip ended up consuming the same as a HD 5850 while outperforming it (slighly).
power and performances comparison can be found here and here.

Pitcairn is tiny (212 mm^2), so I guess that the 256 bit bus would have to go.
I would be willing to see a more proper hd 57xx type replacement (cap verde does great thing but it's a really tiny gpu)
Think something like 14 CUs, 24 ROPS and a 192 bit bus.
That would be a tiny chip, AMD gained 25% from the hd 58 xx to the 68xx without touching the ROPs number. It could be a 150 mm^2 chip (or less)

It's a bit OT as I don't expect exactly such a set-up but it puts things into perspective. an HD7850 consume ~100 Watts but it obliterates either a HD 6870 or HD 7700 and slightly beats a HD 6950.
It's card that runs most games at 1080p x4 AA at more than 30fps (way more in some cases).

I could see such thing get close to a HD7850 (say AMD clock the part higher than a hd7850 and uses faster memory).
-------------------------
Actually I believe that the kind of GPU Sony at least should aim at. Put such a GPU and a quad core jaguar of a SOC. Lower the clock say 800MHz, keep a reasonably fast memory 1.2GHz (as from AMD own words ROPs can be under fed in Pitcairn). Stuck 2GB of GDDR5 to the 192 bit bus Nvidia style and call it a day.
They may end with a chip that under 185mm^2 and still "easily" produced and burns south of 100 watts.

From K.Kutaragi:

Cell has 8 embedded "SPE" CPU cores. What is the basis for this number?
Because it's a power of two, that's all there is to it. It's an aesthetic. In the world of computers, the power of two is the fundamental principle - there's no other way. Actually, in the course of development, there's this one occasion when we had an all-night, intense discussion in a U.S. hotel. The IBM team proposed to make it six. But my answer was simple - "the power of two." As a result of insisting on this aesthetic, the chip size ended up being 221mm2, which actually was not desirable for manufacturing.
In terms of the one-shot exposure area, a size under 185 mm2 was preferable. I knew being oversized meant twice the labor, but I on the other hand, I thought these problems of chip size and costs would eventually be cleared as we go along. But in this challenge of changing the history of computing, I could not possibly accept any deviation from the rule of the power of two.

borrowed from here.
It's interesting to notice that for both Xenon and Xenos MSFT remains below that size.

As for the OS, I believe they should use an cheap ARM SOC @45 with its own say 512MB of RAM and use AMD technology that allow to connect multiple SOC together. It provides coherency and virtualization of IO.
It's a cost but it could prove a lesser cost than putting more than 2 GB of GDDR5 in the system. 2GB is not a catastrophe but if you start to byte into for the OS in significant proportion it could prove bothering.
As a win they may use the same OS for the vita and the ps4.

Overall I still hope Sony comes with not pride this time but a sort of Wii for core gamers with a Wii like price and free online. They may capture a lot more core gamers than hard core gamers who want the best thing in town (and are willing to pay the price for) are willing to believe.
Something like 229$ 12GB of flash version, 299$ + HDD 500 GB. So pretty much shipping at the same price as actual PS3.

fehu · Sep 26, 2012

DDR4 is official, there's some future processor that support it?
It can change a lot if request and production are at good levels by mid 2013

Blazkowicz · Sep 26, 2012

If consoles use it late 2013 and Intel servers use it early 2014 then makers have an easy "make shit tons of it now" plan.
Also for a console you can do ddr4 3200 with high timings and high voltage I believe, dropping the price and maybe voltage over time. This lacks OEM desktops and low end graphics cards taking the bottom of the barrel ddr4 though. We need low end GPUs supporting gddr5/ddr4/ddr3. Bad ddr4 can find a lot of customers too but they will need their memory controllers to support it.
Desktops/laptops wait till 2015 according to current plans. Fringe stuff may use it such as "dram drives" for some servers and what not.

Shifty Geezer · Sep 26, 2012

NRP said:
Are people seriously pissed that the 720/PS4 might ship with a 6870/6950-level gpu?

Whether they are or aren't, this isn't the thread for it. There's a next-gen discussion thread for talking about the new consoles. This thread is for working out what might be in them.

You'd hope after nearly 15,000 posts people would have got the gist of that by now.

almighty · Sep 26, 2012

My take..

CPU

An AMD APU is out of the question, way to power hungry and will go very close to the power budget on its own. And AMD A8 APU will top out at 150-160w in reviews and AMD's new A10 APU's are built on the same 32nm line so even they won't show a significant drop in power consumption. The IGPU is also not fast enough t warant it's power consumption which could be used for a faster discreet GPU.

Now they have to have a quad core CPU, the best performance would be to go with Intel but that's also the most expensive option, the second best option would be to go with a Trinity based CPU with the IGP removed for reduced to decrease costs, this would put the power load at the 65w mark at full load while offering good performance. Failing those options then a custom CPU yet again from IBM.

GPU

RSX/G70 consumed around 80-85w under load, in today's world the closest GPU to that is the new AMD HD 7770 which tops out at 83w, for an extra 18w they could have a HD 7850 that's rated at 101w

CPU - 65w Quad core
GPU - 83w HD 7770

Total - 148w

Or 166w with an HD 7850

How many whats would the memory and I/O devices use?

AlNom · Sep 26, 2012

Highest end GDDR5 is probably around 2W per 2Gbit DRAM (6-7Gbps, 1.5V). At 4Gbps it's under 1W.

McHuj · Sep 26, 2012

For 4 GB, that would require 16 2Gb chips which depending on speed will consume 16-32 watts. IMO, that's a significant percentage of the total power consumption.

Rangers · Sep 26, 2012

So, the tags, presumably written by the in the know B3d mods, outed the Durango GPU as 2.8 teraflops, before it was abruptly removed further proving the nugget's validity.

Discuss

Predict: The Next Generation Console Tech

NRP

Acert93

Artist formerly known as Acert93

Ruskie

(((interference)))

Blazkowicz

McHuj

Earendil

Rangers

XpiderMX

Mobius1aic

Quo vadis?

Blazkowicz

Rangers

liolio

Aquoiboniste

fehu

Blazkowicz

Shifty Geezer

uber-Troll!

almighty

AlNom

Moderator

McHuj

Rangers

Similar threads