Predict: The Next Generation Console Tech

If this is an SOC in the purest sense, as an actual System On a Chip, and it's manufactured in a manner much like the chips before it, yes. There's a physical limit to the optics responsible for projecting the mask patterns onto the wafer, probably something a bit over 600 mm^2.

Cost concerns likely arise well before that limit, and there are ways to make larger chips, but those aren't used for this kind of product.
 
With a chip that large, yields may be an issue given the amount of stock that a product launch of this caliber will require. Perhaps there is a compromise in APUs: CPU and GPU occupying the same die, with everything else on a separate chip.

A custom Trinity APU may fit the bill, with a disproportionate share of the transistors devoted to the GPU. AMD could sell them as desktop parts; Microsoft could use them in a home console.
 
Perhaps there is a compromise in APUs: CPU and GPU occupying the same die, with everything else on a separate chip.
What would that "everything" be, and how big a chip would it make once you chop it off from the SoC? My guess is it would be a tiny fraction of the SoC, and you'd be making it huge by having to attach data-transfer pins on both the APU side and the extra-chip side.
 
I am not the one claiming that Blu-Ray is as dead as SACD; I am just showing how it's taken over the physical market from DVDs. The idea that personal use and "my friends'" usage patterns explain why it's a good idea to drop optical from the next XBOX is classic. There is a clear tendency for Blu-Ray to take over the physical market, so somebody's friends must be watching them.

You're putting words in my mouth. I never compared SACD to BluRay, I merely mentioned that I have a physical media library, and a player to play it with. I face a hassle and an expenditure when I need to upgrade said library. I will face the same hassle and expenditure again in the future if I were to choose a physical format again.

And it's not my personal usage pattern; it's the trend.

BluRay has spent six years getting to 25% of physical media sales in the U.S., and the growth has been more or less linear. Compare that to DVD, where uptake was slow initially but then rose exponentially. BR players are now dirt cheap, way cheaper than DVD players were when DVD reached its inflection point.

BluRay looks more and more like Betamax: a technically superior format that failed.

And the streaming vs. physical media argument is even more classic. It's always been rental vs. streaming: why should people who usually buy movies stop doing that just because they can stream? Of course there will be a good number of people who take the couch approach and just stream/rent movies instead of buying them, but unless someone can come up with some solid numbers on how many do that, it will just be us guessing.

There are a ton of reasons why people stop buying physical media. Generally, buy-to-own is falling out of favour because people find more value in VOD or subscription-based streaming (Netflix). Physical media is also a poor fit for how people use media today. It is a PITA to transfer a BluRay movie from disc to your tablet or phone, or to another PC. If you really want to own your media, it makes more and more sense to buy it in DD form.

Cheers
 
If you already have a player, then why would getting a new console make it impossible to use your old player for old-format media?

My old player has a limited life expectancy. Also, do I want to have a stack of players at every TV in my house? No.

I can hope that the future format will be backwards compatible, but there is really no guarantee. Heck, even Sony unceremoniously dumped support for their own SACD format in PS3s with the release of PS3 slim.

Cheers
 
A very dumb question

With AMD's effort to unify memory management between the CPU and the integrated GPU, and between the CPU and a discrete GPU, is there any hope (or sense) in a UMA with two different pools of memory?
 
A very dumb question

With AMD's effort to unify memory management between the CPU and the integrated GPU, and between the CPU and a discrete GPU, is there any hope (or sense) in a UMA with two different pools of memory?


Do you mean two pools of memory of different bw/latency? I'd say certainly. It could be as easy as using a ring bus MC. However, there are some issues/concerns that I think should be addressed.

  • Sufficient access to both/all memory. What I mean here is that access should not be restricted beyond its inherent characteristics. This is where the PS3 scheme failed; had CPU/GPU access been more or less equal to both pools, there would have been few issues. In fact, it's entirely probable that most devs would have been using the XDR memory for their GPU buffers.
  • A unified addressing scheme. Not really an issue for a proper ring bus MC.
  • Fine-grained control. I know we are coming out of a period where some devs were horrified at the thought of having to include a DMA call, but there may be some devs out there who would be equally horrified at the thought of that 100 MB texture they touched once sitting around taking up space until LRU kicked in.

There's probably more, so I'd like to hear from others.
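
To make the kind of control I'm talking about concrete, here is a minimal sketch (purely an illustration of the idea, not any real console API): one flat address space covering two pools with different bandwidth/latency, with the developer explicitly choosing placement rather than leaving it to an LRU policy.

```c
/* Minimal sketch of a "two pools, one address space" scheme.
 * This is an illustration only: pool sizes, bases and the bump
 * allocator are made up, and no real console exposes this API. */
#include <stdint.h>
#include <stdio.h>

typedef enum { POOL_FAST, POOL_LARGE } pool_id;

typedef struct {
    uint64_t base;  /* where this pool sits in the unified address space */
    uint64_t size;
    uint64_t used;  /* a bump allocator is enough for the sketch */
} pool;

static pool pools[] = {
    [POOL_FAST]  = { .base = 0x0100000000ULL, .size =  256ULL << 20 },
    [POOL_LARGE] = { .base = 0x1000000000ULL, .size = 2048ULL << 20 },
};

/* Returns an address in the shared space, or 0 on failure.  Both CPU and
 * GPU would use the same address, so no copies and no separate handles. */
static uint64_t pool_alloc(pool_id id, uint64_t bytes)
{
    pool *p = &pools[id];
    if (p->used + bytes > p->size)
        return 0;
    uint64_t addr = p->base + p->used;
    p->used += bytes;
    return addr;
}

int main(void)
{
    /* Render targets go in the fast pool, a streaming texture heap in the
     * large pool, but both are just addresses in the one space. */
    uint64_t rt   = pool_alloc(POOL_FAST,  32ULL << 20);
    uint64_t heap = pool_alloc(POOL_LARGE, 512ULL << 20);
    printf("render target @ %#llx, texture heap @ %#llx\n",
           (unsigned long long)rt, (unsigned long long)heap);
    return 0;
}
```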
 
In theory they could use a COTS Opteron meant for dual-CPU systems and add HyperTransport to the GPU instead of PCI-E ... that would allow reasonably high-bandwidth, low-latency access across pools (of course GPU memory would still be a hell of a lot faster).
 
It's that easy? I thought I'd get kicked off the forum for asking about a not-UMA UMA :p

One of the best things about the Xbox and the 360 is that you have only one big pool of memory, so you can choose how to balance textures versus AI state. With that deranged UMA scheme, how would this be managed?
 
It's not trivial, but it's not a whole lot of work, relatively speaking ... PCI-E already gives the GPU direct memory access; changing it to HyperTransport would not be a fundamental change in architecture. That said, AMD probably wouldn't be too interested in doing it ... they would much rather push a single-die Fusion chip.

In theory NVIDIA could do the same thing with an Intel processor and QPI, but they would need a QPI license first and lack in-house expertise with that interface.
 
I have seen FMA in both AMD's APU and GPU roadmaps, so in some form they are planning it for 2013/14 on the desktop.
 
I'm curious what kind of interface could be implemented if the CPU and GPU are on an interposer from the start. Surely QPI, PCIe, HT and FlexIO are all based on significant equalization circuitry, high power and nasty capacitance, so possibly a lot of die area and power would be wasted. Intuitively they all look like the wrong choice.

I would think they'd want a new, very wide interface: very low power, lower latency, etc. Does such a thing already exist, or would they need something completely custom? What was the interface used for the eDRAM chip in the Xbox 360, and would that approach be relevant?
 
Reflecting on an AMD SoC... Llano was 228mm^2 for 4 x64 CPU cores + 240 Cypress shaders. About 35% of the die, or 80mm^2, is for the GPU. So a quad-core AMD x64 on 32nm is running at about 150mm^2.

To compare, Deneb (4-core x64) was 258mm^2 on 45nm, and the 6-core Thuban was 346mm^2. Intel's 6-core Gulftown was 240mm^2 on 32nm. Bulldozer (8-core / 4-module) with 4x2MB L2 and 8MB L3 is 315mm^2 on 32nm with a TDP of 95-125 W at 2.8-3.9 GHz.

I know AMD chips are more robust and have much better IPC than the IBM PPC chips currently in consoles, but look at the area dedicated to 4 cores when the 360 already has 3 cores / 6 HW threads. Considering the more than 60% power reduction and over 50% total die-space reduction from the 90nm 360 CPU/GPU budgets to the 45nm Vejle CPU/GPU revision, it is a moment for pause. It looks like the PPC cores are 40% or so of that SoC, and it looks like you could easily fit 6 cores / 12 threads in about the same area the 90nm Xenon used. A fresh design at 28nm for similar PPC cores is in the 12-core / 24-thread range. Even assuming a more fleshed-out PPC core with some L3 and the requirement for better inter-chip communication, I think IBM could deliver an 8-10 core / 16-20 thread solution.

So, reflecting back on the 32nm 4-core Llano CPU at ~150mm^2, contrasting it with a PPC solution of 8 cores / 16 HW threads (which I think is conservative) in a similar footprint, and considering the "console mantra" of pretty lean design with a long-lifecycle focus, I don't see how an AMD SoC with their current CPUs is very competitive. Core count isn't everything, but if you are looking for long-term bang for the buck in a product with an 8-year product cycle, current AMD SoC solutions aren't very fast. And that doesn't even mention how slow the Llano GPU is, especially its memory situation. Maybe future AMD solutions will be a lot better, or MS/Sony could ask for special designs. If the latter, then there is no reason not to have a PPC chip there, as IBM has already integrated an AMD graphics solution into its PPC architecture.
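
To make the area bookkeeping explicit, here is the back-of-the-envelope arithmetic behind those figures; the 35% GPU share is my estimate from above, and the scaling factors assume an idealized (node ratio)^2 shrink that real designs never fully reach.

```c
/* Back-of-the-envelope die-area arithmetic for the estimates above.
 * The 228 mm^2 total and ~35% GPU share are the figures from this post;
 * the scaling factors assume area shrinks with the square of the node
 * ratio, which is an idealization. */
#include <stdio.h>

int main(void)
{
    const double llano_total = 228.0;               /* mm^2, 32 nm    */
    const double gpu_share   = 0.35;                /* estimate above */
    double gpu = gpu_share * llano_total;           /* ~80 mm^2       */
    double cpu = llano_total - gpu;                 /* ~148 mm^2      */
    printf("Llano GPU ~%.0f mm^2, CPU side ~%.0f mm^2\n", gpu, cpu);

    /* Ideal density gains for the PPC shrink comparison. */
    double d90_45 = (90.0 / 45.0) * (90.0 / 45.0);  /* ~4x  */
    double d90_28 = (90.0 / 28.0) * (90.0 / 28.0);  /* ~10x */
    printf("ideal density gain 90->45 nm: %.1fx, 90->28 nm: %.1fx\n",
           d90_45, d90_28);
    /* A 12-core PPC design at 28 nm in roughly the 90 nm Xenon footprint
     * only needs ~4x of that ~10x ideal gain, so the estimate above sits
     * well inside what scaling allows even with generous margins. */
    return 0;
}
```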

Question on a "SoC" (really GPU+CPU): What is so appealing about a single chip when you could go with a CPU and GPU in the same combined footprint but with a better bus design, memory configuration, yields, and flexibility in manufacturing and design wins (e.g. AMD wins your GPU contract, IBM your CPU contract)?

Is it only the concept of a cheap console right out of the gate? What kind of savings are you proposing such an SoC will have? It doesn't seem likely that coherence between the CPU and GPU is the argument, as AMD won't have such products until 2014. And what advantage is that anyway, when the memory for such AMD products is pretty poor? You're basically right back to a custom design to overcome that, which begs the question: then why force an SoC right out of the gate?
 
The advantage of an AMD x86 APU is price. Relatively speaking, it takes very little design work to switch the memory controller from the desktop version designed to couple to JEDEC-standard DIMMs to a GPU-style controller designed to interface with soldered chips.

That's all it takes to go from a vanilla x86 SoC, debugged and produced in mass volumes, to a console product that will perform admirably for the resolution (HD7750+ level). If you want to add some easy customization, increase the number of GPU blocks to taste.

Design work/cost/risk is minuscule, and even the manufacturing would be largely debugged already.
 
Cost as in compared to... the PPC cores that already do the same thing? You may have an argument if this were a new ARM architecture being discussed, but IBM already has PPC designs doing exactly what you laid out.

With AMD you do have the concern of delays (...), or worse: if you are relying heavily on them for your 2-3 major revisions and a handful of other minor changes, there is their fiscal situation to consider.

On price: let's assume IBM wants hundreds of millions for a design contract and AMD is happy to toss a design out for a meager license fee. There is still a cost consideration for the *size* of the AMD design. Their designs are larger and hotter, and when you consider that across more than 80M units over the lifetime, you are right back to hundreds of millions in extra money needed to accommodate the "cheaper" design. Unless we are saying performance per area (cost) is pretty much irrelevant going forward?
 
Question on a "SoC" (really GPU+CPU): What is so appealing about a single chip when you could go with a CPU and GPU in the same combined footprint but with a better bus design, memory configuration, yields, and flexibility in manufacturing and design wins (e.g. AMD wins your GPU contract, IBM your CPU contract)?

I do think costs can be sufficiently lower for an APU than for discrete components on a mature process.

Yes, yields will always be better for two smaller chips than for one big one, but at some point on a mature process they'll be good enough, and eventually the costs of packaging, testing, shipping, and installing two chips will exceed those of one bigger chip.

I don't know where the break-even point is, but AMD seems to think it's in their business interest to sell an APU instead of two lower-cost chips.
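
To put some hypothetical numbers on that break-even point, here is a rough sketch using the standard Poisson yield model (yield = exp(-area x defect density)); the wafer cost, defect densities, and packaging/test cost are invented for illustration only, not real foundry figures.

```c
/* Rough sketch of the break-even argument, using the standard Poisson
 * yield model: yield = exp(-area * defect_density).  All dollar figures
 * and defect densities below are made-up illustration values. */
#include <math.h>
#include <stdio.h>

static double cost_per_good_die(double area_mm2, double wafer_cost,
                                double defects_per_mm2)
{
    const double pi = 3.14159265358979;
    double wafer_area = pi * 150.0 * 150.0;   /* 300 mm wafer, edge loss ignored */
    double gross_dies = wafer_area / area_mm2;
    double yield      = exp(-area_mm2 * defects_per_mm2);
    return wafer_cost / (gross_dies * yield);
}

int main(void)
{
    const double wafer_cost = 5000.0;            /* assumed $/wafer            */
    const double pkg        = 5.0;               /* assumed package+test $/chip */
    const double d0[2]      = { 0.002, 0.0005 }; /* young vs. mature process    */

    for (int i = 0; i < 2; i++) {
        double one_soc  = cost_per_good_die(300.0, wafer_cost, d0[i]) + pkg;
        double two_dies = 2.0 * (cost_per_good_die(150.0, wafer_cost, d0[i]) + pkg);
        printf("D0 = %.4f/mm^2: one 300 mm^2 SoC $%.1f vs two 150 mm^2 dies $%.1f\n",
               d0[i], one_soc, two_dies);
    }
    /* With these numbers the split wins on the young process and the single
     * SoC wins once defect density drops, which is exactly the crossover
     * being argued about. */
    return 0;
}
```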


From my experience in the embedded world, I've seen that I/O can be a significant contributor to power. I've seen very high-speed I/O (at least for our application, ~1GB/s) consume around 15-20% of the power budget (~10W). Granted, some of that could have been our own implementation's fault, but I imagine that keeping CPU/GPU I/O on-chip would be very beneficial, especially if there's a large on-chip eDRAM.
 
Wow, all this time Alstrong was holding out: 256-bit IS possible. The fact that the chip can go to an SoC later, so that pad limiting is not a major factor for much longer, had completely escaped me.

To me it's an absolute no-brainer then, and it simplifies everything immensely. Forget all this eDRAM, DDR3-for-cache, and other nonsense. 4GB of RAM is then possible with relative ease, and even 8GB could happen.
 
Cost as in compared to... the PPC cores that already do the same thing? You may have an argument if this were a new ARM architecture being discussed, but IBM already has PPC designs doing exactly what you laid out.

With AMD you do have the concern of delays (...), or worse: if you are relying heavily on them for your 2-3 major revisions and a handful of other minor changes, there is their fiscal situation to consider.

On price: let's assume IBM wants hundreds of millions for a design contract and AMD is happy to toss a design out for a meager license fee. There is still a cost consideration for the *size* of the AMD design. Their designs are larger and hotter, and when you consider that across more than 80M units over the lifetime, you are right back to hundreds of millions in extra money needed to accommodate the "cheaper" design. Unless we are saying performance per area (cost) is pretty much irrelevant going forward?

To clarify - I don't think going with an AMD x86 SoC is a brilliant idea.
I merely pointed out where using such an existing design could bring savings - by minimizing design/debugging cost and manufacturing risk.
 
Reflecting on an AMD SoC... Llano was 228mm^2 for 4 x64 CPU cores + 240 Cypress shaders. About 35% of the die, or 80mm^2, is for the GPU. So a quad-core AMD x64 on 32nm is running at about 150mm^2.

To compare <snip>
Look at the die photo in your own link: the 4 cores of Llano (with L2 caches) take up less die space than the GPU part. I remember we looked at the X360 CPU when it was released; the die area of one 360 core was roughly comparable to an Athlon 64 core, both at 90 nm (35mm^2 IIRC).

The 360 CPU and the Cell PPU were a result of the CPU design paradigm IBM was pursuing at the time: super-fast in-order designs. It ultimately resulted in the POWER6 processor. It was a disaster.

A core of the much wider out-of-order POWER7 is smaller than a POWER6 core on a process-normalized basis, while delivering higher single-thread performance and much higher throughput, and consuming less power.

Cheers
 