Predict: The Next Generation Console Tech

It's the same thing this gen. GDDR3 700 was how much in 2005, and how much is it now?
Actually, outdated RAM prices go up! RAM for my Athlon 2500 is 50%+ more expensive than the latest RAM. When the market has moved on, economic forces (very well defined by economists) cause prices to increase. I don't know how RAM prices will vary across the gen. However, if you're talking $20 extra per unit at launch with a view to that dropping, that can amount to hundreds of millions in profit over the life of the console. Why were MS wanting to put only 256 MB into XB360? To save on the pennies, which all add up. A line has to be drawn where an extra few dollars of BOM is deemed too much, where a 20GB HDD is deemed better value at $5 a piece cheaper than a 60GB HDD, and $10 more RAM is $10 more than you'd like. RAM often seems one of the first things to go in a console, with consoles having significantly lower amounts than PCs of the time. If PCs are going to be 4-8 GB come next-gen...
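
To put a rough, purely illustrative number on that (the $20 per unit is from the post above; the 50M lifetime units is only an assumption for the sake of the arithmetic):

Code:
# Purely illustrative: how a small per-unit BOM delta compounds over a console generation.
# The $20 figure is from the post above; the 50M lifetime-units figure is an assumption.
extra_bom_per_unit_usd = 20
lifetime_units = 50_000_000

total_extra_usd = extra_bom_per_unit_usd * lifetime_units
print(f"${total_extra_usd / 1e6:,.0f}M over the generation")  # -> $1,000M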
 
I'd like to ask two noob-ish questions, if I may:

1) Can I apples-to-apples compare a transistor in a GPU to a transistor in a CPU (assuming the same process?)

2) Is there a transistor breakdown of the 4870 chip anywhere? Like, how many transistors were spent on each SIMD core, on each 4-piece texture unit, etc.? Even decent estimates would be good.

I know the latter mightn't seem very relevant, but I'm going somewhere with it..
 
Actually, outdated RAM prices go up! RAM for my Athlon 2500 is 50%+ more expensive than the latest RAM. When the market has moved on, economic forces (very well defined by economists) cause prices to increase. I don't know how RAM prices will vary across the gen.

Just out of curiosity, what economic forces are we talking about regarding the price increase of your older RAM?
 
My query up there was based on the thought of bolting texturing and other fixed-function graphics logic onto Cell and seeing what kind of tradeoff would have to be made, and how much could be squeezed onto a 200-250mm^2 die by, say, 2011... ultimately asking if this could be reasonably competitive, and if it'd be worth sticking two of those in a machine vs. a Cell + GPU.

Now that Larrabee seems to be in that kind of mould, I suppose one could think of it as a kind of template rather than existing GPUs.

Aside from potential paths of evolution for Cell, I think Larrabee itself could loom large over tech decisions for the next-gen systems.
 
Actually, outdated RAM prices go up! RAM for my Athlon 2500 is 50%+ more expensive than the latest RAM. When the market has moved on, economic forces (very well defined by economists) cause prices to increase. I don't know how RAM prices will vary across the gen. However, if you're talking $20 extra per unit at launch with a view to that dropping, that can amount to hundreds of millions in profit over the life of the console. Why were MS wanting to put only 256 MB into XB360? To save on the pennies, which all add up. A line has to be drawn where an extra few dollars of BOM is deemed too much, where a 20GB HDD is deemed better value at $5 a piece cheaper than a 60GB HDD, and $10 more RAM is $10 more than you'd like. RAM often seems one of the first things to go in a console, with consoles having significantly lower amounts than PCs of the time. If PCs are going to be 4-8 GB come next-gen...


Yes, but isn't that because they stop producing that RAM type in the numbers it was previously made at? For example, DDR RAM is no longer made to my knowledge, and that is why it has gone up. It's the same with slower grades of DDR RAM. Also, they no longer move the RAM to newer processes. However, I'm sure that a buyer like Microsoft is able to place large orders of the RAM at a time, and that's an incentive for the RAM provider to continue to decrease cost on that RAM. I'm sure whoever has the contract for GDDR3 700 RAM will continue to drive prices down. The first thing they could do is reduce the number of RAM chips needed by increasing the density of the RAM, correct? Then they can move on to a newer process, thus reducing the size of the RAM again. MS will most likely still need another 20M Xbox 360s at the very least. Sony, who is also using that RAM for part of their RAM requirements, will also need at least 20M more PS3s.




http://www.engadget.com/2008/08/05/micron-announces-insanely-quick-realssd-c200-ssds/

This interests me.
the new 2.5-inch (up to 256GB) laptop and 1.8-inch (32GB to 128GB) ultra-portable storage slabs offer a 3Gbps SATA interface and ridiculous 250MBps read and 100MBps write speeds

These will be out later this year. There is no price given, but it seems to me that SSDs are going to explode in size quickly. A 1.8-inch 128GB drive would not be a bad choice for an Xbox Next, though I would think 256GB would be more ideal for the base model. These drives are insanely fast, and at 1.8 inches they are extremely tiny. I wonder if these will be affordable for a console in 2011. The 20GB drive was what, 50-60 bucks, when the Xbox 360 launched in 2005? I wonder if they can get this to that price point by then. Ripping a game to the drive would amount to extremely quick load times.
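
A quick back-of-the-envelope on what those read speeds mean for load times (the 250MB/s is from the quote above; the 4GB of game data and the ~15MB/s figure for a 12x DVD drive are just my own rough assumptions):

Code:
# Rough load-time comparison, assuming 4 GB of game data needs to be read.
# 250 MB/s is the quoted SSD read speed; ~15 MB/s is a rough guess for a 12x DVD drive.
data_mb = 4 * 1024
ssd_mb_s = 250
dvd_mb_s = 15

print(f"SSD: {data_mb / ssd_mb_s:.0f} s")         # ~16 seconds
print(f"DVD: {data_mb / dvd_mb_s / 60:.1f} min")  # ~4.6 minutes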
 
However, I'm sure that a buyer like Microsoft is able to place large orders of the RAM at a time, and that's an incentive for the RAM provider to continue to decrease cost on that RAM.
If there's only one supplier who has landed the contract, not in competition with another, they are likely to make more profit using their smaller-process/highest-efficiency technology for higher-volume production, where profit margins are slim and you want to maximize every penny. Moving production to a smaller process has a cost, and you won't do it unless the returns are worth it. A niche product isn't likely to fit the bill. I also don't think RAM shrinks anything like processing transistors do, so where a console CPU can be reduced to a sixteenth of its starting size, the RAM won't go down so much.

And more importantly, the only reason we're interested in RAM cost reductions is for the choices the console companies have as to how much to put in. If you're providing RAM to MS for their console, you're only going to pursue cost reductions if that improves your profit margins. If you have to pass those cost reductions on to MS, why would you bother with them?! You could instead use the old, expensive lines to produce limited niche RAM at a higher price and still make use of that production capacity. So just as nVidia didn't pass on savings to MS over the NV2A, MS would have to wrangle some deals to make sure they were getting an outdated RAM tech at the best price long into the product lifecycle.

I'm not a RAM expert, know little about production costs, and can't say for sure, but RAM is certainly a considered cost, such that, as history shows us, the console manufacturers always skimp on it when they can.
 
I think where texture repetition is an issue (racing car tracks and grass sidings!) procedural texturing can replace it. I really think procedural all-sorts can replace a lot of stuff, it's just not happening this gen. It's funny how many amazing technologies exist that just aren't being used!


Far Cry 2 is making heavy use of procedural data generation. The skybox isn't a texture, but procedurally generated clouds. IGN has an article on all the procedural stuff going on in Far Cry 2.


GDC 2008: Far Cry 2's Gamble
Breaking the mold.

Dominic Guay, Technical Director of Ubisoft Montreal, recently gave a GDC speech on "Procedural Data Generation" in Far Cry 2. What is Procedural Data Generation? At a basic level Procedural Data Generation is a system that allows techniques and algorithms to automate the data of game content. This new system throws the old way of designing a texture in Photoshop by hand away. Why spend hours drawing a texture, when the computer can do it for you in milliseconds? … Essentially it's the human chess player vs. the computer. Except that in Ubisoft's case… the human always wins.

The first game to fully utilize procedurally generated content was a game developed in 1984 known as "Elite". The reason Elite used procedural data generation was the limited capability of computers to handle large amounts of prefabricated content. Therefore the game utilized random numerical data entries and random mathematical tables to calculate its game content on the fly.

There are a lot of things procedural data generation can do for level design. The majority of Far Cry 2's level design utilizes this system. The system allows for particle effects, wind, fire, clouds, etc. to all be automated on the fly through mathematical pipelines.


http://pc.ign.com/articles/854/854167p1.html
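
For anyone wondering what "procedural" looks like in practice: the classic trick for clouds is to sum a few octaves of smoothed noise (fBm) instead of storing a painted texture. Here's a toy sketch of the idea; purely illustrative, and not how the Dunia engine actually does it:

Code:
import math, random

# Toy value-noise cloud generator: sum a few octaves of smoothed random noise (fBm).
# Purely illustrative of a "procedural texture" -- not Far Cry 2's actual implementation.
random.seed(42)
LATTICE = 16
grid = [[random.random() for _ in range(LATTICE + 1)] for _ in range(LATTICE + 1)]

def smooth(t):                       # smoothstep for nicer interpolation
    return t * t * (3 - 2 * t)

def value_noise(x, y):               # bilinear interpolation of the random lattice
    xi, yi = int(x) % LATTICE, int(y) % LATTICE
    xf, yf = smooth(x - int(x)), smooth(y - int(y))
    a = grid[yi][xi] + (grid[yi][xi + 1] - grid[yi][xi]) * xf
    b = grid[yi + 1][xi] + (grid[yi + 1][xi + 1] - grid[yi + 1][xi]) * xf
    return a + (b - a) * yf

def cloud(x, y, octaves=4):          # fractal sum: each octave doubles frequency, halves amplitude
    total, amp, freq = 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amp * value_noise(x * freq, y * freq)
        amp, freq = amp * 0.5, freq * 2.0
    return total / (2 - 2 ** (1 - octaves))   # normalise to roughly 0..1

# Print a tiny ASCII "sky": denser characters = thicker cloud.
shades = " .:-=+*#"
for row in range(12):
    print("".join(shades[min(int(cloud(c * 0.15, row * 0.15) * len(shades)), len(shades) - 1)]
                  for c in range(48)))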
 
I'd like to ask two noob-ish questions, if I may:

1) Can I apples-to-apples compare a transistor in a GPU to a transistor in a CPU (assuming the same process?)

2) Is there a transistor breakdown of the 4870 chip anywhere? Like, how many transistors were spent on each SIMD core, on each 4-piece texture unit, etc.? Even decent estimates would be good.

I know the latter mightn't seem very relevant, but I'm going somewhere with it..

1) Yeah I think you can, each single transistor should take up the same amount of space on the same process... I think...

2) I think the best you could do would be to look at a die shot (they're around the interwebs), isolate the units in question, calculate the % of the die that each occupies, and then use that to figure out the # of transistors. But it'd be a very rough estimate. Transistor density is not uniform... cache is denser than logic, etc.
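
To make that concrete, the arithmetic is just proportional scaling from the published totals. The RV770 figures below (~956M transistors, ~256mm²) are public numbers, but the 6% area share for one SIMD core is a made-up placeholder you'd have to eyeball off a die photo yourself:

Code:
# Rough die-shot estimate: unit transistors ~= total transistors * (unit area / die area).
# RV770 totals (~956M transistors, ~256 mm^2) are public figures; the 6% area share for
# one SIMD core is a made-up placeholder you'd have to measure off a die photo yourself.
total_transistors = 956e6
die_area_mm2 = 256.0
unit_area_mm2 = 0.06 * die_area_mm2   # assumed: one SIMD core occupies ~6% of the die

estimate = total_transistors * (unit_area_mm2 / die_area_mm2)
print(f"~{estimate / 1e6:.0f}M transistors")  # crude: ignores cache vs. logic density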
 
I think question 1 is a bit misleading. In theory you'll have 1:1 transistor usage; I can't see why not. But AFAIK GPUs don't share the same manufacturing processes as CPUs, so the transistor budget will not be equal across chips.

I'm very far from being an expert though. This is just my vague feeling from what tiddly bits I think I've picked up over the years. :p
 
I have a question, and I apologize in advance for my lack of knowledge. It's my understanding that AMD is using Z-RAM in the future. Could this replace the eDRAM that the Xbox 360 uses (in a future Xbox Next console, of course) and offer more speed and larger amounts of RAM on the chip?
 
Larrabee, or a "Larrabee-like" solution, while still unproven, looks more and more attractive.
We can dismiss Intel themselves (no matter that they state again that they would like to enter the market).

I tried to estimate the kind of "computational density" MS could reach by taking the actual Xenon and turning it into something more Larrabee-like. I found that it could be good enough.
(I dismiss Sony as they are likely to have a Cell2 + a GPU.)
The GFlops per mm² ratio between the different architectures would look like this:
GPU > Larrabee > XenonII

I think that the reasons that could push console manufacturers to dismiss this kind of architecture are more likely R&D costs and software considerations than the potential value/prowess of the hardware itself.

Thus I have questions... :D

Could a manufacturer afford the R&D to design something like that on its own?
IBM is working on the POWER8 (an eight-core chip that should provide ~1/2 TFlops) and already has the Cell at its disposal; I'm not sure they need to offer something in the middle of those two. Thus I don't think that MS can expect IBM to join forces with them.

Say MS doesn't go that route: they will still have R&D costs associated with both the GPU and the CPU.
Could focusing only on the "CPU" balance out the R&D costs?

So R&D cost is my main concern; my second concern is software, and when I say software I mean both the tools and the like, and the games.
A lot of pressure would be put on the tools, especially for multiplatform games.
The design is unlikely to be the standard, and developing for discrete CPUs and GPUs will still be the norm.
MS would have to provide a layer close enough to the more common graphics pipeline for easy portability, and thin enough not to have too much performance overhead (the same challenge as Intel, but to a lesser extent, as Intel is facing the challengers head to head).


But clearly my main concern is R&D. Any opinions on this issue?
 
Why not just go with the Waternoose again? It's what, 170M transistors? That's at 90nm. I'm pretty sure we will be looking at 32nm when these things launch. So why not the 12-core monster they envisioned, and add back in OOE support, beef up the cache and each of the cores? They already have tools based around the design.
 
Why not just go with the Waternoose again? It's what, 170M transistors? That's at 90nm. I'm pretty sure we will be looking at 32nm when these things launch. So why not the 12-core monster they envisioned, and add back in OOE support, beef up the cache and each of the cores? They already have tools based around the design.
I think that if a company is to go with a "Larrabee wannabe" it will likely be used as both CPU and GPU; implementing OoO in a many-core design may lead to a chip that is too power hungry.

I think about using a modified Waternoose as a building block mainly because of cost (R&D cost), but there are other reasons. Also, I don't state it has to be that way; it was just a hypothesis that allowed me to roughly estimate the kind of "computational density" MS could reach.

By using the Xenon as a building block, MS would not have to start from the ground up.
The point is that for compatibility's sake MS would have to reach 3.2GHz; this way retrocompatibility would not be an issue. As MS is doing way better this gen, the 360 is likely to coexist for some time with its successor, so passing on BC would not be a good choice.

MS could go the same way as Intel and base its design on something old like a PPC 603/604.
They share similarities with the Pentium => simple, ~3 million transistors, short pipeline.
But it would prove difficult to reach 3.2GHz with such a short pipeline; for reference the PPC 604 (which is OoO, 4-issue, and ~3.6 million transistors) is a six-stage design.
MS could split each stage in two and end up with a 12-stage pipeline more likely to reach high frequencies. It's possible this might yield better results, but it would mean more work, and MS already has quite some work to do. That's why, for the sake of my hypothesis, I used the Waternoose.

The Waternoose is a long-pipeline design that is not that power hungry (more than the Cell, but nothing bothersome); running modified Xenons @3.2GHz should be less of a challenge.

As I state, MS would have quite some work on the table:
They would need a new instruction set for the VPU.
They would have to design a really good VPU; to give the design a chance they should aim at the kind of density STI reached with the SPU (1.4x the number of transistors per mm² compared to the PPU/PX). Thus it would be a huge engineering effort.

Then come the caches; once again, a huge effort here:
better latencies, good prefetching capability, determining in which way the L2 cache(s) is shared among the different cores, implementing the same tricks as Intel (some new instructions with regard to the L2 cache, a read-only policy within the different "non-shared" L2s, no matter how it is shared or what the layout is). Once again, density would also be a concern.

Then come the fixed-function units (texture samplers) and how they are shared among the different cores.

Here is a gross estimation based on Cell data (areas in mm²):
            SPE       PPE      Xenon
90 nm      14.76     26.86      168
65 nm      11.08     19.6       122
45 nm       6.47     11.32       70
Then the gross estimation:
@45nm
Xenon x4
12 cores, L2 + 4MB
~300GFlops
~280mm²

XenonII x4
~1200GFlops
~320mm² ---> (increase due to a bigger VPU, texture samplers, and a lot more registers (SMT x4 + 512-bit-wide VPU))
@45nm one XenonII => 80mm²

As it stands, MS would not lag that much behind the GPU manufacturers. Say @32nm they manage to pack 60% more transistors per mm²; we get this:
@32nm a XenonII would be 48mm² and worth 300GFlops.

It's likely to be less, as MS would focus on density. I feel like 40mm² (@45nm) for a wider VPU + a bunch of registers + some fixed-function units is a healthy estimate if we look at the kind of density ATI achieves, for example.

I would say that depending on the engineering effort and how good the process is, MS could land anywhere between 35 and 45mm² for the reworked Xenon.

If we consider 40mm² = 300GFlops, MS would have legs to design something worthwhile.
For example, depending on the silicon budget, power consumption/thermal dissipation (and obviously the retail price they aim at for launch), they could end up with these kinds of designs:

One-chip systems:
8 XenonII (24 cores)
L2 cache 8MB
~320mm²
2.4TFlops @3.2GHz

6 XenonII (18 cores)
L2 cache 6MB
~240mm²
2TFlops @3.6GHz

A two-chip design could look like this:
2x 4 XenonII (24 cores)
2x 160mm²
2x 1.5TFlops @4GHz

2x 6 XenonII (36 cores)
2x 240mm²
2x 1.8TFlops @3.2GHz

These examples are somewhat arbitrary; they're just to show that MS could have ways to adapt to meet its performance goals.
My most optimistic prevision would be that MS manages to get the XenonII down to around 35mm² or less (a huge effort on density + a good process) with a good TDP, and then I would go with a one-chip design:
10 XenonII (30 cores)
L2 cache 10MB
350mm² or slightly less
3TFlops @3.2GHz or higher
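
For anyone wanting to sanity-check those figures, the arithmetic I'm assuming behind them is just peak GFlops = cores x clock (GHz) x FLOPs per cycle per core, with 8 FLOPs/cycle for the current 128-bit VMX (4-wide multiply-add) and 32 FLOPs/cycle for the hypothetical 512-bit VPU:

Code:
# Theoretical-peak check for the configurations above (my assumptions, not official numbers):
# 2 flops (multiply-add) per SIMD lane per cycle -> 8/cycle for 128-bit VMX,
# 32/cycle for the hypothetical 512-bit VPU.
def peak_gflops(cores, ghz, flops_per_cycle):
    return cores * ghz * flops_per_cycle

print(peak_gflops(12, 3.2, 8))    # "Xenon x4", 12 current cores:      ~307 GFlops
print(peak_gflops(12, 3.2, 32))   # "XenonII x4", 12 cores @ 3.2 GHz: ~1229 GFlops
print(peak_gflops(24, 3.2, 32))   # 8 XenonII, 24 cores @ 3.2 GHz:    ~2458 GFlops (~2.4 TFlops)
print(peak_gflops(30, 3.2, 32))   # 10 XenonII, 30 cores @ 3.2 GHz:   ~3072 GFlops (~3 TFlops)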

EDIT: those are rough numbers for the sake of discussion.
The real question remains: can MS afford the R&D behind such a project?
 
I think that if a company is to go with a "Larrabee wannabe" it will likely be used as both CPU and GPU; implementing OoO in a many-core design may lead to a chip that is too power hungry.

Do you really think that a Larrabee wannabe would compete with an updated Waternoose and a dedicated GPU by ATI? I doubt it.

By using the Xenon as a building block, MS would not have to start from the ground up.
The point is that for compatibility's sake MS would have to reach 3.2GHz; this way retrocompatibility would not be an issue. As MS is doing way better this gen, the 360 is likely to coexist for some time with its successor, so passing on BC would not be a good choice.
3.2GHz should not be hard to do at 32nm with a 12-core chip. The Waternoose will be absolutely tiny at 32nm; they could easily just add more of the old cores to it.

The Waternoose is a long-pipeline design that is not that power hungry (more than the Cell, but nothing bothersome); running modified Xenons @3.2GHz should be less of a challenge.
Waternoose was originally envisioned as an OOE core; I believe it was supposed to be a 12-core OOE design, according to that book about the 360.

As I state, MS would have quite some work on the table:
They would need a new instruction set for the VPU.
They would have to design a really good VPU; to give the design a chance they should aim at the kind of density STI reached with the SPU (1.4x the number of transistors per mm² compared to the PPU/PX). Thus it would be a huge engineering effort.

VPU? Why not continue going in the direction of the original Waternoose and just have more PPUs? Developers are already used to the design. Just beef them up some more and fix as many problems as developers might feel they had with it.

Depending on the size, and I don't know about the maths behind it, I was thinking a 24-core CPU with 8MB of L2 cache and a sizable amount of L3 cache.

Then go with a DX10, or DX11 by that point, GPU, which would be larger than the CPU, as it was this generation. Then perhaps 32MB of eDRAM, or a newer version of that RAM.
 
Do you really think that a Larrabee wannabe would compete with an updated Waternoose and a dedicated GPU by ATI? I doubt it.
I will just answer this part.
I've really no clue, and so far, as I stated in my posts, "Larrabee architectures sound more and more appealing while still unproven" (or something like that).
Basically the main question is still how good these architectures will be at hiding texturing latencies. Until the first Larrabee samples and real-world benchmarks, we're all just speculating.
I have an interest in Larrabee mostly because this architecture opens a new world of possibilities.

In regard to your hypothesis, it's likely to be a safer approach.
A CPU + GPU, as I say, is likely to be the standard environment even by the time the next systems launch.
For the CPU, it will take a lot of effort to make a standard CPU scale well to 12 cores, and to some extent why not push further and turn it into something Larrabee-like (obviously only if Larrabee performs properly, and by properly I don't mean it has to be best in class).

For the GPU, Nvidia and ATI do great things, that's for sure; they're likely to offer the most Flops per mm². But if you look at my gross estimates you will see that a Larrabee would not lag that far behind what GPU manufacturers offer. And it's interesting to consider that, say, an HD4850 is already too power hungry and hot to go in a console.
The point is, if one were to build a system now, it could use an HD4850 for example, BUT it would have to be clocked lower, thus negating some of the raw power advantage.

Going with one big (not enormous) Larrabee wannabe has some advantages:
cost: one chip, simpler mobo.
Also you save bandwidth between the CPU and the GPU.
In fact I don't think that you would need more "CPU" power than a Xenon, as many tasks would be spread among a lot of cores. And you can do a lot of things in a cleverer way and save a lot of resources (you should read, if you still haven't, some of the Siggraph presentations ;) ).

Then it comes down to silicon budget; you can't compare a CPU+GPU that would take, say, 450mm² with a single chip of ~350mm², especially as the mobo would be more complex.

I won't go further; as I said, the concept is still unproven. If successful, it's a whole different story.

EDIT
I think more and more that Rapso had a point and that raw power is not that relevant, as long as you don't lag too far behind in raw power.
EDIT
By VPU I basically mean reworked AltiVec units, as I'm not sure I understand your point about this part.
 
I'd expect a quad-core out-of-order CPU to be enough; beyond that, if you have some massively parallel stuff to run, you can do it on the GPU.
The CPU+GPU can either be separate chips or a single chip, should they go for a custom AMD Fusion for instance.

So my guess is an AMD Fusion with 2GB of fast 128-bit GDDR5; that would allow a not-too-expensive system board that scales nicely with the next process. Simpler than the three-chip X360, and than the PS3 with its two RAM types and three architectures (PPC, SPE and G7x).
 
I'd expect a quad-core out-of-order CPU to be enough; beyond that, if you have some massively parallel stuff to run, you can do it on the GPU.
The CPU+GPU can either be separate chips or a single chip, should they go for a custom AMD Fusion for instance.

So my guess is an AMD Fusion with 2GB of fast 128-bit GDDR5; that would allow a not-too-expensive system board that scales nicely with the next process. Simpler than the three-chip X360, and than the PS3 with its two RAM types and three architectures (PPC, SPE and G7x).

A quad-core what, exactly... as compared to an evolved Larrabee or a 32-SPU Cell or whatever MS is cooking?

I would not hedge any bets on anything that takes GPU processing time away from putting more and prettier pixels on screen.

Anyhow, here's my bet on what the true Cell2 will look like, possibly first appearing in the PS4.

I suspect Cell2 will offer traditional memory management via memory-coherent caching logic. I don't expect the caching logic will be replicated for each SPU. Instead, the equivalent of caching "units" will sit around the plurality of ring buses in Cell2.

These cache units will service the memory requests of the SPUs which share the same spatial context with them on a grid. I'll refer to this spatial context as an "oct-drant", to signify 8 SPUs being placed around a single MFC in a local area of the chip die.

In this way MFCs do not map 1:1 with SPUs; instead they map 1 MFC : 8 SPUs. The DMA MMUs found within the MFCs of Cell1 are replaced by caching logic, or rather a caching "unit", which services the 8 SPUs within an oct-drant in Cell2. The caching unit is the MMU (I know I just went in a circle... I just want to stress it's not the same unit in Cell2 as there presently is in Cell1). The MMU only handles addressing.

Each MFC in Cell2 has a single DMA control unit. There is no need for separate SPE and PPE queues. There is now a single prioritized queue the SPEs and PPEs share.

Each MFC now has "data-movement" units. These are 1 : 1 with SPUs. They only flip bits in memory cells.

Each SPU has a local cache. A single oct-ported RAM space shared by all SPUs in an oct-drant seems unrealistic. Separate SRAM caches are much more likely to be employed in order to maintain the "relatively" massive bandwidth Cell chips enjoy. This would NOT mean an SPE is granted exclusive access to a cache UNLESS cache lines are locked along the physical boundaries of the SRAM pool within an SPU.

Each cache pool in an oct-drant sits on a ring bus exclusive to that oct-drant. Each of these exclusive "oct-drant" ring buses sit on another ring bus which connects all oct-drants and the PPU cores together so that all cores can communicate.

Each MFC has an Atomic unit which handles synchronization.

MMUs are decoupled from the "bit-flippers" so that they can work asynchronously on translating the memory requests the 8 SPEs generate. Each bit-flipper handles I/O jobs as they come. DMA control units re-order their queues according to priority. Control units serve the highest-priority request to MMUs for translation. SPEs and PPEs fill the DMA control units' queues with requests.

MMUs can see all system memory including that within other oct-drants a particular MMU does not belong to. A single process/task/thread/whatever may see and utilize the entirety of cache memory on chip be it running on an SPE or PPE.

The goals have been: 1) bring transparent memory management to Cell; 2) reduce the hit to the transistor budget that meeting goal 1 will unavoidably incur; 3) do not destroy or nerf the good things about the CBEA in the process; 4) try not to re-invent the wheel wherever possible.
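
Just to check I'm reading the topology right, here's how I'd sketch the mapping you describe: 1 MFC per oct-drant of 8 SPUs, a single prioritised DMA queue shared by SPEs and PPEs, per-SPU data movers, and one caching unit/MMU doing the address translation. The code structure is only my interpretation of your post, nothing more:

Code:
import heapq
from dataclasses import dataclass, field

# A toy model of the proposed layout: 1 MFC per "oct-drant" of 8 SPUs, with a single
# prioritised DMA queue shared by SPEs and PPEs, per-SPU data movers, and one MMU
# (the caching unit) doing all address translation. Interpretation only, not canon.

@dataclass(order=True)
class DmaRequest:
    priority: int                              # lower value = served first
    requester: str = field(compare=False)      # "SPE3", "PPE0", ...
    effective_addr: int = field(compare=False)

@dataclass
class MFC:
    oct_drant_id: int
    queue: list = field(default_factory=list)  # single shared prioritised queue
    data_movers: int = 8                       # 1:1 with the 8 SPUs in the oct-drant

    def submit(self, req: DmaRequest):
        heapq.heappush(self.queue, req)

    def service_one(self):
        req = heapq.heappop(self.queue)        # highest-priority request first
        real_addr = self.translate(req.effective_addr)
        return req.requester, real_addr        # a data mover would then flip the bits

    def translate(self, ea: int) -> int:       # stand-in for the caching-unit MMU
        return ea & 0xFFFFFFFF

mfc = MFC(oct_drant_id=0)
mfc.submit(DmaRequest(priority=5, requester="SPE3", effective_addr=0x1000))
mfc.submit(DmaRequest(priority=1, requester="PPE0", effective_addr=0x2000))
print(mfc.service_one())   # ('PPE0', 8192) -- the PPE request wins on priority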

Thoughts?
 
A quad-core what, exactly... as compared to an evolved Larrabee or a 32-SPU Cell or whatever MS is cooking?

I would not hedge any bets on anything that takes GPU processing time away from putting more and prettier pixels on screen.

I mean a traditional CPU as on the X360 and on PC; this would be for the next MS console. No doubt Sony may prefer to build on their Cell, but MS probably wants to keep similarity with the X360 and the PC.
Sure, GPGPU takes power from the GPU and is less efficient, but the deal may be: would you prefer 1 teraflop of SPE or Larrabee-like units plus 1 teraflop of GPU, or just 2 teraflops of GPU? SPEs aren't the right place to do pixel shading either.

On the consoles you might be able to do some amount of GPGPU, with an advantage over the PC: you know what budget you're using, and you may enjoy lower overhead.

Of course it's baseless speculation. Do we at least know which CPU vendor MS will use for the next console? IBM, AMD or even Intel?
This time AMD could be able to supply parts for a console (have TSMC build them). We could also imagine Intel making a big-ass GPU from their IGP tech :D. Wouldn't a Sandy Bridge make a nice replacement for Xenon?
 
I mean a traditional CPU as on the X360 and on PC; this would be for the next MS console. No doubt Sony may prefer to build on their Cell, but MS probably wants to keep similarity with the X360 and the PC.
Sure, GPGPU takes power from the GPU and is less efficient, but the deal may be: would you prefer 1 teraflop of SPE or Larrabee-like units plus 1 teraflop of GPU, or just 2 teraflops of GPU? SPEs aren't the right place to do pixel shading either.

On the consoles you might be able to do some amount of GPGPU, with an advantage over the PC: you know what budget you're using, and you may enjoy lower overhead.

Of course it's baseless speculation. Do we at least know which CPU vendor MS will use for the next console? IBM, AMD or even Intel?
This time AMD could be able to supply parts for a console (have TSMC build them). We could also imagine Intel making a big-ass GPU from their IGP tech :D. Wouldn't a Sandy Bridge make a nice replacement for Xenon?

I think you'll find most developers would prefer to still have a CPU and a GPU instead of running the risk of not being able to map algorithms efficiently to a GPU alone. I think you'll also find most developers would prefer the GPU spend its time running graphics-related tasks when there's a monstrously capable CPU also in the system to handle most of the tasks one would target with GPGPU, regardless of whether you're talking about a Sony or an MS machine.
 