Predict: The Next Generation Console Tech

The 360 reveal was at E3 (well, just before) for a launch the same year. Why would MS want to tank their 360 sales for three quarters if they didn't have to? It makes no sense to reveal new hardware with a long lead time: marketing-wise it's a nightmare, sales-wise it's unwise, and historically they haven't done it.

I'm right with you, but many gamers are still thinking in terms of old-school (their golden age) marketing, where a product is announced a year before launch… That's actually the worst marketing move you can make. When you're announcing a mass-market tech product, you have to launch quickly afterwards to profit from the buzz wave.
 
Yes and no. I don't think the problem is so much putting a fast GPU with a CPU, it's memory bandwidth. Even fast DDR3-2133 isn't cutting it; it's only about 17 GB/s per 64-bit channel, and I think the APUs only support dual channel as well, but even four channels wouldn't be enough. And it's not like you can buy off-the-shelf GDDR5 modules, so AMD is kind of stuck. I think embedded RAM will be used down the road.
I would say more "no" than yes. I mean, Trinity is clearly bandwidth limited, as performance scales almost linearly with the RAM clock speed; there is no disputing that.
But that has nothing to do with the GPU architecture. I mean, later GCN-based GPUs would make better use of the available bandwidth, even though it would remain a bottleneck.
I would think the issue is that the GPU and CPU don't use the same process, and AMD lacks the manpower to implement its GPU architectures on two processes at the same time.

OT: I think that if AMD were more serious / executing more properly on their APUs, they should have considered soldering some GDDR5 onto every mobo supporting those APUs. I mean, they invest half the die on the GPU in both Llano and Trinity, which is more GPU power than almost everybody who doesn't game needs, way more. At the same time they did not provide enough bandwidth to feed those GPUs. I don't get it, as the GPU is the selling point of those chips; CPU performance is not.
It would not be that crazy or costly to actually remove one channel of DDR3 on those chips and replace it with a GDDR5 memory controller, two actually. CPU performance would suffer a tad more, but those platforms would become really viable gaming platforms for budget players (in desktop form) and would prove outstanding in laptop form.
/OT
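
As a rough sanity check on the bandwidth numbers above (the GDDR5 data rate below is just an assumption of mine, picked to illustrate the gap):

def peak_bw_gb_s(transfer_rate_mt_s, bus_width_bits):
    # peak bandwidth = transfers/s * bytes per transfer
    return transfer_rate_mt_s * 1e6 * (bus_width_bits / 8) / 1e9

print(peak_bw_gb_s(2133, 64))    # one 64-bit DDR3-2133 channel -> ~17.1 GB/s
print(peak_bw_gb_s(2133, 128))   # dual channel -> ~34.1 GB/s
print(peak_bw_gb_s(5000, 64))    # two x32 GDDR5 chips at an assumed 5 Gbps/pin -> ~40 GB/s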
 
I'm right with you, but many gamers are still thinking in terms of old-school (their golden age) marketing, where a product is announced a year before launch… That's actually the worst marketing move you can make. When you're announcing a mass-market tech product, you have to launch quickly afterwards to profit from the buzz wave.

But it's not old-school. The PS3 was announced over a year before launch. The PSV was announced 11 months before it shipped. The Wii U was announced 1.5 years before it shipped. The 3DS was announced almost a year before it shipped. Neither the PS3 nor the 3DS killed the sales of the PS2 or the DS. The PSV has its own issues. I think it's debatable whether the Wii U or a complete lack of games killed off the Wii.

I think anyone really paying attention expects a MS unveiling this year and a launch in 2013. Consumers aren't that dumb.
 
I want to go back to my view of Sony's situation.
As Gipsel said (and I was actually making the same point pages ago), it would be doable to have a SoC combining Jaguar cores and a quite decent GPU within a reasonably sized chip (translating into reasonable production costs). GDDR5 is expensive and sets a strong limitation on the amount of RAM; actually, even if you were to overcome those limitations, by doubling the amount of RAM you would double the cost of something already expensive.
Overall one would have a strong incentive to use some form of embedded memory, most likely eDRAM.

The thing is that I am still concerned about the sustained performance of, say, a quad-core Jaguar set-up. Not in isolation, but in comparison to the rumors we hear from MSFT.
MSFT seems to have spent quite an amount of silicon on its CPU; no matter how you spin the rumors (AMD, IBM), they all point to a lot of CPU resources.
Actually they point to an amount of CPU resources I'm not confident a quad-core Jaguar would match, especially taking into account the clock speed limitations of the architecture.
The other way around is for Sony to use Piledriver cores, but that means the SoC alone is unlikely to provide enough GPU power. That means more cost, higher power consumption, etc.
Looking at Sony's situation, I can't see that as a good thing.

So whereas I'm confident that Sony will not go that way, I really wonder if a dual-SoC set-up would be the most price-efficient solution. It's more a theoretical discussion than anything else.
We would be speaking about a relatively tiny SoC, without that much raw power on either the CPU or the GPU side, but the thing would be as balanced as possible, pretty cool / low power, and cheap to produce. More than that, it would require designing, producing, testing and so on only one chip. The design as a whole would rely on nothing but readily available and cheap RAM.
With the help of AMD engineers they could create something really balanced. I would not describe Kaveri as balanced, for example; the GPU seems clearly oversized with respect to the available bandwidth. The extra shading power will still be helpful when shading power proves to be the bottleneck, but overall I would still say that a lot of that power will go to waste versus the same amount of silicon (or even less) invested in a discrete GPU provided with a sane (for a GPU) amount of bandwidth.

Overall I do get the criticism that such a system would be nothing to wow at, or that the aforementioned bigger SoC would actually be a better product, but I wonder.

What I wonder is whether, contrary to what most people think (and I thought too), AFR (which I thought could be a good starting point for devs, moving later on to possibly better uses of the dual-GPU set-up) could indeed be a really price-efficient approach.
Actually I pushed it a bit further and wondered whether, to make the dual SoC even more tempting while resolving some of the concerns one could raise about the sustained CPU performance of a quad-core Jaguar set-up, the idea of alternate rendering could be extended to a lot more than rendering alone in such a set-up. It then turns more into a CPU and programming issue.
I could see some synchronization issues: say you run two physics simulations, two AI simulations, etc., you don't want at some point to be locked because you don't have the results of the previous frame / game-world update.
I can't tell if that's doable; I would say it is, as games already deal with network lag, and so I would assume that game-world simulation accounts for some "iffy" states (like you surviving for a couple of frames a bullet that killed you a couple of frames ago, while in the meantime you manage to shoot a bullet that magically disappears from the game world).

I can see AFR and running the same software subsystems twice having an impact on memory usage, but that's actually the beauty of such a trade-off: memory is dirt cheap. Going with DDR3/4 should ensure that at minimal cost you can buy plenty of RAM, so wasting some of it could prove a non-issue. Overall you would have what one could rightly describe as a sucky SoC, but each one would mostly be working on rendering (CPU and GPU) at 15 fps. If you stick to AFR for the GPU and are OK with the replication of data in both pools (textures, render targets, frame buffers, etc., but also some databases), you don't need any specific link between the two SoCs as far as the GPU is concerned, right?
If it is only a matter of keeping the RAM coherent and having the CPUs working together, one or two HyperTransport links could be enough.

What would you think of such a trade-off? (A lot of duplicated data, but each of your two SoCs gets twice the time per frame that the bigger, more powerful competitor system has... I wonder if my wording is clear.)
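
To make the idea a bit more concrete, here is a toy sketch of the scheduling I have in mind; the SoC/frame model and timings are completely made up, it is just to show each SoC getting two frame intervals per frame and only handing finished frames over:

import threading, queue, time

FRAME_INTERVAL = 1.0 / 30              # 30 fps overall -> each SoC outputs at 15 fps
finished = queue.PriorityQueue()       # finished frames, ordered by frame index

def soc(soc_id, num_frames):
    # Each SoC keeps its own copy of textures, game-world state, etc. (the
    # duplicated data mentioned above), so only the finished frame crosses over.
    for frame in range(soc_id, num_frames, 2):      # SoC 0: even frames, SoC 1: odd
        time.sleep(FRAME_INTERVAL * 1.8)            # "CPU + GPU work", up to ~2 intervals
        finished.put((frame, f"frame {frame} rendered by SoC {soc_id}"))
        # Real code would reconcile the two game-world copies here (the "iffy
        # states" problem), much like netcode reconciles lagged clients.

workers = [threading.Thread(target=soc, args=(i, 8)) for i in (0, 1)]
for w in workers: w.start()
for w in workers: w.join()
while not finished.empty():
    print(finished.get()[1])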
 
Thinking about it more, I think you're right. With no L3 cache and no memory coherency between clusters, the design is much simpler. With hardware coherency, you'd have a lot of hardware for the address checking among all the caches. Shifting the burden to the programmer is probably a better solution for a power/area-efficient design. Even with 16 cores, it would be a really small design, probably sub ~100 mm².

Yeah, devs really loved it when Cell shifted the burden to the programmers! :D
 
I wish that were true: a POWER8 or a custom POWER7+ on IBM's 32nm process.
Actually the overall design of the 360 proved to be quite successful; the only drawback was the eDRAM, which remained unchanged at 10MB when MSFT decided to move to 720p/HD as the target resolution for the system.

It could be achievable to deliver the jump in performance people want while actually possibly lowering the silicon budget a tad vs the 360.
I'm not sure how big a quad-core POWER7+ with a sane amount of L3 (not 10MB per core) would be vs Xenon. Looking at a POWER7+ die shot like this one, the cores indeed look pretty tiny.
Then you have the massive L3 interconnect, the SMP links, memory controllers, accelerators, etc.

I would think that a "ROP-less" HD 7850 (16 CUs native) with, let's say, three memory channels / a 192-bit bus to DDR3 would end up in the same ballpark as Xenon. Pitcairn is 202 mm² and the CUs seem to take a lot of that space (AnandTech, among others, has die shots available here).
In a "native" HD 7850 (as I describe it), that is 20% less space allocated to the CUs, no ROPs, and 25% less space taken by the memory controllers. I would not be too surprised if the chip ended up at least 20% smaller than Pitcairn; that would be a max estimate of 200 mm² minus 40 mm², so 160 mm².

That leaves the smart eDRAM; using IBM's 32nm process, I would not expect 64MB to be that big.

A 16-core CPU with 512-bit-wide vectors could happen as well.

64 MB of eDRAM seems plausible for the GPU. 1080p @ 60Hz.
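
Quick framebuffer math on why 64MB works for 1080p (the formats and AA level are my assumptions):

def target_mb(w, h, bytes_per_pixel, samples=1):
    return w * h * bytes_per_pixel * samples / (1024 * 1024)

no_aa  = target_mb(1920, 1080, 4) + target_mb(1920, 1080, 4)        # color + Z, ~15.8 MB
msaa4x = target_mb(1920, 1080, 4, 4) + target_mb(1920, 1080, 4, 4)  # ~63.3 MB, just fits in 64MB
print(no_aa, msaa4x)
# For reference, 720p color + Z with 2xAA is ~14 MB, which is why the 360's 10MB needed tiling.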
 
Actually, most of the credible rumours about the current specs (and not the ones mentioning PowerPC/Intel cores that MS might have been considering at some point or other) indicate an AMD CPU with a large number of cores (8), lots of RAM (8 GB) and a modestly powerful, modern AMD GPU (around 2 TF) with embedded eDRAM.

What credible Xbox rumor has an AMD GPU around 2 TF with embedded eDRAM? Link?
 
There's still next gen interposer talk floating around:

Yole is forecasting that IBM's Power 8 chip, Intel's Haswell, and the Sony PS4 will all be based on 2.5D interposer technology. [see IFTLE 88: Apple TSV Interposer rumors; Betting the Ranch; TSV for Sony PS-4; Top Chip Fabricators in Last 25 Years] The Sony GPU + memory device may look something like the Global Foundries demonstrator shown...

http://www.electroiq.com/blogs/insi.../11/iftle-121-semicon-taiwan-2012-part-2.html

FYI, Yole Development is a semiconductor industry market researcher.
 
What credible Xbox rumor has an AMD GPU around 2 TF with embedded eDRAM? Link?

Well, the dev kit pic was narrowed down to either a 6870 or a 6950, and bgassasin said Durango would sport eSRAM. Additionally, any 8GB RAM rumor, which is most of them, is pretty much going to require eDRAM in the system.
 
Well, the dev kit pic was narrowed down to either a 6870 or a 6950, and bgassasin said Durango would sport eSRAM. Additionally, any 8GB RAM rumor, which is most of them, is pretty much going to require eDRAM in the system.

Exactly, the eDRAM was a huge advantage the 360 had this gen, so it's unlikely they're going to ditch it (especially since they're going with a large amount of slower DDR3/4 memory).

The vgleaks specs also indicate that the final hardware will have eSRAM (it isn't present in the alpha kits, though).
http://www.vgleaks.com/whats-inside-durangos-alpha-kit/
 
http://www.anandtech.com/show/6465/nintendo-wii-u-teardown

Anandtech found that the RAM in the Wii U is DDR3 with a peak speed of 12.8 GB/s.
They also assume that the GPU is an AMD RV740 on a 40nm process.

They only say it's the same size (the RV740 was 40nm; the RV730 and RV770 were 55nm), but isn't the die shared with additional functions?
Anyway, the RV740 had exclusive memory at over 60GB/s; I suppose that would be too much for the Wii U. Well, obviously there is the eDRAM, but... or maybe the clock is really low.

Something close to half an RV740 would make more sense!?
 
Exactly, the eDRAM was a huge advantage the 360 had this gen, so it's unlikely they're going to ditch it (especially since they're going with a large amount of slower DDR3/4 memory).
http://www.vgleaks.com/whats-inside-durangos-alpha-kit/
That's not the engineer's way of thinking. eDRAM gave PS2 its huge advantage last gen, but it still got dropped. There's more than one way to solve the issues of BW, and engineers will evaluate them all on their pros and cons and pick the appropriate choice for their targets irrespective of any history (save where the differences between memory systems aren't that great and they would favour existing experience)

They only say it's the same size (the RV740 was 40nm; the RV730 and RV770 were 55nm), but isn't the die shared with additional functions?
Anyway, the RV740 had exclusive memory at over 60GB/s; I suppose that would be too much for the Wii U. Well, obviously there is the eDRAM, but... or maybe the clock is really low.

Something close to half an RV740 would make more sense!?
This is the thread for predicting next-gen hardware. Wii U is now out and doesn't need predicting; discovering what's in it can go in the Wii U GPU thread! ;)
 
Can dedicated hardware be used in a console for anti-aliasing-type functions?

The ROPs are already such hardware, as long as you're doing good old MSAA.
The Radeon 2000/3000 series moved some of the work to the shaders, but it was only slower than the GeForce 8 and Radeon 4000 series, with no gain.

With newer techniques such as FXAA, TXAA etc. I don't know exactly what happens, but the point is to try to do smarter things so as to do less work, save bandwidth, or have a more predictable performance hit. They're often worse than MSAA, but they deal with aliasing that MSAA doesn't deal with, so I can't complain.
If there's a post-processing step, that definitely belongs on the shaders; the rest is probably done on the shaders as well.

So things should probably stay the way they are: new AA techniques work because algorithms can be invented and tuned at whim, and if you have a lot of bandwidth (eDRAM or another memory technology) you can also do the brute-force MSAA thing on the ROPs.
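
For what it's worth, here is a very crude sketch of the post-process idea (detect high-contrast edges from luma, then blend along them); it is nothing like real FXAA/TXAA, just to show why this kind of filter is plain shader/ALU work rather than ROP work:

import numpy as np

def cheap_post_aa(rgb):                      # rgb: float array, shape (H, W, 3), values 0..1
    luma = rgb @ np.array([0.299, 0.587, 0.114])
    # Horizontal and vertical luma contrast around each pixel.
    gx = np.abs(np.roll(luma, -1, axis=1) - np.roll(luma, 1, axis=1))
    gy = np.abs(np.roll(luma, -1, axis=0) - np.roll(luma, 1, axis=0))
    edge = np.clip((gx + gy) * 4.0, 0.0, 1.0)[..., None]   # edge strength per pixel
    # 4-neighbour box blur, blended in only where edges were detected.
    blurred = (np.roll(rgb, 1, 0) + np.roll(rgb, -1, 0) +
               np.roll(rgb, 1, 1) + np.roll(rgb, -1, 1)) * 0.25
    return rgb * (1 - edge) + blurred * edge

frame = np.random.rand(1080, 1920, 3).astype(np.float32)   # stand-in frame
print(cheap_post_aa(frame).shape)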
 
That'd mean 8 chips on a 256-bit bus would yield about 4GB of RAM. Would that be a viable option? I wonder about the tradeoffs, for instance in terms of cost, especially if GDDR5 is quickly replaced by stacked memory for size/cost/power reasons. If they release a console towards the end of 2013, and in 2014 or 2015 major graphics cards transition away from that memory standard, it could get quite expensive, couldn't it?

How do the pin counts for GDDR5 compare to DDR4? I understand that GDDR5 has 30% more pins than DDR3 for a given bus width, IIRC, so a 256-bit GDDR5 bus would be equivalent to a ~333-bit DDR3 bus, right?
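
Just working through the arithmetic in both posts (the chip density is an assumption, and the 30% figure is the one quoted above):

chips       = 256 // 32          # GDDR5 devices are x32, so 8 chips fill a 256-bit bus
gb_per_chip = 0.5                # assuming 4Gbit devices
print(chips, "chips ->", chips * gb_per_chip, "GB")          # 4 GB (8 GB in clamshell mode)

pin_overhead = 1.30              # "~30% more pins per bit of width than DDR3", as stated above
print("256-bit GDDR5 is roughly", round(256 * pin_overhead), "bits of DDR3 in pin terms")   # ~333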
 
4G/5T
6G/8T

Based on the Xboxworld article, 4G/5T can't be the number of threads. And the 8T in 6G/8T probably refers to the amount of memory.

4 teraflops GPU or Graphics / 5 teraflops Total (GPU + CPU)
6 gigabytes Games / 8 gigabytes Total
 