Predict: The Next Generation Console Tech

I'd be so bold as to say it would be in MS's best interest to repackage the 360 as a different console with Natal included, and release that as competition for the next Wii while still releasing a new console.

I think we can all assume that the next Wii is very likely only going to be as powerful as the current 360. By the time the next Xbox is ready for deployment, the cost of producing the 360 in a new form factor should allow a price very, VERY competitive with whatever Nintendo plans on putting out for the next Wii. They could still release Natal as an "add-on" for current 360 owners, who would then have the same system as the Wii HD or XboxNatal but also be potential customers for the next Xbox as well.

If MS's competition in 4 years is the PS4 and Wii HD, then why not take advantage of one competitor's low power requirements and undercut them with a proven library, a better price and potentially better controls and graphics?

Nintendo succeeded with the Wii because nobody saw it coming; however, both Sony and MS are potentially sitting on technology that is going to be as strong as, if not superior to, the next version of the Wii, and I think one of them should take advantage of it. MS, with slightly older hardware (reduced cost), has the best chance of repackaging the 360 as a new motion-based system.

MS would be fighting for 2 demographics but could do well enough in both to warrant the risk, because I believe the payoff would be huge. For the new XboxNatal they would need to find developers and create new IP that targets that demographic, while many of the existing developers could now create games for both the Xbox720 and the XboxNatal.

Just a thought
 
Nintendo is all about inventing something that makes it possible to create completely new game genres, or at least significantly change existing ones. If they believe that Sony and MS have moved too close to them with their motion controllers, then they're gonna need to add something new to the mix once again in order to avoid playing on their field. The reason behind the Wii's success is mostly that there's been nothing even remotely like it on the market and they were able to offer a new and unique gameplay experience, even if it wasn't too complex in most cases.

Although it also has to be noted that the Wii has given them the financial power to take the fight to Sony and MS if they want to - challenge them in hardware speed and game budgets and so on - but that is not their way as long as Miyamoto has an influence. Then again, I'm sure he'd be happy to make Mario and Zelda games with Pixar-movie-like graphics, so maybe they're gonna push for an advanced console. Mario Galaxy has already been a step in that direction; imagine how that game could look even on the current HD consoles...

Edit: and don't forget an important financial issue: if your hardware is on par with that of the others, you become viable for straight multiplatform development. Even if Nintendo itself doesn't want to produce hardcore titles, if their console is powerful enough then the GTAs and CODs and such games will come to them and make a lot of money in license fees. They'll have to balance that against the added cost of the hardware, but it might make sense.
 
Lolio my friend, have you considered that Larrabee, which was designed to be the most programmable shader-based architecture, kept the texture units in hardware and focused first on getting rid of the raster units to perform that task in software? Even ATI experimented with shader-based AA.

As a suggestion, perhaps you could focus on the size of the raster units and the possibility of rasterizing in software on the shader units, as that seems to be the next logical step in GPU development. It's also extremely relevant to this topic, as custom ROP logic is already installed on the Xbox 360 and further development around on-die framebuffers, 3D etc. can be expected.
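To make "rasterizing in software on the shader units" concrete, here is a minimal sketch of half-space (edge-function) triangle rasterization, the kind of per-pixel test a software rasterizer runs; a real one would vectorize this across a SIMD and work tile by tile. Python is used only for readability, and all names are illustrative rather than taken from any actual driver or Larrabee code.

```python
# Minimal half-space (edge-function) triangle rasterizer.
# Illustrative only: a real software rasterizer would test whole tiles,
# vectorize across a SIMD and handle sub-pixel precision and fill rules.

def edge(ax, ay, bx, by, px, py):
    # Signed area term: >0 if P is on one side of edge A->B, <0 on the other.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(v0, v1, v2, width, height):
    """Yield (x, y, w0, w1, w2) for every covered pixel centre, with
    w0..w2 the barycentric weights used for attribute interpolation."""
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    xmin, xmax = max(int(min(xs)), 0), min(int(max(xs)) + 1, width)
    ymin, ymax = max(int(min(ys)), 0), min(int(max(ys)) + 1, height)
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return                      # degenerate triangle, nothing to draw
    for y in range(ymin, ymax):
        for x in range(xmin, xmax):
            px, py = x + 0.5, y + 0.5
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Pixel is inside if all edge tests agree (either winding).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield x, y, w0 / area, w1 / area, w2 / area

# Example: list the pixels one triangle covers in a 16x16 framebuffer.
covered = list(rasterize((1, 1), (14, 3), (4, 13), 16, 16))
print(len(covered), "pixels covered")
```

Whether it is worth spending general ALU cycles on this loop instead of a tiny fixed-function block is exactly the trade-off discussed in the following posts.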
 
Well, I realize that most likely the next generation of GPUs (whether they are many-core designs a la Larrabee or an evolution of the current architectures) is going to starve for bandwidth. Bandwidth has always been a limit, but it looks like GPUs are really hitting the wall now (and that's not the only wall they are facing; power and heat come to mind too): R8xx has twice the compute power and only marginally more bandwidth to play with, and it's the same for Nvidia. We may expect 28nm GPUs next year with possibly another 2x jump in compute power matched by only marginal gains in bandwidth.
At the same time GPU computing is becoming a reality, so more ALUs/compute power at the cost of texturing power may make sense. I mean, nobody may notice that the new GPUs are less efficient at texturing if they face a bigger bottleneck and are starving for bandwidth.
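To put a rough number on that gap, here is a back-of-the-envelope sketch using the commonly quoted peak figures for the HD 4870 and HD 5870 (approximate figures; the point is the ratio, not the exact values):

```python
# Peak arithmetic throughput vs. peak DRAM bandwidth, approximate figures.
gpus = {
    "HD 4870 (RV770)":   (1.20e12, 115.2e9),   # ~1.2 TFLOPS, ~115 GB/s
    "HD 5870 (Cypress)": (2.72e12, 153.6e9),   # ~2.7 TFLOPS, ~154 GB/s
}
for name, (flops, bandwidth) in gpus.items():
    print(f"{name}: ~{flops / bandwidth:.0f} FLOPs per byte of DRAM traffic")
# Compute more than doubled while bandwidth grew ~33%, so the ALUs have to
# do roughly 10 -> 18 FLOPs for every byte they fetch just to stay busy.
```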

In the case of Larrabee it looks like compute density was pretty much a disaster; it would have been fine if Intel had reached their frequency goals (~2GHz). Anyway, comparing ATI's product and Larrabee is interesting, as it shows the kind of compute density ATI has reached:
Larrabee: 2 billion transistors and 1 TFLOPS
R8xx: 2 billion transistors and 2.7 TFLOPS
Now add 40% more ALUs (from removing the texture units) and it's 3.8 TFLOPS.
Then add the extra ALUs you could fit in place of the RBEs and you easily pass 4 TFLOPS (going by this there must be less to save there than with the tex units, though).
What I mean is that ATI would not be in the same situation as Intel: if they were to remove the tex units and ROPs/RBEs, they would have a lot more compute power to (try to) make up for these missing units. With the next GPUs being produced at 28nm, something like a >6 TFLOPS GPU would be possible, while on the other hand bandwidth would not move much.
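A quick worked version of the scaling above, with the assumption spelled out (the 40% ALU-area gain from dropping the texture units is the poster's own die-shot estimate, questioned a couple of posts later; the RBE and 28nm gains are left as the post states them, not computed):

```python
# Compute-density comparison and the post's "what if" scaling.
larrabee_tflops = 1.0                 # ~2 billion transistors
r8xx_tflops = 2.7                     # ~2 billion transistors
alu_gain_if_tex_removed = 0.40        # assumed: tex-unit area converted to ALUs
print(r8xx_tflops * (1 + alu_gain_if_tex_removed))   # ~3.8 TFLOPS
# Converting the RBE area as well is what pushes it past 4 TFLOPS, and a
# further density jump at 28nm is what gets the post to its ">6 TFLOPS".
```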
On the other hand, going by Intel's own estimates for a game like Stalker, I'm not sure that the removal of the rasterizer is a good idea, as it performs well and takes up really little space.

I also feel like it could help to clean up the "memory model/system" in GPUs, which looks a bit messy to me: RAM => texture fetch => texture L1 & L2 caches => ALUs => then two ways out, RBE/ROP or export as a texture. Something like RAM => generic caches => ALUs & registers => generic caches => RAM looks more sane.
 
How are you figuring that 40% of a SIMD is texture units?
I'm not seeing anything that large, and at least part of the non-ALU portion is taken up by the SIMD's instruction sequencer and the LDS.
 
Well... now that you're pointing this out... that's a likely possibility...

I used this (from TechReport), which may have confused me :oops:
[image: die-shot-colored.jpg (annotated die shot from TechReport)]

There are two visible blocks/structures in the part marked as "texture units": one is a bit bigger than a quarter of the SIMD array (25% < block < 33%), the other a bit bigger than 1/7 of the SIMD array.
 
Actually, the more I look the more I'm lost; I now see 3 different blocks/structures next to the 4 quads of "stream processors", which somewhat diminishes the size of the SIMD and thus pushes the percentage a bit higher.
 
My speculation:

If they go for designs similar to last gen, coming out at the end of 2011, that means 28nm. At 28nm, with chip sizes similar to the Xbox 360 and PS3, that's about 2 billion transistors for the CPU, which would be roughly in the range of an 8-core AMD Bulldozer, an 8-core Intel Sandy Bridge or an 8-core POWER7 derivative, so I would say one of those 3 options for Xbox720 and PS4, leaning towards AMD for Xbox and IBM for PS4. Probably similar clocks, 3.2GHz, maybe a little more, for power/heat reasons. The CPUs would probably end up around 3x the power of the current one for Xbox, maybe double the effective power for PS4.

For the GPU it's a bit harder. I would guess they would put the 360's eDRAM on the same die, so combining the two at the equivalent size at 28nm would be around 3.5 billion transistors. So, probably sticking with ATI, I'd say a 2560-shader, 32-"core" chip, with 64MB of eDRAM on board with 1TB/sec of bandwidth, which would fit 1080p at 32-bit with 4xAA, and 64-bit with just two tiles, which would be acceptable performance. They'll probably make 1GHz with it. It would end up about 22x the power of their current GPU.
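As a sanity check on the 64MB figure, a quick sketch of the framebuffer arithmetic (assuming 32-bit colour plus a 32-bit depth/stencil per sample, and no eDRAM compression):

```python
# Does a 1080p, 4xAA framebuffer fit in 64 MiB of eDRAM?
width, height, samples = 1920, 1080, 4
MiB = 1024 * 1024
bytes_32bit_colour = (4 + 4) * samples * width * height   # 32-bit colour + 32-bit Z/stencil
bytes_64bit_colour = (8 + 4) * samples * width * height   # 64-bit (FP16) colour + 32-bit Z/stencil
print(bytes_32bit_colour / MiB)   # ~63.3 MiB -> fits in a single 64 MiB tile
print(bytes_64bit_colour / MiB)   # ~94.9 MiB -> needs two tiles, as stated above
```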

For PS4, I'm not really sure; they could do anything, they don't seem to follow any pattern. My guess would be an NV chip for ease of BC and similar programming, but that's a pure guess. In which case it would probably be something similar to Fermi, with some eDRAM on board, probably 64MB if they go that route. Maybe higher clocks, 750MHz core and 1.5GHz shaders. Far more powerful than their current GPU, maybe 30x. This assumes Sony has learned a lesson from this gen and doesn't do something crazy again...

For RAM, both will probably have similar amounts, probably 2GB of GDDR5+, maybe 8GHz effective. I'd guess quad memory controllers, so 256GB/sec of bandwidth. Probably both will be fully unified this time.
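The 256GB/sec number is just the usual GDDR5 arithmetic (a sketch assuming four 64-bit controllers, i.e. a 256-bit bus, and "8GHz" meaning 8 Gbit/s per pin):

```python
# Peak memory bandwidth = per-pin data rate * bus width in bytes.
data_rate_gbit_per_pin = 8        # assumed effective GDDR5+ rate ("8GHz")
bus_width_bits = 4 * 64           # assumed: quad controllers, 64 bits each
print(data_rate_gbit_per_pin * bus_width_bits / 8, "GB/s")   # 256.0 GB/s
```

The Wii2 guess later in the post follows the same formula: 4 Gbit/s on a 128-bit (dual-channel) bus gives the quoted 64GB/sec.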

Of course, this assumes they go for similar designs. I have a feeling they will both be more conservative this time around and go for smaller, easier-to-manufacture designs, in which case you could pretty much halve the expectations above.

Nintendo is a different animal; I expect them to go for something small again, with a similar power/size envelope to the Wii. I would suspect them to go for a single SoC design. They can really go with anyone, because the Wii could be emulated on just about anything. My guess would be either an ATI Fusion product or maybe some kind of Tegra-like product from NV, if they think they can get a better deal getting the DS2 and Wii2 chips from the same company. In the ATI case, maybe a dual- or single-core Bulldozer chip with 480-ish shaders, 6 "cores". Modest clocks for power reasons, maybe a 2.5GHz-3GHz CPU and a 500MHz GPU. It would probably have the framebuffer on die, but only enough for something like 720p with 4xAA, maybe 16MB with two tiles, or some odd number to just make it in one, like 28MB. In the NV case, maybe a custom Tegra-like product with 4 ARM Cortex-A9 cores at 1.5GHz+ with a low-end Fermi-type chip on board, maybe 128 shaders in 4 "cores". It would probably still have the framebuffer on die, or on a daughterboard. Modest clocks, 500MHz core, 1GHz shaders. It's hard to estimate how much more powerful this would be than the Wii, but I'd guess around 10x CPU-wise and maybe as much as 100x on the GPU.

RAM would be lower, maybe 512MB, possibly 1GB if they can get it cheap. They probably wouldn't bother with top-end speeds, maybe GDDR5 that's cheap and cool at 4GHz. Dual channel, so 64GB/sec, surely fully unified.

Edit: Also, probably HDDs all around; I would suspect large single-platter drives for both PS4 and Xbox720, maybe 500-600GB. The Wii2 is more likely to have a smallish SSD, 32-64GB.
 
Remind me again which ones those were, since I can't for the life of me remember them & I played quite a few of them myself.

Tommy McClain


Most notably the DOOM and Quake series, but Descent also had the same feature if I remember correctly...
 

I must have a severe case of CRS, because I never remembered any of those games doing that. Evidently Doom & Quake had it, but it must have been done through customization. As for Descent, I played a shitload of it & Descent II online, but I don't ever remember either having that feature. Send me some links in PM so I can be enlightened. I'd hate to turn this into a game comparison thread. LOL

Tommy McClain
 
Doom/Doom2 had many command-line parameters, most notably used for loading 3rd-party maps or for networking. I've checked and there's -record and -playdemo ;)

With Descent I believe you would hit Alt and an F-something key.
 
Search for DOOM Done Quick. It was a very popular feature in those days and, of course, by the time Quake came around, the feature became popular for benchmarking.
 
Well, I realize that most likely the next generation of GPUs (whether they are many-core designs a la Larrabee or an evolution of the current architectures) is going to starve for bandwidth. Bandwidth has always been a limit, but it looks like GPUs are really hitting the wall now (and that's not the only wall they are facing; power and heat come to mind too): R8xx has twice the compute power and only marginally more bandwidth to play with, and it's the same for Nvidia. We may expect 28nm GPUs next year with possibly another 2x jump in compute power matched by only marginal gains in bandwidth.
At the same time GPU computing is becoming a reality, so more ALUs/compute power at the cost of texturing power may make sense. I mean, nobody may notice that the new GPUs are less efficient at texturing if they face a bigger bottleneck and are starving for bandwidth.

On the other hand, going by Intel's own estimates for a game like Stalker, I'm not sure that the removal of the rasterizer is a good idea, as it performs well and takes up really little space.

I also feel like it could help to clean up the "memory model/system" in GPUs, which looks a bit messy to me: RAM => texture fetch => texture L1 & L2 caches => ALUs => then two ways out, RBE/ROP or export as a texture. Something like RAM => generic caches => ALUs & registers => generic caches => RAM looks more sane.

I think in the case of the raster units, it's flexibility. I think Intel talked about rasterization in software helping to reduce their bandwidth requirements. So flexibility in terms of using less bandwidth, and flexibility in being fully programmable down the entire pipeline. Also, aren't they going to be limited by the one-triangle-per-clock setup rate sometime in the future as well? Even if the per-area cost is worse than having fixed-function units, having a flexible output path in the rendering pipeline does seem to make sense to me for the future.

So if you've got rid of the texture units, what are your ALUs actually doing? Would it be more efficient for them to directly emulate texture ops, or would you need an entirely new model of operation such as procedural generation, say building textures directly using polygons and shaders?

Lastly, I guess in a general-purpose sense, how much fixed-function hardware can be chucked from a modern GPU? We've got ROP units, texture units, decoder units, legacy hardware support, anything else? Once we've done that, all we have left is a gigantic block of ALUs with just the required supporting hardware to keep it fed. Is that an ideal situation? Or are we going to see some further specialisation of ALU units between different ranges of functionality?
 
Nowhere close to finished, so please Squilliam don't answer yet ;)
I think in the case of the raster units, it's flexibility. I think Intel talked about rasterization in software helping to reduce their bandwidth requirements. So flexibility in terms of using less bandwidth, and flexibility in being fully programmable down the entire pipeline. Also, aren't they going to be limited by the one-triangle-per-clock setup rate sometime in the future as well? Even if the per-area cost is worse than having fixed-function units, having a flexible output path in the rendering pipeline does seem to make sense to me for the future.
I may misunderstand, but I don't see how Intel's choice not to include a rasterizer is related to bandwidth-saving measures. Intel chose deferred rendering as the most appropriate mode of rendering and nobody really disputes that choice, but for me it's unrelated to the choice of not having a dedicated rasterizer. As I see it, Intel chose a TBDR because each core in Larrabee had a sizable amount of L2 and it made sense to do as many operations as possible on material within this memory space and make the most of the bandwidth and flexibility provided by the caches. Even if the rasterizer & triangle setup units are tiny, Intel would have had to amortize their cost across multiple cores. If Intel had chosen to have multiple rasterizer and triangle setup units, then comes the question of how to manage/feed those units. Bandwidth may have been a concern, but I don't think it was the only reason behind Intel's choice. They IMHO wanted to have as little fixed-function hardware as possible because they weren't only aiming at graphics with Larrabee, and high FLOPS throughput was their primary goal.
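For illustration, a minimal sketch of the tile-binning idea behind that argument: triangles are bucketed per screen tile, and each tile is then rendered entirely out of a core's cache so the framebuffer only travels to DRAM once per tile. The tile size and names are assumptions for the sketch, not Larrabee's actual renderer.

```python
# Minimal tile binning: assign each triangle (by its bounding box) to the
# screen tiles it may touch; a core then renders one tile at a time with
# the tile's colour/depth resident in cache until the final write-out.
from collections import defaultdict

TILE = 64                                  # assumed tile size in pixels

def bin_triangles(triangles, width, height):
    """triangles: list of ((x0,y0),(x1,y1),(x2,y2)). Returns {(tx,ty): [ids]}."""
    max_tx, max_ty = (width - 1) // TILE, (height - 1) // TILE
    bins = defaultdict(list)
    for tri_id, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        tx0, tx1 = max(int(min(xs)) // TILE, 0), min(int(max(xs)) // TILE, max_tx)
        ty0, ty1 = max(int(min(ys)) // TILE, 0), min(int(max(ys)) // TILE, max_ty)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(tri_id)
    return bins

# A 64x64 tile with 32-bit colour + 32-bit depth is 64*64*8 = 32 KiB,
# small enough to sit in a per-core L2 for the whole tile's lifetime.
bins = bin_triangles([((10, 10), (500, 40), (200, 300))], 1920, 1080)
print(sorted(bins)[:5])
```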
So if you've got rid of the texture units, what are your ALUs actually doing? Would it be more efficient for them to directly emulate texture ops, or would you need an entirely new model of operation such as procedural generation, say building textures directly using polygons and shaders?
I think that the ALUs would handle the texture units' job, most likely less efficiently.
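To give an idea of what "the ALUs handling the texture units' job" means, here is a sketch of one bilinear fetch done in plain shader-style arithmetic: four point samples plus three lerps that a dedicated texture unit currently does essentially for free per fetch, before even counting addressing modes, mip selection or format decompression. The greyscale texture and names are just for the example.

```python
# Bilinear texture sampling emulated with ordinary ALU operations.
def bilinear_sample(tex, u, v):
    """tex: 2D list of greyscale texels; u, v: normalized [0,1] coordinates."""
    h, w = len(tex), len(tex[0])
    # Map to texel space with the usual half-texel offset, clamping to the edge.
    x = min(max(u * w - 0.5, 0.0), w - 1.0)
    y = min(max(v * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Four fetches and three linear interpolations, all burning ALU cycles.
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bot = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bot * fy

checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
print(bilinear_sample(checker, 0.3, 0.6))
```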
Lastly, I guess in a general-purpose sense, how much fixed-function hardware can be chucked from a modern GPU? We've got ROP units, texture units, decoder units, legacy hardware support, anything else? Once we've done that, all we have left is a gigantic block of ALUs with just the required supporting hardware to keep it fed. Is that an ideal situation? Or are we going to see some further specialisation of ALU units between different ranges of functionality?
I would also like to know. I'm far from being even an armchair expert on the matter; my original comment came from my realization of how "huge" the texture units looked in R770.
As you saw, I've most likely been misled by a die shot (cf. 3dilettante's post). In the end it's a bit unclear how big the texture units are compared with the ALUs. The other part of the problem is how much more efficient texture units are in perf/mm² than even reworked ALU arrays. I have no answer; I was trying to initiate a discussion on the matter.
EDIT: I deleted the part where I let my imagination wander a bit too much.
 
OK, OK, I won't. I was actually wondering if it might be worthwhile to make a thread down in the 3D architectures section of the forum, as this is right down that alley. It's not just a console-specific train of thought. It would be interesting to see what they would have to say for themselves. Of course we could get a definitive answer from Dave, but he's mean and he won't share.
 
It blows my mind that the architecture engineers are busy designing GPUs to be released in 36 months' time.
It would be a bonus for the console companies to use some of this development in their consoles; we all know how effectively MS did this with ATI and how Kutaragi-san and co. essentially missed a trick. I think what the Farid-meister was alluding to was some of this. It's time to look at what the main push for DX12 is going to be... Maybe?!
 
OK, OK, I won't. I was actually wondering if it might be worthwhile to make a thread down in the 3D architectures section of the forum, as this is right down that alley. It's not just a console-specific train of thought. It would be interesting to see what they would have to say for themselves. Of course we could get a definitive answer from Dave, but he's mean and he won't share.
I've finished answering your post; I got a bit carried away by my imagination though.
 
OK I want to try and put the LRB debacle in some perspective.
Firstly, kudos to Intel for trying something new with regard to accelerated 3D graphics and moving away from fixed-function units.

However, LRB was a DX10 class GPU (that cannot magically turn into a DX11 class GPU with a software/firmware fix), which was late, huge, and there were indications that in certain environments it underperformed. What is the point of an accelerator if, due to some romantic notion of flexibility, it gets destroyed by the competition because it is 1. late, 2. huge, 3. power hungry and 4. underperforming?

Eventually NVIDIA and ATI will move towards an LRB-like GPU, but until there is a crossover in die size, power consumption and feature set it is not worthy of a place in a console. As a Sony exec recently said, LRB is not dead and may be an option for PS5 or PS6. In a sense I am glad LRB was canned in its current form.
 
However LRB was a DX10 class GPU (that cannot magically turn into a DX11 class GPU with a software/firmware fix),
I'm very confused! How can it be a DX10 class GPU and yet be fully programmable? What features are in DX11/12 that LRB couldn't do?
 