Predict: The Next Generation Console Tech

And what's this patent application? http://appft1.uspto.gov/netacgi/nph...ND&d=PG01&s1=nintendo&OS=nintendo&RS=nintendo

Have we been over it before?

That seems to be in line with my thoughts on what Sony or Microsoft will do next-gen: release a new console, then make upgrades like the iOS devices to keep the hardware fresh for years to come.


[Attached image: Slide9.jpg]

Versions of a multimedia computer system architecture are described which satisfy quality of service (QoS) guarantees for multimedia applications such as game applications while allowing platform resources, hardware resources in particular, to scale up or down over time. Computing resources of the computer system are partitioned into a platform partition and an application partition, each including its own central processing unit (CPU) and, optionally, graphics processing unit (GPU). To enhance scalability of resources up or down, the platform partition includes one or more hardware resources which are only accessible by the multimedia application via a software interface. Additionally, outside the partitions may be other resources shared by the partitions or which provide general purpose computing resources.


This is clearly the next Xbox
 
Do you have a source/citation for that?

I remember reading back then that since RSX's pixel pipelines are coupled to the texture units, whenever one of them is reading a texture it can't process anything else.

That, plus the non-unified shader architecture, could account for some substantial under-utilization of the hardware.
 
I notice quite a lot of "mm2" or transistor numbers being thrown around. Why is that?
"more is better"
Acert93 was using it as some kind of performance measurement(?), though.
I like transistor count a lot more for that kind of comparison (it still doesn't really relate to performance that much, if at all).

edit: maybe someone has a list of "fps per mm2" or "transistors per fps" for a number of GPUs; then you would see how much sense that makes (or doesn't make) when comparing architectures.
Great explanation!
I guess that comparing 2005 and 2012 GPU designs on the basis of mm2 is pretty useless then.
Maybe you could quote me because I don't think you understood what I said or the meaning of my posts.

Transistors are the physical microscopic "parts" that compose the logic and memory in modern electronics. Transistors aren't a great gauge of performance because (a) people count transistors differently, (b) different architectures use transistors more or less efficiently, (c) memory is more dense than logic, hence one design may have a large amount of cache but less logic, etc. There have even been examples where moving down a process, or taking the time to refine a design, has resulted in an architecture reducing its transistor count. Importantly, due to these reasons, my OP was only looking at Moore's Law (~2x density every 18-24 months) and the scaling of transistors as general guidelines for what we could project n process nodes into the future, with the caveat of architectural issues (e.g. features often come at the expense of performance; yet more advanced features may make certain desirable techniques performant whereas the older architecture scaled up would not, etc).
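
As a back-of-the-envelope illustration of those two guidelines (a sketch under my own assumptions; the 90nm/2005 and 28nm/2013 data points are taken from the surrounding posts, and treating "~2x every 18-24 months" literally is a simplification):

```python
# Sketch: compare the time-based Moore's Law guideline with simple
# feature-size scaling. Dates and nodes are from the surrounding posts.
years = 2013 - 2005
for months_per_doubling in (18, 24):
    doublings = years * 12 / months_per_doubling
    print(f"{months_per_doubling}-month doubling: ~{2 ** doublings:.0f}x density")

# Density also scales roughly with the square of the feature-size ratio.
print(f"90nm -> 28nm: ~{(90 / 28) ** 2:.1f}x density")
# The guidelines disagree (16-40x vs ~10x), which is why the post treats
# Moore's Law as a rough projection tool rather than a hard rule.
```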

mm^2 (area, e.g. 10mm x 10mm is a 100mm^2 chip) is not a direct measure of performance either. A ~250mm^2 RSX on 90nm is going to have about 1/10th the transistors of a 250mm^2 GPU of a similar architecture on 28nm. Area also doesn't tell us about the architecture and what kind of frequencies that architecture allows.

What mm^2 does allow us to do, as long as we take market conditions into consideration, is get a barometer of cost. This is not a 100% correlation due to said market considerations e.g. costs change over time (namely nodes tend to get more expensive), production early on a process is more expensive than when it is mature, wafers get bigger (which can reduce chip costs in the long run), etc.

So whether we are on a 90nm process or a 28nm process, a 225mm^2 (15mm x 15mm) chip on a 300mm (12 inch) wafer (which are round) nets about 245 dies. Assuming that your wafer cost is exactly the same in 2005 on 90nm as it is in 2013 on 28nm (a bad assumption), you could get the same number of chips at the same cost. There are too many variables at play to get an exact cost change--e.g. we would have to know when the consoles would ship, for one. But assuming a late 2013 launch, 28nm will probably be more mature than 90nm was in 2005 at TSMC. Then again, iirc there was more competition in the fab space in 2005, and costs have been increasing over time. But the transition to 300mm wafers is pretty standard (I don't remember if all 90nm production was on 300mm or some on 200mm).
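
For anyone who wants to check the die-count arithmetic, here is a minimal sketch using the common first-order dies-per-wafer approximation (my choice of estimator; the post doesn't say which one it used). The raw formula gives roughly 270 gross dies for a 225mm^2 die on a 300mm wafer; edge exclusion, scribe lines, and yield losses pull the usable count down toward the ~245 cited above.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Gross dies per round wafer, first-order approximation:
    usable wafer area / die area, minus a term for partial dies at the edge."""
    whole_area = math.pi * (wafer_diameter_mm / 2) ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(whole_area - edge_loss)

# 225mm^2 (15mm x 15mm) die on a 300mm wafer, as in the post.
print(dies_per_wafer(300, 225))  # ~269 gross; real-world losses land nearer ~245
```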

What may also be missed in there is that costs have not always gone up. Dies have gotten bigger over time and IHVs have been able to fit more product onto a board and still make profits. Another dynamic is how GPUs and CPUs have swallowed up other onboard chips (e.g. Intel's FSB was on the northbridge iirc; IGPs that were on the motherboard chipset are now APUs on the CPU die, etc).

I fully admit there are many, many factors and variables. My posts recently looking at die size (area, mm^2) did touch on the very fact that die size alone doesn't tell the whole story--e.g. the new "barrier" may have shifted from cost of the die to TDP. If that is the case, a larger chip aimed at a specific TDP (which would likely have lower voltage, lower frequency, and an architecture aimed at a set TDP) may provide more performance than a smaller, higher-clocked chip which will hit the TDP wall sooner. The other aspect is the bus size. If you want a 256bit bus you will need to be a certain size. Just as importantly, you may want to go larger as you need to consider the limits at the next node. e.g. If your chip just fits a 256bit bus on 28nm, then a shrink to 20nm may not help you as the chip's bus is so large (buses don't shrink fast at all). Hence so much talk of 128bit buses. Of course it may be cheaper to target 28nm for a much longer product cycle, as 20nm may be expensive and integrating FinFETs may not be an option (or even available).

Anyways, I think you misunderstood my posts.

One of the takeaway points in general is that I think we will see Sony and MS reduce the silicon budget (but rest assured they will claim the chips cost them even more...) and those budgets will be shifted to things like Kinect2 and media experiences (think of how Sony sold out for Blu-ray; this gen it will be selling out to the "Cloud"). I guarantee you that we will soon be inundated with a flood of "n more transistors!" and the like, and gaudy numbers like "6x faster!" when, I predict, we are going to see a large reduction in investment into silicon. Which is fine, just as long as it is not obscured by useless expressions like "But it is 6x faster with 5x the transistors!"

Likewise the performance level being discussed these days (e.g. Cape Verde) is 2009 GPU performance and doesn't even crack 30fps in Crysis 2 at 1080p. Which, again, is fine as long as it isn't trumpeted as some leap in performance. Hence my responses that these chip specs are quite old by today's standards, don't show any benefit of the extra-long generation, don't offer technology that delivers products far and away a generational leap over the current consoles, and would reflect a steep decline in silicon budgets. That last point is the one I have been mainly discussing. Steep cuts in chip size may be the best move for these companies, but my purpose is to look at the flip side of "Cape Verde is sooo much faster than Xenos!" and say, "Yes, but in terms of silicon footprint it is a massive reduction over RSX/Xenos, and for those angling at n performance increase, Cape Verde class hardware offers little of the performance needed for a generational leap graphically." And the benchmarks show that. Alas, this is more a back and forth of what people want/desire, expect, what is good strategy, what is possible, etc. If you take anything away from the previous posts, it is that 1TFLOPS GPUs or Cape Verde would be much smaller than Xenos/RSX, and there is no reason more could not be obtained from these GPUs within similar TDP limits if that, instead of die size, is the limiter.

The other point I was discussing was the claim that larger chips must have a very high TDP, when this isn't necessarily true. It seems frequency accelerates toward the TDP wall faster than die size on some designs and processes, so a larger, lower-clocked GPU will likely provide more performance per Watt. Hence a larger die does not, out of hand, have to violate power constraints. In fact a larger chip within TDP budgets would indicate the designers didn't run away from the performance issues and toss their hands up and say, "Well, let's just invest the extra money on Cloud Services." Which, again, may be the best strategy, but it is not necessarily a technical barrier.

EDIT:

A chip that is 50% smaller on the outside could have 5 times the performance (2012 vs 2005).

Ok, I know you don't understand my posts. Go back over them again please before telling me my comparison is incorrect and has faulty logic. I am obviously not denying that a GPU half the size on 28nm will be faster than Xenos. Between frequency increases, architecture, and sheer logic increases it will be much faster.

Yet a 135mm^2 GPU is a huge reduction in physical real estate from, say, a 258mm^2 RSX and it represents a massive shift in console design and, frankly, purpose. If you are fine with a 135mm^2 GPU, that is fine but also neither here nor there in terms of my point.

What I would say is that, looking at gaming benchmarks, Cape Verde class GPUs struggle with games on high quality at 1080p, which doesn't paint a rosy picture for traditional and progressive visual enhancements. It also has specific architectural considerations, as I noted in those posts: why would someone argue for DDR4 with a lot of bandwidth AND eDRAM, when Cape Verde obviously doesn't need much more bandwidth than what DDR4 would offer?

It actually seems that an APU with a Cape Verde class GPU and a large pool of DDR4 on a wide bus would be a very well balanced system design--one large pool of memory (e.g. 8GB), CPU and GPU on a single large die which makes the wide DDR4 bus possible, and the close proximity of the GPU and CPU should cut down on some bandwidth needs.

If you wanted a cheap console that balances a CPU, GPU, and memory well equipped to work together, this seems perfect to be quite honest. eDRAM doesn't seem necessary for this class of GPU and only complicates the design and makes it more expensive. Which was one of my points.
 
The reason for the *rumoured* 8GB RAM and heavy CPU performance in Durango

Perhaps they intend to use much more 'procedural content' in their next generation console. The reason why they have so much memory is so they can store content without having to recompute it all the time, and this is also the reason why the design seems to be more CPU-centric.

Consoles have two major throughput bottlenecks, the optical drive and the internet, as well as a storage problem if they want to create SKUs which don't have mechanical HDDs. Procedural content seems to solve both these problems by amplifying the quantity of 'real' data which can pass through, acting as a form of heavy compression. If an optical drive realistically tops out at 33MB/s and the internet tops out at 20Mbps, then this is a major problem which they need to solve in order to actually deliver and make use of stored content.

If they can increase the compression of data significantly then they can achieve a much higher throughput without having to resort to expensive technology such as flash to speed up data delivery. This makes online distribution significantly more feasible for a wider variety of customers, and it may mean they can get away with using solid-state storage instead of mechanical HDDs if it proves cost effective. They could use 64-128GB of flash memory (not SSD) and offer easy expansion options for customers as well.
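
To put some rough numbers on the "compression as bandwidth amplification" idea (a sketch; the amplification ratios below are hypothetical, only the 33MB/s and 20Mbps figures come from the post):

```python
# Effective throughput if procedural/compressed content amplifies the
# "real" data rate. Ratios are hypothetical; base figures are from the post.
OPTICAL_MB_S = 33.0       # optical drive, MB/s
INTERNET_MB_S = 20 / 8    # 20 Mbit/s internet ~= 2.5 MB/s

for ratio in (2, 5, 10):  # assumed amplification factors
    print(f"{ratio:>2}x: optical ~{OPTICAL_MB_S * ratio:.0f} MB/s, "
          f"internet ~{INTERNET_MB_S * ratio:.1f} MB/s effective")
```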
 
This is clearly the next Xbox
If they're really going for a scalable core system, they'll probably need a strongly customized GPU solution with some kind of advanced tile-based rendering routine, aka something like Imagination Technologies' PowerVR approach or a highly customized AMD design* that's at least as scalable.

They'd probably have to start with [(core system)*2]@28nm at launch to keep the next "jump" [(core system*3)@22nm] manageable.

=========

*Looking a little closer, what if those 64 ALUs mentioned in the leaked presentation actually are 4D-ALUs?

I always found the way AMD introduced their VLIW4 design rather curious. The fact that they put so much extra effort into an allegedly "transitional" architecture (with GCN already around the corner) just didn't make a lot of sense, and using that new architecture in only ONE dedicated chip (Cayman) didn't exactly make the move more reasonable from an economical point of view. Now, instead of going for a direct jump from VLIW5 to GCN, they use VLIW4 in Trinity.

So … what if there’s more to that VLIW4 architecture than meets the (retail) eye?

What I’m trying to say is that 6-8 smallish CPU cores (like ARM or Jaguar) + 64 VLIW4 ALUs + a XENOS-like daughter die with some eDRAM would basically make for a nicely balanced “core system”. Put two of those systems on an interposer, make sure the GPU parts can efficiently work together (evolved tile based rendering), add a lot of RAM and you’ve got the heart of a very nice, very scalable next-gen console.

On a side note, two GPU parts with 64 VLIW4 ALUs each would, interestingly, combine to offer just over 1TFLOP of processing power @ 1GHz core clock ...
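
For reference, the arithmetic behind that "just over 1TFLOP" figure, assuming each VLIW4 ALU issues four multiply-add lanes per clock (the usual way peak FLOPS is counted for that architecture):

```python
# Peak-FLOPS check for two GPU parts with 64 VLIW4 ALUs each at 1GHz.
gpu_parts = 2
alus_per_part = 64
lanes_per_alu = 4      # VLIW4: four lanes per ALU
flops_per_lane = 2     # multiply-add counts as two FLOPs
clock_ghz = 1.0

gflops = gpu_parts * alus_per_part * lanes_per_alu * flops_per_lane * clock_ghz
print(f"{gflops / 1000:.3f} TFLOPS")  # 1.024 TFLOPS, i.e. "just over 1TFLOP"
```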
 
Whilst interesting... that would be a let down.
I thought some time ago that we would be getting a VLIW4 design... with something like 32 ROPs, 64 texture units... and something crazy like 2000 shaders, clocked at around 750MHz. That would make a very nice console GPU indeed.

Use a 128bit bus to 4GB DDR4 and a unified pool of read/write eDRAM of some 64MB in size... and we have a winner.
 
It would be a let down for a system that's supposed to survive a ten-year life cycle without major updates ... but it should be more than sufficient for a system that's actually designed to scale over time.

Assuming that they really aim to keep upscaling their hardware with future revisions (number of core systems, clock speeds), they don't need an overly powerful system at launch. What they do need is a system that's actually profitable at launch - and forward-compatible to run software that will soon be optimized to run on future, enhanced hardware.

The latter might also serve as a possible explanation for the huge amount of RAM that's been rumored: While the launch system won't have to be designed to run all future games and additional services in their highest fidelity, it has to have the memory reserves to be capable of at least somehow running them - possibly with some future services in the background. And that will - even in lower fidelity - require lots of RAM.

I'm still not sure whether, as a customer, I actually like the idea of hardware updates every few years. With many people continuously migrating to the "best" revision, it would certainly earn them a lot of additional money, though. Just like Apple with their continuous revisions of new iPhones and iPads, they could basically just keep milking a huge part of their user base with every new revision.
 
So onQ123, according to your NeoGAF post you still think the old next-gen Xbox doc's specs are relevant right now?
Just asking
 
So onQ123, according to your NeoGAF post you still think the old next-gen Xbox doc's specs are relevant right now?
Just asking

Leaked documents: 4X-6X Xbox 360 for games

Info that we have heard about the next Xbox: 1-1.5 TFLOPS


Xbox 360 GPU: 240 GFLOPS

6X 240 GFLOPS = 1.44 TFLOPS
 
Whilst interesting... that would be a let down.
I thought some time ago that we would be getting a VLIW4 design... with something like 32 ROPs, 64 texture units... and something crazy like 2000 shaders, clocked at around 750MHz. That would make a very nice console GPU indeed.

Use a 128bit bus to 4GB DDR4 and a unified pool of read/write eDRAM of some 64MB in size... and we have a winner.

128bit DDR4 is going to get you about 40GB/s of bandwidth. Add to that, in a UMA configuration you're talking a probable effective bandwidth under 30GB/s. eDRAM just isn't going to make up that much. And this with the specter of discrete PC GPUs going to 500GB/s to a TB/s of bandwidth with stacked memory a year or two after you launch.

I'm not seeing a lot of win there.
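
For what it's worth, here is the peak-bandwidth arithmetic behind that ~40GB/s figure (a sketch; the DDR4 speed grades are my assumptions since the post doesn't name one):

```python
# Peak bandwidth of a 128-bit DDR4 bus at a few common data rates.
BUS_WIDTH_BYTES = 128 / 8

for mt_per_s in (2133, 2400, 2667):            # MT/s, assumed speed grades
    gb_per_s = mt_per_s * 1e6 * BUS_WIDTH_BYTES / 1e9
    print(f"DDR4-{mt_per_s}: {gb_per_s:.1f} GB/s peak")
# Roughly 34-43 GB/s peak, before any efficiency loss from CPU/GPU
# contention in a UMA setup.
```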
 
immintrin.h, eh?

Curious.

Well, if there were any truth here, I guess you could say this would confirm an x86 architecture, no?

128bit DDR4 is going to get you about 40GB/s of bandwidth. Add to that, in a UMA configuration you're talking a probable effective bandwidth under 30GB/s. eDRAM just isn't going to make up that much. And this with the specter of discrete PC GPUs going to 500GB/s to a TB/s of bandwidth with stacked memory a year or two after you launch.

I'm not seeing a lot of win there.

With DDR4, I'm guessing you'd have to go to a 256-bit bus, but wouldn't that basically mean you'll always be limited to 8 DDR chips? I don't think they make chips with 64-bit I/O.

I think they'll come up with a solution that isn't drastically bandwidth limited, I don't think MS would hamper themselves like that.
 

Kind of a neat look at a high-level view of a system design
http://i.imgur.com/xnxhk.jpg

The rest of the figures are quite interesting... They look pretty much like the leaked system slide, just a lot better thought out.

edit:

It kind of reminds me of Windows Experience Index for some reason...

edit2:

Some of the figures showing alternative configurations
http://imgur.com/a/qsgHG#0
 
If there is any truth to the slide about the ARM/scalable architecture (well, the slide is most likely legit, whether they have actually gone with that architecture is a different story), maybe it is not only scalable for the future, to easily add more cores and what not, but also easily downscaled. If you want to sell it for really cheap, like a set top box, then you just remove all the gaming stuff and leave only the "always on" stuff...
 
So there will be dedicated CPU/GPU for gaming and dedicated ones for multimedia? Sounds expensive and complicated; hope they pull it off.
 
OK, feeling like a 60 Minutes reporter, I was able to actually contact the guy with the supposed Durango kit via IM. Prior to that I looked at his post history and he is def legit when it comes to Xbox dev kit knowledge (meaning 360 kits). The only other alternative is he was playing a practical joke.

Anyways, when I asked him if he could give me any Durango specs, he said, quote, "I cant just leak the specs haha" (Acert93 response LOL). Anyways, in pressing on, he seemed to say he couldn't get the specs right now anyway because his laptop is broken (per his twitter). Not sure what his laptop has to do with the specs. When I asked if it has 8GB RAM, he says he seems to remember it will have 4-8GB RAM in the final. He seemed to lean 4 (which is contrary to BG's info, I know). I said he must have an older kit. He said it's the only one they have out and that "beta isn't coming for agggeeees" (interesting info there; we may not get Durango spec updates for a while then). So then I said that was odd since I thought it was hitting in late '13. He said it is holiday '13. He said it's 8 64-bit cores (he did not specifically address whether x86, but I presume). I finally got around to the GPU, the coup de grace. I asked if it was "pretty powerful" (this was like my 3rd attempt to get something about the GPU, mind you); his exact words: "eh, it's okay". I have no idea what that means but it doesn't sound great, though it jibes perfectly with what we're hearing.

I may be able to talk to him more and get some more out of him. Don't want to press too hard though. Doubt he knows a whole lot more anyway; I have a feeling the broken laptop is his impediment right now (I guess you need it to interface with the dev kit or something?). And something I'd really want, like GPU specifics, probably doesn't interest him too much.

I asked him whether he would be leaking details to vgleaks. He said best thing to do is watch his twitter.

Edit: I dunno, his twitter really seems trollish though lol; as much as the IM conversation convinced me he's legit, the twitter makes me wonder a little again.

Edit: And his twitter is talking about making a PowerPC/360 emulator on his Durango kit? Wouldn't that take, like, years? Then again his twitter convos include hardcore Xbox scene guys, which fits again. Well, as I said, his post history is seriously legit for Xbox dev kit knowledge; he just seems like a different guy on twitter.
 