TXB on Xbox2 = 51.2GB/s bandwidth??

Panajev2001a said:
Nappe1 said:
The last planned PC desktop graphics core with eDRAM would have had only 3MB of it... so you don't need it as much as you think. The only thing is that you need to do a few things differently than they're done now.

PC GPUs also have, separate from system RAM, VERY fast (compared to system RAM) off-chip VRAM.

Consoles would likely have a UMA or hybrid UMA configuration.

I wouldn't call 175-250MHz SDR (in that particular case) very fast and expensive RAM...

but enough of this; I obviously made my point clearly, since it caused some discussion. :) I'm not up enough on consoles to take this further...
 
Panajev2001a said:
You deferred rendering <bleep> :LOL:


well, not exactly... more like a TBR with some sort of per-tile polygon sorting... (I don't know exactly what system they had, but it did allow linking more than one eDRAM-based chip together to get more tile speed...)

and well, I'll pop by this forum to see if there's something interesting. :) But as a consumer, I can't see myself as a potential console buyer, though I do play them at my friends' places. And don't ask which console is my favourite, I really don't know... they all have some good sides and, what is the most important thing, they all have some great games! :)
 
I think all future consoles will be UMA, and it will be cost driven.

One of the growing costs is PCBs. Going from 4 layers to 6, or from 6 to 8, costs money. And that cost is not likely to fall nearly as fast as the cost of fast RAM, which after 6 months can be bought at dumping prices. So fitting an extra 64-bit or 128-bit bus for the CPU will simply be too expensive.

My guess is that Xbox 2 will use whatever flavour of DDR DRAM is in vogue at the time of release (GDDR3), but at a speed of something like 75% of the fastest attainable. Same story as with Xbox, really.
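
(Rough back-of-the-envelope, purely illustrative: peak bandwidth is just bus width times effective data rate, so a 256-bit GDDR3 interface at an effective 1.6 GT/s gives 256/8 × 1.6 = 51.2 GB/s, which is presumably where the figure in the thread title comes from; a 128-bit bus at the same data rate would give half that.)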

Cheers
Gubbi
 
If Xbox 2 launches in 2005, it'll likely be R500 based; it would probably use an "R520", much like Xbox uses the NV2A, which is like an "NV22.5".

If Xbox 2 launches in 2006, it will use something derived from anything between an R550 and an R600.

The point is, any R4xx VPU will be too outdated for the next Xbox.
 
I still don't see why Xbox 2 realistically needs more than 50+ GB/s of bandwidth, particularly since we know we are moving towards extensive use of things like procedural textures and slow (long) shader code. Moreover, there are still plenty of bandwidth-saving techniques that haven't been used yet, particularly for AA, volumetrics, and light maps. The money saved by going with a cheaper memory configuration can be better used to push more gates.
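
To make the procedural-texture trade concrete (a minimal sketch of my own, not anything from a real engine): the texel value is recomputed from its coordinates instead of being fetched from a stored texture, so ALU cycles are spent to save memory bandwidth.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Hypothetical sketch: a procedural checker pattern evaluated per texel.
// Instead of streaming a stored texture from memory (bandwidth), the value
// is recomputed from (u, v) on demand (ALU work).
static std::uint8_t proceduralChecker(float u, float v, float tiles)
{
    int cu = static_cast<int>(std::floor(u * tiles));
    int cv = static_cast<int>(std::floor(v * tiles));
    bool dark = ((cu + cv) & 1) != 0;   // alternate cells
    return dark ? 32 : 224;             // 8-bit luminance
}

int main()
{
    // Sample the pattern at one texture coordinate.
    std::printf("%u\n", static_cast<unsigned>(proceduralChecker(0.3f, 0.7f, 8.0f)));
    return 0;
}
```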

IMO the next console war is going to be about who wins on logic-gate density, coupled with efficiency per gate.

Shaders and CPU power are the war of the future!
 
Well, as far as powerful CPUs are concerned, I think we also want to keep all that power well fed; hence, a good memory hierarchy and good bandwidth are two of the things designers need to work on.

Stop starving MPUs :p


Edit: I do not mean to make fun of the people (and all the beings) around the world who do starve :(
 
Fred, I'm a firm believer in more of everything: more bandwidth, more fill rate, more pixel shader performance, more vertex shader performance, and definitely more polygon performance.
 
Gubbi said:
I think all future consoles will be UMA, and it will be cost driven.

One of the growing costs is PCBs. Going from 4 layers to 6, or from 6 to 8, costs money. And that cost is not likely to fall nearly as fast as the cost of fast RAM, which after 6 months can be bought at dumping prices. So fitting an extra 64-bit or 128-bit bus for the CPU will simply be too expensive.

My guess is that Xbox 2 will use whatever flavour of DDR DRAM is in vogue at the time of release (GDDR3), but at a speed of something like 75% of the fastest attainable. Same story as with Xbox, really.

Cheers
Gubbi


CPU performance would probably suffer with a UMA based on GDDR3. The latency would be too high. In a real-world situation where the CPU is running game code, it's getting stressed by a lot of different things: AI, networking code (online play), physics, and so on. Hiding latency is something Intel works hard at, but high-performance, low-latency memory is the only thing that will keep a CPU crunching numbers fast.

A CPU is highly latency-sensitive; lots of random stuff gets thrown at it.

A VPU is going to be more bandwidth-sensitive; the problems a VPU solves are very parallel.
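
A toy way to see the difference (my own sketch, purely illustrative): a GPU-style streaming loop walks memory sequentially, so the hardware can keep many requests in flight, while game-code-style pointer chasing exposes the full memory round trip on every hop.

```cpp
#include <cstddef>
#include <vector>

// Streaming (bandwidth-bound): sequential reads that the hardware can
// prefetch and pipeline, keeping many memory requests in flight.
long long sumStream(const std::vector<int>& data)
{
    long long sum = 0;
    for (int v : data) sum += v;
    return sum;
}

// Pointer chasing (latency-bound): each load depends on the previous one,
// so every cache miss costs a full round trip to memory.
struct Node { int value; Node* next; };

long long sumChase(const Node* n)
{
    long long sum = 0;
    for (; n != nullptr; n = n->next) sum += n->value;
    return sum;
}

int main()
{
    std::vector<int> data(1000, 1);
    Node tail{3, nullptr}, head{2, &tail};
    return (sumStream(data) + sumChase(&head)) == 1005 ? 0 : 1;
}
```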


Sure, the Xbox 2 might have a UMA, but I don't see it being the ideal solution. The way Sony is designing the PS3 is very elegant and clean. What a segmented memory setup in the Xbox 2 would lack in beauty, it would make up in very high raw performance.
 
Brim:
That's why CPUs have caches. Big caches even.

GDDR3 latency isn't going to be anywhere near a showstopping problem, that I promise you.


*G*
 
...but what is "big"? Perhaps the "big" caches we are anticipating would have been "big" on the CPUs we were using 5 years ago. Given that CPUs keep moving towards faster and larger, the caches that seem "big" right now may not really be all that "big" by the time they see the light of day in the architecture of that day. Also consider that applications and data sets are becoming larger and more complex. Not only does a cache have to buffer the speed disparity/latency between CPU and main memory, it has to hold working data sets that are themselves becoming large enough to tax these monstrous CPUs.

So what will effectively be "big" for a full-core speed L2-ish cache in the 2005/2006 timeframe? 1 MB? 2 MB? 16 MB? 64 MB?
 
randycat99 said:
...but what is "big"? Perhaps the "big" caches we are anticipating would have been "big" on the CPUs we were using 5 years ago. Given that CPUs keep moving towards faster and larger, the caches that seem "big" right now may not really be all that "big" by the time they see the light of day in the architecture of that day. Also consider that applications and data sets are becoming larger and more complex. Not only does a cache have to buffer the speed disparity/latency between CPU and main memory, it has to hold working data sets that are themselves becoming large enough to tax these monstrous CPUs.

So what will effectively be "big" for a full-core speed L2-ish cache in the 2005/2006 timeframe? 1 MB? 2 MB? 16 MB? 64 MB?

I would guess an off-the-shelf x86 Intel desktop CPU in the late 2005/2006 time frame would have 2 MB of cache. The estimate is based on how cache size tends to double with each new CPU generation.
 
...and I would say that 2 MB would be woefully behind what is really needed. That is completely an opinion, of course. I'm not really in the biz, anyway, so what do I know? Your projection is just as valid. :)

My guess is that there is some chicken-and-egg going on there as well. CPU manufacturers squeeze in as much embedded cache as they can afford, given their die process and the sheer size of the core. Software developers then optimize their applications to best utilize that cache size target to mask data starvation, so even if you did have more cache, a lot of applications wouldn't show much improvement. All the while, desktop computers are called upon to do more complicated things, do more things at once, and run ever more elaborate OSes, all occupying main memory heaps as large as 256, 512, or even 1024 MB. It just strikes me as unlikely that everything necessary out of all that could fit into 2 MB, let alone 256 kB. So it seems that the potential of every wondrous generation of uber-processors is inherently throttled (despite the astounding clock rate) simply by "limited-size" caches and then by the memory bus behind them. With the caches we do get, it seems more like a practice of brinksmanship: adding juuuuust enough cache to keep things from stalling outright, achieving maybe 250 msec of burst performance to service the marketing numbers, and then leaning hard on the memory bus to get the real work done.

I guess what I'm trying to say is that in the 2005/2006 window, I would think local/embedded cache sizes really need to be in the double digits (of MB) to remotely cope with what we plan to be doing on computers by then and the sheer voracity of the processors that will exist by then.
 
A cache in a modern MPU has a hit rate of what, 90-95% on average? That's with random, sucky, bloated PC code, by the way; with streamlined game code targeted at the CPU and using explicit prefetching, that's only bound to go up.
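
Back-of-the-envelope on why that hit rate matters (illustrative numbers, not measurements): average access time is roughly hit time + miss rate × miss penalty. With a 3-cycle hit and a 200-cycle trip to DRAM, a 95% hit rate averages 3 + 0.05 × 200 = 13 cycles, while 90% averages 3 + 0.10 × 200 = 23 cycles. Five points of hit rate nearly doubles the average.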


*G*
 
I'm not so sure of that if applications and data are going to blow up radically in size in the future. There's only so much predicting you can do before you simply cannot fetch everything you need as quickly as you need it to run at full speed, at which point you plainly need more room. Streaming and prefetching are certainly a benefit, but they are simply measures that lean on the memory bus side of things (bandwidth may be reasonable, but the latency of random/unpredictable accesses may carry a considerable penalty). At that point, the notion of "storage" in a "cache" has pretty much been blown out.
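
To make that point about leaning on the memory bus concrete (my own sketch, assuming a GCC/Clang-style compiler): software prefetch only asks for data earlier; the same bytes still cross the bus, and it only helps when the access pattern is predictable.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: software prefetch in a streaming loop.
// __builtin_prefetch is a GCC/Clang builtin; other compilers spell this differently.
long long sumWithPrefetch(const std::vector<int>& data)
{
    const std::size_t lookahead = 64;  // arbitrary distance; would be tuned to memory latency
    long long sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i + lookahead < data.size())
            __builtin_prefetch(&data[i + lookahead]);  // request a future line early
        sum += data[i];  // the same data still crosses the bus; latency is just overlapped
    }
    return sum;
}

int main()
{
    std::vector<int> data(4096, 2);
    return sumWithPrefetch(data) == 2 * 4096 ? 0 : 1;
}
```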
 
Grall said:
A cache in a modern MPU has a hit rate of what, 90-95% on average? That's with random, sucky, bloated PC code, by the way; with streamlined game code targeted at the CPU and using explicit prefetching, that's only bound to go up.


*G*

Sometimes that code is not too random; if it were, caches would be relatively useless, as the principle of locality would not hold.

Give me SPRAM or e-DRAM and fast main memory bandwidth to the CPU and keep the 1-2 MB cache :)


Well, cache performance does become size-dependent with certain kinds of code.

Even in 3D graphics, processing time per vertex and pixel is growing, but the data sets that are streamed in and out of the shaders (to feed that growing processing power) make the I/O issue very hard to forget.

For the most part, 3D graphics works in the stream model: there is not as much temporal locality as spatial locality, and in several cases, where data access gets very random, you have neither. Of course, in some cases you can sort things to create temporal locality; sorting geometry by texture, for example, does make a texture cache more useful.

Even then, if the number of textures used per frame grows, you get into trouble: the principle works because there are not too many unique textures in your frame and they are not too big.

Let the size and the variety of those displayed textures grow and you will have to add a larger cache.
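
A minimal sketch of the sort-geometry-by-texture idea (illustrative names, not anyone's actual engine code): group draw batches by texture before submission, so each texture is reused across consecutive draws and stays hot in the texture cache.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw batch: which texture it samples and which mesh it draws.
struct DrawBatch {
    std::uint32_t textureId;
    std::uint32_t meshId;
};

// Sort batches so consecutive draws reuse the same texture, improving
// temporal locality in the texture cache (fewer unique textures "in flight").
void sortByTexture(std::vector<DrawBatch>& batches)
{
    std::sort(batches.begin(), batches.end(),
              [](const DrawBatch& a, const DrawBatch& b) {
                  return a.textureId < b.textureId;
              });
}

int main()
{
    std::vector<DrawBatch> batches = {{7, 0}, {3, 1}, {7, 2}, {3, 3}};
    sortByTexture(batches);                               // now ordered 3, 3, 7, 7
    return static_cast<int>(batches.front().textureId);   // 3
}
```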

I am beginning to babble now, so I had better stop here. Still, the fun in designing high-performance processors is that there are lots of different ways of achieving the same goal (that is, making a faster chip).
 
Intel has a CPU coming out with more cache. A very interesting marketing move by Intel.

http://www.theinquirer.net/?article=11576


Could this be a hint of what's to come? Today more cache, tomorrow another core? Start a new line of tweaked CPUs for the gamer/power-user market and brand it like they did with the "Intel Inside" stuff. I could see them eventually marketing a dual-core CPU and getting game developer interest along with consumer interest.

A dual-core CPU with Hyper-Threading would be greased lightning for running games. Intel would need to create enough consumer awareness/demand to make it economically feasible. Power users/gamers would want them for sure if games took advantage of the CPU. The big side bonus would be its inclusion in a console.
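
As a toy illustration of how games could use a second core or thread (my own sketch, nothing Intel-specific): run independent per-frame systems such as physics and AI on separate threads, then join them before rendering.

```cpp
#include <thread>

// Placeholder per-frame jobs; in a real engine these would be actual
// simulation systems. The names here are purely illustrative.
void updatePhysics(float dt) { (void)dt; /* integrate rigid bodies, etc. */ }
void updateAI(float dt)      { (void)dt; /* run behaviour logic, etc. */ }

// Run two independent systems in parallel (two cores, or two HT threads),
// then join before the frame is handed to the renderer.
void simulateFrame(float dt)
{
    std::thread physics(updatePhysics, dt);
    std::thread ai(updateAI, dt);
    physics.join();
    ai.join();
}

int main()
{
    simulateFrame(1.0f / 60.0f);  // one simulated frame at 60 Hz
    return 0;
}
```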



Just some speculation as to where Intel's marketing might be heading.
 
From the Intel Developer Forum (Anand's):

Multicore CPUs

The first of the "T"s Intel covered was Hyper-Threading. Intel sees the adoption of HT rising to 50% of the market in the not so distant future, and we were shown a few graphs of the enhancements HT provides in multithreading and multitasking. As you can tell, most of this has been said before, but there were some very exciting revelations made. It seems that with the speed at which desktop users have embraced HT, Intel has decided that it is well worth their time to bring dual core processors to the desktop.

...

Intel went on to reveal that the Xeon would receive a dual core version with the Tulsa core (due out after Potomac ). Like its predecessors, Tulsa will also feature Hyper Threading technology meaning that each of the two cores would appear as two CPUs to the OS. Given the close relationship we've seen between Xeon processors and the Pentium 4 desktops, it would be safe to assume that a dual core desktop CPU would be due out either around the same time as Tulsa or slightly later. Given the increased transistor count of Tulsa (twice as much as a single core CPU obviously, assuming they don't share caches), we may see a desktop introduction delayed a bit due to the price sensitivity of the desktop market.

Pentium 4 Extreme Edition

Intel just announced their first microprocessor aimed at the gaming market - the Pentium 4 Processor Extreme Edition. The Extreme Edition will be available in the next 30 - 60 days and will run at 3.2GHz.

The major improvement to the Extreme Edition over the current Pentium 4 is the inclusion of an on-die 2MB L3 cache. This on-die L3 cache is in addition to the 512KB L2 cache, giving the Extreme Edition a total of 2.5MB of on-die cache. Note that this is identical to the Xeon MP (Gallatin) core, other than the fact that we're talking about a CPU that runs at 3.2GHz and has an 800MHz FSB.

The 2MB on-die L3 cache takes the Northwood's 55 million transistors and balloons it to an incredible 108 million transistors, which is still less than the Prescott's 125 million transistors. What's important to note here is that although the Prescott has less than half of the cache of this new Pentium 4, it still has more transistors - giving you some insight into how much Intel enhanced the core. We will have some benchmarking time with the Extreme Edition very soon...

Dothan Specs

Intel also revealed a bit about Dothan, the 90nm follow-on to Banias (Pentium-M). Although they didn't reveal transistor counts, Intel did say that Dothan would feature an 87mm^2 die, which is quite small considering its 2MB L2 cache.

As for XCPU2, I seriously doubt we'd see a multi-core solution, due to costs mainly. However, given Dothan's (very) small die size, I'd think that would be a more probable solution. Intel's 90nm lines would be in full swing by 2005 too, unlike the 65nm line that would be reserved for Intel's high end high margin parts.
 
I believe it's a foregone conclusion that Intel (among others) will utilize the significant space afforded to them by process improvements by increasing the percentage of the die devoted to the cache hierarchy, as well as integrating multiple processing "cores" at some future point.

Unfortunately, this doesn't negate the problems associated with superscalars and the x86 platform, an issue which has been discussed here in several pre-existing threads.
 
zurich said:
From the Intel Developer Forum (Anand's):

...

As for XCPU2, I seriously doubt we'd see a multi-core solution, due to costs mainly. However, given Dothan's (very) small die size, I'd think that would be a more probable solution. Intel's 90nm lines would be in full swing by 2005 too, unlike the 65nm line that would be reserved for Intel's high end high margin parts.


Sure, a dual-core CPU for a console would give Intel a very small profit margin, but what about the cost to Intel if the PlayStation 3 and Cell perform well? Intel will face a challenge to its own architectures from IBM/Sony/Toshiba 5 years down the road. It's worthwhile to provide Microsoft with a powerful CPU that will go head to head with CELL. The more money IBM/Sony/Toshiba make on the PS3, the more money they have for future R&D, and the more opportunities they have to spread CELL into other markets.

Going head to head against CELL would be a very wise move for Intel. Why would Intel want to make it easy for IBM/Sony/Toshiba to introduce a new architecture that threatens all of Intel's designs?
 