Predict: The Next Generation Console Tech

AlphaWolf · Mar 3, 2011

corduroygt said:
An order of magnitude difference there.

but not just in price.

eastmen · Mar 3, 2011

corduroygt said:
You don't have to buy chips from a company that makes ARM, you can just license it yourself. You cannot do the same with an x86 chip unless Intel agrees. Besides, a quad core x86 chip with IGP would cost at least $100, while those ARM chips in tablets are only a few bucks a piece. An order of magnitude difference there.

So wait.

With the x86 chip Amd sells you the chip and you pay a fee per chip

With the arm chip , The company allows you to take the chip to produce it anywhere but you have to pay a fee per chip

So in the end the x86 chip has the cost to manufacture + the company mark up for the tech.
Arm has the liscence fee and the cost to manufacture.

I don't see a diffrence .

You claim that the Arm chips in tablets are a few bucks a piece but I don't see it . The tegra 2 will be in the $20-$30 range while the brazos is in the $30-$40 range

I also don't see a tegra 2 or even tegra 3 competing in performance with Llano

corduroygt · Mar 3, 2011

eastmen said:
So wait.

ARM license fee is 1.2%, good luck trying to convince AMD from 100%+ profit margin to just 1.2% profit margin and sell their chips to you instead of the laptop market.
http://www.eetimes.com/electronics-products/processors/4114549/ARM-raises-royalty-rate-with-Cortex

eastmen said:
You claim that the Arm chips in tablets are a few bucks a piece but I don't see it . The tegra 2 will be in the $20-$30 range while the brazos is in the $30-$40 range

Tegra 2 is $10-15:
http://semiaccurate.com/2010/08/12/tegra-3-tapes-out/

eastmen said:
I also don't see a tegra 2 or even tegra 3 competing in performance with Llano

Of course not, they're vastly different in size, power budget, and price.

AlphaWolf · Mar 3, 2011

corduroygt said:
Tegra 2 is $10-15:
http://semiaccurate.com/2010/08/12/tegra-3-tapes-out/:

While I don't doubt that tegra 2 resides somewhere under the $20 mark.... that's charlie...

eastmen · Mar 3, 2011

http://androinica.com/2011/03/01/is...roinica+(Androinica+-++A+Google+Android+Blog) isupply says diffrently

Of course not, they're vastly different in size, power budget, and price.

and so in what way will an arm chip be an option for Llano class power ?

Even if ninendo decided to go with a quad core offering the NGP isn't even able to match the power of the ps3 while rendering at lower resolutions . Llano is more than a step beyond what the ps3 is capable of displaying at 720p

corduroygt · Mar 3, 2011

eastmen said:
and so in what way will an arm chip be an option for Llano class power ?

Even if ninendo decided to go with a quad core offering the NGP isn't even able to match the power of the ps3 while rendering at lower resolutions . Llano is more than a step beyond what the ps3 is capable of displaying at 720p

They can do what MS did, license the GPU IP from AMD, CPU IP from IBM/ARM and make it themselves for the lowest cost without being hostage to any single manufacturing company or fab.

liolio · Mar 3, 2011

3dilettante said:
When the 360 came out, its 3.2 GHz tri-core was equated (roughly) with ~2 GHz desktop CPUs.
If the rough equivalence holds true, current Bobcats are already too slow, as the Athlon Neo at 1.3GHz would lag Xenon significantly..

I remember Capcom drew a comparison between Xenon and early dual core PIV running @3.4ghz. Bobcats perfs are indeed a bit below of those provided by Athlon II / Thurion designs.
Pentium 4 were inefficient and not "real" dual cores (so badly perf scaling in regard to extra core).
But I'm not sure that kind of comparision is fair, P4 as xenon does pretty well in FP calculations due to their SIMD running at high clock speed and on some workload clock speed rules. As more and more calculations are to be moved to the GPU (especially if it is on chip with low latency acces) I wonder if it's still relevant to look at CPU perfs "overall". What I get from the benchs I saw is that bobcat even being a narrow CPUs is way more efficient per cycle than either P4 or xenon. It looks like a good basis as far as X86 is concern to me (even-though there are room for improvements).

A module of Bobcat cores would be less acceptable. The cores are already miniscule, and since they are already narrow they don't have as much hardware to share in the front end or FPU. The 2-wide front end is going to have far better utilization than the 4-wide Bulldozer is able to share.

I'm not sure about what you mean, it's:
* You're not sure it make sense as the front end is cheap on a Bobcat? In which case I would tend not to agree as if they manage to old true to the: promises they did with BUlldozer (ie 80% of the perfs of a real dual set-up while tinier and consuming less power) it would be a win looking forward.
or
* a 2 wide decoder would not be enough to feed 2 bobcat cores? Would you consider a 3 wide decoder needed?
or
* a blend of both?

I don't know as high (or low sadly) resources utilization is in nowadays OoO X86 CPU nor in Xenon for instance, I just know that the average IPC is low (with bursts). I also don't believe next gen console will be about high single thread performances (more is nice tho). I wonder if consoles manufacturers should better consider average parts put them on one chip and make sure they compliment each others and work together really efficiently instead of considering better parts but on "insulation. More on that.

That's were specialized hardwares kick in.
I remember that some early 360 games were allocated almost one core to compression/decompression. GPUs now have their video processing engine, Intel went ahead and included an encoder/decoder into SnB. Looking forward could the next step to have a properly programmable unit on chip that would handle encoding/decoding compression/decompression? That's an investment in silicon but a win in perfs and power consumption.

I also remember reading lately about how PhysicX on X86 were still using X87 but also that using SIMD units didn't provide htat much of a benefit. I was simply a recompile but I get a growing feeling that whereas large SIMD and GFLOPS are geeks sweet dreams plenty of workloads relevant to video games including physics doesn't need that much number crushing. On the other hand when a workload is really vector friendly and if you have GPU on chip it may be a win to move the task to it. It will only get truer within the next 1 or 2 generations of GPUs (which is what we should get in a nextgen system).

Overall I question more and more the fact that brute force is what going to make the difference, the 360 proved it this gen by being a match to the ps3 while obliterated in raw power by the PS3, and it could get even truer next gen. More on that later.

The question is what would happen if the TDP limits were relaxed, and how flexibly the core can be redesigned since its design philosophy is to exploit more automated methods and generic circuit implementation.

Indeed it highly depend on where AMD is heading for their next bobcat rendition. I don't think MS Sony or Nintendo have what it to make the proper change on their own or pay AMD to do all the changes properly (lot of R&D to found).

My concern at this point is that generic watered-down x86 cores do not offer much over the relatively unimpressive console cores, so what's the compelling reason?

Well if pricing issue is put aside I'm not sure that either ARM of IBM would do that much better in this power budget. I belief that AMD and Intel are the really good at their job even if it's about designing a watered down X86.
There are also software consideration, development is still done on X86, there are plenty of tools, GPGPU is getting mature on X86 but I'm more iffy about that ARM is growing strong.

To be continued when I'm back from work.

3dilettante · Mar 3, 2011

liolio said:
I remember Capcom drew a comparison between Xenon and early dual core PIV running @3.4ghz. Bobcats perfs are indeed a bit below of those provided by Athlon II / Thurion designs.
Pentium 4 were inefficient and not "real" dual cores (so badly perf scaling in regard to extra core).

The Cedar Mill dual cores were decent enough for the desktop market, aside from power concerns unrelated to their not being "native" dual cores. The bottleneck at the FSB and the limitations of off-die memory controllers weren't as crippling there as they were for the server market and higher core counts.
At any rate, my point with this is that even right now, Bobcat would be a step back from an old console CPU.

What I get from the benchs I saw is that bobcat even being a narrow CPUs is way more efficient per cycle than either P4 or xenon. It looks like a good basis as far as X86 is concern to me (even-though there are room for improvements).

That is my point. Higher efficiency per-cycle means there is less idle time and fewer idle resources that can be freely obtained for a second thread. Instead, both threads will interfere with each others' actual work.

I'm not sure about what you mean, it's:
* You're not sure it make sense as the front end is cheap on a Bobcat? In which case I would tend not to agree as if they manage to old true to the: promises they did with BUlldozer (ie 80% of the perfs of a real dual set-up while tinier and consuming less power) it would be a win looking forward.
or
* a 2 wide decoder would not be enough to feed 2 bobcat cores? Would you consider a 3 wide decoder needed?
or
* a blend of both?

It would be a mixture of both. Part of AMD's justification for sharing the 4-wide front end is that it is difficult to fully utilize 4-wide issue on a sustained basis, because superscalar execution experiences serious diminishing returns as the width increases. The inverse of that is that utilization gets much better the narrower the width.
It is hard to sustain 4 instructions of throughput across many workloads. It is more common to be able to find at least 2. The percentage drop from fully separate dual cores to a module is going to be higher for Bobcat because it doesn't have the same amount of spare resources idling.

I don't know as high (or low sadly) resources utilization is in nowadays OoO X86 CPU nor in Xenon for instance, I just know that the average IPC is low (with bursts). I also don't believe next gen console will be about high single thread performances (more is nice tho).

The console space is far below where single-threaded improvements wouldn't be helpful. Bobcat right now is worse than what we have already.

I also remember reading lately about how PhysicX on X86 were still using X87 but also that using SIMD units didn't provide htat much of a benefit.

The only numbers we saw were straight recompiles of x87 code to user scalar SSE, not packed instructions. There were some slight improvements, and a few cases with modest improvement that was somewhat surprising because it was a simple recompile.
If the code were refactored to take advantage of SIMD, the improvement would have been much higher. At any rate, it would not be a good idea to discount SIMD on a CPU and not also be skeptical of some of the claims for the much wider and harder to utilize SIMD width in GPGPUs.

Well if pricing issue is put aside I'm not sure that either ARM of IBM would do that much better in this power budget. I belief that AMD and Intel are the really good at their job even if it's about designing a watered down X86.

The price and cost issue is a big factor for the console space. Bobcat is also unacceptably slow, and it is 4-5 years newer than the unchanged pipelines of the Xenon and Cell cores. If IBM could beat Bobcat with a chip it designed 5 years ago, it doesn't show that AMD offers anything better.

There are also software consideration, development is still done on X86, there are plenty of tools, GPGPU is getting mature on X86 but I'm more iffy about that ARM is growing strong.

There could be some benefit to running the same code on the dev and console machine, but the console makers also have little desire to allow their console games to run on PCs.
GPGPU would require more design work for Bobcat. The bandwidth of the CPU/GPU interface doesn't seem to be all that high.

eastmen · Mar 3, 2011

corduroygt said:
They can do what MS did, license the GPU IP from AMD, CPU IP from IBM/ARM and make it themselves for the lowest cost without being hostage to any single manufacturing company or fab.

The problem is your quoting processors running at 1ghz. Your going to need to clock a tegra 3 (which is twice the size of tegra 2 btw and thus will cost roughly twice as much) at much higher speeds which will eat into yeilds and raise costs more .

Also amd is producing zacarate chips at TMSC so there is no reason nintendo would be held hostage by any single manufacturing company.

eastmen · Mar 3, 2011

The price and cost issue is a big factor for the console space. Bobcat is also unacceptably slow, and it is 4-5 years newer than the unchanged pipelines of the Xenon and Cell cores. If IBM could beat Bobcat with a chip it designed 5 years ago, it doesn't show that AMD offers anything better.

well accept that bobcat is a complete solution and amd is selling chips in the $30 range for these products.

Llano will be a better choice as it will target a more expensive and higher performing area.

Of course if nintendo wanted just a bunch of bobcat cores they could fit 4 of them in the same space as the current 2+ gpu design amd is currently using and then devote another full chip to the gpu

Also at 32nm later this year bobcat should see core improvements making it more competetive and a larger gpu also.

3dilettante · Mar 3, 2011

eastmen said:
Also at 32nm later this year bobcat should see core improvements making it more competetive and a larger gpu also.

I'm under the impression that Bobcat would shrink to the 28nm TSMC (GF?) process. 32nm is the SOI high-performance process at GF.

eastmen · Mar 3, 2011

3dilettante said:
I'm under the impression that Bobcat would shrink to the 28nm TSMC (GF?) process. 32nm is the SOI high-performance process at GF.

I'm not sure , right now its on 40nm tsmc. It may go to 28nm at TSMC but orders have been really good for them , they may want to push out a faster verison with lower power draw quicker than what 28nm may allow .

corduroygt · Mar 3, 2011

eastmen said:
well accept that bobcat is a complete solution and amd is selling chips in the $30 range for these products.

Why buy useless and slow x86 cores from AMD that clutter up the GPU bandwidth, especially since AMD also sells standalone GPUs and GPU IP? How about just getting the GPU part and finding a much better option than AMD cpus? 2.5-3 Ghz A15 would be a very good alternative.

liolio · Mar 3, 2011

3dilettante said:
The Cedar Mill dual cores were decent enough for the desktop market, aside from power concerns unrelated to their not being "native" dual cores. The bottleneck at the FSB and the limitations of off-die memory controllers weren't as crippling there as they were for the server market and higher core counts.
At any rate, my point with this is that even right now, Bobcat would be a step back from an old console CPU.

To be honest I agree about Bobcat V1 to not be good enough "overall", actually it's a grief of mine against AMD even-though they are a commercial success.
But where I disagree that the "core" of the core (I mean front end, Integer unit, I put FP/SIMD aside). On some benchmarks it beats an atom running at the same speed significantly, I'm not sure that per clock xenon is better than atom I may even be tempted to give Atom an slightly edge (Intel usually does really well with caches, the simd is as wide the one in xenon and can handle integers, etc.). Atom hold its own (vs Bobcat) on various benchmarks and tasks but I'm not sure about the reasons. The FP/SIMD in Bobcat is really tiny going by the early floor plan of a die AMD released, and then there are the case where SMT does well and lastly optimizations from Intel (which no matter the late law suit it's still early to dismiss). So my belief is/was that the basis of bobcat is an improvement in single thread performances vs what we have in consoles PowerPC (per cycle at least).

I would also drew a comparison between the Cedar Mills and Bobcats as they stand Bobcats are not really meant to scale as far as the number of cores is concerned. A console manufacturer wanting to put more than 2 cores (they would want it for sure) may ask AMD to consider an "uncore" more akin to the ones we find in Intel SnB.

That is my point. Higher efficiency per-cycle means there is less idle time and fewer idle resources that can be freely obtained for a second thread. Instead, both threads will interfere with each others' actual work.

It would be a mixture of both. Part of AMD's justification for sharing the 4-wide front end is that it is difficult to fully utilize 4-wide issue on a sustained basis, because superscalar execution experiences serious diminishing returns as the width increases. The inverse of that is that utilization gets much better the narrower the width.
It is hard to sustain 4 instructions of throughput across many workloads. It is more common to be able to find at least 2. The percentage drop from fully separate dual cores to a module is going to be higher for Bobcat because it doesn't have the same amount of spare resources idling.

The console space is far below where single-threaded improvements wouldn't be helpful. Bobcat right now is worse than what we have already.

I'm not sure I agree if AMD delivers with Bulldozer they said that when only one thread is active it got access to all the share resources for it-self. My idea might be wrong is that IPC is usually low on average (x<1) the odds for the two thread having a "burst moment" moment at the same time are low. On the other hand that mean that during the burst moment the CPU won't make up as well for the moment where IPC is really (X<<1). My idea as games have become heavily threaded is to trade a bit execution time (so single thread perfs) for hardware utilization. It's not that I question the need for more single thread perfs for consoles but because I belief the Powerpc In nowadays sucks way more than the "overall" perfs would let think (not two mention the hell of optimization console developers put in their codes). On top of it Xenon seems to suffer from multiple caveats, LHS, cache traching, SIMD can't handle integers, etc. that's some devs complained about often here.
So as Im' considering where AMD could go with forward I believe the basis is an improvement and that they achieve the "good enough overall" category without exploding their power or silicon budget (at least per cycle). More on that.

The only numbers we saw were straight recompiles of x87 code to user scalar SSE, not packed instructions. There were some slight improvements, and a few cases with modest improvement that was somewhat surprising because it was a simple recompile.
If the code were refactored to take advantage of SIMD, the improvement would have been much higher. At any rate, it would not be a good idea to discount SIMD on a CPU and not also be skeptical of some of the claims for the much wider and harder to utilize SIMD width in GPGPUs.

Here the floor plan of bobcat from AMD:

Actually as it is so tiny and have room for many architectural improvements (vs the wide sophisticated cores in bulldozer). Say they put a single top notch 4 wide SIMD instead of what looks to be a shabby one. they improve the front end a bit too, I believe they would have a chance two higher than the 80% of a real CMP setup (their promise for bulldozer) while still landing a nice win in die size.

I did not want to give the fake impression that I consider SIMD irrelevant, I've actually been wondering if share the between two cores (even though they would share the front end) would not be a good way to make the most of what is an expansive resource (on SnB the 256 wide are huge, even on prior designs the 4 wide was way meatier than the thing in bobcat). You will say it a bit like SMT? Well yes let it would be a blend between Intel SMT and AMD CMT philosophies, it's bit more expansive hardware wise but during "not that stressing moments" threads have more resources for themselves.
I got also another possibly weird idea when watching for instance at engine like the frost engine 2. They really on many tasks they try to make them as asynchronous as possible. They try to do as many things as possible "in parallel". For PS3 it looks like they are doing great things. So this got me wondering. Could such "bulldorized" Bobcat would be well suited for the job? Say you hace to threads with their own resources (even spread a bit thin) feed on pools of tasks them selves feeding the share SIMD units (4 wide but potent). The idea is that those cores are cleverer than say SPU and they could be more autonomous. The modules could run a runtime so developers could care less about optimizations there is runtime then the modules (and cores within them) are clever enough to handle the task developers throw at them. Developers would not have to think about I use 2 SPUs at a time for this while 1 and 2 others ones for this and this.

I also agree (by reading many comments) that GPGPU is still tricky. I think they have to evolve too but that another matter. and there's room to do better even within actual silicon budgets. If technics as SRAA and MLAA gain traction could the manufacturer invest less silicon in ROPs/RBEs? The saved silicon should be spend on making the GPUs more valid "generic" computing devices. If fusions chip are to gain traction either in the embedded, the consoles and the PC realms the reason to do this (as well as may sacrifying a bit of perf per mm2) will grow. It would be to sad to pass on the opportunity to use the power GPUs can provide when the result of such power is only some cycles away from the chip cache.

The price and cost issue is a big factor for the console space. Bobcat is also unacceptably slow, and it is 4-5 years newer than the unchanged pipelines of the Xenon and Cell cores. If IBM could beat Bobcat with a chip it designed 5 years ago, it doesn't show that AMD offers anything better.

Well as I said without changes I agree to that; I just want to see opportunities ( I think Bobcat as it is now is not enough of a step up from Atom and on their core market I could see Intel react "meanly" sooner than later and then it will take AMD longer to react than for Intel to push its next chip.. on a better process on top of it).

There could be some benefit to running the same code on the dev and console machine, but the console makers also have little desire to allow their console games to run on PCs.
GPGPU would require more design work for Bobcat. The bandwidth of the CPU/GPU interface doesn't seem to be all that high.

Indeed, ok bobcat is designed for budget computers but I feel like AMD still lacked ambitions for it looking forward.

Tahir2 · Mar 3, 2011

Bobcat is perfect for its target market - simple applications with good GPU acceleration to speed up Flash/3D games online/video playback. AMD did an excellent job with Bobcat - it is a tiny core (smaller than Atom) but there is no way AMD were about to target the netbook market with a processor that was over-designed (for its target market) and thus too big/expensive. This would also eat into notebook sales.

Bobcat = tablet/netbook x86 CPU with strong GPU
Atom = tablet/netbook x86 CPU with weak chipset and GPU

Xenon = High performance console CPU of its era
P4 = desktop performance part of its era
Athlon = higher IPC desktop performance CPU of its era

AMD would be better off offering a tweaked Phenom II core to any console manufacturer - keep the integrated GPU (400SP?) and add a discrete GPU with high external bandwidth (GDDR5).

As to development costs, it seems that nowadays ISA is not important - the compiler and development tools rule and the ARM ISA have caught up with regards to both simply due to their popularity in the mobile markets.

By 2014 perhaps MIPS will have something ready to talk to the Console manufacturers about. My personal opinion is that it will IBM with the best deal again (by best it really means cost/performance).

liolio · Mar 3, 2011

Tahir2 said:
Bobcat is perfect for its target market - simple applications with good GPU acceleration to speed up Flash/3D games online/video playback. AMD did an excellent job with Bobcat - it is a tiny core (smaller than Atom) but there is no way AMD were about to target the netbook market with a processor that was over-designed (for its target market) and thus too big/expensive. This would also eat into notebook sales.

Well it's a success right but for how long? It's late to begin with. Then it's not about over-designing it but make it better. Intel won't be long to react. I consider Moorestown as the most impressive Atom design to date, more impressive than bobcat for instance. Too bad it was still not good enough for its intended market but Intel will recycle the design to make it a match for netbook.
I expect Intel to fight back Bobcats and with more than respectable success sooner than later. They have what it takes now even without relying on PowerVR IP.
What it could be?
One core ~2GHz
Healthy amount of cache
one of their new GPU
the encoding/decoding device you find in SnB
That would be a serious threat to bobcat imho and @45nm. If Intel decide to jump on the guns and move to 32nm...

Bobcat = tablet/netbook x86 CPU with strong GPU
Atom = tablet/netbook x86 CPU with weak chipset and GPU <= for now, may not last that long

Xenon = High performance console CPU of its era <= sucky honestly
P4 = desktop performance part of its era <= thank to pretty unfair practices Athlon was way better
Athlon = higher IPC desktop performance CPU of its era

AMD would be better off offering a tweaked Phenom II core to any console manufacturer - keep the integrated GPU (400SP?) and add a discrete GPU with high external bandwidth (GDDR5).
Too big for what it achieves imho

As to development costs, it seems that nowadays ISA is not important - the compiler and development tools rule and the ARM ISA have caught up with regards to both simply due to their popularity in the mobile markets.
Agreed
By 2014 perhaps MIPS will have something ready to talk to the Console manufacturers about. My personal opinion is that it will IBM with the best deal again (by best it really means cost/performance).

I don't have any hope in MIPS. Well IBM is a huge possibility but one would have to found healthy R&D costs. Sony did last gen, Ms not that much they benefit a lot on researches done by STI.

In any case if Bobcat proved something is that OoO can be done for cheap and that it seems provide benefits more evenly across a wide range of applications than SMT.
OoO FTW²

3dilettante · Mar 3, 2011

liolio said:
On some benchmarks it beats an atom running at the speed significantly, I'm not sure that per clock xenon is better than atom I may even be tempted to give Atom an slightly edge (Intel usually does really well with caches, the simd is as wide the one in xenon and can handle integers, etc.).

Per-clock is one part of the overall equation. Neither Atom or Bobcat can get near the clock speed of Xenon, so even a modest per-clock advantage does not mean there is an overall improvement.

Atom hold its own (vs Bobcat) on various benchmarks and tasks but I'm not sure about the reasons. The FP/SIMD in Bobcat is really tiny going by the early floor plan of a die AMD released, and then there are the case where SMT does well and lastly optimizations from Intel (which no matter the late law suit it's still early to dismiss). So my belief is/was that the basis of bobcat is an improvement in single thread performances vs what we have in consoles PowerPC (per cycle at least).

There are also situations where OoOE does not yield significant performance gains, such as when there are memory latencies too large to hide, or the code is scheduled well enough that minimal reordering is necessary.
In areas of code with low ILP like some long chains of pointer-chasing, clock speed and the memory subsystem become dominant, and other parts of the architecture do not increase performance.
Without a redesign, Bobcat is not capable of scaling its clock close enough to Xenon to make up for the large clock disparity, at least not at acceptable voltages.

I would also drew a comparison between the Cedar Mills and Bobcats as they stand Bobcats are not really meant to scale as far as the number of cores is concerned. A console manufacturer wanting to put more than 2 cores (they would want it for sure) may ask AMD to consider an "uncore" more akin to the ones we find in Intel SnB.

That might require some additional engineering. AMD does not have an uncore like Sandy Bridge, nor does it seem to have one planned for a while. It tends to use a crossbar that takes up a fair amount of space and is less scalable to higher client counts. The modularized Bulldozer keeps the crossbar client count lower by making the pairs of cores share an interface.

I'm not sure I agree if AMD delivers with Bulldozer they said that when only one thread is active it got access to all the share resources for it-self. My idea might be wrong is that IPC is usually low on average (x<1) the odds for the two thread having a "burst moment" moment at the same time are low.

The average IPC for a number of workloads for Athlon was around 1 per cycle, at least on some desktop media applications when Anandtech looked into it a while ago.
There are low periods and burst periods. If a core is narrow, and if it is fighting for shared resources, those burst periods take longer. The probability, particularly under load conditions, goes higher because the bursts take longer to get through.

On the other hand that mean that during the burst moment the CPU won't make up as well for the moment where IPC is really (X<<1). My idea as games have become heavily threaded is to trade a bit execution time (so single thread perfs) for hardware utilization.

There are limits to what can be gained by adding more threads. The more thread-level parallelism is exploited, the more serial components dominate. Console single-threaded performance is not yet so high that it can be ignored, much less made worse.

Actually as it is so tiny and have room for many architectural improvements (vs the wide sophisticated cores in bulldozer). Say they put a single top notch 4 wide SIMD instead of what looks to be a shabby one. they improve the front end a bit too, I believe they would have a chance two higher than the 80% of a real CMP setup (their promise for bulldozer) while still landing a nice win in die size.

If you are stating that if Bobcat were redesigned so it was more like Bulldozer and had more resources to burn that it would benefit more from being put into a module, then I have not said anything to the contrary.

I did not want to give the fake impression that I consider SIMD irrelevant, I've actually been wondering if share the between two cores (even though they would share the front end) would not be a good way to make the most of what is an expansive resource (on SnB the 256 wide are huge, even on prior designs the 4 wide was way meatier than the thing in bobcat).

AVX has 8-wide SIMD, not 256.
Fermi and Cayman have 16-wide units, but due to their batch sizes, their minimum granularity is 32 and 64, respectively.

I also agree (by reading many comments) that GPGPU is still tricky. I think they have to evolve too but that another matter. and there's room to do better even within actual silicon budgets. If technics as SRAA and MLAA gain traction could the manufacturer invest less silicon in ROPs/RBEs?

The cores running the MLAA algorithm are most likely larger than the silicon found in the ROPs.
SRAA in particular uses higher-resolution buffers, which would take longer to build if the ROPs were scaled back.

3dcgi · Mar 4, 2011

corduroygt said:
ARM license fee is 1.2%, good luck trying to convince AMD from 100%+ profit margin to just 1.2% profit margin and sell their chips to you instead of the laptop market.
http://www.eetimes.com/electronics-products/processors/4114549/ARM-raises-royalty-rate-with-Cortex

Even Intel is far less than 100% profit margins. If you're going to attempt to use accurate numbers for ARM at least try to do the same for the competition. You're also ignoring the costs of manufacturing and testing the chips which evens things up a bit.

corduroygt · Mar 4, 2011

3dcgi said:
Even Intel is far less than 100% profit margins. If you're going to attempt to use accurate numbers for ARM at least try to do the same for the competition. You're also ignoring the costs of manufacturing and testing the chips which evens things up a bit.

It's about 65% for Intel over all the chips they sell, which means it's much higher than 65% on a sandy bridge, and lower than that on a C2D. We all know what happened with using a x86 on the Xbox.

eastmen · Mar 4, 2011

corduroygt said:
Why buy useless and slow x86 cores from AMD that clutter up the GPU bandwidth, especially since AMD also sells standalone GPUs and GPU IP? How about just getting the GPU part and finding a much better option than AMD cpus? 2.5-3 Ghz A15 would be a very good alternative.

I will break it down for you.

1) x86 slow cores may be all you need to out due to current gen consoles esp when coupled with a solid gpu

2) A bobcat or Llano would be much cheaper than getting a gpu and then another chip.

3) What are the yields of a 2.5-3ghz A15 processor ? Are they even made yet ? Are they on 40nm ? Where are they. If they require 28nm then your going to be paying moe for process and don't forget that the amd APU's can be put on lower processors also.

I've said before but i think if nintendo wants to pull a wii 2 and make something just a bit more powerful than the xbox 360/ps3 Llano will be a good choice. its a quad core x86 chip so they have acess to a ton of engines and tools already designed with it in mind and it has a radeon 5x00 class gpu attached to it. They might be able to get more performance with a dedicated cpu and a dedicated gpu but they wont get it at the prices that they can get Llano for.

Going with Arm will mean either having a very low power console or taking ARM chips some where they haven't been before which is very high clocks in a console. I'm not convinced that nintendo will be able to get quad core A15s at 2.5-3Ghz and certianly not at the costs they would be able to get Llano for.

Predict: The Next Generation Console Tech

AlphaWolf

Specious Misanthrope

eastmen

corduroygt

AlphaWolf

Specious Misanthrope

eastmen

corduroygt

liolio

Aquoiboniste

3dilettante

eastmen

eastmen

3dilettante

eastmen

corduroygt

liolio

Aquoiboniste

Tahir2

liolio

Aquoiboniste

3dilettante

3dcgi

corduroygt

eastmen

Similar threads