You don't have to buy chips from a company that makes ARM chips; you can just license the core yourself. You cannot do the same with an x86 chip unless Intel agrees. Besides, a quad-core x86 chip with an IGP would cost at least $100, while the ARM chips in tablets are only a few bucks apiece. An order of magnitude difference there.

An order of magnitude difference, but not just in price.
ARM's license fee is 1.2%; good luck convincing AMD to go from a 100%+ profit margin to just a 1.2% margin and sell its chips to you instead of into the laptop market.

So wait. You claim that the ARM chips in tablets are a few bucks apiece, but I don't see it. The Tegra 2 will be in the $20-$30 range, while Brazos is in the $30-$40 range. I also don't see a Tegra 2, or even a Tegra 3, competing in performance with Llano.

Tegra 2 is $10-15: http://semiaccurate.com/2010/08/12/tegra-3-tapes-out/

Of course not, they're vastly different in size, power budget, and price.
And so in what way would an ARM chip be an option for Llano-class power?
Even if Nintendo decided to go with a quad-core offering, the NGP isn't even able to match the power of the PS3, even while rendering at lower resolutions. Llano is more than a step beyond what the PS3 is capable of displaying at 720p.
I remember Capcom drew a comparison between Xenon and an early dual-core Pentium 4 running at 3.4 GHz. Bobcat's performance is indeed a bit below that of the Athlon II / Turion designs.

When the 360 came out, its 3.2 GHz tri-core was equated (roughly) with ~2 GHz desktop CPUs. If the rough equivalence holds true, current Bobcats are already too slow, as the Athlon Neo at 1.3 GHz would lag Xenon significantly.
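To put rough numbers on the kind of comparison being made here, a back-of-envelope sketch along the lines of performance ~ clock x per-clock factor. The per-clock factors below are made-up assumptions for illustration (a Bobcat-class core taken as somewhat faster per clock than an old desktop core), not measured data.

```c
#include <stdio.h>

/* Back-of-envelope throughput estimate: performance ~ clock * per-clock factor.
 * All factors are assumptions for illustration only. */
int main(void)
{
    /* Xenon's 3.2 GHz tri-core taken (per core) as roughly equivalent to a
     * ~2 GHz desktop core, so use 2.0 GHz at a per-clock factor of 1.0. */
    double xenon_equiv = 2.0 * 1.0;

    /* Assume a Bobcat-class core is ~20% faster per clock than that old
     * desktop core, but runs at only 1.3 GHz (Athlon Neo-like clocks). */
    double bobcat_like = 1.3 * 1.2;

    printf("Xenon, desktop-equivalent : %.2f\n", xenon_equiv);
    printf("1.3 GHz Bobcat-class core : %.2f\n", bobcat_like);
    printf("Ratio: %.0f%% of the console core\n",
           100.0 * bobcat_like / xenon_equiv);
    return 0;
}
```

Even with a generous per-clock assumption, the low-clocked part lands around three quarters of the console core per thread, which is the direction of the argument above.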
I'm not sure about what you mean here:

A module of Bobcat cores would be less acceptable. The cores are already minuscule, and since they are already narrow they don't have as much hardware to share in the front end or FPU. The 2-wide front end is going to have far better utilization than the 4-wide one Bulldozer is able to share.
Indeed, it highly depends on where AMD is heading with their next Bobcat revision. I don't think MS, Sony, or Nintendo have what it takes to make the proper changes on their own, or to pay AMD to do all the changes properly (a lot of R&D to fund).

The question is what would happen if the TDP limits were relaxed, and how flexibly the core can be redesigned, since its design philosophy is to exploit more automated methods and generic circuit implementation.
Well, if the pricing issue is put aside, I'm not sure that either ARM or IBM would do that much better in this power budget. I believe that AMD and Intel are really good at their job, even if it's about designing a watered-down x86.

My concern at this point is that generic watered-down x86 cores do not offer much over the relatively unimpressive console cores, so what's the compelling reason?
The Cedar Mill dual cores were decent enough for the desktop market, aside from power concerns unrelated to their not being "native" dual cores. The bottleneck at the FSB and the limitations of off-die memory controllers weren't as crippling there as they were for the server market and higher core counts.

I remember Capcom drew a comparison between Xenon and an early dual-core Pentium 4 running at 3.4 GHz. Bobcat's performance is indeed a bit below that of the Athlon II / Turion designs.

Pentium 4s were inefficient and not "real" dual cores (hence the bad performance scaling from the extra core).
That is my point. Higher efficiency per-cycle means there is less idle time and fewer idle resources that can be freely obtained for a second thread. Instead, both threads will interfere with each other's actual work.

What I get from the benchmarks I saw is that Bobcat, even being a narrow CPU, is way more efficient per cycle than either the P4 or Xenon. It looks like a good basis as far as x86 is concerned to me (even though there is room for improvement).
It would be a mixture of both. Part of AMD's justification for sharing the 4-wide front end is that it is difficult to fully utilize 4-wide issue on a sustained basis, because superscalar execution experiences serious diminishing returns as the width increases. The inverse of that is that utilization gets much better the narrower the width.

I'm not sure about what you mean, is it that:
* You're not sure it makes sense because the front end is cheap on a Bobcat? In which case I would tend not to agree: if they manage to hold true to the promises they made with Bulldozer (i.e. 80% of the performance of a real dual-core setup while being smaller and consuming less power), it would be a win looking forward.
or
* A 2-wide decoder would not be enough to feed two Bobcat cores? Would you consider a 3-wide decoder necessary?
or
* A blend of both?
The console space is far below the point where single-threaded improvements wouldn't be helpful. Bobcat right now is worse than what we have already.

I don't know how high (or low, sadly) resource utilization is in today's OoO x86 CPUs, nor in Xenon for instance; I just know that the average IPC is low (with bursts). I also don't believe the next-gen consoles will be about high single-thread performance (though more is nice).
The only numbers we saw were straight recompiles of x87 code to scalar SSE, not packed instructions. There were some slight improvements, and a few cases with modest improvement that were somewhat surprising given that it was a simple recompile.

I also remember reading lately that PhysX on x86 was still using x87, but also that using the SIMD units didn't provide that much of a benefit.
The price and cost issue is a big factor for the console space. Bobcat is also unacceptably slow, and it is 4-5 years newer than the unchanged pipelines of the Xenon and Cell cores. If IBM could beat Bobcat with a chip it designed 5 years ago, it doesn't show that AMD offers anything better.

Well, if the pricing issue is put aside, I'm not sure that either ARM or IBM would do that much better in this power budget. I believe that AMD and Intel are really good at their job, even if it's about designing a watered-down x86.
There could be some benefit to running the same code on the dev machine and the console, but the console makers also have little desire to allow their console games to run on PCs.

There are also software considerations: development is still done on x86, there are plenty of tools, and GPGPU is getting mature on x86, but I'm more iffy about whether ARM is growing as strong there.
They can do what MS did: license the GPU IP from AMD and the CPU IP from IBM/ARM, and make the chips themselves for the lowest cost without being hostage to any single manufacturing company or fab.
I'm under the impression that Bobcat would shrink to the 28 nm TSMC (GF?) process; 32 nm is the SOI high-performance process at GF.

Also, at 32 nm later this year Bobcat should see core improvements making it more competitive, and a larger GPU too.
Why buy useless and slow x86 cores from AMD that clutter up the GPU bandwidth, especially since AMD also sells standalone GPUs and GPU IP? How about just getting the GPU part and finding a much better option than AMD CPUs? A 2.5-3 GHz A15 would be a very good alternative.

Well, except that Bobcat is a complete solution and AMD is selling chips in the $30 range for these products.
To be honest, I agree that Bobcat V1 is not good enough "overall"; it's actually a grievance of mine against AMD, even though they are a commercial success.

The Cedar Mill dual cores were decent enough for the desktop market, aside from power concerns unrelated to their not being "native" dual cores. The bottleneck at the FSB and the limitations of off-die memory controllers weren't as crippling there as they were for the server market and higher core counts.
At any rate, my point with this is that even right now, Bobcat would be a step back from an old console CPU.
It is hard to sustain 4 instructions of throughput across many workloads. It is more common to be able to find at least 2. The percentage drop from fully separate dual cores to a module is going to be higher for Bobcat because it doesn't have the same amount of spare resources idling.
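As a toy illustration of that argument, here is a small Monte Carlo sketch. The model is deliberately crude and everything in it is an assumption (each thread demands a uniformly random 0-3 instructions per cycle, a front end issues up to its width from whatever is demanded, nothing else is modeled), so it says nothing about real Bobcat or Bulldozer behaviour; it only shows how the narrow case loses more when a front end is shared.

```c
#include <stdio.h>
#include <stdlib.h>

/* Crude model: a thread demands 0..3 instructions per cycle (average 1.5).
 * A front end of width W issues min(total demand, W) per cycle.
 * Compare two private front ends against one shared front end of the same
 * width, for a narrow (2-wide) and a wide (4-wide) design. */
static int demand(void)
{
    return rand() % 4; /* 0, 1, 2, or 3 with equal probability */
}

static void simulate(int width, long cycles)
{
    long private_issued = 0, shared_issued = 0;
    for (long c = 0; c < cycles; c++) {
        int d0 = demand(), d1 = demand();
        /* Two cores, each with its own front end of this width. */
        private_issued += (d0 < width ? d0 : width) + (d1 < width ? d1 : width);
        /* One module: both threads share a single front end of this width. */
        int total = d0 + d1;
        shared_issued += (total < width ? total : width);
    }
    printf("%d-wide: shared front end reaches %.0f%% of two private ones\n",
           width, 100.0 * (double)shared_issued / (double)private_issued);
}

int main(void)
{
    srand(1);
    simulate(2, 1000000); /* Bobcat-like width */
    simulate(4, 1000000); /* Bulldozer-like width */
    return 0;
}
```

With this particular demand distribution the shared 2-wide front end lands around 70% of two private ones, while the shared 4-wide lands above 90%: the wider design has more slack sitting idle to hand to the second thread.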
I'm not sure I agree. If AMD delivers with Bulldozer, they said that when only one thread is active it gets access to all the shared resources for itself. My idea, which might be wrong, is that since IPC is usually low on average (x < 1), the odds of the two threads having a "burst moment" at the same time are low. On the other hand, that means that during the burst moments the CPU won't make up as well for the moments where IPC is really low (x << 1). My idea, as games have become heavily threaded, is to trade a bit of execution time (so single-thread performance) for hardware utilization. It's not that I question the need for more single-thread performance for consoles, but I believe the PowerPCs in today's consoles suck far more than their "overall" performance would suggest (not to mention the hell of optimization console developers put into their code). On top of that, Xenon seems to suffer from multiple caveats (LHS, cache thrashing, SIMD that can't handle integers, etc.) that some devs have often complained about here.

The console space is far below the point where single-threaded improvements wouldn't be helpful. Bobcat right now is worse than what we have already.
Here is the floor plan of Bobcat from AMD:

The only numbers we saw were straight recompiles of x87 code to scalar SSE, not packed instructions. There were some slight improvements, and a few cases with modest improvement that were somewhat surprising given that it was a simple recompile.
If the code were refactored to take advantage of SIMD, the improvement would have been much higher. At any rate, it would not be a good idea to discount SIMD on a CPU and not also be skeptical of some of the claims for the much wider and harder to utilize SIMD width in GPGPUs.
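For what it's worth, here is a minimal sketch of the distinction being drawn: a straight recompile to scalar SSE still handles one float per instruction, much like the x87 code it replaces, while a hand-packed version handles four per instruction. The loop itself is a made-up example, not anything from PhysX.

```c
#include <stdio.h>
#include <xmmintrin.h> /* SSE intrinsics */

/* Scalar version: roughly what an x87 -> scalar-SSE recompile amounts to,
 * one multiply-add per element. */
static void saxpy_scalar(float a, const float *x, float *y, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Packed version: explicit 4-wide SSE, four elements per instruction.
 * Assumes n is a multiple of 4 and the arrays are 16-byte aligned. */
static void saxpy_packed(float a, const float *x, float *y, int n)
{
    __m128 va = _mm_set1_ps(a);
    for (int i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(&x[i]);
        __m128 vy = _mm_load_ps(&y[i]);
        _mm_store_ps(&y[i], _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
}

int main(void)
{
    _Alignas(16) float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    _Alignas(16) float y[8] = {0};
    _Alignas(16) float z[8] = {0};

    saxpy_scalar(2.0f, x, y, 8);
    saxpy_packed(2.0f, x, z, 8);
    printf("scalar: %.1f ... packed: %.1f\n", y[7], z[7]);
    return 0;
}
```

Both produce the same results; the difference is that the packed loop retires a quarter of the instructions, which is the headroom a simple recompile leaves on the table.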
Well, as I said, without changes I agree with that; I just want to see opportunities (I think Bobcat as it is now is not enough of a step up from Atom, and in its core market I could see Intel reacting "meanly" sooner rather than later, and then it will take AMD longer to respond than it takes Intel to push its next chip, on a better process on top of it).

The price and cost issue is a big factor for the console space. Bobcat is also unacceptably slow, and it is 4-5 years newer than the unchanged pipelines of the Xenon and Cell cores. If IBM could beat Bobcat with a chip it designed 5 years ago, it doesn't show that AMD offers anything better.
Indeed. OK, Bobcat is designed for budget computers, but I feel like AMD still lacked ambition for it looking forward.

There could be some benefit to running the same code on the dev machine and the console, but the console makers also have little desire to allow their console games to run on PCs.
GPGPU would require more design work for Bobcat. The bandwidth of the CPU/GPU interface doesn't seem to be all that high.
Well, it's a success, right, but for how long? It's late to begin with. And it's not about over-designing it, but about making it better. Intel won't take long to react. I consider Moorestown the most impressive Atom design to date, more impressive than Bobcat for instance. Too bad it was still not good enough for its intended market, but Intel will recycle the design to make it a match for netbooks.

Bobcat is perfect for its target market: simple applications with good GPU acceleration to speed up Flash, online 3D games, and video playback. AMD did an excellent job with Bobcat. It is a tiny core (smaller than Atom), but there is no way AMD was about to target the netbook market with a processor that was over-designed (for its target market) and thus too big/expensive. This would also eat into notebook sales.
I don't have any hope for MIPS. Well, IBM is a huge possibility, but one would have to fund healthy R&D costs. Sony did last gen; MS not so much, as they benefited a lot from the research done by STI.

Bobcat = tablet/netbook x86 CPU with strong GPU
Atom = tablet/netbook x86 CPU with weak chipset and GPU <= for now, may not last that long
Xenon = high-performance console CPU of its era <= sucky, honestly
P4 = desktop performance part of its era <= thanks to pretty unfair practices; the Athlon was way better
Athlon = higher IPC desktop performance CPU of its era
AMD would be better off offering a tweaked Phenom II core to any console manufacturer - keep the integrated GPU (400SP?) and add a discrete GPU with high external bandwidth (GDDR5).
Too big for what it achieves imho
As to development costs, it seems that nowadays the ISA is not important - the compiler and development tools rule, and the ARM ISA has caught up on both simply due to its popularity in the mobile markets.
Agreed
By 2014 perhaps MIPS will have something ready to talk to the console manufacturers about. My personal opinion is that it will be IBM with the best deal again (where "best" really means cost/performance).
Per-clock is one part of the overall equation. Neither Atom nor Bobcat can get near the clock speed of Xenon, so even a modest per-clock advantage does not mean there is an overall improvement.

On some benchmarks it significantly beats an Atom running at the same speed. I'm not sure that per clock Xenon is better than Atom; I might even be tempted to give Atom a slight edge (Intel usually does really well with caches, the SIMD is as wide as the one in Xenon and can handle integers, etc.).
There are also situations where OoOE does not yield significant performance gains, such as when there are memory latencies too large to hide, or the code is scheduled well enough that minimal reordering is necessary.

Atom holds its own (vs Bobcat) on various benchmarks and tasks, but I'm not sure about the reasons. The FP/SIMD in Bobcat is really tiny going by the early floor plan of the die AMD released; then there are the cases where SMT does well, and lastly optimizations from Intel (which, whatever the recent lawsuit, it's still too early to dismiss). So my belief is/was that the basis of Bobcat is an improvement in single-thread performance over what we have in the console PowerPCs (per cycle at least).
That might require some additional engineering. AMD does not have an uncore like Sandy Bridge, nor does it seem to have one planned for a while. It tends to use a crossbar that takes up a fair amount of space and is less scalable to higher client counts. The modularized Bulldozer keeps the crossbar client count lower by making each pair of cores share an interface.

I would also draw a comparison between the Cedar Mills and Bobcats: as they stand, Bobcats are not really meant to scale as far as the number of cores is concerned. A console manufacturer wanting to put in more than 2 cores (and they would want to, for sure) may ask AMD to consider an "uncore" more akin to the one we find in Intel's SnB.
The average IPC for a number of workloads on Athlon was around 1 per cycle, at least on some desktop media applications when Anandtech looked into it a while ago.

I'm not sure I agree. If AMD delivers with Bulldozer, they said that when only one thread is active it gets access to all the shared resources for itself. My idea, which might be wrong, is that since IPC is usually low on average (x < 1), the odds of the two threads having a "burst moment" at the same time are low.
There are limits to what can be gained by adding more threads. The more thread-level parallelism is exploited, the more the serial components dominate. Console single-threaded performance is not yet so high that it can be ignored, much less made worse.

On the other hand, that means that during the burst moments the CPU won't make up as well for the moments where IPC is really low (x << 1). My idea, as games have become heavily threaded, is to trade a bit of execution time (so single-thread performance) for hardware utilization.
If you are stating that if Bobcat were redesigned so it was more like Bulldozer and had more resources to burn, then it would benefit more from being put into a module, then I have not said anything to the contrary.

Actually, as it is, it is so tiny and has room for so many architectural improvements (vs the wide, sophisticated cores in Bulldozer). Say they put in a single top-notch 4-wide SIMD unit instead of what looks to be a shabby one, and they improve the front end a bit too; I believe they would have a chance to get higher than the 80% of a real CMP setup (their promise for Bulldozer) while still landing a nice win in die size.
AVX has 8-wide SIMD, not 256.

I did not want to give the false impression that I consider SIMD irrelevant. I've actually been wondering if sharing the SIMD between two cores (even though they would share the front end) would not be a good way to make the most of what is an expensive resource (on SnB the 256-wide units are huge; even on prior designs the 4-wide unit was way meatier than the thing in Bobcat).
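Just to make the "8-wide, not 256" point concrete: the 256 refers to bits, and a 256-bit AVX register holds eight single-precision floats, so one AVX instruction works on 8 lanes. A trivial sketch:

```c
#include <stdio.h>
#include <immintrin.h> /* AVX intrinsics */

int main(void)
{
    /* A 256-bit ymm register holds 8 floats: one AVX add processes 8 lanes. */
    __m256 a = _mm256_set1_ps(1.0f);
    __m256 b = _mm256_set1_ps(2.0f);
    __m256 c = _mm256_add_ps(a, b);

    float out[8];
    _mm256_storeu_ps(out, c);
    printf("%d lanes, first lane = %.1f\n",
           (int)(sizeof(c) / sizeof(float)), out[0]);
    return 0;
}
```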
The cores running the MLAA algorithm are most likely larger than the silicon found in the ROPs.

I also agree (from reading many comments) that GPGPU is still tricky. I think GPUs have to evolve too, but that's another matter, and there's room to do better even within actual silicon budgets. If techniques such as SRAA and MLAA gain traction, could the manufacturers invest less silicon in ROPs/RBEs?
Even Intel is at far less than 100% profit margins. If you're going to attempt to use accurate numbers for ARM, at least try to do the same for the competition. You're also ignoring the costs of manufacturing and testing the chips, which evens things up a bit.

ARM's license fee is 1.2%; good luck convincing AMD to go from a 100%+ profit margin to just a 1.2% margin and sell its chips to you instead of into the laptop market.
http://www.eetimes.com/electronics-products/processors/4114549/ARM-raises-royalty-rate-with-Cortex
It's about 65% for Intel over all the chips they sell, which means it's much higher than 65% on a Sandy Bridge and lower than that on a C2D. We all know what happened with using an x86 on the Xbox.
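As a rough sanity check on what these percentages mean per chip, here is a small worked sketch. The chip prices are assumptions picked only for illustration (a $20 tablet SoC, a $100 x86 APU), combined with the 1.2% ARM royalty and ~65% gross margin figures quoted above; none of these are actual contract numbers.

```c
#include <stdio.h>

int main(void)
{
    /* Assumed numbers, for illustration only. */
    double soc_price   = 20.0;   /* hypothetical ARM-based tablet SoC price */
    double apu_price   = 100.0;  /* hypothetical x86 APU price */
    double arm_royalty = 0.012;  /* ~1.2% ARM royalty on the chip price */
    double x86_margin  = 0.65;   /* ~65% gross margin quoted for Intel */

    printf("ARM royalty per chip  : $%.2f\n", soc_price * arm_royalty);
    printf("x86 gross margin/chip : $%.2f\n", apu_price * x86_margin);
    return 0;
}
```

The point of the comparison is only that a royalty on a cheap chip and a vendor's margin on an expensive chip are dollars of very different sizes, which is why neither side of the argument is using quite the same yardstick.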