Predict: The Next Generation Console Tech

Discussion in 'Console Technology' started by Acert93, Jun 12, 2006.

Thread Status:
Not open for further replies.
  1. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    8,562
    Likes Received:
    430
    Location:
    Treading Water
    but not just in price.
     
  2. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,990
    Likes Received:
    1,500
    So wait.

    With the x86 chip, AMD sells you the chip and you pay a fee per chip.


    With the ARM chip, the company lets you take the design and produce it anywhere, but you have to pay a fee per chip.


    So in the end the x86 chip has the cost to manufacture plus the company's markup for the tech.
    ARM has the license fee and the cost to manufacture.

    I don't see a difference.
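
    To make the comparison concrete, here is a minimal back-of-the-envelope sketch of the two cost models described above. Every number in it is an assumption purely for illustration (the ~1.2% ARM royalty figure comes from the EE Times article linked later in the thread); none of them are actual contract terms.

    Code:
    # Illustrative only: rough per-chip cost under the two licensing models
    # discussed above. Every number here is an assumption, not real pricing.

    def x86_cost(manufacture_cost, vendor_markup):
        """Vendor (e.g. AMD) fabs/sells the chip: you pay cost plus their markup."""
        return manufacture_cost + vendor_markup

    def arm_cost(manufacture_cost, chip_price, royalty_rate=0.012):
        """You fab the chip yourself and pay ARM a per-chip royalty
        (roughly 1-2% of chip price, per the EE Times article cited later)."""
        return manufacture_cost + chip_price * royalty_rate

    # Hypothetical numbers purely for illustration:
    print(x86_cost(manufacture_cost=15.0, vendor_markup=15.0))   # 30.0  ("Brazos-like")
    print(arm_cost(manufacture_cost=8.0, chip_price=10.0))       # 8.12  ("Tegra-like")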

    You claim that the ARM chips in tablets are a few bucks apiece, but I don't see it. The Tegra 2 will be in the $20-$30 range while Brazos is in the $30-$40 range.



    I also don't see a Tegra 2 or even Tegra 3 competing in performance with Llano.
     
  3. corduroygt

    Banned

    Joined:
    Nov 26, 2008
    Messages:
    1,390
    Likes Received:
    0
    The ARM license fee is 1.2%; good luck convincing AMD to go from a 100%+ profit margin to just a 1.2% margin and sell their chips to you instead of to the laptop market.
    http://www.eetimes.com/electronics-products/processors/4114549/ARM-raises-royalty-rate-with-Cortex

    Tegra 2 is $10-15:
    http://semiaccurate.com/2010/08/12/tegra-3-tapes-out/

    Of course not, they're vastly different in size, power budget, and price.
     
  4. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    8,562
    Likes Received:
    430
    Location:
    Treading Water
  5. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,990
    Likes Received:
    1,500
    http://androinica.com/2011/03/01/is...roinica+(Androinica+-++A+Google+Android+Blog) iSuppli says differently.



    And so in what way will an ARM chip be an option for Llano-class power?

    Even if Nintendo decided to go with a quad-core offering, the NGP isn't even able to match the power of the PS3 while rendering at lower resolutions. Llano is more than a step beyond what the PS3 is capable of displaying at 720p.
     
    #5205 eastmen, Mar 3, 2011
    Last edited by a moderator: Mar 3, 2011
  6. corduroygt

    Banned

    Joined:
    Nov 26, 2008
    Messages:
    1,390
    Likes Received:
    0
    They can do what MS did: license the GPU IP from AMD and the CPU IP from IBM/ARM, and make it themselves for the lowest cost, without being hostage to any single manufacturing company or fab.
     
  7. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    I remember Capcom drew a comparison between Xenon and an early dual-core Pentium 4 running at 3.4GHz. Bobcat's performance is indeed a bit below that of the Athlon II / Turion designs.
    Pentium 4s were inefficient and not "real" dual cores (so performance scaled badly with the extra core).
    But I'm not sure that kind of comparison is fair: the P4, like Xenon, does pretty well in FP calculations thanks to its SIMD units running at high clock speed, and on some workloads clock speed rules. As more and more calculations move to the GPU (especially if it is on-chip with low-latency access), I wonder if it's still relevant to look at CPU performance "overall". What I take from the benchmarks I've seen is that Bobcat, even being a narrow CPU, is far more efficient per cycle than either the P4 or Xenon. It looks like a good basis as far as x86 is concerned to me (even though there is room for improvement).
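
    As a rough illustration of why high clock speed can still win on FP-heavy code, here is a simple peak-throughput calculation. The core counts, clocks, and per-cycle FLOP figures are ballpark assumptions for illustration only, not measured numbers.

    Code:
    # Rough peak single-precision throughput = cores * clock * FLOPs per cycle.
    # All figures below are ballpark assumptions for illustration only.

    def peak_gflops(cores, clock_ghz, flops_per_cycle):
        return cores * clock_ghz * flops_per_cycle

    # Xenon-like: 3 cores @ 3.2GHz, 4-wide SIMD with multiply-add (8 FLOPs/cycle assumed)
    print(peak_gflops(3, 3.2, 8))    # 76.8 GFLOP/s peak
    # Bobcat-like: 2 cores @ 1.6GHz, narrower FP unit (~4 FLOPs/cycle assumed)
    print(peak_gflops(2, 1.6, 4))    # 12.8 GFLOP/s peak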
    I'm not sure what you mean; is it:
    * You're not sure it makes sense, as the front end is cheap on a Bobcat? In which case I would tend not to agree: if they manage to hold true to the promises they made for Bulldozer (i.e. 80% of the performance of a real dual-core setup while being smaller and consuming less power), it would be a win looking forward.
    or
    * A 2-wide decoder would not be enough to feed two Bobcat cores? Would you consider a 3-wide decoder necessary?
    or
    * A blend of both?

    I don't know how high (or, sadly, low) resource utilization is in today's OoO x86 CPUs, nor in Xenon for instance; I just know that the average IPC is low (with bursts). I also don't believe the next-gen consoles will be about high single-thread performance (more is nice, though). I wonder if console manufacturers would do better to take average parts, put them on one chip, and make sure they complement each other and work together really efficiently, instead of picking better parts considered in isolation. More on that later.

    That's where specialized hardware kicks in.
    I remember that some early 360 games allocated almost a whole core to compression/decompression. GPUs now have their own video processing engines, and Intel went ahead and included an encoder/decoder in SnB. Looking forward, could the next step be a properly programmable unit on chip that would handle encoding/decoding and compression/decompression? That's an investment in silicon but a win in performance and power consumption.
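
    For context, here is a minimal sketch of what "dedicating a core to decompression" looks like in software: one background worker pulling compressed chunks off a queue, exactly the kind of job a fixed-function block would take over. It is illustrative only; the queue layout and chunk handling are assumptions, not how any particular engine does it.

    Code:
    import queue
    import threading
    import zlib

    # Illustrative sketch: one worker thread dedicated to decompression,
    # the kind of work a fixed-function block could take off the CPU.
    compressed_chunks = queue.Queue()
    decompressed_chunks = queue.Queue()

    def decompression_worker():
        while True:
            chunk = compressed_chunks.get()
            if chunk is None:          # sentinel: shut the worker down
                break
            decompressed_chunks.put(zlib.decompress(chunk))

    worker = threading.Thread(target=decompression_worker, daemon=True)
    worker.start()

    # The loading code just feeds the queue and consumes results:
    compressed_chunks.put(zlib.compress(b"level geometry" * 1000))
    print(len(decompressed_chunks.get()))   # 14000 bytes recovered
    compressed_chunks.put(None)             # stop the worker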

    I also remember reading lately about how PhysX on x86 was still using x87, but also that using the SIMD units didn't provide that much of a benefit. It was simply a recompile, but I get a growing feeling that whereas large SIMD and GFLOPS are geeks' sweet dreams, plenty of workloads relevant to video games, including physics, don't need that much number crunching. On the other hand, when a workload is really vector-friendly and you have a GPU on chip, it may be a win to move the task to it. That will only get truer within the next one or two generations of GPUs (which is what we should get in a next-gen system).

    Overall I question more and more whether brute force is what's going to make the difference; the 360 proved it this gen by being a match for the PS3 while being obliterated by it in raw power, and that could get even truer next gen. More on that later.

    Indeed, it highly depends on where AMD is heading with their next Bobcat revision. I don't think MS, Sony, or Nintendo have what it takes to make the proper changes on their own, or to pay AMD to do all the changes properly (a lot of R&D to fund).
    Well, if the pricing issue is put aside, I'm not sure that either ARM or IBM would do that much better in this power budget. I believe that AMD and Intel are really good at their job, even if it's about designing a watered-down x86.
    There are also software considerations: development is still done on x86, there are plenty of tools, and GPGPU is getting mature on x86. I'm more iffy about ARM there, although it is growing strong.

    To be continued when I'm back from work.
     
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    The Cedar Mill dual cores were decent enough for the desktop market, aside from power concerns unrelated to their not being "native" dual cores. The bottleneck at the FSB and the limitations of off-die memory controllers weren't as crippling there as they were for the server market and higher core counts.
    At any rate, my point with this is that even right now, Bobcat would be a step back from an old console CPU.

    That is my point. Higher efficiency per-cycle means there is less idle time and fewer idle resources that can be freely obtained for a second thread. Instead, both threads will interfere with each others' actual work.

    It would be a mixture of both. Part of AMD's justification for sharing the 4-wide front end is that it is difficult to fully utilize 4-wide issue on a sustained basis, because superscalar execution experiences serious diminishing returns as the width increases. The inverse of that is that utilization gets much better the narrower the width.
    It is hard to sustain 4 instructions of throughput across many workloads. It is more common to be able to find at least 2. The percentage drop from fully separate dual cores to a module is going to be higher for Bobcat because it doesn't have the same amount of spare resources idling.
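
    As a toy illustration of the diminishing-returns point, here is a tiny model that caps sustained IPC by both the issue width and the parallelism available each cycle. The per-cycle ILP numbers are invented purely for illustration, not taken from any real trace.

    Code:
    # Toy model: sustained IPC = average of min(available ILP, issue width).
    # The ILP distribution below is invented purely for illustration.
    ilp_per_cycle = [1, 2, 1, 3, 2, 1, 4, 2, 1, 2]   # independent insns found each cycle

    def sustained_ipc(width):
        return sum(min(ilp, width) for ilp in ilp_per_cycle) / len(ilp_per_cycle)

    print(sustained_ipc(2))   # 1.6 - a 2-wide core is already close to saturated
    print(sustained_ipc(4))   # 1.9 - doubling the width adds comparatively little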

    The console space is nowhere near the point where single-threaded improvements wouldn't be helpful. Bobcat right now is worse than what we already have.

    The only numbers we saw were straight recompiles of x87 code to use scalar SSE, not packed instructions. There were some slight improvements, and a few cases with modest improvement, which was somewhat surprising given it was a simple recompile.
    If the code were refactored to take advantage of SIMD, the improvement would have been much higher. At any rate, it would not be a good idea to discount SIMD on a CPU and not also be skeptical of some of the claims for the much wider and harder to utilize SIMD width in GPGPUs.
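
    To illustrate the scalar-versus-packed distinction being made here, the sketch below contrasts an element-at-a-time loop with a whole-array (packed-style) operation, using NumPy as a stand-in for SIMD. It is purely illustrative and is not the PhysX code in question.

    Code:
    import numpy as np

    a = np.random.rand(100_000).astype(np.float32)
    b = np.random.rand(100_000).astype(np.float32)

    # "Scalar" style: one element at a time, akin to x87 or scalar SSE codegen.
    def scalar_mul_add(a, b):
        out = np.empty_like(a)
        for i in range(len(a)):
            out[i] = a[i] * b[i] + 1.0
        return out

    # "Packed" style: whole-array operation, akin to refactoring for SIMD.
    def packed_mul_add(a, b):
        return a * b + 1.0

    # The packed version is typically far faster here, which is the point:
    # a straight recompile to scalar code doesn't buy you that.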

    The price and cost issue is a big factor for the console space. Bobcat is also unacceptably slow, and it is 4-5 years newer than the unchanged pipelines of the Xenon and Cell cores. If IBM can beat Bobcat with a chip it designed 5 years ago, that doesn't show AMD offers anything better.

    There could be some benefit to running the same code on the dev and console machine, but the console makers also have little desire to allow their console games to run on PCs.
    GPGPU would require more design work for Bobcat. The bandwidth of the CPU/GPU interface doesn't seem to be all that high.
     
  9. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,990
    Likes Received:
    1,500
    The problem is you're quoting processors running at 1GHz. You're going to need to clock a Tegra 3 (which is twice the size of Tegra 2, by the way, and thus will cost roughly twice as much) at much higher speeds, which will eat into yields and raise costs further.


    Also, AMD is producing Zacate chips at TSMC, so there is no reason Nintendo would be held hostage by any single manufacturing company.
     
  10. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,990
    Likes Received:
    1,500
    Well, except that Bobcat is a complete solution and AMD is selling chips in the $30 range for these products.

    Llano will be a better choice as it targets a more expensive and higher-performing segment.

    Of course, if Nintendo wanted just a bunch of Bobcat cores, they could fit four of them in the same space as the current two-core-plus-GPU design AMD is using, and then devote another full chip to the GPU.

    Also, at 32nm later this year, Bobcat should see core improvements making it more competitive, and a larger GPU as well.
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    I'm under the impression that Bobcat would shrink to the 28nm TSMC (GF?) process. 32nm is the SOI high-performance process at GF.
     
  12. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,990
    Likes Received:
    1,500
    I'm not sure; right now it's on 40nm TSMC. It may go to 28nm at TSMC, but orders have been really good for them, and they may want to push out a faster version with lower power draw sooner than 28nm would allow.
     
  13. corduroygt

    Banned

    Joined:
    Nov 26, 2008
    Messages:
    1,390
    Likes Received:
    0
    Why buy useless and slow x86 cores from AMD that clutter up the GPU bandwidth, especially since AMD also sells standalone GPUs and GPU IP? How about just getting the GPU part and finding a much better option than AMD CPUs? A 2.5-3GHz A15 would be a very good alternative.
     
  14. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    To be honest, I agree that Bobcat V1 is not good enough "overall"; actually it's a grievance of mine against AMD, even though it is a commercial success.
    But where I disagree is about the "core" of the core (I mean the front end and integer unit; I put FP/SIMD aside). On some benchmarks it beats an Atom running at the same speed significantly. I'm not sure that, per clock, Xenon is better than Atom; I may even be tempted to give Atom a slight edge (Intel usually does really well with caches, the SIMD is as wide as the one in Xenon and can handle integers, etc.). Atom holds its own (vs Bobcat) on various benchmarks and tasks, but I'm not sure about the reasons. The FP/SIMD in Bobcat is really tiny going by the early floor plan of the die AMD released; then there are the cases where SMT does well, and lastly optimizations from Intel (which, no matter the recent lawsuit, it is still early to dismiss). So my belief is/was that the basis of Bobcat is an improvement in single-thread performance versus the PowerPCs we have in consoles (per cycle at least).

    I would also draw a comparison between the Cedar Mills and Bobcats: as they stand, Bobcats are not really meant to scale as far as the number of cores is concerned. A console manufacturer wanting more than two cores (and they would want that for sure) might ask AMD to consider an "uncore" more akin to the one we find in Intel's SnB.

    I'm not sure I agree, if AMD delivers with Bulldozer: they said that when only one thread is active it gets access to all the shared resources for itself. My idea, which might be wrong, is that since IPC is usually low on average (x < 1), the odds of the two threads having a "burst moment" at the same time are low. On the other hand, that means that during the burst moments the CPU won't make up as well for the moments where IPC is really low (x << 1). My idea, as games have become heavily threaded, is to trade a bit of execution time (so single-thread performance) for hardware utilization. It's not that I question the need for more single-thread performance in consoles, but I believe today's console PowerPCs suck way more than the "overall" performance would suggest (not to mention the hell of optimization console developers put into their code). On top of that, Xenon seems to suffer from multiple caveats (LHS, cache thrashing, SIMD that can't handle integers, etc.) that some devs have often complained about here.
    So as I consider where AMD could go looking forward, I believe the basis is an improvement, and that they reach the "good enough overall" category without exploding their power or silicon budget (at least per cycle). More on that later.

    Here is the floor plan of Bobcat that AMD released:
    [image: Bobcat die floor plan]

    Actually, as it is, it's so tiny and has room for many architectural improvements (versus the wide, sophisticated cores in Bulldozer). Say they put in a single top-notch 4-wide SIMD unit instead of what looks to be a shabby one, and improve the front end a bit too; I believe they would have a chance to land higher than the 80% of a real CMP setup (their promise for Bulldozer) while still getting a nice win in die size.

    I did not want to give the false impression that I consider SIMD irrelevant. I've actually been wondering whether sharing it between two cores (even though they would already share the front end) would be a good way to make the most of what is an expensive resource (on SnB the 256-bit units are huge, and even on prior designs the 4-wide unit was way meatier than the thing in Bobcat). You will say it's a bit like SMT? Well, yes; it would be a blend between the Intel SMT and AMD CMT philosophies. It's a bit more expensive hardware-wise, but during "not that stressful" moments threads have more resources for themselves.
    I also got another, possibly weird, idea when looking, for instance, at an engine like Frostbite 2. They rely on many tasks and try to make them as asynchronous as possible; they try to do as many things as possible "in parallel". On PS3 it looks like they are doing great things. So this got me wondering: could such a "bulldozered" Bobcat be well suited for the job? Say you have two threads with their own resources (even if spread a bit thin) feeding on pools of tasks, themselves feeding the shared SIMD units (4-wide but potent). The idea is that those cores are cleverer than, say, an SPU, and they could be more autonomous. The modules could run a runtime so developers could care less about optimizations: there is the runtime, then the modules (and the cores within them) are clever enough to handle the tasks developers throw at them. Developers would not have to think "I use two SPUs at a time for this, while one and two other ones do this and that" (sketched below).
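
    Here is a minimal sketch of the task-pool idea above: a shared pool of jobs that worker threads pull from, so scheduling is handled by a runtime rather than by hand-assigning work to specific cores. It only illustrates the concept; it is not how Frostbite 2 or any real console runtime does it.

    Code:
    import concurrent.futures

    # Minimal task-pool sketch: workers pull jobs from a shared pool,
    # so nothing is hand-assigned to a particular core/SPU.
    def simulate_physics(chunk):
        return sum(x * 0.5 for x in chunk)       # stand-in for real work

    def decompress_asset(name):
        return f"{name} loaded"                   # stand-in for real work

    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        jobs = [pool.submit(simulate_physics, range(1000)),
                pool.submit(decompress_asset, "level1"),
                pool.submit(simulate_physics, range(2000))]
        results = [job.result() for job in jobs]  # the runtime decides who runs what

    print(results)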

    I also agree (from reading many comments) that GPGPU is still tricky. I think GPUs have to evolve too, but that's another matter, and there's room to do better even within current silicon budgets. If techniques such as SRAA and MLAA gain traction, could the manufacturers invest less silicon in ROPs/RBEs? The saved silicon should be spent on making the GPUs more valid "generic" computing devices. If Fusion-style chips gain traction in the embedded, console, and PC realms, the reasons to do this (as well as maybe sacrificing a bit of perf per mm²) will grow. It would be too sad to pass on the opportunity to use the power GPUs can provide when the result of that power is only some cycles away from the chip's cache.

    Well, as I said, without changes I agree with that; I just want to see the opportunities (I think Bobcat as it is now is not enough of a step up from Atom, and in their core market I could see Intel reacting "meanly" sooner rather than later, and then it will take AMD longer to respond than it takes Intel to push its next chip, on a better process on top of it).
    Indeed, OK, Bobcat is designed for budget computers, but I feel AMD still lacked ambition for it looking forward.
     
    #5214 liolio, Mar 3, 2011
    Last edited by a moderator: Mar 4, 2011
  15. Tahir2

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,978
    Likes Received:
    86
    Location:
    Earth
    Bobcat is perfect for its target market - simple applications with good GPU acceleration to speed up Flash/3D games online/video playback. AMD did an excellent job with Bobcat - it is a tiny core (smaller than Atom) but there is no way AMD were about to target the netbook market with a processor that was over-designed (for its target market) and thus too big/expensive. This would also eat into notebook sales.

    Bobcat = tablet/netbook x86 CPU with strong GPU
    Atom = tablet/netbook x86 CPU with weak chipset and GPU

    Xenon = High performance console CPU of its era
    P4 = desktop performance part of its era
    Athlon = higher IPC desktop performance CPU of its era

    AMD would be better off offering a tweaked Phenom II core to any console manufacturer - keep the integrated GPU (400SP?) and add a discrete GPU with high external bandwidth (GDDR5).

    As to development costs, it seems that nowadays the ISA is not important; the compiler and development tools rule, and the ARM ISA has caught up with regard to both, simply due to its popularity in the mobile markets.

    By 2014 perhaps MIPS will have something ready to talk to the console manufacturers about. My personal opinion is that it will be IBM with the best deal again (by best I really mean cost/performance).
     
  16. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    Well, it's a success right now, but for how long? It was late to begin with. And it's not about over-designing it but making it better. Intel won't take long to react. I consider Moorestown the most impressive Atom design to date, more impressive than Bobcat for instance. Too bad it was still not good enough for its intended market, but Intel will recycle the design to make it a match for netbooks.
    I expect Intel to fight back against Bobcat, and with more than respectable success, sooner rather than later. They have what it takes now, even without relying on PowerVR IP.
    What could it be?
    One core at ~2GHz
    A healthy amount of cache
    One of their new GPUs
    The encoding/decoding unit you find in SnB
    That would be a serious threat to Bobcat IMHO, and at 45nm. If Intel decides to jump the gun and move to 32nm...

    I don't have any hope for MIPS. Well, IBM is a huge possibility, but one would have to fund healthy R&D costs. Sony did last gen; MS not so much, as they benefited a lot from the research done by STI.

    In any case, if Bobcat proved anything, it's that OoO can be done cheaply and that it seems to provide benefits more evenly across a wide range of applications than SMT does.
    OoO FTW² :)
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    Per-clock is one part of the overall equation. Neither Atom nor Bobcat can get near the clock speed of Xenon, so even a modest per-clock advantage does not mean there is an overall improvement.

    There are also situations where OoOE does not yield significant performance gains, such as when there are memory latencies too large to hide, or the code is scheduled well enough that minimal reordering is necessary.
    In areas of code with low ILP like some long chains of pointer-chasing, clock speed and the memory subsystem become dominant, and other parts of the architecture do not increase performance.
    Without a redesign, Bobcat is not capable of scaling its clock close enough to Xenon to make up for the large clock disparity, at least not at acceptable voltages.

    That might require some additional engineering. AMD does not have an uncore like Sandy Bridge, nor does it seem to have one planned for a while. It tends to use a crossbar that takes up a fair amount of space and is less scalable to higher client counts. The modularized Bulldozer keeps the crossbar client count lower by making the pairs of cores share an interface.

    The average IPC for a number of workloads for Athlon was around 1 per cycle, at least on some desktop media applications when Anandtech looked into it a while ago.
    There are low periods and burst periods. If a core is narrow, and if it is fighting for shared resources, those burst periods take longer. The probability, particularly under load conditions, goes higher because the bursts take longer to get through.

    There are limits to what can be gained by adding more threads. The more thread-level parallelism is exploited, the more serial components dominate. Console single-threaded performance is not yet so high that it can be ignored, much less made worse.

    If you are stating that if Bobcat were redesigned so it was more like Bulldozer and had more resources to burn that it would benefit more from being put into a module, then I have not said anything to the contrary.

    AVX has 8-wide SIMD (eight 32-bit lanes in a 256-bit register), not 256-wide.
    Fermi and Cayman have 16-wide units, but due to their batch sizes, their minimum granularity is 32 and 64, respectively.
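
    For the lane counts being compared here, the arithmetic is just vector (or wavefront) width divided by element size; a quick check, assuming 32-bit elements:

    Code:
    # Lane counts = vector width in bits / element size in bits (32-bit floats assumed).
    avx_lanes = 256 // 32            # 8 lanes per AVX register
    fermi_simd = 16                  # 16-wide units, but a warp is 32 threads
    cayman_simd = 16                 # 16-wide units, but a wavefront is 64 threads
    print(avx_lanes, 32 // fermi_simd, 64 // cayman_simd)   # 8 lanes; 2 and 4 cycles per batch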

    The cores running the MLAA algorithm are most likely larger than the silicon found in the ROPs.
    SRAA in particular uses higher-resolution buffers, which would take longer to build if the ROPs were scaled back.
     
  18. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,436
    Likes Received:
    264
    Even Intel is far less than 100% profit margins. If you're going to attempt to use accurate numbers for ARM at least try to do the same for the competition. You're also ignoring the costs of manufacturing and testing the chips which evens things up a bit.
     
  19. corduroygt

    Banned

    Joined:
    Nov 26, 2008
    Messages:
    1,390
    Likes Received:
    0
    It's about 65% for Intel across all the chips they sell, which means it's much higher than 65% on a Sandy Bridge and lower than that on a C2D. We all know what happened with using an x86 in the Xbox.
     
  20. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,990
    Likes Received:
    1,500
    I will break it down for you.

    1) Slow x86 cores may be all you need to outdo the current-gen consoles, especially when coupled with a solid GPU.

    2) A Bobcat or Llano would be much cheaper than getting a GPU and then another chip.

    3) What are the yields of a 2.5-3GHz A15 processor? Are they even made yet? Are they on 40nm? Where are they? If they require 28nm then you're going to be paying more for the process, and don't forget that the AMD APUs can also be built on older, cheaper processes.

    I've said it before, but I think if Nintendo wants to pull a Wii 2 and make something just a bit more powerful than the Xbox 360/PS3, Llano will be a good choice. It's a quad-core x86 chip, so they have access to a ton of engines and tools already designed with it in mind, and it has a Radeon 5x00-class GPU attached to it. They might be able to get more performance with a dedicated CPU and a dedicated GPU, but they won't get it at the prices they can get Llano for.

    Going with ARM will mean either having a very low-power console or taking ARM chips somewhere they haven't been before, which is very high clocks in a console. I'm not convinced that Nintendo will be able to get quad-core A15s at 2.5-3GHz, and certainly not at the cost they would be able to get Llano for.
     