Intel Atom Z600

Discussion in 'Mobile Devices and SoCs' started by liolio, May 6, 2010.

  1. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Or the whole reason it's called ARM is because it stood for "Acorn RISC Machine", which has nothing to do with Apple. Yes, Apple started showing interest in ARM in the late 80's as a mobile platform. But ARM development began 4 years before Newton development.

    Saying it's well known and on Wikipedia is not actually giving a source. I think you're confused about what Apple's contribution actually was. The Wikipedia article on ARM shows nothing to corroborate your claims - only that Apple started working with Acorn long after ARM was first developed. That's not "helping create the ISA." The ISA was developed by Sophie Wilson, not Apple. The spin-off company ARM Ltd is completely separate from the ISA's development.
     
  2. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,563
    Likes Received:
    171
    Location:
    In the Island of Sodor, where the steam trains lie
    This talk by Steve Furber should shed some light on the history. I think the discussion of ARM starts around the 38 minute mark.
     
  3. darkblu

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,642
    Likes Received:
    22
    Thank you, Simon. A jolly good talk altogether. Btw, the ARM part starts from the 32min mark, but not listening to the whole talk would be a loss to any of the participants in this thread.
     
  4. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Thanks Simon, very interesting watch. I hope this clears up any doubts regarding Apple's involvement - Furber clearly says Apple "came knocking on the door" right after he left in 1990 (around 46:10).
     
  5. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,563
    Likes Received:
    171
    Location:
    In the Island of Sodor, where the steam trains lie
    I liked the bit about the full CPU simulator being only 800 lines of code. :)
     
  6. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    595
    Likes Received:
    18
    Location:
    UK
    We've clearly been using the wrong approach, it's better to have no money and no people available :ROFL:

    Great trip down memory lane!

    John.
     
  7. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,727
    Netbooks are dying out because of their high prices vs notebooks and their lack of performance. It also doesn't help that there haven't been many refreshes in over a year. A 1.6GHz Atom is going to perform just as shitty as a 1.2GHz Atom.

    I would think Intel's next refresh, which should bring dual-core Atom chips, will help.


    I mean, a Dell Mini 10 with a 1.2GHz Atom CPU, 1 gig of RAM, and an integrated three-year-old Intel IGP isn't a very good deal at $300. Paying $400 for the same thing with a 1.6GHz CPU instead isn't great either.

    Jump up to $350, right between those, and you get the Inspiron 11z. It comes with a bigger screen, 2 gigs of RAM, a 1.3GHz Celeron, and a much newer Intel G45 IGP.


    Netbooks just aren't a good value, and many see that. The same might be said about the iPad and tablets in general 10 months from now, as people learn that they aren't powerful enough to replace their laptops.
     
  8. darkblu

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,642
    Likes Received:
    22
    While I agree that netbooks are not particularly good value, I suspect you haven't actually tried an iPad yet. It is freakishly fast for what it does, and quite possibly the best web-browsing device on the planet.

    ps: flash can go die in a barn fire, for all I care. Adobe had all their chances and blew them, like the little 'oh-look-we-have-the-windows-desktop-by-the-balls-why-bother-about-embedded' snobs that they [strike]are[/strike] were.
     
  9. Fox5

    Veteran

    Joined:
    Mar 22, 2002
    Messages:
    3,674
    Likes Received:
    5

    Creative's ARM-based Zii architecture has a throughput focus, doesn't it? We'll probably see the first mobile device supporting OpenCL before anything uses Zii, though.
     
  10. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    While Zii has an ARM as its general purpose control core (like the PPC on Cell, and it's a very old and slow ARM at that) the big throughput compute array is anything but ARM.

    Actually, I say that but I have no idea what it really is, and I think no one else really does either, unless more info has surfaced.

    On the other hand, Furber did say (in the video Simon F just posted in this thread, at that) that he has a research project going with some utterly obscene number of ARM9 cores. Again, not comparable to modern ARM, but in some sense an ARM's an ARM. I do wonder if you could gain more with tinier/simpler cores. Furber has said a lot about ARM being super small and simple because they couldn't afford to make it complex, but it really does a lot of things that were quite extravagant for its time, even if much of it was just generalized solutions to things they needed to have on die anyway.

    It seems to me that if you want high data throughput going for really wide SIMD makes the most sense, which would be accomplished either by having a bunch of cores with a shared instruction fetch/decode frontend (GPU shaders approach) or really wide vector instructions (Larrabee approach). If you wanted something with really high control throughput, like AI might be, and I think this is what Furber is doing, you might want the opposite extreme - a bunch of extremely small cores with tiny register files and really small/simple instructions (and not a lot of them, with what you have being specialized for the application). ARM as it exists in any incarnation doesn't seem to cater fantastically to either extreme, but I do think it does better than vanilla x86. And no, I don't consider Larrabee vanilla x86, the x86 part is barely more than a casual point of interest.
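    To make the shared-frontend idea concrete, here's a toy Python sketch (all names invented for the example) where one fetched/decoded instruction drives every lane, GPU-shader style:

```python
# Toy SPMD model: one fetched/decoded instruction is broadcast to N lanes,
# so the fetch/decode cost is amortized over N operations per step.
def run_spmd(program, lanes):
    # 'lanes' is a list of per-lane register dicts; every lane executes
    # the same instruction each step.
    for op, dst, a, b in program:      # one fetch/decode per step
        for regs in lanes:             # N executions per fetch
            if op == "add":
                regs[dst] = regs[a] + regs[b]
            elif op == "mul":
                regs[dst] = regs[a] * regs[b]
    return lanes

# 4 lanes, each starting with different data in r0/r1
lanes = [{"r0": i, "r1": 10, "r2": 0} for i in range(4)]
prog = [("add", "r2", "r0", "r1"),   # r2 = r0 + r1
        ("mul", "r2", "r2", "r2")]   # r2 = r2 * r2
run_spmd(prog, lanes)
# lane i ends with r2 = (i + 10)**2
```

    The point is the loop structure: one fetch/decode per step, N executions, so control overhead shrinks as lane count grows (at the cost of divergence handling, which this toy ignores).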

    Main point is, for many-core you probably want something more specialized, but for right now we still need to run our existing general purpose code.
     
    #50 Exophase, May 16, 2010
    Last edited by a moderator: May 16, 2010
  11. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Real OT, but here's the snippet on Zii's compute array:

    Media Processing Array - Architecture
    High compute density SIMD architecture
    24 Processing Elements (PE) in 3 clusters
    Each cluster runs the same or independent code
    Multiple High bandwidth memory paths
    Advanced hierarchical cache structure
    Random access to memory per PE
    Shared access to ARM memory
    Independent DMA controller per cluster
    Integer, IEEE 32-bit and 16-bit floating point

    Sounds a lot like shaders on various GPUs, right? But I'm far from an expert on these things like many here are. I would guess those 8-per-cluster units are single-issue scalar units, but that still gives Zii grossly more shader ALU power than any portable 3D solution on the market (the units are said to run at 166MHz). DMA is nice too. Too bad that when you're doing 3D so much compute time is spent on texturing and other traditionally fixed-function tasks.
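    Running the quoted numbers gives a rough ceiling; this is just back-of-envelope arithmetic under the single-issue-scalar guess above, not a published figure:

```python
# Back-of-envelope peak rate for the Zii array, assuming single-issue
# scalar PEs (one op per PE per clock).
pes = 24          # processing elements (3 clusters x 8, per the fact sheet)
clock_hz = 166e6  # quoted PE clock
peak_ops = pes * clock_hz
print(peak_ops / 1e9)  # ~3.98 GOPS peak, before memory/texturing limits
```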
     
  12. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Don't know if you guys have read that one: http://www.imgtec.com/factsheets/SD...Development Recommendations.1.8f.External.pdf

    I'm not sure how the Zii exactly handles integers, fp16 and fp32, but I'd say that it's safe to assume that at least for fp32 the ALUs act as scalar units.

    SGX ALUs, on the other hand, as described above can operate either as scalar 1x fp32 (highp), Vec2 fp16 (mediump), or Vec4 int8 (lowp) units. The developer recommendations for those precision levels from the document above are:

    That's of course mostly for SGX520-545 (USSE), SGX543 (USSE2) not included.

    It would be interesting to know how much die area the Zii takes up.
     
  13. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Even more OT, but the 4-way integer SIMD on USSE is actually 10bit 1.1.8 rather than 8bit, as can be seen in the description of lowp in the document you've linked. This does conflict with TI's description but I take IMG more at their word, and I believe I've received direct confirmation on this before.

    What's the highest end SGX we can consider really on the market right now, 535 still right? And that's 2x USSE1, no? So still only 8x int10 per clock at comparable clock speeds, which still pales in comparison to 24x on Zii.
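    As a sanity check on that comparison, here's the arithmetic with the figures from this post (the single-issue width per Zii PE is my assumption, as above):

```python
# Per-clock lowp ALU comparison at equal clocks, using the post's figures:
# SGX535 = 2 USSE1 pipes x 4-way int10 SIMD; Zii = 24 scalar PEs (assumed
# single-issue).
sgx535_ops = 2 * 4   # 8 int10 ops per clock
zii_ops = 24 * 1     # 24 ops per clock
print(zii_ops / sgx535_ops)  # Zii has 3x the per-clock ALU count here
```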
     
  14. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    I disagree; I think wide vectors are used exactly because people are using fat cores, and this is the only way to make it work with them. The 5-wide VLIW cores on AMD GPUs are a better example of a core well suited for throughput floating-point computing IMO (in practice still used in an SPMD setup, but I think that has to do with the history of GPU computing, where having low branch granularity didn't impact efficiency much).

    Compared to that both x86 and ARM are fat.
     
  15. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    But it's not 5-wide VLIW, it's 5x16-wide VLIW. As soon as they drop the x16, they'll have less density than NVIDIA!
     
  16. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Still more density than NVIDIA would have if it tried to drop the x16 ... let alone Larrabee if it tried to drop the x16.
     
  17. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    It's apples and oranges IMO; the wide SIMD on modern x86 and ARM doesn't make them fat. Larrabee, for instance, is much leaner and much wider, and there I think it's the x86 part that's more tacked on than the wide SIMD part.

    I do consider SPMD still an example of the SIMD I'm getting at, and it's certainly making the cores much leaner (and not just out of heritage). VLIW obviously has its merits too, I didn't mean to exclude that, I was merely referring to a large number of operations per instruction fetch/decode ratio.
     
  18. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    A lot of them will be wasted in divergent kernels, though. With both x86 and ARM you have the choice between scalar (lots of overhead), SIMD (low branch granularity), and superscalar (fat). VLIW expands the design space, because for each VLIW you can decide at compile time to use either superscalar or SIMD execution ... and when not bogged down with forward compatibility, VLIW can do superscalar much leaner (VLIW combined with forward compatibility is really the worst of all worlds in the end, i.e. Itanium).
     
    #58 MfA, May 16, 2010
    Last edited by a moderator: May 16, 2010
  19. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Interesting discussion, I hope no one minds our hijacking too much ;)

    I don't disagree with this, for the most part.

    I do think that especially "lean" VLIW many-cores will have a lot of specialization per execution-unit, and therefore shouldn't present that much opportunity for SIMD.

    One of the downsides of leaner VLIW is that you end up with much wider instructions that will inevitably have a bunch of execution-unit NOPs in them. You can compress them out like TI's C6x does, but then you end up with more complex variable-length fetches (although nothing like x86, of course) and execution-unit scheduling. If amortized over many cores this might not matter much.
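    As a toy illustration of that padding cost, here's a sketch with made-up bundle contents (the 8-slot width and occupancy are purely illustrative, not taken from C6x or any real ISA):

```python
# Toy code-density model: an 8-slot VLIW bundle padded with NOPs vs the
# same bundles with empty slots compressed out of the encoding.
SLOTS = 8
bundles = [["add", "mul"],             # only 2 of 8 units busy
           ["ld", "add", "mul", "st"],
           ["add"]]

padded = len(bundles) * SLOTS               # every slot encoded, NOPs included
compressed = sum(len(b) for b in bundles)   # only real ops encoded
print(padded, compressed)  # 24 vs 7 instruction slots
```

    The compressed form recovers most of the density, at the price of the variable-length fetch and slot-routing logic mentioned above.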

    From here, the main thing separating VLIW from conventional superscalar is interlocking. Superscalar doesn't necessarily need it to be superscalar, but of course superscalar x86 and ARM implementations do, to be backwards compatible. Stuff like this bites you, and out-of-order execution then bites you a lot more. SMT is a leaner solution to hiding latencies than OoE, along with large enough register files for software scheduling and perhaps features like the software loop-pipelining capabilities in C6x (although those are probably overkill).

    I guess it's worth comparing just how big N scalar cores are vs one N-wide VLIW core when fetch/decode is amortized out. The VLIW has to be compressed to really have comparable code densities, if that's determined to even matter.

    I agree that forward compatibility is awful for VLIWs.

    A little more on topic: regarding current x86 and ARM and which is "fatter", the real question to me is whether there's anything about ARM that lends itself to leaner OoE than x86. Maybe someone else can comment; I don't have anything on this yet.
     
  20. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,727
    I've used one and I like it, but I won't buy it because it's Apple.

    However, some users may want to replace their laptop with an iPad and find that it just doesn't cut it, and at $500, the average price of an iPad, you can get a decent laptop.

    I've seen this trend. People had desktops and had to replace them, so they bought laptops. However, laptops weren't powerful enough to replace desktops unless you dropped a huge chunk of money. So OEMs made big laptops to put bigger, faster hardware in, and people bought those, but they were no longer very viable to take with you, and battery life was really bad. Now we have netbooks that have great battery life, so you'd want to keep them with you almost all the time. However, the processor sucks for anything outside of Word, some internet sites, and 8-year-old games.


    The same trend may happen to tablets. In fact, a friend of mine talked about wanting WoW on his iPad (before the iPad came out), and I had to sit and explain to him why he'd be waiting a long time and why he shouldn't get his hopes up.

    The iPad and tablets may work a bit better than netbooks because they are closer to cell phones with a really big screen, while netbooks are closer to PCs with a really small screen. But they will still be limited and still face the same challenges that netbooks face.
     