NVIDIA's x86-to-ARM Project Discussion

Discussion in 'Graphics and Semiconductor Industry' started by neliz, Aug 18, 2010.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Eagle is still targeted at a pretty power-constrained environment.
    Even a balls-to-the-wall OoOE implementation would be hard-pressed to hide translation overhead. I'd point to FX!32 on Alpha as the (theoretical) ceiling at 50-75% of native, and that was with the ability to store the translated binaries.

    OoOE also cannot fix some of the stressors that emulation or translation involve.
    ALU load is higher, as any built-in operations (implicit flag updates, for instance) must now go explicitly through the ALUs.
    OoOE does not increase ALU density; usually it is the opposite.
    Transmeta went VLIW, possibly in part because they knew there would be enough redundant and statically detectable work involved in the emulation process.
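    As a rough illustration of the ALU-load point (a minimal C sketch of my own, not anyone's actual translation scheme): x86 updates CF/ZF/SF/OF implicitly on every ADD, so a translated sequence for a target with different flag semantics has to materialise them with explicit ALU work:

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct { bool cf, zf, sf, of; } x86_flags;

    /* 32-bit ADD with the x86 flags computed by hand. */
    static uint32_t emu_add32(uint32_t a, uint32_t b, x86_flags *f)
    {
        uint32_t r = a + b;
        f->cf = r < a;                               /* carry: unsigned wrap  */
        f->zf = (r == 0);                            /* zero                  */
        f->sf = (r >> 31) != 0;                      /* sign: top result bit  */
        f->of = ((~(a ^ b) & (a ^ r)) >> 31) != 0;   /* signed overflow       */
        return r;
    }

    Four extra ALU ops for what the source ISA gets for free, and that is before a smart translator prunes the flag updates nobody reads.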

    OoOE does help hide short-latency events, such as hiccups in the L1 and possibly L2 data caches.
    It does not help with hiccups in the instruction delivery pipeline, and OoO chips are typically limited by instruction throughput (which emulation or translation worsen through bloat and cleanup code, and which can spill out of the caches to memory with latencies too long to hide anyway).

    The likely x86 competitor is probably going to have similar clocks, so there is no fallback to brute-force clocking for the inevitably longer straight-line code.

    Maybe it is possible to shoot for "good enough" and hope the GPU handles the graphical glitz.
     
  2. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I meant that x86 perf on an OoO Eagle would be roughly that of an in-order equivalent of Eagle. That would be a steep hit, no doubt. Their best bet is relying on user-space software JITting to ARM and having a kick-ass GPU. JVM/.NET/JavaScript could all go that way, removing LOTS of translation overhead.
     
  3. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,848
    Likes Received:
    2,267
    At the time the Alpha chip was released, wasn't it the fastest thing around?
     
  4. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    In MHz, yes
    In 64-bitness, yes
    In features, yes
    But that all stopped in the late '90s :(
     
  5. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    TMTA's VLIW was custom designed to be very close to x86. ARM is by definition not very close to x86 - totally different memory model, totally different semantics for flags, etc.

    Translating x86-->Alpha or x86-->IA64 isn't so bad, since Alpha and IA64 both have a lot more registers than x86. Unfortunately, ARM has the same number of registers as x86-64... so you're kind of in a bad situation there. Not only that, but you also need to be able to emulate SSE, and I don't know if ARM has 128b SIMD yet (it very well might, but I am not sure).

    The bottom line is that ARM lacks many of the features needed for targeting x86 - look at the Chinese MIPS clone (Loongson) for a good example of something that kind of has them. It's possible that NV could do it, but it would probably involve a lot of extra work.

    Also, anyone who thinks that x86-->uop translation is anything like x86-->ARM needs to learn more about ARM and more about uops : ) Not even remotely comparable.

    DK
     
  6. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    What do you mean by memory model?

    AFAIK, NEON does both 64-bit and 128-bit SIMD.
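    For example, with the standard intrinsics from <arm_neon.h> (a minimal sketch; needs an ARM toolchain with NEON enabled), the Q registers give four float32 lanes per instruction:

    #include <arm_neon.h>

    /* c[i] = a[i] + b[i], four floats at a time, 128 bits per op. */
    void add4(const float *a, const float *b, float *c)
    {
        float32x4_t va = vld1q_f32(a);      /* load 4 floats into a Q reg  */
        float32x4_t vb = vld1q_f32(b);
        vst1q_f32(c, vaddq_f32(va, vb));    /* one 128-bit add, then store */
    }

    Whether the hardware actually executes that at full 128-bit width per cycle is another matter, of course.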

    Then, as 3d suggested, maybe crack both x86 and ARM into a common representation. But that would suck for both ARM and x86. :wink:

    But like I said earlier, getting JVM/CLR/V8 to JIT directly to ARM would solve a lot more of their headaches with less work than emulating x86. They could add an instruction to switch from x86 to ARM and vice versa, just like Jazelle, or the endianness switch in ARM.
     
  7. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    I obviously agree with your other points, David (sorry if you thought my handwaving about uops was a bit sloppy; it wasn't meant to be rigorous, at least :)), and while I don't think they are necessarily show-stoppers, assuming they do things differently from TMTA or Loongson, they do contribute to my skepticism about performance.

    But here I think you're forgetting something: this isn't about x86-64->ARM. It's about x86-64->ARM-64 (to be introduced with Eagle), so while we don't know the number of registers on the latter, it's likely to be higher than before, which should help for translation. ARM also supports 128b SIMD (with multiply-add), although on the A8 and A9 it's done over two cycles on a 64b unit. But Qualcomm has native 128b in Snapdragon, and presumably Eagle will have the same.

    EDIT: By the way, Charlie's information about TI being the 'lead licensee' for Eagle is false. From ARM's Q2 earnings release: "Major semiconductor company becomes the third lead-licensee for the “Eagle” Cortex-A™ class processor" - so it's nearly certain that both TI and NVIDIA are in that list. I don't know who the third company might be; Samsung perhaps as they've already had Eagle on one of their old roadmaps?
     
  8. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,491
    Likes Received:
    909
    I think Qualcomm is much more likely to be on that list than NVIDIA.
     
  9. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Qualcomm is an architectural licensee. They are much more likely to come up with a new OoOE core of their own to replace Scorpion than to license Eagle. And assuming NVIDIA isn't on that list *if* Charlie's right is downright absurd - of course, Charlie could be wrong.
     
  10. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Hypothetically, what would be easier: licensing Eagle and then coming up with an OoO derivative of Scorpion, or making an OoO Scorpion all by themselves?
     
  11. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Qualcomm couldn't just license it and base their OoO design on tricks invented by ARM for the A9 or Eagle. ARM sells completed blocks and architectural licenses, not mixes of the two. They're a corporation, not a research organisation like IMEC ;)

    Snapdragon 1 taped out in 1H07 if I remember correctly, and they've got a substantial team working on this stuff. I can't see why they couldn't come up with an OoO core 4-5 years after Scorpion.
     
  12. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,491
    Likes Received:
    909
    I thought Scorpion was a derivative of the Cortex-A8; was I mistaken? It does seem to make more sense for Qualcomm to buy an architectural license plus the complete Eagle block, and then customize it as they see fit. I mean, why replicate all the engineering effort that ARM will put into Eagle?

    Plus, I don't see why the idea that NVIDIA wouldn't be on ARM's VIP list should be so ludicrous. After all, NVIDIA is a pretty small player in the ARM world (where you have companies like TI, Samsung, Qualcomm, Apple, Freescale, etc.).

    Actually, now that I think about it, this 3-member list is likely to be TI + Apple + Qualcomm or TI + Apple + Samsung, or something like that.
     
  13. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    So that means, if you take an architectural license, you don't get to peek inside their RTL? And if you don't take an architectural license, you can't modify anything inside?
     
  14. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,491
    Likes Received:
    909
    Yet I think this is what Apple did with the A4…
     
  15. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    You are mistaken - it's similar to the A8 architecturally and shares the same NEON ISA version, but it's a from-the-ground-up design.

    Because ARM doesn't allow that; it's just asking for trouble to let everyone customise everything. Or if they do, it's certainly not part of their normal license and would be substantially more expensive.

    They were on the lead licensee list for the A9 along with TI and Samsung. I don't see what's ludicrous at all - they might not be a big player share-wise, but they're investing a lot of money in it. You could argue it's ludicrous they are investing so much money into it given how difficult it has been and still will be for them to get a leading position in the market, but that's a separate conversation.

    TI + Apple + Samsung is indeed plausible. Apple has an architectural license though... there have been whispers they lost too many PA Semi engineers to finish that project, but I think it's more likely it has simply been delayed and they'll never license Eagle.

    Well, obviously you get the RTL in order to implement it, but I'd *assume* there are fewer comments and explanations than in ARM's own version of it. Either way, the normal license does not allow you to change the core.

    Yes and no. Here's the important bit:
    I don't know whether Intrinsity had to get special rights from ARM to do it, but obviously there's no reason for ARM not to let them do that kind of thing, or to charge anything extra for it. It's very, very different from what you are all saying Qualcomm should do.

    The only precedent I am aware of where someone modified an existing ARM core substantially is Handshake Solutions (a Philips subsidiary), which created the first commercial clockless processor, the ARM996HS. But that was a joint collaboration where they licensed it to a third party together, both getting licensing and royalty fees. It's obviously not comparable.
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    I did not remember this, but x86 is usually the more strongly ordered architecture in terms of memory consistency, versus many other ISAs.
    In a multiprocessor environment on x86, a single processor's writes are seen in the order they were made (there are options that can relax this; I think certain complex ops and vector instructions are more relaxed). Reads can be reordered in some situations relative to other operations.
    ARM is weakly ordered, which means that without a barrier instruction, one CPU's view of another core's reads and writes may not match the order in which those accesses became visible from the POV of other cores.
    That can lead to failures for code that assumes stronger consistency and lacks the needed barriers.
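    The classic message-passing idiom shows the difference. A minimal sketch in C11-style atomics (my own illustration, nothing vendor-specific):

    #include <stdatomic.h>

    int payload;          /* plain data                                 */
    atomic_bool ready;    /* flag protecting it, acquire/release paired */

    void producer(void)
    {
        payload = 42;
        /* On x86 the hardware keeps these two stores in order anyway
           (TSO); on ARM the release must emit a barrier (e.g. dmb).   */
        atomic_store_explicit(&ready, true, memory_order_release);
    }

    int consumer(void)
    {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;             /* spin until the flag becomes visible       */
        return payload;   /* guaranteed to read 42                     */
    }

    At the hardware level, the unfenced version of this happens to work on x86, while ARM is free to let the consumer observe ready == true before payload == 42 - exactly the kind of latent bug translated x86 binaries would trip over.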

    I cannot find a reference at this time, but a while back there was talk of some RISC adding a mode that would make its consistency model closer to that of x86.
     
  17. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    So by memory model, you mean memory coherency, analogous to cache coherency?
     
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    I meant the architecture's defined memory consistency model, which is similar to coherence.
    Coherence would be concerned with how and when different caches and different cores see updates to the same location.
    Consistency would be concerned with how and when different caches and different cores see updates to different locations.
    It might even matter for one core; I think there were some surprises on single cores with weak consistency. And then there are GPUs, which are weak enough that I'm not sure they even count as being consistent.
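    A concrete way to see the "different locations" point is the classic store-buffering litmus test. Again a minimal sketch in C11-style atomics (my own illustration, nothing architecture-specific):

    #include <stdatomic.h>

    atomic_int x, y;    /* two *different* locations, both start at 0 */
    int r1, r2;

    void cpu0(void)     /* runs on one core...                        */
    {
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        r1 = atomic_load_explicit(&y, memory_order_relaxed);
    }

    void cpu1(void)     /* ...concurrently with this on another       */
    {
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        r2 = atomic_load_explicit(&x, memory_order_relaxed);
    }

    Ending with r1 == 0 && r2 == 0 is a legal outcome even on x86: each core's store can sit in its store buffer while the following load runs ahead. Coherence can't forbid it, because x and y are different locations; only a consistency rule - a full barrier (or seq_cst ordering) between each store and load - rules it out.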
     
  19. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,491
    Likes Received:
    909
    I stand corrected, then.

    I'm not saying anything's ludicrous; it's just that you said it was ridiculous to think that NVIDIA wasn't on that list, while I think there are more plausible candidates, so I don't find it ridiculous.

    As for investment, well, NVIDIA is spending a lot of money, but all that money goes into their own R&D alone, right? Why should ARM offer them preferential treatment for that?

    I don't really get what "cycle-accurate and Boolean equivalent" means. Just what did they change, exactly?
     
  20. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    There is coherence and there is consistency. The two are parallel concepts. Coherency deals with how various blocks see the memory state; consistency has to do with how that memory state changes.
     