NVIDIA's x86-to-ARM Project Discussion

Discussion in 'Graphics and Semiconductor Industry' started by neliz, Aug 18, 2010.

  1. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
Cycle accurate and Boolean equivalent mean that the logic did not change. Instead, what Intrinsity did was optimize the implementation of the design, optimizing things such as transistors and layout.
     
  2. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    My 2 cents,

cycle accurate = every instruction has the same latency as on the original ARM core, and the programmer-visible state is identical to the original core's, every cycle.

boolean equivalent = every programmer-visible bit is identical at every stage of operation.
     
  3. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
Basically, consistency is all about: if X and Y are true, then Z must be true. It concerns itself with how things are updated; a CPU may issue a stream of reads and writes, and the consistency model determines what you can rely on as far as the state of memory for those reads and writes.

    X=0
    Wr X=1
    Rd X;

    Can that "Rd X" return 0 or must it return 1?

    X=0
    Rd X;
    Wr X=1;
    Rd X;
    Can the two Rd X return different values, or must they return the same? There are models where both Rd X can return 0, there are models where both could return 1, and there are models where, if the first returns 1, then the second must return 1, etc.

    Consistency basically has to do with the ordering of reads and writes to memory. Are reads ordered? Are writes ordered? Can reads pass writes, can writes pass reads, etc.

    The x86 memory model basically assumes that there are no caches, and that everything effectively arbitrates to read and write memory globally. Section 8.2 of Volume 3A of the Intel Software Developer's Manual contains the specified x86 memory ordering model: http://www.intel.com/Assets/PDF/manual/253668.pdf
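    The single-CPU case in the examples above can be sketched in C11; on one core, program order guarantees the read after the write sees 1, while the multi-core answer depends on the consistency model. This is an illustrative sketch (the variable name and atomics usage are mine, not from the post):

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdio.h>

    /* Shared location X, as in the examples above. */
    static atomic_int x = 0;

    int main(void) {
        /* Single core, program order:
         *   Wr X = 1
         *   Rd X
         * Every consistency model lets a core see its own writes
         * in program order, so this read must return 1. */
        atomic_store(&x, 1);
        int r = atomic_load(&x);
        assert(r == 1);

        /* Across cores the answer depends on the model: if a
         * different core performed the Rd X, a weak model may
         * still return 0 until the write becomes globally visible. */
        printf("Rd X = %d\n", r);
        return 0;
    }
    ```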
     
  4. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,490
    Likes Received:
    908
    Thanks, that's what I suspected. It is indeed very different from customizing the core.
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Is there a consistency model that permits the failure to catch a read-after-write dependence on a single CPU? I can see Rd X returning 0 if it was a different core doing it.

    I've read up on models that allow reads to move around writes, but the descriptions seemed to cover the relative order of operations on different memory locations by different cores.
    Coherence was attributed to the behavior of operations on a common location, and program order to the behavior on a single core.
     
  6. IdaGno

    Newcomer

    Joined:
    Jul 27, 2007
    Messages:
    11
    Likes Received:
    0
    One big clue about how badly NVIDIA lost is in the FTC settlement, under Section I.F., Other Definitions. It defines "Compatible x86 Microprocessor," in part (iii), as one "that is substantially binary compatible with an Intel x86 Microprocessor WITHOUT using non-native execution such as emulation".

    based upon that language, tomorrow, if they so choose
     
  7. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,519
    Likes Received:
    852
    In x86, a read from memory 'waits' for all pending stores to complete before it can execute (load/store unit magic aside). That is, x86 stalls on all potential RAW hazards wrt. memory.

    A weak memory ordering model allows loads to proceed without waiting for pending stores. If you can guarantee that your reads and writes don't alias, this is a huge win, since loads can start earlier. If loads and stores do (or can) alias, you need to use memory fences to guarantee consistency.

    IMO, the great jump in performance we saw from Core to Core 2 was largely because of the load/store unit in Core 2 supporting speculative loads, where load instructions can execute before the addresses of pending stores are known (and thus before any aliasing can be detected). This means you have a semantically strong memory ordering model running at weak memory ordering speeds.
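    The fence placement described above can be sketched with C11 atomics; the flag names and memory-order choices here are illustrative assumptions, not from the post:

    ```c
    #include <assert.h>
    #include <stdatomic.h>

    /* Two shared flags, Dekker-style, that two threads would use
     * to signal each other. */
    static atomic_int my_flag = 0;
    static atomic_int other_flag = 0;

    int main(void) {
        /* Publish our store first... */
        atomic_store_explicit(&my_flag, 1, memory_order_relaxed);

        /* ...then a full fence keeps the following load from being
         * hoisted above the store: the StoreLoad reordering that a
         * weak model (or x86 store buffering) would otherwise allow. */
        atomic_thread_fence(memory_order_seq_cst);

        int other = atomic_load_explicit(&other_flag,
                                         memory_order_relaxed);

        /* Single-threaded here, so the other flag is still 0; with a
         * second thread running the mirror-image code, the fences
         * guarantee at least one thread sees the other's flag set. */
        assert(other == 0);
        return 0;
    }
    ```

    Without the fence, both threads could load 0 from the other's flag even though both stores were issued first, which is exactly the aliasing hazard the post describes.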

    Cheers
     
    #47 Gubbi, Aug 24, 2010
    Last edited by a moderator: Aug 24, 2010
  8. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    Yup, you guys nailed it.

    Also, Arun - note that ARM's 64-bit ISA is really just physical addressing extensions. So good luck with doing binary translation on anything that actually uses 4GB of VAS.

    DK
     
  9. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Indeed - that's disappointing. I was expecting a true 64-bit ISA, but that obviously won't happen for at least another generation (2-3 years after Eagle). Presumably the number of registers has also stayed the same... This makes me seriously doubt all of Charlie's claims wrt NVIDIA creating an x86 translator for Eagle. It's not impossible, but if that project does exist, I'd tend to share much more of his skepticism about its potential, given these constraints.

    IIRC, you've been hearing about a 'New Transmeta' project within NV for a long time; out of curiosity (if you can say), had you heard of it being ARM/Eagle-based before Charlie's article?
     
  10. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Cortex-A15 announced (with little new information), the three lead licensees are Texas Instruments, Samsung, and ST-Ericsson. So NVIDIA isn't even a lead licensee for Eagle...

    Either NV got a special deal with ARM to do what Charlie described and will wait to implement the A15 until their x86 translator is done, or, more likely, Charlie is just wrong from the first letter to the last, and this entire thread discusses nothing more than a fabrication from Charlie's sources. The lack of x64 already made it less plausible, and now this makes it seem very improbable.

    And before anyone says this means NVIDIA is giving up on Tegra: it's most likely just a financial decision. It's more expensive to be a lead licensee than to wait 9-12 months and license it then. Not being a lead licensee can also be a scheduling decision (too late for your refresh, too early to be a lead licensee for the next one). Alternatively, they could decide to stick with a quad-core A9, which would be disappointing and a competitive disadvantage at the high end, but not impossible. I think the financial justification is the most likely, though.
     
  11. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,519
    Likes Received:
    852
    Would it be such a disaster? From what I can gather, the A15 is just a clock-bumped A9 (because of process tech?) with support for 40-bit physical addressing, as well as support for up to eight cores instead of four.

    If you don't need >4GB memory and four cores, there seems to be little difference.

    *EDIT*: ignore the above, just found the other thread in the embedded forum

    Cheers
     