NVIDIA's x86-to-ARM Project Discussion

I don't really get what "cycle-accurate and Boolean equivalent" means. Just what did they change, exactly?

Cycle- and Boolean-equivalent mean that the logic did not change. Instead, what Intrinsity did was optimize the implementation of the design, at the level of things such as transistor sizing and layout.
 
I don't really get what "cycle-accurate and Boolean equivalent" means. Just what did they change, exactly?

My 2 cents,

cycle accurate = all instructions have the same latency as in the original ARM core, and the programmer-visible state is identical to it every cycle.

boolean equivalent = every programmer-visible bit is identical at every stage of operation
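To make "boolean equivalent" concrete, here is a toy sketch (not Intrinsity's actual flow, which works on gate-level netlists): two structurally different implementations of the same 4-bit adder, one ripple-carry "reference" and one "optimized", checked exhaustively so that every output bit matches for every input. Real equivalence checkers prove this formally rather than by enumeration; the names here are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Reference implementation: ripple-carry 4-bit add (the "original" logic).
uint8_t add4_reference(uint8_t a, uint8_t b) {
    uint8_t sum = 0, carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint8_t ai = (a >> i) & 1, bi = (b >> i) & 1;
        sum |= static_cast<uint8_t>((ai ^ bi ^ carry) << i);
        carry = (ai & bi) | (carry & (ai ^ bi));
    }
    return sum | static_cast<uint8_t>(carry << 4);  // 5-bit result
}

// "Optimized" implementation: different structure, same boolean function.
uint8_t add4_optimized(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((a & 0xF) + (b & 0xF));
}

// Boolean equivalence: every output bit identical for every input combination.
bool boolean_equivalent() {
    for (uint8_t a = 0; a < 16; ++a)
        for (uint8_t b = 0; b < 16; ++b)
            if (add4_reference(a, b) != add4_optimized(a, b))
                return false;
    return true;
}
```

The point of the analogy: the two functions differ in structure (area, speed, "layout"), but not in any observable output bit, which is exactly the guarantee being described.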
 
I meant the architecture's defined memory consistency model, which is similar to coherence.
Coherence would be concerned with how and when different caches and different cores see updates to the same location.
Consistency would be concerned with how and when different caches and different cores see updates to different locations.
It might even be for one core, I think there were some surprises on single cores with weak consistency, and then there are GPUs, which are weak enough that I'm not sure if they even count as being consistent.

Basically, consistency is all about: if X and Y are true, then Z must be true. It concerns itself with how things are updated; a CPU may issue a stream of reads and writes, and the consistency model determines what you can rely on as far as the state of memory for those reads and writes.

X=0
Wr X=1
Rd X;

Can that "Rd X" return 0 or must it return 1?

X=0
Rd X;
Wr X=1;
Rd X;
Can the two Rd X return different values, or must they return the same one? There are models where both Rd X can return 0, models where both could return 1, and models where, if the first returns 1, then the second must also return 1, etc.

Consistency basically has to do with the ordering of reads and writes to memory. Are reads ordered? Are writes ordered? Can reads pass writes, can writes pass reads, etc.

The x86 memory model basically assumes that there are no caches, and that everything effectively arbitrates to read and write memory globally. Section 8.2 of Volume 3A of the Intel Software Developer's Manual contains the specified x86 memory-ordering model: http://www.intel.com/Assets/PDF/manual/253668.pdf
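The classic litmus test for "can reads pass writes" is the store-load case: two threads each write one flag, then read the other. On real x86 hardware, plain stores sit in store buffers, so both reads may return 0 (this is the one reordering x86 permits, per SDM §8.2.3.4). A hedged sketch using C++11 atomics: with memory_order_seq_cst the both-zero outcome is forbidden, while with memory_order_relaxed (or ordinary machine loads/stores under a weak model) it becomes possible.

```cpp
#include <atomic>
#include <thread>

// Store-load litmus test: each thread stores to one location, then loads the
// other. seq_cst atomics forbid the outcome r1 == 0 && r2 == 0; swapping in
// memory_order_relaxed would allow it, as hardware store buffering does.
std::atomic<int> X{0}, Y{0};
int r1 = 0, r2 = 0;

void run_once() {
    X.store(0); Y.store(0);
    std::thread t1([] { X.store(1, std::memory_order_seq_cst);
                        r1 = Y.load(std::memory_order_seq_cst); });
    std::thread t2([] { Y.store(1, std::memory_order_seq_cst);
                        r2 = X.load(std::memory_order_seq_cst); });
    t1.join(); t2.join();
}

// Returns true if the forbidden both-zero outcome was ever observed.
bool both_zero_ever(int trials) {
    for (int i = 0; i < trials; ++i) {
        run_once();
        if (r1 == 0 && r2 == 0) return true;
    }
    return false;
}
```

Note this only demonstrates the guarantee empirically; absence of a reordering over a finite number of trials is evidence, not proof, which is why memory models are specified on paper rather than by testing.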
 
Cycle- and Boolean-equivalent mean that the logic did not change. Instead, what Intrinsity did was optimize the implementation of the design, at the level of things such as transistor sizing and layout.

Thanks, that's what I suspected. It is indeed very different from customizing the core.
 
Basically, consistency is all about: if X and Y are true, then Z must be true. It concerns itself with how things are updated; a CPU may issue a stream of reads and writes, and the consistency model determines what you can rely on as far as the state of memory for those reads and writes.

X=0
Wr X=1
Rd X;

Can that "Rd X" return 0 or must it return 1?
Is there a consistency model that permits the failure to catch a read after write dependence, on a single CPU? I can see Rd X returning 0 if it was a different core doing it.

I've read up on models that allow reads to move around writes, but the descriptions seemed to cover the relative order of operations on different memory locations by different cores.
Coherence was attributed the behaviors of operations on a common location, and program order for behaviors on a single core.
 
http://www.semiaccurate.com/2010/08/17/details-emerge-about-nvidias-x86-cpu/

So despite nV demo'ing Linux running on a GF100, we shouldn't expect nV to release any x86-compatible product any time soon.

One big clue about how badly Nvidia lost is in the FTC settlement, under Section I. F., Other Definitions. It defines, "Compatible x86 Microprocessor" as, in part iii. as "that is substantially binary compatible with an Intel x86 Microprocessor WITHOUT using non-native execution such as emulation".

Based upon that language, they could ship an emulation-based x86 product tomorrow, if they so choose.
 
Is there a consistency model that permits the failure to catch a read after write dependence, on a single CPU? I can see Rd X returning 0 if it was a different core doing it.

In x86, a read from memory 'waits' for all pending stores to complete before it can execute (load/store unit magic aside). That is, x86 stalls on all potential RAW hazards wrt. memory.

A weak memory ordering model allows the loads to proceed without waiting for pending stores. If you can guarantee that your reads and writes don't alias this is a huge win since loads can start earlier. If loads and stores do (or can) alias, you need to use memory fences to guarantee consistency.
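The single-core question above can be illustrated with a toy store-buffer model (an illustrative sketch, not any real microarchitecture): a write goes into the core's store buffer before it reaches memory, and a load first searches that buffer. This "store-to-load forwarding" is why a core always sees its own write of X=1 on a subsequent Rd X, while another core reading memory directly can still see the old 0 until the buffer drains.

```cpp
#include <cstdint>
#include <deque>
#include <map>
#include <utility>

// Toy single-core model: stores are buffered before becoming globally visible.
struct Core {
    std::deque<std::pair<uint32_t, int>> store_buffer;  // (addr, value), oldest first
    std::map<uint32_t, int>& memory;                    // shared "global" memory

    explicit Core(std::map<uint32_t, int>& mem) : memory(mem) {}

    void write(uint32_t addr, int value) {
        store_buffer.emplace_back(addr, value);  // buffered, not yet in memory
    }

    // A load checks the store buffer first (youngest matching store wins):
    // store-to-load forwarding guarantees a core sees its own writes.
    int read(uint32_t addr) const {
        for (auto it = store_buffer.rbegin(); it != store_buffer.rend(); ++it)
            if (it->first == addr) return it->second;
        auto m = memory.find(addr);
        return m == memory.end() ? 0 : m->second;
    }

    void drain() {  // retire buffered stores to memory, in program order
        for (auto& [addr, value] : store_buffer) memory[addr] = value;
        store_buffer.clear();
    }
};
```

In this model, core A's Rd X after its own Wr X=1 can never return 0 (the buffer search catches it), which matches the intuition that no sane single-core model breaks RAW; but core B reading the same address before A drains still sees 0, which is exactly the cross-core surprise the weak-ordering discussion is about.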

IMO, the great jump in performance we saw from Core to Core 2 was largely because of the load/store unit in Core 2 supporting speculative loads, where load instructions can execute before the addresses of pending stores are known (and thus before any aliasing can be detected). This means you get a semantically strong memory-ordering model running at weak-memory-ordering speeds.

Cheers
 
Yup, you guys nailed it.

Also, Arun - note that ARM's 64-bit ISA is really just physical addressing extensions. So good luck with doing binary translation on anything that actually uses 4GB of VAS.

DK
 
Also, Arun - note that ARM's 64-bit ISA is really just physical addressing extensions. So good luck with doing binary translation on anything that actually uses 4GB of VAS.
Indeed - that's disappointing. I was expecting a true 64-bit ISA, but this obviously won't happen for at least another generation (2-3 years after Eagle). Presumably the number of registers has also stayed the same... This makes me seriously doubt all of Charlie's claims wrt NVIDIA creating an x86 translator for Eagle. It's not impossible, but if that project does exist, I'd tend to share much more of his skepticism about its potential given these constraints.

IIRC, you've been hearing about a 'New Transmeta' project within NV for a long time, out of curiosity (if you can say) had you heard of it being ARM/Eagle-based before Charlie's article?
 
Cortex-A15 announced (with little new information), the three lead licensees are Texas Instruments, Samsung, and ST-Ericsson. So NVIDIA isn't even a lead licensee for Eagle...

Either NV got a special deal with ARM to do what Charlie described and will wait to implement the A15 until their x86 translator is done, or more likely Charlie is just wrong from the first letter to the last and this entire thread discusses nothing more than a fabrication from Charlie's sources. The lack of x64 already made it less plausible, and now this seems to make it very improbable.

And before anyone says this means NVIDIA is giving up on Tegra - it's most likely just a financial decision. It's more expensive to be a lead licensee than to wait 9-12 months and license it then. Not being a lead licensee can also be a scheduling decision (too late for your refresh, too early to be a lead licensee for the next one). Alternatively, they could decide to stick to a quad-core A9, which would be disappointing and a competitive disadvantage in the high-end but not impossible. I think the financial justification is the most likely though.
 
Alternatively, they could decide to stick to a quad-core A9, which would be disappointing and a competitive disadvantage in the high-end but not impossible.

Would it be such a disaster? From what I can gather, the A15 is just a clock-bumped A9 (because of process tech?) with support for 40-bit physical addressing, as well as support for up to eight cores instead of four.

If you don't need >4GB of memory or more than four cores, there seems to be little difference.

*EDIT*: ignore the above, just found the other thread in the embedded forum

Cheers
 