NVIDIA's Project Denver (ARM-based CPU)

So your reasoning is "very high performance" is a relative statement modified by the "ARM" parameter in the sentence? Interesting interpretation, but I don't think anyone would really buy that. If your aim is supercomputing, a very high performance ARM core that had the strength of, say, a P4 Prescott (which handily kicks the ass of every ARM core out there right now), would not cut it.
 
The Cortex-A15 would kick the butt of the P4 pretty fricking hard (on the same process with the same design techniques). A 2.5GHz 8-core Cortex-A15 on 28nm could certainly be classified as "very high performance". Anyway, keep in mind NVIDIA's goal here is to associate this core with GPUs for a single-chip CUDA system, not to take over the HPC market by the sheer awesomeness of their ARM core.

My expectation (as highlighted in that upcoming article, which will get published as soon as I can get hold of Rys) is that Project Denver is a 4-instruction-decode architecture (A9 and A15 are 2 & 3 respectively), which compares to the 3 & 4 decoders of K8/K10 and Conroe/Sandy Bridge respectively. x86 decoders are more powerful than ARM decoders, though, because a single instruction can include both an arithmetic and a memory operation (not that this seems to help Intel Atom much, mind you; its performance is not very impressive for a dual-decode architecture - maybe because x86 suffers from only having two-operand instructions and (unlike x64) only 8 general-purpose registers).
 
So there are going to be 3 kinds of cores in the system: x86, ARM and "CUDA". :)

Let's hope the ARM and CUDA ones are as tightly integrated as in Larrabee. The x86 will prolly be the I/O co-processor. :)
 
I thought the difference in endianness was sorta like potato/potahto. Is there more to it?

Little endian = memory ordering does not depend on the word size used to access it
Big endian = memory ordering does depend on the word size used to access it

Basically big endian is at best a pain in the arse when designing hardware.
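To make that concrete, here's a minimal sketch in C (the values and names are my own, assuming a little endian host): the same address read back as 1, 2 or 4 bytes gives the same value as long as the upper bytes are zero, whereas on a big endian machine each access width would start at a different byte.

Code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint32_t word = 0x42;   // only the low byte is non-zero
    uint8_t  b;
    uint16_t h;
    memcpy(&b, &word, 1);   // read 1 byte at the same address
    memcpy(&h, &word, 2);   // read 2 bytes at the same address
    // little endian: prints 66 66 66; big endian would print 0 0 66
    printf("%u %u %u\n", (unsigned)b, (unsigned)h, (unsigned)word);
    return 0;
}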
 
Little endian = memory ordering does not depend on the word size used to access it
Big endian = memory ordering does depend on the word size used to access it

Basically big endian is at best a pain in the arse when designing hardware.

Interesting. Hadn't thought about it from that POV.

Makes me wonder why almost every RISC went the big endian way when x86 was doing little endian just fine.
 
Interesting. Hadn't thought about it from that POV.

Makes me wonder why almost every RISC went the big endian way when x86 was doing little endian just fine.

In general, for CPU ISAs it was more an issue of what the company already did, or what the engineers were comfortable with, than of rigorous engineering analysis.

AKA why is Power big endian? Because IBM was big endian.
Why was Alpha little endian? Because DEC was little endian.
Why was x86 little endian? Because Intel used DEC PDPs, which were little endian.
 
Little endian = memory ordering does not depend on the word size used to access it
Big endian = memory ordering does depend on the word size used to access it

Basically big endian is at best a pain in the arse when designing hardware.
Just use decreasing addresses (in effect, point at the end of structures/elements instead of the beginning) and it's the reverse.
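A small sketch of that decreasing-addresses trick (my own example values): address a big endian value from its end, and byte i (counting backwards) has weight 256^i, exactly like little endian addressed from the start.

Code:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t be[4] = {0x12, 0x34, 0x56, 0x78};  // 0x12345678 stored big endian
    const uint8_t *end = be + 4;               // point past the end
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)end[-1 - i] << (8 * i); // walk backwards with LE-style weights
    printf("0x%08x\n", (unsigned)v);           // prints 0x12345678
    return 0;
}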

Btw, compare the code for variable-length integers and figure out what's simpler to do in either hardware or code:

Code:
// big endian: most significant byte arrives first
uint64_t val = 0;
while (hasnextbyte())
  val = (val << 8) | nextbyte();

// little endian: least significant byte arrives first
uint64_t val = 0;
int shift = 0;
while (hasnextbyte()) {
  val |= (uint64_t)nextbyte() << shift;
  shift += 8;
}

Should explain why everything streaming/network chose big endian
 
Should explain why everything streaming/network chose big endian
Maybe the BSD sockets people were hooked on big endian workstations, that's why. :???:

Still no idea what in the name of God made the USB-IF choose big endian. RISCs were supposed to be dead and buried by then.
 
FWIW, ARM is bi-endian, and last I checked all the mainstream application processors are implemented as little endian... So I'm not sure this is the right thread for this conversation ;)
 
I'm sure the first targeted use for this is in consoles. A bunch of high-powered ARM CPUs combined with a huge GPU would be all you need for a fantastic next-gen console.

The one thing I don't know about is the effect of having super fast CPU->GPU communication. I read a bit on smallLUXGPU development and it seemed like one of the big bottlenecks was getting data back and forth to the GPU.

However...everything is getting super fragmented right now. Are game developers really going to program for...

- PS3: Cell processor
- Xbox 360: PowerPC
- NVIDIA Maxwell/Kepler
- Fusion/Sandy Bridge (e.g. normal x86 + GPU)

Seems like something radical needs to happen in the software space to make this less of a headache.

Use C + OpenCL + OpenGL and it will run on most platforms (about all except the Xbox 360?), and have HW-accelerated graphics AND physics/other calculations.
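For what it's worth, a minimal sketch of that approach: a tiny vector-add run through the standard OpenCL C API. The kernel source, buffer sizes and the CL/cl.h include path are my assumptions, and error handling is left out for brevity.

Code:
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void vadd(__global const float *a, __global const float *b,"
    "                   __global float *c) {"
    "    int i = get_global_id(0);"
    "    c[i] = a[i] + b[i];"
    "}";

int main(void) {
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_platform_id plat;  clGetPlatformIDs(1, &plat, NULL);
    cl_device_id dev;     clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx       = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q   = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

    printf("c[10] = %f\n", c[10]);   // expect 30.0
    return 0;
}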
 
Little endian = memory ordering does not depend on the word size used to access it
Big endian = memory ordering does depend on the word size used to access it

Big endian data is placed in a register in the same order as it appears in memory, so data is easier to read from a hex dump.
For me, big endian SIMD programming is easier, especially for packed unaligned data and bit fields. If you pick a bit field with a 16-bit chunk or wider, it will come out byte-swapped in the register (on a little endian machine).
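If it helps, here's roughly what that looks like when pulling bit fields out of a big endian packed stream on a little endian CPU. This is only a sketch with hypothetical names, using the GCC/Clang byte-order macros and __builtin_bswap32.

Code:
#include <stdint.h>
#include <string.h>

// Return `count` bits starting `bitpos` bits into `buf` (count <= 24, and the
// buffer must have at least 4 bytes available at that byte offset).
static uint32_t get_bits(const uint8_t *buf, unsigned bitpos, unsigned count)
{
    uint32_t word;
    memcpy(&word, buf + bitpos / 8, 4);   // unaligned 32-bit load
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    word = __builtin_bswap32(word);       // undo the swap on little endian hosts
#endif
    unsigned off = bitpos % 8;            // bits to skip from the top
    return (word >> (32 - off - count)) & ((1u << count) - 1);
}

On a big endian CPU the #if branch simply disappears and the load is already in stream order.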

Basically big endian is at best a pain in the arse when designing hardware.
There is no difference. Most RISC CPUs can be configured for either order (with a tiny piece of logic).
 
You could just as well write the bytes vertically, or right-to-left.

No you can't, not in a culture where you read from left to right anyway. It is the natural way to list things.

Little endian, reading the least significant byte first, is natural for arithmetic because of rippling carries (with early out), and for using the same address to load a byte, word, double word etc. from memory, which results in the same value if the higher bytes are zero.

Big endian, reading the most significant byte first, is natural for decision making, like sorting and network routing.
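A quick sketch of that sorting/routing point (example values are mine): if multi-byte keys are stored big endian, a plain memcmp() orders them the same way numeric comparison would, so byte-wise sorts and trie lookups need no conversion.

Code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void store_be32(uint8_t out[4], uint32_t v) {
    out[0] = v >> 24; out[1] = v >> 16; out[2] = v >> 8; out[3] = v;
}

int main(void) {
    uint8_t a[4], b[4];
    store_be32(a, 65536);
    store_be32(b, 255);
    // memcmp sees the most significant byte first, so the byte-wise
    // comparison agrees with the numeric one: a > b.
    printf("%d\n", memcmp(a, b, 4) > 0);   // prints 1
    return 0;
}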

LE saved a bit of work (transistors) for some tasks, BE saved a bit of work for some other tasks. In this day and age it matters f*ck all, IMHO.

Cheers
 
Eh? Big endian order typically reverses bytes within a word for the "convenience" of making them appear the same to a human reader, but the memory order _does_ change as a result. What's worse, the memory order changes depending on the word size used. There are even multiple flavours of big endian with different byte orders within words.

Npl, I'm a bit confused by your example; the memory ordering of byte data types is the same in both. Further, little endian allows you to optimise by using wider word shifts if they're available, without worrying about the effect that word width has on memory order.

This is all a bit moot anyway, as little endian is by far the most common these days, thank god :)

John.
 
No you can't, not in a culture where you read from left to right anyway.
If you're from a different culture you can. ;)
Anyway, my point was that the way we write things is irrelevant to hardware since it does not correspond to any specific physical orientation memory or registers may have.
 
Eh? Big endian order typically reverses bytes within a word for the "convenience" of making them appear the same to a human reader, but the memory order _does_ change as a result. What's worse, the memory order changes depending on the word size used. There are even multiple flavours of big endian with different byte orders within words.
Reverses relative to what? Your assumption that pointers have to address the least significant byte(s) (effectively assuming memory ordering has to be LE)? Are you depending on undefined behavior of C for your argument?
As long as you do it consistently it doesn't matter one bit (/byte). And those "weird multiple flavours of big endian" just aren't big endian.

Npl, I'm a bit confused by your example; the memory ordering of byte data types is the same in both. Further, little endian allows you to optimise by using wider word shifts if they're available, without worrying about the effect that word width has on memory order.
You can use bigger shifts with big endian as well, and the only thing that matters is memory ordering. But the important thing is that you have the simpler logic whether you read 1, 4 or 7 bytes.
With a big endian value you just shift the result left one byte (this can be adapted to 2, 4, n bytes per step, assuming you have a BE CPU or byte-swap after reading). With a little endian value you have to keep track of the number of bytes read and then shift the next byte by a variable amount.

Where exactly does LE have an advantage here?
 