View Full Version : NVIDIA's Project Denver (ARM-based CPU)
Pressure
05-Jan-2011, 21:36
Interesting, but in my opinion they really didn't have any choice. Between Intel, which decided to close the door on the lucrative chipset and integrated graphics market and made its (delayed) venture into the GPU market with Larrabee, and AMD, which purchased ATI, they were sitting ducks in a market that focuses more and more on tight integration of CPU, GPU and essential IO.
No details so far but it has a strong "Fusion"-feel to it, although they seem to be aiming at the integrated market for starters; tablets, phones and ultra-portables (even though they say it is meant for desktop).
I suppose we have reached a point where performance is enough for most mundane things.
Press Release:
NVIDIA Announces "Project Denver" to Build Custom CPU Cores Based on ARM Architecture, Targeting Personal Computers to Supercomputers
NVIDIA Licenses ARM Architecture to Build Next-Generation Processors That Add a CPU to the GPU
LAS VEGAS, NV -- (Marketwire) -- 01/05/2011 -- CES 2011 -- NVIDIA announced today that it plans to build high-performance ARM® based CPU cores, designed to support future products ranging from personal computers and servers to workstations and supercomputers.
Known under the internal codename "Project Denver," this initiative features an NVIDIA® CPU running the ARM instruction set, which will be fully integrated on the same chip as the NVIDIA GPU.
This new processor stems from a strategic partnership, also announced today, in which NVIDIA has obtained rights to develop its own high performance CPU cores based on ARM's future processor architecture. In addition, NVIDIA licensed ARM's current Cortex™-A15 processor for its future-generation Tegra® mobile processors.
"ARM is the fastest-growing CPU architecture in history," said Jen-Hsun Huang, president and chief executive officer of NVIDIA. "This marks the beginning of the Internet Everywhere era, where every device provides instant access to the Internet, using advanced CPU cores and rich operating systems.
"ARM's pervasiveness and open business model make it the perfect architecture for this new era. With Project Denver, we are designing a high-performing ARM CPU core in combination with our massively parallel GPU cores to create a new class of processor," he said.
Warren East, ARM chief executive officer said, "NVIDIA is a key partner for ARM and this announcement shows the potential that partnership enables. With this architecture license, NVIDIA will be at the forefront of next generation SoC design, enabling the Internet Everywhere era to become a reality."
About NVIDIA
NVIDIA (NASDAQ: NVDA) awakened the world to the power of computer graphics when it invented the GPU in 1999. Since then, it has consistently set new standards in visual computing with breathtaking, interactive graphics available on devices ranging from tablets and portable media players to notebooks and workstations. NVIDIA's expertise in programmable GPUs has led to breakthroughs in parallel processing which make supercomputing inexpensive and widely accessible. The Company holds more than 1,600 patents worldwide, including ones covering designs and insights that are essential to modern computing. For more information, see www.nvidia.com.
Coincidentally MS has confirmed an ARM version of Windows in the works...
metafor
05-Jan-2011, 23:10
We're getting to the point where ever increasing processor performance is of little consequence. Sandy Bridge's main claim to fame is its integration. The improvements in per-core execution speed may be significant but probably won't be the selling feature for the majority of people.
Microsoft's getting pretty smart by trying to make Windows and applications architecture agnostic. This allows them to escape the PC model in the future if they so choose.
CNCAddict
06-Jan-2011, 00:40
Performance will matter a great deal once speech recognition and realtime voice translation type apps are built. I just worry that the gap in processing power between what we have now and what you need for the next killer app is a large one :?:
codedivine
06-Jan-2011, 02:38
Performance will matter a great deal once speech recognition and realtime voice translation type apps are built. I just worry that the gap in processing power between what we have now and what you need for the next killer app is a large one :?:
Off-topic, but have you seen the demos of Google's speech-to-text on Android? It's entirely cloud based; after all, speech and text are both easily transported.
metafor
06-Jan-2011, 03:15
Performance will matter a great deal once speech recognition and realtime voice translation type apps are built. I just worry that the gap in processing power between what we have now and what you need for the next killer app is a large one :?:
I'm not sure how computationally intensive those are. Honestly, I think that kind of stuff is mostly IO and memory bound due to the large data structures. A dual-core 1GHz A9-class chip can probably handle such things without a sweat.
CNCAddict
06-Jan-2011, 04:39
I'm not sure how computationally intensive those are.
Understanding natural language can't be that easy especially with tons of background noise, but I would love to be wrong. Doing it with the cloud would for sure be much easier, but I get a weird feeling knowing that without a good cellular signal my phone becomes a useless brick. However that's probably something I need to get used to since it seems like the future :wink:
One day everything may be a service. Seems kinda silly to spend $400 on a graphics card that just sits idle 98% of the time. With a cloud setup and taking into account peak usage then I'm sure somewhere close to 50% utilization could be doable.
Performance will matter a great deal once speech recognition and realtime voice translation type apps are built. I just worry that the gap in processing power between what we have now and what you need for the next killer app is a large one :?:
I agree completely and think the gap exists.
metafor
06-Jan-2011, 06:20
Understanding natural language can't be that easy especially with tons of background noise, but I would love to be wrong. Doing it with the cloud would for sure be much easier but I get a weird feeling knowing that without a good cellular signal my phone becomes a useless brick...but that's probably something I need to get used to since it seems like the future :wink:
One day everything may be a service. Seems kinda silly to spend $400 on a graphics card that just sits idle 98% of the time. With a cloud setup and taking into account peak usage then I'm sure somewhere close to 50% utilization could be doable.
Oh it's not easy but I was speaking of computationally intensive as opposed to memory bound comparatively speaking in typical mobile SoC architectures.
Reading about Nvidia's ARM plans, and Microsoft's Windows 8 announcement, I'm wondering if we're not standing on the threshold of a gigantic paradigm shift in the realm of personal computing - are we even aware of the immense implications for the future this could have?
There's truly been no previous greater threat to Intel's position of absolute domination than Microsoft going ARM, coupled with the rise of (reasonably) powerful ARM chips.
This could be huge, huge, huge. Apple showed not just once, but thrice, that you CAN in fact switch basic hardware architecture, and do so quite successfully and painlessly! If the suits over in Satan Clara don't have the jitters already, they will soon I bet. :razz:
Personally I'm quite ready and willing to say FU to x86. It's lived long past its usefulness, the basic PC architecture is archaic and full of old crap that's dragging it down. Even things like the little endian binary format of x86, its stack-based FPU and so on just shows what a crazy fucked-up old system it really is. No, a clean re-start would be much preferable, and an end of Intel's domination of the semiconductor industry would be a great boon to us all too I bet.
Intel once tried to kill off x86 with Itanic - this was in retrospect a bad move. However, Intel's reaction, to bet the farm on x86 and put it into everything from supercomputers to PCs and graphics cards, down to portables and cell phones, really isn't much better.
CNCAddict
06-Jan-2011, 21:50
I'm sure the first targeted use for this is in consoles. A bunch of high powered ARM cpus combined with a huge GPU would be all you need for a fantastic next gen console.
The one thing I don't know about is the effect of having super fast CPU->GPU communication. I read a bit on smallLUXGPU development and it seemed like one of the big bottlenecks was getting data back and forth to the GPU.
However...everything is getting super fragmented right now. Are game developers really going to program for...
-Ps3-Cell processor
-xbox 360-PowerPC
-Nvidia Maxwell/kepler
-Fusion/Sandy Bridge (eg normal x86 + GPU)
Seems like something radical needs to happen in the software space to make this less of a headache.
entity279
06-Jan-2011, 21:53
Reading about Nvidia's ARM plans, and Microsoft's Windows 8 announcement, I'm wondering if we're not standing on the threshold of a gigantic paradigm shift in the realm of personal computing - are we even aware of the immense implications for the future this could have?
More competition is always good. That's why it's exciting to foresee ARM desktops. This and the continuing decrease of ISA importance.
I wouldn't try to read anything else beyond these points - I guess no immense implications for me :D
Satan Clara? :P
Anyway, Apple's example doesn't really compare here. There's nearly 20 years of apps since Windows 95 that can still run on modern OSes, and far more users of said apps than Apple ever had to deal with.
I'm not saying it's impossible. I'm just saying that just because Apple did it doesn't mean that Microsoft can do it.
That said, I wouldn't mind if a real third player entered the PC CPU market... We need more competition. Intel pretty much keeps AMD around just to avoid antitrust lawsuits, and I doubt they'll tolerate AMD for much longer. They haven't been a competitor for four and a half years. I'm hoping Fusion changes that in some way, but I doubt it. Somebody with an ARM license doing some serious damage to Intel would be good.
Myself, I don't give a damn about x86/ARM/etc. I just want CPUs to get better, and as long as AMD continues its slow slide into nothingness, the future of good CPUs is in danger. If someone can step up and take some of the weight off of AMD's shoulders, this would be good.
Edit: This is a reply to Grall.
nutball
06-Jan-2011, 22:07
There's truly been no previous greater threat to Intel's position of absolute domination than Microsoft going ARM, coupled with the rise of (reasonably) powerful ARM chips.
Ummm.... I'm guessing here that you don't remember the early 1990s then?
Myself, I don't give a damn about x86/ARM/etc. I just want CPUs to get better, and as long as AMD continues its slow slide into nothingness, the future of good CPUs is in danger. If someone can step up and take some of the weight off of AMD's shoulders, this would be good.
Edit: This is a reply to Grall.
I agree with this 100%. I actually have AMD CPUs in my desktop and HTPC, but laptops are all intel. It is getting difficult to come up with reasons to buy an AMD CPU. The current x6 I have though at least has proven its value in modeling where I can keep the cores busy. But now the new intel chips appear to be about as fast even with 4 cores.
It is funny that the underdog is now Nvidia. What will people do? Their heads will explode most likely :)
The one thing I don't know about is the effect of having super fast CPU->GPU communication. I read a bit on smallLUXGPU development and it seemed like one of the big bottlenecks was getting data back and forth to the GPU.
Yeah, you have a wtfpwn I/O interface in the PS3 (some 30-40ish GB/s or whatever on paper, way faster than actual framebuffer bandwidth anyway lol), almost all wasted because the clunky GPU Sony picked basically can't do GPGPU calculations...
Satan Clara? :P
Hah, I don't mean anything by it. It's just something that guy Mike whatsisface who fronted The Inquirer before he got ousted used to call Intel that I remembered as I was typing... :lol: I have all-Intel chips in my PCs right now, P4, Core2 Quad and i7...
Anyway, Apple's example doesn't really compare here. There's nearly 20 years of apps since Windows 95 that can still run on modern OSes
Yeah, but how many of those do people actually RUN, and how many of the really old ones still in use actually need cutting-edge performance? Most (meaning hugely vast majority) of the gigantic backlog of all x86 software ever made is obsolete ancient shit that's, well, been obsoleted, that nobody cares about anymore. You take any of the tens of thousands of DOS, win3.x and 9x apps in existence, you can't friggin' run 'em nowadays because modern windowses don't support 16-bit mode software. And that's a good thing too.
You can run 'em through dosbox, today, at a fraction of the speed of a modern system, but in most cases that would be quite enough.
I'm not saying it's impossible. I'm just saying that just because Apple did it doesn't mean that Microsoft can do it.
They don't need to get everything working. It's like making omelettes, you gotta break some eggs. And if your stuff got broken, then don't upgrade your hardware so that you can continue running your old voodoo stuff, or else upgrade your application to THIS century, and then it'll work on MS's new ARM-based Windows 8 just peachy... :twisted:
Somebody with an ARM license doing some serious damage to Intel would be good.
Nothing beats good ol' competition to bring good products onto the market. Heck, this theory proves itself over and over in the tech industry, in case anyone still doubts it... Just look at intel P4 -> AMD Opteron -> intel Core, IE6 -> Mozilla -> IE7, or Geforce FX -> ATI 9700 -> Geforce 6 and so on.
Myself, I don't give a damn about x86/ARM/etc.
I prefer when good solutions succeed over bad ones. PCs are full of bad solutions that are merely "good enough" so that they'll get the job done (any modern x86 CPU is a magnificent example of that, by brute-forcing performance using a shitty base architecture.)
A PC "franchise reset" using ARM would be wonderful IMO. Bring in J.J. Abrams, sprinkle lens flares liberally all over it...success! :lol:
Ummm.... I'm guessing here that you don't remember the early 1990s then?
Hm, what am I supposed to remember? OS/2? :lol: Yeah, big threat THAT was... Did it crack 5% market share at any point during its existence? Maybe it did, but still never left any lasting impression. Sad fact is, DOS and win3.x ruled during the early 90s, as magnificently crap as they both were.
Pretty sure the 1990s reference was to AMD owning intel. It was actually from about 95ish till 2001ish as I recall. Fastest x86s were OC'd, pencil-modded thunderbirds and Athlons.
aaronspink
07-Jan-2011, 03:57
Pretty sure the 1990s reference was to AMD owning intel. It was actually from about 95ish till 2001ish as I recall. Fastest x86s were OC'd, pencil-modded thunderbirds and Athlons.
Probably more a reference to WNT and all the various architectures it ran on back in the day (MIPS, alpha, etc).
codedivine
07-Jan-2011, 09:07
So I wonder if Nvidia might do real Fusion before even AMD? Wonder whether the GPU core will do CUDA and/or OpenCL and whether CPU and GPU will share address space?
rpg.314
07-Jan-2011, 09:26
So I wonder if Nvidia might do real Fusion before even AMD? Wonder whether the GPU core will do CUDA and/or OpenCL and whether CPU and GPU will share address space?
What do you mean by real fusion?
rpg.314
07-Jan-2011, 09:34
Reading about Nvidia's ARM plans, and Microsoft's Windows 8 announcement, I'm wondering if we're not standing on the threshold of a gigantic paradigm shift in the realm of personal computing - are we even aware of the immense implications for the future this could have?
I am split on the prospects of this thing. If they can get a CLR only app store going in time for this thing, then it might work out. If you think Intel will just sit aside and let MS walk all over their monopoly, think again. I bet Intel will crush this risk with their massive investments in arch and process tech. Also, within a year Medfield should be out and I remember intel people claiming that they will close the power gap with arm by 32 nm.
This could be huge, huge, huge. Apple showed not just once, but thrice, that you CAN in fact switch basic hardware architecture, and do so quite successfully and painlessly! If the suits over in Satan Clara don't have the jitters already, they will soon I bet. :razz:
That's because the number of important third party apps for mac is in the single digits. Apple does most of the non OS apps for mac.
Personally I'm quite ready and willing to say FU to x86. It's lived long past its usefulness, the basic PC architecture is archaic and full of old crap that's dragging it down. Even things like the little endian binary format of x86, its stack-based FPU and so on just shows what a crazy fucked-up old system it really is. No, a clean re-start would be much preferable, and an end of Intel's domination of the semiconductor industry would be a great boon to us all too I bet.
x87 has been deprecated for years now, even if nv hasn't gotten the memo. Also, what's wrong with little endian format?
Simon F
07-Jan-2011, 09:52
Also, what's wrong with little endian format?
Nothing IMHO. It's the sensible way to do things.
chavvdarrr
07-Jan-2011, 10:05
Probably more a reference to WNT and all the various architectures it ran on back in the day (MIPS, alpha, etc).
But none of these was even close to 5% market share.
aaronspink
07-Jan-2011, 10:45
But none of these was even close to 5% market share.
And hence the point...
rpg.314
07-Jan-2011, 12:08
Nothing IMHO. It's the sensible way to do things.
I thought that difference between endianness was sorta like potato/puhtato. Is there more to it?
I thought that difference between endianness was sorta like potato/puhtato. Is there more to it?
You can argue for big/little endian both ways (just turning your memory upside down enough times will make each of them logical in any situation).
However where little-endian (byte-order) CPUs break down is that the bit-order is for some reason big-endian, making consistent bitshifts impossible.
eg. a 16 bit word will be arranged this way:
76543210 FEDCBA98
if you consume bits from memory (think of streams) you want to shift them out, but it's impossible to get eg. 3210FEDC with simple shifts cause the bit-ordering is messed up.
Probably doesn't matter often enough in practice but it's still an incredibly stupid lack of consistency.
metafor
07-Jan-2011, 15:06
I thought that difference between endianness was sorta like potato/puhtato. Is there more to it?
The history of little endian is that RS232 and other serial connections sent bytes with bit0 first.
That kinda just stuck.
And nobody says puhtato.
However where little-endian (byte-order) CPUs break down is that the bit-order is for some reason big-endian, making consistent bitshifts impossible.
eg. a 16 bit word will be arranged this way:
76543210 FEDCBA98
if you consume bits from memory (think of streams) you want to shift them out, but it's impossible to get eg. 3210FEDC with simple shifts cause the bit-ordering is messed up.
Not at all. There is no memory order for individual bits because bits don't have an address. Bit shifts work just fine with little endian.
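A minimal C sketch of that point, with hypothetical helper names (`load_le16`, `load_be16`, `drop_low_nibble` are not from the thread): a shift acts on the value once it has been assembled from its bytes, so the same shift gives the same result whichever byte order the word had in memory. It only illustrates the value-level view and says nothing about how a particular bit-stream format chooses to pack its bits.

```c
#include <stdint.h>

/* Hypothetical helper: build a 16-bit value from two bytes stored
   little endian (low byte at the lower address). */
static uint16_t load_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: build a 16-bit value from two bytes stored
   big endian (high byte at the lower address). */
static uint16_t load_be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* The shift operates on the value, not on the memory layout. */
static uint16_t drop_low_nibble(uint16_t v) {
    return (uint16_t)(v >> 4);
}
```

Loading the value 0xFE98 from either layout and shifting it right by four bits gives the same 16-bit result in both cases.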
codedivine
07-Jan-2011, 21:02
What do you mean by real fusion?
Sharing of address space.
chavvdarrr
08-Jan-2011, 20:02
Why is this thread in CPU forum?
I thought Maxwell should be discussed in 3d Arch&chips?
Yeah the '90s were the times of RISC being seen as the Intel killer. MIPS, ARM, Digital, Sun, PowerPC, etc. In the end Intel went RISC too in their CPUs and everybody else was annihilated. :) Well, they are all still around, but they aren't in desktop machines really. MIPS is in tons of network stuff. ARM is running phones. PowerPC is server stuff and consoles...
AMD was not a serious competitor until Athlon came out in 1999. K6 was slow and on a shitty platform. K5 was neat but couldn't clock high enough. And before that they were pretty much an Intel second party. Intel was dominating personal computing with their CPU prices starting at around $300.
Right now we have a sort of mobile computing renaissance going on that is really creating a new computing future as it evolves. Also most people are buying notebooks now instead of desktops. So the focus is becoming power efficiency. Intel can't seem to get into the really small mobile devices because x86 doesn't seem to be able to scale down well enough even with their awesome manufacturing capabilities. Atom was essentially their attempt to do that. Fortunately for them netbooks arrived when it did otherwise I think Atom would have completely bombed.
hkultala
10-Jan-2011, 17:35
AMD was not a serious competitor until Athlon came out in 1999. K6 was slow and on a shitty platform. K5 was neat but couldn't clock high enough.
K6 was not shitty,
K6 had a very good integer core, BUT:
1) It did not have pipelined FPU
2) It had a worse memory architecture than Pentium 2; its L2 cache was far away, behind a slow bus.
(because AMD had to use the P5's bus protocol, they did not have a licence for P6's bus, and they did not have the resources & market momentum to create their own good bus)
With a similar memory architecture the K6 core held its own against P6 in integer performance (K6-3 vs Dixon, K6 getting much better IPC, but P6-based chips could clock a bit higher on the same mfg process)
And K5 was the fail, they designed a chip which had good IPC(which though was only some 5-15% better than K6's IPC) but it could only clock to half of the clock speed K6 could later clock with same mfg process.
What matters is performance = IPC * clock_rate, not either alone.
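As a rough worked example of that formula, here is a small C sketch. The function name `relative_performance` is made up for illustration, and the 1.10 IPC ratio is simply an assumed midpoint of the "5-15% better" figure quoted above, not a measured number.

```c
/* Relative performance of chip A versus chip B, following
   performance = IPC * clock_rate. Both arguments are ratios (A / B). */
static double relative_performance(double ipc_ratio, double clock_ratio) {
    return ipc_ratio * clock_ratio;
}
```

With an IPC ratio of 1.10 and a clock ratio of 0.5 (half the clock on the same process), `relative_performance(1.10, 0.5)` comes out at roughly 0.55, so under these assumptions the higher-IPC chip would still deliver only a bit more than half the performance.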
I said K6 was on a shitty platform and it was. Super 7 was terrible because of cheaply-built boards and low quality chipsets from VIA and ALI. Intel was so far ahead in platform quality that it was really quite incredible.
Athlon had the same problems for the first few years, pretty much until NVIDIA came into the picture with nForce. How many revisions of crap did VIA make? They didn't seem to get AGP and PCI working really well until like KT333!
If you could get your K6 system stable, usually by using 3dfx AGP cards, or by only using PCI, it was a nice CPU for everyone who didn't play a lot of 3D games. The low price, the result of not being capable of competing with Intel's quality and performance, was attractive to everyone though.
K5 was in some ways more interesting to me than K6 because it was entirely home-built. K6 was bought from Nexgen. K5 was AMD's first in-house design of a x86 CPU, and it was very advanced. It was similar to one of their non-x86 RISC CPUs and is one of the first RISC-like x86 chips. They just didn't really know how to bring everything together to beat the Intel monster. K6 didn't pull it off entirely either. AMD had to buy Alpha engineers to finally get serious.
straaljager
10-Jan-2011, 19:22
Why is this thread in CPU forum?
I thought Maxwell should be discussed in 3d Arch&chips?
Are you suggesting that Maxwell and Project Denver are the same? :wink: Makes sense. Couldn't it be Kepler instead?
I am not very optimistic about Win8 on ARM. Microsoft will probably bungle it all up. Are we heading to a world of fat Windows binaries?
rpg.314
19-Jan-2011, 18:15
http://channel.hexus.net/content/item.php?item=28540&page=2
Maxwell to use denver
Maxwell to use denver
Awesome.
Just...awesome. To think Nvidia is taking on Intel in the high-end CPU space, with Microsoft (silently, perhaps) backing them... Damn. That's just mindboggling news. Maybe there will be a day relatively soon when windows binaries will be dual ARM/x86.
DarthShader
20-Jan-2011, 16:11
Just...awesome. To think Nvidia is taking on Intel in the high-end CPU space, with Microsoft (silently, perhaps) backing them... Damn. That's just mindboggling news. Maybe there will be a day relatively soon when windows binaries will be dual ARM/x86.
What? I think you got too excited, take a cold shower now! :wink::grin:
That's what the piece claims... Very high performance CPU core for use in supercomputing.
Anyway, I've already had my cold shower(s) for the day. Took a nice sauna earlier, had a chat with a guy studying astrophysics, it was very interesting.
DarthShader
21-Jan-2011, 01:20
No, it says "Very high performance ARM CPU" and that's a big difference. :D
So your reasoning is "very high performance" is a relative statement modified by the "ARM" parameter in the sentence? Interesting interpretation, but I don't think anyone would really buy that. If your aim is supercomputing, a very high performance ARM core that had the strength of say, a P4 Prescott (which handily kicks the ass of every ARM core out there right now), would not cut it.
The Cortex-A15 would kick the butt of the P4 pretty fricking hard (on the same process with the same design techniques). A 2.5GHz 8-core Cortex-A15 on 28nm could certainly be classified as "very high performance". Anyway keep in mind NVIDIA's goal here is to associate this core with GPUs for a single-chip CUDA system, not to take over the HPC market by the sheer awesomeness of their ARM core.
My expectation (as highlighted in that upcoming article which will get published as soon as I can take a hold of Rys) is that Project Denver is a 4-instruction-decode architecture (A9 and A15 are 2 & 3 respectively) which compares to 3 & 4 decoders for K8/K10 and Conroe/Sandy Bridge respectively. Although x86 decoders are more powerful than ARM decoders because you could have an arithmetic and a memory operation in the same instruction (not that this seems to help Intel Atom much mind you, its performance is not very impressive for a dual-decoder architecture - maybe because x86 suffers from only having instructions with 2 operands and (unlike x64) only 8 generic registers).
rpg.314
21-Jan-2011, 18:13
so there are going to be 3 cores in the system, x86, arm and "cuda". :)
Let's hope the arm and cuda ones are as tightly integrated as in larrabee. The x86 will prolly be the io co-processor. :)
I thought that difference between endianness was sorta like potato/puhtato. Is there more to it?
Little endian = memory ordering does not depend on word size used to access it
Big Endian = memory ordering does depend on word size used to access it
Basically big endian is at best a pain in the arse when designing hardware.
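One way to read the claim above, sketched in C with hypothetical helpers (`read_le` and `read_be` are illustration names, and the byte arrays stand in for memory): a small value stored little endian can be read back from the same address at any width, while in big endian the address of the useful byte moves with the width of the store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `width` bytes (1..4) starting at p as a little-endian number:
   byte i carries weight 256^i. */
static uint32_t read_le(const uint8_t *p, size_t width) {
    assert(width >= 1 && width <= 4);
    uint32_t v = 0;
    for (size_t i = 0; i < width; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Read `width` bytes (1..4) starting at p as a big-endian number:
   the first byte is the most significant one. */
static uint32_t read_be(const uint8_t *p, size_t width) {
    assert(width >= 1 && width <= 4);
    uint32_t v = 0;
    for (size_t i = 0; i < width; i++)
        v = (v << 8) | p[i];
    return v;
}
```

With 42 stored as a 32-bit little-endian word (`2A 00 00 00`), `read_le` returns 42 at widths 1, 2 and 4 from the same starting address. With the big-endian layout (`00 00 00 2A`), a 1-byte read has to start at offset 3 and a 2-byte read at offset 2 to see 42.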
rpg.314
21-Jan-2011, 18:48
Little endian = memory ordering does not depend on word size used to access it
Big Endian = memory ordering does depend on word size used to access it
Basically big endian is at best a pain in the arse when designing hardware.
Interesting. Hadn't thought about it from that POV.
Makes me wonder why almost every RISC went the big endian way when x86 was doing little endian just fine.
aaronspink
21-Jan-2011, 20:40
Interesting. Hadn't thought about it from that POV.
Makes me wonder why almost every RISC went the big endian way when x86 was doing little endian just fine.
In general for CPU ISAs it was more an issue of what the company already did or what the engineers were comfortable with more than rigid engineering oversight.
AKA why is Power big endian? Because IBM was big endian.
Why was alpha little endian? because DEC was little endian.
Why was x86 little endian? because they used DEC PDPs which were little endian.
Little endian = memory ordering does not depend on word size used to access it
Big Endian = memory ordering does depend on word size used to access it
Basically big endian is at best a pain in the arse when designing hardware.
Just use decreasing addresses (in effect point at the end of structures/elements instead of the beginning) and it's the reverse.
btw compare the code for variable length integers and figure out what's simpler to do in either hardware or code:
// bigendian:
varint val = 0;
while (hasnextbyte())
    val = (val << 8) | nextbyte();
// little endian:
varint val = 0;
int shift = 0;
while (hasnextbyte()) {
    val += nextbyte() << shift;
    shift += 8;
}
Should explain why everything streaming/network chose bigendian
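For anyone who wants to try the two loops above, here is a runnable C sketch of them. It makes a few assumptions that are not in the post: a fixed byte buffer replaces the `hasnextbyte()`/`nextbyte()` stream, `uint64_t` stands in for `varint`, at most 8 bytes are decoded, and each byte is widened before shifting so the shift cannot overflow a plain `int`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big endian: the most significant byte arrives first, so the
   accumulator is shifted by a constant 8 bits per byte. */
static uint64_t decode_be(const uint8_t *bytes, size_t n) {
    assert(n <= 8);
    uint64_t val = 0;
    for (size_t i = 0; i < n; i++)
        val = (val << 8) | bytes[i];
    return val;
}

/* Little endian: the least significant byte arrives first, so a
   running shift amount has to be tracked alongside the value. */
static uint64_t decode_le(const uint8_t *bytes, size_t n) {
    assert(n <= 8);
    uint64_t val = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < n; i++) {
        val |= (uint64_t)bytes[i] << shift;
        shift += 8;
    }
    return val;
}
```

Decoding `12 34 56` with `decode_be` and `56 34 12` with `decode_le` both give 0x123456; the only difference is the extra `shift` variable in the little-endian loop.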
rpg.314
22-Jan-2011, 03:38
Should explain why everything streaming/network chose bigendian
Maybe the BSD sockets people were hooked onto big endian workstations, that's why. :???:
Still no idea what in the name of God made the USB-IF choose big endian. RISC's were supposed to be dead and buried by then.
FWIW, ARM is either-endian and last I checked all the mainstream application processors are implemented as little endian... So I'm not sure this is the right thread for this conversation ;)
hkultala
26-Jan-2011, 09:28
I'm sure the first targeted use for this is in consoles. A bunch of high powered ARM cpus combined with a huge GPU would be all you need for a fantastic next gen console.
The one thing I don't know about is the effect of having super fast CPU->GPU communication. I read a bit on smallLUXGPU development and it seemed like one of the big bottlenecks was getting data back and forth to the GPU.
However...everything is getting super fragmented right now. Are game developers really going to program for...
-Ps3-Cell processor
-xbox 360-PowerPC
-Nvidia Maxwell/kepler
-Fusion/Sandy Bridge (eg normal x86 + GPU)
Seems like something radical needs to happen in the software space to make this less of a headache.
Use C + openCL + openGL and it will run on most platforms(about all except xbox360?), and have HW accelerated graphics AND physics/other calculations.
Vitaly Vidmirov
27-Jan-2011, 12:57
Little endian = memory ordering does not depend on word size used to access it
Big Endian = memory ordering does depend on word size used to access it
Big endian data is placed in register as it is in memory.
Data is easier to read from hex dump.
For me, big endian SIMD programming is easier especially for packed unaligned data and bit fields. If you pick bit field with 16bit chunk or more, it will be swapped in register.
Basically big endian is at best a pain in the arse when designing hardware.
There is no difference. Most RISC CPUs can be configured to either order (with a tiny piece of logic)
Simon F
27-Jan-2011, 14:34
Big endian data is placed in register as it is in memory.
WTF :roll:
Vitaly Vidmirov
28-Jan-2011, 11:10
WTF :roll:
in memory: 11 22 33 44 55 66 77 88
in register:
BE 11223344, 55667788
LE 44332211, 88776655
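The dump above can be reproduced with a short C sketch. The helper names (`load_be32`, `load_le32`, `bswap32`) are hypothetical, and the loads are written out byte by byte so the result does not depend on the machine compiling the code.

```c
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes, first byte most significant. */
static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Assemble a 32-bit value from 4 bytes, first byte least significant. */
static uint32_t load_le32(const uint8_t *p) {
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

For the bytes `11 22 33 44 55 66 77 88`, `load_be32` gives 0x11223344 and 0x55667788, `load_le32` gives 0x44332211 and 0x88776655, and each little-endian value is the `bswap32` of the corresponding big-endian one.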
in memory: 11 22 33 44 55 66 77 88
You're just using a writing convention where the lowest byte address is on the left and the highest is on the right. You could just as well write the bytes vertically, or right-to-left.
You could just as well write the bytes vertically, or right-to-left.
No you can't, not in a culture where you read from left to right anyway. It is the natural way to list things.
Little endian, reading the least significant byte first is natural for arithmetic, because of rippling carries (with early out) and for using the same address to load a byte, word, double word etc. from memory which result in the same value if the higher bytes are zero.
Big endian, reading the most significant byte first is natural for decision making, like sorting and network routing.
LE saved a bit of work (transistors) for some tasks, BE saved a bit of work for some other tasks. In this day and age it matters f*ck all, IMHO.
Cheers
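Both halves of that argument can be sketched in C. The function names `add_le` and `compare_be` are invented for illustration, and the "early out" mentioned above is not modelled; the loop always walks all bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add two n-byte little-endian numbers into dst and return the final
   carry. Bytes are visited in increasing address order, which is also
   the order in which the carry ripples. */
static unsigned add_le(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t n) {
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i] + carry;
        dst[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
    return carry;
}

/* Compare two n-byte big-endian unsigned numbers. The first differing
   byte decides the order, which is what memcmp does on raw bytes. */
static int compare_be(const uint8_t *a, const uint8_t *b, size_t n) {
    return memcmp(a, b, n);
}
```

Adding the little-endian byte pairs `FF 00` (255) and `01 00` (1) yields `00 01` (256) with no carry out, and comparing the big-endian pairs `01 00` (256) and `00 FF` (255) with `compare_be` reports the first as larger without any byte reordering.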
Eh? Big endian order typically reverses bytes within a word for the "convenience" of making them appear the same to a human reader but the memory order _does_ change as a result. What's worse is that memory order changes depending on the word size used. There are even multiple flavours of big endian with different orders within words.
Npl, bit confused by your example, memory ordering of byte data types is the same in both, further little endian allows you to optimise by using wider word shifts if they're available without worrying about the effect that word width has on memory order.
This is all a bit moot anyway as little endian is by far the most common these days, thank god :)
John.
No you can't, not in a culture where you read from left to right anyway.
If you're from a different culture you can. ;)
Anyway, my point was that the way we write things is irrelevant to hardware since it does not correspond to any specific physical orientation memory or registers may have.
Simon F
28-Jan-2011, 14:12
in memory: 11 22 33 44 55 66 77 88
No, you have that back to front (at least in typical systems).
Eh? Big endian order typically reverses bytes within a word for the "convenience" of making them appear the same to a human reader but the memory order _does_ change as a result. What's worse is that memory order changes depending on the word size used. There are even multiple flavours of big endian with different orders within words.
Reverses relative to what? Your assumption that pointers have to address the least significant byte(s) (effectively assuming memory ordering has to be LE)? Are you depending on undefined behavior of C for your argument?
As long as you do it consistently it doesn't matter one bit (/byte), and those "weird multiple flavours of big endian" just aren't big endian.
Npl, bit confused by your example, memory ordering of byte data types is the same in both, further little endian allows you to optimise by using wider word shifts if they're available without worrying about the effect that word width has on memory order.
You can use bigger shifts with big endian as well, and the only thing that matters is memory ordering. But the important thing is that you have the simpler logic whether you read 1, 4 or 7 bytes.
With a big endian value you just shift the result left one byte (can be adapted to 2, 4, n bytes each step assuming you have a BE CPU or byteswap after reading). With a little endian value you have to keep track of the number of bytes read and then shift the next byte by a variable amount.
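A rough Python sketch of the two read loops described above. The function names `read_be` and `read_le` and the sample bytes are made up for illustration; this is byte-at-a-time software decoding, not a claim about what any particular CPU does.

```python
def read_be(byte_iter, n):
    """Accumulate n bytes, most significant first: constant shift per step."""
    acc = 0
    for _ in range(n):
        acc = (acc << 8) | next(byte_iter)
    return acc

def read_le(byte_iter, n):
    """Accumulate n bytes, least significant first: shift depends on position."""
    acc = 0
    for i in range(n):
        acc |= next(byte_iter) << (8 * i)
    return acc

data = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77])

for n in (1, 4, 7):
    print(n, hex(read_be(iter(data), n)), hex(read_le(iter(data), n)))
```

The BE loop needs no byte counter in the shift itself, which is the "simpler logic" being argued for; the LE loop has to track `i` to know how far to shift each new byte.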
where exactly does LE have an advantage here?
Vitaly Vidmirov
31-Jan-2011, 14:10
Eh? Big endian order typically reverses bytes within a word for the "convenience" of making them appear the same to a human reader but the memory order _does_ change as a result.
Order of what? And why does it change?
+00 11
+01 22
+02 33
+03 44
are placed in register as is
bit 0 ... 31
[11 22 33 44]
Imagine a bitfield data. With BE I can process it with big chunks of any width regardless of source data width.
MEM (4 consecutive bytes)
10110101 01010100 10110111 11110000
BE:
10110101 01010100 10110111 11110000
10110101 01010100 10110111 11110000 << 1
-------------------------------------------------------------------
01101010 10101001 01101111 11100001
LE:
11110000 10110111 01010100 10110101
11110000 10110111 01010100 10110101 << 1
--------
FAIL
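The same experiment as a Python sketch, assuming the four bytes form one MSB-first bit stream and that a plain logical shift (zero shifted in on the right) is meant. `shift_stream_left` is a hypothetical helper, not anything from a real library.

```python
MASK32 = 0xFFFFFFFF
mem = bytes([0b10110101, 0b01010100, 0b10110111, 0b11110000])

def shift_stream_left(buf, byteorder):
    """Load buf as one 32-bit word, shift left by 1, store it back."""
    word = int.from_bytes(buf, byteorder)
    shifted = (word << 1) & MASK32
    return shifted.to_bytes(len(buf), byteorder)

# Reference result: treat memory as one long MSB-first bit string,
# drop the first bit and append a zero.
bits = "".join(f"{b:08b}" for b in mem)
expected = int(bits[1:] + "0", 2).to_bytes(len(mem), "big")

be_result = shift_stream_left(mem, "big")
le_result = shift_stream_left(mem, "little")

print("BE ok:", be_result == expected)
print("LE ok:", le_result == expected)
```

With the LE load, the bit shifted out of one byte lands in the byte at the next higher address instead of the previous one, so the stream comes back scrambled.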
What's worse is that memory order changes depending on the word size used.
You can't pick up a byte from the same address as a word, but the order is still the same.
Why should you do it anyway?
further little endian allows you to optimise by using wider word shifts if they're available without worrying about the effect that word width has on memory order.
Look at the example above.
I don't actually care about bit ordering but I prefer BE =)
Order of what? And why does it change?
+00 11
+01 22
+02 33
+03 44
are placed in register as is
bit 0 ... 31
[11 22 33 44]
Either you call the MSB "bit 0" for some reason or you're describing LE.
Vitaly Vidmirov
09-Feb-2011, 17:08
Either you call the MSB "bit 0" for some reason or you're describing LE.
Because I use PowerPC bit numbering notation (in register): from MSB(0) to LSB(31/63)
Because I use PowerPC bit numbering notation (in register): from MSB(0) to LSB(31/63)
I would intuitively number them the other way round since that nicely corresponds to the bit value 2^x.
And if you reverse this, your bitfield shift example works with LE, too.
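One way to read "reverse this" is to pack the bit field LSB-first, so that stream bit i lives in bit (i mod 8) of byte (i div 8). That reading is an assumption, as are the helper names below. Under it, a wide LE load plus a right shift steps through the stream correctly:

```python
mem = bytes([0b10110101, 0b01010100, 0b10110111, 0b11110000])

def stream_bits_lsb_first(buf):
    """Unpack bytes into a list of bits, lowest bit of each byte first."""
    return [(byte >> k) & 1 for byte in buf for k in range(8)]

def pack_lsb_first(bits):
    """Inverse of stream_bits_lsb_first."""
    out = bytearray(len(bits) // 8)
    for i, bit in enumerate(bits):
        out[i // 8] |= bit << (i % 8)
    return bytes(out)

# Reference result: drop the first stream bit and append a zero.
bits = stream_bits_lsb_first(mem)
expected = pack_lsb_first(bits[1:] + [0])

le_result = (int.from_bytes(mem, "little") >> 1).to_bytes(len(mem), "little")
be_result = (int.from_bytes(mem, "big") >> 1).to_bytes(len(mem), "big")

print("LE ok:", le_result == expected)
print("BE ok:", be_result == expected)
```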
Man from Atlantis
19-Jul-2011, 13:03
1st Silicon with NVIDIA Project Denver is an 8-Core ARM, 256 CUDA Core APU? (http://www.brightsideofnews.com/news/2011/7/18/1st-silicon-with-nvidia-project-denver-is-an-8-core-arm2c-256-cuda-core-apu.aspx)
During the same month (December 2011), NVIDIA plans to tape out the first silicon based on Project Denver, which combines up to 8-core custom NVIDIA-ARM 64-bit CPU with a GeForce 600-class GPU. The company had a lot of issues in development of a CPU and the general consensus is that NVIDIA is taking a conservative approach with a single 28nm PD CPU design and the 28nm Fermi-based design (http://www.brightsideofnews.com/news/2010/1/18/nvidia-gf100-architecture-alea-iacta-est.aspx), i.e. the rumored Fermi-refresh in the form of notebook and lower-end desktop GeForce 600 Series cards (remember "GeForce 300"?). The interesting bit that we heard is that Project Denver is geared towards "full PhysX support", whatever that might be.
According to another source close to the subject, the target for the GPU part of the silicon is "at least 256 CUDA cores", which would put the product on par with AMD's Trinity APU, which will pair a Bulldozer-Enhanced CPU core with "Northern Islands" (http://www.techpowerup.com/135022/Cayman-Confirmed-To-Be-Using-VLIW4-SP-Arrangement-Redesigned-ROPs.html) VLIW4 architecture and will be the key APU for AMD in 2012. Compute power-wise, NVIDIA doesn't want to clock it sky-high, but rather to squeeze out as much IPC (instructions per clock) as possible. Still, it is realistic to expect 2.0-2.5GHz for the CPU and a similar clock for the GPU part, with the memory controller and the rest of the silicon working at a lower rate to keep everything well fed.
Unlike AMD's APU design, where CPU and GPU parts connect to the memory controller at the speed of DDR3 memory, Project Denver is looking for a more direct communication between CPU and GPU cores, i.e. relying on the best a GPU design can offer: a high-bandwidth connection. NVIDIA is not taking the conventional route with L1, L2 and L3 cache design; since GPUs have 1TB/s+ connections to their cache memory, a similar approach is rumored for the Denver core design as well. Just like in the GPUs, the memory controller takes the larger portion of the die and connects CUDA cores with CPU cores, and the CPU will have priority access to the bandwidth needed.
rpg.314
19-Jul-2011, 14:07
I thought Denver was meant for Maxwell.
But then, BSN just claimed an 8xA15 for T4 not too long ago. :)
trinibwoy
20-Jul-2011, 07:13
I didn't know BSN had started documenting Jen Hsun's greatest fantasies.
Why do people pay attention to BSN/Fudzilla/Dude who wants to suck AMD's cock every day, err, I mean. Semi-Accurate?
They are all trash.
Edit: Changed Charlie's description to be more accurate and less inflaming to AMD employees.
rpg.314
05-Aug-2011, 22:48
Charlie's take:
http://semiaccurate.com/2011/08/05/project-denver-is-more-than-a-t50-core/
http://semiaccurate.com/2011/08/05/what-is-project-denver-based-on/
entity279
06-Aug-2011, 07:31
Crusoe-style CPU? I don't know how he's gonna navigate his way out of this one when it is proven incorrect :D
I'm guessing it will be, because from where I'm standing this looks like a very "courageous" architectural decision and I would suspect nV would do something safer.
Argh, Charlie keeps referring to the Denver CPU itself as T50. That doesn't make any sense and it's so stupid it makes me want to bang my head against the wall. Txx refers to SoCs - it's as if you said the GF100 chip has 16 GF100 cores. And the rest of it is equally ridiculous. I could maybe believe that NV would do some fairly unusual things (e.g. speculative execution of both sides of a branch?) but Crusoe-style isn't one of them. I was nearly ready to take part of his Tegra 3/3.3/4 article seriously but this once again proves his sources in that area are completely worthless.
Crusoe-style CPU? I don't know how he's gonna navigate his way out of this one when it is proven incorrect :D
Clearly, it will be a case of NVIDIA changing its plans just to spite him, the lone ranger fighting the good fight. Alternately, there's enough squirming room for that statement, just like the OMAGAD NV DOES SW TESS OMAGAD thing.
Why do people give that man attention? He just wants to sleep with AMD and Linus Torvalds.
All day and all night.
Man from Atlantis
18-Jan-2012, 12:49
JHH: Denver is the world's first 64bit ARM processor and scheduled to ship in 2013
http://translate.google.com/translate?langpair=auto|en&u=http%3A%2F%2Fpc.watch.impress.co.jp%2Fdocs%2Fcolumn%2Fubiq%2F20120116_504845.html
Sounds very familiar to what was proposed in this talk.
http://mediasite.colostate.edu/Mediasite/SilverlightPlayer/Default.aspx?peid=22c9d4e9c8cf474a8f887157581c458a1d#
I.e. the three layers of heterogeneous computing.
One big core meant for main housekeeping and very fast single-thread performance. (CPU)
Some cores not that good in single-thread but close to the computing units. (ARM)
Lots of computing cores for very parallel tasks. (APU, or whatever the little units are called.)