Intel Atom Z600

The whole reason it is called ARM is because of the joint development with Apple, which caused them to spin out the ARM team from Acorn to allow neutral development. Apple started working on ARM basically before the first version (ARM2/3) shipped and used the second generation (ARM6 and later ARM7) in the Newton. This is somewhat known in the tech industry and confirmed from multiple sources including Wikipedia.

Or the whole reason it's called ARM is because it stood for "Acorn RISC Machine", which has nothing to do with Apple. Yes, Apple started showing interest in ARM in the late 80's as a mobile platform. But ARM development began 4 years before Newton development.

Saying it's well known and on Wikipedia is not actually giving a source. I think you're confused about what Apple's contribution actually was. The Wikipedia article on ARM shows nothing to corroborate your claims - only that Apple started working with Acorn long after ARM was first developed. That's not "helping create the ISA." The ISA was developed by Sophie Wilson, not Apple. The spin-off company ARM Ltd is completely separate from the ISA's development.
 
Thanks Simon, very interesting watch. I hope this clears up any doubts regarding Apple's involvement - Furber clearly says Apple "came knocking on the door" right after he left in 1990 (around 46:10).
 
I liked the bit about the full CPU simulator being only 800 lines of code. :)
 
We've clearly been using the wrong approach, it's better to have no money and no people available :ROFL:

Great trip down memory lane!

John.
 
Far from that. Jobs grew tired of trying to spin the clock-speed deficit against x86. That, and IBM screwed up a bunch of deadlines for the G5 (and Moto had not been particularly stellar with their G4 deliverables either). As you say, well-known things in the industry.



I said 'tablets'. Not hard to tell how they sell (given there's one brand that's actually selling). Also, not hard to tell how netbooks have been affected by the foray of said tablets:

[attached image: netbook-sales-3.jpg]

Some say a market's viability is measured by its growth. But I have no knowledge of such things.

Fine. Are game consoles also about the network architecture?


Oh wow. I wouldn't even try to comment on that.

Netbooks are dying out because of their high prices versus notebooks and their lack of performance. It also doesn't help that there haven't been many refreshes in over a year. A 1.6GHz Atom is going to perform just as shitty as a 1.2GHz Atom.

I would think Intel's next refresh, which should bring dual-core Atom chips, will help.


I mean, a Dell Mini 10 with a 1.2GHz Atom CPU, 1 gig of RAM and a three-year-old integrated Intel IGP isn't a very good deal at $300. Paying $400 for the same thing with a 1.6GHz CPU instead isn't great either.

Jump up to $350, right between those, and you get the Inspiron 11z. It comes with a bigger screen, 2 gigs of RAM, a 1.3GHz Celeron and a much newer Intel G45 IGP.


Netbooks just aren't a good value and many see that. The same might be said about the iPad and tablets in general 10 months from now, as people learn that they aren't powerful enough to replace their laptops.
 
Netbooks just aren't a good value and many see that. The same might be said about the iPad and tablets in general 10 months from now, as people learn that they aren't powerful enough to replace their laptops.
While I agree that netbooks are not particularly good value, I suspect you haven't actually tried an iPad yet. They are freakishly fast for what they do, and quite possibly the best web browsing device on the planet.

PS: Flash can go die in a barn fire, for all I care. Adobe had all their chances and blew them, like the little 'oh-look-we-have-the-Windows-desktop-by-the-balls-why-bother-about-embedded' snobs that they [strike]are[/strike] were.
 
I like throughput computing, but ARM is only marginally better suited to it than x86 ... and hell, no one is even using ARM for this, whereas x86 at least "has" Larrabee. ARM is actually moving away from throughput efficiency with diminishing-return speed-ups such as superscalar OoO execution and large caches ... once cores get that fat, where is the big advantage of ARM?


Creative's ARM-based Zii architecture has a throughput focus, doesn't it? We'll probably see the first mobile device supporting OpenCL before anything uses Zii, though.
 
Creative's ARM-based Zii architecture has a throughput focus, doesn't it? We'll probably see the first mobile device supporting OpenCL before anything uses Zii, though.

While Zii has an ARM as its general-purpose control core (like the PPC on Cell, and it's a very old and slow ARM at that), the big throughput compute array is anything but ARM.

Actually, I say that but I have no idea what it really is, and I think no one else really does either, unless more info has surfaced.

On the other hand, Furber did say (in the video Simon F just posted in this thread, at that) that he has a research project going with some utterly obscene number of ARM9 cores. Again, not comparable to modern ARM, but in some sense an ARM's an ARM. I do wonder if you could gain more with tinier/simpler cores. Furber has said a lot about ARM being super small and simple because they couldn't afford to make it complex, but it really does a lot of things that were quite extravagant for its time, even if much of it was just generalized solutions to things they needed to have on die anyway.

It seems to me that if you want high data throughput going for really wide SIMD makes the most sense, which would be accomplished either by having a bunch of cores with a shared instruction fetch/decode frontend (GPU shaders approach) or really wide vector instructions (Larrabee approach). If you wanted something with really high control throughput, like AI might be, and I think this is what Furber is doing, you might want the opposite extreme - a bunch of extremely small cores with tiny register files and really small/simple instructions (and not a lot of them, with what you have being specialized for the application). ARM as it exists in any incarnation doesn't seem to cater fantastically to either extreme, but I do think it does better than vanilla x86. And no, I don't consider Larrabee vanilla x86, the x86 part is barely more than a casual point of interest.
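To put a toy example on what I mean by amortizing fetch/decode over really wide SIMD, here's a trivial saxpy in plain C, scalar versus 4-wide SSE (purely my own sketch; Larrabee's vectors are 16-wide and the GPU approach shares one front end across a batch of threads instead, but the principle is the same):

[code]
#include <xmmintrin.h>   /* SSE intrinsics, standing in for "wide vector instructions" */

/* Scalar: one result per arithmetic instruction fetched and decoded. */
void saxpy_scalar(float *y, const float *x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* 4-wide SIMD: each multiply/add instruction now produces four results,
   so the fetch/decode and loop overhead is amortized over four lanes. */
void saxpy_sse(float *y, const float *x, float a, int n)
{
    __m128 va = _mm_set1_ps(a);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_storeu_ps(y + i, vy);
    }
    for (; i < n; i++)           /* scalar tail for the leftover elements */
        y[i] = a * x[i] + y[i];
}
[/code]

Widen the lanes (or share that one front end across more threads) and the per-result control overhead keeps shrinking, which is the whole appeal for data throughput.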

Main point is, for many-core you probably want something more specialized, but for right now we still need to run our existing general purpose code.
 
Real OT, but here's the snippet on Zii's compute array:

Media Processing Array - Architecture
  • High compute density SIMD architecture
  • 24 Processing Elements (PE) in 3 clusters
  • Each cluster runs the same or independent code
  • Multiple high-bandwidth memory paths
  • Advanced hierarchical cache structure
  • Random access to memory per PE
  • Shared access to ARM memory
  • Independent DMA controller per cluster
  • Integer, IEEE 32-bit and 16-bit floating point

Sounds a lot like shaders on various GPUs, right? But I'm far from the expert on these things like many here are. I'd guess those 8-per-cluster units are single-issue scalar units, but that still gives Zii grossly more shader ALU power than any portable 3D solution on the market (the units are said to run at 166MHz). DMA is nice too. Too bad that when you're doing 3D so much compute time is spent on texturing and other traditionally fixed-function tasks.
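For scale, if those PEs really are single-issue scalar units at 166MHz, back-of-the-envelope peak for the whole array is roughly 24 x 166MHz ≈ 4 billion scalar ops per second (nothing official, just arithmetic on the bullet points above).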
 
Don't know if you guys have read that one: http://www.imgtec.com/factsheets/SD...Development Recommendations.1.8f.External.pdf

I'm not sure how the Zii exactly handles integers, fp16 and fp32, but I'd say that it's safe to assume that at least for fp32 the ALUs act as scalar units.

SGX ALUs, on the other hand, as described in that document, can operate either as scalar 1x fp32 (highp), Vec2 fp16 (mediump) or Vec4 int8 (lowp). The developer recommendations for those precision levels from the document above are:

  • Use highp for vertex position and transformation matrices
  • Use highp or mediump for texture coordinates
  • Use lowp for normals and colours as long as the range is sufficient

That's of course mostly for SGX520-545 (USSE), SGX543 (USSE2) not included.
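Just to make those recommendations concrete (my own illustration, not something lifted from the document): a GL ES 2.0 vertex shader following them would declare its position attribute and MVP matrix as highp (e.g. highp vec4 a_position; highp mat4 u_mvp;), its texture coordinates as mediump or highp, and hand normals and vertex colours to the fragment shader as lowp varyings, as long as their values fit lowp's roughly ±2 range.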

It would be interesting to know how much die area the Zii takes up.
 
Don't know if you guys have read that one: http://www.imgtec.com/factsheets/SD...Development Recommendations.1.8f.External.pdf

I'm not sure how the Zii exactly handles integers, fp16 and fp32, but I'd say that it's safe to assume that at least for fp32 the ALUs act as scalar units.

SGX ALUs, on the other hand, as described in that document, can operate either as scalar 1x fp32 (highp), Vec2 fp16 (mediump) or Vec4 int8 (lowp). The developer recommendations for those precision levels from the document above are:



That's of course mostly for SGX520-545 (USSE), SGX543 (USSE2) not included.

It would be interesting to know how much die area the Zii takes up.

Even more OT, but the 4-way integer SIMD on USSE is actually 10-bit (1.1.8 format) rather than 8-bit, as can be seen in the description of lowp in the document you've linked. This does conflict with TI's description, but I take IMG more at their word, and I believe I've received direct confirmation of this before.

What's the highest end SGX we can consider really on the market right now, 535 still right? And that's 2x USSE1, no? So still only 8x int10 per clock at comparable clock speeds, which still pales in comparison to 24x on Zii.
 
It seems to me that if you want high data throughput going for really wide SIMD makes the most sense
I disagree, I think wide vectors are used exactly because people are using fat cores and this is the only way to make it work with them. The 5 wide VLIW cores on AMD GPUs are a better example of a core well suited for throughput floating point computing IMO (in practice still used in a SPMD setup, but I think that has to do with the history of GPU computing where having low branch granularity didn't impact efficiency much).

Compared to that both x86 and ARM are fat.
 
I disagree, I think wide vectors are used exactly because people are using fat cores and this is the only way to make it work with them. The 5 wide VLIW cores on AMD GPUs are a better example of a core well suited for throughput floating point computing IMO (in practice still used in a SPMD setup, but I think that has to do with the history of GPU computing where having low branch granularity didn't impact efficiency much).

Compared to that both x86 and ARM are fat.

But it's not 5-wide VLIW, it's 5x16-wide VLIW. As soon as they drop the x16, they'll have less density than NVIDIA!
 
Still more density than NVIDIA would have if it tried to drop the x16 ... let alone Larrabee if it tried to drop the x16.
 
I disagree, I think wide vectors are used exactly because people are using fat cores and this is the only way to make it work with them. The 5 wide VLIW cores on AMD GPUs are a better example of a core well suited for throughput floating point computing IMO (in practice still used in a SPMD setup, but I think that has to do with the history of GPU computing where having low branch granularity didn't impact efficiency much).

Compared to that both x86 and ARM are fat.

I think it's apples and oranges; the wide SIMD on modern x86 and ARM isn't what makes them fat. Larrabee, for instance, is much leaner and much wider, and there I think it's the x86 part that's more tacked on than the wide SIMD part.

I do still consider SPMD an example of the SIMD I'm getting at, and it's certainly making the cores much leaner (and not just out of heritage). VLIW obviously has its merits too, I didn't mean to exclude that; I was merely referring to a large number of operations per instruction fetch/decode.
 
VLIW obviously has its merits too, I didn't mean to exclude that; I was merely referring to a large number of operations per instruction fetch/decode.
A lot of them will be wasted in divergent kernels, though. With both x86 and ARM you have the choice between scalar (lots of overhead), SIMD (low branch granularity) and superscalar (fat). VLIW expands the design space, because for each VLIW instruction you can decide at compile time to use either superscalar or SIMD execution ... and when not bogged down with forward compatibility, VLIW can do superscalar much leaner (VLIW combined with forward compatibility is really the worst of all worlds in the end, i.e. Itanium).
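To make that compile-time choice concrete, here's a toy C fragment annotated two ways (purely illustrative; there's no real VLIW target here, the slot comments just show how a hypothetical 4-slot VLIW compiler could fill a bundle with either data-parallel or independent work):

[code]
/* "SIMD-style" fill: unroll by four so one bundle holds the same
   multiply applied to four adjacent elements (pure data parallelism).
   Tail handling for n not divisible by 4 omitted for brevity. */
void scale4(float *y, const float *x, float k, int n)
{
    for (int i = 0; i + 4 <= n; i += 4) {
        y[i]     = k * x[i];        /* slot 0 */
        y[i + 1] = k * x[i + 1];    /* slot 1 */
        y[i + 2] = k * x[i + 2];    /* slot 2 */
        y[i + 3] = k * x[i + 3];    /* slot 3 */
    }
}

/* "Superscalar-style" fill: four unrelated but mutually independent
   operations from a single iteration share one bundle instead. */
void stats(float *sums, float *prods, float *mins, float *maxs,
           const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++) {
        sums[i]  = a[i] + b[i];                 /* slot 0: add */
        prods[i] = a[i] * b[i];                 /* slot 1: mul */
        mins[i]  = a[i] < b[i] ? a[i] : b[i];   /* slot 2: min */
        maxs[i]  = a[i] > b[i] ? a[i] : b[i];   /* slot 3: max */
    }
}
[/code]

Same issue width either way; the compiler just trades branch granularity for independence depending on what the kernel gives it.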
 
A lot of them will be wasted in divergent kernels, though. With both x86 and ARM you have the choice between scalar (lots of overhead), SIMD (low branch granularity) and superscalar (fat). VLIW expands the design space, because for each VLIW instruction you can decide at compile time to use either superscalar or SIMD execution ... and when not bogged down with forward compatibility, VLIW can do superscalar much leaner (VLIW combined with forward compatibility is really the worst of all worlds in the end, i.e. Itanium).

Interesting discussion, I hope no one minds our hijacking too much ;)

I don't disagree with this, for the most part.

I do think that especially "lean" VLIW many-cores will have a lot of specialization per execution-unit, and therefore shouldn't present that much opportunity for SIMD.

One of the downsides of leaner VLIW is that you end up with much wider instructions that will inevitably have a bunch of execution unit NOPs in them. You can stitch them out like TI's C6x does, but then you end up with more complex variable length fetches (although nothing like x86, of course) and execution unit scheduling. If amortized over many cores this might not matter much.

From here the main thing separating VLIW from conventional superscalar is interlocking. Superscalar doesn't necessarily need it to be superscalar, but of course superscalar on x86 and ARM does, to stay backwards compatible. Stuff like this bites you, and out-of-order execution then bites you a lot more. SMT is a leaner solution to hiding latencies than OoE, along with large enough register files for software scheduling and perhaps features like the software loop pipelining capabilities in C6x (although those are probably overkill).

I guess it's worth comparing just how big N scalar cores vs 1 N-wide VLIW is when fetch/decode is amortized out. The VLIW has to be compressed to really have comparable code densities, if that's determined to even matter.

I agree that forward compatibility is awful for VLIWs.

A little more on topic: when regarding current x86 and ARM and which is "fatter", the real question to me is if there's anything about ARM that lends itself to leaner OoE than x86. Maybe someone else can comment, I don't have anything on this yet.
 
While I agree that netbooks are not particularly good value, I suspect you haven't actually tried an iPad yet. They are freakishly fast for what they do, and quite possibly the best web browsing device on the planet.

PS: Flash can go die in a barn fire, for all I care. Adobe had all their chances and blew them, like the little 'oh-look-we-have-the-Windows-desktop-by-the-balls-why-bother-about-embedded' snobs that they [strike]are[/strike] were.

I've used one and I like it, won't buy it 'cause it's Apple though.

However, some users may want to replace their laptop by buying an iPad and find that it just doesn't cut it, and at $500, the average price of an iPad, you can get a decent laptop.

I've seen this trend before. People had desktops and had to replace them, so they bought laptops. However, laptops weren't powerful enough to replace desktops unless you dropped a huge chunk of money. So OEMs made big laptops to put bigger, faster hardware in, and people bought those, but they were no longer very practical to take with you and the battery life was really bad. Now we have netbooks that have great battery life, so you'd want to keep them with you almost all the time. However, the processor sucks for anything outside of Word, some internet sites and 8-year-old games.


The same trend may happen with tablets. In fact, a friend of mine talked about wanting WoW on his iPad (before the iPad came out) and I had to sit and explain to him why he'd be waiting a long time and why he shouldn't get his hopes up.

The iPad and tablets may work a bit better than netbooks because they are closer to cell phones with a really big screen, while netbooks are closer to PCs with a really small screen. But they will still be limited and still face the same challenges that netbooks face.
 