Old 09-Sep-2010, 10:34   #1
argor
Junior Member
 
Join Date: Nov 2008
Posts: 96
ARM Cortex-A15, the successor to the ARM Cortex-A9

ARM has unveiled the Cortex-A15, the successor to the Cortex-A9.
The Cortex-A15 processor runs at up to 2.5GHz,
targeted at manufacture in 32nm, 28nm, and future geometries.
Lead licensee partners are Samsung, ST-Ericsson, and Texas Instruments.
More here: http://arm.com/about/newsroom/arm-un...pabilities.php
Old 09-Sep-2010, 11:19   #2
roninja
Member
 
Join Date: Feb 2002
Posts: 258

The Eagle has landed?

So this is clearly next-gen stuff. We'll have to see whether it gets paired with current or next-gen graphics, i.e. Mali or Mali-next vs. Tegra 2/3 vs. PowerVR Series 5 SGX XT or Series 6 (codename "the Daddy of all GPUs"; I made that up, btw).
Old 09-Sep-2010, 11:39   #3
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,921

I'll just quote myself from the NVIDIA x86 thread:
Quote:
Cortex-A15 announced (with little new information), the three lead licensees are Texas Instruments, Samsung, and ST-Ericsson. So NVIDIA isn't even a lead licensee for Eagle...

Either NV got a special deal with ARM to do what Charlie described and will wait to implement the A15 until their x86 translator is done, or more likely Charlie is just wrong from the first letter to the last and this entire thread discusses nothing more than a fabrication from Charlie's sources. The lack of x64 already made it less plausible, and now this seems to make it very improbable.

And before anyone says this means NVIDIA is giving up on Tegra - it's most likely just a financial decision. It's more expensive to be a lead licensee than to wait 9-12 months and license it then. Not being a lead licensee can also be a scheduling decision (too late for your refresh, too early to be a lead licensee for the next one). Alternatively, they could decide to stick to a quad-core A9, which would be disappointing and a competitive disadvantage in the high-end but not impossible. I think the financial justification is the most likely though.
A bit unfortunate that ARM decided not to reveal any architectural detail: they mention FPU/NEON improvements (it's finally quad-MAC, like Snapdragon) but don't say anything about the integer portion. I'm starting to fear it's very incremental and that they did not increase the issue width in any way.

EDIT: See my post just below.
__________________
"[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions."
Old 09-Sep-2010, 12:51   #4
argor
Junior Member
 
Join Date: Nov 2008
Posts: 96

http://www.theregister.co.uk/2010/09/09/new_arm/
Quote:
The Cortex A15 will consume the same energy as today's ARM chips, but will sport as many as 16 cores
Old 09-Sep-2010, 13:01   #5
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,921

Okay, forget everything I said: the integer pipeline is now 3-issue, whereas the A9 was 2-issue. Here's by far the best article I've found so far: http://www.electronicsweekly.com/Art...cortex-a15.htm
Quote:
Originally Posted by EW
"The goal was 50% better performance than the A9 at the same geometry with the same clock," ARM v-p of marketing Peter Hutton told EW. "We have seen 2x and 3x in certain applications."

These performance improvements come from updates including a three issue pipeline compared with the A9's dual issue, and changes to memory interfacing.
Presumably the 2x to 3x figure is partially based on NEON going from 64-bit to 128-bit ALUs, but more than 50% higher performance per clock than the A9 for integer code would be very impressive indeed.

Regarding the number of cores: a single cluster is still 4 cores max (and 4MB L2 whereas the A9 supported 8MB interestingly enough, presumably for coherency reasons?), but it's now possible to put multiple clusters on the same chip. I don't know if there's a hard limit of 4 clusters of 4 cores (i.e. 16) or if that's just marketing. And never take power figures seriously unless it's very clear they are for the same process at the same clock or performance target.
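As a back-of-envelope check on how much of the claimed 2x-3x the NEON widening alone could explain: the 2.5GHz is the announced clock ceiling, and the lane counts are just the usual reading of 64-bit vs 128-bit fp32 datapaths, assuming one MAC (2 flops) per lane per cycle as an idealised upper bound.

```python
# Idealised peak fp32 throughput: one MAC (2 flops) per lane per cycle.
def peak_gflops(simd_bits, clock_ghz, flops_per_lane_per_cycle=2):
    lanes = simd_bits // 32            # fp32 lanes in the SIMD datapath
    return lanes * flops_per_lane_per_cycle * clock_ghz

a9_style = peak_gflops(64, 2.5)        # 64-bit ALUs:  2 lanes -> 10.0 GFLOPS
a15_style = peak_gflops(128, 2.5)      # 128-bit ALUs: 4 lanes -> 20.0 GFLOPS
print(a9_style, a15_style)             # widening the ALUs alone doubles peak
```

So the wider NEON datapath on its own accounts for at most a 2x best case on vector code; anything beyond that has to come from the 3-issue integer pipeline and the memory changes.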
Old 09-Sep-2010, 13:35   #7
Entropy
Senior Member
 
Join Date: Feb 2002
Posts: 2,009

Quote:
Originally Posted by Arun View Post
Okay, forget everything I said: the integer pipeline is now 3-issue, whereas the A9 was 2-issue. Here's by far the best article I've found so far: http://www.electronicsweekly.com/Art...cortex-a15.htm
Presumably the 2x to 3x figure is partially based on NEON going from 64-bit to 128-bit ALUs, but more than 50% higher performance per clock than the A9 for integer code would be very impressive indeed.

Regarding the number of cores: a single cluster is still 4 cores max (and 4MB L2 whereas the A9 supported 8MB interestingly enough, presumably for coherency reasons?), but it's now possible to put multiple clusters on the same chip. I don't know if there's a hard limit of 4 clusters of 4 cores (i.e. 16) or if that's just marketing. And never take power figures seriously unless it's very clear they are for the same process at the same clock or performance target.
Actually, I have no problem believing the latter part of the quote that attributes the large improvements to the upgraded memory hierarchy.
"These performance improvements come from updates including a three issue pipeline compared with the A9's dual issue, and changes to memory interfacing."
The memory subsystems of these SoCs are quite constraining...
Old 09-Sep-2010, 14:58   #8
mboeller
Member
 
Join Date: Feb 2002
Location: Germany
Posts: 879

Quote:
Originally Posted by Entropy View Post
Actually, I have no problem believing the latter part of the quote that attributes the large improvements to the upgraded memory hierarchy.
"These performance improvements come from updates including a three issue pipeline compared with the A9's dual issue, and changes to memory interfacing."
The memory subsystems of these SoCs are quite constraining...
...and with a 128-bit AMBA4 interface, that limit is lifted, it seems.

Old 09-Sep-2010, 15:27   #9
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,921

Quote:
Originally Posted by Entropy View Post
Actually, I have no problem believing the latter part of the quote that attributes the large improvements to the upgraded memory hierarchy.
"These performance improvements come from updates including a three issue pipeline compared with the A9's dual issue, and changes to memory interfacing."
The memory subsystems of these SoCs are quite constraining...
Right, but the hierarchy itself is actually completely unchanged (32+32KB L1, shared L2, no L3). The key word is 'interfacing' and that presumably refers to cache performance, load/store units, and/or AMBA 4 as mboeller said. But AMBA 3 already had a 64-bit bus, so in theory you'd be limited by external memory first - in practice, I suppose things can be very different. Alternatively maybe they're really comparing different memory controllers (since ARM licenses those too) although I doubt it.

On a slightly related note, it's interesting that L1/SRAM ECC is now mandatory (I believe it was an option that nobody used on the A9 but I'm not sure).
Old 09-Sep-2010, 15:57   #10
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,424

If this core is going to target servers, ECC would be necessary, particularly if the L1 and L2 are not inclusive.
Then again, the PAE-like address extensions seem to set ARM up to at most replace 32-bit x86 servers that haven't been replaced by x86-64 chips in the last 7 or so years, which doesn't sound like a large niche. Perhaps it's not so much servers as some other market that needs a bit more than 4GiB of memory?

Then there is the expectation that the susceptibility of SRAM to soft errors is going to get much worse at the future nodes this design targets. I have seen it alleged that the error rates at the leading edge for SRAM are worse than DRAM already.
__________________
Dreaming of a .065 micron etch-a-sketch.
Old 09-Sep-2010, 16:09   #11
frogblast
Junior Member
 
Join Date: Apr 2008
Posts: 77

Quote:
Originally Posted by 3dilettante View Post
If this core is going to target servers, ECC would be necessary, particularly if the L1 and L2 are not inclusive.
Then again, the PAE-like address extensions seem to set ARM up to at most replace 32-bit x86 servers that haven't been replaced by x86-64 chips in the last 7 or so years, which doesn't sound like a large niche. Perhaps it's not so much servers as some other market that needs a bit more than 4GiB of memory?

Then there is the expectation that the susceptibility of SRAM to soft errors is going to get much worse at the future nodes this design targets. I have seen it alleged that the error rates at the leading edge for SRAM are worse than DRAM already.
Consider it in the context of the virtualization extensions, where each instance has 4GB of address space, but the summed memory usage of all instances can exceed 4GB. In that model, only the hypervisor needs to be aware of PAE, which is actually a reasonable expectation.
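The arithmetic behind that model is simple enough to sketch; this is illustrative only, with 40 bits being LPAE's physical address width:

```python
# Each 32-bit guest addresses at most 4 GiB, but the hypervisor places
# guests anywhere in the 40-bit (1 TiB) LPAE physical address space.
GiB = 1 << 30
guest_limit = 4 * GiB            # per-instance virtual address space
physical_space = 1 << 40         # 40-bit physical addressing = 1 TiB

max_full_guests = physical_space // guest_limit
print(max_full_guests)           # 256 fully populated 4 GiB instances
```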
Old 09-Sep-2010, 16:13   #12
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,921

Quote:
Originally Posted by 3dilettante View Post
Then again, the PAE-like address extensions seem to set ARM up to at most replace 32-bit x86 servers that haven't been replaced by x86-64 chips in the last 7 or so years, which doesn't sound like a large niche. Perhaps it's not so much servers as some other market that needs a bit more than 4GiB of memory?
I think the idea is that each virtualisation instance can have up to 4GB, so for highly virtualised servers (which is not a negligible bit of the market nowadays) the lack of x64 isn't a big issue. As I said before, it is a disappointment, but various trade-offs aren't a show-stopper as the intended market is one that must be knowledgeable about the additional complexities of using a non-x86 solution.

Quote:
Then there is the expectation that the susceptibility of SRAM to soft errors is going to get much worse at the future nodes this design targets. I have seen it alleged that the error rates at the leading edge for SRAM are worse than DRAM already.
Ah yes, good point. That might be a good reason not to bother with a non-ECC version.
Old 09-Sep-2010, 16:16   #13
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,424

Quote:
Originally Posted by frogblast View Post
Consider it in the context of the virtualization extensions, where each instance has 4GB of address space, but the summed memory usage of all instances can exceed 4GB. In that model, only the hypervisor needs to be aware of PAE, which is actually a reasonable expectation.
Is any of that problematic if the chip skipped ahead to 64 bits?
PAE may fit that use case, but why target that one situation at the expense of being marketable for all the others?
Old 09-Sep-2010, 16:18   #14
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,921

Quote:
Originally Posted by 3dilettante View Post
Is any of that problematic if the chip skipped ahead to 64 bits?
PAE may fit that usage case, but why target that one situation at the expense of being marketable for all the other ones?
The A15 uses the same ISA version as the A8 and A9. They're being surprisingly conservative about ISA changes. Why? Who knows... but obviously ARMv8 should be 64-bit in two or three years.
Old 09-Sep-2010, 16:23   #15
roninja
Member
 
Join Date: Feb 2002
Posts: 258

Will be interesting to see what Intel fires back with at IDF next week. Roadmaps are heating up in the SoC space, and there can be multiple winners in this game, I think.
Old 09-Sep-2010, 16:25   #16
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,424

It may be that ARM is taking it slow in revising the ISA, or that it has not yet validated a fully extended set.
The downside is that it is attempting to match x86 capabilities by also repeating a chapter of history in x86 that few remember fondly. It does write off a large swath of the server market that was similarly walled off from x86 until x86-64.
Old 09-Sep-2010, 17:17   #17
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,952

Quote:
Originally Posted by Arun View Post
The A15 uses the same ISA version as the A8 and A9. They're being surprisingly conservative about ISA changes. Why? Who knows... but obviously ARMv8 should be 64-bit in two or three years.
I don't think it'll be exactly the same ISA; they'll need some extension set to facilitate hardware virtualization at least.

I wouldn't put much stock in the ISA version number vs. the extension set in terms of how substantial a change is. There isn't much of a difference between ARMv4 and ARMv5, for instance; all of the big differences came in the optional extension sets.

LPAE does seem like a really incremental move towards attracting the server market, but it was probably not that hard for them to implement.

I kind of wonder if some intermediate approach to getting >32-bit virtual addresses would be appropriate. Including full 64-bit registers and ALUs seems like kind of a waste. It'd make sense to have a mode where registers (possibly just some) are extended to some larger size (40 bits? 48 bits?) where only the AGUs operate on the upper bits. It might be slightly tricky to get compilers working with it, and it would be a little limited, but at least it wouldn't incur the waste of full 64-bit registers and ALUs, and it would potentially have a much smaller impact on the ISA (maybe just an instruction to move the low x bits of a register into the upper x bits of the extended register would be sufficient).
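A toy model of that hypothetical scheme, to make it concrete. This is entirely speculative, not an ARM feature; the 40-bit width and the `movhi` instruction name are made up here for illustration:

```python
# Toy model: general registers stay 32-bit, but a "move low bits into the
# upper half" instruction builds a wider value that only the
# address-generation unit (AGU) ever operates on. Purely hypothetical.

ADDR_BITS = 40                          # hypothetical extended address width
LOW_MASK = (1 << 32) - 1

def movhi(extended_reg, src_reg, x=ADDR_BITS - 32):
    """Copy the low x bits of src_reg into the upper x bits of extended_reg."""
    hi = (src_reg & ((1 << x) - 1)) << 32
    return hi | (extended_reg & LOW_MASK)

def agu_address(extended_reg, offset):
    """Only the AGU sees (and wraps at) the full 40-bit width."""
    return (extended_reg + offset) & ((1 << ADDR_BITS) - 1)

reg = movhi(0, 0xAB)                    # upper 8 bits of the address = 0xAB
print(hex(agu_address(reg, 0x1000)))    # 0xab00001000
```

The ALUs and register file stay 32 bits wide; only the single extra instruction and the AGU datapath grow, which is the point of the idea.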
Old 09-Sep-2010, 17:21   #18
Entropy
Senior Member
 
Join Date: Feb 2002
Posts: 2,009

Quote:
Originally Posted by Arun View Post
Right, but the hierarchy itself is actually completely unchanged (32+32KB L1, shared L2, no L3). The key word is 'interfacing' and that presumably refers to cache performance, load/store units, and/or AMBA 4 as mboeller said. But AMBA 3 already had a 64-bit bus, so in theory you'd be limited by external memory first - in practice, I suppose things can be very different. Alternatively maybe they're really comparing different memory controllers (since ARM licenses those too) although I doubt it.

On a slightly related note, it's interesting that L1/SRAM ECC is now mandatory (I believe it was an option that nobody used on the A9 but I'm not sure).
Well, the diagram above clearly states "128-bit AMBA4 Advanced Coherent Bus Interface". Lacking any further insight, I assumed twice the width and probably better handling, which, for a number of scenarios, could in and of itself yield a 2-3x improvement.
I freely admit that I haven't dug into any meatier documents (and won't have time until Sunday at the earliest), so I could be jumping to conclusions. Probably am.
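The width argument is easy to sanity-check. The 400MHz bus clock here is purely illustrative; the point is only the scaling:

```python
# Peak bus bandwidth scales linearly with width at the same clock.
def peak_gb_per_s(bus_bits, clock_mhz):
    return bus_bits / 8 * clock_mhz * 1e6 / 1e9

amba3_64 = peak_gb_per_s(64, 400)      # 64-bit bus:  3.2 GB/s
amba4_128 = peak_gb_per_s(128, 400)    # 128-bit bus: 6.4 GB/s
print(amba4_128 / amba3_64)            # 2.0: width alone doubles the peak
```

Anything past 2x would have to come from better handling (coherency, outstanding transactions), not raw width.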
Old 09-Sep-2010, 17:38   #19
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,952

I wonder if SoCs with quad-core Cortex-A15 will employ their own L3 cache. That could be a good reason for the move to the wider AMBA4 bus, because I doubt we'll be seeing 128-bit (paired or otherwise) memories on mobile devices any time soon. L3 in this arrangement probably wouldn't be much worse than if it were included in an ARM-provided, internally interfaced cell instead.

SoCs are already free to share the L2 with other things (as NVIDIA is doing in Tegra 2), so it seems there's more flexibility here than with the typical multicore designs we're used to.

One feature I hope Cortex-A15 will have is the ability to share one NEON core among multiple CPU cores. The designs are fairly decoupled as it is, so I hope this will be a possibility. Having a NEON unit for every core seems like overkill, especially with 4-way floating point now; I hate to think how much die space it'll take up. Having only one NEON unit for 4 cores should be quite good for a number of workloads: sharing between separate cores could help hide latency, so you'd potentially get better utilization than with one unit per core, although you'd still need the register set and some other context duplicated (and hence separated from the main NEON functional units).

I'm just concerned that we'll see the alternatives instead: cores with no access to NEON, which turns into a big OS problem (although I suppose NEON instructions could be trapped and the thread rescheduled to a core that has it), or worse, no NEON at all, as in Tegra 2, which IMO is going to turn into a compatibility/market-segmentation problem.

Last edited by Exophase; 09-Sep-2010 at 17:46.
Old 09-Sep-2010, 18:30   #20
metafor
Member
 
Join Date: May 2010
Posts: 463

Quote:
Originally Posted by Exophase View Post
I don't think it'll be exactly the same ISA, they'll need some extension set to facilitate hardware virtualization at least.

I wouldn't put much stock in ISA version number vs extension set in terms of how substantial it is. There isn't very much of a difference between ARMv4 and ARMv5, for instance, all of the big differences came in the optional extension sets.

LPAE does seem like a really incremental move towards attracting the server market, but it was probably not that hard for them to implement.

I kind of wonder if some intermediate approach towards getting > 32-bit virtual addresses would be appropriate. Including full 64-bit registers and ALUs seems like kind of a waste. It'd make sense to have a mode where registers (possibly just some) are extended to some larger bit size (40 bits? 48 bits?) where only the AGUs operate on the upper bits. Might be slightly tricky to get compilers working with and would be a little limited, but at least wouldn't incur the waste of full 64-bit registers and ALUs and would potentially be a much smaller impact to the ISA (maybe just an instruction to move the low x bits of a register into the upper x bits of the extended register would be sufficient)
I believe the extension for this is called VMSAv7. It provides 64-bit page descriptors and an added level of translation, using the result of the old translation as a pointer to the new table (and, of course, supported in hardware).
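A toy illustration of that two-stage walk; the table format and sizes here are simplified for clarity, not the actual VMSA descriptor layout:

```python
PAGE = 4096

def translate(va, stage1, stage2):
    """Stage 1 (guest tables) yields an intermediate physical address;
    stage 2 (hypervisor tables) maps that to the real physical address."""
    ipa = stage1[va // PAGE] * PAGE + va % PAGE
    return stage2[ipa // PAGE] * PAGE + ipa % PAGE

stage1 = {0: 5}       # guest maps its virtual page 0 to intermediate page 5
stage2 = {5: 1234}    # hypervisor relocates intermediate page 5 to page 1234
print(translate(0x123, stage1, stage2) == 1234 * PAGE + 0x123)
```

The guest only ever manages stage-1 tables; the hypervisor owns stage 2, which is how each 32-bit instance stays oblivious to where it actually lives in physical memory.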
Old 09-Sep-2010, 19:26   #21
Grall
Invisible Member
 
Join Date: Apr 2002
Location: La-la land
Posts: 6,696

I wonder how fast a consumer-level Ethernet router could be with a chip like this driving it... Considering that current top-end routers from D-Link, Netgear and so on employ 200-400MHz-ish ARM chips of generally unknown generation and capability (ARM 6-7, probably) and route 100MB/s and up, perhaps we could see full gigabit line speed from a multicore 2.5GHz CPU.
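A rough cycle budget supports that hunch. Standard Ethernet framing overheads and the announced 2.5GHz ceiling; everything else is back-of-envelope:

```python
def line_rate_pps(link_bps, frame_bytes):
    # On the wire each frame also carries an 8-byte preamble and a
    # 12-byte inter-frame gap.
    wire_bytes = frame_bytes + 20
    return link_bps // (wire_bytes * 8)

pps = line_rate_pps(10**9, 1518)       # full-size frames on gigabit: 81,274/s
cycles_per_packet = 2_500_000_000 // pps
print(pps, cycles_per_packet)          # ~30k cycles per packet of headroom
```

At minimum-size 64-byte frames the rate rises to about 1.49M packets/s, shrinking the budget to roughly 1,700 cycles per packet, which is where having multiple cores would matter.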
__________________
"Du bist Metall!"
-L.V.
Old 10-Sep-2010, 02:21   #22
cal_guy
Member
 
Join Date: Jun 2008
Posts: 205

Quote:
Originally Posted by Grall View Post
I wonder how fast a consumer-level ethernet router could be with a chip like this driving it... Considering current top-end routers from D-Link, Netgear and so on employ 200-400MHz-ish ARM chips of generally unknown generation and capability (ARM 6-7 probably) and route 100MB/s and up, perhaps we could see full gigabit line speed from a multicore 2.5GHz CPU.
Aren't most routers powered by MIPS-based chips?
Old 10-Sep-2010, 04:39   #23
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,241

Quote:
Originally Posted by Exophase View Post
One feature that I hope Cortex-A15 will have is the ability to share one NEON core among multiple CPU cores. The designs are fairly decoupled as it is so I hope that this will be a possibility.

AKA Bulldozer
Old 10-Sep-2010, 05:12   #24
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,952

Quote:
Originally Posted by rpg.314 View Post
AKA Bulldozer
Exactly
Old 10-Sep-2010, 08:05   #25
iwod
Member
 
Join Date: Jun 2004
Posts: 179

I wonder what the use of 4GB+ memory is in the server market. Most hosting, grid hosting, and cloud hosting is or can be done in a VM environment where you don't get more than 4GB of memory in one instance.

So that is at least part of the server market that can be addressed with ARM. It may not be a large percentage of server market share, but it is a good fit for ARM nevertheless.

The software stack is mainly open-source software, so I am not at all worried about running Windows, as all the internet forums have been crying about. It just means we have 2-3 years to polish LAMP and LLVM for ARM.
