02-Dec-2011, 18:39   #1076
3dilettante
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,134

Pretty much wherever AMD's revised count is showing up, it's being disputed by people who can perform arithmetic.

If AMD is using a schematic-level count for BD at 1.2B as opposed to the physical count, perhaps it is not comparable to the count given for each module, which may have been using a physical count.
That could open up a little leeway in the totals per die, but since 100M of each module must be just L2 cache cells, it's not leaving much room for the logic and everything else on the die that's not cache.
__________________
Dreaming of a .065 micron etch-a-sketch.

03-Dec-2011, 11:06   #1077
Jawed
Regular
Join Date: Oct 2004
Location: London
Posts: 9,863

How many bits of error detection and correction are there per cache line? How many bits in a cache line?
__________________
Can it play WoW?

03-Dec-2011, 12:21   #1078
fellix
Senior Member
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,819

As far as I know, up to K10 AMD used ECC on the L1D at an 8:1 ratio, i.e. eight ECC bits for every 64 bits of protected data, using a 64-bit Hamming SEC/DED (single-error-correct, double-error-detect) code. The ECC bits were organized in separate banks alongside the main L1D array. For the lower cache levels I don't have any reliable information about protection implementations.

p.s.: In Bulldozer, AMD removed the ECC protection from the L1D caches because of their inclusive relationship with the L2, so any error in the L1D now triggers a data reload from the L2, which is ECC protected.
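For context, eight check bits per 64 data bits is exactly what a textbook Hamming SEC/DED code needs. A minimal Python sketch of that bound (generic Hamming arithmetic, not AMD's actual circuit; the function name is made up for illustration):

```python
def secded_check_bits(data_bits):
    """Check bits for a Hamming SEC/DED code over `data_bits` data bits.

    Single-error correction needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall-parity bit adds double-error detection.
    """
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


check = secded_check_bits(64)
print(f"64 data bits -> {check} check bits ({64 + check}-bit codeword)")
print(f"per 64-byte line: {512 // 64 * check} check bits")
```

This gives the familiar (72,64) code, i.e. 64 check bits for every 64-byte line.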
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.

06-Dec-2011, 20:50   #1079
Color me Dan
Member
Join Date: May 2007
Location: Sweden
Posts: 300

Does anyone feel like humoring a layman/outside observer?

I wonder whether performance could improve over time through "easily" implemented code optimization, such as compilers and/or (I guess) libraries tuned for the Bulldozer uArch. Could the microcode be updated (I'm really out of my depth here), and would that have any meaningful impact on performance?

The Anandtech review mentions that Windows 8 ought to have a better scheduler that takes the modular CPU architecture into account, which ought to improve performance somewhat. That's what made me think about it, as it sort of suggested that some problems could stem from how the CPU is seen, and thus used, by software.

No doubt there are serious flaws in the design that will have to be rectified; I just wonder how much of the performance penalty stems from the architecture directly and how much is due to simple novelty.

06-Dec-2011, 20:58   #1080
hoho
Senior Member
Join Date: Aug 2007
Location: Estonia
Posts: 1,218

It is possible to get some performance boost once the OS manages to distribute threads evenly across modules, but it only helps as long as you don't load all the cores, and even then the benefit is often tiny.
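A rough sketch of what a module-aware scheduler does differently, assuming a hypothetical numbering where cores 2m and 2m+1 share module m (real OS topology enumeration may differ): put one thread on every module first, then start doubling up.

```python
def module_first_order(modules, cores_per_module=2):
    """Core order that puts one thread on every module before doubling up.

    Hypothetical numbering: module m owns cores
    m*cores_per_module .. m*cores_per_module + cores_per_module - 1.
    """
    return [m * cores_per_module + c
            for c in range(cores_per_module)
            for m in range(modules)]


order = module_first_order(4)
print(order)  # [0, 2, 4, 6, 1, 3, 5, 7]

# With 4 threads every module gets one thread; with 8 threads every
# module is shared anyway, so module-aware placement no longer helps.
for threads in (4, 8):
    modules_used = {core // 2 for core in order[:threads]}
    print(threads, "threads on", len(modules_used), "modules")
```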

The biggest problem seems to be the godawful cache architecture, and the only thing that will fix it is a redesigned chip, which is not going to happen for at least a couple of years.

06-Dec-2011, 21:40   #1081
rpg.314
Senior Member
Join Date: Jul 2008
Location: /
Posts: 4,070

Quote:
Originally Posted by Color me Dan View Post
Does anyone feel like humoring a layman/outside observer?

I wonder whether performance could improve over time through "easily" implemented code optimization, such as compilers and/or (I guess) libraries tuned for the Bulldozer uArch. Could the microcode be updated (I'm really out of my depth here), and would that have any meaningful impact on performance?

The Anandtech review mentions that Windows 8 ought to have a better scheduler that takes the modular CPU architecture into account, which ought to improve performance somewhat. That's what made me think about it, as it sort of suggested that some problems could stem from how the CPU is seen, and thus used, by software.

No doubt there are serious flaws in the design that will have to be rectified; I just wonder how much of the performance penalty stems from the architecture directly and how much is due to simple novelty.
There is room for that, but do not expect improvements of this kind to exceed 10% at best. And these sorts of improvements are not exclusive to AMD.
__________________
The views presented here are my own and not my employer's.
Quote:
Originally Posted by Alexko View Post
So in a nutshell, model [BLANK] will have [BLANK], up to [BLANK], and even [BLANK] for a power consumption of just [BLANK]. Impressive.

06-Dec-2011, 21:48   #1082
Jawed
Regular
Join Date: Oct 2004
Location: London
Posts: 9,863

Quote:
Originally Posted by fellix View Post
For the lower cache levels I don't have any reliable information about protection implementations.
I was hoping that one of those people who could do arithmetic would have provided a solid answer by now.

06-Dec-2011, 23:40   #1083
3dilettante
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,134

I've tried to calculate some of the overheads more clearly, but some of my numbers may not be correct. Regardless, the use of just the arrays as a floor value is the best-case for AMD's funny math. Any elaboration makes the margin left in 1.2B worse.

For cache tags, I am assuming the following: 6T SRAM, and 2^23 bytes (8 MB) for the L3 with 32-way associativity.
With 64-byte lines, that leaves 2^17 cache lines and cache tags.
I am also assuming a 48-bit address space.
2^17 / 2^5 = 2^12 sets.
Tag length = 48 - 12 - 6 = 30 bits (address bits minus set-index bits minus line-offset bits).

30 bits per tag × 2^17 lines × 6 transistors per bit is roughly 23.6M transistors for the L3 tags.

For the L2 (2^21 bytes, 2 MB), it's 2^15 lines, which leads to 2^11 sets with 16-way associativity, so the tag is 48 - 11 - 6 = 31 bits.
2^15 × 31 × 6 = 6.1M per L2.


For ECC, I'm assuming the array would also be 6T SRAM, but I'm not sure.
If it's implemented with the same scheme as Opteron (8 ECC bits per 64 data bits, i.e. 64 ECC bits per 64-byte line), that's 2^15 lines × 64 bits × 6 = roughly 12.6M transistors per L2.

I'm not sure what the L3 would have for ECC. It would be another 50.3M transistors if the overhead is the same as what I've calculated for the L2.
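A minimal Python sketch of the ECC estimate, assuming (as the post does) 8 check bits per 64 data bits everywhere and check bits stored in ordinary 6T cells:

```python
def ecc_transistors(cache_bytes, check_bits_per_64=8, t_per_bit=6):
    """ECC-array transistors for a cache protected at a fixed ratio.

    Assumes every 64 data bits carry `check_bits_per_64` check bits and
    the check bits live in ordinary 6T cells (both assumptions).
    """
    data_bits = cache_bytes * 8
    ecc_bits = data_bits // 64 * check_bits_per_64
    return ecc_bits * t_per_bit


l2_ecc = ecc_transistors(2**21)   # one 2 MB L2
l3_ecc = ecc_transistors(2**23)   # 8 MB L3, same ratio assumed
print(f"L2 ECC {l2_ecc / 1e6:.1f}M, L3 ECC {l3_ecc / 1e6:.1f}M")
print(f"4 x L2 + L3 ECC {(4 * l2_ecc + l3_ecc) / 1e6:.1f}M")
```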

402.7M for the L3 arrays
100.7M for each L2, which is then ×4
~805M for L2 + L3
The tags for L2 and L3 add up to another ~50M
The ECC could add up to ~100M more.

It's close to a full billion transistors in L2+L3 cache and associated arrays, leaving about 200 million for everything else.
Even if all other controllers and I/O took 0 transistors, that leaves 50 million for the cores in each module.

I'm thinking there are inconsistencies still in AMD's counts, and that 1.2B is too low.

Last edited by 3dilettante; 06-Dec-2011 at 23:45.

06-Dec-2011, 23:53   #1084
Lightman
Member
Join Date: Jun 2008
Location: Torquay, UK
Posts: 913

On top of that, I remember AMD saying they moved to 8T SRAM with the 32nm process, at least for some of their cache structures.

That alone would add a few more transistors to your math.

Last edited by Lightman; 07-Dec-2011 at 00:01.

07-Dec-2011, 00:06   #1085
3dilettante
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,134

That would affect the L1 caches; the L2 and L3 still use 6T.
The arrays for the L1I and 2xL1D are about 6.3M per module.
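The 6.3M figure can be reproduced with a short Python sketch, assuming a 64 KB L1I plus two 16 KB L1Ds per module and assuming the L1I is also 8T (not confirmed in the thread):

```python
T8 = 8                    # transistors per bit, 8T SRAM
l1i = 64 * 1024           # one 64 KB L1I shared by the module
l1d = 2 * 16 * 1024       # two 16 KB L1Ds, one per integer core
l1_bytes = l1i + l1d
l1_transistors = l1_bytes * 8 * T8
print(f"{l1_bytes // 1024} KB -> {l1_transistors / 1e6:.1f}M transistors")
```

Only the bit cells are counted; tags, parity and peripheral logic would come on top.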

07-Dec-2011, 00:07   #1086
fellix
Senior Member
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,819

Quote:
Originally Posted by Lightman View Post
On top of that, I remember AMD saying they moved to 8T SRAM with the 32nm process, at least for some of their cache structures.

That alone would add a few more transistors to your math.
That was for Llano's L1 caches, but Llano is an energy-efficient architecture, unlike Bulldozer.

07-Dec-2011, 00:11   #1087
3dilettante
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,134

The L1 is 8T for BD as well.

07-Dec-2011, 01:35   #1088
denev2004
Member
Join Date: Apr 2010
Location: China
Posts: 143

Why don't they make BD's L2/L3 8T as well? They said 8T is good for reducing power consumption.
__________________
Well I'm not a native English speaker so there might be misuse through my words. I just hope it won't cause too much misunderstanding.

07-Dec-2011, 04:02   #1089
rpg.314
Senior Member
Join Date: Jul 2008
Location: /
Posts: 4,070

Area.

L1 is 16K/core. L2+L3 is 16M overall.
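To put "Area" in rough numbers, here is a Python sketch that counts only cell transistors (transistor count is just a rough proxy for cell area, and sense amps and decoders are ignored). The L2+L3 holds 128 times the L1D capacity, so going 8T there would cost on the order of a quarter-billion extra transistors.

```python
l1d_total = 8 * 16 * 1024           # eight 16 KB L1Ds
l2_l3_total = 4 * 2**21 + 2**23     # four 2 MB L2s + 8 MB L3 = 16 MB
extra_per_bit = 8 - 6               # an 8T cell vs a 6T cell
extra = l2_l3_total * 8 * extra_per_bit

print(l2_l3_total // l1d_total)                    # 128
print(f"{extra / 1e6:.1f}M extra transistors")     # 268.4M extra transistors
```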

07-Dec-2011, 05:07   #1090
mczak
Senior Member
Join Date: Oct 2002
Posts: 2,438

Quote:
Originally Posted by rpg.314 View Post
Area.

L1 is 16K/core. L2+L3 is 16M overall.
Isn't L1I also 8T? That would make it 96K/module of 8T cache (of course that's still tiny compared to L2/L3).

07-Dec-2011, 05:36   #1091
rpg.314
Senior Member
Join Date: Jul 2008
Location: /
Posts: 4,070

Quote:
Isn't L1I also 8T?
I don't know.

07-Dec-2011, 12:48   #1092
Jawed
Regular
Join Date: Oct 2004
Location: London
Posts: 9,863

This article:

http://semiaccurate.com/2010/02/10/a...nm-llano-core/

says that L1 in Llano is a new architecture which is also used in BD.

4 modules at 211M plus 477M for the L3 makes 1321M. Looking at a die picture, the stuff in the centre looks about the same size as the non-L2 portion of a module, i.e. around 100M, for a total of ~1.4 billion transistors.

Lower-overhead ECC would save around 50M transistors, say, so not making much of a dent in the excess.
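The sum as a tiny Python sketch, using only the figures in this post (all in millions of transistors; the 100M centre and the 50M ECC saving are the post's own rough guesses):

```python
module_m = 211      # per-module transistors, millions (as quoted)
l3_m = 477          # L3, millions (as quoted)
centre_m = 100      # rough die-shot guess for the centre of the die
amd_claim_m = 1200  # AMD's revised figure

cores_and_l3 = 4 * module_m + l3_m      # 1321
total = cores_and_l3 + centre_m         # 1421
leaner_ecc = total - 50                 # "lower-overhead ECC" saving

print(cores_and_l3, total, leaner_ecc - amd_claim_m)  # 1321 1421 171
```

Even with the leaner ECC, the estimate stays about 170M above 1.2B.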

07-Dec-2011, 21:36   #1093
Lightman
Member
Join Date: Jun 2008
Location: Torquay, UK
Posts: 913

So we have:
- 2BT previously claimed by AMD themselves
- 1.4BT calculated from info in the AMD documents presented at ISSCC and some very good guesses
- 1.2BT, the new figure given by AMD


16-Dec-2011, 15:35   #1094
fellix
Senior Member
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,819

AMD 'Bulldozer' gets an Update from Microsoft

16-Dec-2011, 15:43   #1095
Gubbi
Senior Member
Join Date: Feb 2002
Posts: 2,569

Quote:
Originally Posted by fellix View Post
Quote:
....but this confirms Windows 7 was in fact hampering “Bulldozer” from performing at 100% in all prior benches
Right, so AMD making a CPU that requires page coloring to perform decently is Microsoft's fault.
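For readers unfamiliar with the term: page coloring matters when one cache way is larger than a page, because some set-index bits then come from the page number. A generic Python sketch of the color count; the 64 KB, 2-way configuration below is an assumed L1I setup for illustration, not something stated in this post.

```python
def page_colors(cache_bytes, ways, page_bytes=4096):
    """Distinct page colors for a cache: way size divided by page size.

    When one way fits inside a page (way size <= page size) every page maps
    to the same sets, so there is effectively a single color.
    """
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)


print(page_colors(64 * 1024, 2))   # 8  (assumed 64 KB, 2-way L1I)
print(page_colors(32 * 1024, 8))   # 1  (way size equals the 4 KB page)
```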

Anyway, good news.

Cheers
__________________
I'm pink, therefore I'm spam

16-Dec-2011, 15:47   #1096
fellix
Senior Member
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,819

16-Dec-2011, 23:32   #1097
itsmydamnation
Member
Join Date: Apr 2007
Location: Australia
Posts: 646

for some reason you remind me of this......




which then leads me to


17-Dec-2011, 05:56   #1098
Albuquerque
Red-headed step child
Join Date: Jun 2004
Location: Guess ;)
Posts: 3,084

Quote:
Originally Posted by fellix View Post
And it's been recalled. I can already hear the AMD apologists / standard MS haters crying how "M$" is broken and obviously wants to crater Christmas sales and how there's like 40% performance bumps just waiting to happen except that MS is inept and will never let it be that good and blah blah.

Followed by tons of blog and forum posts of "OMG my benchmarks went up 10% but it's SOOO MUCH SMOOOOTHER that you can't just measure how awesome it now is..."

__________________
"...twisting my words"
Quote:
Originally Posted by _xxx_ 1/25 View Post
Get some supplies <...> Within the next couple of months, you'll need it.
Quote:
Originally Posted by _xxx_ 6/9 View Post
And riots are about to begin too.
Quote:
Originally Posted by _xxx_8/5 View Post
food shortages and huge price jumps I predicted recently are becoming very real now.
Quote:
Originally Posted by _xxx_ View Post
If it turns out I was wrong, I'll admit being stupid

24-Dec-2011, 13:23   #1099
fehu
Member
Join Date: Nov 2006
Location: Somewhere over the ocean
Posts: 634

http://www.xbitlabs.com/news/cpu/dis...rocessors.html

Merry Christmas and a confusing new year

24-Dec-2011, 13:33   #1100
I.S.T.
Senior Member
Join Date: Feb 2004
Posts: 2,447

Wow, X6s would be better than those things...

All times are GMT +1.