|
|
#1076 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,134
|
Pretty much wherever AMD's revised count is showing up, it's being disputed by people who can perform arithmetic.
If AMD is using a schematic-level count for BD at 1.2B as opposed to the physical count, perhaps it is not comparable to the count given for each module, which may have been using a physical count. That could open up a little leeway in the totals per die, but since 100M of each module must be just L2 cache cells, it's not leaving much room for the logic and everything else on the die that's not cache.
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#1077 |
|
Regular
|
How many bits of error detection and correction are there per cache line? How many bits in a cache line?
__________________
Can it play WoW? |
|
|
|
|
|
#1078 |
|
Senior Member
|
As far as I know, up to K10, AMD used ECC for the L1D in an 8:1 ratio, i.e. eight ECC bits for every 64 bits of protected data, using a 64-bit Hamming SEC/DED scheme. The ECC bits were organized in separate banks alongside the main L1D array. For the lower cache levels I don't have any reliable information about protection implementations.
P.S.: In Bulldozer, AMD removed the ECC protection for the L1D caches due to the inclusive relationship with the L2, so now any error in the L1D will trigger a data reload from the L2, which is ECC protected.
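As a rough illustration of where the 8-bits-per-64 figure comes from, here is a minimal sketch that counts the check bits a Hamming single-error-correct, double-error-detect code needs for a given data width. The function name is made up for this example, and it only does the counting; it says nothing about how AMD actually lays out the ECC banks.

```python
def secded_check_bits(data_bits):
    """Check bits needed for Hamming SEC/DED over `data_bits` of data."""
    # Single-error correction needs r bits such that 2^r >= data + r + 1.
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit turns SEC into SEC/DED.
    return r + 1


for width in (8, 16, 32, 64, 128):
    check = secded_check_bits(width)
    print(f"{width:3d} data bits -> {check} check bits "
          f"({100 * check / width:.1f}% overhead)")
```

For 64 data bits this gives 8 check bits, i.e. 64 ECC bits per 64-byte cache line if the same scheme is applied to each 64-bit chunk.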
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic. Microsoft: Russia -- Big and bloated. Linux: EU -- Diverse and broke. |
|
|
|
|
|
#1079 |
|
Member
Join Date: May 2007
Location: Sweden
Posts: 300
|
Does anyone feel like humoring a layman/outside observer?
I wonder if there is any way for performance to improve over time through "easily" implemented code optimization such as compilers and/or (and I guess) libraries for the Bulldozer uArch. Could (really out of my depth here) microcode be updated if that has any meaningful impact on performance? The Anandtech review mentions that Windows 8 ought to have a better scheduler that takes the modular CPU architecture into account which ought to improve performance somewhat. That's what made me think about it as it sort of suggested that some problems could stem from how the CPU is seen, and thus used, by software. No doubt there are serious flaws in design that will have to be rectified, I just wonder how much of the performance penalty stems from the architecture directly and how much is due to simple novelty. |
|
|
|
|
|
#1080 |
|
Senior Member
|
It is possible to get some performance boost once the OS manages to distribute threads evenly across modules, but it only helps as long as you don't load all the cores, and even then the benefit is often tiny.
The biggest problem seems to be the godawful cache architecture, and the only thing that will fix it is a redesigned chip, which is not going to happen for at least a couple of years. |
|
|
|
|
|
#1081 | |
|
Senior Member
|
Quote:
|
|
|
|
|
|
|
#1082 |
|
Regular
|
I was hoping that one of those people who could do arithmetic would have provided a solid answer by now.
__________________
Can it play WoW? |
|
|
|
|
|
#1083 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,134
|
I've tried to calculate some of the overheads more clearly, but some of my numbers may not be correct. Regardless, the use of just the arrays as a floor value is the best-case for AMD's funny math. Any elaboration makes the margin left in 1.2B worse.
For cache tags, I am assuming the following: 6T SRAM, 2^23 bytes for the L3 with 32-way associativity, 64-byte lines, and a 48-bit address space.
L3: 2^23 / 2^6 leaves 2^17 cache lines and cache tags. 2^17 / 2^5 = 2^12 sets. Tag length = 48 - 12 - 6 = 30 bits. 30 bits in the tag x 2^17 lines x 6 transistors per bit is roughly 23.6M transistors for tags.
L2: it's 2^15 lines, which leads to 2^11 sets with 16-way associativity, and I'm getting 31 bits in the tag. 2^15 x 31 x 6 = 6.1M per L2.
For ECC, I'm assuming the array would have 6T SRAM for the ECC, but I'm not sure. If it's implemented with the same scheme as Opteron, that's 2^15 lines with 64 ECC bits each = roughly 12.6M transistors per L2. I'm not sure what the L3 would have for ECC. It would be another 50.3M transistors if the overhead is the same as what I've calculated for the L2.
402.7M for L3 arrays
100.7M for each L2, which is then x4
~805M for L2 + L3
The tags for L2 and L3 add up to another ~50M
The ECC could add up to ~100M more.
It's close to a full billion in L2+L3 cache and associated arrays, leaving 200 million for everything else. If all other controllers and IO took 0 transistors, that leaves 50 million for the cores in each module.
I'm thinking there are inconsistencies still in AMD's counts, and that 1.2B is too low.
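For anyone who wants to check the arithmetic, here is a small sketch that redoes the estimate under the same assumptions stated in the post: 6T cells for data, tags, and ECC, 64-byte lines, a 48-bit address, an 8 MB 32-way L3, four 2 MB 16-way L2s, and 8 ECC bits per 64 data bits. The function and variable names are made up for this example, and the output is only as good as those assumptions.

```python
# Rough transistor estimate for the L2/L3 arrays, tags, and ECC.
T_PER_BIT = 6                                   # 6T SRAM cell (assumed everywhere)
LINE_BYTES = 64
ADDR_BITS = 48
ECC_BITS_PER_LINE = LINE_BYTES * 8 // 64 * 8    # 8 check bits per 64 data bits
N_L2 = 4                                        # one L2 per module
AMD_COUNT = 1_200_000_000                       # AMD's revised figure


def cache_estimate(size_bytes, ways):
    """Return tag width and transistor counts for one cache."""
    lines = size_bytes // LINE_BYTES
    sets = lines // ways
    index_bits = sets.bit_length() - 1          # log2 of a power of two
    offset_bits = LINE_BYTES.bit_length() - 1
    tag_bits = ADDR_BITS - index_bits - offset_bits
    return {
        "tag_bits": tag_bits,
        "data": size_bytes * 8 * T_PER_BIT,
        "tags": lines * tag_bits * T_PER_BIT,
        "ecc": lines * ECC_BITS_PER_LINE * T_PER_BIT,
    }


l3 = cache_estimate(8 * 2**20, ways=32)
l2 = cache_estimate(2 * 2**20, ways=16)

totals = {k: l3[k] + N_L2 * l2[k] for k in ("data", "tags", "ecc")}
grand_total = sum(totals.values())
left_over = AMD_COUNT - grand_total

print(f"L3 tag bits: {l3['tag_bits']}, L2 tag bits: {l2['tag_bits']}")
for name, count in totals.items():
    print(f"{name:5s}: {count / 1e6:7.1f}M")
print(f"total: {grand_total / 1e6:7.1f}M")
print(f"left for everything else: {left_over / 1e6:.1f}M "
      f"({left_over / N_L2 / 1e6:.1f}M per module)")
```

Without rounding up to a full billion, this comes out at roughly 954M for the caches and about 246M for everything else, which does not change the conclusion much.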
__________________
Dreaming of a .065 micron etch-a-sketch. Last edited by 3dilettante; 06-Dec-2011 at 23:45. |
|
|
|
|
|
#1084 |
|
Member
Join Date: Jun 2008
Location: Torquay, UK
Posts: 913
|
On top of that, I remember AMD saying they moved to 8T SRAM on the 32nm process, at least for some of their cache structures.
That alone would add a few more transistors to your math. Last edited by Lightman; 07-Dec-2011 at 00:01. |
|
|
|
|
|
#1085 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,134
|
That would affect the L1 caches; the L2 and L3 still use 6T.
The arrays for the L1I and 2xL1D are about 6.3M per module.
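A quick sketch of how a figure around 6.3M falls out, counting data arrays only. The 8T cell and the 16 KB L1D per core are mentioned elsewhere in this thread; the 64 KB shared L1I per module is an assumption brought in for this example, not something stated here.

```python
T_PER_BIT_L1 = 8            # 8T SRAM cell for the L1 arrays
L1I_KIB_PER_MODULE = 64     # assumption: one shared 64 KB L1I per module
L1D_KIB_PER_CORE = 16       # per-core L1D size
CORES_PER_MODULE = 2

l1_kib = L1I_KIB_PER_MODULE + CORES_PER_MODULE * L1D_KIB_PER_CORE
l1_bits = l1_kib * 1024 * 8
l1_transistors = l1_bits * T_PER_BIT_L1

print(f"L1 data arrays per module: {l1_transistors / 1e6:.1f}M transistors")
```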
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#1086 |
|
Senior Member
|
That was for Llano's L1 caches, but Llano is an energy-efficient architecture, unlike Bulldozer.
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic. Microsoft: Russia -- Big and bloated. Linux: EU -- Diverse and broke. |
|
|
|
|
|
#1087 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,134
|
The L1 is 8T for BD as well.
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#1088 |
|
Member
|
Why don't they make the L2/L3 of BD 8T as well? They said 8T is good for reducing power consumption.
__________________
Well I'm not a native English speaker so there might be misuse through my words. I just hope it won't cause too much misunderstanding. |
|
|
|
|
|
#1089 |
|
Senior Member
|
Area.
L1 is 16K/core. L2+L3 is 16M overall. |
|
|
|
|
|
#1090 |
|
Senior Member
Join Date: Oct 2002
Posts: 2,438
|
|
|
|
|
|
|
#1091 | |
|
Senior Member
|
Quote:
|
|
|
|
|
|
|
#1092 |
|
Regular
|
This article:
http://semiaccurate.com/2010/02/10/a...nm-llano-core/ says that L1 in Llano is a new architecture which is also used in BD.
4 modules at 211M plus 477M for L3 seems to make 1321M, and looking at a die picture, the stuff in the centre looks about the same size as a non-L2 portion of a module, i.e. around 100M, for a total of ~1.4 billion transistors.
Lower-overhead ECC would save around 50M transistors, say, so not making much of a dent in the excess.
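The same sum as a short sketch, using only the figures quoted in this post and AMD's revised 1.2B count. The ~100M for the centre of the die and the ~50M ECC saving are the eyeballed guesses from above, so the result is a ballpark at best.

```python
# All figures in millions of transistors.
MODULE_M = 211          # per module, as quoted above
N_MODULES = 4
L3_M = 477              # L3, as quoted above
CENTRE_M = 100          # eyeballed from the die picture
ECC_SAVING_M = 50       # guessed saving from lower-overhead ECC
AMD_REVISED_M = 1200    # AMD's revised count

modules_plus_l3 = N_MODULES * MODULE_M + L3_M
estimate = modules_plus_l3 + CENTRE_M
excess = estimate - AMD_REVISED_M
excess_with_cheaper_ecc = excess - ECC_SAVING_M

print(f"modules + L3:         {modules_plus_l3}M")
print(f"with centre of die:   {estimate}M")
print(f"over AMD's figure by: {excess}M "
      f"({excess_with_cheaper_ecc}M with cheaper ECC)")
```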
__________________
Can it play WoW? |
|
|
|
|
|
#1093 |
|
Member
Join Date: Jun 2008
Location: Torquay, UK
Posts: 913
|
So we have:
- 2BT previously claimed by AMD themselves
- 1.4BT calculated from info given in AMD documents presented at ISSCC and some very good guesses
- 1.2BT new figure given by AMD |
|
|
|
|
|
#1094 |
|
Senior Member
|
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic. Microsoft: Russia -- Big and bloated. Linux: EU -- Diverse and broke. |
|
|
|
|
|
#1095 | ||
|
Senior Member
Join Date: Feb 2002
Posts: 2,569
|
Quote:
Quote:
Anyway, good news. Cheers
__________________
I'm pink, therefore I'm spam |
||
|
|
|
|
|
#1096 |
|
Senior Member
|
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic. Microsoft: Russia -- Big and bloated. Linux: EU -- Diverse and broke. |
|
|
|
|
|
#1097 |
|
Member
Join Date: Apr 2007
Location: Australia
Posts: 646
|
for some reason you remind me of this......
which then leads me to |
|
|
|
|
|
#1098 | |
|
Red-headed step child
Join Date: Jun 2004
Location: Guess ;)
Posts: 3,084
|
Quote:
Followed by tons of blog and forum posts of "OMG my benchmarks went up 10% but it's SOOO MUCH SMOOOOTHER that you can't just measure how awesome it now is..."
__________________
"...twisting my words" |
|
|
|
|
|
|
#1099 |
|
Member
Join Date: Nov 2006
Location: Somewhere over the ocean
Posts: 634
|
|
|
|
|
|
|
#1100 |
|
Senior Member
Join Date: Feb 2004
Posts: 2,447
|
Wow, X6s would be better than those things...
|
|
|
|
| Tags |
| amd, blewdozer, oh well, patents |
|