Old 16-Oct-2011, 13:18   #976
hoho
Senior Member
 
Join Date: Aug 2007
Location: Estonia
Posts: 1,218
Default

Considering that the people from AMD working on the kernel patch were talking about a 3% improvement, I'd take that 40-70% with a truckload of salt.
Old 16-Oct-2011, 13:19   #977
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Location: Toulouse
Posts: 4,133
Default

Quote:
Originally Posted by swaaye View Post
That is some truly horrid performance per clock on games. It's like the K6 days are back.
you are a bit tough, I would say it's a new Phenom I
Old 16-Oct-2011, 14:38   #978
Rootax
Member
 
Join Date: Jan 2006
Location: France
Posts: 197
Default

Quote:
Originally Posted by Blazkowicz View Post
you are a bit tough, I would say it's a new Phenom I
Phenom I was at least competitive against the older AMD generation, no?
__________________
- I'm french. Sorry if you don't understand what i say -
Old 16-Oct-2011, 16:18   #979
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,816
Default

Quote:
Originally Posted by Rootax View Post
Phenom I was at least competitive against the older AMD generation, no?
Mostly - yes, but there were quite a few instances where Phenom slightly lagged behind the top A64 X2 models or even outright took last place. Later on, the TLB bug patch sliced off a couple of percent on top of this.
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
Old 16-Oct-2011, 19:05   #980
swaaye
Entirely Suboptimal
 
Join Date: Mar 2003
Location: WI, USA
Posts: 6,845
Default

Well I thought Phenom I was terrible too. Phenom II was acceptable but still sometimes slower per clock than Kentsfield. Their pricing saved the day obviously.
Old 16-Oct-2011, 19:41   #981
digitalwanderer
Dangerously Mirthful
 
Join Date: Feb 2002
Location: Winfield, IN USA
Posts: 15,292
Default

Quote:
Originally Posted by fellix View Post
Quote:
The one thing that is for-sure here is that every hardware review website rushed to be the first to publish an AMD FX-8150 review, they all used the same generic benchmarks and NONE did any real world computing. The game is fixed, the big-dog spreads around the most ad-dollars.
Who is the "big-dog" supposed to be? Intel? I think they're too busy laughing to be fixing much of anything right now.
Old 16-Oct-2011, 19:43   #982
I.S.T.
Senior Member
 
Join Date: Feb 2004
Posts: 2,439
Default

Quote:
Originally Posted by digitalwanderer View Post
Who is the "big-dog" supposed to be? Intel? I think they're too busy laughing to be fixing much of anything right now.
Seriously. That is just some guy trying to stir shit up for hits purposes.
Old 16-Oct-2011, 19:45   #983
digitalwanderer
Dangerously Mirthful
 
Join Date: Feb 2002
Location: Winfield, IN USA
Posts: 15,292
Default

Quote:
Originally Posted by I.S.T. View Post
Seriously. That is just some guy trying to stir shit up for hits purposes.
I know, heck even I noticed it.
Old 16-Oct-2011, 19:49   #984
I.S.T.
Senior Member
 
Join Date: Feb 2004
Posts: 2,439
Default

Quote:
Originally Posted by digitalwanderer View Post
I know, heck even I noticed it.
Oh, I know. I was just making a general comment with my second line.
Old 17-Oct-2011, 18:35   #985
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,816
Default

New B3 revision listed:



Source
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
Old 17-Oct-2011, 20:13   #986
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,016
Default

I wonder what difference this will make. Any clues?
__________________
"Well, you mentioned Disneyland, I thought of this porn site, and then bam! A blue Hulk." —The Creature
My (currently dormant) blog: Teχlog
Old 17-Oct-2011, 20:35   #987
Man from Atlantis
Member
 
Join Date: Jul 2010
Location: Istanbul
Posts: 727
Default

At the same VID, a 300 MHz bump.
Old 17-Oct-2011, 20:37   #988
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,433
Default

Quote:
Originally Posted by Alexko View Post
I wonder what difference this will make. Any clues?
Since the revision guide only mentions the B2 stepping (not older, not newer) and all bugs there are tagged with "No Fix planned" anyway, it's hard to tell, but I'd guess nothing earth-shattering. Maybe a slightly optimized design here and there to increase the possible frequency at the same voltage a bit?
Old 18-Oct-2011, 05:50   #989
Rootax
Member
 
Join Date: Jan 2006
Location: France
Posts: 197
Default

Quote:
Originally Posted by Alexko View Post
I wonder what difference this will make. Any clues?
I thought a "20-30% increase" was the way to go with every piece of BD news?!
__________________
- I'm french. Sorry if you don't understand what i say -
Old 18-Oct-2011, 06:15   #990
hoom
Senior Member
 
Join Date: Sep 2003
Posts: 2,076
Default

Performance is terribly depressing.

Why such a slow L3/northbridge??? It's big but not that big, and I'd expected 32nm to allow a faster cache; I'd also expected them to have tweaked it for better performance with the different core architecture and all the years since they launched Phenom I.

I saw a reference to there being something like 900 million transistors 'missing' somewhere in the uncore/northbridge. It's a huge number & they don't even have an onboard PCIE controller like Intel has.

Main core clocks are far below my expectation.

I can't understand the poor per-clock performance.
My understanding was that Bobcat cores were performing well per-clock on a similar architecture, which should have meant good things for Bulldozer.

Perhaps they could just stick 8 Bobcat cores on a die
__________________
But it's DOUBLE CONFIRMED
Old 18-Oct-2011, 07:17   #991
ninelven
PM
 
Join Date: Dec 2002
Posts: 1,371
Default

I wonder what 32nm versions of Deneb and Thuban would look like...
Old 18-Oct-2011, 12:20   #992
GZ007
Member
 
Join Date: Jan 2010
Posts: 416
Default

They could try to make a 2-module desktop chip without the L3 and giant uncore. 2 modules with 2 MB L2 cache, vertically aligned, would make just 2*30.9 mm². That's just 62 mm² + IO and memory controller under the L2 cache. Improve the cache (probably just leaving out the slow L3 and uncore would help a lot with latency; L1 associativity).

And with a 95W TDP they could bump up base clocks to 5 GHz with that tiny die size, which in turn would increase cache bandwidth too and help a lot with single-threaded performance.

The fact is a 30.9 mm² module with 2MB L2 cache looks good, while the 2 billion transistor serverdozer does not.
Old 18-Oct-2011, 13:55   #993
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,433
Default

Quote:
Originally Posted by GZ007 View Post
The fact is a 30.9 mm² module with 2MB L2 cache looks good, while the 2 billion transistor serverdozer does not.
It would look ok but not really good. Without any L2 cache a module appears to be slightly larger than a SNB core. Now it's hard to judge performance without considering uncore and even L2, but on a per-area basis it's difficult to imagine it would be more efficient than a SNB core. Still, the size would be manageable (though Llano's Husky core is only half the size again without L2, so there doesn't seem to be that much savings from a CMT module all things considered).

Quote:
Originally Posted by hoom View Post
I saw a reference to there being something like 900 million transistors 'missing' somewhere in the uncore/northbridge. It's a huge number & they don't even have an onboard PCIE controller like Intel has.
There are not 900 million transistors missing. However, AMD is telling us a module is just 215 million transistors, which would make everything else 1.1 billion transistors if the chip has 2 billion transistors. L3 cache is already ~400 million transistors, which leaves 700 million for HT links, MC, etc. So while not 900 million transistors are missing, that number definitely looks way too large. Maybe the transistors are counted differently for the modules but it still looks like an awful lot.
Also, saying it doesn't even have onboard PCIE is a bit unfair. Even just one HT link will use about the same die area, and this thing has 4 of them, 3 of them unused in desktops. After all, Westmere-EP doesn't have PCIE either.
(Of course there's no IGP either, and that's a fair chunk of die size and transistors of SNB - but this die would have space for an IGP if you'd leave out the unneeded HT links and could use the unused areas - I'd bet Trinity will make far more efficient use of the available die area for desktop use.)
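
To make the arithmetic in this post explicit, here is a minimal Python sketch. It takes the numbers quoted above at face value (2 billion total, 215 million per module, ~400 million for the L3) and assumes four modules, as on the FX-8150. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope transistor budget, using the figures quoted in the post.
TOTAL = 2_000_000_000       # AMD's headline count for the whole chip
PER_MODULE = 215_000_000    # AMD's quoted count for one module
MODULES = 4                 # assumption: four modules, as on the FX-8150
L3_ESTIMATE = 400_000_000   # the post's rough L3 estimate


def transistor_budget(total, per_module, modules, l3):
    """Return (in modules, outside modules, outside modules minus L3)."""
    in_modules = per_module * modules
    outside_modules = total - in_modules
    rest_of_uncore = outside_modules - l3
    return in_modules, outside_modules, rest_of_uncore


if __name__ == "__main__":
    cores, outside, rest = transistor_budget(TOTAL, PER_MODULE, MODULES, L3_ESTIMATE)
    print(f"modules:         {cores / 1e6:.0f}M")
    print(f"everything else: {outside / 1e6:.0f}M")
    print(f"minus L3:        {rest / 1e6:.0f}M")
```

With these inputs the split is 860M in the modules, 1,140M outside them, and 740M once the L3 is subtracted, which matches the rounded 1.1 billion and 700 million figures in the post.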
Old 18-Oct-2011, 14:36   #994
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,816
Default

I think AMD uses 8T SRAM cells for all major memory arrays in BD -- they already do for Llano's L1 caches at least. Factoring in the parity/ECC bits, the L3 cache alone should be ~600M transistors and that's without considering the bunch of SRAM tags. There's hardly any transistors "missing" in there.
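
A quick sanity check of the ~400M (6T) versus ~600M (8T plus ECC) estimates, as a Python sketch. The 8 MiB L3 capacity and the 8 check bits per 64 data bits are assumptions not stated in this thread, tag arrays are ignored, and the function name is hypothetical.

```python
def sram_transistors(capacity_bytes, transistors_per_cell, check_bits_per_64=0):
    """Transistors in the data array only (no tags, decoders, or sense amps)."""
    data_bits = capacity_bytes * 8
    # Each 64-bit word carries check_bits_per_64 extra parity/ECC bits.
    stored_bits = data_bits * (64 + check_bits_per_64) // 64
    return stored_bits * transistors_per_cell


L3_BYTES = 8 * 1024 * 1024  # assumption: 8 MiB of L3

if __name__ == "__main__":
    plain_6t = sram_transistors(L3_BYTES, 6)
    ecc_8t = sram_transistors(L3_BYTES, 8, check_bits_per_64=8)
    print(f"6T, no ECC: {plain_6t / 1e6:.0f}M")
    print(f"8T, ECC:    {ecc_8t / 1e6:.0f}M")
```

Under these assumptions a plain 6T array comes to about 403M transistors and an 8T array with ECC to about 604M, so the two estimates in this thread differ mainly in cell type and check bits.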
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
Old 18-Oct-2011, 14:58   #995
GZ007
Member
 
Join Date: Jan 2010
Posts: 416
Default

Quote:
Originally Posted by mczak View Post
It would look ok but not really good. Without any L2 cache a module appears to be slightly larger than a SNB core.
Ok, but if they targeted high frequency then it could change things. They pay for wafers, so a single module at 5 GHz (and if the shared cores would reach +50% performance) could reach first-class performance/die-area numbers. Even if just a single module would eat up half of the TDP budget at 5 GHz, in the end they could fit more of them on a single wafer.

For AMD, the same performance on a smaller area would be crucial these days. They have sold much bigger chips for less than Intel for several years now.
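
The "more of them on a single wafer" point can be sketched with a common first-order dies-per-wafer approximation. Everything numeric here is an assumption for illustration: a 300 mm wafer, the 62 mm² figure from the earlier post (which excludes IO and the memory controller), and roughly 315 mm² for the current four-module die. Yield, scribe lines and edge exclusion are ignored.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """First-order estimate: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(gross - edge_loss)


if __name__ == "__main__":
    small = dies_per_wafer(300, 62)    # hypothetical 2-module die, cores + L2 only
    large = dies_per_wafer(300, 315)   # assumed size of the current 4-module die
    print(f"small die: ~{small} candidates per wafer")
    print(f"large die: ~{large} candidates per wafer")
```

Under these assumptions the small die gives several times as many candidates per wafer, before yield is even considered. Real numbers would be lower once IO and the memory controller are added back in.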
Old 18-Oct-2011, 15:16   #996
hoom
Senior Member
 
Join Date: Sep 2003
Posts: 2,076
Default

Quote:
Even just one HT link will use about the same die area, and this thing has 4 of them
Thuban/Istanbul has 4 HT links too & is only 900m transistors with 9MB of cache.

Quote:
I think AMD uses 8T SRAM cells for all major memory arrays in BD
Thuban uses 6T? That would certainly make up some of the gap.
__________________
But it's DOUBLE CONFIRMED
Old 18-Oct-2011, 15:18   #997
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
Default

Quote:
Originally Posted by mczak View Post
There are not 900 million transistors missing. However, AMD is telling us a module is just 215 million transistors, which would make everything else 1.1 billion transistors if the chip has 2 billion transistors. L3 cache is already ~400 million transistors, which leaves 700 million for HT links, MC, etc. So while not 900 million transistors are missing, that number definitely looks way too large.
Is it certain there isn't a problem similar to the Sandy Bridge 995M/1.16B mixup?
Depending on when the gate count is made, the totals can be different.
There was a margin of error of 165M transistors for SB, which is a chip close to 1/2 the transistor count of BD.

330M could be taken off if a proportionate mixup occurred relative to what happened with SB. Given that this is a marketing number, 100M either way could have been rounded in. There goes over half of the supposed disparity.
A less optimized circuit implementation may have an even larger inflation than SB.

Then we have the remainder for the expanded uncore and connectivity features.
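
The proportional argument above, written out as a short Python sketch. The Sandy Bridge counts, the factor of two and the 100M rounding allowance come from this post. Treating the ~700 million figure from the quoted post as "the disparity" is an assumption, and the names are illustrative.

```python
SB_LOW = 995_000_000            # one published Sandy Bridge count
SB_HIGH = 1_160_000_000         # the other published Sandy Bridge count
BD_TO_SB_SCALE = 2              # the post treats SB as roughly half of BD
MARKETING_ROUNDING = 100_000_000
DISPARITY = 700_000_000         # assumption: the gap discussed in the quoted post


def explained_gap(low, high, scale, rounding, disparity):
    """Return (scaled counting gap, gap plus rounding, share of disparity)."""
    scaled = (high - low) * scale
    explained = scaled + rounding
    return scaled, explained, explained / disparity


if __name__ == "__main__":
    scaled, explained, share = explained_gap(
        SB_LOW, SB_HIGH, BD_TO_SB_SCALE, MARKETING_ROUNDING, DISPARITY
    )
    print(f"scaled counting gap: {scaled / 1e6:.0f}M")
    print(f"plus rounding:       {explained / 1e6:.0f}M")
    print(f"share of disparity:  {share:.0%}")
```

With these inputs the counting gap scales to 330M, and 430M with the rounding allowance, which is a bit over 60% of a 700M disparity.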
__________________
Dreaming of a .065 micron etch-a-sketch.
Old 18-Oct-2011, 18:18   #998
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,433
Default

Quote:
Originally Posted by fellix View Post
I think AMD uses 8T SRAM cells for all major memory arrays in BD -- they already do for Llano's L1 caches at least. Factoring in the parity/ECC bits, the L3 cache alone should be ~600M transistors and that's without considering the bunch of SRAM tags. There's hardly any transistors "missing" in there.
Do you have any source for the 8T SRAM for L2/L3? That's the first I've heard of it (haven't even seen rumors hinting at that). Last time I checked, AMD was using plain-jane 6T SRAM cells with no particular advantage over Intel's, except they were 30% larger...

Quote:
Originally Posted by GZ007 View Post
Ok, but if they targeted high frequency then it could change things. They pay for wafers, so a single module at 5 GHz (and if the shared cores would reach +50% performance) could reach first-class performance/die-area numbers. Even if just a single module would eat up half of the TDP budget at 5 GHz, in the end they could fit more of them on a single wafer.
Oh yes, if the design target really is higher than for SNB then the area wouldn't be that big. Though assuming the design target for SNB was ~4 GHz, it would really need to be like 5 GHz for BD to look good. That is possible but I wouldn't take it for granted.

Quote:
Originally Posted by 3dilettante View Post
Is it certain there isn't a problem similar to the Sandy Bridge 995M/1.16B mixup?
Depending on when the gate count is made, the totals can be different.
There was a margin of error of 165M transistors for SB, which is a chip close to 1/2 the transistor count of BD.
That's possible indeed. AMD just said 900 million for Thuban and 2 billion for BD but they could have counted them differently (as well as have counted the BD modules the other way around).
Quote:
Then we have the remainder for the expanded uncore and connectivity features.
Well uncore and connectivity remains largely the same as Thuban (granted I'm sure there's a bit more transistors there - improved HT frequency, larger SRQ etc. won't be quite free but for instance it's still the same number of HT links so apart from the larger cache I just don't see where the big increase would come from). Yet Thuban had 900 million transistors in total whereas if the transistors were counted the same BD would have more than that for uncore alone...

Last edited by mczak; 18-Oct-2011 at 18:35.
Old 19-Oct-2011, 05:14   #999
Mendel
Mr. Upgrade
 
Join Date: Nov 2003
Location: Finland
Posts: 1,335
Default

Bulldozer doesn’t have just a single problem | SemiAccurate
Old 19-Oct-2011, 15:30   #1000
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
Default

The general themes seem consistent with an AMD that is trying to build an architecture competitive with Intel, but with more severe constraints in resources and process technology. Potentially, the company's organization and leadership are also inferior.

The passages lionizing Dirk Meyer I could do without, especially since BD is an architecture that was very much a part of his tenure at AMD. At best, the article could congratulate Dirk on owning up to a screwup he had a huge hand in bringing about instead of humiliating himself further.

There's some ranting about how Dirk's honesty about screwing the pooch caused him to be punished by the financial community, as opposed to them rewarding him for failing to compete or something.

I'm not sure about Charlie's understanding of the cache hierarchy of BD. The text becomes increasingly muddled at the end, where he starts having problems distinguishing between the front end and the Icache path and the subdivided data cache path. He does not justify why splitting the L2 cache would massively reduce latency for this design.

I do not think the quality of Charlie's sources at AMD has improved, or it has, and the quality of AMD is what has gone down.

A lot of the article and its sequel is supposition with little in-depth analysis, and honestly I think this thread offers better insight in total, and definitely per word expended, and I do not claim that this thread has any great epiphanies in it.

I wonder how much of that article is compensating for Charlie's hinting about secret improvements that would surprise all the doubters in the leadup to the release. If they surprised anyone, they did so in the wrong direction.
__________________
Dreaming of a .065 micron etch-a-sketch.

Tags
amd, blewdozer, oh well, patents

All times are GMT +1.