Why is AMD losing the next gen race to Nvidia?

I'm confused, thought you were talking about Fermi, gtx 580.

If you were talking about the R9 290, that card had supply issues for a couple of quarters. And wasn't it also relatively late compared to the GTX 780?
I talked about both, and about times in the past when AMD was faster and cheaper. I could post more examples if needed.
 
The GTX 580 was faster by far, and that is why AMD's top-end card was cheaper. Also, the R9 290 came two quarters after the GTX 780, with one more month added if you are talking about the Titan.
 
Here's the revisionist history part: AMD was targeting the GTX 570 as the performance target for the 6970, and rightfully so, as the 6970 and GTX 570 traded blows a lot at the time.
That'll depend on how you define "being ahead". GF110 was significantly larger than Cayman (over 30% larger), so frankly the GTX 580 not being ahead would have been a disaster. None of the Fermi chips had any advantage in perf/W (the GF10x chips were quite bad there), nor in perf/area, over the comparable Evergreen/NI chips (though they had obvious advantages in some compute tasks).
That said, in retrospect Fermi was an excellent foundation for future chips (which Northern Islands was not), even though parts of it looked like overkill back then (and they probably were; future chips did indeed scale back the fully distributed nature of things relative to ALU resources, but the principle stayed the same).
But by these metrics Nvidia has clearly had the advantage since Maxwell.
 
As a customer, I couldn't care less about die sizes and transistor counts. What I *do* care about is the impact on my wallet versus the impact on my performance.

Having owned AMD from 9500np -> 9800np -> X800 Pro -> X1850-something-or-other -> NVIDIA 7950GT -> 3850 Crossfire -> 5850 Crossfire -> 7970 GHz Ed., and now back in the NV camp with my 980 Ti, I can tell you I prefer the NV models. If all else is equal, NV is likely to get my money at this point. The unfortunate reality is that in the last two or three generations, AMD hasn't been able to compete for my business. I made my 7970 last for quite a while; the 980 Ti was a welcome relief.
 
The GTX 580 was faster by far, and that is why AMD's top-end card was cheaper. Also, the R9 290 came two quarters after the GTX 780, with one more month added if you are talking about the Titan.
Not from the benchmarks I've looked at, and certainly not over time.
 
A huge majority of the AA/AAA developers have to work with GCN on consoles and that's a pretty aggressive approach.
Not relevant, and barely amounts to anything tangible. Different platform, different limitations (API, OS, memory hierarchy, CPU).

Even more so with the XBone now using straight-up DX12. And now pretty much all Microsoft first-party games coming to Windows are showing pretty significant advantages on AMD cards.
Actually, statistics show otherwise. Gears of War was a mess on AMD hardware at first; now it's a tie. Forza too, though both heavily favor NV at the high end. Tomb Raider favored NV too; Quantum Break favored AMD.

Regardless, this point is moot. AMD is now facing Pascal, and that imaginary "significant advantage" is now vaporware; the only people hanging on to it are old-time, big-time fans living in the past. Until AMD has a proper answer, they don't stand a chance.

It's exactly this kind of thinking that will devalue AMD if they choose to stay stuck in it for long: the premise that a handful of games will sway the public toward their products despite lackluster hardware and software. AMD has fallen behind on both fronts, especially the software side. There are more games that lag on AMD hardware than AMD would like to think (Project Cars, No Man's Sky, Fallout 4, Rise of the Tomb Raider, Arkham Knight, AC Unity, Just Cause 3, Dying Light, XCOM 2, etc.). Their old-API support is lagging, which causes them constant performance headaches in titles like Warhammer (DX11) and Doom (OpenGL), and the new-API versions come way later, after everything has been said and done and the damage is already done. Even their VR performance is subpar, contrary to their claim of "VR for the masses". This directly influences gamers into believing AMD is a second-rate company.

AMD will get the confidence of the market when it competes on at least one front, either hardware or software. But lagging on both? Nope, not even a pity purchase. That's evolution and survival of the fittest.

The only time you don't see the fruits of AMD's design wins in both consoles is when a third-party team makes the port using GameWorks, work that usually ends up panned by critics and gamers alike (e.g. Arkham Knight).
An absurd generalization that couldn't be farther from the truth. Even console games have performed badly on CONSOLES, and PC games from both sides occasionally deliver lackluster performance for reasons outside of the few vendor-specific effects.

Whether one agrees with them or not, AMD was very successful in turning GameWorks into a dirty word.
Dirty or not, it sells, and the number grows every year, to the point that a great deal of AAA titles ship GameWorks in some form or another. Almost all famous titles/series are sponsored by NV, and those are massive sellers. The only exception to that is Square Enix's studios, and not even all of them.
 
sry all for continually going off topic :)

I'm only going to post one thing about this, since yeah, it is off topic.
If a data cloud server setup costs 10 million, how much of that will be the cost of the CPUs? I'm not surprised at the cost of a server; I stated that the cost of the CPU is a small portion of everything else it entails. You pretty much restated what I said, so I don't know what the problem is there.

There is no point comparing datacentre cost to server cost. Datacentres have ROI periods of up to decades; servers/storage have 3 years, and networking and storage networking normally around 5 years, sometimes up to 10. But look at a standard 2P server using iSCSI or FC for storage. The best value-for-money memory is 16GB RDIMMs; on a 2P Intel box that's 384GB. Going to 32GB LRDIMMs (768GB) costs about 20% more per GB than 16GB; going to 64GB LRDIMMs (1536GB) costs about 60% more per GB than 16GB. If you are using 22-core E5s, memory makes up 30%/50%/70% of the platform's cost at those three capacities. But dropping to a 16-18 core CPU cuts the CPU cost in half, and this is where most bulk VM farms sit in terms of CPU spec. With that spec, your 384GB now takes closer to 50% of platform cost. This is what a standard VM farm server looks like. On average these servers are memory-throughput and memory-capacity limited.
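The cost split above can be roughed out in a few lines. A sketch, where the $/GB baseline, the CPU price, and the "everything else" line are illustrative assumptions for scale, not quotes from the post:

```python
# Rough model of the 2P-server cost split described above.
# All dollar figures are illustrative assumptions, not real prices.

def platform_memory_share(cpu_cost, mem_gb, price_per_gb, other=3000):
    """Fraction of a 2P platform's cost spent on memory.

    cpu_cost: price of one CPU (two sockets assumed);
    other: chassis, NICs, HBAs, disks, etc. lumped together.
    """
    mem_cost = mem_gb * price_per_gb
    total = 2 * cpu_cost + mem_cost + other
    return mem_cost / total

# Assumed 16GB RDIMM baseline $/GB, plus the ~20%/~60% premiums quoted above.
base = 10.0
for label, gb, per_gb in [("384GB  (16GB RDIMM)", 384, base),
                          ("768GB  (32GB LRDIMM)", 768, base * 1.2),
                          ("1536GB (64GB LRDIMM)", 1536, base * 1.6)]:
    share = platform_memory_share(cpu_cost=4100, mem_gb=gb, price_per_gb=per_gb)
    print(f"{label}: memory is {share:.0%} of platform cost")
```

With a 22-core-class CPU price plugged in, the memory share climbs steeply across the three capacities, which is the shape of the 30%/50%/70% progression the post describes; halving `cpu_cost` pushes the memory share further up, in the direction of the ~50% the post cites for a 16-18 core spec.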

Now with Zen, you could hit something like an 18-24 core per-socket CPU with up to 512GB of memory using 16GB DIMMs, because you are generally memory capacity/performance limited. Assuming 18 cores per socket, you end up with Xeon giving you about 5GB of memory per thread vs 7GB per thread for Zen. Assuming both run DDR4-2400, that's also around 25-30% more memory throughput per thread. Where this really starts to play out is with oversubscription/ballooning policies: you can oversubscribe CPU massively, because you are always memory or IO limited, but the second you're memory limited, or your hypervisor pages to disk, it's all over and the entire server is in the toilet.
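The GB-per-thread arithmetic can be checked directly. A sketch assuming SMT2 on both parts and the 2P socket/memory figures from the paragraph above:

```python
# Memory capacity per hardware thread for the 2P configs described above.
def gb_per_thread(cores_per_socket, sockets, total_mem_gb, smt=2):
    """GB of RAM per hardware thread, assuming SMT2 unless stated otherwise."""
    return total_mem_gb / (cores_per_socket * sockets * smt)

xeon = gb_per_thread(18, 2, 384)  # 2P Xeon with 384GB of 16GB RDIMMs
zen = gb_per_thread(18, 2, 512)   # 2P Zen with 512GB (more channels, more slots)
print(f"Xeon: {xeon:.1f} GB/thread vs Zen: {zen:.1f} GB/thread")
```

That works out to roughly 5.3 vs 7.1 GB per thread, matching the "5GB vs 7GB" figures in the post.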


Yeah, and as I stated, if Zen can't match up core for core against Xeon (whichever one is out there, Broadwell for now), forget the server and HPC markets.
This is flat out wrong. Go look at how a standard virtual server is sold, and go look at what the average usage profile of a common enterprise application looks like (end to end: dev, QA, pre-prod, prod). If you think a 10% perf-per-clock gap matters outside a few niche areas, you just don't understand what matters. Repeat after me: VM density is what drives the vast majority of server sales. Think about that; comprehend that.


You just linked to what I have been saying: they are critical of AMD's core performance. And if they see Intel still having a 10% lead, it in fact isn't really 10%, because the next version of Xeons is coming soon (with 28 cores); it will be more than 10%.

Your own cognitive bias is getting in the way of basic interpretation skills.

1. The ballpark figure, based off one data point, of being 10% behind was 8-core Zen vs 8-core Broadwell.
2. The next Xeons aren't coming soon, they are already out; 28 cores is the SoC size, not what you can buy.
3. Because of the small (~200mm²) Zeppelin dies on an MCM, they can mix and match different-yielding chips and actually sell full-sized SKUs.
4. Next, go look at the cost of buying the high-end SKU; you're just showing your ignorance. Those 28-core chips go into boxes running software with high per-socket licensing costs, not the servers that make up 90% of server sales.
5. You're ignoring, at a minimum, the SoC-level features not yet in Broadwell, you know, the things the big guys ask for in Xeon-D that are in Zen.....
6. Consider the price that P10 is sold at, and now consider that a Zeppelin die is likely smaller on the same node.
7. Servers don't need high frequency scaling; they need low power in the 2GHz to low-3GHz range, which largely mutes the "DERP DERP mobile process" issue, because that's exactly where mobile processors sit.
8. You completely ignore the comments about the 32-core product looking really good.... shock, horror, surprise!

And no, I don't take what AMD has shown off thus far as fact. From what we know about the CPU, it "should" perform close to Sandy Bridge/Haswell, but we don't know that yet. Remember Bulldozer: they told us a lot about it, and on paper it looked good..... reality was quite different.

Yes, and then let's look at Jaguar, but that doesn't agree with your position, so let's ignore that data point......

So you bring up Bulldozer; let's compare Bulldozer to Zen.

Bulldozer:
From the very first benchmark, Bulldozer looked bad.
Bulldozer had issues with L1i associativity.
Bulldozer had a much smaller PRF/load-store queue/retirement queue per core than Sandy Bridge.
Bulldozer had the horrible L1D/L2/WCC latency thing going on.
Bulldozer was so ALU-bottlenecked they moved some ALU instructions to the AGUs.
Bulldozer had long load-to-use latency for the FPU and slow FPU execution for non-FMA code (SSE etc.).
Bulldozer had lots of functions on few execution ports (branch and mul on the same port).
Bulldozer had lots of other stuff besides; see Agner Fog, http://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/ etc.

Zen:
First benchmarks look good (we only have Geekbench 4 on the 2P 32-core part, which is 10% behind in ST clock-for-clock vs the 22-core E5, plus the Blender example).
First-hand accounts from people known to work for OEMs say both ends of the Zeppelin spectrum (8-core and 32-core) look good.
Zen has better L1i associativity than Bulldozer, comparable to Intel.
Zen has Broadwell-to-Skylake-sized internal data structures.
Zen has a far more traditional cache structure and no write-through L1D.
Zen has as many integer execution resources as >= Haswell.
Zen has multiples of all integer-op ports and can do two branches in a cycle.
Zen has reduced load-to-use latency for the FPU and lower FPU execution latency.

The issues Bulldozer had were all around its target design, and ultimately the target design was wrong; don't confuse the target design with the capability of the individual components.

I have no idea where Zen performance will actually fall, but right now everything we know from LinkedIn, compiler patch notes, AMD presentations, the CERN leak, etc. points towards a competitive microarchitecture and a platform that is competitive at least in 1P configurations. Given that we have been told there are 2P issues, I really hope, for AMD's sake, that 2P isn't a problem for NUMA-aware independent workloads (you know, VMs; all hypervisors are NUMA aware).

You have done nothing but hand waving and DERP DERP AMD.
So put your money where your mouth is: what are the issues with Zen, and why would 10% perf per core matter (be specific)?
 
I talked about both, and about times in the past when AMD was faster and cheaper. I could post more examples if needed.
The GTX 580 competed with the HD6970 for almost their entire lifecycle. When AMD GCN was released, Kepler soon followed.
The GTX 580 is still supported today by Nvidia, while the HD 6970 was abandoned last year by AMD.
 
sry all for continually going off topic :)
Yeah, AMD did say even back in 2015 that their goal is the datacenter market (which fits the design scale of Zen, and we see Intel caring more about this than the desktop PC these days). Looking back, I found this quote from Lisa Su: "Datacenter is probably the single biggest bet that we are making as a company...... We have not been competitive the last few years; we will be competitive in the datacenter market."

The one disappointment for me, though, is that the product is not going to be around until 2017 (a staggered release, and when they can launch the higher-core models is critical), and IMO it is needed now, not when Intel will be responding with their own next generation of CPUs.
I wonder if the timeline could have been accelerated had Jim Keller remained (I appreciate there are two schools of thought on that subject, one being that it would make no difference after the initial design); maybe we will hear more of the background after Zen launches.

Cheers
 
Not relevant, and barely amounts to anything tangible. Different platform, different limitations (API, OS, memory hierarchy, CPU).

I would rather think it is relevant indeed. Of course, the game code is by and large optimized for a different kind of system, but the shader code itself is made to run as efficiently as possible on GCN-style hardware. I think that amounts to a healthy dose of out-of-the-box optimization and many man-hours of driver fixing that AMD does NOT have to spend here.
 
sry all for continually going off topic :)



There is no point comparing datacentre cost to server cost. Datacentres have ROI periods of up to decades; servers/storage have 3 years, and networking and storage networking normally around 5 years, sometimes up to 10. But look at a standard 2P server using iSCSI or FC for storage. The best value-for-money memory is 16GB RDIMMs; on a 2P Intel box that's 384GB. Going to 32GB LRDIMMs (768GB) costs about 20% more per GB than 16GB; going to 64GB LRDIMMs (1536GB) costs about 60% more per GB than 16GB. If you are using 22-core E5s, memory makes up 30%/50%/70% of the platform's cost at those three capacities. But dropping to a 16-18 core CPU cuts the CPU cost in half, and this is where most bulk VM farms sit in terms of CPU spec. With that spec, your 384GB now takes closer to 50% of platform cost. This is what a standard VM farm server looks like. On average these servers are memory-throughput and memory-capacity limited.

Now with Zen, you could hit something like an 18-24 core per-socket CPU with up to 512GB of memory using 16GB DIMMs, because you are generally memory capacity/performance limited. Assuming 18 cores per socket, you end up with Xeon giving you about 5GB of memory per thread vs 7GB per thread for Zen. Assuming both run DDR4-2400, that's also around 25-30% more memory throughput per thread. Where this really starts to play out is with oversubscription/ballooning policies: you can oversubscribe CPU massively, because you are always memory or IO limited, but the second you're memory limited, or your hypervisor pages to disk, it's all over and the entire server is in the toilet.



This is flat out wrong. Go look at how a standard virtual server is sold, and go look at what the average usage profile of a common enterprise application looks like (end to end: dev, QA, pre-prod, prod). If you think a 10% perf-per-clock gap matters outside a few niche areas, you just don't understand what matters. Repeat after me: VM density is what drives the vast majority of server sales. Think about that; comprehend that.




Your own cognitive bias is getting in the way of basic interpretation skills.

1. The ballpark figure, based off one data point, of being 10% behind was 8-core Zen vs 8-core Broadwell.
2. The next Xeons aren't coming soon, they are already out; 28 cores is the SoC size, not what you can buy.
3. Because of the small (~200mm²) Zeppelin dies on an MCM, they can mix and match different-yielding chips and actually sell full-sized SKUs.
4. Next, go look at the cost of buying the high-end SKU; you're just showing your ignorance. Those 28-core chips go into boxes running software with high per-socket licensing costs, not the servers that make up 90% of server sales.
5. You're ignoring, at a minimum, the SoC-level features not yet in Broadwell, you know, the things the big guys ask for in Xeon-D that are in Zen.....
6. Consider the price that P10 is sold at, and now consider that a Zeppelin die is likely smaller on the same node.
7. Servers don't need high frequency scaling; they need low power in the 2GHz to low-3GHz range, which largely mutes the "DERP DERP mobile process" issue, because that's exactly where mobile processors sit.
8. You completely ignore the comments about the 32-core product looking really good.... shock, horror, surprise!



Yes, and then let's look at Jaguar, but that doesn't agree with your position, so let's ignore that data point......

So you bring up Bulldozer; let's compare Bulldozer to Zen.

Bulldozer:
From the very first benchmark, Bulldozer looked bad.
Bulldozer had issues with L1i associativity.
Bulldozer had a much smaller PRF/load-store queue/retirement queue per core than Sandy Bridge.
Bulldozer had the horrible L1D/L2/WCC latency thing going on.
Bulldozer was so ALU-bottlenecked they moved some ALU instructions to the AGUs.
Bulldozer had long load-to-use latency for the FPU and slow FPU execution for non-FMA code (SSE etc.).
Bulldozer had lots of functions on few execution ports (branch and mul on the same port).
Bulldozer had lots of other stuff besides; see Agner Fog, http://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/ etc.

Zen:
First benchmarks look good (we only have Geekbench 4 on the 2P 32-core part, which is 10% behind in ST clock-for-clock vs the 22-core E5, plus the Blender example).
First-hand accounts from people known to work for OEMs say both ends of the Zeppelin spectrum (8-core and 32-core) look good.
Zen has better L1i associativity than Bulldozer, comparable to Intel.
Zen has Broadwell-to-Skylake-sized internal data structures.
Zen has a far more traditional cache structure and no write-through L1D.
Zen has as many integer execution resources as >= Haswell.
Zen has multiples of all integer-op ports and can do two branches in a cycle.
Zen has reduced load-to-use latency for the FPU and lower FPU execution latency.

The issues Bulldozer had were all around its target design, and ultimately the target design was wrong; don't confuse the target design with the capability of the individual components.

I have no idea where Zen performance will actually fall, but right now everything we know from LinkedIn, compiler patch notes, AMD presentations, the CERN leak, etc. points towards a competitive microarchitecture and a platform that is competitive at least in 1P configurations. Given that we have been told there are 2P issues, I really hope, for AMD's sake, that 2P isn't a problem for NUMA-aware independent workloads (you know, VMs; all hypervisors are NUMA aware).

You have done nothing but hand waving and DERP DERP AMD.
So put your money where your mouth is: what are the issues with Zen, and why would 10% perf per core matter (be specific)?

You yourself stated it: you don't know where Zen performance will fall. It will be better than Bulldozer, but you don't know by how much. You want to see the counter-argument to that? All you have to do is read what I posted. You took the positive; I took the negative.

If Zen comes out and doesn't match Xeon's core-for-core performance, no person in their right mind would put Zen CPUs in a server or HPC system even if they cost 25% less, because if AMD undercuts, Intel will try to match on price for performance. So Intel's faster chips will still cost more, but there will be equilibrium.

You are saying what Intel does doesn't matter (and no, I'm not talking about 10% IPC; I'm talking about 10% overall performance, and the difference between Zen and Kaby Lake will likely be even more than that). I think not: they have more pull, more influence, more leverage, and better products for now and most likely in the future even with Zen around, and 99.2% of server infrastructure is based on Intel right now too! You totally dismissed all of this. Just because Zen mitigates the gap they have now doesn't mean jack in the server market, and you know that!

What you are doing in your post is taking all the business aspects out of the server market. Things don't change on a dime. If you look at the original Opterons, how long did it take them to make inroads? And that was with a sizable performance advantage. I don't understand why you can't see that if Zen doesn't match Xeon core for core, what would the incentive be to upgrade current Xeon servers to Zen? Power envelopes look to be similar, so all it comes down to is cost, and that cost is meaningless because Intel will price appropriately against whatever AMD charges.

So what you are left with is people buying new servers. Well, guess what: Intel is still the better option, and with the volumes they have been purchasing, I'm sure customers were getting discounts before.....

Things don't change that fast.

All the talk about ARM server chips and AMD gaining back server market share is a pipe dream if the performance isn't there against Intel's Xeons.
 
If Zen comes out and doesn't match Xeon's core-for-core performance, no person in their right mind would put Zen CPUs in a server or HPC system even if they cost 25% less, because if AMD undercuts, Intel will try to match on price for performance. So Intel's faster chips will still cost more, but there will be equilibrium.

By your rationale there is no market for Xeon-D, because you can get SKUs with higher core count and/or higher single thread performance.

If Zen is 20% lower performing per core wrt. Intel's cores, it means a 2.6GHz, 16 core ZEN competes against the E5-2683 v4 (2.1GHz 16 cores) instead of the E5-2697A v4 (2.6GHz 16 cores). The 2683 retails at $1846, the 2697 at $2891. That's still a ton of money.
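That SKU-mapping logic can be made explicit. A sketch using the two list prices given above; the 20% per-core deficit is the post's hypothetical, not a measurement:

```python
# Which Xeon does a 2.6GHz 16-core Zen line up against if its cores are
# 20% slower clock-for-clock? (The deficit is the post's hypothetical.)
def effective_clock(freq_ghz, per_core_deficit=0.20):
    """Scale a clock down to a 'Xeon-equivalent' frequency."""
    return freq_ghz * (1 - per_core_deficit)

zen_eq = effective_clock(2.6)  # ~2.08 GHz Xeon-equivalent
xeons = {"E5-2683 v4": (2.1, 1846), "E5-2697A v4": (2.6, 2891)}  # (GHz, list $)
match = min(xeons, key=lambda name: abs(xeons[name][0] - zen_eq))
print(f"Zen at ~{zen_eq:.2f}GHz-equivalent competes with the {match}, "
      f"a ${xeons[match][1]} part")
```

Even against the cheaper of the two SKUs, that is still a four-figure price umbrella to sell under.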

What you are doing in your post is taking all the business aspects out of the server market. Things don't change on a dime. If you look at the original Opterons, how long did it take them to make inroads?

In the *nix market, six months. Our hosting partner started deploying dual-socket 1U IBM boxes with 2.2GHz Opterons in late summer 2004, running 64-bit BSD. They were more than twice as fast per core as the Pentium 4 based Xeons they replaced, and they had more cores.

Cheers
 
By your rationale there is no market for Xeon-D, because you can get SKUs with higher core count and/or higher single thread performance.

If Zen is 20% lower performing per core wrt. Intel's cores, it means a 2.6GHz, 16 core ZEN competes against the E5-2683 v4 (2.1GHz 16 cores) instead of the E5-2697A v4 (2.6GHz 16 cores). The 2683 retails at $1846, the 2697 at $2891. That's still a ton of money.

Yeah, but it all evens out. If Zen pushes Intel on performance per price, Intel can match AMD's tactics, and even undercut AMD if they want to; the balance of scale is not on AMD's side when it comes to a price war. AMD is not in the same position it was in during the K7/K8 days.

In the *nix market, six months. Our hosting partner started deploying dual-socket 1U IBM boxes with 2.2GHz Opterons in late summer 2004, running 64-bit BSD. They were more than twice as fast per core as the Pentium 4 based Xeons they replaced, and they had more cores.

Cheers

It took close to a year for Opteron to make a dent in Intel's server dominance in the early 2000s.

And that was with a performance difference of close to 100%, and a price difference of about the same too.

If the performance is there, yeah, people will switch over; that is a given, because the initial investment in the CPUs is negligible within the overall cost of operations and infrastructure. But if it isn't there, or it's just under Intel, and AMD isn't able to go into a price war (they just can't right now), well, you are stuck in the same place. Intel has everything in their favor; AMD is starting from scratch (in the server market).
 
By your rationale there is no market for Xeon-D, because you can get SKUs with higher core count and/or higher single thread performance.
This is a strawman; his logic does not extend in this way unless you're purposefully forcing the subject.

If Zen is 20% lower performing per core wrt. Intel's cores, it means a 2.6GHz, 16 core ZEN competes against the E5-2683 v4 (2.1GHz 16 cores) instead of the E5-2697A v4 (2.6GHz 16 cores). The 2683 retails at $1846, the 2697 at $2891. That's still a ton of money.
I'm the Enterprise Architect for Infrastructure [compute, storage, network, virtualization and cloud] for a US-based Fortune 250 organization, and "retail" price means exactly jack. I can easily explain why processor cost is the least of my concerns; let's use our newest eCommerce virtual environment as an example:

Our entirely new hardware stack for the "public" version of our eCommerce platform consists of:
Two HP C7000 Chassis, each with:
  • Qty 4: BLc FlexFabric 20/40 F8 Module
  • Qty 6: 2650W Platinum PSUs
  • Qty 1: Six "Active Cool" 200 FIO Fan Brackets
  • Qty 1: BLc Intelligent Power FIO
  • Qty 1: 3" LCD Enclosure
  • Three years of prepaid support: 24/7 with 4hr MTTR
In each C7000 Chassis, there are ten BL460c Gen9 blades, each with:
  • Qty 2: Intel E5-2698v3
  • Qty 16: 32GB 2Rx4 DDR4-2133 DIMMS
  • Qty 2: FlexFabric 20Gb 650FLB Adapter
  • Qty 1: SmartArray P244br/1G Controller
You know the most expensive part of these twenty blades and two chassis? The 10TB of memory, that's what. Want to guess the second most expensive part? All the operating system licensing (VMware, RHEL, Windows). The line-item cost of those processors was significantly less than half of the MSRP printed on Intel's ARK site, and if their cost were ZERO, the total effect on the price of the new cluster would still be less than a 5% difference. I actually saved money by purchasing fewer blades and compensating with higher cores-per-socket Xeons.
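The "CPU cost is a rounding error" point is easy to sanity-check with an illustrative bill of materials; every figure below is an assumption chosen only for scale, not the poster's actual invoice:

```python
# Illustrative cluster BOM for 2 chassis / 20 blades. All figures are assumed.
cluster = {
    "memory (10TB DDR4, 320 x 32GB DIMMs)": 200_000,
    "licensing (VMware/RHEL/Windows, 3yr)": 500_000,
    "chassis, fabric and 24/7 support": 150_000,
    "blades and controllers (sans CPU/RAM)": 250_000,
    "CPUs (40 x E5-class, discounted)": 56_000,
}
total = sum(cluster.values())
cpu_share = cluster["CPUs (40 x E5-class, discounted)"] / total
print(f"CPU line item: {cpu_share:.1%} of the cluster")
```

With discounted street pricing on the processors, the CPU line lands around the ~5% mark the post describes, so even making it free barely moves the total.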

There will be people who care about individual 1P socket/core prices. For those of us who do high-density virtualization with carrier-class equipment, your price minutiae mean nothing. When I'm stacking VMs, I will go as dense and as fast as I can reasonably buy. The only reason I didn't buy the 18-core parts is that they aren't available in the half-height blade form factor I wanted.
 
Yep, pretty much what my friend said; he is the server-side IT manager for all of the northeast for Cargill. Not as descriptive, but he won't switch off Intel CPUs unless AMD can offer much better performance than what they are currently seeing.
 
Yep, pretty much what my friend said; he is the server-side IT manager for all of the northeast for Cargill. Not as descriptive, but he won't switch off Intel CPUs unless AMD can offer much better performance than what they are currently seeing.
I know that's the case for the company I work for. We had just started selling AMD stuff back when Core 2 came out and Intel got really good again. I was one of the ones telling them to go AMD; man, Hector bitchslapped me on that one, and the mediocrity hasn't stopped to this day. Those same people are still doing purchasing, so the odds of them ever going AMD again are very low.

As for GPUs, we don't sell many but it seems the people in purchasing are only vaguely aware that FirePro even exists. We've only ever sold NVIDIA GPUs (mostly Quadro). I wonder if that's true for other business focused OEMs?
 
From the "No DX12 Software is Suitable for Benchmarking" thread, I thought that these results were interesting.

http://www.guru3d.com/articles_page..._graphics_performance_benchmark_review,8.html

The (basically 3 year old) 390X continues to best the 480, and in DX12 it's by up to 20%. The ancient 390X has practically the same flop count as the 480 but continues to outperform it.

I understand that the 480 is supposed to be the successor to the 380, even if it's currently priced by retailers closer to the 8GB 390. However, shouldn't actual performance per flop be increasing in newer architectures rather than dropping?

And is this drop in perf/flop likely down to reduced ROPs and memory channels?
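The flop-count claim checks out on paper. Theoretical single-precision throughput is shaders × 2 ops (FMA counts as two) × clock; the shader counts and clocks below are the published reference specs:

```python
# Theoretical FP32 throughput: shaders * 2 (FMA = 2 ops/cycle) * clock.
def tflops(shaders, clock_mhz):
    return shaders * 2 * clock_mhz * 1e6 / 1e12

r390x = tflops(2816, 1050)  # R9 390X at its reference clock
rx480 = tflops(2304, 1266)  # RX 480 at its boost clock
print(f"390X: {r390x:.2f} TFLOPS, RX 480: {rx480:.2f} TFLOPS")
```

Both land within about 1.5% of each other at roughly 5.9 TFLOPS, so a 390X lead in games does imply lower perf/flop on Polaris, consistent with the halved ROP count (64 vs 32) and the narrower memory bus (512-bit vs 256-bit) raised in the question above.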
 