Why is AMD losing the next gen race to Nvidia?

edit: the interesting thing is that in cryptocurrency mining, a 1070 has around the same hash rate as a 470 and around the same power consumption (both are memory-throughput limited).

To me that says it could very well be largely ROP related (more the Dave Kanter theory than the uber-compression theory).

I found an article that has the 1070 at double or almost double the hash rate of the 480. Where did you get your benches from?

http://cryptomining-blog.com/tag/rx-480-hashrate/
 
Similar arguments could be made about NVIDIA: they have at least three different cutting-edge GPU chip designs, Big Pascal (P100), Consumer Pascal (P102+) and Embedded Pascal (PX etc.). Last I looked they have fingers in many CPU pies (two internal ARM cores, POWER support, x86 support, etc.), their own console projects (Shield), and rumours abound about at least one semi-custom iGPU.

And TBH it's a great strategy: they have products for various markets, expanding who they can sell to.

AMD is in a similar space, perhaps even a little behind NVIDIA in execution (but in fairness AMD is smaller and has an x64 CPU architecture to develop, which is a big ask). No one will deny that so many products can make for 'challenging' schedules, but that is the business now and why competition is a good thing.

Good points. Not to mention the automotive side... there are dedicated resources for that segment as well. Plus, AFAIK Nvidia spends a lot more on software than AMD does.
Sure, they have a lot of side projects besides the main GPU line, maybe even more than AMD. However, they can afford it with their record revenue, brutal margins, market position, etc.

Chicken... meet egg.
 
I don't do mining myself, but isn't the issue more to do with Ethereum and needing to use Linux to overcome it?
Under Windows this gives low hash rates for Nvidia cards, including Pascal.
Cheers
I think I read that in one of the articles I was browsing, but they were waiting for a driver fix from Nvidia. But my point is that itsmydamnation seems to imply that a 1070 only matches a 470 in compute power in cryptocurrency mining, and that the performance difference might be entirely ROP related.
 
I think I read that in one of the articles I was browsing, but they were waiting for a driver fix from Nvidia. But my point is that itsmydamnation seems to imply that a 1070 only matches a 470 in compute power in cryptocurrency mining, and that the performance difference might be entirely ROP related.
That's not what I said... lol
I said (maybe very poorly) that from what I have seen on the latest drivers, a 1070 and a 470 both get around 27 MH/s, both being limited by memory. If you OC the memory of either, they both go up. So I should be more specific and say 470s that have Samsung 8 Gbps memory modules on them: most people can take them to a memory clock of 2.2 GHz and end up around 27-30 MH/s.
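For what it's worth, a rough sanity check of the "limited by memory" claim (my own numbers, not from any benchmark: I'm assuming Ethash touches about 64 pseudo-random 128-byte DAG pages per hash, and a 256-bit bus at 8 Gbps):

```cuda
// Back-of-the-envelope Ethash bandwidth ceiling (plain host-side arithmetic).
// Assumptions (mine, not from this thread): ~64 mix rounds per hash, each
// fetching a 128-byte DAG page, i.e. ~8 KiB of DRAM traffic per hash.
#include <cstdio>

int main() {
    const double bus_bits       = 256.0;  // RX 470 / GTX 1070 class bus width
    const double gbps_per_pin   = 8.0;    // 8 Gbps GDDR5
    const double bw_bytes       = bus_bits / 8.0 * gbps_per_pin * 1e9;  // ~256 GB/s
    const double bytes_per_hash = 64.0 * 128.0;                         // ~8 KiB
    printf("theoretical ceiling: ~%.0f MH/s\n", bw_bytes / bytes_per_hash / 1e6);
    return 0;
}
```

That lands at roughly 31 MH/s, which is in the same ballpark as the 27-30 MH/s figures above, i.e. consistent with both cards bumping into memory bandwidth rather than ALU throughput.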

Given that both the 1070 and 470 with equal memory can do equal hash rates, I was using that as a basis to say that in gaming workloads (you know, the things that use ROPs... lol) a lot of NV's advantage in terms of power consumption and needing less memory bandwidth could largely be from improved ROPs. I don't think it's from compression; I think* it would be from not needing to move as much data around, even on chip.

I wasn't aware that the 1070 does better in Linux.

* please note I'm a complete layman :)


edit: a person getting 27-28 MH/s on a 1070 in Windows on the very latest drivers
https://forums.anandtech.com/threads/ethereum-gpu-mining.2463816/page-81#post-38424395

More discussion around that post.
 
Well, compression won't be effective at all for compute tasks, so if bandwidth limitations are the case there is no real way around that; as you stated, it all depends on the VRAM modules being used.
 
In our defense, unless you're futzing around with computation for its own sake, scientists have front-line competence in some other field: the field where the actual problem is, as opposed to delving into the finer points of GPU architecture. Computation for us is a scientific tool. The more obstacles the tool introduces, and the more attention and energy its usage requires, the worse it is.

Unless of course your area of interest is computation itself. In which case the value of what is actually computed tends to be zilch.
Completely agree, man-hours are the most valuable resource.

But a hammer will sound overrated to you if you got used to driving nails with a screwdriver. :-|
 
Well, compression won't be effective at all for compute tasks, so if bandwidth limitations are the case there is no real way around that; as you stated, it all depends on the VRAM modules being used.
At least in the case of Ethereum, I doubt it's the compression which scales so horribly.
Rather, Ethereum - by design - prevents coalesced read and write transactions by accessing pseudo-random memory locations which don't fit into an on-chip cache. The GPU is constantly starved by memory latency, and AFAIK Pascal consumer GPUs still don't reach the same concurrency levels as the GCN equivalents (due to the smaller register files?), so there is nothing to hide the stall.

I'm also suspecting that Maxwell and Pascal are actually rather weak in terms of raw, sustained memory write transactions per second, not just a "mismatch" between concurrency and available memory bandwidth. It's hidden perfectly well if coalescing works in the L2 cache or in the ROPs, but once you miss that, say goodbye to efficiency.
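To make the access-pattern point concrete, here is a stripped-down CUDA sketch (my illustration, not real Ethash): each thread chases dependent, pseudo-random indices through a buffer far larger than the L2, so neighbouring threads hit unrelated cache lines, nothing coalesces, and the only way to hide DRAM latency is to keep lots of warps/wavefronts in flight.

```cuda
// Minimal sketch of an Ethash-like, latency-bound access pattern (not the real
// algorithm): each thread performs dependent, pseudo-random fetches from a
// multi-GiB "DAG", so loads from neighbouring threads land on unrelated lines
// and the memory system can't coalesce them.
#include <cstdint>

__global__ void random_walk(const uint4* dag, uint64_t dag_words,
                            uint64_t* out, uint32_t seed)
{
    uint32_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    uint64_t idx = (seed ^ tid) * 2654435761u % dag_words;  // arbitrary index hash
    uint64_t acc = 0;

    for (int i = 0; i < 64; ++i) {               // ~64 dependent rounds, like Ethash
        uint4 page = dag[idx];                   // scattered, uncoalesced load per thread
        acc ^= page.x ^ page.y ^ page.z ^ page.w;
        idx = (acc * 2654435761u + i) % dag_words;  // next address depends on the load
    }
    out[tid] = acc;
}
```

The kernel name and the index hash are made up for illustration; the point is just the dependent load chain and the scattered addresses.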
 
The GPU is constantly starved by memory latency, and AFAIK Pascal consumer GPUs still don't reach the same concurrency levels as the GCN equivalents (due to the smaller register files?), so there is nothing to hide the stall.
Are you sure about that? Sebbi has been saying that register file pressure is one of his complaints about GCN.

Maybe it's different for later GCN versions, but the GCN white paper says 64KB per CU, where Maxwell has 256KB per SM. Now Maxwell has 4 sub-SMs, so that's really 64KB per sub-SM. But GCN has a warp that is 64 wide, while Maxwell has 32. So you'd think that Maxwell can have more warps in flight.

Or am I looking at this the wrong way?
 
Are you sure about that? Sebbi has been saying that register file pressure is one of his complaints about GCN.

Maybe it's different for later GCN versions, but the GCN white paper says 64KB per CU, where Maxwell has 256KB per SM. Now Maxwell has 4 sub-SMs, so that's really 64KB per sub-SM. But GCN has a warp that is 64 wide, while Maxwell has 32. So you'd think that Maxwell can have more warps in flight.

Or am I looking at this the wrong way?
GCN has 64KB of registers per SIMD, so 256KB per CU:
4 SIMDs/CU × 64 threads/SIMD × 256 registers/thread × 4 bytes/register = 256 KB per CU.
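Putting rough numbers on the "warps in flight" question using the register file sizes above (my own back-of-the-envelope; it assumes a kernel needing 64 registers per thread and ignores every other occupancy limit):

```cuda
// Rough occupancy comparison from register budget alone (ignores shared memory,
// scheduler slots, and the hardware caps on resident waves/warps).
// Assumption: a kernel using 64 x 32-bit registers per thread/work-item.
#include <cstdio>

int main() {
    const int regs_per_thread = 64, bytes_per_reg = 4;

    // GCN: 64 KiB of VGPRs per SIMD, 4 SIMDs per CU, 64-wide wavefronts
    int gcn_waves_per_simd = (64 * 1024) / (64 * regs_per_thread * bytes_per_reg);
    printf("GCN:     %d wavefronts/SIMD -> %d per CU (%d threads)\n",
           gcn_waves_per_simd, gcn_waves_per_simd * 4, gcn_waves_per_simd * 4 * 64);

    // Maxwell/Pascal: 64 KiB of registers per sub-SM (256 KiB per SM), 32-wide warps
    int mxw_warps_per_sub = (64 * 1024) / (32 * regs_per_thread * bytes_per_reg);
    printf("Maxwell: %d warps/sub-SM   -> %d per SM (%d threads)\n",
           mxw_warps_per_sub, mxw_warps_per_sub * 4, mxw_warps_per_sub * 4 * 32);
    return 0;
}
```

From the register budget alone the two come out even in threads in flight at that register pressure (1024 per CU and per SM), so any concurrency difference would have to come from the hardware caps on resident waves/warps or from how the rest of the chip is provisioned.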
 
At least in the case of Ethereum, I doubt it's the compression which scales so horribly.
Rather, Ethereum - by design - prevents coalesced read and write transactions by accessing pseudo-random memory locations which don't fit into an on-chip cache. The GPU is constantly starved by memory latency, and AFAIK Pascal consumer GPUs still don't reach the same concurrency levels as the GCN equivalents (due to the smaller register files?), so there is nothing to hide the stall.

I'm also suspecting that Maxwell and Pascal are actually rather weak in terms of raw, sustained memory write transactions per second, not just a "mismatch" between concurrency and available memory bandwidth. It's hidden perfectly well if coalescing works in the L2 cache or in the ROPs, but once you miss that, say goodbye to efficiency.
I read figures in some of the other forums showing that under Windows the hash rate was between 24 and 27 MH/s, but using Linux the figure increases to the mid 40s, and this is why it is recommended to use Linux on Nvidia cards when mining Ethereum.
Shouldn't the example you give also affect Linux builds, though?

Cheers
 
Just to show how well the Pascal 1070 does with other mining algorithms (not Ethereum) compared to the AMD 480: this crypto mining site benchmarked both, and the 1070 manages double the performance, which surprised them.
http://www.cryptocoinupdates.com/performance-of-the-amd-radeon-rx-480-for-other-algorithms/

They will be re-evaluating the Pascal 1070 with Ethereum under Linux at a future date, as they also found the performance issue under Windows.
Worth noting that while Nvidia has problems with Ethereum under Windows, Polaris with the 480, it seems, has problems with sgminer under Windows (they have not tested it under Linux yet).

Cheers
 
So AMD's 'bigger' GPU, Vega, is 1H17, not even 1Q17; that is so disappointing... not even the big one with HBM. :(
Can we say they have lost this round already?

Just what is wrong with their 14nm GCN 3.0? The process or the architecture? Who wants to put in some speculation, with Polaris as a clue?
 
Actually, I read different reports, with articles saying October 2016 for Vega 10 and 2017 for Vega 11... H1 doesn't necessarily mean June; it could just mean any point in the first half of 2017. (I need to watch the investor video; the slides are just there for background.)

But well, as the rumors are completely mixed, it's hard to really know.
 
So AMD's 'bigger' GPU, Vega, is 1H17, not even 1Q17; that is so disappointing... not even the big one with HBM. :(
Can we say they have lost this round already?

Just what is wrong with their 14nm GCN 3.0? The process or the architecture? Who wants to put in some speculation, with Polaris as a clue?
I can see several obvious possibilities.
One is yield. Lisa Su made a comment about 14nm yield issues regarding Polaris just over a week ago, so it may be that Vega needs to be respun, and that they can't fully trust the respins to yield sufficiently well to dare push the "launch" button yet.
Another, more speculative, possibility is that they are waiting for a more suitable process for high-power chips than 14nm LPP, one which would perform better in the rather extreme high-power GPU segment. (I'd love to see how the 14nm HP process from IBM would perform.)
And, even more speculatively again, HBM2 may be yielding too poorly for large volumes at decent cost to be counted on, which would also make it wise to be a bit cautious in their roadmapping.

We just don't know. We may never get to know.
 
It may be as simple as this came out during an investor meeting, and revenue takes time. If they were expecting significant revenue in 1Q17, they'd likely have to launch before Christmas. The slide I saw also specifically mentioned "enthusiast", which likely doesn't include both of the Vega chips. Speculating here, but if they are bonding dies together, that puts HPC Zen and Vega 11 in roughly the same timeframe. It seems reasonable the two could be related, as Zen should feature leading graphics IP, so some tech is likely shared. HBM2 is the other potential culprit, because I'd have expected the Pascal Titan to be launched with it if it were readily available, considering the price. Little reason not to add HBM2 and charge whatever you want in that market.
 
It may be as simple as this came out during an investor meeting, and revenue takes time. If they were expecting significant revenue in 1Q17, they'd likely have to launch before Christmas. The slide I saw also specifically mentioned "enthusiast", which likely doesn't include both of the Vega chips. Speculating here, but if they are bonding dies together, that puts HPC Zen and Vega 11 in roughly the same timeframe. It seems reasonable the two could be related, as Zen should feature leading graphics IP, so some tech is likely shared. HBM2 is the other potential culprit, because I'd have expected the Pascal Titan to be launched with it if it were readily available, considering the price. Little reason not to add HBM2 and charge whatever you want in that market.

For Titan, either they were already planning from the start to launch the consumer-grade part with GDDR5X, and thus a different SKU, GP102, to get it out as fast as possible, or they changed plans somewhere in between, something AMD was not as fast to do. (This could also explain why GP102 has some little things that GP100 doesn't have.)
 
Can we say they have lost this round already?
Um, wasn't it obvious they'd lost this round when they first announced Polaris as a mid-range GPU? The 480 is barely faster than the 390X, which in its original incarnation is nearly three years old by now, and there's no faster product anywhere on the horizon. If they had anything, AMD would have been hyping it to try and prevent gamers from buying NV high-end boards.
 
HBM2 is the other potential culprit, because I'd have expected the Pascal Titan to be launched with it if it were readily available, considering the price. Little reason not to add HBM2 and charge whatever you want in that market.


nV is already charging whatever they want for GP102 with GDDR5X, so why would they need GP100 and HBM2? To cut margins, or to make a product that cuts into their professional card market and thus cuts margins there too?

nV's stance of not using HBM/HBM2 in enthusiast-level products tells a great deal about how much more expensive such memory is to use.

It would have been cheaper for them to use GP100 and HBM2 because they wouldn't have needed to spend the money on designing GP102, but margins must be quite different between HBM2 and GDDR5X for them to plan like this. nV even noted in the past the reason why they used professional chips in the top-end enthusiast parts; this is the first launch where they have branched away from that. (You also have to factor in the cost of the die: GP102 is a smaller die, but a GP102 with HBM would have been doable, and they still didn't do that either, so it comes down to the cost of HBM and its requirements.)
 