The AMD Execution Thread [2018]

Discussion in 'Graphics and Semiconductor Industry' started by A1xLLcqAgt0qc2RyMz0y, Jan 8, 2018.

Thread Status:
Not open for further replies.
  1. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,485
    Likes Received:
    897
    I shouldn't post when I'm this tired—thanks for pointing it out, it's fixed! :)
     
    digitalwanderer likes this.
  2. digitalwanderer

    Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,031
    Likes Received:
    1,605
    Location:
    Winfield, IN USA
    I really, really, REALLY hate to say this but I'm not so sure AMD's capacity will be the limiting factor. I'm terrified that DDR4 prices are going to keep going up, and with all the cryptowhores out there GPUs are gonna be insane for a while. I'm not predicting the death of pc gaming or anything, but I will be betting on a rather huge and continued slump while the price of entry is so damned high.

    There are some absolutely killer AMD cpu/mobo deals out there that would pump my system up to a real gaming rig again for under $250...but it's still gonna cost $180 for f-ing ram that cost $80 2 years ago and that just galls me to the bloody bone! I'm thankful as hell I still have a decent gaming card in my wife's rig I can steal (R390) because if I had to get one it would just TOTALLY kill it. :|

    Please don't get me wrong, AMD is doing great and I'm glad..but there are a whole bunch of other factors at play that I've never seen affect the market this much in so short a period in my lifetime, so I'm hesitant to be too bullish on anything right now. :/
     
  3. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    637
    Likes Received:
    477
    Location:
    55°38′33″ N, 37°28′37″ E
    Exactly. There could be several versions of 'professional' multi-chip cards, with the memory controller and interconnect implementation targeting a specific market.


    An entry-level 'mining' accelerator card would use several low- or mid-range dies with a reduced amount of inter-die communication, maximizing memory bandwidth with HBM stacks on each die. Each such 'professional' accelerator card would thus replace several consumer video cards in the same 8-slot mining setup.
    Such a card would also have value for HPC and/or gaming applications that can utilize multiple adapters and/or multi-adapter nodes.


    A high-end 'HPC' accelerator card would use high-performance dies with an increased number of inter-die links, higher PCIe bandwidth, and the Cache Coherent Interconnect for Accelerators (CCIX) protocol.

    In the current implementation, EPYC CPUs have a shared memory controller and a shared L3 cache across all dies.
    Navi GPUs could instead have a dedicated memory controller with its own HBM stacks on each die - pushing bandwidth past the 512 GB/s of the desktop Vega GPUs.

    I mean HPC workloads - even a 'mining' accelerator card has to be viable for traditional compute tasks.
    BTW ASICs with HBM memory are coming from Samsung, TSMC, and GlobalFoundries.

    Yes, the GCN architecture does implement hash instructions, so AMD could disable that microcode on non-professional cards, but it wouldn't help much if Ethash is memory-bandwidth bound.
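    The bandwidth-bound claim is easy to sanity-check with rough arithmetic: each Ethash hash reads 64 pseudo-random 128-byte pages from the DAG, about 8 KiB of DRAM traffic per hash, so memory bandwidth alone caps the hash rate. A back-of-envelope sketch (real cards land somewhat below the ceiling):

```python
# Rough upper bound on Ethash hash rate from memory bandwidth alone.
# Assumes the standard Ethash parameters: 64 DAG accesses per hash,
# 128 bytes each, i.e. 8192 bytes of random DRAM reads per hash.

BYTES_PER_HASH = 64 * 128  # 8 KiB of DAG traffic per hash

def ethash_ceiling_mhs(bandwidth_gbs: float) -> float:
    """Theoretical hash-rate ceiling in MH/s for a given GB/s of DRAM bandwidth."""
    return bandwidth_gbs * 1e9 / BYTES_PER_HASH / 1e6

print(ethash_ceiling_mhs(256))   # GDDR5 card at 256 GB/s: ~31 MH/s ceiling
print(ethash_ceiling_mhs(484))   # Vega 64's HBM2 at 484 GB/s: ~59 MH/s ceiling
```

    Disabling hash instructions wouldn't move this ceiling at all, which is the point being made.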

    Game engines are increasingly using compute tasks for pre-render computations that do not need to be bound to the graphics pipeline - so these restrictions would just stand in the way of legitimate gamers.

    The PC gaming market will not bear these prices for much longer. It is in the interest of all parties to return pricing to normal as soon as possible, or the ensuing crash will bury desktop hardware makers entirely.

    A 'free market' solution would be to increase production to reduce prices; a 'planned economy' solution would be to fix the price and regulate demand with queuing and rationing. We already know which one works.

    Even wholesale buyers cannot bypass board makers - the latter benefit the most from high prices, but they will also be the first to suffer if this situation continues. And they seem to realize this very well.
     
    #83 DmitryKo, Jan 31, 2018
    Last edited: Jan 31, 2018
    Grall likes this.
  4. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    637
    Likes Received:
    477
    Location:
    55°38′33″ N, 37°28′37″ E
    I can't see how power management counters could be used to reliably discriminate mining workloads.
    It's not practically enforceable by driver software and/or DRM schemes - there would have to be different hardware to prevent driver modding.
    The wealth of the current 'oligarchs' has nothing to do with goods shortages which ended with the Soviet Union.
    Impersonating someone's identity to avoid queuing or online restrictions is not the same as stealing someone's money from a bank account.
    Surely I can see Pacific Gas and Electric Company Police raiding residential buildings with thermal detectors in their hands. The staff will probably come from retired veterans of Florida Bathing Suit Patrol. Their effect on the gaming market would be zero or negative.
     
  5. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,286
    Likes Received:
    385
    Location:
    Australia
    There is supposed to be a massive amount of DDR4 capacity from China coming online this year, so memory prices should go down, not up.
     
    Lightman, eastmen and digitalwanderer like this.
  6. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,485
    Likes Received:
    897
    Could you point me to some additional information on that, please? I haven't heard anything about it.
     
    digitalwanderer likes this.
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,814
    Location:
    Well within 3d
    While it may be possible to make links that fall short of the memory controller's throughput, the synchronization and routing for the fabric may run into issues if the memory clients responsible for arbitration, snoops, and ordering are overwhelmed by a link.

    The neat thing about those focused on mining is that their profit motive makes them contort their systems in significant ways, beyond the limits of general or professional workloads, and there's no obligation to let the hardware be taken to those points for free.

    The engines aren't going to be optimized for workloads that run the card off a 1x PCIe 2.0 slot, use no graphics buffers, ignore most graphics operations and floating point, do no texturing, run one or two kernel programs for days, use no API resources or operations, and hash through the same pseudo-random memory locations over and over.
    The system and workload fingerprints of mining are quite unique, and while it may be possible to create code that works around some of the checks, it won't come without a cost in hash rate or in the broader appeal of the hash algorithm.
    The profit motive intrudes at that point: work around restrictions at an ongoing penalty, or pay the premium and get on with mining before difficulty rises or the market corrects.

    A game that somehow manages to trip some of these checks at launch would get the same sort of follow-up game-ready hotfix driver that they all get, or dev builds would show low performance and they'd adjust the engine.

    It's in the best interests of the GPU vendors to charge those who will pay the most more than those who will not. Charging miners more would help moderate their demand and provide a financial cushion for the second-hand glut after a likely correction. Mining cards should be priced higher if only to account for the much higher risk a fickle mining market entails.

    Quadro, Titan, FirePro, Xeon, EPYC, i7, R7, Threadripper, and countless other upmarket products indicate that this sort of differentiation on the same hardware is sustainable.
    The mining market takes things to the next level by not having the workload complexity, economic sustainability, long-term thinking, or legal compliance that buyers in less price-sensitive segments have.

    Even if RTG and Nvidia took on the role of "GPU vendor of the miner-folk", their other markets, DRAM vendors, shareholders, and the foundries would not abandon their philosophy of maximizing revenue extraction.

    The hardware knows at a unit granularity what the workload is doing, and crypto algorithms that try to be ASIC-hard make certain choices that lead to discernible patterns. Focusing on local bandwidth as the limiter leads to a lot of pseudo-random accesses to cache and memory, mining focuses on reams of integer and bit operations, and resource allocations are not handled in the common fashion.
    Games are highly variable even within a frame, so the cumulative time a given set of operations takes up on the GPU can be tracked and duty-cycled with limited impact--unlike a workload churning through straightforward math or purposefully scattered accesses, without variation, for hours, days, or months.
    The internal tables are also not entirely visible to software, or the choice exists to give more authority to the internal execution loop. Various thresholds can override clock or voltage settings, and the platform for encrypted payloads and a multitude of keys is in place.
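    As a toy illustration of how counter-based duty-cycling could work in principle - every counter name, ratio, and threshold below is hypothetical, not any vendor's actual firmware logic - a monitor might only flag a workload whose fingerprint never varies across a long window:

```python
# Toy sketch of a counter-based workload classifier. All counter names
# and thresholds are hypothetical illustrations, not real firmware logic.
from collections import deque

class WorkloadMonitor:
    def __init__(self, window=3600):
        # One entry per sampling interval; old samples fall off the window.
        self.samples = deque(maxlen=window)

    def sample(self, int_op_ratio, dram_miss_ratio, gfx_busy_ratio):
        """Record whether one interval's counters look mining-like."""
        mining_like = (int_op_ratio > 0.9 and      # reams of integer ops
                       dram_miss_ratio > 0.8 and   # scattered DRAM misses
                       gfx_busy_ratio < 0.05)      # no graphics pipeline use
        self.samples.append(mining_like)

    def sustained_mining_fingerprint(self) -> bool:
        # Games vary frame to frame; only an unvarying workload keeps the
        # fingerprint over the entire window, so false positives are rare.
        return (len(self.samples) == self.samples.maxlen and
                all(self.samples))

mon = WorkloadMonitor(window=5)
for _ in range(5):
    mon.sample(0.95, 0.9, 0.0)    # crypto-like: heavy integer, no graphics
print(mon.sustained_mining_fingerprint())  # True
```

    A single game frame that dips into compute-heavy work resets the streak, which is why duty-cycling on a long window would bite constant workloads far harder than variable ones.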

    Quadro features and server options for the high-end SKUs have been segmented for decades on the same hardware, and current and future platforms are becoming more capable.

    Despite the fact that so many were either part of the state control apparatus or their cronies?
    How many planners went without the goods the common people couldn't get, and which side of the command structure would the vendors be in this analogy?

    Bank accounts are expected to carry a minimum balance, account creation requires a fair amount of the bank's time and resources to set up and register, and the law takes a dim view of large-scale wire fraud and identity theft.
    If these practical hurdles are somehow not a problem in your scenario, then the overdraft, credit, and loan services of a bank that misses dozens to hundreds of (apparently untraceable?) fake accounts created in a short period can be abused.

    Hobbyist miners won't bother, and large-scale miners don't need it.

    Why do you think it's that hard for electric companies to know how much they need to bill their customers for?
    For that matter, in the case of drug interdiction there are thresholds for review at sufficient levels of consumption.
    Past a certain level of consumption, the utility, local inspectors, and law enforcement can become involved, each with their own thresholds for questioning power delivery to a location.

    In the case of the big mining concerns, they aren't sneaking around. The power hookups and delivery plans are not secret. The delivery level, bulk rates, and service agreements are handled like they would be for any commercial/industrial customer.

    The idea is to make them choose to pay more for a special SKU or hardware unlock, or go without the profit from an optimal hash rate. Charge them less than it takes to make it wholly unprofitable, but up to the limit of overhead and ongoing cost incurred from working around restrictions.
     
  8. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    9,974
    Likes Received:
    1,491
    I was under the impression that the real wall was the RAM and motherboard stability with it. I'm hoping that with the new chipset and new power delivery setup, the new boards will allow more overclocking room, even for the older chips.
     
  9. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,727
    Likes Received:
    4,395
    Lightman likes this.
  10. Pressure

    Veteran Regular

    Joined:
    Mar 30, 2004
    Messages:
    1,317
    Likes Received:
    243
    It seems a single customer bought a lot.
     
  11. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,727
    Likes Received:
    4,395
    Baidu?

    Due to recent news, people were wondering if the Baidu sales included substantial numbers of Instinct MI25s, or if it was just a couple of GPUs with the brunt of it being EPYC CPUs.
    Turns out they bought a lot of MI25s. They're just not in their 1P servers.
     
    Grall likes this.
  12. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    637
    Likes Received:
    477
    Location:
    55°38′33″ N, 37°28′37″ E
    In June 2017, Nvidia Research published a detailed paper on their proposed multi-chip GPU design, which was evaluated in a simulator. Their findings should also be applicable to AMD's multi-die performance.

    MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability
    http://research.nvidia.com/publication/2017-06_MCM-GPU:-Multi-Chip-Module-GPUs

    https://wccftech.com/nvidia-future-gpu-mcm-package/
    https://techreport.com/news/32189/nvidia-explores-ways-of-cramming-many-gpus-onto-one-package
    https://wccftech.com/amd-navi-gpu-launching-siggraph-2018-monolithic-mcm-die-yields-explored/


    The simulated MCM-GPU has 4 GPU modules, each with 64 SM cores and a 768 GB/s link to local HBM. The modules are connected through a ring bus, and the required inter-module bandwidth is studied using real-world CUDA workloads running in the simulator (section 4).

    First they consider an unoptimized MCM-GPU design, with a 16 MB L2 cache directly connected to each module's crossbar/memory controller, and 128 KB of L1 cache per SM core.
    For memory-intensive workloads, an inter-module link bandwidth of 1x the memory bandwidth (768 GB/s) yields 60% of maximum theoretical performance (relative to an 'ideal' 6 TB/s link), while 2x bandwidth (1.5 TB/s) yields 90% and 4x (3 TB/s) yields 97%. For compute-intensive workloads, a 1x link corresponds to 85% of maximum performance, a 2x link to 97%, and 4x to 100% (Figure 4, section 3.3.2). An 'ideal' 6 TB/s link has little to no gain over a 4x (3 TB/s) link.
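    The scaling figures above, tabulated for quick reference (values transcribed from the paper's Figure 4, as fractions of the 'ideal' 6 TB/s configuration):

```python
# Relative performance vs. inter-module link bandwidth, as reported in
# the MCM-GPU paper (Figure 4). Fractions of the 'ideal' 6 TB/s case.
link_scaling = {
    # multiple of 768 GB/s: (memory-intensive, compute-intensive)
    1: (0.60, 0.85),
    2: (0.90, 0.97),
    4: (0.97, 1.00),
}

for mult, (mem, comp) in link_scaling.items():
    print(f"{mult}x ({mult * 768} GB/s): "
          f"mem-bound {mem:.0%}, compute-bound {comp:.0%}")
```

    The takeaway is that memory-intensive workloads are the ones that pay for a thin link, which motivates the cache and scheduling optimizations discussed next.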


    They assume that a 1x (768 GB/s) inter-module link can be easily implemented today, so this is the bandwidth they use in further research. A 1.5 TB/s link is reasonable to achieve as well, but 3 TB/s would require further development of signaling/packaging technology (section 3.3.3).

    Then they consider design optimizations to reduce the performance gap at 768 GB/s, such as adding a 16 MB L1.5 cache to each GPU module in order to cache remote memory accesses (section 5.1, Figure 6), and distributed scheduling of co-operative thread arrays (CTAs) - i.e. re-grouping contiguous, data-local threads to execute on the same GPU module (section 5.2, Figures 9/10).
    These two optimizations result in a 33% reduction in inter-module communication, improving performance by 23.4% for memory-intensive workloads and 1.9% for compute-intensive workloads.


    The final optimization is a first-touch page mapping policy, where virtual pages are mapped to the physical memory of the GPU module which initiated the page load request, further reducing inter-module traffic. When first-touch page mapping is combined with an 8 MB L2 cache, an 8 MB L1.5 cache, and distributed CTA scheduling, real-world performance improves by 51% for memory-intensive workloads and 11.3% for compute-intensive workloads (section 5.3, Figures 13/14).
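    A minimal sketch of the first-touch idea (illustrative Python only; the paper implements the policy in the GPU's page tables, not through an interface like this toy one):

```python
# Toy sketch of a first-touch page mapping policy: a virtual page is
# physically placed in the local HBM of whichever module touches it
# first, so subsequent accesses from that module stay off the link.
class FirstTouchMapper:
    def __init__(self):
        self.page_home = {}          # virtual page -> owning GPU module

    def translate(self, vpage, requesting_module):
        # First toucher becomes the page's home; later requesters from
        # other modules must cross the inter-module link.
        home = self.page_home.setdefault(vpage, requesting_module)
        return home, home == requesting_module   # (home, is_local)

m = FirstTouchMapper()
print(m.translate(0x1000, 2))   # (2, True)  - first touch, stays local
print(m.translate(0x1000, 0))   # (2, False) - remote access over the link
```

    Because CTA scheduling already groups data-local threads on the same module, most touches after the first also come from the home module, which is why the two optimizations compound.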

    Overall, these three optimizations improve performance with the 1x (768 GB/s) inter-module link by 22.8% over the unoptimized design - to about 90% of a similar monolithic GPU or an MCM-GPU with an 'ideal' aggregate bandwidth of 6 TB/s, neither of which can be practically implemented in silicon (section 5.4, Figures 16/17).


    Such an MCM-GPU would also be very power-efficient. At the 28 nm node, on-die and on-package transmission links require 80 fJ/bit and 0.5 pJ/bit respectively, while on-board links require 10 pJ/bit and system-level (inter-slot or inter-processor) links require 250 pJ/bit - each roughly an order of magnitude higher than the preceding one (section 2.1, Table 2, section 6.2).
    The resulting MCM-GPU design is fully transparent to the programmer and behaves like a monolithic GPU (section 7).
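    Those per-bit energies translate directly into link power at the paper's baseline bandwidth, which shows why an on-package link is viable while a system-level one is not. Straightforward arithmetic from Table 2's numbers:

```python
# Link power implied by the paper's per-bit energies (Table 2, 28 nm):
# on-die 80 fJ/bit, on-package 0.5 pJ/bit, on-board 10 pJ/bit,
# system-level (inter-slot/inter-processor) 250 pJ/bit.
def link_power_w(bandwidth_gbs: float, energy_pj_per_bit: float) -> float:
    """Sustained link power in watts for a given GB/s and pJ/bit."""
    bits_per_s = bandwidth_gbs * 1e9 * 8
    return bits_per_s * energy_pj_per_bit * 1e-12

BW = 768  # GB/s, the paper's baseline inter-module link
print(link_power_w(BW, 0.08))   # on-die:     ~0.5 W
print(link_power_w(BW, 0.5))    # on-package: ~3.1 W
print(link_power_w(BW, 10))     # on-board:   ~61 W
print(link_power_w(BW, 250))    # system:     ~1536 W
```

    At 768 GB/s, only the on-die and on-package energy classes stay within a sane power budget, which is the whole argument for keeping the modules on one package.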



    These are products with either significant additional hardware features or application-specific optimized drivers (which, BTW, could easily be modded, if anyone cared to run the CAD-optimized OpenGL path on their consumer gaming cards).

    They just don't have this 'heuristic' dynamic DRM limiting your workloads to arbitrarily 'allowed' ones.


    Whatever, I just want my high-end gaming video card, and they buy it from board vendors.

    I'd rather charge them less for a special mining SKU that offers limited compute performance but the same high memory bandwidth, so they would release their grip on high-end gaming cards.

    It would not prevent a crash of the desktop gaming market either, since gamers will soon just stop buying video cards at these insane prices.


    PS. AMD is actually ramping up GPU production for both GDDR5- and HBM2-based parts:

    http://www.guru3d.com/news-story/am...n-blames-availability-of-graphics-memory.html
    https://wccftech.com/amd-ramping-gpu-production-confirms-memory-behind-shortage/
    etc.
     
    #92 DmitryKo, Feb 2, 2018
    Last edited: Feb 2, 2018
    tunafish, wirtold, Lightman and 3 others like this.
  13. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    637
    Likes Received:
    477
    Location:
    55°38′33″ N, 37°28′37″ E

    Mining workloads are not 'optimized' for 1x PCIe - loading a 3 GB data set in a few minutes instead of a few seconds is hardly an 'optimization'. It's just that 8+ slot motherboards are unable to offer more PCIe lanes with current low-end processors.
    I can understand your description of these high-level tasks, but I can't see a practical approach to implementing such detection logic. Hardware and drivers do not operate on a high level, and the only currently viable way to determine the exact type of workload is to have a graphics programmer analyze annotated C++/HLSL source code in a graphics debugger.
    There may be individual pieces of the jigsaw puzzle, but the big picture just does not add up.

    Why should developers even care to fix something that they didn't break in the first place? Everyone would rather just move to the greener eye-shaped pastures.
    The post-Soviet corporatist oligarchy of the former state control apparatus is not built on exploitation of the planned economy or on wealth accumulated in that period - that wealth would have been eaten wholesale by the hyperinflation of 1992-1993.
    I can't see the analogy with graphics card vendors either.
    You are going to cut off the large-scale mining crowd by enforcing a hard limit on the number of cards sold in each order, and they just won't bother and won't need to evade it? We shall see.
    Why would power companies even care to charge mining customers extra, unless some idiot control-freak politician legislates a theoretical 'mining tax' into an unpleasant reality?

    The only thing this would achieve in the long term is that everyone installs a Tesla Solar Roof with a 15 kWh PowerWall and says 'kiss my ass' to the power grid police, for good.

    A video card pricing model which involves arbitrary additional charges is not going to be sustainable.

    What's that got to do with the maximum memory bandwidth?
     
    Grall likes this.
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,814
    Location:
    Well within 3d
    My comment was related to the properties of AMD's coherent fabric implementation. The fabric itself is rather bound to the clocks and throughput of the memory controllers. For EPYC, no link exceeds the throughput of a memory controller's interfacing hardware, with certain links like xGMI dropping slightly below the 1:1 match of on-die link to MCM link bandwidth. It's possible there's some nice simplifying property to making sure a link's endpoint is able to service it.

    Any of the CPU products have the same or similar hardware in their client and professional SKUs. Significant amounts of their management hardware can autonomously override inputs for clocks, voltages, and can set feature levels at the factory based on fuses, or in the wild with microcode or firmware updates.
    DVFS for any of these products can willfully override outside commands in specific instances; that they haven't bothered in this way until now is not because the hardware cannot override things further.
    The introduction of new instructions and semantics for Spectre mitigation occurs through microcode updates to the x86 processors, via signed and encrypted blobs.
    AMD already offers, for specific paying clients, interfaces to its PSP for trusted computing and secure software running internally, and that shares infrastructure with DVFS and parts of the IO complex. For semicustom work and the fabric's ability to easily build new products, the fabric's control and data paths run through a secure domain.


    If they bought those cards, they'd take the savings and income from further mining and continue buying the gaming cards, unless demand craters. This enables them to buy up even more cards while robbing the gaming market of supply that could reduce prices--prices the miners are better positioned to pay if they do not fall.

    The paraphrasing from the first link goes:
    "AMD reports they are working closely with memory partners to solve the issue at hand and ramp up production, but also mentioned it is one of the most important factors for the company to achieve. Graphics memory is now getting available in better quantities it seems, and as such AMD will be ramping up GPU production."

    The supply constraint for DRAM is indicative of the overall demand spike for memory, and the batch ordering process for niche memories that usually applies to GDDR and probably HBM.
    Unlike commodity DRAM that had dedicated lines running continuously, the boutique DRAM types are usually made to order--which contributes to their lower volumes and higher cost per device.
    Working with DRAM partners means arranging for more orders, which in a constrained DRAM market means diverting capacity from the in-demand commodity types. In the case of HBM in particular, the DRAM makers' statements to investors indicate they can charge much more. Raising demand for said memory doesn't motivate Samsung and Hynix to charge less, which raises the cost of the final board.
    It's a somewhat better position for AMD than last time, when they were hit by a glut because someone else's memory production didn't stop them from overextending.

    It seems like it would be more effective to price higher the cards whose memory is several times more expensive.
    If there is an interest in getting some cards to gamers at an affordable price, then it might help if those who are willing to pay 2-3x markups subsidize that.
     
    #94 3dilettante, Feb 2, 2018
    Last edited: Feb 2, 2018
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,814
    Location:
    Well within 3d
    If that mattered significantly, the optimal setup would have fewer GPUs per motherboard. Miners are paying more than the cost of a motherboard+cheap CPU+power supply in the difference between MSRP and market price per GPU.

    This is mostly because many popular mining algorithms purposefully avoid scaling with system-level interconnect bandwidth. A mining workload that scaled with PCIe throughput would also scale with large multi-socket boards and clusters with non-consumer gear like InfiniBand and high-end switches. This is something many of the developers explicitly worked against, as it would allow those who could afford clusters with high-end interconnects to dominate.
    What happened with the first set of ASIC-resistant algorithms is that their creators missed the possibility of cramming a lot of GPUs into one rig. Later algorithms altered the bottleneck, but generally still do not reward systems with enterprise-grade connectivity.
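    A toy sketch of why such algorithms scale with local random-access bandwidth rather than interconnect bandwidth (illustrative only; this is not any real coin's algorithm): each step's address depends on the previous hash, so accesses can't be batched, prefetched, or farmed out over a high-latency link.

```python
# Toy memory-hard hash loop in the spirit of Ethash-style algorithms
# (illustrative only, not a real mining algorithm): each hash chases a
# serial chain of pseudo-random indices through a large dataset, so
# throughput is set by local random-access bandwidth, not by any
# cluster interconnect.
import hashlib

DATASET = bytes(range(256)) * 4096   # 1 MB stand-in for a multi-GB DAG

def toy_hash(nonce: int, rounds: int = 64) -> bytes:
    mix = hashlib.sha256(nonce.to_bytes(8, "little")).digest()
    for _ in range(rounds):
        # The next index depends on the previous mix, so accesses form a
        # serial dependency chain that defeats remote/high-latency memory.
        idx = int.from_bytes(mix[:4], "little") % (len(DATASET) - 32)
        mix = hashlib.sha256(mix + DATASET[idx:idx + 32]).digest()
    return mix

print(toy_hash(0).hex()[:16])
```

    Parallelism across nonces is still trivial, which is why cramming many cheap GPUs into one rig works while fast interconnects between them buy nothing.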

    HBCC has automatic tracking of memory access behavior and heuristics for data movement.
    The intelligent work distributor actively combines instances and schedules calls to avoid context rolls.
    DVFS evaluates hardware events and progress counters hundreds or thousands of times per second.
    The turbo functionality also needs to accumulate a rolling window of hardware events in order to know how long clocks can be pushed for time periods that approach human-relevant lengths.
    Things like Radeon chill, frame limiters, and the new VSYNC options can track progress and utilization at a game-frame level.
    The driver compiler literally pores through every line of the shader code being loaded, and there's dynamic evaluation of what can be cached, precompiled, or combined based on execution history.
    People were complaining about those thick DX11 drivers that got in the way of developers with all of their "managing the resource allocation of a whole application on their own" functionality.
    The behaviors of mining workloads are very inconsistent with gaming workloads. Because their developers actively target a systemic bottleneck to stay ASIC-resistant and cluster-unfriendly, there are some really non-standard elements to an optimal rig's setup, with an optimized rig being multiple times as effective per watt as a baseline install.
    If you don't plan on undervolting to the point of instability, downclocking the GPU while upclocking the RAM, running on a 2x PCIe slot, pushing full crypto throughput every cycle for months on end, with custom DRAM timings and non-gaming drivers, then a gaming card that quietly walls off options at those points won't affect you.
    Even something like throttling a super high rate of random no-graphics misses to DRAM after 12 hours would likely only hurt crypto, since games are nowhere near that consistent or unvarying in behavior.
    What is needed is the choice to be made in design for the system to have those options actively guarded or overridden, or in this case to apply the measures that exist for this purpose.

    High-demand situation with shortages for all vendors. If Nvidia cards are sold out, a miner can either sit and not buy mining equipment, or buy something that is not the most optimal. If they're paying massive markups for even modest cards, I think I know what choice they're making. If the market shifts to where people aren't buying GPUs at stupid prices, then it's likely not worth bothering to cater to them.
    (edit: I misunderstood your point. My response upon re-reading is that developers noticing slowness in some random corner case are just going to see one more of the hundreds of reasons they might see slowness. The GPU isn't going to throw a massive error blaming blockchains. For the throttling to materially affect hash rate, it doesn't need to be crippling for a game anyway. There's no real profit loss if a game drops by a percentage similar to the drops most newly launched games experience anyway. And since games get launch drivers and development effort/non-effort with Nvidia, what's the difference here?)

    I'm asking about the oligarchs now. How did they get early and cheap access to the state interests being sold off, which none of the common people waiting in bread lines had the opportunity to do?
    If you have physical assets, production, or capital, inflation means their price rises. Also, if sufficiently connected or wealthy, you can convert to an asset or currency not hyper-inflating--or just move.
    The GPU vendors are not the little people, and they are not their friends.

    The limit is a voluntary request for retailers, which the big miners likely don't have the time to mess with and might actually want in place.
    They can go around the channel, and these limits can hurt smaller miners that could become realistic competitors if they were able to sustain a higher rate of board purchases. Instead, the intermediate miners wind up losing some of their purchase rate to gamers or part-time miners.
    That means the contribution to the global hash rate for the cards not sold to the biggest miners is less.
    The small-time miners/gamers won't care, and the big miners have an even bigger fraction of the global hash rate.

    Free market says charge what the buyer is willing to pay. I don't follow why you think up-charging someone demonstrably willing to pay more is unthinkable in a capitalist system.
    On top of that, power distribution at higher levels requires more work and investment on the part of the utility, and more care for grid stability. They aren't going to size the power lines in a neighborhood for hundreds of kW or more 24/7, and past a certain point the support and safety considerations require far more due diligence and regulatory review--given the tendency for death, fire, and explosions at those energy levels.
    Past a certain threshold, getting past the up-front complications might give a lower rate, as the predictability and consistency of demand can be beneficial for the grid and for planning power generation.

    This is literally how everything has worked since almost the start of electrification. Almost nobody cares at the residential level, and utilities actively avoid the sorts of infrastructure necessary to deliver multiple times the normal residential hookup.
    As for one buyer that would need to stay on the grid: a miner whose rig exceeds the power delivery of a solar roof and the capacity of a PowerWall--which is one or two multi-GPU rigs in a detached residence with sufficient roof area, in the summer, if that.

    Then the miners don't buy the card, and they hope the next miner won't buy it either. The miners willing to accept a little less additional profit up front will pay, or gamers have a card they can buy as-is. With no crippled mining cards, a market correction will not leave cards sitting in warehouses that nobody will buy.

    Downclock the core clock and undervolt as much as possible, then upclock memory. Put as many cards on a board as possible, and use the power savings to put in as many rigs as possible before the limitations of the wiring or local hookup come into play.
    Maximum bandwidth per watt and volume equals maximum hash rate and profit per location, at least for the algorithms that chose to be limited by local device bandwidth.
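    The tuning logic above is simple arithmetic for a bandwidth-bound algorithm (all card numbers below are hypothetical round figures, not measurements):

```python
# Why miners downclock the core and upclock memory: for a bandwidth-bound
# algorithm, hash rate tracks memory bandwidth, so cutting core power at
# roughly constant bandwidth raises hashes per joule. Card figures below
# are hypothetical illustrations, not benchmarks.
def hashes_per_joule(bandwidth_gbs, board_power_w, bytes_per_hash=8192):
    hash_rate = bandwidth_gbs * 1e9 / bytes_per_hash   # hashes per second
    return hash_rate / board_power_w

stock = hashes_per_joule(484, 290)   # hypothetical stock settings
tuned = hashes_per_joule(500, 180)   # hypothetical undervolt + memory OC
print(f"{tuned / stock:.2f}x hashes per joule after tuning")
```

    More hashes per joule means more cards per wiring limit, which is exactly the "maximum hash rate per location" outcome described.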
     
    #95 3dilettante, Feb 3, 2018
    Last edited: Feb 3, 2018
  16. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Bound, or just operating that way? It wouldn't seem difficult to make it go faster, but it would burn more energy. There were some multipliers, as I recall, for debugging.
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,814
    Location:
    Well within 3d
    From the testing on Ryzen, the FCLK and DFICLK values are fixed, and are half the MEMCLK.
    GMI is 4x FCLK.

    The controller itself apparently has a debug 1:1 rate relative to the memory speed, but it doesn't appear to be functional or really supported, since setting it causes severe instability.
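    Taking the description above at face value, the ratios reduce to trivial arithmetic ('MEMCLK' here is whatever that measurement reported; only the ratios are the point):

```python
# Clock ratios as described in the post above:
# FCLK = DFICLK = MEMCLK / 2, and GMI runs at 4x FCLK
# (so GMI lands at 2x MEMCLK).
def fabric_clocks(memclk: float) -> dict:
    fclk = dficlk = memclk / 2
    gmi = 4 * fclk
    return {"FCLK": fclk, "DFICLK": dficlk, "GMI": gmi}

print(fabric_clocks(1600))   # {'FCLK': 800.0, 'DFICLK': 800.0, 'GMI': 3200.0}
```

    The debug 1:1 setting mentioned above would double FCLK relative to this table, which fits with it destabilizing a fabric sized for the halved rate.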
     
    #97 3dilettante, Feb 4, 2018
    Last edited: Feb 4, 2018
    Lightman likes this.
  18. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    15,837
    Likes Received:
    4,799
    Yup, year over year numbers are very good. The important part now for AMD is if they can solidify their recovery and move forward. After years of Intel not taking them seriously, they will once again become a focus for Intel competitively in the CPU scene. And in the GPU scene, they are still mostly behind NV.

    Hopefully, they'll continue to execute well on the CPU front and continue to improve on the GPU front.

    NV and Intel not having competition for so long isn't good for anyone (god I really REALLY hate NV's consumer drivers). Things will hopefully get interesting in the next few years.

    Regards,
    SB
     
    Alexko likes this.
  19. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,485
    Likes Received:
    897
    They sure have their work cut out for them on the graphics front, but perhaps they can pull it off. If so, the company should eventually be very successful.
     
  20. DrYesterday

    Newcomer

    Joined:
    Jan 8, 2013
    Messages:
    32
    Likes Received:
    18