The AMD Execution Thread [2018]

Discussion in 'Graphics and Semiconductor Industry' started by A1xLLcqAgt0qc2RyMz0y, Jan 8, 2018.

Thread Status:
Not open for further replies.
  1. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    15,837
    Likes Received:
    4,799
    This I like, although they should limit it to 2-4 cards in the drivers. Consumer cards don't need to go higher than that. Beyond that many cards, it's reasonable to offer support only on educational, professional or mining cards via professional or mining-oriented drivers.

    No need to muck around with EULAs, asking retailers to limit sales, etc.

    Granted, a miner could still just build multiple systems, but that drives up the cost significantly if they need a motherboard, CPU, and memory for every 2-4 cards.

    Although I'm not sure whether you could sidestep this via virtualization?

    Regards,
    SB
     
  2. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,789
    Likes Received:
    2,049
    Location:
    Germany
    I mentioned possibilities.
     
  3. Arnold Beckenbauer

    Veteran

    Joined:
    Oct 11, 2006
    Messages:
    1,408
    Likes Received:
    347
    Location:
    Germany
    https://www.amd.com/en-us/press-rel...industry-leaders-2018jan23.aspx?sf180055228=1

    https://www.anandtech.com/show/12363/amd-reassembles-rtg-hires-new-leadership

     
    #63 Arnold Beckenbauer, Jan 23, 2018
    Last edited: Jan 23, 2018
    Grall and Lightman like this.
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,814
    Location:
    Well within 3d
    The hires seem to be bringing in, or back, some decent pedigrees. Splitting the management and engineering responsibilities would go towards allaying concerns that managing both for RTG may have been a factor in some of the inconsistencies in RTG's execution.
     
    CSI PC and Lightman like this.
  5. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    637
    Likes Received:
    477
    Location:
    55°38′33″ N, 37°28′37″ E
    It's just that current HPC products are built from networked nodes that use off-the-shelf technology, where each blade has 1-2 central processors and 2-4 PCIe accelerator cards (Tianhe-2).

    On-die interconnects with a sufficient number of high-speed links have the potential for much better bandwidth and lower latency compared to existing inter-socket links designed for DDR3/DDR4 memory controllers.

    6-link NVLink 2.0 should offer 150+150 GB/s bandwidth, and 6-link Infinity Fabric should offer up to 256+256 GB/s - that's on the scale of GDDR5 and HBM/HBM2.

    Unfortunately I can't find reliable sources for current EPYC implementation details or multi-die configuration plans for Navi.

    Until there is a reference implementation running on existing HPC processor nodes, we won't know for sure; unfortunately the required test bed - such as the above-mentioned Xeon 'Purley' with Xeon Phi 200 coprocessors for LGA 3647, or POWER9 with Nvidia Volta GPUs - would currently run tens of thousands of US dollars for even the simplest hardware configuration...

    Of course power efficiency also comes into question - do you have specific reasons to believe that NVLink2 would be superior to Infinity Fabric in this regard?

    That's because current mining software runs its own computing task on each GPU - if given proper NUMA node support in OpenCL 2.x or Vulkan/OpenCL, who knows what kind of optimizations this would allow?

    Higher PCIe lane count could also be a welcome addition for general HPC tasks and/or possible socket TR4/SP3 co-processor products that would use CAPI/Gen-Z/CCIX protocols.
     
  6. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    637
    Likes Received:
    477
    Location:
    55°38′33″ N, 37°28′37″ E
    I'm just making an obvious observation from an economic point of view - if your GPU does not break while working 24/7, you don't have to throw it out and buy a new one. No competitor offers better reliability, simply because consumer cards are not really designed to run 24/7.

    I grew up in a planned economy where all consumer goods were in permanent shortage. In the end, all restrictions implemented on the promise to relieve shortages only make matters worse - the prices rise even higher, criminal elements are introduced, then police raids on unsuspecting buyers suddenly become a reality.

    The costs of implementing such heuristic algorithms for dynamic workloads right in the video driver would be enormous, to say the least.

    I thought I was only joking... from another thread:
    In general, DRM solutions create many more problems for legitimate users than for attackers, who can usually break them in a reasonable time...

    Someone would promptly hack the drivers to bypass this restriction.
     
    Kyyla likes this.
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,814
    Location:
    Well within 3d
    It's not clear which workloads would consider this significant.
    Workloads similar to Ethereum are primarily concerned with the local DRAM bandwidth, which the immediately adjacent silicon can already satisfy. A fast MCM interconnect would be joining multiple chips that apparently don't need to talk to each other.

    An equihash-like algorithm attempts to use a computationally expensive function to fill local memory, but if the local silicon cannot saturate its local bandwidth, its neighboring chips in the MCM cannot satisfy their local bandwidth. They'd presumably have more productive use of their resources doing their best locally. The capacity-bound element of that algorithm would also mean that making the chips work together means focusing on one chip's memory pool, but the symmetrical distribution of memory and GPUs would mean hurting scaling by neglecting some fraction of the MCM's memory capacity.

    The big Nvidia GPUs with NVLink also atypically max out the power budget and for various reasons including link power are more variable in the throughput given in their marketing.
    For the Infinity Fabric numbers, are you indicating that 6 links give 256 GB/s in both directions? For EPYC, they do not. The inter-socket link uses a 16x PCIe connection running at 10.6 Gbps. That's ~21 GB/s in each direction with ~10% penalty due to the protocol needs. The MCM links are the same bandwidth, just half as fast and twice as wide.
    That's 126+126 GB/s for the whole package with 6 links, 50% more in aggregate than DDR4 DRAM bandwidth.
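
    The link arithmetic above can be re-checked with a short sketch. The lane count, lane rate, protocol penalty and link count are the figures quoted in this post; the DDR4 side (8 channels of DDR4-2666) is an assumption made only for the comparison, not something stated here.

```python
# Back-of-the-envelope check of the EPYC link figures quoted in the post.
# Inputs marked "from the post" are taken as given; the DDR4 configuration
# is an assumption used only for the aggregate comparison.

LANES_PER_LINK = 16        # inter-socket link width, from the post
GBPS_PER_LANE = 10.6       # per-lane rate in Gbit/s, from the post
PROTOCOL_PENALTY = 0.10    # ~10% protocol overhead, from the post
LINKS_PER_PACKAGE = 6      # links in the package, from the post

# Assumption: 8 channels of DDR4-2666, each 8 bytes wide.
DDR4_CHANNELS = 8
DDR4_MEGATRANSFERS = 2666
DDR4_BYTES_PER_TRANSFER = 8


def link_gbytes_per_s(lanes: int, gbps_per_lane: float) -> float:
    """Raw one-direction bandwidth of one link in GB/s (8 bits per byte)."""
    return lanes * gbps_per_lane / 8.0


raw_link = link_gbytes_per_s(LANES_PER_LINK, GBPS_PER_LANE)
usable_link = raw_link * (1.0 - PROTOCOL_PENALTY)

# The post rounds one link to ~21 GB/s before multiplying by 6.
package_one_way = LINKS_PER_PACKAGE * round(raw_link)
package_aggregate = 2 * package_one_way

ddr4_total = DDR4_CHANNELS * DDR4_MEGATRANSFERS * DDR4_BYTES_PER_TRANSFER / 1000.0

print(f"one link, one direction: {raw_link:.1f} GB/s raw, "
      f"{usable_link:.1f} GB/s after protocol penalty")
print(f"package: {package_one_way}+{package_one_way} GB/s")
print(f"aggregate links vs assumed DDR4: {package_aggregate / ddr4_total:.2f}x")
```

    Under the assumed DDR4 configuration, the ratio comes out close to the "50% more in aggregate" figure, and one link in one direction lands near one assumed DDR4 channel.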

    The way EPYC does this, one link's bandwidth in one direction is equal to the bandwidth of one DRAM controller. I'm not sure how AMD currently decides to map HBM stack channels to controllers, or how many stacks this theoretical MCM GPU has per chip.
    As it stands, one HBM2 stack on its own requires double the one-direction link bandwidth of a whole EPYC MCM, and there would be 4 of them in an EPYC-like implementation. 48 links is a tight fit and ~30W for peak MCM link power. Not fully connecting could drop the 50% over-provisioning and get it to ~20W.
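
    A rough sketch of how the link count and power figures above fit together. The 256 GB/s per HBM2 stack is an assumption (a 1024-bit stack at 2.0 Gbps per pin); the stack count, the ~30W peak and the 50% over-provisioning are the post's numbers, and the per-link bandwidth is derived from the 16-lane, 10.6 Gbps figure quoted earlier in the post.

```python
# Sketch of the HBM2 link-count estimate, under stated assumptions.

LINK_GBYTES_PER_S = 16 * 10.6 / 8.0   # one link, one direction (~21.2 GB/s)
EPYC_PACKAGE_ONE_WAY = 126            # GB/s for 6 links, from the post
HBM2_STACK_GBYTES_PER_S = 256         # ASSUMPTION: 1024 bits x 2.0 Gbps / 8
STACKS = 4                            # from the post
PEAK_LINK_POWER_W = 30.0              # from the post, for the full link set
OVERPROVISION_FACTOR = 1.5            # "50% over-provisioning", from the post


def links_per_stack(stack_bw: float, link_bw: float) -> int:
    """Approximate number of one-direction links matching one stack's bandwidth."""
    return round(stack_bw / link_bw)


per_stack = links_per_stack(HBM2_STACK_GBYTES_PER_S, LINK_GBYTES_PER_S)
total_links = STACKS * per_stack

# How one stack compares with the whole EPYC package in one direction.
stack_vs_package = HBM2_STACK_GBYTES_PER_S / EPYC_PACKAGE_ONE_WAY

# Dropping the over-provisioning scales the peak link power down accordingly.
reduced_power_w = PEAK_LINK_POWER_W / OVERPROVISION_FACTOR

print(f"links per stack: {per_stack}, total links: {total_links}")
print(f"one stack vs EPYC package (one direction): {stack_vs_package:.2f}x")
print(f"link power without over-provisioning: ~{reduced_power_w:.0f} W")
```

    With these inputs the sketch is consistent with the 48-link and ~20W figures; a different assumed HBM2 pin rate would shift the link count.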

    xGMI and GMI figures derived from slides in the following:
    https://www.anandtech.com/show/1155...7000-series-cpus-launched-and-epyc-analysis/2

    All that aside, it's not clear which algorithms need this, since many seem to get along fine with PCIe 2.0 at x2/x4 each card.

    I'm not sure it would take more than a limited implementation for dominant mining targets, and NVLink2 wouldn't be what Nvidia has projected for any MCM GPU.


    From the GPU provider's economic standpoint, if reliability has no material impact on competitiveness, surviving past warranty is the extent of AMD's obligation unless the miner wants to pay more. The market is currently overheated to the point that such considerations are being treated as secondary, since miners are scraping the bottom of the barrel in terms of less-efficient architectures or mid-low end cards.

    My understanding is that the central planners and their cronies often did well for themselves. Who would the GPU vendors and the large mining interests be in this analogy?
    Also given common use cases and behaviors in the crypto market, isn't it a bit late to worry about the criminal element?

    I'm not sure why. The kernels being run are a limited set with very specific instruction types, no graphics usage, odd memory patterns, peculiar card settings, and unconventional system setup.
    The driver would be able to scan the shader code when it's loaded, and it can tell what subset of the GPU the code uses. The performance counters used for the profiling tools are accessible to the driver as well, and it can be aware of a half-dozen cards running similar or the same workload, linked to a dinky CPU with PCIe 2.0 at 4x or 2x width.

    That depends on the definition of legitimate user and reasonable time in this market.
    Small-time buyers are dabblers or pump and dump suckers.
    Large-scale buyers charter dedicated flights to save them days/hours in time, and as multi-million dollar concerns they are worthwhile targets for litigation--or AMD can sell cards to the datacenter that abides by its agreements.
    The key setup for the PSP can be made complex enough that a non-trivial amount of time can be lost trying to override each card or setup, and the goal is to install and start running these things by the hundreds as soon as possible with minimal overhead.
    They could just pay up front for the version that lets them do it.
     
    Sxotty and pharma like this.
  8. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,485
    Likes Received:
    897
    "[N]ot that Raja had full overview over RTG’s business operations", @Ryan Smith is apparently being told.

    I'm curious to see AMD's Q4 results (January 30). You'd expect very good results for the RTG, given the crypto craze, but I fear the company may have failed to anticipate it and adjust production, so it's possible they're barely selling any more cards than usual—but just enough to lead to constant shortages.
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,814
    Location:
    Well within 3d
    It's also reported that they are rolling semi-custom under that umbrella, which may make the duties of the RTG leadership even less comparable.
    I'm not sure what the implications of that are just yet, though I'm guessing they won't shift revenue numbers around for now.
    The claim that RTG's budget is increasing would be trivially true if it pulled in semi-custom, and a fair amount of AMD's organizational bandwidth for pumping out designs did go into semi-custom products that all had GPU IP.
    Does this mean that semicustom is expected to be GPU+other at this point?

    Crypto is still an odd case for trying to time it. Bitcoin's massive rise doesn't directly have bearing on GPUs, but it seems like its place as the benchmark helps encourage a level of correlation anyway.
    Memory prices could discourage producing many cards as well, if they are still linked to the old pricing structure. If the AIBs or wholesalers inhale the large price increases, producing more may not net AMD so much without additional measures.
     
  10. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    All the engineers being hired for RTG would suggest the budget was significantly increased irrespective of semi-custom.

    Not necessarily, as an enterprise customer would be more predictable and would adapt their chosen algorithm to whatever hardware they chose. It's not current but future coins/chains that would be the target. Institutional miners won't be targeting existing coins, but blockchains of their own making and design in partnership with other companies or organizations. Video content distribution, for example, could result in the major networks (Hulu, Netflix, Disney, Amazon, etc.) rolling their own chain to determine rights across the market. Game distribution could do the same, so an Xbox title could be transferable to PlayStation or PC. The problem with current coins is that they need to be adopted for something. They are effectively in-game currencies for some sort of market trading simulator right now.
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,814
    Location:
    Well within 3d
    I don't have a handy reference to AMD's headcount to know how significant any such hirings are. If the numbers are significantly above replacement and other teams are holding steady, it may be consistent with a budget increase.

    I'm not sure what the definition of an institutional miner would be in this instance. If it's a company or some other group that is formulating its own block chain for some kind of product or internal use, the first decision may likely be to dispense with proof of work. Internally directed block chains are not driven to make the block chain difficult to hijack by blowing out the computational budget in a trustless environment. An institution would be its own source of trust or have a more reasonable outlay for computational load. Additionally, an institution is less likely to want to have its cost of doing business increased by or profit siphoned by anonymous third parties.

    The mining that is influencing the GPU market is trying to latch onto popular coins or trying to pump one coin or another into getting the attention of miners (or fresh buyers on an exchange). The ASIC-friendly coins like Bitcoin in particular show signs of significant control from established large mining pools, though they are long out of the scope of GPU hardware.

    The major networks already control their own infrastructure, protection methods, and ledgers. A blockchain could be used as an element of a distributed ledger or as a straightforward method of assigning/validating rights to items, but nobody but those networks needs to have a say in validating or tracking most of the ledgers. A much lower amount of computation would be needed to hash through blocks, and few external parties would need to hold much of the ledger.
    Other elements of blockchains that are used to create scarcity or avoid inflation are meaningless in this instance. Disney doesn't want an algorithmic limit to how many videos it can sell for all time, and would want the ability to negate or replace wallets or the equivalent nearly at will.
    They could likely fall back in computational load to standard servers or some form of ASIC in a few locations, as they don't need to care how fair their non-mining ledger is to miners they don't have.
     
  12. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    112
    Likes Received:
    129
    They are spending nearly 50% more on R&D now than at the minimum, so of course the headcount should have gone up a lot for every part of the company. They're now at the level of 2013. The problem is that Nvidia's R&D is still 50% higher, and over the same period Nvidia also increased theirs by 50%.
    With Navi I expect AMD to have a full lineup from the start again, but it will still be a tough fight.
     
  13. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    492
    Likes Received:
    212
    I doubt that will be the case.
     
  14. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    637
    Likes Received:
    477
    Location:
    55°38′33″ N, 37°28′37″ E
    Such bandwidth-sensitive workloads could be optimized by the driver (or managed explicitly by the programmer through API extensions) to use each die as a separate node with a local memory pool.

    At the time of the Infinity Fabric announcement, news sites were citing 512 GB/s of bandwidth in graphics applications (or directly in the Vega GPU). Unfortunately I cannot find any references to original AMD materials in the white noise of reposts and retweets...

    Compute-bound workloads?

    It's only because miners are anticipating higher profits - top-end Vega cards earn about US$100-200 per month, so if a card fails, at current prices, a replacement card is worth a full year's profit.
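
    The payback arithmetic behind that statement, as a minimal sketch. The monthly profit range is the post's figure; the function names are made up for illustration, and the implied price range is simply twelve months of that profit, not a quoted retail price.

```python
# Payback arithmetic for a mining card, using only the post's profit range.

MONTHLY_PROFIT_USD = (100, 200)   # from the post


def price_equal_to_profit(monthly_profit_usd: float, months: int = 12) -> float:
    """Card price at which a replacement costs `months` of mining profit."""
    return monthly_profit_usd * months


def months_to_pay_back(card_price_usd: float, monthly_profit_usd: float) -> float:
    """Months of mining needed to recover the purchase price."""
    return card_price_usd / monthly_profit_usd


implied_prices = [price_equal_to_profit(p) for p in MONTHLY_PROFIT_USD]
print(f"a year's profit corresponds to a card price of "
      f"${implied_prices[0]:.0f}-${implied_prices[1]:.0f}")
```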

    Interpreting performance counters takes a qualified graphics programmer running graphics debugging tools in the Visual Studio IDE with full access to the source code. I'd like to see how well the video driver would fare in this task. Knowing common access patterns is not the same as identifying mining applications with 100% confidence. Any such detection heuristics in the driver will just give rise to obfuscation code pretending to perform genuine graphics rendering.

    If it works - the problem is, intricate DRM systems rarely work as intended; the SimCity fiasco should teach us a few lessons.

    No. The Soviet planned economy was essentially the economy of shortages by design, because it could not effectively react to changes in supply and/or demand by adjusting either price or production volume, like a free market economy does. It wasn't only consumer goods, the same happened with heavy equipment, transportation, housing, you name it.

    Every enterprise had a long-term government-approved production plan, and all prices were fixed by the government. It was genuinely the best intention of state planners to balance the system, but since they were precluded by communist ideology from using 'capitalist' methods, their complex and well-thought-out measures were unable to resolve the shortage problem for decades.

    And then, the day the government withdrew from price regulation became the same day when all shortages ended, once and for all.


    I see quite a few similarities here. First, GPU vendors (or maybe video card vendors, as per the Gamers Nexus article) are unwilling to increase production to satisfy the increased demand, fearing an end to the mining craze (though so far no such prediction has materialized) that might leave them with stocks of unsold cards (or maybe they are making the most of the highest-ever video card prices while they can). On the other hand, they cannot let prices go much higher, or the whole PC gaming market would collapse in a few years - not only software studios, but hardware makers as well (just like the Apple Mac gaming market, which does not even exist) - and at the same time they cannot let miners consume all available video cards for much longer, for the same reason.

    So instead of going with the obvious solution of applying the laws of the free market, they propose a kind of 'video card rationing' system, which will certainly give rise to a wide-reaching black market. Isn't it wise?
    There soon could be hordes of black market dealers collecting stolen personal data, forging identity documents and making fake bank accounts - all to make illegitimate orders from non-existent customers. Involve the police and there is potential for this to become a new war on drugs.


    That said...

    AMD does not produce video cards, OEMs do. The Gamers Nexus article below could explain why AMD may have decided to skip 12nm Vega parts.

    https://www.gamersnexus.net/industry/3211-what-do-manufacturers-think-of-mining-and-gpu-prices

    If OEMs are so concerned about oversupply, and summer is traditionally a time for introducing refresh products, skipping a new product generation altogether gives OEMs confidence to order higher production volumes of existing designs - so even if the demand from miners falls within a few months, at least sales will not come to a complete stall in anticipation of incoming new-generation cards. Makes sense to me...
     
    #74 DmitryKo, Jan 25, 2018
    Last edited: Jan 31, 2018
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,814
    Location:
    Well within 3d
    At that point, why invest in a massive amount of inter-die communication for a mining-targeted product?
    AMD could provide a less intensive level of connectivity between the dies, and unless there's a class of mining algorithms that is widely used and profitable AMD could pocket the implementation savings with little revenue lost.

    I believe Raja Koduri stated that the mesh's bandwidth matched the memory controller bandwidth. The implementations so far have individual link bandwidth in one direction that matches one channel's bandwidth. EPYC over-provisions its MCM bandwidth in order to have a fully connected setup.

    This presumes for mining that there's a compute-bound workload that is sensitive to whether there is one GPU of size X versus 2 of size X/2. I'm asking if there's a class of mining algorithms with that limitation. Some of the most notable ASIC-resistant algorithms were constructed in part to not scale as much with resources that might favor a wealthy miner disproportionately or allow concentration of hash rate.

    A part of the underlying infrastructure for those counters is what allows DVFS to not melt the chip or engage turbo, since AMD's method uses progress measurement, estimated power cost per event, and utilization.
    Further, part of my proposal was the introduction of specific functions that boost mining efficiency that would be very trackable.

    If the heuristic is, for example, "consumer SKUs must use 20% of hardware events for fixed-function graphics hardware events before duty cycling, and mining ops are 1/16 rate", a miner would spoof it by leaving 20% of the GPU's effectiveness running a fake renderer and not using mining operations. That, or they could pay more up-front and still make money.
    Additional targeted restrictions could be checks on workloads not using the graphics pipeline in systems with more than 2-4 cards, and maybe a check for the negotiated PCIe link width being 4x or below. Special operations or the desired low voltages and clocks could be overridden by DVFS, which is already able to override settings at its discretion.
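
    To make the example heuristic concrete, here is a minimal sketch of the checks described above. Everything in it is hypothetical: the `WorkloadSample` fields, the check names and the idea of returning a list of triggered checks are illustrative only and do not correspond to any real driver interface. The thresholds (20% graphics events, more than 2-4 cards, link width of 4x or below) are the ones from the example.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the example in the post.
MIN_GRAPHICS_SHARE = 0.20   # share of hardware events from fixed-function graphics
MAX_CONSUMER_CARDS = 4      # upper end of the "2-4 cards" range
NARROW_LINK_WIDTH = 4       # negotiated PCIe width of 4x or below


@dataclass
class WorkloadSample:
    """Hypothetical snapshot of what a driver might observe (illustrative only)."""
    graphics_events: int
    total_events: int
    uses_graphics_pipeline: bool
    cards_in_system: int
    pcie_link_width: int


def graphics_share(sample: WorkloadSample) -> float:
    """Fraction of hardware events attributed to fixed-function graphics."""
    if sample.total_events == 0:
        return 0.0
    return sample.graphics_events / sample.total_events


def triggered_checks(sample: WorkloadSample) -> List[str]:
    """Return the names of the example checks that this sample trips."""
    checks = []
    if graphics_share(sample) < MIN_GRAPHICS_SHARE:
        checks.append("low_graphics_share")
    if not sample.uses_graphics_pipeline and sample.cards_in_system > MAX_CONSUMER_CARDS:
        checks.append("compute_only_multi_card")
    if sample.pcie_link_width <= NARROW_LINK_WIDTH:
        checks.append("narrow_pcie_link")
    return checks


gaming_pc = WorkloadSample(60, 100, True, 1, 16)
mining_rig = WorkloadSample(0, 100, False, 6, 1)
spoofing_rig = WorkloadSample(20, 100, True, 6, 1)  # runs a fake renderer

for name, sample in [("gaming", gaming_pc), ("mining", mining_rig),
                     ("spoofing", spoofing_rig)]:
    print(name, triggered_checks(sample))
```

    The spoofing rig in the sketch clears the graphics-share check only by spending 20% of its events on a fake renderer, which is the trade-off described above; how a real driver would combine or act on such checks is left open here.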

    It's not that one couldn't mine, just that someone trying to optimize a mining rig or mining data center for max profit while paying retail price gets only what they pay for.

    This isn't DRM. I was not proposing that they prevent mining so much as providing a trade-off in terms of up-front cost for a mining SKU or unlock, versus a reduced hash rate that would still eventually pay for itself under current miner logic. This is market segmentation.

    I was talking about how the members of the Party or the power structure did well for themselves, not well for the nation at large.
    I admit I'm not an expert on the transition, but I thought a fair number of power brokers under the Soviet Union wound up the opposite of paupers after its dissolution. I didn't think those named Yeltsin or Putin, their allies, or the friends and family of managers of state-run enterprises that bought said enterprises at a discount wound up sleeping under a bridge.
    Granted, any number of that pool did wind up mysteriously dead or imprisoned, but that's not an economic mechanism.
    AMD isn't the person standing in the bread line.

    The "laws of the free market" in this case amount to charging whatever the market will bear, which in the case of miners is a lot.
    The manufacturers are also the supply, which is the ultimate constraint on any type of market.

    There are already hordes of people like that, they're one of the few classes of individuals using crypto-coins as they were intended. Besides, if it were that easy to do this at scale, why not still make fake accounts so you could make fraudulent purchases and get all the cards for free?

    As far as drugs go (again, one of the few cases for there to be a non-zero valuation for most cryptocurrency), a package of coke doesn't broadcast itself to the whole world every couple seconds and log itself into a permanent record for the world to see, while hooked into a building with a utility-scale power contract with a thermal signature that could probably be seen from space.
    Besides, in this case, AMD's the cartel and they can add conditions on the disposition of their product as much as they like.

    The polite request that retailers constrain purchase quantity doesn't affect gamers that mine on the side, or more casual miners.
    Mid-level buyers that start renovating rooms or houses might get hurt, and it costs money, time, and possibly freedom to wholly get around this.
    Big buyers are bypassing retailers and possibly part of the wholesale market, and as such are likely getting a bit more money back up the food chain to the AIBs and possibly AMD or Nvidia (Nvidia's direct model could make this easier). As such, the big players wind up padding out margins for a board maker or GPU vendor, and it hurts the datacenters' smaller competitors.
     
    Sxotty likes this.
  16. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    I've seen hundreds of new positions posted that weren't there before. Lots of layout and senior spots to accommodate more board designs and teams, from the looks of it.

    Bank, government, or some sort of corporate partnership of competitors using the chain directly.

    Problem being they control their own networks with limited sharing options or exclusive rights. With future streaming services, content rights could benefit from the flexibility. Move the rights into a shared, distributed market where one company could fail without costing consumers access to their content. Credit expended in distribution.
     
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,108
    Likes Received:
    1,802
    Location:
    Finland
  18. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,485
    Likes Received:
    897
    I think that's just normal seasonality. But 2017 is a lot better than 2016, and that's the most important thing.
     
    #78 Alexko, Jan 31, 2018
    Last edited: Jan 31, 2018
  19. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,031
    Likes Received:
    1,605
    Location:
    Winfield, IN USA
    Are we speculating or did you mistype "2016" as "2018", because I think 2018 is gonna be a LOT better than 2017.... :p
     
    Lightman, Alexko and Grall like this.
  20. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,286
    Likes Received:
    385
    Location:
    Australia
    It's going to be interesting to see how much headroom they can get on "12nm". The problem with Ryzen in the DIY space isn't the clock speed, it's the damn wall it hits. Even if power goes up quite a lot in the mid-4 GHz range, if they can stop that wall effect so that things like AIO/water coolers make a real difference, I think we will see Ryzen 2 being very popular. If the wall just moves a few hundred MHz, I expect it to just continue along its current sales trajectory.

    I think EPYC will sell very well, and I think the APU will do very well as well. AMD will be hoping that the mining boom keeps up because it's making their GPU deficit irrelevant from a revenue point of view.

    All in all, if AMD has the manufacturing capacity in 2018, I think AMD should see billions added to the revenue number vs 2017 (~7 vs 5).
     
    digitalwanderer likes this.
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.