AMD Execution Thread [2024]

It’s literally the first two paragraphs. Can you stop pretending you don’t understand what I’m referring to?
How exactly is it the 'opposite' if ROCm only works on specific AMD HW (CPU/GPU/FPGAs) combinations with Broadcom (collaborative partner) PCIe switches?
 
How exactly is it the 'opposite' if ROCm only works on specific AMD HW (CPU/GPU/FPGAs) combinations
She laid down an AMD AI landscape that is polar opposite to Nvidia’s proprietary approach.

In her view, customers have a choice: choose a dystopian Nvidia world in which the company owns the assets, or select AMD’s world, where you can select your partners, hardware, technologies, and AI tools.

Isn't that a reference to Nvidia's current 'hardware subscription' model, where Nvidia is installing their own DGX-series supercomputers at major data centers to charge customers a $4500 yearly usage fee per GPU?

 
Isn't that a reference to Nvidia's current 'hardware subscription' model, where Nvidia is installing their own DGX-series supercomputers at major data centers to charge customers a $4500 yearly usage fee per GPU?
Could be. Nvidia adopted the Oracle-style subscription model (common among database companies, etc.) for companies choosing to host their AI Enterprise product suite remotely at major data centers instead of locally on their corporate computers. Since Nvidia and Oracle have a partnership, I would guess the subscription is a carryover from Oracle's per-CPU usage fee. Outside the database subscription, Oracle's product suite also includes a wide range of applications for federal, healthcare, manufacturing, etc. that run on their database. Companies choosing DIY applications usually become overwhelmed by the amount of annual updates required to maintain a reliable, functioning system.

All of the large consulting companies (Arthur Andersen, Booz Allen Hamilton, etc.) also have subscription models for their Enterprise product suites. The amount of annual updates required by these products (especially the latest government and federal changes affecting many industries) is well beyond most companies' ability to maintain on an annual basis, and that's the primary reason clients choose subscription-based data center applications. Unfortunately, consulting companies are also getting on the AI bandwagon and now offering AI enterprise suites.
 
ROCm only works on specific AMD HW (CPU/GPU/FPGAs) combinations

Well, HIP is not limited to AMD hardware; the runtime can also run on NVidia hardware conforming to CUDA 6.0 or later. So if you program in CUDA directly, you can convert your code to HIP and it will still run on NVidia hardware (though HIP is a subset of CUDA, so you will probably lose some functionality).
You will also need to port your code to the hip* versions of the common math libraries (BLAS, RAND, SOLVER, SPARSE), but these will also run on NVidia hardware by redirecting to the standard cu* versions.
So at least it's more portable compared to CUDA, which is firmly locked to NVidia's own hardware.
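To make that concrete, here's a minimal sketch (my own illustration, not anything from AMD's docs) of what the portability claim means in practice: the same HIP source compiles with hipcc against ROCm on AMD hardware, and against the CUDA toolkit on NVidia hardware, where each hip* call forwards to its cu* counterpart.

```
// vecadd_hip.cpp -- minimal, hypothetical sketch of a portable HIP kernel.
// Built with hipcc on an AMD box it targets ROCm; on a machine with the
// CUDA toolkit, hipcc routes the same hip* calls to their cu* equivalents.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // same index math as CUDA
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);
    float *da, *db, *dc;

    // hipMalloc/hipMemcpy map 1:1 onto cudaMalloc/cudaMemcpy on NVidia.
    hipMalloc((void**)&da, n * sizeof(float));
    hipMalloc((void**)&db, n * sizeof(float));
    hipMalloc((void**)&dc, n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // CUDA-style triple-chevron launch syntax also works in HIP.
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);  // expect 3.0

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```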

And if you're building your ML models with high-level frameworks like ONNX, PyTorch and TensorFlow, these days you should be able to run them on either CUDA or HIP/ROCm hardware (unless you specifically require a recent version that hasn't been ported to ROCm yet).



Nvidia adopted the Oracle-style subscription model (common among database companies, etc.) for companies choosing to host their products remotely at major data centers
Companies couldn't even build their local HPC clusters lately because of NVidia's GPU shortages, as shipping dates would slip by several months. That's why NVidia's customers were forced to either turn to their hardware subscription model or consider alternative platforms.


BTW Business Insider published an interview with AMD's head of AI in July, and it does put a similar emphasis on ROCm as an 'open-source' alternative to NVidia's ecosystem - but of course you should treat it as a marketing pitch rather than a technical statement. The intention was obviously to take advantage of GPU shortages and make people in charge reconsider AMD platforms, not provide fine technical details to engineers maintaining actual software...

 
How exactly is it the 'opposite' if ROCm only works on specific AMD HW (CPU/GPU/FPGAs) combinations with Broadcom (collaborative partner) PCIe switches?

That’s a question for the article’s author.

Isn't that a reference to Nvidia's current 'hardware subscription' model, where Nvidia is installing their own DGX-series supercomputers at major data centers to charge customers a $4500 yearly usage fee per GPU?

On the last earnings call Nvidia downplayed the scope of its involvement in systems integration. They claim their business is all about providing technology at every level of the stack and leaving integration up to partner ODMs. I doubt very much that “Nvidia’s model” here refers to fully integrated solutions that they provide, since it’s supposedly not a large part of their business.

The part that’s opposite to what AMD is proposing is that Nvidia provides nearly all of the tech required up and down the stack. They’re a one stop shop and don’t need to negotiate interfaces or timelines with anyone else (aside from their own component suppliers of course).
 
Well, HIP is not limited to AMD hardware; the runtime can also run on NVidia hardware conforming to CUDA 6.0 or later. So if you program in CUDA directly, you can convert your code to HIP and it will still run on NVidia hardware (though HIP is a subset of CUDA, so you will probably lose some functionality).
You will also need to port your code to the hip* versions of the common math libraries (BLAS, RAND, SOLVER, SPARSE), but these will also run on NVidia hardware by redirecting to the standard cu* versions.
So at least it's more portable compared to CUDA, which is firmly locked to NVidia's own hardware.
You can auto-generate certain subsets of CUDA-compatible code from HIP code, or vice versa, but that doesn't ultimately change the fact that you as a developer have to do QA testing for each of these platforms (CUDA/ROCm) on a separate basis, because they have incompatible runtimes/compilers (ROCr/HIP-CLANG vs CUDA/NVCC) and different programming features/paradigms (inline assembly vs intermediate representation, wave64 vs wave32, more SIMD functions vs independent thread scheduling, etc.)...

On a surface level (HLL source code/compatible subsets of APIs) you might be able to pigeonhole some trivial commonality between them, but AMD absolutely expects you to do manual coding/performance tuning, especially if you want to use the mutually exclusive extensions of CUDA or HIP. HIP is not an API designed to make the "write once, run everywhere" approach viable in every case. HIP softens the barrier to cross-platform development, but you absolutely do need to do independent building/testing for each...
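To illustrate the wave64 vs wave32 point with a concrete (hypothetical) probe: warpSize is not a constant you can assume across platforms, so any warp-level algorithm written against it produces a different reduction tree on CDNA-class AMD GPUs (64-wide) than on NVidia hardware (32-wide), and needs separate validation on each.

```
// wavefront_probe.cpp -- hypothetical sketch, not production code.
// Shows why identical HIP source can behave differently per platform.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void waveSum(const float* in, float* out) {
    float v = in[threadIdx.x];
    // __shfl_down exists on both backends, but the loop trip count depends
    // on warpSize, so the reduction tree differs between wave64 and wave32.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v += __shfl_down(v, offset);
    if (threadIdx.x == 0) *out = v;
}

int main() {
    hipDeviceProp_t prop;
    hipGetDeviceProperties(&prop, 0);
    // Reports 64 on MI-series (CDNA) parts, 32 on NVidia via the CUDA backend.
    printf("wavefront/warp size: %d\n", prop.warpSize);
    return 0;
}
```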
 
You can auto-generate certain subsets of CUDA-compatible code from HIP code, or vice versa, but that doesn't ultimately change the fact that you as a developer have to do QA testing for each of these platforms (CUDA/ROCm) on a separate basis, because they have incompatible runtimes/compilers (ROCr/HIP-CLANG vs CUDA/NVCC) and different programming features/paradigms (inline assembly vs intermediate representation, wave64 vs wave32, more SIMD functions vs independent thread scheduling, etc.)...

On a surface level (HLL source code/compatible subsets of APIs) you might be able to pigeonhole some trivial commonality between them, but AMD absolutely expects you to do manual coding/performance tuning, especially if you want to use the mutually exclusive extensions of CUDA or HIP. HIP is not an API designed to make the "write once, run everywhere" approach viable in every case. HIP softens the barrier to cross-platform development, but you absolutely do need to do independent building/testing for each...
QA are usually different people than the devs. And tests should be HW-agnostic.
 
QA are usually different people than the devs. And tests should be HW-agnostic.
Can they even guarantee that application-level testing is hardware-agnostic when you can end up with potentially different code (algorithms & data structures), undefined behaviours, features, etc. between these different platforms?
 
Proper QA must take hardware into consideration. It probably should take fully built data centers into consideration. It’s way too early to trust that AI workloads will just work on untested platforms.
 
Speaking from direct experience, there's a limit on how much hardware any rationally sized QA team can actually test. Beyond a moderate handful of primary use cases, you can only test against items which should be indicative of the larger population. E.g. you shouldn't need to test a 3050, 3050 Mobile, 3060, 3060 Ti, 3060 Mobile, 3070, 3070 Ti, 3070 Mobile, 3070 Super, 3070 Ti Super, 3070 Mobile Super... You should be able to test against a single 3000-series video card and, when it works, make a rational assertion that the rest of the 3000-series cards should repeat the same behavior and call it good. The same goes for CPU types (e.g. a single 14700 should suffice for the entire 14xxx line of 20-something CPUs).

So it's entirely possible for one piece of software to be tested against the majority of hardware cases, allowing for proxy substitutions. Reality will still sneak in and break things, as a function of driver changes, firmware changes, underlying system configuration changes, OS changes, and I'm sure another two dozen things I'm simply not thinking about at this exact moment.
 
Someone in this thread mentioned other backends. It seems that vLLM caught up.
I think they tested using A100s & H100s and did not use TensorRT-LLM due to support issues.
 
To me this is spelling out the narrative well after the corresponding product decisions have been taken.

I guess lowering consumers' expectations could be a good idea, given how it played out with previous launches (even including Zen 5, where AMD had some slides that incorrectly stated they'd launched the best gaming CPU, or something like that).
 
To me this is spelling out the narrative well after the corresponding product decisions have been taken.

I guess lowering consumers' expectations could be a good idea, given how it played out with previous launches (even including Zen 5, where AMD had some slides that incorrectly stated they'd launched the best gaming CPU, or something like that).

Yeah this is just PR controlling the narrative as to why RDNA4 doesn't have a flagship: you get out ahead of it and everyone believes you. Then when RDNA5 has a flagship you spin a new story :p
 
Yeah this is just PR controlling the narrative as to why RDNA4 doesn't have a flagship
Yep, it's pretty evident it's PR when he rationalizes going for the best in AI while not doing the same for graphics!

Here's the thing: In the server space, when we have absolute leadership, we gain share because it is very TCO-based [Total Cost of Ownership]. In the client space, even when we have a better product, we may or may not gain share because there's a go-to-market side, and a developer side; that's the difference.
It's like the AI side doesn't have a developer element while the consumer side does! Oh really?

Then he says this false thing:
Even Microsoft said Chat GPT4 runs the fastest on MI300

Which isn't surprising given the recent AMD marketing problems with RDNA3 and Zen 5 (their people must really lack baseline accuracy). Microsoft only said the MI300 is the best price-to-performance inference solution for GPT-4; they said nothing about it being the fastest.

Also what the hell is that?

 
This sounds oh-so-similar to the PR that accompanied Navi 10 in 2019 (post-Vega), which in turn was an almost verbatim rehash of the 2016 talking points that accompanied the RX 480 launch (post-Fury). Which of course drew inspiration from 2007 and the RV670 “midrange dominance” strategy (post-R600).

Call me foolish, but I honestly believe that AMD’s penchant for declaring that they “don’t want to play in that sandbox anyway” and abandoning the high end after a generation underperforms (for either technological or marketing reasons) has been a major obstacle to them making consistent market share gains. As many misses as Nvidia has had, users never had to wonder whether they would have to wait 3-4 years until the next high-end part comes along.
 
Call me foolish, but [...] abandoning the high end after a generation underperforms (for either technological or marketing reasons) has been a major obstacle to them making consistent market share gains.

It's a very reasonable assumption, actually. However, that doesn't mean the extra market share that would have been gained, or at least preserved, that way is enough to offset the overall investment costs of the high-end part.
 
As I stated elsewhere, this is a good move for AMD. Their current trajectory has them losing market share and mindshare because it gives the impression they keep falling short of the competition.

If they shift their messaging and product lineup to best value, similar to the early Zen generations, they can generate a lot of positive buzz and sales momentum.
 