Bondrewd
> Portfolio will expand overnight.
An IP stash yes and AMD will leverage it in funny ways sooner or later.
Part of these exascale systems' price is a $300 million government investment in the SYCL effort... So yeah, it proves that ROCm is nearly useless.
> Does AMD even support SYCL?
No.
> Also SYCL isn't a replacement for ROCm.
Raw SYCL is who cares, time to board OneAPI train.
> Raw SYCL is who cares, time to board OneAPI train.
AMD's official stack is HIP+ROCm. Their best bet would be to abandon HIP and build out an official SYCL/ROCm stack, but there's no sign of that. There's no sign of them tossing ROCm in favor of oneAPI either.
How and when exactly is AMD planning to hop on that train?
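For context on the stacks being argued over: HIP is deliberately a near-clone of the CUDA runtime API, which is why ROCm is built around it. A minimal sketch of what targeting the HIP+ROCm stack looks like (a generic vector add, not an AMD sample; assumes ROCm's hipcc toolchain):

```cpp
// Hedged sketch: a generic HIP vector add, built with hipcc against ROCm.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);
    float *da, *db, *dc;
    hipMalloc(reinterpret_cast<void**>(&da), n * sizeof(float));
    hipMalloc(reinterpret_cast<void**>(&db), n * sizeof(float));
    hipMalloc(reinterpret_cast<void**>(&dc), n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);
    // Same triple-chevron launch syntax as CUDA, which is the whole point of HIP.
    vadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

A SYCL/oneAPI version of the same thing would submit the kernel as a C++ lambda to a queue rather than declaring a __global__ function, which is roughly the porting gap the posts above are arguing about.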
> A100 programming in some cases is already that due to wonky as balls split L2.
Some cases are not the same as ALL cases.
> More, like 400.
Nope, 200GB/s, the two dies are connected by 4 IF links, each capable of 50GB/s up/down, so 200GB/s in total.
> The solution to MI250X's segmented memory access between the dies is to just launch twice as many compute kernels for maximum performance, so that each die can run its own compute kernels independent of each other in parallel.
Then why bother putting two dies together with all the added complexity?
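The per-die approach quoted above maps directly onto HIP's existing multi-GPU pattern, since each MI250X GCD enumerates as its own device. A rough sketch under that assumption (the saxpy kernel and the sizes are made up for illustration):

```cpp
// Hedged sketch: drive each GCD as its own HIP device with its own kernel launch.
#include <hip/hip_runtime.h>
#include <vector>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    int ndev = 0;
    hipGetDeviceCount(&ndev);                   // one MI250X OAM shows up as 2 devices (one per GCD)
    const int n_per_dev = 1 << 24;
    std::vector<float*> xs(ndev), ys(ndev);

    // Give every GCD its own slice of the problem and its own kernel launch.
    for (int d = 0; d < ndev; ++d) {
        hipSetDevice(d);                        // bind subsequent calls to this GCD
        hipMalloc(reinterpret_cast<void**>(&xs[d]), n_per_dev * sizeof(float));
        hipMalloc(reinterpret_cast<void**>(&ys[d]), n_per_dev * sizeof(float));
        hipMemset(xs[d], 0, n_per_dev * sizeof(float));   // real code would upload data here
        hipMemset(ys[d], 0, n_per_dev * sizeof(float));
        // Kernel launches are asynchronous, so both dies end up running in parallel.
        saxpy<<<(n_per_dev + 255) / 256, 256>>>(2.0f, xs[d], ys[d], n_per_dev);
    }
    for (int d = 0; d < ndev; ++d) {
        hipSetDevice(d);
        hipDeviceSynchronize();                 // wait for each GCD to finish
        hipFree(xs[d]);
        hipFree(ys[d]);
    }
    return 0;
}
```

This is the same code you would write for two discrete GPUs, which is what the "why bother" reply is getting at.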
AMD's product page says up to 400 GB/s bidirectional in-package bandwidth between the GCDs, though the footnote appears to be incomplete.
> Then why bother putting two dies together with all the added complexity?
Computation density in HPC. They need tons of FP64 flops in the smallest possible area.
> Some cases are not the same as ALL cases.
Same as here duh.
> Nope, 200GB/s
400.
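One way to reconcile the two figures, assuming 4 Infinity Fabric links at 50 GB/s per direction each:

4 links × 50 GB/s per direction = 200 GB/s each way
200 GB/s each way, counted in both directions = 400 GB/s bidirectional

So 200 GB/s and 400 GB/s can describe the same links, depending on whether one direction or both are counted; the product-page figure mentioned above is the bidirectional one.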
> GA100 doesn't require any special cache treatment unless you want to reach the absolute maximum performance for a single GPU config.
Well duh, welcome to the very definition of NUMA lands.
> It has nothing in common with the dual separate GPUs in MI250.
You can treat those things as one bigass GPU.
> Well duh, welcome to the very definition of NUMA lands.
NUMA has nothing to do with this. Apparently A100 has full-speed access to all memory banks without any optimizations; the only difference can be in cache latencies.
> You can treat those things as one bigass GPU.
Duh oh ah, you can treat an 8-GPU DGX system as one bigass GPU, what news!
> You can treat those things as one bigass GPU.
And you will likely get way less than A100 performance by treating them this way.
You can treat the entire node as one bigass APU.
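In HIP terms, "one bigass GPU" mostly comes down to peer access: a kernel running on one GCD dereferencing memory that physically sits on the other GCD, with the traffic going over the in-package links. Whether that is fast enough is exactly the bandwidth argument above. A minimal sketch, assuming the two GCDs enumerate as devices 0 and 1:

```cpp
// Hedged sketch: enable peer access so device 0's kernel can read device 1's HBM.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void pull(const float* remote, float* local, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) local[i] = remote[i];            // these reads cross the die-to-die IF links
}

int main() {
    int ndev = 0, can01 = 0;
    hipGetDeviceCount(&ndev);
    if (ndev < 2) { printf("need at least two devices\n"); return 1; }
    hipDeviceCanAccessPeer(&can01, 0, 1);       // can device 0 map device 1's memory?
    if (!can01) { printf("no peer access between devices 0 and 1\n"); return 1; }

    const int n = 1 << 20;
    float *on_gcd1 = nullptr, *on_gcd0 = nullptr;
    hipSetDevice(1);
    hipMalloc(reinterpret_cast<void**>(&on_gcd1), n * sizeof(float));
    hipMemset(on_gcd1, 0, n * sizeof(float));

    hipSetDevice(0);
    hipDeviceEnablePeerAccess(1, 0);            // flags must be 0
    hipMalloc(reinterpret_cast<void**>(&on_gcd0), n * sizeof(float));
    pull<<<(n + 255) / 256, 256>>>(on_gcd1, on_gcd0, n);   // device 0 reads device 1's HBM
    hipDeviceSynchronize();

    hipFree(on_gcd0);
    hipSetDevice(1);
    hipFree(on_gcd1);
    return 0;
}
```

The same pattern extends across a node once the links are coherent, which is presumably what the "entire node as one bigass APU" remark is pointing at.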
> the only difference can be in cache latencies.
Totally not a misery point I swear to god.
> And you will likely get way less than A100 performance by treating them this way.
Who knows haha.
> Either MI200 is one year too late
What the fuck.
> AMD's numbers against A100 point in the direction of less than 50% real FP64 performance...
membw says hello.
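A back-of-envelope roofline makes the membw point concrete. The peak numbers below are the public spec-sheet figures (MI250X: ~47.9 TFLOP/s vector FP64 and ~3.2 TB/s HBM2e; A100 80GB: ~9.7 TFLOP/s FP64 and ~2.0 TB/s), used here as assumptions for illustration rather than measurements:

```cpp
// Hedged sketch: classic roofline estimate for MI250X vs A100 at a few arithmetic intensities.
#include <algorithm>
#include <cstdio>
#include <initializer_list>

// Roofline model: attainable FLOP/s = min(peak compute, bandwidth * arithmetic intensity).
double roofline(double peak_flops, double bw_bytes_per_s, double flops_per_byte) {
    return std::min(peak_flops, bw_bytes_per_s * flops_per_byte);
}

int main() {
    // Spec-sheet peaks, treated as assumptions for this sketch.
    const double mi250x_flops = 47.9e12, mi250x_bw = 3.2e12;   // FP64 vector, HBM2e
    const double a100_flops   = 9.7e12,  a100_bw   = 2.0e12;   // FP64, HBM2e

    // Sweep a few arithmetic intensities, from stream-like to compute-bound.
    for (double ai : {0.25, 1.0, 15.0}) {
        double mi = roofline(mi250x_flops, mi250x_bw, ai);
        double a  = roofline(a100_flops,  a100_bw,  ai);
        printf("AI %5.2f FLOP/B: MI250X %5.1f TFLOP/s, A100 %5.1f TFLOP/s, ratio %.2fx\n",
               ai, mi / 1e12, a / 1e12, mi / a);
    }
    // At low arithmetic intensity the advantage tracks the ~1.6x bandwidth gap,
    // not the ~5x gap in peak FP64.
    return 0;
}
```

For bandwidth-bound kernels the attainable ratio tracks the roughly 1.6x bandwidth gap rather than the roughly 5x gap in peak FP64, which is the "membw says hello" argument in one line.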