Intel ARC GPUs, Xe Architecture for dGPUs [2022-]


So, a ~4060 Ti for $250 (vs $400) and with 12 GB (vs 8 GB).
Looks like a good deal, but the driver situation is unknown, as is how much life is left in the 4060 Ti to begin with.
Not on the 4060 Ti's level. This is 10% faster than the 4060.
 
Yeah, it’s pretty bad. Intel has more control overhead than Nvidia and AMD, and SIMD16 instead of SIMD32. But I’m not sure that explains such a huge difference in density.
The relatively low L2 size also pushes the density down. AD106 already packs 32MB, although the Xe2 core implementation here gets an impressive 256kB of L1.
 
The relatively low L2 size also pushes the density down. AD106 already packs 32MB, although the Xe2 core implementation here gets an impressive 256kB of L1.

Sure but it doesn’t explain the overall size. If anything B580 should be even smaller given the lack of cache. It’s also just 20 cores vs 24 on the 4060. One difference is that there are 5 render slices on the B580 and only 3 GPCs on the 4060. Wider memory bus as well.
 
XeLL, not XeSS-LL ;)
And more specifically, XeSS is SR, XeSS 2 is SR+FG(+optional XeLL to my understanding)
Thanks for the clarification. It sounds good: XeLL, XeSS, XeSS 2. That play on words sounds like it has a soul.

 
I looked at the perf/price chart and thought it was showing the perf improvement, doh.
Still, even at the 4060's level it can be a good card, depending on drivers and on how much time it has before Nvidia and AMD react.

The RTX 4060 comparison is done at 1440p ultra settings. I'm wondering how much of that is exposing the VRAM gap, and just how playable it is on both. I believe Intel is only claiming 30+ fps gaming at 1440p ultra for the B580.

In case anyone is wondering you can see more specifics of how Intel's performance comparisons are derived - https://edc.intel.com/content/www/us/en/products/performance/benchmarks/desktop_1/
 
Sure but it doesn’t explain the overall size. If anything B580 should be even smaller given the lack of cache. It’s also just 20 cores vs 24 on the 4060. One difference is that there are 5 render slices on the B580 and only 3 GPCs on the 4060. Wider memory bus as well.
This leads to quite a disparity in old-school aspects. The BMG-G21 seems to have 80/96 pixel pipelines, while AD107 has 48.
 
This leads to quite a disparity in old-school aspects. The BMG-G21 seems to have 80/96 pixel pipelines, while AD107 has 48.

Yep, and Intel also gives each core 8 TMUs, so that's 160 TMUs to the 4060's 96. TDP is also much higher. According to Intel's numbers, the B580 is up to 40% faster than the 4060 in some games. That actually makes sense when you look at the raw specs.
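If you want a quick sanity check on that, here's a back-of-envelope texel rate comparison. The TMU counts are from the post above; the boost clocks are my assumptions (typical listed figures, not measured in-game clocks), so treat the output as rough:

```cpp
// Rough texel fill rate estimate: TMU count x boost clock.
// TMU counts come from the discussion above; the clocks are assumed
// typical boost figures, not official in-game numbers.
#include <cstdio>

int main() {
    const int    b580_tmus   = 20 * 8; // 20 Xe cores x 8 TMUs each
    const int    gf4060_tmus = 24 * 4; // 24 SMs x 4 TMUs each
    const double b580_ghz    = 2.67;   // assumed boost clock
    const double gf4060_ghz  = 2.46;   // assumed boost clock

    std::printf("B580:     %3d TMUs -> ~%.0f GTexel/s\n",
                b580_tmus, b580_tmus * b580_ghz);
    std::printf("RTX 4060: %3d TMUs -> ~%.0f GTexel/s\n",
                gf4060_tmus, gf4060_tmus * gf4060_ghz);
}
```

Under those assumptions the B580 has roughly 1.8x the theoretical texture rate, which lines up with it sometimes pulling well ahead despite the lower core count.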
 
According to Andreas Schilling on X, Tom Petersen said Battlemage was designed with PCIe 5.0 but it wasn't productized -
Good to keep the cost of the boards down if there isn't a benefit from the extra bandwidth. Though I dunno how much more precise the traces and stuff have to be for PCIe5 vs PCIe4. It probably matters if you're already close to selling the things at a loss.
 
Good to keep the cost of the boards down if there isn't a benefit from the extra bandwidth. Though I dunno how much more precise the traces and stuff have to be for PCIe5 vs PCIe4. It probably matters if you're already close to selling the things at a loss.
Yep, agreed. And given the performance level, PCIe 4.0 x8 (~16 GB/s each way) seems plenty.
 
Another new release where Arc-A isn't doing too hot, despite having better RT support than AMD and 16 GB of VRAM (this particular benchmark punishes everything with less than 12).
This has slowly become the norm this year, and I sure hope Arc-B won't follow the same path.
 
A good video explaining all the architectural improvements in Battlemage compared to Alchemist. Intel also admits some major mistakes they made along the way with Alchemist.


Another new release where Arc-A isn't doing too hot, despite having better RT support than AMD and 16 GB of VRAM (this particular benchmark punishes everything with less than 12).
In the video, Tom Petersen explains this. Alchemist is slow in games that use ExecuteIndirect-type commands. It was an oversight on Intel's part: they didn't realize how important this command is to performance, and so they emulated it in software without native hardware support, which is why Alchemist falls behind in any game that uses it a lot. Battlemage fixed this by adding native hardware support, making it between 7x and 12x faster at executing this command than Alchemist.
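For anyone unfamiliar, ExecuteIndirect is the D3D12 call in question: a single CPU-side call issues a batch of draws whose parameters live in a GPU-visible argument buffer, so the GPU (rather than the CPU) effectively decides what gets drawn. A minimal sketch of how a game uses it, with error handling omitted and the argument buffer assumed to be filled elsewhere:

```cpp
// Minimal D3D12 ExecuteIndirect sketch (untested, error handling omitted).
// The argument buffer holds an array of D3D12_DRAW_INDEXED_ARGUMENTS
// records, typically written by a GPU culling pass.
#include <d3d12.h>

void RecordIndirectDraws(ID3D12Device* device,
                         ID3D12GraphicsCommandList* cmdList,
                         ID3D12Resource* argumentBuffer,
                         UINT maxDrawCount)
{
    // Each record in the argument buffer is a plain indexed draw.
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride       = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs   = &arg;

    ID3D12CommandSignature* signature = nullptr;
    device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&signature));

    // One call issues up to maxDrawCount draws whose parameters sit in GPU
    // memory. Hardware without native support has to expand this in the
    // driver instead, which is the Alchemist penalty described above.
    cmdList->ExecuteIndirect(signature, maxDrawCount,
                             argumentBuffer, 0, nullptr, 0);
    signature->Release();
}
```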
 
Another interview with Tom Petersen with Hardware Unboxed:

Tom keeps mentioning that they're working on Xe4. I wonder if Xe4 is perhaps MCM-based, as this patent shows?
Quite a few interesting topics they talk about, especially the VRAM needed for modern games. Thanks for sharing, and welcome!
 
So, Intel is going the NVIDIA route with XeSS 2: the frame generation part is going to be exclusive to Intel GPUs, and it won't run on NVIDIA or AMD hardware. Intel is relying on their XMX AI cores to do all the work in frame generation, and that AI code is not easily transferable across vendors. There will be no XeSS 2 DP4a this time.

However, Intel and NVIDIA differ in their implementations: NVIDIA needed the optical flow accelerator plus tensor cores to make DLSS 3 work, while Intel GPUs lack optical flow hardware entirely, so Intel had to make frame generation work with their AI cores alone.
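To make that difference concrete: the operation both schemes ultimately need is a motion-compensated warp of an existing frame, and the vendors mainly differ in where the per-pixel motion field comes from (dedicated OFA hardware vs. estimating it with XMX compute). Below is a toy CPU warp, purely illustrative and not either vendor's actual pipeline:

```cpp
// Toy sketch of the core of frame generation: warp the previous frame
// halfway along a per-pixel motion field to synthesize an in-between frame.
// Grayscale, nearest-neighbour, CPU-side - illustration only.
#include <algorithm>
#include <cmath>
#include <vector>

struct Image { int w, h; std::vector<float> px; };

// Backward warp by half the motion vector: for each output pixel, fetch
// the source pixel that would have moved here by the frame midpoint.
Image WarpHalfway(const Image& prev, const std::vector<float>& flowX,
                  const std::vector<float>& flowY) {
    Image out{prev.w, prev.h, std::vector<float>(prev.px.size(), 0.f)};
    for (int y = 0; y < prev.h; ++y) {
        for (int x = 0; x < prev.w; ++x) {
            const int i  = y * prev.w + x;
            const int sx = std::clamp(int(std::lround(x - 0.5f * flowX[i])),
                                      0, prev.w - 1);
            const int sy = std::clamp(int(std::lround(y - 0.5f * flowY[i])),
                                      0, prev.h - 1);
            out.px[i] = prev.px[sy * prev.w + sx];
        }
    }
    return out;
}
```

The hard part in practice is producing a good flow field and handling disocclusions, which is exactly the work NVIDIA put on the OFA and Intel put on the XMX units.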
 
Not a fan of how everyone in the industry followed Nvidia in lumping new features into newer upscaling versions.
Nvidia at least synchronized the versioning between components eventually, even if that didn't bring much in terms of messaging transparency.
Here it's impossible to tell whether XeSS SR will be the same between XeSS 1 and 2, or whether it will continue to work on non-Intel h/w.

I'm fairly sure that the same thing will happen with FSR4.

I think a much better option here would be to concentrate on the additional tech instead of incrementing the "umbrella" version.
Just call it what it is - XeSS FG, no need to make it "XeSS 2". The same applies to DLSS, FSR, etc.
If there were a need to accentuate the differences between a previous version of a component and the next one, then just use component versioning - XeSS FG 2, DLSS RR 2, FSR SR 4.

Nvidia has in fact arrived at that pretty much naturally after syncing the versioning between all three components.
But at the same time they continue using "DLSS 2" as if it means DLSS SR, which isn't helping anyone.
This will only get worse with the upcoming Blackwell launch.
 
Another interview with Tom Petersen with Hardware Unboxed:

Fantastic interview. Tim asked some really good questions and Tom was great as usual.

I’m surprised Tom didn’t know a big title like Indiana Jones requires hardware RT. Seems like something he should know in his job.
 
It's almost certainly just the video ME (motion estimation) block, but not all architectures will be able to integrate that at low latency with compute.
This seems to me more related to image/video AI processing than video per se. IIRC it appeared first on Tegra SoCs aimed at automotive markets.
 