Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

Maybe wait to see real performance before being so definitive?

Also, I'm not sure what the big deal is behind a VLIW architecture when there's a strong possibility that Kepler and Maxwell/Pascal were technically VLIW architectures as well, since they required explicit encoding to dual-issue instructions ...
 
I am not expecting the gaming chips (Xe-HPG) to provide any stellar or competitive performance with AMD or NVIDIA, since it relies on the same scalability scheme as Xe-HPC, i.e. racking up several graphics tiles to scale up core count; this will be a mess for drivers and games in general.
I strongly believe this is bullshit. I expect NVidia to prove this by the end of 2022, perhaps 2021.

It's a mess when your architecture is wrong. Similar to how asynchronous compute is a mess when your architecture is wrong.

The architecture relies purely on software scoreboarding (software schedulers), which means Intel will have its hands full writing good drivers to achieve good utilization (VLIW5 days, anyone?). On top of that, they are scaling it up through tiling (a multi-core/die approach), which means it's going to be a nightmare to write drivers for and to extract good performance from.
Doesn't NVidia do multi-instruction issue? Doesn't the compiler produce multi-instruction bundles for max throughput?
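
To illustrate what software scoreboarding actually means: instead of hardware tracking register dependencies at runtime, the compiler/driver works out instruction latencies up front and inserts explicit waits into the instruction stream. A minimal toy sketch in Python (the opcodes, latencies and registers are all invented for illustration, not Intel's actual ISA):

```python
# Toy model of software (compiler-side) scoreboarding: the scheduler knows each
# instruction's latency ahead of time and inserts explicit stall counts so that
# no instruction reads a register before its producer has finished.
# All opcodes, latencies and registers here are made up for illustration.

LATENCY = {"mad": 5, "mul": 4, "add": 3, "mov": 1}  # invented cycle latencies

def schedule(program):
    """program: list of (opcode, dst, src...). Returns (stall_cycles, instr) pairs."""
    ready_at = {}   # register -> cycle at which its value becomes available
    cycle = 0
    out = []
    for op, dst, *srcs in program:
        # Compiler-inserted wait: stall until every source operand is ready.
        earliest = max([ready_at.get(r, 0) for r in srcs] + [cycle])
        stall = earliest - cycle
        cycle = earliest + 1                    # issuing the instruction takes a cycle
        ready_at[dst] = earliest + LATENCY[op]  # result usable only after its latency
        out.append((stall, (op, dst, *srcs)))
    return out

prog = [
    ("mul", "r0", "r1", "r2"),
    ("add", "r3", "r0", "r4"),   # reads r0 -> needs a compiler-inserted stall
    ("mov", "r5", "r6"),         # independent; a smarter scheduler would hoist it
]

for stall, instr in schedule(prog):
    print(f"stall {stall} cycle(s), then issue {instr}")
```

A hardware scoreboard would resolve the same dependencies at runtime; pushing the job into the compiler saves area and power, but it makes the driver's scheduler responsible for utilization, which is exactly the concern above.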
 
I expect NVidia to prove this by the end of 2022, perhaps 2021.
No, AMD will do it, next year, for both client and DC because they feel like it.
It's a mess when your architecture is wrong
You also need some very very delicate and nice packaging there.
Doesn't NVidia do multi-instruction issue? Doesn't the compiler produce multi-instruction bundles for max throughput?
Yes and yes.
 
We have a long history of GPU architectures to judge and forecast performance from; nothing is certain, of course, but it's worth going through the motions to predict where performance will lie, given what we already know from past experience.

Furthermore, Xe-LP still retains the abysmal max 1 primitive per clock rate, and worse yet, it lacks all of the features from DX12U, except hardware RT.

Intel removed the hardware scoreboarding Gen11 had, which wasn't really that effective there to begin with. Gen11 had one Thread Control unit handling 2 ALUs, and each ALU had control over 4 FP32 instructions, so in total each Thread Control unit had access to 8 FP32 instructions, which I would call a pretty weak arrangement to begin with. Intel didn't change this arrangement in Xe-LP; instead, each Thread Control unit now supervises 16 FP32 instructions, further weakening their already weak position.
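
Spelling those numbers out (an illustrative calculation only, using nothing beyond the figures stated above):

```python
# FP32 instructions supervised by one Thread Control unit, per the figures above.
gen11_alus_per_tc = 2
gen11_fp32_per_alu = 4
gen11_fp32_per_tc = gen11_alus_per_tc * gen11_fp32_per_alu   # 8

xelp_fp32_per_tc = 16   # in Xe-LP one Thread Control unit covers a paired set of EUs

print(f"Gen11: {gen11_fp32_per_tc} FP32 instructions per Thread Control unit")
print(f"Xe-LP: {xelp_fp32_per_tc} FP32 instructions per Thread Control unit "
      f"({xelp_fp32_per_tc // gen11_fp32_per_tc}x as many per scheduling decision)")
```

The wider the pool each scheduler has to keep fed, the more any gap in the schedule costs, which is the utilization worry being raised here.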

Navi is 4 yes? What is Turing/Pascal?
 
We have a long history of GPU architectures to judge and forecast performance from; nothing is certain, of course, but it's worth going through the motions to predict where performance will lie, given what we already know from past experience.

Furthermore, Xe-LP still retains the abysmal max 1 primitive per clock rate, and worse yet, it lacks all of the features from DX12U, except hardware RT.

Intel removed the hardware scoreboarding Gen11 had, which wasn't really that effective there to begin with. Gen11 had one Thread Control unit handling 2 ALUs, and each ALU had control over 4 FP32 instructions, so in total each Thread Control unit had access to 8 FP32 instructions, which I would call a pretty weak arrangement to begin with. Intel didn't change this arrangement in Xe-LP; instead, each Thread Control unit now supervises 16 FP32 instructions, further weakening their already weak position.

So maybe, all in all, they are OK with software scoreboarding based on their experience with Gen11?

As for the DX12U features, yes, it's not cool, but I don't believe it will matter a lot for a first design. And they still support VRS Tier 1.

For me, they don't need to be perfect yet; they need to release the product with good performance and good drivers. If they do that, it's already a big achievement imo. Then, of course, they need to keep improving, like AMD and nVidia...
 
Also, I'm not sure what the big deal is behind a VLIW architecture when there's a strong possibility that Kepler and Maxwell/Pascal were technically VLIW architectures as well, since they required explicit encoding to dual-issue instructions ...

Was Pascal dual-issue managed by the compiler?
 
Was Pascal dual-issue managed by the compiler?

It depends mostly on how clever the programmer is with the hardware. If they understand the conditions/constraints (restrictions) behind dual-issue instruction scheduling, then it's possible for the compiler to generate code for these dual-issue instructions ...
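
As a toy illustration of what such a constraint-aware scheduler does, here's a sketch that pairs adjacent instructions into a dual-issue bundle only when they target different execution ports and have no register dependency. The port assignments and pairing rules are invented for the example, not NVIDIA's actual constraints:

```python
# Toy dual-issue pairing: bundle two adjacent instructions when they use
# different execution ports and have no register dependency between them.
# Port assignments and rules are invented for illustration only.

PORT = {"fma": "fp", "mul": "fp", "add": "fp", "ld": "mem", "st": "mem"}

def depends(a, b):
    """True if b reads or writes the register that a writes (RAW/WAW hazard)."""
    _, dst_a, *_ = a
    _, dst_b, *src_b = b
    return dst_a in src_b or dst_a == dst_b

def pair(program):
    bundles, i = [], 0
    while i < len(program):
        a = program[i]
        b = program[i + 1] if i + 1 < len(program) else None
        if b and PORT[a[0]] != PORT[b[0]] and not depends(a, b):
            bundles.append((a, b))   # dual-issue: e.g. one FP op + one memory op
            i += 2
        else:
            bundles.append((a,))     # single issue
            i += 1
    return bundles

prog = [
    ("fma", "r0", "r1", "r2", "r3"),
    ("ld",  "r4", "r5"),            # different port, independent -> pairs with the fma
    ("add", "r6", "r0", "r4"),      # depends on both results, issues alone
]

for bundle in pair(prog):
    print(" || ".join(str(instr) for instr in bundle))
```

The real constraints (which units can co-issue, operand bank conflicts, encoding limits) are exactly what the compiler, or a careful programmer, has to understand to get these bundles out of real code.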
 
I thought Vega was 4 and "up to 17" based on the use of their primitive shader? (which obviously never came about)
 
Xe-LP still retains the abysmal max 1 primitive per clock rate, and worse yet, it lacks all of the features from DX12U, except hardware RT.

That's from the Anandtech article analysis?

Well they updated it:
Update: Intel has since shot me a note stating that they have in fact upgraded their geometry front-end, so this is not the same 1 triangle/clock hardware as on earlier Intel GPUs. Xe-LP's geometry front-end can now spit out two backface-culled triangles per clock, doubling Intel's peak geometry performance on top of Xe-LP's clockspeed improvements.
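
Back-of-the-envelope, that works out as follows (a sketch only; the ~1.35 GHz figure is an assumed Tiger Lake boost clock, not something from the article, and peak rates are never sustained in practice):

```python
# Rough peak geometry throughput implied by the quoted rates (illustrative only;
# the clock speed is an assumption, and real workloads never reach peak).
clock_hz = 1.35e9            # assumed Xe-LP max clock
tris_per_clock_old = 1       # earlier Intel GPUs: 1 triangle per clock
tris_per_clock_new = 2       # Xe-LP: 2 backface-culled triangles per clock

for name, rate in [("old front end ", tris_per_clock_old),
                   ("Xe-LP front end", tris_per_clock_new)]:
    print(f"{name}: ~{rate * clock_hz / 1e9:.1f} Gtris/s peak")
```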
 
2 rasterizers for 768 shaders? ... that's a lot.

The question is how many polygons the front end can accept. We know it can rasterize 2 polygons after backface culling, but how many polygons is it able to cull?
 
That's from the Anandtech article analysis?

Well they updated it:
Yep. Leave it to Intel to completely overhaul their geometry front end for parallel execution, and then not bother telling anyone. :LOL:

The upshot, at least, is that their technical team is paying attention to what's being written. So when we get something wrong, they've been giving us the correct data.
 
Does anyone even benchmark raw triangle throughput anymore? I think the last was hardware.fr. Damien is sorely missed.
Most of the tools for this are quite old these days, as low-level benchmarks don't garner the interest they once did. Unfortunately, I'm not sure what Damien was using to begin with.
 
Most of the tools for this are quite old these days, as low-level benchmarks don't garner the interest they once did. Unfortunately, I'm not sure what Damien was using to begin with.

Yeah that’s understandable. Times have changed. It’s not even clear whether geometry throughput has a material impact on overall game performance these days. Maybe someone will write a mesh shader bench.
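
For what it's worth, the measurement itself is simple in principle: draw a huge batch of tiny or backface-culled triangles, time enough frames to amortize overhead, and divide. A bare harness sketch follows; the triangle counts are arbitrary and the draw call is a placeholder, since wiring it to a real API (GL/Vulkan/D3D) plus proper GPU synchronization is exactly the tooling that's missing these days:

```python
import time

# Skeleton of a raw-triangle-throughput measurement: submit a large batch of
# tiny or backface-culled triangles per frame, time many frames, and report
# triangles per second. The draw call below is a placeholder only.

TRIANGLES_PER_FRAME = 5_000_000   # arbitrary batch size
FRAMES = 200                      # enough frames to amortize per-frame overhead

def draw_culled_triangles(n):
    # Placeholder: a real benchmark would issue n degenerate or backfacing
    # triangles through a graphics API and wait for the GPU to finish the frame.
    time.sleep(0.001)

start = time.perf_counter()
for _ in range(FRAMES):
    draw_culled_triangles(TRIANGLES_PER_FRAME)
elapsed = time.perf_counter() - start

print(f"{TRIANGLES_PER_FRAME * FRAMES / elapsed / 1e9:.2f} Gtris/s "
      f"(meaningless with the placeholder; real draw calls and GPU sync required)")
```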
 
Yeah that’s understandable. Times have changed. It’s not even clear whether geometry throughput has a material impact on overall game performance these days. Maybe someone will write a mesh shader bench.
Doesn't seem to be too important, given how well GCN compares to NV in most AAA games. This could be way too simplistic a view, though.
 
The question is how many polygons the front end can accept. We know it can rasterize 2 polygons after backface culling, but how many polygons is it able to cull?
I interpreted the quote to say they can cull two primitives per clock, meaning they likely rasterize one that survives culling.
 