Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

What, you don't want an RGB-illuminated datacenter?!

RGB Cloud

[Image: Highly_iridising_altocumulus.jpg]
 
BTW, what really confuses me are those x86 rumors: https://www.techpowerup.com/261125/7nm-intel-xe-gpus-codenamed-ponte-vecchio
Other sites have already confirmed this, but I guess they are misinformed. Likely they mean C++ rather than x86? Otherwise I'd be reminded of Larrabee and think this is no real GPU at all, or at least not related to the expected Xe dGPU.

I think it's probably just a misunderstanding. Intel is promoting oneAPI to unify workloads across CPU, GPU, and FPGA (the underlying programming language is C++). There's no need for the Xe GPU to support x86 under this arrangement.
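
To illustrate the idea (my own minimal sketch in SYCL, the open C++ model that DPC++ builds on, not anything from Intel's material): the same kernel source gets compiled for whichever CPU, GPU, or FPGA device the runtime selects, so x86 support in the GPU itself isn't needed.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

// Minimal sketch (SYCL 2020 style) of the oneAPI pitch: one C++ source,
// and the runtime decides whether it lands on a CPU, GPU, or FPGA device.
int main() {
    sycl::queue q{sycl::default_selector_v};  // pick whatever device is available
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    std::vector<int> data(16, 1);
    {
        sycl::buffer<int> buf{data.data(), sycl::range<1>{data.size()}};
        q.submit([&](sycl::handler& h) {
            sycl::accessor acc{buf, h, sycl::read_write};
            // The same kernel body runs unchanged on any of the device types.
            h.parallel_for(sycl::range<1>{data.size()},
                           [=](sycl::id<1> i) { acc[i] *= 2; });
        });
    }  // buffer destruction waits for the kernel and copies results back
    std::cout << "data[0] = " << data[0] << "\n";  // prints 2
    return 0;
}
```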
 
Just thinking out loud here, considering there is so little we know about the architecture. But does Intel possess any GPU patents that hint at what they hope to bring to the table? Considering both AMD and Nvidia have a volumetric fuck-ton of patents Intel wouldn't want to step on, I'm just assuming they'll look into their own goodie-bag in parallel with developing new patents.

Both Raja Koduri and Tom Petersen can at the very least make "educated guesses" as to where their former employers are heading without breaking NDA, I'd wager. So they'll know what to look for to be competitive. For example, we know Xe will probably be capable of ray tracing. On that front AMD has its texture-cache BVH patent, while Nvidia has its tensor solution. Does Intel have anything on the subject?

(Sometimes I wish there were a rumour wiki where all those tasty tidbits were collected, instead of strewn across message boards, blogs, and tech news sites.)
 
Why buy ImgTech when they have the glorious ghost of Larrabee for their tiled rendering needs? All hail Larrabee!

Tiled rendering yes, but not tiled deferred rendering. Don't compare the two, "It's totally inappropriate. It's lewd, vesuvius, salacious, outrageous!"© :D

More seriously, do we know if they started from scratch (well, as close as possible in this area), or started from their iGPU tech?
 
Tiled rendering yes, but not tiled deferred rendering. Don't compare the two, "It's totally inappropriate. It's lewd, vesuvius, salacious, outrageous!"© :D

More seriously, do we know if they started from scratch (well, as close as possible in this area), or started from their iGPU tech?
They have included elements of previous Gen-iGPU tech
https://www.anandtech.com/show/1513...an-interview-with-intels-raja-koduri-about-xe
IC: Is Xe anything like Gen at a fundamental level?

RK: At the heart of Xe, you will find many Gen features. A big part of our decision making as we move forward is that the industry underestimates how long it takes to write a compiler for a new architecture. The Gen compiler has been with us, and has been continually improved, for years and years, so there is plenty of knowledge in there. It is impressive how much performance there is in Gen, especially in performance density. So we preserved lots of the good elements of Gen, but we had to get an order of magnitude increase in performance. The key for us is to leverage decades of our software investment – compilers, drivers, libraries etc. So, we maintained Gen features that help on software.
 
Are you trying to put me out of a job?!?!


;)

I wouldn't dream of it!

They have included elements of previous Gen-iGPU tech

Hmm, considering the time constraints that's probably a prudent decision. Intel's Gen architecture is a known and solid quantity. But I've never heard it described as performant, so I started to look it up. ExtremeTech had an interesting, if brief, overview of the differences between Gen 9 and Gen 11. The most important changes seem to be A) a general beefing up of both execution units and bandwidth, and B) the introduction of their POSH (position-only shading) tile-based rendering system, which streamlines geometry processing to improve memory bandwidth efficiency. Intel's own Gen11 architecture presentation further presents Coarse Pixel Shading (a variant of variable rate shading, from what I can tell) as an important part.
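
If I understand correctly, Coarse Pixel Shading is roughly what Direct3D 12 exposes to applications as variable rate shading, so using it would look something like this rough sketch of mine (assuming an already-created ID3D12GraphicsCommandList5 on hardware reporting VRS Tier 1 support; not anything from Intel's material):

```cpp
#include <d3d12.h>

// Rough sketch, assuming CPS surfaces to applications as D3D12
// Variable Rate Shading (Tier 1). 'cmdList' is assumed to be an
// already-created ID3D12GraphicsCommandList5.
void UseCoarsePixelShading(ID3D12GraphicsCommandList5* cmdList)
{
    // Shade once per 2x2 pixel block for the following draws; passthrough
    // combiners keep this per-draw rate unmodified.
    const D3D12_SHADING_RATE_COMBINER combiners[2] = {
        D3D12_SHADING_RATE_COMBINER_PASSTHROUGH,
        D3D12_SHADING_RATE_COMBINER_PASSTHROUGH,
    };
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_2X2, combiners);

    // ... record the cheap draw calls here ...

    // Back to full-rate shading for everything else.
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, combiners);
}
```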

It does seem that they're focusing heavily on tile based rendering, at least in part, with a much improved L3 cache to facilitate rapid memory reads and writes.

All in all it seems like a solid starting point for Xe really. I wonder how they'll adapt it for ray tracing though.
 
New overarching discussion on Anandtech: https://www.anandtech.com/show/1518...rete-xe-hpc-graphics-disclosure-ponte-vecchio

The big reveal here: there are SIMT and SIMD units. I guess that's how they'll be doing ray tracing, just straight-up multiple SIMD units? Seems flexible at least, as in, "whatever AMD and Nvidia are doing, and thus what game devs are going to do, we can do too". Well, maybe? I dunno.

Also interesting for any number of other reasons, but details are close to none; they're just starting to drill down from high-level goals on a journey towards specifics. Though it also tells us Intel is interested in HPC, a bit obvious with the exascale contract, but there it is; oh, and 8 stacks of HBM? Geeze, though it seems to be a weird multi-die setup or something. Edit - ok, finished reading, derr. They also mention they've got their own coding language, "Distributed Parallel C++", but I don't know what it's like. AFAIK the reason everyone likes CUDA is that it's just so nice and clean, unlike the monster C++ has slowly become, so I dunno there either.
 
Wow, this reads as a complete and ambitious vision for future hardware and software. Much bigger than just dGPUs for games. Exciting.
 
Larrabee sounded ambitious too at the time. I'll wait and see. Plus, we don't know if Intel's manufacturing processes (I mean 14/10/7 nm/...) will allow the planned products to actually materialize, or if they will have to dial back some stuff.
 
IMO, "SIMT and SIMD units" = variable vector width (SIMD8/SIMD16/SIMD32). It gives you (or the compiler) an option to trade TLP <-> DLP. Gen graphics has had this for years. It's similar to the choice of wave32/wave64 modes on AMD RDNA.
Intel might add more modes to the existing 8/16/32 for Gen12.
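
As a software-side illustration (my own sketch, not from Intel's disclosure): SYCL/DPC++ can already report a device's supported sub-group widths, i.e. its SIMD wave widths, and a kernel can be pinned to one of them, which is exactly this TLP <-> DLP knob.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

// Sketch: query the supported sub-group (SIMD wave) widths of a device.
// On Gen-style hardware this is where the 8/16/32 modes would show up.
int main() {
    sycl::queue q;
    auto dev = q.get_device();
    auto widths = dev.get_info<sycl::info::device::sub_group_sizes>();

    std::cout << "SIMD (sub-group) widths on "
              << dev.get_info<sycl::info::device::name>() << ":";
    for (auto w : widths) std::cout << ' ' << w;
    std::cout << '\n';

    // A kernel could then pin one width with [[sycl::reqd_sub_group_size(16)]]
    // on its lambda, making the TLP <-> DLP choice explicit.
    return 0;
}
```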

Also, I would expect Intel to put a hardware ray-tracing block in each subslice, sharing the texture cache (or L1 cache) with the TMUs.
 
IMO, "SIMT and SIMD units" = variable vector width (SIMD8/SIMD16/SIMD32). It gives you (or the compiler) an option to trade TLP <-> DLP. Gen graphics has had this for years.
I'm unsure they mean it this way.
Could it be the SIMT unit sits on top of the SIMD unit, meaning you could process, say, a vec4 as a single SIMD data element, with a single instruction?
So you could divide your SIMT unit into 8/16/32... threads like now, which then operate on 1/2/4/8...-wide vector data elements as needed?
Though that sounds like overkill. I did not understand this point very well.
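
Purely to pin down what I mean, here's a made-up toy model in plain C++ (widths and names invented, nothing to do with actual Xe hardware): each SIMT "thread" carries a vec4 as its data element, so one instruction drives 8 x 4 scalar operations.

```cpp
#include <array>
#include <cstdio>

// Hypothetical toy model of the reading above: a SIMT unit of 8 threads,
// where each thread's data element is itself a small SIMD vector (a vec4),
// so a single instruction touches 8 x 4 = 32 scalars.
using vec4 = std::array<float, 4>;

constexpr int kSimtWidth = 8;  // threads driven in lock-step by one instruction

void simt_over_simd_add(vec4 (&a)[kSimtWidth], const vec4 (&b)[kSimtWidth]) {
    for (int t = 0; t < kSimtWidth; ++t)   // SIMT: one lane per thread
        for (int c = 0; c < 4; ++c)        // SIMD: the vec4 within the lane
            a[t][c] += b[t][c];
}

int main() {
    vec4 a[kSimtWidth] = {};
    vec4 b[kSimtWidth];
    for (auto& v : b) v = {1.f, 2.f, 3.f, 4.f};
    simt_over_simd_add(a, b);
    std::printf("a[0] = {%g, %g, %g, %g}\n",
                a[0][0], a[0][1], a[0][2], a[0][3]);
    return 0;
}
```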
 
I'm unsure they mean it this way.
Could it be the SIMT unit sits on top of the SIMD unit, meaning you could process, say, a vec4 as a single SIMD data element, with a single instruction?
So you could divide your SIMT unit into 8/16/32... threads like now, which then operate on 1/2/4/8...-wide vector data elements as needed?
Though that sounds like overkill. I did not understand this point very well.

Don't really know either. From the initial impression of variable width, it sounded something like AMD's ability to do double-rate FP16 / quad-rate INT8, and their new "dual compute unit / workgroup processor" thing.

But then Cutress started talking about their SIMT units basically being regular GPU thread-group processors, and their SIMD units being more flexible, CPU-like units. The graphic below seems to support something like that, showing SIMT as the full baseline performance and their added "SIMD" units enhancing performance under some workloads; I'm assuming the enhanced ones are capable of GPU/CPU hybrid work. It's an odd route to go down, and smacks a little of not getting over the "Mill" series, but maybe it'll pay off, assuming that's correct at all.

[Image: 36_575px.jpg]
 
Makes sense, although what you say is more the other way around. But yes, I also had the impression of the SIMD units being more similar to CPUs.
Not sure if this finds its way into consumer products at all. The complexity could give them a hard start, but also the option to catch up faster and maybe beat the competition later.

I hope it works out and they also have some positive influence on software / game APIs.
 
What's really curious is that the DG1 discrete part supposedly has 96 EUs, just like the integrated GPU on Tiger Lake. Also, why do the latter two links clearly indicate it's a damn CPU (6+2, 8+2) even though it's supposed to be "Discrete Graphics 96EU DG1"?
https://portal.eaeunion.org/sites/o...8&ListId=d84d16d7-2cc9-4cff-a13b-530f96889dbc
https://portal.eaeunion.org/sites/o...0&ListId=d84d16d7-2cc9-4cff-a13b-530f96889dbc
https://portal.eaeunion.org/sites/o...4&ListId=d84d16d7-2cc9-4cff-a13b-530f96889dbc
 
They also mention they've got their own coding language, "Distributed Parallel C++", but I don't know what it's like. AFAIK the reason everyone likes CUDA is that it's just so nice and clean, unlike the monster C++ has slowly become, so I dunno there either.

How is CUDA better than modern C++? Also, if you use CUDA you are locked into Nvidia HW, which I do not think is something Intel desires...
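
For reference, here's roughly what a simple SAXPY looks like in DPC++/SYCL. This is just my own sketch (assuming a device with unified shared memory support), but it doesn't read wildly differently from the CUDA version to me: the lambda plays the role of a __global__ kernel and parallel_for replaces the <<<grid, block>>> launch.

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>

// Sketch of SAXPY in DPC++ / SYCL (the open C++ model behind oneAPI).
int main() {
    sycl::queue q;
    const size_t n = 1 << 20;
    float* x = sycl::malloc_shared<float>(n, q);
    float* y = sycl::malloc_shared<float>(n, q);
    for (size_t i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    const float a = 3.0f;
    // One work-item per element; the runtime handles the grid layout.
    q.parallel_for(sycl::range<1>{n},
                   [=](sycl::id<1> i) { y[i] = a * x[i] + y[i]; })
        .wait();

    std::printf("y[0] = %f\n", y[0]);  // 5.000000
    sycl::free(x, q);
    sycl::free(y, q);
    return 0;
}
```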
 
How is CUDA better than modern C++? Also, if you use CUDA you are locked into Nvidia HW, which I do not think is something Intel desires...

EDIT - I regret everything, never get into a discussion about the merits of various programming languages, it will never end :runaway:
 