Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

The concept behind TSS/OSL is that texels which don't get requested for shading can simply keep serving their cached results ...

If we take dynamic lights as an example, then its projected area in object space will be used to generate shading requests on those specific texels, so prior results in the same area get implicitly rejected ...
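Roughly, that request/implicit-rejection loop could look like this minimal CPU-side sketch (the names ShadingCache, coversTexel and shadeTexel are hypothetical placeholders, not any real API):

Code:
#include <cstdint>
#include <functional>
#include <vector>

// Per-object shading texture: cached colour plus the frame it was last shaded in.
struct ShadingCache {
    int width = 0, height = 0;
    std::vector<uint32_t> color;
    std::vector<uint32_t> lastShadedFrame;
};

// coversTexel: does the light's projected footprint (in the object's UV space) touch this texel?
// shadeTexel: produce a fresh shading result. Both are stand-ins for real renderer code.
void submitShadingRequests(ShadingCache& cache, uint32_t frame,
                           const std::function<bool(int, int)>& coversTexel,
                           const std::function<uint32_t(int, int)>& shadeTexel)
{
    for (int y = 0; y < cache.height; ++y)
        for (int x = 0; x < cache.width; ++x) {
            const int i = y * cache.width + x;
            if (coversTexel(x, y)) {
                // Texel lies inside the projected area: request shading. Writing the
                // fresh result implicitly rejects whatever was cached there before.
                cache.color[i] = shadeTexel(x, y);
                cache.lastShadedFrame[i] = frame;
            }
            // Texels outside the footprint are untouched and keep serving the
            // previously cached result.
        }
}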

I don’t think you understand my question. I know how the mechanism works. Any cached texels in object space will still become stale very quickly if the view or lighting changes even slightly. The question is how does the app decide whether a previously shaded texel is still valid.
 
With Intel going with dedicated hardware for ray tracing, they could be quite performant/advanced in the RT department. Now let's see the rest of the GPU architecture, which still matters.
 
I don’t think you understand my question. I know how the mechanism works. Any cached texels in object space will still become stale very quickly if the view or lighting changes even slightly. The question is how does the app decide whether a previously shaded texel is still valid.

The answer is simpler than you think ...

As long as our projected area (whether that's derived from the view or light) doesn't change, the texels which were shaded prior to the current frame remain valid ...

Even if our projected area or the orientation of our objects changes, as long as our shaded texels still fall within the boundaries of our image we can still apply reuse ...
 
I don’t think you understand my question. I know how the mechanism works. Any cached texels in object space will still become stale very quickly if the view or lighting changes even slightly. The question is how does the app decide whether a previously shaded texel is still valid.

One way to do it is to reshade one texel out of every 4x4 texel block and skip the other ones when the difference is small. Would it be possible to do that with high performance?
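Something like this minimal sketch, assuming a single-channel (e.g. luminance) cache and a user-supplied shade function; all names and the 0.02 threshold are made up for illustration:

Code:
#include <cmath>
#include <functional>
#include <vector>

using ShadeFn = std::function<float(int x, int y)>; // returns e.g. luminance, for simplicity

void updateBlocks(std::vector<float>& cache, int width, int height,
                  const ShadeFn& shade, float threshold = 0.02f)
{
    for (int by = 0; by < height; by += 4)
        for (int bx = 0; bx < width; bx += 4) {
            // Probe: reshade one representative texel of this 4x4 block.
            float fresh = shade(bx, by);
            float& cached = cache[by * width + bx];
            bool blockDirty = std::fabs(fresh - cached) > threshold;
            cached = fresh;
            if (!blockDirty)
                continue; // small difference: reuse the other 15 cached texels
            // Large difference: reshade the rest of the block as well.
            for (int y = by; y < by + 4 && y < height; ++y)
                for (int x = bx; x < bx + 4 && x < width; ++x)
                    if (x != bx || y != by)
                        cache[y * width + x] = shade(x, y);
        }
}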
 
The answer is simpler than you think ...

As long as our projected area (whether that's derived from the view or light) doesn't change, the texels which were shaded prior to the current frame remain valid ...

Even if our projected area or the orientation of our objects changes, as long as our shaded texels still fall within the boundaries of our image we can still apply reuse ...

I would challenge that. Sure it’s simple if your only criterion is “same direct light”. But that seems inadequate for all of the indirect lighting that we will get this generation. There has to be some other heuristic for invalidating the cache.
 
I would challenge that. Sure it’s simple if your only criterion is “same direct light”. But that seems inadequate for all of the indirect lighting that we will get this generation. There has to be some other heuristic for invalidating the cache.

Sure, fast changes to indirect lighting can be considered the worst-case scenario, but there are lots of cases where indirect lighting is baked, updated at low rates, or much of our geometry/light sources remains static in most scenes, so TSS/OSL can be useful to exploit shading reuse in those instances ...
 
I don’t think you understand my question. I know how the mechanism works. Any cached texels in object space will still become stale very quickly if the view or lighting changes even slightly. The question is how does the app decide whether a previously shaded texel is still valid.
Some of the geometric and/or color based tests developed over the years to handle ghosting with temporal filtering can be used in object space as well.

Another option is to only store in object space low-frequency terms that tend not to change much in time and/or space, and then amortize the cost of updating them over multiple frames.
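For illustration, a minimal sketch of such a test, assuming we reshade a sparse set of object-space texels each frame and reject a cached neighbour when it falls outside the colour bounding box of the fresh samples (a TAA-style neighbourhood clamp; the names and the tolerance are invented):

Code:
#include <algorithm>
#include <vector>

struct Texel { float r, g, b; };

// Returns true if the cached value is still plausible given freshly shaded neighbours,
// i.e. it lies inside their colour bounding box (plus a small tolerance).
bool cachedTexelStillValid(const Texel& cached, const std::vector<Texel>& freshNeighbours,
                           float tolerance = 0.05f)
{
    Texel lo{ 1e9f, 1e9f, 1e9f }, hi{ -1e9f, -1e9f, -1e9f };
    for (const Texel& t : freshNeighbours) {
        lo.r = std::min(lo.r, t.r); hi.r = std::max(hi.r, t.r);
        lo.g = std::min(lo.g, t.g); hi.g = std::max(hi.g, t.g);
        lo.b = std::min(lo.b, t.b); hi.b = std::max(hi.b, t.b);
    }
    return cached.r >= lo.r - tolerance && cached.r <= hi.r + tolerance &&
           cached.g >= lo.g - tolerance && cached.g <= hi.g + tolerance &&
           cached.b >= lo.b - tolerance && cached.b <= hi.b + tolerance;
}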
 
Some of the geometric and/or color based tests developed over the years to handle ghosting with temporal filtering can be used in object space as well.

Another option is to only store in object space low-frequency terms that tend not to change much in time and/or space, and then amortize the cost of updating them over multiple frames.

Which we already do with TAA and TAA upsampling. Or VRS with TAA if you want, with a vis buffer, e.g. http://filmicworlds.com/blog/software-vrs-with-visibility-buffer-rendering/. Which is not to say you can't do that with texture space shading, just that 99% of the useful stuff here either is perhaps not as novel as it might seem at first glance, or isn't really a win. E.g. temporal re-use probably isn't a win at all for the common worst case, and oversampled shading is just brute-forcing what pre-filtering should be doing.

I can see its use for diffuse in VR though; that does seem like a win there, and maybe if "lightfield" 3D displays become popular as well. Coherent shading without re-shading where appropriate (diffuse only) where non-coherent views are concerned does seem like the real standout feature here, even if that's not yet a concern for most.
 
Which we already do with TAA and TAA upsampling. Or VRS with TAA if you want, with a vis buffer, e.g. http://filmicworlds.com/blog/software-vrs-with-visibility-buffer-rendering/. Which is not to say you can't do that with texture space shading, just that 99% of the useful stuff here either is perhaps not as novel as it might seem at first glance, or isn't really a win. E.g. temporal re-use probably isn't a win at all for the common worst case, and oversampled shading is just brute-forcing what pre-filtering should be doing.

I can see its use for diffuse in VR though; that does seem like a win there, and maybe if "lightfield" 3D displays become popular as well. Coherent shading without re-shading where appropriate (diffuse only) where non-coherent views are concerned does seem like the real standout feature here, even if that's not yet a concern for most.
I'm optimistic we can use caching for (some) specular as well. If the normal's angle to the viewer does not change much, it should be fine even if the stuff we see in reflections is moving.
If the angle is changing too much, we could search for a nearby fragment with a better matching angle.
Though, to get this right, we had better separate lighting and materials, keeping diffuse and specular irradiance buffers (compromising on roughness) to composite with the material, meaning we have to do material evaluation and composition for every pixel again, which is a downside.
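A rough sketch of that angle test, assuming we store the unit view direction each texel's specular was shaded with (names and the 3 degree tolerance are made up; the right value would be content dependent):

Code:
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// viewAtShadeTime / viewNow are unit vectors from the surface point to the camera.
bool specularCacheValid(const Vec3& viewAtShadeTime, const Vec3& viewNow, float maxAngleDeg = 3.0f)
{
    float cosAngle = dot(viewAtShadeTime, viewNow);            // both assumed normalized
    float cosLimit = std::cos(maxAngleDeg * 3.14159265f / 180.0f);
    return cosAngle >= cosLimit; // reuse only while the view direction barely moved
}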

TSS + caching has huge complexity and requires texture resolution to always be high enough, while SS TA alternatives such as this 'software VRS' have minimal complexity and just work. The latter really seems more attractive. At least for now.
On the other hand, TSS + caching has an advantage of data locality over SS TA approaches, which is rarely mentioned. It's most obvious using RT as an example:
If we update a stochastic set of surface patches to cache and reuse, we trace many rays from the same location with similar directions per patch, so rays (but also texture fetches etc.) are more coherent.
If we update a stochastic set of pixels in screenspace and use TA approaches like denoising, rays are scattered all over the place, and we get no nice big packets of coherent rays.
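To make the contrast concrete, a tiny sketch of the patch-wise case (all names invented): every ray in the batch shares the patch's origin region and a narrow cone of directions, which is exactly what keeps BVH traversal and texture fetches coherent, whereas stochastically chosen screenspace pixels would spread these rays all over the scene.

Code:
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin; Vec3 dir; };

struct SurfacePatch {
    Vec3 center;                  // representative position of the patch in world space
    std::vector<Vec3> sampleDirs; // precomputed cosine-weighted directions around the normal
};

// One coherent packet per patch: shared origin neighbourhood, similar directions.
std::vector<Ray> buildPatchPacket(const SurfacePatch& patch)
{
    std::vector<Ray> packet;
    packet.reserve(patch.sampleDirs.size());
    for (const Vec3& d : patch.sampleDirs)
        packet.push_back({ patch.center, d });
    return packet;
}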

Though, that's all theory. I expected solutions like 'software VRS' to come up first, making TSS + caching ideas less attractive. One more source of TA artifacts, but very practical.
 
On the other hand, TSS + caching has an advantage of data locality over SS TA approaches, which is rarely mentioned. It's most obvious using RT as an example:
If we update a stochastic set of surface patches to cache and reuse, we trace many rays from the same location with similar directions per patch, so rays (but also texture fetches etc.) are more coherent.
If we update a stochastic set of pixels in screenspace and use TA approaches like denoising, rays are scattered all over the place, and we get no nice big packets of coherent rays.

Though, that's all theory. I expected solutions like 'software VRS' to come up first, making TSS + caching ideas less attractive. One more source of TA artifacts, but very practical.

I mean, there's Epic's optimization of tiled screenspace neighbor raypackets for coherency and low noise. I assume everyone will be doing that actually. It sacrifices some minute detail since you're biasing so much. Maybe the screenspace trace could be separated and be per pixel? But then you're complicating the hybrid screenspace/RT trace that works so well for a mismatch in LOD between screenspace and RT geo, so... But anyway, that seems to be the ticket to get rid of noise, and it handles coherency as well. I think they said for the UE5 EA the standard setting is 1/16 SPP, and it looks better than Metro's 1 SPP until the occasional sample break; maybe they need to up the tile count a little to avoid those cases.
 
Intel Xe DG1 Benchmarked: Battle of the Weakling GPUs:
https://www.tomshardware.com/features/intel-xe-dg1-benchmarked

Ooooh, passively cooled and only 30 watt TDP? Hello, this might just be my new video card for my HTPC if the media playback abilities are good. Generally faster than the DDR4-based 1030, but slower than, though sometimes close to, the GDDR5-based 1030. Good enough for retro games, casual games and Steam streaming to the TV in the living room (the HTPC the card would go in) if more graphics power is needed.

Too bad there is no 30 watt TDP Turing or Ampere based card to compare it to.

And the great thing, I might actually be able to find one for MSRP. :p

Then again, maybe I'll finally bite the bullet and just upgrade most of the internals to a newer CPU with integrated graphics. Speaking of which, it would be nice to see benches of this compared to integrated graphics on current CPUs (ones meant for build it yourself machines and not laptops) from Intel and AMD.

[edit] Ooops just noticed they include the 5700G and 4800U.

Regards,
SB
 
I am glad Intel is also catering to the low end; it sucks that NVidia and AMD don't make any "decent" sub-75 watt cards any more. I wish they did because I still game at 1080p and would like to build a passively cooled e-sports machine for the living room!
 
What's the point of a discrete GPU whose capabilities are matched or superseded by integrated solutions?

I'm not criticizing DG1 specifically -- that part has a specific goal of pipe-cleaning Intel's processes as they ramp up towards more capable discrete GPUs, and it seems to have largely succeeded.

But is this really a viable market segment? As a customer if I'm building an HTPC why would I prefer discrete 30W instead of an integrated APU?

The 75W fanless tier is more interesting since there's a notable gap vs integrated. But I don't see any eagerness from AMD or Nvidia to serve this tier.
 
There is the 1650 with most of Turing's feature set sans RT, but I guess you are talking about something more recent with RT enabled?

Intel's DG1, as mentioned, is a pipe cleaner and can also be used by video streaming companies in high-density racks.
 
What's the point of a discrete GPU whose capabilities are matched or superseded by integrated solutions?

AMD still makes CPUs without an integrated GPU. My 1600x and 3700x don't have integrated GPUs. :p And my Intel 2500k is still perfectly fine for HTPC tasks, but its integrated GPU is a bit of a joke.

There is the 1650 with most of Turing's feature set sans RT, but I guess you are talking about something more recent with RT enabled?

Intel's DG1, as mentioned, is a pipe cleaner and can also be used by video streaming companies in high-density racks.

It's faster sure, but it's also significantly more power hungry. And do they even make a passively cooled 1650? Interestingly the 1650 Ti consumes less power than the 1650, but it's still significantly more than the 1030 or this Intel card.

So, I'd be quite happy with the DG1 for my HTPC as it likely features more modern media acceleration than the 1030.

Regards,
SB
 
Yes, the 1650 is a 75 watt card, as per neckthroughs' posting. It's a fair bit faster than DG1 I'd say, and it comes (came?) also as a passively cooled model from Palit. I've got the predecessor, the 1050 Ti KalmX, in my HTPC, which is pretty OK still. It lacks AV1 hw decode, though, as does the 1650.

My main caveat with the DG1, though, is its EEPROM-less design.
 
65W fanless APU, e.g.:

Sentinel Fanless Z2 (quietpc.com)

is ramping up in performance now: Intel has become competitive for gaming on these things and AMD has thrown its final Vega dice. 4x Steam Deck graphics performance in a year's time?
Yeah APUs are great for a variety of interesting builds (HTPC, emulation machine, and now handhelds), no question. AMD gets their CPU:GPU recipes right, unlike Intel who pairs their highest end Iris GPUs only with their highest end CPUs which is brain-damaged IMO. But that's a separate discussion.
 