Next Generation Hardware Speculation with a Technical Spin [2018]

Xbox One X is 359mm^2
Xbox One is 363mm^2

Vega is 510mm^2

I wouldn't use Vega as a baseline for consoles; it's not the type of architecture that would be successful there.

Vega is a 14/16 nm part, so there's going to be a large size reduction in the transition to 7nm by itself. Per TSMC, 16FF to 7nm scales SoC logic down about 70% in area. Even if HPC libraries only scale down 50%, a Vega 64 would shrink to roughly 255mm^2, leaving about 100mm^2 of an Xbox-One-X-sized die for the CPU and non-memory I/O. Power scales down about 60% per TSMC, so the 50% figure may even be conservative, assuming power density is kept constant.

I suspect we can assume that the power delivery and cooling will be at least as good as last gen. Hopefully as good as the X.

For reference, Zeppelin die minus memory controller is 198mm^2 in 14nm. That would fit inside the leftover budget from above after a 50% shrink.

It’s for these reasons I’m assuming at least one of the next gen consoles will be 12TF or greater.
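A quick back-of-the-envelope in Python, using the numbers above (the die sizes and the 50% HPC shrink are the estimates from this post, not official figures), shows how the budget works out:

```python
# Back-of-the-envelope die budget using the figures quoted in this post.
XBOX_ONE_X_DIE_MM2 = 359.0   # 16nm Scorpio SoC, taken as the die-size budget
VEGA_10_DIE_MM2    = 510.0   # 14nm Vega 64 GPU
ZEPPELIN_CORE_MM2  = 198.0   # 14nm Zen die minus the memory controller

HPC_AREA_SCALE = 0.5         # conservative 50% shrink for HPC libraries at 7nm

vega_7nm = VEGA_10_DIE_MM2 * HPC_AREA_SCALE      # ~255 mm^2
zen_7nm  = ZEPPELIN_CORE_MM2 * HPC_AREA_SCALE    # ~99 mm^2
leftover = XBOX_ONE_X_DIE_MM2 - vega_7nm         # ~104 mm^2 for CPU + I/O

print(f"Vega-64-class GPU at 7nm   : ~{vega_7nm:.0f} mm^2")
print(f"Left over in an X-sized die: ~{leftover:.0f} mm^2")
print(f"Zen cores (ex-IMC) at 7nm  : ~{zen_7nm:.0f} mm^2 -> fits: {zen_7nm <= leftover}")
```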
 
Wouldn't it be possible to simulate voxelization with DXR if one used boxes instead of finely meshed geometry for the scene? Suboptimal, of course, but wouldn't it lead to a correct result/approximation?
I'm not entirely sure how VXGI works. As I understand it, the larger the space, the more voxels need to be created. You parse the entire space into voxels and perform your calculations from that.

I do not know if there are 'empty' voxels where there are no polygons to intersect against.

With a BVH, the structure only holds data where the triangles are. The scene is divided left to right, top to bottom, front to back, etc., until each 'space' holds a single triangle (or a small handful), and you perform your ray tracing against that.

By doing so you've rejected all the rays that never touch a polygon (which tends to be the biggest cost when trying to improve the efficiency of ray tracing; the next one is ensuring that the ray can trace back to the light source, which I think would need to be solved differently).
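Roughly, that traversal looks like the sketch below (a toy example, not how DXR or the hardware actually does it; `hit_triangle` is a hypothetical placeholder for a ray/triangle intersection test):

```python
# Toy BVH traversal: rays that miss a node's bounding box reject the whole
# subtree, so most rays never touch most of the scene's triangles.
import math

class Node:
    def __init__(self, bmin, bmax, children=None, triangles=None):
        self.bmin, self.bmax = bmin, bmax   # axis-aligned bounding box corners
        self.children = children or []      # inner node: child nodes
        self.triangles = triangles or []    # leaf: a small handful of triangles

def hits_box(origin, inv_dir, bmin, bmax):
    """Slab test: does the ray cross this axis-aligned box at all?"""
    tmin, tmax = 0.0, math.inf
    for axis in range(3):
        t1 = (bmin[axis] - origin[axis]) * inv_dir[axis]
        t2 = (bmax[axis] - origin[axis]) * inv_dir[axis]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def traverse(root, origin, direction, hit_triangle):
    """Visit only the subtrees whose boxes the ray actually crosses."""
    inv_dir = tuple(1.0 / d if d != 0.0 else math.inf for d in direction)
    stack, hits = [root], []
    while stack:
        node = stack.pop()
        if not hits_box(origin, inv_dir, node.bmin, node.bmax):
            continue                         # whole subtree rejected in one test
        if node.triangles:                   # leaf: test the few triangles here
            hits += [t for t in node.triangles if hit_triangle(origin, direction, t)]
        else:
            stack.extend(node.children)
    return hits
```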
 
Yeah, it will be interesting to see how compromised the RT solutions are if they're present on mid-level GPUs in next-gen consoles. Things like reflections can be rendered pretty blurry and still look great for the visual impact, like BF's flames, but low sampling of shadows could easily generate issues.

An interesting point is that SVOGI hasn't been more widely used because it's very demanding, but then so too is raytracing. ;) Just read about nVidia's VXGI, integrated into Unreal Engine. Here's a video of realtime GI in Unreal Engine on a GTX 970, which is apparently ~4 TF.

How well does it perform with a 2080-class GPU? If RTX hadn't been released, would we be looking at demos using volumetric tracing instead, and how different would they be in terms of quality and performance?
Multiple light sources, multiple shadows. Quality is very good, IMO.
 
Vega is a 14/16 nm part, so there's going to be a large size reduction in the transition to 7nm by itself. Per TSMC, 16FF to 7nm scales SoC logic down about 70% in area. Even if HPC libraries only scale down 50%, a Vega 64 would shrink to roughly 255mm^2, leaving about 100mm^2 of an Xbox-One-X-sized die for the CPU and non-memory I/O. Power scales down about 60% per TSMC, so the 50% figure may even be conservative, assuming power density is kept constant.

I suspect we can assume that the power delivery and cooling will be at least as good as last gen. Hopefully as good as the X.

For reference, Zeppelin die minus memory controller is 198mm^2 in 14nm. That would fit inside the leftover budget from above after a 50% shrink.

It’s for these reasons I’m assuming at least one of the next gen consoles will be 12TF or greater.
I'm not following you here; I don't see the appeal of shrinking a Vega 64 into a console form factor. There must be plenty of other architectures that would run with significantly better efficiency within the same silicon allocation.
 
I'm not following you here; I don't see the appeal of shrinking a Vega 64 into a console form factor. There must be plenty of other architectures that would run with significantly better efficiency within the same silicon allocation.
Of course, but Navi is a GCN iteration, so I'm assuming it can't be that different from Vega, just as I'm assuming Zen 2 won't be that different from Zen. AMD has proudly stated that they put Zen engineers on Navi to improve its power characteristics.
 
Of course, but Navi is a GCN iteration, so I’m assuming it can’t be that different from Vega. AMD has proudly stated that they put Zen engineers on Navi to improve power characteristics.
But it runs super hot and is very power hungry.
Vega also uses HBM; I'm not sure how that changes the silicon use inside the chip. Basically, I'm saying it's not a direct apples-to-apples comparison.
 
I expect the problems RT will face will be much the same as those of SVOGI or similar techniques, only likely worse. If you consider the memory access patterns of an SVO 'ray' compared to an RT ray, very rarely would the RT ray win out in terms of the amount of data it has to read or the computation it has to perform to generate a result.
This is the natural disadvantage of an algorithm like ray tracing, where you have to get a precise result.

In my mind voxelization / SVO / SDF / cone tracing etc. is fundamentally not that different from RT in how the data is represented or iterated; it's just a *much* simpler representation at the leaf nodes - you don't need to load and iterate lists of triangles or re-evaluate/read complex materials and lighting, you can often sample at higher mips to approximate cones that represent more diffused light, and so on.

Having said all that, I'm expecting to see interesting things from hybrid approaches.
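To illustrate the "sample higher mips to approximate a cone" point, here's a rough sketch, assuming a hypothetical `sample_voxel_mip(position, mip)` that fetches from a prefiltered voxel volume (real cone tracers weight and correct this far more carefully):

```python
# Rough voxel cone trace: step along the cone axis, and as the cone widens,
# sample coarser mips of the voxel volume instead of firing many rays.
# Radiance and occlusion are accumulated front to back.
import math

def cone_trace(origin, direction, cone_angle, max_dist, base_voxel_size,
               sample_voxel_mip):
    """sample_voxel_mip(position, mip) -> (radiance_rgb, opacity) is assumed
    to sample a prefiltered (mip-mapped) voxel volume."""
    radiance = [0.0, 0.0, 0.0]
    occlusion = 0.0
    dist = base_voxel_size                         # start one voxel out
    while dist < max_dist and occlusion < 0.95:
        cone_radius = dist * math.tan(cone_angle)  # cone widens with distance
        # pick the mip whose voxel size roughly matches the cone footprint
        mip = max(0.0, math.log2(max(cone_radius, base_voxel_size)
                                 / base_voxel_size))
        pos = [origin[i] + direction[i] * dist for i in range(3)]
        rgb, alpha = sample_voxel_mip(pos, mip)
        # front-to-back compositing
        weight = (1.0 - occlusion) * alpha
        radiance = [radiance[i] + weight * rgb[i] for i in range(3)]
        occlusion += weight
        dist += max(cone_radius, base_voxel_size)  # bigger steps as it widens
    return radiance, occlusion
```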
Well, we have already seen demos of actual games running with DXR effects enabled and the quality versus existing voxel techniques is so much better.

But then again, you can also raytrace voxels:


"Performance
Voxel raytracing performance depends largely on the length of the ray, the resolution and the desired quality (step length, mip map, etc), but it is generally really, really fast compared to polygon ray tracing. I don’t have any exact measures of how many rays per pixel I shoot, but for comparison I do all ambient occlusion, lighting, fog and reflections in this scene in about 9 ms, including denoising. The resolution is full HD and the scene contains about ten light sources, all with volumetric fog, soft shadows and no precomputed lighting. Timings taken on a GTX 1080."
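For a feel of why stepping a ray through a voxel grid is so cheap, the core of a grid walk in the Amanatides & Woo style boils down to something like this (a simplified sketch, not the author's actual implementation):

```python
# Sketch of a 3D DDA walk through a voxel grid (Amanatides & Woo style).
# Each iteration visits exactly one voxel, so a ray terminates as soon as it
# meets occupied data -- one reason voxel tracing can be so fast.
import math

def trace_voxels(origin, direction, occupied, voxel_size, max_steps=512):
    """occupied is a set (or dict) of occupied (x, y, z) voxel coordinates."""
    voxel = [int(math.floor(origin[i] / voxel_size)) for i in range(3)]
    step, t_max, t_delta = [0] * 3, [math.inf] * 3, [math.inf] * 3
    for i in range(3):
        if direction[i] > 0:
            step[i] = 1
            t_max[i] = ((voxel[i] + 1) * voxel_size - origin[i]) / direction[i]
            t_delta[i] = voxel_size / direction[i]
        elif direction[i] < 0:
            step[i] = -1
            t_max[i] = (voxel[i] * voxel_size - origin[i]) / direction[i]
            t_delta[i] = -voxel_size / direction[i]
    for _ in range(max_steps):
        if tuple(voxel) in occupied:
            return tuple(voxel)              # hit an occupied voxel
        axis = t_max.index(min(t_max))       # advance across the nearest boundary
        voxel[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return None                              # miss within the step budget
```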


Yeah, it will be interesting to see how compromised the RT solutions are if they're present on mid-level GPUs in next-gen consoles. Things like reflections can be rendered pretty blurry and still look great for the visual impact, like BF's flames, but low sampling of shadows could easily generate issues.

An interesting point is that SVOGI hasn't been more widely used because it's very demanding, but then so too is raytracing. ;) Just read about nVidia's VXGI, integrated into Unreal Engine. Here's a video of realtime GI in Unreal Engine on a GTX 970, which is apparently ~4 TF.


How well does it perform with a 2080-class GPU? If RTX hadn't been released, would we be looking at demos using volumetric tracing instead, and how different would they be in terms of quality and performance?

Edit: another more traditional one:

Edit: And a character:
A problem with voxel cone tracing is that the maximum resolution is fixed. If you want a lot of detail you also require massive amounts of memory. Ray tracing resolution is effectively infinite.
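Rough numbers make the scaling concrete; the bytes-per-voxel figure is just an assumption, and a sparse octree cuts this down a lot, but dense storage grows with the cube of the resolution:

```python
# Memory for a dense voxel volume grows with the cube of the resolution.
BYTES_PER_VOXEL = 8          # assumed: e.g. packed radiance + opacity

for res in (128, 256, 512, 1024):
    total = res ** 3 * BYTES_PER_VOXEL
    print(f"{res}^3 voxels: {total / 2**20:,.0f} MiB")
# 128^3  ->    16 MiB
# 256^3  ->   128 MiB
# 512^3  -> 1,024 MiB
# 1024^3 -> 8,192 MiB  (and mip levels add roughly another 14%)
```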
 
But it runs super hot and is very power hungry.
Vega also uses HBM; I'm not sure how that changes the silicon use inside the chip. Basically, I'm saying it's not a direct apples-to-apples comparison.

All AMD GPUs run hot. Nvidia has been much more efficient since they solved Fermi’s issues.
 
But it runs super hot and is very power hungry.
Vega also uses HBM; I'm not sure how that changes the silicon use inside the chip. Basically, I'm saying it's not a direct apples-to-apples comparison.

True, but Vega is a known quantity, so it's not a bad starting point when we have solid estimates of node improvements.

You're quite right, though, that it's not the best comparison. But if we base our assumptions on a 7nm Vega in the PS5/Scarlett, we can only be pleasantly surprised.

The talk of SVOGI and RT being somewhat similar is interesting. Are their requirements similar enough that a smaller proportion of tensor or RT cores than in the RTX 2080 could benefit SVOGI (and its peers) alongside rasterisation, while also granting developers the freedom to pursue a limited use of RT (reflections, for example)?
 
SVOGI doesn't need denoising, so in that regard you wouldn't need Tensor cores. We don't know to what degree they're needed in RT. Current demos denoise on compute, so potentially you could just do away with them in that regard. However, upscaling noisy raytraced data with reconstruction algorithms may be harder than upscaling rasterised data, which is where ML upscaling would be easier/better, I think.
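For context, those compute denoisers are essentially edge-aware blurs over the noisy raytraced buffer. A heavily simplified single pass might look like the NumPy stand-in below (real denoisers add multiple à-trous iterations and temporal reprojection):

```python
# Heavily simplified "denoise on compute": average neighbouring noisy samples,
# but refuse to blur across depth or normal edges.
import numpy as np

def denoise(color, depth, normal, radius=2, sigma_z=0.1, sigma_n=16.0):
    """color: HxWx3, depth: HxW, normal: HxWx3 (unit vectors)."""
    h, w, _ = color.shape
    out = np.zeros_like(color, dtype=float)
    weight_sum = np.zeros((h, w, 1))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted_c = np.roll(color, (dy, dx), axis=(0, 1))
            shifted_z = np.roll(depth, (dy, dx), axis=(0, 1))
            shifted_n = np.roll(normal, (dy, dx), axis=(0, 1))
            # down-weight neighbours with very different depth or normal
            w_z = np.exp(-np.abs(shifted_z - depth) / sigma_z)
            w_n = np.clip((shifted_n * normal).sum(-1), 0.0, 1.0) ** sigma_n
            wgt = (w_z * w_n)[..., None]
            out += wgt * shifted_c
            weight_sum += wgt
    return out / np.maximum(weight_sum, 1e-6)
```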
 
In the long run.

"In the long run" is relative. If someone (even NV themselves) offers a more flexible solution that doesn't require a fixed-function black box in the next year or two, then the long run was only 1-2 years.

If no one can offer anything but fixed function solutions for the next 5-10 years then that's an appreciable long run.

I'm highly doubtful that we'll be stuck with fixed function and relatively inflexible solutions to RT acceleration that long into the future, in which case putting fixed function limited solutions into a console would be a huge mistake.

Regards,
SB
 
SVOGI doesn't need denoising, so in that regard you wouldn't need Tensor cores. We don't know to what degree they're needed in RT. Current demos denoise on compute, so potentially you could just do away with them in that regard. However, upscaling noisy raytraced data with reconstruction algorithms may be harder than upscaling rasterised data, which is where ML upscaling would be easier/better, I think.
And even without denoising SVOGI looks blurrier.

"In the long run" is relative. If someone (even NV themselves) offers a more flexible solution that doesn't require a fixed-function black box in the next year or two, then the long run was only 1-2 years.

If no one can offer anything but fixed function solutions for the next 5-10 years then that's an appreciable long run.

I'm highly doubtful that we'll be stuck with fixed function and relatively inflexible solutions to RT acceleration that long into the future, in which case putting fixed function limited solutions into a console would be a huge mistake.

Regards,
SB
RTX is plenty flexible already. It's also faster than compute and speed is exactly what consoles need.
 
And even without denoising SVOGI looks blurrier.
You can ramp up the quality in SVOGI to get more details and you can ramp up the quality in RT to reduce noise.
RTX is plenty flexible already. It's also faster than compute and speed is exactly what consoles need.
Which is where raytracing may not be the best answer. Raytracing is a quality solution, not a speed solution. It's essential for production where quality is the most important thing, which is where RTX's main market is, but alternative methods designed for realtime may be more suitable for video games.

Looking at my above videos, they're running on 4TF GPUs. If that's the bare minimum needed, the current gen wouldn't be a platform to develop volumetric lighting as it's not fast enough, so there'd be little investment and research. Next gen there'll be enough compute power to drive volumetric lighting in games, so development should increase dramatically as there'll actually be an audience (paying market) for the tech.
 
I'm not entirely sure how VXGI works. As I understand it, the larger the space, the more voxels need to be created. You parse the entire space into voxels and perform your calculations from that.
There's a demo of an open-world environment in UE. They don't mention VXGI by name so I didn't link it. But we have other games with volumetric solutions, like Kingdom Come: Deliverance (CryEngine), that are also open world. I don't think there's a barrier there, especially factoring in that these are first/second-gen volumetric solutions which have room for improvement.
 
The solution for volumes in large scenes is the same one used for shadow maps: cascade them, or tile them.
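In the spirit of cascaded shadow maps, cascade selection for clipmapped voxel volumes could look roughly like this (illustrative numbers and names; each cascade keeps the same voxel count but covers twice the extent of the previous one):

```python
# Cascaded/clipmapped voxel volumes: every cascade has the same number of
# voxels but covers twice the world-space extent of the previous one, so
# voxel size (and GI detail) is finest near the camera.
def cascade_extents(base_extent_m, num_cascades):
    """World-space size covered by each cascade, centred on the camera."""
    return [base_extent_m * (2 ** i) for i in range(num_cascades)]

def pick_cascade(distance_to_camera, extents):
    """Choose the finest cascade that still contains the sample point."""
    for i, extent in enumerate(extents):
        if distance_to_camera <= extent * 0.5:   # inside this cascade's half-extent
            return i
    return len(extents) - 1                      # fall back to the coarsest one

extents = cascade_extents(base_extent_m=16.0, num_cascades=5)
# -> [16, 32, 64, 128, 256] metres; with 64^3 voxels per cascade that's
#    0.25 m voxels up close and 4 m voxels in the far cascade.
print(pick_cascade(3.0, extents), pick_cascade(100.0, extents))   # 0 and 4
```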
 
SVOGI doesn't need denoising, so in that regard you wouldn't need Tensor cores. We don't know to what degree they're needed in RT. Current demos denoise on compute, so potentially you could just do away with them in that regard. However, upscaling noisy raytraced data with reconstruction algorithms may be harder than upscaling rasterised data, which is where ML upscaling would be easier/better, I think.

If so, might RT cores without tensor be the best approach at this nascent stage of RTRT? For consoles, that is.

Even though they're fixed function, it seems their function could be utilised in a flexible manner. So their inclusion alone would be quite a sensible way for Sony/MS to hedge their bets, allowing RTRT hardware to refine and settle down for a few years in the PC space whilst still allowing developers to tinker with RTRT on console, should they choose to do so.

Also, is ML hardware like the tensor cores necessarily best for something that isn't learning, but is instead applying the best solution already determined by the ML process? For example, could ML hardware be deployed to determine the best way of denoising on compute? Or is it just inherent that ML data needs ML hardware, no matter which side of the ML problem-solving process that data sits on?
 
If so, might RT cores without tensor be the best approach at this nascent stage of RTRT? For consoles, that is.
What's the definition of 'raytracing core'?

Even though they're fixed function, it seems their function could be utilised in a flexible manner.
There are various functions that'd be worth accelerating in hardware, I'm sure.

For example, could ML hardware be deployed to determine the best way of denoising on compute?
No. Deriving algorithms needs true intelligence. ML uses neural nets to find inferred patterns. I think the Tensor cores are required to scan the images and compare them to learnt behaviour to find a match.
 
You can ramp up the quality in SVOGI to get more details and you can ramp up the quality in RT to reduce noise.
Which is where raytracing may not be the best answer. Raytracing is a quality solution, not a speed solution. It's essential for production where quality is the most important thing, which is where RTX's main market is, but alternative methods designed for realtime may be more suitable for video games.

Looking at my above videos, they're running on 4TF GPUs. If that's the bare minimum needed, the current gen wouldn't be a platform to develop volumetric lighting as it's not fast enough, so there'd be little investment and research. Next gen there'll be enough compute power to drive volumetric lighting in games, so development should increase dramatically as there'll actually be an audience (paying market) for the tech.
1) Ray tracing gives better results at the same cost.

2) Hardware-accelerated ray tracing is a quality and speed solution. It's also faster than compute.
 