Next Generation Hardware Speculation with a Technical Spin [2018]

We all know a CU is composed of 64 shader cores. But how many does a small CU have?
If it's composed of 6 shader cores, then a GPU with 52 CU + 52 SCU would give us 3640 shader cores.
The "full CU" in that patent application has four times as many "simple" ALUs as "full" ALUs (4 'full' vs 16 'simple'); they do not disclose the ratio for the "small CU".

If you consider divisor pairs of 3640, the ones where the per-block ALU count is divisible by 5 allow a clean 1:4 split into full and simple ALUs, which makes possible the following configurations of CU blocks × ALUs per block (full:simple):

104 × 35 (7:28), 52 × 70 (14:56), 26 × 140 (28:112), 13 × 280 (56:224),
91 × 40 (8:32), 182 × 20 (4:16), etc.
56 × 65 (13:52), 28 × 130 (26:104), 14 × 260 (52:208), etc.
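A quick brute-force sketch of my own (C++; nothing here comes from the patent application) that reproduces these factor pairs, assuming the 4-full : 16-simple ratio of the "full CU" carries over, i.e. each block's ALU count splits 1:4:

```cpp
#include <cstdio>

// My own brute-force check, not anything from the patent: enumerate every way
// to split 3640 shader cores into identical CU blocks whose ALU count divides
// cleanly 1:4 into "full" and "simple" ALUs (so the per-block count must be a
// multiple of 5).
int main() {
    const int total = 52 * 64 + 52 * 6;   // 52 CU x 64 + 52 SCU x 6 = 3640 shader cores
    for (int blocks = 1; blocks <= total; ++blocks) {
        if (total % blocks != 0) continue;        // not an exact division into blocks
        const int perBlock = total / blocks;
        if (perBlock % 5 != 0) continue;          // no clean 1:4 full:simple split
        const int full = perBlock / 5;
        const int simple = perBlock - full;
        std::printf("%4d blocks x %4d ALUs (%d full : %d simple)\n",
                    blocks, perBlock, full, simple);
    }
    return 0;
}
```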

I think we lose more opportunities for software and hardware evolution by adopting Nvidia's paradigm than we gain.
A Bounding Volume Hierarchy is essentially a tree of bounding boxes - how much "research and experimentation" do you need for a data structure that contains 2 memory pointers and 8 XYZ coordinates?

Yes, Nvidia's subdivision algorithm is proprietary and tied to their fixed-function hardware, but what would be the use of giving developers full control of the parameters, other than possibly stalling the BVH traversal hardware? Nvidia's been doing their homework on BVH (see for example these four research papers), and if you really need your own structure, there are 3rd-party OpenCL libraries.
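For what it's worth, a minimal node really is about that small. Here's a sketch of my own (not Nvidia's or anyone's actual layout; note the box only needs its two extreme corners in memory, not all 8 of them):

```cpp
#include <cstdint>

// A minimal axis-aligned bounding box: two corner points are enough to
// define all 8 corners of the box.
struct Aabb {
    float min[3];
    float max[3];
};

// A minimal binary BVH node: the box plus two links. Leaves reuse the link
// fields to reference a range of primitives instead of child nodes.
struct BvhNode {
    Aabb     bounds;
    uint32_t leftOrFirstPrim;   // left child index, or first primitive index if leaf
    uint32_t rightOrPrimCount;  // right child index, or primitive count if leaf
    bool     isLeaf;
};
```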
 
an ARM based console with a good gpu would be interesting to see
It would take at least a Snapdragon 8180 / SDM1000, and probably several more generations, until ARM chips could be considered for real game consoles (i.e. those that do not stick to 40-year-old platformer games in a handheld form factor).

The days of "it looks too good to be fake" are looooong gone.
As you said, people even make physical mock-ups now.
If it's fake, whoever made it should be immediately hired by Nintendo.
 
A Bounding Volume Hierarchy is essentially a tree of bounding boxes - how much "research and experimentation" do you need for a data structure that contains 2 memory pointers and 8 XYZ coordinates?

Yes, Nvidia's subdivision algorithm is proprietary and tied to their fixed-function hardware, but what would be the use of giving developers full control of the parameters, other than possibly stalling the BVH traversal hardware? Nvidia's been doing their homework on BVH (see for example

Every major computer algorithm is essentially very simple in concept. All that elegance and simplicity go to hell when it comes to implementing it with maximum performance, the way game engines do.
One can potentially build a BVH as a shallow tree with more branches per node, or as a very tall tree with few branches per node. If you have special information about the kinds of meshes you are using, you can use different strategies for different parts of the tree. Or one may find it better for certain use cases to have multiple different trees for different kinds of objects.
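To make the shallow-vs-tall trade-off concrete, here's an illustration of my own (not any shipping format) of a node whose branching factor is a compile-time choice:

```cpp
#include <cstdint>

// My own illustration of the shallow-vs-tall trade-off: one node type
// parameterized on branching factor. N = 2 gives a deep binary tree with
// cheap nodes; N = 8 gives a much shallower tree that does more box tests
// per node but visits far fewer nodes per ray.
template <int N>
struct WideBvhNode {
    float    boundsMin[N][3];   // one bounding box per child slot
    float    boundsMax[N][3];
    uint32_t child[N];          // child node index (or primitive index for leaves)
    uint8_t  childCount;        // how many of the N slots are in use
    bool     isLeaf;
};

using BinaryNode = WideBvhNode<2>;  // tall tree, few branches per node
using Wide8Node  = WideBvhNode<8>;  // shallow tree, many branches per node
```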
Especially when you remember games are not going to be above sacrificing quality for performance. Unlike in CG, they will experiment much more and be more aggressive with simplified tracing: tracing shorter rays, maybe falling back to cubemaps after a certain length, using different LODs for the actual rays, maybe using progressively lower LOD levels for every bounce, or as the ray gets longer, or both. Ignoring certain objects for certain rays, but not for others.
For example: the engine can use much simpler geometry for reflection rays off rough surfaces (blurry reflections), and that performance gain can be spent on more rays for that surface (which rough surfaces need more of to reduce noise). Is a generic BVH optimal for that? Are there other clever ways to organize your data that could favor such tricks, and that would ultimately improve quality and performance?
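Something like this is the kind of heuristic I mean (the function and the thresholds are entirely hypothetical, just to pin the idea down):

```cpp
#include <algorithm>

// Entirely hypothetical sketch of the heuristic described above: pick a
// coarser mesh LOD as the ray gets longer, on every successive bounce, and
// when the surface is rough enough that a blurry reflection hides the cruder
// geometry. LOD 0 is the full-detail mesh; higher numbers are coarser.
int selectRayLod(float rayLength, float surfaceRoughness, int bounceIndex,
                 int coarsestLod) {
    int lod = 0;
    lod += static_cast<int>(rayLength / 10.0f);  // made-up threshold: drop one level every ~10 units
    lod += bounceIndex;                          // one extra level per bounce
    if (surfaceRoughness > 0.5f) lod += 2;       // rough surface -> blurry reflection -> cruder geometry is fine
    return std::min(lod, coarsestLod);
}
```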
Also consider the different realities of different games. Some games are strictly 2.5D, others are top-down RTS, others have very little verticality (racing sims), others have a lot of empty space (space combat). There is no way different space-partitioning schemes can't be more efficient for these vastly different constraints.
And again, there is the question of maybe reusing the same BVH structure you use for ray tracing to accelerate other things, like collision detection (not ray-based), which I'm not sure can be done under Nvidia's scheme.
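The query a physics engine typically wants is "everything overlapping this box", not a ray cast. A sketch of my own of what that looks like against an ordinary compute-side BVH - exactly the kind of traversal you cannot run against a driver-internal structure:

```cpp
#include <cstdint>
#include <vector>

// My own sketch of a non-ray BVH query: collect every leaf primitive whose
// bounding box overlaps a query box (broad-phase collision detection).
struct Box { float min[3], max[3]; };

struct Node {
    Box     bounds;
    int32_t left  = -1;      // child indices, -1 if absent (leaf)
    int32_t right = -1;
    int32_t primitive = -1;  // primitive index if this node is a leaf
};

static bool overlaps(const Box& a, const Box& b) {
    for (int i = 0; i < 3; ++i)
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    return true;
}

void queryOverlaps(const std::vector<Node>& nodes, int nodeIdx,
                   const Box& query, std::vector<int32_t>& hits) {
    const Node& n = nodes[nodeIdx];
    if (!overlaps(n.bounds, query)) return;                       // prune this subtree
    if (n.left < 0 && n.right < 0) { hits.push_back(n.primitive); return; }
    if (n.left  >= 0) queryOverlaps(nodes, n.left,  query, hits);
    if (n.right >= 0) queryOverlaps(nodes, n.right, query, hits);
}
```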
I want to let devs experiment with that stuff. See what they find out. Nothing in actual game development is ever as simple as the text-book definition of the algorithm.
 
I can also foresee a world in which devs preemptively adjust their design decisions to fit the kind of workload that the specific way consoles end up constructing their BVH in their black boxes likes. They always do that sort of thing. They reverse-engineer how the black box works, at least to some extent, and then that one little GPU characteristic ends up shaping a bunch of other stuff all throughout the engine. It's not the most creativity- and innovation-empowering way of going forward. I prefer that consoles end up with architectures that allow more varied experimentation; it makes things evolve faster, and that benefits the whole industry and its consumers.
 
It feels like the entire discussion around ray tracing and BVHs should be moved into the Graphics > Rendering Technology and APIs forum, as I don't see any of it having anything to do with "Next Generation Hardware Speculation".
I sort of agree. I am trying to keep it within the bounds of why I think Nvidia's route to BVH acceleration could be negative for a next-gen console. But I also agree it is maybe more about ray tracing than about consoles.
 
Thanks. Can they be seen in die shots? Do you think it's just extra cache on the hardware side, and/or all the Zen chips being capable of SMT through firmware?

https://www.quora.com/Is-the-ability-for-CPU-hyper-threading-in-the-software-or-hardware

For Intel it seems there are actual physical additions to the CPU.

SMT is AMD's HT right?
SMT is the term for the general concept of running multiple threads through the same core by allowing different threads to take up different resources in parallel; vendors may or may not give marketing names to their individual implementations.

As far as special hardware, I am unsure what counts as special. At its most basic, the core needs to keep track of the context that belongs solely to each thread like the next instruction pointer and control settings, and it has to track instructions well enough that their data and results never mix with those of different threads. That can be a few context registers and a register tracking table that keeps two independent lists. Out of order hardware can often be adjusted to handle SMT with little increase in hardware, since out of order execution already keeps track of specific instructions so that they do not accidentally interact with the wrong data or results.
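As a purely conceptual sketch (not any real CPU's register layout), the state that has to exist once per thread is roughly this small, while everything expensive stays shared:

```cpp
#include <cstdint>

// Conceptual sketch only, not any real CPU's layout: the per-thread state an
// SMT core must keep separate is small - an instruction pointer, the
// architectural registers, and control/flags state. Execution units, caches
// and schedulers are shared; in-flight instructions just carry a thread tag.
struct ThreadContext {
    uint64_t instructionPointer;
    uint64_t archRegisters[16];  // architectural register file (size is illustrative)
    uint64_t controlAndFlags;
};

struct SmtCore {
    ThreadContext thread[2];     // 2-way SMT: two independent contexts
    // Shared resources (ALUs, caches, out-of-order machinery) are not
    // duplicated; a 1-bit thread tag on each in-flight instruction keeps
    // data and results from mixing between threads.
};
```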

In the most straightforward cases, not much is going to really stand out. Most units do not need to know what thread is using them, so long as their sources of data and outputs are directed appropriately. A small number of generic-looking registers here and there aren't going to stand out.
Performance enhancements like duplicating hardware per thread may be noticed by someone knowledgeable enough about the hardware to know where to look, although there can be other reasons why a given block appears to have more copies than expected.
 
It would take at least a Snapdragon 8180 / SDM1000, and probably several more generations, until ARM chips could be considered for real game consoles (i.e. those that do not stick to 40-year-old platformer games in a handheld form factor).
Given the generation, TDP, size, and clock of the ARM CPU in the Switch, and the fact that it's not just running 40-year-old platformers, a console-class part wouldn't be that far away.

The only company out of the current 3 that could realistically go that route for a standard console would be Nintendo, though. For MS and Sony it's more about BC and about who they would go with (AMD will probably give an overall better deal) than about outright performance being a problem for the newer, higher-TDP chips.
 
Hogwash. It's based on the past 20 years' precedent in how graphics tech has advanced. Do you genuinely believe that, going forward, all rendering technology is going to stagnate on what we have now? That if RT hadn't been introduced, we'd be looking at no algorithmic advances at all??

All rendering tech is going to advance. Given ray tracing hardware, devs will find novel ways to use it to get better results. Given more general compute and ML options, devs will find new ways to use those. There's zero wishful thinking about it - it's a certainty based on knowledge of how humanity operates and progresses, and the fact we know we haven't reached our limits.
You're moving the goalposts. Research on algorithms will continue, nobody is denying that. RTRT research is going strong and showing improvements year after year. Where's the research showing the elimination of SDF limitations in real-time graphics? Should we also assume that a rasterization algorithm that is just as good as path tracing (or better, because why not) will appear too, because "all rendering tech is going to advance"? And all of it within the span of next-gen's console cycle?

That's a very simplistic way to wave off the issue.
If most of my scene is a perfect fit for the specific silver-bullet way Nvidia's driver decided to build the BVH, except for some parts that would be tremendously more efficient done another way through compute, then sure, just use compute for the special cases. You may still be eating some redundancy depending on the situation, which in itself is a sorry inefficiency, but not the end of the world. Well, for rendering.
But say the game's physics engine can also benefit from a BVH. It doesn't rely on ray casts, and there is no easy way to translate whatever queries your physics engine needs into rays so it can use DXR for that. That means your physics engine will create its own BVH through compute, while Nvidia's black box is creating another one, and it's anyone's guess what that one looks like, and there is no way to reuse the work from one process for the other. That is a very sorry inefficiency.
And then there is the case where MOST of your scene would be a much better fit for your own compute BVH system, and you do implement it through compute. Nice, now you've got all that RT silicon sitting idle, giving you no extra performance, because it was designed to do one thing and one thing only. That's another very sorry inefficiency.
But most of all, the sorriest thing, and one which your idea of "just use compute for special cases" ignores completely, is that you lose the contribution of research and experimentation from thousands of game graphics programmers by throwing a black box into the problem and limiting all that R&D to GPU and API design teams. I understand some are hoping next-gen consoles get some form of RT acceleration similar to Nvidia's so that we get a wide breadth of devs experimenting with it. But what I think you are ignoring is that we leave a whole other field of research opportunities unexplored by doing that. I think we lose more opportunities for software and hardware evolution by adopting Nvidia's paradigm than we gain.
Why would a dev use a physics system that relies on a BVH but not ray casting when they have RT acceleration hardware at hand? Are there even any physics systems that work like that? That's like asking: "what if a game makes use of a rendering system based on quads or some primitive other than triangles? Rasterization hardware is obviously a waste of silicon!" Game tech is based mostly on the hardware it's intended to run on. But even considering such an edge case, should console hardware design be based on the general case or on hypothetical extreme outliers like the one you proposed?

You're also denying the possibility of changes to the DXR API in future versions. DX12 is very different from DX1. DXR 1.0 is not the end of the line, it's the beginning.
 
You're also denying the possibility of changes to the DXR API in future versions. DX12 is very different from DX1. DXR 1.0 is not the end of the line, it's the beginning.

I just don't think DXR 1.5- or 2.0-level hardware acceleration is very feasible for a 2020 console. I already think a DXR 1.0-level one is unlikely enough. And I'm claiming that's more of a blessing than a loss.
 
Soooo, does AMD's chiplet CPU design have any bearing on a future APU design?

Does it, in theory, allow for more easily scalable, more easily manufactured designs? At least, in an age of two-tier consoles.

Just "plug in" different quantities of higher- and lower-clocked CCXs, memory, and whatever configuration of GPU.
 
Soooo, does AMD's chiplet CPU design have any bearing on a future APU design?

Does it, in theory, allow for more easily scalable, more easily manufactured designs? At least, in an age of two-tier consoles.

Just "plug in" different quantities of higher- and lower-clocked CCXs, memory, and whatever configuration of GPU.

Yeah, I was thinking about the possibility that the base console comes as a single SoC, but if after 3 or so years they can fit enough hardware in a single chip to make a dramatic difference, they can just pile on chiplets like there's no tomorrow and charge a premium for it.
 
I'm pretty sure Nvidia's ray-tracing BVH works with non-triangle geometry; you just can't use the fixed-function triangle intersection, which offers the best performance. It seems to be more flexible than people are making it out to be. If consoles and the industry want to move to ray tracing, they'll have to start somewhere. Maybe next gen is too early, but you need to put something in the hands of devs, in a production environment, to see what works and what doesn't. It'll be a slow evolution, like programmable shaders, not a revolution where it just drops in a near-perfect state. The reality is the console space, with all of its hardware constraints, is the best place for rapid development of software algorithms. Ray tracing 2020.
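As an illustration of what a non-triangle primitive needs (a sketch of my own, not DXR or Nvidia code): once traversal reaches a procedural leaf, some custom intersection test has to run instead of the fixed-function triangle test - for example an analytic ray-vs-sphere test like this:

```cpp
#include <cmath>

// My own sketch, not DXR code: the kind of custom intersection test a
// non-triangle (procedural) primitive needs once BVH traversal reaches its
// leaf - here, an analytic ray-vs-sphere test. Returns the hit distance along
// the ray, or a negative value on a miss. Assumes a normalized ray direction.
float intersectSphere(const float rayOrigin[3], const float rayDir[3],
                      const float center[3], float radius) {
    const float oc[3] = { rayOrigin[0] - center[0],
                          rayOrigin[1] - center[1],
                          rayOrigin[2] - center[2] };
    const float b = oc[0] * rayDir[0] + oc[1] * rayDir[1] + oc[2] * rayDir[2];
    const float c = oc[0] * oc[0] + oc[1] * oc[1] + oc[2] * oc[2] - radius * radius;
    const float discriminant = b * b - c;
    if (discriminant < 0.0f) return -1.0f;      // the ray misses the sphere
    return -b - std::sqrt(discriminant);        // nearest root (can be negative if the origin is inside)
}
```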
 