Discussion in 'Console Technology' started by Tkumpathenurpahl, Jan 19, 2018.
Oh I didn't know it was debunked.
True or not, it doesn't necessarily mean anything for the release of the PS5.
I'm a firm believer that Sony had plans in motion to launch the PS5 in 2019, especially if the X1X had tipped the balance for MS. It's been successful, and MS are in a healthy position, but not to the extent that Sony *has* to release a new console.
Now, much like Sony have done with their first party games IMO, they're going to let it bake a little longer and move on to their 2020 design.
Assuming there is any truth to Sony being heavily involved in Navi's development, a decent perf/watt 7nm GPU could still be put to use in 2019 slimmer revisions of the PS4/Pro. The PS4 should be able to pull a PS2 and continue to sell a shed load of units even into the next generation, so it's necessary for them to use a design that's cheap to manufacture for many years. I might be wrong, and this is on the basis of Navi having better perf/watt than Vega, and being AMD's budget GPU for some time, but I think Navi+Zen emulating the PS4 is a good fit.
They are early adopters - once these obvious inefficiencies are eliminated, performance will still be limited by hardware BVH search.
Sure, a 25-30% improvement is nice, but even with these limited applications of realtime raytracing, we probably need something like a 5x (500%) performance improvement to scale flawlessly with complex geometry and 4K resolution.
Volta exposes DirectX Raytracing tier 1 - could they be using compute units to emulate BVH acceleration? AFAIK any denoising is left for the developer to implement, it's not a part of the DXR API.
Thanks! But did they schedule different parts of the rendering pipeline - such as rasterization, raytracing, and compute - to run on separate cards? Or can they schedule different rays to run on different cards?
Also, stock UE4 does not support explicit multi-GPU yet, and SEED is an experimental rendering engine not used in any game.
You realize I'm not advocating for the removal of compute units from the hardware, right? And really, how would the addition of RT hardware prevent any of the developments you mention? It wouldn't. It actually increases the range of possibilities because it allows for things that would be too slow to do otherwise. I also find the term fixed-function when referring to RTX very misleading because it's not at all like T&L that can do the one algorithm that makes every single game look the same. Anti-aliasing, lighting simulation, collision detection, audio simulation and who knows what other uses it could have. And that's just RTX, we don't know the features/limitations of future/competing RT acceleration architectures.
Hardware design is all about trade-offs. You can maximize flexibility at the expense of speed, or you can be balanced and have some of both. I mean, if flexibility is all that matters, let's just get rid of rasterization and devote all the silicon to compute units...
It's also limited to non-deformable meshes and procedural geometry.
It'll be sad seeing games devoting resources to native 4K rendering
There are thousands of ways to create a Bounding Volume Hierarchy, and for each, another thousand ways to optimise its traversal. Even more so when you can custom-make it for the specific scope and characteristics of your game, instead of trying to create a silver bullet.
But the way DXR was envisioned, the BVH is created by the GPU driver and the dev knows nothing of it. This limits the people experimenting with that field to GPU engineers, when it could be an active field of research among the entire game development community.
A good BVH can also be used for a myriad of other things besides casting rays. From what I understand, the way DXR does it also keeps the BVH in its own little private space, and it can't be queried or interacted with in any way other than by casting rays into it. In that case a game might have two simultaneous BVHs operating: one created by the dev for his own purposes, the other created by the GPU driver.
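To make the "more than ray casts" point concrete, here's a toy dev-side BVH (a hypothetical illustration - the names and the median-split scheme are my own, and real engines use far more sophisticated builders) answering a box-overlap query of the kind a physics broadphase needs, which a driver-built DXR BVH can't be asked to do:

```python
# Toy median-split BVH over axis-aligned bounding boxes (illustrative only;
# production builders use SAH, wide nodes, refitting, etc.)
class Node:
    def __init__(self, lo, hi, left=None, right=None, prims=None):
        self.lo, self.hi = lo, hi            # node AABB (min/max corners)
        self.left, self.right = left, right
        self.prims = prims                   # leaf: list of primitive indices

def union(a, b):
    return ([min(x, y) for x, y in zip(a[0], b[0])],
            [max(x, y) for x, y in zip(a[1], b[1])])

def build(boxes, ids=None, leaf_size=2):
    ids = list(range(len(boxes))) if ids is None else ids
    lo, hi = boxes[ids[0]]
    for i in ids[1:]:
        lo, hi = union((lo, hi), boxes[i])
    if len(ids) <= leaf_size:
        return Node(lo, hi, prims=ids)
    axis = max(range(3), key=lambda a: hi[a] - lo[a])   # split longest axis
    ids = sorted(ids, key=lambda i: boxes[i][0][axis])
    mid = len(ids) // 2
    return Node(lo, hi, build(boxes, ids[:mid], leaf_size),
                        build(boxes, ids[mid:], leaf_size))

def overlaps(lo1, hi1, lo2, hi2):
    return all(l1 <= h2 and l2 <= h1
               for l1, h1, l2, h2 in zip(lo1, hi1, lo2, hi2))

def query_box(node, lo, hi, boxes, out):
    """Box-overlap query a physics broadphase might need - not a ray cast."""
    if not overlaps(node.lo, node.hi, lo, hi):
        return
    if node.prims is not None:
        out.extend(i for i in node.prims if overlaps(*boxes[i], lo, hi))
    else:
        query_box(node.left, lo, hi, boxes, out)
        query_box(node.right, lo, hi, boxes, out)

# Usage: three unit boxes along x; query a region covering the first two.
boxes = [([0, 0, 0], [1, 1, 1]), ([2, 0, 0], [3, 1, 1]), ([4, 0, 0], [5, 1, 1])]
root = build(boxes)
hits = []
query_box(root, [0, 0, 0], [3, 1, 1], boxes, hits)
print(sorted(hits))  # -> [0, 1]
```

The same tree structure could just as easily serve frustum culling or nearest-neighbour queries, which is exactly the kind of reuse a driver-owned black box rules out.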
I hope no console's GPU hardware design has any silicon wasted on functionality created for that specific paradigm.
If AMD could have pulled an architecture out of their hat a couple of years ago (which is when the PS5/Scarlett designs started to solidify) that rivals Nvidia's and is also much more open and flexible, then that's great. I'm not holding my breath though.
For the moment.
Looks legit to me. The design and the renders are professional quality, I can't see someone spending so much effort on a fake.
Early developer presentations are always rough, with a few mistakes here and there.
Sony uses PlayStation® as a general reference to the family, but they also use ™ for specific models and logos, such as PS4™, PS4™ Pro, PlayStation™Vue etc. Probably too much hassle to register on multiple markets.
DLSS is not trademarked either.
These boxes are probably developer kits, not the final console design.
The mentioning of custom raytracing extensions (Radeon Rays) might imply that Navi has no hardware-accelerated raytracing...
Emulation is 32-bit only and its performance is lagging behind x86 APUs. x64 applications are not supported and need to be recompiled to ARM64.
They're not that hard to do for someone with experience. And people go to great lengths for fakes. We've even seen fake hardware mockups before.
As others point out, as a presentation the text is too small and verbose.
I don't see that limitation changing soon. Maybe for next gen, if conservative rasterisation and other modern architectural changes can make real-time voxelization faster, then we may be able to generate SDFs in real time for arbitrary meshes. Using SDFs for occlusion instead of naive binary occlusion models might be the missing link to make voxel-based GI less leaky.
Even then, that stuff is only efficient if you have a robust LOD chain, because you shouldn't be voxelising High Poly meshes. Thanks to the fact so many games are open world these days, or at least open-worldish, many games already have comprehensive LOD chains as well as a production pipeline for creating such LODs, which is also not a trivial problem, but one which Epic Games itself has been paying a lot of attention to for some years.
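The SDF-versus-binary-occlusion idea above can be sketched in a few lines. Everything here is made up for illustration (the sphere scene, constants, and function names are mine), loosely following Inigo Quilez's well-known soft-shadow trick: marching toward the light through a distance field yields a continuous occlusion factor, where a binary voxel test could only answer blocked or unblocked.

```python
import math

def sphere_sdf(p, center=(0.0, 1.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def soft_occlusion(origin, direction, sdf, k=8.0, t_max=10.0):
    """Sphere-trace toward the light; the closer the ray skims past
    geometry, the darker the result. 1.0 means fully unoccluded."""
    occ, t = 1.0, 0.02
    while t < t_max:
        d = sdf(tuple(o + t * dr for o, dr in zip(origin, direction)))
        if d < 1e-4:
            return 0.0                 # hard hit: fully occluded
        occ = min(occ, k * d / t)      # penumbra estimate from closest approach
        t += d                         # safe step: nothing is closer than d
    return occ

# A point directly below the sphere, looking up: fully shadowed.
print(soft_occlusion((0, -1, 0), (0, 1, 0), sphere_sdf))  # -> 0.0
# A point slightly off to the side: a partial (soft) shadow, not a binary one.
side = soft_occlusion((1.1, -1, 0), (0, 1, 0), sphere_sdf)
print(0.0 < side < 1.0)  # -> True
```

A binary occlusion volume can only return the first result or nothing; the continuous distance values are what buy you the soft falloff, and the same field can feed GI cone-tracing style lookups.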
We've no idea. The point is, anyone who's willing to believe raytracing will improve over time with better algorithms should acknowledge the same can happen with other algorithms and techniques, rather than looking at the current limitations and expecting them to always be there. To afford RT the benefit of the doubt but not other techniques is simply discrimination.
The video looks fine on 30" QHD / 32" 4K monitors, where it is probably supposed to be viewed.
The stated specs would line up with my own expectations - mid-range APU, 11 TFLOPs, GDDR6, no dedicated raytracing hardware (though they offer a native version of OpenCL 'Radeon Rays')...
There is also a detailed product lineup complete with peripherals and hardware/software specs. If it's fake, it's a very elaborate one, created by professionals.
Just use compute for special cases. Like I said, it's not one or the other.
I'll believe it when I see it, even in scene demos or research papers.
Beliefs based on current research trends VS beliefs based on wishful thinking.
Hogwash. It's based on the past 20 years' precedent of how graphics tech has advanced. Do you genuinely believe that going forward, all rendering technology is going to stagnate on what we have now? That if RT wasn't introduced, we'd be looking at no algorithmic advances at all?
All rendering tech is going to advance. Given raytracing hardware, devs will find ways to use it in novel ways to get better results. Given more general compute and ML options, devs will find new ways to use it. There's zero wishful thinking about it - it's a certainty based on knowledge of how humanity operates and progresses, and the fact we know we haven't reached our limits.
Actually the 3640 shader cores number may be correct. Let me quote:
"A super single instruction, multiple data (SIMD) computing structure and a method of executing instructions in the super-SIMD is disclosed. The super-SIMD structure is capable of executing more than one instruction from a single or multiple thread and includes a plurality of vector general purpose registers (VGPRs), a first arithmetic logic unit (ALU), the first ALU coupled to the plurality of VGPRs, a second ALU, the second ALU coupled to the plurality of VGPRs, and a destination cache (Do$) that is coupled via bypass and forwarding logic to the first ALU, the second ALU and receiving an output of the first ALU and the second ALU. The Do$ holds multiple instructions results to extend an operand by-pass network to save read and write transactions power. A compute unit (CU) and a small CU including a plurality of super-SIMDs are also disclosed."
We all know a CU is composed of 64 shader cores. But how many for a small CU?
If it is composed of 6 shader cores, then a GPU with 52 CU+52 SCU would give us 3640 shader cores.
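As a quick sanity check on that guess (the 6-cores-per-SCU figure is purely an assumption - the patent gives no counts), the arithmetic does land exactly on 3640:

```python
# All figures below are assumptions from the post, not confirmed specs:
# 64 shader cores per CU (standard GCN), a guessed 6 per "small CU",
# and 52 of each in the hypothetical GPU.
CORES_PER_CU = 64
CORES_PER_SCU = 6        # assumed, not stated in the patent
CU_COUNT = SCU_COUNT = 52

total_cores = CU_COUNT * CORES_PER_CU + SCU_COUNT * CORES_PER_SCU
print(total_cores)  # -> 3640
```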
That's a very simplistic way to wave off the issue.
If most of my scene is a perfect fit for the specific silver-bullet way Nvidia's driver decided to build the BVH, except for some parts that would be tremendously more efficient if done another way through compute, then sure, just use compute for special cases. You may still be eating up some redundancies depending on the situation, which in itself is a sorry inefficiency, but not the end of the world. Well, for rendering.
But say the game's physics engine can also benefit from a BVH. It doesn't rely on ray casts, and there is no easy way to translate whatever queries your physics engine needs into rays so it can use DXR for that. That means your physics engine will create its own BVH for the physics through compute, while Nvidia's black box is creating another one, and it's anyone's guess what that one looks like, and there is no way to reuse the work from one process in the other. That is a very sorry inefficiency.
And then there is the case where MOST of your scene would be a much better fit for your own compute BVH system, and you do implement it through compute. Nice, now you've got all that RT silicon sitting idle, giving you no extra performance, because it was designed to do one thing and one thing only. That's another very sorry inefficiency.
But most of all, the sorriest thing, and one which your idea of "just use compute for special cases" ignores completely, is that you lose the contribution of research and experimentation from thousands of game graphics programmers by throwing a black box into the problem and limiting all that R&D to GPU and API design teams. I understand some are hoping next gen consoles get some form of RT acceleration similar to Nvidia's so that we get a wide breadth of devs experimenting with it. But what I think you are ignoring is that we leave a whole other field of research opportunities unexplored by doing that. I think we lose more opportunities for software and hardware evolution by adopting Nvidia's paradigm than we gain.
What are you quoting?
Yep, that's why I said it would need to be recompiled. And I don't see that happening. I was just highlighting that they already have a native Windows 10 on ARM version.
But an ARM-based console with a good GPU would be interesting to see. The Switch shows that it's developer tools and engine compatibility that are important, whereas people think it's simply about x86/x64.
The Switch was based on an older version of the SoC even when it came out.
It’s from the YouTube video.
It's crazy what people will spend their time doing. To fool the net?
The days of "it looks too good to be fake" are looooong gone.
As you said, people even make physical mock-ups now.
It's incredibly difficult to obtain a registered trademark for TLAs like 'PS4'. 'Vue' is already a registered trademark, and you can't just annex one registered trademark (like PlayStation) with another registered trademarked word like 'Vue'. That's partly why trademarks, and common law trademarks in particular, i.e. ™, exist!