Next Generation Hardware Speculation with a Technical Spin [pre E3 2019]

Naughty Dog must know Sony's plans and hardware. It could just as well be that the guy was told to post a retraction because Sony aren't ready to talk about the hardware beyond what's been said. It's not really proof or disproof, but I'm inclined to believe that the initial response had more insight behind it than just the Wired article one-liner.

Not everyone in Naughty Dog may know everything. Information about critical, upcoming hardware releases is probably compartmentalised ... even there.


The first time he heard of the PS5 was today or yesterday.
Just read his reactions; he doesn't seem to have any information regarding ray tracing.

 
That strikes me as such a low bar that it renders (no pun intended) the idea of hardware acceleration pretty much meaningless. You can write an API for just about any kind of purpose that runs on just about any type of system that can pass and return variables and references.

There's always hardware somewhere down there. An API doesn't somehow make it specialised.

(And specialised hardware doesn't always give you the best results long term, in a rapidly changing environment!)

The problem is that RT is a massively parallel problem by its nature, so a massively parallel compute structure (read: a GPU) is already well suited to be a hardware solution to it. Is it not hardware RT because those compute units do other things well? Do they have to accelerate only RT to be considered dedicated? What about ImgTec’s solution that isn’t full RT, does it count? What if GPU architecture evolves to the point where a generic compute unit does RT better? What’s the magic threshold to call it a hardware solution?

We've already had this discussion earlier in this thread. As I linked here, even Microsoft defines DXR/RT as a purely compute workload that doesn't require specialized HW blocks. So the logic would be that HW RT is RT done on the GPU, while SW RT is RT done on the CPU (for example, Nvidia Iray running on the CPU via SSE2 instructions vs. running on CUDA GPUs, or Radeon ProRender and Cycles running on the GPU vs. the same code running on the CPU). Pretty straightforward... until people make up new conventions where, magically, HW RT only counts when you have RT Cores/specialized HW blocks for RT, so HW RT is now only "possible" on Turing GPUs with RT Cores, apparently. Anyway... water is wet... or not. Nobody knows anymore.
 
Mentioning "hardware RT" could simply be a marketing checkbox as the definition is technically correct.

Just like 8K is technically correct. He could have said the PS4 Pro has ray tracing, because it technically does.

It looks like PS5 is going to be a massive beast of a machine. Compared to the conservative OG PS4 that was deemed very weak compared to mainstream PCs at the time, it's an order of magnitude more powerful.

While we don't have any specs yet, Zen 2 and Navi will both be out for the PC market this summer. That would mean the PS5 contains over-a-year-old hardware if it's releasing in late 2020. Year-old high-end hardware is a nice feat, better than the PS4 managed; that is, if the Zen 2/Navi parts in the PS5 match high-end PC components.
It wouldn't surprise me if the desktop variants of Zen 2 and Navi are higher clocked, with more and faster memory. The successor to Navi is planned somewhere around 2020/2021, if I'm not mistaken.

Compared to the PS4 it will be a massive beast; compared to a 2020/2021 high-end PC, not so much. Time will tell, but we'll know soon how Zen 2/Navi perform, long before the PS5 releases.
 
We've already had this discussion earlier in this thread. As I linked here, even Microsoft defines DXR/RT as a purely compute workload that doesn't require specialized HW blocks. So the logic would be that HW RT is RT done on the GPU, while SW RT is RT done on the CPU (for example, Nvidia Iray running on the CPU via SSE2 instructions vs. running on CUDA GPUs, or Radeon ProRender and Cycles running on the GPU vs. the same code running on the CPU). Pretty straightforward... until people make up new conventions where, magically, HW RT only counts when you have RT Cores/specialized HW blocks for RT, so HW RT is now only "possible" on Turing GPUs with RT Cores, apparently. Anyway... water is wet... or not. Nobody knows anymore.
Glad I’m not in left field here.

Also don’t forget Intel’s Embree initiative.
 
Regarding RT, I see it a bit like checkerboarding.
Does the 1X have hardware support for it, i.e. specific hardware/customization? No, it doesn't.

Whether it needs that to do it is a different question.

In the end it will come down to performance, but even just tweaking the CUs could potentially make RT more performant compared to Nvidia's implementation on their 10xx cards.
One of the biggest problems is that AMD doesn't really have competitive cards for us to see how DXR would perform on them. Roll on Navi, fingers crossed.
 
Regarding RT, I see it a bit like checkerboarding.
Does the 1X have hardware support for it, i.e. specific hardware/customization? No, it doesn't.

Whether it needs that to do it is a different question.

In the end it will come down to performance, but even just tweaking the CUs could potentially make RT more performant compared to Nvidia's implementation on their 10xx cards.
One of the biggest problems is that AMD doesn't really have competitive cards for us to see how DXR would perform on them. Roll on Navi, fingers crossed.

Just to add some more info: AMD already developed, more than a year ago, real-time RT support in the Vulkan version of Radeon ProRender for viewport rendering, using a hybrid rasterization/RT solution. It's just that no one has implemented it yet (all Radeon ProRender integrations are still either OpenCL or Metal based right now), and RTG only has a fraction of the money Nvidia has to pimp their stuff to third parties (as Nvidia rightfully does, IMO).

Announcement: GPUOpen
Starting at slide 54 of their GDC18 presentation.
Edit: This is now called Full-Spectrum Rendering and is "coming soon" (next quarter) in the Radeon ProRender dev suite, powered by the latest Radeon Rays 3.0 released a month ago:

NEW Vulkan®-based Radeon Rays 3.0 available now
  • Radeon™ Rays 3.0 supports both AMD GPUs and CPUs as well as those of other vendors using Vulkan®
  • Features include GPU-accelerated Bounding Volume Hierarchy (BVH) and half-precision (FP16) computation support
  • Works across Windows® and Linux®
Edit: Before I forget, Radeon ProRender now officially has (since last week) an ML AI denoiser and an ML AI upscaler (using OpenCL/DirectML).
 
In the wake of the first officially sanctioned information on PlayStation 5 - or 'the next generation console' from Sony - Rich, John and Alex convene to discuss the specs. We've got Zen 2, 7nm, Navi graphics and solid-state storage technology - and back-compat too! Get all the reaction right here.
 
Indeed. However, anyone working on a PS5 game, or even coming across work on it in the office, is going to know about the pipeline difference. How can you not pick up on such things when some artist in the office is creating art without having to worry about all the crap you're having to worry about?!

No confirmation, but my gut tells me it was accidental. If there aren't hardware RT acceleration structures in PS5, I'll actually be surprised because of this tweet. ;)
Playing devil's advocate, maybe he didn't know and it doesn't have it, and that's why the big backtrack. This is getting crazy now lol; it's the best part of a next-gen console: trying to glean any information from rumours or whatever. The truth is usually very boring :D
 
What defines "hardware RT" though? The presence of fixed-function units, because Nvidia has gone that route? One could argue that RT acceleration on the GPU via compute is still hardware, as opposed to purely CPU. Performing RT on the GPU has been around for well over a decade and is still considered hardware acceleration of RT vs. the CPU.

Mentioning "hardware RT" could simply be a marketing checkbox as the definition is technically correct.
If you're talking about Cotter's tweet, the guy mentioned we have slow software raytracing at a time when we're already running RT on 1080s. If we didn't already have compute-based RT, we could argue that 'HW' raytracing could simply mean acceleration on the GPU. However, as we have RT on compute already, and it's not realistically fast enough to use in games, certainly not to a degree to get excited about, the next step for 'hardware acceleration' would be making that faster, so at least as fast as the RTX cards.

That doesn't necessitate fixed-function units RTX style, but something on the GPU added for the purposes of accelerating RT.

The Wired article never mentioned hardware RT, which is why people picked up on the tweet as confirmation.
 
The problem is that RT is a massively parallel problem by its nature, so a massively parallel compute structure (read: a GPU) is already well suited to be a hardware solution to it. Is it not hardware RT because those compute units do other things well?
If that's hardware RT, then we've had 'hardware RT' for... 10 years? PS4 has hardware raytracing! Nintendo Switch has hardware raytracing!!

The 'magic threshold' is a design consideration that solves the bottlenecks of existing RT on compute to make it significantly faster. In the case of Turing, the addition of the intersection-test units can double or better the raytracing performance, making that hardware acceleration of RT. Any change in the CUs that brings about an improvement at the hardware level, over and above what can be managed with existing compute design philosophies, would constitute RT hardware, even if it can be used for something else (a GPU is still a graphics processing unit despite really being a general-purpose processor). As the bottleneck appears to be memory (testing and traversing large memory structures), RT hardware acceleration probably needs to be focussed on solving that.
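To make that traversal bottleneck concrete, here's a minimal sketch of what compute-based raytracing does per ray: a stack-based walk over a BVH where every step is a dependent memory fetch followed by a cheap slab test. The structs and layout are purely illustrative (not any vendor's actual scheme); the point is that the ALU work is trivial next to the incoherent memory traffic, which is exactly the part dedicated traversal/intersection hardware targets.

Code:
// Illustrative only: a naive stack-based BVH traversal, as a compute shader
// would do it, written as plain C++. Node layout and names are hypothetical.
#include <cstdint>
#include <utility>
#include <vector>

struct Ray  { float o[3]; float d[3]; };
struct AABB { float lo[3]; float hi[3]; };
struct Node { AABB box; int32_t left, right, triCount; }; // leaf if triCount > 0

// Slab test: a handful of multiplies and compares - cheap ALU work.
static bool hitAABB(const Ray& r, const AABB& b, float tMax) {
    float t0 = 0.0f, t1 = tMax;
    for (int a = 0; a < 3; ++a) {
        float inv   = 1.0f / r.d[a];
        float tNear = (b.lo[a] - r.o[a]) * inv;
        float tFar  = (b.hi[a] - r.o[a]) * inv;
        if (tNear > tFar) std::swap(tNear, tFar);
        if (tNear > t0) t0 = tNear;
        if (tFar  < t1) t1 = tFar;
        if (t0 > t1) return false;
    }
    return true;
}

// The traversal loop: every pop is a dependent, hard-to-predict fetch from a
// large tree, and neighbouring rays quickly diverge into different branches.
int countHitLeaves(const std::vector<Node>& nodes, const Ray& r, float tMax) {
    int hits = 0;
    int stack[64];
    int sp = 0;
    stack[sp++] = 0; // root node index
    while (sp > 0) {
        const Node& n = nodes[stack[--sp]];
        if (!hitAABB(r, n.box, tMax)) continue;
        if (n.triCount > 0) { ++hits; /* ray-triangle tests would go here */ }
        else { stack[sp++] = n.left; stack[sp++] = n.right; }
    }
    return hits;
}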
 
The problem is that RT is a massively parallel problem by its nature, so a massively parallel compute structure (read: a GPU) is already well suited to be a hardware solution to it. Is it not hardware RT because those compute units do other things well? Do they have to accelerate only RT to be considered dedicated? What about ImgTec’s solution that isn’t full RT, does it count? What if GPU architecture evolves to the point where a generic compute unit does RT better? What’s the magic threshold to call it a hardware solution?
While RT can be said to be well suited for parallel processing (because rays are independent of each other), in practice it's the opposite because of execution divergence (which hurts the GPU) but mainly data divergence (which hurts both GPU and CPU).
To tackle those problems one wants to group similar rays and nearby geometry for traversal, and hit points by material for shading (preferably nearby hit points as well).
So it always ends up at binning/sorting to solve those issues. This requires interrupting the classic traversal / intersection / shading algorithm with binning multiple times.

To make this faster (or worth it at all), hardware acceleration would mean assisting this binning process (actually often a 3-step process: 1. Determine how many objects go into each bin. 2. Determine where each bin's list starts in memory (prefix sum). 3. Shuffle the objects so they are in cache-friendly order of their list index).
And second, as we have interruptions, we no longer want to dispatch these fine-grained workloads from the CPU; we want to do it from the GPU directly.

This would be the ideal kind of hardware acceleration to me, suited for much more than just raytracing. Traversal cores, in contrast, have only one application and prevent any flexibility in data structures, but that's surely another option.

What and when we can call any of this 'hardware RT' is subjective and likely more a matter of marketing decisions. Even just some instructions to speed up ray-triangle tests could be called HW RT.
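For anyone who wants to see the shape of that 3-step binning, here's a minimal single-threaded C++ sketch of it as a counting sort. It's illustrative only: on a GPU, step 1 would use atomics or per-wave reductions, step 2 a parallel scan, and step 3 a scatter, each as its own dispatch, which is exactly why GPU-driven dispatch matters here.

Code:
// Hypothetical sketch of the 3-step binning described above (a counting sort).
#include <cstddef>
#include <vector>

// binOf[i] = bin index of item i (e.g. rays grouped by direction, hits by material).
// Returns item indices reordered so each bin's items are contiguous in memory.
std::vector<int> binItems(const std::vector<int>& binOf, int numBins) {
    // Step 1: count how many items land in each bin.
    std::vector<int> count(numBins, 0);
    for (int b : binOf) ++count[b];

    // Step 2: exclusive prefix sum - where each bin's list starts in memory.
    std::vector<int> start(numBins, 0);
    for (int i = 1; i < numBins; ++i) start[i] = start[i - 1] + count[i - 1];

    // Step 3: scatter items into cache-friendly, bin-contiguous order.
    std::vector<int> sorted(binOf.size());
    std::vector<int> cursor = start;
    for (std::size_t i = 0; i < binOf.size(); ++i)
        sorted[cursor[binOf[i]]++] = static_cast<int>(i);

    return sorted;
}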
 
Would tailoring, say, 8 CUs to share LDS (maybe a larger one) give a big benefit to RT?
You wouldn't do it to all of them due to the increase in die size etc.
They could run standard workloads there also, but anything that is RT based (graphics, AI, audio) could be run on the specifically tailored CUs.

The spare async compute on the standard CUs could be filled with normal graphical async workloads.

So raw TF may be the same or even down, but actual performance up, depending on whether they went for a larger die or fewer CUs.
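As a reference point for the 'raw TF' side of that trade-off, the headline number is just arithmetic: CUs × 64 lanes per CU × 2 FLOPs per FMA × clock. A quick sketch below; the 36 CU / 0.911 GHz line is the real PS4 Pro configuration for calibration, while the other configurations are made up purely for illustration, not leaks.

Code:
// Theoretical FP32 throughput of a GCN/RDNA-style GPU.
#include <cstdio>

double theoreticalTflops(int cus, double clockGhz) {
    // 64 shader lanes per CU, 2 FLOPs per fused multiply-add.
    return cus * 64.0 * 2.0 * clockGhz / 1000.0;
}

int main() {
    std::printf("36 CU @ 0.911 GHz: %.2f TF (PS4 Pro)\n", theoreticalTflops(36, 0.911));
    // Hypothetical configurations purely for illustration:
    std::printf("40 CU @ 1.80 GHz: %.2f TF\n", theoreticalTflops(40, 1.80));
    std::printf("48 CU @ 1.60 GHz: %.2f TF\n", theoreticalTflops(48, 1.60));
    return 0;
}

So a narrower chip at higher clocks can match a wider one on paper, and neither figure says anything about how well RT-specific tweaks to the CUs would actually perform.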
 
DF agree with me then. This RT question is really about substance over form: whatever way they expect this to work, the mere fact that Cerny mentioned it at this mini reveal means he expects it to (1) work and (2) be used by developers. DF said it themselves: if I were a developer and heard this reveal, I'd be very annoyed if I weren't then able to get some sort of RT working relatively well on the PS5. How this is achieved, at what cost, and with how much ease is another question, and pretty much abstract at this point.
 
If that's hardware RT, then we've had 'hardware RT' for... 10 years? PS4 has hardware raytracing! Nintendo Switch has hardware raytracing!!

The 'magic threshold' is a design consideration that solves the bottlenecks of existing RT on compute to make it significantly faster. In the case of Turing, the addition of the intersection-test units can double or better the raytracing performance, making that hardware acceleration of RT. Any change in the CUs that brings about an improvement at the hardware level, over and above what can be managed with existing compute design philosophies, would constitute RT hardware, even if it can be used for something else (a GPU is still a graphics processing unit despite really being a general-purpose processor). As the bottleneck appears to be memory (testing and traversing large memory structures), RT hardware acceleration probably needs to be focussed on solving that.

Emphasis mine. So if Navi CUs are 1% faster at RT than Vega iso-clock, they’re hardware RT acceleration?
 
Emphasis in wrong place.
a design consideration to solve the bottlenecks of existing RT on compute

If the changes are introduced to circumvent a current RT bottleneck and accelerate RT by 1%, then yes, that'd be hardware acceleration. If those changes are just changes to compute that happen to result in 1% gains for RT, then no. Let's be realistic here though - no hardware design to improve RT performance on a GPU is only going to yield a 1% gain. Hardware considerations look for substantially better returns. Maybe 50% faster minimum in specific workloads, but I'd expect better, so like 5-10x faster on a particular type of workload. There's no quantitative threshold as you're asking for, but a qualitative one to do with purpose and a deliberate intention to accelerate raytracing performance in the way the hardware works.
 
Emphasis mine. So if Navi CUs are 1% faster at RT than Vega iso-clock, they’re hardware RT acceleration?

No because 1% is hardly an improvement lol? That's why he used the expression "magic threshold".

I wonder if we are going to see ray tracing used extensively, though. I bet we might see it in launch titles to highlight the differences from the previous generation, but as the generation evolves it might take a back seat, so resources can go to more complex graphics still relying heavily on rasterization. After all, we are still in the early stages of having ray tracing on consumer hardware, and even an RTX 2080 Ti has trouble keeping it working steadily at 4K...
 
No because 1% is hardly an improvement lol? That's why he used the expression "magic threshold".

I wonder if we are going to see ray tracing used extensively, though. I bet we might see it in launch titles to highlight the differences from the previous generation, but as the generation evolves it might take a back seat, so resources can go to more complex graphics still relying heavily on rasterization. After all, we are still in the early stages of having ray tracing on consumer hardware, and even an RTX 2080 Ti has trouble keeping it working steadily at 4K...

Thank you, this illustrates my point about how muddy the waters are.

Any change to CUs will be advertised as RT targeted regardless of the underlying reason because it will market well.

JoeJ lays out specific hardware strategies, which I think is the only critical way to look at it. It's either that, or you lay out a case for alternative ways of computing in an accelerated fashion, with structures to accelerate your approach.

Just a reminder about the awesomeness of Zen 2: a few-months-old benchmark of a test-sample Ryzen 3000 8C/16T [the same that will end up in the PS5] vs. a stock Intel i9 9900K.

I'm so happy we are getting a proper CPU and storage with gen9.

This chip has somewhere in the realm of a 65-95W TDP.

We should expect 35-45W if we're expecting a TDP similar to last gen's, which probably means clocks in the 2.8GHz to 3.2GHz range based on the 2700E's TDP and clocks.

I'm very interested in whether SMT is enabled and what restrictions it has, if any.
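For what it's worth, here's the back-of-envelope reasoning behind that clock range as a sketch, under the loose assumption that dynamic power scales roughly with frequency times voltage squared, taking the 2700E's 45W at a 2.8GHz base as the reference point. The voltages below are placeholders made up for illustration; the move to 7nm is exactly what would let the real part hold these clocks at lower voltage.

Code:
// Very rough CPU power scaling sketch: P ~ f * V^2, relative to a reference part.
// The reference is a Ryzen 7 2700E (45 W TDP, 2.8 GHz base); the voltages are
// assumptions for illustration, not leaked figures.
#include <cstdio>

double scaledWatts(double refWatts, double refGhz, double refVolts,
                   double newGhz, double newVolts) {
    return refWatts * (newGhz / refGhz) * (newVolts * newVolts) / (refVolts * refVolts);
}

int main() {
    const double refW = 45.0, refGhz = 2.8, refV = 1.00;

    // If 7nm allows ~3.2 GHz at a somewhat lower voltage:
    std::printf("3.2 GHz @ 0.90 V: ~%.0f W\n", scaledWatts(refW, refGhz, refV, 3.2, 0.90));
    // Or holds 2.8 GHz at a clearly lower voltage:
    std::printf("2.8 GHz @ 0.88 V: ~%.0f W\n", scaledWatts(refW, refGhz, refV, 2.8, 0.88));
    return 0;
}

Either way you land in roughly the 35-45W window assumed above; the real numbers depend entirely on where the 7nm voltage/frequency curve actually sits.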
 
Just a reminder about the awesomeness of Zen 2: a few-months-old benchmark of a test-sample Ryzen 3000 8C/16T [the same that will end up in the PS5] vs. a stock Intel i9 9900K.

Note that the console version will likely run at much lower clocks because the total TDP is limited and it frankly makes more sense to burn the power in the GPU than it does to burn it in the CPU.
 