IHV Business strategies and consumer choice

The 4090 is exactly what you want. It has as much FP32 compute performance as AMD's MI250X, or twice that of an MI210. UE5 doesn't use any kind of HW RT with software Lumen.
How much faster is the 4090 with its 90 TFLOPs over a 6900XT with 23 TFLOPs in UE5? About 60%.

Nearly 55 TFLOPs of compute performance, a whole MI210's worth, goes unused by UE5. Do you still think pure software solutions are the future?
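Just to put rough numbers on that claim, here is a back-of-the-envelope sketch using the rounded TFLOP figures quoted above; real scaling depends on far more than FP32 throughput, so treat it as illustration only:

```cpp
#include <cstdio>

// Back-of-the-envelope sketch using the rounded TFLOP figures quoted above.
// If performance scaled purely with FP32 throughput, a 90 TFLOP 4090 should
// be ~3.9x a 23 TFLOP 6900XT; the observed UE5 gap is ~1.6x.
int main() {
    const double tflops_4090   = 90.0;    // rounded figure from the post
    const double tflops_6900xt = 23.0;    // rounded figure from the post
    const double observed_speedup = 1.6;  // the "60% faster" claim above

    const double theoretical_speedup = tflops_4090 / tflops_6900xt;   // ~3.9x
    // TFLOPs the 4090 would need to explain the observed gap, all else equal.
    const double effective_tflops = tflops_6900xt * observed_speedup; // ~36.8
    const double unused_tflops    = tflops_4090 - effective_tflops;   // ~53.2

    std::printf("theoretical speedup : %.2fx\n", theoretical_speedup);
    std::printf("effective TFLOPs    : %.1f (unused: %.1f)\n",
                effective_tflops, unused_tflops);
}
```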
Why do you keep posting this? In no practical game scenarios are 90 TFLOPs usable by the 4090.
 
Not this. This was supposed to be a mature discussion about the marketplace. So far it's a fair bit of ranting and emotional language.

You're way too emotional.
Was it really supposed to be a mature discussion about the marketplace? This thread was created because nVidia fanboys were infesting the AMD thread and discussions arose about how nVidia and its blind followers have been messing up the market. I would know, because I was quoted in the opening post of this thread.

It's not a competition. If someone's wrong, you enlighten them and they are grateful.
LOL. This isn't heaven or Perfect Pony and Rainbow Land. No one here wants to get enlightened and there's no one here that is grateful. All they want is to be right and live in a green echo chamber.

MOD: User given a 7 day thread ban for this.
 
The power of software is that we can work out efficient algorithms to improve time complexity...
The power of software was greatly aided by the exponential pace of Moore's Law. With that ending, a movement toward less programmable but more efficient specialized hardware is inevitable.
 
I guess if you believe these are the grounds for this discussion, nothing productive can ever be achieved.

What sort of understanding do you expect you can reach with people you consider to be "infesting nVidia fanboys"?
The best solution I've found is to agree to disagree; I'm even trying to get more pleasant about it.

Neither side is entirely right; it's a subjective thing, not an objective one. Both sides are wrong too; it just depends on your opinions and needs. That's not to say that some people don't get overzealous in defense of their brand, but aside from that I really think there is no right answer, and so the flames will continue until we can all agree that both sides have some points to make.
 
The power of software was greatly aided by the exponential pace of Moore's Law. With that ending, a movement toward less programmable but more efficient specialized hardware is inevitable.
Why do you think we still need to buy even more performance, even at the cost of losing the flexibility we would need to use faster algorithms?
That's really the wrong way of looking at it, and it only leads to a dead end.
I'm happy I do not depend on fixed-function HW just to squeeze a few more triangles or samples out of it.
The games industry could do much better business if it finally realized it's time to cut loose from its dependency on HW progress. It acts as if it doesn't know there is another way.
 
Why do you think we still need to buy even more performance, even at the cost of losing the flexibility we would need to use faster algorithms?
Look at what you said: "faster algorithms". In the end, "faster" is the objective. (for this discussion let's just say that more performance == more visual fidelity, which really is CG's objective).

Flexibility has always been a useful means to that end. Flexibility allows for democratization and rapid iteration of algorithms. But flexibility has been *massively* aided by the exponential increase in underlying hardware capability. Otherwise, why do you need an RDNA3 or Ada? Nanite and Lumen SWRT should run performantly on a Fermi. Or a 486. It's Turing-complete, and therefore 100% flexible. But you can't; you need hardware performance too.

There has always been a tradeoff between performance and flexibility. Fixed-function hardware is always more efficient. HOWEVER, as long as Moore's Law was alive, the pace of improvement of general-purpose hardware was so breakneck that by the time you would design a fixed-function solution to a problem, the general hardware would have become 2x, 4x, 8x faster while enabling algorithms that the poor schmuck designing the fixed-function logic couldn't even have dreamed of when they started on their project. THAT was the power of Moore's Law.

That story is now dead. And don't for a moment think that I am happy. I don't like fixed-function hardware any more than you do. But if we're at a point where we have to make a hard choice between proven performance from specialized hardware and hypothetical performance from flexibility + marginal general-purpose perf improvements, I'll choose specialized hardware (grudgingly). Especially because the hardware specialization we are talking about isn't about making dumb application-specific chips. It's about recognizing fundamental idioms in algorithms that are stable and broadly useful. I think matrix acceleration is a great success story, and it is here to stay.

Whether today's RT engines are exactly the right idiom or not, time will tell. If the design needs to change, it will. If the APIs need to change, they will. But the overall approach of hardening common idioms into silicon, and more importantly, hw/sw codesign will only accelerate *unless* we get some major Si breakthrough.
 
Matrox in Parhelia, of course, but before that there were PN triangles in R200 and NURBS in NV20, iirc.
The question itself sounds totally silly and irrelevant. It wasn't a single company that came up with all these ideas or solutions)
Sure, and if we play with the definition of NURBS we could take it all the way back to NV1, for example. But whether the features materialized in games is undoubtedly significant.

Anyway, I was only interested in what DavidGrahams' answer would be.
 
Sure, and if we play with the definition of NURBS we could take it all the way back to NV1, for example
That's simple - if you want fully-featured tessellation with displacement maps, that's the Parhelia-512; if you want splines converted to triangles, that's the NV20 with a dedicated HW unit for tessellation to make the conversion happen; if you want something that materialized in games, that would be ATi's PN triangles, as they were dead simple to integrate (though with some quite hilarious "blobby" results here and there). Back then, everyone and their grandma had their own vision and roadmap for tessellation features, which is why I mentioned in my previous post that the question was silly)
 
There has always been a tradeoff between performance and flexibility. Fixed-function hardware is always more efficient. HOWEVER, as long as Moore's Law was alive, the pace of improvement of general-purpose hardware was so breakneck that by the time you would design a fixed-function solution to a problem, the general hardware would have become 2x, 4x, 8x faster while enabling algorithms that the poor schmuck designing the fixed-function logic couldn't even have dreamed of when they started on their project. THAT was the power of Moore's Law.
Hard to agree with, especially when looking at games' visual fidelity improvements, which were bought entirely by using more powerful HW. In recent decades I have rarely noticed any new algorithms in gfx (before RT, which required some progress). Besides Nanite, I could not list anything worth mentioning at the moment. And worse, all algorithms in gfx are brute force. No hierarchical processing, no fast multipole method, nothing clever. Things like trees are even considered 'slow' and mostly ignored in favor of batching over large brute-force workloads. I'm impressed by the low-level, close-to-the-metal optimization skills of gfx devs, but on top of that they seemingly lack any creativity at the high level. That's also why they love fixed-function hardware acceleration. It tells them what they can do and how, so they do not need to think any further, and after they've mastered maximizing throughput with it, they think they have achieved optimal performance.
Sorry for the prejudice and disrespect. I know there are many smart people in games, but the overall impression is that bad, and mainly boring. It feels like the invention of GPUs has caused a stagnation that still holds to the current day.
Realizing that Moore's Law is dead (MLID) but still continuing to rely entirely on faster HW to solve the problem (a problem that I personally cannot even see) basically confirms you're out of ideas, and so everything I've just said, kind of.
(I hope you don't mind the harsh tone and continue. I might learn something...)
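To make the 'brute force vs. hierarchical' point concrete, here is a deliberately generic toy sketch, not taken from any engine: the same pair query done the brute-force O(n^2) way and the sorted O(n log n) way. The entire win comes from the algorithm, not from more FLOPs.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Generic toy example (not from any engine): count point pairs closer than
// 'radius' on a line. The brute-force version is O(n^2); sorting first and
// sweeping a window is O(n log n). Same answer, far less work, no extra FLOPs.

long long bruteForcePairs(const std::vector<double>& p, double radius) {
    long long count = 0;
    for (size_t i = 0; i < p.size(); ++i)
        for (size_t j = i + 1; j < p.size(); ++j)
            if (std::abs(p[i] - p[j]) < radius) ++count;
    return count;
}

long long sortedSweepPairs(std::vector<double> p, double radius) {
    std::sort(p.begin(), p.end());
    long long count = 0;
    size_t lo = 0;
    for (size_t hi = 0; hi < p.size(); ++hi) {
        while (p[hi] - p[lo] >= radius) ++lo;      // shrink the window
        count += static_cast<long long>(hi - lo);  // all pairs (lo..hi-1, hi)
    }
    return count;
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> dist(0.0, 1000.0);
    std::vector<double> points(20000);
    for (double& x : points) x = dist(rng);

    // Both give the same answer; the second gets there with far less work.
    std::printf("brute force : %lld\n", bruteForcePairs(points, 1.0));
    std::printf("sorted sweep: %lld\n", sortedSweepPairs(points, 1.0));
}
```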

But maybe we're getting each other wrong. It's not that I naively think we could make everything faster with 'better algorithms'. In fact, efficient GI is the only idea I personally have. But that's something, as lighting sucks up most GPU performance.

I have two questions: What kind of fixed-function logic can you imagine that would achieve further improvements?
But mainly: why do you think we even need higher performance? I mean, we can already do whatever we want; isn't the sky the limit now?
If you said 'AI will enable new games', I would buy that. But why more of the same old things that have been exhausted for long enough already? What would it enable that we cannot do now?

But if we're at a point where we have to make a hard choice between proven performance from specialized hardware and hypothetical performance from flexibility + marginal general-purpose perf improvements, I'll choose specialized hardware (grudgingly).
I see I indeed got you wrong. Sorry again.
I agree with that. If we can HW-accelerate a building block such as tracing a ray, rasterizing a triangle, linear algebra, or even things like sorting or binning, I agree we should.

I'll let the rant above stand just for the record, to - umm... - add some more specific prejudices the games industry has to deal with nowadays. ; )
 
What happened to the rest of my list?
Sorry, but I have other stuff to do besides debunking random lists of inventions that fanboys attribute to their favorite companies.

But here are one-liners:

Who came up with the unified shader architecture? - ATi in consoles, NVIDIA on PC
Who came up with tessellation first? - NVIDIA
Who had DX12 support first? - NVIDIA (they were the first to offer DX12 drivers).
Who invented HBM? - a whole bunch of companies
Who invented Vulkan? - Khronos group
Who came up with async compute first? - AMD, but why don't you ask who introduced compute first? That's a rhetorical question.
Who had chiplets first? - Microsoft in the Xbox 360.
Who actually has a user interface that doesn't appear to be from the 2000s? - the best question
 
Was it really supposed to be a mature discussion about the marketplace? This thread was created because nVidia fanboys were infesting the AMD thread and discussions arose about how nVidia and its blind followers have been messing up the market. I would know, because I was quoted in the opening post of this thread.
No, it was created to provide a single place for IHV comparisons without them crossing into every platform thread.
LOL. This isn't heaven or Perfect Pony and Rainbow Land. No one here wants to get enlightened and there's no one here that is grateful. All they want is to be right and live in a green echo chamber.
7 day temp ban. Don't return if you aren't serious.
 
I've removed a good chunk of posts, including (as unfortunate collateral damage) some decent fact-checking efforts, as they derived from dumb shit-talking. Any conversation about bias or fanboys will be purged, including replies. If someone's being a jerk, ignore them. I'll get them sooner or later.

I'm enthusiastic enough to moderate this thread at the moment. Again, if it gets to be too much, I walk.
 
Hard to agree with, especially when looking at games' visual fidelity improvements, which were bought entirely by using more powerful HW. In recent decades I have rarely noticed any new algorithms in gfx (before RT, which required some progress). Besides Nanite, I could not list anything worth mentioning at the moment. And worse, all algorithms in gfx are brute force. No hierarchical processing, no fast multipole method, nothing clever. Things like trees are even considered 'slow' and mostly ignored in favor of batching over large brute-force workloads. I'm impressed by the low-level, close-to-the-metal optimization skills of gfx devs, but on top of that they seemingly lack any creativity at the high level. That's also why they love fixed-function hardware acceleration. It tells them what they can do and how, so they do not need to think any further, and after they've mastered maximizing throughput with it, they think they have achieved optimal performance.
Sorry for the prejudice and disrespect. I know there are many smart people in games, but the overall impression is that bad, and mainly boring. It feels like the invention of GPUs has caused a stagnation that still holds to the current day.
Realizing that Moore's Law is dead (MLID) but still continuing to rely entirely on faster HW to solve the problem (a problem that I personally cannot even see) basically confirms you're out of ideas, and so everything I've just said, kind of.
(I hope you don't mind the harsh tone and continue. I might learn something...)

But maybe we're getting each other wrong. It's not that I naively think we could make everything faster with 'better algorithms'. In fact, efficient GI is the only idea I personally have. But that's something, as lighting sucks up most GPU performance.

I have two questions: What kind of fixed-function logic can you imagine that would achieve further improvements?
But mainly: why do you think we even need higher performance? I mean, we can already do whatever we want; isn't the sky the limit now?
If you said 'AI will enable new games', I would buy that. But why more of the same old things that have been exhausted for long enough already? What would it enable that we cannot do now?


I see I indeed got you wrong. Sorry again.
I agree with that. If we can HW-accelerate a building block such as tracing a ray, rasterizing a triangle, linear algebra, or even things like sorting or binning, I agree we should.

I'll let the rant above stand just for the record, to - umm... - add some more specific prejudices the games industry has to deal with nowadays. ; )
TAA I would say is a very smart and efficient solution to aliasing that isn't reliant on more powerful hardware.
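For anyone who hasn't looked at it, the core of TAA is just an exponentially weighted blend between the current jittered sample and a history sample reprojected from the previous frame. A minimal CPU-side sketch of that accumulation step (real implementations also need motion vectors, neighborhood clamping and disocclusion handling, all omitted here):

```cpp
#include <cstdio>

struct Color { float r, g, b; };

// Minimal sketch of the TAA accumulation idea: the output is an exponential
// moving average of the current (jittered) sample and the history sample
// reprojected from the previous frame. Real TAA also needs motion vectors,
// neighborhood clamping and disocclusion rejection; all omitted here.
Color taaResolve(const Color& current, const Color& reprojectedHistory,
                 float blend = 0.1f) {
    return {
        blend * current.r + (1.0f - blend) * reprojectedHistory.r,
        blend * current.g + (1.0f - blend) * reprojectedHistory.g,
        blend * current.b + (1.0f - blend) * reprojectedHistory.b,
    };
}

int main() {
    Color history = {1.0f, 0.0f, 0.0f};  // converged history for this pixel
    Color current = {0.0f, 0.0f, 1.0f};  // new, aliased/noisy sample
    Color out = taaResolve(current, history);
    std::printf("resolved: %.2f %.2f %.2f\n", out.r, out.g, out.b);
}
```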
 
Kepler used an ILP approach for increasing compute performance. In the worst case you can lose 1/3 of the performance.
This hasn't been true since Maxwell. And you get the full 128 FP32 MADD operations per clock from an Ampere or Lovelace SM. nVidia has even carried the gaming Ampere architecture over to Hopper. The latest example is the new Cinebench R24 with GPU compute: https://www.computerbase.de/2023-09...ty-benchmark/#abschnitt_ergebnisse_einreichen

nVidia's dual FP32 pipeline works just fine. Game engine designers have access to nearly unlimited compute performance with Ampere, Lovelace and (maybe) RDNA3.
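For reference, the headline "90 TFLOPs" follows directly from that 128-wide FP32 SM claim: lanes x 2 ops per FMA x clock. A small sketch of the arithmetic; the SM count and lane width are the published 4090 numbers, the clock is an assumption, so treat the output as a ballpark (the 90 TFLOP figure quoted earlier implies clocks closer to 2.75 GHz):

```cpp
#include <cstdio>

// Where the headline FP32 number comes from: lanes * 2 ops per FMA * clock.
// SM count and lane width are the published 4090 numbers; the clock is an
// assumption (boost clocks vary per card), so treat the output as a ballpark.
int main() {
    const int    sms         = 128;   // SMs enabled on the 4090 (AD102)
    const int    fp32_per_sm = 128;   // dual FP32 pipes, 2 x 64 lanes per SM
    const double ops_per_fma = 2.0;   // multiply + add
    const double clock_ghz   = 2.52;  // advertised boost; ~2.75 GHz gives ~90

    const double tflops = sms * fp32_per_sm * ops_per_fma * clock_ghz / 1000.0;
    std::printf("theoretical FP32: %.1f TFLOPs\n", tflops);  // ~82.6 at 2.52 GHz
}
```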

I would look at it this way: your FPS should be high enough and fluid, right? So even if the GPU were faster at it, you would not really notice the improvement anyway?
Currently, maxing out a 4090 is surely no priority for the industry. It may take some years until it can show its muscle beyond PT.
Isn't this a problem? Not using the full potential of different architectures? I was interested in how much faster the 4090 is than the 2080 Ti in UE5 games:
Immortals of Aveum : 2.83x (TPU)
Remnant 2: 2.58x (TPU)
Fort Solis (HW Lumen): 2.82x (GameGPU)
Layers of Fear (HW Lumen): 2.46x (GameGPU)

And here a few examples from others games with more traditional rendering benched by TPU:
Armored Core 6: 2.4x
Atlas Fallen: 2.68x
Baldur's Gate 3: 2.67x
Ratchet w/o RT: 2.56x
Jedi Survivor: 2.59x
Atomic Heart: 2.58x
Hogwarts Legacy: 2.50x
Resident Evil 4: 2.85x

Despite UE5's heavier and more compute-focused workloads, the difference is basically the same. UE5 has the same performance limitation as a more traditional renderer.
This engine limitation creates a huge problem: instead of going from 50 FPS to 150 FPS, the same scaling in UE5 only takes you from 15 FPS to 45 FPS in 4K. Playing in 4K is not really possible on a 4090 with 90 TFLOPs anymore...

Focusing on more software solutions with more flexibility only works with a very optimized software stack. When 1.33x to 2x of the available compute performance goes unused, the software solution slows the processor down massively.

As a bonus, here are two numbers from Cyberpunk with path tracing, which is a very compute-focused workload:
Native 1080p: 4.43x
1440p with DLSS Performance (native 720p): 3.68x

These numbers are very close to the Cinebench R24 GPU benchmark.
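To check the "basically the same" claim with something better than eyeballing, here are the geometric means of the two lists above; the figures are the ones quoted in the post (TPU / GameGPU), not independently measured:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Geometric mean of the 4090-vs-2080Ti speedups listed above, comparing the
// UE5 group against the "traditional renderer" group. The figures are the
// ones quoted in the post (TPU / GameGPU), not independently measured.
double geomean(const std::vector<double>& x) {
    double logSum = 0.0;
    for (double v : x) logSum += std::log(v);
    return std::exp(logSum / static_cast<double>(x.size()));
}

int main() {
    const std::vector<double> ue5         = {2.83, 2.58, 2.82, 2.46};
    const std::vector<double> traditional = {2.40, 2.68, 2.67, 2.56,
                                             2.59, 2.58, 2.50, 2.85};

    std::printf("UE5 titles        : %.2fx\n", geomean(ue5));          // ~2.67x
    std::printf("traditional titles: %.2fx\n", geomean(traditional));  // ~2.60x
}
```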
 
Despite UE5's heavier and more compute-focused workloads, the difference is basically the same.

Compute doesn't necessarily mean flops will be the key factor in performance. Nanite's rasterizer for example makes heavy use of 64-bit atomics which could very well be a bottleneck. But you make a very good point. You would expect Nanite and Lumen to scale better than other engines on modern hardware. Really interesting that it's scaling about the same as "classic" rendering pipelines.
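For context on why those 64-bit atomics matter: a compute-shader rasterizer of that style can pack depth into the high 32 bits and a visibility payload into the low 32 bits, so a single atomic max does the depth test and the write together. A CPU-side sketch of that packing idea (an illustrative analogue, not Epic's actual shader code):

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Sketch of the packed-depth atomic trick a compute-shader rasterizer can use.
// Depth goes in the high 32 bits, the visibility payload in the low 32 bits,
// so one atomic max performs "keep the nearest sample" and the payload write
// in a single operation instead of a separate depth test + store.
struct VisBuffer {
    std::vector<std::atomic<uint64_t>> pixels;

    explicit VisBuffer(size_t n) : pixels(n) {
        for (auto& p : pixels) p.store(0, std::memory_order_relaxed);
    }

    void writeSample(size_t pixel, float depth, uint32_t payload) {
        uint32_t depthBits;
        std::memcpy(&depthBits, &depth, sizeof(depthBits));  // bit-cast the float
        const uint64_t packed = (uint64_t(depthBits) << 32) | payload;

        // Atomic "max": keep the sample with the larger packed value. With
        // reversed-Z (bigger depth == closer) and non-negative depths, the
        // float's bit pattern orders correctly as an unsigned integer.
        uint64_t prev = pixels[pixel].load(std::memory_order_relaxed);
        while (packed > prev &&
               !pixels[pixel].compare_exchange_weak(prev, packed,
                                                    std::memory_order_relaxed)) {
            // prev is reloaded by compare_exchange_weak; retry until we win or lose
        }
    }
};

int main() {
    VisBuffer vb(1);
    vb.writeSample(0, 0.25f, /*payload id*/ 7);
    vb.writeSample(0, 0.75f, /*payload id*/ 9);  // closer under reversed-Z, wins

    const uint64_t v = vb.pixels[0].load();
    std::printf("winning payload: %u\n", static_cast<uint32_t>(v & 0xffffffffu));
}
```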

More Cinebench 24 results

Cinebench 24 seems to be using hardware RT on Nvidia cards but not on AMD, so it's not a fair comparison. Looking at those Redshift numbers, the 4090 is 77% faster than the 7900 XTX in raw compute and 124% faster with hardware RT enabled. In Cinebench it's 130% faster, which matches the Redshift result with hardware RT.
 
Compute doesn't necessarily mean flops will be the key factor in performance. Nanite's rasterizer for example makes heavy use of 64-bit atomics which could very well be a bottleneck. But you make a very good point. You would expect Nanite and Lumen to scale better than other engines on modern hardware. Really interesting that it's scaling about the same as "classic" rendering pipelines.
...

Isn't that because LOD management generally means that hardware rasterization is not the bottleneck? It'll never be an apples-to-apples comparison, because you would never have the same geometry "resolution" or shadow resolution with a traditional hardware rasterizer and LOD management vs UE5 nanite with VSMs. Nanite and VSMs are only more efficient in the case where polygons are small enough that performance would not be ideal for the fixed hardware. Make the polygons bigger and the fixed hardware is more efficient again, which is why nanite has a fallback for larger polygons. Nanite selects sw vs hw rasterization on a cluster level, and the sw rasterizer is faster for any cluster where the cluster's longest edge is less than 32 pixels long.

Long way of saying nanite probably does scale better for one use case, and hw rasterizer scales better for another use case.
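A toy version of that per-cluster decision, just to pin down the heuristic: the 32-pixel threshold is the figure quoted above, while the struct and field names are made up for illustration and the real selection logic has more inputs than this.

```cpp
#include <cstdio>

// Toy per-cluster rasterizer selection, following the heuristic described
// above: clusters with short projected edges (small triangles) go to the
// software rasterizer, big ones go to fixed-function hardware. The 32-pixel
// threshold is the figure quoted in the post; names here are illustrative.
enum class RasterPath { Software, Hardware };

struct ClusterScreenInfo {
    float longestEdgePixels;  // longest screen-space edge of the cluster
};

RasterPath chooseRasterizer(const ClusterScreenInfo& c) {
    constexpr float kSwRasterEdgeThreshold = 32.0f;
    return (c.longestEdgePixels < kSwRasterEdgeThreshold) ? RasterPath::Software
                                                          : RasterPath::Hardware;
}

int main() {
    const ClusterScreenInfo tiny{12.0f}, large{96.0f};
    std::printf("tiny cluster : %s\n",
                chooseRasterizer(tiny) == RasterPath::Software ? "SW raster" : "HW raster");
    std::printf("large cluster: %s\n",
                chooseRasterizer(large) == RasterPath::Software ? "SW raster" : "HW raster");
}
```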
 
Isn't that because LOD management generally means that hardware rasterization is not the bottleneck? It'll never be an apples-to-apples comparison, because you would never have the same geometry "resolution" or shadow resolution with a traditional hardware rasterizer and LOD management vs UE5 nanite with VSMs. Nanite and VSMs are only more efficient in the case where polygons are small enough that performance would not be ideal for the fixed hardware. Make the polygons bigger and the fixed hardware is more efficient again, which is why nanite has a fallback for larger polygons. Nanite selects sw vs hw rasterization on a cluster level, and the sw rasterizer is faster for any cluster where the cluster's longest edge is less than 32 pixels long.

Long way of saying nanite probably does scale better for one use case, and hw rasterizer scales better for another use case.
One of the bigger problems with the graphics pipeline with respect to micro-triangle geometry is that GPUs shade 2x2 quads to compute the LoDs for texture sampling. If a triangle only covers a single pixel (one active lane) within a 2x2 quad, the hardware still has to shade the 3 uncovered pixels (helper lanes) if the user wants correct LoDs ...
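To quantify that: with 2x2 quad shading, a triangle covering a single pixel still launches four lanes, so shading efficiency for micro-triangles drops toward 25%. Trivial bookkeeping, but it shows where the cost comes from (pure arithmetic, not tied to any particular GPU):

```cpp
#include <cstdio>

// 2x2 quad shading bookkeeping: every covered pixel drags its whole quad into
// the wave, and the uncovered (helper) lanes still execute so that derivatives
// for texture LOD exist. For a one-pixel triangle that is 1 active lane out of
// the 4 that get launched.
int main() {
    const int lanesPerQuad  = 4;
    const int coveredPixels = 1;   // micro-triangle covering a single pixel

    const int helperLanes   = lanesPerQuad - coveredPixels;
    const double efficiency = 100.0 * coveredPixels / lanesPerQuad;

    std::printf("lanes launched: %d, helper lanes: %d, shading efficiency: %.0f%%\n",
                lanesPerQuad, helperLanes, efficiency);
}
```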
 
Isn't that because LOD management generally means that hardware rasterization is not the bottleneck? It'll never be an apples-to-apples comparison, because you would never have the same geometry "resolution" or shadow resolution with a traditional hardware rasterizer and LOD management vs UE5 nanite with VSMs. Nanite and VSMs are only more efficient in the case where polygons are small enough that performance would not be ideal for the fixed hardware. Make the polygons bigger and the fixed hardware is more efficient again, which is why nanite has a fallback for larger polygons. Nanite selects sw vs hw rasterization on a cluster level, and the sw rasterizer is faster for any cluster where the cluster's longest edge is less than 32 pixels long.

Long way of saying nanite probably does scale better for one use case, and hw rasterizer scales better for another use case.

Compute resources have increased faster than hardware rasterizer throughput. Therefore you would expect modern, more compute heavy games to scale better than games that still rely on fixed function hardware. Nanite should theoretically be more compute heavy and therefore make better use of the abundant flops available on modern gpus. Theoretically…
 
One of the bigger problems with the graphics pipeline with respect to micro-triangle geometry is that GPUs shade 2x2 quads to compute the LoDs for texture sampling. If a triangle only covers a single pixel (one active lane) within a 2x2 quad, the hardware still has to shade the 3 uncovered pixels (helper lanes) if the user wants correct LoDs ...
This raises two questions for me:
If we don't use texture LOD, can any or all modern GPUs shade one sub-pixel triangle per lane? I don't think so, but your formulation makes it sound like that.
Do helper lanes only help with derivatives, but are then marked inactive to bypass things like texture fetches? I've thought so, but again your use of 'shade' gives me doubts.
 