Realtime AI content generation *spawn

Why was this conversation split off into a Gaming Technology thread when the discussion was precisely about whether it should or should not happen, regardless of technical ability?
It was started in DF Tech discussion where it should have been about tech. It moved towards AI, so it's talking about AI tech in an AI tech thread in a tech forum. There are other threads on whether AI should or shouldn't happen, e.g.:

 
If you think AI generated garbage is art then you don’t like art. It’s removing the human aspect completely.

The only people I see that think AI slop is art are tech industry bugmen.

This kind of language is really not necessary. Do you think it's ok if I say something like "If you think AI generated images are not art then you don't like technology"?
As @Shifty Geezer said, please keep emotions out of this thread.
 
AI is really just software with options. It all depends on what data the software is loaded with, and what data and images it has access to. Furthermore, how the AI uses that available data is controllable. So it comes down to the size of the framework: if it operates within a well-defined, relatively closed scope, it can work well and even help the work of graphic artists. A computer graphics example: the artist creates the basic lower-resolution texture, and a well-controlled AI improves it to higher resolution and more detail based on the exact instructions given by the artist. With this method the artist saves time and can create several textures in the same amount of time.

By feeding this data into the game engine, and taking into account the storage space and computing capacity of the current hardware, a more favorable result can be achieved thanks to real-time texture upscaling.
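To make that workflow concrete, here is a minimal PyTorch sketch of an artist-guided upscaling step. The RefineNet below is a hypothetical, untrained placeholder standing in for whatever trained super-resolution model would actually be used; it only illustrates the shape of the pipeline (artist-authored low-res texture in, refined high-res texture out), not any specific tool.

```python
# Minimal sketch of the artist-in-the-loop idea: bicubic upscale of the
# authored texture, then a small network refines the result. RefineNet is
# a hypothetical, untrained stand-in for a properly trained SR model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineNet(nn.Module):
    """Tiny residual refiner; a real pipeline would use a trained SR model."""
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # predict a residual on top of the upscale

def upscale_texture(lowres: torch.Tensor, scale: int, refiner: nn.Module) -> torch.Tensor:
    """lowres: (1, 3, H, W) in [0, 1]. Returns (1, 3, H*scale, W*scale)."""
    base = F.interpolate(lowres, scale_factor=scale, mode="bicubic", align_corners=False)
    with torch.no_grad():
        return refiner(base).clamp(0.0, 1.0)

if __name__ == "__main__":
    artist_texture = torch.rand(1, 3, 256, 256)   # stand-in for the authored texture
    hires = upscale_texture(artist_texture, scale=4, refiner=RefineNet())
    print(hires.shape)                            # torch.Size([1, 3, 1024, 1024])
```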

Clearly, the use of AI can have many advantages, the question is how it is used.
 
This kind of language is really not necessary. Do you think it's ok if I say something like "If you think AI generated images are not art then you don't like technology"?
As @Shifty Geezer said, please keep emotions out of this thread.
To put it less emotionally, art created by robots isn’t art at all. Art is an innately human endeavor. But like the mod said this is all off topic anyways and better for another thread.
 
I think there's a difference between true art and practical art. Package design involves art, but it's hardly a work of passion how Chocolate Digestives are represented. Trained, skilled artists will apply a bunch of rules and precedents to produce perfunctory content, including in games. For the purposes of creating visuals in a game, we're just talking about that generation process as 'art'. The philosophy of human endeavour is something else altogether.
 
I think there's a difference between true art and practical art. Package design involves art, but it's hardly a work of passion how Chocolate Digestives are represented. Trained, skilled artists will apply a bunch of rules and precedents to produce perfunctory content, including in games. For the purposes of creating visuals in a game, we're just talking about that generation process as 'art'. The philosophy of human endeavour is something else altogether.
You clearly haven't seen the outrage caused by changing classic packaging to something generic/modern/whatever :runaway:
 
And terrible results. ;)
Amazing results actually -like you said- considering the early effort and the non-specialty of the AI model used. It's an older, not particularly stable model called Runway Gen 3, most probably running on commercially available hardware. It's not Sora running on a supercluster of H100s or anything. It's also more successful with slow motion, with fewer artifacts.


The original sentiment in this recent discussion was what it would take to achieve photorealism. Glitches and issues that are tolerable in computer games miss that target. So you could have a future where games are approaching photorealism in 5 years, say, but aren't confused with TV because of all the artifacts and issues.
Even after achieving photo-realism, games will still be easily distinguishable from TV scenes, because photo-realism is not the same as physics-realism. The way things move and interact (animation, particles, destruction, weight, etc.) is an entirely different problem from the way things look. Sweeney's comment only refers to the way things look (photo-realism); physics-realism is an order of magnitude more difficult to solve.

So this whole (indistinguishable from TV) point doesn't really matter; we won't achieve it anyway until we solve physics-realism, which needs even more hardware power and new algorithms.

You say they have things like hair physics and cloth physics, but these don't move right
They are more convincing though, especially vs the current gaming tech where they don't move at all or move in all the wrong ways.

How much effort will it be to create a ground truth for one game, let alone a model that can handle any and every game?
Why would the model handle every game? It's enough in the beginning to handle things on a game-by-game basis, until it accumulates enough data. We don't need every game to be AI enhanced; a select few titles with advanced visuals are enough in the beginning, just like Path Tracing now.

1) Your examples aren't realtime. Creating blobby pseudovideo with odd issues takes a lot of time and processing power. You need a massive improvement just to get these results in realtime.
I honestly think these videos don't take that many hours to render; the channels on YouTube are making dozens every week, and they are most probably running on consumer-level hardware as well, so using specialized hardware would make things faster for sure.

I already gave an estimate on how things will progress regarding that point, maybe in 4 to 6 years GPUs will have more TOPs than FLOPS, and thus will be powerful enough to do 720p30, and upscale it using more advanced machine learning algorithms to the desired resolution and frame rate.

See here: an AI model running on a single H100 rendered a Minecraft clone entirely using AI at 720p20 in realtime.

Oasis is the first playable, real-time, open-world AI model that takes users' input and generates real-time gameplay, including physics, game rules, and graphics.


Google also ran Doom at 20 fps on a single TPU in realtime.


So things are advancing faster than we anticipate. In a single year, models are often updated several times with new capabilities and faster processing, and AI hardware is also released every 2 years now, often with 2x to 3x more hardware power, unlike traditional rendering, where hardware speed and software advancements seem to have slowed down significantly. AI is advancing much, much faster than traditional software.
This wonderful photorealistic future is by no means a certainty.
We won't get there immediately of course, we will have to go through baby steps first, explained below.

I think there are some areas ML would be very effective. My instinct is you could ML lighting on top of conventional geometry. You might also be able to ML enhance a game so lower rendering quality can be enhanced to Ultra quality, and you could train an ML on a better-than-realtime model to enhance the base model
I agree, but my premise is that we will start with simpler things first, like a post-process technique to enhance hair, fur, fire, water, particles, clothes and other small screen elements, etc., coupled with motion vectors and other game engine data; the results should look great and move in a convincing manner.
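As a rough illustration of the kind of post-process pass described above, here is a minimal PyTorch sketch: a small network (hypothetical and untrained here, with made-up names like EnhanceNet) enhances masked screen elements, using engine-provided motion vectors to reproject the previous enhanced frame for temporal stability. It is a sketch of the idea under those assumptions, not any shipping technique.

```python
# Sketch: enhance selected screen elements (e.g. hair) as a post-process,
# feeding the network the current frame, a reprojected history frame
# (warped with engine motion vectors), and a mask of the target elements.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhanceNet(nn.Module):
    """Placeholder enhancer: current frame + reprojected history + element mask."""
    def __init__(self, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 3 + 1, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 3, 3, padding=1),
        )

    def forward(self, frame, history, mask):
        # Only touch the masked elements; leave the rest of the frame as rendered.
        return frame + mask * self.net(torch.cat([frame, history, mask], dim=1))

def reproject(prev_frame, motion_vectors):
    """Warp the previous frame using motion vectors given as NDC offsets (N, 2, H, W)."""
    n, _, h, w = prev_frame.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    grid = base + motion_vectors.permute(0, 2, 3, 1)      # (N, H, W, 2)
    return F.grid_sample(prev_frame, grid, align_corners=False)

if __name__ == "__main__":
    frame, prev = torch.rand(1, 3, 270, 480), torch.rand(1, 3, 270, 480)
    mvecs = torch.zeros(1, 2, 270, 480)                    # engine-supplied motion vectors
    hair_mask = torch.zeros(1, 1, 270, 480); hair_mask[..., 100:170, 200:280] = 1.0
    out = EnhanceNet()(frame, reproject(prev, mvecs), hair_mask)
    print(out.shape)                                       # torch.Size([1, 3, 270, 480])
```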

Enhancing textures is also a strong possibility; we already have a prototype for replacing textures in a game editor (RTX Remix) in real time, and it could go further with more development. Enhancing lighting is another strong candidate in the beginning, just not the very beginning; maybe a little down the road.

And just like that, AI will be enhancing rendering one thing at a time, bit by bit and piece by piece, until it's all enhanced in the end.
 
Amazing results actually -like you said- considering the early effort
You are only looking at it from one perspective. Yes, the results are amazing in terms of what's accomplished. But they are also pretty terrible in terms of realism and authenticity, in just preserving a realistic world. One can argue that's because of early models, etc., but what's wrong should be acknowledged as well as what's right. Only startups chasing clueless investors should be pushing an "everything's great, just think of the possibilities!!!!" line; an objective look at the tech should consider all the elements.

A particular issue with these generative AI solutions is generating incorrect information. Everything from sticks that materialise on the ground, to blobs on the wall, to reflections that just manifest and disappear, to arms fusing together. These are problems that need to be acknowledged if they are then to be (attempted to be) solved.

See here: an AI model running on a single H100 rendered a Minecraft clone entirely using AI at 720p20 in realtime.


Google also ran Doom at 20 fps on a single TPU in realtime.

Is there a reason these examples use really simple visuals rather than photorealistic ones? I'm guessing yes: the greater the target quality, the more demanding the process. So we need to get from blobby 720p20 Minecraft on an H100 to pixel-perfect 4K60 photorealism on a consumer-grade GPU.

And let's just look at what is actually being achieved at 720p20

[attached screenshots of the generated Minecraft output]

This is not 720p20 of the fidelity achieved via rasterisation. Heck, even full pathtracing runs far faster and at far better quality!

So before aiming for 1080p30 photorealism, it'd be nice to have 720p20 (which is actually a 540x540 window?) that can produce straight lines and clear edges and visible icons. What will that take? As the models increase in complexity to add more detail, the processing power increases. Is there any way to predict exactly how much so? The many advances in AI results are largely being powered by faster energy-munching AI hardware, which we know will suffer the same diminishing returns as all other silicon.

Let's imagine a 5090 can run this Minecraft at 720p20 (pretending it's generating 1280x720 pixels instead of 540x540). The next GPU doubles that performance so we have 1080p20. Then we double it again for the next, we hit 1080p30 and maybe upscale the framerate. Are we getting anywhere close to photorealism with all this processing? No, we're hitting blobby Minecraft, where those GPUs will be able to produce far better using conventional rendering.
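For scale, a quick back-of-the-envelope on pixel throughput alone makes that doubling argument concrete; it deliberately ignores the larger model a higher-fidelity (non-Minecraft) image would also require.

```python
# Back-of-the-envelope pixel throughput: how far the Oasis demo's output rate
# is from the resolutions/framerates discussed above. Counts generated pixels
# per second only; model-quality scaling is a separate (bigger) problem.
targets = {
    "Oasis demo (540x540 @ 20)": 540 * 540 * 20,
    "720p20  (1280x720 @ 20)":   1280 * 720 * 20,
    "1080p30 (1920x1080 @ 30)":  1920 * 1080 * 30,
    "4K60    (3840x2160 @ 60)":  3840 * 2160 * 60,
}

baseline = targets["Oasis demo (540x540 @ 20)"]
for name, pixels_per_second in targets.items():
    print(f"{name}: {pixels_per_second / 1e6:6.1f} Mpix/s "
          f"({pixels_per_second / baseline:5.1f}x the demo)")
# Roughly: 720p20 is ~3x the demo, 1080p30 ~11x, 4K60 ~85x, before any fidelity increase.
```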

I agree, but my premise is that we will start with simpler things first, like a post-process technique to enhance hair, fur, fire, water, particles, clothes and other small screen elements, etc., coupled with motion vectors and other game engine data; the results should look great and move in a convincing manner.
This is a very different system to the generative examples. You pointed to Mafia etc. as what AI can do. You continue this with the examples above of ML game generation. I'm saying that route doesn't go anywhere realistically (probably!) and so doesn't work as any form of indicator. "Look at what AI can already do." Those examples are a dead end, not indicative of the future of ML in gaming.

Now change the proposition to augmenting aspects; that's very different, more plausible, and something I agree with. This is, I think, the more interesting consideration for what ML will contribute to gaming in the coming years. But even then, there's not a lot of evidence of how that works and what'll be required. Take for example what's currently possible. It's been stated that the Tensor cores on an nVidia GPU are well used for upscaling, denoising, and frame generation. If they are already occupied, how much ML capacity is free to work on other workloads like lighting and hair physics?
 
Nvidia released a paper on an alternative to Gaussian splatting that leverages the HWRT capability of existing GPUs.
Particle-based representations of radiance fields such as 3D Gaussian Splatting have found great success for reconstructing and re-rendering of complex scenes. Most existing methods render particles via rasterization, projecting them to screen space tiles for processing in a sorted order. This work instead considers ray tracing the particles, building a bounding volume hierarchy and casting a ray for each pixel using high-performance GPU ray tracing hardware. To efficiently handle large numbers of semi-transparent particles, we describe a specialized rendering algorithm which encapsulates particles with bounding meshes to leverage fast ray-triangle intersections, and shades batches of intersections in depth-order.

It's interesting that HWRT capability can still be useful even in a rendering paradigm that isn't based on triangle meshes.
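For readers skimming the abstract, here is a schematic sketch of the core loop it describes: gather particle intersections along a ray and composite them front to back in depth order. The brute-force Python below uses bounding spheres and made-up data purely as stand-ins for the paper's bounding meshes and hardware BVH; it is not the paper's algorithm, only the depth-ordered compositing idea.

```python
# Schematic: collect semi-transparent particle hits along a ray, sort by depth,
# and alpha-composite front to back with early termination. Bounding spheres
# here stand in for the paper's bounding meshes + hardware ray-triangle tests.
from dataclasses import dataclass
import math

@dataclass
class Particle:
    center: tuple      # (x, y, z)
    radius: float      # bounding-sphere stand-in for a bounding mesh
    color: tuple       # (r, g, b)
    opacity: float     # peak alpha of the particle

def ray_sphere_t(origin, direction, p):
    """Entry distance t along a normalized ray, or None if the sphere is missed."""
    oc = [origin[i] - p.center[i] for i in range(3)]
    b = sum(oc[i] * direction[i] for i in range(3))
    c = sum(x * x for x in oc) - p.radius * p.radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None

def trace(origin, direction, particles, min_transmittance=0.01):
    hits = [(t, p) for p in particles if (t := ray_sphere_t(origin, direction, p)) is not None]
    hits.sort(key=lambda h: h[0])              # depth order, nearest first
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for _, p in hits:                          # front-to-back compositing
        a = p.opacity
        for i in range(3):
            color[i] += transmittance * a * p.color[i]
        transmittance *= (1.0 - a)
        if transmittance < min_transmittance:  # ray is effectively saturated
            break
    return color

if __name__ == "__main__":
    scene = [Particle((0, 0, 5), 1.0, (1, 0, 0), 0.6), Particle((0, 0, 8), 1.5, (0, 0, 1), 0.8)]
    print(trace((0, 0, 0), (0, 0, 1), scene))  # [0.6, 0.0, 0.32]
```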
 
One can argue that's because of early models, etc., but it should be acknowledged what's wrong as well as what's right
Of course it should! Nobody presented these results as the perfect thing; they are all presented as "early" and "promising" results, assuming development continues and produces faster and more accurate models. I think this is obvious in every conversation about this.

As the models increase in complexity to add more detail, the processing power increases. Is there any way to predict exactly how much so? The many advances in AI results are largely being powered by faster energy-munching AI hardware, which we know will suffer the same diminishing returns as all other silicon
Generating the whole game on the fly is much, much more compute-demanding than just augmenting already rendered scenes. These two things are completely different, and the scale of required compute is completely different as well.

It's the same as ray tracing the whole game from scratch versus just ray tracing lighting/reflections as augmentation on top of other screen-space elements/textured polygons, etc. These two things are not the same compute-wise.

This is a very different system to generative examples. You pointed to Mafia etc. as what AI can do.
To what AI can "eventually" do, I didn't say these things will be available tomorrow. Big difference.

It's been stated that the Tensor cores on an nVidia GPU are well used for upscaling, denoising, and frame generation. If they are already occupied, how much ML capacity is free to work on other workloads like lighting and hair physics?
Currently there is not enough free capacity, but I already outlined that this future is not achievable on current GPUs. I already stated that for this AI future, GPUs will dedicate most of their dies to tensor cores in 4 to 6 years. By then, faster, specialized, and more accurate models that rely on 3D data would have been developed; they would enhance objects on screen piece by piece, until hardware and software are powerful enough to enhance the whole thing. The goal then would be to render at 720p30 and upscale and frame generate from that to the desired resolution. I don't see this path as difficult at all, especially after what was achieved by these early results.

Let's imagine a 5090 can run this Minecraft at 720p20 (pretending it's generating 1280x720 pixels instead of 540x540)
We don't need to perform that comparison for the reasons above. Instead it should be a 7090 thing, with 75% of its die dedicated to tensor cores.

I already outlined all of that in my original reply on the topic of reaching photorealism; I don't know why we keep circling back to this point. I also outlined that current conventional methods have reached stagnation. Hardware-wise, we would need to scale current transistor budgets so high that it's not feasible anymore with current process technology. Software-wise, we have reached many bottlenecks, and CPU-limited code continues to be a major problem.

Solution? Scale the transistor budget and software paradigm in a different direction: instead of GPUs where 90% of the die is FP32 cores, build GPUs where 75% of the die is FP4 matrix cores, run AI models on them, and upscale and frame generate the results. In short, follow the path of least resistance.
 

If he keeps having this mindset, I predict Epic won't exist in 10 years, or will be much less successful/relevant.
You think the company that makes the game engine which is quickly becoming universal will not exist because the CEO doesn’t like AI hallucinations masquerading as games?

I don’t think this sentiment is at all uncommon among anyone involved in making games, or art in general. Besides all of that, it doesn’t look very good, and it’s by definition derivative: this model would never have come up with Minecraft in a vacuum; it can only do this because it was trained on the existing game.
 
Currently there is not enough free capacity, but I already outlined that this future is not achievable on current GPUs. I already stated that for this AI future, GPUs will dedicate most of their dies to tensor cores in 4 to 6 years. By then, faster, specialized, and more accurate models that rely on 3D data would have been developed; they would enhance objects on screen piece by piece, until hardware and software are powerful enough to enhance the whole thing. The goal then would be to render at 720p30 and upscale and frame generate from that to the desired resolution. I don't see this path as difficult at all, especially after what was achieved by these early results.

We don't need to perform that comparison for the reasons above. Instead it should be a 7090 thing, with 75% of its die dedicated to tensor cores.

I already outlined all of that in my original reply on the topic of reaching photorealism; I don't know why we keep circling back to this point. I also outlined that current conventional methods have reached stagnation. Hardware-wise, we would need to scale current transistor budgets so high that it's not feasible anymore with current process technology. Software-wise, we have reached many bottlenecks, and CPU-limited code continues to be a major problem.

Solution? Scale the transistor budget and software paradigm in a different direction: instead of GPUs where 90% of the die is FP32 cores, build GPUs where 75% of the die is FP4 matrix cores, run AI models on them, and upscale and frame generate the results. In short, follow the path of least resistance.
Tensor cores will likely take up greater and greater portions of die space, but I don't think they will ever take up the majority of it for standard consumer GPUs. FP32 is just too important for too many use cases. RT hardware will also take up more die space; current GPUs only have dedicated HW for BVH traversal and intersection testing, but future GPUs will have HW for BVH construction and ray sorting as well. Giving tensor cores the majority of die space would only make sense if almost every application had a rendering pipeline in which neural tasks take up the majority of frame time. This may be the case for some but not a large majority of games.

I think there will be a great diversification in real-time rendering techniques, not a consolidation around one paradigm. There will be some apps that work like today's cutting edge games do: mesh shader/software rasterization for primary visibility, RT for lighting (either separate shadow, reflection, and GI passes or a unified path tracing pipeline) and neural for upscaling, antialiasing, denoising, frame generation, and radiance caching. There will be others that go all in on HWRT with a unified path tracing pipeline shooting rays from the camera to determine primary visibility, and use neural hardware for the same purposes. There will be others that are like that but use a few more neural techniques, like various neural primitives that coexist with the triangle mesh primitives, or neural materials to replace regular textures. Some will ditch triangle meshes completely in favor of radiance fields or some other neural primitive; even with radiance fields there are multiple ways to render them: NeRFs, Gaussian splatting, and the new Gaussian ray tracing technique. And there might be some that use video synthesis models in some manner as well.

The consumer GPUs of the future will be expected to provide good performance with all of these applications, along with every 3D graphics application made up until then and any GPGPU apps that consumers use. Thus, they will need to split die space between a carefully chosen mix of cache, I/O, ROPs, TMUs, FP32 cores, tensor cores, RT cores, optical flow accelerators, and more. That being said, PSSR proves that the FP32 cores can also perform inference. Future inference-heavy real-time graphics applications would ideally split the inference workload between the FP32 cores and tensor cores.
 
Currently there is not enough free capacity, but I already outlined that this future is not achievable on current GPUs. I already stated that for this AI future, GPUs will dedicate most of their dies to tensor cores in 4 to 6 years...
Even if your theory about how AI can generate and augment is true, this argument here seems highly implausible to me. It'd mean a GPU that can play new games written for the new paradigms which won't run on old GPUs, but not play old games. And who is going to write games that work for a GPU that's mostly Tensor cores and ignore all the GPUs that aren't? Surely that'll be a tiny niche market.

If GPUs in 6 years' time are mostly Tensor, how much more raw ML throughput would that actually provide over today's parts?

We don't need to perform that comparison for the reasons above. Instead it should be a 7090 thing with 75% of it's die dedicated to tensor cores.

I already outlined all of that in my original reply on the topic of reaching photorealism; I don't know why we keep circling back to this point.
No circling back. You introduced a new reference point; I provided my thinking based on it. I have no idea how 75% of a 7090 being Tensor would relate to a 4090's Tensor power. What level of improvement would we expect in raw calculation potential?

Solution? scale transistor and software paradigm into a different direction...
Yes. But I don't think that'll be enough to hit photorealism any time soon. I wasn't saying ML is a dead end, but that the idea we have enough GPU power to reach photorealism in a near-term timeframe - Sweeney's statement on 40 TF was only 8 years ago and only 6 years before it happened in the 4080 in 2022 - is optimistic and unlikely. I don't think the software paradigms exist now nor will develop in the next few years to see ML contribute enough, and I don't think the processing power for ML will be enough to solve the issues it'll face in producing photorealistic images.

I think just as conventional rendering and RT have come so far and now face a massive performance wall, ML will get so far and then hit a performance/quality wall. The only way I see to a photorealistic gaming future in the near term is some magical new paradigm like nVidia's 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes. That looks like the target in terms of visuals, when using photo sources. When they drop virtual objects in, they look terrible, but that's not been the area of progress.
 