Realtime AI content generation *spawn

Those game clips are generated with Gen-3, which has a maximum clip length of 10 seconds. That's not great for temporal consistency: every 10 seconds, Lara Croft's shirt might flip from wool to leather to some other material.

If this type of architecture is ever to be used for visual effects in a real-time gameplay context, it has a lot of maturing ahead of it. The good thing is that game engines already have depth buffers, color data, etc. to ground the video.
If this type of architecture were used, I don't think it would be an off-the-shelf model. The devs would train their own model for the game.
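As a rough sketch of what that grounding could look like, here's a hypothetical denoiser that takes engine G-buffer channels as extra conditioning input. Everything here (the class, the channel layout) is illustrative, not any shipping API:

```python
import torch
import torch.nn as nn

class GBufferConditionedDenoiser(nn.Module):
    """Hypothetical sketch: a denoising network that sees the noisy frame
    concatenated with engine G-buffer channels (depth, normals, albedo),
    so generated detail stays anchored to the game's actual geometry."""

    def __init__(self, gbuffer_channels: int = 7):  # 1 depth + 3 normals + 3 albedo
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + gbuffer_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),  # predicted RGB
        )

    def forward(self, noisy_frame: torch.Tensor, gbuffer: torch.Tensor) -> torch.Tensor:
        # The G-buffer provides per-pixel geometry and material cues,
        # which is exactly the grounding a game engine gets for free.
        return self.net(torch.cat([noisy_frame, gbuffer], dim=1))

model = GBufferConditionedDenoiser()
out = model(torch.randn(1, 3, 256, 256), torch.randn(1, 7, 256, 256))
```

A per-game fine-tune would then amount to training that conditioning path on captures from the game's own renderer.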
 
It doesn't really solve lighting, animation, or physics interaction. It just gussies up the render, replacing fine detail. Most of the very hard simulation problems remain even if you could run this in real time, used some kind of transfer from a long-term reference for consistency, and ignored the inevitable screw-ups.
 
It's so...blobby! Also the mess it makes of the overhead cables. I do wonder how generative AI will be able to cope when it has zero understanding. If it can be fed geometry data as well, it'll be a lot more robust. But really, we need to see AI working well in offline movies before we have even the hint of a chance of it working well in realtime in games. When Hollywood feeds a model a basic previs render and it outputs final-quality results, then we can look at accelerating that.
 
There's a new generative AI startup in town called Tales, and its goal is to empower gamers and creators to essentially make whatever they want with mere text prompts.

The technology is based on a Large World Model (LWM) called Sia.

This LWM is allegedly capable of generating all the components of a video game, from environments, 3D models, and gameplay to NPC (non-player character) behavior, along with detailed metadata.

 
Nvidia's goal for neural rendering in the near future seems to be a unified hybrid rasterization-ray tracing-neural approach, in which the neural renderer has access to the G-buffer produced by conventional rendering, and standard objects (triangle meshes) can be used alongside neural objects. Artists would still have full control of the final output. There would be no need to resort to feeding frames and text descriptions to an AI video-generation model and praying it gives the desired output instead of hallucinating something random.
From the paper:
The purpose of the image generator is to synthesize an image from a novel, unobserved view of the scene. The generator receives: parameters of the camera, a G-buffer rendered using a traditional renderer from the novel view, and a view-independent scene representation extracted by the encoder.
One shortcoming of the neural model employed in this article (up to this point) is the rather poor visual quality on high-frequency visual features. However, the fine details and structures due to local illumination can be synthesized inexpensively using classical methods (if an accurate 3D model is available). The output from the classical renderer can be provided to the neural renderer as an additional input, or simply combined with the generated image. The two renderers can complement each other with the neural one focusing on the costly effects only.

We investigate one such scenario in Figure 17: we compute direct illumination via ray tracing and optimize the neural model to produce only indirect illumination.
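To make the split concrete, here's a minimal sketch of that Figure 17 setup, assuming the neural part is just some image-to-image network and both renderers write linear radiance (all names hypothetical, not the paper's code):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the paper's generator: from the G-buffer,
# predict only the expensive component (indirect illumination).
indirect_net = nn.Sequential(
    nn.Conv2d(7, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
)

def shade(gbuffer: torch.Tensor, direct: torch.Tensor) -> torch.Tensor:
    """The classical renderer supplies sharp, cheap direct lighting;
    the network fills in the soft, costly indirect term; the two are
    simply added in linear radiance."""
    return direct + indirect_net(gbuffer)

gbuffer = torch.randn(1, 7, 128, 128)  # depth/normals/albedo from the rasterizer
direct = torch.rand(1, 3, 128, 128)    # ray-traced direct illumination
final = shade(gbuffer, direct)
```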
In recent years, computer-vision algorithms have demonstrated a great potential for extracting scenes from images and videos in a (semi-)automated manner [Eslami et al. 2018]. The main limitation, common to most of these techniques, is that the extracted scene representation is monolithic with individual scene objects mingled together. While this may be acceptable on micro and meso scales, it is undesired at the level of semantic components that an artist may need to animate, relight, or otherwise alter.

Compositionality and modularity—patterns that arise naturally in the graphics pipeline—are key to enable fine control over the placement and appearance of individual objects. Classical 3D models and their laborious authoring, however, are ripe for revisiting as deep learning can circumvent (parts of) the tedious creation process.

We envision future renderers that support graphics and neural primitives. Some objects will still be handled using classical models (e.g. triangles, microfacet BRDFs), but whenever these struggle with realism (e.g. parts of human face), fail to appropriately filter details (mesoscale structures), or become inefficient (fuzzy appearance), they will be replaced by neural counterparts that demonstrated great potential. To enable such hybrid workflows, compositional and controllable neural representations need to be developed first.
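At heart, the "graphics and neural primitives" idea is per-object dispatch. A toy sketch, where every type and function is made up purely for illustration:

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class Mesh:
    name: str  # resolved by the classical path: triangles, microfacet BRDFs

@dataclass
class NeuralAsset:
    name: str
    evaluate: Callable[[], np.ndarray]  # learned model: view -> radiance layer

def rasterize(mesh: Mesh) -> np.ndarray:
    return np.zeros((64, 64, 3))  # stand-in for the whole classical pipeline

def render(scene) -> np.ndarray:
    frame = np.zeros((64, 64, 3))
    for obj in scene:
        layer = rasterize(obj) if isinstance(obj, Mesh) else obj.evaluate()
        frame += layer  # naive additive composite, purely for illustration
    return frame

scene = [Mesh("level geometry"),
         NeuralAsset("hair", evaluate=lambda: np.full((64, 64, 3), 0.1))]
frame = render(scene)
```

The hard part the authors flag isn't the dispatch; it's making the neural objects compositional and controllable enough to sit in such a loop at all.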
 
Not realtime, but this illustrates what's possible with greater source clarity than just images.


Although the fries do go a bit disconnected. With 3D sources, a lot more is going to be possible than processing images. I'm also curious how the horse was handled. I'm guessing there must be prompts.
 
Wouldn’t it be easy enough to infer the 2D image was a horse and go from there?

Yes, but you'd need to have a predefined idea of what a horse is. There must be something prompting the AI on the different possible interpretations.

And from that, how much deviation can you have before it doesn't recognise a 'horse' and doesn't know to give it four legs? Or what if you create a six-legged salamander, drawing three 2D legs? Or a three-legged tripod creature?

So, if it is to work well, I think there must be something guiding the AI, whether word prompts or some other metadata to inform it.
Exactly, the AI models don't know what a horse is the way humans do. A model can be trained on x images and told "that's a horse", but that will only get you so far. What if someone stuck a carrot on its forehead to make it look like a unicorn? A human would know it's a horse with a carrot stuck on its forehead, but the AI might fail to recognize it as anything it's been trained on, or, if it's been trained on unicorn images, it could interpret it as a unicorn.
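That brittleness is easy to poke at with an off-the-shelf vision-language model. A sketch using Hugging Face's CLIP (the model name is real; the test image is a hypothetical file you'd supply):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a horse", "a unicorn", "a horse with a carrot on its forehead"]
image = Image.open("horse_with_carrot.jpg")  # hypothetical test image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)

# The scores reflect statistical association, not understanding; nothing
# guarantees the common-sense reading wins over "unicorn".
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```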
 
What would it actually take to get photorealistic graphics at 1080p and 2160p? Tim Sweeney said 40 TFs, but that seems far off the mark save in some specific cases like body-cam urban scenes. Is it a case of the computational power being wrongly directed, or has the workload of reality been grossly underestimated? Given that we're nowhere near solving things like accurate foliage, truly natural human behaviours, solid, correct illumination, and realistic fire and smoke, and given the many, many flops of the ML we're hoping will solve some of these, the actual workload to create something like watching a film in realtime seems a long, long way off, if even possible. We inch ever closer, but the closer we get, the more the shortcomings stand out.
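For scale, a quick back-of-envelope on what a fixed budget buys per pixel (the 40 TF figure is Sweeney's; the rest is plain arithmetic, ignoring real-world utilization entirely):

```python
# How many FLOPs per pixel does 40 TFLOPS buy at 60 fps?
tflops = 40e12
for w, h, fps in [(1920, 1080, 60), (3840, 2160, 60)]:
    flops_per_pixel = tflops / (w * h * fps)
    print(f"{w}x{h} @ {fps} fps: ~{flops_per_pixel:,.0f} FLOPs/pixel")
# 1920x1080 @ 60 fps: ~321,502 FLOPs/pixel
# 3840x2160 @ 60 fps: ~80,376 FLOPs/pixel
```

Whether ~80K FLOPs per pixel is anywhere near enough for film-quality light transport is exactly the open question.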
 
Animation is always going to be a differentiator between pre-rendered CGI and real-time game graphics. Being able to create bespoke animations for any given scene is a huge advantage compared to the more freeform nature of interactive gameplay. You'd need some nearly magical level of flawless procedural animation and physics working together to make a game with truly realistic and dynamic behaviours for people and other entities.

Naughty Dog is the clear industry leader in this area and seems to heavily prioritize these things, yet at the end of the day they're still heavily reliant on a limited library of scripted animations, much like everybody else.
 
Even if we excuse things like ambulation, which needs to be cheated to provide responsiveness, and look at a Quantic Dream type thing, or the Death Stranding walking simulator, facial animation isn't close to realistic yet. The closer we get, the more the fine details fail. Lip sync isn't perfect, and that really gives the game away. Facial deformation isn't perfect at a minuscule level, but that minuscule level is enough to make it look wrong. Clothing doesn't fold and bunch and crease and flex right. Skin doesn't either. At the level of world simulation, we've only really just begun and have so very far to go!
 
I actually think that with the advent of LLMs and machine learning we have a shot at reaching photorealism quickly; AI will be the shortcut here.


As we've seen with the videos showing game scenes converted by AI into photorealistic scenes full of lifelike characters, hair physics, cloth simulation, realistic lighting, shadowing, and reflections, we have a glimpse of the future. There are many shortcomings, of course, but they will be fixed when the AI is closely integrated into the game engine.

The AI model will have access to 3D data, full world-space coordinates, lighting information, and various other details instead of just 2D video data; this will be enough to boost its accuracy and minimize the amount of inference it has to do. We will also have faster and smarter models requiring less time to do their thing.

I can see future GPUs having much larger matrix cores, to the point of outnumbering the regular FP32 cores, and CPUs having bigger NPUs to assist; this would be enough to do 720p at 60 fps rendering, maybe even 1080p30 or 1080p60 if progress allows it.

Next, this output will be upscaled, denoised, and frame-generated to the desired fidelity.
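A minimal sketch of that post-process chain, with each learned stage replaced by the simplest possible stand-in (real pipelines use trained networks for all three steps, so treat this purely as the shape of the pipeline):

```python
import torch
import torch.nn.functional as F

def postprocess(frames: torch.Tensor, target=(1080, 1920)) -> torch.Tensor:
    """Upscale -> denoise -> frame-generate, as described above.
    frames: (N, 3, 720, 1280) low-res renders."""
    # 1. Upscale to 1080p (a learned upscaler in practice).
    up = F.interpolate(frames, size=target, mode="bilinear", align_corners=False)
    # 2. "Denoise" with a box blur (a trained denoiser in practice).
    kernel = torch.ones(3, 1, 3, 3) / 9.0
    den = F.conv2d(up, kernel, padding=1, groups=3)
    # 3. Frame generation: blend a new frame between each pair
    #    (optical-flow-guided interpolation in practice).
    mids = 0.5 * (den[:-1] + den[1:])
    interleaved = [f for pair in zip(den[:-1], mids) for f in pair] + [den[-1]]
    return torch.stack(interleaved)

frames_720p = torch.rand(4, 3, 720, 1280)
frames_1080p = postprocess(frames_720p)  # 7 frames at 1080p
```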

All in all, this path is, at least in theory, much quicker than waiting for traditional rendering to become mature and fast enough, which is getting harder and taking longer; we simply lack the transistor budget to scale up the horsepower required for traditional rendering to reach photorealism, and to do so at previously feasible economic levels.

Even now, traditional rendering faces huge challenges, chief among them code being limited by the CPU and the slow progress of CPUs themselves; something has to give to escape these seemingly inescapable hurdles that have existed for far too long.

So, shifting the transistor budget toward a bigger machine-learning portion at the expense of the traditional portion would be the smart thing to do, especially when it opens up entirely new visual capabilities.
 
I'm not so convinced. It's always the case with prototyping games that you get something fabulous in a weekend, but all the effort needed to produce the polished final product takes forever. I think these quick results show promise, but the end result is actually a long way off and the imagined potential isn't within reach. At best, subdividing the game into aspects ML can solve, like cloth dynamics, might work. I've too much life experience to look at these current results and extrapolate a near-term future of the best we can imagine! The magic bullets never are, and what we always end up with is an awkward compromise of glitchy fudges no matter how much power we throw at it.
 
Yeah, tbh I really don't think that video looks good at all. Most of these 'AI re-imagined' games look like stylistic messes.

I feel like people forget games are art, and that therefore you need more than an artificial intelligence coming up with all the artwork. Chasing photorealism at the expense of the art form produces bad results.
 