Realtime AI content generation *spawn

The voice is terrible, 0 stars; the responses aspire to the heights of ChatGPT 2; no serious gamedev would use this with a straight face. It's a great example of Nvidia having too much success and starting to throw money into a dumpster fire, just like all tech companies eventually do once they get big enough. (That being said, I wouldn't be surprised if Ubisoft makes an "experimental" game with this, because they'll throw money at literally anything new.)
For non-important/secondary NPCs (pedestrians, common enemies) in open-world games, this tech is miles ahead of the voice-over trash we have in current open-world games. Right now the best you can hope for when talking to a pedestrian is a couple of soulless voiced-over lines that repeat forever. With this tech, you will have vastly more immersive worlds that are more dynamic and responsive. It's still early days for the tech too (it's a demo after all); it's going to vastly improve with time, just like ChatGPT.
 

That doesn't matter; they sound like trash and have nothing to say. Just imagine playing God of War 3: after some super highly polished AAA cinematic delivered by pro voice actors, some random NPC starts spewing garbage at you in a robo voice. You'd assume there's some bizarre bug.

Besides, writing isn't currently the bottleneck for lines of dialogue in a game. Older, text-only games with a twentieth of the budget had vastly more writing than today's games, because the bottleneck is voice acting. And that you can do pretty well with offline tools: Luke Skywalker has already had an AI voice in a multi-million-dollar streaming show, and a modder managed it by themselves with decent results (vastly better than this Nvidia demo). Just give AAA-budgeted games the same tools and watch your exact dream scenario come to life, but with well-written lines and solid voice acting instead of stuttery robo garbage:

 
This is a hilariously bad demo, the kind of thing that never ships and that went out of style as a tech demo over a decade ago, after even casual observers started to twig that this stuff wasn't coming out anytime soon. The voice is terrible, 0 stars; the responses aspire to the heights of ChatGPT 2; no serious gamedev would use this with a straight face.
I agree, this surely has to be staged or is a joke?
 
Why would it be staged? All of that is doable: speech-to-text from human input (the lady playing the demo), text generation with an LLM using her input, and then speech synthesis based on the LLM output.
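
Roughly, something like this minimal sketch of that three-stage loop, using off-the-shelf Hugging Face components. The model names are just illustrative stand-ins, not whatever Nvidia is actually running, and it assumes a recent transformers release with the text-to-speech pipeline:

```python
# Sketch of the pipeline described above: speech-to-text -> LLM -> speech synthesis.
# Checkpoints are illustrative placeholders, not Nvidia's actual stack.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
llm = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")
tts = pipeline("text-to-speech", model="suno/bark-small")

def npc_reply(player_audio_path: str, npc_persona: str) -> dict:
    # 1. Transcribe the player's microphone input.
    player_text = asr(player_audio_path)["text"]

    # 2. Generate the NPC's line, conditioned on a persona prompt.
    prompt = f"{npc_persona}\nPlayer: {player_text}\nNPC:"
    npc_text = llm(prompt, max_new_tokens=60)[0]["generated_text"]
    npc_text = npc_text[len(prompt):].strip()

    # 3. Synthesize audio for the generated line.
    return tts(npc_text)  # dict with an "audio" array and "sampling_rate"
```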

Maybe it's dual GPU, or the AI models are running in the cloud? I'd guess otherwise you'd see some severe fps dips while the GPU is doing AI inferencing.

I think it's a neat tech demo as stuff like this is definitely quite new and different. Not everyone's cup of tea, I'm sure.
 

I mean, it's definitely not a joke. Someone worked hard on it, and from a purely "this is a tech demo" standpoint it's neat. My only criticism is that they're somehow trying to sell this as a thing developers could use in major games today. As with all tech demos, things tend to take quite a while going from "tech demo" to "something you'd actually want to use in a major product".
 
Just imagine playing God of War 3: after some super highly polished AAA cinematic delivered by pro voice actors, some random NPC starts spewing garbage at you in a robo voice. You'd assume there's some bizarre bug.
Well, the tools are in their early stages; later they will add emotion to the AI-generated lines. Some AI tools can already modify songs by imitating famous singers, and some AI models are very good as narrators and are used widely in TikTok and Facebook reels. Movies and ads are already using AI-generated voices with emotion. Things will improve; the demo is just a quick proof of concept.

writing isn't currently the bottleneck for lines of dialogue in a game
Unfortunately, current games are limited by both writing and voice-overs. The writing of current games is generally weak even in short games, as studios dedicate smaller and smaller budgets to writing amidst the ballooning costs of making games in general.
 
Next-gen CPUs will have about 45 TOPS available for this sort of stuff. Do you think that will be enough for this kind of interaction? And if not, what's necessary to play untethered from the cloud?
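
For a rough sense of scale, here's a back-of-envelope sketch (every number below is an assumption, not a measurement): the raw compute for decoding a small local LLM comes out well under 45 TOPS, and memory bandwidth is more likely to be the limiting factor than the NPU's TOPS rating, before you even count the speech-to-text and text-to-speech models:

```python
# Back-of-envelope check: can ~45 TOPS of NPU cover a small local LLM for NPC dialogue?
params          = 7e9      # assume a 7B-parameter model
bytes_per_param = 1        # assume int8 quantization
tokens_per_sec  = 20       # assume a target generation speed

# Decoding one token touches every weight roughly once: ~2 ops per weight.
ops_per_token = 2 * params
compute_tops  = ops_per_token * tokens_per_sec / 1e12
print(f"compute needed  : ~{compute_tops:.2f} TOPS")   # ~0.28 TOPS

# Memory traffic is usually the real bottleneck for local LLM decoding.
bandwidth_gbs = params * bytes_per_param * tokens_per_sec / 1e9
print(f"bandwidth needed: ~{bandwidth_gbs:.0f} GB/s")  # ~140 GB/s
```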
 
The early days of OpenAI still carry too much weight, and they are too coy about what they are doing now.

All the x-shot shit is almost certainly a lark at this point; modern ChatGPT is more likely a triumph of good old annotated data, many thousands of man-hours of annotation. This is where the future will need to be for roleplaying bots: someone will have to pony up the money for massive amounts of annotated and specially crafted roleplaying data (i.e. dialogue, but including an annotation for the context of the scene and character descriptions for all the participants).
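
Something like this hypothetical schema for a single annotated sample is what I mean; all the field names and the example scene are made up for illustration:

```python
# Hypothetical schema for one annotated roleplaying sample: the dialogue itself
# plus the scene context and character sheets the model would condition on.
from dataclasses import dataclass, field

@dataclass
class Character:
    name: str
    description: str                                     # personality, goals, speech style
    knowledge: list[str] = field(default_factory=list)   # what they plausibly know

@dataclass
class RoleplaySample:
    scene_context: str                  # where/when the scene happens, what just occurred
    participants: list[Character]
    dialogue: list[tuple[str, str]]     # (speaker name, line), in order

sample = RoleplaySample(
    scene_context="A rain-soaked market stall at closing time; the player is broke.",
    participants=[
        Character("Jin", "Tired ramen vendor, gruff but kind, hates haggling."),
        Character("Player", "Outsider asking for directions and a free meal."),
    ],
    dialogue=[
        ("Player", "Any chance of a bowl on credit?"),
        ("Jin", "Credit's for regulars. You're not a regular."),
    ],
)
```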
 

There have been many famous internet cats, but it's possible no internet cat has made more people cry than Chubby. Depictions of Chubby vary, but he is always rotund, ginger and AI-generated. He is almost always involved in a deeply sad or peculiar situation. And he has baffled, outraged and won over millions of people.

Content creators on TikTok and YouTube Shorts tell stories about Chubby and his family in wordless slideshows of AI-generated pictures. A recent video by the TikTok account @mpminds opens with Chubby and his child, Chubby Jr, dressed in tatters. Chubby holds a cardboard sign that reads "Will Purr Fro Eood" (AI image generators can churn out impressive graphics, but they're notoriously bad at rendering text). In the next images we see Chubby shoplifting from a grocery store, getting arrested by the police and leaving a distraught Chubby Jr to an uncertain fate. The last image shows Chubby behind prison bars, dreaming wistfully of his son. The video has over 50 million views and 68,000 comments, written in several different languages.

 
That might be a while still. Those "fake" movie videos on YouTube have very little actual animation/motion for the objects in each scene, much less interaction between them.

I wonder if an interesting first step would be some sort of hybrid where the "background" is ML-generated while only the closer space around the player/camera is actually rendered and simulated. This way you'd have basically no penalty for "infinite" view distance and far detail compared to rendering it all.
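
Just to illustrate the idea, a toy compositing step might look like this; the distance cutoff and the inputs are purely hypothetical, and the neural background is assumed to come from some generator conditioned on the camera pose:

```python
# Toy sketch of the hybrid idea: classically render everything within some
# camera-space distance, let a neural model fill in the far background,
# and composite the two by coverage.
import numpy as np

def composite_hybrid(near_color, near_depth, neural_background, near_limit=200.0):
    # near_color (H, W, 3) and near_depth (H, W) come from the classic renderer,
    # which only rasterized geometry closer than near_limit metres.
    # neural_background (H, W, 3) is whatever the generative model produced
    # for this camera pose.
    uncovered = near_depth >= near_limit            # nothing rasterized here
    return np.where(uncovered[..., None], neural_background, near_color)
```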
 
I think the path to AI rendering will be through improvement of the existing AI upscalers/interpolators. Future upscalers will be trained on geometry, animations, and textures, so they can upscale a scene rendered at low resolution with lower-polygon models and lower texture resolutions to one that appears higher resolution with high-detail models and high resolution textures. Since they would recognize animations, ghosting will be reduced and interpolation will be more accurate. They will also be fed with more input data such as the position, orientation, velocity, and animation state of game objects and the camera, or the ray tracing BVH. Per-game training might return.
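
As a rough picture of what "more input data" could mean, the per-frame bundle such an upscaler consumes might grow from today's color + depth + motion vectors to something like the following; all the field names here are invented for illustration:

```python
# Hypothetical per-frame inputs for a geometry/animation-aware upscaler,
# beyond the low-res color that current DLSS-style upscalers already use.
from dataclasses import dataclass
import numpy as np

@dataclass
class UpscalerInputs:
    low_res_color: np.ndarray     # (H, W, 3) frame rendered at reduced resolution
    depth: np.ndarray             # (H, W) scene depth
    motion_vectors: np.ndarray    # (H, W, 2) screen-space motion, as today
    object_ids: np.ndarray        # (H, W) which game object covers each pixel
    camera_pose: np.ndarray       # (4, 4) view matrix for this frame
    object_states: dict           # id -> position, velocity, animation clip + phase
```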
 
Artists would have almost no control over the content that way. By the time you give an artist any real control over, say, a character, all a neural model can realistically do is be a compressed geometry format (with a prefiltered hierarchy). The upscaler might not need to sample it every frame for every intersecting ray, but eventually all detail must come from a detailed description.

A page full of descriptive text is not a very efficient description to sample from, even if it might eventually be useful as an intermediate to reduce the need for classically trained 3D artists. Not in the near future, though: the models have little idea of the real composition of the world, and you can always find edge cases not captured in the training set where that becomes obvious. By the time they can model the world as well as or better than us, we won't need humans any more.
 
Not any time soon unless there is some AGI breakthrough, but if that happens all bets are off.

An animatable 3D model is far more difficult than image/video, and video ain't all that good to begin with. The "pirate moar data" approach doesn't work very well for 3D; it's far harder to get good annotated data. We will need systems that can learn from relatively little data, like humans do... and that brings us to AGI and the irrelevance of humans.
 

An animatable 3D model is far more difficult than image/video, and video ain't all that good to begin with.
I didn't even think about animation o_O.

LLMs can spew text based on training data, and gait recognition is a thing.
I mean, a NN can learn this stuff, so what would be the challenges in applying it to a 3D model (if we limit the character models to humans)?
 
Why not produce the training data? You could have artists create poses and animations in an existing system like Metahuman. We have loads of human proxies such as Poser that have established the constraints for realistic people and allow the automatic generation of people for movies etc. Set up an automated system that tweens between parameters, produces content, and then trains on that.
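
Conceptually that loop could be as simple as the sketch below; the rig/renderer hooks are placeholders for something Metahuman- or Poser-like, not a real API:

```python
# Sketch of the "generate your own training data" loop: interpolate between
# artist-made poses, render each one, and collect (parameters, image) pairs.
import numpy as np

def tween(pose_a: np.ndarray, pose_b: np.ndarray, t: float) -> np.ndarray:
    # Naive linear interpolation between two joint-parameter vectors.
    return (1.0 - t) * pose_a + t * pose_b

def build_dataset(artist_poses: list[np.ndarray], render_fn, steps: int = 10):
    # render_fn is a placeholder for whatever rig/renderer produces an image
    # from a pose-parameter vector.
    samples = []
    for a, b in zip(artist_poses, artist_poses[1:]):
        for t in np.linspace(0.0, 1.0, steps):
            pose = tween(a, b, t)
            samples.append((pose, render_fn(pose)))
    return samples
```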

I think realistic people are an easy solve. Once you want to diverge from that, it gets harder.
 