Yeah, if the goal is really to make ML/AI fun, we can introduce several policies that each propose a decision to drive behaviour. We can train one on a million games, have another examine the environment and set basic AI goals, and have a third learn behaviour from the player. Each of the 3 policies then comes up with its own next step, and an RL AI chooses the best of those steps. This is why I always throw in the cautionary tale about the unpredictability of machine learning. Once you open Pandora's box, it's difficult to close. There are things ML could be really well suited for, like multiplayer bots when there aren't enough people, but having bots learn from player behaviour would be massively risky because that behaviour can be manipulated/influenced.
It is, of course, insanely taxing.
You’d be one hell of an engineer if you could get this to run well in real time with a small memory footprint.
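For anyone who wants to picture the "3 policies plus an RL chooser" idea, here's a rough sketch in Python. Everything in it is made up for illustration: the state keys (`own_hp`, `enemy_hp`), the action names, the three toy policies, and the `PolicySelector` class. The chooser is just an epsilon-greedy bandit over the policies, which is the simplest RL-style selector I could think of, not a claim about how a real game would do it.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]          # hypothetical game state, e.g. {"own_hp": 20, "enemy_hp": 80}
Policy = Callable[[State], str]  # a policy maps a state to a proposed action


def pretrained_policy(state: State) -> str:
    # Stand-in for a model trained offline on many games (toy rule, assumption).
    return "attack" if state["enemy_hp"] < 50 else "defend"


def goal_policy(state: State) -> str:
    # Stand-in for basic goals derived from examining the environment.
    return "heal" if state["own_hp"] < 30 else "attack"


def make_imitation_policy(player_history: List[str]) -> Policy:
    # Stand-in for behaviour learned from the player: copy their most common action.
    # This is the part a player could deliberately manipulate.
    def policy(state: State) -> str:
        if not player_history:
            return "defend"
        return Counter(player_history).most_common(1)[0][0]
    return policy


class PolicySelector:
    """Epsilon-greedy bandit that learns which policy's proposal to follow."""

    def __init__(self, policies: List[Policy], epsilon: float = 0.1, seed: int = 0):
        self.policies = policies
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.values = [0.0] * len(policies)  # running mean reward per policy
        self.counts = [0] * len(policies)

    def choose(self, state: State) -> Tuple[int, str]:
        # Every policy proposes a next step; the selector picks one of them.
        proposals = [policy(state) for policy in self.policies]
        if self.rng.random() < self.epsilon:
            idx = self.rng.randrange(len(self.policies))  # explore
        else:
            idx = max(range(len(self.policies)), key=lambda i: self.values[i])  # exploit
        return idx, proposals[idx]

    def update(self, idx: int, reward: float) -> None:
        # Incremental mean of the rewards seen after following policy idx.
        self.counts[idx] += 1
        self.values[idx] += (reward - self.values[idx]) / self.counts[idx]


if __name__ == "__main__":
    history = ["attack", "heal", "heal"]
    selector = PolicySelector(
        [pretrained_policy, goal_policy, make_imitation_policy(history)]
    )
    state = {"own_hp": 20, "enemy_hp": 80}
    idx, action = selector.choose(state)
    selector.update(idx, reward=1.0)  # reward would come from the game in practice
    print(idx, action)
```

Note that every policy runs on every decision, so the cost per step is roughly the sum of all three plus the selector. That's where the "insanely taxing" part comes from once the toy rules are swapped for real models.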