Remote game services (OnLive, Gaikai, etc.)

If this technology became standard there wouldn't be a need for Xbox or PS consoles. Software developers could deliver games without worrying about the platform; in fact, they could provide it themselves.

Software developers won't be deploying cloud networks in strategically placed facilities any time soon. The cost of getting workable technology and then deploying it at a scale that hits critical mass (high availability in main markets) is going to run into the billions in start-up and operating costs. Someone, somewhere needs to (a) pay for the hardware and (b) connect software with people. As for not caring about the platform: the software still needs to be designed with some form of hardware in mind. Even if it is ubiquitous PCs, developers will need to target performance envelopes, write compatible code, and take hardware limitations (or hardware budgets) into consideration. Further, companies like Valve and MS, who have large established online networks, are going to manage and cultivate those customer relationships. They aren't going to give that away for "free", and those networks will be an edge/compelling selling point in terms of reach.

At some point a model similar to this will be viable, at least for some games. What type of model and market share works out, and how it changes the market, is unknown. But the first round or two of cloud-based gaming services that stream gameplay from high-end servers won't be pushed out by developers, and none of the publishers, except perhaps Activision Blizzard and Valve, have established online presences to even dream about going it alone at this point. Now, a conglomeration of publishers who wish to bypass both retailers and console manufacturers...
 
It all seems to be about sleight of hand, with Perlman making up figures to match the illusion. Sleight of hand I have no problem with whatsoever. The figures make him sound like someone telling us what we want to hear - a salesman, effectively.

Bottom line is that we need to have an 'HD' stream coming through the pipe at 5Mbps - that's 625KB/s.
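
To make that budget concrete, here's a back-of-the-envelope sum (nothing from OnLive's specs, just the two framerates discussed below):

#include <cstdio>
#include <initializer_list>

int main() {
    const double bitsPerSecond  = 5000000.0;              // the 5Mbps pipe
    const double bytesPerSecond = bitsPerSecond / 8.0;    // = 625,000 B/s, i.e. 625KB/s

    for (int fps : {30, 60}) {
        double kbPerFrame = bytesPerSecond / fps / 1000.0;
        std::printf("%d fps -> ~%.1f KB per encoded frame\n", fps, kbPerFrame);
    }
    // Prints ~20.8 KB/frame at 30fps and ~10.4 KB/frame at 60fps.
    return 0;
}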

So ruling out for a moment the magic beans, what is possible in real life?

A few starters:

1. Drop down from 720p to 576p - bearing in mind compression artefacts and the hit on video quality, we aren't going to miss a few pixels. Nobody complained about Call of Duty or Halo outside the hardcore and the fanboys. Fewer pixels means fewer macroblocks (see the quick sum after this list).
2. Take a 720p60 source and serve 30fps on the video output, via frame blending.
3. Simply output 30 normal frames - 90% of console games are/aspire to this anyway.
4. Produce an interlaced video output (again 30fps but giving the illusion of 60fps) and have a hell of a good deinterlacer client-side.
5. Different mixtures of the above depending on the game - something like Silent Hill would probably work entirely to OnLive spec and still look good.
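
And the quick sum on macroblocks promised in point 1, assuming 16x16 macroblocks and a widescreen 1024x576 frame for '576p':

#include <cstdio>

// Number of 16x16 macroblocks needed to cover a frame, rounding up.
static int macroblocks(int width, int height) {
    return ((width + 15) / 16) * ((height + 15) / 16);
}

int main() {
    std::printf("1280x720 (720p): %d macroblocks\n", macroblocks(1280, 720));  // 3600
    std::printf("1024x576 (576p): %d macroblocks\n", macroblocks(1024, 576));  // 2304, ~36% fewer
    return 0;
}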

IGN's impressions are interesting, particularly Chris Roper's; he noted that Burnout Paradise is not as smooth as it is on PS3, ergo not 60fps. The bottom line is that OnLive won't be bought by the hardcore, so why target 60fps at all when 30fps can work so much better and actually make the concept viable?

So, assuming the sleight of hand theory, what other bandwidth reducing measures might there be?
 
I take it that sending partially rendered images across wouldn't save them any bandwidth?
 
Even if they really can get the technology working over the Internet, I don't see this succeeding from a market point of view:

a) It is, at the end of the day, a game rental service. Its hook is being able to rent games that normally would require a more powerful machine than you may own.

b) The demand for better graphics isn't all that huge. I mean, everyone likes better graphics and all, but we already know that most people don't like better graphics enough to buy high-end PC cards or forgo access to exclusive console content (whether we are talking about Halo 3 or Wii Fit). So do people like better graphics enough to give up the ability to buy games and forgo exclusive content from Microsoft, Nintendo, Sony, Blizzard, and Valve? If not, this isn't going to displace anything...it'll just be a rental service that some people use in addition to their PC/console gaming habits. But that neutralizes its main selling point, which is not needing to buy a more powerful machine every few years.

c) Blizzard & Valve seem to think that the key is selling (as opposed to renting) games that are easy to get into and run on the machines people actually own. They dominate the PC space as a result... Since they're not on board with OnLive, they don't seem to think that the future is in renting out high-end games. That's like finding out that Warren Buffett isn't jumping on some investment bandwagon.

d) Production costs for the games OnLive is designed to give access to are so high that companies need to be remunerated to the tune of $significant per user to make money. Continuously upgrading the cloud servers to be able to handle the latest advances in graphics technology (as well as maintaining them) is not going to be of negligible cost. This service is not going to be as cheap as many people think.
 
He has a point though, because there's a lot of research these days on further compressing textures with some JPEG-like algorithm that is decompressed on the fly, straight into a cache as DXTx, before the texture is used.

So, imagine this pipeline:

1) stream the texture into main memory in compressed form
2) decompress the texture to DXTx in video memory (or equivalent)
3) use the texture

There are two texture pools here, one for streaming and one for decompression.
This idea also helps a lot in reducing streaming bandwidth, DVD usage and main memory footprint. There are other advantages too: if you imagine that not the entire texture is always used, the "empty" areas can just be filled with some constant color, which further helps compression.
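
A minimal sketch of that three-step pipeline, assuming OpenGL on the GPU side; the jpegLikeDecode and recompressToDXT1 helpers are hypothetical placeholders for whatever codec and runtime DXT encoder is actually used, stubbed out here:

#include <GL/gl.h>
#include <vector>
#include <cstdint>

#ifndef GL_COMPRESSED_RGB_S3TC_DXT1_EXT
#define GL_COMPRESSED_RGB_S3TC_DXT1_EXT 0x83F0
#endif

// Pool A: compressed blobs streamed from disc into main memory (step 1).
struct StreamedTexture {
    std::vector<uint8_t> discBytes;    // JPEG-like payload straight off the DVD/HDD
    int width = 0, height = 0;
};

// Hypothetical helpers, stubbed out for the sketch: plug in the real codec here.
static std::vector<uint8_t> jpegLikeDecode(const StreamedTexture&) { return {}; }
static std::vector<uint8_t> recompressToDXT1(const std::vector<uint8_t>&, int, int) { return {}; }

// Pool B: steps 2 and 3 -- decompress, re-encode as DXT1, hand the blocks to the GPU.
GLuint uploadAsDXT(const StreamedTexture& t)
{
    std::vector<uint8_t> rgba = jpegLikeDecode(t);
    std::vector<uint8_t> dxt  = recompressToDXT1(rgba, t.width, t.height);

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
                           t.width, t.height, 0,
                           (GLsizei)dxt.size(), dxt.data());
    return tex;                        // bind and sample like any other texture
}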

Some of these ideas have been used in Fable 2 for example.

http://forum.beyond3d.com/showpost.php?p=1284194&postcount=347

Does this help Grandmaster?
 
I have quite a limited understanding of how complex video encoding algorithms work, so take this as uninformed speculation.

My impression is that the major factor in video quality is the ability of the compression algorithm to predict movement between frames. Hence not knowing the next frame significantly degrades quality.

Well..
Perhaps the key parts of this 'interactive algorithm' are not part of the video encoding at all?
Perhaps, for example, they have an intelligent system that tracks draw calls between frames. It wouldn't be too hard to match up geometry from one frame to another, and then - in theory - determine the change in transform on that draw call.
I'd imagine that for the majority of games such a conceptually simple system would produce quite good results. The problems would occur with complex shading (such as the BioShock water effects mentioned in the Gizmodo article).
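
Purely as an illustration of the bookkeeping involved (the Matrix4 type, the interception hook and the emitMotionHint callback are all assumptions for the sketch, not anything OnLive has described):

#include <unordered_map>
#include <cstdint>
#include <cstddef>

struct Matrix4 { float m[16]; };        // placeholder matrix type

struct DrawKey {                        // "same buffers, same textures, same shader"
    uint32_t vertexBuffer, indexBuffer, texture, shaderProgram;
    bool operator==(const DrawKey& o) const {
        return vertexBuffer == o.vertexBuffer && indexBuffer == o.indexBuffer &&
               texture == o.texture && shaderProgram == o.shaderProgram;
    }
};

struct DrawKeyHash {
    size_t operator()(const DrawKey& k) const {
        return (size_t)k.vertexBuffer * 73856093u ^ (size_t)k.indexBuffer * 19349663u ^
               (size_t)k.texture * 83492791u ^ (size_t)k.shaderProgram;
    }
};

// Transforms recorded per draw call, keyed by the resources it bound.
static std::unordered_map<DrawKey, Matrix4, DrawKeyHash> g_previousFrame;
static std::unordered_map<DrawKey, Matrix4, DrawKeyHash> g_currentFrame;

// Called from wherever the API stream is intercepted, once per draw call.
void onDrawCall(const DrawKey& key, const Matrix4& modelViewProjection,
                void (*emitMotionHint)(const Matrix4& prev, const Matrix4& curr))
{
    auto it = g_previousFrame.find(key);
    if (it != g_previousFrame.end()) {
        // Same geometry as last frame: hand both transforms to whatever
        // turns them into motion hints for the encoder.
        emitMotionHint(it->second, modelViewProjection);
    }
    g_currentFrame[key] = modelViewProjection;
}

void onFrameEnd() {                     // rotate the per-frame tables
    g_previousFrame.swap(g_currentFrame);
    g_currentFrame.clear();
}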

If you can leave the motion prediction to a system that, for the most part, is 100% accurate (for flat surfaces), then how complex does the encoding become?

I have no idea, but I'd be interested if anyone else has any insight.
 
That's really interesting Graham, great point. But I think it's time for you to go to bed mister; it's 2.27am, depending on whether or not you believe daylight saving is over.
 
Oh, I think their technology absolutely must leverage access to information early in the rendering process. I think that's why they aren't doing what everybody says they should do and just make money on their encode tech for conventional uses like video teleconferencing. Aside from the fact that no one is really clamoring for a better solution in those markets, especially not one that requires custom hardware, their technology may just not be applicable to regular video feeds.
 
... My impression is that the major factor in video quality is the ability of the compression algorithm to predict movement between frames....

Well..
Perhaps the key parts of this 'interactive algorithm' are not part of the video encoding at all?
Perhaps, for example, they have an intelligent system that tracks draw calls between frames. It wouldn't be too hard to match up geometry from one frame to another, and then - in theory - determine the change in transform on that draw call....

If you can leave the motion prediction to a system that, for the most part, is 100% accurate (for flat surfaces), then how complex does the encoding become?

I have no idea, but I'd be interested if anyone else has any insight.

Draw calls are somewhat predictable, but their granularity is too coarse and the information they would be able to provide the encoder is insufficient.

A draw call doesn't tell the encoder much about what a pixel will end up being on screen.

Can you say what this would produce on screen, much less what a single pixel will be:
glDrawElements(GL_TRIANGLES,oglVboMetaData->_indexCount,GL_UNSIGNED_INT,0);

Neither could the encoding algorithm until this call has finished.

You are suggesting that the encoding algorithm should predict the output of "drawing calls/shaders/custom rendering algorithms."

That isn't possible given the information that would be provided to the algorithm, and if the information were available it would basically have to render the scene and then "guess" what gets drawn next at the pixel level. If such an all-encompassing algorithm existed, I would say OnLive was the "wrong" idea to try and make money with.

Video encoders do not have to run the rendering system in order to work effectively, and this is a great saving in both speed and complexity.
 
Having just the geometry and trying to guess the next frame from that, doesn't it mean that rather than trying to solve a segmentation problem in 2 dimensions you're having to do it in three dimensions (or do we expect the geometry to be marked with information to make sense of it)? And I remember segmentation being a really, really, really hard problem.
 
Even if they really can get the technology working over the Internet, I don't see this succeeding from a market point of view

I have not followed the OnLive discussion diligently. But in general, I think the business folks will start with some low-hanging fruit as a beachhead first. For example, they may try to attack the better-connected, high-speed Korean networks initially. Or they may try to zoom in on hotel video + game-on-demand services.

If they do it that way, they may have slightly more time to perfect the technology in the wild and make some money in the meantime. Basically, prove the business model.


Sony's PS Cloud will have to follow a similar route. If they try to launch the service on the net for everyone immediately, the risk is too high (because the experience is too hard to control *and* the investment is too big).
 
Oh, I think their technology absolutely must leverage access to information early in the rendering process. I think that's why they aren't doing what everybody says they should do and just make money on their encode tech for conventional uses like video teleconferencing. Aside from the fact that no one is really clamoring for a better solution in those markets, especially not one that requires custom hardware, their technology may just not be applicable to regular video feeds.

Actually that's exactly what they want to do. Apparently the thinking is that video games are the most difficult type of mainstream video to encode (which is true), so if they can crack that, they can crack anything. Teleconferencing, streaming movies, they want it all. Basically they want their client on everyone's PC/Mac and they want their box in everyone's home, and OnLive is seen as the vehicle to do it.
 
Actually that's exactly what they want to do. Apparently the thinking is that video games are the most difficult type of mainstream video to encode (which is true), so if they can crack that, they can crack anything. Teleconferencing, streaming movies, they want it all.
In which case they can't be using backbuffer information to aid compression. It has to be based solely on 2D image data from one frame to the next.
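
For reference, 'solely 2D' motion estimation is essentially block matching: for each macroblock, search the previous frame for the best-matching block. A deliberately naive full-search sketch, just to show that the only inputs are the two frames' pixels:

#include <cstdint>
#include <cstdlib>
#include <climits>

struct MotionVector { int dx, dy; };

// Find the offset in 'prev' whose 16x16 block best matches the block at
// (blockX, blockY) in 'curr', by lowest sum of absolute differences (SAD).
// Real encoders use much smarter searches, but the inputs are the same.
MotionVector estimateBlockMotion(const uint8_t* prev, const uint8_t* curr,
                                 int width, int height,
                                 int blockX, int blockY,     // top-left of the block in 'curr'
                                 int searchRange)            // e.g. +/-16 pixels
{
    MotionVector best = {0, 0};
    long bestSad = LONG_MAX;

    for (int dy = -searchRange; dy <= searchRange; ++dy) {
        for (int dx = -searchRange; dx <= searchRange; ++dx) {
            int px = blockX + dx, py = blockY + dy;
            if (px < 0 || py < 0 || px + 16 > width || py + 16 > height)
                continue;                                     // candidate falls off the frame

            long sad = 0;
            for (int y = 0; y < 16; ++y)
                for (int x = 0; x < 16; ++x)
                    sad += std::abs(int(curr[(blockY + y) * width + blockX + x]) -
                                    int(prev[(py + y) * width + px + x]));

            if (sad < bestSad) { bestSad = sad; best = {dx, dy}; }
        }
    }
    return best;
}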
 
Well, let's not rule it out, because renting out expensive apps is also part of the game, but I imagine that would be 2D-based too. It may well be that multiple codecs are in play, depending on content. Apparently HP's Remote Graphics software does the same (described to me by one major industry player as the best thing they've ever seen).
 
The backbuffer doesn't store enough information for the encoding algorithm.

The state of the backbuffer is much too volatile, in that in many cases information is written to it only to be overwritten in the very next instant. The encoder would have to be incredibly intelligent to figure out which pixels are still in flight, which are finalized, which will not change in the next frame, and how the rest *will* change in the next frame... all without access to the engine/game code.

If they could predict changes to the backbuffer itself then they would have solved an immensely complex task. However, what they would have is an engine technology (an amazing one at that) that wouldn't apply to video encoding as a whole.

-------------------------------------------------------------------------------------

If their encoder is as proficient as claimed they could literally wipe the floor with the HD broadcast market...or any other market which depends heavily on video encoding. Given the same luxury of compressing data offline like everyone else they should be able to produce unmatched quality across the board.

Why would they target gaming alone when they could have it all?
 
The point of the Eurogamer feature was to attempt to understand how they might be doing it, and to address head-on the technical issues that everyone else was skirting around for some reason. Perlman asked for scepticism. In fact he demanded it, so he got it.

I think the Eurogamer article has a lot of merit, but that it did IMO stray at a couple of points away from "balanced" conjecture (if there is such a thing) and into the territory of passing judgement when not enough was really known and understood.

Talking about injunctions against the BBC (obonicus was exaggerating I know) seemed a bit much considering that none of us really understand exactly how they are intending to achieve the things they hope to, but all of us (including me) are making comments and assertions.

I'm more interested in how many CPUs and GPUs are being used per instance and also how the bandwidth ceiling inherent in PC architecture is being managed in order to run 10 game instances simultaneously. I am assuming that the current APIs and the use of video RAM wouldn't take kindly to constantly switching between different game instances. Multiple micro-systems on a single motherboard? Surely one GPU per instance?

I'm assuming that there will be a number of factors determining which combinations of software get run where. For every person currently out there playing Crysis, there are several people playing (for example) Guild Wars and The Sims. Some of these games may even run several to a core on something like a fast Core i7.

As you say, graphics will be a big issue but I think they will still be able to run at greater than 1 client per graphics card. I can almost run Portal at playable speeds at 720 x 480 on a laptop with TWO (count 'em) pixel shaders and no dedicated video ram. I imagine a 4870 or 280 with 1GB of video ram would have no trouble running 4 (or maybe even 8) instances of the game at acceptable quality at 640 x 480. And three 4870 X2's per system is possible, if a little power hungry.

I'm guessing they have a number of algorithms that balance things like CPU load, GPU load (including balancing high and low resolution rendering), memory footprint, memory bandwidth, and maximum game instances that the video encoder card can handle, to find the optimal load per server.
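
Purely as an illustration of that kind of balancing, the admission-control core could be as dull as a per-server resource check (every field and figure here is invented for the sketch):

// Toy admission check: can this server take one more game instance?
struct InstanceCost  { float cpu, gpu, ramGB, encoderSlots; };
struct ServerLoad    { float cpu, gpu, ramGB, encoderSlots; };
struct ServerLimits  { float cpu, gpu, ramGB, encoderSlots; };

bool canHost(const ServerLoad& load, const ServerLimits& cap, const InstanceCost& game)
{
    return load.cpu          + game.cpu          <= cap.cpu &&
           load.gpu          + game.gpu          <= cap.gpu &&
           load.ramGB        + game.ramGB        <= cap.ramGB &&
           load.encoderSlots + game.encoderSlots <= cap.encoderSlots;
}

// Pick the server with the most GPU headroom that can still fit the game
// (a "worst fit" heuristic -- one of many plausible policies).
int pickServer(const ServerLoad* loads, const ServerLimits* caps, int serverCount,
               const InstanceCost& game)
{
    int best = -1;
    float bestHeadroom = -1.0f;
    for (int i = 0; i < serverCount; ++i) {
        if (!canHost(loads[i], caps[i], game)) continue;
        float headroom = caps[i].gpu - loads[i].gpu;
        if (headroom > bestHeadroom) { bestHeadroom = headroom; best = i; }
    }
    return best;                         // -1 means no server can take it
}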
 
Draw calls are somewhat predictable, but their granularity is too coarse and the information they would be able to provide the encoder is insufficient.

A draw call doesn't tell the encoder much about what a pixel will end up being on screen.

Can you say what this would produce on screen, much less what a single pixel will be:
glDrawElements(GL_TRIANGLES,oglVboMetaData->_indexCount,GL_UNSIGNED_INT,0);

Neither could the encoding algorithm until this call has finished.

You are suggesting that the encoding algorithm should predict the output of "drawing calls/shaders/custom rendering algorithms."

That isn't possible given the information that would be provided to the algorithm, and if the information were available it would basically have to render the scene and then "guess" what gets drawn next at the pixel level. If such an all-encompassing algorithm existed, I would say OnLive was the "wrong" idea to try and make money with.

Video encoders do not have to run the rendering system in order to work effectively, and this is a great saving in both speed and complexity.

Yes. Basically.
What I was suggesting is really quite simple conceptually. It certainly wouldn't be simple to implement - but it's quite within the realm of possibility.
It's also probably overkill, and I'm sure there are much simpler alternatives (one of which I'll get to in a moment...)

I think, if you were smart about it, you could monitor the API calls made by an application, and using them, you could match up draw calls between frames (so, match this draw call because it is using these buffers, these textures, etc). I'd bet you could pick up 90% of them fairly reliably.
Given this, it wouldn't be too much of a leap to implement a system that pulls the shaders apart and works out how the geometry is transformed. And from this, determine how the transform has changed between frames.
Then it's a 'simple' matter of rendering motion vectors to another render target using the same geometry.
(I know it'd be far more complex than this :yes:)

It's not an easy solution. But in theory, it's a solution that is possible and would provide very accurate motion information to a video encoder. (Assuming this is even the problem, as I say, I'm not an expert)

But the much simpler answer is that the developers need to implement this to support OnLive. Crysis already renders motion vectors, after all.
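
For anyone unfamiliar with the technique being referred to, here's a rough sketch of a standard velocity-buffer shader pair: render the same geometry with this frame's and last frame's transforms and write the screen-space difference to a separate render target (GLSL embedded as C++ strings; the uniform names are made up for the example, and this isn't anything OnLive or Crytek has published):

// Vertex shader: transform each vertex by both the current and previous MVP.
static const char* kVelocityVS = R"(
    #version 120
    uniform mat4 uCurrMVP;   // this frame's model-view-projection
    uniform mat4 uPrevMVP;   // last frame's, cached per draw call
    varying vec4 vCurrClip;
    varying vec4 vPrevClip;
    void main() {
        vCurrClip   = uCurrMVP * gl_Vertex;
        vPrevClip   = uPrevMVP * gl_Vertex;
        gl_Position = vCurrClip;
    }
)";

// Fragment shader: per-pixel motion in normalised device coordinates.
static const char* kVelocityFS = R"(
    #version 120
    varying vec4 vCurrClip;
    varying vec4 vPrevClip;
    void main() {
        vec2 motion = vCurrClip.xy / vCurrClip.w - vPrevClip.xy / vPrevClip.w;
        gl_FragColor = vec4(motion * 0.5 + 0.5, 0.0, 1.0); // packed into [0,1]
    }
)";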

Well. That's my thoughts on the matter anyway.
 
Talking about injunctions against the BBC (obonicus was exaggerating I know) seemed a bit much considering that none of us really understand exactly how they are intending to achieve the things they hope to, but all of us (including me) are making comments and assertions.

But that's not really the problem; the problem was the BBC just letting Perlman talk and at no point going 'whoa, hold on, what you said is pretty far-fetched, you'll need to elaborate a bit'. As a result, it looked like Perlman's own blog. And it's not just the BBC's fault, everyone does this. Almost all of the gaming press is going 'well, we don't know, why not give him the benefit of the doubt?', which is... well, it's how the gaming press behaves, and that's generally not good.

Being an optimist is all well and good, but there are reasons for folks to doubt this, and the OnLive people have done nothing but wave their hands and say 'nah, none of those things is an issue'.
 
Yes. Basically.
What I was suggesting is really quite simple conceptually. It certainly wouldn't be simple to implement - but it's quite within the realm of possibility.
It's also probably overkill, and I'm sure there are much simpler alternatives (one of which I'll get to in a moment...)

I think, if you were smart about it, you could monitor the API calls made by an application, and using them, you could match up draw calls between frames (so, match this draw call because it is using these buffers, these textures, etc). I'd bet you could pick up 90% of them fairly reliably.
Given this, it wouldn't be too much of a leap to implement a system that pulls the shaders apart and works out how the geometry is transformed. And from this, determine how the transform has changed between frames.
Then it's a 'simple' matter of rendering motion vectors to another render target using the same geometry.
(I know it'd be far more complex than this :yes:)

It's not an easy solution. But in theory, it's a solution that is possible and would provide very accurate motion information to a video encoder. (Assuming this is even the problem, as I say, I'm not an expert)

But the much simpler answer is that the developers need to implement this to support OnLive. Crysis already renders motion vectors, after all.

Well. That's my thoughts on the matter anyway.

Accurate motion prediction is very important to video encoding. There are posts in this thread that validate your assumption.

---------------------------------------------------

Geometry transformations only begin to define what your pixels will ultimately look like.

You have transformation, shading, lighting, culling, sorting, occlusion, blending and so on to account for... which are affected by asset-management systems connected to the scene graph, such as LOD... which are ultimately affected by user input.

Having the API calls isn't enough. Each call has a parameter list which affects the output of the call. You would also have to predict what the parameters to each call would be, and what the expected output of the aggregate of all calls that affect a pixel would be, before you could accurately predict what the state of a pixel will be "next."

You have to predict both what the rendering system (and the systems that affect it) will do and what its output will be, before it runs, before you can use that information to predict what will happen to the pixels.

The API calls and positional data of the geometry are not enough information to feed to the encoding algorithm. You cannot know the output of shaders/culling/blending etc. before you run them, or there would be no point in running them in the first place.

OR

You could take what happened with pixels and guess what they will do next based upon a cache of the states of previously rendered pixels.
 
Maybe another way of working back from this is to ask what you can achieve with a 2W, $20 ASIC or FPGA encoder chip. Similarly, what should we be expecting client-side in terms of decoding from a 1MB piece of client code or an ultra-cheap 'micro console'? I'd suggest that the far-out compression schemes aren't going to work on such menial hardware.

Let's not forget that the encoder has to encode in 1ms (well, let's be generous and say it's actually 16ms), while client-side it has to decode a frame in 16ms - or 32ms if it is 30fps, as I suspect it is.
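
As a rough sanity check on those numbers, the end-to-end budget stacks up something like this (the network figure is an assumption, not an OnLive number):

#include <cstdio>

int main() {
    const double renderMs  = 16.0;   // server renders at ~60fps
    const double encodeMs  = 1.0;    // Perlman's claimed 1ms encode
    const double networkMs = 30.0;   // assumed round trip to a nearby data centre
    const double decodeMs  = 16.0;   // client decode at 60fps (32ms if 30fps)

    std::printf("button press to photons: ~%.0f ms\n",
                renderMs + encodeMs + networkMs + decodeMs);  // ~63 ms, before display lag
    return 0;
}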
 