GPU vs CPU Architecture Evolution

I assume they modified CryEngine 2 to get that working.
Maybe...

But there's still all the other points.

BTW, the fact that a Z-related optimization buffer can be streamed in and out of memory doesn't mean that the internal buffer isn't a fixed-size resource. It's probably implemented as a cache too, with all the thrashing problems that go with it.
 
Interesting. How many bits do you need per pixel? 1 bit per 4 pixels for Z compression? That would work out to be 60KB for a 1600x1200 render target. Not the end of the world in terms of additional bandwidth.

Not sure exactly how many bits are required, but it's not many. It's on a per-tile basis anyway: the stored depth value is a conservative low-res one, then you need a bit for whether the tile is compressed, and there might be a bit or two for other stuff too. It's going to be negligible on bandwidth in any case.
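
For what it's worth, the 60KB figure above checks out. Here is a rough sketch of the arithmetic in Python; the function name is made up for illustration, and the comparison against a 32-bit depth buffer is my own assumption, not something stated in the thread.

```python
def z_metadata_bytes(width, height, bits_per_group=1, pixels_per_group=4):
    """Size of the per-group Z metadata in bytes (hypothetical helper).

    Defaults follow the question above: 1 bit per 4 pixels.
    """
    pixels = width * height
    groups = -(-pixels // pixels_per_group)   # ceiling division
    bits = groups * bits_per_group
    return -(-bits // 8)                      # round up to whole bytes


metadata = z_metadata_bytes(1600, 1200)

# Assumption: a 32-bit (4 bytes per pixel) depth buffer for comparison.
depth_buffer = 1600 * 1200 * 4
ratio = metadata / depth_buffer

print(metadata, "bytes of metadata")
print(f"{ratio:.4%} of the depth buffer size")
```

Under those assumptions the metadata is under 1% of the depth buffer itself, which fits the "negligible" estimate.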
 
Just to be 100% clear: my whole point is around the fine grained context switching thing. If those guys are running Crysis at such low resolution, quality settings and frame rate that the GPU is completely idle 90% of the time then, yes, of course it's possible to run multiple instances. At that point, the whole thing becomes a software question and I basically lose interest. ;)

But the claim that you can increase the overall performance of an already busy GPU by a factor of 2 or 3, let alone 10, by fast context switching is, IMHO, "absurd".
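
To put numbers on that, here is a back-of-the-envelope sketch. It assumes an idealized model in which the only thing extra instances can gain is the GPU's idle time, with zero context-switch cost and perfect overlap; real hardware would do worse. The function name is hypothetical.

```python
def ideal_speedup_bound(utilization):
    """Upper bound on total throughput gain from filling idle time.

    utilization: fraction of time the GPU is already busy (0 < u <= 1).
    Idealized assumption: free context switches, perfect packing.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


# Idle 90% of the time (10% busy): up to 10x is possible in principle.
print(ideal_speedup_bound(0.10))

# An "already busy" GPU, say 80% utilized: at most 1.25x.
print(ideal_speedup_bound(0.80))

# A 2x or 3x gain would need the GPU to be busy at most this fraction:
for target in (2, 3, 10):
    print(target, "x needs utilization <=", 1.0 / target)
```

So in this model a 2x gain already requires the GPU to be at least half idle, and 10x requires the 90% idle case.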
 
I've heard stories of developers who wrote stress tests that made their GPU run far hotter than during a typical game. Personally, I also find that some games make my graphics card's fan rev up audibly while others keep it running below ambient noise...

I know, this isn't really exact science, but it doesn't sound all that absurd to me that today's complex games don't fully utilize the GPU. Shaders can have very different characteristics and there are many dependencies (both logical and physical) that prevent it from always keeping every resource busy.

Anyway, it looks like OTOY is working on a site and so they're probably preparing to announce some product and maybe will reveal some of the technology...
 
I'm not refuting any of that. If you have a shader that does nothing but calculations while hardly using textures, yes, TEX and ROP will be underused.

But it is a whole other thing to say that current hardware has the means to solve this by running multiple instances at the same time.
 
From the article linked above, RV770 = Holy crap on both axes :oops:
compute-efficiency-1.png

Interesting that G200 comes in the middle of the others on both axes, I'd have expected it to be an outlier in at least one.
 
Well, as David Kanter said himself, referring to the DP implementation in GT200: "I tried not to be too critical in that article, but it's pretty much crap."

Obviously, on an SP chart, the results would be quite different.
 
I'd like to see where RV870 lies in SP. :LOL:

And yeah, expect to see charts like these from now on in a lot of AMD's PR material to burst the CUDA balloon.
 
Actually, this chart is a bit misleading because not all DP implementations are equal. For example, DP in most CPUs is IEEE 754 compliant with all the fancy things, including exceptions, signaling NaNs, denormalized numbers, etc., which cost quite a bit to implement well. On the other hand, GPUs and CELL have various levels of support for 754 compliance. So it's no wonder they can have better DP efficiency.
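
As a small illustration of what denormal support means in practice, here is a sketch in Python running on a CPU with full IEEE 754 doubles. The `flush_to_zero` function is a hypothetical model of hardware that skips denormals; it is not meant to describe any particular GPU or CELL.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min   # smallest positive normal double


def flush_to_zero(x):
    """Hypothetical model: treat any denormal result as signed zero."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


# With full IEEE 754 support, results below the normal range degrade
# gradually instead of collapsing to zero.
tiny = SMALLEST_NORMAL / 4
print(tiny > 0.0)                   # denormal is kept as a nonzero value
print(flush_to_zero(tiny) == 0.0)   # the simplified model loses it

# NaNs propagate through arithmetic and never compare equal to themselves.
nan = float("nan")
print(math.isnan(nan + 1.0), nan == nan)
```

Handling the gradual-underflow case at full speed is one of the things that costs extra hardware, which is the point being made about the chart.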
 
Larrabee, if it were released in a form similar to the die shots from a while back, would interestingly be in the same general vicinity as RV870 for DP (assuming 2 GHz, it's 2 TF SP and 1 TF DP), since it was estimated at over 600 mm².

The per watt numbers would be interesting to see.
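
For reference, the per-area numbers implied by the estimate above work out as follows. This is only a sketch using the figures from the post (2 TF SP, 1 TF DP, roughly 600 mm²); since the die was estimated at over 600 mm², these are upper bounds on density, and the variable names are mine.

```python
# Figures taken from the estimate above; all are speculative.
SP_GFLOPS = 2000.0     # 2 TF single precision at an assumed 2 GHz
DP_GFLOPS = 1000.0     # 1 TF double precision
DIE_AREA_MM2 = 600.0   # "over 600 mm2", so densities below are upper bounds

sp_density = SP_GFLOPS / DIE_AREA_MM2
dp_density = DP_GFLOPS / DIE_AREA_MM2
dp_to_sp_ratio = DP_GFLOPS / SP_GFLOPS

print(f"SP: at most {sp_density:.2f} GFLOPS/mm^2")
print(f"DP: at most {dp_density:.2f} GFLOPS/mm^2")
print(f"DP runs at {dp_to_sp_ratio:.0%} of the SP rate")
```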
 