AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks: 1 vote (0.6%)
  • Within a month: 5 votes (3.2%)
  • Within a couple of months: 28 votes (18.1%)
  • Very late this year: 52 votes (33.5%)
  • Not until next year: 69 votes (44.5%)

Total voters: 155 (poll closed)
I've considered precisely the same thing before, but what kind of latency is there for GPU readback? Can it be done asynchronously like HDD access or will it stall the CPU during this time?
If you are clever about it, it's going to cost you nothing: you just need as many 1x1 exposure textures as you might have frames in flight (across all your processors).
Typically that's 3 or 4 textures on PC (while on PS3 I have done it with just one, but that's another story...), so that you can lock one of them without running the risk of stalling the CPU.
A few frames' latency, in this particular case, is not really an issue.
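For illustration, a minimal C++/Direct3D 9 sketch of that ring-of-readback-copies idea: keep a few 1x1 system-memory surfaces in flight and only ever lock the oldest one, which the GPU finished writing several frames ago. The names (ExposureRing, kFramesInFlight) and the R32F format are my own assumptions, not anything from this thread.

Code:
#include <d3d9.h>

static const int kFramesInFlight = 4;   // 3 or 4 on PC, as suggested above

struct ExposureRing
{
    IDirect3DSurface9* staging[kFramesInFlight];   // system-memory 1x1 copies
    int frame;

    void Create(IDirect3DDevice9* dev)
    {
        frame = 0;
        for (int i = 0; i < kFramesInFlight; ++i)
            dev->CreateOffscreenPlainSurface(1, 1, D3DFMT_R32F,
                D3DPOOL_SYSTEMMEM, &staging[i], NULL);
    }

    // Call once per frame, after the 1x1 exposure render target was drawn.
    // Returns the exposure measured kFramesInFlight - 1 frames ago.
    float Update(IDirect3DDevice9* dev, IDirect3DSurface9* exposureRT)
    {
        // Queue a copy of this frame's exposure into the current ring slot.
        dev->GetRenderTargetData(exposureRT, staging[frame % kFramesInFlight]);

        // Lock the oldest slot: the GPU finished writing it long ago, so this
        // should not stall. (The first few frames hold stale data, so either
        // clear the surfaces at creation or ignore the first results.)
        float exposure = 1.0f;
        IDirect3DSurface9* oldest = staging[(frame + 1) % kFramesInFlight];
        D3DLOCKED_RECT lr;
        if (SUCCEEDED(oldest->LockRect(&lr, NULL, D3DLOCK_READONLY)))
        {
            exposure = *(const float*)lr.pBits;
            oldest->UnlockRect();
        }
        ++frame;
        return exposure;
    }
};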
 
If you are clever about it, it's going to cost you nothing: you just need as many 1x1 exposure textures as you might have frames in flight (across all your processors).
Typically that's 3 or 4 textures on PC (while on PS3 I have done it with just one, but that's another story...), so that you can lock one of them without running the risk of stalling the CPU.
A few frames' latency, in this particular case, is not really an issue.
I understand that, but I was wondering more about the time between locking and reading the value, even when the resource you're locking isn't being used by anything. I assume this won't be as fast as a simple memory access.

BTW, what are your thoughts on doing HDR this way? It's a moving window of range that itself is "only", say, 8000:1, but can be anywhere on the luminance scale you want it to be. I discussed this with you in some thread way back, but IIRC you weren't convinced at the time.
 
I understand that, but I was wondering more about the time between locking and reading the value, even when the resource you're locking hasn't been used. I assume this won't be as fast as a simple memory access.
Driver overhead aside I don't see why reading back 4 bytes should be that slow, but maybe console development spoiled me :)

BTW, what are your thoughts on doing HDR this way? It's a moving window of range that itself is "only", say, 8000:1, but can be anywhere on the luminance scale you want it to be. I discussed this with you in some thread, but IIRC you weren't convinced at the time.
IIRC I wasn't convinced by the approach Valve used for their HDR (I mean the way they compute, or rather determine, exposure), but deferring exposure usage is fine; in fact Heavenly Sword does that (mostly to speed up tone mapping, as it enabled me to remove a million tex2D() calls per frame with a simple scalar constant).
 
Good example, but this can be fixed quite easily as you don't really need the GPU to read back that value.
Let the CPU do it (in the following frame(s)) and send it back to the GPU(s) as a pixel shader constant. No sync points between GPUs and no need to sample exposure on a per-pixel basis anymore while tone mapping. Double win :)
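Continuing the sketch above (same caveats; the register choice c0 is arbitrary and the helper name is hypothetical): once the CPU has an exposure value from a few frames back, it can be fed to the tone-mapping pass as a pixel shader constant instead of a 1x1 texture.

Code:
#include <d3d9.h>

// Hypothetical helper: push last frame's exposure to the tone-mapping pixel
// shader as a constant (the shader would read c0.x and c0.y).
void SetToneMapExposure(IDirect3DDevice9* dev, float exposure)
{
    const float c0[4] = { exposure, 1.0f / exposure, 0.0f, 0.0f };
    dev->SetPixelShaderConstantF(0, c0, 1);
    // ...then draw the full-screen tone-mapping pass: no per-pixel tex2D()
    // of the exposure map, and no GPU-to-GPU sync point under AFR.
}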

Sure. And if you don't read back, it's always an option to sample in the vertex shader and pass the value as an interpolator, which takes the sampling cost out of the pixel shader. But the problem is, of course, how much time are developers going to spend optimizing for CrossFire/SLI? About as much as optimizing for S3 cards, I suppose, since I guess it represents about the same share of the market.
And this example is of course one of the simpler ones to find a solution for; in other cases it may not be reasonable to let it lag a few frames. Not to mention all the cases where perfectly reasonable optimizations that work well on single-GPU setups have problems on multi-GPU. For instance, in your average racing game, you may want to update a cubemap for the car reflections, and to speed things up you just update one face each frame. That reduces the overhead to a fraction of the cost of updating all faces, but for multi-GPU it introduces a sync point. If the update is relatively early in the frame it may not have to be a problem, but if it's late the GPUs may end up being idle for much of the frame.
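To make the round-robin cubemap example concrete, here is a minimal C++/D3D9 sketch, assuming a cube render target and a hypothetical RenderSceneForFace() that renders the scene with the view matrix for the chosen face; none of these names come from the thread.

Code:
#include <d3d9.h>

// Hypothetical: renders the scene using the view matrix for this cube face.
void RenderSceneForFace(IDirect3DDevice9* dev, D3DCUBEMAP_FACES face);

// Refresh one reflection face per frame: a full cube update takes six frames.
void UpdateOneCubeFace(IDirect3DDevice9* dev, IDirect3DCubeTexture9* cubeRT,
                       IDirect3DSurface9* depth, unsigned frameIndex)
{
    D3DCUBEMAP_FACES face = (D3DCUBEMAP_FACES)(frameIndex % 6);

    IDirect3DSurface9* faceSurf = NULL;
    if (SUCCEEDED(cubeRT->GetCubeMapSurface(face, 0, &faceSurf)))
    {
        dev->SetRenderTarget(0, faceSurf);
        dev->SetDepthStencilSurface(depth);
        dev->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, 0, 1.0f, 0);
        RenderSceneForFace(dev, face);
        faceSurf->Release();
    }
    // Under AFR the other GPU never rendered the previous five faces, so the
    // driver has to transfer the whole cube map between GPUs: the sync point
    // described above.
}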
 
Ailuros,

What exactly is your point? That you think r8xx could double the FLOP performance over r7xx?

Honestly, why not? If it doesn't double, then at least a healthy increase. What's the theoretical floating-point throughput difference between RV670 and RV770? Beyond that purely theoretical factor, the RV770 of course isn't that many times faster than the RV670 in practice.
 
Driver overhead aside I don't see why reading back 4 bytes should be that slow, but maybe console development spoiled me :)

IIRC I wasn't convinced by the approach Valve used for their HDR (I mean the way they compute, or rather determine, exposure), but deferring exposure usage is fine; in fact Heavenly Sword does that (mostly to speed up tone mapping, as it enabled me to remove a million tex2D() calls per frame with a simple scalar constant).

On our last engine I also just locked the 1x1 exposure map, and used that value as a shader constant later on. It's very fast on PC also, if you have multiple 1x1 textures to prevent lock stalling. Only the last texture is really needed, and 1x1 textures are very small (the memory overhead is really nothing).
 
On our last engine I also just locked the 1x1 exposure map, and used that value as a shader constant later on. It's very fast on PC also, if you have multiple 1x1 textures to prevent lock stalling. Only the last texture is really needed, and 1x1 textures are very small (the memory overhead is really nothing).
I'm glad to hear that works well on PC too. I thought multiple 1x1 textures would be able to avoid lock stalling, but I never tried to implement it for real.
On PS3 I just used a single 1x1 rendertarget stored in XDRAM; RSX renders directly to it and CELL reads it back and 'caches' it right after a flip().
 
It was? The two chips that got to silicon, Pyramid3D and Axe, were at least both single chips.

Glaze3D designs were a long time after Pyramid 3D: this was the "Extreme Bandwidth Architecture" part, with embedded-DRAM. I definitely recall discussions about how the multi-chip versions would divide the screen up into tiles, and that this choice was made because it would thrash the texture caches less than (say) the Voodoo 2 SLI approach of rendering alternate horizontal scan-lines.

My memory is a little hazy but I think they may have talked about a 4-chip version of this, as well as 2-chip.


Timeframes:

Pyramid3D was developed around 1995-1996 and announced in early 1996 (thus mid-'90s). There were at least two different chips, one with an on-chip geometry engine/T&L, and one without that was just a rasterizer. It should have been released in late 1996 or early 1997.

Glaze 3D (which I guess never taped out) was announced in 1998 and was due to be released in 1999 (thus late '90s).

XBA (Extreme Bandwidth Architecture) was either some evolution of Glaze 3D with eDRAM or an implementation of Glaze 3D; it was announced in 2000.

Axe and Hammer were early this decade (2000, 2001) evolutions or implementations of XBA that were DX8 and DX9 respectively.
 
Hey, just for a laugh, since we're talking about old-school multi-GPU configurations, I thought I would get my 3Dlabs Oxygen GMX2000 card down from the loft (attic).

I bought it in 1999 for £1200 ($2400 at today's exchange rate).

It had 96 MB of memory.
3 fans. God knows how many different chips.
And it was a beast.

http://www.web2suite.com/temp/gmx2000.jpg

I used it for visualisation of 3D work in Softimage 3.8 and XSI v1 (beta).

Crazy how far things have moved on.
 
Yeah, it wouldn't fit in my respectably sized case until I took a small plastic handle off the end. And then it fit with about 5 mm to spare at the end of the case!

Crrrr AZY.
 
ATI working on a DirectX 11 card

Comes next year
http://www.fudzilla.com/index.php?option=com_content&task=view&id=8696&Itemid=1

After the success of its RV770 and refreshed DirectX 10.1 cards, ATI wants to continue its winning streak with a DirectX 11 card.

We still don't know the codename, but let's assume that the next-generation performance/mainstream chip is codenamed RV870. This chip will likely have DirectX 11 support; at least, this is what our sources believe at this time.

There is an important indication that RV870 might launch before DirectX 11 becomes available, which can only be a good thing for ATI, but if we were betting people we would suggest that such a chip should be available a year from now, in late Q2 or early Q3 2008.

DX11 support in RV870!
 
There is an important indication that RV870 might launch before DirectX 11 becomes available

Reminds me of R300... no DX9 API was available when the Radeon 9700 Pro launched in August 2002.
 
I guess it's not too surprising. I'm not sure about all the features being added in DX11, but doesn't the current hardware support most of them? I wouldn't think very many hardware changes would be necessary.
 
I guess it's not too surprising. I'm not sure about all the features being added in DX11, but doesn't the current hardware support most of them? I wouldn't think very many hardware changes would be necessary.

AFAIK DX10 and DX10.1 cards can support DX11; they just won't support all the features, I guess, since certain hardware changes must be made.

That said, the ATI cards seem to support nearly all of the DX11 feature list released so far (the tessellation unit, shader computations, etc.), so I'm not sure what the big differences will be.
 
AFAIK DX10 and DX10.1 cards can support DX11; they just won't support all the features, I guess, since certain hardware changes must be made.

That said, the ATI cards seem to support nearly all of the DX11 feature list released so far (the tessellation unit, shader computations, etc.), so I'm not sure what the big differences will be.

The "DX10 & 10.1" cards can support DX11 is most likely just the very same that was true before, as long as you had drivers, even a Voodoo3 "supported" DX9 - it could run with it, sure it didn't sport even DX7 featureset but the drivers were still compatible (3rd party drivers, that is)

"compute shaders" will most likely be available for 10/10.1 hardware, too, though, and it's possible that tesselation unit on HD3/4k can be used too, but that's about it, IMO.
 
That said, the ATI cards seem to support nearly all of the DX11 feature list released so far (the tessellation unit, shader computations, etc.), so I'm not sure what the big differences will be.
AFAIK the RV670 tessellation unit doesn't match the DX11-style tessellation pipeline. Same story regarding AMD ALUs suddenly being able to support SM5.0; I find it quite unlikely.
 
AFAIK the RV670 tessellation unit doesn't match the DX11-style tessellation pipeline. Same story regarding AMD ALUs suddenly being able to support SM5.0; I find it quite unlikely.

I have no real idea, but I remember reading that the tessellation unit on RV670 was a bit different from the R600 one, and that RV770's would be identical to RV670's. The only real reasons I can come up with for the initial modifications would be either compliance with DX11's tessellation or space savings.
 
NV30 supported DX9 SM2, but it was terribly slow. I would assume hardware modifications are required to properly run the DX11 SM5 code path; otherwise you have a problem.
 
NV30 supported DX9 SM2, but it was terribly slow. I would assume hardware modifications are required to properly run the DX11 SM5 code path; otherwise you have a problem.

I don't think anyone really thinks the cards would be capable of SM5.0, but tessellation and compute shaders could be separated from SM5.0 (as in, not require an SM5.0-supporting GPU: just an SM4.0 part for compute shaders and a tessellation unit for tessellation). Think of geometry instancing, an "SM3.0 card" class feature which could be supported by ATI SM2.0 GPUs as well; now just remove the need for "hacks" to use the feature on older cards and you're good to go.
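For what it's worth, a hedged C++ sketch of how that kind of capability check could be probed in code, assuming the D3D11 API as it eventually shipped (feature levels plus CheckFeatureSupport); none of this comes from the thread itself, and the function name is illustrative.

Code:
#include <d3d11.h>

// Sketch: create a device on 10.x-class hardware and ask whether compute
// shaders (CS 4.x) are exposed there. Tessellation, by contrast, needs
// feature level 11_0.
bool SupportsDownlevelComputeShaders()
{
    const D3D_FEATURE_LEVEL wanted[] = {
        D3D_FEATURE_LEVEL_10_1, D3D_FEATURE_LEVEL_10_0
    };

    ID3D11Device*        device  = NULL;
    ID3D11DeviceContext* context = NULL;
    D3D_FEATURE_LEVEL    got;

    if (FAILED(D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, 0,
                                 wanted, 2, D3D11_SDK_VERSION,
                                 &device, &got, &context)))
        return false;

    // Optional cap on 10.0/10.1 hardware: compute shaders via shader model 4.x.
    D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS opts = {};
    device->CheckFeatureSupport(D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS,
                                &opts, sizeof(opts));

    bool ok = opts.ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x != FALSE;
    context->Release();
    device->Release();
    return ok;
}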
 