AMD: R8xx Speculation

Poll: How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs? (poll closed, 155 voters)

  • Within 1 or 2 weeks: 1 vote (0.6%)
  • Within a month: 5 votes (3.2%)
  • Within a couple of months: 28 votes (18.1%)
  • Very late this year: 52 votes (33.5%)
  • Not until next year: 69 votes (44.5%)

These hints appear to be orthogonal to what would be needed for an atomic operation. Hints don't prevent some inopportune memory traffic pattern from interfering.
I think this comes back to MfA's desire for lockable lines, e.g. wanting to keep atomic variables in-cache. But obviously there's a scalability problem there, too, as the count of concurrent atomics balloons (e.g. millions). Sorting the domain so that all the atomics come packed together in time and space, like Larrabee's tiled forward rendering, works around that scaling problem.
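A minimal C++ sketch of what I mean (nothing Larrabee-specific, just a hypothetical binner with per-tile counters): the front end appends triangles to per-tile bins, and the back end then processes one tile per core, so any atomic a core touches belongs to the tile it owns and can stay resident in that core's cache.

```cpp
// Minimal sketch, not Larrabee code: tile size, resolution and the per-tile
// counter below are illustrative assumptions.
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <vector>

struct Triangle { float x[3], y[3]; };

constexpr int kTile   = 64;                       // assumed tile edge in pixels
constexpr int kWidth  = 1920, kHeight = 1080;     // assumed render-target size
constexpr int kTilesX = (kWidth  + kTile - 1) / kTile;
constexpr int kTilesY = (kHeight + kTile - 1) / kTile;

struct TileBin {
    std::vector<uint32_t> tris;          // triangles overlapping this tile
    std::atomic<uint32_t> fragments{0};  // per-tile atomic: only the owning core updates it
};

static TileBin g_bins[kTilesX * kTilesY];

// Front end: append a triangle index to every bin its bounding box overlaps.
// Triangles crossing tile edges land in several bins (the "bin spread").
void BinTriangle(uint32_t idx, const Triangle& t) {
    const int x0 = std::max(0, int(*std::min_element(t.x, t.x + 3)) / kTile);
    const int x1 = std::min(kTilesX - 1, int(*std::max_element(t.x, t.x + 3)) / kTile);
    const int y0 = std::max(0, int(*std::min_element(t.y, t.y + 3)) / kTile);
    const int y1 = std::min(kTilesY - 1, int(*std::max_element(t.y, t.y + 3)) / kTile);
    for (int ty = y0; ty <= y1; ++ty)
        for (int tx = x0; tx <= x1; ++tx)
            g_bins[ty * kTilesX + tx].tris.push_back(idx);
}

// Back end: one core takes a whole tile, so its atomic traffic is confined
// to that tile's bin instead of contending with every other core.
void ShadeTile(TileBin& bin) {
    for (uint32_t tri : bin.tris) {
        (void)tri;  // rasterisation/shading of this triangle would happen here
        bin.fragments.fetch_add(1, std::memory_order_relaxed);  // stand-in for per-tile atomic work
    }
}

int main() {
    Triangle t{{10.f, 100.f, 70.f}, {10.f, 20.f, 90.f}};
    BinTriangle(0, t);
    for (auto& bin : g_bins) ShadeTile(bin);
}
```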

Jawed
 
I'm curious what happens in the case of bin split, where there is some redundant calculation for triangles that cross tile boundaries.
Atomics in that case might make the <5% figure the Larrabee paper stated more noticeable.
 
I'm curious what happens in the case of bin split, where there is some redundant calculation for triangles that cross tile boundaries.
Atomics in that case might make the <5% figure the Larrabee paper stated more noticeable.
:???: Which atomics?

Jawed
 
As far as standard x86 goes, there is a compare-and-exchange instruction, and a LOCK prefix that can make various integer operations atomic.
What the vector extensions might do, I do not know.
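For illustration, a small C++ snippet using the two primitives just mentioned; on x86 these std::atomic calls compile down to LOCK-prefixed instructions (XADD and CMPXCHG).

```cpp
// Small C++ illustration of the two x86 primitives mentioned above. On x86,
// fetch_add compiles to a LOCK-prefixed XADD and compare_exchange_* to
// LOCK CMPXCHG, which is what makes the read-modify-write atomic.
#include <atomic>
#include <cstdio>

std::atomic<int> counter{0};

// LOCK XADD path: unconditionally add, returning the previous value.
int bump() {
    return counter.fetch_add(1);
}

// LOCK CMPXCHG path: only store the new value if the counter still holds
// the value we expected; otherwise report failure.
bool claim_if_zero(int new_value) {
    int expected = 0;
    return counter.compare_exchange_strong(expected, new_value);
}

int main() {
    bump();                                  // counter: 0 -> 1
    std::printf("%d\n", claim_if_zero(42));  // prints 0: the exchange fails, counter is already 1
}
```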

I'm not sure exactly how much work is duplicated with bin spread, and how far in the rendering process this duplication goes.
If a triangle's complement of shaders is processed fully in both tiles, then something might overlap with the work done in other cores.
 
The only atomic effect of a triangle is at the pixel level, which is obviously bounded within a tile. With the triangles sorted by bin (tile), I can't see any atomic interaction between the instances of a given triangle.

Perhaps you're thinking of a scenario in which vertices are shaded twice: first time only executing the path that computes screen-space coordinates, in order to perform binning; second time to run the entire shader. But I still can't think what might be atomic in that context if a triangle was multiply binned.

I suspect the overheads associated with bin spread are to do with the resulting longer triangle lists and the costs of things like (re-)fetching attributes and (re-)rasterisation, which all increase the longer the triangle list is. The paper doesn't state what the increase in workload is, only that the triangle list effectively grows by up to ~5%.

Though I do wonder what happens with a G-buffer creation pass where the sheer quantity of render targets will cause a radical reduction in tile size. Couple that with MSAA and you could easily end up with 16x16 tiles. With a low ALU:TEX ratio it could end up being tough to hide texturing latency. I can imagine that swamping the bin-spread cost at the triangle level.
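Purely as back-of-the-envelope arithmetic (the G-buffer layout, MSAA level and cache budget below are my assumptions, not numbers from the paper), here is roughly why the tiles end up tiny:

```cpp
// Illustrative arithmetic only: the G-buffer layout, depth size, MSAA level
// and cache budget below are assumptions, not figures from the Larrabee paper.
#include <cstdio>

int main() {
    const int rt_count      = 4;           // assumed G-buffer render targets
    const int bytes_per_rt  = 8;           // RGBA16F per sample
    const int depth_bytes   = 4;           // assumed depth per sample
    const int msaa          = 4;           // 4x MSAA
    const int tile_budget   = 128 * 1024;  // assume the tile may use half of a 256 KB per-core L2

    const int bytes_per_pixel = msaa * (rt_count * bytes_per_rt + depth_bytes);  // 4 * 36 = 144
    const int max_pixels      = tile_budget / bytes_per_pixel;                   // ~910 pixels

    // ~910 pixels won't fit a 32x32 tile (1024 pixels), so a power-of-two tiler
    // would drop to 16x16: tiny tiles, and fewer pixels per tile over which to
    // hide texturing latency when ALU:TEX is low.
    std::printf("%d bytes/pixel -> at most %d pixels per tile\n", bytes_per_pixel, max_pixels);
}
```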

Jawed
 
The only atomic effect of a triangle is at the pixel level, which is obviously bounded within a tile. With the triangles sorted by bin (tile), I can't see any atomic interaction between the instances of a given triangle.

Perhaps you're thinking of a scenario in which vertices are shaded twice: first time only executing the path that computes screen-space coordinates, in order to perform binning; second time to run the entire shader. But I still can't think what might be atomic in that context if a triangle was multiply binned.

I was thinking about this in two parts.
The first was that the setup thread that distributes to tiles processes vertex and geometry shaders as well.
I went back to the Larrabee paper, and saw I had forgotten about the front-end geometry processing section that adds a serializing step. This seems particularly useful for geometry shaders--as I have recently run across some slides showing that they have an atomicity to their invocations.

The second part was the possibility that the pixels covered by a triangle have a high probability of atomically referencing common locations, due to the coherent nature of belonging to the same triangle, or of atomically referencing distinct locations that fall on the same cache line, which incurs the same penalty as contending for the same location.
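As a rough C++ sketch of that last point (illustrative only, nothing GPU-specific): two threads that update different counters still pay the contention penalty if those counters happen to share a cache line.

```cpp
// Sketch of the cache-line point: two counters that share one 64-byte line
// contend with each other even though their addresses differ, because each
// LOCK'd update has to own the whole line. Giving each counter its own line
// removes that false sharing. Structs and iteration counts are illustrative.
#include <atomic>
#include <cstdint>
#include <thread>

struct SameLine {                          // both counters land on one cache line
    std::atomic<uint32_t> a{0};
    std::atomic<uint32_t> b{0};
};

struct OwnLine {                           // one counter per 64-byte line
    alignas(64) std::atomic<uint32_t> a{0};
    alignas(64) std::atomic<uint32_t> b{0};
};

// Two threads hammer two *different* counters; with SameLine they still
// ping-pong the line between cores, with OwnLine they do not.
template <typename Counters>
void hammer(Counters& c) {
    std::thread t0([&] { for (int i = 0; i < 1000000; ++i) c.a.fetch_add(1); });
    std::thread t1([&] { for (int i = 0; i < 1000000; ++i) c.b.fetch_add(1); });
    t0.join();
    t1.join();
}

int main() {
    SameLine shared;   // analogue of distinct pixels whose atomics hit the same line
    OwnLine  padded;
    hammer(shared);
    hammer(padded);
}
```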

In the future, as the graphics pipeline becomes freer, programmers may gain further ability to inject explicitly atomic operations into various shaders, or some implicit contention may arise.
 
I was thinking about this in two parts.
The first was that the setup thread that distributes to tiles processes vertex and geometry shaders as well.
I went back to the Larrabee paper, and saw I had forgotten about the front-end geometry processing section that adds a serializing step. This seems particularly useful for geometry shaders--as I have recently run across some slides showing that they have an atomicity to their invocations.
As far as I can tell any ordering that constrains geometry creation/assembly is bounded by binning, so that atomicity is also bounded.

The second part was the possibility that the pixels covered by a triangle have a high probability of atomically referencing common locations, due to the coherent nature of belonging to the same triangle, or of atomically referencing distinct locations that fall on the same cache line, which incurs the same penalty as contending for the same location.
These are only reads, and regardless of duplicate count of any triangle, any common data fetched during distinct tile processing is, by definition, unchanging.

In the future, as the graphics pipeline becomes freer, programmers may gain further ability to inject explicitly atomic operations into various shaders, or some implicit contention may arise.
Yeah, the obvious case is D3D11's UAV, while it seems read-modify-write of render targets cannot be atomic. There are also implicit atomics such as appending to user-defined buffers.
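As a conceptual sketch (plain C++, not the actual D3D11/HLSL machinery, and the AppendBuffer class here is hypothetical): an append buffer boils down to an atomic bump of a hidden counter plus a store into the reserved slot, which is where the implicit atomic comes from.

```cpp
// Conceptual sketch of why appending is an "implicit atomic": heavy appending
// from many threads serialises on one hidden counter, much like an explicit
// interlocked add would.
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

template <typename T>
class AppendBuffer {                        // stand-in for an append/consume style buffer
public:
    explicit AppendBuffer(std::size_t capacity) : data_(capacity) {}

    bool append(const T& value) {
        // The hidden atomic every append pays for:
        uint32_t slot = count_.fetch_add(1, std::memory_order_relaxed);
        if (slot >= data_.size())
            return false;                   // buffer full, value dropped
        data_[slot] = value;
        return true;
    }

    uint32_t size() const {
        return std::min<uint32_t>(count_.load(), static_cast<uint32_t>(data_.size()));
    }

private:
    std::vector<T> data_;
    std::atomic<uint32_t> count_{0};
};

int main() {
    AppendBuffer<float> hits(1024);
    hits.append(0.5f);                      // roughly what a shader-side append boils down to
}
```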

Jawed
 
I wouldn't put too much stock in an article with nonsense like this:

This might mean that the original RV870 got cancelled and that the new DirectX 11 card will take over, simply as Nvidia probably goes directly to DirectX 11 in Q4, with its GT300.
 
I'm pretty certain ATI will have the first DX11 card out. They have a huge leg up on everyone else, as they've already done most of it (tessellator, etc.).

LRB and GT300 probably arrive about the same time.

DK
 
Err, did anyone ever even think that RV870 and GT300 would be anything other than DX11?
 
I'm pretty certain ATI will have the first DX11 card out. They have a huge leg up on everyone else, as they've already done most of it (tessellator, etc.).

LRB and GT300 probably arrive about the same time.

DK

ATI also had a big head start with unified shader architectures in the form of Xenos and yet it was NV that released the first PC unified shader GPU.
 
ATI also had a big head start with unified shader architectures in the form of Xenos and yet it was NV that released the first PC unified shader GPU.

I think that was down to many other issues, such as the 80nm problems. The difference here is that (at least we all think) RV870 still follows the small-GPU approach of RV770, making it less complicated than GT300. Factor in that they have had a tessellator in their PC-class GPUs for two years now, and perhaps some 40nm experience? I don't know... I don't want to go around believing every rumor on the web, but it's pretty believable to me that RV870 will be out before GT300. How long before? Who the heck knows.
 
I think that was down to many other issues, such as the 80nm problems. The difference here is that (at least we all think) RV870 still follows the small-GPU approach of RV770, making it less complicated than GT300. Factor in that they have had a tessellator in their PC-class GPUs for two years now, and perhaps some 40nm experience? I don't know... I don't want to go around believing every rumor on the web, but it's pretty believable to me that RV870 will be out before GT300. How long before? Who the heck knows.

What the item probably means is that AMD has received word that NV won't have this part ready in Q3, so they don't have to rush RV870 (or Fuad's R800 :( ) and might even do a respin, with the "current" RV870 as a backup plan. Who knows, maybe RV870 is late already and they are taking their time to iron out some issues or get its (power) characteristics at 40nm in check.
 
It would certainly be nice to see ATI having a real (single GPU) performance lead for a change. And with the feature lead to boot it could gain them some nice market share.
 
It would certainly be nice to see ATI having a real (single GPU) performance lead for a change. And with the feature lead to boot it could gain them some nice market share.

I'm thinking that is becoming less and less relevant in AMD's eyes right now. The market, imho, is becoming more and more feature-set and price oriented than pure performance oriented. Which is why I can see the merits in attacking with a powerful midrange GPU as the baseline and creating a sort of halo effect around that. Not to mention that with the pretty lackluster PC game market as far as original and demanding titles go, 400mm2+ GPUs just don't make sense anymore. Not that I support the X2 concept either. I saw AFR first hand with SLI, and I could tell the difference. I just don't see tacked-on AFR solutions as completely viable. Some sort of software enhancements, and perhaps improvements at a physical level, need to be made, hopefully with R800 (they say the third time is a charm, eh?). These ultra-high-end solutions are not really needed in games (as far as I can tell, though I suppose that 2% market is important); they are going to matter far more in the GPGPU space than anywhere else.
 
I'm thinking that is becoming less and less relevant in AMD's eyes right now. The market, imho, is becoming more and more feature-set and price oriented than pure performance oriented.

The strategy may be sound but their execution leaves something to be desired since they're still losing discrete market share for their efforts. It's a very dangerous game to be playing and one they have survived so far partially because they've aimed lower than Nvidia did in terms of general compute support. They will get creamed if (however unlikely) RV870 can only match GT314 class hardware.
 
The strategy may be sound but their execution leaves something to be desired since they're still losing discrete market share for their efforts. It's a very dangerous game to be playing and one they have survived so far partially because they've aimed lower than Nvidia did in terms of general compute support. They will get creamed if (however unlikely) RV870 can only match GT314 class hardware.

Their execution has been pretty solid relative to NV. The fruits of this concept probably won't be enjoyed until next generation, when G92/G94 becomes even less relevant. I can't picture NV keeping this up for too long... their market share went up, but they still lost money.
 
Their execution has been pretty solid relative to NV. The fruits of this concept probably won't be enjoyed until next generation, when G92/G94 becomes even less relevant. I can't picture NV keeping this up for too long... their market share went up, but they still lost money.

Not sure what metric you can use to come to the conclusion that their execution has been solid. They don't have anything to show for it.

Not really sure what G92/G94 have to do with anything. Are you implying that future Nvidia products will inevitably be less competitive than G92/G94?

Yes, Nvidia lost money, but that's not a reflection of anything as simple as the manufacturing costs of large-die GPUs. There are a huge number of other significant factors, including the amount of R&D money being pumped into high-margin businesses. Compare the Q1 R&D numbers: Nvidia 211m, AMD 305m. What portion of that 305m would you guess is attributable to GPUs?
 