AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks: 1 vote (0.6%)
  • Within a month: 5 votes (3.2%)
  • Within a couple of months: 28 votes (18.1%)
  • Very late this year: 52 votes (33.5%)
  • Not until next year: 69 votes (44.5%)

  Total voters: 155. Poll closed.

Well, I personally don't see them having to make any greater changes to the architecture than they have already done to get feature parity with Fermi (obviously their granularity with scalar workloads sucks by comparison, but if we assume their density advantage derives from it, it's simply a valid design choice and not something to be fixed). So by that line of reasoning, Jawed, they should be able to keep going at the same tempo for a while yet.
 
Why would nVidia not have a counterpart at the same time? :rolleyes:

Why would they, given that they've not got the first Fermi out yet? :rolleyes: It's been six months of ATI DX11 next-gen cards, and there's been no counterpart from Nvidia yet. There's no reason to think Nvidia is going to magically have a new generation out in six months when they can't even get the current one out, and what they do push out in the next couple of months is not likely to be the full speed/full SP version.

Current speculation is that we won't even see the top end Fermi part, or the mainstream part in any quantities until Q3. Given that Northern Islands is due for Q3, that means that NI could be coming out just when Nvidia are getting 512 SP or mainstream cut-down Fermi out in quantity.

The alternative that you're suggesting is that Nvidia throw away most of the lifetime of the first Fermi product and pull something else out of the hat, which doesn't seem likely given the investment they've made.
 
Well, I personally don't see them having to make any greater changes to the architecture than they have already done to get feature parity with Fermi (obviously their granularity with scalar workloads sucks by comparison, but if we assume their density advantage derives from it, it's simply a valid design choice and not something to be fixed). So by that line of reasoning, Jawed, they should be able to keep going at the same tempo for a while yet.

I don't think so. As has been pointed out earlier, bandwidth will be increasing at a far slower rate than ALUs. While VLIW may save your bacon for ALUs, to achieve higher bandwidth efficiency a better cache hierarchy is needed. GF100 is >1 TFLOP slower than Cypress. If GF100 wins against Cypress, throwing more math at it (in NI) won't help.
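
To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch (plain host-side code, nothing GPU-specific). The Cypress figures are the commonly quoted HD 5870 peaks; the point is simply how many flops a kernel has to do per byte of DRAM traffic before the ALUs, rather than memory, become the limit.

```cpp
// Rough arithmetic-intensity estimate: peak flops divided by peak bandwidth
// gives the flops-per-byte a kernel needs to sustain before it stops being
// bandwidth-bound. Figures are the commonly quoted HD 5870 (Cypress) peaks.
#include <cstdio>

int main() {
    const double cypress_flops = 2.72e12;   // single-precision peak, flops/s
    const double cypress_bw    = 153.6e9;   // GDDR5 bandwidth, bytes/s
    printf("Cypress needs roughly %.1f flops per byte of DRAM traffic\n",
           cypress_flops / cypress_bw);     // ~17.7 flops/byte
    return 0;
}
```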

Fermi does away with inter-stage FIFOs in favor of GP caches to save bandwidth. I feel that we will see some big changes in the cache hierarchy in NI to cancel that advantage.

As for me, I'll be pretty happy if they make it a TBDR. ;)
 
I don't believe feature parity equals performance parity - with the proviso that feature means "function" rather than "details of architectural implementation".

There's a long road from Evergreen to Larrabee, so they can either chip away at it now that the major D3D inflections are done (left to do: truly flat memory model and guaranteed context switching responsiveness) or ...

I think Fermi's flatter memory model will really show up in DirectCompute type stuff. Developers have been champing at the bit to get CUDA functionality into their games and NVidia's ~3 year lead has allowed them to target performance there, not merely being D3D spec. feature-complete.
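
As a purely illustrative CUDA sketch of what a flatter memory model buys you: with a unified address space the same device function can be handed a pointer into global or shared memory and the hardware resolves which it is at run time, so library-style code doesn't need a separate path per memory space. The kernel and names below are made up for the example.

```cpp
// Illustrative only: one helper, one pointer type, regardless of which memory
// space the pointer actually refers to (possible with a unified address space).
__device__ float sum3(const float* p) {       // generic pointer
    return p[0] + p[1] + p[2];
}

__global__ void demo(const float* in, float* out) {
    __shared__ float tile[3];
    int t = threadIdx.x;
    if (t < 3) tile[t] = in[t];               // stage a few values in shared memory
    __syncthreads();
    if (t == 0) {
        out[0] = sum3(in);                    // pointer into global memory
        out[1] = sum3(tile);                  // pointer into shared memory, same code
    }
}
```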

Jawed
 
I don't think so. As has been pointed out earlier, bandwidth will be increasing at a far slower rate than ALUs. While VLIW may save your bacon for ALUs, to achieve higher bandwidth efficiency a better cache hierarchy is needed.
Since the R600 they've put in two different versions of LDS, put in read/write L2 caches, and completely changed the L1 texture caching once already ... I don't think a writeable L1 cache is beyond the scale of the changes they've already put into the "R600".
 
Since the R600 they've put in two different versions of LDS, put in read/write L2 caches, and completely changed the L1 texture caching once already ... I don't think a writeable L1 cache is beyond the scale of the changes they've already put into the "R600".
L2 is read only.

Jawed
 
No, there is a small amount (a few KB) of L2 in Evergreen which is r/w. Look at AMD's presentation from SA09.
They are read-write snapshots of a portion of video memory for atomics, effectively re-using colour-/Z-buffer cache functionality, i.e. ROP functions. Put another way, these are ROP L1s.

Jawed
 
Why would they, given that they've not got the first Fermi out yet? :rolleyes: It's been six months of ATI DX11 next-gen cards, and there's been no counterpart from Nvidia yet. There's no reason to think Nvidia is going to magically have a new generation out in six months when they can't even get the current one out, and what they do push out in the next couple of months is not likely to be the full speed/full SP version.

Because you ignore history? nVidia did it with the NV35, and AMD did it twice: R580 and RV670. Both came shortly after the delayed previous cards. That nVidia would stop development of new chips is beyond my imagination at this time.

Current speculation is that we won't even see the top end Fermi part, or the mainstream part in any quantities until Q3. Given that Northern Islands is due for Q3, that means that NI could be coming out just when Nvidia are getting 512 SP or mainstream cut-down Fermi out in quantity.

Because they don't have any information. But yeah, we know that GF100 will not have dedicated tessellation units and will be slower than Cypress in DX11.

The alternative that you're suggesting is that Nvidia throw away most of the lifetime of the first Fermi product and pull something else out of the hat, which doesn't seem likely given the investment they've made.

Fermi is an investment in the architecture and the future, not in a one-time gig of a graphics card.
 
They are read-write snapshots of a portion of video memory for atomics, effectively re-using colour-/Z-buffer cache functionality, i.e. ROP functions. Put another way, these are ROP L1s.
Well now, that depends entirely on whether the memory controllers have read ports on it as well, now doesn't it ... I'd say that 128 KB is an awful lot just for atomics (the original pixel caches are also still there), but you'd have to benchmark it to know either way.
 
Because you ignore history? nVidia did it with the NV35, and AMD did it twice: R580 and RV670. Both came shortly after the delayed previous cards. That nVidia would stop development of new chips is beyond my imagination at this time.

And those were not seen as successful because they didn't get much of a lifespan before being superseded by a fixed version of the same product.

You're still quoting refreshes, not next-gen architectures. Or are you suggesting that Nvidia will only bring out a Fermi refresh in order to compete with a brand new architecture from ATI? I think that's quite possible. I doubt that Nvidia will have anything other than a fixed version of Fermi by Q3, given they haven't been able to get the first version out yet.

Because they don't have any information. But yeah, we know that GF100 will not have dedicated tessellation units and will be slower than Cypress in DX11.

Difficult to know anything when the product is six months late to the DX11 party.

Fermi is an investment in the architecture and the future, not in a one-time gig of a graphics card.

But it's also a one-time product that should have come out with DX11 and in competition with ATI products, and hasn't. It still has a budget, a lifespan, and needs an ROI. If it's not working in its first iteration, I doubt Nvidia is going to get the second generation out six months later unless they have something completely new up their sleeves. I consider that unlikely given that they seem to be putting all their future eggs into the Fermi basket. At best there will be a fixed Fermi and a mainstream Fermi, but that may not be enough against another product that's a generation ahead.
 
They are read-write snapshots of a portion of video memory for atomics, effectively re-using colour-/Z-buffer cache functionality, i.e. ROP functions. Put another way, these are ROP L1s.

Jawed

If their size had been, say, ~512 KB, would you have said that they are r/w caches?
 
If their size had been, say, ~512 KB, would you have said that they are r/w caches?
They're "L1s" dedicated to a single function - atomic updates of global memory. I'm only trying to show both you and MfA that they are not read-write L2s. Hell, we could call them L0s for all the difference it makes.

They're certainly not L2s because there is no hierarchical cache level below them. TU L1s can't fetch data from these caches, either.

Jawed
 
Are you expecting NI to be anything less than a 40->28nm shrink combined with functionality improvements on the scale of the RV770->Evergreen transition?
I don't expect anything other than a boring refresh of Evergreen this year. I don't know what its name is.

I don't know if NI is meant to be a refresh of Evergreen (e.g. as minor as RV790 or as major as R520->R580 or RV730->RV740) or if NI is meant to be a substantial change (RV670->RV770, RV770->Evergreen) or if NI is meant to be an architectural re-boot (R580->R600).

:???:

Evergreen is less late than I thought it was (I thought it was about 1 quarter late) and logic would indicate that AMD plans a substantial change (RV670->RV770, RV770->Evergreen) for summer/autumn 2010 based on the pattern for RV770 and Evergreen. But I think process complications and the GF factor will all conspire against anything other than a boring refresh this year.

Jawed
 
They're "L1s" dedicated to a single function - atomic updates of global memory. I'm only trying to show both you and MfA that they are not read-write L2s. Hell, we could call them L0s for all the difference it makes.
That still doesn't make it a pure ROP cache either. Do all UAV accesses (read and write) go through it? (For RW surfaces.)
 
That still doesn't make it a pure ROP cache either. Do all UAV accesses (read and write) go through it? (For RW surfaces.)
UAV reads can either go through the texture cache hierarchy or they can be uncached reads from global memory.

Writes are coalesced if the addresses fall into some variants of a 128-bit strided pattern (the diagrams show a write-combine cache). The write-combine cache can (i.e. doesn't necessarily) put data into the ROP/atomic cache. Otherwise writes are uncached. Write-combine cache is effectively the shader export function block of older GPUs.
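
As a loose illustration of the kind of pattern a write-combine stage can merge (written CUDA-style; this is not a statement of Cypress's exact coalescing rules): lane i writing 128-bit element i of a contiguous run combines into wide bursts, while data-dependent scatter leaves little to combine.

```cpp
// Illustrative only: contiguous 128-bit writes per lane (combinable into wide
// bursts) versus data-dependent scattered writes (little for a write-combine
// stage to merge). Kernel names are made up for the example.
__global__ void contiguous_writes(float4* dst, const float4* src, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];               // lane i -> element i, 16 bytes each
}

__global__ void scattered_writes(float* dst, const float* src, const int* idx, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[idx[i]] = src[i];          // arbitrary addresses per lane
}
```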

My understanding is that RW surfaces only support un-cached reads from global memory. Atomics are a way to improve the performance of RMW operations in this scenario, but obviously only integer data types are available - though I suppose it's possible to use floats as ints, not sure how far you'd get :LOL:
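
For what it's worth, the "floats as ints" trick does work, at least in CUDA terms: a float read-modify-write can be emulated with an integer compare-and-swap. A minimal sketch, purely to illustrate the idea (this is the standard CUDA pattern, not anything taken from the AMD material being discussed):

```cpp
// Emulate a float atomic add using an integer compare-and-swap: reinterpret
// the float's bits as an unsigned int, CAS on the int, and retry until no
// other lane got in between. Illustrative sketch only.
__device__ float atomicAddFloat(float* addr, float val) {
    unsigned int* p = reinterpret_cast<unsigned int*>(addr);
    unsigned int old = *p;
    unsigned int assumed;
    do {
        assumed = old;
        float updated = __uint_as_float(assumed) + val;
        old = atomicCAS(p, assumed, __float_as_uint(updated));
    } while (old != assumed);                 // another lane changed it: retry
    return __uint_as_float(old);              // returns the previous value
}
```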

Global Shared Memory, additionally, provides a RW surface - but it's 64KB.

Jawed
 
Well, I think it's more likely all UAV accesses for RW surfaces go through the RW cache ... going to need benchmarks to convince me :) (Too lazy to buy the hardware and do them myself.)
 