It has 1x 8-pin power and 1x 6-pin power so it's Cayman or Fermi, no?
With the "AMD" logo on it, I wouldn't put my money on Fermi, but to each their own.
Cos that's merely catching up to where NVidia has been for a while, at best.
"Yes, the curious thing is that this description of off-die buffering was associated specifically with tessellation. It seems that off-die buffering of tessellation data is an 'improvement' on keeping that data in the SIMDs (seemingly in LDS?). Maybe that's because it's easier to share it across the chip?"

The B3D article on Fermi indicates that Cypress can be affected by the fatness of the control points and the math in the HS.
"Can't say I like the sound of this, though - it might be an improvement, but ugh, cached like in Fermi seems preferable. And 2x the geometry performance, if true, sounds like no real improvement."

The description doesn't seem specific enough to indicate there isn't on-die storage that may spill to memory.
"Yes, the curious thing is that this description of off-die buffering was associated specifically with tessellation. It seems that off-die buffering of tessellation data is an 'improvement' on keeping that data in the SIMDs (seemingly in LDS?). Maybe that's because it's easier to share it across the chip?"

Didn't the B3D article say that tessellation data is kept in the GDS? There could be some data bottlenecks with that.
R300/9700 was a groundbreaking card. For the first time, both AF and AA were usable with playable FPS. I doubt Cayman can deliver something as monumental, but one can hope.
"Didn't the B3D article say that tessellation data is kept in the GDS? There could be some data bottlenecks with that."

GDS is used to send parameters to TS, I believe.
"Does anyone know how fast RV770 and newer ATI GPUs process triangles that pass through a trivial geometry shader?"

No. A key characteristic of GS since R600 has been pushing all the data off-die through the ring buffer. This is how GS was originally able to support huge amounts of data per vertex, before D3D10 got cut back in favour of NVidia's architecture.
"Isn't that amply sufficient?"

Seems NVidia has quite a reserve, both in terms of unlocking throughput (Quadro is the real deal for throughput) and in terms of clocks. So, no, I don't think it's sufficient, bearing in mind Cayman looks like it's going to have to last for a year+ (emphasis on +). Also, can it scale further? Is it really scalable?
Holy Jesus! That chip is BIG!
Wasn't expecting something so big coming from AMD.
I don't believe in Antilles being 2x Cayman, with a big die like that...
"GDS is used to send parameters to TS, I believe."

TS data is very small, though. It's just 4 bytes per vertex if you use a triangle strip, and close to half that if you do caching. If you can stage just one kilobyte then you have several wavefronts of vertices buffered up.
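A back-of-envelope sketch of that staging arithmetic, using the figures from the post above (the 1 KB buffer is hypothetical, and 64-wide wavefronts per ATI hardware is my assumption):

```python
# Back-of-envelope check of the staging claim above.
# Assumed numbers: 4 bytes per tessellated vertex (a 16-bit u,v domain
# coordinate pair) and 64-wide wavefronts, as on ATI hardware.
BYTES_PER_VERTEX = 4
STAGING_BYTES = 1024          # the hypothetical 1 KB staging buffer
WAVEFRONT_SIZE = 64

vertices = STAGING_BYTES // BYTES_PER_VERTEX
print(f"{vertices} vertices = {vertices / WAVEFRONT_SIZE:.0f} wavefronts")
# -> 256 vertices = 4 wavefronts
```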
There's a separate data path from HS direct to DS. Additionally DS consumes the output of TS (obviously). So HS and TS data needs to be staged for consumption by DS - in theory covering quite a bit of lag between the two data streams. This appears to be the crux of the buffering issue. The B3D article, I believe, describes "locking" HS and DS together as a pair within a SIMD. This then limits the amount of data that can be staged, and presumably also affects the SIMDs' ability to sink the output of TS.
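For reference, a minimal sketch of the HS -> TS -> DS data flow being described, showing why the fat HS output has to be staged until DS consumes it alongside the TS stream. Everything here (structures, tess factor, triangle-patch interpolation) is illustrative, not from the B3D article:

```python
# Minimal sketch of the D3D11 HS -> TS -> DS data flow under discussion.
# All structures and numbers here are illustrative.

def hull_shader(control_points):
    # HS output (fat: control points + patch constants) bypasses TS and
    # must be staged somewhere until DS consumes it.
    return {"control_points": control_points, "tess_factor": 8}

def tessellator(tess_factor):
    # Fixed-function TS emits only tiny (u, v) domain coordinates --
    # the second, much smaller stream that DS consumes.
    n = tess_factor
    return [(i / n, j / n) for i in range(n + 1) for j in range(n + 1 - i)]

def domain_shader(hs_output, uv):
    # DS joins the two streams: one vertex per TS coordinate, interpolated
    # from the staged HS control points (barycentric, for a triangle patch).
    u, v = uv
    weights = (1 - u - v, u, v)
    return tuple(sum(c * w for c, w in zip(axis, weights))
                 for axis in zip(*hs_output["control_points"]))

hs_out = hull_shader([(0, 0, 0), (1, 0, 0), (0, 1, 0)])
verts = [domain_shader(hs_out, uv) for uv in tessellator(hs_out["tess_factor"])]
print(len(verts), "tessellated vertices from one patch")  # -> 45
```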
"My problem with this concept is that tessellation, generally, is supposed to reduce VRAM bandwidth (and space) usage by doing stuff on die instead of dealing with hugely-expanded vertex data streams. Shoving HS/TS data off die really works against that. Unless there's a healthy Fermi-style L2 cache, it seems like not much progress to me."

Like I said, TS data is 4 bytes per vertex, which means 2-4 bytes per triangle. Even Fermi's peak of 4 tris per clock would consume only about 11 GB/s using an off-die buffer for the TS output.
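Checking that number, a quick sketch; the 4 tris/clock peak is from the post, while the ~700 MHz setup clock (GTX 480 core clock) is an assumption:

```python
# Rough check of the bandwidth figure above.
TRIS_PER_CLOCK = 4            # Fermi's peak setup rate, per the post
CLOCK_HZ = 700e6              # assumed ~700 MHz (GTX 480 core clock)
BYTES_PER_TRI = 4             # worst case: ~1 new strip vertex per triangle

peak = TRIS_PER_CLOCK * CLOCK_HZ * BYTES_PER_TRI / 1e9
print(f"{peak:.1f} GB/s")     # -> 11.2 GB/s; vertex reuse roughly halves it
```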
"GeForces are only limited in geometric throughput when tessellation is disabled, right? That doesn't sound like a bottleneck you'd be likely to hit outside of pro rendering applications."

I got that the wrong way round, sigh.
"Besides, I know that tessellation is trendy (and rightly so, I suppose), but the main objective is to render games with max details and smooth framerates, right? As far as I can tell, even Cypress is capable of doing that,"

GTX460 is faster than HD5870 in Civ 5: