AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to ATI's upcoming RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within couple months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
Perhaps that's why it may be called the Bull Shit Network. :D At least Fudo doesn't hide this and says that his site is FUD-zilla.
 
I haven't seen this posted anywhere, though it is old, so here it is.

That is definitely not a given. While GF is making a lot of noise about what they might be able to do in that future timeframe, they certainly have not proven it so far. Plus, 28 nm will still be at least 1 year away, if not 1.5, for GPUs and other large ASICs. We might see a 32 nm part by this time next year. But GF has a lot to prove when it comes to bulk production. While I am personally expecting them to do well, professionally we have to have significant doubts as to whether they can deliver as they have promised.
 
On second thought, why can't Evergreen BE the midrange part? Just because previous ATI midranges were small (and didn't sell that well on the channel compared to their larger nVidia counterparts)?


If this takes a page from the G94 (GeForce 9600GT) playbook, aimed at delivering the previous refresh's second-tier single-GPU performance (Evergreen to the 4870/90 as the 9600GT was to the 8800GTS), would that 40mm2 be justified?

4870+ perf in <4850 TDP too?

G94 was 240mm^2. ATI might have been on a die-shrinking spree previously, but it could end if they have more demand and buy in larger volumes compared to the previous generation. I don't see 180mm2 hampering usability in any large way; even the 9600GT got butche... uh, cost-downed to a ridiculous level.
 
found this quote "This steals some of the thunder out of nVidia’s claim that they will be the first to market with DX11 parts."

are they still claiming this?

I don't remember Nvidia 'claiming' this, I do however remember some websites claiming that Nvidia would be first to market.

US
 
I don't remember Nvidia 'claiming' this, I do however remember some websites claiming that Nvidia would be first to market.

US

indeed, could only find this on bison.
(this is from early May, after NV's conference call.)
After the conference call last night, nVidia was feeling very confident that the company is on the right track... and there is a pretty good reason why. We spoke with several sources and they're quite optimistic that they will be able to release their DirectX 11 GPU ahead of ATI's RV870 chip.
 
Too much confidence from any side until X or Y sits on a shelf hurts my brain. Anyway, what's up with the awkward codename? I mean "Evergreen" :LOL: What's NV's DX11 chip called internally then? Edelweiss? :devilish:
 
Too much confidence from any side until X or Y sits on a shelf hurts my brain. Anyway, what's up with the awkward codename? I mean "Evergreen" :LOL: What's NV's DX11 chip called internally then? Edelweiss? :devilish:
I think it's a hidden reference to AMD's green logotype line. :D
 
Evergreen is the entire DX11 family name according to my source at Computex. He also had some juicy bits about GT300... pushed back to 2010 supposedly.. confirmed by a major AIC. Let me see if I can find out more.
 
You know, everyone was so excited about the next DX11 part from AMD, and then AMD decides to throw a spanner in the works by showing us a relatively tiny die for an assumed high-end part.

When will the drama end? :cry:;)
 
Evergreen is the entire DX11 family name according to my source at Computex. He also had some juicy bits about GT300... pushed back to 2010 supposedly.. confirmed by a major AIC. Let me see if I can find out more.

Sounds plausible, given rjc's info on a new 40nm upstart. If GT300 required a respin (no boards @ Computex) they could be delayed long enough to not make it in 2009.
 
Not posted before because it sounds like garbage. :p
Fudo and Theo have been so far off base so many times, especially with upcoming GPUs. Believe it or not, Charlie has actually been pretty good at it. Well ok... semi-accurate at least. :D


yeah ok............................:oops:

hmm, no back-to-school parts? sure about that? God, Charlie... can't believe anything he says, ya know, nV's 40 nm parts.......
 
Evergreen is the entire DX11 family name according to my source at Computex. He also had some juicy bits about GT300... pushed back to 2010 supposedly.. confirmed by a major AIC. Let me see if I can find out more.



"2010" :devilish:
 
RV730 is 150mm2. RV740 is 136mm2. 14mm2 smaller and packing twice the SPU/TMU logic. That's damn impressive to say the least, but here we are talking about a chip that is nearly 80mm2 smaller than RV770. A fair guess would put Evergreen, I'd say, 10%-30% faster than RV770. Not bad at all for its size. This chip to me screams mainstream and the perfect replacement for the HD 4850/HD 4870/HD 4890.
Yes, this is like R600->RV670 - the ALU/TU/RBE counts were unchanged and clocks got bumped by 4%, the bus got chopped in half and the GDDR3 clock was raised by 36%.

Except if the bus got chopped in half and memory clock was raised, Evergreen would have ~45mm² to fill with stuff. That's a hell of a lot of stuff, since I estimate that RV740's clusters are around 52mm².

Or, that's 45mm² of D3D11-specific additions :oops:

Or, that's 45mm² of D3D11 stuff + architectural re-jigging.

It's conceivable that the architecture needs a shake-up to handle the memory-intensive nature of D3D11.

It seems to me that D3D11 is making buffers (resources) of indeterminate count and size a more finely-grained component of rendering. Previously rendering consisted of consuming some fixed-size buffers whilst writing to other fixed-size buffers.

Geometry shading opened a can of worms by making the output buffers variably sized. We've seen that ATI currently uses a pair of ring buffers to handle the ebb-and-flow of GS. Now D3D11 gives the developer access to their own arbitrarily sized buffers to be used pretty much whenever they feel like it (PS or CS seem the most likely places, and arguably CS is a distinct rendering pipeline all of its own) - though it seems there is still a hard limit on the number of these buffers bound to the rendering pipeline at any one time.
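
To make the append/consume behaviour concrete, here's a minimal toy model (Python, purely illustrative - the class name, capacity and the lock standing in for an atomic counter are my own inventions, not how any GPU actually implements this) of what a D3D11-style append buffer looks like from the shader's point of view:

```python
# Toy model of D3D11-style append/consume semantics. Illustrative only: the
# class name and capacity are made up, and the Lock merely stands in for the
# atomic counter a GPU would keep in memory.
from threading import Lock

class AppendConsumeBuffer:
    def __init__(self, capacity):
        self.storage = [None] * capacity   # fixed-size backing store, variable fill
        self.count = 0                     # the hidden counter tracking the head
        self.lock = Lock()

    def append(self, value):
        # Every appending thread gets a unique slot at the current head.
        with self.lock:
            slot = self.count
            self.count += 1
        if slot >= len(self.storage):
            raise IndexError("append buffer overflow")
        self.storage[slot] = value

    def consume(self):
        # Consume pops from the head; ordering is whatever the counter gives.
        with self.lock:
            self.count -= 1
            slot = self.count
        return self.storage[slot]

# A pixel shader that only appends for "interesting" pixels produces a
# variably sized result: the final count isn't known until the pass ends.
buf = AppendConsumeBuffer(capacity=1024)
for pixel in range(256):
    if pixel % 7 == 0:                     # data-dependent append
        buf.append(pixel)
print(buf.count, "elements appended")
```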


So it seems to me that there are now multiple sets of paired ring-buffers along the rendering pipeline. TS output is variable, GS output is variable and PS output is now variable. PS output is extremely arduous because:
  • there can be multiple independent variably-sized buffers written by a single pixel shader
  • the in-flight count of pixels is higher than for any other kind of graphics primitive
In R600 etc. the paired ring buffers rely upon latency-hiding to perform. I've not seen any analysis of GS performance that highlights the quality of latency-hiding - all we have are hints that R600's ridiculous bandwidth was a nod in the direction of making GS work well and that RV670 should show a significant shortfall due to its much lower bandwidth.

So in D3D11 chips, is latency-hiding against ring buffers held in memory enough? Can a layer of cache provide any benefit here? Theoretically, while appending or consuming a variably sized buffer, caching works well, since all threads are focused on a single region of the buffer. In some ways this is the ideal scenario for caching and it's much easier than caching render back end tasks such as blending, where a stream of pixels arrives with "random" memory addresses (randomness ameliorated by screen-space tiling, though I don't know whether an entire tile can be held on-die in cache).
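
As a back-of-the-envelope illustration of that point, here's a tiny direct-mapped cache model (cache size, line size and the address streams are all invented for the sketch) comparing a linearly advancing append stream against writes landing at effectively random addresses:

```python
import random

LINE = 64       # assumed cache line size in bytes
LINES = 4096    # assumed line count (a 256KB direct-mapped cache)

def hit_rate(addresses):
    # Fraction of accesses hitting a trivial direct-mapped cache.
    tags = [None] * LINES
    hits = 0
    for addr in addresses:
        line = addr // LINE
        idx = line % LINES
        if tags[idx] == line:
            hits += 1
        else:
            tags[idx] = line
    return hits / len(addresses)

N = 100_000
append_stream = [i * 16 for i in range(N)]                            # head advancing 16B per append
random_stream = [random.randrange(0, 64 * 2**20) for _ in range(N)]   # "random" blend addresses over 64MB

print("append-style stream:", hit_rate(append_stream))   # ~0.75: 3 of every 4 writes reuse the line
print("random blend stream:", hit_rate(random_stream))   # close to 0
```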

Can caching for append/consume buffers be re-used for RBE tasks? One of the properties of append/consume is that it doesn't "tile" - because all threads are focused on writing to the head. The head will move "slowly" through tiles in memory space, i.e. it'll move slowly through MC channels. This seems like a useful property to me, as it means that the MCs can be easily configured/scheduled to pre-fetch (while consuming the tail) and it means that sizeable burst writes can be done, e.g. the MC performing a single write after a wodge of data has been added to the head. This doesn't sound too different from RBEs holding entire/portions of a screen space tile - though the timing is skewed in favour of append/consume, where the lifetime of a block is much more coherent (bursty).
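
The channel point can be sketched the same way: with an assumed 256-byte channel interleave and 8 channels (both numbers picked purely for illustration), a head that only moves forward dwells on one channel for a whole granule at a time, which is what makes prefetching the tail and bursting the head cheap to schedule:

```python
# Which channel does a forward-only append head touch? Interleave granularity
# and channel count are illustrative guesses, not real MC parameters.
INTERLEAVE = 256   # bytes mapped to one channel before the address map moves on
CHANNELS = 8

def channel(addr):
    return (addr // INTERLEAVE) % CHANNELS

head = 0
touched = []
for _ in range(64):        # 64 appends of 16 bytes each
    touched.append(channel(head))
    head += 16

print(touched)
# -> sixteen 0s, then sixteen 1s, 2s, 3s: the head camps on one channel long
#    enough for the MC to coalesce the writes into a single burst.
```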

L2 in R600 is pretty large, hundreds of KB at least (not massive though). It effectively supports pre-fetching of texels by virtue of both locality and the fact that many texel coordinates are known before pixel shading commences. Much the same applies to append/consume, whereas RBE is pretty much stuck with some degree of randomness. So append/consume seems like it blurs across the functionality of texture and RBE caching, with the strong locality of texels, but the requirement to write.

Currently ATI effectively supports 128 vec4 registers per pixel (vertex, thread, etc.) before registers have to spill to memory. Is that the limit? It seems likely to me that ATI can't increase the register file without having to re-time instruction issue/execution. Currently the ALU and TEX pipes seem tightly bound to pairs of wavefronts in-flight, with an 8 cycle pipeline and effectively some multiple of that for register reads/reads-after-writes. So it seems to me pretty difficult to tweak the register file (e.g. double it) in order to substantially increase latency hiding or to support shaders with substantially higher register allocations.
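
To put some numbers on that: assuming a 256KB register file per SIMD (my assumption for the sketch, pick your own figure) and 64-wide wavefronts, per-thread register allocation eats into the number of wavefronts available for latency hiding very quickly:

```python
# Wavefronts in flight per SIMD at a given per-thread vec4 allocation.
# REG_FILE is an assumed figure for the sketch; 16 bytes per vec4 register and
# 64 threads per wavefront are the usual ATI numbers.
REG_FILE = 256 * 1024   # bytes per SIMD (assumption)
WAVE = 64               # threads per wavefront
VEC4 = 16               # bytes per vec4 register

for regs in (4, 8, 16, 32, 64, 128):
    per_wave = regs * VEC4 * WAVE
    print(f"{regs:3d} regs/thread -> {per_wave // 1024:3d} KB/wavefront, "
          f"{REG_FILE // per_wave:3d} wavefronts in flight")
# At the 128-register spill threshold only 2 wavefronts fit the assumed file,
# so more registers per thread and more latency hiding pull in opposite
# directions unless the file itself grows.
```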

So, maybe register spill needs to become a first class citizen. Register spill appears to be a coarse-grain variety of append/consume. When a wavefront is created its registers are allocated, D3D specifies that 4096 vec4s per pixel are allowed, that's 64KB per pixel. This is, effectively, a contiguous block of data, e.g. 128 registers is 2KB per pixel, so 128KB per wavefront, though it could be made up of smaller blocks. If register spillage is required then blocks of a wavefront's register allocation can be sent to memory. With round-robin scheduling and with the scheduler able to see the progress of a wavefront's antecedents (e.g. texture filtering) it can control the scheduling of fetching-back of those register blocks that were dumped into memory. All of this is bursty and looks amenable to simple stream-through caching.
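
Just to spell out the arithmetic in that paragraph (a vec4 register is 4 x 32-bit = 16 bytes; the 64 pixels per wavefront is implied by the 2KB -> 128KB step):

```python
VEC4 = 16                     # bytes per vec4 register (4 x 32-bit)
WAVE = 64                     # pixels per wavefront

print(4096 * VEC4 // 1024, "KB per pixel at D3D's 4096-register ceiling")   # 64 KB
print(128 * VEC4 // 1024, "KB per pixel at a 128-register allocation")      # 2 KB
print(128 * VEC4 * WAVE // 1024, "KB per wavefront at that allocation")     # 128 KB
```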

So can a single cache take on all these roles? Or are dedicated caches better? Can the high quantities of append/consume traffic opened up by the really quite long and twisty D3D11 rendering pipeline be supported purely by uncached latency-hiding? Is ATI's current latency-hiding at its limit? Can register spill performance penalties be ameliorated by schedulable stream-through caching?

I'm not trying to suggest that all of the "missing 45mm²" is cache. I'm just wondering if the significant increase in the density of memory operations requires a massive re-wiring of all the major memory clients, perhaps with a new higher level of overview in scheduling and perhaps also in a new level of generality.

Jawed
 
Due to the lack of high-res Evergreen die pictures, here's a low-res one, next to a high-res RV770

rv770evergreen.jpg


Even though it's low-res, IMO it's clear that Evergreen has gone through major changes compared to the RV770-style layout: there are 4 clear "partitions" on the chip, while in RV7xx there's one big pile in the center.
 