AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within a couple of months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
Wouldn't 150% of RV710 be nipping at the toes of RV730's performance? (Unless I've gotten confused by the performance gains from driver updates since release.)

It's hard to find recent benchmarks. I think if I were planning a review of Cedar, I'd make sure I had a 4650 and a 9500GT on hand to compare against. It looks like Cedar might have been targeted at around 8600GTS performance, if the +50% is true.

Memory choice is going to be interesting on these at the end of the year: both DDR2 and DDR3 are currently rising in price. This might cut profits (or push up price points) on these cards, as memory is a bigger percentage of the cost than on the higher-end cards.
 
This is a dangerous point because I strongly believe Larrabee is considerably more bandwidth efficient. So, either R900 is a total rethink in the Larrabee direction or AMD's fucked in 18 months. I don't care how big Larrabee is (whatever version it'll be on), I want twice HD5870 performance by the end of 2010. The dumb forward-rendering GPUs are on their last gasp if memory bandwidth is going nowhere.

Well, if lrb is as bandwidth efficient as was made out in that siggraph paper, then not just ati but nv too is fucked. They will have to go to a smarter rendering technique to save bandwidth. After all, bandwidth is not growing and will not grow at the rate of ALUs.

Dally also said that bandwidth is one of his prime concerns. I think they'll be going over to tile-based renderers, or whatever is better than that.
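
A back-of-the-envelope illustration of why tile-based rendering keeps coming up in this context. Every number below (resolution, overdraw, bytes per pixel) is a made-up assumption, compression and caches are ignored, and this is not any shipping GPU's behaviour - it just shows where the frame-buffer bandwidth goes:

```python
# Toy comparison of external-memory traffic: forward vs. tile-based renderer.
# All figures are assumptions for illustration only.

WIDTH, HEIGHT = 1920, 1200
COLOUR_BYTES  = 4
DEPTH_BYTES   = 4
OVERDRAW      = 3.0            # average shaded fragments per pixel (assumed)

pixels = WIDTH * HEIGHT

# Forward renderer: every shaded fragment does a depth read + depth write and
# a colour write against external memory.
forward = pixels * OVERDRAW * (2 * DEPTH_BYTES + COLOUR_BYTES)

# Tile-based renderer: overdraw is resolved in on-chip tile storage, so the
# external traffic is roughly one colour write per pixel (geometry and
# binning traffic ignored here).
tiled = pixels * COLOUR_BYTES

print(f"forward : {forward / 1e6:6.1f} MB of frame-buffer traffic per frame")
print(f"tiled   : {tiled   / 1e6:6.1f} MB of frame-buffer traffic per frame")
```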

Of course if AMD can make a non-AFR multi-chip card that doesn't use 2x the memory for 1x the effective memory, then I'm willing to be a bit more patient and optimistic.

It is only one of the wastages of present-day GPUs that will have to be corrected. Ultimately, they'll have to go over to sw load balancing like lrb to become more area and power efficient. Dedicated ff hw is not going away anytime soon though. I am of the view that it will be used more smartly and in novel ways, but it won't go away entirely. ff hw will always be more efficient in area and power, but it has to justify its presence in the overall scheme of things.
 
But the fixed-function interpolator unit seems to have just been deleted (there is no "Interpolators" block).

Jawed

Not saying anything (yet, at least), but aren't we reading a wee bit too much into what is basically a marketing-oriented architecture diagram? It's not like there aren't other things present that aren't shown there (or in such diagrams in general)... they could still be using dedicated interpolators and simply not have shown them there.
 
The angle dependency is much less of an issue ((and has been since the X1900XT/G80)) than the inclusion of mipmap and LoD filters. Those need to just go away. I wouldn't shed a tear to see the "Trilinear Optimisation" mode go away and just have HQ forced all the time. The performance you get from using it isn't even worth having it in the control panel these days.
This.
I sure hope that AMD will provide a way to disable filtering optimisation without disabling Crossfire at the same time.
Otherwise I really don't see any point in improving h/w LOD selection algorithms - quality will still be worse than with NV's h/w HQ mode.
 
That AF is nice, but as others have said, we have moved on a long way since the bad old days when certain companies' cards had very bad AF. Even AA, to me, got to the point where I couldn't tell the difference without 4x magnified shots. It's all about the frame rate, stupid! Well, at least to me. I'm no sophisticate though.

I wasn't impressed with the ATi numbers against the 285 shown a while ago, until someone pointed out to me that it was the 5850 and not the 5870 I had assumed it was. Looking good.

Is it still October for when it's in the shops, or late September? I've lost track of when the release day is estimated.
 
Didn't Fellix say he'd show us MSAA sample patterns or SSAA options?
Nothing about type, just AA patterns.

The shot is a first hand test result, I can assure you. ;)

I could probably get some AA-pattern samples, but I'm not promising.



Is it still October for when it's in the shops, or late September? I've lost track of when the release day is estimated.
Current rumors suggest a limited availability at launch, late Sept, NDA supposedly ends on the 22/23(?), with full availability in mid-late Oct.
 
Anyway, the AA pattern for SS is RG. That's important. It could have come a few years earlier, but it's good to see it anyway. As I remember, the last GPU supporting this feature (without the need for a multi-GPU platform) was the VSA-100, almost 10 years ago :)

That should be correct; though if I haven't misunderstood anything, it's not the usual way to achieve RGSS, and I sure hope it won't affect performance too much. Or, to exaggerate a bit, I hope nobody expects that you can actually use FSAA with 6 monitors at 76xx :devilish:
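
For anyone who hasn't seen it drawn out, a quick sketch of why the rotated grid matters for sparse/supersampling. The offsets below are the usual textbook 4x examples, not confirmed values for Evergreen or any other hardware:

```python
# 4x supersample positions inside a single pixel: ordered grid vs. rotated
# grid. Offsets are textbook examples, not confirmed hardware values.

ordered = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]

# Rotating the grid gives four distinct x positions and four distinct y
# positions, which is what improves near-vertical and near-horizontal edges.
rotated = [(0.375, 0.125), (0.875, 0.375), (0.625, 0.875), (0.125, 0.625)]

for name, pattern in (("ordered", ordered), ("rotated", rotated)):
    xs = sorted({x for x, _ in pattern})
    ys = sorted({y for _, y in pattern})
    print(f"{name}: {len(xs)} distinct x offsets, {len(ys)} distinct y offsets")
```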
 
Anyway, I feel that ATi has taken all the "holy grails" and put them into the chip: RGSS (remember how many people praised it when 3Dfx went under, and many of them still miss it), AF without angle dependence (I remember the intense discussions about this feature around R520's time and the excitement when S3 enabled it in new drivers), Eyefinity/surround-view (the most praised feature of the Matrox Parhelia, exclusive to it for 7 years - if anybody bought a Parhelia for gaming, surround-view was the reason), dual rasteriser / 32 ROPs (two sections which weren't improved for many years - in terms of quantity, of course), etc...
 
Maybe or maybe not, but we know that it consumes more than 286 watts (4870X2).

...

In other words, take non-measured values of power requirements for graphics cards with an ocean's worth of salt. The card is likely to be able to exceed the vendor's "TDP" fairly easily without overclocking. One day, they might actually match the specs they advertise.

Of course, there are situations where any given electronic device exceeds its Thermal Design Power, but not continually. The same applies to Intel and AMD, for example.
 
Of course, there are situations where any given electronic device exceeds its Thermal Design Power, but not continually. The same applies to Intel and AMD, for example.

CPUs from both, at this point, AFAIK, will not exceed TDP for a thermally relevant period. In fact, the future as laid out by Intel via their turbo mode technology is to maximize performance at the TDP level as much as possible, boosting frequencies when possible to ride right at the TDP.
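
Something like the toy governor below is how I picture the "ride right at the TDP" behaviour. The power model, frequencies and step size are entirely invented for illustration; the real turbo controller is far more involved (temperature, current limits, active core counts) and its internals aren't public:

```python
# Toy frequency governor that keeps boosting as long as estimated package
# power stays under the TDP. All numbers and the power model are assumptions.

TDP_W    = 95.0
BASE_MHZ = 2660
MAX_MHZ  = 3200
STEP_MHZ = 133

def estimated_power(freq_mhz, utilisation):
    # Crude assumed model: a fixed floor plus a term scaling with frequency
    # and load.
    return 30.0 + 0.022 * freq_mhz * utilisation

def pick_frequency(utilisation):
    freq = BASE_MHZ
    while (freq + STEP_MHZ <= MAX_MHZ
           and estimated_power(freq + STEP_MHZ, utilisation) <= TDP_W):
        freq += STEP_MHZ
    return freq

for load in (0.5, 0.8, 1.0):
    print(f"load {load:.0%}: run at {pick_frequency(load)} MHz")
```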

In the case of both 3DMark and FurMark, both Nvidia and ATI GPUs will operate beyond TDP for thermally relevant periods of time, to the point of device failure.

Both AMD and Intel learned their lessons when Tom's et al did the hot plate articles. Eventually, it would be nice to know that GPUs were smart enough not to become fire hazards as well. ;)
 

Quite annoying when you actually understand German...

Not to mention the prices seem fair. The Radeon HD 5850 competes with the Geforce GTX 285, yet is cheaper. The same applies to the Radeon HD 5870 compared to the Radeon HD 4870X2 and Geforce GTX 295.

Besides, it is a far cry from the $599 and $649 NVIDIA has asked for in the past.
 
It's not the first time I've seen this video being used. It's not funny anymore and I don't see anything wrong with the initial MSRPs either.
 
The RBE-specific caches are local to each RBE, so if there are two per memory controller, the controller sees two separate chunks of data being sent out.
Look at RV740. It has 2 RBEs per MC. It has 81% of the bandwidth of HD4850 yet comes in at ~93% of the performance. So clearly the dual-RBE per MC configuration is not hurting.

Under load conditions with each RBE contending equally, a naive arrangement might interleave traffic from each RBE with the other, which would hurt utilization of the memory bus if the targets are far enough apart in memory.
I suppose a single RBE could for some reason interleave from multiple batches, though I'm not sure it would want to.
There's no alternative. There are 10 clusters in RV770 feeding only 4 RBEs, so of course the RBEs are going to be switching quickly amongst tiles. "Quickly" is relative, of course: the fastest a switch can occur is once every 16 cycles, assuming a tile is 64 pixels. 16 cycles is enough time for about 722 bytes of data per MC (assuming an 850MHz core clock and 153.6GB/s in Cypress). Or, if you prefer, in 16 cycles the MC does about 22.5 transactions.
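
Spelling that arithmetic out, with the assumptions that the 153.6GB/s is split evenly over four 64-bit controllers and that one transaction is a 32-byte GDDR5 burst (burst of 8 on a 32-bit channel):

```python
# Working through the numbers quoted above; the even split across four MCs
# and the 32-byte transaction size are assumptions.

core_clock_hz   = 850e6
total_bw_bps    = 153.6e9
controllers     = 4
bytes_per_txn   = 32
cycles_per_tile = 16            # one 64-pixel tile, as above

per_mc_bw       = total_bw_bps / controllers          # 38.4 GB/s per MC
bytes_per_cycle = per_mc_bw / core_clock_hz           # ~45 bytes per core clock
bytes_in_window = bytes_per_cycle * cycles_per_tile   # ~722 bytes
txns_in_window  = bytes_in_window / bytes_per_txn     # ~22.5

print(f"{bytes_in_window:.1f} bytes, {txns_in_window:.1f} transactions per MC per 16 cycles")
```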

Now I will admit that the way screen space is tiled for rasterisation isn't necessarily the same tiling for memory controllers.

Ways to limit the abuse of the MC would be to either make sure there is much greater locality between RBEs--that is that they work on neighboring tiles at the same time; make it so that an RBE has a monopoly on an MC for some number of bus transactions; or expand the MC's ability to recombine traffic.
R700 introduced MCs that only support local RBEs (1 in RV770, RV730, RV710 and 2 in RV740). After that, I don't know of any information on how tiling of screen space or memory works. I suppose if one wrote one's own driver, one could explore this in detail...

The regularity of the transactions and their locality can influence the amount of utilized bandwidth.
Jumping around and closing/reopening DRAM pages or otherwise not providing the linear accesses DRAM really likes can cut down the amount of time available for actual transactions.
The factors for this would be in the opaque realm of AMD's mapping policies, memory controller parameters, and GDDR5's architectural restrictions.
At the same time as RBEs are pumping out pixels the TUs are consuming piles of texels. There's also a constant stream of vertex data. So there's a limit to the kindness that can be shown to DRAM.
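
A crude way to picture the page-locality cost: count row-buffer hits for one linear stream versus two naively interleaved clients whose working sets sit far apart in memory. The page size, bank count, address mapping and access streams below are all assumptions; a real GDDR5 controller reorders and coalesces far more cleverly than this.

```python
# Toy row-buffer model: each bank keeps one DRAM page (row) open; touching a
# different row in that bank costs a precharge + activate. The address-to-
# bank/row mapping here is an assumed page-interleaved scheme.

PAGE_BYTES = 2048
BANKS      = 8

def row_hits(addresses):
    open_rows = {}
    hits = 0
    for addr in addresses:
        bank = (addr // PAGE_BYTES) % BANKS
        row  = addr // (PAGE_BYTES * BANKS)
        if open_rows.get(bank) == row:
            hits += 1
        open_rows[bank] = row
    return hits

# One client streaming linearly.
linear = [i * 32 for i in range(1024)]

# Two clients whose transactions get naively interleaved, with their working
# sets far apart in memory.
client_a = [i * 32 for i in range(512)]
client_b = [8 * 1024 * 1024 + i * 32 for i in range(512)]
interleaved = [a for pair in zip(client_a, client_b) for a in pair]

print("linear      :", row_hits(linear), "/", len(linear), "row hits")
print("interleaved :", row_hits(interleaved), "/", len(interleaved), "row hits")
```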

This is an admirable goal.
Eh? It's been a prime directive since ATI first introduced early-Z rejection, along with compression. Bandwidth efficiency has shown huge gains over the last 5 years, but it's clearly expensive in terms of transistor count, or we'd have had these gains already.

Given how much of the design appears to be "MOAR UNITZ", I am curious to see what they tried. The GPU peaks at 190W and it has a 60% increase in performance with a doubling of almost everything, so maybe they haven't tried too much.
Well, that's exactly my problem. Doubling the RBEs per MC has clearly (as can already be seen in RV740) brought a significant jump in efficiency, but at the same time the GDDR5 gravy train appears to have hit the buffers. So unless something radical happens and GDDR5 goes way above 6Gbps, the future is looking awful for this architecture - the entire forward-rendering concept needs a rethink.

One "advantage" of this scheme would be that it requires minimal investment in changing the rasterizers.
Rather than sending rasterization data back and forth, the GPU can get lazy and just rely on broadcasting from the RBE-level Z buffer to both Hierarchical Z blocks, and rely on the RBEs to automatically discard whatever excess fragments make it past the even more conservative than usual early Hierarchical Z checks.
It's just a question of the latency of RBE-Z updates for hierarchical-Z - if those latencies are long enough, does hierarchical-Z work? That latency could easily be thousands of cycles. Tens of thousands.
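
To make the "stale but still conservative" point concrete, a toy tile-level max-Z check. The structures and sizes are made up, and it assumes the standard less-than depth test (nearer = smaller Z). The coarse value only ever lags on the safe side: it may fail to reject fragments it could have, but it never rejects one the per-pixel test in the RBE would have kept.

```python
class HierZTile:
    """Coarse max-Z for one screen tile (structure is an assumption)."""

    def __init__(self):
        self.max_z = 1.0  # farthest possible depth: cull nothing initially

    def early_reject(self, fragment_z):
        # A fragment can be discarded early only if it is behind *everything*
        # the coarse value knows about. Because per-pixel depths only ever
        # move nearer, a stale max_z is merely too large, so this test never
        # rejects a fragment that should have survived.
        return fragment_z > self.max_z

    def late_update(self, tile_pixel_depths):
        # Refresh from the full-resolution Z buffer whenever that data makes
        # it back from the RBE - potentially thousands of cycles later.
        self.max_z = max(tile_pixel_depths)


tile = HierZTile()
print(tile.early_reject(0.7))            # False: nothing known yet, let it through
tile.late_update([0.2, 0.3, 0.25, 0.4])
print(tile.early_reject(0.7))            # True: behind every pixel in the tile
```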

I'm not saying I'd find this to be the best solution, but it is a solution that involves a certain "economy of effort".
A simulation would be pretty informative, if done in enough detail - there have been various attempts at simulating MCs in GPUs but I suppose only the IHVs can really do this.

I'm unclear if the motivation for the dual-rasterisers was simply to cope with the high density of primitives produced by tessellation.

One way of looking at this multiple-rasteriser architecture is to imagine what happens if AMD is going to build a multi-chip solution where the multiple-rasterisers scale-up and work by communicating amongst themselves (i.e. non-AFR, instead something like super-tiled). If this is the basis of the design, then off-chip inter-rasteriser latencies are obviously far higher than on-chip - let's say 500 cycles for the sake of argument. Where does that lead? Dumb round-robin rasterisation? Still doesn't answer the question of how to apportion the vertex streams across multiple GPUs, or what to do with GS stream out from multiple GPUs (let alone append/consume buffers).
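
For the super-tiled flavour, the distribution side at least is trivial to picture - something like the checkerboard below, where each rasteriser owns a fixed set of coarse screen tiles, small triangles land on exactly one of them, and large ones get broadcast. The tile size and the 2-way split are made-up assumptions, and it does nothing to answer the vertex/GS stream distribution problem raised above.

```python
# Toy super-tiling: each rasteriser owns a fixed checkerboard of coarse
# screen tiles, so deciding which rasteriser(s) receive a triangle needs no
# inter-chip negotiation. Tile size and the 2-way split are assumptions.

SUPER_TILE = 64            # coarse tile edge in pixels (assumed)
NUM_RASTERISERS = 2

def owner(tile_x, tile_y):
    """Which rasteriser owns a given coarse screen tile (checkerboard)."""
    return (tile_x + tile_y) % NUM_RASTERISERS

def rasterisers_for_triangle(verts):
    """All rasterisers whose tiles the triangle's bounding box touches."""
    xs = [v[0] for v in verts]
    ys = [v[1] for v in verts]
    targets = set()
    for ty in range(int(min(ys)) // SUPER_TILE, int(max(ys)) // SUPER_TILE + 1):
        for tx in range(int(min(xs)) // SUPER_TILE, int(max(xs)) // SUPER_TILE + 1):
            targets.add(owner(tx, ty))
    return targets

print(rasterisers_for_triangle([(10, 10), (40, 12), (20, 50)]))    # {0}
print(rasterisers_for_triangle([(10, 10), (300, 12), (20, 400)]))  # {0, 1}
```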

I dunno, is it even worth thinking in those terms...

Jawed
 

Lol wth? This is my video! I never thought it would reach this thread! :LOL:



Quite annoying when you actually understand German...

Not to mention the prices seem fair. The Radeon HD 5850 competes with the Geforce GTX 285, yet is cheaper. The same applies to the Radeon HD 5870 compared to the Radeon HD 4870X2 and Geforce GTX 295.

Besides, it is a far cry from the $599 and $649 NVIDIA has asked for in the past.

I guess you are right! Unfortunately it is one of those videos that will annoy some people. Sorry about that. I didn't mean any disrespect towards anybody. I've seen the movie and it's awesome!

Although I have discussed the prices with our forum members, this video was actually meant as a counterbalance to this one:
http://www.youtube.com/watch?v=FR45ja_fNzU&feature=channel_page

Again, sorry! You may want to turn the audio down!
 
Quite annoying when you actually understand German...

Not to mention the prices seem fair. The Radeon HD 5850 competes with the Geforce GTX 285, yet is cheaper. The same applies to the Radeon HD 5870 compared to the Radeon HD 4870X2 and Geforce GTX 295.

Besides, it is a far cry from the $599 and $649 NVIDIA has asked for in the past.

It's not that I consider Evergreen prices to be unfair - quite the opposite, and I agree with what you have said above. It's just that I found this video very hilarious :)
 