AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within couple months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
Nyquist at least gives you a limit where more samples will not give better results.
More accurate results, yes, but it tells us nothing about when we stop seeing the difference in quality.

BTW, in the general case the footprint of a square pixel in texture space is a quadrilateral, not a parallelogram ... so using a line of equally weighted bilinear samples is not really optimal.
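To make the point concrete, here is a minimal sketch of the textbook scheme being criticised: approximate the (assumed parallelogram) footprint with a line of equally weighted bilinear taps along its major axis. The names and the simplified fetch are placeholders of mine, not any vendor's hardware algorithm.

def bilinear_sample(tex, u, v):
    # Stand-in for a real bilinear fetch: nearest texel of a 2D list of RGBA tuples.
    h, w = len(tex), len(tex[0])
    x = min(max(int(u * w), 0), w - 1)
    y = min(max(int(v * h), 0), h - 1)
    return tex[y][x]

def aniso_filter(tex, center_uv, major_axis, num_taps):
    """Equal-weight average of num_taps samples spaced along the major axis."""
    result = [0.0, 0.0, 0.0, 0.0]
    for i in range(num_taps):
        t = (i + 0.5) / num_taps - 0.5          # spread taps over the axis
        u = center_uv[0] + t * major_axis[0]
        v = center_uv[1] + t * major_axis[1]
        texel = bilinear_sample(tex, u, v)
        result = [r + c / num_taps for r, c in zip(result, texel)]
    return result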
 
AF is so broken on the 5870 that they decided to leave it off in 75% of their benchmarks! :D

Under load, 100°C core temps, WTF?! Damn, they run hot when in use. People used to make fun of the 5800U, but with the 5870 you really can fry an egg with it.
 
Under load, 100°C core temps, WTF?! Damn, they run hot when in use. People used to make fun of the 5800U, but with the 5870 you really can fry an egg with it.

I think the difference is that the 5800U fan at 100% sounded like a hoover, whereas the 5870's (IIRC) is running at 30%.
 
They say that the GTX 295 reached 99°C, and they stopped it because the temperature was still rising. This was running FurMark, anyway.
However, I think it's better to wait for other reviews.
 
Under load, 100°C core temps, WTF?! Damn, they run hot when in use. People used to make fun of the 5800U, but with the 5870 you really can fry an egg with it.

FurMark will push any core to 100°C and fry it if you're not careful. Unless you enjoy running FurMark 24/7, I don't think temperatures will be an issue.
 
Oh yes it does. Even oversampling with double the amount of samples makes no visible difference.

And yes I tested it :)
That was not the direction I was really suggesting taking it in. A lot of samples will have positions very close to a texel you have already sampled with another position, so maybe you can cheat: just nudge the sampling position of that other sample a bit, and ignore the texels which have only a very small contribution. Transparency with a reference method which is only a coarse approximation (due to mipmapping, weighting, inappropriate shape of filter, etc.) is simply not an issue to me.
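For what it's worth, here is how I read that "nudge and drop" suggestion as a sketch; the thresholds, names and structure are my own assumptions, not a published algorithm. Candidates that land close to an already-kept sample just shift that sample instead of costing another tap, and texels with a tiny contribution are skipped outright.

def thin_samples(candidates, eps=0.25, min_weight=0.02):
    """candidates: list of ((u, v), weight) in texel units; returns merged samples."""
    kept = []  # each entry is [[u, v], accumulated_weight]
    for (u, v), w in candidates:
        if w < min_weight:
            continue  # contribution too small to be worth a tap
        for pos in kept:
            du, dv = u - pos[0][0], v - pos[0][1]
            if du * du + dv * dv < eps * eps:
                # Nudge the existing sample toward the new position,
                # weighted by their relative contributions.
                total = pos[1] + w
                pos[0][0] += du * (w / total)
                pos[0][1] += dv * (w / total)
                pos[1] = total
                break
        else:
            kept.append([[u, v], w])
    return kept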

How does it look?
 
So what should I have my LOD set to for best IQ? Clamp?

PS: The NV profile page takes over 2 minutes to open (the first time) for people who have a lot of games installed. Any chance of letting the powers that be know?


The IQ clamp only sets itself to zero if the game uses an LoD that's gone past a negative bias. Not every texture mipmap will have one. Also, adjusting the LoD globally to a negative value will not work if you clamp the LoD. Honestly, if you're using AF at all you should never adjust the LoD past its default "Zero".
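A schematic reading of what the clamp does, as I understand the driver option (this is not NVIDIA's actual code): any negative application bias is simply discarded, while a positive bias still applies.

def effective_lod(base_lod, app_bias, clamp_negative_bias=True):
    # Sketch of the "clamp negative LOD bias" behaviour as I understand it.
    bias = app_bias
    if clamp_negative_bias and bias < 0.0:
        bias = 0.0  # negative bias ignored, so textures aren't over-sharpened/aliased
    return base_lod + bias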

I personally don't clamp the LoD, but I haven't experienced any games where it's been a big problem for me. That picture you showed me does not look like a texture filtering problem to me but more like an MSAA problem. I wonder if it's been fixed. Do you have a link to the actual preview?

What's the Nvidia profile page? You mean the Manage 3D Settings page? How many games do you have installed? 2 minutes is a long time; it really shouldn't take more than, say, 5 seconds.

Chris
 
AF is so broken on the 5870 that they decided to leave it off in 75% of their benchmarks! :D

Seems really odd to go from 0xAA/0xAF to 8xAA/0xAF rather than 4xAA/16xAF and 8xAA/16xAF. I have always assumed that with any card over 150 dollars you'll at least use 4xAA.
 
Yes and no.
It depends on just how naive the solution is.
Worst-case, a lot of the necessary data resources do not have local copies on the second chip, and then the entire thing is throttled by the interconnect.
There's always a bottleneck somewhere ;)

The way back-end cores eagerly snatch up the next available tile was not described as taking into account any locality.
This is probably fine in the single-chip case since it's all the same memory controllers and ring bus.
It can incur additional costs if this causes self-assignment to hop chips.
Agreed.
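A purely illustrative sketch of the kind of locality-aware self-assignment being discussed (nothing like this is described in the Seiler paper; the names and structure are mine): each back-end core drains tiles whose bins live in its own chip's memory pool first, and only hops chips when its local queue runs dry.

from collections import deque

class TileScheduler:
    def __init__(self, num_chips):
        self.queues = [deque() for _ in range(num_chips)]

    def submit(self, tile, home_chip):
        # home_chip = the chip whose DRAM pool holds this tile's bins.
        self.queues[home_chip].append(tile)

    def grab(self, my_chip):
        # Local tiles first: bin reads stay on this chip's memory controllers.
        if self.queues[my_chip]:
            return self.queues[my_chip].popleft()
        # Otherwise steal from a remote chip and pay the interconnect cost.
        for chip, q in enumerate(self.queues):
            if chip != my_chip and q:
                return q.popleft()
        return None  # nothing left to render

Whether something this simple buys anything depends on how often the eager scheme actually ends up hopping chips in practice, which is exactly the unknown here.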

Would these buffers be in local memory per-chip or in the L2 caches?
Unless they're fetched by the TUs (unlikely, I'd say) it seems they'd be in L2. The buffer is there merely to smooth data flow as consumption varies.

An additional concern is that this turns into a streaming problem when the remote cores are aware that the additional vertex work is available. (edit for clarity: so we must factor in synchronization costs that do not exist otherwise)
The most latency-tolerant methods would lean most heavily on bandwidth and buffers to work their magic, and we have an unknown ceiling in interconnect bandwidth. It's probably safe to assume interconnect bandwidth << DRAM bandwidth.
I wouldn't be surprised if interconnect : RAM ratio is better in Larrabee (i.e. less of a performance constraint relative to a single chip) than in traditional GPUs. If any of these ever do this, of course.

I say that because Larrabee appears to demand relatively little bandwidth per unit of performance, while chip interconnect bandwidth is more a physics question than anything (i.e. it is an area : performance trade-off that's pretty much the same whether it's ATI, Intel or NVidia).

I have my doubts about how similar they can be.
Sorry, "similar mechanism". I'm not claiming anything about performance, it was just an analogue for the kinds of inefficiencies that arise due to the order of the consumption of triangles not matching their order of production (their layout in memory).

Any low-level event that leads to waste in the multi-chip case is something that, even if rare, is in my figuring likely to be 2-10x as expensive to handle versus a waste problem that stays local.
This is why I am reluctant to assume that something that is 10% of the load in a single-chip scenario stays at 10% with multiple chips.
As long as performance scales adequately with multiple chips, who cares? It's a question of whether it sells, not absolute performance. People have been buying AFR X2 junk for years now, putting up with frankly terrible driver support.

Assigning a PrimSet to a core can take possibly tens of cycles with one chip.
Assigning triangles to bins can take place at the full bandwidth of the chip DRAM bus.
A back-end thread detecting that a bin is ready would take tens of cycles, and reading bin contents in the back end can use full bandwidth.

About half the time with multi-chip situations, assuming completely naive assignment, these assumptions will not be true.
I can't work out what you're quantifying here.

This would come down to the complexity and number of independent nodes in the dependency graph.
In the scale of granularity where per-frame barriers are coarsest and per-pixel or fragment is finest, coordinating at a render state might be medium-to-coarse synchronization. It would be additional synchronization, and this would be a performance penalty even with one-chip.

Some of this may be unavoidable regardless of scheme used, as those buffers eventually have to be used and so much of that data will need to cross the interconnect.
It might be that this can safely be done by demand-streaming to each chip, or perhaps a local copy of each buffer will exist in each memory pool.
Ultimately the key thing about Larrabee is it has lots of FLOPs per watt and per mm² and is heavily dependent on GPU-like implicit (4-way per core) and explicit (count of fibres) threading to hide systematic latencies. So whether the latency is due to a texture fetch, a gather or data that is non-local, the key question is can the architecture degrade gracefully? Maybe only once it's reached version X?
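As a toy illustration of the "explicit fibres hide latency" point (entirely schematic, not Larrabee code): each fibre yields whenever it issues a long-latency operation, and a simple round-robin scheduler keeps the core busy with whichever fibres are ready. A real scheduler would only resume a fibre once its data had actually arrived; here we just interleave.

def fibre(fid, work_items):
    for item in work_items:
        # Pretend this kicks off a texture / remote fetch with long latency.
        yield ("fetch", fid, item)
        # Execution resumes here once the scheduler decides the data is back.
        yield ("compute", fid, item)

def run(fibres):
    log = []
    while fibres:
        for f in list(fibres):
            try:
                log.append(next(f))   # run this fibre until its next yield
            except StopIteration:
                fibres.remove(f)
    return log

print(run([fibre(0, ["a", "b"]), fibre(1, ["c"])]))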

The bin sets are stored in main memory, though.
The Seiler paper posited the number of color channels and format precision as the factors in deciding tile size.
I think I completely misinterpreted what you said before. I'm not sure why you say bin spread is going to get worse with flimsy binning of triangles.
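A back-of-the-envelope sketch of the tile-sizing rule attributed to the Seiler paper in the quote above: pick the largest power-of-two square tile whose colour plus depth storage still fits in a per-core cache budget. The 256 KiB figure and the exact rule are my assumptions, not numbers from the paper.

def pick_tile_size(bytes_per_color_sample, num_render_targets,
                   bytes_per_depth_sample=4, l2_budget=256 * 1024):
    # Total storage per pixel for all colour targets plus depth.
    bytes_per_pixel = (bytes_per_color_sample * num_render_targets
                       + bytes_per_depth_sample)
    size = 128
    while size > 16 and size * size * bytes_per_pixel > l2_budget:
        size //= 2
    return size  # e.g. RGBA8 + Z fits at 128x128; FP16 MRTs force smaller tiles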

A bin's contents could be streamed from a chip's DRAM pool based on demand, so why would this impact the tile size?
Granted, if for some reason the bin were on the other chip's memory pool due to a non-NUMA-aware setup, costs in latency, bandwidth or memory buffering would be higher.
Actually, if the scheme is that naive, it wouldn't know to add additional buffering and the chips would just stall a lot.
I'd hope there'd be performance counters and the programmers make the pipeline algorithms adaptive. The fact that there are choices about the balance of front-end and back-end processing indicates some adaptivity. Though that could be as naive as a "per-game" setting in the driver.

I've been discussing Intel's own software rasterizer, though. I'm not clear on just how much can be set by a programmer not messing with Intel's driver and all that.
Sure, if a developer rolled their own solution they could do what they want.
Yes, I'm thinking in terms of the programming of Intel's drivers supporting games through standard APIs.

I think that any multi-chip solution from Intel would include software that was modified so that as much work gets done locally as possible, even if at the cost of duplicated computation.
Yes, I agree in general, since computation is cheap and, relatively speaking, cheaper in multi-chip. All I'm saying is that Intel has a lot of potential flexibility (if it's serious about multi-chip) and it's a matter of not leaving gotchas in the architecture. Considering the appalling maturation path of multi-GPU, so far, Intel could hardly do worse. The only risk seems to be that consumers get sick of multi-chip (price-performance, driver-woes).

Of course now that we learn that R800 doesn't have dual-setup engines and is merely rasterising at 32 pixels per clock, it does put the prospects of any kind of move to multiple-setup (and multi-chip setup) way off in the future.

Jawed
 
Under load, 100°C core temps, WTF?! Damn, they run hot when in use. People used to make fun of the 5800U, but with the 5870 you really can fry an egg with it.

I love how people pan 3DMark as being useless because it has no real-world application.

And here we're using FurMark, an application with even less real-world relevance, to say OMGWTFBBQ!#@! According to FurMark my 4870 should have exploded and burnt down my house by now, yet amazingly enough it runs just peachy.

Hehe.

Anyways, it's a non-issue; in even the most demanding real-world application I'd be surprised if this got even remotely close to 90°C unless your ambient temp is already in the high 30s.

Regards,
SB
 
FurMark will push any core to 100°C and fry it if you're not careful. Unless you enjoy running FurMark 24/7, I don't think temperatures will be an issue.
Now that PhysX has been established as a game changer, this is the next (non)issue. ;)
 
I love how people pan 3DMark as being useless because it has no real-world application.

And here we're using FurMark, an application with even less real-world relevance, to say OMGWTFBBQ!#@! According to FurMark my 4870 should have exploded and burnt down my house by now, yet amazingly enough it runs just peachy.

Hehe.

Anyways, it's a non-issue; in even the most demanding real-world application I'd be surprised if this got even remotely close to 90°C unless your ambient temp is already in the high 30s.

Regards,
SB


Well, I don't know what the deal is, but my 4890 was hitting 85°C in Crysis, and that was in a 1280-wide window, so at my monitor's native res of 1680 I assume it would have been worse.

I must say the 4890 has brought home heat/power issues for me for the first time ever. My old 9800GTX was a much better citizen: although the 9800 idled at 72°C, it ran quiet. The 4890 idles at 54°C, but it idles loud enough to be slightly annoying. Further, my 500 watt Antec power supply, which I assumed to be plenty, is apparently not enough for the card. I've been having random black screen --> reboot crashes in Crysis, and in trying to determine whether it was core temp, VRM temp, or the PSU causing the shutdowns, I discovered that according to OCCT my 12V line is getting pulled down under 11V in heavy applications. I can't believe a quality 500 watt supply isn't sufficient. What's worse, after spending so many hundreds on a gaming PC when I really don't even PC game that much, I need a new $120+ power supply, and really a new $30+ GPU cooler as well, to get reasonable temps at reasonable noise levels. Spending another $150 after already spending $250 on the card, really just to play Crysis, is making me balk.

I'm not sure if it's ATI or just the way high-end cards are getting these days, but it's the first card I've had that really is such a heat/power monster that it becomes problematic. However, it is a PowerColor model overclocked to 950 (though with an aftermarket fan that I assume would be better, not worse, than the standard one), so perhaps the stock models are not so bad.
 
I hope we start seeing more cards with internal exhaust cooling aimed at enthusiasts (with a big RED warning sticker: "Don't expect this to work in cases with retarded airflow"). For the moment only the Asian market seems to get these.
 
Well, I don't know what the deal is, but my 4890 was hitting 85°C in Crysis, and that was in a 1280-wide window, so at my monitor's native res of 1680 I assume it would have been worse.

I must say the 4890 has brought home heat/power issues for me for the first time ever. My old 9800GTX was a much better citizen: although the 9800 idled at 72°C, it ran quiet. The 4890 idles at 54°C, but it idles loud enough to be slightly annoying. Further, my 500 watt Antec power supply, which I assumed to be plenty, is apparently not enough for the card. I've been having random black screen --> reboot crashes in Crysis, and in trying to determine whether it was core temp, VRM temp, or the PSU causing the shutdowns, I discovered that according to OCCT my 12V line is getting pulled down under 11V in heavy applications. I can't believe a quality 500 watt supply isn't sufficient. What's worse, after spending so many hundreds on a gaming PC when I really don't even PC game that much, I need a new $120+ power supply, and really a new $30+ GPU cooler as well, to get reasonable temps at reasonable noise levels. Spending another $150 after already spending $250 on the card, really just to play Crysis, is making me balk.

I'm not sure if it's ATI or just the way high-end cards are getting these days, but it's the first card I've had that really is such a heat/power monster that it becomes problematic. However, it is a PowerColor model overclocked to 950 (though with an aftermarket fan that I assume would be better, not worse, than the standard one), so perhaps the stock models are not so bad.

Dunno, I run Crysis at 2560x1600 with 4xAA on a 4890 and it never breaks 80°C. That's in a very cramped Shuttle XPC small form factor case. BattleForge in a 2400x1500 window with 4xAA actually makes my 4890 break 80°C, but not by much. Then again I have air con, so the ambient temperature in my place is usually in the 25-28°C area.

Regards,
SB
 
You guys do realize that these temperature discussions are really useless, because the numbers can vary greatly with room temperature? The actual threshold temperatures don't bother me as much as the leaked heat, i.e. how much gets dumped into your living area.
 
I hope we start seeing more cards with internal exhaust cooling aimed at enthusiasts (with a big RED warning sticker: "Don't expect this to work in cases with retarded airflow"). For the moment only the Asian market seems to get these.
If you mean dumping heated air into the case, I hope they continue exhausting it out the back. A single card setup wouldn't matter as much as a multi-card setup dumping that heat into the case.
 
@ChrisRay: that pic was from ATI's site. I do remember F1 99-02 being used a few years ago as an example of ATI's better AF. I googled for some articles but couldn't find any, sorry.

As for the Manage 3D Settings page: 1 minute 34 seconds (I timed it), with 315 games installed.
 