AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within a couple of months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
The fact that density is such a high priority for the ALU section is probably the reason for the relatively modest enhancements listed.
There is slightly better co-issue, slightly better precision for FP, an additional instruction, etc.

Then there is the reduced-precision integer math on the slim units that appears to be a step up from RV770, where only shifts were available.
This might mean that, as time drags on, AMD will inch towards 32-bit integer operations across all units in the SIMD, as process nodes afford more space.

Not a lot to talk about there.
Cache and data-store wise, there are some bumps in capacity, so that merits half a second of discussion.

I guess we can hope Nvidia's next chip provides something more interesting than what has been leaked so far.

I'm still waiting on what has doubled the complexity of Evergreen's scheduler.
 
Slappi's assessment was a response to someone claiming ATI was profitable. In fact, it just lost less than the rest of the company.

I've been reading every page of this thread and no one has said AMD has been profitable. Not sure where you got that. Slappi certainly wasn't responding to it.

What may have been implied is that the ATI (graphics) division within AMD has been more profitable than Nvidia, and that's at least true for the last quarter, although both lost money.

I'm pretty sure everyone here will quite happily admit that AMD has been losing money for quite a while now.

Regards,
SB
 
I've been reading every page of this thread and no one has said AMD has been profitable. Not sure where you got that.

Perhaps you need to read a bit more carefully. Not sure why you're mentioning AMD, this statement of Neliz was about ATI.
 
Perhaps you need to read a bit more carefully. Not sure why you're mentioning AMD, this statement of Neliz was about ATI.

okay, go over to the AMD Doom&Gloom thread and quote this from me, please:

neliz said:
Indeed, I think that Q1 was profitable for ATI before dropping in Q2 again. But spinning it that way, ATI is losing a lot less money than nVidia, who seem to have recurring "one time write-offs."
The only thing left for us to guess is how much of these "one time write-offs due to bad packaging material" they will have.

More speculation, please: will the packages contain dongles or not?
 
Perhaps you need to read a bit more carefully. Not sure why you're mentioning AMD, this statement of Neliz was about ATI.

I would assume you bringing in AMD financials would have something to do with that. Since NVDA financials were brought up to counter the claim that NVDA has been profitable this past year.

Regards,
SB
 
I would assume you bringing in AMD financials would have something to do with that. Since NVDA financials were brought up to counter the claim that NVDA has been profitable this past year.

Regards,
SB

NVDA financials were brought in to demonstrate ATI's loss was relatively small. In turn, AMD financials demonstrate that NVDA's loss was relatively small. It works both ways.
 
No. I demonstrated that nVidia is at a significant loss, not profitable. That's all.

ATi's slight loss can be explained by the manufacturing of DX11 GPUs, which have been stockpiled so far and will be sold soon.
 
Jeez at the backpedalling.

1. Nvidia made a profit
2. No they didn't
3. AMD (not ATI) made a big loss
4. So did NVDA
5. Where did I say that
6. Here
7. I only said that to put AMD's loss into picture
8. Go to step 1.

Nice entertainment, though it's getting tiring now.
 
We finally get some probable design information, and we waste pages on the public knowledge that AMD and Nvidia lose money, a fact that has been beaten over on two doom/strain threads where such facts are on-topic.

Here's an on-topic idea for this thread.

The SIMD array is apparently banked, with two separate arrays.

The ROP groups do not appear to be similarly divided.
Is this just a diagram simplification, or is there something more to this?
 
The L1/L2 bandwidth appears to be in line with clock speed, even if the consumers of said bandwidth are twice as numerous.

This may be one area where some of the less than doubled performance might be attributed.
Yeah, looks problematic to me. Can't think of any justification other than "scaling bandwidth to 20 clusters is extremely hard".

Is prunedtree's matrix multiplication L2->L1 bandwidth-limited?

Interesting, all the same, that ALU:TEX remains at 4:1. The L2->L1 bandwidth limitation might make it, in effect, 8:1 though.
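A back-of-envelope sketch of that 4:1 versus 8:1 point. The unit counts (16 ALU lanes and 4 texture units per cluster) and the idea that L2->L1 bandwidth stays flat while the cluster count goes from 10 to 20 are illustrative assumptions read off the discussion, not confirmed specs.

```python
def effective_alu_tex_ratio(alu_per_cluster, tex_per_cluster,
                            clusters, baseline_clusters):
    """Effective ALU:TEX ratio when the L2->L1 bandwidth that once fed
    `baseline_clusters` now has to feed `clusters` (hypothetical model)."""
    nominal = alu_per_cluster / tex_per_cluster
    # Fraction of peak texture rate each cluster can sustain if texturing
    # is purely L2->L1 bound and the bandwidth is shared evenly.
    tex_fraction = min(1.0, baseline_clusters / clusters)
    return nominal / tex_fraction

print(effective_alu_tex_ratio(16, 4, 10, 10))  # 4.0
print(effective_alu_tex_ratio(16, 4, 20, 10))  # 8.0
```

This only holds for workloads that actually miss L1 all the time; anything with decent L1 hit rates would sit somewhere between the two figures.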

The dual rasterisers etc. are some kind of big deal. I presume Hierarchical-Z/stencil is partitioned into two and this is done with a simple tiling in screen space.
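To make "simple tiling in screen space" concrete, here is a minimal sketch of one way it could work: a checkerboard of tiles alternating between the two rasterisers. The 32-pixel tile size and the checkerboard pattern are guesses for illustration only; nothing leaked so far specifies either.

```python
TILE = 32  # hypothetical tile edge in pixels

def rasteriser_for_pixel(x, y, tile=TILE):
    """Return 0 or 1: which of the two rasterisers (and Hi-Z partitions)
    would own pixel (x, y) under a checkerboard tiling."""
    return ((x // tile) + (y // tile)) % 2

print(rasteriser_for_pixel(0, 0))    # 0
print(rasteriser_for_pixel(32, 0))   # 1
print(rasteriser_for_pixel(32, 32))  # 0
```

A checkerboard keeps the load roughly even for any triangle larger than a couple of tiles, which is presumably the point of splitting the work this way.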

There's no interpolator unit. I wonder if this has something to do with tessellation being an interpolator of sorts. The D3D11 tessellator generates interpolated vertex coordinates based on existing vertices and tessellation factors.

Jawed
 
Are the doubled ROPs a result of, for instance, double-Z, or are they a requirement because of it?

There are two rasterizer blocks, two SIMD banks, and then there is the ROP section.

The L shaped purple blocks on the side of each ROP block apparently have links to both hierarchical Z blocks.
This would seem to indicate that even though the earlier parts of the process are split, the ROPs have a unified view, possibly for the sake of correctness if two rasterizers have outputs that will lead to gibberish if they write out without some kind of order being put in place.

If all 32 ROPs do work in concert, a single 16-lane SIMD would not be capable of outputting enough per cycle to feed all 32 ROPs.
Maybe the ROPs are more flexible in allocation than the diagram indicates, or there is some kind of buffering stage such that a SIMD's output over two or more cycles is built up before being sent to a ROP, or somehow there is a way to send more than one SIMD's worth of output to the ROPs.
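The arithmetic behind that, as a rough sketch. It assumes each SIMD lane exports one pixel per clock and each ROP consumes one pixel per clock, which is a simplification for illustration rather than anything the diagram states.

```python
import math

def cycles_to_fill_rops(rops, simd_lanes, simds_exporting=1):
    """Clocks of SIMD export needed to hand every ROP one pixel
    (assumes one pixel per lane per clock; hypothetical)."""
    pixels_per_clock = simd_lanes * simds_exporting
    return math.ceil(rops / pixels_per_clock)

print(cycles_to_fill_rops(32, 16))                     # 2
print(cycles_to_fill_rops(32, 16, simds_exporting=2))  # 1
```

So either two clocks of buffered output from one SIMD, or two SIMDs exporting at once, would be needed to keep all 32 ROPs busy under these assumptions.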

If the ROPs still work as a unit, maybe it could be that one SIMD bank might be running a fill-rate limited set of code, while the other could be running some ALU or texture-limited work, and the ROPs could concentrate on one side of the chip over the other.
 
There's no interpolator unit. I wonder if this has something to do with tessellation being an interpolator of sorts. The D3D11 tessellator generates interpolated vertex coordinates based on existing vertices and tessellation factors.
Interpolation was moved inside the shader cores.
 
There are two rasterizer blocks, two SIMD banks, and then there is the ROP section.

The L shaped purple blocks on the side of each ROP block apparently have links to both hierarchical Z blocks.
This would seem to indicate that even though the earlier parts of the process are split, the ROPs have a unified view, possibly for the sake of correctness if two rasterizers have outputs that will lead to gibberish if they write out without some kind of order being put in place.
Yep, that seems to be the case. Fragment reordering is probably done there, so it needs to be coherent with the rest. Moreover, the ROPs need to be able to send z-related info back to the Hi-Z blocks so that those can update their low-res conservative z-buffer representations.
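For anyone unfamiliar with the idea, a minimal sketch of a low-res conservative z-buffer with feedback from the ROPs. This is a generic textbook-style model, not AMD's implementation: it assumes a LESS depth test, one "farthest stored depth" per tile, and it ignores depth clears and depth-function changes. All names are made up for the example.

```python
class HiZ:
    """Coarse conservative depth: one 'farthest stored depth' per tile."""

    def __init__(self, tiles_x, tiles_y, far=1.0):
        self.max_z = [[far] * tiles_x for _ in range(tiles_y)]

    def may_be_visible(self, tx, ty, prim_min_z):
        # Reject only when the primitive's nearest point is at or behind
        # everything already stored in the tile; otherwise let it through.
        return prim_min_z < self.max_z[ty][tx]

    def feedback_from_rop(self, tx, ty, tile_max_z):
        # The ROPs report the farthest depth in the tile after their
        # fine-grained Z writes; the coarse value only ever tightens here.
        self.max_z[ty][tx] = min(self.max_z[ty][tx], tile_max_z)


hiz = HiZ(2, 2)
print(hiz.may_be_visible(0, 0, 0.7))  # True
hiz.feedback_from_rop(0, 0, 0.5)
print(hiz.may_be_visible(0, 0, 0.7))  # False
```

With two rasterisers, each Hi-Z block would hold the tiles it owns, which is why the ROPs need a path back to both.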
 
What about external bandwidth? Any confirmation yet? That kind of scaling would be amazing if they were still on a 256-bit bus.
Seems it can only be 256-bit.

By the performance or architectural changes (or lack thereof)?
The performance appears to be ~50% higher than HD4890. Sure it's better than the seeming bandwidth increase, but RV770 seemed to have quite a bit of excess bandwidth particularly for 4xMSAA.

As to the architecture, well it's certainly food for thought. Too early to tell how radical they've been, because for example the append/consume buffers could be one linchpin of radical efficiency gains - but how to tell?

It's kinda interesting that this has 80 TUs and 32 RBEs, basically the same numbers as GT200 (except Z rate, of course). Slight difference in GFLOPs :p, very similar bandwidth I guess and some storming comparative performance. But I expected more I'm afraid.

Maybe the reviews will change my mind. HD4890 was seriously underwhelming and the performance gain of HD5870 seems to be merely adequate over 15 months. Maybe there are some compute-heavy games around the corner and the beast is otherwise dormant.

The STALKER Clear Sky performance numbers are badly disappointing; STALKER (versions and mods) is one game I've been delaying playing until there's a card that can own it.

Jawed
 
It's funny how the old 1 byte per flop (per second) bandwidth rule is now more a 0.5 bit per flop rule ;)
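A quick sanity check of that rule of thumb. The inputs (256-bit bus, 4.8 Gbps per pin, 1600 ALU lanes at 850 MHz, 2 flops per lane per clock for a MAD) are assumed figures for HD5870 used purely for illustration; swap in whatever numbers you trust.

```python
def bits_per_flop(bus_bits, gbps_per_pin, alus, clock_ghz,
                  flops_per_alu_clock=2):
    """External memory bandwidth per peak flop, in bits."""
    bandwidth_gbytes = bus_bits * gbps_per_pin / 8   # GB/s
    gflops = alus * flops_per_alu_clock * clock_ghz  # GFLOP/s
    return 8 * bandwidth_gbytes / gflops

print(round(bits_per_flop(256, 4.8, 1600, 0.85), 2))  # 0.45
```

That is about 153.6 GB/s against 2720 GFLOP/s, so a bit under half a bit per flop with these inputs.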
 