AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to upcoming ATI-RV870 lineup GPUs

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within couple months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
Would it work to say there are two bins for that SKU?
It would be better, maybe. My English isn't good enough to discern between nomenclature nuances :)

Anyway, I have no idea about the real power consumption difference. Some reviews showed power draw similar to the HD4870. Maybe these reviews tested the less power-hungry part, which could be used for dual-configuration. But that's only my guess.
 
There should be no consumption difference, since leakage also affects consumption.

For X2-cards they need low-leakage and low-voltage parts or they have to reduce clock significantly.
 
The real-world application power consumption seems to be within ±10W of the HD4870, depending on the app.
 
The register file only has 256 addresses, and a thread can only address 127 of those addresses (ignoring, for a second, the varying number of shared registers, which any thread can access, but which are limited in number to 128). So 512 isn't possible.
Is this hardware threads or marketing threads?
I'm saying that AMD is calling each element in a 64-element thread (or, in graphics terminology, each pixel in a 64-pixel batch) a thread.

If the slide's 32K meant full hardware threads, the register file would need over a thousand addresses just to give each thread a single-register allocation in the shader.
 
Redwood & Cedar predictions:

Cedar:

From 675MHz up to 825MHz core and with 500MHz DDR2 up to 900MHz DDR3

1. 4ROPs / 8TUs / 80SPs (around 290 million transistors & 51mm2) (64bit memory controller)


2. 4ROPs / 16TUs / 160SPs (around 350 million transistors & 62mm2) (64bit memory controller)


3. 8ROPs / 16TUs / 160SPs (around 420 million transistors & 73mm2) (64bit memory controller)

I like scenario 1. (with only 675MHz SKUs)

Price will be $30 for the 512MB DDR2 version.
In Q1 2010 AMD will need an extremely cheap AIB in order to entice potential Intel customers (Clarkdale with IGP launches in Q1 2010...) and give them a cheap alternative solution (across the entire range).

For example in Q1 2010 we will have in the $200 range:

i5 660 3,33GHz dual core (4Threads) +IGP $196
Another alternative will be:
i5 760 2,8GHz quad core (4 Threads) $196 + $30 5350 (scenario 1)

Also it ($30) will be much faster than G 210.

Redwood:

From 675MHz up to 825MHz core and with 900MHz DDR3 up to 1GHz GDDR5

1. 8ROPs / 32TUs / 320SPs (around 550 million transistors & 93mm2) (64bit (or 128bit) memory controller)

2. 16ROPs / 32TUs / 320SPs (around 680 million transistors & 115mm2) (128bit memory controller)

3. 8ROPs / 24TUs / 480SPs (around 660 million transistors & 112mm2) (64bit (or 128bit) memory controller)

I like scenario 1. (5670 825MHz + 512MB 1GHz GDDR5 & 5650 675MHz with 900MHz DDR3) (there is a possibility for a 128bit memory controller design with DDR2 500 (5650) & DDR3 900MHz (5670), we will see...)
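
As a rough sanity check on the guesses above, here is a minimal Python sketch that turns each scenario's transistor count and die area into a density figure. The numbers are the ones from the two lists; the dictionary layout and function name are only illustrative.

```python
# Predicted (transistors in millions, die area in mm^2), copied from the lists above.
SCENARIOS = {
    "Cedar 1": (290, 51),
    "Cedar 2": (350, 62),
    "Cedar 3": (420, 73),
    "Redwood 1": (550, 93),
    "Redwood 2": (680, 115),
    "Redwood 3": (660, 112),
}


def density(transistors_m: float, area_mm2: float) -> float:
    """Return millions of transistors per mm^2."""
    return transistors_m / area_mm2


DENSITIES = {name: density(*spec) for name, spec in SCENARIOS.items()}

for name, value in DENSITIES.items():
    print(f"{name}: {value:.2f} Mtransistors/mm^2")
```

All six guesses land between roughly 5.6 and 5.9 million transistors per mm2, so they are at least consistent with each other.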


It's easier for AMD to take the RV730 and make it DX11 (easier on both the hardware side and the software side).
Also the 5670 version (around $59-69) will be as fast as the higher priced 550MHz GT240 900MHz DDR3 version, so no worries there.
 
Does anyone know how much GDDR5 costs compared to standard DDR3? I was wondering whether using GDDR5 might make sense on Redwood and Cedar. Presumably it would make it possible for them to have a 64-bit and a 32-bit memory bus, respectively. Bandwidth might be a little tight, but that seems to be the road AMD is taking on Evergreen.

elsence > I can't imagine Redwood having fewer than 400SPs. It's supposed to replace RV730, so it has to have at least a few more SPs.
 
No idea about DDR3 but supposedly GDDR5 vs GDDR3 is only ~10% price increase.

Edit- elsence, I like your third option the best for Redwood. It was what I was speculating it to be quite a while ago.

LordEC911 said:
Redwood is RV810, 75w TDP, unknown specs- 64bit, 320-480SPs, 24TMUs, 80-90mm2. Performance close to a 9800GT, less than a 4830(?). MSRP ~$79 and under. Low profile version(?).
 
Alexko>

it's not me speaking it's the Ouija board.

lol



O.K. i will add a third Redwood prediction just for you.

That's very nice of you :) But I know you were moving the planchette...

No idea about DDR3 but supposedly GDDR5 vs GDDR3 is only ~10% price increase.

Edit- elsence, I like your third option the best for Redwood. It was what I was speculating it to be quite a while ago.

I agree. I think this is the best way to get optimal performance/mm² while keeping board costs at a reasonable level, though it would help to know if a 64-bit board with GDDR5 costs more or less than a 128-bit board with DDR3.
 
A wider memory interface doesn't only increase the die size but also makes the PCB more expensive. A narrower interface plus faster memory is cheaper most of the time.
 
Is this hardware threads or marketing threads?
:LOL: It doesn't matter which actually. But for the sake of clarity, a hardware thread can only access 127 addresses (excluding shared registers).

32768 marketing threads ("markethreads"), at 64 per hardware thread, is 512 hardware threads. This is the count of threads that was introduced by R520:

http://www.beyond3d.com/content/reviews/27/4

R520's "Ultra Threaded Dispatch Processor" is the element that ATI have designed in order to better increase the overall utilisation of the Shader ALU's by breaking the tasks down into smaller chunks that can be interleaved more effectively. The central Pixel Shader dispatch unit first breaks down each of the shaders into "threads" (batches) of 16 pixels (4x4) and can track and distribute up to 128 threads per pixel quad. When one of the shader quads becomes idle, due to a completion of a task or waiting for other data, the dispatch engine will assign the quad with another task to do in the meantime, with the overall result being a greater utilisation of the shader units, theoretically.

And seemingly it's never gone up since then. Just the capacity of a thread has increased.
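
The arithmetic behind that count, as a minimal Python sketch. The 32768 and 64-per-hardware-thread figures come from this thread, and the 128-threads-per-quad figure from the quoted review; the assumption that R520 has four pixel quads, and all names in the code, are mine.

```python
ELEMENTS_PER_HW_THREAD = 64  # from the thread: 64 elements (pixels) per hardware thread


def hardware_threads(marketing_threads: int,
                     elements_per_thread: int = ELEMENTS_PER_HW_THREAD) -> int:
    """Convert per-element "markethreads" into hardware threads (batches)."""
    if marketing_threads % elements_per_thread:
        raise ValueError("count is not a whole number of hardware threads")
    return marketing_threads // elements_per_thread


# R520 side of the comparison: 128 threads tracked per pixel quad (quoted review).
R520_THREADS_PER_QUAD = 128
R520_QUADS = 4  # assumption: 16 pixel shader units arranged as four quads
r520_threads = R520_THREADS_PER_QUAD * R520_QUADS

print(hardware_threads(32768))  # 512
print(r520_threads)             # 512
```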

Jawed
 
Oh finally some new speculation about the evergreen parts we don't know yet :).
We already have 256bit gddr5 (32 rops) and 128bit gddr5 (16 rops) parts. I think it would make sense to continue along those lines, hence redwood would have 8 rops and probably 128bit ddr3, and cedar probably 4 rops and 64bit ddr3 (ddr2 would just totally suck now, and imho makes no sense at this time with ddr3 chip prices nearly the same as ddr2).
You could argue for 64/32 bit gddr5 instead, but it doesn't make much sense since the biggest chips you can get are 1gbit, so even if the gddr5 chips are 32bit you still need at least 4 for a 512MB card (and 8 for 1GB - OEMs absolutely love large video memory...).
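
The chip-count argument can be sketched in a few lines of Python. The 1gbit density and 32bit chip width are the figures from the paragraph above; the function names are made up for illustration, and the "native bus width" is simply chips times chip width, ignoring any mode where chips share the interface.

```python
def chips_needed(capacity_mb: int, chip_density_mbit: int = 1024) -> int:
    """Number of memory chips needed to reach a capacity (rounded up)."""
    capacity_mbit = capacity_mb * 8
    return -(-capacity_mbit // chip_density_mbit)  # ceiling division


def native_bus_width(chips: int, bits_per_chip: int = 32) -> int:
    """Bus width if every chip gets its own full-width connection."""
    return chips * bits_per_chip


for capacity in (512, 1024):
    n = chips_needed(capacity)
    print(f"{capacity}MB: {n} chips, {native_bus_width(n)}bit if each chip is 32bit")
```

With 1gbit chips, 512MB already means four chips, which is 128 bits' worth of 32bit devices, so a 64bit or 32bit bus would not save any chips.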
Dunno about the shader / texture count though. I think low end versions will keep the tradition of having smaller simds (hence higher tex:alu ratio) though to what degree is anyone's guess - I wouldn't be surprised if the ratio gets adjusted a bit. So simd width 12 at least for redwood would make sense imho, it could have maybe 6 such simds hence 24 texturing units and 360 shader units. This should have comparable performance to HD4670 (hence handily beat the GT220 with similar die size).
For Cedar maybe 2 simds with size 12 (8 tmus / 120 shader units) or 3 simds with size 8 (12 tmus / 120 shader units) sounds like it would be very small and still more than enough to beat the competition (g210).
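
The unit counts above follow from simple multiplication, sketched here in Python. The 5 ALUs per SIMD lane and 4 texture units per SIMD are assumptions inferred from the numbers in this post (6 simds of width 12 giving 360 shader units and 24 tmus); the function name is illustrative.

```python
ALUS_PER_LANE = 5  # assumption: each SIMD lane is a 5-wide unit
TMUS_PER_SIMD = 4  # assumption: inferred from 24 tmus across 6 simds


def chip_config(simds: int, simd_width: int) -> dict:
    """Shader and texture unit counts for a given SIMD layout."""
    return {
        "shader_units": simds * simd_width * ALUS_PER_LANE,
        "tmus": simds * TMUS_PER_SIMD,
    }


print("Redwood, 6 simds x 12:", chip_config(6, 12))  # 360 shader units, 24 tmus
print("Cedar,   2 simds x 12:", chip_config(2, 12))  # 120 shader units, 8 tmus
print("Cedar,   3 simds x 8: ", chip_config(3, 8))   # 120 shader units, 12 tmus
```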
And all cards still with 3 display outputs would also be nice... (though in fact I'm curious how many display controllers juniper has - we've been hinted that those additional display controllers were driven by the needs of notebook manufacturers, hence Juniper might still have 6 since Cypress doesn't really make sense in a notebook?)
 
Does anyone know how much GDDR5 costs compared to standard DDR3? I was wondering whether using GDDR5 might make sense on Redwood and Cedar.

http://www.semiaccurate.com/2009/10/06/nvidia-will-crater-gtx260-and-gtx275-prices-soon/

The numbers in this (about 2/3rds down) are *SLIGHTLY* blurred from real volume prices. Getting hard numbers on this stuff is hard, but you can if you try. Basically 10% more from GDDR3 @~1GHz to GDDR5@1GHz. Going to GDDR5@1.25 adds about another 10%.

The question to ask is how much PCB cost, and pin/package costs does it save by going from 3 to 5? Low bin GDDR5 has more or less the same bandwidth (1) as GDDR3 at the top bin with double the bit width. That is significant, especially on a small die.

-Charlie

(1) Unless your GDDR5 controller is totally hosed. No names. In totally unrelated news, did you see Fudo posted a story ~2 weeks ago about the cool new feature of Fermi? It supports GDDR3! I wonder why? 384 bit GDDR3 feeding 512 shaders is how much more than 512 bit GDDR3 feeding 240 shaders? My calculator must be broken, it keeps putting a dash in front of the results.
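
A minimal Python sketch of the two bandwidth comparisons in this post. It assumes the usual convention that GDDR3 moves 2 bits per pin per quoted clock and GDDR5 moves 4, and, for the footnote, that both GDDR3 boards run the same memory clock so only bus width per shader matters. Function and variable names are illustrative.

```python
# Transfers per pin per quoted memory clock (assumed convention).
TRANSFERS_PER_CLOCK = {"GDDR3": 2, "GDDR5": 4}


def bandwidth_gb_s(bus_bits: int, clock_mhz: float, mem_type: str) -> float:
    """Peak memory bandwidth in GB/s."""
    bytes_per_transfer = bus_bits / 8
    mega_transfers_per_s = clock_mhz * TRANSFERS_PER_CLOCK[mem_type]
    return bytes_per_transfer * mega_transfers_per_s / 1000


# Low-bin GDDR5 on a narrow bus vs. top-bin GDDR3 on a bus twice as wide.
print(bandwidth_gb_s(64, 1000, "GDDR5"))   # 32.0
print(bandwidth_gb_s(128, 1000, "GDDR3"))  # 32.0


def per_shader_change(new_bus: int, new_shaders: int,
                      old_bus: int, old_shaders: int) -> float:
    """Relative change in bus width per shader at equal memory clocks."""
    return (new_bus / new_shaders) / (old_bus / old_shaders) - 1


# Footnote case: 384 bit feeding 512 shaders vs. 512 bit feeding 240 shaders.
change = per_shader_change(384, 512, 512, 240)
print(f"{change:+.0%}")  # negative, roughly -65%
```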
 
Well, isn't the GDDR3 support in there because there is no ECC GDDR5 in existence yet? Just saying... (I'm also completely illiterate when it comes to GPGPU performance with lowered memory system bandwidth)
 
The one relevant, supported memory type other than GDDR-5 is standard, regular DRAM DDR3. It will be replacing DDR2 on budget cards, and is an option for Tesla, using ECC memory.

For instance, there could be a faster Tesla board with 3GB GDDR-5, and a slower, lower-voltage one with 6GB DDR3 ECC.
 
According to this slide:

http://www.hardocp.com/image.html?i...lRCV1MxWXhXblJsUjBaVVRWWkdORlpYZUZkVmJGcEdWMn

the number of threads in flight in RV870 is no greater than RV770, despite the former having twice the SIMDs.

I'm not sure I believe that.

Jawed
The question is... which of those two numbers is correct. ;)

I thought DX11 requires 1024 threads, up from 768 in DX10?
Or is that a different type of thread again :???:
That's how many per thread group. I think that's thread block in CUDA terminology.
 