AMD: R7xx Speculation

INKster · Mar 31, 2008

Jawed said:
4 or 5%?

Look at the transistor counts for R600 versus RV670 and note that the latter has 50% of the former's MCs.

Jawed

More like ~7%.

R600 = 720 million transistors
RV670 = 666 million transistors

If we took PCIe 2.0, UVD and DX10.1 compatibility off the RV670, then we could've been looking at an almost 60~70 million transistor difference between the two, despite the fact that both of them carry the same ALU count and structure.
That would translate to roughly 8~12%, surely not a negligible difference on such complex chip designs.
Cut the memory bus to 128-bit and that's another 5~6% on top of that, for a gap of close to 20%.
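The percentages above are simple ratios of the transistor counts quoted in this post. Here is a minimal sketch of that arithmetic. It assumes R600's 720M count is the baseline, which is one possible reading; a different baseline shifts the figures somewhat. The 60~70M difference is the post's own hypothetical.

```python
R600_M = 720   # R600 transistor count in millions, as quoted above
RV670_M = 666  # RV670 transistor count in millions, as quoted above

def gap_percent(difference_m, baseline_m=R600_M):
    """Express a transistor-count difference as a percentage of the baseline chip."""
    return 100.0 * difference_m / baseline_m

# Actual difference between the two chips as shipped.
raw = gap_percent(R600_M - RV670_M)
print(f"raw gap: {raw:.1f}%")

# Hypothetical 60~70M difference once PCIe 2.0, UVD and DX10.1 are stripped from RV670.
for diff in (60, 70):
    print(f"{diff}M difference: {gap_percent(diff):.1f}%")
```

With R600 as the baseline this gives 7.5% for the real chips and about 8.3~9.7% for the hypothetical stripped RV670.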

A fifth of a "relatively" high-end chip discarded for use on a cheap mainstream card means profit pressure for sure.
And I'm also pretty sure that the 55nm half-node is still not at the maturity level of the parent 65nm node.

Jawed · Mar 31, 2008

INKster said:
If we took PCIe 2.0, UVD and DX10.1 compatibility off the RV670, then we could've been looking at an almost 60~70 million transistor difference between the two, despite the fact that both of them carry the same ALU count and structure.

Run the same comparison with RV610->RV620 and RV630->RV635. UVD is supposed to be very small, something like 3M transistors I think.

Jawed

LordEC911 · Mar 31, 2008

Jawed said:
Run the same comparison with RV610->RV620 and RV630->RV635. UVD is supposed to be very small, something like 3M transistors I think.

Jawed

Dunno about the transistor count, but it is supposed to be 3.3mm² on RV670.

Jawed · Mar 31, 2008

LordEC911 said:
Dunno about the transistor count, but it is supposed to be 3.3mm² on RV670.

Thanks. So, ahem, about 11M transistors for UVD?...
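One way to get from an area to a transistor count is to scale by the chip's average density. Here is a minimal sketch of that estimate. It assumes an RV670 die area of about 192 mm² (a commonly cited figure, not stated in this thread) and uniform density across the die. Real blocks vary, so treat the result as a ballpark only.

```python
RV670_TRANSISTORS_M = 666  # millions, from earlier in the thread
RV670_DIE_MM2 = 192.0      # assumption: commonly cited RV670 die size
UVD_AREA_MM2 = 3.3         # UVD block area on RV670, per the post above

def transistors_in_block(block_mm2, chip_transistors_m, chip_mm2):
    """Estimate a block's transistor count assuming the chip's average density."""
    density = chip_transistors_m / chip_mm2  # millions of transistors per mm^2
    return block_mm2 * density

uvd_m = transistors_in_block(UVD_AREA_MM2, RV670_TRANSISTORS_M, RV670_DIE_MM2)
print(f"UVD ~ {uvd_m:.1f}M transistors")
```

Under these assumptions it comes out at roughly 11.4M, in line with the "about 11M" guess.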

Jawed

Sound_Card · Mar 31, 2008

To lay out my thoughts on possible improvements over RV670...

Render back ends:

quadruple the depth/stencil units
quadruple the z/stencil cache
8xMSAA per clock
8xZ per clock

(alpha/blend units and color cache untouched)

Texture Engine Units:

Double the texture address units
Double the texture samplers
Double the texture filtering units
Double the L1 cache
Double the vertex cache
Double the L2 cache

Shaders:

More SIMD arrays but shorter (anywhere from 8 to 12)
Tweaks in the setup engine
Larger shader instruction and constant cache
Ultra-threaded dispatch processor with 4 arbiters and 4 sequencers per SIMD (hey, I can hope)

Yeah, some of it seems pretty far out there, like the ultra-threaded dispatch processor and 8xMSAA per clock. But I would be content enough with 4xMSAA/8xZ per clock, to at least match G92's capability.

Sound_Card · Mar 31, 2008

trinibwoy said:
On another note: if RV770 does indeed have 480 ALUs, it looks like it will fall short of the 1 TFLOP mark. I'm not really confident that they can hit the required 1 GHz+ core speed. Nvidia probably doesn't have a chance of hitting 1 TFLOP of MADDs either.

I thought 1 TFLOP was pretty far out there, but RV770 should hit close to 800 GFLOPS.
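The 1 TFLOP question comes down to peak MADD throughput. Here is a minimal sketch of the calculation. It assumes a MADD counts as two floating-point operations per ALU per clock, and it uses the rumoured 480-ALU figure, which was still speculation at this point.

```python
FLOPS_PER_ALU_PER_CLOCK = 2  # a MADD counts as two floating-point ops

def peak_gflops(alus, clock_mhz):
    """Peak MADD throughput in GFLOPS for a given ALU count and core clock."""
    return alus * FLOPS_PER_ALU_PER_CLOCK * clock_mhz / 1000.0

def clock_for_target(alus, target_gflops):
    """Core clock (MHz) needed to reach a GFLOPS target."""
    return target_gflops * 1000.0 / (alus * FLOPS_PER_ALU_PER_CLOCK)

# Rumoured 480-ALU RV770 (speculative figure from this thread).
print(f"{clock_for_target(480, 1000):.0f} MHz for 1 TFLOP")
print(f"{clock_for_target(480, 800):.0f} MHz for 800 GFLOPS")
```

That works out to roughly 1042 MHz for 1 TFLOP and about 833 MHz for 800 GFLOPS.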

trinibwoy · Mar 31, 2008

Jawed said:
The same way that 3 quads per SIMD worked with 4 quad TMUs in R580 (1 quad TMU per SIMD).

Jawed

Maybe I'm missing something, but isn't the whole point of the R600 setup (where one quad TMU serves the same quad in each SIMD) to better balance texturing work across the chip? If each TMU is instead tied to a SIMD, then that TMU goes unused if that SIMD isn't running code requiring texturing. In R6xx, if at least one of the threads running in the four SIMDs needs texturing work, the TMUs get utilized.

Jawed · Apr 1, 2008

trinibwoy said:
Maybe I'm missing something, but isn't the whole point of the R600 setup (where one quad TMU serves the same quad in each SIMD) to better balance texturing work across the chip? If each TMU is instead tied to a SIMD, then that TMU goes unused if that SIMD isn't running code requiring texturing.

I think the balancing of texturing workload comes from:

the high-level thread allocation policy of the GPU spreads all workloads equally - screen-space tiling of pixel shader workload is the best example of this
L2 cache associativity - supports the coherency of multiple concurrent vertex/geometry/pixel threads so that texels aren't evicted too early

The sheer count of threads in each shader unit then keeps the TUs busy. Don't forget that texturing is a "look ahead" process in R6xx (just like R5xx) - texture results can be delivered dozens of clock cycles ahead of when they're actually required.

Looking at the way code is assembled on R6xx it seems that up to 8 texture fetches are performed in a single clause. (This comes at considerable register cost...)

In R6xx, if at least one of the threads running in the four SIMDs needs texturing work, the TMUs get utilized.

I think it's reasonable to view R600 as having a single 16-wide TU which is shared across all four ALU SIMDs (that are 16-wide). We know L2 is centralised in R600 so it makes sense that the TUs are organised as a single SIMD processor. Each texturing clause then runs on the TU over 4 clocks, delivering 64 texturing results back to the originating batch.
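The single-TU view reduces to a simple ratio: batch size divided by TU width gives the TU clocks per texture instruction. Here is a minimal sketch of that model. It assumes, as the post does, 4-pixel quads, a 4-clock ALU issue per batch, and one 16-wide TU shared by all SIMDs. It is a guess at the organisation, not documented R600 behaviour.

```python
QUAD = 4        # pixels per quad
ALU_CLOCKS = 4  # a batch issues over 4 ALU clocks (as assumed in the post)

def batch_quads(simd_width_pixels):
    """Quads in one batch: SIMD width times the 4-clock issue, divided into quads."""
    return simd_width_pixels * ALU_CLOCKS // QUAD

def tu_clocks_per_instruction(simd_width_pixels, tu_width_pixels):
    """Clocks a single shared TU needs to service one texture instruction for a whole batch."""
    return batch_quads(simd_width_pixels) * QUAD // tu_width_pixels

# R600 as modelled in the post: 16-wide ALU SIMDs sharing one 16-wide TU.
print(batch_quads(16) * QUAD, "texture results per batch")          # 64
print(tu_clocks_per_instruction(16, 16), "TU clocks per instruction")  # 4
```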

So assume that RV770, with its 24 ALU quads, has a 32-wide TU, with quads A-H.

This is where I've revised my thinking, working in terms of batch size, not in terms of ALU SIMD width.

In the 12-SIMD RV770 each batch is 32-wide (2 quads * 4 clocks), or has 8 quads:

TU A - batch 1
TU B - batch 2
TU C - batch 3
...
TU H - batch 8

So each of the 12 SIMDs takes it in turn to "control" the TU, for what is effectively 1 TU clock per instruction in the TU clause.
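The TU A-H listing above is a round-robin assignment of batch quads onto TU quads. Here is a minimal sketch of that assignment for the hypothetical 12-SIMD layout. It assumes a 32-wide TU (quads A-H) and 1-based batch quad numbering. The helper `tu_schedule` is purely illustrative.

```python
import string

TU_QUADS = 8  # hypothetical 32-wide TU, quads A-H

def tu_schedule(batch_quads, tu_quads=TU_QUADS):
    """Assign 1-based batch quads to TU quads round-robin; return the mapping and TU clocks."""
    schedule = {}
    for q in range(1, batch_quads + 1):
        lane = string.ascii_uppercase[(q - 1) % tu_quads]
        schedule.setdefault(lane, []).append(q)
    clocks = -(-batch_quads // tu_quads)  # ceiling division: passes over the TU
    return schedule, clocks

# 12-SIMD RV770: a 2-quad-wide SIMD over 4 clocks gives 8 quads per batch.
sched, clocks = tu_schedule(8)
print(sched["A"], sched["H"])  # [1] [8]
print(clocks)                  # 1
```

Each TU quad receives exactly one batch quad, so a texture instruction occupies the TU for a single clock.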

In the 4-SIMD RV770, each batch is 96-wide (6 quads * 4 clocks), or 24 quads:

TU A - batch 1, 9, 17
TU B - batch 2, 10, 18
...
TU H - batch 8, 16, 24

And so each of the 4 SIMDs takes it in turn to control the TU, with each batch's texture clause running for 3 TU clocks per instruction.

Note that the mapping from TU to ALUs is not 1:1. The mapping is from a physical quad in the TU to logical quads in the batch. In the latter configuration, batch quads 1, 9 and 17 belong to SIMD quads 1, 3, 5, while batch quads 8, 16 and 24 belong to SIMD quads 2, 4, 6.
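Two mappings are in play for the 4-SIMD layout: batch quad to TU quad, and batch quad to physical SIMD quad. Here is a minimal sketch that reproduces both. It assumes a 6-quad SIMD, a 32-wide TU, and one further assumption: batch quads fill the SIMD one clock at a time, with quads 1-6 on clock 0, 7-12 on clock 1, and so on. That ordering is an inference chosen because it matches the quads listed above. The helper names are hypothetical.

```python
import string

TU_QUADS = 8    # hypothetical 32-wide TU, quads A-H
SIMD_QUADS = 6  # hypothetical 24-wide ALU SIMD in the 4-SIMD RV770
ALU_CLOCKS = 4  # a batch issues over 4 ALU clocks

def tu_lane(q):
    """TU quad (A-H) that services 1-based batch quad q, round-robin."""
    return string.ascii_uppercase[(q - 1) % TU_QUADS]

def simd_quad(q):
    """Physical SIMD quad (1-based) owning batch quad q, assuming clock-by-clock filling."""
    return (q - 1) % SIMD_QUADS + 1

batch = SIMD_QUADS * ALU_CLOCKS  # 24 quads per batch
for lane in ("A", "H"):
    qs = [q for q in range(1, batch + 1) if tu_lane(q) == lane]
    print(lane, qs, [simd_quad(q) for q in qs])
# A [1, 9, 17] [1, 3, 5]
# H [8, 16, 24] [2, 4, 6]
print(batch // TU_QUADS, "TU clocks per instruction")  # 3
```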

This latter organisation isn't what I proposed earlier.

I've revised because I think the key is that there's a single TU, and I've found a way of thinking about a batch that enables "filling" a single TU processor.

I'm averse to the 12-SIMD version simply because of the large amount of control overhead... Also, I wonder if it's compatible with the concept of a single TU. Note that in this configuration each clause only runs for 1 clock in the TU pipeline. Is it reasonable to presume the TU can execute a different instruction on each successive clock or does it need to do so for several clocks?

This is similar to the way the ALU pipeline runs an instruction for 4 clocks. In R600 the TU runs an instruction for 4 clocks (still guessing). In the 4-SIMD RV770 each instruction would run for 3 clocks.

Hmm...

---

My earlier suggestion for a 4 SIMD RV770 would feature four 8-wide TUs. Each TU would be under the control of a single ALU SIMD. Each TU clause would run for 12 clocks per instruction (24 quads in the batch divided by 2 quads in the TU)... Seems pretty unlikely.
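The 12-clock figure follows from dividing the 24-quad batch by a TU only 2 quads wide. Here is a minimal sketch of that comparison. It assumes 4-pixel quads and the 24-quad batch of the 4-SIMD layout. `tu_clocks` is a hypothetical helper.

```python
QUAD = 4  # pixels per quad

def tu_clocks(batch_quads, tu_width_pixels):
    """Clocks for a TU of the given width to service one texture instruction for a batch."""
    tu_quads = tu_width_pixels // QUAD
    return -(-batch_quads // tu_quads)  # ceiling division

# Earlier 4-SIMD proposal: each SIMD owns a private 8-wide (2-quad) TU.
print(tu_clocks(24, 8))   # 12
# Versus a single shared 32-wide TU serving the same 24-quad batch.
print(tu_clocks(24, 32))  # 3
```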

---

So, after all that, the 1-clock per TU clause instruction makes me think it's unlikely that RV770 is a 12-SIMD design. But that presumes all this stuff about there being a single SIMD for the TUs is correct. I'm left reckoning the 4-SIMD design is most likely (though 3-clocks per TU clause instruction makes me a bit wary, would be nicer if it was 4).

Jawed

Berek · Apr 1, 2008

Everyone here appears to be stating the GT200 will be available in July and the RV770 available in June, but this news listing from Tech Fuzz suggests later than that, well into Q3 or even Q4 before we see them:

http://www.techfuzz.com/roadmaps/2008.aspx

I'm also confused about the difference between the RV770 and the R700, as this thread is about the R7xx, not the RV770? It appears the R7xx will be well into Q4 if not Q1 of '09.

Rangers · Apr 1, 2008

Berek-Halfhand said:
Everyone here appears to be stating the GT200 will be available in July and the RV770 available in June, but this news listing from Tech Fuzz suggests later than that, well into Q3 or even Q4 before we see them:

http://www.techfuzz.com/roadmaps/2008.aspx

I'm also confused about the difference between the RV770 and the R700, as this thread is about the R7xx, not the RV770? It appears the R7xx will be well into Q4 if not Q1 of '09.

Well, that site is all wrong. It also says this:

April

* nVidia GeForce GeForce 9800 GTX (Code name: D9E = Desktop 9 Enthusiast, aka G100 or GT200) is expected to be launched the first week of April. This will be nVidia's 9th-Gen enthusiast GPU and will phase-out the D8E series. The 9800 GTX will be manufactured using TSMC's 65 nm process, contain over 1 billion transistors, and support DirectX 10.1 and Shader Model 4.1. The 9800 GTX will contain 128 processor cores running at 1688 MHz, a core clock at 675 MHz, and 512MB DDR3 running at 1100 MHz over a 512-bit memory interface. Video card makers will likely launch additional 1GB and higher clocked versions of the card at a later date. The 9800 GTX will compete against AMD R700-based video cards. It will have two SLI bridges and support Tri-SLI. The card requires two 6-pin PCIe power connectors and has a dual-slot cooler. TDP is expected to be around 250W.

The 9800 GTX is just a G92-based card. Nothing next-gen like they're claiming.

According to what I read on Fudzilla, R700 is the name for the whole line of chips. RV7xx will be the actual specific chips within that line. RV770 is the alleged flagship. So if they are correct, there won't be an uber-powerful true R700 model down the line; RV770 is it.

nicolasb · Apr 1, 2008

Berek-Halfhand said:
I'm also confused about the difference between the RV770 and the R700, as this thread is about the R7xx, not the RV770? It appears the R7xx will be well into Q4 if not Q1 of '09.

Well, think about the difference between RV670 and R680.

Berek · Apr 2, 2008

Thank you both, that clarifies it for me. All the numbers and brandings are starting to make my head spin... I'm sure you understand.

Silent_Buddha · Apr 2, 2008

More and more it appears that, in a move to get back to profitability, AMD/ATI has for the most part abandoned ultra-high-end enthusiast chips and is instead focusing on the meat of the market: the budget, mainstream, and performance-mainstream segments.

R600 was ATI's last attempt at the ultra-high-performance crowd, with the R700 series appearing to follow in the footsteps of RV670 with a focus on the mainstream.

Their nod/efforts in terms of the ultra high end appear to be focused on Crossfire and its direct offspring, the X2 cards.

Oddly enough that renewed focus seems to have put ATI ahead of Nvidia when it comes to multi-GPU rendering.

Scaling is roughly equivalent for the two with ATI appearing to have a slight edge in 3-way scaling.

Flexibility has no competition, with ATI able to mix and match any 385x and 387x derivative card with any other for 2-, 3-, or 4-way CF.

Functionality is also currently higher, with multi-monitor usable alongside multi-GPU.

Nvidia does still have the advantage in user-definable SLI profiles. But again, that's an enthusiast-class feature, and not in the markets that ATI appears to currently be focused on.

Looking at things superficially, it appears that while ATI and Nvidia are competing in the graphics card business, they are each targeting different (yet overlapping) audiences.

AMD is focused on price/perf, functionality and flexibility, while Nvidia seems focused on performance and price/perf (the latter only due to pressure from ATI).

I'm wondering when we'll see indications that Nvidia is moving to be more competitive with SLI. After all, it appears that they are still locking certain cards to certain SLI configurations. Also, it doesn't appear you can mix and match, say, a GX2 with a GTX or GT. And still not even a rumor of multi-monitor on SLI?

Regards,
SB

Moloch · Apr 2, 2008

Actually, they never made an ultra-high-end card with R600; it targeted the GTS, not the GTX. R5xx was their last attempt at an ultra-high-end card.

OpenGL guy · Apr 2, 2008

radeonic2 said:
Actually, they never made an ultra-high-end card with R600; it targeted the GTS, not the GTX. The R520 was their last attempt at an ultra-high-end card.

R580?

Moloch · Apr 2, 2008

OpenGL guy said:
R580?

I know.

Both are the same generation, so yeah.

Kaotik · Apr 2, 2008

I'm quite positive ATI/AMD planned R600 to be "ultra high end" at first; they just couldn't deliver that in the end, but it doesn't mean they didn't try.

trinibwoy · Apr 2, 2008

Silent_Buddha said:
I'm wondering when we'll see indications that Nvidia is moving to be more competitive with SLI. After all, it appears that they are still locking certain cards to certain SLI configurations. Also, it doesn't appear you can mix and match, say, a GX2 with a GTX or GT. And still not even a rumor of multi-monitor on SLI?

Define competitive. I don't think Nvidia considers themselves behind, considering they essentially have a monopoly on the multi-GPU market. The advantages of CrossFireX right now are mostly academic in terms of mixing and matching various SKUs. Those aren't practical combinations that people are actually likely to use.

The big advantage IMO is with respect to multi-display support, but again you're talking about a very small (yet arguably vocal) number of consumers who have both a multi-GPU and multi-display setup. Another issue that Nvidia should be very aware of is the increasing attractiveness of AMD- and Intel-based motherboards... that could be the most dangerous threat to their multi-GPU throne.

In the end though, no amount of flexibility is going to trump perf/$. What three or four GPU setup can AMD offer right now that cannot be beaten by an Nvidia solution for the same money?

AlexV · Apr 2, 2008

trinibwoy said:
Define competitive. I don't think Nvidia considers themselves behind, considering they essentially have a monopoly on the multi-GPU market. The advantages of CrossFireX right now are mostly academic in terms of mixing and matching various SKUs. Those aren't practical combinations that people are actually likely to use.

The big advantage IMO is with respect to multi-display support, but again you're talking about a very small (yet arguably vocal) number of consumers who have both a multi-GPU and multi-display setup. Another issue that Nvidia should be very aware of is the increasing attractiveness of AMD- and Intel-based motherboards... that could be the most dangerous threat to their multi-GPU throne.

In the end though, no amount of flexibility is going to trump perf/$. What three or four GPU setup can AMD offer right now that cannot be beaten by an Nvidia solution for the same money?

"For the same money" muddies the waters. With the way prices are going, you're probably going to be able to get two 3870 X2s for the price (or near that) of two 9800 GTXs, which is a comparison that nV might or might not win. Once you go higher up the food chain, though, they have no serious competition for the moment (IMHO).

Silent_Buddha · Apr 3, 2008

trinibwoy said:
Define competitive. I don't think Nvidia considers themselves behind, considering they essentially have a monopoly on the multi-GPU market. The advantages of CrossFireX right now are mostly academic in terms of mixing and matching various SKUs. Those aren't practical combinations that people are actually likely to use.

The big advantage IMO is with respect to multi-display support, but again you're talking about a very small (yet arguably vocal) number of consumers who have both a multi-GPU and multi-display setup. Another issue that Nvidia should be very aware of is the increasing attractiveness of AMD- and Intel-based motherboards... that could be the most dangerous threat to their multi-GPU throne.

In the end though, no amount of flexibility is going to trump perf/$. What three or four GPU setup can AMD offer right now that cannot be beaten by an Nvidia solution for the same money?

Meaning that, say, someone already has a 3870 and has been considering CrossFire. Now they have the option to add a 3870 X2 for triple CF.

Or if someone has a 3870 X2, they can just get a 3870 or 3850 for triple CF, since quad CF has virtually no gains over triple.

Say someone with a 9800 GX2 wants more performance. They have no choice but to go quad SLI with another GX2, even though quad SLI (just like quad CF) has virtually no performance benefit compared to tri-SLI. There's no other option at all currently.

Likewise, if you have 8800 GTs, tri-SLI is not an option for you; you have to buy new cards. With 3850s you can still do tri-CF without having to reinvest in three new cards.

Sure, Nvidia's SLI is better for Nvidia, but ATI's CF is certainly much more beneficial for consumers.

Likewise, multi-monitor may not be a big deal if all you care about is performance. But I'd gladly sacrifice performance to avoid the headache of constantly enabling and disabling SLI every time I want to go back to normal desktop use after gaming, especially with the added headache of then having to set up your desktop all over again.

And as stated before, multi-card scaling is generally EQUAL between the two (so the performance delivered by the multi-GPU technology itself is basically equal).

I.e., SLI tech currently has no performance advantage over CF tech.

So if you ONLY look at the SLI and CF implementations, minus the cards, SLI is WAY behind CF.

It's basically equal in multi-GPU scaling, yet behind in flexibility and behind in added-value usefulness (multi-mon).

The one place it STILL shines is in user-definable SLI profiles for games that aren't yet driver-optimized for SLI. Yet that is an enthusiast feature that isn't geared to the mainstream, an area ATI currently appears to be targeting full force.

I really can't see how anyone could seriously think SLI's implementation isn't falling behind at this moment. Sure the performance of the cards used is greater than the competition, but the implementation is still far behind.

Currently ATI is the only one that appears to be innovating and refining multi-GPU. I'm sure Nvidia MUST be doing something, but they haven't shown anything yet, other than Hybrid SLI, which won't work without very specific hardware. So it's about equal to ATI's CrossFireX power saving for low-end motherboards, in that it's extremely limited and not available to the vast majority of gamers.

I'm not arguing that Nvidia doesn't have the fastest solutions, thanks to having the fastest cards. However, compared side by side with CF, it's certainly lacking in a lot of areas.

Which I think is just a side benefit of ATI focusing on the mainstream rather than focusing on the smaller enthusiast market.

Regards,
SB
