Tom Forsyth gave a presentation at SIGGRAPH 2008.
Found it; I keep forgetting about that one.
Wouldn't allocating triangles to a bin require the rasterization portion of the workload as well?
Forsyth's slides apparently included this in the front-end estimate.
It's tile-level rasterisation only, so it's very cheap in terms of rasterisation, but it requires that all vertex shading affecting the position attribute has been computed.
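As a rough sketch of what that tile-level binning step amounts to (the names, tile size and conservative bounding-box test are my own assumptions, not anything from Forsyth's slides): transform positions only, take each triangle's screen-space bounding box, and record the triangle in every tile-sized bin it might overlap.

    // Hypothetical sketch of tile-level binning: positions only, no attribute
    // shading, no per-pixel rasterisation.  Assumes 64x64-pixel tiles;
    // off-screen culling and an exact triangle/tile overlap test are omitted.
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    constexpr int kTileSize = 64;                   // assumed tile dimension in pixels

    struct ScreenPos { float x, y; };               // post-transform screen position
    struct Bin { std::vector<uint32_t> triIds; };   // "flimsy" bin: triangle IDs only

    void BinTriangle(uint32_t triId, const ScreenPos v[3],
                     std::vector<Bin>& bins, int tilesX, int tilesY)
    {
        // Screen-space bounding box of the triangle.
        float minX = std::min({v[0].x, v[1].x, v[2].x});
        float maxX = std::max({v[0].x, v[1].x, v[2].x});
        float minY = std::min({v[0].y, v[1].y, v[2].y});
        float maxY = std::max({v[0].y, v[1].y, v[2].y});

        // Clamp to the tile grid and note the triangle in every bin it may touch.
        int tx0 = std::max(0, int(minX) / kTileSize);
        int tx1 = std::min(tilesX - 1, int(maxX) / kTileSize);
        int ty0 = std::max(0, int(minY) / kTileSize);
        int ty1 = std::min(tilesY - 1, int(maxY) / kTileSize);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].triIds.push_back(triId);
    }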
Slide 26 says front-end is ~10% of the entire compute effort.
The actual cost I see is the creation of a bin and then having any core pick up a bin for processing. Both would be more expensive than the tile-level rasterisation itself.
Forsyth's slides also indicated that a bin contains tris, shaded verts, and rasterized fragments.
I'm not sure whether the fragments would be a concern for the distribution phase, which might be passing data over the interconnect.
The amount of data in a bin varies, you've described a heavy-weight bin. A flimsy bin with nothing more than triangle IDs would be cheap in a multi-chip solution. This would make the back-end more compute-heavy, which would hide some of the latency associated with NUMA. This trade-off between light/heavy is seen in current games where developers elect either to compute all attributes during vertex shading or leave some of them for computation during pixel shading (these are attributes derived from other attributes, normally) - you can view this as a form of compression of the per-vertex data.
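To make the light/heavy distinction concrete, here's an illustrative sketch of the two extremes (field names and sizes are my own guesses, not Larrabee's actual bin layout):

    // Illustrative only: two extremes of bin payload.  Sizes are ballpark guesses.
    #include <cstdint>
    #include <vector>

    // "Flimsy" bin: just references into data that stays put on the owning chip.
    // Cheap to ship across a chip link; the back end re-fetches/re-derives the rest.
    struct FlimsyBin {
        std::vector<uint32_t> triIds;                          // ~4 bytes per triangle
    };

    // "Heavy-weight" bin: everything the back end needs, precomputed up front.
    // Minimal back-end work, but far more data to move if the bin changes chips.
    struct ShadedVertex { float pos[4]; float attribs[12]; };  // ~64 bytes each
    struct Fragment     { uint16_t x, y; float depth; };       // ~8 bytes each

    struct HeavyBin {
        std::vector<uint32_t>     triIds;
        std::vector<ShadedVertex> verts;       // fully shaded attributes
        std::vector<Fragment>     fragments;   // already-rasterised coverage
    };

Anything between those two extremes is possible, which is exactly the attribute-compression trade-off described above.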
Since the memory subsystem should maintain a coherent image of memory across the chips, there is no algorithmic reason why it would be single-chip.
The costs of this work have been evaluated as being sufficiently low only for a single-chip scenario, however.
Consumption of vertex data is basically a streaming problem, i.e. quite latency tolerant if you have some decent buffers. Due to the connectivity of triangles, strips, etc., vertex data never fits neatly into cache lines, so the best approach is just to read big-ish chunks rather than individual vertices/triangles (sketched below). So two chips (conventional or Larrabee) consuming from a common stream are going to be slightly more wasteful in this regard - similar to the wastage that occurs with different vertex orderings in the post-transform vertex cache (PTVC).
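A minimal sketch of that chunked-read idea, with an arbitrary chunk size and no bounds checking, purely to illustrate the access pattern:

    // Sketch: stream vertex data in large aligned chunks rather than per-vertex.
    // kChunkBytes is an arbitrary assumption; the point is that one fill spans
    // many cache lines, so neighbouring triangles usually hit data already fetched.
    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    constexpr size_t kChunkBytes = 16 * 1024;   // must be a power of two here

    struct VertexChunk {
        uint8_t data[kChunkBytes];
        size_t  baseOffset = SIZE_MAX;          // offset of data[0] in the vertex buffer
    };

    // Returns a pointer to the vertex at byteOffset, refilling the chunk if needed.
    // (Bounds checks against the end of the vertex buffer are omitted.)
    const uint8_t* FetchVertex(const uint8_t* vertexBuffer, size_t byteOffset,
                               VertexChunk& chunk)
    {
        if (byteOffset < chunk.baseOffset ||
            byteOffset >= chunk.baseOffset + kChunkBytes) {
            chunk.baseOffset = byteOffset & ~(kChunkBytes - 1);   // align down
            std::memcpy(chunk.data, vertexBuffer + chunk.baseOffset, kChunkBytes);
        }
        return chunk.data + (byteOffset - chunk.baseOffset);
    }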
But Larrabee can run multiple render states in parallel. So most trivially you can have the two chips working independently. Whether two successive render states are working with the same vertex inputs (e.g. shadow buffer passes, one per light?) or whether they're independent vertex inputs, the wastage is down purely to NUMA effects.
The flexibility of the software pipeline is the reason Forsyth's estimate for front-end work spans such a wide range.
It's 10% if attribute, vertex, and tessellation work are deferred to the back end. It's variable because those three can be done in either the front or the back end.
Bins would be at their smallest, and so most amenable to sending to another chip, if this work is deferred, but back-end burden and bin spread would be worse.
If done in the front-end, bins become much larger and more costly to send to a remote pool of cores, though the bins themselves would be much better behaved.
Bin spread should fall if flimsy bins are used, since the tiles can then be larger, and a triangle lands in fewer large tiles than small ones.
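Back-of-the-envelope illustration (tile sizes and the 40x40-pixel bounding box are made-up numbers): the worst-case number of bins a triangle can land in shrinks quickly as tiles grow.

    // Toy worst-case bin spread: how many TxT tiles a WxW bounding box can overlap.
    #include <cstdio>

    int MaxSpan(int extent, int tile) { return (extent - 1) / tile + 2; }

    int main() {
        const int tiles[] = {32, 64, 128};
        for (int tile : tiles)
            std::printf("40x40 box, %3dpx tiles: up to %d bins\n",
                        tile, MaxSpan(40, tile) * MaxSpan(40, tile));
        // Prints 9, 4 and 4 bins respectively; the typical case for the larger
        // tiles is a single bin.
    }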
Back-end burden would be perfectly spread across both chips. Sure, two chips won't achieve 100% scaling - we aren't expecting that. Even Intel's estimates/simulations for scaling with core count on a single chip aren't linear...
If the front-end is duplicated, we roughly double the computation required for PrimSet dispersal and front-end work, but with minimal increase in synchronization or bandwidth burden on the interface. The developer would be much freer to decide where to put work between the front and back ends.
I don't understand how you get double.
The PrimSet distribution by one core is actually well-suited to the likely ring-bus configuration Larrabee will use.
I don't understand what you mean by PrimSet distribution. Each PrimSet can run independently on any core. The data each produces is a stream of bins. They consume vertex streams and, if they already exist, render target tiles.
Some scheduler, somewhere, must then assign bin sets to cores. This is not a heavy task. The back-end has to consume the bins and create/update the render target tile. The scheduler isn't delivering bin data to the cores tasked with back-end work.
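A sketch of how light that scheduling step can be (my own illustration, not Larrabee's actual task system): the "scheduler" is just a shared counter handing out tile indices, and each back-end core pulls the bin contents for itself.

    // Trivial bin-set scheduler: workers grab tile indices from a shared counter
    // and fetch/process the bins themselves, so the scheduler never moves bin data.
    #include <atomic>
    #include <cstdint>
    #include <thread>
    #include <vector>

    struct Bin { std::vector<uint32_t> triIds; };

    void ProcessBin(int tileIndex, const Bin& bin) {
        // Back-end work: rasterise triIds into the render-target tile, shade, blend...
        (void)tileIndex; (void)bin;
    }

    void RunBackEnd(const std::vector<Bin>& bins, int numWorkers) {
        std::atomic<int> nextTile{0};                   // the entire "scheduler"
        std::vector<std::thread> workers;
        for (int w = 0; w < numWorkers; ++w) {
            workers.emplace_back([&] {
                for (;;) {
                    int tile = nextTile.fetch_add(1);   // claim the next tile
                    if (tile >= int(bins.size())) break;
                    ProcessBin(tile, bins[tile]);       // core pulls bin data itself
                }
            });
        }
        for (auto& t : workers) t.join();
    }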
It's also the case that once a bin is set up and ready for back-end processing, a scheme that is not aware of multi-chip NUMA is going to generate much more traffic over the interconnect - traffic that would not arise if the setup scheme has duplicate front-ends that specifically minimize inter-chip rendering traffic.
If these are flimsy bins then I don't see the issue. If these are bins with in-progress render target tiles, then that's a bit more costly. Clearly heavy-weight bins are going to be the most costly. There's zero reason to build a multi-chip non-NUMA-aware software pipeline - Intel clearly intends not to build a one-size-fits-all software pipeline. Though I'll happily agree that multi-chip is low priority until single-chip is working really well, apart from anything else because it's harder.
Tessellation is about creating more triangles. At some level, amplifying the triangle count and then turning those extra triangles into a bandwidth and latency cost is a liability that any scheme which apportions work heedless of chip location will take on.
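Rough arithmetic with entirely invented numbers, just to show how amplification scales the link traffic when bins are assigned without regard to chip locality:

    // Made-up numbers: tessellation amplification multiplies the per-frame
    // triangle records that a chip-location-blind scheme pushes over the link.
    #include <cstdio>

    int main() {
        const double inputTris      = 1.0e6;   // pre-tessellation triangles (assumed)
        const double amplification  = 8.0;     // tessellation factor (assumed)
        const double bytesPerTri    = 16.0;    // flimsy-bin record per triangle (assumed)
        const double remoteFraction = 0.5;     // share of bins landing on the other chip

        double linkBytes = inputTris * amplification * bytesPerTri * remoteFraction;
        std::printf("~%.0f MB of triangle records per frame over the chip link\n",
                    linkBytes / (1024.0 * 1024.0));   // ~61 MB with these guesses
    }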
It would be functional so long as Intel keeps inter-chip coherence, but Larrabee's bandwidth savings would be undermined if the chip link is saturated, even if absolute GB/s consumption is lower.
I guess in theory Intel could massively overspecify the inter-chip connections, but that sounds expensive.
Overall, though, I would expect that a given memory-bandwidth:link-bandwidth ratio would serve Larrabee better than it would a traditional GPU. You have a huge amount of programmer freedom with Larrabee to account for the vicissitudes of NUMA.
Jawed