The R3xx and the R4xx have a rather interesting way of tiling things. Our setup unit sorts primitives into tiles, based on their area coverage. Some primitives fall into one tile, some into a few, some cover lots. Each of our backend pixel pipes is given tiles of work. The tiles themselves are programmable in size (well, powers of two), but, surprisingly, we haven't found that changing their size changes performance that much (within reason). That's most likely because, with high-res displays, most primitives are large. There is a performance sweet spot at 16, and that hasn't changed in some time. Even the current X800 uses 16, though I think we need to revisit that at some point in the future. Different tile sizes might benefit things on a per-application basis; it's on our long list of things to investigate.
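To make the binning concrete, here is a rough sketch of how a primitive's screen-space bounding box gets carved into 16x16 tiles and how each tile could be handed to a pipe. The four-pipe count and the checkerboard tile-to-pipe mapping are assumptions for illustration, not the actual R3xx/R4xx scheme.

```python
# Rough sketch: bin a primitive's bounding box into 16x16 tiles and map each
# tile to a pixel pipe. The 4-pipe count and the 2x2 checkerboard mapping are
# illustrative assumptions, not the actual hardware scheme.

TILE_SIZE = 16   # pixels per tile edge (a power of two, per the post)
NUM_PIPES = 4    # hypothetical number of backend pixel pipes

def tiles_covered(bbox):
    """Yield (tx, ty) tile coordinates overlapped by a bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    for ty in range(y0 // TILE_SIZE, y1 // TILE_SIZE + 1):
        for tx in range(x0 // TILE_SIZE, x1 // TILE_SIZE + 1):
            yield tx, ty

def owning_pipe(tx, ty):
    """Assumed 2x2 checkerboard mapping of tiles to the 4 pipes."""
    return ((ty & 1) << 1) | (tx & 1)

# A tiny triangle lands on one pipe; a big one fans out across all of them.
for bbox in [(5, 5, 12, 12), (0, 0, 300, 200)]:
    owners = sorted({owning_pipe(tx, ty) for tx, ty in tiles_covered(bbox)})
    print(bbox, "-> pipes", owners)
```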
Anyway, each pipe has huge load-balancing FIFOs on its inputs, matched to the tiles it owns. Each pipe is fully MIMD and can operate on different polygons; in fact, one pipe can be hundreds of polygons away from another. The downside of that is reduced memory coherence between the different pipes. Increasing the tile size would improve coherence, but would also require larger load-balancing FIFOs. Our current setup seems reasonably optimal, but reviewing that, performance-wise, is on the list of things to do at some point. We've artificially lowered the size of our load-balancing FIFOs and never noticed a performance difference, so we feel that, for current apps at least, we are well over-designed.
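Below is a toy model of that effect: once work has been binned into per-pipe FIFOs, each pipe drains its own queue independently, so after the same number of cycles the pipes end up working on primitives far apart from one another. The pipe count, per-tile costs, and cycle budget are made-up numbers, and FIFO depth/backpressure isn't modelled.

```python
from collections import deque
import random

# Toy model of per-pipe load-balancing FIFOs: work is binned to pipes up
# front, and each pipe drains its own FIFO independently (MIMD), so the
# primitive each pipe is working on drifts away from the others.
# All numbers here are illustrative, not hardware figures.

NUM_PIPES = 4
random.seed(1)
fifos = [deque() for _ in range(NUM_PIPES)]

# Each entry is (primitive_id, shading_cost_in_cycles) for one tile of work.
for prim in range(2000):
    pipe = random.randrange(NUM_PIPES)       # stand-in for the tile->pipe map
    fifos[pipe].append((prim, random.randint(1, 64)))

# Run every pipe for the same cycle budget and see which primitive it is on.
current = []
for fifo in fifos:
    budget, prim = 4000, 0
    while fifo and budget > 0:
        prim, cost = fifo.popleft()
        budget -= cost
    current.append(prim)

print("primitive each pipe is on:", current)
print("spread between pipes:", max(current) - min(current), "primitives")
```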
In general, we have no issues keeping all our units busy with current polygon loads. I could imagine that if you drew single-pixel triangles into one tile over and over, performance could drop due to the tiling, but memory efficiency would shoot up, so it's unclear that overall performance would actually be hurt. The distribution of load across all these tiles is pretty much ideal for all the cases we've tested. Super tiling is built on top of this, to distribute work across multiple chips.
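As a hedged sketch, super tiling can be pictured as a second, coarser level of the same idea: large super tiles alternate between chips, and each chip's pipes then consume the 16x16 tiles inside them. The two-chip setup, the 64-pixel super tile, and both checkerboard mappings are illustrative assumptions only.

```python
# Sketch of super tiling layered on top of per-pipe tiling. The screen is
# first carved into large "super tiles" that alternate between chips; each
# chip then splits its super tiles into the 16x16 tiles its pipes consume.
# Chip count, super tile size, and the mappings are assumptions.

TILE_SIZE = 16
SUPER_TILE_SIZE = 64   # hypothetical; a multiple of TILE_SIZE
NUM_CHIPS = 2

def owning_chip(x, y):
    """Which chip renders pixel (x, y), by super tile checkerboard."""
    sx, sy = x // SUPER_TILE_SIZE, y // SUPER_TILE_SIZE
    return (sx + sy) % NUM_CHIPS

def owning_pipe(x, y):
    """Which pipe within that chip owns the pixel's 16x16 tile."""
    tx, ty = x // TILE_SIZE, y // TILE_SIZE
    return ((ty & 1) << 1) | (tx & 1)

for x, y in [(10, 10), (70, 10), (200, 130)]:
    print((x, y), "-> chip", owning_chip(x, y), "pipe", owning_pipe(x, y))
```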
As well, just like other vendors, we have advanced sequencers that distribute ALU workloads to our units, allocate registers, and sequence all the operations that need to be done, in a dynamic way. That's really a basic requirement of doing shader processing. It is rarely the issue for performance.
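One way to picture part of what that dynamic register allocation has to balance (purely as a generic illustration, with hypothetical numbers rather than R3xx/R4xx figures): the more registers a shader needs, the fewer pixels can be kept in flight to hide texture latency.

```python
# Generic illustration of the register-allocation trade-off: a fixed register
# file shared by in-flight pixels means heavier shaders leave fewer pixels
# available to cover texture fetch latency. The budget below is hypothetical.

REGISTER_FILE_ENTRIES = 256    # hypothetical per-pipe register budget

def pixels_in_flight(regs_per_pixel):
    return REGISTER_FILE_ENTRIES // regs_per_pixel

for regs in (2, 4, 8, 16):
    print(f"{regs:2d} registers/pixel -> {pixels_in_flight(regs):3d} pixels in flight")
```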
Performance in modern apps is still very much texture-fetch bound (cache efficiency, memory efficiency, filter types), as well as partially ALU/register-allocation bound. There are huge performance differences possible depending on how you deal with texturing and texture fetches. Even Shadermark, if I recall correctly, ends up being texture bound in many of its cases, and it's very hard to draw any conclusions about ALU performance from it. I know we've spent plenty of time in our compiler generating various different forms of a shader, only to discover that ALU and register counts don't matter as much as texture organization. There are no clear generalizable solutions. Work goes on.