AMD: Volcanic Islands R1100/1200 (8***/9*** series) Speculation/Rumour Thread

Now that we are seeing the architecture in Hawaii (VI family), maybe we can hear more from 3DCGI. Remember his comment from way back in the XboxOne thread ;)

http://forum.beyond3d.com/showpost.php?p=1761127&postcount=4243

3DCGI said:
I worked on modifications to the core graphics blocks. So there were features implemented because of console customers. Some will live on in PC products, some won't. At least one change was very invasive, some moderately so. That's all I can say.
 
Can anyone identify any sign that Hawaii is based on newer IP than Bonaire?


Maybe a completely different architecture design? Bonaire is based on an improved revision of GCN 1.0... here you have GCN 2.0. Just look at where the geometry is and how most of the front end has been moved inside the shader engine.

Tahiti or Pitcairn, I don't have the diagram for Bonaire, but the changes there are minimal.



Maybe this diagram is wrong, but if it is right, the rasterizers and geometry processors have been moved inside the SM (shader engine; they can't keep the same name, lol).
 
Maybe a completely different architecture design? Bonaire is based on an improved revision of GCN 1.0... here you have GCN 2.0. Just look at where the geometry is and how most of the front end has been moved inside the shader engine.

Tahiti or Pitcairn, I don't have the diagram for Bonaire, but the changes there are minimal.

Bonaire has been described as coming from the same IP pool as Hawaii. This is why the 260X has TrueAudio, and why existing Bonaire cards will have that enabled with a driver update.
The same goes for Crossfire over PCIe, the new version of the ACEs (Bonaire has them, too), and it supports the new flat addressing mode that was mentioned for Hawaii.

GCN has a number of ways that the number of units can be varied, but it's the makeup of the units that determines how closely related they are.
 
Bonaire has been described as coming from the same IP pool as Hawaii. This is why the 260X has TrueAudio, and why existing Bonaire cards will have that enabled with a driver update.
The same goes for Crossfire over PCIe, the new version of the ACEs (Bonaire has them, too), and it supports the new flat addressing mode that was mentioned for Hawaii.

GCN has a number of ways that the number of units can be varied, but it's the makeup of the units that determines how closely related they are.

Look at the difference. I know Bonaire doesn't have this; what is more important?

This architecture is completely redesigned (of course, assuming this diagram is a real one from AMD and not a fake). Maybe Bonaire shares external parts with it, but internally they are really different.

I agree Bonaire is midway between the Tahiti architecture and this one, which is why we call it GCN 1.1... But moving all the pixel, geometry and raster units into the shader engine is not in GCN 1.1 (at least we haven't seen it yet). Maybe I am wrong (it wouldn't be the first time).

The colours of this diagram bother me; they're not the colours AMD generally uses. But well, new architecture, new team.
 
Look at the difference. I know Bonaire doesn't have this; what is more important?

This architecture is completely redesigned (of course, assuming this diagram is a real one from AMD and not a fake). Maybe Bonaire shares external parts with it, but internally they are really different.

Doesn't look like it to me; you could draw Tahiti and Hawaii to look almost the same, as has already been pointed out. I.e. forget where they put the blocks and look at what each block connects to.
 
Rearranging square blocks in a picture in a different manner does not necessarily imply a complete redesign. It may just mean AMD chose a different way to present a largely similar architecture.
 
just look at where the geometry is and how most of the front end has been moved inside the shader engine.
[...]
Maybe this diagram is wrong, but if it is right, the rasterizers and geometry processors have been moved inside the SM (shader engine; they can't keep the same name, lol).
In ATI/AMD GPU architecture rasteriser and ROPs go together, they are tightly linked. NVidia has these independent of each other.

Hawaii and Tahiti are no different in this respect. The ROPs are also on the shader side of the shader/memory crossbar, which is how the ROPs in Tahiti do not line-up with the memory channels.

Both diagrams also show that all geometry engines communicate with all rasterisers.

The fundamental structure of these things hasn't changed. There's just more of them.
 
At first the fillrate didn't make much sense to me, since there's only 5 bytes/pix of BW at 64 GP/s. Opaque pixels are usually shader bound while transparent ones need 8+ bytes/pix. Then I remembered that deferred rendering is pretty popular nowadays, where you need to pump out at least a few 4x8bit MRTs.
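To put numbers on that (a quick back-of-the-envelope check using the commonly quoted 290X figures, which are my assumptions here rather than anything official):

```python
# Bytes of DRAM bandwidth available per pixel written, assuming the commonly
# quoted 290X figures (64 ROPs at ~1 GHz, 512-bit bus at 5 Gbps). These
# numbers are assumptions for illustration, not official specs.
fillrate_gpix = 64 * 1.0              # 64 ROPs * 1.0 GHz = 64 GPix/s
bandwidth_gbs = 512 / 8 * 5.0         # 512-bit bus * 5 Gbps = 320 GB/s
print(bandwidth_gbs / fillrate_gpix)  # -> 5.0 bytes per pixel
# A 32-bit colour write alone is 4 bytes; a blended pixel needs a read as
# well (8+ bytes), which is the mismatch described above.
```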

I'm also curious if the 1 MB R/W cache can enable some TBDR-esque BW saving. Screen space object sorting (even as simple as left->right) could give you quasi-TBDR performance.
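The "even as simple as left->right" idea could be something like ordering draws by their projected bounding box so the ROP traffic stays within one screen region (and hopefully within the 1 MB cache) at a time. A toy sketch of that, where DrawCall and screen_min_x are made-up names for illustration:

```python
from dataclasses import dataclass

@dataclass
class DrawCall:
    name: str
    screen_min_x: float  # leftmost x of the object's projected bounding box (illustrative)

def sort_left_to_right(draws):
    """Submit draws roughly left-to-right so consecutive draws touch nearby
    pixels, keeping their ROP traffic resident in the R/W cache for longer."""
    return sorted(draws, key=lambda d: d.screen_min_x)

draws = [DrawCall("crate", 900.0), DrawCall("terrain", 0.0), DrawCall("npc", 450.0)]
print([d.name for d in sort_left_to_right(draws)])  # ['terrain', 'npc', 'crate']
```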
 
The fundamental structure of these things hasn't changed. There's just more of them.
Do you know this from some inside info? I'm hoping AMD changed things a bit. The way they've done tessellation in the past - tessellating a whole batch of patch control points at once - wasn't very smart.
 
In ATI/AMD GPU architecture rasteriser and ROPs go together, they are tightly linked. NVidia has these independent of each other.

What are the other distinguishing features? Both architectures seem pretty similar at this point.

-AMD has that programmable scalar unit but I'm not sure how much impact it has on graphics.
-AMD has the GDS
-AMD is no longer dependent on ILP
-AMD has a shorter shader pipeline?
-AMD has a larger workgroup size
-AMD has faster FP10 filtering
-AMD's geometry processing is less distributed

Not sure whether there's a significant difference between the architectures in rasterization, culling, transcendentals, memory access efficiency, atomics or any other nuances.
 
At first the fillrate didn't make much sense to me, since there's only 5 bytes/pix of BW at 64 GP/s. Opaque pixels are usually shader bound while transparent ones need 8+ bytes/pix.
Does 1MB of L2 have any effect there? Will we ever find out?

I dare say it seems a bit ironic, but we're likely to find out more about this architecture (in graphics) because of the consoles than we've learnt so far.

Then I remembered that deferred rendering is pretty popular nowadays, where you need to pump out at least a few 4x8bit MRTs.

I'm also curious if the 1 MB R/W cache can enable some TBDR-esque BW saving. Screen space object sorting (even as simple as left->right) could give you quasi-TBDR performance.
I'm tempted to say we'd have found out already, it's been 2 years and Hawaii is only 33% bigger in this respect.

On the other hand, again, once console developers start digging...
 
Do you know this from some inside info?
No, just those slides that Lanek included show no meaningful difference. I'm keen to point out that the high level stuff really doesn't seem to have changed.

I'm hoping AMD changed things a bit.
Four rasterisers with pixels locked to those rasterisers' ROPs certainly looks like the corner that NVidia studiously painted away from. It doesn't give an impression of robustness when presented with tricky workloads.

AMD likes to say that geometry processing scales downwards more gracefully in its architecture. This could be true, but there's no data from real games.

The way they've done tessellation in the past - tessellating a whole batch of patch control points at once - wasn't very smart.
Again, a lack of data. e.g. has the L2 in Tahiti been useful here?

A friend was playing Deus Ex:HR recently and I asked him if he had tessellation turned on. It took some burrowing through menus to find out that he did. I could argue that tessellation is sort of its own worst enemy: you don't notice it when it's turned on.

8MP monitors re-cast the thorny question of the number of pixels per triangle. Games that "under-did" tessellation because 2 or 4MP monitors only demanded, say, 16 pixel triangles, are going to look a bit coarse on 8MP monitors.
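To put rough numbers on that (my own arithmetic with made-up but plausible figures, not data from any game):

```python
# How a fixed tessellation budget scales with monitor resolution.
# All numbers are illustrative assumptions.
pixels_2mp, pixels_8mp = 2_000_000, 8_000_000
target_px_per_tri = 16                       # triangle size that looked fine on a 2 MP monitor

triangles = pixels_2mp / target_px_per_tri   # ~125,000 triangles on screen
px_per_tri_at_8mp = pixels_8mp / triangles   # same triangle count on an 8 MP monitor
print(triangles, px_per_tri_at_8mp)          # 125000.0 64.0
# The same mesh that gave 16-pixel triangles at 2 MP gives 64-pixel
# triangles at 8 MP, which is where the "a bit coarse" worry comes from.
```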

I'm not even sure how adaptive tessellation is, in the few games that do it.
 
What are the other distinguishing features? Both architectures seem pretty similar at this point.

-AMD has that programmable scalar unit but I'm not sure how much impact it has on graphics.
It's years since I last delved, but I was under the impression that NVidia has a similar scalar unit.
-AMD has the GDS
-AMD is no longer dependent on ILP
Whereas NVidia prefers some...
-AMD has a shorter shader pipeline?
Never understood why NVidia's is so long...
-AMD has a larger workgroup size
This doesn't seem to have hurt - but so little data. It amuses me that the Tahiti compiler will often use branches instead of a conditional move. Branches (in themselves, for a very short span of instructions) really are silly-cheap it seems.
-AMD has faster FP10 filtering
-AMD's geometry processing is less distributed

Not sure whether there's a significant difference between the architectures in rasterization, culling, transcendentals, memory access efficiency, atomics or any other nuances.
I have a strong feeling we'll still be asking: "if the theoreticals on Hawaii are so good, why is the performance so unexciting?"

30% faster? After 2 years? Sigh.

Actually, I'll temper that a bit. 8MP may well be where this chip shines. Developers this past few years mostly haven't been writing compute-heavy graphics. When they do, reviewers leave that option off because NVidia is screwed. So, games are compute-light, which means that 8MP monitors are home territory for Hawaii.
 
No, just those slides that Lanek included show no meaningful difference.
Well, I'm hoping that putting the geometry units inside the shader engines is symbolic of AMD reworking the data flow instead of algorithmically cramming tessellation into the geometry shader streamout model.
Again, a lack of data. e.g. has the L2 in Tahiti been useful here?
That's just it, though: AMD's method needlessly uses a lot of on-chip space when it should barely use any at all.

You don't need to start with 32 or 64 patch verts and generate all the tessellated triangles from that (which could require a lot of space or even streamout). The wavefronts should be post-tessellated vertices (you know how many there are from the tess factors) which read patch parameters (possibly as few as three) to calculate their barycentric coords. There absolutely should not be any performance degradation with higher scaling factors, even beyond the D3D max.
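As a toy illustration of the scheme I mean (not how any real hardware or driver does it): one thread per post-tessellated vertex, each deriving its barycentric coordinates from a uniform tess factor and reading only the patch parameters.

```python
# Toy sketch: the vertex count is known up front from the tess factor, so
# wavefronts can be launched as post-tessellated vertices that each read
# just the patch control points. Purely illustrative; assumes a uniform
# integer tess factor on a triangle domain.

def domain_points(tess_factor):
    """Barycentric coordinates of every vertex of a uniformly tessellated
    triangle domain at the given integer factor."""
    n = tess_factor
    for i in range(n + 1):
        for j in range(n + 1 - i):
            yield (i / n, j / n, (n - i - j) / n)

def eval_patch(control_points, bary):
    """Evaluate a flat 3-control-point patch at barycentric coordinates."""
    u, v, w = bary
    a, b, c = control_points
    return tuple(u * ax + v * bx + w * cx for ax, bx, cx in zip(a, b, c))

patch = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
verts = [eval_patch(patch, b) for b in domain_points(4)]
print(len(verts))  # 15 - (n+1)(n+2)/2 vertices, known from the factor alone
```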

I'm not even sure how adaptive tessellation is, in the few games that do it.
I think it'll start to be used more intelligently with the upcoming consoles having proper support. Devs really need to make it part of their art workflow from the very beginning.
 
Stop fighting, guys, and just look at what is really different in this architecture... and what impact it can have. You have it all right under your eyes...
 
Well, I'm hoping that putting the geometry units inside the shader engines is symbolic of AMD reworking the data flow instead of algorithmically cramming tessellation into the geometry shader streamout model.
I don't think they've "put them inside" because they were already there, allied to rasterisation and ROPs. The Hawaii diagram indicates that all geometry engines can feed all rasterisers, just like Tahiti, so if the former is "inside" then so is the latter (or at least that connectivity indicates nothing on that topic).

That's just it, though: AMD's method needlessly uses a lot of on-chip space when it should barely use any at all.

You don't need to start with 32 or 64 patch verts and generate all the tessellated triangles from that (which could require a lot of space or even streamout). The wavefronts should be post-tessellated vertices (you know how many there are from the tess factors) which read patch parameters (possibly as few as three) to calculate their barycentric coords. There absolutely should not be any performance degradation with higher scaling factors, even beyond the D3D max.
You're apparently suggesting a non-FF tessellator, I think.

As a matter of interest, do we know what the precision requirements for the D3D tessellator are? Would shader ALU floating point be precise enough?
 
Microsoft described the first phase of the tessellation stage as using 32-bit floating point. The second phase was described as using fixed-point math with 16-bit fractions.
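Taking that at face value, 16-bit fractions mean domain coordinates get snapped to a 1/65536 grid; a quick illustration of what that quantisation costs (my own illustration, not the actual spec mechanics):

```python
# Illustration only: snapping a domain coordinate to a 0.16 fixed-point grid,
# i.e. 16 fractional bits, as the second tessellation phase is described above.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS          # 65536 represents 1.0

def to_fixed(x):
    return round(x * ONE)     # quantise to a 16-bit fraction

def to_float(f):
    return f / ONE

u = 1.0 / 3.0                 # an awkward barycentric coordinate
print(to_float(to_fixed(u)))  # 0.3333282470703125
# Worst-case error is half a step (~2**-17), well within what FP32 shader
# ALU math can represent, which bears on the precision question above.
```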


One thing I didn't notice when I glanced at some of the comparison slides is a little oopsie in the memory bandwidth comparison between the 7970 GHz edition and the 290X.
The bandwidth given for the former belongs to the 7970 non-GHz edition.

The 290X offers an 11% bandwidth improvement over its predecessor, although the choice of stepping back in terms of clock speed and going for more density apparently leaves the wider bus physically smaller. I'm a little curious whether this saved some power.
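For reference, the arithmetic behind those numbers (using the commonly quoted board specs, which are my assumption here rather than anything from the slides):

```python
# Memory bandwidth comparison, using commonly quoted board specs (assumed
# here for illustration, not taken from the slides under discussion).
def bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin

hd7970     = bandwidth_gbs(384, 5.5)       # 264 GB/s - the non-GHz figure the slide used
hd7970_ghz = bandwidth_gbs(384, 6.0)       # 288 GB/s - what it should have compared against
r9_290x    = bandwidth_gbs(512, 5.0)       # 320 GB/s
print(round(r9_290x / hd7970_ghz - 1, 3))  # 0.111 -> the ~11% real improvement
print(round(r9_290x / hd7970 - 1, 3))      # 0.212 -> the inflated gap the slide implies
```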
 