I am hoping for a big memory OC, seeing that some R* GPUs have 6.5 Gbps memory.
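Back-of-envelope on what that would buy, assuming the rumoured 512-bit bus (just the rumour, not confirmed):

```cpp
#include <cstdio>

int main() {
    // Peak GDDR5 bandwidth = (bus width in bits / 8) * data rate per pin.
    // The 512-bit bus is an assumption; 5.0 and 6.5 Gbps are the stock
    // and hoped-for overclocked data rates from the post above.
    const double busBytes = 512.0 / 8.0;  // 64 bytes per transfer
    printf("5.0 Gbps: %.0f GB/s\n", busBytes * 5.0);  // 320 GB/s
    printf("6.5 Gbps: %.0f GB/s\n", busBytes * 6.5);  // 416 GB/s
}
```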
In ATI/AMD GPU architecture the rasteriser and ROPs go together; they are tightly linked. NVidia has these independent of each other.
Hawaii and Tahiti are no different in this respect. The ROPs are also on the shader side of the shader/memory crossbar, which is why the ROPs in Tahiti do not line up with the memory channels.
Both diagrams also show that all geometry engines communicate with all rasterisers.
The fundamental structure of these things hasn't changed. There's just more of them.
Does 1MB of L2 have any effect there? Will we ever find out?
I dare say it seems a bit ironic, but we're likely to find out more about this architecture (in graphics) because of the consoles than we've learnt so far.
I'm tempted to say we'd have found out already; it's been 2 years and Hawaii is only 33% bigger in this respect.
On the other hand, again, once console developers start digging...
Actually, I'll temper that a bit. 8MP may well be where this chip shines. Developers these past few years mostly haven't been writing compute-heavy graphics. When they do, reviewers leave that option off because NVidia is screwed. So games are compute-light, which means that 8MP monitors are home territory for Hawaii.
> Four rasterisers with pixels locked to those rasterisers' ROPs certainly looks like the corner that NVidia studiously painted away from. It doesn't give an impression of robustness when presented with tricky workloads.

So you think Nvidia doesn't tie its ROPs to specific pixels?
> Well, I'm hoping that putting the geometry units inside the shader engines is symbolic of AMD reworking the data flow instead of algorithmically cramming tessellation into the geometry shader streamout model.

What makes you think AMD requires more on-chip space than Nvidia?
That's just it, though: AMD's method needlessly uses a lot of on-chip space when it should barely use any at all.
You don't need to start with 32 or 64 patch verts and generate all the tessellated triangles from that (which could require a lot of space or even streamout). The wavefronts should be post-tessellated vertices (you know how many there are from the tess factors) which read patch parameters (possibly as few as three) to calculate their barycentric coords. There absolutely should not be any performance degradation with higher scaling factors, even beyond the D3D max.
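To make that concrete, here's a minimal CPU-side sketch of the scheme being described: each "lane" is one post-tessellation vertex that derives its barycentric coords purely from its flat index and the tess factor, then evaluates the patch from just three control points. The lattice walk and evaluation are my own invention for illustration, not anybody's hardware:

```cpp
#include <cstdio>

struct Vec3 { float x, y, z; };

// Map a flat vertex index to lattice coordinates (i, j) on a triangular
// grid with integer tess factor n: row i holds (n - i + 1) vertices.
static void indexToLattice(int index, int n, int* i, int* j) {
    int row = 0, rowLen = n + 1;
    while (index >= rowLen) { index -= rowLen; ++row; --rowLen; }
    *i = row; *j = index;
}

// Evaluate a flat triangular patch at barycentric (u, v, w) from only
// three patch parameters -- no pre-tessellated triangle list needed.
static Vec3 evalPatch(const Vec3 cp[3], float u, float v, float w) {
    return { cp[0].x*u + cp[1].x*v + cp[2].x*w,
             cp[0].y*u + cp[1].y*v + cp[2].y*w,
             cp[0].z*u + cp[1].z*v + cp[2].z*w };
}

int main() {
    const int n = 4;                             // tess factor
    const int numVerts = (n + 1) * (n + 2) / 2;  // known up front from n
    const Vec3 cp[3] = {{0,0,0}, {1,0,0}, {0,1,0}};

    for (int idx = 0; idx < numVerts; ++idx) {   // one "lane" per vertex
        int i, j;
        indexToLattice(idx, n, &i, &j);
        float u = (float)i / n, v = (float)j / n, w = 1.0f - u - v;
        Vec3 p = evalPatch(cp, u, v, w);
        printf("vert %2d: bary(%.2f %.2f %.2f) -> (%.2f, %.2f)\n",
               idx, u, v, w, p.x, p.y);
    }
}
```

Note there is no per-patch storage that grows with the tess factor: only the vertex count does, which is why higher scaling factors shouldn't degrade anything.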
How could it be the 11th when you posted your link on the 13th?
> Microsoft described the first phase of the tessellation stage as using 32-bit floating point. The second phase was described as using 16-bit fixed-point fractions.

So that would be fine.
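For illustration only, here's what a 16-bit fixed-point fraction looks like, assuming a plain 0.16 format; Microsoft's exact format may differ:

```cpp
#include <cstdint>
#include <cstdio>

// Quantise a domain coordinate in [0,1] to a 16-bit fraction and back.
// The 0.16 format here is an assumption, a sketch of "16-bit fractions"
// rather than D3D11's actual rule.
static uint16_t toFrac16(float x)      { return (uint16_t)(x * 65535.0f + 0.5f); }
static float    fromFrac16(uint16_t f) { return f / 65535.0f; }

int main() {
    float u = 1.0f / 3.0f;
    uint16_t q = toFrac16(u);
    printf("u=%.7f -> 0x%04x -> %.7f (err %.2e)\n",
           u, q, fromFrac16(q), u - fromFrac16(q));
}
```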
> So is the pixel-to-shader mapping static for AMD? With the rasterizer feeding fragments to statically assigned shaders?

No, simple round-robin, as long as the compute unit has space. I'm not sure if that's what you're asking, though. The mapping from pixel to wavefront index (work item ID within a wavefront) is static, because hierarchical-Z maps its tile hierarchy to render target pixels statically.
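A toy version of what such a static pixel-to-lane mapping could look like, assuming 8x8 screen tiles (64 pixels = one 64-wide wavefront) organised as 2x2 quads; the tile size and layout are my guesses, not documented:

```cpp
#include <cstdio>

struct Lane { int tileX, tileY, lane; };

// Static mapping: a pixel's lane within its tile's wavefront is a pure
// function of its coordinates. Which CU runs the wavefront can still be
// round-robin; the pixel -> lane assignment never changes.
static Lane pixelToLane(int px, int py) {
    int lx = px & 7, ly = py & 7;            // position within 8x8 tile
    int quad   = (ly >> 1) * 4 + (lx >> 1);  // which 2x2 quad (0..15)
    int inQuad = (ly & 1) * 2 + (lx & 1);    // position inside the quad
    return { px >> 3, py >> 3, quad * 4 + inQuad };
}

int main() {
    Lane l = pixelToLane(13, 6);  // always the same lane for this pixel
    printf("pixel (13,6): tile (%d,%d), lane %d\n", l.tileX, l.tileY, l.lane);
}
```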
> We are not going to see more cache until the compute workloads start to get a bit more complex.

Yes, console compute is going to take a while to kick in. Consoles are "stuck at 1920x1080", so there'll prolly be a relatively rapid climb in interest in graphics-compute, but games have 1-5+ year development cycles... On the other hand, there will be compute on the console GPUs which may be left as CPU compute when games are transferred to PC.
> Absolutely. L2 and compute scaled by a middling 30%. Most of the work here seems to have gone into the frontend, geometry and ROPs. Frontend, probably because it is cheap and after the consoles they had the IP lying around. Geometry will help for 4K and the ROPs seem made for 4K.

When you say frontend, are you referring to the increase in ACEs? That should be compute friendly (e.g. in "guaranteeing" response times for certain compute tasks), but again that's going to take a while.
> So you think Nvidia doesn't tie its ROPs to specific pixels?

Titan has 5 rasterisers and 48 ROPs.
> The Hawaii diagram indicates that all geometry engines can feed all rasterisers.

Well, that's a given. You can't avoid that.
> You're apparently suggesting a non-FF tessellator, I think.

By FF you mean fixed function? Actually, I'd be even happier with that, but I don't think AMD did FF. I think AMD made a few small tweaks to the geometry shader so that tessellation could be done with shader code.
> What makes you think AMD requires more on-chip space than Nvidia?

See the graph above.
> I don't understand your second paragraph. Which wavefronts are you saying should be post-tessellated vertices? I assume you're referring to DS waves, as that's what you're describing, but I'm not sure of the link between your last sentence and the rest of the paragraph.

Okay, I probably didn't describe it well.
> Whatever picture you tried to post didn't work; you instead got a "no deeplinking please!" placeholder.

Thanks. I mirrored it.
> When you say frontend, are you referring to the increase in ACEs? That should be compute friendly (e.g. in "guaranteeing" response times for certain compute tasks), but again that's going to take a while.

Well, tbh, with the single-threaded graphics dispatch, they aren't going to do any good. Maybe that's where Mantle will help. True parallel submission.
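A generic sketch of that bottleneck (no real API, all names invented): recording command buffers can be spread over threads, but with single-threaded dispatch everything still funnels through one submission point:

```cpp
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical command buffer: just a list of recorded commands.
struct CommandBuffer { std::vector<int> cmds; };

int main() {
    const int numThreads = 4;
    std::vector<CommandBuffer> buffers(numThreads);

    // Recording scales across cores...
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t)
        workers.emplace_back([&, t] {
            for (int i = 0; i < 1000; ++i) buffers[t].cmds.push_back(i);
        });
    for (auto& w : workers) w.join();

    // ...but submission is still serialised on one thread: the choke
    // point that extra ACEs alone can't remove, and that true parallel
    // submission would.
    size_t total = 0;
    for (auto& b : buffers) total += b.cmds.size();  // pretend-submit
    printf("submitted %zu commands from one thread\n", total);
}
```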
> Titan has 5 rasterisers and 48 ROPs.
I've realised I've been sloppy and should have been referring to fragments. Fragments in AMD are locked to ROPs. I don't see anything like that in NVidia.
I suppose it's possible NVidia has a fixed tiling of ROPs to render target pixels, but the hierarchy of rasterisation, render back end, L1, memory crossbar, L2 and memory channels doesn't seem to require that.
NVidia's implementation of hierarchical-Z could be a factor here, fixing certain things. So, maybe I'm missing something there.
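To show what "fragments locked to ROPs" means in practice, here's a toy static screen-tile interleave where the render back end serving a pixel is a pure function of its coordinates; the 4-rasteriser/16-back-end counts and the 32x32 tile size are illustrative numbers only, not anybody's real configuration:

```cpp
#include <cstdio>

// Fixed screen-space interleave: 32x32 tiles fan out across 4
// rasterisers, each with 4 render back ends. A given pixel always
// lands on the same back end, whatever ran its wavefront.
static int pixelToBackEnd(int px, int py) {
    int tx = px >> 5, ty = py >> 5;                       // 32x32 tiles
    int rasteriser = ((ty & 1) << 1) | (tx & 1);          // 4-way interleave
    int rbeWithin  = ((ty >> 1) & 1) * 2 + ((tx >> 1) & 1); // 4 per rasteriser
    return rasteriser * 4 + rbeWithin;                    // back end 0..15
}

int main() {
    // Same pixel, same back end, every time.
    printf("pixel (100,500) -> back end %d\n", pixelToBackEnd(100, 500));
    printf("pixel (100,500) -> back end %d\n", pixelToBackEnd(100, 500));
}
```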
> If the pixel-to-wavefront ID is mapped statically, then how do they fill the wavefronts fully? A triangle is quite likely not to fill a wavefront fully, and multiple triangles will generate a lot of superfluous fragments along the edges due to quad shading. Those superfluous fragments will have to fill the next wavefront, which won't have fragments from the bulk of a triangle to fill up.

I never got a full answer on this topic, but I think up to 4 triangles each of up to 16 fragments can share a wavefront, or combinations thereof (on the basis that the rasteriser has a granularity of 16 fragments, and they are derived from a single triangle, per clock).
> I never got a full answer on this topic, but I think up to 4 triangles each of up to 16 fragments can share a wavefront, or combinations thereof (on the basis that the rasteriser has a granularity of 16 fragments, and they are derived from a single triangle, per clock).

Each pixel wave can be made up of as many as 16 triangles. The smallest granularity is a quad.
The edges are a question I don't know how to answer.
Arguably the simplest solution is to say that my 1:1 mapping from earlier is wrong.
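A toy illustration of that quad-granularity packing (the packing policy and triangle sizes are invented): quads from several small triangles share one 64-lane wavefront, but a partially covered edge quad still burns all four of its lanes on helper pixels.

```cpp
#include <cstdio>
#include <vector>

struct Quad { int triangleId; int coverageMask; };  // 4-bit pixel coverage

int main() {
    // Three small triangles producing 7, 5 and 6 quads; the first quad
    // of each is only partially covered (an "edge" quad).
    std::vector<Quad> pending;
    const int quadsPerTri[3] = {7, 5, 6};
    for (int t = 0; t < 3; ++t)
        for (int q = 0; q < quadsPerTri[t]; ++q)
            pending.push_back({t, q == 0 ? 0x7 : 0xF});

    // Fill 16-quad (64-lane) wavefronts in arrival order, counting how
    // many lanes shade real pixels versus wasted helper pixels.
    for (size_t base = 0; base < pending.size(); base += 16) {
        int lanes = 0, real = 0;
        for (size_t i = base; i < base + 16 && i < pending.size(); ++i) {
            lanes += 4;  // a quad always occupies four lanes
            for (int m = pending[i].coverageMask; m; m >>= 1) real += m & 1;
        }
        printf("wavefront: %2d/64 lanes issued, %2d shading real pixels\n",
               lanes, real);
    }
}
```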