I wanted to edit that to RV635 exclusively but B3D has no edit option. Dang.
OT but I at least see an edit function for all my past posts.
Just out of curiosity, which mechanism implies that more transistors per unit of area lead to more defects?
Do smaller transistors lower the threshold where physical defects in the silicon itself impact transistor function?
At least for mature processes at the MPU manufacturers, defect rates approach the baseline of physical defects in the wafer, irrespective of what's patterned on them.
I was thinking the same a while back and I have to admit despite my subsequent thoughts ("4xRV635" with 32 TUs) I'm biased towards 24 TUs simply because it's not such a huge jump. But the "asymmetry" of 24 TUs does bother me a bit...
I have a feeling the arc. is even longer instead of wider. Perhaps the addition of two more quads in each SIMD array with two more texture blocks.
RV670 --- 16x4 into 4 texture blocks
RV770 --- 24x4 into 6 texture blocks
Because I think the SIMD arrays will stay at 4, I don't think we will see anything higher than 4 render back ends. I guess I will wait and see though.
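For reference, the unit counts implied by the two layouts above can be tallied with a quick back-of-envelope script. This is only a sketch: it reads "16x4" as 16-wide SIMDs times 4 SIMD arrays, and it assumes 5 ALU lanes per shader unit and 4 TUs per texture block, neither of which is spelled out in the post.

```python
# Back-of-envelope unit counts for the two layouts listed above.
# ASSUMPTION: 5 ALU lanes per shader unit (not stated in the post).
ALU_LANES_PER_UNIT = 5
# ASSUMPTION: one texture block is a "quad" of 4 TUs.
TUS_PER_TEXTURE_BLOCK = 4


def unit_counts(simd_width, simd_count, texture_blocks):
    """Return shader unit, ALU lane and TU totals for one layout."""
    shader_units = simd_width * simd_count
    return {
        "shader_units": shader_units,
        "alu_lanes": shader_units * ALU_LANES_PER_UNIT,
        "texture_units": texture_blocks * TUS_PER_TEXTURE_BLOCK,
    }


rv670 = unit_counts(simd_width=16, simd_count=4, texture_blocks=4)
rv770_guess = unit_counts(simd_width=24, simd_count=4, texture_blocks=6)

print("RV670:", rv670)
print("RV770 (speculated):", rv770_guess)
```

Under those assumptions the speculated layout lands on 480 ALU lanes and 24 TUs, which is where the 24 TU figure discussed in this thread comes from.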
Jawed
SS SS SS SS --TT
SS SS SS SS --TT
SS SS SS SS --TT
R R R R
I don't know how the ring stops would be organised. But, then again, does it matter?
But I'm not sure what you mean by the "asymmetry" problem.
It's a question of screen-space tiling, first deployed in R300, solely for pixel shading: once a batch of pixels is rasterised (determined by their screen-space tile) they're localised to a single quad RBE.
I can't remember if the ROPs were each statically tied to specific SIMDs or not.
I'm assuming that R600 has one ring stop per quad RBE. Then each SIMD also has one ring stop. Finally, each SIMD has one quad TU, implying to me that a TU is directly linked to this same ring stop (since the TU needs to send its results to all the other SIMDs, not just its local SIMD).
6 quads of TUs sorta implies 6 ring stops...
Also, in R600, each SIMD has an equal share (1/4) of the TUs. How do you share 6 quad TUs across 4 SIMDs? 2 SIMDs with 2 quad TUs and 2 SIMDs with 1 quad TU. Hmm.
This asymmetry is why I diagrammed 4xRV635, otherwise it just seems messy:
http://forum.beyond3d.com/showpost.php?p=1130755&postcount=649
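To make the uneven split concrete, here is a small sketch that deals quad TUs out to SIMDs as evenly as possible. The function names are made up for illustration, and it assumes each quad TU is "owned" by exactly one SIMD, which is the premise being questioned here.

```python
def share_quads(quad_tus, simds):
    """Deal quad TUs out to SIMDs as evenly as possible.

    ASSUMPTION: every quad TU is owned by exactly one SIMD.
    """
    base, extra = divmod(quad_tus, simds)
    # The first `extra` SIMDs each get one more quad TU than the rest.
    return [base + 1 if i < extra else base for i in range(simds)]


def is_symmetric(shares):
    """True when every SIMD owns the same number of quad TUs."""
    return len(set(shares)) == 1


r600_shares = share_quads(quad_tus=4, simds=4)
speculated_shares = share_quads(quad_tus=6, simds=4)

print("R600:", r600_shares, "symmetric:", is_symmetric(r600_shares))
print("6 quad TUs:", speculated_shares, "symmetric:", is_symmetric(speculated_shares))
```

With 4 SIMDs, only multiples of 4 quad TUs (4, 8, ...) divide evenly; 6 leaves two SIMDs with an extra quad.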
Now it's worth pointing out that L2 in R600 is centralised. So all TUs (or their L1s, at least) are accessing the same L2. That implied path, which isn't via the ring bus, could imply that the TUs actually return results back to the requesting SIMD via a central route, not the ring bus. If so, that would mean that 6 ring stops wouldn't be needed. But it still leaves me puzzling over the "ownership" of TUs, normally something that's symmetric across all SIMDs.
Jawed
Historically it was about load-balancing and cache (for each of TU and RBE) coherence. The load balancing was "automatic", the ALUs were only shading pixels and the assumption was that the (statically defined) set of tiles assigned to each RBE would all amount to an "equal" workload per frame, for ALUs, TUs and RBEs.
That sounds like it might be tiled that way.
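A minimal sketch of static screen-space tiling, to show why ownership is collision-free and why the load balance is only "automatic" in terms of tile counts. The 16-pixel tile size and the 2x2 checkerboard interleave are hypothetical choices for illustration, not the actual R300 or R600 pattern.

```python
from collections import Counter

TILE = 16      # ASSUMPTION: hypothetical tile edge in pixels
NUM_RBES = 4   # four quad RBEs, as discussed in the thread


def owner_rbe(x, y, tile=TILE):
    """Map a pixel to the single RBE that statically owns its tile.

    ASSUMPTION: tiles are interleaved in a 2x2 checkerboard across 4 RBEs.
    """
    tx, ty = x // tile, y // tile
    return (tx % 2) + 2 * (ty % 2)


def tiles_per_rbe(width, height, tile=TILE):
    """Count how many screen tiles each RBE owns for a given resolution."""
    counts = Counter()
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            counts[owner_rbe(x, y, tile)] += 1
    return counts


counts = tiles_per_rbe(1280, 1024)
for rbe in range(NUM_RBES):
    print(f"RBE {rbe}: {counts[rbe]} tiles")
```

Every pixel maps to exactly one RBE, so no two RBEs ever touch the same tile. Equal tile counts only give an equal workload if the cost per tile averages out over the frame, which is the assumption described above.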
Maybe I'm being dense, but why would it matter to the RBE which SIMD it was talking to?
I see it as a question of whether the advantages that were once gained with the screen-space tiling are now relevant. And how much collision management are you willing to indulge in, in order to have as much many-to-many flexibility as possible.
Every form of storage in the R600 diagrams is located everywhere but the SIMDs.
The register file cache, the schedulers, the unified L2 and single-image L1 seem to lean towards keeping the ALUs isolated from the particulars.
I just want to add that it's entirely possible to build a single hierarchical-buffer update unit (i.e. for early-Z culling/updating) which accesses tiled Z/stencil buffers. This would enable the RBEs to have privately tiled Z/stencil buffers.
The biggest outstanding question is that the RBEs will all try to access the hierarchical buffers simultaneously. So, are those buffers (Z and stencil) tiled or can a single instance of each support the kind of throughput the RBEs demand? With tiling you guarantee collision-free accesses. Without tiling you have to have some kind of queueing/buffering/re-ordering front end to keep the RBEs happy. I think it's reasonable to assume that the RBEs are the most tetchy when it comes to being told to wait.
No, the SIMD units do not have quad TUs attached - those must be independent (try to make the unit count match up otherwise with RV630...). So just having 24-wide SIMD arrays works perfectly fine from that point of view and naturally gives you 6 quad TUs. It would definitely be the easiest way to go from RV670 to RV770, assuming the 480 shader unit number is correct. The most obvious downside would be that the branching granularity would increase.
OK, that would be like a "super-sized" Xenos, 24-wide units instead of 16-wide. I hadn't thought of it in those terms, but it does sound feasible. It would certainly make me happier about redundancy, as I really dislike the idea of SIMDs narrower than 16; it just seems relatively wasteful.
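A rough sketch of what going from 16-wide to 24-wide SIMDs would do to branching granularity and the quad count. It assumes a batch is issued across a SIMD over 4 clocks, so batch size scales directly with SIMD width; that figure is an assumption for illustration, not something confirmed in this thread.

```python
CLOCKS_PER_BATCH = 4  # ASSUMPTION: a batch occupies a SIMD for 4 clocks
PIXELS_PER_QUAD = 4


def batch_size(simd_width, clocks=CLOCKS_PER_BATCH):
    """Pixels that must take the same branch direction to avoid divergence."""
    return simd_width * clocks


def quads_per_simd(simd_width):
    """Number of pixel quads a SIMD of this width processes per clock."""
    if simd_width % PIXELS_PER_QUAD:
        raise ValueError("SIMD width must be a whole number of quads")
    return simd_width // PIXELS_PER_QUAD


for width in (16, 24):
    print(f"{width}-wide SIMD: {quads_per_simd(width)} quads/clock, "
          f"batch of {batch_size(width)} pixels")
```

Under that assumption the batch grows from 64 to 96 pixels, which is the coarser branching granularity mentioned above, while 24 lanes divide cleanly into 6 quads.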
I know, right? It does look bullshitty.