#126 | |
|
Senior Member
Join Date: Jun 2005
Posts: 1,320
|
Quote:
The latest batch of roadmaps tells of details about several new parts, for example the RV670 and R670, http://www.theinquirer.net/default.aspx?article=40068
__________________
What is the meaning of life? - Why I'm here, I know my past, because I return to the past but I'm going forward to see my future, to find the truth, meaning of the existence and purpose. |
|
|
|
|
|
|
#127 |
|
Senior Member
Join Date: Apr 2007
Posts: 1,393
|
|
|
|
|
|
|
#128 |
|
Member
Join Date: Oct 2004
Posts: 247
|
|
|
|
|
|
|
#129 |
|
Senior Member
Join Date: Apr 2006
Location: Io, lava pit number 12
Posts: 2,108
|
Probably because it's a dual-GPU card, with maybe two R650s working in Crossfire mode.
However, I wouldn't rule out a similar move by Nvidia, even though we know G92 is not another GX2-type refresh product (due to known process changes, added FP64 support, etc.). |
|
|
|
|
|
#130 |
|
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
|
65^2 is 52% of 90^2. How'd you get a 28% decrease?
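To spell out the arithmetic: going from 90 nm to 65 nm shrinks each linear dimension by roughly 28%, but die area goes with the square, which leaves roughly 52% of the original area. A quick sketch, assuming an ideal shrink where everything scales with the node name (real dies rarely scale this cleanly, and the function names are only illustrative):

```python
def linear_scale(old_nm: float, new_nm: float) -> float:
    """Ratio of new to old feature size along one dimension."""
    return new_nm / old_nm


def area_scale(old_nm: float, new_nm: float) -> float:
    """Ratio of new to old area, assuming both dimensions shrink equally."""
    return linear_scale(old_nm, new_nm) ** 2


lin = linear_scale(90, 65)
area = area_scale(90, 65)
print(f"linear size: {lin:.1%} of original, a {1 - lin:.1%} decrease")
print(f"area:        {area:.1%} of original, a {1 - area:.1%} decrease")
```

So a 28% figure matches the linear shrink, not the area.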
__________________
What the deuce!? |
|
|
|
|
|
#131 |
|
Naughty Boy!
Join Date: Aug 2004
Location: Stuttgart, Germany
Posts: 5,008
|
90 - 28% = 65. Obviously miscalculated
__________________
I have thought some of nature's journeymen had made men, and not made them well, they imitated humanity so abominably. |
|
|
|
|
|
#132 | ||
|
Senior Member
Join Date: Mar 2002
Posts: 3,779
|
Quote:
Quote:
G92 looks pretty nuts to me. I thought ATI might have an advantage in clocking up its shaders when AMD came aboard, but now that NVidia beat them to that with G80 and will likely go even further with G92, I don't see ATI having much success against the latter. |
||
|
|
|
|
|
#133 |
|
Harmlessly Evil
Join Date: Feb 2002
Posts: 2,027
|
Remind me to stay away from calculations at 3 am.
__________________
"Complexity is easy; simplicity is difficult." |
|
|
|
|
|
#134 | |
|
Regular
|
Quote:
Hmm... Jawed |
|
|
|
|
|
|
#135 | |
|
Member
Join Date: Apr 2004
Posts: 325
|
Quote:
Well, I'm not sure I'm in love with R600's approach, either. Should nearby pixels really be handled by disjoint TMUs? Does it make sense to *always* ship work across the chip? One could come up with a trivial predication case that would effectively underutilize R600's TMUs as well.
From a high-level perspective, I guess I think of this problem in a couple of different ways. Either worktypes are determined, and processing units (TMUs, SFUs, ALUs) assigned dynamically, or a kernel forks off requests to unit processing farms, which report back results (the individual 'farms' manage prioritization of incoming requests, etc.).
MintMaster is probably right, that multiple threads can almost certainly hide underutilization, but the above seems somewhat more flexible when it comes to handling DB. As long as #units/#sequencers(?) <= average(kernel_data_width), you don't have a DB problem.
I'm sure there are much larger problems to deal with, though -- like shipping data all over a chip.... Something I wouldn't expect a higher-clocked chip to try to do.
[And it is looking like the G92 is a MUL-enabled, 192proc, higher speed chip, if "2x theoretical" and "2.5-3x real" and "30% smaller die" are to be believed]
-Dave |
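One way to read the #units/#sequencers rule of thumb above is sketched below. This is only an interpretation: it assumes "kernel_data_width" means the average run of consecutive pixels that take the same branch, and that units per sequencer is the width of a batch that has to execute in lockstep. All names are hypothetical.

```python
from itertools import groupby


def average_run_length(outcomes):
    """Mean length of runs of identical branch outcomes (non-empty input)."""
    runs = [len(list(group)) for _, group in groupby(outcomes)]
    return sum(runs) / len(runs)


def divergent_batches(outcomes, width):
    """Count batches of `width` pixels that contain both branch outcomes."""
    batches = [outcomes[i:i + width] for i in range(0, len(outcomes), width)]
    return sum(1 for batch in batches if len(set(batch)) > 1)


def rule_of_thumb_ok(units, sequencers, outcomes):
    """The post's condition: units per sequencer <= average coherent width."""
    return units / sequencers <= average_run_length(outcomes)


# Per-pixel branch outcomes for 64 pixels: runs of 16, 16 and 32.
outcomes = [True] * 16 + [False] * 16 + [True] * 32

print(rule_of_thumb_ok(64, 4, outcomes), divergent_batches(outcomes, 16))
print(rule_of_thumb_ok(64, 1, outcomes), divergent_batches(outcomes, 64))
```

With 16-wide batches the condition holds and no batch diverges; with a single 64-wide batch it fails and the batch has to run both paths. The average is only a heuristic, since runs that are long but misaligned with batch boundaries can still diverge.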
|
|
|
|
|
|
#136 | |
|
Regular
|
Ah, sorry, Dave - hasty posting during an advert break syndrome
For the rest of this posting, just assume I've got one eye somewhere else
Quote:
I can only think that once you've built latency tolerance, the two approaches (private TUs versus shared-distributed TUs) end up moving the same amount of data around the ring. Hmm, except that texels in compressed form (which I presume they are, while they're in L2) would consume less ring bandwidth. When a TU produces a quad of texel results (or, perhaps, 4 quads of texel results as a burst in response to one batch) that are fully filtered and are destined for registers, surely they consume more bandwidth on the ring? Then again, texel-overhead relating to anisotropic filtering is saved, since those extra texels tend to stay in their "home" L2. Gah.
We don't know the rasterisation pattern in R600. Considering a batch of 64 pixels, for example, is it:
1111222233334444
1111222233334444
1111222233334444
1111222233334444
or:
1111111133333333
1111111133333333
2222222244444444
2222222244444444
etc.
I remember a rasterisation patent document that implied rasterisation along the long axis of a triangle, so either width-wise or height-wise rasterisation is possible. What's the effect of that on texel locality? How big are the screen-space tiles within which rasterisation is constrained? What about that texture caching patent application I keep linking, the prefetching one?
I can't think what kind of trivial predication you're referring to that would waste R600's TUs. The "home" arbiter for the texture requests (for a batch) is forced to treat the 16 quads of texel results that it's waiting for as asynchronous events. Predication would de-select texture-fetches at the quad level, I guess, so the arbiter would only send out quad-fetches to "foreign" TUs as needed.
Brainfade...
Jawed |
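The two candidate 64-pixel layouts in the post above can be generated from simple index arithmetic, which makes it easier to compare the shape of each group's footprint. This is a sketch only: the 16x4 batch shape is taken from the grids as written, and treating each digit as a group ID is an assumption, since the post does not say what the digits denote.

```python
WIDTH, HEIGHT = 16, 4  # batch of 64 pixels, laid out as in the post


def striped(x, y):
    """First layout: four 4-pixel-wide vertical stripes."""
    return x // 4 + 1


def blocked(x, y):
    """Second layout: 8x2 blocks, numbered down the columns first."""
    return (x // 8) * 2 + y // 2 + 1


def render(pattern):
    """Return the batch as rows of digits, one character per pixel."""
    return ["".join(str(pattern(x, y)) for x in range(WIDTH))
            for y in range(HEIGHT)]


def group_extent(pattern, group):
    """Bounding box (width, height) in pixels covered by one group."""
    cells = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)
             if pattern(x, y) == group]
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return (max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


for name, pattern in (("striped", striped), ("blocked", blocked)):
    print(name, group_extent(pattern, 1))
    print("\n".join(render(pattern)))
```

In the first layout each group covers a square 4x4 pixel region; in the second it covers a wider 8x2 region. How that translates into texel locality depends on details the thread leaves open, such as tile size and rasterisation direction.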
|
|
|
|
|
|
#137 | ||
|
Member
Join Date: Apr 2004
Posts: 325
|
Quote:
Quote:
I'm not sure how a local TMU uses the ring at all -- local ALUs talk to local TMUs, I wouldn't expect that to be over the ring. As it is, ALUs are always talking to remote TMUs (how remote depends on which quad). Have I misunderstood something?
[that's a stupid question]
-Dave [->sleep] |
||
|
|
|
|
|
#138 | |||
|
Regular
|
Quote:
Which reminds me of a similar possibility with the way textures are defined and then fetched. It's possible to use a stride that will hit only one memory channel.
Quote:
Quote:
Ooh, hang on, there's this from Watch Impress.
I wish AMD would just post the complete set of slides. Anyway, that doesn't show the ring bus at all, so I prolly should still have a go at a more detailed diagram.
Jawed |
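On the single-channel stride mentioned earlier in this post, a toy model shows the effect. It assumes plain modulo interleaving across 8 channels at a 256-byte granularity; both numbers and the mapping itself are illustrative guesses, and real hardware may swizzle or hash addresses specifically to avoid this.

```python
NUM_CHANNELS = 8        # assumed channel count
INTERLEAVE_BYTES = 256  # assumed bytes per channel before moving to the next


def channel(addr):
    """Memory channel that serves a byte address under modulo interleaving."""
    return (addr // INTERLEAVE_BYTES) % NUM_CHANNELS


def channels_touched(stride, count, base=0):
    """Set of channels hit by `count` accesses spaced `stride` bytes apart."""
    return {channel(base + i * stride) for i in range(count)}


# A stride equal to one interleave unit spreads accesses over every channel.
print(channels_touched(INTERLEAVE_BYTES, 64))

# A stride equal to the full interleave period lands on one channel only.
print(channels_touched(INTERLEAVE_BYTES * NUM_CHANNELS, 64))
```

Under this model any stride that is a multiple of 2048 bytes keeps every access on the channel of the base address, so the other seven sit idle.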
|||
|
|
|
|
|
#139 |
|
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
|
Eric Demers gave a talk about the R6XX processors at Stanford's CS448 and AMD actually let us post the slides. http://graphics.stanford.edu/cs448-07-spring/. The talk was not a fully deep technical dive, as it was partly designed to inspire students aiming to become architects and to explain why some things were done.
|
|
|
|
|
|
#140 |
|
Mostly Harmless
|
We have Eric's architecture deep-dive from Tunis. We also have a long list of interview questions in to Eric. Hopefully these things get published together. . .
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee "Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel ". . .its taking us longer than we would have liked to get a [Crossfire game] profiling system out there" --Terry Makedon, ATI, July 2006 "Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss |
|
|
|
|
|
#141 |
|
Regular
|
Thanks Mike, that's great. That'll keep me busy for a while!
There's an admittedly vague die picture for those who like pretty pix. Jawed |
|
|
|
|
|
#142 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,093
|
I'm kind of confused as to why GPU makers insist on die shots where the innards are obscured by the pinout. Don't they use the same kind of packaging scheme?
CPU makers seem to have no issue with showing higher-res, clearer die shots. They sometimes even go out of their way to draw borders around functional units.
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#143 |
|
Regular
|
Xenos and R520 die shots are comparatively clear...
Jawed |
|
|
|
|
|
#144 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,093
|
I haven't seen any hi-res die shots of those cores. I haven't looked too hard for R520 shots, but I don't recall seeing a good pic of Xenos either.
Is there a presentation or article I missed?
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#145 |
|
Regular
|
There's nothing hi-res I'm afraid. The R520 review here has a die shot, I believe. Not available right now.
Xenos die shot is out there somewhere, can't think where. Mostly I have these things on disk, not URLs (as the latter have a habit of disappearing)... Jawed |
|
|
|
|
|
#146 | |
|
Member
Join Date: Jul 2003
Posts: 406
|
Apologies for continuing the OT, but P.29 from the Stanford presentation:
Quote:
|
|
|
|
|
|
|
#147 |
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
|
Not really surprising, it holds for any new architecture
__________________
[twitter] More samples, we need more samples! [Dean Calver] The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way |
|
|
|