View Full Version : Nvidia G71 - rumours, questions and whatnot
ToxicTaZ
04-Dec-2005, 03:50
NVIDIA CFO Marv Burkett and VP Michael Hara revealed more about the GeForce 7 series at the CSFB Annual Technology Conference 2005: the 90nm G72 (GeForce 7200) and G73 (GeForce 7600) will be announced in Q1 next year. G72 will be 64-bit and support TurboCache technology. Marv Burkett revealed that the GeForce 7200 and 7600 series will last between one and one and a half years. G71 will be 90nm and is expected to be clocked much higher, at 750MHz. As for G80, development has gone smoothly and it is slated for a mid-2006 release, which I suppose means during Computex 2006. G80 will also support Shader Model 4.0.
Here, use this to translate with...
http://www.worldlingo.com/en/websites/url_translator.html
http://www.hkepc.com/bbs/viewthread....extra=page%3D1
http://www.mchsi.com/desmet/outgoing_window/dvhardware.net
http://www.vr-zone.com.sg/?i=3005
This card could look like this in April 2006?
Nvidia GeForce 7900 Ultra @750MHz / 1.9GHz 256-bit 512MB GDDR 3 SDRAM 1ns (Samsung K4J52324QC-BJ10) 1000MHz
http://www.samsung.com/Products/Semiconductor/GraphicsMemory/index.htm
^eMpTy^
04-Dec-2005, 04:34
If true, how would a 750Mhz 24 pipe G71 stack up to the rumored specs of R580?
Not well. I think the G71 will be a 32 pipe part though.
Who says there's 24 of anything on that chip? I'm entirely unconvinced they're just using 90nm to get clocks for this refresh. They're using die space too, for yet more processing units, if you ask me.
^eMpTy^
04-Dec-2005, 05:06
Not well. I think the G71 will be a 32 pipe part though.
The impression I've gotten is that R580 will have 48 shading pipes...or shader processors? in any case, is this expected to translate directly to performance? or is it expected to be bottlenecked by something?
16 shader processors, each with a triplet of ALUs. 48 ALUs total. And that's just the beginning. Good thread here (http://www.beyond3d.com/forum/showthread.php?t=25802) about it (among a trillion others).
It'll be bottlenecked by software, if anything. We'll see soon enough.
The impression I've gotten is that R580 will have 48 shading pipes...or shader processors? in any case, is this expected to translate directly to performance? or is it expected to be bottlenecked by something?
Most likely by the CPU?
Dave Baumann
04-Dec-2005, 05:15
16 shader processors, each with a triplet of ALUs. 48 ALUs total.
48 or 96?
It'll be bottlenecked by software, if anything. We'll see soon enough.
Well, on current software something of that configuration would likely be bottlenecked by its texture capabilities.
I like to think of the 'sub' ALU pairings as the single unit to count. So 48 ALUs, but 96 instructions per cycle (only 48 MADs, though). As you well know :lol: 24 ALUs, 48 instructions (all MADs theoretically) for G70. That's not counting free FP16 normalise, or input modifiers or scalars, to keep it simple.
And bottlenecked because of software, not by. Bad language on my part.
Dave Baumann
04-Dec-2005, 05:33
Ahhh, but would that not be 192 instructions per cycle, peak... or even 256 if you want to add in branching and texture instructions...!
^eMpTy^
04-Dec-2005, 05:51
So I must confess that I've kinda lost track of how to describe the capabilities of a GPU now that the reign of fillrate is over...can anyone recommend a good place to read up on such things?
Who says there's 24 of anything on that chip? I'm entirely unconvinced they're just using 90nm to get clocks for this refresh. They're using die space too, for yet more processing units, if you ask me.
O RLY? - 1
:lol:
So I must confess that I've kinda lost track of how to describe the capabilities of a GPU now that the reign of fillrate is over...can anyone recommend a good place to read up on such things?
Nope (http://www.beyond3d.com/), (http://www.hexus.net/content/item_list.php?cat=1-4) not that I know of.
^eMpTy^
04-Dec-2005, 06:02
Nope (http://www.beyond3d.com/), (http://www.hexus.net/content/item_list.php?cat=1-4) not that I know of.
...that's what I get for being vague...
To be more specific: I know what an ALU is, I have an idea what a fragment pipeline is, I don't know what a MAD is, nor do I understand how all these things work together or how they relate to performance in modern games...
Know any articles that might piece that together for me? And now that ROPs are no longer 1:1 with fragment pipes, what should I call the rate at which fragment pipes make whatever it is that they make? shader operations/sec?
It's still fragments per cycle of output. It's just that texel rate and pixel fill aren't going to have the same number these days.
It's good to be talking about a shader rate (instructions per something, math ops per something, tex ops per something, etc), if nothing else, IMO.
As for MAD (and friends, like mul, add, sin/cos, texldd, etc), they're all shader instructions to be executed by the shader processors and any ALUs therein. So game performance is now, in a general sense, a product of how many of a given mix of those instructions you can process in a given time period, or at once, be they math and/or texture ops in the shader code. That's your shader rate (further defined by the branching performance these days, too), and then pixel output rate at the end.
^eMpTy^
04-Dec-2005, 06:25
It's still fragments per cycle of output. It's just that texel rate and pixel fill aren't going to have the same number these days.
It's good to be talking about a shader rate (instructions per something, math ops per something, tex ops per something, etc), if nothing else, IMO.
As for MAD (and friends, like mul, add, sin/cos, texldd, etc), they're all shader instructions to be executed by the shader processors and any ALUs therein. So game performance is now, in a general sense, a product of how many of a given mix of those instructions you can process in a given time period, or at once, be they math and/or texture ops in the shader code. That's your shader rate (further defined by the branching performance these days, too), and then pixel output rate at the end.
that actually makes a tremendous amount of sense...
so then a "shader processor" contains ALUs, the type and number of which determine the amount of shader operations of various types that can be performed per clock? and then texture units are a separate entity? with ROPs at the end actually pushing out the pixels? right?
where do the terms "vertex pipeline" and "fragment pipeline" fit into this? is a vertex pipeline just a vector ALU? and is a fragment pipeline just a shader processor?
(thanks for letting me pick your brain)
Ailuros
04-Dec-2005, 06:55
Ahhh, but would that not be 192 instructions per cycle, peak... or even 256 if you want to add in branching and texture instructions...!
Yes and it tells me that the gap between future competing chips is shrinking quite a lot in that department and it comes down to clock frequency difference more than anything else.
That of course if there aren't any significant internal architectural changes for either/or which are quite unlikely with refreshes anyway.
-----------------------------------------------------------------
As for the hypothetical 750MHz it would make more sense IMO if we'd be talking about the same amount of units as on G70. For a followup with more quads/units such a frequency smells like extremely bad yields IMO.
Ailuros
04-Dec-2005, 06:59
that actually makes a tremendous amount of sense...
so then a "shader processor" contains ALUs, the type and number of which determine the amount of shader operations of various types that can be performed per clock? and then texture units are a separate entity? with ROPs at the end actually pushing out the pixels? right?
where do the terms "vertex pipeline" and "fragment pipeline" fit into this? is a vertex pipeline just a vector ALU? and is a fragment pipeline just a shader processor?
(thanks for letting me pick your brain)
Have you tried reading the G70 and R520 architectural analyses in the corresponding reviews here at B3D? Granted, my head boils too with every reading attempt, but they do help with understanding at least a few aspects.
***edit: a sterile calculation concentrating exclusively on MADDs (multiply + add) could be:
R580 = 48 * 4 MADDs = 192 MADDs/clock
G70 = 24 * 8 MADDs = 192 MADDs/clock
but now you have only half the data, since texture ops and clock frequencies haven't been taken into account. For the first I'd use anything above 625MHz, for the second either 430 or 550MHz. For the G7x follow-up you could speculate either 192 * 0.75GHz or 256 * 0.55GHz or something along those lines, but the texture op math is still missing.
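For anyone who wants to plug clocks into that sterile calculation, here is a rough Python sketch. It only multiplies units by MADDs per unit by clock and ignores texture ops entirely, just like the figures above. The 625MHz R580 clock is only the lower bound mentioned in the post, and reading "256 per clock" as 32 units of 8 MADDs each is an assumption made for illustration, not a known spec.

```python
# Peak component-MADD throughput: units * MADDs per unit per clock * clock.
# All configurations are the speculative ones from the thread, not confirmed specs.

def madds_per_clock(units, madds_per_unit):
    """Component MADDs issued per clock across the whole chip."""
    return units * madds_per_unit

def gmadds_per_sec(units, madds_per_unit, clock_mhz):
    """Peak component MADDs per second, in billions (GMADD/s)."""
    return madds_per_clock(units, madds_per_unit) * clock_mhz / 1000

configs = {
    "R580 guess, 48 x 4 @ 625MHz (lower bound)": (48, 4, 625),
    "G70 256MB, 24 x 8 @ 430MHz": (24, 8, 430),
    "G70 512MB, 24 x 8 @ 550MHz": (24, 8, 550),
    "G7x follow-up, 192/clock @ 750MHz": (24, 8, 750),
    # Assumption: 256/clock modelled as 32 units x 8 MADDs.
    "G7x follow-up, 256/clock @ 550MHz": (32, 8, 550),
}

for name, (units, per_unit, mhz) in configs.items():
    print(f"{name}: {gmadds_per_sec(units, per_unit, mhz):.1f} GMADD/s")
```

The two follow-up guesses land within a few percent of each other on this measure, which is why the missing texture op side matters for telling them apart.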
JoshMST
04-Dec-2005, 07:44
Wow, I guess if NV is able to get the top end huge chip to 750 MHz, I would be impressed. I was expecting more around 700 MHz. Still, they did surprise most of us with the 550 MHz 7800, so it definitely sounds like a distinct possibility.
I also agree with Rys, NVIDIA would be silly to not include new functional units with the die size savings of using the 90 nm process. 400 M transistors here we come!
Looking back, it's actually quite amazing what NV and ATI have done with the 130 nm process. NV30 was 120 M trans, NV35 was 125 M trans, NV40 was 225 M, and the 110 nm G70 was 302 M trans. Do you think we can expect a factor of 3 in transistor increases from the first gen of 90 nm to the final 80 nm part? Do both ATI and NV have the ability within the next three years to produce a 1 Billion transistor GPU?
Skrying
04-Dec-2005, 08:24
R300 on 150nm was amazing in both clock speed and transistor count. Nv30 and Nv35 not at all, in fact, they were terrible IMO for their process. Nv40 is pretty impressive though as far as transistor count, and R420 is impressive as far as clock speeds.
I believe that R580 and G71 will be close in transistor count, with about the same difference as between G70 and R520, though I'm not sure which one will have the higher count. I'm curious whether the ALU beefing is the only thing that is changing in R580; I doubt it.
Ailuros
04-Dec-2005, 08:47
Wow, I guess if NV is able to get the top end huge chip to 750 MHz, I would be impressed. I was expecting more around 700 MHz. Still, they did surprise most of us with the 550 MHz 7800, so it definitely sounds like a distinct possibility.
I also agree with Rys, NVIDIA would be silly to not include new functional units with the die size savings of using the 90 nm process. 400 M transistors here we come!
Uhmm at first attempt the 7800 was at 430MHz. At first attempt in 90nm I'd be personally very much surprised if I'd see anything over 600MHz (at least not with more quads).
Nice - so those guys publish the exact same info, minus RSX info, as I did in a thread I posted about a day after the conference, and THIS makes a new thread? ;) Only difference of course being some more senseless speculation in this one.
Ailuros: I wouldn't really say the first G70 attempt was at 430MHz. There are two factors in that we need to remember: first of all, NVIDIA decided to take a budget process, with the budget production module basically, not the high-performance one. And secondly, they clocked it quite conservatively since there was no competition and they wanted as high margins as they could get (cf. the various OC boards). For the 7800GTX512MB, both factors were no longer true (of course, the process matured slightly too, but that's unlikely to make a big difference).
Anyhow, since this thread is on a nice track to become either a "NV30's clocks were a voltage hack" or a "What's an ALU?!" one, I guess I might as well contribute something to it... The following are my notes regarding margins from a smaller conference that occurred around the 20th of November iirc:
"Revenue per Wafer"
Fixed cost per wafer.
So more dies per wafer.
-> RECONFIGURABILITY
Capability to reconfigure chips to sell at a different pricepoint.
Most of the margin improvements come from this, thus the GFFX->GF6 transition.
Margins for FX were in the low-30s on average.
Now, margins for GF6 & GF7 are above 40%
Next 5% coming from 2 factors: 1) Sony licensing business. 2) Opportunities in our operational processing
Team we formed GMI: Gross Margin Initiative Team. Cross-functional managers coming from all departments.
Only one objective: reducing waste. Coming up with ideas every 2 weeks.
Revenues came back when we became much more competitive [with GF6]
This also implies that they expect to be increasing margins from 40 to 45% not really through their desktop/notebook GPU business, but more in their secondary divisions, where margins aren't quite as high yet. And of course, the PS3 is 100% margins for them...
Also, what's interesting IMO is that they expect the PS3 to increase margins by 2.5% when at 40%, which makes it possible to estimate the revenues they expect from it, too. Current revenues are $583M, and current gross profit is $228M.
So, to assume 40% margins in that timeframe, let us say that they'd have $600M revenue and $240M gross profit.
Then, to get around 42.5% margins, they'd need $625M revenue and $265M gross profit, i.e. an extra $25M per quarter at 100% margin. That implies they expect about $100M revenue (profit) per year from the PS3, at least for the first year or so, which would correspond to about 20M consoles if the analyst estimate of $5/console licensing fees for NVIDIA is accurate. Feels doable to me (remember... produced, not sold).
Uttar
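The margin estimate above can be re-run as a small Python sketch. It assumes the $600M/$240M and $625M/$265M figures are quarterly (which is what the step from $25M to $100M per year implies), that licensing revenue is pure profit, and it takes the $5/console analyst estimate at face value. The variable names are just illustrative.

```python
# Back-of-the-envelope check of the PS3 licensing estimate from the post above.
# Assumption: all dollar figures are per quarter, in millions of USD.

def gross_margin(revenue, gross_profit):
    """Gross margin as a fraction of revenue."""
    return gross_profit / revenue

base_revenue, base_profit = 600, 240        # ~40% margin baseline
ps3_revenue, ps3_profit = 625, 265          # baseline plus PS3 licensing

extra_per_quarter = ps3_profit - base_profit     # licensing at 100% margin
extra_per_year = extra_per_quarter * 4
fee_per_console = 5                              # USD, analyst estimate quoted in the post
consoles_millions = extra_per_year / fee_per_console

print(f"baseline margin: {gross_margin(base_revenue, base_profit):.1%}")
print(f"with PS3:        {gross_margin(ps3_revenue, ps3_profit):.1%}")
print(f"licensing: ${extra_per_year}M/year -> {consoles_millions:.0f}M consoles/year")
```

Note that $265M on $625M works out to 42.4%, so "around 42.5%" holds only approximately.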
wireframe
04-Dec-2005, 10:15
I also agree with Rys, NVIDIA would be silly to not include new functional units with the die size savings of using the 90 nm process. 400 M transistors here we come!
At 750 MHz, would they need to? Sure, some tuning and fixes here and there, but this is a larger leap than from the GTX to 512MB GTX. I'm not saying I think 750 MHz sounds right. I think it is a wee bit optimistic. Besides, what's G80 supposed to be if this thing only exists for two quarters?
Also, for those using this thread to further understand the processing ability of certain chips, we've only talked about fragment shader rates so far, and nothing really of the VS units. Got a bunch of folks asking me about that offline.
Kombatant
04-Dec-2005, 12:47
Got a bunch of folks asking me about that offline.
Offline? What is that? :???:
Sorry, had to :lol: And so as not to have my dear mods complaining that I make off-topic posts, allow me to say that I believe that nV will not do a simple 90nm transition; they will enhance the chip, apart from merely doing a frequency increase.
Dave Baumann
04-Dec-2005, 14:00
***edit: a sterile calculation concentrating exclusively on MADDs (multiply + add) could be:
48 * 4 MADDs = 192 MADDs/clock
G70 = 24 * 8 MADDs = 192 MADDs/clock
Is that not component ops, rather than instruction?
Ailuros
04-Dec-2005, 14:18
Ailuros: I wouldn't really say the first G70 attempt was at 430MHz. There are two factors in that we need to remember: first of all, NVIDIA decided to take a budget process, with the budget production module basically, not the high-performance one. And secondly, they clocked it quite conservatively since there was no competition and they wanted as high margins as they could get (cf. the various OC boards). For the 7800GTX512MB, both factors were no longer true (of course, the process matured slightly too, but that's unlikely to make a big difference).
That's too oversimplified for my taste.
a) 7800GTX 256 shipped in mid summer this year at 430/600MHz as a single slot design and with wide availability.
b) For any higher theoretical fillrate you also need a tad more raw bandwidth or not?
c) I wouldn't call the 512 GTX availability "wide" even today. I can see a dual slot design that shipped months past the 256 GTX.
d) While there are varying clock domains on the 256 GTX, it's my understanding that the 512 is clocked throughout at 550MHz.
I can clock mine without a hiccup at 490/685MHz, but that's mostly because the odds are favourable in my case, more specifically a very low case temperature.
Is that not component ops, rather than instruction?
I called it a sterile calculation didn't I?
ToxicTaZ
04-Dec-2005, 19:03
I still think it's just a 200MHz speed upgrade on core/RAM, and 90nm could do this for less money and less power, making a good spring refresh for Nvidia!
Nvidia GeForce 7900 Ultra @750MHz / 1.9GHz 256-bit 512MB GDDR 3 SDRAM 1ns (Samsung K4J52324QC-BJ10) 1000MHz. $749 US
Yes, this is a 90nm GPU chip, meaning there is room for more pipes? They could leave this at the same 24 pipes or add 4 - 8 more pipes with this change! So this new chip could be 28-pipe or 32-pipe? (I still think the only real 32-pipe chip from Nvidia's GPU code is the G80!)
Now it's a spring GPU war with G71 vs R580?
b) For any higher theoretical fillrate you also need a tad more raw bandwidth or not?
Maybe NVidia will be quadrupling the cache sizes or something to use up all that die space?
Jawed
Ailuros
04-Dec-2005, 20:21
Maybe NVidia will be quadrupling the cache sizes or something to use up all that die space?
Jawed
That's not what I meant. 10.3 GTexels/s with 38.4 GB/s on 256 GTX and 13.3 GTexels/s with 48 GB/s bandwidth on 512 GTX. I was under the impression that mid summer this year anything higher than 600MHz GDDR3 had quite rare availability because of the ram orders for the coming consoles.
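For reference, those fillrate and bandwidth figures follow from simple products, sketched here in Python. The sketch assumes 24 texture units on G70, a 256-bit bus, and double-data-rate GDDR3 (two transfers per memory clock); the helper names are made up for illustration. Rather than assert a memory clock for the 512 GTX, it works backwards from the 48 GB/s quoted in the post.

```python
# Theoretical texel fillrate and memory bandwidth from unit counts and clocks.
# Assumptions: 24 TMUs, 256-bit bus, GDDR3 at 2 transfers per clock.

def texel_rate_gtexels(tmus, core_mhz):
    """Peak bilinear texel rate in GTexels/s."""
    return tmus * core_mhz / 1000

def bandwidth_gbs(bus_bits, mem_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s."""
    return (bus_bits / 8) * mem_mhz * transfers_per_clock / 1000

def implied_mem_mhz(bus_bits, gbs, transfers_per_clock=2):
    """Memory clock (MHz) that a given bandwidth figure would imply."""
    return gbs * 1000 / ((bus_bits / 8) * transfers_per_clock)

print(f"256 GTX: {texel_rate_gtexels(24, 430):.2f} GTexels/s, "
      f"{bandwidth_gbs(256, 600):.1f} GB/s")
print(f"512 GTX: {texel_rate_gtexels(24, 550):.2f} GTexels/s")
print(f"48 GB/s on a 256-bit bus implies {implied_mem_mhz(256, 48):.0f}MHz memory")
```

With these assumptions the 550MHz part comes out at 13.2 GTexels/s, a hair under the 13.3 quoted, and 48 GB/s corresponds to a 750MHz memory clock.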
(I still think the only real 32-pipe chip from Nvidia's GPU code is the G80!)
I'm not so sure with the recent indications piling up, that we can talk actually about 32pipes or 8 quads for their D3D10 GPU.
That's not what I meant. 10.3 GTexels/s with 38.4 GB/s on 256 GTX and 13.3 GTexels/s with 48 GB/s bandwidth on 512 GTX.
Well, I was agreeing, with barely more bandwidth, this part is gonna need some magic to not feel a serious pinch.
But another way of looking at it is, with these new games (that are coming real soon now) with their increased ALU operations over TEX operations, the lack of extra bandwidth won't matter too much. All that raw shader power will feast on the heavier shaders.
Jawed
Hmm, I meant to post this late last night, but my conxn gave out on me. Maybe empty will still find this helpful. (Also, pc++. :razz:)
where do the terms "vertex pipeline" and "fragment pipeline" fit into this? is a vertex pipeline just a vector ALU? and is a fragment pipeline just a shader processor?
I'll take a stab at looking the fool (and possibly learning something in the process). If there's too much wrong with my explanation, just tell me to delete it and read more before misleading people. And go easy on the expletives. ;)
As I understand things, the vertices get processed (solely transformed?*) first, then you move into processing scene fragments (color, texture, etc.), which are finally outputted as screen pixels. It's a rendering pipeline from CPU (application) to vertex shader (geometry) to fragment shader (rasterizer) to display. Within these rendering phases, operations are further pipelined. The further pipelining is to avoid stalling an entire step (eg, vertex shading) on a trivial task. Apparently GPUs have hundreds of pipeline stages, compared to something like the P4 Netburst's 20 or so pipeline stages (which is high compared to P3/P-M/A64/etc.).
---
* Have modern games moved away from per-vertex to per-pixel lighting, thus reducing some of the vertex pipeline's typical load?
---
Yeah, ATM vertex and fragment shaders use specialized ALUs. Vertex shader ALUs are four FP32 components wide (matching a four-component vector in a typical 4x4 matrix), while fragment shader ALUs are three FP24/32 components wide (RGB) plus scalar (A? Z?) wide. (See those nice illustrations from the NV40 and R300 PR material in Dave's reviews.)
Unified shaders will, as I understand it, move to unified, or common, ALUs (now four vectors wide all around, and I dunno if that includes the typical fragment shader scalar op). Unified shader pipes will also give "vertex shaders" access to a TMU, which I believe has been confined to fragment shaders (up til SM3/VS3, anyway). This addition apparently leads to creating/destroying geometry and all sorts of stupid polygon tricks that were previously limited to the CPU ("software"). See Dave's Xenos article for more on this unified fad.
So, my understanding is that the vertex stage is basically a feature-limited (no TMU or extra hardware bits like single-cycle sincos or FP16 normalization) but higher-precision (FP32 vs., up til recently, FX8/12/16 and FP16/24) and wider (4 vs. 3 vectors) version of the fragment stage. Now both VS and FS have FP32 fragment precision in common, and I believe the vertex fetch functionality in VS3 brings them still closer together in broad featureset.
As for whether the fragment pipe is just a vector ALU (don't forget Mr. Scalar!), I imagine the pics in Dave's R520 review should clear that up. I'm thinking of the one he modified to illustrate the difference b/w an R520 fragment "pipe" (ROPs, the end of the rasterizer pipeline, are now separated from shaders et al) and RV530/580 tripling of "shader power."
In short, it's all math, and it's all crazy.
Now I see why AEG approached me. I fit Rys' profile of a pundit without a podium perfectly, only probably dumber than Rys envisions and about as dumb--er, malleable as AEG hopes. Bah, I should be typing this on a free G70. ;-P
caboosemoose
05-Dec-2005, 02:28
Well, I'll be happy to be proven wrong, but using the X1600 as a reference point for R580, I would expect it to be fairly unspectacular for most currently available games. After all, the X1600 is fairly shite. A 750MHz, 24 fragment G71 chip would be very competitive, IMHO. By the time games that really leverage the pixel-shader heavy design of R580 are the norm, that chip will be history. That's not to say I have any idea whether G71 will add transistors/extra fragments etc (you'd think it would, perhaps 32 fragments and 16 ROPs, but who knows, I don't).
Anyway, NVIDIA's brute force approach is no doubt relatively crude, but I fancy it's slightly more effective in the current software environment. The same thing goes for ATI's supposed advantage in shader branching. Until both ATI and NV offer chips with decent branching, it's probably unlikely that developers will code for it.
Anyway, as Rys says, we shall be finding out about one half of this equation very, very soon. Well, some of us will. :P
"Very, very soon"? Not just "soon", or "very soon"? What's that mean --you haven't opened the box on the kitchen counter yet? :lol:
caboosemoose
05-Dec-2005, 02:39
My "very, very soon" may or may not be analogous to your "very, very soon". I hope that (doesn't) help(s).
Mintmaster
05-Dec-2005, 02:39
I like to think of the 'sub' ALU pairings as the single unit to count. So 48 ALUs, but 96 instructions per cycle (only 48 MADs, though). As you well know :lol: 24 ALUs, 48 instructions (all MADs theoretically) for G70. That's not counting free FP16 normalise, or input modifiers or scalars, to keep it simple.
It's a little misleading to put it that way. For one, I don't think G70 can use both ALUs when a texture instruction is being issued. Saying G70 can execute 48 MADs per clock would suggest it shades 3 times as fast as R480 or R520 at equal clock speeds, when actually it's something like 2x and 1.5x respectively.
But I do agree that 32-pipes is likely. I also heard something about NVidia considering adding FP16 multisampling capability for HDR+AA, but I'm not sure. Maybe this will take up some die space.
My "very, very soon" may or may not be analogous to your "very, very soon". I hope that (doesn't) help(s).
Gee, and I thot "pipes" had gotten ambiguous. :cool:
Ailuros
05-Dec-2005, 04:55
Well, I'll be happy to be proven wrong, but using the X1600 as a reference point for R580, I would expect it to be fairly unspectacular for most currently available games. After all, the X1600 is fairly shite.
That's a bit harsh for my taste; when the infamous numbering riddles were finally decrypted shortly before release on these forums I suspected that the RV530 would end up rather underwhelming in older (shader-less) multi-texturing heavy games (take UT2k4 as an example). I'd expect the mainstream part to follow the R580 to be at least twice as capable as RV530.
Big question mark would then be what the part from the competition would look like against the latter.
A 750MHz, 24 fragment G71 chip would be very competitive, IMHO. By the time games that really leverage the pixel-shader heavy design of R580 are the norm, that chip will be history. That's not to say I have any idea whether G71 will add transistors/extra fragments etc (you'd think it would, perhaps 32 fragments and 16 ROPs, but who knows, I don't).
No doubt it would, at least theoretically. Trouble is, that doesn't fit the policy used for the entire NV4x/G7x line so far. Even more quads would give them the luxury of clocking such a board at way lower frequencies; if needed, it's then always easy to handpick specific chips after a given time and clock them higher for the ultra high end. Pumping up to 750MHz doesn't suggest, in my mind, much headroom for even higher frequencies.
Layman's speculative math tells me (and yes, I might be wrong) that you can get more or less the same results with 8quads@560MHz as with 6quads@750MHz (presupposing the quads are theoretically identical).
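That layman's math can be written down as a tiny Python sketch. It assumes a quad is four fragment pipes and that every pipe does identical work per clock, which is exactly the presupposition stated above and nothing more; `pipe_clocks` is an illustrative name.

```python
# Compare "more quads at a lower clock" with "fewer quads at a higher clock".
# Assumption: 4 pipes per quad, identical per-pipe throughput per clock.

PIPES_PER_QUAD = 4

def pipe_clocks(quads, clock_mhz):
    """Pipe-cycles available per microsecond (pipes * MHz)."""
    return quads * PIPES_PER_QUAD * clock_mhz

wide = pipe_clocks(8, 560)    # 8 quads @ 560MHz
fast = pipe_clocks(6, 750)    # 6 quads @ 750MHz

print(f"8 quads @ 560MHz: {wide} pipe-MHz")
print(f"6 quads @ 750MHz: {fast} pipe-MHz")
print(f"ratio wide/fast:  {wide / fast:.3f}")
```

On this measure the two configurations are within half a percent of each other; it says nothing about bandwidth, ROPs, or yields.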
So I must confess that I've kinda lost track of how to describe the capabilities of a GPU now that the reign of fillrate is over...can anyone recommend a good place to read up on such things?
Ahem, www.beyond3d.com? :razz:
EDIT:
Bah, Rys beat me to it...
Sunrise
05-Dec-2005, 09:20
No doubt it would, at least theoretically. Trouble is, that doesn't fit the policy used for the entire NV4x/G7x line so far. Even more quads would give them the luxury of clocking such a board at way lower frequencies; if needed, it's then always easy to handpick specific chips after a given time and clock them higher for the ultra high end. Pumping up to 750MHz doesn't suggest, in my mind, much headroom for even higher frequencies.
Layman's speculative math tells me (and yes, I might be wrong) that you can get more or less the same results with 8quads@560MHz as with 6quads@750MHz (presupposing the quads are theoretically identical).
Sure you could, but that's too oversimplified for my taste. ;)
While basically you are on the right track in pointing out what NV did "right" with NV4X/G70, you have to adapt that "knowledge" to the new possibilities with 90nm (FSG vs. Low-K). With that in mind, they could also have both higher complexity AND higher clockspeed, relative to GPUs built on 110nm. Margin-wise it should be pretty negligible which route they take, since they already have a fair understanding of how high they can clock their parts without being too aggressive and butchering yields. That's not saying 750MHz looks easily accomplished, but at the current state we can only draw conclusions with RSX in mind, where they have to meet very strict guidelines for power consumption and power dissipation. G71 won't have those limits, so basically they could clock it through the roof (that's speculative, since we don't know what headroom NV really has with their current designs), but in a relative sense, nonetheless.
While it was a stroke of genius to take their already well fitted NV4X (high complexity, scalability, average clockspeeds) one step further and build it on 110nm (even more complexity, same scalability, average-high clockspeeds), you need to accept that with 90nm you can have an entirely different beast of a GPU.
Second, G7x will basically be EOL after that, therefore they can squeeze everything out of that architecture until G80 takes over, which will then be like the transition from Geforce 2 Ultra to Geforce 3 (historically speaking).
It's a little misleading to put it that way. For one, I don't think G70 can use both ALUs when a texture instruction is being issued. Saying G70 can execute 48 MADs per clock would suggest it shades 3 times as fast as R480 or R520 at equal clock speeds, when actually it's something like 2x and 1.5x respectively.
But I do agree that 32-pipes is likely. I also heard something about NVidia considering adding FP16 multisampling capability for HDR+AA, but I'm not sure. Maybe this will take up some die space.
True, since the texture sampler isn't available to both sub units, nor decoupled entirely. So waiting for sampler results precludes the sub unit doing anything.
And yeah, I'd be surprised if the ROPs aren't a bit more capable this time around, with NVIDIA using some of the newly available die space for that as well.
Mintmaster
05-Dec-2005, 13:28
Well, I'll be happy to be proven wrong, but using the X1600 as a reference point for R580, I would expect it to be fairly unspectacular for most currently available games. After all, the X1600 is fairly shite. A 750MHz, 24 fragment G71 chip would be very competitive, IMHO.
I think you're missing something very important.
True, the X1600XT can only catch the 6800GS in newer games like COD2 and FEAR, even though the 6800GS is clocked a bit lower, and the X1600XT often loses quite badly.
But while G70 is twice as fast as the 6800GS, R580 is four times as fast as the X1600XT (per clock, neglecting bandwidth in both cases). A 32-pipe 90nm G71 will be an interesting contender against R580 (not sure who'd win, but I'm thinking ATI for newer titles); however, a 24-pipe G71 won't, IMO. The latter's clock speed may make it a little over 3 times the speed of the 6800GS, but I don't think that'll be enough. Unless, of course, there are radical pipeline changes.
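The ratios in this argument can be sketched in Python under the same simplifications the post makes: bandwidth is neglected and every shader unit is treated as equal within a vendor's family. The unit counts (12 for the 6800GS and X1600XT, 24 for G70/G71, 48 for R580) are what the 2x and 4x per-clock claims imply; the 425MHz 6800GS clock is an assumption added here for illustration and does not come from the thread.

```python
# Relative shading throughput, ignoring bandwidth (as the post does).
# Assumptions: unit counts inferred from the 2x / 4x claims; 6800GS at 425MHz.

def relative_speed(units_a, mhz_a, units_b, mhz_b):
    """Throughput of part A relative to part B, equal work per unit per clock."""
    return (units_a * mhz_a) / (units_b * mhz_b)

# Per-clock comparisons: use the same clock on both sides.
g70_vs_gs_per_clock = relative_speed(24, 1, 12, 1)
r580_vs_x1600_per_clock = relative_speed(48, 1, 12, 1)

# Hypothetical 24-pipe G71 at 750MHz against a 6800GS at an assumed 425MHz.
g71_vs_gs = relative_speed(24, 750, 12, 425)

print(f"G70 vs 6800GS, per clock:   {g70_vs_gs_per_clock:.1f}x")
print(f"R580 vs X1600XT, per clock: {r580_vs_x1600_per_clock:.1f}x")
print(f"24-pipe G71 @ 750MHz vs 6800GS: {g71_vs_gs:.2f}x")
```

With these inputs the 24-pipe, 750MHz case comes out a bit over 3x the 6800GS, in line with the estimate in the post, against the 4x per-clock step from X1600XT to R580.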
Another possibility is that G71 will have yet another shader unit in each pipeline instead of adding 8 more pipes, but it'll have to scale much better than the additional shader unit currently does.
True, the X1600XT can only catch the 6800GS in newer games like COD2 and FEAR, even though the 6800GS is clocked a bit lower, and the X1600XT often loses quite badly.
Don't forget, that GS has much higher memory bandwidth (~150% of X1600XT).
Mintmaster
05-Dec-2005, 14:38
Don't forget, that GS has much higher memory bandwidth (~150% of X1600XT).
Don't worry, I didn't. I'm ignoring BW right now, since I figure ATI and NVidia are close enough for memory efficiency, and both IHVs will have access to the same memory.
It's when BW doesn't matter that things will get more interesting.
16 shader processors, each with a triplet of ALUs. 48 ALUs total. And that's just the beginning. Good thread here (http://www.beyond3d.com/forum/showthread.php?t=25802) about it (among a trillion others).
It'll be bottlenecked by software, if anything. We'll see soon enough.
Last time I checked, the pixel shader units have more than 1 ALU each (though only 1 fully capable ALU), so it would have 3 PS units on each shader processor, not 3 ALUs.
trinibwoy
05-Dec-2005, 14:47
Wouldn't the texturing units in an additional quad for G71 be a waste of die space? Assuming minimum clock on a 90nm 8-quad G70 is ~ 550, that would put it around 18,000 MT/s. Outside of synthetic benchmarks, would that even provide a competitive advantage? What are the odds that the decoupled texture samplers make it into G71?
Another possibility is that G71 will have yet another shader unit in each pipeline instead of adding 8 more pipes, but it'll have to scale much better than the additional shader unit currently does.
G70 doesn't feature an extra shader unit per pipeline compared to NV40 (and derivatives), but simply enhanced shader units: http://www.beyond3d.com/previews/nvidia/g70/index.php?p=02.
Because of this it wouldn't be accurate to predict the performance of a hypothetical G71 with 3 shader units per pipe on the performance difference of a G70 and NV40 with equal number of pipelines and same clock speed.
dizietsma
05-Dec-2005, 14:53
I don't see the need for nvidia to increase the complexity and thus decrease the yield especially if they keep it like it is and hopefully get it much cheaper.
Therefore when the 48 pipeline cards come out they are not as expensive as currently. I don't see nvidia complaining to the motherboard manufacturers that having two 16x PCIe slots on the motherboard is a waste ;)
Come Spring it will be "The Power of 4 " no doubt.
AlStrong
05-Dec-2005, 15:10
That's not what I meant. 10.3 GTexels/s with 38.4 GB/s on 256 GTX and 13.3 GTexels/s with 48 GB/s bandwidth on 512 GTX. I was under the impression that mid summer this year anything higher than 600MHz GDDR3 had quite rare availability because of the RAM orders for the coming consoles.
But if you're talking about high end GPUs, which use 256-bit memory, isn't that a different part of the manufacturing entirely? The next wave of consoles are using 128-bit GDDR3.
Don't forget, that GS has much higher memory bandwidth (~150% of X1600XT).
It also has twice as many ROPs and TMUs.
3x
X1600XT is nothing more than a replacement for X700Pro. Which has twice as many ROPs and TMUs. The X1600XT is comfortably faster.
Jawed
mrcorbo
05-Dec-2005, 17:44
Ok, so what if NVidia do make some tweaks to G71 and:
1. Re-enable angle-independent AF.
2. Enable MSAA+HDR
3. Tweak the memory access (more cache, etc.)
4. But keep it at 24 Pipes @ 750MHz?
Given that 1+2 (Even though the usefulness of #2 is somewhat questionable ATM due to performance issues) are feathers in ATI's cap and NVidia seem keen of late to pluck all of ATI's feathers, I think that is not extremely unlikely.
IMO, that configuration vs. R580 would result in a mixed bag with G71 winning some benchmarks and R580 others. Unless there are several titles coming in the next few months that favor R580 architecture, though, I think on balance the G71 comes out ahead at launch. That picture could very well change during the period leading up to R600/G80, however.
We are in interesting times, no doubt about it.
Chalnoth
05-Dec-2005, 17:45
Quite interesting if true. But at this point, we can't really expect any clock speeds to remain static.
That said, 750MHz really sounds like it'd be a 24-pipeline part, little more than a shrink of the current G70. This would seem to give a similar performance boost over the GeForce 7800 512MB to that which we are expecting from the R580 over the R520 XT, though.
But if nVidia is planning on producing a 32-pipeline, 750MHz part, well, the performance crown is in the bag.
Me, though, I'm eagerly anticipating SM4.
Dave Baumann
05-Dec-2005, 17:55
Chalnoth - do you believe that 32 pipes and 750MHz is likely?
Ok, so what if NVidia do make some tweaks to G71 and:
1. Re-enable angle-independent AF.
2. Enable MSAA+HDR
3. Tweak the memory access (more cache, etc.)
4. But keep it at 24 Pipes @ 750MHz?
Given that 1+2 (Even though the usefulness of #2 is somewhat questionable ATM due to performance issues) are feathers in ATI's cap and NVidia seem keen of late to pluck all of ATI's feathers, I think that is not extremely unlikely.
IMO, that configuration vs. R580 would result in a mixed bag with G71 winning some benchmarks and R580 others. Unless there are several titles coming in the next few months that favor R580 architecture, though, I think on balance the G71 comes out ahead at launch. That picture could very well change during the period leading up to R600/G80, however.
We are in interesting times, no doubt about it.
Wouldn't the described chip, with 32 pixel pipelines and the mentioned tweaks, be extremely huge at 90nm? I think 450 million transistors wouldn't be sufficient.
Chalnoth - do you believe that 32 pipes and 750MHz is likely?
Post of the day :-)
Chalnoth
05-Dec-2005, 18:04
Chalnoth - do you believe that 32 pipes and 750MHz is likely?
I believe the second sentence of my post answers that question. Of course, I have no real information on this, but 32 pipes and 750MHz would seem to be just too hot.
Everything we think we know about NV over the last year would seem to argue more for the pipes than the clocks. So at the moment, my WAG is 32 pipes around 650. If they yank a 750MHz part out with 32 pipes I'm guessing it won't be at first.
Edit: Of course, I'm a guy who thot GTX512 would be clocked around 520! :lol:
JoshMST
05-Dec-2005, 18:28
Why not have both clockspeed and pipes? NVIDIA worked on NV40 for two + years before its release, they have then refreshed it with the G70 that has the max speed of 550 MHz on a non-performance process, so basically they have had the ability to work on the same basic design for almost another two years from when the 6800 Ultra was received back at the labs. Through good tools, design, and optimizations why do you think it is beyond NV's engineers to create a part that was not only big, but could run fast too? 90 nm Low-K is a high performance process, and from all indications it runs very well.
While I think 750 MHz with 32 pipes is a possibility, I think it more likely that we would see 700 MHz and less for such a part. Look at ATI and their R520. Most of the final chips can run perfectly fine at 700 MHz and it is well over 300 million transistors, and their first real run at what appears (at least to me) to be a fairly large change of architecture for ATI.
While everyone always laughs at the Intrinsity posts, we tend to forget the other design tools that these guys use as well. They have significantly improved these tools through the years to be able to handle the complex designs, which in turn means that faster chips come out of the design process. Both ATI and NVIDIA spend millions every year upgrading these tools, and the reason they spend this money is that every year these tools get better and better at what they do.
32 pipes at 600 seems more like it to me. 32 pipes is going to be in the region of 450m transistors - for NVidia's first high-end 90nm device I'd suggest it's easier for them to aim "fat and slow" rather than "skinny and nippy".
Skinny being a relative concept of course, 24 pipes isn't really skinny.
Fat and slow also provides more options to disable quads, so is a safer way to start the high-end, I guess. And the skinny and nippy alternative prolly wouldn't leave any headroom for higher clocks as the process is tweaked.
I think fat and slow is the approach NVidia has taken with NV40 and G70. Is that a fair comment?
Jawed
I think fat and slow is the approach NVidia has taken with NV40 and G70. Is that a fair comment?
Absolutely. Common sense says so, which I presume is what you used :wink:
I'll stick my flag in the sand right now with this one. Might as well, since I wasn't that far off with this (http://www.beyond3d.com/forum/showpost.php?p=433854&postcount=274). Bar Jeff Fu, 4D fragment hardware, and most of the texturing details anyway :lol:
G70's basic ALU setup per vp and fp, but: 10vp, 32fp, 16 ROPs (that can multisample FP16, maybe 6xAA or more), tweaked AF, 90nm, 625MHz core, 900MHz mem.
JoshMST
05-Dec-2005, 19:02
450 million transistors seems a little high to me, I guess I was expecting more around 380 M to 400 M for this first, big 90 nm product. But I guess we shall see!
ToxicTaZ
05-Dec-2005, 19:08
But if you're talking about high end GPUs, which use 256-bit memory, isn't that a different part of the manufacturing entirely? The next wave of consoles are using 128-bit GDDR3.
Sorry but I think that's wrong! The next wave of consoles are using 256-bit GDDR3! The Sony PS3 uses 256MB 256-bit GDDR3 VRAM @700MHz (22.4GB/s)
http://www.us.playstation.com/Pressreleases.aspx?id=279
Last time I checked, the pixel shader units have more than 1 ALU each (though only 1 fully capable ALU), so it would have 3 PS units on each shader processor, not 3 ALUs.
We went over that on the previous page, talking about sub units.
Absolutely. Common sense says so, which I presume is what you used :wink:
I'll stick my flag in the sand right now with this one. Might as well, since I wasn't that far off with this (http://www.beyond3d.com/forum/showpost.php?p=433854&postcount=274). Bar Jeff Fu, 4D fragment hardware, and most of the texturing details anyway :lol:
And you're still beating them over the head on Jeff Fu. :lol: Hell hath no fury like a pundit whose advice is not followed. :razz:
AlphaWolf
05-Dec-2005, 19:29
Sorry but I think that's wrong! The next wave of consoles are using 256-bit GDDR3! The Sony PS3 uses 256MB 256-bit GDDR3 VRAM @700MHz (22.4GB/s)
http://www.us.playstation.com/Pressreleases.aspx?id=279
Where do you get 256bit from that?
That's slightly more bandwidth than the x1600xt, which is 128bit@690mhz gddr3.
Where exactly is the 750MHz core speed figure explicitly revealed?
I assume that it wasn't mentioned during the Credit Suisse First Boston presentation that Uttar references here (http://www.beyond3d.com/forum/showthread.php?t=26172), as it's not exactly the kind of tidbit one omits from the summary!
Given nV's success with G70 on 110nm, I wouldn't want to rule out 750MHz at 90nm completely, which would be impressive at six quads, but much more so at eight quads. However, if this figure has yet to be attributed to one of the usual channels, is it just a reasoned assumption and perhaps soon-to-be general consensus, or has a little birdy been cheeping to VR-Zone? :smile:
They could also do around 650-700 MHz with 24 pipes, each with two full ALUs.
32-pipeline, 750MHz part???
And exactly how are they going to keep all of those pipes full?
Absolutely. Common sense says so, which I presume is what you used :wink:
I'll stick my flag in the sand right now with this one. Might as well, since I wasn't that far off with this (http://www.beyond3d.com/forum/showpost.php?p=433854&postcount=274). Bar Jeff Fu, 4D fragment hardware, and most of the texturing details anyway :lol:
Now that's a pretty interesting speculative post. Actually worth reading and thinking about and comparing it to what we have now. (I feel a bit silly speculating merely about pipes and clocks...)
That thread and, I guess, the other R520 threads, would make for entertaining reading just to see how the discussion/theories evolved (highlights). And not to mention the pain caused by the delays. But the hugeness does sort of dampen my enthusiasm.
Jawed
wireframe
05-Dec-2005, 20:43
Where do you get 256bit from that?
That's slightly more bandwidth than the x1600xt, which is 128bit@690mhz gddr3.
People are confusing memory density (Mbit) with memory interface width (bits).
PS3 is using a 128-bit memory interface to the GDDR3 (1,400MHz effective * 128 bits / 8 bits per byte = 22.4 GB/sec). The memory density of those GDDR3 modules might be 256Mbit (32MB/module), meaning it would require 8 modules in order to total 256 MB.
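The arithmetic above separates two things: interface width sets bandwidth, chip density sets how many chips make up the capacity. A small Python sketch, assuming decimal gigabytes and the 1,400MHz effective data rate quoted above (function names are illustrative):

```python
def bandwidth_gb_s(effective_mhz, bus_width_bits):
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9


def modules_needed(total_mbytes, module_density_mbit):
    """Number of memory chips needed to reach a given capacity."""
    return (total_mbytes * 8) // module_density_mbit


# 128-bit interface at 1,400MHz effective (700MHz DDR)
print(round(bandwidth_gb_s(1400, 128), 1))  # 22.4

# 256 MB built from 256Mbit chips
print(modules_needed(256, 256))  # 8
```

A 256-bit interface at the same data rate would give 44.8 GB/s, which is why the 22.4 GB/s figure points to 128 bits.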
32-pipeline, 750MHz part???
And exactly how are they going to keep all of those pipes full?
The same way that 24 pipes are currently filled.
There are plenty of heavy-duty games out there, now. The blip of CPU-limited games we saw on the release of 7800GTX was just that, a blip.
I'm looking forwards now to a long future of the best games being predominantly GPU-limited. DX10 should free-up CPUs a good notch (admittedly a year+ away), and with DX7 GPUs practically on their last legs (with "old" new engines, D3, HL-2 and CoD2 playing rear-guard) the rush into eye-candy driven by the ~DX9+ consoles should make for an incredibly exciting time.
Hell, the doubling of bandwidth provided by GDDR4 in the next 6-9 months, all on its own, is gonna be just delicious :drool: <-insert B3D drool smiley
Jawed
wireframe
05-Dec-2005, 20:51
Hell, the doubling of bandwidth provided by GDDR4 in the next 6-9 months, all on its own, is gonna be just delicious
This has totally passed me by. Where can I read more about when this is being launched, potential first products to use it, and projected frequencies? I saw some little comment about this in relation to R580 or R590, I think, but that was it.
:drool: <-insert B3D drool smiley
That's a really cool smiley. How do you do that?
While I'm reminiscing, damn, how could I have forgotten the R3D April Fools:
http://www.beyond3d.com/forum/showpost.php?p=439397&postcount=543
http://www.beyond3d.com/forum/showpost.php?p=439482&postcount=554
Wonderful.
Jawed
DemoCoder
05-Dec-2005, 21:03
The only problem is "fat and slow" both have negative associations. "wide" and "shallow" might be a better way to phrase it.
AlphaWolf
05-Dec-2005, 21:10
The only problem is "fat and slow" both have negative associations. "wide" and "shallow" might be a better way to phrase it.
I don't think my girlfriend would like that either, but I will give it a shot.
This has totally passed me by. Where can I read more about when this is being launched, potential first products to use it, and projected frequencies? I saw some little comment about this in relation to R580 or R590, I think, but that was it.
http://www.techreport.com/onearticle.x/9108
Though I saw it first somewhere else, that's a nice summary.
:drool: <-insert B3D drool smiley That's a really cool smiley. How do you do that?
That's a smiley? :???: It aint, here :oops:
Jawed
mrcorbo
05-Dec-2005, 21:33
Wouldn't the described chip, with 32 pixel pipelines and the mentioned tweaks, be extremely huge at 90nm? I think 450 million transistors wouldn't be sufficient.
Quote:
Originally Posted by mrcorbo
Ok, so what if NVidia do make some tweaks to G71 and:
1. Re-enable angle-independent AF.
2. Enable MSAA+HDR
3. Tweak the memory access (more cache, etc.)
>>4. But keep it at 24 Pipes @ 750MHz?<<
My hunch is still faster/leaner....unless they wait for GDDR4. Otherwise, I think a 32-pipe part would be severely memory bandwidth limited. ATI already addressed this bottleneck with the ring-bus and Nvidia have yet to do so. This makes some form of tweak to enable higher efficiency memory access extremely likely IMO. Jawed said it as an off-hand comment, but given that CPU manufacturers have used extra die space enabled by process shrinks to add more cache, he may not have been far off. This is where I go out of my depth, though. What specific areas would benefit from added cache?
Finally LINDBERGH's GPU has been revealed.
Jawed said it as an off-hand comment, but given that CPU manufacturers have used extra die space enabled by process shrinks to add more cache, he may not have been far off. This is where I go out of my depth, though. What specific areas would benefit from added cache?
I think there's a rule of thumb in cache design that every doubling of cache size buys you 10% extra performance.
ATI re-architected the cache for R520 and made it bigger - I think 32K, perhaps from 16K. If there was a doubling, the re-architecting nullifies the "extra 10% performance" argument.
http://www.beyond3d.com/reviews/ati/r520/index.php?p=06
I don't think adding wodges more cache would be enough on its own for G71.
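The rule of thumb quoted above (roughly 10% per doubling of cache size) compounds per doubling. A rough Python sketch, treating the 10% figure purely as the poster's rule of thumb and not as a measured property of any GPU:

```python
import math


def cache_speedup(old_kb, new_kb, gain_per_doubling=0.10):
    """Estimated speedup factor if each doubling of cache size adds a fixed gain."""
    doublings = math.log2(new_kb / old_kb)
    return (1 + gain_per_doubling) ** doublings


# 16K -> 32K is one doubling, 16K -> 64K is two
print(round(cache_speedup(16, 32), 2))  # 1.1
print(round(cache_speedup(16, 64), 2))  # 1.21
```

Under that assumption even a fourfold cache buys only about 21%, which fits the point that more cache alone would not be enough.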
NVidia uses a 2-level cache architecture as opposed to ATI's 1-level, which makes comparisons of texture caching between ATI and NVidia really fiddly.
Jawed
The same way that 24 pipes are currently filled.
There are plenty of heavy-duty games out there, now. The blip of CPU-limited games we saw on the release of 7800GTX was just that, a blip.
I'm looking forwards now to a long future of the best games being predominantly GPU-limited. DX10 should free-up CPUs a good notch (admittedly a year+ away), and with DX7 GPUs practically on their last legs (with "old" new engines, D3, HL-2 and CoD2 playing rear-guard) the rush into eye-candy driven by the ~DX9+ consoles should make for an incredibly exciting time.
Hell, the doubling of bandwidth provided by GDDR4 in the next 6-9 months, all on its own, is gonna be just delicious :drool: <-insert B3D drool smiley
Jawed
I think you're putting way too much stock in both GDDR4 and CPU limited games. Right now we have games that make both the x1800xt/7800 512MB GTX crawl at the highest settings, which shows some of the impact of memory bandwidth. But I guess we have to wait and see...
wireframe
05-Dec-2005, 22:29
http://www.techreport.com/onearticle.x/9108
Though I saw it first somewhere else, that's a nice summary.
Thanks. Not much of a summary. More of a Hynix press release. It's incorrectly worded as well: "2.9Gbps" should read "2.9 GHz" (with 32 parallel bits per clock, or 4 bytes per clock) for a total of 11.6 GB/sec. With densities of 512Mbit I guess we'll see 8 chips on a board for 512 MB total. That should be a peak of 92.8 GB/sec on a 256-bit interface. Roughly double what the 512MB GTX and X1800XT can offer now.
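The per-chip and per-board figures above follow from the data rate and the chip width. A Python sketch, assuming the 2.9 figure is the data rate per pin and each chip has a 32-bit interface (names are illustrative):

```python
def chip_bandwidth_gb_s(gbit_per_pin, pins_per_chip=32):
    """Peak bandwidth of one memory chip in GB/s."""
    return gbit_per_pin * pins_per_chip / 8


def board_totals(gbit_per_pin, bus_width_bits, density_mbit, pins_per_chip=32):
    """Return (chip count, total capacity in MB, peak bandwidth in GB/s) for a board."""
    chips = bus_width_bits // pins_per_chip
    capacity_mb = chips * density_mbit // 8
    bandwidth = chips * chip_bandwidth_gb_s(gbit_per_pin, pins_per_chip)
    return chips, capacity_mb, bandwidth


print(round(chip_bandwidth_gb_s(2.9), 1))  # 11.6

chips, capacity_mb, bandwidth = board_totals(2.9, 256, 512)
print(chips, capacity_mb, round(bandwidth, 1))  # 8 512 92.8
```

Doubling the interface to 512 bits with the same chips would double both the chip count and the peak bandwidth.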
Thinking about it that way, unless someone finds a way to rapidly scale memory performance per module, it looks like a 512-bit interface is inevitably inbound or they (not they they, the other they :razz:) expect us to use SLI/CrossFire to alleviate the situation. Then again, we have been hovering around 1,000MHz@256-bit for what seems like forever and performance has still increased. Perhaps we'll just see more of that, but it would be nice with some real raw increase in bandwidth, something like when Radeon 9700 launched and everyone gasped.
Weird that Samsung hasn't announced anything like this. 512Mbit densities seem the way to go now.
That's a smiley? :???: It aint, here :oops:
Yeah, I'm just pulling your leg. heh. :wink:
wireframe
05-Dec-2005, 22:33
Finally LINDBERGH's GPU has been revealed.
LOL. You and your Lindbergh...
Chalnoth
05-Dec-2005, 22:36
ATI already addressed this bottleneck with the ring-bus and Nvidia have yet to do so.
No, I don't believe the ring bus has anything to do with this. Rather, I expect it exists for better utilization of resources when multithreading is used (as well as allowing for high clock speeds, though nVidia doesn't appear to be having any problem there). A better analysis of the ring bus would be to say that it's a fantastic architectural advancement for ATI's architecture, but may or may not be useful at all for nVidia.
Chalnoth
05-Dec-2005, 22:37
Right now we have games that make both the x1800xt/7800 512MB GTX crawl at the highest settings, which shows some of the impact of memory bandwidth. But I guess we have to wait and see...
Why do you think that this has anything to do with memory bandwidth?
Mintmaster
05-Dec-2005, 23:28
That said, 750MHz really sounds like it'd be a 24-pipeline part, little more than a shrink of the current G70. This would seem to give a similar performance boost over the GeForce 7800 512MB to that which we are expecting from the R580 over the R520 XT, though.
But if nVidia is planning on producing a 32-pipeline, 750MHz part, well, the performance crown is in the bag.
That's looking at it through rather green tinted glasses!
The RV530 is half the speed of R520 in newer games, and in those circumstances we're expecting R580 to double R520's performance, not get a measly 35% boost. For older games, who cares at >60fps for 1600x1200 if the increase is only 50% or less?
If they release a 24-pipe card, ATI will have performance in the bag. A 32-pipe card, however, will make it more interesting. I'm inclined to think NVidia will win overall, but ATI will still lead in newer games.
You're making it sound like a 24-pipe G71 @750 MHz is to R580 as the 512MB GTX is to R520.
Mintmaster
05-Dec-2005, 23:35
Ok, so what if NVidia do make some tweaks to G71 and:
1. Re-enable angle-independent AF.
2. Enable MSAA+HDR
3. Tweak the memory access (more cache, etc.)
4. But keep it at 24 Pipes @ 750MHz?
Given that 1+2 (Even though the usefulness of #2 is somewhat questionable ATM due to performance issues)
Where are you getting this from? I've heard others say this too.
ATI takes a ~20% hit for 4XAA when HDR is enabled. This is true for Far Cry as well, though the numbers without AA are abnormally low for ATI in this game. There's a thread about this somewhere on these forums.
This is lower than the hit the 7800GTX takes for 4xAA without HDR.
Chalnoth
05-Dec-2005, 23:37
The RV530 is half the speed of R520 in newer games, and in those circumstances we're expecting R580 to double R520's performance, not get a measly 35% boost. For older games, who cares at >60fps for 1600x1200 if the increase is only 50% or less?
Consider that if nVidia were to actually pull out a 750MHz part with 32 pipelines, we'd be talking about an 80% increase in pixel processing power. Since the GeForce 7800 GTX 512 currently pretty much always wins out against the R520 in performance, so should a hypothetical 32-pipeline, 750MHz G7x part win over the R580.
If they release a 24-pipe card, ATI will have performance in the bag. A 32-pipe card, however, will make it more interesting. I'm inclined to think NVidia will win overall, but ATI will still lead in newer games.
Not necessarily. We'd still be talking about a 38% increase in pixel processing power on nVidia's part, which is not far from what we should expect for the R580's performance boost over the R520. In this scenario, ATI may close the gap somewhat, and pull ahead in one or two games, but is unlikely to have an overall win.
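The percentages in this post can be approximated as pipes times clock. A Python sketch, assuming the baseline is a 24-pipe part at 550MHz (the 7800 GTX 512) and that throughput scales linearly with both, which ignores bandwidth and per-pipe efficiency:

```python
def relative_gain(pipes_new, mhz_new, pipes_old=24, mhz_old=550):
    """Fractional increase in peak pixel throughput, modelled as pipes times clock."""
    return (pipes_new * mhz_new) / (pipes_old * mhz_old) - 1


# Hypothetical 32-pipe, 750MHz part
print(f"{relative_gain(32, 750):.1%}")  # 81.8%

# Hypothetical 24-pipe, 750MHz part
print(f"{relative_gain(24, 750):.1%}")  # 36.4%
```

Both results land close to the rounded figures used in the thread; the disagreement between posters is about how R580 scales, not about this arithmetic.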
Mintmaster
05-Dec-2005, 23:41
G70 doesn't feature an extra shader unit per pipeline compared to NV40 (and derivatives), but simply enhanced shader units: http://www.beyond3d.com/previews/nvidia/g70/index.php?p=02.
Because of this it wouldn't be accurate to predict the performance of a hypothetical G71 with 3 shader units per pipe on the performance difference of a G70 and NV40 with equal number of pipelines and same clock speed.
I guess you're right, but I'm looking at how G70's shader pipes are not much faster than R520's, even though the former has two shader units in each.
But you're right in that I can't make any assumptions about the scaling.
ToxicTaZ
05-Dec-2005, 23:43
People are confusing memory density (Mbit) with memory interface width (bits).
PS3 is using a 128-bit memory interface to the GDDR3 (1,400MHz effective * 128 bits / 8 bits per byte = 22.4 GB/sec). The memory density of those GDDR3 modules might be 256Mbit (32MB/module), meaning it would require 8 modules in order to total 256 MB.
Yes, you're right! I misread that, sorry! 128-bit it is...
Is the RSX a slower G71 @550MHz? (PS3)
http://theinquirer.net/?article=27463
G71 @750MHz (PC) out the same time frame as the RSX? Say around April...
Is the RSX @550MHz 90nm ...24-pipe? or is it more? not much info on this!
Chalnoth
05-Dec-2005, 23:45
By the way, I'd like to state something else here.
I feel that an absolute best-case scenario for the G71 would be for it to remain 24 pipelines, but implement multithreaded execution. Now, this might not be quite as crazy as it sounds, as nVidia currently has a patent out there for multithreaded execution. So it may be that the G70 is actually capable of it, but it isn't of much use due to the particular implementation.
This is highly wishful thinking, but if nVidia were to focus on improving performance of multithreaded execution on the G71 (assuming that the G70 is capable of it), while increasing clock speeds on the move to 90nm, it would really be a monster card.
That is to say, if multithreaded execution were made more viable, nVidia would automatically gain a significant performance boost for anisotropic filtering. Also, to make multithreaded execution viable, larger caches are needed. This would automatically improve FSAA performance as well as texture cache efficiency.
But I believe it is highly unlikely that there will be any significant multithreaded execution in an nVidia part until their next-generation architecture.
I feel that an absolute best-case scenario for the G71 would be for it to remain 24 pipelines, but implement multithreaded execution. Now, this might not be quite as crazy as it sounds, as nVidia currently has a patent out there for multithreaded execution. So it may be that the G70 is actually capable of it, but it isn't of much use due to the particular implementation.
That's quite unlikely imho, multithreading is not something you can just drop on top of current G70 architecture, they need to completely decouple TMUs from fragment shading pipelines first.
I'm willing to bet that (http://v3.espacenet.com/textdoc?CY=ep&LG=en&F=4&IDX=SG112989&DB=EPODOC) patent is about their next gen architecture.
ToxicTaZ
06-Dec-2005, 00:32
Who has not read this yet? Even the Inquirer said it would be these speeds over a month ago!
"When it comes to G71 as a graphic chip, Nvidia will get that chip to insane speeds and we expect at least 650 to 700MHz for the cherry picked top of the range."
http://theinquirer.net/?article=27463
Mintmaster
06-Dec-2005, 00:33
We'd still be talking about a 38% increase in pixel processing power on nVidia's part, which is not far from what we should expect for the R580's performance boost over the R520.
I guess this is where we disagree.
I expect R580 to be clocked higher than R520. Even per clock cycle, I expect a 60% increase on average, reaching 100% in some games and >150% in some shaders (e.g. Rightmark3D's lighting tests, as shown by RV530).
38% is far from what we should expect from R580 over R520.
For another comparison, divide R580 and the 24-pipe G71 by four (and a bit). You basically get X1600XT vs. 6800 (plain). It's close in older games, but BF2, COD2, and FEAR are all substantial wins for the former.
If you take a 32-pipe G70 at a more sane clock of say 650MHz, then it'll be mixed, with NVidia taking the overall crown but still losing occasionally. Of course, they can always release an uber clocked version like they did this gen for an across the board sweep.
Anyway, time will tell.
Chalnoth
06-Dec-2005, 02:14
Yes, I definitely disagree with you there. Due to power consumption constraints, I highly doubt that they'll be able to clock a 400+ million transistor part any higher than the R520 for a retail part. It may well have higher-clocked memory, but I doubt it'll have a higher-clocked core. And I think that your performance estimates are also overly-optimistic. But we'll see.
Chalnoth
06-Dec-2005, 02:16
That's quite unlikely imho, multithreading is not something you can just drop on top of current G70 architecture, they need to completely decouple TMUs from fragment shading pipelines first.
I'm willing to bet that (http://v3.espacenet.com/textdoc?CY=ep&LG=en&F=4&IDX=SG112989&DB=EPODOC) patent is about their next gen architecture.
Oh, I agree it's very unlikely (you know, getting close to the range of the likelihood of porcine flight). But it wouldn't require decoupling of the TMU's from the fragment shading pipelines. It just wouldn't be as effective as a decoupled, multithreaded architecture.
Oh, I agree it's very unlikely (you know, getting close to the range of the likelihood of porcine flight). But it wouldn't require decoupling of the TMU's from the fragment shading pipelines. It just wouldn't be as effective as a decoupled, multithreaded architecture.
That's right. BTW Nvidia has a patent about decoupled TMUs too :D
REGISTER BASED QUEUING FOR TEXTURE REQUESTS (http://v3.espacenet.com/origdoc?DB=EPODOC&IDX=WO2005093665&F=0&QPN=WO2005093665)
Because each texture command originates from one of many independent execution units, each of which can have many independent threads, another embodiment of graphics processing unit 300 includes thread state information as texture command parameters in each texture command. Examples of thread state information include a thread type and a thread identification. The thread state information is used by the texture unit 305 to identify the texture referenced by the texture command and to determine the destination of the final texture value.
Though it would still be possible for NVIDIA to use their shader ALUs as address generators instead of having a specialized math unit in their TMUs.
ciao,
Marco
ninelven
06-Dec-2005, 02:38
I'm still not entirely convinced we will ever see a 7900. They would most likely get better yields than R580 with such a part, but they would (probably) lose most, if not all, shader bound benchmarks, which is where the focus is going forward.
Chalnoth
06-Dec-2005, 02:50
I'm still not entirely convinced we will ever see a 7900. They would most likely get better yields than R580 with such a part,
Depends upon what the specs are. A 32-pipeline, 750MHz part would most likely be rare indeed (or at least very hot and very expensive).
But while G70 is twice as fast as the 6800GS, R580 is four times as fast as the X1600XT (per clock, neglecting bandwidth in both cases).
I don't like the odds of R580 being clocked at similar levels to RV530. It sounds like a big chip.
Why not have both clockspeed and pipes? NVIDIA worked on NV40 for two + years before its release, they have then refreshed it with the G70 that has the max speed of 550 MHz on a non-performance process, so basically they have had the ability to work on the same basic design for almost another two years from when the 6800 Ultra was received back at the labs. Through good tools, design, and optimizations why do you think it is beyond NV's engineers to create a part that was not only big, but could run fast too? 90 nm Low-K is a high performance process, and from all indications it runs very well.
I would hope they were using the extra resources freed up by building G7x off NV4x to polish G8x. I can't see this G71 being anywhere near as important.
mrcorbo
06-Dec-2005, 03:06
No, I don't believe the ring bus has anything to do with this. Rather, I expect it exists for better utilization of resources when multithreading is used (as well as allowing for high clock speeds, though nVidia doesn't appear to be having any problem there). A better analysis of the ring bus would be to say that it's a fantastic architectural advancement for ATI's architecture, but may or may not be useful at all for nVidia.
Not exactly what I meant, so let me be a little more precise. ATI have achieved better memory performance by implementing a programmable memory controller which allows them to have more efficient transfers to/from memory. The G70 seems to be adequately served by its current memory controller and the memory chips available to service it. Now if they were to increase the number of pipes to 32 from 24, or even just bump the clockspeed up a significant amount, and not produce a comparable increase in memory performance, then the memory becomes the bottleneck.
I understand that this is "old" thinking and this kind of performance will become less and less important going forward, but it is my belief that, at the time that the design decisions were being made for this coming generation of products, these were still seen to be important issues for the time frame that these products are due to be introduced.
(You have no idea how hard it was for me to figure out how to get that last bit grammatically correct :))
In a nutshell, I don't think that there will be enough of these new-style games for either of the IHVs to afford to be unimpressive in current games. And by unimpressive I mean only having a minimal increase in performance over the previous generation. Imagine the situation as it was in the GF3 era, where the new generation actually underperformed the previous generation in some cases, but add in a viable competitor that gave you the new features AND was noticeably faster than the previous generation in every way. Not a good situation to find yourself in.
mrcorbo
06-Dec-2005, 03:30
Where are you getting this from? I've heard others say this too.
ATI takes a ~20% hit for 4XAA when HDR is enabled. This is true for Far Cry as well, though the numbers without AA are abnormally low for ATI in this game. There's a thread about this somewhere on these forums.
This is lower than the hit the 7800GTX takes for 4xAA without HDR.
It doesn't matter what % hit you take if the end result is that you don't get playable framerates at the resolutions you expect to be able to play at. You have to figure that anyone shelling out the cash for one of these cards is likely to have a display capable of at least 1600x1200, and a hard-core gamer would be loath to sacrifice resolution or playable framerates for eye-candy.
Ailuros
06-Dec-2005, 05:21
Sure you could, but that's too oversimplified for my taste. ;)
While basically you are on the right track in pointing out what NV did "right" with NV4X/G70, you have to adapt that "knowledge" to new possibilities with 90nm (FSG vs. Low-K). With that in mind, they could also have both higher complexity AND higher clockspeed, relative to GPUs built on 110nm.
That's what a hypothetical 8 quad chip@=/>550MHz actually suggests. That's 120MHz higher than a 256GTX and that with at least estimated =/>80M transistors more. After that they can always after a specific timeframe raise the core frequency even more.
Margin wise it should be pretty negligible which route they take, since they already have a fair share of understanding how high they can clock their parts, while still being not too aggressive and butchering yields. That's not saying 750MHz looks easily accomplished, but at the current state we can only draw conclusions with RSX in mind, where they have to meet very strict guidelines for power consumption and power dissipation. G71 won't have those limits, so basically they could clock it through the roof (that's speculative, since we don't know what headroom NV really has with their current designs), but in a relative sense, nonetheless.
Not with 8 quads though. I'm not oversimplifying things by far, au contraire. Could it be that some of you concentrate exclusively on texel fillrates here and forget the necessity for even more ALU throughput? At the same clockspeed an 8 quad G7x has a little over 30% more ALU power (8/6, i.e. about 33%) than a 6 quad one.
While it was the idea of a genius to take their already well fitted NV4X (high complexity, scalability, average clockspeeds) one step further and build it on 110nm (even more complexity, same scalability, average-high clockspeeds), you need to support the thought that with 90nm in mind, you can have an entirely different beast of GPU.
How much higher can you go than hypothetical 750MHz though, that's the real question. In my theory you can, after a specific time, raise the clockspeed of the 8 quad chip further to say 650MHz and have a substantially faster ultra high end model, and to match that kind of jump you'd need by my estimate to clock a 6 quad@750MHz chip all the way up to 870MHz.
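For anyone who wants to check the quad/clock arithmetic above, here is a rough sketch. It assumes throughput scales simply as quads x core clock, which ignores memory bandwidth, ROPs and any per-quad changes, so treat it as back-of-envelope only. The function names are made up for illustration.

```python
# Toy model (assumption): shading throughput ~ quads * core clock in MHz.
# Memory bandwidth, ROPs and per-quad ALU changes are deliberately ignored.

def relative_throughput(quads, clock_mhz):
    """Unitless throughput figure for a chip with `quads` quads at `clock_mhz`."""
    return quads * clock_mhz

def equivalent_clock(quads_a, clock_a, quads_b):
    """Clock (MHz) a chip with quads_b needs to match quads_a running at clock_a."""
    return relative_throughput(quads_a, clock_a) / quads_b

def alu_advantage(quads_a, quads_b):
    """Fractional ALU advantage of quads_a over quads_b at the same clock."""
    return quads_a / quads_b - 1.0

if __name__ == "__main__":
    print(alu_advantage(8, 6))             # about 0.33
    print(equivalent_clock(8, 550, 6))     # about 733 MHz
    print(equivalent_clock(8, 650, 6))     # about 867 MHz
```

Under that assumption an 8 quad chip at 550MHz is already in the same ballpark as 6 quads at 750MHz, and 8 quads at 650MHz lands near the 870MHz figure quoted above.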
Ailuros
06-Dec-2005, 05:27
3x
X1600XT is nothing more than a replacement for X700Pro. Which has twice as many ROPs and TMUs. The X1600XT is comfortably faster.
Jawed
In newer shader heavy games yes. In any other case comfortably is relative.
http://www.xbitlabs.com/articles/video/display/radeon-x1600_11.html
http://www.xbitlabs.com/articles/video/display/radeon-x1600_3.html
Ailuros
06-Dec-2005, 05:31
Who has not read this yet? Even the Inquirer said it would be these speeds over a month ago!
"When it comes to G71 as a graphic chip, Nvidia will get that chip to insane speeds and we expect at least 650 to 700MHz for the cherry picked top of the range."
http://theinquirer.net/?article=27463
Highly reliable source :roll:
Highly reliable source :roll:
But from what I heard, it isn't 100% wrong, from the pov that the featureset improvements in G71 will be there in RSX (I'm not sure if it's not rather the other way around actually, but it's not like we cared). That leaves the question of the number of pixel pipelines, but I guess we'll finally know one day or another.
Of course, overall, TheInq's article is a damn nice joke, making no sense whatsoever - that, I can fully agree with ;)
Uttar
Mintmaster
06-Dec-2005, 07:45
It doesn't matter the % hit you take if the end result is that you don't get playable framerates at the resolutions you expect to be able to play at.
Have you even looked at the numbers or are you just talking out of your ass? We discussed this before (http://www.beyond3d.com/forum/showthread.php?p=629959#post629959).
Far Cry HDR (http://www.hardware.fr/articles/599-5/geforce-7800-gtx-512-mo-nouvelle-reine-3d.html):
With 7800GTX you can play at 1280x1024, ~60fps
With the X1800XT you can play at 1024x768 w/ 4xAA, ~60 fps. 6xAA is only around a 5% further hit, if you like super clean edges.
(these framerates are extrapolated using resolution scaling from other sites. See B3D link above)
Which would you take? If you say the former then obviously you don't care about image quality at all, so what's the point in upping the resolution? Don't bring the LCD argument into this, because that's just lame. It's only a coincidence that the GTX happens to play 1280x1024 well, and how many gamers buying $500 video cards have a $200 LCD with this resolution anyway?
Splinter Cell HDR (http://www.hardware.fr/articles/599-7/geforce-7800-gtx-512-mo-nouvelle-reine-3d.html): The GTX is slower than the X1800XT. It can barely handle HDR at 1600x1200 without AA (49.6fps). There's no AA results, but ATI would likely get over 60fps at 1280x1024 with 4xAA.
Serious Sam 2 HDR (http://www.hardware.fr/articles/599-6/geforce-7800-gtx-512-mo-nouvelle-reine-3d.html): The X1800XT leads the GTX 512MB this time. Okay, ATI has some image quality problems, but that's the coder's fault. Using FP16 filtering for bloom is stupid, even on NVidia hardware (I'll explain my stance further for anyone who wants to know the technical details).
HL2: Lost Coast HDR (http://www.hardware.fr/articles/599-4/geforce-7800-gtx-512-mo-nouvelle-reine-3d.html): Yet again, The XT's HDR performance is right with the GTX. (This HDR method is a bit of a hack job, which is why AA works on NVidia's cards).
So I ask you again, where did you get your numbers? It seems your claims are full of shit.
You have to figure that anyone shelling out the cash for one of these cards is likely to have a display capable of at least 1600X1200 and a hard-core gamer would be loath to sacrifice resolution or playable framerates for eye-candy.
Playing at high resolution without AF is utterly pointless. Everything's a blur, and the edges are jagged. Once you enable AF, AA makes a bigger difference cleaning up edges than even upping the resolution two steps.
I'll say it again: Any gamer that declines 4xAA for a mere 25% increase in dpi is a fool.
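To put numbers on the "25% increase in dpi" point, a small sketch of the resolution arithmetic. It only compares the resolutions mentioned in this thread and assumes nothing about framerates; the helper names are invented for illustration.

```python
# Resolution arithmetic only; no framerate model is implied.

def linear_gain(width_from, width_to):
    """Fractional increase in horizontal resolution (the 'dpi' gain on a fixed-size screen)."""
    return width_to / width_from - 1.0

def pixel_gain(res_from, res_to):
    """Fractional increase in total pixels that have to be shaded."""
    return (res_to[0] * res_to[1]) / (res_from[0] * res_from[1]) - 1.0

if __name__ == "__main__":
    steps = [((1024, 768), (1280, 1024)), ((1280, 1024), (1600, 1200))]
    for lo, hi in steps:
        print(lo, "->", hi,
              "linear +%.0f%%" % (100 * linear_gain(lo[0], hi[0])),
              "pixels +%.0f%%" % (100 * pixel_gain(lo, hi)))
```

Each step up is 25% more horizontal detail, but it costs roughly 67% and 46% more pixels respectively, which is the trade being argued about.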
DemoCoder
06-Dec-2005, 07:46
Mint, did you get back on the ATI payroll or something? :)
bloodbob
06-Dec-2005, 07:51
Serious Sam 2 HDR (http://www.hardware.fr/articles/599-6/geforce-7800-gtx-512-mo-nouvelle-reine-3d.html): The X1800XT leads the GTX 512MB this time. Okay, ATI has some image quality problems, but that's the coder's fault. Using FP16 filtering for bloom is stupid, even on NVidia hardware (I'll explain my stance further for anyone who wants to know the technical details).
Me, 'cause the only thing I could imagine is that the fixed function filtering is slower than doing it via a shader.
Until someone can get some concrete details on how HL2 actually does HDR, I'd say let's not even include it in discussions.
Mintmaster
06-Dec-2005, 08:02
Mint, did you get back on the ATI payroll or something? :)
:lol:
Seriously, can you really believe that people in this age are underrating anti-aliasing this much? When the performance hit is lower now than ever in the past? Of all people, even Chalnoth is lauding this feature of ATI's.
Regarding the other threads, well, it's fun to predict. Even you get caught up in it sometimes :razz:. I'm just drooling over the thought of R580 and eventually R600. It would be so sweet to be able to program on Xenos.
Using FP16 filtering for bloom is stupid, even on NVidia hardware (I'll explain my stance further for anyone who wants to know the technical details).
Seconded, please.
Mintmaster
06-Dec-2005, 08:31
Me, 'cause the only thing I could imagine is that the fixed function filtering is slower than doing it via a shader.
You don't need to blend the bloom into the HDR rendertarget, that's why it's useless. Blending into the final 32-bit output buffer, even during the tonemapping pass, is just as good.
When you make your low res bloom texture, why does it have to be FP? Sample from the FP16 scene texture, but do whatever scaling or tonemapping you want before writing it to the bloom texture that will be composited on top of the scene. This way you can use 32-bit for the bloom texture to get better speed, with next to zero image quality loss.
A full screen of FP16 filtering will take up a lot more time than a full screen of regular texture filtering, even on G70.
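A quick sketch of the storage side of that argument, assuming a four-channel FP16 format (2 bytes per channel) against an ordinary four-channel 8-bit "32-bit" format. It counts bytes only; actual filtering rates are hardware specific and are not modelled here. The constant and function names are hypothetical.

```python
# Byte counts only. Filter throughput differs per GPU and is not modelled.
BYTES_FP16_RGBA = 4 * 2   # four channels, 16-bit float each
BYTES_RGBA8 = 4 * 1       # four channels, 8 bits each (the "32-bit" texture)

def texture_bytes(width, height, bytes_per_texel):
    """Size in bytes of one full surface at the given texel size."""
    return width * height * bytes_per_texel

def fp16_to_rgba8_ratio(width, height):
    """How many times more data an FP16 surface holds than an RGBA8 one."""
    return (texture_bytes(width, height, BYTES_FP16_RGBA)
            / texture_bytes(width, height, BYTES_RGBA8))

if __name__ == "__main__":
    w, h = 1600, 1200
    print(texture_bytes(w, h, BYTES_FP16_RGBA))  # 15360000
    print(texture_bytes(w, h, BYTES_RGBA8))      # 7680000
    print(fp16_to_rgba8_ratio(w, h))             # 2.0
```

So every texel fetched from the FP16 surface moves twice the data of a 32-bit one, before any difference in filtering speed is counted.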
dizietsma
06-Dec-2005, 08:34
I don't rate AA at all apart from FEAR where I had to play at 800x600 and hated it. I much prefer, in general, AF over AA if I had to pick one. I'd rather have super effects like HDR over either though whether it be done on nvidia or Ati hardware.
I'm still sticking with my preferred route for nvidia of keeping the top G71 just like G70 but with two cores on a single package. Maybe just because nobody else is. With two 16x SLI and the cheaper, smaller, less hot 90nm process this makes sense now. The groundwork has already been done by Gigabyte and Asus and we hear rumours of driver support for 4 cores. It also allows you to spend zilch on R&D (unless there is some for the move to 90nm). I think it saves money and time that could be better spent on G80. Of course not everybody has an SLI motherboard but then you have the choice of the single core 24 pipe 90nm "G70" running at improved clocks.
G71 Ultra 2 x 90nm "G70" cores on one package 48 pipes
G71 GTX 2 x 90nm "G70" cores using defect G70 cores 40 pipes
G71 GT 1 x 90nm "G70" core 24 pipes
G71 GS 1 x 90nm "G70" core 20 pipes
The top cards would still be expensive but anything less than $1000 is still cheap in the wider world.
I'd say my theory only has a 98-99% chance of being wrong, but what the hell ! :D
I don't think anything like a 32-pipe 750MHz part will appear anytime soon. If it does, then it'll just be a little stopgap to gain the performance edge for the time being, just like GTX512. I think we'll see the next-gen from nV sooner than we thought. And to make the guessing game perfect, I think it'll be a unified architecture :)
But then, I didn't even believe they'd ever release GTX512 - obviously the marketing division has more say than I thought.
If G71 is 32-pipe 600 on 90nm, then would it be reasonable to expect that an 80nm refresh could hit 700-750?
Do we expect NVidia to utilise the 80nm node before G80?
Jawed
I don't like the odds of R580 being clocked at similar levels to RV530. It sounds like a big chip.
R520 is almost 3x bigger than RV515 and even so it is clocked 1.1x faster (Asus and HIS boards are clocked 1.15x faster). I think R580 will be clocked faster than RV530XT.
bloodbob
06-Dec-2005, 10:58
When you make your low res bloom texture, why does it have to be FP? Sample from the FP16 scene texture, but do whatever scaling or tonemapping you want before writing it to the bloom texture that will be composited on top of the scene. This way you can use 32-bit for the bloom texture to get better speed, with next to zero image quality loss.
Okay so are you saying we should have less than 8 bits of precision on the bloom or should we have bloom around white text?
Assuming you're calculating exposure on the fly you'll probably want to do bilinear filtering anyway.
Chalnoth
06-Dec-2005, 11:02
I'm still sticking with my preferred route for nvidia of keeping the top G71 just like G70 but with two cores on a single package.
Two cores on one package is quite possibly the stupidest thing nVidia could possibly do. One core with twice the pipelines would have fewer transistors and higher performance.
I guess you're right, but I'm looking at how G70's shader pipes are not much faster than R520's, even though the former has two shader units in each.
R520 also has two shader units per pipe: http://www.beyond3d.com/reviews/ati/r520/index.php?p=03. (Just like every ATI chip since R300.)
DegustatoR
06-Dec-2005, 12:17
Why exactly do all of you assume that G71 is a high-end part?
If G71 is 32-pipe 600 on 90nm, then would it be reasonable to expect that an 80nm refresh could hit 700-750?
Jawed
There should not be that much difference, unless high-speed technology effects of TSMC were improved.
Do we expect NVidia to utilise the 80nm node before G80?
I do, maybe not for high end but there are always some products to split for different nodes these days.
Sunrise
06-Dec-2005, 12:35
How much higher can you go than hypothetical 750MHz though, that's the real question. In my theory you can, after a specific time, raise the clockspeed of the 8 quad chip further to say 650MHz and have a substantially faster ultra high end model, and to match that kind of jump you'd need by my estimate to clock a 6 quad@750MHz chip all the way up to 870MHz.
That "raising the core clock even more" point you're talking about:
It's very hard to see whether they even need to "raise clocks substantially more" due to the fact that G80 will take over from there, so i don't really think (at this point in time) that there's any need for another ultra-high-clocked G7X. That scenario really made much more sense (and only then) when ATi was "late to the game", so basically they had both the luxury of cherry-picking G70 (waiting for the release of R520 and making it look like ATi not only was late, but also got outclassed by NV on all fronts, except image quality) and working on the "real" Ultra-model (http://www.beyond3d.com/forum/showthread.php?p=622722#post622722) (which has some modifications unknown to me, but incorporated in RSX) which we now know is G71.
NV had lots of time to concentrate on their 90nm designs, so RSX should give us a pretty good idea about what G71 is capable of. R580 won't be easy to beat from things i'm hearing and that will certainly be a good starting point where NV needs to make things right from the get-go, they won't have time to cherry-pick, they need to release something that is as fast as it can get right from the beginning.
I'm also not trying to limit my thoughts to clockspeed alone, since we should know pretty well by now that clockspeed is important and all, but it won't tell us the whole story. This point will get even more important in the future, where they will incorporate even more sophisticated tech and more power savings, due to the fact that lots of different markets need to be addressed, with different requirements, while keeping R&D down.
In the same context it's hard to guess what final configuration G71 will have, currently (when thinking about it thoroughly) i'm not entirely sure, since we haven't seen a 90nm G7X from NV where we can kind of estimate what's doable and what's not. It would make a lot of sense that they modified G70 (RSX) to some extent and looked at how high they can clock it. We should know that pretty soon. After all, we're all having fun speculating, aren't we ? ;)
EDIT: Spelling.
Why exactly do all of you assume that G71 is a high-end part?
71 > 70 :razz: :wink:
NV had lots of time to concentrate on their 90nm designs
Which explains why they're so late, apparently.
Jawed
Sunrise
06-Dec-2005, 12:46
Why exactly do all of you assume that G71 is a high-end part?
Care to elaborate on your thoughts ? Looking at what's talked about at CSFB (CFO and VP of NV), there's currently no reason to believe otherwise.
Which explains why they're so late, apparently.
Jawed
No, they just had lots of space even for their older 110nm product line...
Just see what they managed to pull out with GTX 512MB...
Honestly there is no definite difference between 1800 XT and 7800GTX 512MB and we're talking about two completely different ranges of hardware. Pretty impressive.
Now with a smaller process they can rev up clocks and lower the power consumption and temperatures at the same time. They'll optimize the core for the time being but probably with no drastic changes. I see no point in doing so anyway. Don't fix it if it ain't broken, and NV4x cores certainly fall into this category. People just don't want to admit NV's excellent comeback after the FX series... I'd get myself an NV4x based graphics card right away if I had the $$$ (which I also don't have for any ATI counterpart). Instead I can tease myself on my cousin's 6600GT or on a friend's machine with the very same GPU powered card...
Still running good old R9600 Pro (highly OC'ed hehe).
Which explains why they're so late, apparently.
Jawed
Late with what? According to which timetable? :???:
Care to elaborate on your thoughts ? Looking at what's talked about at CSFB (CFO and VP of NV), there's currently no reason to believe otherwise.
Sure there is. All he said about G71 was, according to VR-Zone - "G71 will be 90nm and is expected to be much higher clocked at 750MHz." Just because it is 90nm, and will be clocked at 750MHz, does not automatically mean it will be high end (i.e. beat G70).
BTW I don't know what G71 is, all I'm saying is that from what Marv Burkett said, you can't conclude that it will be a high end part.
Dave Baumann
06-Dec-2005, 14:48
IIRC (and I have the transcript) no definitive parts or clockspeeds were mentioned at all - all that was mentioned was that G7x refreshes would be in the next 60 days and another part "beginning their next generation" would be in H1 '06. The G71 / 750MHz quote originates from HKEPC and I wasn't exactly clear on what the quote was. There was also a third story from Digitimes that points to G72 and G73 as coming early next year. Right now people are taking these three reports and shaking them all up.
Can anyone clear up what HKEPC's sources or reasoning for G71 / 750MHz was?
IIRC (and I have the transcript) no definitive parts or clockspeeds were mentioned at all - all that was mentioned was that G7x refreshes would be in the next 60 days and another part "beginning their next generation" would be in H1 '06. The G71 / 750MHz quote originates from HKEPC and I wasn't exactly clear on what the quote was. There was also a third story from Digitimes that points to G72 and G73 as coming early next year. Right now people are taking these three reports and shaking them all up.
Can anyone clear up what HKEPC's sources or reasoning for G71 / 750MHz was?
Yeah, I've been wondering if we've been FUD'ed, tho it's also possible that after he hashed up his answer so badly, they noticed we were all running around going "huh? what?", and so they leaked a clarification that was meant to be a bit more definitive that there is a high-end 90nm G7x coming. That's how I'm leaning at the moment.
Edit: But that Digitimes piece was ever so not comforting that there is a high-end coming, so, meh, I'm still in the "squishy vibes" zone.
Sunrise
06-Dec-2005, 15:43
Sure there is. All he said about G71 was, according to VR-Zone - "G71 will be 90nm and is expected to be much higher clocked at 750MHz." Just because it is 90nm, and will be clocked at 750MHz, does not automatically mean it will be high end (i.e. beat G70).
I highly doubt that this particular "source" is completely accurate. They have a past record of spreading false information (HKEPC is the originating source, Vr-Zone then takes that info and writes it 1:1 on their page -> bad), but there's still the question of how they came up with that particular figure. We therefore can't conclude anything at all, but we can take guesses. If we had really hard evidence we wouldn't need to speculate, would we ?
Sunrise
06-Dec-2005, 15:46
IIRC (and I have the transcript) no definitive parts or clockspeeds were mentioned at all - all that was mentioned was that G7x refreshes would be in the next 60 days and another part "beginning their next generation" would be in H1 '06.
Thanks for clearing that up, Dave.
Thanks for clearing that up, Dave.
Y'know, I'd even go a little further than that. Earlier answers sounded like they were talking about GTX512 ramping as the high-end for spring.
What I heard:
If you look at the next big cycle which is going to be the spring refresh and you look back into the February to April timeframe next year, we still think we're very well-positioned. . . If you look at every price point we typically are the performance leader in that space. We are in the process of refreshing that entire lineup with brand-new GeForce 7. I think if you go back and listen to our conference call. . . we are really focused on spring refresh and we're looking at how we are positioned in all the different segments, and we are very confident that with the new 7's coming out, given the headroom, and 90 nanometers, and being that they are already competitively positioned today . . .
All that "today" and "in the process of refreshing" (what is GTX512 if not a refresh of GTX?) and "already competitively positioned".
But at the same time folks seem to think one's coming, so maybe it was just hashed answers now trying to be clarified with leaks. <shrugs>
Edit: I just realized that part of what was leading me that way was not that answer, but the question. . .which assumed GTX512. Hard to say how responsible to hold NV for the question. It was something like "We are starting to see the ramp-up from the new high-end graphics part. . ."
Mintmaster
06-Dec-2005, 17:02
R520 also has two shader units per pipe: http://www.beyond3d.com/reviews/ati/r520/index.php?p=03. (Just like every ATI chip since R300.)
There's just no quit in you, is there!
My first (I think) reply in this thread was regarding someone's post that R580 and G70 can do 48 MAD/DP3 operations per clock. What I'm saying is an NVidia-PR-speak shader unit is not equivalent to an ATI-PR-speak shader unit. NV40 wasn't even close to twice the speed of R480.
So if I may restate the original claim that you objected to: An extra shader unit, as defined by NVidia PR, will have to scale better than in the past to make a difference.
By NVidia's own claim, MAD is by far the most used instruction, so you would think that G70 would have an advantage over NV40 per pipe. But that rarely proved to be the case, even in ShaderMark. Adding yet another MUL/MAD capable "shader unit" is unlikely to do much, so it would have to be a heavier modification. (Which, BTW, I'm not precluding.)
As the texturing load drops off with more complex shaders, the dual-issuable MAD (or dual-issuable combinations of ADD or MUL) in G70's pipeline should increase performance quite dramatically with respect to NV40.
The only issues with dual-issuable MAD remain:
data dependency between the two MADs (if the second is dependent on the result of the first)
FP32 operands restricted to four different registers in the pair
Jawed
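A toy model of those two pairing restrictions, purely as a sketch. It assumes an in-order pipe that can issue two adjacent MADs in one cycle only if the second does not read the first one's result and the pair reads at most four distinct source registers. That reading of the rules, the instruction encoding and the function names are all assumptions for illustration, not a description of how G70 actually schedules.

```python
# Each instruction is (dest, src1, src2, src3) and stands for dest = src1*src2 + src3.
# Hypothetical in-order model: adjacent MADs pair up only when both rules hold.

def can_pair(first, second, max_regs=4):
    """True if `second` may issue in the same cycle as `first` under the toy rules."""
    dest = first[0]
    if dest in second[1:]:
        return False                      # second MAD depends on the first result
    distinct_sources = set(first[1:]) | set(second[1:])
    return len(distinct_sources) <= max_regs

def issue_cycles(program):
    """Count issue cycles for a list of MADs, pairing greedily in program order."""
    cycles, i = 0, 0
    while i < len(program):
        paired = i + 1 < len(program) and can_pair(program[i], program[i + 1])
        i += 2 if paired else 1
        cycles += 1
    return cycles

if __name__ == "__main__":
    independent = [("r4", "r0", "r1", "r2"), ("r5", "r0", "r1", "r3")]
    dependent = [("r4", "r0", "r1", "r2"), ("r5", "r4", "r1", "r2")]
    too_many = [("r6", "r0", "r1", "r2"), ("r7", "r3", "r4", "r5")]
    print(issue_cycles(independent), issue_cycles(dependent), issue_cycles(too_many))
```

In this model only the first pair dual-issues; the dependent pair and the pair touching six registers each take two cycles, which is why real shader code sees less than the theoretical doubling.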
No, they just had lots of space even for their older 110nm product line...
Just see what they managed to pull out with GTX 512MB...
Honestly there is no definite difference between 1800 XT and 7800GTX 512MB and we're talking about two completely different ranges of hardware. Pretty impressive.
Now with a smaller process they can rev up clocks and lower the power consumption and temperatures at the same time. They'll optimize the core for the time being but probably with no drastic changes. I see no point in doing so anyway. Don't fix it if it ain't broken, and NV4x cores certainly fall into this category. People just don't want to admit NV's excellent comeback after the FX series... I'd get myself an NV4x based graphics card right away if I had the $$$ (which I also don't have for any ATI counterpart). Instead I can tease myself on my cousin's 6600GT or on a friend's machine with the very same GPU powered card...
Still running good old R9600 Pro (highly OC'ed hehe).
Who hasn't admitted nvidia made a good comeback :???:
The problem now is that they have subpar filtering compared to ati, even without ati's independent AF.
That and their 512MB part is even less available than ati's part and costs more money.
http://www.newegg.com/Product/ProductList.asp?Submit=Go&DEPA=0&type=&description=7800gtx+512&Category=0&minPrice=&maxPrice=&Go.x=0&Go.y=0
http://www.newegg.com/Product/ProductList.asp?Submit=GO&Range=1&bop=and&description=X1800&srchInDesc=xt
Mintmaster
06-Dec-2005, 17:47
Okay so are you saying we should have less than 8 bits of precision on the bloom or should we have bloom around white text?
Assuming you're calculating exposure on the fly you'll probably want to do bilinear filtering anyway.
Okay, you don't seem to be getting what I'm talking about.
You have texture A, your scene in which you rendered everything at FP16. Now it's time to do post-processing. You want to create texture B, the bloom texture that is reduced size and will be enlarged to the whole screen. Judging by the SS2 images, this buffer is about 128x128 (with good reason as bloom is a low frequency function). Then you need to create C, the 32-bit backbuffer that will be output to the display.
You create a mip-map chain of A to get the final 1x1 level scene average value that you can use to determine exposure. Still, this is a FP16 value. (FP filtering does not help you here practically, I guarantee it. You need 40 bytes of data access per pixel here, so the shader pipes will have plenty of time to burn.)
Now you take your first or second mipmap of A (for performance purposes), which are in FP16, and you do the horizontal and vertical gaussian blur passes. You can use whatever weighting or threshold value that you want, so white text won't have a halo. In the final pass, however, output to a 32-bit texture. This is B, and you can do whatever you want, including accessing the 1x1 mipmap of A, to determine how you want the bloom to look in the final framebuffer.
Now it's time to create C. You have FP16 data in A, so you tone-map it, then add B. All done. There's no lost control or capability in doing it this way, unless you can prove me wrong.
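The A / B / C steps above can be sketched in a few lines. This is a minimal illustration on the CPU, assuming single-channel images held as lists of floats, a 3-tap blur, a Reinhard-style tone curve and nearest-neighbour upsampling of B. None of those choices come from any actual game; all function names are made up.

```python
# Sketch only: single-channel "images" stored as lists of rows of floats.
# A = HDR scene (stand-in for the FP16 render target)
# B = small bloom texture, quantised to 8 bits in its final pass
# C = 8-bit output buffer

def downsample2x(img):
    """One mip step: average each 2x2 block.

    That is 4 reads and 1 write per output texel; with 8-byte FP16 RGBA
    texels that would be 4*8 + 8 = 40 bytes, one reading (an assumption)
    that reproduces the 40-byte figure quoted above.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
             for x in range(w)] for y in range(h)]

def mip_average(img):
    """Walk the mip chain of a square power-of-two image down to 1x1."""
    while len(img) > 1:
        img = downsample2x(img)
    return img[0][0]

def blur_rows(img, weights=(0.25, 0.5, 0.25)):
    """Horizontal 3-tap blur with clamped edges."""
    w = len(img[0])
    return [[weights[0] * row[max(x - 1, 0)] + weights[1] * row[x] +
             weights[2] * row[min(x + 1, w - 1)] for x in range(w)]
            for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def tonemap(value, exposure):
    """Assumed Reinhard-style curve; maps [0, inf) to [0, 1)."""
    v = value * exposure
    return v / (1.0 + v)

def quantize8(v):
    """Store a 0..1 value in 8 bits."""
    return max(0, min(255, int(round(v * 255.0))))

def make_bloom(scene, threshold, exposure):
    """Build B: threshold and blur in float, write 8-bit only in the final pass."""
    small = downsample2x(scene)
    bright = [[max(v - threshold, 0.0) for v in row] for row in small]
    blurred = transpose(blur_rows(transpose(blur_rows(bright))))
    return [[quantize8(tonemap(v, exposure)) for v in row] for row in blurred]

def composite(scene, bloom, exposure):
    """Build C: tone-map A, then add B (half resolution, nearest neighbour)."""
    return [[min(255, quantize8(tonemap(v, exposure)) + bloom[y // 2][x // 2])
             for x, v in enumerate(row)] for y, row in enumerate(scene)]
```

The threshold runs while the data is still floating point, so anything at or below it (white text included, if the threshold sits above white) contributes nothing to B, and the only quantisation to 8 bits happens after tone-mapping. An exposure value could be derived from `mip_average(scene)`, though how to map the average to an exposure is left open here.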
Mintmaster
06-Dec-2005, 18:11
As the texturing load drops off with more complex shaders, the dual-issuable MAD (or dual-issuable combinations of ADD or MUL) in G70's pipeline should increase performance quite dramatically with respect to NV40.
Let me give you a more concrete definition of what "dramatic" is.
Per pipe per clock, the best improvement I could find (http://www.beyond3d.com/previews/nvidia/78512/index.php?p=05#ps) (except for two HDR shaders in ShaderMark where NV40 was well behind R480, not likely a math heavy shader) was 22% in the 3-light Phong shader of RightMark3D. The more complicated 3-light Cook-Torrance shader had only a 7% improvement. By contrast (http://www.digit-life.com/articles2/video/r520-part2.html#p5), X1600XT gets 2.8x and 2.9x the performance of X1300Pro in those same shaders.
A GPGPU benchmark situation is not exactly the most realistic example.
EDIT: corrected numbers
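For reference, "per pipe per clock" here means dividing the benchmark score by pipes x clock before comparing. A small sketch of that normalisation follows; the scores and chip configurations in the example are hypothetical placeholders chosen to give a 22% result, not figures from the linked reviews.

```python
# Normalise a shader benchmark score by pipe count and core clock.
# All example inputs below are hypothetical, not measured results.

def per_pipe_per_clock(score, pipes, clock_mhz):
    """Score per pixel pipe per MHz."""
    return score / (pipes * clock_mhz)

def normalized_gain(new, old):
    """Fractional per-pipe-per-clock gain; new/old are (score, pipes, clock_mhz)."""
    return per_pipe_per_clock(*new) / per_pipe_per_clock(*old) - 1.0

if __name__ == "__main__":
    old_chip = (100.0, 16, 400)    # hypothetical score, pipes, MHz
    new_chip = (228.75, 24, 500)   # hypothetical score, pipes, MHz
    raw = new_chip[0] / old_chip[0] - 1.0
    print("raw gain: %.0f%%" % (100 * raw))
    print("per pipe per clock: %.0f%%" % (100 * normalized_gain(new_chip, old_chip)))
```

With these made-up inputs the raw score more than doubles, but once the extra pipes and clock are divided out only 22% is left to credit to the pipeline itself.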
Ailuros
06-Dec-2005, 18:19
Can anyone clear up what HKEPC's sources or reasoning for G71 / 750MHz was?
Probably the same that insisted shortly before the R520 release that it has 6 quads.
Ailuros
06-Dec-2005, 18:27
That "raising the core clock even more" point you're talking about:
It's very hard to see whether they even need to "raise clocks substantially more" due to the fact that G80 will take over from there, so i don't really think (at this point in time) that there's any need for another ultra-high-clocked G7X. That scenario really made much more sense (and only then) when ATi was "late to the game", so basically they had both the luxury of cherry-picking G70 (waiting for the release of R520 and making it look like ATi not only was late, but also got outclassed by NV on all fronts, except image quality) and working on the "real" Ultra-model (http://www.beyond3d.com/forum/showthread.php?p=622722#post622722) (which has some modifications unknown to me, but incorporated in RSX) which we now know is G71.
NV had lots of time to concentrate on their 90nm designs, so RSX should give us a pretty good idea about what G71 is capable of. R580 won't be easy to beat from things i'm hearing and that will certainly be a good starting point where NV needs to make things right from the get-go, they won't have time to cherry-pick, they need to release something that is as fast as it can get right from the beginning.
I'm also not trying to limit my thoughts to clockspeed alone, since we should know pretty well by now that clockspeed is important and all, but it won't tell us the whole story. This point will get even more important in the future, where they will incorporate even more sophisticated tech and more power savings, due to the fact that lots of different markets need to be addressed, with different requirements, while keeping R&D down.
In the same context it's hard to guess what final configuration G71 will have, currently (when thinking about it thoroughly) i'm not entirely sure, since we haven't seen a 90nm G7X from NV where we can kind of estimate what's doable and what's not. It would make a lot of sense that they modified G70 (RSX) to some extent and looked at how high they can clock it. We should know that pretty soon. After all, we're all having fun speculating, aren't we ? ;)
EDIT: Spelling.
To me it's either 6 quads@750MHz or 8 quads@=/>550MHz. Something like 8 quads@750MHz is out of the question IMHLO.
Degustator,
Are codenames so important? ;)
Dave Baumann
06-Dec-2005, 18:30
Probably the same that insisted shortly before the R520 release that it has 6 quads.
But it does, it just didn't yield well at 6 quads!
SugarCoat
06-Dec-2005, 18:34
But it does, it just didn't yield well at 6 quads!
Okay thats a tad confusing, you arent referring to shipping revisions though right? I mean R520's dont have transistors dedicated to 2 disabled quads or do they?
Okay thats a tad confusing, you arent referring to shipping revisions though right? I mean R520's dont have transistors dedicated to 2 disabled quads or do they?
He is joking :), it was a popular rumor before :wink:
Ailuros
06-Dec-2005, 18:38
The problem now is that they have subpar filtering compared to ati, even without ati's independent AF.
Wherever filtering optimisations cause side-effects they should be turned off irrespective of the GPU employed. In that regard there's no optimum or sub-optimum, it's either angle dependent full trilinear on all texturing stages or half-assed brilinear on one stage and the rest getting plain ole bilinear and that goes for both IHVs.
Those that want to convince me that those do not create side-effects under specific circumstances on Radeons either do not want to admit them or simply can't see them and in the latter case I wouldn't even mention anything about subpar quality.
Ailuros
06-Dec-2005, 18:39
Okay thats a tad confusing, you arent referring to shipping revisions though right? I mean R520's dont have transistors dedicated to 2 disabled quads or do they?
Sarcasm *ding ding* ;)
SugarCoat
06-Dec-2005, 18:42
Sarcasm *ding ding* ;)
wasnt funny ):
wasnt funny ):
Well, most of us didn't find it too funny when errm, "some" sites (vr, kaff, kaff) used that "it's just not yielding well so they'll release it at 4" thing either.
Well, okay, I thot it was hilarious. At first. But then I realized they were really expecting to sell that.
Wherever filtering optimisations cause side-effects they should be turned off irrespective of the GPU employed. In that regard there's no optimum or sub-optimum, it's either angle dependent full trilinear on all texturing stages or half-assed brilinear on one stage and the rest getting plain ole bilinear and that goes for both IHVs.
Those that want to convince me that those do not create side-effects under specific circumstances on Radeons either do not want to admit them or simply can't see them and in the latter case I wouldn't even mention anything about subpar quality.
Well I'd just like the ability to really turn off all the optz on my 6600GT(heh..)
The way it is now sure it looks better but it's still not good enough, it doesn't really go with the name "HQ".
To me the quality of filtering it delivers is balanced, not HQ.
"I want to shimmy-shimmy-shimmy through the break of dawn yeah" :wink:
And I've seen fraps videos of radeons and they do look better.
But I find with fraps that even though it's lossless, it's still not representative of what's going on.
I've taken some fraps caps and they don't look quite right, it looks close but the colors/brightness/contrast is off a bit.
Ailuros
06-Dec-2005, 19:38
Well I'd just like the ability to really turn off all the optz on my 6600GT (heh..)
The way it is now, sure, it looks better, but it's still not good enough; it doesn't really go with the name "HQ".
To me the filtering quality it delivers is "balanced", not HQ.
"I want to shimmy-shimmy-shimmy through the break of dawn yeah" :wink:
High quality since 78.03 is high quality, ie you get full trilinear and no texturing stage optimizations.
And I've seen fraps videos of Radeons and they do look better.
But I find that with fraps, even though it's lossless, it's still not representative of what's going on.
I've taken some fraps caps and they don't look quite right; it looks close but the colors/brightness/contrast are off a bit.
I was hoping to hear from real time experience with a Radeon. In those cases where underfiltering causes side-effects Radeons aren't immune either, and the only other cure to decrease any kind of shimmering or any other side-effect is to disable filtering related optimizations.
Ailuros
06-Dec-2005, 19:42
Well, most of us didn't find it too funny when errm, "some" sites (vr, kaff, kaff) used that "it's just not yielding well so they'll release it at 4" thing either.
Well, okay, I thought it was hilarious. At first. But then I realized they were really expecting to sell that.
Hmmmm there's a funny story behind where the 6/8 quad BS originated from. I wouldn't be one bit surprised if that's another nasty joke but coming from the other side this time.
DegustatoR
06-Dec-2005, 19:50
Are codenames so important? ;)
No, they're not. But right now i don't see the point for NV to make _any_ new high-end G7x for 1H06 since they said that they'll have G80 in 1H06 already.
So G71 at 90nm is probably either a cost-cutting 90nm version of G70 (in the same way how NV42 was just a 110nm version of NV41) or even something smaller than that -- something with 16PP and 256-bit memory bus for sub-$300 price range.
As for R580 i'm not sure that NV has to put anything against it at all -- G70-512 might be enough for that.
On the other hand it would be good for them to have some backup G7x-plan in case of problems with the new G80 architecture... So it's all fog and mystery right now :)
Adding yet another MUL/MAD capable "shader unit" is unlikely to do much, so it would have to be a heavier modification. (Which, BTW, I'm not precluding.)
By your own analysis, modifying the shader units on NV40 to those found on G70 didn’t raise per-clock performance a whole lot. But this has very little to do with adding a third shader unit. After all, all ATI is really doing from R520 to R580 is adding more shader units (arranged differently of course).
NV PR will obviously show the strong points of their architecture compared to their earlier products or competitor’s products. In G70’s case one would be the number of MAD operations. That’s most certainly misleading, because although G70 can perform twice as many MADs per pipe, the number of shader units per pipe (and to my understanding the total number of instructions) stayed the same as in NV40.
Simply put, there’s no data out there to predict how a hypothetical G7x with 3 shader units per pipe would perform.
andypski
06-Dec-2005, 20:26
In that regard there's no optimum or sub-optimum, it's either angle dependent full trilinear on all texturing stages or half-assed brilinear on one stage and the rest getting plain ole bilinear and that goes for both IHVs.
I'm not aware of circumstances on recent parts where we treat any texture stages differently from any others - all texture stages are treated exactly the same. With CatalystAI enabled the level of optimisations may vary, but only because of texture content, not the particular sampler.
We did have the old quality mode when anisotropic filtering was forced that did trilinear on stage 0 and bilinear on the other stages, which was based on an analysis of applications available at that time and the prevalence of the idea of a 'base texture' - sampler 0 always tended to have the most important texture data. That's a long time ago now, and I don't think that method has ever been applied on our more recent parts with more advanced filtering optimisation modes - with modern shaders you really have no idea typically which sampler has what data, so treating stages differently makes no sense.
I'm really not sure where this "half-assed brilinear on one stage and the rest getting plain ole bilinear and that goes for both IHVs" statement comes from - how carefully have you looked into how current filtering optimisations work?
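For readers unfamiliar with the terms in the quoted statement, here is a minimal sketch of how the mip-blend weight could differ between full trilinear and a "brilinear" scheme. It only illustrates the general idea as it is usually described: the band width and the function names are assumptions made up for this example, and it says nothing about how either IHV's hardware or drivers actually decide.

```python
def trilinear_weight(lod_fraction):
    """Full trilinear: blend linearly across the whole span between two mip levels."""
    return lod_fraction


def brilinear_weight(lod_fraction, band=0.25):
    """Reduced trilinear: blend only inside a narrow band around the mip transition.

    `band` is a hypothetical tuning value. Outside the band a single mip
    level is used, which amounts to plain bilinear filtering.
    """
    lo = 0.5 - band / 2.0
    hi = 0.5 + band / 2.0
    if lod_fraction <= lo:
        return 0.0  # nearer mip level only
    if lod_fraction >= hi:
        return 1.0  # farther mip level only
    return (lod_fraction - lo) / (hi - lo)


def blend(sample_near, sample_far, weight):
    """Mix two bilinear samples taken from adjacent mip levels."""
    return sample_near * (1.0 - weight) + sample_far * weight
```

The narrower the band, the fewer pixels need samples from two mip levels, which is where the speed-up would come from, and the more visible the mip transition can become.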
High quality since 78.03 is high quality, ie you get full trilinear and no texturing stage optimizations.
I was hoping to hear from real time experience with a Radeon. In those cases where underfiltering causes side-effects Radeons aren't immune either, and the only other cure to decrease any kind of shimmering or any other side-effect is to disable filtering related optimizations.
I'm using 81.85 and with all the dirty opts disabled filtering still leaves a lot to be desired in some titles.
Fire up richard burns rally and look at the ground (textures) while moving.
Looking at some 6800gs HDR numbers I think I'm better off with an unlocked gto2, since HDR is virtually unplayable at decent res, so the SM3 goodies don't net anything good for me since I like high res gaming
...and it will be a stop gap card for me.
Better filtering, faster, and TAA :D (transparency aa)
PeterAce
06-Dec-2005, 21:54
Quick question: on my new GTX 512MB (using the 81.95 drivers) I've been testing the difference between 'Quality' (with all three filtering optimisations off) and HQ, and there is quite a big difference ('The Compressonator' is a fantastic tool for seeing texture filtering differences - thanks Wavey for your 'X800 filtering' thread many moons ago!)
What exactly is the difference between Q and HQ? Can someone explain?
trinibwoy
06-Dec-2005, 22:48
That's quite unlikely imho, multithreading is not something you can just drop on top of current G70 architecture, they need to completely decouple TMUs from fragment shading pipelines first.
I'm willing to bet that (http://v3.espacenet.com/textdoc?CY=ep&LG=en&F=4&IDX=SG112989&DB=EPODOC) patent is about their next gen architecture.
I just read through that patent again (first time I've ever read an entire patent and understood all of it) and it doesn't seem that complex at all, at least in comparison to R520's scheduling/threading architecture. I wouldn't be surprised to see this implemented in the G7x refreshes.
I'm still not clear on how this would help performance though. How often would a situation arise where the workload would be such that some threads are relatively more math intensive than others (hence freeing up their currently dedicated texture samplers)?
Mintmaster
06-Dec-2005, 22:50
By your own analysis, modifying the shader units on NV40 to those found on G70 didn’t raise per-clock performance a whole lot. But this has very little to do with adding a third shader unit. After all, all ATI is really doing from R520 to R580 is adding more shader units (arranged differently of course).
That's exactly my point. ATI's approach is different, allowing them to realize the full gains.
I'm just saying that if accelerating the most common instruction didn't make much difference (and NVidia isn't lying about that if you look at shader code), then accelerating all instructions in the same way is not going to be much more fruitful. I think it has to do with the dependency restriction that Jawed mentioned, assuming he's right.
I'm not bashing the architecture in NV40/G70. From what I've heard, the reason one shader unit doesn't get used during texturing is because it's used for texture address calculations. Looking at it in reverse, they're effectively allowing us to use the texture address logic when no texturing has to be done. That's pretty intelligent, and is what allows NVidia to have much higher pipeline density than R520. The only reason ATI has a fighting chance right now is that NVidia didn't move to 90nm as quickly as ATI did.
Mintmaster
06-Dec-2005, 22:56
No, they're not. But right now i don't see the point for NV to make _any_ new high-end G7x for 1H06 since they said that they'll have G80 in 1H06 already.
That's a very good point. A 24-pipe G71 would last a long time since it could eventually become an upper-mainstream card like the 6800GT today. In fact, didn't they state the other 90nm parts are supposed to last nearly two years or something? The same could hold true for G71.
I'm still not clear on how this would help performance though. How often would a situation arise where the workload would be such that some threads are relatively more math intensive than others (hence freeing up their currently dedicated texture samplers)?
Maybe we read different patents... 'cause I'm not sure I understand your question :)
I'll try to answer you anyway: if you're running vertex and pixel threads on the same machinery vertex threads are likely to be more math intensive (ie few or zero texture fetches) than pixel threads.
IMHO what they describe doesn't require that: as long as you can hide texture fetch latency by issuing instructions from another thread, everything would be OK.
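The latency-hiding argument above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: one ALU that issues one instruction per cycle, threads that each run a fixed number of ALU ops and then wait on a texture fetch, and a made-up fetch latency. None of these numbers or names come from the patent or from any real GPU.

```python
def alu_utilization(num_threads, alu_ops_per_fetch, fetch_latency,
                    total_cycles=10_000):
    """Fraction of cycles in which the single ALU issues an instruction.

    Each thread repeats: `alu_ops_per_fetch` ALU ops, then one texture
    fetch that stalls that thread for `fetch_latency` cycles.
    """
    ready_at = [0] * num_threads                   # cycle when a thread may issue again
    ops_left = [alu_ops_per_fetch] * num_threads   # ALU ops before its next fetch
    busy_cycles = 0
    for cycle in range(total_cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                # Issue one ALU op from the first thread that is not stalled.
                busy_cycles += 1
                ops_left[t] -= 1
                if ops_left[t] == 0:
                    # Thread now waits on its texture fetch.
                    ready_at[t] = cycle + 1 + fetch_latency
                    ops_left[t] = alu_ops_per_fetch
                break
    return busy_cycles / total_cycles
```

With a single thread the ALU idles for the whole fetch; with enough threads in flight another thread always has work ready, so the fetch latency is hidden without the threads needing different math/texture mixes.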
SugarCoat
06-Dec-2005, 23:49
No, they're not. But right now i don't see the point for NV to make _any_ new high-end G7x for 1H06 since they said that they'll have G80 in 1H06 already.
So G71 at 90nm is probably either a cost-cutting 90nm version of G70 (in the same way how NV42 was just a 110nm version of NV41) or even something smaller than that -- something with 16PP and 256-bit memory bus for sub-$300 price range.
As for R580 i'm not sure that NV has to put anything against it at all -- G70-512 might be enough for that.
On the other hand it would be good for them to have some backup G7x-plan in case of problems with the new G80 architecture... So it's all fog and mystery right now :)
Agree with you about there being no need for another high-end refresh part in the G7x series. As far as the G70 architecture is concerned, by early next year I think it's a success story that has completed its run, case closed. The next high-end cards should be changed significantly architecturally. Nvidia should find little benefit in a die shrink and further continuation of the current architecture. Whatever is to come next has been in the works for a while, though I don't believe it to be the native DX10 launch card from Nvidia. I don't think we'll even see any SM4.0 cards on display until CeBIT.
trinibwoy
07-Dec-2005, 01:00
Maybe we read different patents... 'cause I'm not sure I understand your question :)
I'll try to answer you anyway: if you're running vertex and pixel threads on the same machinery vertex threads are likely to be more math intensive (ie few or zero texture fetches) than pixel threads.
IMHO what they describe doesn't require that: as long as you can hide texture fetch latency by issuing instructions from another thread, everything would be OK.
Yeah we read the same patent :) I was thinking more along the lines of texture sampler resources being arbitrated among pixel shaders - didn't think about the vertex shader aspect (or unified shaders for that matter).
bloodbob
07-Dec-2005, 01:18
Okay, you don't seem to be getting what I'm talking about.
You have texture A, your scene in which you rendered everything at FP16. Now it's time to do post-processing. You want to create texture B, the bloom texture that is reduced size and will be enlarged to the whole screen. Judging by the SS2 images, this buffer is about 128x128 (with good reason as bloom is a low frequency function). Then you need to create C, the 32-bit backbuffer that will be output to the display.
You create a mip-map chain of A to get the final 1x1 level scene average value that you can use to determine exposure. Still, this is a FP16 value. (FP filtering does not help you here practically, I guarantee it. You need 40 bytes of data access per pixel here, so the shader pipes will have plenty of time to burn.)
Now you take your first or second mipmap of A (for performance purposes), which are in FP16, and you do the horizontal and vertical gaussian blur passes. You can use whatever weighting or threshold value that you want, so white text won't have a halo. In the final pass, however, output to a 32-bit texture. This is B, and you can do whatever you want, including accessing the 1x1 mipmap of A, to determine how you want the bloom to look in the final framebuffer.
Now it's time to create C. You have FP16 data in A, so you tone-map it, then add B. All done. There's no lost control or capability in doing it this way, unless you can prove me wrong.
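The A/B/C steps quoted above can be sketched roughly as follows. This is a simplified, single-channel illustration rather than the game's actual shader code: the Reinhard-style tone-mapping operator and the `key` value are assumptions (the thread does not name an operator), images are square power-of-two lists of floats, the scene average is assumed to be non-zero, and the bloom texture B is assumed to be already blurred and scaled to the size of A.

```python
def downsample(image):
    """Halve a square, power-of-two image with a 2x2 box filter (one mip step)."""
    half = len(image) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(half)
        ]
        for y in range(half)
    ]


def scene_average(image):
    """Walk the mip chain down to 1x1 to get the average scene value."""
    while len(image) > 1:
        image = downsample(image)
    return image[0][0]


def tone_map(value, average, key=0.18):
    """Assumed Reinhard-style operator driven by the scene average."""
    scaled = key * value / average
    return scaled / (1.0 + scaled)


def compose(scene_a, bloom_b):
    """Build C: tone-map A using its 1x1 average, then add the bloom texture B."""
    average = scene_average(scene_a)
    return [
        [min(1.0, tone_map(a, average) + b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(scene_a, bloom_b)
    ]
```

Note that nothing here needs filtered reads from a floating-point texture: the averaging is done explicitly by `downsample`, which is the point being argued in the quoted post.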
Yep, okay, that works.
So okay, as far as the results go, the ATI should be getting a speed advantage as it's doing 75% data transfer for the texture lookups for the bloom texture. So if they fixed it and enabled shader filtering for the bloom texture, the only problem would be that the Nvidia cards get an advantage because they use faster memory and we are doing 4x the necessary data transfer for the bloom texture?
As for R580 i'm not sure that NV has to put anything against it at all -- G70-512 might be enough for that.
Only if Nvidia can actually produce the cards and get them into consumer channels. They are currently nowhere to be found.
Chalnoth
07-Dec-2005, 02:49
Only if Nvidia can actually produce the cards and get them into consumer channels. They are currently nowhere to be found.
You could pre-order one. The things are just in too high demand right now, despite the exorbitant price. Newegg lists one with an ETA of 12/15, which isn't horribly bad for a pre-order.
You could pre-order one. The things are just in too high demand right now, despite the exorbitant price. Newegg lists one with an ETA of 12/15, which isn't horribly bad for a pre-order.
Except that NewEgg doesn't do pre-orders...
Dave Baumann
07-Dec-2005, 03:02
Desist from the availability discussions in this thread please, there are (countless) others that are appropriate.
mrcorbo
07-Dec-2005, 04:16
Have you even looked at the numbers or are you just talking out of your ass? We discussed this
So I ask you again, where did you get your numbers? It seems your claims are full of shit.
I just love the irony of you accusing me of talking out of my ass and being full of shit and then proceeding to attempt to prove your point (whatever that was, because you seemed to be arguing a different one by the end of your rant) using extrapolated performance numbers and guesses. Priceless.
Ailuros
07-Dec-2005, 05:02
I'm not aware of circumstances on recent parts where we treat any texture stages differently from any others - all texture stages are treated exactly the same. With CatalystAI enabled the level of optimisations may vary, but only because of texture content, not the particular sampler.
We did have the old quality mode when anisotropic filtering was forced that did trilinear on stage 0 and bilinear on the other stages, which was based on an analysis of applications available at that time and the prevalence of the idea of a 'base texture' - sampler 0 always tended to have the most important texture data. That's a long time ago now, and I don't think that method has ever been applied on our more recent parts with more advanced filtering optimisation modes - with modern shaders you really have no idea typically which sampler has what data, so treating stages differently makes no sense.
I'm really not sure where this "half-assed brilinear on one stage and the rest getting plain ole bilinear and that goes for both IHVs" statement comes from - how carefully have you looked into how current filtering optimisations work?
I haven't, and I plead guilty in that department until I finally have the time to dedicate to it. It is a subject I'm willing to investigate in the foreseeable future, to find out what is really going on with filtering optimizations, but even then it won't be that easy, because there aren't that many sophisticated texel analyzing applications available that would help a layman like me come to accurate findings.
Trouble is that some amount of shimmering can be found in some applications with optimisations enabled, and what that tells me is that some sort of underfiltering is going on.
Ailuros
07-Dec-2005, 05:04
No, they're not. But right now i don't see the point for NV to make _any_ new high-end G7x for 1H06 since they said that they'll have G80 in 1H06 already.
So G71 at 90nm is probably either a cost-cutting 90nm version of G70 (in the same way how NV42 was just a 110nm version of NV41) or even something smaller than that -- something with 16PP and 256-bit memory bus for sub-$300 price range.
As for R580 i'm not sure that NV has to put anything against it at all -- G70-512 might be enough for that.
On the other hand it would be good for them to have some backup G7x-plan in case of problems with the new G80 architecture... So it's all fog and mystery right now :)
That's a totally different perspective and one that does make sense though. What I'm unsure about is whether the 512 GTX is really sufficient up against R580.
Ailuros
07-Dec-2005, 05:15
I'm using 81.85 and with all the dirty opts disabled filtering still leaves a lot to be desired in some titles.
Fire up richard burns rally and look at the ground (textures) while moving.
Have you determined what the real problem here is? Is it perfectly clean on other hardware or not?
Looking at some 6800gs HDR numbers I think I'm better off with an unlocked gto2, since HDR is virtually unplayable at decent res, so the SM3 goodies don't net anything good for me since I like high res gaming
...and it will be a stop gap card for me.
Better filtering, faster, and TAA :D (transparency aa)
No doubt; but throwing everything you can remember into one pot still doesn't answer the above question.
Have you determined what the real problem here is? Is it perfectly clean on other hardware or not?
No doubt; but throwing everything you can remember into one pot still doesn't answer the above question.
I don't have access to other hardware besides an 8500 and a 9000, and since those are R200/RV250 they have even shittier filtering, and last I checked they didn't even work in the game :lol:
But looking at screenshots from ATI hardware playing the recently released GT Legends by SimBin (makers of GTR), ATI hardware definitely has an edge, at least in screenshots.
I mentioned that because I'm ready to jump on the PCI Express wagon with a 3000+ and NF4 mobo and some midrange card.
karlotta
07-Dec-2005, 05:47
I don't have access to other hardware besides an 8500 and a 9000, and since those are R200/RV250 they have even shittier filtering, and last I checked they didn't even work in the game :lol:
But looking at screenshots from ATI hardware playing the recently released GT Legends by SimBin (makers of GTR), ATI hardware definitely has an edge, at least in screenshots.
I mentioned that because I'm ready to jump on the PCI Express wagon with a 3000+ and NF4 mobo and some midrange card.
I play GTR and Legends a lot, along with a few other wheel sims.. it's the reason I bought an x1800xt; my x800pe looks better than my friend's 7800gt's (SLI). I was thinking 7800gtx512... but the x1800xt512 was in stock and cheaper...
Sunrise
07-Dec-2005, 05:49
No, they're not. But right now i don't see the point for NV to make _any_ new high-end G7x for 1H06 since they said that they'll have G80 in 1H06 already.
So G71 at 90nm is probably either a cost-cutting 90nm version of G70 (in the same way how NV42 was just a 110nm version of NV41) or even something smaller than that -- something with 16PP and 256-bit memory bus for sub-$300 price range.
As for R580 i'm not sure that NV has to put anything against it at all -- G70-512 might be enough for that.
On the other hand it would be good for them to have some backup G7x-plan in case of problems with the new G80 architecture... So it's all fog and mystery right now :)
Well, if it's a replacement for G70-512 it would still be high-end though, so that's not far from what I'm saying. However, if R580 is powerful enough and ATi delivers it on time (enough that NV needs to address it) before G80 hits the street (mid 2006 at the earliest), they'll need something that still has headroom to clock higher, because otherwise they would lose the advantage they've built up so well this year, which, as I see it, is not a good thing. It's not entirely relevant what codename this one has (G75 was a possibility, too), but based on past experience with internal codenames (even more so since NV switched from IBM -> TSMC), they won't tell you anything.
We may rethink that after R580, but it's always nice to talk about it.
I play GTR and Legends a lot, along with a few other wheel sims.. it's the reason I bought an x1800xt; my x800pe looks better than my friend's 7800gt's (SLI). I was thinking 7800gtx512... but the x1800xt512 was in stock and cheaper...
What kinda monitor ya got?
That kinda matters, since ATI's gamma-corrected FSAA doesn't work well with LCDs (or CRTs that don't use 2.2 gamma?)
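As background to the question above, here is a minimal sketch of what gamma-corrected antialiasing is generally understood to mean: the sub-samples are converted to linear light using an assumed display gamma, averaged, and converted back. The function names and the fixed 2.2 exponent are assumptions for illustration only, not a description of ATI's actual resolve hardware.

```python
GAMMA = 2.2  # assumed display response


def resolve_naive(samples):
    """Average the stored (gamma-encoded) sample values directly."""
    return sum(samples) / len(samples)


def resolve_gamma_corrected(samples, gamma=GAMMA):
    """Average in linear light, then re-encode for the assumed display gamma."""
    linear = [s ** gamma for s in samples]
    mean_linear = sum(linear) / len(linear)
    return mean_linear ** (1.0 / gamma)
```

For an edge pixel half covered by black and half by white, the two resolves give noticeably different values, and the gamma-corrected one is only "right" if the display really follows the assumed curve, which is the concern raised about LCDs.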
dizietsma
07-Dec-2005, 12:02
Two cores on one package is quite possibly the stupidest thing nVidia could possibly do. One core with twice the pipelines would have fewer transistors and higher performance.
But you're assuming your bigger 32 pipe chip overclocks just as well as the 2 simple smaller chips? With twice as many smaller chips you have far more opportunity for binning as well.
We've already got 2 cores on two separate cards and two cores in separate packages on one card, so I do not see the problem with having two cores on one package; it is cheaper and simpler than either as well. Not sure why you think it is the stupidest thing they could do.
Yeah we read the same patent :) I was thinking more along the lines of texture sampler resources being arbitrated among pixel shaders - didn't think about the vertex shader aspect (or unified shaders for that matter).
Well..that patent addresses unified shading too ;)
But you're assuming your bigger 32 pipe chip overclocks just as well as the 2 simple smaller chips? With twice as many smaller chips you have far more opportunity for binning as well.
We've already got 2 cores on two separate cards and two cores in separate packages on one card, so I do not see the problem with having two cores on one package; it is cheaper and simpler than either as well. Not sure why you think it is the stupidest thing they could do.
As good as SLI is, it doesn't always give a 100% performance increase. But a bigger chip with double the pipelines and similar clocks will give close to a 100% performance increase :wink:
trinibwoy
07-Dec-2005, 12:31
Well..that patent addresses unified shading too ;)
Oh my bad, I quoted you incorrectly. I meant to refer to http://v3.espacenet.com/origdoc?DB=EPODOC&IDX=WO2005093665&F=0&QPN=WO2005093665 REGISTER BASED QUEUING FOR TEXTURE REQUESTS.
And you're right, it does make reference to different thread "types". So I guess the expectation is that we won't see decoupled TMUs before unification?
How do R520's samplers get allocated? According to Dave, 4 are assigned to each quad. So can any thread use any TMU within that group, or is it still 1:1?
Although the earlier pipeline diagram indicates a texture sampler array, all the texture units are not re-allocatable to different pipelines, instead 4 are dedicated to each of the quads.
I do not see the problem with having two cores on one package; it is cheaper and simpler than either as well. Not sure why you think it is the stupidest thing they could do.
You could fry eggs on that chip and I don't even want to know what kind of crazy electromagnetic behaviour the thing would have.
As good as SLI is, it doesn't always give a 100% performance increase. But a bigger chip with double the pipelines and similar clocks will give close to a 100% performance increase :wink:
I think that there will be an engineering way around this relatively soon whereby they can package separate chips after the fact into essentially one chip.
Oh my bad, I quoted you incorrectly. I meant to refer to http://v3.espacenet.com/origdoc?DB=EPODOC&IDX=WO2005093665&F=0&QPN=WO2005093665 REGISTER BASED QUEUING FOR TEXTURE REQUESTS.
I told you we were talking about different patents! :D
And you're right, it does make reference to different thread "types". So I guess the expectation is that we won't see decoupled TMUs before unification?
Who knows?
If the first unified shading architecture by nvidia is 6-8 months away I don't think we're going to see decoupled TMUs on some G70 variation.
Even though NVIDIA could decide to still use arithmetic ALUs as address generators on a hypothetical multithreaded-unified architecture.
Given the increasing ALU ops/TEX ops ratio maybe it makes sense to keep it that way.
JoshMST
07-Dec-2005, 16:36
I think that there will be an engineering way around this relatively soon whereby they can package separate chips after the fact into essentially one chip.
Ask Intel how well their two dies on one package operating at high speed work. Sure, they get better yields per smaller die, but there are a lot of issues with that substrate.
Now, of course NVIDIA and ATI won't be releasing a chip running at 3.2 GHz, but when you consider how fast the data needs to move, as well as the power requirements for each chip (even if they are simpler), it makes designing the substrate a nightmare. It is a lot easier for these guys to slap two complete chips onto one PCB, as they have a lot more flexibility in terms of power and space. I just don't see the graphics guys going in that direction anytime soon. The engineering hurdles are just a bit too big now for any kind of potential gains they will see.
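The "better yields per smaller die" point conceded above can be made concrete with the simple Poisson yield model, in which the chance that a die has no defects falls off exponentially with its area. This is a rough sketch only: the die areas and the defect density below are invented for illustration and are not figures for any real process or chip, and packaging losses (the substrate issues discussed above) are ignored.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Probability that a die of the given area contains zero defects."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


# Hypothetical numbers: one large die versus two dies of half the area each.
BIG_DIE_CM2 = 3.0
DEFECT_DENSITY = 0.5  # defects per square centimetre (made up)

big_die_yield = poisson_yield(BIG_DIE_CM2, DEFECT_DENSITY)
half_die_yield = poisson_yield(BIG_DIE_CM2 / 2.0, DEFECT_DENSITY)
```

Under this model, if the small dies are tested before packaging, the usable share of wafer area is `half_die_yield` rather than `big_die_yield`, so the same silicon produces more sellable pipelines. Whether that gain survives the packaging cost is exactly what the post above doubts.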
Ailuros
07-Dec-2005, 17:20
As good as SLI is, it doesn't always give a 100% performance increase. But a bigger chip with double the pipelines and similar clocks will give close to a 100% performance increase :wink:
Honest question: a real time current example that would prove the above assumption?
JoshMST
07-Dec-2005, 17:52
Honest question: a real time current example that would prove the above assumption?
The easiest way to check is to compare 2 x 6800 GS to a 7800 GTX and see how that holds up. I would do it for you, but I don't have the 6800 GS's at the moment.
The easiest way to check is to compare 2 x 6800 GS to a 7800 GTX and see how that holds up. I would do it for you, but I don't have the 6800 GS's at the moment.
In Razor's post he said "..and similar clocks will..". Adding more pipes without more memory bandwidth will not give you 100% in most cases, assuming you follow his post to a T, as he said nothing about memory bandwidth :)
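The bandwidth caveat can be shown with a very crude bottleneck model: a frame takes as long as the slower of its shader work and its memory traffic. This is a sketch under heavy assumptions (one shader op per pipe per clock, perfect overlap of compute and memory, invented workload numbers) and is not a model of any actual card.

```python
def frame_time(shader_ops, pipes, clock_hz, bytes_moved, bandwidth_bytes_per_s):
    """Frame time in seconds, limited by whichever resource is slower."""
    compute_time = shader_ops / (pipes * clock_hz)  # assumed: 1 op per pipe per clock
    memory_time = bytes_moved / bandwidth_bytes_per_s
    return max(compute_time, memory_time)


def speedup_from_doubling_pipes(shader_ops, pipes, clock_hz,
                                bytes_moved, bandwidth_bytes_per_s):
    """Ratio of frame times before and after doubling the pipe count."""
    before = frame_time(shader_ops, pipes, clock_hz,
                        bytes_moved, bandwidth_bytes_per_s)
    after = frame_time(shader_ops, 2 * pipes, clock_hz,
                       bytes_moved, bandwidth_bytes_per_s)
    return before / after
```

With a light memory load the doubling yields close to 2x; as the per-frame traffic grows toward what the unchanged memory bus can deliver, the gain shrinks toward 1x.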
JoshMST
07-Dec-2005, 19:17
In Razor's post he said "..and similar clocks will..". Adding more pipes without more memory bandwidth will not give you 100% in most cases, assuming you follow his post to a T, as he said nothing about memory bandwidth :)
Heh, good point. Well then, I guess the person with the 6800 GS's will have to lower the memory speed to match the bandwidth then.
Honest question: a real time current example that would prove the above assumption?
Sorry, just assumed memory will be clocked higher :wink:
I think that there will be an engineering way around this relatively soon whereby they can package separate chips after the fact into essentially one chip.
That's a possibility.
dizietsma
07-Dec-2005, 19:40
The easiest way to check is to compare 2 x 6800 GS to a 7800 GTX and see how that holds up. I would do it for you, but I don't have the 6800 GS's at the moment.
http://www.motherboards.org/reviews/hardware/1564_6.html
http://www.motherboards.org/reviews/hardware/1564_7.html
Of course the XFX runs at 485/1100 but then if you get a 6800GS that runs under 500/1200 you are very unlucky.
Mine run 520/1250+
Andrew Lauritzen
07-Dec-2005, 20:11
Now you take your first or second mipmap of A (for performance purposes), which are in FP16, and you do the horizontal and vertical gaussian blur passes. You can use whatever weighting or threshold value that you want, so white text won't have a halo. In the final pass, however, output to a 32-bit texture. This is B, and you can do whatever you want, including accessing the 1x1 mipmap of A, to determine how you want the bloom to look in the final framebuffer.
I agree that there's really no need for floating point filtering, as the downsampling/upsampling can be performed just as easily with mipmaps and/or simple shader instructions. Performance shouldn't be very different in either case.
That said, there are legitimate uses for floating point texture filtering that cannot be easily done without it (particularly cases that use anisotropic filtering - something that is typically difficult and expensive to emulate with shader code). I'm still a little disappointed that ATI's current generation of cards doesn't have support...
Chalnoth
07-Dec-2005, 20:24
One obvious use for FP filtering would be HDR lightmaps.
bloodbob
07-Dec-2005, 20:37
One obvious use for FP filtering would be HDR lightmaps.
And environment maps.
Chalnoth
07-Dec-2005, 20:48
And environment maps.
Oh, yes, and sky boxes.
Mintmaster
08-Dec-2005, 01:41
That said, there are legitimate uses for floating point texture filtering that cannot be easily done without it (particularly cases that use anisotropic filtering - something that is typically difficult and expensive to emulate with shader code). I'm still a little disappointed that ATI's current generation of cards doesn't have support...
Oh don't get me wrong, I never said FP filtering is useless. My claim was solely restricted to the bloom effect that SS2 uses. Not very good programming there.
I agree that it would have been nice if ATI had native FP texture filtering. But you can often make do with 16-bit integer formats when you really need high precision filtering. In light of this, between dynamic branching, FP blending, vertex texturing, and FP filtering, I think the first three are the most important. It's a shame that ATI didn't do the third one either, but R2VB is an acceptable compromise.
R600, please don't let me down :grin:
bloodbob and Chalnoth: For the other HDR uses that you mentioned, Humus has shown that alternatives to strict 4-channel FP16 textures can be higher performing (even for NVidia), more compact, and very nearly as good looking. Not 100% "correct", but for visual purposes more than enough. BUT I agree it would be nice to have real FP16 filtering, more so if it's full speed.
Andrew Lauritzen
08-Dec-2005, 02:26
bloodbob and Chalnoth: For the other HDR uses that you mentioned, Humus has shown that alternatives to strict 4-channel FP16 textures can be higher performing (even for NVidia), more compact, and very nearly as good looking. Not 100% "correct", but for visual purposes more than enough.
Alright, well Variance Shadow Maps (http://gg.anxiousheart.net/andrew/vsm/) (shameless plug ;)) benefit greatly from floating point texture filtering.
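For context on why filtering matters for that technique, here is a minimal sketch of a variance shadow map lookup as the method is commonly described: the map stores depth and depth squared, those two moments are filtered like any texture, and a one-sided Chebyshev bound turns them into a visibility estimate. The function names and the `min_variance` clamp value are choices made for this example, not taken from the linked page.

```python
def filtered_moments(depths):
    """Average depth and depth^2 over a footprint, as texture filtering would."""
    n = len(depths)
    first = sum(depths) / n
    second = sum(d * d for d in depths) / n
    return first, second


def vsm_visibility(moment1, moment2, receiver_depth, min_variance=1e-6):
    """Upper bound on the fraction of the footprint that does not occlude the receiver."""
    if receiver_depth <= moment1:
        return 1.0  # receiver is in front of the average occluder: fully lit
    variance = max(moment2 - moment1 * moment1, min_variance)
    delta = receiver_depth - moment1
    return variance / (variance + delta * delta)
```

Because the moments are filtered linearly before the comparison, the quality of the result depends directly on filtering high-precision (typically floating point) textures, which is why the lack of FP filtering hurts here.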
Also, to a certain extent one could argue that most things can be done with fixed point math and other tricks, but that's really beside the point. At some point one has to start factoring in quality of output, ease of implementation, robustness, maintainability, etc. Sure, it was possible to get most things done before CPUs had floating point units as well, but it wasn't fun, fast, particularly precise or easy :)
That said, sorry if I misinterpreted your comment about bloom.
I guess the most annoying thing to me is that I always sort of assumed that ATI's new generation of cards would meet the features of the competition and perhaps expand on them a bit. As it turns out, it doesn't quite meet them, and the extra features added instead I can count on one hand. Don't get me wrong, I still like the cards overall (my X1800XL arrives in a few days! Time for dynamic control flow fun!), but they aren't "hands-down better" choices from a feature point of view like I thought they would be.
Mintmaster
08-Dec-2005, 02:49
You're right, and I totally agree with you. Quality and features are important. I was distraught with the lack of at least I16 blending in R4xx, and now the lack of VTF in R520 (but less so due to R2VB).
But dynamic branching on R520 is way beyond what I thought either IHV would bother doing on a GPU. It was pretty ballsy, and IMO it will cost ATI's bottom line. Design is all about compromise, though, and I think it's great that both G70 and R520 have different strengths from a development point of view. If both IHVs came to the same consensus, then innovation would have suffered.
Regarding my statements above, I was talking about FP16 for HDR purposes only (as I explicitly stated). Even on G70, FP filtering isn't full speed (AFAIK, anyway), so it's worthwhile to do what I was saying. But real FP filtering is useful for cases like yours, and hacks don't always exist.
BTW, excellent idea in your paper.
bloodbob
08-Dec-2005, 02:59
I agree that there's really no need for floating point filtering, as the downsampling/upsampling can be performed just as easily with mipmaps and/or simple shader instructions. Performance shouldn't be very different in either case.
That said, there are legitimate uses for floating point texture filtering that cannot be easily done without it (particularly cases that use anisotropic filtering - something that is typically difficult and expensive to emulate with shader code). I'm still a little disappointed that ATI's current generation of cards doesn't have support...
Yes yes, just use mip-mapping. Of course we would implement bilinear texture filtering to do the down-filtering, but then we just won't let the developers use it anywhere else. Great use of transistors.
There's a pretty big overlap between mip-mapping and bilinear filtering; it's a waste of transistors if you do it that way, and you get less freedom when playing with non-power-of-2 and non-square textures.
bloodbob and Chalnoth: For the other HDR uses that you mentioned, Humus has shown that alternatives to strict 4-channel FP16 textures can be higher performing (even for NVidia), more compact, and very nearly as good looking. Not 100% "correct", but for visual purposes more than enough. BUT I agree it would be nice to have real FP16 filtering, more so if it's full speed.
Yeah, Humus' method also doesn't support blending. Who needs HDR blending anyway? I don't know why ATI bothered implementing it; it is a waste of transistors, like texture filtering.
Also, add a good splash of pure black pixels to the frame buffer and use your mip-map chain to minify it. See how that works for you. (The "not 100% correct" can start becoming "not even 1% correct": (2^-16 + 2^16)/2 != 1.)
Now I've got a question for you.
If using shaders for FP bilinear filtering was as good as using fixed function pipelines, why doesn't ATI support bilinear filtering? Because they could emulate the lookups when the driver sends the code to the chip. After all, they have been emulating fixed-function stuff with shaders since the R300.
PS: Humus' DXTC RGB + L16 exp texture is really good for many things, such as real photos, and is a lot better if you manually calculate the mip-map chain correctly.
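One reading of the (2^-16 + 2^16)/2 remark above is that filtering a stored exponent is not the same as filtering the decoded value. The sketch below is only an illustration of that arithmetic: it assumes a hypothetical encoding where each texel stores a bare power-of-two exponent, which is a simplification and not necessarily how Humus' demo packs its data.

```python
# Two texels whose true (linear) brightness is 2^-16 and 2^16.
# Assumption for illustration: each texel stores only its exponent.
exponents = [-16, 16]

# What a correct filter should produce: the mean of the decoded values.
true_mean = sum(2.0 ** e for e in exponents) / len(exponents)

# What you get if the stored exponents are averaged first
# and the result is decoded afterwards.
filtered_exponent = sum(exponents) / len(exponents)
decoded_after_filtering = 2.0 ** filtered_exponent

print(true_mean)                # about 32768
print(decoded_after_filtering)  # 1.0
```

The gap between the two results is largest exactly where very dark and very bright texels sit next to each other, which is the "splash of pure black pixels" case.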
ToxicTaZ
08-Dec-2005, 04:04
Nvidia (G71) GeForce7 7900 Ultra @750MHz vs ATI (R580) Radeon X1900 @550MHz
http://www.hkepc.com/bbs/viewthread.php?tid=517255
Skrying
08-Dec-2005, 04:09
Uhh, that doesn't help me in any way. I mean, I'm seeing basically nothing.
Mintmaster
08-Dec-2005, 04:26
Holy crap bloodbob, way to take my statements out of context! Chill out and read what I'm saying first.
Yes yes, just use mip-mapping. Of course we would implement bilinear texture filtering to do the down-filtering, but then we just won't let the developers use it anywhere else. Great use of transistors.
Andy was just agreeing with me strictly within the confines of the Serious Sam 2 bloom effect. No mipmapping is needed there. It's a simple fullscreen quad with the low-res texture on it, and hence magnification is done. Read that post more carefully. I fully stand by my statements regarding SS2. A FP bloom texture is next to useless even on NVidia's hardware.
Yeah, Humus' method also doesn't support blending. Who needs HDR blending anyway? I don't know why ATI bothered implementing it; it is a waste of transistors, like texture filtering.
Blending? WTF are you talking about? We're talking about filtering source textures here (env maps, sky boxes, lightmaps, remember?). If you're specifically talking about dynamically rendered environment maps for reflections that use blending in their rendering, then fine, you're right. 16-bit integer won't get you as nice results. Because of this specific case, ATI's newest architecture is a complete mistake and they should have just copied NV40. :roll:
If using shaders for FP bilinear filtering was as good as using fixed function pipelines, why doesn't ATI support bilinear filtering?
I never said anything of the sort. I said SS2 didn't need an FP bloom texture for any reason, and I said I16 is a compromise that'll be acceptable (but not ideal) for HDR applications only. I never said anything about using shaders for filtering, only for gaussian blurring and downsampling/mipmap creation.
Ailuros
08-Dec-2005, 05:10
sorry, just assumed memory will be clocked higher :wink:
Core complexity is scaling faster IMO than memory bandwidth lately.
Chalnoth
08-Dec-2005, 05:16
Core complexity is scaling faster IMO than memory bandwidth lately.
Sure, but ALU usage is also scaling more quickly than texture usage. Eventually I'm sure that we will be primarily memory bandwidth-limited for texture accesses, but that time is some way off yet.
Andrew Lauritzen
08-Dec-2005, 05:22
Design is all about compromise, though, and I think it's great that both G70 and R520 have different strengths from a development point of view. If both IHVs came to the same consensus, then innovation would have suffered.
That's a good point. Still I find it annoying that any "modern" stuff that I do generally only works 100% (feature-wise) on half of the cards out there. Ah well, I guess it is better to have more stuff to play around with though :)
BTW, excellent idea in your paper.
Thanks - I give credit for the original idea of using moments (because they can be linearly filtered) to William Donnelly. I'm happy with the final results and paper, although there are downsides to the method (light bleeding is the main one).
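For readers unfamiliar with why moments help: because the first two depth moments can be averaged linearly, ordinary texture filtering can be applied to the shadow map, and the lit fraction is then bounded with Chebyshev's one-sided inequality. The sketch below follows the commonly described form of that bound; the function name, the `min_variance` clamp and the sample depths are assumptions for illustration, not taken from the paper's code.

```python
def chebyshev_upper_bound(mean, mean_sq, receiver_depth, min_variance=1e-6):
    """Upper bound on the fraction of the filtered region that is lit."""
    if receiver_depth <= mean:
        return 1.0  # receiver is in front of the average occluder
    # Variance from the two filtered moments, clamped to avoid division by zero.
    variance = max(mean_sq - mean * mean, min_variance)
    d = receiver_depth - mean
    return variance / (variance + d * d)


# Filtering the moments is just averaging them, which is why ordinary
# bilinear / mipmap / aniso hardware can be used on the shadow map.
depths = [0.2, 0.2, 0.8, 0.8]                  # hypothetical occluder depths
m1 = sum(depths) / len(depths)                 # E[z]
m2 = sum(z * z for z in depths) / len(depths)  # E[z^2]

print(chebyshev_upper_bound(m1, m2, 0.8))  # receiver behind half the occluders
print(chebyshev_upper_bound(m1, m2, 0.1))  # receiver in front of everything
```

The light bleeding mentioned above comes from the fact that this is only an upper bound: when the depth distribution inside the filter region is wide, the bound can be well above the true lit fraction.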
There's a pretty big overlap between mip-mapping and bilinear filtering; it's a waste of transistors if you do it that way, and you get less freedom when playing with non-power-of-2 and non-square textures.
Actually I was talking more about mipmap LOOKUPS than automatic generation. I don't care if the latter is done automatically by the drivers or not... as you have noted it's pretty trivial to implement oneself. Still it's often handy to have API functions for extremely common tasks, even if they simply wrap the equivalent code.
Mipmap lookups on the other hand require some more specific support. While in theory it can be done totally in the shader (using derivatives and packing the MIP levels into a single texture), it probably wouldn't be fast enough on current hardware. Again, it's also something that shouldn't require custom code.
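As a rough idea of what "using derivatives" involves, the mip level is normally chosen from how fast the texture coordinates change per screen pixel. The sketch below uses the standard log2-of-footprint rule; the function name and parameters are hypothetical, and real hardware adds LOD bias, clamping and anisotropy handling that are left out here.

```python
import math


def mip_level(dudx, dvdx, dudy, dvdy, tex_width, tex_height, num_levels):
    """Pick a mip level from screen-space UV derivatives (isotropic case)."""
    # Convert UV derivatives into texel units.
    footprint_x = math.hypot(dudx * tex_width, dvdx * tex_height)
    footprint_y = math.hypot(dudy * tex_width, dvdy * tex_height)
    rho = max(footprint_x, footprint_y)
    if rho <= 1.0:
        return 0.0  # magnification: stay on the base level
    # Each mip level halves the resolution, hence log2.
    return min(math.log2(rho), float(num_levels - 1))


# A 256x256 texture where one screen pixel covers 4 texels in each direction.
print(mip_level(4 / 256, 0.0, 0.0, 4 / 256, 256, 256, 9))  # 2.0
```

Doing this per fetch in the shader, plus the address math for levels packed into one texture, is the extra cost referred to above.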
Yeah, Humus' method also doesn't support blending. Who needs HDR blending anyway? I don't know why ATI bothered implementing it; it is a waste of transistors, like texture filtering.
I'm not sure what you mean, but if you mean floating point framebuffer blending, that's one that certainly isn't a waste! Have you ever written the code to double-buffer accumulate - for example - lighting contributions from deferred shading? I'll summarize: it's slow, doubles memory requirements, remarkably CPU-intensive and just downright annoying. As long as we don't have read access to the framebuffer in the fragment shader, I'll continue using framebuffer blending thanks :)
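To make the double-buffer complaint concrete, here is a toy CPU-side model of the two approaches. It is only a sketch: plain lists stand in for render targets, and the function names are made up for illustration. It shows why the ping-pong version needs a second full-size buffer and a swap per pass, while additive blending accumulates in place.

```python
def accumulate_with_blending(width, light_passes):
    """One buffer; each pass is added in place (ONE/ONE additive blend)."""
    framebuffer = [0.0] * width
    for contribution in light_passes:
        for i, value in enumerate(contribution):
            framebuffer[i] += value
    return framebuffer


def accumulate_ping_pong(width, light_passes):
    """Two buffers; each pass reads the previous result as a texture."""
    src = [0.0] * width  # previous result, bound as a texture
    dst = [0.0] * width  # current render target
    for contribution in light_passes:
        for i, value in enumerate(contribution):
            dst[i] = src[i] + value
        src, dst = dst, src  # swap roles for the next pass
    return src


passes = [[0.5, 2.0], [1.5, 0.25], [4.0, 0.0]]  # hypothetical light contributions
print(accumulate_with_blending(2, passes))  # [6.0, 2.25]
print(accumulate_ping_pong(2, passes))      # [6.0, 2.25]
```

Both give the same image; the difference is the doubled memory and the per-pass target switch, which is where the CPU overhead comes from.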
If using shaders for FP bilinear filtering was as good as using fixed function pipelines, why doesn't ATI support bilinear filtering? Because they could emulate the lookups when the driver sends the code to the chip. After all, they have been emulating fixed-function stuff with shaders since the R300.
All the power to them! I think it'd be great if they supported filtering in this manner. Indeed it'd be neat if I could read four unrelated texture locations and have it cost the same as a single bilinear fetch (cache-considerations forgotten for a second). That said since anisotropic filtering and mipmapping are part of the API, it should really be handled by the driver... what it does makes no difference to me, as long as the result is correct and reasonably fast. The problem with implementing more complex filter kernels like aniso is that they end up being complicated to program and more importantly, SLOW! I wouldn't even mind the complexity issue too much if the resulting code ran as fast as hardware aniso.
So don't get me wrong: I think it'd be great if every piece of hardware in the GPU was programmable, AS LONG AS they can make it fast and accessible. At the moment, floating point texture filtering is neither on ATI cards.
Ailuros
08-Dec-2005, 05:26
Sure, but ALU usage is also scaling more quickly than texture usage. Eventually I'm sure that we will be primarily memory bandwidth-limited for texture accesses, but that time is some way off yet.
Think of single vs. dual core (on a single die) for any given timeframe and the target to reach 100% scalability on the latter. These factors may eventually change in the less foreseeable future with wider buswidths or different memory modules, but right now from a pure theoretical standpoint and exclusively for the high end segment I don't see it being possible.
bloodbob
08-Dec-2005, 05:55
Actually I was talking more about mipmap LOOKUPS than automatic generation. I don't care if the latter is done automatically by the drivers or not... as you have noted it's pretty trivial to implement oneself. Still it's often handy to have API functions for extremely common tasks, even if they simply wrap the equivalent code.
So would you care if, every frame, the video card downloaded it to the CPU, created the mip-map chain on the CPU, uploaded it back, and then you sampled from the lowest mip-map to do the tone mapping?
How the mip-map chain is generated is important.
Well, if it's not automatically generated, how are you going to generate it? The most common method would be to render at 1/2 resolution and do a bilinear sample between 4 pixels at a time. Hey, you shouldn't even need a shader to do it. However, since there is no bilinear filtering, you have to go off and do it yourself via shaders. We are doing this every frame via shaders on FP16 textures, and you still don't care about how it's done?
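The half-resolution pass described above amounts to a 2x2 box filter repeated until a 1x1 level remains, and that last level is the scene average used for tone mapping. Here is a minimal sketch, assuming a square power-of-two single-channel image held as nested lists; the function names are hypothetical.

```python
def downsample_2x2(image):
    """Average each 2x2 block into one texel (what one bilinear tap would do)."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def build_mip_chain(image):
    """Repeat the 2x2 reduction until a 1x1 level remains."""
    chain = [image]
    while len(chain[-1]) > 1:
        chain.append(downsample_2x2(chain[-1]))
    return chain


scene = [
    [0.0, 0.0, 4.0, 4.0],
    [0.0, 0.0, 4.0, 4.0],
    [8.0, 8.0, 16.0, 16.0],
    [8.0, 8.0, 16.0, 16.0],
]
chain = build_mip_chain(scene)
print(chain[1])   # [[0.0, 4.0], [8.0, 16.0]]
print(chain[-1])  # [[7.0]], the average of the whole scene
```

Without hardware filtering for the format, the four reads and the average have to be written out in the shader, which is the point being argued.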
Mipmap lookups on the other hand require some more specific support. While in theory it can be done totally in the shader (using derivatives and packing the MIP levels into a single texture), it probably wouldn't be fast enough on current hardware. Again, it's also something that shouldn't require custom code.
I'm not sure what you mean, but if you mean floating point framebuffer blending, that's one that certainly isn't a waste! Have you ever written the code to double-buffer accumulate - for example - lighting contributions from deferred shading? I'll summarize: it's slow, doubles memory requirements, remarkably CPU-intensive and just downright annoying. As long as we don't have read access to the framebuffer in the fragment shader, I'll continue using framebuffer blending thanks :)
Well, I'm pointing out that Mintmaster's great solution to having HDR without actually having to use FP16 filtering isn't really practical because of all the problems you've just mentioned.
Honest question: a real time current example that would prove the above assumption?
Any non-CPU limited game (if there are any) ;)
Radeon600
08-Dec-2005, 10:29
I don't believe G71 will have eight Quads with a 750MHz Core, but looking at R580 makes me think G71 might have eight Quads at 750MHz after all.
Let's look at it this way.
R580 will have 48 Fragment Processors. Just to get a theoretical Fill Rate, multiplying by even a 400MHz Core Speed gives 19,200. Let's assume G71 has eight Quads and a 750MHz Core, which would give a Fill Rate of 24,000. Now that looks more logical when comparing G71 to R580, although I don't believe R580 will have a Core Clock Speed of 400MHz (that was mere speculation); more like around 500MHz or even 600MHz.
Also, can someone tell me how many Texture Units will be present in G71?
With eight Quads, G71 looks quite the killer to me, considering that both Shader ALUs in each Shader Unit can do MADD operations (more like 64 Fragment Processors?), while R580 will only have 48 MADD ALUs, with the other ALU being more of a 'mini' ALU doing MUL/ADD etc.?
Although I understand that Fill Rate is not so important these days, especially after the introduction of Shader Model 3.0 with Dynamic Branching, and that it's more about how efficient those Shader ALUs are, Fill Rate still matters if we are comparing hardware that both support Shader Model 3.0. And from what I've read about R580, the thread size will be bigger than R520's (48 pixels as opposed to 16 pixels), which gives a slight hint that it -might- not be as efficient as R520 when calculating Shader Operations?
And correct me if I am wrong, but R580 will be based on TSMC's 80nm fabrication process (with low-k?), not 90nm?
And from what I've read about R580, the thread size will be bigger than R520's (48 pixels as opposed to 16 pixels), which gives a slight hint that it -might- not be as efficient as R520 when calculating Shader Operations?
Just curious, what makes you think so?
Radeon600
08-Dec-2005, 11:18
Something to note on the execution unit is that R520 (4 quads) is described as having 512 "threads" in flight at any one time, whilst RV515 (1 quad) has just 128 threads - which basically means that each of these chips have 128 threads allocated per quad. Although RV530 is described as having 12 shader pipelines, it still only has 128 threads in flight at once - in this case each of the 12 shader pipelines exist within a single (quad) "pipeline" and will still operate over three separate quad fragments, but do so with larger batch sizes. For the parts in the R5xx series that have three times the number of shader pipelines to ROP's the batch sizes are 48 (4x12) pixels large, so the efficiency drops a little.
http://www.beyond3d.com/reviews/ati/r520/index.php?p=04
I do believe the Ultra Threaded Dispatch Processor on R580 can also execute up to 512 threads; in this case it's feeding a 'tripled' Quad rather than a single Quad as on R520, so efficiency might drop?
http://www.beyond3d.com/reviews/ati/r520/index.php?p=04
I do believe the Ultra Threaded Dispatch Processor on R580 can also execute up to 512 threads; in this case it's feeding a 'tripled' Quad rather than a single Quad as on R520, so efficiency might drop?
R580 and X1600* have a larger batch size of 4x12 as compared to 4x4 on R520. So R580 should be as efficient as X1600 following that logic. But there should be no further drops IMHO. That should not be a big setback in most cases anyway, 4x12 is also nice.
*can't even recall which codename it was atm, that's what you have with 500 codenames, thanks ATI and NV :oops:
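To put a rough number on the 4x4 versus 4x12 batch discussion, here is a toy model of dynamic branching. It assumes a batch executes every branch side that at least one of its pixels needs, and it treats batches as 1D runs of pixels rather than 2D tiles; the costs and the pixel pattern are made up, so only the trend is meaningful.

```python
def batch_cost(needs_slow, batch_size, fast_cost=1, slow_cost=10):
    """Total cost when a whole batch runs every branch side any pixel needs."""
    total = 0
    for start in range(0, len(needs_slow), batch_size):
        batch = needs_slow[start:start + batch_size]
        per_pixel = 0
        if any(batch):
            per_pixel += slow_cost  # at least one pixel takes the slow path
        if not all(batch):
            per_pixel += fast_cost  # at least one pixel takes the fast path
        total += per_pixel * len(batch)
    return total


# 96 pixels, of which a run of 16 (indices 40..55) needs the slow path.
pixels = [40 <= i < 56 for i in range(96)]

print(batch_cost(pixels, 1))   # 240: ideal per-pixel branching
print(batch_cost(pixels, 16))  # 416: R520-style batches of 16
print(batch_cost(pixels, 48))  # 1056: R580/RV530-style batches of 48
```

With fully coherent branches the batch size makes no difference in this model; the penalty only shows up where a batch straddles a branch boundary.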
Radeon600
08-Dec-2005, 14:04
So R580 should be as efficient as X1600 following that logic.
I see your point, Sir, but as Dave mentioned in his article, efficiency with the smaller batch size (4x4) is better than with the bigger batch size. Also, there are 16 render back-ends, a ratio of 1:3; do you feel 16 ROPs are sufficient for 48 Fragment Processors? To be honest, I am not quite impressed with X1600 performance. Future Catalysts might help, but at the moment it doesn't look too good to me. There should've been more ROPs, I believe: the 6600GT had 8 Fragment Processors for 4 ROPs, but in X1600 it's 12 to 4. Kinda low, don't you think?
Ailuros
08-Dec-2005, 14:34
Any non-CPU limited game (if there are any) ;)
The answer is still negative.
Ailuros
08-Dec-2005, 14:45
R580 will have 48 Fragment Processors. Just to get a theoretical Fill Rate, multiplying by even a 400MHz Core Speed gives 19,200.
19200 of what? Assuming a 650MHz core frequency I can see 10400 MPixels and 10400 MTexels/sec theoretical fill-rate. 16 ROPs, 16 TMUs and 48 ALUs.
RV530 has 4 TMUs only and the lack of fill-rate shows mostly in older dx7.0+ games.
As for the rest I don't see a 512XT being over 30% faster than a X1800XT, despite the first having 13.2GTexels/sec and the latter 10GTexels/sec of raw fill-rate.
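The figures in this exchange are all units-per-clock times core clock, so the disagreement is about which unit count to multiply. The sketch below reproduces them; the 24 TMUs at 550MHz for the 7800 GTX 512 and 16 TMUs at 625MHz for the X1800XT are assumptions chosen because they match the 13.2 and 10 GTexels/sec quoted above.

```python
def fill_rate_mega_per_sec(units_per_clock, core_mhz):
    """Theoretical fill rate in millions of pixels (or texels) per second."""
    return units_per_clock * core_mhz


# Radeon600's numbers: these multiply by ALU/pipe counts.
print(fill_rate_mega_per_sec(48, 400))  # 19200
print(fill_rate_mega_per_sec(32, 750))  # 24000

# Ailuros' numbers: 16 ROPs / 16 TMUs at an assumed 650MHz R580.
print(fill_rate_mega_per_sec(16, 650))  # 10400

# Assumed configurations matching the quoted GTexels/sec figures.
gtx_512 = fill_rate_mega_per_sec(24, 550)   # 13200
x1800xt = fill_rate_mega_per_sec(16, 625)   # 10000
print(gtx_512 / x1800xt)                    # 1.32
```

The 1.32 ratio is the raw texel-rate gap; the point made above is that measured performance does not have to follow it.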
Radeon600
08-Dec-2005, 15:35
Yes, you are right, thanks for the clarification. Although I still believe there should be more ROPs on X1600 & R580, that would also increase the number of transistors, resulting in a higher price?
Let's see what G71 is going to be like, but it looks to me like R580 will have tough competition.
Radeon600
08-Dec-2005, 15:43
Ailuros, I would also like to see your comment on my following Paragraph:
Although I understand that Fill Rate is not so important these days, especially after the introduction of Shader Model 3.0 with Dynamic Branching, and that it's more about how efficient those Shader ALUs are, Fill Rate still matters if we are comparing hardware that both support Shader Model 3.0. And from what I've read about R580, the thread size will be bigger than R520's (48 pixels as opposed to 16 pixels), which gives a slight hint that it -might- not be as efficient as R520 when calculating Shader Operations?
I see your point, Sir, but as Dave mentioned in his article, efficiency with the smaller batch size (4x4) is better than with the bigger batch size. Also, there are 16 render back-ends, a ratio of 1:3; do you feel 16 ROPs are sufficient for 48 Fragment Processors? To be honest, I am not quite impressed with X1600 performance. Future Catalysts might help, but at the moment it doesn't look too good to me. There should've been more ROPs, I believe: the 6600GT had 8 Fragment Processors for 4 ROPs, but in X1600 it's 12 to 4. Kinda low, don't you think?
What I implied is just that the batch size in R580 is the same as in RV530, 4x12. The current indications are that R580 = 3 x RV530 + different clocks/memory, but that tells nothing substantial about the expected performance.
Maybe there are some other changes we know nothing about, so I'm careful with any performance estimates.
Do we know for sure how many ROP's we'll have in R580 and what these look like?
EDIT: seeing that the batch size is 48 and we have 48 shader units (fragment processors or however you like it), compared to R520's batch size of 16 and 16 shader units, it might actually be just the same efficiency-wise?
Andrew Lauritzen
08-Dec-2005, 16:16
So would you care if, every frame, the video card downloaded it to the CPU, created the mip-map chain on the CPU, uploaded it back, and then you sampled from the lowest mip-map to do the tone mapping?
How the mip-map chain is generated is important.
I'm sorry but I'm totally missing your argument... I thought you were making the OPPOSITE point in your previous post. In any case, I believe that I did mention that the driver is free to do these things any way that it chooses AS LONG AS it's sufficiently fast and high quality :)
Well, I'm pointing out that Mintmaster's great solution to having HDR without actually having to use FP16 filtering isn't really practical because of all the problems you've just mentioned.
Actually it would work, and indeed Microsoft suggests the same thing (http://msdn.microsoft.com/library/default.asp?url=/archive/en-us/directx9_c_summer_03/directx/graphics/tutorialsandsamples/hdrlighting.asp). Doing a simple downsample on current hardware is very fast (even with floating point textures), and the upsampling is with integer textures, so filtering is supported.
Ailuros
08-Dec-2005, 17:48
Yes, you are right, thanks for the clarification. Although I still believe there should be more ROPs on X1600 & R580, that would also increase the number of transistors, resulting in a higher price?
Let's see what G71 is going to be like, but it looks to me like R580 will have tough competition.
For one, I don't know where this supposed G71 codename came from; that sounds more like a lower end part than anything else.
Why should there be more ROPs on R580? There are 6 quads on G70 and still 16 ROPs and I doubt if there's going to be a followup part from NVIDIA with more quads that it'll have more than 16 ROPs either.
Although I understand that Fill Rate is not so important these days, especially after the introduction of Shader Model 3.0 with Dynamic Branching, and that it's more about how efficient those Shader ALUs are, Fill Rate still matters if we are comparing hardware that both support Shader Model 3.0. And from what I've read about R580, the thread size will be bigger than R520's (48 pixels as opposed to 16 pixels), which gives a slight hint that it -might- not be as efficient as R520 when calculating Shader Operations?
16 pixels/texels per clock for R580 also. You've confused ALUs, ROPs and thread sizes from what it looks to me.
In any case ATI claims "SM3.0 done right" in its whitepapers; I'd call a hybrid between G70 and R520 done right or done better depending on perspective. G70 has the ALU throughput R520 could/should have had and R520 has the PS dynamic branching performance the G70 could/should have had. R580 increases the ALU throughput by a theoretical factor of 3x, thus closing the gap to one advantage of their competition.
VS dynamic branching performance I think goes into NVIDIA's ballpark; how important that could be I'll leave to the professionals because I'd probably come up with a wrong conclusion. IMHO PS dynamic branching should be of higher importance/priority over all.
One thing to look at when comparing architectures or GPUs isn't exclusively theoretical numbers but what comes out at the other end. R520 isn't a "G70-killer" the way I see it, but it still remains a very competitive part with its own exclusive advantages. R580 will obviously build on those advantages and add higher ALU throughput; I don't see why it should end up with a serious disadvantage against the competition.
What I implied is just that the batch size in R580 is the same as in RV530, 4x12. The current indications are that R580 = 3 x RV530 + different clocks/memory, but that tells nothing substantial about the expected performance.
In terms of fragment pipelines, R580 will be four RV530s, excluding the ROPs.
I'm wondering, now, whether R580 might be under-specified for GDDR4. If an 80nm refresh of R580 appears, presumably being the first GDDR4 card, with 80GB/s+ of memory bandwidth, then it makes me think that it won't have enough texture pipes to feed on all that lubberly bandwidth.
A brutish 32-pipe G71, on the other hand, would be in 7th heaven with 80GB/s+. (I haven't worked out the bandwidth of the slowest GDDR4 that's coming - 80GB/s is a bit of a guess.)
The only advantage R580 would gain from GDDR4 is for frame buffer operations. Not that that's trivial (faster AA is always welcome), but it seems like it would be one-sided. Unless R580 gains 40%+ core speed at the same time as it goes to GDDR4. 900MHz sounds pretty unlikely to me.
Maybe there are some other changes we know nothing about, so I'm careful with any performance estimates.
Do we know for sure how many ROP's we'll have in R580 and what these look like?
It appears to me that ATI has not decoupled ROPs as NVidia has done. Therefore it seems extremely likely that ATI is set for 16 ROPs for the high-end until R600 at the earliest.
Jawed
SugarCoat
08-Dec-2005, 18:56
I think GDDR4 for cards is supposed to start at 2400MHz effective.
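For what it's worth, the 80GB/s guess can be checked against that 2400MHz figure. The sketch below assumes a 256-bit memory bus, which is not stated for any GDDR4 card in this thread, and uses the 1700MHz memory mentioned for the 7800 GTX 512 as a comparison point.

```python
def bandwidth_gb_per_sec(bus_width_bits, effective_mhz):
    """Peak memory bandwidth: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz / 1000  # MB/s -> GB/s (decimal)


print(bandwidth_gb_per_sec(256, 2400))  # 76.8, GDDR4 at 2400MHz effective
print(bandwidth_gb_per_sec(256, 1700))  # 54.4, GDDR3 at 1700MHz effective
```

So on a 256-bit bus, entry-level GDDR4 would land slightly under the 80GB/s guess.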
I suppose it's worth mentioning that Xenos does have de-coupled ROPs (the EDRAM :lol: ) so R600 would appear to be a good candidate for de-coupled ROPs - not because of EDRAM, which isn't going to happen - but because the queuing/batching of fragment-pipeline results to feed the ROPs is a pre-requisite of de-coupled ROPs.
Jawed
Don't think this has been posted, but Inquirer is now stating the G71 will be 32 pipe, 90nm card clocked about 700 MHz.
http://www.theinquirer.net/?article=28227
G71 will end up with a new Geforce name, new number and probably GTX suffix but it's still too early to talk about the final name. It will have a very fast core, we still don’t know the number but it should be around 700MHz and it's obviously paired with super fast memory. The memory is not the problem and even Samsung memory running at 1700 MHz+ is not in tight supply.
The chip will have 32 pipelines and it's obviously faster than just arrived and not so available Geforce 7800 GTX 512 cards. It's expected in early February and Nvidia is already playing with these cards. Nvidia is working hard on its 90 nanometre process and its investors are really concerned about it. We figure out that mobile part based around G72 chip is 90 nanometre or at least should be based on that, and Nvidia is in volume production with its C51 chipsets, also 90 nanometre based
Pharma
Ailuros
08-Dec-2005, 21:19
After a dozen attempts and damn close to release he might get it right after all :D
ToxicTaZ
08-Dec-2005, 23:59
Yay! My next card is G71... my 4 pipe FX5900U is getting too slow for me these days, and the 90nm 32 pipe G71 sounds like a great high end upgrade for me this February?
http://www.theinquirer.net/?article=28227
So if the 90nm G71 @750MHz is a 32 pipeline GPU, does this make the RSX @550MHz a 32 pipe GPU now? Or is it a cut down G71 with 24 pipes?
And if the 90nm G71 is now a 32 pipeline GPU, what is the G80 going to be?
80nm G80 @550MHz with 48 pipelines? ...this is my guess!
With my new motherboard and video card combo, it should be a fast spring for me?
MSI 975X Platinum H
http://www.msi-computer.co.jp/975/975X_Platinum_H.html
Skrying
09-Dec-2005, 00:09
Sounds like a rather stupid plan actually. You're basing your thoughts on rumors, bad idea. Then you're going to be buying into Intel? Bad idea. I'd personally suggest you wait, see which card performs better in the games you play, and then buy it and an AMD based motherboard and CPU, yep that sounds good.
Mintmaster
09-Dec-2005, 00:36
I just love the irony of you accusing me of talking out of my ass and being full of shit and then you proceed to attempt to prove your point (whatever that was, because you seemed to be arguing a different one by the end of your rant) using extrapolated performance numbers and guesses. Priceless.
You gave NO evidence whatsoever for your statement. What did you say? "HDR+FSAA isn't worth the time it took to mention it"?
I gave you hard evidence in 3 out of 4 games without extrapolation that the X1800XT is near or above the GTX in HDR performance. Then in Far Cry, the evidence I gave pretty much guarantees I'll be correct within 5%. I'd wager money on my guess if it was possible. If the math is too hard for you to understand, then that's your problem not mine.
We have multiple benchmarks that show the X1800XT takes less than a 25% hit for 4xAA when HDR is enabled.
I challenge you to find ANY benchmark for ANY video card where 4xMSAA causes a bigger % hit at 1024x768 than at 1600x1200. You will fail if you try. Therefore ATI will lose less than 25% at 1024x768.
81fps (http://imageshack.us/?x=my6&myref=http://www.beyond3d.com/forum/showthread.php?t=24394&page=7&highlight=msaa) minus 25% = 61fps. Oh yes, I must be full of shit also. :roll:
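The estimate above is a plain percentage reduction, treating the 25% hit as a worst case. Spelled out as a sketch (the function name is hypothetical):

```python
def fps_after_hit(fps, hit_fraction):
    """Frame rate left after losing a given fraction to AA."""
    return fps * (1.0 - hit_fraction)


print(fps_after_hit(81, 0.25))  # 60.75, i.e. about 61fps as a lower bound
```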
trinibwoy
09-Dec-2005, 00:51
More of the same on Nvidia's 2006 plans - http://www.tgdaily.com/2005/12/07/nvidia_g80/
Mintmaster
09-Dec-2005, 01:33
Well, I'm pointing out that Mintmaster's great solution to having HDR without actually having to use FP16 filtering isn't really practical because of all the problems you've just mentioned.
Bloodbob, I really have no idea what you're ranting on about. Please make some clear sentences so that I can address your concerns. "All the problems Andy just mentioned" were regarding FP blending. I was talking about FP filtering.
Regarding mipmap generation, DX9 and OpenGL both have routines for automatic mipmap generation. ATI supports them AFAIK, so I don't know what you're complaining about.
You said yourself Humus' demo is great for photos, so textures like static skyboxes, environment maps, etc are a perfect fit. That method is a workaround to use I16 filtering (that ATI has but NVidia doesn't) instead of FP16 filtering (which NVidia has but ATI doesn't). But then you talk about how it is incompatible with blending. What in God's name are you talking about? Blending and filtering are different things!
Everything I mentioned helps both IHVs in speed. The only limitation is for dynamic HDR render target textures that also need filtering later on when mapped to another surface. There, create I16 instead of FP16, and in this specific circumstance ATI will have lower dynamic range. One "if statement", no new shaders. Happy?
Earlier, I thought you understood my SS2 argument, but I didn't see this:
So okay, as far as the results go, the ATI should be getting a speed advantage as it's doing 75% data transfer for the texture lookups for the bloom texture. So if they fixed it and enabled shader filtering for the bloom texture, the only problem would be that the Nvidia cards get an advantage because they use faster memory and we are doing 4x the necessary data transfer for the bloom texture?
Once again, the final bloom texture is regular 8-bit per channel. In creating that texture, though, you sample the FP16 scene texture mipmap chain*, and apply a scale or more sophisticated tone map. Only overbright areas survive in this bloom texture. Since this final 128x128 texture is 8 bits per channel, it will be faster to use on both ATI and NVidia, and it will have full speed fixed function filtering on both ATI and NVidia when it is added onto the final 8-bit per channel framebuffer that you send to the monitor.
*The mipmap generation can be either autogenerated or done with a simple shader. This shader is just as bandwidth limited as native FP16 filtering, because there is no texel re-use in mipmap generation. Per pixel, 32-bytes in, 8 bytes out. ATI isn't moving 4x the data.
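The "32 bytes in, 8 bytes out" figure follows from the texel size of a 4-channel FP16 texture and a 2x2 reduction. The sketch below just spells out that arithmetic and adds the size of the final 128x128 bloom texture in both formats; the variable names are only for illustration.

```python
BYTES_PER_FP16_CHANNEL = 2
BYTES_PER_8BIT_CHANNEL = 1
CHANNELS = 4  # RGBA

fp16_texel_bytes = BYTES_PER_FP16_CHANNEL * CHANNELS  # 8 bytes per texel

# One output texel of a mip level reads a 2x2 block of the level above.
bytes_in_per_pixel = 4 * fp16_texel_bytes   # 32
bytes_out_per_pixel = fp16_texel_bytes      # 8

# Size of the final 128x128 bloom texture in each format.
bloom_8bit_bytes = 128 * 128 * CHANNELS * BYTES_PER_8BIT_CHANNEL  # 65536
bloom_fp16_bytes = 128 * 128 * fp16_texel_bytes                   # 131072

print(bytes_in_per_pixel, bytes_out_per_pixel)
print(bloom_8bit_bytes, bloom_fp16_bytes)
```

Since every source texel is read exactly once, a shader doing the four fetches moves the same data as a hardware bilinear tap would.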
Mintmaster
09-Dec-2005, 01:37
More of the same on Nvidia's 2006 plans - http://www.tgdaily.com/2005/12/07/nvidia_g80/
If G80 has a unified shader architecture, wouldn't it be funny if it appears before R600? I'm pretty sure I'll buy it if it does.
SugarCoat
09-Dec-2005, 01:45
If G80 has a unified shader architecture, wouldn't it be funny if it appears before R600? I'm pretty sure I'll buy it if it does.
Funny in that it would be contrary to what Nvidia has said? Not much of a joke. I'm pretty sure we'll see a new highly programmable pipeline setup.
Mintmaster
09-Dec-2005, 01:56
Funny in that it would be contrary to what Nvidia has said? Not much of a joke. I'm pretty sure we'll see a new highly programmable pipeline setup.
Nah, funny because ATI's been talking about this since 2001 (remember the R400 project?) and used it in XB360, but still got beat in the PC market.
Skrying
09-Dec-2005, 02:01
What part of that article hints at a G80 coming early? If anything, the article hints at it being held back due to G71 and the continuation of the G7x series. Also, the article says G71 will not be available till the second half of 2006.
I think some of you may have read it wrong. If anything, this tells me Nvidia is having the same pains as ATi with implementing a high-end chip on 90nm.
trinibwoy
09-Dec-2005, 02:12
What part of that article hints at a G80 coming early? If anything, the article hints at it being held back due to G71 and the continuation of the G7x series. Also, the article says G71 will not be available till the second half of 2006.
Agreed.
I think some of you may have read it wrong. If anything, this tells me Nvidia is having the same pains as ATi with implementing a high-end chip on 90nm.
Now you're guilty of doing exactly what you just accused others of doing. I don't think there has been any indication of delays in Nvidia's 90nm roadmap. It's not like their 110nm parts are currently slower than ATi's 90nm offerings.
Chalnoth
09-Dec-2005, 02:42
Yes, buying Intel right now is pretty stupid. They're more expensive, lower performance, and have higher power consumption (i.e. heat). Maybe Intel will be able to catch up to AMD with their next gen "performance per watt" architecture, but not until then.
Mintmaster
09-Dec-2005, 03:48
Is that a subtle pot-shot at ATI? :smile:
EDIT: Never mind, didn't see the reference to Intel on the previous page. Thought you were referring to trinibwoy's statement.
SugarCoat
09-Dec-2005, 04:15
Now you're guilty of doing exactly what you just accused others of doing. I don't think there has been any indication of delays in Nvidia's 90nm roadmap. It's not like their 110nm parts are currently slower than ATi's 90nm offerings.
But 110nm is pretty much topped out voltage-wise. In my opinion, they really can't hope to squeeze enough performance out of 110nm to go against the R580. And regarding the second-revision G70 chips clocked at 550-600MHz (since that's what I assume you mean by current 110nm not being slower), I wouldn't be too hopeful. The card has become non-existent. So who's winning right now? ATI is able to ship 512MB X1800XTs, while Nvidia has a slower card with half the memory at about 50-70 cheaper at retail (it varies), in high supply, and no middle ground between the 256GTX and 512GTX. Nvidia really screwed themselves with this 512MB launch, and if it's any indication of yields, your statement becomes pretty moot. Nvidia has no faster 110nm parts in decent supply, only a rare handful at a time. They need 90nm, and knowing that, it suggests they are either having problems or have a longer release timetable this time around.
Chalnoth
09-Dec-2005, 04:33
Is that a subtle pot-shot at ATI? :smile:
Me? No.
MulciberXP
09-Dec-2005, 04:33
Is that a subtle pot-shot at ATI? :smile:
only to those with chips on their shoulders :lol:
ondaedg
09-Dec-2005, 04:41
But 110nm is pretty much topped out voltage-wise. In my opinion, they really can't hope to squeeze enough performance out of 110nm to go against the R580. And regarding the second-revision G70 chips clocked at 550-600MHz (since that's what I assume you mean by current 110nm not being slower), I wouldn't be too hopeful. The card has become non-existent. So who's winning right now? ATI is able to ship 512MB X1800XTs, while Nvidia has a slower card with half the memory at about 50-70 cheaper at retail (it varies), in high supply, and no middle ground between the 256GTX and 512GTX. Nvidia really screwed themselves with this 512MB launch, and if it's any indication of yields, your statement becomes pretty moot. Nvidia has no faster 110nm parts in decent supply, only a rare handful at a time. They need 90nm, and knowing that, it suggests they are either having problems or have a longer release timetable this time around.
Heh, maybe you should tell Nvidia's accountants that they "screwed" themselves before they post their financials. :wink:
It is not the 512MB cards that are bringing in the big bucks. It's the 300.00 cards. The high end cards are bragging rights. The GTs and the XLs are the money makers.
Skrying
09-Dec-2005, 04:45
Laugh, your point is semi-right, but thinking that $300 cards bring in the bucks is a mistake. Yes, they bring in more than the $450+ ones, but the real money makers are in the $80~$200 range. The X1300/X1600 and 7200/7600 (when they come) are what make the big money and grab the OEM deals.
You also underestimate the mind share value of high end cards. People will think badly of your range of products if you cannot produce enough of your high end to satisfy demand, to a point.
SugarCoat
09-Dec-2005, 05:14
Heh, maybe you should tell Nvidia's accountants that they "screwed" themselves before they post their financials. :wink:
It is not the 512MB cards that are bringing in the big bucks. It's the 300.00 cards. The high end cards are bragging rights. The GTs and the XLs are the money makers.
analists think Nvidia is being too cocky as far as earnings goals go. Not me.
Radeon600
09-Dec-2005, 06:53
32 pipelines... hmm, then it also has twice the texturing power of R580?
That's not good from ATI's point of view.
trinibwoy
09-Dec-2005, 12:13
But 110nm is pretty much topped out voltage-wise. In my opinion, they really can't hope to squeeze enough performance out of 110nm to go against the R580. And regarding the second-revision G70 chips clocked at 550-600MHz (since that's what I assume you mean by current 110nm not being slower), I wouldn't be too hopeful. The card has become non-existent. So who's winning right now? ATI is able to ship 512MB X1800XTs, while Nvidia has a slower card with half the memory at about 50-70 cheaper at retail (it varies), in high supply, and no middle ground between the 256GTX and 512GTX. Nvidia really screwed themselves with this 512MB launch, and if it's any indication of yields, your statement becomes pretty moot. Nvidia has no faster 110nm parts in decent supply, only a rare handful at a time. They need 90nm, and knowing that, it suggests they are either having problems or have a longer release timetable this time around.
Well a couple things I disagree with there. Firstly, the 256MB GTX core is more than fast enough to run with the XT. Take a look at Xbit's latest roundup - the XT only pulls away a little in Chaos Theory. Where it falls down in comparison to the XT is in bandwidth intensive (AA) scenarios and that has nothing to do with process or core speeds. So yes, on 110nm Nvidia is more than competitive with ATi's high end 90nm parts.
Check out the non-AA (pure speed) rankings - http://www.xbitlabs.com/articles/video/display/games-2005_24.html. Seems the good old 110nm 430MHz G70 is giving the 90nm 650MHz XT quite a run for its money, no?
Secondly, Nvidia didn't screw themselves at all. All of the major review sites are hailing the 512GTX as king. It isn't "in stock" but back-orders are constantly being filled and people are always popping up on forums reporting on their new cards. So the status quo is that Nvidia is on top.
I think it highly unlikely that Nvidia held back on the 110nm G70 core with the GTX only to refresh it with a 90nm version. If that was the plan they would have maxed out the G70 from the start. So no, the 110nm 512GTX by itself is not an indication of delays in the 90nm roadmap.
pjbliverpool
09-Dec-2005, 12:49
What part of that article hints at a G80 coming early? If anything, the article hints at it being held back due to G71 and the continuation of the G7x series. Also, the article says G71 will not be available till the second half of 2006.
I think some of you may have read it wrong. If anything, this tells me Nvidia is having the same pains as ATi with implenmenting a highend chip on 90nm.
No, the article very clearly states that G71 is being launched in the first half of next year, while the G70 series successor is being launched in the second half. The confusion comes from the fact that instead of saying the "G70's successor" they said the "7800's successor".
Here's the exact quote:
the first half of next year will focus on introducing updates for the current GeForce 7 generation - including the 90 nm model G71
Well a couple things I disagree with there. Firstly, the 256MB GTX core is more than fast enough to run with the XT. Take a look at Xbit's latest roundup - the XT only pulls away a little in Chaos Theory. Where it falls down in comparison to the XT is in bandwidth intensive (AA) scenarios and that has nothing to do with process or core speeds. So yes, on 110nm Nvidia is more than competitive with ATi's high end 90nm parts.
Check out the non-AA (pure speed) rankings - http://www.xbitlabs.com/articles/video/display/games-2005_24.html. Seems the good old 110nm 430MHz G70 is giving the 90nm 650MHz XT quite a run for its money, no?
Secondly, Nvidia didn't screw themselves at all. All of the major review sites are hailing the 512GTX as king. It isn't "in stock" but back-orders are constantly being filled and people are always popping up on forums reporting on their new cards. So the status quo is that Nvidia is on top.
I think it highly unlikely that Nvidia held back on the 110nm G70 core with the GTX only to refresh it with a 90nm version. If that was the plan they would have maxed out the G70 from the start. So no, the 110nm 512GTX by itself is not an indication of delays in the 90nm roadmap.
That review shows a stock X1800XT going toe to toe with the 512 GTX at highest settings. 512 GTX being 799.
Bodes well for ATI's future, considering ATI's pipeline deficit and near-equal clocks.
trinibwoy
09-Dec-2005, 13:35
That review shows a stock X1800XT going toe to toe with the 512 GTX at highest settings. 512 GTX being 799.
Bodes well for ATI's future, considering ATI's pipeline deficit and near-equal clocks.
Yep that's true. Although the matchup between the X1800XL and 7800GT goes kinda contrary, they have very similar specs and are very even in benchmarks as well. So the 512GTX "should" be pulling away a lot more from the XT but it's not.
Ailuros
09-Dec-2005, 16:03
Yep that's true. Although the matchup between the X1800XL and 7800GT goes kinda contrary, they have very similar specs and are very even in benchmarks as well. So the 512GTX "should" be pulling away a lot more from the XT but it's not.
Look closer at the percentages each GPU loses from noAA/AF to 4xAA/16xAF benchmarks. The drop on the X1800XT is quite a bit smaller; it's not that much different on the X1800XL vs 7800GT either.
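If it helps, the comparison being suggested is just the relative frame rate lost when AA/AF is switched on. A minimal sketch follows; the function name is made up, and the fps values are placeholders chosen for illustration, not figures from the Xbit review.

```python
def aa_af_drop_percent(fps_plain, fps_aa_af):
    """Percentage of frame rate lost going from noAA/AF to 4xAA/16xAF."""
    if fps_plain <= 0:
        raise ValueError("fps_plain must be positive")
    return (fps_plain - fps_aa_af) / fps_plain * 100.0


# Placeholder numbers only; substitute the real results for each card.
drop = aa_af_drop_percent(100.0, 75.0)
print(f"{drop:.1f}% of the frame rate lost with AA/AF enabled")
```

Running this per card and per game gives a like-for-like view of how hard each GPU is hit, independent of the absolute frame rates.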
Maintank
09-Dec-2005, 16:07
What part of that article hints at a G80 coming early? If anything, the article hints at it being held back due to G71 and the continuation of the G7x series. Also, the article says G71 will not be available till the second half of 2006.
I think some of you may have read it wrong. If anything, this tells me Nvidia is having the same pains as ATi with implementing a high-end chip on 90nm.
Reread it; it clearly states the G71 will be here in the 1st half along with the lower and mid-range products. The successor (G80) will be available in the 2nd half of 06.
ondaedg
09-Dec-2005, 19:20
analists think Nvidia is being too cocky as far as earnings goals go. Not me.
analists as in those who specialize in what? Hehe, j/k. I just thought you using "analist" was kind of like an oxymoron. Gave me a good chuckle at least. Or are you being serious? :wink:
Which stock analysts have labeled Nvidia as being cocky?
Mintmaster
09-Dec-2005, 19:20
Yep that's true. Although the matchup between the X1800XL and 7800GT goes kinda contrary, they have very similar specs and are very even in benchmarks as well. So the 512GTX "should" be pulling away a lot more from the XT but it's not.
I think the XL is doing quite well, beating out the GTX on several occasions. The problem is that ATI is not handling 256MB of memory very optimally, as pointed out by the reviewer, trashing a few results.