View Full Version : The NEXT LAST R600 Rumours & Speculation Thread


fellix
01-May-2007, 14:20
Hmm, it seems to me that those HL2 shots have been resized after the screen capture, which doesn't make for a precise evaluation. :roll:

nAo
01-May-2007, 14:24
Yep, they have been resized, and yes... the second shot is a bit blurrier

Jawed
01-May-2007, 14:56
Yep, they have been resized, and yes... the second shot is a bit blurrier
Which would imply that in CFAA mode the RBEs are "dumb" sampling from neighbouring pixels regardless of whether the neighbouring pixels are edge pixels or not.

Which, being honest, is hardly surprising.

There's gonna be LOD fiddling, movies and riots, I guess.

Jawed

Jawed
01-May-2007, 15:00
Is it completely ridiculous to suggest that constants are loaded into registers before issuing ALU ops? I guess the performance drop would be catastrophic in some cases. :wink:
I didn't suggest that's how it worked. What I mean is that an operand for an instruction is either a register or a constant. It could, also, be a gather from memory, but...

(Question for a 3D shader expert: How common are constant operands in real life 3D shaders?)
http://www.gamedev.net/community/forums/mod/journal/journal.asp?jn=316777&reply_id=2521733

Constants are important in D3D10.

Yes, that's the other big question: but with the vec4, isn't the only freedom a permutation of xyza? (I haven't read the CTM docs...)
That reduces a 15-way fetch into a 6-way fetch (if the 1D supports MAD), if you're smart about the way you lay out registers in memory. If there are no restrictions at all in R600, you'd need 15 independent reads and 5 independent writes per shader unit. Insane.
I expect we're stuck until someone tests this. I've spent lots of time thinking about register fetch, but not for this configuration. Truthfully, no-one seems to have even explored the limits of R300 :sad:
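To put numbers on the fetch counts quoted above, here is a rough sketch. It assumes (as the quoted post implies, but nobody here has confirmed) five ALU lanes per shader unit, each able to issue a 3-operand MAD, and compares fully independent lanes against a vec4 + scalar layout where the four vector lanes share one wide fetch per operand. The function names are made up for illustration.

```python
# Back-of-the-envelope register port counting; not a description of real R600 hardware.
MAD_OPERANDS = 3  # a MAD reads three source operands (a * b + c)


def independent_lane_reads(lanes: int, operands: int = MAD_OPERANDS) -> int:
    """Every lane fetches each of its operands from an arbitrary register."""
    return lanes * operands


def vec4_plus_scalar_reads(operands: int = MAD_OPERANDS) -> int:
    """Four vector lanes share one wide fetch per operand; the scalar lane fetches its own."""
    vector_fetches = operands  # one vec4-wide fetch per operand
    scalar_fetches = operands  # the 1D unit, assuming it also supports MAD
    return vector_fetches + scalar_fetches


print(independent_lane_reads(5))  # 15
print(vec4_plus_scalar_reads())   # 6
```

The write side works the same way: five unrestricted lanes would need five independent writes per clock.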

Another question: in R600, the branch logic is in parallel with the ALU's. Is that a departure from R5xx or has it always been like that?
R5xx also has this dedicated unit. R420 and earlier couldn't perform dynamic branching in the pixel shader.

Yes, I'm not sure there is that much additional value in allowing independent scalar operations, since the majority of ops in 3D are vec3 or vec4 based anyway and scalar ops are not that common.
It's probably a different story for GPGPU.
Yeah. Hopefully there'll be a nice whitepaper for the R600 architecture describing motivations.

Purely scalar code with every instruction dependent upon its predecessor will make R600 crawl. G80 will lap it up.

Jawed

Dooby
01-May-2007, 15:03
I use Photoshop for about 80% of my day, and I'm *sorta* convinced that the first Non-AA image has been sharpened, as well as resized. Zoom in around dark lines, such as the powerline pole, and you can definitely see more of a lighter outline in the Non-AA pic than the AA'd pic. I'm aware the AA'd pic has a slight outline too, but the one in the Non-AA pic is much more pronounced. You usually get this when you apply a sharpen filter in Photoshop et al.

I would take these images with a pinch of salt when looking at texture quality.

That said, OMG the FSAA looks GORGEOUS!

Jawed
01-May-2007, 15:06
But the inevitable price to pay is that you give up something at the system level. And that's a bad thing. In this case, that price means more over-dimensioning to reduce freak corner cases, over-dimensioning to limit the latency penalty (2 opposite direction rings), less control over scheduling etc. On R580, that was manageable because it was only used for read return data. But if R600 also uses this to transport write data, it gets really ugly.
GPUs are latency tolerant. It seems like a completely different focus from CPUs and routers. GPUs spend their lives converting little packets into bigger packets, because throughput is more important than latency.

Jawed

EasyRaider
01-May-2007, 15:07
Which would imply that in CFAA mode the RBEs are "dumb" sampling from neighbouring pixels regardless of whether the neighbouring pixels are edge pixels or not.

Which, being honest, is hardly surprising.

There's gonna be LOD fiddling, movies and riots, I guess.

Jawed
Considering all the nasty texture swimming and moire I have seen with my X1900, a bit of blur might be a good thing.

Silent_Buddha
01-May-2007, 16:17
HL2 16x CFAA

http://img389.imageshack.us/my.php?image=20070501d14724f5c5bff91et2.jpg

http://img168.imageshack.us/my.php?image=20070501d05322923e0d643fd2.jpg

I'm actually more interested in the 12x and 24x CFAA settings.

I believe the 16x CFAA is 8x MS with a Wide Tent. Thus I'd expect it to be a bit blurrier.

I'm much more interested in seeing the 12x CFAA with its 8x MS and Narrow Tent.

Also, I'd like to see the effect of Edge Resolve with the 24x setting.

Still, 16x CFAA does a wonderful job on those trees, wires, and fence. Hopefully 12x CFAA has a similar effect in those areas.

[Edit] Zoomed in on those shots and I'm really not liking how 16x with its Wide Tent blurs and washes out the colors. Someone please try to get a shot of the 12x setting. :)

Regards,
SB

silent_guy
01-May-2007, 16:24
On the other hand, you can't use ALL those metal layers (usually top 2 reserved for power/ground IIRC?), then you have your clock distribution and what not. Also the higher metal layers are much less dense than the lower ones...
You have some freedom wrt what you can do for each layer, and you can mix. Clock networks do not have their own dedicated layers. The buffers are just placed first, before the rest of the cells, and usually the nets may have different spacing rules. But I don't know the exact details anymore.

Wouldn't you just dump your data onto the stop closest to you and it will just go to its required stop? All the scheduling and whatnot should happen individually on each of the stops, no? Since the memory controllers are going to be on the stops i think? What am i missing here?
Additional and unpredictable(!) latency. The latter happens when you have a ring stop wanting to insert data when there's also data on the ring trying to shift in the same place.
And there's the risk of running into deadlocks. An obvious way would be if one client backs up and data for this client arrives: it will stall the whole ring. But there are more insidious cases with multiple clients injecting at just the wrong time.
This is avoided by adding larger buffers and either conservative or complex scheduling or both, but it's not easy to get right. A ring is a great example of how a seemingly simple system can exhibit beautiful patterns of entropy. :wink:

GPUs are latency tolerant. It seems like a completely different focus from CPUs and routers. GPUs spend their lives converting little packets into bigger packets, because throughput is more important than latency.

Well, yes, we all know that, don't we?
But that doesn't mean you have to ask for it. Each additional cycle of latency costs you an additional amount of buffering or an earlier drop-off in performance.

Switching from defense to offense: other than some non-system related implementation details, is there a single advantage of a ring over a crossbar?

(The latency in routers introduced by the switching fabric itself is pretty much irrelevant compared to the latency introduced by higher-level scheduling. And you don't have the closed loop of a requester having to wait for returning data. Throughput is much more important.)

Silent_Buddha
01-May-2007, 16:33
Wasn't one of the reasons ATI gave for choosing the ring bus over a crossbar that the complexity of a crossbar grows much faster with bus width than that of a ring bus? Or that transistor usage grows much more significantly with a crossbar than with a ring bus at higher bus widths?

I'd imagine one reason NV could only do 384 bits was either the number of transistors needed or the complexity.

Then again, I'm just a layman so I might be getting all of this back arsewards.

Regards,
SB

Bjorn
01-May-2007, 16:34
Heh, I'm sure they wish that was "killing performance over 8800GTS/GTX". It'll be interesting to see Nvidia's response - a 2900XT at $400 will be mighty attractive.

There is a drawback to being late to the game though: the competition's cards are available for quite a bit below the MSRP. The 8800 GTS 640 MB can be had for 360$, the 320 MB for 260$.

mao5
01-May-2007, 16:36
That picture is from another area, and the game has very unbalanced frame rates. What driver is the guy using with the XT? :smile:

Are the x1950xtx pic and the R600XT pic from the same area?

vertex_shader
01-May-2007, 16:43
Are the x1950xtx pic and the R600XT pic from the same area?

Almost.

mao5
01-May-2007, 16:43
R600 XT counterattack to NV purevideo HD without ATI AVIVO.
http://vietnamglobalteam.org/images/smilies/BBP/50.gif

http://images.anandtech.com/graphs/geforce%208600%20h264_04270710452/14537.png

http://www.chiphell.com/attachments/month_0705/20070501_f27e69d43f078e9279ffC03FgJgyMtsC.jpg

mao5
01-May-2007, 16:47
Almost.

I remember you guys wanted to compare fps at the same spot. VS, you should look at the two pics again: one was taken on the road beside the airport garage, the other on the road with trees and boskage around. You call them almost the same?

Bjorn
01-May-2007, 16:51
I'm guessing that this means that it'll support VC1 also which the 8600 doesn't. At least not to the extent that it does with h264.

All broadcast HD content (at least in Europe) will afaik be h264 though so i don't really see this as a problem, but it sure doesn't hurt to have it either.

_xxx_
01-May-2007, 16:51
Additional and unpredictable(!) latency. The latter happens when you have a ring stop wanting to insert data when there's also data on the ring trying to shift in the same place.
And there's the risk of running into deadlocks. An obvious way would be if one client backs up and data for this client arrives: it will stall the whole ring. But there are more insidious cases with multiple clients injecting at just the wrong time.
This is avoided by adding larger buffers and either conservative or complex scheduling or both, but it's not easy to get right. A ring is a great example of how a seemingly simple system can exhibit beautiful patterns of entropy. :wink:

But why the 512-bit bus then? ;)

The stalls are less of a problem, since you will surely have dedicated lines there. Wouldn't make sense otherwise.

pjbliverpool
01-May-2007, 16:52
R600 XT counterattack to NV purevideo HD without ATI AVIVO.
http://vietnamglobalteam.org/images/smilies/BBP/50.gif



That tells us very little. For a start that graph is showing the max CPU use, not the average. Average is here:

http://images.anandtech.com/graphs/geforce%208600%20h264_04270710452/14536.png

And the CPU use would depend greatly on the film used.

mao5
01-May-2007, 16:59
That tells us very little. For a start that graph is showing the max CPU use, not the average. Average is here:

http://images.anandtech.com/graphs/geforce%208600%20h264_04270710452/14536.png

And the CPU use would depend greatly on the film used.
I don't think so. Without ATI AVIVO in 8.361, R600XT already shows such good VC1 performance. How about average CPU utilization with AVIVO HD in the official driver?

http://vietnamglobalteam.org/images/smilies/BBP/50.gif

mao5
01-May-2007, 17:02
Vegas carnage will show up

http://vietnamglobalteam.org/images/smilies/BBP/50.gif

vertex_shader
01-May-2007, 17:04
I remember you guys wanted to compare fps at the same spot. VS, you should look at the two pics again: one was taken on the road beside the airport garage, the other on the road with trees and boskage around. You call them almost the same?

A few meters' difference, but the 8800gtx pictures you linked were made in a whole different area.

mao5
01-May-2007, 17:06
A few meters' difference, but the 8800gtx pictures you linked were made in a whole different area.

ok. so you admit they are really different scenes.

Jawed
01-May-2007, 17:08
Well, yes, we all know that, don't we?
But that doesn't mean you have to ask for it. Each additional cycle of latency costs you an additional amount of buffering or an earlier drop-off in performance.
Compared with a couple hundred clocks of worst-case latency on a DDR fetch, do you know how much worst-case latency R600's ring bus will add (compared with a crossbar)? I dunno. 10s of clocks?

Switching from defense to offense: other than some non-system related implementation details, is there a single advantage of a ring over a crossbar?
I guess we'll just have to wait for a GPU designer to pipe up :razz:

Here's an overview of a GPU I drew up while speculating about R600:

http://forum.beyond3d.com/showpost.php?p=890354&postcount=1447

(some of that is R600-specific speculation.)

Perhaps you'd like to compare that with a router and we can talk about where the meat of the scheduling/balancing problem is. In my opinion a GPU has so much scheduling to do that bus scheduling turns out to be a supporting role, not the centre of its universe.

I think the implementation factor, that you're brushing off, is a big deal. IBM went with a ring bus for Cell, hugely motivated by simplicity of implementation.

Jawed

silent_guy
01-May-2007, 17:17
In my opinion a GPU has so much scheduling to do that bus scheduling turns out to be a supporting role, not the centre of its universe.
At the end of the day, the ring is nothing but a transport mechanism that doesn't really do anything. And that's my whole point: the hoopla about the ring being an important feature is completely unwarranted. :grin:

(I have to run... Rest will follow later)

mao5
01-May-2007, 17:18
Vegas carnage will show up

http://vietnamglobalteam.org/images/smilies/BBP/50.gif

showtime!

Anandtech:
"The Benchmark

Our benchmark was suggested to us by Ubisoft and it's basically an average FPS of looking out of the window on the first helicopter ride over a cityscape in Mexico. "


CPU: Intel Core 2 Extreme X6800 (2.93GHz/4MB)
Motherboard: EVGA nForce 680i SLI
Intel BadAxe

http://images.anandtech.com/graphs/rainbow%20six%20vegas%202006_12240661206/13806.png

R600XT Tester:
CPU: Intel Core 2 E6600 (2.40GHz/4MB) R600XT( Default clock)

same scene:

http://www.chiphell.com/attachments/month_0705/20070501_c7489a111c3ad014b93b3juXWKWLMKSs.jpg

The Test Video Setting:
http://www.chiphell.com/attachments/month_0705/20070501_749826243be9e9845644ULjKHQl2UZar.jpg

http://vietnamglobalteam.org/images/smilies/BBP/50.gif

Razor1
01-May-2007, 17:20
One screenshot really doesn't give us anything mao, really have to give us averages.

3dilettante
01-May-2007, 17:21
I expect we're stuck until someone tests this. I've spent lots of time thinking about register fetch, but not for this configuration. Truthfully, no-one seems to have even explored the limits of R300 :sad:


A lot of this depends on what that big block that corresponds to that highly threaded scheduling processor does.

It's obviously important, and there's minimal detail thus far on just what it does.
Its functionality seems on the surface to be more complex than the local thread schedulers used by G80 per SIMD group.
Perhaps by omission or on purpose, the diagram doesn't show it being divided up into clusters like the rest of the chip, which might mean some shenanigans are going on in instruction scheduling and register fetch.


Yeah. Hopefully there'll be a nice whitepaper for the R600 architecture describing motivations.


I hope so too.
Trouble is, AMD is rather paranoid. Some of the best (only?) analyses of its CPUs came from outside the company.

We may not get the kind of disclosure we'd like.

Sound_Card
01-May-2007, 17:24
mao5, we can't base something off a frap counter on a picture.:???:

Evildeus
01-May-2007, 17:27
Yeah we want averages not 1 screenshot that doesn't tell us anything

pjbliverpool
01-May-2007, 17:29
One screenshot really doesn't give us anything mao, really have to give us averages.

Well it gives us one thing IMO. It shows that there is no "carnage" when it comes to that particular game, not if you assume the screenshot was taken at one of the higher points in the scene and know that it's at best only about 5% faster than G80.

Of course, if it's 5% faster across the board then I think we will all be happy. Or happier than 5% slower anyway!

Rebel44
01-May-2007, 17:31
It looks like 2900XT will have very similar or equal performance as 8800GTX:twisted: :twisted:

Rebel44
01-May-2007, 17:34
It looks like 2900XT MIGHT have very similar or equal performance as 8800GTX:twisted: :twisted:
- I hate when I can't edit my posts:wink:

nicolasb
01-May-2007, 17:38
All broadcast HD content (at least in Europe) will afaik be h264 though so i don't really see this as a problem, but it sure doesn't hurt to have it either.
Broadcast content, maybe, but HD-DVD discs use VC-1 a hell of a lot, and Blu-Ray is heading in the same direction. If you want to use your PC as a hi-def disc player (a purpose for which the native-HDMI-with-audio RV6xx cards are ideal) then you need good VC-1 decoding.

Dalton Sleeper
01-May-2007, 17:43
Is it possible to bench at higher res like 1680*1050 (16:10) if available?

Jawed
01-May-2007, 17:44
A lot of this depends on what that big block that corresponds to that highly threaded scheduling processor does.

It's obviously important, and there's minimal detail thus far on just what it does.
Its functionality seems on the surface to be more complex than the local thread schedulers used by G80 per SIMD group.
Perhaps by omission or on purpose, the diagram doesn't show it being divided up into clusters like the rest of the chip, which might mean some shenanigans are going on in instruction scheduling and register fetch.
I've always presumed the UTDP is localised per shader unit. R5xx is the same, really:

http://www.beyond3d.com/content/reviews/2/3

I think the diagram simplifies this unit for the sake of focussing on the guts, the ALUs, TUs and RBEs.

From R300 onwards a fundamental aspect of the architecture has been tiled pixel-shading. As far as I can tell that means hierarchical-Z tiling, rasterisation, thread despatch, TMUs, ALUs, ROPs, colour buffer cache and z/stencil buffer cache are all tiled. If you're going to tile all that stuff, why have a shared threaded despatch processor?

It would be nice to know one way or another...

Jawed

mao5
01-May-2007, 17:44
One screenshot really doesn't give us anything mao, really have to give us averages.

If I have that card, I will give u guys average fps.

mao5
01-May-2007, 17:47
mao5, we can't base something off a frap counter on a picture.:???:

mh, I know, I've suggested the tester to get average fps.

mao5
01-May-2007, 17:50
Well it gives us one thing IMO. It shows that there is no "carnage" when it comes to that particular game, not if you assume the screenshot was taken at one of the higher points in the scene and know that it's at best only about 5% faster than G80.

Of course, if it's 5% faster across the board then I think we will all be happy. Or happier than 5% slower anyway!

I'm sure u can find a buoy for GTX:

Test instruction set (length 384, no texture fetch and 100 iterations)
MAD R0.xyz, R0, R0, R1;
MUL R2.x, R2, R3;
MAD R1.x, R1, R1, R3;
MAD R0.xyz, R1, R1, R0;
ADD R2.x, R1, R2;
MUL R3.x, R3, R1;


R600XT (co-issue 3 instructions) - 93,9277 GInstr/sec

8800 GTX - 39,1998 GInstr/sec
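For anyone wondering where "co-issue 3 instructions" could come from, here is a minimal sketch that greedily packs the six-instruction test sequence into issue bundles. It assumes a 5-slot ALU (in line with the "5 independent writes per shader unit" figure mentioned earlier in the thread), tracks dependencies per whole register rather than per component, and lets reads within a bundle see the values from before the bundle. The `Instr` type and `pack_bundles` are hypothetical helpers, not how any real shader compiler works.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str                # destination register, e.g. "R0"
    width: int              # components written: .xyz -> 3, .x -> 1
    srcs: Tuple[str, ...]   # source registers


# The test sequence quoted above.
SHADER = [
    Instr("MAD", "R0", 3, ("R0", "R0", "R1")),
    Instr("MUL", "R2", 1, ("R2", "R3")),
    Instr("MAD", "R1", 1, ("R1", "R1", "R3")),
    Instr("MAD", "R0", 3, ("R1", "R1", "R0")),
    Instr("ADD", "R2", 1, ("R1", "R2")),
    Instr("MUL", "R3", 1, ("R3", "R1")),
]


def pack_bundles(instrs: List[Instr], slots: int = 5) -> List[List[Instr]]:
    """Greedy in-order packing: start a new bundle on a hazard or when slots run out."""
    bundles: List[List[Instr]] = []
    current: List[Instr] = []
    used = 0
    written = set()  # registers written by instructions already in the current bundle
    for ins in instrs:
        reads_fresh_result = any(src in written for src in ins.srcs)
        overwrites_result = ins.dst in written
        if reads_fresh_result or overwrites_result or used + ins.width > slots:
            bundles.append(current)
            current, used, written = [], 0, set()
        current.append(ins)
        used += ins.width
        written.add(ins.dst)
    if current:
        bundles.append(current)
    return bundles


bundles = pack_bundles(SHADER)
print([len(b) for b in bundles])  # [3, 3]
```

Under those assumptions the sequence falls into two bundles of three instructions, each filling all five slots (3 + 1 + 1), so this test is about as friendly to a 5-wide unit as it gets. Reading the quoted figures with decimal commas, 93.9277 / 39.1998 is roughly a 2.4x gap.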

mao5
01-May-2007, 17:52
Is it possible to bench at higher res like 1680*1050 (16:10) if available?

Anyone can donate a 20 or 22 LCD for the tester?

mao5
01-May-2007, 17:59
It looks like 2900XT will have very similar or equal performance as 8800GTX:twisted: :twisted:

It's a very simple argument: if the 8800GTX could kick the XT's ass, why did NV announce the 8800U today? (It's now May 2 at my local time)

Chalnoth
01-May-2007, 18:02
It's a very simple argument: if the 8800GTX could kick the XT's ass, why did NV announce the 8800U today? (It's now May 2 at my local time)
I think we expect a new high-end part every six months or so. What's so unusual about this?

Evildeus
01-May-2007, 18:04
It's a very simple argument: if the 8800GTX could kick the XT's ass, why did NV announce the 8800U today? (It's now May 2 at my local time)
Still have the best card in da world?

mao5
01-May-2007, 18:09
8800U got 14009 with 720/2440MHz on X6800 3.603GHz with 2GB ram
http://www.chiphell.com/viewthread.php?tid=3534&extra=page%3D1

http://www.chiphell.com/attachments/month_0704/20070430_7a26e50fcf4dd640782byLo9ewqgqSM2.jpg
http://www.chiphell.com/attachments/month_0704/20070430_f758071efb8b81b20ecd0xkcBJkB79sE.jpg
http://www.chiphell.com/attachments/month_0704/20070430_ecf5ee494f69e237f400BrLlxdYdwbYC.jpg

R600XT got 14005 with 845/1990MHz on Core 2 Extreme QX6800 (Anyone know the speed?) with 4GB ram
http://news.mydrivers.com/img/20070425/05153606.jpg

I remember DT said that with a 650/2000MHz 8800GTX and the 158.19 drv, they got 14128 in 3DMARK06. What do u guys think?

IbaneZ
01-May-2007, 18:14
3dmark-06 is a pure CPU test with those cards anyway.

Can't wait to see what K10 will do for G80 and R600 though. :smile:

Dalton Sleeper
01-May-2007, 18:16
At first I was waiting for XTX 1GB version but since it was delayed I've decided to build a crossfire system of two XT 512MB. Nice to know that it will end up faster than expected. Wonder what the x-fire package will cost, probably a little less than 2*400. Hope it will be available here in sweden around 14-15th too.

3dilettante
01-May-2007, 18:28
I've always presumed the UTDP is localised per shader unit. R5xx is the same, really:

http://www.beyond3d.com/content/reviews/2/3

I think the diagram simplifies this unit for the sake of focussing on the guts, the ALUs, TUs and RBEs.

From R300 onwards a fundamental aspect of the architecture has been tiled pixel-shading. As far as I can tell that means hierarchical-Z tiling, rasterisation, thread despatch, TMUs, ALUs, ROPs, colour buffer cache and z/stencil buffer cache are all tiled. If you're going to tile all that stuff, why have a shared threaded despatch processor?

It would be nice to know one way or another...

Jawed

You're probably right. I thought there could be a change in some of the functionality because the new diagram had lines routed to the stream processors, while the diagrams for the other chips had the lines routed to the UTDP.

The nebulous state of where the register file was being kept intrigued me. I thought perhaps the UTDP could be used to creatively rename register fills to make them more amenable to a vector fetch.

It's silly trying to speculate from a low-res pic, but that's all I've got.

LeStoffer
01-May-2007, 18:32
I'm sure u can find a buoy for GTX:

Test instruction set (length 384, no texture fetch and 100 iterations)
MAD R0.xyz, R0, R0, R1;
MUL R2.x, R2, R3;
MAD R1.x, R1, R1, R3;
MAD R0.xyz, R1, R1, R0;
ADD R2.x, R1, R2;
MUL R3.x, R3, R1;


R600XT (co-issue 3 instructions) - 93,9277 GInstr/sec

8800 GTX - 39,1998 GInstr/sec

No texture fetch? Sure, they would like that.

Bjorn
01-May-2007, 18:38
It's a very simple argument: if the 8800GTX could kick the XT's ass, why did NV announce the 8800U today? (It's now May 2 at my local time)

Well, if the XT would kick the 8800 GTX's ass, and the Ultra. Why sell it for 400$ ?

And sorry, i don't buy the "to gain back marketshare" thing.

Dalton Sleeper
01-May-2007, 18:50
Well, if the XT would kick the 8800 GTX's ass, and the Ultra. Why sell it for 400$ ?

And sorry, i don't buy the "to gain back marketshare" thing.

Maybe R600 is a little slower in DX9 and catches up a bit in DX10. There aren't many DX10 games out today, so a price drop may be the only way out. Now, I don't know if the XT is slower in DX9, but I think I read it somewhere.

trinibwoy
01-May-2007, 18:52
It's a very simple argument: if the 8800GTX could kick the XT's ass, why did NV announce the 8800U today? (It's now May 2 at my local time)

You can spin that however you like. You can ask why ATI is launching the XT at $400 if it's as fast as the GTX. You can ask whether Nvidia thought the XT would be faster and, now that they see it's not, they can jack up the price on the Ultras they've been hoarding.

Jawed
01-May-2007, 18:57
The nebulous state of where the register file was being kept intrigued me. I thought perhaps the UTDP could be used to creatively rename register fills to make them more amenable to a vector fetch.
Hmm, yeah, I didn't really notice how the register file has gone AWOL in the R600 diagram.

Considering how intriguing this patent application is:

Method and apparatus for managing tasks in a multiprocessor system (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US2006059484&F=0)

on the way it separates "memory" (which I interpret as registers) from execution processors (which can be ALU pipes or TU pipes) there may be a good reason...

Jawed

Dalton Sleeper
01-May-2007, 19:24
That NDA thing, will it expire the 2nd or the 14th?

Quite a time diff to China:
US (GMT-8?)
Sweden (GMT+1)
China (GMT+8 or 9?)

mao5
01-May-2007, 19:27
Well, if the XT would kick the 8800 GTX's ass, and the Ultra. Why sell it for 400$ ?

And sorry, i don't buy the "to gain back marketshare" thing.

I buy

Ailuros
01-May-2007, 19:35
I buy

Don't sell it then :P

Bob
01-May-2007, 19:41
Hehe, i was allowed 4 metal layers.
Pfft, you young whipper snappers and your fancy tools. We had to do with a single metal layer! Let's just say it made place and route very interesting. And by interesting, I mean unnecessarily tedious.

Switching from defense to offense: other than some non-system related implementation details, is there a single advantage of a ring over a crossbar?
Considering that there are just up to 4 ring stops in R5xx and R6xx, wouldn't a point-to-point bus be even better? In terms of wiring/power, it's not that much worse than a ring, and it does have the advantage of drastically reducing the scheduling complexity, latency and additional buffering.

Of course, that doesn't scale well above 4 points, but then again, we haven't seen the "ring bus" do better yet.

But why the 512-bit bus then?
Because efficient non-power-of-2 memory addressing is hard. So if you need more than 256 bits, your next bus width becomes 512-bits. All IMHO. It would be interesting to see if anyone can write a test to max out the bandwidth on R600, given the rumours of 16 ROPs (RBEs) and 16 texture units.

IBM went with a ring bus for Cell, hugely motivated by simplicity of implementation.
Cell also has 9 "stops" on their ring. That's somewhat more than 4.
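To make the scaling argument concrete, here is a small sketch comparing hop counts on a ring with the number of links a full point-to-point layout would need. It assumes a bidirectional ring where data always takes the shorter direction, and it counts hops only, not real cycles. `ring_hops` and `point_to_point_links` are illustrative names, not anything from ATI or IBM documentation.

```python
from typing import Tuple


def ring_hops(stops: int) -> Tuple[int, float]:
    """Worst-case and average hop count between distinct stops on a bidirectional ring."""
    distances = [min(d, stops - d) for d in range(1, stops)]
    return max(distances), sum(distances) / len(distances)


def point_to_point_links(stops: int) -> int:
    """Dedicated links needed to connect every pair of stops directly."""
    return stops * (stops - 1) // 2


for n in (4, 9, 12):
    worst, average = ring_hops(n)
    print(f"{n} stops: ring worst {worst} hops, avg {average:.2f}; "
          f"point-to-point needs {point_to_point_links(n)} links vs {n} ring segments")
```

With 4 stops the point-to-point layout needs 6 links and every transfer is a single hop, while the ring's worst case is 2 hops. At 9 stops it is 36 links against a 4-hop worst case, which is where the "doesn't scale well above 4 points" remark bites.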

Bob
01-May-2007, 19:50
I dunno. 10s of clocks?
Here's a question for you: At 750 MHz, given a ~21x21mm chip, how many clocks does it take to carry a signal to the other side of the chip and back, ignoring transistor delays? Ok, now add some shmoo factor for transistor delays. I think you'd find that your 10s of clocks is closer to ten 10s than one 10s.
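A quick way to play with that question is the sketch below. The only hard numbers in it are the 750 MHz clock and the ~21 mm die edge from the post; the wire delay per millimetre is a placeholder parameter to sweep, not a measured value for any real process, and the straight-line distance ignores routing detours, pipeline registers, arbitration and buffering.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / freq_mhz


def round_trip_clocks(die_edge_mm: float, freq_mhz: float,
                      wire_delay_ps_per_mm: float) -> float:
    """Clocks spent on wire delay alone for an across-and-back trip."""
    distance_mm = 2.0 * die_edge_mm
    delay_ns = distance_mm * wire_delay_ps_per_mm / 1000.0
    return delay_ns / clock_period_ns(freq_mhz)


# Hypothetical sweep; the ps/mm values are guesses chosen only to show the trend.
for ps_per_mm in (50, 100, 200, 400):
    clocks = round_trip_clocks(21.0, 750.0, ps_per_mm)
    print(f"{ps_per_mm} ps/mm -> {clocks:.2f} clocks of pure wire delay")
```

The answer scales linearly with whatever delay per millimetre you plug in, so the real argument is over that number plus the "shmoo factor" for everything that isn't wire.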

CarstenS
01-May-2007, 19:51
Perhaps by omission or on purpose, the diagram doesn't show it being divided up into clusters like the rest of the chip, which might mean some shenanigans are going on in instruction scheduling and register fetch.

I know it's not exactly on topic, but i remember some pictures (which i took) from the R580-launch, where you could see the division of the so called ultra-thread dispatch processor into 12 sections. IIRC this separation was gone in the later-to-be-spread PDFs.

Maybe marketing has improved - surely they have had more time to revise their presentations this time.

edit:
I'm sorry, obviously this info has already been posted and there seemed to be some other version of the diagrams floating around, in which that separation was still visible.

Farhan
01-May-2007, 19:54
Here's a question for you: At 750 MHz, given a ~21x21mm chip, how many clocks does it take to carry a signal to the other side of the chip and back, ignoring transistor delays? Ok, now add some shmoo factor for transistor delays. I think you'd find that your 10s of clocks is closer to ten 10s than one 10s.
100 clocks from one side of the chip to the other just for wiring? Sounds really slow to me!

Razor1
01-May-2007, 19:59
That NDA thing, will it expire the 2nd or the 14th?

Quite a time diff to China:
US (GMT-8?)
Sweden (GMT+1)
China (GMT+8 or 9?)


NDAs usually ignore time zones.

Subtlesnake
01-May-2007, 20:14
Well, if the XT would kick the 8800 GTX's ass, and the Ultra. Why sell it for 400$ ?
Because they're filling in the gap with the XTX?

nAo
01-May-2007, 20:19
Cell also has 9 "stops" on their ring. That's somewhat more than 4.
Just a minor correction: Cell has 12 "stops" (8 SPUs + PPU + MEMController + 2*FLEXIO)

fellix
01-May-2007, 20:24
There are actually 5 stops on the bus: four memory partitions + the host interface (PCIe). ;)

Sound_Card
01-May-2007, 20:58
http://i21.photobucket.com/albums/b299/Genocide737/65146379b190eac6.jpg

HIS is ready to go.:grin:

Jawed
01-May-2007, 21:27
Here's a question for you: At 750 MHz, given a ~21x21mm chip, how many clocks does it take to carry a signal to the other side of the chip and back, ignoring transistor delays? Ok, now add some shmoo factor for transistor delays. I think you'd find that your 10s of clocks is closer to ten 10s than one 10s.
How much latency for data going through a crossbar, if it comes in one side of the chip but needs to be over on the other side?

Put another way, what proportion of the latency is wire delay and how much is architecture-specific logic? Crossbar versus ring bus.

Jawed

Arnold Beckenbauer
01-May-2007, 22:12
http://i21.photobucket.com/albums/b299/Genocide737/65146379b190eac6.jpg

HIS is ready to go.:grin:

Is it a Lambda logo on the box? :grin:

I don't believe a new graphics card like the HD2900 will be bundled with an old game like HL2 or HL2 Ep. 1 without some graphical updates.
http://forum.beyond3d.com/showthread.php?t=39699
http://www.gameinformer.com/News/Story/200703/N07.0305.1818.53359.htm
For those who recently upgraded their PC video cards to support DX10, you’ll be happy to know that both Episode Two and Team Fortress 2 will support the advanced technology of these cards, even if the PC is running Windows XP. On the low-end front, Valve still plans on supporting PCs that can only run DX8 applications. Valve’s attempting to also support DX7, but couldn’t confirm if it’d go back that far with technology

tEd
01-May-2007, 22:14
Is it a Lambda logo on the box? :grin:

I don't believe a new graphics card like the HD2900 will be bundled with an old game like HL2 or HL2 Ep. 1 without some graphical updates.
http://forum.beyond3d.com/showthread.php?t=39699
http://www.gameinformer.com/News/Story/200703/N07.0305.1818.53359.htm

Ep2 is bundled

Kaotik
01-May-2007, 22:18
Is it a Lambda logo on the box? :grin:

I don't believe a new graphics card like the HD2900 will be bundled with an old game like HL2 or HL2 Ep. 1 without some graphical updates.
http://forum.beyond3d.com/showthread.php?t=39699
http://www.gameinformer.com/News/Story/200703/N07.0305.1818.53359.htm

How exactly did they plan to support "advanced features of these cards" under XP? Last time I checked HL2 didn't support OpenGL (and I don't know if there's even extensions for the "DX10 level" stuff yet?)

anaqer
01-May-2007, 22:24
Ep2 is bundled
*giggles*

Dalton Sleeper
01-May-2007, 22:30
When i bought 9800XT I got a HL2 coupon :wink: and another one when I upgraded to X800XT PE :roll: , then they released the game :???:

allnighter
01-May-2007, 22:31
More likely a coupon is attached. EP2 is not coming that soon.

Arnold Beckenbauer
01-May-2007, 22:33
Ep2 is bundled

Ep. 2 comes when? :wink:
There is another speculation in HL2.Net forums: http://www.halflife2.net/forums/showthread.php?t=123608

How exactly did they plan to support "advanced features of these cards" under XP? Last time I checked HL2 didn't support OpenGL (and I don't know if there's even extensions for the "DX10 level" stuff yet?)
Read my speculation in the linked thread.:smile:

Anarchist4000
01-May-2007, 22:35
There are actually 5 stops on the bus: four memory partitions + the host interface (PCIe). ;)

Or are there 6 stops? Your 5 plus 1 for UVD.

Kaotik
01-May-2007, 22:36
Read my speculation in the linked thread.:smile:
If the "supported advanced features" would indeed be fp16 filtering + msaa on fp16-rt's, i don't know if i should laugh or cry :lol:

Russell
01-May-2007, 22:36
More likely a coupon is attached. EP2 is not coming that soon.

Yeah, the way I heard is that it is a coupon, just like last time.

Except this time I don't see Ep2 being delayed for the third (fourth?) time. I mean...it's only Ep2, not an entirely new game.

Regardless, getting that for free makes that R600 that much more tempting.

Kaotik
01-May-2007, 22:37
Or are there 6 stops? Your 5 plus 1 for UVD.

Wouldn't audio department want one too?

oeLangOetan
01-May-2007, 22:37
Any ideas on how they are going to expose the hardware tesselator in dx10?

tEd
01-May-2007, 22:39
Ep. 2 comes when? :wink:



Does it matter? It doesn't have to be ready at r600 launch.

CJ
01-May-2007, 22:40
Ep2 is bundled

Not only EP2, but also TF2 and Portal (the voucher for these games that is). And if I recall correctly it's possible for HD2900XT buyers to immediately download a free game from Steam (forgot which game though).

tEd
01-May-2007, 22:44
Not only EP2, but also TF2 and Portal (the voucher for these games that is). And if I recall correctly it's possible for HD2900XT buyers to immediately download a free game from Steam (forgot which game though).

That's cool, a free game right away.

RobertR1
01-May-2007, 22:47
So do we get a NDA lift tomorrow or is it delayed to the 14th also?

tEd
01-May-2007, 22:48
So do we get a NDA lift tomorrow or is it delayed to the 14th also?

I think the 2nd never was real.

Jawed
01-May-2007, 22:51
Or are there 6 stops? Your 5 plus 1 for UVD.
Hey this is fun. Let's see how high we can get?

Another one for CrossFire?

Jawed

cadaveca
01-May-2007, 22:51
Try two for crossfire lol.


:roll:

Kaotik
01-May-2007, 22:58
I think the 2nd never was real.

as far as I know, it never was indeed.

neliz
01-May-2007, 23:04
That's cool, a free game right away.

The HL2 coupon also allowed you to play the entire back-catalog of Valve games right away and allowed for CS:Source as well.. Yes.. that 6 million goes a long way...

fallguy
01-May-2007, 23:06
Any proof or just rumors that EP2 and TF2 will be bundled?

cadaveca
01-May-2007, 23:08
It is NOT rumoured that "Black Box" from Valve is included.:grin:

Black Box contents are EP2, TF2, and Portal. Feel free to check: any of the gaming-ad sites will reflect the same info.:wink:

fallguy
01-May-2007, 23:31
Just asking because there have been no links...

Geo
01-May-2007, 23:47
So do we get a NDA lift tomorrow or is it delayed to the 14th also?

Well, we know it was only reported (that I saw anyway) by VR Zone, and VR Zone, for all their many charms re NV information, has just never inspired confidence re ATI Radeon info.

INKster
02-May-2007, 00:15
Well, we know it was only reported (that I saw anyway) by VR Zone, and VR Zone, for all their many charms re NV information, has just never inspired confidence re ATI Radeon info.

They did break the R600 XTX OEM news with an actual picture, a few months ago.

Silent_Buddha
02-May-2007, 00:30
Except that what was the XTX back then might no longer be what the XTX is now.

And I believe the black box voucher for Valve games is any TWO games that you can choose out of whatever the choices are. Would be interesting if it was any TWO games, now and in the future (in the case that you already have all the Valve games).

Regards,
SB

INKster
02-May-2007, 00:35
Except that what was the XTX back then might no longer be what the XTX is now.

And I believe the black box voucher for Valve games is any TWO games that you can choose out of whatever the choices are. Would be interesting if it was any TWO games, now and in the future (in the case that you already have all the Valve games).

Regards,
SB

Well, the only thing different was the color of the shroud (translucid black, instead of translucid red).
The clocks might have been different, but the essentials of the cooler, length and general PCB layout were all there.

Kaotik
02-May-2007, 00:51
They did break the R600 XTX OEM news with an actual picture, a few months ago.

I'm not 110% sure, but IIRC VR-Zone wasn't the first source for it?

Arty
02-May-2007, 01:00
I'm not 110% sure, but IIRC VR-Zone wasn't the first source for it?
They were and they also were big cockteasers. After CeBIT, Tweaktown's article with the Retail PCB (NDA breaking by TUL) took away all their glory. :lol:

That was a big coup for them, but their latest about the NDA date being preponed takes them back to square one wrt ATI-related leaks. By now, AMD must have identified which of their AIB partners was/is leaking info..

Silent_Buddha
02-May-2007, 01:02
Well, the only thing different was the color of the shroud (translucid black, instead of translucid red).
The clocks might have been different, but the essentials of the cooler, length and general PCB layout were all there.

By different, I mean I just have a feeling that the XTX is now a dual chip card rather than the single chip card that was originally planned.

I think ATI is taking a page from the Nvidia playbook and will release a 7950 GX2ish type card as the XTX in order to take the "single card performance crown."

Regards,
SB

SugarCoat
02-May-2007, 01:04
I think this is an indication from Valve of the true release date of EP2, obviously it will come after the R700 is released. If anyone remembers, the original HL2 voucher was bundled mainly with 9800/9600XT cards. :lol:

By different, I mean I just have a feeling that the XTX is now a dual chip card rather than the single chip card that was originally planned.

I think ATI is taking a page from the Nvidia playbook and will release a 7950 GX2ish type card as the XTX in order to take the "single card performance crown."

Regards,
SB

Really think they want to look like idiots in the performance per watt tests? Not to mention the power supply requirement. Personally I look at the GX2 as a successful failure. It worked but the card itself was a little bit of a joke.

trinibwoy
02-May-2007, 01:14
I think ATI is taking a page from the Nvidia playbook and will release a 7950 GX2ish type card as the XTX in order to take the "single card performance crown."

Big problem with that. R600 is no G71.

Arty
02-May-2007, 01:25
Big problem with that. R600 is no G71.
Yup. Even a GX2 type card based on 65nm R600s would be quite difficult to engineer.

Topman
02-May-2007, 01:32
R600 XTX dual-gpu?
Impossible...
hmmm, a vision: external box(whisper@TheInq) with dual R600 + advanced cooling(WC?TEC?) + dedicated PSU (like some VGAs@Asus) + 'Lasso' ?

>>> US$ 999,00?

bye

FrameBuffer
02-May-2007, 02:49
R600 XTX dual-gpu?
Impossible...
hmmm, a vision: external box(whisper@TheInq) with dual R600 + advanced cooling(WC?TEC?) + dedicated PSU (like some VGAs@Asus) + 'Lasso' ?

>>> US$ 999,00?

bye

Impossible .. No

Impractical .. Yes

big difference.

INKster
02-May-2007, 03:19
R600 XTX dual-gpu?
Impossible...
hmmm, a vision: external box(whisper@TheInq) with dual R600 + advanced cooling(WC?TEC?) + dedicated PSU (like some VGAs@Asus) + 'Lasso' ?

>>> US$ 999,00?

bye

It already exists. They call it Quadro Plex. ;)

Topman
02-May-2007, 03:33
:)

-> PCINLIFE -> 3DCenter -> B3D

http://img182.imageshack.us/img182/8188/attachmentd93c4ul2.jpg
http://img485.imageshack.us/img485/9273/ringbusii2it4.jpg
http://img442.imageshack.us/img442/1770/attachment11fa6and3.jpg

:P

Arty
02-May-2007, 03:50
Palit = new AMD partner?

Chalnoth
02-May-2007, 03:52
Palit = new AMD partner?
Nope. They have a number of current AMD products too.

Arty
02-May-2007, 03:57
Nope. They have a number of current AMD products too.
My mistake then, I wasn't aware of it.

silent_guy
02-May-2007, 04:01
How much latency for data going through a crossbar, if it comes in one side of the chip but needs to be over on the other side?

Put another way, what proportion of the latency is wire delay and how much is architecture-specific logic? Crossbar versus ring bus.


There should only be a few clock cycles delay between entering and leaving the crossbar itself. So in that case, assuming no stalls, the difference is entirely in the propagation delay. Worst case, that would be double for a ring? In practice, I expect it to be less. But even if it's only, say, 30% more instead of 100%, that's a lot of additional buffering.

silent_guy
02-May-2007, 04:10
Excellent slide, Topman!

Ouch, it looks like they abandoned 32-bit MCs! That would go a long way explaining the horrible GDDR4 XTX benchmark numbers of DT. I suppose they had no choice with a 512-bit bus?

Russell
02-May-2007, 06:00
Excellent slide, Topman!

Ouch, it looks like they abandoned 32-bit MCs! That would go a long way explaining the horrible GDDR4 XTX benchmark numbers of DT. I suppose they had no choice with a 512-bit bus?


The diagram shows two chips. Each channel seems to be made up of two 32-bit chips, making it a 64-bit channel. I don't think they'd be silly enough to use 64-bit chips as that would add to the cost even more.

gunblade
02-May-2007, 06:34
The diagram shows two chips. Each channel seems to be made up of two 32-bit chips, making it a 64-bit channel. I don't think they'd be silly enough to use 64-bit chips as that would add to the cost even more.

Well, if they want to have 64 bits then they would have to make it themselves, since standard GDDR3/4/5 memory chips are all x32.

silent_guy
02-May-2007, 06:43
I don't think they'd be silly enough to use 64-bit chips as that would add to the cost even more.
Especially since they don't exist! :wink:
What I meant is: with R520, ATI introduced a MC that could talk independently to each 32-bit RAM. In the past, it had a 64-bit MC and treated 2 32-bit chips as 1 wide 64-bit chip. The 32-bit granularity has the advantage that you can access memory in smaller blocks, which won't make a difference in the best case, but can be much better in the worst case. The disadvantage is that it requires 1 external address/command bus per chip instead of 1 for 2 chips: more pins and more wires on the PCB.

With R600, they seem to have reverted to the old way. Probably because of too many wires...

Most (all?) GDDR3 has a burst length of 4. With a 64-bit wide bus, that gives an access granularity of 4*64/8 = 32 bytes: even if you need to read or write only 1 byte, you *have* to read or write all of them. A big waste of bandwidth. GDDR4 has a burst length of 8, which makes the worst case twice as bad as for GDDR3.

BL is a problem with all current DRAMs, so it's definitely not restricted to GPUs alone.

(Going from BL4 to BL8 is the main reason why GDDR4 can more easily be clocked higher, BTW.)
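To put numbers on the burst-length point, here is a minimal Python sketch of the same arithmetic (burst length times bus width, divided by 8 bits per byte). The function name and the table of cases are only for illustration; the post itself works out just the 64-bit, BL4 case.

```python
def access_granularity_bytes(bus_width_bits: int, burst_length: int) -> int:
    """Smallest amount of data a single DRAM access transfers, in bytes."""
    # Every beat of the burst moves bus_width_bits; there are 8 bits per byte.
    return burst_length * bus_width_bits // 8


# Channel widths and burst lengths mentioned in the discussion.
cases = {
    "32-bit channel, BL4 (GDDR3)": (32, 4),
    "64-bit channel, BL4 (GDDR3)": (64, 4),
    "32-bit channel, BL8 (GDDR4)": (32, 8),
    "64-bit channel, BL8 (GDDR4)": (64, 8),
}

for label, (width, burst) in cases.items():
    size = access_granularity_bytes(width, burst)
    print(f"{label}: {size} bytes per access")
```

The 64-bit/BL8 row is the "twice as bad" worst case: a 1-byte read or write still moves the whole burst.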

Bjorn
02-May-2007, 07:44
Because they're filling in the gap with the XTX?

If the XT performs like the 8800 GTX or even Ultra then AMD would have no problem selling the XTX for > 600$

Reputator
02-May-2007, 07:57
With R600, they seem to have reverted to the old way. Probably because of too many wires...

Most (all?) GDDR3 has a burst length of 4. With a 64-bit wide bus, that gives an access granularity of 4*64/8 = 32 bytes: even if you need to read or write only 1 byte, you *have* to read or write all of them. A big waste of bandwidth. GDDR4 has a burst length of 8, which makes the worst case twice as bad as for GDDR3.

BL is a problem with all current DRAMs, so it's definitely not restricted to GPUs alone.
Does the G80 also use 64-bit channels? I'm guessing so since it loses that much going from the GTX to the GTS.

Rangers
02-May-2007, 08:12
If the XT performs like the 8800 GTX or even Ultra then AMD would have no problem selling the XTX for > 600$

Sure they would. That market is probably already pretty saturated since 8800GTX has been out many months. No reason why an equal part should fly off shelves for the same price.

$500 might make sense in that scenario, though...

BTW, about these upgraded TMU's, it seems they simply massively sped up the TMU's in regards to HDR? If that's the case how wise is that? It does seem most games will be HDR going forward.

Pressure
02-May-2007, 08:21
Sure they would. That market is probably already pretty saturated since 8800GTX has been out many months. No reason why an equal part should fly off shelves for the same price.

$500 might make sense in that scenario, though...


Not everyone in the world upgraded to the Geforce 8800GTS/GTX series of cards.

There will be plenty of options for this card to sell. People have been waiting and large OEMs are buying (e.g. Apple).

Rangers
02-May-2007, 08:37
Not everyone in the world upgraded to the Geforce 8800GTS/GTX series of cards.

There will be plenty of options for this card to sell. People have been waiting and large OEMs are buying (e.g. Apple).

It would sell some, but there is no reason for it to sell a lot if it's the same performance and ">$600" as the OP suggested.

The people that want the uber high end cards tend to buy the latest and greatest anyway, and they've had six months to buy 8800GTX. Maybe the die hard ATI hold-outs..but what is that, 10%?

It would have pretty limited share..it has to offer something better, not par, especially after being six months late..

BTW, I'm not saying this means its performance>8800GTX..it probably doesn't.

mao5
02-May-2007, 09:20
Stalker Test: (default E6600 default R600XT)

1280*1024 HIGH SETTINGS HDR on

from the first step out of cellar, until reach the village

2007-05-02 12:04:07 - XR_3DA
Frames: 7027 - Time: 35704ms - Avg: 196.812 - Min: 93 - Max: 333
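As a sanity check on a log line like this one, the average can be recomputed from the frame count and the elapsed time. A minimal Python sketch (the function name is made up for illustration; the Min and Max values are per-interval readings and cannot be derived from these two totals):

```python
def average_fps(frames: int, time_ms: int) -> float:
    """Average frame rate over a capture: frames divided by elapsed seconds."""
    return frames / (time_ms / 1000.0)


# Totals taken from the log line above.
avg = average_fps(frames=7027, time_ms=35704)
print(f"Average over the run: {avg:.3f} fps")
```

The result agrees with the logged Avg figure to within the last printed digit, so the line is at least internally consistent. It says nothing about which renderer or settings were actually in use.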

_xxx_
02-May-2007, 09:45
Not everyone in the world upgraded to the Geforce 8800GTS/GTX series of cards.

Sure, most of those won't update to R600 either.

There will be plenty of options for this card to sell. People have been waiting and large OEMs are buying (e.g. Apple).

Not so sure, the card I'm waiting for is single slot and doesn't require a new PSU for example. I also know many people like me, FWIW (HTPC users mostly).

So where is that "upper midrange" single slot card? I see none coming, from both camps we get double-slot heaters or anemic low-end would-be 3D cards but nothing like the 6800GT or the X1800GTO in sight yet.

Rangers
02-May-2007, 09:46
Hmm according to Firing Squad (http://www.firingsquad.com/hardware/stalker_mainstream_3d_performance/images/stalker1280.gif) a 8800 GTS scores about a third of that.

There's a million variables of course, but just wondering if the uber HDR TMU's are responsible for the high score?

Also looks like FS test was probably more demanding

Basically our test involves running down a set path where the environment goes from a densely vegetated forest (which hurts frame rate) into a more open environment.

Anyways, just throwing things out there, no real reason.

My current theory is R600 might perform really well on HDR games. Of course, as usual, I don't know what I'm talking about though..

In fact I guess it's not even a theory, AMD's own slides pretty much confirm it.

Galduta
02-May-2007, 09:53
Stalker Test: (default E6600 default R600XT)

1280*1024 HIGH SETTINGS HDR on

from the first step out of cellar, until reach the village

2007-05-02 12:04:07 - XR_3DA
Frames: 7027 - Time: 35704ms - Avg: 196.812 - Min: 93 - Max: 333

Ohhh yeah, a nice DX8 bench. :mrgreen:

doob
02-May-2007, 09:57
Stalker Test: (default E6600 default R600XT)

1280*1024 HIGH SETTINGS HDR on

from the first step out of cellar, until reach the village

2007-05-02 12:04:07 - XR_3DA
Frames: 7027 - Time: 35704ms - Avg: 196.812 - Min: 93 - Max: 333

Does that "High Settings" include at least 4xAA and 8xAF? (I've never played Stalker, so I don't know what the in-game settings panel looks like.)

Edit: I doubt the game would allow turning HDR on in a DX8 codepath, unless they use int8 precision... and that wouldn't look good. Still, those fps are unexpectedly high to me.

Galduta
02-May-2007, 10:13
This framerate is in DX8; it is impossible to get this framerate in DX9 with dynamic lighting.

Hanners
02-May-2007, 10:14
So do we get a NDA lift tomorrow or is it delayed to the 14th also?

Launch day is the 14th.

_xxx_
02-May-2007, 10:24
Stalker Test: (default E6600 default R600XT)

1280*1024 HIGH SETTINGS HDR on

from the first step out of cellar, until reach the village

2007-05-02 12:04:07 - XR_3DA
Frames: 7027 - Time: 35704ms - Avg: 196.812 - Min: 93 - Max: 333

Min 93? Sounds fishy to me.

Kaotik
02-May-2007, 10:39
http://www.hisdigital.com/html/product_ov.php?id=304&view=yes
Was that one posted already? HIS has 2900XT on their site now

neliz
02-May-2007, 10:44
Launch day is the 14th.

That's in the PD now ;)

nicolasb
02-May-2007, 10:45
Not sure if this has been linked to yet, but there's a whole bunch of presentation slides here:

http://forums.vr-zone.com/showthread.php?t=148492

neliz
02-May-2007, 10:46
http://www.hisdigital.com/html/product_ov.php?id=304&view=yes
Was that one posted already? HIS has 2900XT on their site now

the box was shown yesterday...

# Free Game Redemption Coupon
# Valve Black Box Game Coupon
----

Half-Life 2: Episode Two is the second expansion pack for Half-Life 2, and will include an additional single-player game called Portal and Team Fortress 2 in the multiplayer component.

Kaotik
02-May-2007, 10:48
Not sure if this has been linked to yet, but there's a whole bunch of presentation slides here:

http://forums.vr-zone.com/showthread.php?t=148492

Nope, that link wasn't here yet, but the same slides were

nicolasb
02-May-2007, 10:54
Nope, that link wasn't here yet, but the same slides were
I know some of them have come up before, I wasn't sure if they all had. Oh well. :cry:

Skinner
02-May-2007, 11:06
Stalker Test: (default E6600 default R600XT)

1280*1024 HIGH SETTINGS HDR on

from the first step out of cellar, until reach the village

2007-05-02 12:04:07 - XR_3DA
Frames: 7027 - Time: 35704ms - Avg: 196.812 - Min: 93 - Max: 333

Please tell me this is with full dynamic lighting on :D
If this is true, I open a bottle of champagne.

Can you ask to test it in 1600x1200 everything on? (ingame aa, full lighting)

AnarchX
02-May-2007, 11:13
HD 2900XT @ STALKER:

HD 2900XT:
http://img443.imageshack.us/img443/8630/200705029e34f00a8b1f557mx3.th.jpg (http://img443.imageshack.us/my.php?image=200705029e34f00a8b1f557mx3.jpg)
42 FPS

other cards in same setting:
http://img443.imageshack.us/img443/1458/44888hc7.gif
http://www.digital-daily.com/video/stalker_test/


source (http://chiphell.com/viewthread.php?tid=3407&extra=page%3D1&page=25)

Suflex
02-May-2007, 11:20
^^
How come the performance of the GTX isn't affected by resolution at all?

aeryon
02-May-2007, 11:22
HD 2900XT @ STALKER:

HD 2900XT:
http://img443.imageshack.us/img443/8630/200705029e34f00a8b1f557mx3.th.jpg (http://img443.imageshack.us/my.php?image=200705029e34f00a8b1f557mx3.jpg)
42 FPS

other cards in same setting:
http://img443.imageshack.us/img443/1458/44888hc7.gif
http://www.digital-daily.com/video/stalker_test/


source (http://chiphell.com/viewthread.php?tid=3407&extra=page%3D1&page=25)

from source : We ran the tests using the ForceWare 93.71, 97.92

can we stop the FUD with all these numbers/benchs meaning nothing because of 7 months old nv drivers and different test system/bench procedures ? :???:

AnarchX
02-May-2007, 11:25
can we stop the FUD with all these numbers/benchs meaning nothing because of 7 months old nv drivers and different test system/bench procedures ? :???:

Maybe someone can test a GeForce 8800 with an E6600, so it would be more comparable.

neliz
02-May-2007, 11:26
can we stop the FUD with all these numbers/benchs meaning nothing because of 7 months old nv drivers and different test system/bench procedures ? :???:

If you can ask nvidia politely to bring out some whql drivers for the 7 series that are fresher than seven months.. be my guest!

vertex_shader
02-May-2007, 11:28
:)

-> PCINLIFE -> 3DCenter -> B3D
http://img442.imageshack.us/img442/1770/attachment11fa6and3.jpg

:P

Not many AMD AIBs left; looks like MSI is gone.

aeryon
02-May-2007, 11:28
Maybe someone can test a GeForce 8800 with an E6600, so it would be more comparable.

Without the same test system and the same bench procedure (scene), it will still be useless. And even more useless if the difference between GTX and XT is less than 10% on average, as many people think now...

vertex_shader
02-May-2007, 11:31
HD 2900XT @ STALKER:

HD 2900XT:
http://img443.imageshack.us/img443/8630/200705029e34f00a8b1f557mx3.th.jpg (http://img443.imageshack.us/my.php?image=200705029e34f00a8b1f557mx3.jpg)
42 FPS

Looks like this is the new trend: a single picture with a frame rate. It tells nothing about average in-game performance. :roll:

aeryon
02-May-2007, 11:31
If you can ask nvidia politely to bring out some whql drivers for the 7 series that are fresher than seven months.. be my guest!
I was thinking XT is competitor to G80. or maybe you think it will be in front of G71. so what's the point ? :roll:

Unknown Soldier
02-May-2007, 11:32
http://www.hisdigital.com/html/product_ov.php?id=304&view=yes
Was that one posted already? HIS has 2900XT on their site now

Oh, that's gonna piss reviewers off(and ATI?) since the NDA expires only on 14th.

Still, nice to officially get some specifications.

US

Nebula
02-May-2007, 11:38
Looks like this is the new trend: a single picture with a frame rate. It tells nothing about average in-game performance. :roll:

Yeah I don't get it, what's the point of showing a screenshot with fps number (to claim 2900XT better) if there is not a screenshot of the same place with an Nvidia card under the same conditions. And these can't be compared to other benchmarks as they have different procedures and timedemos.

Unknown Soldier
02-May-2007, 11:39
Any proof or just rumors that EP2 and TF2 will be bundled?

PC gamers will also get the Black Box, which will include Half-Life 2: Episode Two, Team Fortress 2, and Portal. It, too, will be available this fall. Neither boxed set has "Half-Life 2" in their official title, as they are "game collections," according to EA. "The Black Box and The Orange Box represent a new approach to publishing multiple products on multiple platforms," Newell said in a statement.

http://www.gamespot.com/pc/action/halflife2episode2/news.html?sid=6165538

Google is your friend. ;)

US

Kaotik
02-May-2007, 11:48
Not many AMD AIBs left; looks like MSI is gone.

Some, not all, it says.

Galduta
02-May-2007, 11:50
C2 2750 mhz , 8800 GTX all in top , AF 16Q


http://img149.imageshack.us/img149/1762/xr3da2007050212474871ps8.th.jpg (http://img149.imageshack.us/my.php?image=xr3da2007050212474871ps8.jpg)

HD 2900XT @ STALKER:

HD 2900XT:
http://img443.imageshack.us/img443/8630/200705029e34f00a8b1f557mx3.th.jpg (http://img443.imageshack.us/my.php?image=200705029e34f00a8b1f557mx3.jpg)

Kaotik
02-May-2007, 11:53
What's up with that R600 stalker shot being so washed up? :???:

Nebula
02-May-2007, 11:54
C2 2750 mhz , 8800 GTX all in top , AF 16Q


http://img149.imageshack.us/img149/1762/xr3da2007050212474871ps8.th.jpg (http://img149.imageshack.us/my.php?image=xr3da2007050212474871ps8.jpg)

HD 2900XT @ STALKER:

HD 2900XT:
http://img443.imageshack.us/img443/8630/200705029e34f00a8b1f557mx3.th.jpg (http://img443.imageshack.us/my.php?image=200705029e34f00a8b1f557mx3.jpg)

Interesting, the 8800GTX ss has better image IQ, the fire has black borders (like it is not translucent) on 2900XT. Also the 8800GTX ss seems sharper even though the imagefile is much more compressed. :???:

Dalton Sleeper
02-May-2007, 11:56
It's the weather that makes the R600 brighter, right?

Galduta
02-May-2007, 11:58
What's up with that R600 stalker shot being so washed up? :???:

The gamma or the brightness? And the weather? Dust at that site rises from time to time.

PS: STALKER has an official bench

http://www.xtremesystems.org/forums/showthread.php?t=138452

memberSince97
02-May-2007, 12:00
The bottom shot is static lighting...

Skinner
02-May-2007, 12:06
The bottom shot is static lighting...

Nope, then you see a different groundtexture I think.

BTW Stalker is very dynamic in NPCs, weather etc., you will definitely want a bench to compare it right.

Thanx anyway.

Galduta
02-May-2007, 12:08
The two pictures have grass shadows (soft shadows), I think. It's a DX9 render feature.

neliz
02-May-2007, 12:21
or maybe you think it will be in front of G71. so what's the point ? :roll:

There are no newer drivers for a couple of cards in that bench. especially at the time of that bench.
Besides, the x2900's tested now aren't tested with release drivers as well...

IbaneZ
02-May-2007, 12:44
http://www.hisdigital.com/html/product_ov.php?id=304&view=yes
Was that one posted already? HIS has 2900XT on their site now

That's sweet. :smile:

My two latest cards have been HIS, and I'll probably buy one this time too.

VIVO Cable
HDTV Output cable
DVI to VGA Dongle x 2
DVI to HDMI Dongle x 1
Crossfire™ Cable x 1

Nothing about an 8-pin "converter" though. I hope they'll include one.

fallguy
02-May-2007, 13:00
http://www.gamespot.com/pc/action/halflife2episode2/news.html?sid=6165538

Google is your friend. ;)

US

I know what the black box is, but there was no confirmation till just late last night that it came with the 2900. It's now on HIS' site. After I posted..

Interested in the 5.1 audio though. I don't understand how they will have audio with a DVI to HDMI adapter.

neliz
02-May-2007, 13:02
I know what the black box is, but there was no confirmation till just late last night that it came with the 2900. It's now on HIS' site. After I posted..

Interested in the 5.1 audio though. I don't understand how they will have audio with a DVI to HDMI adapter.

audio on the hdmi connector. there is more than enough bandwidth for video + audio on the dvi connector used.

Dalton Sleeper
02-May-2007, 14:05
Fudzilla wrote that 2900XT in CrossFire scores 13100 in 3DMark06, I would expect more, a lot more.

nicolasb
02-May-2007, 14:30
audio on the hdmi connector. there is more than enough bandwidth for video + audio on the dvi connector used.
Yes, but there won't be an HDMI connector. There will be a DVI connector with a dongle. And DVI doesn't carry audio. How does the audio signal manage to get as far as the dongle?

fallguy
02-May-2007, 14:34
Yes, that is my question. I'd rather just have an HDMI connection already. I bought a DVI>HDMI adapter years ago, hurry up and adopt it.

vertex_shader
02-May-2007, 14:37
Fudzilla wrote that 2900XT in CrossFire scores 13100 in 3DMark06, I would expect more, a lot more.

1600x1200 8xmsaa? :wink:

Arnold Beckenbauer
02-May-2007, 14:38
Yes, but there won't be an HDMI connector. There will be a DVI connector with a dongle. And DVI doesn't carry audio. How does the audio signal manage to get as far as the dongle?

It works because HDMI is backwards compatible with DVI, and the graphics card can query device capabilities. So the link will start in "DVI compatibility mode", and when an HDMI-capable device is detected the graphics card will add audio output (which is interleaved in the digital stream, so no additional pins are required).

Using a DVI-I connector on the card is the best choice because it can drive DVI single and dual link, HDMI single and dual link, and analog VGA.
:smile:

Chalnoth
02-May-2007, 14:44
Interesting, the 8800GTX ss has better image IQ, the fire has black borders (like it is not translucent) on 2900XT. Also the 8800GTX ss seems sharper even though the imagefile is much more compressed. :???:
Well, I haven't played Stalker at all, but from playing Oblivion, that difference really looks like the difference between HDR and no HDR.

neliz
02-May-2007, 14:47
Edit: whatever arnold said! :)

Jawed
02-May-2007, 14:50
Excellent slide, Topman!

Ouch, it looks like they abandoned 32-bit MCs! That would go a long way explaining the horrible GDDR4 XTX benchmark numbers of DT. I suppose they had no choice with a 512-bit bus?
Return to 64-bit channels is definitely very curious... I presume this means that each pair of chips in a channel is spread over both sides of the board, i.e. back-to-back, instead of sat side-by-side. I say this because the 3-3-2 arrangement looks awkward for 2 (out of the three) side-by-side pairings.

As to explaining the GDDR4 XTX numbers, weren't there rumours of a broken memory controller? If there are no actual XTX boards in the wild now, and all that's out there are old, partly broken prototypes, then...

Jawed

Jawed
02-May-2007, 15:01
http://resources.vr-zone.com/newzhunter/HD2900-10.jpg

I can now make out what's going on with the texture unit, working down through it, left-to-right:

4x TA unit for vertex or texture fetch (prolly unable to do LOD/Bias, filter-specific calculations?)
4x fp32 vertex fetch or fp32 texel fetch
4x TF units
16x fp32 texel fetch (might be usable for vertex fetches?)
4x TA units for filtered texture addressing (cannot be used for vertex fetch addressing)

Hope that's right...

Jawed

Jawed
02-May-2007, 15:05
There should only be a few clock cycles delay between entering and leaving the crossbar itself. So in that case, assuming no stalls, the difference is entirely in the propagation delay. Worst case, that would be double for a ring? In practice, I expect it to be less. But even if it's only, say, 30% more instead of 100%, that's a lot of additional buffering.
OK, for the worst case, it would seem we're talking about 40 clocks + 200 for DDR = 240 versus 280 in total for the ring bus.

Is that reasonable?

Jawed

chavvdarrr
02-May-2007, 15:09
I can now make out what's going on with the texture unit, working down through it, left-to-right:

4x TA unit for vertex or texture fetch (prolly unable to do LOD/Bias, filter-specific calculations?)
4x fp32 vertex fetch or fp32 texel fetch
4x TF units
16x fp32 texel fetch (might be usable for vertex fetches?)
4x TA units for filtered texture addressing (cannot be used for vertex fetch addressing)

Hope that's right...

Jawed
And would you be so kind to compare vs G80 texturing power?

fellix
02-May-2007, 15:10
Does that mean that any order of vertex-fetched texture filtering will be done through code run in a Geometry Shader?
Much like the lack of FP16 box filtering in R5x0, where a custom one had to be coded, then.

Sound_Card
02-May-2007, 15:12
And would you be so kind to compare vs G80 texturing power?

At the same time, R580 too.

silent_guy
02-May-2007, 15:51
OK, for the worst case, it would seem we're talking about 40 clocks + 200 for DDR = 240 versus 280 in total for the ring bus.

Is that reasonable?
You mean 40/80 cycles for propagation across the die and for all the MC related scheduling? According to Bob, the 40 would be too low, right? Where does the 200 come from?

Anyway, if your numbers are reasonable, it's a 16% buffering penalty just for transport.
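A rough way to reproduce that figure, sketched in Python. The assumption (mine, not anything stated by ATI) is that the buffering needed to hide latency scales linearly with total round-trip clocks, so the penalty is just the relative increase in latency. The 200, 40 and 80 clock figures are the ones floated in the posts above, not measured values.

```python
def extra_buffering_fraction(baseline_clocks: int, other_clocks: int) -> float:
    """Relative increase in latency, taken here as a proxy for extra buffering."""
    return (other_clocks - baseline_clocks) / baseline_clocks


DRAM_CLOCKS = 200           # DRAM access latency used in the posts above
CROSSBAR_TRANSPORT = 40     # worst-case on-chip transport, crossbar
RING_TRANSPORT = 80         # worst-case on-chip transport, ring (double)

crossbar_total = DRAM_CLOCKS + CROSSBAR_TRANSPORT
ring_total = DRAM_CLOCKS + RING_TRANSPORT

penalty = extra_buffering_fraction(crossbar_total, ring_total)
print(f"crossbar: {crossbar_total} clocks, ring: {ring_total} clocks")
print(f"extra buffering for the ring: {penalty:.1%}")
```

With these inputs the increase is one sixth, which is where the roughly 16% comes from; a larger DRAM latency or a smaller ring penalty would shrink it accordingly.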

Jawed
02-May-2007, 15:57
Does that mean that any order of vertex-fetched texture filtering will be done through code run in a Geometry Shader?
Much like the lack of FP16 box filtering in R5x0, where a custom one had to be coded, then.
Being a unified architecture, all kinds of fetch/filter are available to all kinds of shader.

So geometry or vertex or pixel shaders can perform vertex fetches or texture fetches or bilinearly (or better) filtered fetches. On Int8, fp16 or fp32 data formats.

I'm heading out now...

Jawed

CJ
02-May-2007, 15:58
That's sweet. :smile:

My two latest cards have been HIS, and I'll probably buy one this time too.


I've been told that HIS is still playing with final clockspeeds. It looks like they're going to clock their HD2900XT a bit higher than reference clocks.

R600 is in full mass production at the moment and retailers should all have their cards at the 14th of May in time for the launch.

trinibwoy
02-May-2007, 16:07
Fudzilla wrote that 2900XT in CrossFire scores 13100 in 3DMark06, I would expect more, a lot more.

The 8800 Ultra doesn't score much more than that on an X6800. Once you get in that range it's all about the CPU.

Robin B
02-May-2007, 16:24
I've been told that HIS is still playing with final clockspeeds. It looks like they're going to clock their HD2900XT a bit higher than reference clocks.

R600 is in full mass production at the moment and retailers should all have their cards by the 14th of May, in time for the launch.
Could it be that ATI wants to clock it at 750 just to keep it at 225W? Not many out there with an 8-pin PSU.

Geeforcer
02-May-2007, 16:51
You know, I for one am getting really tired of the whole "Here are some scores with no specs or settings provided" or my new favorite "Here is one screenshot with FRAPS on". IT. IS. SO. POINTLESS.

fellix
02-May-2007, 16:56
Well, the NDA is still too heavy to lift. :lol:

All I want now is a [very] high-detail shot of the R600 die. Sadly, don't have one of G80. :???:

Arty
02-May-2007, 17:13
Well, the NDA is still too heavy to lift. :lol:

All I want now is a [very] high-detail shot of the R600 die. Sadly, don't have one of G80. :???:
http://img112.imageshack.us/img112/1097/g80va8.jpg

And if you need the die-pin shot, it's here (http://forum.beyond3d.com/showpost.php?p=885238&postcount=1221).

dnavas
02-May-2007, 17:18
Being a unified architecture, all kinds of fetch/filter are available to all kinds of shader.

So geometry or vertex or pixel shaders can perform vertex fetches or texture fetches or bilinearly (or better) filtered fetches. On Int8, fp16 or fp32 data formats.


:lol: I noticed this divergence awhile ago, but it struck me as being exactly the kind of architecture you might expect if you purged your shader team of everyone who didn't buy the unified approach and put them on the texture unit team. It doesn't seem unified at all. The shader units are unified, but not the texture units.

You have single-channel dedicated addressing units, single-channel dedicated samplers, multi-channel dedicated addressing units, and multi-channel samplers, which are effectively just four-wide single-channel samplers, but, err, they're "dedicated". I'm not really familiar with the complexities of addressing units, but, there doesn't seem much more complexity to addressing a vector of data, so the dedication of four of those addressing units is puzzling as well.

I am sure I do not understand the whys of this particular architecture, so, I will be interested to hear more about it.

As for a comparison, iirc, each of 8800's 8 clusters has 4 TAs/quad-channel samplers and 8 quad-channel filtering units. I don't recall the fp16 speed on them, though the 8800's clock is obviously slower than R600's.

Remember that the 8800 was a surprising texture monster....

Sound_Card
02-May-2007, 17:25
Can someone correct me if I'm wrong here?

Texture address units: G80 = 32 R600 = 32

Texture filtering units: G80 = 64 R600 = 16

Right?


So can someone tell me how many texture samplers G80 has?

vertex_shader
02-May-2007, 17:29
HIS webpage down, looks like AMD kicked HIS butt :smile:

mao5
02-May-2007, 17:34
Ohhh yeah , a nice DX 8 bench .:mrgreen:

what a nice smile, does DX8 support FP16 HDR? http://69.93.88.162/forum/images/smiles/rofl.gif

fellix
02-May-2007, 17:38
HIS webpage down, looks like AMD kicked HIS butt :smile:
They won't dare -- valuable AIBs are too scarce these days. :lol:


Can someone correct me if I'm wrong here?
From R600 Texturing slide:
"Can bilinear filter one 64-bit value per clock"

That should mean one FP16 component per cycle, or dual INT8 bilerps, as in G80.

vertex_shader
02-May-2007, 17:43
They won't dare -- valuable AIBs those days are too scarce. :lol:

Still weird it's down :wink:

Skinner
02-May-2007, 17:44
HIS webpage down, looks like AMD kicked HIS butt :smile:

Hehe :D

dnavas
02-May-2007, 17:45
So can someone tell me how many texture samplers G80 has?

I don't think it was broken out like that. Presumably each of the addressing units can sample a four-channel texture. So, 128 vs. 80. A GTX would have roughly the sampling power of a 920Mhz R600. Except, of course, that the comparison is false, as R600 has dedicated units and G80 does not. Presumably one quad-channel G80 sampler manages to be less flexible than four single-channel R600 samplers (each of the R600 samplers can come from distinct locations). A comparison by numbers absent discussion of workload would be ... difficult.
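A back-of-the-envelope sketch of where the 920MHz figure comes from, for anyone following along. It assumes the 575MHz (GTX) and 750MHz (R600) texture clocks quoted elsewhere in this thread, and it counts raw samples per second only, which says nothing about how flexible each sampler is.

```python
# Sampler counts as discussed in this post; clocks as quoted elsewhere in the thread.
g80_samplers = 32 * 4        # 32 addressing units, each assumed to sample four channels
g80_clock_mhz = 575
r600_samplers = 80
r600_clock_mhz = 750

g80_rate = g80_samplers * g80_clock_mhz      # million samples per second
r600_rate = r600_samplers * r600_clock_mhz

# Clock an 80-sampler R600 would need to match the GTX's raw sample rate.
equivalent_r600_clock_mhz = g80_rate / r600_samplers

print(g80_rate, r600_rate, equivalent_r600_clock_mhz)
```

On these assumptions the GTX comes out roughly a fifth ahead of a 750MHz R600 in raw samples, which is the same statement as the 920MHz equivalence.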

Edit: I should probably point out that what seems ideal to me would be a set of address units, and a set of component samplers and filters, which could be dynamically assigned/allocated. The cost/benefit ratio isn't at all clear, though. It also makes me wonder how much of that filter component is a lerp, while I eye G80's SFU in what might potentially be a criminal manner :)

leoneazzurro
02-May-2007, 17:57
Can someone correct me if I'm wrong here?

Texture address units: G80 = 32 R600 = 32

Texture filtering units: G80 = 64 R600 = 16

Right?


So can someone tell me how many texture samplers G80 has?

It has 64 texture samplers, if I remember well

Farhan
02-May-2007, 18:08
They won't dare -- valuable AIBs are too scarce those days. :lol:



From R600 Texturing slide:
"Can bilinear filter one 64-bit value per clock"

That should mean one FP16 component per cycle, or dual INT8 bilerps, as in G80.

They are not necessarily reconfigurable like G80's filtering units. I think they would have made a bigger fuss about it if they could actually do that. G80's filtering hardware is nicely done, IMO.

nAo
02-May-2007, 18:13
A GTX would have roughly the sampling power of a 920Mhz R600. Except, of course, that the comparison is false, as R600 has dedicated units and G80 does not.

?????

dnavas
02-May-2007, 18:17
?????

See my earlier post. Just search for "dedicated" :)

One item that hasn't been addressed is whether each texture unit is assigned to only one of the R600's shader arrays or not. [In that sense, G80 has 'dedicated' texture units, but that's not what I was speaking to....]

mao5
02-May-2007, 18:25
For anyone who can't handle the truth: http://69.93.88.162/forum/images/smiles/rofl.gif

http://www.3dnews.ru/_imgdata/img/2007/03/30/44883.gif
http://www.chiphell.com/attachments/month_0705/20070502_f0ccec8628c50560f57aYclLYYLN4ngC.jpg
http://www.chiphell.com/attachments/month_0705/20070502_1317863004ad8a1a5d5aLbZd4FUZu8iq.jpg
http://www.chiphell.com/attachments/month_0705/20070502_4d1abfef69420e8d9c7cNvwPThwHVaVn.jpg

source:
http://www.chiphell.com/viewthread.php?tid=3407&extra=page%3D1&page=25

ChrisRay
02-May-2007, 18:30
You know, I for one am getting really tired of the whole "Here are some scores with no specs or settings provided" or my new favorite "Here is one screenshot with FRAPS on". IT. IS. SO. POINTLESS.

I for one have to agree. It makes following this thread and wading through a lot of the BS even harder.

mao5
02-May-2007, 18:39
When you provide a screenshot of R600XT with fps on it, someone here says: "hey, tester, one pic is meaningless. We want to see avg fps."

OK, when you provide an avg fps using FRAPS with FP16 HDR on, then the guy says: "What a magnificent DX8 speed."

Does DX8 support FP16 HDR? Fine, I will not provide any more 2900XT real-game test scores. I don't deserve this attitude.

Geeforcer
02-May-2007, 18:44
Mao, in order to compare two cards you need to have the same system, the same settings and the same test. That's the only problem, but it has been persistent throughout this thread. Just because you use the same game (let's say, STALKER), it doesn't mean you can compare results from one source to another.

Love_In_Rio
02-May-2007, 18:47
Mao, in order to compare two cards you need to have the same system, the same settings and the same test. That's the only problem, but it has been persistent throughout this thread. Just because you use the same game (let's say, STALKER), it doesn't mean you can compare results from one source to another.

but what mao5 provides us is by far better than nothing.

Arty
02-May-2007, 18:48
when you provdie a screenshoot of R600XT with fps on it, someone here say: "hey, tester, one pic means meaningless. we want to see avg fps."

ok, when you provide a avg fps using fraps with FP16 HDR on, then the guy said:" What a magnificent dx8 speed."

Does DX8 support FP16 HDR? fine, I will not provide any 2900XT realgame test score anymore. I don't deserve this attitude.
Don't get offended by some smarty-pants; with comments like those, all they are doing is making fools of themselves. ;)

flopper
02-May-2007, 19:08
Mao, in order to compare two cards you need to have the same system, the same settings and the same test. That's the only problem, but it has been persistent throughout this thread. Just because you use the same game (let's say, STALKER), it doesn't mean you can compare results from one source to another.

It's good science: variables can change from one system to another, so it's easier to just swap cards and run the same game with the same configuration.
Then there is no question about how the cards perform.

What I can conclude from the numbers you brought, mao5, is that the card does well.

Nebula
02-May-2007, 19:16
but what mao5 provides us is by far better than nothing.

Better than nothing, but useless when compared to other tests done in a different way (timedemo/benchmark instead of a single frame with an fps number for that instant).
It's still something, but it has to be seen for what it is: a screenshot with an fps counter and not a timedemo/benchmark (a true benchmark, that is!).

And could we please stop with all the strange use/abuse of smilies, because this thread is starting to look like a GAF thread. :smile:

vertex_shader
02-May-2007, 19:23
FruitZilla (http://www.fudzilla.com/index.php?option=com_content&task=view&id=788&Itemid=1) thinks the HD2900XT consumes more than 240 watts at default :smile:

Natoma
02-May-2007, 19:25
FruitZilla (http://www.fudzilla.com/index.php?option=com_content&task=view&id=788&Itemid=1) think the hd2900xt default consume more than 240watt power :smile:

Good for him. How about linking to someone who has a shred of credibility....

vertex_shader
02-May-2007, 19:27
Good for him. How about linking to someone who has a shred of credibility....

Some fun needed in this topic too sometimes :wink:

Natoma
02-May-2007, 19:29
Some fun needed in this topic too sometimes :wink:

After a cumulative 500+ pages of R600 discussion, I just want relevant information at this point. THAT is fun. Fudo is a waste of time. :wink:

jimmyjames123
02-May-2007, 19:29
FruitZilla (http://www.fudzilla.com/index.php?option=com_content&task=view&id=788&Itemid=1) think the hd2900xt default consume more than 240watt power :smile:

I wonder if Fuad took a look at this graph from NVIDIA:

http://enthusiast.hardocp.com/image.html?image=MTE3ODA2MTU5OGRmUGVmSGdkUU1fMV8xX2wuanBn

No doubt, the one huge drawback for the 2900 XT crossfire system is that power requirements will be absolutely huge, even though performance should be very good. An 8800 GTS SLI system on the other hand would have much more modest power requirements, but the Vista SLI drivers need major work.

On a side note, glad to see that this graph did not start at 147w, and then scale to 300w from there :D

Geeforcer
02-May-2007, 19:39
I wonder if Fuad took a look at this graph from NVIDIA:

http://enthusiast.hardocp.com/image.html?image=MTE3ODA2MTU5OGRmUGVmSGdkUU1fMV8xX2wuanBn

No doubt, the one huge drawback for the 2900 XT crossfire system is that power requirements will be absolutely huge, even though performance should be very good. An 8800 GTS SLI system on the other hand would have much more modest power requirements, but the Vista SLI drivers need major work.

On a side note, glad to see that this graph did not start at 147w, and then scale to 300w from there :D

If 240W is true, then the cooler design must be REALLY good in order to move all that heat and still ensure quiet operations.

It would also leave open the possibility that R600 is capable of much higher frequency and the power consumption is the major determining factor in default clocks.

Kaotik
02-May-2007, 19:39
If 240W is true, then the cooler design must be REALLY good in order to move all that heat and still ensure quiet operations.

It would also leave open the possibility that R600 is capable of much higher frequency and the power consumption is the major determining factor in default clocks.

It's simple, it can NOT be true due to the fact that it works with 2x6pin plugs

flippin_waffles
02-May-2007, 19:43
I wonder if Fuad took a look at this graph from NVIDIA:

http://enthusiast.hardocp.com/image.html?image=MTE3ODA2MTU5OGRmUGVmSGdkUU1fMV8xX2wuanBn

No doubt, the one huge drawback for the 2900 XT crossfire system is that power requirements will be absolutely huge, even though performance should be very good. An 8800 GTS SLI system on the other hand would have much more modest power requirements, but the Vista SLI drivers need major work.

On a side note, glad to see that this graph did not start at 147w, and then scale to 300w from there :D

And then he writes...

The quoted power consumption for a GeForce 8800 GTX is 177 Watts. For the new GeForce 8800 Ultra, running at higher clock speeds, the quoted power consumption is 175 Watts. We will do our own testing to see how the Ultra compares to a GTX later in this evaluation. I would suggest that the R600 power comparisons be ignored as they are verified as not being correct or at least all the information about them is not disclosed on this slide. We will have our own R600 power comparison numbers here in a couple of weeks

So why the heck even include the numbers in the graph, if they know them not to be accurate.....:idea:

[edit]

My bad. I thought they were [H]'s slides. :(

marty101
02-May-2007, 19:44
to someone can not afford the truth's strike:http://69.93.88.162/forum/images/smiles/rofl.gif

http://www.3dnews.ru/_imgdata/img/2007/03/30/44883.gif
http://www.chiphell.com/attachments/month_0705/20070502_f0ccec8628c50560f57aYclLYYLN4ngC.jpg
http://www.chiphell.com/attachments/month_0705/20070502_1317863004ad8a1a5d5aLbZd4FUZu8iq.jpg
http://www.chiphell.com/attachments/month_0705/20070502_4d1abfef69420e8d9c7cNvwPThwHVaVn.jpg

source:
http://www.chiphell.com/viewthread.php?tid=3407&extra=page%3D1&page=25


Accepting the "one screen shot does not a test make" argument completely, and for what it is worth (not a lot probably)

at the same point with the same settings, except at 1680x1050 res, using

stock 8800 GTX with 158.18
athlon 4800+ x2 @2.55
2gb ddr400

43fps...

Arty
02-May-2007, 19:46
I wonder if Fuad took a look at this graph from NVIDIA:

http://enthusiast.hardocp.com/image.html?image=MTE3ODA2MTU5OGRmUGVmSGdkUU1fMV8xX2wuanBn

No doubt, the one huge drawback for the 2900 XT crossfire system is that power requirements will be absolutely huge, even though performance should be very good. An 8800 GTS SLI system on the other hand would have much more modest power requirements, but the Vista SLI drivers need major work.

On a side note, glad to see that this graph did not start at 147w, and then scale to 300w from there :D
Case closed. :)

vertex_shader
02-May-2007, 19:47
I wonder if Fuad took a look at this graph from NVIDIA:

http://enthusiast.hardocp.com/image.html?image=MTE3ODA2MTU5OGRmUGVmSGdkUU1fMV8xX2wuanBn

No doubt, the one huge drawback for the 2900 XT crossfire system is that power requirements will be absolutely huge, even though performance should be very good. An 8800 GTS SLI system on the other hand would have much more modest power requirements, but the Vista SLI drivers need major work.

On a side note, glad to see that this graph did not start at 147w, and then scale to 300w from there :D

The part where the official slide shows the Ultra using less power than the GTX is funny too; reviews say otherwise: Link (http://www.hardwarezone.com/articles/view.php?id=2251&cid=3&pg=9) Link (http://www.hexus.net/content/item.php?item=8593&page=9) Link (http://www.pcper.com/article.php?aid=402&type=expert&pid=13) Link (http://www.techreport.com/reviews/2007q2/geforce-8800ultra/index.x?pg=8)

R300King!
02-May-2007, 19:47
Because they are BS, me thinks. More FUD. [H]ard to believe. ;)

neliz
02-May-2007, 19:49
So why the heck even include the numbers in the graph, if they know them not to be accurate.....:idea:

[edit]

My bad. I thought they were [H]'s slides. :(


They were... fudo just swallowed them, poo'd on his keyboard and then wrote that article about it....

Geeforcer
02-May-2007, 19:49
It's simple, it can NOT be true due the fact it works with 2x6pin plugs

I keep forgetting, what is the PCI-E slot specced at, 75W? If so, then yeah, maximum power consumption could not exceed 225W.

Kaotik
02-May-2007, 19:53
I keep forgetting, what is PCI-E slot specked at, 75W? If so, then yeah, maximum power consuption could not exceed 225W.

75W slot and 75W per 6pin plug, that's correct (though I remember reading that the slot could actually provide 76W, but it might have been a typo and even if it wasn't, it's irrelevant)
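The budget being argued here can be sketched in a few lines. The 75W slot and 75W 6-pin figures are the ones stated in this post; the 150W figure for an 8-pin plug is not stated in the thread and is added here as an assumption from the PCI-E power connector ratings, and `board_limit` is just a hypothetical helper name.

```python
SLOT_W = 75        # PCI-E x16 slot, as stated above
SIX_PIN_W = 75     # per 6-pin plug, as stated above
EIGHT_PIN_W = 150  # assumption: 8-pin plug rating, not quoted in this thread

def board_limit(*aux_connectors_w):
    """Maximum in-spec board power: slot plus every auxiliary connector."""
    return SLOT_W + sum(aux_connectors_w)

limit_6_6 = board_limit(SIX_PIN_W, SIX_PIN_W)
limit_6_8 = board_limit(SIX_PIN_W, EIGHT_PIN_W)
rumoured_w = 240   # the figure from the article being discussed

print(limit_6_6, limit_6_8, rumoured_w > limit_6_6)
```

So a card that runs in spec on 6+6 has to stay within 225W, and the rumoured 240W would only fit the 6+8 configuration.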

Robin B
02-May-2007, 19:57
There is a reason that they have clocked the card at 750, and the rumors said it will run just fine with 2x6 pin; that alone says it would use under 225W, if I am not way off.

Geeforcer
02-May-2007, 20:00
Heh, yeah. So unless the figures are for an overclocked R600, or ATI has an on-chip power generator, it simply does not add up. Taken along with the fact that no website has been able to verify lower power consumption by the Ultra (did they mean "per clock" or something?), that chart is about as worthless as they come.

Sound_Card
02-May-2007, 20:08
I get a 404 error when trying to load the G80 arc thread. :!:

I would very much like the gurus here to give me a breakdown of G80's texturing capabilities. Because I'm still in the dark...

So from what I can gather, G80 has 32 TAU's, 64 TFU's, and ?128? Texture samplers, and dedicated L1(how much L1?). Texture units clocked at 575mhz.

R600 has 32 TAU's, 16 TFU's, 80 texture samplers, dedicated L1(how much?), and shared L2(how much?). Texture units clocked at 750mhz.

Right? No?

Pete
02-May-2007, 20:09
Well, I haven't played Stalker at all, but from playing Oblivion, that difference really looks like the difference between HDR and no HDR.

The explanation is probably simpler. A histogram shows the R600 shot is clipped at both ends of the black/white range while the G80 is not. A quick play with lowered brightness ("-50" in ArcSoft PhotoStudio) and contrast ("-75") shows the G80 shot approaching the R600 in IQ and corresponding clipped histogram range (though G80 still shows more of a reddish hue). I'm guessing this is a simple case of a borked gamma setting.
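A minimal sketch of the kind of histogram check described above, assuming 8-bit luminance values. The sample lists are made up purely for illustration and are not taken from either screenshot; `clipped_fractions` is a hypothetical helper, not anything from PhotoStudio.

```python
def clipped_fractions(luma, lo=0, hi=255):
    """Return the share of pixels sitting in the darkest and brightest bins."""
    total = len(luma)
    dark = sum(1 for v in luma if v <= lo) / total
    bright = sum(1 for v in luma if v >= hi) / total
    return dark, bright

# Made-up 8-bit luminance samples (illustrative only).
clipped_shot = [0, 0, 0, 12, 90, 180, 255, 255, 255, 255]
unclipped_shot = [8, 15, 40, 70, 110, 150, 190, 220, 240, 250]

print(clipped_fractions(clipped_shot))
print(clipped_fractions(unclipped_shot))
```

A shot with a large share of pixels piled up in both end bins is what "clipped at both ends" looks like in numbers; a gamma or brightness/contrast mismatch would be one way to get there.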

But if a smaller HDR format (FP10, or that 9-9-9-5 format I think was in an R600 slide?) could produce the same results, then I guess we split the difference (MDR?).

Sound_Card, I think our front page (http://www.beyond3d.com/content/reviews/1/9) still exists. :razz: Now, to give you a chance to razz me. From what I understand of what Rys & co. have gathered, G80's texture samplers consist of 32 TAUs and 64 INT8/clk (FP16 half-speed) TFUs, divided into eight groups corresponding to 8 shader processor cores, with 8kiB L1 cache per group (so 64kiB L1, all told). I'm not clear on the terminology, though, as B3D refers to G80's TAUs and TFUs collectively as texture samplers, while R600 appears to have distinct samplers.

So, your basic #s seem right, except that as far as I can figure G80 doesn't seem to have discrete samplers. Dunno how this figures into DX10 constants. Speaking out of my opposite end, and without figuring how L1/2 caches and memory bus widths affect efficiency, it seems like (surprise!) G80's geared for more texture manipulation and R600's equipped for more shader massaging per clock.

Sound_Card
02-May-2007, 20:27
Sound_Card, I think our front page still exists. Now, to give you a chance to razz me. From what I understand of what Rys & co. have gathered, G80's texture samplers consist of 32 TAUs and 64 INT8/clk TFUs, divided into eight groups corresponding to 8 shader processor cores, with 8kiB L1 cache per group (so 64kiB L1, all told). I'm not clear on the terminology, though, as B3D refers to G80's TAUs and TFUs collectively as texture samplers, while R600 appears to have distinct samplers.


ahhh thank you so much!!! I see some light now..:shock:

Now I just need to know how much L1 and L2 R600 has. It would seem to me as well R600 has dedicated samplers and also has L2 cache.

R600's texturing capabilities should not be far off from G80's from what I can tell. Correct me if I'm wrong. R600's dedicated sampling units could be more effective, but that is unclear. It has the same number of texture address units, but 1/4 of the filtering units. Both GPUs have L1 cache, but which one has more? R600 has L2 (how much?), G80 does not. R600's texture units are clocked 175MHz higher than G80's.

Rys
02-May-2007, 20:35
I'm not clear on the terminology, though, as B3D refers to G80's TAUs and TFUs collectively as texture samplers, while R600 appears to have distinct samplers.
I tend to refer to the entire exercise of fetch and filtering as sampling, rather than just the fetch part.

Razor1
02-May-2007, 20:48
It's fair to say that the leaked R600 info you have seen has some validity (yes, we'll have an article soon) in it and yes, obviously NVIDIA corporate is scratching their heads right now asking what the heck happened with R600?


Interesting quote from Guru3D's G80 Ultra article

leoneazzurro
02-May-2007, 20:50
I tend to refer to the entire exercise of fetch and filtering as sampling, rather than just the fetch part.

Just a question, since this wasn't really clear in the G80 info I saw in various reviews: in the G80 architecture, is it possible to perform 64 basic texture filtering operations AND 64 basic texture fetch operations at the same time, or are there 64 samplers in total that can perform texture fetch OR filtering operations up to a maximum of 64?

Geeforcer
02-May-2007, 21:03
Interesting quote from Guru 3d's g80 ultra article

It all makes sense. Mystified by R600, Nvidia could not sit tight and fall behind in the much-coveted "WTF is this???" category. Their retaliatory strike will no doubt be a huge success: having released a card no one asked for, wanted, or will buy, they left everyone and AMD scratching their heads, bellies, butts and other scratchable parts. David Orton was seen wandering the hallways barefoot, drooling and muttering "8800 Ultra.. does it make sense to you? No? I thought so.... But it is... its is... what does this mean....? What...? It makes no sense, but here it is.... It's a trick... trick I tell you!!! Why don't you listens? Will anyone listen? Snakes... snakes.. I am covered in snakes! Where am I?"

Bravo Nvidia. Bravo.

fellix
02-May-2007, 21:21
Duh! I sense a kind of preemptive "Emergency Edition" launch syndrome, here. :lol:
Damn those Megahurtz -- they should've put another set of 12 GDDR3 chips on the back side of the board for a holy-moly 1536MB of memory. At least that would mostly justify the claimed $900+ tag.

Rys
02-May-2007, 21:25
Just a question, being this not really clear in the info on G80 I saw in various reviews: in g80 architecture, it is possible to perform at the same time 64 basic Texture Filtering operations AND basic Texture fetch operations, or there are 64 total samplers that can perform texture fetch OR filtering operations up to the maximum of 64?
G80's sampler hardware can fetch and filter in the same clock, yes, and at full speed.

Sound_Card
02-May-2007, 21:25
It all makes sense. Mystified by R600, Nvidia could not sit tight and fall behind in the much-coveted "WTF is this???" category. Their retaliatory strike will no doubt be a huge success: having released a card no asked for, wanted, or will buy they left everyone and AMD scratching their heads, bellies, butts and other scratchable parts. David Orton was seen wandering the hallways barefoot, drooling and muttering "8800 Ultra.. does it make sense to you? No? I thought so.... But it is... its is... what does this mean....? What...? It makes no sense, but here it is.... It's a trick... trick I tell you!!! Why don't you listens? Will anyone listen? Snakes... snakes.. I am covered in snakes! Where am I?"

Bravo Nvidia. Bravo.


Ireland called...






They want their whisky back....




:razz:

compres
02-May-2007, 21:26
It all makes sense. Mystified by R600, Nvidia could not sit tight and fall behind in the much-coveted "WTF is this???" category. Their retaliatory strike will no doubt be a huge success: having released a card no asked for, wanted, or will buy they left everyone and AMD scratching their heads, bellies, butts and other scratchable parts. David Orton was seen wandering the hallways barefoot, drooling and muttering "8800 Ultra.. does it make sense to you? No? I thought so.... But it is... its is... what does this mean....? What...? It makes no sense, but here it is.... It's a trick... trick I tell you!!! Why don't you listens? Will anyone listen? Snakes... snakes.. I am covered in snakes! Where am I?"

Bravo Nvidia. Bravo.

I don't get it...

So you mean nVidia did good or bad releasing ultra today?

leoneazzurro
02-May-2007, 21:27
ahhh thank you so much!!! I see some light now..:shock:

Now I just need to know how much L1 and L2 R600 has. It would seem to me as well R600 has dedicated samplers and also has L2 cache.

R600's texturing capabilties should not be far off from G80 from what I can tell. Correct me if I'm wrong. R600 dedicated sampling units could be more effective, but that is unclear. Has the same amount of texture address units, but has 1/4 of the filtering units. Both GPU's have L1 cache but which one has more? R600 has L2(how much?), G80 does not. R600's texture units are clocked 175mhz higher than G80's.

I remember reading that the L2 cache is 256 Kbyte per texture unit group, 1 Mbyte total for the texture units.

leoneazzurro
02-May-2007, 21:28
G80's sampler hardware can fetch and filter in the same clock, yes, and at full speed.

Thank You very much. :)

Sound_Card
02-May-2007, 21:31
I remember to have read L2 cache is 256 Kbyte for texture unit group, 1 Mbyte total for texture units.

ahh thanks, is it me or is that a good amount of cache?

What about L1?

Khronus
02-May-2007, 21:31
I wonder if Fuad took a look at this graph from NVIDIA:

http://enthusiast.hardocp.com/image.html?image=MTE3ODA2MTU5OGRmUGVmSGdkUU1fMV8xX2wuanBn

No doubt, the one huge drawback for the 2900 XT crossfire system is that power requirements will be absolutely huge, even though performance should be very good. An 8800 GTS SLI system on the other hand would have much more modest power requirements, but the Vista SLI drivers need major work.

On a side note, glad to see that this graph did not start at 147w, and then scale to 300w from there :D


Umm you guys did note the Nvidia logo in the top right corner of that pic for power consumption. I take it with a grain of salt.

leoneazzurro
02-May-2007, 21:32
ahh thanks, its it me or is that a good amount of cache?

What about L1?

I have no info for the moment, sorry :(.

nAo
02-May-2007, 21:37
I remember to have read L2 cache is 256 Kbyte for texture unit group, 1 Mbyte total for texture units.
1MB of L2 cache would be IMHO a complete waste of die area, at least for typical applications (games); you can happily live with way less than that.

compres
02-May-2007, 21:41
1MB of L2 cache would be IMHO a complete waste of die area, at least for tipical applications (games), you can happily live with way less than that.

To be honest that sounds like a lot, especially on a GPU, but maybe it has to do with the ring bus and its latency. Perhaps a ring with more cache was better for them than a crossbar. Also cache takes less area per transistor than other functional units.

edit: spelling

Kaotik
02-May-2007, 21:42
Umm you guys did note the Nvidia logo in the top right corner of that pic for power consumption. I take it with a grain of salt.

The nVidia numbers with grain of salt perhaps, but the AMD/ATI numbers are known to be false already as stated on last page :wink:

leoneazzurro
02-May-2007, 21:44
1MB of L2 cache would be IMHO a complete waste of die area, at least for tipical applications (games), you can happily live with way less than that.

Waste or not, this is what I read :)

http://forum.beyond3d.com/showpost.php?p=978480&postcount=3635

flippin_waffles
02-May-2007, 21:48
Umm you guys did note the Nvidia logo in the top right corner of that pic for power consumption. I take it with a grain of salt.


Yeah, that's what I thought too. I'm curious where NV got the numbers from though! They must have a card then, no?

Unknown Soldier
02-May-2007, 21:52
It's simple, it can NOT be true due the fact it works with 2x6pin plugs

I seem to remember that it has 1x6pin; 1x8pin connectors and that if you don't use 6&8 then you can't overclock.

You can use 6&6 but cannot overclock.

US

Silent_Buddha
02-May-2007, 21:54
However, considering AMD is really focused on GPGPU, is 1 meg of L2 really wasted in that arena?

Is it possible that R600 is trying to be too many things in too many market segments?

After all it appears that AMD/ATI is trying to position it as a top graphics performer, a top physics processor, and a top GPGPU unit.

So is it possible they expended a lot of transistors on things that would benefit GPGPU greatly but have low to minimal impact on 3D rendering?

Regards,
SB

aeryon
02-May-2007, 21:57
I keep forgetting, what is PCI-E slot specked at, 75W? If so, then yeah, maximum power consuption could not exceed 225W.

R600 is PCI-E 2.0 compliant so it can take up to 130W from the slot

Russell
02-May-2007, 22:01
R600 is PCI-E 2.0 compliant so it can take up to 130W from the slot

Except that a) it's also PCI-E 1.1 compliant, as such can work on any current pci-e boards within their supplied power envelope; and b) there aren't exactly a lot of pci-e 2.0 motherboards around right now, so if R600 requires one to run they'll have to delay them another 6 months.

Has R600 and PCI-E 2.0 been verified? I recall it as a rumor, but I am uncertain if it ever moved past rumor status.

Geeforcer
02-May-2007, 22:11
I don't get it...

So you mean nVidia did good or bad releasing ultra today?

I think the tests speak for themselves:

http://forum.beyond3d.com/attachment.php?attachmentid=258&stc=1&d=1178140185

Fornowagain
02-May-2007, 22:19
What's the image? I can't see it for some reason.

Kaotik
02-May-2007, 22:21
I seem to remember that it has 1x6pin; 1x8pin connectors and that if you don't use 6&8 then you can't overclock.

You can use 6&6 but cannot overclock.

US

Even if you can't OC with 6+6, it still means that it's under 225W max since it works with 6+6, and there always has to be at least that couple watts of breathing space

Love_In_Rio
02-May-2007, 22:27
I think the tests speak for themselves:

http://forum.beyond3d.com/attachment.php?attachmentid=258&stc=1&d=1178140185

that test is biased. didn't take into account the king, kyro 3.

fellix
02-May-2007, 22:34
Wow, Parhelia just got Early-Z Test & Occlusion?... :lol:
I miss that chippery -- the most unique DX8-plus-some-DX9 combo, right before NV30. :lol:

compres
02-May-2007, 22:34
I think the tests speak for themselves:

http://forum.beyond3d.com/attachment.php?attachmentid=258&stc=1&d=1178140185

All is clear now...:lol:

neliz
02-May-2007, 22:38
I think the tests speak for themselves:

The only reason is ... because they can ..

hoom
02-May-2007, 22:39
Awesome :lol:

nutball
02-May-2007, 22:41
I demand one-frame-wonder FRAPshots of STALKER running on the Parhelia before I believe those graphs. They look fishy to me. Like they've been ripped off Frumpzilla.

memberSince97
02-May-2007, 22:58
The explanation is probably simpler. A histogram shows the R600 shot is clipped at both ends of the black/white range while the G80 is not. A quick play with lowered brightness ("-50" in ArcSoft PhotoStudio) and contrast ("-75") shows the G80 shot approaching the R600 in IQ and corresponding clipped histogram range (though G80 still shows more of a reddish hue). I'm guessing this is a simple case of a borked gamma setting.

But if a smaller HDR format (FP10, or that 9-9-9-5 format I think was in an R600 slide?) could produce the same results, then I guess we split the difference (MDR?).




Thanks Pete... those two comparison shots really bother me...

Jawed
02-May-2007, 23:00
:lol: I noticed this divergence awhile ago, but it struck me as being exactly the kind of architecture you might expect if you purged your shader team of everyone who didn't buy the unified approach and put them on the texture unit team. It doesn't seem unified at all. The shader units are unified, but not the texture units.
I was referring to the shader architecture being unified, not the texturing architecture.

I think you're referring to the architecture of the texture units, anyway. But I can't work out what you're saying.

You have single-channel dedicated addressing units, single-channel dedicated samplers, multi-channel dedicated addressing units, and multi-channel samplers, which are effectively just four-wide single-channel samplers, but, err, they're "dedicated".
When you address texels you have to account for LOD and bias and the kind of filtering algorithm you intend to perform (merely bilinear or something more interesting). With higher quality filtering, the texels to be fetched for one pixel in a screen-space quad don't necessarily overlap with all the other texels for the other pixels in the quad. So each set of texels needs to be addressed.

So that's why you need a fair amount of TA capability for filtered texels. Addressing formulae are more involved than I can ever be bothered to remember (or work out) so, ahem, just think of loads of interpolations in each of the 3 dimensions of screenspace.
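To give a feel for the per-pixel address work in the simplest filtered case, here is a rough sketch of plain bilinear texel selection at a single mip level. It assumes normalised (u, v) coordinates, texel centres at half-integer positions and clamp addressing; the function name and layout are made up for illustration and say nothing about how R600's TAs actually do it.

```python
import math

def bilinear_footprint(u, v, width, height):
    """Return the four (texel, weight) pairs for a bilinear fetch.

    Assumptions (illustrative only): u and v are normalised to [0, 1],
    texel centres sit at half-integer positions, and out-of-range
    texels are clamped to the texture edge.
    """
    # Move to texel space and shift so that integers land on texel centres.
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional parts become the blend weights

    def clamp(value, size):
        return max(0, min(size - 1, value))

    taps = []
    for dy, wy in ((0, 1.0 - fy), (1, fy)):
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            texel = (clamp(x0 + dx, width), clamp(y0 + dy, height))
            taps.append((texel, wx * wy))
    return taps
```

Every pixel in a quad needs its own set of four addresses like this, and trilinear or anisotropic filtering multiplies the number of sets, which is where the TA demand comes from.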

Now, for vertex fetches, addressing should be much simpler, because fetches are from a stream. Each element in the stream is the same size as its neighbours and there's usually not much reason to flit around, a serial read is fine. Addressing consists of base address + position-in-stream * size-of-element. Much easier to compute than texel addresses for filtering. Having said that, you may want to have a stride factor (for LOD), e.g. reading 1 in 10 vertices in a 3:1 LOD reduction.
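As a minimal sketch of that formula (the function and parameter names are hypothetical, and the stride is my guess at how a "1 in N" LOD read might be expressed):

```python
def vertex_fetch_address(base_address, index, element_size, stride=1):
    """Byte address of the index-th element fetched from a vertex stream.

    Hypothetical helper: 'stride' counts elements, so stride=10 reads
    every tenth vertex. All elements are assumed to be the same size.
    """
    if element_size <= 0 or stride <= 0:
        raise ValueError("element_size and stride must be positive")
    position_in_stream = index * stride
    return base_address + position_in_stream * element_size


# Serial read of 32-byte vertices starting at 0x1000.
addresses = [vertex_fetch_address(0x1000, i, 32) for i in range(4)]
```

One multiply and one add per fetch, with no interpolation, which is why this kind of addressing should be so much cheaper than filtered texel addressing.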

In texture filtering, with multi-texturing, each layer of textures has effectively the same address. Well partly, anyway, because the mipmap chain might be different for the extra layers (they can be lower-detail). But anyway, multi-texturing should be able to (at least partly) re-use the texture addresses from level to level. And don't forget multi-texturing usually requires less texturing quality for these extra layers (e.g. only bilinear).

In vertex fetching you may want to sample from multiple streams in parallel. This is where you can pile on the attributes and do instancing. D3D10 allows for 8 streams to be used in parallel.

So, I'm guessing that the TAs for vertex fetch are used less densely than for texel fetch. The VF-TAs can each address one stream. So four of them allow four streams to be fetched in parallel.

Separate from VF and texture filtering, you've got unfiltered texture fetches. In D3D10 these are from texture buffers, 1D, 2D or 3D. These could be something like big blobs of constant data (e.g. for morphing vertices) or they can be for post-processing of render targets (e.g. performing tone-mapping), etc.

When you address a single texel in a texture buffer, the shader will prolly have performed some calculations to identify which texel is required. The TA then fetches the texel based on base address and offset (taking care of 1D, 2D or 3D organisation of the texture). Each of the other objects executing the same shader (vertices, primitives or pixels) will decide their own address for the texel fetch. So that'll keep four TAs occupied. These TAs, I'm guessing, are VF-TAs. I guess that because without filtering, texture buffer fetches shirk most of the complexity of TA-ing (no interpolations are needed to generate these addresses).
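A rough sketch of that base-plus-offset calculation, assuming a plain linear (row-major) layout. Real hardware almost certainly tiles or swizzles memory, so treat the layout and the function name as illustrative assumptions only.

```python
def texel_address(base_address, coords, dims, texel_size):
    """Byte address of one unfiltered texel in a 1D, 2D or 3D buffer.

    Assumption: simple row-major storage with x varying fastest.
    coords is (x,), (x, y) or (x, y, z); dims gives the matching extents.
    """
    if len(coords) != len(dims) or not 1 <= len(coords) <= 3:
        raise ValueError("coords and dims must both have 1, 2 or 3 entries")
    for c, d in zip(coords, dims):
        if not 0 <= c < d:
            raise IndexError("texel coordinate out of range")
    # Fold the coordinates from the slowest-varying axis down to x.
    offset = 0
    for c, d in zip(reversed(coords), reversed(dims)):
        offset = offset * d + c
    return base_address + offset * texel_size
```

Each object running the shader supplies its own coordinates, so four objects issue four independent calculations like this per clock, with no interpolation involved.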

As far as I can tell it's prolly best to think of VF-TAs as much less complex than filtered texture TAs. The throughput of both kinds of TAs needs to be high. At the same time there are overlapping and disparate kinds of fetches that need to be performed within a shader program, so you want to maximise the potential throughput per clock.

It's also worth remembering that the L2 cache in R600 is shared by both vertex L1 and texture L1 caches. In R600 the L1 texture cache is specifically for the filtering pipelines, as far as I can tell (based on patent documents). That would mean that all vertex fetches and texture buffer fetches come through the vertex L1.

In classical DX9 pixel shader code, some texel fetches are unfiltered. Typically these are for things like BRDFs (providing a short cut to the behaviour of light on a material) or for things like the infamous D3 specular lighting lookup. These texel fetches on R5xx and G7x have to be performed by the filtering pipelines, with the filtering turned off.

In theory a D3D10 GPU can perform these fetches using the texture buffer (vertex fetch) pipelines. This would then free-up the filtering pipelines for their normal duty, instead of wasting them on unfiltered texels - onerous when your shader is trying to apply four or more textures per pixel.

I'm still interpreting here, nothing's set in stone...

Jawed

Jawed
02-May-2007, 23:07
1MB of L2 cache would be IMHO a complete waste of die area, at least for typical applications (games), you can happily live with way less than that.
As far as I can tell the amount of cache hasn't appeared on any leaked slides so far.

Jawed