R520 - talk, rumours, speculation; the lot!
digitalwanderer
23-Jul-2005, 00:47
16 pixel units, four quad groups, 8 MIMD vertex units, and a new rasteriser feed them
Ok I've gotten this passed along to me on the low-low from two different anonymous sources now, I was gonna post it up in the other 520 pipe thread to explain my mirth at a few comments....but it's already wisely closed.
So what kind of beast would the above quote make the R520?
Particularly, what does "new rasterizer" mean?
What about wait and see? :?
Particularly, what does "new rasterizer" mean?
You can see here (http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=106&orderby=release_date&order=Order&cname=) what the G70 rasteriser looks like and here (http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=75&orderby=release_date&order=Order&cname=) you can find the R480 rasteriser setup. So take your pick on what's new in the rasteriser for R520...
Thanks. . .that narrows it right down, doesn't it? :lol:
"16 pixel units, four quad groups, 8 MIMD vertex units"
I left out the part about new rasterizer, but why would such a unit be called anything but a 16 pipe architecture?
The vertex units are independent of the pixel pipes, and while the above arch. would have an advantage over a 16 pixel pipe, 6 vertex pipe unit, it certainly isn't a 24-pipeline arch.
Both the X8x0 XT/XL series and the 6800GT/U series had 16 pixel pipes, four quad groups, and no one tried to pass them off as anything else.
R520 = 16/16/24, 3 ALUs per pipe, 8 vertex, 600+ MHz
8) :)
I'm taking a wild stance here, but I'm thinking the R520 is a bit closer to the Xbox 360 than we were originally thinking. I just got wind of this; it's very possible it does have unified shaders.
Dave Baumann
23-Jul-2005, 02:45
It is not at all a similar architecture to Xenos, that is confirmed.
I'm taking a wild stance here, but I'm thinking the R520 is a bit closer to the Xbox 360 than we were originally thinking. I just got wind of this; it's very possible it does have unified shaders.
Well, I think several of us have slowly been edging left to right on that scale. I'm not that far over to the right yet tho.
Edit: Not to mention I'd have to apologize to NV and VR Zone! :lol: I can do it tho --I'm a married man; lots of practice!
digitalwanderer
23-Jul-2005, 03:15
It is not at all a similar architecture to Xenos, that is confirmed.
Xenos is R500, right? :|
Wunderchu
23-Jul-2005, 03:23
It is not at all a similar architecture to Xenos, that is confirmed.
Xenos is R500, right? :|
Ya, Xenos = R500 = C1
(although ATI does not like it being called 'R500', because the number might give one the idea that it is inferior to, or less powerful than, R520)
digitalwanderer
23-Jul-2005, 03:27
Thanks Wunderchu, I admit I haven't heard it referred to as C1 either. :oops:
R520 won't be nearly as efficient, and I doubt it will have a daughter die with eDRAM on it.
If anything it will still be 16 pipes but with more ALUs for the shaders. I have not a clue.
Ailuros
23-Jul-2005, 05:06
16 pixel units, four quad groups, 8 MIMD vertex units, and a new rasteriser feed them
Ok I've gotten this passed along to me on the low-low from two different anonymous sources now, I was gonna post it up in the other 520 pipe thread to explain my mirth at a few comments....but it's already wisely closed.
So what kind of beast would the above quote make the R520?
A NV40 on steroids with more VS units :P
It is not at all a similar architecture to Xenos, that is confirmed. :(
So it's just going to be a fast, feature-adjusted R420 :?
http://www.sydneyshowground.com.au/index.asp?PageType=EventDetail&ID=86&SectionID=1
Launching 26 Aug? I can't think what else they'd be launching with that description.
There had been a little banner ad at an Australian site for this before, but this looks more official-like.
ATI Technologies Australia Launch Event
This event is for channel and media, showcasing the latest in graphics card technology from ATI Technologies.
Day 1 - (26th August) Trade Only
Day 2 - (27th August) Open to General Public
[Just in case somebody suddenly turns deep red and it disappears --and props to Ady at R3D for finding this]
Have they launched CrossFire yet? As in LAUNCH? :) But let's hope it's R520.
/me prays.
Particularly, what does "new rasterizer" mean?
You can see here (http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=106&orderby=release_date&order=Order&cname=) what the G70 rasteriser looks like and here (http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=75&orderby=release_date&order=Order&cname=) you can find the R480 rasteriser setup. So take your pick on what's new in the rasteriser for R520...
That is not what is usually called a rasterizer in the 3D pipeline. The rasterizer is the unit that generates quads from triangle data.
Which fits better with the "feed" language. But what would you do to that which would make it worthwhile to highlight? That's typically been considered pretty standard, unexciting stuff, hasn't it?
Could this be a sideways reference to the new internal bus?
Which fits better with the "feed" language. But what would you do to that which would make it worthwhile to highlight? That's typically been considered pretty standard, unexciting stuff, hasn't it?
Indeed. There's not much "interesting features" you can put there. Maybe changes are related to antialiasing (the rasterizer is the part that determines sample coverage), or maybe they dropped the triangle clipping stage and use infinite guardband clipping instead. My guess would be they changed the tiling mode.
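For anyone following along who hasn't looked at this stage before, here is a minimal sketch of what "generates quads from triangle data" and "determines sample coverage" mean in practice. It is only an illustration: the function names are made up, it samples one point per pixel centre, it ignores fill rules for shared edges, and it walks every quad on screen instead of doing anything clever. It says nothing about how R520 or any real chip does it.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: non-negative when P is on the interior side of A->B
    for the vertex ordering used in this sketch."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered(tri, px, py):
    """True when the sample point (px, py) passes all three edge tests."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (edge(x0, y0, x1, y1, px, py) >= 0 and
            edge(x1, y1, x2, y2, px, py) >= 0 and
            edge(x2, y2, x0, y0, px, py) >= 0)


def rasterize_quads(tri, width, height):
    """Return a list of (quad_x, quad_y, mask) for each 2x2 quad touched.

    mask holds one coverage flag per pixel, in the order
    (x, y), (x+1, y), (x, y+1), (x+1, y+1), sampled at pixel centres.
    """
    quads = []
    for qy in range(0, height, 2):
        for qx in range(0, width, 2):
            mask = tuple(covered(tri, qx + dx + 0.5, qy + dy + 0.5)
                         for dy in (0, 1) for dx in (0, 1))
            if any(mask):  # quads with no covered pixel are never emitted
                quads.append((qx, qy, mask))
    return quads
```

Feeding it the triangle (0,0), (4,0), (0,4) on a 4x4 screen gives three quads: one fully covered and two with a single uncovered pixel. Those partially covered quads are where the quad-efficiency arguments come from.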
Could this be a sideways reference to the new internal bus?
I don't think so.
My guess would be they changed the tiling mode.
Would it rain mockery on my fevered brow if I asked: TBDR?
Obviously they went a ways in that direction in C1, but then they had the eDRAM max size factor to deal with there. One would assume for R520 the advantage would be bandwidth in a bandwidth-starved world. . .but then I don't know how much advantage is still left there given all the Z optimizations that have developed over recent years.
Would it rain mockery on my fevered brow if I asked: TBDR?
Obviously they went a ways in that direction in C1, but then they had the eDRAM max size factor to deal with there. One would assume for R520 the advantage would be bandwidth in a bandwidth-starved world. . .but then I don't know how much advantage is still left there given all the Z optimizations that have developed over recent years.
Not deferred rendering. I meant the mode of tile distribution, i.e. each quad pipeline working on fixed screen tiles in current chips (and supertiling on top of that).
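As a rough picture of "each quad pipeline working on fixed screen tiles", the sketch below deals screen tiles out to four quad pipelines in a repeating 2x2 pattern. The 16-pixel tile size and the interleave pattern are assumptions picked purely for illustration, and the function names are hypothetical; the real tile sizes and mapping are not known from anything in this thread.

```python
TILE_SIZE = 16      # pixels per tile edge (assumed for illustration)
NUM_QUAD_PIPES = 4  # the four quad groups mentioned in the thread


def pipe_for_pixel(x, y, tile_size=TILE_SIZE):
    """Return the index (0..3) of the quad pipeline that owns pixel (x, y).

    Tiles are dealt out in a repeating 2x2 pattern, so horizontally and
    vertically adjacent tiles always belong to different pipelines.
    """
    tile_x = x // tile_size
    tile_y = y // tile_size
    return (tile_x % 2) + 2 * (tile_y % 2)


def tiles_per_pipe(width, height, tile_size=TILE_SIZE):
    """Count how many screen tiles each pipeline owns."""
    counts = [0] * NUM_QUAD_PIPES
    for ty in range(0, height, tile_size):
        for tx in range(0, width, tile_size):
            counts[pipe_for_pixel(tx, ty, tile_size)] += 1
    return counts
```

With a fixed mapping like this the tile count per pipeline is even, but the work per pipeline is only even if the triangles happen to be spread evenly over the screen, which is presumably why the tiling mode is something worth tuning.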
I can't imagine there's much to improve with rasterisation. R420 already has programmable tile sizes so I doubt it's along those lines.
Could it relate to batching? Sigh, we don't seem to know how ATI cards batch.
Jawed
SugarCoat
24-Jul-2005, 03:47
I'm taking a wild stance here, but I'm thinking the R520 is a bit closer to the Xbox 360 than we were originally thinking. I just got wind of this; it's very possible it does have unified shaders.
Absolutely no chance. Unified shaders are something WGF2.0 will utilize, and utilize alone. It wouldn't make sense at all for them to release a WGF2.0 card when WGF2.0 doesn't even have a launch foreseen. Plus I think they'd have to get the go-ahead from Microsoft on something that far ahead and key to Longhorn, which I doubt they'd get. Basically what I'm getting at is that it would be a bragging right only, even if there were a chance, but a performance boost most certainly wouldn't be there without heavily modified and powerful pipes under the hood to begin with. Nothing is hinting toward unified shaders.
----
Isn't a new rasteriser a given, considering the rather large change to the feature set the core supports?
There are hints towards it. I wouldn't truly take a stance like this without hints, and it's possible to come out with a card that supports new features without the API coming out. Look at DX9c and nV's SM3.0 cards. What was it, 3 to 5 months that it took for nV to get SM3.0 drivers and DX9c testing for WHQL? 3 to 5 months from early 2006 will possibly lead to right around the release of Vista.
If I'm wrong, that's OK, it's just a guess.
There are hints towards it. I wouldn't truly take a stance like this without hints, and it's possible to come out with a card that supports new features without the API coming out.
Yup. Just have a look at the R300 and DX9. ATI had the cards out several months before the API was out. Not to mention them supporting instancing even though it wasn't supported/exposed in the API until years later.
very true
SugarCoat
24-Jul-2005, 05:32
There are hints towards it. I wouldn't truly take a stance like this without hints, and it's possible to come out with a card that supports new features without the API coming out. Look at DX9c and nV's SM3.0 cards. What was it, 3 to 5 months that it took for nV to get SM3.0 drivers and DX9c testing for WHQL? 3 to 5 months from early 2006 will possibly lead to right around the release of Vista.
If I'm wrong, that's OK, it's just a guess.
I guess the big thing that makes me not believe it is the simple question of why waste all this time on it if it's that. And this isn't DX9c or a user-end update; it's a whole new OS as far as I know. Longhorn isn't even going to launch with 2.0; that's going to come who the heck knows when, and it's supposed to be very big in changes, not just unified shaders and SM4.0. I guess I just don't see the strategic sense in wasting time on it, unless it will make the R520->R600 transition that much easier? It certainly would be a shock, and I'd think it was cool, but I just don't see it. Longhorn is still on track for a Q4 2006 release, I think, right? Or did they slip something in that Vista preview thing they did?
Vista is Longhorn. I think it was Q4, and most likely it will come out then too, but WGF will support both traditional pipelines and unified. ATI already has a unified pipeline structure with the Xbox 360 chip. They could easily just control a chip like that through drivers to perform similarly to a chip with traditional pipelines; they probably don't have to, though.
oeLangOetan
24-Jul-2005, 09:25
You seem to forget that Vista will not launch with WGF2.0 but with WGF1.0. Also, all the new effects in Vista will be programmed for WGF1.0.
WGF2.0 still seems very far away, launching probably 6 months or more after vista.
http://www.rage3d.com/board/showthread.php?p=1333818374#post1333818374
This guy believes he has the answers.
I'm sure a lot of people believe they have the answers. :o
Dave Baumann
24-Jul-2005, 15:11
Well, with the number of stupid reports around I'm sure they will.
Considering R520 is still based on the R300 architecture, I'm wondering how much of a problem it would be for ATI to detach the ROPs from the fragment pipeline. Nvidia has had its chips designed like that for a long time now, so it isn't a problem for them, but ATI never had such a design.
One reason I believe R520 is 16 pipes is that it would be too much of a pain to redesign their R300 architecture to detach the ROPs and add more fragment pipelines. Just going with straight pipelines, adding more ALU power per pipeline and a higher clock speed, looks to be the easier way for them.
Dave Baumann
24-Jul-2005, 15:36
I don't think it's much of a pain, at least no more or less so than changing other elements of the pipelines. It's been proven that "deep" ALU pipelines are not really the way to go, but wide ones are, so whatever happens, eventually you'll be looking at processing more pixels in the ALUs at some points of the pipeline than you will have the capability to actually output (although not necessarily with R520).
http://www.beyond3d.com/forum/viewtopic.php?p=483000#483000
For instance, where the shader architecture is concerned what will R520’s ALU composition be? Will it still be the same as R300 but extended for FP32 and SM3.0 instructions, will they have done something like beefed up the second ALU with a more complete instruction set or will they have done something very different? Who knows at this point, all I can say is that when I tried to tease some information a while back I was slightly ominously, for ATI, reminded that R520 was primarily designed when ATI had a huge shader advantage with R300 vs NV30…
Jawed
trinibwoy
24-Jul-2005, 16:28
http://www.beyond3d.com/forum/viewtopic.php?p=483000#483000
For instance, where the shader architecture is concerned what will R520’s ALU composition be? Will it still be the same as R300 but extended for FP32 and SM3.0 instructions, will they have done something like beefed up the second ALU with a more complete instruction set or will they have done something very different? Who knows at this point, all I can say is that when I tried to tease some information a while back I was slightly ominously, for ATI, reminded that R520 was primarily designed when ATI had a huge shader advantage with R300 vs NV30…
Jawed
Well they would've had plenty of time to rethink that strategy since the NV40 debuted. I don't think designing your products around your competitor's mistakes is a good approach.
Well they would've had plenty of time to rethink that strategy since the NV40 debuted.
At about the time NV40 debuted, R520's functional design was about finished, I expect. I can't imagine ATI has done much to R520 in response to NV40 - that's why I'm pessimistic about FP blending, even if it's a part of Xenos.
Jawed
Hellbinder
24-Jul-2005, 16:41
16-1-3-1 (4 blocks), close to 700 MHz.
That still about sums it up.
trinibwoy
24-Jul-2005, 16:42
Well they would've had plenty of time to rethink that strategy since the NV40 debuted.
At about the time NV40 debuted, R520's functional design was about finished, I expect. I can't imagine ATI has done much to R520 in response to NV40 - that's why I'm pessimistic about FP blending, even if it's a part of Xenos.
Jawed
I would imagine that if r520 as planned was behind NV40 in terms of feature support ATi would've canned the project a long time ago and pushed forward with r580 or whatever they had planned that could match the NV40 featureset. It would be a disaster to launch another generation that is lagging behind Nvidia's last - especially after all the delays. I expect the r520 to be feature competitive (with its own nuances) with higher performance than the GTX.
FP16 blending was already confirmed by multiple sources, one directly from ATI through a leaked presentation (well, it was officially available on their website for a couple of minutes :) )
kemosabe
24-Jul-2005, 17:05
The one thing that's consistently disappointing is that just about every single hint or "rumour" suggests that R520 is going to be less than it was intended to be from a performance standpoint. While proving nothing, the significant delays only lend credence to those whisperings and make me wonder why they didn't make the decision back in spring to forge ahead with R580 rather than risk turning this into a real fiasco. I can only assume that R580 wasn't close enough to being ready.
OK, seems I've missed the confirmation of FP16 then, or forgotten it :oops:
Is FP blending part of OGL 2.0? I'm just wondering which API FP blending will appear in first...
Jawed
FP blending is supported in OpenGL 2.0 and DX9.
OK, put another way, which APIs is it a requisite in?
Jawed
OK, put another way, which APIs is it a requisite in?
Jawed
neither
trinibwoy
24-Jul-2005, 17:33
Which fits better with the "feed" language. But what would you do to that which would make it worthwhile to highlight? That's typically been considered pretty standard, unexciting stuff, hasn't it?
Indeed. There's not much "interesting features" you can put there. Maybe changes are related to antialiasing (the rasterizer is the part that determines sample coverage), or maybe they dropped the triangle clipping stage and use infinite guardband clipping instead. My guess would be they changed the tiling mode.
8xMSAA would be nice.
Which fits better with the "feed" language. But what would you do to that which would make it worthwhile to highlight? That's typically been considered pretty standard, unexciting stuff, hasn't it?
Indeed. There's not much "interesting features" you can put there. Maybe changes are related to antialiasing (the rasterizer is the part that determines sample coverage), or maybe they dropped the triangle clipping stage and use infinite guardband clipping instead. My guess would be they changed the tiling mode.
8xMSAA would be nice.
Before adding more and more samples, they might think about something to reduce the memory footprint first.
trinibwoy
24-Jul-2005, 17:36
8xMSAA would be nice.
Before adding more and more samples, they might think about something to reduce the memory footprint first.
But that would certainly have the same memory footprint and higher performance than Nvidia's 8xS no?
8xMSAA would be nice.
Before adding more and more samples, they might think about something to reduce the memory footprint first.
But that would certainly have the same memory footprint and higher performance than Nvidia's 8xS no?
Considering it would be an MSAA-only mode, certainly.
16-1-3-1 (4 blocks), close to 700 MHz.
That still about sums it up.
Still? That was the leaked R580 spec from December (except the "4 blocks" bit - no idea where that came from, unless you just mean quads). Original R520 target clocks weren't quite that high either, IIRC.
TBH it sounds as if they need the extra ~50MHz or so to beat G70 (by an acceptable margin), hence yield issues at clock.
8xMSAA would be nice.
Before adding more and more samples, they might think about something to reduce the memory footprint first.
But that would certainly have the same memory footprint and higher performance than Nvidia's 8xS no?
Considering it would be an MSAA-only mode, certainly.
I think it would be cool (I've said this before) for them to bump up to 4x & 8x (1 loop & 2 loop). We should be moving to 4x as the baseline, imho.
The one thing that's consistently disappointing is that just about every single hint or "rumour" suggests that R520 is going to be less than it was intended to be from a performance standpoint. While proving nothing, the significant delays only lend credence to those whisperings and make me wonder why they didn't make the decision back in spring to forge ahead with R580 rather than risk turning this into a real fiasco. I can only assume that R580 wasn't close enough to being ready.
Well that is because Inquirer and similar websites kept "dreaming" in the 32 pp story and now its time to wake up and come back to reality.
I haven't seen even one substantial/credible hint or rumor that R520 is going to be less stellar than intended.
Absolutely no chance. Unified shaders are something WGF2.0 will utilize, and utilize alone. It wouldn't make sense at all for them to release a WGF2.0 card when WGF2.0 doesn't even have a launch foreseen. Plus I think they'd have to get the go-ahead from Microsoft on something that far ahead and key to Longhorn, which I doubt they'd get. Basically what I'm getting at is that it would be a bragging right only, even if there were a chance, but a performance boost most certainly wouldn't be there without heavily modified and powerful pipes under the hood to begin with. Nothing is hinting toward unified shaders.
----
Isn't a new rasteriser a given, considering the rather large change to the feature set the core supports?
You don't need WGF2.0 to utilize unified shaders. It is a hardware feature that the application doesn't need to be aware of.
Chalnoth
25-Jul-2005, 02:03
Before adding more and more samples, they might think about something to reduce the memory footprint first.
But memory space isn't all that expensive, so I don't see this as a big deal, not when one considers how exceedingly-challenging it is to provide a smaller memory footprint without visual drawbacks.
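To put rough numbers on the footprint being argued about, here is a back-of-the-envelope sketch. It assumes 32-bit colour plus 32-bit depth/stencil stored per sample, no separate resolve buffer, and that compression saves bandwidth rather than allocated space. Those are assumptions chosen for illustration, not known details of any particular chip, and the function name is made up.

```python
def msaa_footprint_bytes(width, height, samples,
                         color_bytes=4, depth_bytes=4):
    """Uncompressed multisample colour + depth/stencil allocation in bytes."""
    return width * height * samples * (color_bytes + depth_bytes)


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)
```

Under those assumptions a 1600x1200 target comes to 61,440,000 bytes (about 58.6 MiB) at 4x and 122,880,000 bytes (about 117.2 MiB) at 8x. Whether that counts as "not all that expensive" depends on how much of a 256 MB card you want left over for textures.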
SugarCoat
25-Jul-2005, 03:48
Absolutely no chance. Unified shaders are something WGF2.0 will utilize, and utilize alone. It wouldn't make sense at all for them to release a WGF2.0 card when WGF2.0 doesn't even have a launch foreseen. Plus I think they'd have to get the go-ahead from Microsoft on something that far ahead and key to Longhorn, which I doubt they'd get. Basically what I'm getting at is that it would be a bragging right only, even if there were a chance, but a performance boost most certainly wouldn't be there without heavily modified and powerful pipes under the hood to begin with. Nothing is hinting toward unified shaders.
----
Isn't a new rasteriser a given, considering the rather large change to the feature set the core supports?
You don't need WGF2.0 to utilize unified shaders. It is a hardware feature that the application doesn't need to be aware of.
Have any proof of that? As far as I understand, the card will simply be treated by the program as if it has dedicated pixel and vertex units; however, the card on the hardware level also needs to designate a number to control what percent of the pipes are doing work for pixel/vertex shaders. Point being, the user isn't going to notice a damn bit of difference, but it's going to make for a complex job in creating the core itself. Counterproductive in these times, if you will.
Without WGF2.0, having unified shaders is going to be a completely worthless and cosmetic feature; otherwise both companies would have embraced it long before. nVidia has fought it more than anyone, saying programmable is more efficient. Hopefully you agree. If not, explain.
If I understand the unified pipelines correctly, it could be looked at as a variant of the current pipeline/vertex shader structure. Each quad/array can be used as a pixel shader quad/array or vertex shader quad/array. So it might be possible through drivers to designate which quads/arrays do what until WGF 2.0 comes out.
trinibwoy
25-Jul-2005, 04:31
Without WGF2.0, having unified shaders is going to be a completely worthless and cosmetic feature; otherwise both companies would have embraced it long before. nVidia has fought it more than anyone, saying programmable is more efficient. Hopefully you agree. If not, explain.
Not really. The workload isn't going to change. My understanding is that WGF2.0 will be designed around a unified paradigm but that has no bearing on what the hardware will do once it receives a workload. And the reason they haven't embraced it before is exactly what Nvidia is saying now - it's easier to design and optimize more dedicated hardware. The added flexibility of unified pipes is not free - there is a lot more complexity involved as well.
Chalnoth
25-Jul-2005, 05:20
Have any proof of that? As far as I understand, the card will simply be treated by the program as if it has dedicated pixel and vertex units; however, the card on the hardware level also needs to designate a number to control what percent of the pipes are doing work for pixel/vertex shaders. Point being, the user isn't going to notice a damn bit of difference, but it's going to make for a complex job in creating the core itself. Counterproductive in these times, if you will.
That would be highly inefficient. If you're going for unified pipelines, you're really doing it for one primary reason: to prevent either the vertex or pixel shaders from becoming the bottleneck. This means that any sane unified design will never be in a situation where there are a set number of pixel and vertex shaders.
A much better setup would be one where you just have a massive queue of shader instructions, which could include either pixel or vertex data, and intelligent dispatching of these instructions in an execution-friendly order to whichever units happen to be available.
This can certainly be done in a completely transparent way to the software, and thus could work just fine with, say, DirectX 7 games, and even older OpenGL titles.
Now, that said, it may not be the best way to go. We'll have to see, but I can think of a couple of reasons why unified pipelines may not end up being better than today's architectures:
1. It's sure to be more complex, which means a smaller percentage of the total logic can be dedicated to computing than with today's architectures. The gain in efficiency may not be enough to offset the loss in total processing power.
2. It may be possible to build significantly higher-performance units dedicated to either vertex or pixel processing than units that can do both (this is nVidia's primary argument).
Vertex programs tend to be quite large (up to a hundred instructions when using multiple light sources) compared with fragment programs, but at some point vertex processing just becomes setup limited and throwing more shader units at it isn't going to help at all. In fact, the usual vertex peak test (the classic multiplication of the vertex position by the model-projection matrix, four DP) on any unified shader architecture (unless it's very low end) is likely to end up setup limited, not shader limited. I doubt that the Xenos rasterizer is going to be capable of processing 12 triangles per cycle (unless they are going Reyes, like Sony seems to have been trying for a couple of generations, and someone forgot to tell us).
Unified shading helps to rationalize the architecture of the shaders as now both vertex, fragments (or whatever other stream processing element is used) share the same semantics and resources and may help in load balancing some extreme cases, for example a batch with a very large vertex program that produces very few fragments. But at the end the current GPU architectures are still based on the OpenGL polygon rasterization paradigm in which a relatively small number of vertices/triangles generate a relatively large amount of fragments. The quad processing architecture for the fragment pipeline wouldn't be precisely efficient if that wasn't the truth. And the setup stage of rasterization becomes quite expensive when the paradigm isn't true. In the future we may go Reyes or to a mixed Reyes+raytracer but we aren't there yet.
Chalnoth
25-Jul-2005, 09:37
You're missing something, RoOoBo. That's that different triangles produce different numbers of fragments, and depending upon the vertex and fragment programs used, those will each take differing amounts of time to process.
Since we're dealing with immediate-mode processing, enough caching is just not available to balance these things out over significant portions of screen space. Enter the unified architecture: Got 1000 1-pixel triangles to render? No problem, my pixel shaders aren't going to sit idle. Got 1 1000-pixel triangle to render? No problem, my vertex shaders aren't going to sit idle, either.
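A toy model makes the two extremes easier to compare. It assumes every vertex and every pixel costs one unit of shader work, a dedicated design with 8 vertex and 16 pixel units (the rumoured split from the top of the thread), a unified design with the same 24 units drawing from one shared queue with perfect load balancing, no vertex sharing between triangles, and no setup limit. All of that is simplification for the sake of illustration, and the function names are made up.

```python
def dedicated_cycles(vertex_work, pixel_work, vertex_units=8, pixel_units=16):
    """Cycles when vertex and pixel work can only run on their own units.

    The slower of the two pools sets the pace; the other pool sits idle
    for the remainder.
    """
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_cycles(vertex_work, pixel_work, total_units=24):
    """Cycles when any unit can take any item from one shared queue."""
    return (vertex_work + pixel_work) / total_units


# The two workloads from the post: (vertex work, pixel work)
MANY_TINY_TRIANGLES = (3 * 1000, 1000)  # 1000 one-pixel triangles
ONE_BIG_TRIANGLE = (3, 1000)            # 1 thousand-pixel triangle
```

In this model the tiny-triangle case takes 375 cycles dedicated versus about 167 unified, and the big-triangle case takes 62.5 versus about 42. The model hands the unified design its best case, so treat the gap as an upper bound; the complexity cost and setup limits raised elsewhere in the thread are exactly what it leaves out.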
Yes, there is the possibility of being setup-limited, but that's going to be hardware-specific, and simply doing some research on the part of the video chip manufacturer should give the chip enough units for it not to be an issue in typical game scenarios.
I'm not saying anything different. I'm only doubting that near-future GPUs implementing unified shaders will provide the triangle setup resources to sustain the theoretical vertex peak rate, mainly because it isn't a reasonable assumption. Triangle setup and rasterization are relatively inexpensive, but not that inexpensive. So there may be some cases that become setup limited rather than vertex shading limited. However, I doubt that the case is common in current or near-future games.
In any case we can wait a few months, when benchmarking done on Xenos by game developers may clarify this matter.
Anyway, your example of 1000 1-pixel triangles may not mean idle shader units, but it means 75% useless work, as the fragments are (must be?) processed in quads.
I wonder if there is a cheap alternative for non quad processing when mipmapping is involved. You could try triad (three fragments) processing but that doesn't help much and doesn't look that nice for cache or memory accesses.
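The 75% figure, and why triads "don't help much", falls out of a one-line calculation. The sketch assumes that every fragment in a processing group gets shaded whether or not the triangle covers it, which is how 2x2 quad processing is usually described (the neighbouring pixels are needed for the texture LOD differences). The function name is hypothetical.

```python
def wasted_fraction(covered, group_size=4):
    """Fraction of shaded fragments that fall outside the triangle.

    covered    -- pixels in the group the triangle actually covers
    group_size -- fragments shaded together (4 for a 2x2 quad)
    """
    if not 0 < covered <= group_size:
        raise ValueError("covered must be between 1 and group_size")
    return 1 - covered / group_size
```

For a one-pixel triangle, wasted_fraction(1, 4) is 0.75 with quads and wasted_fraction(1, 3) is about 0.67 with triads, so the saving from dropping to three fragments is small.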
Chalnoth
25-Jul-2005, 11:14
Anyway, your example of 1000 1-pixel triangles may not mean idle shader units, but it means 75% useless work, as the fragments are (must be?) processed in quads.
Just because they are currently doesn't mean they must be in the future. In fact, moving away from single-triangle quads really is a must for efficient rendering in the future. It appears that the Xenos has already made this step, and I'm sure we'll hear more about such technologies in the near future.
It does require a bit of caching to allow more than one triangle to occupy the same quad, and will likely also require more temporary storage within the quad, but it is a necessary move as polycounts continue to rise.
As long as texturing is done in quads, I don't see how fragment shading is going to break away from the quad paradigm.
You've also got the problem that hierarchical-z is also quad-based.
So it seems to me that it would be extraordinarily hard to shake-off the quad paradigm.
When two adjacent triangles share an edge, fragments that fall into both triangles need to be shaded twice. On that basis it's not possible to gain efficiency by sharing a quad over multiple triangles concurrently.
Jawed
Chalnoth
25-Jul-2005, 11:56
As long as texturing is done in quads, I don't see how fragment shading is going to break away from the quad paradigm.
You've also got the problem that hierarchical-z is also quad-based.
So it seems to me that it would be extraordinarily hard to shake-off the quad paradigm.
Well, first of all, I don't believe that hierarchical-z depends on a quad paradigm at all (it's a tile-based algorithm, after all, operating on objects much larger than quads), or at least, doesn't benefit dramatically from a quad-based design.
Secondly, yes, texturing is an issue. But not a huge one. You could link up a multi-triangle quad shader pipeline with a quad-based texturing pipeline dumbly, and you'd just end up with a bit more strain on the texturing units. You could also be a bit more intelligent and optimize for the case when the surface normal doesn't change much between neighboring quads within the texturing pipelines, but it's not strictly necessary.
One simple way of solving this may already exist in the Xenos, which has completely separated the shader pipelines from the texturing pipelines.
When two adjacent triangles share an edge, fragments that fall into both triangles need to be shaded twice. On that basis it's not possible to gain efficiency by sharing a quad over multiple triangles concurrently.
Er, you're thinking of MSAA here, I imagine. It doesn't really matter. Consider a quad with a triangle edge passing through one pixel of the quad. Even with FSAA, a system that doesn't enforce quad coherency will calculate 5 fragments, whereas a system that does will calculate 8 fragments. In a worst-case scenario, the system that doesn't enforce coherency will calculate 7 fragments (assuming only two triangles within the quad) to the coherent system's 8.
Here, then, even though the two-triangles-within-a-quad setup is a worst case for a non-coherent design, it'll likely still manage approximately a 33% speed advantage on average (since an edge going through one pixel should happen in equal proportion to an edge going through three pixels, the average should be 6 fragments calculated for the non-coherent design, against 8 for the coherent one). As triangles get smaller and smaller (as they are bound to: processing, though slowing down, is still progressing at a much faster rate than display technology), the non-coherent design will look more and more attractive.
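The fragment counts in the post above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming a single 2x2 quad crossed by one edge shared by exactly two triangles, with every pixel the edge passes through shaded once per triangle; the function names are made up for illustration.

```python
QUAD_PIXELS = 4

def coherent_fragments(triangles_in_quad):
    # Quad-coherent design: each triangle touching the quad shades all 4 pixels.
    return QUAD_PIXELS * triangles_in_quad

def noncoherent_fragments(edge_pixels):
    # Non-coherent design: pixels the edge misses are shaded once, pixels the
    # edge passes through are shaded once per triangle (two triangles here).
    return (QUAD_PIXELS - edge_pixels) + 2 * edge_pixels

cases = [1, 3]  # edge through one pixel, edge through three pixels
counts = [noncoherent_fragments(e) for e in cases]
average = sum(counts) / len(counts)
speedup = coherent_fragments(2) / average
print(counts, average, round(speedup, 2))
```

With the two cases weighted equally, the ratio of 8 to 6 fragments is where the roughly 33% figure comes from.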
Well, first of all, I don't believe that hierarchical-z depends on a quad paradigm at all (it's a tile-based algorithm, after all, operating on objects much larger than quads), or at least, doesn't benefit dramatically from a quad-based design.
The lowest level in the z-hierarchy is a quad of pixels. The next level up (the first level in the hierarchical-z-buffer) is a single value representing that quad. If you're trying to cull pixels to avoid shading them, then you must align the z-buffer to the underlying quads that are actually rendered.
Therefore there's an inescapable link between a quad that's rendered and the hierarchy of z-buffer levels that lie above rendered quads.
Secondly, yes, texturing is an issue. But not a huge one. You could link up a multi-triangle quad shader pipeline with a quad-based texturing pipeline dumbly, and you'd just end up with a bit more strain on the texturing units.
I envisage a scenario where the texturing engine is asked to texture entire triangles at a time, rather than working piecemeal, quad by quad. On the other hand triangles are getting smaller, so piecemeal texturing is, relatively speaking, rising in granularity anyway.
You also have the problem of wasted texturing work when most of a triangle is occluded.
In the past I've suggested using quad-serialisation through the fragment shader pipeline. Here, a quad of pipelines shading together as a SIMD block no longer exists. Every fragment shader pipeline has a unique program counter. (Apart from anything else, the scheduling hardware overhead for such an architecture is quite severe.)
But each pipeline is scheduled with quads of fragments (or multiples thereof, say 16 quads). This is a way to retain the quad paradigm for texturing and hierarchical-z, but when the scheduler issues command threads to the fragment shader's execution unit it knows that some fragments are off-triangle and therefore it simply skips the fragment. It can delete that fragment's command thread from the queue.
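A toy model of that skipping step, purely as a sketch: it assumes each quad arrives with a 4-entry coverage mask and that the scheduler only queues a command thread for covered fragments. The data layout and names are hypothetical, not a description of any real scheduler.

```python
from collections import deque

def build_queue(quads):
    """quads: list of (quad_id, coverage), coverage being 4 booleans."""
    queue = deque()
    for quad_id, coverage in quads:
        for lane, covered in enumerate(coverage):
            if covered:  # off-triangle fragments never get a command thread
                queue.append((quad_id, lane))
    return queue

quads = [
    (0, (True, True, True, True)),     # interior quad
    (1, (True, False, False, False)),  # edge quad, one fragment on-triangle
    (2, (True, True, False, True)),    # edge quad, three fragments on-triangle
]
queue = build_queue(quads)
issued = len(queue)          # threads actually issued when serialised
simd_slots = 4 * len(quads)  # slots a 4-wide SIMD quad design would spend
print(issued, simd_slots)
```

The quads still exist for texturing and hierarchical-z; only the shader work for the off-triangle lanes disappears.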
So, overall, I agree that it would be nice to move away from quad-based shading, but I think we're stuck with quad-based texturing and hierarchical-z.
As triangles get smaller, the proportion of edge quads rises exponentially, I expect. Simply on that basis it makes sense to consider serialisation. But the overheads in terms of scheduling hardware and the way that registers (and constants) can no longer be organised in coherently-addressed blocks is a pretty severe hit.
What's interesting about Xenos is how granular it is - instead of being organised into quads like R420, Xenos is effectively operating on four quads per cycle, SIMD. It seems that to counter the overhead of scheduling so intensively per clock, the hardware has been made more parallel per clock, too.
(On the other hand, there appears to be no dual-issue scheduling complexity within Xenos - the pipelines are limited to co-issue splitting of the Vec5 ALUs, though even then we don't know how flexible that is).
So, really, I don't think it's impossible to perform incoherent fragment shading, but I think the scheduling mechanism to make it workable is a huge, huge, overhead.
GPU parallelism in fragment shading stems from the way it's been possible to use an SIMD architecture to do so (look at NV40 which is effectively 16-way SIMD). With per-fragment branching and smaller triangles becoming more important I would like to think that truly incoherent fragment shading is in our future. But the hardware overhead is ginormous.
Jawed
Chalnoth
25-Jul-2005, 13:18
The lowest level in the z-hierarchy is a quad of pixels. The next level up (the first level in the hierarchical-z-buffer) is a single value representing that quad. If you're trying to cull pixels to avoid shading them, then you must align the z-buffer to the underlying quads that are actually rendered.
Therefore there's an inescapable link between a quad that's rendered and the hierarchy of z-buffer levels that lie above rendered quads.
Sure, but none of that requires that either a full quad be rendered, or that the members of the quad come from the same triangle.
I envisage a scenario where the texturing engine is asked to texture entire triangles at a time, rather than working piecemeal, quad by quad. On the other hand triangles are getting smaller, so piecemeal texturing is, relatively speaking, rising in granularity anyway.
Well, I think that ATI, at least, has been doing tile-based rasterization for a long time now (nVidia may as well, I don't know). So, I expect that there's no reason to worry about granularity if you're not worrying so much about quad-based texturing.
All that you need to do is batch N triangles before starting to rasterize any of them, then start assigning pixels to the quads within the tile irrespective of what triangles they came from. I don't see that this would require massive amounts of overhead, and should be pretty efficient in terms of actually dispatching full quads most of the time as long as the meshes are already optimized for vertex cache coherence.
You also have the problem of wasted texturing work when most of a triangle is occluded.
Huh?
What's interesting about Xenos is how granular it is - instead of being organised into quads like R420, Xenos is effectively operating on four quads per cycle, SIMD. It seems that to counter the overhead of scheduling so intensively per clock, the hardware has been made more parallel per clock, too.
Well, I expect that the Xenos doesn't require those four quads to each come from the same triangle. This was the impression I garnered from stuff I read about the architecture at launch.
Well, I expect that the Xenos doesn't require those four quads to each come from the same triangle. This was the impression I garnered from stuff I read about the architecture at launch.
They must have the same shader state.
I haven't seen anything in the descriptions of Xenos that suggests it marries-up same-state triangles. But, logically, it should be possible because adjoining triangles will often come in batches and they'll all be executing the same shaders.
This is one of the things I was trying to get to the bottom of in my "is this how fragment shading batching works" thread.
NV40 is very much more likely to marry-up same-state triangles because of its large batch size and it seems happiest with small texture sizes.
Jawed
Chalnoth
25-Jul-2005, 13:48
Well, I expect that the Xenos doesn't require those four quads to each come from the same triangle. This was the impression I garnered from stuff I read about the architecture at launch.
They must have the same shader state.
Oh, I would agree with that. But that doesn't necessarily mean that they'll all come from the same triangle. Consider, for a moment, what exactly is different between pixels inhabiting different triangles compared to those from the same triangle, assuming that the same textures and pixel shaders are used in all (which would be the norm for skinned objects, the primary situation I'm attempting to talk about here).
I haven't seen anything in the descriptions of Xenos that suggests it marries-up same-state triangles. But, logically, it should be possible because adjoining triangles will often come in batches and they'll all be executing the same shaders.
Well, it didn't actually come out and say it, but with the increased parallelism, it really wouldn't make sense not to do it. And ATI has described that they do have a separate processing unit specifically for grouping objects for execution.
NV40 is very much more likely to marry-up same-state triangles because of its large batch size and it seems happiest with small texture sizes.
I guess I'm not seeing the connection. But all this should be easily testable with a program that renders a full-screen flat surface that is divided into different numbers of triangles. One could calculate pretty easily the theoretical performance drop that should occur if the video card is not limited by geometry, but just by an increase in pixel processing due to enforced quad coherency, and compare this to the results of the benchmark. I'm kinda busy to write such a thing at the moment, but it wouldn't be hard to do.
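The theoretical half of that test can be sketched without any hardware. The Python below is a rough model under several assumptions not stated in the post: the surface is split into a square grid of cells, each cell is cut into two triangles along its diagonal, pixels are sampled at their centres, quads are aligned 2x2 blocks, and the surface size is a multiple of the cell count. It only counts shaded fragments; it says nothing about what a real card does.

```python
def shaded_fragments(width, height, cells):
    """Fragments shaded with enforced 2x2 quad coherency for a width x height
    surface split into cells x cells squares, each cut into two triangles."""
    size_x, size_y = width // cells, height // cells
    quads = set()  # one entry per (triangle, quad) pair that must be shaded
    for py in range(height):
        for px in range(width):
            cx, cy = px // size_x, py // size_y
            # Pixel centre relative to the cell origin, in half-pixel units.
            u = 2 * (px - cx * size_x) + 1
            v = 2 * (py - cy * size_y) + 1
            upper = u * size_y >= v * size_x  # side of the cell diagonal
            quads.add(((cx, cy, upper), px // 2, py // 2))
    return 4 * len(quads)

WIDTH = HEIGHT = 64
for cells in (1, 8, 32, 64):
    ratio = shaded_fragments(WIDTH, HEIGHT, cells) / (WIDTH * HEIGHT)
    print(cells, ratio)
```

Comparing these ratios against measured frame times for the same subdivisions would show whether a card pays the full quad-coherency cost.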
Oh, I would agree with that. But that doesn't necessarily mean that they'll all come from the same triangle. Consider, for a moment, what exactly is different between pixels inhabiting different triangles compared to those from the same triangle, assuming that the same textures and pixel shaders are used in all (which would be the norm for skinned objects, the primary situation I'm attempting to talk about here).
Me too. I'm not trying to disagree at all.
I'm intrigued by batching mechanisms in GPUs. There are so many inter-related boundary conditions (e.g. texture cache sizes and organisation into L1/L2 as per NVidia or L1 only in ATI).
The batching thread concluded that batch sizes are determined by texture latencies (worst case) at least in current NVidia GPUs. At the same time the worst-case texture latencies work against the ability of a GPU to perform efficient per-fragment dynamic branching.
Xenos's primary architectural design goal seems to be to dissociate texture latency from fragment shading. The two processes are now orthogonal (at least until one or the other runs out of work to do). It would make most sense to maximise this benefit by allowing Xenos to operate on the smallest batches of fragments possible, which would enhance Xenos's responsiveness to per-fragment dynamic branching.
At the same time the scheduling hardware overheads in Xenos have been ameliorated by making it 16-way SIMD.
Still, we're stuck with no real information on the nature of batching, either in Xenos or even in R420/R300.
Jawed
Now proven by Taiwan's AIB , R520 is truly 32 pixel pipeline GPU.
you will see 24 pipe 500MHz during 8/26~8/27
16 pipe stand for Pro while 24 pipe stands for XT.
I got this off R3d. .I just wonder..
And.. laugh out loud.
:roll:
http://www.rage3d.com/board/showthread.php?t=33823522
and more
Give you Some Yield info
G70 24 pipe yield (40%) , but change to 20 pipe ( above 70%)
R520 16 pipe yield (40%) , but change to 24 pipe (above 5%)
ChrisRay
25-Jul-2005, 14:59
You are giving that guy more exposure than he deserves Neliz. :P
trinibwoy
25-Jul-2005, 15:12
You are giving that guy more exposure than he deserves Neliz. :P
I guess if you paste the same thing 20 times in 8 different threads it's bound to get out eventually :)
You are giving that guy more exposure than he deserves Neliz. :P
gadzooks!
I know he's a notorious spammer, but he was right about a few things, including the "pro's" that have been sent to developers.
but then again, with vr-zone claiming a 3% yield and he's claiming something between 5% and 10%.. I wonder which of the two bad's I have to choose.
Oh.. well.. sorry...
And.. you'd better read it here than at L'inq, right?
Well, given we seem to be hitting the 30-day mark tomorrow, the goodies should start fining-down right along now.
So, Last Call for predictions:
240mm2 ~250-275M
16 fragment shaders
16 rops
8 VS
650mhz/1200 (600 DDR)
20% in relative effective bandwidth improvements (compared to R4xx) from pixie dust somewhere.
It will stomp G70 in a few specific scenarios that people not wearing green underwear will find exciting. . .and trail relatively narrowly in more general "legacy" (SM2.0 and earlier) ones.
Chip size is relying on Wavey's measurements, which I notice are consistently a bit bigger for ATI chips than Orton's description of them in the interview last year (Example: Orton describes R420 as "16 x 16", and Wavey has it as 16.5 x 17)
trinibwoy
25-Jul-2005, 15:41
It'd be interesting to see what kind of overclocking headroom we get on the high-end r520 parts. And geo, what 30 day mark are you referring to?
It'd be interesting to see what kind of overclocking headroom we get on the high-end r520 parts. And geo, what 30 day mark are you referring to?
I'm sitting on Aug 26 as Der Tag, based on the Australian "ATI Launch" thingie. Not that it will only be Australia, of course.
http://www.sydneyshowground.com.au/index.asp?PageType=EventDetail&ID=86&SectionID=1
But even if the date should turn out a week or two later, if you assume (as I do) that they are going for something very close to simultaneous launch/availability, then R520's for sale to end users should already either be shipping to AIBs, or very close to it. And that exponentially increases the number of folks who know the truth, and thus the leak factory of *real* information should start kicking into gear. . .
trinibwoy
25-Jul-2005, 15:51
http://www.beyond3d.com/forum/viewt...=25193&start=60
geo
Senior Member
275~250M
hahahahahahahahahaha
R520 is 350M , hahahahahaha
Naive Guy
http://www.rage3d.com/board/showthread.php?p=1333819980#post1333819980 ROFL! :lol: :lol: :lol:
trinibwoy
25-Jul-2005, 15:52
But even if the date should turn out a week or two later, if you assume (as I do) that they are going for something very close to simultaneous launch/availability, then R520's for sale to end users should already either be shipping to AIBs, or very close to it. And that exponentially increases the number of folks who know the truth, and thus the leak factory of *real* information should start kicking into gear. . .
Ah ok, that would make sense.
You are giving that guy more exposure than he deserves Neliz. :P
gadzooks!
I know he's a notorious spammer, but he was right about a few things, including the "pro's" that have been sent to developers.
but then again, with vr-zone claiming a 3% yield and he's claiming something between 5% and 10%.. I wonder which of the two bad's I have to choose.
Oh.. well.. sorry...
And.. you'd better read it here than at L'inq, right?
The problem is we all knew what the developers were getting a month back; it was nothing new he told us about with the 16 pipe card. Why would an AIC or AIB partner know about yields :? anyways? I would imagine ATi and TSMC would never tell that to anyone....
I just found out he thinks the yields for g70 are from the ibm plant.. now where would he get THAT from? is he confusing it with g72?
In fact, moving away from single-triangle quads really is a must for efficient rendering in the future.
Although combining multiple triangles into one quad increases the computation efficiency of your pipeline (ie: you keep all 4 fragment pipes busy at all times), you end up reducing the efficiency of the rest of the chip.
- Instead of having one (x,y) per quad, and one triangle id per batch, you now need up to 4 (x,y) pairs per quad, and up to 4 triangle ids per quad.
- You now need to compute and store the plane equations and/or barycentric weights for attributes of up to 4 triangles, more if you allow consecutive quads to be of different triangles still.
You've pretty much doubled your non-register storage requirement right there.
You've also made the following more complicated:
- Screen-space derivative computations
- Anisotropy ratio computations
- Texturing, which now needs to iterate on quads by triangle ID.
- Rasterization, which now has to deal with coalescing multiple triangles into quads.
In addition, you've created the following new problems:
- Average texture latency increases dramatically: Since you need to iterate over multiple texture requests/quad, you end up taking much longer to compute the results than you would otherwise.
- Since texture latency increases, you now need a much larger register file to run more computation threads while waiting for your texture results to come back.
All in all, you've probably doubled your area for a marginal increase in speed (50%, on average, for tiny triangles, which are setup-limited anyway).
This dramatic increase in inefficiency in the rest of the pipeline makes this option rather unattractive. You're better off just putting down 2x the quads, which will nearly double your throughput, than making the quad pipes 2x more complex, which might get you ~50% better utilization in some cases, and nothing in most.
Hellbinder
25-Jul-2005, 16:29
lots of you have a habbit of underestemating ATi and how good they are at screeing their information before product launch.
Bob--
In the netiquette of the 'verse, you are of course under no obligation to answer this question, and I won't think less of you if you don't. Nevertheless, I'll ask --what's your background?
:lol: I can do it tho --I'm a married man; lots of practice!
Man oh man...I needed someone else to say it for me....
Thanx!
Chalnoth
25-Jul-2005, 19:35
Xenos's primary architectural design goal seems to be to dissociate texture latency from fragment shading. The two processes are now orthogonal (at least until one or the other runs out of work to do). It would make most sense to maximise this benefit by allowing Xenos to operate on the smallest batches of fragments possible, which would enhance Xenos's responsiveness to per-fragment dynamic branching.
Oh, I don't think so. All you'd need is a system that can store a few different batches at one time (say 3-4). Under normal operation with large batches, then, you'd typically only be storing two (one triangle, one pixel). Then, if a dynamic branch comes along, you'd have to store a third batch.
jimmyjames123
25-Jul-2005, 19:35
lots of you have a habbit of underestemating ATi and how good they are at screeing their information before product launch.
"habbit" of "underestemating" at "screeing" ??? Now who is really the one who needs further screening? :lol:
Chalnoth
25-Jul-2005, 19:52
- Instead of having one (x,y) per quad, and one triangle id per batch, you now need up to 4 (x,y) pairs per quad, and up to 4 triangle ids per quad.
1. You're still working with quads, just not requiring them to be from the same triangle, so you'd still only have one (x,y) per quad.
2. I don't think fragment data cares which triangle it came from. Once the fragment has been set up, it's just a pixel location on the screen with a set of registers, textures, and a shader. Since the textures and the shader are the same, it's just the registers that are different (as is the case with a coherent architecture anyway).
- You now need to compute and store the plane equations and/or barycentric weights for attributes of up to 4 triangles, more if you allow consecutive quads to be of different triangles still.
Well, this is only going to be a necessity as an optimization for z-buffer compression, and even then that's a pretty tiny price to pay compared to the register file.
You've also made the following more complicated:
- Screen-space derivative computations
- Anisotropy ratio computations
- Texturing, which now needs to iterate on quads by triangle ID.
- Rasterization, which now has to deal with coalescing multiple triangles into quads.
1. Derivatives are just as easy: you're still working on quads. Though the case where you can't find the triangles that are related to some pixels in the quad must be handled specially, that's no different than it is today.
2. Anisotropy isn't a problem: you're doing no more calculation of anisotropy degrees whether or not you're dealing with coherent quads.
3. Well, yeah, but that's just a function of the triangle setup, which is going to be under more strain anyway from lots of small triangles.
4. Rasterization is no problem at all with a tile-based approach.
In addition, you've created the following new problems:
- Average texture latency increases dramatically: Since you need to iterate over multiple texture requests/quad, you end up taking much longer to compute the results than you would otherwise.
- Since texture latency increases, you now need a much larger register file to run more computation threads while waiting for your texture results to come back.
Texture latency shouldn't increase dramatically most of the time. Most of the time here we're just talking about two triangles that are nearly identical in their texturing requests to if there had been a single triangle over the quad (skinned, high-res meshes). So, if you set up your texture cache for the situation where the texture data requested is the same as it would be if there was just one triangle within the quad, and then just be flexible on your requests, there should be little problem.
And the cases where there's more difference between the surface normals so that the texture cache has problems between the two triangles, you'll probably be dealing with some amount of anisotropy anyway, which is automatically much more cache-friendly, so there should be little problem in this case, too.
This dramatic increase in inefficiency in the rest of the pipeline makes this option rather unattractive. You're better off just putting down 2x the quads, which will nearly double your throughput, than making the quad pipes 2x more complex, which might get you ~50% better utilization in some cases, and nothing in most.
Oh, I don't buy at all that a noncoherent architecture would need to be so much bigger than a coherent one at the same number of pipelines.
Hellbinder
25-Jul-2005, 20:33
lots of you have a habbit of underestemating ATi and how good they are at screeing their information before product launch.
"habbit" of "underestemating" at "screeing" ??? Now who is really the one who needs further screening? :lol:
spalling err.. typing was never my strong suit..
err.. not that i actually have a strong suit :lol:
1. You're still working with quads, just not requiring them to be from the same triangle, so you'd still only have one (x,y) per quad.
Ah, so we're only looking at a more specific algorithm than ad-hoc merging of rasterized quads: Only two triangles are supported, and they must now share an edge. That's a somewhat simpler problem to deal with.
Well, this is only going to be a necessity as an optimization for z-buffer compression, and even then that's a pretty tiny price to pay compared to the register file.
Only if you render just Z ;) You also need to compute those for "colors" and "texture coordinates", so you can actually interpolate them.
Assuming these are all fp32s, and that you have 12 interpolated vec4 attributes, you need 3 * 12 * 4 * 4 = 576 bytes of storage per triangle. If you run with 4 vec4 registers at full speed, you need 256 bytes/quad of register file space.
For 2 triangles, you need ~4 quads in flight to have a ~50:50 split of storage between attributes and registers. Let's say "tiny" means 5% of the total storage, you'll need 256 * N = 0.95 / 0.05 * 576 * 2 => ~85 quads in flight, for two triangles.
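The storage estimate in the two paragraphs above can be reproduced directly. The sketch below just restates the post's numbers; reading the factor of 3 as three values per attribute component, and the 256 bytes as 4 vec4 fp32 registers for each of a quad's 4 pixels, is an assumption about what was meant.

```python
BYTES_FP32 = 4

# 3 values per component, 12 interpolated attributes, 4 components (vec4)
attr_bytes_per_triangle = 3 * 12 * 4 * BYTES_FP32
# 4 vec4 registers per pixel, 4 pixels per quad
reg_bytes_per_quad = 4 * 4 * BYTES_FP32 * 4

triangles = 2
attr_bytes = triangles * attr_bytes_per_triangle

# Quads in flight for attribute storage to equal register storage (50:50).
quads_even_split = attr_bytes / reg_bytes_per_quad

# Quads in flight for attributes to shrink to 5% of total storage.
attr_share = 0.05
quads_tiny = (1 - attr_share) / attr_share * attr_bytes / reg_bytes_per_quad

print(attr_bytes_per_triangle, reg_bytes_per_quad)
print(quads_even_split, round(quads_tiny, 1))
```

Both results land half a quad above the rounded figures quoted in the post (~4 and ~85).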
Now, if you already have multiple triangles in your batches, this is less of an issue: you probably just need to deal with a few more triangles instead of twice the number of triangles to keep the pipe fed. "A few" should be quantized, but I don't have any hard data to work with.
1. Derivatives are just as easy: you're still working on quads. Though the case where you can't find the triangles that are related to some pixels in the quad must be handled specially, that's no different than it is today.
Ok, what would a circuit that dealt with that look like? You need to: figure out which triangle a pixel is in, split up the derivative computation requests by triangle ID and compute the corresponding quad mask. Then, you need to iterate over the derivative computation for each unique triangle in your quad. Either you loop-back, or you replicate the hardware.
Worst case, you insert a bubble in your pipeline (loopback for derivative) for every quad that flows through it. That's no better than just having independent quads. Best case is if you replicated the hardware. That's 2 more fp32 subtracters needed per quad-pipe. You can always recycle the normal fp adders though, at the cost of additional scheduling and data routing complexity.
2. Anisotropy isn't a problem: you're doing no more calculation of anisotropy degrees whether or not you're dealing with coherent quads.
Exactly. Thus, there is no advantage here in merging quads. You need to do the same computations, but now you need to deal with multiple triangles in the same request. Much like the derivative case, you either run the computation at half speed, or you replicate the hardware. For high aniso ratios, you can hide that computation, but you can't do that for magnification or low aniso ratios.
3. Well, yeah, but that's just a function of the triangle setup, which is going to be under more strain anyway from lots of small triangles.
Ah, but by doing one triangle at a time, you generally can take advantage of the coherency of your requests. If you have multiple triangles with multiple different screen-space derivatives (which is the usual case when you are at a triangle edge), you now potentially need a larger cache and a more complex scheduling algorithm to deal with them. Worst case is a texture cache that's 2x as large at the same speed (so perhaps ~2.2x the area). Best case is no quad merging, so your texture cache stays the same size. Any other ratio of quad merging will require you to make your texture cache larger and/or run slower.
4. Rasterization is no problem at all with a tile-based approach.
Can you give an example?
So, if you set up your texture cache for the situation where the texture data requested is the same as it would be if there was just one triangle within the quad, and then just be flexible on your requests, there should be little problem.
You will still need larger texture caches - your quad now has a larger texture footprint than it used to by virtue of sharing different triangles, which will very likely have different screen-space derivatives.
And the cases where there's more difference between the surface normals so that the texture cache has problems between the two triangles, you'll probably be dealing with some amount of anisotropy anyway, which is automatically much more cache-friendly, so there should be little problem in this case, too.
Sure, but the data isn't shared, so you would need 2x the cache anyway.
Oh, I don't buy at all that a noncoherent architecture would need to be so much bigger than a coherent one at the same number of pipelines.
Non-coherency sucks for memory accesses. Either live with the much lower speed, or beef up your caches, or use significantly more complex scheduling algorithm. Either of the last two options makes your chip a lot larger.
Just to clarify things, I'm not saying sharing quads between triangles is an absolutely bad idea. I'm saying that currently, the costs outweigh the benefits. When / if it can be done cheaply, you can bet IHVs will jump on it.
After all, if you get 10% more speed for 5% more area, then that's great! But if you only end up getting a questionable 15% more speed for 60% more area, then it's probably not a good thing.
Hellbinder
25-Jul-2005, 21:08
apparently bob knows sum schtuff... :o
Chalnoth
25-Jul-2005, 21:09
1. You're still working with quads, just not requiring them to be from the same triangle, so you'd still only have one (x,y) per quad.
Ah, so we're only looking at a more specific algorithm than ad-hoc merging of rasterized quads: Only two triangles are supported, and they must now share an edge. That's a somewhat simpler problem to deal with.
Well, you could have more than two easily. But sharing edges would be a prerequisite for optimal rendering.
Only if you render just Z ;) You also need to compute those for "colors" and "texture coordinates", so you can actually interpolate them.
That should already be done by the time you're at the fragment stage, though. Once pixel shading begins, you'll need per-pixel storage for all of the interpolated stuff anyway.
Ok, what would a circuit that dealt with that look like? You need to: figure out which triangle a pixel is in, split up the derivative computation requests by triangle ID and compute the corresponding quad mask. Then, you need to iterate over the derivative computation for each unique triangle in your quad. Either you loop-back, or you replicate the hardware.
No. Here's my idea:
1. Start by converting N triangles to pixels, storing in an intermediate buffer.
2. Group pixels into tiles.
3. Separate tiles into quads.
...this way you avoid the above problem completely. You will want to be careful about making sure that you only place pixels that are from triangles sharing edges into the same quad, but that shouldn't be horribly-difficult.
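The three steps above can be sketched as follows. This is a minimal Python illustration, not a hardware design: it assumes step 1 has already produced the intermediate buffer as a list of (x, y, triangle id) fragments, picks an 8x8 tile size arbitrarily, and ignores the shared-edge restriction mentioned in the post.

```python
from collections import defaultdict

TILE = 8  # assumed tile size in pixels

def bin_fragments(fragments):
    """fragments: iterable of (x, y, triangle_id) from the intermediate buffer.
    Returns {tile: {quad: [(x, y, triangle_id), ...]}}."""
    tiles = defaultdict(lambda: defaultdict(list))
    for x, y, tri in fragments:
        tile = (x // TILE, y // TILE)  # step 2: group pixels into tiles
        quad = (x // 2, y // 2)        # step 3: separate tiles into quads
        tiles[tile][quad].append((x, y, tri))
    return tiles

# Two hypothetical triangles, A and B, meeting inside the quad at the origin.
fragments = [(0, 0, "A"), (1, 0, "A"), (0, 1, "B"), (1, 1, "B"), (2, 0, "A")]
tiles = bin_fragments(fragments)

merged_quads = sum(len(quads) for quads in tiles.values())
per_triangle_quads = len({(x // 2, y // 2, t) for x, y, t in fragments})
print(merged_quads, per_triangle_quads)
```

Here the merged scheme dispatches two quads where a one-triangle-per-quad scheme would dispatch three.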
Exactly. Thus, there is no advantage here in merging quads. You need to do the same computations, but now you need to deal with multiple triangles in the same request. Much like the derivative case, you either run the computation at half speed, or you replicate the hardware. For high aniso ratios, you can hide that computation, but you can't do that for magnification or low aniso ratios.
Well, let me just state that I'm not sure that the anisotropic degree calculation is simplified by working on quads at a time. The calculation is, after all, not linear, and therefore would be incorrect even for a single triangle occupying the quad.
Ah, but by doing one triangle at a time, you generally can take advantage of the coherency of your requests. If you have multiple triangles with multiple different screen-space derivatives (which is the usual case when you are at a triangle edge), you now potentially need a larger cache and a more complex scheduling algorithm to deal with them. Worst case is a texture cache that's 2x as large at the same speed (so perhaps ~2.2x the area). Best case is no quad merging, so your texture cache stays the same size. Any other ratio of quad merging will require you to make your texture cache larger and/or run slower.
Except the screen-space derivatives aren't the same even for a single triangle. If they were, we'd always see the same MIP map level used for the entire triangle, which is obviously not the case.
And I'm not really sure you'd need a larger texture cache anyway. After all, you have two cases. In one case, the surfaces within the quad have nearly the same surface normal, and end up acting just as if they were one surface. Here efficiency is gained, without increasing cache size. In the other case, the normal varies too much, you end up with a texture cache miss, and the texture reads take longer: but this is no worse than if you'd rendered two quads anyway.
Just to clarify things, I'm not saying sharing quads between triangles is an absolutely bad idea. I'm saying that currently, the costs outweigh the benefits. When / if it can be done cheaply, you can bet IHVs will jump on it.
After all, if you get 10% more speed for 5% more area, then that's great! But if you only end up getting a questionable 15% more speed for 60% more area, then it's probably not a good thing.
Right. And I don't think it'd require much more area.
So far, the only significant increase in area that I've counted is the need to store full pixels in the buffer earlier.
Why use register space to store interpolated attributes at all?
Why not store the barycentric coordinates of each sample,
and then use the vertex attributes already available to compute
interpolated values as they are needed?
Why use register space to store interpolated attributes at all?
Why not store the barycentric coordinates of each sample,
and then use the vertex attributes already available to compute
interpolated values as they are needed?
Maybe cause interpolation+storage space may be cheaper than a full calculation?
nAo, I think I misunderstood Chalnoth (I thought he was saying that interpolated attributes should be stored per sample to make them triangle agnostic).
Anyway, for large triangles (currently the case) storing a ref attribute vector and d/dx d/dy is clearly a win. If we are optimizing for really small triangles (i.e. pixel size), it seems like using the barycentric method would give you 2-3x storage space saving. The increased cost of attribute computation would be counteracted by not having to compute rarely reused attribute deltas at all.
Why use register space to store interpolated attributes at all?
You typically don't. However, you do need to store at least one of the following for each attribute of the triangle:
- The attribute at the 3 vertices
- The barycentric equation coefficients
- The plane equation coefficients
In any of these cases, you need 3 values per scalar attribute.
This needs to be done regardless of the triangle size, assuming it covers at least one sample.
(Optimizations can be done for constant attributes across a triangle, or if the triangle covers exactly one pixel - but one-pixel triangles are going to be inefficient regardless).
Btw, your second sentence "and then use the vertex attributes already available to compute interpolated values as they are needed?" implies that the vertex attributes are stored somewhere. Surprisingly, the amount of storage for the attributes or the corresponding barycentric weights or a plane equation is the same. You don't save anything.
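To make the "3 values per scalar attribute" point concrete, here is a minimal sketch (screen-space only, no perspective correction, all names illustrative) that evaluates one attribute two ways: from the three vertex values via barycentric weights, and from three plane-equation coefficients. Both encodings take three numbers and give the same result.

```python
# Sketch: two equivalent 3-value encodings of one scalar attribute on a
# screen-space triangle. Perspective correction is deliberately left out.

def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def barycentric(v0, v1, v2, p):
    """Barycentric weights of point p with respect to (v0, v1, v2)."""
    area = edge(v0, v1, v2)
    return (edge(v1, v2, p) / area, edge(v2, v0, p) / area, edge(v0, v1, p) / area)

def interp_from_vertices(v0, v1, v2, attrs, p):
    """Encoding 1: keep the attribute at the 3 vertices, weight by barycentrics."""
    b0, b1, b2 = barycentric(v0, v1, v2, p)
    return b0 * attrs[0] + b1 * attrs[1] + b2 * attrs[2]

def plane_coefficients(v0, v1, v2, attrs):
    """Encoding 2: return (A, B, C) such that attribute(x, y) = A*x + B*y + C."""
    area = edge(v0, v1, v2)
    a0, a1, a2 = attrs
    A = ((a1 - a0) * (v2[1] - v0[1]) - (a2 - a0) * (v1[1] - v0[1])) / area
    B = ((a2 - a0) * (v1[0] - v0[0]) - (a1 - a0) * (v2[0] - v0[0])) / area
    C = a0 - A * v0[0] - B * v0[1]
    return A, B, C

def interp_from_plane(coeffs, p):
    A, B, C = coeffs
    return A * p[0] + B * p[1] + C
```

Which encoding is cheaper in practice depends on where the values already live and how evaluation is scheduled; the sketch only shows that the storage count is the same.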
Once pixel shading begins, you'll need per-pixel storage for all of the interpolated stuff anyway.
As psurge mentioned, you don't need to store these per-pixel.
No. Here's my idea:
1. Start by converting N triangles to pixels, storing in an intermediate buffer.
2. Group pixels into tiles.
3. Separate tiles into quads.
...this way you avoid the above problem completely. You will want to be careful about making sure that you only place pixels that are from triangles sharing edges into the same quad, but that shouldn't be horribly-difficult.
How does that help you compute derivatives? I must be missing something.
The calculation is, after all, not linear, and therefore would be incorrect even for a single triangle occupying the quad.
Good point. I retract my earlier statement about identical screen-space derivatives on a triangle.
Typically, the aniso ratio computation is simplified to be the same for all pixels in a quad. However, you now need to deal with potentially 2 ratios in the same quad. That means more hardware to deal with them, and/or more time to work out the texture requests.
In the other case, the normal varies too much, you end up with a texture cache miss, and the texture reads take longer: but this is no worse than if you'd rendered two quads anyway.
I disagree: You will likely get 2 misses here: Once as you mentioned, and a second time for when you start working on the inside of the second triangle. Unless, of course, you grow your texture cache...
Right. And I don't think it'd require much more area.
I guess we'll have to agree to disagree...
Edit: Fixed the UBB tags.
Bob - what I was proposing is option #1 (store attributes for 3 vertices, available in post transform cache).
The storage saving would come from vertices being used for more than 1 triangle (compared to storing attribute plane equations say).
edit:
to be precise:
Say a sample (x,y,z) has homogeneous barycentric coordinates (b1, b2, b3) and w=1/z, obtained via linear interpolation in screen space.
Attribute u1 for vertex v1 = (v1_x,v1_y,v1_z), v1_w = 1/v1_z is stored
as u1 = ( u1_x*v1_w, u1_y*v1_w, ... ). The attribute value u for the
sample s is computed as u=(b1u1 + b2u2 + b3u3)*(1/w).
The division (1/w) would be performed once per sample and reused for all attribute computation.
(i'm aware that the computation is very expensive compared to the other methods, which is why i was suggesting it only if the in-flight vertex count approaches the in-flight pixel count).
Am I missing something/screwing something up?
Regards,
Serge
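The scheme Serge describes above can be sketched as follows. This is only an illustration of the arithmetic in his post, with hypothetical function names: attributes are stored pre-multiplied by 1/z at each vertex, 1/z is interpolated linearly in screen space, and a single reciprocal per sample is reused for every attribute component.

```python
# Sketch of perspective-correct interpolation from stored vertex attributes.
# bary     : screen-space barycentric weights (b1, b2, b3) of the sample
# inv_zs   : 1/z at each of the three vertices
# pre_attrs: per-vertex attribute tuples already multiplied by that vertex's 1/z

def premultiply(attr, z):
    """Store an attribute as (u_x / z, u_y / z, ...)."""
    return tuple(a / z for a in attr)

def interpolate(bary, pre_attrs, inv_zs):
    # Linearly interpolated 1/z for this sample.
    w = sum(b * iz for b, iz in zip(bary, inv_zs))
    inv_w = 1.0 / w  # the one division per sample, shared by all components
    n = len(pre_attrs[0])
    return tuple(
        inv_w * sum(b * pa[i] for b, pa in zip(bary, pre_attrs))
        for i in range(n)
    )

if __name__ == "__main__":
    zs = (1.0, 3.0, 2.0)
    attrs = ((0.0,), (1.0,), (0.5,))
    pre = [premultiply(a, z) for a, z in zip(attrs, zs)]
    # Halfway between vertex 1 and vertex 2 in screen space.
    print(interpolate((0.5, 0.5, 0.0), pre, [1.0 / z for z in zs]))
```

Note that the screen-space midpoint does not give the attribute midpoint once the two vertices sit at different depths, which is the whole reason for carrying 1/z along.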
Chalnoth
26-Jul-2005, 02:16
No. Here's my idea:
1. Start by converting N triangles to pixels, storing in an intermediate buffer.
2. Group pixels into tiles.
3. Separate tiles into quads.
...this way you avoid the above problem completely. You will want to be careful about making sure that you only place pixels that are from triangles sharing edges into the same quad, but that shouldn't be horribly-difficult.
How does that help you compute derivatives? I must be missing something.
Well, I guess I was assuming that the derivatives used for texturing were linearly-interpolated across the triangle, and doing quad-based derivative calculation would be reserved for when one needs derivatives of other things calculated within the pixel shader. But I suppose I could well be wrong about that.
I disagree: You will likely get 2 misses here: Once as you mentioned, and a second time for when you start working on the inside of the second triangle. Unless, of course, you grow your texture cache...
Yes, I suppose you're right, so you would need added texture cache for optimal operation. But the texture cache in question has to be so ridiculously tiny for this to make any difference that I doubt this added texture cache would be significant (it'd have to be on the order of 4x4 texels stored, as a larger cache could just store data from the second texture in the places in the quad where the first texture doesn't touch).
So, it seems to me that what you need is likely not a larger texture cache, but rather a more flexible one.
Well, I guess I was assuming that the derivatives used for texturing were linearly-interpolated across the triangle, and doing quad-based derivative calculation would be reserved for when one needs derivatives of other things calculated within the pixel shader. But I suppose I could well be wrong about that.
It's easier to always have the TMU calculate the derivatives from whatever texture coordinates it gets passed than special-casing for interpolated texture coordinates.
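As a rough illustration of that approach, the sketch below takes whatever texture coordinates arrive for a 2x2 quad, forms finite differences across it, and picks a mip level from the larger footprint. The function name, the quad layout, and the isotropic log2 rule are assumptions chosen for the example, not a description of any particular TMU.

```python
import math

# Sketch: derive a mip LOD from the texture coordinates of one 2x2 quad.
# uv_quad is ((uv00, uv10), (uv01, uv11)), where uvXY is the (u, v) pair at
# pixel offset (X, Y) inside the quad. The coordinates can come from plain
# interpolation or from arbitrary shader math; the code does not care.

def quad_lod(uv_quad, tex_w, tex_h):
    (uv00, uv10), (uv01, _uv11) = uv_quad
    # Finite differences in texel units, one pixel apart in x and in y.
    dudx = (uv10[0] - uv00[0]) * tex_w
    dvdx = (uv10[1] - uv00[1]) * tex_h
    dudy = (uv01[0] - uv00[0]) * tex_w
    dvdy = (uv01[1] - uv00[1]) * tex_h
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    if rho == 0.0:
        return float("-inf")  # constant coordinates: any magnified level works
    return math.log2(rho)
```

Because the differences are taken across the quad, all four pixels are assumed to belong to one smooth mapping; pixels from two unrelated triangles in the same quad would break that assumption.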
You don't need WGF2.0 to utilize unified shaders. It is a hardware feature that the application doesn't need to be aware of.
Have any proof of that? As far as I understand, the card will simply be treated by the program as if it has dedicated pixel and vertex units; however, the card on the hardware level also needs to designate a number to control what percent of the pipes are doing work for pixel/vertex shaders. Point being, the user isn't going to notice a damn bit of difference, but it's going to make for a complex job in creation of the core itself. Counterproductive in these times, if you will.
Without WGF2.0, having unified shaders is going to be a completely worthless and cosmetic feature; otherwise both companies would have embraced it long before. nVidia has fought it more than anyone, saying programmable is more efficient. Hopefully you agree. If not, explain.
Xenos will be the proof as Xbox 360 is based off of DX9, not DX10, WGF2.0 or whatever Microsoft chooses to call it at the moment. Even with DX10 the program doesn't need to know if the hardware is unified or not. There is no need for the hardware to designate a percentage of ALUs as being for vertex or pixel processing. The point of unifying the hardware is to avoid these designations. See Dave's article here at Beyond3D for more details.
The reason a unified architecture hasn't been done before can probably be explained by many reasons. First and foremost it's complicated to implement. Second, as Nvidia says, there may be advantages to customizing vertex ALUs vs. pixel ALUs.
SugarCoat
26-Jul-2005, 03:57
You don't need WGF2.0 to utilize unified shaders. It is a hardware feature that the application doesn't need to be aware of.
Have any proof of that? As far as I understand, the card will simply be treated by the program as if it has dedicated pixel and vertex units; however, the card on the hardware level also needs to designate a number to control what percent of the pipes are doing work for pixel/vertex shaders. Point being, the user isn't going to notice a damn bit of difference, but it's going to make for a complex job in creation of the core itself. Counterproductive in these times, if you will.
Without WGF2.0, having unified shaders is going to be a completely worthless and cosmetic feature; otherwise both companies would have embraced it long before. nVidia has fought it more than anyone, saying programmable is more efficient. Hopefully you agree. If not, explain.
Xenos will be the proof as Xbox 360 is based off of DX9, not DX10, WGF2.0 or whatever Microsoft chooses to call it at the moment. Even with DX10 the program doesn't need to know if the hardware is unified or not. There is no need for the hardware to designate a percentage of ALUs as being for vertex or pixel processing. The point of unifying the hardware is to avoid these designations. See Dave's article here at Beyond3D for more details.
The reason a unified architecture hasn't been done before can probably be explained by many reasons. First and foremost it's complicated to implement. Second, as Nvidia says, there may be advantages to customizing vertex ALUs vs. pixel ALUs.
Actually the Xbox360 is going to be using a totally custom form of Windows Vista and a completely custom form of DirectX. You'd be making a mistake comparing DirectX 9 for computers to it. I only say this because you seem to be trying to make the connection that the Xbox360 is going to be running something very similar to a computer. This is false. Using them in the same sentence or paragraph should not be done.
Xbox360 and especially Xenos....a lot different than computer parts. And currently, no one knows how Unified Shaders will react, since the Xenos will be the first core to use them, and it's not even a good representation in comparison to a computer graphics processing core.
but then again, with vr-zone claiming a 3% yield and he's claiming something between 5% and 10%.. I wonder which of the two bad's I have to choose.
Neither. :roll:
Well, I guess I was assuming that the derivatives used for texturing were linearly-interpolated across the triangle, and doing quad-based derivative calculation would be reserved for when one needs derivatives of other things calculated within the pixel shader.
The problem with that method is that it doesn't work too well for dependent texture reads (including just tweaking texture coordinates in the shader).
However, if you don't have fragment programs capabilities, you can do away with dependent texture reads, and thus can compute derivatives for texturing analytically. I think 3DLabs had taken that route in the past.
But the texture cache in question has to be so ridiculously tiny for this to make any difference that I doubt this added texture cache would be significant
That depends entirely on the rasterization order and the orientation of the shared edge. Just imagine you have a screen covering quad, split into two triangles. Unless you have an unreasonably large cache (on the order of several MBs), or if you rasterize the diagonal first, the diagonal will get cache misses twice. It's not too hard to find tons of cases where you get two misses, for small caches or different rasterization orders.
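A toy model of that screen-covering-quad case, under loud assumptions: texels map 1:1 to pixels, the cache holds whole square blocks with LRU replacement, and each triangle is rasterized block by block before the next one starts. None of this reflects a real texture cache; it only shows the blocks along the diagonal being fetched twice when the cache is small.

```python
from collections import OrderedDict

# Toy model: an n x n screen split along its diagonal into two triangles,
# textured 1:1, with an LRU cache holding `cache_lines` blocks of
# `block` x `block` texels. Returns the number of block fetches (misses).

def count_misses(n, block, cache_lines):
    cache = OrderedDict()
    misses = 0
    blocks_per_side = n // block
    for tri in (0, 1):  # triangle 0 is x + y < n, triangle 1 is the rest
        for by in range(blocks_per_side):
            for bx in range(blocks_per_side):
                covered = any(
                    (x + y < n) == (tri == 0)
                    for y in range(by * block, (by + 1) * block)
                    for x in range(bx * block, (bx + 1) * block)
                )
                if not covered:
                    continue
                key = (bx, by)
                if key in cache:
                    cache.move_to_end(key)  # hit: mark as most recently used
                else:
                    misses += 1
                    cache[key] = True
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)  # evict least recently used
    return misses

if __name__ == "__main__":
    print(count_misses(16, 4, 2), count_misses(16, 4, 16))
```

With a cache large enough to hold every block, each block is fetched once; with a small one, every block touched by both triangles is fetched again, and the extra misses are exactly the blocks on the shared edge.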
As before, tiny triangles are likely limited upstream of the fragment program, so packing them up for fragment processing will not gain you much (if anything).
The division (1/w) would be performed once per sample and reused for all attribute computation.
Why do you think this is not already the case? ;) With fragment programs, the driver can freely do the division once, and then store the result for subsequent interpolations. Besides, you now need to store 4 things: attribute values at each vertex, and the 1/z computation. If you have a reciprocal unit that's otherwise already idle most of the time, might as well save the storage space (so that you can run more threads) and recompute 1/z when needed.
Chalnoth
26-Jul-2005, 09:49
Xenos will be the proof as Xbox 360 is based off of DX9, not DX10, WGF2.0 or whatever Microsoft chooses to call it at the moment.
No, the Xenos is much closer to WGF 2.0 than it is to DX9. It falls short of the proposed WGF 2.0 specs in some places (geometry), and surpasses it in others (MEMEXPORT).
Gibbo (IIRC, Sales Manager at OcUK) posted some R520 details from a recent meeting with ATI in the OcUK Forums today:
Full post here (http://forums.overclockers.co.uk/showthread.php?t=17446022&page=3&pp=66)
...R520 which will be released in Platinum 512MB (9000 on 3D Mark 2005) available end of September but severe allocation issues, the willy waving product as ATI put it, but not easily available. Plus 512MB means a £450+ price area ish, I have pleaded with ATI to release a 256MB version so I hope they listen, but due to yield rates they may stick to 512MB for ultimate high-end as such. Same as G70, 24 pipelines, approx 520MHz core and 1.4GHz memory...
Some of it doesn't seem to gel with the current consensus here, but at this point everything seems up in the air.
Cheers,
BrynS
trinibwoy
26-Jul-2005, 15:41
Sounds pretty close to what that sherman guy is saying. 24 pipes at 500Mhz.
Gibbo (IIRC, Sales Manager at OcUK) posted some R520 details from a recent meeting with ATI in the OcUK Forums today:
Full post here (http://forums.overclockers.co.uk/showthread.php?t=17446022&page=3&pp=66)
...R520 which will be released in Platinum 512MB (9000 on 3D Mark 2005) available end of September but severe allocation issues, the willy waving product as ATI put it, but not easily available. Plus 512MB means a £450+ price area ish, I have pleaded with ATI to release a 256MB version so I hope they listen, but due to yield rates they may stick to 512MB for ultimate high-end as such. Same as G70, 24 pipelines, approx 520MHz core and 1.4GHz memory...
If that should turn out being true: :shock: :lol:
Dave Baumann
26-Jul-2005, 15:49
I know which bin I would put that one in.
nutball
26-Jul-2005, 15:51
Wasn't he the "G70 is 32 pipes" chap? Or was that another one at OcUK?
Druga Runda
26-Jul-2005, 15:57
well overall not much to be expected http://forum.overnet.com/images/smiles/eusa_eh.gif
That's the first person to explicitly claim the 32/24 pipe rumor from an ATI source. Hmph.
trinibwoy
26-Jul-2005, 16:00
Wasn't he the "G70 is 32 pipes" chap? Or was that another one at OcUK?
Nah I think he was the 22K in 3dmark05 guy :roll:
Druga, your pic isn't working.
trinibwoy
26-Jul-2005, 16:02
That's the first person to explicitly claim the 32/24 pipe rumor from an ATI source. Hmph.
Well geo, you did predict that stuff would start leaking about now :)
Dave Baumann
26-Jul-2005, 16:21
Given the board vendors don't know 'ought yet...
Hellbinder
26-Jul-2005, 16:40
If they are indeed launching a product in 30 days the board vendors had better know a lot more than 'ought' by now.
Hellbinder
26-Jul-2005, 16:41
btw.. 24 "pipelines" at 520mhz seems like a complete waste of time for 90nm technology.
trinibwoy
26-Jul-2005, 16:48
Given the board vendors don't know 'ought yet...
How long does it take from knowing more than 'ought to getting products on shelves?
Sounds like this Gibbo guy wants to boost the 7800 GTX sales. :wink:
Or he actually knows something. Who to believe? A Sales manager or Wavey? :)
SugarCoat
26-Jul-2005, 17:02
btw.. 24 "pipelines" at 520mhz seems like a complete waste of time for 90nm technology.
not unless there were severe leakage and heat issues. However i still think its bogus. First person that gets the actual new name of the card series, gets credit in my book. Publicity wise nothing new is known fact now compared to 6 months ago.
Does it strike anyone else as odd that he's suggesting ATI contracted GDDR3 700 for all cards?
I was initially hoping for a 500mhz clock but a massively improved work per clock over the current R400 series. So i cant say i'd be shocked but...
One things for certain, every base has been covered one way or another...which sucks. So much confusion.
Gibbo was the first to "have" the 7800gtx.
He said they would be available on launch day and was taking pre-orders for them.
I think he also claimed the r520 could not touch the g70's 9000 3dmarks.. oh well.. at least it's the same guy from the g70 pre-release havoc...
Or he actually knows something. Who to believe? A Sales manager or Wavey? :)
If you're still sometimes asking yourself such strange questions, I pity you :)
All of these rumors make no sense. It's just as laughable as the 300-350M transistor count rumors. It's not because NVIDIA does something one way ATI has to do it the same way, and history certainly proves that in fact, most of the time, they do it differently. Sometimes it doesn't work out for either company, but - amazingly enough - it does work most of the time.
As for the "catastrophic yields, too little too late performance" rumors - think for half a second about what the most logical source for this is. Who has direct interest in making sure their CURRENT parts sell, and who has direct interest in making ATI look like a loser for this so-called "generation", even though it has hardly begun?
They ain't stupid, and they know well enough that they can use their past failures to make "worst case" scenarios seem possible - but for the competition, this time around.
The most ironic part is that some apparently knowledgeable information (NV = AMD, ATI = Intel, this generation, *architecture*-wise) would imo be suspicious from this point of view too, because this is what NVIDIA tried to do last-minute with the NV30. But I would still give credence to this theory since ATI seems to have done a fair bit of licensing & research in that direction.
On the other hand, NV would benefit from making ATI seem desperate, even if they really aren't.
Uttar
Chalnoth
26-Jul-2005, 22:17
Come on, Uttar, you know as well as I do that there are more than enough people out there making up rumors already, without resorting to conspiracy theories about them being subversive in some way.
Sounds pretty close to what that sherman guy is saying. 24 pipes at 500Mhz.
sherman @ R3D = Gibbo @ O-UK ? :o
Sounds pretty close to what that sherman guy is saying. 24 pipes at 500Mhz.
sherman @ R3D = Gibbo @ O-UK ? :o
No .. sherman doesn't know how to speak or write english.. sherman is a frustrated nvnews guy now he knows that soon his precious gtx will no longer be the "3dmark" king ;)
Kombatant
26-Jul-2005, 23:11
People who know stuff will not come out and make such statements. They will merely hint stuff. That's how it always was, is and will be.
So start reading between the lines (even if they can be blurry at times) :)
There are no gaps between the lines, they've all munged together into a sickening morass that's gonna last another 2 months.
:cry: ARGH :cry:
Where's that "slit me throat" smiley?
Jawed
btw.. 24 "pipelines" at 520mhz seems like a complete waste of time for 90nm technology.
Not if these are "extreme" pipelines... :twisted:
:wink:
On the other hand, NV would benefit from making ATI seem desperate, even if they really aren't.
Uttar
But of course they are, the 7800 sells like hot cakes and will do it for another month or two. That's a HUGE pile of money being lost to the competition there. And a huge loss on the image side. The worst nightmare for any company.
kemosabe
27-Jul-2005, 00:26
I know which bin I would put that one in.
Now would it hurt to qualify that statement just enough to make it of at least some value to this forum of fatigued GPU geeks? :?
Do you take issue with the technical specs, the purported availability issues, or both :?:
It's just wrong. 'Pipe' count and speed are just wrong.
He wasn't the first with G70 either. Either someone's yanking his chain, or he's yanking ours for a laugh.
ChrisRay
27-Jul-2005, 02:02
It's just wrong. 'Pipe' count and speed are just wrong.
He wasn't the first with G70 either. Either someone's yanking his chain, or he's yanking ours for a laugh.
Dont forget the most obvious. Trying to make a sale! :)
Dave Baumann
27-Jul-2005, 02:10
Well, if you were trying to make a sale you would paint it as demonstrably worse than 7800 GTX as he can actually sell them.
The division (1/w) would be performed once per sample and reused for all attribute computation.
Why do you think this is not already the case? ;) With fragment programs, the driver can freely do the division once, and then store the result for subsequent interpolations. Besides, you now need to store 4 things: attribute values at each vertex, and the 1/z computation. If you have a reciprocal unit that's otherwise already idle most of the time, might as well save the storage space (so that you can run more threads) and recompute 1/z when needed.
Bob, I was assuming it was already the case actually. I wrote it all down in case my math/understanding was faulty. You obviously know more than I on this subject - I certainly didn't intend to come off as patronizing or insulting towards you or any gfx HW designers in any way.
What I was trying to demonstrate (and have corrected if in error) was that the approach I outlined does have storage space savings versus storing attribute plane equations per triangle. So... does it, or am I still missing something?
Cheers,
Serge
kemosabe
27-Jul-2005, 02:36
Goodbye Noodle :?:
:lol:
No .. sherman doesn't know how to speak or write english.. sherman is a frustrated nvnews guy now he knows that soon his precious gtx will no longer be the "3dmark" king ;)
Point noted. Its a shame that such people dont even have stocks in either of these companies to back them as much. :lol:
Well, if you were trying to make a sale you would paint it as demonstrably worse than 7800 GTX as he can actually sell them.
He is saying that the XT PE is good but ... it will have:
1. Rare availability.
2. Insane price.
Hence, get a 7800 GTX today. ;)
jimmyjames123
27-Jul-2005, 04:12
Obviously this guy at overclockers UK doesn't have any idea what he is talking about. I'd bet he hasn't got anything right with respect to ATI's new cards.
SanGreal
27-Jul-2005, 05:52
Goodbye Noodle :?:
:lol:
His cat named Noodle died.
So is ATI smoking out the chatterboxes, or what? I can't think of a single stone unturned for R520. The rumors have covered every single base except TBDR.
Oh, no. I've said too much.
So Pete, I heard R520 will be a TBDR architecture? :twisted:
IgnorancePersonified
27-Jul-2005, 08:00
That was floated and debated in another thread :)
A little birdy told me the R520 will have an alien-hybrid design 8)
We're spotty teenagers (http://forums.overclockers.co.uk/showpost.php?p=5250313&postcount=113) now apparently :roll:
He also does a nice name and shame on his sources :lol:
We're spotty teenagers (http://forums.overclockers.co.uk/showpost.php?p=5250313&postcount=113) now apparently :roll:
He also does a nice name and shame on his sources :lol:
Yeah, I was just about to put his quote up... he actually says Dave and the rest of the crew at b3d are all zit-covered nerds that have absolutely no knowledge of the subject:
Remember one thing, review sites don't sell the actual products, the people who make the sales for NV and ATI on high-end product are companies like OcUK, not review sites run by spotty teenagers who think they know better.
And.. Hellbinder, I think you claimed the 12K score for the r520. .is that still alive?
And then he comes with the real reason why he starts the name-calling: "Were hated by review sites, magazine because we refuse to advertise with such companies and don't send out free hardware so most review sites and magaine reviewers are not exactly best buddies with OcUK."
Since when is an online retailer responsible for sending out demos?
Never spoken of quads and clock speed is not yet finalized but the top product will be in the 500MHz region
Ati sticking at 500Mhz? losing their fillrate advantage?
His post does mention the part is 16 "extreme" pipelines and 8 VS.
Taking the 9000 3dmarks into account.. would it have GOOD PS performance and EXTREME VS performance?
Kombatant
27-Jul-2005, 10:12
Hated because they don't send samples: No
Hated because of their crappy after-sales service: Yes.
(and yes, I speaketh from experience :? )
Hated because they don't send samples: No
Hated because of their crappy after-sales service: Yes.
(and yes, I speaketh from experience :? )
I thought you were from greece, why order cards in the UK?
Kombatant
27-Jul-2005, 11:05
Hated because they don't send samples: No
Hated because of their crappy after-sales service: Yes.
(and yes, I speaketh from experience :? )
I thought you were from greece, why order cards in the UK?
Because I lived in the UK two years ago (Reading, did my Master of Science there).
Subtlesnake
27-Jul-2005, 12:29
"Thats what I am taking a good guess at from the information I have been supplied from ATI"
http://forums.overclockers.co.uk/showpost.php?p=5251221&postcount=127
So he doesn't know.
"Thats what I am taking a good guess at from the information I have been supplied from ATI"
http://forums.overclockers.co.uk/showpost.php?p=5251221&postcount=127
So he doesn't know.
He "knows" what people tell him. when nV wanted their G70 marketing up, they told that they could do 22k in 3dmark.. so one little white lie in between all the "correct" info and you have a whole user base drooling.
If the R520 does score 9000 3dmarks and thus is 15% faster than the g70.. why do they call it a marginal improvement? in the worst case, performance over R420 is increased by 50% as opposed to the 30~40% of the g70..
Call me naïve, but I see the same pre-release biases as with the nv40/r420
Mariner
27-Jul-2005, 13:21
So Pete, I heard R520 will be a TBDR architecture? :twisted:
How long before we see this rumour posted on The Inquirer? :P
Ailuros
27-Jul-2005, 13:40
So Pete, I heard R520 will be a TBDR architecture? :twisted:
How long before we see this rumour posted on The Inquirer? :P
You'd have to sit down first and explain what TBDR stands for, before seeing such a stunt :lol:
ChrisRay
27-Jul-2005, 13:48
So Pete, I heard R520 will be a TBDR architecture? :twisted:
How long before we see this rumour posted on The Inquirer? :P
You'd have to sit down first and explain what TBDR stands for, before seeing such a stunt :lol:
Nah, we know the Inquirer doesn't need to know what it really means. After all, they said HDR means High Definition Rendering.
Nah, we know the Inquirer doesn't need to know what it really means. After all, they said HDR means High Definition Rendering.
I can already imagine:
We have just learned that our bosnian bridge will be a 16 Pipe TBDR (Toast Baking, Dipping and Roasting) part which should go well with your coffee cup holder. And it will score 41K on 3dmark01 and 12k in 3dmark05 in the process, all the while baking your toast. The Vole has already expressed interest for its new Vista-ish Windows Media Center, which should now become the Kitchen Theater PC, watching your cooking shows on a HDMI flatscreen.
...it will score 41K on 3dmark01...
yes, here's the score:
http://www.driverheaven.net/zardon/01.jpg
:D
...it will score 41K on 3dmark01...
yes, here's the score:
:D
That one was from tech-report or something, right? they claimed to have a " new videocard" back in feb/march and showed this 3dmark01 score .
http://www.driverheaven.net/showthread.php?t=68684&page=1&pp=15
yes... I'm not sure, but (if it's not a fake) it can be R520... (?)
http://img39.imageshack.us/img39/1016/vcard0aw.jpg
Note the perforated panel visible through the orange fan (typical for the dual-slot X850 cooling design, which is similar to the R520 cooler). If you set gamma to 2.0, you will see something very similar to the R520 Cu heatsink (compare with the photo from DaveBaumann). If this card is really an R520, ATi had working samples quite a long time ago (Feb 18, 2005).
Mulciber
27-Jul-2005, 15:57
what a load of bull
We're we're spotty teenagers (http://forums.overclockers.co.uk/showpost.php?p=5250313&postcount=113) now apparently :roll:
He also does a nice name and shame on his sources :lol:
The last pimple I had was on my ass. The next one there will be named "Sherman" or "gibbo".
http://www.driverheaven.net/showthread.php?t=68684&page=1&pp=15
yes... I'm not sure, but (if it's not a fake) it can be R520... (?)
http://img39.imageshack.us/img39/1016/vcard0aw.jpg
Note the perforated panel visible through the orange fan (typical for the dual-slot X850 cooling design, which is similar to the R520 cooler). If you set gamma to 2.0, you will see something very similar to the R520 Cu heatsink (compare with the photo from DaveBaumann). If this card is really an R520, ATi had working samples quite a long time ago (Feb 18, 2005).
Yeah, the first tapeout was, like, November last year, right? Reading the thread, they say that the board design was killed, but at tapeout these were the specs ATI had in mind:
24 "Pipelines" (24x1 not 24x1.5 but supports up to 32x1)
96 Arithmetic Logic Units (ALU)
192 Shader Operations per Cycle (UNKNOWN)
700MHz Core(actually last we heard it was in the high 600's)
256-bit 512MB 1.8GHz GDDR3 Memory (yes that's the memory)
57.6 GB/sec Bandwidth (at 1.8GHz)
300-350 Million Transistors
90nm Manufacturing
Shader Model 3.0
ATI HyperMemory (the core will support Hypermemory which probably will only be used in the lower end versions of the core).
ATI Multi Rendering Technology (AMR)
Launch: Q2 2005 (closer to June)
FP32 blending, texturing
Programmable Primitive Processor/Tesselator (not entirely true but will support Truform in hardware).
Now.. the 512MB of memory.. That's one of the things Gibbo is holding on to as well..
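For what it's worth, the bandwidth line in that rumoured list is at least self-consistent with the bus width and memory clock quoted next to it. A quick check, assuming the 1.8GHz figure is the effective data rate and that GB means 10^9 bytes (the function name is just for illustration):

```python
# Peak memory bandwidth = bytes moved per transfer * transfers per second.
# Assumes `effective_rate_ghz` is the effective (data) rate in GHz and
# reports GB/s with 1 GB = 10**9 bytes.

def peak_bandwidth_gb_per_s(bus_width_bits, effective_rate_ghz):
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_ghz

if __name__ == "__main__":
    # 256-bit bus at 1.8GHz effective, as in the list above.
    print(round(peak_bandwidth_gb_per_s(256, 1.8), 1))
```

That says nothing about whether the list itself is accurate, only that 256-bit at 1.8GHz and 57.6 GB/sec agree with each other.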
24 "Pipelines" (24x1 not 24x1.5 but supports up to 32x1)
96 Arithmetic Logic Units (ALU)
192 Shader Operations per Cycle (UNKNOWN)
700MHz Core(actually last we heard it was in the high 600's)
256-bit 512MB 1.8GHz GDDR3 Memory (yes that's the memory)
57.6 GB/sec Bandwidth (at 1.8GHz)
300-350 Million Transistors
90nm Manufacturing
Shader Model 3.0
ATI HyperMemory (the core will support Hypermemory which probably will only be used in the lower end versions of the core).
ATI Multi Rendering Technology (AMR)
Launch: Q2 2005 (closer to June)
FP32 blending, texturing
Programmable Primitive Processor/Tesselator (not entirely true but will support Truform in hardware).
Not that old crap again. Is it in a loop or something?
We're hated by review sites and magazines because we refuse to advertise with such companies and don't send out free hardware, so most review sites and magazine reviewers are not exactly best buddies with OcUK.
Since when is an online retailer responsible for sending out demos?
Not defending him in any way, but I used to work in marketing for one of OcUK's biggest UK competitors. I was always being hassled for products for review. We didn't mind much, because rather than the gfx card or mobo just being credited to the manufacturer (which is a given), we usually got free advertising. It worked too.
Anyway, it came down to what you would send magazines/websites etc. We noticed that if we sent them small crappy things, our nice big shiny things like "The first UK Opteron review" (which I was responsible for) would not get as much fanfare as they probably deserved.
Glad I don't do that anymore.
Hanners
27-Jul-2005, 17:03
Not that old crap again. Is it in a loop or something?
10 PRINT "R520 has 32 pipelines"
20 GOTO 10
:P
Not that old crap again. Is it in a loop or something?
10 PRINT "R520 has 32 pipelines"
20 GOTO 10
:P
HAAAHAHAHAHAA!!! :lol:
BASIC's not dead!!! :twisted:
Tim Murray
27-Jul-2005, 18:54
oh basic, we hardly knew ye.
compres
27-Jul-2005, 18:55
Not that old crap again. Is it in a loop or something?
10 PRINT "R520 has 32 pipelines"
20 GOTO 10
:P
HAAAHAHAHAHAA!!! :lol:
BASIC's not dead!!! :twisted:
ewwwwwwwwwwwww! basic...
Hellbinder
27-Jul-2005, 19:03
Obviously
If it's more of a 16 pixel part then it's going to be between 600-700MHz clock speed.
If it's more of a 24 or 32 pixel part then it's going to be in the 500-600MHz range.
Both, for different reasons, could have "leakage issues" and the need for 3 respins..
A 16 pixel part at 500MHz is *completely* ridiculous as it would not likely compete very well in several areas, nor would it likely require multiple respins.
Chalnoth
27-Jul-2005, 19:33
HAAAHAHAHAHAA!!! :lol:
BASIC's not dead!!! :twisted:
while(true) cout << "Die, BASIC, die!!" << endl;
Chalnoth
27-Jul-2005, 19:34
A 16 pixel part at 500MHz is *completely* ridiculous as it would not likely compete very well in several areas, nor would it likely require multiple respins.
It could if the respins were related to problems dealing with the shrink to 90nm. The physics get rather different as you go smaller and smaller, making it more difficult for each subsequent die shrink.
That said, ATI would be lax indeed if they were actually planning to produce a 16-pipeline design as a high-end product for this generation.
A 16 pixel part at 500MHz is *completely* ridiculous as it would not likely compete very well in several areas, nor would it likely require multiple respins.
16 pp, 2 TMUs per pipe at 500MHz gives a fill rate of 16000 MTexel/s. (55% more than the 7800GTX)
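The fill-rate claim above is simple arithmetic and does check out, given its assumptions. A quick sketch in Python: the function name is made up for illustration, the R520 inputs are the rumour as posted (16 pipes, 2 TMUs each, 500MHz), and the 7800 GTX inputs (24 pipes, 1 TMU each, 430MHz core) are not from this thread but are the reference figures as I understand them.

```python
def texel_fill_rate_mtexels(pipes: int, tmus_per_pipe: int, core_mhz: int) -> int:
    """Theoretical peak texel fill rate in MTexel/s: one texel per TMU per clock."""
    return pipes * tmus_per_pipe * core_mhz


# Rumoured R520 configuration from the post above (an assumption, not a spec).
rumoured = texel_fill_rate_mtexels(pipes=16, tmus_per_pipe=2, core_mhz=500)

# 7800 GTX comparison point: 24 pipes, 1 TMU each, 430MHz (assumed reference clocks).
gtx_7800 = texel_fill_rate_mtexels(pipes=24, tmus_per_pipe=1, core_mhz=430)

advantage_pct = (rumoured / gtx_7800 - 1) * 100

print(f"rumoured part: {rumoured} MTexel/s, 7800 GTX: {gtx_7800} MTexel/s")
print(f"advantage: {advantage_pct:.0f}%")
```

Note this is a theoretical peak only; it says nothing about shader throughput, and it stands or falls on the 2-TMUs-per-pipe assumption.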
trinibwoy
27-Jul-2005, 19:53
That said, ATI would be lax indeed if they were actually planning to produce a 16-pipeline design as a high-end product for this generation.
Yes, Chal, just add to the pipes confusion :s I still remember Dave saying that all the pipe rumours were bunk around the time when 32-pipes was king. He could be blowing smoke but it's disconcerting that there hasn't been a single shred of real evidence to support any of the bazillion permutations of r520 specs out there.
trinibwoy
27-Jul-2005, 19:54
A 16 pixel part at 500MHz is *completely* ridiculous as it would not likely compete very well in several areas, nor would it likely require multiple respins.
16 pp, 2 TMUs per pipe at 500MHz gives a Fill rate of 16000 MTexel/s. (55% more)
fillrate - shchmillrate - we want ze shader powwa to ze extreeeme!!
Skrying
27-Jul-2005, 19:54
R520 rumors are too freaking confusing for me. Maybe it's the fact that there's like 50 million of them and they are all so different when it comes to pipeline count and core speed.
Chalnoth
27-Jul-2005, 19:56
Yes, Chal, just add to the pipes confusion :s I still remember Dave saying that all the pipe rumours were bunk around the time when 32-pipes was king. He could be blowing smoke but it's disconcerting that there hasn't been a single shred of real evidence to support any of the bazillion permutations of r520 specs out there.
A simple transistor-count analysis with the assumption of R3xx-like pipes and ~300 million transistors with added SM3 would seem to indicate 24 pipelines.
That said, ATI would be lax indeed if they were actually planning to produce a 16-pipeline design as a high-end product for this generation.
Isn't R520 just renamed and finished R400?
R100: 2 pipelines
R200: 4 pipelines
R300: 8 pipelines
R400: 16 pipelines - not finished, delayed
R420: 16 pipelines (renamed R380 or what + added pipelines, replacement of unfinished R400)
R520: finished and renamed R400... :arrow: with added pipelines/or not???
Isn't it possible that R520 was primarily a 16p design which was changed to 32p later (something like R420)?
Tim Murray
27-Jul-2005, 20:39
I don't think we can call anything "a finished R400"--R400 was so far from being finished that it's a meaningless description.
Chalnoth
27-Jul-2005, 20:46
Isn't R520 just renamed and finished R400?
No, because I'm reasonably certain that the R400 became what is now the Xenos, a unified pipeline architecture.
Kombatant
27-Jul-2005, 20:50
Isn't R520 just renamed and finished R400?
No, because I'm reasonably certain that the R400 became what is now the Xenos, a unified pipeline architecture.
That was my impression as well.
Ok... so R520 looks like a quick design - much newer than Xenos if it's true.
Isn't R520 just renamed and finished R400?
No, because I'm reasonably certain that the R400 became what is now the Xenos, a unified pipeline architecture.
That was my impression as well.
Yes, R400=R600/C1
I still think (mostly just hope) that there is ample opportunity for r520 to include at least some tech/ideas from R400:Forever, albeit likely not in the same form as found in the final r500/xenos/c1
TMU pool? btw. could it be possible to do non-OG SS with this architecture?
TMU pool?
That's been in my sig for a while now.
Jawed
Hellbinder
27-Jul-2005, 22:29
A 16 pixel part at 500MHz is *completely* ridiculous as it would not likely compete very well in several areas, nor would it likely require multiple respins.
16 pp, 2 TMUs per pipe at 500MHz gives a fill rate of 16000 MTexel/s. (55% more than the 7800GTX)
Uh.. the likelihood of there being any TMUs, let alone 2 per "pipeline", on this chip is about the same as me flapping my arms *really really fast* and flying to the moon.
ATI uses ROPs and looping. (just like a cowboy)
What you are likely to see is 16 "TMUs" as you call them and triple that for shader operations.
Hellbinder
27-Jul-2005, 22:33
Isn't R520 just renamed and finished R400?
No, because I'm reasonably certain that the R400 became what is now the Xenos, a unified pipeline architecture.
That is a scarily true statement.
I was told once behind closed doors so to speak.. a long time ago.. that there was a day when they had a meeting at ATI and the *beep* hit the fan. It had become readily apparent that the R400 was just not going to work without that complex scheduling logic they have now designed and included with the Xenos.
I am sure it's not *exactly* like the R400 but it is what the R400 would have had to be to work like they wanted.
Kombatant
27-Jul-2005, 22:38
Ok... so R520 looks like a quick design - much newer than Xenos if it's true.
Yup, they sat in the bar one night, and in between beers they said "Hey, let's make R520!" :lol:
edit: To prevent any misunderstandings, I just thought it was funny, humour me :)
Skrying
27-Jul-2005, 22:47
Sooooo, let's say the R400 worked out how they wanted, and let's say it came out when they wanted. Was it going to be some super chip or something? And when would it have come out?
digitalwanderer
27-Jul-2005, 22:56
Sooooo, let's say the R400 worked out how they wanted, and let's say it came out when they wanted. Was it going to be some super chip or something? And when would it have come out?
Probably a bit before the R420 did, but rumor has it the R400 was just too advanced for the time and they felt it was a bit of overkill. Either that or it was worked around the whole WGF unified architecture thingy before it got delayed a few years, but they're both just rumors so take it as such.
Megadrive1988
27-Jul-2005, 23:24
That said, ATI would be lax indeed if they were actually planning to produce a 16-pipeline design as a high-end product for this generation.
Isn't R520 just renamed and finished R400?
R100: 2 pipelines
R200: 4 pipelines
R300: 8 pipelines
R400: 16 pipelines - not finished, delayed
R420: 16 pipelines (renamed R380 or what + added pipelines, replacement of unfinished R400)
R520: finished and renamed R400... :arrow: with added pipelines/or not???
Isn't it possible that R520 was primarily a 16p design which was changed to 32p later (something like R420)?
the R400 and R520 are totally unrelated. as are the R400 and R420, they are also unrelated.
R400 was put on hold, reworked, improved, enhanced, upgraded, etc, into what is now the Xenos for Xbox 360. this architecture is being upgraded and enhanced again into the R600 for PCs.
R420 and R520 are both dramatically faster and enhanced GPUs based on the R300 architecture.
here is how ATI's modern GPU architecture families diverge (that's not the right word but whatever)
ATI East: R100 (Radeon256), R200 (Radeon 8500)
ATI West: R300 (Radeon 9700) ==> R350 => R360 ===> R420 'Loki' ====> R520 'Fudo'.
Hollywood, an ATI West design that may or may not be based on current or upcoming PC designs. most likely an evolution of the ArtX Flipper.
ATI East: R400 (withheld from market) ====> C1 'Xenos' ===> R600
R700: probably a dramatic enhancement of the R600, much like R420 was a dramatic enhancement of R300.
R800: a new architecture from all of ATI's design centers working together
DemoCoder
27-Jul-2005, 23:38
HB, stop the pretending. The only closed door meetings you've been in is when the teacher sent you to the principal's office. You've been caught lying and outright wrong here too many times for your claims of being "in the know" to have any meaning. Fuad has a better track record than you.
Hellbinder
28-Jul-2005, 00:01
HB, stop the pretending. The only closed door meetings you've been in is when the teacher sent you to the principal's office. You've been caught lying and outright wrong here too many times for your claims of being "in the know" to have any meaning. Fuad has a better track record than you.
What was that saying again..????
Sticks and stones...
My, you are taking what I said awfully personally. If it's upsetting you that badly maybe you should go read another forum.
I have been caught lying here?? um ok..
I have been wrong and I have been right just like everyone else who knows someone who knows someone etc etc etc..
No one is 100% right *EVER* until the NDA is lifted. There are gleanings, hints and suggestions and comments. That's the way it is. However you would not give me a fair shake regardless. You have no intention of actually checking on what I have been right or wrong about. Most of the time I am wrong about Nvidia and right (as right as possible given the conditions) about ATI. I'll let you guess why that's the case.
However.. in this case I don't care whether you believe me or not. It's a fact that this event took place about 2 years ago. I was not saying I was there behind closed doors. That was just a figure of speech.
You need to relax and go get a Starbucks, dude.
That was floated and debated in another thread :)
Oh, I know. But it hasn't made it to The Inq or the other fora I frequent, IIRC. And it ain't official until it's been L'Inq'ed. ;)
Ailuros
28-Jul-2005, 01:13
Pardon me if I get sick thinking that I'll be reading for another two months or so, that neverending "how many pipes" crap.
And before anyone says it, nobody really forces me to read anything, but this has gotten extremely tiresome lately.
Those with a bit of common sense have figured out already more or less which rumour really makes sense and which not. That said "sources" like Fuad don't have much in common with common sense in my book :shock:
Pardon me if I get sick thinking that I'll be reading for another two months or so, that neverending "how many pipes" crap.
And before anyone says it, nobody really forces me to read anything, but this has gotten extremely tiresome lately.
Those with a bit of common sense have figured out already more or less which rumour really makes sense and which not. That said "sources" like Fuad don't have much in common with common sense in my book :shock:
Even Fuad is starting to cover his bases on the 16-pipes front! Which really leaves the 24/32 pipers with their Dr. Denton's waving in the wind! :lol:
digitalwanderer
28-Jul-2005, 01:38
I really don't think the amount of pipes is going to be the big story behind the R520, I just don't.
Ailuros
28-Jul-2005, 01:53
Even Fuad is starting to cover his bases on the 16-pipes front! Which really leaves the 24/32 pipers with their Dr. Denton's waving in the wind! :lol:
Which doesn't surprise me at all considering that some reasonable folks have contacted him in the meantime and set the record straight. Instead of believing whatever crap gets tossed at you from all possible directions, it would be wiser to ask the source itself and in the given case ATI.
I really don't think the amount of pipes is going to be the big story behind the R520, I just don't.
I don't know how many times some tend to repeat that what's most important is what comes out at the other end. Sterile numbers and fancy calculations can merely be indications and helpful tools for speculation. However, given the limitations of both complexity (in conjunction with the affordability of a board) and memory availability, those who expect R520 to be a "G70-killer" are simply naive.
I really don't think the amount of pipes is going to be the big story behind the R520, I just don't.
i agree
kemosabe
28-Jul-2005, 02:43
I really don't think the amount of pipes is going to be the big story behind the R520, I just don't.
I'm getting the unsettling feeling that the big story behind R520 will be its commercial nonexistence. :(
A 16 pixel part at 500MHz is *completely* ridiculous as it would not likely compete very well in several areas, nor would it likely require multiple respins.
16 pp, 2 TMUs per pipe at 500MHz gives a fill rate of 16000 MTexel/s. (55% more than the 7800GTX)
Uh.. the likelihood of there being any TMUs, let alone 2 per "pipeline", on this chip is about the same as me flapping my arms *really really fast* and flying to the moon.
ATI uses ROPs and looping. (just like a cowboy)
What you are likely to see is 16 "TMUs" as you call them and triple that for shader operations.
Go easy on me :oops:
Pardon me if I get sick thinking that I'll be reading for another two months or so, that neverending "how many pipes" crap.
And before anyone says it, nobody really forces me to read anything, but this has gotten extremely tiresome lately.
Those with a bit of common sense have figured out already more or less which rumour really makes sense and which not. That said "sources" like Fuad don't have much in common with common sense in my book :shock:
We just need a tidbit from Mr. Dave or someone else, its been a while. :wink:
Hellbinder
28-Jul-2005, 05:23
i was just kidding man. :)
Bottom line..
It's either going to have the speed to compete and have some features that make people happy, or not.
Odds are it's going to be faster than the current 7800 in at least some areas. The fact that the R580 has been pushed up to get released by the end of this year instead of next spring should tell you that ATI expects Nvidia to counter with an Ultra part almost immediately.
Skrying
28-Jul-2005, 05:33
Maybe the pushing up of R580 is to counter the fact that R520 may not be that great after all. Reminds me a lot of what Nvidia did with NV30. Had tons of delays and once they finally got it working had to rush straight to the next core change to counter the mistake.
I don't think that'll happen with ATi. Nothing really about R520 screams "NV30" to me....... yet.
Ailuros
28-Jul-2005, 06:12
We just need a tidbit from Mr. Dave or someone else, its been a while. :wink:
Do I sound as stupid as to put my head into the lion's mouth? :P :shock:
Maybe the pushing up of R580 is to counter the fact that R520 may not be that great after all. Reminds me a lot of what Nvidia did with NV30. Had tons of delays and once they finally got it working had to rush straight to the next core change to counter the mistake.
Let's assume R520 and G70 are highly competitive (each with its own advantages, which makes sense by the way); do you think R520 will still be able to compete against a 90nm Gxx variant NVIDIA might release half a year or so later than the G70? R580 isn't coming "too soon"; it's R520 that's "too late".
I don't think that'll happen with ATi. Nothing really about R520 screams "NV30" to me....... yet.
Unless there's some sarcasm involved that I seem to have overlooked, can you re-read your description in the former paragraph I highlighted? There are similarities, at least it looks like that up to now. Of course it doesn't have to be as bad as with NV30, but it's far from an ideal situation for ATI right now either. It makes sense if you read into it in a relative sense.
We just need a tidbit from Mr. Dave or someone else, its been a while. :wink:
Do I sound as stupid as to put my head into the lion's mouth? :P :shock:
Maybe the pushing up of R580 is to counter the fact that R520 may not be that great after all. Reminds me a lot of what Nvidia did with NV30. Had tons of delays and once they finally got it working had to rush straight to the next core change to counter the mistake.
Let's assume R520 and G70 are highly competitive (each with its own advantages, which makes sense by the way); do you think R520 will still be able to compete against a 90nm Gxx variant NVIDIA might release half a year or so later than the G70? R580 isn't coming "too soon"; it's R520 that's "too late".
I don't think that'll happen with ATi. Nothing really about R520 screams "NV30" to me....... yet.
Unless there's some sarcasm involved that I seem to have overlooked, can you re-read your description in the former paragraph I highlighted? There are similarities, at least it looks like that up to now. Of course it doesn't have to be as bad as with NV30, but it's far from an ideal situation for ATI right now either. It makes sense if you read into it in a relative sense.
dude.. ati roxors!
Failure is not an option for them.
They've already failed once in a few ways, one more time and their ego will be crushed and they'll go back to making pos chips 8)
Kombatant
28-Jul-2005, 09:05
No one is 100% right *EVER* until the NDA is lifted. There are gleanings, hints and suggestions and comments. That's the way it is.
You speak the truth. I just feel that you're overconfident about what you've "heard" most of the time, and it's OK when the info is correct, but when it's not...well.. it removes credibility points :)
edit: Ran it through my Engrish to English converter :lol:
Unknown Soldier
28-Jul-2005, 10:42
We just need a tidbit from Mr. Dave or someone else, its been a while. :wink:
Dave's tidbit was that the R520 has a new Memory Manager. :D
Funny though that no one else has picked up on this. :?
But Dave is 8)
We just need a tidbit from Mr. Dave or someone else, its been a while. :wink:
Dave's tidbit was that the R520 has a new Memory Manager. :D
Funny though that no one else has picked up on this. :?
But Dave is 8)
Well.. there was this whole discussion after dave's hint and the kaleidoscope patent that THAT would be the ring-bus memory manager for the r500 or r520.
The added bandwidth by the new memory manager then fed the rumours of the number of pipelines etc.
Maybe Ati will use a cheaper memory, combined with the new controller to offer affordable 512MB cards.. since with current prices, a R520 would be $100 to $150 more expensive than the gtx.
Dave Baumann
28-Jul-2005, 11:26
I wouldn't confuse kaleidoscope with anything to do with memory.
And last time it was discussed people were on the right track. Kaleidoscope does what it says on the tin :P
I wouldn't confuse kaleidoscope with anything to do with memory.
I know.. that was discussed in other topics. Just stating that a different memory solution was discussed.
Ailuros
28-Jul-2005, 12:28
dude.. ati roxors!
Failure is not an option for them.
They've already failed once in a few ways, one more time and their ego will be crushed and they'll go back to making pos chips 8)
Failure is a large exaggeration and doesn't reflect the situation in the slightest; as with all major companies, though, things do not always pan out as planned. In fact ATI has such strong penetration in the markets it's addressing that they'll be over it in no time.
However, a delay is a delay (no matter what one would call it), and despite how everyone would like to present it, two high-end releases within only a couple of months don't suggest a sudden release-orgy, but one product being late.
Finally, this is by far not about egos; it's about increasing sales and maximizing income ;)
ChrisRay
28-Jul-2005, 12:51
dude.. ati roxors!
Failure is not an option for them.
They've already failed once in a few ways, one more time and their ego will be crushed and they'll go back to making pos chips 8)
Failure is a large exaggeration and doesn't reflect the situation in the slightest; as with all major companies, though, things do not always pan out as planned. In fact ATI has such strong penetration in the markets it's addressing that they'll be over it in no time.
However, a delay is a delay (no matter what one would call it), and despite how everyone would like to present it, two high-end releases within only a couple of months don't suggest a sudden release-orgy, but one product being late.
Finally, this is by far not about egos; it's about increasing sales and maximizing income ;)
How do you define failure anyway? Not meeting expected sales? Not meeting expected product performance? Not delivering a product on time? I mean, such a word is entirely open to subjective interpretation as is. Even if the ATI part comes out slower than the Nvidia part I wouldn't be so quick to label it a "failure".
We just need a tidbit from Mr. Dave or someone else, its been a while. :wink:
Dave's tidbit was that the R520 has a new Memory Manager. :D
Funny though that no one else has picked up on this. :?
But Dave is 8)
Kentron's qbm would be a nice improvement to the memory manager, gfx cards would be an ideal environment for this tech I think.
Any chance of it happening??
How do you define failure anyway? Not meeting expected sales? Not meeting expected product performance? Not delivering a product on time? I mean, such a word is entirely open to subjective interpretation as is. Even if the ATI part comes out slower than the Nvidia part I wouldn't be so quick to label it a "failure".
In this industry, it'd be "not beating the competitor", "not having a faster product", "not having more features", "not on time". If it's at least two of these, then it's not a failure but a catastrophe.
In this industry, it'd be "not beating the competitor", "not having a faster product", "not having more features", "not on time". If it's at least two of these, then it's not a failure but a catastrophe.
So if a product is late and lacking in features, but is much faster than the competition, it's a failure? That's what you're suggesting here. You're also missing the reality of the situation - sales numbers and profit determine if a product is successful, not some ticks on a specification sheet. The 5200 was a great product for NVIDIA, despite being a real heap of junk.
Your theory doesn't hold up.
trinibwoy
28-Jul-2005, 17:01
In this industry, it'd be "not beating the competitor", "not having a faster product", "not having more features", "not on time". If it's at least two of these, then it's not a failure but a catastrophe.
I think if you replace "In this industry", with "In enthusiast forums" that'd be more accurate.
Hellbinder
28-Jul-2005, 17:26
In this industry, it'd be "not beating the competitor", "not having a faster product", "not having more features", "not on time". If it's at least two of these, then it's not a failure but a catastrophe.
I think if you replace "In this industry", with "In enthusiast forums" that'd be more accurate.
No, it's a lot more than the enthusiast forums.
Look at ATI's stock. Before that, a couple of years ago, look at Nvidia's stock.
These are public companies. Just like every other publicly traded company, you live and die by your product and how it's viewed and received by the public, which is directly affected by the news outlets, forums and blogs, down to the guy at Hardware R Us who chooses whether to recommend an ATI or Nvidia card.
All this is based on price, performance, features and the general "vibe" of each product.
If you go missing features and have weak performance, you can kiss a WHOLE lot o' money goodbye, and public trust too. It takes a couple of years to rebound from something like that.
Hellbinder
28-Jul-2005, 17:32
I posted that it had a new memory controller a long long time ago on this forum: new memory controller and possibly new FSAA (though I wasn't sure if the FSAA in question was in this generation or the next gen). Do a search; I posted it at least a couple of months ago.
Yet another case where I won't get credit for something I posted and the DemoCoders of the world will continue to accuse me of being a lying cheating dirty scallywag.
Does anyone have any real yield rates on the R520? Awhile ago it was "really good", then it was "some problems", now it's "really bad". I thought the 90nm low-k line was supposed to yield significantly better than the 130nm low-k line. Or might the apparent problems ATI is having producing the R520 have more to do with their chip design than with TSMC yield rates at 90nm?
Hellbinder
28-Jul-2005, 17:33
Does anyone have any real yield rates on the R520? Awhile ago it was "really good", then it was "some problems", now it's "really bad". I thought the 90nm low-k line was supposed to yield significantly better than the 130nm low-k line. Or might the apparent problems ATI is having producing the R520 have more to do with their chip design than with TSMC yield rates at 90nm?
That would lead one to believe that it is more and not less than what people are expecting. As in, more complex.
dude.. ati roxors!
Failure is not an option for them.
They've already failed once in a few ways, one more time and their ego will be crushed and they'll go back to making pos chips 8)
Failure is a large exaggeration and doesn't reflect the situation in the slightest; as with all major companies, though, things do not always pan out as planned. In fact ATI has such strong penetration in the markets it's addressing that they'll be over it in no time.
However, a delay is a delay (no matter what one would call it), and despite how everyone would like to present it, two high-end releases within only a couple of months don't suggest a sudden release-orgy, but one product being late.
Finally, this is by far not about egos; it's about increasing sales and maximizing income ;)
How do you define failure anyway? Not meeting expected sales? Not meeting expected product performance? Not delivering a product on time? I mean, such a word is entirely open to subjective interpretation as is. Even if the ATI part comes out slower than the Nvidia part I wouldn't be so quick to label it a "failure".
If it's slower than the 7800gtx and doesn't have cool features to make up for it.
I'd think since the 9700 they've been pretty much the leader most of the time, it would be quite weird for ATI to end up playing second fiddle again.
trinibwoy
28-Jul-2005, 21:06
I'd think since the 9700 they've been pretty much the leader most of the time, it would be quite weird for ATI to end up playing second fiddle again.
That is a very strange statement. Nvidia was the leader for a lot longer than ATi was. And it is highly debatable whether they "led" last generation. If they were to go back to playing second fiddle, wouldn't that be the "norm" instead of the exception that was the r300?
I'd think since the 9700 they've been pretty much the leader most of the time, it would be quite weird for ATI to end up playing second fiddle again.
That is a very strange statement. Nvidia was the leader for a lot longer than ATi was. And it is highly debatable whether they "led" last generation. If they were to go back to playing second fiddle, wouldn't that be the "norm" instead of the exception that was the r300?
Thanks:)
The X850 parts were faster in most games.
The X800 Pro was a bit worse than the GT, depending on the game, but the X800XL was nice once it got on AGP.
trinibwoy
28-Jul-2005, 21:35
Thanks:)
The X850 parts were faster in most games.
The X800 Pro was a bit worse than the GT, depending on the game, but the X800XL was nice once it got on AGP.
It takes a lot more than one company's fastest card being faster than the competition's fastest to be crowned a "leader". But I'm sure you knew that already :) What about features, availability, the mid-range battle etc etc?
Thanks:)
The X850 parts were faster in most games.
The X800 Pro was a bit worse than the GT, depending on the game, but the X800XL was nice once it got on AGP.
It takes a lot more than one company's fastest card being faster than the competition's fastest to be crowned a "leader". But I'm sure you knew that already :)
They arguably have better drivers:)
But I do think the X800/X700 crowd will be missing out on some things if they intend to keep their cards a while.
Skrying
28-Jul-2005, 21:42
It takes a lot more than one company's fastest card being faster than the competition's fastest to be crowned a "leader". But I'm sure you knew that already :) What about features, availability, the mid-range battle etc etc?
Then you'd have to consider what those features add, etc. I'm honestly not impressed so far with Far Cry's or Splinter Cell's HDR, and all the other new features in SP haven't even been noticeable to me.
I really based my buying decision this generation on speed and price. When I needed a new high-end card in PCIe I got the X800XL, clearly to me the best price/performance in my sector. When I needed a new card for my LAN rig, which has AGP, the 6800nu was clearly the best bang for the buck for me. Now I live happily with a card from both sides and can honestly say that the only things that affected what I bought were price and performance, and "features" had zero to do with it. Maybe in the future they will, but I don't play games in the future right now.
They arguably have better drivers:)
But I do think the X800/X700 crowd will be missing out on some things if they intend to keep their cards a while.
I find the drivers equal between the two, besides the crap that was the 77.72s. I own a computer repair business in my local town and those drivers probably picked my business up quicker than ever before. It would have been a good thing if not for having tons of people mad because "my video looks washed out", compounded further by the fact that not everyone can follow clear instructions on fixing it.
trinibwoy
28-Jul-2005, 22:04
Yes but those are very specific scenarios and have little bearing on which company did more last generation. When I bought my GT the XL didn't even exist as yet. Not to mention all of the 6600GT owners out there.
My point is that the X850 XT PE was not enough to carry ATi last generation. Nvidia executed on a lot more fronts and as much as people say SM3.0 wasn't needed at the consumer level, it is surely a lot more indicative of leadership qualities than what ATi produced - no?
But this is getting a bit off topic - we can take it to PM if you like.
trinibwoy
28-Jul-2005, 22:08
I own a computer repair business in my local town and those drivers probably picked my business up quicker than ever before
That's hilarious :lol:
Skrying
28-Jul-2005, 22:10
Nvidia also had Dx9 support all down the line of the FX cards before ATi did. Doesn't make the FX series great, even if they did provide more features throughout the entire lineup.
Through this generation, though, Nvidia has held a clearer performance lead, mainly because the X800XT PE was nowhere to be found, the X800 Pro was just a bad idea, and the cards to clear up the problems (X850s now and the X800XL) came out way too late. They still don't have a clear card to counter the 6600GT, and the X800GT is going to be too little, too late.
Nvidia also had Dx9 support all down the line of the FX cards before ATi did. Doesn't make the FX series great, even if they did provide more features throughout the entire lineup.
Through this generation, though, Nvidia has held a clearer performance lead, mainly because the X800XT PE was nowhere to be found, the X800 Pro was just a bad idea, and the cards to clear up the problems (X850s now and the X800XL) came out way too late. They still don't have a clear card to counter the 6600GT, and the X800GT is going to be too little, too late.
No offence, but DX9 on the FX series was just a checkbox feature.
A 9600XT is more than a match for a 5950 Ultra in DX9 games.
Dave Baumann
28-Jul-2005, 22:19
Keep the discussion on topic please.
Hellbinder
29-Jul-2005, 17:38
Nvidia also had Dx9 support all down the line of the FX cards before ATi did. Doesn't make the FX series great, even if they did provide more features throughout the entire lineup.
Through this generation, though, Nvidia has held a clearer performance lead, mainly because the X800XT PE was nowhere to be found, the X800 Pro was just a bad idea, and the cards to clear up the problems (X850s now and the X800XL) came out way too late. They still don't have a clear card to counter the 6600GT, and the X800GT is going to be too little, too late.
That's simply not an accurate statement.
To recap, we know:
16 "special" pp
New memory bus
High clocks (600MHz+)
SM3.0
Dual Slot
???
Skrying
29-Jul-2005, 20:28
Nvidia also had Dx9 support all down the line of the FX cards before ATi did. Doesn't make the FX series great, even if they did provide more features throughout the entire lineup.
Through this generation, though, Nvidia has held a clearer performance lead, mainly because the X800XT PE was nowhere to be found, the X800 Pro was just a bad idea, and the cards to clear up the problems (X850s now and the X800XL) came out way too late. They still don't have a clear card to counter the 6600GT, and the X800GT is going to be too little, too late.
That's simply not an accurate statement.
How is it not? It's fact. The FX series (5200 - 5950) all support Dx9. The 9x00 series does not; the 9000-9250 cards only support Dx8.1. Now tell me how this is false. This of course does not make the series great.
Where has Nvidia not had the overall performance lead this generation? The very top end cards (6800U and X8x0 XT PE) are the only case where this is different. The 6800GT is faster than the X800 Pro and just as fast as the now X800XL. The 6800nu didn't have any form of competition till the X800, a case of too little and way too late. The 6600GT/6600 haven't had any competition at all beyond Nvidia's own 6800nu getting a price drop. Now the X800GT is coming, but it's going to be too little, too late and is simply another way of ATi getting rid of the backed up inventory they have. The 6200 is around the same performance levels as the X300. Overall, as a generation, Nvidia has gotten their cards out quicker with more features, therefore having a performance and feature lead longer.
Serenity, I think you have it correct for the most part; that's what I am expecting the R520 to be. It'll be interesting to see how vastly different or the same it is compared to R300. All of the rumors are so confusing; even this site has had a very wide range of ideas on what R520 is or could be.
bdmosky
29-Jul-2005, 21:55
Guys you'd think by the fact that this thread was already locked once and Dave warned you to keep it on topic... you sure don't listen well.
To recap, we know:
16 "special" pp
New memory bus
High clocks (600MHz+)
SM3.0
Dual Slot
???
90nm low-k TSMC
HW h.264 acceleration
higher "pipeline" efficiency (up to 30%?)
250-350 mil. of transistors :D
Updated, I'm not sure about the transistor count. I've seen senior members assuming that the count will be under 300 million.
16 pp
New memory bus
High clocks (600MHz+)
SM3.0
90nm low-k TSMC
HW h.264 acceleration
Higher "pipeline" efficiency (up to 30%?)
250-350 mil. of transistors
Dual Slot
On-die frame compositing
FP16 blending with AA
FP32 shader precision only
???
you sure don't listen well.
What else is new? :lol:
Pathetic dudes.. It's 2005! Get over it.
???
On-die frame compositing.
"based on R300" should probably be mentioned as well. Just so people don't get carried away. ;)
+ FP16 blending with AA
FP32 shader precision only
"based on R300" should probably be mentioned as well. Just so people don't get carried away. ;)
Does "based on R300" mean "based on classical pipelines" (~not unified shaders) or a "tweaked R300" core?
E.g. Radeon 256 was reportedly based on a redesigned Rage 128 PRO core, but the final product was completely different. Saying "based on R300" makes it sound like R520 is just another R300 tweak like R350 or R380/R420...
Skrying
29-Jul-2005, 23:21
I'm thinking it's safe to assume a "tweaked" R300 core with a decent amount of changes.
Red PCB
Can't forget that :!:
???
"based on R300" should probably be mentioned as well. Just so people don't get carried away. ;)
I bet most people know that already. Everyone can strike gold once, and ATI had their fame with R300/R350/360/420/whatever/sameshitreally. Now it's back to playing catch up. Yawn. Release the R300 with SM3.0 already.
Ohh, i'm sooo excited. :roll:
Ohh, i'm sooo excited. :roll:
Why care to post then ? :roll:
Galduta
29-Jul-2005, 23:42
16 pp
New memory bus
High clocks (600MHz+)
SM3.0
90nm low-k TSMC
HW h.264 acceleration
Higher "pipeline" efficiency (up to 30%?)
250-350 mil. of transistors
Dual Slot
On-die frame compositing
FP16 blending with AA
FP32 shader precision only
Red and big PCB
40k in 3DMark 2001 with P4 EE (4.3GHz+) + 4 GB of DDR2 800MHz = 2x X850 XT PE :roll: :roll: :wink: :?:
8 vertex shaders
hybrid vertex textures
Let's keep the 3DMark etc. scores out. :|
16 pixel pipelines, 8 vertex shaders
New memory bus
High clocks (600MHz+)
SM3.0
90nm low-k TSMC
HW h.264 acceleration
Higher "pipeline" efficiency (up to 30%?)
250-350 mil. of transistors
Dual Slot
On-die frame compositing
FP16 blending with AA
FP32 shader precision only
hybrid vertex textures
???
Architectural configuration 16-1-1-1.
Possible use of an interim GDDR3/4 spec.
Kaleidoscope display tech.
Ailuros
30-Jul-2005, 01:02
250-350 mil. of transistors
Quite a big ballpark IMO, even for speculation.
FP16 blending with AA
I can only hope that's true.
Architectural configuration 16-1-1-1.
Any chance you know what exactly each ALU can spit out?