Fetch4 - Important?

Boltneck keep in mind that ATI is going to be slow using this specific implementation of shadowing. Take a look at 3DMark06 and see how well Nvidia uses HDR and AA. At least there is a workaround for Fetch4.
 
rwolf said:
Boltneck keep in mind that ATI is going to be slow using this specific implementation of shadowing.

Is there another implementation of shadowing on which ATi's current hardware is faster in comparison to 3dmark06's approach?
 
Dave Baumann said:
Fetch4 is primarily there as an alternative to NVIDIA's PCF. NVIDIA have had PCF in hardware for many years and numerous titles make use of it - all ATI hardware prior to RV515/RV530 hasn't had the capability, meaning it has always had to do something different (whether it be dropping soft shadowing entirely, such as in the slightly quick/dirty Splinter Cell XBOX port, or paying the sampling penalties in the texture unit and generating the shadow percentage in the shader).
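To make the "generating the shadow percentage in the shader" path concrete, here is a minimal CPU-side sketch of 2x2 percentage-closer filtering done by hand. It is only an illustration: the function name `pcf_2x2`, the list-of-rows depth map and the half-texel convention are assumptions, not anything taken from a real driver or from 3DMark06.

```python
import math

def pcf_2x2(shadow_map, u, v, receiver_depth):
    """Emulate 2x2 percentage-closer filtering on a depth map.

    shadow_map: list of rows of light-space depths (hypothetical layout,
                texel centres at integer + 0.5).
    u, v: sample position in texel units.
    receiver_depth: depth of the shaded point as seen from the light.
    Returns the lit fraction in [0, 1].
    """
    height, width = len(shadow_map), len(shadow_map[0])
    # Shift by half a texel so the weights are relative to texel centres.
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def lit(ix, iy):
        # Clamp addressing, then do the depth comparison for one texel.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return 1.0 if receiver_depth <= shadow_map[iy][ix] else 0.0

    # Four separate fetches and four compares, then a bilinear blend of the
    # comparison results (not of the depths themselves).
    top = lit(x0, y0) * (1 - fx) + lit(x0 + 1, y0) * fx
    bottom = lit(x0, y0 + 1) * (1 - fx) + lit(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy
```

Hardware PCF does the compares and the blend inside the texture unit for one fetch; without it, the shader pays for the four fetches itself, which is the cost Fetch4 is meant to reduce.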

As to why R520 doesn't support it is beyond me - it may be the case that the texture units were only changed after the design was set (RV515 and RV530 would have started after R520) or the samplers were a little borked for this operation so it has been disabled, to be sorted on subsequent parts. However, this operation is primarily there to speed up the sampling rate of single-channel format textures (i.e. increase it fourfold), and it's actually less likely that R520 would be bottlenecked by its texture sampling rate than by other areas, in comparison to a part such as RV530 (with its relative texture to shader ratio) ;)
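A toy model of the fourfold claim, for anyone who wants to see it counted out. The class `SingleChannelTexture`, its fetch counter and the order in which `fetch4` returns the four texels are all assumptions made for illustration; real hardware defines its own channel ordering.

```python
class SingleChannelTexture:
    """Toy single-channel texture that counts fetch instructions (hypothetical)."""

    def __init__(self, texels):
        self.texels = texels   # list of rows of scalar values
        self.fetches = 0       # number of fetch instructions issued

    def _read(self, x, y):
        # Clamp-to-edge addressing.
        h, w = len(self.texels), len(self.texels[0])
        return self.texels[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    def point_sample(self, x, y):
        # One fetch instruction returns one texel.
        self.fetches += 1
        return self._read(x, y)

    def fetch4(self, x, y):
        # One fetch instruction returns the whole 2x2 footprint that a
        # bilinear filter would have read, unfiltered, one texel per channel.
        self.fetches += 1
        return (self._read(x, y), self._read(x + 1, y),
                self._read(x, y + 1), self._read(x + 1, y + 1))


tex = SingleChannelTexture([[1, 2], [3, 4]])
separate = (tex.point_sample(0, 0), tex.point_sample(1, 0),
            tex.point_sample(0, 1), tex.point_sample(1, 1))
fetches_without = tex.fetches      # four fetch instructions
tex.fetches = 0
gathered = tex.fetch4(0, 0)
fetches_with = tex.fetches         # one fetch instruction
```

The data returned is the same either way; only the number of fetch instructions changes, which is why the gain depends on how texture-limited the part is.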

Hence why I said that ATI should drop the R520 and get the R580 out as quick as possible. That and a few other things.

I'll hold out for the R580, and it better impress me else the green monster gets the nod this time around.

I am very unimpressed with the speeds of the core and the memory though. ATI are holding back.. and it's starting to feel like R520 all over again.

 
Any other techs like Fetch4 that we should know about?

So, I am a little gun-shy now.

Are there other features or "widgets" like Fetch4 that we should treat as critical, and that are missing from either ATI or Nvidia hardware (meaning one or the other)?

It could cover any area: shading, shadows, floating-point support, or anything else anyone can think of.
 
What did R520 feel like? Really, I'm missing your point. For me it felt like a relief: finally a fast, feature-rich, competitive chip with a forward-looking architecture.
 
boltneck said:
I just ordered a X1800XT.

It might have been wise to wait till the X1900 reviews come out to see if that might have been more suitable. The X1900 looks on paper to be a very impressive refresh part.
 
I don't know if he was aware of that, but I was, and I'm also waiting for my X1800XT.
In my region it will show up with a 2-3 month delay and a hefty price premium, so waiting really isn't worth it for me.
But of course it might be different for others.
 
dizietsma said:
It might have been wise to wait till the X1900 reviews come out to see if that might have been more suitable. The X1900 looks on paper to be a very impressive refresh part.

I bought a refurb from Newegg for $425.
 
Lucky guys. I have to dish out more than 600 EUR for a X1800XT. But the same is true for the 256Mb GTX, also; guess I can't complain.
That's the price I pay for not living in a country where I might be a target for a terrorist attack, and I have fresh air and non genetically enhanced food. What the heck, I am the lucky one. ;)
 
Hubert said:
Lucky guys. I have to dish out more than 600 EUR for a X1800XT. But the same is true for the 256Mb GTX, also; guess I can't complain.
That's the price I pay for not living in a country where I might be a target for a terrorist attack, and I have fresh air and non genetically enhanced food. What the heck, I am the lucky one. ;)

Dracula sucked out a bit too much this time, eh!
 
Jawed said:
Will R520's lack of Fetch4 create a frustrating eye-candy gap for those who've bought one?

I don't think so. The X1800 has a 1:1 TEX:ALU ratio, which is only slightly worse to sample 4 times vs. a 1:3 TEX:ALU card like the X1600 sampling 1 time. So the X1800 can afford spending the extra time doing separate samples, while the X1600 will see a much bigger gain with fetch4.
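One way to read the ratio argument, as back-of-envelope arithmetic. This is a deliberately crude model (it ignores clocks, latency hiding and everything except the TEX:ALU ratio), and the helper `alu_slots_spent` is made up for the illustration.

```python
def alu_slots_spent(tex_fetches, alu_per_tex):
    """ALU issue slots that go by while the texture units deliver the samples.

    tex_fetches: number of texture fetches the shader issues.
    alu_per_tex: ALU ops available per texture fetch
                 (1 for a 1:1 part, 3 for a 1:3 part).
    """
    return tex_fetches * alu_per_tex


x1800_four_samples = alu_slots_spent(4, 1)   # 1:1 part, four separate samples
x1600_fetch4 = alu_slots_spent(1, 3)         # 1:3 part, one Fetch4
x1600_four_samples = alu_slots_spent(4, 3)   # 1:3 part without Fetch4

print(x1800_four_samples, x1600_fetch4, x1600_four_samples)  # 4 3 12
```

Under that model the X1800 doing four separate samples (4 slots) is only slightly worse than the X1600 doing one Fetch4 (3 slots), while the X1600 without Fetch4 would be at 12, so it has far more to gain.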

Jawed said:
Or is Fetch4 headed the way of 3Dc?

Not sure what that's supposed to mean. 3Dc isn't going away. Especially the new single channel format is very useful. Texture compression is getting increasingly more important as ALU power increases faster than bandwidth.

Jawed said:
Are there uses for Fetch4 beyond its role supporting an alternative to hardware-PCF?

Absolutely. One idea is for instance to store bumpmaps as a heightmap (which may even be ATI1N encoded) and compute the normals dynamically.
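A rough sketch of the heightmap idea, assuming the four heights of a 2x2 footprint arrive from a single fetch. The function name `normal_from_fetch4`, the argument order and the `bump_scale` parameter are assumptions for illustration, not a description of any shipping shader.

```python
import math

def normal_from_fetch4(h00, h10, h01, h11, bump_scale=1.0):
    """Tangent-space normal from a 2x2 block of heights.

    h00 = (x, y), h10 = (x+1, y), h01 = (x, y+1), h11 = (x+1, y+1).
    bump_scale: hypothetical knob controlling how strong the bumps look.
    """
    # Average the two horizontal and the two vertical finite differences.
    dx = 0.5 * ((h10 - h00) + (h11 - h01))
    dy = 0.5 * ((h01 - h00) + (h11 - h10))
    # The surface z = h(x, y) has the unnormalised normal (-dh/dx, -dh/dy, 1).
    nx, ny, nz = -bump_scale * dx, -bump_scale * dy, 1.0
    inv_len = 1.0 / math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx * inv_len, ny * inv_len, nz * inv_len)
```

A flat patch gives (0, 0, 1), and a patch that rises towards +x tilts the normal towards -x. The appeal is that a single-channel heightmap costs less storage than a two- or three-channel normal map, at the price of a few ALU ops per pixel.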
 
Humus said:
I don't think so. The X1800 has a 1:1 TEX:ALU ratio, which is only slightly worse to sample 4 times vs. a 1:3 TEX:ALU card like the X1600 sampling 1 time. So the X1800 can afford spending the extra time doing separate samples, while the X1600 will see a much bigger gain with fetch4.

Yes that makes a lot of sense. The bandwidth is the same whether fetching one sample at a time or Fetch4, isn't it?

Though I'm not sure with 3DMk06, the R32F format implies more bits per sample, but I have to admit I get confused - if 3DMk06 had been built with DF16 as the fallback - then the bandwidth cost would have been identical regardless of whether the card supported Fetch4. Is that correct?

Humus said:
Not sure what that's supposed to mean. 3Dc isn't going away. Especially the new single channel format is very useful. Texture compression is getting increasingly more important as ALU power increases faster than bandwidth.

It's coming up to 2 years since 3Dc appeared. Any signs of it being used in PC games?

Though I suppose XB360's support for 3Dc should mean it makes a concerted appearance.

---

Do you think PCF is going to make much impact in shadowing or will the higher quality of more advanced kernels (as seen in game tests 3 and 4 of 3DMk06) make it a blip in the evolution of shadow rendering?

Jawed
 
I think we've touched on it before but, given the size of the monolithic bilinear texture samplers we have, and the relative lack of bandwidth, I'm wondering if the way to go isn't to break the units down to single samplers that can be grouped into fours depending on the operation.
 
It's interesting that the ATI patent for L1/L2 caching shows that a given texel appears in multiple L1 caches within a quad - as opposed to the four pipes sharing a coherent L1.

Now that may be an artefact of the L1/L2 structure, more than anything else (or it may be normal even with current GPUs - my understanding of texture sampling caches is pretty limited) - but it does imply a degree of independence across the four samplers that make up a "texturing quad", looking to the future.

http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220050225558%22.PGNR.&OS=DN/20050225558&RS=DN/20050225558

Came up in this thread:

http://www.beyond3d.com/forum/showthread.php?t=25332

Where I pointed to this duplicate-texel within a quad property:

http://www.beyond3d.com/forum/showpost.php?p=619024&postcount=15

Jawed
 
Dave Baumann said:
I think we've touched on it before but, given the size of the monolithic bilinear texture samplers we have, and the relative lack of bandwidth, I'm wondering if the way to go isn't to break the units down to single samplers that can be grouped into fours depending on the operation.

Why should that change bandwidth requirements? It shouldn't really matter much if you fetch a single texel or 4 (more or less adjacent) ones, due to the texture caches.
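A small sketch of the cache argument, assuming the texture cache pulls memory in as tiles of texels. The 4x4 tile size and the helper `lines_fetched` are assumptions picked for illustration, not the real cache-line geometry of any of these chips.

```python
def lines_fetched(texel_coords, line_w=4, line_h=4):
    """Count the distinct cache lines (tiles of line_w x line_h texels) touched.

    With a texture cache, each line is read from memory once, however many
    of its texels end up being used.
    """
    return len({(x // line_w, y // line_h) for x, y in texel_coords})


footprint = [(5, 5), (6, 5), (5, 6), (6, 6)]      # 2x2 block inside one tile
single = lines_fetched(footprint[:1])              # one texel  -> 1 line
quad = lines_fetched(footprint)                    # four texels -> still 1 line

# Worst case: the 2x2 block sits on a tile corner and touches four lines.
straddle = lines_fetched([(3, 3), (4, 3), (3, 4), (4, 4)])
```

In the common case the single texel and the 2x2 block cost the same memory traffic, which matches the "more or less adjacent" caveat; only footprints that straddle tile boundaries cost more.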
 
Humus said:
Not sure what that's supposed to mean. 3Dc isn't going away. Especially the new single channel format is very useful. Texture compression is getting increasingly more important as ALU power increases faster than bandwidth.

What does ALU power have to do with the bandwidth requirements for texture sampling?
 
If the ALU power / bandwidth goes up, you'll either have to increase the ALU portion or decrease the bandwidth requirement in order to fully utilize the hardware.
 