Hybrid vertex textures?

nelg

Veteran
I thought this find from CJ deserved its own thread. Any insight anyone?
CJ said:
Ok, I'm not sure where to post this, but since this is the closest and most recent thread I could find about R520, I'll post it here.

Can anyone explain to me what "hybrid vertex textures" are in R520? I was browsing the ATi website and found that they were looking for someone for D3D Driver Development. The key responsibilities list these:

  • Design and develop Direct3D driver software for R3XX, R520, R600 XP and Longhorn drivers
  • Develop new code for R520 HWL and hybrid vertex textures.
  • Develop new code for the R600 HWL
  • Develop new code for the Longhorn Independent layer
  • Develop new code for the WGF driver
  • Fix EPR and work on performance issues

Source: http://sh.webhire.com/servlet/av/jd?ai=405&ji=1464555&sn=I

So what are these hybrid vertex textures in R520? Didn't someone mention before that the vertex shaders in R520 might be more like R500's? Is this it? Since they mention both R520 and R600, it seems to me as if this is a hybrid between these two cores. Is it something like vertex texture fetch, which allows vertex shaders to read data from textures the way pixel shaders do, but with this "hybrid version" being more advanced?

Oh and.. please try to explain it in n00b-language. ;)
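(For reference, here's what plain, non-hybrid SM3.0 vertex texture fetch looks like from the application side in D3D9; whatever "hybrid" adds, the API surface would presumably stay the same. A minimal sketch, assuming device and heightmap creation happen elsewhere:)

Code:
// Minimal sketch of SM3.0 vertex texture fetch setup in Direct3D 9.
// Device and texture creation are assumed; error handling trimmed.
#include <d3d9.h>

bool SupportsVertexTexture(IDirect3D9* d3d, D3DFORMAT fmt)
{
    // Ask whether the driver can sample this format from a vertex shader.
    return SUCCEEDED(d3d->CheckDeviceFormat(
        D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, D3DFMT_X8R8G8B8,
        D3DUSAGE_QUERY_VERTEXTEXTURE, D3DRTYPE_TEXTURE, fmt));
}

void BindHeightMap(IDirect3DDevice9* dev, IDirect3DTexture9* heightMap)
{
    // Vertex textures live in their own sampler slots, addressed via
    // D3DVERTEXTEXTURESAMPLER0..3 rather than the pixel sampler indices.
    dev->SetTexture(D3DVERTEXTEXTURESAMPLER0, heightMap);
    // NV40 only point-samples vertex textures, so request that explicitly.
    dev->SetSamplerState(D3DVERTEXTEXTURESAMPLER0, D3DSAMP_MINFILTER, D3DTEXF_POINT);
    dev->SetSamplerState(D3DVERTEXTEXTURESAMPLER0, D3DSAMP_MAGFILTER, D3DTEXF_POINT);
}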
 
David Kirk said:
As to whether the architectures for vertex and pixel processors should be the same, it's a good question, and time will tell the answer. It's not clear to me that an architecture for a good, efficient, and fast vertex shader is the same as the architecture for a good and fast pixel shader. A pixel shader would need far, far more texture math performance and read bandwidth than an optimized vertex shader. So, if you used that pixel shader to do vertex shading, most of the hardware would be idle, most of the time. Which is better—a lean and mean optimized vertex shader and a lean and mean optimized pixel shader or two less-efficient hybrid shaders? There is an old saying: "Jack of all trades, master of none."

http://www.extremetech.com/article2/0,1558,1745060,00.asp

Perhaps hybrid means it can be used in vertex or pixel shaders.
 
I would assume that texture sampling is performed in the same way in both vertex and pixel shaders. As such, any texture could be used.
 
rwolf said:
David Kirk said:
As to whether the architectures for vertex and pixel processors should be the same, it's a good question, and time will tell the answer. It's not clear to me that an architecture for a good, efficient, and fast vertex shader is the same as the architecture for a good and fast pixel shader. A pixel shader would need far, far more texture math performance and read bandwidth than an optimized vertex shader. So, if you used that pixel shader to do vertex shading, most of the hardware would be idle, most of the time. Which is better—a lean and mean optimized vertex shader and a lean and mean optimized pixel shader or two less-efficient hybrid shaders? There is an old saying: "Jack of all trades, master of none."

http://www.extremetech.com/article2/0,1558,1745060,00.asp

Perhaps hybrid means it can be used in vertex or pixel shaders.

Maybe, but why is it called "hybrid vertex textures"?

Some time ago there was info that vertex processing would have load balancing between the CPU and the GPU. Maybe hybrid means that it can be done by either the CPU or the GPU. If I am playing devil's advocate, I could even speculate that the driver splits a shader into two parts: the first part, which contains the vertex texture operations, is done by the CPU, and the second part, without vertex texture operations, is done by the GPU.
 
Could both kinds of shader be virtual, with whatever functional units are required allocated dynamically?
 
Musings of a muddled mind:

David's answer seems (surprise! :D) ever so slightly misleading in its precision. Given his previous specification of "good, efficient, and fast," I'm assuming he uses "efficient" to denote transistor count or perhaps energy use. Customers won't care about the difference in die size or even power efficiency between separate pixel and vertex shaders and two hybrid shaders, if both GPUs are placed on cards at the same MSRP. In this theoretical example, I don't think anything prohibits the "hybrid" shader from being as "good" or "fast" as the distinct one. I'm also not sure that one hybrid shader will actually require more space or transistors than both a pixel and a vertex shader. And the issue of parts of the hybrid shader lying dormant per clock can almost be a benefit from a heat/area perspective, no?

And should I bother calling attention to the irony of nVidia lamenting lean and mean? (FP16/32 vs. FP24.) Perhaps not. ;^)
 
Perhaps R520 is 16PS/8hybrid? Like four R4x0 quads and two R5x0 "extreme uber turbo quads GTi 16v". Actually, there was a rumour that R420 included vertex units from the cancelled R400 project - maybe they were referring to something else that ATi had in the pipeline at the time? One day one of these crazy lines of speculation will be accurate. :LOL:

An unrelated question... what caps R3x0/R4x0's ability to loop back in its ROPs for multisample FSAA? Cache/buffer sizes? If so that seems like an obvious area where R520 might improve on the current generation (not that 6xMSAA isn't more than adequate in most people's opinions).
 
MuFu said:
An unrelated question... what caps R3x0/R4x0's ability to loop back in its ROPs for multisample FSAA? Cache/buffer sizes? If so that seems like an obvious area where R520 might improve on the current generation (not that 6xMSAA isn't more than adequate in most people's opinions).


3DCenter:

With the 6x sparse multisampling offered by Radeon 9500+ cards, ATI set a new standard in antialiasing quality on consumer hardware. Do you feel there is a need for another increase in quality in the near future, or is the cost for even better modes too high to justify the result?

Eric Demers:

There's still not a single PC solution out there that gives AA quality like we do. Not only do we have a programmable pattern (which is currently set to sparse), we are also the only company offering gamma-corrected AA, which makes for a huge increase in quality. Honestly, we've looked at the best 8x and even "16x" sampling solutions, in the PC commercial market, and nobody comes close to our quality. But we are always looking at ways to improve things. One thing people do have to realize is that if you use a lossless algorithm such as ours (lossy ones can only degrade quality), the memory used by the back buffer can be quite large. At 1600x1200 with 6xAA, our buffer consumption is over 100 MBs of local memory space. Going to 8xAA would have blown past 128MB. Consequently, with a lossless algorithm, the increase in subsamples must be matched with algorithm changes or with larger local storage. We've looked at things such as randomly altering the programmable pattern, but the low frequency noise introduced was worse than the improvement in the sampling position. Unless you have 32~64 subsamples, introducing random variations is not good. So we are looking at other solutions and algorithms. Current and future users will be happy with our solutions. Stay tuned.


http://www.3dcenter.de/artikel/2003/11-06_english.php
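Demers' numbers are easy to sanity-check. Assuming 4 bytes of colour plus 4 bytes of Z/stencil per subsample, with resolved front and back buffers on top (the exact buffer layout is my assumption), the arithmetic lands right where he says:

Code:
// Back-of-the-envelope check of Demers' AA memory figures.
// Assumes 4B colour + 4B Z/stencil per subsample plus two resolved
// 32-bit buffers (front + resolve target); the layout is a guess.
#include <cstdio>

int main()
{
    const double MB = 1024.0 * 1024.0;
    const long long pixels = 1600LL * 1200LL;
    const int levels[] = {6, 8};
    for (int samples : levels) {
        long long multisampled = pixels * samples * (4 + 4);
        long long resolved     = 2 * pixels * 4;
        std::printf("%dxAA: %.1f MB\n", samples,
                    (multisampled + resolved) / MB);
    }
}

This prints roughly 102.5 MB for 6xAA (his "over 100 MBs") and 131.8 MB for 8xAA, which would indeed have blown past a 128MB card.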

I don't think that framebuffer size is much of a problem with 256 or even 512MB of onboard RAM anymore. I think bandwidth is the real bottleneck here. I'd still love to see 8x sparse MSAA though, even if it only ends up making sense up to 1280. I'm sure a lot of TFT/LCD monitor users would be extremely happy about it.

On the other hand I'm still wondering why NVIDIA hasn't opted for a third loop with MSAA on NV40s:

This also signifies that NV4x is only capable of 2 FSAA Multi-Sample samples per clock cycle, and indeed David Kirk confirmed this to be the case - as it has, in fact, been since NV20. To achieve 4X Multi-Sampling FSAA a second loop must be completed through the ROP over two cycles – memory bandwidth makes it prohibitive to output more samples in a cycle anyway. Unlike ATI, only one loop is therefore allowed through the ROP which continues to restrict NV4x to a native Multi-Sample AA of 4X – modes greater than 4X still require combined Super-Sampling and Multi-Sampling.

http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=11
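Taking the article's 2-samples-per-clock figure at face value, the loop count per AA level is just a division; trivial, but it makes the "third loop" idea concrete (a quick tabulation, nothing more):

Code:
// ROP cycles per pixel at 2 multisamples per clock (per the NV40 article).
#include <cstdio>

int main()
{
    const int samplesPerClock = 2;
    const int levels[] = {2, 4, 6, 8};
    for (int aa : levels) {
        int loops = (aa + samplesPerClock - 1) / samplesPerClock;
        std::printf("%dxAA -> %d cycles through the ROP\n", aa, loops);
    }
}

So 4x MSAA is the two-cycle loop NV4x already does, and a 6x mode like ATI's would simply be a third pass, assuming the sample precision were there (see the next post).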
 
It's not just the "loop" they need; to make it useful, more subsample precision is required (i.e. if you only have the capability for 4 sample points, what's the point of generating more than 4 samples!)
 
Per-clock, quite possibly. Are you certain about nV having a 24-pipe part in the works, BTW? Just curious.
 
DegustatoR said:
MuFu said:
Perhaps R520 is 16PS/8hybrid?
Probably more, or it'll lose to NV47, which has 24PS/8VS :)
 
I would have to think that ATI's modeling (IKOS?) demonstrated that the balanced nature of unified pipes would be a better route than just increasing the number of discrete ones. It would be far easier to just further extend their current architecture, a la R420, and rely on better process tech. So, at least in theory, I would say that their research shows that it is the better way to go. Whether it is premature is another matter.
 
nelg said:
I would have to think that ATI's modeling (IKOS?) demonstrated that the balanced nature of unified pipes would be a better route than just increasing the number of discrete ones. It would be far easier to just further extend their current architecture, a la R420, and rely on better process tech. So, at least in theory, I would say that their research shows that it is the better way to go. Whether it is premature is another matter.
By the time a company gets to IKOS testing the hardware is usually halfway done or more. Most architecture-level performance modeling is probably done with a C model.
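As a toy illustration of what the coarsest level of such a C model might look like, here's a sketch with entirely made-up unit counts (16 pixel + 6 vertex pipes discrete, versus 22 unified) showing why load balancing looks attractive on paper:

Code:
// Toy performance model: discrete vs. unified shader pipes.
// Unit counts and workloads are made up; real models track far more state.
#include <algorithm>
#include <cstdio>

int main()
{
    const double pixelShare[] = {0.95, 0.80, 0.50}; // fraction of work that is pixel
    for (double p : pixelShare) {
        double v = 1.0 - p; // the rest is vertex work
        // Discrete: finish time is set by whichever pool bottlenecks.
        double discrete = std::max(p / 16.0, v / 6.0);
        // Unified: all 22 pipes share the combined workload.
        double unified = (p + v) / 22.0;
        std::printf("pixel=%2.0f%%  discrete=%.4f  unified=%.4f  (%.2fx)\n",
                    p * 100.0, discrete, unified, discrete / unified);
    }
}

The unified design wins whenever the workload mix drifts from the fixed ratio the discrete design was tuned for, which is presumably what any real model would show in far more detail.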
 
3dcgi said:
nelg said:
I would have to think that ATI's modeling (IKOS?) demonstrated that the balanced nature of unified pipes would be a better route than just increasing the number of discrete ones. It would be far easier to just further extend their current architecture, a la R420, and rely on better process tech. So, at least in theory, I would say that their research shows that it is the better way to go. Whether it is premature is another matter.
By the time a company gets to IKOS testing the hardware is usually halfway done or more. Most architecture-level performance modeling is probably done with a C model.
Could you use something like this to test a unified pipeline without having to emulate the entire chip?
 
Demirug said:
Maybe, but why is it called "hybrid vertex textures"?

Some time ago there was info that vertex processing would have load balancing between the CPU and the GPU. Maybe hybrid means that it can be done by either the CPU or the GPU. If I am playing devil's advocate, I could even speculate that the driver splits a shader into two parts: the first part, which contains the vertex texture operations, is done by the CPU, and the second part, without vertex texture operations, is done by the GPU.
Quite frankly, this sounds most plausible to me. Hybrid implies a mixture of sorts is used to achieve the desired effect. Now, this may mean that the vertex units make use of the pixel units' texture units to do vertex textures. But it just seems more likely that it means that vertex texturing would be done by the CPU, with the CPU doing all vertex texture reads, and passing the results to the shader via vertex attributes. But whichever way you slice it, I doubt the term "hybrid" means anything good for the end-user.
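To make that concrete: if the driver (or application) did the vertex texture reads on the CPU, it could refill an extra vertex attribute stream each frame with the sampled values. A hypothetical sketch; SampleHeightmap, the data layout, and the whole scheme are made up to illustrate the idea:

Code:
// Hypothetical "hybrid" vertex texturing done host-side: sample the
// texture per vertex on the CPU, hand results to the GPU as an attribute.
#include <vector>

struct Texel { float height; };

// Made-up helper: nearest-texel read from a CPU-side copy of the heightmap.
float SampleHeightmap(const std::vector<Texel>& tex, int w, int h,
                      float u, float v)
{
    int x = static_cast<int>(u * (w - 1));
    int y = static_cast<int>(v * (h - 1));
    return tex[y * w + x].height; // point sampling for brevity
}

// Fill a per-vertex displacement stream; the vertex shader then reads an
// input attribute instead of issuing a texture load of its own.
void FillDisplacementStream(const std::vector<float>& uvs, // interleaved u,v
                            const std::vector<Texel>& tex, int w, int h,
                            std::vector<float>& displacement)
{
    displacement.resize(uvs.size() / 2);
    for (size_t i = 0; i < displacement.size(); ++i)
        displacement[i] = SampleHeightmap(tex, w, h, uvs[2*i], uvs[2*i + 1]);
}

The obvious downside, and probably why "hybrid" wouldn't mean anything good for the end-user, is that the CPU has to re-sample and re-upload the stream any time the texture or the coordinates change.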
 
nelg said:
Could you use something like this to test a unified pipeline without having to emulate the entire chip?
Maybe. It seems like the main advantage of the product is to speed up functional testing though.
 
Sorry, I just read something about stochastic or pseudo-stochastic AA and felt I had to post.

I suppose the "stay tuned" bit is T-AA, but gah. underwhelming.
 