glMegatex

nv_dv

Newcomer
The 7970/680 now both feature megatexture support, but isn't that already covered by glGenTextures()/glBindTexture()? If not, what are the limitations?

It seems like you're still going to need some kind of glActiveTexture() call which does the binding in the end (at least with NV_bindless_texture). The 7970 method can be implemented in software, though loading and parsing textures can be slow on the CPU.
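To make the contrast concrete, here's a minimal host-side sketch of the two paths, assuming GL_NV_bindless_texture (tex and samplerLoc are assumed to be set up elsewhere):

// Classic path: the texture must be bound to a unit before each draw.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, tex);

// Bindless path (GL_NV_bindless_texture): fetch a 64-bit handle once,
// make it resident, and pass it to the shader; no per-draw bind call.
GLuint64 handle = glGetTextureHandleNV(tex);
glMakeTextureHandleResidentNV(handle);
glUniformHandleui64NV(samplerLoc, handle); // shader samples via the handle

On the shader side the handle lands on an ordinary sampler2D uniform, so the sampling code itself doesn't change.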
 
Agreed, bindless texturing is just a component feature of hardware megatexture support.

Will consoles be the first place that hardware megatexturing is supported?
 
Agreed, bindless texturing is just a component feature of hardware megatexture support.

Will consoles be the first place that hardware megatexturing is supported?

Didn't JC say ATI had MT support in hardware, except it was limited to 32K^2 textures?
 
If you support huge partially resident textures in HW, doesn't that just give you something on top of which to implement bindless textures? Whereas bindless textures flip this - they give a base on top of which to implement megatexturing.

I'm not sure how the two compare... maybe a graphics dev could clarify. Presumably, ATI's approach transparently handles filtering when a pixel's texel footprint crosses texture pages (though I think the max resolution is already smaller than what the PC version of Rage uses). So for a pure mega-texturing use-case, it seems like on ATI you should be able to do it with a single texture lookup, whereas on NVidia the shader code is likely to be more complex. Is that right?

And if so, are there use cases where having a texture correspond to a sub-region of a much larger texture is also problematic (e.g. you do not want to filter across texture pages)?
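For reference, the shader-side indirection that software megatexturing needs looks roughly like this (a sketch only; pageTable and physicalPages are hypothetical names, and the scale/bias entry layout is just one common choice):

const char* svtLookupGLSL = R"(
    uniform sampler2D pageTable;      // low-res indirection texture
    uniform sampler2D physicalPages;  // the big physical page atlas

    vec4 svtSample(vec2 virtualUV) {
        vec4 entry  = texture(pageTable, virtualUV);    // page -> scale/bias
        vec2 physUV = virtualUV * entry.xy + entry.zw;  // remap into the atlas
        return texture(physicalPages, physUV);          // filtering can bleed
    }                                                   // across page borders
)";

With hardware PRT the whole function above collapses to a single texture() call, since the hardware walks the page table itself and can filter across page edges transparently.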
 
You can use Bindless Textures (BT) as a tool to implement megatexturing/PRT, but PRT does not let you build BT.

Rather than having one massive physical page image, each physical tile could theoretically be an entirely different texture. The page table could contain the BT pointer rather than texcoord scale/bias factors.

The key advantage I can see is that your different physical pages can be different dimensions, use different texture formats (allowing you to pick a texture compression algorithm to suit the local content), and have per-physical-page mipmaps rather than needing to implement trilinear manually. Varying texture formats may make page boundaries more visible. It would be interesting to experiment with.
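A sketch of the two page-table layouts being contrasted here (hypothetical structs, not any real API):

// Classic SVT: every entry remaps into one shared physical atlas,
// so all pages share one size and one format.
struct SvtPageEntry {
    float scaleX, scaleY;  // virtual UV -> atlas UV scale
    float biasX,  biasY;   // virtual UV -> atlas UV offset
};

// BT-based: every entry is a bindless handle, so each tile can be its
// own texture with its own dimensions, format, and mip chain.
struct BtPageEntry {
    GLuint64 bindlessHandle;
};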

Given the already well discussed limitations with AMD's PRT*, I'd imagine that BT is going to have much longer legs over the next few years.

* by that, I mean that AMD's PRT seems to me to be syntactic sugar: it doesn't enable anything fundamentally new, nor anything with dramatically improved performance to my knowledge. That being said, syntactic sugar can sure be nice sometimes.
 
PRT depends upon BT.
Now I have to say that I'm not following. I just don't see how BT is required to implement PRT. In PRT, the hardware texture units are programmed as before, but a page fault results in an error code returned to the shader rather than a fatal fault. In BT, the very idea of a fixed array of HW texture units is removed. They are really entirely different features.

Can you elaborate on what you're thinking?
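For what it's worth, the "error code returned to the shader" model looks roughly like this; the built-ins shown are the ones later standardized in ARB_sparse_texture2, used here purely as an illustration (megaTex is a hypothetical sampler):

const char* prtFetchGLSL = R"(
    #extension GL_ARB_sparse_texture2 : enable
    uniform sampler2D megaTex;  // a partially resident texture

    vec4 prtSample(vec2 uv) {
        vec4 texel;
        int code = sparseTextureARB(megaTex, uv, texel);
        if (!sparseTexelsResidentARB(code)) {
            // page miss: fall back to a coarser mip and/or flag the
            // page for the streaming system to fetch
        }
        return texel;
    }
)";

Note that the texture-unit plumbing is untouched; nothing here requires removing the fixed binding slots, which is the point about the two features being independent.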

Which are?
- Max virtual texture size remains at 16K * 16K, which isn't enough for some existing SVT content (Rage goes up to 128K from what I recall).
- The AMD PRT page size is a fixed 64KB, which can result in significant over-fetching of texture data compared to the smaller virtual page size used by Rage.
 
Now I have to say that I'm not following. I just don't see how BT is required to implement PRT. In PRT, the hardware texture units are programmed as before, but a page fault results in an error code returned to the shader rather than a fatal fault. In BT, the very idea of a fixed array of HW texture units is removed. They are really entirely different features.

Can you elaborate on what you're thinking?
You can't build generic support for PRTs without the hardware being able to access more than a "limited" number of textures bound to a shader (limit of 128 in the API). You need more than one PRT to build a world and the stuff that goes in it.

- Max virtual texture size remains at 16K * 16K, which isn't enough for some existing SVT content (Rage goes up to 128K from what I recall).
That's a non-argument. The engine needs to be rewritten anyway and creation of the shipped textures is in developer control.

- The AMD PRT page size is a fixed 64KB, which can result in significant over-fetching of texture data compared to the smaller virtual page size used by Rage.
So you pack a few mip levels into a page. That's 8MB of 64KB pages holding the lowest mips for 128 distinct PRTs that are permanently in video memory.

If the pages are smaller, then the number of page table entries increases, so you're now spending more memory on the page table instead of on textures.

So, another non-argument.
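To put numbers on the mip-packing point above (the DXT1 format is my assumption, used only to fix a bytes-per-texel figure):

constexpr int pageBytes  = 64 * 1024;      // fixed AMD PRT page size
constexpr int pageTexels = pageBytes * 2;  // DXT1: 0.5 bytes/texel -> 131072

// The complete mip tail from 256x256 down needs ~256*256*4/3 texels,
// so the whole tail of one PRT fits comfortably in a single page:
constexpr int mipTail256 = 256 * 256 * 4 / 3;               // 87381 < 131072

// One such page per PRT, 128 PRTs permanently resident:
constexpr int residentMB = 128 * pageBytes / (1024 * 1024); // = 8 MB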
 
You can't build generic support for PRTs without the hardware being able to access more than a "limited" number of textures bound to a shader (limit of 128 in the API). You need more than one PRT to build a world and the stuff that goes in it.

Hardware PRT and the number of texture bindings are orthogonal. A device can have a limited number of texture binding slots, even while each texture is partially resident. A BT implementation can allow a single draw call to access any number of different textures, while each individual texture must be entirely resident. They really are entirely different features, and you can build a device with either one without the other.

Now Dave's comment about AMD's designs also supporting BT is really interesting. I haven't seen that disclosed in AMD's marketing materials. Has there been any information published about that?


On the usefulness of PRTs, I don't think you're saying much that's different than what I'm saying. It just isn't a revelatory feature, either good or bad. You claim that the deficiencies in AMD's hardware solution are not really big issues, and I agree with that. However, I'm also arguing that the advantages of AMD's hardware solution are not huge either. It's really an ease-of-use thing for those who haven't implemented a software solution yet, and performance might be a bit better or a bit worse depending on how the balance between over-fetching and more efficient filtering plays out, but I don't expect AMD's hardware PRT to really rock the boat.
 
Hardware PRT and the number of texture bindings are orthogonal. A device can have a limited number of texture binding slots, even while each texture is partially resident. A BT implementation can allow a single draw call to access any number of different textures, while each individual texture must be entirely resident. They really are entirely different features, and you can build a device with either one without the other.
It's possible, but it's not generic. 128 bindings doesn't get you very far with PRTs.

Now, you can use texture arrays, and that's a solution to the 128-binding problem. But that also applies to BT, and I don't hear you saying that BT is "not interesting".

Now Dave's comment about AMD's designs also supporting BT is really interesting. I haven't seen that disclosed in AMD's marketing materials. Has there been any information published about that?
Dave said they didn't bother because PRT is more interesting to consumers.

On the usefulness of PRTs, I don't think you're saying much that's different than what I'm saying. It just isn't a revelatory feature, either good or bad. You claim that the deficiencies in AMD's hardware solution are not really big issues, and I agree with that. However, I'm also arguing that the advantages of AMD's hardware solution are not huge either. It's really an ease-of-use thing for those who haven't implemented a software solution yet, and performance might be a bit better or a bit worse depending on how the balance between over-fetching and more efficient filtering plays out, but I don't expect AMD's hardware PRT to really rock the boat.
Over-fetching isn't an issue.

Basically your arguments are rubbish but you think hardware support is a waste of time anyway.

I expect the next consoles to do this, so a few years after their release we'll know, I guess.
 
That's nice and on the right track for my current gold standard:
a universal streaming system that loads data on demand from disk.

It's likely that the engine knows better what to load, when, and how, so a system that just handles video memory as a collection of pages and automatically selects them using ARC (or another cache replacement algorithm) would be quite nice.

I want to be able to ask the API for x amount of (virtually contiguous) memory, and KNOW that the hardware will pick the n best pages (using a cache replacement algorithm) and update its virtual memory page table accordingly.
I'll also need to be told what's been evicted so I can manage loading it back later.

When do I get that, Dave?

[The cache replacement algorithm could be provided from the engine side.
Also, that would work better if data memory layouts were standard across GPUs; otherwise it should be possible to type the data sent so the API can convert it on the fly. (I don't like that, though :p)]
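A sketch of the API being wished for here; every name below is hypothetical, since nothing like this exists in GL today:

#include <cstddef>
#include <cstdint>

struct EvictionRecord { uint64_t pageId; };  // identifies a paged-out page

// Reserve a virtually contiguous range; the driver backs it with the
// n "best" pages according to the chosen replacement policy (e.g. ARC,
// or one supplied by the engine).
void* vmAllocVirtual(size_t sizeBytes, int replacementPolicy);

// Poll for pages the hardware evicted, so the engine can re-stream
// them from disk later.
size_t vmPollEvictions(EvictionRecord* out, size_t maxRecords);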
 
The key advantage I can see is that your different physical pages can be different dimensions, use different texture formats (allowing you to pick a texture compression algorithm to suit the local content), and have per-physical-page mipmaps rather than needing to implement trilinear manually. Varying texture formats may make page boundaries more visible. It would be interesting to experiment with.

Yep, no texture resolution/format limit; you can mix up whatever you want (indexed/grayscale/RGB/RGBA...) :cool:
However, it was only tested in software, and the code doesn't use GLSL.

I haven't encountered any texture binding limit yet (then again, it's software), but you have to create your own glActiveTexture( _resd, _tex_n ) type of function (if you don't use GLSL).
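As an illustration of the kind of wrapper meant here, a minimal sketch (the whole function is hypothetical, including the _resd/_tex_n naming taken from the post):

extern GLuint textureIds[];     // application-side texture table
extern int   maxHardwareUnits;  // query via GL_MAX_TEXTURE_IMAGE_UNITS

// Map an arbitrary resource index onto one of the limited hardware
// units, recycling units round-robin; a software stand-in for bindless.
void myActiveTexture(int _resd, int _tex_n) {
    glActiveTexture(GL_TEXTURE0 + (_tex_n % maxHardwareUnits));
    glBindTexture(GL_TEXTURE_2D, textureIds[_resd]);
}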
 
That's nice and on the right track for my current gold standard:
a universal streaming system that loads data on demand from disk.

It's likely that the engine knows better what to load, when, and how, so a system that just handles video memory as a collection of pages and automatically selects them using ARC (or another cache replacement algorithm) would be quite nice.

I want to be able to ask the API for x amount of (virtually contiguous) memory, and KNOW that the hardware will pick the n best pages (using a cache replacement algorithm) and update its virtual memory page table accordingly.
I'll also need to be told what's been evicted so I can manage loading it back later.

When do I get that, Dave?

[The cache replacement algorithm could be provided from the engine side.
Also, that would work better if data memory layouts were standard across GPUs; otherwise it should be possible to type the data sent so the API can convert it on the fly. (I don't like that, though :p)]
If BT is so nice (and it is), then why stop at bindless textures? Why not push for hardware support for other GPU resources, like vertex, index, constant, and uniform buffers, too?

The hardware required should not be much more than what's already there.
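For what it's worth, NVIDIA already exposes something along these lines for buffers through GL_NV_shader_buffer_load and GL_NV_vertex_buffer_unified_memory; a rough sketch of the vertex path (vbo and sizeBytes are assumed to be set up elsewhere):

GLuint64EXT addr;

// Pin the buffer and fetch its raw GPU address.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &addr);

// Point attribute 0 at the address instead of a bound buffer object.
glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, sizeof(float) * 3);
glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, addr, sizeBytes);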
 
If BT is so nice (and it is), then why stop at bindless textures? Why not push for hardware support for other GPU resources, like vertex, index, constant, and uniform buffers, too?

The hardware required should not be much more than what's already there.

I said "universal", not just "texture"; that's the whole point of the system I describe: you handle *every* resource the same way, using virtual memory with a cache replacement algorithm.
 