"The neat and tidy example may be just for the slide."
Definitely. When one tile gets swapped out, reorganising the whole tileset to keep it in a neat order relative to the original texture would be pointless in every way! The slide is more indicative of the thought process of the designer, selecting tiles and copying them across methodically to make it easier to keep track of which tiles (s)he's copied and which are still to do.
Current Limitations and Thoughts on the Future
The PRT feature we are shipping in hardware is certainly very powerful, but it does not address all the wants or needs of the current SVT community. In particular, the maximum texture size has not changed - it is 16K × 16K × 8K texels. The limit lies in the precision of the representation of texture coordinates with enough sub-texel resolution for artifact-free linear sampling. To some degree, this may be easy to lift, but we are seeing requests from developers to go as high as 1M × 1M or more in a single texture. This presents significant architectural challenges and may or may not be feasible in the near term.
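To make the coordinate-precision limit concrete, here is a small illustrative calculation. The 8 bits of sub-texel precision are an assumption (a common minimum for artifact-free linear filtering), not AMD's documented figure:

```python
# Bits needed per texture coordinate: integer bits to select a texel,
# plus fractional bits for linear-filter weights. The 8 fractional bits
# are an assumed, typical minimum - not a documented hardware value.
def coord_bits(size_texels, subtexel_bits=8):
    int_bits = (size_texels - 1).bit_length()  # bits to index any texel
    return int_bits + subtexel_bits

print(coord_bits(16 * 1024))    # 16K texture: 14 + 8 = 22 bits
print(coord_bits(1024 * 1024))  # 1M texture:  20 + 8 = 28 bits
```

Going from 16K to 1M adds 6 bits to every coordinate path through the sampler, which is where the architectural cost comes from.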
It is also easy to see that with large textures and high-precision texel formats, we start to exhaust even the virtual address space of the GPU. The largest possible texture is 16K × 16K × 8K × 16 bytes per texel. This amounts to 32 terabytes of linear address space, which far exceeds the addressable space available to the GPU, irrespective of residency. Furthermore, as it is backed by the virtual memory subsystem, page table entries need to be allocated for those pages referenced by sparse textures. The approximate overhead of the page tables for a virtual allocation on current-generation hardware is 0.02% of the virtual allocation size. This does not seem like much, and for traditional uses of virtual memory, it is not. However, when we consider ideas such as the allocation of a single texture which consumes the full 32 terabytes of virtual address space, this overhead is more than 6GB - larger than the physical memory of current GPUs. To address this, we need to consider approaches such as non-resident page tables and page table compression.
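The arithmetic above can be checked directly; only the 0.02% page-table overhead figure is taken from the article, the rest follows from it:

```python
# Worked numbers for the largest possible texture (16K x 16K x 8K,
# 16 bytes per texel) and its page-table overhead.
texels = (16 * 1024) * (16 * 1024) * (8 * 1024)
total_bytes = texels * 16
print(total_bytes // 2**40)         # 32 terabytes of virtual address space

pte_bytes = total_bytes * 0.0002    # 0.02% overhead, as quoted in the text
print(round(pte_bytes / 2**30, 1))  # ~6.6 GB of page tables for a fully
                                    # mapped 32 TB allocation
```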
There are several use cases for PRT that seem reasonable but come with subtle complexities that prevent a clean implementation. One such complexity is the use of PRTs as renderable surfaces. Currently, we support rendering to PRTs as color surfaces; writes to un-mapped regions of the surface are simply dropped. However, supporting PRTs as depth or stencil buffers becomes complex. For example, what is the expected behavior of performing depth or stencil testing against a non-resident portion of the depth or stencil buffer? Rendering to MSAA surfaces is also problematic: because of the way compression works for multisampled surfaces, it is possible for a single pixel in a color surface to be both resident and non-resident simultaneously, depending on how many edges cut that pixel. For this reason, we do not expose depth, stencil or MSAA surfaces as renderable on current-generation hardware.
The operating system is another component in the virtual memory subsystem which must be considered. Under our current architecture, a single virtual allocation may be backed by multiple physical allocations. Our driver stack is responsible for virtual address space allocations, whereas the operating system is responsible for the allocation of physical address space. The driver informs the operating system how much physical memory is available, and the operating system creates allocations from these pools. During rendering, the driver tells the operating system which allocations are referenced by the application at any given point in the submission stream; the operating system responds by issuing paging requests to page physical allocations in and out of GPU memory as needed to make them resident, and the driver services these requests using DMA, updating the page tables to keep GPU virtual addresses pointing at the right place. When there is a 1-to-1 (or even a many-to-1) correspondence between virtual and physical allocations, this works well. However, when a large texture is slowly made resident over time, the list of physical allocations referenced by a single large virtual allocation can become very long. This presents performance challenges that real-world use will likely expose in the near term and that will need to be addressed.
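A back-of-the-envelope sketch of why those lists get long. The 64KB tile size and one-physical-allocation-per-tile granularity are assumptions for illustration, not the driver's actual bookkeeping:

```python
# If a 1 TB sparse texture is paged in one 64KB tile at a time, and each
# tile ends up as its own physical allocation, the list the driver must
# enumerate per submission grows to millions of entries.
TILE_BYTES = 64 * 1024
virtual_bytes = 2**40  # a 1 TB virtual allocation

allocations = virtual_bytes // TILE_BYTES
print(allocations)  # 16777216 entries once fully resident
```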
"we are seeing requests from developers to go as high as 1M × 1M or more in a single texture"
1M × 1M pixels of virtual address space is not that much, really. Consider a 4km × 4km terrain, for example: 1Mpix / 4096 meters = 256 pixels/meter. Not enough for good close-up quality.
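The terrain arithmetic in the post above checks out:

```python
# 1M texels spread across a 4 km (4096 m) terrain edge.
texels_across = 1024 * 1024
terrain_meters = 4096
print(texels_across // terrain_meters)  # 256 texels per meter
```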
"1M * 1M pixel virtual address space is not that much really. Consider a 4km * 4km terrain for example. 1Mpix / 4096 meters = 256 pixels/meter. Not enough for good closeup quality."
Right, but that's the rub: as convenient as it is for development, it is irrational to pay the hardware cost to support these ridiculously large address spaces. It is a zillion times less expensive for applications to handle the coarser layers of the mip chain, even if it means a little overhead in putting borders on things, etc. Obviously, handling a few levels in hardware can be viable and pushes the storage overhead of the border pixels to reasonable levels, but it should be a non-goal to handle a 4km × 4km resource in a single, contiguous virtual address space. There's absolutely no need for that, and paying the hardware cost in the texture sampler for that many bits is a poor use of silicon.
"Would the max linear address space be permitted for existing Windows versions?"
For a texture? No, it's much smaller. There's a large cost to adding address bits to texture units, so they tend to deal with smaller "pointers".
I still see the 44-bit limit being mentioned for current releases, even with wider compare and swap now available.
"I'm sure it has been answered, but does the 7900 series support the tier 2 stuff?"
No, just the 7790 (Bonaire) supports it from the existing cards.
"Isn't the XBox One GPU based on Bonaire? (GCN 1.1 Tier 2)"
Both are apparently based on the same IP pool for the GPU, same with Kabini, Kaveri, and Hawaii. But the XB1 got additions for the memory system (eSRAM), additional support for a few surface formats (no idea, maybe they are supported in all GCN GPUs), and a few customizations (two additional modified DMA engines, some custom audio logic, higher coherent bandwidth, and of course two Jaguar modules instead of one) - though you could argue that these don't pertain to the core GPU IP anymore, but rather to system integration and northbridge design.