Current Limitations and Thoughts on the Future
The PRT feature we are shipping in hardware is certainly very powerful, but it does not address all the wants or needs of the current SVT community. In particular, the maximum texture size has not changed - it is 16K × 16K × 8K texels. The limit lies in the precision of the representation of texture coordinates with enough sub-texel resolution for artifact-free linear sampling. To some degree, this may be easy to lift, but we are seeing requests from developers to go as high as 1M × 1M or more in a single texture. This presents significant architectural challenges and may or may not be feasible in the near term.
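To make the precision argument concrete, here is a back-of-the-envelope sketch. The 8 bits of sub-texel resolution used below is an assumption for illustration (the real interpolator width is hardware-specific), but it shows why growing the dimension is costly: a 32-bit float carries only a 24-bit significand, so a normalized float coordinate into a 1M-texel axis cannot retain full sub-texel precision.

```c
#include <assert.h>

/* Fixed-point bits needed to address one axis of a texture of the
 * given dimension while keeping 'subtexel_bits' of fractional
 * precision for linear filtering. The 8-bit sub-texel figure used
 * in the assertions below is an illustrative assumption, not a
 * hardware specification. */
static int coord_bits(long long dim, int subtexel_bits)
{
    int int_bits = 0;
    while ((1LL << int_bits) < dim)
        int_bits++;               /* ceil(log2(dim)) integer bits */
    return int_bits + subtexel_bits;
}
```

Going from 16K to 1M per axis raises the requirement from 22 to 28 bits per coordinate - past what a single-precision normalized coordinate can deliver without losing sub-texel accuracy.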
It is also easy to see that with large textures and high-precision texel formats, we start to exhaust even the virtual address space of the GPU. The largest possible texture is 16K × 16K × 8K texels at 16 bytes per texel. This amounts to 32 terabytes of linear address space, which far exceeds the addressable space available to the GPU, irrespective of residency. Furthermore, as it is backed by the virtual memory subsystem, page table entries need to be allocated for those pages referenced by sparse textures. The approximate overhead of the page tables for a virtual allocation on current-generation hardware is 0.02% of the virtual allocation size. This does not seem like much, and for traditional uses of virtual memory, it is not. However, when we consider ideas such as the allocation of a single texture which consumes the full 32 terabytes of virtual address space, this overhead comes to roughly 6.5GB - much larger than will fit into the GPU's physical memory. To address this, we need to consider approaches such as non-resident page tables and page table compression.
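The arithmetic can be checked directly. These are just the worked numbers from the text - the maximum texture dimensions and the quoted 0.02% page-table overhead - with nothing hardware-specific added:

```c
#include <assert.h>

/* Size of the largest possible texture: 16K x 16K x 8K texels at
 * 16 bytes per texel, as described in the text. */
static long long max_texture_bytes(void)
{
    return 16384LL * 16384LL * 8192LL * 16LL;  /* 2^45 bytes = 32 terabytes */
}

/* Page-table overhead at the quoted 0.02% (2/10000) of the virtual
 * allocation size. */
static long long page_table_bytes(long long virtual_bytes)
{
    return virtual_bytes * 2 / 10000;
}
```

Applied to the full 32-terabyte range, the 0.02% overhead works out to roughly 6.5GB of page tables - hence the interest in non-resident and compressed page tables.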
There are several use cases for PRT that seem reasonable but that come with subtle complexities preventing their clean implementation. One such complexity is in the use of PRTs as renderable surfaces. Currently, we support rendering to PRTs as color surfaces; writes to unmapped regions of the surface are simply dropped. However, supporting PRTs as depth or stencil buffers becomes complex. For example, what is the expected behavior of performing depth or stencil testing against a non-resident portion of the depth or stencil buffer? Rendering to MSAA surfaces is also problematic. Because of the way compression works for multisampled surfaces, it is possible for a single pixel in a color surface to be both resident and non-resident simultaneously, depending on how many edges cut that pixel. For these reasons, we do not expose depth, stencil, or MSAA surfaces as renderable on current-generation hardware.
The operating system is another component in the virtual memory subsystem which must be considered. Under our current architecture, a single virtual allocation may be backed by multiple physical allocations. Our driver stack is responsible for virtual address space allocation, whereas the operating system is responsible for the allocation of physical address space. The driver informs the operating system how much physical memory is available, and the operating system creates allocations from these pools. The operating system can ask the driver to page physical allocations in and out of GPU memory; the driver does this using DMA and updates the page tables to keep GPU virtual addresses pointing at the right place. During rendering, the driver tells the operating system which allocations are referenced by the application at any given point in the submission stream, and the operating system responds by issuing paging requests to make sure they are resident. When there is a 1-to-1 (or even a many-to-1) correspondence between virtual and physical allocations, this works well. However, when a large texture is slowly made resident over time, the list of physical allocations referenced by a single large virtual allocation can become very long. This presents performance challenges that real-world use will likely expose in the near term and that will need to be addressed.
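The driver/OS split described above can be modelled with a toy data structure. All names here are invented for illustration (`virtual_allocation`, `bind_physical`); a real driver patches GPU page tables via DMA rather than keeping a flat array. What the sketch shows is the growth pattern: as tiles of a sparse texture are paged in one by one, the list of physical allocations backing a single virtual allocation - the list the driver must report as referenced for each submission - keeps growing.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the split described in the text: the driver owns GPU
 * virtual address space, the OS owns physical allocations, and one
 * virtual allocation may be backed by many physical allocations.
 * These names are illustrative, not a real driver interface. */
typedef struct {
    unsigned long long gpu_va;  /* base GPU virtual address         */
    unsigned long long size;    /* size of the virtual range        */
    int    *physical_ids;       /* OS-side physical allocation ids  */
    size_t  physical_count;     /* grows as tiles are made resident */
} virtual_allocation;

/* Driver side: record that one more physical allocation now backs
 * part of this virtual range. A real driver would also update the
 * page tables so GPU virtual addresses point at the new pages. */
static void bind_physical(virtual_allocation *va, int physical_id)
{
    va->physical_ids = realloc(va->physical_ids,
                               (va->physical_count + 1) * sizeof(int));
    va->physical_ids[va->physical_count++] = physical_id;
}
```

With one entry per resident tile, a texture made resident tile-by-tile over thousands of frames yields thousands of entries to walk per submission - the performance concern the text anticipates.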