DirectX Next

Ilfirin

Not sure how this went under the radar, but the Meltdown 2003 slides are available (and apparently have been since October 17th):

One presentation that caught my eye was "Future Features", where ps and vs 3.0 were discussed along with the next DirectX (apparently called DirectX Next).

Here are some highlights:

-Goals
-Customers
-I/O Model
-Shader Programming Model
-Surface Models
-Topology Operations
-Frame-Buffer access in pixel shader
-Virtual Video Memory (yay!)

Goals
-Enable scenarios currently blocked
+Many desirable functions must still be performed on the CPU
-Add generality to reduce number of future scenarios that may get blocked
-Optimize interface to future OSs
+Better primitive throughput performance
+Better state change performance
+Better interoperation of multiple apps/processes

DirectX Graphics Trends
-DirectX Next: dynamic geometry/topology modification
+Move CPU/GPU transition to point of very low bandwidth
+Enable new capabilities for new GPU uses

General Programming Model
-Integer instruction set
-Support for more programming constructs
+stacks and arrays
-More flexible memory addressing
+less texture specific
-Resources are unlimited
+temporary memory store
+instruction count
+iterators, streams, etc.
-Non-real-time usage can be slow
+Performance may drop drastically below 640x480 10 Hz
-Improves hardware as an HLSL target
+All algorithms will at least compile instead of hitting instruction count limits
-Identical for all shader model 4.0 shaders



Tons more information there... check out the slides (19-35 are about DX Next).
 
FYI, I've uploaded the slide deck here for those who don't want to download all the extra slides.

What does that mean? Are they talking about the software or the hardware level?

Not quite sure, but here are the next three slides:

Blocked Usage Scenario
- Rendering stencil shadowed characters
+(the majority of polygons in the scene)
1.Process vertex data
++Transform, light, skin
2.Generate shadow volume
++Compute silhouette edges
++Extrude shadow polygons
++Render into stencil buffer
3.Render pixel lighting using stencil buffer
- Current Graphics hardware can’t do step 2
+Therefore steps 1. and 2. must be done on CPU
+Result is 10x decrease in character poly count
+Back to pre T&L levels, hw is wasted
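
To make that concrete, here's roughly what step 2 means when you're stuck doing it on the CPU today. Just a sketch with made-up types and names (Vec3, FindSilhouetteEdges, etc. are mine, not from the slides); the point is that it works on edges shared between pairs of triangles, which a vertex shader that only ever sees one vertex at a time can't get at.

#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 Cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
static Vec3 Sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Step 2's core problem: find the edges shared by one light-facing and one
// back-facing triangle. Those are the silhouette edges that get extruded away
// from the light to build the shadow volume.
std::vector<std::pair<unsigned, unsigned>> FindSilhouetteEdges(
    const std::vector<Vec3>& verts,
    const std::vector<unsigned>& indices,   // 3 indices per triangle
    const Vec3& lightPos)
{
    // For every undirected edge, record whether each adjacent triangle faces the light.
    std::map<std::pair<unsigned, unsigned>, std::vector<bool>> edgeFacing;
    for (std::size_t t = 0; t + 2 < indices.size(); t += 3) {
        unsigned i0 = indices[t], i1 = indices[t + 1], i2 = indices[t + 2];
        Vec3 n = Cross(Sub(verts[i1], verts[i0]), Sub(verts[i2], verts[i0]));
        bool facesLight = Dot(n, Sub(lightPos, verts[i0])) > 0.0f;
        unsigned e[3][2] = { { i0, i1 }, { i1, i2 }, { i2, i0 } };
        for (auto& ed : e) {
            std::pair<unsigned, unsigned> key =
                ed[0] < ed[1] ? std::make_pair(ed[0], ed[1]) : std::make_pair(ed[1], ed[0]);
            edgeFacing[key].push_back(facesLight);
        }
    }
    // A silhouette edge separates a lit triangle from an unlit one.
    std::vector<std::pair<unsigned, unsigned>> silhouette;
    for (const auto& kv : edgeFacing)
        if (kv.second.size() == 2 && kv.second[0] != kv.second[1])
            silhouette.push_back(kv.first);
    return silhouette;
}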


Solution
- Solution is topology modification in hw
- Enables efficient use of hw for portions of algorithm that are parallelizable
- Yet still enables key algorithms to run with parts that are not

Blocked Usage Scenario
- Small batches are not efficient
+Underutilize GPU performance potential by ~20x
+ Needed to create rich detailed scenes
++ lots of trees, rocks, grass, i.e. clutter
- Solutions
+Vertex input model will enable mesh object instancing
+Future OS integration will speed state changes due to completely new DDI
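
The instancing point is easier to see side by side. Rough sketch below -- every function name here is a placeholder, not a real D3D call -- but it shows the difference between thousands of tiny state-change-plus-draw batches and a single call that pulls per-instance data from a buffer.

#include <vector>

struct Matrix4 { float m[16]; };
struct Mesh {};   // placeholder mesh handle

// Hypothetical entry points, for illustration only -- not real D3D calls.
void SetWorldMatrix(const Matrix4&) {}
void DrawMesh(const Mesh&) {}
void DrawMeshInstanced(const Mesh&, const std::vector<Matrix4>&) {}

void DrawClutterToday(const Mesh& rock, const std::vector<Matrix4>& transforms) {
    // Thousands of tiny batches: per-object CPU/driver overhead dominates and
    // the GPU idles between them (the ~20x underutilization the slide mentions).
    for (const Matrix4& world : transforms) {
        SetWorldMatrix(world);   // state change per object
        DrawMesh(rock);          // tiny draw call per object
    }
}

void DrawClutterWithInstancing(const Mesh& rock, const std::vector<Matrix4>& transforms) {
    // One batch: the per-instance transforms live in a buffer the GPU reads
    // through the vertex input model, with no per-object CPU work.
    DrawMeshInstanced(rock, transforms);
}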

The unlimited resources (textures, instruction length, etc.) stuff is interesting too.


[edit]
Some points:
The topology stuff seems to be the PPP we've been hearing about, only taken to a whole new level. Now rendering to cube-maps can be done in 1 pass rather than 6, point sprites should finally work as they should've all along, fur generation is all in hardware, as are shadow volumes, etc. Also allows lots more HOS stuff: Catmull-Rom, Bezier, B-spline and their rational versions, along with subdivision surfaces.
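
On the HOS side, this is the sort of evaluation a topology/tessellation stage would be taking over. Quick CPU sketch of a bicubic Bezier patch (de Casteljau, with a made-up Vec3 type, nothing from the slides) -- a handful of control points in, a whole grid of vertices out, which is exactly the kind of data amplification current hardware can't do for you:

#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 Lerp(const Vec3& a, const Vec3& b, float t) {
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

// De Casteljau evaluation of a cubic Bezier curve from 4 control points.
static Vec3 Bezier3(const Vec3 p[4], float t) {
    Vec3 a = Lerp(p[0], p[1], t), b = Lerp(p[1], p[2], t), c = Lerp(p[2], p[3], t);
    Vec3 d = Lerp(a, b, t), e = Lerp(b, c, t);
    return Lerp(d, e, t);
}

// Turn a 4x4 control-point patch into an (n+1) x (n+1) grid of vertices.
std::vector<Vec3> TessellatePatch(const Vec3 cp[4][4], int n) {
    std::vector<Vec3> grid;
    grid.reserve((n + 1) * (n + 1));
    for (int i = 0; i <= n; ++i) {
        float u = float(i) / float(n);
        Vec3 col[4];
        for (int r = 0; r < 4; ++r)
            col[r] = Bezier3(cp[r], u);                          // evaluate each control row in u
        for (int j = 0; j <= n; ++j)
            grid.push_back(Bezier3(col, float(j) / float(n)));   // then across in v
    }
    return grid;
}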

Frame Buffer Access from the pixel shader is finally in (officially).
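
Which basically means programmable blending. Here's a toy CPU version of what that buys you (the Color type and the overlay math are just my example, not anything from the slides): once the shader can read the destination pixel, any blend function is fair game instead of the fixed menu of blend modes.

#include <cstddef>
#include <vector>

struct Color { float r, g, b, a; };

// "Overlay"-style blend: darkens dark destinations, brightens bright ones.
// No fixed-function blend mode expresses this, because it depends on reading dst.
static float OverlayChannel(float src, float dst) {
    return dst < 0.5f ? 2.0f * src * dst
                      : 1.0f - 2.0f * (1.0f - src) * (1.0f - dst);
}

// Stand-in for a pixel shader that can read the pixel already in the frame buffer.
Color ShadePixel(const Color& src, const Color& dst) {
    return { OverlayChannel(src.r, dst.r),
             OverlayChannel(src.g, dst.g),
             OverlayChannel(src.b, dst.b),
             1.0f };
}

void ResolvePass(std::vector<Color>& frameBuffer, const std::vector<Color>& shaded) {
    // With frame-buffer access the read-modify-write happens inside the shader
    // instead of in the fixed blend unit.
    for (std::size_t i = 0; i < frameBuffer.size() && i < shaded.size(); ++i)
        frameBuffer[i] = ShadePixel(shaded[i], frameBuffer[i]);
}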

Apparently you can now arbitrarily spill vertex shader results to a vidmem vertex buffer.

Here's another slide:
General I/O Model
Any data emitted at any stage of GPU pipeline can be read back at any other stage with no host-based conversion

Data can be emitted from any stage of pipeline as cache for later passes

Enables many more uses of GPU
+Some not related to rendering
+They can be unknown at ship
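
The way I read the "cache for later passes" bit (and I could be wrong): expensive per-vertex work gets emitted once and every later pass reads the result instead of redoing it. A CPU-style sketch with made-up types, going back to the stencil shadow example:

#include <vector>

struct Vertex      { float pos[3]; float weights[4]; int bones[4]; };
struct SkinnedVert { float pos[3]; };

// Stand-in for the expensive per-vertex work of step 1 (transform/light/skin).
SkinnedVert SkinVertex(const Vertex& v) {
    return { { v.pos[0], v.pos[1], v.pos[2] } };
}

void RenderFrame(const std::vector<Vertex>& mesh) {
    // Pass 1: emit the skinned vertices into a buffer that stays on the GPU.
    std::vector<SkinnedVert> skinned;
    skinned.reserve(mesh.size());
    for (const Vertex& v : mesh)
        skinned.push_back(SkinVertex(v));

    // Pass 2 (shadow volumes) and pass 3 (lighting) would both read the cached
    // skinned positions from here -- nobody pays the skinning cost twice, and
    // nothing round-trips through the host.
    (void)skinned;
}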

Very cool stuff.
 
Question is, how much complexity does all this freedom add to the chip itself? It takes YEARS to design and validate a CPU, and this seems to be pretty much a CPU, at least on the scalar, in-order level.

Would put a pretty large strain on the GPU makers, both on the hardware and the software side, methinks. Probably means longer cycles still and more variations of the same architecture. In all, maybe a good thing. :) Still no DX9 titles to speak of, 1.25 years and counting after the R300's intro...


*G*
 
Adding the freedom is easy; making it efficiently usable is the hard part.

On the software angle ... I say they should cut out the middlemen and code their own middleware engine and maybe a game for each new generation ;)
 
What else would they name the next shader version? 3.5?


In case it wasn't clear, this isn't a DX9.1 or SDK update or anything... it's the next major revision (i.e. the one that's coming in two years or so).
 
Ahhh, this is the same presentation that they were giving at ATI's Mojo Days a while back as well.
 
Ilfirin said:
What else would they name the next shader version? 3.5?


In case it wasn't clear, this isn't a DX9.1 or SDK update or anything... it's the next major revision (i.e. the one that's coming in two years or so).

I know, it's just the first time I've seen it written down anywhere, that's neat :)
 
Finally, float RT on NVidia hardware.
New Features in 2003
D3DFMT_A16B16G16R16F - conditional
New texture format with conditions
Same limitations as NONPOW2CONDITIONAL
No mip mapping, no wrapping, no tiling, no cube maps
 
MfA said:
Integer programming and a general I/O model ... that is a pleasant surprise.
Could you elaborate? The presentation referred to integer programming in a few places, and I wasn't sure why I was supposed to be impressed.
 
It means data structures, arrays, etc. When's the last time you've seen a machine whose memory addresses are floating point numbers?

char *a

a++ points to the next byte, but what does

a += 0.1f do? :)

Float coordinates are fine for looking up lossy texture data that can safely be interpolated, but you don't want bilinear filtering when you are trying to traverse a linked list struct.
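
Something like this, say (made-up struct, obviously) -- every step of the traversal depends on an exact integer address:

#include <cstdio>

// A node address is an exact integer; there is no such thing as "node 2.5".
struct Node {
    int   value;
    Node* next;
};

int main() {
    // Build a 3-node list: 10 -> 20 -> 30.
    Node c = { 30, nullptr };
    Node b = { 20, &c };
    Node a = { 10, &b };

    // Integer-addressed traversal: each step jumps to an exact location.
    for (Node* n = &a; n != nullptr; n = n->next)
        std::printf("%d\n", n->value);

    // "Interpolating" between two node addresses, the way a texture fetch blends
    // neighbouring texels, would hand back a pointer into the middle of nowhere --
    // which is why walking structures like this needs real integer instructions
    // and addressing in the shader core.
    return 0;
}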
 
Binary logic ops might also be useful.

Nice to see DX head in the "unlimited resources" direction.
 
Yeah, it's about time we got some virtualized texture memory.

Though I do get the impression from the presentation that DX Next might turn into one of those things where you (a software developer) might have great performance one day, make some minor change, and suddenly have horrible performance the next day. Just have to wait and see, I guess.
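
To put the worry another way: demand-paged video memory means there's a working set, and working sets have cliffs. Toy sketch of the idea (all the names and sizes are made up) -- pages stream into a fixed pool on demand, and the moment a scene grows past the pool size every frame starts thrashing pages over the bus:

#include <cstddef>
#include <list>
#include <unordered_map>

class PagePool {
public:
    explicit PagePool(std::size_t capacityPages) : capacity_(capacityPages) {}

    // Returns true on a residency hit, false when the page had to be streamed
    // in over the bus (the expensive case).
    bool Touch(std::size_t pageId) {
        auto it = index_.find(pageId);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);   // mark most recently used
            return true;
        }
        if (!lru_.empty() && lru_.size() >= capacity_) {   // evict the oldest page
            index_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(pageId);
        index_[pageId] = lru_.begin();
        return false;                                      // simulated upload
    }

private:
    std::size_t capacity_;
    std::list<std::size_t> lru_;                           // front = most recently used
    std::unordered_map<std::size_t, std::list<std::size_t>::iterator> index_;
};

// As long as the frame's working set of pages fits in capacity_, Touch() is
// nearly always a hit; add a bit more unique texture and it's a miss storm.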
 
I think it's entirely possible that "DX Next" will be too flexible. I don't think it really is all that desirable to make a GPU into a fully-generalized processor. By doing so, you inhibit the GPU's ability to make use of massive parallelism, and you add the potential for far more pipeline stalls.
 
DX NEXT? What happened to calling it 10?

Chalnoth said:
I think it's entirely possible that "DX Next" will be too flexible. I don't think it really is all that desirable to make a GPU into a fully-generalized processor. By doing so, you inhibit the GPU's ability to make use of massive parallelism, and you add the potential for far more pipeline stalls.

From the specs I've read, it's not much more flexible than DX9.
 
Chalnoth said:
I think it's entirely possible that "DX Next" will be too flexible. I don't think it really is all that desirable to make a GPU into a fully-generalized processor. By doing so, you inhibit the GPU's ability to make use of massive parallelism, and you add the potential for far more pipeline stalls.

It should still excel at graphics applications; it's just when you start doing CPU-type work on the GPU that you'll get sub-standard performance. But I see no reason why dot products, adds, multiplies, etc. (or pixel shading in general for that matter, which is where the bottleneck is going to be anyway) would be any less parallelized by the added flexibility.

But while there's tons more flexibility on the software side of things (i.e. in what you'll be able to do easily as a software developer), the hardware doesn't need to be as different from the current generation as you might think. I mean, with vertex and pixel shaders (and hopefully the topology stuff) using the same hardware, it shouldn't be too much of a leap to allow spilling the results from any stage to a block of memory (just as you would a render target now). Allowing that data to be read back at any stage shouldn't be too much of a problem either, since the location and size of that data will be known when the shader is loaded, and you already have the texturing functionality there (from the ps units).

Most of the added fundamental flexibility just seems to be a result of everything (at least ps and vs) using the same hardware. As long as that central bit of hardware that everything's pulling from is optimal, and the software isn't horrible, you should be able to get plenty of performance.

So, DX9 + merge(vs, ps) + topology processor + what sounds like a rather large API change + virtualized video memory = DX Next (or DX10 if you want). Or maybe I'm oversimplifying things... writing posts at 4:26 AM tends to have that effect :)

[edit] Tried to clarify that I'm talking about fundamental differences here, not incremental improvements.

[edit2]OT: Is there any way we could have the posting interface made slightly more high-res friendly? I have 2 monitors at 1600x1200 and only have a tiny little box to type in. Makes it difficult to spot typos and such.
 
Ilfirin said:
[edit2]OT: Is there any way we could have the posting interface made slightly more high-res friendly? I have 2 monitors at 1600x1200 and only have a tiny little box to type in. Makes it difficult to spot typos and such.
Most browsers have a text zoom feature. In Mozilla (the greatest browser on Earth!), it's under the View menu.
 