NVIDIA GF100 & Friends speculation

Again you're equating the absence of obstacles with explicit support. Just because the hardware guys were able to work around API shortcomings doesn't mean the API is all hunky-dory.
There's nothing for the API to support, it's just doing an operation in a faster way, so why should the API care? It doesn't block it, and it doesn't have anything to support, i.e. the API is as good as it gets... In fact I suspect you will find that even the hardware command FIFO doesn't know about it and that the same Draw commands are used, because guess what?! Even to the command FIFO hardware it's the same operation, "Draw stuff".


D3D has no support for raytracing acceleration structures for example. Are you saying that won't take a change to the API to introduce?
C has no explicit raytracing acceleration structures, does it need to change? OCL/CS/CUDA already support raytracing acceleration structures, it's the point of 'general' computing, you don't have to specify every single thing.
Unless we see fixed function raytracing hardware then no we won't see any change to the API for a specific algorithm. We might see updates to the general compute engine to allow certain operations that assist raytracing (better pointer chasing for example) but nothing specific for at least this spin of the wheel.
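To make the "general compute already covers it" point concrete, here is a minimal sketch of an acceleration structure written in plain C++ with no API help at all. It is my own illustration under loud assumptions: a toy one-dimensional "BVH" over intervals, with a hypothetical `Node` layout and a hypothetical `query` function. Nothing here is a D3D, OpenCL or CUDA format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node of a toy 1D bounding hierarchy (illustrative layout only).
struct Node {
    float lo, hi;         // bounds of everything beneath this node
    int32_t left, right;  // child indices into the node array, -1 marks a leaf
    int32_t prim;         // primitive id for leaves, -1 for inner nodes
};

// Collect the ids of all leaves whose interval contains x.
// Uses an explicit stack instead of recursion, the usual shape for
// GPU-style traversal. The root is assumed to sit at index 0.
std::vector<int32_t> query(const std::vector<Node>& nodes, float x) {
    std::vector<int32_t> hits;
    if (nodes.empty()) return hits;
    std::vector<int32_t> stack{0};
    while (!stack.empty()) {
        int32_t i = stack.back();
        stack.pop_back();
        assert(i >= 0 && static_cast<std::size_t>(i) < nodes.size());
        const Node& n = nodes[i];
        if (x < n.lo || x > n.hi) continue;  // prune the whole subtree
        if (n.left < 0) {                    // leaf: record the primitive
            hits.push_back(n.prim);
            continue;
        }
        stack.push_back(n.right);
        stack.push_back(n.left);
    }
    return hits;
}
```

The structure is just a flat array addressed by index, which is why it maps onto indexed buffers without the API having to know what a BVH is.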


Once again, the absence of an explicit limitation doesn't imply explicit support. That would mean everything under the sun is supported unless they explicitly state that it's not the case :)
Something only needs explicit support if it's not accessible with the existing API. Many things are added or changed that don't require any API changes to support; as such, yes, they are supported. Only if they need extra features do you need to change the API.
A perfect example of this is the DX11 texture formats that Fermi will add over the last NV chip. Are they supported by the vertex shader? Answer: yes, but with no explicit support, whereas a generation (or two) ago each vertex texture format required explicit support. So does that mean that D3D is out of date and needs updating cos I automagically got the new texture formats in the vertex engine?
 
A feature of existing APIs and host-side, not GPU side.

Ok, though I haven't seen it anywhere else myself.

Does Fermi do anything more than overlapping the execution of successive kernels?

My understanding based on the language and the diagrams is that 4 kernels can be running on the chip at any given time, allocated across the SMs. Rys says that multiple can run on a given SM but I don't know about that.

How does that relate to OpenCL 1.0/1.1?

Don't know, thought we were talking about DX :)

You're confusing a performance optimisation in hardware for the capability to support an instruction in the language. The former is not a "feature" that goes beyond D3D11.

Fair enough, I'll concede that.

Just exposing compute shader has to be enough here. Microsoft can't afford to standardise how developers do RT inside of D3D and limit flexibility, since there's no upper bound to real-time RT R&D yet, and no really good fit to common hardware either.

Hmmm, maybe I'm not clear on CS capabilities. Does CS give you the ability to read/write from arbitrary data structures with pointer support? I thought it was limited to indexed buffers and byte arrays?
 

As has been discussed previously, chopping Fermi in half should be relatively easy. It should also help Nvidia get decent yields for mainstream parts that perform well with high clocks. Still, I would wait until 28nm to buy any Fermi-based chip. Lower power consumption is absolutely necessary for me as I don't fancy replacing my PSU...

Anyway this all assumes that Nvidia are even bothering with 40nm mainstream chips, it wouldn't surprise me if they jump straight to 28nm and hard launch in Q4 with mainstream and top end refresh in Q1.
 
Hmmm, maybe I'm not clear on CS capabilities. Does CS give you the ability to read/write from arbitrary data structures with pointer support? I thought it was limited to indexed buffers and byte arrays?
No (device side pointers in CS), but that's a bit different than asking for DX to define the hardware-accelerated structures and algorithms, which is what you mooted first. One's a bit more low-level and generally enabling than the other.
 
Btw, isn't that the same operation we were discussing a while back in the prefix sum patent?
ballot is what I think you're referring back to. These __syncthreads functions are refinements related to the behaviour of a synchronisation for an entire work group. Because a work group can be large, e.g. 512 work items, it's not possible to manipulate a mask as the contents of a simple register.
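A rough CPU-side sketch of that distinction, in case it helps. This is my own emulation, not device code: `ballot32`, `GroupVote` and `group_vote` are hypothetical names, and I'm assuming a 32-lane warp for the ballot and modelling the group-wide variants on the semantics of `__syncthreads_count`, `__syncthreads_and` and `__syncthreads_or`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Warp-level ballot, emulated: one bit per lane, so up to 32 predicates
// pack into a single 32-bit value that fits in one register.
uint32_t ballot32(const std::vector<bool>& pred) {
    assert(pred.size() <= 32);
    uint32_t mask = 0;
    for (std::size_t lane = 0; lane < pred.size(); ++lane) {
        if (pred[lane]) mask |= (uint32_t{1} << lane);
    }
    return mask;
}

// Group-wide vote, emulated: a 512-item group would need a 512-bit mask,
// so each work item gets back a reduced scalar instead of the mask itself.
struct GroupVote {
    int count;  // number of items whose predicate was true
    bool all;   // true if every predicate was true
    bool any;   // true if at least one predicate was true
};

GroupVote group_vote(const std::vector<bool>& pred) {
    GroupVote v{0, true, false};
    for (bool p : pred) {
        v.count += p ? 1 : 0;
        v.all = v.all && p;
        v.any = v.any || p;
    }
    return v;
}
```

So the warp version hands you the raw mask, while the group version can only sensibly hand every work item a count or a boolean.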

Which reminds me, there's an intriguing pair of instructions in Cypress: GROUP_SEQ_BEGIN and GROUP_SEQ_END, which run "all the threads in a thread group" (I think that means "all work items in a work group") sequentially, i.e. explicitly serially. This seems to be an intra-ALU clause instruction. I guess that's one way of doing floating point atomics :oops:

Jawed
 
History tells us it's a reasonable assumption.
And this is supported by the inclusion of N-Patch support in R200, 3Dc and MSAA+HDR in R5xx, tessellation and Fetch-4 in R6xx, RV7xx, compute capabilities in R5xx, R6xx, RV7xx, the compute capabilities beyond CS in Evergreen (to name but a few, easy to point at things off the top of my head)?
 
And this is supported by the inclusion of N-Patch support in R200, 3Dc and MSAA+HDR in R5xx, tessellation and Fetch-4 in R6xx, RV7xx, compute capabilities in R5xx, R6xx, RV7xx, the compute capabilities beyond CS in Evergreen (to name but a few, easy to point at things off the top of my head)?
Compute in R3xx too! Mike will be really upset you forgot his F-Buffer :LOL:
 
And this is supported by the inclusion of N-Patch support in R200, 3Dc and MSAA+HDR in R5xx, tessellation and Fetch-4 in R6xx, RV7xx, compute capabilities in R5xx, R6xx, RV7xx, the compute capabilities beyond CS in Evergreen (to name but a few, easy to point at things off the top of my head)?

Don't forget EyeFinity ;)
 
Don't forget EyeFinity ;)
Well, given that we are talking about DX, it is a little aside from the 3D/Compute core, but it can certainly be viewed as an extension to DX, because the standard model suggests rendering to a single panel (unless explicitly told to do otherwise) and we extend that by enabling DX/Windows to "see" multiple panels as a single panel.
 
Hmmm, maybe I'm not clear on CS capabilities. Does CS give you the ability to read/write from arbitrary data structures with pointer support? I thought it was limited to indexed buffers and byte arrays?
What's the difference hardware-wise? Isn't it up to a compiler to convert the former to the latter?
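For what it's worth, the conversion being asked about looks roughly like this when done by hand. This is only a sketch of the idea, not a claim about what any actual shader compiler does; `PtrNode`, `IdxNode`, `flatten` and `sum_indexed` are hypothetical names, and it assumes a simple acyclic singly linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pointer-linked node, as you would write it in host code.
struct PtrNode {
    int value;
    PtrNode* next;
};

// The same node with its link stored as an index into a flat buffer.
struct IdxNode {
    int value;
    int32_t next;  // index of the next node, -1 marks the end
};

// Copy a pointer-linked list into an indexed buffer, rewriting each link.
// Nodes are laid out in list order, so "next" is simply the following slot.
std::vector<IdxNode> flatten(const PtrNode* head) {
    std::vector<IdxNode> buf;
    for (const PtrNode* p = head; p != nullptr; p = p->next) {
        int32_t next = p->next ? static_cast<int32_t>(buf.size()) + 1 : -1;
        buf.push_back({p->value, next});
    }
    return buf;
}

// Walk the indexed form using nothing but buffer reads.
int sum_indexed(const std::vector<IdxNode>& buf) {
    int sum = 0;
    for (int32_t i = buf.empty() ? -1 : 0; i >= 0; i = buf[i].next) {
        assert(static_cast<std::size_t>(i) < buf.size());
        sum += buf[i].value;
    }
    return sum;
}
```

The traversal in `sum_indexed` never touches a pointer, which is the sense in which the two forms are equivalent as long as everything lives in one buffer.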
 
Before anyone asks: yes, this is a serious store. One of the better ones in Sweden.

The page can't be reached through the shopping lists.
 