NVIDIA GF100 & Friends speculation

DeanoC · Feb 26, 2010

trinibwoy said:
Again you're equating the absence of obstacles with explicit support. Just because the hardware guys were able to work around API shortcomings, it doesn't mean the API is all hunky-dory.

There's nothing for the API to support; it's just doing an operation in a faster way, so why should the API care? It doesn't block it, and it doesn't have anything to support, i.e. the API is as good as it gets... In fact I suspect you will find that the hardware command FIFO doesn't even know about it and that the same Draw commands are used, because guess what?! Even to the command FIFO hardware it's the same operation: "Draw stuff".

trinibwoy said:
D3D has no support for raytracing acceleration structures for example. Are you saying that won't take a change to the API to introduce?

C has no explicit raytracing acceleration structures; does it need to change? OCL/CS/CUDA already support raytracing acceleration structures. That's the point of 'general' computing: you don't have to specify every single thing.
Unless we see fixed-function raytracing hardware, then no, we won't see any change to the API for a specific algorithm. We might see updates to the general compute engine to allow certain operations that assist raytracing (better pointer chasing, for example) but nothing specific for at least this spin of the wheel.
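A minimal sketch of the "it's just general compute" argument, written as plain host-side C++ rather than OCL/CS/CUDA. It assumes a tiny bounding volume hierarchy stored as a flat array with index links; the names `Node`, `hitsBox`, `traverse` and `exampleScene` are made up for illustration and are not from any API. The structure is ordinary data and the traversal is ordinary code, which is the sense in which nothing in the API has to know about it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct Ray { float origin[3]; float dir[3]; };

// One node of a flattened BVH. Children are array indices, not pointers.
struct Node {
    float lo[3], hi[3];      // axis-aligned bounding box
    std::int32_t left;       // index of left child, -1 for a leaf
    std::int32_t right;      // index of right child, -1 for a leaf
    std::int32_t primitive;  // payload id for leaves, -1 for inner nodes
};

// Slab test: does the ray (for t >= 0) intersect the node's box?
bool hitsBox(const Ray& r, const Node& n) {
    float tmin = 0.0f, tmax = 1e30f;
    for (int a = 0; a < 3; ++a) {
        if (r.dir[a] == 0.0f) {
            // Ray is parallel to this slab: origin must lie inside it.
            if (r.origin[a] < n.lo[a] || r.origin[a] > n.hi[a]) return false;
            continue;
        }
        float t0 = (n.lo[a] - r.origin[a]) / r.dir[a];
        float t1 = (n.hi[a] - r.origin[a]) / r.dir[a];
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmin > tmax) return false;
    }
    return true;
}

// Iterative traversal with a fixed-size stack of node indices.
// Returns the primitive ids of every leaf whose box the ray touches.
std::vector<std::int32_t> traverse(const std::vector<Node>& nodes, const Ray& r) {
    std::vector<std::int32_t> hits;
    if (nodes.empty()) return hits;
    std::int32_t stack[64];
    int top = 0;
    stack[top++] = 0;  // root is assumed to be node 0
    while (top > 0) {
        const Node& n = nodes[stack[--top]];
        if (!hitsBox(r, n)) continue;
        if (n.left < 0) { hits.push_back(n.primitive); continue; }
        assert(top + 2 <= 64);
        stack[top++] = n.right;
        stack[top++] = n.left;  // left child is visited first
    }
    return hits;
}

// Hypothetical three-node scene: a root box with two leaf boxes.
std::vector<Node> exampleScene() {
    return {
        Node{{0.f, 0.f, 0.f}, {4.f, 1.f, 1.f}, 1, 2, -1},
        Node{{0.f, 0.f, 0.f}, {1.f, 1.f, 1.f}, -1, -1, 10},
        Node{{3.f, 0.f, 0.f}, {4.f, 1.f, 1.f}, -1, -1, 20},
    };
}
```

A ray fired along +z through the left box reports only primitive 10; a ray fired along +x through both boxes reports 10 and then 20. The same layout would map onto an indexed buffer in a compute kernel, though that port is not shown here.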

trinibwoy said:
Once again, the absence of an explicit limitation doesn't imply explicit support. That would mean everything under the sun is supported unless they explicitly state that it's not the case.

Something only needs explicit support if it's not accessible with the existing API. Many things are added or changed without requiring any API changes, and as such, yes, they are supported; you only need to change the API if they need extra features.
A perfect example of this is the DX11 texture formats that Fermi will add over the last NV chip. Are they supported by the vertex shader? Answer: yes, but with no explicit support, whereas a generation (or two) ago each vertex texture format required explicit support. So does that mean that D3D is out of date and needs updating cos I automagically got the new texture formats in the vertex engine?

trinibwoy · Feb 26, 2010

Jawed said:
A feature of existing APIs and host-side, not GPU side.

Ok, though I haven't seen it anywhere else myself.

Does Fermi do anything more than overlapping the execution of successive kernels?

My understanding, based on the language and the diagrams, is that 4 kernels can be running on the chip at any given time, allocated across the SMs. Rys says that multiple kernels can run on a given SM but I don't know about that.

How does that relate to OpenCL 1.0/1.1?

Don't know, thought we were talking about DX

You're confusing a performance optimisation in hardware for the capability to support an instruction in the language. The former is not a "feature" that goes beyond D3D11.

Fair enough, I'll concede that.

Rys said:
Just exposing compute shader has to be enough here. Microsoft can't afford to standardise how developers do RT inside of D3D and limit flexibility, since there's no upper bound to real-time RT R&D yet, and no really good fits to common hardware either really.

Hmmm, maybe I'm not clear on CS capabilities. Does CS give you the ability to read/write from arbitrary data structures with pointer support? I thought it was limited to indexed buffers and byte arrays?

NathansFortune · Feb 26, 2010

Silus said:
http://www.fudzilla.com/content/view/17850/1/

As has been discussed previously, chopping Fermi in half should be relatively easy. It should also help Nvidia get decent yields for mainstream parts that perform well with high clocks. Still, I would wait until 28nm to buy any Fermi-based chip; lower power consumption is absolutely necessary for me as I don't fancy replacing my PSU...

Anyway, this all assumes that Nvidia are even bothering with 40nm mainstream chips. It wouldn't surprise me if they jump straight to 28nm and hard launch in Q4 with mainstream, and a top-end refresh in Q1.

Rys · Feb 26, 2010

trinibwoy said:
Hmmm, maybe I'm not clear on CS capabilities. Does CS give you the ability to read/write from arbitrary data structures with pointer support? I thought it was limited to indexed buffers and byte arrays?

No (there are no device-side pointers in CS), but that's a bit different from asking for DX to define the hardware-accelerated structures and algorithms, which is what you mooted first. One's a bit more low-level and generally enabling than the other.

silent_guy · Feb 26, 2010

entity279 said:
Also at-speed test overhead?

Pretty much zero if you already have scan, since the common techniques simply use the existing scan chains for this.

Jawed · Feb 26, 2010

trinibwoy said:
Btw, isn't that the same operation we were discussing a while back in the prefix sum patent?

ballot is what I think you're referring back to. These __syncthreads functions are refinements related to the behaviour of a synchronisation for an entire work group. Because a work group can be large, e.g. 512 work items, it's not possible to manipulate a mask as the contents of a simple register.
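To make the register-size point concrete, here is a host-side C++ emulation, not device code. It assumes a 32-wide warp and mimics what CUDA's `__ballot` returns for one warp, next to the scalar results that the group-wide `__syncthreads_count`, `__syncthreads_and` and `__syncthreads_or` variants return. The function names `ballot`, `groupVote` and `maskWords` are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kWarpSize = 32;  // assumed warp width

// Warp-level vote: one predicate bit per lane, so the whole
// result fits in a single 32-bit register.
std::uint32_t ballot(const std::vector<bool>& pred, std::size_t warp) {
    std::uint32_t mask = 0;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        const std::size_t idx = warp * kWarpSize + lane;
        if (idx < pred.size() && pred[idx]) mask |= (1u << lane);
    }
    return mask;
}

// How many 32-bit words a full per-item mask would need for a group.
std::size_t maskWords(std::size_t groupSize) {
    return (groupSize + kWarpSize - 1) / kWarpSize;
}

// Group-level vote: the predicate is reduced to scalars instead of
// being handed back as a mask, because the mask would not fit in
// one register for a large group.
struct GroupVote {
    int count;  // number of items whose predicate is true
    bool all;   // true if every predicate is true
    bool any;   // true if at least one predicate is true
};

GroupVote groupVote(const std::vector<bool>& pred) {
    GroupVote v{0, true, false};
    for (bool p : pred) {
        v.count += p ? 1 : 0;
        v.all = v.all && p;
        v.any = v.any || p;
    }
    return v;
}
```

For a 512-item group a full mask would take `maskWords(512)`, i.e. 16 registers' worth of bits, whereas the count/and/or forms each return a single value.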

Which reminds me, there's an intriguing pair of instructions in Cypress: GROUP_SEQ_BEGIN and GROUP_SEQ_END, which run "all the threads in a thread group" (I think that means "all work items in a work group") sequentially, i.e. explicitly serially. This seems to be an intra-ALU clause instruction. I guess that's one way of doing floating point atomics.

Jawed

Jawed · Feb 26, 2010

Rys said:
It'll run multiple kernels in parallel in the same SM, as far as I can tell.

Where's that explicitly stated?

Jawed

3dilettante · Feb 26, 2010

Per RWT, a Fermi chip globally can support 16 kernels, 1 per SM.

Jawed · Feb 26, 2010

trinibwoy said:
Don't know, thought we were talking about DX

It's a mistake to be so restrictive since OpenCL is in the mix too.

Jawed

Dave Baumann · Feb 26, 2010

CouldntResist said:
History tells it's a reasonable assumption.

And this is supported by the inclusion of N-Patch support in R200, 3Dc and MSAA+HDR in R5xx, tessellation and Fetch-4 in R6xx, RV7xx, compute capabilities in R5xx, R6xx, RV7xx, the compute capabilities beyond CS in Evergreen (to name but a few, easy to point at things off the top of my head)?

Rys · Feb 26, 2010

Dave Baumann said:
And this is supported by the inclusion of N-Patch support in R200, 3Dc and MSAA+HDR in R5xx, tessellation and Fetch-4 in R6xx, RV7xx, compute capabilities in R5xx, R6xx, RV7xx, the compute capabilities beyond CS in Evergreen (to name but a few, easy to point at things off the top of my head)?

Compute in R3xx too! Mike will be really upset you forgot his F-Buffer

Malo · Feb 26, 2010

Dave Baumann said:
And this is supported by the inclusion of N-Patch support in R200, 3Dc and MSAA+HDR in R5xx, tessellation and Fetch-4 in R6xx, RV7xx, compute capabilities in R5xx, R6xx, RV7xx, the compute capabilities beyond CS in Evergreen (to name but a few, easy to point at things off the top of my head)?

Don't forget EyeFinity

Dave Baumann · Feb 26, 2010

Malo said:
Don't forget EyeFinity

Well, given that we are talking about DX, it is a little aside from the 3D/Compute core, but it can certainly be viewed as an extension to DX because the standard model suggests rendering to a single panel (unless explicitly told to do otherwise) and we extend by enabling DX/Windows to "see" multiple panels as a single panel.

Dave Baumann · Feb 26, 2010

Rys said:
Compute in R3xx too! Mike will be really upset you forgot his F-Buffer

Bad Dave! Mike can slap my wrist later!

rpg.314 · Feb 26, 2010

http://www.inet.se/recensioner/5408811/xfx-geforce-fermin-4gb

Original source: http://www.semiaccurate.com/2010/02/26/sweden-gets-world-exclusive-geforce-fermin-card/

The numbers there are making my head spin....

666 shaders, 666bit bus, 4GB ram, 666MHz clock

CRoland · Feb 26, 2010

trinibwoy said:
Hmmm, maybe I'm not clear on CS capabilities. Does CS give you the ability to read/write from arbitrary data structures with pointer support? I thought it was limited to indexed buffers and byte arrays?

What's the difference hardware-wise? Isn't it up to a compiler to convert the former to the latter?
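One way to picture the conversion being asked about, done by hand here rather than by any compiler: a pointer-linked tree is copied into a contiguous buffer where each link becomes an integer index. This is plain C++ with invented names (`PtrNode`, `FlatNode`, `flatten`, `sumFlat`, `makeNode`), and it makes no claim about what a CS or OpenCL compiler actually does.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Pointer-linked form: what you would write with "arbitrary" pointers.
struct PtrNode {
    int value;
    std::unique_ptr<PtrNode> left, right;
};

// Index-linked form: suitable for an indexed buffer. -1 means "no child".
struct FlatNode {
    int value;
    std::int32_t left;
    std::int32_t right;
};

// Copy the tree into `out` in pre-order and return this node's index.
std::int32_t flatten(const PtrNode* n, std::vector<FlatNode>& out) {
    if (!n) return -1;
    const std::int32_t self = static_cast<std::int32_t>(out.size());
    out.push_back({n->value, -1, -1});
    // Re-index `out` after each recursive call: push_back may reallocate,
    // so no reference into the vector is held across the recursion.
    const std::int32_t l = flatten(n->left.get(), out);
    const std::int32_t r = flatten(n->right.get(), out);
    out[self].left = l;
    out[self].right = r;
    return self;
}

// Walk the flat buffer using indices only, with an explicit stack.
int sumFlat(const std::vector<FlatNode>& buf, std::int32_t root) {
    int total = 0;
    std::vector<std::int32_t> stack;
    if (root >= 0) stack.push_back(root);
    while (!stack.empty()) {
        const std::int32_t i = stack.back();
        stack.pop_back();
        total += buf[i].value;
        if (buf[i].left >= 0) stack.push_back(buf[i].left);
        if (buf[i].right >= 0) stack.push_back(buf[i].right);
    }
    return total;
}

// Small helper for building example trees.
std::unique_ptr<PtrNode> makeNode(int v,
                                  std::unique_ptr<PtrNode> l = nullptr,
                                  std::unique_ptr<PtrNode> r = nullptr) {
    return std::unique_ptr<PtrNode>(new PtrNode{v, std::move(l), std::move(r)});
}
```

The index form carries the same links as the pointer form within a single buffer. What it does not give you is a link that crosses into some other allocation, which is closer to what a real device-side pointer would allow.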

neliz · Feb 26, 2010

rpg.314 said:
http://www.inet.se/recensioner/5408811/xfx-geforce-fermin-4gb

Original source: http://www.semiaccurate.com/2010/02/26/sweden-gets-world-exclusive-geforce-fermin-card/

The numbers there are making my head spin....

The source is the Swedish site, right?

Anyway, if anyone believes this, they need to start getting ready for their frequent starfairy visits.

mhouston · Feb 26, 2010

Dave Baumann said:
Bad Dave! Mike can slap my wrist later!

;-)

tannat · Feb 26, 2010

Before anyone asks: yes, this is a serious store. One of the better ones in Sweden.

The page can't be reached through the shopping lists.

CRoland · Feb 26, 2010

tannat said:
Before anyone asks: yes, this is a serious store. One of the better ones in Sweden.

The page can't be reached through the shopping lists.

But it was a "spot the fake" competition...
