Explanation of nv30 screens

Complete agreement here.

There are lots of algorithms. Some run well on CPUs. Some run well on GPUs. Some run well on both, and some don't run well on either.

For games, I don't see the CPU quite becoming 'a glorified joystick processor' (as Jeff Minter described the 68000 CPU on the Atari Jaguar many years ago) in PC applications - there's plenty of AI and physics for it to do (and the GPU would be totally useless at the former). And, as Chalnoth said, to tell the GPU what to render.

In professional graphics, GPUs are finally starting to look interesting to the 'serious' people (universities, defence, Hollywood, etc.). And rightly so; GPUs are approaching (may already be at?) teraflop levels of performance, and a few hundred high-power GPUs might be able to seriously compete with a few thousand Linux boxes pretty soon... or alternatively, you can run a few thousand high-power GPUs and throw on some even more complex algorithms!
 
Saying that a GPU is Turing-complete by multipass is a paradox, since it's the CPU that handles the multipass. And it's IMHO not reasonable to do any feedback from per pixel operations to the CPU to control how the multipass is done.

So of the current and soon-to-come GPUs (including the NV30), the only one that has any chance of being Turing-complete is the P10.

That said, I don't think we actually need a Turing-complete GPU. It's close enough as it is.
 
The images were on page three. They were the same as has been seen elsewhere, and they were probably removed because they were slashdotted - not because of doubts of origin.

Regards / ushac
 
Basic said:
Saying that a GPU is Turing-complete by multipass is a paradox, since it's the CPU that handles the multipass. And it's IMHO not reasonable to do any feedback from per pixel operations to the CPU to control how the multipass is done.

So of the current and soon-to-come GPUs (including the NV30), the only one that has any chance of being Turing-complete is the P10.

That said, I don't think we actually need a Turing-complete GPU. It's close enough as it is.


Well, if you want to nit-pick, no physically realizable computer can be equivalent to a Turing machine, because all physically realizable computers have finite state. The laws of physics (specifically, the Bekenstein bound) place an absolute limit on the number of bits that can be stored in a particular configuration in a region of space.

But for all practical purposes, PCs are Turing equivalent, because in practice the resource limits are practically infinite.

Saying that the GPU isn't Turing equivalent because it needs the CPU to concatenate passes is like saying that a CPU isn't Turing equivalent because it needs an OS/MMU to page resource-limited memory to/from disk, the same way the GPU needs pixel shaders "paged out" between passes.

For all practical purposes, GPUs can now calculate anything that is Turing computable. The CPU is needed for rendering loop/state swap, but it's a trivial functionality for the CPU to be doing. And there is still the question as to whether the NV30 has a per-primitive/per-scene processor capable of driving the rendering loop.

DX9 is just a resource-limited Turing machine, the same way a CPU is a model of a resource-limited Turing machine. Once the limits are raised even further (hundreds of thousands of instructions), it will effectively be general purpose.
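To make the "paging" analogy concrete, here's a minimal sketch (all names hypothetical, Python standing in for the driver/GPU split): a long instruction stream is split into fixed-size "passes", and the CPU only sequences the passes, never inspecting the data, the way a driver swaps pixel-shader programs between passes.

```python
# Minimal sketch (hypothetical names): a "GPU" that can only run a
# fixed number of instructions per pass, with the CPU concatenating
# passes -- it sequences programs but never inspects the data.

MAX_INSTRUCTIONS_PER_PASS = 4  # stand-in for a shader length limit

def run_pass(chunk, state):
    """Execute one bounded 'shader' over the whole buffer."""
    for op in chunk:
        state = [op(x) for x in state]
    return state

def run_multipass(program, state):
    """CPU-side loop: swap in one program chunk per pass."""
    for i in range(0, len(program), MAX_INSTRUCTIONS_PER_PASS):
        state = run_pass(program[i:i + MAX_INSTRUCTIONS_PER_PASS], state)
    return state

# A 6-instruction program run on hardware limited to 4 per pass:
program = [lambda x: x + 1] * 5 + [lambda x: x * 2]
print(run_multipass(program, [0, 10]))  # [10, 30]
```

The point of the sketch is that the per-pass instruction limit doesn't restrict what the concatenated program can compute, only how many passes it takes.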

Now, that doesn't equate to being efficient at executing all algorithms.
 
Today Nvidia's CEO stated in a conference call that the NV30 has not even taped out yet.

So how the *beep* were these rendered on an NV30??

While I never had any doubt those were indeed rendered on NV30, I still don't see where they got this information. Was it confirmed by NVIDIA, or what?

Looks to me like you and everyone else should have had some doubts.

At what point do sites actually represent the truth... I am beginning to wonder. This makes Sharky's entire article look really, really foolish. Anyone can paint a picture and say *this is what we want to be able to do*... but that is NOT the message that has been portrayed... now is it.

I wonder if any of this will get retracted. I bet not.
 
"Paradox" was maybe not the right word; "not a good description" might be better. But IMO I still have a point.

I agree that it's reasonable to implicitly mean "Turing complete for reasonably sized programs". If a program is "reasonably sized" (i.e., fits into memory together with its variables), then it's possible to run it on a PC without external intervention.

But PS.2.0 doesn't have dynamic data flow, and most importantly in this respect, no loops. IMO the number of rounds in a loop shouldn't be counted when talking about the size of a program. So a single PS.2.0 pass shouldn't be considered Turing complete. (I think we agree on that?)

Enter multipass.
Adding an unlimited, but not infinite, number of passes to compensate for the lack of dynamic data flow is not a problem for me - "Turing complete by multipass" is still a fitting description.
But now comes the problem. Consider this program:
Code:
while (!Ready())                  // per-pixel loop condition
{
  DoItAgain();
}
for (i = texelval; i > 0; i--)    // loop count read from a texel
{
  DoSomethingInteresting();
}
The CPU can't know in advance how many passes to run. It needs per-pixel information to know when to stop the loop. If the CPU has to explicitly check each pixel to see if any of them needs more loops, then I'd say the CPU is doing a very vital part of the Turing completeness, and the GPU could (somewhat unfairly) be called just a really advanced SIMD ALU.
It can only do sequential calculations. 'If' and constant-length loops can be faked, but true dynamic program flow is done by the CPU. And that was my point.


Now I'm going to argue with myself. (See, it's possible, I'm not getting confused, I hopefully don't get anyone else confused, and it does bring forth new insights. :D)

Since my last post I've seen a way to give feedback in a somewhat efficient way: DX9 + multipass + NV_occlusion_query could do it. (I believe the R200 supports that extension too, so there's a chance that the R300 has it.) Now, I know that's a mix of DX and OGL features, but let's be a little creative here. :D

The idea of NV_occlusion_query is that you start it, render a few polys, and then ask how many pixels actually got written to the framebuffer. The while loop could be done with one iteration per pass. The first thing to do in the PS is to check whether Ready() is true, and in that case trigger a texkill; otherwise continue with DoItAgain(). When the CPU has sent all geometry for one pass, it checks whether the pass was "completely occluded", and if it was, the CPU knows that the loop has ended for all pixels.
Then it can start on the for loop in a similar way.

Now, I'm not sure that you're allowed to do PS work and NV_occlusion_query at the same time, but let's hope so. If it's not possible, then it might be possible to implement your own occlusion query in the pixel shaders, if you could just leave values in the registers from one pixel to the next. Yeah, yeah, I know that's not allowed, but it's quite possible that's just an API limitation, not hardware. So some really annoying nagging might work. :D
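The scheme can be simulated on the CPU to see that it terminates correctly. In this hedged Python sketch (all names hypothetical; ready() and do_it_again() stand in for the per-pixel Ready()/DoItAgain() above), a counter plays the role of the occlusion query: each "pass" advances only the unfinished pixels, finished pixels "texkill" themselves by writing nothing, and the CPU stops re-issuing passes once a pass writes zero pixels.

```python
# Hypothetical simulation of the occlusion-query multipass scheme:
# each pass advances only unfinished pixels; finished ones "texkill"
# (write nothing), and the query reports how many pixels were written.

def ready(p):          # per-pixel loop condition, stand-in for Ready()
    return p >= 10

def do_it_again(p):    # per-pixel loop body, stand-in for DoItAgain()
    return p + 3

def run_until_ready(pixels):
    passes = 0
    while True:
        written = 0                      # the "occlusion query" counter
        for i, p in enumerate(pixels):
            if not ready(p):             # Ready() -> texkill (no write)
                pixels[i] = do_it_again(p)
                written += 1
        passes += 1
        if written == 0:                 # query says: pass fully occluded
            break
    return pixels, passes

pixels, passes = run_until_ready([0, 4, 9, 11])
print(pixels, passes)  # [12, 10, 12, 11] 5
```

Note the cost this models: the CPU re-issues whole passes until the slowest pixel finishes, which is exactly the "somewhat efficient" caveat above.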
 
A Turing-complete machine does not need to be a closed system.

Can a CPU be Turing complete, if it relies upon an outside entity (me, at a keyboard, for instance) to change its program? Of course...

A machine is Turing-complete or it is not. There are no qualifiers. It does not matter if it doesn't have loops; loops are not needed to qualify as a Turing machine. Loops are damn useful for doing some things, but they are not a requirement.
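For concreteness about the strict definition being argued over, here is a minimal Turing machine simulator sketch in Python (the machine and names are illustrative assumptions, not anything from the thread): the whole model is just a transition table, a tape, and a head, with no loop construct required in the machine itself.

```python
# Minimal Turing machine simulator. A machine is just a transition
# table: (state, symbol) -> (new_state, written_symbol, head_move).

def run_tm(table, tape, state="start", blank="_", max_steps=1000):
    cells = dict(enumerate(tape))        # sparse, unbounded-ish tape
    head = 0
    for _ in range(max_steps):           # bounded, as any real machine is
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        state, cells[head], move = table[(state, symbol)]
        head += {"L": -1, "R": +1}[move]
    # read the tape back from cell 0 rightwards
    return "".join(cells[i] for i in sorted(cells) if i >= 0)

# Example machine: flip every bit, halt at the first blank.
flip = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", "_"): ("halt",  "_", "R"),
}
print(run_tm(flip, "1011"))  # "0100_"
```

Note there's no 'while' in the machine's own description, only state transitions; looping is something the control table does implicitly, which is the point being made above.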


As to how you render screenshots from chips that haven't been taped out: they have emulators. They have to for algorithmic testing.
 
I started to write what was probably going to be a really long OT post on Turing Machines, but realised people would just be bored. So, the short version.

You could go by the strict definition of a Turing Machine, and realise that it's theoretically impossible to implement one.

You could remove the memory requirement, and realise that a lot of strange stuff is TMs. (Like the Game of Life. Yes, you can emulate an NV30 with a large enough Game of Life map. :D ) With this definition I can't see anything that makes an R300 with multipass stand out from a GF3 with multipass, except how efficient it is. I wouldn't be surprised if you could prove that a lot of old 2D boards were Turing complete by doing bitblits.
Even though this is interesting for very theoretical stuff, this definition is pretty useless as a qualifier of how good a computational unit is.
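As an aside, the Game of Life rule mentioned above fits in a few lines. This Python sketch only shows the update rule on an unbounded board (the actual Turing-completeness construction, built from gliders and glider guns, is enormously larger):

```python
# One Game of Life step over a set of live cells (unbounded board).
from itertools import product

def life_step(live):
    counts = {}
    for (x, y), (dx, dy) in product(live, product((-1, 0, 1), repeat=2)):
        if (dx, dy) != (0, 0):
            cell = (x + dx, y + dy)
            counts[cell] = counts.get(cell, 0) + 1
    # born with exactly 3 neighbours; survive with 2 or 3
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in live)}

# A "blinker" (three cells in a row) oscillates with period 2:
blinker = {(0, 1), (1, 1), (2, 1)}
print(life_step(life_step(blinker)) == blinker)  # True
```

The rule itself is trivially simple; that something this simple is Turing complete is exactly why it's a poor qualifier of how good a computational unit is.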

Then you can be a bit sloppy and use the term in a way that is more useful. Turing complete => all computers can be emulated => you can do the commonly used program flow elements (dynamic if/for/while). That wasn't sloppy; the sloppy part comes now: say that Turing completeness means that you can do dynamic if/for/while in a reasonably efficient way.
This is of course not true, since I just slapped in references to efficiency there. But I've seen it used like that before, and it seemed as if you meant that too, since DX9 surely isn't the first time for TC GPUs. Note that "reasonably efficient" should be interpreted in a very generous way, meaning that there should at least be some kind of resemblance of if/for/while. (Conditional loads are enough for 'if', even though they can be rather inefficient.)
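The "conditional loads being enough for 'if'" point can be sketched directly. This hedged Python example (names are mine, not an API) shows the branchless select that cmp/lerp-style shader instructions provide: both branches are always evaluated, and a 0/1 mask picks the result.

```python
# Branchless "if": both branches are evaluated, and a 0/1 mask
# selects between them -- the way shaders fake conditionals with
# cmp/lerp-style conditional loads.

def select(mask, a, b):
    """mask is 0.0 or 1.0; returns a where mask=1, b where mask=0."""
    return mask * a + (1.0 - mask) * b

def shade(x):
    # stand-in for a comparison instruction producing a 0/1 mask
    mask = 1.0 if x > 0.5 else 0.0
    # both "branches" (x*2.0 and 0.0) are computed regardless
    return select(mask, x * 2.0, 0.0)

print([shade(x) for x in (0.2, 0.8)])  # [0.0, 1.6]
```

This is exactly the inefficiency mentioned above: you always pay for both sides of the 'if', which is tolerable for short branches and painful for long ones.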

Btw, a real TC machine does need to be a closed system. If you set up the initial state before the start, OK. Or if you just slot memory blocks into it in a sequence, then you could be seen as just a virtual memory system for the machine (not really part of the logic), so I'd say OK to that too.
But if the machine only works because you remember a state (1), or make decisions (2), then the machine isn't TC. You, the rules you follow, and the machine together form a TC machine.

*1: The GPU is running the while loop.
*2: There's a few pixels that haven't finished the while loop, so run that pass again.
 
Excellent points.

I think the meaning I was looking for is that these chips require no outside intervention in processing. All the outside entity has to do is feed programs in a defined order. It never looks at the data.

That's pretty much the same operation as a swapfile, except the initiation of the swap comes from an external host rather than an internal one. The point is, the bit that does the math can do the math, if you see what I mean. I'll agree it's a pretty marginal line to walk along!

I do dispute the statement that DX8 parts can be Turing-complete as well. At a stretch the PS1.4 parts might be... I don't know enough about the exact definitions to be sure. (It is possible, for example, that there are algorithms that are impossible to implement unless you can output more than one value from each intermediate stage).

(I think we've both drifted off topic. In fact, I've drifted so far out to sea I require the Coastguard to come get me :D )
 