Radeon 9700 and conditional assignment in PS?

First, I was trying to say that for x86 that wasn't that big a problem; I wasn't arguing about shaders and HLSLs.

Chalnoth said:
The main reason is simply this: video hardware is changing at a breakneck rate. If we get bogged down in a standardized instruction set, then that instruction set will hold progress back, just as has happened with the x86 architecture. While it is true that you can, for example, squeeze a little bit more performance out of x86 by going straight to the assembly, the truth is that our processors would be running one heck of a lot faster if the HLLs had been standardized instead of the processor instruction set.

You mean Java, C or Visual Basic CPUs? And then you say that RISC machines are better than CISCs... Are you saying that processors should translate C (or any other high-level language) on the fly? How long do you think it takes to compile a program? BTW, the HLLs are already standardized, and they can be used with any available instruction set. The problem is that when you have an established base of software (in binary format, not source code) for a platform, you want to keep using it. That is not an engineering problem but an economic one.

Independence from the ISA has a cost: either storing the source code and recompiling, or providing a layer of translation between ISAs (à la Transmeta).

Chalnoth said:
One other example: What would you rather have in three years: a 1GHz GPU running on an equivalent of the x86 instruction set, or a 1GHz GPU running on an equivalent of a RISC instruction set? Which would be faster? Obviously the more advanced one would be.

I'm really hoping that DX10 takes a "hands off" approach to assembly programming, and goes all HLSL. I also hope that 3DLabs' proposal to standardize the HLSL, not the assembly, goes through for OpenGL 2.0.

I can't argue about graphics (because I know little about this topic), but I see it as a different problem. CPUs are designed to be general-purpose and flexible, and the ISA is part of that flexibility (even if it becomes a burden because of the compatibility issue). I'm sure that going to an HLSL is good for graphics APIs, as they are high-level abstractions of the hardware (like a programming language for a CPU), and as long as shaders remain small (for now) they can be compiled from the HLSL to the specific hardware ISA at run time. But in the CPU world, compiling Word each time you want to execute it, and expecting it to run as fast as natively compiled code, is just crazy (although MS would be happy to give Intel/AMD/whoever more arguments for faster CPUs with their .NET VM approach ;). I have been studying the problem of binary translation for quite a while, and I think there are strong reasons why approaches like Transmeta's don't work properly (beyond using an ill-suited VLIW ISA to translate a CISC ISA on the fly).
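To make the run-time compilation point concrete, here is a minimal C sketch of what it looks like from the application side, assuming an OpenGL 2.0-style interface along the lines of the 3DLabs proposal (the entry points, the GL_FRAGMENT_SHADER target and the compile_fragment_shader wrapper are illustrative assumptions, not something defined in this thread):

/* Minimal sketch of run-time shader compilation: the application hands
 * high-level source to the driver, which compiles it to the GPU's
 * native ISA on the spot. Assumes headers/entry points for GL 2.0
 * and an already-created GL context. */
#include <GL/gl.h>
#include <stdio.h>

GLuint compile_fragment_shader(const char *src)
{
    GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(shader, 1, &src, NULL);  /* pass the high-level source text */
    glCompileShader(shader);                /* driver emits native GPU code here */

    GLint ok = 0;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetShaderInfoLog(shader, sizeof log, NULL, log);
        fprintf(stderr, "shader compile failed: %s\n", log);
    }
    return shader;
}

The point is that the application never sees a hardware ISA at all; each vendor's driver is free to target whatever its GPU actually executes.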

Chalnoth said:
Yes, it is a problem. Particularly for a GPU, having to decode a legacy instruction format would cost precious transistors. On a GPU, those transistors could be put to much more effective use than the same transistors in a CPU. And, just as you stated before, a compiler can't be quite as optimal as programming straight to the assembly. Don't you think that the internal translator in those CPUs reduces performance?

That could be a problem. However, I don't know how many of the P4's or Athlon's transistors are used for the translation process, but in CPUs more of the transistor budget goes to the caches than to any other part of the chip. In fact, what we will see in the next few years is that there are 'too many' transistors. With a billion transistors you start to have trouble putting them to use (other than larger caches or embedded memory) with current architectural models (delay penalties between units, lack of exploitable ILP).
 
RoOoBo said:
You mean Java, C or Visual Basic CPUs? And then you say that RISC machines are better than CISCs... Are you saying that processors should translate C (or any other high-level language) on the fly? How long do you think it takes to compile a program? BTW, the HLLs are already standardized, and they can be used with any available instruction set. The problem is that when you have an established base of software (in binary format, not source code) for a platform, you want to keep using it. That is not an engineering problem but an economic one.

Obviously programs take far too long to compile to do it "on the fly" for essentially any of today's software. However, optimal compilation could still be done on the target machine, similar to how it's done on Linux, at the time the program is installed. This is usually not done today.

Anyway, what I was trying to say is that today most programs are optimized for CISC, which the hardware then translates to RISC internally. If software didn't require backward compatibility of the instruction set, compilers would target the RISC core directly, which should obviously be just plain faster.

This is optimal for GPUs because the compile time for shader programs is vastly smaller than the compile time for CPU programs, so runtime compiling is most certainly an option. Given the breakneck advancement of GPUs, it should be obvious that runtime compiling is the only option that will allow optimal usage of future GPUs. The other major benefit is ease of programming. It is most definitely worth sacrificing some performance in exchange for easier programming, for the simple reason that easier programming leaves programmers more time, which can lead to better-optimized code (i.e. it won't actually cause a drop in performance in most games...).

RoOoBo said:
That could be a problem. However, I don't know how many of the P4's or Athlon's transistors are used for the translation process, but in CPUs more of the transistor budget goes to the caches than to any other part of the chip. In fact, what we will see in the next few years is that there are 'too many' transistors. With a billion transistors you start to have trouble putting them to use (other than larger caches or embedded memory) with current architectural models (delay penalties between units, lack of exploitable ILP).

Which is where instruction sets like VLIW would be nice to have. In the meantime, I think what we'll see is multiple CPUs on one die, as that can act sort of like VLIW (parallelism controlled by the software/compiler).
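As a rough sketch of what "parallelism controlled by the software" means in practice, here is a minimal C/POSIX-threads example: the program, not the hardware's out-of-order logic, splits the work across the two CPUs. The names (span, partial_sum) are purely illustrative:

/* Software-controlled parallelism: the programmer/compiler explicitly
 * schedules independent work onto the two CPUs, much as a VLIW
 * compiler packs independent operations into one long word.
 * Build with: cc sum.c -lpthread */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double data[N];

struct span { int lo, hi; double sum; };

static void *partial_sum(void *arg)
{
    struct span *s = arg;
    s->sum = 0.0;
    for (int i = s->lo; i < s->hi; i++)
        s->sum += data[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) data[i] = 1.0;

    struct span a = { 0, N / 2, 0.0 }, b = { N / 2, N, 0.0 };
    pthread_t t;
    pthread_create(&t, NULL, partial_sum, &b);  /* second half on the other CPU */
    partial_sum(&a);                            /* first half on this one */
    pthread_join(t, NULL);

    printf("sum = %f\n", a.sum + b.sum);
    return 0;
}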
 
Yes, a shader codebase for an app is so small that compilation time is trivial. I could almost run a super-optimizer at runtime (a super-optimizer takes a short function, uses a generate-and-test approach to produce all possible instruction sequences that compute it, and filters for the smallest/fastest).

Even aggressive optimizations that would bring GNU C/Visual C++ to a halt on million-line codebases become very feasible on shader codebases that will most likely be fewer than 1,000 lines of code.
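As a toy illustration of the generate-and-test idea (a sketch only, not any real tool), the following C program enumerates every sequence of a made-up four-operation accumulator ISA up to length 3 and keeps the shortest one that matches a target function on a handful of test inputs; a real super-optimizer would verify candidates rather than merely test them:

/* Toy generate-and-test super-optimizer over a made-up accumulator ISA.
 * The accumulator starts holding the input x; we search for the
 * shortest op sequence computing target(x) on all test inputs. */
#include <stdio.h>

enum op { ADD_X, SUB_X, SHL1, NEG, NOPS };
static const char *name[] = { "add_x", "sub_x", "shl1", "neg" };

static int run(const enum op *prog, int len, int x)
{
    int acc = x;
    for (int i = 0; i < len; i++)
        switch (prog[i]) {
        case ADD_X: acc += x;   break;
        case SUB_X: acc -= x;   break;
        case SHL1:  acc <<= 1;  break;
        case NEG:   acc = -acc; break;
        default: break;
        }
    return acc;
}

static int target(int x) { return 3 * x; }  /* the function we want code for */

int main(void)
{
    const int tests[] = { -7, -1, 0, 1, 2, 13, 100 };
    enum op prog[3];
    for (int len = 1; len <= 3; len++) {            /* shortest first */
        long total = 1;
        for (int i = 0; i < len; i++) total *= NOPS;
        for (long n = 0; n < total; n++) {          /* enumerate every sequence */
            long m = n;
            for (int i = 0; i < len; i++) { prog[i] = (enum op)(m % NOPS); m /= NOPS; }
            int ok = 1;
            for (unsigned t = 0; t < sizeof tests / sizeof tests[0]; t++)
                if (run(prog, len, tests[t]) != target(tests[t])) { ok = 0; break; }
            if (ok) {                               /* survives all test inputs */
                printf("found length-%d program:", len);
                for (int i = 0; i < len; i++) printf(" %s", name[prog[i]]);
                printf("\n");
                return 0;
            }
        }
    }
    printf("no program found up to length 3\n");
    return 0;
}

On this target it finds the length-2 sequence shl1, add_x (i.e. 2x + x = 3x); the search space is tiny here, which is exactly why the approach is thinkable for shader-sized functions and hopeless for whole CPU programs.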
 
The possibilities expand even further if compilation is done during installation, with recompilation available on user request. This may be necessary for absolutely optimal compilation of larger shader programs (and they're sure to get larger...).
 