#1
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
Hello everyone,

If there's one thing many have been disappointed by, it's the NV30's far-too-low speed on long PS programs. But what differences from the NV20 and R300 could cause such problems? The R300 has an architecture which can execute multiple instructions at the same time. But the GFFX seems to be able to work on more pixels at once (according to David Kirk at ExtremeTech, the NV30 actually works on 32 pixels at the same time!), so it might balance out.

What other difference is there? Well, according to some documents, the NV30 stores its 1024 PS instructions in local memory. But what does "storing the PS instructions in local memory" *really* mean? If you read the instruction from memory for each pixel, that operation alone would need more bandwidth than any other operation in the GPU, probably even more than all of them combined. So that's obviously ridiculous!

My question to you, thus, is: how do you think the NV30 truly handles that? I've got an explanation, but it's just an idea and it's likely to be wrong, so I'd welcome any feedback.

Here's my explanation. In past architectures, the PS programs were sent via AGP each time. In the NV30, all PS programs the hardware will use are stored in local memory, in order to reduce stall time when you switch pixel shaders frequently. This could also be very useful when the game is rendered front-to-back without caring about PS switching. Now, that's a very conservative assumption. Such a system would take little memory bandwidth and very little memory.

BTW, I've been wondering about two things lately:
1. In current architectures, is AGP used to transmit the pixel shader each time it changes?
2. Does each pixel processing unit have its own list of all instructions used in the program, or is there one global instruction pool for all pixel processors?

Thanks for reading,
Uttar
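Uttar's bandwidth argument is easy to check with back-of-the-envelope arithmetic. The sketch below uses hypothetical round numbers (16 bytes per instruction, 8 pixels per clock, a 500 MHz core clock). These are illustrative assumptions, not confirmed NV30 specifications. The only figure taken from the post is David Kirk's claim of 32 pixels in flight, which is used here as the number of pixels sharing one instruction fetch.

```python
def instruction_fetch_bandwidth(pixels_per_clock, clock_hz,
                                bytes_per_instruction, pixels_sharing_fetch):
    """Bytes per second spent fetching shader instructions from memory.

    Each clock, `pixels_per_clock` pixels each execute one instruction.
    If `pixels_sharing_fetch` pixels execute the same instruction together,
    that instruction only has to be fetched once for the whole group.
    """
    fetches_per_clock = pixels_per_clock / pixels_sharing_fetch
    return fetches_per_clock * bytes_per_instruction * clock_hz


# Hypothetical figures, for illustration only.
PIXELS_PER_CLOCK = 8
CLOCK_HZ = 500e6
BYTES_PER_INSTRUCTION = 16

per_pixel = instruction_fetch_bandwidth(PIXELS_PER_CLOCK, CLOCK_HZ,
                                        BYTES_PER_INSTRUCTION, 1)
shared = instruction_fetch_bandwidth(PIXELS_PER_CLOCK, CLOCK_HZ,
                                     BYTES_PER_INSTRUCTION, 32)
print(f"fetch per pixel: {per_pixel / 1e9:.0f} GB/s")  # 64 GB/s
print(f"shared by 32:    {shared / 1e9:.0f} GB/s")     # 2 GB/s
```

Under these made-up numbers, fetching every instruction per pixel would take 64 GB/s for instructions alone. Sharing each fetch across a 32-pixel group brings that down to 2 GB/s, which is consistent with Uttar's point that per-pixel fetching is implausible.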

#2
Member
Join Date: Mar 2002
Posts: 141
1. Sure, pixel shaders are TINY compared to things like textures and geometry.
2. There are probably multiple copies of the program. As I said, they're tiny, so it'd be no big deal. "Storing in local memory" means just that: the programs are stored on the card somewhere, just like textures and geometry would be. It's probably in some very fast bit of local memory.
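Here is "tiny" in numbers. The sketch below compares a maximum-length 1024-instruction program with a single 1024×1024 RGBA8 texture. The 16 bytes per instruction is a hypothetical figure chosen for illustration. As the next reply points out, real instructions may be larger, especially if constants are embedded in them. For that reason, the optional `inline_constants` parameter adds 16 bytes (four 32-bit floats) per embedded constant.

```python
def shader_bytes(num_instructions, bytes_per_instruction=16, inline_constants=0):
    """Approximate size of a shader program image.

    bytes_per_instruction is an assumed figure, not a documented NV30 value.
    Each inline constant is assumed to occupy four 32-bit floats (16 bytes).
    """
    return num_instructions * bytes_per_instruction + inline_constants * 4 * 4


def texture_bytes(width, height, bytes_per_texel=4):
    """Size of one uncompressed mip level (RGBA8 = 4 bytes per texel)."""
    return width * height * bytes_per_texel


program = shader_bytes(1024)
texture = texture_bytes(1024, 1024)
print(program, texture, texture // program)  # 16384 4194304 256
```

Even if each instruction were several times larger, the program would remain a small fraction of one texture. Storage capacity is not the issue here; fetch bandwidth and latency are.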

#3
Senior Member
Join Date: Jan 2002
Location: Abbots Langley
Posts: 732
Most likely it's some kind of cache design. They can store a limited number of instructions on chip so that "general" programs run very fast. Very large programs will be slow to execute anyway, so it doesn't matter that much that you need to go and read instruction blocks from local memory.
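The cache idea can be illustrated with a toy model. The sketch below assumes a hypothetical on-chip instruction cache that holds a fixed number of fixed-size instruction blocks and uses least-recently-used replacement. The whole program runs once per batch of pixels. None of the sizes are NV30 figures. The model shows the behaviour described above:

- A program that fits in the cache is fetched once.
- A larger program, executed straight through, evicts its own blocks and must refetch every block on every batch.

```python
from collections import OrderedDict


def count_block_fetches(program_len, block_size, cache_blocks, batches):
    """Count instruction-block fetches from local memory.

    Models a hypothetical LRU instruction cache of `cache_blocks` blocks,
    each holding `block_size` instructions. The program runs start to
    finish once per batch of pixels.
    """
    cache = OrderedDict()
    fetches = 0
    num_blocks = -(-program_len // block_size)  # ceiling division
    for _ in range(batches):
        for block in range(num_blocks):
            if block in cache:
                cache.move_to_end(block)  # hit: mark as most recently used
            else:
                fetches += 1  # miss: read the block from local memory
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return fetches


# Hypothetical cache: 8 blocks of 16 instructions (128 instructions total).
print(count_block_fetches(64, 16, 8, 100))    # 4: fits, fetched once
print(count_block_fetches(1024, 16, 8, 100))  # 6400: refetched every batch
```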
The issue is that because of latency (e.g. texture fetches), you already need to work on quite a few pixels in parallel (a texture fetch will take 10s or 100s of clocks to return from the cache and filter logic). Since you apply the same instruction to all these pixels, you have quite a bit of time to grab that instruction from external memory. The problem is when a pixel shader is applied to a low number of pixels (small triangles). In that case your latency hiding can run into problems, and so can your program loading.

Now, pixel shader instructions are not small. The instruction size is a lot bigger than you think due to all the flexibility and the huge number of constants and registers. Also, IIRC, NV30 stores constants directly in the instruction, so 4-component float numbers are stored as part of the instruction. This means very large instructions (lots of bytes). It also causes problems if a program changes constants between sets of polygons: you need to change the whole program using the CPU, instead of just sending a single updated constant value. All IMHO...
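The latency-hiding argument and the small-triangle problem can both be put into numbers. This is a minimal sketch with two assumptions: a hypothetical fixed batch size, and pixels from different triangles never being packed into the same batch. Real hardware may behave differently. The three functions give:

- how many pixels must be in flight to cover a given latency;
- what fraction of batch slots hold real pixels;
- how many instruction fetches each useful pixel pays for when one fetch serves a whole batch.

```python
import math


def pixels_in_flight(latency_clocks, pixels_per_clock):
    """Pixels that must be queued to keep issuing while a fetch is pending."""
    return latency_clocks * pixels_per_clock


def batch_occupancy(triangle_pixels, batch_size):
    """Fraction of batch slots holding real pixels for one triangle."""
    batches = math.ceil(triangle_pixels / batch_size)
    return triangle_pixels / (batches * batch_size)


def fetches_per_pixel(triangle_pixels, batch_size):
    """Instruction fetches per useful pixel, one fetch per instruction per batch."""
    return math.ceil(triangle_pixels / batch_size) / triangle_pixels


print(pixels_in_flight(100, 4))     # 400
print(batch_occupancy(1000, 32))    # 0.9765625
print(batch_occupancy(8, 32))       # 0.25
print(fetches_per_pixel(8, 32))     # 0.125
print(fetches_per_pixel(1024, 32))  # 0.03125
```

With these made-up numbers, an 8-pixel triangle pays four times as many instruction fetches per pixel as a large one. This is the kind of penalty the post describes for small triangles.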

#4
Member
Join Date: May 2002
Location: Slovenia
Posts: 420
GeForce FX stores pixel shaders in video memory, whereas vertex shaders are still stored on chip as a "bunch of states". This means that the FX reads some instructions from video memory, stores them in a cache and then runs them. You need to know, however, that all pixel pipes always execute the same instruction (since there is no branching yet).
I think that the 1024-instruction limit on GeForce FX is more of an artificial limit than anything else (Quadro FX can do 2048 instructions)...
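The lockstep point can be shown with a toy interpreter. This is a hypothetical sketch, not NV30's instruction format. The program is a list of `(op, dst, src_a, src_b)` tuples over named registers. Because there is no branching, every pixel in the group executes the same instruction, so each instruction is fetched once per group rather than once per pixel.

```python
def run_lockstep(program, pixels):
    """Execute a branch-free program on a group of pixels in lockstep.

    program: list of (op, dst, src_a, src_b) tuples over register names.
    pixels: list of dicts mapping register names to values, one per pixel.
    Returns the number of instruction fetches performed.
    """
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    fetches = 0
    for op, dst, src_a, src_b in program:
        fetches += 1  # one fetch serves every pixel in the group
        fn = ops[op]
        for regs in pixels:
            regs[dst] = fn(regs[src_a], regs[src_b])
    return fetches


pixels = [{"r0": 1.0, "r1": 2.0}, {"r0": 3.0, "r1": 4.0}]
program = [("mul", "r2", "r0", "r1"), ("add", "r2", "r2", "r0")]
print(run_lockstep(program, pixels))  # 2
print([p["r2"] for p in pixels])      # [3.0, 15.0]
```

If per-pixel branching were added, pixels in a group could diverge, and one fetch per group would no longer be guaranteed to serve every pixel.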

#5
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
Hmm, interesting.
So you people think that the GFFX probably has a significantly smaller cache (how much? 10 or 25 instructions maybe?) and loads the pixel shader program in multiple blocks? Interesting theory. But could that mean that the GFFX might be *bandwidth* limited with a high number of instructions, particularly when running integer/FP16?
Uttar

#6
Member
Join Date: Feb 2002
Location: Germany / Thuringia
Posts: 241
Here is some info from the ATI RADEON SDK:
Code:
6. Optimizing Shaders

Modern graphics chips offer enormous vertex and pixel processing power; nevertheless there are times when even that power is not enough. When running long and complex shaders it is possible to exhaust all that power and make shader processing a bottleneck. Another “opportunity” to limit performance hides in inefficient shader management. This section will deal with both of these obstacles.

Behind the scenes shader processing

The vertex and pixel shaders in DirectX® 9 are defined as streams of tokens, each token representing an op-code of an assembly instruction or macro. This is how they are passed by the application to the shader creation functions of the API. This is also how the driver receives them. None of the macros are expanded by the runtime, and it is up to the driver how to deal with them. If hardware natively supports a macro, it will be executed as is; otherwise it will be expanded into a series of simpler instructions.

A common misconception is that the hardware shader implementation exactly matches the shader assembly or op-codes as defined by DirectX®. The direct mapping of the shader code to the hardware might not result in the most efficient shader implementation, and hardware uses many tricks to provide the best performance possible. You should think of the DirectX® shaders as p-code (pseudo code) programs that are passed to the back-end compiler implemented in the driver. The driver compiles the shaders to the hardware native instructions and runs the compiled shader through the optimizer. The optimizer knows many intricate details about the hardware implementation and is able to allocate registers and schedule instructions in the most efficient way. The following sections will explain some of the hardware implementation details and what can be done to help the driver optimize your shaders.

6.1. Shader Management

Shader switching is one of the most expensive state changes. Batching rendering by vertex shader is always a good idea.
When switches between shaders are inevitable, try limiting frequent switches only to recently used smaller shaders, as the driver and hardware can cache them more effectively. Switching between the fixed function and programmable pipelines is in most cases more expensive than switching between equivalent shaders because of the extra driver overhead. The shader compilation and optimization in the driver is quite a complex and expensive process, and it will become even more expensive as shaders grow in size and shader models become more complex. Because of that it is a bad idea to compile too many shaders on the fly. Try pre-creating as many shaders upfront as possible.

6.2. Shader Constant Management

Updating high volumes of shader constants can add a considerable amount of overhead to the drivers. The following strategies can help reduce the driver overhead associated with constant updates. When there are a lot of scalar constant updates, pack these scalar values into vectors. This should reduce the number of scalar constant updates by a factor of four. When picking locations for the frequently updated constants, do not scatter them across the whole constant store. This will allow constant updates to happen in continuous ranges, which should reduce runtime and driver overhead. Consider fragmenting the constant store into 4 or 8 constant chunks and updating these chunks atomically. That is, if you have to update every other constant in some constant range, it is better to update the whole range at once than to update each changed constant individually.

6.3. Optimizing Vertex Shaders

When it comes to optimizing vertex shaders only a few optimizations apply. The reason for that is the driver shader optimizer, which does a pretty good job of optimizing shaders. One subtle vertex shader optimization is to output from the shader only what you need. For instance, the shader can export duplicated texture coordinates only to fetch two different textures with the same coordinates.
Many developers still do that; however, 1.4 and especially 2.0 pixel shaders allow decoupling of texture coordinate sets from texture samplers. Just export unique texture coordinate values from the vertex shaders and use pixel shaders to do proper texture coordinate mapping. Also, when outputting texture coordinates, use write masks to indicate how many texture coordinate components have to be interpolated and passed down to pixel shaders.

Fixed function vs. programmable pipelines

RADEON 8500/9000 chips have implemented both fixed function and programmable vertex processing in the silicon. Using fixed function with these chips can be slightly more efficient than using vertex shaders because of the optimized hardware implementation of the TnL pipeline. Using fixed function TnL also simplifies shader management and reduces the associated application and driver overhead. Having said that, shaders can be a better solution if used to pack vertex data or take some “shortcuts” in the vertex computations. There is no golden rule, as the ultimate solution depends on shader usage and can only be found through extensive experimentation. RADEON 9500/9700, on the other hand, has only a programmable pipeline implemented in the hardware, and fixed function TnL is emulated with vertex shaders. This means that for DirectX® 9 class hardware there is no advantage in using fixed function functionality. Using the flow control available in 2.0 vertex shaders solves the problem with shader management and allows the application to toggle lights, texture transforms and other parameters as easily as with a fixed function pipeline.

Use of flow control in VS 2.0

As flexible and as powerful as the 1.0-1.1 vertex shaders are, they can also be a great nuisance. Rarely is only a single shader used: some objects require per-vertex lighting with one spot and one directional light, while others need tangent space setup and texture coordinate generation, and so on.
By the time you consider all possible permutations that might be required, the number of shaders becomes astronomical. This is where 2.0 vertex shaders become handy. With the addition of static flow control, the shader model has gained a robust mechanism for shader management. Instead of swapping a huge number of very specific shaders, it is much better to write just a couple of universal shaders with flow control and replace expensive shader switches with lightweight boolean constant updates. On RADEON 9500/9700 flow control instructions are essentially free; however, some performance degradation might still occur due to the somewhat limited scope of performance optimizations.

Co-issue in vertex shaders

RADEON 9500/9700 has a very interesting vertex processor unit design. Each of the vertex processors has two math engines, one vector and one scalar, that can process a vector and a scalar instruction on the same clock. The idea is somewhat similar to pixel shader co-issue; however, there are implementation differences. The vector vertex processing engine operates on full 4D vectors, as opposed to 3D vectors in pixel shaders, and the scalar vertex processing engine is more independent from the vector engine. When the vertex shader optimizer schedules instructions, it will try to pair vector and scalar operations for optimal execution. There are a few limitations that might prevent the optimizer from co-issuing instructions. To increase the chances of instruction pairing, do not output to the destination registers from scalar instructions, and always use write masks to write out only a single channel from scalar instructions such as POW, EXP, LOG, RCP and RSQ. Also be aware that the read port limits apply to a vector/scalar instruction pair the same way as described in the DirectX® 9 vertex shader specification for a single instruction.

6.4. Optimizing Pixel Shaders

As pixel shaders progressively become more and more complex, they become an increasingly important target for optimizations.
In the older 1.0-1.4 pixel shader models there is not that much room for optimization because of low shader complexity. The 2.0 shader model, however, is a different story. 2.0 pixel shaders are complex enough to implement different optimization strategies, so the following sections will mostly focus on the RADEON 9500/9700 pixel shader engine architecture and various pixel shader optimization tricks.

Texture instructions

Texture instructions are the pixel shader instructions that fetch textures (TEXLD), kill pixel processing (TEXKILL) and, for 1.4 pixel shaders, output depth values (TEXDEPTH). When it comes to texture instructions there are a few things to be aware of. First, the TEXKILL instruction does not interrupt pixel shader processing and provides pixel culling only after the shader has completely executed. Thus the position of the TEXKILL instruction in the shader does not make any difference, and it is wrong to rely on early abortion of pixel shader execution. In general TEXKILL and TEXDEPTH (or the equivalent depth output in 2.0 pixel shaders with oDepth) should be used very carefully because they interfere with operation, and if possible should be avoided.

TEXKILL and clip planes

The TEXKILL instruction cancels the rendering of a pixel based on the texture coordinate values provided. This functionality can be used to implement user clip planes at the rasterizer level. While this is an interesting hack, it does not provide the most efficient way of implementing clip planes. All RADEON family chips have support for 6 geometry-based clip planes in the TnL engine. Considering that the TEXKILL instruction has some detrimental impact on performance, as previously described, it is much better to use real clip planes. Use TEXKILL only when clipping cannot be properly handled with conventional user clip planes.
Legacy pixel shaders on DirectX® 9 hardware

When designing the RADEON 9500/9700 family of chips, one important objective was to create an architecture, backwards compatible with legacy shader models, that would provide the highest performance possible. This resulted in a pixel shader engine architecture that natively supports shader instruction co-issue and most of the source argument and instruction modifiers. Since the 2.0 pixel shader model has very limited support for modifiers, they have to be emulated with extra instructions. This means that some legacy pixel shaders featuring many modifiers will execute faster than their 2.0 pixel shader equivalents.

Co-issue in pixel shaders

Earlier pixel shader models, namely 1.0-1.4, had a feature called instruction co-issue. It allowed pairing two instructions, one operating on color and one on alpha values, and executing them on the same cycle. While instruction co-issue provided a great opportunity for optimization and an increase in the maximum number of instructions, it did complicate shader development and broke instruction and operand orthogonality in the shader model. Co-issue was removed from the 2.0 pixel shader model.

RADEON 9500/9700 chips have dual-pipe pixel shader units, which operate as two relatively independent engines performing calculations on different entities. One engine operates on 3D vectors or RGB colors and the other on scalar or alpha values. This means that in most cases two instructions, one operating on color and another operating on alpha, can be performed at the same time. Such an architecture provides a perfect opportunity for optimizing shaders by splitting the computational workload between the pipes, resulting in up to a twofold speedup. Careful examination of a shader for splitting the workload between the pipes should focus on a couple of things: identifying computations that can be executed in only one pipe (vector or scalar) and balancing the number of instructions in each pipe.
Sometimes scalar or alpha computations can be executed in the color pipe and, the other way around, color computations can be executed in the alpha pipe. Explicit instruction co-issue in pixel shaders is available only in the older shader models. However, this does not mean that the benefits of instruction pairing can be enjoyed only in the older pixel shader models. On the contrary, the full benefit of instruction co-issue can be achieved in 2.0 pixel shaders with some clever shader programming. In the 2.0 pixel shader model, write masks can be used to implicitly indicate an opportunity for instruction pairing. The shader optimizer in the RADEON 9500/9700 drivers will look at write masks to determine which pipe should execute an instruction and will try reordering and co-issuing instructions. There are some nuances shader developers have to be aware of when optimizing shaders for instruction co-issue. The color and alpha parts of the instruction pair can reference different registers; however, attempting to access alpha values in a color instruction, or color values in an alpha instruction, might break co-issue. This also applies to the .ABGR or .WZYX swizzles available in 2.0 shaders, as they force data to cross the vector and scalar pipes. Another important fact is that RCP, RSQ, EXP and LOG instructions are always executed in the scalar pipe. For that reason it is better to always use scalar arguments and destinations (.W or .A) with these instructions. This will ensure the vector pipe is available for co-issue with them. The following are fragments of pixel shaders that compute diffuse and specular lighting. They demonstrate how splitting instructions between pipes for co-issue can be used to optimize shaders.
Original shader:

ps.2.0
…
dp3 r0.r, r1, r0      // N.H
dp3 r2, r1, r2        // N.L
mul r2, r2, r3        // *color
mul r2, r2, r4        // *texture
mul r0.r, r0.r, r0.r  // spec^2
mul r0.r, r0.r, r0.r  // spec^4
mul r0.r, r0.r, r0.r  // spec^8
mad r0.rgb, r0.r, r5, r2
…
Total: 8 instructions

Shader with the work split between the pipes:

ps.2.0
…
dp3 r0.a, r1, r0      // N.H
dp3 r2, r1, r2        // N.L
mul r0.a, r0.a, r0.a  // spec^2
mul r2.rgb, r2, r3    // * color
mul r0.a, r0.a, r0.a  // spec^4
mul r2.rgb, r2, r4    // * texture
mul r0.a, r0.a, r0.a  // spec^8
mad r0.rgb, r0.a, r5, r2
…
Total: 5 instructions

The instructions in the first shader that write r0.r could be co-issued if they were executed in the scalar pipe. The second shader illustrates the result of such co-issue: that computation is moved to r0.a, so each scalar instruction can be paired with a color instruction. It is not required to place instructions that can be paired next to each other, since the shader optimizer can intelligently reorder instructions. In this example the instructions were reordered only to illustrate the concept.

Instruction balancing

On RADEON 9500/9700 the highest possible pixel shader performance can be achieved by carefully balancing the number of texture and arithmetic instructions. Each of the pixel shader engines of RADEON 9500/9700 is capable of executing a texture fetch and a color/alpha ALU instruction pair on each clock cycle. Because of this high degree of parallelism between texture units and math engines, it is a good idea to keep the ratio of texture to ALU instructions close to 1:1. This of course makes sense if the application is not texture fetch bound. When using more expensive texture filtering modes, the ratio of instructions will be skewed more towards a higher number of ALU instructions. For each particular pixel shader, the cost of arithmetic vs. texture instructions should be carefully evaluated to find areas that can be implemented more optimally.
For instance, if a shader is too long because of some complex calculations and there is some memory bandwidth to spare, function lookup tables can be used to reduce the number of arithmetic instructions. The perfect example is the SINCOS macro, which can be implemented much more efficiently as a texture fetch from a one- or two-channel texture. This instruction balancing should be performed separately at each dependency level in the shader.

Dependent texture reads

Dependent texture reads are quite expensive. On RADEON 8500/9000 a two-phase shader is much more expensive than a single-phase shader; however, it should be less expensive than running multiple render passes with single-pass shaders. If you are developing an application that targets DirectX® 8.1 hardware and uses multiple render passes with 1.0-1.3 pixel shaders, consider implementing a single-pass solution with 1.4 pixel shaders. RADEON 9500/9700 has a significantly optimized dependent texture read implementation for performance and efficiency. In addition, the number of levels of dependency has been increased to four in the 2.0 pixel shader model. The best performance on RADEON 9500/9700 can be achieved when not exceeding two dependent texture reads. While three or four levels of dependency will provide sufficient performance, it will not be as good as with only one or two levels. Keep in mind that if arithmetic instructions are used to compute texture coordinates before the first texture fetch, they will also be counted as a level of dependency. Also, bear in mind that the TEXKILL instruction forces a dependency level change in pixel shaders. To optimize shaders with dependent texture reads, try to keep the number of both texture and arithmetic instructions roughly the same at each level of dependency.

6.5. When Multiple Render Passes Are Better than One

One obvious optimization technique is to reduce the number of rendering passes.
This allows cutting down on the amount of transformed and rendered geometry and decreasing fillrate requirements. However, it turns out that this is not always true. There are a growing number of cases where multi-pass rendering can result in better performance. Consider a situation where overdraw is very high and complex pixel shaders are used. When using long and complex pixel shaders, chances are that performance will be hampered by shader execution. If overdraw is high, these complex shaders are run many times, even for pixels that are occluded by other geometry. This is a huge waste of VPU shader processing power. Ordering all geometry by distance and rendering it front to back might not be such a good idea, since it might conflict with sorting by effect, shader or render state. The solution is to use multi-pass rendering, since vertex processing is rarely a bottleneck. On the first render pass, just initialize the depth buffer with proper depth values for your scene by rendering all geometry without any pixel shaders, outputting only depth and no color information. Since no shaders are used, it is possible to render everything in front-to-back order without causing any major render state changing overhead. Then render everything once again with the proper shaders. Because the depth buffer is already initialized with proper depth values, early pixel rejection can happen due to HYPER Z optimizations, creating an effective overdraw of one on the shader pass. Of course, if overdraw is already low, there is no sense in using this technique. As scene and shader complexity increase, this rendering method becomes more and more important.

6.6. Using High Level Shader Languages

It might be a lot of fun to develop vertex and pixel shaders in assembly, while chasing every single opportunity to squeeze out an extra execution cycle here and there with highly optimized handcrafted assembly code.
In the real world of big and demanding projects and tight schedules, this just might not be the most practical way of developing shaders. It is a well-known fact that higher-level languages provide much better productivity. DirectX® 9 introduced the High Level Shader Language (HLSL). This C-like language, with extra constructs to deal with vectors, matrices and other graphics-related features, can be a great help in shader development. Besides greater readability of the code and overall increased productivity, using a high-level language instead of assembly allows us to focus on code reuse and high-level algorithmic optimizations, which quite often can be more valuable than low-level optimizations. This does not mean that low-level optimizations are unimportant. The HLSL compiler is aware of many low-level optimization tricks and can produce code that rivals some of the best handcrafted assembly. To help the compiler recognize optimization opportunities, use appropriate types of variables. If only a scalar needs to be computed, do not use a vector to store it. Likewise, use the float3 type to store 3-component vectors, and so on. When computing shader output values for everything other than texture coordinates, use a float4 variable to hold the final result and avoid using type casts. Another good way to help the compiler recognize areas for possible optimizations is to use built-in intrinsic functions. For instance, use the dot() or lerp() functions instead of implementing your own functional equivalents. In HLSL pixel shaders, make sure to use the tex1D() intrinsic function whenever it makes sense. In DirectX® 9 there are no 1D textures, so they are emulated through 2D textures. Using tex1D() instead of tex2D() can sometimes save an extra instruction when sampling a texture with 1xN dimensions, since there is no need to worry about the second texture coordinate component.
If for some reason shader performance is still below your expectations, have HLSL compile the high-level code into assembly and then go over it with a fine-tooth comb.
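The 8-versus-5 instruction count in the co-issue example from the quoted SDK text can be reproduced with a small scheduling model. This is a sketch under simplifying assumptions, not ATI's optimizer:

- Every instruction takes one cycle.
- Each instruction is hand-assigned to the vector ("vec") or scalar ("sca") pipe according to its write mask. The unmasked writes to r2 are treated as vector-pipe work because r2.a is never read.
- Dependencies are listed explicitly by instruction index.
- Ready instructions may issue out of order, mirroring the note that the optimizer reorders instructions.

```python
def count_issue_cycles(program):
    """Count issue cycles on a dual-pipe (vector + scalar) shader unit.

    program: list of (pipe, deps) in program order; pipe is "vec" or "sca",
    deps lists indices of instructions whose results are needed.
    Assumes one-cycle latency and at most one vector plus one scalar
    instruction per cycle; ready instructions may issue out of order.
    """
    done = set()
    cycles = 0
    while len(done) < len(program):
        issued = []
        for pipe in ("vec", "sca"):
            for i, (p, deps) in enumerate(program):
                if i in done or i in issued or p != pipe:
                    continue
                if all(d in done for d in deps):
                    issued.append(i)
                    break
        if not issued:
            raise ValueError("no instruction can issue; check dependencies")
        done.update(issued)
        cycles += 1
    return cycles


# First shader: the specular chain writes r0.r, so everything uses the vector pipe.
ORIGINAL = [
    ("vec", []),      # dp3 r0.r, r1, r0      N.H
    ("vec", []),      # dp3 r2, r1, r2        N.L
    ("vec", [1]),     # mul r2, r2, r3        *color
    ("vec", [2]),     # mul r2, r2, r4        *texture
    ("vec", [0]),     # mul r0.r              spec^2
    ("vec", [4]),     # mul r0.r              spec^4
    ("vec", [5]),     # mul r0.r              spec^8
    ("vec", [3, 6]),  # mad r0.rgb, r0.r, r5, r2
]
# Second shader: the specular chain writes r0.a and moves to the scalar pipe.
SPLIT = [
    ("sca", []),      # dp3 r0.a, r1, r0      N.H
    ("vec", []),      # dp3 r2, r1, r2        N.L
    ("sca", [0]),     # mul r0.a              spec^2
    ("vec", [1]),     # mul r2.rgb, r2, r3    * color
    ("sca", [2]),     # mul r0.a              spec^4
    ("vec", [3]),     # mul r2.rgb, r2, r4    * texture
    ("sca", [4]),     # mul r0.a              spec^8
    ("vec", [5, 6]),  # mad r0.rgb, r0.a, r5, r2
]
print(count_issue_cycles(ORIGINAL), count_issue_cycles(SPLIT))  # 8 5
```

The final mad cannot pair with spec^8 because it needs that result, which is why the split shader takes 5 cycles rather than 4.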
__________________
http://www.tommti-systems.com

#7
Member
Join Date: Nov 2002
Posts: 454
This is the description of a shader implementation.
Quote:
A scene has a lake, trees by the water's edge, vegetation around the trees, and a player character with a sword that glows and loosely fitting clothes that sway. Would the use of so many shaders run into performance problems? Say a shader template is used to display the tree bark, other shaders handle the rippling water and reflections, and stencil shadows are used for shadowing the player, while the sword glows using at least 2 shaders, plus the clothing animation. Would DirectX 9 increase performance here, and would a longer instruction set help? What would a developer do if the number of shaders had to be limited? Thanks for any feedback, hopefully I worded my question right.
Speng.

#8
Member
Join Date: Nov 2002
Posts: 454
anyone wanna take a stab at my question above??
Speng.

#9
Member
Join Date: May 2002
Location: Slovenia
Posts: 420
speng,
Each vertex shader change is a state change. Other state changes include texture changes, vertex buffer changes, pixel shader changes, and other things that change the way graphics cards render pixels on screen. Developers have to make a trade-off here. Generally it seems that state changes can be ranked like this (from worst up): vertex shader change, texture change, pixel shader change, vertex buffer change,... Using lots of different vertex shaders is not really a good idea. You can use constant-based branching in vs_2_0 to enable or disable specific vertex shader effects on the fly, which reduces the total number of vertex shaders. If you have, for example, a scene with 5 textures and 20 vertex shaders, it would probably be better to sort by vertex shader first and have 20 vertex shader changes and, say, 20 texture changes. If you sort by texture instead, you could have 5 texture changes and, say, 60 vertex shader changes. However, this is one incredibly complex aspect of 3D engines. You actually have to weigh the advantages of everything: state changes and draw order (preferring front to back).
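The trade-off in the 5-texture/20-shader example can be explored by counting state changes under different sort keys. This is a minimal sketch. It assumes each draw call is described by a hypothetical `(vertex_shader_id, texture_id)` pair and that a change is counted whenever consecutive draws differ in that state. The scene below is made up and much smaller than the one in the post.

```python
def count_state_changes(draws, sort_key):
    """Sort draw calls and count vertex shader and texture changes.

    draws: list of (vertex_shader_id, texture_id) tuples.
    The first draw counts as one change for each state, since both must be
    bound once. Returns (vertex_shader_changes, texture_changes).
    """
    vs_changes = tex_changes = 0
    prev_vs = prev_tex = None
    for vs, tex in sorted(draws, key=sort_key):
        if vs != prev_vs:
            vs_changes += 1
            prev_vs = vs
        if tex != prev_tex:
            tex_changes += 1
            prev_tex = tex
    return vs_changes, tex_changes


# Hypothetical scene: 4 vertex shaders, each used with both of 2 textures.
draws = [(vs, tex) for vs in range(4) for tex in range(2)]
print(count_state_changes(draws, lambda d: (d[0], d[1])))  # (4, 8)
print(count_state_changes(draws, lambda d: (d[1], d[0])))  # (8, 2)
```

Which order wins depends on the relative cost of each kind of change, which this sketch deliberately leaves out. Weighting the two counts with measured per-change costs would be the natural next step.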

#10
Aptitudinal Constituent
Join Date: Mar 2002
Posts: 869
Too much sorting can hurt performance too, though
__________________
Crusher The metric system is the tool of the devil! My car gets 40 rods to the hogshead, and that’s the way I likes it!

#11
Senior Member
Join Date: Jul 2002
Location: UK
Posts: 1,758
A good engine should be designed such that some degree of sorting is easy.
A sort by 'material' (shader, texture) combined with a sort by depth (front-to-back) usually provides the best combination. The usual recommendation is to find the closest 'object', render it, then render all other objects using the same material. Repeat until done. If the material changes are 'small', a more exacting depth sort may extract marginal improvements. But certainly the first thing you render should always be any close-up big polygons (walls, etc.). And ALWAYS render the sky last. Not first. Please. Generally you expect the number of fine-structure changes (texture etc.) to outnumber gross-structure changes (shader programs).
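The recommended ordering (closest object, then everything sharing its material, repeat) can be sketched as a greedy loop. The sketch assumes each object has a single material and a single representative depth, given as hypothetical `(name, material, depth)` tuples. Because the sky is the farthest object, it naturally ends up last.

```python
def material_then_depth_order(objects):
    """Greedy draw order: the nearest object, then every other object with
    the same material (nearest first), then repeat with what is left.

    objects: list of (name, material, depth) tuples; smaller depth = closer.
    """
    remaining = sorted(objects, key=lambda o: o[2])  # front to back
    order = []
    while remaining:
        material = remaining[0][1]  # material of the closest remaining object
        order += [o[0] for o in remaining if o[1] == material]
        remaining = [o for o in remaining if o[1] != material]
    return order


scene = [
    ("wall", "stone", 1.0),
    ("crate", "wood", 2.0),
    ("floor", "stone", 3.0),
    ("barrel", "wood", 5.0),
    ("sky", "sky", 100.0),
]
print(material_then_depth_order(scene))
# ['wall', 'floor', 'crate', 'barrel', 'sky']
```

Within each material group, objects stay in front-to-back order, so the depth sort is only relaxed across material boundaries.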

#12
Tea maker
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,379
Thanks TB. ATI's presentation was quite interesting. One thing I did find amusing was
Quote:
I'm guessing it wasn't ATI who introduced TEXKILL.
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson "I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay

#13
Tea maker
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,379
Quote:

#14
Senior Member
Join Date: Feb 2002
Posts: 2,636
Quote:

#15
Member
Join Date: Feb 2002
Posts: 331
I guess on ATI, clearing the Z is free, as you just invalidate the on-chip Z macroblocks. Then, if you draw the sky box with a large texture, you'll save work, as huge swathes of it will be occluded by scenery.

#16
Senior Member
Join Date: Jul 2002
Location: UK
Posts: 1,758
Quote:
Of course, all alpha-blended objects will have to be rendered after all opaque objects unless you have very clever occlusion tracking in your engine. But you may well have to strictly depth-sort these anyway unless all your blend operations are commutative.
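Why the blend order matters can be checked directly. The sketch below uses standard "over" alpha blending and additive blending with made-up colors. It composites the same two half-transparent layers in both orders. The "over" results differ between the two orders, while the additive results match, which is the commutativity the post refers to.

```python
def blend_over(dst, src, alpha):
    """Standard "over" blend per channel: src * alpha + dst * (1 - alpha)."""
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst))


def blend_add(dst, src, alpha):
    """Additive blend per channel: src * alpha + dst."""
    return tuple(s * alpha + d for s, d in zip(src, dst))


def composite(background, layers, blend):
    """Apply (color, alpha) layers to `background` in draw order."""
    color = background
    for src, alpha in layers:
        color = blend(color, src, alpha)
    return color


black = (0.0, 0.0, 0.0)
red = ((1.0, 0.0, 0.0), 0.5)
blue = ((0.0, 0.0, 1.0), 0.5)

print(composite(black, [red, blue], blend_over))  # (0.25, 0.0, 0.5)
print(composite(black, [blue, red], blend_over))  # (0.5, 0.0, 0.25)
print(composite(black, [red, blue], blend_add))   # (0.5, 0.0, 0.5)
print(composite(black, [blue, red], blend_add))   # (0.5, 0.0, 0.5)
```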

#17
Senior Member
Join Date: Jul 2002
Location: UK
Posts: 1,758
Quote:
I've seen several apps that issue a Z clear and then either don't draw the sky unless they have to, or skip the bits they don't have to - and draw it last.
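The effect of drawing the sky first or last can be counted per pixel. This is a simplified sketch with two assumptions: an exact per-fragment depth test, and no hierarchical or early-Z block granularity. It also covers the depth-only pre-pass described in the quoted ATI text. For that case it assumes the shading pass uses an EQUAL depth test; LESS_EQUAL would give the same counts here. The depths are hypothetical.

```python
def shaded_fragments(depths, depth_prepass=False):
    """Count pixel-shader invocations for one pixel.

    depths: depths of the fragments covering the pixel, in draw order
    (smaller = closer). Without a pre-pass, a fragment is shaded when it
    passes a LESS test against the depth written so far. With a depth-only
    pre-pass, the buffer already holds the nearest depth, so only fragments
    at exactly that depth pass an EQUAL test on the shading pass.
    """
    if depth_prepass:
        nearest = min(depths)
        return sum(1 for d in depths if d == nearest)
    shaded = 0
    depth_buffer = float("inf")  # cleared depth, treated as infinitely far
    for d in depths:
        if d < depth_buffer:
            depth_buffer = d
            shaded += 1
    return shaded


SKY, WALL = 0.99, 0.5  # hypothetical depths for one pixel
print(shaded_fragments([SKY, WALL]))                      # 2: sky first
print(shaded_fragments([WALL, SKY]))                      # 1: sky last
print(shaded_fragments([SKY, WALL], depth_prepass=True))  # 1
```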

#18
Senior Member
Join Date: Jul 2002
Location: UK
Posts: 1,758
Quote:

#19
Join Date: May 2002
Location: New York, NY
Posts: 12,678
Quote:
As for how the NV30 handles changing constants, well, that's a good question. It all depends on how nVidia made the chip. No matter how the chip stores the instructions, there is no fundamental need to retrieve an entire program to update one constant. In any case, I doubt that changing constants will ever be a performance bottleneck.
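This sketch illustrates how embedded constants need not force a full program reload. It assumes a hypothetical program image in which each instruction that uses a constant is followed by a copy of that constant, as post #3 recalls for NV30. It also assumes a table recording where each constant is embedded. Updating a constant then touches only those slots. The assembly strings and the `K0` name are made up, and whether the NV30 driver actually works this way is not established in the thread.

```python
def update_inline_constant(program, constant_slots, name, value):
    """Overwrite one inline constant in a program image, in place.

    program: list of slots; instruction slots hold strings and inline
    constant slots hold 4-float tuples (a hypothetical layout).
    constant_slots: maps a constant name to the slot indices embedding it.
    Returns the indices that changed, i.e. the only slots that would need
    to be re-uploaded.
    """
    changed = sorted(constant_slots[name])
    for slot in changed:
        program[slot] = tuple(value)
    return changed


program = [
    "MUL R0, T0, K0",      # hypothetical instruction using inline constant K0
    (1.0, 1.0, 1.0, 1.0),  # K0 stored right after the instruction
    "ADD R0, R0, K0",
    (1.0, 1.0, 1.0, 1.0),  # K0 embedded a second time
    "MOV OUT, R0",
]
slots = {"K0": [1, 3]}
print(update_inline_constant(program, slots, "K0", (0.5, 0.5, 0.5, 1.0)))
# [1, 3]
```

The cost of an update grows with the number of places the constant is embedded, not with the program length, which fits the view that constant changes need not be a bottleneck.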
__________________
April 20, 1979 - America must never forget.

#20
Professional Malcontent
Join Date: Feb 2002
Location: HTTP 404
Posts: 2,855
Quote: