So, do we know anything about RV670 yet?

WDDM 2.1?

In 2006 there was plenty of talk about WDDM; version 1.0 shipped with Vista and DX10. Those talks also mentioned WDDM 2.1, expected to arrive alongside DX10.1, and the benefits it should bring. For example:
WinHEC 2006 said:
Future Directions In Graphics:
  • Move to preemptive context switching and page-level memory management ASAP, WDDM 2.1
  • Video, Glitch-resilience: Preemptive context switching in WDDM 2.1 is key
  • WDDM 2.1 – efficient GPU virtualization
WinHEC 2006 said:
Desktop And Presentation Impact On Hardware Design: WDDM 2.1 and DX10.1
  • Advanced Scheduling with page level context switching
  • Direct impact on desktop scenarios
EliteBastards said:
WDDM 2.1
  • First on the list for WDDM 2.1 is further improvements to context switching abilities
  • The other major addition to WDDM 2.1 is a change to the way the GPU and its driver handles page faults.

Since then there has been absolute silence on the matter. What happened? Does WDDM 2.1 still exist? Does it still contain the main features outlined above? Does it come with DX10.1? Is Windows Media Player enhanced to be more "glitch-resilient"? Etc.
 
So call me a skeptic. G92 looks to be quite a bit larger than RV670. It's not rocket science to think that G92 would cost more per die than RV670 due to the area difference alone. Hence, NVIDIA won't want to compete at the same price point as RV670, because NVIDIA would make less money than AMD.
I'm not speaking for NVIDIA's CFO here, so I don't know exactly how margins are accounted for, but I would expect that the cost of a chip is determined based on the cost of the wafer and the *overall* wafer yield. So high-end bins would kinda sorta have artificially high margins, and low-end SKUs would have artificially low margins.

I don't know for sure if that is the case, but if so, it makes the margins easy to explain: overall G92 margins will be lower once cheaper SKUs based on it come out, since their costs will be identical but their ASPs will be lower. In roughly the same way, you'd expect the higher-end G92 SKUs to have a positive impact on margins, but they'd be lower volume.

I wouldn't expect lower-end G92 SKUs in Q4, so 50-55% margins don't seem impossible to me for the 8800 GT, assuming yields and costs are calculated as I would expect; and if they're blending the 8800 GTS in there, it should be even easier. On the other hand, Q1/Q2 G92 margins would presumably be lower due to <=6C SKUs, assuming these do happen (and I'd be surprised if there was a distinct chip for 192-bit, so!)
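To make that accounting assumption concrete, here's a minimal back-of-the-envelope sketch; every number in it (wafer price, candidate dice per wafer, yield, the SKU names and ASPs) is invented purely for illustration and is not a real NVIDIA or AMD figure:

```python
# Toy sketch of the cost/margin argument above. All numbers are hypothetical.
wafer_cost = 5000.0          # assumed 300 mm wafer price, in dollars
candidates_per_wafer = 100   # assumed G92 candidate dice per wafer
overall_yield = 0.60         # assumed fraction of dice usable in *some* SKU

# If cost is booked per usable die from the overall wafer yield, every SKU
# cut from the same chip carries the same cost...
cost_per_die = wafer_cost / (candidates_per_wafer * overall_yield)

# ...so the margin differences between SKUs come purely from ASP.
asps = {"high-end G92 bin": 180.0, "cheaper G92 SKU": 110.0}  # assumed ASPs
for sku, asp in asps.items():
    margin = (asp - cost_per_die) / asp
    print(f"{sku}: cost ${cost_per_die:.0f}, ASP ${asp:.0f}, margin {margin:.0%}")
```

With these made-up numbers the high-end bin lands in the 50-55% region and the cheaper SKU falls well below it, which is all the blended-margin argument above needs.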

Of course, this is all speculation in order to explain NVIDIA's financial claims. If you have *clear* evidence that these claims are false however, or that any of my above assumptions is flawed (and "I don't believe it!" doesn't count), then I'd love to be corrected.
 
Would you like to have the capability for fully compatible MSAA in all games?

Wavey, capability is useless if it's not implemented. Unless ATi plans to release a "Chuck patch" for every DX10 game on the market, it will remain simply a checklist feature. We all know devs aren't going to go out of their way to make anything work on ATi hardware.

It's down to the developer to support it, but indeed, titles that use deferred shading (which includes STALKER, UT3, Gears of War, R6V, etc.) are incompatible with MSAA. This has left IHVs sometimes trying to hack the renderer in order to support AA, with the potential side effects of IQ issues and lower performance, because you don't necessarily know which buffers need AA and which don't.

DX10.1 gives the developer control of the MSAA buffers, so future titles that use deferred shading could still enable AA from a DX10.1-enabled application.
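For anyone wondering why deferred shading and MSAA don't mix without that kind of per-sample control, here's a toy sketch; the numbers are invented and the "lighting" is reduced to a clamped dot product, but it shows the core problem: averaging (resolving) the G-buffer samples before lighting is not the same as lighting each sample and averaging afterwards.

```python
# Toy illustration of why MSAA breaks under deferred shading.
# At a geometry edge, one pixel's MSAA samples can hold two very different
# normals; the numbers and the trivial Lambert-style lighting are invented.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def light(normal, light_dir=(0.0, 0.0, 1.0)):
    # Brightness = max(N.L, 0)
    return max(dot(normal, light_dir), 0.0)

# Two MSAA samples in one pixel, straddling an edge between a surface
# facing the light and one facing away from it.
samples = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

# MSAA done properly: shade each sample, then average (resolve last).
per_sample = sum(light(n) for n in samples) / len(samples)

# Naive deferred: resolve (average) the G-buffer first, then shade once.
avg_normal = tuple(sum(c) / len(samples) for c in zip(*samples))
resolved_first = light(avg_normal)

print(per_sample)      # 0.5 -- the edge pixel is half lit, as expected
print(resolved_first)  # 0.0 -- the averaged normal is (0,0,0), so the edge goes dark
```

Getting the first answer rather than the second requires shading at sample granularity, which is exactly the per-sample buffer access a DX10.1 path can expose to the application.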

So that'd be a "checklist feature only" then, eh?
 
If ATI/AMD claimed that the 3870 will be 10% faster than the 2900 XT, that promise may well come true with the Catalyst 7.11 driver. :D

I don't care what Catalyst 7.9 shows for the 3870.
 
I'll wait for TR's & Xbit's reviews before I make any conclusions. Legion ain't exactly at the top of my "trusted sites" list.

The results in 4 out of the 6 biggest wins for the 8800 GT over the 2900 XT are not consistent with other reviews.

WiC, Crysis, BioShock and, to a lesser extent, Quake Wars (tested at High, which seems to favour G92, whereas at max settings the performance is almost even) are the ones I found to be quite different.
 
Read and discuss!

Stream computing

AMD is making it clear they're not going to cede the burgeoning GPGPU market to NVIDIA's G80, and the company's pre-launch press materials tout the 2900's usefulness in high-performance computing applications. In particular, there's a sort of software component to the new GPU that hasn't gotten much attention in any of the launch coverage, and indeed I hadn't seen news of it anywhere before coming across it in AMD's press materials.

The R600 is an extremely wide VLIW/SIMD design that relies heavily on a special software layer to dynamically manage its large volume of parallel execution resources. This software layer is called the Accelerated Computing Software Stack, and it includes both compile-time and run-time components. The compile-time component is a set of stream extensions for C/C++ and a math library that AMD calls ACML (probably for Accelerated Computing Math Library). These tools allow coders to write stream computing (or "data parallel") code in C and C++ for both the R600 and AMD's multicore CPUs.

This code isn't run natively on the AMD/ATI hardware, but instead it's passed to a runtime component called the Compute Abstraction Layer (CAL), which sits between the programmer and both the multicore CPU and the GPU and appears to contain a just-in-time (JIT) compiler that dynamically translates the code for either x86 or CUDA before passing it on to the appropriate piece of hardware.

The GPU's CTM assembler interface is itself covered by another hardware abstraction layer (HAL) that appears to reside within the CAL. Third party developers can write to either the CAL or the HAL, depending on whether they need to talk only to the GPU (via the HAL, as in the case of display drivers and some HPC applications) or to both the CPU and GPU (via the CAL, for generic "stream processing").

The CAL and HAL portions of ACSS are complex yet integral parts of the driver for the R600 family, and I'd bet money that together they're one of the bottlenecks that's holding back the system from achieving its full potential on gaming benchmarks. It appears that on all of the benchmarks run so far, both DX9 and DX10, all of the graphics calls are going to the CAL via the DirectX and OpenGL CAL bindings, where they're dynamically farmed out to the available stream computing resources on the GPU. If the CAL/HAL stack, which is a brand new piece of software that probably has quite a bit of optimization overhead left in it, doesn't do its job optimally, then the graphics code that's running on it won't be able to get peak performance out of the hardware.

People who really want to max out the R600 will write directly to the GPU hardware using CTM, bypassing the ACSS entirely. This is probably behind AMD's recent promise to open source the R600 drivers—they may be hoping that developers will step up and use CTM to write card-specific drivers that are fully optimized, game-console-style, so that all of the R600's potential can be unlocked.

It could also be the case that AMD themselves will write OpenGL and DirectX drivers that run directly on the GPU using CTM.

At any rate, using a runtime component to dynamically divide up a workload among a highly parallel architecture that runs at a lower clockspeed (the R600 approach) sounds a lot harder to me than writing directly to a narrower but faster architecture (the G80 approach). I think that this is why the HD 2000 series not only hasn't dethroned the G80 at the very top of the graphics heap, it hasn't even tried. The highest-end card announced today is the $399 ATI Radeon 2900 XT, a part that falls well below NVIDIA's top-of-the-line card in both price and performance. The 2900 XT isn't reaching its peak potential with its current software stack, so throwing more GDDR3 and more clockspeed at it for a boutique $1,000 card wouldn't buy it enough performance to stand up to the high-end G80.
 
Clueless. e.g. "appears to contain a just-in-time (JIT) compiler that dynamically translates the code for either x86 or CUDA".

Oh dear.

Jawed
 
[image: 8brmb1v.gif]

Nice image 'comparison' there!..
C'mon, guys, that's not even weak, it's appalling.
 
[image: 8brmb1v.gif]

Nice image 'comparison' there!..
C'mon, guys, that's not even weak, it's appalling.

What, you mean comparing pics taken at different distances from the object of interest isn't valid? It's not like these games have a LOD system in place or anything... and we all know nV are the cheatorrrrzzzz, so... :D
 