Nvidia GT300 core: Speculation


That is interesting -- and I wonder which instructions dominate in the SFU as well. Might not be DIV at all. And to nao's point, not a lot of FMAs either. It would be instructive, I would think, to also know the breakdown of MUL vs. ADD. Clearly, of the programs they ran, the DIV:MUL ratio is less than the 1/2 I estimated (from FMA alone), but now I'm curious :)
 
I did an analysis of these games with RV770, and it has even less dependence on BW for Crysis. Crysis is a bad game to evaluate this with, too, as the timedemos/walkthroughs that most reviewers use definitely have some parts that are CPU limited.
Curiously, the analysis that we did for RV790 clock settings said that Crysis (or Crysis Warhead - forget which) was one of the few titles that gained with more bandwidth on this arch.

Generally speaking though, internal testing on Cypress has indicated findings similar to the FS overclocking results - it benefits more from engine speed than, at least, I expected.
 
If you gain 8% from a 9% increase in engine and memory clocks, how can you claim it's CPU limited?

-FUDie
I said parts are CPU (or PCI-e) limited, and your numbers are wrong, too. He overclocks the GPU by 9.4% and mem by 12.5%. He gets 7.2-8.5% gain, depending on resolution and settings. 100% GPU limited would have given over 10%.

Anyway, that's all beside the point. nAo is right in saying RV870 is more BW limited than RV770. Look at this graph:
http://www.firingsquad.com/hardware/ati_radeon_4850_4870_performance/images/lpoc.gif
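
For what it's worth, here's a rough back-of-the-envelope way to turn those numbers into a "how GPU-limited is it" estimate. This is just my own simple two-part model (frame time = a part that scales with engine clock + a part that doesn't, e.g. CPU/PCI-e), ignoring that memory was overclocked a bit more, and not anything from the FS article:

```cpp
// Rough sketch, my own assumption: split frame time into a fraction f that
// scales with the GPU clock and a fraction (1 - f) that doesn't (CPU, PCI-e).
// Then the observed speedup S = 1 / (f / clock_gain + (1 - f)), so
// f = (1 - 1/S) / (1 - 1/clock_gain).
#include <cstdio>

int main()
{
    const double clock_gain = 1.094;                // +9.4% engine clock
    const double speedups[] = { 1.072, 1.085 };     // observed 7.2% and 8.5% gains

    for (double s : speedups) {
        double f = (1.0 - 1.0 / s) / (1.0 - 1.0 / clock_gain);
        printf("speedup %.1f%% -> roughly %.0f%% of frame time GPU-limited\n",
               (s - 1.0) * 100.0, f * 100.0);
    }
    return 0;
}
```

That lands somewhere around 80-90% GPU-limited for those runs, which is consistent with "parts are CPU (or PCI-e) limited" rather than the whole thing.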
 
Yea, today's keynote is a key point for me :)
Will they give the final specs? Will there be actual working silicon? Will they mention a launch date?

I feel like I'm in a black hole in terms of upgrading right now. nVidia doesn't have DX11 yet, and not sure when they'll have it, or how good it's going to be.... AMD doesn't have OpenCL and DirectCompute yet, and no clue about when they'll deliver that.
I'm only going to upgrade to a card that does DX11 AND OpenCL/DirectCompute out of the box (my 8800GTS already does DX10 and OpenCL/DirectCompute; it would be pointless to get an HD5850 now if I still need to keep the 8800GTS around for software development, which is my primary use for the card).
Will be interesting to see who's first to offer me what I want.

What do you mean by AMD doesn't have DirectCompute and OpenCL?!

Not only does HD 5000 series support both, it's actually faster by a fair amount compared to nV GTX series.

http://www.anandtech.com/video/showdoc.aspx?i=3643&p=8

That's nV's Ocean demo running on DirectCompute.
 
I think the biggest issue is that of using function pointers. A lot of the object-oriented features of C++ are implemented through the manipulation of function pointers. As far as I know, they could only branch with a fixed offset so far.
This reminds me of a certain paper on compiler optimisation technology. The idea was to use a novel technique to implement "virtual" (in the C++ sense) method invocations in the generated machine code.

The typical way of doing this is to use virtual-method tables and indirect branches. In the paper, they didn't generate any indirect branches. Instead, at each call site a tiny binary search tree traversal was generated (inline) to find the right jump target among a set of precalculated candidates, all done with conditional branches. The goal was to better utilise the CPU's branch prediction resources, which allegedly were underutilised with the typical approach.

This optimisation technique relied on whole-program analysis (as opposed to the dumb linking of separately compiled fragments that is typical in the C/C++ world) to make these search trees really tiny, or even to eliminate the need for a search altogether (for as many as 90% of virtual method invocations in the tested programs).

If this technique were used, I think you could run fully OO code on a GPU even now (ignoring SIMD, memory organisation etc., of course).
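
To make that concrete, here is a minimal sketch of what such a call site could look like at the source level. The class names, type IDs and the three-target case are made up for illustration; a real compiler would emit this at the machine-code level from whole-program analysis:

```cpp
// Hypothetical example: dispatching a "virtual" call without an indirect
// branch. Instead of loading a function pointer from a vtable, the call site
// does a tiny (inline) search over the precomputed set of possible targets,
// using only conditional branches with fixed offsets.
#include <cstdio>

struct Shape { int type_id; float a, b; };   // 0 = Circle, 1 = Square, 2 = Rect

static float area_circle(const Shape& s) { return 3.14159f * s.a * s.a; }
static float area_square(const Shape& s) { return s.a * s.a; }
static float area_rect  (const Shape& s) { return s.a * s.b; }

// What the compiler would emit in place of a virtual `s.area()` call:
static float area_dispatch(const Shape& s)
{
    // Binary-search-style tests over the three known implementations.
    if (s.type_id <= 0)
        return area_circle(s);          // type_id == 0
    if (s.type_id == 1)
        return area_square(s);
    return area_rect(s);                // type_id == 2
}

int main()
{
    Shape shapes[] = { {0, 2.0f, 0.0f}, {1, 3.0f, 0.0f}, {2, 2.0f, 5.0f} };
    for (const Shape& s : shapes)
        printf("area = %f\n", area_dispatch(s));
    return 0;
}
```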
 
Generally speaking though, internal testing on Cypress has indicated findings similar to the FS overclocking results - it benefits more from engine speed than, at least, I expected.
I think most people expect BW to make more of a difference than it actually does. Take a look at my findings here:
http://forum.beyond3d.com/showthread.php?t=48761

In most games, the 4850 is BW limited for less than 30% of the frame time.
 
That's hardly an excuse, AMD didn't suffer as much so it comes down to NV's design.
So RV740 availability 6 months after it was announced means that AMD didn't suffer as much, eh? And its price parity with the 4850 surely means the same thing?

They both need a 40nm process and it's "safe" for IHVs to build an "easy" chip on a new process before attempting a behemoth.
It's not "safe", it's easier, less risky and it's done so that later you won't make the same mistakes with a bigger chip.

You think NVidia was entirely blameless?
I don't have enough info for any conclusions right now.
And it puzzles me when I see someone who apparently does.
It puzzles me even more to read about G300 delays while the initially planned launch timeframe hasn't even passed yet.
All we have right now is a delay of the GT21x series, for which TSMC is the one to blame. That's all. How GT21x power/price/performance and GF100 end up, we'll eventually see.
 
DirectCompute works, but only on the HD5800-series.
By my testing, DirectCompute doesn't work at all on any NVIDIA parts yet (maybe it does on Win7?)... not even DirectCompute 4. AMD is clearly a step ahead here with DirectCompute 5 working even on Vista.
 
Yea, today's keynote is a key point for me :)
Will they give the final specs? Will there be actual working silicon? Will they mention a launch date?

I feel like I'm in a black hole in terms of upgrading right now. nVidia doesn't have DX11 yet, and not sure when they'll have it, or how good it's going to be.... AMD doesn't have OpenCL and DirectCompute yet, and no clue about when they'll deliver that.
I'm only going to upgrade to a card that does DX11 AND OpenCL/DirectCompute out of the box (my 8800GTS already does DX10 and OpenCL/DirectCompute; it would be pointless to get an HD5850 now if I still need to keep the 8800GTS around for software development, which is my primary use for the card).
Will be interesting to see who's first to offer me what I want.

The Jensen keynote will be live on Nvidia.com at 1 PM; I am here at the GTC now
- they say about 1/3rd of it requires 3D glasses ... so you know it will be a lot of 3D

What I am interested in is the PRESS conference after the keynote; it is at 2:45 PM
- I expect a lot more to be revealed then

The Fairmont Hotel, San Jose is such a cool place for a technology conference; Nvidia has an entire floor for it ... and, best of all, good food is free for the press
:p
 
Specs from Bright Side of News

3.0 billion transistors
40nm TSMC
384-bit memory interface
512 shader cores [renamed into CUDA Cores]
32 CUDA cores per Shader Cluster
1MB L1 cache memory [divided into 16KB Cache - Shared Memory]
768KB L2 unified cache memory
Up to 6GB GDDR5 memory
Half Speed IEEE 754 Double Precision

By comparison, ATI RV870:

20 SIMDs
16 KB L1 (texture) cache per SIMD = 320 KB texture cache
8 KB L1 cache per SIMD for computational work = 160 KB computational cache

32 KB local data share per SIMD = 640 KB local data share

128 KB L2 cache per memory controller = 512 KB L2 cache

L1 cache bandwidth: 1 TB/s

L2-to-L1 bandwidth: 435 GB/s
 
By my testing, DirectCompute doesn't work at all on any NVIDIA parts yet (maybe it does on Win7?)... not even DirectCompute 4. AMD is clearly a step ahead here with DirectCompute 5 working even on Vista.

DirectCompute 4 works out-of-the-box on Win7 with 190-release drivers.
It works on Vista as well, but by default it is disabled through a registry key.
The release notes of the GPU Computing SDK tell you how to enable it (that's where Anandtech got the Ocean demo from).
So looks like nVidia is ahead. They've had support on release drivers for a while, and I don't think we need to compare the installed base :)
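
For anyone who wants to check this on their own box, here is a small sketch of how I'd probe it through the D3D11 API. Nothing vendor-specific is assumed; the point is that on feature-level 10.x hardware, compute shader 4.x support is optional and has to be queried explicitly:

```cpp
// Hypothetical probe for DirectCompute support via the D3D11 runtime.
// Feature level 11_0 implies CS 5.0; on 10.x-level hardware, CS 4.x is an
// optional cap reported through CheckFeatureSupport.
#include <d3d11.h>
#include <cstdio>

int main()
{
    ID3D11Device* device = nullptr;
    D3D_FEATURE_LEVEL level;

    HRESULT hr = D3D11CreateDevice(
        nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
        nullptr, 0,                 // let the runtime pick the highest feature level
        D3D11_SDK_VERSION, &device, &level, nullptr);
    if (FAILED(hr)) return 1;

    if (level >= D3D_FEATURE_LEVEL_11_0) {
        printf("CS 5.0 available (feature level 11_0)\n");
    } else {
        // On 10.x hardware, CS 4.x is optional; query it explicitly.
        D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS opts = {};
        device->CheckFeatureSupport(
            D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS, &opts, sizeof(opts));
        printf("CS 4.x available: %s\n",
               opts.ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x
                   ? "yes" : "no");
    }

    device->Release();
    return 0;
}
```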
 
If the BSN specs are accurate, then the L2 size alone indicates that is one part of Larrabee's design that wasn't copied.
A scheme similar to Larrabee's (in particular the tiling) would need the capacity such an L2 affords, and the L2 given isn't much bigger than that of Cypress.
 
It works on Vista as well, but by default it is disabled through a registry key.
Why on earth would they do that? The rest of "feature level 10" on DX11 interfaces seems to work fine... silly decision IMHO, but thanks for the pointer. I'll go look into that.

So looks like nVidia is ahead. They've had support on release drivers for a while, and I don't think we need to compare the installed base :)
Huh? You're arguing that ComputeShader 4 support on G80+ HW with a registry key setting somehow puts them "ahead" of ATI's full ComputeShader 5 implementation that works "out of the box" on their latest hardware? From a developer's point of view, you and I have different definitions of "ahead"...

No point in arguing though, the key point is that I can write CS5 code right now on AMD parts, with no ETA on when I can do that on NVIDIA. This puts AMD as the obviously more useful piece of hardware at my disposal right now :)
 
From this Russian site, one curious line:

Отсутствие аппаратного блока тесселяции, данный функционал будет реализован программно; -- there is no hardware tessellation unit; this functionality will be implemented in software;

:rolleyes:
 