Nvidia GT300 core: Speculation

I haven't seen this before, but it just looks like a piece of the hierarchical rasterisation feature set that we've seen in other NVidia patents.
I wasn't aware of that. :(

I can't see anything there that's meaningfully beyond G80.
Given that it was filed only after G80's launch (and thus hasn't been issued yet), I doubt this particular assumption.
 
Yea, 3DMark03 is the one that the FX failed in.
It contained two ps1.x game tests, and the nature test with ps2.0.
FX did fine in the ps1.x tests, but completely died in the ps2.0 test.
Then a driver update appeared where the ps2.0 performance was 'fixed'... nVidia had replaced everything with int and half-precision shaders, and also 'optimized' some other things, like not rendering things that were outside the visible range (abusing the fact that the camera path was fixed). It ran about as fast as ATi's stuff, but it suffered from blocky aliasing because of the limited precision.
That's when Futuremark started with the whole driver approval thing.
Funnily enough, many people couldn't believe the FX series was THAT bad at ps2.0, and suspected foul play from FM/ATi instead. Then again, who could blame them, really? Games only used fixed function or ps1.x, and there was no reason to assume performance problems based on that.

If I remember correctly, two of the game tests that did pixel shader 1.1 were actually using 1.4 on any card that supported it. So GT2 and GT3 were actually 1.4 shaders, unless those were replaced. 1.1 shaders were not always faster than 1.4 shaders on the FX cards; it really depended on register usage.
 
So the restrictions are at the compiler/library level?
Restrictions mostly at the hardware level.

I dare say I'm getting a sense that GPU/game programmers will be blazing a trail, from what you've described. Though there's still a very tricky scaling question beyond a single GPU. I stumbled into this:

http://insidehpc.com/2009/05/12/argonne-researchers-receive-award-for-mpi-performance-study/

which paints a grim picture.

Not grim IMO; rather, it shows what will become important. For example, note how the BG/P OS doesn't do disk-backed memory: pages are always physically pinned so the DMA engine has low latency and the CPU doesn't touch pages during communication. What I gather from all of it is that eventually the hardware is going to consist of cores and an interconnect which provides dedicated hardware support for the most important parallel communication patterns, so that the cores aren't involved in communication which is latency bound. Things like CPUs manually doing all the work on interrupts (preemption) just aren't going to scale... nor are ALUs doing atomic operations on shared queues between cores... etc. I think all this goes away at some point in favor of dedicated hardware, and a different model of general purpose computing.

My little brother (James Lottes, different last name) worked at Argonne in the MCS Division on tough scaling issues for Bluegene (until he decided to go back to get his PhD this year; now he works there on/off). An interesting paper related to the issues of scaling algorithms in interconnect-limited cases, http://www.iop.org/EJ/article/1742-6596/125/1/012076/jpconf8_125_012076.pdf?request-id=12293745-5238-4326-9be2-43b91b4c4753, covers how they adjust data exchange strategies for the problem to lower network latency.

Are global fetches cached? I disagree fundamentally on the cache question - just because you can hide latency doesn't mean performance is fine without a cache.

If you haven't read this PTX simulator paper, http://www.ece.ubc.ca/~aamodt/papers/gpgpusim.ispass09.pdf, you might find it interesting. Their results showed performance more sensitive to interconnection network bisection bandwidth rather than latency. They also added a cache in their simulation, which indeed helped some of the apps, but also reduced the performance of a lot of them.
 
If I remember correctly, two of the game tests that did pixel shader 1.1 were actually using 1.4 on any card that supported it. So GT2 and GT3 were actually 1.4 shaders, unless those were replaced.

You are correct that they were ps1.4 on hardware that supported it.
And I'm not entirely sure, but I vaguely recall that nVidia may have reported ps1.1 capability in those tests because it ran faster than ps1.4 on the FX.
 
1. I think you're confused there, as I believe you mean 3DMark2000, not 2001. 2000 was a DX6/7 tester and 2001 was DX7/8. 03/05/06 are DX9 for the most part, with maybe a tiny bit of DX8.
2. Secondly, I've scored damn near 10k in 2001 with a GF2 (a 3D Prophet that made NV angry for being as fast as a Pro card because of its core and memory OC) and a P3 1GHz. I've yet to see any 2GHz single-core processor with any non-TnL GPU come close to that. Hell, my laptop with an ATI Xpress 200 and an A64 s754 3200+ doesn't even top 5k in 2001.
3DMark 2001 is called a DX8 test, but it doesn't test real DX8 capabilities at all:

You can run tests 1-3 in full quality on any DX6-compatible graphics card. No effect will be missing. The only advantage of a DX7/8 graphics card in these tests is hardware-accelerated geometry.

Test 4 uses PS1.1 on the lake surface, which is shown for 15-20% of the testing time - that's the only DX8-exclusive effect, which can reflect DX8 performance in the score.

Score is calculated via this formula: (total low-detail FPS * 10) + (total high-detail FPS + nature FPS) * 20

Here are the results of a DX8 graphics card: (107,1 + 98,6 + 103,2)*10 + (41,4 + 67,3 + 46,9 + 29,4)*20 = 6789 3D Marks

The last value (29,4) is the framerate in the Nature test. Imagine that the graphics card were so crappy at pixel shading that the performance in the PS/lake scenes would be zero. We know that the lake scenes take about 18% of the test time, so it's quite easy to work out what the framerate would be: 29,4*0,82 = 24,1 FPS

If I use the 3DMark formula, the graphics card would score 6683 3D Marks. Well, this "DX8 benchmark" shows a 1,5% difference between a fast DX8 graphics card and a graphics card with zero DX8 performance.
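
For clarity, here is the same arithmetic as a tiny Python sketch; the formula and every number are taken straight from the post above (decimal points instead of commas, since that's what Python needs):

Code:
# 3DMark 2001 score formula as quoted above, with this post's numbers.
def score_3dm01(low_detail_fps, high_detail_fps):
    return sum(low_detail_fps) * 10 + sum(high_detail_fps) * 20

low  = [107.1, 98.6, 103.2]        # low-detail game tests
high = [41.4, 67.3, 46.9, 29.4]    # high-detail game tests, last value is Nature

print(score_3dm01(low, high))      # ~6789 3D Marks

# Now assume zero pixel-shading performance in the lake scenes (~18% of Nature):
nature_no_ps = 29.4 * 0.82         # ~24.1 FPS
print(score_3dm01(low, high[:3] + [nature_no_ps]))  # ~6683 3D Marks, ~1.5% lower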

Do you understand now why I rate 3DMark 2001 as a DX6 test? ;)

As for a GeForce 2 scoring near 10k in 3DM01 - are you sure? A 10k score was typical for a GeForce 4 Ti...
I've yet to see any 2GHz single-core processor with any non-TnL GPU come close to that.
You don't need a non-TnL GPU to prove my point. Just switch to SW TnL in 3DMark. For the majority of DX7 TnL cards, SW TnL on a 2GHz+ CPU will score slightly better in the 3DMark score. The 8-lights test score will be about twice as high with SW TnL.

The real performance advantage of the GF2 wasn't hidden in the TnL engine, but in the 4x2 configuration. The competition was 4x1, 2x2, or 2x3 - the GF2 simply offered almost double the fill rate...
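
To make that concrete, here's a rough sketch of the pipelines-times-TMUs math; the 200 MHz clock is a hypothetical equal value for all layouts, purely to compare configurations rather than actual products:

Code:
# Illustrative fill-rate math: pixel fill = pipelines x clock,
# texel fill = pipelines x TMUs per pipe x clock.
# The 200 MHz clock is assumed equal for all layouts, just to compare configs.
def fill_rates(pipes, tmus_per_pipe, clock_mhz=200):
    pixel = pipes * clock_mhz                  # Mpixels/s
    texel = pipes * tmus_per_pipe * clock_mhz  # Mtexels/s
    return pixel, texel

for name, cfg in {"4x2": (4, 2), "4x1": (4, 1), "2x3": (2, 3), "2x2": (2, 2)}.items():
    print(name, fill_rates(*cfg))

At equal clocks, the 4x2 layout has double the pixel fill of the 2-pipe designs and roughly double the texel fill of 4x1/2x2, which is the gap being described.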
 
I think you have to see physics much like shadows.
When the first games with dynamic shadows arrived (eg Doom3), the effect was VERY expensive, and didn't do much for gameplay itself.
But they did make the game look nicer and more realistic, and now all games have it, and people take the performance hit for granted.

I'd thank you but the forums don't use thanks.

Oh wait! 'Thanks'

:D

US
 
3DMark 2001 is called a DX8 test, but it doesn't test real DX8 capabilities at all:

You can run tests 1-3 in full quality on any DX6-compatible graphics card. No effect will be missing. The only advantage of a DX7/8 graphics card in these tests is hardware-accelerated geometry.

Test 4 uses PS1.1 on the lake surface, which is shown for 15-20% of the testing time - that's the only DX8-exclusive effect, which can reflect DX8 performance in the score.

Score is calculated via this formula: (total low-detail FPS * 10) + (total high-detail FPS + nature FPS) * 20

Here are the results of a DX8 graphics card: (107,1 + 98,6 + 103,2)*10 + (41,4 + 67,3 + 46,9 + 29,4)*20 = 6789 3D Marks

The last value (29,4) is the framerate in the Nature test. Imagine that the graphics card were so crappy at pixel shading that the performance in the PS/lake scenes would be zero. We know that the lake scenes take about 18% of the test time, so it's quite easy to work out what the framerate would be: 29,4*0,82 = 24,1 FPS

If I use the 3DMark formula, the graphics card would score 6683 3D Marks. Well, this "DX8 benchmark" shows a 1,5% difference between a fast DX8 graphics card and a graphics card with zero DX8 performance.

Do you understand now why I rate 3DMark 2001 as a DX6 test? ;)

As for a GeForce 2 scoring near 10k in 3DM01 - are you sure? A 10k score was typical for a GeForce 4 Ti...

You don't need a non-TnL GPU to prove my point. Just switch to SW TnL in 3DMark. For the majority of DX7 TnL cards, SW TnL on a 2GHz+ CPU will score slightly better in the 3DMark score. The 8-lights test score will be about twice as high with SW TnL.

The real performance advantage of the GF2 wasn't hidden in the TnL engine, but in the 4x2 configuration. The competition was 4x1, 2x2, or 2x3 - the GF2 simply offered almost double the fill rate...


I'm sorry, but I disagree with you, and for fun and giggles I will put together a P4 2.8GHz HT FSB800 machine, use a GF4/3 or 2MX (depending on what I can find stashed away), and post the numbers from 2001. And I guarantee that SW T&L will not be faster than hardware, except maybe for the 2MX.
 
Thanks for the responses on physics on the upcoming GPUs.

DX11 is supposed to have some physics implementation, and with the new GPUs being a lot more powerful than the current crop, it had me thinking and wondering whether physics could be implemented with little or no performance loss in fps.

Looking at the PhysX Sacred 2 patch (youtube video of the differences here) has me thinking that physics on GPUs will be a really good thing.

The amount of memory and bandwidth that the new GPUs will have, with faster and more efficient cores and shaders, should help with getting more and better physics in games, or at least that's my hope.

As mentioned, fluid and cloth physics should get a boost imo (being easier to simulate); environmental destruction physics, though, is a bit more taxing.

Still, all this would be nice if it gets the support it requires, whether through PhysX, Havok Physics or further implementations in DirectX from MS.

For me the most important physics improvement: HAIR. When will we have proper hair physics?
 
Given that it was filed only after G80's launch (and thus hasn't been issued yet), I doubt this particular assumption.
The provisional filing date was a year earlier. I'm not even sure what value there is in a comparison of patent application filing date and launch date for a technology.

Jawed
 
Normally, you file your patent as soon as you're done with your work and don't wait 'til all the other execution stages + marketing are done as well.

But since the provisional filing was a year earlier, which I did not notice, this is also moot.
 
Nvidia's G300 has taped out. It is reportedly running well at the A1 stepping.
The GDDR5 memory it uses clocks higher than 1,000 MHz, so you can expect a bandwidth higher than 256 GB/s.

Source: Hardware-Infos
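
As a quick sanity check of that figure, assuming a 512-bit bus (my assumption, not stated in the news item):

Code:
# Back-of-the-envelope GDDR5 bandwidth check.
# Assumption (not from the source): a 512-bit memory bus.
def gddr5_bandwidth_gbps(command_clock_mhz, bus_width_bits=512):
    # GDDR5 moves 4 bits per pin per command clock.
    bits_per_second = bus_width_bits * command_clock_mhz * 1e6 * 4
    return bits_per_second / 8 / 1e9  # GB/s

print(gddr5_bandwidth_gbps(1000))  # 256.0 GB/s; clocks above 1,000 MHz exceed that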
 
That (doubling bandwidth to ~280ish GByte/sec.) would IMO only be necessary if they've really decided to ditch the FF-ROPs (thus also removing quite a bit of compression/decompression hardware) and are doing all this stuff in the shader ALUs.

If I am not mistaken, the scheduler/scoreboarding stuff could also be simplified quite a lot with this step, since each pixel/thread is effectively "fire and forget", once it's left for the shader core. If there's geometry stuff to be done, it can be re-queued from VRAM.
 
That (doubling bandwidth to ~280ish GByte/sec.) would IMO only be necessary if they've really decided to ditch the FF-ROPs (thus also removing quite a bit of compression/decompression hardware) and are doing all this stuff in the shader ALUs.

If I am not mistaken, the scheduler/scoreboarding stuff could also be simplified quite a lot with this step, since each pixel/thread is effectively "fire and forget", once it's left for the shader core. If there's geometry stuff to be done, it can be re-queued from VRAM.

Noooo, R600 all over again, noooo!

Seriously, if that were the case, I hope they have a real shader-based AA solution this time, or what amounts to the same thing: lots of flops!
 
That (doubling bandwidth to ~280ish GByte/sec.) would IMO only be necessary if they've really decided to ditch the FF-ROPs (thus also removing quite a bit of compression/decompression hardware)
I know NVIDIA's design decisions haven't always impressed everyone lately, but I hope you're not suggesting they replaced all their engineers with drunk monkeys?
 
@Love_in_Rio: First of all, I think it could make a difference whether you plan your architecture around this "feature"/"economization" from the start or have to bolt it on afterwards.

Second: please look at what Edge-Detect AA costs you on an HD 4890. I've just had time to run Deep Freeze from 3DMark 06 (at least it uses HDR rendering) at 1680x1050:

1x MSAA: 72,2 Fps
4x MSAA: 53,2 Fps
8x MSAA: 42,3 Fps
4x & EDAA: 47,3 Fps

Nice, isn't it?
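
If it helps to read those numbers as frame times rather than fps, here's a trivial conversion (the fps values are the ones above):

Code:
# Convert the HD 4890 results above from fps to per-frame cost in milliseconds.
results_fps = {"1x MSAA": 72.2, "4x MSAA": 53.2, "8x MSAA": 42.3, "4x + EDAA": 47.3}
base_ms = 1000.0 / results_fps["1x MSAA"]
for mode, fps in results_fps.items():
    frame_ms = 1000.0 / fps
    print(f"{mode}: {frame_ms:.1f} ms/frame (+{frame_ms - base_ms:.1f} ms vs. 1x)")

So the edge-detect filter adds roughly 2-2.5 ms on top of plain 4x MSAA in this scene.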

@Arun:
At least shader-based AA seems feasible IMO. What else would you suggest one could need that amount of bandwidth for? We're talking about doubling again! If it's at all true, that is.
 


Would GF2/4MXs be fine by you then? It's not like the T&L engine stopped being fixed-function on the 3/4s. And you claimed SW T&L on a 2GHz+ proc would be faster than on the majority of DX7-capable hardware. GF3/4s are capable of DX7, or did they stop supporting it when they became DX8-capable? Something tells me they didn't.
 