Beyond3D Forum

The NEXT LAST R600 Rumours & Speculation Thread

leoneazzurro 02-May-2007 21:28

Quote:

Originally Posted by Rys (Post 980606)
G80's sampler hardware can fetch and filter in the same clock, yes, and at full speed.

Thank You very much. :)

Sound_Card 02-May-2007 21:31

Quote:

Originally Posted by leoneazzurro (Post 980609)
I remember reading that the L2 cache is 256 KB per texture unit group, 1 MB total for the texture units.

Ahh, thanks. Is it me, or is that a good amount of cache?

What about L1?

Khronus 02-May-2007 21:31

Quote:

Originally Posted by jimmyjames123 (Post 980550)
I wonder if Fuad took a look at this graph from NVIDIA:

http://enthusiast.hardocp.com/image....1fMV8xX2wuanBn

No doubt, the one big drawback of a 2900 XT CrossFire system is that its power requirements will be absolutely huge, even though performance should be very good. An 8800 GTS SLI system, on the other hand, would have much more modest power requirements, but the Vista SLI drivers need major work.

On a side note, glad to see that this graph did not start at 147W and then scale to 300W from there :D


Umm, did you guys notice the Nvidia logo in the top-right corner of that power-consumption pic? I take it with a grain of salt.

leoneazzurro 02-May-2007 21:32

Quote:

Originally Posted by Sound_Card (Post 980613)
Ahh, thanks. Is it me, or is that a good amount of cache?

What about L1?

I have no info for the moment, sorry :(.

nAo 02-May-2007 21:37

Quote:

Originally Posted by leoneazzurro (Post 980609)
I remember reading that the L2 cache is 256 KB per texture unit group, 1 MB total for the texture units.

1MB of L2 cache would IMHO be a complete waste of die area, at least for typical applications (games); you can happily live with way less than that.

compres 02-May-2007 21:41

Quote:

Originally Posted by nAo (Post 980618)
1MB of L2 cache would IMHO be a complete waste of die area, at least for typical applications (games); you can happily live with way less than that.

To be honest, that sounds like a lot, especially on a GPU, but maybe it has to do with the ring bus and its latency. Perhaps a ring with more cache worked out better for them than a crossbar. Also, cache takes less area per transistor than other functional units.

edit: spelling

Kaotik 02-May-2007 21:42

Quote:

Originally Posted by Khronus (Post 980614)
Umm, did you guys notice the Nvidia logo in the top-right corner of that power-consumption pic? I take it with a grain of salt.

Take the nVidia numbers with a grain of salt, perhaps, but the AMD/ATI numbers are already known to be false, as stated on the last page :wink:

leoneazzurro 02-May-2007 21:44

Quote:

Originally Posted by nAo (Post 980618)
1MB of L2 cache would IMHO be a complete waste of die area, at least for typical applications (games); you can happily live with way less than that.

Waste or not, this is what I read :)

http://forum.beyond3d.com/showpost.p...postcount=3635

flippin_waffles 02-May-2007 21:48

Quote:

Originally Posted by Khronus (Post 980614)
Umm, did you guys notice the Nvidia logo in the top-right corner of that power-consumption pic? I take it with a grain of salt.


Yeah, that's what I thought too. I'm curious where NV got the numbers from though! They must have a card then, no?

Unknown Soldier 02-May-2007 21:52

Quote:

Originally Posted by Kaotik (Post 980555)
It's simple: it can NOT be true, due to the fact that it works with 2x6-pin plugs

I seem to remember that it has one 6-pin and one 8-pin connector, and that if you don't use 6+8 you can't overclock.

You can use 6+6 but cannot overclock.

US

Silent_Buddha 02-May-2007 21:54

However, considering AMD is really focused on GPGPU, is 1 MB of L2 really wasted in that arena?

Is it possible that R600 is trying to be too many things in too many market segments?

After all, it appears that AMD/ATI is trying to position it as a top graphics performer, a top physics processor, and a top GPGPU unit.

So is it possible they expended a lot of transistors on things that would benefit GPGPU greatly but have low to minimal impact on 3D rendering?

Regards,
SB

aeryon 02-May-2007 21:57

Quote:

Originally Posted by Geeforcer (Post 980563)
I keep forgetting: what is a PCI-E slot specced at, 75W? If so, then yeah, maximum power consumption could not exceed 225W.

R600 is PCI-E 2.0 compliant, so it can take up to 130W from the slot

Russell 02-May-2007 22:01

Quote:

Originally Posted by aeryon (Post 980636)
R600 is PCI-E 2.0 compliant, so it can take up to 130W from the slot

Except that a) it's also PCI-E 1.1 compliant, and as such can work on any current PCI-E board within its supplied power envelope; and b) there aren't exactly a lot of PCI-E 2.0 motherboards around right now, so if R600 required one to run, they'd have to delay it another 6 months.

Has R600's PCI-E 2.0 compliance been verified? I recall it as a rumor, but I am uncertain whether it ever moved past rumor status.

Geeforcer 02-May-2007 22:11

Quote:

Originally Posted by compres (Post 980608)
I don't get it...

So do you mean nVidia did well or badly by releasing the Ultra today?

I think the tests speak for themselves:

http://forum.beyond3d.com/attachment...1&d=1178140185

Fornowagain 02-May-2007 22:19

What's the image? I can't see it for some reason.

Kaotik 02-May-2007 22:21

Quote:

Originally Posted by Unknown Soldier (Post 980632)
I seem to remember that it has one 6-pin and one 8-pin connector, and that if you don't use 6+8 you can't overclock.

You can use 6+6 but cannot overclock.

US

Even if you can't OC with 6+6, it still means that it's under 225W max, since it works with 6+6, and there always has to be at least a couple of watts of breathing space.
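The connector arithmetic behind these last few posts can be sketched as below. It assumes the commonly cited PCI Express limits of 75W from an x16 graphics slot, 75W per 6-pin and 150W per 8-pin auxiliary connector; it does not settle the rumoured 130W PCI-E 2.0 slot figure, and the function name is hypothetical.

```python
# Commonly cited PCI Express power limits, in watts (assumed here).
SLOT_W = 75        # x16 graphics slot
SIX_PIN_W = 75     # 6-pin auxiliary connector
EIGHT_PIN_W = 150  # 8-pin auxiliary connector


def board_power_ceiling(six_pin: int, eight_pin: int, slot_w: int = SLOT_W) -> int:
    """Upper bound on in-spec board power for a given connector fit-out.

    Hypothetical helper: it simply sums the per-source limits.
    """
    return slot_w + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W


if __name__ == "__main__":
    print("6+6:", board_power_ceiling(six_pin=2, eight_pin=0), "W")  # 225 W
    print("6+8:", board_power_ceiling(six_pin=1, eight_pin=1), "W")  # 300 W
```

Under these assumptions a card that stays in spec on 6+6 is capped at 225W in that configuration, while 6+8 raises the ceiling to 300W, which would be consistent with overclocking being tied to the 8-pin plug.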

Love_In_Rio 02-May-2007 22:27

Quote:

Originally Posted by Geeforcer (Post 980642)
I think the tests speak for themselves:

http://forum.beyond3d.com/attachment...1&d=1178140185

That test is biased. It didn't take into account the king, Kyro 3.

fellix 02-May-2007 22:34

Wow, Parhelia just got Early-Z Test & Occlusion?... :lol:
I miss that chippery: the most unique DX8-plus-some-DX9 combo, right before NV30. :lol:

compres 02-May-2007 22:34

Quote:

Originally Posted by Geeforcer (Post 980642)
I think the tests speak for themselves:

http://forum.beyond3d.com/attachment...1&d=1178140185

All is clear now...:lol:

neliz 02-May-2007 22:38

Quote:

Originally Posted by Geeforcer (Post 980642)
I think the tests speak for themselves:

The only reason is ... because they can ..

hoom 02-May-2007 22:39

Awesome :lol:

nutball 02-May-2007 22:41

I demand one-frame-wonder FRAPshots of STALKER running on the Parhelia before I believe those graphs. They look fishy to me. Like they've been ripped off Frumpzilla.

memberSince97 02-May-2007 22:58

Quote:

Originally Posted by Pete (Post 980572)
The explanation is probably simpler. A histogram shows the R600 shot is clipped at both ends of the black/white range while the G80 is not. A quick play with lowered brightness ("-50" in ArcSoft PhotoStudio) and contrast ("-75") shows the G80 shot approaching the R600 in IQ and corresponding clipped histogram range (though G80 still shows more of a reddish hue). I'm guessing this is a simple case of a borked gamma setting.

But if a smaller HDR format (FP10, or that 9-9-9-5 format I think was in an R600 slide?) could produce the same results, then I guess we split the difference (MDR?).


Thanks Pete... those two comparison shots really bother me...
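Pete's histogram argument can be checked mechanically by counting how many channel values sit at the extremes. A minimal sketch, assuming 8-bit RGB screenshots already decoded into lists of (r, g, b) tuples (image loading is left out); the function names and the 1% threshold are hypothetical choices for illustration.

```python
from typing import Iterable, Sequence, Tuple

Pixel = Tuple[int, int, int]


def clipped_fractions(pixels: Iterable[Pixel]) -> Tuple[float, float]:
    """Return (fraction of channel values at 0, fraction at 255) for 8-bit RGB."""
    total = low = high = 0
    for px in pixels:
        for c in px:
            total += 1
            if c == 0:
                low += 1
            elif c == 255:
                high += 1
    if total == 0:
        return 0.0, 0.0
    return low / total, high / total


def looks_clipped(pixels: Sequence[Pixel], threshold: float = 0.01) -> bool:
    """Heuristic: call the shot clipped if both ends of the range pile up."""
    low, high = clipped_fractions(pixels)
    return low > threshold and high > threshold


if __name__ == "__main__":
    crushed = [(0, 0, 0), (255, 255, 255), (128, 64, 32), (0, 10, 255)]
    graded = [(12, 14, 16), (240, 238, 230), (128, 64, 32), (20, 30, 200)]
    print(looks_clipped(crushed), looks_clipped(graded))  # True False
```

A gamma or brightness/contrast mismatch tends to show up exactly as this pile-up in the end bins; a proper comparison would look at the whole histogram, not just its extremes.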

Jawed 02-May-2007 23:00

Quote:

Originally Posted by dnavas (Post 980475)
:lol: I noticed this divergence a while ago, but it struck me as being exactly the kind of architecture you might expect if you purged your shader team of everyone who didn't buy the unified approach and put them on the texture unit team. It doesn't seem unified at all. The shader units are unified, but not the texture units.

I was referring to the shader architecture being unified, not the texturing architecture.

I think you're referring to the architecture of the texture units, anyway. But I can't work out what you're saying.

Quote:

You have single-channel dedicated addressing units, single-channel dedicated samplers, multi-channel dedicated addressing units, and multi-channel samplers, which are effectively just four-wide single-channel samplers, but, err, they're "dedicated".
When you address texels you have to account for LOD and bias and the kind of filtering algorithm you intend to perform (merely bilinear or something more interesting). With higher quality filtering, the texels to be fetched for one pixel in a screen-space quad don't necessarily overlap with all the other texels for the other pixels in the quad. So each set of texels needs to be addressed.

So that's why you need a fair amount of TA capability for filtered texels. Addressing formulae are more involved than I can ever be bothered to remember (or work out) so, ahem, just think of loads of interpolations in each of the 3 dimensions of screenspace.
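To make the filtered-addressing work concrete, here is a minimal sketch of what a TA has to compute for one pixel of a 2D texture: pick a mip level from the screen-space UV derivatives, then find the 2x2 bilinear footprint and its weights. It assumes isotropic LOD selection, clamp-to-edge addressing, texel centres at +0.5 and a power-of-two base level; it illustrates the textbook calculation, not R600's or G80's actual datapath, and all names are hypothetical.

```python
import math
from typing import List, Tuple


def select_lod(du_dx: float, dv_dx: float, du_dy: float, dv_dy: float,
               width: int, height: int, bias: float = 0.0) -> float:
    """Isotropic LOD from screen-space derivatives of normalized UV."""
    len_x = math.hypot(du_dx * width, dv_dx * height)  # texels per pixel step in x
    len_y = math.hypot(du_dy * width, dv_dy * height)  # texels per pixel step in y
    rho = max(len_x, len_y, 1e-12)                     # guard against log2(0)
    # Negative LOD means magnification; clamp to the base level here.
    return max(0.0, math.log2(rho) + bias)


def bilinear_footprint(u: float, v: float, width: int, height: int,
                       level: int) -> List[Tuple[int, int, float]]:
    """Return the four (x, y, weight) texels of a bilinear fetch at a mip level."""
    w = max(1, width >> level)
    h = max(1, height >> level)
    # Map to texel space; texel centres sit at integer + 0.5.
    x = u * w - 0.5
    y = v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def clamp(i: int, n: int) -> int:
        return min(max(i, 0), n - 1)

    taps = []
    for dy, wy in ((0, 1.0 - fy), (1, fy)):
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            taps.append((clamp(x0 + dx, w), clamp(y0 + dy, h), wx * wy))
    return taps


if __name__ == "__main__":
    lod = select_lod(4 / 256, 0.0, 0.0, 4 / 256, 256, 256)
    print(lod, bilinear_footprint(0.5, 0.5, 256, 256, int(lod)))
```

Running each pixel of a 2x2 quad through `bilinear_footprint` shows when their footprints overlap. Trilinear filtering repeats the fetch on a second mip level and anisotropic filtering takes several probes along the axis of anisotropy, which is where the addressing load multiplies.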

Now, for vertex fetches, addressing should be much simpler, because fetches are from a stream. Each element in the stream is the same size as its neighbours and there's usually not much reason to flit around; a serial read is fine. Addressing consists of base address + position-in-stream * size-of-element. Much easier to compute than texel addresses for filtering. Having said that, you may want to have a stride factor (for LOD), e.g. reading 1 in 3 vertices for a 3:1 LOD reduction.
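The stream addressing described above is simple enough to write down directly. A minimal sketch, assuming a byte-addressed, tightly packed stream; the function name is hypothetical, and the optional `step` models the stride factor for LOD reduction.

```python
def vertex_fetch_address(base: int, index: int, element_size: int,
                         step: int = 1) -> int:
    """Byte address of the index-th vertex fetched from a stream.

    base + position-in-stream * size-of-element; step > 1 skips vertices,
    e.g. step=3 reads every third vertex.
    """
    return base + (index * step) * element_size


if __name__ == "__main__":
    # 32-byte vertices starting at 0x1000 (assumed values).
    print([vertex_fetch_address(0x1000, i, 32) for i in range(4)])          # [4096, 4128, 4160, 4192]
    print([vertex_fetch_address(0x1000, i, 32, step=3) for i in range(4)])  # [4096, 4192, 4288, 4384]
```

One multiply-add per fetch, with no interpolation, is the contrast with the filtered case.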

In texture filtering, with multi-texturing, each layer of textures has effectively the same address. Well, partly, anyway, because the mipmap chain might be different for the extra layers (they can be lower-detail). But anyway, multi-texturing should be able to (at least partly) re-use the texture addresses from layer to layer. And don't forget multi-texturing usually requires less texturing quality for these extra layers (e.g. only bilinear).

In vertex fetching you may want to sample from multiple streams in parallel. This is where you can pile on the attributes and do instancing. D3D10 allows for 8 streams to be used in parallel.

So, I'm guessing that the TAs for vertex fetch are used less densely than for texel fetch. The VF-TAs can each address one stream. So four of them allow four streams to be fetched in parallel.
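Fetching from several streams at once, with some streams advancing per vertex and others per instance, might look like the sketch below. The `Stream` fields loosely mirror D3D10's per-vertex/per-instance input data and instance step rate, but the names are hypothetical and memory is modelled as a flat byte address space.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stream:
    base: int                  # byte address of the first element
    stride: int                # bytes between consecutive elements
    per_instance: bool = False
    step_rate: int = 1         # instances per element when per_instance is True


def fetch_addresses(streams: List[Stream], vertex_id: int, instance_id: int) -> List[int]:
    """One independent address per stream for a given vertex of a given instance."""
    addrs = []
    for s in streams:
        element = instance_id // s.step_rate if s.per_instance else vertex_id
        addrs.append(s.base + element * s.stride)
    return addrs


if __name__ == "__main__":
    streams = [
        Stream(base=0x0000, stride=12),                     # positions
        Stream(base=0x4000, stride=8),                      # texture coordinates
        Stream(base=0x8000, stride=64, per_instance=True),  # per-instance transform
    ]
    print(fetch_addresses(streams, vertex_id=5, instance_id=2))  # [60, 16424, 32896]
```

Each stream's address depends on nothing but its own base, stride and the vertex or instance ID, so one addresser per stream could run in parallel, which fits the guess of one VF-TA per stream.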

Separate from VF and texture filtering, you've got unfiltered texture fetches. In D3D10 these are from texture buffers, 1D, 2D or 3D. These could be something like big blobs of constant data (e.g. for morphing vertices) or they can be for post-processing of render targets (e.g. performing tone-mapping), etc.

When you address a single texel in a texture buffer, the shader will prolly have performed some calculations to identify which texel is required. The TA then fetches the texel based on base address and offset (taking care of 1D, 2D or 3D organisation of the texture). Each of the other objects executing the same shader (vertices, primitives or pixels) will decide their own address for the texel fetch. So that'll keep four TAs occupied. These TAs, I'm guessing, are VF-TAs. I guess that because without filtering, texture buffer fetches shirk most of the complexity of TA-ing (no interpolations are needed to generate these addresses).
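The unfiltered case reduces to base address plus offset. A minimal sketch, assuming a linear row-major layout (real GPUs usually tile or swizzle textures, so actual offsets differ); the function name and default texel size are hypothetical.

```python
def texel_address(base: int, x: int, y: int = 0, z: int = 0,
                  width: int = 1, height: int = 1, texel_size: int = 4) -> int:
    """Byte address of texel (x, y, z) in a linearly laid-out 1D/2D/3D buffer.

    No filtering, no interpolation and no bounds checking: just base + offset.
    """
    return base + ((z * height + y) * width + x) * texel_size


if __name__ == "__main__":
    print(texel_address(0, 3, 2, 1, width=8, height=4))  # 204
    print(texel_address(100, 5))                         # 120 (1D case)
```

Compared with the four taps and weights of a bilinear fetch, this is one address per lookup, which is why such fetches look like a good fit for simpler addressers.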

As far as I can tell it's prolly best to think of VF-TAs as much less complex than filtered texture TAs. The throughput of both kinds of TAs needs to be high. At the same time there are overlapping and disparate kinds of fetches that need to be performed within a shader program, so you want to maximise the potential throughput per clock.

It's also worth remembering that the L2 cache in R600 is shared by both vertex L1 and texture L1 caches. In R600 the L1 texture cache is specifically for the filtering pipelines, as far as I can tell (based on patent documents). That would mean that all vertex fetches and texture buffer fetches come through the vertex L1.

In classical DX9 pixel shader code, some texel fetches are unfiltered. Typically these are for things like BRDFs (providing a short cut to the behaviour of light on a material) or for things like the infamous D3 specular lighting lookup. These texel fetches on R5xx and G7x have to be performed by the filtering pipelines, with the filtering turned off.

In theory a D3D10 GPU can perform these fetches using the texture buffer (vertex fetch) pipelines. This would then free-up the filtering pipelines for their normal duty, instead of wasting them on unfiltered texels - onerous when your shader is trying to apply four or more textures per pixel.
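As a concrete example of an unfiltered lookup, here is a sketch of a 1D specular-falloff table in the spirit of the Doom 3 trick mentioned above: precompute x^n once, then replace a per-pixel pow with a point-sampled fetch. The table size, exponent and names are assumptions for illustration, not the game's actual table.

```python
import math
from typing import List


def build_specular_table(size: int = 256, exponent: float = 16.0) -> List[float]:
    """Precompute x**exponent at the texel centres of a 1D table over [0, 1]."""
    return [((i + 0.5) / size) ** exponent for i in range(size)]


def point_sample(table: List[float], u: float) -> float:
    """Unfiltered (nearest-texel) fetch with clamp-to-edge addressing."""
    i = math.floor(u * len(table))
    i = min(max(i, 0), len(table) - 1)
    return table[i]


if __name__ == "__main__":
    table = build_specular_table()
    print(point_sample(table, 0.9), 0.9 ** 16)  # table value vs. direct pow
```

The fetch needs one address and no weights, so on this reading it is the kind of lookup a D3D10 part could route through the unfiltered (buffer/vertex) path rather than tying up a filtering pipeline.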

I'm still interpreting here, nothing's set in stone...

Jawed

Jawed 02-May-2007 23:07

Quote:

Originally Posted by nAo (Post 980618)
1MB of L2 cache would IMHO be a complete waste of die area, at least for typical applications (games); you can happily live with way less than that.

As far as I can tell the amount of cache hasn't appeared on any leaked slides so far.

Jawed


All times are GMT +1.
