Interesting R300 information!


Fuz
18-Jun-2002, 09:49
This is taken from another forum, I thought you people might find it interesting.

ATi's R300
I was speaking with the ATi rep at the T.O. AMD Tech Tour (heh).

He said:
"It will have a 128bit floating point colour system!"

So I commented on how Carmack let it slip that the newer hardware has done a good thing by increasing the colour depth of graphics cards, and he finished my sentence by saying:

"He (referring to Carmack) said that 64-bit colour wasn't enough! Yeah, that's why the R300 supports more than needed...if there is such a thing. Basically when we release the R300 the whole industry will take a step back, and realize they are behind...R300 is truly groundbreaking: full DX9 support, which is as big a leap from DX8.1 as DX8.1 was from DX7, precision calculations on both polygon and texture information, and a whole host of other stuff that I can't really comment on."

He also said:
"You have no idea. The sample that Doom]I[ ran on was both ALPHA hardware as well as ALPHA drivers....the way the card is now (referring to its alpha stage) it can do 14XXX+ 3Dmarks...we will definitely do our best to get it higher. Also realize that we haven't determined the final clock speeds either...the pics you have probably seen on various sites show a PRE PRE engineering sample....
Basically we WILL have a show stopper. The R300 will revolutionize the way we see graphics. In fact nothing will even be able to compete with us until February or March of 2k3 (I think he was talking about nVidia's 6-month cycle)."

He said it will be out late Q3...or late fall...

Also nVidia was pushing the nForce stuff...they said they were definitely sticking with it...it's not just a one-time thing....also there will be a DX8.1 line of nForce (pixel and vertex shaders) coming out soon, and if you check up on Computex there is already some leaked info on it.

-Rogue5-

And this is taken from Hardocp front page.

The problem with the first revision was that it violated Intel's AGP spec, which allows 48W maximum power consumption across the port. The high consumption was most likely due to a power hungry memory interface (this is a 256-bit memory bus, remember).

The second revision of R300 was said to have arrived back from the fab about two weeks ago and solved the PS problem. However, the AIW version of the card may still require an external power supply (I noticed your Xabre 400 post this morning which reminded me to tell you this, lol).

Mind you, all this info is from forums, so take it as rumour, nothing more. But, still very interesting.

Fuz

Tahir2
18-Jun-2002, 12:41
I'm not sure how long this link is going to last, but it has what appears to be an ATI-authored PowerPoint slide presentation on the R300. Here are some of the highlights:
R300


350 MHz. clock

400+ MHz. memory clock

8 Pipelines

150 Million Tri/sec (peak)

Increase Multitexturing to 8 textures

Improved Pixel/Vertex Shaders

DX9 Support

MPEG 1/2/4 Encode/Decode

The interesting thing to note, if one were to assume that these are authentic, is the use of the term "Comprehensive DX9 support" and "Improved Pixel & Vertex Shaders." It nearly suggests that R300 might not be a fully DX9 supported chip?


Typedef Enum posted:
http://www.nvnews.net/#1024385608

"It nearly suggests.."

Comprehensive = complete

From Cambridge International Dictionary of English:

comprehensive (FULL)
adjective
complete and including everything that is necessary

We offer you a comprehensive training in all aspects of the business.
Is this list comprehensive or are there some names missing?
He has written a fully comprehensive guide to Rome.

If you have comprehensive insurance for your car, it financially protects any other vehicles and people that are involved in a car accident with you, in addition to yourself.

comprehensively
adverb
a comprehensively illustrated book
The plan has been comprehensively rejected.
Our local team [NVIDIA] was comprehensively defeated.


Sorry.. I added the NVIDIA bit above :P

Hehe... it's not a crime to be a f@nb0y once in a while is it? :P

Tahir2
18-Jun-2002, 12:43
Wow... the word f@nb0y is censored.. LOL

Good job too methinks ;)

One thing that puzzles me is the fact that the core clockspeed is so high. 350MHz on what is assumed to be 0.15-micron technology and with approx 100 million transistors?

Matrox are having problems with 80 million transistors on 0.15-micron technology, hence the rumoured clockspeed of 220MHz. I think it is best to take these numbers with a large dose of salt, or else ATI is in fact gunning for 0.13-micron tech to fab these GPUs.

Just my tuppence worth ..

BoardBonobo
18-Jun-2002, 12:50
Well, this comes from the Inquirer, so it DEFINITELY has to be true <lol>. It seems that who's producing what on which process has reversed!?

In fact, Nvidia is playing it rather safe – its next graphics foray will use 150 nanometer technology, while sources tell us that its bitter rival, Canadian firm ATI, will risk it for a frisket and go the 130 nanometer full hog for the R300. (http://www.theinquirer.net/18060208.htm)

Hmmm.

Tahir2
18-Jun-2002, 12:55
Interestinger and interestinger... looks like ATI have gone for the jugular!

That is if ANY of the above is true. :)

Maybe this is why NVIDIA are being so tight lipped right now, going back to the drawing board and making sure their NV30 is the fastest on the block when it comes out?

Not impossible to believe.

Gollum
18-Jun-2002, 13:40
Interestinger and interestinger... looks like ATI have gone for the jugular!

That is if ANY of the above is true. :)

Considering it kind of contradicts just about all the info we've had up to date, that is questionable. Certainly not impossible, but a bit unlikely given previous information. Let's not forget where this news came from, hehe...


Maybe this is why NVIDIA are being so tight lipped right now, going back to the drawing board and making sure their NV30 is the fastest on the block when it comes out?

Not impossible to believe.

Sorry to ruin the party, but I think it is in fact impossible to believe (esp. based on anything the Inquirer "reports"). It's too late for any of that: if they went "back to the drawing board" they'd delay the NV30 by almost a year and have no next-gen product to launch this fall (yet another NV2x as the only high-end part would be ridiculous). It's far too late now to make any significant changes to the NV30 architecture, besides trying to tweak every bit of performance they can in the next chip revision. Being tight-lipped is pretty much standard procedure in these final stages. Considering Nvidia will almost certainly come to market later than ATi, the best they can do is wait till the last minute before releasing any hard information on NV30 ...

PiNkY
18-Jun-2002, 13:41
[it can do 14XXX+ 3Dmarks...]

is that with or without splash screens...

hughJ
18-Jun-2002, 14:05
8 pipes with only 8 textures?

Fuz
18-Jun-2002, 15:15
[it can do 14XXX+ 3Dmarks...]

is that with or without splash screens...

Hehe, good call.

There will be two versions of the next 3Dmark.

3Dmark 2002 and Cg enhanced 3Dmark 2002SE (Splashscreen Edition). :lol:

MuFu
18-Jun-2002, 15:44
And this is taken from Hardocp front page.

The problem with the first revision was that it violated Intel's AGP spec, which allows 48W maximum power consumption across the port. The high consumption was most likely due to a power hungry memory interface (this is a 256-bit memory bus, remember).

The second revision of R300 was said to have arrived back from the fab about two weeks ago and solved the PS problem. However, the AIW version of the card may still require an external power supply (I noticed your Xabre 400 post this morning which reminded me to tell you this, lol).

http://www.beyond3d.com/forum/viewtopic.php?t=1285

;)

MuFu.

Gollum
18-Jun-2002, 17:59
"It nearly suggests.."

Comprehensive = complete

From Cambridge International Dictionary of English:

comprehensive (FULL)
adjective
complete and including everything that is necessary

We offer you a comprehensive training in all aspects of the business.
Is this list comprehensive or are there some names missing?
He has written a fully comprehensive guide to Rome.

If you have comprehensive insurance for your car, it financially protects any other vehicles and people that are involved in a car accident with you, in addition to yourself.

comprehensively
adverb
a comprehensively illustrated book
The plan has been comprehensively rejected.
Our local team [NVIDIA] was comprehensively defeated.


Sorry.. I added the NVIDIA bit above :P

Hehe... it's not a crime to be a f@nb0y once in a while is it? :P

While we're being f@nb0yish:

From Oxford Advanced Learners Dictionary:

"comprehensive / adj
1 that includes (nearly) everything"

thus comprehensive /= complete
but
comprehensive = (nearly) everything

Ironically, on the back of this 1600-page monster it reads:
"This comprehensively revised and greatly expanded new edition of A S Hornby's classic and highly acclaimed dictionary..."

For what it's worth ... :wink:

Tahir2
18-Jun-2002, 21:38
LOL

Now that's funny shit !!!

Anyways, everyone knows Cambridge kix Oxford's azz!!!!

Tahir2
18-Jun-2002, 21:43
Sorry to ruin the party, but I think it is in fact impossible to believe (esp. based on anything the Inquirer "reports"). It's too late for any of that: if they went "back to the drawing board" they'd delay the NV30 by almost a year and have no next-gen product to launch this fall (yet another NV2x as the only high-end part would be ridiculous). It's far too late now to make any significant changes to the NV30 architecture, besides trying to tweak every bit of performance they can in the next chip revision. Being tight-lipped is pretty much standard procedure in these final stages. Considering Nvidia will almost certainly come to market later than ATi, the best they can do is wait till the last minute before releasing any hard information on NV30 ...

You are most likely 100% correct (how's that for a contradiction), but was it too late, say, 3 months ago?

Personal opinion: NV30 is waiting for 0.13-micron technology, and that is the reason it is taking slightly longer to appear than R300. However, it is a bit of a moot point, as neither card is available right now, and you never know, NV30 might come out at the same time as R300 since neither company has committed to a release date.

Bambers
18-Jun-2002, 21:47
Comprehensive can mean all or most.

Rumours are fun :)

Doomtrooper
19-Jun-2002, 03:43
[it can do 14XXX+ 3Dmarks...]

is that with or without splash screens...

Hehe, good call.

There will be two versions of the next 3Dmark.

3Dmark 2002 and Cg enhanced 3Dmark 2002SE (Splashscreen Edition). :lol:

http://www.plaudersmilies.de/yellows/laughing.gif

Bjorn
19-Jun-2002, 07:27
Comprehensive can mean all or most.

Rumours are fun :)

But, why didn't they write full instead of comprehensive then ? 8)

SvP
19-Jun-2002, 08:04
This is copy-paste from Rage3D... not very new slides after all 8)

This looks like an authentic ATI creation but I believe the slides are at least a year old and provide a "concept" overview of RV250 and R300 before they were actually designed/made. There seems to be a number of small clues which I think may suggest this.

Firstly, the R300 slide is full of vacuous statements such as "comprehensive DX9 support". At this point in the game, ATI KNOWS it has full DX9 support so why 'comprehensive' unless it was published at a time before DX9 was finalized. Their other claims, such as "Improved pixel & vertex shaders", "More complex higher order surface support", etc..., suggest to me that they (ATI) did not know exactly how they were going to do all these things just yet.

Secondly, the proposed 400+ MHz SDR/DDR memory clock suggests (to me) they are talking about 200MHz SDR and 200MHz (400MHz effective) DDR. Why? Reason 1: 400MHz DDR (effective 800MHz) is 2.5ns memory, which is nowhere near available and certainly seems unlikely to be the 'minimum' clockrate of the memory on the R300 in the fall. Reason 2: SDR memory no longer has a cost advantage over DDR and no one uses it today. It seems out of place to be mentioned alongside R300 in an up-to-date presentation. A year or longer ago, however, SDR memory was used and overall memory speeds were in and around 200MHz SDR and 200MHz DDR, thus the basis for ATI's clock claims.
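For anyone who wants to check the clock arithmetic in that argument, here is a rough sketch. The helper names are made up for illustration; the only facts used are that the cycle time is the reciprocal of the clock and that DDR moves data on both clock edges.

```python
def clock_period_ns(clock_mhz: float) -> float:
    """Cycle time in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def ddr_effective_mhz(clock_mhz: float) -> float:
    """DDR transfers data on both clock edges, so the effective rate doubles."""
    return 2.0 * clock_mhz


# The two readings of the slide discussed above: 200 MHz and 400 MHz real clocks.
for clock in (200.0, 400.0):
    print(
        f"{clock:.0f} MHz clock: period {clock_period_ns(clock):.1f} ns, "
        f"DDR effective {ddr_effective_mhz(clock):.0f} MHz"
    )
```

So a 400MHz real clock does need 2.5ns parts, while the 200MHz reading only needs 5ns parts.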

Also, the date in the corner is likely just a script or macro which displays the current date on the slide (meaning the ATI spy opened and recorded images of the slides on Jun 13th). I've seen this done on other PP presentations before.


http://www.rage3d.com/board/showthread.php?s=&threadid=33622984&pagenumber=2

Ailuros
19-Jun-2002, 08:13
Can someone enlighten me how long it takes for a card on average from the chalkboard design to reach the shelves (design--->shipping)? Isn't one year a bit too short for a board of such complexity?

Pete
19-Jun-2002, 20:44
ATi has multiple teams working simultaneously, much like 3dfx did with V5 & Spectre, and probably like nVidia has with its multiple chips (NV2A, NV25, NV17, NV17 mobile, MCP, etc.).

But if R300 is essentially just a bigger and better R200, I should think a company as large as ATi could pull off building on top of the R200 in one year.

</completely uninformed opinion still getting by with an Xpert 128>

Dave Baumann
19-Jun-2002, 21:26
But, why didn't they write full instead of comprehensive then ? 8)

DX9 is still in beta, so there is still the possibility of a slight ebb and flow in the specification; until it's 100% finalised nobody can claim anything for certain.

ben6
19-Jun-2002, 21:34
Tony Tamasi said last year at E3 that Nvidia has 5 design teams. So I imagine it's one team for the new architecture (i.e. NV30), one for the refresh part 6 months later, and 3 other teams for integrated, mobile and whatnot. As Nvidia has done a lot of hiring since then and is still hiring (171 positions open on talent.nvidia.com right now), I imagine they may have added at least one more team.

Tahir2
19-Jun-2002, 21:49
I think that slide is meant to be old.

Anyway, new information about the RV250 says that it will integrate the TV decoder (?), will be physically smaller and will consume less power.. PLUS it will STILL be 4 pipelines, not cut down a la MX.

If this is true it can only mean one thing.... 0.13-micron technology.


Source - some Chinese site via some Russian site, via some American site (the internet is great) :P

Edit: that Russian site is claiming RV250 and R300 are STILL 0.15 micron

Lots of red herrings out there...ah well might as well sit back and wait.

MuFu
19-Jun-2002, 22:38
Or you can believe me when I say both GPUs are still fabbed on a 0.15u process (that was the case two weeks ago anyway). R300 rev2. draws about 25W so that's a fair amount. The memory subsystem alone pulls 12-15W. The boards now meet Intel's AGP spec though, which is good news. Performance is right on track.
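A back-of-the-envelope check of those figures against the 48W AGP limit quoted earlier in the thread. This is only a sketch and it rests on two loud assumptions that the post does not confirm: that the 25W figure is the GPU alone (memory not included), and that GPU and memory are the only significant loads on the board. The function name is hypothetical.

```python
AGP_LIMIT_W = 48.0  # AGP power ceiling as quoted earlier in the thread


def headroom_w(gpu_w: float, memory_w: float, limit_w: float = AGP_LIMIT_W) -> float:
    """Watts left under the limit, ASSUMING GPU and memory draw simply add up."""
    return limit_w - (gpu_w + memory_w)


# 25 W for the chip, 12-15 W for the memory subsystem (figures from the post above).
for memory_w in (12.0, 15.0):
    total_w = 25.0 + memory_w
    print(
        f"memory {memory_w:.0f} W -> total {total_w:.0f} W, "
        f"headroom {headroom_w(25.0, memory_w):.0f} W"
    )
```

Under those assumptions the board sits at roughly 37-40W, which would be consistent with the claim that it now meets the spec.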

Those slides are almost a year old. RV250 integrates a TV Out + 2nd display RAMDAC, BTW, no capture. Capture is handled by RT2, which forms the basis for the ATi daughtercards and adds component input, a secondary TMDS transmitter and audio DACs. Like RT1, RT2 does not support MPEG encode/decode, so I'm guessing that R300 will use a DCT engine combined with software algorithms to encode MPEG-1/2/4 in realtime.

From what I know, the higher end RV250-based reference card (300MHz core, 300MHz BGA) performs very similarly to an R200 in most circumstances. This would only be the case if it had a reduced number of rendering pipelines, so I still think it'll have a 2/2 rendering array.

ATi are trying to secure 0.13u for early next year but TSMC are still tied up with Crusoe and C3. I am not sure what nVidia are doing but I can probably find that out too in a short while. ;)

MuFu.

Ailuros
19-Jun-2002, 22:54
I don't think 2 weeks is enough time to flip manufacturing process :o

Mufu,

Any idea how much power the AIW version will draw?

MuFu
19-Jun-2002, 23:58
At the moment it's slightly over the AGP 3-rail spec of 48W, that's all I know. So it may need an external power supply. They'll probably use one anyway just to safeguard against product malfunction on low-Q mobos. Even R300 + one of its daughtercards might be a little dodgy since I believe the daughtercard draws power from the header on the main PCB. RT2 doesn't consume much power though, so it might be ok.

MuFu.

Grall
20-Jun-2002, 00:13
Why the heck use an *external* PSU??? We got a really phat one built right into every PC, conveniently hooked up to molex connectors that deliver an easy 50W of 12V power or so per connector. Use 'em!

3dfx did with the V5 after all.


*G*

Ailuros
20-Jun-2002, 00:35
Thanks Mufu I figured so.

MuFu
20-Jun-2002, 01:23
Why the heck use an *external* PSU??? We got a really phat one built right into every PC, conveniently hooked up to molex connectors that deliver an easy 50W of 12V power or so per connector.

"External" just means that the source isn't the AGP port. Drawing power from a header is perfectly feasible but to be useable it needs to be supplied across 3-rails in a tightly regulated form. I didn't mean a thunking great big separate mains PSU for the graphics card if that's what you were thinking. :D

MuFu.

Althornin
20-Jun-2002, 03:52
"External" just means that the source isn't the AGP port. Drawing power from a header is perfectly feasible but to be useable it needs to be supplied across 3-rails in a tightly regulated form. I didn't mean a thunking great big separate mains PSU for the graphics card if that's what you were thinking. :D

MuFu.
I also thought you meant a "Voodoo Volts" type of solution.

MuFu
20-Jun-2002, 18:02
I have no idea as to the nature of this "external" supply so I can't rule that out either.

Would suck a bit though, huh? :P

MuFu.

Doomtrooper
20-Jun-2002, 18:18
If it was external powered (which I see no reason why it shouldn't or couldn't) then it would be a Molex connector :wink:

Now, I have no problem with externally powered video cards (the Voodoo 5500 I thought was a great idea). So many stability issues arise when pushing the envelope on AGP voltage specs, especially on cheap motherboards. If the card is powered externally these issues are not a problem, as long as the power supply is sufficient.
Today's power supplies, like my Enermax 430 watt, have so many damn Molex connectors for accessories that only 50% of them are tied up, and I have a lot of drives :)
The Voodoo 5 didn't have any issues running on an AGP 1.0 spec board, partly due to its minimal AGP support, but voltage has a lot to do with it also.
The way I look at it, if it's an accessory like a hard drive etc. then it's no different for a card....I still see AGP 8X causing issues at 0.15 micron with lots of transistors :-?

MuFu
20-Jun-2002, 18:25
If it was external powered (which I see no reason why it shouldn't or couldn't) then it would be a Molex connector :wink:

That's all I need, thanks Dt. ;)

MuFu.

John Reynolds
20-Jun-2002, 18:26
So long as the connector used stayed firmly in place, I couldn't care less about external PSUs for a video card. Oh, but what if the power cord gets accidentally pulled out? Well, what if the power cord on your PC's case gets pulled loose?

Doomtrooper
20-Jun-2002, 18:27
P.s My opinion is my own and not based on information given... :lol:

Seeing you put together all this info is truly well done Mufu, unfortunately this is truly based off my own opinion :)

Doomtrooper
20-Jun-2002, 18:33
So long as the connector used stayed firmly in place, I couldn't care less about external PSUs for a video card. Oh, but what if the power cord gets accidentally pulled out? Well, what if the power cord on your PC's case gets pulled loose?

Never had an issue when I ran a Voodoo 5; it pressed in nice and tight. In fact I had more problems with customers bringing in cheap towers (not mine :D ) that were so flimsy that when they plugged their monitor in they pushed the video card right out of the AGP slot :lol:

MuFu
25-Jun-2002, 22:22
Just to keep the R300 rumour mill turning, here's some "speculation" regarding Rage Theater 2 (just C&P'd from Rage3D, thought you might be interested). This is pure guesswork formed from information freely available on the internet. :D


1. INTRODUCTION

RAGE THEATER 2™ is an advanced, single chip analog video decoder and stereo audio processor that forms the basis for the forthcoming ATi RV250/R300 daughtercard range. These DC's interface with compliant graphics cards to provide a truly orgasmic I/O experience. Ok?

RAGE THEATER 2™ captures Composite, S-Video, and Component analog video signals and converts them to ITU-656 compliant digital video. It also captures Sound IF or composite audio, performs audio demodulation and stereo decoding, and outputs the decoded audio via I2S, S/PDIF, or VIP ports. So basically it's the dogs bollocks.

HIGH LEVEL OF INTEGRATION - In addition to integrating the video decoder and stereo audio processor, RAGE THEATER 2™ integrates many of the other support functions (input signal selection muxes, ADCs, anti-aliasing filters, etc) required to decode analog television signals so you can watch encoded German porn without the boob-deforming pixelation.

WORLD-WIDE STANDARDS SUPPORT - RAGE THEATER 2™ has been designed to support the major analog television broadcast standards throughout the world, even in Canada and some parts of Wales.

STANDARD INTERFACES - RAGE THEATER 2™ incorporates industry standard interfaces on all input/output ports. So basically you can plug ANYTHING into it! Even your toaster.

POWER SAVING MODE - RAGE THEATER 2™ is designed so that blocks not being used can be powered down to reduce overall system power consumption. This isn't very important but it keeps tree-huggers happy.


2. BENEFITS AND APPLICATIONS

FIGURE 1 (hand-drawn, would you believe?!) - RAGE THEATER 2™ ARCHITECTURE (http://muthafunker.homestead.com/files/Ripper2architecture.jpg)

FIGURE 2 (MS Paint simply wasn't up to the job!) - TYPICAL SYSTEM IMPLEMENTATION WITH RAGE THEATER 2™ (http://muthafunker.homestead.com/files/Ripper2system.jpg)


3. VIDEO DECODER

DECODER INPUT - RAGE THEATER 2™ is capable of capturing video signals from all standard video sources such as broadcast systems (tuners), high-end VCRs, and DVD players and in all transmission formats. Specifically, RAGE THEATER 2™ supports:

• Composite, S-Video, and Component (including High Definition) video inputs
• Universal formats:
    • NTSC North America and Japan
    • PAL I, B, G, H, M, D, N
    • SECAM D, K, L, B, G

In addition, an integrated video mux allows multiple sources to be simultaneously connected to RAGE THEATER 2™. RAGE THEATER 2™ can accept:

• Y:Pr:Pb Component Input and
• 5 Composite signals or
• 4 Composite and 1 S-Video signal or
• 3 Composite and 2 S-Video

DECODER INTERNAL PROCESSING - The video decoder is designed to deliver superior quality, low noise digital video signals for the most demanding applications. Features of the decoder include:

• Integrated Anti-Aliasing filters
• Dual 12-bit ADCs
• Automatic Gain Control (AGC) with programmable override option
• Advanced AGC & Clamp circuitry increasing immunity to noisy input signals
• Adaptive 2D Comb Filter
• Integrated high quality horizontal and vertical downscalers
• Continuous Sharpness control
• Peak White Detector ensuring input signals with suppressed synchronization are not distorted
• Support for square pixel, ITU non-square pixel, arbitrary scaling
• VBI data capture
• Macrovision detection circuitry

DECODER OUTPUT - Once processed, the digital video is delivered to the host processor via an 8 or 10-bit industry standard ITU 656 interface.


4. STEREO AUDIO PROCESSOR

The RAGE THEATER 2™ stereo audio processor demodulates and decodes the audio accompanying a TV channel to baseband left and right channels. Audio can be supplied from a tuner to the RAGE THEATER 2™ via the Sound IF or AF signals. Like the video decoder, the stereo audio processor is multi-standard and supports major audio television broadcast standards used throughout the world, namely;

• BTSC
• Dual FM
• EIA-J
• NICAM

Additional features of the stereo audio processor include:

• 16-bit audio output
• Multiple sampling rate support: 32kHz, 44.1kHz, 48kHz, 96kHz
• Automatic Stereo and SAP/Second Language detection and processing
• Programmable Volume/Mute control
• Automatic Volume Control maintains constant volume even as volume of incoming signal varies

After demodulation and decoding, the audio can be delivered to the host via the VIP interface or to other system components via the I2S or S/PDIF digital audio ports.
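To put the audio numbers in perspective, here is a quick sketch of the raw data rate those sample rates imply. It assumes plain uncompressed PCM with two channels at 16 bits each, which the list above suggests but does not spell out; the function name is just for illustration.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 2) -> int:
    """Raw PCM bit rate in bits per second (no framing or compression assumed)."""
    return sample_rate_hz * bits_per_sample * channels


# Sample rates listed in the feature list above.
for rate_hz in (32_000, 44_100, 48_000, 96_000):
    kbps = pcm_bitrate_bps(rate_hz) / 1000
    print(f"{rate_hz / 1000:g} kHz stereo, 16-bit: {kbps:g} kbit/s")
```

Even at 96kHz that is only about 3 Mbit/s, tiny next to the video stream.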


5. AVAILABILITY

• I don't know... either late July or early August. :)

Regards,

MuFu.

Doomtrooper
25-Jun-2002, 22:29
:lol:

MuFu
26-Jun-2002, 01:38
Hehehehee... :D

zxzx
26-Jun-2002, 02:17
If they get the RAGE THEATER 2™ right, ATI will be a winner for this alone, even if they can't match the upcoming competition (NV30). (With the right pricing, of course.)

DemoCoder
26-Jun-2002, 03:14
The audio processor doesn't sound that advanced compared to the Audigy or MCP-X. I mean, these days, anyone who cares about home theater quality wants to do Dolby Digital and DTS.

Are these phony specs? Why waste space on the video chip with audio processing, and stereo!

T2k
26-Jun-2002, 03:37
At the moment it's slightly over the AGP 3-rail spec of 48W, that's all I know. So it may need an external power supply. They'll probably use one anyway just to safeguard against product malfunction on low-Q mobos. Even R300 + one of its daughtercards might be a little dodgy since I believe the daughtercard draws power from the header on the main PCB. RT2 doesn't consume much power though, so it might be ok.

MuFu.
MuFu, is there any chance to get a bit higher core frequency? I mean, to reach the 'old' 8500's level (275MHz)?
I'm still hoping... :)

Nagorak
26-Jun-2002, 04:39
At the moment it's slightly over the AGP 3-rail spec of 48W, that's all I know. So it may need an external power supply. They'll probably use one anyway just to safeguard against product malfunction on low-Q mobos. Even R300 + one of its daughtercards might be a little dodgy since I believe the daughtercard draws power from the header on the main PCB. RT2 doesn't consume much power though, so it might be ok.

MuFu.
MuFu, is there any chance to get a bit higher core frequency? I mean, to reach the 'old' 8500's level (275MHz)?
I'm still hoping... :)

Supposedly ATI was having "problems" with the R200 (R8500) at launch, so they had to clock down to 275/275 and 250/250 instead of running at 300/300. Funny thing was my OEM Radeon 8500 (later the LE) clocked just fine to 300 core. So...I think even if they "can't ship" at 275, there's a good chance you'll be able to clock up to it. ATI needs to be overly conservative about their shipping settings, so even if it's not a great overclocker, that doesn't mean it won't overclock AT ALL.

Geeforcer
26-Jun-2002, 04:43
Why are people so fixated on GPU clockspeed? Operating frequency is just one part of the performance equation, the other being architecture. GF DDR at 120MHz was faster than TNT2 o/c'ed to 200MHz; GF3 at 200MHz was faster than GF2 Ultra at 250MHz; R8500LELE at 225MHz is faster than R7500 at 290MHz, etc. If the chip provides great performance, it can run at 10MHz for all I care.

T2k
26-Jun-2002, 04:54
Why are people so fixated on GPU clockspeed? Operating frequency is just one part of the performance equation, the other being architecture. GF DDR at 120MHz was faster then TNT2 o/c'ed to 200MHz; GF3 at 200MHz was faster then GF2 Ultra at 250 MHz; R8500LELE at 225MHz is faster then R7500 at 290MHz, etc. If the chip provides great performance, it can run at 10MHz for all I care.

Thanks but I already know it very well. :P

My question was about how competitive it will be against NV30, not simply about working frequency... :D

Nagorak
26-Jun-2002, 05:29
Why are people so fixated on GPU clockspeed? Operating frequency is just one part of the performance equation, the other being architecture. GF DDR at 120MHz was faster then TNT2 o/c'ed to 200MHz; GF3 at 200MHz was faster then GF2 Ultra at 250 MHz; R8500LELE at 225MHz is faster then R7500 at 290MHz, etc. If the chip provides great performance, it can run at 10MHz for all I care.

If you clock it higher, it's still faster...

MuFu
26-Jun-2002, 19:24
I would rule out 275MHz. In fact, I wouldn't even count on 250MHz but I'm sure we'll see it. Remember that the first revision was consuming about as much power as a 1GHz Athlon when clocked at 250MHz (or perhaps even less). Clockspeed isn't really important though. This is an 8-pipeline card, remember, so it should still fare well even when multitexturing isn't used too much, unlike Parhelia with 4 pipes and a low clock (that's right, isn't it? Someone who knows more about 3D graphics processing correct me if I am wrong please).

MuFu.

MuFu
26-Jun-2002, 19:27
The audio processor doesn't sound that advanced compared to the Audigy or MCP-X. I mean, these days, anyone who cares about home theater quality want to do Dolby Digital and DTS.

Do TV broadcasts come in DTS where you live? Do you capture home videos onto your computer in Dolby Digital?

MuFu.

JF_Aidan_Pryde
26-Jun-2002, 20:54
Why are people so fixated on GPU clockspeed? Operating frequency is just one part of the performance equation, the other being architecture. GF DDR at 120MHz was faster then TNT2 o/c'ed to 200MHz; GF3 at 200MHz was faster then GF2 Ultra at 250 MHz; R8500LELE at 225MHz is faster then R7500 at 290MHz, etc. If the chip provides great performance, it can run at 10MHz for all I care.

If you clock it higher, it's still faster...

:roll: I think you missed the point of the entire post..

T2k
27-Jun-2002, 06:22
The audio processor doesn't sound that advanced compared to the Audigy or MCP-X. I mean, these days, anyone who cares about home theater quality want to do Dolby Digital and DTS.

Do TV broadcasts come in DTS where you live? Do you capture home videos onto your computer in Dolby Digital?

MuFu.

Hoppa, stop here.
I'm a DirecTV subscriber, and there are a few million of us in the US alone!
(I already have about 140 channels and all the movie channels HAVE a 5.1 sound carrier or at minimum ProLogic.)
It's a big and growing market, don't forget...

Nappe1
27-Jun-2002, 06:32
I would rule out 275MHz. In fact, I wouldn't even count on 250MHz but I'm sure we'll see it. Remember that the first revision was consuming about as much power as a 1GHz Athlon when clocked at 250MHz (or perhaps even less). Clockspeed isn't really important though. This an 8-pipelined card remember, so should still fare well even when multitexturing isn't used too much, unlike Parhelia with 4-pipes and a low clock (that's right isn't it? Someone who knows more about 3D graphics processing correct me if I am wrong please).

MuFu.

I think this is the right place...

To some of you, "Nappe1's close calls" are already everyday life, so here comes what I am expecting from R300:
- 0.15µm
- 107 000 000 transistors
- DX9 compliance
- 8 pipelines with 2 texture units per pipe.
- Supports pipeline coupling (like Parhelia) or texel combining
- 256Bit DDR memory Bus.
- needs 10 layer PCB
- core: 175-230Mhz, Memory: 275-350Mhz DDR

So I am not expecting too much from the core clock either. It should be somewhere near 200MHz considering the tech and the number of transistors. (Matrox never drives their cores to the limit by default, so I wouldn't be surprised if Parhelia could easily reach 250MHz / 300MHz when overclocked.)
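For a feel of what a 256-bit DDR bus at the memory clocks guessed above would deliver, here is a minimal sketch. It computes theoretical peak bandwidth only (bus width in bytes times two transfers per clock), ignores any real-world efficiency losses, and the function name is made up for the example.

```python
def ddr_bandwidth_gb_s(bus_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for a DDR bus."""
    bytes_per_transfer = bus_bits / 8
    transfers_per_second = clock_mhz * 1e6 * 2  # DDR: two transfers per clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Low and high ends of the 275-350 MHz memory clock range listed above.
for clock in (275.0, 350.0):
    print(f"256-bit DDR at {clock:.0f} MHz: {ddr_bandwidth_gb_s(256, clock):.1f} GB/s peak")
```

That works out to roughly 17.6 to 22.4 GB/s peak across the guessed range.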

MuFu
27-Jun-2002, 16:42
The audio processor doesn't sound that advanced compared to the Audigy or MCP-X. I mean, these days, anyone who cares about home theater quality want to do Dolby Digital and DTS.

Do TV broadcasts come in DTS where you live? Do you capture home videos onto your computer in Dolby Digital?

MuFu.

Hoppa, stop here.
Especially as I'm a DirecTV subscriber, and there are a few million of us in the US alone!
(I already have about 140 channels and all the movie channels HAVE 5.1 sound carrier or minimum ProLogic.)
It's a big and growing market, don't forget...

Do you wish to record DirecTV onto your PC with 5.1 sound? Even if you do, do you think there is a big market for that? Rage Theater 2 is a capture device. It's cheap to produce and you can add it to compliant RV250/R300-based cards easily via a DC. People who want to encode 5.1 sound for home movies/VCR-style recording will no doubt buy more expensive equipment! Adding such functionality to RT2 would introduce unnecessary cost to the chip and the daughtercard; at the moment the DCs actually cost LESS than the RT2; they are only 4-layer and have few components.

Not many people's PCs form the basis for their home cinema systems; even fewer use their PC as a VCR and wish to add DTS/5.1 capture to its range of abilities. No doubt the day will come when it's commonplace but until then I think just stereo audio capture will suffice. Besides... you can always use your plain old soundcard. :D

Do you know whether an 8-pixel pipeline card can single-texture twice as fast as a 4-pipelined one? I presume it can. In which case, clockspeed isn't too much of an issue. Even when single-texturing a 200MHz R300 should offer more effective fillrate than a Ti4600 (sadly a lot of benchmarks still reflect single-texturing ability; take Parhelia's plight for example). I'm sure we'll see 225-250MHz as the part is yielding at ~<250MHz.
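A back-of-envelope sketch of that fillrate comparison. The 8 pipes and 200MHz clock for R300 are the rumoured figures from this thread; the Ti4600 numbers (4 pipes at 300MHz) are an assumption taken from its commonly quoted specs, and `fillrate_mpix` is just an illustrative helper, not anything from ATi or nVidia.

```python
def fillrate_mpix(pipes: int, core_mhz: float) -> float:
    """Theoretical single-textured fillrate in megapixels per second."""
    return pipes * core_mhz

# Rumoured R300 at a conservative clock (figures from the thread).
r300 = fillrate_mpix(pipes=8, core_mhz=200)
# Assumed GeForce4 Ti4600 figures (4 pipes, 300MHz core).
ti4600 = fillrate_mpix(pipes=4, core_mhz=300)

print(f"R300   @ 200MHz: {r300:.0f} Mpixels/s")
print(f"Ti4600 @ 300MHz: {ti4600:.0f} Mpixels/s")
```

This is peak theoretical throughput only; it says nothing about memory bandwidth or how close either part gets to its peak in practice.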

MuFu.

MuFu
27-Jun-2002, 16:59
- needs 10 layer PCB


That may well be the case... PCB manufacturers may feel they need to use a ten layer PCB to maintain signal integrity at high clockspeeds. The design actually calls for an 8-layer PCB but a lot of board manufacturers have stepped forward and said they cannot guarantee QC with that level of integration. A 10-layer board will no doubt add to manufacturing costs, but I don't think ATi will adjust their retail prices because of it. I'm sure they have a price point in mind (say $299) and if their profit margin becomes smaller they will probably just suck it up. Radeon 9000 is the main earner; Radeon 10000 is just a showcase and something to make geeks like us happy. :D

MuFu.

Randell
27-Jun-2002, 17:12
Are you sure that's going to be the naming policy? ATI already had a bad rep for the whole R100 OEM vs. retail speed fiasco, which was compounded by the 8500/LE clock speed issue.

ATI criticised nVidia for the Gf4MX product name.

For ATI to maintain credibility here they need to stick to their DX related naming policy and call the RV250 something like the 8750 and the R300 the 9500.

Chalnoth
27-Jun-2002, 17:37
I just have to comment on one thing: 128-bit color.

I find it highly unlikely that the R300 will support 128-bit color, unless, perhaps, it only supports it at reduced fillrate (i.e. ATI has found some way of combining two 64-bit pipelines to act as one 128-bit pipeline).

That is, if the R300 had four 128-bit floating-point pipelines, it would likely require around four times as much die space for the rasterizer. Then you have to add in the increased PS flexibility that is bound to exist, and you have a massive increase in size required in just the PS. I seriously doubt it's possible to have 128-bit pipelines at around 100 million transistors.

Additionally, I see no point in using 128-bit color for normal color ops (That is, no 128-bit framebuffers). 64-bit should be more than enough for any normal color operations.

Mize
27-Jun-2002, 17:40
128 bit is internal with 32 bit rendering. Any time you multiply two numbers the error multiplies as well. This means that "32 bit" rendering often has an effective bit depth of only 12 or so. The idea is to obtain real 32 bit rendering by performing operations with 64 or 128 bits of precision.

Mize

Chalnoth
27-Jun-2002, 17:45
Even for internal rendering, I see no reason for 128-bit rendering for basic color operations. 64-bit rendering I can see, and 64-bit rendering should be good enough for up to around 12 bits per channel with little to no loss in most situations.

I have heard that some specialized operations will require higher bit depths, though I'm not currently certain what those are...I think for more complex math ops.

And the use of floating-point should reduce or eliminate the multiplication problems.

Mize
27-Jun-2002, 18:57
Bits are bits and quantization errors will be a problem whether FP or INTs are used if many operations are performed.

Basic
27-Jun-2002, 19:51
Mize:
Would you mind detailing which of those numbers were per component, and which were for all? I can't make sense of the 12bit part.

Chalnoth:
The most critical calculations would be ones with subtractions, like if you want to generate gradients for bumpmapping from a heightmap in a pixel shader. Using the color as coordinates for dependent reads could also be worse for precision than just general multiplications. Still, I agree with what you say about 128 bit color.

Mize
27-Jun-2002, 20:08
For any variable bits = dynamic range. If you do lots of math you get rounding errors (aka quantization noise) that eat up your effective dynamic range. With today's and tomorrow's games a 32 bit graphics card might wind up with an effective dynamic range of only 12 bits for all components simply because of the number of times the values have been multiplied, etc. This is easily demonstrable by the banding that appears in games even at 32 bit depth (banding you wouldn't see in bitmap). By going to 64 or 128 bits of internal processing it would be possible to render with an effective bit depth much closer to 32 since quantization noise would be "in the noise" so to speak. I.e. process at 128 bits and accumulate 32 bits of noise then render to 32 bits and you wind up with 28 bits of effective dynamic range.
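A rough way to see the effect being described is to round to a fixed number of bits after every operation and compare against the unrounded result. This is only a sketch under made-up assumptions: the 18 blend factors are arbitrary, values are treated as fixed-point fractions in [0, 1], and the helper names are illustrative. It does not try to reproduce the "effective 12 bits" figure, only the trend.

```python
def quantize(x: float, bits: int) -> float:
    """Round x to the nearest value representable with `bits` fractional levels."""
    levels = (1 << bits) - 1
    return round(x * levels) / levels

def chain(x: float, factors, bits: int) -> float:
    """Multiply x by each factor, rounding to `bits` bits after every step."""
    for f in factors:
        x = quantize(x * f, bits)
    return x

def worst_error(bits: int, factors) -> float:
    """Largest error over all 8-bit inputs, measured in steps of an 8-bit output."""
    worst = 0.0
    for i in range(256):
        x = i / 255
        exact = x
        for f in factors:
            exact *= f
        worst = max(worst, abs(chain(x, factors, bits) - exact) * 255)
    return worst

# 18 arbitrary multiplies, standing in for a long chain of blend operations.
factors = [0.7, 1.3, 0.9, 1.1, 0.6, 1.4] * 3

for bits in (8, 12, 16):
    err = worst_error(bits, factors)
    print(f"{bits:2d}-bit intermediates: worst error {err:.4f} LSBs of an 8-bit result")
```

The wider the intermediates, the smaller the accumulated rounding error in the final 8-bit channel, which is the argument for higher internal precision even when the framebuffer stays at 32 bits.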

I'm no expert in 3D rendering, but this is how data acquisition and processing works and noise is noise regardless of the data being processed.

Mize

Typedef Enum
27-Jun-2002, 20:17
MuFu,

I know you've got some ATI contacts and what-not...What are you hearing, as far as an R300 product announcement? July? August?

Doomtrooper
27-Jun-2002, 20:51
Are you sure that's going to be the naming policy? ATI already had a bad rep for the whole R100 OEM vs. retail speed fiasco, which was compounded by the 8500/LE clock speed issue.

ATI criticised nVidia for the Gf4MX product name.

For ATI to maintain credibility here they need to stick to their DX related naming policy and call the RV250 something like the 8750 and the R300 the 9500.

The 8800 is still available...FireGL 8800 is not a gaming card. There may be different flavors of both Rv250 and R300 also :wink:

MuFu
27-Jun-2002, 21:28
MuFu,

I know you've got some ATI contacts and what-not...What are you hearing, as far as an R300 product announcement? July? August?

I swear I don't have ATi contacts at the moment. Early August has always been the target launch window for R300. I think the second revision is qualified and ready to go. This chip needs to ride on the back of DirectX 9 (or should that be the other way around?! Hehe...) so while they are holding back for that they will be trying to put extra effort into the software as well as tying up a few other things unknown to me. The "geniuses" of ATi won't be getting much sleep, optimising the drivers to make best use of the memory controller right now, no doubt. Apart from that, I know there were some QC issues regarding PCB manufacture but they can be solved by moving to a 10-layer PCB if necessary. I have not heard anything to indicate that the launch target date won't be met; in fact, quite the opposite. Things seem to be moving along nicely. :D

Regarding the names; just a personal feeling I've had for a long time that we'll see "Radeon 9000" and "Radeon 10000" debut in late July/early August respectively. They seem to make the most sense to me (albeit at the compromise of not having a DX-association). These will be accompanied by LE versions; the 10kLE may well be the card that a lot of us end up buying, but the 9kLE is sure to cause some confusion due to different memory configurations. There may well be 3-4 different types of Radeon 9000 with more, such as the RV25A from PowerColor, from 3rd parties.

The daughtercard idea really excites me. You can see how much Rage Theater 2 (internal codename "Ripper 2") can bring to a standard solution; I believe the only exception is that RV250 cannot support component input so may use different DCs to R300. BTW, in case you hadn't noticed those specs are almost identical to those of the already-announced Theater 200. That's where I got them from; the two chips are virtually identical with RT2 only differing by lacking some set-top functionality.

Another thing that you might be interested in; there is a dipswitch on RV250s that can switch between PAL and NTSC output without having to flash the BIOS. Not sure about R300 although I'm pretty sure it will feature the same, very elegant solution.

MuFu.

Gunhead
27-Jun-2002, 21:43
256 colours should be enough for everyone.

Basic
27-Jun-2002, 21:57
Mize:
Would you mind giving a link to some screenshots with banding that make 32bit look like "effective 12bit". And explain how an internal 128bit would make it better (over internal 64bit). All with an 32bit framebuffer.

Btw, I'm not a newbie to numerical methods, and the hint here is that I think you've overestimated the effects of error propagations. At least with multiplications.

demalion
27-Jun-2002, 22:05
How did the rumors that DX "9" might be called DX "10" turn out? That is the only way ATi won't be pulling a GF4 MX type maneuver...unless DX "9" can be effectively implemented with the 8500's functionality (related to that "R200 reference platform for DX 9 development, R300 target platform" that I seem to recall), in which case they'll only be pulling a "Ti has new features" type of maneuver, :lol: .

In any case I hope the RV250 is not called 9xxx, if it doesn't offer any enhanced functionality in regards to R200. :-?

Tahir2
27-Jun-2002, 22:10
I believe Mize said it was not reproducible in .bmp format from a screen grab.

However I have seen this 'banding' myself in 3DMark 2001 @32bit and in some games where a lot of 'smoke' or 'fire' effects are used. Even racing games like GP4. Higher precision offers cleaner colours, for example the difference in Dot3 precision between R200 and GF4 as shown in The Tech Report.

Basic
27-Jun-2002, 22:23
Errors in smoke come from low frame buffer precision, since it's done as lots of single textured polys. So it wouldn't be helped by a 64bit => 128bit internal increase. (The internals aren't the limiting factor.)

Can I bother you for a link to "The Tech Report" you're talking about?

Tahir2
27-Jun-2002, 22:31
np... here you go:

http://www.tech-report.com/reviews/2001q4/r200/index.x?pg=1

This is what I was talking about specifically:

http://www.tech-report.com/reviews/2001q4/r200/7.jpg

The URL:

http://www.tech-report.com/reviews/2001q4/r200/index.x?pg=5

Basic
27-Jun-2002, 22:51
That's a demo from ATI to show off the larger range of values in their PS registers. The scaling is such that a GF3 will saturate some intermediate results, but different scaling would remove the saturation and make the differences smaller, if at all visible. So this isn't about number of bits, but where to put the decimal (binal) point.

Either way, the R200 image is rendered with less than 64 bit per color. If there is anything in the R200 image missing due to low precision it would be fixed by 64 bit. Wouldn't you say? Then there is a massive amount of unused precision wasted if you go to 128 bit.

[Edit]
Forgot: Thanks for the link.

Tahir2
27-Jun-2002, 22:59
I had originally thought that the R200 rendered internally at a higher precision to achieve these effects. Also I don't believe the term pixel shaders is really necessary as that term was not invented before Dot3. I would love to see this same image replicated on a Radeon vs a Geforce 2.

I don't know the author of that particular demo but thought it may help you see that rendering internally at higher precision does have a perceived effect on the output image even if it is 'only' at 32bit in the end anyway. Something similar to what the Kyro manages when rendering at 16bit externally (onscreen) but 32bit internally.

It is not confirmed that 128bit will be used at all in the next ATi part and I have not seen the final specs for DX9 stating that 128bit fp accuracy is needed to be 'compliant' with its spec.

Mize
27-Jun-2002, 23:09
Mize:
Would you mind giving a link to some screenshots with banding that make 32bit look like "effective 12bit". And explain how an internal 128bit would make it better (over internal 64bit). All with an 32bit framebuffer.

Btw, I'm not a newbie to numerical methods, and the hint here is that I think you've overestimated the effects of error propagations. At least with multiplications.

Don't have 'em.
My background in these issues comes from the data acquisition and mathematical modeling world (I did my PhD on wavelet transforms for modeling of elastodynamic wave propagation in anisotropic solids) so I'm not the best person to answer any questions in the 3D graphics world.

Nonetheless, I've certainly encountered situations where a 32 bit word was inadequate to get useful data at the end of a 4D Laplace transform :)

Mize

LittlePenny
27-Jun-2002, 23:20
I was under the impression the extra bits would be most helpful with floating point precision. With 32bits the numbers may only be accurate up to the 5th or 6th unit. So an accurate number could be 123456.0 12345.6 1.23456 and so on. An extra benefit of increasing your precision would be for color calculation.

Basic
27-Jun-2002, 23:54
I don't object to higher precision.
5:6:5 => 8:8:8:8 Is a massive difference.
8:8:8:8 => 12:12:12:12 (or whatever ATI use) Should give a visible difference.
12:12:12:12 => 16:16:16:16 Could quite possibly look different in some demanding situations.

But you've really started to get diminishing returns.

16:16:16:16 => 32:32:32:32 Would be useless in pretty much all cases. Not to mention the massive hardware cost. I don't even understand why it's discussed. (Ehrm oops, I'm discussing it. :oops: )


I talked about PS because that demo is done with PS1.1. It's a demo that ATI used when introducing R200. (The demo continues with some PS1.4 stuff later.)

quattro
28-Jun-2002, 00:26
This is getting very technical right now, but we can put it really simply for anybody who doesn't have a clue about the discussion that has developed.
You know scanners, folks? Those thingies used for converting stuff on paper to a computer image? Well, those thingies have already been scanning paper at higher than 32 bit color precision for a few years now. 36 and 48bit are fairly common, with higher end scanners using even 96 bits per pixel (and that is consumer, not pro level). Yet the final picture is 32 or even 24 bit. That higher precision is used only to scan the picture as faithfully to the original as possible, with no scanning artefacts. The same thing is coming now to graphics chips. And what we call overkill today just might be a must tomorrow :)

Oh, I know the more tech savvy will yawn @ my post, but I believe there are many lurkers that don't quite get it :)

OpenGL guy
28-Jun-2002, 00:34
I was under the impression the extra bits would be most helpful with floating point precision. With 32bits the numbers may only be accurate up to the 5th or 6th unit. So an accurate number could be 123456.0 12345.6 1.23456 and so on. An extra benefit of increasing your precision would be for color calculation.
32-bit IEEE floating point values are 1 bit for sign, 8 bits for exponent, and 23 bits for the fraction. To get the amount of precision you can achieve, take (23 + 1) * log10(2) = 7.2. This means you can achieve 7 digits of precision in base 10. (The extra +1 is because of the implicit 1 in the mantissa, see example below.)

If you went to 64 bits, the IEEE spec is 1 bit for sign, 11 for exponent and 52 for fraction, giving (52 + 1) * log10(2) = 15.95, or 15 digits of precision.

However, what limits actual precision is how numbers are combined. For example, if you took 1/3 and wrote it as a 32-bit floating point value, then you can only get an approximation. Add it to itself a few times, and the error grows. However, if you instead took 1/3 and wrote it as a 64-bit floating point value then added it to itself and converted it to 32-bit, you would get a more accurate result.

P.S. Exponents are handled in a strange manner in the IEEE spec. For example, if you wanted to write 1.0 in 32-bit IEEE it would be 0x3f800000. The exponent is 0x7f which equals 0 because you subtract 0x7f = 127 (the bias) from the exponent to get the real exponent. This gives a range of -126 to 127 for the exponent. The mantissa is 1.F where F is the fraction. F is 0 here so we have a mantissa of 1. This effectively adds 1 bit of precision. When the exponent is 0xff, this is a special case: If fraction != 0, then the value is NaN (not a number); if fraction = 0 and sign is 0, then value = infinity; and if fraction = 0 and sign is 1, then value = - infinity. There are other special cases when the exponent is 0.
P.P.S. See http://www.psc.edu/general/software/packages/ieee/ieee.html for a full description.
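The numbers above can be checked with a few lines of Python. This is a minimal sketch using only the standard `struct` and `math` modules; `float32_bits` and `to_float32` are illustrative helper names, and the count of 30 additions is arbitrary.

```python
import math
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Take 1.0 apart into sign, biased exponent and fraction.
bits = float32_bits(1.0)
sign = bits >> 31
exponent = (bits >> 23) & 0xFF
fraction = bits & 0x7FFFFF
print(hex(bits), "sign", sign, "exponent", hex(exponent), "fraction", fraction)

# Decimal digits of precision: (fraction bits + implicit 1) * log10(2).
digits32 = (23 + 1) * math.log10(2)
digits64 = (52 + 1) * math.log10(2)
print(f"{digits32:.2f} digits for 32-bit, {digits64:.2f} digits for 64-bit")

# Add 1/3 thirty times, rounding to 32 bits after every addition...
third32 = to_float32(1 / 3)
acc32 = 0.0
for _ in range(30):
    acc32 = to_float32(acc32 + third32)

# ...versus adding in 64 bits and rounding to 32 bits once at the end.
acc64 = 0.0
for _ in range(30):
    acc64 += 1 / 3
acc64_to_32 = to_float32(acc64)

print("error, 32-bit throughout:     ", abs(acc32 - 10.0))
print("error, 64-bit then one round: ", abs(acc64_to_32 - 10.0))
```

The last two lines illustrate the point about 1/3: doing the intermediate work at higher precision and rounding once at the end loses no more than rounding at every step, and usually less.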

LittlePenny
28-Jun-2002, 01:27
32-bit IEEE floating point values are 1 bit for sign, 8 bits for exponent, and 23 bits for the fraction. To get the amount of precision you can achieve, take (23 + 1) * log10(2) = 7.2. This means you can achieve 7 digits of precision in base 10. (The extra +1 is because of the implicit 1 in the mantissa, see example below.)


Ok, well I just counted some ulps from some SPIM output and you are right. :oops:

Mintmaster
28-Jun-2002, 01:57
I just have to comment on one thing: 128-bit color.

I find it highly unlikely that the R300 will support 128-bit color, unless, perhaps, it only supports it at reduced fillrate (i.e. ATI has found some way of combining two 64-bit pipelines to act as one 128-bit pipeline).

That is, if the R300 had four 128-bit floating-point pipelines, it would likely require around four times as much die space for the rasterizer. Then you have to add in the increased PS flexibility that is bound to exist, and you have a massive increase in size required in just the PS. I seriously doubt it's possible to have 128-bit pipelines at around 100 million transistors.

Additionally, I see no point in using 128-bit color for normal color ops (That is, no 128-bit framebuffers). 64-bit should be more than enough for any normal color operations.

A couple of points:

Your idea of reduced fillrate at 128-bits makes perfect sense. Let's assume that R300 is an 8-pipe part with a 256-bit bus and equal memory and GPU clocks. That makes for 64 bits of bandwidth per pipe per clock. It would be totally useless to have full fillrate at 128-bit.
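The 64-bits-per-pipe figure works out as follows, assuming the 256-bit bus is DDR (two transfers per clock), which is the configuration rumoured earlier in the thread. The function name is made up for illustration.

```python
def bits_per_pipe_per_clock(bus_bits: int, pipes: int, ddr: bool = True) -> float:
    """Memory bits available to each pixel pipe per core clock,
    assuming equal core and memory clocks."""
    transfers_per_clock = 2 if ddr else 1
    return bus_bits * transfers_per_clock / pipes

per_pipe = bits_per_pipe_per_clock(bus_bits=256, pipes=8)
clocks_per_128bit_pixel = 128 / per_pipe

print(f"{per_pipe:.0f} bits per pipe per clock")
print(f"{clocks_per_128bit_pixel:.0f} clocks of bandwidth to write one 128-bit pixel")
```

So even a single 128-bit colour write per pixel would need two clocks' worth of bandwidth per pipe, before counting Z or texture traffic.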

As for saying 128-bit is four times the die space of 32-bit on R200, there are a couple of things to consider. First, R200 already has around 16-bits of internal precision per channel - they mention it in some of their PPT presentations. On the other hand, making a calculation twice the bit-depth requires more than double the transistors if you want to keep the clock the same. Think of adding 000001 to 1111111, and then double the number of bits - that one carries all the way through, lengthening the longest path by a factor of two unless you sacrifice more transistors to make a carry lookahead.
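The carry argument can be illustrated with a bit-by-bit ripple-carry adder. This is a software sketch only: the length of the longest run of consecutive carries is used as a stand-in for the critical path, and real hardware timing is of course more involved.

```python
def ripple_carry_add(a: int, b: int, width: int):
    """Add two `width`-bit integers one bit at a time.

    Returns (sum truncated to `width` bits, longest run of consecutive carries).
    """
    carry = 0
    result = 0
    run = longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry
        carry = (x & y) | (carry & (x ^ y))
        result |= s << i
        # Track how far a carry keeps rippling from bit to bit.
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return result, longest

for width in (8, 16, 32):
    all_ones = (1 << width) - 1
    total, chain = ripple_carry_add(1, all_ones, width)
    print(f"{width:2d} bits: carry ripples through {chain} stages")
```

Doubling the width doubles the worst-case ripple, which is why a wider adder at the same clock needs extra logic such as carry lookahead.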

Basic
28-Jun-2002, 02:00
Numerically evaluated 4D Laplace transform. Hmmm... I guess you had a reasonable size in each dimension too. I'm not surprised if you got some nasty error buildup. But that's a *much* nastier problem than general graphics calculation. It involves lots of sums of long products where the terms can have arbitrary sign, and then doing it over and over again. Many places to get cancelation, the evil foe of floating point numbers.

In a pixel shader, most stuff is adding positive terms like adding many lights, or alpha blending. You can get cancelation in dot products, but it's usually not a problem since the result is used in ways where absolute errors are a more interesting measure than relative.

Worst cases are probably (as said before) differentiating a heightmap, or dependent texreads with high resolution in the dependent map.

And then add that you probably needed a lot higher precision in the output of your Laplace transform than what is needed in a final color.

Sorry for the technical text, but it's mostly pointed to Mize.
The short version is that floating point numbers usually are rather good at keeping the precision, as long as you don't get cancelation. Cancelation is what you get when you subtract two numbers of similar size, which cause the relative error to suddenly take a jump. Ie: 1.2934536 - 1.2934525 = 0.0000011 sudden jump from 8 valid digits to 2.
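The same subtraction, done with 32-bit floats, shows the jump in relative error. A minimal sketch; `to_float32` is an illustrative helper built on the standard `struct` module, and the 64-bit result is used as the reference value.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

a, b = 1.2934536, 1.2934525
exact = a - b                      # 64-bit reference, about 1.1e-6

# Store both operands as 32-bit floats, then subtract in 32-bit.
a32, b32 = to_float32(a), to_float32(b)
diff32 = to_float32(a32 - b32)

rel_in = abs(a32 - a) / a                  # relative error of one input
rel_out = abs(diff32 - exact) / exact      # relative error of the difference

print(f"relative error of input:      {rel_in:.2e}")
print(f"relative error of difference: {rel_out:.2e}")
```

The inputs are each good to about 7 digits, but the difference keeps only a couple of them, because the leading digits cancel and only the rounding noise in the trailing digits is left.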

Going to bed.

eSa
28-Jun-2002, 02:06
It's nice see some technical discussion for a change.

I have done some reading about HDR (High Dynamic Range) rendering lately and it's been really interesting. nVidia seems to be advocating HDR as one of the new "you really need nv30 for this" gimmicks. That is, they are giving courses & lectures about dx9 and future hw, and HDR seems to be one popular topic.

To come back to this thread's original subject, it seems that HDR is indeed an application where you really can use, say, a 16:16:16:16 format for the environment map and internal rendering precision. And it will make a HUGE difference 8)

For those of you who are not that familiar with the subject, maybe a little intro to HDR is needed:

Basic idea: the dynamic range in natural and artificial lighting is huge. Think of a photo of a sunny day with lots of bright light, say a small pond and a little tree. This photograph includes very bright light (sun, reflection in the pond), medium light (grass etc.) and low light (tree shadow). So we have a huge number of colors with a huge range of different intensities. Also, the way the intensities are distributed is not linear; there may be more bright and fewer dark colors, or vice versa. The standard 8-bit computer gfx intensity range isn't enough.

So, we want to mimic the photograph's lighting quality when rendering 3D images with a computer. What we do is:

We take several photographs of a ball with a mirror surface. Each photograph has a different exposure. Using a clever algorithm, we build a single image of the mirror ball where each pixel has r,g,b components with high dynamic range (16-bit or 32-bit etc.). We can now cover the range of intensities that appear much better.

Using this image as an environment map we can light the 3D object in a very realistic way.

The environment map can be improved by taking several HDR images of the mirror ball from different directions and combining these. HDR maps can also be built with a synthetic rendering process.

There's already some research done on realtime versions of HDR lighting using 8-bit multitexturing. But it's obvious that when hw supports, for example, 16/32-bit textures (per component!!!), 64-bit internal precision and maybe even a 10-bit framebuffer, we will get A LOT nicer gfx. Basically lighting quality will be equal to what can be achieved with offline HDR rendering. And it will be fully realtime :)

Gollum
28-Jun-2002, 02:12
One of the most interesting uses of HDR maps is using them for easy authentic looking global illumination/radiosity renders - wonder when we will see 3d chips with radiosity support, hehe... ;)

Mintmaster
28-Jun-2002, 02:36
I have one more important point to make about this subject, directed both at Chalnoth's last statement and EVERYONE else talking about high bit depth futility.

You are missing the point of additional bit depth - we are talking about pixel shaders here, and they can be used for a whole lot more than just colour buffers. There are many reasons to have higher bit depth. Here are some examples:

-NVidia has a water simulation demo, but you only get an 8-bit height field, which can cause severe artifacts. Height maps definitely need higher precision to work properly.

-I have a pixel shading enhanced shadow map algorithm I'm planning to implement, and I need three high bit depth channels to render to. I'm stuck in 8-bit right now, however, making for very limited range shadows. Seeing how a 24 or 32 bit Z-buffer is needed for minimal Z errors, you can see my point for the shadow maps.

-There's a Mandelbrot set demo somewhere done with pixel shaders. I did a Pascal program for that for fun one day and used double precision (64-bit) for my calculations. You start zooming in to explore all the intricacies, and before you know it, you get blocks much larger than the screen resolution due to precision limits. I could see this happening with procedural shaders.

The list goes on and on. Developers have to stop thinking about effects as simple blends with adds and multiplies. These video cards can do things far beyond these limitations - they just need to get creative.
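The Mandelbrot case is easy to quantify: once the spacing between neighbouring pixels drops below what the number format can resolve at the view's centre, adjacent pixels get the same coordinate and the image turns into blocks. The sketch below counts how many times a view can be halved before that happens. The centre of -0.75, the 1024-pixel width and the starting width of 4.0 are arbitrary assumptions, and the helper names are illustrative.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def zoom_limit(center: float, rnd=float, screen_px: int = 1024,
               start_width: float = 4.0) -> int:
    """Number of times the view width can be halved before two neighbouring
    pixels land on the same coordinate in the given number format."""
    width = start_width
    halvings = 0
    # Neighbouring pixel = centre plus one pixel step, rounded to the format.
    while rnd(center + width / screen_px) != rnd(center):
        width /= 2
        halvings += 1
    return halvings

limit32 = zoom_limit(-0.75, rnd=to_float32)
limit64 = zoom_limit(-0.75)
print("halvings before pixels collapse, 32-bit float:", limit32)
print("halvings before pixels collapse, 64-bit float:", limit64)
```

A lower-precision format runs out of zoom much sooner, which is the concern for procedural shaders that compute coordinates at reduced precision.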

Mintmaster
28-Jun-2002, 02:47
Good point, eSa and Gollum. I completely forgot about HDR lighting. They're very important for realistic reflections off things like cars, as dim reflected surfaces amplify the difference far more than what you see straight up.

For example, when rendering the sky and trees, the sky would appear on the monitor as a bright colour (say RGB=0.9,0.9,1.0) and grass would still have a fair amount of brightness (say RGB=0.1,0.5,0.1).
In the reflection off a black car, however, the sky barely decreases in intensity (say down to RGB=0.7,0.7,0.8), but the grass is almost black (say RGB=0.0,0.0,0.1)
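One way to read this example is that the sky's real intensity is far above what the monitor can show, so it survives a dim reflection while the grass does not. The sketch below uses made-up numbers: a sky ten times brighter than display white, an 8% reflective surface, and helper names that are purely illustrative.

```python
def clamp01(v: float) -> float:
    """Clamp a value to the displayable range [0, 1]."""
    return max(0.0, min(1.0, v))

def reflect(radiance, reflectivity: float):
    """Scale scene radiance by the surface reflectivity, then clamp for display."""
    return tuple(round(clamp01(c * reflectivity), 3) for c in radiance)

# Assumed HDR scene values (1.0 = brightest value the display can show).
sky_hdr = (9.0, 9.0, 10.0)
grass_hdr = (0.1, 0.5, 0.1)

# The same sky as stored in an 8-bit style map: clamped to 1.0 beforehand.
sky_ldr = tuple(clamp01(c) for c in sky_hdr)

reflectivity = 0.08   # assumed reflectivity of the dark car paint

sky_reflect_hdr = reflect(sky_hdr, reflectivity)
sky_reflect_ldr = reflect(sky_ldr, reflectivity)
grass_reflect = reflect(grass_hdr, reflectivity)

print("sky reflection from HDR map:    ", sky_reflect_hdr)
print("sky reflection from clamped map:", sky_reflect_ldr)
print("grass reflection:               ", grass_reflect)
```

With the clamped map the reflected sky comes out nearly as dark as the grass, which is exactly the loss of contrast an HDR environment map avoids.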

Chalnoth
28-Jun-2002, 05:47
As for saying 128-bit is four times the die space of 32-bit on R200, there are a couple of things to consider. First, R200 already has around 16-bits of internal precision per channel - they mention it in some of their PPT presentations. On the other hand, making a calculation twice the bit-depth requires more than double the transistors if you want to keep the clock the same. Think of adding 000001 to 1111111, and then double the number of bits - that one carries all the way through, lengthening the longest path by a factor of two unless you sacrifice more transistors to make a carry lookahead.

Note that I said four times the die size if it were to have four 128-bit pixel pipelines. I'm not sure whether or not it would be possible to have four 64-bit pipelines that can also act like two 128-bit pipelines (You'd have to ask an actual engineer that has actually tried to do it with floating-point pipelines...).

As for the R200 currently having 16 bits of internal precision per channel, I highly doubt it. While it is certainly true that all modern video cards do have somewhat higher internal precision, I don't believe any go all the way to 16-bit. After all, 10-bit alone would be enough to keep the error from going beyond 8-bit in most situations. Anyway, the easiest way to see that all modern video cards have higher-precision internal pipelines is just to look at texture filtering. Neither bilinear, trilinear, nor anisotropic decrease effective bit depth. And believe me, particularly with the higher degrees of anisotropic, if all calculations were done at 8 bits per channel, there would be very, very noticeable drops in effective bit depth, and even more when four or more textures were in use.

In other words, for most internal calculations, there is actually no reason to go above 32-bit color for maintaining color precision (Assuming you have an 8-bit DAC, of course). Currently, the main reason for going above 8-bit internal precision would be for color operations that require multiplication. One of the most obvious is gamma adjustment. If you turn the gamma way up on any of today's graphics cards, you'll get massive banding. Currently, only the Parhelia looks like it may solve those problems (though I would expect the R300 and NV30 will as well).

The reason that 64-bit color should be enough for any basic color operations, however, is for internal calculations that require multiplication (such as some lighting effects, and the gamma seen above). The reason why 128-bit will never be required for such basic color ops is simply that 64-bit color will use 16-bit floating point numbers per channel. Floating-point numbers should have no problem with multiplication. Since there is no need for a sign in 3D color ops, I currently expect that the calculations will use a 12-bit mantissa with a 4-bit exponent. Given higher internal bit depth calculations, this should be enough for accurate 12-bit color reproduction (which is professional-level). Hopefully both the R300 and NV30 will fully support 64-bit floating-point color, and will also support flexible enough programming that there is no longer any need for multipass rendering (All you'd need is virtualized program storage, so that there is no limit to the length of a PS or VS program).
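To make the 12-bit mantissa, 4-bit exponent idea concrete, here is a toy encoder and decoder for such an unsigned 16-bit channel. Everything beyond the 12/4 split is an assumption for illustration: the implicit leading 1, the exponent bias of 12, and the lack of zero, denormals or special values. It is not a description of any real R300 or NV30 format.

```python
import math

MANT_BITS = 12
EXP_BITS = 4
BIAS = 12          # arbitrary assumption: covers roughly 2**-12 up to 16

def encode(x: float) -> int:
    """Pack a positive float into 4 exponent bits and 12 mantissa bits."""
    if x <= 0:
        raise ValueError("toy format only handles positive values")
    frac, p = math.frexp(x)                  # x = frac * 2**p, 0.5 <= frac < 1
    e = p - 1 + BIAS                         # biased exponent of 1.m form
    m = round((frac * 2 - 1) * (1 << MANT_BITS))
    if m == 1 << MANT_BITS:                  # mantissa rounded up to 2.0
        m, e = 0, e + 1
    if not 0 <= e < (1 << EXP_BITS):
        raise ValueError("out of range for this toy format")
    return (e << MANT_BITS) | m

def decode(bits: int) -> float:
    """Unpack the toy format back into a Python float."""
    e = bits >> MANT_BITS
    m = bits & ((1 << MANT_BITS) - 1)
    return (1 + m / (1 << MANT_BITS)) * 2.0 ** (e - BIAS)

for x in (0.001, 0.5, 1.0, 3.7, 15.9):
    y = decode(encode(x))
    print(f"{x:8.4f} -> {y:.6f}  (relative error {abs(y - x) / x:.1e})")
```

The relative error stays below about 1 part in 8000 across the whole range, regardless of how bright or dim the value is, which is the property that makes floating point attractive for multiplies.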

Now, as other posters have stated, there may be a need for higher than 64-bit calculations for non-color ops. In fact, for some ops, there may be a need for up to 256-bit calculations (64 bits per channel). However, this would probably be better-handled by a flexible vertex shader unit. Toward this end, it might be good to have the PS unit capable of up to only 16-bit floating-point calcs, but have the capability to use the 64-bit precision of a vertex shader unit to operate on textures. This would be very slow, obviously, but performance will continue to improve very quickly into the future...

Disclaimer:
I don't really know for sure if today's vertex shader units actually use 64 bits for each x, y, z, and w value for vertices, but it does make sense given the lack of rendering errors seen in today's cards (32-bit floats generally aren't enough, in my experience...).

g__day
28-Jun-2002, 11:20
BTW - I went to the Inquirer web site (often low-quality, biased reviews) to read more about ATI using 0.13 micron and how it flopped for NVidia.

At the bottom of that article is a retraction :) saying they got it back to front (oh really - well that's a balls up isn't it). Of course NV30 is 0.13 micron and R300 is 0.15 micron.

Tahir2
28-Jun-2002, 19:13
Well this is all way above my head but it is very interesting to learn about.

:)

Basic
28-Jun-2002, 19:24
I've applauded P10 before for having "infinite" precision by combining registers. Probably done like you do 16 and 32 bit arithmetic on an 8-bit processor. (One bad part though is that they probably only use integers.)

And I would like it just as much if some other company did the same thing. As long as it doesn't cost many gates. (Which implies a big performance hit when used, likely x2 for additions and x4 for multiplications or x3 with slightly reduced precision.)
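The x2, x4 and x3 figures match the usual way of building wide integer arithmetic out of narrow operations. A sketch with 8-bit pieces, as on an 8-bit processor; whether P10 actually works like this is a guess, and the function names are illustrative.

```python
MASK8 = 0xFF

def add16(a: int, b: int) -> int:
    """16-bit add built from two 8-bit adds with carry (x2 cost)."""
    lo = (a & MASK8) + (b & MASK8)
    hi = (a >> 8) + (b >> 8) + (lo >> 8)      # carry from the low byte
    return ((hi & MASK8) << 8) | (lo & MASK8)

def mul16(a: int, b: int) -> int:
    """16x16 -> 32-bit multiply built from four 8x8 multiplies (x4 cost)."""
    a_lo, a_hi = a & MASK8, a >> 8
    b_lo, b_hi = b & MASK8, b >> 8
    return (a_lo * b_lo) + ((a_lo * b_hi + a_hi * b_lo) << 8) + ((a_hi * b_hi) << 16)

def mul16_approx(a: int, b: int) -> int:
    """Same multiply without the low x low product (x3 cost).
    The result is too small by at most 255 * 255, i.e. under 2**16."""
    a_lo, a_hi = a & MASK8, a >> 8
    b_lo, b_hi = b & MASK8, b >> 8
    return ((a_lo * b_hi + a_hi * b_lo) << 8) + ((a_hi * b_hi) << 16)

print(hex(add16(0x12FF, 0x0001)))
print(mul16(1234, 5678), 1234 * 5678)
print(mul16(1234, 5678) - mul16_approx(1234, 5678))
```

Dropping the low-by-low product only disturbs the bottom 16 bits of the 32-bit result, which is the "slightly reduced precision" trade-off.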

I have no problem seeing algorithms that could use such high precision. And I was thinking of stuff like nVidia's PS wave demo before they announced it. And even about doing real geometry calculations in the pixel pipe. There are also image filtering algorithms that could use 4x32bit floating point.

But optimizing hardware for those cases would be bad design. If it can't be done with just a small percentage increase in gatecount, then it's better to rethink if those algorithms maybe belong better somewhere else than in the PS. Again, if it's done at small gate cost, then great.
When implementing such an algorithm, it's better to take care to make full use of the low precision available.

The size of a multiplier is O(N^2) where N=precision. For low precision there's an O(N*log(N)) term that might be dominant, but the square term takes over for high precision.
Extending precision by doing the arithmetic "in pieces" is reasonably easy for integers, but looks harder for floating point.


About HDR:
Yes, that certainly needs more than 32bit color, but 64bit color with floating point should be plenty. The key being floating point.

About Mandelbrot:
Doing it in a pixel shader is just the kind of twisted thinking I could do myself. But if the precision wasn't enough when zooming in, I certainly wouldn't blame the hardware for being bad.

About floating point color:
I've still not decided how I would like the floating point to be done, with block exponent or individual exponents. In many cases the block exponent would work just fine, and it would give more bits to the mantissa. It could be necessary to have one exponent for RGB and one for A, since they could be unrelated. Then again, sometimes you'd like to use a vec4 as two vec2 (think texreg2ar and texreg2gb), and then it's better to have one exponent for AR and one for GB.
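The block exponent trade-off can be sketched in a few lines of Python. This is only an illustration under assumed parameters (8-bit signed mantissas, exponent taken from the largest component, hypothetical function names), not any real hardware format. It shows why an unrelated alpha value can wipe out the colour channels when all four share one exponent.

```python
import math

def encode_block(values, mantissa_bits=8):
    """Quantize a vector with one shared exponent (block floating point)."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return [0] * len(values), 0
    exponent = math.frexp(peak)[1]        # peak = m * 2**exponent, 0.5 <= m < 1
    scale = 2.0 ** (mantissa_bits - exponent)
    limit = (1 << mantissa_bits) - 1      # clamp in case rounding overshoots
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return mantissas, exponent

def decode_block(mantissas, exponent, mantissa_bits=8):
    """Inverse of encode_block, up to the quantization error."""
    scale = 2.0 ** (exponent - mantissa_bits)
    return [m * scale for m in mantissas]

# One exponent for all of RGBA: the large alpha sets the exponent and
# the small colour values round away to zero.
rgba = [0.5, 0.25, 0.75, 1000.0]
shared = decode_block(*encode_block(rgba))

# One exponent for RGB and a separate one for A: both groups survive.
rgb = decode_block(*encode_block(rgba[:3]))
alpha = decode_block(*encode_block(rgba[3:]))
```

The same split could be made as AR and GB instead of RGB and A; only the grouping passed to encode_block changes.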


Chalnoth:
I've seen several references to nvidia's VS being 4x32bit, and it seems like a reasonable precision. Yes there can be cases where you'd need to be careful, but normally it's OK.

Mintmaster
29-Jun-2002, 05:04
About Mandelbrot:
Doing it in a pixel shader is just the kind of twisted thinking I could do myself. But if the precision wasn't enough when zooming in, I certainly wouldn't blame the hardware for being bad.


Well sorry, but it was just a quick program using z'=z^2+c straight up, and done when my programming knowledge was in its infancy :) . I didn't do any extra precision modifications/tricks, though I see your point.

Still, extra precision is always useful, and I was just giving examples.
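For reference, a straight z'=z^2+c loop looks roughly like the Python sketch below, together with a small helper showing where 32-bit floats give out when zooming. The helper names and the centre point of -0.75 are assumptions for illustration. Once the pixel spacing drops below the float spacing near the centre, neighbouring pixels collapse onto the same coordinate.

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def escape_time(cx, cy, max_iter=100):
    """Iterate z' = z^2 + c from z = 0; return the step at which |z| > 2."""
    zx = zy = 0.0
    for i in range(max_iter):
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
        if zx * zx + zy * zy > 4.0:
            return i
    return max_iter                      # treated as "inside the set"

def distinct_columns(center, pixel_step, width=8, quantize=to_f32):
    """Count how many distinct x coordinates survive across `width` pixels."""
    return len({quantize(center + i * pixel_step) for i in range(width)})
```

At a pixel step of 1e-3 all eight columns stay distinct in 32-bit floats. At 1e-9 they all land on the same value, while passing quantize=float (64-bit) still keeps them apart.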

Mintmaster
29-Jun-2002, 05:30
[quote=Mintmaster]
As for the R200 currently having 16 bits of internal precision per channel, I highly doubt it. While it is certainly true that all modern video cards do have somewhat higher internal precision, I don't believe any go all the way to 16-bit. After all, 10-bit alone would be enough to keep the error from going beyond 8-bit in most situations.

Okay, fine. Let's say 10-bit to cover all normal blending. That's for all values between 0 and 1. But all of Radeon 8500's internal combiners have a range from -8 to 8. That makes for another 4 bits. We're up to 14 now, which is pretty damn close to 16-bits per channel.
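The bit count above is just a base-2 logarithm of the number of representable levels. A tiny Python check, assuming a plain fixed-point format with a uniform step (the function name is made up for illustration):

```python
import math

def bits_needed(lo, hi, step):
    """Bits for a fixed-point value covering [lo, hi) in increments of `step`."""
    levels = (hi - lo) / step
    return math.ceil(math.log2(levels))

unit_range = bits_needed(0.0, 1.0, 2.0 ** -10)    # 10 bits for 0..1
wide_range = bits_needed(-8.0, 8.0, 2.0 ** -10)   # same step over -8..8
```

The -8 to 8 range is 16 times wider than 0 to 1, and log2(16) = 4, which is where the extra 4 bits come from (3 for magnitude plus 1 for sign).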

Also, a rep from ATI posted this at Rage3D to help someone having precision issues doing shadow maps:
http://www.rage3d.com/board/showthread.php?s=&threadid=33604028

He says between 12 and 16 bits of precision.

Crusher
29-Jun-2002, 05:58
-NVidia has a water simulation demo, but you only get an 8-bit height field, which can cause severe artifacts. Height maps definitely need higher precision to work properly.

I haven't seen this (I assume it's using DX8 hardware functions, and I don't have a DX8 card yet); is the heightmap used directly by the GPU to generate the geometry, or is it still parsed by the CPU? If it's done in software by the CPU, there's not really anything restricting you to using 8 bits. That is, most heightmaps I've seen use a greyscale by choice, not because it's necessary. I don't see why you can't just use full 24 bit color, it's just a lot harder for the artists to generate them. The easiest way to do that would probably use something like the HTML color spec, where 000000 is your base level, and FFFFFF is the highest point, and I can't imagine it would be too hard to implement a tool that allows the artists to paint with intensity that covers the whole range of colors. Of course, if 8 bits isn't precise enough, 24 bits is probably excessive. I can't imagine a graphic engine using enough polygons to take advantage of all the possible elevation points. By the time you reach that level of detail, you should have something better than heightmaps to go by.
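The 24-bit idea (000000 as the base level, FFFFFF as the highest point) amounts to splitting one integer across the three colour channels. A minimal Python sketch for the CPU-side case, with hypothetical function names and an assumed linear mapping from height to level:

```python
def height_to_rgb(height, max_height):
    """Pack a height in [0, max_height] into a 24-bit (r, g, b) triple."""
    level = round(height / max_height * 0xFFFFFF)
    level = max(0, min(0xFFFFFF, level))
    return (level >> 16) & 0xFF, (level >> 8) & 0xFF, level & 0xFF

def rgb_to_height(r, g, b, max_height):
    """Unpack a 24-bit (r, g, b) triple back into a height."""
    level = (r << 16) | (g << 8) | b
    return level / 0xFFFFFF * max_height
```

Red carries the coarse height, green and blue refine it, which is also why such a map is hard to paint by hand: a small change in height can flip green or blue from FF to 00.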

Chalnoth
29-Jun-2002, 11:00
Okay, fine. Let's say 10-bit to cover all normal blending. That's for all values between 0 and 1. But all of Radeon 8500's internal combiners have a range from -8 to 8. That makes for another 4 bits. We're up to 14 now, which is pretty damn close to 16-bits per channel.

Yes, that would make for at least 12 bits per channel, in certain stages of the pipeline. There's nothing there for me to believe that 12 to 16 bits per channel would be used for all operations. I don't currently see why ops like texture filtering or FSAA would require the higher precision.

And given the already higher precision from the higher range, I somewhat doubt that ATI bothered to add in the extra bits for absolute 12-bit precision under those ops. After all, you're not going to be doing more than around six or so such ops with colors in this range...there's not much reason to. But, I suppose only ATI engineers currently know exactly what the internal precision at each stage of the Radeon 8500 is. I just see no evidence that it's much higher than 8 bits per channel in most stages.

MuFu
29-Jun-2002, 15:52
A "not-irrelevant" snippet from a Parhelia interview over at Gamers Depot...

GD: John Carmack has been quite public about his desire for 64bit color, do you think you guys "missed the mark" with only doing 10-bit?

Kamran/Matrox
"The goal of full 64-bit color depths (or 16-bits per color channel) is to allow very high precision to remain in all color values to minimize artifacts in demanding multi-pass next generation 3D rendering techniques. There is no question that these very high color formats will be important to enable better image quality in the photo-realistic 3D titles of the future (we may even see a move to 32-bits per color channel eventually) and that they are the right thing to do...

I'm pretty sure R300 has two fully integrated 30-bit RAMDACs (anything over 10-bit/channel output is wasted, of course) but totally agree that 128-bit internal rendering precision is a distinct possibility, especially given the complexity of the blends we may start to see with the advent of DX9.

I'd feel guilty gaming with a card that assigns 32-bit accuracy to the alpha channel though. Seems like a bit of a waste. :-? :D Is the alpha accuracy in 32-bit ( 8:8:8:8 ) mode even used fully by developers? Doubt it...

MuFu.

Chalnoth
29-Jun-2002, 16:00
I do think that JC is mistaken that we will move to 32 bits per channel for framebuffer and most normal color ops.

However, I will admit that any pipeline should certainly not be just 16-bit all the way through. A few optimizations are necessary to make 16-bit good enough:

1. Slightly higher internal precision to minimize errors.
2. Center all errors about zero.

With these two, 16-bit color should be enough for 12-bit DACs in any situation. The only place where 16 bits per channel might not be enough is when the pixel shader is used for non-color data.
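Point 2 is the difference between truncating and rounding after each op. The Python sketch below is only an illustration with made-up function names and an arbitrary chain of ops (scale by 9/10, then by 11/10, eight times over) on an 8-bit value. Truncation always errs downward, so the error piles up; rounding errs both ways and mostly cancels.

```python
def scale_trunc(v, num, den):
    """Scale by num/den and drop the fraction: error is always downward."""
    return (v * num) // den

def scale_round(v, num, den):
    """Scale by num/den and round to nearest: error is centred on zero."""
    return (v * num + den // 2) // den

def chain(v, op, steps):
    """Apply the same kind of op for every (num, den) pair in `steps`."""
    for num, den in steps:
        v = op(v, num, den)
    return v

steps = [(9, 10), (11, 10)] * 8           # 16 ops in total
exact = 128 * 0.99 ** 8                   # about 118.1 with no quantization
truncated = chain(128, scale_trunc, steps)
rounded = chain(128, scale_round, steps)
```

Starting from 128, the truncating chain ends at 111 and the rounding chain at 119, against an exact value of about 118.1. Extra internal bits (point 1) would shrink both errors further.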

andypski
29-Jun-2002, 16:59
I'd feel guilty gaming with a card that assigns 32-bit accuracy to the alpha channel though. Seems like a bit of a waste. :-? :D Is the alpha accuracy in 32-bit ( 8:8:8:8 ) mode even used fully by developers? Doubt it...

MuFu.

You don't want the alpha channel treated differently to the colours as it isn't orthogonal - this is very important now that we have moved to a shader based system as the meaning of 'alpha' and 'colour' are much less distinct. You could be dealing with any types of value in the shaders, not just colours - you don't want to be artificially limited by some preconceived notion of what these values mean. At the output stage when writing to the frame buffer you are implicitly converting values into colours, but while they are in the shader they only mean what you, as the programmer, want them to mean... :D


I do think that JC is mistaken that we will move to 32 bits per channel for framebuffer and most normal color ops.


Define a 'normal' colour op. :) You are limiting yourself by thinking of things as colour ops. Think of them as arithmetic operations, and do away with this colour misnomer for good... :wink:

Note that in a very complex multi-pass calculation you may want to pass high-precision data from one shader pass to the next, which requires high-precision intermediate formats in the frame buffer if you don't want to sacrifice your accuracy. JC has talked in the past about his interest in the Stanford research into doing any Renderman shader in multi-pass, which requires floating point in the shader and the frame buffer, and I don't think he is mistaken at all about this being the future direction...


- Andy.

Mintmaster
29-Jun-2002, 19:10
-NVidia has a water simulation demo, but you only get an 8-bit height field, which can cause severe artifacts. Height maps definitely need higher precision to work properly.

I haven't seen this (I assume it's using DX8 hardware functions, and I don't have a DX8 card yet); is the heightmap used directly by the GPU to generate the geometry, or is it still parsed by the CPU? If it's done in software by the CPU, there's not really anything restricting you to using 8 bits. That is, most heightmaps I've seen use a greyscale by choice, not because it's necessary. I don't see why you can't just use full 24 bit color, it's just a lot harder for the artists to generate them. The easiest way to do that would probably use something like the HTML color spec, where 000000 is your base level, and FFFFFF is the highest point, and I can't imagine it would be too hard to implement a tool that allows the artists to paint with intensity that covers the whole range of colors. Of course, if 8 bits isn't precise enough, 24 bits is probably excessive. I can't imagine a graphic engine using enough polygons to take advantage of all the possible elevation points. By the time you reach that level of detail, you should have something better than heightmaps to go by.

Actually the method is entirely different from how you're picturing it.

There is just one big polygon (or more to fit a more complex shape). In any case, you don't have vertices for the waves of the water. The pixel shader renders to textures, where each pixel in the texture is a height point of the water. The pixel shader does the simulation by writing these textures.

Then, the texture, which is a height map, is converted into a normal map by looking at adjacent heights. This gives you the ability to do bump mapping. This is where precision becomes an issue, as adjacent points often have very small differences, so 8-bit is very annoying. Trying to expand the height map into 4 channels for extra precision (or even 2) is somewhat complicated and may not fit into the pixel shader's instruction limits. It would greatly reduce the number of pixels per clock as well.
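The precision problem with adjacent heights can be shown in one dimension. This Python sketch uses made-up names and an assumed gentle ripple (amplitude 0.01 on a 0 to 1 height range). It quantizes the heights, then takes the neighbour difference that would feed the normal. At 8 bits the difference can only be -1, 0, or +1 levels, so the normals come out in visible steps.

```python
import math

def quantize(h, bits):
    """Map a height in [0, 1] to an integer level with `bits` bits."""
    return round(h * ((1 << bits) - 1))

def level_differences(levels):
    """Right neighbour minus left neighbour for each interior texel."""
    return [levels[i + 1] - levels[i - 1] for i in range(1, len(levels) - 1)]

# A gentle ripple: 64 samples of a low-amplitude sine wave.
ripple = [0.5 + 0.01 * math.sin(2 * math.pi * i / 64) for i in range(64)]

slopes_8bit = level_differences([quantize(h, 8) for h in ripple])
slopes_16bit = level_differences([quantize(h, 16) for h in ripple])
```

With 16-bit heights the same ripple yields dozens of distinct slope values, so the lighting varies smoothly across the wave.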

Effectively, you're harnessing the massive parallelism of the pixel shader to do on textures what you would have previously done with the vertices of a highly subdivided surface with the CPU.

Crusher
29-Jun-2002, 21:08
Ah, you confused me by calling it a heightmap :) There are no actual changes in height of the water itself then, it's just using the normal map to bumpmap it. And a normal map for per pixel lighting/bumpmapping/whatever you want to call it, already uses all the color channels, so yeah, I guess I could see where 8 bits per axis might not be enough for vector encoding.

Chalnoth
30-Jun-2002, 00:04
Define a 'normal' colour op. :) You are limiting yourself by thinking of things as colour ops. Think of them as arithmetic operations, and do away with this colour misnomer for good... :wink:

If only color values are dealt with, 16-bit floating point should be enough (provided all operations are done at a slightly higher internal bit depth to prevent error creep). Some non-color data ops may need higher precision, such as height maps.