3dfx/3dhq Revenge superchip.

siconik

It appears that 3dhq's secret project, Revenge, has been canceled:

http://www.tdhq.com/

To our valued supporters:
We would like to take this opportunity to thank you for the support you've given us during the many, many months we have existed as an organization. It's no secret that we've taken pride in our productive, helpful, and active community. We've also enjoyed designing and building the software driver releases we have provided to others during this time, and it is with our greatest apologies that we announce the original intended goals of this project will not be completed.

We are not declaring a cease of all driver development, but over time our ability to effectively build and release drivers has slowly ebbed away as we have encountered increasingly difficult dilemmas in their design.

We encourage others to take up the projects and while dedicated members of our driver development team may continue at their own pace and choice to build and improve drivers for 3dfx hardware, as said, they may not do so with such pace as before and even may not continue at all. That is why we hope others will also take up the production of these drivers to help as much as possible those who still use the products they are built for.

It is with mixed emotion, also, that we release the intended aims of the "Revenge" project long alluded to by our web site.

"Revenge" was a revolutionary new computer graphics rendering hardware design that, using modern innovations mixed with the proven reliability of various aspects of current rendering hardware, would have been capable of exceeding the performance and rendering detail of currently available hardware by geometric proportions and, quite possibly, had it been produced as planned, of leading opposing products by as many as four product cycles.

Originally intended for production by Q1 2003, Revenge technology needed a vast corporate infrastructure to support its production and sale. Revealed to venture capitalist firms worldwide earlier in 2001 and 2002, the technology received promises of investment backing by two major investment corporations and multiple high-capacity single investors, and indeed the seeds of 3dhq, Inc., as it would have been, had already been sown. However, the leading single investor and would-be CEO of the new corporation endured multiple debilitating situations near the final phases of the project, and, due to timing concerns as well as the lost sureness of the project's success and prospective logistics, Revenge technology's finalization and production for sale has had to be cancelled, as of now.

Further technical data and information on Revenge technology can be downloaded from the downloads section of this site, and more information is available on the applicable areas of 3dhq's online community.

The online community of 3dhq will remain intact for as long as it is active and able to help others with any aspect of computing and also provide an excellent place for discussion of events of the graphics hardware and computing world, and beyond. We encourage you to visit our forums via the link on this page, and to become and remain involved in this community that has held constant for well over a year and a half.

Again, we thank all of you for your continued support through this time of refocus for 3dhq. We do still envision a future for this site and this community, and from the kind of quality we've seen in our community ever from its start at the turn of the millennium as 3dfx Underground until now as 3dhq, we securely believe that is possible.

Sincerely,
Devin Phillips
3dhq Founder and Administrator

So what was Revenge, you might ask? 3dhq has provided the details in this PowerPoint presentation: http://kirby.colorado.edu/jrr/revenge.ppt

In addition, the spec sheet (which for some reason doesn't exactly match the PowerPoint presentation) has been made available in this thread by 3dhq personnel:

Code:
96M+ Triangles/s 
0.13 micron 8 w power consumption with approx 36 million transistors. 
3200MP/s fill rate 
32 fully featured pixels per clock. 
128 bit internal Precision with 64 bit Zbuffer support. 
Hardware Deferred Rendering Capability (Tiling). 
1024 bits wide Memory Crossbar divided into 4 256 bit pathways. 
96 Texel cache Stages upto 16 shared rendering pathways with 2tmus per pipe. 
32 layer multitexture in a single pass. 
Upto 1000 K + triangles per frame. 
Capable of 1600 Mega pixels in a single pass with 16 textures, Trilinear Anisotropic Filtering and 24 bit textures with 4x Multisampling. 
Free Anisotropic Filtering with AA enabled. 
Upto 256 tap Anisotropy possible. 
Hardware Displacement Mapping and full DX9.1+ Capability. 
Upto 78GB/s fill rate. 
Full color compression and Bandwidth saving capabilities. 
Multi-sample anti aliasing (Gamma corrected and rotated grid jittered samples) upto 16 samples possible. 
Super Sampling upto 8X jittered grid. 
Full support for cinematic factions such as motion blurs soft reflectance and soft shadows. 
Off Chip geometry co-processing Industry innovation in TNL solution 
Twin Risc TNL engines for Advanced pixel/Vertex shading calculations  
TNL units hyper threaded for optimal CPU like performance. 
PS /VS Shaders 1.0 to 2.0 + specs  
Full DXTC 1 to 5 Support with FXT1 support in Hardware. 
Fully microcode updatable Architecture. 
500 + MHZ internal and DDR clock. 
 Multichip Capable Architecture MCA. 
Upto 4 chips in MCA. 9.6 GPixel/s possible. 
AGP 8X AGP 3.0 Spec. 
32MB to 256MB frame buffer

Apparently, a licensing agreement has been reached:
HARDWARE TECHNOLOGY LICENSING AGREEMENT

This agreement is made as Dec 28, 2002 by and between 3DHQ ("3DHQ") and its Business source provider Schmidt Co For the amount of USD 97.5 Million , Ninety Seven and a Half Million dollars.

WHEREAS:

3DHQ develops markets computer and console graphics hardware and
Licensee wishes to license such technology on the terms of this license.

An overview of some ground-breaking, patent-pending technologies developed for Revenge has also been provided. A summary of some of the patents follows:

Multiplexed synchronization circuits for switching frequency synthesized signals

Multiplexers are used to generate synchronized slave clocks from a common master clock. A first multiplexer and a second multiplexer generate a first slave clock and a second slave clock, respectively, from the common master clock. A third multiplexer and a fourth multiplexer are configured as a divide-by-n circuit for providing a third slave clock that is a divided version of the second slave clock. A fifth multiplexer provides a matching delay to preserve the synchronization between the first slave clock and the other slave clocks. A sixth multiplexer is used to select between the second slave clock and the third slave clock in response to a select signal. A flip-flop may be used to provide the select signal and to guard against false selection of slave clocks.

Content addressable memory with an internally-timed write operation

A content addressable memory with an internally-timed write operation includes a data input for receiving an input word. Coupled to the data input are a plurality of storage registers comprising stored words. Each storage register includes a comparison circuit for comparing the stored word with the input word and producing therefrom a match output indicating a match when the stored word matches the input word, and indicating a miss when the stored word does not match the input word. Coupled to the storage registers is a miss detector for generating a miss signal responsive to each of the match outputs of the storage registers indicating a miss. Coupled to the miss detector is a write cycle circuit for writing the input word to at least one of the storage registers responsive to receiving the miss signal.
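The behaviour in this abstract can be modelled in software to make it concrete. What follows is only a toy sketch: the class name, the round-robin replacement policy, and the use of a loop in place of parallel comparators are all assumptions, since the abstract says nothing about which register gets overwritten on a miss.

```python
class ContentAddressableMemory:
    """Toy model of a CAM that writes the input word into a register on a miss."""

    def __init__(self, num_registers):
        self.registers = [None] * num_registers  # stored words (None = empty)
        self.next_victim = 0  # assumed round-robin replacement, not from the abstract

    def lookup(self, word):
        """Return True on a match; on a miss, store the word and return False."""
        # Real hardware compares every register in parallel; here we loop.
        match_outputs = [stored == word for stored in self.registers]
        miss = not any(match_outputs)  # miss signal: every comparator reports a miss
        if miss:
            # Write cycle triggered by the miss signal.
            self.registers[self.next_victim] = word
            self.next_victim = (self.next_victim + 1) % len(self.registers)
        return not miss


cam = ContentAddressableMemory(num_registers=2)
first = cam.lookup(5)   # miss: 5 is written into a register
second = cam.lookup(5)  # match: 5 is now stored
```

The first lookup of a word misses and stores it, so the second lookup of the same word hits.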

Command data transport to a graphics processing device from a CPU performing write reordering operations

A system and method for enabling a graphics processor to operate with a CPU that reorders write instructions without requiring expensive hardware and which does not significantly reduce the performance of the driver operating on the CPU. The invention allows the graphics processor to evaluate the data sent to it by software running on the CPU in its intended and proper order, even if the CPU transmits the data to the graphics processor in an order different from that generated by the software. The invention works regardless of the particular write reordering technique used by the CPU, and is a very low-cost addition to the graphics processor, requiring only a few registers and a small state machine. The invention identifies the number of "holes" in the reordered write instructions and when the number of holes becomes zero a set of received data is made available for execution by the graphics processor.
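The "holes" idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented circuit: it assumes each write carries an index giving its intended order, that no index arrives twice, and that data is released as soon as the range from the first unreleased index to the highest index seen has no gaps. The class and method names are hypothetical.

```python
class ReorderTolerantFifo:
    """Toy model: release data only once the received writes have no holes."""

    def __init__(self):
        self.base = 0      # first index not yet released to the graphics processor
        self.highest = -1  # highest index received so far
        self.pending = {}  # index -> data, received but not yet released

    def holes(self):
        """Number of indices in [base, highest] that have not arrived yet."""
        span = self.highest - self.base + 1
        return span - len(self.pending)

    def write(self, index, data):
        """Accept one (possibly reordered) write; return data released in order."""
        self.pending[index] = data
        self.highest = max(self.highest, index)
        if self.holes() != 0:
            return []  # something earlier is still missing, hold everything back
        released = [self.pending[i] for i in range(self.base, self.highest + 1)]
        self.base = self.highest + 1
        self.pending.clear()
        return released


fifo = ReorderTolerantFifo()
out_a = fifo.write(1, "b")  # index 0 missing: one hole, nothing released
out_b = fifo.write(0, "a")  # hole filled: both words released in intended order
```

In this model the only state needed is two counters and the buffered words, which is roughly in the spirit of the abstract's "a few registers and a small state machine".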

Circuit for processing RISC instructions for 96 bit Pipeline texture stage. Vertex and Pixel Shader Implementation routines 2.0+

An apparatus and method for prioritizing interrupt requests in a RISC processor. By utilizing hardware to prioritize the requests, processor time is reduced. The acknowledge signal (Ack(0)-Ack(n)) from a priority resolve circuit (50) selects the given service routine entry to branch instruction generating circuit (56, 5. A lower priority service routine can be interrupted by a higher priority reply. Circuit Accelerates Video Bit signals by dividing itself into multiple synchronized Shader arrays, the shader array is 196 bit precision with 32 alpha channels. Speed tessellation allows RISC processor to emulate all pixel shader versions from 1.0 to 2.0+ HLSL AND CG capable.

EDRAM based internal Hardware Bus with a 1024 bit Memory cross bar divided into 128 divisors.

Internal Revenge XC-1 architecture utilizes a 1024 bit internal compression bus. Internal compression allows multiple ACKS numbering to a million per byte to be transferred via complete switching multiplexing. Memory data is compressed up to 24:1 internally with a single chunk size exceeding 1 megabyte/ns. This allows for hardware based compression and faster color compression and Z buffer Routines.

Additional information:

Revenge, as we all know now from Aqueel's last saying, was going to be a Gfx chip. A revolutionary one at that. We had old 3Dfx tech, namely the stuff nVidia didn't get. That being Rampage. Rampage, by itself, kicked ass. We had got a board made, from a company that will be left un-named, and it was crazy. Aqueel tested it in Quake 3, and @ 1280x1024 with max gfx @ 8xAA it churned out 254FPS. Now, if you ask me that's faster than anything out there. And that was about 8 months ago! Between then and now, we've made a lot of revisions to that design. New memory architecture and so forth. Added some things that would make it even faster. It would've been the chip of chips!

It was of RISC design, and used very little power, thus putting off very little heat. I believe, if memory serves me, it would've either not needed a HSF or it would've used just a passive HS. Then there was the Sage chip, or as we called it PHOENIX. We could've added numerous Revenge and Phoenix chips to one board, and it would've just been nuts. It would've obviously needed an external power source if enough were used, but with the right amount, it wouldn't have needed any. We were going to have a few types of boards. (R is Revenge, P is Phoenix) 1R and 1P and then 1R and 2P and then 2R and 2P and 2R and 4P. Crazy shit to say the least. Can you imagine, if a 1R 1P turned 254FPS, what would a 2R4P have done LOL. "But it would've cost a fortune". Nope, due to how we had it all laid out, it would've been cheaper than anything out there. We were going to be going from $75-$275. Now the $275 I'm sure would've been the 2R2P but it might've been the 2R4P. Who knows :| Time to bust out documents I have and rip some info.
 
siconik said:
It appears that 3dhq's secret project, Revenge, has been canceled:
Looks like the world didn't miss anything... The specs look like a farce. 1024-bit memory bus... Tell me, how were they going to fit all the pins onto the chip package? If there was a package large enough, it, and the correspondingly large die required, would have made the chip extremely expensive.
Code:
96M+ Triangles/s 
0.13 micron 8 w power consumption with approx 36 million transistors. 
3200MP/s fill rate 
32 fully featured pixels per clock.
So it's 100 MHz. Let's say with their multiplexing stuff (mentioned below) that they can do a four component vector op at once. That means they would need 32x multiplexing. In other words, the vector unit would be running at 3.2 GHz. I don't believe it. And if you say, "Well they could have done more operations at once." Well sure they could, but with such a low gate count? Also, think about all the FIFOs that would be needed to hide memory latencies... They would be massive! (Those cost gates too, btw.)
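The arithmetic behind this objection is easy to check. A minimal sketch, taking the spec sheet's figures (3200 MP/s, 32 pixels per clock, 500 MHz) at face value; the function names are made up for illustration, and the 8x factor is the one used for the 16-texture case further down this post.

```python
def implied_clock_mhz(fill_rate_mpixels_per_s, pixels_per_clock):
    """Core clock (MHz) implied by a fill rate and a pixels-per-clock figure."""
    return fill_rate_mpixels_per_s / pixels_per_clock


def multiplexed_unit_ghz(core_clock_mhz, multiplex_factor):
    """Clock (GHz) a single time-multiplexed unit needs to serve every pixel."""
    return core_clock_mhz * multiplex_factor / 1000


core_mhz = implied_clock_mhz(3200, 32)                   # 3200 MP/s at 32 pixels/clock
one_vec4_ghz = multiplexed_unit_ghz(core_mhz, 32)        # one vec4 unit shared by 32 pixels
sixteen_tex_ghz = multiplexed_unit_ghz(core_mhz, 32 * 8) # same unit with 8x the work
claimed_clock_fill = 500 * 32                            # MP/s if the 500 MHz claim held

print(core_mhz, one_vec4_ghz, sixteen_tex_ghz, claimed_clock_fill)
```

This gives 100 MHz, 3.2 GHz and 25.6 GHz respectively. It also shows that the spec sheet's own "500 + MHZ" clock at 32 pixels per clock would imply 16000 MP/s, five times the 3200 MP/s it lists.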
Code:
128 bit internal Precision with 64 bit Zbuffer support.
Full floating point is a lot of transistors.
Code:
Hardware Deferred Rendering Capability (Tiling).
The tile memory takes transistors too.
Code:
1024 bits wide Memory Crossbar divided into 4 256 bit pathways.
How do you package it or make a board for it?
Code:
Upto 1000 K + triangles per frame.
Wow up to a million triangles! :rolleyes:
Code:
Capable of 1600 Mega pixels in a single pass with 16 textures, Trilinear Anisotropic Filtering and 24 bit textures with 4x Multisampling.
So now that single four component vector op I mentioned above is actually 8 times as large... only way to handle the extra textures. That means 8 times as many float units and 8 times (or more) the buffers. Or they could have made it 8 times as fast (i.e. 25.6 GHz)... yeah right.
Code:
Free Anisotropic Filtering with AA enabled.
Nothing is "free". It all takes gates.
Code:
Hardware Displacement Mapping and full DX9.1+ Capability.
So it's VS/PS 3.0? Below only 2.0 is claimed.
Code:
Upto 78GB/s fill rate.
Fillrate != bandwidth
Code:
Full color compression and Bandwidth saving capabilities.
Just what is "full"?
Code:
Full support for cinematic factions such as motion blurs soft reflectance and soft shadows.
That's great because these are all done by the application not the chip.

If these really are the specs they were trying to sell, it's no wonder they didn't get any funding: It reads like an April Fool's joke to me.
 
There is a whole litany of events concerning this over at 3dhq; apparently the designer is dying of cancer, etc.
 
I'm sorry for the guy's cancer :(, but the whole Revenge thing has been a COMIC since day 1......

OpenGL Guy said:
If these really are the specs they were trying to sell, it's no wonder they didn't get any funding: It reads like an April Fool's joke to me

Those specs have been circulating for almost 6 months now, along with the famous 3DHQ ppt. I've had it since it was made, nothing new :).

The problem is that those 3dfx zealots believe everything.. from Talion Graphics to Revenge, no wonder, there are ppl on those forums with nForce 2, AXP2000 and a Voodoo3 ....... :)

Of course we won't be free of those 3dfx lamers; now the Rampage Myth (not that Rampage was a bad chip, but it was FAIRLY overhyped by zealots) has been substituted with the Revenge Myth. I'm sure I'll read things like: "NV85 and R800 are cool, but if Revenge....." :rolleyes: sigh :cry:
 
Mummy said:
I'm sorry for the guy's cancer :(, but the whole Revenge thing has been a COMIC since day 1......


The problem is that those 3dfx zealots believe everything.. from Talion Graphics to Revenge, no wonder, there are ppl on those forums with nForce 2, AXP2000 and a Voodoo3 ....... :)

except an AGP Voodoo3 won't fit in the nForce 2's 8xAGP slot :LOL: , though they could be using a PCI V3 2000 ;)
 
The whole thing is ludicrous. The specs are total B$ and some of the characters come straight out of a psychologist's dream case. The power of sociopathy can be quite startling sometimes!

MuFu.
 
Hmmm ... this thing, as described by that spec list, has only barely higher theoretical fillrate than the R350 (3200 vs 3040 MPixels/sec), and apparently substantially weaker T&L performance than R350 as well :!: . The 1024-bit bus (1024 bit * 500 MHz DDR = 128 GB/s) makes little sense - with the stated fillrate, color compression, and tiling, I'd estimate the bus to reach ~20-40% utilization, making everybody pay through the nose for an ultra-expensive and uselessly wide bus. Also, for something that is supposed to crush NV/ATI into fine dust, a limitation of 1M triangles per frame is ... weak.
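The bandwidth figure can be reproduced directly. A minimal sketch, assuming the 500 MHz clock applies to the memory and that DDR means two transfers per clock; the function name is made up for illustration, and the 20-40% range is simply the utilization estimate from this post applied to the peak figure.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes); DDR = 2 transfers/clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


peak = peak_bandwidth_gb_per_s(1024, 500)      # 128 bytes * 500 MHz * 2
used_low, used_high = 0.20 * peak, 0.40 * peak # bandwidth actually used at 20-40%
fill_ratio = 3200 / 3040                       # Revenge vs R350 theoretical fillrate

print(peak, used_low, used_high, round(fill_ratio, 3))
```

So the peak works out to 128 GB/s, of which roughly 26 to 51 GB/s would be used at the estimated utilization, for a fillrate only about 5% above the R350 figure quoted here.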

As far as hoaxes go, this spec list is less than impressive. Can't we at least get a Photoshopped card to play with?
 
jvd said:
Heh if only rampage came out , you'd all be singing a different tune :D

What on earth has the rampage core to do with the BS speclist posted above?
 
Ailuros said:
jvd said:
Heh if only rampage came out , you'd all be singing a different tune :D

What on earth has the rampage core to do with the BS speclist posted above?

Come on man, this is a thread about 3dfx stuff. Someone had to say it. You can't have a post that brings up 3dfx without bringing up how the rampage would have rocked. Didn't you see the smiley face? :LOL:
 
Yeah if Rampage came out we'd all be singing 32 * 500 = 3200 right (core clock is also given)? :devilish: It would redefine math as we know it today... :LOL:
 
jvd said:
Heh if only rampage came out , you'd all be singing a different tune :D

"Yes, but if BitBoys had put something out, ..."
:D

Please note that the following is just to make sure no 3DFX fanatic says it and gets points by demolishing some arguments. I firmly believe this is BS too.

Arjan de lumens: Well, remember they could save cost by using little RAM. So, they COULD be using several techniques to use *more* bandwidth, but less memory - thus using that memory more. For example, Z Compression requires extra memory usage. And guess what? The spec sheet isn't even talking about it - just Color Compression and "bandwidth saving capabilities"
Still, remember the whole chip seems to be made with 8x or 16x Antialiasing in mind - without Z Compression, such bandwidth might become useful.

What I question however, is the financial logic behind this decision. Unless they found a great way to make a cheap 1024-bit memory bus, and that's really, really unlikely, this is probably completely illogical.

Uttar
 
Come on man, this is a thread about 3dfx stuff. Someone had to say it. You can't have a post that brings up 3dfx without bringing up how the rampage would have rocked. Didn't you see the smiley face?

Of course I saw the smiley. My point was/is that the context of this thread has nothing to do with the late 3dfx, but rather with a group of hopeless dreamers or whatever you want to call them.

I don't see any "3dfx stuff" in this thread, do you? I saw the obvious sarcasm behind your post, but I never considered former 3dfx engineers as that impotent.
 
I don't think that saving RAM would be very much of an option in this case. Much of the costs of RAM chips are in packaging rather than from the RAM dice themselves, so with a 1024-bit bus (=32 RAM chips, if you use standard FBGA packages), you are gonna spend a LOT of money per board on RAM packaging alone. More than the minuscule amount you save by not having, say, Z compression.
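The chip count above follows from dividing the bus width by the width of each RAM package. A small sketch, assuming 32-bit-wide packages as implied by "1024-bit bus (=32 RAM chips)"; the function name and the narrower buses used for comparison are illustrative only.

```python
def ram_chips_needed(bus_width_bits, bits_per_chip=32):
    """Minimum number of RAM packages needed to populate a bus of the given width."""
    # Ceiling division, in case the bus width is not a multiple of the chip width.
    return -(-bus_width_bits // bits_per_chip)


for width in (128, 256, 1024):
    print(width, "bit bus ->", ram_chips_needed(width), "chips")
```

With these assumptions a 1024-bit bus needs 32 packages, against 8 for a 256-bit bus and 4 for a 128-bit one.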

Not to mention that the GPU itself would need to be huge (and thus expensive) only to support a bus that wide, and that you'll need a board with ~20-30 layers to support the bus.
 
Maybe, yes. That's why I doubted the financial logic behind this - but then again, 3DHQ probably had no, or very few, financial guys. But still, yeah, they probably didn't have that little knowledge of finance. Even a 4 years ago moron knows more about it!

What is also possible is that the 1024-bit memory bus would be for a 4-chip board, with shared memory.

Actually, that would make sense. But how the heck would they bypass the shared memory limitations? They could be having one additional chip which is mostly all types of cache - that would make sense, too.
Only problem is that there's no reference to such a thing anywhere.

As I said, probably BS.


Uttar
 