-Harder to make fast TBDR hardware.
I don't know that I would say that. The thing is, there isn't even an isolated example of a TBDR design (and I don't consider Xenos a counterexample) made by a major manufacturer with the resources and the ability to target high-end hardware. It's actually not that difficult to scale the concept up to high speeds, but given that the only manufacturers who have attempted it are the ones who never had a shred of hope of making a splash in the market... it only follows that the concept would appear to fail.
Moreover, it doesn't help that neither ATI nor nVidia were even willing to acknowledge that it had a place in the market. ATI simply proclaimed "We're not interested so there's no point in talking about it," while nVidia proclaimed "You're an idiot for asking in the first place. Begone! The power of Christ compels you! The power of Christ compels you!" Business as usual.
-TBDR hardware doesn't give much of a benefit to pixel shaders.
-Vertex shaders are so fast now that the vertex load of a game isn't really a limiting function anymore.
I wouldn't agree with all of that. I agree with the first point, but I would put pixel shaders in your second point rather than vertex shaders. Vertex throughput is still a pain in the neck, and it's not just shader performance that limits it. We simply stay well under those limits, because you'd be in really hot water otherwise.
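Some back-of-the-envelope arithmetic, purely to illustrate the point; every number below is invented and not taken from any real GPU. The idea is just that the per-frame triangle budget is set by whichever stage is slowest, not by shader rate alone, and actual content sits well below even that.

```cpp
// Invented numbers, purely illustrative: vertex throughput is bounded by more
// than shader ALU rate, and the per-frame budget comes from the lowest ceiling.
#include <algorithm>
#include <cstdio>

int main() {
    const double vertex_shader_rate = 1.5e9;  // hypothetical vertices/sec from the ALUs
    const double setup_rate         = 0.6e9;  // hypothetical triangles/sec through setup
    const double verts_per_tri      = 1.2;    // assumed post-cache vertex reuse
    const double fps                = 60.0;

    // Triangles/sec the shaders could feed vs what setup can actually retire.
    const double shader_bound = vertex_shader_rate / verts_per_tri;
    const double tri_rate     = std::min(shader_bound, setup_rate);

    printf("theoretical budget: ~%.1fM triangles per frame at %.0f fps\n",
           tri_rate / fps / 1e6, fps);
    printf("(real content sits far below this to keep headroom)\n");
    return 0;
}
```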
-Current hardware does tile; not in the same way, but it still helps with memory bandwidth usage.
Mmmmm... I'd hardly say that the amount of bandwidth it saves is even worth mentioning. And that saving is the biggest advantage of working out of small local tile caches rather than simply having your ROPs push out a quad at a time for every sample that hits that quad.
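Here's the sort of toy accounting I mean; the tile size, the overdraw figure, and the assumption that every shaded sample eventually costs an external write are all made up for illustration, not a description of how any real ROP or memory controller behaves.

```cpp
// Toy model of external framebuffer traffic for one tile's worth of pixels:
// (a) an immediate-mode-ish ROP that ends up writing every shaded sample out,
// vs (b) a tile cache that blends on chip and writes the tile back once.
// Tile size, bytes per pixel, and overdraw are all invented for illustration.
#include <cstdio>

int main() {
    const long   tile_w = 32, tile_h = 32;  // assumed on-chip tile size
    const long   bytes_per_pixel = 4;       // RGBA8 color only, ignoring Z
    const double overdraw = 3.0;            // average shaded samples per pixel

    const long pixels = tile_w * tile_h;

    // (a) Every shaded sample eventually costs an external write
    //     (ignoring whatever coalescing the memory controller manages).
    const double immediate_bytes = pixels * overdraw * bytes_per_pixel;

    // (b) All blending happens in the local tile cache; one writeback per tile.
    const double tiled_bytes = double(pixels) * bytes_per_pixel;

    printf("quad-at-a-time traffic per tile: %.0f bytes\n", immediate_bytes);
    printf("tile-cache traffic per tile:     %.0f bytes\n", tiled_bytes);
    printf("ratio: %.1fx\n", immediate_bytes / tiled_bytes);
    return 0;
}
```

Under these invented numbers the win is basically just the overdraw factor, and it only materializes if the blending genuinely stays on chip; move the assumptions around and the ratio moves with them.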
-The primary advantage of making TBDR hardware today would be lowering fillrate requirements, but you'd make a vastly weaker chip to do so.
I fail to see how implementing TBDR inherently guarantees that the chip MUST be weaker. ROPs that write to small local tile caches and the logic to write tiles back on eviction aren't that big a deal (there's a rough sketch of the bookkeeping below), and they're not going to destroy anything else in the chip. Granted, the more fillrate you've got, the more tile caches you might need to avoid pointless evictions.
Frankly, I think you have it backwards. It's not that TBDR made all TBDR GPUs pitiful. They were pitiful to begin with and that would have been more obvious without it.
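To put a rough shape on the "not that big a deal" bookkeeping from a couple of paragraphs up, here's a minimal software sketch, assuming a hypothetical 16x16 tile, a plain vector standing in for external framebuffer memory, and a cap on how many tiles fit "on chip". It skips the read-back a partially covered tile would need before blending, so treat it as an outline, not a design.

```cpp
// Minimal sketch of a small local tile cache with writeback on eviction.
// Everything here (tile size, cache shape, the framebuffer type) is hypothetical.
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr int TILE = 16;  // assumed 16x16 pixel tiles, RGBA8

struct Tile {
    uint32_t px[TILE * TILE] = {};  // on-chip copy of the tile's pixels
    bool dirty = false;
};

class TileCache {
public:
    TileCache(std::vector<uint32_t>& fb, int fb_width, size_t max_tiles)
        : fb_(fb), width_(fb_width), max_tiles_(max_tiles) {}

    // ROP-side write: lands in the on-chip tile, not in external memory.
    void write_pixel(int x, int y, uint32_t color) {
        Tile& t = fetch_tile(x / TILE, y / TILE);
        t.px[(y % TILE) * TILE + (x % TILE)] = color;
        t.dirty = true;
    }

    // Flush everything at end of frame / end of pass.
    void flush() {
        for (auto& [key, t] : tiles_) writeback(key, t);
        tiles_.clear();
    }

private:
    Tile& fetch_tile(int tx, int ty) {
        uint64_t key = (uint64_t(uint32_t(ty)) << 32) | uint32_t(tx);
        auto it = tiles_.find(key);
        if (it != tiles_.end()) return it->second;
        // Evict an arbitrary victim when the "on-chip" budget is exhausted;
        // a real design would pick victims far more carefully.
        if (tiles_.size() >= max_tiles_) {
            auto victim = tiles_.begin();
            writeback(victim->first, victim->second);
            tiles_.erase(victim);
        }
        // (Skipping the fill-from-memory a read-modify-write tile would need.)
        return tiles_[key];
    }

    // The only place external memory gets touched: one burst per dirty tile.
    void writeback(uint64_t key, Tile& t) {
        if (!t.dirty) return;
        int tx = int(uint32_t(key)), ty = int(uint32_t(key >> 32));
        for (int y = 0; y < TILE; ++y)
            for (int x = 0; x < TILE; ++x)
                fb_[(ty * TILE + y) * width_ + (tx * TILE + x)] =
                    t.px[y * TILE + x];
    }

    std::vector<uint32_t>& fb_;  // stand-in for external framebuffer memory
    int width_;
    size_t max_tiles_;
    std::unordered_map<uint64_t, Tile> tiles_;
};
```

Usage is just: build it over a framebuffer, write pixels through it, and call flush() at the end of the frame; the only external traffic is the per-tile writeback.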
-Also, nVidia and ATI don't have much experience in making a TBDR, and the parts of the graphics pipeline it helps are already very functional, accurate, and fast, so why mess with them? The focus is more on the shader processing power of the chips, which a TBDR can do less to help with, and it takes up valuable die space. Why try to do differently something that's already done well, when any small misstep can cost you large portions of your market?
Because consistent growth trends (particularly exponential ones) can never be sustained indefinitely. Memory will never keep up, power consumption will just keep skyrocketing, and the obvious answer of throwing more silicon at the problem is a guaranteed recipe for failure at some point down the line. At that point, something has to give.