Trident XP4 NOT a tiler after all...

Kristof

Check :

http://www.anandtech.com/video/showdoc.html?i=1673&p=4

The XP4 has long been rumored to be a tile-based rendering solution like STMicro's Kyro II and as intriguing as deferred rendering technologies are, you won't find any such technology in the XP4. Instead, the XP4 is a conventional immediate-mode renderer like the GeForce4 or Radeon 9700 but with a tile-based rasterization engine. All this means is that the XP4 uses a tile-based algorithm for storing pixels in its frame buffer; so instead of writing lines of pixel data to the frame buffer the XP4 writes the data in blocks/tiles. The XP4's tile-based rasterizer is much like Intel's 845G graphics core in this respect, and the main reason behind it is to optimize for the XP4's internal caches. The end result is improved memory bandwidth efficiency, which helps tremendously considering that the XP4 has no real occlusion culling technology.

Now, AFAIK and as far as I can remember, even good old Voodoo1 did this. It's known as a tile-based memory layout and is a classic technique to improve, mainly, texture cache efficiency. Why are companies dusting off this technique and introducing it as something new and revolutionary?
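
Just to be concrete about what I mean by a tile-based memory layout, here's a rough sketch of the addressing (my own illustration; the 8x8 tile size and layout below are made up and have nothing to do with the XP4's actual scheme, and offsets are in pixels):

#define TILE_W 8
#define TILE_H 8

/* classic scanline-order addressing: vertically adjacent pixels
 * sit a whole pitch apart in memory */
unsigned linear_offset(unsigned x, unsigned y, unsigned pitch)
{
    return y * pitch + x;
}

/* tiled addressing: each 8x8 block of pixels is stored contiguously,
 * so a small on-chip cache holding one tile catches most of the
 * accesses a rasterizer or texture unit makes in a neighbourhood */
unsigned tiled_offset(unsigned x, unsigned y, unsigned width)
{
    unsigned tiles_per_row = width / TILE_W;
    unsigned tile_index = (y / TILE_H) * tiles_per_row + (x / TILE_W);
    unsigned in_tile    = (y % TILE_H) * TILE_W + (x % TILE_W);
    return tile_index * (TILE_W * TILE_H) + in_tile;
}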

Also, what do you guys think about the "sharing" pipelines story? Does it make sense to you, or is it again something everybody else has been doing for years? E.g. the LOD calculation done once and used for all 4 pixels?
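
By that I mean something along these lines (a hypothetical sketch, not anything I actually know about Trident's hardware): the LOD is derived once from the texture coordinate deltas across a 2x2 quad and then reused for all four lookups.

#include <math.h>

/* u[], v[] are the texture coordinates of a 2x2 pixel quad:
 * 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right */
float quad_lod(const float u[4], const float v[4], float tex_size)
{
    float dudx = (u[1] - u[0]) * tex_size;
    float dvdx = (v[1] - v[0]) * tex_size;
    float dudy = (u[2] - u[0]) * tex_size;
    float dvdy = (v[2] - v[0]) * tex_size;

    float rho = fmaxf(sqrtf(dudx * dudx + dvdx * dvdx),
                      sqrtf(dudy * dudy + dvdy * dvdy));

    return log2f(rho);  /* one LOD value shared by all 4 pixels */
}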

K-
 
I am not sure how 4 pixels can share logic and still be done in one clock. My guess is that it is some kind of 4-stage pipeline.
[Attached image: trident.jpg]

Where stage one would be the easy stuff, like just retrieving the pixels. Then as it progresses you pop and send the amount of pixels off the stack according to where you are stage-wise. Then stage 4 might be the hard stuff (fragment combiners) and it also acts as a buffer of size 4. So when the buffer is filled you can send out 4 pixels, thus the 4 pixels per clock.
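
Something like this toy model of the buffering step (purely my own speculation, nothing to do with how the real chip works):

#include <stdio.h>

#define BURST 4

int main(void)
{
    unsigned buffer[BURST];
    int fill = 0;

    /* one pixel comes out of the final (slow) stage every clock... */
    for (unsigned clock = 0; clock < 12; clock++) {
        buffer[fill++] = clock;
        /* ...but pixels only go out to memory 4 at a time */
        if (fill == BURST) {
            printf("clock %2u: wrote pixels %u..%u in one burst\n",
                   clock, buffer[0], buffer[BURST - 1]);
            fill = 0;
        }
    }
    return 0;
}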
 
OK, I just changed my mind. Wouldn't you want the hard stuff at the beginning, like the maths that take multiple clocks, and the easy stuff at the end of the funnel? So let's say you are only calculating 1 pixel per clock at the end, but 4 pixels (in parallel) in 4 clocks at the beginning; there still isn't much of a chokepoint.
 
Kristof,

You PowerVR guys have patents on all this technology (TBR, deferred rendering, etc.), at least the way you do it.

If a new company does something as simple as changing the tile size, does that make it a new design? I'm just curious. Intel recently released a chipset that uses TBR or deferred rendering (I can't remember which) as well.

This is a really broad question, but how many different ways are there to go about TBR that would not infringe on PowerVR-owned designs? You guys must whip out the white papers every time another company starts talking about this stuff. Or are y'all more open-natured in this area?
 
The integrated graphics in the 845G aren't that bad. My last laptop had the mobile variant (i845MG) and, considering the shared PC133 memory bandwidth available (1 GB/s), turning in 28 fps in Q3 @ 1024x768x16 (demo demo001) is pretty impressive. And that's at just 100 MHz core speed.

I'm not buying the hype that the XP4 is equal to any GF4, so I won't be disappointed when it's released. Sounds interesting, though.
 
Well Kristof, although the tile-part may be old/standard tech, the interesting part is how they managed to make a DX8.1 chip with only 30 million transistors. We can't get around that part. Also the performance they are claiming is quite good; especially compared to the 30 million figure! It will be interesting to see if they can reach their performance goal. We should find out in a couple of weeks according to Anandtech. :)

Edit: I'll add that in any case the XP4 will allow everybody to get access to DX8.1, so at least that's something.
 
Onslaught said:
Well Kristof, although the tile-part may be old/standard tech, the interesting part is how they managed to make a DX8.1 chip with only 30 million transistors. We can't get around that part. Also the performance they are claiming is quite good; especially compared to the 30 million figure! It will be interesting to see if they can reach their performance goal. We should find out in a couple of weeks according to Anandtech. :)

Edit: I'll add that in any case the XP4 will allow everybody to get access to DX8.1, so at least that's something.

In *principle*, you could architect a VGA card around a programmable embedded CPU core (anyone remember the Rendition Verite or the Chromatic MPACT2?). Then, using the CPU to run an on-card firmware program, maybe you could implement any and all DirectX 8.1 functionality? (I'm asking, not stating with any certainty!) I'm sure the graphics throughput would be terribly slow, but the transistor count could be well under 5 million!
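
Just to illustrate why it would be so slow: even one of the simplest combiner operations, a plain texture modulate, turns into a handful of multiplies per pixel once it's a software loop. (This is a made-up fragment for illustration, not code from any real product.)

typedef struct { unsigned char r, g, b, a; } pixel;

/* one "modulate" combiner stage: tex * diffuse, per channel */
static pixel modulate(pixel tex, pixel diffuse)
{
    pixel out;
    out.r = (unsigned char)((tex.r * diffuse.r) / 255);
    out.g = (unsigned char)((tex.g * diffuse.g) / 255);
    out.b = (unsigned char)((tex.b * diffuse.b) / 255);
    out.a = (unsigned char)((tex.a * diffuse.a) / 255);
    return out;
}
/* ...and the firmware would have to run this (plus setup, texturing
 * and Z) for every pixel of every triangle, entirely serially. */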

IBM's "PGA" adapter (circa 1987?!? professional graphics adapter) had an onboard i80186 CPU to accelerate raster (BITBLT) operations. If paired with an FPU-unit, I bet that thing could 'sort of' do hardware 3D-rendering.

Back in the mid-late 1980s, there used to be a bunch of "workstation-class" PC graphics adapters centered around TMS34010/34020 CPUs. This was kind of before my time. I read that these expensive video-card beasts went out of fashion with the rise of 'ASIC' manufacturing and 'Windows GUI.' (From a VGA maker's standpoint, Windows' rise to prominence meant the core application was Windows -- a *single* standard hardware functionality set -- this made ASICs practical, because the designers could optimize the most common draw-functions in hardware, while leaving rarer functions to the software driver.)

Everything I know about the TI 340x0 is because MAME emulates it so I can play old arcade classics like NARC and Smash-TV. :)
 
While the thought of a "CPU-GPU" is fun to play around with, that solution would most certainly be very slow, so I guess we can drop that theory in regards to the XP4. But I think you were really just mentioning the "CPU-GPU" thing for the fun of it.

We are still left with the interesting performance figure (XP4 T3 = 80% speed of GeForce4 Ti 4600) at only 30 million transistors. Impressive no?! :)
 
Hmm...
Doesn't every card output pixels in blocks? Kyro builds 32x32 (IIRC) pixel blocks and stores them in on-chip memory, GeForce builds 2x2 pixel blocks and stores them in video memory... What's so unique about this chip??
 
Yes Overlord, that was what I meant.

MDolenc,

I "think" the tiles are rectangular at PVR ;)
 
That 80% performance of a GF4 Ti 4600 doesn't seem too impressive once you think about it. 80% of a GF4 Ti 4600 would mean a GF4 at 240 MHz engine / 260 MHz memory. The XP4 T3 claims to run at 300 MHz, with 300-350 MHz memory. Seeing how they want to hype up the XP4, these numbers are probably with the faster memory, so we are looking at a chip that has only about 75% of the GF4's memory efficiency.
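
(Working that out, assuming the Ti 4600's stock 300 MHz core / 325 MHz memory and an equally wide 128-bit DDR bus on both chips: 80% of 300/325 gives roughly 240/260, and if the XP4 needs 350 MHz of memory to deliver what the GF4 would deliver at 260 MHz, that's 260 / 350 ≈ 0.74, i.e. about 75% of the GF4's bandwidth efficiency.)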

As for the number of transistors, ATI's Radeon 9000 must be somewhere around there also (it's priced around $75 already, and is effectively a replacement for the 7500 on the same process, so I think the transistor count would be 30-40 million). Mind you, on paper the 9000 isn't as strong as the XP4, but we all know how paper specs can turn out to be pure fluff, as was the case with Matrox. Honestly, I would have trusted Matrox more than Trident, but it's hard to judge anyone these days.

I also don't know how they plan to get such high-speed memory at dirt cheap prices, but that's what they claim. 300-350 MHz memory at under $99 for the whole card? The GF4 MX 460 uses 275 MHz memory and costs $111, and although Radeon 8500s are about $100, it seems like they're being dumped for the almost-as-fast-but-cheaper 9000s. The only cards with memory that fast (300 MHz+) cost $200+.

Maybe I'm pessimistic, but I'm not holding my breath. I doubt Trident will be able to live up to their claims, both from performance and pricing standpoints. I guess we'll have to wait and see.

(All prices from PriceWatch)
 
Maybe Trident is doing something similar to Intel's Hyper-Threading, where execution resources are shared when they're not needed. I'm not sure how this might work for a graphics chip though. At first I thought that if memory bandwidth becomes the bottleneck the execution resources would switch pipes, but this doesn't seem any better than just using FIFOs to handle bubbles.
 