PowerVR Series5

Enbar said:
I'm surprised everyone here is so pro-TBDR. As time progresses, I believe its advantages become less and less.

Here are the problems I see with it:
+ Both Nvidia and ATI are convincing developers to use a z-fill pass before doing any complex pixels. For games that do this, a TBDR will have almost no advantage over a traditional rasterizer with fast early-Z/hierarchical-Z.
It will generally need zero external z bandwidth; you can't do that with a traditional IMR. Early z rejection reduces z read bandwidth requirements, which is only a small part of the equation. The "z-fill" pass itself will be faster on a TBDR, too (for any given set of raw specifications).
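To put rough numbers on that, here's a back-of-the-envelope sketch in Python of external z traffic for an IMR running a z-fill pass versus a TBDR that keeps depth in on-chip tile memory. The resolution, depth format, overdraw factor and frame rate are illustrative assumptions, not measurements:

# Back-of-the-envelope z-bandwidth estimate: IMR with a z-fill pass
# versus a TBDR keeping z entirely in on-chip tile memory.
# All figures below are illustrative assumptions.

WIDTH, HEIGHT = 1024, 768      # assumed render target
Z_BYTES       = 4              # assumed 32-bit depth
OVERDRAW      = 3.0            # assumed average depth complexity
FPS           = 60             # assumed frame rate

pixels = WIDTH * HEIGHT

# IMR: the z-fill pass writes z once per covered sample, and the colour
# pass still re-reads z for its early-z test (caches and compression
# help, but in the worst case the traffic is off-chip).
imr_zfill_write = pixels * OVERDRAW * Z_BYTES   # z-fill pass writes
imr_colour_read = pixels * OVERDRAW * Z_BYTES   # early-z reads in pass 2
imr_total = (imr_zfill_write + imr_colour_read) * FPS

# TBDR: depth lives in tile memory; nothing hits external RAM.
tbdr_total = 0

print(f"IMR  external z traffic: {imr_total / 1e9:.2f} GB/s")
print(f"TBDR external z traffic: {tbdr_total / 1e9:.2f} GB/s")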
Enbar said:
+ As geometry complexity increases creating an efficient TBDR becomes increasingly difficult/impossible.
As in "it's impossible to burn ten cubic feet of wood in a two cubic feet stove"? Or perhaps "it's difficult"? I'd prefer "it requires a minimum amount of thinking, but it's possible".

Another thing you should think about: how many opaque surfaces fit in, say, a 32x16 tile? There's an upper bound to geometric complexity once you've thrown the unnecessary bits away. Granted, it's not as easy for transparent geometry.
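A tiny sketch of that upper bound, using the tile size above and some arbitrary triangle counts; the point is that the opaque shading work left after hidden-surface removal is capped by the tile's sample count, however dense the input:

TILE_W, TILE_H = 32, 16
SAMPLES_PER_TILE = TILE_W * TILE_H   # 512 samples in the post's tile size

for triangles_binned in (10, 1_000, 100_000):
    # However many triangles land in the bin, hidden-surface removal
    # keeps at most one opaque fragment per sample.
    print(f"{triangles_binned:>7} triangles binned -> "
          f"at most {SAMPLES_PER_TILE} opaque fragments shaded")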
Enbar said:
+ Pixel shaders and vertex shaders that modify depth make TBDR more difficult.
Vertex shaders compute and output vertex depth in the first place, so it strikes me as redundant to hear about a vertex shader that modifies depth ;)
No rebuttal on the pixel shader thingy.
Enbar said:
Now to be fair I could list off some positives, but since everyone seems so supportive of TBDR you must already know about those so I won't go there.
It's interesting technology, that's all.
 
Simon F said:
Ailuros said:
Simon,

I just saw the Queen's award announcement. Was there/is there going to be some kind of dinner? I'd love to hear the details *snicker*
I think that is for META, so I wouldn't know if there is a dinner.

However, we won an award for DC/CLX a few years ago, and I was one of the lucky few who got to go to the palace. It was quite an amazing experience.

Slightly OT: :E

What did you say to Her Majesty, Simon? G'day Sport, how's it hanging? :oops: :LOL:
 
Megadrive1988 said:
That said, the Xbox 2 will, at the very least, rival Sega's new PowerVR-based arcade board, if not surpass it. Minimum specs for Xbox 2 would probably be 16 if not 32 pixel pipes, at least 8 vertex shaders, VS/PS 3.0+, a 500~700 MHz core, GDDR3 memory, and 1.5 to 2 billion verts/sec peak
(NV40 is at 600M, R420 will almost certainly beat NV40's vertex performance and Xbox 2 will blow both out of the water)
I'm pretty positive that NV40 (@400 MHz, as tested by the world and his dog) is triangle-setup limited at 133 MTris/s. It can't do more. I also see a transform limitation at 300 Mverts/s.
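For what it's worth, here is the arithmetic that would produce those ceilings. The cycles-per-triangle and cycles-per-vertex figures are assumptions picked to reproduce the quoted numbers, not published NV40 specifications:

CLOCK_HZ        = 400e6   # NV40 as tested at 400 MHz
CLKS_PER_TRI    = 3       # assumed triangle-setup cost per triangle
VERTEX_UNITS    = 6       # NV40's six vertex shader units
CLKS_PER_VERTEX = 8       # assumed per-unit cost per transformed vertex

setup_limit     = CLOCK_HZ / CLKS_PER_TRI                     # ~133 MTris/s
transform_limit = CLOCK_HZ * VERTEX_UNITS / CLKS_PER_VERTEX   # ~300 Mverts/s

print(f"setup limit:     {setup_limit / 1e6:.0f} MTris/s")
print(f"transform limit: {transform_limit / 1e6:.0f} Mverts/s")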

Anyway, what you're talking about sounds like something in the 250 to 400 million transistor bracket ... I currently reserve my right to doubt it ;)
 
PVR_Extremist said:
Simon F said:
Ailuros said:
Simon,

I just saw the Queen's award announcement. Was there/is there going to be some kind of dinner? I'd love to hear the details *snicker*
I think that is for META, so I wouldn't know if there is a dinner.

However, we won an award for DC/CLX a few years ago, and I was one of the lucky few who got to go to the palace. It was quite an amazing experience.

Slightly OT: :E

What did you say to Her Majesty, Simon? G'day Sport, how's it hanging? :oops: :LOL:

There's quite a funny story behind it, though I'd prefer Simon to tell it again; hence my question.
 
Ailuros said:
There's quite a funny story behind it, though I'd prefer Simon to tell it again; hence my question.
Indeed, but repeating what my colleague said would probably lead to him killing me.

BTW, Ailuros, there seemed to be a helluvalot of hits on my website from a German forum. One guess as to who's responsible.


Oh, and for those who know the front cover of "Computer Graphics: Principles and Practice" (aka Foley, van Dam, et al.), the Vermeer painting that inspired it is in the palace. Fascinating to see it.
 
Simon F said:
Panajev2001a said:
Simon F said:
how many clocks does it take for each read/modify/write process?

If you have two, small sequentially adjacent translucent polygons that overlap each other, do you get full performance?



Ahem... how do I break it to him... it has... well... like... 0.
My my! That is impressive. Zero clocks equals infinite fill rate. With that, why did they bother putting more than one texture unit....?

Ok, you are the native English speaker of the two of us, but can you double-check what I wrote?

16 Texture units?

Ahem... how do I break it to him... it has... well... like... 0.

The way I understand it is that 8 of the 16 Pixel Engines do double duty when the GS is drawing textured primitives and act as TMUs.

Edit: The way the hardware is set up is like this:

We have 16 2-Mbit DRAM macros, and we have 16 Pixel Engines with a 64-bit interface each: 32 bits RGBA and 32 bits for the Z-buffer.

Each DRAM macro seems to be connected to the Pixel Engines through three buses (there are buffers and caches around to play with, of course): one 64-bit READ bus, one 64-bit WRITE bus and one 32-bit bus to access texture data.

You can then do READ and WRITE operations in parallel while accessing texture data.

Texturing mode: the GS writes in a 4x2 pixel pattern

Non-texturing mode: the GS writes in an 8x2 pattern.

Recommended triangle size is 32 pixels in area (optimal speed).
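As a sanity check on the bus layout above, here's the aggregate bandwidth it implies, assuming the commonly quoted 147.456 MHz GS core clock (the clock is an assumption on my part, not from the post):

GS_CLOCK_HZ = 147.456e6      # commonly quoted GS core clock

READ_BUS_BITS    = 16 * 64   # one 64-bit READ bus per macro  -> 1,024 bits
WRITE_BUS_BITS   = 16 * 64   # one 64-bit WRITE bus per macro -> 1,024 bits
TEXTURE_BUS_BITS = 16 * 32   # one 32-bit texture bus per macro -> 512 bits

total_bits = READ_BUS_BITS + WRITE_BUS_BITS + TEXTURE_BUS_BITS
bandwidth  = total_bits / 8 * GS_CLOCK_HZ    # bytes per second

print(f"{bandwidth / 1e9:.1f} GB/s")         # ~47.2 GB/s aggregate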

As nAo and other licensed PlayStation 2 developers can testify, PlayStation 2's strength lies in rendering translucent surfaces and doing fill-rate-heavy operations.

Look at the highest-profile PlayStation 2 games; think about Z.O.E. 2 and Final Fantasy X (during the Summons) or MGS2 (some effects use tons of semi-transparent layers: think about the rain, or the heat-haze effect when the Harrier is flying upwards), for example: particle effects and alpha-blended surfaces all over the place.

I am not in a particularly awake state of mind at the moment, so I might have lost my sense of humor and missed your joke, but I did not say it takes Zero clocks.

I just said that in theory it has no real Texture Units to speak of.

More like 16x0/8x1, I would say: 16 pixels (with color, alpha and Z) per clock when you are not texturing, and 8 pixels (with color, alpha, Z and texture) per clock when you are texturing.
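And the same 16x0/8x1 arrangement expressed as raw pixel rates, again assuming the 147.456 MHz clock:

GS_CLOCK_HZ = 147.456e6         # same assumed GS core clock as above

untextured = 16 * GS_CLOCK_HZ   # 16 pixels/clock: colour, alpha, Z
textured   =  8 * GS_CLOCK_HZ   #  8 pixels/clock once texturing is on

print(f"untextured: {untextured / 1e9:.2f} Gpixels/s")   # ~2.36
print(f"textured:   {textured / 1e9:.2f} Gpixels/s")     # ~1.18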

AGHHH... the system just ate some lengthy additions to this post ("HTTP error" :( ) describing limitations that would occur due to single-ported memory and pipeline lengths, but I just noticed:
nAo said:
MfA said:
Can the rasterizer in the graphics synthesizer even work on multiple primitives at a time?
Unfortunately it can't.
This would eliminate the problems with data hazards that I was trying to describe in the edited text. Unfortunately, it does it by deliberately slowing down the system.
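A toy model of that trade-off: with a pipelined read-modify-write of some latency, a second translucent primitive touching the same pixel must wait for the first one's write to land. The latency and the one-pixel-per-clock issue rate are purely illustrative:

LATENCY = 4   # assumed clocks from framebuffer read to blended write

def clocks_to_draw(primitives, overlap_same_pixel):
    total = 0
    for _ in range(primitives):
        total += 1               # issue the pixel (toy 1 px/clock rate)
        if overlap_same_pixel:
            total += LATENCY     # drain so the next read sees the write
    return total

print(clocks_to_draw(2, overlap_same_pixel=False))   # 2  -> full rate
print(clocks_to_draw(2, overlap_same_pixel=True))    # 10 -> serialized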

I would assume the DRAM macros are multi-ported, or have buffers on each end to allow simultaneous READs and WRITEs to a certain degree at least, since they went through the trouble of placing two 1,024-bit buses.
 
Panajev2001a said:
Simon F said:
Panajev2001a said:
Simon F said:
how many clocks does it take for each read/modify/write process?

If you have two, small sequentially adjacent translucent polygons that overlap each other, do you get full performance?



Ahem... how do I break it to him... it has... well... like... 0.
My my! That is impressive. Zero clocks equals infinite fill rate. With that, why did they bother putting more than one texture unit....?

Ok, you are the native English speaker of the two of us, but can you double-check what I wrote?
I did double check what you wrote but, as I said, an HTTP error kindly ate the rest of my post.
I am not in a particularly awake state of mind at the moment, so I might have lost my sense of humor and missed your joke, but I did not say it takes Zero clocks.
In all fairness, I took the following to be the answer to my question of "How many clocks...".
Ahem... how do I break it to him... it has... well... like... 0

Each DRAM macro seems to be connected to the Pixel Engines through three buses (there are buffers and caches around to play with, of course): one 64-bit READ bus, one 64-bit WRITE bus and one 32-bit bus to access texture data.

You can then do READ and WRITE operations in parallel while accessing texture data.
But, presumably, only one access from any particular RAM block at a time.

Texturing mode: the GS writes in a 4x2 pixel pattern

Non-texturing mode: the GS writes in an 8x2 pattern.
Out of curiosity, does the non-texturing mode allow alpha blending?

I would assume the DRAM macros are multi-ported, or have buffers on each end to allow simultaneous READs and WRITEs to a certain degree at least, since they went through the trouble of placing two 1,024-bit buses.
Multi-ported memory is generally much more expensive area-wise. It's more likely that they've just made the memory, say, 2x as wide (and thus 1/2 as deep) as the normal write or read operations and hope things average out....
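A sketch of that "average out" idea: one single-ported array at double width can service a full-rate read stream and a full-rate write stream on alternating clocks, with small buffers soaking up the burstiness. Widths and the alternation policy are illustrative assumptions:

PORT_BITS  = 2048           # one physical port, double width
NEED_READ  = 1024           # read demand per clock
NEED_WRITE = 1024           # write demand per clock

read_buf = write_buf = 0    # bits queued on each side

for clock in range(8):
    read_buf  += NEED_READ   # demand arrives every clock
    write_buf += NEED_WRITE
    if clock % 2 == 0:       # even clocks: one wide read burst
        read_buf = max(0, read_buf - PORT_BITS)
    else:                    # odd clocks: one wide write burst
        write_buf = max(0, write_buf - PORT_BITS)

print(read_buf, write_buf)   # both stay bounded -> demand is sustained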
 
Out of curiosity, does the non-texturing mode allow alpha blending?

IIRC yes, it does, but I will double check the GS manual as soon as I get near the darn DVD ;).

All 16 Pixel Engines have access to the VRAM and can get up to 64 bits of data each (32 bits RGBA and 32 bits Z-buffer).

I remember early Sony PR spec sheets that quoted the number of polygons per second drawable by the GS; they had one line with Z and alpha enabled (no textures) and one that also factored in textures.

But, presumably, only one access from any particular RAM block at a time.

[...]

Multi-ported memory is generally much more expensive area-wise. It's more likely that they've just made the memory, say, 2x as wide (and thus 1/2 as deep) as the normal write or read operations and hope things average out....

It is possible that some people had the idea "well, we have 16 macros... there is a good chance that we can sustain parallel READs and WRITEs a good percentage of the time, as they would go to different DRAM macros", but I know that there are buffers and caches between the DRAM and the Pixel Engines to allow for parallel READs and WRITEs.

You can use two parallel FIFOs and solve the problem of writing to and reading from the same block, although you have to watch out for synchronization issues if they try to write and read the same address.
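A minimal sketch of one side of that two-FIFO idea, with the same-address hazard handled by forwarding the newest pending write instead of stalling; the whole structure is illustrative, not the GS's actual mechanism:

from collections import deque

write_fifo = deque()   # (address, data) pairs awaiting commit
memory = {}            # stands in for one DRAM block

def issue_write(addr, data):
    write_fifo.append((addr, data))

def issue_read(addr):
    # Same-address hazard: a queued write to this address must win,
    # so forward the newest pending value instead of reading stale data.
    for a, d in reversed(write_fifo):
        if a == addr:
            return d
    return memory.get(addr, 0)

def drain_one():
    # The block retires one queued write per available cycle.
    if write_fifo:
        a, d = write_fifo.popleft()
        memory[a] = d

issue_write(0x10, 0xAB)
print(hex(issue_read(0x10)))   # 0xab, forwarded from the write FIFO
drain_one()
print(hex(memory[0x10]))       # 0xab, now committed to the block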

IIRC, the Pixel Engines do not texture directly from the VRAM but from a sort of Texture Buffer that is 8 KB in size: textures can exceed the buffer's size, of course, and the refill speed is around 150 GB/s, or 1,024 bits/cycle.

Thank you, Simon, for having the patience and basically re-writing the message one more time: I was not trying to sound rude, but I was not in a clear state of mind at all :(

:LOL:
 
deviantchild said:
anaqer said:
Pottsey said:
PowerVR had hardware T&L working in products a good 2 years before Nvidia

Yup, and failed to deliver when HW T&L was already a no-brainer for modern GPUs... :rolleyes:

was that not STM's choice?

I was kinda out of the loop in those days, but I never heard this variant. Even if it was the case, I'd suspect there was a reason why STM chose so.
 
I can only guesstimate what deviant is referring to. The originally planned STG5000 from the K3 family was to be on a larger manufacturing process at ~166 MHz. It was scrapped in favour of the 250 MHz @ 130 nm STG5500, because the price/performance ratio compared to Series3 @ 166/175 MHz or whatever supposedly wasn't high enough.

If you ask me, we never should have seen a KYRO2 at all. I would have much preferred a K3 at the same clock speeds, even if it had been slightly more expensive. If nothing else, it would have had a much longer compatibility track, with a HW T&L unit and cube map support. Whoever was responsible for those kinds of decisions was extremely shortsighted back then, IMHO, whether it was ST or a shared responsibility between ST and IMG.
 
I'm not sure, Ailuros, whether Kyro 2 was such a bad idea or not. A 4-pipe version would have blitzed the GF2 and Radeon competition of its time (especially a T&L-equipped one). Maybe they failed to send high-end and low-end versions to the market at the same time.

The Kyro 1 was too late to market and built on an outdated process. If it had come out at about 150 MHz, and Kyro 2 at say 220 MHz, to compete with the GeForces of the time, maybe Kyro might have caught on better. Remember the Kyro 2 was 15 million transistors vs. ~15 million in the TNT2, ~23 million in the GF1, ~25 million in the GF2 and about ~60 million in the GF3.

The best (for consumers) would probably have been to scrap Kyro 1, push the Kyro 2 against the GeForce MX and TNT2 (which it beat in nearly all benchmarks), and push forward the release of the T&L-equipped 4-pipe Kyro 3 (rumoured spec) to compete against the Radeon 8500 and GeForce 3 & 4. I'm sure more chips would have been sold if a higher-end version was available on the market.

The problems here might have arisen not only from the withdrawal of STM's graphics department, but also from the 130 nm process (rumoured), which was not ready at the time. Maybe necessary features of the time, such as pixel shaders 1.x, were not available (speculation), which could be a reason for Via/S3 and others not using this chip in integrated/standalone graphics (the Savage cores lack more and perform badly, so this might be wrong).

As long as Series 5 can implement DX9 with performance, I'm sure many would love the alternative. The more competition the better. It all comes down to a partner willing to produce this competitive part and market it. Sega seem to be involved in an arcade version of the chip/tech... but who is manufacturing it?
 
aZZa said:
The best (for consumers) would probably have been to scrap Kyro 1, push the Kyro 2 against the GeForce MX and TNT2 (which it beat in nearly all benchmarks), and push forward the release of the T&L-equipped 4-pipe Kyro 3 (rumoured spec) to compete against the Radeon 8500 and GeForce 3 & 4.

Unless Kyro3 was to have VS/PS too, this would have been VERY BAD for the consumer. The GF4MX did more than enough damage to us by lacking shader support... had there been yet another shaderless card (even worse, an otherwise powerful one), I doubt we'd even be using SM2.0 today.
 
ailuros said:
I can only guesstimate what deviant is referring to. The originally planned STG5000 from the K3 family was to be on a larger manufacturing process at ~166 MHz. It was scrapped in favour of the 250 MHz @ 130 nm STG5500, because the price/performance ratio compared to Series3 @ 166/175 MHz or whatever supposedly wasn't high enough.

That is correct, but then STM decided to bottle out of the 3D card market, which was nice of 'em, real nice. :devilish:

ailuros said:
If you ask me, we never should have seen a KYRO2 at all. I would have much preferred a K3 at the same clock speeds, even if it had been slightly more expensive. If nothing else, it would have had a much longer compatibility track, with a HW T&L unit and cube map support. Whoever was responsible for those kinds of decisions was extremely shortsighted back then, IMHO, whether it was ST or a shared responsibility between ST and IMG.

Well, there is no denying the level of success of the KYRO series; they sold loads of 'em. They just didn't follow up. KYRO II was a good move; there was supposed to be a KYRO II SE, but that never happened because STM pulled out of the market. Then KYRO III could not be built at all (i.e. not with another AIB manufacturer) because of licensing issues with STM. They wanted royalties. :devilish:


azza said:
I'm not sure, Ailuros, whether Kyro 2 was such a bad idea or not. A 4-pipe version would have blitzed the GF2 and Radeon competition of its time (especially a T&L-equipped one). Maybe they failed to send high-end and low-end versions to the market at the same time.

It would have done. KYRO III, had it been released, would have come out some time after the GF2, however, so the GF3, when it was released, would still have had its programmability (even though it was hardly used then) going for it.

azza said:
The Kyro 1 was too late to market and built on an outdated process.

They sold buttloads of KYROs. That's a successful product; it was only a budget card.

azza said:
If it had come out at about 150 MHz, and Kyro 2 at say 220 MHz, to compete with the GeForces of the time, maybe Kyro might have caught on better. Remember the Kyro 2 was 15 million transistors vs. ~15 million in the TNT2, ~23 million in the GF1, ~25 million in the GF2 and about ~60 million in the GF3.

To increase the clock speed of the KYRO any further would have meant increasing the number of PCB layers; this is what held up the KYRO II SE (IIRC), eventually causing its market non-viability. The fact that the KYRO used cheap RAM, a cheap PCB and a cheap manufacturing process, with none of the parts at all bleeding edge, was its strong point for OEM customers. It cost bollock all, and they could whack it in their machines for good enough 3D performance to say it has 'Blistering 3D' (you know how OEMs lie ;)).

azza said:
The best (for consumers) would probably have been to scrap Kyro 1, push the Kyro 2 against the GeForce MX and TNT2 (which it beat in nearly all benchmarks), and push forward the release of the T&L-equipped 4-pipe Kyro 3 (rumoured spec) to compete against the Radeon 8500 and GeForce 3 & 4. I'm sure more chips would have been sold if a higher-end version was available on the market.

That wouldn't have worked. KYRO II was not held back for market timing; they had to up the number of board layers, IIRC, and they added 3 million transistors, so engineering work had to be done between the two. I think they were trying to get the KYRO out as soon as possible.

azza said:
As long as Series 5 can implement DX9 with performance, I'm sure many would love the alternative. The more competition the better. It all comes down to a partner willing to produce this competitive part and market it. Sega seem to be involved in an arcade version of the chip/tech... but who is manufacturing it?

At a completely uninformed guess, I would say TSMC.

Dave
(damn quote tags;p)
 