
RSX: Vertex input limited? *FKATCT



pakotlar
02-Jan-2007, 07:18
Are you referring to the PS2's "lack" of VRAM?

I think he is referring to more than just that. More like VU0/VU1 utilization maybe?

Nesh
02-Jan-2007, 07:55
Soooooo....is there anything we non-techgeeks can derive from this discussion? Any help? :razz:

I don't understand technical talk, but from what I can understand from all this, RSX has issues and pales in comparison to the 360's GPU?

Can someone also be clearer about the PS3's capabilities relative to the 360's? Lately there is so much negative talk about the PS3 that I'd like to hear something clear from the developers in these forums. Sometimes it's hard to get the right idea about something, especially when some "journalists" like spreading things that have no relation to reality, like that other time when I read on a 360 site that the PS3 is less than half as powerful as the 360. They even used IBM's reports on Cell's performance and other stuff to back up their claims.

popper
02-Jan-2007, 08:00
So as Dean says, we (NT) use the WWS-provided job manager, which is itself built as a custom SPU job scheduler on the standard PS3 job API (SPURS).
We (NT) also have a layer above the WWS job manager which provides a bunch of high-level things to make our lives easier, but all the hard work is done by WWS and the OS, which is nice :)

which is how it should be :smile:

btw, all this talk about 40+ fpu's: everyone's talking at 1080p, or at least 1080i, I assume?

some random thoughts: it's clear that currently (if ever) we can't see the GameOS from the user Linux side, which is understandable, but can you, from your dev side, see and auto-mount such a user Linux partition and, for instance, make an option inside your game to stream a section of gaming action to that partition?

that would be a cool way to get master-grade, HD-class in-game clips to put out on the likes of http://www.zudeo.com/az-web/app, the new commercial Azureus (v3 beta) HD video torrent site, to show your friends etc., and a good way to advertise, all without any specialist HD-grade video capture kit on the user's side.

perhaps even go as far as real-time encoding of the screen capture to AVC/H.264 at preset sizes (PiP/SD/HD-720/1080 i/p) for the user's convenience, assuming it can do it in real time!

DeanA
02-Jan-2007, 09:10
To understand you correctly, does that mean a ~28.1MB backbuffer (720p 4xMSAA 32bit and Z) is then resolved to ~3.5MB frontbuffer (or smaller for 24bit?) that is used as the framebuffer displayed on screen?
Yeah... that's right. You'd have a single AA backbuffer with Z, and a double buffered (normal sized) front buffer pair.

Cheers,
Dean
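The sizes in the question are easy to reproduce. A minimal sketch, assuming 4 bytes of colour and 4 bytes of depth/stencil per sample and reading "MB" as MiB; the helper names are made up for illustration:

```python
MIB = 1024 * 1024

def buffer_bytes(width, height, samples, bytes_per_sample):
    """Size of one render target: every pixel stores `samples` samples."""
    return width * height * samples * bytes_per_sample

def msaa_layout(width=1280, height=720, samples=4, color_bytes=4, depth_bytes=4):
    # One multisampled backbuffer holding colour + Z for every sample.
    back = buffer_bytes(width, height, samples, color_bytes + depth_bytes)
    # Two resolved (1 sample per pixel) colour buffers for double buffering.
    front = buffer_bytes(width, height, 1, color_bytes)
    return back, front, back + 2 * front

back, front, total = msaa_layout()
print(f"backbuffer {back / MIB:.3f} MiB, front buffer {front / MIB:.3f} MiB, "
      f"total {total / MIB:.3f} MiB")
```

Under those assumptions the multisampled backbuffer is 28.125 MiB and each resolved front buffer about 3.5 MiB, so Dean's layout (one AA backbuffer plus a double-buffered front pair) comes to roughly 35 MiB.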

Jesus2006
02-Jan-2007, 09:19
That rather slow bus is as fast as the PS3's VRAM bus... It's just that in the X360 the same bus has to provide the CPU-RAM bandwidth as well.

Of course, I meant in relation to the EDRAM bandwidth. Still, RSX does not need this additional step, nor is there any need for tiling.

I just wonder what the performance penalty is:

RSX: Rendering the image with MSAA and postprocessing in VRAM (1 pass)

Xenos: Rendering multiple tiles, apply MSAA in EDRAM, tiles moved back to VRAM, put together and postprocessing applied (??-passes).

Arwin
02-Jan-2007, 09:46
Where PS3 is radical is in the CELL SPEs; instead of multiple traditional CPU cores like the PPE, it has 1 PPE and 7 asymmetric processors that are simpler (e.g. no branch prediction)

Slight nitpick, but that is incorrect, isn't it? The SPEs do have (albeit quite limited) branch prediction.

popper
02-Jan-2007, 10:01
You can stream off an HDD on the 360, so that isn't in the PS3's favor. Blu-Ray won't increase graphic fidelity (it will allow you to hold more varied data, but you are still limited by RAM). Cell would be the major difference, I would think.

by saying 'You can stream off an HDD on the 360 so that isn't in the PS3's favor. Blu-Ray won't increase graphic fidelity', are you referring to AVC/VC-1 streaming video content?

according to the two main MS lead devs/coders/testers in http://forum.doom9.org/showthread.php?p=897452#post897452, veffremov and zambelli, the 360 has some serious problems (nicely spun but clearly there) in rescaling, and they say that ALL the decoding is done in SOFTWARE, even for/with the add-on HD-DVD with its specially coded, faster software VC-1 codec. It's all rather strange given the varied and informed thread content here so far.

as far as I know the PS3 also uses software-only HD 1080p decoding, but has the benefit of the SPEs for that option?

read that whole thread above to get the picture; the doom9 locals in that thread are specialists in their own right, and veffremov is new there, while zambelli on the other hand is a D9 veteran, as you can probably tell :wink:.

Shifty Geezer
02-Jan-2007, 10:07
Oooo, fuzzy memory. But I'm pretty sure SPEs have no branch prediction hardware, and branch 'prediction' is performed by the developer with hints.

As for this thread, it's getting very messy. It's great when the devs get talking, but then every Joe jumps in to ask questions, and every other Joe jumps in to answer them. I'm kinda losing track of what's been said. I think Joker has said the problems he's encountering with PS3 are the memory, which seems to be the OS taking 32-64 MB more RAM than XB360 plus about 30 MB of backbuffer, and the vertex power of RSX. We know RSX has half the vertex setup of Xenos, but wasn't it also hinted that it was more efficient in selecting which vertices to actually render?

Have the original point and later comment been addressed yet?...
As stated previously though, we are vertex heavy, typically having 6 or so inputs and outputs to the vertex shader, and that's after optimizing! The title of this thread states 'vertex input limited', but actually vertex output is also a limiting factor as per the RSX docs. So our title is doubly hit on performance on RSX.
Is RSX really much more limited in vertex throughput?

MasterDisaster
02-Jan-2007, 10:12
Of course, I meant in relation to the EDRAM bandwidth. Still, RSX does not need this additional step, nor is there any need for tiling.

I just wonder what the performance penalty is:

RSX: Rendering the image with MSAA and postprocessing in VRAM (1 pass)

Xenos: Rendering multiple tiles, apply MSAA in EDRAM, tiles moved back to VRAM, put together and postprocessing applied (??-passes).

Assuming 720p 4xMSAA, it's better to have 3 instant passes with a 5% hit on performance than 1 single slow pass with a hit of 35-45%.
Don't confuse apples with oranges: the ROPs on Xenos use a 158 GB/s internal bus.
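The "3 passes" figure follows from dividing the 720p 4xMSAA footprint by the eDRAM size. A rough sketch, assuming Xenos' 10 MB of eDRAM is 10 MiB and that colour and Z each take 4 bytes per sample; real tiles also have shape and alignment constraints, so treat the result as a lower bound:

```python
import math

MIB = 1024 * 1024
EDRAM_BYTES = 10 * MIB   # assumption: Xenos' 10 MB of eDRAM, taken as 10 MiB

def render_target_bytes(width, height, samples, color_bytes=4, depth_bytes=4):
    """Colour plus depth/stencil for every sample of every pixel."""
    return width * height * samples * (color_bytes + depth_bytes)

def tiles_needed(width, height, samples, edram_bytes=EDRAM_BYTES):
    """Lower bound on tiles (and hence passes over the scene) for eDRAM rendering."""
    return math.ceil(render_target_bytes(width, height, samples) / edram_bytes)

for samples in (1, 2, 4):
    print(f"1280x720 {samples}xAA: {tiles_needed(1280, 720, samples)} tile(s)")
```

This gives 1, 2 and 3 tiles for no AA, 2x and 4x respectively.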

TTP
02-Jan-2007, 11:22
It's great when the devs get talking, but then every Joe jumps in to ask questions

Is RSX really much more limited in vertex throughput?

:P .

DeanA
02-Jan-2007, 11:47
can you from your dev side see and auto-mount such a user linux partition and for instance make an option inside your game to stream a section of gaming action to that partition?
Short answer is no.. The long answer would need someone (else) to break their NDA.

Cheers,
Dean

Crossbar
02-Jan-2007, 11:48
Oooo, fuzzy memory. But I'm pretty sure SPEs have no branch prediction hardware, and branch 'prediction' is performed by the developer with hints.
I think you are right about this. Some hints are probably added by the compiler as well, and some branches are transformed into conditional assignments. And when the SPU takes a branch penalty, it is much cheaper than on an ordinary CPU anyway. DeanoC has already stated that this is a non-issue for them; I think that pretty much sums it up.
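Turning a branch into a conditional assignment usually means computing both candidates and picking one with a bitmask, which is what the SPU's `selb` (select bits) instruction does in hardware. A scalar Python emulation of the idea on unsigned 32-bit values; the function names are illustrative:

```python
MASK32 = 0xFFFFFFFF

def cmp_mask(a, b):
    """All ones if a > b, else all zeros (like a SIMD compare result)."""
    return (-(a > b)) & MASK32

def select_bits(if_clear, if_set, mask):
    """Take bits from `if_set` where mask is 1 and from `if_clear` where it is 0."""
    return ((if_clear & ~mask) | (if_set & mask)) & MASK32

def max_branchy(a, b):
    if a > b:          # a real branch: can be mispredicted
        return a
    return b

def max_branchless(a, b):
    # Both candidates are available; the compare only builds a mask,
    # so there is no branch to mispredict.
    return select_bits(b, a, cmp_mask(a, b))
```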

I think he mentioned it did, specifically when he mentioned the ability to do MSAA with FP10 and how porting this to the PS3 ran into the issue that FP16 wasn't compatible with MSAA and there was no time to design a shader-based solution.
If this is the case for many multiplatform developers, I guess we will see a major improvement in PS3 multiplatform titles when they get tools/frameworks supporting nao32-style HDR representations. It seems like it is more the tools than the RSX itself that are the issue for joker.
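nao32 is reportedly a LogLuv-style encoding that squeezes HDR colour into an ordinary 32-bit RGBA8 target, which is what keeps MSAA usable. The sketch below is a generic LogLuv-style packing, not Ninja Theory's actual shader: the 0.62 chromaticity scale, the 32-stop luminance range and the channel order are all assumptions.

```python
import math

# Linear Rec.709/sRGB primaries (D65) <-> CIE XYZ, standard 4-digit matrices.
RGB_TO_XYZ = ((0.4124, 0.3576, 0.1805),
              (0.2126, 0.7152, 0.0722),
              (0.0193, 0.1192, 0.9505))
XYZ_TO_RGB = ((3.2406, -1.5372, -0.4986),
              (-0.9689, 1.8758, 0.0415),
              (0.0557, -0.2040, 1.0570))

UV_SCALE = 0.62                    # u' and v' stay below ~0.62 for real colours
LOG2_MIN, LOG2_MAX = -16.0, 16.0   # assumption: 32 stops of luminance range

def _mul(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def encode(rgb):
    """Linear HDR RGB -> four 8-bit channels: u', v', log2(Y) high byte, low byte."""
    x, y, z = _mul(RGB_TO_XYZ, rgb)
    denom = x + 15.0 * y + 3.0 * z
    if y <= 0.0 or denom <= 0.0:
        return (0, 0, 0, 0)                          # black
    u = 4.0 * x / denom                              # CIE 1976 u'
    v = 9.0 * y / denom                              # CIE 1976 v'
    uq = min(255, round(u / UV_SCALE * 255))
    vq = min(255, round(v / UV_SCALE * 255))
    t = (math.log2(y) - LOG2_MIN) / (LOG2_MAX - LOG2_MIN)
    le = min(65535, max(0, round(t * 65535)))        # 16-bit log luminance
    return (uq, vq, le >> 8, le & 0xFF)

def decode(texel):
    """Inverse of encode(); returns approximate linear RGB."""
    uq, vq, hi, lo = texel
    if vq == 0:
        return (0.0, 0.0, 0.0)
    u = uq / 255 * UV_SCALE
    v = vq / 255 * UV_SCALE
    le = (hi << 8) | lo
    y = 2.0 ** (le / 65535 * (LOG2_MAX - LOG2_MIN) + LOG2_MIN)
    x = y * 9.0 * u / (4.0 * v)
    z = y * (12.0 - 3.0 * u - 20.0 * v) / (4.0 * v)
    return _mul(XYZ_TO_RGB, (x, y, z))
```

The packed value is a plain RGBA8 texel, so it can live in a format the hardware multisamples; a known drawback of LogLuv-style encodings is that hardware blending no longer gives correct results on them.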

Regarding the RAM space for textures, it would be interesting to know if any PS3 developer sees a possibility that SPEs will be used in the future for decompressing textures at higher compression rates than DXTC allows, using for example wavelet-style compression techniques. Could that be a way to keep more textures available in main RAM, trading RAM space for CPU cycles?

If the 360 has some CPU cycles to spare, I guess it could also benefit from this technique.

DeanA
02-Jan-2007, 12:04
Regarding the RAM space for textures, it would be interesting to know if any PS3 developer sees a possibility that SPEs will be used in the future for decompressing textures at higher compression rates than DXTC allows, using for example wavelet-style compression techniques. Could that be a way to keep more textures available in main RAM, trading RAM space for CPU cycles?
Yes, this is possible. The things to be aware of are that the PS3/360 GPUs would both require the texture to be in a natively supported format before using it. As such you would either have to decompress quite some time (ie. a number of frames) before use (which is quite doable, if your engine/assets can identify usage of these compressed textures ahead of time) into some kind of LRU cache, or have your decompression/use of textures closely tied together via some kind of GPU/CPU synchronisation method - decompressing into a smaller buffer area directly before use. Although to be honest, in this latter case, you'd probably find that you'd waste large amounts of GPU time stalling on CPU/SPU decompression of those textures... so it's probably not a good idea to do something like this.. :)

Cheers,
Dean

Crossbar
02-Jan-2007, 12:50
Thanks for the answer.
... so it's probably not a good idea to do something like this.. :)

Is it a correct interpretation that above you are referring to the "closely tied together via some kind of GPU/CPU synchronisation method - decompressing into a smaller buffer area directly before use", and that decompression into an LRU cache is the more viable option in the future?

Sorry, I found your post a little ambiguous with regard to this. :smile:

Arwin
02-Jan-2007, 12:56
I think you are right about this. Some hints are probably added by the compiler as well, and some branches are transformed into conditional assignments. And when the SPU takes a branch penalty, it is much cheaper than on an ordinary CPU anyway. DeanoC has already stated that this is a non-issue for them; I think that pretty much sums it up.

I think many people equate dynamic branch prediction with branch prediction. But it does have hardware support for branch prediction - just not dynamic. Sure, this will be a feature mostly used by compilers, but it can be used by a programmer as well. That, or maybe I just misunderstand, and most people will assume that branch prediction is automatically dynamic.

Branch Optimization. The SPE's hardware has no dynamic branch prediction but has a special branch hint instruction, which indicates likely taken branches. (italics mine)
http://domino.research.ibm.com/comm/research_projects.nsf/pages/cellcompiler.spe.html

Anyway, this has probably already been posted, but I came across it just now so here it is again just in case. ;)

Maximizing the power of the Cell Broadband Engine processor: 25 tips to optimal application performance
http://www-128.ibm.com/developerworks/power/library/pa-celltips1/

DeanA
02-Jan-2007, 13:03
Sorry, I found your post a little ambiguous with regard to this. :smile:
Heh.. Yes, I meant that preemptively decompressing into some kind of LRU cache is going to be the best way of achieving this. The unknown (to me, at least) is how fast SPU decompression of wavelet (or other) compressed images would be - and hence how many frames latency would be required to decompress a useful number of textures prior to them being ready for use by the GPU.

Sorry if I wasn't clear in my original reply..

Cheers,
Dean
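The approach Dean prefers, decompressing a few frames ahead into an LRU cache, could look roughly like this. A sketch only: `decompress` stands in for a hypothetical SPU job that outputs a GPU-native format, sizes are in bytes, and a miss in `get()` models the GPU stall he warns about.

```python
from collections import OrderedDict

class DecompressedTextureCache:
    """LRU cache of textures already decoded into a GPU-native format (hypothetical)."""

    def __init__(self, capacity_bytes, decompress):
        self.capacity = capacity_bytes
        self.decompress = decompress     # hypothetical: compressed asset -> native texels
        self.entries = OrderedDict()     # texture id -> decoded bytes, oldest first
        self.used = 0

    def prefetch(self, tex_id):
        """Call this some frames before the GPU will need `tex_id`."""
        if tex_id in self.entries:
            self.entries.move_to_end(tex_id)         # refresh LRU position
            return
        data = self.decompress(tex_id)
        if len(data) > self.capacity:
            raise ValueError("texture larger than the whole cache")
        while self.used + len(data) > self.capacity:
            _, old = self.entries.popitem(last=False)  # evict least recently used
            self.used -= len(old)
        self.entries[tex_id] = data
        self.used += len(data)

    def get(self, tex_id):
        """Used at draw time; a miss here is where the GPU would stall."""
        data = self.entries.get(tex_id)
        if data is None:
            raise KeyError(f"{tex_id} was not prefetched in time")
        self.entries.move_to_end(tex_id)
        return data
```

How many frames ahead `prefetch()` has to run depends on the SPU decode speed, which is exactly the unknown Dean points out.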

mboeller
02-Jan-2007, 14:40
That's correct. You basically save two 4X (or 2X) render targets (colour and depth), and only need a resolved 1X framebuffer, unless you also need to resolve the depth buffer for post processing, which is common.

As a layman:

IMHO I would be quite surprised if the Xenos eDRAM chip did not contain a nicely featured 2D/3D graphics unit to do a lot of post-processing for free (it needs only a few million transistors). I would not be surprised at all if we hear in the future that the eDRAM chip even contains a little bit of Talisman voodoo (the image layer compositor) and that the X360 has an enhanced compositing DAC.


Manfred

[maven]
02-Jan-2007, 15:02
The unknown (to me, at least) is how fast SPU decompression of wavelet (or other) compressed images would be - and hence how many frames latency would be required to decompress a useful number of textures prior to them being ready for use by the GPU.

IME this would largely depend on how well you can get your entropy decoding scheme to run on the SPU; all the other elements of any form of transform coding should work very well on the SPU (as long as your data is tiled appropriately for the size of your local storage). Which reminds me that I still wanted to work on decompressing directly into DXTn...
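Tiling "appropriately for the size of your local storage" is mostly arithmetic. A sketch assuming the SPU's 256 KB local store, a hypothetical 64 KB reserved for code, stack and tables, and double-buffered input and output tiles so DMA can overlap with decoding:

```python
LOCAL_STORE_BYTES = 256 * 1024   # SPU local store size
RESERVED_BYTES = 64 * 1024       # assumption: code, stack and lookup tables

def tile_fits(side, in_bpp, out_bpp, budget):
    """True if two input tiles and two output tiles (double buffering) fit."""
    per_tile = side * side * (in_bpp + out_bpp)
    return 2 * per_tile <= budget

def largest_tile(in_bpp, out_bpp, reserved=RESERVED_BYTES):
    """Largest power-of-two square tile side that fits the remaining local store."""
    budget = LOCAL_STORE_BYTES - reserved
    if not tile_fits(1, in_bpp, out_bpp, budget):
        raise ValueError("not even a 1x1 tile fits")
    side = 1
    while tile_fits(side * 2, in_bpp, out_bpp, budget):
        side *= 2
    return side
```

For example, 4-byte coefficients in and 4-byte texels out give 64x64 tiles, while 2-byte coefficients allow 128x128.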

nelg
02-Jan-2007, 15:35
Does the SPU job scheduler reside on one of the OS reserved SPUs?

EndR
02-Jan-2007, 16:19
<Keanu Reeves>Woaaa</Keanu Reeves>

Great thread!! Lots of info, lots of dev speaking.. nice!
Continue as you were, please... just wanted to express my appreciation..

toodeloo..

:cool:

[maven]
02-Jan-2007, 16:19
Does the SPU job scheduler reside on one of the OS reserved SPUs?
I think it is a very small kernel on the SPUs themselves that accesses a shared job-list in main memory to extract (and create new) jobs. At least that's how I would do it... ;)
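[maven]'s description, a tiny kernel on each SPU pulling jobs from a shared list in main memory and pushing new ones, can be modelled with threads and a queue. This only illustrates the pattern; it is not SPURS or the WWS job manager, and the six workers and the `sum_job` example are assumptions:

```python
import queue
import threading

def spu_kernel(worker_id, jobs, results):
    """Per-worker loop: take a job from the shared list, run it, queue any jobs it creates."""
    while True:
        job = jobs.get()
        if job is None:                      # sentinel: shut this worker down
            jobs.task_done()
            return
        func, arg = job
        value, new_jobs = func(arg)
        results.append((worker_id, value))   # list.append is atomic in CPython
        for new_job in new_jobs:             # enqueue children before finishing the parent
            jobs.put(new_job)
        jobs.task_done()

def run(initial_jobs, workers=6):            # assumption: six SPUs available to a game
    jobs = queue.Queue()                     # stands in for the job list in main memory
    results = []
    for job in initial_jobs:
        jobs.put(job)
    threads = [threading.Thread(target=spu_kernel, args=(i, jobs, results))
               for i in range(workers)]
    for t in threads:
        t.start()
    jobs.join()                              # all jobs, including spawned ones, are done
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    return results

def sum_job(span):
    """Example job: sum a range, splitting large ranges into two child jobs."""
    lo, hi = span
    if hi - lo > 8:
        mid = (lo + hi) // 2
        return 0, [(sum_job, (lo, mid)), (sum_job, (mid, hi))]
    return sum(range(lo, hi)), []
```

Usage: `run([(sum_job, (0, 100))])` returns one `(worker_id, value)` pair per executed job, and the values add up to the sum of the range.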

mckmas8808
02-Jan-2007, 17:11
On the topic of platforms: when we straight-ported our first PS3 engine to X360, we had much better performance. Then, becoming PS3 exclusive, we did a lot of rethinking and tuning (and got up-to-date kits and libs); it's now a lot better than the X360's...
This is normal: when you have the opportunity, play to the strengths.

Most multiplatform titles can't get that dedicated tailoring pass, so both versions will compromise. Ports, depending on $$ pressure, will compromise even more.

WHAT?!?!?! You make games on the PS3?!?!?! :shock: I thought you just played games, not make them.

Jawed
02-Jan-2007, 17:23
IME this would largely depend on how well you can get your entropy decoding scheme to run on the SPU; all the other elements of any form of transform coding should work very well on the SPU (as long as your data is tiled appropriately for the size of your local storage). Which reminds me that I still wanted to work on decompressing directly into DXTn...
What's Carmack's MegaTexture doing?...

Jawed

almighty
02-Jan-2007, 18:11
On the memory front of things regarding PS3, am I the only person that feels it will only be a problem with just multiplat games? Any developer that has a very good history of PS2 development should have good experience of efficiently using memory.

Also this has to be one of the best and most informative threads I've read in a while; thanks to everyone for their input, and keep it coming :)

3dilettante
02-Jan-2007, 18:26
I think you are right about this. Some hints are probably added by the compiler as well, and some branches are transformed into conditional assignments. And when the SPU takes a branch penalty, it is much cheaper than on an ordinary CPU anyway. DeanoC has already stated that this is a non-issue for them; I think that pretty much sums it up.


I think what was actually said was that cache misses for in-order CPUs tend to be more important than branch mispredictions. In-orders are often more limited by data latency (something the local store is good at keeping low for the SPE target workloads) than by instruction load latency.

The wording didn't read to me that branches are cheaper on the SPEs, just that other costs tend to obscure them.

Jov
02-Jan-2007, 18:41
Maybe it's just me - but I could swear this thread demonstrates pretty clearly that working well with RSX is not exactly clear to many people out there (developers or general public alike). In fact lot of the discussion sounds eerily reminiscent of 6 years ago.

Yeah, déjà vu! Now can we safely assume the RSX isn't as common as the G7x brethren as most here were led to believe (and no, I am not trying to imply it's got G8x anything)?

I mean, even if there were graphics features stripped from it, that still makes my comment valid. If so, there must be some good reasoning behind these decisions.

Has it been officially confirmed the RSX die size difference was due to redundancy or is it still debatable?

Shifty Geezer
02-Jan-2007, 18:56
On the memory front of things regarding PS3, am I the only person that feels it will only be a problem with just multiplat games? Any developer that has a very good history of PS2 development should have good experience of efficiently using memory.
Lack of memory affects all titles. It's not like PS2 multiplatform titles had low quality textures while exclusives had high-quality textures. The platform was known for poor texturing because of lack of memory (and lack of texture compression) and you can't get round that. Or to illustrate another way, imagine PS3 had 128 MB RAM versus XB360's 512. You wouldn't expect the same quality assets on PS3 as XB360 then just because they're 1st party exclusives, would you? Exclusives may benefit from better workarounds, but if RAM is limiting texturing on multiplatform titles, it'll limit it on exclusives too.

Also I'm not sure about your choice of words, "only be a problem with multiplat games" when these are likely to make up some 90% of the games!

Crossbar
02-Jan-2007, 19:05
I think what was actually said was that cache misses for in-order CPUs tend to be more important than branch mispredictions. In-orders are often more limited by data latency (something the local store is good at keeping low for the SPE target workloads) than by instruction load latency.

The wording didn't read to me that branches are cheaper on the SPEs, just that other costs tend to obscure them.

Yeah, you are correct, the quote concerned "a variable memory access patterns" meaning plenty of cache misses. I was actually thinking of branch misses to non-cached code when I wrote that, but failed to describe it. Actually the 50+ cycle number suggested by DeanoC sounds a bit low for access to main RAM, and still too high to be the latency figure of the level 2 cache. Maybe some NDA margins in there? :smile:

almighty
02-Jan-2007, 19:12
Lack of memory affects all titles. It's not like PS2 multiplatform titles had low quality textures while exclusives had high-quality textures. The platform was known for poor texturing because of lack of memory (and lack of texture compression) and you can't get round that. Or to illustrate another way, imagine PS3 had 128 MB RAM versus XB360's 512. You wouldn't expect the same quality assets on PS3 as XB360 then just because they're 1st party exclusives, would you? Exclusives may benefit from better workarounds, but if RAM is limiting texturing on multiplatform titles, it'll limit it on exclusives too.

Also I'm not sure about your choice of words, "only be a problem with multiplat games" when these are likely to make up some 90% of the games!

Excellent points in that reply, Shifty. Just out of curiosity, do you know how PS3 and 360 stack up in terms of the compression types available to them?

Onlooker1
02-Jan-2007, 19:15
Lack of memory affects all titles. It's not like PS2 multiplatform titles had low quality textures while exclusives had high-quality textures.
That is partly true, but the PS2 had 32MB of memory and the original Xbox had 64MB of memory; that is a factor of 2x, and the Xbox had a faster CPU as well.

Now it is 512MB (minus 60MB?) vs 512MB (minus 20MB?), with differences in memory organisation, pluses and minuses in bus speeds, and 7 usable CPUs vs 3 cores. (There is also the possibility that Sony could re-evaluate the OS footprint needed while games are running, and future titles might find, from release 1.xx onwards, that they have more memory. A firmware upgrade could be enforced, as it is on PSP, by bundling it with the game.)

So it would be impossible to say what this all boils down to, even in multiplatform titles. But if you're expecting the same perceived texture disparity this generation as with PS2/Xbox, then the spec differences, even just in memory alone, wouldn't support such a view.

Rash'
02-Jan-2007, 20:46
Of the GPUs and CPUs on both consoles I think many would agree with me that RSX is the least radical design; it probably is also the best known in regards to what it can do, and what works well and what doesn't. Just my 2 cents on that :smile:
I'm confused. If the workings of RSX are that well known, then why are there disagreements about what does and doesn't work well on the GPU?

Personally, I think it's because different developers have different creative solutions to the hurdles any hardware presents. What some describe as an impassable problem may just be an issue that needs a different creative resolution.

This was the point I was fundamentally trying to make, which I think you overlooked because of the "radical design" remark. I accept your point that maybe the choice of word wasn't appropriate for all aspects of the PS3 design, but is it not premature to make comparisons on hardware that developers, whether first, second or third party, clearly haven't fully come to terms with?

Fredrik
02-Jan-2007, 20:55
[...]
Now it is 512MB (minus 60MB?) vs 512MB (minus 20MB?), with differences in memory organisation, pluses and minuses in bus speeds, and 7 usable CPUs vs 3 cores. (There is also the possibility that Sony could re-evaluate the OS footprint needed while games are running, and future titles might find, from release 1.xx onwards, that they have more memory. A firmware upgrade could be enforced, as it is on PSP, by bundling it with the game.)
[...]

I don't think they will re-evaluate their OS requirements. IMO 32MB of RAM out of 512MB will only make a small noticeable difference in texture quality; this may be compensated with a higher poly count. What is now seen as a problem that penalizes the console may in the future be a big selling point: it's likely that they reserved that much because they have big things in mind. And having that much processor power and RAM set aside may let them do amazing stuff, and that could distinguish their console.

I would say that in the short term we will see that SPE used to stream content to the PSP, which could allow players to have a PS3 game running on the PS3 and play it on their PSP (with some sort of LocationFree software). That is something that has been heavily hinted at by Sony. If I had to bet on more stuff coming for that GameOS, I would presume they have something in mind related to the EyeToy2. I don't know if it could be done, but I suppose they could have a windowed video chat running on top of a PS3 game while you are playing. That's a closer game experience than pure online play: you see your friend playing as if he were sitting next to you.

I'm not a dev, so I don't know how many things can be done using an SPE and 64MB of RAM, but I guess there's enough to do some little amazing things.

3dilettante
02-Jan-2007, 20:59
Yeah, you are correct, the quote concerned "a variable memory access patterns" meaning plenty of cache misses. I was actually thinking of branch misses to non-cached code when I wrote that, but failed to describe it.

If instruction bandwidth were the key issue, the outcome would be a less clear win for the SPE, since a performance-critical branch misprediction that missed cache would likely only miss cache once.

If that branch is in some hot code that is run for a long time, it would remain in cache, and the longer latency of the local store would in the end turn out to be a performance loss for the SPE.

On the other hand, things get fuzzier still if the hot code exceeds the size of the L1 instruction cache, in which case the slower L2 plays a factor, depending on just how often the L1 I cache misses.

It seems from current trends that it's minimizing data load latency that's more important.


Actually the 50+ cycle number suggested by DeanoC sounds a bit low for access to main RAM, and still too high to be the latency figure of the level 2 cache. Maybe some NDA margins in there? :smile:
Maybe he meant 50+ ns, which sounds reasonable.

TurnDragoZeroV2G
02-Jan-2007, 21:17
I'm confused. If the workings of RSX are that well known, then why are there disagreements about what does and doesn't work well on the GPU?

I think there's far more agreement than disagreement concerning what does and doesn't work well on RSX. On the other hand, there's plenty of disagreement as to how big a deal all of it is.

Laa-Yosh
02-Jan-2007, 21:30
by saying 'You can stream off an HDD on the 360 so that isn't in the PS3's favor. Blu-Ray won't increase graphic fidelity', are you referring to AVC/VC-1 streaming video content?

I think you've got this wrong, the topic was that even though a BluRay disc can hold more assets - models, textures, levels, animations, sounds etc. - it still wouldn't make a game look better on the PS3, because the bottleneck is the smaller amount of available system RAM from which the game can display/use these assets.

Gubbi
02-Jan-2007, 23:19
It seems from current trends that it's minimizing data load latency that's more important.

Maybe he meant 50+ ns, which sounds reasonable.

Slide 23 of this (http://www-static.cc.gatech.edu/classes/AY2007/cs8803hpc_fall/lectures/22_Cell.ppt) suggests main memory-to-SPE latency of ~170ns for blocking read access (ie. load-to-use latency). That seems crazy high.

Just doing an inter-SPE DMA transfer is quite costly at ~100ns. Latency from communicating with the main memory modules themselves looks like ~70ns, which is reasonable I suppose.

The slides make it all the more clear that you would want to operate out of the LS and *only* the LS.

Cheers

edit: Or were you talking about PPU I$ latencies? If so, move along, nothing to see.

rounin
02-Jan-2007, 23:31
Sony must reduce its OS memory usage in order to be competitive; if you think about it, 64MB is all the memory the Xbox had!

I don't know. I would rather wait before saying that: what happens if doing so allows them to do some really amazing things later on?

3dilettante
02-Jan-2007, 23:39
Slide 23 of this (http://www-static.cc.gatech.edu/classes/AY2007/cs8803hpc_fall/lectures/22_Cell.ppt) suggests main memory-to-SPE latency of ~170ns for blocking read access (ie. load-to-use latency). That seems crazy high.

Just doing an inter-SPE DMA transfer is quite costly at ~100ns. Latency from communicating with the main memory modules themselves looks like ~70ns, which is reasonable I suppose.

The slides make it all the more clear that you would want to operate out of the LS and *only* the LS.

Cheers

edit: Or were you talking about PPU I$ latencies? If so, move along, nothing to see.

The quoted section contained two separate points.
By being deterministic and optimized for larger DMA transfers, the LS can lower the average apparent memory latency for the data sets that can be broken down properly.
A hundred loads from the LS at 6 cycles every time after a block fetch is better than a cache-unfriendly stream of loads that can take hundreds of cycles each, or use so many prefetches that it slaughters instruction bandwidth.

For instructions, the LS may be slightly less optimal than a good fast I cache. Since in-orders worry about data latency more than instruction fetch latency, it is probably not as important that the LS has a higher latency.

The 50 ns portion was me trying to interpret what DeanoC meant by "~50+ cycles". If he's comparing the SPE to other chips, the A64 can get best-case latencies in the neighborhood of 50 ns.
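Converting between cycles and nanoseconds at Cell's 3.2 GHz shows why the two readings of "~50+" differ, and the same arithmetic puts numbers on the local store argument. A sketch, assuming a block DMA costs roughly one main-memory latency (Gubbi's ~170 ns figure), that the SPU simply waits for it, and ignoring transfer time:

```python
CLOCK_GHZ = 3.2   # Cell clock speed: 3.2 cycles per nanosecond

def ns_to_cycles(ns, ghz=CLOCK_GHZ):
    return ns * ghz

def cycles_to_ns(cycles, ghz=CLOCK_GHZ):
    return cycles / ghz

def amortized_ls_cycles(loads, block_fetch_cycles, ls_latency=6):
    """Average cost per load when one block DMA is followed by `loads` local store loads.

    Assumes no overlap between the DMA and computation, and ignores transfer time.
    """
    return (block_fetch_cycles + loads * ls_latency) / loads
```

At 3.2 GHz, 50 ns is 160 cycles and 170 ns is 544 cycles; amortising one 544-cycle fetch over a hundred 6-cycle local store loads gives about 11.4 cycles per load, versus 544 if every load went to main memory.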

Tahir2
02-Jan-2007, 23:41
Hmm, joker has admitted he is new to PS3 coding, and there are developers here who have been having a stab at the behemoth that is PS3 for a bit longer. I remember some of them stating quite clearly that PS3 requires a rethink of coding, and some experienced problems that seem to be resolved now.

I would describe these complaints by joker454 as exploratory steps into the world of PS3 .. in the most respectful and nicest way possible of course. :)

And Xenos might be superior in certain ways to RSX but these devices do not act on their own in complex systems.

Edit: replying to a post that has already been deleted, but I am keeping this post as I think it might explain the reason for some of the complaints. No disrespect intended, of course.

Gubbi
03-Jan-2007, 00:18
The quoted section contained two separate points.
By being deterministic and optimized for larger DMA transfers, the LS can lower the average apparent memory latency for the data sets that can be broken down properly.
A hundred loads from the LS at 6 cycles every time after a block fetch is better than a cache-unfriendly stream of loads that can take hundreds of cycles each, or use so many prefetches that it slaughters instruction bandwidth.

Sorry, you (you and Crossbar) got me confused by discussing I$ misses in an SPE context. You'd need to initiate a DMA request to load more code into the LS, and hence a "miss" *is* a data dependency, on the i-stream.


For instructions, the LS may be slightly less optimal than a good fast I cache. Since in-orders worry about data latency more than instruction fetch latency, it is probably not as important that the LS has a higher latency.

I think you're right, your core is already hosed by the mispredict penalty, the extra latency of the initial i-fetch after a mispredict is probably in the noise.

The only place I could see the 6 cycle LS latency have a significant impact on i-stream accesses is that it sets the lower bound for distance between software-BTB priming and branching.


The 50 ns portion was me trying to interpret what DeanoC meant by "~50+ cycles". If he's comparing the SPE to other chips, the A64 can get best-case latencies in the neighborhood of 50 ns.

It's probably easier to ask DeanoC directly, since 50ns doesn't rhyme with anything. Mayhaps he meant a 50+ instruction penalty, equivalent to 25 cycles of dual-issue/commit (which seems too high, though).

Cheers

ShootMyMonkey
03-Jan-2007, 00:24
I don't think they will re-evaluate their OS requirements. IMO 32MB of RAM out of 512MB will only make a small noticeable difference in texture quality; this may be compensated with a higher poly count. What is now seen as a problem that penalizes the console may in the future be a big selling point: it's likely that they reserved that much because they have big things in mind. And having that much processor power and RAM set aside may let them do amazing stuff, and that could distinguish their console.
Well, if it were up to me, I wouldn't really give a damn about having all sorts of ancillary functions running at the same time as a game and would have preferred to just leave the resident OS be a kernel and nothing more (at least while a game is running). But I guess some people might have some desire to keep a webpage of cheat codes open at the same time they're playing the game or something or whatever. And I can imagine they might take a small hit on account of electing not to lock down to all manner of proprietary peripherals and thereby needing a more formal driver layer for various things... but I doubt that's a huge drain.

All the same, I think it boils down to being unable to plan ahead of time what they intend to throw in feature-wise. They're overestimating so that they have breathing room. While I can believe there's room for them to decrease the memory requirements (and just have those later games require a certain update to be installed), I doubt it will ever happen. Of course, you could just as easily argue that Microsoft has the opposite problem in that if Sony develops some sort of killer app on PS3, 360's OS may not have the necessary memory space to follow suit. Again, something that I doubt will ever happen.

3dcgi
03-Jan-2007, 02:08
wasn't it also hinted that it was more efficient in selecting which vertices to actually render?
Without thinking much about it I'm not sure what RSX could be doing to be more efficient here? Is Shifty's memory faulty or does someone have an explanation?

ShootMyMonkey
03-Jan-2007, 04:41
Without thinking much about it I'm not sure what RSX could be doing to be more efficient here? Is Shifty's memory faulty or does someone have an explanation?
Perhaps he's thinking of vertices not getting reprocessed as often since RSX has its rather large post-transform caches? Well, it's true that if you partition the streams nicely enough to keep re-accessing verts that are sitting in the cache, then you can get significantly better vertex throughput out of it than the raw vertex shading performance would suggest, and RSX's caches being a few times larger than Xenos' means your odds are a little better. There's not a whole lot else I could think of that sounds anything similar to what Shifty was saying.
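To illustrate the post-transform cache point, here is a rough C++ model that counts vertex shader runs for an indexed triangle list, treating the cache as a plain FIFO. The cache sizes you feed it are arbitrary; nothing here reflects actual RSX or Xenos cache sizes or replacement policies, and real hardware may not behave as a strict FIFO.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Count vertex shader runs for an indexed triangle list, modelling the
// post-transform cache as a FIFO of `cache_size` entries. A hit reuses an
// already-transformed vertex; a miss runs the shader and pushes the index,
// evicting the oldest entry once the FIFO is full.
std::size_t count_vertex_shader_runs(const std::vector<std::uint32_t>& indices,
                                     std::size_t cache_size) {
    assert(cache_size > 0);
    std::deque<std::uint32_t> fifo;
    std::size_t runs = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end()) {
            continue;  // hit: no re-transform needed
        }
        ++runs;  // miss: vertex gets (re)transformed
        fifo.push_back(idx);
        if (fifo.size() > cache_size) {
            fifo.pop_front();
        }
    }
    return runs;
}

// Average cache miss ratio: shader runs per triangle (lower is better).
double acmr(const std::vector<std::uint32_t>& indices, std::size_t cache_size) {
    assert(!indices.empty() && indices.size() % 3 == 0);
    return static_cast<double>(count_vertex_shader_runs(indices, cache_size)) /
           static_cast<double>(indices.size() / 3);
}
```

Comparing `acmr()` for the same index buffer at two cache sizes shows how much index ordering for locality matters. Note that with a FIFO policy a bigger cache isn't strictly guaranteed to help every ordering.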

Crossbar
03-Jan-2007, 09:00
IME this would largely depend on how well you can get your entropy decoding scheme to run on the SPU, all the other elements of any form of transform coding should work very well on the SPU (as long as your data is tiled appropriately for the size of your local storage). Which reminds me that I still wanted to work on decompressing directly into DXTn...
I downloaded some JPEG libs from Intel (http://www.intel.com/cd/software/products/asmo-na/eng/219766.htm) just for fun.

And you are right, from running some rough benchmarks it seems that the Pentium performs much better if the output fits within the cache. My Pentium 4 at 3.2 GHz (Presler, 1 MB cache) decompressed a 15 kB jpeg file into a 250 kB (24-bit) bitmap in 2.8 ms. You could probably do better on an SPU with some hand-tuned code taking advantage of the huge register file.

Yes, you really would like to decompress it into a DXTC texture; perhaps there are better compression schemes for that. Huffman (edit: or LZW or some other lossless compression, probably with some custom optimisations for the particular DXTC in question) encoding of a DXTC texture with some repetitive colours could perhaps give a good result, both with regard to compression rate and decompression speed?
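Before committing to a Huffman pass over DXTC data, one cheap sanity check is the order-0 byte entropy of the compressed blocks, which is a lower bound on the average code length a byte-wise Huffman coder can reach. A minimal C++ sketch, assuming you already have the DXTn blocks in a byte buffer (it doesn't decode DXT itself). LZ-style coders can beat this bound by exploiting repeated runs, so treat it as a rough guide only.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Order-0 Shannon entropy of a byte buffer, in bits per byte.
double byte_entropy_bits(const std::vector<std::uint8_t>& data) {
    assert(!data.empty());
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t b : data) ++counts[b];
    const double n = static_cast<double>(data.size());
    double bits = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;  // unused symbols contribute nothing
        const double p = static_cast<double>(c) / n;
        bits -= p * std::log2(p);
    }
    return bits;
}

// Estimated size in bytes after ideal order-0 entropy coding.
double estimated_coded_bytes(const std::vector<std::uint8_t>& data) {
    return byte_entropy_bits(data) * static_cast<double>(data.size()) / 8.0;
}
```

If the entropy of real DXT data comes out close to 8 bits/byte, a plain Huffman pass won't buy much, and something that exploits repetition (LZ-style, or a DXT-aware transform first) is the better bet.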

Kryton
03-Jan-2007, 09:38
I downloaded some JPEG libs from Intel (http://www.intel.com/cd/software/products/asmo-na/eng/219766.htm) just for fun.

And you are right, from running some rough benchmarks it seems that the Pentium performs much better if the output fits within the cache. My Pentium 4 at 3.2 GHz (Presler, 1 MB cache) decompressed a 15 kB jpeg file into a 250 kB (24-bit) bitmap in 2.8 ms. You could probably do better on an SPU with some hand-tuned code taking advantage of the huge register file.

Yes, you really would like to decompress it into a DXTC texture; perhaps there are better compression schemes for that. Huffman (edit: or LZW or some other lossless encoding) encoding of a DXTC texture with some repetitive colours could perhaps give a good result, both with regard to compression rate and decompression speed?

This is probably more appropriate in the Cell programming thread, but did the Intel library make use of SSE? JPEG has the quantization operation which could be tweaked nicely if it isn't vectorized already (on Intel chips).
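For reference, the decode-side counterpart of JPEG quantization is just an element-wise multiply over the 64 coefficients of each block, which is why it vectorizes so easily. A minimal scalar C++ sketch; the function name and array layout are assumptions (coefficients already de-zigzagged to match the table's order), and this isn't a claim about how the Intel library implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Dequantize one 8x8 JPEG block: multiply each decoded coefficient by the
// matching quantization-table entry. The fixed trip count and lack of
// branches make this loop an easy target for SIMD (SSE/AltiVec/SPU),
// either through auto-vectorization or hand-written intrinsics.
void dequantize_block(const std::int16_t coeffs[64],
                      const std::uint16_t quant[64],
                      std::int32_t out[64]) {
    assert(coeffs && quant && out);
    for (std::size_t i = 0; i < 64; ++i) {
        out[i] = static_cast<std::int32_t>(coeffs[i]) *
                 static_cast<std::int32_t>(quant[i]);
    }
}
```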

Fafalada
03-Jan-2007, 09:40
And you are right, from running some rough benchmarks it seems that the Pentium performs much better if the output fits within the cache. My Pentium 4 at 3.2 GHz (Presler, 1 MB cache) decompressed a 15 kB jpeg file into a 250 kB (24-bit) bitmap in 2.8 ms.
Was that standard JPEG (not some JPEG 2000 wavelet variant)? Just curious - PS2 decodes a JPEG of that size (I assumed ~295*295 pixels based on your 24-bit size) in close to 0.8ms - granted, it's a hardware decoder - but it's also over 6 years old now.
I imagine SPE should absolutely fly at DCT macroblock decoding though; I'm sure someone will give it a go sooner or later.
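For anyone trying DCT decoding on an SPE, the usual starting point is a slow but obviously-correct reference IDCT to validate a fast version against. A minimal C++ sketch of the textbook 8x8 JPEG inverse DCT; it's a correctness baseline only, and the +128 level shift and clamping that baseline JPEG applies afterwards are left out.

```cpp
#include <cassert>
#include <cmath>

// Reference 8x8 inverse DCT (baseline JPEG definition):
//   f(x,y) = 1/4 * sum_u sum_v C(u) C(v) F(v,u)
//            * cos((2x+1) u pi / 16) * cos((2y+1) v pi / 16)
// with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
// F is indexed [v][u] (row = vertical frequency); f is indexed [y][x].
// This direct O(n^4) form is only for validation; fast decoders use a
// separable, factored IDCT plus SIMD.
void idct8x8_reference(const double F[8][8], double f[8][8]) {
    assert(F && f);
    const double pi = std::acos(-1.0);
    const double c0 = 1.0 / std::sqrt(2.0);
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            double sum = 0.0;
            for (int v = 0; v < 8; ++v) {
                for (int u = 0; u < 8; ++u) {
                    const double cu = (u == 0) ? c0 : 1.0;
                    const double cv = (v == 0) ? c0 : 1.0;
                    sum += cu * cv * F[v][u] *
                           std::cos((2 * x + 1) * u * pi / 16.0) *
                           std::cos((2 * y + 1) * v * pi / 16.0);
                }
            }
            f[y][x] = 0.25 * sum;
        }
    }
}
```

A DC-only block is a handy first check: with F[0][0] = 80 and everything else zero, every output sample should come out as 10.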

I doubt it will ever happen
That depends on how much they overestimated and how much was the result of code like
char *KenIsGreat = new char[1024*1024*16]; //never remove!
For what it's worth, they did "unreserve" half of the kernel reserved space on PSP eventually.

Crossbar
03-Jan-2007, 09:47
This is probably more appropriate in the Cell programming thread, but did the Intel library make use of SSE? JPEG has the quantization operation which could be tweaked nicely if it isn't vectorized already (on Intel chips).
Yes, the Intel library contains tweaked code for almost every single Intel CPU, so you can be pretty sure it used the SSE instruction set.

The mods can feel free to move this to the Cell programming thread, if they think it's appropriate. :)

Crossbar
03-Jan-2007, 10:01
Was that standard JPEG (not some JPEG 2000 wavelet variant)?
Yeah, just a standard JPEG. Maybe I'll give the JP2 version a go later today if I have the time.

popper
03-Jan-2007, 10:01
For what it's worth, there's been PPC/AltiVec DCT macroblock decoding in x264, ffmpeg and mplayer/mencoder for a while now (don't know if it's profiled though), so it shouldn't be too hard to port/patch that to use SPEs; just a little effort and time, plus users get to benefit if you did it.

Crossbar
03-Jan-2007, 10:26
For what it's worth, there's been PPC/AltiVec DCT macroblock decoding in x264, ffmpeg and mplayer/mencoder for a while now (don't know if it's profiled though), so it shouldn't be too hard to port/patch that to use SPEs; just a little effort and time, plus users get to benefit if you did it.

Might as well do that once I get my hands on a Cell in a couple of months, I enjoy tweaking code. I've found the Cell performance thread really enjoyable. :)

Is that code available at Sourceforge or elsewhere?

popper
03-Jan-2007, 10:56
Yes, SF is where you find leads to most of them, but there are other cutting-edge builds around. I'll see if I can find time later tonight and get some URLs if no one else posts them first.

http://forum.doom9.org/index.php is where you find most devs for that, but they're mainly x86-based (guess they don't have PS3s yet 8), though you get good leads to source there.

ffmpeg is the main one that most use as the base, but it's always good to keep x264 tweaked to produce the current best (AVC) en/decode as well :wink:

Schweet
03-Jan-2007, 11:03
From the main original article, there seems to be some very obviously incorrect commentary about the PS3, the Cell and the RSX.
For one very obvious one: "...but its spu's can't see main memory"
This is untrue - all SPEs can fetch directly from main memory into their LS. In fact that is the suggested manner of operation in the Sony documentation. The author seems not to have had actual PS3 experience, because one of the things Sony also state (and the chief RSX engineer said this many times on the dev forums) - "DON'T JUST PORT HLSL TO RSX". Reason is that the RSX is not a traditional GPU and doesn't work well with HLSL compiled assembler (horribly inefficient for the GPU). The RSX has a _tuned_ Cg compiler that is supposed to be used, and just converting shaders to Cg produces a perf increase of between 10 and 50 times. Again, these aren't my words, this is what Sony and their devs have stated many times on the dev forums.

The RSX is no polygon beast.. but it's easily comparable to the X360's GPU. The main differences between the two are tools. And this is where most PC coders get into trouble - if it doesn't compile in MSVC then there must be something wrong. A lot of the time in console dev, it's the platform-specific modifications you do that get you the "true" capabilities of the hardware. Like anything, if you put code onto a platform it's inefficient for.. it will run slow. It's that simple. There are just too many PC coders who have jumped into console dev expecting the compiler.. the toolsets.. and the libraries to do everything for them. Sadly.. this is the typical PC coding world.. a lot less understanding of how things work, although.. it's funny because most PC coders think they are smarter than compiler writers and use compiler overrides everywhere.. that's one of my favourites :)

Anyway.. sorry if this has been covered. I'd just like to quash the idea that the RSX is somehow a poor GPU.

popper
03-Jan-2007, 11:31
Well, I don't claim to know anything, but by all accounts, even before people started paying attention because of Cell/PS3, the PPC AltiVec GCC (and its free-to-use derivatives) produces FAR less optimised code than what devs get from the x86 codebase, and that needs sorting PDQ/ASAP.

There's more to PPC/AltiVec than Apple and their special patched versions; we need far better GCC/AltiVec default code generation and we need it now. Can YOU and your compiler friends do that please... :roll: for EVERYONE's benefit.

Patch it, submit it, do whatever it takes to get it into the base and tell people about it..

DeanA
03-Jan-2007, 11:32
The RSX has a _tuned_ Cg compiler that is supposed to be used, and just converting shaders to Cg produces a perf increase of between 10 and 50 times. Again, this isn't my words, this is what Sony and their devs have stated many times on the dev forums.
Whaaaaaaaaaaaaa? You can only use Cg on PS3/RSX.. so what on earth is this 10->50 times improvement relative to?

I've never read anything like this on the PS3 dev forums, that's for sure.

Cheers,
Dean

ShootMyMonkey
03-Jan-2007, 17:21
That depends on how much they overestimated and how much was the result of code like
char *KenIsGreat = new char[1024*1024*16]; //never remove!
For what it's worth, they did "unreserve" half of the kernel reserved space on PSP eventually.
Hm. That's news to me. I haven't touched PSP in just over a year, so I hadn't paid much attention, but I'm surprised they'd take the trouble. Of course, if PS3 were eating up the same fraction of memory that the PSP OS was originally taking up, we'd have an OS that reserves 128 MB...

"DONT JUST PORT HLSL TO RSX". Reason is that the RSX is not a traditional GPU and doesn't work well with HLSL compiled assembler (horribly inefficient for the GPU). The RSX has a _tuned_ Cg compiler that is supposed to be used, and just converting shaders to Cg produces a perf increase of between 10 and 50 times.
Cg and HLSL are pretty similar, and the real point is not that HLSL is a bad performer or that RSX isn't traditional (which is anything but true), but that normal practices you might do on other (read: ATI) architectures aren't the best thing you can do for RSX. Secondly, it's fairly accepted that the Cg compiler optimizes shader code better than Microsoft's HLSL compiler. Even on the PC, I've seen this quite often. But 10-50x? There isn't that much breathing room in the first place and there never could have been. Perhaps you mistook 10-50% for 10-50x? The best improvement I've ever seen, and this was on a pretty contrived example, was around 40%.

Shezad
03-Jan-2007, 19:38
(and the chief RSX engineer said this many times on the dev forums) - "DON'T JUST PORT HLSL TO RSX". Reason is that the RSX is not a traditional GPU and doesn't work well with HLSL compiled assembler (horribly inefficient for the GPU)


Who is the mysterious RSX chief engineer posting on the forum, Rudolf the red-nosed reindeer?
Please... the thread was going quite well until now; stop trolling.

Shezad
03-Jan-2007, 19:41
Cg and HLSL are pretty similar, and the real point is not that HLSL is a bad performer or that RSX isn't traditional (which is anything but true),

There is no HLSL for PlayStation 3; on this conventional GPU you can only use Cg.

ShootMyMonkey
03-Jan-2007, 19:49
There is no HLSL for PlayStation 3; on this conventional GPU you can only use Cg.
That's not quite what I meant; rather, it's quite often possible for straight HLSL code you wrote for Xbox 360 or PC to compile as-is through the Cg compiler (assuming you avoid platform-specific features). Fundamentally, they're the same language: Cg was developed jointly with MS, who waved their magic wand and said "I rename you Microsoft HLSL". And when referring to differences between Cg and HLSL compilations on the PC, I was also referring to the equivalent nVidia hardware on the PC.

Geo
03-Jan-2007, 19:59
Who is the phantom RSX chief engineer posting on the forum, Rudolf the red-nosed reindeer?
Please... the thread was going quite well until now; stop trolling.

Please allow new members a few rounds to make their points well understood before you shred them to bits on a personal basis (their points are fair game, however, immediately). Some alternative formulations of the above --"That's interesting, I hadn't heard that. Can you quote or provide a link to a source?" or even "Hmm, news to me and I follow this stuff closely --can you provide a source for this?" Try it, it's pretty easy. Thank you.

Shezad
03-Jan-2007, 20:35
because one of the things Sony also state (and the chief RSX engineer said this many times on the dev forums) - "DON'T JUST PORT HLSL TO RSX". Reason is that the RSX is not a traditional GPU and doesn't work well with HLSL compiled assembler (horribly inefficient for the GPU).

That's interesting, I hadn't heard that, as far as I know the GPU is quite standard, can you provide more details?


The RSX has a _tuned_ Cg compiler that is supposed to be used, and just converting shaders to Cg produces a perf increase of between 10 and 50 times. Again, this isn't my words, this is what Sony and their devs have stated many times on the dev forums.


Hmm, news to me and I follow this stuff closely. So I could speed up my application from 300 to 1500 frames/sec?


The RSX is no polygon beast.. but its easily comparable to the X360's GPU. The main differences between the two are tools.

Are you telling us that the RSX is unified with EDRAM or that both require a power supply?

Shifty Geezer
04-Jan-2007, 14:31
Perhaps he's thinking of vertices not getting reprocessed as often since RSX has its rather large post-transform caches? There's not a whole lot else I could think of that sounds anything similar to what Shifty was saying.
It was in that Informer 'news' article a while back that RSX was 'broken' and had half the power of Xenos. It mentioned triangle setup at 250 Million per second versus Xenos' 500 Million. nAo was asking about some jiggery pokery that suggested RSX was being more efficient. Is that the same thing as Joker's vertex limits? Or something else? :???:
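For a sense of scale, a peak setup rate can be turned into a per-frame triangle budget. A trivial C++ sketch; the 250M and 500M triangles/second figures are simply the ones quoted from that article, not verified specs, and peak rates aren't reached in practice.

```cpp
#include <cassert>

// Per-frame triangle budget implied by a peak triangle setup rate.
// Real throughput is lower once culling, state changes and other
// bottlenecks are accounted for.
double triangles_per_frame(double setup_rate_per_second, double fps) {
    assert(fps > 0.0);
    return setup_rate_per_second / fps;
}
```

At 60 fps, 250M/s works out to roughly 4.17M triangles per frame and 500M/s to roughly 8.33M.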

nAo
04-Jan-2007, 14:43
nAo was asking about some jiggery pokery that suggested RSX was being more efficient.
Umh??

Shifty Geezer
04-Jan-2007, 15:10
Umh??
You were asking if Xenos did such-and-such, IIRC. I probably ought to dig the thread up (where by now it's probably a festering corpse that'll smell really bad if I animate it).

Panajev2001a
04-Jan-2007, 15:11
Umh??

Don't mind him... he is just hitting on you ;).

edit: you ought to be faithful...

edit: to Ken of course you cursing Italian :P.

danteye
04-Jan-2007, 17:19
Really a nice thread!! Go on guys! :D

Fafalada
05-Jan-2007, 02:52
Don't mind him... he is just hitting on you
You could say Shifty was just looking for something like BF, but nAo isn't interested in his CULLinary skills.

:twisted:

Hm. That's news to me. I haven't touched PSP in just over a year, so I hadn't paid much attention, but I'm surprised they'd take the trouble.
Admittedly it's kinda funny since PSP has no competition hw-wise, you can only hope PS3 guys would be as eager to please.
But actually I don't think it was trouble at all - since the space made accessible has a single specific function for the OS, it could technically have been made available from the start. It did help to make PSP's reserved space look a lot more reasonable too.

Darkon
05-Jan-2007, 03:11
The only thing I got from this thread is that one GPU is superior at pixel shading and the other at vertex shading, which isn't exactly new information.:sad:

Edit

How much memory was initially reserved for PSP OS ?

ShootMyMonkey
05-Jan-2007, 03:34
The only thing I got from this thread is that one GPU is superior at pixel shading and the other at vertex shading, which isn't exactly new information.:sad:
Well, you should have also gotten that they both suck overall, but that too isn't new information.

How much memory was initially reserved for PSP OS ?
8 MB, IIRC. It's rather funny to think of how 8 MB was originally supposed to be the total amount of RAM in the PSP, but Sony upped it to 32 and then took away 8 for themselves.

Admittedly it's kinda funny since PSP has no competition hw-wise, you can only hope PS3 guys would be as eager to please.
Yeah, well, even a year ago when I was wrestling with that monstrosity, I remember a lot of people on the devnet newsgroups complaining that 8 MB is just way too much.

Fafalada
05-Jan-2007, 04:10
Yeah, well, even a year ago when I was wrestling with that monstrosity, I remember a lot of people on the devnet newsgroups complaining that 8 MB is just way too much.
I thought main mem reservation was a non-issue to be honest, though I was under mistaken assumption we would be allowed to store sound data in MediaEngine eDram :cry: .

But with the current setup, compared to the past, 4MB of kernel space is reasonable enough - considering PS2 had 1MB that never did anything at all for 6 years, at least the PSP kernel "does" some things.

mckmas8808
05-Jan-2007, 07:23
But with the current setup, compared to the past, 4MB of kernel space is reasonable enough - considering PS2 had 1MB that never did anything at all for 6 years, at least the PSP kernel "does" some things.


What does it do?

Zeross
05-Jan-2007, 09:30
What does it do?

You can find some info here (http://pc.watch.impress.co.jp/docs/2005/0323/kaigai166.htm)

http://pc.watch.impress.co.jp/docs/2005/0323/kaigai_8.jpg

Shifty Geezer
05-Jan-2007, 13:23
Is the background OS expected to provide support functions then, such as file access? Does that mean some of that reserved OS space is being used for game operations that would otherwise need to be coded if it weren't present? And is that how XB360's OS operates too, providing a backbone of functions?

blakjedi
05-Jan-2007, 17:13
Question for the devs: instead of a comparison of the GPU architectures of either system...

What would have been ideal in each system? What could/should have been done additionally in hardware that would have made coding a whole lot easier and not incurred too much extra cost for the manufacturer and/or been a more reasonable tradeoff vice the feature set currently included?

pjbliverpool
05-Jan-2007, 20:02
I should also add that those were figures for one particular project. The other one has its proof-of-concept levels being enormous with really huge draw distances and very few occluded polys (lots of wide-open area), loads of full-screen effects, and so on... and we stick around 40 fps on that one -- 40 on the 360, that is. PC is lucky to get 30, and they don't have a PS3 build yet.

As a point of reference, what GPU is that using? And is it an issue of capability or optimisation?

ShootMyMonkey
05-Jan-2007, 20:14
As a point of reference, what GPU is that using? And is it an issue of capability or optimisation?
Either GeForce 7800 cards or ATI X1900s... doesn't make much difference because we're mainly CPU-limited (occasionally bandwidth-limited) on the PC for that particular project. Animation and physics sims are among the big hogs of CPU time in these gigantic levels, and no, we don't use any middleware.

pjbliverpool
05-Jan-2007, 21:46
Either GeForce 7800 cards or ATI X1900s... doesn't make much difference because we're mainly CPU-limited (occasionally bandwidth-limited) on the PC for that particular project. Animation and physics sims are among the big hogs of CPU time in these gigantic levels, and no, we don't use any middleware.

I guess the question then becomes which CPU? And if dual core, are you using both cores either partially or fully?

Also, how do you think a quad core would affect things? Is it even possible to leverage 4 x86 cores for that kind of project?

ERP
05-Jan-2007, 23:31
I guess the question then becomes which CPU? And if dual core, are you using both cores either partially or fully?

Also, how do you think a quad core would affect things? Is it even possible to leverage 4 x86 cores for that kind of project?

The reason you end up CPU limited on PC but not on console has little to do with the raw speed of the CPU, it's largely to do with the API overhead.
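ERP's point is that per-call driver/API cost, not raw CPU speed, is what eats PC frame time. A common generic mitigation (not something anyone in this thread says they do) is to sort draws by state so fewer state changes get issued. A minimal C++ sketch; `DrawItem` and the cost model (one unit per shader or texture change) are hypothetical simplifications.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record: the state a draw needs plus what it draws.
struct DrawItem {
    std::uint32_t shader_id;
    std::uint32_t texture_id;
    std::uint32_t mesh_id;
};

// Order by shader first, then texture, so draws sharing state end up adjacent.
bool state_less(const DrawItem& a, const DrawItem& b) {
    if (a.shader_id != b.shader_id) return a.shader_id < b.shader_id;
    return a.texture_id < b.texture_id;
}

// Count shader/texture binds needed to submit draws in this order.
// Each bind stands in for a costly driver/API call on the PC.
std::size_t count_state_changes(const std::vector<DrawItem>& draws) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0) {
            changes += 2;  // first draw: bind both shader and texture
            continue;
        }
        if (draws[i].shader_id != draws[i - 1].shader_id) ++changes;
        if (draws[i].texture_id != draws[i - 1].texture_id) ++changes;
    }
    return changes;
}

// Reorder draws so that draws with identical state are submitted together.
void sort_by_state(std::vector<DrawItem>& draws) {
    std::sort(draws.begin(), draws.end(), state_less);
    assert(std::is_sorted(draws.begin(), draws.end(), state_less));
}
```

Engines often pack state into a single sort key and weight each kind of change by its cost, but the principle is the same: fewer calls into the driver per frame.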

ShootMyMonkey
06-Jan-2007, 00:36
I guess the question then becomes which CPU? And if dual core, are you using both cores either partially or fully?

Also, how do you think a quad core would affect things? Is it even possible to leverage 4 x86 cores for that kind of project?
I figure it's worth echoing what ERP said prior to anything else, but to answer your questions... It again seems to be the case with any CPU. We're not really multithreading anything on the PC, and even the extent of work we have in that respect on 360 is pretty localized to things like audio, physics, and AI. So on the PC we have hell whether it's a dual-core A64 or a single-core P4. Bear in mind that neither of these projects is likely to be out until mid-2008. In a funny way, in spite of having a very incomplete codebase for the PS3, there are already enough SPE-directed tasks to give it something of an edge over the state of code on other platforms -- even though the only SPE code we have at the moment covers the really obvious candidates.

Assuming that we can ultimately get around to making more cores fly on the 360 and PS3, and we follow in kind on the PC, you can theoretically see a benefit over 4 or 6 or 8 cores, but I can't say it'll be terrific or horrific... We'll cross that bridge when we come to it.

Though there are definitely cases where raw CPU power just for our end of the code is a problem on the PC but not on the consoles, but they come down mainly to just throwing a lot of activity into a scene.
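Since the PC build isn't multithreaded yet, here is what the simplest form of data-parallel splitting looks like, purely as an illustration: independent per-element work (say, integrating particles) divided into contiguous chunks, one `std::thread` per chunk. `parallel_for_chunks` is a hypothetical helper; spawning threads on every call is expensive, so a persistent job system would be the more realistic design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Split [0, count) into contiguous chunks and run work(begin, end) for each
// chunk on its own thread, then wait for all of them. Only safe when no chunk
// reads data that another chunk writes.
void parallel_for_chunks(std::size_t count, std::size_t num_threads,
                         const std::function<void(std::size_t, std::size_t)>& work) {
    assert(num_threads > 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = (count + num_threads - 1) / num_threads;  // ceil
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, count);
        threads.emplace_back(work, begin, end);
    }
    for (std::thread& t : threads) {
        t.join();
    }
}
```

Usage would look like `parallel_for_chunks(particles.size(), 4, [&](std::size_t b, std::size_t e) { /* update particles[b..e) */ });`, where each chunk touches only its own slice.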

AlStrong
06-Jan-2007, 01:25
The reason you end up CPU limited on PC but not on console has little to do with the raw speed of the CPU, it's largely to do with the API overhead.

...

Does anything change to a significant degree with Vista, SMM?

ShootMyMonkey
06-Jan-2007, 02:08
Does anything change to a significant degree with Vista, SMM?
Can't say I've bothered to try or even look up anybody's examples. The only DX10 tests I've even seen were all running on the reference rasterizer, so it was more than a miracle if anything ran at better than 5 fps.

If I were to guess whether the API overhead will decrease enough to make a big difference, I'd have to say... I'm not holding my breath. I'm not capable of expecting things to go any direction but downhill.

pjbliverpool
06-Jan-2007, 20:23
I figure it's worth echoing what ERP said prior to anything else, but to answer your questions... It again seems to be the case with any CPU. We're not really multithreading anything on the PC, and even the extent of work we have in that respect on 360 is pretty localized to things like audio, physics, and AI. So on the PC we have hell whether it's a dual-core A64 or a single-core P4. Bear in mind that neither of these projects is likely to be out until mid-2008. In a funny way, in spite of having a very incomplete codebase for the PS3, there are already enough SPE-directed tasks to give it something of an edge over the state of code on other platforms -- even though the only SPE code we have at the moment covers the really obvious candidates.

Assuming that we can ultimately get around to making more cores fly on the 360 and PS3, and we follow in kind on the PC, you can theoretically see a benefit over 4 or 6 or 8 cores, but I can't say it'll be terrific or horrific... We'll cross that bridge when we come to it.

Though there are definitely cases where raw CPU power just for our end of the code is a problem on the PC but not on the consoles, but they come down mainly to just throwing a lot of activity into a scene.

Thanks for the responses. So the bottom line really is that regardless of how much raw power the CPU has, you can't really get to it on a PC, while on a console it's fully accessible?

ERP
06-Jan-2007, 21:25
Does anything change to a significant degree with Vista, SMM?

Theoretically yes, the Vista driver model has less overhead. However, PC drivers will still be black boxes, with the driver writers sticking God knows what workarounds in for software that does stupid stuff, so who knows how much of that potential improvement we'll actually see.

Butta
05-Feb-2007, 15:29
To understand you correctly, does that mean a ~28.1MB backbuffer (720p 4xMSAA 32bit and Z) is then resolved to ~3.5MB frontbuffer (or smaller for 24bit?) that is used as the framebuffer displayed on screen?

How much memory would a 720p 4xMSAA and Z take in terms of backbuffer and frontbuffer on Xenos? I imagine this would need to be heavily tiled?
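The arithmetic behind those figures, as a small C++ sketch. It assumes 4 bytes of colour plus 4 bytes of depth/stencil per sample and the commonly cited 10 MB (taken here as 10 MiB) of eDRAM on Xenos. It ignores any tile alignment or padding rules, so the tile count is a lower bound.

```cpp
#include <cassert>
#include <cstddef>

// Bytes for a multisampled render target: every sample stores colour
// and depth/stencil.
std::size_t msaa_target_bytes(std::size_t width, std::size_t height,
                              std::size_t samples, std::size_t color_bytes,
                              std::size_t depth_bytes) {
    assert(samples > 0);
    return width * height * samples * (color_bytes + depth_bytes);
}

// Bytes for the resolved (single-sample, colour-only) buffer that is displayed.
std::size_t resolved_bytes(std::size_t width, std::size_t height,
                           std::size_t color_bytes) {
    return width * height * color_bytes;
}

// Minimum tiles needed if the multisampled target must fit in an on-chip
// buffer of edram_bytes (ceiling division, no padding rules).
std::size_t min_tiles(std::size_t target_bytes, std::size_t edram_bytes) {
    assert(edram_bytes > 0);
    return (target_bytes + edram_bytes - 1) / edram_bytes;
}
```

For 1280x720 with 4xMSAA this gives 29,491,200 bytes (about 28.1 MiB) for the multisampled target, 3,686,400 bytes (about 3.5 MiB) for the resolved 32-bit buffer, and a minimum of 3 tiles for 10 MiB of eDRAM.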

Crossbar
05-Feb-2007, 16:26
How much memory would a 720p 4xMSAA and Z take in terms of backbuffer and frontbuffer on Xenos? I imagine this would need to be heavily tiled?

I recommend the search function, but you can get the details over here (http://www.beyond3d.com/articles/xenos/index.php?p=05). ;)