New Heavenly Sword Info (screens included)

Wait, SPUs have caches? Little snoopy caches for main memory address space?
That's actually really cool.

Yeah, remember you've got the EIB (a token ring bus) which can push data between SPUs far faster than consulting either memory pool.
 

Yeah, a snooping protocol actually makes a lot of sense for the EIB (high bandwidth for the coherence traffic, broadcasting to all SPUs).
 
It will make a huge difference.
I'm not sure it'll make a huge difference... in fact, I'm interested as to why you think it would! Even without keeping this data in the 4-entry cache, it's my understanding that full LS-to-LS DMAs stay on the EIB... they don't go via main memory.

So bearing that in mind, I'm not sure why Deano is describing a system where data goes out from LS, to main memory, and back to LS, as that simply doesn't happen in the case of LS->LS DMA.

And surely the utilisation of the SPU cache in this way pretty much requires that, in order to run at full speed, the other SPUs you're communicating with are not evicting cache contents by performing other DMAs? So your system needs to be pretty static in terms of DMA usage to reap the full benefit of what is described.

Cheers,
Dean
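To make that LS-to-LS point concrete, here's a minimal sketch of what a direct pull from another SPU's local store looks like from the SPU side, assuming the PPU has mapped the other SPE's LS into the effective-address space (e.g. with libspe2's spe_ls_area_get()) and handed the address over at startup. The MFC calls are the Cell SDK's spu_mfcio.h names as I recall them; fetch_from_other_ls, remote_ls_ea and the buffer size are made up for illustration.

```c
#include <stdint.h>
#include <spu_mfcio.h>

#define DMA_TAG 1

/* 128-byte aligned landing buffer in this SPU's local store. */
static volatile uint8_t local_buf[4096] __attribute__((aligned(128)));

/* Pull 'size' bytes straight out of another SPU's local store.
 * remote_ls_ea is that SPU's LS base address in the effective-address
 * space (looked up on the PPU and passed to this SPU). Because the
 * source EA resolves to a local store rather than XDR, the transfer
 * should stay on the EIB rather than go via main memory. */
void fetch_from_other_ls(uint64_t remote_ls_ea, uint32_t offset, uint32_t size)
{
    mfc_get((void *)local_buf, remote_ls_ea + offset, size, DMA_TAG, 0, 0);

    /* Block until this tag group's DMA has completed. */
    mfc_write_tag_mask(1 << DMA_TAG);
    mfc_read_tag_status_all();
}
```

In other words, the routing is decided by where the effective address lands, which is why an LS-to-LS transfer never needs to bounce through main memory.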
 

To my naive mind, LS->LS DMA still requires more programmer synchronization between SPUs than the ACU in some cases, so it seems more a matter of convenience than speed... though if, as you say, DMAs to/from memory can evict lines in the cache, then that is a bit of a bummer. I assumed that the ACU locked the atomic lines in while other DMAs went directly between LS and main memory, though I guess I had no reason to think this.
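To put the "more programmer synchronization" point in concrete terms, here's a rough sketch of the producer side of an LS-to-LS push: the payload is DMA'd into the consumer's local store, then a ready flag is written with a fenced put so it can only land after the payload. The MFC calls are from the SDK's spu_mfcio.h as I recall them; push_to_consumer, the addresses and the buffer sizes are invented for the example, not anything from the game.

```c
#include <stdint.h>
#include <spu_mfcio.h>

#define DMA_TAG 2

static volatile uint8_t  payload[128]   __attribute__((aligned(128)));
static volatile uint32_t ready_flag     __attribute__((aligned(16))) = 1;

/* Push the payload into the consumer SPU's local store, then raise a
 * flag there. The fenced put (mfc_putf) is ordered behind the earlier
 * put in the same tag group, so the flag can only appear after the
 * payload has landed. consumer_buf_ea / consumer_flag_ea are addresses
 * inside the consumer's memory-mapped local store, supplied by the PPU;
 * the flag EA must share ready_flag's 16-byte offset (here zero). */
void push_to_consumer(uint64_t consumer_buf_ea, uint64_t consumer_flag_ea)
{
    mfc_put((void *)payload, consumer_buf_ea, sizeof(payload), DMA_TAG, 0, 0);
    mfc_putf((void *)&ready_flag, consumer_flag_ea, sizeof(ready_flag),
             DMA_TAG, 0, 0);

    /* Wait until both transfers have left this SPU before reusing them. */
    mfc_write_tag_mask(1 << DMA_TAG);
    mfc_read_tag_status_all();
}
```

The consumer still has to poll (or otherwise wait on) that flag in its own LS, which is exactly the hand-rolled synchronization an atomic update through the ACU saves you from writing.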
 
I assumed that the ACU locked the atomic lines in while other DMAs went directly between LS and main memory, though I guess I had no reason to think this.
Hmm... I thought that the ACU shares some bits with the DMA subsystem, but hey... irrespective of this, if other SPUs are doing things (unrelated to the stats update), then it would be possible for entries to become evicted.

Probably wouldn't affect things too much though, to be honest...

Dean
 
DeanA said:
I'm not sure it'll make a huge difference... in fact, I'm interested as to why you think it would!

Cos it's very hard to do LS->LS DMA in real-world usage (you need a static memory layout and synchronised tasks). In practice you do an LS->EA on one SPU and an EA->LS on another. If you're lucky this occurs at the same time so it gets short-cut, else it goes back into the main cache/memory system. Tho atomic put/get is higher priority, so it should be faster for 128 bytes than an LS->LS DMA anyway...

The ACU cache gives you a place to leave the data effectively on the ring bus for a while without knowing any details of the destination. It's partly LRU and AFAICT doesn't get evicted via a normal DMA get, tho a put does. It's also a high-speed ring bus op, faster than normal ring bus movement. So it should always be better than or the same as a normal get.

It's not perfect, but it does appear to be better than the alternatives 'most' of the time. Which is true of all caches really.
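For anyone curious what driving the ACU looks like in practice, here's a hedged sketch of the usual lock-line reservation loop for bumping a shared counter: one 128-byte line is reserved with getllar, modified in LS, and conditionally written back with putllc, retrying if anyone else touched the line in between. The macro and channel names (mfc_getllar, mfc_putllc, MFC_RdAtomicStat, MFC_PUTLLC_STATUS) are from the Cell SDK's spu_mfcio.h as I remember them; the counter layout is made up for the example.

```c
#include <stdint.h>
#include <spu_intrinsics.h>
#include <spu_mfcio.h>

/* LS shadow of the shared 128-byte line the atomic unit operates on. */
static volatile uint8_t line[128] __attribute__((aligned(128)));

/* Atomically bump a 32-bit counter at counter_ea (somewhere inside a
 * shared, 128-byte-aligned line of main memory). Returns the old value. */
uint32_t atomic_increment(uint64_t counter_ea)
{
    uint64_t line_ea = counter_ea & ~(uint64_t)127;   /* containing line */
    uint32_t offset  = (uint32_t)(counter_ea & 127);
    uint32_t old;

    do {
        /* Load the line and set a reservation on it (this is what the
         * atomic unit's little cache holds on to). */
        mfc_getllar((void *)line, line_ea, 0, 0);
        spu_readch(MFC_RdAtomicStat);                 /* drain getllar status */

        old = *(volatile uint32_t *)(line + offset);
        *(volatile uint32_t *)(line + offset) = old + 1;

        /* Conditional store: fails (and we retry) if another SPU or the
         * PPU wrote the line while we held the reservation. */
        mfc_putllc((void *)line, line_ea, 0, 0);
    } while (spu_readch(MFC_RdAtomicStat) & MFC_PUTLLC_STATUS);

    return old;
}
```

The retry loop is the price of lock-free sharing: if another SPU wins the race, the putllc simply fails and you reload the line and try again.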
 

LOL. I vaguely remember your reply but I didn't quite grasp it last time.

DeanA said:
I'm not sure it'll make a huge difference... in fact, I'm interested as to why you think it would! Even without keeping this data in the 4-entry cache, it's my understanding that full LS-to-LS DMAs stay on the EIB... they don't go via main memory.

So bearing that in mind, I'm not sure why Deano is describing a system where data goes out from LS, to main memory, and back to LS, as that simply doesn't happen in the case of LS->LS DMA.

Ok cool... I have confirmation about efficient LS<->LS DMA (without PPE or other external subsystem involvement). If so, the gain from the cache would be relatively smaller.

What is the time saved between an atomic cache write/read (cache hit) versus an LS atomic store/read (cache miss) for multiple SPUs? (There's a rough measurement sketch after this post.)

And surely the utilisation of the SPU cache in this way pretty much requires that, in order to run at full speed, the other SPUs you're communicating with are not evicting cache contents by performing other DMAs? So your system needs to be pretty static in terms of DMA usage to reap the full benefit of what is described.

Yes, it seems so. The algorithm in question should be pretty regular/predictable (some globally shared data structure needs to be consulted/updated "every time"). In DeanoC's case it looks to be the death/alive counter.

EDIT: Ah! DeanoC replied with more juicy details. :D
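On the "time saved" question: I don't think there are published numbers, but it's easy to measure on hardware with the SPU decrementer. A rough sketch, assuming the SDK's spu_write_decrementer()/spu_read_decrementer() helpers (as I recall them) and the hypothetical atomic_increment() from the earlier sketch:

```c
#include <stdint.h>
#include <spu_mfcio.h>

/* Hypothetical helper from the earlier sketch. */
extern uint32_t atomic_increment(uint64_t counter_ea);

/* Time a single atomic update in decrementer ticks. The decrementer
 * counts down at the timebase frequency (reportedly ~80 MHz). */
uint32_t time_atomic_increment(uint64_t counter_ea)
{
    uint32_t before, after;

    spu_write_decrementer(0xFFFFFFFFu);
    before = spu_read_decrementer();

    atomic_increment(counter_ea);

    after = spu_read_decrementer();
    return before - after;   /* elapsed ticks */
}
```

Run it once with all SPUs hammering the same line (so the ACU should keep hitting) and once after forcing evictions with unrelated puts, then compare the tick counts; convert ticks to nanoseconds via the timebase frequency.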
 
This all sounds very interesting :p

This appears to be the article that the other blog was referencing. http://www.gameswank.com/content/view/48/

"So hats off to him for taking the time to get so much out of the PS3 and being a pioneer in RSX and Cell programming. Certainly, don't infer that the game he's working on is going to be a failure, or that he doesn't put effort into his work just because he uses Microsoft applications. If anything that shows a dedicated and passionate programmer committed to his work. Keep up the good work Ninja Theory."

Haha suck up.
 
Is it just me, or does "Atomic Cache Units" sound too... dangerous to be put into a chip? Doomsday device, surely...
More like they're using 1950s technology. Atomic caches, and those discs the games come on appear to be some kind of solidified electricity.
 
I can't believe no one has posted this yet, but here DeanoC revealed information about a demo of HS coming to the PSN (I can't wait) and some tidbits on loading in the game. There wasn't a date for the demo though, so Deano, if you're reading this... hint, hint.
 

They just scavenged those bits from comments in Deano's blog again.
 
New screenshots:

[six screenshots attached]


Can't wait for this game... :Q______
 
Looks great but they really light up the characters and the water while making the mountains dark.

Almost like there's a spotlight on the characters and the water and it's dusk everywhere else.

Shouldn't the HDR show more gradations between dark and bright?
 