New Heavenly Sword Info (screens included)

Wait, SPUs have caches? Little snoopy caches for main memory address space?
That's actually really cool.

Yeah, remember you've got the EIB (a token ring bus) which can push data between SPUs far faster than consulting either memory pool.
 

Yeah, a snooping protocol actually makes a lot of sense for the EIB (high bandwidth for the coherence traffic, broadcasting to all SPUs).
 
It will make a huge difference.
I'm not sure it'll make a huge difference... in fact, I'm interested as to why you think it would! Even without keeping this data in the 4-entry cache, it's my understanding that full LS-to-LS DMAs stay on the EIB... they don't go via main memory.

So bearing that in mind, I'm not sure why Deano is describing a system where data goes out from LS, to main memory, and back to LS, as that simply doesn't happen in the case of LS->LS DMA.

And surely the utilisation of the SPU cache in this way pretty much requires that, in order to run at full speed, the other SPUs you're communicating with are not evicting cache contents by performing other DMAs? So your system needs to be pretty static in terms of DMA usage to reap the full benefit of what is described.

Cheers,
Dean
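To make that LS-to-LS point concrete, here's a minimal sketch of what a direct pull from another SPU's local store looks like from the SPU side, assuming the PPU has mapped the other SPE's LS into the effective-address space (e.g. with libspe2's spe_ls_area_get()) and handed the address over at startup. The MFC calls are the Cell SDK's spu_mfcio.h names as I recall them; fetch_from_other_ls, remote_ls_ea and the buffer size are made up for illustration.

```c
#include <stdint.h>
#include <spu_mfcio.h>

#define DMA_TAG 1

/* 128-byte aligned landing buffer in this SPU's local store. */
static volatile uint8_t local_buf[4096] __attribute__((aligned(128)));

/* Pull 'size' bytes straight out of another SPU's local store.
 * remote_ls_ea is that SPU's LS base address in the effective-address
 * space (looked up on the PPU and passed to this SPU). Because the
 * source EA resolves to a local store rather than XDR, the transfer
 * should stay on the EIB rather than go via main memory. */
void fetch_from_other_ls(uint64_t remote_ls_ea, uint32_t offset, uint32_t size)
{
    mfc_get((void *)local_buf, remote_ls_ea + offset, size, DMA_TAG, 0, 0);

    /* Block until this tag group's DMA has completed. */
    mfc_write_tag_mask(1 << DMA_TAG);
    mfc_read_tag_status_all();
}
```

In other words, the routing is decided by where the effective address lands, which is why an LS-to-LS transfer never needs to bounce through main memory.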
 

To my naive mind, LS->LS DMA still requires more programmer synchronization between SPUs than the ACU in some cases, so it seems more a matter of convenience than speed... though if, as you say, DMAs to/from memory can evict lines in the cache, then that is a bit of a bummer. I assumed that the ACU locked the atomic lines in while other DMAs went directly between LS and main memory, though I guess I had no reason to think this.
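To put the "more programmer synchronization" point in concrete terms, here's a rough sketch of the producer side of an LS-to-LS push: the payload is DMA'd into the consumer's local store, then a ready flag is written with a fenced put so it can only land after the payload. The MFC calls are from the SDK's spu_mfcio.h as I recall them; push_to_consumer, the addresses and the buffer sizes are invented for the example, not anything from the game.

```c
#include <stdint.h>
#include <spu_mfcio.h>

#define DMA_TAG 2

static volatile uint8_t  payload[128]   __attribute__((aligned(128)));
static volatile uint32_t ready_flag     __attribute__((aligned(16))) = 1;

/* Push the payload into the consumer SPU's local store, then raise a
 * flag there. The fenced put (mfc_putf) is ordered behind the earlier
 * put in the same tag group, so the flag can only appear after the
 * payload has landed. consumer_buf_ea / consumer_flag_ea are addresses
 * inside the consumer's memory-mapped local store, supplied by the PPU;
 * the flag EA must share ready_flag's 16-byte offset (here zero). */
void push_to_consumer(uint64_t consumer_buf_ea, uint64_t consumer_flag_ea)
{
    mfc_put((void *)payload, consumer_buf_ea, sizeof(payload), DMA_TAG, 0, 0);
    mfc_putf((void *)&ready_flag, consumer_flag_ea, sizeof(ready_flag),
             DMA_TAG, 0, 0);

    /* Wait until both transfers have left this SPU before reusing them. */
    mfc_write_tag_mask(1 << DMA_TAG);
    mfc_read_tag_status_all();
}
```

The consumer still has to poll (or otherwise wait on) that flag in its own LS, which is exactly the hand-rolled synchronization an atomic update through the ACU saves you from writing.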
 
I assumed that the ACU locked the atomic lines in while other DMAs went directly between LS and main memory, though I guess I had no reason to think this.
Hmm... I thought that the ACU shares some bits with the DMA subsystem, but hey... irrespective of this, if other SPUs are doing things (unrelated to the stats update), then it would be possible for entries to become evicted.

Probably wouldn't affect things too much though, to be honest...

Dean
 
DeanA said:
I'm not sure it'll make a huge difference... in fact, I'm interested as to why you think it would!

Cos it's very hard to do LS->LS DMA in real-world usage (you need a static memory layout and synchronised tasks). In practice you do an LS->EA on one SPU and an EA->LS on another. If you're lucky this occurs at the same time so it gets short-cut, else it goes back into the main cache/memory system. Tho atomic put/get is higher priority, so it should be faster for 128 bytes than an LS->LS DMA anyway...

The ACU cache gives you a place to leave the data effectively on the ring bus for a while without knowing any details of the destination. It's partly LRU and AFAICT doesn't get evicted via a normal DMA get, tho a put does. It's also a high-speed ring bus op, faster than normal ring bus movement. So it should always be better than or the same as a normal get.

It's not perfect, but it does appear to be better than the alternatives 'most' of the time. Which is true of all caches really.
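For anyone curious what driving the ACU looks like in practice, here's a hedged sketch of the usual lock-line reservation loop for bumping a shared counter: one 128-byte line is reserved with getllar, modified in LS, and conditionally written back with putllc, retrying if anyone else touched the line in between. The macro and channel names (mfc_getllar, mfc_putllc, MFC_RdAtomicStat, MFC_PUTLLC_STATUS) are from the Cell SDK's spu_mfcio.h as I remember them; the counter layout is made up for the example.

```c
#include <stdint.h>
#include <spu_intrinsics.h>
#include <spu_mfcio.h>

/* LS shadow of the shared 128-byte line the atomic unit operates on. */
static volatile uint8_t line[128] __attribute__((aligned(128)));

/* Atomically bump a 32-bit counter at counter_ea (somewhere inside a
 * shared, 128-byte-aligned line of main memory). Returns the old value. */
uint32_t atomic_increment(uint64_t counter_ea)
{
    uint64_t line_ea = counter_ea & ~(uint64_t)127;   /* containing line */
    uint32_t offset  = (uint32_t)(counter_ea & 127);
    uint32_t old;

    do {
        /* Load the line and set a reservation on it (this is what the
         * atomic unit's little cache holds on to). */
        mfc_getllar((void *)line, line_ea, 0, 0);
        spu_readch(MFC_RdAtomicStat);                 /* drain getllar status */

        old = *(volatile uint32_t *)(line + offset);
        *(volatile uint32_t *)(line + offset) = old + 1;

        /* Conditional store: fails (and we retry) if another SPU or the
         * PPU wrote the line while we held the reservation. */
        mfc_putllc((void *)line, line_ea, 0, 0);
    } while (spu_readch(MFC_RdAtomicStat) & MFC_PUTLLC_STATUS);

    return old;
}
```

The retry loop is the price of lock-free sharing: if another SPU wins the race, the putllc simply fails and you reload the line and try again.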
 

LOL. I vaguely remember your reply but I didn't quite grasp it last time.

DeanA said:
I'm not sure it'll make a huge difference... in fact, I'm interested as to why you think it would! Even without keeping this data in the 4-entry cache, it's my understanding that full LS-to-LS DMAs stay on the EIB... they don't go via main memory.

So bearing that in mind, I'm not sure why Deano is describing a system where data goes out from LS, to main memory, and back to LS, as that simply doesn't happen in the case of LS->LS DMA.

Ok cool... I have confirmation about efficient LS<->LS DMA (without PPE or other external subsystem involvement). If so, the gain from the cache would be relatively smaller.

What is the time saved between an atomic cache write/read (cache hit) versus an LS atomic store/read (cache miss) for multiple SPUs? (There's a rough measurement sketch after this post.)

And surely the utilisation of the SPU cache in this way pretty much requires that, in order to run at full speed, the other SPUs you're communicating with are not evicting cache contents by performing other DMAs? So your system needs to be pretty static in terms of DMA usage to reap the full benefit of what is described.

Yes, it seems so. The algorithm in question should be pretty regular/predictable (some globally shared data structure needs to be consulted/updated "every time"). In DeanoC's case it looks to be the death/alive counter.

EDIT: Ah! DeanoC replied with more juicy details. :D
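On the "time saved" question: I don't think there are published numbers, but it's easy to measure on hardware with the SPU decrementer. A rough sketch, assuming the SDK's spu_write_decrementer()/spu_read_decrementer() helpers (as I recall them) and the hypothetical atomic_increment() from the earlier sketch:

```c
#include <stdint.h>
#include <spu_mfcio.h>

/* Hypothetical helper from the earlier sketch. */
extern uint32_t atomic_increment(uint64_t counter_ea);

/* Time a single atomic update in decrementer ticks. The decrementer
 * counts down at the timebase frequency (reportedly ~80 MHz). */
uint32_t time_atomic_increment(uint64_t counter_ea)
{
    uint32_t before, after;

    spu_write_decrementer(0xFFFFFFFFu);
    before = spu_read_decrementer();

    atomic_increment(counter_ea);

    after = spu_read_decrementer();
    return before - after;   /* elapsed ticks */
}
```

Run it once with all SPUs hammering the same line (so the ACU should keep hitting) and once after forcing evictions with unrelated puts, then compare the tick counts; convert ticks to nanoseconds via the timebase frequency.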
 
This all sounds very interesting :p

This appears to be the article that the other blog was referencing. http://www.gameswank.com/content/view/48/

"So hats off to him for taking the time to get so much out of the PS3 and being a pioneer in RSX and Cell programming. Certainly, don't infer that the game he's working on is going to be a failure, or that he doesn't put effort into his work just because he uses Microsoft applications. If anything that shows a dedicated and passionate programmer committed to his work. Keep up the good work Ninja Theory."

Haha suck up.
 
Is it just me, or does "Atomic Cache Units" sound too... dangerous to be put into a chip? Doomsday device, surely...
More like they're using 1950s technology. Atomic caches, and those discs the games come on appear to be some kind of solidified electricity.
 
I can't believe no one has posted this yet, but here DeanoC revealed information about a demo of HS coming to the PSN (I can't wait) and some tidbits on loading in the game. There wasn't a date for the demo though, so Deano, if you're reading this... hint, hint.
 

They just scavenged those bits from comments in Deano's blog again.
 
New screenshots:

[six screenshots attached]


Can't wait for this game... :Q______
 
Looks great but they really light up the characters and the water while making the mountains dark.

Almost like there's a spotlight on the characters and the water and it's dusk everywhere else.

Shouldn't the HDR show more gradations between dark and bright?
 