CELL Patents (J Kahle): APU, PU, DMAC, Cache interactions?

j^aws

[Image: APU-PU-Cache-lock.jpg]


SUMMARY OF THE INVENTION

[0008] The present invention provides a system and method for directly accessing a cache for data. A data transfer request is sent to a system bus for transferring data to a system memory. The data transfer request is snooped. A snoop request is sent to a cache. It is determined whether the snoop request has a valid entry in the cache. Upon determining that the snoop request has a valid entry in the cache, the data is caught and sent to the cache for update.
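To make the snoop-and-update flow in that summary concrete, here is a toy Python model. It is only a sketch under stated assumptions: the class names (`SnoopingCache`, `SystemBus`) are hypothetical, the write is assumed to still reach system memory, and a "valid entry" is modelled as a dictionary hit with a valid flag.

```python
from dataclasses import dataclass, field

@dataclass
class CacheLine:
    data: bytes
    valid: bool = True

@dataclass
class SnoopingCache:
    """Hypothetical cache that watches (snoops) writes on the system bus."""
    lines: dict = field(default_factory=dict)  # address -> CacheLine

    def snoop_write(self, addr: int, data: bytes) -> bool:
        """Return True if the snooped write hit a valid entry and was captured."""
        line = self.lines.get(addr)
        if line is not None and line.valid:
            line.data = data  # catch the data in flight and update the cache
            return True
        return False          # no valid entry: the cache ignores the transfer

@dataclass
class SystemBus:
    memory: dict = field(default_factory=dict)
    snoopers: list = field(default_factory=list)

    def transfer(self, addr: int, data: bytes) -> None:
        """A data transfer request (e.g. from a DMAC) bound for system memory."""
        self.memory[addr] = data          # assumption: memory is still written
        for cache in self.snoopers:       # every attached cache sees the request
            cache.snoop_write(addr, data)
```

For example, a cache that already holds address `0x100` picks up the new data directly, while a transfer to an address it does not hold leaves the cache untouched.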

[Image: APU-PU-DATA.jpg]


SUMMARY OF THE INVENTION

[0008] The present invention provides a system and method for improving performance of a computer system by providing a direct data transfer between different processors. The system includes a first and second processor. The first processor is in need of data. The system also includes a directory in communication with the first processor. The directory receives a data request for the data and contains information as to where the data is stored. A cache is coupled to the second processor. An internal bus is coupled between the first processor and the cache to transfer the data from the cache to the first processor when the data is found to be stored in the cache.
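The directory-based variant can be sketched the same way. This is an illustrative model, not the patent's design: `Directory` and `System` are hypothetical names, the directory is assumed to record a single owning cache per address, and the "internal bus" path is just a direct cache-to-cache return.

```python
from dataclasses import dataclass, field

@dataclass
class Directory:
    """Hypothetical directory: address -> id of the cache currently holding it."""
    owner: dict = field(default_factory=dict)

@dataclass
class System:
    memory: dict          # address -> data in system memory
    caches: dict          # cache id -> {address: data}
    directory: Directory

    def read(self, requester: str, addr: int):
        """Return (data, source), where source names the path that served it."""
        holder = self.directory.owner.get(addr)
        if holder is not None and holder != requester and addr in self.caches[holder]:
            # Data lives in another processor's cache: move it over the internal bus.
            return self.caches[holder][addr], f"internal bus from {holder}"
        # Otherwise fall back to system memory.
        return self.memory[addr], "system memory"
```

The point of the directory is that the requesting processor asks one place where the data is, instead of broadcasting to every cache.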

IBM Source: Updating remote locked cache

IBM Source: On-chip data transfer in multi-processor system

I'm pretty sure these are Cell related patents from IBM, James Kahle. They describe interactions between a processor with local memory (APU), with a DMAC, a processor with cache (PU) and system memory.

They show that the APU local memory can directly access the PU cache. PUs can also have L1, L2 and L3 caches: the L2 is shared with APU local memory, and the L3 is shared with other PUs' L3. 8) ...bye bye latencies :?:
 
Edit: I take it back...

I already saw that patent, but I need to re-read it.

This might be a mess of virtual-to-physical address translation, as DMACs usually talk in terms of physical addresses. That said, I do not see virtual memory being used in PlayStation 3 games, and even if it is, there are ways around it that the people writing the CELL OS and the basic CELL libraries can take: as long as they can guarantee that an X MB chunk allocated with malloc/new is physically contiguous, there should be no problem. Stitching DMA packets together might be another challenge to add to the table, but even that problem could be solved.
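Here is a rough sketch of what "stitching DMA packets" involves when a virtually contiguous buffer is not physically contiguous: split it into physically contiguous runs, one DMA packet each. The 4 KB page size and the `page_table` dict (virtual page number to physical page number) are assumptions for illustration only.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration

def dma_segments(vaddr, length, page_table, page_size=PAGE_SIZE):
    """Split a virtual range into (phys_addr, length) runs that are physically contiguous.

    page_table maps virtual page number -> physical page number (hypothetical).
    Neighbouring pieces that happen to be physically adjacent are merged, so a
    buffer that really is physically contiguous needs only one DMA packet.
    """
    segments = []
    addr, end = vaddr, vaddr + length
    while addr < end:
        vpn, offset = divmod(addr, page_size)
        chunk = min(page_size - offset, end - addr)   # stay within this page
        paddr = page_table[vpn] * page_size + offset
        if segments and segments[-1][0] + segments[-1][1] == paddr:
            start, size = segments[-1]
            segments[-1] = (start, size + chunk)      # extend the previous packet
        else:
            segments.append((paddr, chunk))           # start a new packet
        addr += chunk
    return segments
```

With pages mapped in order, a multi-page buffer collapses to a single segment; with scattered pages, one packet per physical run has to be chained together.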
 
OMG, Xenon violates a PS3 patent because its GPU can read from CPU cache!!!!!!!!
Ok joking aside, at least this shows they are thinking about how memory goes around efficiently. Whether they are thinking about it enough... well... we'll see..

Panajev said:
This might be a mess of virtual-to-physical address translation, as DMACs usually talk in terms of physical addresses. That said, I do not see virtual memory being used in PlayStation 3 games
Tsk tsk, you need to brush up on your cell patents Pana. ;)
Your question was answered quite some time ago: virtual addresses are translated BY the DMA controller, which is supposed to house a TLB (or use the host CPU's one, I forget which).
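A minimal sketch of a DMA controller that owns its own TLB, so DMA commands can carry virtual addresses. Everything here is assumed for illustration: the class names, a 4 KB page, a tiny 4-entry TLB with FIFO replacement, and a plain dict standing in for the page table.

```python
class PageFault(Exception):
    """Raised when a virtual page has no mapping at all."""

class DMAController:
    """Hypothetical DMAC with a small private TLB in front of the page table."""

    def __init__(self, page_table, page_size=4096, tlb_entries=4):
        self.page_table = page_table    # backing translation: vpn -> ppn
        self.page_size = page_size
        self.tlb_entries = tlb_entries
        self.tlb = {}                   # cached translations, insertion-ordered
        self.misses = 0

    def translate(self, vaddr):
        """Translate a virtual address to a physical one for a DMA command."""
        vpn, offset = divmod(vaddr, self.page_size)
        if vpn not in self.tlb:
            self.misses += 1            # TLB miss: walk the page table
            if vpn not in self.page_table:
                raise PageFault(hex(vaddr))
            if len(self.tlb) >= self.tlb_entries:
                self.tlb.pop(next(iter(self.tlb)))  # evict the oldest entry (FIFO)
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn] * self.page_size + offset
```

Repeated accesses to the same page hit in the TLB, so only the first DMA to a page pays for the page-table lookup.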
 
Fafalada said:
OMG, Xenon violates a PS3 patent because its GPU can read from CPU cache!!!!!!!!
Ok joking aside, at least this shows they are thinking about how memory goes around efficiently. Whether they are thinking about it enough... well... we'll see..

Panajev said:
This might be a mess of virtual-to-physical address translation, as DMACs usually talk in terms of physical addresses. That said, I do not see virtual memory being used in PlayStation 3 games
Tsk tsk, you need to brush up on your cell patents Pana. ;)
Your question was answered quite some time ago: virtual addresses are translated BY the DMA controller, which is supposed to house a TLB (or use the host CPU's one, I forget which).

You are right, sorry: those pages of my memory were in the swap file at the time of that post.

Actually, I answered this very question for you either here or on GA, when we had a nice discussion about whether APUs could really DMA or not :p.

It figures that I am the one that misses that detail now... :(.

I am more PlayStation 2 oriented now and its DMAC hates Virtual addresses :lol.

BTW, I finally got a single DMA call to display more than one object in the scene, each using its own matrix (no hard-coded limit of 2-3 objects). It uses two layers of CALL tags (basically one master CALL tag per object, which calls the first tag of a CALL chain that in turn calls all the sub-DMA chains that upload the needed data and render the object) and double buffering of inputs and outputs on VU1.
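A simplified model of how two layers of CALL tags get walked in a PS2-style source chain. Assumptions: memory is a Python list where the index is the address, a tag is a hypothetical `(id, qwc, addr)` tuple, only `cnt`/`call`/`ret`/`end` are modelled, and the two-entry return stack stands in for the DMAC's address stack registers (ASR0/ASR1, as far as I know).

```python
def walk_chain(mem, start, max_depth=2):
    """Return the data quadwords in the order a simplified chain walk would send them."""
    out, stack, pc = [], [], start
    while True:
        tag_id, qwc, addr = mem[pc]
        payload = pc + 1
        out.extend(mem[payload:payload + qwc])  # qwc data words follow the tag
        after = payload + qwc                   # address just past the payload
        if tag_id == "cnt":
            pc = after                          # next tag follows the data
        elif tag_id == "call":
            if len(stack) >= max_depth:
                raise RuntimeError("CALL nested deeper than the modelled DMAC allows")
            stack.append(after)                 # remember where to come back to
            pc = addr
        elif tag_id == "ret":
            if not stack:
                return out                      # RET with an empty stack ends the chain
            pc = stack.pop()
        elif tag_id == "end":
            return out
        else:
            raise ValueError(f"unmodelled tag {tag_id!r}")

# Master CALL -> per-object CALL chain -> sub-chain uploading vertex data and a GIFtag.
mem = [("call", 0, 3), ("end", 0, 0), None,
       ("call", 0, 6), ("ret", 0, 0), None,
       ("cnt", 2, 0), "v0", "v1", ("ret", 1, 0), "gif"]
```

Walking `mem` from address 0 sends `v0`, `v1`, `gif` and then unwinds both CALL levels back to the master chain's END.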

I have BASE = 0 and OFFSET = 512 (in quadwords), and my VU packets are less than 4 KB in size.

In each 8 KB buffer I have the input data and then an area for the data to be output to the GS (UV/ST coordinates, RGBAQ data, transformed vertices, GIFTag, etc...).
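The arithmetic behind those numbers, as a small sketch. It assumes VU1's 16 KB data memory, 16-byte quadword addressing, and VIF1 double buffering where the transfer target alternates between BASE and BASE + OFFSET; the helper names are hypothetical.

```python
QWORD_BYTES = 16               # VU data memory is addressed in 128-bit quadwords
VU1_DATA_MEM_BYTES = 16 * 1024

def double_buffer_layout(base_qw, offset_qw):
    """Return the byte ranges [start, end) of the two double buffers.

    Assumes the target alternates between BASE and BASE + OFFSET and that each
    buffer may use at most OFFSET quadwords, so the two never overlap.
    """
    layout = []
    for dbf in (0, 1):                        # double-buffer flag toggles per kick
        start = (base_qw + dbf * offset_qw) * QWORD_BYTES
        end = start + offset_qw * QWORD_BYTES
        if end > VU1_DATA_MEM_BYTES:
            raise ValueError("buffer runs past the end of VU1 data memory")
        layout.append((start, end))
    return layout

def packet_fits(packet_bytes, offset_qw):
    """True if a packet (input data plus output area) fits in one buffer."""
    return packet_bytes <= offset_qw * QWORD_BYTES
```

With BASE = 0 and OFFSET = 512 this gives two 8 KB halves, `(0, 8192)` and `(8192, 16384)`, which exactly fill VU1 data memory.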

It took a while to get things working :(.
 
Re: CELL Patents (J Kahle): APU, PU, DMAC, Cache interaction

Jaws said:
I'm pretty sure these are Cell related patents from IBM, James Kahle. They describe interactions between a processor with local memory (APU), with a DMAC, a processor with cache (PU) and system memory.

They show that the APU local memory can directly access the PU cache. PUs can also have L1, L2 and L3 caches: the L2 is shared with APU local memory, and the L3 is shared with other PUs' L3. 8) ...bye bye latencies :?:

The fact that these are IBM patents is fairly important: apart from the DMAC, this describes 'another' system pretty well too.

BTW Latencies are still stupidly high even with lots of cache.
 
Those two patents are 'old'; I believe someone already posted about them here.
This is new:
Streaming data using locking cache

A system and method are provided for efficiently processing data with a cache in a computer system. The computer system has a processor, a cache and a system memory. The processor issues a data request for streaming data. The streaming data has one or more small data portions. The system memory is in communication with the processor. The system memory has a specific area for storing the streaming data. The cache is coupled to the processor. The cache has a predefined area locked for the streaming data. A cache controller is coupled to the cache and is in communication with both the processor and the system memory to transmit at least one small data portion of the streaming data from the specific area of the system memory to the predefined area of the cache when the one small data portion is not found in the predefined area of the cache.
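A toy version of the locked streaming area described in that abstract. The placement policy (portion index modulo the size of the locked region) and the class name are assumptions, not taken from the patent. The useful property is that streaming data only ever lands in the locked slots, so it cannot evict the rest of the cache.

```python
class LockingCache:
    """Hypothetical cache region locked for streaming data."""

    def __init__(self, stream_area, locked_lines):
        self.stream_area = stream_area        # system memory's dedicated area: list of small portions
        self.locked = [None] * locked_lines   # each slot holds (portion index, data) or None
        self.fills = 0                        # portions copied in from system memory

    def read_portion(self, i):
        """Return portion i, filling it into the locked area on a miss."""
        slot = i % len(self.locked)           # assumed placement policy
        entry = self.locked[slot]
        if entry is None or entry[0] != i:
            self.fills += 1                   # miss: cache controller fetches from memory
            entry = (i, self.stream_area[i])
            self.locked[slot] = entry
        return entry[1]
```

Re-reading a portion that is still resident costs nothing; a portion that maps onto an occupied slot replaces only the streaming data in that slot.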
 
What is what in those diagrams?

In the first one, is 202 supposed to be a PU and 204 an APU? In that case, what's 110? Also, I have no memory of a bus controller sitting in between each PU and the main system bus in previous Cell diagrams. The second diagram adds to the mess; that architecture doesn't seem to correspond with the one depicted in the first diagram. Actually, none of them look particularly Cell-like to me.

Previous Cell descriptions have had the PU core first and then a row of APUs hanging off of it, but here bits of that chain seem randomly omitted. How can a patent apply if it doesn't describe an actual implementation? Again, doesn't look like Cell to me.

This might actually be Xenon CPU core methinks. :LOL:
 
one said:
Kahle working in 2 teams at the same time? :rolleyes:

Are you sure he's actually working in either team, and doesn't just serve as some kind of manager/oversight dude? He could just be the one signing the patent application docs, you know; that doesn't mean he's the one actually working on the design. IBM's a huge corporation with many things going on at the same time, so I don't see anything strange in a guy having a finger in multiple projects.

Miyamoto does the same for Nintendo, so why not here? :D
 
Guden Oden said:
one said:
Kahle working in 2 teams at the same time? :rolleyes:

Are you sure he's actually working in either team, and doesn't just serve as some kind of manager/oversight dude? He could just be the one signing the patent application docs, you know; that doesn't mean he's the one actually working on the design. IBM's a huge corporation with many things going on at the same time, so I don't see anything strange in a guy having a finger in multiple projects.

Wow, while there are plenty of engineers at IBM, you're saying there is a shortage of capable managers at IBM the mega-corporation? :LOL:

Guden Oden said:
Miyamoto does the same for Nintendo, so why not here? :D

When is Nintendo doing an outsourced job? :rolleyes:

Rather, I'm curious about ATI, which bought the ex-SGI ArtX people and is now making the Xbox GPU and probably the Nintendo one.
 
one said:
Wow, while there are plenty of engineers at IBM, you're saying there is a shortage of capable managers at IBM the mega-corporation? :LOL:

Why would I say that? Sheesh... :rolleyes::LOL:

When is Nintendo doing an outsourced job? :rolleyes:

Well, they outsourced Metroid Prime... :LOL: Anyway, what does outsourcing have to do with anything in this thread?
 
Guden Oden said:
When is Nintendo doing an outsourced job? :rolleyes:

Well, they outsourced Metroid Prime... :LOL: Anyway, what does outsourcing have to do with anything in this thread?

Nah, I mean whether Nintendo does contracted development for other publishers or not. Why did you bring software projects into this in the first place?

BTW, skimming through patents I found this new patent application, 'Game system with graphics processor', which describes PS2 :LOL:
 
Re: CELL Patents (J Kahle): APU, PU, DMAC, Cache interaction

Jaws said:

Are these two patents related in any way?

The former patent details a system that uses bus snooping to ensure cache coherency, the latter uses a directory based system.

The difference is huge. In a snooping system, coherency traffic grows with n squared, where n is the number of CPUs (local memories or caches, really). In a directory-based system it scales with the number of CPUs.
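A back-of-the-envelope model of that scaling claim. It assumes every CPU misses at the same rate, that a snooping miss is broadcast to every other cache, and that a directory miss costs one request to the directory plus messages to a small, fixed number of actual sharers. The function names and the sharer count are hypothetical.

```python
def snoop_messages(n_cpus, misses_per_cpu=1):
    """Every miss is broadcast to all other caches: n * (n - 1) probes per round."""
    return n_cpus * misses_per_cpu * (n_cpus - 1)

def directory_messages(n_cpus, misses_per_cpu=1, sharers_per_line=1):
    """Every miss goes to the directory, then point-to-point to the real sharers."""
    return n_cpus * misses_per_cpu * (1 + sharers_per_line)
```

At 4 CPUs the two are close (12 vs 8 messages), but at 64 CPUs snooping needs 4032 probes against 128 directory messages, which is why snooping tends to stay in small systems.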

Opteron broadcasts memory requests (snooping) and scales poorly beyond 4 CPUs. SGI's Altix (and the older Origin 2000/3000) series and Alpha EV7s use directory-based coherency and scale to 2^10 CPUs (and more).

If they really apply to CELL, I can see snooping used in small scale CELL systems and directories used in large scale systems.

I'm puzzled that these patents are granted in the first place since there seems to be very little new in them.

Cheers
Gubbi
 
Yeah: no amount of L2 is confirmed yet (though some is suspected I guess), and where did you get the idea there would be any L3 at all? It's not in any of the illustrations.

Also, the patent seems to describe some kind of shared bus, or possibly crossbar bus, that gives processors from one PU access to cache and scratchpad belonging to another PU.
 
gubbi said:
If they really apply to CELL, I can see snooping used in small scale CELL systems and directories used in large scale systems.
It could sort of make sense though, couldn't it? A PE only needs to scale up to 8 APUs, while external communication is expected to scale much higher.

The odd thing is that the directory patent refers to transfers "on-chip" and alludes to the situation illustrated being on a single chip.

I'm puzzled that these patents are granted in the first place since there seems to be very little new in them.
I don't think they've been granted yet? Anyway, I agree: looking at the newest patent nAo posted, it doesn't seem to contain anything particularly new either.
Then again - we also saw there are patents for GCN and PS2 respectively... :?

Jaws, don't you think you're going a bit overboard with all the cache? If there'll be THAT much eDRAM, don't expect large or numerous caches. I somehow doubt we'll see the massive eDRAM pool, though.
 
It's probably better for them to go for more cache.

At this point, since they didn't embed the 32MB of PSP memory on chip, I find it difficult to believe they'll make enough progress within a year to embed 64MB of memory alongside the kind of logic and caches that Cell is supposed to have. Well, I'll probably change my mind again at the end of the year, when they are supposedly going to do some kind of demonstration.
 
Guden Oden said:
Yeah: no amount of L2 is confirmed yet (though some is suspected I guess), and where did you get the idea there would be any L3 at all? It's not in any of the illustrations.

From the 2nd patent,

[0019] Preferably, the second processor 116 includes a level 1 (L1) cache (not shown). In that case, the cache 114 is a level 2 (L2) cache, whereas the directory 128 is stored in a level 3 (L3) cache (not shown).


Guden Oden said:
Also, the patent seems to describe some kind of shared bus, or possibly crossbar bus, that gives processors from one PU access to cache and scratchpad belonging to another PU.

I've tweaked the diagram a bit so that the L3 cache is shared amongst the PUs. It now looks more like a unified L3 cache! 8)

[Image: Cell-mem.jpg]
 