New IBM patents, CELL related(?)

The whole reason I brought in the article and contract was because you don't believe that the Broadband Engine is the name of the core Engine behind the PS3, but I'll go along with the rest of your argument.

I didn't bring up the article as part of our fight over whether the 64MB is e-DRAM or external memory.

This article satisfies your speed demands by saying Yellowstone = XDR.

What? 25.6GB/S/50GB/s is not enough to sustain a Teraflops class MPU, you need e-DRAM at hundreds of GB/s for this.
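
To put the bandwidth complaint in numbers, here is a minimal sketch that computes how many bytes of memory traffic are available per floating-point operation. It assumes a peak of exactly 1 TFLOPS (1000 GFLOPS) and treats the two quoted figures as sustained rates; the function name `bytes_per_flop` is made up for illustration, and the sketch says nothing about how many bytes per FLOP a real workload actually needs.

```python
def bytes_per_flop(bandwidth_gb_s, peak_gflops):
    """Bytes of memory bandwidth available per floating-point operation."""
    return bandwidth_gb_s / peak_gflops


# Assumption: "Teraflops class" is taken as 1000 GFLOPS peak.
PEAK_GFLOPS = 1000.0

for bandwidth in (25.6, 50.0):
    ratio = bytes_per_flop(bandwidth, PEAK_GFLOPS)
    print(f"{bandwidth} GB/s -> {ratio:.4f} bytes per FLOP")
```

Under these assumptions the external bus supplies only a few hundredths of a byte per operation, which is the gap the e-DRAM argument is about.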

Now, either you consider XDR to be embedded memory and we have been having a misunderstanding, or that article backs me up 100% that XDR is the solution. Since I consider XDR to be a separate chip and not embedded = Me right, You wrong???

XDR is external yes, BE in the patent has e-DRAM, and it will also have external memory when BE is put into a system.

But you still tell me that I am wrong, where is the Proof?
Please show me evidence that proves its on chip.

The proof that the 64mb is embedded is in the patent, it's something which was proven over a year ago.

Christ, how obvious is it? It's a *Switched* bus! That should really just make you go ding dong right there, never mind the fact that it would be so insanely stupid of Sony to even consider using a fast bus with so little memory (64MB), or the fact that Broadband Engine will sustain nowhere near a TFLOPS without direct access to e-DRAM.
 
"It's a *Switched* bus!"

Maybe this is the information you are operating on that I do not understand.

I have no prior knowledge of this term.
So I assumed it described the nature of APU communication.
After all there is only one 1024-bit bus.
Yet each APU has a 1024-bit port to that bus.
The bus would have to "switch" among the APUs. And the APUs would have to be out of sync with each other to avoid clogging the 1024-bit main bus.

It's because of this initial impression from the Sony patents that I was able to anticipate the information these IBM patents are now relating.

But you seem to feel that *switched* has a different meaning than what I assumed.

Please share it,
I would very much like to know how you reached the conclusion of 64MB eDRAM.

EDIT: After posting this I did some searching for "Switched Bus"
http://www.techonline.com/community/tech_group/comm/feature_article/21224
I've gained a better understanding of multi-layer switched buses.
But this term doesn't in any way lead to the conclusion of eDRAM?
Perhaps you have some special definition of it, that I am still unaware of?
 
nAo,

Please read the link in my last post concerning switched bussing.

Perhaps the IBM patent has a 1/8 time-phased bus to allow
a send & receive window for each of the 4 APUs.

Initially, when you shared your info, I was under the impression
that this allowed for an upgrade to 8 APUs.

Now I am less certain. :?

Anyone reading this, look into it and share what you think.
 
David_South#1 said:
"It's a *Switched* bus!"

Maybe this is the information you are operating on that I do not understand.

Then maybe it is time to just concede defeat, hm? :)

The path to memory in the BB patent is 1024 bits wide. There's not even a remote chance anyone would lay down an external memory bus that wide in anything outside custom military-class hardware, and even then it is doubtful.

Heck, even 512 bits gives those in the know headaches just by thinking about it... Also, we already know external memory will be of the XDR variety, and Sony's definitely not going to stick 2048 pins on that chip just for data (XDR uses differential signalling). ;)

So it really only leaves the possibility that it is on-chip. Ok? Can we move on now? :p
 
David_South#1 said:
In FIG. 1, two separate electronic chips 100 and 102 are shown separated by a dashed line not designated numerically. The chip 100 includes a plurality of processors, while chip 102 comprises associated memory to be used by the processors of chip 100. As part of the chip 102, there is shown a CDRAM (Custom Dynamic Random Access Memory) 104 and a plurality of combination OCD/OCR (Off Chip Drivers/Off Chip Receivers) operationally two way devices 106, 108, 110, 112 and 114 used for interfacing communication and data transfer between the CDRAM 104 and the CPUs (Central Processor Units) of chip 100.
Paul, are you reading this? :?:

Off-chip memory (CDRAM): that sounds something like XDR 8)
Let me know when the evidence starts proving me wrong.
Because right now, it's proving me right.

Also, I predicted out of phase / delayed signaling for the APU last summer and again in December.
http://forum.pcvsconsole.com/viewthread.php?tid=1330&page=6

My only regret is that I didn't revise it with spell check. ;-)

David, you predicted what :p ?!?

Ahem, other people ( cough... cough... ) were talking about different clocking domains on the processor: APUs running at one frequency and the bus, PUs and DMACs running at other frequencies ;).
 
David_South#1 said:
Suzuoki,
If anywhere in the latest Suzuoki patent 20030229765 you find evidence to counter these facts please do share.

Because no matter the odds you say are against me,
this ‘One’ individual has the facts and quotes on his side

I find 1,024 bits of reason to counter your conclusion, at least basing our conclusion on that patent by Suzuoki Masakazu.

1,024 bits that, with differential signalling, would become 2,048 pins, which for an external off-chip interface is beyond the "way too much" point by a few miles.
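
The pin arithmetic behind that objection can be sketched in a few lines. This is only an illustration under stated assumptions: one signal per data bit, two package pins per signal when differential signalling is used, and no address, control, power or ground pins counted. The helper name `data_pins` is hypothetical.

```python
def data_pins(bus_width_bits, differential=True):
    """Package pins needed for the data lines of a parallel bus.

    Assumption: one signal per data bit; a differential signal needs
    a pair of pins, a single-ended signal needs one.
    """
    pins_per_signal = 2 if differential else 1
    return bus_width_bits * pins_per_signal


for width in (64, 128, 512, 1024):
    print(f"{width}-bit bus -> {data_pins(width)} differential data pins")
```

Even before adding address and control lines, a 1,024-bit differential interface already needs 2,048 data pins in this model, while the 64-bit and 128-bit controller widths mentioned elsewhere in the thread stay in the low hundreds.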
 
David_South#1 said:
nAo,
Sorry about the typo. (How did you come up with that name?)
I also apologize if my greeting was taken wrongly.
So far it’s been Paul for the most part. I like reading other views.
And yes, it was a great find.

As for Figure 6,
"In accordance with this modular structure, the number of PEs employed by a member of the network is based upon the processing power required by that member. For example, a server may employ four PEs"

The wording in this patent hasn’t changed from my sig.
As they have invented a server model they would include a server diagram, yes?

Ever heard about the "Home Server" ( likely PSX 2: PlayStation 3 + Blu-Ray Re-writeable + HDD + etc... ) ;) ?
 
fxtech said:
And so, let's talk about the third patent now :)

Hey, which part of Italy do you live in?

I have a feeling this isn't the first time I've seen you on a forum... but who knows... we are seeerious posters, we don't just post at raaandom ;).
 
[0010] In FIG. 1, a main processing unit (MPU) 10 and a direct memory access unit (DMA) 12 receive clock signal inputs from a phase lock loop (PLL) source 14 which, as shown, provides clock signals at 4 GHz. In a preferred embodiment of the invention, a base reference signal of 1 GHz is used by the PLL block 14 to generate the output clock signal. Also shown in FIG. 1 are auxiliary processing units (APUs) 16, 18, 20 and 22 which are additionally labeled APU1, APU2, APU3 and APU4, respectively. Each of these APUs has an associated I/O (Input/Output) block for receiving signals from and transmitting signals to the DMA 12.

[0011] A first I/O block 24 is associated with APU 16. A second I/O block 26 is associated with APU 18. A third I/O block 28 is associated with APU 20. A fourth and final I/O block 30 is associated with APU 22. Each of the I/O blocks is shown connected to the DMA 12 via a ring type network indicated by a dash line 32. In this manner, each of the APUs may receive the data, operate upon the data (or ignore same) and pass it to the next APU, as appropriate, in consecutive operations wherein each APU is using slightly differently timed switching operations.

[0012] A PLL 34, which in some circuit packaging instances may be the PLL 14, uses a base 1 GHz reference signal, identical to that used by PLL 14, to create a 4 GHz signal Ø0 on a lead 35. This 4 GHz signal is supplied to timing delay circuits 36, 38, 40 and 42. The delay circuit 36 delays the signal Ø0 in a manner to apply a signal Ø1 to be used by APU1 16. An "H" type signal path is shown internal to block 16 as a bold or wide type circuit path to help reduce any skew of the clock signal as it is distributed to each of the circuits utilizing this clock within APU1 16. The delay circuit 38 generates a clock signal Ø2 for application to APU 18. Although detail is not shown within block 18, it will desirably have some method of minimizing clock skew of the clock signal Ø2 as it is distributed within APU 18. Similarly, APUs 20 and 22 will typically provide clock skew reducing mechanisms. The delay circuit 40 generates a clock signal Ø3 for application to APU 20 while delay circuit 42 generates a clock signal Ø4 for application to APU 22.

[0013] In FIG. 2, the relative phasing of the main 1 GHz reference signal and the generated clock signals Ø0, Ø1, Ø2, Ø3, and Ø4 mentioned in conjunction with FIG. 1 are shown. It may be noted that Ø0 and Ø4 are 180 degrees out of phase. Thus, the switching currents for the PLLs, as well as for each of the illustrated APUs, occurs at different times, thereby reducing the current required at the appropriate switching time by at least a factor of 4.

I think that the technique they are talking about is not fixed at 4 APUs, but it is more of a general concept that allows multiple APUs and a big saving in current generation hence power consumption and heat produced: basically, AFAIK, it seems to say that each clock ( for the PU, for the APUs, etc... ) occurs not at exactly the same time, but with a slight phase delay.

This might mean that you will need a few more cycles to synchronize signals and to transfer data from unit to unit, but performance is not something CELL lacks by design, and such a trade-off ( some extra cycles here and there resulting in lower power consumption and heat produced ) is not a bad deal to me.
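
A toy model makes the "factor of 4" claim easier to see. The sketch below divides one clock period into equal time slots and lets each APU draw a single current pulse in the slot where its clock switches. Everything about the model is an assumption for illustration: the eight slots per period, the evenly spaced 45-degree steps for Ø1 to Ø4 (the patent text quoted above only states that Ø0 and Ø4 are 180 degrees apart), the unit pulse height, and the function name `peak_current`.

```python
def peak_current(phase_slots, slots_per_cycle, pulse_amps=1.0):
    """Largest summed current in any time slot of one clock period.

    phase_slots holds, for each unit, the slot in which its clock edge
    (and therefore its switching current pulse) occurs.
    """
    load = [0.0] * slots_per_cycle
    for slot in phase_slots:
        load[slot % slots_per_cycle] += pulse_amps
    return max(load)


SLOTS = 8  # assumption: 45-degree resolution, so 180 degrees is slot 4

aligned = peak_current([0, 0, 0, 0], SLOTS)    # all four APUs switch together
staggered = peak_current([1, 2, 3, 4], SLOTS)  # assumed phases of Ø1..Ø4

print("peak with aligned clocks:  ", aligned)
print("peak with staggered clocks:", staggered)
```

In this model the peak demand drops from four pulses to one, but the total charge drawn per period is unchanged, so the sketch only illustrates a lower peak current, not by itself a lower average power.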
 
So the phase-shifted clock signal is to make current draw smoother and more even rather than the staccato seen in for example the P4 "prescott" or such?

Don't see how it would result in a power saving however, though I admit I didn't actually read the patent in question so maybe the answer is hidden somewhere in all that lawyerese. :)
 
Guden Oden said:
So the phase-shifted clock signal is to make current draw smoother and more even rather than the staccato seen in for example the P4 "prescott" or such?

Don't see how it would result in a power saving however, though I admit I didn't actually read the patent in question so maybe the answer is hidden somewhere in all that lawyerese. :)

Guden, I think it might be in the bolded part of my last quote ;).

I would think that the saving comes from not having exactly 8 APUs, 1 PU and 1 DMAC all trying to draw current at the same exact time, and the way they do it might be more efficient.
 
Panajev2001a said:
Guden, I think it might be in the bolded part of my last quote ;).

:LOL: I just saw this huge block of lawyerese, and then my eyes slipped away like water rolling off a duck. :LOL: It was a defensive reflex action, heh.
 
Guden Oden said:
Then maybe it is time to just concede defeat, hm? :)

The path to memory in the BB patent is 1024 bits wide.
Greetings Dude. *Flag* Sorry for the delay of game.
Not even close. But hopefully soon. 8)

So, you also say that the Sony patent says 1024-bit bus to DRAM?
Simply reply, prove it. :arrow: Quote and link.

Panajev predicts,
By default I generally defer to you, and with good reason. ;)

We know that the PE bus handles data in 1,024-bit chunks.
But I have yet to see a bus width or bit size to the DRAM be defined.
Same thing man, Quote it!

I’m waiting for you to "server" the proof. :rolleyes: ... :LOL:

While I was gone,
Thank you both for making ideas on phase shift more apparent.
Oden's staccato reply made the point more apparent. (Y)
 
Paul and Others interested,

These are all the times *switch* is used in the last Sony patent.

[0063] The basic processing module for all members of network 104 is the processor element (PE). FIG. 2 illustrates the structure of a PE. As shown in this figure, PE 201 comprises a processing unit (PU) 203, a direct memory access controller (DMAC) 205 and a plurality of attached processing units (APUs), namely, APU 207, APU 209, APU 211, APU 213, APU 215, APU 217, APU 219 and APU 221. A local PE bus 223 transmits data and applications among the APUs, DMAC 205 and PU 203. Local PE bus 223 can have, e.g., a conventional architecture or be implemented as a packet switch network. Implementation as a packet switch network, while requiring more hardware, increases available bandwidth.
A packet switch network is a router (*SERVER*) function, where the data being handled needs minimal processing and maximal routing. The data packets are moved as efficiently as possible and switched to the fastest route (not always the shortest). Packet switch networks are mostly used by telephone companies. They reduce bandwidth constraints to a fifth of those presented by circuit-switched networks. However, it would also facilitate Video On Demand (VOD) and other streaming services that Sony wants and that demand endless bandwidth. To support this goal an internal company was formed called Sony BroadBand Network (SBBN).

http://www.google.com/search?num=10...p;newwindow=1&q="Sony+Broadband+Network""

[0082] FIG. 12A illustrates the control system and structure for the DRAM of a BE. A similar control system and structure is employed in processors having other sizes and containing more or less PEs. As shown in this figure, a cross-bar switch connects each DMAC 1210 of the four PEs comprising BE 1201 to eight bank controls 1206. Each bank control 1206 controls eight banks 1208 (only four are shown in the figure) of DRAM 1204. DRAM 1204, therefore, comprises a total of sixty-four banks. In a preferred embodiment, DRAM 1204 has a capacity of 64 megabytes, and each bank has a capacity of 1 megabyte. The smallest addressable unit within each bank, in this preferred embodiment, is a block of 1024 bits.
To further clarify,
The unspecified DRAM consists of 64MB total, to be shared by all four PEs.
Not 64MB per PE as I understood you to be stating.
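
The numbers in paragraph [0082] can be checked with a short calculation. This is a sketch only: it assumes "megabyte" means 2**20 bytes, and the function name `dram_geometry` and the dictionary keys are invented for illustration.

```python
def dram_geometry(bank_controls=8, banks_per_control=8,
                  bank_bytes=2**20, block_bits=1024):
    """Derive totals from the figures quoted in paragraph [0082].

    Assumption: one megabyte is taken as 2**20 bytes.
    """
    banks = bank_controls * banks_per_control
    block_bytes = block_bits // 8
    blocks_per_bank = bank_bytes // block_bytes
    return {
        "banks": banks,
        "total_megabytes": banks * bank_bytes // 2**20,
        "block_bytes": block_bytes,
        "blocks_per_bank": blocks_per_bank,
        "total_blocks": banks * blocks_per_bank,
    }


for name, value in dram_geometry().items():
    print(f"{name}: {value}")
```

Eight bank controls times eight banks gives the sixty-four banks and 64 megabytes the patent states, for the whole BE rather than per PE, with each 1,024-bit block being 128 bytes.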

[0083] BE 1201 also includes switch unit 1212. Switch unit 1212 enables other APUs on BEs closely coupled to BE 1201 to access DRAM 1204. A second BE, therefore, can be closely coupled to a first BE, and each APU of each BE can address twice the number of memory locations normally accessible to an APU. The direct reading or writing of data from or to the DRAM of a first BE from or to the DRAM of a second BE can occur through a switch unit such as switch unit 1212.

Here it says that APUs outside of the BE also have access to this shared DRAM.
And that this is possible for closely coupled BEs (Onboard? Server Cabinet?).

[0084] For example, as shown in FIG. 12B, to accomplish such writing, the APU of a first BE, e.g., APU 1220 of BE 1222, issues a write command to a memory location of a DRAM of a second BE, e.g., DRAM 1228 of BE 1226 (rather than, as in the usual case, to DRAM 1224 of BE 1222). DMAC 1230 of BE 1222 sends the write command through cross-bar switch 1221 to bank control 1234, and bank control 1234 transmits the command to an external port 1232 connected to bank control 1234. DMAC 1238 of BE 1226 receives the write command and transfers this command to switch unit 1240 of BE 1226. Switch unit 1240 identifies the DRAM address contained in the write command and sends the data for storage in this address through bank control 1242 of BE 1226 to bank 1244 of DRAM 1228. Switch unit 1240, therefore, enables both DRAM 1224 and DRAM 1228 to function as a single memory space for the APUs of BE 1222.
This explains that the APUs of different BEs can communicate with each other via memory.
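
One way to picture two DRAMs acting as "a single memory space" is a flat address that is split into a BE index and a local offset. The patent excerpt does not say how addresses are laid out, so the linear mapping below (the second BE's DRAM follows directly after the first), the 64 MB size taken as 64 * 2**20 bytes, and the function name `route` are all assumptions for illustration.

```python
BE_DRAM_BYTES = 64 * 2**20  # assumption: 64 megabytes of DRAM per BE


def route(address, num_bes=2):
    """Split a flat address into (BE index, offset within that BE's DRAM).

    Assumption: the DRAMs of closely coupled BEs are mapped back to back.
    """
    if not 0 <= address < num_bes * BE_DRAM_BYTES:
        raise ValueError("address outside the combined memory space")
    return divmod(address, BE_DRAM_BYTES)


print(route(0))                    # first byte of the first BE's DRAM
print(route(BE_DRAM_BYTES))        # first byte of the second BE's DRAM
print(route(BE_DRAM_BYTES + 128))  # second 1,024-bit block of the second BE
```

With two BEs the addressable range doubles, which matches the statement that each APU "can address twice the number of memory locations normally accessible".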

[0088] FIG. 16 shows an alternative embodiment of the DMAC, namely, a non-distributed architecture. In this case, the structural hardware of DMAC 1606 is centralized. APUs 1602 and PU 1604 communicate with DMAC 1606 via local PE bus 1607. DMAC 1606 is connected through a cross-bar switch to a bus 1608. Bus 1608 is connected to DRAM 1610

I don’t fully understand this one.
All the APUs share a single connection to the DMAC.
Cross-bar switch to a bus. Is this bus a switched bus?

Anyways these are all the times that *Switch* is mentioned in the Patent.
At no time does it state the bit size of any switch.
Elsewhere it does say that the DMAC has a memory bank of 8KB.
It can buffer transfers as 1024 bits or as 512 interleaved bits in two banks.
But it never says the bit size of a bus other than the main PE bus.
So we have no information on how many bits a DRAM bus might be.

[0086] FIGS. 14A and 14B illustrate different configurations for storing and accessing the smallest addressable memory unit of a DRAM, e.g., a block of 1024 bits. In FIG. 14A, DMAC 1402 stores in a single bank 1404 eight 1024 bit blocks 1406. In FIG. 14B, on the other hand, while DMAC 1412 reads and writes blocks of data containing 1024 bits, these blocks are interleaved between two banks, namely, bank 1414 and bank 1416. Each of these banks, therefore, contains sixteen blocks of data, and each block of data contains 512 bits. This interleaving can facilitate faster accessing of the DRAM and is useful in the processing of certain applications.
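
The two layouts in FIGS. 14A and 14B can be sketched as follows. This is a simplified illustration, not the patent's mechanism: banks are modelled as Python dictionaries, the first 512 bits are assumed to go to one bank and the last 512 bits to the other (the excerpt does not say which half goes where), and the function names are hypothetical.

```python
BLOCK_BYTES = 1024 // 8   # one 1,024-bit block
HALF_BYTES = 512 // 8     # one 512-bit half block


def write_interleaved(bank_a, bank_b, index, block):
    """Store one 1,024-bit block as two 512-bit halves in two banks."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("block must be exactly 1,024 bits")
    bank_a[index] = block[:HALF_BYTES]
    bank_b[index] = block[HALF_BYTES:]


def read_interleaved(bank_a, bank_b, index):
    """Reassemble a 1,024-bit block from its two 512-bit halves."""
    return bank_a[index] + bank_b[index]


bank_1414, bank_1416 = {}, {}
block = bytes(range(BLOCK_BYTES))
write_interleaved(bank_1414, bank_1416, 0, block)
print(len(bank_1414[0]) * 8, "bits in each bank entry")
print(read_interleaved(bank_1414, bank_1416, 0) == block)
```

The DMAC still moves whole 1,024-bit blocks in this model; interleaving only changes where the two halves live, so both banks can be accessed for the same block.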

To sum things up.
If there is 64MB of eDRAM it is shared by the whole BroadBand Engine.
At no time is the bit width of a switch or DMAC bus beyond the PE bus identified.
The word "switch" first occurs in relation to a packet switch network.
Which is a term generally applied to routing operations, hubs, or servers.

To me all of this reads *server* related.

Help review the following,
[0088] You don’t need a distributed DMAC in a packet switching network.
The only time a switch to a bus happens is in this alternative design where the DMAC is not distributed in the PE. Since internal processing is minimal they can take turns returning the data and do not have to compete for memory requests in order to continue processing.
So this is the only mention of switch and bus. Is it a packet network example?
 
You're way off.

Switch in this context describes the mechanism PEs use to connect to memory.

Like this:

Code:
--------   -----   ----------
| PE 0 |---| S |---| Bank 0 |
--------   |   |   ----------
--------   | W |   ----------
| PE 1 |---|   |---| Bank 1 |
--------   | I |   ----------
        ..-|   |-- .. 
        ..-| T |-- .. 
        ..-|   |-- .. 
        ..-| C |-- .. 
--------   |   |   ----------
| PE M |---| H |---| Bank N |
--------   -----   ----------
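
The diagram can also be expressed as a tiny arbitration model: in one cycle, a crossbar lets every PE reach a different bank at the same time, and only requests that collide on the same bank have to wait. This is a sketch under assumptions that are not in the patent: a fixed priority where the lowest-numbered PE wins a conflict, one request per PE per cycle, and the function name `arbitrate`.

```python
def arbitrate(requests):
    """Resolve one cycle of PE-to-bank requests through a crossbar.

    requests maps a PE number to the bank number it wants.
    Returns (granted, stalled): granted maps PE -> bank for requests
    served this cycle; stalled lists PEs that must retry.
    Assumption: on a conflict the lowest-numbered PE wins.
    """
    granted = {}
    stalled = []
    busy_banks = set()
    for pe in sorted(requests):
        bank = requests[pe]
        if bank in busy_banks:
            stalled.append(pe)
        else:
            busy_banks.add(bank)
            granted[pe] = bank
    return granted, stalled


# Four PEs hitting four different banks: all served in the same cycle.
print(arbitrate({0: 0, 1: 1, 2: 2, 3: 3}))

# PE 0 and PE 1 both want bank 5: PE 1 has to wait.
print(arbitrate({0: 5, 1: 5, 2: 1}))
```

That parallelism is what separates a switch from a single shared bus, where only one transfer could proceed per cycle regardless of the target bank.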

Cheers
Gubbi
 
David,

[0082] FIG. 12A illustrates the control system and structure for the DRAM of a BE. A similar control system and structure is employed in processors having other sizes and containing more or less PEs. As shown in this figure, a cross-bar switch connects each DMAC 1210 of the four PEs comprising BE 1201 to eight bank controls 1206. Each bank control 1206 controls eight banks 1208 (only four are shown in the figure) of DRAM 1204. DRAM 1204, therefore, comprises a total of sixty-four banks. In a preferred embodiment, DRAM 1204 has a capacity of 64 megabytes, and each bank has a capacity of 1 megabyte. The smallest addressable unit within each bank, in this preferred embodiment, is a block of 1024 bits.

As you have also quoted, they can store 1,024 bits in a single block or they can use two 512-bit blocks with interleaving.

If you cannot even address a quantity of data smaller than a block of 1,024 bits, I would be strongly suspecting that 1,024 bits is the minimum amount of data I can read or write.

Think about a PC: if you address data on 32-bit boundaries, you can certainly have an 8-bit char in memory, but when you load it, it will likely come with another 24 bits of data that will just be ignored ( probably filled with all zeros ). Of course you can be clever and pack 4 char variables into those 32 bits, unpack them on the CPU and get work done, but I have the suspicion that you will not have a 1,024-bit data type on CELL ;).
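
The packing trick from the PC analogy looks like this in practice. It is a minimal sketch using Python's standard `struct` module; little-endian byte order is chosen purely for illustration and says nothing about CELL, and the function names are made up.

```python
import struct


def pack_chars(chars):
    """Pack four 8-bit chars into one 32-bit word (little-endian assumed)."""
    if len(chars) != 4:
        raise ValueError("exactly four bytes are required")
    (word,) = struct.unpack("<I", chars)
    return word


def unpack_chars(word):
    """Recover the four 8-bit chars from a 32-bit word."""
    return struct.pack("<I", word)


word = pack_chars(b"CELL")
print(hex(word))
print(unpack_chars(word))
```

One 32-bit load then carries four useful chars instead of one char plus 24 ignored bits; the same idea, scaled up, is why a 1,024-bit minimum access unit does not require a 1,024-bit data type.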

Do not worry about malloc/new, etc... the APUs allocate dynamic memory in their Local Storage.

More quotes:

[0086] FIGS. 14A and 14B illustrate different configurations for storing and accessing the smallest addressable memory unit of a DRAM, e.g., a block of 1024 bits. In FIG. 14A, DMAC 1402 stores in a single bank 1404 eight 1024 bit blocks 1406. In FIG. 14B, on the other hand, while DMAC 1412 reads and writes blocks of data containing 1024 bits, these blocks are interleaved between two banks, namely, bank 1414 and bank 1416. Each of these banks, therefore, contains sixteen blocks of data, and each block of data contains 512 bits. This interleaving can facilitate faster accessing of the DRAM and is useful in the processing of certain applications.

It says that the DMAC "reads and writes" blocks of 1,024 bits in the DRAM and it does not say that it simply buffers them.

Can this change ? Yes and here is why:

[0090] To overcome these problems, for each addressable memory location of the DRAM, an additional segment of memory is allocated in the DRAM for storing status information relating to the data stored in the memory location. This status information includes a full/empty (F/E) bit, the identification of an APU (APU ID) requesting data from the memory location and the address of the APU's local storage (LS address) to which the requested data should be read. An addressable memory location of the DRAM can be of any size. In a preferred embodiment, this size is 1024 bits.

Their goal is to place e-DRAM with a nice 1,024-bit pipe, but if they cannot, they can push up the frequency of the XDR solution they are working on ( which currently, according to that patent, would be the external memory connected to the I/O ASIC ) and/or increase the number of channels ( going from a 64-bit memory controller to a 128-bit memory controller ).
 
Panajev2001a said:
fxtech said:
And so, let's talk about the third patent now :)

Hey, which part of Italy do you live in?

I have a feeling this isn't the first time I've seen you on a forum... but who knows... we are seeerious posters, we don't just post at raaandom ;).

I sent you a PM, do you know how to read them? :oops:
 