V3,
I think External memory would still be RAM and not the optical disc...
They didn't license Yellowstone just to increase their IP portfolio
Keeping everything in the e-DRAM could be feasible with sub-45 nm tech, as I do not see 256-512 MB of e-DRAM EVER fitting with 65 nm technology... 64 MB is a bit of a stretch, and using 4 BEs would make the PCB too complex...
Still they might use Yellowstone for the e-DRAM and Redstone for chip-to-chip interconnect...
I found this in the patent which actually supports your theory...
[0064] PE 201 is closely associated with a dynamic random access memory (DRAM) 225 through a high bandwidth memory connection 227. DRAM 225 functions as the main memory for PE 201.
Still, it says "for PE" and not for the system... I would expect some memory to be found in the I/O CPU at least, to buffer from the optical disc ( it could be the 32 MB of Direct RAMBUS DRAM inherited from PS2 backward compatibility in the I/O ASIC )...
Still, the Hybrid UMA approach makes sense... and their meaning of "functioning as main memory" might relate to the fact that the DMACs do see the e-DRAM, but not the external memory... the e-DRAM is main RAM as far as the PEs are concerned...
The Visualizer would have some e-DRAM too, I'd think, and so would the I/O CPU... having a decent-sized RAM pool attached to the I/O ASIC as external memory could be interesting ( "external" is used to contrast it with the internal/embedded DRAM that the BE does have on chip... ), as it would follow the Hybrid UMA principle quite well ( shared memory, but each accessing processor has a bit of local memory [Local Storage] to buffer data and work locally while the bus is not available )
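( As a rough illustration of that Hybrid UMA idea as I read it, here is a toy sketch in Python... all names and sizes are mine, not the patent's: each processor stages a chunk of the shared pool through its own Local Storage, works on it locally while the bus serves someone else, then writes the results back. )

```python
SHARED = list(range(64))   # shared memory pool ( toy-sized e-DRAM stand-in )
CHUNK  = 8                 # how much fits in one processor's Local Storage

def process(offset):
    # 1) DMA a chunk from shared memory into Local Storage ( needs the bus )
    local_storage = SHARED[offset:offset + CHUNK]
    # 2) work locally... no bus traffic while another processor uses it
    local_storage = [x * 2 for x in local_storage]
    # 3) DMA the results back once the bus is available again
    SHARED[offset:offset + CHUNK] = local_storage

for pid in range(4):       # e.g. 4 processors, each taking its own slice
    process(pid * CHUNK)
print(SHARED[:8])          # -> [0, 2, 4, 6, 8, 10, 12, 14]
```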
Vers,
I think I owe you an apology... I re-read the patent ( again ) and found this... ( took the time to look at it properly )...
[0081] FIG. 12A illustrates the control system and structure for the DRAM of a BE. A similar control system and structure is employed in processors having other sizes and containing more or less PEs. As shown in this figure, a cross-bar switch connects each DMAC 1210 of the four PEs comprising BE 1201 to eight bank controls 1206. Each bank control 1206 controls eight banks 1208 (only four are shown in the figure) of DRAM 1204. DRAM 1204, therefore, comprises a total of sixty-four banks. In a preferred embodiment, DRAM 1204 has a capacity of 64 megabytes, and each bank has a capacity of 1 megabyte. The smallest addressable unit within each bank, in this preferred embodiment, is a block of 1024 bits.
64 banks, 1 MB each...
Each DMAC of the four PEs connects to 8 bank controls through a crossbar switch ( only one request at a time per bank control )...
Each bank control controls 8 banks...
You transfer in 128-byte chunks ( 1,024 bits ) from each bank...
Each bank control is connected to the Switch ( switching logic ) which, for each bank control, allows one transaction at a time...
So we can have a maximum of 8 transactions active at any given time...
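To keep the arithmetic straight, here is a quick sanity check in Python ( the figures come straight from [0081]; only the constant names are mine ):

```python
PES_PER_BE     = 4      # PEs ( and DMACs ) per Broadband Engine
BANK_CONTROLS  = 8      # bank controls behind the crossbar switch
BANKS_PER_CTRL = 8      # banks behind each bank control
BANK_SIZE_MB   = 1      # capacity of each bank
BLOCK_BITS     = 1024   # smallest addressable unit

total_banks = BANK_CONTROLS * BANKS_PER_CTRL   # 8 x 8 = 64 banks
total_mb    = total_banks * BANK_SIZE_MB       # 64 x 1 MB = 64 MB of e-DRAM
block_bytes = BLOCK_BITS // 8                  # 1,024 bits = 128 bytes

# one transaction per bank control -> 8 concurrent transactions,
# shared among the 4 DMACs -> 2 per PE on average
max_transactions = BANK_CONTROLS
per_pe           = max_transactions // PES_PER_BE

print(total_banks, total_mb, block_bytes, max_transactions, per_pe)
# prints: 64 64 128 8 2
```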
We have 4 PEs, each with its own DMAC, which means each PE should be able to have 2 memory operations active at a given moment in time... parallel READs and WRITEs for each DMAC could be possible, I'd say...
Each clock cycle, a maximum of 2,048 bits ( 1,024 bits x 2 ), or 256 bytes ( 128 bytes x 2 ), can arrive+leave ( READ+WRITE ) each PE.
It makes more sense to count the bandwidth for each PE and then the BE's aggregate bandwidth...
Running at 1.8-0.9 GHz this means 460.8-230.4 GB/s between PE and DRAM...
and yes the total aggregate bandwidth for the BE is 4x higher as we have 4 PEs... ~1.8-0.9 TB/s
Still each PE would get 1/4th of that...
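Spelling the math out ( decimal units, i.e. 1 GB/s = 10^9 bytes/s, which is how the 460.8 figure falls out )... a minimal sketch, nothing here beyond the numbers already quoted above:

```python
BLOCK_BYTES = 128   # one 1,024-bit block
OPS_PER_CLK = 2     # paired READ + WRITE per PE per clock
PES_PER_BE  = 4

for ghz in (0.9, 1.8):
    per_pe_gbs = BLOCK_BYTES * OPS_PER_CLK * ghz    # bytes/clock x Gclocks/s
    per_be_tbs = per_pe_gbs * PES_PER_BE / 1000
    print(f"{ghz} GHz -> {per_pe_gbs:.1f} GB/s per PE, "
          f"{per_be_tbs:.4f} TB/s per BE")

# 0.9 GHz -> 230.4 GB/s per PE, 0.9216 TB/s per BE
# 1.8 GHz -> 460.8 GB/s per PE, 1.8432 TB/s per BE
```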
In addition, the 1,024-bit PE bus ( inside each PE... connecting the APUs and the PU ) can be implemented in two possible ways...
1) one 1,024-bit bus ( one request at a time, bi-directional, but not FULL-DUPLEX )...
2) a Packet Switched Network ( another switch )... this last approach would make better use of the 460.8-230.4 GB/s the DRAM can provide to each PE... while using a normal 1,024-bit bus would have us alternate WRITEs and READs ( the DMAC could do both at the same time: we could still have a small FIFO on the DMAC )...
Approach number 1 would be cheaper, and some small fixes could be implemented to increase its efficiency... a FIFO on the DMAC ( or "next to it" ) would allow us to queue READs and WRITEs and, since memory-operation latency still exists, we would see READ and WRITE requests accumulating in the FIFO... we could then speed things up by issuing paired WRITEs and READs from the FIFO, as in the sketch below...
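A toy model of that pairing idea ( purely illustrative: the function and names are mine, not the patent's )... six queued requests drain in three bus cycles instead of six, assuming the DMAC really can issue a READ and a WRITE together:

```python
from collections import deque

def drain_paired(requests):
    """Drain a FIFO of queued requests, pairing one READ with one
    WRITE per bus cycle instead of strictly alternating them."""
    reads  = deque(r for r in requests if r[0] == "R")
    writes = deque(r for r in requests if r[0] == "W")
    cycles = []
    while reads or writes:
        pair = []
        if reads:
            pair.append(reads.popleft())   # READ half of the cycle
        if writes:
            pair.append(writes.popleft())  # WRITE half of the cycle
        cycles.append(pair)                # one bus cycle, up to R+W together
    return cycles

# requests that piled up in the FIFO while earlier ops were in flight
fifo = [("R", 0x100), ("R", 0x180), ("W", 0x200),
        ("R", 0x280), ("W", 0x300), ("W", 0x380)]
for i, pair in enumerate(drain_paired(fifo)):
    print(f"cycle {i}: {pair}")
```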