CELL Patents (J Kahle): APU, PU, DMAC, Cache interactions?

Excuse the double post, but if you take this into account, it shows why my calculations match your Nvidia example of 880 million, +/- a few million.

I still conclude that this figure is correct; I now very much expect to see this as the maximum (and possible) size of the Cell.

I hope that all makes sense... Thanks for your time, my friend. It's been fun. Speak to ya next time.
 
lol
580mm sq (or 300) is the area, not the length of one side of the square :D

Imagine a chip with sides over half a meter long (580mm) :D
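
For scale, here is a quick back-of-the-envelope sketch. It assumes a perfectly square die (real dies are usually somewhat rectangular), and `square_side_mm` is just a name made up for this example:

```python
import math


def square_side_mm(area_mm2: float) -> float:
    """Side length in mm of a square die with the given area in mm^2."""
    return math.sqrt(area_mm2)


# The two areas mentioned in the thread.
for area in (580, 300):
    print(f"{area} mm^2 -> about {square_side_mm(area):.1f} mm per side")
```

So under that assumption a 580mm sq die is roughly 24mm on a side, and a 300mm sq die roughly 17mm.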
 
I think I actually feel like a complete idiot now... So my maths is OK, but I'm just working out the wrong numbers... whoops.
 
Quaz51 said:
lol
...
Imagine a chip with sides over half a meter long (580mm) :D

I'd like to know how much heat that thing would produce! :D ...and more importantly how many TFlops peak! :D

kyetech said:
...
So my maths is OK, but I'm just working out the wrong numbers... whoops.

Don't worry, I used to tell my maths teacher the same...at least you'll get 'method marks' for showing your working! ;)
 
:oops: I realise how silly I have been... at least I realised the error of my ways eventually. :oops:

Thanks for being light-hearted about it, guys!!! Hehe. :oops: :D
 
An IBM vector patent that may or may not be related to the CELL APUs' SIMD units? :?

Abstract

An apparatus and method are provided for updating one or more pluralities of pointers (i.e. one or more vector pointers) which are used for accessing one or more pluralities of data elements (i.e. one or more vector data elements) in a multi-ported memory. A first register file holds the vector pointers, a second register file holds stride data, and a plurality of functional units combine data from the second register file with data from the first register file. The results of combining the data are transferred to the first register file and represent updated vector pointers. Furthermore, a third register file is provided for holding modulus selector data to specify the size of a circular buffer for circular addressing.

[0003] 2. Background of the Invention

[0004] In state-of-the-art digital signal processors (DSPs), media processors, and various other domain-specific processors, a single-instruction multiple-data (SIMD) approach is often taken for parallel execution of a single operation on one or several vectors of data elements. In most contemporary register-to-register architectures (also known as load-store architectures), the data elements involved in SIMD operations are located in a register file.

[0005] For typical algorithms executing on these processors, such as those that implement digital filtering, it would be desirable to allow for flexible read and write access to the data elements of the vectors, that is, to the individual registers in the register file. Furthermore, it would be advantageous for access to the registers not to be limited to a contiguous range of registers nor restricted with respect to vector alignment.

Apparatus and method for updating pointers for indirect and parallel register access
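
A rough software sketch of the pointer update the abstract describes, for anyone who finds code easier to read than patent prose. Everything here is an assumption for illustration: the function name, the use of plain Python lists to stand in for the register files, and especially the choice to treat the circular buffer as aligned to a multiple of its own size (common in DSP modulo addressing, but the excerpt above does not say how the patent does it).

```python
def update_vector_pointers(pointers, strides, modulus=None):
    """Return updated pointers: each pointer plus its stride.

    pointers -- stand-in for the first register file (vector pointers)
    strides  -- stand-in for the second register file (stride data)
    modulus  -- stand-in for the modulus selector: size of the circular
                buffer, or None for plain linear addressing
    """
    if len(pointers) != len(strides):
        raise ValueError("need one stride per pointer")
    updated = []
    for pointer, stride in zip(pointers, strides):
        new_pointer = pointer + stride
        if modulus is not None:
            # ASSUMPTION: the buffer starts at a multiple of its size.
            base = pointer - (pointer % modulus)
            new_pointer = base + (new_pointer - base) % modulus
        updated.append(new_pointer)
    return updated


# Four pointers stepping by 4 through an 8-entry circular buffer.
ptrs = [4, 5, 6, 7]
print(update_vector_pointers(ptrs, [4, 4, 4, 4]))             # linear
print(update_vector_pointers(ptrs, [4, 4, 4, 4], modulus=8))  # wraps around
```

In hardware the additions would happen in parallel across the functional units; the loop here only models the result, not the timing.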
 
Though I brought up Toshiba's essential role in the design of the Emotion Engine in the other thread, my speculation that the Graphics Synthesizer was also done by Toshiba now looks doubtful to me. Rather, I'm starting to think the GS was designed mainly by SCE or other divisions within Sony. So I'm not very confident about the influence of Toshiba patents on the Visualizer either, except as an implicit one.

These are the reports of ISSCC 2001 (in Japanese)
http://pcweb.mycom.co.jp/news/2001/02/13/27.html
http://ascii24.com/news/i/tech/article/2001/02/08/622918-000.html?geta
and they contain the report of the 2 Sony-related papers.
One is a Sony-developed quad-core processor with 4 MIPS II cores with SPU/DTU/BPU, mainly targeted for HDTV processing and other set-top boxes. Those cores are called none other than "Processor Elements"!

27al.jpg


It's built on a 0.25um process, runs at 250MHz, and is scalable in the number of PEs.

The other paper is about a GS with 256Mb eDRAM for the GSCube, presented by Aurangzeb Khan of Altius Solutions (a talk he gave elsewhere can be seen here) and co-written with SCE, Sony Semiconductor Network, Sony Kihara Research Center et al.

27bl.jpg
 
Megadrive1988 said:
Everything I've read by knowledgeable people indicates that the GS was Sony's baby :)

Simplex Solutions designed the GS. This PDF on Aurangzeb Khan clearly states Graphics Synthesizer and the follow-up GS I-32. I'm sure Sony had lots of input.

http://64.233.167.104/search?q=cach...han.pdf+Cadence+Sony+GS&hl=en&start=1

Speaking at the International Solid-State Circuits Conference, held this week in San Francisco, the designer of the Graphics Synthesiser (GS), Simplex Solutions, unveiled a .18 micron chip almost twice the physical size of the .25 micron PlayStation 2 version (21.3mm x 21.7mm, compared to just 16.8mm x 16.88mm).

http://www.theregister.co.uk/2001/02/08/playstation_3_graphics_chip/
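
A quick check of the dimensions in that quote, assuming the figures are die edge lengths in mm (the helper name is made up for this example):

```python
def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area in mm^2 of a rectangular die."""
    return width_mm * height_mm


new_area = die_area_mm2(21.3, 21.7)   # .18 micron part
old_area = die_area_mm2(16.8, 16.88)  # .25 micron PlayStation 2 part
ratio = new_area / old_area

print(f"new: {new_area:.2f} mm^2, old: {old_area:.2f} mm^2, ratio: {ratio:.2f}x")
```

On those numbers the newer chip comes out at roughly 1.6 times the area of the PS2 part, so "almost twice" is a generous rounding.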

Last month, Simplex Solutions, which developed the GS for Sony, unveiled the next generation of the part, fabbed at 0.18 micron. The reduction in size from 0.25 micron enabled Simplex to increase the chip's on-board video RAM from 4MB to 32MB.

http://www.theregister.co.uk/2001/03/12/playstation_cpu_to_shift/


Simplex Solutions is now a part of Cadence. All indications are that the same people that worked on the GS are working on the follow-up.

Simplex Solutions has said that they are developing the GS (graphics synthesizer) chip that will go with the PS3. It will be a 0.18 micron chip about twice the size of the PS2 version. Also, it will feature 256MB of DRAM and will be able to handle 75 million polygons per second with a rate of up to 2.6 billion pixels a second.

http://www.japan-101.com/video_games/playstation_3.htm

Some of that information, like the arrival of the PS3 in 2002, turned out not to be true though. Simplex has worked very closely with Toshiba on the "X" architecture.

X Architecture eliminates the "Manhattan" architecture that has dominated chip design for the past 20 years, and starts over from scratch, the companies said. Manhattan architecture is so named because the right angles interconnect between wires resembling a city grid. By directing these wires diagonally across the chip, X architecture reduces the amount of wiring needed by 20 percent, while resulting in 10 percent better performance, 20 percent less power dissipation and 30 percent more chips per wafer, the companies said.

Using liquid routing technology from circuit design company Simplex, interconnects can branch off diagonally in any of eight directions, instead of simple right angles, the companies said.

Also Monday, a group of companies announced the formation of the X Initiative, a five-year initiative aimed at "accelerating the availability and fabrication of the X Architecture," Toshiba and Simplex said in a statement.

http://www.cnn.com/2001/TECH/ptech/06/06/new.chip.design.idg/


Maybe someone should start doing patent searches on the people from the old Simplex Solutions and Cadence.


Here is a little info on Aurangzeb Khan.

The new Cadence Design Foundry business unit will be headed by Aurangzeb Khan, former head of Simplex's SoC Design Foundry business, the team known for its unbroken track record of 1st Silicon Success, including the Sony Playstation(r)2 Graphics Synthesizer(r) and the world's first OC768 framer-mapper processor for Infineon. Khan, now Cadence corporate vice president and general manager reporting to Herscher, was president and CEO of Altius Solutions, Inc. (acquired by Simplex in 2000). He has over 16 years of management experience and over 22 years of product development experience. Before founding Altius, Khan was the vice president of silicon engineering at Cirrus Logic and the director of VLSI technology and CAD development at Tandem Computers.

http://www.cadence-europe.com/corporate/press_box/index.cfm?DisplayItem=185&Language=99

The names that can be found on the design of the GS I-32 in the Cadence (Simplex) PDF are:

Aurangzeb K. Khan, Hidetaka Magoshi, Tadashi Matsumoto, Jun-ichi Fujita, Makoto Furuhashi, Masatoshi Imai, Yoshikazu Kurose, Morio Sato, Katsuhiko Sato, Yujiro Yamashita, Kinying Kwan, Duc-Ngoc Le, John H. Yu, Trung Nguyen, Steven Yang, Allen Tsou, King Chow, John Shen, Min Li, Jun Li, Hong Zhao, Kenji Yoshida

H. Takeuchi and S. Iwasaki of Sony Kihara Research Center Inc.,

M. Kaihatsu, A. Tamura, A. Yamazaki, T. Horioka, A. Hakomori, T. Sekihara, M. Kitano, and K. Inoue of Sony Corp., Semiconductor Network Co.,

K. Fujita, H. Nagashima, H. Furuzono and H. Truong of Altius Solutions, Inc.


http://www.cadence.com/whitepapers/ISSC2001150MHzGraphicsProcessor.pdf
 
Brimstone said:
Simplex Solutions designed the GS. This PDF on Aurangzeb Khan clearly states Graphics Synthesizer and the follow-up GS I-32. I'm sure Sony had lots of input.

Like hell they did. They're a company that deals in back-end synthesis of designs; they implemented the design in a given process.

And I think you're connecting things which aren't related concerning Simplex and PS3. But, think as you like.
 
Brimstone said:
Simplex Solutions is now a part of Cadence. All indications are that the same people that worked on the GS are working on the follow-up.

Simplex Solutions has said that they are developing the GS (graphics synthesizer) chip that will go with the PS3. It will be a 0.18 micron chip about twice the size of the PS2 version. Also, it will feature 256MB of DRAM and will be able to handle 75 million polygons per second with a rate of up to 2.6 billion pixels a second.

http://www.japan-101.com/video_games/playstation_3.htm

This part mistakes the GS I-32 (made for the GSCube) for the PS3's GPU.
 
Vince said:
Brimstone said:
Simplex Solutions designed the GS. This PDF on Aurangzeb Khan clearly states Graphics Synthesizer and the follow-up GS I-32. I'm sure Sony had lots of input.

Like hell they did. They're a company that deals in back-end synthesis of designs; they implemented the design in a given process.

And I think you're connecting things which aren't related concerning Simplex and PS3. But, think as you like.


Sony designed it, but I doubt it was a one-way street between Sony and Simplex.

Anyway, twice Sony has used the services of Simplex Solutions (now a part of Cadence) and they are working once again with Toshiba, so I won't be surprised to see Cadence used again.

The main point is that the PDF lists the names of people working for Sony who are linked with the GS. If someone does a patent search targeting the names H. Takeuchi, S. Iwasaki, M. Kaihatsu, A. Tamura, A. Yamazaki, T. Horioka, A. Hakomori, T. Sekihara, M. Kitano, and K. Inoue, they might get lucky and find a patent related to the next incarnation of the GS.
 
Brimstone said:
Sony designed it, but I doubt it was a one-way street between Sony and Simplex.

Anyway, twice Sony has used the services of Simplex Solutions (now a part of Cadence) and they are working once again with Toshiba, so I won't be surprised to see Cadence used again.

Again, they are a supplier of EDA tools and do in-house synthesis. Their tools are used across the board in the back-end design of systems. Hell, IBM used their tools as well on select, more straightforward parts of the Cell project (I believe for the layout and aggregation) that didn't require in-house tools (as timing did). By the logic you applied to the GS above, they "designed" Cell too.

Excellent point about the people though...
 
Parallel arithmetic apparatus, entertainment apparatus, processing method, computer program and semiconductor device


Magoshi, Hidetaka


http://appft1.uspto.gov/netacgi/nph...goshi.IN.&OS=IN/Magoshi&RS=IN/Magoshi


Methods and apparatus for multi-processing execution of computer instructions

Magoshi, Hidetaka

http://appft1.uspto.gov/netacgi/nph...goshi.IN.&OS=IN/Magoshi&RS=IN/Magoshi



Methods and apparatus for processing pipeline instructions


http://appft1.uspto.gov/netacgi/nph...goshi.IN.&OS=IN/Magoshi&RS=IN/Magoshi

Methods and apparatus for controlling hierarchical cache memory

http://appft1.uspto.gov/netacgi/nph...goshi.IN.&OS=IN/Magoshi&RS=IN/Magoshi

Methods and apparatus for controlling a cache memory


http://appft1.uspto.gov/netacgi/nph...goshi.IN.&OS=IN/Magoshi&RS=IN/Magoshi



Some type of emulation patent by Makoto Furuhashi along with many others.

Entertainment apparatus having compatibility and computer system

http://appft1.uspto.gov/netacgi/nph...i.IN.&OS=IN/Furuhashi&RS=IN/Furuhashi
 
Brimstone said:
Parallel arithmetic apparatus, entertainment apparatus, processing method, computer program and semiconductor device


Magoshi, Hidetaka


http://appft1.uspto.gov/netacgi/nph...goshi.IN.&OS=IN/Magoshi&RS=IN/Magoshi

This patent looks like the PS2's VUs...


Brimstone said:
Methods and apparatus for multi-processing execution of computer instructions

Magoshi, Hidetaka

http://appft1.uspto.gov/netacgi/nph...goshi.IN.&OS=IN/Magoshi&RS=IN/Magoshi

nAo stumbled upon this one also... but went off on a hunt for the linked patent described as "Graphics Shading Processor"... AFAIK, he's still looking for that one! :p

This patent is GPU-related but isn't linked to the PS1 or PS2, so it's open to PS3 speculation! :p It talks about 64 sub-processors:

[0034] At action 310, the number of loop sets and the number of remainder loops are preferably determined. The number of loop sets is the number of times the main processor 202 passes each sub-instruction to the sub-processor 204. The number of remainder loops is the number of loops that will be performed by less than all of the sub-processors 204, 206 and 208. By way of example only, if there are 1000 loops to be performed and 64 sub-processors, there are 15 loop sets and 40 remainder loops.
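
The arithmetic in that paragraph is just integer division with remainder. A minimal sketch (the function name is made up here and is not taken from the patent):

```python
def split_loops(total_loops: int, num_sub_processors: int) -> tuple[int, int]:
    """Split a loop count into full loop sets and leftover remainder loops.

    A loop set is one pass in which every sub-processor gets one loop;
    the remainder loops are handled by fewer than all sub-processors.
    """
    loop_sets, remainder_loops = divmod(total_loops, num_sub_processors)
    return loop_sets, remainder_loops


loop_sets, remainder_loops = split_loops(1000, 64)
print(loop_sets, remainder_loops)  # 15 40, matching the patent's example
```

Check: 15 * 64 = 960 loops run across all 64 sub-processors, leaving 40 for a final partial pass.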
 
one said:
...
These are the reports of ISSCC 2001 (in Japanese)
http://pcweb.mycom.co.jp/news/2001/02/13/27.html
http://ascii24.com/news/i/tech/article/2001/02/08/622918-000.html?geta
and they contain the report of the 2 Sony-related papers.
One is a Sony-developed quad-core processor with 4 MIPS II cores with SPU/DTU/BPU, mainly targeted for HDTV processing and other set-top boxes. Those cores are called none other than "Processor Elements"!

27al.jpg


It's built on a 0.25um process, runs at 250MHz, and is scalable in the number of PEs.
...

Just been re-reading this summary powerpoint presentation on Cell graphics patents by Paul Zimmons,
http://www.cs.unc.edu/~zimmons/Zimmons__CellGFX.ppt (He also did a summary PPT on the original Cell patents, http://www.cs.unc.edu/~zimmons/CELL.ppt )

Here's a bunch of B3D threads discussing some of the patents covered in Dr. Zimmons' PPT,

links... , links... , links... , links... , links... , links...

Anyway, he mainly concentrates on 4 patents in his ppt...
The first one is about rendering by parallel bricks/tiles...
The second describes programming Cell...
The third is about a hardware candidate surrounding the pixelengine (not really discussed at B3D...)
The fourth is another hardware candidate for the pixelengines in the form of Salc/ Salps...

What struck me was that the third patent looks remarkably like the above die with 4 PEs, Shared Cache and a Stream controller,

pixelengine2.jpg


Now if this prototype die was available for the third patent, circa 2001, as in the above image, isn't that a little too early for a sample pixelengine that may go into Cell graphics for a 2006 console release? Or perhaps this was a prototype GS2, pre-Cell?

I was also thinking of the fourth patent, describing the Salc/Salps,

Salc.jpg


There was concern in the B3D thread discussing the Salc/Salps about there not being any TMUs (texture mapping units)... but looking at it again, each Salc seems to have local storage in the form of different types of 'latches' in the above diagram. Would this suffice in the absence of TMUs, as long as all the Salc/Salps can communicate with each other? :?

pixelengine.jpg


The other thing that struck me was that each Salc/Salp array consists of 32 Salcs = 1 Salp, and there are 256 Salps that make up a pixelengine. Each Salp is capable of 'one' 32-bit operation per cycle with full pipelines. Therefore 'one' pixelengine is capable of 256 operations per cycle (Flops and Ops?). Four pixelengines = 4 * 256 = 1024 Ops per cycle with full pipelines.
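
The same counting as a small sketch. The constants are the figures quoted in this post; the one-op-per-Salp-per-cycle rate is this post's reading of the patent, not something confirmed, and the constant names are made up for the example:

```python
SALCS_PER_SALP = 32
SALPS_PER_PIXELENGINE = 256
PIXELENGINES = 4
OPS_PER_SALP_PER_CYCLE = 1  # assumption: one 32-bit op per Salp per cycle

ops_per_pixelengine = SALPS_PER_PIXELENGINE * OPS_PER_SALP_PER_CYCLE
ops_all_pixelengines = PIXELENGINES * ops_per_pixelengine
salcs_per_pixelengine = SALCS_PER_SALP * SALPS_PER_PIXELENGINE

print(ops_per_pixelengine)    # ops per cycle, one pixelengine
print(ops_all_pixelengines)   # ops per cycle, four pixelengines
print(salcs_per_pixelengine)  # individual Salcs in one pixelengine
```

That works out to 256 ops per cycle per pixelengine, 1024 across four, and 8192 Salcs in each pixelengine.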

PS3-block.png


So looking at the GPU, 4 pixelengines would provide 1024 32-bit Ops/Flops per cycle.
The 16 APUs (each APU is capable of 8 Ops per cycle) would provide 8 * 16 = 128 32-bit Ops/Flops per cycle.

The GPU = 128 + 1024 = 1152 32-bit Ops/Flops per cycle with full pipelines.

1152 * 0.8 (800MHz) = 921.6 GOPS/GFLOPS for the GPU.

For the CPU, there are 32 APUs and each APU is capable of 8 Ops/Flops per cycle,

The CPU = 32 * 8 = 256 32-bit Ops/Flops per cycle with full pipelines.

256 * 3.6 (3.6GHz) = 921.6 GOPS/GFLOPS for the CPU.

Interestingly the CPU @ 3.6GHz = 921.6 GOPS/GFLOPS = GPU @ 800MHz!!! A coincidence?

It has a nice symmetry about it: given a fixed process and die area, you should be able to extract similar processing power by varying clock and logic densities. In this case a 3.6GHz CPU has the same processing power as an 800MHz GPU.

Total PS3 power = CPU + GPU = 921.6 + 921.6 = 1843.2 GOPS/GFLOPS = 1.8432 TFLOPS/TOPS.
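
For anyone who wants to play with the numbers, here is the whole estimate as a sketch. All unit counts, per-cycle rates and clocks are the speculative figures from this post, not confirmed specs, and the function name is made up. Clocks are kept in MHz so the arithmetic stays in integers until the final conversion:

```python
def peak_mops(ops_per_cycle: int, clock_mhz: int) -> int:
    """Peak millions of 32-bit ops per second, assuming full pipelines."""
    return ops_per_cycle * clock_mhz


# GPU guess: 4 pixelengines x 256 ops, plus 16 APUs x 8 ops, at 800MHz.
gpu_ops_per_cycle = 4 * 256 + 16 * 8
gpu_mops = peak_mops(gpu_ops_per_cycle, 800)

# CPU guess: 32 APUs x 8 ops, at 3.6GHz.
cpu_ops_per_cycle = 32 * 8
cpu_mops = peak_mops(cpu_ops_per_cycle, 3600)

total_mops = gpu_mops + cpu_mops
print(f"GPU:   {gpu_mops / 1000} GOPS")
print(f"CPU:   {cpu_mops / 1000} GOPS")
print(f"Total: {total_mops / 1_000_000} TOPS")
```

Changing either clock or the unit counts shows how quickly the "coincidence" between CPU and GPU goes away: it depends on 1152 * 800 happening to equal 256 * 3600.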

That surely is flamebait! :)
 