What's the current status of "real-time Pixar graphics"?

mrbill said:
Whoa whoa whoa whoa! Major correction on OpenGL Shading Language needed.

There are virtual *and* physical limits to the OpenGL Shading Language. What we virtualized were the things that were difficult to count in a device independent way -- temporaries, instructions and texture fetch restrictions.

But there are very real physical limits. Some of the constraints are small and harsh. Just a few: ....

Thanks for the correction, I should read the spec more carefully. I had the impression that the OpenGL Shading Language virtualized all the hardware resources by performing automatic multipass in the driver. So you are right: the shaders must be broken up by the application, so it's not as trivial as I thought, but it's certainly feasible.
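
To make that concrete, here is a minimal sketch (purely illustrative, not from the spec) of what "broken up by the application" can mean: render the first half of the shading math into a texture through an FBO, then run a second program that reads it back. The handles, the uniform name `uIntermediate`, and the modern GL entry points (via a GLEW-style loader) are assumptions for the example.

```cpp
// Hypothetical sketch of application-managed multipass: when one shader
// exceeds the hardware limits, split it into two programs and pass the
// intermediate result through a texture. Assumes a valid GL context,
// a GLEW-style loader, and already-compiled programs/resources.
#include <GL/glew.h>

void renderInTwoPasses(GLuint passOneProgram, GLuint passTwoProgram,
                       GLuint fbo, GLuint intermediateTex,
                       GLuint quadVao)
{
    // Pass 1: evaluate the first half of the shading into the FBO's texture.
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glUseProgram(passOneProgram);
    glBindVertexArray(quadVao);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

    // Pass 2: bind that texture as input and finish the shading on screen.
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glUseProgram(passTwoProgram);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, intermediateTex);
    glUniform1i(glGetUniformLocation(passTwoProgram, "uIntermediate"), 0);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}
```

On the hardware of the time this would likely have gone through pbuffers or vendor extensions rather than FBOs, but the application-side structure is the same.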
 
Pavlos said:
Thanks for the correction, I should read the spec more carefully. I had the impression that the OpenGL Shading Language virtualized all the hardware resources by performing automatic multipass in the driver. So you are right: the shaders must be broken up by the application, so it's not as trivial as I thought, but it's certainly feasible.

Does this mean we can stop debating about this?;)

Nah....:) For the record, I've just received my first HLSL book from Wolfgang, called ShaderX2. I'm interested to see how they pulled off the Cook-Torrance and Oren-Nayar illumination models in hardware (which is very nice, btw). So, once I know more of your territory, I'll have more substance to talk smack!;)


-M
 
If we fix the target to Toy Story quality (the real thing, not cheap imitations) then it's just a matter of time.
If the hardware companies abandon the brute-force algorithms and adopt more elegant ones, then it will happen sooner (probably in the next ten years).
If they continue patching SGI's pipeline, designed for Gouraud-shaded scenes with a few hundred polygons, then it will happen later.


This I find totally understandable, as someone outside the field of graphics or even computing in general.

The way I read this is, we might see Toy Story quality with, say, NV100 and PlayStation 4.
 
If the hardware companies abandon the brute-force algorithms and adopt more elegant ones, then it will happen sooner (probably in the next ten years).
If they continue patching SGI's pipeline, designed for Gouraud-shaded scenes with a few hundred polygons, then it will happen later.
What would you suggest as an alternative?
 
Concerning current graphics

IIRC, there was an article that was printed in 2001 by one of the main PlayStation magazines concerning Sony. In the article they showed a picture of a possible new console system. It consisted of 32 GPUs and weighed in at around 109 (or was it 119) pounds :oops: . It also looked like a giant cube, or around 10 Sony PlayStation 2s stacked on top of one another.

The main highlight was that Sony claimed this possible new console (which I believe contains "The Cell" chips from Sony) rendered "The Matrix" graphics in real time. According to Sony and the Wachowski brothers (who were quoted in the article), using conventional methods it would take up to 1 hour and 15 minutes to do one frame of "The Matrix" graphics, but they stated that using this console they were able to do it all in real time!

Raystream

P.S. I'm currently looking for the article.
 
I think I remember that article, Raystream, but I think it was a demo of the GScube from Sony, which hasn't materialised. I think it was an array of EEs with 32MB of cache rather than the PS2's 4MB.

Can I ask, before getting shot down by everyone:
Is the problem that we are still following an ISA that is about 20 years old as the leading platform for realtime PC graphics? I am talking about x86.

I know x86 doesn't have much to do with the graphics side but currently a system is designed around this old architecture, specifically the CPU and some of the buses present in a system (PCI, AGP, even the new PCI-X).

Something like, ahem, Cell is designed to do what mathematics needs most, and that is shitloads of FPU power... which helps with rendering lots of 3D effects: polygons, pixels, zixels and faeries (plus the odd rubber duck here and there), no?
 
What you just stated, Tahir, is one main reason for the push towards much more flexible pixel and vertex shaders. The CPU just isn't made for these calculations, but the GPU is. So, if the GPU can be made flexible enough to do all of the graphics calculations, then speed can be improved dramatically.
 
The 'x86?' question - generalised to 'RISC vs CISC' or any of a dozen other related questions - is an unusual one.

A very good article (I have no idea where I read it) stated that, given the way the P4 / Athlon work internally - essentially as RISC machines - the x86 ISA essentially becomes just a useful way of compressing code. (For example, it is reckoned that the same amount of instruction cache is between half again and twice as effective on x86 as it is on PowerPC, as the average x86 instruction takes less space.)

An alternate way to get the same answer is that modern ALUs and caches take up so much space that the additional 'glue' needed to implement x86 decoding isn't particularly important. I would say the downside is likely in chip design and verification complexity more than anything else.

I would say it is probably a bit of a waste to have 'just' a CPU which emphasises mathematics processing. There is plenty of work for which a massive math processor would be a waste - take Jeff Minter's comment that the 68000 on the Jaguar was just a glorified joystick processor.

As regards Cell - well, I remember Sony making similar claims for the Emotion Engine / GS combo when PS2 came out (that it was a revolutionary design that would totally change 3D gaming). By the time the PS2 shipped a PC with a decent 3D card was already significantly ahead of it. Of course, it still competes 'ok' because the PC games don't target the top-end space, but let's face it, it's only working at 640x480 resolution, it doesn't HAVE to work that hard!
 
The need to be compatible with the x86 architecture has eliminated one major facet of innovation in the PC space: the instruction set.

Personally, I think it's much more effective to argue CISC vs. VLIW instead of CISC vs. RISC. The basic idea is that if the "advanced" instruction set takes into account things that a CISC instruction set doesn't - information that is lost when the code is compiled for a CISC architecture - then that information can never be retrieved in full.

Said another way, if the "core" instruction set is different enough from the "external" instruction set, there's no feasible way to do the instruction decoding in realtime.

Any way you slice it, the x86 architecture has really hog-tied CPU developers, preventing them from innovating in a significant way: by improving the instruction set.
 
I would agree that it's 'restricted innovation in the instruction set' except that I'm not sure that term has any meaning. Nobody is forced to use an x86 ISA, and for many specific applications it is not used. However, given current software models, any single platform needs a standard architecture, or a very, very small subset of architectures, so I'm not sure that there's space in the real world for much innovation anyway. If the Linux source-distribution model takes off then maybe there is more opportunity...

My point is that decoding can convert anything you can express in x86 to anything you can express in most other architectures. Whether or not this decoding is simple enough to do in real time is questionable. One could argue that in the modern cores - particularly the Pentium-M and the Opteron - the goals of a VLIW architecture (regularly achieving many 'microinstructions' / 'basic operations' simultaneously 'executed' per clock) are being met.
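
As a toy illustration of that decoding step (not how any real decoder works), a read-modify-write x86 instruction such as `add [ebx], eax` is commonly described as being cracked into a load, an ALU op and a store:

```cpp
// Toy sketch only: one CISC read-modify-write instruction expressed as a
// short sequence of RISC-like micro-ops. Real decoders are far more involved.
#include <cstdio>
#include <vector>

enum class MicroOp { Load, AluAdd, Store };

// Hypothetical "decode" of `add [ebx], eax`.
std::vector<MicroOp> decodeAddMemReg()
{
    return {
        MicroOp::Load,    // tmp      <- mem[ebx]
        MicroOp::AluAdd,  // tmp      <- tmp + eax
        MicroOp::Store    // mem[ebx] <- tmp
    };
}

int main()
{
    for (MicroOp op : decodeAddMemReg())
        std::printf("uop %d\n", static_cast<int>(op));
    return 0;
}
```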
 
No, I think a standard architecture *was* necessary, because computers just weren't fast enough.

Today we have computers that are fast enough to do the decoding in realtime. What am I trying to say? Today we could have a Java-like language that would first compile to a sort of pseudo-machine code. This code would be higher-level than machine code, but most of the optimization information would already be discovered, so that the rest of the compilation to a specific architecture could be done by drivers.

In other words, if the interface was standardized at a higher level, we would be better off in terms of performance today than we currently are. But, the interface was standardized long before there was an option to sacrifice some performance for the promise of extra performance in the future.
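
A rough sketch of the idea (purely illustrative; the "virtual" opcodes and structures below are made up for the example): the distributed binary is portable pseudo-code, and an install-time step owned by the platform lowers it to whatever the local CPU actually executes.

```cpp
// Illustrative only: a portable "virtual ISA" program lowered to a
// native instruction stream at install time, so the hardware ISA can
// change underneath without recompiling from source.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical portable operations, already optimized by the offline compiler.
enum class VOp : std::uint8_t { LoadConst, Add, Mul, Ret };

struct VInst {
    VOp op;
    std::int64_t imm; // used only by LoadConst
};

// The per-machine back end: maps each virtual op to a native encoding.
// A real translator would also do register allocation and scheduling.
std::vector<std::string> lowerToNative(const std::vector<VInst>& program)
{
    std::vector<std::string> native;
    for (const VInst& i : program) {
        switch (i.op) {
        case VOp::LoadConst: native.push_back("mov  r, " + std::to_string(i.imm)); break;
        case VOp::Add:       native.push_back("add  r, s"); break;
        case VOp::Mul:       native.push_back("imul r, s"); break;
        case VOp::Ret:       native.push_back("ret"); break;
        }
    }
    return native; // cached on disk so translation happens once per install
}

int main()
{
    // (2 + 3) expressed in the portable form, then lowered for this machine.
    std::vector<VInst> program = {
        {VOp::LoadConst, 2}, {VOp::LoadConst, 3}, {VOp::Add, 0}, {VOp::Ret, 0}
    };
    for (const std::string& line : lowerToNative(program))
        std::puts(line.c_str());
}
```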

And yes, of course I am only talking about the PC market, where x86 CPU's dominate. There are other markets that use other CPU's, and these markets are the best reason to see that the x86 architecture is just plain inefficient.
 
I don't think leading-edge x86 CPU's are behind the major alternatives in the 'desktop/workstation CPU' space (PowerPC, UltraSPARC, Itanium... any others?)

I don't particularly think they're ahead, though - which, given the development money put in, maybe they should be.

Other markets have different priorities than sheer horsepower, and there the competition to x86 is more intense, because the decoding overhead of x86 is a relatively larger portion of the chip - no huge caches, probably no FPUs.
 
The problem with higher-level virtual instruction sets is that hardware manufacturers want lock-in. I think something like LLVA would be a good thing, but unless it sneaks in the side door through compiler technology (LLVM, which, I'll repeat, it's a damn shame isn't being considered for next-gen gcc due to stupid politics), I don't see any hardware manufacturer embracing it.

It might not have been an option for x86, but it certainly is an option for Cell. However, I see no indication they will go for anything but low-level binary compatibility and lock-in.
 
MfA said:
The problem with higher-level virtual instruction sets is that hardware manufacturers want lock-in. I think something like LLVA would be a good thing, but unless it sneaks in the side door through compiler technology (LLVM, which, I'll repeat, it's a damn shame isn't being considered for next-gen gcc due to stupid politics), I don't see any hardware manufacturer embracing it.

It might not have been an option for x86, but it certainly is an option for Cell. However, I see no indication they will go for anything but low-level binary compatibility and lock-in.

A virtual instruction set architecture (V-ISA) implemented via a processor-specific software translation layer can provide great flexibility to processor designers. Recent examples such as Crusoe and DAISY, however, have used existing hardware instruction sets as virtual ISAs, which complicates translation and optimization. In fact, there has been little research on specific designs for a virtual ISA for processors. This paper proposes a novel virtual ISA (LLVA) and a translation strategy for implementing it on arbitrary hardware. The instruction set is typed, uses an infinite virtual register set in Static Single Assignment form, and provides explicit control-flow and dataflow information, and yet uses low-level operations closely matched to traditional hardware. It includes novel mechanisms to allow more flexible optimization of native code, including a flexible exception model and minor constraints on self-modifying code. We propose a translation strategy that enables offline translation and transparent offline caching of native code and profile information, while remaining completely OS-independent. It also supports optimizations directly on the representation at install-time, runtime, and offline between executions. We show experimentally that the virtual ISA is compact, it is closely matched to ordinary hardware instruction sets, and permits very fast code generation, yet has enough high-level information to permit sophisticated program analyses and optimizations.

I agree with you, MfA; this technology does sound very interesting.

IMHO, there is some indication STI might go for something similar regarding CELL: Suzuoki's CELL patent talks about APUs all having the same ISA and all being able to execute at least the basic Apulet (basic = not using more than one APU in parallel).

He also says that the register file size, the number of FP/FX units and other resources can be changed, since, from what can be gathered from the patent, none of that has any bearing on the ISA.
 
There is nothing there about on-line translation or OS hooks for caching that translation, which are necessities for such a virtual instruction set.
 
IIRC, there was an article that was printed in 2001 by one of the main PlayStation magazines concerning Sony. In the article they showed a picture of a possible new console system. It consisted of 32 GPUs and weighed in at around 109 (or was it 119) pounds. It also looked like a giant cube, or around 10 Sony PlayStation 2s stacked on top of one another.

The main highlight was that Sony claimed this possible new console (which I believe contains "The Cell" chips from Sony) rendered "The Matrix" graphics in real time. According to Sony and the Wachowski brothers (who were quoted in the article), using conventional methods it would take up to 1 hour and 15 minutes to do one frame of "The Matrix" graphics, but they stated that using this console they were able to do it all in real time!

Raystream

P.S. I'm currently looking for the article.


I believe what you are describing here is Sony's GS Cube -

A render box with an array of 16 sets of PS2 processors; that is, 16 Emotion Engines and 16 Graphics Synthesizers. The EEs are basically the same ones found in our PS2s, but the GSs are 'GS I-32s': they have 32 MB of eDRAM instead of 4 MB. Also, the main system memory was quadrupled: instead of 32 MB x 16, the GS Cube has 128 MB x 16. So that's 2 GB of main system memory plus 512 MB of embedded eDRAM video memory from the 16 GS I-32s.

The GS Cube could not function independently, but used an SGI workstation (or high-end SGI visualization system) as the host, or front end; the GS Cube would do the rendering. They did parts of Final Fantasy: TSW in realtime, at 30 or 60 fps, but the realtime versions were drastically scaled down from the movie, with far less geometry, effects, lighting, shaders, etc. The Matrix was also done on the GS Cube, but I'm sure what it did was a simplified realtime version (less impressive than the movie).

Also, besides the GSCube above, which is also known as GSCube16, there was a "GSCube64" planned. It had, or would have had, 4x the specs of the GSCube16 since it had 64 EEs and 64 GS I-32s and 4x the memory. I don't know if this version was ever demo'd but here is an EETimes article mentioning it, as well as some detail about the real-time Final Fantasy demo made for GSCube16.


http://www.eetimes.com/story/OEG20000913S0026

Sony steps up Playstation-based graphics system plans

By Yoshiko Hara

EE Times
September 13, 2000 (5:34 p.m. ET)


TOKYO — Sony Computer Entertainment Inc. has accelerated its plans to roll out a high-end graphics computer based on its Playstation game console technology. The company announced plans this week to introduce a system called GScube next year that will use 64 processor boards with Playstation 2 technology.
The resulting parallel-processing computer will act as a graphics visualization machine with a 3-D processing capability of 4.16 gigapolygons per second and a resolution of 60 frames/second (progressive scan) at 1,080 x 1,980 pixels, the company said.

The 64-board GScube will put Sony ahead of the graphics system road map it announced a year ago, when it promised to develop a system with 10 times the processing clout of Playstation 2 in 2000, followed by a 100-times version in 2002 and a 1,000x version before the end of the decade.

Sony demonstrated a 16-processor prototype at the Siggraph 2000 show in New Orleans last July, built in collaboration with more than 20 companies. "With the feedback from the demonstration at Siggraph, we realized that the present [16-processor] prototype did not have enough performance for 3-D graphics creation and realistic rendering in real-time," said Ken Kutaragi, president of Sony Computer Entertainment. "We are planning to introduce a system with about 64 parallel-processing units next year."

Sony and one of its collaborators, a film production company called Square, jointly demonstrated the GScube in Tokyo on Tuesday (Sept. 12) by synthesizing a footage in real-time from the computer-graphics movie version of Final Fantasy. Square is now producing this film, based on a popular computer game, for release next summer in the United States.

"At present it takes five hours to render one frame," said Kazuyuki Hashimoto, senior vice president and chief technology officer of Square USA. "If GScube can process the graphics data in real-time, that means it will take only 1/30 second per frame."

The demo with the current prototype, however, reduced rendering for textures such as hair to suit the abilities of the system. Where in the movie a character's hair was rendered as 40,000 lines, for example, the demo displayed only 4,000.

"The current prototype needs data tuning," said Hashimoto. "But as the performance of the 64-processor parallel GScube will be four times [that of today's system], it can do more." He said Square was actively investigating the possibility of using GScube in its production work.


from usenet

http://groups.google.com/groups?selm=3a302009.2841640@news.rim.net&oe=UTF-8&output=gplain

The theoretical maximum polygon throughput of the GSCube-16 is a billion polygons per second. Developers working on the prototype say they've achieved around 300 million polygons per second.

The theoretical maximum polygon throughput of the GSCube-64 is about 4 billion polygons per second. The GS chip seems designed to be exceptionally good at parallel processing. It outputs not only RGB, but depth information as well, so that the output of multiple GS chips can be merged easily.

http://groups.google.com/groups?selm=8pr67d$d3b$1@nnrp1.deja.com&oe=UTF-8&output=gplain

60fps Film Rez rendering is almost here - the post goes on to quote the same EE Times article reproduced above.



Sony's official spec for GSCube16 is 1.2 billion polygons/sec. Obviously this theoretical figure comes from the 75 million polys/sec of the Graphics Synthesizer (GS I-32) x 16.

The official spec for GSCube64 is 4.16 billion polygons/sec. I'm not sure how they got this figure; it's a little less than 75 million x 64, but close enough. Perhaps the memory subsystems make this the limit (otherwise 75M x 64 is 4.8 billion).
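
For what it's worth, the raw arithmetic behind those figures (the per-GS rate implied by the 4.16 billion spec is just what the division gives, not a number Sony published):

\[
16 \times 75\ \text{Mpolys/s} = 1.2\ \text{Gpolys/s},\qquad
64 \times 75\ \text{Mpolys/s} = 4.8\ \text{Gpolys/s},\qquad
\frac{4.16\ \text{Gpolys/s}}{64} \approx 65\ \text{Mpolys/s per GS}.
\]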
 
MfA said:
There is nothing there about on-line translation or OS hooks for caching that translation, which are necessities for such a virtual instruction set.

Asking a bit much from an almost 3-year-old patent now, aren't we?

Maybe this is covered somewhere else as no detail on the CELL OS has leaked yet.
 
Dio said:
The 'x86?' question - generalised to 'RISC vs CISC' or any of a dozen other related questions - is an unusual one.

I didn't really mean to generalise it to a CISC vs RISC issue, as that is moot, as you explain below. I meant more of a clean sweep of the internals of the PC, starting at the motherboard level, which is what is slowly happening. I would like to see a revolution rather than an evolution, but that simply can't happen due to the whole industry having so many resources invested in keeping the internals the same.

A very good article (I have no idea where I read it) stated that, given the way the P4 / Athlon work internally - essentially as RISC machines - the x86 ISA essentially becomes just a useful way of compressing code. (For example, it is reckoned that the same amount of instruction cache is between half again and twice as effective on x86 as it is on PowerPC, as the average x86 instruction takes less space.)

You might have read it at www.arstechnica.com.

An alternate way to get the same answer is that modern ALUs and caches take up so much space that the additional 'glue' needed to implement x86 decoding isn't particularly important. I would say the downside is likely in chip design and verification complexity more than anything else.

I would say it is probably a bit of a waste to have 'just' a CPU which emphasises mathematics processing. There is plenty of work for which a massive math processor would be a waste - take Jeff Minter's comment that the 68000 on the Jaguar was just a glorified joystick processor.

Carmack was also critical of Atari for using such a low-powered CPU, and I remember Jeff Minter's comments. That guy should be making more games and spending less time with his llamas and sheep... :)

As regards Cell - well, I remember Sony making similar claims for the Emotion Engine / GS combo when PS2 came out (that it was a revolutionary design that would totally change 3D gaming). By the time the PS2 shipped a PC with a decent 3D card was already significantly ahead of it. Of course, it still competes 'ok' because the PC games don't target the top-end space, but let's face it, it's only working at 640x480 resolution, it doesn't HAVE to work that hard!

You also have to remember that the PS2 was released in late 2000, when the GF1 DDR and V5 5500 were released and the high-end CPUs were Thunderbirds at 1.2GHz and a P4 at 1.5GHz. The PS2 can still keep pace with current development on XBOX and GC, IMHO, when programmed efficiently. Sony certainly did something against the grain and have in the long run reaped some benefits from going that route.
 
You also have to remember that the PS2 was released in late 2000, when the GF1 DDR and V5 5500 were released and the high-end CPUs were Thunderbirds at 1.2GHz and a P4 at 1.5GHz. The PS2 can still keep pace with current development on XBOX and GC, IMHO, when programmed efficiently. Sony certainly did something against the grain and have in the long run reaped some benefits from going that route.


Let's see: the PS2 came out in early 2000 in Japan; at the time the GF1 (NV10) DDR was the fastest consumer 3D card out there, and the GF2 GTS was about to come out. When the PS2 came out in North America in fall 2000, the GF2 Ultra was the fastest out there.
 