Can this SONY patent be the PixelEngine in the PS3's GPU?

Sony only cares about peak performance and raw numbers, it's absolutely true. To think that they will limit the fillrate to 8GP/S just because more would be 'pointless' is wishful thinking.
 
There are people out there who do :LOL:

'Wishful' might have been the wrong word, I agree, but you get what I mean.
 
I'm not sure if I understood the patent correctly, but it seems to me that a SALP is meant to replace fixed function math units dedicated to things like texture filtering, LOD calculations, blending etc... and that their general purpose nature means that each one can service many different currently hardwired needs. My guess is that performance suffers for a given operation versus a dedicated hardware implementation, but that this is made up for by much better utilization (and flexibility).
 
ERP said:
Quote:
I think the same 1998 argument applies: if PS2 was to have 2.4 Gpix/s, a WTF would be applied! Six years later in 2004, simple Moore's law should take us above 30 Gpix/s, otherwise I'd sack my R&D team! All those billions of Yen...

On the subject of what we'd do with 10's of Gpix/sec: since the entire GPU is programmable, including the pixel engine, would we not be able to implement a Reyes pipeline? Or other exotic delights.

Alright, you missed my point......

My point is: what am I trading for those 10's of billions of pixels? Increasing the die area to increase fillrate means that die area can't be used for, say, more ALU blocks, or better texture filtering, etc.....

How useful is 10 billion pixels per second if you're entirely limited by ALU speed? The statement is more about balance than it is about fillrate.

My concern about PS3 in general is exactly what Sony will leave off the die for cost reasons. I have to assume we'll get decent texture filtering that works this time, I have to assume we'll have a complete set of blending ops, but I do worry that they might decide on a "novel" architecture and then have someone that doesn't understand it cut significant features for cost reasons...... IMO this is what happened to the GS.....

When it comes to system performance the devil is in the details and it's the details of what I'm not seeing or hearing that worry me. We'll know soon enough and it's not like I have any control over it so........

Okay, from that angle, I suppose what you're essentially saying is: for a given die space, what is the optimum balance of computational units for a given set of problems?

Well, we could generalise and say that for the GPU the given problem is graphics processing. Going from PS2 (180 nm) to PS3 (65 nm) gives roughly 8x the transistor budget for the same die area, since (180/65)^2 ≈ 7.7. Do we increase the number of ALUs by a factor of 8 over the previous gen, or look at optimising ALUs per feature/function (which can be very subjective with cost)? Below is the aim of this patent,

SUMMARY OF THE INVENTION
[0012] The present invention has been made under the above circumstances, and therefore an object of the present invention is to realize various operations by one computing unit without increasing the costs...

It seems to me that this Pixel Engine / VS is a micro-architecture of the whole CELL philosophy, i.e. granular, scalable, flexible ALUs that are not fixed-function. There doesn't seem to be any hardwiring in the GPU at all, so I fail to see where the balance is lost, since everything is programmable without a cost increase? :? If the result of this architecture is a higher than expected fillrate without the loss of balance, then so be it. :)

The principal compute units are APUs and SALPs, plus eDRAM for storage, all fighting for die space on the PS3 chipset (BE + 4 VS). The question is whether

32 + 16 APUs, 1024 SALPs (256 per VS), 64 + 32 MB of eDRAM and 32 MB of VRAM (image cache?) is a wise use of transistors on the chipsets. :LOL:

I suppose this is the preferred 'embodiment' and there's a high chance that something will get the chop for cost reasons...what will it be? It's CELL, just drop a PE or VS and everything scales down nicely. :LOL:
 
Paul said:
Sony only cares about peak performance and raw numbers, it's absolutely true. To think that they will limit the fillrate to 8GP/S just because more would be 'pointless' is wishful thinking.

*off topic*, Yeah I agree...pure marketing BS*** but that's the raison d'être of sales & marketing: you *cough* lie *cough* to your clients and then *leg it...*...but the killer is that Joe Public doesn't know any different and a mass market brand is born...*keerchinggg!*. They must be doing something right, because Joe Public doesn't have to buy PS! The best marketing is word of mouth....;)


PSurge said:
I'm not sure if I understood the patent correctly, but it seems to me that a SALP is meant to replace fixed function math units dedicated to things like texture filtering, LOD calculations, blending etc... and that their general purpose nature means that each one can service many different currently hardwired needs. My guess is that performance suffers for a given operation versus a dedicated hardware implementation, but that this is made up for by much better utilization (and flexibility).

Pretty much hit the nail on the head! However, their aim in the patent is to achieve this without the computational costs...not sure if it will hold up in the real world...we wait in anticipation. 8)
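To make that a bit more concrete, here's a minimal sketch of the general idea as I read it: one tiny programmable unit that exposes little more than a multiply-add, with bilinear filtering and alpha blending built on top of that same primitive. All the names (salp_madd and friends) are mine, not from the patent, and a real SALP obviously wouldn't look like scalar C++; it's only meant to show how one generic unit can service several jobs that are normally hardwired.

Code:

// Hypothetical illustration: one small programmable unit (reduced here to a
// single multiply-add "op") standing in for several fixed-function jobs.
// Names are made up for the sketch, not taken from the patent.
#include <cstdio>

// The only primitive the "unit" provides: a*b + c.
static float salp_madd(float a, float b, float c) { return a * b + c; }

// Bilinear filtering expressed purely as chained multiply-adds (lerps).
static float bilinear(float t00, float t10, float t01, float t11,
                      float fx, float fy) {
    float top = salp_madd(fx, t10 - t00, t00);   // lerp along x, row 0
    float bot = salp_madd(fx, t11 - t01, t01);   // lerp along x, row 1
    return salp_madd(fy, bot - top, top);        // lerp along y
}

// Source-over-dest alpha blend expressed with the same primitive.
static float blend_over(float src, float dst, float alpha) {
    return salp_madd(alpha, src - dst, dst);     // dst + a*(src - dst)
}

int main() {
    printf("filtered texel: %f\n", bilinear(0.0f, 1.0f, 0.0f, 1.0f, 0.25f, 0.5f));
    printf("blended pixel : %f\n", blend_over(1.0f, 0.2f, 0.5f));
    return 0;
}

The obvious trade-off is exactly the one PSurge mentions: per-op it's slower than a dedicated bilinear or blend unit, but the same silicon is never idle just because you happen not to need that particular fixed function this frame.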
 
Jaws said:
I suppose it could be stretched to fit the 4 VS GPU, but I still find it odd that the GPU is hooked off the BE in Diag A but off the main bus in Diag C. Something just doesn't add up there? :?

All things equal with the BE and VS, which bus layout seems the most efficient, Diag A or Diag C?

The two figures are from two different patents; the setup is different only because each figure has to emphasise the main purpose of its own patent.
The schematics are specific to the patent and don't have to be "precise" on the details (the details are actually everything that isn't the subject of the patent). :D

ERP said:
I just have to laugh when people propose these ludicrous polygon counts they're expecting to see in next gen titles.

You were talking about a car game; how many polys do you think we can expect on a classical racer next-gen? Around 50 Mpps, 100 Mpps, 200 Mpps or more?
 
ERP said:
I think the same 1998 argument applies: if PS2 was to have 2.4 Gpix/s, a WTF would be applied! Six years later in 2004, simple Moore's law should take us above 30 Gpix/s, otherwise I'd sack my R&D team! All those billions of Yen...

On the subject of what we'd do with 10's of Gpix/sec: since the entire GPU is programmable, including the pixel engine, would we not be able to implement a Reyes pipeline? Or other exotic delights.

Alright, you missed my point......

My point is: what am I trading for those 10's of billions of pixels? Increasing the die area to increase fillrate means that die area can't be used for, say, more ALU blocks, or better texture filtering, etc.....

How useful is 10 billion pixels per second if you're entirely limited by ALU speed? The statement is more about balance than it is about fillrate.

My concern about PS3 in general is exactly what Sony will leave off the die for cost reasons. I have to assume we'll get decent texture filtering that works this time, I have to assume we'll have a complete set of blending ops, but I do worry that they might decide on a "novel" architecture and then have someone that doesn't understand it cut significant features for cost reasons...... IMO this is what happened to the GS.....

When it comes to system performance the devil is in the details and it's the details of what I'm not seeing or hearing that worry me. We'll know soon enough and it's not like I have any control over it so........

IMHO, the GS is just a very early design ( its feature-set freeze apparently pre-dates the EE) made with three things in mind:

Very high polygon counts, resulting in smaller than usual ( for the times ) polygons.

Very high sustained fill-rate ( PSOne's biggest weakness according to some developers, together with the slow Triangle set-up performance ).

Unparalleled partially-opaque/transparent polygon processing ( alpha-blending effects ).

I do not see evidence of major features that almost made it in, but were cut out, except maybe the mip-map LOD calculation Hardware.

In the document "Designing and Programming the Emotion Engine", the EE's architects state that, thanks to their achievements with the GS ( parallel rendering engine and fast e-DRAM ), they even kind of over-achieved on rasteriser speed, and that in designing the EE they were trying to catch up with that kind of potential geometry-processing monster.

Let's see what kind of trade-offs they made with the PSP before we judge them too hastily :).
 
ERP wrote:
I just have to laugh when people propose these ludicrous polygon counts they're expecting to see in next gen titles.

You were talking about a car game; how many polys do you think we can expect on a classical racer next-gen? Around 50 Mpps, 100 Mpps, 200 Mpps or more?

Wouldn't want to put a number on it at this point. Besides, I'm not working on a racing game at the moment.

But I will predict that the real limit will be memory; depending on what you think memory sizes will be, that'll give you a solid ballpark number.

The really big increases this time will be in pixel processing; that's where you'll see the huge leaps in performance.
 
I do not see evidence of major features that almost made it in, but were cut out, except maybe the mip-map LOD calculation Hardware.

I have it on pretty good authority that there were a number of "features" cut from the original design at the last minute. The mipmap LOD calc is the most obvious one, but I still can't believe they omitted modulate from the blending modes.
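For anyone wondering why that one stings: if I remember the GS blend stage right, it is fixed to out = ((A - B) * C >> 7) + D per channel, where A, B and D can be the source colour, the dest colour or zero, and C is an alpha/fixed value (with 0x80 meaning 1.0). Since the dest colour can never sit in the C slot, a plain src*dst modulate simply can't be expressed. A rough sketch of that limitation (hypothetical helper names; treat the exact encoding as approximate):

Code:

// Sketch of why a modulate blend doesn't fit the GS blend equation.
// GS blend (per 8-bit channel): out = ((A - B) * C >> 7) + D,
// with A, B, D drawn from {src, dst, 0} and C from {src alpha, dst alpha, fixed};
// alpha is encoded so that 0x80 represents 1.0.
#include <algorithm>
#include <cstdio>

static int gs_blend(int A, int B, int C, int D) {
    return std::clamp(((A - B) * C >> 7) + D, 0, 255);
}

// What a modulate blend mode would compute: out = src * dst.
static int modulate(int src, int dst) {
    return (src * dst) >> 8;
}

int main() {
    int src = 200, dst = 100;
    // A 50% alpha blend is expressible: A=src, B=dst, C=0x40 (0.5), D=dst -> 150.
    printf("alpha blend: %d\n", gs_blend(src, dst, 0x40, dst));
    // Modulate is not: no choice of A, B, C, D yields src*dst, because the dest
    // colour can never appear in the C (alpha) slot.
    printf("modulate   : %d\n", modulate(src, dst));
    return 0;
}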

To be honest I'm looking forward to seeing real details on PS3, but a lot of what I've heard tends to imply that Sony are pretty happy with PS2's architecture, and that does frighten me somewhat. It certainly has its strengths, but in other ways it's totally crippled.

All IMO of course.
 
Panajev said:
No, it will not explode like that, but it still is going to increase compared to what you do now, especially because I do not see the jump in math ops per cycle being so massive as to completely eliminate the use of cube-maps, 3D textures, etc... as look-ups/shortcuts.
I disagree - it's going to decrease. If nothing else, it will decrease relative to the number of math ops used, and probably further yet when you have math ops that are comparably fast to using a texture lookup table approximation instead.
Eg. when a vector normalize no longer costs an arm and a leg, using a bunch of cube-lookups for normalizing just stops making sense.
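To put a rough shape on that: a normalize is just a dot product, a reciprocal square root and a scale, so once those are cheap ALU ops the cube-map route mostly buys you fetch latency, bandwidth and quantization error. A toy sketch below; the "lookup" path is faked (it only models the precision hit of an 8-bit normalization cube-map, there's no real texture fetch here), so treat it purely as an illustration.

Code:

// Rough sketch of the trade-off: normalizing with math ops versus reading a
// quantized normal out of a (here faked) normalization cube-map.
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

// "Math op" path: dot product + rsqrt + scale -- a handful of ALU ops.
static Vec3 normalize_alu(Vec3 v) {
    float inv = 1.0f / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x * inv, v.y * inv, v.z * inv };
}

// "Lookup" path stand-in: on real hardware this would be a cube-map texture
// fetch (latency, bandwidth, cache pressure) returning an 8-bit-ish normal.
static Vec3 normalize_lut(Vec3 v) {
    Vec3 n = normalize_alu(v);                  // pretend this came from a texel
    auto q = [](float f) { return std::round(f * 127.0f) / 127.0f; };
    return { q(n.x), q(n.y), q(n.z) };          // quantized to 8-bit precision
}

int main() {
    Vec3 v{ 3.0f, -4.0f, 12.0f };
    Vec3 a = normalize_alu(v), b = normalize_lut(v);
    printf("alu: %f %f %f\n", a.x, a.y, a.z);
    printf("lut: %f %f %f\n", b.x, b.y, b.z);
    return 0;
}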

How slow would it be for the APU to DMA the current context ( stack and PC ) back to shared DRAM ( we only have 128 KB of LS per APU ) and start processing on a new pixel?
I guess that context switching would not be needed when shading pixels from the same primitive: the problem is that if primitives start descending into the 1-4 pixel range in terms of area, then we have to make sure each APU is processing multiple primitives in parallel.
APUs won't work on one pixel/vertex at a time; it makes no sense to do that. Even with the short 4-cycle instruction latency on the VUs you need to write a pipelined loop with several vertices in flight to get optimal performance, and the APU will only have a deeper pipeline, perhaps much deeper than that.
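To illustrate what "several in flight" means in practice: with a multi-cycle MADD latency (4 cycles on the VUs; I'm only assuming something similar or deeper for the APU) a loop that finishes one vertex before starting the next stalls on its own results, so you batch and interleave independent vertices instead. The C++ below only shows the batching structure, not real VU/APU microcode scheduling.

Code:

// Toy illustration of software pipelining: keep several independent vertices
// "in flight" so a long ALU latency never stalls the loop. Real microcode
// interleaves the individual multiply-adds by hand; this only shows the shape.
#include <cstdio>
#include <vector>

struct Vec4 { float x, y, z, w; };

static Vec4 transform(const float m[16], Vec4 v) {
    return { m[0]*v.x + m[4]*v.y + m[8]*v.z  + m[12]*v.w,
             m[1]*v.x + m[5]*v.y + m[9]*v.z  + m[13]*v.w,
             m[2]*v.x + m[6]*v.y + m[10]*v.z + m[14]*v.w,
             m[3]*v.x + m[7]*v.y + m[11]*v.z + m[15]*v.w };
}

int main() {
    const int kInFlight = 4;                    // assumed latency to hide
    float identity[16] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
    std::vector<Vec4> verts(16, Vec4{1, 2, 3, 1});

    // Process kInFlight independent vertices per iteration; on real hardware
    // their multiply-adds would be interleaved so each result is ready by the
    // time it is consumed, instead of stalling on one vertex's dependency chain.
    for (size_t i = 0; i + kInFlight <= verts.size(); i += kInFlight)
        for (int j = 0; j < kInFlight; ++j)
            verts[i + j] = transform(identity, verts[i + j]);

    printf("v0 = %f %f %f %f\n", verts[0].x, verts[0].y, verts[0].z, verts[0].w);
    return 0;
}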
Actually, on the subject of working on multiple primitives at a time, that kinda brings up the question of how pixel data to process will be submitted to the APUs in the first place. Sounds a little less complex to handle than texture fetches, but it still has its own set of issues I wouldn't be sure about off hand...


wunderchu said:
"wishful thinking" .... you make it sound as though people want the PlayStation 3 to be less powerful ............
The problem is that some might just as well... overinflating fillrate at the expense of everything else could very well make the machine weaker...
 
Fafalada said:
Panajev said:
No, it will not explode like that, but it still is going to increase compared to what you do now, especially because I do not see the jump in math ops per cycle being so massive as to completely eliminate the use of cube-maps, 3D textures, etc... as look-ups/shortcuts.
I disagree - it's going to decrease. If nothing else, it will decrease relative to the number of math ops used, and probably further yet when you have math ops that are comparably fast to using a texture lookup table approximation instead.
Eg. when a vector normalize no longer costs an arm and a leg, using a bunch of cube-lookups for normalizing just stops making sense.

You hit very good points Fafalada ( about the ability to avoid cube-maps for things such as vector normalization, etc... ), but while I see the ratio of texture fetches vs math ops decreasing, I still do not see a decrease in the use of texture ops from what we do now; instead I still see an increase ( one which seems to be matched by a possible increase in texture fetch latency ), just not one growing as fast as math op usage.

That increase unfortunately might hold the pipeline back and force programmers to write pipelined loops with more and more vertices and primitives in flight, which, pardon the pun, is not something primitive.

Actually, on the subject of working on multiple primitives at a time, that kinda brings up the question of how pixel data to process will be submitted to the APUs in the first place. Sounds a little less complex to handle than texture fetches, but it still has its own set of issues I wouldn't be sure about off hand...

As food for thought, can you expand on this point, please?
 
Megadrive1988 said:
so Graphics Synthesizer design was frozen in 1997?

Feature-wise, more or less, as you need to lock the specs down a bit before you start manufacturing... in early 1999 the EE was already running ( in limited batches I am sure ), so it is not impossible to think that the GS ( whose design seems to pre-date the EE ) was feature-locked during 1997.
 
It does make sense, Panajev, as PS2 was possibly going to launch in late 1999 in Japan.

I suppose PS3's GPU is now locked down; if not, it will be shortly, assuming a March/spring '06 launch in Japan.
 
Panajev2001a said:
Fafalada said:
Panajev said:
No, it will not explode like that, but it still is going to increase compared to what you do now, especially because I do not see the jump in math ops per cycle being so massive as to completely eliminate the use of cube-maps, 3D textures, etc... as look-ups/shortcuts.
I disagree - it's going to decrease. If nothing else, it will decrease relative to the number of math ops used, and probably further yet when you have math ops that are comparably fast to using a texture lookup table approximation instead.
Eg. when a vector normalize no longer costs an arm and a leg, using a bunch of cube-lookups for normalizing just stops making sense.

You hit very good points Fafalada ( about the ability to avoid cube-maps for things such as vector normalization, etc... ), but while I see the ratio of texture fetches vs math ops decreasing, I still do not see a decrease in the use of texture ops from what we do now; instead I still see an increase ( one which seems to be matched by a possible increase in texture fetch latency ), just not one growing as fast as math op usage.

That increase unfortunately might hold the pipeline back and force programmers to write pipelined loops with more and more vertices and primitives in flight, which, pardon the pun, is not something primitive.

Actually, on the subject of working on multiple primitives at a time, that kinda brings up the question of how pixel data to process will be submitted to the APUs in the first place. Sounds a little less complex to handle than texture fetches, but it still has its own set of issues I wouldn't be sure about off hand...

As food for thought, can you expand on this point, please?

Here's another SONY patent, which I don't think has been covered here before, describing the graphics pipeline with data compression in mind,

Link: System and method for data compression...


Here are a few pipeline Figs from the patent,

[Figures from the patent: pipeline1.jpg, pipeline2.jpg]


System and method for data compression
Abstract

A system and method for compressing video graphics data are provided. The system and method include generating in a graphics pipeline, from video graphics data modeling objects, vertex data corresponding to the objects, rendering the video graphics data to produce a current frame of pixel data and a reference frame of pixel data, and, based upon the vertex data, defining a search area within the reference frame for calculating a motion vector for a block of pixel data within the current frame. The current frame then is compressed using the motion vector. The use of vertex data from the graphics pipeline to define the search area substantially reduces the amount of searching necessary to generate motion vectors and perform data compression....

....BACKGROUND OF THE INVENTION
[0002] The preparation, storage and transmission of video data, and, in particular, video graphics data generated by a computer (for example, video graphics data for a computer game), require extensive computer resources and broadband network connections. These requirements are particularly severe when such data are transmitted in real time among a group of individuals connected over a local area network or a wide area network such as the Internet. Such transmitting occurs, for example, when video games are played over the Internet. Such playing, moreover, is becoming increasingly popular.

[0003] In order to reduce the amount of network capacity and computer resources required for the transmission of video data, various encoding schemes for data compression are employed. These data compression schemes include various versions of the MPEG (Motion Picture Experts Group) encoding standard, for example, MPEG-1, MPEG-2 and MPEG-4, and others. These data compression schemes reduce the amount of image information required for transmitting and reproducing motion picture sequences by eliminating redundant and non-essential information in the sequences.

[0004] For example, the only difference in many cases between two adjacent frames in a motion picture sequence is the slight shifting of certain blocks of pixels. Large blocks of pixels, representing, for example, regions of sky, walls and other stationary objects, often do not change at all between consecutive frames. Compression algorithms such as MPEG exploit this temporal redundancy to reduce the amount of data transmitted or stored for each frame.

[0005] For example, in the MPEG standard, three types of frames are defined, namely, intra frames (I-frames), predicted frames (P-frames) and bi-directionally interpolated frames (B-frames). As illustrated in FIG. 1, I-frames are reference frames for B-frames and P-frames and are only moderately compressed. P-frames are encoded with reference to a previous frame. The previous frame can be either an I-frame or a P-frame. B-frames are encoded with reference to both a previous frame and a future frame. The reference frames for B-frames also can be either an I-frame or a P-frame. B-frames are not used as references.

[0006] In order to encode predicted frames and interpolated frames from reference frames, the MPEG scheme uses various motion estimation algorithms. These motion estimation algorithms include full search algorithms, hierarchical searching algorithms and telescopic algorithms. As illustrated in FIG. 2, under the MPEG standard, each frame typically is divided into blocks of 16 by 16 pixels called a macro block. A macro block of a current frame is encoded using a reference frame by estimating the distance that the macro block moved in the current frame from the block's position in the reference frame. The motion estimation algorithm performs this estimating by comparing each macro block of the current frame to macro blocks within a search area of the reference frame to find the best matching block in the reference frame. For example, for macro block 201 of current frame 207, a comparison is made within search area 203 of reference frame 209 between macro block 201 of the current frame and each macro block 205 of the reference frame to find the best matching block in the reference frame. The position of this best matching macro block within the reference frame then is used to calculate a motion vector for macro block 201 of the current frame. Rather than transmit for current frame 207 all of the video data corresponding to macro block 201, only the motion vector is transmitted for this block. In this way, the video data for the current block are compressed.

[0007] Executing motion estimation algorithms, however, also requires substantial computer resources. Since each macro block of a current frame must be compared to numerous macro blocks of one or more reference frames, an extensive number of computations are required. For example, the three-step-search algorithm (TSS) (a hierarchical algorithm) evaluates matches at a center location and eight surrounding locations of a search area. The location that produces the smallest difference then becomes the center of the next search area to reduce the search area by one-half. This sequence is repeated three times.

[0008] A need exists, therefore, for a more efficient and effective method for compressing video graphics data, particularly in view of the increasing demand for systems capable of playing video games in real time over the Internet and other networks.

SUMMARY OF THE INVENTION
[0009] Data compression encoders, such as MPEG encoders, employ the same method for compressing video data regardless of the source of the video data. Video data from a live performance recorded by a digital camera and simulated video data generated by a computer, therefore, are compressed in accordance with the same data compression scheme and motion estimation algorithm. When video data are generated by a computer, however, information regarding the nature and movement of objects are known prior to the data's encoding and compression. Unlike present data compression encoders, the present invention takes advantage of this information to reduce the computational steps necessary to perform data compression.....

It seems to me that the above patent takes the graphics pipeline into account with distributed graphics in mind? :oops: What do you experts think? :LOL:
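That's how I read it too. Roughly: the pipeline already knows how each object's vertices moved between frames, so you project that motion to screen space, treat it as the predicted motion vector for the macro blocks that object covers, and then block-match only in a small window around the prediction instead of running a blind full search over the whole search area. A very rough sketch of that idea below; the names are hypothetical and it uses a plain SAD match, so don't take it as the patent's actual method.

Code:

// Rough sketch: use vertex motion already known to the graphics pipeline to
// predict a macro block's motion vector, then block-match only in a small
// window around that prediction. All names are hypothetical.
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

constexpr int W = 64, H = 64, BLOCK = 16;

static int sad16x16(const std::vector<uint8_t>& cur, const std::vector<uint8_t>& ref,
                    int cx, int cy, int rx, int ry) {
    int sad = 0;
    for (int y = 0; y < BLOCK; ++y)
        for (int x = 0; x < BLOCK; ++x)
            sad += std::abs(int(cur[(cy + y) * W + cx + x]) -
                            int(ref[(ry + y) * W + rx + x]));
    return sad;
}

// Search only +/-range pixels around the vertex-predicted offset (pred_dx, pred_dy).
static void guided_search(const std::vector<uint8_t>& cur, const std::vector<uint8_t>& ref,
                          int bx, int by, int pred_dx, int pred_dy, int range,
                          int* best_dx, int* best_dy) {
    int best = 1 << 30;
    for (int dy = -range; dy <= range; ++dy)
        for (int dx = -range; dx <= range; ++dx) {
            int rx = bx + pred_dx + dx, ry = by + pred_dy + dy;
            if (rx < 0 || ry < 0 || rx + BLOCK > W || ry + BLOCK > H) continue;
            int sad = sad16x16(cur, ref, bx, by, rx, ry);
            if (sad < best) { best = sad; *best_dx = pred_dx + dx; *best_dy = pred_dy + dy; }
        }
}

int main() {
    // Reference frame holds a bright square; in the current frame it has moved by (5, 3).
    std::vector<uint8_t> ref(W * H, 0), cur(W * H, 0);
    for (int y = 20; y < 36; ++y)
        for (int x = 20; x < 36; ++x) { ref[y * W + x] = 200; cur[(y + 3) * W + (x + 5)] = 200; }

    // Pretend the transformed vertices predicted an offset of roughly (-4, -4)
    // back into the reference frame; the guided search refines it to (-5, -3).
    int dx = 0, dy = 0;
    guided_search(cur, ref, 25, 23, -4, -4, 2, &dx, &dy);
    printf("offset to best match in reference frame: (%d, %d)\n", dx, dy);
    return 0;
}

Compared to the three-step search the patent mentions in the background, the win is that the geometry side hands the encoder a good starting guess for free, so the compare loop only has to cover a handful of candidate positions per macro block.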
 
These are my PS3 GPU specs. They're nowhere near the official specs, but you can quote me when you see the real ones.


Visualizer
2ghz(2000 MHz)
32Pixel Pipelines
64 Gigapixels per Second (no texture)
32 Gigatexels per Second
Point, Bilinear, Trilinear, Anisotropic Mip-Map Filtering
Perspective-Correct Texture Mapping
Bump Mapping
Real-time ray tracing *selectively applied*, global/local illumination, Metropolis light transport, particle flow physics, LOD via dynamic selective subdivision of polygons, Bézier patches, VS-driven chroma-matching, inverse/converse kinematics, a priori collision detection, 16x FSAA, shaders, etc etc.
Environment Mapping
128bit HDR rendering for FMV and CG quality graphics.
32MB Quadported Embedded DRAM
256 Gigabytes per Second eDRAM Bandwidth
128 Gigabytes per Second eDRAM Texture Bandwidth
16bpps without effects
1-3bpps with effects
 