Can this SONY patent be the PixelEngine in the PS3's GPU?

Discussion in 'Console Technology' started by j^aws, Jun 1, 2004.

  1. qwerty2000

    Newcomer

    Joined:
    May 24, 2003
    Messages:
    149
    Likes Received:
    0
    Location:
    New Jersey
    That's too conservative. If I wanted to take it easy, I'd say 5-10 bpps and 900 million-1.5 billion polys per second. But the more I think about it, the more I think it's going to be 10 billion plus.
     
  2. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    We need to represent 3D meshes in better ways.
    IMHO multiresolution representations are the way to go..
    We just have to learn how to do that fast on next-generation hw

    ciao,
    Marco
     
  3. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    The first misconception is control point density: the number of control points required to model anything "real world" is very large. We wrote a Catmull-Clark surface editor in Maya and had an artist build a car. The control point density was between 1/3 and 1/4 of the number of verts in our polygon model; however, with the subdivision surface we also needed to store connectivity info for the mesh, and for any sort of reasonable tessellation algorithm this gets very expensive. I don't remember the exact figures off the top of my head, but it was very close between the two, with a slight edge towards polygons. The polygon mesh was also easier to work with and had fewer obvious problems.
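The storage trade-off described above can be sketched with a back-of-envelope model. Everything here is an illustrative assumption (the per-element byte sizes, a quad cage with half-edge connectivity at roughly 1/3 the polygon vertex count), not figures from the actual car test:

```python
# Sketch of the trade-off: a subdivision cage has ~1/3 the vertices of the
# polygon mesh, but must also store connectivity (modelled here as a
# half-edge structure). All byte sizes are illustrative assumptions.

def poly_mesh_bytes(n_verts, vert_bytes=24, idx_bytes=2):
    n_tris = 2 * n_verts                  # typical closed-mesh tri:vert ratio
    return n_verts * vert_bytes + n_tris * 3 * idx_bytes

def subdiv_cage_bytes(n_verts, cp_bytes=24, halfedge_bytes=20):
    n_cps = n_verts // 3                  # cage at ~1/3 of the poly verts
    n_faces = n_cps                       # quad cage: faces roughly = verts
    n_halfedges = 4 * n_faces             # 4 half-edges per quad
    return n_cps * cp_bytes + n_halfedges * halfedge_bytes

n = 30_000
print(poly_mesh_bytes(n), subdiv_cage_bytes(n))  # same ballpark
```

With these assumed sizes the connectivity data eats most of the control-point saving, which is the effect being described.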

    With NURBS we had two issues. One was that it was difficult to model continuous smooth surfaces; artists apparently usually cheat in CG and just overlap pieces. The second was that we needed very large numbers of control points to model anything with accuracy, and they were utterly useless without trimming. Trimming was just too expensive to consider.
     
  4. archie4oz

    archie4oz ea_spouse is H4WT!
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,608
    Likes Received:
    30
    Location:
    53:4F:4E:59
    A lot! Welding patches is a total PITA...
     
  5. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Cheers guys....just been catching up on the thread...

    If Pana gets sensory overload from the patent, I daren't think what'd happen if he gets a PS3 dev kit! :shock:

    Would it really matter if APUs doing all shading ops (vertex and pixel) were replaced by, say, vertex shading via APUs and pixel shading via these SALPs in the PixelEngine? Indeed, maybe the GPU has no APUs at all and is replaced by these SALPs entirely... :?

    mmm... I also see accepting one ISA for APUs and SALPs as difficult from a Cell philosophy. :? Maybe the Cell ISA has extensions for graphics?

    I'm not clear how software Cells will work on the SALPs. Or is there meant to be a consistent ISA for Cell graphics? I never really got the whole distributed graphics thing with Cell. E.g., would an app written for a Cell PDA client work on a Cell PS3 (with a different GPU)? Would the Cell OS use a JIT-type compiler to hide this from the different kinds of GPUs on different Cell clients? :?



    'Explosive amount...' :D , that's what I thought! Maybe it'll fry the GPU at 4 GHz!

    Reading Fig.6 in the patent, there are 256 SALPs (serial operation pipelines) and each SALP consists of 32 SALCs (serial arithmetic-logic circuits)...can be thought of as 256 pipelines with 32 stages each? A total of 8192 SALCs! :shock: Whether that's per PixelEngine per VS (*4) or the entire GPU, the patent isn't clear. These SALCs do seem to be tiny though, operating usually on 1-3 bits... :idea:
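As a trivial sanity check, the counts quoted from Fig. 6 multiply out as follows:

```python
# Multiplying out the figures quoted from Fig. 6 of the patent.
salps = 256           # serial operation pipelines
salcs_per_salp = 32   # serial arithmetic-logic circuits per pipeline
print(salps * salcs_per_salp)  # 8192 SALCs in total
```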
     
  6. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    I agree.
    In fact I'd work another way. I'd let the artists work with the basic primitives they like, then convert those primitives into a rough polygonal (triangle) model; from there, one would convert this polygonal mesh into a preferred (LODdable? compressed? etc..) representation.


    This can be avoided if one can limit the min-max valence of a vertex,
    and can 'waste' some memory to store a lot of precalculated tables..

    I agree again.. NURBS should die ;)
    Subdivision surfaces are better in every aspect I can think of at this moment.
    Moreover, one doesn't need to model fine details.. just a smooth surface/domain that can be displaced or bump mapped.

    ciao,
    Marco
     
  7. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    SALPs can effectively replace those parts of a GPU devoted to rasterizing and zbuffer/alpha/stencil tests. Shading is work for the mighty APUs :)

    I hope it hasn't!
    If CELL has some kind of support for graphics (like texture-fetching instructions, though I wouldn't bet on it), I hope it will not be via ISA extensions.

    That's good for antialiasing or dithering..yeah :)
     
  8. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    Uh-oh... so we might have found the building block for the Pixel Engines... thanks to Sony for the fun treasure hunt game :).
     
  9. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    [quote="nAo"]This can be avoided if one can limit the min-max valence of a vertex, and can 'waste' some memory to store a lot of precalculated tables..[/quote]

    Actually vertex valence turned out to be much more of an issue than we had anticipated. Before we had the model built, we'd estimated <10% extraordinary verts. When the model was finished, it was more like 80% extraordinary.

    I spent a fair amount of time looking at the model afterwards, and there were really no unreasonable constructs in there. In fact, the large number of high-valence verts was a result of minimizing the number of patches in the model.

    There are a number of compression schemes for geometry that might work, and a lot of things can be streamed with care. Where it starts to become an issue is large expanses of terrain (which for various reasons can't be a height map and can't be procedurally generated) with long viewing distances. I just have to laugh when people propose the ludicrous polygon counts they're expecting to see in next-gen titles.

    Besides, we should be discussing shader instructions per pixel; it's a much more interesting benchmark.
     
  10. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    Absolutely. As poly counts increase, artists need to be taken out of the loop with regard to making render-friendly art. They have enough problems making it look pretty without having to work around bizarre rules that only make perfect sense at the ASM level....

    Hoppe has some good stuff with geometry images and displaced subdivision surfaces for this kind of thing.
     
  11. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Geometry images need more research (there are too many drawbacks at this time, imho).. DSS are very interesting, but there are literally billions of ways to implement them on a modern GPU, so there is a lot of research to do along that route as well.
    Computer graphics is fun :)
     
  12. Fafalada

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    2,773
    Likes Received:
    49
    Yay, nAo also has similar ideas about SALPs... perhaps I'm only a little crazy... :p

    As noted, the idea seems to be that SALPs effectively 'are' the core of the pixel engine black box (if you recall the original Cell patent, the pixel engine was denoted as separate from the APU shaders).
    Anyway, I second nAo in regards to ISA issues; if Cell breaks on an ISA problem as simple as this, it's a bit of a failed ideology from the get-go.

    Well, that's how it's normally done with compressed meshes, no? :p The question is how much control we can afford to take away from artists; I mean, converting to some exotic subdivision scheme for compression may have unpredictable results...
     
  13. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    I don't know Faf.. I'm a newbie as a games developer ;)

    Umh.. those kinds of problems can be avoided. I'd worry about lossy mesh compression..
    What about a 3D artist who goes crazy over the fine details he/she modelled on a billion-triangle mesh that your compression scheme has just smoothed away? :D

    ciao,
    Marco
     
  14. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    On the manufacturing side...

    With the SALC/SALP I see a very scalable structure that once again plays to the strength of repeating identical functional blocks over and over, the same way we achieve strength in CELL: using more and more APUs per PE and more PEs per chip. (Kahle's patent thought about having one more APU per PE than specified, to increase redundancy and allow for high yields: putting in 9 APUs gives a higher chance that 8 of them are working in the final chip.)

    I think this could be a nice way to keep manufacturing costs under control even with a big chip surface.

    Debugging costs should also be lower: once you have fully designed, synthesized, manufactured and tested an APU or a SALC block (or a full SALP), replicating APUs and SALCs over the chip is less of an issue than filling the rest of the chip's surface with other custom units (which you would have to develop and test separately) :).
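The spare-APU redundancy argument above can be made concrete with a simple binomial yield model; the per-APU yield used below is a made-up illustrative number, not a figure from any patent:

```python
from math import comb

def redundant_yield(n_built, n_needed, p_good):
    """Probability that at least n_needed of n_built independent blocks work."""
    return sum(comb(n_built, k) * p_good**k * (1 - p_good)**(n_built - k)
               for k in range(n_needed, n_built + 1))

p = 0.90  # assumed per-APU yield, purely illustrative
print(redundant_yield(8, 8, p))  # all 8 of 8 APUs must work: ~0.43
print(redundant_yield(9, 8, p))  # build 9, any 8 suffice:    ~0.77
```

Even one spare block nearly doubles the chip-level yield under this (assumed) per-block failure rate, which is exactly the effect the Kahle patent exploits.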

    What do you think ?
     
  15. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Were your meshes reparametrized?
    Anyway, I don't fear non-regular vertices.. but I'd want to clamp valence between reasonable values, like 3-9. The higher the max valence, the larger the tables..

    Sure..what about your figures for the next gen? :)
     
  16. Fafalada

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    2,773
    Likes Received:
    49
    Well yeah, that's what I meant, converting to another representation is pretty much lossy by definition.
    Anyway, I guess we can always teach artists to avoid making things that get 'lost' or modified in a bad way :p (I mean, currently used realtime schemes for meshes are mostly just quantization, and they still have to take care of precision issues occasionally).

    I think I still want my programmable primitive processor :p
     
  17. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    I do not think it would: Software Apulets do not target the Pixel Engine part directly.

    I think that the Pixel Engines would be controlled by a different software block, one that could be replaced by just about anything inside.

    Think about a room full of x86-based PCs: it does not matter one bit what GPU they use as long as each supports DirectX (well, let's not scrutinize down to the atomic level, come on, give me a little break :p).

    I see all CELL-based devices that have some sort of display attached, or that are designed to output directly to a display, having some sort of graphics API. It could run on the APUs if there were no Pixel Engine (software fall-back), or on these SALC/SALP-based Pixel Engines, or on other Pixel Engine hardware that answers to the specifications of the graphics API.
     
  18. Vince

    Veteran

    Joined:
    Apr 9, 2002
    Messages:
    2,158
    Likes Received:
    7
    My current thinking is wondering whether this patent could describe a microarchitecture capable of supporting the form of rendering described in the SCE patent nAo posted.

    My thinking after that patent centered on how you could feasibly implement something tantamount to it in silicon. Much of the flexibility it showed (which is its strongest point) is hard to imagine being implemented in an area-efficient manner, due to the recurrent and redundant logic constructs needed. Running it in what Jvd would incorrectly label "full software" (or something like that, which would really be akin to an APU or shader) would be horribly inefficient compared with a conventional raster pipeline (area/process constant), due to rasterization intrinsically having a [more] linear bound. Something 'tighter' in construction and granularity seemed to be needed: take a current raster pipeline, break it into logical chunks, and maximize the computation and, more importantly, the connectivity between them. That would appear to be what this is.
     
  19. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    Let's have a whip-round and see if we can get one; surely we can make one with some sticky-back plastic and a washing-up bottle (a Blue Peter reference to confuse non-Brits) :lol:

    All we really need is a GPU that can read and write system memory at will.... Now I wonder where we could get a GPU that can arbitrarily access memory :wink:
     
  20. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    Is it just me or is this just describing a pipeline system?

    How I read it...

    Each set of 32 serial ALUs combines to provide one operation per cycle on a 4-channel high-precision ALU (FP24 would need eight 3-bit ALUs per channel). Less precise data gets more operations per cycle.

    Each one of the 256 units then operates on a single fragment as it passes through the programmed rasterisation steps.

    The actual number of fragments issued per cycle depends on the rasterisation complexity. Given that a scissor test would take 8 cycles and depth buffering a minimum of 4 cycles, you get an idea of how amazingly complex rasterisation is. If we are nice and reckon on 50 cycles for a fragment to go from the end of the fragment shader to the framebuffer (this is far too low; think about stencil, colour and depth operations), we would get about 5 actual fragments per cycle.

    A modern PC video card has something like 1000 fragments in progress and can output up to 16 per cycle, so this is actually fairly un-parallel by graphics standards...
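The arithmetic in this reading of the patent can be checked directly. The 50-cycle latency is the post's own deliberately low guess, not a measured figure:

```python
# Re-checking the estimate above: eight 3-bit serial ALUs per channel times
# four channels gives one 32-ALU set (one SALP) per FP24x4 operation, and
# 256 pipelines each hold one fragment for ~50 cycles of raster-output work.
bits_fp24 = 24
alu_bits = 3            # each serial ALU handles ~3 bits
channels = 4
alus_per_op = (bits_fp24 // alu_bits) * channels
print(alus_per_op)      # 32, i.e. one full SALP

pipelines = 256
cycles_per_fragment = 50
print(pipelines / cycles_per_fragment)  # 5.12 fragments per cycle on average
```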
     