News about Rambus and the PS3

Discussion in 'Console Technology' started by McFly, Jul 10, 2003.

  1. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    Then reduce the number of FP ops :p

    But this will help the REYES-like renderer...


    It should operate per fragment too, nothing prohibits the APUs on the Visualizer from processing Pixel Programs while the APUs on the Broadband Engine do T&L and run Vertex Programs... I just do not see the Pixel Engines in the Visualizer being OVERLY complex, with the architecture being generally designed to push tons of small polygons instead of bigger polygons with a high degree of multi-texturing required.


    The Shaders can use as many textures as they prefer, I remember the REYES pipeline and what happens in the Shading stage ( texture input is one of those things )... whether they are procedurally generated ones or not...

    What I was referring to was the configuration of the Rasterizer unit: as DeanoC was commenting in a post about REYES-like renderers a while ago, the Rasterizer doesn't need to be overly complex and doesn't need to do tons of texture layers each cycle...

    Textures are sampled in the Shading phase and when the sea of micro-polygons is sent to the rasterizer what we will worry about will be if the Rasterizer can draw them on screen as fast as they come...

    The micro-polygons after the Shaders do not arrive with tons of textures to be applied in layers... during the Shading phase the textures were sampled and the color of the micro-polygons was processed... if we want to accelerate things and use the Shaders to process the micro-polygon until a single texture remains to be applied, we can... after all, the GPU will probably support texturing, as I do not think they can ask developers to move en masse to REYES-like processing overnight...

    You will still have people using the regular OpenGL pipeline processing ( I think we should see OpenGL 2.0 ).


    Except for the fact that we should have 4 PEs in the BE, and that we have much more local bandwidth thanks to the e-DRAM, and the PEs should also be clocked higher than 400 MHz...

    Also the BE would have a bit more local memory: the Imagine Stream Processor has 128 KB of SRF ( Stream Register File ) divided between the 8 SIMD clusters, while a single PE has 128 KB and thirty-two 128-bit GPRs per APU...

    The BE should have the clock and resource advantage over the Imagine Processor...

    I can see the influence of the Imagine on Cell... Sony supposedly has been actively collaborating with Universities world-wide and they could have brought into the Cell project some of the results they got...

    Look at the Stanford paper comparing REYES and OpenGL... where does the REYES pipeline spend the most time?

    In the Geometry phase ( slicing 'n dicing, Shaders, etc... ) and much less time on the Rasterizing phase...

    This kind of gives you an idea of where the processing-bound part of the rendering time is... adding to the Pixel Engines the capability of doing two/four textures per cycle and then offering loop-back would be wasted on a REYES-like renderer; what we need is a VERY fast CPU to do the Shading part, and when we have a 1 TFLOPS class CPU I think we have a nice candidate for the job...

    The GPU of PlayStation 3, the Visualizer as described in the patent, contains its fair share of APUs that can assist the Pixel Engines running Pixel Programs or that can assist the over-all rendering of the REYES-like renderer by helping the BE to balance the Geometry processing load...

    Distributing the processing load on a Cell system would be facilitated, as the architecture was designed to have the standard units of work, the Apulets, travel from Cell to Cell to find the APU that can process them ( in short: software Cells/Apulets can migrate if the host system is running at full capacity and another connected device [ the GPU would be "connected" ] has the ability to process the Apulet and return it back in time... it would be disadvantageous if it took more time for the Apulet to be sent, processed and received than to wait for a local APU to be free ).
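    That "return it back in time" condition is just a cost comparison. A minimal sketch of the migrate-or-wait decision, with hypothetical numbers and function names (the patent describes no concrete API):

    ```python
    # Hypothetical sketch of the Apulet migration heuristic described above:
    # offload an Apulet to a remote Cell device only when the round trip
    # (send + remote processing + receive) beats waiting for a local APU.

    def should_migrate(payload_bytes, link_bw_bytes_per_s,
                       remote_proc_time_s, local_wait_s, local_proc_time_s):
        # Round-trip transfer cost: payload out plus result back (assumed equal size).
        transfer_time = 2 * payload_bytes / link_bw_bytes_per_s
        remote_total = transfer_time + remote_proc_time_s
        local_total = local_wait_s + local_proc_time_s
        return remote_total < local_total

    # Example: a 64 KB Apulet over a 1 GB/s link with 1 ms remote processing,
    # versus a 2 ms wait for a local APU that then needs 1 ms.
    print(should_migrate(64 * 1024, 1e9, 1e-3, 2e-3, 1e-3))  # True: transfer is ~0.13 ms
    ```

    On a much slower link the same Apulet stays home, since the transfer alone exceeds the local wait.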
     
  2. V3

    V3
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    3,304
    Likes Received:
    5
    Procedural textures don't come cheap.


    Only if the scene is as complex as the one in the movie.


    Then it will be better to use per-fragment shading and the typical OGL pipe, instead of REYES. You will only want to use REYES if the resultant image is significantly better than the one that can be done on the OGL pipe.


    Ohh, I was talking about textures with respect to the amount of memory. (We are on a memory thread after all :) ) But yes, we don't need a lot of texture units, with the availability of vertex and fragment shaders.

    I never thought the rasterizer would be the limiting factor. Why are you looking into the rasterizer?



    It's also scalable, similar to how the PE is scalable. They can put 64 of those processors together and get 1 TFLOPS.

    Of course it will. That Imagine processor is only, what, 20+ million transistors.

    The results are published; if we can get them, so could Sony.


    I am not arguing about the rasterizing bit. What I am saying is that using per-fragment shading and polygons larger than micropolygons, i.e. the maximum image quality OGL pipes can give while still working efficiently, will give better performance with similar image quality to its REYES-like counterpart. Like I said before, we don't want REYES with quality compromised; that would be against the goal of that algo.
     
  3. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    1. we have FLOPS ;)

    2. Deferred Shading reduces the Shading load... unless you are telling me that sorting the HOS/Subdivision surfaces will take more rendering time than the time we save by reducing the Shading load


    There are advantages... using micro-polygons the size of 1/2 or 1/4th of a pixel will help with things like nice displacement mapping ( we need to work on a sub-pixel level for proper displacement mapping anyways... ).



    Fill-rate and Set-up engine concerns mainly...


    I do not want shitty quality for a REYES-like renderer either...

    What I would like is to bring a uniform approach: everything gets diced into micro-polygons, and micro-polygons are the basic unit that gets Shaded...

    No Vertex and Pixel Shading... only micro-polygon Shading :)

    I understand REYES was thought for Quality rendering and I am not saying I would want it to be slow either...

    Let's use, as the REYES paper says, micro-polygons of the size of 1/4th of a pixel... that will generate more micro-polygons; we can reduce the Shading load by doing Hidden Surface Removal processing before Shading, except we would have to displace the transformed micro-polygons before culling unseen ones...

    Transform, depth sort the HOS control points, slice 'n dice only visible patches plus visible and invisible ones that are using displacement mapping ( you can find ways to embed this info )... you will still have a Z-buffer to rely on...

    Instead of 16 samples use 4 or 8 for stochastic AA ( it is the randomly jittered pattern that matters, and 4 samples is not that low )...
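    The jittered pattern the REYES paper uses is simple to illustrate: each pixel is divided into sub-cells and one sample is randomly offset inside each, turning aliasing into noise even at 4 samples. A toy sketch (names and structure are mine, not from any renderer):

    ```python
    # Jittered (stochastic) sampling: divide the pixel into an n x n grid of
    # cells and place one random sample inside each cell. With n=2 you get
    # the 4-samples-per-pixel case discussed above.
    import random

    def jittered_samples(n_per_axis, seed=0):
        rng = random.Random(seed)   # seeded for reproducibility
        cell = 1.0 / n_per_axis
        samples = []
        for i in range(n_per_axis):
            for j in range(n_per_axis):
                # Random offset confined to this cell: no two samples collide,
                # yet the pattern is irregular.
                samples.append(((i + rng.random()) * cell,
                                (j + rng.random()) * cell))
        return samples

    pts = jittered_samples(2)
    print(len(pts))   # 4 samples per pixel, each in its own quadrant
    ```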

    There are advantages in using a REYES-like model that are unrelated to Image Quality... ( ease of programming could be one, the REYES pipeline is quite logical, uniform and neat to follow )

    *Much easier to Vectorize and distribute all the shading operations across all available APUs... Shading done at a single stage in the pipeline and on uniform objects ( micro-polygons in a grid, eye-space )

    *No need for perspective correction for textures ( speeds up Geometric Transform )... on PSOne titles, where perspective correction was not available, they solved the problem by subdividing geometry more finely, thus reducing the texture warping effect... it would be interesting to take a game like BG: DA or other highly detailed next-generation games and find a way to disable perspective correction for textures... the texture warping would not be as bad on average as it was on PSOne and Saturn...

    *Higher texture locality: a great deal of texture thrashing is avoided as Shading is done in object order...

    *easier clipping
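    The uniform pipeline being argued for follows the classic REYES stage ordering (split, dice, shade, sample). A schematic toy version with stub geometry, just to show the stage order being discussed, not anything resembling real PS3 or RenderMan code:

    ```python
    # Schematic REYES flow (Cook/Carpenter/Catmull): every primitive is split
    # until dicable, diced into sub-pixel micro-polygons, shaded once in
    # object order (all texture sampling would happen in shade), then sampled
    # against the z-buffer. Primitives here are toy 1-D spans (start, end).

    def split_until_dicable(prim, max_size=4):
        # Recursively split a span until each piece is small enough to dice.
        start, end = prim
        if end - start <= max_size:
            return [prim]
        mid = (start + end) / 2
        return (split_until_dicable((start, mid), max_size) +
                split_until_dicable((mid, end), max_size))

    def dice(patch, step=0.5):
        # Dice a patch into micro-polygons roughly half a "pixel" wide.
        start, end = patch
        n = max(1, int((end - start) / step))
        return [start + i * step for i in range(n)]

    def shade(grid):
        # Shade each micro-polygon exactly once; one uniform shading stage.
        return [(x, x * 0.1) for x in grid]   # (position, toy color)

    def reyes_render(prim):
        micro_polys = []
        for patch in split_until_dicable(prim):
            micro_polys.extend(shade(dice(patch)))
        return micro_polys

    grid = reyes_render((0.0, 16.0))
    print(len(grid))   # 32: a 16-unit span diced at 0.5 per micro-polygon
    ```

    The point of the sketch is that shading happens at exactly one place in the pipeline and on exactly one kind of object, which is what makes the workload easy to vectorize and distribute across APUs.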
     
  4. megadrive0088

    Regular

    Joined:
    Jul 23, 2002
    Messages:
    700
    Likes Received:
    0
    I am wondering about that article from Mercury News about the PS3 CELL.

    it said 72 processors. (8 PPCs and 64 APUs)

    that would mean 8 PEs instead of 4, for the Broadband Engine/ PS3 CPU.

    do you think they made a mistake?
     
  5. megadrive0088

    Regular

    Joined:
    Jul 23, 2002
    Messages:
    700
    Likes Received:
    0

    yeah, the regular GSCube has a floating point performance of 97.5 GFLOPS.
    Perhaps there is some overhead, or wasted FP power, or the EEs are clocked slightly lower than PS2 EEs, because 6.2 GFLOPS x 16 = 99.2 GFLOPS. But whatever, 97.5 or 99.2 GFLOPS for GSCube vs the 1 TFLOPS CPU of PS3.
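    The mismatch is easy to check with two lines of arithmetic; working backwards from the official 97.5 figure suggests a per-EE rating closer to 6.09 GFLOPS than 6.2:

    ```python
    # Quick check of the GSCube FLOPS figures quoted above.
    ee_gflops = 6.2            # PS2 Emotion Engine peak, as quoted
    chipsets = 16
    naive_total = ee_gflops * chipsets
    print(round(naive_total, 1))        # 99.2, not the official 97.5

    # Working backwards from the official spec instead:
    per_chip = 97.5 / chipsets
    print(round(per_chip, 2))           # ~6.09 GFLOPS per EE
    ```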

    The GSCube has much more main memory (128MB x 16 = 2048 MB) and eDRAM video memory (32MB on GS x 16 = 512 MB eDRAM)

    plus absolutely enormous raw fillrate. It's listed as 37.7 billion pixels: 16 GS x 16 pixel pipes x clock (144~150 MHz).


    official GSCube specs:

    GScube's memory bus bandwidth is 50.3 Gbytes/second, and it has a floating-point performance of 97.5 gigaflops and 3-D CG geometric transformation of 1.04 gigapolygons/s. It has 512 Mbytes of video RAM and a VRAM bandwidth of 755 Gbytes/s. The pixel fill rate is 37.7 Gbytes/s (no, that has to be GPixels!) and the polygon drawing rate is 1.2 gigapolygons/s.


    that is the regular GSCube, with 16 PS2 chipsets and more memory per chipset. The GSCube with 64 PS2 chipsets is 4x more in every area, but I don't know if that version made it out.



    back to the regular GSCube.... the main bandwidth is 50+ GB/sec; PS3 could match that if it has 4 channels of XDR. The GSCube's eDRAM bandwidth is 755 GB/sec. It'll be interesting to see what the eDRAM bandwidth is for PS3's CPU and GPU. Hopefully in the hundreds of GB/sec.

    PS3 should crush GSCube in geometry processing. GSCube only does 1.2 billion raw polys/sec; PS3 should be in the tens of billions.
     
  6. V3

    V3
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    3,304
    Likes Received:
    5
    :wink: 1-2 TFLOPS is a lot, but not plenty.

    Yes, using occlusion culling will give you speed up.

    Again, this is an image fidelity thing. If PS3 can render those Dinosaurs from a movie like Disney's Dinosaur, then displacement mapping of that quality is needed. For anything less than that, OGL displacement mapping will most likely suffice.

    For the reasonable expectation of image quality that we are going to get, going with vertex and fragment shading should give better performance.


    For PS3 with limited memory, it will probably use the bucket system, if REYES has any chance of working there.

    OGL pipes are quite simple already. But using the REYES pipeline will give movie productions an advantage, when they can reuse their stuff from the movie to make a game based on it. But with PS3 at 1-2 TFLOPS and limited memory, most likely they'll have some reworking to do.


    Vertex and fragment shaders are vectorisable too.

    At the cost of slicing and dicing.

    Yes, that's desirable too.

    Clipping against the camera is more troublesome than typical OGL clipping.
     
  7. Tsmit42

    Newcomer

    Joined:
    Jun 2, 2003
    Messages:
    136
    Likes Received:
    0
    Hmm, I thought the PS2 had 32MB of main memory, which would be 512MB (32 x 16). Also it should be 4 x 16 for the video memory, which is 64MB, not the 512 Mbytes stated later in your post.


    Again you are trying to change bits into bytes; 755 gigabits is just under 100 GB/sec, which is very possible for the PS3 to achieve.
     
  8. archie4oz

    archie4oz ea_spouse is H4WT!
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,608
    Likes Received:
    30
    Location:
    53:4F:4E:59
    He's talking about the GScube though, not a bunch of parallel PS2s...
     
  9. qwerty2000

    Newcomer

    Joined:
    May 24, 2003
    Messages:
    149
    Likes Received:
    0
    Location:
    New Jersey
    I am not going to be on topic, but if the PS3 has 256 MB of main RAM it will be a big mistake. It's going to be like 32 MB in the year 2005/6. Sony should really think twice.
     
  10. Tsmit42

    Newcomer

    Joined:
    Jun 2, 2003
    Messages:
    136
    Likes Received:
    0
    Ah I see, I got confused: when I saw MBytes written out, I just glanced and thought it said Mbits.

    Ah, so the GSCube is not just 16 PS2's in a box; it has much more memory per EE and GS?
     
  11. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    V3,

    Remember that Open Source REYES renderer you gave me the link of ?

    Well, before putting it on PlayStation 2 Linux, I downloaded it on my Red Hat box at work ( EV56 400 MHz and 256 MB of RAM, Red Hat 7.2 )...

    It was a bit of a pain to compile..

    The author had set the home directory in all the makefiles as <user>/src/reyes, the bool.h file was missing, define statements were lacking, it could not link -lg++ ( I was able to "successfully" compile by skipping it ), <string.h> typed as <String.h>...

    What made it all worse is that the program Seg Faults and the author's e-mail is not working anymore :(
     
  12. V3

    V3
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    3,304
    Likes Received:
    5
    Panajev2001a,

    Yeah, that site is old. It's in my bookmarks; I was surprised it's still up.

    There is another one, a real-time one using OGL; it's probably in my bookmarks, but my bookmarks are unorganised and have like several thousand entries, so I can't find it :oops:

    Maybe if you google around, the site might still be up.

    Tsmit42,

    Yeah, it's different. If you want parallel PS2's, there was another article on that posted on this board too; look several pages back.
     
  13. Squeak

    Veteran

    Joined:
    Jul 13, 2002
    Messages:
    1,262
    Likes Received:
    32
    Location:
    Denmark
    Is REYES-style rendering on PS2 even a good idea? When the GS fills triangles in untextured mode, it does so in a 4 x 4 pattern. Wouldn't only filling one pixel or less mean that it only uses a fraction of its potential fillrate?
    I seem to recall that the optimal size for a PS2 triangle is 32 pixels?

    Is it even technically possible to use something near the full fillrate when rendering untextured or textured geometry on ps2?

    Do other architectures like PC/Xbox or Gamecube have similar problems attaining their full fillrate, due to similar "niggles" in the actual filling process?
     
  14. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    AFAIK most architectures would lose a LOT of fill rate if used for a REYES-style renderer. They all use 2x2 or larger patterns, with any pixel not covered by the current triangle being wasted.
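    That waste is easy to put numbers on. A toy estimate of quad/stamp fill efficiency, assuming idealized NxN stamps and best-case packing (not measurements of any real chip):

    ```python
    # Toy fill-rate efficiency for stamp-based rasterizers: a triangle that
    # covers `covered_pixels` still burns whole NxN stamps of fill, so
    # sub-pixel micro-polygons waste most of the rate.
    def fill_efficiency(covered_pixels, stamp_size):
        pixels_per_stamp = stamp_size * stamp_size
        stamps = max(1, -(-covered_pixels // pixels_per_stamp))  # ceiling division
        return covered_pixels / (stamps * pixels_per_stamp)

    print(fill_efficiency(1, 2))   # 0.25: one-pixel triangle on a 2x2 stamp
    print(fill_efficiency(1, 4))   # 0.0625: same triangle on a 4x4 stamp
    print(fill_efficiency(32, 4))  # 1.0: a 32-pixel triangle, best case
    ```

    With micro-polygons at 1/4 of a pixel the situation is even worse than the one-pixel case, which is why the thread keeps pushing shading work onto the CPU rather than the rasterizer.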
     
  15. megadrive0088

    Regular

    Joined:
    Jul 23, 2002
    Messages:
    700
    Likes Received:
    0
    PS2 has 32 MB main memory and 4 MB on GS

    however

    GSCube has 128 MB main memory per PS2 chipset and 32 MB eDRAM on each GS

    so GSCube has:

    128 MB x 16 = 2048 MegaBytes or 2 GigaBytes main memory
    32 MB eDRAM x 16 = 512 MegaBytes eDRAM graphics memory
     
  16. megadrive0088

    Regular

    Joined:
    Jul 23, 2002
    Messages:
    700
    Likes Received:
    0
    but actually we are talking about the GSCube here, and it is 755 GigaBytes per second of video memory bandwidth, not 755 gigabits. It comes from PS2's 48 GB/sec eDRAM bandwidth on the GS, x 16 GS's on GSCube.

    GSCube GS's have roughly the same bandwidth as PS2 GS but it also has 32 MB eDRAM instead of 4 MB eDRAM.

    PS3's eDRAM bandwidth could very well be in the 100s of GigaBytes per second if the main memory bandwidth is 25.6 GB/sec.

    look at PS2: its main memory bandwidth is only 3.2 GB/sec but its video memory bandwidth is 48 GB/sec, which is 15x greater. If the same ratio applied to PS3 it would have 384 GB/sec of graphics memory bandwidth (25.6 x 15), and don't forget PS3 will have 2 pools of eDRAM, one for the GPU and one for the CPU. So PS3 will likely have two processors, each with hundreds of GB/sec of eDRAM bandwidth.
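    The ratio argument as plain arithmetic; the PS2 figures are the well-known specs, while every PS3 number here is speculation from this thread:

    ```python
    # PS2's eDRAM-to-main-memory bandwidth ratio, extrapolated to the
    # thread's speculative PS3 figures. PS3 numbers are guesses, not specs.
    ps2_main_bw = 3.2       # GB/s, PS2 main memory
    ps2_edram_bw = 48.0     # GB/s, GS eDRAM
    ratio = ps2_edram_bw / ps2_main_bw
    print(round(ratio, 1))               # 15.0

    ps3_main_bw = 25.6      # GB/s, speculated 4-channel XDR
    print(round(ps3_main_bw * ratio, 1)) # 384.0 GB/s, the figure quoted above
    ```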
     
  17. Tsmit42

    Newcomer

    Joined:
    Jun 2, 2003
    Messages:
    136
    Likes Received:
    0

    I understand, but you originally posted 755 gigabits before you made the edit to your post.
     
  18. qwerty2000

    Newcomer

    Joined:
    May 24, 2003
    Messages:
    149
    Likes Received:
    0
    Location:
    New Jersey
    here's what I think the specs will be for the PS3.... the wording is kept as close as I can make it to how they would be listed in a real specs list:


    theoretical polygon rate of 15 billion per second.....

    ACTUAL draw rate will be about 10-25% of that figure, so we are looking at an actual rate of 1.5-4 billion polys per second... (which at 60fps gives you over 11 million on each and every frame!!)

    pixel fill rate will be about 60 - 80 gig (60 - 80 billion pixels a second).... the texel fill rate will be about the same as the pixel fill.

    memory bandwidth... I think will be between 75 - 150 gigabytes/sec (sorry for the range on that; it depends if it's a 512-bit or 1024-bit bus). Again, there is a formula to calculate that which I can give if required.
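    The formula being alluded to is just bus width times effective transfer rate. A sketch with the bus widths from the post and a hypothetical 1 GHz effective data rate (the clock is my assumption, not the poster's):

    ```python
    # Peak memory bandwidth = (bus width in bits / 8) * effective transfers/s.
    def bandwidth_gb_s(bus_bits, transfers_per_s):
        return bus_bits / 8 * transfers_per_s / 1e9

    # The post's 512-bit and 1024-bit buses at an assumed 1 GHz data rate:
    print(bandwidth_gb_s(512, 1e9))    # 64.0 GB/s
    print(bandwidth_gb_s(1024, 1e9))   # 128.0 GB/s
    ```

    At higher effective rates (as with XDR's multiple data transfers per clock) the same widths would land in the 75-150 GB/s range the post guesses at.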

    the core processor will run at 4 GHz.... (totalling 1 teraflop of TOTAL throughput, broken down into around 300 GFLOPS of complex/"spaghetti" code as it's known ( supplied by the 4 main processor cores on the CELL ) and 700 GFLOPS of simple floating point arithmetic ( supplied by the FP co-processors alongside the four main cores, which are also on the CELL ))

    the graphics chip will run at about half of that, so 2 GHz.....
    It will be built on 65nm fabrication technology....

    I think the core processor will comprise 250 million transistors....
    the graphics chip will be about 400 million transistors....

    I think the machine will have 512mb of memory.

    the render precision will be at 128 bit

    As for what the graphics quality and output will be..... well, all I can think is 'Toy Story to Warcraft 3 FMVs, maybe better, in real time' and that's not just marketing talk....



    what do you guys think?
     
  19. Paul

    Veteran

    Joined:
    Feb 22, 2003
    Messages:
    1,974
    Likes Received:
    1
    Location:
    United States
    This was my prediction also..

    Way too high.

    Internal or external memory? If external, I am unsure. Current predictions are 25 GB/s.

    3-4 is my guess.

    Way way too high; expect 800 MHz - 1 GHz.

    I too, think this will be the external memory.

    I expect Final Fantasy movie-type tech demos, with the actual graphics being of Final Fantasy X CGI quality.
     
  20. notAFanB

    Veteran

    Joined:
    Jun 5, 2003
    Messages:
    1,165
    Likes Received:
    1
    Whoa, don't you think (FFTSW) tech demos is putting the bar in the stratosphere? Something resembling it, maybe.
     