Return of Cell for RayTracing? *split*

Discussion in 'Console Technology' started by Arkham night 2, Aug 31, 2018.

  1. HBRU

    Regular Newcomer

    Joined:
    Apr 6, 2017
    Messages:
    391
    Likes Received:
    40
    And now strong parallel CPU compute power seems good... many small CPUs, a wide RAM bus, special CPU functions for RT... Yes :)
     
  2. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,406
    Likes Received:
    8,609
    Location:
    Cleveland
    Because they were directly part of the natural flow of how the discussion turned to Cell.
     
  3. Skaal

    Newcomer

    Joined:
    Oct 16, 2015
    Messages:
    19
    Likes Received:
    10
    I was not suggesting Cell should play the part of a GPU in the next PS5. I was proposing to use an individual Cell as an addition, not as part of the APU. I know that Cell is not good enough as a GPU, but as a dedicated RT chip it could do wonders.


    But would that prevent a more modern Cell from being part of a next-gen console? The paper itself suggests that there is a need for other rasterisation solutions.
     
  4. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,564
    Likes Received:
    1,981
    I know what you were saying. And no, it couldn't. Hardware designed to do RT and only RT would absolutely destroy it in performance and power usage, and would take up much less space on the die. You are effectively arguing that, since Cell was better at video decoding back in 2005 than a typical CPU of the time, adding Cell to modern GPUs to act as the VPU would do wonders for their ability to decode video. The dedicated fixed-function blocks in GPUs responsible for video decoding are much more performant and use up much less space and power than any evolution of Cell ever could, and it's exactly the same thing with dedicated RT hardware.
     
    dobwal, Lalaland and BRiT like this.
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,607
    Likes Received:
    11,036
    Location:
    Under my bridge
    As we don't know how the blocks in the GPU are facilitating raytracing, it's hard to compare. The flexibility of a raytracing processor, rather than a memory-access block, may also be better at supporting different algorithms, although GPUs are now very versatile in compute. Cell could also see an RT acceleration structure added in place of an SPE - the original vision allowed for specialised heterogeneous blocks to be added.

    At this point, all we can do for Cell, and more so a Cell2 with suitable enhancements, is speculate about performance, because it hasn't seen the investment that GPU-based RT has, so we're left with existing examples like the IBM demo. AFAICS, the bottleneck in RT is mostly memory search and access, although complex shaders can add significant per-ray computing requirements. I definitely think there's potential in a MIMD versus SIMD solution though. The code would be quite different and operate differently, making it very hard to compare without a test case.
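
    To make the memory point concrete, here's a rough, purely illustrative C++ sketch of a BVH traversal (the node layout and names are made up, not from the IBM paper or any shipping engine): each step of the loop is a load whose address depends on the previous load, so rays stall on memory rather than arithmetic, and rays packed into a SIMD bundle quickly want different nodes.

```cpp
// Hypothetical BVH layout -- illustrative only, not any particular engine's.
#include <cstdint>
#include <algorithm>
#include <utility>

struct Ray  { float origin[3], invDir[3], tMax; };
struct AABB { float lo[3], hi[3]; };
struct Node {
    AABB     bounds;
    uint32_t leftChild;              // children stored contiguously (assumed)
    uint32_t primOffset, primCount;  // leaf payload
    bool     isLeaf;
};

// Slab test: does the ray enter the box before tMax?
inline bool hitAABB(const Ray& r, const AABB& b) {
    float t0 = 0.0f, t1 = r.tMax;
    for (int a = 0; a < 3; ++a) {
        float tNear = (b.lo[a] - r.origin[a]) * r.invDir[a];
        float tFar  = (b.hi[a] - r.origin[a]) * r.invDir[a];
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);
        t1 = std::min(t1, tFar);
    }
    return t0 <= t1;
}

// "Any hit" traversal. Every iteration fetches a node whose address depends on
// the previous fetch -- classic pointer chasing, which is why the cost is
// dominated by memory search/access rather than maths, and why neighbouring
// rays in a SIMD bundle soon diverge to different nodes.
bool traverse(const Ray& ray, const Node* nodes, uint32_t root) {
    uint32_t stack[64];
    int sp = 0;
    stack[sp++] = root;
    while (sp > 0) {
        const Node& n = nodes[stack[--sp]];      // data-dependent load
        if (!hitAABB(ray, n.bounds)) continue;
        if (n.isLeaf) {
            // intersect primitives [primOffset, primOffset + primCount) here
            return true;
        }
        stack[sp++] = n.leftChild;
        stack[sp++] = n.leftChild + 1;
    }
    return false;
}
```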

    It'd be nice if @AlexV could weigh in, given a little experience in such matters. ;)

    Doing some research on PVR, all this noise nVidia is getting is ridiculous. PowerVR are/were soooo far ahead, but they've been overlooked because they're a mobile chipset used in a limited number of devices.


    I wonder what a monster-sized PowerVR GPU with raytracing would look like?
     
    Heinrich4, eloyc and milk like this.
  6. Lalaland

    Regular

    Joined:
    Feb 24, 2013
    Messages:
    596
    Likes Received:
    265
    Well, the reason PVR keeps getting overlooked is that Tile Based Deferred Rendering is really good at certain things but would be rubbish at most of the things traditional raster-based cards are good at. Or at least that was the reason posited back when I was lamenting the lack of coloured lighting in Quake 2 on my PVR1. I'm sure TBDR has come a long way since then, but I have to assume there is a good reason no-one is licensing PVR tech to make a GPU today, even if it is just a lack of Windows driver support these days. I thought I read a while back that recent NV cards do some TBDR in part of their pipeline anyway?
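
    For anyone who hasn't seen the binning idea spelled out, here's a tiny, purely illustrative C++ sketch of the "tile" part of TBDR (the structures, tile size and phase split are assumptions for clarity, not Imagination's actual pipeline): triangles are sorted into screen tiles up front, and each tile is then resolved in fast on-chip memory so shading can be deferred until visibility is known.

```cpp
// Illustrative tile binning -- the core of the "TB" in TBDR.
#include <cstdint>
#include <vector>
#include <algorithm>

struct Tri { float x[3], y[3]; uint32_t id; };

constexpr int TILE = 32;   // tile edge in pixels (illustrative)

// Phase 1: record every triangle in each screen tile its bounding box touches.
std::vector<std::vector<uint32_t>> binTriangles(const std::vector<Tri>& tris,
                                                int width, int height) {
    const int tilesX = (width  + TILE - 1) / TILE;
    const int tilesY = (height + TILE - 1) / TILE;
    std::vector<std::vector<uint32_t>> bins(tilesX * tilesY);
    for (const Tri& t : tris) {
        const float minX = std::min({t.x[0], t.x[1], t.x[2]});
        const float maxX = std::max({t.x[0], t.x[1], t.x[2]});
        const float minY = std::min({t.y[0], t.y[1], t.y[2]});
        const float maxY = std::max({t.y[0], t.y[1], t.y[2]});
        const int tx0 = std::max(0, static_cast<int>(minX) / TILE);
        const int tx1 = std::min(tilesX - 1, static_cast<int>(maxX) / TILE);
        const int ty0 = std::max(0, static_cast<int>(minY) / TILE);
        const int ty1 = std::min(tilesY - 1, static_cast<int>(maxY) / TILE);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].push_back(t.id);
    }
    return bins;
}

// Phase 2 (per tile, conceptually in on-chip memory): rasterize only the
// binned triangles, keep depth/colour in the tile buffer, and shade each
// pixel once the frontmost fragment is known -- that deferral is the "D".
```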
     
    Heinrich4 likes this.
  7. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,063
    Likes Received:
    5,016
    Yes, unfortunately PVR was out of the desktop space at that time and had been for many years. Had they still been involved in desktop PC computing they may have been able to make some headway. But being as they were basically all in on mobile at the time, what mobile developer is going to risk resources on RT for a chip that may or may not get picked up for a mobile SOC? A discrete PC solution isn't reliant on a SOC manufacturer picking it to use in a commercial product.

    It's just really unfortunate that they were basically in the wrong hardware space to get something like that pushed into real use.

    While the pickup in the desktop space (not just gamers, but research and science as well) may still have been slower than what is happening with NV's Turing chip, due to them being far smaller than NV in size and especially in marketing and developer-relations dollars, it would have seen far greater adoption than it did with them being seen as a mobile graphics provider.

    Regards,
    SB
     
  8. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,564
    Likes Received:
    1,981
    Essentially I'm looking at the precedent set by every compute workload I'm aware of. You would expect that when designing hardware to accomplish a specific task (and only that task), that hardware would be set up to do the task in the optimal way. I just don't see how you could ever expect to get better performance per watt/area/$$$ from a more flexible design. I'm not arguing that you couldn't come up with some evolution or configuration of Cell that would exceed the performance of the RT cores + CUDA cores + Tensor cores setup in the RTX cards at raytracing, given similar budgets for area and power. I'm arguing that using Cell for raytracing would use up more of the overall power/area budget of the chip than the RT + Tensor cores do in Turing, for example, and the only way to accommodate this would be to cut back on the CUDA cores and compromise your rasterization performance. That's not tenable in the near future. You could build a design that was better at raytracing than Turing (or something similar), but I think it's very questionable that you could build a design that was both better at raytracing and not also deficient in rasterization performance.
     
  9. Mobius1aic

    Mobius1aic Quo vadis?
    Veteran

    Joined:
    Oct 30, 2007
    Messages:
    1,649
    Likes Received:
    244
    Well it's too bad Apple has ditched them for their own graphics IP. There could've been some legit merit in Apple using PVR's embedded RT tech in specialized workstations and even in the iPad Pro for design professionals doing their work and showing it off to clients in realtime.

    Now that Imagination is de facto owned by the Chinese, there is a new inroad for the company to produce GPUs for China's massive market across all segments. I sincerely believe dedicated GPUs could be part of that, even if they are at the lower end of the range. Affordable RT hardware would be a boon to a market whose lower end can't drop oodles of cash on RTX-based Quadros for real-time ray tracing.

    Seems like a chance for PVR to really undercut RTX in segments it probably won't cover for two to three generations.
     
    #49 Mobius1aic, Sep 3, 2018
    Last edited: Sep 3, 2018
    Heinrich4 and Silent_Buddha like this.
  10. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    10,922
    Likes Received:
    5,723
    Location:
    London, UK
    There were a bunch of attempts to kick-start parallel solutions to computing problems back in the 1980s. A few bespoke applications found uses for small arrays of DSPs, and Atari tried to sell the world on the Atari Transputer Workstation. Supercomputers, mainframes and minicomputers were already heavily parallel.

    Yup, but multi-core/multi-threaded solutions likely wouldn't have found adoption had Intel been able to keep cranking up clock speeds. The fact they couldn't forced the issue. PowerPC was already working towards this before Intel really began to hit a ceiling - IBM knew the technical barrier was coming from their big-iron business.
     
  11. milk

    Veteran Regular

    Joined:
    Jun 6, 2012
    Messages:
    2,952
    Likes Received:
    2,514
    In part the story of PowerVR Ray Tracing is about Imagination being in the wrong market to see it adopted, but from a different perspective, it's about Caustic Graphics, the company that actually developed that tech, having been acquired by Imagination and not by AMD or Nvidia.
     
    #51 milk, Sep 3, 2018
    Last edited: Sep 3, 2018
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Rasterization is still very well suited to the wide, burst-friendly cache and DRAM architectures that Nvidia's research still seems to encourage using for the majority of what winds up on-screen. Cell's big assertion was that much of what had been put into CPU architecture in the preceding decade or so could be discarded, so in that regard wouldn't requiring a lot of investment be a sort of defeat? The reasons why the individual elements Cell removed were so popular were very compelling ones, and the bet that standard architectures had run out of steam was proven wrong.

    The paper indicated they got the best results using Cell in a way that they believed its design didn't necessarily intend.
    The base idea of a master PPE controlling the SPEs was thrown out, and most of the task-based work distributors in the PS3 era came to similar conclusions.
    The idea of having SPEs adopt individual kernels and pass data between themselves over the EIB was thrown out in favor of running the same overall ray-tracing pipeline on each one--which was something the small LS was not optimal for.
    Various parts of the process benefited from having caches, so much so that a sub-optimal software cache still did better than hewing to the SPE's intended workflow (a rough sketch of that idea follows below).
    The long pipeline and bad branch penalties required extra work to get near the utilization that either a branch predictor (as in a CPU) or holding until resolve (as in a GPU) could achieve--although either route means more hardware in the CPU case, or more context in the GPU case.
    The paper noted the SPE's SIMD ISA was generous for the time, although poorer on scalar elements.
    Multi-threading of the SPE by software was mooted, something the SPE's oversized context and heavy context-switch penalties were not targeting.
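
    As mentioned above, here's a minimal, purely illustrative C++ sketch of the software-cache idea (the line size, organisation and memcpy "fetch" are assumptions; a real SPE version would service misses with an explicit DMA from main memory into the 256 KB local store):

```cpp
// Illustrative direct-mapped software cache over a small local buffer.
#include <cstdint>
#include <cstring>

constexpr int LINE_BYTES = 128;   // cache line size (assumed)
constexpr int NUM_LINES  = 256;   // 32 KB of local store spent on the cache

struct SoftCache {
    uint64_t tags[NUM_LINES];     // main-memory line address held in each slot
    bool     valid[NUM_LINES];
    uint8_t  lines[NUM_LINES][LINE_BYTES];
};

// Simulated main memory so the sketch is self-contained.
static uint8_t g_mainMemory[1 << 20];

// Stand-in for the explicit DMA transfer real SPE code would issue.
static void fetchLine(void* dst, uint64_t srcAddr, int bytes) {
    std::memcpy(dst, &g_mainMemory[srcAddr], bytes);
}

// Return a local-store pointer for main-memory address 'addr',
// pulling in the containing line on a miss.
uint8_t* cacheLookup(SoftCache& c, uint64_t addr) {
    const uint64_t lineAddr = addr & ~static_cast<uint64_t>(LINE_BYTES - 1);
    const int slot = static_cast<int>((lineAddr / LINE_BYTES) % NUM_LINES);
    if (!c.valid[slot] || c.tags[slot] != lineAddr) {   // miss: fetch the line
        fetchLine(c.lines[slot], lineAddr, LINE_BYTES);
        c.tags[slot]  = lineAddr;
        c.valid[slot] = true;
    }
    return &c.lines[slot][addr - lineAddr];
}

int main() {
    SoftCache cache{};                        // zero-initialised: nothing valid
    g_mainMemory[12345] = 42;
    return *cacheLookup(cache, 12345) == 42 ? 0 : 1;   // miss, fetch, then hit
}
```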

    To top it off, the realities of getting to 4 GHz and above were such that the long pipeline, as it was, could perhaps have been shortened on future nodes with limited impact on realized clocks.

    Is an architecture with threading, branch handling, caches, more modest pipeline, and other generalist features really Cell?

    If Intel could have, then so would everyone else, and the 10-15 GHz processors would have been as good as or better than the dual and quad cores that we got instead. Multi-core is an inferior method for scaling performance, but one that remained physically possible.
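
    One standard way to frame that clock-versus-cores trade-off is Amdahl's law; the workload split below is purely hypothetical, just to show why a uniform clock bump can beat adding cores once the serial fraction matters:

```cpp
// Amdahl's law with made-up numbers: extra cores only help the parallel
// fraction of a program, whereas a clock increase speeds up everything.
#include <cstdio>

// Speedup when fraction 'p' of the work runs perfectly parallel on 'n' cores.
double amdahl(double p, int n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
    const double p = 0.80;                                    // assumed 80% parallel
    std::printf("4 cores        : %.2fx\n", amdahl(p, 4));    // 2.50x
    std::printf("16 cores       : %.2fx\n", amdahl(p, 16));   // 4.00x
    std::printf("infinite cores : %.2fx\n", 1.0 / (1.0 - p)); // 5.00x ceiling
    std::printf("2.5x clock     : %.2fx\n", 2.5);             // uniform speedup
    return 0;
}
```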
     
    Shifty Geezer and mrcorbo like this.
  13. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,607
    Likes Received:
    11,036
    Location:
    Under my bridge
    Yes. The concept was small cores, lots of 'em, and heterogeneous. The reason useful stuff was stripped out for Cell 1 was to make them small enough to get more on a die. At smaller lithographies, more can be invested per core to make it better based on how people actually need to use it, while still providing a huge CPU core count of 60+ on a die.
     
  14. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,564
    Likes Received:
    1,981
    The track record of success producing things that are halfway between a traditional multi-core CPU architecture and the GPU paradigm is spotty at best. Xeon Phi has its niche, but it's not exactly setting the world on fire.
     
    Silent_Buddha, Lalaland and milk like this.
  15. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    10,922
    Likes Received:
    5,723
    Location:
    London, UK
    Intel can't because 80x86 is heavily laden with legacy requirements. Nobody researching quantum computer applications is working with hardware anywhere near as slow [in terms of clock frequencies] as anything Intel sell commercially. :nope: Cooling remains a challenge, but not impossibly so. :nope:
     
  16. turkey

    Regular Newcomer

    Joined:
    Oct 21, 2014
    Messages:
    733
    Likes Received:
    424
    Isn't that Larrabee? (Failed, scrapped or not.)
    Lots of x86 cores but stripped back and dropping legacy baggage where possible as it's not a general processor having to run code from decades ago.

    Interestingly, that was shown ray tracing back when it was actually something being talked about.
     
    milk likes this.
  17. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    10,922
    Likes Received:
    5,723
    Location:
    London, UK
    :nope:
     
    turkey likes this.
  18. turkey

    Regular Newcomer

    Joined:
    Oct 21, 2014
    Messages:
    733
    Likes Received:
    424
    80x86 is not 80 times x86 cores? Ohh, it's the 8086 et al. chip family...

    I'll bow out now as it's already over my head, but all this Cell talk got me thinking about what Intel might bring to their GPU, given it's a clean-slate design in the DX12 compute era.
    This must have been a possibility, and something they experimented with before.
     
  19. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,063
    Likes Received:
    5,016
    Ah, the talk of high clock speeds reminds me of the good old days (1992+) of Digital Equipment Corporation's (DEC) Alpha CPU using RISC (https://www.extremetech.com/computing/73096-64bit-cpus-alpha-sparc-mips-and-power and https://en.wikipedia.org/wiki/DEC_Alpha ). The king of high clock speed that, outside of special cases, failed to deliver comparable performance to much slower Intel-architected CISC CPUs. I.e. it was really good at very specific things but rubbish as a general-purpose CPU.

    I wanted one quite badly back then but couldn't justify the cost even after Microsoft incorporated support for it into a version of Windows NT. :(

    Regards,
    SB
     
    BRiT likes this.
  20. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,607
    Likes Received:
    11,036
    Location:
    Under my bridge
    That's true, but that doesn't prove it'll never be the best move in future.
     