Tesla Dojo

Discussion in 'Graphics and Semiconductor Industry' started by Jawed, Aug 24, 2021.

  1. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    801
    Likes Received:
    1,630
    135 gigabytes per cabinet sounds far too low for something like full-blown GPT-3 with 175 billion parameters, which translates to about 350 GB (326 GiB) for the model alone with BF16 parameters. I wonder how it would cope with large models.
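A quick back-of-the-envelope check of the figures above, as a minimal sketch. It assumes 2 bytes per BF16 parameter, counts the weights only (no gradients, optimizer state, or activations), and treats the 135 gigabytes per cabinet as decimal GB. The function names are made up for illustration.

```python
import math

BYTES_PER_BF16 = 2  # BF16 stores each parameter in 16 bits


def model_bytes(num_params: int, bytes_per_param: int = BYTES_PER_BF16) -> int:
    """Memory needed for the weights alone (no gradients, optimizer state, or activations)."""
    return num_params * bytes_per_param


def cabinets_needed(total_bytes: int, cabinet_bytes: int) -> int:
    """Smallest whole number of cabinets whose combined memory holds total_bytes."""
    return math.ceil(total_bytes / cabinet_bytes)


gpt3_params = 175 * 10**9       # 175 billion parameters
cabinet_capacity = 135 * 10**9  # 135 GB per cabinet, assumed to be decimal gigabytes

weights = model_bytes(gpt3_params)
print(f"{weights / 10**9:.0f} GB")   # decimal gigabytes
print(f"{weights / 2**30:.0f} GiB")  # binary gibibytes
print(cabinets_needed(weights, cabinet_capacity), "cabinets for the weights alone")
```

The 326 figure only comes out when the total is expressed in binary gibibytes. In decimal gigabytes the same weights are 350 GB, so the comparison with 135 GB per cabinet is slightly worse than it first looks.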
     
  2. AzBat

    AzBat Agent of the Bat
    Legend

    Joined:
    Apr 1, 2002
    Messages:
    7,747
    Likes Received:
    4,845
    Location:
    Alma, AR
    Thanks for the replies, guys. I've watched a few videos too & I will go back & find a few (I remember watching Anastasi's, though).

    One thing I will say is that this was all announced at AI Day. It's a recruitment opportunity. I'm not surprised that things are not completely working (the Q&A showed that). The whole point of the presentation was to get people excited about new stuff & hopefully bring in more people to help solve their issues.

    Tommy McClain
     
  3. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,018
    Likes Received:
    582
    Location:
    Taiwan
    That's my question too. The SRAM in each node is not very big (1.25 MB, smaller than many CPUs' caches). Of course, it's probably more appropriate to compare it to something like an SM or CU in a GPU, in which case it looks huge. However, I'm not sure whether it has shared external memory, as none is mentioned in the AI Day video. If not, it looks to me like they intend to use the chip as an accelerator, with the host system responsible for feeding data to it continuously.
     
  4. cheapchips

    Veteran

    Joined:
    Feb 23, 2013
    Messages:
    2,493
    Likes Received:
    2,665
    Location:
    UK
    I haven't watched Thunderf00t's video but I assume it's the usual snark. I can't help but find it as weird as the Musk worshipers. Who doesn't take Musk's timelines or presentation taglines with a massive pinch of salt? Unless you bought Autopilot in 2019 expecting 2020 revenue, who cares?

    The computer vision / decision making / training pipeline work Tesla showed off at AI Day was impressive. They have a pretty clear path to a useful humanoid robot as a product (which is very different from one that goes to the shops for you; as with robotaxis, it doesn't matter much if it never meets that goal).

    Agility Robotics are already selling a humanoid* robot that does pick and carry tasks. That's with a company of 30.

    *The caveat is that it uses backwards knees. 'Bird legs' are self-stabilising over terrain changes to a degree that hominid legs aren't.
     
    #24 cheapchips, Sep 2, 2021
    Last edited: Sep 3, 2021
  5. OlegSH

    In that case another question arises: whether they will be able to build a host system in which host-accelerator bandwidth is not a bottleneck for the accelerator. Off-the-shelf parts would be very limited in this regard, so they would need custom host hardware.
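To put a rough number on that concern, here is a minimal sketch of how long it would take just to stream data over a host link. The link speed is an assumption: PCIe 4.0 x16, at roughly 31.5 GB/s per direction, stands in for an off-the-shelf part. The payload is simply GPT-3-sized BF16 weights (175 billion parameters at 2 bytes each). Nothing here reflects Tesla's actual host design.

```python
def transfer_seconds(payload_bytes: float, link_bytes_per_s: float) -> float:
    """Ideal time to move payload_bytes over a link, ignoring latency and protocol overhead."""
    if link_bytes_per_s <= 0:
        raise ValueError("link bandwidth must be positive")
    return payload_bytes / link_bytes_per_s


# Assumed figures, for illustration only.
PCIE4_X16_BYTES_PER_S = 31.5e9  # approximate PCIe 4.0 x16 throughput, one direction
payload = 175e9 * 2             # 175 billion BF16 parameters, 2 bytes each

seconds = transfer_seconds(payload, PCIE4_X16_BYTES_PER_S)
print(f"{seconds:.1f} s to stream the weights once over one link")
```

Several links in parallel would divide that time, which is presumably where custom host hardware would come in.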
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    Yes, the tiles should be viewed as accelerators or, if you prefer, together they make one massive accelerator with a single-level view of working memory.

    The presentation danced around the question of the plane. I interpret this to mean that the ExaPod is really two planes and they should be considered independent. One plane has to go to the host to deal with the other plane. Each plane is effectively a single accelerator.

    Tesla is also building a custom host. It seemed as if that involves more custom silicon.

    I think we should assume that this first machine will be scrapped pretty soon, if it is even built out to the full 10 cabinets.

    For comparison, does anyone know about Tesla's version 1 and version 2 inference chips that go in their cars? All cars are currently using the version 3 chip it seems. Were the previous two designs deployed in cars?

    It's notable that no one is laughing at Tesla's current inference chip running in cars. What competing hardware is there? If we want to talk about the prospects for Dojo then it seems a comparison with the inference chip would be instructive.

    Software is clearly another problem. SpaceX's rockets are so nice because of their software...
     
  7. Jawed

    HW3 is the version 3 chip I was referring to before. 2019 is when it became public:

    Tesla's New HW3 Self-Driving Computer — It's A Beast (CleanTechnica Deep Dive) | CleanTechnica

    It was the first chip for this function that Tesla designed. The prior chip, referred to as "HW2", was by NVidia.

    According to this article:

    Tesla Autopilot Mystery Solved — HW3 Full Potential Soon To Be Unlocked | CleanTechnica

    Tesla initially was "emulating" HW2 on HW3 in order to be functional. I suspect "emulating" is the wrong word. But we're never going to know the details.
     
  8. pcchen

    Inference is one thing, but Tesla Dojo is obviously designed for training, not inference.
    Of course, I don't doubt that they know what they are doing. This is not something you go into blindly, without at least some data to back up your design decisions (although there are, unfortunately, precedents from some very famous CPU companies of designs that looked good on paper but turned out to be pretty bad in practice). However, it's likely that Tesla has its own design goals, which may or may not be compatible with other people's requirements.
     