Tesla Dojo

Discussion in 'Graphics and Semiconductor Industry' started by Jawed, Aug 24, 2021.

  1. OlegSH

    OlegSH Regular

    135 gigabytes per cabinet sounds far too low for something like full-blown GPT-3 with 175 billion parameters, which would translate into about 350 GB (roughly 326 GiB) for the model alone with BF16 parameters. I wonder how it would be applicable to large models.
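
    As a rough check on those figures, here is a minimal sketch of the arithmetic. It assumes 2 bytes per BF16 parameter, counts weights only (no activations, gradients, or optimizer state), and treats the 135 GB per cabinet as decimal gigabytes; the function and constant names are made up for illustration.

```python
# Back-of-the-envelope weight footprint for a dense model.
# Assumptions: BF16 = 2 bytes per parameter, weights only,
# and "135 gigabytes per cabinet" read as decimal GB.

def param_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Bytes needed to hold the parameters alone."""
    return n_params * bytes_per_param

GPT3_PARAMS = 175_000_000_000    # 175 billion parameters
CABINET_BYTES = 135 * 10**9      # per-cabinet figure quoted in the post

weights = param_bytes(GPT3_PARAMS)
weights_gb = weights / 10**9     # decimal gigabytes
weights_gib = weights / 2**30    # binary gibibytes
cabinets_needed = weights / CABINET_BYTES

print(f"{weights_gb:.0f} GB, {weights_gib:.0f} GiB, {cabinets_needed:.2f} cabinets")
```

    The 326 figure matches the binary (GiB) reading of the same 350 GB, and either way the weights alone come to more than two and a half cabinets' worth of the quoted capacity.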
     
  2. AzBat

    AzBat Agent of the Bat Legend

    Thanks for the replies, guys. I've watched a few videos too, and I will go back and find a few more (I remember watching Anatasi, though).

    One thing I will say is that this was all announced at AI Day. It's a recruitment opportunity. I'm not surprised that things are not completely working (the Q&A showed that). The whole point of the presentation was to get people excited about the new stuff and hopefully bring in more people to help solve their issues.

    Tommy McClain
     
  3. pcchen

    pcchen Moderator Moderator Veteran Subscriber

    That's my question too. The SRAM in each node is not very big (1.25 MB, smaller than many CPUs' caches). Of course, it's probably more appropriate to compare it to something like an SM or CU in a GPU, and then it looks huge. However, I'm not sure whether it has a shared external memory, as none is mentioned in the AI Day video. If not, it looks to me as if they intended to use the chip as an accelerator, with the host system responsible for feeding data to it continuously.
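
    To put the per-node figure in context, here is a small sketch that uses only numbers quoted in this thread: 1.25 MB of SRAM per node, 135 GB per cabinet, and 175 billion BF16 parameters at 2 bytes each. Decimal units are assumed throughout, and the helper name is hypothetical.

```python
# How many 1.25 MB nodes does it take to reach a given capacity?
# Assumptions: decimal MB/GB, and every byte of node SRAM is usable for data.

NODE_SRAM_BYTES = 1_250_000  # 1.25 MB per node, as quoted above

def nodes_needed(total_bytes: int, node_bytes: int = NODE_SRAM_BYTES) -> int:
    """Smallest number of nodes whose combined SRAM holds total_bytes."""
    return -(-total_bytes // node_bytes)  # integer ceiling division

cabinet_nodes = nodes_needed(135 * 10**9)          # the 135 GB per cabinet figure
gpt3_nodes = nodes_needed(175_000_000_000 * 2)     # 175B BF16 parameters

print(cabinet_nodes, gpt3_nodes)
```

    In other words, the per-cabinet capacity corresponds to roughly a hundred thousand such nodes, and holding GPT-3-sized weights entirely in SRAM would take well over twice that, before counting anything besides the weights.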
     
  4. cheapchips

    cheapchips Veteran

    I haven't watched Thunderf00t's video but I assume it's the usual snark. I can't help but find it as weird as Musk worshipers. Who doesn't take Musk's timelines or presentation taglines with a massive pinch of salt? Unless you bought Autopilot in 2019 expecting 2020 revenue, who cares.

    The computer vision / decision making / training pipeline that Tesla showed off at AI Day was impressive stuff. They've a pretty clear path to a useful humanoid robot as a product (which is very different to one that goes to the shops for you; as with Robotaxis, it doesn't matter that much if it never meets that goal).

    Agility Robotics are already selling a humanoid* robot that does pick and carry tasks. That's with a company of 30.

    *Caveat is that it uses backwards knees. 'Bird legs' are self stabilising over terrain changes to a degree that hominids aren't.
     
    Last edited: Sep 3, 2021
  5. OlegSH

    OlegSH Regular

    In that case another question arises: whether they will be able to build a host system in which host-accelerator bandwidth is not a bottleneck for the accelerator. Off-the-shelf parts would be very limited in this regard, so they would need custom host hardware if the design relies on a host system.
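
    To get a feel for the scale of the concern, here is a rough sketch of how long a single pass of GPT-3-sized BF16 weights would take over one host link. The 32 GB/s link speed is a hypothetical stand-in, roughly the theoretical one-direction rate of a PCIe 4.0 x16 slot; nothing in the presentation says Dojo's host interface looks like this.

```python
# Time to stream a payload once over a host-accelerator link.
# Assumptions: sustained bandwidth equals the nominal figure, no protocol
# overhead, and LINK_BYTES_PER_S is an illustrative value, not a Dojo spec.

def transfer_seconds(payload_bytes: int, link_bytes_per_s: int) -> float:
    """Idealised transfer time: payload divided by bandwidth."""
    return payload_bytes / link_bytes_per_s

WEIGHT_BYTES = 175_000_000_000 * 2   # 175B parameters at 2 bytes (BF16)
LINK_BYTES_PER_S = 32 * 10**9        # hypothetical 32 GB/s host link

seconds = transfer_seconds(WEIGHT_BYTES, LINK_BYTES_PER_S)
print(f"{seconds:.1f} s per full pass over the weights")
```

    Even under these idealised assumptions a single link needs on the order of ten seconds per pass, which is why many parallel links or custom host hardware would be needed if the weights cannot stay resident on the accelerator.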
     
  6. Jawed

    Jawed Legend

    Yes, the tiles should be viewed as accelerators - or if you prefer together they make one massive accelerator which has a single-level view of working memory.

    The presentation danced around the question of the plane. I interpret this to mean that the ExaPod is really two planes and they should be considered independent. One plane has to go to the host to deal with the other plane. Each plane is effectively a single accelerator.

    Tesla is also building a custom host. It seemed as if that involves more custom silicon.

    I think we should assume that this first machine will be scrapped pretty soon, if it is even built out to the full 10 cabinets.

    For comparison, does anyone know about Tesla's version 1 and version 2 inference chips that go in their cars? All cars are currently using the version 3 chip it seems. Were the previous two designs deployed in cars?

    It's notable that no one is laughing at Tesla's current inference chip running in cars. What competing hardware is there? If we want to talk about the prospects for Dojo then it seems a comparison with the inference chip would be instructive.

    Software is clearly another problem. SpaceX's rockets are so nice because of their software...
     
  7. Jawed

    Jawed Legend

    HW3 is the version 3 chip I was referring to before. 2019 is when it became public:

    Tesla's New HW3 Self-Driving Computer — It's A Beast (CleanTechnica Deep Dive) | CleanTechnica

    It was the first chip for this function that Tesla designed. The prior chip, referred to as "HW2", was by NVidia.

    According to this article:

    Tesla Autopilot Mystery Solved — HW3 Full Potential Soon To Be Unlocked | CleanTechnica

    Tesla initially was "emulating" HW2 on HW3 in order to be functional. I suspect "emulating" is the wrong word. But we're never going to know the details.
     
    BRiT and Lightman like this.
  8. pcchen

    pcchen Moderator Moderator Veteran Subscriber

    Inference is one thing, but Tesla Dojo is obviously designed for training, not inference.
    Of course, I don't doubt that they know what they are doing. This is not something you go into blindly without at least some data to back up your design decisions (although there are, unfortunately, precedents from some very famous CPU companies of designs which looked good on paper but turned out to be pretty bad in practice). However, it's likely that Tesla has its own design goals, which may or may not be compatible with other people's requirements.
     
    iroboto and pharma like this.
  9. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■) Moderator Legend Alpha

  10. Jawed

    Jawed Legend

    Lightman likes this.