Nvidia Pascal Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 25, 2014.

Thread Status:
Not open for further replies.
  1. entity279

    Veteran Regular Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,229
    Likes Received:
    422
    Location:
    Romania
    Unlikely this will be true for the next decade.

    The current solution is to query the cloud for any such characteristic while providing the GPS coordinates (in an anonymized way). (I'm reluctantly working in the automotive industry, though not directly involved with driving assists and automation.)
     
  2. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Why then is massive computing power needed in the first place?
     
  3. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,929
    Likes Received:
    1,626
    I think the systems would need to adapt somewhat based on specific country/geography characteristics, e.g. right-hand/left-hand driving, environmental disasters (earthquakes, floods, etc.). I would think continual learning and adapting would be a prime requirement.
     
  4. entity279

    Veteran Regular Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,229
    Likes Received:
    422
    Location:
    Romania
    I'm clueless and curious as well. Perhaps the training of the classifier is done offline (in the cloud) and the resulting weights and their updates are downloaded to the car. If the sensory input data is complex (it might be, IMO), then simply making a decision based on the input might be somewhat computationally expensive, and the sub-GHz ECUs typically used in cars surely can't handle it. There are DSPs employed for these tasks in the ADAS field, so this is nV's competition.
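    Roughly what I have in mind (made-up shapes and random stand-in weights, just to show what the in-car side reduces to once training has happened offline):

    [CODE]
    import numpy as np

    # Inference with "downloaded" weights is only a forward pass, i.e. a
    # few matrix multiplies per decision -- but still ~1M multiply-
    # accumulates per input at these (illustrative) sizes.
    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((4096, 256)), np.zeros(256)  # "downloaded"
    W2, b2 = rng.standard_normal((256, 10)), np.zeros(10)     # "downloaded"

    def decide(sensor_input):
        h = np.maximum(sensor_input @ W1 + b1, 0.0)   # hidden layer + ReLU
        return int(np.argmax(h @ W2 + b2))            # pick an action/class

    print(decide(rng.standard_normal(4096)))
    [/CODE]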
     
  5. Nakai

    Newcomer

    Joined:
    Nov 30, 2006
    Messages:
    46
    Likes Received:
    10
    There are plenty of deep learning and machine learning algorithms for different use cases. Popular approaches are SLAM, HOG and CNNs.

    SLAM (Simultaneous Localization and Mapping): location and orientation in 3D space.
    HOG (Histogram of Oriented Gradients): feature detection, i.e. whether a feature is present or not.
    CNN (Convolutional Neural Network): recognition and differentiation of multiple features (different traffic signs, etc.)

    NVidia is focusing on CNNs. CNNs have been prominent for a couple of years, as they achieve super-human image recognition rates. CNNs are composed of multiple layers: usually convolutional layers (CLs), (max) pooling layers (PLs) and fully-connected layers (FLs). CLs are used to extract certain features out of the underlying image. For each feature there exists a corresponding CL, where each neuron within a CL shares the same weights and bias. For example, if the input image is 28x28 (MNIST) and you want to investigate features of size 5x5, the resulting CL has a size of 24x24 (28 - 5 + 1) and each neuron in the CL has 5x5 (25) connections. So each neuron in a CL is investigating an area of 5x5 pixels. After a CL there is always a PL, which is used to reduce noise and other irregularities.
    You have to use multiple parallel CLs and PLs, since you always want to search for more than one feature, and there are subsequent CLs and PLs to extract higher-level features. After some cascaded CLs and PLs there is usually at least one FL. FLs are used to gather the extracted features and map them to certain identification outputs: if you want to recognize single digits (0-9), you have 10 outputs in your output layer.
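    To make the sizes concrete, a minimal toy sketch (one feature map, random weights, nothing optimized):

    [CODE]
    import numpy as np

    # A "valid" 5x5 convolution over a 28x28 MNIST-sized image yields
    # 24x24 (28 - 5 + 1); 2x2 max pooling halves that to 12x12; a fully-
    # connected layer then maps the pooled features to 10 digit outputs.
    rng = np.random.default_rng(0)
    image = rng.random((28, 28))
    kernel = rng.standard_normal((5, 5))     # one shared 5x5 weight set

    conv = np.empty((24, 24))
    for y in range(24):
        for x in range(24):
            conv[y, x] = np.sum(image[y:y+5, x:x+5] * kernel)

    pooled = conv.reshape(12, 2, 12, 2).max(axis=(1, 3))   # 2x2 max pool

    W_fc = rng.standard_normal((12 * 12, 10))              # 10 outputs (0-9)
    logits = pooled.reshape(-1) @ W_fc
    print(conv.shape, pooled.shape, logits.shape)  # (24, 24) (12, 12) (10,)
    [/CODE]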

    You need a lot of processing power to examine a video stream for certain objects. Still, the execution is not that dramatic, performance-wise. The training algorithms (gradient descent, PSO) consume lots of processing power and are very iterative. Usually data sets consisting of millions of samples are used to train an artificial neural network. For simpler tasks, like MNIST handwritten digit recognition, the public training data set consists of 60,000 input images and their corresponding labels.
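    For illustration, a toy training loop (softmax regression on random stand-in data, not real MNIST) showing where the iteration cost comes from:

    [CODE]
    import numpy as np

    # Real MNIST has 60,000 labeled 784-pixel images; the sample count
    # here is reduced to keep the sketch light.
    rng = np.random.default_rng(0)
    n, d, classes = 10_000, 784, 10
    X = rng.random((n, d), dtype=np.float32)
    y = rng.integers(0, classes, n)

    W = np.zeros((d, classes), dtype=np.float32)
    lr, epochs, batch = 0.1, 5, 128

    for _ in range(epochs):                 # every epoch revisits ALL data
        for i in range(0, n, batch):
            xb, yb = X[i:i+batch], y[i:i+batch]
            logits = xb @ W
            p = np.exp(logits - logits.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)       # softmax
            p[np.arange(len(yb)), yb] -= 1.0        # cross-entropy gradient
            W -= lr * (xb.T @ p) / len(yb)          # one weight update

    # Inference, by contrast, is a single forward pass per input:
    prediction = int(np.argmax(X[0] @ W))
    [/CODE]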

    1) Well, how do you handle numeric problems like overflows and underflows? I've implemented a small OpenCL lib for training and execution, and a concept for executing ANNs on an FPGA.
    I used my lib to train with fixed-point numbers in order to migrate my networks onto the FPGA. I was using 16-bit fixed-point numbers in the Q6,10 [-32,32] format and encountered many overflows and underflows, for both addition and multiplication: adding two 16-bit numbers can yield a 17-bit result, and multiplying them a 32-bit result. To handle those errors, I saturated the value range: on overflow I set the value to the maximum, and vice versa for underflow.
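    The saturation logic, roughly (a simplified Python sketch, not my actual OpenCL code; function names are mine):

    [CODE]
    # Saturating 16-bit fixed-point arithmetic with 10 fractional bits,
    # roughly the Q6,10 format described above.
    FRAC_BITS = 10
    Q_MAX = (1 << 15) - 1        # +31.999...
    Q_MIN = -(1 << 15)           # -32.0

    def saturate(x):
        """Clamp a wide intermediate result back into 16-bit range."""
        return max(Q_MIN, min(Q_MAX, x))

    def q_add(a, b):
        return saturate(a + b)                  # 16b + 16b may need 17 bits

    def q_mul(a, b):
        return saturate((a * b) >> FRAC_BITS)   # 16b * 16b may need 32 bits

    def to_q(x):   return saturate(round(x * (1 << FRAC_BITS)))
    def from_q(q): return q / (1 << FRAC_BITS)

    # 20.0 * 3.0 = 60.0 does not fit in [-32, 32), so it clamps to ~32:
    print(from_q(q_mul(to_q(20.0), to_q(3.0))))   # ~31.999
    [/CODE]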

    2) Which activation function did you use? Most likely ReLU. So you reduced the final 32-bit value to a signed 8-bit output? Were there any problems?
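    What I imagine that step looks like (pure guesswork on my part; the shift amount is an assumed tuning knob):

    [CODE]
    import numpy as np

    # ReLU on a 32-bit accumulator, then right-shift requantization down
    # to a signed 8-bit output.
    def relu_to_int8(acc32, shift=16):
        x = np.maximum(acc32, 0)              # ReLU: negatives -> 0
        x = x >> shift                        # rescale into int8 range
        return np.clip(x, -128, 127).astype(np.int8)

    acc = np.array([-50_000, 0, 123_456, 9_000_000], dtype=np.int64)
    print(relu_to_int8(acc))                  # [0 0 1 127]
    [/CODE]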
     
    CarstenS and Jawed like this.
  6. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    372
    Likes Received:
    309
  7. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    773
    Likes Received:
    200
    The graphic style looks like something from ASCII.jp. If so, then without additional information I unfortunately have doubts about its reliability.

    That being said… I'm not surprised that a big chip is coming first. NVIDIA's last fast DP chip was GK210 so I would guess that releasing a fast DP Pascal is a high priority for NVIDIA. I've been hoping for a while that a fast DP Pascal is introduced at GTC 2016, but given the recent rumors and signs, I suspect that if such an introduction happens at all then it may be a Tesla K20-like announcement with the actual release many months later.
     
  8. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    910
    Yeah this seems purely speculative, and April sounds a bit optimistic, especially if the big chip is to come first.
     
  9. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    372
    Likes Received:
    309
    We've known for a long time that big Pascal will come first. Nvidia has NOAA to equip this year:
    http://www.anandtech.com/show/9791/...ation-to-build-tesla-weather-research-cluster
    They can afford abysmal yields on these Tesla SKUs, like they did with Fermi for Oak Ridge's Titan supercomputer.
     
    ImSpartacus and pharma like this.
  10. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Thank you for that explanation - the whole one, though I'm focusing just on the quoted part here. It was stated in this thread that you would need deterministic neural networks with predictable results for use in automated driving. This would mean that most of the training should already be completed in factory-delivered cars (i.e. offline) - and the question was whether or not the live AutoM8s would still require massive computing power once their neural networks have been pre-trained.
     
  11. McHuj

    Veteran Regular Subscriber

    Joined:
    Jul 1, 2005
    Messages:
    1,432
    Likes Received:
    553
    Location:
    Texas
    GP100 with only 4K shaders? With a 2X density increase, we're only getting 33% more shaders (4096 vs. GM200's 3072)? Maybe we're getting a much higher clock as well.
     
  12. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    GM200 is a 600mm2 chip. Perhaps Nvidia doesn't want to be so aggressive for their first 16FF process. Also, I expect a GP100 shader is bigger than a GM200 shader, since it has DP and hopefully some other features.
     
  13. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,968
    Likes Received:
    4,563
    GDDR5X will only go into mass production in late Q2/early Q3.
    Those rumors can't be correct, since they say GP104 would release at the same time as (or even before?) GDDR5X enters mass production, while using that same memory.

    I wouldn't expect anything with GDDR5X to release before Q4 or early 2017.

    As stated before, it's unlikely that nVidia will start FinFET with a chip as large as GM200.
    For new process nodes, nVidia has traditionally used ~100-120mm^2 chips to test the waters: 100mm^2 for GT216 on 40nm, 118mm^2 for GK107 on 28nm.

    Same thing with AMD, which is why their first Polaris card on the market is a laptop-oriented chip.
     
    #713 ToTTenTranz, Jan 26, 2016
    Last edited: Jan 26, 2016
  14. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,183
    Likes Received:
    1,840
    Location:
    Finland
    16nm probably won't allow you to do 600mm^2 chips. Also, (big) Pascal will have proper FP64 performance, which Maxwell didn't, and that eats space too.
     
  15. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,183
    Likes Received:
    1,840
    Location:
    Finland
    The more recent rumors suggest that HBM2 Pascal, at least, is coming in H2, nowhere near April.
     
  16. fuboi

    Newcomer

    Joined:
    Aug 6, 2011
    Messages:
    90
    Likes Received:
    46
    Were those chips process-pathfinding chips? They (the GPU vendors) won't be the pioneers on the new node this time, right? Apple and others are handling the newborn this round, I think. Still, a 600mm2 chip seems risky.
     
  17. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Disable a few units, and it's no more risky than a 400mm2 chip.
     
  18. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,968
    Likes Received:
    4,563
    In the case of the 40nm GT216, AMD had released the 40nm RV740 (138mm^2) three months earlier. For 28nm, AMD had released their entire first-gen GCN line-up (Tahiti, Pitcairn, Cape Verde) before GK107 was available for laptops.
    And performance/cost and performance/power go down.
     
  19. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    I think the biggest constraint is 16FF wafer space.
     
  20. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    For a die that will be used in a $5000 product, the performance/cost reduction will be irrelevant. In fact, the reduced number of units will increase performance/cost due to massive improvement in yields.

    And the performance/power reduction should be small as well. See Titan X vs GTX 980 Ti.

    And those factors have nothing to do with risk, IMO.
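    To put rough numbers on the yield point (a textbook Poisson defect model with assumed figures, not actual TSMC data):

    [CODE]
    import math

    # Yield of a defect-free die = exp(-D * A) for defect density D
    # (defects per mm^2) and die area A (mm^2). Both values below are
    # assumptions for an immature process, purely illustrative.
    D, A = 0.002, 600.0

    perfect = math.exp(-D * A)                 # P(0 defects)
    one_defect = (D * A) * math.exp(-D * A)    # P(exactly 1 defect)

    print(f"fully working:          {perfect:.1%}")               # ~30%
    print(f"salvageable (1 defect): {one_defect:.1%}")            # ~36%
    print(f"sellable with harvest:  {perfect + one_defect:.1%}")  # ~66%
    [/CODE]

    If a single defect can be absorbed by disabling the affected unit, the fraction of sellable 600mm^2 dies roughly doubles.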
     