Assembly Language vs PyTorch Lightning simple neural network training time comparison

  • Published: 10 Nov 2024

Comments • 22

  • @JoeBurnett
    @JoeBurnett 1 month ago +1

    Another excellent video! I wish you success with your channel!

  • @valrindelsubaan9230
    @valrindelsubaan9230 1 month ago +1

    You have very interesting videos, subscribed

  • @maximumentropyofficial9
    @maximumentropyofficial9 29 days ago

    Do you think it is possible to program deep learning models for a GPU using assembly language for parallel calculations? Also, why aren't companies like Google or Meta programming all deep learning algorithms in assembly to gain more computational power? Thank you, inspiring video

    • @ComputingMongoose
      @ComputingMongoose  29 days ago

      A while back I searched for NVIDIA assembly and I couldn't find any documentation (maybe I wasn't looking hard enough, or maybe things have changed by now). Anyway, assembly is tied to a specific device and may change between families of devices. So, at least for NVIDIA, you are stuck with the nvcc compiler (see the sketch after this thread). Another option would be to use shaders (I remember seeing some projects about that), but that's still a high-level language. Of course, I can't speak for companies and their reasons for doing things. I think, however, it doesn't matter that much for a big company, as they can use thousands of CPUs/GPUs/TPUs.

    • @maximumentropyofficial9
      @maximumentropyofficial9 28 days ago

      @@ComputingMongoose Indeed, I had forgotten about that detail: the hardware/assembly dependency, which is a major obstacle to device compatibility and could render all the code obsolete as soon as a new hardware generation comes out. Thank you for your response
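
A minimal sketch of what this means in practice, assuming a CUDA-capable GPU and the CUDA toolkit: the closest the standard toolchain gets to "GPU assembly" is inline PTX inside a CUDA kernel, which nvcc then lowers to the undocumented per-device machine code (SASS). Here the kernel is compiled through torch.utils.cpp_extension.load_inline and called from PyTorch; the ptx_add name, the extension name and the launch parameters are made up for illustration, not anything from the video.

```python
import torch
from torch.utils.cpp_extension import load_inline

# C++ declaration so load_inline can generate a Python binding for ptx_add.
cpp_src = "torch::Tensor ptx_add(torch::Tensor a, torch::Tensor b);"

# CUDA kernel containing one line of inline PTX (the add.f32 instruction).
cuda_src = r"""
__global__ void ptx_add_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float r;
        // Inline PTX: r = a[i] + b[i]
        asm volatile("add.f32 %0, %1, %2;" : "=f"(r) : "f"(a[i]), "f"(b[i]));
        out[i] = r;
    }
}

torch::Tensor ptx_add(torch::Tensor a, torch::Tensor b) {
    // Assumes contiguous float32 CUDA tensors of the same shape.
    auto out = torch::empty_like(a);
    int n = a.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    ptx_add_kernel<<<blocks, threads>>>(
        a.data_ptr<float>(), b.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}
"""

# nvcc is invoked behind the scenes to build and load the extension.
ext = load_inline(name="ptx_add_ext", cpp_sources=cpp_src,
                  cuda_sources=cuda_src, functions=["ptx_add"])

a = torch.rand(1024, device="cuda")
b = torch.rand(1024, device="cuda")
print(torch.allclose(ext.ptx_add(a, b), a + b))  # expected: True
```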

  • @x-12plus60
    @x-12plus60 1 month ago +3

    My bro, great information! But that blue theme hurts a lot of people :'l

    • @ComputingMongoose
      @ComputingMongoose  1 month ago +1

      Sorry about that! Thanks for letting me know. It's actually the default setting in Midnight Commander and I'm quite used to it. I will try to change it to some other colors (maybe a black background?). But a couple more videos with this blue background will still be released, since I have already filmed them and they only need editing, thumbnails and YouTube stuff.

    • @RichardLofty
      @RichardLofty 1 month ago +1

      Get used to it.
      This is how computers were for most of history.

    • @ComputingMongoose
      @ComputingMongoose  1 month ago

      @@RichardLofty Indeed! And I still enjoy these basic interfaces, yet I do understand that for younger people it may seem weird.

    • @amadzarak7746
      @amadzarak7746 1 month ago

      Haha, that's the default for Midnight Commander

  • @AK-vx4dy
    @AK-vx4dy 1 month ago +2

    As PyTorch themselves state, the priority is usability first, speed second, and it is not only a wrapper around a C++ library, so I suppose part of the job is done in Python. I also wonder if it is possible to set some "silent" flag, because if this progress reporting is done in Python it also eats a lot of the time (on such a small model), even when writing to a file (see the sketch after this thread).
    Can you compare with Candle?

    • @ComputingMongoose
      @ComputingMongoose  1 month ago +1

      I think indeed this is due mostly to the Python part. I was, however, expecting a bit more to be done in C++. I am not familiar with Candle, but I looked at it and it does seem like a good thing to try. For the moment I'm focusing on adding more complexity to my network. I'll release some more videos with new additions. Afterwards, more experiments will follow.

    • @AK-vx4dy
      @AK-vx4dy 1 month ago

      @@ComputingMongoose The other obvious candidate is TensorFlow, but I have no idea how much work is needed to use it... 😅
      I'm waiting for more assembly adventures.

    • @ComputingMongoose
      @ComputingMongoose  1 month ago

      @@AK-vx4dy There are indeed many frameworks out there... Anyway, more assembly stuff will be coming soon.
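
On the "silent" flag question above: a rough sketch, assuming a recent PyTorch Lightning release, of turning off the Python-side extras (progress bar, logger, checkpointing, model summary) on the Trainer. The TinyNet module, the random data and the hyperparameters are made up for illustration and are not the model from the video.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class TinyNet(pl.LightningModule):
    """A deliberately small model, just to exercise the training loop."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 4), nn.Tanh(), nn.Linear(4, 1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

# Made-up data just to have something to fit.
X, y = torch.rand(64, 2), torch.rand(64, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=16)

# "Quiet" Trainer: disable the per-step progress bar, the logger, checkpointing
# and the model summary, i.e. the Python-side extras most likely to dominate
# the wall-clock time on a tiny model.
trainer = pl.Trainer(
    max_epochs=100,
    enable_progress_bar=False,
    logger=False,
    enable_checkpointing=False,
    enable_model_summary=False,
)
trainer.fit(TinyNet(), loader)
```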

  • @RichardLofty
    @RichardLofty 1 month ago +1

    You MUST train a network on XOR. It's nonlinear, and is a better benchmark.

    • @ComputingMongoose
      @ComputingMongoose  1 month ago +1

      If you're thinking about the final result (in terms of end loss, accuracy or another metric), then I agree. But from a time perspective it doesn't really matter, since the number of epochs is fixed and thus the same number of operations is performed. I do intend to extend the network though, adding multiple layers and solving some more complex problems.
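
For reference, a minimal sketch of an XOR benchmark in plain PyTorch with a fixed epoch count; the layer sizes, learning rate and epoch count are arbitrary choices, not taken from the video. Because the epoch count is fixed, the measured time mostly reflects per-epoch overhead rather than how quickly the loss converges, which is the point made in the reply above.

```python
import time
import torch
import torch.nn as nn

# XOR truth table: inputs and targets.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# One hidden layer with a nonlinearity is required, since XOR is not linearly separable.
model = nn.Sequential(nn.Linear(2, 4), nn.Tanh(), nn.Linear(4, 1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.BCELoss()

EPOCHS = 10_000  # fixed, so the number of operations is the same on every run
start = time.perf_counter()
for _ in range(EPOCHS):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
elapsed = time.perf_counter() - start

print(f"final loss: {loss.item():.4f}, time for {EPOCHS} epochs: {elapsed:.2f}s")
```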

  • @fontenbleau
    @fontenbleau 1 month ago

    So, how do you scale it to run GGUF models? I'm able to run Llama 3 405B on a used server motherboard with 12 RAM slots (just $150), but the speed is horrendous with current tools (less than 1 token/sec on a 22-core/44-thread Xeon CPU at the best q8 quality). Although Llama models are boring because of censorship, it's the biggest open model today.

    • @ComputingMongoose
      @ComputingMongoose  1 month ago

      I am working towards a network with multiple layers and with some data loading. But I don't think I will implement loading of PyTorch or similar models, since that would be quite tedious all in assembly.

    • @fontenbleau
      @fontenbleau 1 month ago

      @@ComputingMongoose So, can it be used for training models? All current open models are based on llama-cpp, which, as I remember, Stanford made from Facebook's "leaked" (not really) model; pretty much the whole architecture is founded on that.

    • @ComputingMongoose
      @ComputingMongoose  1 month ago

      @@fontenbleau I am not working towards Llama architectures, but if I continue working on it, it will likely become able to train some more advanced models. But again, I am not working specifically towards Llama or any other particular architecture. I am also not targeting NLP applications specifically (apart from NLP, NNs can be used for images, voice, sensor data, etc.). However, it would make for an interesting application to be able to perform some text analysis with the assembly-language network. I will have to think about what the easiest application to implement would be.