Building a ML Transformer in a Spreadsheet

  • Published: 6 Nov 2024

Comments • 21

  • @maravilhasdobrasil4498
    @maravilhasdobrasil4498 2 years ago +6

    Please, don't stop making videos. You are a legend!

  • @varunshenoy532
    @varunshenoy532 1 year ago

    After reading so many blogs and papers explaining transformers, this video finally helped me understand the intuition behind the Query, Key, and Value matrices.

  • @yraion0
    @yraion0 1 year ago +1

    The best video that explains in detail how the "magic" works. This is an amazing video!

  • @1973vider1973
    @1973vider1973 1 year ago +1

    Amazing explanation of how the attention magic works! Thanks a lot.

  • @nguyenanhnguyen7658
    @nguyenanhnguyen7658 1 year ago

    My Lord, so cool and genius!

  • @planet403
    @planet403 1 year ago

    Superb video

  •  2 years ago +1

    This is a fantastic video!
    (It made me think we need spreadsheets with autodiff. :D)

  • @queasybeetle
    @queasybeetle 2 years ago +2

    Excel is a deep learning cheat code

  • @3472215
    @3472215 2 years ago

    Thank you. Great work ♥️

  • @twitterpaited
    @twitterpaited 2 years ago

    Beautiful

  • @hamletrex
    @hamletrex 2 years ago

    Thank you for these gems!

  • @KarthikNaga329
    @KarthikNaga329 2 years ago

    Great stuff!
    Do you know why those particular values were used for the positional encoding, instead of 0, 1, 2, 3, 4, 5? Or can any values be used?
    I assume you wanted the last position to be 111, but does it have to be 111?

    • @ConceptsIlluminated
      @ConceptsIlluminated  2 years ago +3

      Yes, I picked those values in particular (0, 1, 3, 4, 6, 7) because it was easy to construct query and key matrices that would make the first query row line up with the last key row, the second query row line up with the second-to-last key row, and so on. This was easy to find because it just involved multiplying the query weights by -1.
      If I had used 0, 1, 2, 3, 4, 5, I would have needed to find some other weights that made the 0 query row line up closest with the 5 key row and the 2 query row line up with the 4 key row, and that would have been tougher.
      You could probably use a single positional encoding column that ranges from -1.0 (beginning of the string) to 1.0 (end of the string). Then you would have the same property as my 3 binary bits, where you can multiply the query weights by -1 to most strongly match a key of opposite position.
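
      A minimal numpy sketch of the 3-bit scheme above (numpy and the remap of the bits from {0, 1} to {-1, +1} are my additions, not from the video):

      ```python
      import numpy as np

      # The video's positional values 0, 1, 3, 4, 6, 7 written as 3 binary bits;
      # the set is closed under bit complement, which is why negating the query
      # weights pairs each position with its mirror.
      vals = [0, 1, 3, 4, 6, 7]
      enc = np.array([[(v >> b) & 1 for b in (2, 1, 0)] for v in vals]) * 2 - 1

      Q = -enc                      # query weights = key weights * -1
      scores = Q @ enc.T            # attention logits q_i . k_j
      print(scores.argmax(axis=1))  # -> [5 4 3 2 1 0]: each query matches the mirrored key
      ```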

  • @dezl
    @dezl 1 year ago

    Thanks for explaining this in an understandable way. I don't get how this is then used to process multiple words (am I missing something?)

    • @ConceptsIlluminated
      @ConceptsIlluminated  1 year ago +1

      The toy example in this video mapped the ASCII characters to numbers. In practice, words (or subwords) are turned into numbers in a process called embedding:
      - medium.com/deeper-learning/glossary-of-deep-learning-word-embedding-f90c3cec34ca
      - blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
      - dugas.ch/artificial_curiosity/GPT_architecture.html
      So there's a pre-processing step where the sentence or sentences (a list of words) are usually turned into their embeddings (a list of number vectors) as they are fed into the transformer. The transformer does a bunch of math and outputs embeddings (lists of numbers), which are converted back into a list of words.
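
      A tiny sketch of that embedding lookup step (the vocabulary, sizes, and random values here are hypothetical, not from the video or the links above):

      ```python
      import numpy as np

      vocab = {"the": 0, "cat": 1, "sat": 2}           # toy word -> id mapping
      rng = np.random.default_rng(0)
      emb = rng.standard_normal((len(vocab), 4))       # one 4-dim vector per word

      ids = [vocab[w] for w in "the cat sat".split()]  # words -> integer ids
      x = emb[ids]                                     # ids -> vectors, shape (3, 4)
      ```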

  • @bshako64
    @bshako64 2 years ago +2

    I'm very confused by the claim at 17:50 that you cannot find a matrix to swap the order of rows. How is the answer not just a standard permutation matrix? en.wikipedia.org/wiki/Permutation_matrix

    • @ConceptsIlluminated
      @ConceptsIlluminated  2 years ago

      As far as I understand them, permutation matrices multiplied on the right swap columns around, not rows. For example, if you put 1s on the minor diagonal of your permutation matrix, it mirrors the result across the vertical axis because it swaps the first and last columns, the second and second-to-last columns, and so on. I could not find a permutation matrix (or any matrix) that achieved the mirroring across the horizontal axis, i.e. one that swapped rows.
      I could still be wrong, though, about such a matrix not existing (when multiplied on the right). It was the boldest claim in the whole video, and I did not have a great citation or proof for it.
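
      A quick numpy check of the column-swapping behaviour described above (numpy is my choice of tool, not from the thread):

      ```python
      import numpy as np

      M = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]
      P = np.fliplr(np.eye(2))        # 1s on the minor diagonal: [[0, 1], [1, 0]]
      print(M @ P)                    # [[1, 0], [3, 2], [5, 4]]: columns mirrored, rows intact
      ```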

    • @bshako64
      @bshako64 2 years ago +3

      @@ConceptsIlluminated A function f is linear if f(x + y) = f(x) + f(y) and f(c x) = c f(x) for all x, y in the domain and any scalar c. It's clear that the function you describe in the video is linear by this definition.
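
      A quick numeric spot-check of that definition for the row-swapping function (numpy sketch, my addition):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      x, y = rng.random((2, 4, 3))  # two random 4x3 matrices
      f = lambda m: m[::-1]         # "swap the rows" (reverse their order)

      assert np.allclose(f(x + y), f(x) + f(y))   # additivity
      assert np.allclose(f(3.0 * x), 3.0 * f(x))  # homogeneity
      ```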

    • @ConceptsIlluminated
      @ConceptsIlluminated  2 years ago

      I think you are correct that swapping rows is linear, given that definition, and I was incorrect in saying it was nonlinear.
      I have still been unable to find a matrix R that swaps the rows of a matrix M when multiplied on the right, a la M x R. Assuming no such matrix exists, a more accurate phrasing would have been "The self-attention mechanism lets us do transformations that are impossible with just a single matrix multiplication step", yeah?

    • @ethanryu4170
      @ethanryu4170 2 years ago +1

      I actually had the same question. BTW, transpose is also a linear transformation, and it can be written as a single matrix multiplication if combined with reshapes (flattening the matrix into a vector first).

    • @bshako64
      @bshako64 2 years ago +1

      @@ConceptsIlluminated I think it can be done with a regular left matrix multiplication, i.e., z = Px.
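
      A quick numpy check that a left multiplication does swap the rows (numpy is my addition, not from the thread):

      ```python
      import numpy as np

      M = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]
      P = np.fliplr(np.eye(3))        # 3x3 anti-diagonal permutation matrix
      print(P @ M)                    # [[4, 5], [2, 3], [0, 1]]: rows reversed
      ```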