AI Hardware, Explained.

  • Published: Jul 23, 2024
  • In 2011, Marc Andreessen said, “software is eating the world.” And in the last year, we’ve seen a new wave of generative AI, with some apps becoming among the fastest-adopted software products of all time.
    In this first part of our three-part series, we explore the terminology and technology that now form the backbone of the AI models taking the world by storm: what GPUs are, how they work, and how key players like Nvidia are competing for chip dominance.
    Look out for the rest of our series, where we dive even deeper, covering supply-and-demand mechanics, the role of open source, and, of course, how much all of this truly costs!
    Topics Covered:
    00:00 - AI terminology and technology
    03:54 - Chips, semiconductors, servers, and compute
    05:07 - CPUs and GPUs
    06:16 - Future architecture and performance
    07:12 - The hardware ecosystem
    09:20 - Software optimizations
    11:45 - What do we expect for the future?
    14:25 - Upcoming episodes on market dynamics and cost
    Resources:
    Find Guido on LinkedIn: / appenz
    Find Guido on Twitter: / appenz
    Find a16z on Twitter: / a16z
    Find a16z on LinkedIn: / a16z
    Subscribe on your favorite podcast app: a16z.simplecast.com/
    Follow our host: / stephsmithio
    Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
  • Science

Comments • 36

  • @a16z · 11 months ago · +4

    For a sneak peek at parts 2 and 3, they're already live on our podcast feed! Animated explainers are coming soon.
    a16z.simplecast.com/

    • @cmichael981 · 7 months ago

      Doesn't look like parts 2/3 are up on the podcast feed (anymore, at least). Any chance those video explainers are still coming out?

  • @a16z · 11 months ago · +5

    Timestamps:
    00:00 - AI terminology and technology
    03:54 - Chips, semiconductors, servers, and compute
    05:07 - CPUs vs GPUs
    06:16 - Future architecture and performance
    07:12 - The hardware ecosystem
    09:20 - Software optimizations
    11:45 - What do we expect for the future?
    14:25 - Sneak peek into the series

  • @NarsingRaoschoolknot · 4 months ago

    Well done, very clean and clear. Love your simplicity

  • @Inclinant · 4 months ago

    In the usual case of floating-point numbers being represented in 32 bits, is this why quantized LLMs can be so much smaller, at around 4 bits for ExLlama, making it much easier to fit models into the smaller amounts of VRAM that consumer GPUs have?
    Incredible video! The interviewer asks really thought-provoking and relevant questions, and the interviewee is extremely knowledgeable as well. It's broken down so well too!
    Also, extremely grateful to a16z for supporting TheBloke's work in LLM quantization! High-quality quantization and simplified instructions make LLMs so much easier to use for the average Joe.
    Thanks for creating this video.

    • @msclrhd · a month ago

      It's a trade-off between accuracy and space/performance (i.e. being able to fit the model on local hardware). A 1-bit number could represent (0, 1) or (0, 0.5), as it only has 2 values. With 2 bits you can store 4 values, so you could represent (0, 1, 2, 3), signed values (-2, -1, 0, 1), floats between 0 and 1 (0, 0.25, 0.50, 0.75), etc., depending on the representation. The more bits you have, the better the range (minimum, maximum) of values you can store and the precision (gap or distance) between each value.
      Ideally you want enough bits to keep the model's weights as close as possible to their trained values, so you don't significantly alter the behaviour of the network. Generally, 6-8-bit quantization offers accuracy (perplexity score) comparable to the original; below that, accuracy degrades sharply, and below 4 bits it is far worse.
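To make the trade-off in this thread concrete, here is a minimal Python sketch. It assumes plain min-max (uniform) quantization with a single scale for the whole tensor; real schemes such as the ones ExLlama uses add per-group scales and other refinements not shown here. The 7-billion-parameter model size and the synthetic Gaussian "weights" are hypothetical values chosen only for illustration. The first function shows why fewer bits per weight means less VRAM; the second shows that n bits give 2**n representable levels, so the rounding error grows as the bit width shrinks.

```python
import random


def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    # Storage for the weights alone; ignores activations, KV cache,
    # and quantization metadata such as per-group scales.
    return num_params * bits_per_weight / 8 / 1e9


def quantize_minmax(values: list[float], bits: int) -> tuple[list[int], list[float]]:
    # Uniform ("min-max") quantization: n bits give 2**n evenly spaced
    # levels between the smallest and largest value.
    n_levels = 2 ** bits
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_levels - 1)  # gap between adjacent representable values
    codes = [round((v - lo) / step) for v in values]  # integers in [0, n_levels - 1]
    dequantized = [lo + c * step for c in codes]  # values the model would actually use
    return codes, dequantized


def mean_abs_error(a: list[float], b: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical 7B-parameter model, for illustration only.
    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(7_000_000_000, bits):.1f} GB")

    # Synthetic weights from a narrow Gaussian; real layers are not this simple.
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]
    for bits in (8, 6, 4, 2):
        _, approx = quantize_minmax(weights, bits)
        print(f"{bits}-bit: mean |error| = {mean_abs_error(weights, approx):.6f}")
```

Going from 32-bit to 4-bit weights cuts the weight storage by a factor of 8 (28.0 GB to 3.5 GB for the hypothetical 7B model), while each bit removed roughly doubles the gap between representable values, which is the accuracy cost described above.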

  • @lnebres · 10 months ago · +1

    An excellent primer for beginners in the field.

  • @TINTUHD · 11 months ago · +2

    Great video. Tip of the computation innovation

  • @nickvanrensburg961 · 10 months ago

    Excellent video. Thank you and well done

  • @adithyan_ai · 9 months ago

    Incredibly useful!! Thanks.

  • @Doggieluv25 · 10 months ago

    Really helpful thank you!

  • @Matrix1Gamer · 5 months ago

    Guido Appenzeller is speaking my language. Chip lithography keeps shrinking while chips consume lots of power. Parallel computing is definitely going to be widely adopted going forward. RISC-V might replace the x86 architecture.

  • @jack_fischer · 11 months ago · +7

    The music is very distracting. Please tone it down in the future.

  • @LeveragedFinance · 11 months ago

    Great job

  • @lerwenliu9263 · 5 months ago

    Love this channel! Could we also look at AI's hunger for energy and its impact on climate change?

  • @thirukaruna7469 · 11 months ago

    Good one, Thx.!

  • @stachowi · 8 months ago

    This was very good

  • @AlexHirschMusic · 6 months ago · +1

    This is highly informative and easy to understand. As an idiot, I really appreciate that a lot.

  • @kymtoobe · a month ago

    This is a good video.

  • @dinoscheidt · 11 months ago

    1:24 Ehm… I would like to know what camera and lens/focal length you use to match the boom arm and background bokeh so perfectly 🤐

    • @StephSmithio · 11 months ago · +2

      I use the Sony a7iv camera with a Sony FE 35mm F1.4 lens! I should note that good lighting and painting the background dark does wonders though too

  • @vai47 · 10 months ago

    Older Vox style animations FTW!

  • @chenellson489 · 9 months ago

    See you at NY Tech Week

  • @shwiftymemelord261 · a day ago

    it would be so cool if this main speaker was a clone

  • @gracekim2863 · 10 months ago

    Back to School Giveaway

  • @IAMNOTRANA · 11 months ago · +3

    No wonder Nvidia doesn't care about consumer GPUs anymore.

    • @stachowi · 8 months ago

      Yup, cash grab

  • @antt8550 · 10 months ago

    The future

  • @LeveragedFinance · 11 months ago · +1

    Huang's law

  • @MegaVin99 · 7 months ago

    Thanks for video but 4 mins before getting to any details in a 15 min video?

  • @joshuatruong2001 · 11 months ago · +1

    The Render network token solves this

  • @SynthoidSounds · 8 months ago

    A slightly different way of looking at Moore's Law is not that it's "dead", but that it's becoming irrelevant. Quantum computing operates very differently from binary digital computation, so it's meaningless to compare these two separate domains in terms of "how many transistors" can fit into a 2D region of space, or in terms of FLOPS performance. Aside from the extreme parallelism available in QC, the next stage from "here" is optical computing, which uses photons instead of electrons as the computational mechanism. Scalable analog computing ICs (for AI engines) are also being developed (by IBM, for example) . . . Moore's Law isn't relevant to any of these.