Pushing Java to the Limits: Processing a Billion Rows in under 2 Seconds by ROY VAN RIJN

  • Published: 30 Sep 2024
  • For updates and more, join our community 👉 / devoxx-united-kingdom
    Last January a challenge was posted online by Gunnar Morling:
    How fast can you parse a file with one billion rows of weather data using Java?
    Little did I know this deceptively simple question would lead me down a path that taught me all about parallelism, memory-mapped files, SWAR techniques (SIMD Within A Register), bit twiddling, branchless code, mechanical sympathy, Graal native compilation and finally... I even turned to the dark side: using sun.misc.Unsafe.
    Join me in this deep dive where I'll explain all the code changes and tricks that took me from the reference implementation which processes the billion records in less than 4 minutes, to processing everything in under two seconds.
    Who knew Java could be this fast?
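To make the SWAR idea from the description concrete, here is a minimal sketch (not code from the talk) of a common form of the trick: finding the first `;` in an 8-byte word with the classic "has zero byte" bit manipulation, instead of comparing bytes one at a time. It assumes the bytes were read little-endian into a `long`, so the first byte of the input lands in the lowest 8 bits; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class SwarSemicolon {

    // Eight copies of ';' (0x3B), one per byte lane.
    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL;
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;

    /**
     * Returns the index (0-7) of the first ';' in a little-endian word,
     * or 8 if the word contains no ';'.
     */
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;            // bytes equal to ';' become 0x00
        long zeros = (x - ONES) & ~x & HIGHS;  // high bit set for zero bytes; the lowest one is exact
        return Long.numberOfTrailingZeros(zeros) >>> 3; // 64 >>> 3 == 8 when nothing matched
    }

    /** Reads 8 bytes starting at offset as a little-endian long (needs 8 readable bytes). */
    static long readWord(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        byte[] line = "Hamburg;12.0\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstSemicolon(readWord(line, 0))); // prints 7
    }
}
```

In a full parser the returned index marks where the station name ends; a result of 8 means "no `;` in this word, read the next eight bytes and try again".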

Comments • 22

  • @TechTalksWeekly · 4 months ago · +4

    Roy's talk has been featured in the last issue of Tech Talks Weekly newsletter 🎉 Congrats!

  • @kaqqao · 3 months ago · +10

    It's disheartening to think that in this day and age performant code still has to look this awful.
    But the talk itself is great 😃

  • @KangoV · 4 months ago · +9

    We should soon have the "inline" keyword. A huge effort is underway in the JVM (Project Valhalla) so that, for example, an array of inline objects will use contiguous memory. When iterating through such an array you get huge speedups, because you avoid all those cache misses (speedups of 30x have been seen).

  • @Anbu_Sampath · 4 months ago · +12

    Crazy engineering effort went into this challenge.

  • @northdankota · 4 months ago · +2

    I think branchless programming is also fast because the CPU loads memory in bulk (from RAM into the L1, L2 and L3 caches): it loads the whole 64-byte block around the requested data, so not only the requested data is loaded but also the data after it, and the next section of data is already there. This adds more optimization.

  • @sadiulhakim7814 · 7 days ago

    Unsafe is deprecated now

  • @bilgehan · 3 months ago · +2

    It was fun and interesting until I saw Unsafe. After that it felt meaningless, an empty satisfaction IMHO.

    • @ppsps5728 · 3 months ago · +3

      Same feeling, like using inline asm in C/C++ 😂

    • @royvanrijn · 3 months ago

      Why though?
      You don’t *need* unsafe to do all of this, for me it was actually a fun challenge to learn and use it.

  • @HedleyLuna · 1 month ago

    This reminds me why I fell in love with CS and CPU architecture. Though it would probably have been easier to write this in ANSI C.

  • @Dragiux · 4 months ago · +1

    As far as I remember, constant math can be performed at compile time (for example, static final int foo = 1+1 would be written to the class file as static final int foo = 2). Could you trick the compiler into doing all of this heavy lifting and only produce a binary that contains the final results?

    • @ericnewton5720 · 3 months ago · +1

      Volkswagen would love to hire you for their diesel division.

  • @JavaCodeShorts · 3 months ago · +1

    Great Talk!

  • @sufilak · 2 months ago

    It was fun, but it felt like they were inventing an assembler

  • @lugburzhr8081 · 3 months ago · +1

    So to process 1 billion rows in Java in 2 seconds you need to use C/C++. Great job anyway!

    • @gergonagy2733 · 3 months ago

      That's what I was thinking too. Every abstraction Java provides is intentionally thrown away to get as close to the low-level details of the hardware as possible.🤣

    • @lugburzhr8081 · 3 months ago · +1

      @gergonagy2733 I agree. Anyone designing a language like Java can't avoid providing some way to access low-level memory. But could someone show me a Java dev who committed code with direct memory processing to a real-life commercial product? The 2-second solutions are still Core Java, but they are not "vanilla" Java.

  • @notarealperson9709 · 4 months ago

    imagine using java in 2024...

    • @MrKar18 · 4 months ago

      Why not? Real persons do use it 😉😂 Pun intended.

    • @MakeItStik · 4 months ago

      Curious to know what you use?

    • @kaqqao · 3 months ago · +4

      Imagine saying something that brainwormed in any year

    • @a.yashwanth · 2 months ago · +2

      You don't need to imagine. It's the 2nd most popular programming language in the world after Python.
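On the branchless point raised in the comments: branchless code mainly helps by giving the CPU no data-dependent jump to mispredict, which is a separate effect from how whole cache lines are loaded. A minimal sketch (hypothetical names, not code from the talk) of applying the sign of a parsed temperature with a bit mask instead of an `if`:

```java
public class BranchlessSign {

    // Branchy version: the CPU has to predict whether the jump is taken.
    static int applySignBranchy(int value, byte signByte) {
        if (signByte == '-') {
            return -value;
        }
        return value;
    }

    // Branchless version: turn the comparison into a mask and use plain arithmetic.
    static int applySignBranchless(int value, byte signByte) {
        // ((signByte & 0xFF) ^ '-') is 0 only for '-'; subtracting 1 makes that case -1,
        // and the arithmetic shift spreads the sign bit: mask = -1 for '-', 0 otherwise.
        int mask = (((signByte & 0xFF) ^ '-') - 1) >> 31;
        // When mask == -1: (v ^ -1) - (-1) == ~v + 1 == -v; when mask == 0 the value is unchanged.
        return (value ^ mask) - mask;
    }

    public static void main(String[] args) {
        System.out.println(applySignBranchless(123, (byte) '-')); // prints -123
        System.out.println(applySignBranchless(123, (byte) '1')); // prints 123
    }
}
```

HotSpot's JIT may already compile the branchy form into a conditional move, so hand-written masks are worth keeping only when a benchmark shows they help.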
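On Unsafe being deprecated: the memory-access methods of `sun.misc.Unsafe` were deprecated for removal in JDK 23 (JEP 471), but the memory mapping and parallelism from the talk can also be done with standard APIs. A minimal sketch, not the talk's code and with hypothetical class and method names, that splits a file into newline-aligned chunks, maps each chunk with `FileChannel.map` and counts rows in parallel. It assumes every chunk fits in one `MappedByteBuffer` (at most `Integer.MAX_VALUE` bytes), so a large file needs enough chunks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.stream.IntStream;

public class MappedLineCount {

    /** Counts '\n' bytes using parallel, newline-aligned, memory-mapped chunks (chunks >= 1). */
    static long countLines(Path file, int chunks) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            // Chunk i covers [boundaries[i], boundaries[i + 1]); inner boundaries sit just after a '\n'.
            long[] boundaries = new long[chunks + 1];
            for (int i = 1; i < chunks; i++) {
                boundaries[i] = nextLineStart(channel, size * i / chunks, size);
            }
            boundaries[chunks] = size;
            return IntStream.range(0, chunks)
                    .parallel()
                    .mapToLong(i -> countNewlines(channel, boundaries[i], boundaries[i + 1]))
                    .sum();
        }
    }

    /** Returns the position just after the first '\n' at or after pos, or size if there is none. */
    private static long nextLineStart(FileChannel channel, long pos, long size) throws IOException {
        ByteBuffer one = ByteBuffer.allocate(1);
        while (pos < size) {
            one.clear();
            if (channel.read(one, pos) <= 0) {
                return size;
            }
            pos++;
            if (one.get(0) == '\n') {
                return pos;
            }
        }
        return size;
    }

    /** Maps [start, end) read-only and counts the '\n' bytes in it. */
    private static long countNewlines(FileChannel channel, long start, long end) {
        long length = end - start;
        if (length == 0) {
            return 0;
        }
        if (length > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunk too large for one mapping; use more chunks");
        }
        try {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
            long count = 0;
            for (int i = 0; i < buf.limit(); i++) {
                if (buf.get(i) == '\n') {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file;
        if (args.length > 0) {
            file = Paths.get(args[0]);
        } else {
            // Made-up sample rows in "station;temperature" form, used when no path is given.
            file = Files.createTempFile("rows", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, "Hamburg;12.0\nOslo;-3.4\nLima;18.2\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(countLines(file, Runtime.getRuntime().availableProcessors()));
    }
}
```

On JDK 22 and later, the Foreign Function & Memory API (`MemorySegment`) is the supported replacement for Unsafe-style access and is not limited to 2 GB per mapping. For SWAR-style processing, `buf.order(ByteOrder.LITTLE_ENDIAN).getLong(i)` reads eight bytes at once from the same buffer.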
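On the compile-time question in the comments: javac only folds *constant expressions* as defined by the Java Language Specification (built from literals, operators and other constant variables), so `1 + 1` does become `2`, but anything involving a method call or file I/O is evaluated at run time. The billion rows are read from a file at run time, so javac cannot fold them. A small sketch (hypothetical class name) showing the difference:

```java
public class ConstantFolding {

    // A constant expression: javac evaluates 1 + 1 and stores the value 2 in the class file.
    static final int FOO = 1 + 1;

    // Not a constant expression (it contains a method call), so it is computed
    // at run time when the class is initialized, even though the field is final.
    static final int BAR = Math.addExact(1, 1);

    static String describe(int x) {
        switch (x) {
            case FOO:      // legal: case labels must be compile-time constants
                return "two";
            // case BAR:   // would not compile: BAR is not a constant expression
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(2)); // prints two
        System.out.println(FOO == BAR);  // prints true
    }
}
```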