Java, How Fast Can You Parse 1 Billion Rows of Weather Data? • Roy van Rijn • GOTO 2024

Поделиться
HTML-код
  • Опубликовано: 3 авг 2024
  • This presentation was recorded at GOTO Amsterdam 2024. #GOTOcon #GOTOams
    gotoams.nl
    Roy van Rijn - Experienced Developer & Architect, Robotics Enthusiast & Hobby Mathematician ‪@royvanrijn‬
    ORIGINAL TALK TITLE
    How Fast Can You Parse a File with 1 Billion Rows of Weather Data Using Java?
    RESOURCES
    x.com/royvanrijn
    / royvanrijn
    github.com/royvanrijn
    royvanrijn.com
    Links
    adventofcode.com
    x.com/gunnarmorling
    www.morling.dev
    ABSTRACT
    Last January a challenge was posted online by Gunnar Morling: How fast can you parse a file with 1 billion rows of weather data using Java?
    Little did I know this deceivingly simple question would lead me down a path that taught me all about: parallelism, memory mapped files, SWAR techniques (SIMD as a register), bit twiddling, branchless code, mechanical sympathy, Graal native compilation and finally... I even turned to the dark side: using sun.misc.Unsafe.
    Join me in this deep dive where I'll explain all the code changes and tricks that took me from the reference implementation which processes the billion records in 4+ minutes, to processing everything in under 2 seconds.
    Who knew Java could be this fast? [...]
    TIMECODES
    00:00 Intro
    01:49 The challenge
    06:07 Watch, learn, adopt, experiment
    08:00 Mechanical sympathy
    09:32 Temperature as integer
    10:37 Memory mapped files
    11:54 Getting unsafe
    13:31 SWAR
    17:22 Stringless
    18:18 Branchless programming
    20:35 Parse the temperature
    30:14 Keeping track
    36:22 Which JVM?
    37:21 Graal (native-image)
    39:38 Summary
    40:50 Results
    42:00 Outro
    Download slides and read the full abstract here:
    gotoams.nl/2024/sessions/3164
    RECOMMENDED BOOKS
    Monica Beckwith • JVM Performance Engineering • amzn.to/3zuJ7Ig
    Scott Oaks • Java Performance • amzn.to/4eNhlH4
    Trisha Gee, Kathy Sierra & Bert Bates • Head First Java • amzn.to/3k59BJ6
    Trisha Gee & Kevlin Henney • 97 Things Every Java Programmer Should Know • amzn.to/3kiTwJJ
    / gotocon
    / goto-
    / goto_con
    / gotoconferences
    #Java #JVM #GraalVM #Parsing #Parallelism #MemoryMappedFiles #SWAR #BitTwiddling #BranchlessCode #MechanicalSympathy #GraalNative #JavaProgramming #AdventOfCode #1BillionRowChallenge #GunnarMorling #RoyvanRijn
    Looking for a unique learning experience?
    Attend the next GOTO conference near you! Get your ticket at gotopia.tech
    Sign up for updates and specials at gotopia.tech/newsletter
    SUBSCRIBE TO OUR CHANNEL - new videos posted almost daily.
    ruclips.net/user/GotoConf...
  • НаукаНаука

Комментарии • 49

  • @TimBradleyFromOz
    @TimBradleyFromOz 21 день назад +51

    From 4 minutes 49 seconds 679 milliseconds >>> to >>> 1 second 535 miliseconds?
    Wow!
    Great talk, thanks!

  • @actorenEU
    @actorenEU 21 день назад +38

    In 1975, at the University of Delft, my professor and I collaboratively developed an assembler and interpreter for the computer practicum. We had to run this on an IBM mainframe, using a higher-level language. To make it functional, we had to employ extensive masking and shifting operations. I vividly remember the complex logical intricacies we had to navigate to get it all working correctly. From this hands-on experience, I truly admire your work and the effort it takes to even get it working.

  • @serrrsch
    @serrrsch 20 дней назад +16

    I've followed the competition on Twitter and GitHub but this talk is just a Gem in how it is being told. Big up for the presentation/slides skills!

  • @aronhegedus
    @aronhegedus 2 дня назад +1

    The --worker trick is interesting! Learned some cool things in this one

  • @juveraey
    @juveraey 22 дня назад +17

    even my low brain can understand what you said. great talk thank you Roy van Rijn

  • @jpphoton
    @jpphoton 15 часов назад +1

    very well done. one of the best imho. that's how we do it.

  • @LtdJorge
    @LtdJorge 19 дней назад +15

    If you optimize for specific arches, you can do the SIMD lookup with less instructions and much wider. For example, I’m using the memchr Rust crate by the genius BurntSushi, and specifically the AVX2 implementation. It does loops of 4 sequential comparisons with 256bit registers. The SIMD part is just 2 instructions.

  • @TechTalksWeekly
    @TechTalksWeekly 17 дней назад +4

    This is a brilliant talk and it's been featured in the last issue of Tech Talks Weekly newsletter 🎉
    Congrats Roy!

  • @djchrisi
    @djchrisi 15 дней назад +4

    The view count gives testamony what a fun challenge that was.

  • @Syntax753
    @Syntax753 19 дней назад +8

    Kudos for mentioning Advent of Code! And yeah, most people can parse 1 Billion rows of weather data between every blink (and more if using strong Java)

  • @Nashadelicable
    @Nashadelicable 21 день назад +3

    This was so much fun watching

  • @robchr
    @robchr 22 дня назад +7

    Wizard level optimizations

  • @ericm97
    @ericm97 22 дня назад +2

    What a journey. Lovely talk 🙂

  • @RumberoEuropeo
    @RumberoEuropeo 16 дней назад +2

    These tricks are great for graph analytics too!

  • @eduardopalhares3526
    @eduardopalhares3526 17 дней назад +2

    Really Great talk, thanks for sharing this knowledge

  • @wsollers1
    @wsollers1 22 дня назад +2

    This was an amazing talk

  • @chauchau0825
    @chauchau0825 22 дня назад +2

    This is gold

  • @GeorgeTsiros
    @GeorgeTsiros 2 дня назад +1

    29:16 when you've spent a lot of time coding on a device with no pipelining, no SIMD, no cache, no OOE, no hardware multiplication, no hardware float operations, not even indirect memory access, 4 MHz and 10 clock cycles to read _half_ a byte, you are basically forced to learn a lot of bit-twiddling

  • @phpngpl
    @phpngpl 9 дней назад +1

    💪💪💪GraalVM

  • @saschazapf5232
    @saschazapf5232 8 дней назад +2

    Hi Roy, remember good old Redcode Days,? Hope u doing well. Nice topic btw

    • @royvanrijn
      @royvanrijn 7 дней назад +2

      @@saschazapf5232 of course; we should revisit that 🙌🏼

    • @saschazapf5232
      @saschazapf5232 6 дней назад +2

      @@royvanrijn Revisit would not the right word for this. I really think quite often about cw. Two years ago i done some research for a better amount of rounds for the hill because of the score deviation on differents seed at 200 Rounds. After that i had some idea about another approach for optimzing paper. But i had not enough time , that time. Now i'm tring to polish up my Java Skills and consumed some video's. Imagine my face when i saw ur face and name on my yt landing page. I saw a picture of u once, literally 15 years ago but i instantly know this was u.

  • @krystianlaskowski
    @krystianlaskowski 10 дней назад +3

    Challenge and Setup:
    The speaker participated in the "1 billion row challenge" using vanilla Java with no libraries.
    The task was to parse a file with 1 billion rows, each containing a city name and temperature, and compute the minimum, maximum, and average temperature for each city.
    Initial Optimization:
    Initial baseline implementation took about 5 minutes to parse the file.
    By optimizing to use multiple threads and a concurrent map, the runtime was reduced to 2 minutes.
    Advanced Techniques:
    Further improvements included using memory-mapped files and unsafe operations to bypass boundary checks for faster access.
    Leveraged Single Instruction, Multiple Data (SIMD) techniques to process multiple bytes simultaneously, significantly speeding up parsing.
    Branchless Programming:
    Implemented branchless programming to avoid CPU pipeline stalls, using bitwise operations for conditions and calculations.
    Avoided string allocations by working directly with raw memory and integer values.
    Final Optimizations:
    Used forward probing for efficient hash map operations, eliminating collisions.
    Adopted a worker thread setup to handle file unmapping separately, further reducing processing time.
    Achieved a final runtime of 23 seconds using these combined optimizations and Java's advanced features.

  • @aronhegedus
    @aronhegedus 2 дня назад

    I'm still quote confused as to how you can read in 15Gb of text in

  • @geoffxander7970
    @geoffxander7970 16 дней назад +4

    When a software engineer stumbles upon the dark arts of real computer science...
    The only thing that comes to mind not talked about was AVX2 or SSE4.x (I don't know if they're supported natively in Java).

    • @NostraDavid2
      @NostraDavid2 9 дней назад

      I didn't see CompSci though? No math theorems, etc. Just pure engineering.

  • @akshaysom
    @akshaysom 6 дней назад +1

    For my little brain this is sorcery 😂

  • @Abhigyan103
    @Abhigyan103 15 дней назад +4

    How can I learn Java, which is this advanced, every course just teaches object oriented programming

    • @khuntasaurus88
      @khuntasaurus88 9 дней назад +1

      @@Abhigyan103 by writing code. Just. Write. Code. Pick an idea and implement it. Then benchmark it and try to optimize it

    • @GeorgeTsiros
      @GeorgeTsiros 2 дня назад +2

      from what I suspect, it is not _Java_ that you want to learn, but low level programming. Or, just, you know, _programming_

  • @meryplays8952
    @meryplays8952 16 дней назад

    ok, how about Golang?

  • @AlLiberali
    @AlLiberali 6 дней назад

    The baseline impl takes just 5 minutes to process _1 billion_ lines. I'd say the hardware is optimised enough; No need for such hacks in production code

  • @androth1502
    @androth1502 20 дней назад +8

    native compilation... is it really java anymore?

    • @sjzara
      @sjzara 16 дней назад +1

      @@androth1502 of course.

    • @zombi1034
      @zombi1034 9 дней назад +1

      Java does native compilation anyway using it's JIT (just in time compiler). The difference with GraalVM is that you do the compilation ahead of time. That's why for long running Java programs there is not really a big performance difference between native image and your regular Java program. Because all hot code (code that gets executed regularly) will get compiled to native code anyway. The Java VM just takes a bit of time to warmup.

    • @sjzara
      @sjzara 8 дней назад

      @@zombi1034 That's not really how it works. Runtime compilation can produce far better performance for long-running programs than ahead of time. Hot code is not just compiled to native code - the compilation is speculative and can be re-done if circumstances chance. There can be things like inlining of method calls that are possible speculatively but not ahead of time.

    • @androth1502
      @androth1502 8 дней назад +1

      @@zombi1034 ah. ok. i thought one did native compilation and the other to JVM. if they are both JVM then my point was irrelevant.

  • @1Eagler
    @1Eagler 14 часов назад

    If y need Java to go really fast, change it to C

  • @nightking4615
    @nightking4615 22 дня назад +2

    Using memory maps is cheating!

    • @LtdJorge
      @LtdJorge 19 дней назад +1

      The contest results were taken by putting the file in a tmpfs, so mmapping is basically free.

    • @TehGettinq
      @TehGettinq 17 дней назад

      @@LtdJorge how is it free? you still need to pay for a cpy and for page faults, no?

    • @Knirin
      @Knirin 7 дней назад

      @@TehGettinqHeavy internal optimizations in the Linux tmpfs kernel driver. Simplifying things the tmpfs driver stores files directly in the read cache memory area and uses the virtual memory subsystem for access protection. Yes page faults happen but they aren’t much slower than jumping to another function outside of the currently loaded page compared to a full seek to disk.

    • @TehGettinq
      @TehGettinq 7 дней назад

      @@Knirin would you have a source? Want to read about it

    • @Knirin
      @Knirin 7 дней назад

      @@TehGettinq A single easy to read one, no. I suggest searching for the differences between “initrd” and “initramfs”. An LWN article about those is where I became aware of the original differences. Tmpfs has had more optimizations since then.

  • @Tony-dp1rl
    @Tony-dp1rl 20 дней назад +2

    Any language that cannot do this entirely I/O limited by the reading of the billion rows from disk in parallel, should be shamed. This would run at I/O speed in JavaScript, Java, C#, Python, Turbo Pascal,, even LUA could do this. :)

    • @LtdJorge
      @LtdJorge 19 дней назад +6

      The results are gathered with the file on a tmpfs, so no, most languages wouldn’t run this at I/O (memory) speed.

  • @CenturionDobrius
    @CenturionDobrius 21 день назад +2

    Amazing presentation, thanks a lot for sharing ❤