Andrew Lamb
  • Videos: 14
  • Views: 14,456
6 Piotr Findeisen The Types
@findepi from SDF wrapped up the talks with a detailed exploration of types and functions in the context of Apache Arrow vs DataFusion. He explained how types are handled in Arrow and DataFusion. @findepi's insights shed light on the potential improvements that could further enhance DataFusion’s handling of data types.
Huge thanks to Sam from Synnada for this description: github.com/apache/datafusion/discussions/11431#discussioncomment-10832070
Views: 121

Videos

5 Nick Karlov DataFusion as a heart of modern HTAP DB
167 views · 14 days ago
@karlovnv from Tarantool followed, sharing insights on how his team is pushing the limits of big data. @karlovnv showcased their work on real, massive datasets, such as handling a 3,000-column dataset, processing 70TB of data in RAM, and doing these things really fast (quicker than 10ms for a fraud detection use case!?). His talk demonstrated how DataFusion plays a key role in enabling thes...
4 Marko Grujić - Database replication with DataFusion
109 views · 14 days ago
@gruuya, senior staff engineer at EDB, the hero of the day who gathered all of us for this amazing event, gave a talk focused on database replication using the FDAP (Flight, DataFusion, Arrow, and Parquet) stack. @gruuya explained how this powerful combination of open-source tools enables efficient and scalable data replication, particularly in analytic environments. By leveraging Apache Arrow ...
3. Mehmet Ozan Kabak One Compute to Rule Them All: Unifiying Data & AI Workflows with DataFusion
169 views · 14 days ago
@ozankabak, co-founder and CEO of Synnada, spoke about the challenges of building data-intensive applications, referring to the Data Chasm - a complex landscape with many moving parts that makes it difficult to manage data efficiently. He explained how DataFusion helps break down these barriers, allowing for a more streamlined approach to data processing. @ozankabak highlighted Synnada's contri...
Artjoms Iškovs Reducing query latency via a caching object store layer
352 views · 14 days ago
Next, @mildbyte, Principal Engineer at EDB, delivered a highly technical talk on caching optimization using DataFusion in EDB. @mildbyte explained how EDB utilizes DataFusion to optimize query caching, which leads to significant performance improvements. These optimizations are crucial for managing large-scale data systems, showcasing how EDB leverages DataFusion’s capabilities effectively. Huge...
1 Andrew Lamb - DataFusion Introduction
443 views · 14 days ago
The talks kicked off with @alamb, who provided an in-depth introduction to the origins and goals of Apache DataFusion. He started by describing DataFusion as LLVM for data systems, enabling innovation in data-intensive systems. @alamb highlighted DataFusion’s architecture, built with industrial best practices, and its ability to compete with tightly integrated systems. Finally, @alamb touched on the...
Profiling Apache DataFusion using flamegraph
152 views · 1 month ago
Shows how to use flamegraphs to profile your application. This video shows how to profile a particular query using datafusion-cli and then interpret the resulting flamegraph.svg.
Faster DataFusion with StringView - Xiangpeng Hao (Aug 15, 2024)
446 views · 2 months ago
Xiangpeng Hao summarizes what Apache Arrow StringView is, why it can improve performance, and the practical challenges overcome in realizing that potential. Xiangpeng Hao presents his 2024 Summer Intern project at @influxdata8893: improving performance in Apache DataFusion, the query engine used in InfluxDB 3.0. Talk Abstract: We implemented a new string representation, StringView, in the Rust i...
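For readers who have not met StringView before, here is a minimal sketch of the idea. It assumes the 16-byte view layout described in the Apache Arrow columnar format: a 4-byte length, then either up to 12 bytes of inline data or a 4-byte prefix plus a buffer index and an offset. It is not the arrow-rs implementation and not the code from the talk. Every name in it (`View`, `push`, `get`, `views_equal`) is hypothetical, and it simplifies to a single data buffer.

```rust
// Illustrative sketch only: not the arrow-rs implementation.
// All names here (View, push, get, views_equal) are hypothetical.

/// Strings up to this many bytes are stored entirely inside the view.
const INLINE_MAX: usize = 12;

/// One 16-byte view. Bytes 0..4 hold the length (little-endian u32).
/// Short strings: bytes 4..16 hold the data, zero-padded.
/// Long strings: bytes 4..8 hold a 4-byte prefix, bytes 8..12 a buffer
/// index, and bytes 12..16 an offset into that buffer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct View([u8; 16]);

/// Read a little-endian u32 starting at byte `at` of the view.
fn read_u32(raw: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes([raw[at], raw[at + 1], raw[at + 2], raw[at + 3]])
}

impl View {
    fn len(&self) -> usize {
        read_u32(&self.0, 0) as usize
    }

    fn is_inline(&self) -> bool {
        self.len() <= INLINE_MAX
    }
}

/// Build a view for `s`, appending to `buffer` only when `s` is too long
/// to be inlined. Assumption: a single data buffer (index 0).
fn push(buffer: &mut Vec<u8>, s: &str) -> View {
    let bytes = s.as_bytes();
    let mut raw = [0u8; 16];
    raw[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    if bytes.len() <= INLINE_MAX {
        raw[4..4 + bytes.len()].copy_from_slice(bytes);
    } else {
        let offset = buffer.len() as u32;
        buffer.extend_from_slice(bytes);
        raw[4..8].copy_from_slice(&bytes[0..4]);
        // Bytes 8..12 (buffer index) stay 0 in this single-buffer sketch.
        raw[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    View(raw)
}

/// Resolve a view back to its bytes.
fn get<'a>(view: &'a View, buffer: &'a [u8]) -> &'a [u8] {
    let len = view.len();
    if view.is_inline() {
        &view.0[4..4 + len]
    } else {
        let offset = read_u32(&view.0, 12) as usize;
        &buffer[offset..offset + len]
    }
}

/// Equality that compares length and prefix (the first 8 bytes of each
/// view) before touching the data buffer.
fn views_equal(a: &View, b: &View, buffer: &[u8]) -> bool {
    if a.0[0..8] != b.0[0..8] {
        return false; // different length or prefix: no buffer access needed
    }
    get(a, buffer) == get(b, buffer)
}

fn main() {
    let mut buffer = Vec::new();
    let short = push(&mut buffer, "hello");
    let long_a = push(&mut buffer, "a considerably longer string");
    let long_b = push(&mut buffer, "a considerably longer strinG");

    println!("short is inline: {}", short.is_inline());
    println!("long is inline: {}", long_a.is_inline());
    println!("bytes in data buffer: {}", buffer.len());
    println!("long_a == long_b: {}", views_equal(&long_a, &long_b, &buffer));
}
```

The design point the sketch tries to show is that many comparisons and filters can be answered from the fixed-size views alone, so the variable-length data buffer is read only when the length and prefix already match.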
SIGMOD 2024 Practice: Apache Arrow DataFusion A Fast, Embeddable, Modular Analytic Query Engine
2.1K views · 4 months ago
Presentation: docs.google.com/presentation/d/1gqcxSNLGVwaqN0_yJtCbNm19-w5pqPuktII5_EDA6_k/edit#slide=id.p Paper: github.com/apache/datafusion/files/14789704/DataFusion_Query_Engine SIGMOD_2024-FINAL.pdf
2024-03-25 DataFusion Meetup Introduction
407 views · 7 months ago
2024-03-25 DataFusion Meetup Introduction
Profiling DataFusion with Instruments (part of XCode on Mac OSx)
394 views · 7 months ago
Shows how to use the Instruments tool that comes with Xcode on Mac OS X to see where a Rust program (in this case `cargo bench ...`) is spending its time, followed by a brief summary of how to interpret the visualization.
Apache Arrow DataFusion Architecture Part 3
1.7K views · 1 year ago
Part 3 of a 3-part series of "DataFusion Architecture" presentations. Part 3 covers ExecutionPlans, physical planning, and physical optimizers. arrow.apache.org/datafusion/ Slides: docs.google.com/presentation/d/1cA2WQJ2qg6tx6y4Wf8FH2WVSm9JQ5UgmBWATHdik0hg
Apache Arrow DataFusion Architecture Part 2
2.4K views · 1 year ago
Part 2 of a 3-part series of "DataFusion Architecture" presentations. Part 2 covers SQL query planning, LogicalPlans, and Expr expressions. arrow.apache.org/datafusion/ Slides: docs.google.com/presentation/d/1ypylM3-w60kVDW7Q6S99AHzvlBgciTdjsAfqNP85K30
Apache Arrow DataFusion Architecture Part 1
5K views · 1 year ago
Part 1 of a 3-part series of "DataFusion Architecture" presentations. Part 1 covers what a query engine is, why you might need one, and a brief introduction to DataFusion. arrow.apache.org/datafusion/ Slides: docs.google.com/presentation/d/1D3GDVas-8y0sA4c8EOgdCvEjVND4s2E7I6zfs67Y4j8/edit#slide=id.p

Comments

  • @chrisgrant6642 · 11 days ago

    Video pauses from 0:48 until around 1:35

  • @JwebGuru · 15 days ago

    lol I am really feeling this talk working in arrow right now... among other things, the explosion of type combinations can be hell on compile times if you don't use dynamic dispatch everywhere (but if you do, you lose a lot of the benefits of inlining in Rust).

  • @TheSakox · 18 days ago

    Thanks for uploading these talks Andrew!

  • @jonton6981 · 1 month ago

    Really appreciate the effort you put into this Andrew. Been following this project for 3+ years and am enjoying the collaboration between Datafusion, DuckDB, and Polars.

  • @maronmontano9326 · 1 month ago

    Really like the code snippet to elaborate on how the pull-based execution model works with Rust and tokio.

  • @shanes.6414 · 2 months ago

    Did the mentioned blog post get published yet?

    • @andrewlamb11 · 2 months ago

      Not yet -- we are still working on it. It was delayed for unrelated reasons. Stay tuned!

  • @andrewlamb11 · 2 months ago

    Slides drive.google.com/file/d/1Qqd8V6cfS9rSQ_-JrinasQJwI79qlUEV

  • @DharanAditya · 3 months ago

    This was an interesting talk. I watched this video after working on DF for quite some time, my mental model for logical plan & expr tree improved. Thank you @andrewlamb11

  • @jeroenvanrenterghem6163 · 4 months ago

    Had to work a bit on my rust knowledge, coming from spark. So now, a year in, let's do this...

  • @gangsu2666 · 4 months ago

    love to see more presentation about datafusion❤

  • @DharanAditya · 5 months ago

    Great presentation. Thanks Andrew. Good material to get started

  • @yanxinxiang9008 · 7 months ago

    Looking forward to attending next time

    • @andrewlamb11 · 7 months ago

      Indeed -- I hope we can hold many more such meetups around the world. Anyone interested in such things / helping to organize please come and join the conversation in github.com/apache/arrow-datafusion/discussions

  • @mengduan-f4w · 1 year ago

    Could you share the google doc link of the slides?

    • @andrewlamb11 · 1 year ago

      The slides are docs.google.com/presentation/d/1ypylM3-w60kVDW7Q6S99AHzvlBgciTdjsAfqNP85K30. For some reason YouTube won't let me post them directly in the links. andrew.nerdnetworks.org/ has more

  • @po-weihuang3398 · 1 year ago

    is there a way to see the slide? the link is strange.

    • @andrewlamb11 · 1 year ago

      docs.google.com/presentation/d/1D3GDVas-8y0sA4c8EOgdCvEjVND4s2E7I6zfs67Y4j8/edit#slide=id.p -- there is some sort of more intrusive verification I need to do to get the link to show above. You can also find all the links on andrew.nerdnetworks.org/

  • @kofshower · 1 year ago

    Where can I get slides?

    • @andrewlamb11 · 1 year ago

      You can find a link to the slides on docs.rs/datafusion/25.0.0/datafusion/#overview--presentations In this case docs.google.com/presentation/d/1cA2WQJ2qg6tx6y4Wf8FH2WVSm9JQ5UgmBWATHdik0hg

  • @nosh3019 · 1 year ago

    thanks a lot!! very clear

  • @nosh3019 · 1 year ago

    super clear, love your fusion videos, thank you! wondering if you can do some on Ballista as well!

    • @andrewlamb11 · 1 year ago

      Thanks -- I am not enough of an expert in Ballista to make the same kind of presentations -- maybe we can ask the community 🤔

  • @nosh3019 · 1 year ago

    great presentation Andrew! thanks a lot, just what I'm looking for. Would be even better with a better mic :)