- 89 videos
- 71,136 views
Performance Summit
United States
Joined Dec 18, 2019
Performance Summit events serve as a place for software performance enthusiasts and practitioners to meet and discuss challenges, research and possible solutions around delivering delightful and efficient software solutions.
You can find additional event resources in github.com/ttsugriy/performance-summit
Panel discussion with Ivica Bogosavljevic, Nadav Rotem, Sergey Slotin and Taras Tsugrii
In this session hosted by Taras Tsugrii, speakers from both days of Performance Summit 2022 come together to explore what's next in performance improvement, and to answer any lingering questions raised by the talks.
211 views
Videos
Faster ETL Pipeline with Bodo by Ahmad Khadem
187 views · 2 years ago
Faster ETL Pipeline with Bodo: Compiler-Based Parallel Computing for Big Data Analytics. The growing scale and complexity of data and ML workloads demand enormous compute power. Bodo is a new compute platform that aims to improve the efficiency and performance of ETL pipelines through automated parallelization of native Python and SQL workloads. Bodo's resource efficiency and extreme performance ...
Running a Datacenter Performance Optimization Campaign by Nadav Rotem
1K views · 2 years ago
In recent years, the number of software services that are running on datacenter and cloud computing platforms has increased rapidly. Datacenters are expensive to operate and represent a significant portion of the world's energy consumption. Efforts to improve the efficiency of software that runs on datacenters can greatly reduce the energy use and costs of operations and improve the responsiven...
The Art of SIMD Programming by Sergey Slotin
12K views · 2 years ago
Modern hardware is highly parallel, but not only in terms of multiprocessing. There are many other forms of parallelism that, if used correctly, can greatly boost program efficiency - and without requiring more CPU cores. One such type of parallelism actively adopted by CPUs is "Single Instruction, Multiple Data" (SIMD): a class of instructions that can perform the same operation on a block of ...
Bytecode Rewrite Optimizations by Shiqi Cao
610 views · 2 years ago
ByteDance in-house Redex passes: A handful of Redex passes have been developed at ByteDance in addition to the existing Redex passes. We'd like to share some of them: RenameClass, StringBuilderOutliner, KtDataClass. These three passes saved about ~1M, ??%. *Exploring ART Optimization Complement* ART optimization is triggered during both jit and dex2aot; however, both run with restricted computa...
Performance Laws by Taras Tsugrii
800 views · 2 years ago
Most modern performance work is centered around technologies and tools. Following Jeff Bezos' advice, this talk, instead of trying to predict the future, covers some of the oldest mathematical principles and ways they've shaped modern software. Parallel and distributed computing, caching and proxy computations are just a few examples of leveraging associativity, idempotence, Kleisli category an...
Where Have All the Cycles Gone? by Sean Parent
2.7K views · 2 years ago
Personal computers and devices are unbelievably fast but often struggle to keep up with mundane tasks. Why hasn't software managed to scale in performance with hardware? This talk looks at some of the reasons, the characteristics of fast systems, the limits of human perception, and the need to rethink how software is authored to build efficient systems. skillsmatter.com/skillscasts/17836-where-...
Why do Programs Get Slower with Time? by Ivica Bogosavljevic
539 views · 2 years ago
When it comes to software performance, ideally we want two things: 1. The performance of our program doesn’t change as the features are added. 2. The runtime of the program grows linearly to the data set size. But this often fails to happen. As new features are added or the dataset grows, the software starts running much slower than one would expect. In this talk we will explore what are the fa...
Performance Summit - September 2021 - Day 2 Panel discussion
152 views · 3 years ago
Join Luca Chiabrera, Paul McKenney, Daniele Salvatore Albano, Moritz Beller, and our host Ionut Zolti for a thought-provoking conversation about what's next in the Computing Performance world!
Tailoring Google Dataproc to Reduce Spark Execution Time and Cut the Bill - Luca Chiabrera
178 views · 3 years ago
Dataproc is a fully managed and highly scalable Google service that facilitates quickly deploying clusters and executing Spark applications. Nevertheless, the user remains responsible for sizing the Dataproc infrastructure and defining the Spark application execution parameters. These activities have a relevant impact on both the execution time of the Spark applications and the cost of the Data...
The Price of Fast and Robust Concurrent Software - Paul McKenney
404 views · 3 years ago
Although there is no magic wand that can wish away all bugs in concurrent software, there is a wealth of tools, techniques, and experience that permit moving towards that goal. This talk takes on the viewpoints of semantics, software engineering, installed base, software stack, and finally natural selection in order to name the price of fast and robust concurrent software. Performance Summit Se...
cachegrand - A Super Scalar Caching Platform - Daniele Salvatore Albano
302 views · 3 years ago
cachegrand is a super-scalar caching platform, aiming initially to be a Redis drop-in replacement. To achieve high performance, a lot of components have been written from scratch with performance in mind; among other things, the internal parallel hashtable, built only to store 64-bit numbers, uses a number of different techniques to maximize throughput: data-oriented design to take a...
Learning to Predict Performance Regressions in Production - Moritz Beller
262 views · 3 years ago
Imagine what we could do if we could predict performance regressions at code authorship time, before they actually impact production. In this talk, I will set the stage of the performance prediction landscape, clarify why predicting performance regressions of a piece of code is difficult, go over the many things we learned as part of the SuperPerforator initiative at Facebook, and share ou...
Performance Summit - September 2021 - Day 1 Panel discussion
59 views · 3 years ago
Join Stefano Doni, Ivica Bogosavljevic, Yaniv Sabo, Simone Casagranda, Raman Pandey, Nian Sun and our host Ionut Zolti for a thought-provoking conversation about what's next in the Computing Performance world!
Connected Vehicle Platform - Raman Pandey
295 views · 3 years ago
The connected vehicle platform is an important performance-sensitive platform, where the role of the performance engineer is to assess the IoT devices that are connected to the cloud solution and communicate via telemetry. Performance Summit September 2021
Using ML to Automatically Optimize Kubernetes for Cost Efficiency and Reliability - Stefano Doni
324 views · 3 years ago
Performance Optimization Techniques for Mobile Devices: An Overview - Ivica Bogosavljevic
337 views · 3 years ago
Task scheduling optimizations in Mobile Application - Nian Sun
208 views · 3 years ago
Shipping 2021 user experiences on 2011 phones - Simone Casagranda & Yaniv Sabo
135 views · 3 years ago
London Perf Summit - Panel Discussion
254 views · 3 years ago
We Should Become Good at Optimizing our Code - Denis Bakhvalov
1.2K views · 3 years ago
Understanding and Measuring CPU throttling in containerized environments - Francesco Fabbrizio
3.3K views · 3 years ago
How AI optimization will debunk 4 long-standing Java tuning myths - Stefano Doni
502 views · 3 years ago
Virtual machine warmup blows hot and cold - Laurence Tratt
474 views · 3 years ago
Debugging performance issues with Java Flight Recorder - Alexander Kachur
1K views · 3 years ago
On access control model of Linux native performance monitoring - Alexei Budankov
110 views · 3 years ago
Query-based compiler architectures - Olle Fredriksson
848 views · 3 years ago
nanoBench: A Low-Overhead Tool for Running Microbenchmarks on x86 Systems - Andreas Abel
391 views · 3 years ago
Automatically avoid the top performance patterns in distributed systems - Andreas Grabner
271 views · 3 years ago
Awesome! Watched this right after Dave Chiluk's conference talk; this video perfectly complements it. It really helps paint a clear picture of what CPU throttling is and the key takeaways from it.
Thanks! Glad I could help.
I don't understand why, in the masking intro slide (at 22:42), the author says that the following has no branches: for (int i = 0; i < N; i++) s += (a[i] < 50 ? a[i] : 0); That's a ternary operation, which branches between left and right expressions. What am I missing?
That will execute both options and disregard (mask) incorrect ones. Counterintuitively this yields a massive speedup! No branches.
@@az09letters92 Is that because the compiler can prove that executing both sides has no side-effects? Because if the left or right were expressions that could have side effects, then it would be a short-circuiting branch, correct?
Hard to understand English and unpleasantly small text...
thanks so much for the tutorial
I personally use 9 performance principles - some of which are the same as the 5 outlined in this video. Some of the differences seem to be "CPU to data distance" and "data size" ... I cover the principles I use in my video "My 9 + 1 core performance optimization principles" : ruclips.net/video/ULlFWomaPVw/видео.html
This is cringe
Somebody should have taken a minute to optimize that terrible audio compression.
Adobe should optimise their software to have less source code in general based on the results of this. Write slimmer code and the performance issue goes away.
Awesome 🇷🇺
The "Closure" part of the optimization is basically how Forth works: a code pointer and a piece of data for each "word".
Thanks for the video. Is there any new material on the topic of "closure generation" interpreters? Very hard to find material that is not bytecode-based.
Thanks, much appreciated. Especially the examples in C. Is this directly compatible with Cython?
The intrinsics, I mean.
I don't understand why these architecture-specific instructions are not recognized directly by GCC at -O3.
They are, when you give the -march= argument; otherwise the compiler doesn't know which instruction sets are allowed and will fall back to a default (usually x86-64 without AVX).
Does anybody who really cares about latency want to use RPC and streaming? If I can tolerate 18ms, I don't see why I can't tolerate 100ms.
well, spending $1m vs $10m on compute makes a good case for preferring one over the other ;)
Great video. Thank you very much for your enlightening example and insightful explanation!
Great talk. Thank you Sean.
great talk
This was amazingly useful! Loved it. Thanks for the great work!
Amazing talk, full of knowledge.
50:00 Sean might be referring to “Eytzinger binary search”
21:45 ruclips.net/p/PLGvfHSgImk4Y1thqJLpcSscMTwgN-XM9l
31:43 Great talk. More about optimization remarks and viewing them can be found in Ofek’s talks such as ruclips.net/video/6HbyacS5eZQ/видео.html
thank you for the kind words and sharing this excellent talk!
This guy is seriously a god.
What happened to Concord.io?
Sold it to Akamai.
Audio is too low. The (section) flags are neat. As it's a performance presentation, it would have been neat to include a running tally as you go through the optimizations. You do cover it in the conclusion, of course. Good job!
Thank you Laurence, very interesting. I have a trading app written in Java that takes a good few hours to speed up. This has given me the incentive to find out exactly when JIT compilation is happening.
this was brilliant. thank you very much. next stop ScyllaDB
I recognise Suchakra's slide at 4:00 :)
The link in the desc is broken.
Fixed! Thank you for raising this!
"Developers love Kafka api" - are you serious? After a year working with Kafka, I know some configuration choices and design problems I still shudder at
Oh hell yes. Completely agree.
Lossy compression here. What people want is their existing apps to go faster with no code changes. The partitioning scheme of an unordered collection with totally ordered sub-collections is pretty handy as a model. What you point out is the heavyweight nature of partitions, which is true, but the mental model is helpful.
He is missing baseline JITs: very fast JIT compilers. They are about 10x faster than the fastest interpreter.
As he mentioned in the intro, the whole point of "fast CHEAP interpreters" is that you wouldn't need JIT compilation which is "not cheap" because special domain knowledge on assembly/hardware is necessary for JIT/inline threading.
@@SimGunther I think rurban's point is that there are JITs that are quite simple to make. Look at the Kaleidescope tutorial for LLVM. You don't actually need to know anything about platform specific ASM.
@@hardknockscoc It's one thing to use someone else's JIT library so you don't need "platform specific knowledge" about the assembly you're creating, but it's a different story rolling your own JIT library, which is what I assumed in the original comment. You could also be a LISP-kinda person if you want to write a whole interpreter in SBCL/scheme running your whole program tree with C shared libraries for performance intensive stuff. That is technically a way of going about JIT without "platform specific knowledge" unless performance is the only concern for the interpreter.
@@hardknockscoc Even when using LLVM IR, it's a huge hassle for high-level languages. For instance, implementing a language like Scheme (or any language with call/cc) is straightforward in a virtual machine or interpreter, but it's a major headache on the machine stack. Simply translating it won't yield effective results. Another issue is compile time. Usually, we use scripting languages not for compute-bound program, but to assist the host language in configuring some data. Imagine your build system script just contains some build logic and file paths, but the script's compile time is longer than its execution time. To address this, sufficiently mature JIT compilers are typically multi-tiered, but implementing this is also a significant challenge.
Benchmarking these approaches provides very valuable insight for anyone considering them! Thanks for sharing all these learnings. 😎
Why do I never see a throughput benchmark for this? XD
Excellent sneak peek of query-based compilers.
The discussion in the last 10 minutes is full of insights! Thanks!
Good talk. 👍
Any tutorial on this?
vectorized.io has extensive documentation for Redpanda - vectorized.io/docs.
Here is the link to the GitHub project page: github.com/Genivia/ugrep
He stated that intra-datacenter network latencies between machines are nowadays on par with inter-NUMA latencies. That's just categorically untrue: 100s of nanoseconds vs 100s of microseconds, i.e. 3 orders of magnitude.
I mentioned it for a 'contended' resource. I've measured it in the milliseconds.
But to add: you *can* in fact do a network call in low double-digit microseconds. It's easy to measure; just hook up 2 servers back to back with an SFP link or something and DPDK and you'll see. All of this is quite easily verifiable, really.
Fantastic talk! Thanks for the wisdom!
Great job guys, and thanks Nitin for sharing this discussion and your vision. I really liked the concept and topic, which fits an app such as LinkedIn. Nevertheless, there were some missing points, like:
- Evaluation of the prognostic models used (accuracy of prediction concepts expressed by error analysis)
- Comparison of ML (XGBoost) vs DL (DNN) or offline NQS
- Type & quality of the NN architectures used for the predictive models (info about the best model and its hyperparameters)
- A feature importance mechanism was used for feature engineering only for XGBoost, since in deep learning it's done by the NN itself.
- Is it possible to access the dataset you used, like open-source?
Hey Mehryar. These are some good questions. Thanks for posting them. We are working on a detailed LinkedIn engineering blog focused on the machine learning aspects of this solution. We haven't open-sourced this data yet, but we have it on our radar.