► What should I test next?
► AWS is expensive - Infra Support Fund: buymeacoffee.com/antonputra
► Benchmarks: ruclips.net/p/PLiMWaCMwGJXmcDLvMQeORJ-j_jayKaLVn&si=p-UOaVM_6_SFx52H
JIT optimizations are taken out in Quarkus GraalVM native builds for obvious reasons. While the benefits of being lower level with Graal are great, JIT optimizations are not to be underestimated, and they only start to trigger later in the execution, so they are less visible at first
good test after previous tries ;) but I would not emphasize memory consumption at the start of compiled Java, as it does not affect anything. Also it looks like the CPU isn't doing anything, so there is no reason to seriously compare 3% with 5%. But latency values are valuable!
PS: looking at the low CPU consumption test, I got an idea to test a CPU-intensive application. Try to create something like a redis (a hashmap is fast, so let's use a treemap and its concurrent versions); the app will add, update and get some data, for example the count of values that are greater than the one received in a controller.
PPS: it would be interesting to see how regular Java 21 works with virtual threads, but I heard that Java file IO on Linux is synchronous and only 22 will be modern, so it could be a reason why you got these values in the current test. Also, testing regular Java in a container is tricky; it's better to test different Xmx-Xms values first. I mean, starting Java under a memory limit of 500mb is not the same as with 2000mb (so using compiled Java removes that headache, but compiled has lower throughput and latency :) )
Small improvement suggestion: in Java you could use a record instead of a class. This shouldn't have a big impact on the test results, but it at least spares you a bit of typing. It would have been great to include a base Java and a Spring/Spring Boot comparison deployed into a java21 image container here as well, just to see how much of an impact Quarkus and the native container optimization really have. So far I couldn't convince anyone at our company to try out Quarkus. Just a question out of interest: are you going to create a benchmark framework where similar tasks are done by various language implementations and then release your findings to the public? I just stumbled across another video and then this one was recommended to me, and to me it looks like your videos are basically doing that, just with a smaller and more comprehensible scope. So a combination of runtime analyses of different languages for various tasks would definitely be helpful, I guess
Okay, you got me now. I will start trying prometheus and grafana. The question I have is: which tools do you use for load testing? You are using the word "client" for this. I assume you use some kind of tool, like jmeter, k6 or?
Interesting. These tests could be extended to compare Hotspot VM, Generational ZGC and a few other switches. Can you make a video of your entire testing setup (focusing on docker, kubernetes, prometheus and grafana) from scratch? I think it's totally worth it.
This is all good, checking how well it performs, but if it's not throttling, anything is fine as long as client latency is not out of whack. I think some stress testing will also give a good comparison, like you did with node and go
hey good test! can you test with go-chi instead of Fiber? go-chi is more optimized in terms of memory usage, so that might explain why Java was using less memory in that first test. Overall, good video! Keep it up
It would be interesting to see what happens if you push requests to the limits and how high those limits are. Additionally, the Java app can be built as a native image with Spring Boot as well. It's sometimes not that smooth, but honestly I expect it to perform better with Spring Boot.
Idk, native image crashes randomly and has lower performance than JITed code atm. It's good only (if it doesn't crash) for low-traffic applications on serverless.
I have the same limits for both: github.com/antonputra/tutorials/blob/main/lessons/201/deploy/java-app/deployment.yaml#L27-L33, and I run them on dedicated nodes using the ESXi Hypervisor.
@@AntonPutra I do understand, what I wanted to say is what happens if you push client requests higher and higher. The load seemed to be not that high, so the light load conditions were tested but what would happen under high load? It can be really detrimental in real world.
I’m just wondering why the memory usage goes up and up and up at the beginning, and at some point (reaching a threshold or something?) something like an aggressive GC seems to kick in never letting memory usage go up again.
C# vs Go would be amazing to see. Maybe add to test 2 some simple reads from the database, and maybe add test 3 with some simple data structure or general purpose calculations to see how well each language performs. Amazing content. I am currently writing a high performance C# application for the government with .Net 8 and it is incredibly fast. I wonder if .net 8 has been improved so much that might even beat Java at this point.
Can you elaborate on using simple data structures or general-purpose calculations to evaluate how well each language performs? I don't really want to run fibonacci anymore lol
@@AntonPutra You can try to implement 3 different types of algorithms (in addition to the 2 tests you already did in your previous video).
1 - Searching algorithms (linear search). Example: create a List of 1 million objects (Person: {Id, Name}). Id must be a unique integer from 1 to 1,000,000; Name is a randomly generated string. Populate your list with the 1 million Person objects. Test: generate a random Int value from 1 to 1 million and find the Person object (by Id) using that value, and get the Name; then find the object in the list (by Name) and validate that both Ids match.
2 - Sorting algorithms: sort the entire list of 1 million Person objects by Name.
3 - File I/O operations: generate a random Int value from 1 to 1 million, find the Person in the List, and write the Person's name to the first line of a file; if the file already contains a line, replace it with the new name.
Leverage ChatGPT to create the code for you in both languages. Just some ideas, lol
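A minimal Go sketch of the first two ideas in that comment (linear search by Id and by Name, then a sort by Name). The `Person` type follows the comment; the helper names, the 12-letter name length and the seed are assumptions made for illustration, and the file I/O step is left out.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Person matches the {Id, Name} shape described in the comment.
type Person struct {
	ID   int
	Name string
}

// makePeople builds n people with unique IDs 1..n and random 12-letter names.
// The name length and seed parameter are illustrative assumptions.
func makePeople(n int, seed int64) []Person {
	rng := rand.New(rand.NewSource(seed))
	people := make([]Person, n)
	for i := range people {
		name := make([]byte, 12)
		for j := range name {
			name[j] = byte('a' + rng.Intn(26))
		}
		people[i] = Person{ID: i + 1, Name: string(name)}
	}
	return people
}

// findByID scans the slice linearly and returns the index of the match, or -1.
func findByID(people []Person, id int) int {
	for i, p := range people {
		if p.ID == id {
			return i
		}
	}
	return -1
}

// findByName scans the slice linearly and returns the first index with that name, or -1.
func findByName(people []Person, name string) int {
	for i, p := range people {
		if p.Name == name {
			return i
		}
	}
	return -1
}

// sortByName sorts the slice in place by Name, breaking ties by ID.
func sortByName(people []Person) {
	sort.Slice(people, func(a, b int) bool {
		if people[a].Name != people[b].Name {
			return people[a].Name < people[b].Name
		}
		return people[a].ID < people[b].ID
	})
}

func main() {
	const n = 1_000_000
	people := makePeople(n, 42)

	target := rand.Intn(n) + 1 // random Id in 1..n
	i := findByID(people, target)
	j := findByName(people, people[i].Name)
	fmt.Println("IDs match:", people[i].ID == people[j].ID)

	sortByName(people)
	fmt.Println("first name after sort:", people[0].Name)
}
```

Random names can collide, so a by-name lookup may in rare cases return a different person than the by-Id lookup; a real benchmark would have to decide how to treat that.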
Java runtimes were historically designed to consume the resources of the whole VM, so maybe you can compare a Java app running on a JVM (not a native image but a HotSpot JVM) on a VM with 4 cores and 4 GB RAM vs a Go app running on Kubernetes using that same VM
really nice approach to monitoring performance. can you make a similar video but with java profiling tool to detect which specific part of the code must be reworked?
It would be good to change a bit what the application is doing. In our company we have a piece of code that is meant to check that we don't have any delays in the network stack. To do so, we tell the app to generate 1000 random bytes and send them to the client. With that, nothing is cached.
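A minimal sketch of that idea in Go, assuming a plain `net/http` handler; the handler and helper names are hypothetical, and the `Cache-Control` header is an addition the comment does not mention. The demo in `main` uses an in-process test server so that it terminates on its own.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// payloadSize is the 1000 bytes mentioned in the comment.
const payloadSize = 1000

// randomPayload returns size freshly generated random bytes.
func randomPayload(size int) ([]byte, error) {
	buf := make([]byte, size)
	if _, err := rand.Read(buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// randomHandler answers every request with a new random body, so neither the
// app nor anything in between can serve a cached response.
func randomHandler(w http.ResponseWriter, r *http.Request) {
	body, err := randomPayload(payloadSize)
	if err != nil {
		http.Error(w, "could not generate payload", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/octet-stream")
	w.Header().Set("Cache-Control", "no-store")
	w.Write(body)
}

func main() {
	// In-process server for the demo; a real deployment would call
	// http.ListenAndServe instead.
	srv := httptest.NewServer(http.HandlerFunc(randomHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("received bytes:", len(body))
}
```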
Thanks! There is a very small difference in terms of scalability; both are small with a fast startup time. I think Go is a little more efficient, so potentially you would need fewer compute resources.
@@AntonPutra the way Java uses memory with GraalVM is very smart, it's like observing the needs and then optimising the RAM usage. This could suggest that we could provision the Java container in a pod with less RAM. My concern is: how well does Java handle random peaks? If we have 200 req/s, then right after the RAM stabilises we suddenly get 500 req/s, how well does Java handle that peak? Is Java going to panic and ask for wayyyy more memory than it actually needs? If that is the case, then the Java app may actually crash due to insufficient container memory. Does what I've just said make any sense?
Can someone explain the extreme drop in memory usage for Java here? Under Hotspot JIT I would assume something similar, but this is running on SubstrateVM instead of JVM. Sadly I have little experience with Native Image and Substrate, so an explanation eludes me. What is happening here with memory usage?
I'm not an expert in Java, and I hope to get feedback from someone who is, but it looks like it optimizes for the load that is given to the application.
probably because of garbage collection (GraalVM still has a simple GC). The GC learns which objects need to be cycled and which don't. The missing JIT actually affects CPU usage. You can easily test this by running a benchmark against the native build and the JVM. You will see that under high load the JVM version uses less CPU
@@AntonPutra I think that you just need to be careful not to compare apples and oranges. I think you can compare these combinations if the tests are going to be applied to an I/O-bound app:
1. Quarkus blocking vs Spring blocking
2. Quarkus fully reactive vs Spring fully reactive (with WebFlux)
3. Quarkus with virtual threads vs Spring with virtual threads
An extra combination could be Quarkus fully reactive vs Spring with virtual threads, as virtual threads are used more with Spring than with Quarkus. A comparison between reactive Quarkus and Spring (without virtual threads) would not be fair
It would be interesting if you compare go vs java non native, as non native should have better performance than native. You compile java to native only if you are building a CLI or a lambda, when you need fast startup.
I do think you should do some kind of load testing on the cheap 5$ instances. For example how many requests these cheap vps can handle before they crash, using golang, rust, php etc.
Well, when you deploy to Kubernetes, you have cgroups and other constraints that could affect performance. But as soon as I find a use case where Java performs better, I'll make an updated video, maybe something like a Kafka consumer/producer data pipeline. I'll see.
A native image of a Java app would use less memory but the throughput will be worse than a JVM version. So for CPU and latency comparison you should compare with a JVM not a native app
Java native images give slower performance at runtime than normal jars because of the lack of hotspot optimizations at runtime. To achieve a similar performance it should be optimized through a previous profiling process.
well, i use the ESXi hypervisor to host my Kubernetes at home, i have some steps to reproduce it with MetalLB if you are interested - github.com/antonputra/kubernetes-on-premise
Also, to deploy the monitoring stack (prometheus & grafana & cadvisor & kube-state-metrics etc) i use terraform with yaml
tf - github.com/antonputra/tutorials/tree/main/lessons/201/terraform
yaml - github.com/antonputra/tutorials/tree/main/lessons/201/monitoring
By default (depending on your garbage collector choice) java will reserve a big block of memory to manage but then clean. I'm not sure how Go does it, but you can probably configure Java to "use" less memory by setting the min and max differently and other configurations.
@@AntonPutra I was thinking about CPU and memory benchmarks on the NODE, i.e. what Kubernetes vs K3s eats of the Node performance. Otherwise, I just discovered the ClickHouse and meilisearch databases, it seems really good. (sorry for my English)
This video is good but really is not a fair comparison if you used GraalVM CE. You're comparing apples to oranges. GraalVM EE has PGO which allows the native build to benefit from, well, profiling, so it'd map and optimize the call tree among other things. Obv. EE is not free. This is the whole point, Oracle would not freely distribute optimizations to the CE edition.
thanks! it's just open source and i actually have dedicated youtube tutorials on how to measure cpu/memory/vpc etc. here is a dashboard and promql queries for this specific video - github.com/antonputra/tutorials/blob/main/lessons/201/dashboard.json
java metrics - github.com/antonputra/tutorials/blob/main/lessons/201/java-app/src/main/java/com/antonputra/ImageResource.java#L51-L59
golang metrics - github.com/antonputra/tutorials/blob/main/lessons/201/go-app/metrics.go#L13-L27
@@LawZist use a summary only in edge cases, when you have a single instance of the app and can only scale vertically, because summaries can't be aggregated over multiple instances, for example to get the p90 for 5 replicas of your app. With a summary, the p90 is computed on the client itself. Use a histogram in all other cases
ruclips.net/video/WUBjlJzI2a0/видео.html
ruclips.net/video/VjFFzGFyVlY/видео.html
ruclips.net/video/dMca4jHaft8/видео.html
ruclips.net/video/ff_XHm96PKQ/видео.html
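Why histograms aggregate across replicas while precomputed percentiles do not can be sketched in a few lines of Go. This is an illustration only: the `Histogram` type, the bucket bounds and the counts are made up, and the interpolation is a simplified version of what a bucket-based quantile estimate does, not the Prometheus implementation.

```go
package main

import (
	"fmt"
	"math"
)

// Histogram is a hypothetical, simplified bucket histogram.
// Counts[i] is the number of observations that fell into the bucket whose
// upper bound is Bounds[i] (per bucket, not cumulative).
type Histogram struct {
	Bounds []float64
	Counts []uint64
}

// merge adds up bucket counts from several replicas.
// It assumes every histogram uses the same bucket bounds.
func merge(hs ...Histogram) Histogram {
	out := Histogram{
		Bounds: hs[0].Bounds,
		Counts: make([]uint64, len(hs[0].Counts)),
	}
	for _, h := range hs {
		for i, c := range h.Counts {
			out.Counts[i] += c
		}
	}
	return out
}

// quantile estimates the q-quantile by linear interpolation inside the bucket
// that contains the target rank. The first bucket is assumed to start at 0.
func quantile(h Histogram, q float64) float64 {
	var total uint64
	for _, c := range h.Counts {
		total += c
	}
	if total == 0 {
		return math.NaN()
	}
	rank := q * float64(total)
	seen, lower := 0.0, 0.0
	for i, c := range h.Counts {
		if c > 0 && seen+float64(c) >= rank {
			return lower + (h.Bounds[i]-lower)*(rank-seen)/float64(c)
		}
		seen += float64(c)
		lower = h.Bounds[i]
	}
	return h.Bounds[len(h.Bounds)-1]
}

func main() {
	bounds := []float64{0.1, 0.5, 1.0} // seconds, made-up buckets
	fast := Histogram{Bounds: bounds, Counts: []uint64{90, 10, 0}}
	slow := Histogram{Bounds: bounds, Counts: []uint64{10, 40, 50}}

	// Correct: merge the buckets first, then estimate the quantile.
	fmt.Println("p90 from merged buckets:", quantile(merge(fast, slow), 0.9))

	// Misleading: averaging per-replica p90 values, which is all that
	// client-side summaries would let you do.
	fmt.Println("average of per-replica p90s:", (quantile(fast, 0.9)+quantile(slow, 0.9))/2)
}
```

With these made-up counts the merged estimate lands near 0.8 s while the average of the two per-replica values is near 0.5 s, which is the aggregation problem the comment describes.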
Looks like the s3 implementation isn't very optimized yet in Java. I do think that the test is a bit skewed, because the overall latency is mostly impacted by the task that took the longest. The S3 task took an order of magnitude longer than the SQL task, so therefore it's mostly the S3 task that determined the overall latency scores. The SQL task only had a negligible impact. I'm interested to see several other task and see how they compare. Because of what I described above, you'd probably also want to scale the difficulty of each task so that the time for the fastest language to complete the task is the same for all tasks, so that they all contribute evenly to the total latency. That, or you could just use percentages for each task and show a total or average percentage for all tasks, instead of your total latency graph.
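The percentage idea from that comment can be sketched as follows: divide each latency by the fastest one for the same task, then average those ratios per language, so a slow task no longer dominates the total. The function name and the latency numbers below are hypothetical and are not results from the video.

```go
package main

import (
	"fmt"
	"math"
)

// averageRelativeLatency takes latencies[task][language] and returns, per
// language, the mean of (latency / fastest latency for that task).
// A value of 1.0 means the language was the fastest on every task.
func averageRelativeLatency(latencies map[string]map[string]float64) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]int{}
	for _, byLang := range latencies {
		fastest := math.Inf(1)
		for _, v := range byLang {
			if v < fastest {
				fastest = v
			}
		}
		if fastest <= 0 || math.IsInf(fastest, 1) {
			continue // skip empty or invalid tasks
		}
		for lang, v := range byLang {
			sums[lang] += v / fastest
			counts[lang]++
		}
	}
	avg := make(map[string]float64, len(sums))
	for lang, s := range sums {
		avg[lang] = s / float64(counts[lang])
	}
	return avg
}

func main() {
	// Made-up latencies in milliseconds, only to show the effect.
	latencies := map[string]map[string]float64{
		"s3":  {"go": 100, "java": 200},
		"sql": {"go": 4, "java": 2},
	}
	// Raw totals would be dominated by the S3 task (104 vs 202);
	// the relative scores weigh both tasks equally.
	for lang, score := range averageRelativeLatency(latencies) {
		fmt.Printf("%s: %.2fx the fastest on average\n", lang, score)
	}
}
```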
Are you sure that both apps are using the same number of connections in the pool? The connection pool often accounts for most of the performance difference in this kind of benchmark. Quarkus defaults to 20 concurrent connections, and pgxpool to 4 or runtime.NumCPU(), from what I have read.
Have you checked performance for more than 20 connections in a pool?
@@ooijaz6063 I used the defaults, but for the next tests, I'll double-check how many connections are actually opened on the PostgreSQL side.
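A small sketch of the mismatch raised in this thread, assuming the defaults exactly as quoted in the comment (20 for Quarkus; for the Go pool, 4 or the CPU count, whichever is larger). The constant and function names are hypothetical, and the numbers are not verified against either library here.

```go
package main

import (
	"fmt"
	"runtime"
)

// quarkusDefaultPoolSize is the Quarkus default quoted in the comment
// (an assumption taken from the thread).
const quarkusDefaultPoolSize = 20

// defaultGoPoolSize mirrors the rule quoted in the comment for the Go pool:
// 4 connections, or the number of CPUs if that is larger.
func defaultGoPoolSize(numCPU int) int {
	if numCPU > 4 {
		return numCPU
	}
	return 4
}

func main() {
	goPool := defaultGoPoolSize(runtime.NumCPU())
	fmt.Printf("Go pool default: %d, Quarkus pool default: %d\n", goPool, quarkusDefaultPoolSize)
	if goPool != quarkusDefaultPoolSize {
		fmt.Println("pool sizes differ; pin both to the same value before comparing")
	}
}
```

On a small node the two defaults would differ by a wide margin, which is why setting the pool size explicitly on both sides makes the comparison easier to interpret.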
Perhaps you could also try helidon SE instead of quarkus. Helidon was built from the ground up by Oracle labs to use the latest java tech like virtual threads to serve requests
@@111segasonic thanks i'll try it out
I’d love to see Rust thrown into the mix as well!
A benchmark must be like this. State of the art. Good job!
❤
What does state of the art mean?
@@ChengPhansivang i guess something that people can relate to :)
Wow, this is really good. The setup (kubernetes cluster, prometheus, grafana ...) deserves another video.
Thanks! Just in case, the source code with all of these components is in my GitHub: github.com/antonputra/tutorials/tree/main/lessons/201/monitoring.
Hey, can you make a video on how to set this up locally? Maybe with the k8s supplied with Docker Desktop, if relevant?
@@rajivkumar-ub6uj i think so, the easiest way is to just package everything as docker compose or perhaps just use a local minikube cluster. i'll think about it
@@AntonPutra yes, compose is the best way for a larger audience. Would appreciate it if you can share the compose config for this, thanks in advance
@@rajivkumar-ub6uj ok
I am go fanboy but I really like applications written in Quarkus. My first language was Java and it is mind-blowing how fast and light Quarkus feels compared to Spring
some people say it is slower than jvm based, I'll see if I can test it
You would be surprised how far Spring has come. With that being said, for a long-running app, Spring Boot without AOT compilation would remain faster thanks to the JIT compiler. Quarkus is really only good if you need fast startup or low RAM consumption
@@lufenmartofilia5804 good point
@@lufenmartofilia5804 will test, when you say long running, how long?
@AntonPutra long running is at least 10,000 tx before you start measuring. In the real world, weeks or months...
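The warm-up rule from that reply can be sketched as a tiny harness: run the workload a number of times without recording anything, then time only the following runs. The function name is hypothetical and the 10,000 figure is the one from the comment. Warm-up matters mostly for JIT-compiled runtimes, so in Go this only illustrates the measurement pattern.

```go
package main

import (
	"fmt"
	"time"
)

// measureAfterWarmup runs fn warmup times without timing it, then runs it
// measured more times and returns the average duration of those timed runs.
func measureAfterWarmup(fn func(), warmup, measured int) time.Duration {
	for i := 0; i < warmup; i++ {
		fn()
	}
	if measured <= 0 {
		return 0
	}
	start := time.Now()
	for i := 0; i < measured; i++ {
		fn()
	}
	return time.Since(start) / time.Duration(measured)
}

func main() {
	calls := 0
	work := func() { calls++ } // stand-in for one request or transaction

	avg := measureAfterWarmup(work, 10000, 1000)
	fmt.Println("total calls:", calls, "average per measured call:", avg)
}
```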
Love these benchmark videos, nice work
thank you! :)
This second test scenario is absolute perfection in testing real world applications. It's easy to get excited about a performance difference of like 400% (for example) in a synthetic benchmark, but by including database, storage and (de)serializing, it gives a much more nuanced picture of how it would actually scale and perform. In this case I would say both applications performed well and comparable. I'd be interested in a bit of a deeper dive in these applications by including opentelemetry and seeing what functions might bottleneck.
Thanks. Well, in some tests, I used OpenTelemetry clients with this Prometheus client in both Go and Java. I'm wondering what else you would instrument besides these function calls to S3 and the database. I might include it in the following videos.
example - github.com/antonputra/tutorials/blob/main/lessons/201/go-app/images.go#L50-L62
@@AntonPutra Must have missed that detail, very well done and thanks for the reply!
@@TweakMDS thanks!
But it doesn't do that much. The program doesn't change any data. It just uploads it.
@@GBXS I'm thinking about adding an additional test with Kafka consumer/producer and perhaps a simple ETL pipeline. Any suggestions?
Finally! A detailed comparison that just doesn’t test the /hello-world endpoint
haha, thanks!
This is definitely the best DevOps channel.
❤
Love these benchmark videos, your work is amazing!
❤️
Please do Java Spring Boot (Native) vs Spring Boot (JDK) VS Quarkus (Native) vs Quarkus (JDK)
noted!
And add Micronaut (Native & JDK) to this chain, plz
I got so much out of the whole video about percentiles. You have cleared up so much
cool
I really admire the effort you put into describing why you chose your testing methodology as well as the testing itself
Interesting comparison, BUT:
- the first tests does not test the startup time itself (should be
I try to improve each time I create benchmarks. Next time, I will definitely use the v2 Go SDK and apply some other recommendations from your side. Thank you for taking the time to leave this feedback.
@@AntonPutra glad I could help and you haven't taken it as a personal attack or something :D
I really value your videos and open source code for every video!
Looking forward to seeing more of them.
@@DillPL thank you! i actually implemented your suggested idea in the new video and reduced the size by 6 mb (45 -> 39) :) - ruclips.net/video/56TUfwejKfo/видео.html
will try other tips next as well and finally update that sdk lol
Bro, you're awesome: nothing superfluous, everything to the point, quality and bitrate are top-notch, the video looks great, respect!
thank you ❤️
First of all, this is the best content on youtube so far.
Well done. Thank you!
thank you! :)
this is so professional!
I love it!
please do bun vs deno v2
since deno has gotten npm compatibility, the only difference now between bun and deno (aside from being written in zig and rust) is the speed (I think; both have gotten a very nice std library)
please do a benchmark comparing everything!
Great video! The benchmarks were really helpful. Keep up the great work!
thank you! will do
Would love to see C# vs Go
C# vs Go vs Java would be nice
any specific test scenarios? or the same
@@AntonPutra your current test scenarios are very good so I wouldn't change anything. Regarding C# I would use LTS version (dotnet 8) which is the fastest one amongst other versions according to Microsoft.
@@krzysi3k-yt ok, I'll maybe do it next
Rust - same tests
Great videos, like the rest of what you do. I'm using your videos to improve my knowledge of the cloud/kubernetes area.❤❤
thank you!❤
Nice work! The explanation around the benchmark is easy to understand and full of information there. IMHO, you should start to build your own courses on Udemy :)
thanks! maybe
please do c# vs Java, use minimal api with AOT for c# and GraalVM or whatever AOT thing Java has.
ok will do soon!
Nice job, I would like to see Test 2, but with higher RPS
Okay, I might just include additional screenshots under lesson '201' in my GitHub repo
@@AntonPutra It would be great, thank you Anton!
Please test the latest dotnet 8 vs go, thanks
ok, coming next
@@AntonPutra ensure to use Minimal APIs and compile it AOT.
Love these benchmarks! 🎉
thanks! i try to add some extra
Loved the video, subscribed!
thanks!!
Amazing video, great job !!
thank you!
There is also a non-blocking Netty server implemented with Spring Reactive Web, which is more efficient.
for the database approach, use R2DBC, the reactive non-blocking data repository.
btw Spring also supports GraalVM and it is not outdated.
ok thanks, it's not outdated just it's been around for a long time
Quarkus uses non-blocking netty
Seems like Java 21 was used but virtual threads weren't used for the Quarkus application. Wasn't that the whole point of using a newer Java version, with the performance improvements and non-blocking reactive APIs?
yeah, i used java 21. I'll make sure to test virtual threads next time, maybe try to compare different java frameworks as well
@@AntonPutra hey, thanks for making this video.
Just needed to point out that the code looks to be written in an older/traditional style, even though Quarkus has annotations that resolve traditional blocking calls, something that modern programming languages like Go probably already handle under the hood.
Great detailed video as always!
Maybe I'll try this out on my local machine to test out too!
@@henryong7788 I'll soon be comparing Quarkus with Spring Boot, and I'll make sure to use the latest language features.
Virtual Threads are not better in performance compared to fully reactive code.
Quarkus fully reactive or Spring fully reactive will always beat virtual threads, both in performance and resources usage.
@@EricSouzarys good to know thanks
The explanation why java reduces memory usage is pretty simple: gc
Go has gc too...
The explanation is that Java uses a fixed heap. Normal Java reserves the memory from the system upfront, and you will see no change for the whole run.
The Quarkus optimizations make the internal heap metrics visible to K8s. But the particularities of Java are still visible in those behaviours, such as defaulting to reserving a lot of memory upfront.
@@framegrace1 the Java GC can differ between JVM implementations. But basically it works on a simple principle: the JVM performs GC when it sees that the heap is overused. It's based on the heap limit.
So, in this case the JVM application started and used some heap. The heap usage hadn't reached the GC limit, so there was no need to perform GC. When traffic comes to the JVM application, it increases the number of created objects and, as a consequence, the heap usage. And when the heap usage limit is reached, the JVM performs GC and all the objects created at application start are deleted.
I don't know how the Go GC works, and it looks like it has a partially different implementation.
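On the open question about Go: a minimal sketch, assuming the simplified pacing rule from the Go runtime documentation, where a collection starts once the heap has grown by GOGC percent (default 100) over the live heap left by the previous cycle. `nextGCTarget` is a hypothetical helper that models only that rule; the `runtime.MemStats` fields used in `main` are the real ones.

```go
package main

import (
	"fmt"
	"runtime"
)

// nextGCTarget is a hypothetical, simplified model of Go's GC pacing:
// the next collection starts roughly when the heap reaches
// liveHeap * (1 + gogc/100). The real runtime also accounts for stacks and
// globals and honors a configured memory limit, which this sketch ignores.
func nextGCTarget(liveHeap, gogc uint64) uint64 {
	return liveHeap + liveHeap*gogc/100
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("allocated heap objects: %d bytes\n", m.HeapAlloc)
	fmt.Printf("runtime's next GC target: %d bytes\n", m.NextGC)
	fmt.Printf("model, 64 MiB live heap at GOGC=100: %d bytes\n", nextGCTarget(64<<20, 100))
}
```

Under this model the trigger scales with the live heap rather than with a fixed heap limit, which would be one reason the memory curves of the two apps look different.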
Wonderful content Anton!
thank you!
Great video! Python vs Node plz with the same scenario :)
Interestingly, in your Test scenario 2, your Quarkus app is spiking in DB latency while having constant times in between, as if the Postgres client would be idling to gather the queries (or waiting on a lock?) and send them in bursts.
If this is indeed the case it does make the results a bit harder to pull conclusions from.
yeah, I noticed it
"There are no solutions. There are only trade-offs" Thomas Sowell
servers are cheaper than developer time
true
Not always true. Paying for each CPU cycle in the cloud you can easily get out of budget on scale. That is why optimization and algorithm knowledge is valuable again - it helps to save money.
i like this working. You are so nice!!
thank you!
great demo. as a Java dev it hurts seeing Java losing even with the Quarkus native build 😢😢
I'll make some more with improved Java soon
this is very neat, i love it
thank you!
Love your videos! What tool do you use for creating those amazing animations and mounting videos?
Very nice video! Seems like if cost-cutting is of great concern, you'd lean towards go to keep CPU utilization down. I would love to see a similar comparison video between Java and JavaScript/Node.js.
thanks! noted
Nice comparison! Though I wouldn't ever compare Go vs Java native for long time runners as this one. It is true that the metrics of java at startup time would be much (much) worst but bear in mind that java has been built thinking in the startup as an edge case scenario and the JVM does a lot optimisations while the program is running, it would be interesting if java is able to beat Go in the long run. In the short run I think that there is no possible discussion and Java Native is just a work around.
Great job!
But a few comments:
Spring supports building native images as well, and they have maven/gradle plugins and a dedicated project Spring Native for this case. Actually, we are using it in production and building most of the Spring apps in native images.
Summary: Go is faster than JVM-based stuff, well, no surprise here :)
In general, Quarkus doesn't give anything interesting compared to Spring, it's just a bit more modern and doesn't have much legacy stuff.
What might be interesting to look at in this regard is Micronaut, because it is a fundamentally different framework (compile-time, with native image support out of the box, compared with runtime-based Spring plus additional projects and layers for native support). Most likely Micronaut will show numbers similar to Go.
thank you for your feedback. i'll get back to the java world soon, maybe next week, and make a few improvements
thanks for sharing.. can you do it with nodejs :P?
Could you do the same test for Kotlin and Java ? Or Kotlin and Go. Please 🙏
ok let me see
@@AntonPutra Would love see a Quarkus and Kotlin benchmarks compared to Spring Boot and Kotlin
@@belkocik 🫡
Why is Java's memory usage high when it is idle? Will it also go high when it is idle after processing requests?
I work on both Java and Go, and your results are similar to my observations. Java consumes memory because of all the autoconfiguration, which involves a hell of a lot of classes, plus some JDK versions had garbage collection issues. But if you develop an enterprise-ready application in Go with distributed tracing, logging, metrics, write-heavy database operations etc., their performance is almost equivalent. I had to manually write all those functionalities in Go due to the lack of autoconfiguration and libraries.
Very good point. Java frameworks like Quarkus are doing a lot to make large scale application development easier. All of that stuff it's doing will affect runtime performance.
Like always you rock, can you make a video about database architecture for production like MySql Replication Group etc, Thank you
thank you! let me see
I've been seeing these videos for a while and all I see is my railway bills
😂 i have some aws credit
Very Nice! great analysis
thank you!!
Enjoyable video. Subscribed.
thank you! more to come
JIT optimizations are taken out in Quarkus GraalVM native builds for obvious reasons. While the benefits of being lower level with Graal are great, JIT optimizations are not to be underestimated, and they kick in later during execution, so they are less visible at first.
good test after previous tries ;) but I would not accentuate memory consumption at the start of compiled java, as it does not affect anything. Also it looks like cpu doesn't do anything, so no reason to seriously compare 3% with 5%. But latency values are valuable! PS: looking at the low cpu consumption test I got an idea to test cpu intensive application. Try to create something like a redis (hashmap is fast, lets use treemap and its concurrent versions), the app will add, update and get some data, for example count of values that are greater than received in a controller. PPS: interesting to see how regular java 21 works with virtual threads, but I heard that java file io on linux is synchronous and only 22 will be modern, so it could be a reason why you got these values in the current test. Also testing regular java in a container is tricky, it’s better to test different Xmx-Xms values first, I mean starting java under the memory limit of 500mb is not the same as with 2000mb (so using compiled java leaves that headache, but compiled has lower throughput and latency :) )
Thanks, I appreciate your feedback.
Small improvement suggestion: In Java you could use a record instead of a class. This shouldn't have a big impact on the test results but at least spares you a bit of typing.
It would have been great to include a base Java and a Spring/Spring Boot comparison deployed into a java21 image container here as well just to see how much of an impact Quarkus and the native container optimization really yields. So far I couldn't convince anyone at our company to try out Quarkus.
Just a question out of interest: Are you going to create a benchmark framework where similar tasks are done by various language implementations and then release your findings to the public? I just stumbled across an other video and then this one was recommended to me, and to me it looks like your videos basically doing that but just with a smaller and more comprehensible scope. So a combination of runtime analysis of different languages for various tasks would definitely be helpful, I guess
thanks, yes i'll get to Java soon and I'll try to improve a few things
@@AntonPutra Can you try using both Spring Boot Native and Quarkus to see how much of a performance difference they have
@@MovinduLochanaWijethunge yes will do
Okay, you got me now. I will start trying prometheus and grafana. The question I have is which tools do you use for load testing? You are using word "client" for this. I assume you use some kind of tools, like jmeter, k6 or?
Add rust and javascript to the mix. Thank you for your channel
will do, i'm thinking about webassembly vs js, what do you think?
Why does Java need incoming requests to lower its memory consumption?
Interesting. These tests could be extended to compare Hotspot VM, Generational ZGC and a few other switches. Can you make a video of your entire testing setup (focusing on docker, kubernetes, prometheus and grafana) from scratch? I think it's totally worth it.
This is all good, checking how well it performs, but if it's not throttling, anything is fine as long as client latency is not out of whack.
I think some stress testing will also give a good comparison, like you did with Node and Go
I'll come back to Java soon with improved benchmarks
hey good test! can you test with go-chi instead of Fiber? go-chi is more optimized in terms of memory usage, so that might explain why Java was using less memory in that first test.
Overall, good video! Keep it up
Thank you! I used Chi for one of my projects, but I think memory usage doesn’t play a major role in the user experience, such as client latency etc..
Woow amazing effort Man, how about Rust vs Go ?
It would be interesting to see what happens if you push requests to the limits and how high that limits are.
Additionally, a Java app can be built as a native image with Spring Boot as well. It's sometimes not that smooth, but honestly I expect it to perform better with Spring Boot.
Idk, native image crashes randomly and has lower performance than JIT-compiled code atm. It's only good (if it doesn't crash) for low-traffic applications on serverless.
@@ooijaz6063 I haven’t loaded my test app extensively but for me it worked ok and had better performance.
It may change though with strong adoption of virtual threads in next few years and servlet api will be good again.
I have the same limits for both: github.com/antonputra/tutorials/blob/main/lessons/201/deploy/java-app/deployment.yaml#L27-L33, and I run them on dedicated nodes using the ESXi Hypervisor.
@@AntonPutra I do understand, what I wanted to say is what happens if you push client requests higher and higher. The load seemed to be not that high, so the light load conditions were tested but what would happen under high load? It can be really detrimental in real world.
great video!!
thank you!
Nice video. One comment. Scaling up/down means increasing or decreasing the machine's resources, like CPU and memory. Scaling in/out is horizontal scaling ;)
What’s the explanation for the memory profile of Quarkus, can someone explain this?
i'll run some more tests in near future
I’m just wondering why the memory usage goes up and up and up at the beginning, and at some point (reaching a threshold or something?) something like an aggressive GC seems to kick in never letting memory usage go up again.
How much memory was allocated to each program? The video focused on showing the difference, but I wanted to know, say, 25% usage of what. thx
256Mi - github.com/antonputra/tutorials/blob/main/lessons/201/deploy/java-app/deployment.yaml#L32
C# vs Go would be amazing to see. Maybe add to test 2 some simple reads from the database, and maybe add test 3 with some simple data structure or general purpose calculations to see how well each language performs. Amazing content. I am currently writing a high performance C# application for the government with .Net 8 and it is incredibly fast. I wonder if .net 8 has been improved so much that might even beat Java at this point.
Can you elaborate on using simple data structures or general-purpose calculations to evaluate how well each language performs? I don't really want to run fibonacci anymore lol
@@AntonPutra You can try to implement 3 different types of algorithms (In addition to the 2 tests you already did in your previous video).
1- Searching algorithms (linear search), e.g.: create a list of 1 million objects (Person: {Id, Name}), where Id must be a unique integer from 1 to 1,000,000 and Name is a randomly generated string. Populate your list with the 1 million Person objects. Test: generate a random Int value from 1 to 1 million and find the Person object (by Id) using the randomly generated Int, get the Name, then find the object in the list (by Name), and validate that both Ids match.
2- Sorting algorithms: sort the entire list of 1 million Person objects by name.
3 - File I/O Operations - generate random Int value from 1 to 1 million, find the Person in the List, write the Person's name to the first line in a file, if file already contains a line replace with the new name.
Leverage ChatGPT to create the code for you in both languages. Just some ideas, lol
@@gabrielmartinez2455 thanks! i'll try it
this is a great video! tnx!
my pleasure!!
Java runtimes were historically designed to consume the resources of the whole VM so may be you can compare a Java app running on a JVM (not a native image but a hotspot JVM) on a VM with 4 cores and 4 GB RAM vs a go app running on Kubernetes using that same VM
a good comparison indeed.
only one thing I want to look into further:
how does the same test behave at high throughput, like 500 / 1000+ req/s?
Thanks, I may include screenshots or just improve my tests in the future.
Thanks for your video! I really like it. Could you do the same tests for Spring vs Quarkus?
thanks, will do, but first rust vs go
Would be cool to see in a future video the framework web for Kotlin called Ktor.
noted!
really nice approach to monitoring performance. can you make a similar video but with java profiling tool to detect which specific part of the code must be reworked?
Perfect work 👍
thank you!❤️
Is this using Oracle GraalVM or GraalVM Community Edition? Very interested in comparing Oracle GraalVM AOT and JVM-based builds vs Golang
it should be the community version. i'll make some more java content in the near future.
It would be good to change a bit what the application is doing. In our company we have a piece of code that is meant to validate if we don't have any delays in network stack. To do so we tell the app to generate random 1000 bytes and sent that to client. With that nothing is cached.
Hi. Nice video indeed. Can you explain why Java uses significantly less memory under load than when idle?
What about micronaut? Would love some benchmarks on this 😊
ok, i'll take a look, i'll get back to java soon
Very informative!
How about comparing performance of java vs python stream processors in Apache Flink?
thanks, yes, i was thinking about spark/flink and different apis: python, java, scala, etc.
nice comparison, this could work great for a batch job.
how would it compare for an app that has peaks at a specific time of day?
Thanks! There is a very small difference in terms of scalability; both are small with a fast startup time. I think Go is a little more efficient, so potentially you would need fewer compute resources.
@@AntonPutra the way Java is using memory with GraalVM is very smart; it's like it observes the needs, then optimises its RAM usage.
This could suggest that we could provision the Java container in a smaller pod in terms of RAM.
My concern is: how well does java handle random peaks?
if we have 200 req/s, then right after the RAM stabilises we suddenly get 500 req/s, how well does Java handle that peak?
is Java going to panic and ask for way more memory than it actually needs?
if this is the case, then the Java app may actually crash due to insufficient container memory.
Does it make any sense, what i've just said?
@@ionutale1950 yes, it does. i'll try to configure the client next time to simulate such spikes when I compare spring boot with quarkus
It'd be interesting to compare Hotspot (various GCs) vs GraalVM (Quarkus, Spring Boot)
ok let me see
It would be interesting to test long term throughput in this comparison.
@@terribleprogrammer how long? day, 2, a week?
@@AntonPutra one week would be interesting. You can also mix up jvm, graalvm and go Lang in a single video
@@terribleprogrammer ok, i'll see if it makes any difference and if it does i'll make something
Can someone explain the extreme drop in memory usage for Java here?
Under Hotspot JIT I would assume something similar, but this is running on SubstrateVM instead of JVM. Sadly I have little experience with Native Image and Substrate, so an explanation eludes me.
What is happening here with memory usage?
I'm not an expert in Java, and I hope to get feedback from someone who is, but it looks like it optimizes for the load that is given to the application.
probably because of garbage collection (GraalVM still has a simple GC). The GC learns which objects need to be collected and which do not.
Missing JIT actually affects CPU usage. You can easily test this by running a benchmark against native code and JVM. You will see that under high load JVM version uses less CPU
@@EricSouzarys i'll be testing quarkus vs jvm spring boot soon, any suggestions?
@@AntonPutra I think you just need to be careful not to compare apples and oranges.
I think you can compare these combinations if the tests are going to be applied to an I/O bound app:
1. Quarkus Blocking vs Spring Blocking
2. Quarkus fully reactive vs Spring Fully reactive (with webflux)
3. Quarkus with virtual threads vs Spring with virtual Threads.
An extra combination could be Quarkus Fully Reactive vs Spring with Virtual Threads as virtual threads with Spring is more used than with quarkus.
A comparison between reactive quarkus against Spring (without virtual threads) would not be fair
@@EricSouzarys ok noted!
It would be interesting if you compare go vs java non native, as non native should have better performance than native. You compile java to native only if you are building a CLI or a lambda, when you need fast startup.
ok noted!
I do think you should do some kind of load testing on the cheap 5$ instances. For example how many requests these cheap vps can handle before they crash, using golang, rust, php etc.
It's great if you can benchmark framework from bun runtime like Hono and ElysiaJS
ok noted!
I ran some tests a while ago just benchmarking algorithms in different languages. To my surprise, Java always ran them faster than Go
Well, when you deploy to Kubernetes, you have cgroups and other constraints that could affect performance. But as soon as I find a use case where Java performs better, I'll make an updated video-maybe something like a Kafka consumer/producer data pipeline. I'll see.
It would be interesting to add C, Rust, NodeJS and Python to the mix.
A native image of a Java app would use less memory but the throughput will be worse than a JVM version. So for CPU and latency comparison you should compare with a JVM not a native app
ok noted!
i want to see Spring Boot native image vs Quarkus vs Go
Spring Native is a framework like Quarkus, so it would be nice to compare these 2 frameworks
ok noted!
Java native images give slower performance at runtime than normal jars because of the lack of hotspot optimizations at runtime. To achieve a similar performance it should be optimized through a previous profiling process.
Thanks for the feedback, someone already mentioned that. I'll run some tests in the near future
can you please do a GO vs node.js Lambda testing? with cold start time, memory usage and other metrics
ok i already have some lambda benchmarks in that playlist but i'll refresh it soon
How do I create this setup of k8s + Grafana + Prometheus?
Please create a video of this setup
well i use esxi hypervisor to host my Kubernetes at home, i have some steps to reproduce it with MetalLB if you are interested - github.com/antonputra/kubernetes-on-premise
Also to deploy monitoring stack (prometheus & grafana & cadvisor & kube-state-metrics etc) i use terraform with yaml
tf - github.com/antonputra/tutorials/tree/main/lessons/201/terraform
yaml - github.com/antonputra/tutorials/tree/main/lessons/201/monitoring
@@AntonPutra thanks! 🫱🏻🫲🏼
If those Java containers are converted into GraalVM native code, I wonder what the comparison will be?
By default (depending on your garbage collector choice) java will reserve a big block of memory to manage but then clean. I'm not sure how Go does it, but you can probably configure Java to "use" less memory by setting the min and max differently and other configurations.
thanks, i'll try to optimize it more next time
Hi, Nice job, thank you.
idea for next benchmark test : Kubernetes vs K3s
They are platforms, so how would you like to compare them? If I have some nodes & VMs then I will stick with K8s, otherwise K3s.
ok, I'll see if it makes sense. I'll create some benchmarks or maybe just make comparisons.
@@AntonPutra I was thinking about CPU and memory benchmarks on the NODE, i.e. what Kubernetes vs K3s eats of the Node performance.
Otherwise, I just discovered the ClickHouse and meilisearch databases, it seems really good. (sorry for my English)
@@picatchumm64 ok, got it, basically infrastructure test, how well both can handle load etc, and which one is more efficient/cost effective
I would also compare compile time of quarkus native and go executables....
ok noted
This video is good but really is not a fair comparison if you used GraalVM CE. You're comparing apples to oranges.
GraalVM EE has PGO which allows the native build to benefit from, well, profiling, so it'd map and optimize the call tree among other things.
Obv. EE is not free. This is the whole point, Oracle would not freely distribute optimizations to the CE edition.
thanks, noted
+1 and use jdk 22 with virtual threads enabled
@@rajivkumar-ub6uj thanks, I'll use virtual threads next time, but for some reason, jdk 22 isn't available for ubuntu yet
Great Benchmark! can you share the promQL for the metrics? is it some plugin or you wrote it by yourself? thanks
thanks! it's just open source and i actually have dedicated youtube tutorials how to measure, cpu/memory/vpc etc..
here is a dashboard and promql queries for this specific video - github.com/antonputra/tutorials/blob/main/lessons/201/dashboard.json
java metrics - github.com/antonputra/tutorials/blob/main/lessons/201/java-app/src/main/java/com/antonputra/ImageResource.java#L51-L59
golang metrics - github.com/antonputra/tutorials/blob/main/lessons/201/go-app/metrics.go#L13-L27
@@AntonPutra is there any reason to prefer summary over histogram? And can you please share the link for your measure tutorials? Thanks a lot!
@@LawZist use a summary in edge cases when you have a single instance of the app and can only scale vertically, because summaries can't be aggregated over multiple instances, for example to get the p90 percentile for 5 replicas of your app. With a summary, the Prometheus client computes p90 on the client itself. Use a histogram in all other cases
ruclips.net/video/WUBjlJzI2a0/видео.html
ruclips.net/video/VjFFzGFyVlY/видео.html
ruclips.net/video/dMca4jHaft8/видео.html
ruclips.net/video/ff_XHm96PKQ/видео.html
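To illustrate why histograms aggregate across replicas: bucket counts are plain counters, so they can be summed per upper bound and the percentile estimated afterwards. The sketch below is a simplified, standalone re-implementation of that idea, not Prometheus code. The `Bucket` type, the function names, and the sample counts are all made up, and it assumes every observation falls into one of the finite buckets.

```go
package main

import (
	"fmt"
	"sort"
)

// Bucket is a cumulative histogram bucket: Count observations were <= UpperBound seconds.
type Bucket struct {
	UpperBound float64
	Count      float64
}

// mergeReplicas sums the cumulative counts of buckets that share an upper bound.
func mergeReplicas(replicas ...[]Bucket) []Bucket {
	sums := map[float64]float64{}
	for _, rep := range replicas {
		for _, b := range rep {
			sums[b.UpperBound] += b.Count
		}
	}
	merged := make([]Bucket, 0, len(sums))
	for ub, c := range sums {
		merged = append(merged, Bucket{UpperBound: ub, Count: c})
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].UpperBound < merged[j].UpperBound })
	return merged
}

// quantile estimates the q-quantile by linear interpolation inside the
// bucket that contains the target rank. Buckets must be sorted by upper bound.
func quantile(q float64, buckets []Bucket) float64 {
	if len(buckets) == 0 {
		return 0
	}
	rank := q * buckets[len(buckets)-1].Count
	lowerBound, lowerCount := 0.0, 0.0
	for _, b := range buckets {
		if b.Count >= rank {
			inBucket := b.Count - lowerCount
			if inBucket == 0 {
				return b.UpperBound
			}
			return lowerBound + (b.UpperBound-lowerBound)*(rank-lowerCount)/inBucket
		}
		lowerBound, lowerCount = b.UpperBound, b.Count
	}
	return buckets[len(buckets)-1].UpperBound
}

func main() {
	// Made-up bucket counts for two replicas of the same app.
	replicaA := []Bucket{{0.1, 50}, {0.5, 90}, {1.0, 100}}
	replicaB := []Bucket{{0.1, 10}, {0.5, 40}, {1.0, 100}}

	merged := mergeReplicas(replicaA, replicaB)
	fmt.Printf("p90 across both replicas: %.3fs\n", quantile(0.9, merged))
}
```

A summary only exposes the already computed quantiles, and averaging two p90 values does not give the p90 of the combined traffic, which is the limitation described above.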
@@AntonPutra thanks!!
Would be nice to see Go (Fiber) vs Bun (Elysia)
ok noted!
and what would happen with Quarkus vs Spring Boot?
interesting, i'll test it soon
Thank you
my pleasure!
It would be interesting to see a test to failure, who and under what load will start throttling
yes will do with improved java next time
Looks like the s3 implementation isn't very optimized yet in Java. I do think that the test is a bit skewed, because the overall latency is mostly impacted by the task that took the longest. The S3 task took an order of magnitude longer than the SQL task, so therefore it's mostly the S3 task that determined the overall latency scores. The SQL task only had a negligible impact.
I'm interested to see several other task and see how they compare. Because of what I described above, you'd probably also want to scale the difficulty of each task so that the time for the fastest language to complete the task is the same for all tasks, so that they all contribute evenly to the total latency. That, or you could just use percentages for each task and show a total or average percentage for all tasks, instead of your total latency graph.
thanks for the feedback, i'll get back to Java soon