How Much Memory for 1,000,000 Threads in 7 Languages | Go, Rust, C#, Elixir, Java, Node, Python

ThePrimeTime

866K views

Published: Jan 16, 2025

Comments • 1.3K

@jonathan-._.- 1 year ago ⁺¹⁶²⁰
comparing actual threads with async tasks seems kinda weird
@ccgarciab 1 year ago ⁺¹⁷⁸
And workers and a plain event loop. Terrible all around.
@MikyLestat 1 year ago ⁺⁶⁷
They are not the same, but having async tasks is a powerful functionality that isn't available in all languages. It's true he wasn't comparing the same thing, but you could argue that he was comparing how you would achieve the same thing if you wrote it in each language
@lozanov95 1 year ago ⁺²⁸
@MikyLestat Depends, because with Python you will run on a single thread, but with Go, for example, you will use multiple threads. If you are actually computing anything this will make a significant difference.
@MikyLestat 1 year ago ⁺¹¹
@lozanov95 Exactly. I think that the reason for the comparison is to get an indication of how much memory (minimally) each programming language will use to achieve the same thing. Achieving the same thing in each language translates to using the features and constructs of each language. Python is a great language, but it isn't the fastest. The global interpreter lock (in addition to Python being interpreted in CPython) causes it to be slow.
Just because Python doesn't really have multi-threading, it doesn't mean we shouldn't use multi-threading/tasks in other languages and then profile the memory footprint.
@davidstephen7070 1 year ago ⁺²
@MikyLestat I think this is the wrong way to compare: languages that only run on a single thread vs. multi-threaded ones, to get the memory required to run those tasks. Garbage collectors have a feature to queue overloaded threads, so the fastest process means lower memory. And for tasks with a wide size range, say the first task is 20KB and the 70th task is 1MB, a higher initial heap size gives a better response than setting the initial size to 50KB and re-allocating. It all depends on the user's hardware whether to go the CPU way or the memory way: if memory is cheaper than CPU, go for memory; if CPU is cheaper, choose something like Go or Rust that re-allocates frequently
@thedoctor5478 1 year ago ⁺²²⁰⁹
Using Python's asyncio for this test was the wrong thing to do. It's similar to what was done with NodeJS. Asyncio is an event loop, not a thread. Python has threading libs for threads.
@Kobrar44 1 year ago ⁺¹⁰⁸
multiprocessing xD no need for a benchmark, it would be just atrocious
@nikonyrh 1 year ago ⁺⁶⁹
@Kobrar44 Yeah just run "multiprocessing.Pool(int(1e6))" and you are good to go :D Argh I hate python, but it is still my main language.
@just_a_random_ 1 year ago ⁺²⁹
@nikonyrh Just curious, why do you hate Python?
@magicbob8 1 year ago ⁺⁷⁴
But asyncio is faster because Python's multithreading is so bad, so it's what people use. And it accomplishes the same things
@ibrahimaba8966 1 year ago ⁺³⁴
this is an IO task so asyncio is the right solution!
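The distinction argued over in this thread (asyncio tasks share one event-loop thread, while `threading.Thread` creates real OS threads) can be checked directly. The following is a minimal sketch, not the benchmark's code: the helper names `idle_task`, `run_async`, and `run_threads` are made up for illustration, and the task counts are kept small so it runs quickly.

```python
import asyncio
import threading


async def idle_task(seen_threads: set[int]) -> None:
    # Record which OS thread executes this coroutine, then wait without blocking.
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(0.01)


async def run_async(n: int) -> set[int]:
    # Launch n asyncio tasks and collect the ids of the threads that ran them.
    seen: set[int] = set()
    await asyncio.gather(*(idle_task(seen) for _ in range(n)))
    return seen


def run_threads(n: int) -> set[int]:
    # Launch n real threads and collect their ids.
    seen: set[int] = set()
    lock = threading.Lock()
    barrier = threading.Barrier(n)

    def worker() -> None:
        barrier.wait()  # keep all threads alive at once so ids are not reused
        with lock:
            seen.add(threading.get_ident())

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


async_ids = asyncio.run(run_async(1_000))
thread_ids = run_threads(8)
print(f"asyncio: {len(async_ids)} thread(s) for 1000 tasks")
print(f"threading: {len(thread_ids)} thread(s) for 8 workers")
```

A thousand asyncio tasks report a single thread id, which is why their memory cost is not comparable to a thousand OS threads.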
@nunograca2779 1 year ago ⁺⁹⁵⁵
If I'm not wrong, C# uses a thread pool behind the scenes when using async/await, and what it does is recycle threads. That's why in the first test it was way higher than the others. I think that was the thread pool being initialized with a bunch of threads.
@dziarskihenk8798 1 year ago ⁺³⁸
this.
@3ventic 1 year ago ⁺⁶⁴
Yup. It always allocates a fixed size pool of managed threads depending on the system it's running on, unless you set the size yourself, which is possible and would be separately interesting for this benchmark.
@MikyLestat 1 year ago ⁺⁸⁸
@3ventic The ThreadPool default is much smaller, it shouldn't take 120 MB at idle. I'm betting he wasn't distinguishing between allocated and committed memory.
@GabrielSantAna-sm9zh 1 year ago ⁺³⁰
as far as I know, C# also compiles async methods to stateful classes, so it generates the states of each “step” of processing beforehand. When you create that number of tasks you are basically creating a list of super small instances in a queue for the thread pool to consume until the next state (await), and then they are thrown back to the end of the queue
@3ventic 1 year ago ⁺¹¹
@MikyLestat I was a bit mistaken, but there is a fixed minimum number of threads (ThreadPool.GetMinThreads). On my system it's 32 by default and the equivalent program on my system (1 task) takes up 195M RES 108M SHR while a million tasks is using 52 threads and 472M RES 23M SHR.
@hansenchrisw 1 year ago ⁺⁶³⁷
As a Java apologist, it first got virtual threads in 1997 with version 1.1 (edit: later removed and recently re-added in v19). Also, Java (and presumably .NET) pre-allocates a bunch of memory by default. Hence mem looks high for small numbers of threads and doesn't increase until you hit bigger numbers.
@Talk378 1 year ago ⁺⁶⁰
Yep, rare prime L
@elraito 1 year ago ⁺²⁶
Yes, but ran the same code AOT-compiled for C# and it's only 5MB baseline. The blog author misrepresented C# badly
@hansenchrisw 1 year ago ⁺³¹
@elraito no doubt, but I don't expect someone to be proficient at all those langs/runtimes.
@giuliopimenoff 1 year ago ⁺⁵
That's why they should have used Kotlin coroutines
@mishikookropiridze5079 1 year ago ⁺²
@elraito That's the variation introduced by running it locally.
@devotiongeo 1 year ago ⁺²⁷⁷
Creating a million concurrent "tasks" (or spawning processes as we call them in Erlang/Elixir) and allowing them to remain idle is one thing, while making those processes actually do something, such as each one of them having a persistent connection to a client and feeding it, is something entirely different. In practical terms, when it comes to real-time apps, the BEAM (Elixir/Erlang) outperforms all other languages by a significant margin.
This is precisely why Brian Acton and Jan Koum chose Erlang for WhatsApp after years of experience with Yahoo Messenger and Yahoo Chat Rooms. If someone hasn't had the opportunity to work with any BEAM language, the above statement may appear to them as an empty boast, and I can't blame them for that.
@ThugLifeModafocah 1 year ago ⁺⁵
But then this example needs to be done and shown to the world, as Primeagen is reacting to this. I'm surprised by Elixir's performance here... in a bad way.
@xbmarx 1 year ago ⁺⁴⁷
@ThugLifeModafocah I'm not. Erlang processes are completely isolated. COMPLETELY. Every "task" has a separate GC, memory space, everything.
@szymonbaranowski8184 1 year ago ⁺⁹
@xbmarx so if things crash, only those things crash; that's a feature in itself
@Aaku13 1 year ago ⁺³¹
The BEAM is pretty quick, but it won't "outperform all other languages by a significant margin". Ran several huge Elixir services in production with lots of traffic and our Go services were much more performant.
@osazemeusen1091 1 year ago ⁺⁸
@Aaku13 I can agree only for CPU-bound tasks. For IO-bound tasks, Golang doesn't come close in performance to Elixir
@shreyassreenivas4786 1 year ago ⁺¹⁵⁵
Go reserves 4K of memory for each thread's stack so you could do quite a bit of work on each of those threads without incurring further costs.
@demyk214 1 year ago ⁺⁶
Makes sense
@-rate6326 1 year ago ⁺⁸
goroutines aren't threads.
@tablettablete186 10 months ago
@-rate6326 Yeah, Go actually creates all threads at startup and just assigns goroutines to them.
All of this to say: it's a thread pool lol
@dejangegic 5 months ago
@-rate6326 Yes, goroutines aren't threads. But they do need to run at some point, and the ones that aren't running are just waiting and we aren't talking about them
@Malenbolai 5 months ago ⁺⁶
programming languages assuming that you would use the threads to do actual work
@TanigaDanae 1 year ago ⁺¹⁵⁷
One piece of information not mentioned in the video: async functions in C# are state machines, and Tasks (part of the Task Parallel Library) are automatically run on thread pools. So the only internal state these async functions have is the time they need to wake up, and all Tasks could theoretically have the same wakeup time.
I would've loved to see a C# Thread implementation. I suspect the C# compiler is optimizing redundant Tasks away since they lack any side effects.
@vitskr1 1 year ago ⁺¹⁰
Thread pool has like 512 preallocated threads, hence high memory usage in idle. Tasks are actually running, but max degree of parallelism is 8 (8-thread CPU), so there is practically nothing to allocate.
@q1joe 1 year ago ⁺²
@vitskr1 you can tune this, knowing your workload though. Some languages I feel didn't get the best showing here as the author isn't an expert in each one, which is understandable
@monad_tcp 1 year ago ⁺²
@vitskr1 Exactly what I suspected ruclips.net/video/WjKQQAFwrR4/видео.html . It's using the Server tuning; I think on Desktop the default is number of cores * 2.
@monad_tcp 1 year ago ⁺⁴
@vitskr1 512 threads * 512KB = 256MB. It's not that big of a deal for servers with lots of cores.
@bangonkali 1 year ago ⁺¹
@monad_tcp i agree. and irl if you plan to launch 1M concurrency you probably have the RAM to match. i still don't think many people do this in a single process anyway. probably better to distribute the workload to multiple servers. i recommend orleans 7 for c# devs. 😅
@casperes0912 1 year ago ⁺¹²⁹
There's also the memory vs. speed tradeoff. Sometimes keeping more things in memory can also make it faster. If the managed environments with a higher starting point in memory usage already have a bunch of kernel threads lying dormant in a thread pool, that takes up memory but speeds up spawning of threads.
@alephcake 1 year ago ⁺¹⁶
if my hello world doesn't use 27 gigabytes of ram i won't write it
@maximumcockage6503 1 year ago ⁺⁵
Yeah. Bun.js was priding itself on being faster than Rust in its beta. Then when it came out and people started benchmarking, it was slightly faster than Rust by like a few percent, but used 40 times more memory on average.
@bryanenglish7841 1 year ago ⁺³⁰⁰
You forgot the extra Rust thread it takes to track all the bullshit drama in the Rust community
@Marhaenism1930 1 year ago ⁺²³
oopsy! is it a new feature of crablang in 2023?
@BlackistedGod 1 year ago ⁺¹²
dammit why did I laugh so hard at this
@JensRoland 1 year ago ⁺²⁸
The Rust forums are just clogged with unproductive / outdated discussions that lead nowhere and make it harder to get anywhere as a community. The mods should simply go through all the threads once in a while and nuke the ones that are no longer relevant or helpful so the good stuff can get more space and everything would run smoother. Maybe they could even automate this with an LLM agent? They could call it “RustScheduledGarbageRemover”
@juniuwu 1 year ago ⁺¹⁴
@JensRoland Garbage Collector? BAN
@JensRoland 11 months ago ⁺¹⁶
@juniuwu banning people is just garbage collection for communities ;-)
@Deemo_codes 1 year ago ⁺²⁶²
Each Elixir process spawns with a 50k heap, and garbage collection happens on a per-process level (you don't stop the world, you stop a process). This is because the way processes are used in Elixir is like how microservices are used. Each process does a small amount of stuff then sends a message on to another service.
The Erlang VM that Elixir runs on will launch 1 scheduler per CPU and does pre-emptive multitasking. So if you had 1M processes doing stuff you would get each process executing for a few ms, then being switched out and added back into the queue that the schedulers pull from. So if you have more cores you get more parallelism; if you only have 1 core you still get concurrency.
Whereas async runtimes tend to be cooperative and require some form of explicit yielding from a running task, Elixir will just swap stuff out. Makes it good for soft realtime stuff. If you want to do CPU-intensive things you can delegate to NIFs (native implemented functions) written in C or Rust. The Rust ones tend to be safer since panics are caught and raised as errors in Elixir, whereas a crash in C will crash the whole VM
@Overminddl1 1 year ago ⁺²⁰
You can also specify the memory usage of a process on the BEAM VM, thus significantly reducing the amount of memory something will use whenever it's spawned and doesn't really allocate anything, like in this case
@madlep 1 year ago ⁺¹⁹
And to do a test closer to what some of the other runtimes are doing, just call :timer.send_after(10000, :done) a million times, and then do a loop to receive :done 1 million times. Takes about 200mb instead.
@genericjam9866 1 year ago ⁺⁶
Elixir / Erlang processes have far less memory by default. More like 256 bytes but depends on word size on your system iirc.
@nyahhbinghi 1 year ago ⁺³
really smart GC model! Elixir was very well designed
@nyahhbinghi 1 year ago ⁺²
I wouldn't compare it to microservices. I would just say Elixir processes are independent and don't share memory. Which really makes it unique (I don't know of another runtime like this except Node.js webworkers).
@Hallo503 1 year ago ⁺¹⁸⁵
C# has the lowest memory usage because it is using the thread pool, which recycles blocking threads, like when calling Task.Delay. So there aren't actually a million threads created; rather, they are queued into the thread pool. To avoid this, create the threads explicitly
@user-qu5cc5oe2h 1 year ago ⁺⁷⁹
pff... everyone knows that c# offloads 50% of tasks on Azure servers
@dieSpinnt 1 year ago
@user-qu5cc5oe2h ROTFL.
As a first time viewer I asked myself if ThePrimeTime is always on that level of cocaine?
Well, it's something different than other coding channels. A fresh breeze, so to say .... **g**
@muaathasali4509 1 year ago
@user-qu5cc5oe2h free compute hack
@qendrimimeri8561 1 year ago
@user-qu5cc5oe2h 😂
@gregorymorse8423 11 months ago ⁺⁶
No shit, Sherlock, all of the languages were using threadpools except Java and Rust with real worker threads. So you've failed to uniquely qualify C# altogether.
@ThePhoenixProduction 1 year ago ⁺⁴⁶³
Where is C++?
@ErickBuildsStuff 9 months ago ⁺⁴²
No one cares 😅
@ThisIsMaddock 9 months ago ⁺³⁴⁴
@ErickBuildsStuff Ah yes, no one cares about one of the most important and influential programming languages of all computing history
@InternetExplorer687 9 months ago ⁺¹²⁷
@ThisIsMaddock I'd argue that C is more influential but yeah, saying no one cares about the language most used in most performance-critical applications, that also need low-level access to memory, is a really big stretch.
@jstro-hobbytech 9 months ago ⁺¹⁴
This guy reminds me of yongyea. Parrots others' work and makes more than the authors combined. He has no insight or original opinions or educated insight (from experiences academic or otherwise).
I hate how people raise this guy up.
Agreed on C++. That's my personal preference as I like the syntax, being that I learned it the same term I took COBOL, Java (when it was new) and Visual Basic, and OOP was still being defined.
I've never worked in industry as a programmer but keep up to a middling ability.
One thing I do know is that bullshit always smells like bullshit and this dude is full of it. People that talk during react videos do so only to fall under fair use; I see the same here transposed to a topic in which he is a novice. Want for choice as mediocrity's excuse is no less evident than an untrained hand on display for no person's betterment or an opiate of excuse to be subject for one not turning to their purpose.
I'm as wrong as apt to be right so there's that as well.
@AOSP-is-still-Linux 8 months ago ⁺⁵
@jstro-hobbytech I personally use Rust as it keeps some of the C++ syntax and adds on top of it to prevent common mistakes.
@chigozie123 1 year ago ⁺¹⁶
The Go results are not surprising. It's a well-documented feature that each goroutine starts with a pre-allocated initial stack. Prior to Go 1.2, it was 4kb, then it went to 8kb, and I believe it's now at 2kb for Go 1.4+.
So 2kb × 10k means an additional 20mb on start. At 100k, it means a minimum of 200mb on start.
The math seems pretty consistent with the results we see for Go, although they seem to suggest that the initial stack size may be closer to 2.7kb than 2kb.
We also have to keep in mind that there is a garbage collector running in there, and we didn't account for how much memory it requires to keep track of everything going on.
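The back-of-the-envelope math in this comment can be written out as a short calculation. This is only a sketch of the arithmetic, assuming the 2 KiB initial stack the commenter cites; the function name `min_stack_bytes` is made up here, and the result is a lower bound that ignores the scheduler, the garbage collector, and any stack growth.

```python
KIB = 1024
MIB = KIB * KIB


def min_stack_bytes(goroutines: int, stack_bytes: int = 2 * KIB) -> int:
    # Lower bound only: initial stack size times the number of goroutines.
    return goroutines * stack_bytes


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} goroutines -> at least {min_stack_bytes(n) / MIB:,.1f} MiB of stacks")
```

With binary units the 10k case comes to about 19.5 MiB, which matches the commenter's rounded 20mb.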
@diadetediotedio6918 1 year ago ⁺²⁹¹
C# was the winner, clearly everybody was expecting this
@sanampakuwal 1 year ago ⁺⁷
yes
@shreyasjejurkar1233 1 year ago ⁺³⁰
Of course, kudos to .NET runtime team! 😎
@mattymerr701 1 year ago
Clearly they fucked their setup
[Insert cope here]
To be fair, they did fuck it but...
@cnikolov 1 year ago ⁺⁷
Running as AOT has an even smaller footprint
@FilipCordas 1 year ago ⁺¹⁵
Also he wasn't using ValueTask, which reduces memory consumption considerably. But I hate tests like this because a compiler could remove everything, because the code isn't doing anything.
@Trekiros 1 year ago ⁺⁴⁸
Intro: let's not compare apples to potatoes
The rest of the video: compares making threads with maintaining an event queue
@MyriadColorsCM 1 year ago ⁺⁴²
9:30 - In the 19th century the German mathematician Georg Cantor proved that there must be more than one kind of infinity, such as the infinity of the natural numbers, and the infinity of real numbers and so on, and that some infinities are larger than others. The smallest infinity is that of the natural numbers, and it's called Aleph Zero.
So yes, Buzz can indeed go to infinity and beyond, so long as it is mathematical infinity.
@ko-Daegu 1 year ago ⁺³
pretty cool, i remember studying this part of set theory and Aleph (first letter of the alphabet in Arabic): the idea is that the set of natural numbers (1, 2, 3, ...) has the smallest infinite cardinality and is denoted as Aleph Zero (ℵ₀)
@JamieNeubertPedersen 1 year ago ⁺¹
Thanks. Was thinking the same.
@drtfsghdfghdgfshdgfhdgfhdg 1 year ago ⁺³
Nothing "and so on". That is not clear. In fact it can neither be proven nor disproven with standard mathematics. It is called the continuum hypothesis
@mykhailonikolaichuk6392 1 year ago
@drtfsghdfghdgfshdgfhdgfhdg The continuum hypothesis is that there are no intermediary infinities between "infinity of integers" and "infinity of reals". It is, indeed, but an axiom. However, the cartesian product of a set with itself ALWAYS yields a set with higher cardinality, so infinitely many distinct infinities can be constructed by the repeated usage of it.
@d7ffab979 1 year ago ⁺²
@mykhailonikolaichuk6392 That is just wrong. Infinite cartesian products of natural numbers, for example, are "just" rational numbers.
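On the disputed point: the construction that always gives a strictly larger cardinality is the power set (Cantor's theorem), not the cartesian product. The product of the natural numbers with themselves is still countable, and the standard witness is the Cantor pairing function, which maps every pair of naturals to a unique natural with no gaps. A minimal sketch follows; the names `pair` and `unpair` are chosen here for illustration.

```python
import math


def pair(a: int, b: int) -> int:
    # Cantor pairing: walk the diagonals a + b = 0, 1, 2, ... of the grid.
    return (a + b) * (a + b + 1) // 2 + b


def unpair(z: int) -> tuple[int, int]:
    # Inverse of pair: find the diagonal w, then the position along it.
    w = (math.isqrt(8 * z + 1) - 1) // 2
    b = z - w * (w + 1) // 2
    return w - b, b


# Every pair on the first 20 diagonals lands on a distinct number 0..209,
# and unpair recovers the original pair.
codes = sorted(pair(a, b) for a in range(20) for b in range(20 - a))
print(codes[:10], len(codes))
print(unpair(pair(3, 4)))
```

Because the mapping is a bijection, pairs of naturals (and, by repetition, any finite tuple) have the same cardinality ℵ₀ as the naturals themselves.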
@W1ngSMC 1 year ago ⁺⁷⁴
To be fair, Elixir is spawning new processes with their own memory and PID (inside the VM).
@isaacyonemoto 1 year ago ⁺²⁸
And also providing stuff for graceful restarts and an entire message queue
@BosonCollider 1 year ago ⁺¹⁵
And preemptive scheduling, if any one of them fails or blocks indefinitely it cannot take the rest down with it.
@sukidhardarisi4992 11 months ago ⁺¹
Using Task.async in Elixir comes with a lot of boilerplate that is wrapped on top of GenServer. If the test has to be performed for concurrent tasks, one could go with primitives like spawn, send and receive in order to see the true potential. Just my opinion on why Elixir used a lot of memory.
@gregorymorse8423 11 months ago ⁺²
It's not doing anything. The Erlang process concept has nothing to do with threading. Sure it explains the memory usage, but there are ways to pool it so a maximum number of processes could be spawned at any time.
@markusn4614 1 year ago ⁺¹³⁸
That C# method has 2 extra layers, the code inside the for loop should just be tasks.Add(Task.Delay(TimeSpan.FromSeconds(10)));
@Eirenarch 1 year ago ⁺²¹
This 👆
They created threads to run their threads inside
@PetrVejchoda 1 year ago ⁺¹
@Eirenarch No it should not. If you did it the way you describe, the work (in this case represented by Task.Delay) would not be scheduled on the TaskScheduler and would instead be done on the thread that this code is running on, thus blocking it and not using CPU cores to their fullest.
If anything, it should be Task task = Task.Run(Task.Delay(TimeSpan ...)); tasks.Add(task); This would save some memory while still scheduling the work on worker threads.
I am not sure if there would be any benefits if you used TaskFactory and Scheduler directly, or whether it would be more performant, but I highly doubt it.
Task itself is a glorified coroutine and job child. It's just a promise of an action that can wait for other actions to complete. Task.Delay does not do anything with scheduling or threading. It just writes a timestamp and deposits the Task to run later, when the proper time has come. But it would not start a new thread/virtual thread/Task/Coroutine. Since they are trying to figure out how costly scheduling a new thread/virtual thread/Task/Coroutine is, this would not do the work.
@manpt123 1 year ago
c# and you are the 2 most useless stuffs
@FilipCordas 1 year ago
Also I don't see value tasks and the list doesn't have a buffer set.
@taqial-faris6421 1 year ago ⁺⁹
I was looking for this comment. Guy who created that blog clearly knows nothing since he is using chatGPT and chatGPT also knows nothing if it outputs that kind of code... But hey, even my 'senior' coworker used to write async code like that so who am I to judge.
@NameyNames 1 year ago ⁺⁴⁴
As likely already pointed out, C# uses a thread pool, and will definitely not create a gazillion threads in this test, and the memory required to house all of these insignificant tasks will be very small, which is apparent in the test results.
I tried it out in LinqPad, but with one additional task whose only purpose was to keep track of the number of simultaneous threads actually in use. For 1 million tasks, the actual active thread count peak never even exceeded 50 on my system (usually much lower). No wonder, when all that the tasks are "doing" is async-waiting on a delay.
This benchmark is broken in the sense that it doesn't really do what the author thinks it does, i.e. it does NOT create a lot of threads (virtual or otherwise) in all languages/runtimes, and measuring the memory usage is thus close to pointless.
@aoi0s9 5 months ago
as
@baxiry. 1 year ago ⁺¹⁵
There is some important information not mentioned in the article. Goroutines are compared to threads, whether real or virtual, but they are not compared to an event loop. Go has event loop libraries, and since the author of the article has used the event loop in other languages, he should also use it in Go to ensure an unbiased comparison.
Additionally, the advantage of goroutines over threads is their portability; they do not depend on the operating system. If your application requires low-level operation, such as with chips or microcontrollers that do not have an operating system, a goroutine can still be executed. This is not possible with threads, as the language does not perform the task; the operating system does. Where there is no operating system, there are no threads.
One last thing: when an application uses system threads, the system reserves memory. The question is: Did the author of the article account for the memory reserved by the system?
@metaphysicalconifercone182 1 year ago ⁺¹¹⁰
I wonder why Kotlin wasn't included. I guess it does share similarities with Java and Go, but its implementation of coroutines is supposed to be different from that in Go. I guess testing it would also have to include both JVM and Native compile targets because you never know.
@avalagum7957 1 year ago ⁺⁶
If you include the kotlinx library, you should add Scala Actor, ZIO ... too.
@DeliOZzz 1 year ago ⁺⁵
@avalagum7957 the suspend keyword and channels are part of the standard Kotlin library. The coroutines package includes coroutine builders and stuff like flows.
For some reason Prime just ignores Kotlin altogether :/ But I'd really like to watch some quality Kotlin roast.
@sharkpyro93 1 year ago ⁺⁸
@DeliOZzz cause it's not a popular choice for backends, a lot of people still think Kotlin is only for Android, I'm afraid this stigma will stick around for the time being
@AlanPCS 1 year ago ⁺⁵
It runs in the same VM. At most it would be equal to a competent implementation in Java only.
@wlockuz4467 1 year ago ⁺³²
It should've been "To infinity and NaN" as an homage to JavaScript.
@Jmcgee1125 1 year ago ⁺¹⁹
15:11 Python, by default, only uses one worker thread. When writing asyncio code you do need to be careful that you don't block. My understanding is that each event loop may have only one worker, but I'm not experienced enough to be confident in saying that.
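The "be careful that you don't block" point can be demonstrated with a small experiment. This is a minimal sketch with made-up helper names (`ticker`, `blocking_pause`, `cooperative_pause`, `ticks_during`): a background task ticks every 10 ms, and the script counts how many ticks happen while another coroutine pauses for 100 ms, once with a blocking `time.sleep` and once with `await asyncio.sleep`.

```python
import asyncio
import time


async def ticker(ticks: list[float]) -> None:
    # A well-behaved background task: wakes every 10 ms and records the time.
    while True:
        await asyncio.sleep(0.01)
        ticks.append(time.monotonic())


async def blocking_pause() -> None:
    time.sleep(0.1)  # blocks the single event-loop thread; nothing else runs


async def cooperative_pause() -> None:
    await asyncio.sleep(0.1)  # yields to the loop while waiting


async def ticks_during(pause) -> int:
    ticks: list[float] = []
    task = asyncio.create_task(ticker(ticks))
    await asyncio.sleep(0)  # let the ticker start and reach its first sleep
    await pause()
    count = len(ticks)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return count


blocked = asyncio.run(ticks_during(blocking_pause))
cooperative = asyncio.run(ticks_during(cooperative_pause))
print(f"ticks during blocking pause: {blocked}")
print(f"ticks during cooperative pause: {cooperative}")
```

The blocking pause starves the ticker completely, while the cooperative pause lets it run several times; the exact cooperative count depends on timer resolution.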
@stevenhe3462 1 year ago ⁺¹⁷
Elixir reserves 4KiB of RAM for each of its processes. Each process in Elixir has its own separate heap to eliminate the possibility of stop-the-world GC.
@llothar68 1 year ago ⁺²
Each Linux kernel thread needs 32kb (28kb of it is non-swappable physical kernel stack space) + 1kb for kernel structures.
@andzagorulko 1 year ago ⁺²³
C# has threads. Benchmarking Tasks instead is just confusing, because those aren't threads.
@pavelyeremenko4640 1 year ago ⁺⁶
As you may have noticed, he's benchmarking green threads (tasks in C#, goroutines in Go, etc.) across the languages.
@carlinhos10002 1 year ago ⁺⁶
C# does not have green threads. Tasks are not green threads
@pavelyeremenko4640 1 year ago ⁺²
@carlinhos10002 Now that I've re-read the definition of green threads, I'm not sure how they aren't. They are not OS managed. They are lightweight thread-like primitives managed by the runtime. What are they missing?
Wikipedia also lists them as such on en.wikipedia.org/wiki/Green_thread
Not sure if this is as important though, every language in the list was using its concurrency primitive built on top of some managed pool anyway.
@metaltyphoon 1 year ago ⁺²
@pavelyeremenko4640 he's just making things up. Most implementations are using some abstraction over OS threads. Only one of the Java versions and one of the Rust versions don't do that.
@zephyrprime 8 months ago ⁺¹
C# tasks use a thread pool to execute. But one thread can have multiple tasks waiting simultaneously, and the code this guy used had each thread sleeping for several seconds
@SirBearingtonSupporter 1 year ago ⁺²¹
You actually pointed this out early on. In the Java and C# versions, he uses "ArrayList" without specifying the size.
ArrayList in both these languages holds an actual Array object. It's why the lookup time for "get" is a memory address lookup time.
When Java needs to expand the array size, it creates a larger array that is twice the size of the current array size. I believe the default is 10.
Java also doesn't run the garbage collector unless it needs to be run or is specifically invoked with System.gc.
Because the JRE doesn't plan ahead for your bad code, it just looks for a new place to put the object in memory, leaving all the old references that need to be deleted alone, because the GC will deal with them as needed.
Just to recap, there are several ArrayList objects each holding an array of size n (below) in memory, and if the JVM is given enough memory, all 11 of these will still be there.
So that means there are 20510 threads in memory on the test.
While his approach to joining all the threads was barbaric, it's also the accepted answer on StackOverflow; we are not measuring the speed of the execution, just the memory of it.
If you were not trying to measure the memory performance of threading in different languages, I would actually give Java more threads to manage the threads (parallelize stream).
Final thoughts,
We aren't concerned about thread space in production equipment, we are concerned about execution time, and if my entire program hangs because one calculation couldn't be done, I'm missing out on something important: it could be a trade, moving a servo for a robot (self-driving cars) or producing an input for a chess game. Collecting the information that I can allows me to implement an algorithm that is capable of making educated guesses based on what was calculated.
If we do care about thread space, we would be better off doing single-threaded applications since we don't have an overhead associated with the effing cost of the thread.
TL;DR
Something something short equal something something int because the JVM go fast blah blah addresses blah blah blah 4. (primitive array blah blah addresses, blah blah)
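The growth behavior described in this comment can be simulated to see how many backing arrays a list goes through on its way to a million elements. This is a rough model, not the JVM's or .NET's actual code: the function name `growth_steps` is made up, the doubling policy follows the commenter's description, and the 1.5x policy is included as an assumption about how OpenJDK's ArrayList grows (old capacity plus half of it). Note also that the backing array stores references, so discarded arrays cost pointer slots rather than extra thread objects.

```python
from typing import Callable


def growth_steps(target: int, initial: int = 10,
                 grow: Callable[[int], int] = lambda c: c * 2) -> list[int]:
    # Return every backing-array capacity allocated until `target` elements fit.
    caps = [initial]
    while caps[-1] < target:
        caps.append(grow(caps[-1]))
    return caps


doubling = growth_steps(1_000_000)
one_and_half = growth_steps(1_000_000, grow=lambda c: c + (c >> 1))

for name, caps in (("2x", doubling), ("1.5x", one_and_half)):
    print(f"{name}: {len(caps)} arrays allocated, final capacity {caps[-1]:,}, "
          f"{sum(caps[:-1]):,} slots in discarded arrays")
```

Passing the expected size up front (for example `new ArrayList<>(1_000_000)` in Java) avoids all of the intermediate arrays, which is the point the commenter is making about unspecified capacity.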
@igordasunddas3377 1 year ago ⁺⁵¹
Man I am allergic to empty catch blocks in Java - always. After looking for exceptions that have never been rethrown or really handled, I am really on the fence. Empty catch blocks should not exist or even be allowed...
@gregorymorse8423 11 months ago
You are allergic to using your brain, yes we know. Maybe if you knew what checked and unchecked exceptions are and stopped making dumb comments. This is why you should stop the drugs and go back to school, fool
@albertmagician8613 8 months ago ⁺¹
I have no problems with empty catch blocks, as long as my compiler is allowed to optimize them away.
@5374seth 7 months ago
I'm allergic to exceptions. I will wrap all my code with empty catch blocks to further mayhem and until everyone else is conditioned to hate exceptions too.
MWAHAHAHAHAHAHHAH
@madlep 1 year ago ⁺²³
The Elixir solution has a LOT of room to squeeze out. I can get it running in about 990mb with some tweaks. Main thing is the default heap size. Passing `+hms 1` as part of `erl` options sets default size to 1 4-byte word. Also, using plain spawn calls instead of Task (which accumulates results, and adds extra memory and GC and processing overhead) reduces it further.
@mennol3885 1 year ago ⁺⁴
True, but as long as the "threads" don't actually do anything it is a useless comparison. The constructs on these platforms all provide a different feature set, so comparing performance is bogus. I mean a C# Task is just one or a few objects waiting in several queues to be invoked by native threads in the thread pool with a job-stealing algorithm. NodeJs and Python are single threaded with a single event loop. I don't know what the others do and give you for free, but this isn't apples to apples.
(Edit: I automatically type thread with a capital T)
@madlep 1 year ago ⁺¹⁰
@mennol3885 Yup. The comparison is pretty meaningless. The "cheap", non-idiomatic Elixir way to do this would be to start 1,000,000 timers, and wait for them to finish. Effectively doing the same thing as some other platforms. I just tried that - uses about 200mb in total of memory.
If all it's doing is starting something that sits there idly for 10 seconds, there isn't much difference.
No point carting round a whole isolated separate stack and heap for each process, and associated housekeeping. Elixir processes are cheap, but they're not *that* cheap.
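The "start a pile of timers and wait for them" shape that madlep describes is roughly what the event-loop runtimes in the benchmark end up doing. As a loose analogy only (this is Python's asyncio, not Elixir's `:timer.send_after`), the sketch below schedules plain timer callbacks without creating a task per timer; `wait_for_timers` is a made-up name, and the count and delay are scaled down so it finishes quickly.

```python
import asyncio


async def wait_for_timers(n: int, delay: float) -> int:
    # Schedule n timer callbacks on the running loop and wait until all fire.
    if n == 0:
        return 0
    loop = asyncio.get_running_loop()
    all_done: asyncio.Future[int] = loop.create_future()
    fired = 0

    def on_timer() -> None:
        nonlocal fired
        fired += 1
        if fired == n and not all_done.done():
            all_done.set_result(fired)

    for _ in range(n):
        loop.call_later(delay, on_timer)  # one small timer handle each, no stack
    return await all_done


fired_count = asyncio.run(wait_for_timers(10_000, 0.05))
print(f"{fired_count} timers fired")
```

Each pending timer here is just a small handle in the loop's schedule, which is why this style costs far less memory than a process or thread with its own stack and heap.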
@smallfox8623 1 year ago ⁺⁸⁸
i'm ready for the C# arc let's go, it has a really bad reputation that is totally undeserved these days
@reddragon2358 1 year ago ⁺⁶
True.
@MH_VOID 1 year ago ⁺¹
My personal hate for it came from the pain of trying to use it in my SW dev course on linux compared to those windoze fags who have first class support for everything, and from missing a bunch of the things I love about Rust when doing C# (e.g. immutable by default, f, u, i (though byte is fine and I guess using "long", "short", etc. isn't really bad. more just personal preference and more efficient), match, traits, enums, macros! True some of these stuff are to a decent extent available in C#, but the.. culture doesn't use them primarily like Rust does). But the language itself genuinely looks pretty nice, and has some nice features and shit even over Rust. I'm definitely comfortable calling the language "better Java", and would be okay programming in it professionally or even hobbyistically.
@reddragon2358 1 year ago ⁺²
@MH_VOID Yeah. Rust is a very intriguing language (excluding the dramas and BS). Also things should be a lot better than before. Although there is still some Windows/Microsoft bias in the language.
@sohn7767 1 year ago ⁺²¹
I think C# is great honestly. Not the best in anything, but it's good in many areas
@reddragon2358 1 year ago ⁺¹
@sohn7767 Yeah agree. And I think that is its main strength. That it can be used for everything.
@autismspirit Год назад ⁺⁵⁷
tbh the C# number kind of makes sense, it scales incredibly well, especially in later .NET versions. Some C#-based fancy Unity optimizations can beat out GCC in raw speed and memory.
@autismspirit Год назад ⁺⁶
Granted, there is probably some optimization going on in Release mode, since it's not doing anything. I'd expect the memory consumption to be higher, but not 4GB high.
@marcossidoruk8033 Год назад ⁺¹¹
What do you mean by "beating GCC"? Last I checked, GCC was a compiler.
@CorvinhoDoMal Год назад ⁺⁶
@@marcossidoruk8033 yeah, the optimizations are made by the compiler. He meant the C language, but specifically with GCC. If you used the Microsoft compiler or other options you would get different performance.
@marcossidoruk8033 Год назад ⁺¹⁷
@@CorvinhoDoMal No way C# is going to beat carefully written C code in any imaginable benchmark ever, it's just impossible.
Plus what he said makes no sense. "Unity optimizations"? How do you compare C# Unity performance with C Unity performance if you can't write Unity scripts in C? Am I going crazy or what.
And if he means the engine, that is written almost entirely in C++.
@janus798 Год назад ⁺¹⁴
@@marcossidoruk8033 Google the Unity Burst compiler. Faster than GCC in fibonacci and NBody simulation.
@robfielding8566 Год назад ⁺²⁰
Go is definitely not a memory hog, at least for IO-intensive tasks. The main thing is that the Go libraries are always very careful to stream large inputs rather than buffer them in memory. Java itself doesn't really have major memory issues beyond spawning threads, but in any large Java project the code will be full of things being buffered into arrays rather than being streamed. I tried rewriting netty to make it stop doing dumb things, and then just switched (permanently) to Go. Part of Java's problem is also the legal issues of shipping a JVM, and the existence of Oracle thumb-breakers and lawyers who come to punish you for shipping.
@nenadvicentic Год назад ⁺³
The C# code was not written correctly. The snippet wraps one task in another, `Task.Delay(...)` inside `Task.Run(...)`, so a million iterations create 2 million tasks, with every second task wrapped in another one. Correctly written code would have consumed ~176MB on .NET 6.
This was enough to create a single task: `tasks.Add(Task.Delay(TimeSpan.FromSeconds(10)));`
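The doubling effect is easy to see in any task-based runtime. The sketch below is a Python asyncio analogue, not the C# from the article: `wrapped` and `count_live_tasks` are hypothetical names, and `asyncio.sleep` stands in for `Task.Delay`. It counts how many task objects are alive while the sleepers are waiting, with and without the extra wrapper.

```python
import asyncio


async def wrapped(delay: float) -> None:
    # Outer task whose only job is to start and await an inner task.
    await asyncio.create_task(asyncio.sleep(delay))


async def count_live_tasks(n: int, wrap: bool, delay: float = 0.05) -> int:
    """Start n sleepers and report how many task objects are alive mid-sleep."""
    make = wrapped if wrap else asyncio.sleep
    tasks = [asyncio.create_task(make(delay)) for _ in range(n)]
    # Yield twice so the outer tasks and then the inner tasks get to start.
    await asyncio.sleep(0)
    await asyncio.sleep(0)
    live = len(asyncio.all_tasks()) - 1  # exclude the task running this coroutine
    await asyncio.gather(*tasks)
    return live


if __name__ == "__main__":
    print("direct :", asyncio.run(count_live_tasks(100, wrap=False)))
    print("wrapped:", asyncio.run(count_live_tasks(100, wrap=True)))
```

The wrapped variant keeps twice as many task objects alive for the same amount of waiting, which is the overhead the comment is pointing at.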
@davidramziz3200 20 дней назад ⁺¹
Apparently a lot of it was written using ChatGPT, so it makes sense.
@Bourn77 Год назад ⁺⁶⁵
C# master race. Let's go.
The .NET team has been optimizing the fu*k out of the stack for a few years.
Hands down the best api backend language to work with. 🥰
@reddragon2358 Год назад ⁺⁷
I hope that it becomes so good that it could be used as a full-stack language.
@BosonCollider Год назад ⁺²
@@reddragon2358 It does work fairly well together with HTMX
@reddragon2358 Год назад
@@BosonCollider Oh, glad to hear. Java, for example, can be used for full-stack development with the help of Java frameworks.
@mishikookropiridze5079 Год назад ⁺³
@@reddragon2358 That produces horrendous UI. WASM could be the future.
@reddragon2358 Год назад ⁺²
@@mishikookropiridze5079 I heard that C# has UI frameworks. I hope that they get better with time.
@R4ngeR4pidz Год назад ⁺³¹
You're 100% right about the complexity of the task.
But also, I would have stopped reading after they said they used ChatGPT to come up with the code.
You need to have these contributed by people that actually write this language and that actually understand this language.
The ambiguity between what the code was actually doing in all of these was horrible, as other commenters have also pointed out.
@quachhengtony7651 Год назад ⁺¹³
C# fan bois are eating good these days
@reddragon2358 Год назад
Yup
@bahtiyarozdere9303 Год назад ⁺²
Thank you for sharing and commenting on this one. I would love to see C# with AOT compile. I believe it would make a huge difference.
@dipi71 Год назад ⁺⁴
Erlang, a language used in telecommunications, still seems to be the concurrency champion (according to a book by Röhrl and Schmiedl called »Produktiver programmieren«, I've read it in German a while ago).
@iforgot669 Год назад ⁺¹⁵
C# now has Native AOT, which would have significantly improved the memory footprint of this.
@SurvivalGamingyt Год назад ⁺⁸
Yeah, 7.4 MB for just a standalone release-mode app.
@sgbench Год назад ⁺¹
Also trimming
@FilipCordas Год назад ⁺²
@@sgbench ValueTasks and adding a buffer size to the list will help.
@CeleChaudary 11 месяцев назад
@@FilipCordas That's a good point
@_daniel.w Год назад ⁺⁶
I'm curious about C, C++ & Zig.
Also, I love Go. What happened, why did it end up using so much memory? Kinda sucks
@_daniel.w Год назад
@nósferratu Oh, alright.
I was watching chat go by and someone mentioned Go is stackbased or something along those lines.
Thanks for the info 👍
@hvaghani Год назад
@nósferratu right I was going to comment the same and found this
@scotter7663 Год назад ⁺¹
The C# implementation is completely bogus compared to the others. It's using a small thread pool (Task.Run) to set a bunch of timers (Task.Delay); that's why it shows low memory usage. This is not demonstrating concurrency.
If the implementation did a Thread.Sleep or used real threads, the results would be completely different and probably worse than Java, since C# doesn't have virtual threads.
In the real world Go runtimes will have considerably less memory overhead than C# or Java
@scotter7663 Год назад ⁺¹
@@_daniel.w Go has timer functions such as time.AfterFunc that look similar to what's used in the C# impl. Rework the Go implementation to use one of them and I suspect it will perform drastically better.
@kooraiber Год назад ⁺¹⁴
My man hates C# so much, it's hilarious! To be fair though I agree with everything you said and would love to see your benchmarks about this topic.
@sanjayidpuganti Год назад ⁺¹⁶
@@cethien I love C# but hate MS. I use Rider and Linux to code in my personal time and I like it. I think it's very good for API development.
@DaddyFrosty Год назад ⁺⁶
@@cethien VS sucks, Rider rules. I do also hate Microsoft but it’s a good language nonetheless
@pavelyeremenko4640 Год назад ⁺¹
@@cethien I've been developing C# on Linux and macOS for a couple of years now using Rider (I just like it more, but Visual Studio is also fully cross-platform).
I don't personally enjoy the language as much nowadays but the tooling is great whatever platform you pick.
@DaddyFrosty Год назад
@@pavelyeremenko4640 last time I used visual studio on mac it was only for Xamarin
@ko-Daegu Год назад
@@cethien I loooove writing Razor components 🤓
// MyComponent.razor
@using Microsoft.AspNetCore.Components
@Title
@Message
@code {
[Parameter]
public string Title { get; set; }
[Parameter]
public string Message { get; set; }
}
the fuck is this shit
@boredstudent9468 Год назад ⁺⁸
He said he launched 1 Task. As soon as you start one async task, C# (in .NET 6) already sets up all the thread pool stuff and access control. For such simple instances you should use threads in C#. Afaik it greatly improved with .NET 7. But in exchange you are prepared to scale incredibly. Also, yeah, the .NET runtime does some incredibly smart magic in the background, e.g. have a look at LINQ performance in .NET 7.
@metaltyphoon Год назад
CAS is not a thing anymore in dotnet core world.
@sgbench Год назад ⁺¹
@@metaltyphoon CAS?
@rroscop Год назад
Can you really run 1 million C# threads?
@boredstudent9468 Год назад ⁺¹
@@rroscop on my hardware no problemo, remember that they are way more like go routines than like hardware threads, so only a dozen is actually working in parallel, the rest is just queued.
@rroscop Год назад
@@boredstudent9468 nice. Are you talking about System.Threading.Thread's? Or tasks run via Task.Run()?
my understanding was that Task.Run() used a thread pool under the hood, but real Threads were more heavyweight. I'm not a C# developer though, just dabbled
@ringishpil Год назад ⁺²⁴
Go's initial stack size is (I think) 4KB per goroutine, and it grows/shrinks as needed; I'm not sure what the actual minimum is. Therefore the ~2GB in Go is not surprising. So in 3GB of memory you can put 1 million, 10 million, and probably even 20 or 30 million goroutines; their stacks will just shrink. With the example from Piotr you can probably fit even more, since these are very simple routines that consume almost no memory. But as I said, I'm not sure what the minimum stack size consumed by a goroutine is. It's less than 4KB for sure (in your example, 2.8GB/1_000_000 = 2.8KB). My guess is that it isn't shrinking below this because there is enough memory available.
Anyway you put it nicely, this is not a real world test, TCP/Websocket connection would be much better
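The per-goroutine figure in the comment above is just total memory divided by the number of goroutines. A small sketch of that arithmetic in Python, using decimal units (1 GB = 10^9 bytes) as the comment does; `bytes_per_unit` is a helper name made up for this example, and the 2.8 GB total is the number quoted in the comment, not something measured here.

```python
GB = 1_000_000_000
KB = 1_000


def bytes_per_unit(total_bytes: float, units: int) -> float:
    """Average footprint of one goroutine/task given the total process memory."""
    return total_bytes / units


per_goroutine = bytes_per_unit(2.8 * GB, 1_000_000)

if __name__ == "__main__":
    print(f"{per_goroutine / KB:.1f} KB per goroutine")
    print("below a 4 KB stack:", per_goroutine < 4 * KB)
```

Keep in mind this is an average over the whole process, so it also includes runtime overhead that is not part of any goroutine's stack.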
@Rakstawr Год назад ⁺¹
Go test here was completely misrepresented by non optimized garbage collection settings and not profiling how much of that was colored for deletion.
@om3galul989 Год назад ⁺³
yea node example is not spawning threads, it's just placing tasks on the timeout callback queue of the eventloop to be executed later using the main thread.
@metaleggman18 5 месяцев назад ⁺¹
Infinity and beyond is mathematically sound because some infinities are larger than others. The odd or even natural numbers are not an example of that, though: they can be mapped one-to-one onto the natural numbers (n ↔ 2n), so both sets are countably infinite and the same size, even though it feels like the naturals should have double the numbers. The standard example of a larger infinity is the real numbers, which cannot be put in one-to-one correspondence with the naturals.
There's likely more important infinities to consider, and I might have explained that wrong or poorly, but most definitely there is more than just a single, simple infinity.
@tofaa3668 Год назад ⁺³
The issue with the Java threads, I feel, is not preallocating the ArrayList: every append checks the capacity and, when the backing array is full, allocates a new, larger array and copies everything over. In this case that would be a whole lot of discarded arrays in memory for the GC to collect.
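To put a rough number on that, here is a simplified model of the growth policy in Python. It assumes a default capacity of 10 and growth to about 1.5x the old capacity whenever the array is full, which is how OpenJDK's ArrayList behaves; `arraylist_growth` is a hypothetical helper, and the model ignores details such as the lazily allocated empty array.

```python
def arraylist_growth(appends: int, initial_capacity: int = 10) -> tuple[int, int]:
    """Return (reallocations, elements_copied) for `appends` add() calls."""
    capacity = initial_capacity
    size = 0
    reallocations = 0
    copied = 0
    for _ in range(appends):
        if size == capacity:
            # Grow to ~1.5x, but always make room for at least one more element.
            capacity = max(capacity + (capacity >> 1), size + 1)
            reallocations += 1
            copied += size  # the old contents are copied into the new array
        size += 1
    return reallocations, copied


if __name__ == "__main__":
    print("default capacity:", arraylist_growth(1_000_000))
    print("preallocated    :", arraylist_growth(1_000_000, 1_000_000))
```

Under this model a million appends trigger a few dozen reallocations rather than a million, so the garbage is real but bounded; passing the final size up front removes it entirely.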
@jonstewart5525 10 месяцев назад ⁺¹
Since this is a Linux system it's using the Completely Fair Scheduler (CFS), which means each thread runs at the same priority (as opposed to the MLFQ (multilevel feedback queue) that Windows uses). The issue then is that the OS is processing at the same priority as each of the threads created, so the computer just freezes up. There's also a minimum time spent in each thread, so you rarely get to execute an action.
@Lyynx92 Год назад ⁺⁵
.NET pre-allocates a thread pool at startup, though the memory shouldn't be quite that high. Pretty sure it also utilizes a work-stealing scheduler under the hood for continuations and its async/await behavior. Also, if you want to further optimize for memory, the ValueTask struct will do some caching cleverness to dodge Task allocations if the work is either already done or can be done synchronously. Given how simple the test is, the GC probably won't kick in, as it can recycle a lot of those Task objects.
@krccmsitp2884 Год назад ⁺¹
10:08 why that old .NET version? 7.0.6 was current back in May 2023.
@pinoniq Год назад ⁺⁴
If you want node to actually use multiple threads, you need to tell libuv to use multiple threads. There is an env variable for this: UV_THREADPOOL_SIZE. Like you said, node has an event loop. That's not multi-threaded. It's single-threaded with callbacks. That's why setTimeout is more of a 'minimum' guideline and not precise at all (under heavy loads). Just make a busy-wait program in node and you'll see it only filling up a single core on your CPU.
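The "timeout is only a minimum" behaviour is not specific to node; any single-threaded event loop shows it. A minimal sketch with Python's asyncio as a stand-in (`timer_lateness` is a made-up name, and the durations are arbitrary): a timer is requested for a short delay, then the only thread is kept busy for longer than that.

```python
import asyncio
import time


async def timer_lateness(requested: float, busy: float) -> float:
    """Return how long a timer really took when the loop was kept busy."""
    loop = asyncio.get_running_loop()
    fired_at = loop.create_future()
    start = time.monotonic()
    loop.call_later(requested, lambda: fired_at.set_result(time.monotonic()))

    # Busy-wait on the loop's only thread; no callback can run until this ends.
    while time.monotonic() - start < busy:
        pass

    finished = await fired_at
    return finished - start


if __name__ == "__main__":
    elapsed = asyncio.run(timer_lateness(requested=0.01, busy=0.1))
    print(f"asked for 0.010 s, got {elapsed:.3f} s")
```

The timer cannot fire before the busy loop gives the thread back, so the measured delay is at least as long as the busy period.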
@woolfel Год назад ⁺³
back in the JDK 1.3 days, the JVM would allocate 1MB per thread, but it was changed around 1.6/1.8, I forget exactly which release they fixed that. It's also important in Java to get the memory used, not memory allocated. The biggest issue with java for me is once the JVM allocates memory, it doesn't release it until you stop the JVM process.
@shayvt Год назад ⁺⁴
C# Task is an abstraction using the threadpool. He should use the Thread class which instantiates a real thread.
@DarkOoze123 Год назад
*managed thread
@LuaanTi Год назад ⁺³
No, C# Task implies no threads whatsoever. It uses the thread pool by default for CPU work, yes, but that can easily be just the part of the job that says "this task is finished" (e.g. handling the async I/O response).
Creating an explicit thread (_not_ a hardware thread, _not_ an OS thread - you don't have control over those natively in .NET) is something completely different, and very rarely used in modern C#. It negates the whole point of using asynchronous I/O in the first place, which is avoiding the overhead of threads that do nothing but wait for something to complete (whether that's a timer or a HTTP request). Which, let's not forget, was part of the point of the original article - showing how expensive "real" threads are, and that different approaches to handling asynchronous code have vastly different results.
But that article is very flawed anyway. It would make sense to compare multi-threaded code with other ways of doing asynchronous I/O... but instead, we get an arbitrary choice of one or the other for each platform. You can have promises in any language. Many have commonly used or outright built-in APIs for that. Seeing the difference between, say, Java threads and Java Futures would be a bit illuminating, at least... though it still needs to be noted that you have a lot of control over things that absolutely crush this comparison anyway. The default stack size of a new thread on modern .NET is usually 1 MiB. Windows doesn't really allow you to go very small with thread stack sizes (you're supposed to use a few threads, not thousands). Linux is designed around multiple processes/threads using the same memory for as long as possible, so a thousand threads each with 1 MiB memory can actually occupy just a few megabytes (until you actually start to modify the memory).
Every performance benchmark needs to have a goal. This one doesn't really seem to have one, apart from a simplistic "weird that memory usage in async stuff can vary wildly"... I mean, pretty much every platform out there allows you to pre-allocate as much unused memory as you want, but it'd be a weird way to compare different platforms, right?
@mattymerr701 Год назад ⁺¹
C# uses loads of thread pools, and I think the issue is they likely didn't trim the assemblies etc., so it kept a bunch of unused crap.
@c4ashley Год назад ⁺⁶
The name is the C-sharpagen.
@Gennys Год назад ⁺¹
It looks as though c sharp is creating a thread pool by default instead of actually launching threads.
@3x10.8_ms Год назад ⁺²⁹
crab is fast and fox is slow
@ThePrimeTimeagen Год назад ⁺¹⁵
do a barrel roll
@zolniu 9 месяцев назад
In C#, when you use Tasks with async/await, the default implementation creates a state machine that uses a pre-existing thread pool to schedule execution of your tasks on the threads in the pool. Not only that, it can even detect that a task has already completed synchronously; in that case it never ends up in the thread pool, it just executes and returns like a normal function call.
To test how much memory threads consume in C#, you can't use Tasks with async/await. You have to use the Thread class directly; that way you circumvent all of the optimizations done in the runtime and in the task scheduler.
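For contrast with pooled tasks, this is what "one real thread per unit of work" looks like. The sketch uses Python's `threading` module rather than C#'s Thread class, so treat it as an analogue; `threads_alive_while_sleeping` is a hypothetical helper and the counts are kept tiny on purpose.

```python
import threading
import time


def threads_alive_while_sleeping(n: int, delay: float = 0.5) -> int:
    """Start n OS threads that only sleep; return how many extra threads exist."""
    before = threading.active_count()
    workers = [threading.Thread(target=time.sleep, args=(delay,)) for _ in range(n)]
    for worker in workers:
        worker.start()
    extra = threading.active_count() - before
    for worker in workers:
        worker.join()
    return extra


if __name__ == "__main__":
    print("extra threads alive:", threads_alive_while_sleeping(50))
```

Every sleeper holds a full thread, with its own stack, for the whole wait, which is the cost that task-based code avoids.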
@TizzyD Год назад ⁺⁵
🤔 I concur with you Big P...let's look at some more real use cases. Going outside of the process itself will complicate analysis with other elements (e.g. DB, ORM, etc.) that should be held constant; however, there are good use cases to eliminate as much of the 7 layer stack as we can:
1. Storage - with the good old random file manipulation, etc.
2. Network - doing something more like a UDP listener to eliminate possible contamination with socket handling
3. Memory - malloc, 😮multi-threaded data manipulation, release (to watch garbage collection)
4. Compute - not all compute operations are math-based, but do some string parsing, concatenation, etc.
I'm thinking we want to eliminate math computations because most of those operations will come down to the underlying math implementation vs. actual performance (e.g. Fortran being fast, etc.), but network issues could have the same impact. Consider the history of Java IO vs. NIO.
@that_rendle Год назад
Hypothesis: .NET is up-front creating a Heap which it looks like is ~128MB perhaps? And also a thread pool. And then everything up to 100K tasks fits within those limits so the memory consumption stays the same. Then going to 1M tasks is exhausting that Heap so it has to be expanded. Guessing it could probably manage 250K tasks within that initial allocation? Anyway, .NET and C# are better than you think they are these days.
@remrevo3944 Год назад ⁺⁹
12:30 Per default tokio creates worker threads equal to the amount of cpu cores.
Though thinking about it, if you only use timers having a single threaded runtime would likely be just as fast and more efficient.
@llothar68 Год назад ⁺¹
Not a good choice. You often have long-running threads that also block. In fact, all the systems where the kernel is not controlling the worker threads suck. This means Linux, Android and the BSDs. The other systems have kernel-driven thread pools for much better handling, making sure that IO blocks don't prevent utilisation.
@remrevo3944 Год назад ⁺¹
@llothar68 I explicitly meant that for the case of using only timers, which are neither cpu intensive nor use blocking APIs.
When using an async runtime like tokio you shouldn't use blocking APIs anyway, and if you have to, there is tokio::task::spawn_blocking, which spawns a thread/uses a thread pool.
@misterkevin_rs4401 Год назад ⁺²
C# Uses a thread pool behind the scenes with a default config of #X amount of threads depending on the system it's running, it's usually 20 if I remember correctly from my .NET days. What's interesting to me is how it can spin up more if required and scales correctly.
@FilipCordas Год назад ⁺¹
Should be equal to number of cores you have available on the machine.
@thekwoka4707 Год назад ⁺³
Why were they using the newest Rust from last month and Node.js from like 4 years ago? Like, AWS doesn't support the version they used, or the 3 major versions after it.
@ivanivory6247 5 месяцев назад
This is a very good question. Looks like manipulation.
@konkerouf 3 месяца назад
You can go "beyond infinity" in the paradigm of transfinite numbers. You manipulate an "infinite number" called omega (the Greek letter), and then you have the numbers omega + 1, omega + omega, omega to the power omega, and so on.
This was primarily developed to compare the cardinality of infinite sets (e.g. card(N) < card(R) even though they're both infinite).
@urbanelemental3308 Год назад ⁺⁵
Yeah, the C# example is not real threads. The code is just adding tasks to the scheduler, similar to "setTimeout" in JS. Which might be fine for most things, but each "Task" is taking up memory and then waiting to run. IMO, these tests are not good overall. I agree the Java one is probably not a good example either, with the synchronous join.
@metaltyphoon Год назад ⁺⁵
Dude… only the Java and Rust ones were real threads. All the others use a pool abstraction. I think Elixir uses actual processes.
@zephyrprime 8 месяцев назад
Not full threads but not just tasks either. Tasks use a threadpool to manage execution and the .net runtime will decide how many threads are in that threadpool.
@blowfishfugu4230 Год назад ⁺¹
Just for fun, I tried creating threads in C++ in a similar fashion:
#include <atomic>
#include <thread>
#include <vector>

static std::atomic<int> toInc = 0;

int main()
{
    std::vector<std::jthread> threads;
    for (int i = 0; i < 1'000'000; ++i)
    {
        // each thread increments the shared counter once, then exits
        threads.emplace_back(std::jthread{ []() {
            toInc++;
        } });
    }
    // the jthread destructors join every thread when the vector is destroyed
}
Running on a CPU with 8 cores, it took forever (we're talking about 15 minutes) to allocate the thread handles;
the resulting max memory consumed was 75MB.
Deallocating the thread handles took the same amount of time as creating them.
So this test case highly depends on what kind of platform/OS is in use.
Also, it's not advised to use more threads than your hardware can handle on native cores.
On my system the highest multithread performance was
on 32 threads (including an if < 1'000'000 inside each thread's lambda),
and the peak performance for the simple task was single-threaded (I guess because no locking on the atomic was necessary).
--- everything here is just observations and measurements
@joejazdz Год назад ⁺⁴
Prime will now worship at the altar of Anders (creator of C# and Typescript) /s
@reddragon2358 Год назад ⁺¹
XD
@nyahhbinghi Год назад ⁺²
If you are creating a new Elixir "process" per task, it will scale up pretty linearly with the number of tasks, hence why it's high. High memory usage is not really a bad thing, per se. Likewise with Go and goroutines, whereas other runtimes with a fixed thread pool, or Node.js with its single event loop, won't keep climbing linearly. I would be more interested in CPU usage. You're welcome for this insight! 🤜🤛
@pdgiddie Год назад ⁺¹
This. The BEAM VM was designed to prioritise latency and predictable scalability. Copy-on-write and other memory consumption optimisations can produce latency spikes.
@SharunKumar Год назад ⁺⁸
I wanna see Nick Chapsas's reaction on this 🤣
@BracaJose Месяц назад
x2
@Aidiakapi Год назад ⁺²
Comparing memory usage of VMs is tricky. They usually behave differently based on how much system memory is available/installed and configuration/mode.
There's also the JIT compilation in most of these, which potentially adds a spike in memory usage, which might never be returned to the OS.
It's just hard to say what's happening exactly, and the one number at the end is kinda pointless.
@kippers12isOG Год назад ⁺¹
If the jit takes memory doing this, it will probably take similar memory in real use cases no? That would mean that it's valid to include it
@exmodeus Год назад
That's why some languages provide their own memory library/tool, as it knows exactly how much actual memory is being used.
@Aidiakapi Год назад
@@kippers12isOG You're missing the point. It may be doing that because it's configured to not return the memory to the OS. You can configure it differently. It behaves differently depending on the machine you're running it on, the memory configuration, the VM and GC configurations.
For example in Java, you can just tell it to reserve a large amount of memory at the start, and also put a cap on it. It will generally not allocate any system memory beyond that, and it'll not run GC until you actually need more than you reserved.
If you run it with defaults on a system with tons of free memory (let's say >50 GiB free), do you want the default to be a tiny footprint at the cost of running the GC more often? Or do you just say: use a few hundred MiB and almost never run the GC?
If you ran this same program on a system that's starved for memory, the VM can decide to collect 20x as frequently, and keep its overall memory footprint 10x lower.
A much better way to test would be to limit its container's RAM and see how low you can go until it starts to malfunction.
@tecoberg 9 месяцев назад ⁺⁹
Where is C++?
@Hector-bj3ls 8 месяцев назад
In Rust, the default stack size for an OS thread on all tier 1 platforms is 2MB. Not sure if it's allocated up front, but that probably has something to do with where all the memory went.
@kellybmackenzie Год назад ⁺⁴
I would have loved to see Haskell tested like this, it'd be so good
@FinnBender Год назад ⁺³
It's surprisingly bad :(
1 thread: 5.0 MB
10 threads: 4.9 MB
100 threads: 4.9 MB
1k threads: 8.3 MB
10k threads: 63.1 MB
100k threads: 803.8 MB
@kellybmackenzie Год назад
@@FinnBender Aww man! Yeah, that makes sense, Haskell is infamous for its high memory consumption because of thunks and stuff like that. I'm surprised it's that bad for 100k though, damnnn!
@Lampe2020 7 месяцев назад ⁺²
I think the C# compiler just optimized the clearly only-idling tasks away.
@tedchirvasiu Год назад ⁺⁴
Is this the first time in history he turned off the notifications before starting the video?
@ThePrimeTimeagen Год назад ⁺³
Don't tell anyone...
@Muhammed.Abd. Год назад
9:28 "would you not be at infinity, If you can go beyond?" Loved it
@TizzyD Год назад ⁺¹¹
Maybe C# is doing something like Julia, that is, postponing execution until it actually needs to do something. Or maybe Roslyn has some under-the-covers optimizations. Any CLR experts care to comment?
@protox4 Год назад ⁺⁵
`Task.Run` uses the ThreadPool by default, which is very conservative when spinning up new threads. The benchmark would pretty much make the ThreadPool never spin up new threads since each task completes immediately. It waits a good long while before deciding it actually should spin up a new one, which is why you see the memory increase at 1 million.
@my_yt666 Год назад
It creates n (depending on the CPU) managed threads for the default scheduler. If he wants to optimize for memory allocation, he should have used ValueTask and reduced the max managed threads of the default scheduler. But then again he should have measured threads instead of a higher level concept.
@mikestiver9000 Год назад ⁺⁷
Task.Run(()=>{}); does not create a thread, but will instead schedule work on the Thread pool. Task.Delay() halts execution, and 'await' returns the thread to the threadpool.
The benchmark is extremely useless for C#, since all you are doing is juggling the same handful of threads back and forth, starting a task and then doing no work until the delay is up and the Task is discarded.
You don't need many Threads when your Task doesn't actually do any computation or IO work.
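The same point can be checked in other async runtimes. As a hedged illustration, here is a Python asyncio sketch (not the C# thread pool; `threads_while_sleeping` is a made-up name) that counts OS threads before and while a thousand tasks are waiting on a delay.

```python
import asyncio
import threading


async def threads_while_sleeping(n: int, delay: float = 0.05) -> tuple[int, int]:
    """Return the thread count before and while n tasks wait on a timer."""
    before = threading.active_count()
    tasks = [asyncio.create_task(asyncio.sleep(delay)) for _ in range(n)]
    await asyncio.sleep(0)  # let every task start and reach its await
    during = threading.active_count()
    await asyncio.gather(*tasks)
    return before, during


if __name__ == "__main__":
    before, during = asyncio.run(threads_while_sleeping(1_000))
    print(f"threads before: {before}, while 1000 tasks sleep: {during}")
```

Waiting tasks are just entries in the loop's timer queue, so the thread count does not move.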
@TheTim466 Год назад
@@mikestiver9000 Would be the same for any language with async capabilities or not?
@LuaanTi Год назад
@@TheTim466 It's true for any true asynchronous I/O. You can do it in Windows with a C program, no need for fancy async languages. I/O doesn't need threads, and `Task.Delay` is just I/O - you get a notification from the system timer at a given time in the future, then a threadpool thread is used to handle the continuation (which in this case essentially just signals that the task is completed). That's also why the C# version doesn't need much space for the tasks at all - just a few pointers, a cancellation token and a tiny state machine. It fits in a few dozen bytes on x64 per task. You could trim it even lower if you wanted.
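A rough way to see how small a waiting task is in a given runtime is to measure heap growth per task. The sketch below does this for Python's asyncio with `tracemalloc`; it says nothing about the C# numbers above, `bytes_per_sleeping_task` is a hypothetical helper, and the result varies with the Python version.

```python
import asyncio
import tracemalloc


async def bytes_per_sleeping_task(n: int, delay: float = 0.2) -> float:
    """Approximate Python-heap bytes held per task that is waiting on a timer."""
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    tasks = [asyncio.create_task(asyncio.sleep(delay)) for _ in range(n)]
    await asyncio.sleep(0)  # let each task start and register its timer
    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    await asyncio.gather(*tasks)
    return (current - baseline) / n


if __name__ == "__main__":
    per_task = asyncio.run(bytes_per_sleeping_task(1_000))
    print(f"about {per_task:.0f} bytes per sleeping task")
```

This only counts allocations made through Python's allocator, so it is a lower bound on the real process footprint.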
@DxCKnew Месяц назад
The Task.Delay() in C# does not actually occupy a real thread for the wait, it just subscribes to a kernel event and relies on the kernel to fire it back after the time is passed. It does however create a bunch of objects like Task and internally Timer and some more which all have to be GCed eventually. chhhhh tfu!
@maxharmony6994 Год назад ⁺⁵
Now imagine giving Tom a C#
@zhh174 5 месяцев назад
18:18 C# is doing what is expected. C# async/await is pretty similar to goroutines and virtual threads, as it can run in parallel. I think the low memory usage of C# is due to the tools you have in C# to write memory-efficient code. Unlike some other managed languages (e.g. Java), C# has structs, which are generally not allocated on the heap. Those structs are not used that much in user code, but in runtime code they are used to optimize performance and memory. Also, C# pre-allocates some memory at first, so it doesn't allocate much after that, and it caches memory heavily. That's why for a small program C# uses more memory than most languages, but as the program gets bigger the others catch up.
@quachhengtony7651 Год назад ⁺⁵
Let's rewrite Elasticsearch, Kafka, and Cassandra in C# and get free performance
@reddragon2358 Год назад ⁺¹
Wohooo. Let's go
@jeremiahgavin9687 Год назад ⁺¹
Look up ScyllaDB as a Cassandra replacement. It's written in C++
@AndroidHutOfficial 5 месяцев назад ⁺¹
keep microsoft crap to its ecosystem.
@__idan__ Год назад ⁺¹
I have a feeling they're testing apples to oranges to melons to avocados to onions
@vighnesh153 Год назад ⁺⁵
More interested in seeing Nodejs 20 with worker threads as they claim that there is a lot of perf improvements in Node 20
@nelsonoussahsigha1300 Год назад ⁺¹
Yes, he could've used workers to create threads for concurrent tasks. By using setTimeout you're still on a single thread, so all those setTimeout calls will be queued inside the callback queue.
@jimiscott Год назад ⁺¹²
The threads used by C# async/await come from a pool and are different from System.Threading.Thread.
@jimiscott Год назад ⁺⁵
I wouldn't be surprised if some of this were optimised away in release mode by Roslyn. The test case is not valid.
@baka_geddy Год назад
It specifically said there were no significant difference between debug and release...
@Kazyek Год назад ⁺²
There is huge confusion between async tasks and threads in this whole article. Green threads != hardware threads, and async tasks are a separate concept that doesn't necessarily imply any threading model; the tasks can just yield on the same thread, or be distributed over a thread pool... The JS version is not threaded at ALL, it's single-threaded, and the C# version is... probably threaded, but that depends on the synchronization context; that code in a blank UI application will actually only join on the UI thread, see SynchronizationContext and ConfigureAwait.
@insylogo Год назад ⁺³
AOT and tree shaking business has come a long way with c#. I would assume actual minimums an order of magnitude or less, but he did say default release configurations.
@le9038 4 месяца назад ⁺¹
Another win for the Rust Team...
@glitchedpixelscriticaldamage Год назад ⁺⁴
you spit on C# or Python ? i don't get it...
@sanjayidpuganti Год назад
He had a bad experience with C# in the past IIRC. I wouldn't take nothing from it tbh. Both are good technologies in their own way
@mugiseyebrows 8 месяцев назад
And spider says: "Do I really need eight legs?", and Gods answers: "Nobody needs eight of anything".
@istovall2624 Год назад ⁺⁴
C# to the moon! Haven't finished yet. Drum roll.
@reddragon2358 Год назад
Let's go.
@MortenBendiksen 9 дней назад
Elixir reserves a separate stack AND heap space for each process. This has some advantages, but you need to pay for it with memory. So you need to do realistic tasks for it to be able to compare. Elixir might run faster and create less fragmentation than go for example. It can remain more consistent in response times theoretically than go. But it depends on the scenario. If you are doing a benchmark, you can always be working around the problems that naturally arise in each.
@nocturne6320 Год назад ⁺⁴
Idk how he compiled the C# program, but I rewrote his program line by line in Net core 6 and running it gave these results (Checked with Process Hacker 2, as Task Manager doesn't report all memory):
- 1 task = ~7.3 MB
- 10000 tasks = ~13.5 MB
- 1000000 tasks = ~430 MB
- Compiled with Net Core SDK 6.0.408
- CPU: AMD Ryzen 9 7900
- OS: Windows 10 build 19045
I assume that either C# cheats on Windows by having Windows preload the runtime into memory and then reusing it for all C# programs and it simply doesn't report the memory consumed by the runtime, or some other shenanigans are going on in that article.
I also tried building the app in the "self-contained" mode, where it includes the whole runtime in output, not requiring it to be installed and the footprint hasn't changed.
@stefano_schmidt Год назад ⁺¹
you should try creating the actual threads, instead of re-using ThreadPool (Tasks) as shown in the article
@nocturne6320 Год назад ⁺¹
@@stefano_schmidt I tried allocating actual threads with new Thread(() => {Thread.Sleep(10000);}) and that started using *a lot* of memory. Million threads took ~4 GB of memory and shutting them down with Thread.Join took forever. But considering that Threads are really not recommended by anyone these days they might be lacking the optimizations over the years.
@Alguem387 Год назад
You can try to compile AOT
@metaltyphoon 1 year ago ⁺¹
@@stefano_schmidt it's a useless test. The OS will just spend more time context switching than doing real work (thrashing)
@LuaanTi 1 year ago ⁺³
@@stefano_schmidt The whole point of asynchronous I/O is to avoid wasting threads for things that do not need threads. If you're working with well-written modern .NET code, you don't really need more threads than you have logical CPU cores, so why pay the cost? If anything, this shows exactly one of the reasons why spinning up new threads for every task you want to do is painfully wasteful.
The article tries to compare different ways of handling asynchronous code. Threads are just one of those ways, and explicit threads should be really rare in any modern codebase. The article doesn't talk about creating a million threads - it talks about a million _asynchronous tasks_. It's the YT video that claims this is about a million threads, which is silly - there are very few platforms where the overhead from the language/runtime will be remotely comparable to the overhead from having a thread in the first place. The default stack size of a Windows thread is usually 1 or 4 MiB. It will never take less than 64 KiB (or more exactly, the page size). Now compare that to the ~230 B a C# task takes, or the ~600 B (in a pre-allocated structure of at least 2 KiB) of a goroutine.
When you change the code to create threads... your memory usage comes entirely from thread stacks. Which means... what exactly? We know threads are expensive, that's why we want to avoid them! :D That's where async comes from (mostly). The real failure of the article is that it doesn't even attempt to find async tasks in each of those platforms - though that isn't all that surprising given the code was written by GPT :D
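To put the per-unit figures quoted above side by side, here is a rough back-of-the-envelope sketch. It only multiplies the numbers from this comment by one million; the labels and the helper names (`total_bytes`, `human`) are made up for illustration. Note the assumption that a thread's stack size is mostly reserved address space rather than committed RAM, so the thread rows are upper bounds, not what a process monitor would show.

```python
# Back-of-the-envelope totals for one million units of concurrency,
# using only the per-unit figures quoted in the comment above.
N = 1_000_000
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Assumption: stack sizes are reserved address space, not committed RAM.
PER_UNIT_BYTES = {
    "Windows thread, 1 MiB default stack": 1 * MIB,
    "Windows thread, 64 KiB minimum": 64 * KIB,
    "goroutine, 2 KiB pre-allocated": 2 * KIB,
    "C# task, ~230 B": 230,
}


def total_bytes(per_unit: int, count: int = N) -> int:
    """Total bytes needed for `count` units of `per_unit` bytes each."""
    return per_unit * count


def human(num_bytes: int) -> str:
    """Format a byte count in GiB or MiB with one decimal place."""
    if num_bytes >= GIB:
        return f"{num_bytes / GIB:.1f} GiB"
    return f"{num_bytes / MIB:.1f} MiB"


if __name__ == "__main__":
    for label, size in PER_UNIT_BYTES.items():
        print(f"{label}: {human(total_bytes(size))}")
```

Even the smallest possible thread stack lands in the tens of GiB of address space for a million threads, while a million tasks stay in the hundreds of MiB, which is roughly the gap the comment is pointing at.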
@sikor02 1 year ago ⁺¹
If C# has memory available, it will swallow a lot of it for optimizations. Once I experimented with Docker and performance-tested my simple API endpoint with Bombardier (a tool written in Go), bombarding it with thousands of requests. My app used 1.5 GB of RAM (!). But then I started limiting my container's available memory (-m parameter), and guess what, I went down to 15 MB and it still worked. The Go equivalent required at least 16 MB to work. The C# API with so little memory available performed almost the same as when using 1.5 GB anyway. (The Go version was like 2% faster though, not gonna lie)
@casperes0912 1 year ago ⁺¹²
I will most likely need to use C# as my primary language at my next job
@reddragon2358 1 year ago ⁺¹
Wish you all the best
@dziarskihenk8798 1 year ago ⁺¹³
c# is life, c# is love
@reddragon2358 1 year ago ⁺¹
@@dziarskihenk8798 XD
@ghevisartor6005 1 year ago ⁺²
Don't use MAUI
@codeme8016 1 year ago ⁺²
.NET 8 supports Native AOT now, which compiles ahead of time like Rust or Go, and the overall performance can beat all of the others; no competitor left.
@alxizr 1 year ago ⁺³
The Node.js example is off the mark. You need to use worker threads to stay in line with all of the other examples.
The same goes for the Python asyncio example.
@everyhandletaken 1 year ago
Agree
@JustNrik 1 year ago
Tbh the C# example doesn't create 1 million threads. async/await in C# is implemented in a way that doesn't just throw everything onto threads: it has a bunch of internal logic that tracks the async state without actually spawning threads unless they are necessary. Even then, there is a limit on how many threads can be created, so unless that limit is raised, it won't ever spawn that many threads. The allocations, on the other hand: 1 million tasks is a LOT of memory xD
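The same point can be seen in miniature with Python's asyncio, used here only as a stand-in since it is a different runtime from .NET. This is a minimal sketch: the function names (`idle_task`, `threads_before_and_during`) and the task count are invented for illustration. It counts live Python threads before and while many sleeping tasks are pending.

```python
import asyncio
import threading


async def idle_task(delay: float) -> None:
    """A task that only waits; it never needs a thread of its own."""
    await asyncio.sleep(delay)


async def threads_before_and_during(task_count: int, delay: float = 0.05):
    """Return the live thread count before and while the tasks are pending."""
    before = threading.active_count()
    tasks = [asyncio.create_task(idle_task(delay)) for _ in range(task_count)]
    await asyncio.sleep(0)  # yield once so every task starts and suspends
    during = threading.active_count()
    await asyncio.gather(*tasks)
    return before, during


if __name__ == "__main__":
    before, during = asyncio.run(threads_before_and_during(10_000))
    print(f"threads before: {before}, with 10,000 pending tasks: {during}")
```

The thread count should not move, because each pending task is just an object parked on the event loop. The memory cost is in those task objects, not in thread stacks.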
@erickmoya1401 1 year ago ⁺⁴
My wife says you yell too much. I tried to prove she is wrong.
My argument didn't last a second.
@freedom_aint_free 1 year ago
At 9:33 it's said "I hate that phrase, to infinity and beyond" and it's said that it's nonsensical, but it's not: according to mathematics, there are many different kinds of infinities and they are not all the same size. Some infinities are actually bigger than others: see Georg Cantor's work, particularly the diagonal argument. E.g. the interval [0,1] alone contains more numbers than all the natural numbers combined.
@WillEhrendreich 1 year ago ⁺⁴
Going from .NET 6 to .NET 8 would result in better perf.
@gridlocdev2023 1 year ago
Agreed, especially since .NET 8 will also support native AOT compilation
@mike200017 9 months ago
The big flaw in this test is that the main memory footprint of threads (no matter what kind) is the amount of thread-local data that has to be duplicated. And like most things in software, there is a trade-off between memory and speed (or latency). The sorts of things that thread-local data is used for are memory allocations, garbage collection, I/O, and maybe task scheduling. Usually more thread-local data means less contention between threads on those operations. So, a benchmark that does none of those things is only looking at the cost side without the benefit side. Long running tasks that allocate a lot of memory or do lots of operations like that will benefit greatly from the lower contention associated with running in a "real" thread. Short-lived tasks that have a small footprint or mostly statically allocated memory and sit idle for a significant portion of their run-time have far less potential to contend with each other in harmful ways, so the lower memory footprint and faster spawn-time of something like a plain event-loop or coroutines wins handily. And systems using thread-pools are basically trying to find a happy medium for tasks that are a bit of both. As always, the right tool for the right job.
Obviously, if the "job" is to wait for 10 seconds 1M times concurrently, then a plain event-loop should win hands down because any decent event-loop implementation would boil that down to a single 10 second wait and then flushing an array of 1M small event objects.
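A scaled-down sketch of that claim, using Python's asyncio as one example of an event loop. The names (`wait_once`, `run_many`, `measure`) and the small count and delay are made up for illustration, and asyncio keeps its timers in a heap rather than literally merging them into one wait, so this only demonstrates the wall-clock effect: many concurrent waits finish in about one delay, not in count times the delay.

```python
import asyncio
import time


async def wait_once(delay: float) -> None:
    """One 'job': do nothing but wait for `delay` seconds."""
    await asyncio.sleep(delay)


async def run_many(count: int, delay: float) -> float:
    """Run `count` waits concurrently and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(wait_once(delay) for _ in range(count)))
    return time.perf_counter() - start


def measure(count: int = 10_000, delay: float = 0.2) -> float:
    """Convenience wrapper that runs the experiment on a fresh event loop."""
    return asyncio.run(run_many(count, delay))


if __name__ == "__main__":
    elapsed = measure()
    print(f"10,000 waits of 0.2 s each took {elapsed:.2f} s in total")
```

All the timers expire at nearly the same moment, so the loop wakes up once and then works through the ready tasks, which is the behaviour the comment describes.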
