NativeAOT in .NET 8 Has One Big Problem

  • Published: Dec 25, 2024

Comments • 92

  • @DamianPEdwards
    @DamianPEdwards 1 year ago +119

    A few things cause the slightly lower performance in native AOT apps right now. The first (in apps using the web SDK) is the new DATAS Server GC mode. This new GC mode uses far less memory than traditional Server GC by dynamically adapting memory use to the app's demands, but in this 1st generation it impacts performance slightly. The goal is to remove the performance impact and enable DATAS for all Server GC apps in the future.
    Second, CoreCLR in .NET 8 has Dynamic PGO enabled by default, which allows the JIT to recompile hot methods with more aggressive optimizations based on what it observes while the app is running. Native AOT has static PGO with a default profile applied and by definition can never have Dynamic PGO.
    Third, the JIT can detect hardware capabilities (e.g. CPU intrinsics) at runtime and target those in the code it generates. Native AOT, however, defaults to a highly compatible target instruction set which won't have those optimizations, but you can specify them at compile time based on the hardware you know you're going to run on.
    Running the tests in the video with DATAS disabled and native AOT configured for the target CPU could improve the results slightly.
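    For reference, a minimal sketch of disabling DATAS in a .NET 8 csproj (the GarbageCollectionAdaptationMode property name and its 0 = disabled value are assumptions to verify against the runtime's GC configuration docs):

    ```xml
    <PropertyGroup>
      <!-- keep traditional Server GC and opt out of DATAS (0 = disabled) -->
      <ServerGarbageCollection>true</ServerGarbageCollection>
      <GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>
    </PropertyGroup>
    ```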

    • @parlor3115
      @parlor3115 1 year ago +5

      My thoughts exactly

    • @proosee
      @proosee 1 year ago +4

      Same old, same old - you trade startup time, executable size and memory consumption for performance, like in almost all software before. But it was nice to see the details; in particular, I didn't know that the CLR is able to recompile some paths with different settings, so thank you for sharing - it is quite smart actually.

    • @SlackwareNVM
      @SlackwareNVM 1 year ago +6

      I'm curious, would it be possible in the future for a JIT application with Dynamic PGO that has run for a while and has made all kinds of optimizations to then create a "profile" of sorts that could be used by the Native AOT compiler to build an application that is both fast in startup time _and_ highly optimized for a given workload?

    • @proosee
      @proosee 1 year ago +2

      @@SlackwareNVM you said it yourself - you need to create a profile, keep it updated, save it and load it on startup - there is always a trade-off, but you can always make it smarter, for sure.

    • @terjeber
      @terjeber 1 year ago +2

      I also think this type of performance testing is not particularly good. For example, the client hits the same endpoint every time, which gives the JIT compiler ample opportunity to radically tune performance for that specific code path. In theory the JIT might stop executing most of the code, since nothing changes along the way.
      It would be far more interesting to put some DB in there, fill it with a few million records, and vary the test so that it retrieves a different (or random) dataset each time. That would remove an opportunity for the runtime to optimize the code, and having a server respond with the same data on the same URL over and over is nowhere near realistic.

  • @jimmyxu3819
    @jimmyxu3819 1 year ago +17

    Your Docker base images are different, so your results can't be compared with each other. Make sure you're using the same Linux version.

  • @wknight8111
    @wknight8111 Год назад +27

    JIT is interesting because on one hand you're starting an un-optimized application and expecting it to compile and optimize at runtime, so people think it's going to be slower. BUT the JIT has access to all sorts of runtime statistics and runtime type information that the AOT compiler does not have. This enables some very interesting and aggressive optimizations, in theory. I don't know the full details of everything Microsoft's CLR JIT attempts to do, but the possibilities are there for the JIT to perform better, especially for long-running applications. AOT will always win for startup time and short-lived applications, but for long-running applications it's not as clear, and JIT often has some advantages.

    • @_iPilot
      @_iPilot 1 year ago

      So, if we share those statistics with the AOT compiler, it will produce even more efficient application code, won't it?

    • @NicolaiSkovvart
      @NicolaiSkovvart 1 year ago

      @@_iPilot it seems extremely likely that Static PGO + AOT would be competitive with, if not better than, Dynamic PGO + JIT. Sadly, the Static PGO experience is pretty poorly supported

    • @wknight8111
      @wknight8111 1 year ago +9

      @@_iPilot The problem is that you can't get runtime statistics until runtime. Everything else is just a guess, and if you guess wrong the AOT may optimize for the wrong types and make the situation worse.

    • @modernkennnern
      @modernkennnern 1 year ago

      @@wknight8111 you could theoretically run the app in JIT mode and then use that metadata to compile for AOT

    • @_iPilot
      @_iPilot 1 year ago

      We are in the age of telemetry, so runtime data can be uploaded somewhere like logs (it actually is logs, btw) to be analyzed by an external application.

  • @robwalker4653
    @robwalker4653 1 year ago +1

    This is my go-to channel for all things .NET. Gets to the point straight away!

  • @FatbocSlin
    @FatbocSlin 1 year ago +12

    When comparing Docker performance, you are comparing apples to oranges.
    The Docker base image does make a difference: you use Ubuntu 20.04 as the base for your native image, while the .NET 8 SDK uses Debian 12 as the base.
    I have compared standard .NET Docker images against one based on Clear Linux, and there was a 9% difference - more than the difference you found in your test.
    .NET depends on the libraries included in the Docker image.

  • @caunt.official
    @caunt.official 1 year ago +48

    A 4% loss in performance doesn't really matter. What is interesting here is the actual bottleneck. Does NativeAOT perform better or worse with encryption algorithms? Does it perform better or worse with heap allocation? What exactly affects the performance?

    • @BlTemplar
      @BlTemplar 1 year ago +3

      AOT will always perform slower than the CLR because it doesn't have a JIT and can't optimise the hot path.
      But it will consume less memory because the code is already compiled and optimised to some extent by default. The CLR needs to do all that work at runtime, which is why it will also consume more memory and some extra CPU resources until the code is optimised.

    • @nocturne6320
      @nocturne6320 1 year ago +6

      @@BlTemplar AOT should absolutely be faster than JIT. If AOT is performing slower, then the compiler is garbage. If a program written in C++ is slower than one in Java, it means the C++ code is bad, not that Java is faster than C++

    • @BlTemplar
      @BlTemplar 1 year ago +2

      @@nocturne6320 I am not talking about C++, I am talking specifically about AOT in C#. It's a highly dynamic, object-oriented runtime which is hard to compile AOT. It won't be faster than the CLR in the near future.

    • @nocturne6320
      @nocturne6320 1 year ago +1

      @@BlTemplar True, but with smarter compilation it definitely has the potential to outclass JIT, I wonder how much the MethodImpl attribute affects the performance currently

    • @maxdevos3201
      @maxdevos3201 10 months ago +5

      Yes, it does! 4% matters a lot! This type of thinking is why software bloat has managed to completely undermine the hardware advancements of the last 30 years

  • @astralpowers
    @astralpowers 1 year ago +7

    I really want to use native AOT in our AWS lambdas. In my testing using the .NET 7 AOT lambda template, the startup is faster and the performance is more stable. For one application, in the normal non-AOT lambda, the performance deltas are all over the place, ranging from 2ms to 400ms, but the AOT version had performance between 1.2ms and 4ms, all while using less memory.

    • @Denominus
      @Denominus 1 year ago +4

      We are doing early experiments with .NET 8 AOT. So far the latency stability, lower resource consumption and startup time improvements, even in long-running apps, dramatically swing the cost/performance ratio in AOT's favor (in our tests). Sacrificing some theoretical TechEmpower peak performance for perf that actually matters is completely worth it.
      We have some services that were rewritten in Go some time ago. The .NET AOT side has a ways to go before it can match that cost/perf ratio, but it's looking promising.

  • @dimitris470
    @dimitris470 1 year ago +6

    I have a feeling that any such difference is going to be swamped by I/O latency IRL anyway

  • @raduncevkirill
    @raduncevkirill 1 year ago +8

    I am wondering if the comparison is consistent when having different base images for the two APIs. Default one running on debian-slim and native-aot running on ubuntu. It shouldn't make a significant change, though, as Microsoft's benchmarks yield the same results.

    • @nickchapsas
      @nickchapsas 1 year ago +6

      It doesn't matter. The biggest difference is at the OS level. The only real difference between the slim and alpine versions is image size, which doesn't play a role in runtime performance

  • @viko1786
    @viko1786 1 year ago +1

    The AOT might be a great idea for something like Lambda in AWS. Quick spawn, go and kill process

  • @BozCoding
    @BozCoding 1 year ago +1

    I'm interested in using it within chiseled docker containers :) I'm sure that more changes will happen in the future to improve these too, especially as we'll see less memory usage and probably less CPU usage.

  • @wangshuo8619
    @wangshuo8619 1 year ago +1

    Does native AOT support reflection? Some docs say no, some say a subset of reflection. I'm not sure if I should migrate my code, which heavily uses MediatR, to NativeAOT. The docs are confusing

  • @emjones8092
    @emjones8092 1 year ago

    Where did we land on memory and CPU consumption comparisons? A smaller distribution already conserves lots of resources.
    Which is one of the big points:
    Scale to zero, cold boots, better memory efficiency, and smaller binaries are what I'm after.

  • @TheAzerue
    @TheAzerue 1 year ago +3

    Will Native AOT create any issues if it is used with other NuGet packages like FluentValidation, MediatR, Serilog, etc.?

    • @VoroninPavel
      @VoroninPavel 1 year ago +1

      If a library is not marked as trim- or AOT-friendly, you'll get warnings from the trim analyzer when publishing the application. Unless those warnings are disabled, like they currently are with Blazor in .NET 8
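      As a hedged sketch, a library can opt into those analyzers via the .NET 8 IsAotCompatible property in its csproj (which enables the trim/AOT analyzers; exact behavior is worth confirming in the trimming docs):

      ```xml
      <PropertyGroup>
        <TargetFramework>net8.0</TargetFramework>
        <!-- turns on trim/AOT analyzers and marks the library AOT-friendly -->
        <IsAotCompatible>true</IsAotCompatible>
      </PropertyGroup>
      ```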

  • @protox4
    @protox4 1 year ago +2

    How does it compare with ReadyToRun? It's a mixture of AOT + JIT so you should get the best of both worlds in terms of speed (maybe not file size).
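    For context, ReadyToRun is a publish-time switch rather than a separate toolchain - a minimal csproj sketch (PublishReadyToRun is the documented MSBuild property; the size/speed trade-off depends on the app):

    ```xml
    <PropertyGroup>
      <!-- pre-compile IL to native code at publish time; the JIT can still re-optimize hot paths -->
      <PublishReadyToRun>true</PublishReadyToRun>
    </PropertyGroup>
    ```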

  • @psaxton3
    @psaxton3 1 year ago +1

    The runtime also changed from Windows to Linux when you ran containerized. Would be interested to know the numbers on a Windows container.

  • @tarun-hacker
    @tarun-hacker 1 year ago

    Hey Nick,
    You should probably check profile guided optimisation for AOT in .NET for better results 😅

  • @mauriciobarbosa3875
    @mauriciobarbosa3875 1 year ago +2

    I'm wondering, is the performance hit the same in a non-WSL environment? WSL is known for being I/O-slow with Docker; what if the Docker images are run on a full-blown distro? Just thinking.
    Also, I think you used `dotnet publish` to publish the AOT docker image version and `dotnet publish -c Release` for the non-AOT one - isn't the default publish configuration Debug for AOT?
    I haven't coded in dotnet for a while, so sorry if I misunderstood

    • @nickchapsas
      @nickchapsas 1 year ago +2

      To your first question: it doesn't make any difference; it aligns with MS's full-environment performance delta. Both of them are published using "dotnet publish" because in .NET 8, -c Release is the default.

    • @mauriciobarbosa3875
      @mauriciobarbosa3875 1 year ago +3

      ​@@nickchapsas
      I've tried running the same benchmark on my machine; it's an M3 Pro base model (11-core/18GB/512GB).
      The results are actually surprising:
      M3 Pro - no docker
      AOT 139596.985677/s
      Normal 139472.800011/s
      M3 Pro - docker (colima on vz)
      AOT 45329.935323/s
      Normal 44474.530778/s
      So running outside of WSL did impact the result; on my machine AOT is still slightly faster 🤔
      EDIT: (using the stress test with 100 VUs for 60s as well)

    • @nickchapsas
      @nickchapsas 1 year ago

      ​@@mauriciobarbosa3875 Were your tests hitting over 100% CPU util on the container level? Was your Macbook's CPU util less than 100% ? There are many variables. NativeAOT for this particular example will always be slower if run correctly.

    • @mauriciobarbosa3875
      @mauriciobarbosa3875 1 year ago

      @@nickchapsas I've run the test again, but now on my M1 from work, and got similar results 🤔

  • @VoroninPavel
    @VoroninPavel 1 year ago +1

    What about comparing with ReadyToRun/CompositeReadyToRun mode?

  • @another_random_video_channel
    @another_random_video_channel 1 year ago +2

    I noticed that the base images are not the same - one is Ubuntu while the other is Debian. Also, the running containers may have different resource constraints

    • @nickchapsas
      @nickchapsas 1 year ago +3

      It doesn’t make any difference, feel free to grab the code and check for yourself

  • @MatteoGariglio
    @MatteoGariglio 1 year ago

    Hi Nick, thanks for your nice work and videos, very instructive and helpful. Could you do one about JIT compiler and the CLR? THANKS!

  • @FraserMcLean81
    @FraserMcLean81 11 months ago

    Thanks Nick. What's your terminal plugin that shows different file types in different colors?

  • @magashkinson
    @magashkinson 1 year ago +1

    You can drag and drop csproj file from explorer to editor tab to open it

  • @lylobean
    @lylobean 1 year ago

    @Nick Did you check whether these differences still hold when your project uses the OptimizationPreference Speed setting for its AOT compilation? I think it defaults to size.
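    For anyone wanting to try this, a minimal sketch of the setting in question (OptimizationPreference accepts Speed or Size for Native AOT; whether Size is the actual default is the commenter's assumption, worth verifying in the Native AOT docs):

    ```xml
    <PropertyGroup>
      <PublishAot>true</PublishAot>
      <!-- bias the AOT compiler toward faster code instead of smaller binaries -->
      <OptimizationPreference>Speed</OptimizationPreference>
    </PropertyGroup>
    ```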

    • @warrenbuckley3267
      @warrenbuckley3267 1 year ago

      I'm also wondering if you can specify what CPU instruction sets are available for a given target in the build settings (like you can for a C/C++ application), e.g., AVX2 or AVX512 etc.
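      For reference, Native AOT does expose this via the IlcInstructionSet property - a hedged sketch (the x86-64-v3 baseline implies AVX2; the exact accepted values are worth checking against the Native AOT docs):

      ```xml
      <PropertyGroup>
        <PublishAot>true</PublishAot>
        <!-- target a newer instruction-set baseline (x86-64-v3 includes AVX2) -->
        <IlcInstructionSet>x86-64-v3</IlcInstructionSet>
      </PropertyGroup>
      ```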

  • @msafari1964
    @msafari1964 11 months ago

    Hi, which CLI do you use for publishing and so on?

  • @_iPilot
    @_iPilot 1 year ago

    It looks like Microsoft is focused on reducing container startup time, including delivery to registries and to the host machine. Actually, some huge applications can have container sizes of several GB, and when they are split into microservices they inevitably have duplicate container layers, which leads to huge overhead on data transfer during deployment.

  • @jwbonnett
    @jwbonnett 1 year ago +1

    Unfortunately a lot of the NuGet packages I need use reflection and will never be reflection-free, so I will not be able to use AOT. Personally I would use AOT even with a slight drop in performance, but it's just not there for me.

  • @T___Brown
    @T___Brown 1 year ago +7

    I think it's a new thing and MS will make it super fast with each new release. But they want to see us using it before they put effort into it.

  • @zwatotem
    @zwatotem 1 year ago +1

    I would love to hear how exactly these JIT optimizations work. Right now it sounds like black magic to me.

  • @Hoop0u
    @Hoop0u 1 year ago

    What about when hosted in IIS?

  • @zoltanzorgo
    @zoltanzorgo 1 year ago

    That was interesting! I am currently working on a project that has one component running on PLCs. Yes, it is a PLC with an embedded RT Linux on top of a 600MHz(ish) single-core ARM. The flash is somewhat limited, and it is also cumbersome to install the runtime, because there is no app repository like you have for mainstream distributions. Hence I decided to publish to linux-arm with AOT. As it will also run the CoDeSys 3.5 PLC runtime alongside, I need to be careful not to stress the resources. I was very curious what difference I could expect. It is a somewhat different workload, but still, it is good to know that I might have to consider installing the runtime anyway.

  • @yoanashih761
    @yoanashih761 1 year ago +3

    Any reason for switching from Postman to Insomnia?

    • @nickchapsas
      @nickchapsas 1 year ago +14

      I prefer the UX, it is correctly responsive, and I hate Postman's forced account stuff

    • @Quique-sz4uj
      @Quique-sz4uj 1 year ago +1

      @@nickchapsas Insomnia changed and now it's quite shit, like Postman. It doesn't let you save your collections as files and is pushy about the account too. I prefer Bruno, which is a fork of Insomnia; it saves the collection files on the file system as markdown files, which is good if you want to version control them.

    • @raykutan
      @raykutan 1 year ago +4

      Bruno isn't a fork of Insomnia; it's a completely different project.
      It also doesn't store requests in markdown but in a special ".bru" format

    • @mad_t
      @mad_t 1 year ago

      You wanted to ask if there's any reason for NOT switching from Postman to anything else, right?

    • @IncomingLegend
      @IncomingLegend 1 year ago +1

      @@nickchapsas why delete my comment? I didn't say anything bad, wtf? you're on their payroll or something?

  • @gregoirebaranger1696
    @gregoirebaranger1696 1 year ago +1

    Performance is good enough in all cases; if you run into this kind of requests per second in prod, I doubt you should be running a serverless/cloud container. I'm much more interested in the reduced resources required to run the app - that's the big selling point of AOT in my opinion.

  • @dukefleed9525
    @dukefleed9525 1 year ago

    OK, interesting, but WHY is it happening? I suppose that the JIT can better keep track of *register pressure*, and in a resource-constrained environment this makes the difference - is this the reason? It would be interesting to see what happens for a single-threaded application (or apps with a different code path for each thread)

  • @maxpuissant2
    @maxpuissant2 1 year ago

    Is AOT somewhat safe, or safer, for delivering DLLs to clients without fear of decompilation?

    • @souleymaneba9272
      @souleymaneba9272 1 year ago +1

      Yes. Blazor WASM already got its AOT (WASM AOT, not Native AOT). These technologies are very good, especially for .NET developers, because IL code is easily decompiled.

  • @BigYoSpeck
    @BigYoSpeck 1 year ago

    Requests per second is obviously useful for an application you expect to process lots of simple requests, but I would find a more useful benchmark to be how fast computationally and memory-intensive requests can be processed.
    I currently work on an application that gets a relatively small number of requests per day, but those requests involve huge data models that then go through a lot of very time-consuming processing - somewhere in the region of 15 minutes for datasets in the tens of thousands. So how does AOT compare with JIT when the responses aren't simple pieces of data but there is actually some heavy computation performed on large data?

    • @simonegiuliani4913
      @simonegiuliani4913 7 months ago

      The benchmark he's using is just really bad, and he shouldn't generalise the results so much. If that is the benchmark we should refer to, then using .NET doesn't even make sense and we should all switch to Go

  • @cwevers
    @cwevers 1 year ago

    You did the warmup call after the k6 test started

    • @nickchapsas
      @nickchapsas 1 year ago

      It doesn't change the results, k6 takes that into account

  • @patfre
    @patfre 1 year ago +2

    Fun fact: the change in the csproj was bugged; it should only have been in the NativeAOT template but was in all API templates. I reported it and got it fixed. Talking about InvariantGlobalization

  • @simonegiuliani4913
    @simonegiuliani4913 7 months ago

    Your corollary only applies to application endpoints which are not computationally intensive. It's really wrong to say "it's faster, it's slower"; it should be contextualized better by the type of workload.

  • @BlTemplar
    @BlTemplar 1 year ago

    AOT isn't supposed to be faster. It offers lower memory consumption and fast startup, but not better performance.

  • @jimmymac601
    @jimmymac601 1 year ago

    Just here for the comments from the Microsoft apologists.

  • @lhcyt
    @lhcyt 6 months ago

    So does that mean if p >= q and both p and q are integers, then (p!)/(q!) = Π_(n=q+1)^(p) n

  • @the-avid-engineer
    @the-avid-engineer 1 year ago

    I'm sure the 1% of devs who are affected by the loss of 85k RPS are pushing MS to address the issue. Possibly a way to sample the JIT optimizations once they stabilize and then apply them to AOT at compile time. Kinda sounds like a form of ML

  • @sikor02
    @sikor02 1 year ago

    I have the same CPU :) It's hard to saturate this beast

  • @IllidanS4
    @IllidanS4 1 year ago

    To be fair, I don't get why MS tries so hard to make NativeAOT the "modern thing" everything has to revolve around. Sure, you might run .NET in constrained environments or on architectures where the JIT cannot run, but I really feel that without it .NET loses so much of its "power". For quick startup they've had ngen for ages, so even that point is moot.
    How often do you need to run a .NET program that changes so often and needs to be restarted so quickly that ngen or the JIT is actually the bottleneck? Without a JIT some code has to be interpreted, which has huge performance downsides. I don't see much point in pushing NativeAOT when it breaks when using full .NET features like Linq.Expressions, reflection and MakeGenericMethod, or the DLR.

    • @ByronScottJones
      @ByronScottJones 1 year ago

      In lambda and other on demand invocation environments, it can make a huge difference. It's not for routine Windows desktop apps.

    • @IllidanS4
      @IllidanS4 1 year ago

      @@ByronScottJones That indeed sounds like a very constrained environment, still I am not convinced that all of these improvements are just due to ditching the JIT. There could still be a way to use ngen to pre-compile a lot of what is used, or other tricks ‒ for example you can run .NET in WebAssembly where I have seen people running a warm-up code that runs JIT on a few important methods, runs some static constructors etc. and then takes the whole memory image, so you essentially end up with a pre-compiled image without any effort on .NETs part.

  • @testg4andmmm129
    @testg4andmmm129 4 months ago

    5% performance reduction.
    You're making a clickbait cover... waste of time...
    ++++ Compiles to native code.
    - 5% perf reduction.
    5% is nothing...

  • @LukasPetersen-bm4ep
    @LukasPetersen-bm4ep 1 year ago +3

    First :D

  • @AhmedMohammed23
    @AhmedMohammed23 1 year ago

    cpu buddies

  • @ilanb
    @ilanb 1 year ago +2

    I don't think NativeAOT will be adopted quickly.. it's like nullables, too much of a pain in the ass to use, and the benefits aren't worth the work IMO

    • @juliendebache8330
      @juliendebache8330 1 year ago +14

      How are nullable references a pain in the ass? They're pretty straightforward to use, and not having to worry about NREs anymore is quite nice.

    • @EraYaN
      @EraYaN 1 year ago +3

      I mean if you are using serverless it might be worth it pretty quickly.

    • @protox4
      @protox4 1 year ago

      @@juliendebache8330 It's a pain in the ass to convert older huge projects. It's just fine for new projects.