Use Arc Instead of Vec

  • Published: 24 Nov 2024

Comments • 422

  • @JohnPywtorak
    @JohnPywtorak 1 year ago +372

    As a person relatively new to Rust, I kept thinking: but Vec has the vec! macro for ease, and an Arc<[T]> might not be as ergonomic to get in place. It would have been nice if that pre-step had been covered, because you might reach for a Vec based on the ease vec! provides. So helpful though; it was a great learning tool and a fun watch.

    • @_noisecode
      @_noisecode  1 year ago +187

      That's a great point, and one I probably should have mentioned in the video! Thankfully, Arc<[T]> is pretty easy to create: it implements From<Vec<T>>, so you can create one with `vec![1, 2, 3].into()`, and it also implements FromIterator<T>, so you can create one by `.collect()`ing an iterator just like you would any other collection.
      Since the video was all about golfing unnecessary allocations etc., I should also mention that creating an Arc<[T]> often involves one more memory allocation + memcpy than creating the equivalent Vec<T> would have. There's some well-documented fine print here: doc.rust-lang.org/std/sync/struct.Arc.html#impl-FromIterator%3CT%3E-for-Arc%3C%5BT%5D%3E
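
A minimal sketch of the two construction paths described above, assuming a slice of `i32`. The helper names `shared_from_vec` and `shared_squares` are hypothetical and only there for illustration.

```rust
use std::sync::Arc;

/// Builds a shared slice from an existing Vec (hypothetical helper).
fn shared_from_vec(v: Vec<i32>) -> Arc<[i32]> {
    // From<Vec<T>> for Arc<[T]>: the elements are moved into a new
    // allocation that also holds the reference counts.
    v.into()
}

/// Builds a shared slice straight from an iterator via FromIterator.
fn shared_squares(n: i32) -> Arc<[i32]> {
    (1..=n).map(|x| x * x).collect()
}

fn main() {
    let a = shared_from_vec(vec![1, 2, 3]);
    // Cloning bumps the reference count; no elements are copied.
    let b = Arc::clone(&a);
    assert!(Arc::ptr_eq(&a, &b));
    println!("{:?} {:?}", a, shared_squares(3));
}
```

Once built, the `Arc<[i32]>` can be handed to any function that takes `&[i32]`.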

    • @mateusvmv
      @mateusvmv 1 year ago +11

      @@_noisecode Is it the same as Vec::into_boxed_slice, which only re-allocates if the vec has excess capacity? Arc implements From without re-allocation.

    • @_noisecode
      @_noisecode  1 year ago +38

      It's one more allocation, even if the Vec doesn't have any excess capacity, since in general Arc needs to move the data into its own allocation containing the reference count info. For the record, `From for Arc` does in fact allocate (see the implementation here: doc.rust-lang.org/src/alloc/sync.rs.html#1350 ).

    • @DBZM1k3
      @DBZM1k3 1 year ago +1

      How does From compare to simply using into_boxed_slice and using Box::leak instead?

    • @AlgorithmAces
      @AlgorithmAces 1 year ago +1

      @Ayaan K yes

  • @amateurprogrammer25
    @amateurprogrammer25 1 year ago +315

    It occurs to me that your use case for an Arc<str> could potentially be better served by a &'static str or just an enum. If you have an in-game level editor that allows creation of new monster types, Arc<str> would be ideal, but in most cases, the entire list of monsters that will ever exist is known at compile time. If you use an enum for the monster types, you can still derive all the traits you were deriving before; with some help from the strum crate or similar you can very easily implement as_str with custom strings containing spaces etc.; your memory footprint is a _single_ word; you can #[derive(Copy)], meaning cloning is effectively instantaneous; and as a bonus, you don't need a hashmap to keep track of monsters killed or enemy stats -- just declare the enum as #[repr(usize)] and use it as the index into a Vec, or better, an array.
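
The enum-as-index idea above could look roughly like this. It is a sketch only: the variant names, the display strings, and the `KillCounts` type are all made up for illustration, and `as_str` is hand-written here rather than generated by strum.

```rust
/// A closed set of monster kinds known at compile time (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(usize)]
enum MonsterKind {
    Goblin,
    Orc,
    Dragon,
}

impl MonsterKind {
    /// Number of variants; must be kept in sync with the enum by hand.
    const COUNT: usize = 3;

    /// Display string, which may contain spaces.
    fn as_str(self) -> &'static str {
        match self {
            MonsterKind::Goblin => "Goblin",
            MonsterKind::Orc => "Orc",
            MonsterKind::Dragon => "Ancient Dragon",
        }
    }
}

/// Kill counter backed by a plain array instead of a HashMap.
#[derive(Default)]
struct KillCounts([u32; MonsterKind::COUNT]);

impl KillCounts {
    fn record(&mut self, kind: MonsterKind) {
        // The enum discriminant is used directly as the array index.
        self.0[kind as usize] += 1;
    }

    fn get(&self, kind: MonsterKind) -> u32 {
        self.0[kind as usize]
    }
}

fn main() {
    let mut kills = KillCounts::default();
    kills.record(MonsterKind::Orc);
    println!("{}: {}", MonsterKind::Orc.as_str(), kills.get(MonsterKind::Orc));
}
```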

    • @zerker2000
      @zerker2000 1 year ago +43

      So much this. Hashing and string comparisons seem super overkill for a closed set known at compile time, and even if it is extensible in the editor, it still seems better to have the actual ids be a `u16` or w/e and the actual names interned in a global vec somewhere. Most operations don't care about the name! (possibly a tuple of type and instance, if you find yourself runtime spawning "guard1" "guard2" "guard3" etc)

    • @_noisecode
      @_noisecode  1 year ago +177

      Thanks for mentioning this, and yes, I couldn't _possibly_ agree more that if you have a closed set of variants known at compile time, please, please use an enum--as you say, it is better in every conceivable way than making your IDs "stringly-typed", especially with the help of e.g. `strum` to get you the string versions if you do need them.
      Sometimes you do need actual dynamic strings though, and they follow a create-once-clone-often usage pattern like what I show in the video. In those cases, I believe my arguments for using Arc<str> over String hold. For what it's worth, the real-world code that inspired the MonsterId in this video actually _was_ an ID that was loaded from a configuration file at runtime, and so there wasn't a closed set of variants known at compile time.

    • @zerker2000
      @zerker2000 1 year ago +23

      Personally in that circumstance I'd still be tempted to leak a `&'static [&'static str]`, unless you're reloading the config file _frequently_ or using those string ids over the network or something. But definitely makes more sense in that instance!

    • @alexpyattaev
      @alexpyattaev 1 year ago +7

      Make struct MonsterID(u16), and custom constructors for it that maintain actual names in a global vec, all behind rwlock. To log you can dereference to the actual location with string data, other "normal" uses can be all in u16. then all your game logic is just moving u16 around, no pointers or anything.
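
One way the interning scheme just described could be sketched. Everything here is an assumption made for illustration: the `NAMES` table, the `intern` and `name` methods, and the linear search (a real table would likely pair the Vec with a map for lookups).

```rust
use std::sync::RwLock;

/// Small copyable handle; the name itself lives in the global table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct MonsterId(u16);

/// Global name table behind a read-write lock (hypothetical).
static NAMES: RwLock<Vec<String>> = RwLock::new(Vec::new());

impl MonsterId {
    /// Returns the id for `name`, adding it to the table if it is new.
    fn intern(name: &str) -> MonsterId {
        let mut names = NAMES.write().unwrap();
        // Linear search keeps the sketch short.
        if let Some(i) = names.iter().position(|n| n == name) {
            return MonsterId(i as u16);
        }
        let id = u16::try_from(names.len()).expect("too many monster names for u16");
        names.push(name.to_owned());
        MonsterId(id)
    }

    /// Looks the name up again, e.g. for logging.
    fn name(self) -> String {
        NAMES.read().unwrap()[self.0 as usize].clone()
    }
}

fn main() {
    let guard = MonsterId::intern("guard");
    // Interning the same name twice yields the same id.
    assert_eq!(guard, MonsterId::intern("guard"));
    println!("{:?} -> {}", guard, guard.name());
}
```

Game logic then only moves `MonsterId` values around; the lock is taken only when a name is created or printed.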

    • @CamaradaArdi
      @CamaradaArdi 1 year ago +4

      I can envision the scenario where you load a level from a file; then you either have a &'file str, which you might not want, or you clone the string once, which is really not that expensive.

  • @pyromechanical489
    @pyromechanical489 1 year ago +216

    Arc/Rc work best when you don't really know the lifetimes of your data, but if your program is structured in a way that makes lifetimes obvious (say, loading data at the start of a block of code and reusing that), then you can use normal &'a [T] references and get the same benefits of cheap-to-copy immutable data that can be shared between threads, and doesn't even require a pointer indirection on clone!

    • @BigCappuh
      @BigCappuh 11 months ago +2

      How can you share state between threads without Arc?

    • @JeremyHaak
      @JeremyHaak 11 months ago +12

      @@BigCappuh Scoped threads can capture shared references.
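
A small sketch of sharing borrowed data between threads without Arc, using `std::thread::scope`. The `parallel_sum` function is a hypothetical example.

```rust
use std::thread;

/// Sums a borrowed slice on two threads without any Arc.
fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    // Threads spawned inside the scope are joined before `scope` returns,
    // so they are allowed to borrow `left` and `right`.
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    println!("{}", parallel_sum(&data));
}
```

This works when the owner clearly outlives the threads; when it does not, reference counting is the usual fallback.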

    • @araz911
      @araz911 2 months ago

      who care about this dog shitf? your code will be thrown to the trash can and replaced by AI anyways

  • @mithradates
    @mithradates 1 year ago +200

    Nice, did not expect a full 10+ minutes advocating for Arc over Vec on my recommendations. You deserve way more subscribers.

  • @fabbritechnology
    @fabbritechnology 1 year ago +438

    For high performance code, Arc is not cheap. Cloning a small string may actually be faster depending on your cpu topology and memory access patterns. As always, measure first.

    • @iilugs
      @iilugs 1 year ago +8

      Great point

    • @FandangoJepZ
      @FandangoJepZ 1 year ago +9

      Having small strings does not make your program high performance

    • @Mempler
      @Mempler 1 year ago +13

      A modern CPU (with avx-512 ext) can handle up to 64 bytes at the same time, nearly instantaneously. However, that's only for modern CPUs.
      Thus, if you know your architecture that you're running on, you can do pretty neat optimization

    • @warriorblood92
      @warriorblood92 1 year ago +5

      what you mean by cloning small string on stack? Strings are on heap right? so cloning will occur on heap only!

    • @David_Box
      @David_Box 1 year ago +18

      @@warriorblood92 strings can very much be on the stack. A "str" is stored on the stack (well actually it's stored in a read only part of the memory, different from the stack but it functions effectively the same), and you can very much clone them to the stack (even if rust makes it rather difficult to do so).

  • @constantinhirsch7200
    @constantinhirsch7200 1 year ago +32

    Rust's Arc<str> is close to Java's default String type: both are immutable, and both will be automatically freed when no one has a reference anymore. Rust's String is closer to Java's StringBuilder.
    I see this as further validation that Arc<str> is in fact quite a sane type to use in many situations.

    • @铜羅衛門
      @铜羅衛門 29 days ago

      Some even argued that Rust's `String` should have been named `StrBuf` for this reason, like how we have `Path` and `PathBuf`.

  • @michawhite7613
    @michawhite7613 1 year ago +22

    Another benefit of Box<str> is that the characters are actually mutable, even though the length isn't. So you can convert the string to uppercase or lowercase if you need to.

    • @Tumbolisu
      @Tumbolisu 1 year ago +5

      This makes me wonder if there are any unicode characters where the uppercase and lowercase versions take up different numbers of bytes. I imagine if you add diacritics you might find a situation where one version has a single unicode code point, while the other needs two.

    • @michawhite7613
      @michawhite7613 1 year ago

      @@Tumbolisu Unicode groups characters from the same alphabet together, so I think this is unlikely to ever happen

    • @Tumbolisu
      @Tumbolisu 1 year ago +9

      @@michawhite7613 I actually just found an example. U+1E97 (Latin Small Letter T With Diaeresis) does not have an uppercase version, which instead is U+0054 (Latin Capital Letter T) combined with U+0308 (Combining Diaeresis).

    • @Tumbolisu
      @Tumbolisu 1 year ago +10

      @@michawhite7613 Oh and how could I forget! ß is two bytes in UTF-8 while ẞ is three bytes. The larger ẞ was only introduced into the German language a few years back, while the smaller ß is ancient.

    • @michawhite7613
      @michawhite7613 1 year ago +4

      @@Tumbolisu You're right. The functions I'm thinking of are actually called `make_ascii_lowercase` and `make_ascii_uppercase`
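
A short sketch of the distinction reached in this thread: the ASCII-only methods can work in place on a `Box<str>` because they never change the byte length, while full Unicode case mapping has to return a new `String`. The helper name `shout_in_place` is made up for illustration.

```rust
/// Uppercases the ASCII letters of a boxed string in place (hypothetical helper).
fn shout_in_place(mut s: Box<str>) -> Box<str> {
    // Only bytes in 'a'..='z' are changed, so the length stays the same.
    s.make_ascii_uppercase();
    s
}

fn main() {
    let shouted = shout_in_place("straße".into());
    // The non-ASCII letter is left untouched.
    assert_eq!(&*shouted, "STRAßE");

    // Full Unicode uppercasing allocates a new String and may change the length.
    assert_eq!("ß".to_uppercase(), "SS");
    println!("{shouted}");
}
```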

  • @_jsonV
    @_jsonV 1 year ago +76

    As a core developer/moderator for Manim, it makes me happy to randomly find Manim-related videos in my recommended. Great job with the explanation of which data structure to use when mutability is(n't) required, and great visuals too!

    • @aemogie
      @aemogie 1 year ago +21

      manim-rs when /j

    • @Nick-lx4fo
      @Nick-lx4fo 1 year ago +4

      @@aemogie somebody is probably working on it somewhere

  • @FoxDr
    @FoxDr 1 year ago +60

    Very good video, the advocated point is really useful indeed. I only have 2 nitpicks about it:
    - It addresses less experienced Rust developers, but you forgot to mention how to construct values of these types (not that it's exceedingly complicated). A pinned comment might help in that regard (since with the algo apparently taking a liking to it, you might get spammed with questions about construction).
    - I would generally never recommend `Rc`, since `Arc` works using relaxed atomic operations, which have no overhead compared to their non-atomic counterparts. And while the MESI protocol may cause cache misses when accessing the cache line where the counts have been updated, this is not relevant when working in a single-threaded environment. So in general, `Rc` and `Arc` have identical runtime costs (not just similar), making using `Rc` useful only when you want to semantically prevent its content's ownership from being shared across threads, without preventing the content from being punctually shared across threads.

    • @_noisecode
      @_noisecode  1 year ago +37

      Great feedback, thank you. I think you're right and I went ahead and pinned the existing discussion of how to create an Arc--I agree I should have mentioned it explicitly in the video itself. Live and learn. :)
      As for Rc vs. Arc, your point is well made, but I think I will stick to my guns on recommending Rc where possible. Even if there are expert-only reasons to be sure there is no practical performance difference, this runs counter to the official guidance from the Rust standard library documentation which states that there may indeed be a performance difference (doc.rust-lang.org/std/sync/struct.Arc.html#thread-safety), and aside from performance alone, I would argue that the semantic argument is enough. If I know my type is not meant to be shared across threads, I ought to use the least powerful tool for the job (Rc) that allows me to accomplish that.

    • @zuberdave
      @zuberdave 1 year ago +25

      Arc's clone uses Relaxed, but its drop does not (it uses Release). In any case the atomic increment in clone is going to be more expensive than a non-atomic increment whether it's relaxed or not. Possibly you are thinking about relaxed atomic loads/stores, which are typically no more expensive than regular loads/stores.
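
The orderings mentioned above can be shown on a bare counter. This is a simplified sketch of a reference count, not the standard library's actual code; the `RefCount` type and its method names are hypothetical.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

/// A bare reference counter using the orderings discussed above.
struct RefCount(AtomicUsize);

impl RefCount {
    fn new() -> Self {
        RefCount(AtomicUsize::new(1))
    }

    /// What a clone would do: a Relaxed atomic increment. It is still an
    /// atomic read-modify-write, unlike Rc's plain integer increment.
    fn retain(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// What a drop would do: a Release decrement. Returns true when this
    /// was the last owner and the data could be freed.
    fn release(&self) -> bool {
        if self.0.fetch_sub(1, Ordering::Release) != 1 {
            return false;
        }
        // Synchronize with the other owners' Release decrements before freeing.
        fence(Ordering::Acquire);
        true
    }
}

fn main() {
    let count = RefCount::new();
    count.retain();
    assert!(!count.release());
    assert!(count.release());
}
```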

    • @iilugs
      @iilugs 1 year ago +3

      @@zuberdave Great point!

  • @dekrain
    @dekrain 1 year ago +34

    Small correction. With Arc<String>/Arc<Vec<T>>, the Arc pointer itself only stores 1 word, not 2, as the length is stored in the String/Vec object in the boxed cell on the heap, and String/Vec is Sized, unlike str/[T]. This can be useful if space is at a premium, but you can also use a thin Box/Rc/Arc, which isn't available in the standard library yet (ThinBox is in alloc, but it's unstable), which stores the length (and maybe capacity) directly next to the data, keeping the pointer a single word.
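
The pointer widths in this correction can be checked with `size_of`. A quick sketch, where `words` is a hypothetical helper that reports a type's size in machine words:

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Size of `T` measured in machine words (hypothetical helper).
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // Pointers to unsized data carry the length alongside the address.
    assert_eq!(words::<Arc<str>>(), 2);
    assert_eq!(words::<Arc<[u8]>>(), 2);
    // Pointers to sized data are thin; the length lives on the heap
    // inside the String or Vec itself.
    assert_eq!(words::<Arc<String>>(), 1);
    assert_eq!(words::<Arc<Vec<u8>>>(), 1);
    println!("Arc<str>: {} words, Arc<String>: {} word", words::<Arc<str>>(), words::<Arc<String>>());
}
```

The thin version costs a second pointer hop on every access, since the Arc points at the String, which in turn points at the bytes.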

    • @giganooz
      @giganooz 1 year ago

      Was just about to comment this. Also, hey man, didn't expect to run into you 😂

  • @enticey
    @enticey 1 year ago +3

    The infographics for each explanation are expertly simple and straightforward; never change them.

  • @savagemode2150
    @savagemode2150 2 months ago +1

    Very good explanation. Keep in mind that when dealing with buffers, the Tokio project maintains a crate called bytes.
    The goal of its Bytes type is to provide a robust byte array structure for network programming.
    The biggest feature it adds over Vec<u8> is shallow cloning. In other words, calling clone() on a Bytes instance does not copy the underlying data.
    Very useful and necessary.

  • @J-Kimble
    @J-Kimble 1 year ago +5

    I think this is the best explanation of Rust's internal memory management I've seen so far. Well done Sir!

  • @SJMG
    @SJMG 1 year ago +5

    That was really well done. I thought 15min on this topic was going to be a slog, but it was a well motivated, well visualized example.
    You've earned a sub. Keep up the good work, Logan!

  • @andres-hurtado-lopez
    @andres-hurtado-lopez 1 year ago +3

    It's not only a beautiful insight into the internals of memory allocation, but it also does an impeccable job of explaining the topic in plain English, so even entry-level developers can understand the good, the bad and the ugly. Keep doing such a great job spreading the word about such an awesome programming language!

  • @spikespaz
    @spikespaz 1 year ago +2

    Your channel is going to explode if you keep doing videos like this one.

  • @leddoo
    @leddoo 1 year ago +2

    love it!
    i often do something similar with `&'a [T]` by allocating from an arena/bump allocator. (this also has the added benefit that the allocation truncation is free)

  • @waynechoi883
    @waynechoi883 1 year ago +1

    Just making this change on a large vec in my program resulted in a 5x speed up for me. Thanks for the video!

  • @kdurkiewicz
    @kdurkiewicz 1 year ago

    There's a disadvantage of using Rc/Arc though: these types are not serializable, while String is.

    • @_noisecode
      @_noisecode  1 year ago +4

      As I mentioned in the video, there's a serde feature flag that enables support for Rc/Arc. Check the docs.

  • @phillipsusi1791
    @phillipsusi1791 6 months ago

    I'll go one further... monster name strings likely are static anyhow, and so you don't even need Arc<str> or Box<str>; you can just use &'static str directly. Then clones don't even need to increment a reference count on the heap; you just copy the pointer.

  • @JeremyChone
    @JeremyChone 1 year ago +14

    Nice video, and very interesting take. I am going to give this pattern a try in some of my code and see how it goes. Thanks for this great video!

  • @dragonmax2000
    @dragonmax2000 1 year ago +18

    Really awesome insight! Please continue making these.

  • @TCSyndicate
    @TCSyndicate 7 months ago +1

    Commenters have pointed it out somewhat, but this video represents a misunderstanding of the purpose of different types. What you want here is a &[T], not an Arc<[T]>. The confusion is that sometimes you feel forced to make an allocation because you're doing something like giving monsters IDs from a loaded configuration file. In that case, you make one allocation at the start of the program for the config file, then each monster holds a &str into that allocation. Having to make an allocation for the config file doesn't mean you need to make, or hold, an allocation for each thing that uses the config file. Consider writing a programming language implementation with multiple parsing phases. The efficient thing to do is to make one String allocation at the start of the program for the source code, then call a lex(&str) -> Vec<&str> that returns subslices of the original String buffer.
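
A minimal sketch of the single-allocation approach described above, assuming a toy whitespace tokenizer. The `lex` body is made up; a real lexer would do much more.

```rust
/// Splits the source into tokens that borrow from it. The only text
/// allocation is the source buffer itself (toy whitespace lexer).
fn lex(source: &str) -> Vec<&str> {
    source.split_whitespace().collect()
}

fn main() {
    // One String allocation for the whole input...
    let source = String::from("let x = 42");
    // ...and every token is a subslice of that buffer.
    let tokens = lex(&source);
    assert_eq!(tokens, vec!["let", "x", "=", "42"]);
    println!("{tokens:?}");
}
```

The trade-off is that the tokens cannot outlive `source`, which is where the lifetime annotations discussed elsewhere in this thread come in.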

  • @Dominik-K
    @Dominik-K 1 year ago

    Thanks a bunch for the clarifications. Memory allocations are one of the major factors in shaping performance characteristics and understanding them may not always be an easy task. Your video and especially the visualization help a lot! Great work

  • @dsd2743
    @dsd2743 1 year ago

    As for Arc<str>: depending on the use case, you can just Box::leak() a String and pass around the &'static str. Typically, especially if used as IDs, the total number of such strings is low anyway.
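
A sketch of the leak approach, assuming the string is first converted to a `Box<str>`. The `leak_id` helper is hypothetical.

```rust
/// Turns a runtime String into a `&'static str` by leaking its buffer.
/// The memory is never reclaimed, so this only suits a small, bounded
/// set of strings (hypothetical helper).
fn leak_id(name: String) -> &'static str {
    Box::leak(name.into_boxed_str())
}

fn main() {
    let id = leak_id(format!("goblin-{}", 7));
    // Copying the reference is a plain pointer-and-length copy.
    let copy = id;
    assert_eq!(copy, "goblin-7");
    assert_eq!(id.as_ptr(), copy.as_ptr());
}
```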

  • @Calastrophe
    @Calastrophe 1 year ago +18

    I don't typically comment on videos. I have to say this was really well made, please keep up this level of content. I really enjoyed it.

  • @thorjelly
    @thorjelly 1 year ago +45

    I have a few concerns recommending this to a beginner "as a default". I feel like the times when you actually want to clone the arc, such as if you want to store the same list in multiple structs without dealing with lifetimes, are quite situational. Most of the time, what you should do is dereference it into a slice to pass around, because it is more performant and it is more general. But I am afraid that using an arc "as a default" would encourage a beginner to develop the bad habit of just cloning the arc everywhere. The need to pass an immutable reference/slice is not enforced by the compiler, but it is with other data types. Worse, this could give beginners a bad misunderstanding how clone works, because arc's clone is very different from most other data type's clone. Do we want the "default" to be the absolute easiest, absolute most general solution? Or do we want the default solution to be the one that enforces the best habits? I would argue for the latter.
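
The habit recommended above, passing a slice rather than cloning the Arc, might look like this. The function `total` is a made-up example.

```rust
use std::sync::Arc;

/// Takes a plain slice, so it works with Vec, arrays and Arc<[T]> alike.
fn total(xs: &[u32]) -> u32 {
    xs.iter().sum()
}

fn main() {
    let shared: Arc<[u32]> = vec![1, 2, 3].into();
    let owned = vec![4, 5];

    // Deref coercion turns &Arc<[u32]> and &Vec<u32> into &[u32].
    assert_eq!(total(&shared), 6);
    assert_eq!(total(&owned), 9);

    // No clone was needed, so the reference count is untouched.
    assert_eq!(Arc::strong_count(&shared), 1);
}
```

Cloning the Arc is then reserved for the cases where a second owner really has to keep the data alive.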

    • @rossjennings4755
      @rossjennings4755 1 year ago +1

      So what you're saying is that we should be recommending Box<[T]> as default, then. Makes sense to me.

    • @thorjelly
      @thorjelly 1 year ago +7

      @@rossjennings4755 I would say if you're using Box<[T]> you might as well just use Vec<T>, unless for some reason you want to guarantee that it will never be resized.

    • @constantinhirsch7200
      @constantinhirsch7200 1 year ago

      When you look at the Rust language survey, one big hurdle always mentioned is the steep learning curve of Rust. Just using Arc<str> for all strings by default may alleviate that burden. Performance is at least on par with any GC'ed language with immutable Strings (e.g. Java) and those also run fast enough most of the time.
      And secondly, who is to say that all Rust programs must always be optimized for runtime performance? If you do some rapid development (i.e. optimizing for developer performance) in Rust, of course you can use Arc<str> and then later on *if* the program is too slow you can still come back and optimize the critical parts. From that point of view, thinking about lifetimes a lot early in development, just to avoid the reference counting, might even be considered a premature optimization.

    • @4xelchess905
      @4xelchess905 1 year ago

      @@thorjelly The video mentions immutable data, in which case it won't be resized. But yeah totally agree on what you said, the default good practice should be Vec/Box for the owner and &[T] for the readers, and only consciously opt for Rc when useful or necessary.

    • @4xelchess905
      @4xelchess905 1 year ago

      @@constantinhirsch7200 "who is to say that all Rust programs must always be optimized for runtime performance?". Logan Smith. Logan Smith is to say precisely that. The whole point of the video you just watched is to advocate that Arc<[T]> is more performant at runtime than Vec<T>, while being a drop-in replacement.
      The gripe thorjelly and I have with it is that Arc<[T]> is a lazy, half-assed optimization. If you want to delegate optimization for later, why touch the code at all, why learn smart pointers the wrong way when you could stick to cloning Strings? Wouldn't that be premature optimization, or at least premature? If you want to optimize, why use smart pointers when a slice reference is both enough and better?

  • @NoBoilerplate
    @NoBoilerplate 1 year ago +2

    Fantastic video, wow!

  • @Outfrost
    @Outfrost 1 year ago +1

    Why would the clone performance of Arc be a factor? You get a pointer to the same exact slice. That's like taking an immutable reference to a Vec, which is faster. It does not fulfil the same role as a Vec clone, so it should not be compared to it.
    I also don't think your stack size and cache locality argument works for anything besides a static string slice. I can't imagine the semantic gymnastics needed to justify iterating over a significant number of Arc clones pointing to the same [T] in memory.
    In general I think you're making a different argument than you think you're making, and giving a different programming tip than you think you're giving.

  • @TobiasFrei
    @TobiasFrei 1 year ago +1

    I really admire your dense, concise way to "think" in Rust 🤓

  • @nilseg
    @nilseg 1 year ago +3

    Very nice video ! I love how you explain this. Can't wait your next topic. I shared it on Reddit and already lot of views and good feedbacks. Continue your work ;)

  • @jehugaleahsa
    @jehugaleahsa 1 year ago +1

    I think what would have helped me was a quick example of how you initialize an Rc<str>, Arc<str>, and Box<str>. It's pretty obvious when the str is a compile-time constant, but less obvious when it's from a runtime string. Do you simply create a String and then call Arc::new on it? Does memory layout change when it's a compile-time vs. runtime string?

    • @_noisecode
      @_noisecode  1 year ago +1

      Check the pinned comment! (Spoiler: it's Arc::from("foo"), or Arc::from(my_string)). Memory layout doesn't change, as they're both the same type (Arc<str>).
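
A sketch of the string constructors mentioned in this reply, covering both a literal and a runtime String. The `monster_id` helper is hypothetical.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Builds a shareable id from any string slice (hypothetical helper).
fn monster_id(name: &str) -> Arc<str> {
    Arc::from(name)
}

fn main() {
    // From a compile-time literal.
    let from_literal: Arc<str> = Arc::from("goblin");
    // From a String built at runtime.
    let runtime: String = format!("gob{}", "lin");
    let from_string: Arc<str> = Arc::from(runtime);
    // Same type and same contents either way.
    assert_eq!(from_literal, from_string);

    // Rc<str> and Box<str> are built the same way.
    let single_threaded: Rc<str> = Rc::from("goblin");
    let unique: Box<str> = Box::from("goblin");
    assert_eq!(&*single_threaded, &*unique);
    println!("{}", monster_id("orc"));
}
```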

  • @iamtheV0RTEX
    @iamtheV0RTEX 1 year ago +5

    Very neat insight! My experience with Arc so far has mostly been limited to either "fake garbage collection" with the Arc anti-pattern, or sharing immutable data between threads or async futures. I've tried avoiding cloning Vecs/Strings by passing around &[T] and &str references (and their &mut counterparts) but putting lifetime annotations in your hashmap keys is a nightmare.

    • @blehbleh9283
      @blehbleh9283 1 year ago +4

      How is that an antipattern for async shared state?

    • @iamtheV0RTEX
      @iamtheV0RTEX 1 year ago +1

      @@blehbleh9283 I didn't say it was, I said fake GC was the antipattern, where you give up on handling lifetimes and overuse Rc and Arc in cases where it's not necessary.

    • @blehbleh9283
      @blehbleh9283 1 year ago

      @@iamtheV0RTEX oh okay! Thanks for teaching

  • @markay7311
    @markay7311 1 year ago +9

    This to me seems like comparing apples and oranges. As you mentioned, Vec works well for a modifiable buffer. Yet, you do advocate for using a simple slice wrapped by Arc. This assumes you have the slice at compile time. How would you build your slice dynamically without Vec? It seems to me you would still need a Vec, which you can convert into [T] to wrap with Arc. Even worse, Arc is usually for multithreaded environments. Why not just use Rc? My point is, I don’t see this suggestion making any sense really, as these two type have very different specific use cases. The video was well made though, I appreciate the great effort.

    • @iilugs
      @iilugs 1 year ago +2

      vec![...].into() gives you an Arc<[T]>, so it's one "clone", and from then on no expensive clones at all.
      So you build it initially with Vec, and then convert it. After this conversion, all the points in the video apply.
      Regarding Rc vs. Arc, see 3:20.

    • @markay7311
      @markay7311 1 year ago

      @@iilugs it sounds to me like simply borrowing would do the trick

    • @expurple
      @expurple 1 year ago +1

      @@markay7311 It would, but only in a single-threaded environment and only if there's an obvious owner that outlives the rest. Also, Rc/Arc don't require lifetime annotations (I don't mind these, but only for simple cases with temporary "local" borrowing)

    • @PhthaloJohnson
      @PhthaloJohnson 1 year ago

      @@expurple Using a non mutable reference is perfectly fine for as many threads as you wish. Arc isn't some magic savior.

  • @Rose-ec6he
    @Rose-ec6he 1 year ago +6

    I'm not fully convinced. I'd love to see a follow-up video about this. Here are my thoughts.
    When I first saw this pop up in my feed I was very confused, because Arc is a wrapper around a pointer and Vec is a data structure, so comparing an Arc to a Vec seems like an unfair comparison. It seems more appropriate to me to compare Arc<[T]> to Arc<Vec<T>>, and there's very little difference here, though I suppose specifically when dealing with strings it's not easy to get access to and use the underlying vec. Nonetheless, it makes more sense to me to compare the two.
    Until you brought up the fact that Arc implements Deref I was thinking it was an all-round pointless idea, but now I'm split on the issue.
    Something else to consider is ease of use, which I don't think you addressed very well.
    Lifetimes will definitely come into play here but don't with String, so it won't be just as easy to pass around at all. Another barrier is that if you need to build the string at runtime you will normally end up with a vec anyway, which could be shrunk to size, and putting the vec behind an Arc would achieve mostly the same thing; in comparison, having an array pre-built at compile time is very rare in my experience. There are definitely extra steps and effort involved here which I'm not convinced you have considered carefully. There is no built-in way to convert from a vec to an array; there are some useful crates, but more libraries always mean more complexity in your codebase, so they're best not added without some consideration.
    I also think the performance benefits you state are very exaggerated, and it's never worth talking performance without having some benchmarks to back the claims up, imo. Strings are rarely large too, so the memory reduction might be there but it would be small. Once again, there are no benchmarks to back any of this up, so I don't know, and I'm not set in either perspective.
    I'll keep an eye on your channel. I hope to see some follow-up!

  • @RenderingUser
    @RenderingUser 1 year ago +1

    This could not have come at a more perfect time. I've been storing a list of a list of immutable data with a thousand elements in a vec

    • @ronniechowdhury3082
      @ronniechowdhury3082 1 year ago

      You should not be storing 2D arrays; switch to contiguous storage and store the dimensions. ndarray might be an option

    • @RenderingUser
      @RenderingUser 1 year ago +3

      @@ronniechowdhury3082 I wish I knew what contiguous storage means.

    • @ronniechowdhury3082
      @ronniechowdhury3082 1 year ago

      @@RenderingUser just create a struct that stores each row appended together in one long vec. Then store the width and height as usize. Finally add some methods that access a row or column at a time by slice. It will make your data access significantly faster.
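
The layout described in this reply could be sketched as follows. The `Grid` type and its methods are hypothetical. Note that a column is not contiguous in a row-major layout, so this sketch returns an iterator for columns rather than a slice.

```rust
/// A 2D grid stored as one contiguous, row-major Vec (hypothetical type).
struct Grid<T> {
    cells: Vec<T>,
    width: usize,
    height: usize,
}

impl<T: Clone> Grid<T> {
    fn new(width: usize, height: usize, fill: T) -> Self {
        Grid { cells: vec![fill; width * height], width, height }
    }
}

impl<T> Grid<T> {
    /// A row is a contiguous slice of the backing Vec.
    fn row(&self, y: usize) -> &[T] {
        assert!(y < self.height);
        &self.cells[y * self.width..(y + 1) * self.width]
    }

    /// A column is strided, so it is exposed as an iterator.
    fn column(&self, x: usize) -> impl Iterator<Item = &T> + '_ {
        assert!(x < self.width);
        self.cells.iter().skip(x).step_by(self.width)
    }

    fn set(&mut self, x: usize, y: usize, value: T) {
        assert!(x < self.width && y < self.height);
        self.cells[y * self.width + x] = value;
    }
}

fn main() {
    let mut grid = Grid::new(3, 2, 0u8);
    grid.set(1, 1, 7);
    println!("{:?}", grid.row(1));
}
```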

  • @otaxhu
    @otaxhu 1 year ago +2

    great video. I'm learning Rust and this video is very helpful for understanding different ways of storing data. I'm struggling with the borrowing and ownership system but well I couldn't do any better

  • @inertia_dagger
    @inertia_dagger 1 year ago +1

    loved the video, and loved the discussions in the comments too. really appreciate it as the rust beginner, keep it up!

  • @endogeneticgenetics
    @endogeneticgenetics 1 year ago +2

    `str` can also be accessed across threads via `&str` (since it's immutable). And cloning has no special properties I can think of here since the data is immutable. `Arc<str>` only seems advantageous if you want reference counting vs. relying on something like `'static` for a string. The video was fun either way -- but can you give a reason you'd prefer the Arc<str> or Rc<str> fat pointer to just referencing str?

  • @timClicks
    @timClicks 1 year ago +1

    Love this Logan. What a wonderful explanation and a good challenge to orthodoxy. I'll provide one answer to the question that you posed a few times in the video, "Why use String (or Vec) rather than Arc?". That's because accessing the data from an Arc incurs some runtime cost to ensure that Rust's ownership semantics are upheld. That cost doesn't need to be paid by exclusively owned types.

    • @_noisecode
      @_noisecode  1 year ago +1

      Thanks for the kind words. :)
      Accessing an Arc incurs no runtime cost with regard to Rust's ownership rules. The runtime cost of accessing the pointed-to data is about the same as for Vec: a pointer indirection. Possibly you are thinking of RefCell? RefCell does involve some slight runtime overhead due to essentially enforcing Rust's borrow checking rules at runtime.
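
A small sketch contrasting the two cases in this reply: reading through an Arc is a plain dereference, while RefCell tracks borrows at runtime. The `can_mutate` helper is made up for illustration.

```rust
use std::cell::RefCell;
use std::sync::Arc;

/// Reports whether a mutable borrow would succeed right now (hypothetical helper).
fn can_mutate(cell: &RefCell<String>) -> bool {
    cell.try_borrow_mut().is_ok()
}

fn main() {
    // Reading through an Arc is just a pointer dereference; nothing is
    // checked or counted on access.
    let shared: Arc<str> = Arc::from("goblin");
    assert_eq!(shared.len(), 6);

    // RefCell keeps a borrow flag and checks it at runtime on every borrow.
    let cell = RefCell::new(String::from("goblin"));
    let reader = cell.borrow();
    assert!(!can_mutate(&cell)); // refused while a reader is alive
    drop(reader);
    assert!(can_mutate(&cell));
}
```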

    • @timClicks
      @timClicks 1 year ago

      @@_noisecode Oof, I knew that I should have looked that up. You're right

  • @irlshrek
    @irlshrek 1 year ago

    this was fun! it's like when they say "make it work, then make it right, then make it fast". This is a really good example for what to do in that second or third step!

  • @GuatemalanWatermelon
    @GuatemalanWatermelon 1 year ago

    The visuals were fantastic in guiding me through your explanation, great stuff!

  • @asefsgrd5573
    @asefsgrd5573 1 year ago +1

    I would also mention `.as_ref()` as some impl types require the exact `str` type. Great video!

  • @kirglow4639
    @kirglow4639 1 year ago +1

    Awesome video and narration! Always exciting to see well-explained Rusty content. Keep it up!

  • @nebulaeandstars
    @nebulaeandstars 1 year ago +4

    - Need ownership over a T? use Box<T>.
    - Need multi-ownership over a T? use Rc<T>.
    - Need multi-ownership of a T across threads? use Arc<T>.
    - Need an array of U? use [U; N] or &[U].
    - Need a dynamically-sized array of U? use Vec<U>.
    Substitute U and T as needed.

  • @Shaunmcdonogh-shaunsurfing
    @Shaunmcdonogh-shaunsurfing 1 year ago +2

    I’ve turned on the bell notification. Also, happy to pay for a cheat sheet on memory allocation recommendations.

  • @alexpyattaev
    @alexpyattaev 1 year ago +1

    The amount of memory allocated to a String during a clone operation is actually allocator-specific. For a 6-byte string I would not be surprised to see the allocator using an 8-byte bucket. So there will always be a degree of waste when cloning strings/vecs.

  • @SophieJMore
    @SophieJMore 1 year ago

    Arc<str> is sort of similar to how a lot of other languages like Java or C# handle strings, isn't it?

  • @jermaineallgood
    @jermaineallgood 1 year ago

    Thank you for this insight! I'd never have thought to use Arc<[T]> instead of Vec<T>. I'll probably use Criterion to compare the timings of both.

  • @blehbleh9283
    @blehbleh9283 1 year ago

    Arc is a godsend for concurrency

  • @sanderbos4243
    @sanderbos4243 1 year ago

    Your graphics and script are a masterpiece

  • @tommyponce2511
    @tommyponce2511 1 year ago

    Started watching the video thinking Arc and Vec had totally different use cases, and I'm glad you proved me wrong lol very useful info when you're trying to implement memory efficient coding. Thanks for the data man, really interesting and useful stuff. Cheers!

  • @scvnthorpe__
    @scvnthorpe__ 1 year ago

    Weird question but, if a Vec is meant to be growable, why does it have a defined capacity?
    My best guess is that some people might initialise a Vec with n null spaces for performance reasons (given general expected requirements) and then you'd need to know if you can safely go faster with allocations in the intended way... but it's, let's be real, a pretty poor guess lol.
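
On the capacity question: a Vec's capacity is the size of the heap buffer it currently owns, while its length is how many elements are actually in use. The extra room is uninitialized spare space rather than null elements, and pushing past it triggers a reallocation. A minimal sketch of that, using a hypothetical helper `fill_reserved`:

```rust
/// Hypothetical helper: reserves room for `n` items up front, then fills it.
/// Returns (final length, capacity before filling, whether the buffer stayed in place).
fn fill_reserved(n: usize) -> (usize, usize, bool) {
    let mut v: Vec<u32> = Vec::with_capacity(n);
    let capacity_before = v.capacity();
    let buffer_before = v.as_ptr();
    for i in 0..n {
        // Each push fits in the reserved space, so no reallocation is needed.
        v.push(i as u32);
    }
    (v.len(), capacity_before, v.as_ptr() == buffer_before)
}

fn main() {
    let (len, cap, stayed) = fill_reserved(1000);
    println!("len = {}, reserved capacity = {}, buffer stayed in place = {}", len, cap, stayed);
}
```

Reserving up front is the usual reason to care about capacity: when the final size is roughly known, it avoids repeated grow-and-copy steps.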

  • @rohankapur5776
    @rohankapur5776 1 year ago

    this was very informative. we need more rust golfing vids on youtube!

  • @cookieshade197
    @cookieshade197 1 year ago +4

    I'm confused by the use case presented -- if you want cloneable, immutable string data, surely you'd just pass around indices into a big Vec, or even just &str's directly if the lifetimes allow it?
    Good video nonetheless.

    • @iwikal
      @iwikal 1 year ago

      Sure, you could always construct a Box<str> and then Box::leak it to get an immortal &'static str if you're fine with never reclaiming the memory. This memory leak could become a problem if it's unbounded though. Imagine the game is able to spawn arbitrarily many monsters over time, creating more and more IDs. I'm assuming by immutable he meant "immutable as long as it's in use, but after that it gets deleted". If you want to reclaim memory by getting rid of unused IDs, the Vec strategy gets iffy. What if you want to delete an ID in the middle of the Vec? Not an unsolvable problem, but it's already getting much more complex than the simple MonsterID(String) we started with. Plus, if you actually want to access the string contents you need access to the Vec, so you need to pass around a reference to it. And if you're going multithreaded you need to protect it with a Mutex or similar. I'm not a fan.
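
A minimal sketch of the leak approach described in this reply, assuming a hypothetical helper `leak_id`. The memory behind the returned reference is never freed, which is exactly the trade-off under discussion.

```rust
/// Hypothetical helper: copies `name` to the heap and leaks it on purpose,
/// so the returned reference stays valid for the rest of the program.
fn leak_id(name: &str) -> &'static str {
    let boxed: Box<str> = name.into();
    Box::leak(boxed)
}

fn main() {
    let id = leak_id("goblin");
    // `&'static str` is Copy: duplicating it copies a pointer and a length,
    // never the string data.
    let copy = id;
    println!("{} {}", id, copy);
}
```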

    • @cookieshade197
      @cookieshade197 1 year ago

      @@iwikal Hmm, all true on paper. I would assume that, even in a very large game, all monster ID strings ever encountered during runtime are a finite set taking up at most 10kB or so in total, or maybe 1MB if we have very long text descriptions. If the game can dynamically generate large numbers of monster ID strings, or load/unload bigger data chunks, I'd try replacing the Vec with a HashMap or similar, though that gets awkward with multithreading for the same reason.

    • @iwikal
      @iwikal 1 year ago

      @@cookieshade197 If you leak all IDs and the game keeps allocating new ones, it will run out of memory sooner or later (potentially much later). Maybe you can get away with it if you assume that nobody will leave the game running for more than 24h straight, but what if it's a server? Ideally it should be able to handle years of uptime.

    • @iwikal
      @iwikal 1 year ago

      @@cookieshade197 To elaborate, what I mean is you can't always assume that there is a reasonably sized set of possible IDs, and even if there was you'd have to construct some kind of mechanism for reusing the old ones. Say we were talking about a ClientId instead, based partially on their IP address. It just seems wrong to me that I should let those stay around in memory after the connection is terminated, until the same client connects again. Maybe they never do, in which case the memory is wasted.

    • @masondeross
      @masondeross 1 year ago

      @@iwikal The issue isn't running out of memory. That is almost never going to happen in a real game. The issue is cache misses. You want to be able to perform operations on a large number of monsters every single frame, and every unnecessary byte is another monster pushed out of the cache. "Unnecessary" here means unnecessary for the particular operation, which is why games use data-oriented design, where "wasteful" copies of monsters in different structures are fine as long as only the minimal related data is kept in context for each piece of logic. It isn't about total memory usage in games, which is very counterintuitive coming from other domains.

  • @CamembertDave
    @CamembertDave 1 year ago +3

    I agree with your premise and the reasons you give from 12:45, but I found your main arguments kinda... odd? In your opening points you say this is especially useful for data that implements Clone, but the usage pattern you lay out explicitly involves not cloning the data. You clone Strings in the example, but there's clearly no reason to do that because the data is immutable - you're only cloning the Strings to get multiple references to that data. Passing around multiple references to a single piece of data is the whole point of Arc, so of course that is a better solution than duplicating the data to share it. It actually feels like that's the real reason you are wanting to use Arc, but it's not mentioned in the video. You do make a good point of explaining the inefficiency of Arc though.
    The example itself also strikes me as odd, because the ids are a struct which implements Clone and then to avoid the performance cost of cloning the ids all over the place you reach for Arc, when surely the more natural optimization is to avoid the unnecessary cloning by using a struct which implements Copy instead of Clone, say MonsterId(u64)? If you really need the string data for something other than simply being an id, then you can put that in the EnemyStats struct (which I would assume contains various other data you don't want to be copying around even if immutable).
    As I said though, I do agree with your overall point. Perhaps an example that used Vec would have cleared these points up, because although - as you quite rightly point out - String and Vec work in essentially the same way, they are quite distinct semantically in most situations. It would be obvious that calling clone on the (probably very long) enemies_spawned Vec is a bad idea, for example, even if this was immutable.
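
The alternative proposed in this comment could look roughly like the sketch below. `MonsterId` and `EnemyStats` are named in the comment; the `Registry` type, its fields, and its methods are assumptions made up for illustration.

```rust
use std::collections::HashMap;

/// A Copy id: passing it around copies 8 bytes and never touches the heap.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct MonsterId(u64);

/// The string data lives here, next to the other per-monster data.
struct EnemyStats {
    name: String,
    health: u32,
}

/// Hypothetical registry that hands out ids and owns the stats.
#[derive(Default)]
struct Registry {
    next_id: u64,
    stats: HashMap<MonsterId, EnemyStats>,
}

impl Registry {
    fn spawn(&mut self, name: &str, health: u32) -> MonsterId {
        let id = MonsterId(self.next_id);
        self.next_id += 1;
        self.stats.insert(id, EnemyStats { name: name.to_string(), health });
        id
    }

    fn describe(&self, id: MonsterId) -> Option<String> {
        self.stats
            .get(&id)
            .map(|s| format!("{} ({} hp)", s.name, s.health))
    }
}

fn main() {
    let mut registry = Registry::default();
    let goblin = registry.spawn("goblin", 10);
    let same_goblin = goblin; // plain copy, `goblin` stays usable
    println!("{:?} {:?} {:?}", goblin, same_goblin, registry.describe(goblin));
}
```

The cost is that reading the name now needs access to the registry, which is the objection raised elsewhere in the thread against index-based ids.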

  • @nazim9639
    @nazim9639 1 year ago +2

    sanest rust dev

    • @Heynmffc
      @Heynmffc 4 months ago

      Finally someone else normal in the comments.
      Everything about this is wild and I love it.

  • @robertotomas
    @robertotomas 1 year ago

    Awesome! Thank you for this explanation. I've heard bits and pieces of this before and it was making sense that I should start doing this as I am learning Rust… but this one video gave me a ton of context; I think I'm actually going to do this as a refactor phase now 😊

  • @marshallhank891
    @marshallhank891 1 month ago +1

    It reminds me that when programming in C, we just pass pointers around; it's cheap.

  • @jaykickliter
    @jaykickliter 1 year ago +5

    This video has some not-great advice and half-truths

    • @Rudxain
      @Rudxain 2 months ago

      I guess it's because some examples are really contrived

  • @joelmontesdeoca6572
    @joelmontesdeoca6572 1 year ago

    This was fantastic. Thank you for making this video.

  • @ambuj.k
    @ambuj.k 1 year ago

    I recently tried using Arc<str> in a project, a websocket chat app, because there were too many clones of a string being sent to every channel. The problem with using this type is that it is not serializable or deserializable by serde, and serde derive does not work on it.

    • @_noisecode
      @_noisecode 1 year ago

      Arc works just fine with Serde, you just need to flip on a Serde feature flag. I mentioned this in the video and there's a link to the feature flag docs in the description. :)

    • @ambuj.k
      @ambuj.k 1 year ago

      @@_noisecode Hey thanks! I didn't read the description. I enabled "rc" feature and it now works!

  • @Otakutaru
    @Otakutaru 1 year ago +4

    You know that Rust is healthy as a language when there are videos about it that only rustaceans could fully understand and make use of

  • @DenisAndrejew
    @DenisAndrejew 1 year ago +2

    Good food for thought and illustrations, but I very much wish you would use Rc instead of Arc in most of this, and then showed folks how to determine if you actually need to "upgrade" to Arc when necessary. Healthier practice for the whole Rust ecosystem to not default to thread-safe types & operations when not actually necessary. We'll all pay with decreased performance of the software we use proportionately to how much thread-safe code is overused. 🙂

  • @ChetanBhasin
    @ChetanBhasin 1 year ago +1

    Out of curiosity, what tool are you using for making your videos?

  • @jeffg4686
    @jeffg4686 1 year ago

    Great tutorial. One thing I was thinking about recently is the overuse of Result: not all functions are fallible, yet many infallible functions unnecessarily return Result instead of just a value. I think everyone just got used to returning Result... Worth looking into. Also worth a clippy lint if there isn't one for this. For an API, it should always be Result of course, but we're often not developing APIs.

    • @zombie_pigdragon
      @zombie_pigdragon 1 year ago +1

      Hm, do you have any examples where this has happened? I've never seen it in the wild.

  • @Kupiakos42
    @Kupiakos42 1 year ago +1

    14:20 isn't totally right: Arc<String> doesn't need to carry a length; it's a thin pointer

  • @rainerwahnsinn3262
    @rainerwahnsinn3262 1 year ago

    It seems `Box<str>` is like an immutable `String`, but even better because it lacks the capacity field, since it can never grow.
    In other words, if your `String` is not mutable, you should use `Box<str>`. What am I missing?

    • @_noisecode
      @_noisecode 1 year ago

      Cloning a Box<str> requires a deep clone of the string data. Cloning Arc<str> only does a shallow clone and bumps the refcount. If you don't need clone, you're right (as I also mention at the end), Box<str> is your best option. If you do, Arc<str> can be better. Both are better (for immutable strings) than String.
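
The deep-versus-shallow difference in this reply can be observed by comparing buffer addresses after a clone. This is a small sketch with hypothetical function names; it assumes a non-empty string, since empty boxes do not allocate.

```rust
use std::sync::Arc;

/// Clones a `Box<str>` and reports whether the clone points at the same bytes.
fn box_clone_shares_buffer(text: &str) -> bool {
    let original: Box<str> = text.into();
    let copy = original.clone(); // allocates and copies the bytes
    original.as_ptr() == copy.as_ptr()
}

/// Clones an `Arc<str>` and reports whether the clone points at the same allocation.
fn arc_clone_shares_buffer(text: &str) -> bool {
    let original: Arc<str> = text.into();
    let copy = Arc::clone(&original); // only bumps the strong count
    Arc::ptr_eq(&original, &copy)
}

fn main() {
    println!("Box<str> clone shares buffer: {}", box_clone_shares_buffer("goblin"));
    println!("Arc<str> clone shares buffer: {}", arc_clone_shares_buffer("goblin"));
}
```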

  • @JohnWilliams-gy5yc
    @JohnWilliams-gy5yc 1 year ago

    A shared-ownership immutable string with single-indirection performance is impressively clever. However, its benefits *_really_* shine only (1) when you make tons and tons of copies of long, permanent text and (2) when your design desperately can't attach the string's ownership to anything else in the same scope, which makes that scenario kind of *_unlikely_* TBH. Moreover, this may encourage so-called premature optimization that adds unnecessary complexity to your design for a small speed gain.
    Kudos to very clear beautiful animation. Thank you.

  • @Turalcar
    @Turalcar 1 year ago +2

    I'd also look into compact_str (there are other similar crates but this one is the fastest of those I tried).

  • @clockworkop
    @clockworkop 1 year ago +5

    Hello, the video is great and I really like the point you are making, especially the cache proximity. I will definitely give this a try at some point. Even with that though, I have a few questions.
    By wrapping the values in Arc, you are effectively turning clones into references without lifetimes. I understand that sometimes it's better and easier to work with the full owned value, but if you need that, you can just clone the shared reference on demand.
    I don't know why, but this feels a bit like Swift to me. Rust has the advantage of the ownership model so if you can do the job just with shared references, I don't see the need for Arc. But of course I could be wrong so please correct me if that's the case.

    • @_noisecode
      @_noisecode 1 year ago +8

      I think it's really insightful for you to compare Arc<[T]> to something you might find in Swift. Arc<[T]> does share a lot of similarities with Swift's copy-on-write Array struct, and Foundation's NSArray (where `-copy` just gives you back the same array with an increased reference count). The core insight is the same: for an immutable data structure, a shallow copy is equivalent to a deep copy.
      Rust's superpower is of course that you can hand out non-owning &[T]s and be sure you aren't writing dangling reference bugs. And the video does not intend to dispute that! You should absolutely design your code to hand out references instead of ownership where it makes sense to do so. In my video, I'm just pointing out that Arc<[T]> can be an optimization over Vec<T> when you were already using Vec<T>, in other words, in places where you've already decided that you need to give out ownership.

  • @trustytrojan
    @trustytrojan 1 year ago

    great video, im not the most familiar with rust but this explanation resonated with me
    but mostly all this made me think of is how Java's String is straight up immutable from the get-go 😂

    • @habba5965
      @habba5965 1 year ago

      Rust's str literal is also immutable.

  • @YTCrazytieguy
    @YTCrazytieguy 1 year ago +9

    I completely disagree. Here's why:
    * Regarding the O(1) clone: this isn't directly comparable to cloning a Vec, because cloning a Vec lets you modify/consume the copy. It's more comparable to just taking a shared reference, which is even cheaper than Arc::clone().
    * Regarding the smaller stack size: this is plainly false, as you forgot to count the two reference counts, strong and weak, which make it actually larger than a Vec.
    For most use cases, if you don't care about mutability you can just use a &[T]. Arc is useful when you write multithreaded code and need to deallocate the data sometime before the program ends (otherwise you could just use a &'static [T]).

    • @durnsidh6483
      @durnsidh6483 1 year ago +2

      The reference counts are stored on the heap, so the stack size is smaller. Also, he was talking specifically about cases where you don't know the lifetime of the object at compile time, which is what you would need to know in order to use references.
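
The size of the handles themselves can be checked with `std::mem::size_of`. The sketch below reports sizes in machine words so that it does not depend on pointer width; the function name is hypothetical. The counts do not appear in the Arc handle's size because they live in the heap allocation.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Returns the size of a `Vec<u8>` handle and an `Arc<[u8]>` handle, in machine words.
fn handle_sizes_in_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<Vec<u8>>() / word,   // pointer + capacity + length
        size_of::<Arc<[u8]>>() / word, // pointer + length
    )
}

fn main() {
    let (vec_words, arc_words) = handle_sizes_in_words();
    println!("Vec<u8>: {} words, Arc<[u8]>: {} words", vec_words, arc_words);

    // The strong and weak counts are reachable through the handle,
    // but they are stored in the shared heap allocation.
    let shared: Arc<[u8]> = Arc::from(vec![1u8, 2, 3]);
    let second = Arc::clone(&shared);
    println!("strong count: {}", Arc::strong_count(&second));
}
```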

  • @torsten_dev
    @torsten_dev 6 months ago

    I'd prefer a Cow.

  • @marcb907
    @marcb907 1 year ago

    Interesting content and well explained. You should do more videos like this.

  • @SaHaRaSquad
    @SaHaRaSquad 1 year ago

    For short strings the smartstring library is even better: it stores strings of up to 23 bytes length in-place without any heap allocations, and imitates the String type's interface. Basically like smallvec but for strings.

  • @Turalcar
    @Turalcar 1 year ago +1

    Box is a reasonable default for immutable data. It's also smaller than either Vec. If you need Clone, use &[T]. Reference counting in 90% of the cases is a consequence of poor (or lack of) design.

    • @m.sierra5258
      @m.sierra5258 1 year ago

      This. Thanks.

    • @iilugs
      @iilugs 1 year ago

      If I understand correctly, you're saying that Rc is only useful when cloning is needed, and cloning is seldom needed. Is that it?

    • @Turalcar
      @Turalcar 1 year ago

      @@iilugs Some sort of cloning is almost always needed. Rc is useful when you don't know what releases the last reference. Often you either do know, or can rewrite the code so that you do, and then all other references can be counted at compile time.

  • @Erhune
    @Erhune 1 year ago +2

    In your final section about Arc<String>, your diagram shows Arcs having ptr+len, but in this case String is a Sized type so the Arc only has ptr. Of course that doesn't undermine your point that Arc<String> is just plain bad :)

    • @_noisecode
      @_noisecode 1 year ago

      Ack, you're right! That Arc pointing to the String should be just a single pointer, no len. Thanks for pointing that out! My mistake.
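
The correction above is easy to confirm: an Arc pointing at a Sized type such as String is a single pointer, while an Arc pointing at a `str` also carries the length. A small sketch with a hypothetical function name, again measured in machine words:

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Returns the size of `Arc<String>` and `Arc<str>` handles, in machine words.
fn arc_pointer_widths_in_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<Arc<String>>() / word, // thin pointer: String is Sized
        size_of::<Arc<str>>() / word,    // wide pointer: address + length
    )
}

fn main() {
    let (thin, wide) = arc_pointer_widths_in_words();
    println!("Arc<String>: {} word(s), Arc<str>: {} word(s)", thin, wide);
}
```

The thin pointer is not a win in practice, because reaching the bytes through Arc<String> takes two pointer hops instead of one.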

  • @Xld3beats
    @Xld3beats 1 year ago

    Went down the rabbit hole, the important thing I was missing is Box is not the same as Box!!!

    • @_noisecode
      @_noisecode 1 year ago

      Indexing into Arc<[T]>/Box<[T]> etc. works just fine because they deref to [T], which is the thing that implements indexing. Try it out!
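
A quick way to try this out: the sketch below indexes and slices an `Arc<[i32]>` directly. The function name is hypothetical, and it assumes the input has at least two elements (it panics otherwise, like any out-of-bounds index).

```rust
use std::sync::Arc;

/// Converts `values` into an `Arc<[i32]>`, then reads it with ordinary
/// index and range syntax, which work through `Deref<Target = [i32]>`.
fn second_item_and_tail(values: Vec<i32>) -> (i32, Vec<i32>) {
    let items: Arc<[i32]> = values.into();
    let second = items[1];
    let tail: &[i32] = &items[1..];
    (second, tail.to_vec())
}

fn main() {
    let (second, tail) = second_item_and_tail(vec![10, 20, 30]);
    println!("second = {}, tail = {:?}", second, tail);
}
```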

  • @craftminerCZ
    @craftminerCZ 1 year ago

    One thing to note about Box is that if you're trying to allocate a massive array on the heap, you'll hit one of its fundamental problems: the value is first created on the stack and only then copied onto the heap. This makes stack overflows very easy when you think you're allocating on the heap, because boxing an array that is too large for Rust's small default stack size overflows the stack along the way.

    • @iilugs
      @iilugs 1 year ago

      Is there any workaround for this?
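
One commonly used workaround, sketched here under the assumption that a boxed slice is acceptable instead of a boxed fixed-size array: build the data with `vec!`, which allocates its buffer on the heap, and convert it afterwards. The function name is hypothetical.

```rust
/// Builds a zero-filled heap buffer without first creating a large array
/// on the stack. `into_boxed_slice` drops the capacity field.
fn zeroed_heap_buffer(len: usize) -> Box<[u8]> {
    vec![0u8; len].into_boxed_slice()
}

fn main() {
    // 64 MiB is far larger than a typical default thread stack.
    let len = 64 * 1024 * 1024;
    let buffer = zeroed_heap_buffer(len);
    println!("allocated {} bytes, first byte = {}", buffer.len(), buffer[0]);
}
```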

  • @dlhsnbrn1275
    @dlhsnbrn1275 1 year ago

    Another small, but interesting correction about Arc.
    The pointer in Arc does not actually point to the reference count, it points to the data. When manipulating reference counts, Arc calculates the pointer to the refcounts using very scary pointer arithmetic. I guess this is to ensure that the compiler can coerce an Arc into a &dyn Trait (for sized T). It also makes the deref implementation a no-op.

    • @_noisecode
      @_noisecode 1 year ago

      Interesting, do you have a source on that? I'm seeing Arc holding a pointer to an ArcInner, which stores `strong`, then `weak`, then the possibly-DST `data`. doc.rust-lang.org/src/alloc/sync.rs.html#250 The implementation appears to access `strong`, `weak`, and `data` through normal dot syntax, including in Deref. doc.rust-lang.org/src/alloc/sync.rs.html#1545

    • @dlhsnbrn1275
      @dlhsnbrn1275 1 year ago

      @@_noisecode I am completely confused. I was pretty sure I spent a long time reading this code and being amazed about it. Now that I read it again, it seems I was wrong.
      That makes me wonder how the compiler implements dynamic dispatch.

    • @_noisecode
      @_noisecode 1 year ago

      It's possible it used to be as you described--I honestly don't know if it changed at some point, I just know how it looked when I was doing research for this video. :) The pointer arithmetic you're talking about seems spooky indeed so this version seems better.
      My mental model for Arc<[T]> is the same as my mental model for Arc<dyn Trait> etc. When you do Arc of !Sized, it sticks that DST in the ArcInner, and then a pointer to that ArcInner becomes a wide pointer whose metadata is the same as the metadata would be for a pointer to the nested DST. So for Arc<[T]> the ArcInner's metadata becomes the slice len, and for dyn Trait it becomes the vptr. As far as I know, the details of this stuff are fairly underspecified at the moment, so I've punted on learning the rules to a tee. Unstable traits like this seem possibly related too, although I haven't studied it all carefully: doc.rust-lang.org/std/ops/trait.DispatchFromDyn.html

  • @kaikalii
    @kaikalii 1 year ago

    This is a great video. I'd love to see more like it.

  • @sharperguy
    @sharperguy 1 year ago

    So now I wonder what kind of situations Cow would be more appropriate when modifying the data might be required.
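
One situation where `std::borrow::Cow` tends to fit is data that is usually returned unchanged and only occasionally needs a modified copy. A minimal sketch, assuming a hypothetical `normalize_id` that replaces spaces with underscores:

```rust
use std::borrow::Cow;

/// Returns the input unchanged (borrowed, no allocation) when it contains
/// no spaces, and an owned, modified copy otherwise.
fn normalize_id<'a>(raw: &'a str) -> Cow<'a, str> {
    if raw.contains(' ') {
        Cow::Owned(raw.replace(' ', "_"))
    } else {
        Cow::Borrowed(raw)
    }
}

fn main() {
    println!("{}", normalize_id("goblin"));   // borrowed
    println!("{}", normalize_id("fire imp")); // owned copy
}
```

Unlike Arc<str>, a borrowed Cow is still tied to the lifetime of the original data, so it addresses a different problem than shared ownership does.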

  • @user-vn9ld2ce1s
    @user-vn9ld2ce1s 1 year ago +5

    Correct me if i'm wrong, but if you want just some immutable data that can be passed around, you could just create a regular array, then drain it, and you'll get a 'static array slice. With that you can do anything.

    • @1vader
      @1vader 1 year ago +1

      I assume you mean "leak" instead of "drain". And yeah, that can also be a good option, as long as you don't eventually want to free them again.

    • @user-vn9ld2ce1s
      @user-vn9ld2ce1s 1 year ago +1

      @@1vader Yeah, that's what i meant.

  • @Kupiakos42
    @Kupiakos42 1 year ago +1

    I wonder if we could save some cycles by instead having the `ptr` at 10:00 point directly to the data instead of needing to offset. It would require a negative offset for accessing strong and weak but that's much rarer than Deref.

  • @sliced_array
    @sliced_array 4 months ago

    Thanks! This was great as a beginner to rust.

  • @EgnachHelton
    @EgnachHelton 1 year ago

    If you need an even more powerful version of Arc that's basically a mini version control system, consider immutable data structures like those in the "im" crate.

  • @zeburgerkang
    @zeburgerkang 10 months ago

    subbed and saved for future reference... easy to understand explanation.

  • @ouchlock
    @ouchlock 1 year ago

    awesome, wanna more content on Rust like this

  • @professornumbskull5555
    @professornumbskull5555 1 year ago +2

    2:49
    1. Why on earth would you clone immutable data? Just use an immutable reference.
    2. What about the strong and the weak counts, who'll account for that?
    4:50
    1. Why use string id in the first place, why not just a u64 id? Then there exists no issue to begin with...
    2. Ok, you want to use a string ID, but why would anybody but your monster need that? You wanna store all the monsters you've created store them in a vec and have their state as dead or alive.
    3. Why are you creating Monster with Monster ID, shouldn't the monster generate one? If you want to control Monster types, you should use an enum, string shouldn't even be on your list of things to use, if you really really wanna, use &'static str, no need of string to begin with.
    4. Again if you don't need to modify the data, just use an immutable reference, don't clone it, problem solved.
    5. If you use the reference, the word point disappears, so...
    I don't see the point made in this video.

  • @rsnively
    @rsnively 1 year ago +2

    Great explanation

  • @FinaISpartan
    @FinaISpartan 1 year ago +2

    Just a reminder that if you don't need thread safety, you're better off with Box or Rc

  • @harleyspeedthrust4013
    @harleyspeedthrust4013 1 year ago +2

    This seems like something that could be done by the compiler. If the compiler encounters say a Vec, and it can prove that after construction, no mutable methods are called on the vec or on its clones, then it could convert it into an Arc. I don't know if that's theoretically impossible but I can't think of any counterexamples off the top of my head

  • @tyu3456
    @tyu3456 1 year ago

    Awesome video!! Btw I love the font you're using, looks kinda like LaTeX

    • @_noisecode
      @_noisecode 1 year ago +1

      It is! Courtesy of the Manim library--see the link in the description. :)

  • @gideonunger7284
    @gideonunger7284 1 year ago

    Box should be the default. Use Rc or Arc only if the data is shared, which a Vec wouldn't be, for example, so it's a very different use case.
    Sadly, on both Linux and Windows the Box visualizer is broken in the debugger. Arc and Rc of slices have working debugger visualizers.

  • @vmarzein
    @vmarzein 1 year ago

    Really like this video. Nice that it has no music over it

  • @1vader
    @1vader 1 year ago +2

    6:55 As far as I can tell, it generally doesn't allocate extra space if you create the whole string at once with String::from (and ofc String::with_capacity). Which actually seems a bit odd, afaik most allocators only give out 8 byte aligned regions so it would make sense if String just took the rest as well. Though I guess in that case, a realloc that stays below the actual region size probably also would be free.

    • @Kupiakos42
      @Kupiakos42 1 year ago

      The allocator API atm requires that you pass in the same layout for deallocation as you got for allocation. Excess capacity info is otherwise lost for Arc and so the conversion from String does a clone iff len ≠ cap.
      In other words, this exact allocation attitude works well with converting to smart pointers that treat length and capacity as equal

    • @1vader
      @1vader 1 year ago

      @@Kupiakos42 I wasn't talking about smart pointers. I was talking about String.

    • @Kupiakos42
      @Kupiakos42 1 year ago

      @@1vader I mean smart pointers and how they interact with allocators is a specific reason for String, and the global allocator generally, to allocate exact space instead of a multiple of 8.

    • @1vader
      @1vader 1 year ago

      @@Kupiakos42 I'm not really sure I understand what you mean. It sounds like you're saying it's so that the String allocation can just be re-used for the Arc allocation during conversion from String to Arc but that doesn't seem possible since Arc needs extra stuff at the front of the allocation. And you also said in your initial comment that a clone is done iff len = capacity? But shouldn't it be the other way around?

    • @1vader
      @1vader 1 year ago

      @@Kupiakos42 Checking the source code, it indeed looks like Arc always creates a new allocation and copies everything over.
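
The observation in this last reply can be checked from the outside by comparing the address of the String's buffer with the address of the bytes inside the resulting Arc<str>. This is a sketch with a hypothetical function name; it assumes a non-empty string, since an empty String does not allocate.

```rust
use std::sync::Arc;

/// Converts an owned String into an `Arc<str>` and reports whether the
/// Arc's bytes live at the same address as the String's original buffer.
fn conversion_reuses_buffer(text: &str) -> bool {
    let owned = String::from(text);
    let old_address = owned.as_ptr() as usize;
    // The Arc needs room for its reference counts in front of the data,
    // so the bytes are copied into a fresh allocation.
    let shared: Arc<str> = Arc::from(owned);
    shared.as_ptr() as usize == old_address
}

fn main() {
    println!("buffer reused: {}", conversion_reuses_buffer("goblin"));
}
```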

  • @soulldev
    @soulldev 1 year ago

    Great video, great channel, Arc is so good.

  • @Hector-bj3ls
    @Hector-bj3ls 10 months ago

    I found a downside to this recently. `Arc` and `Rc` are not `serde::Serialize`.

  • @DylanRJohnston
    @DylanRJohnston 1 year ago +6

    Unless you're planning on generating monster IDs at runtime, why not just drop the Arc and use &'static str? Or for that matter, why not use a unique zero-width type and make MonsterID a trait?

    • @appelnonsurtaxe
      @appelnonsurtaxe 1 year ago

      The ustr crate also provides a convenient _leaked_ string type with O(1) comparison time and O(n) construction. If the variants aren't known at compile time but don't need freeing after they're known, it can be a good approach.