Why would the clone performance of Arc be a factor? You get a pointer to the same exact slice. That's like taking an immutable reference to a Vec, which is faster. It does not fulfil the same role as a Vec clone, so it should not be compared to it. I also don't think your stack size and cache locality argument works for anything besides a static string slice. I can't imagine the semantic gymnastics needed to justify iterating over a significant number of Arc clones pointing to the same [T] in memory. In general I think you're making a different argument than you think you're making, and giving a different programming tip than you think you're giving.
I agree with your premise and the reasons you give from 12:45, but I found your main arguments kinda... odd? In your opening points you say this is especially useful for data that implements Clone, but the usage pattern you lay out explicitly involves not cloning the data. You clone Strings in the example, but there's clearly no reason to do that because the data is immutable - you're only cloning the Strings to get multiple references to that data. Passing around multiple references to a single piece of data is the whole point of Arc, so of course that is a better solution than duplicating the data to share it. It actually feels like that's the real reason you are wanting to use Arc, but it's not mentioned in the video. You do make a good point of explaining the inefficiency of Arc<String>/Arc<Vec<T>> though.

The example itself also strikes me as odd, because the ids are a struct which implements Clone, and then to avoid the performance cost of cloning the ids all over the place you reach for Arc, when surely the more natural optimization is to avoid the unnecessary cloning by using a struct which implements Copy instead of Clone, say MonsterId(u64)? If you really need the string data for something other than simply being an id, then you can put that in the EnemyStats struct (which I would assume contains various other data you don't want to be copying around even if immutable).

As I said though, I do agree with your overall point. Perhaps an example that used Vec would have cleared these points up, because although - as you quite rightly point out - String and Vec work in essentially the same way, they are quite distinct semantically in most situations. It would be obvious that calling clone on the (probably very long) enemies_spawned Vec is a bad idea, for example, even if it was immutable.
`str` can also be accessed across threads via `&str` (since it's immutable). And cloning has no special properties I can think of here, since the data is immutable. `Arc` only seems advantageous if you want reference counting versus relying on something like a `'static` lifetime for a string. The video was fun either way -- but can you give a reason you'd prefer the Arc or Rc fat pointer to just referencing str?
@@RenderingUser just create a struct that stores each row appended together in one long vec. Then store the width and height as usize. Finally add some methods that access a row or column at a time by slice. It will make your data access significantly faster.
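For illustration, a minimal sketch of that flat layout; the `Grid`/`row`/`column` names are made up, not from the video:

```rust
// Hypothetical flat grid: all rows stored back-to-back in one Vec,
// so a whole row comes back as a contiguous &[T] slice.
struct Grid<T> {
    data: Vec<T>,
    width: usize,
    height: usize,
}

impl<T> Grid<T> {
    fn new(width: usize, height: usize, fill: T) -> Self
    where
        T: Clone,
    {
        Grid { data: vec![fill; width * height], width, height }
    }

    // Row access is a single contiguous slice: cache-friendly.
    fn row(&self, y: usize) -> &[T] {
        &self.data[y * self.width..(y + 1) * self.width]
    }

    // Column access still needs a strided walk.
    fn column(&self, x: usize) -> impl Iterator<Item = &T> + '_ {
        self.data[x..].iter().step_by(self.width)
    }
}

fn main() {
    let grid = Grid::new(4, 3, 0u8);
    assert_eq!(grid.row(1).len(), 4);
    assert_eq!(grid.column(2).count(), 3);
    assert_eq!(grid.height, 3);
}
```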
I'm confused by the use case presented -- if you want cloneable, immutable string data, surely you'd just pass around indices into a big Vec, or even just &str's directly if the lifetimes allow it? Good video nonetheless.
Sure, you could always construct a Box and then Box::leak it to get an immortal &'static str if you're fine with never reclaiming the memory. This memory leak could become a problem if it's unbounded though. Imagine the game is able to spawn arbitrarily many monsters over time, creating more and more IDs. I'm assuming by immutable he meant "immutable as long as it's in use, but after that it gets deleted". If you want to reclaim memory by getting rid of unused IDs, the Vec strategy gets iffy. What if you want to delete an ID in the middle of the Vec? Not an unsolvable problem, but it's already getting much more complex than the simple MonsterID(String) we started with. Plus, if you actually want to access the string contents you need access to the Vec, so you need to pass around a reference to it. And if you're going multithreaded you need to protect it with a Mutex or similar. I'm not a fan.
@@iwikal Hmm, all true on paper. I would assume that, even in a very large game, all monster ID strings ever encountered during runtime are a finite set taking up at most 10kB or so in total, or maybe 1MB if we have very long text descriptions. If the game can dynamically generate large numbers of monster ID strings, or load/deload bigger data chunks, I'd try replacing the Vec with a HashMap or similar, though that gets awkward with multithreading for the same reason.
@@cookieshade197 If you leak all IDs and the game keeps allocating new ones, it will run out of memory sooner or later (potentially much later). Maybe you can get away with it if you assume that nobody will leave the game running for more than 24h straight, but what if it's a server? Ideally it should be able to handle years of uptime.
@@cookieshade197 To elaborate, what I mean is you can't always assume that there is a reasonably sized set of possible IDs, and even if there was you'd have to construct some kind of mechanism for reusing the old ones. Say we were talking about a ClientId instead, based partially on their IP address. It just seems wrong to me that I should let those stay around in memory after the connection is terminated, until the same client connects again. Maybe they never do, in which case the memory is wasted.
@@iwikal The issue isn't running out of memory. That is almost never going to happen in a real game. The issue is cache misses. You want to be able to perform operations on a large number of monsters every single frame, and every unnecessary byte (for the particular operation, hence games using data-oriented design where "wasteful" copies of monsters using different structures are fine as long as only minimal related data is kept in context for each part of logic; it isn't about total memory usage in games, which is very counterintuitive to other domains) is another monster pushed off the cache.
Great video. I'm learning Rust and this video is very helpful for understanding different ways of storing data. I'm struggling with the borrowing and ownership system, but oh well, I couldn't do any better.
The actual amount of memory allocated to a String during a clone operation is allocator-specific. For a 6-byte string I would not be surprised to see the allocator using an 8-byte bucket. So there will always be a degree of waste when cloning strings/vecs.
Here we could just have one long String containing all monsters and then just slice out the individual monsters. That would be most memory efficient (no additional allocation on average, only maybe at the end of all monsters; not sure how shrinking will work in the end). This optimizes memory usage, but the disadvantage could be that accessing one monster might require loading two different cache lines, which the usual allocation would prevent. On the other hand, other monsters might already be in a recent cache line as compensation. I would guess that on average the speed should be faster (just less memory in total => higher chance of being L1/L2/L3 cached already), but some individual access might be slower if the &str happens to cross a cache line. Disclaimer: I usually program in Python 😂 so I might be slightly off here. But interestingly, Python has a similar singleton-pattern optimization for symbol-like strings, storing them globally in one place and then just pointing to them in a similarly safe, immutable way as Arc, on the assumption that strings that look like symbols are likely to be reused unaltered. Of course, it's still way slower than any Rust code and has its own memory overhead, but it's the same idea of cloning in O(1) time and space.
this was fun! its like when they say "make it work, then make it right, then make it fast". This is a really good example for what to do in that second or third step!
Started watching the video thinking Arc and Vec had totally different use cases, and I'm glad you proved me wrong lol very useful info when you're trying to implement memory efficient coding. Thanks for the data man, really interesting and useful stuff. Cheers!
I'm not fully convinced. I'd love to see a follow-up video about this. Here are my thoughts.

When I first saw this pop up in my feed I was very confused, because Arc is a wrapper around a pointer and Vec is a data structure, so comparing an Arc to a Vec seems like an unfair comparison. It seems more appropriate to me to compare Arc<[T]> to Arc<Vec<T>>, and there's very little difference there, though I suppose specifically when dealing with strings it's not easy to get access to and use the underlying vec; nonetheless, it makes more sense to me to compare those two. Until you brought up the fact that Arc implements Deref I was thinking it was an all-round pointless idea, but now I'm split on the issue.

Something else to consider is ease of use, which I don't think you addressed very well. Lifetimes will definitely come into play here, but they don't with String, so it won't be just as easy to pass around at all. Another barrier is that if you need to build the string at runtime you will normally end up with a vec anyway, which could be shrunk to size, and putting the vec behind an arc would achieve mostly the same thing; in comparison, having an array pre-built at compile time is very rare in my experience. There are definitely extra steps and effort involved here which I'm not convinced you have considered carefully. There is no built-in way to convert from a vec to an array; there are some useful crates, but more libraries always mean more complexity in your codebase, so they're best not added without some consideration.

I also think the performance benefits you state are very exaggerated, and it's never worth talking performance without having some benchmarks to back it up imo. Strings are rarely large too, so the memory reduction might be there but it would be small, but once again there are no benchmarks to back any of this up, so I don't know and I'm not set in either perspective. I'll keep an eye on your channel. I hope to see some follow-up!
It seems `Box<str>` is like an immutable `String`, but even better, because it lacks the capacity field since it can never reallocate. In other words, if your `String` is not mutable, you should use `Box<str>`. What am I missing?
Cloning a Box<str> requires a deep clone of the string data. Cloning an Arc<str> only does a shallow clone and bumps the refcount. If you don't need Clone, you're right (as I also mention at the end): Box<str> is your best option. If you do, Arc<str> can be better. Both are better (for immutable strings) than String.
A minor mistake at 14:28: Arc doesn't store a length for Sized pointees (for example, `std::mem::size_of` will show that such an Arc is just one pointer wide), only the pointer. But the main point about needless double indirection is of course still valid.
Very good explanation. Keep in mind that when dealing with buffers, the Tokio project has a library called Bytes. Bytes' goal is to provide a robust byte array structure for network programming. The biggest feature it adds over Vec<u8> is shallow cloning. In other words, calling clone() on a Bytes instance does not copy the underlying data. Very useful and necessary.
Great video. I'm not the most familiar with Rust, but this explanation resonated with me; mostly, though, all this made me think of how Java's String is straight up immutable from the get-go 😂
In your final section about Arc<String>, your diagram shows Arcs having ptr+len, but in this case String is a Sized type, so the Arc only has a ptr. Of course that doesn't undermine your point that Arc<String> is just plain bad :)
- Need ownership over a T? use Box<T>.
- Need multi-ownership over a T? use Rc<T>.
- Need multi-ownership of a T across threads? use Arc<T>.
- Need an array of U? use [U; N] or &[U].
- Need a dynamically-sized array of U? use Vec<U>.

Substitute U and T as needed.
I completely disagree. Here's why:
* Regarding the O(1) clone, this isn't directly comparable to taking a clone of a Vec, because cloning a Vec lets you modify/consume it. It's more comparable to just taking a shared reference, which is even cheaper than Arc::clone().
* Regarding the smaller stack size: this is plainly false, as you forgot to count the two reference counts, strong and weak, which makes it actually larger than a Vec.
For most use cases, if you don't care about mutability you can just use a &[T]. Arc is useful when you write multithreaded code and need to deallocate the data sometime before the program ends (otherwise you could just use a &'static [T]).
The reference counts are stored on the heap, so the stack size is smaller. Also, he was talking specifically about instances where you don't know the lifetime of the object at compile time, which is what you'd need to know in order to use references.
Hello, the video is great and I really like the point you are making, especially the cache proximity. I will definitely give this a try at some point. Even so, I have a few questions. By wrapping the values in Arc, you are effectively turning clones into references without lifetimes. I understand that sometimes it's better and easier to work with the fully owned value, but if you need that, you can just clone from the shared reference on demand. I don't know why, but this feels a bit like Swift to me. Rust has the advantage of the ownership model, so if you can do the job with just shared references, I don't see the need for Arc. But of course I could be wrong, so please correct me if that's the case.
I think it's really insightful for you to compare Arc to something you might find in Swift. Arc does share a lot of similarities with Swift's copy-on-write Array struct, and Foundation's NSArray (where `-copy` just gives you back the same array with an increased reference count). The core insight is the same: for an immutable data structure, a shallow copy is equivalent to a deep copy. Rust's superpower is of course that you can hand out non-owning &[T]s and be sure you aren't writing dangling reference bugs. And the video does not intend to dispute that! You should absolutely design your code to hand out references instead of ownership where it makes sense to do so. In my video, I'm just pointing out that Arc can be an optimization over Vec when you were already using Vec--in other words, in places where you've already decided that you need to give out ownership.
Awesome! Thank you for this explanation. I’ve heard bits and pieces of this before and it was making sense that I should start doing this as I am learning Rust… but this one video gave me a ton of context; I think I’m actually going to do this as a refactor phase now 😊
I wonder if we could save some cycles by instead having the `ptr` at 10:00 point directly to the data instead of needing to offset. It would require a negative offset for accessing strong and weak but that's much rarer than Deref.
6:55 As far as I can tell, it generally doesn't allocate extra space if you create the whole string at once with String::from (and ofc String::with_capacity). Which actually seems a bit odd, afaik most allocators only give out 8 byte aligned regions so it would make sense if String just took the rest as well. Though I guess in that case, a realloc that stays below the actual region size probably also would be free.
The allocator API atm requires that you pass in the same layout for deallocation as you got for allocation. Excess capacity info is otherwise lost for Arc and so the conversion from String does a clone iff len ≠ cap. In other words, this exact allocation attitude works well with converting to smart pointers that treat length and capacity as equal
@@1vader I mean smart pointers and how they interact with allocators is a specific reason for String, and the global allocator generally, to allocate exact space instead of a multiple of 8.
@@Kupiakos42 I'm not really sure I understand what you mean. It sounds like you're saying it's so that the String allocation can just be re-used for the Arc allocation during conversion from String to Arc but that doesn't seem possible since Arc needs extra stuff at the front of the allocation. And you also said in your initial comment that a clone is done iff len = capacity? But shouldn't it be the other way around?
One thing to note about Box is that if you're trying to allocate a massive array on the heap, you'll hit one of its fundamental problems: Box::new first constructs the value on the stack and only then copies it onto the heap. This results in very easy stack overflows when you're supposedly allocating on the heap, unwittingly overflowing the stack in the process of trying to Box an array too massive for the small default stack size Rust has.
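A minimal sketch of the issue and one common workaround (building the buffer on the heap via Vec and converting to a boxed slice); the sizes are arbitrary:

```rust
fn main() {
    // Risky: the array literal is built on the stack first (especially in
    // debug builds), which can overflow the default stack before Box ever
    // gets to copy it to the heap.
    // let big = Box::new([0u8; 8 * 1024 * 1024]);

    // Workaround: allocate directly on the heap via Vec, then drop the
    // capacity field by converting to a boxed slice.
    let big: Box<[u8]> = vec![0u8; 8 * 1024 * 1024].into_boxed_slice();
    assert_eq!(big.len(), 8 * 1024 * 1024);
}
```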
Box is a reasonable default for immutable data. It's also smaller than either Vec or Arc. If you need Clone, use &[T]. Reference counting in 90% of the cases is a consequence of poor (or lack of) design.
@@iilugs Some sort of cloning is almost always needed. Rc is useful when you don't know what releases the last reference; often you either do know, or can rewrite things so that you will, so that all other references can be counted at compile time.
I recently tried using Arc in a recent project of mine, a websocket chat app, where there were too many clones of a string being sent to every channel. The problem with using this type is that it is not serializable or deserializable by serde, and serde derive does not work on it.
Arc works just fine with Serde, you just need to flip on a Serde feature flag. I mentioned this in the video and there's a link to the feature flag docs in the description. :)
Great tutorial. One thing I was thinking about recently is the overuse of Result: not all functions are fallible, yet many unnecessarily return Result instead of just a value for the infallible ones. I think everyone just got used to returning Result... Worth looking into. Also worth a clippy lint if there isn't one for this. For an API it should always be Result, of course, but we're often not developing APIs.
For short strings the smartstring library is even better: it stores strings of up to 23 bytes length in-place without any heap allocations, and imitates the String type's interface. Basically like smallvec but for strings.
Unless you’re planning on generating monster IDs at runtime, why not just drop the Arc and use &'static str? Or for that matter, why not use a unique zero-sized type and make MonsterID a trait?
The ustr crate also provides a convenient _leaked_ string type with O(1) comparison time and O(n) construction. If the variants aren't known at compile time but don't need freeing after they're known, it can be a good approach.
If you need an even more powerful version of Arc that's basically a mini version control system, consider immutable data structures like those in the "im" crate.
Correct me if i'm wrong, but if you want just some immutable data that can be passed around, you could just create a regular array, then drain it, and you'll get a 'static array slice. With that you can do anything.
A shared-ownership immutable string with single-indirection performance is impressively clever. However, its benefits *_really_* shine only (1) when you do tons and tons of copies of long, permanent text and (2) when your design desperately can't attach the string's ownership to anything else in the same scope at all, which makes that scenario kind of *_unlikely_* TBH. Moreover, this may encourage so-called premature optimization that adds unnecessary complexity to your design for little speed gain. Kudos for the very clear, beautiful animation. Thank you.
Good food for thought and illustrations, but I very much wish you had used Rc instead of Arc in most of this, and then shown folks how to determine whether they actually need to "upgrade" to Arc. It's healthier practice for the whole Rust ecosystem not to default to thread-safe types & operations when they're not actually necessary. We'll all pay with decreased performance of the software we use in proportion to how much thread-safe code is overused. 🙂
The sus thing about Arc is that it still has an underlying mutable state that needs to be synchronized across threads (the reference counters which are atomic variables) and will thus introduce memory barriers (the cpu will try to make sure that all writes done before changing an atomic variable affect all reads that happen after the new atomic variable's value is read on another thread, even if they are unrelated). Copying doesn't do that, so in some high concurrency scenario (and if the string/vec isn't very large) you may find that the cost of copying the string or array (usually an aligned memcpy) is smaller than the cost of synchronization. Rc doesn't have that though.
First off, just wow. What a nugget. I'd like to see more such general stuff. Rust is still not decades old, so best practices are not fully known. If you want your videos to be even better, here is my advice:
1. Include some running code snippets where you maybe measure it or break it purposefully. Take no more than 10 lines of code and operate on it. For example, try with a reference, include a scenario where a reference won't cut it, put in Rc. Introduce threading, break it. Add Arc. Put it in a struct, etc.
2. Keep videos no longer than 5 minutes ideally, but pack them with such nuggets. If you need more than 5 minutes to explain some concept, that deserves a video of its own.
3. Enjoy your million subs.
I might be wrong about all of it, but it's my humble advice. :)
I appreciate the kind words and feedback! Clearly I'm just getting started and I'm still figuring out what works, so I'll definitely keep these suggestions in mind. :)
Box should be the default. Rc/Arc only if it's shared, which a Vec wouldn't be, for example, so it's a very different use case. Sadly, on both Linux and Windows the Box visualizer is broken in the debugger. Arc and Rc of slices have working debugger visualizers.
14:46 "You cannot do any better than this" Of course you can, you can use `&str`. If the list of monster names is known at compile time, you can even use `&'static str`.
Arc<str> can't be a thing, or I'm missing something:
help: the trait `Sized` is not implemented for `str`
note: required by a bound in `Arc::<T>::new`
Maybe he wanted to refer to &str?
I definitely did mean Arc<str>, not Arc<&str> (something it's hard to imagine a use for). You're correct that you can't create an Arc<str> with new(); typically you'd use e.g. .into() on a String. Check the pinned comment.
@@_noisecode So the idea is to have an owned value first, like Vec or String, and transform that value later into Arc/Rc. If we're going to do cloning, it's worth it, as you explained. But if we don't... is it still worth it?
As a person relatively new to Rust, I kept thinking: but Vec has the vec! macro for ease, and an Arc might not be as ergonomic to get in place. It would have been nice if that pre-step had been covered, because you might reach for a Vec based on the ease vec! provides. So helpful though, and it was a great learning tool and fun watch.
That's a great point, and one I probably should have mentioned in the video! Thankfully, Arc<[T]> is pretty easy to create: it implements From<Vec<T>>, so you can create one with `vec![1, 2, 3].into()`, and it also implements FromIterator, so you can create one by `.collect()`ing an iterator just like you would any other collection.
Since the video was all about golfing unnecessary allocations etc., I should also mention that creating an Arc often involves one more memory allocation + memcpy than creating the equivalent Vec would have. There's some well-documented fine print here: doc.rust-lang.org/std/sync/struct.Arc.html#impl-FromIterator%3CT%3E-for-Arc%3C%5BT%5D%3E
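For reference, a minimal sketch of those construction paths (the variable and monster names are just for illustration):

```rust
use std::sync::Arc;

fn main() {
    // From<Vec<T>>: build with the vec! macro, then convert.
    let ids: Arc<[i32]> = vec![1, 2, 3].into();

    // FromIterator: collect straight into an Arc<[T]>.
    let squares: Arc<[i32]> = (1..=5).map(|n| n * n).collect();

    // The same idea works for strings: Arc<str> from a &str or a String.
    let name: Arc<str> = Arc::from("Cave Troll");
    let runtime_name: Arc<str> = String::from("Goblin").into();

    assert_eq!(&*ids, &[1, 2, 3]);
    assert_eq!(&*squares, &[1, 4, 9, 16, 25]);
    assert_eq!(&*name, "Cave Troll");
    assert_eq!(&*runtime_name, "Goblin");
}
```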
@@_noisecode Is it the same as Vec::into_boxed_slice, which only re-allocates if the vec has excess capacity? Arc implements From without re-allocation.
It's one more allocation, even if the Vec doesn't have any excess capacity, since in general Arc needs to move the data into its own allocation containing the reference count info. For the record, `From<Vec<T>> for Arc<[T]>` does in fact allocate (see the implementation here: doc.rust-lang.org/src/alloc/sync.rs.html#1350 ).
How does From compare to simply using into_boxed_slice and using Box::leak instead?
@Ayaan K yes
It occurs to me that your use case for an Arc could potentially be better served by a &'static str or just an enum. If you have an in-game level editor that allows creation of new monster types, Arc would be ideal, but in most cases, the entire list of monsters that will ever exist is known at compile time. If you use an enum for the monster types, you can still derive all the traits you were deriving before; with some help from the strum crate or similar you can implement as_str with custom strings containing spaces etc. very easily; your memory footprint is a _single_ word; you can #[derive(Copy)], meaning cloning is effectively instantaneous; and as a bonus, you don't need a hashmap to keep track of monsters killed or enemy stats -- just declare the enum as #[repr(usize)] and use it as the index into a Vec, or better, an array.
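A minimal sketch of that approach, with a hand-written as_str standing in for what strum would derive (the monster names are made up for illustration):

```rust
// Closed set of monster kinds known at compile time.
// repr(usize) lets the discriminant double as an array index.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
#[repr(usize)]
enum MonsterKind {
    Goblin = 0,
    CaveTroll = 1,
    FireDrake = 2,
}

const MONSTER_KIND_COUNT: usize = 3;

impl MonsterKind {
    // strum's derives can generate something like this; written by hand here.
    fn as_str(self) -> &'static str {
        match self {
            MonsterKind::Goblin => "Goblin",
            MonsterKind::CaveTroll => "Cave Troll",
            MonsterKind::FireDrake => "Fire Drake",
        }
    }
}

fn main() {
    // No HashMap needed: the enum is the index into a plain array.
    let mut kill_counts = [0u32; MONSTER_KIND_COUNT];
    kill_counts[MonsterKind::CaveTroll as usize] += 1;

    let id = MonsterKind::CaveTroll; // Copy: no clone() calls anywhere
    println!("{} kills: {}", id.as_str(), kill_counts[id as usize]);
}
```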
So much this. Hashing and string comparisons seem super overkill for a closed set known at compile time, and even if it is extensible in the editor, it still seems better to have the actual ids be a `u16` or w/e and the actual names interned in a global vec somewhere. Most operations don't care about the name! (possibly a tuple of type and instance, if you find yourself runtime spawning "guard1" "guard2" "guard3" etc)
Thanks for mentioning this, and yes, I couldn't _possibly_ agree more that if you have a closed set of variants known at compile time, please, please use an enum--as you say, it is better in every conceivable way than making your IDs "stringly-typed", especially with the help of e.g. `strum` to get you the string versions if you do need them.
Sometimes you do need actual dynamic strings though, and they follow a create-once-clone-often usage pattern like what I show in the video. In those cases, I believe my arguments for using Arc over String hold. For what it's worth, the real-world code that inspired the MonsterId in this video actually _was_ an ID that was loaded from a configuration file at runtime, and so there wasn't a closed set of variants known at compile time.
Personally in that circumstance I'd still be tempted to leak a `&'static [&'static str]`, unless you're reloading the config file _frequently_ or using those string ids over the network or something. But definitely makes more sense in that instance!
Make a struct MonsterID(u16), and custom constructors for it that maintain the actual names in a global vec, all behind an RwLock. To log, you can dereference to the actual location with the string data; other "normal" uses can be all in u16. Then all your game logic is just moving a u16 around, no pointers or anything.
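A minimal sketch of that interning scheme (names and structure are illustrative, not from the video; the const `RwLock::new` in a static needs Rust 1.63+):

```rust
use std::sync::RwLock;

// Global intern table: the index in the Vec is the u16 id.
static NAMES: RwLock<Vec<String>> = RwLock::new(Vec::new());

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct MonsterId(u16);

impl MonsterId {
    // Intern a name, reusing the existing id if it's already known.
    fn new(name: &str) -> MonsterId {
        let mut names = NAMES.write().unwrap();
        if let Some(i) = names.iter().position(|n| n == name) {
            return MonsterId(i as u16);
        }
        names.push(name.to_owned());
        MonsterId((names.len() - 1) as u16)
    }

    // Only needed for logging/display; game logic sticks to the u16.
    fn name(self) -> String {
        NAMES.read().unwrap()[self.0 as usize].clone()
    }
}

fn main() {
    let a = MonsterId::new("Cave Troll");
    let b = MonsterId::new("Cave Troll");
    assert_eq!(a, b); // comparison is just a u16 compare
    println!("spawned: {}", a.name());
}
```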
I can envision the scenario where you load a level from a file; then you either have a &'file str (which you might not want) or you clone the string once, which is really not that expensive.
Another benefit of Box<str> is that the characters are actually mutable, even though the length isn't. So you can convert the string to uppercase or lowercase if you need to.
This makes me wonder if there are any unicode characters where the uppercase and lowercase versions take up different numbers of bytes. I imagine if you add diacritics you might find a situation where one version has a single unicode code point, while the other needs two.
@@Tumbolisu Unicode groups characters from the same alphabet together, so I think this is unlikely to ever happen
@@michawhite7613 I actually just found an example. U+1E97 (Latin Small Letter T With Diaeresis) does not have an uppercase version, which instead is U+0054 (Latin Capital Letter T) combined with U+0308 (Combining Diaeresis).
@@michawhite7613 Oh and how could I forget! ß is one byte while ẞ is two bytes. The larger ẞ was only introduced into the German language a few years back, while the smaller ß is ancient.
@@Tumbolisu You're right. The functions I'm thinking of are actually called `make_ascii_lowercase` and `make_ascii_uppercase`
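A small sketch of that in-place ASCII case change on a Box<str>, which works because the byte length is unchanged:

```rust
fn main() {
    let mut name: Box<str> = String::from("cave troll").into_boxed_str();

    // Box<str> hands out &mut str, so in-place, length-preserving edits
    // like ASCII case changes are fine; growing the string is not.
    name.make_ascii_uppercase();

    assert_eq!(&*name, "CAVE TROLL");
}
```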
Nice, did not expect a full 10+ minutes advocating for Arc over Vec on my recommendations. You deserve way more subscribers.
you too :-)
For high performance code, Arc is not cheap. Cloning a small string may actually be faster depending on your cpu topology and memory access patterns. As always, measure first.
Great point
Having small strings does not make your program high performance
A modern CPU (with the AVX-512 extensions) can handle up to 64 bytes at the same time, nearly instantaneously. However, that's only for modern CPUs.
Thus, if you know the architecture that you're running on, you can do some pretty neat optimizations.
What do you mean by cloning a small string on the stack? Strings are on the heap, right? So cloning will occur on the heap only!
@@warriorblood92 Strings can very much be on the stack. A "str" is stored on the stack (well, actually it's stored in a read-only part of memory, different from the stack, but it functions effectively the same), and you can very much clone them to the stack (even if Rust makes it rather difficult to do so).
Arc/Rc work best when you don't really know the lifetimes of your data, but if your program is structured in a way that makes lifetimes obvious (say, loading data at the start of a block of code and reusing that), then you can use normal &'a [T] references and get the same benefits of cheap-to-copy immutable data that can be shared between threads, and doesn't even require a pointer indirection on clone!
How can you share state between threads without Arc?
@@BigCappuh Scoped threads can capture shared references.
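A minimal sketch of that, using std::thread::scope (stable since Rust 1.63) to share a plain &[T] across threads with no Arc and no 'static bound:

```rust
use std::thread;

fn main() {
    let stats: Vec<u32> = (1..=8).collect();

    // Scoped threads may borrow `stats` directly, because the scope
    // guarantees they all finish before `stats` goes out of scope.
    let total: u32 = thread::scope(|s| {
        let (left, right) = stats.split_at(stats.len() / 2);
        let a = s.spawn(move || left.iter().sum::<u32>());
        let b = s.spawn(move || right.iter().sum::<u32>());
        a.join().unwrap() + b.join().unwrap()
    });

    assert_eq!(total, 36);
}
```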
who cares about this dog shit? your code will be thrown in the trash can and replaced by AI anyway
Rust's Arc<str> is close to Java's default String type: both are immutable, and both will be automatically freed when no one has a reference anymore. Rust's String is closer to Java's StringBuilder.
I see this as further validation that Arc<str> is in fact quite a sane type to use in many situations.
Some even argued that Rust's `String` should have been named `StrBuf` for this reason, like how we have `Path` and `PathBuf`.
@@铜羅衛門 That would have been amazing....
As a core developer/moderator for Manim, it makes me happy to randomly find Manim-related videos in my recommended. Great job with the explanation of which data structure to use when mutability is(n't) required, and great visuals too!
manim-rs when /j
@@aemogiesomebody is probably working on it somewhere
Small correction. With Arc<String>/Arc<Vec<T>>, the Arc pointer itself only stores 1 word, not 2, as the length is stored in the boxed cell on the heap in the String/Vec object, and String/Vec is Sized, unlike str/[T]. This can potentially be useful if space is at a premium, but you can also use a thin Box/Rc/Arc, which isn't available in the standard library yet (ThinBox is in alloc, but it's unstable), which stores the length (and maybe capacity) directly next to the data, keeping the pointer a single word.
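A quick way to see the difference in pointer width (the word counts hold on typical targets where a pointer is usize-sized):

```rust
use std::mem::size_of;
use std::sync::Arc;

fn main() {
    // Arc<String> / Arc<Vec<T>>: the pointee is Sized, so the Arc is a
    // thin pointer -- one word on the stack (plus the extra indirection).
    assert_eq!(size_of::<Arc<String>>(), size_of::<usize>());
    assert_eq!(size_of::<Arc<Vec<u8>>>(), size_of::<usize>());

    // Arc<str> / Arc<[T]>: the pointee is unsized, so the Arc is a
    // fat pointer carrying the length -- two words.
    assert_eq!(size_of::<Arc<str>>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Arc<[u8]>>(), 2 * size_of::<usize>());

    // For comparison: String and Vec themselves are three words.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());
}
```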
Was just about to comment this. Also, hey man, didn't expect to run into you 😂
Very good video, the advocated point is really useful indeed. I only have 2 nitpicks about it:
- It addresses less experienced Rust developers, but you forgot to mention how to construct values of these types (not that it's exceedingly complicated). A pinned comment might help in that regard (since with the algo apparently taking a liking to it, you might get spammed with questions about construction).
- I would generally never recommend `Rc`, since `Arc` works using relaxed atomic operations, which have no overhead compared to their non-atomic counterparts. And while the MESI protocol may cause cache misses when accessing the cache line where the counts have been updated, this is not relevant when working in a single-threaded environment. So in general, `Rc` and `Arc` have identical runtime costs (not just similar), making `Rc` useful only when you want to semantically prevent its content's ownership from being shared across threads, without preventing the content from being temporarily shared across threads.
Great feedback, thank you. I think you're right and I went ahead and pinned the existing discussion of how to create an Arc--I agree I should have mentioned it explicitly in the video itself. Live and learn. :)
As for Rc vs. Arc, your point is well made, but I think I will stick to my guns on recommending Rc where possible. Even if there are expert-only reasons to be sure there is no practical performance difference, this runs counter to the official guidance from the Rust standard library documentation which states that there may indeed be a performance difference (doc.rust-lang.org/std/sync/struct.Arc.html#thread-safety), and aside from performance alone, I would argue that the semantic argument is enough. If I know my type is not meant to be shared across threads, I ought to use the least powerful tool for the job (Rc) that allows me to accomplish that.
Arc's clone uses Relaxed, but its drop does not (it uses Release). In any case the atomic increment in clone is going to be more expensive than a non-atomic increment whether it's relaxed or not. Possibly you are thinking about relaxed atomic loads/stores, which are typically no more expensive than regular loads/stores.
@@zuberdave Great point!
Commenters have pointed it out somewhat, but this video represents a misunderstanding of the purpose of different types. What you want here is a &[T], not an Arc. The confusion is that sometimes you feel forced to make an allocation, because you're doing something like giving monsters ids from a loaded configuration file. In that case, you make one allocation at the start of the program for the config file, then each monster holds a &str into that allocation. Having to make an allocation for the config file doesn't mean you need to make, or hold, an allocation for each thing that uses the config file. Consider writing a programming language implementation with multiple parsing phases. The efficient thing to do is to make one String allocation at the start of the program for the source code, then have a lexing step from &str to a Vec of tokens containing subslices of the original String buffer.
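A minimal sketch of that borrowing pattern (the token type and the whitespace lexer are made up for illustration; a real lexer would carry more information):

```rust
// Tokens borrow from the single source-code String; no per-token allocation.
#[derive(Debug, PartialEq)]
struct Token<'src> {
    text: &'src str,
}

fn lex(source: &str) -> Vec<Token<'_>> {
    source
        .split_whitespace()
        .map(|text| Token { text })
        .collect()
}

fn main() {
    // One allocation for the whole source; every token is just a subslice.
    let source = String::from("let x = 42");
    let tokens = lex(&source);

    assert_eq!(tokens.len(), 4);
    assert_eq!(tokens[1].text, "x");
}
```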
I think this is the best explanation of Rust's internal memory management I've seen so far. Well done Sir!
The info graphics for each explanation is expertly simple and straight forward, never change them.
That was really well done. I thought 15min on this topic was going to be a slog, but it was a well motivated, well visualized example.
You've earned a sub. Keep up the good work, Logan!
This to me seems like comparing apples and oranges. As you mentioned, Vec works well for a modifiable buffer. Yet you do advocate for using a simple slice wrapped by Arc. This assumes you have the slice at compile time. How would you build your slice dynamically without Vec? It seems to me you would still need a Vec, which you can convert into a [T] to wrap with Arc. Even worse, Arc is usually for multithreaded environments. Why not just use Rc? My point is, I don’t see this suggestion making much sense really, as these two types have very different specific use cases. The video was well made though, I appreciate the great effort.
vec![...].into() gives you an Arc<[T]>, so it's one "clone", and from then on no expensive clones at all.
So you build it initially with Vec, and then convert it. After this conversion, all the points in the video apply.
Regarding Rc vs Arc 3:20
@@iilugs it sounds to me like simply borrowing would do the trick
@@tourwithMarkay It would, but only in a single-threaded environment and only if there's an obvious owner that outlives the rest. Also, Rc/Arc don't require lifetime annotations (I don't mind these, but only for simple cases with temporary "local" borrowing)
@@expurple Using a non mutable reference is perfectly fine for as many threads as you wish. Arc isn't some magic savior.
Your channel is going to explode if you keep doing videos like this one.
There's a disadvantage of using Rc/Arc though: these types are not serializable, while String is.
As I mentioned in the video, there's a serde feature flag that enables support for Rc/Arc. Check the docs.
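For reference, a minimal sketch of what that looks like; serde's `rc` feature enables the Rc/Arc impls (crate versions and field names here are illustrative):

```rust
// Cargo.toml (illustrative versions):
//   serde = { version = "1", features = ["derive", "rc"] }
//   serde_json = "1"

use serde::{Deserialize, Serialize};
use std::sync::Arc;

#[derive(Serialize, Deserialize, Debug)]
struct Monster {
    id: Arc<str>, // serialized as a plain string thanks to the "rc" feature
    hp: u32,
}

fn main() -> Result<(), serde_json::Error> {
    let m = Monster { id: Arc::from("Cave Troll"), hp: 40 };
    let json = serde_json::to_string(&m)?;
    let back: Monster = serde_json::from_str(&json)?;
    assert_eq!(&*back.id, "Cave Troll");
    Ok(())
}
```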
Very nice video! I love how you explain this. Can't wait for your next topic. I shared it on Reddit and it already has lots of views and good feedback. Keep up the good work ;)
I'll go one further... monster name strings are likely static anyhow, so you don't even need Arc or Box; you can just use &'static str directly. Then clones don't even need to increment a reference count on the heap; you just copy the pointer.
Is not only a beautiful insight on the internals of memory allocation but also does an implacable job of explaining the topic in plain English so even entry level developers can understand the good, the bad and the ugly. Keep doing such a great job divulging such an awesome programming language !
i think u meant to say impeccable
@@nel_tu_ Yup, sorry about the typo
Just making this change on a large vec in my program resulted in a 5x speed up for me. Thanks for the video!
Nice video, and very interesting take. I am going to give this pattern a try in some of my code and see how it goes. Thanks for this great video!
I have a few concerns recommending this to a beginner "as a default". I feel like the times when you actually want to clone the arc, such as if you want to store the same list in multiple structs without dealing with lifetimes, are quite situational. Most of the time, what you should do is dereference it into a slice to pass around, because it is more performant and more general. But I am afraid that using an arc "as a default" would encourage a beginner to develop the bad habit of just cloning the arc everywhere. The need to pass an immutable reference/slice is not enforced by the compiler, but it is with other data types. Worse, this could give beginners a bad misunderstanding of how clone works, because arc's clone is very different from most other data types' clone. Do we want the "default" to be the absolute easiest, absolute most general solution? Or do we want the default solution to be the one that enforces the best habits? I would argue for the latter.
So what you're saying is that we should be recommending Box as default, then. Makes sense to me.
@@rossjennings4755 I would say if you're using Box you might as well just use Vec, unless for some reason you want to guarantee that it will never be resized.
When you look at the Rust language survey, one big hurdle always mentioned is the steep learning curve of Rust. Just using Arc for all Strings by default may alleviate that burden. Performance is at least on par with any GC'ed language with immutable Strings (e.g. Java) and those also run fast enough most of the time.
And secondly, who is to say that all Rust programs must always be optimized for runtime performance? If you do some rapid development (i.e. optimizing for developer performance) in Rust, of course you can use Arc and then later on *if* the program is too slow you can still come back and optimize the critical parts. From that point of view, thinking about lifetimes a lot early in development, just to avoid the reference counting might even be considered a premature optimization.
@@thorjelly The video mentions immutable data, in which case it won't be resized. But yeah totally agree on what you said, the default good practice should be Vec/Box for the owner and &[T] for the readers, and only consciously opt for Rc when useful or necessary.
@@constantinhirsch7200 "who is to say that all Rust programs must always be optimized for runtime performance?". Logan Smith. Logan Smith is to say precisely that. The whole point of the video you just watch is to advocate that Arc is more performant at runtime than Vec, while being a drop in replacement.
The gripe thorjelly and I have with it is that Arc is a lazy, half-assed optimization. If you want to delegate optimization for later, why touch the code at all, and why learn smart pointers the wrong way when you could stick to cloning Strings? Wouldn't that be premature optimization, or at least premature? If you want to optimize, why use smart pointers when a slice reference is both enough and better?
Really awesome insight! Please continue making these.
love it!
i often do something similar with `&'a [T]` by allocating from an arena/bump allocator. (this also has the added benefit that the allocation truncation is free)
Thanks a bunch for the clarifications. Memory allocations are one of the major factors in shaping performance characteristics and understanding them may not always be an easy task. Your video and especially the visualization help a lot! Great work
Love this Logan. What a wonderful explanation and a good challenge to orthodoxy. I'll provide one answer to the question that you posed a few times in the video, "Why use String (or Vec) rather than Arc?". That's because accessing the data from an Arc incurs some runtime cost to ensure that Rust's ownership semantics are upheld. That cost doesn't need to be paid by exclusively owned types.
Thanks for the kind words. :)
Accessing an Arc incurs no runtime cost with regard to Rust's ownership rules. The runtime cost of accessing the pointed-to data is about the same as for Vec: a pointer indirection. Possibly you are thinking of RefCell? RefCell does involve some slight runtime overhead due to essentially enforcing Rust's borrow checking rules at runtime.
@@_noisecode Oof, I knew that I should have looked that up. You're right
I think what would have helped me was a quick example of how you initialize an Rc, Arc, and Box. It's pretty obvious when the str is a compile time constant, but less obvious when it's from a runtime string. Do you simply create a String and then Arc::new on it? Does memory layout change when it's a compile-time vs runtime string?
Check the pinned comment! (Spoiler: it's Arc::from("foo"), or Arc::from(my_string)). Memory layout doesn't change, as they're both the same type (Arc<str>).
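Spelled out as a tiny sketch (the values here are made up):
use std::sync::Arc;

fn main() {
    // From a string literal:
    let a: Arc<str> = Arc::from("goblin");
    // From a String built at runtime:
    let s = String::from("ogre");
    let b: Arc<str> = Arc::from(s); // or equivalently: s.into()
    println!("{a} {b}");
}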
I don't typically comment on videos. I have to say this was really well made, please keep up this level of content. I really enjoyed it.
As for Arc: Depending on the use case, you can just Box::leak() a String and pass around the &'static str. Typically, especially if used as IDs, the total number of such strings is low anyway.
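A sketch of that approach (the `intern` helper name is made up; note the memory is never reclaimed):
fn intern(id: String) -> &'static str {
    // Box<str> -> &'static mut str -> &'static str; the allocation is deliberately leaked.
    Box::leak(id.into_boxed_str())
}

fn main() {
    let id = intern(format!("monster_{}", 42));
    println!("{id}");
}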
Very neat insight! My experience with Arc so far has mostly been limited to either "fake garbage collection" with the Arc anti-pattern, or sharing immutable data between threads or async futures. I've tried avoiding cloning Vecs/Strings by passing around &[T] and &str references (and their &mut counterparts) but putting lifetime annotations in your hashmap keys is a nightmare.
How is that an antipattern for async shared state?
@@blehbleh9283 I didn't say it was, I said fake GC was the antipattern, where you give up on handling lifetimes and overuse Rc and Arc in cases where it's not necessary.
@@iamtheV0RTEX oh okay! Thanks for teaching
Why would the clone performance of Arc be a factor? You get a pointer to the same exact slice. That's like taking an immutable reference to a Vec, which is faster. It does not fulfil the same role as a Vec clone, so it should not be compared to it.
I also don't think your stack size and cache locality argument works for anything besides a static string slice. I can't imagine the semantic gymnastics needed to justify iterating over a significant number of Arc clones pointing to the same [T] in memory.
In general I think you're making a different argument than you think you're making, and giving a different programming tip than you think you're giving.
I really admire your dense, concise way to "think" in Rust 🤓
I agree with your premise and the reasons you give from 12:45, but I found your main arguments kinda... odd? In your opening points you say this is especially useful for data that implements Clone, but the usage pattern you lay out explicitly involves not cloning the data. You clone Strings in the example, but there's clearly no reason to do that because the data is immutable - you're only cloning the Strings to get multiple references to that data. Passing around multiple references to a single piece of data is the whole point of Arc, so of course that is a better solution than duplicating the data to share it. It actually feels like that's the real reason you are wanting to use Arc, but it's not mentioned in the video. You do make a good point of explaining the inefficiency of Arc though.
The example itself also strikes me as odd, because the ids are a struct which implements Clone, and then to avoid the performance cost of cloning the ids all over the place you reach for Arc, when surely the more natural optimization is to avoid the unnecessary cloning by using a struct which implements Copy instead of Clone, say MonsterId(u64) (a quick sketch follows at the end of this comment). If you really need the string data for something other than simply being an id, then you can put that in the EnemyStats struct (which I would assume contains various other data you don't want to be copying around even if immutable).
As I said though, I do agree with your overall point. Perhaps an example that used Vec would have cleared these points up, because although - as you quite rightly point out - String and Vec work in essentially the same way, they are quite distinct semantically in most situations. It would be obvious that calling clone on the (probably very long) enemies_spawned Vec is a bad idea, for example, even if this was immutable.
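For illustration, a quick sketch of that Copy-id idea (the field names here are made up):
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct MonsterId(u64);

struct EnemyStats {
    name: String, // the human-readable string lives here, stored once
    max_health: u32,
}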
loved the video, and loved the discussions in the comments too. really appreciate it as a Rust beginner, keep it up!
Awesome video and narration! Always exciting to see well-explained Rusty content. Keep it up!
`str` can also be accessed across threads via `&str` (since it's immutable). And cloning has no special properties I can think of here, since the data is immutable. `Arc` only seems advantageous if you want reference counting vs. relying on something like a `&'static str`. The video was fun either way -- but can you give a reason you'd prefer the Arc or Rc fat pointer to just referencing the str?
The visuals were fantastic in guiding me through your explanation, great stuff!
I would also mention `.as_ref()`, as some impls require the exact `&str` type. Great video!
This could not have come at a more perfect time. I've been storing a list of a list of immutable data with a thousand elements in a vec
You should not be storing 2D arrays; switch to contiguous storage and store the dimensions. ndarray might be an option.
@@ronniechowdhury3082 I wish I knew what contiguous storage means.
@@RenderingUser just create a struct that stores each row appended together in one long Vec. Then store the width and height as usize. Finally, add some methods that access a row or column at a time by slice. It will make your data access significantly faster.
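A minimal sketch of what "contiguous storage" means here, assuming a row-major layout:
struct Grid<T> {
    data: Vec<T>, // every row appended into one long allocation
    width: usize,
    height: usize,
}

impl<T> Grid<T> {
    fn row(&self, y: usize) -> &[T] {
        assert!(y < self.height);
        &self.data[y * self.width..(y + 1) * self.width]
    }
    fn get(&self, x: usize, y: usize) -> &T {
        assert!(x < self.width && y < self.height);
        &self.data[y * self.width + x]
    }
}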
I'm confused by the use case presented -- if you want cloneable, immutable string data, surely you'd just pass around indices into a big Vec, or even just &str's directly if the lifetimes allow it?
Good video nonetheless.
Sure, you could always construct a Box and then Box::leak it to get an immortal &'static str if you're fine with never reclaiming the memory. This memory leak could become a problem if it's unbounded though. Imagine the game is able to spawn arbitrarily many monsters over time, creating more and more IDs. I'm assuming by immutable he meant "immutable as long as it's in use, but after that it gets deleted". If you want to reclaim memory by getting rid of unused IDs, the Vec strategy gets iffy. What if you want to delete an ID in the middle of the Vec? Not an unsolvable problem, but it's already getting much more complex than the simple MonsterID(String) we started with. Plus, if you actually want to access the string contents you need access to the Vec, so you need to pass around a reference to it. And if you're going multithreaded you need to protect it with a Mutex or similar. I'm not a fan.
@@iwikal Hmm, all true on paper. I would assume that, even in a very large game, all monster ID strings ever encountered during runtime are a finite set taking up at most 10kB or so in total, or maybe 1MB if we have very long text descriptions. If the game can dynamically generate large numbers of monster ID strings, or load/unload bigger data chunks, I'd try replacing the Vec with a HashMap or similar, though that gets awkward with multithreading for the same reason.
@@cookieshade197 If you leak all IDs and the game keeps allocating new ones, it will run out of memory sooner or later (potentially much later). Maybe you can get away with it if you assume that nobody will leave the game running for more than 24h straight, but what if it's a server? Ideally it should be able to handle years of uptime.
@@cookieshade197 To elaborate, what I mean is you can't always assume that there is a reasonably sized set of possible IDs, and even if there was you'd have to construct some kind of mechanism for reusing the old ones. Say we were talking about a ClientId instead, based partially on their IP address. It just seems wrong to me that I should let those stay around in memory after the connection is terminated, until the same client connects again. Maybe they never do, in which case the memory is wasted.
@@iwikal The issue isn't running out of memory; that is almost never going to happen in a real game. The issue is cache misses. You want to be able to perform operations on a large number of monsters every single frame, and every unnecessary byte for the particular operation is another monster pushed off the cache. (Hence games using data-oriented design, where "wasteful" copies of monsters in different structures are fine as long as only minimal related data is kept in context for each part of the logic. It isn't about total memory usage in games, which is very counterintuitive to other domains.)
Fantastic video, wow!
great video. I'm learning Rust and this video is very helpful for understanding the different ways of storing data. I'm still struggling with the borrowing and ownership system, but oh well, I couldn't do any better
The actual amount of memory allocated to a String during a clone operation is allocator-specific. For a 6-byte string I would not be surprised to see the allocator use an 8-byte bucket. So there will always be a degree of waste when cloning Strings/Vecs.
Here we could just have one long String containing all the monster names and then just slice out the individual monsters.
That would be the most memory-efficient option (no additional allocation on average, only maybe at the end of all the monsters; not sure how the shrinking would work in the end).
This optimizes memory usage, but the disadvantage could be that accessing one monster might require loading two different cache lines, which the usual one-allocation-per-string approach would prevent. Of course, on the other side, other monsters might already be in a recently loaded cache line as compensation. I would guess that on average the speed should be faster (less memory in total => higher chance of already being in the L1/L2/L3 cache), but some individual accesses might be slower if the &str happens to cross a cache line. (A rough sketch follows at the end of this comment.)
Disclaimer: I usually program in Python 😂 so I might be slightly off here. But interestingly, Python has a similar interning optimization for symbol-like strings, storing them globally in one place and then just pointing to them, in a similarly safe, immutable way as Arc, under the assumption that strings that look like symbols are likely to be reused unaltered. Of course, it's still way slower than any Rust code and has its own memory overhead, but it's the same idea of cloning in O(1) time and space.
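A rough sketch of that "one long String, hand out slices" idea (the names and offsets here are made up; the backing String has to outlive every &str handed out):
fn main() {
    let all_names = String::from("goblinogretroll");
    let offsets: [(usize, usize); 3] = [(0, 6), (6, 10), (10, 15)]; // (start, end) of each name
    let ids: Vec<&str> = offsets.iter().map(|&(s, e)| &all_names[s..e]).collect();
    assert_eq!(ids[1], "ogre");
}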
this was fun! it's like when they say "make it work, then make it right, then make it fast". This is a really good example of what to do in that second or third step!
Started watching the video thinking Arc and Vec had totally different use cases, and I'm glad you proved me wrong lol very useful info when you're trying to implement memory efficient coding. Thanks for the data man, really interesting and useful stuff. Cheers!
Arc is sort of similar to how a lot of other languages like Java or C# handle strings, isn't it?
I’ve turned on the bell notification. Also, happy to pay for a cheat sheet on memory allocation recommendations.
this was very informative. we need more rust golfing vids on youtube!
Your graphics and script are a masterpiece
I'm not fully convinced. I'd love to see a follow-up video about this. Here's my thoughts.
When I first saw this pop up in my feed I was very confused, because Arc is a wrapper around a pointer and Vec is a data structure, so comparing an Arc to a Vec seems like an unfair comparison. It seems more appropriate to me to compare Arc<Vec<T>> to Arc<[T]>, and there's very little difference there, though I suppose specifically when dealing with strings it's not easy to get access to and use the underlying Vec; nonetheless, it makes more sense to me to compare the two.
Until you brought up the fact that Arc implements Deref, I was thinking it was all around a pointless idea, but now I'm split on the issue.
Something else to consider is ease of use, which I don't think you addressed very well.
Lifetimes will definitely come into play here, but they don't with String, so it won't be just as easy to pass around at all. Another barrier is that if you need to build the string at runtime you will normally end up with a Vec anyway, which could be shrunk to size, and putting that Vec behind an Arc would achieve mostly the same thing; in comparison, having an array pre-built at compile time is very rare in my experience. There are definitely extra steps and effort involved here which I'm not convinced you have considered carefully. There is no built-in way to convert from a Vec to an array; there are some useful crates, but more libraries always mean more complexity in your codebase, so they're best not added without some consideration.
I also think the performance benefits you state are very exaggerated, and it's never worth talking performance without some benchmarks to back it up, imo. Strings are rarely large, so the memory reduction might be there but it would be small. But once again, there are no benchmarks to back any of this up, so I don't know and I'm not set in either perspective.
I'll keep an eye on your channel. I hope to see some follow-up!
It seems `Box<str>` is like an immutable `String`, but even better because it lacks the capacity field, since it can't ever reallocate.
In other words, if your `String` is not mutable, you should use `Box<str>`. What am I missing?
Cloning a Box<str> requires a deep clone of the string data. Cloning an Arc<str> only does a shallow clone and bumps the refcount. If you don't need Clone, you're right (as I also mention at the end), Box<str> is your best option. If you do, Arc<str> can be better. Both are better (for immutable strings) than String.
A minor mistake at 14:28: Arc doesn't store a length for Sized data (for example, std::mem::size_of::<Arc<String>>() == std::mem::size_of::<usize>()), only the pointer. But the main point about needless double indirection is of course still valid.
Very good explanation. Keep in mind that when dealing with buffers, the Tokio ecosystem has a library called Bytes.
Bytes' goal is to provide a robust byte-array structure for network programming.
The biggest feature it adds over Vec is shallow cloning. In other words, calling clone() on a Bytes instance does not copy the underlying data.
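Roughly what that looks like (a sketch using the bytes crate):
use bytes::Bytes;

fn main() {
    let data = Bytes::from(vec![1u8, 2, 3, 4]);
    let copy = data.clone();    // shallow: bumps a refcount, no memcpy of the bytes
    let tail = data.slice(2..); // zero-copy view into the same buffer
    assert_eq!(&copy[..], &[1, 2, 3, 4]);
    assert_eq!(&tail[..], &[3, 4]);
}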
Very useful and necessary..
Thank you for this insight! I'd never have thought to use Arc instead of Vec; I'd probably use Criterion to compare the performance of both.
14:20 isn't totally right: Arc<String> doesn't need to carry a length; it's a thin pointer
Arc is a godsend for concurrency
sanest rust dev
Finally someone else normal in the comments.
Everything about this is wild and I love it.
You know that Rust is healthy as a language when there are videos about it that only rustaceans could fully understand and make use of
great video, im not the most familiar with rust but this explanation resonated with me
but mostly all this made me think of is how Java's String is straight up immutable from the get-go 😂
Rust's str literal is also immutable.
In your final section about Arc<String>, your diagram shows Arcs having ptr+len, but in this case String is a Sized type, so the Arc only has a ptr. Of course that doesn't undermine your point that Arc<String> is just plain bad :)
Ack, you're right! That Arc pointing to the String should be just a single pointer, no len. Thanks for pointing that out! My mistake.
Best Rust channel on YT! Super high quality!
- Need ownership over a T? Use Box<T>.
- Need multi-ownership over a T? Use Rc<T>.
- Need multi-ownership of a T across threads? Use Arc<T>.
- Need an array of U? Use [U; N] or &[U].
- Need a dynamically-sized array of U? Use Vec<U>.
Substitute U and T as needed. (A quick sketch below.)
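A quick sketch illustrating the cheat sheet above (the values are placeholders):
use std::rc::Rc;
use std::sync::Arc;

fn main() {
    let _single_owner: Box<str> = Box::from("one owner");
    let _shared: Rc<str> = Rc::from("shared, single thread");
    let _shared_across_threads: Arc<str> = Arc::from("shared between threads");
    let _fixed: [u8; 3] = [1, 2, 3];
    let _growable: Vec<u8> = vec![4, 5, 6];
}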
I completely disagree. Here's why:
* regarding the O(1) clone, this isn't directly comparable to taking a clone of a Vec, because cloning a Vec lets you modify/consume it. It's more comparable to just taking a shared reference, which is even cheaper than Arc.clone().
* Regarding the smaller stack size - this is plain false, as you forgot to count the two reference counts - strong and weak. Which makes it actually larger than a Vec.
For most use cases, if you don't care about mutability you can just use a &[T]. Arc is useful when you write multithreaded code and need to deallocate the data sometime before the program end (otherwise you could just use a &'static [T]).
The reference counts are stored on the heap, so the stack size is smaller. Also he was talking specifically about instances where you don't know the lifetime of the object when you compile, which is what you need to use references.
Hello, the video is great and I really like the point you are making, especially the cache proximity. I will definitely give this a try at some point. Even with that though, I have a few questions.
By wrapping the values in Arc, you are effectively turning clones into references without lifetimes. I understand that sometimes it's better and easier to work with the fully owned value, but if you need that, you can just clone the shared reference on demand.
I don't know why, but this feels a bit like Swift to me. Rust has the advantage of the ownership model so if you can do the job just with shared references, I don't see the need for Arc. But of course I could be wrong so please correct me if that's the case.
I think it's really insightful for you to compare Arc to something you might find in Swift. Arc does share a lot of similarities with Swift's copy-on-write Array struct, and Foundation's NSArray (where `-copy` just gives you back the same array with an increased reference count). The core insight is the same: for an immutable data structure, a shallow copy is equivalent to a deep copy.
Rust's superpower is of course that you can hand out non-owning &[T]s and be sure you aren't writing dangling reference bugs. And the video does not intend to dispute that! You should absolutely design your code to hand out references instead of ownership where it makes sense to do so. In my video, I'm just pointing out that Arc can be an optimization over Vec when you were already using Vec--in other words, in places where you've already decided that you need to give out ownership.
Interesting content and well explained. You should do more videos like this.
Awesome! Thank you for this explanation. I've heard bits and pieces of this before and it was making sense that I should start doing this as I am learning Rust… but this one video gave me a ton of context; I think I'm actually going to do this as a refactor phase now 😊
I wonder if we could save some cycles by instead having the `ptr` at 10:00 point directly to the data instead of needing to offset. It would require a negative offset for accessing strong and weak but that's much rarer than Deref.
6:55 As far as I can tell, it generally doesn't allocate extra space if you create the whole string at once with String::from (and ofc String::with_capacity). Which actually seems a bit odd, afaik most allocators only give out 8 byte aligned regions so it would make sense if String just took the rest as well. Though I guess in that case, a realloc that stays below the actual region size probably also would be free.
The allocator API atm requires that you pass in the same layout for deallocation as you got for allocation. Excess capacity info is otherwise lost for Arc and so the conversion from String does a clone iff len ≠ cap.
In other words, this exact-allocation strategy works well when converting to smart pointers that treat length and capacity as equal
@@Kupiakos42 I wasn't talking about smart pointers. I was talking about String.
@@1vader I mean smart pointers and how they interact with allocators is a specific reason for String, and the global allocator generally, to allocate exact space instead of a multiple of 8.
@@Kupiakos42 I'm not really sure I understand what you mean. It sounds like you're saying it's so that the String allocation can just be re-used for the Arc allocation during conversion from String to Arc but that doesn't seem possible since Arc needs extra stuff at the front of the allocation. And you also said in your initial comment that a clone is done iff len = capacity? But shouldn't it be the other way around?
@@Kupiakos42 Checking the source code, it indeed looks like Arc always creates a new allocation and copies everything over.
One thing to note about Box is that if you're trying to allocate a massive array on the heap, you'll hit one of its fundamental problems: Box::new first constructs the value on the stack and only then copies it onto the heap. This results in very easy stack overflows when you're supposedly allocating on the heap, unwittingly overflowing the stack in the process of trying to Box an array too massive for the small default stack size Rust has.
Is there any way to work around this?
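One common workaround (a sketch, not from the video): build the data on the heap to begin with, e.g. as a Vec, and convert it without a stack round-trip:
fn main() {
    // 16 MiB of zeroes, allocated directly on the heap by Vec.
    let big: Box<[u8]> = vec![0u8; 16 * 1024 * 1024].into_boxed_slice();
    assert_eq!(big.len(), 16 * 1024 * 1024);
}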
I'd prefer a Cow.
Box<[T]> is a reasonable default for immutable data. It's also smaller than either Vec or String. If you need Clone, use &[T]. Reference counting in 90% of the cases is a consequence of poor (or lack of) design.
This. Thanks.
If I understand correctly, you're saying that Rc is only useful when cloning is needed, and cloning is seldom needed. Is that it?
@@iilugs Some sort of cloning is almost always needed. Rc is useful when you don't know what will release the last reference, and you often either do know, or can rewrite things so that you will, so that all the other references can be tracked at compile time.
I recently tried using Arc in my latest project, a websocket chat app, where there were too many clones of a string being sent to every channel. The problem with using this type is that it is not serializable or deserializable by serde, and serde derive does not work on it.
Arc works just fine with Serde, you just need to flip on a Serde feature flag. I mentioned this in the video and there's a link to the feature flag docs in the description. :)
@@_noisecode Hey thanks! I didn't read the description. I enabled "rc" feature and it now works!
Went down the rabbit hole, the important thing I was missing is Box is not the same as Box!!!
Indexing into Arc<[T]>/Box<[T]> etc. works just fine because they deref to [T], which is the thing that implements indexing. Try it out!
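For example (a tiny sketch):
use std::sync::Arc;

fn main() {
    // Indexing auto-derefs through Arc<[T]>/Box<[T]> down to [T].
    let xs: Arc<[u32]> = vec![10, 20, 30].into();
    let ys: Box<[u32]> = Box::new([10, 20, 30]);
    assert_eq!(xs[1], 20);
    assert_eq!(ys[2], 30);
}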
Great tutorial. One thing I was thinking about recently is the overuse of Result - not all functions are fallible, yet many unnecessarily return Result instead of just a value for the infallible functions. I think everyone just got used to returning Result... Worth looking into. Also worth a clippy lint if there isn't one for this. For an API, it should always be Result of course, but we're often not developing APIs
Hm, do you have any examples where this has happened? I've never seen it in the wild.
This was fantastic. Thank you for making this video.
Out of curiosity, what tool are you using for making your videos?
For short strings the smartstring library is even better: it stores strings of up to 23 bytes length in-place without any heap allocations, and imitates the String type's interface. Basically like smallvec but for strings.
how is Box<str> more efficient than &str? don't they both just point to some data on the heap?
It's not. They are the same. The difference is only in semantics. When Box goes out of scope it frees whatever it points to
@@Turalcar ahh i get it now. thanks!
Unless you're planning on generating monster IDs at runtime, why not just drop the Arc and use &'static str? Or for that matter, why not use a unique zero-sized type and make MonsterID a trait?
The ustr crate also provides a convenient _leaked_ string type with O(1) comparison time and O(n) construction. If the variants aren't known at compile time but don't need freeing after they're created, it can be a good approach.
Awesome video!! Btw I love the font you're using, looks kinda like LaTeX
It is! Courtesy of the Manim library--see the link in the description. :)
If you need an even more powerful version of Arc that's basically a mini version-control system, consider immutable data structures like those in the "im" crate.
So now I wonder in what kinds of situations Cow would be more appropriate, i.e. when modifying the data might be required.
Correct me if i'm wrong, but if you want just some immutable data that can be passed around, you could just create a regular array, then drain it, and you'll get a 'static array slice. With that you can do anything.
I assume you mean "leak" instead of "drain". And yeah, that can also be a good option, as long as you don't eventually want to free them again.
@@1vader Yeah, that's what i meant.
Shared-ownership immutable strings with single-indirection performance are somewhat impressively clever. However, the benefits *_really_* shine only (1) when you do tons and tons of copies of long, permanent text and (2) when your design desperately can't attach a string's ownership to anything else in the same scope at all, which makes that scenario kind of *_unlikely_* TBH. Moreover, this may encourage so-called premature optimization that adds unnecessary complexity to your design for a little speed gain.
Kudos to very clear beautiful animation. Thank you.
This is a great video. I'd love to see more like it.
Good food for thought and illustrations, but I very much wish you would use Rc instead of Arc in most of this, and then showed folks how to determine if you actually need to "upgrade" to Arc when necessary. Healthier practice for the whole Rust ecosystem to not default to thread-safe types & operations when not actually necessary. We'll all pay with decreased performance of the software we use proportionately to how much thread-safe code is overused. 🙂
subbed and saved for future reference... easy to understand explanation.
Thanks! This was great as a beginner to rust.
awesome, I want more content on Rust like this
The sus thing about Arc is that it still has an underlying mutable state that needs to be synchronized across threads (the reference counters which are atomic variables) and will thus introduce memory barriers (the cpu will try to make sure that all writes done before changing an atomic variable affect all reads that happen after the new atomic variable's value is read on another thread, even if they are unrelated). Copying doesn't do that, so in some high concurrency scenario (and if the string/vec isn't very large) you may find that the cost of copying the string or array (usually an aligned memcpy) is smaller than the cost of synchronization. Rc doesn't have that though.
Wouldn't copying a Vec require a heap allocation and thus induce synchronisation into the allocator anyways?
Really like this video. Nice that it has no music over it
I agree bro
I agree, with the caveat that if it is used as an id of sorts, Eq and Hash should use the pointer (this is not unsafe).
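A sketch of that idea (the MonsterId newtype here is made up; this only makes sense if equal ids are always clones of the same Arc, e.g. handed out by some interner):
use std::hash::{Hash, Hasher};
use std::sync::Arc;

#[derive(Clone)]
struct MonsterId(Arc<str>);

impl PartialEq for MonsterId {
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0) // identity, not string contents
    }
}
impl Eq for MonsterId {}

impl Hash for MonsterId {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the data pointer so Hash stays consistent with the Eq above.
        (self.0.as_ptr() as usize).hash(state);
    }
}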
Just a reminder that if you don't need thread safety, you're better off with Box or Rc
First off, just wow. What a nugget. I'd like to see more generic stuff like this. Rust is still not decades old, so best practices are not fully known.
If you want to make your videos even better, here is my advice:
1. Include some running code snippets where you maybe measure it or break it purposefully. Take no more than 10 lines of code and operate on it.
For example, try with a reference... include a scenario where a reference won't cut it, put in Rc. Introduce threading, break it. Add Arc. Put it in a struct, etc.
2. Keep videos no longer than 5 minutes ideally, but pack them with such nuggets. If you need more than 5 minutes to explain some concept, that deserves a video of its own.
3. Enjoy your million subs.
I might be wrong about all of it, but it's my humble advice. :)
I appreciate the kind words and feedback! Clearly I'm just getting started and I'm still figuring out what works, so I'll definitely keep these suggestions in mind. :)
As a creator with successful 20-minute videos, I disagree with your point #2.
@@strager_ No way dude. I love your content. Hashmap one was a blast.
I found a downside to this recently. `Arc` and `Rc` are not `serde::Serialize`.
It's a feature flag in serde
Box should be the default; Rc/Arc only if it's shared, which a Vec wouldn't be for example, so it's a very different use case.
Sadly, on both Linux and Windows the Box visualizer is broken in the debugger. Arc and Rc of slices have working debugger visualizers.
14:46 "You cannot do any better than this" Of course you can, you can use `&str`. If the list of monster names is known at compile time, you can even use `&'static str`.
use std::rc::Rc;

struct SomeStruct<A: Clone> {
    a: A,
    b: i8,
}

impl<A: Clone> SomeStruct<A> {
    pub fn foo(&self) -> A {
        self.a.clone()
    }
    pub fn bar(&mut self) {
        self.b += 1;
    }
}

struct StrStruct {
    a: &'static str,
    b: i8,
}

impl StrStruct {
    // Returning &'static str (rather than a &str borrowed from &self) keeps bar() callable below.
    pub fn foo(&self) -> &'static str {
        self.a
    }
    pub fn bar(&mut self) {
        self.b += 1;
    }
}

fn main() {
    let mut spam = SomeStruct { a: Rc::<str>::from("spam"), b: 1 };
    let a = spam.foo();
    spam.bar();
    println!("{a}");

    let mut eggs = SomeStruct { a: "spam", b: 1 };
    let b = eggs.foo();
    eggs.bar();
    println!("{b}");

    let mut ham = StrStruct { a: "spam", b: 1 };
    let c = ham.foo();
    ham.bar();
    println!("{c}");
}
Arc<str> can't be a thing, or I'm missing something:
help: the trait `Sized` is not implemented for `str`
note: required by a bound in `Arc::<T>::new`
Maybe he wanted to refer to &str?
I definitely did mean Arc<str>, not Arc<&str> (something it's hard to imagine a use for). You're correct that you can't create an Arc<str> with new(); typically you'd use e.g. .into() on a String. Check the pinned comment.
@@_noisecode so the idea is to have an owned value first, like Vec or String, and transform that value later into Arc/Rc. If we're going to do cloning, it's worth it, as you explained. But if we don't... is it still worth it?
As I mentioned near the end, if you're not going to do any cloning, I'd recommend Box, since it has the least memory overhead of any of the options.
I'd also look into compact_str (there are other similar crates but this one is the fastest of those I tried).