Linux vs. FreeBSD: Uncovering the Truth Behind ZFS Performance

  • Published: Jan 11, 2025

Comments • 67

  • @flyofthefirefly
    @flyofthefirefly 19 days ago +12

    Ayyyy, IT Santa!
    Thank you for the benchmarks.

  • @derekr54
    @derekr54 18 days ago +2

    Wishing you and your family a Merry Christmas and a Happy New Year. Thanks for the videos, DJ, much appreciated.

  • @Psychx_
    @Psychx_ 18 days ago +4

    I'd be interested in seeing ZFS vs. LVM with a simple compressed btrfs volume on top.

  • @simian3455
    @simian3455 18 days ago +9

    Honestly, his benchmarks are like a holiday gift from the IT Santa.

  • @tmendoza6
    @tmendoza6 17 days ago

    Hope you had a great Christmas, and thank you for the excellent content.

  • @eugenesmirnov252
    @eugenesmirnov252 18 days ago +1

    Merry Christmas, @DJWare and all DJWare appreciation society!
    Hope to see more videos in the New Year!

  • @guilherme5094
    @guilherme5094 18 days ago

    Thanks DJ and Merry Christmas!

  • @HilbertSpacersson
    @HilbertSpacersson 12 days ago +1

    Hello, thanks for the video! Naive question: does running it under Proxmox potentially affect performance?

    • @CyberGizmo
      @CyberGizmo  12 days ago +2

      A little bit, yes, but not as much as you might think.

  • @stanlee-eq7lu
    @stanlee-eq7lu 14 days ago +2

    I'm off topic for ZFS but still on storage. What about the HAMMER FS in Dragonfly BSD?

    • @krasen4oo
      @krasen4oo 12 days ago

      @@stanlee-eq7lu what about it?

    • @mirror1766
      @mirror1766 12 days ago +1

      Comparisons would be nice, as it has a number of modern, reliable filesystem features. On FreeBSD there is some support through filesystems/hammer2, but I'm not sure how that compares to the original OS. I don't know if Linux or others have a port for it.

  • @mirror1766
    @mirror1766 12 days ago

    ZSTD is the new kid on the block for compression in ZFS. It is much slower than LZ4, but depending on storage + CPU + ZSTD level you may or may not hit a CPU limit. I admit I have done little to test the negative compression levels of ZSTD, but it seems LZ4 is still the fastest choice as a general compressor, and finding a system with fast enough storage and a slow enough CPU to hit a limit with LZ4 would require very fast NVMe drives (likely still in a speed-benefiting RAID), or many drives in a fast RAID, or a very slow CPU. ZSTD decompression usually stays at similar speeds, while compression throughput drops massively as you increase ZSTD levels. Though ZSTD itself doesn't yet have multithreaded decompression, ZFS can decompress multiple records simultaneously, so that limitation rarely impacts use in ZFS.
    For compressible data, larger recordsizes are usually beneficial, but it should be tested, and the optimal choice can easily vary per file. Some data streams do worse once the record size goes past a certain point. Similarly, the algorithm, and in the case of ZSTD the level, doesn't have a clear answer either, so data may end up smaller or larger with each choice.
    For incompressible data, any record that doesn't compress by at least a certain amount will be stored without compression, and LZ4 is very quick to determine that on a data stream, so it can abort before compression finishes. ZSTD has some tricks to give it an early abort by testing the data with LZ4, then testing with lower ZSTD levels before compressing with higher ZSTD levels. Unfortunately I don't know how to turn off ZSTD's attempts at unselected compressors, which in my opinion should be an option, as the different levels can change how well the data compresses, which determines whether it fits the desired savings.
    Additionally, for incompressible data, some data within it is often compressible. Files often have metadata that may still save a bit. I recall ZFS uses the compression setting for compressing its own structures too, so savings can be had there separately from the contents of the file. Any compression/decompression that happens at a throughput no slower than the drive gives faster data I/O, with the cost added to the CPU + RAM needed to process it.
    All of this adds to I/O latency, since it takes time to do any of this with the data, just as it takes time to compute and compare a checksum (though that should be faster). If you regularly read small blocks of data in a file, a large record size requires the entire record to be read from disk, checksummed/decrypted/decompressed/... Some parts of that can happen in parallel. I haven't found latency measurements for the different ZFS options, but they are there as another performance metric to consider that could be slowing down work with files.
    If I recall, relatime is a variation of atime where atime writes still happen, but only if the read is a certain time after the last recorded read, instead of on every read. Atime can cause pool slowdowns in my experience, both by adding an extra write whenever a file is read and because those writes are fragmented thanks to the copy-on-write design, so seeking is involved to read that atime data separately from other file contents. Seek times matter less on SSDs, though they do happen there too, but magnetic drives can reach horrific performance on basic tasks like listing files, in my experience, with any atime writes.
    My understanding is that FreeBSD favors writing data to the beginning of the drive while Linux favors spreading the data evenly; this is applied blindly to both magnetic and solid-state drives on both operating systems, so FreeBSD's technique is better for magnetic drives while Linux's is better for SSDs. ZFS caching in RAM and on some types of cache devices both make analyzing how that compares more of a challenge. You can disable caching specifically to test how the disks do in an uncached state, but the results only represent your system for the times when ZFS wouldn't have that data cached, which happens when ARC data was pruned to make room for other data or not yet read after a reboot, when cache vdevs aren't part of the pool or big enough to hold the data, etc.
    Did you open a PR for the iozone oddities you observed?
    Random reads and writes may not apply to the main data itself, but they do apply to ZFS data structures. ZFS scatters a write into several pieces, as metadata gets multiple copies written to disk and ZFS tries to spread them to different areas so one bad area on disk can't hit them all. As mentioned earlier, atime/relatime causes more seeking. I increase vfs.zfs.txg.timeout to higher values to try to minimize some fragmentation, as I found it wasn't beneficial to lower it to 5 back when the default was changed from 30 to 5. I was experiencing system responsiveness issues too, like other people back when that change was recommended and later adjusted, but it didn't help my system. i7-3820, 32GB RAM, magnetic media.
    If using sync=disabled, consider using a fast ZIL instead, or if you still insist, make sure you have battery backup with orderly shutdown on power loss, and maybe consider power-loss protection for the drives.
    The 80%-filled-pool guideline is old information. Depending on workload you 'could' have allocation algorithms cause slower performance well below 80%, but many users can exceed 95% these days without those slower algorithms needing to jump into gear. I think a workaround to force free space was to create a dataset in the pool that you don't use (mark it read-only and don't store other datasets/pools in it with `zfs recv`, to keep it empty) and set the refreservation property to the value you want to keep free within the pool (a sketch of these commands follows at the end of this comment). There always seem to be ways that limits/quotas could be exceeded, but it should prevent a general 'oops' from filling space you wanted kept free.
    Consider TRIM to keep up performance both for SSDs and some magnetic drives; specifically, some shingled magnetic recording (SMR) drives accept those commands. Some drives have a performance impact when processing TRIM, so you may need to decide whether it should stay on or run on a schedule.
    If you want to see bigger changes in the comparisons, consider comparing older vs. newer ZFS versions. It's not uncommon that Linux distros have an out-of-date version, and FreeBSD's ZFS in base is sometimes a little older than the latest depending on the FreeBSD version, while the FreeBSD port is far enough out of date that it should be either updated or removed, as it is no longer a way to test newer code than what's in base.
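    As a rough sketch of the knobs mentioned above (the pool name tank, dataset names, and sizes are only placeholders to adapt; the property names are standard OpenZFS):
      zfs set compression=lz4 tank/media          # pick a compressor per dataset
      zfs set recordsize=1M tank/media            # larger records can help compressible data, test it
      zfs get compression,recordsize,compressratio tank/media
      zfs set atime=on tank/media                 # relatime only takes effect with atime enabled
      zfs set relatime=on tank/media              # cut the extra writes caused by reads
      zfs create -o refreservation=100G -o readonly=on tank/reserved   # the keep-space-free workaround
      zpool set autotrim=on tank                  # continuous TRIM, or instead run it on a schedule:
      zpool trim tank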

  • @N0zer0
    @N0zer0 18 days ago +2

    This is when you have been naughty all year long and Santa brings you ZFS benchmarks instead of real gifts.

  • @husanaaulia4717
    @husanaaulia4717 2 days ago

    2:16 isn't zstd newer?

  • @elalemanpaisa
    @elalemanpaisa 18 days ago

    1:40 You were talking about mounting Linux ZFS on FreeBSD and vice versa... I am still wondering if this is even safe in write mode and doesn't lead to corruption due to different driver implementations.

  • @Mudflap1110
    @Mudflap1110 19 days ago +2

    Would appreciate the tests on real hardware. I'm sticking with TrueNAS CORE for now; iXsystems will probably improve the Linux performance soon. Also, personally, I've been moving to Debian for servers, and sticking with OpenBSD and FreeBSD for the networking stack. Thanks for the coverage. Happy holidays!

  • @paulhernaus
    @paulhernaus 19 days ago +4

    If you are using the file systems for video files, wouldn't it be better to have no compression at all?

    • @CyberGizmo
      @CyberGizmo  19 days ago +2

      If it were H.264, yes; if it's ProRes like mine, no, compression helps.

    • @Yxcell
      @Yxcell 18 days ago

      I thought so, too. Overlaying the filesystem's compression on top of the already-compressed video files seems like unnecessary redundancy.

    • @CyberGizmo
      @CyberGizmo  18 days ago +6

      @@Yxcell There is a big difference between ProRes and H.264 (perhaps a video on this would help), but essentially ProRes is an intra-frame codec, meaning each frame is compressed independently and contains all the data required to reconstruct the image.
      • It uses lossy compression but retains a high level of detail, which means there is still some redundancy and there are patterns in the data that can be further compressed.
      H.264:
      • H.264 is an inter-frame codec, meaning it uses predictive algorithms to eliminate redundant information between consecutive frames (e.g., by storing only the differences between frames).
      • It's designed to maximize compression efficiency, removing as much unnecessary data as possible while maintaining visual quality.
      Not everything is intuitive in this world :)

    • @Yxcell
      @Yxcell 18 days ago

      @@CyberGizmo Ah, thanks. I was completely unfamiliar with ProRes and how it handles compression. I appreciate the helpful info. :)

    • @katrinabryce
      @katrinabryce 18 days ago

      The advice is usually to set the compression at the fastest/lowest setting rather than turn it off.
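      In ZFS terms that usually means leaving LZ4 on and just checking what it achieves; for example (the pool/dataset names here are only placeholders):
        zfs set compression=lz4 tank/videos
        zfs get compression,compressratio tank/videos   # already-compressed video typically shows close to 1.00x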

  • @bertnijhof5413
    @bertnijhof5413 19 days ago +1

    Remarks:
    I used ZFS with LZ4 compression on the 2nd-slowest Ryzen ever, the Ryzen 3 2200G (4C/4T; 3.2/3.7 GHz). It slowed the Xubuntu VM's boot time to ~7 secs on the 2200G, versus ~4.5 secs on the 5600GT, but during normal operation it was not really noticeable.

    • @CyberGizmo
      @CyberGizmo  18 days ago

      No problems on an Intel 12th Gen; LZ4 is fast as heck with low CPU use.

  • @TheLinuxNinja
    @TheLinuxNinja 18 days ago +2

    Do a ZFS vs BTRFS please 😊

    • @tonnylins
      @tonnylins 18 days ago

      He's probably going to include Btrfs and ZFS in the next round of Linux filesystem comparisons, as he usually attempts to include them each time he runs those benchmarks.

  • @savagepro9060
    @savagepro9060 19 days ago +5

    I just had a YouTuber keep a promise and upload a video on installing FreeBSD on an old MacBook Pro.
    Just waiting to acquire a used MacBook on eBay!

    • @CyberGizmo
      @CyberGizmo  19 days ago +2

      Nice, I assume an x86-based Mac?

    • @savagepro9060
      @savagepro9060 19 days ago +2

      @@CyberGizmo Yes indeed, SPECIFICALLY

    • @NitroNilz
      @NitroNilz 13 days ago

      Link! Link! Link! Link!🔗🔗🔗🔗

    • @savagepro9060
      @savagepro9060 13 days ago +2

      @@NitroNilz "youtube.com/watch?v=BrBmJSnxF0g&t=1008s"

  • @CB0T
    @CB0T 18 days ago

    Merry Christmas all!

  • @georgH
    @georgH 18 days ago

    That was awesome!! Thank you!

  • @jakobw135
    @jakobw135 18 days ago +1

    Isn't FreeBSD a variant of Unix, like Linux is?

    • @anonymoususerinterface
      @anonymoususerinterface 18 days ago +5

      FreeBSD is descended from Unix by AT&T Bell Labs. The University of California, Berkeley made Berkeley Unix, which then became the Berkeley Software Distribution, iirc. Linux is a Unix-LIKE kernel which does many of the same things as the original Unix but is not related to it; Linus Torvalds was inspired to make Linux because of Unix.
      FreeBSD is not Unix certified, though. Current certified Unixes include Illumos distributions like SmartOS, OpenIndiana, and OmniOS (per this commenter). macOS is also Unix certified. There have been Linux systems that are Unix certified. Unix certification is basically a series of tests to see if an OS does things the way UNIX does; you don't have to be related to Unix.
      TLDR: FreeBSD is derived from UNIX but is not UNIX certified right now. Linux is a UNIX-like system not actually related to UNIX, but it is partially UNIX and POSIX compliant.

    • @NitroNilz
      @NitroNilz 13 days ago

      @@anonymoususerinterface ONLY MacOS is certified UNIX, illumos wouldn't waste money on that.

  • @framegrace1
    @framegrace1 18 days ago +1

    ZFS is amazing, but not being able to grow pools is too much of a limitation for most of my uses. You can't always pre-buy the space you will need some years in advance, and I hate having to compromise architecture decisions on that. Most of what I do doesn't really depend on squeezing out that last percent of performance.
    It's really a pity it lacks such a basic capability; at this sort of level, performance + manageability is a zero-sum game.

    • @CyberGizmo
      @CyberGizmo  18 days ago +1

      You can grow pools, but you can't grow vdevs. If you configure a mirror, just add a 2nd mirror to the pool; you can do the same for raidz, just add a second three-drive raidz to the pool.
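      For example (the pool name tank and the device names are only placeholders):
        zpool add tank mirror /dev/sdd /dev/sde           # add a second mirror vdev to a mirrored pool
        zpool add tank raidz /dev/sdf /dev/sdg /dev/sdh   # add a second three-drive raidz vdev to a raidz pool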

    • @flasksval
      @flasksval 16 days ago

      @@CyberGizmo And it adds more performance at the same time. A limitation in one way, a benefit in another; that should be mentioned. Also, I miss expanding RAID5/6 and I miss the low cost of Unraid. I've never had bitrot with ZFS, and that is everything for long-term storage. It could take years before you see it, and your backups might not be old enough.

    • @mirror1766
      @mirror1766 12 days ago

      Adding individual disks to a raidz will likely be available in OpenZFS 2.3 (it is in the release candidates) thanks to github.com/openzfs/zfs/pull/15022 already being brought in. This will allow adding a disk to a pool while it is running, but only 1 disk can be added at a time, with a data reflow step that has to complete before adding the next disk. Pool redundancy is maintained during the reflow, and the reflow operation is paused on disk failure until a replacement disk is fully online again. It likely won't be as good as a recreated pool from an overhead standpoint, since block+parity sets are not rewritten from scratch and retain the old data-to-parity ratio, but at least it will be an option. New data will get the new data-to-parity ratio. The way data-to-parity is accounted for will likely make space use seem weird. Look at "RAIDZ Expansion by Matt Ahrens & Don Brady" for video detail about this feature.
      You can grow the pool by replacing all disks one at a time as long as the pool has redundancy, but some ZFS layouts don't let you just add a single additional disk.
      If you can keep 2x the storage space that you need, then partition each drive in half and make your pool from the first partitions. If you need to rearrange any pool structure, such as adding a disk, or just rebuild it to minimize fragmentation, then you make a pool on the second partitions. This technique is filesystem independent and gets past any filesystem/device resize/rewrite limitations, as it is actually a completely separate rewrite that is happening; the limitations come from the tools used to do the rewrite, like zfs send/recv not being able to alter record sizes of the stream and not understanding block cloning; consider tools with less filesystem knowledge like cp/tar/rsync/etc. to get past those, but remember they have their own drawbacks.
      Use zfs send+recv to get the data to its new home (a sketch is below). You can use incremental sends to perform most of it during convenient hours and have only a small final snapshot to send when you take the system down to migrate to it. Alternatively you may be able to set the second pool up as a mirror of the first to have it replicate, but there can be disadvantages to that.
      Most people who can't keep 2x the space also can't keep 2x the space for a backup. ZFS, RAID, and other filesystems aren't an alternative to backups, so your data is at risk. When you realize you need to grow and can't just buy 1 more disk for your desired and used layout, save those older small disks so you can finally create a backup pool.
      Only having 1 backup has a risk in that, while you are restoring from backup, you likely only have 1 complete copy of the data, so it's not currently backed up.
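      A minimal sketch of that incremental send/recv migration, assuming hypothetical pools named oldpool and newpool:
        zfs snapshot -r oldpool@migrate1                                  # initial snapshot while still in service
        zfs send -R oldpool@migrate1 | zfs recv -F newpool                # full replication stream to the new pool
        zfs snapshot -r oldpool@migrate2                                  # small final snapshot at cutover time
        zfs send -R -I oldpool@migrate1 oldpool@migrate2 | zfs recv -F newpool   # incremental catch-up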

    • @framegrace1
      @framegrace1 11 days ago

      @@CyberGizmo That's OK in the virtual world (though not as convenient, really), but on real hardware it's a pain.

  • @xenoman2238
    @xenoman2238 16 days ago

    Sir, you both look and sound a lot like actor Donald Sutherland!
    Merry Christmas

  • @Jerrec
    @Jerrec 18 days ago +1

    zme mirror sdb sec - should sec be sdc?
    LZ4 isn't CPU intensive; zstd is much more so. LZ4 has been in ZFS for ages now; zstd is the new kid on the block. LZ4 is fast and zstd is much more effective. I use lz4 for file servers and zstd for archives, but I will go zstd for everything (that is not already compressed) in the future.
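    Per-dataset choices like that might look like this, for example (tank and the dataset names are just placeholders):
      zfs set compression=lz4 tank/fileserver       # fast, low CPU cost
      zfs set compression=zstd-3 tank/archive       # better ratio, more CPU
      zfs get compressratio tank/fileserver tank/archive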

  • @savagepro9060
    @savagepro9060 19 days ago +2

    Thumbnail -->> Oh come on. An iconic ice-bird does not stand a chance against a raging devil!
    Beastie turns a Christmas Showdown into a Christmas Chow Down!

  • @jyvben1520
    @jyvben1520 19 days ago

    Happy Spinning disks

  • @tonnylins
    @tonnylins 18 days ago

    gogo zfs!

  • @AllanSoelberg-d3j
    @AllanSoelberg-d3j 17 days ago

    How is LZ4 the new kid on the block? It's literally been the default ZFS setting for the past decade!
    Also, from my own testing... not a deep dive, just some rough basic testing:
    ZSTD-3 seems to have about the same CPU utilization but can achieve 800% better compression in some rare cases.
    Also, there is no benefit from running ZFS without compression... and if someone could magic such a benefit into existence, I would suspect it would be a very niche use case.
    LZ4 should certainly never affect performance due to CPU utilization; LZ4 can run on the slowest, oldest CPUs one can imagine and still provide many GB of throughput.
    ARC_MAX, though (the max limit), will as a general rule of thumb do just about nothing; the ARC will immediately release memory back to the system when the system requests it.
    The only real benefit I'm aware of from using ARC_MAX is that you can see your free RAM, while if the ARC is allowed to take what it wants, then one has to add the ARC + the free system RAM to get the truly usable free memory of the system (an example of setting the cap is below).
    Fwrite testing... I would suggest trying a flame graph.
    The throughput setting usually gives like a 10x to 12x performance boost, so maybe one has it and the other doesn't.
    Yes, I've also always heard that FreeBSD ZFS is faster, but I suspect it's just a matter of time before both are neck and neck in almost everything...
    Might be a few years out still... never actually tested FreeBSD ZFS.
    Which you pretty much prove is just about already a thing.
    I did think that FreeBSD and Linux were running the same version of ZFS these days, since OpenZFS 2.0, but maybe I misunderstood that; never actually tried FreeBSD.
    In regard to sync=disabled, it will do wonders for the IOPS, especially random... since it will ignore the sync requirements and basically make random writes more sequential, thus saving a lot of them... but when the shit hits the fan, running sync=disabled can cause data corruption in files, so for perfect data integrity I certainly can't recommend it.
    I can however say that the real damage is, at least with my hardware, very limited and happens only in the rarest of cases.
    If I could hit the IOPS I need without running sync=disabled I would... but I can't.
    Interesting video; I will have to dig into the normalization thing... wasn't aware of that... I wonder if that can make Windows-created pools work on Linux.
    Had some issues with that.
    Happy ZFS user though; I'm never going back to anything less than ZFS, it's amazing.
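    If you do want to cap the ARC, a minimal sketch (the 8 GiB value here is just an example to adapt):
      # Linux: module parameter, here via a modprobe.d drop-in
      echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
      # FreeBSD: loader tunable, also adjustable live via sysctl
      echo 'vfs.zfs.arc_max="8589934592"' >> /boot/loader.conf
      sysctl vfs.zfs.arc_max=8589934592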

    • @mirror1766
      @mirror1766 11 days ago

      I wouldn't worry about the rare cases when making a general compression choice and would focus on the common cases, unless those rare cases have a major negative to them. ZSTD-3 (the default ZSTD?) was much slower / more CPU overhead in my testing. Did you just see them both as insignificant enough that your CPU use was very low while your disk was the bottleneck, or did you get actual CPU comparisons? Did you look at how much data was compressing vs. how much hit the early abort, and did that matter much? Did you watch for latency and RAM use differences?
      No compression being beneficial is rare but certainly possible. There are other choices, like just compressing zeroes, that users can consider too. Turning off compression instead of using LZ4, without testing, is generally just bad administration.
      Some disks individually are getting into many GB of throughput, and people are making arrays of some of those disks too. Even ARC is too much overhead for some fast disk arrays, so we have the direct-io feature coming to help, as some users found it is a very noticeable bottleneck.
      ARC release (= prune) isn't free or immediate. It has received multiple optimizations to help systems where it was becoming a problem, and some were inspired by troubleshooting arc_prune performance bugs. Add to that the effort to fix the bugs themselves and it's better, but all of this shows that it isn't instant. There is also a bottom end (and more commonly that is what I tweak, by increasing it), so it's not just 'ARC + free = truly free memory'.
      It's common that Linux distros and FreeBSD are not all on the same version. FreeBSD is usually very quick to pull it in, but major OpenZFS versions don't propagate out to every FreeBSD version that is supported. Linux distros similarly don't always offer their users updated versions. You find reports of this by seeing what people use in GitHub issues, and even OpenZFS Leadership meeting members often mention their troubles dealing with a new feature/fix/test on an old distro that doesn't have a recent version.
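      To check which versions are actually in play on a given system, for example (tank is a placeholder pool name):
        zfs version                           # userland and kernel module versions
        zpool get all tank | grep feature@    # which pool feature flags are enabled/active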

  • @laneromel5667
    @laneromel5667 18 days ago +9

    I tried ZFS; compared to a RAID controller, ZFS is slow, very very slow.

    • @musiqtee
      @musiqtee 17 days ago +5

      If speed is of most importance, ZFS demands quite the hardware.
      However, if pure speed (of large amounts) isn't too important, the other logistics of ZFS are hard to beat. Backups, sharing/ACLs, bulk replication, or multiple client-side "file systems" (FS-agnostic host datasets, etc.) are easier in software (tuning, VMs, middleware, networking…) than with proprietary HW.
      E.g. typical 3-2-1 storage, replication & failover is well covered on bread-and-butter hardware in "homelab" or small business settings. The big players have equally "bigger" wallets, so caveats surely apply…

    • @flasksval
      @flasksval 16 days ago +3

      If that is your only conclusion you haven't learned anything from your "try". There is much more to learn.

    • @stanlee-eq7lu
      @stanlee-eq7lu 14 days ago

      @@laneromel5667 very true

    • @mirror1766
      @mirror1766 11 days ago +2

      ZFS is more about reliability and capability than it is about performance, though additions are there that focus on performance, so sometimes it's a loss and sometimes a win. A RAID controller handles the case of knowing a drive has failed and rebuilding a new disk with mirror/parity data. It has no awareness of the data on the drive to do that; it just uses its known state of the drive being good vs. failed. Changing a failed drive back to a good state, even if it only lost power for a moment, will require fully rebuilding it. If a disk hasn't outright failed, a RAID controller likely skips checking the parity and only reads one side of a mirror. Without checksums, it cannot know if the data is correct and will gladly return corrupted data as long as the drive state didn't get toggled from good. It's also common with a RAID controller that those drives cannot migrate away from that model of controller, and are sometimes trapped to that specific controller, without full data loss.
      On FreeBSD you can avoid the proprietary RAID controller in another way by using GEOM providers to create RAID in software; now your disks can move freely to other systems and controllers. You can get some other features like checksums, but it's normally less optimized than how ZFS achieves it for its space overhead.
      ZFS watches data throughout with checksums, and in the case of rebuilding a failed disk it will only have to read and write allocated blocks instead of entire drives, as long as the drives are
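      The checksum verification and repair being described can be exercised directly, for example (tank is a placeholder pool name):
        zpool scrub tank       # read all allocated data, verify checksums, repair from redundancy where possible
        zpool status -v tank   # per-device read/write/checksum error counters and any files found damaged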

  • @michaelthompson7217
    @michaelthompson7217 17 days ago

    🎅

  • @viggokallman1649
    @viggokallman1649 19 days ago

    First

    • @kebugcheck
      @kebugcheck 19 days ago +5

      @@viggokallman1649 ladies first

    • @grahamritchie672
      @grahamritchie672 18 days ago

      @@kebugcheck you old smoothie

  • @oflameo8927
    @oflameo8927 18 days ago +1

    I honestly never trusted Debian-based distros such as Ubuntu for performance. They do too much nutty stuff with the packaging. I would go for Arch or Fedora-based ones instead.

    • @CyberGizmo
      @CyberGizmo  18 days ago +2

      Sure, you can try those, but remember ZFS is a DKMS package and you may run into issues if the OpenZFS version doesn't support the kernel your distro installed. I did try Fedora 41; SMB is broken on its builds right now. I have run OpenZFS on Debian for about seven years with no issues, even with a DKMS package.
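      A quick way to check that kernel/module pairing on a DKMS-based install, for example:
        dkms status zfs                  # which kernel versions the zfs module has been built for
        modinfo zfs | grep -i version    # version of the currently available module
        uname -r                         # running kernel, which should appear in the dkms build list above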

    • @framegrace1
      @framegrace1 18 days ago

      Unless they change the default glibc, there's not much packaging can change in this case.

    • @oflameo8927
      @oflameo8927 18 days ago

      @@CyberGizmo That is a good point, and seemingly Fedora is putting its chips on BTRFS even though its main corporate sponsor, Red Hat, said they don't like BTRFS; but they have other corporate sponsors who adore it.

    • @mirror1766
      @mirror1766 11 days ago

      @@framegrace1 Some add performance overhead and bloat as they look at container-based packaging systems, which is usually done to avoid dependency management issues.

    • @framegrace1
      @framegrace1 11 days ago

      @@mirror1766 That's for Flatpaks and similar, and, AFAIK, Arch and Fedora use those about as much or as little as Debian does.