Find Duplicate Files In Linux With Awk In Under A Minute!
- Published: 7 Nov 2024
- AWK is powerful and can be your friend in Linux. Here we show how we can use awk to detect duplicate files in Linux. Taking the output from md5sum, we can see duplicate content. Passing that to awk, we can build arrays counting each entry:
md5sum *
md5sum * | awk '{ count[$1]++ }'
md5sum * | awk '{ count[$1]++ } END { for (k in count) print count[k] }'
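Not from the video itself, but a minimal sketch of where those commands lead, assuming the files sit in one directory and have no spaces in their names: remember which filenames share a checksum and print only the checksums that occur more than once.
```bash
# Tally each checksum, collect the filenames that produced it,
# and print only the checksums seen more than once
md5sum * | awk '{ count[$1]++; names[$1] = names[$1] " " $2 }
  END { for (k in count) if (count[k] > 1) print k ":" names[k] }'
```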
Additionally, you can find my video courses on Pluralsight: pluralsight.com... and take time to see my own site: www.theurbanpen...
~-~~-~~~-~~-~
Please watch: "RHCSA 9 Working With Podman Containers"
• How To Use Podman Cont...
~-~~-~~~-~~-~
Wow, GREAT tutorial. I used 'rdfind' for finding duplicate files within Linux. But not having to install additional software is a huge advantage for me. Thanks!
Thanks, did not know of rdfind
I'm gonna be honest with you. You make some of the best videos on the Linux OS. Your direction and explanation of bash are absolutely awesome. I hope you can make videos more frequently.
Great video. I always wanted to learn awk and this video was an excellent start!
Thank you
Thanks a lot for all this valuable information you give us!
:)
just thank you, I owe a lot to you and your training videos
thank you, and congratulate yourself for your own effort in learning
*When are you going to upload new vids?*
Hello, how do you prepare for the LPIC-3 if there is currently no book with updated content?
No, thanks - I'll just use python :D
I love the syntax of bash when you need to do anything even slightly more complex than listing files.
In PowerShell, if you slam your head against the keyboard repeatedly, you'll get syntax errors.
If you do the same in bash, you'll get a Kubernetes cluster, an upgraded kernel, and Arch installed on a separate partition. :D
Troubleshooting a fubar'd system would be interesting, or perhaps something instructional like Distributed SSH (DSH), or even a video covering building a kernel for a system in 2023.
Thanks again for another nice video!
Thank you
I have a question: is there a danger in just deleting the duplicated file? As I understand it, a message digest algorithm like the MD5 used in this video can generate the same result even if the source input is different 😂. Maybe I am overthinking it. 😅
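If collisions are a worry, one way to be safe (a minimal sketch, not from the video; the .iso filenames are hypothetical) is to confirm a suspected pair byte for byte with cmp before deleting, or to hash with sha256sum instead of md5sum:
```bash
# Compare two suspected duplicates byte for byte; cmp exits 0 only if they are identical
cmp --silent file1.iso file2.iso && echo "identical" || echo "different"

# Or use SHA-256, where accidental collisions are practically impossible
sha256sum * | awk '{ count[$1]++ } END { for (k in count) print count[k] }'
```
In practice, MD5 collisions between ordinary files on your own disk are vanishingly unlikely unless someone crafted them deliberately, but a byte-for-byte check costs little.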
Is it possible to improve this command to have it look through directories recursively? As far as I can see md5sum does not have a -r or --recursive option.
find . -type f | xargs md5sum | awk '{ count[$1]++; name[$1] = name[$1] " " $2 } END { for (k in count) if (count[k] > 1) print name[k] }' | sort
I would recommend using a variation on find(1) to get a listing of files recursively.
```bash
# Find all files (not directories) under the current working directory and run md5sum on each
find . -type f | xargs md5sum
```
Alternatively, you could use "actions", a feature supported directly by find (see man find), to eliminate the need for xargs in the previous pipeline. That looks something like this:
```bash
# Let find run md5sum itself; -execdir executes the command from each file's own directory
find . -type f -execdir md5sum "{}" \;
```
Another interesting method is this one, which generates an md5sum command for each file path and runs them by piping into bash:
```bash
# Build an "md5sum <path>" command line for each file and pipe the commands into bash
find . -type f | awk '{print "md5sum " $1}' | bash
```
Note that these options might not handle cases like spaces in filenames and that kind of thing. Flags like -print0 for the find command and -0 for xargs might come in handy in those cases.
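For completeness, a minimal sketch of the null-delimited variant those flags enable, which keeps filenames containing spaces intact through the pipeline:
```bash
# find emits each path terminated by a null byte and xargs splits on it,
# so paths with spaces or newlines reach md5sum unmangled
find . -type f -print0 | xargs -0 md5sum
```
The awk stage from the earlier replies still splits on whitespace, so filenames with spaces would need extra handling there, but the checksums themselves come out correct.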
Sir, I have over 50,000 dups on my laptop. Looking at your vid and thinking of asking for your help, thanks.
Thank you.
thanks
Czkawka