Find Duplicate Files In Linux With Awk In Under A Minute!

  • Published: 7 Nov 2024
  • AWK is powerful and can be your friend in Linux. Here we show how to use awk to detect duplicate files in Linux. The output of md5sum already lets us spot duplicate content by eye. Passing that output to awk, we can build an array keyed on the checksum and count how often each one appears.
    md5sum *
    md5sum * | awk '{ count[$1]++ }'
    md5sum * | awk '{ count[$1]++ } END { for (k in count) print count[k] }'
    Additionally you can find my video courses on Pluralsight: pluralsight.com... and take time to see my own site www.theurbanpen...
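    The counting idea in the description can be taken one step further so that only the repeated checksums are shown, together with the file names that share them. A minimal sketch, assuming GNU md5sum and file names without whitespace; the function name `list_dupes` and the demo files are made up for illustration:

    ```bash
    # list_dupes DIR: print one line per group of identical files in DIR
    # (hypothetical helper; assumes file names contain no whitespace)
    list_dupes() {
        ( cd "$1" && md5sum -- * ) | awk '
            { count[$1]++; names[$1] = names[$1] " " $2 }   # $1 = checksum, $2 = file name
            END {
                for (k in count)
                    if (count[k] > 1)
                        print count[k] ":" names[k]
            }' | sort
    }

    # Demo on throwaway files: a.txt and b.txt share content, c.txt does not
    demo=$(mktemp -d)
    printf 'hello\n' > "$demo/a.txt"
    printf 'hello\n' > "$demo/b.txt"
    printf 'world\n' > "$demo/c.txt"
    list_dupes "$demo"    # prints: 2: a.txt b.txt
    rm -rf "$demo"
    ```

    Each output line starts with how many files share a checksum, followed by their names, so unique files never appear.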

Comments • 23

  • @WoutiecomNL 1 year ago +3

    Wow, GREAT tutorial. I used 'rdfind' for finding duplicate files within Linux. But not having to install additional software is a huge advantage for me. Thanks!

  • @scruffyjohn5234 1 year ago

    I'm gonna be honest with you. You make some of the best videos on the Linux OS. Your direction and explanations of bash are absolutely awesome. I hope you can make videos more frequently.

  • @Pedro-fd9tv 1 year ago +3

    Great video. I always wanted to learn awk and this video was an excellent start!

  • @gaiusbaltar7122 1 year ago +3

    Thanks a lot for all the valuable information you give us!

  • @sozinonl 1 year ago +1

    just thank you, I owe a lot to you and your training videos

    • @theurbanpenguin 1 year ago

      thank you, and congratulate yourself for your own effort in learning

  • @terry.chootiyaa 1 year ago +1

    *When are you going to upload new vids?*

  • @allisondealmeida 1 year ago

    Hello, how do you prepare for LPIC-3 if there is currently no book with updated content?

  • @Gosu9765 1 year ago +3

    No, thanks - I'll just use python :D
    I love syntax of bash when you need to do anything even slightly more complex than list files.
    In powershell if you slam your head against keyboard repeatedly you'll get syntax errors.
    If you do the same in bash you'll get Kubernetes cluster, upgraded kernel and installed arch on a separate partition. :D

  • @screamingiraffe 1 year ago

    Troubleshooting a fubar'd system would be interesting, or perhaps something instructional like Distributed SSH (DSH), or even a video covering building a kernel for a system in 2023.

  • @pbezunartea 1 year ago +1

    Thanks again for another nice video!

  • @zlqpzww9929 1 year ago

    I have a question: is there a danger in just deleting the duplicate file? As far as I know, a message digest algorithm (MD5 in this video) can generate the same result even if the source input is different 😂. Maybe I am overthinking it. 😅
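
    On the collision question: two different files sharing an MD5 value is possible, and deliberately crafted MD5 collisions are well documented, although an accidental one is very unlikely. A cautious approach is to treat a matching checksum as a candidate only and confirm it with a byte-for-byte comparison before deleting anything. A minimal sketch using `cmp`; the function name `same_content` and the demo file names are made up for illustration:

    ```bash
    # same_content A B: succeed only if the two files are byte-for-byte identical
    # (hypothetical helper; cmp -s prints nothing and reports via its exit status)
    same_content() {
        cmp -s -- "$1" "$2"
    }

    # Demo on throwaway files with identical content
    demo=$(mktemp -d)
    printf 'hello\n' > "$demo/keep.txt"
    printf 'hello\n' > "$demo/copy.txt"

    # Delete the copy only after the comparison confirms it really is a duplicate
    if same_content "$demo/keep.txt" "$demo/copy.txt"; then
        rm -- "$demo/copy.txt"
    fi
    rm -rf "$demo"
    ```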

  • @petregmd 1 year ago

    Is it possible to improve this command to have it look through directories recursively? As far as I can see md5sum does not have a -r or --recursive option.

    • @chaz6399 1 year ago

      find . -type f | xargs md5sum | awk '{ count [$1]++; name[$1]=name[$1] " " $2} END { for(k in count) if(count[k] > 1) print name[k] }' | sort

    • @RoboDragonJediKnight 1 year ago

      I would recommend using a variation on find(1) to get a listing of files recursively.
      ```bash
      # Find all files (not directories) under current working directory and execute md5 sum
      find . -type f | xargs md5sum
      ```
      Alternatively, you could use "actions", a feature supported directly by find (see man find), to eliminate the need for xargs in the previous pipeline. That looks something like this.
      ```bash
      find . -type f -execdir md5sum "{}" \;
      ```
      Another interesting method is this one, which generates a sequence of md5sum filepath commands and runs them by piping into bash:
      ```bash
      find . -type f | awk '{print "md5sum " $1}' | bash
      ```
      Note that these options might not handle cases like spaces in filenames and that kind of thing. Flags like -print0 for the find command and -0 for xargs might come in handy in those cases.
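
      The `-print0` and `-0` flags mentioned above can be combined with the awk counting from the video. A minimal sketch, assuming GNU find, xargs and md5sum; the function name `find_dupes` is made up for illustration. It takes the file name from the text after the checksum rather than from `$2`, so names with spaces survive. Names containing newlines or backslashes are still not handled, because md5sum escapes them in its output.

      ```bash
      # find_dupes DIR: recursively list groups of files with identical MD5 sums
      # (hypothetical helper; prints one tab-separated group per line)
      find_dupes() {
          find "$1" -type f -print0 | xargs -0 -r md5sum | awk '
              {
                  sum  = $1
                  file = substr($0, 35)   # skip the 32-character checksum and 2 separator characters
                  count[sum]++
                  names[sum] = (sum in names) ? (names[sum] "\t" file) : file
              }
              END {
                  for (k in count)
                      if (count[k] > 1)
                          print names[k]
              }' | sort
      }

      # Demo on throwaway files, including spaces in file and directory names
      demo=$(mktemp -d)
      mkdir "$demo/sub dir"
      printf 'hello\n' > "$demo/first copy.txt"
      printf 'hello\n' > "$demo/sub dir/second copy.txt"
      printf 'world\n' > "$demo/unique.txt"
      find_dupes "$demo"
      rm -rf "$demo"
      ```

      The demo should print a single line naming the two copies and leave out unique.txt. The order of names within a group follows the order in which find returns them.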

  • @jamesbaxter2812 5 months ago

    Sir, I have over 50,000 dups on my laptop. I'm looking at your vid and would like to ask for your help. Thanks.

  • @moreirajesse 1 year ago

    Thank you.

  • @Foche_T._Schitt 1 year ago

    Czkawka