Knowledge clip: Keeping research data organized

  • Published: 4 Jan 2025

Comments • 70

  • @G_Whiz
    @G_Whiz 1 year ago +170

    I use random password generators for file naming. Every time I open a file, it's like a tiny surprise party or similar to the rush I get when hitting a good roll at the casino if the file is actually the one I'm looking for. I like to live under constant stimulation and stress, it helps me feel alive.

  • @jenswurm
    @jenswurm 1 year ago +26

    One often overlooked aspect is a deprecation strategy. At some point documents may no longer be relevant, and one wants them out of sight of the normal document archive, while still being able to find them with relative ease.

  • @lazygardens
    @lazygardens 1 year ago +5

    This is excellent!
    If you don't have revision control software, always have a folder for your source material, and copy that material out to work on it.
    So many people accidentally edit the source image.
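
A minimal Python sketch of the "copy it out before you work on it" habit described above. The function name `checkout_working_copy` and all file names are hypothetical, and marking the original read-only is an extra safety step assumed here, not something the comment asks for.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def checkout_working_copy(source_file: Path, work_dir: Path) -> Path:
    """Copy source_file into work_dir and mark the original read-only."""
    work_dir.mkdir(parents=True, exist_ok=True)
    working_copy = work_dir / source_file.name
    shutil.copy2(source_file, working_copy)  # copy2 also keeps timestamps
    # Optional safety net (an assumption of this sketch): drop write
    # permission on the original so it cannot be edited by accident.
    source_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    # Make sure the copy itself stays editable by the owner.
    working_copy.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return working_copy


# Demo in a throwaway directory (all names below are hypothetical).
demo_root = Path(tempfile.mkdtemp())
raw = demo_root / "source" / "raw_measurements.csv"
raw.parent.mkdir()
raw.write_text("id,value\n1,42\n")

working = checkout_working_copy(raw, demo_root / "working")
print(working)
```

All edits then happen on the file in `working`, while the file in `source` stays as it was received.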

  • @CrazyDriverSwed
    @CrazyDriverSwed 1 year ago

    Most filesystems also provide ways to add tags to files, and those tags can be used when searching for a specific file or a group of files.

  • @marcogalloro
    @marcogalloro 1 year ago

    Extremely valuable!🗂️🙌🏻

  • @l00k4tstuff
    @l00k4tstuff 1 year ago +16

    The best way to organize files is to study taxonomy and apply those principles to the structure.

  • @pieterkops
    @pieterkops 1 year ago +56

    Don't store all the information in the filename! Use metadata instead. This makes sorting, filtering and grouping much easier. Think of it as dynamic folders.

    • @joaopedrorocha5693
      @joaopedrorocha5693 1 year ago +6

      Details should be in the metadata, but some info in the name can be very useful sometimes ...
      I once had an annoying problem because I left almost all the info in the metadata of my files. I had to store a lot of data in the cloud and was using a binary data format with embedded metadata, so reading the metadata required running a program. To do that, I had to move the files over the network to a compute node, run the program, and read the metadata before I could select what I needed, which took a lot of time.
      If I'd had a better naming convention for my use case, I would have been able to pre-filter the data just by pattern matching on the filename.
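
To illustrate the pre-filtering idea, here is a small Python sketch that selects files from a plain listing by name pattern, without opening any of them. The `site_date_variable` naming scheme and the file names are invented for this example; they are not from the comment.

```python
from fnmatch import fnmatchcase


def prefilter(names, pattern):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# A listing as it might come back from a cloud bucket (hypothetical names).
listing = [
    "siteA_2021-03-01_temperature.bin",
    "siteA_2021-03-01_humidity.bin",
    "siteB_2021-03-02_temperature.bin",
    "notes.txt",
]

# Only the names are inspected; no file has to be downloaded or parsed.
wanted = prefilter(listing, "siteA_*_temperature.bin")
print(wanted)
```

Only the files left in `wanted` would then need to be transferred to a compute node for a detailed look at their embedded metadata.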

    • @giac0416
      @giac0416 1 year ago +2

      Can you give us an example? Thanks

    • @TheDiveO
      @TheDiveO 1 year ago +1

      well, file and folder names are actually meta data. but then, this is an ex-cathedra video. some people are busy organizing their life and telling others what to do, erm TODO. others are living their life.

    • @jenswurm
      @jenswurm 1 year ago +3

      I wish normal file systems would support labeling. That would make things a lot easier. Some documents might, despite the best efforts of coming up with an organization scheme, be relevant in multiple places. For example, the balance statement of an investment account may belong in the folder that is related to that bank, but it's also relevant for doing one's taxes.

    • @lazygardens
      @lazygardens 1 year ago +4

      @jenswurm You can use symlinks.
      The bank account folder has the real file.
      The tax prep folder has a symlink. It looks like a file, and when you click it, it opens the file in the bank account folder.
      en.wikipedia.org/wiki/Symbolic_link
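
A short Python sketch of the symlink setup described above. The folder and file names are made up for illustration, and on Windows creating symlinks may need extra privileges or Developer Mode.

```python
import tempfile
from pathlib import Path

# Throwaway directory tree for the demo (all names are hypothetical).
root = Path(tempfile.mkdtemp())
bank = root / "bank"
taxes = root / "taxes_2023"
bank.mkdir()
taxes.mkdir()

# The real file lives in the bank folder.
statement = bank / "balance_statement.txt"
statement.write_text("placeholder statement")

# The tax folder only holds a link that points at the real file.
link = taxes / "balance_statement.txt"
link.symlink_to(statement)

print(link.is_symlink())
print(link.read_text())
```

Reading through the link returns the contents of the real file, so there is still only one copy to keep up to date.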

  • @jimgrant1776
    @jimgrant1776 1 year ago +14

    Ghent University Data Stewards - Excellent video. Maybe the best I’ve seen on folder / file organization.
    However, I disagree with you on some aspects of this.
    1 - Folder Names and Structure - Don’t design your folder names around file attributes. That is what tags are for. Folder names should be “categories” of topics/objects/nouns. The structure should be hierarchical. - - - I do agree that the “categories” should not overlap. Some professional consulting firms have developed a concept / methodology for organizing information called MECE. It stands for mutually exclusive (ME - no overlap) and collectively exhaustive (CE - nothing overlooked). - - - If it appears that a file can logically fit into more than one folder, that’s a signal that either the hierarchical file structure needs to be changed or tags should be used.
    2 - File Names - Don’t put dates in file names. If you do, files with common topics probably can’t be sorted and listed together. Windows and Mac operating systems create and maintain file “Create Dates” and “Last Modified Dates”. If some other date (like “Due Date”) is important, add it as a tag (attribute) of the file. - - - Unquestionably, the best and most understandable / relatable file naming scheme is to make the names be specific cases of a topic/object/noun which are deeper in the hierarchy than the folder names.

    • @Junnaris
      @Junnaris 1 year ago

      Thank you for sharing! I find that information very useful.

    • @l00k4tstuff
      @l00k4tstuff 1 year ago +5

      I disagree with the dates in file names kibosh. Sure, you can turn on the file manager to show details, but then you have too many columns. Similarly, when files are a catalog of a collection, it is important to include the date (and time, and then also a unique identifier if there is more than one in a second) of the capture. This is much better than creating a hash for creating distinct file names because the date & time stamp has orderly meaning.

    • @cowboybob7093
      @cowboybob7093 1 year ago +4

      @l00k4tstuff Agree, create and modify dates don't necessarily reflect important aspects of why a date is used. Those dates can be naïvely manipulated by the OS or programs. Moving files between systems has headaches too. Finally, details vs name view, a "screenful" of files is a handy arbitrary amount of names but few detailed.

  • @anitat9727
    @anitat9727 2 months ago

    Thank you for this

  • @BrandspankingFilm
    @BrandspankingFilm 1 month ago

    I am currently doing research on this subject, and I'd be very interested to talk to people and find out how they approach this within their organisation. Feel free to reach out.

  • @yohanesliong4818
    @yohanesliong4818 5 months ago

    Very informative. Thank you

  • @EverCraft_File_History
    @EverCraft_File_History 6 months ago

    What software do people typically use for version control of research data?

  • @darked89
    @darked89 1 year ago

    I would also add a strong recommendation to avoid binary formats if possible. It is almost trivial to track or spot differences between, say, two TSV files without opening them (git and diff).
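
As a rough illustration of why text formats are easy to compare, this Python sketch diffs two small TSV tables with the standard `difflib` module, which produces output similar to `diff -u`. The table contents and the file labels are invented for the example.

```python
import difflib

# Two versions of a small TSV table (contents are made up for the demo).
old = "sample\tvalue\nA\t1.0\nB\t2.0\n"
new = "sample\tvalue\nA\t1.0\nB\t2.5\n"

diff_lines = list(
    difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="results_v1.tsv",  # labels only; no files are read
        tofile="results_v2.tsv",
        lineterm="",
    )
)

for line in diff_lines:
    print(line)
```

The changed row shows up as one removed line and one added line, which is the kind of summary a binary format cannot give without a dedicated reader.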

  • @edwingonzalez6399
    @edwingonzalez6399 1 year ago

    In any folder that will hold several revisions, I always add a subfolder called "_Superados" ("superseded"). I move all the obsolete versions there so that only the latest one remains, but I don't get rid of the earlier ones, as a precaution. That way, a year or more later, when I go looking for the latest report, I don't have to deal with a list of versions.
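
The "_Superados" habit can be automated with a few lines of Python. This is only a sketch under assumptions not stated in the comment: versions are told apart by file name, the name that sorts last is treated as the newest, and the helper name `archive_superseded` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def archive_superseded(folder: Path, pattern: str,
                       archive_name: str = "_Superados") -> Path:
    """Keep the last file (by name) matching pattern; move the rest to an archive subfolder."""
    # Assumption: version names sort in chronological order, e.g. v01 < v02.
    versions = sorted(p for p in folder.glob(pattern) if p.is_file())
    if not versions:
        raise FileNotFoundError(f"no files match {pattern!r} in {folder}")
    archive = folder / archive_name
    archive.mkdir(exist_ok=True)
    for old in versions[:-1]:
        shutil.move(str(old), str(archive / old.name))
    return versions[-1]


# Demo with hypothetical report versions in a throwaway folder.
folder = Path(tempfile.mkdtemp())
for version in ("v01", "v02", "v03"):
    (folder / f"report_{version}.txt").write_text(version)

latest = archive_superseded(folder, "report_v*.txt")
print(latest.name)
```

Nothing is deleted: the older versions remain available inside the archive subfolder.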

  • @vaughngaminghd
    @vaughngaminghd 1 year ago +18

    I make a point of never using "final" in the file name: best way to jinx the project…😆

    • @cowboybob7093
      @cowboybob7093 1 year ago +3

      When I do a light developing task like a near-throwaway script I'll start with _name99_ and work down. By default the most recent rev sorts to the top.

    • @Blast-Forward
      @Blast-Forward 1 year ago +2

      That's why I love Linux, you can have
      final
      Final
      fInal
      fiNal
      finAl
      finaL
      FInal
      FiNal
      FinAl
      FinaL
      FINal
      FInAl
      FInaL
      FINAl
      FINaL
      and most importantly
      FINAL
      Takes you a long time to run out of finals.

    • @atlasstone6896
      @atlasstone6896 1 year ago

      This is what I am saying hahahahah
      If you say final it will never finish!

    • @EverCraft_File_History
      @EverCraft_File_History 6 months ago +1

      @Blast-Forward hahahahhh, I couldn't help but laugh when I saw this.

  • @Pedritox0953
    @Pedritox0953 1 year ago

    Great video!

  • @quochuynh184
    @quochuynh184 1 year ago +1

    The underscore disables Windows search capability for file names.

    • @cowboybob7093
      @cowboybob7093 1 year ago +1

      Not quite my experience, but I know what you mean. The way I look at underscore vs hyphen is underscore is just another letter but hyphen includes invisible white space. If you're referencing the folder tree search in Windows Explorer, and you notice a different behavior, I'll defer to your experience. That feature was so bad for so long I've only started using it recently. I know what I'm about to write is out of the stone-age, but it's not unusual for me to open a command prompt and do a dir /s /b > .\allfiles.txt - - - then use `find` to search the new file. The disk access is heavy one time, after that the text file is opened in memory and searched in a flash. The method has its obvious drawbacks, but they all do.
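
The same "list everything once, then search the list" approach can be sketched in Python for systems without `dir` and `find`. The helper names `build_index` and `search_index` are hypothetical, and unlike `dir /s /b` this version lists files only, not folders.

```python
import os
import tempfile
from pathlib import Path


def build_index(root: Path, index_file: Path) -> int:
    """Write the path of every file under root to index_file, one per line."""
    count = 0
    with index_file.open("w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                out.write(os.path.join(dirpath, name) + "\n")
                count += 1
    return count


def search_index(index_file: Path, needle: str) -> list:
    """Return indexed paths that contain needle, ignoring case."""
    needle = needle.lower()
    with index_file.open(encoding="utf-8") as lines:
        return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Demo tree with hypothetical files; the index is kept outside the tree
# so that it does not list itself.
base = Path(tempfile.mkdtemp())
data = base / "data"
(data / "a").mkdir(parents=True)
(data / "b").mkdir()
(data / "a" / "interview_01.wav").write_text("")
(data / "b" / "survey_results.tsv").write_text("")

index = base / "allfiles.txt"
indexed = build_index(data, index)
hits = search_index(index, "survey_results")
print(indexed, hits)
```

As with the command-prompt version, the index goes stale as soon as files are added or renamed, so it has to be rebuilt from time to time.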

  • @danielcraft7342
    @danielcraft7342 1 year ago

    Any tip to do if I have a lot of audio files or video files? In my example I have like 30 audios of the same event. Any idea?

    • @jon9103
      @jon9103 1 year ago

      Like any other set of files, it depends on what differences between the files do you care about, what makes each file unique. Surely there must be something, otherwise why bother having 30 of them? With only your vague description it's impossible to know what's important to you.

    • @lazygardens
      @lazygardens 1 year ago +1

      Like multiple videos of a wedding ...
      Event and date would be important. After that, by what's happening you want to recognize quickly.
      2020_BearAttack_screams.mp4
      BearAttack_2020_screams.mp4
      Are they sequential captures?
      Simultaneous captures by different equipment?
      What is important is that you can look at the directory and pick out the file you want easily. (and write down your scheme so you can have an assistant find it).

    • @danielcraft7342
      @danielcraft7342 1 year ago

      @lazygardens thank you, it actually helped me a lot

  • @sebastianpozo8305
    @sebastianpozo8305 2 years ago +2

    JUST AMAZING! CAN'T BELIEVE THAT YOU ONLY HAVE 2065 VIEWS.

  • @Justopensourceandme
    @Justopensourceandme 1 year ago

    You need git or related tools to track the state of your project files

  • @sheffin007
    @sheffin007 2 years ago

    Thank you

  • @reyesmedicen
    @reyesmedicen 1 year ago

    Amazing

  • @phpn99
    @phpn99 1 year ago +9

    These were "best practices" in 1989. The world needs to move from data categorization that is based on "where", to one that is based on "what". This means that categorization is only accessorily related to folder trees ; it should be primarily done via metadata, or emergent metadata extracted by modern search engines. This old way to do things is based on the concept of taxonomy ; information is not best done in a hierarchy but in a network.

    • @timbehrens9678
      @timbehrens9678 1 year ago +9

      Learning and implementing "metadata extraction by modern search engines" shouldn't be a prerequisite for a bachelor thesis in biology or sociology. In many cases best practices of 1989 are still the best.

    • @lazygardens
      @lazygardens 1 year ago +3

      OK ... where is your video showing WHAT metadata to put in a file to make it extractable by "modern search engines". Explain your schema.
      And what does the poor PhD candidate do when the "modern search engine" is unavailable because the frigging network is down?

    • @darked89
      @darked89 1 year ago

      Good luck extracting metadata from the sequencing machine created foobar.fastq.gz
      One has to have unique file names and either encode project + sample etc in the file name or have a database where each file has an entry describing it.
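
One way to picture the "database where each file has an entry" option is a tiny SQLite catalog. This is a sketch only: the table layout, column names, and file names are assumptions made for the example, not a recommended schema.

```python
import sqlite3

# In-memory catalog for the demo; a real one would live in a file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files ("
    " filename TEXT PRIMARY KEY,"  # primary key enforces unique file names
    " project TEXT NOT NULL,"
    " sample TEXT NOT NULL)"
)

# Hypothetical sequencing files and their descriptions.
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("run01_S1.fastq.gz", "projA", "S1"),
        ("run01_S2.fastq.gz", "projA", "S2"),
        ("run02_S1.fastq.gz", "projB", "S1"),
    ],
)

rows = conn.execute(
    "SELECT filename FROM files WHERE project = ? ORDER BY filename",
    ("projA",),
).fetchall()
proj_a_files = [row[0] for row in rows]
print(proj_a_files)
```

Because `filename` is the primary key, registering the same name twice is rejected, which gives the uniqueness the comment asks for.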

  • @ElCidPhysics90
    @ElCidPhysics90 1 year ago +2

    If using the date as the file name, I would start with the year, then the month, then the day, e.g. 2023 07 04
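
The benefit of year-month-day order is that plain alphabetical sorting is also chronological. A small Python sketch, assuming ISO 8601 style hyphens instead of the spaces used in the comment, and a hypothetical helper `dated_name`:

```python
from datetime import date


def dated_name(day: date, label: str, ext: str) -> str:
    """Build a file name that starts with the date in YYYY-MM-DD form."""
    return f"{day.isoformat()}_{label}.{ext}"


# Deliberately created out of order (names are made up for the demo).
names = [
    dated_name(date(2023, 7, 4), "notes", "txt"),
    dated_name(date(2022, 12, 31), "notes", "txt"),
    dated_name(date(2023, 1, 15), "notes", "txt"),
]

# Plain string sorting puts them in chronological order.
ordered = sorted(names)
print(ordered)
```

With day-first or month-first names the same sort would interleave different years.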

  • @valerio4044
    @valerio4044 1 year ago

    We'd have to see what the paper says, but I think the AI has memorized the movie, and what it predicts is the minute the mouse is watching, based on reading its brain waves.

  • @SphereofTime
    @SphereofTime 1 year ago

    1:18

  • @atlasstone6896
    @atlasstone6896 1 year ago

    Don't use acronyms!
    You will forget them!
    Your teammates will ask what they mean!
    This is not the old PC era where every bit counted and screens were small. Just write a long name, it's OK!
    As for versions, use the date and the time of day in 24-hour format,
    like 2023/12/24-1830.
    By the way, pro tip: save a version after every hour of editing!
    This system will also keep you going: once you see the files and notice gaps in the dates, your brain will say, "Oh, start working, look, there are dates when we didn't work!"
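
A possible way to generate such date-and-time version suffixes in Python. Slashes are left out because `/` is a path separator and cannot appear inside a file name on common systems; the helper name `versioned_name` and the exact suffix layout are assumptions of this sketch.

```python
from datetime import datetime


def versioned_name(stem: str, ext: str, when: datetime) -> str:
    """Append a YYYYMMDD-HHMM stamp (24-hour clock) to a file name."""
    return f"{stem}_{when.strftime('%Y%m%d-%H%M')}.{ext}"


# Fixed timestamp so the result is reproducible; real use would pass
# datetime.now() at the moment of saving.
example = versioned_name("thesis_draft", "docx", datetime(2023, 12, 24, 18, 30))
print(example)  # thesis_draft_20231224-1830.docx
```

Saving under a fresh stamp every hour then leaves a trail of versions that sorts in the order they were written.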

  • @dannytan8080
    @dannytan8080 1 year ago +2

    a bit obnoxious to call your own preferred folder organization style as "Best Practices"

  • @madwhitehatter
    @madwhitehatter 2 years ago +3

    The 90s called.

  • @kathyglass2922
    @kathyglass2922 2 years ago +2

    Seriously? You have to tell me the advantages of having an organized file system? Geesh, there was a reason I clicked on this to begin with. Wonder what it was.

    • @gr8dvd
      @gr8dvd 1 year ago

      😂😂😂

    •  1 year ago

      Obvious as it may seem, I can understand why a university feels the need to produce this sort of video.

    • @gr8dvd
      @gr8dvd 1 year ago

      @ OP is not questioning why the video is needed, she is questioning the need to explain why being organized is a good thing.

  • @petersalt2342
    @petersalt2342 4 months ago

    A file system from a university? Definitely a No Thanks !

  • @ahmadkavie4178
    @ahmadkavie4178 3 months ago

    very thank you