A Deep Dive Into Pathlib And The Magic Behind It
- Published: 27 Jul 2024
- If you’re not yet using pathlib for dealing with files and directories, you’re missing out. This video takes a close look at the pathlib library in Python and explains some of the magic that goes into it, as well as how you can use it in your own code.
The code I worked on in this video is available here: github.com/ArjanCodes/2022-pa...
💡 Get my FREE 7-step guide to help you consistently design great software: arjancodes.com/designguide.
💻 ArjanCodes Blog: www.arjancodes.com/blog
🎓 Courses:
The Software Designer Mindset: www.arjancodes.com/mindset
The Software Designer Mindset Team Packages: www.arjancodes.com/sas
The Software Architect Mindset: Pre-register now! www.arjancodes.com/architect
Next Level Python: Become a Python Expert: www.arjancodes.com/next-level...
The 30-Day Design Challenge: www.arjancodes.com/30ddc
🛒 GEAR & RECOMMENDED BOOKS: kit.co/arjancodes.
👍 If you enjoyed this content, give this video a like. If you want to watch more of my upcoming videos, consider subscribing to my channel!
💬 Discord: discord.arjan.codes
🐦Twitter: / arjancodes
🌍LinkedIn: / arjancodes
🕵Facebook: / arjancodes
📱Instagram: / arjancodes
👀 Code reviewers:
- Yoriz
- Ryan Laursen
- James Dooley
- Dale Hagglund
🎥 Video edited by Mark Bacskai: / bacskaimark
🔖 Chapters:
0:00 Intro
0:55 Why not simply use strings
1:10 Issues with strings
2:20 Basic usage of pathlib
4:02 Back slashes and forward slashes
5:13 Reading file from a path
6:03 Resolving paths
6:41 Useful path properties
8:15 Checking whether path is a file or a directory
8:43 Creating and deleting files and directories
10:43 Reading paths from configuration files
11:13 Pathlib and Pydantic
12:31 Operator overloading in Python
16:34 Outro
#arjancodes #softwaredesign #python
DISCLAIMER - The links in this description might be affiliate links. If you purchase a product or service through one of those links, I may receive a small commission. There is no additional charge to you. Thanks for supporting my channel so I can continue to provide you with free content each week!
Functions that accept Path are quite a mess without type annotations, because a lot of users intuitively try to call them with strings. Having a mix of functions that expect Path or str in your code is inevitable due to external libraries. For more public functions I often accept union and run `path = Path(path)` in the very first line. I don't think we'll be ever able to get rid of this assumption that Path and str can be used interchangeably, but that's fine as long as everyone is aware of that. It's almost like a natural language - you can't force it.
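The "accept a union and convert on the first line" habit described in this comment can be sketched as follows. The function name `describe` and the sample file name are hypothetical and only serve as illustration.

```python
from __future__ import annotations

from pathlib import Path


def describe(path: str | Path) -> str:
    """Hypothetical public function that accepts either a str or a Path."""
    path = Path(path)  # normalise once, at the boundary
    return f"{path.name} (suffix: {path.suffix or 'none'})"


# Both call styles end up on the same code path.
print(describe("data/report.csv"))
print(describe(Path("data") / "report.csv"))
```

Calling `Path()` on something that is already a `Path` is harmless, so the conversion does not need an `isinstance` check.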
As a side note, I remember I once used some serialization library (probably for the damned YAML) that had a different behavior for str (just parse it directly) and Path (read file and parse the content). That could be a part of an interesting design for some newly growing ecosystem, but it's at least confusing for Python. I would consider it a bad design at this point.
Nice! Thank you for the tip!
Thanks for the tip
Truth be told I sometimes write new functions and almost always assume I am sending a string rather than a path object... I do the same thing.
Again, a great example of properly paced, informative, and reliable tips and insights. I stumbled over several chapters of a textbook and multiple web sources before searching diligently for Arjan's opinions on this library. As a retired hobbyist, I try to learn more about intermediate uses, and I'll start here first next time. Hope you keep cranking out even more episodes!
Awesome video Arjan! I personally use `Path(__file__)` a lot to build paths relative to the script that's calling it.
Great tip, thank you!
@@ArjanCodes Same for me. Yet, when working with Jupyter Notebooks, this might result in an error. Therefore I am using the following:
current_dir = Path(__file__).parent if "__file__" in locals() else Path.cwd()
Got any tips for a tempfile? Seems that tempfile kinda belongs here also...
4:40 that’s why I use Pathlib. Makes for beautiful code (especially relative to the alternative)
I have been using pathlib for a bit over a year now and I like it very much: better security and processing. I have, however, noticed it is slightly slower than using os functions. That is not enough to make much of a difference unless you are running a loop doing a lot of pathlib operations, but then again, if you are doing that, it is probably code you should refactor anyway.
Wow, you made me feel like an expert for a second. Big thanks. The amount of times I learn new things from your videos is really high. Big fan, top-notch content in terms of quality. Today, it must be one of the first times I see a video on something that I discovered 6 months ago. I will still watch it till the end, just to see how you apply it.
As always great content. Learned a lot in just a few minutes. 👏
Thanks so much David, glad the content is helpful! :)
Thank you for this Path tutorial Arjan! 🙏
I especially liked the part about Path support in Pydantic and the part about operator overloading.
My pleasure 😊
FINALLY! Was way overdue. Happy you finally learned about pathlib. Been loving it for a while now.
Better late than never! 😁
Pathlib, I use it a lot, but operator overloading was a great idea for my current code! Thanks Arjan
I really needed this. Thanks for taking the time to record it and explain things in an interesting, easy-to-understand way.
Only gripe would be needing a clear screen in between demoing each command; but that's just a pet peeve of mine.
Thank you Kenneth, glad you liked the video!
Thank you Arjan!
Great to see you dive more into the usefulness of defining dunder methods on classes. The Python data model is really rich. When I first discovered how to use it, it really made me feel like I had more power when using Python.
Holy sh... I really like your videos. The way you explain is so clear. Thanks a lot. I didn't know how to use pathlib (I'm starting with Python) and I was wondering how to use dunder methods. Thanks
Thank you Arjan 🙏
Thanks so much Aashay, glad you liked it!
Great video Arjan !
Thanks Jorge, happy you’re enjoying the content! :)
Awesome video Arjan! You are a very good source for "voortschrijdend inzicht" :)
Thank you, glad you liked the video!
Wow Arjan, that's a lovely holiday picture.
Thanks so much Harald, glad you liked it!
Great video ☺️, but I am surprised you didn't mention the constructor magic. Path would return an instance of different class based on the operating system. (I remember that in one of your code roasts you mentioned that you don't recommend doing something like this)
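The constructor behaviour mentioned in this comment can be observed directly. This is only a small check, not a description of how pathlib implements the dispatch internally; the file name is made up.

```python
import os
from pathlib import Path, PosixPath, WindowsPath

path = Path("example.txt")

# Path() hands back a platform-specific subclass instance.
expected_class = WindowsPath if os.name == "nt" else PosixPath
print(type(path).__name__)
```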
I use Path from Python 3.6 and I didn't know about read_text and write_text! Nice! Thank you Arjan! I was using `with open(file, 'r') as f: lines = f.readlines()` ...
Pandas query magic? Think I’ve forgotten that issue.
Great video. Thank you!
One of my favorites is Path.parents which returns a parents object (I think it's ultimately a generator) with all of the parent parts.
Hi Arjan. Thank you for your awesome content!
What about a video about using Python "stub" files (.pyi)?
Gonna be a long cold day in hell before I actually divide an object by a string by my own hands.
Nice topic. touch is great for creating empty files. write_text will create the file if it doesn't already exist.
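A minimal sketch of the difference described here, run inside a temporary directory so nothing is left behind. The file names are arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # write_text creates the file if it does not exist yet.
    notes = Path(tmp) / "notes.txt"
    existed_before = notes.exists()
    notes.write_text("hello", encoding="utf-8")
    content = notes.read_text(encoding="utf-8")

    # touch is only needed when an empty file is the goal.
    empty = Path(tmp) / "empty.txt"
    empty.touch()
    empty_size = empty.stat().st_size

print(existed_before, content, empty_size)
```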
Hi Arjan, again enjoyed your video. Super clear. Thanks. 👍
To be honest, until now I've always been working with the functions from 'import os.path' (and os.getcwd() etc.). But I somehow never took the step to switch to pathlib. From now on I am going to use pathlib. ☺️
Question: why did you not mention that with strings you are not totally on your own (as the os.path functions exist for a long time)?
Remark/suggestion: nice that you touch operator overloading. But I think that topic deserved a video on its own. Considered doing that?
I honestly should probably use pathlib more than I do. I don't do a ton on the FS, so I get pretty lazy with it, but pathlib's ability to build dynamic file paths is so much better than f-strings.
Hi Arjan, your videos are great. Thanks for giving your time to help people program better. I just have one question: can we avoid the filesystem functions of the "os" module entirely by using "pathlib"? I mean, you use chdir and I'm not sure pathlib gives us functionality like that. What are the cases in which we have to use the os module?
I think you can use __post_init__ to explicitly convert Settings.path to Path and thus make Path work with dataclasses, can't you?
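A minimal sketch of that idea, assuming a `Settings` dataclass with a single `path` field (the field layout and the sample value are assumptions, not the code from the video):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class Settings:
    path: Union[str, Path]

    def __post_init__(self) -> None:
        # Dataclasses do not convert types on their own, so do it here.
        self.path = Path(self.path)


settings = Settings(path="data/input.csv")
print(type(settings.path).__name__, settings.path.name)
```

Unlike Pydantic, this performs no validation beyond what `Path()` itself accepts.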
as always, cool.
I always thought the slash operator was goofy. I felt like Python idiomacy at the time would have called for using the plus operator for adding paths.
good thing they didn’t think the same
@@cristian-bull I'd say most people feel the same way. I think other languages and frameworks use a similar syntax as well when working with paths.
But if you got a str instead of a path, + would do the wrong thing while / would throw an error, which is way more important. Also when in the filesystem mindset, this_dir/subfolder/file is more readable than this_dir + subfolder + file.
/ not making sense for strings is a feature.
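The point made in this thread can be checked with a few lines. The variable names are illustrative only.

```python
from pathlib import Path

this_dir = "project"  # a plain str passed where a Path was intended

# + silently produces a wrong path: no separator, no error.
concatenated = this_dir + "data"

# / fails loudly, because str does not define division.
try:
    this_dir / "data"
    raised = False
except TypeError:
    raised = True

# With a real Path, / inserts the separator for the current platform.
joined = Path(this_dir) / "data" / "file.txt"

print(concatenated, raised, joined)
```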
Wow! 1.3k in an hour, I remember when this was just 100 in the first hour. Arjan getting that 🎂 must be the Dutch Brötchen and Gouda Kaas
Gouda cheese has a very strong gravitational effect. I don't have proof yet, but I expect that black holes taste a lot like really old Gouda cheese.
I was surprised that there isn't a standard cross-platform module for verifying that a file or path name consists of valid characters for that platform. Most articles suggest using os.path.isfile() or the pathlib equivalent, or trying to open the file, but I want to check that a user has entered a valid filename before accessing the file system.
The only thing I found was the pathvalidate module and I needed to wrap that to do what I wanted with pathlib.Path() across both Windows and Linux.
Path is great for file objects that already exist. For installing or making files, I have a lot more fun building it from scratch. It helps me track what exactly is going on
Thanks
You're welcome, Mrityunjay. :)
This is really good stuff
Thank you Rodolfo, glad you liked the video!
I made it a habit to always use pathlib after I used it the first time. Too many times a fast hack became the core of something bigger.
thank you
Glad you enjoyed it! :)
The actual "mess" is caused by Windows and POSIX being two different path specifications. It is almost like left-hand driving cars and right-hand driving cars. And when you as a driver are used to one specification, switching as a driver to the other one is prone to cause mistakes...
Ha! It's a very well designed module, indeed!
Do they have permissions support? Or how do you handle permissions between systems? It would be interesting if you were making some kind of file scan for security and wanted to know you got permissions because you were a part of a group or you were the owner. I guess I could see if you could do a path.permissions(‘file.txt’) and have it return you a tuple containing a string for windows or posix and a dictionary of the permissions.
Many people still don't know about pathlib and struggle with os.path.sep. Always a mess to read.
You don't need Path.cwd in your cases. Just stay relative and let the OS handle finding the files. Also touch() and then write_text() is not a great example because write_text creates the file anyway.
looking at the way you use path makes me wonder how Path sets the current working directory in projects and whether or not it offers a way to globally set that value. It would be useful for setting relative paths in the cloud.
Thank you for this video! I think pathlib is great and I use it as much as I can!
Some suggestions to improve how you overload operators:
- it would be nice to make sure that the "other" object has a valid type (we want floats for true div and another vector for add)
- if the Vector class was more generic, we should also make sure that what we're doing would still work for any of its subclasses. One thought about that: instead of creating a new Vector object, it's better to create a new instance of the current class (we can use self.__class__ I think).
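Both suggestions can be sketched together. This is a hypothetical two-field `Vector`, not the exact class from the video, and `Velocity` is an invented subclass used only to show that the result keeps the caller's type.

```python
from __future__ import annotations

from dataclasses import dataclass
from numbers import Real


@dataclass(frozen=True)
class Vector:
    x: float
    y: float

    def __add__(self, other: object) -> Vector:
        if not isinstance(other, Vector):
            return NotImplemented  # lets Python raise a TypeError
        return type(self)(self.x + other.x, self.y + other.y)

    def __truediv__(self, scalar: object) -> Vector:
        if not isinstance(scalar, Real):
            return NotImplemented
        return type(self)(self.x / scalar, self.y / scalar)


class Velocity(Vector):
    """Hypothetical subclass; results of / and + stay Velocity."""


halved = Velocity(2.0, 4.0) / 2
print(halved)
```

Returning `NotImplemented` instead of raising directly gives the other operand a chance to handle the operation through its reflected method.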
Thank you :) very helpful video. Python operator overloading is wonderful until you run into logical `and` and `or`. Then it just makes me sad. I was tinkering away on building a really nice little domain specific language and then found out the whole idea can't work because you can't overload those operations. It seems like a niche problem, but it has a big impact on pandas too... that's why we have to use `|` and `&` instead of `or` and `and`
How do you open the terminal in a tab?
Is there something bad about pandas query? Since it uses something like an eval statement, I think it can become a problem if you pass user input into it ... But for a data analysis workflow it seems quite nice, since we often need quick and dirty code to answer one-off questions and then throw away.
Are there other problems with it?
I have not quite understood how to deal with path value in config file. I have to use pydantic class instead of dataclass? but can pydantic work the same way with dataclass and how? Thx.
I've been using Pathlib ever since it became generally available. I think in Python 3.6? Or was it 3.7?
Using Path.glob() and .rglob() reduces greatly the needs to recursively do os.walk()
Also, the .expanduser() method is just AMAZING... "~" will be converted to the right thing depending on OS (/home/user in *nix, or "C:\Users\user" in Windows)
Oh, hello from Indonesia! 😄
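A small sketch of the glob and rglob calls mentioned in this comment, run against a throwaway directory tree whose file names are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")
    (root / "sub" / "c.md").write_text("c")

    # glob matches in this directory only; rglob also descends into subfolders.
    top_level = sorted(p.name for p in root.glob("*.txt"))
    recursive = sorted(p.name for p in root.rglob("*.txt"))

print(top_level, recursive)
```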
what's the pandas query story?
13:30 At least in some languages like Julia, this is called "promotion", where mixed types are "promoted" to the most general type if there is an exact conversion: since for any integer float(n: int)==n, mixed use of integers and floats promotes everything to floats.
Edit: in hindsight, that might not be quite correct, since promotion implies that the values are converted on the spot, while python can keep carrying an int until the cows come home. Probably because everything is an object, so there's no benefit to the change.
9:20 If I need to perform json.dump(data, file), how it can be done?
I used file.write_text(json.dumps(data))
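Both approaches work. If `json.dump` with a file handle is preferred, `Path.open` provides one; the sketch below uses made-up sample data and a temporary directory.

```python
import json
import tempfile
from pathlib import Path

data = {"name": "example", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "data.json"

    # Path.open behaves like the built-in open, so json.dump accepts the handle.
    with file.open("w", encoding="utf-8") as handle:
        json.dump(data, handle)

    loaded = json.loads(file.read_text(encoding="utf-8"))

print(loaded)
```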
We must follow the PATHlib
Looks like you removed VIM extension, any reason why?
Where is the difference between pathlib and os? It seems that many features are implemented in both. When use what lib and why?
I've used pathlib here and there but somehow never knew about the / operator.
The initial demo stuff really belonged in a Jupyter notebook I think
Okay, so 6:36 confused me, because what if you have 2 files with identical names, in different directories? How would Python know which one to use if you use a path.resolve( )?
for example, what if in both "C:\Users\ned\Desktop\data" and in "C:\Users\ned\Desktop\science" we had a file called a.txt,
so in the program do this:
path=Path("a.txt")
print(f"{path.resolve()}")
how will Python know whether it wants to select the a.txt from data or science?
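`resolve()` does not search for the file. A relative path is interpreted against the current working directory, so the answer depends on where the program runs. The sketch below rebuilds the two-folder situation in a temporary directory (folder names taken from the question, everything else is illustrative):

```python
import os
import tempfile
from pathlib import Path

original_cwd = Path.cwd()

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    for name in ("data", "science"):
        (base / name).mkdir()
        (base / name / "a.txt").write_text(name)

    resolved = []
    try:
        for name in ("data", "science"):
            os.chdir(base / name)  # change the working directory
            resolved.append(Path("a.txt").resolve())
    finally:
        os.chdir(original_cwd)  # always restore the original directory

for path in resolved:
    print(path)
```

The same relative path resolves to a different absolute path after each `chdir`.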
Great video! A small correction: a vector always has two pairs of x and y coordinates (x1, y1) (x2, y2). The example in the videos was still a point
True. I have a background in graphics where vectors are mostly used to represent orientations. The origin is then set to (0,0) and left out of the data structure altogether for efficiency reasons.
I've come to use
`path = Path("foo", "bar", "qux")`
rather than
`path = Path("foo") / "bar" / "qux"`
It is a bit more readable and behaves better with black formatting.
It's so Pythonic. You have a set of functions specialized to deal with paths and as a side note you can create text files and write to them.
It would be awesome to see a good video on pydantic vs dataclasses and real life cases when to use what etc.
Hey, Arjan! I love your videos-very informative! I would like to ask: Would it be possible to split your screen vertically, so that the console appears on the right (perhaps smaller) and the code on the left? The switch between code and console is the only thing that slightly irks me (namely the visual jump + not having the code in sight when it is executed). Otherwise I find your style very slick and clear : )
How does the / operator work with Windows style paths? Does it differ from joinpath()?
Edit: Never mind, they work the same (except that joinpath can take more arguments).
You can use the Path constructor with multiple arguments too: Path("usr", "bin")
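The three spellings mentioned in these comments produce equal paths, which a short check confirms:

```python
from pathlib import Path

from_constructor = Path("foo", "bar", "qux")
from_operator = Path("foo") / "bar" / "qux"
from_joinpath = Path("foo").joinpath("bar", "qux")

print(from_constructor == from_operator == from_joinpath)
```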
I've found that even on windows you can just write your paths in posix style and python will convert them for you, as it should be.
What's wrong with pandas query?
This is like asking: _What's wrong with going to McDonald's to get milk, sugar, salt, and pepper instead of Tesco?_ *pathlib* is a tiny package focussed on navigating paths/files. Whereas *pandas* has a completely different purpose.
Why pathlib over os?
why doesnt "bin" / "python" cause a TypeError: unsupported operand type(s) for /: 'str' and 'str' ?
This is because the expression is evaluated from left to right. So the Path object combined with "bin" results in a new Path object, which is then in turn combined with "python" to create another new Path object.
@@ArjanCodes so the Path object has an operator overload dunder method for the division operator? could have used that example in the video if that's really how it works... but thats kinda cool I guess
The operator overload part of this video does kinda seem unrelated and tacked-on if viewers dont make this connection, and the example and the explanation for it is really far apart (almost 10 minutes!)
@@danielschmider5069 ...he shows exactly that in the video?
@@manuelstausberg8923 uhm, no? he doesnt show how pathlib does this at all, he makes some random example with vector division which seems completely out of place in this video.
@@danielschmider5069 Ah I see, I misread your previous comment. I personally think the given example is fine, but yes it is a bit disjoint from where overloading first appears in the video (then again, it would not make sense to break up the part about Path objects to explain overloading)...
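To connect the two halves of the video discussed in this thread, here is a toy class showing how a `/` operator for paths could be written. `ToyPath` is invented for illustration and is not pathlib's actual implementation, although pathlib does define the same two dunder methods.

```python
class ToyPath:
    """Toy path type that joins parts with '/' (illustration only)."""

    def __init__(self, *parts: str) -> None:
        self.parts = tuple(parts)

    def __truediv__(self, other: object) -> "ToyPath":
        # Handles ToyPath / "name"
        if not isinstance(other, str):
            return NotImplemented
        return ToyPath(*self.parts, other)

    def __rtruediv__(self, other: object) -> "ToyPath":
        # Handles "name" / ToyPath, tried after str fails to divide
        if not isinstance(other, str):
            return NotImplemented
        return ToyPath(other, *self.parts)

    def __str__(self) -> str:
        return "/".join(self.parts)


# Evaluated left to right: (ToyPath / "bin") / "python"
chained = ToyPath("usr") / "bin" / "python"
reflected = "usr" / ToyPath("bin")

try:
    "bin" / "python"  # two plain strings: no ToyPath involved
    str_division_failed = False
except TypeError:
    str_division_failed = True

print(chained, reflected, str_division_failed)
```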
I shall henceforth make nonchalant use of 'grandparent path' as if everyone else around me is ignorant for not knowing about it.
If you really want to go all the way, the next level is aunt/uncle and niece/nephew paths and regularly using that to tell people where to move files to.
It's time to let os.path go. Long live pathlib!
It hasn't behaved well when combined with Pandas
So with pathlib you can construct a Path object by using a DIVISION operator on STRINGS? Mind blown.
The GitHub link is not active pal. Update it please.
Oops - done!
Haha. I follow you on GitHub but we need to look out for each other on YouTube. 😂
Also, cloudpathlib
thank me later >:D
People should know that pathlib is very slow (comparatively). Some packages removed use of it to significantly speed up their libraries.
Still a fantastic library that makes paths MUCH easier to work in. I use it all the time.
Anyone here use shutil?
just for removing non-empty directories, which can't be done with pathlib
@@cristian-bull I find it useful for moving files.
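A short sketch of the non-empty directory case from this thread. `Path.rmdir` only removes empty directories, while `shutil.rmtree` removes the whole tree; the folder and file names are arbitrary.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "cache"
    folder.mkdir()
    (folder / "item.txt").write_text("x")

    # rmdir refuses to delete a directory that still has content.
    try:
        folder.rmdir()
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # shutil.rmtree accepts a Path and removes the directory with its content.
    shutil.rmtree(folder)
    still_exists = folder.exists()

print(rmdir_failed, still_exists)
```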
Grandparent haha
Using "/" for building Path objects is neat, but feels dirty. Yes, it's reasonably intuitive, but it's definitely a perversion of the general intent of the division operator. Oh well...
Progressive insight, yes! I used to think the best way to git was to pull-merge, but NOW I think pull-rebase is even better. I didn't really understand rebase... But once I did, I had that progressive insight and was forced to commit and push to my new understanding to behavior origin. Puns intended.
1. PurePath
2. PurePosixPath
3. PureWindowsPath
4. PosixPath
5. WindowsPath
6. Path
The additional complexity introduced by pathlib isn't rewarded with sufficient additional functionality.
Now a path can be an object or a string? Not worth it.
Paths are not datetime level problems where an entire class is required. There are no weird timezone / dst edge cases. The only thing that has any complexity is Windows / Unix paths, solved by os.path functions.
The only useful function I've encountered in pathlib is the resolve() function.
Path.mkdir is quite useful because it can create the missing parent folders for me, and ignore the creation if the folder already exists
@@pacersgo os.makedirs(path, exist_ok=True)
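The two calls mentioned in these replies do the same job, as this side-by-side sketch in a temporary directory shows (directory names are arbitrary):

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # pathlib: create missing parents, do not fail if it already exists.
    nested = Path(tmp) / "a" / "b" / "c"
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)  # second call is a no-op
    pathlib_created = nested.is_dir()

    # os: the equivalent call, which also accepts a Path object.
    other = Path(tmp) / "x" / "y"
    os.makedirs(other, exist_ok=True)
    os_created = other.is_dir()

print(pathlib_created, os_created)
```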
complexity? lmao
you can start using the library for real stuff after like 12 minutes. You don't even need all that, skip 1 to 5, 6 is already useful enough.
A Path is never a string. What are you talking about? It can be converted into a string.
You can ignore all the subclasses. Just work with Path and compare them by isinstance(x, Path). You probably don't see the advantages of pathlib.
@@sausix That's my point. Now a path can be two things and isinstance is required. There are no advantages because there's no additional functionality over os.path.
It's a different syntax and I can see the ergonomic appeal, but ergonomics goes out of the window if you need isinstance to determine whether it's a Path or a string.
Cool feature about Path.parent.parent is that it can be accessed as Path.parents[1]
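A quick check of that equivalence, using `PurePosixPath` so the result is the same on every operating system. The example path is made up. Note that `parents` is an indexable sequence rather than a generator.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/user/project/main.py")

grandparent = path.parent.parent
via_parents = path.parents[1]

# parents supports len() and indexing, and lists every ancestor up to the root.
all_parents = [str(p) for p in path.parents]

print(grandparent, via_parents, all_parents)
```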