I'm in the middle of a work project that requires me to gather a list of files and record each one's parent directory, file name, and other information. I then use each path with an Amazon module to upload the files to an S3 bucket. I've used the os module to deal with files in the past, so that's what I reached for this time.

After perusing an article on the Real Python site about pathlib, I wanted to try it out to see if it would produce the information I was getting from os. Here’s what I came up with:

# import pathlib instead of using os
import pathlib
# creates a path object (a WindowsPath on my Windows machine)
root = pathlib.Path("C:\\Users\\miles\\OneDrive\\Desktop\\JavaScript")
# rglob("*") returns a generator that walks the tree recursively;
# list() collects everything it yields into a list
result = list(root.rglob("*"))
# iterates through the list of path objects
for r in result:
    # .stat() provides information about the file, such as the time it
    # was last accessed and its total size
    print(r.stat())
    # True if the path is a file, False otherwise
    print(r.is_file())
    # same as is_file(), but True if the path is a directory
    print(r.is_dir())
    # print() calls str() on the path object, which gives the path as a string
    print(r)
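
To make the comparison with os a bit more concrete, here is a minimal sketch of pulling the parent directory, file name, and size out of each path object. The helper name `describe` and the throwaway directory tree are made up for illustration; they are not part of my project.

```python
import pathlib
import tempfile


def describe(root):
    """Return one dict per entry under root, sorted by path (hypothetical helper)."""
    rows = []
    for p in sorted(root.rglob("*")):
        rows.append({
            "parent": p.parent.name,    # name of the containing directory
            "name": p.name,             # final path component, e.g. "app.js"
            "suffix": p.suffix,         # file extension, "" for most directories
            "is_dir": p.is_dir(),       # True for directories
            "size": p.stat().st_size,   # size in bytes as reported by the OS
        })
    return rows


# Build a tiny throwaway tree so the sketch runs on any machine
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "app.js").write_text("console.log(1);")
    for row in describe(root):
        print(row)
```

Everything here comes from attributes and methods on the path object itself, so there is no joining or string slicing involved.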

Here is a snippet of my original code, which uses os.walk:

import os

# function to create the directory list
def create_dir_list():
    # provide directory as a string, just as above
    rootdir = 'C:\\Users\\Miles\\Desktop\\documentportal\\Builders'
    dirss = []
    # iterate through the generator object provided by os.walk
    # os.walk yields a (dirpath, dirnames, filenames) tuple for each directory
    for subdir, dirs, files in os.walk(rootdir):
        for d in dirs:
            # os.walk only provides the subdirectory name, so we have to join
            # it to its parent path (parent first, then name)
            d = os.path.join(subdir, d)
            # we also have to do some string manipulation on the directory name
            d = d.replace("C:\\Users\\Miles\\Desktop\\documentportal\\", "").replace("\\", "/")
            # assumption: the snippet is cut off here; collecting each
            # cleaned-up name in dirss appears to be the intent
            dirss.append(d)
    return dirss

Along with the disadvantages of os.walk noted in the comments above, I also have to tell folders from files myself and label them. pathlib's .is_file() and .is_dir() would have been very useful for this. pathlib also has a long list of methods that give information about files and directories, which may be useful in another context.
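
As a rough sketch of what that could look like, the joining, prefix stripping, and slash replacing from the os.walk version can be handled by `relative_to()` and `as_posix()`, with `is_dir()` doing the labeling. The function name `label_entries`, the "folder"/"file" labels, and the sample tree are hypothetical stand-ins, not my actual project code.

```python
import pathlib
import tempfile


def label_entries(root, base):
    """Return (kind, key) pairs for everything under root (hypothetical helper).

    key is the path relative to base, written with forward slashes.
    """
    entries = []
    for p in sorted(root.rglob("*")):
        kind = "folder" if p.is_dir() else "file"
        # relative_to() drops the leading base path;
        # as_posix() renders the result with "/" separators, even on Windows
        entries.append((kind, p.relative_to(base).as_posix()))
    return entries


# Throwaway tree standing in for the real project directory
with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    root = base / "Builders"
    (root / "plans").mkdir(parents=True)
    (root / "plans" / "site.pdf").write_text("demo")
    for kind, key in label_entries(root, base):
        print(kind, key)
```

For this sample tree the loop prints `folder Builders/plans` followed by `file Builders/plans/site.pdf`. One thing to keep in mind is that `relative_to()` raises a ValueError if the path is not actually underneath `base`.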

If I could do it all over again, I would use pathlib!