Have a mess of files to read into Python? Maybe you downloaded Kaiko trade data, with unpredictable sub-directories and file names, from Penn+Box. Or maybe you’ve dropped TXT, PDF, and PY files into a single working directory that you’d rather not reorganize. A simple script will find the files you need, listing their names and paths for easy processing.
Because this process involves exploring our operating system's file structure, we begin by importing the os module into our Python environment:
import os
The os module is part of Python's standard library, so a standard Python installation already covers every dependency in our upcoming file-listing code.
Let's define our file-listing function. We can name it, unimaginatively, list_files and give it two arguments, filepath and filetype:
def list_files(filepath, filetype):
filepath will tell the function where to start looking for files. This argument takes a file path string in your operating system's format. (Be sure to encode or escape characters, such as Windows backslashes, as appropriate.) When the function runs, it treats this base directory as the top of its search and checks every file in it and in all of its subfolders.
filetype will tell the function what kind of file to find. This argument takes a file extension in string format (e.g., '.csv').
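On Windows, the backslash in a path doubles as Python's escape character, which is why the filepath argument needs care. Here is a minimal sketch, using made-up folder paths, of two equivalent ways to spell the same path and one spelling that silently goes wrong:

```python
# Two equivalent spellings of the same (illustrative) Windows folder path.
escaped = 'C:\\Users\\Public\\Downloads'  # each backslash escaped with another backslash
raw = r'C:\Users\Public\Downloads'         # raw string: backslashes are kept literally
print(escaped == raw)  # True

# An unescaped backslash can change the string: '\t' becomes a tab character,
# so this is NOT the folder it appears to name.
broken = 'C:\temp'
print(broken == r'C:\temp')  # False
```

Raw strings (the r prefix) are usually the least error-prone choice for pasted Windows paths.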
Within our function, we’ll need to store any relevant file paths our script finds. Let’s create an empty list for this purpose:
paths = []
Practically speaking, our function will find each file within filepath, check whether its file extension matches a given filetype, and add relevant results to paths. We begin this iterative process with a for loop to find and examine each file:
for root, dirs, files in os.walk(filepath):
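In this configuration, os.walk() visits filepath and every folder beneath it, and for each folder it generates a 3-tuple (an immutable, three-item sequence) whose components we will refer to as root (the current folder's path), dirs (the names of its subfolders), and files (the names of the files it contains). Because files lists all file names within a path, our function will iterate through each individual file name. Iterating again involves another for loop: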
for file in files:
Within this file-level loop, our function can examine various aspects of each file. You may want to customize this section if your application has other requirements. For now, we'll focus on checking files for a matching file extension.
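To see what os.walk() actually hands back, one option is to walk a small throwaway folder tree. This sketch builds one with tempfile. The folder and file names are hypothetical, and the names are sorted only to make the printout stable, since os.walk() does not guarantee their order within a folder:

```python
import os
import tempfile

walked = []
with tempfile.TemporaryDirectory() as base:
    # Hypothetical layout: readme.txt, 2021/trades.csv, 2021/q1/Trades.CSV
    os.makedirs(os.path.join(base, "2021", "q1"))
    for rel in ["readme.txt",
                os.path.join("2021", "trades.csv"),
                os.path.join("2021", "q1", "Trades.CSV")]:
        open(os.path.join(base, rel), "w").close()  # create an empty file

    # os.walk yields one (root, dirs, files) 3-tuple per folder it visits,
    # starting with base itself and then descending into its subfolders.
    for root, dirs, files in os.walk(base):
        entry = (os.path.relpath(root, base), sorted(dirs), sorted(files))
        walked.append(entry)
        print(entry)
```

Each printed tuple corresponds to one pass of the outer loop; the inner for file in files loop then runs over that tuple's file names.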
Because string comparison is case-sensitive, but we want an extension such as '.CSV' to count the same as '.csv', we use the lower() method to convert both file and filetype to lower-case strings (file.lower() and filetype.lower(), respectively). This avoids missed matches due to mismatched capitalization.
In turn, the endswith() method will compare the end of our lower-case file (where the file extension lives) to the lower-case filetype, returning True for a match or False otherwise.
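A quick comparison on a few hypothetical file names shows why the lower() step matters. It also shows that endswith() accepts a tuple of suffixes, which is one way to match more than one file type at once:

```python
# Hypothetical file names, as os.walk might report them.
names = ["trades.csv", "Trades.CSV", "notes.txt", "summary.Csv"]
filetype = ".csv"

# Case-sensitive check: misses the upper- and mixed-case extensions.
exact = [n for n in names if n.endswith(filetype)]
# Case-insensitive check, as in list_files: lower-case both sides first.
folded = [n for n in names if n.lower().endswith(filetype.lower())]
# str.endswith also accepts a tuple of suffixes.
several = [n for n in names if n.lower().endswith((".csv", ".txt"))]

print(exact)    # ['trades.csv']
print(folded)   # ['trades.csv', 'Trades.CSV', 'summary.Csv']
print(several)  # ['trades.csv', 'Trades.CSV', 'notes.txt', 'summary.Csv']
```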
We include our Boolean (True or False) result in an if statement so that only a matching file type (a True outcome) triggers the next stage of our function:
if file.lower().endswith(filetype.lower()):
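One caveat with endswith(): if filetype is passed without its dot ('csv'), a name such as backup_csv also counts as a match. A stricter alternative, not the approach used in this walkthrough, is to compare against the extension returned by os.path.splitext(). The helper name has_extension is hypothetical:

```python
import os

def has_extension(filename, filetype):
    """Hypothetical helper: True if the file's last extension equals filetype, ignoring case."""
    # Accept either 'csv' or '.csv' by normalizing to the dotted form splitext returns.
    if not filetype.startswith("."):
        filetype = "." + filetype
    # os.path.splitext splits off the last extension, dot included: 'report.CSV' -> ('report', '.CSV').
    ext = os.path.splitext(filename)[1]
    return ext.lower() == filetype.lower()

print("backup_csv".lower().endswith("csv"))     # True: a suffix match, even with no dot
print(has_extension("backup_csv", "csv"))       # False: the name has no extension
print(has_extension("report.CSV", "csv"))       # True
print(has_extension("archive.csv.gz", ".csv"))  # False: the last extension is '.gz'
```

Swapping has_extension(file, filetype) into the if statement would make the match stricter; whether that matters depends on how tidy your file names are.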
If the file's extension matches, we want to add file and its location to paths, our list of relevant file paths.
os.path.join() will combine the root file path and file name to construct a complete address our operating system can reference. The append() method will add this complete file address to our list of paths:
paths.append(os.path.join(root, file))
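For a single hypothetical pass through the loop, the join-and-append step looks like this. os.path.join() inserts the separator for the current operating system (os.sep), so the same code works on Windows and on macOS or Linux. The folder names are made up:

```python
import os

# Hypothetical values for one iteration of the loop.
root = os.path.join("Downloads", "kaiko", "2021")
file = "trades.csv"

full_path = os.path.join(root, file)
print(full_path)  # Downloads/kaiko/2021/trades.csv on macOS/Linux, with backslashes on Windows

paths = []
paths.append(full_path)
print(paths)
```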
Our nested loops will iterate through our folders and files, dutifully building our paths list. In order to make this list available outside of our function, we need one final line:
return paths
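return hands the finished list back to whoever called the function. If you only need to loop over the matches once, one alternative is a generator that yields each path as it is found instead of building the whole list first. This is a sketch of a variant, named iter_files here for illustration, not part of the original function:

```python
import os
import tempfile

def iter_files(filepath, filetype):
    """Hypothetical generator variant of list_files: yields matching paths one at a time."""
    for root, dirs, files in os.walk(filepath):
        for file in files:
            if file.lower().endswith(filetype.lower()):
                # yield instead of append: each path is produced as the loop reaches it.
                yield os.path.join(root, file)

# Usage on a throwaway folder with two made-up files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["a.CSV", "b.txt"]:
        open(os.path.join(demo_dir, name), "w").close()
    found = [os.path.basename(p) for p in iter_files(demo_dir, ".csv")]

print(found)  # ['a.CSV']
```

Wrapping the call in list(...) gives the same result list_files would return; the list version is simpler when you need to reuse or count the results.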
Altogether, our code should read as follows:
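import os

def list_files(filepath, filetype):
    # Collect the full path of every matching file here.
    paths = []
    # Visit filepath and every subfolder beneath it.
    for root, dirs, files in os.walk(filepath):
        for file in files:
            # Case-insensitive file extension check.
            if file.lower().endswith(filetype.lower()):
                paths.append(os.path.join(root, file))
    return paths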
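To check the complete function without touching real data, one approach is to point it at a small temporary folder tree. In this sketch the folder and file names are made up, and the results are sorted because os.walk() does not promise any particular order:

```python
import os
import tempfile

def list_files(filepath, filetype):
    paths = []
    for root, dirs, files in os.walk(filepath):
        for file in files:
            if file.lower().endswith(filetype.lower()):
                paths.append(os.path.join(root, file))
    return paths

# Build a small, made-up folder tree with mixed file types.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "2021", "q1"))
    for rel in ["notes.txt", "script.py",
                os.path.join("2021", "trades.csv"),
                os.path.join("2021", "q1", "Trades.CSV")]:
        open(os.path.join(base, rel), "w").close()  # create an empty file

    # Paths relative to base, so the output doesn't depend on the temp folder's location.
    found = sorted(os.path.relpath(p, base) for p in list_files(base, ".csv"))

print(found)
```

Both CSV files are found, including the one two levels down with an upper-case extension, while the TXT and PY files are skipped.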
After you've run the above, calling the list_files function and saving the resulting list of file locations as an object might look something like this:
my_files_list = list_files('C:\\Users\\Public\\Downloads', '.csv')
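If you prefer Python's object-oriented pathlib module, a roughly equivalent search can be written with Path.rglob(). This is a swapped-in alternative, not the method used above, and the function name list_files_pathlib is hypothetical. It keeps the same case-insensitive endswith() check:

```python
import tempfile
from pathlib import Path

def list_files_pathlib(filepath, filetype):
    """Hypothetical pathlib-based variant of list_files."""
    filetype = filetype.lower()
    # rglob("*") yields every file and folder below filepath, recursively;
    # is_file() drops the folders before the extension check.
    return [str(p) for p in Path(filepath).rglob("*")
            if p.is_file() and p.name.lower().endswith(filetype)]

# Usage on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as base:
    (Path(base) / "2021").mkdir()
    for rel in ["notes.txt", "2021/trades.csv", "2021/Report.CSV"]:
        (Path(base) / rel).touch()  # create an empty file
    found_pl = sorted(Path(p).name for p in list_files_pathlib(base, ".csv"))

print(found_pl)  # ['Report.CSV', 'trades.csv']
```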
Now that your code can find files it needs, you can focus on merging data, analyzing text, or conducting whatever research you imagine.