3.3: Other Corpus Files

The downloaded corpus folder contains a few files besides driver.py. They matter, but none is as important as driver.py. This section covers each of them.

Data Folder

When downloaded, the data folder contains only a .gitignore file (discussed in the next section) and is otherwise empty. This is intentional: the data folder is where the actual corpus documents will be placed. In the practice session, we will download documents and place them in this folder, but for now it should remain empty and unedited.
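To confirm that the data folder is still in its downloaded state, a quick check like the following can help. This is only a minimal sketch: the function name `unexpected_entries` is hypothetical, and the path `data` assumes the script is run from inside the corpus folder.

```python
from pathlib import Path


def unexpected_entries(data_dir):
    """Return the names of everything in data_dir except .gitignore, sorted."""
    return sorted(
        entry.name
        for entry in Path(data_dir).iterdir()
        if entry.name != ".gitignore"
    )


if __name__ == "__main__":
    # Assumption: the current working directory is the corpus folder.
    data_dir = Path("data")
    if data_dir.is_dir():
        extras = unexpected_entries(data_dir)
        if extras:
            print("Unexpected entries in data folder:", extras)
        else:
            print("The data folder contains nothing besides .gitignore.")
    else:
        print("No data folder found in the current directory.")
```

An empty list means the folder holds nothing besides the .gitignore file, which is the expected state before the practice session.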

.gitignore File

The official Git documentation covers .gitignore files in detail. Each line of a .gitignore file is a file name or pattern that Git should ignore when processing a repository. If you are not using Git, GitHub, or any other form of version control, you can delete the .gitignore files without affecting any future QuantGov library actions. If you are using any of these tools, .gitignore files are extremely useful, and the official documentation is worth reading in full.

README File

It is standard for repositories and other data-related projects to include a README file. This file typically provides information about the repository or the downloaded files. It is never used by any code that is run and is purely informational.

Requirements File

A requirements file or similar document is also standard in code repositories and data projects. This file helps programmers set up an environment by listing the libraries required to run the code. It is never used by any code that is run and is purely informational.
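As an illustration of how a requirements file can guide environment setup, the sketch below reads the package names from such a file and reports which ones are not installed. It makes several assumptions: the file is named `requirements.txt`, it sits in the current directory, and it contains only simple lines such as `name` or `name>=1.0`. The helper names `requirement_names` and `missing_packages` are hypothetical and are not part of the QuantGov library.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract package names from simple requirements-style text.

    Comments, blank lines, and option lines (starting with "-") are skipped.
    More complex entries, such as URLs, are not handled by this sketch.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the subset of names with no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: the requirements file is named requirements.txt.
    req_file = Path("requirements.txt")
    if req_file.is_file():
        names = requirement_names(req_file.read_text())
        print("Not installed:", missing_packages(names))
```

In practice, most programmers install the listed libraries directly with `pip install -r requirements.txt` rather than checking them by hand.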

Snakefile

The Snakefile in the download is also a script, included primarily as a resource for advanced programmers. Snakefiles are part of the Snakemake workflow management system and allow users to run multiple scripts, analyses, and code snippets in a specific order as an automated process. This file could be useful for someone building an automated pipeline with the QuantGov library, but most users will not need it.