5.1: Estimator Creation

An estimator is another term for a machine learning algorithm or model; the two are generally used interchangeably. To start a new QuantGov estimator, run the following command, where myname is the name you want to give the estimator:

quantgov start estimator myname

This copies the estimator skeleton into a new folder with that name, in the same way as the start corpus command.

For the practice estimator, let's use the following command:

quantgov start estimator federal_register_estimator

This command creates a folder named federal_register_estimator in the directory where the command is run. The folder contains the files and folders described below.
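To check what the command created, a short Python sketch can list the skeleton's contents. It assumes the folder was created as federal_register_estimator in the current working directory; list_tree is a hypothetical helper written for this illustration, not part of the QuantGov library.

```python
from pathlib import Path


def list_tree(root):
    """Return every path under root, relative to root, in sorted order.

    Directories get a trailing slash so they stand out from files.
    Hidden files such as .gitignore are included.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"no such directory: {root}")
    entries = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        entries.append(rel + "/" if path.is_dir() else rel)
    return entries


if __name__ == "__main__":
    # Assumption: the estimator folder sits in the current directory.
    for entry in list_tree("federal_register_estimator"):
        print(entry)
```

Running it from the directory where the start command was issued prints one line per file or folder, which should match the sections that follow.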

Data Folder

When first created, the data folder contains only a .gitignore file (discussed below) and is otherwise empty. This is intentional: the data folder is where the output from the analyses will be written.

Scripts Folder

The scripts folder contains three Python scripts: vectorize_trainers.py, create_labels.py, and candidate_models.py. These scripts can be customized and are used to create the estimator.

.gitignore File

The official Git documentation describes .gitignore files in detail. A .gitignore file lists, one per line, the file names or patterns that Git should ignore (not track) in a repository. If you are not using Git, GitHub, or any other version control, you can delete or ignore the .gitignore files without affecting any later QuantGov commands. If you are using any of these, .gitignore files are extremely useful, and the official documentation is worth reading in full.
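The basic idea of pattern matching can be sketched in Python. This is a deliberately simplified model, not Git's actual algorithm: each non-blank, non-comment line is treated as an fnmatch-style pattern tested against a bare file name, a leading ! re-includes a file, and the last matching line wins. Git's rules for slashes, directories, ** and escaping are not modeled. The example patterns (ignore everything except the .gitignore itself) are an assumed, commonly used idiom, not necessarily what the QuantGov skeleton ships with, and the file name results.csv is hypothetical.

```python
from fnmatch import fnmatchcase


def parse_gitignore(text):
    """Return (pattern, negated) pairs from .gitignore-style text.

    Blank lines and lines starting with '#' are skipped; trailing
    whitespace is dropped. Escaping is not handled.
    """
    rules = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        rules.append((line[1:] if negated else line, negated))
    return rules


def is_ignored(filename, rules):
    """Simplified check: the last rule whose pattern matches decides.

    A matching plain rule ignores the file; a matching '!' rule
    re-includes it.
    """
    ignored = False
    for pattern, negated in rules:
        if fnmatchcase(filename, pattern):
            ignored = not negated
    return ignored


# Assumed example contents: ignore everything except this file itself.
EXAMPLE = """\
# ignore generated output
*
!.gitignore
"""

rules = parse_gitignore(EXAMPLE)
print(is_ignored("results.csv", rules))  # True  (hypothetical output file)
print(is_ignored(".gitignore", rules))   # False (re-included by '!')
```

A pattern like this would keep a folder such as data in the repository while leaving the analysis output it later holds untracked.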

README File

Repositories and other data-related projects conventionally include a README file, which typically describes the repository or the downloaded files. No code reads this file; it is purely informational.

Requirements File

A requirements file or similar document is also standard in code repositories and data projects. It helps programmers set up an environment for running the code by listing the libraries the code requires. None of the estimator's code reads this file; it is purely informational.
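The requirements list can also be checked against the current environment. The following is a minimal sketch using only the standard library, assuming a pip-style file named requirements.txt with one requirement per line (the actual file name in the skeleton may differ). Its parsing is simplified: it keeps only the package name and does not handle URLs or editable installs. requirement_names and missing_packages are hypothetical helpers, not QuantGov functions. To install what is listed, pip install -r requirements.txt is the usual tool.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def requirement_names(lines):
    """Extract package names from simple requirements-file lines.

    Handles lines like 'pandas', 'scikit-learn>=0.19' or
    'numpy==1.14  # pin'. Option lines (starting with '-') and
    blank or comment lines are skipped.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The name ends at the first version, extras, or marker character.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines):
    """Return the required package names that are not installed."""
    missing = []
    for name in requirement_names(lines):
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: the file is named requirements.txt; adjust if yours differs.
    with open("requirements.txt") as handle:
        for name in missing_packages(handle):
            print("missing:", name)
```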