Chapter 3: Build a QuantGov Corpus


Within the QuantGov framework, a corpus is a set of related documents that will be analyzed together. A corpus typically also includes a driver, whose job is to index the documents and direct code to each document one at a time. The following sections walk through the steps needed to create a corpus, using a pre-made corpus built into the QuantGov library as the example. After working through the example, it should be easy to move on to making a corpus of your own.

Build your First Corpus

The fastest way to start a new corpus is to open a command prompt: press the Windows + R keys, type “cmd”, and press Enter. Then navigate to the location where you would like to build the corpus, using the following navigation commands.

  • cd dir1 changes current directory to “dir1”.

  • cd / changes current directory to the root directory, i.e. the top-level directory.

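  • cd \ won’t work in Unix-style shells, where “\” is the escape character (in the Windows command prompt, cd \ does change to the root of the current drive).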

  • cd .. changes current directory to one level up.

  • cd ~ changes current directory to the user’s home directory, same as “cd” with no arguments (Unix-style shells; not supported by the Windows command prompt).

  • cd - changes current directory to the directory that you were last in (Unix-style shells; not supported by the Windows command prompt).
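The navigation commands above can be tried out safely in a scratch directory. The following is a minimal sketch, assuming a Unix-style shell (for example bash, zsh, or Git Bash on Windows); the folder name "corpora" and the variable names are hypothetical and chosen only for illustration.

```shell
# Assumption: a Unix-style shell; "corpora" is a hypothetical folder name.
workdir="$(mktemp -d)"        # scratch directory so nothing real is touched
mkdir -p "$workdir/corpora"

cd "$workdir/corpora"         # move into the folder that will hold the corpus
pwd                           # print the current directory to confirm the move
cd ..                         # go up one level, back to the scratch directory
cd - > /dev/null              # return to the directory we were last in (corpora)
here="$(pwd)"                 # remember where we ended up
echo "Now in: $here"
```

Running pwd after each cd is a simple way to confirm that you are where you expect to be before creating the corpus.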

Once in the correct location, run the command quantgov start corpus NAME, where NAME is the name you want to give the corpus. This command copies the skeleton corpus from https://github.com/quantgov/corpus into your current directory. The following sections walk through the different components of the newly downloaded corpus.
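As a concrete illustration of the command described above, here is a minimal sketch, again assuming a Unix-style shell. The corpus name "my_corpus" is a hypothetical placeholder, and the sketch assumes that the quantgov command-line tool is installed and that the command creates a folder named after the corpus; the check at the top simply avoids an error if the tool is missing.

```shell
# Assumption: the quantgov command-line tool is installed and on PATH.
# "my_corpus" is a hypothetical name; substitute your own.
corpus_name="my_corpus"

if command -v quantgov > /dev/null 2>&1; then
    # Copy the skeleton corpus into the current directory.
    quantgov start corpus "$corpus_name"
    # Assumption: the command creates a folder named after the corpus;
    # list its contents only if that folder exists.
    [ -d "$corpus_name" ] && ls "$corpus_name"
else
    echo "quantgov not found; install the QuantGov library first"
fi
```

Listing the new folder is a quick way to see the components that the following sections describe.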