4.2: Try NLP on the Practice Corpus

To analyze the 2016 Federal Register corpus, navigate in your command prompt to the directory above the corpus (the same one where you ran the quantgov start command) and run the following:

quantgov nlp count_words federal_register

You should see many lines scroll through your terminal, each reporting the number of words in one document. By default, the QuantGov library writes its output to the terminal. To save the results to a file instead, run the following:

quantgov nlp count_words federal_register -o wordcount.csv

After a few seconds, a file named wordcount.csv should appear. It has the columns “section” and “docno” (as described in our index in section 3.4) and “words”, which contains the number of words in the given document.
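As a quick sanity check on the output, the short Python sketch below summarizes a word-count file using only the standard library. It assumes the three columns named above (“section”, “docno”, “words”). The function name summarize_word_counts and the sample rows are made up for illustration and are not part of QuantGov.

```python
import csv
import io


def summarize_word_counts(csv_file):
    """Return (document_count, total_words, longest_row) for a word-count CSV.

    Assumes a header row with the columns section, docno, and words.
    """
    reader = csv.DictReader(csv_file)
    # Convert the "words" column from text to an integer for each row.
    rows = [dict(row, words=int(row["words"])) for row in reader]
    if not rows:
        return 0, 0, None
    total_words = sum(row["words"] for row in rows)
    longest_row = max(rows, key=lambda row: row["words"])
    return len(rows), total_words, longest_row


# Illustrative rows only; these are NOT real Federal Register values.
sample = io.StringIO(
    "section,docno,words\n"
    "notice,example-001,1200\n"
    "rule,example-002,5400\n"
    "notice,example-003,800\n"
)

count, total, longest = summarize_word_counts(sample)
print(f"{count} documents, {total} words in total")
print(f"Longest document: {longest['docno']} ({longest['words']} words)")
```

To run it on the real output, open the file and pass it to the function, for example with open("wordcount.csv", newline="") as f: summarize_word_counts(f).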

Every NLP command in the QuantGov library follows the same format. Each command consists of, in order: the quantgov Python library being called, the type of analysis (nlp), the specific analysis (count_words), the location of the corpus (federal_register), and the name of the output file (-o wordcount.csv).
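If you want to run these commands from a script, the pattern above can be sketched as a small Python helper that assembles the argument list. The helper name build_nlp_command is hypothetical and not part of QuantGov. The argument order simply mirrors the commands shown in this section, with any extra arguments placed after the corpus location.

```python
import shlex


def build_nlp_command(analysis, corpus, *extra_args, output=None):
    """Assemble a quantgov nlp command as a list of arguments.

    Hypothetical helper: it only mirrors the command pattern shown in
    this section and does not call QuantGov itself.
    """
    command = ["quantgov", "nlp", analysis, corpus, *extra_args]
    if output is not None:
        # -o names the output file; without it, output goes to the terminal.
        command += ["-o", output]
    return command


cmd = build_nlp_command("count_words", "federal_register", output="wordcount.csv")
print(shlex.join(cmd))
# prints: quantgov nlp count_words federal_register -o wordcount.csv
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the directory above the corpus. Building a list instead of a single string avoids quoting problems when a path contains spaces.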

Tip

Here are some examples of other NLP commands:
quantgov nlp sentiment_analysis folder1/folder2/corpus_folder -o sentiment.csv
quantgov nlp count_occurrences federal_register shall must -o counted_words.csv