The full dataset was briefly explored using the pandas-profiling module. The module loads a dataset with pandas and automatically produces quantile and descriptive statistics, common values, extreme values, skew, kurtosis, etc.
+
The script used to generate this report is located in ./exploration/report.py.
+
The report can be viewed by clicking the Data Exploration Report tab at the top of the page.
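
To give a feel for the kind of per-column figures such a report contains, here is a minimal sketch that computes a few of them with the Python standard library only. It is not how pandas-profiling works internally; the function name describe and the sample values are made up for illustration.

```python
import statistics


def describe(values):
    """Return a few descriptive statistics for a list of numbers (illustrative only)."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population standard deviation, used to standardise the moments below.
    std = statistics.pstdev(values)
    # Quartiles using the statistics module's default ("exclusive") method.
    q1, median, q3 = statistics.quantiles(values, n=4)
    # Skew and excess kurtosis as standardised third and fourth central moments.
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": mean,
        "skew": skew,
        "kurtosis": kurtosis,
    }


# Made-up sample values, not taken from the dataset.
stats = describe([1, 2, 3, 4, 5])
print(stats)
```

The actual report covers every column and also includes common and extreme values, so this sketch only mirrors a small part of it.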
@@ -436,7 +570,7 @@ optional arguments:
--output OUTPUT Full path to the output file without extension.
The default value for input is ./data/input/pp-2020.csv and the default value for output is ./data/output/pp-2020.
-
If passing in values for input/output these should be full paths to the files. The test will parse these inputs as a str() and pass this to beam.io.ReadFromText().
+
If you pass values for input/output, they should be full paths to the files. The test parses each value as a str and passes it to beam.io.ReadFromText().
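
The argument handling described above could look roughly like the following sketch. It assumes the script uses argparse; the function name parse_args and the help text for --input are assumptions, while the default values and the --output help text are taken from the text above. The Beam call itself is left out so that the example runs without Apache Beam installed.

```python
import argparse


def parse_args(argv=None):
    """Parse --input/--output as plain strings (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--input",
        type=str,
        default="./data/input/pp-2020.csv",
        help="Full path to the input file.",  # assumed wording
    )
    parser.add_argument(
        "--output",
        type=str,
        default="./data/output/pp-2020",
        help="Full path to the output file without extension.",
    )
    return parser.parse_args(argv)


# With no arguments the documented defaults apply.
defaults = parse_args([])
print(defaults.input, defaults.output)

# Overriding the input with a full path; the resulting str is what would
# then be handed to beam.io.ReadFromText().
custom = parse_args(["--input", "/tmp/sample.csv"])
print(custom.input)
```

Keeping the values as plain strings means they can be handed to the Beam I/O transforms without further conversion.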