diff --git a/404.html b/404.html index 65f92e8..d68a870 100644 --- a/404.html +++ b/404.html @@ -158,6 +158,59 @@ + + +
@@ -170,8 +223,10 @@
+ -
@@ -317,7 +449,7 @@
- + diff --git a/discussion/exploration.html b/discussion/exploration.html new file mode 100644 index 0000000..0472cb8 --- /dev/null +++ b/discussion/exploration.html @@ -0,0 +1,548 @@ + + + + + + + + + + + + + + + + + Data Exploration Report - The Street Group Technical Test + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + +
+
+
+ + + +
+
+
+ + +
+
+
+ + +
+
+ + + + + + + +

Data Exploration Report

+

A brief exploration of the full dataset was carried out with the pandas-profiling module. The module loads a dataset with pandas and automatically produces quantile and descriptive statistics, common values, extreme values, skew, kurtosis, etc.
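
To give a feel for the kind of figures such a report contains, here is a minimal sketch that computes a few of them with the Python standard library only. It is not how pandas-profiling works internally; the function name `describe` and the sample values are hypothetical, and the skew and kurtosis use the plain population formulas, so they may differ slightly from the values pandas reports.

```python
import statistics


def describe(values):
    """Return a handful of summary statistics for a list of numbers.

    Illustrative only: `describe` is a hypothetical helper, not part of
    pandas or pandas-profiling.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Population standard deviation, used to standardise the moments below.
    std = statistics.pstdev(values)
    # Quartiles: the three cut points that split the data into four groups.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Skew and excess kurtosis as standardised third and fourth central moments.
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3
    return {
        "count": n,
        "mean": mean,
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "skew": skew,
        "kurtosis": kurtosis,
    }


if __name__ == "__main__":
    # Hypothetical values standing in for one numeric column of the dataset.
    for name, value in describe([1, 2, 3, 4, 5]).items():
        print(f"{name}: {value}")
```

A profiling tool does this for every column and adds frequency counts and extreme values, which is why it is convenient for a first look at a dataset.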

+

The script used to generate this report is located in ./exploration/report.py.

+

The report can be viewed by clicking the Data Exploration Report tab at the top of the page.

+ + + + + + + +
+
+
+ +
+ + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/discussion/introduction.html b/discussion/introduction.html new file mode 100644 index 0000000..26e140a --- /dev/null +++ b/discussion/introduction.html @@ -0,0 +1,553 @@ + + + + + + + + + + + + + + + + + Introduction - The Street Group Technical Test + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + +
+
+
+ + + +
+
+
+ + +
+
+
+ + +
+
+ + + + + + + +

Introduction

+

This section discusses the test, covering:

+
    +
  • Data exploration
  • +
  • Cleaning the data
  • +
  • Interpreting the results
  • +
  • Deploying on GCP Dataflow
  • +
  • Improvements
  • +
+ + + + + + + +
+
+
+ +
+ + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/documentation/installation.html b/documentation/installation.html index a5b79e5..66139f9 100644 --- a/documentation/installation.html +++ b/documentation/installation.html @@ -163,6 +163,61 @@ + + +
@@ -175,8 +230,10 @@ @@ -383,7 +517,7 @@
pip install poetry
 

From the root of the repo, install the dependencies with:

-
poetry install --nodev
+
poetry install --no-dev
 
@@ -457,7 +591,7 @@
- + diff --git a/documentation/usage.html b/documentation/usage.html index 0af3c3b..d25b4ad 100644 --- a/documentation/usage.html +++ b/documentation/usage.html @@ -163,6 +163,61 @@ + + +
@@ -175,8 +230,10 @@ @@ -436,7 +570,7 @@ optional arguments: --output OUTPUT Full path to the output file without extension.

The default value for input is ./data/input/pp-2020.csv and the default value for output is ./data/output/pp-2020.

-

If passing in values for input/output these should be full paths to the files. The test will parse these inputs as a str() and pass this to beam.io.ReadFromText().

+

If you pass values for input/output, they should be full paths to the files. The test converts these inputs with str() and passes the result to beam.io.ReadFromText().
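
The argument handling described above could look roughly like the sketch below. It uses only `argparse` and the defaults stated in this page; the function name `parse_args` is hypothetical, and leaving unrecognised options such as `--runner` for the pipeline via `parse_known_args` is an assumption about the design, not something confirmed by the source.

```python
import argparse


def parse_args(argv=None):
    """Parse --input/--output and return (known_args, remaining_args).

    Hypothetical helper for illustration; the real entry point may differ.
    """
    parser = argparse.ArgumentParser(prog="analyse-properties")
    # Defaults are the ones given in the usage documentation.
    parser.add_argument("--input", type=str, default="./data/input/pp-2020.csv")
    parser.add_argument(
        "--output",
        type=str,
        default="./data/output/pp-2020",
        help="Full path to the output file without extension.",
    )
    # Assumption: options this parser does not know (for example --runner)
    # are handed back untouched so the pipeline can consume them.
    return parser.parse_known_args(argv)


if __name__ == "__main__":
    known, remaining = parse_args(["--runner", "DirectRunner"])
    print(known.input, known.output, remaining)
```

In this sketch `known.input` is already a `str`, so it can be passed directly to a reader that expects a file path.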

Run the pipeline

To run the pipeline and complete the task, run:

analyse-properties --runner DirectRunner
@@ -476,6 +610,21 @@ optional arguments:
         
       
       
+        
+        
+