This page documents how to run the pipeline locally to complete the task against the 2020 dataset.
+
The pipeline also runs in GCP using Dataflow. This is discussed further on, but can be viewed here. We also discuss how to adapt the pipeline so it can run against the full dataset.
usage: analyse-properties [-h] [--input INPUT] [--output OUTPUT]
+
+optional arguments:
+ -h, --help show this help message and exit
+ --input INPUT Full path to the input file.
+ --output OUTPUT Full path to the output file without extension.
+
+
The default value for input is ./data/input/pp-2020.csv and the default value for output is ./data/output/pp-2020.
+
If you pass values for --input or --output, these should be full paths to the files. The test parses each value as a str() and passes it to beam.io.ReadFromText().
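The command-line interface described above can be sketched with `argparse`. This is a minimal illustration, assuming the options are declared with `type=str` and the defaults quoted on this page; the function name `build_parser` is hypothetical and is not taken from the repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the usage text (hypothetical helper name)."""
    parser = argparse.ArgumentParser(prog="analyse-properties")
    parser.add_argument(
        "--input",
        type=str,  # values are handled as plain strings
        default="./data/input/pp-2020.csv",
        help="Full path to the input file.",
    )
    parser.add_argument(
        "--output",
        type=str,
        default="./data/output/pp-2020",
        help="Full path to the output file without extension.",
    )
    return parser


# Parsing an empty argument list falls back to the documented defaults.
defaults = build_parser().parse_args([])
print(defaults.input, defaults.output)
```

In a sketch like this, the parsed `input` string is the value that would then be handed to `beam.io.ReadFromText()`.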
This documentation accompanies the technical test for the Street Group.
+
The following pages guide the user through installing the requirements and running the task to complete the test. In addition, there is some discussion of the approach and of improvements that could be made.
+
Navigate the pages in order by using the section links in the left menu, or by using the bar at the bottom of the page. The table of contents in the right menu can be used to navigate sections on each page.
+
+
Note
+
All paths in this documentation, e.g. ./analyse_properties/data/output, are given relative to the root of the repo.