updating documentation

2026-02-06 16:05:43 +00:00 · 2021-09-28 00:29:41 +01:00
parent c481c1a976
commit 8a0d8085a2
7 changed files with 28 additions and 23 deletions
--- a/docs/documentation/installation.md
+++ b/docs/documentation/installation.md
@@ -6,21 +6,26 @@ The task has been tested on MacOS Big Sur and WSL2. The task should run on Windo

 For Beam 2.32.0 the supported versions of the Python SDK can be found [here](https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies#sdk-for-python).

-## Poetry
+## Pip

-The test uses [Poetry](https://python-poetry.org) for dependency management.
-
-!!! info inline end
-    If you already have Poetry installed globally you can go straight to the `poetry install` step.
-
-In a virtual environment install poetry:
+In a virtual environment run from the root of the repo:

 ```bash
-pip install poetry
+pip install -r requirements.txt
 ```

+## Poetry (Alternative)
+
+Install [Poetry](https://python-poetry.org) *globally*.
+
 From the root of the repo install the dependencies with:

 ```bash
 poetry install --no-dev
 ```
+
+Activate the shell with:
+
+```bash
+poetry shell
+```
--- a/docs/documentation/usage.md
+++ b/docs/documentation/usage.md
@@ -2,7 +2,7 @@

 This page documents how to run the pipeline locally to complete the task for the [dataset for 2020](https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads#section-1).

-The pipeline also runs in GCP using DataFlow and is discussed further on but can be viewed here. We also discuss how to adapt the pipeline so it can run against [the full dataset](https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads#single-file).
+The pipeline also runs in GCP using Dataflow. This is discussed later in the documentation and can be viewed [here](../dataflow/index.md). We also discuss how to adapt the pipeline so it can run against [the full dataset](https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads#single-file).

 ## Download dataset

@@ -20,20 +20,20 @@ to download the data for 2020 and place in the input directory above.

 ## Entrypoint

-The entrypoint to the pipeline is `analyse-properties`.
+The entrypoint to the pipeline is `analyse_properties.main`.

 ## Available options

 Running

 ```bash
-analyse-properties --help
+python -m analyse_properties.main --help
 ```

 gives the following output:

 ```bash
-usage: analyse-properties [-h] [--input INPUT] [--output OUTPUT]
+usage: analyse_properties.main [-h] [--input INPUT] [--output OUTPUT]

 optional arguments:
  -h, --help       show this help message and exit
@@ -43,14 +43,17 @@ optional arguments:

 The default value for input is `./data/input/pp-2020.csv` and the default value for output is `./data/output/pp-2020`.

-If passing in values for `input`/`output` these should be **full** paths to the files. The test will parse these inputs as a `str()` and pass this to `#!python beam.io.ReadFromText()`.
-
 ## Run the pipeline

 To run the pipeline and complete the task run:

 ```bash
-analyse-properties --runner DirectRunner
+python -m analyse_properties.main \
+--runner DirectRunner \
+--input ./data/input/pp-2020.csv \
+--output ./data/output/pp-2020
 ```

+from the root of the repo.
+
 The pipeline will use the 2020 dataset located in `./data/input` and output the resulting `.json` to `./data/output`.
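
Note on the usage changes above: the help output lists only `--input` and `--output`, yet the run command also passes `--runner DirectRunner`. One common way to get that behaviour is to parse the known options and hand everything else on to the runner. The sketch below is an assumption about how such an entrypoint could be written, not the actual contents of `analyse_properties.main`. The function name `parse_args` and the help strings are hypothetical; the defaults are the ones given in the docs.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical sketch: split pipeline options from runner options."""
    parser = argparse.ArgumentParser(prog="analyse_properties.main")
    # Defaults taken from the usage docs in this commit.
    parser.add_argument(
        "--input",
        default="./data/input/pp-2020.csv",
        help="Path to the input CSV (help text is illustrative).",
    )
    parser.add_argument(
        "--output",
        default="./data/output/pp-2020",
        help="Output path prefix (help text is illustrative).",
    )
    # parse_known_args returns unrecognised arguments (for example
    # --runner DirectRunner) in a separate list instead of failing.
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args


if __name__ == "__main__":
    known_args, pipeline_args = parse_args(["--runner", "DirectRunner"])
    print(known_args.input)
    print(known_args.output)
    print(pipeline_args)
```

In a Beam pipeline the leftover list would typically be passed on to the pipeline options object, which is where `--runner` is interpreted. That step is omitted here so the sketch runs without Beam installed.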