updating documentation

2021-09-28 00:29:41 +01:00
parent c481c1a976
commit 8a0d8085a2
7 changed files with 28 additions and 23 deletions


@@ -27,9 +27,6 @@ To get around public IP quotas I created a VPC in the `europe-west1` region that
Assuming the `pp-2020.csv` file has been placed in the `./input` directory in the bucket you can run a command similar to:
!!! caution
Use the command `python -m analyse_properties.main` as the entrypoint to the pipeline, not `analyse-properties`, because the module isn't installed with Poetry on the workers under the configuration below.
```bash
python -m analyse_properties.main \
--runner DataflowRunner \


@@ -55,7 +55,7 @@ A possible solution would be to leverage BigQuery to store the results of the ma
In addition to creating the mapping table `(key, value)` pairs, we also save these pairs to BigQuery at this stage. We then yield the element unchanged so that subsequent stages can still make use of this data.
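The pass-through pattern described above can be sketched as follows. This is a minimal, Beam-free illustration of the idea (the class name `SaveMappingPairs` and the pair buffer are hypothetical; in the real pipeline this would be a `DoFn` whose pairs feed a BigQuery write, not an in-memory list):

```python
class SaveMappingPairs:
    """Illustrative stand-in for a Beam DoFn: records (key, value)
    pairs destined for BigQuery while yielding each element unchanged."""

    def __init__(self):
        # In the actual pipeline this buffer would instead be a
        # BigQuery sink (e.g. a WriteToBigQuery transform).
        self.pairs_for_bigquery = []

    def process(self, element):
        key, value = element
        self.pairs_for_bigquery.append({"key": key, "value": value})
        # Yield the element as-is so downstream stages see it unmodified.
        yield element


# Feed two mapping pairs through the stage.
fn = SaveMappingPairs()
inputs = [("a", 1), ("b", 2)]
outputs = [e for pair in inputs for e in fn.process(pair)]
```

The key design point is that saving to BigQuery is a side effect: the elements flowing to later stages are untouched, so no downstream transform needs to change.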
Remove the condense mapping table stage as it is no longer needed.
Remove the condense mapping table stage as it is no longer needed (which also saves a bit of time).
Instead of using: