updating documentation

2021-09-28 00:29:41 +01:00
parent c481c1a976
commit 8a0d8085a2
7 changed files with 28 additions and 23 deletions


@@ -27,9 +27,6 @@ To get around public IP quotas I created a VPC in the `europe-west1` region that
Assuming the `pp-2020.csv` file has been placed in the `./input` directory in the bucket you can run a command similar to:
!!! caution
Use the command `python -m analyse_properties.main` as the entrypoint to the pipeline, not `analyse-properties`, because the module isn't installed with Poetry on the workers under the configuration below.
```bash
python -m analyse_properties.main \
--runner DataflowRunner \


@@ -55,7 +55,7 @@ A possible solution would be to leverage BigQuery to store the results of the ma
In addition to creating the mapping table `(key, value)` pairs, we also save these pairs to BigQuery at this stage. We then yield the element unchanged so that subsequent stages can still make use of this data.
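The pass-through pattern described above can be sketched as follows. This is a minimal, Beam-free illustration of the idea (the class name `SaveMappingPairs` and the pair buffer are hypothetical; in the real pipeline this would be a `DoFn` whose pairs feed a BigQuery write, not an in-memory list):

```python
class SaveMappingPairs:
    """Illustrative stand-in for a Beam DoFn: records (key, value)
    pairs destined for BigQuery while yielding each element unchanged."""

    def __init__(self):
        # In the actual pipeline this buffer would instead be a
        # BigQuery sink (e.g. a WriteToBigQuery transform).
        self.pairs_for_bigquery = []

    def process(self, element):
        key, value = element
        self.pairs_for_bigquery.append({"key": key, "value": value})
        # Yield the element as-is so downstream stages see it unmodified.
        yield element


# Feed two mapping pairs through the stage.
fn = SaveMappingPairs()
inputs = [("a", 1), ("b", 2)]
outputs = [e for pair in inputs for e in fn.process(pair)]
```

The key design point is that saving to BigQuery is a side effect: the elements flowing to later stages are untouched, so no downstream transform needs to change.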
Remove the condense mapping table stage as it is no longer needed.
Remove the condense mapping table stage as it is no longer needed (which also saves a bit of time).
Instead of using: