diff --git a/notes/documentation/dataflow.md b/notes/documentation/dataflow.md
index d0a82ea..fd483e4 100644
--- a/notes/documentation/dataflow.md
+++ b/notes/documentation/dataflow.md
@@ -2,23 +2,65 @@
+## Examples
+
+Full example of beam pipeline on dataflow:
+
+
+
+## Setup
+
 Export env variable:
 
 `export GOOGLE_APPLICATION_CREDENTIALS="/home/dtomlinson/git-repos/work/street_group/street_group_tech_test/street-group-0c490d23a9d0.json"`
 
-Run the pipeline:
+## Run pipeline
 
+### Dataflow
+
+#### Monthly dataset
+
+```bash
 python -m analyse_properties.main \
-    --region europe-west2 \
-    --input gs://street-group-technical-test-dmot/input/pp-monthly-update-new-version.csv \
-    --output gs://street-group-technical-test-dmot/output/pp-monthly-update-new-version \
+    --region europe-west1 \
+    --input gs://street-group-technical-test-dmot-euw1/input/pp-monthly-update-new-version.csv \
+    --output gs://street-group-technical-test-dmot-euw1/output/pp-monthly-update-new-version \
     --runner DataflowRunner \
     --project street-group \
-    --temp_location gs://street-group-technical-test-dmot/tmp
+    --temp_location gs://street-group-technical-test-dmot-euw1/tmp \
+    --subnetwork=https://www.googleapis.com/compute/v1/projects/street-group/regions/europe-west1/subnetworks/europe-west-1-dataflow \
+    --no_use_public_ips
+```
 
+#### Full dataset
+
+```bash
+python -m analyse_properties.main \
+    --region europe-west1 \
+    --input gs://street-group-technical-test-dmot-euw1/input/pp-complete.csv \
+    --output gs://street-group-technical-test-dmot-euw1/output/pp-complete \
+    --runner DataflowRunner \
+    --project street-group \
+    --temp_location gs://street-group-technical-test-dmot-euw1/tmp \
+    --subnetwork=https://www.googleapis.com/compute/v1/projects/street-group/regions/europe-west1/subnetworks/europe-west-1-dataflow \
+    --no_use_public_ips
+```
+
+### Locally
+
+Run the pipeline locally:
+
+`python -m analyse_properties.main --runner DirectRunner`
 
 ## Errors
 
 Unsubscriptable error on window:
+
+## Documentation
+
+Running in its own private VPC without public IPs
+
+-
+-