Mirror of https://github.com/dtomlinson91/street_group_tech_test (synced 2025-12-22 03:55:43 +00:00)
updating dataflow notes
@@ -18,18 +18,19 @@ Export env variable:
 
 ### Dataflow
 
-#### Monthly dataset
+#### Yearly dataset
 
 ```bash
 python -m analyse_properties.main \
 --region europe-west1 \
---input gs://street-group-technical-test-dmot-euw1/input/pp-monthly-update-new-version.csv \
---output gs://street-group-technical-test-dmot-euw1/output/pp-monthly-update-new-version \
+--input gs://street-group-technical-test-dmot-euw1/input/pp-2020.csv \
+--output gs://street-group-technical-test-dmot-euw1/output/pp-2020 \
 --runner DataflowRunner \
 --project street-group \
 --temp_location gs://street-group-technical-test-dmot-euw1/tmp \
 --subnetwork=https://www.googleapis.com/compute/v1/projects/street-group/regions/europe-west1/subnetworks/europe-west-1-dataflow \
---no_use_public_ips
+--no_use_public_ips \
+--worker_machine_type=n1-highmem-2
 ```
 
 #### Full dataset
@@ -44,13 +45,11 @@ python -m analyse_properties.main \
 --temp_location gs://street-group-technical-test-dmot-euw1/tmp \
 --subnetwork=https://www.googleapis.com/compute/v1/projects/street-group/regions/europe-west1/subnetworks/europe-west-1-dataflow \
 --no_use_public_ips \
---worker_machine_type=n1-highmem-8
+--worker_machine_type=n1-highmem-8 \
+--num_workers=3 \
+--autoscaling_algorithm=NONE
 ```
 
-—-disk_size_gb=50 \
-
-
-
 ### Locally
 
 Run the pipeline locally:
@@ -83,3 +82,7 @@ Use `--experiments=enable_prime`
 Deploying a pipeline (with scaling options): <https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline>
 
 Available VM types (with pricing): <https://cloud.google.com/compute/vm-instance-pricing#n1_predefined>
+
+Performance
+
+Sideinput performance: <https://stackoverflow.com/questions/48242320/google-dataflow-apache-beam-python-side-input-from-pcollection-kills-perform>
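
Note on the Dataflow commands in this commit: the yearly and full-dataset invocations share most of their flags and differ mainly in worker sizing. A minimal sketch of assembling that command line in Python is below. `build_dataflow_args` is a hypothetical helper that is not part of the repository, and it assumes input and output paths follow the `input/<dataset>.csv` and `output/<dataset>` pattern of the yearly command. It only builds the argument list. Running the result would still need the `analyse_properties` package and GCP credentials.

```python
import shlex
from typing import List, Optional

# Values copied from the commands in this commit.
BUCKET = "gs://street-group-technical-test-dmot-euw1"
SUBNETWORK = (
    "https://www.googleapis.com/compute/v1/projects/street-group"
    "/regions/europe-west1/subnetworks/europe-west-1-dataflow"
)


def build_dataflow_args(
    dataset: str,
    machine_type: str,
    num_workers: Optional[int] = None,
) -> List[str]:
    """Assemble the argv for one Dataflow run (hypothetical helper)."""
    args = [
        "python", "-m", "analyse_properties.main",
        "--region", "europe-west1",
        "--input", f"{BUCKET}/input/{dataset}.csv",
        "--output", f"{BUCKET}/output/{dataset}",
        "--runner", "DataflowRunner",
        "--project", "street-group",
        "--temp_location", f"{BUCKET}/tmp",
        f"--subnetwork={SUBNETWORK}",
        "--no_use_public_ips",
        f"--worker_machine_type={machine_type}",
    ]
    if num_workers is not None:
        # Pin the worker count and switch autoscaling off, as the
        # full-dataset command in this commit does.
        args += [f"--num_workers={num_workers}", "--autoscaling_algorithm=NONE"]
    return args


# Yearly dataset, matching the flags added in the first hunk.
yearly = build_dataflow_args("pp-2020", "n1-highmem-2")
print(shlex.join(yearly))
```

The full-dataset input path is not visible in this diff, so it is not filled in here. Passing `num_workers=3` with `n1-highmem-8` would reproduce the sizing flags from the second hunk.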