6 Commits

11 changed files with 158 additions and 107 deletions

View File

@@ -286,20 +286,12 @@ def run(argv=None, save_main_session=True):
"""Entrypoint and definition of the pipeline."""
logging.getLogger().setLevel(logging.INFO)
# Default input/output files
# Default input/output files when ran from base of repo with files in ./data
input_file = (
pathlib.Path(__file__).parents[1]
/ "data"
/ "input"
/ "pp-2020.csv"
# / "pp-complete.csv"
pathlib.Path("./data/input/pp-2020.csv")
)
output_file = (
pathlib.Path(__file__).parents[1]
/ "data"
/ "output"
/ "pp-2020"
# / "pp-complete"
pathlib.Path("./data/output/pp-2020")
)
# Arguments

View File

@@ -27,9 +27,6 @@ To get around public IP quotas I created a VPC in the `europe-west1` region that
Assuming the `pp-2020.csv` file has been placed in the `./input` directory in the bucket you can run a command similar to:
!!! caution
Use the command `python -m analyse_properties.main` as the entrypoint to the pipeline and not `analyse-properties` as the module isn't installed with poetry on the workers with the configuration below.
```bash
python -m analyse_properties.main \
--runner DataflowRunner \
@@ -43,7 +40,7 @@ python -m analyse_properties.main \
--worker_machine_type=n1-highmem-2
```
The output file from this pipeline is publically available and can be downloaded [here](https://storage.googleapis.com/street-group-technical-test-dmot-euw1/output/pp-2020-00000-of-00001.json).
The output file from this pipeline is publicly available and can be downloaded [here](https://storage.googleapis.com/street-group-technical-test-dmot-euw1/output/pp-2020-00000-of-00001.json).
The job graph for this pipeline is displayed below:

View File

@@ -55,7 +55,7 @@ A possible solution would be to leverage BigQuery to store the results of the ma
In addition to creating the mapping table `(key, value)` pairs, we also save these pairs to BigQuery at this stage. We then yield the element as it is currently written to allow the subsequent stages to make use of this data.
Remove the condense mapping table stage as it is no longer needed.
Remove the condense mapping table stage as it is no longer needed (which also saves a bit of time).
Instead of using:

View File

@@ -21,7 +21,7 @@ The mapping table takes each row and creates a `(key,value)` pair with:
- The key being the id across all columns (`id_all_columns`).
- The value being the raw data as an array.
The mapping table is then condensed to a single dictionary with these key, value pairs and is used as a side input further down the pipeline.
The mapping table is then condensed to a single dictionary with these key, value pairs (automatically deduplicating repeated rows) and is used as a side input further down the pipeline.
This mapping table is created to ensure the `GroupByKey` operation is as quick as possible. The more data you have to process in a `GroupByKey`, the longer the operation takes. By doing the `GroupByKey` using just the ids, the pipeline can process the files much quicker than if we included the raw data in this operation.

View File

@@ -64,7 +64,7 @@ We strip all leading/trailing whitespace from each column to enforce consistency
Some of the data is repeated:
- Some rows repeated, with the same date + price + address information but with a unique transaction id.
- Some rows are repeated, with the same date + price + address information but with a unique transaction id.
<details>
<summary>Example (PCollection)</summary>
@@ -87,7 +87,7 @@ Some of the data is repeated:
]
},
{
"fd4634faec47c29de40bbf7840723b41": [
"gd4634faec47c29de40bbf7840723b42": [
"317500",
"2020-11-13 00:00",
"B90 3LA",

View File

@@ -6,21 +6,26 @@ The task has been tested on MacOS Big Sur and WSL2. The task should run on Windo
For Beam 2.32.0 the supported versions of the Python SDK can be found [here](https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies#sdk-for-python).
## Poetry
## Pip
The test uses [Poetry](https://python-poetry.org) for dependency management.
!!! info inline end
If you already have Poetry installed globally you can go straight to the `poetry install` step.
In a virtual environment install poetry:
In a virtual environment run from the root of the repo:
```bash
pip install poetry
pip install -r requirements.txt
```
## Poetry (Alternative)
Install [Poetry](https://python-poetry.org) *globally*
From the root of the repo install the dependencies with:
```bash
poetry install --no-dev
```
Activate the shell with:
```bash
poetry shell
```

View File

@@ -2,7 +2,7 @@
This page documents how to run the pipeline locally to complete the task for the [dataset for 2020](https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads#section-1).
The pipeline also runs in GCP using DataFlow and is discussed further on but can be viewed here. We also discuss how to adapt the pipeline so it can run against [the full dataset](https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads#single-file).
The pipeline also runs in GCP using DataFlow and is discussed further on but can be viewed [here](../dataflow/index.md). We also discuss how to adapt the pipeline so it can run against [the full dataset](https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads#single-file).
## Download dataset
@@ -20,20 +20,20 @@ to download the data for 2020 and place in the input directory above.
## Entrypoint
The entrypoint to the pipeline is `analyse-properties`.
The entrypoint to the pipeline is `analyse_properties.main`.
## Available options
Running
```bash
analyse-properties --help
python -m analyse_properties.main --help
```
gives the following output:
```bash
usage: analyse-properties [-h] [--input INPUT] [--output OUTPUT]
usage: analyse_properties.main [-h] [--input INPUT] [--output OUTPUT]
optional arguments:
-h, --help show this help message and exit
@@ -43,14 +43,17 @@ optional arguments:
The default value for input is `./data/input/pp-2020.csv` and the default value for output is `./data/output/pp-2020`.
If passing in values for `input`/`output` these should be **full** paths to the files. The test will parse these inputs as a `str()` and pass this to `#!python beam.io.ReadFromText()`.
## Run the pipeline
To run the pipeline and complete the task run:
```bash
analyse-properties --runner DirectRunner
python -m analyse_properties.main \
--runner DirectRunner \
--input ./data/input/pp-2020.csv \
--output ./data/output/pp-2020
```
from the root of the repo.
The pipeline will use the 2020 dataset located in `./data/input` and output the resulting `.json` to `./data/output`.

View File

@@ -4,7 +4,7 @@
This documentation accompanies the technical test for the Street Group.
The following pages will guide the user through installing the requirements, and running the task to complete the test. In addition, there is some discussion around the approach, and any improvements that could be made.
The following pages will guide the user through installing the requirements, and running the task to complete the test. In addition, there is some discussion around the approach, and scaling the pipeline.
Navigate sections using the tabs at the top of the page. Pages in this section can be viewed in order by using the section links in the left menu, or by using bar at the bottom of the page. The table of contents in the right menu can be used to navigate sections on each page.

141
poetry.lock generated
View File

@@ -136,7 +136,7 @@ unicode_backport = ["unicodedata2"]
name = "click"
version = "8.0.1"
description = "Composable command line interface toolkit"
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -148,7 +148,7 @@ importlib-metadata = {version = "*", markers = "python_version < \"3.8\""}
name = "colorama"
version = "0.4.4"
description = "Cross-platform colored terminal text."
category = "main"
category = "dev"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
@@ -258,9 +258,9 @@ python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*"
[[package]]
name = "ghp-import"
version = "2.0.1"
version = "2.0.2"
description = "Copy your docs directly to the gh-pages branch."
category = "main"
category = "dev"
optional = false
python-versions = "*"
@@ -268,7 +268,7 @@ python-versions = "*"
python-dateutil = ">=2.8.1"
[package.extras]
dev = ["twine", "markdown", "flake8"]
dev = ["twine", "markdown", "flake8", "wheel"]
[[package]]
name = "google-api-core"
@@ -535,7 +535,7 @@ grpcio = ">=1.0.0,<2.0.0dev"
[[package]]
name = "grpcio"
version = "1.40.0"
version = "1.41.0"
description = "HTTP/2-based RPC framework"
category = "main"
optional = false
@@ -545,7 +545,7 @@ python-versions = "*"
six = ">=1.5.2"
[package.extras]
protobuf = ["grpcio-tools (>=1.40.0)"]
protobuf = ["grpcio-tools (>=1.41.0)"]
[[package]]
name = "grpcio-gcp"
@@ -622,7 +622,7 @@ six = "*"
name = "importlib-metadata"
version = "4.8.1"
description = "Read metadata from Python packages"
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -653,7 +653,7 @@ plugins = ["setuptools"]
name = "jinja2"
version = "3.0.1"
description = "A very fast and expressive template engine."
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -691,7 +691,7 @@ python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*"
name = "markdown"
version = "3.3.4"
description = "Python implementation of Markdown."
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -705,7 +705,7 @@ testing = ["coverage", "pyyaml"]
name = "markupsafe"
version = "2.0.1"
description = "Safely add untrusted strings to HTML/XML markup."
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -737,7 +737,7 @@ python-versions = "*"
name = "mergedeep"
version = "1.3.4"
description = "A deep merge function for 🐍."
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -762,7 +762,7 @@ tests = ["pytest", "pytest-mpl"]
name = "mkdocs"
version = "1.2.2"
description = "Project documentation with Markdown."
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -785,7 +785,7 @@ i18n = ["babel (>=2.9.0)"]
name = "mkdocs-material"
version = "7.3.0"
description = "A Material Design theme for MkDocs"
category = "main"
category = "dev"
optional = false
python-versions = "*"
@@ -800,7 +800,7 @@ pymdown-extensions = ">=7.0"
name = "mkdocs-material-extensions"
version = "1.0.3"
description = "Extension pack for Python Markdown."
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -956,7 +956,7 @@ python-versions = ">=3.6"
[[package]]
name = "platformdirs"
version = "2.3.0"
version = "2.4.0"
description = "A small Python module for determining appropriate platform-specific dirs, e.g. a \"user data dir\"."
category = "dev"
optional = false
@@ -1111,7 +1111,7 @@ python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
name = "pygments"
version = "2.10.0"
description = "Pygments is a syntax highlighting package written in Python."
category = "main"
category = "dev"
optional = false
python-versions = ">=3.5"
@@ -1186,7 +1186,7 @@ pylint = ">=1.7"
name = "pymdown-extensions"
version = "8.2"
description = "Extension pack for Python Markdown."
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -1253,7 +1253,7 @@ numpy = ">=1.13.3"
name = "pyyaml"
version = "5.4.1"
description = "YAML parser and emitter for Python"
category = "main"
category = "dev"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*"
@@ -1261,7 +1261,7 @@ python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*"
name = "pyyaml-env-tag"
version = "0.1"
description = "A custom YAML tag for referencing environment variables in YAML files. "
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -1452,7 +1452,7 @@ type_image_path = ["imagehash", "pillow"]
name = "watchdog"
version = "2.1.5"
description = "Filesystem events monitoring"
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -1471,7 +1471,7 @@ python-versions = "*"
name = "zipp"
version = "3.5.0"
description = "Backport of pathlib-compatible object wrapper for zip files"
category = "main"
category = "dev"
optional = false
python-versions = ">=3.6"
@@ -1482,7 +1482,7 @@ testing = ["pytest (>=4.6)", "pytest-checkdocs (>=2.4)", "pytest-flake8", "pytes
[metadata]
lock-version = "1.1"
python-versions = "^3.7"
content-hash = "c9292b385b6067c194a7e31bc62ea4c04c99b951d3f4fa1b9b8f081ddf270c4c"
content-hash = "c710ab077268b067a2d2e900a7ca426bac3a9d9512d63ef3b517cd0e55477329"
[metadata.files]
apache-beam = [
@@ -1601,7 +1601,8 @@ future = [
{file = "future-0.18.2.tar.gz", hash = "sha256:b1bead90b70cf6ec3f0710ae53a525360fa360d306a86583adc6bf83a4db537d"},
]
ghp-import = [
{file = "ghp-import-2.0.1.tar.gz", hash = "sha256:753de2eace6e0f7d4edfb3cce5e3c3b98cd52aadb80163303d1d036bda7b4483"},
{file = "ghp-import-2.0.2.tar.gz", hash = "sha256:947b3771f11be850c852c64b561c600fdddf794bab363060854c1ee7ad05e071"},
{file = "ghp_import-2.0.2-py3-none-any.whl", hash = "sha256:5f8962b30b20652cdffa9c5a9812f7de6bcb56ec475acac579807719bf242c46"},
]
google-api-core = [
{file = "google-api-core-1.31.2.tar.gz", hash = "sha256:8500aded318fdb235130bf183c726a05a9cb7c4b09c266bd5119b86cdb8a4d10"},
@@ -1716,50 +1717,50 @@ grpc-google-iam-v1 = [
{file = "grpc-google-iam-v1-0.12.3.tar.gz", hash = "sha256:0bfb5b56f648f457021a91c0df0db4934b6e0c300bd0f2de2333383fe958aa72"},
]
grpcio = [
{file = "grpcio-1.40.0-cp35-cp35m-macosx_10_10_intel.whl", hash = "sha256:6f8f581787e739945e6cda101f312ea8a7e7082bdbb4993901eb828da6a49092"},
{file = "grpcio-1.40.0-cp35-cp35m-manylinux2010_i686.whl", hash = "sha256:a4389e26a8f9338ca91effdc5436dfec67d6ecd296368dba115799ae8f8e5bdb"},
{file = "grpcio-1.40.0-cp35-cp35m-manylinux2010_x86_64.whl", hash = "sha256:fb06708e3d173e387326abcd5182d52beb60e049db5c3d317bd85509e938afdc"},
{file = "grpcio-1.40.0-cp35-cp35m-manylinux2014_i686.whl", hash = "sha256:f06e07161c21391682bfcac93a181a037a8aa3d561546690e9d0501189729aac"},
{file = "grpcio-1.40.0-cp35-cp35m-manylinux2014_x86_64.whl", hash = "sha256:5ff0dcf66315f3f00e1a8eb7244c6a49bdb0cc59bef4fb65b9db8adbd78e6acb"},
{file = "grpcio-1.40.0-cp35-cp35m-win32.whl", hash = "sha256:ba9dd97ea1738be3e81d34e6bab8ff91a0b80668a4ec81454b283d3c828cebde"},
{file = "grpcio-1.40.0-cp35-cp35m-win_amd64.whl", hash = "sha256:e12d776a240fee3ebd002519c02d165d94ec636d3fe3d6185b361bfc9a2d3106"},
{file = "grpcio-1.40.0-cp36-cp36m-linux_armv7l.whl", hash = "sha256:6b9b432f5665dfc802187384693b6338f05c7fc3707ebf003a89bd5132074e27"},
{file = "grpcio-1.40.0-cp36-cp36m-macosx_10_10_x86_64.whl", hash = "sha256:886d056f5101ac513f4aefe4d21a816d98ee3f9a8e77fc3bcb4ae1a3a24efe26"},
{file = "grpcio-1.40.0-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:b1b34e5a6f1285d1576099c663dae28c07b474015ed21e35a243aff66a0c2aed"},
{file = "grpcio-1.40.0-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:17ed13d43450ef9d1f9b78cc932bcf42844ca302235b93026dfd07fb5208d146"},
{file = "grpcio-1.40.0-cp36-cp36m-manylinux2014_i686.whl", hash = "sha256:e19de138199502d575fcec5cf68ae48815a6efe7e5c0d0b8c97eba8c77ae9f0e"},
{file = "grpcio-1.40.0-cp36-cp36m-manylinux2014_x86_64.whl", hash = "sha256:a812164ceb48cb62c3217bd6245274e693c624cc2ac0c1b11b4cea96dab054dd"},
{file = "grpcio-1.40.0-cp36-cp36m-manylinux_2_24_aarch64.whl", hash = "sha256:eedc8c3514c10b6f11c6f406877e424ca29610883b97bb97e33b1dd2a9077f6c"},
{file = "grpcio-1.40.0-cp36-cp36m-win32.whl", hash = "sha256:1708a0ba90c798b4313f541ffbcc25ed47e790adaafb02111204362723dabef0"},
{file = "grpcio-1.40.0-cp36-cp36m-win_amd64.whl", hash = "sha256:d760a66c9773780837915be85a39d2cd4ab42ef32657c5f1d28475e23ab709fc"},
{file = "grpcio-1.40.0-cp37-cp37m-linux_armv7l.whl", hash = "sha256:8a35b5f87247c893b01abf2f4f7493a18c2c5bf8eb3923b8dd1654d8377aa1a7"},
{file = "grpcio-1.40.0-cp37-cp37m-macosx_10_10_x86_64.whl", hash = "sha256:45704b9b5b85f9bcb027f90f2563d11d995c1b870a9ee4b3766f6c7ff6fc3505"},
{file = "grpcio-1.40.0-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:4967949071c9e435f9565ec2f49700cebeda54836a04710fe21f7be028c0125a"},
{file = "grpcio-1.40.0-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:1f9ccc9f5c0d5084d1cd917a0b5ff0142a8d269d0755592d751f8ce9e7d3d7f1"},
{file = "grpcio-1.40.0-cp37-cp37m-manylinux2014_i686.whl", hash = "sha256:5729ca9540049f52c2e608ca110048cfabab3aeaa0d9f425361d9f8ba8506cac"},
{file = "grpcio-1.40.0-cp37-cp37m-manylinux2014_x86_64.whl", hash = "sha256:edddc849bed3c5dfe215a9f9532a9bd9f670b57d7b8af603be80148b4c69e9a8"},
{file = "grpcio-1.40.0-cp37-cp37m-manylinux_2_24_aarch64.whl", hash = "sha256:49155dfdf725c0862c428039123066b25ce61bd38ce50a21ce325f1735aac1bd"},
{file = "grpcio-1.40.0-cp37-cp37m-win32.whl", hash = "sha256:913916823efa2e487b2ee9735b7759801d97fd1974bacdb1900e3bbd17f7d508"},
{file = "grpcio-1.40.0-cp37-cp37m-win_amd64.whl", hash = "sha256:24277aab99c346ca36a1aa8589a0624e19a8e6f2b74c83f538f7bb1cc5ee8dbc"},
{file = "grpcio-1.40.0-cp38-cp38-linux_armv7l.whl", hash = "sha256:a66a30513d2e080790244a7ac3d7a3f45001f936c5c2c9613e41e2a5d7a11794"},
{file = "grpcio-1.40.0-cp38-cp38-macosx_10_10_x86_64.whl", hash = "sha256:e2367f2b18dd4ba64cdcd9f626a920f9ec2e8228630839dc8f4a424d461137ea"},
{file = "grpcio-1.40.0-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:27dee6dcd1c04c4e9ceea49f6143003569292209d2c24ca100166660805e2440"},
{file = "grpcio-1.40.0-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:d271e52038dec0db7c39ad9303442d6087c55e09b900e2931b86e837cf0cbc2e"},
{file = "grpcio-1.40.0-cp38-cp38-manylinux2014_i686.whl", hash = "sha256:41e250ec7cd7523bf49c815b5509d5821728c26fac33681d4b0d1f5f34f59f06"},
{file = "grpcio-1.40.0-cp38-cp38-manylinux2014_x86_64.whl", hash = "sha256:33dc4259fecb96e6eac20f760656b911bcb1616aa3e58b3a1d2f125714a2f5d3"},
{file = "grpcio-1.40.0-cp38-cp38-manylinux_2_24_aarch64.whl", hash = "sha256:72b7b8075ee822dad4b39c150d73674c1398503d389e38981e9e35a894c476de"},
{file = "grpcio-1.40.0-cp38-cp38-win32.whl", hash = "sha256:a93490e6eff5fce3748fb2757cb4273dc21eb1b56732b8c9640fd82c1997b215"},
{file = "grpcio-1.40.0-cp38-cp38-win_amd64.whl", hash = "sha256:d3b4b41eb0148fca3e6e6fc61d1332a7e8e7c4074fb0d1543f0b255d7f5f1588"},
{file = "grpcio-1.40.0-cp39-cp39-linux_armv7l.whl", hash = "sha256:fbe3b66bfa2c2f94535f6063f6db62b5b150d55a120f2f9e1175d3087429c4d9"},
{file = "grpcio-1.40.0-cp39-cp39-macosx_10_10_x86_64.whl", hash = "sha256:ecfd80e8ea03c46b3ea7ed37d2040fcbfe739004b9e4329b8b602d06ac6fb113"},
{file = "grpcio-1.40.0-cp39-cp39-manylinux2010_i686.whl", hash = "sha256:d487b4daf84a14741ca1dc1c061ffb11df49d13702cd169b5837fafb5e84d9c0"},
{file = "grpcio-1.40.0-cp39-cp39-manylinux2010_x86_64.whl", hash = "sha256:c26de909cfd54bacdb7e68532a1591a128486af47ee3a5f828df9aa2165ae457"},
{file = "grpcio-1.40.0-cp39-cp39-manylinux2014_i686.whl", hash = "sha256:1d9eabe2eb2f78208f9ae67a591f73b024488449d4e0a5b27c7fca2d6901a2d4"},
{file = "grpcio-1.40.0-cp39-cp39-manylinux2014_x86_64.whl", hash = "sha256:4c2baa438f51152c9b7d0835ff711add0b4bc5056c0f5df581a6112153010696"},
{file = "grpcio-1.40.0-cp39-cp39-manylinux_2_24_aarch64.whl", hash = "sha256:bf114be0023b145f7101f392a344692c1efd6de38a610c54a65ed3cba035e669"},
{file = "grpcio-1.40.0-cp39-cp39-win32.whl", hash = "sha256:5f6d6b638698fa6decf7f040819aade677b583eaa21b43366232cb254a2bbac8"},
{file = "grpcio-1.40.0-cp39-cp39-win_amd64.whl", hash = "sha256:005fe14e67291498989da67d454d805be31d57a988af28ed3a2a0a7cabb05c53"},
{file = "grpcio-1.40.0.tar.gz", hash = "sha256:3d172158fe886a2604db1b6e17c2de2ab465fe0fe36aba2ec810ca8441cefe3a"},
{file = "grpcio-1.41.0-cp310-cp310-linux_armv7l.whl", hash = "sha256:9ecd0fc34aa46eeac24f4d20e67bafaf72ca914f99690bf2898674905eaddaf9"},
{file = "grpcio-1.41.0-cp310-cp310-macosx_10_10_universal2.whl", hash = "sha256:d539ebd05a2bbfbf897d41738d37d162d5c3d9f2b1f8ddf2c4f75e2c9cf59907"},
{file = "grpcio-1.41.0-cp310-cp310-manylinux_2_17_aarch64.whl", hash = "sha256:2410000eb57cf76b05b37d2aee270b686f0a7876710850a2bba92b4ed133e026"},
{file = "grpcio-1.41.0-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:be3c6ac822edb509aeef41361ca9c8c5ee52cb9e4973e1977d2bb7d6a460fd97"},
{file = "grpcio-1.41.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a0c4bdd1d646365d10ba1468bcf234ea5ad46e8ce2b115983e8563248614910a"},
{file = "grpcio-1.41.0-cp310-cp310-win32.whl", hash = "sha256:7033199706526e7ee06a362e38476dfdf2ddbad625c19b67ed30411d1bb25a18"},
{file = "grpcio-1.41.0-cp310-cp310-win_amd64.whl", hash = "sha256:fb64abf0d92134cb0ba4496a3b7ab918588eee42de20e5b3507fe6ee16db97ee"},
{file = "grpcio-1.41.0-cp36-cp36m-linux_armv7l.whl", hash = "sha256:b6b68c444abbaf4a2b944a61cf35726ab9645f45d416bcc7cf4addc4b2f2d53d"},
{file = "grpcio-1.41.0-cp36-cp36m-macosx_10_10_x86_64.whl", hash = "sha256:5292a627b44b6d3065de4a364ead23bab3c9d7a7c05416a9de0c0624d0fe03f4"},
{file = "grpcio-1.41.0-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:1820845e7e6410240eff97742e9f76cd5bf10ca01d36a322e86c0bd5340ac25b"},
{file = "grpcio-1.41.0-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:462178987f0e5c60d6d1b79e4e95803a4cd789db961d6b3f087245906bb5ae04"},
{file = "grpcio-1.41.0-cp36-cp36m-manylinux_2_17_aarch64.whl", hash = "sha256:7b07cbbd4eea56738e995fcbba3b60e41fd9aa9dac937fb7985c5dcbc7626260"},
{file = "grpcio-1.41.0-cp36-cp36m-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:3a92e4df5330cd384984e04804104ae34f521345917813aa86fc0930101a3697"},
{file = "grpcio-1.41.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ccd2f1cf11768d1f6fbe4e13e8b8fb0ccfe9914ceeff55a367d5571e82eeb543"},
{file = "grpcio-1.41.0-cp36-cp36m-win32.whl", hash = "sha256:59645b2d9f19b5ff30cb46ddbcaa09c398f9cd81e4e476b21c7c55ae1e942807"},
{file = "grpcio-1.41.0-cp36-cp36m-win_amd64.whl", hash = "sha256:0abd56d90dff3ed566807520de1385126dded21e62d3490a34c180a91f94c1f4"},
{file = "grpcio-1.41.0-cp37-cp37m-linux_armv7l.whl", hash = "sha256:9674a9d3f23702e35a89e22504f41b467893cf704f627cc9cdd118cf1dcc8e26"},
{file = "grpcio-1.41.0-cp37-cp37m-macosx_10_10_x86_64.whl", hash = "sha256:c95dd6e60e059ff770a2ac9f5a202b75dd64d76b0cd0c48f27d58907e43ed6a6"},
{file = "grpcio-1.41.0-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:a3cd7f945d3e3b82ebd2a4c9862eb9891a5ac87f84a7db336acbeafd86e6c402"},
{file = "grpcio-1.41.0-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:c07acd49541f5f6f9984fe0adf162d77bf70e0f58e77f9960c6f571314ff63a4"},
{file = "grpcio-1.41.0-cp37-cp37m-manylinux_2_17_aarch64.whl", hash = "sha256:7da3f6f6b857399c9ad85bcbffc83189e547a0a1a777ab68f5385154f8bc1ed4"},
{file = "grpcio-1.41.0-cp37-cp37m-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:39ce785f0cbd07966a9019386b7a054615b2da63da3c7727f371304d000a1890"},
{file = "grpcio-1.41.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:07594e585a5ba25cf331ddb63095ca51010c34e328a822cb772ffbd5daa62cb5"},
{file = "grpcio-1.41.0-cp37-cp37m-win32.whl", hash = "sha256:3bbeee115b05b22f6a9fa9bc78f9ab8d9d6bb8c16fdfc60401fc8658beae1099"},
{file = "grpcio-1.41.0-cp37-cp37m-win_amd64.whl", hash = "sha256:dcb5f324712a104aca4a459e524e535f205f36deb8005feb4f9d3ff0a22b5177"},
{file = "grpcio-1.41.0-cp38-cp38-linux_armv7l.whl", hash = "sha256:83c1e731c2b76f26689ad88534cafefe105dcf385567bead08f5857cb308246b"},
{file = "grpcio-1.41.0-cp38-cp38-macosx_10_10_x86_64.whl", hash = "sha256:5d4b30d068b022e412adcf9b14c0d9bcbc872e9745b91467edc0a4c700a8bba6"},
{file = "grpcio-1.41.0-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:d71aa430b2ac40e18e388504ac34cc91d49d811855ca507c463a21059bf364f0"},
{file = "grpcio-1.41.0-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:c8c5bc498f6506b6041c30afb7a55c57a9fd535d1a0ac7cdba9b5fd791a85633"},
{file = "grpcio-1.41.0-cp38-cp38-manylinux_2_17_aarch64.whl", hash = "sha256:a144f6cecbb61aace12e5920840338a3d246123a41d795e316e2792e9775ad15"},
{file = "grpcio-1.41.0-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:e516124010ef60d5fc2e0de0f1f987599249dc55fd529001f17f776a4145767f"},
{file = "grpcio-1.41.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c1e0a4c86d4cbd93059d5eeceed6e1c2e3e1494e1bf40be9b8ab14302c576162"},
{file = "grpcio-1.41.0-cp38-cp38-win32.whl", hash = "sha256:a614224719579044bd7950554d3b4c1793bb5715cbf0f0399b1f21d283c40ef6"},
{file = "grpcio-1.41.0-cp38-cp38-win_amd64.whl", hash = "sha256:b2de4e7b5a930be04a4d05c9f5fce7e9191217ccdc174b026c2a7928770dca9f"},
{file = "grpcio-1.41.0-cp39-cp39-linux_armv7l.whl", hash = "sha256:056806e83eaa09d0af0e452dd353db8f7c90aa2dedcce1112a2d21592550f6b1"},
{file = "grpcio-1.41.0-cp39-cp39-macosx_10_10_x86_64.whl", hash = "sha256:5502832b7cec670a880764f51a335a19b10ff5ab2e940e1ded67f39b88aa02b1"},
{file = "grpcio-1.41.0-cp39-cp39-manylinux2010_i686.whl", hash = "sha256:585847ed190ea9cb4d632eb0ebf58f1d299bbca5e03284bc3d0fa08bab6ea365"},
{file = "grpcio-1.41.0-cp39-cp39-manylinux2010_x86_64.whl", hash = "sha256:d0cc0393744ce3ce1b237ae773635cc928470ff46fb0d3f677e337a38e5ed4f6"},
{file = "grpcio-1.41.0-cp39-cp39-manylinux_2_17_aarch64.whl", hash = "sha256:2882b62f74de8c8a4f7b2be066f6230ecc46f4edc8f42db1fb7358200abe3b25"},
{file = "grpcio-1.41.0-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:297ee755d3c6cd7e7d3770f298f4d4d4b000665943ae6d2888f7407418a9a510"},
{file = "grpcio-1.41.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ace080a9c3c673c42adfd2116875a63fec9613797be01a6105acf7721ed0c693"},
{file = "grpcio-1.41.0-cp39-cp39-win32.whl", hash = "sha256:1bcbeac764bbae329bc2cc9e95d0f4d3b0fb456b92cf12e7e06e3e860a4b31cf"},
{file = "grpcio-1.41.0-cp39-cp39-win_amd64.whl", hash = "sha256:4537bb9e35af62c5189493792a8c34d127275a6d175c8ad48b6314cacba4021e"},
{file = "grpcio-1.41.0.tar.gz", hash = "sha256:15c04d695833c739dbb25c88eaf6abd9a461ec0dbd32f44bc8769335a495cf5a"},
]
grpcio-gcp = [
{file = "grpcio-gcp-0.2.2.tar.gz", hash = "sha256:e292605effc7da39b7a8734c719afb12ec4b5362add3528d8afad3aa3aa9057c"},
@@ -2153,8 +2154,8 @@ pillow = [
{file = "Pillow-8.3.2.tar.gz", hash = "sha256:dde3f3ed8d00c72631bc19cbfff8ad3b6215062a5eed402381ad365f82f0c18c"},
]
platformdirs = [
{file = "platformdirs-2.3.0-py3-none-any.whl", hash = "sha256:8003ac87717ae2c7ee1ea5a84a1a61e87f3fbd16eb5aadba194ea30a9019f648"},
{file = "platformdirs-2.3.0.tar.gz", hash = "sha256:15b056538719b1c94bdaccb29e5f81879c7f7f0f4a153f46086d155dffcd4f0f"},
{file = "platformdirs-2.4.0-py3-none-any.whl", hash = "sha256:8868bbe3c3c80d42f20156f22e7131d2fb321f5bc86a2a345375c6481a67021d"},
{file = "platformdirs-2.4.0.tar.gz", hash = "sha256:367a5e80b3d04d2428ffa76d33f124cf11e8fff2acdaa9b43d545f5c7d661ef2"},
]
prospector = [
{file = "prospector-1.5.1-py3-none-any.whl", hash = "sha256:47f8ff3fd36ae276967eb392ca20b300a7bdea66c0d0252250a4d89a6c03ab15"},

View File

@@ -7,13 +7,13 @@ authors = ["Daniel Tomlinson <dtomlinson@panaetius.co.uk>"]
[tool.poetry.dependencies]
python = "^3.7"
apache-beam = {extras = ["gcp"], version = "^2.32.0"}
mkdocs = "^1.2.2"
mkdocs-material = "^7.3.0"
[tool.poetry.dev-dependencies]
# pytest = "^5.2"
prospector = "^1.5.1"
pandas-profiling = "^3.0.0"
mkdocs = "^1.2.2"
mkdocs-material = "^7.3.0"
[build-system]
requires = ["poetry-core>=1.0.0"]

53
requirements.txt Normal file
View File

@@ -0,0 +1,53 @@
apache-beam==2.32.0; python_version >= "3.6"
avro-python3==1.9.2.1; python_version >= "3.6"
cachetools==4.2.2; python_version >= "3.6" and python_version < "4.0" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0")
certifi==2021.5.30; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.6.0" and python_version >= "3.6"
charset-normalizer==2.0.6; python_full_version >= "3.6.0" and python_version >= "3.6"
crcmod==1.7; python_version >= "3.6"
dill==0.3.1.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.1.0" and python_version >= "3.6"
docopt==0.6.2; python_version >= "3.6"
fastavro==1.4.5; python_version >= "3.6"
fasteners==0.16.3; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.5.0"
future==0.18.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.3.0" and python_version >= "3.6"
google-api-core==1.31.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
google-apitools==0.5.31; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.5.0"
google-auth==1.35.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
google-cloud-bigquery==2.6.1; python_version >= "3.6"
google-cloud-bigtable==1.7.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.4.0"
google-cloud-core==1.7.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
google-cloud-datastore==1.15.3; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.4.0"
google-cloud-dlp==1.0.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.4.0"
google-cloud-language==1.3.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.4.0"
google-cloud-pubsub==1.7.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.5.0"
google-cloud-recommendations-ai==0.2.0; python_version >= "3.6"
google-cloud-spanner==1.19.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.4.0"
google-cloud-videointelligence==1.16.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.4.0"
google-cloud-vision==1.0.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.4.0"
google-crc32c==1.2.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
google-resumable-media==1.3.3; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
googleapis-common-protos==1.53.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
grpc-google-iam-v1==0.12.3; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.5.0"
grpcio-gcp==0.2.2; python_version >= "3.6"
grpcio==1.41.0; python_version >= "3.6" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.4.0")
hdfs==2.6.0; python_version >= "3.6"
httplib2==0.19.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.5.0"
idna==3.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.6.0" and python_version >= "3.6"
numpy==1.20.3; python_version >= "3.7"
oauth2client==4.1.3; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.5.0"
orjson==3.6.3; python_version >= "3.7"
packaging==21.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
proto-plus==1.19.0; python_version >= "3.6"
protobuf==3.18.0; python_version >= "3.6" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0")
pyarrow==4.0.1; python_version >= "3.6"
pyasn1-modules==0.2.8; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
pyasn1==0.4.8; python_version >= "3.6" and python_full_version < "3.0.0" and python_version < "4" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0") or python_version >= "3.6" and python_full_version >= "3.6.0" and python_version < "4" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0")
pydot==1.4.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
pymongo==3.12.0; python_version >= "3.6"
pyparsing==2.4.7; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.6.0" and python_version >= "3.6"
python-dateutil==2.8.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.3.0" and python_version >= "3.6"
pytz==2021.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
requests==2.26.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.6.0" and python_version >= "3.6"
rsa==4.7.2; python_version >= "3.6" and python_version < "4" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0")
six==1.16.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_version >= "3.6" and python_full_version >= "3.6.0"
typing-extensions==3.7.4.3; python_version >= "3.6"
urllib3==1.26.7; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.6.0" and python_version < "4" and python_version >= "3.6"