Merge final release (#1)

* adding initial skeleton

* updating .gitignore

* updating dev dependencies

* adding report.py

* updating notes

* adding prospector.yaml

* updating beam to install gcp extras

* adding documentation

* adding data exploration report + code

* adding latest beam pipeline code

* adding latest beam pipeline code

* adding debug.py

* adding latest beam pipeline code

* adding latest beam pipeline code

* adding latest beam pipeline code

* updating .gitignore

* updating folder structure for data input/output

* updating prospector.yaml

* adding latest beam pipeline code

* updating prospector.yaml

* migrate beam pipeline to main.py

* updating .gitignore

* updating .gitignore

* adding download script for data set

* adding initial docs

* moving inputs/outputs to use pathlib

* removing shard_name_template from output file

* adding pyenv 3.7.9

* removing requirements.txt for documentation

* updating README.md

* updating download data script for new location in GCS

* adding latest beam pipeline code for dataflow

* adding latest beam pipeline code for dataflow

* adding latest beam pipeline code for dataflow

* moving dataflow notes

* updating prospector.yaml

* adding latest beam pipeline code for dataflow

* updating beam pipeline to use GroupByKey

* updating download_data script with new bucket

* update prospector.yaml

* update dataflow documentation with new commands for vpc

* adding latest beam pipeline code for dataflow with group optimisation

* updating dataflow documentation

* adding latest beam pipeline code for dataflow with group optimisation

* updating download_data script with pp-2020 dataset

* adding temporary notes

* updating dataflow notes

* adding latest beam pipeline code

* updating dataflow notes

* adding latest beam pipeline code for dataflow

* adding debug print

* moving pandas-profiling report into docs

* updating report.py

* adding entrypoint command

* adding initial docs

* adding commands.md to notes

* commenting out debug imports

* updating documentation

* updating latest beam pipeline with default inputs

* updating poetry

* adding requirements.txt

* updating documentation
dtomlinson91
2021-09-28 00:31:09 +01:00
committed by GitHub
parent 8a22bfebe1
commit 80376a662e
34 changed files with 5667 additions and 1 deletions

mkdocs.yaml Normal file

@@ -0,0 +1,43 @@
site_name: The Street Group Technical Test
repo_url: https://github.com/dtomlinson91/street_group_tech_test
use_directory_urls: false
nav:
  - Documentation:
      - Welcome: index.md
      - Installation: documentation/installation.md
      - Usage: documentation/usage.md
  - Discussion:
      - Introduction: discussion/introduction.md
      - Data Exploration Report: discussion/exploration.md
      - Cleaning: discussion/cleaning.md
      - Approach: discussion/approach.md
      - Results: discussion/results.md
  - DataFlow:
      - Running on DataFlow: dataflow/index.md
      - Scaling to the Full DataSet: dataflow/scaling.md
  - Data Exploration Report: pandas-profiling/report.html
theme:
  name: material
  palette:
    primary: indigo
    accent: blue
  features:
    navigation.tabs: true
markdown_extensions:
  - admonition
  - codehilite:
      guess_lang: true
  - toc:
      permalink: true
  - pymdownx.highlight
  - pymdownx.superfences
  - pymdownx.inlinehilite
  - pymdownx.snippets
  - pymdownx.arithmatex:
      generic: true
plugins:
  - search:
      lang: en
extra_javascript:
  - https://polyfill.io/v3/polyfill.min.js?features=es6
  - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js