Commit Graph

56 Commits

SHA1 Message Date
886a37ca94 updating report.py 2021-09-27 21:18:14 +01:00
3263b3dd8b moving panda-profiling report into docs 2021-09-27 21:18:06 +01:00
dffc6aa553 adding debug print 2021-09-27 21:17:49 +01:00
f9eeb8bfad adding latest beam pipeline code for dataflow 2021-09-27 21:17:39 +01:00
cad6612ebe updating dataflow notes 2021-09-27 03:39:40 +01:00
391861d80c adding latest beam pipeline code 2021-09-27 03:39:30 +01:00
f60beb4565 updating dataflow notes 2021-09-27 03:18:49 +01:00
f2ed60426d adding temporary notes 2021-09-27 03:18:42 +01:00
7db1edb90c updating download_data script with pp-2020 dataset 2021-09-27 03:18:33 +01:00
3a74579440 adding latest beam pipeline code for dataflow with group optimisation 2021-09-27 03:18:17 +01:00
377e3c703f updating dataflow documentation 2021-09-27 01:35:48 +01:00
a8fc06c764 adding latest beam pipeline code for dataflow with group optimisation 2021-09-26 23:28:58 +01:00
eaa36877f6 update dataflow documentation with new commands for vpc 2021-09-26 23:28:35 +01:00
1941fcb7bf update prospector.yaml 2021-09-26 23:28:21 +01:00
99e67c2840 updating download_data script with new bucket 2021-09-26 23:28:12 +01:00
8e8469579e updating beam pipeline to use GroupByKey 2021-09-26 20:29:11 +01:00
4e3771c728 adding latest beam pipeline code for dataflow 2021-09-26 17:15:35 +01:00
8856a9763f updating prospector.yaml 2021-09-26 17:15:24 +01:00
fded858932 moving dataflow notes 2021-09-26 17:15:17 +01:00
bb71d55f8c adding latest beam pipeline code for dataflow 2021-09-26 16:23:19 +01:00
8047b5ced4 adding latest beam pipeline code for dataflow 2021-09-26 16:16:58 +01:00
9f53c66975 adding latest beam pipeline code for dataflow 2021-09-26 15:57:00 +01:00
e6ec110d54 updating download data script for new location in GCS 2021-09-26 14:56:53 +01:00
83807616e0 Merge branch 'wip/pathlib' into develop 2021-09-26 14:56:21 +01:00
7f874fa6f6 updating README.md 2021-09-26 14:56:01 +01:00
b8a997084d removing requirements.txt for documentation 2021-09-26 14:55:48 +01:00
c4e81065b1 adding pyenv 3.7.9 2021-09-26 14:55:05 +01:00
62bd0196ad removing shard_name_template from output file 2021-09-26 06:11:42 +01:00
7f9b7e4bfd moving inputs/outputs to use pathlib 2021-09-26 06:03:55 +01:00
7962f40e32 adding initial docs 2021-09-26 01:34:06 +01:00
2a43ea1946 adding download script for data set 2021-09-26 01:18:23 +01:00
07d176be79 updating .gitignore 2021-09-26 01:10:53 +01:00
f804e85cc3 updating .gitignore 2021-09-26 01:10:32 +01:00
9fdc6dce05 migrate beam pipeline to main.py 2021-09-26 01:10:09 +01:00
54cf5e3e36 updating prospector.yaml 2021-09-26 01:09:48 +01:00
2e42a453b0 adding latest beam pipeline code 2021-09-25 22:15:58 +01:00
adfbd8e93d updating prospector.yaml 2021-09-25 22:15:46 +01:00
1bd54f188d updating folder structure for data input/output 2021-09-25 22:15:39 +01:00
a7c52b1085 updating .gitignore 2021-09-25 22:15:10 +01:00
24420c8935 adding latest beam pipeline code 2021-09-25 18:25:08 +01:00
44f346deff Merge remote-tracking branch 'origin/develop' into develop 2021-09-25 16:48:04 +01:00
a37e7817c3 adding latest beam pipeline code 2021-09-25 16:47:52 +01:00
539e4c7786 adding latesty beam pipeline code 2021-09-25 16:47:17 +01:00
214ce77d8f adding debug.py 2021-09-25 16:47:08 +01:00
47a4ac4bc3 adding latest beam pipeline code 2021-09-25 01:44:37 +01:00
aa61ea9c57 adding latest beam pipeline code 2021-09-25 00:48:41 +01:00
94cc22a385 adding data exploration report + code 2021-09-25 00:48:34 +01:00
a05182892a adding documentation 2021-09-25 00:48:13 +01:00
c38a10ca2f updating beam to install gcp extras 2021-09-25 00:48:04 +01:00
ab993fc030 adding prospector.yaml 2021-09-25 00:47:49 +01:00