# Results
The resulting output `.json` file looks like this (for the previous example, No. 1 at `B90 3LA`):
```json
[
  {
    "property_id": "fe205bfe66bc7f18c50c8f3d77ec3e30",
    "readable_address": "1 VERSTONE ROAD\nSHIRLEY\nSOLIHULL\nWEST MIDLANDS\nB90 3LA",
    "flat_appartment": "",
    "builing": "",
    "number": "1",
    "street": "VERSTONE ROAD",
    "locality": "SHIRLEY",
    "town": "SOLIHULL",
    "district": "SOLIHULL",
    "county": "WEST MIDLANDS",
    "postcode": "B90 3LA",
    "property_transactions": [
      {
        "price": 317500,
        "transaction_date": "2020-11-13",
        "year": 2020
      }
    ],
    "latest_transaction_year": 2020
  }
]
```
The standard property information is included. Below we briefly discuss the additional fields in this output file.
## readable_address
The components that make up the address in the dataset are often repetitive, with the locality, town/city, district and county often sharing the same value. Stacking all the components sequentially can therefore produce hard-to-read addresses.
The `readable_address` field provides an easy-to-read address that strips this repetitiveness out by doing pairwise comparisons between the four components and applying a mask. The result is an address that could be served to the end user or easily displayed on a page.
This saves every user from having to apply the same logic just to display the address somewhere; the full address of a property should be easy to read and easily accessible.
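The pairwise comparison and mask described above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function name `build_readable_address` and its signature are hypothetical, it ignores the flat and building fields for brevity, and it assumes a duplicate is dropped in favour of its earliest occurrence.

```python
from itertools import combinations


def build_readable_address(number, street, locality, town, district, county, postcode):
    """Hypothetical sketch: drop repeated locality/town/district/county values."""
    components = [locality, town, district, county]

    # Start by keeping every non-empty component.
    mask = [bool(component) for component in components]

    # Compare each pair once; when two components match, mask out the later one.
    for i, j in combinations(range(len(components)), 2):
        if components[i] == components[j]:
            mask[j] = False

    kept = [component for component, keep in zip(components, mask) if keep]
    first_line = " ".join(part for part in (number, street) if part)
    lines = [first_line, *kept, postcode]
    return "\n".join(line for line in lines if line)
```

With the example record above, the town and district are both `SOLIHULL`, so the district is masked out and only one `SOLIHULL` line appears in the result.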
## property_transactions
This array contains an object for each transaction for that property. The price and year are stored as an `int`, and the date has the `00:00` time stripped out.
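As a rough sketch of how one such object might be built from a raw row, the snippet below assumes the raw date arrives as a string like `2020-11-13 00:00` and the price as a numeric string. Both the input format and the helper name `build_transaction` are assumptions made for illustration.

```python
from datetime import datetime


def build_transaction(raw_price, raw_date):
    """Hypothetical sketch: convert raw price/date strings into a transaction object."""
    # Assumed raw format: "YYYY-MM-DD HH:MM", e.g. "2020-11-13 00:00".
    parsed = datetime.strptime(raw_date, "%Y-%m-%d %H:%M")
    return {
        "price": int(raw_price),
        # Keep only the date part, dropping the 00:00 time.
        "transaction_date": parsed.date().isoformat(),
        "year": parsed.year,
    }
```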
## latest_transaction_year
The year of the latest transaction is extracted from the `property_transactions` array and placed at the top level of the `json` object. This allows any end user to easily search for properties that haven't been sold in a period of time, without having to write this logic themselves.
A consumer should be able to use this data to answer questions like:
- Give me all properties in the town of Solihull that haven't been sold in the past 10 years.
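A question like the one above could be answered with a short filter over the output. The sketch below assumes the output has already been loaded into a list of dictionaries shaped like the example record; the function name `not_sold_in_past_years` is hypothetical. Because only the year is stored at the top level, the comparison is approximate to the nearest calendar year.

```python
from datetime import date


def not_sold_in_past_years(properties, town, years, today=None):
    """Hypothetical sketch: properties in `town` with no sale in the last `years` years."""
    current_year = (today or date.today()).year
    cutoff_year = current_year - years
    return [
        prop
        for prop in properties
        # Town names in the example output are upper case, so normalise the query.
        if prop["town"] == town.upper()
        and prop["latest_transaction_year"] < cutoff_year
    ]
```

For example, `not_sold_in_past_years(properties, "Solihull", 10)` would return every Solihull property whose `latest_transaction_year` is more than ten years before the current year.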