mirror of
https://github.com/dtomlinson91/street_group_tech_test
synced 2025-12-22 11:55:45 +00:00
# Results
The resulting output `.json` looks like (for the previous example using No. 1 `B90 3LA`):

```json
[
  {
    "property_id": "fe205bfe66bc7f18c50c8f3d77ec3e30",
    "readable_address": "1 VERSTONE ROAD\nSHIRLEY\nSOLIHULL\nWEST MIDLANDS\nB90 3LA",
    "flat_appartment": "",
    "builing": "",
    "number": "1",
    "street": "VERSTONE ROAD",
    "locality": "SHIRLEY",
    "town": "SOLIHULL",
    "district": "SOLIHULL",
    "county": "WEST MIDLANDS",
    "postcode": "B90 3LA",
    "property_transactions": [
      {
        "price": 317500,
        "transaction_date": "2020-11-13",
        "year": 2020
      }
    ],
    "latest_transaction_year": 2020
  }
]
```
The standard property information is included; below we briefly discuss the additional fields in this output file.
## readable_address
The components that make up an address in the dataset are often repetitive, with the locality, town/city, district and county frequently sharing the same value. Stacking all the components sequentially would therefore produce hard-to-read addresses.
The `readable_address` provides an easy-to-read address that strips this repetitiveness out by doing pairwise comparisons across the four components and applying a mask. The result is an address that could be served to the end user or easily displayed on a page.
This saves any user from having to apply the same logic just to display the address somewhere; the full address of a property should be easy to read and easily accessible.
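A minimal sketch of this de-duplication step (the function name and signature are hypothetical, not the pipeline's actual code; it assumes a "keep a component only if it hasn't appeared yet" mask):

```python
def readable_address(number: str, street: str, locality: str,
                     town: str, district: str, county: str,
                     postcode: str) -> str:
    """Build a readable address, masking out repeated components."""
    deduped = []
    for component in (locality, town, district, county):
        # Compare against the components already kept and drop repeats,
        # so e.g. "SOLIHULL" appears once even when town == district.
        if component and component not in deduped:
            deduped.append(component)
    lines = [f"{number} {street}"] + deduped + [postcode]
    return "\n".join(lines)


print(readable_address("1", "VERSTONE ROAD", "SHIRLEY", "SOLIHULL",
                       "SOLIHULL", "WEST MIDLANDS", "B90 3LA"))
# → 1 VERSTONE ROAD\nSHIRLEY\nSOLIHULL\nWEST MIDLANDS\nB90 3LA
```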
## property_transactions
This array contains an object for each transaction on the property, with the price and year as an `int` and the redundant `00:00` time stripped from the transaction date.
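As a sketch of that normalisation (a hypothetical helper, assuming the raw dataset dates carry a trailing `00:00` time):

```python
def to_transaction(price: str, transaction_date: str) -> dict:
    """Normalise a raw transaction into the shape shown above."""
    date_part = transaction_date.split(" ")[0]  # drop the "00:00" time
    return {
        "price": int(price),
        "transaction_date": date_part,
        "year": int(date_part[:4]),  # year as an int for easy filtering
    }


print(to_transaction("317500", "2020-11-13 00:00"))
# → {'price': 317500, 'transaction_date': '2020-11-13', 'year': 2020}
```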
## latest_transaction_year
The year of the latest transaction is extracted from the `property_transactions` array and placed at the top level of the `json` object. This allows any end user to easily search for properties that haven't been sold in a given period of time, without having to write this logic themselves.
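The extraction amounts to taking the maximum year across the array; a minimal sketch (hypothetical function name):

```python
def latest_transaction_year(transactions: list) -> int:
    """Return the most recent transaction year for a property."""
    return max(t["year"] for t in transactions)


transactions = [
    {"price": 317500, "transaction_date": "2020-11-13", "year": 2020},
    {"price": 250000, "transaction_date": "2015-06-01", "year": 2015},
]
print(latest_transaction_year(transactions))  # → 2020
```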
A consumer should be able to use this data to answer questions like:
- Give me all properties in the town of Solihull that haven't been sold in the past 10 years.
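With `latest_transaction_year` pre-computed, such a query reduces to a simple filter. A hedged sketch of how a consumer might express it (the function and cutoff parameter are illustrative, not part of the pipeline):

```python
def stale_properties(properties: list, town: str, cutoff_year: int) -> list:
    """Properties in `town` whose latest sale predates `cutoff_year`."""
    return [p for p in properties
            if p["town"] == town
            and p["latest_transaction_year"] < cutoff_year]


properties = [
    {"town": "SOLIHULL", "latest_transaction_year": 1995},
    {"town": "SOLIHULL", "latest_transaction_year": 2020},
    {"town": "BIRMINGHAM", "latest_transaction_year": 1990},
]
# "Not sold in the past 10 years" as of 2025 → cutoff year 2015.
print(stale_properties(properties, "SOLIHULL", 2015))
# → [{'town': 'SOLIHULL', 'latest_transaction_year': 1995}]
```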