Results
For the previous example (No. 1, B90 3LA), the resulting output .json looks like this:
[
  {
    "property_id": "fe205bfe66bc7f18c50c8f3d77ec3e30",
    "readable_address": "1 VERSTONE ROAD\nSHIRLEY\nSOLIHULL\nWEST MIDLANDS\nB90 3LA",
    "flat_appartment": "",
    "builing": "",
    "number": "1",
    "street": "VERSTONE ROAD",
    "locality": "SHIRLEY",
    "town": "SOLIHULL",
    "district": "SOLIHULL",
    "county": "WEST MIDLANDS",
    "postcode": "B90 3LA",
    "property_transactions": [
      {
        "price": 317500,
        "transaction_date": "2020-11-13",
        "year": 2020
      }
    ],
    "latest_transaction_year": 2020
  }
]
The standard property information is included. The additional fields in this output file are discussed briefly below.
readable_address
The components that make up the address in the dataset are often repetitive, with the locality, town/city, district and county frequently sharing the same value. Simply stacking all the components in sequence can therefore produce hard-to-read addresses.
The readable_address field provides an easy-to-read address that strips out this repetition by making pairwise comparisons between the four components and applying a mask. The result is an address that could be served to the end user or displayed directly on a page.
This saves every user from having to apply the same logic just to display the address. The full address of a property should be easy to read and easily accessible.
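As a rough illustration of the pairwise comparison and mask described above, here is a minimal sketch. The function name `build_readable_address` and its exact rules are assumptions made for this example, not the pipeline's actual code, and the flat and building fields are left out for brevity.

```python
def build_readable_address(number, street, locality, town, district, county, postcode):
    """Hypothetical helper: join address parts, dropping repeated components."""
    components = [locality, town, district, county]
    # mask[i] is True when component i is non-empty and differs from
    # every component that comes before it (the pairwise comparison).
    mask = [
        bool(value) and all(value != earlier for earlier in components[:i])
        for i, value in enumerate(components)
    ]
    kept = [value for value, keep in zip(components, mask) if keep]
    first_line = " ".join(part for part in (number, street) if part)
    lines = [first_line, *kept, postcode]
    return "\n".join(line for line in lines if line)


address = build_readable_address(
    "1", "VERSTONE ROAD", "SHIRLEY", "SOLIHULL", "SOLIHULL", "WEST MIDLANDS", "B90 3LA"
)
print(address)
```

With the example record, the repeated SOLIHULL (town and district) appears only once in the result.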
property_transactions
This array contains one object per transaction for the property. The price and year are stored as ints, and the 00:00 time component is stripped from the date.
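The conversion of one raw transaction into such an object could look roughly like the sketch below. The helper name `to_transaction` and the assumed raw date format (`YYYY-MM-DD 00:00`) are illustrative assumptions. The real pipeline may parse the input differently.

```python
from datetime import datetime


def to_transaction(raw_price: str, raw_date: str) -> dict:
    """Hypothetical helper: build one property_transactions entry."""
    # Assumes the raw date carries a trailing 00:00 time, e.g. "2020-11-13 00:00".
    parsed = datetime.strptime(raw_date, "%Y-%m-%d %H:%M")
    return {
        "price": int(raw_price),
        "transaction_date": parsed.date().isoformat(),  # time component dropped
        "year": parsed.year,
    }


transaction = to_transaction("317500", "2020-11-13 00:00")
print(transaction)
```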
latest_transaction_year
The year of the latest transaction is extracted from the property_transactions array and placed at the top level of the JSON object. This lets any end user search for properties that haven't been sold in a given period without having to write this logic themselves.
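One simple way to derive this field is to take the maximum year across the transactions. The sketch below assumes that approach. The function name `add_latest_transaction_year` is hypothetical, and the second transaction in the sample record is invented purely for illustration.

```python
def add_latest_transaction_year(record: dict) -> dict:
    """Hypothetical helper: copy the record and add latest_transaction_year."""
    years = [t["year"] for t in record["property_transactions"]]
    return {**record, "latest_transaction_year": max(years)}


sample = {
    "postcode": "B90 3LA",
    "property_transactions": [
        # Illustrative values only; the 2012 entry is not from the dataset.
        {"price": 250000, "transaction_date": "2012-05-01", "year": 2012},
        {"price": 317500, "transaction_date": "2020-11-13", "year": 2020},
    ],
}
enriched = add_latest_transaction_year(sample)
print(enriched["latest_transaction_year"])
```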
A consumer should be able to use this data to answer questions like:
- Give me all properties in the town of Solihull that haven't been sold in the past 10 years.
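A question like the one above could be answered with a short filter over the output records. This is only a sketch: the function name `not_sold_in_last` is hypothetical, and because only the year is compared, it keeps a property only when its latest sale year is strictly earlier than the cutoff year.

```python
from datetime import date


def not_sold_in_last(properties, town, years, today=None):
    """Hypothetical helper: properties in `town` with no sale in the last `years` years."""
    current_year = (today or date.today()).year
    cutoff_year = current_year - years
    return [
        p
        for p in properties
        if p["town"] == town.upper() and p["latest_transaction_year"] < cutoff_year
    ]


# Illustrative records containing only the fields the filter needs.
records = [
    {"property_id": "a", "town": "SOLIHULL", "latest_transaction_year": 2020},
    {"property_id": "b", "town": "SOLIHULL", "latest_transaction_year": 2005},
    {"property_id": "c", "town": "BIRMINGHAM", "latest_transaction_year": 2001},
]
matches = not_sold_in_last(records, "Solihull", 10, today=date(2021, 1, 1))
print([p["property_id"] for p in matches])
```

In practice the records would be loaded from the output .json file with `json.load` before filtering.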