Dataset statistics
| Statistic | Value |
|---|---|
| Number of variables | 16 |
| Number of observations | 26321785 |
| Missing cells | 32579197 |
| Missing cells (%) | 7.7% |
| Total size in memory | 3.1 GiB |
| Average record size in memory | 128.0 B |
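The missing-cell percentage can be cross-checked against the other figures in this table. The short check below is a sketch that assumes the percentage is taken over all cells (variables × observations); that assumption reproduces the reported 7.7%.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 16
n_observations = 26_321_785
missing_cells = 32_579_197

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```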
Variable types
| Type | Count |
|---|---|
| Categorical | 15 |
| Numeric | 1 |
Warnings
| Warning | Type |
|---|---|
| record_status has constant value "A" | Constant |
| transaction_id has a high cardinality: 26321785 distinct values | High cardinality |
| date_of_transfer has a high cardinality: 9698 distinct values | High cardinality |
| postcode has a high cardinality: 1274429 distinct values | High cardinality |
| paon has a high cardinality: 508466 distinct values | High cardinality |
| saon has a high cardinality: 58202 distinct values | High cardinality |
| street has a high cardinality: 320310 distinct values | High cardinality |
| locality has a high cardinality: 23716 distinct values | High cardinality |
| town_city has a high cardinality: 1171 distinct values | High cardinality |
| district has a high cardinality: 463 distinct values | High cardinality |
| county has a high cardinality: 130 distinct values | High cardinality |
| saon has 23252525 (88.3%) missing values | Missing |
| street has 411893 (1.6%) missing values | Missing |
| locality has 8868564 (33.7%) missing values | Missing |
| price is highly skewed (γ1 = 212.5121542) | Skewed |
| transaction_id has unique values | Unique |
Reproduction
| Field | Value |
|---|---|
| Analysis started | 2021-09-24 16:09:53.212210 |
| Analysis finished | 2021-09-24 16:15:35.611974 |
| Duration | 5 minutes and 42.4 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
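A report of this kind is typically generated with a call along the following lines. This is a minimal sketch, not the configuration actually used: the report does not record the exact call, so the function name `build_profile`, the title, and the output path are placeholders, and the contents of `config.json` are not reproduced here.

```python
def build_profile(df, output_path="report.html"):
    """Write an HTML profile of `df` and return the report object.

    `df` is assumed to be a pandas DataFrame holding the 16 columns
    listed in this report; the output path is a placeholder.
    """
    # Imported lazily so this module loads even where the package is absent.
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, title="Dataset statistics")
    profile.to_file(output_path)
    return profile
```

With about 26 million rows the run recorded above took several minutes, so trying the call on a sample of the data first may be worthwhile.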
transaction_id
Categorical
| Statistic | Value |
|---|---|
| Distinct | 26321785 |
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 26321785 |
| Unique (%) | 100.0% |
Sample
| Row | Value |
|---|---|
| 1st row | {F887F88E-7D15-4415-804E-52EAC2F10958} |
| 2nd row | {40FD4DF2-5362-407C-92BC-566E2CCE89E9} |
| 3rd row | {7A99F89E-7D81-4E45-ABD5-566E49A045EA} |
| 4th row | {28225260-E61C-4E57-8B56-566E5285B1C1} |
| 5th row | {444D34D7-9BA6-43A7-B695-4F48980E0176} |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| {A2EF370F-229F-405B-A6EC-3D501915AD67} | 1 | < 0.1% |
| {5F9A7C5C-E476-4986-8722-CCD108EB8B6E} | 1 | < 0.1% |
| {E31EE92A-128A-4AA3-81B2-E7A8B3A4C90D} | 1 | < 0.1% |
| {919FEC05-D788-9A90-E053-6C04A8C0A300} | 1 | < 0.1% |
| {6FFBE023-6AE3-451E-ADA4-8D66F231D8F8} | 1 | < 0.1% |
| {174380A1-DCA2-4DB5-A7C3-661C6C5D9798} | 1 | < 0.1% |
| {1D30B696-41C4-4D78-A474-EE409899F476} | 1 | < 0.1% |
| {C18420AC-7FE9-4E93-89AC-53697390DE27} | 1 | < 0.1% |
| {62E23376-BBCE-435C-AE3B-A2CA076580F4} | 1 | < 0.1% |
| {BC6C63E2-BDA1-498E-9E99-3B4A64C67381} | 1 | < 0.1% |
| Other values (26321775) | 26321775 | > 99.9% |
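A frequency shown as `< 0.1%` is a rounded display of a small but non-zero share, not a missing value. The sketch below illustrates one way such labels could be produced. The helper `format_percent` is hypothetical and is not taken from the profiling library; it assumes that shares rounding to 0.0% or 100.0% are replaced by a bound.

```python
def format_percent(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place.

    Hypothetical helper: non-zero shares that would round to 0.0% are
    shown as "< 0.1%", and shares just under 100% as "> 99.9%".
    """
    share = count / total
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share * 100:.1f}%"


n_observations = 26_321_785
print(format_percent(1, n_observations))          # a single transaction_id
print(format_percent(2_731_456, n_observations))  # old_new == "Y"
```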
price
Numeric
| Statistic | Value |
|---|---|
| Distinct | 218688 |
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 205226.2064 |
| Minimum | 1 |
| Maximum | 630000000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 200.8 MiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 1 |
| 5-th percentile | 36000 |
| Q1 | 80500 |
| median | 142500 |
| Q3 | 235000 |
| 95-th percentile | 500000 |
| Maximum | 630000000 |
| Range | 629999999 |
| Interquartile range (IQR) | 154500 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 813350.2288 |
| Coefficient of variation (CV) | 3.963188927 |
| Kurtosis | 86409.10552 |
| Mean | 205226.2064 |
| Median Absolute Deviation (MAD) | 71290 |
| Skewness | 212.5121542 |
| Sum | 5.401920082 × 10^12 |
| Variance | 6.615385948 × 10^11 |
| Monotonicity | Not monotonic |
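Several of the figures above are derived from one another, which gives a quick consistency check on the table. The snippet below is an illustrative sketch using the rounded values printed in this report, so results agree only up to rounding.

```python
# Values copied from the quantile and descriptive statistics tables.
mean = 205226.2064
std = 813350.2288
q1, q3 = 80_500, 235_000
minimum, maximum = 1, 630_000_000

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV       = {cv:.6f}")
print(f"variance = {variance:.6e}")
print(f"IQR      = {iqr}")
print(f"range    = {value_range}")
```

A CV near 4 together with a maximum more than a thousand times the 95th percentile is consistent with the skewness warning for price.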
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 250000 | 283789 | 1.1% |
| 125000 | 246148 | 0.9% |
| 120000 | 216854 | 0.8% |
| 150000 | 205955 | 0.8% |
| 110000 | 192318 | 0.7% |
| 175000 | 185637 | 0.7% |
| 115000 | 182009 | 0.7% |
| 135000 | 181500 | 0.7% |
| 60000 | 180568 | 0.7% |
| 130000 | 178803 | 0.7% |
| Other values (218678) | 24268204 | 92.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 101 | < 0.1% |
| 5 | 2 | < 0.1% |
| 10 | 8 | < 0.1% |
| 11 | 3 | < 0.1% |
| 15 | 4 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 630000000 | 1 | < 0.1% |
| 594300000 | 1 | < 0.1% |
| 569200000 | 1 | < 0.1% |
| 448500000 | 1 | < 0.1% |
| 448300979 | 1 | < 0.1% |
date_of_transfer
Categorical
| Statistic | Value |
|---|---|
| Distinct | 9698 |
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 10 |
| Unique (%) | < 0.1% |
Sample
| Row | Value |
|---|---|
| 1st row | 1995-07-07 00:00 |
| 2nd row | 1995-02-03 00:00 |
| 3rd row | 1995-01-13 00:00 |
| 4th row | 1995-07-28 00:00 |
| 5th row | 1995-06-28 00:00 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2016-03-31 00:00 | 32378 | 0.1% |
| 2001-06-29 00:00 | 26583 | 0.1% |
| 2002-05-31 00:00 | 26338 | 0.1% |
| 2002-06-28 00:00 | 26320 | 0.1% |
| 2007-06-29 00:00 | 24970 | 0.1% |
| 2000-06-30 00:00 | 24927 | 0.1% |
| 2003-11-28 00:00 | 24802 | 0.1% |
| 1999-05-28 00:00 | 24335 | 0.1% |
| 2006-06-30 00:00 | 24308 | 0.1% |
| 2000-03-31 00:00 | 23428 | 0.1% |
| Other values (9688) | 26063396 | 99.0% |
postcode
Categorical
| Statistic | Value |
|---|---|
| Distinct | 1274429 |
| Distinct (%) | 4.8% |
| Missing | 42019 |
| Missing (%) | 0.2% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 88384 |
| Unique (%) | 0.3% |
Sample
| Row | Value |
|---|---|
| 1st row | MK15 9HP |
| 2nd row | SR6 0AQ |
| 3rd row | CO6 1SQ |
| 4th row | B90 4TG |
| 5th row | DY5 1SA |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| YO10 3FT | 534 | < 0.1% |
| LU1 5FT | 523 | < 0.1% |
| RH10 3HZ | 387 | < 0.1% |
| L7 3AA | 372 | < 0.1% |
| TR8 4LX | 355 | < 0.1% |
| M1 5GB | 348 | < 0.1% |
| BS3 3NG | 322 | < 0.1% |
| L5 3AA | 315 | < 0.1% |
| L3 8HA | 312 | < 0.1% |
| CM21 9PF | 305 | < 0.1% |
| Other values (1274419) | 26275993 | 99.8% |
| (Missing) | 42019 | 0.2% |
property_type
Categorical
| Statistic | Value |
|---|---|
| Distinct | 5 |
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 0 |
| Unique (%) | 0.0% |
Sample
| Row | Value |
|---|---|
| 1st row | D |
| 2nd row | T |
| 3rd row | T |
| 4th row | T |
| 5th row | S |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| T | 7940720 | 30.2% |
| S | 7224820 | 27.4% |
| D | 6077417 | 23.1% |
| F | 4730013 | 18.0% |
| O | 348815 | 1.3% |
old_new
Categorical
| Statistic | Value |
|---|---|
| Distinct | 2 |
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 0 |
| Unique (%) | 0.0% |
Sample
| Row | Value |
|---|---|
| 1st row | N |
| 2nd row | N |
| 3rd row | N |
| 4th row | N |
| 5th row | N |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| N | 23590329 | 89.6% |
| Y | 2731456 | 10.4% |
duration
Categorical
| Statistic | Value |
|---|---|
| Distinct | 3 |
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 0 |
| Unique (%) | 0.0% |
Sample
| Row | Value |
|---|---|
| 1st row | F |
| 2nd row | F |
| 3rd row | F |
| 4th row | F |
| 5th row | F |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| F | 20127890 | 76.5% |
| L | 6193361 | 23.5% |
| U | 534 | < 0.1% |
paon
Categorical
| Statistic | Value |
|---|---|
| Distinct | 508466 |
| Distinct (%) | 1.9% |
| Missing | 4196 |
| Missing (%) | < 0.1% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 236792 |
| Unique (%) | 0.9% |
Sample
| Row | Value |
|---|---|
| 1st row | 31 |
| 2nd row | 50 |
| 3rd row | 19 |
| 4th row | 37 |
| 5th row | 59 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 656135 | 2.5% |
| 2 | 655728 | 2.5% |
| 3 | 649054 | 2.5% |
| 4 | 628033 | 2.4% |
| 5 | 605766 | 2.3% |
| 6 | 584416 | 2.2% |
| 7 | 559000 | 2.1% |
| 8 | 547016 | 2.1% |
| 9 | 517669 | 2.0% |
| 10 | 505557 | 1.9% |
| Other values (508456) | 20409215 | 77.5% |
saon
Categorical
| Statistic | Value |
|---|---|
| Distinct | 58202 |
| Distinct (%) | 1.9% |
| Missing | 23252525 |
| Missing (%) | 88.3% |
| Memory size | 200.8 MiB |
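The 1.9% distinct share for saon looks large next to 58202 distinct values in roughly 26 million rows. The figures are consistent if the percentage is taken over non-missing values only. The check below is a sketch under that assumption, which the report itself does not state; it uses the observation count from the dataset statistics and the saon unique count of 33682.

```python
# Figures copied from this report (dataset statistics and saon).
n_observations = 26_321_785
missing = 23_252_525
distinct = 58_202
unique = 33_682

# Assumption: percentages use the non-missing count as the denominator.
non_missing = n_observations - missing
distinct_pct = 100 * distinct / non_missing
unique_pct = 100 * unique / non_missing

# For comparison: the share over all rows, which does not match the report.
distinct_pct_all_rows = 100 * distinct / n_observations

print(f"non-missing values:      {non_missing}")
print(f"distinct (non-missing):  {distinct_pct:.1f}%")
print(f"unique (non-missing):    {unique_pct:.1f}%")
print(f"distinct (all rows):     {distinct_pct_all_rows:.1f}%")
```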
Unique
| Statistic | Value |
|---|---|
| Unique | 33682 |
| Unique (%) | 1.1% |
Sample
| Row | Value |
|---|---|
| 1st row | 28 |
| 2nd row | FLAT 21 |
| 3rd row | FLAT 7A |
| 4th row | FLAT 1 |
| 5th row | FLAT 8 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| FLAT 2 | 165778 | 0.6% |
| FLAT 1 | 164781 | 0.6% |
| FLAT 3 | 148061 | 0.6% |
| FLAT 4 | 122764 | 0.5% |
| FLAT 5 | 97065 | 0.4% |
| FLAT 6 | 82135 | 0.3% |
| 2 | 79685 | 0.3% |
| 1 | 78655 | 0.3% |
| FLAT 7 | 66209 | 0.3% |
| FLAT 8 | 59802 | 0.2% |
| Other values (58192) | 2004325 | 7.6% |
| (Missing) | 23252525 | 88.3% |
street
Categorical
| Statistic | Value |
|---|---|
| Distinct | 320310 |
| Distinct (%) | 1.2% |
| Missing | 411893 |
| Missing (%) | 1.6% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 16555 |
| Unique (%) | 0.1% |
Sample
| Row | Value |
|---|---|
| 1st row | ALDRICH DRIVE |
| 2nd row | HOWICK PARK |
| 3rd row | BRICK KILN CLOSE |
| 4th row | RAINSBROOK DRIVE |
| 5th row | MERRY HILL |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| HIGH STREET | 169595 | 0.6% |
| STATION ROAD | 87514 | 0.3% |
| LONDON ROAD | 60153 | 0.2% |
| CHURCH ROAD | 49684 | 0.2% |
| CHURCH STREET | 49125 | 0.2% |
| MAIN STREET | 48089 | 0.2% |
| PARK ROAD | 40175 | 0.2% |
| VICTORIA ROAD | 34925 | 0.1% |
| CHURCH LANE | 32130 | 0.1% |
| MAIN ROAD | 29955 | 0.1% |
| Other values (320300) | 25308547 | 96.2% |
| (Missing) | 411893 | 1.6% |
locality
Categorical
| Statistic | Value |
|---|---|
| Distinct | 23716 |
| Distinct (%) | 0.1% |
| Missing | 8868564 |
| Missing (%) | 33.7% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 836 |
| Unique (%) | < 0.1% |
Sample
| Row | Value |
|---|---|
| 1st row | WILLEN |
| 2nd row | SUNDERLAND |
| 3rd row | COGGESHALL |
| 4th row | SHIRLEY |
| 5th row | BRIERLEY HILL |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| LONDON | 899924 | 3.4% |
| BIRMINGHAM | 111310 | 0.4% |
| MANCHESTER | 100845 | 0.4% |
| LIVERPOOL | 99506 | 0.4% |
| LEEDS | 88968 | 0.3% |
| BRISTOL | 88659 | 0.3% |
| SHEFFIELD | 76269 | 0.3% |
| BOURNEMOUTH | 60354 | 0.2% |
| SOUTHAMPTON | 56612 | 0.2% |
| PLYMOUTH | 56250 | 0.2% |
| Other values (23706) | 15814524 | 60.1% |
| (Missing) | 8868564 | 33.7% |
town_city
Categorical
| Statistic | Value |
|---|---|
| Distinct | 1171 |
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 2 |
| Unique (%) | < 0.1% |
Sample
| Row | Value |
|---|---|
| 1st row | MILTON KEYNES |
| 2nd row | SUNDERLAND |
| 3rd row | COLCHESTER |
| 4th row | SOLIHULL |
| 5th row | BRIERLEY HILL |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| LONDON | 2031341 | 7.7% |
| MANCHESTER | 431016 | 1.6% |
| BRISTOL | 404300 | 1.5% |
| BIRMINGHAM | 386956 | 1.5% |
| NOTTINGHAM | 344685 | 1.3% |
| LEEDS | 296653 | 1.1% |
| LIVERPOOL | 272491 | 1.0% |
| SHEFFIELD | 250601 | 1.0% |
| LEICESTER | 231450 | 0.9% |
| SOUTHAMPTON | 213759 | 0.8% |
| Other values (1161) | 21458533 | 81.5% |
district
Categorical
| Statistic | Value |
|---|---|
| Distinct | 463 |
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 1 |
| Unique (%) | < 0.1% |
Sample
| Row | Value |
|---|---|
| 1st row | MILTON KEYNES |
| 2nd row | SUNDERLAND |
| 3rd row | BRAINTREE |
| 4th row | SOLIHULL |
| 5th row | DUDLEY |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| BIRMINGHAM | 387632 | 1.5% |
| LEEDS | 351319 | 1.3% |
| BRADFORD | 230830 | 0.9% |
| SHEFFIELD | 213316 | 0.8% |
| MANCHESTER | 210509 | 0.8% |
| CITY OF BRISTOL | 205163 | 0.8% |
| LIVERPOOL | 183632 | 0.7% |
| KIRKLEES | 178730 | 0.7% |
| WANDSWORTH | 177641 | 0.7% |
| EAST RIDING OF YORKSHIRE | 174522 | 0.7% |
| Other values (453) | 24008491 | 91.2% |
county
Categorical
| Statistic | Value |
|---|---|
| Distinct | 130 |
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 0 |
| Unique (%) | 0.0% |
Sample
| Row | Value |
|---|---|
| 1st row | MILTON KEYNES |
| 2nd row | TYNE AND WEAR |
| 3rd row | ESSEX |
| 4th row | WEST MIDLANDS |
| 5th row | WEST MIDLANDS |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| GREATER LONDON | 3406054 | 12.9% |
| GREATER MANCHESTER | 1165840 | 4.4% |
| WEST MIDLANDS | 1005283 | 3.8% |
| WEST YORKSHIRE | 1000437 | 3.8% |
| KENT | 747469 | 2.8% |
| ESSEX | 732466 | 2.8% |
| HAMPSHIRE | 691504 | 2.6% |
| SURREY | 593744 | 2.3% |
| LANCASHIRE | 593351 | 2.3% |
| HERTFORDSHIRE | 559718 | 2.1% |
| Other values (120) | 15825919 | 60.1% |
ppd_category
Categorical
| Statistic | Value |
|---|---|
| Distinct | 2 |
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.8 MiB |
Unique
| Statistic | Value |
|---|---|
| Unique | 0 |
| Unique (%) | 0.0% |
Sample
| Row | Value |
|---|---|
| 1st row | A |
| 2nd row | A |
| 3rd row | A |
| 4th row | A |
| 5th row | A |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| A | 25365945 | 96.4% |
| B | 955840 | 3.6% |