---
title: "Link Council Tax"
subtitle: "Homelessness Risk"
author: "Laurie Platt"
date: last-modified
date-format: "[Last updated ] DD MMMM, YYYY"
format:
  html:
    code-tools: true
    code-fold: true
    toc: true
    toc-location: left
    toc-depth: 5
    toc-expand: 1
    number-sections: true
    fig-cap-location: top
    code-links:
      - text: dev.azure.com/.../homeless-risk
        icon: git
        href: https://dev.azure.com/SheffieldCityCouncil/DataScience/_git/homeless-risk
execute:
  warning: false
  message: false
include-in-header:
  - text: |
      <style>
      .table {
        width: auto;
      }
      </style>
---
```{python}
#| label: setup
#| context: setup
import os
import datetime
import pandas as pd
import numpy as np
import hl_risk as hl
# Load the homelessness case & household membership dataframes
hl_case = hl.load_hl_case()
hl_case_hh = hl.load_hl_case_hh()
# Load the homelessness client, high level support client,
# and rough sleeper dataframes
hl_client = hl.load_hl_client()
hl_high = hl.load_hl_high()
hl_rough = hl.load_hl_rough()
# Load the homeless households & people dataframes
hl_hh = hl.load_hl_hh()
hl_person = hl.load_hl_person()
# Load the Council Tax arrears dataframe
ctax_arrears = hl.load_ctax_arrears()
# Number of rows in each dataframe
n_hl_case = hl_case.shape[0]
n_hl_case_hh = hl_case_hh.shape[0]
n_ctax_arrears = ctax_arrears.shape[0]
# Earliest and latest case application dates
case_earliest_date = hl_case["application_date"].min()
case_last_date = hl_case["application_date"].max()
```
## Examples
### Linking two tables of persons
Linking without deduplication:
[moj-analytical-services.github.io/splink/demos/examples/duckdb/link_only.html](https://moj-analytical-services.github.io/splink/demos/examples/duckdb/link_only.html)
```{python}
#| label: sample-data
#| tbl-cap: "Sample dataset"
from splink import splink_datasets
df = splink_datasets.fake_1000
# Split a simple dataset into two separate datasets that can be linked together.
df_l = df.sample(frac=0.5)
df_r = df.drop(df_l.index)
df_l.head(2)
```
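Because `sample` is unseeded, the split above draws a different random half on every render. A minimal sketch of how a seeded split could be checked for being disjoint and exhaustive, using a small made-up dataframe rather than `fake_1000`; the `toy` data and the `random_state` value are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical stand-in for splink_datasets.fake_1000
toy = pd.DataFrame(
    {
        "unique_id": range(10),
        "first_name": list("abcdefghij"),
    }
)

# Seeded so that the same rows land in each half on every run
toy_l = toy.sample(frac=0.5, random_state=42)
toy_r = toy.drop(toy_l.index)

# The halves share no rows and together cover the original
no_overlap = toy_l.index.intersection(toy_r.index).empty
covers_all = len(toy_l) + len(toy_r) == len(toy)
print(no_overlap, covers_all)
```

Whether to seed the split in this notebook is a judgement call: a fixed seed makes the tables and charts that follow reproducible between renders.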
<br />
```{python}
#| label: link
#| tbl-cap: "Completeness chart"
import splink.comparison_library as cl
from splink import DuckDBAPI, Linker, SettingsCreator, block_on
# Link-only model: compare records across the two tables,
# without deduplicating within either table
settings = SettingsCreator(
    link_type="link_only",
    # Only score pairs that agree exactly on first name or on surname
    blocking_rules_to_generate_predictions=[
        block_on("first_name"),
        block_on("surname"),
    ],
    comparisons=[
        cl.NameComparison(
            "first_name",
        ),
        cl.NameComparison("surname"),
        cl.DateOfBirthComparison(
            "dob",
            input_is_string=True,
            invalid_dates_as_null=True,
        ),
        cl.ExactMatch("city").configure(term_frequency_adjustments=True),
        cl.EmailComparison("email"),
    ],
)
linker = Linker(
    [df_l, df_r],
    settings,
    db_api=DuckDBAPI(),
    input_table_aliases=["df_left", "df_right"],
)
# Chart how complete each comparison column is in each table
from splink.exploratory import completeness_chart
completeness_chart(
    [df_l, df_r],
    cols=["first_name", "surname", "dob", "city", "email"],
    db_api=DuckDBAPI(),
    table_names_for_chart=["df_left", "df_right"],
)
```
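The two `block_on` rules in the settings restrict scoring to candidate pairs that agree exactly on first name or on surname. A rough pandas sketch of that idea follows. It is not Splink's implementation, which generates the pairs in SQL on the DuckDB backend; the toy records and the `candidate_pairs` helper are hypothetical:

```python
import pandas as pd

# Hypothetical toy records, with no missing values
left = pd.DataFrame(
    {
        "unique_id": [1, 2, 3],
        "first_name": ["amy", "ben", "cat"],
        "surname": ["jones", "smith", "patel"],
    }
)
right = pd.DataFrame(
    {
        "unique_id": [10, 11, 12],
        "first_name": ["amy", "bob", "dan"],
        "surname": ["jonas", "smith", "khan"],
    }
)


def candidate_pairs(left, right, blocking_cols):
    """Return the left/right id pairs that agree exactly on any blocking column."""
    pairs = [
        left.merge(right, on=col, suffixes=("_l", "_r"))[
            ["unique_id_l", "unique_id_r"]
        ]
        for col in blocking_cols
    ]
    # A pair that satisfies more than one rule is only kept once
    return pd.concat(pairs).drop_duplicates().reset_index(drop=True)


pairs = candidate_pairs(left, right, ["first_name", "surname"])
print(pairs)
```

Here only two of the nine possible left/right pairs survive blocking: one through the shared first name and one through the shared surname. The surname typo ("jones" vs "jonas") does not stop that pair being generated, because the first name rule still catches it.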
## Splink
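We're using Splink, the Ministry of Justice's (MoJ) open source library for probabilistic record linkage, to link Council Tax records to the homelessness data.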
### Resources
### Official documentation
Main website: [moj-analytical-services.github.io/splink](https://moj-analytical-services.github.io/splink/)

GitHub repository: [github.com/moj-analytical-services/splink](https://github.com/moj-analytical-services/splink)

Tutorial: [moj-analytical-services.github.io/splink/demos/tutorials/00_Tutorial_Introduction.html](https://moj-analytical-services.github.io/splink/demos/tutorials/00_Tutorial_Introduction.html)

Examples: [moj-analytical-services.github.io/splink/demos/examples/examples_index.html](https://moj-analytical-services.github.io/splink/demos/examples/examples_index.html)
### Blog posts
Splink - MoJ's open source library for probabilistic record linkage at scale:
[www.gov.uk/government/publications/joined-up-data-in-government-the-future-of-data-linking-methods/splink-mojs-open-source-library-for-probabilistic-record-linkage-at-scale](https://www.gov.uk/government/publications/joined-up-data-in-government-the-future-of-data-linking-methods/splink-mojs-open-source-library-for-probabilistic-record-linkage-at-scale)

Splink - Fast, accurate and scalable record linkage:
[dataingovernment.blog.gov.uk/2022/09/23/splink-fast-accurate-and-scalable-record-linkage](https://dataingovernment.blog.gov.uk/2022/09/23/splink-fast-accurate-and-scalable-record-linkage/)

Towards Data Science - Record Linkage:
[towardsdatascience.com/tag/record-linkage](https://towardsdatascience.com/tag/record-linkage/)

Real World Data Science - Splink case study:
[realworlddatascience.net/case-studies/posts/2023/11/22/splink.html](https://realworlddatascience.net/case-studies/posts/2023/11/22/splink.html)

NICD.org.uk - Splink ID guide:
[nicd.org.uk/knowledge-hub/an-end-to-end-guide-to-overcoming-unique-identifier-challenges-with-splink](https://nicd.org.uk/knowledge-hub/an-end-to-end-guide-to-overcoming-unique-identifier-challenges-with-splink)
### Videos and chat
Mastering Record Linkage with Splink - The Big Analytics Query Ep.6:
[www.youtube.com/watch?v=C6HOItmlv8A](https://www.youtube.com/watch?v=C6HOItmlv8A)

Gov Data Science Slack *#chat-data-linking* channel:
[govdatascience.slack.com/archives/C01033UJNSH](https://govdatascience.slack.com/archives/C01033UJNSH)