Link Council Tax

Homelessness Risk

Author

Laurie Platt

Published

Last updated 17 June, 2025

Code

import os
import datetime
import pandas as pd
import numpy as np
import hl_risk as hl

# Load the homelessness case & household membership dataframes
hl_case = hl.load_hl_case()
hl_case_hh = hl.load_hl_case_hh()

# Load the homelessness client, high-level support client,
# and rough sleeper dataframes
hl_client = hl.load_hl_client()
hl_high = hl.load_hl_high()
hl_rough = hl.load_hl_rough()

# Load the homeless households & people dataframes
hl_hh = hl.load_hl_hh()
hl_person = hl.load_hl_person()

# Load the Council Tax arrears dataframe
ctax_arrears = hl.load_ctax_arrears()

# Number of rows in each dataframe
n_hl_case = hl_case.shape[0]
n_hl_case_hh = hl_case_hh.shape[0]
n_ctax_arrears = ctax_arrears.shape[0]

# Earliest and latest case application dates
case_earliest_date = hl_case["application_date"].min()
case_last_date = hl_case["application_date"].max()

1 Examples

1.1 Linking two tables of persons

Linking without deduplication:
moj-analytical-services.github.io/splink/demos/examples/duckdb/link_only.html

Code

from splink import splink_datasets

df = splink_datasets.fake_1000

# Split the dataset into two disjoint halves that can be linked together.
df_l = df.sample(frac=0.5)
df_r = df.drop(df_l.index)

df_l.head(2)

downloading: https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/fake_1000.csv

Sample dataset
	unique_id	first_name	surname	dob	city	email	cluster
86	86	Charlotte	Johnson	2012-01-06	fTelford	charlottej68@lee-taylor@.org	25
937	937	Isabelle	Hall	1984-11-20	Swansae	isabelleh97m@lewis-gregory.com	235

Code

import splink.comparison_library as cl
from splink import DuckDBAPI, Linker, SettingsCreator, block_on
from splink.exploratory import completeness_chart

# Link-only settings: compare records across the two tables, never within one
settings = SettingsCreator(
    link_type="link_only",
    # Only score pairs that agree exactly on first name or on surname
    blocking_rules_to_generate_predictions=[
        block_on("first_name"),
        block_on("surname"),
    ],
    comparisons=[
        cl.NameComparison("first_name"),
        cl.NameComparison("surname"),
        cl.DateOfBirthComparison(
            "dob",
            input_is_string=True,
            invalid_dates_as_null=True,
        ),
        cl.ExactMatch("city").configure(term_frequency_adjustments=True),
        cl.EmailComparison("email"),
    ],
)

# The linker is created here but not yet trained or used for prediction
linker = Linker(
    [df_l, df_r],
    settings,
    db_api=DuckDBAPI(),
    input_table_aliases=["df_left", "df_right"],
)

# Chart the proportion of non-null values in each column of both tables
completeness_chart(
    [df_l, df_r],
    cols=["first_name", "surname", "dob", "city", "email"],
    db_api=DuckDBAPI(),
    table_names_for_chart=["df_left", "df_right"],
)
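As a rough illustration of what the two blocking rules above do in a link-only job, the sketch below builds candidate pairs in plain Python rather than in Splink. The toy records and the `candidate_pairs` helper are hypothetical and not part of Splink. It assumes that a rule such as `block_on("first_name")` keeps only left-right pairs whose values are equal and non-null, and that pairs produced by several rules are combined without duplicates.

```python
# Hypothetical toy records, not taken from fake_1000
left = [
    {"unique_id": 1, "first_name": "Ann", "surname": "Smith"},
    {"unique_id": 2, "first_name": "Bob", "surname": "Jones"},
    {"unique_id": 3, "first_name": None, "surname": "Patel"},
]
right = [
    {"unique_id": 10, "first_name": "Ann", "surname": "Smyth"},
    {"unique_id": 11, "first_name": "Rob", "surname": "Jones"},
    {"unique_id": 12, "first_name": None, "surname": "Khan"},
    {"unique_id": 13, "first_name": "Ann", "surname": "Smith"},
]


def candidate_pairs(left, right, block_columns):
    """Return (left_id, right_id) pairs that agree on at least one blocking column."""
    pairs = set()
    for col in block_columns:
        # Index the right-hand table by the blocking column, skipping nulls
        index = {}
        for rec in right:
            if rec[col] is not None:
                index.setdefault(rec[col], []).append(rec["unique_id"])
        # Look up each left-hand record; a set removes pairs found by an earlier rule
        for rec in left:
            if rec[col] is not None:
                for right_id in index.get(rec[col], []):
                    pairs.add((rec["unique_id"], right_id))
    return sorted(pairs)


pairs = candidate_pairs(left, right, ["first_name", "surname"])
print(f"{len(pairs)} candidate pairs out of {len(left) * len(right)} possible")
```

Only pairs that cross the two tables are generated, which is the point of `link_type="link_only"`: records within the same table are never compared with each other.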

2 Splink

We use Splink, the Ministry of Justice’s open source library for probabilistic record linkage, to link records.
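To give a feel for what "probabilistic" means here, the sketch below works through the Fellegi-Sunter style scoring that Splink is built on: each comparison contributes a match weight of log2(m / u), and the weights are added to a prior to give a match probability. The function names are hypothetical, and the m, u, and prior values are made up for illustration. They are not parameters estimated from any dataset in this project.

```python
import math


def match_weight(m, u):
    """Match weight for one comparison level: log2 of the Bayes factor m / u."""
    return math.log2(m / u)


def match_probability(prior, levels):
    """Combine a prior match probability with (m, u) pairs, one per comparison."""
    # Start from the prior expressed as a match weight (log2 odds)
    total_weight = math.log2(prior / (1 - prior))
    for m, u in levels:
        total_weight += match_weight(m, u)
    # Convert the summed weight back from log2 odds to a probability
    odds = 2 ** total_weight
    return odds / (1 + odds)


# Illustrative (m, u) values only:
# m = P(observation | records match), u = P(observation | records do not match)
levels = [
    (0.9, 0.01),   # first_name agrees
    (0.9, 0.005),  # surname agrees
    (0.1, 0.9),    # dob disagrees
]
probability = match_probability(prior=0.01, levels=levels)
print(round(probability, 3))
```

Agreement on a rare value (small u) pushes the weight up sharply, while a disagreement on a normally reliable field (small m) pulls it back down.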

2.1 Resources

2.2 Official documentation

Main website: moj-analytical-services.github.io/splink
GitHub repository: github.com/moj-analytical-services/splink
Tutorial: moj-analytical-services.github.io/splink/demos/tutorials/00_Tutorial_Introduction.html
Examples: moj-analytical-services.github.io/splink/demos/examples/examples_index.html

2.3 Blog posts

Splink - MoJ’s open source library for probabilistic record linkage at scale:
www.gov.uk/government/publications/joined-up-data-in-government-the-future-of-data-linking-methods/splink-mojs-open-source-library-for-probabilistic-record-linkage-at-scale
Splink - Fast, accurate and scalable record linkage:
dataingovernment.blog.gov.uk/2022/09/23/splink-fast-accurate-and-scalable-record-linkage
Towards Data Science - Record Linkage:
towardsdatascience.com/tag/record-linkage
Real World Data Science - Splink case study:
https://realworlddatascience.net/case-studies/posts/2023/11/22/splink.html
NICD.org.uk - Splink ID guide:
nicd.org.uk/knowledge-hub/an-end-to-end-guide-to-overcoming-unique-identifier-challenges-with-splink

2.4 Videos and chat

Mastering Record Linkage with Splink - The Big Analytics Query Ep.6:
www.youtube.com/watch?v=C6HOItmlv8A
Gov Data Science Slack #chat-data-linking channel:
govdatascience.slack.com/archives/C01033UJNSH
