Overview & methodology
Homelessness Risk
1 Introduction
The aim of this proof-of-concept project is to predict which households are most at risk of homelessness in the next 6 to 12 months.
As part of the Council’s prevention-focused services, we aim to identify and resolve housing issues earlier. Doing so can open up more support options that deliver better, more cost-effective outcomes.
2 Target variable
“The target variable is the feature of a dataset that you want to understand more clearly. It is the variable that the user would want to predict using the rest of the dataset” [1]. The “target” is alternatively known as the “outcome”, or “dependent” variable.
Our target variable is:
Within 6 to 12 months a household reaches a point of housing crisis and presents to Sheffield’s Housing Options & Advice service.
Our target variable is binary: the household will or will not present to the service.
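As a minimal sketch of how this binary target could be derived, the function below labels a household 1 if it presents within a window after a chosen reference date, and 0 otherwise. The function name, the reference date and the window bounds in days (183 and 365, approximating 6 and 12 months) are assumptions for illustration, not part of the project specification.

```python
from datetime import date, timedelta

# Assumed window bounds in days (roughly 6 and 12 months); illustrative only.
WINDOW_START_DAYS = 183
WINDOW_END_DAYS = 365


def presents_in_window(reference_date, presentation_dates,
                       start_days=WINDOW_START_DAYS, end_days=WINDOW_END_DAYS):
    """Return 1 if any presentation falls inside the window, else 0."""
    window_start = reference_date + timedelta(days=start_days)
    window_end = reference_date + timedelta(days=end_days)
    return int(any(window_start <= d <= window_end for d in presentation_dates))


# Hypothetical household presenting 8 months after the reference date.
label = presents_in_window(date(2024, 1, 1), [date(2024, 9, 1)])
```

If the intended meaning were "at any point in the next 12 months", setting start_days to 0 would give that variant.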
“Imbalanced datasets” are a common issue for predictive analysis and can lead to trained models that make biased predictions and show poor overall performance.
If our dataset were to remain limited to just homelessness cases, every record would be a presentation to the service: a 100% “majority class” of our target variable, with no examples of the other class for a model to learn from.
An alternative, hypothetical scenario at the other extreme is that we have a dataset of the entire Sheffield population. From the 2021 census, Sheffield has 556,500 individuals in 232,000 households. Our homelessness data includes 20,110 cases over the last 6 years. We’re looking to predict cases in the next 6 to 12 months, so over a 6 month period that’s around 1,680 cases. Households that present would therefore be only about 0.7% of all households (1,680 out of 232,000), a very small “minority class” of our target variable (predicting a needle in a haystack). In a meeting early in the project, Nicola mentioned that the homelessness prediction literature highlights this as an issue, and she suggested some sort of filtering.
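The base-rate arithmetic in this scenario can be checked with a few lines of Python. This is a rough sketch using only the figures quoted above; it assumes each case corresponds to a distinct household and that cases are spread evenly over the 6 years, neither of which has been verified.

```python
# Figures quoted in the text above.
households = 232_000      # Sheffield households, 2021 census
cases_6_years = 20_110    # homelessness cases over the last 6 years

# 6 years contain 12 six-month periods (assumes an even spread of cases).
cases_per_6_months = cases_6_years / 12

# Share of households presenting in a 6 month period
# (assumes one case per household, no repeat presentations).
base_rate = cases_per_6_months / households

print(f"{cases_per_6_months:.0f} cases per 6 months")
print(f"{base_rate:.2%} of households")
```

The result is a proportion of roughly 0.007, which is about 0.7% when expressed as a percentage of households.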
In practice, we don’t have a dataset of the entire Sheffield population, but we will link data to create a dataset with more households than just those included in the homelessness data. In effect, our linkage of different data will provide some of the “filtering” of the whole Sheffield population that Nicola has suggested. In addition, there are statistical techniques such as “oversampling” and “undersampling” that can assist with imbalanced data.
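To illustrate what “undersampling” and “oversampling” mean in practice, the sketch below balances a toy dataset using only the Python standard library. The function names, the `presented` label key and the toy data are all assumptions for illustration; no resampling method has been chosen for the project.

```python
import random


def _split_classes(rows, label_key):
    """Return (minority, majority) lists of rows for a binary 0/1 label."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        raise ValueError("need at least one row of each class")
    return minority, majority


def random_undersample(rows, label_key="presented", seed=0):
    """Drop majority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_classes(rows, label_key)
    return minority + rng.sample(majority, len(minority))


def random_oversample(rows, label_key="presented", seed=0):
    """Duplicate minority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_classes(rows, label_key)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra


# Toy data: 100 hypothetical households, 3 of which present.
rows = [{"household_id": i, "presented": int(i < 3)} for i in range(100)]
under = random_undersample(rows)   # 3 presenting + 3 non-presenting rows
over = random_oversample(rows)     # 97 presenting + 97 non-presenting rows
```

Resampling of this kind is normally applied to the training data only, so that the model is still evaluated against the real class balance.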
3 Unit of measurement
The unit of measurement for our target variable is expressed above as the household. Both the Project Mandate and homelessness case records have households as our unit of measurement. However, a household is mutable over time, and indeed a change in household membership (e.g. partners splitting up) may be a factor in homelessness. We also don’t know yet if, or how, the other data sources record households. This is an aspect that requires further consideration. We may need to disaggregate to individuals and reaggregate to households.
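One possible shape for the disaggregate and reaggregate step is sketched below. The record layout, identifiers and the person-level flag are entirely hypothetical, since we do not yet know how the other data sources record households; the "any member" rule is just one possible aggregation.

```python
from collections import defaultdict

# Hypothetical records: one row per household, listing its members.
households = [
    {"household_id": "H1", "member_ids": ["P1", "P2"]},
    {"household_id": "H2", "member_ids": ["P3"]},
]

# Hypothetical person-level flag taken from another data source.
person_flags = {"P1": False, "P2": True, "P3": False}

# Disaggregate: one row per (household, person) pair.
person_rows = [
    {"household_id": h["household_id"], "person_id": p}
    for h in households
    for p in h["member_ids"]
]

# Link at person level, then reaggregate:
# a household is flagged if any of its members is flagged.
household_flag = defaultdict(bool)
for row in person_rows:
    household_flag[row["household_id"]] |= person_flags.get(row["person_id"], False)

print(dict(household_flag))
```

Because household membership changes over time, a fuller version would also need to record the date at which each membership applies.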
4 APPENDIX
4.1 Homelessness definition
The definition of homelessness covers both homelessness and threatened with homelessness [2].
A homelessness case in our homelessness data represents a household, not an individual: if a household is homeless, the data records this as a single case rather than one case record for every individual in the household.
The definition of homelessness in our data is where the circumstances field in the homelessness data is one of the following values:
- Already homeless - Relief Duty owed (include accepted local connection referrals)
- HRA case owed reapplication duty - prevention
- HRA case owed reapplication duty - relief
- Legacy case - (pre HRA also includes pre HRA reapplications)
- Local connection referral - Main duty accepted
- Threatened with homelessness - Prevention Duty owed
- Threatened with homelessness due to service of valid Section 21 Notice - Prevention Duty owed
And not one of these circumstances values:
- Not eligible / no longer eligible
- Not threatened with homelessness within 56 days
- Withdrew application before assessment
On this basis, 96% of our homelessness cases since 2019 meet the definition of homelessness: 19,400 out of 20,110 over the last 6 years.
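The definition above can be expressed as a simple lookup on the circumstances field. In this sketch the value strings are copied from the lists above, while the constant and function names are assumptions for illustration. Raising an error on unrecognised values is a design choice so that unexpected data is not silently treated as "not homeless".

```python
# Circumstances values that count as homeless (copied from the list above).
HOMELESS_CIRCUMSTANCES = {
    "Already homeless - Relief Duty owed (include accepted local connection referrals)",
    "HRA case owed reapplication duty - prevention",
    "HRA case owed reapplication duty - relief",
    "Legacy case - (pre HRA also includes pre HRA reapplications)",
    "Local connection referral - Main duty accepted",
    "Threatened with homelessness - Prevention Duty owed",
    "Threatened with homelessness due to service of valid Section 21 Notice"
    " - Prevention Duty owed",
}

# Circumstances values that do not count as homeless.
EXCLUDED_CIRCUMSTANCES = {
    "Not eligible / no longer eligible",
    "Not threatened with homelessness within 56 days",
    "Withdrew application before assessment",
}


def is_homeless_case(circumstances):
    """Return True or False for a known circumstances value, else raise."""
    value = circumstances.strip()
    if value in HOMELESS_CIRCUMSTANCES:
        return True
    if value in EXCLUDED_CIRCUMSTANCES:
        return False
    raise ValueError(f"Unrecognised circumstances value: {value!r}")


# Share of cases meeting the definition, using the counts quoted above.
share_homeless = 19_400 / 20_110
print(f"{share_homeless:.0%}")
```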
4.2 Homelessness process
The Council’s duties relating to homelessness are defined by the Homelessness Reduction Act (HRA) 2017.
To what extent is the post-homelessness process relevant to our risk model?
Footnotes
[1] What is a Target Variable? h2o.ai/wiki/target-variable
[2] Shelter’s legal definition of homelessness (england.shelter.org.uk/professional_resources/legal/homelessness_applications/homelessness_and_threatened_homelessness/legal_definition_of_homelessness_and_threatened_homelessness)