Entity Relationship Diagram

Homelessness Risk

Author

Laurie Platt

Published

Last updated 13 June, 2025

1 Introduction

We’ve already specified all the table columns and their data types when we imported the different data source files. The objective is to produce an ERD (Entity Relationship Diagram) without having to specify these again. We will have to specify the relationships between these tables as this has not been done yet.

1.1 eralchemy

Based on functionality and evidence of ongoing maintenance and development, we’ve initially chosen Python package {eralchemy} as an ERD tool.

1.2 Graphviz

{eralchemy} requires Graphviz installed, both:

  1. Python package - {graphviz}, and
  2. Binaries - Graphviz download

On a Windows 10 laptop the Graphviz binaries were installed via Scoop: scoop install graphviz (bjansen.github.io/scoop-apps/main/graphviz/). We’re probably not able to install it on a Windows 11 laptop or the Reporting AVD. Our Azure Pipeline for the Data Science website uses an Ubuntu VM so we need to add: sudo apt install graphviz.

1.3 SQLAlchemy

{eralchemy} is built on the {SQLAlchemy} python package.

In addition to an ERD {SQLAlchemy} provides other benefits:

  1. Simplifies creating a database to work from rather than individual Pickle files.
  2. Enables us to easily switch from using one database type to another e.g. from SQLite to SQL Server.

Most of our ERD code relates to SQLAlchemy ORM.

1.3.1 What is ORM?1

“An Object Relation Mapper (ORM) library provides a seamless interface between a class and a SQL table. A class is mapped to a certain table in database, so that cumbersome to and fro conversion between object and SQL types is automated. The products class in Python code can be mapped to Products table in the database. As a result, all CRUD operations are done with the help of object only, not requiring hard coded SQL queries to be used in Python script.

ORMs thus provide an abstraction layer over the raw SQL queries, thus enabling rapid application development. Such ORM libraries are available for most programming languages including Python. SQLAlchemy is a popular database toolkit widely used by Python developers. SQLALchemy’s ORM system transparently synchronizes all changes in state of an object of user defined class with its related row in the database table.

SQLAlchemy interacts with certain type of database in association with respective DB-API compliant module. Its dialect system is able to establish interaction with a database through latter’s DB-API driver. That means, you should have corresponding DB-API module also installed along with SQLAlchemy to be able to use particular type of RDBMS.

As a matter of fact, SQLAlchemy library also contains, in addition to ORM API, the SQL Expression Language (SQLAlchemy Core) that executes primitive constructs of the relational database directly. While our focus in this chapter is on SQLALChemy ORM, we shall also briefly SQL Expression language in the end.”

SQLAlchemy

1.3.2 SQLAlchemy resources

Automap
Set a primary key to an existing column
ORM Quick Start
SQLAlchemy Unified Tutorial
Basic Relationship Patterns

2 Original Setup

Code
import os
import datetime
import pandas as pd
import numpy as np
import hl_risk as hl

# Load the homelessness case & household membership dataframes
hl_case = hl.load_hl_case()
hl_case_hh = hl.load_hl_case_hh()

# Load the homelessness client, high level support client, 
# and rough sleeper dataframes 

hl_client = hl.load_hl_client()
hl_high = hl.load_hl_high()
hl_rough = hl.load_hl_rough()

# Load the homeless households & people dataframes
hl_hh = hl.load_hl_hh()
hl_person = hl.load_hl_person()

# Number of rows in each dataframe
n_hl_case = hl_case.shape[0]
n_hl_case_hh = hl_case_hh.shape[0]

# Earliest and last case
case_earliest_date = hl_case["application_date"].min()
case_last_date = hl_case["application_date"].max()

3 ER diagram

An Entity Relation (ER) diagram of the homelessness data.

3.1 SQLite database

Create a SQLite database from our dataframes.

Code
from __future__ import annotations
from typing import List
from sqlalchemy import create_engine
from sqlalchemy import Integer
from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase #??
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Mapped
from sqlalchemy.orm import mapped_column
from sqlalchemy.orm import relationship
from eralchemy import render_er

# Path of our database with just dataframe tables
harh_dfs_db = "data/harh-dfs.db" 

# Delete database if it already exists
if os.path.exists(harh_dfs_db):
    os.remove(harh_dfs_db)

# Create new SQLite database file
url_engine_dfs = "sqlite:///" + harh_dfs_db
engine_dfs = create_engine(url_engine_dfs)

# Save dataframes as SQLite tables
hl_case.to_sql('hl_case', engine_dfs, if_exists='replace')
hl_case_hh.to_sql('hl_case_hh', engine_dfs, if_exists='replace')
hl_client.to_sql('hl_client', engine_dfs, if_exists='replace')

# class Base(DeclarativeBase):
#     pass

# Draw from database
render_er(url_engine_dfs, 'resources/erd_harh_dfs.png')

3.2 Relationships

Add the relationships between the database tables.

TODO: More efficient way of adding creating tables with relationships

Code
# Path of our project database (with dataframe tables & relationships)
harh_db = "data/harh.db" 

# Delete database if it already exists
if os.path.exists(harh_db):
    os.remove(harh_db)

# Create new SQLite database file
url_engine = "sqlite:///" + harh_db
engine = create_engine(url_engine)

# Create a second SQLite database file
engine = create_engine(url_engine)

# Use automapping
Base = automap_base()

# Define relationships
class HlCase(Base):
    __tablename__ = "hl_case"

    hra_case_id: Mapped[int] = mapped_column(primary_key=True)
    children: Mapped[List["HlClient"]] = relationship(back_populates="parent")

class HlClient(Base):
    __tablename__ = "hl_client"

    index = mapped_column(Integer, primary_key=True)
    hra_case_id = mapped_column(ForeignKey("hl_case.hra_case_id"))
    parent = relationship("Parent", back_populates="children")

# Continue with automapping
Base.prepare(engine_dfs)

# Write schema to new SQLite db
Base.metadata.create_all(engine)

# Append rows from dataframes to SQLite tables
hl_case.to_sql('hl_case', engine, if_exists='append')
hl_case_hh.to_sql('hl_case_hh', engine, if_exists='append')
hl_client.to_sql('hl_client', engine, if_exists='append')

# # Draw from SQLAlchemy base
# render_er(Base, 'resources/erd_harh.png')

# Draw from database
render_er(url_engine, 'resources/erd_harh.png')

Footnotes

  1. ch9.1 Lathkar, M. (2019) Python Data Persistence↩︎