We’ve already specified all the table columns and their data types when we imported the different data source files. The objective is to produce an ERD (Entity Relationship Diagram) without having to specify these again. We will have to specify the relationships between these tables as this has not been done yet.
1.1 eralchemy
Based on functionality and evidence of ongoing maintenance and development, we’ve initially chosen Python package {eralchemy} as an ERD tool.
On a Windows 10 laptop the Graphviz binaries were installed via Scoop: scoop install graphviz (bjansen.github.io/scoop-apps/main/graphviz/). We’re probably not able to install it on a Windows 11 laptop or the Reporting AVD. Our Azure Pipeline for the Data Science website uses an Ubuntu VM so we need to add: sudo apt install graphviz.
1.3 SQLAlchemy
{eralchemy} is built on the {SQLAlchemy} python package.
In addition to an ERD {SQLAlchemy} provides other benefits:
Simplifies creating a database to work from rather than individual Pickle files.
Enables us to easily switch from using one database type to another e.g. from SQLite to SQL Server.
“An Object Relation Mapper (ORM) library provides a seamless interface between a class and a SQL table. A class is mapped to a certain table in database, so that cumbersome to and fro conversion between object and SQL types is automated. The products class in Python code can be mapped to Products table in the database. As a result, all CRUD operations are done with the help of object only, not requiring hard coded SQL queries to be used in Python script.
ORMs thus provide an abstraction layer over the raw SQL queries, thus enabling rapid application development. Such ORM libraries are available for most programming languages including Python. SQLAlchemy is a popular database toolkit widely used by Python developers. SQLALchemy’s ORM system transparently synchronizes all changes in state of an object of user defined class with its related row in the database table.
SQLAlchemy interacts with certain type of database in association with respective DB-API compliant module. Its dialect system is able to establish interaction with a database through latter’s DB-API driver. That means, you should have corresponding DB-API module also installed along with SQLAlchemy to be able to use particular type of RDBMS.
As a matter of fact, SQLAlchemy library also contains, in addition to ORM API, the SQL Expression Language (SQLAlchemy Core) that executes primitive constructs of the relational database directly. While our focus in this chapter is on SQLALChemy ORM, we shall also briefly SQL Expression language in the end.”
import osimport datetimeimport pandas as pdimport numpy as npimport hl_risk as hl# Load the homelessness case & household membership dataframeshl_case = hl.load_hl_case()hl_case_hh = hl.load_hl_case_hh()# Load the homelessness client, high level support client, # and rough sleeper dataframes hl_client = hl.load_hl_client()hl_high = hl.load_hl_high()hl_rough = hl.load_hl_rough()# Load the homeless households & people dataframeshl_hh = hl.load_hl_hh()hl_person = hl.load_hl_person()# Number of rows in each dataframen_hl_case = hl_case.shape[0]n_hl_case_hh = hl_case_hh.shape[0]# Earliest and last casecase_earliest_date = hl_case["application_date"].min()case_last_date = hl_case["application_date"].max()
3 ER diagram
An Entity Relation (ER) diagram of the homelessness data.
3.1 SQLite database
Create a SQLite database from our dataframes.
Code
from __future__ import annotationsfrom typing import Listfrom sqlalchemy import create_enginefrom sqlalchemy import Integerfrom sqlalchemy import Columnfrom sqlalchemy import ForeignKeyfrom sqlalchemy.orm import DeclarativeBase #??from sqlalchemy.ext.automap import automap_basefrom sqlalchemy.orm import Mappedfrom sqlalchemy.orm import mapped_columnfrom sqlalchemy.orm import relationshipfrom eralchemy import render_er# Path of our database with just dataframe tablesharh_dfs_db ="data/harh-dfs.db"# Delete database if it already existsif os.path.exists(harh_dfs_db): os.remove(harh_dfs_db)# Create new SQLite database fileurl_engine_dfs ="sqlite:///"+ harh_dfs_dbengine_dfs = create_engine(url_engine_dfs)# Save dataframes as SQLite tableshl_case.to_sql('hl_case', engine_dfs, if_exists='replace')hl_case_hh.to_sql('hl_case_hh', engine_dfs, if_exists='replace')hl_client.to_sql('hl_client', engine_dfs, if_exists='replace')# class Base(DeclarativeBase):# pass# Draw from databaserender_er(url_engine_dfs, 'resources/erd_harh_dfs.png')
3.2 Relationships
Add the relationships between the database tables.
TODO: More efficient way of adding creating tables with relationships
Code
# Path of our project database (with dataframe tables & relationships)harh_db ="data/harh.db"# Delete database if it already existsif os.path.exists(harh_db): os.remove(harh_db)# Create new SQLite database fileurl_engine ="sqlite:///"+ harh_dbengine = create_engine(url_engine)# Create a second SQLite database fileengine = create_engine(url_engine)# Use automappingBase = automap_base()# Define relationshipsclass HlCase(Base): __tablename__ ="hl_case" hra_case_id: Mapped[int] = mapped_column(primary_key=True) children: Mapped[List["HlClient"]] = relationship(back_populates="parent")class HlClient(Base): __tablename__ ="hl_client" index = mapped_column(Integer, primary_key=True) hra_case_id = mapped_column(ForeignKey("hl_case.hra_case_id")) parent = relationship("Parent", back_populates="children")# Continue with automappingBase.prepare(engine_dfs)# Write schema to new SQLite dbBase.metadata.create_all(engine)# Append rows from dataframes to SQLite tableshl_case.to_sql('hl_case', engine, if_exists='append')hl_case_hh.to_sql('hl_case_hh', engine, if_exists='append')hl_client.to_sql('hl_client', engine, if_exists='append')# # Draw from SQLAlchemy base# render_er(Base, 'resources/erd_harh.png')# Draw from databaserender_er(url_engine, 'resources/erd_harh.png')
Footnotes
ch9.1 Lathkar, M. (2019) Python Data Persistence↩︎
Source Code
---title: "Entity Relationship Diagram"subtitle: "Homelessness Risk"author: "Laurie Platt"date: last-modifieddate-format: "[Last updated ] DD MMMM, YYYY"format: html: code-tools: true code-fold: true toc: true toc-location: left toc-depth: 5 toc-expand: 1 number-sections: true fig-cap-location: top code-links: - text: dev.azure.com/.../homeless-risk icon: git href: https://dev.azure.com/SheffieldCityCouncil/DataScience/_git/homeless-riskexecute: warning: false message: falseinclude-in-header: - text: | <style> .table { width: auto; } </style>---# IntroductionWe've already specified all the table columns and their data types when we imported the different data source files. The objective is to produce an ERD (Entity Relationship Diagram) without having to specify these again. We will have to specify the relationships between these tables as this has not been done yet. ## eralchemyBased on functionality and evidence of ongoing maintenance and development, we've initially chosen Python package [{eralchemy}](https://pypi.org/project/eralchemy/) as an ERD tool. ## Graphviz{eralchemy} requires [Graphviz](https://graphviz.org/) installed, both: 1. Python package - [{graphviz}](https://pypi.org/project/graphviz/), and 2. Binaries - [Graphviz download](https://www.graphviz.org/download/)On a Windows 10 laptop the Graphviz binaries were installed via Scoop: `scoop install graphviz` ([bjansen.github.io/scoop-apps/main/graphviz/](https://bjansen.github.io/scoop-apps/main/graphviz/)). We're probably not able to install it on a Windows 11 laptop or the *Reporting* AVD. Our Azure Pipeline for the [Data Science website](https://scc-data-science.sheffield.gov.uk/) uses an Ubuntu VM so we need to add: `sudo apt install graphviz`. ## SQLAlchemy {eralchemy} is built on the {[SQLAlchemy](https://www.sqlalchemy.org/)} python package. In addition to an ERD {SQLAlchemy} provides other benefits: 1. Simplifies creating a database to work from rather than individual Pickle files. 1. Enables us to easily switch from using one database type to another e.g. from SQLite to SQL Server. Most of our ERD code relates to SQLAlchemy ORM. ### What is ORM?^[ch9.1 Lathkar, M. (2019) Python Data Persistence]"An Object Relation Mapper (ORM) library provides a seamless interface between a class and a SQL table. A class is mapped to a certain table in database, so that cumbersome to and fro conversion between object and SQL types is automated. The products class in Python code can be mapped to Products table in the database. As a result, all CRUD operations are done with the help of object only, not requiring hard coded SQL queries to be used in Python script. ORMs thus provide an abstraction layer over the raw SQL queries, thus enabling rapid application development. Such ORM libraries are available for most programming languages including Python. SQLAlchemy is a popular database toolkit widely used by Python developers. SQLALchemy’s ORM system transparently synchronizes all changes in state of an object of user defined class with its related row in the database table.SQLAlchemy interacts with certain type of database in association with respective DB-API compliant module. Its dialect system is able to establish interaction with a database through latter’s DB-API driver. That means, you should have corresponding DB-API module also installed along with SQLAlchemy to be able to use particular type of RDBMS.As a matter of fact, SQLAlchemy library also contains, in addition to ORM API, the SQL Expression Language (SQLAlchemy Core) that executes primitive constructs of the relational database directly. While our focus in this chapter is on SQLALChemy ORM, we shall also briefly SQL Expression language in the end." ### SQLAlchemy resources [Automap](https://docs.sqlalchemy.org/en/20/orm/extensions/automap.html)[Set a primary key to an existing column](https://stackoverflow.com/questions/45703143/sqlalchemy-setting-a-primary-key-to-a-pre-existing-db-table-not-using-sqlite)[ORM Quick Start](https://docs.sqlalchemy.org/en/20/orm/quickstart.html)[SQLAlchemy Unified Tutorial](https://docs.sqlalchemy.org/en/20/tutorial/index.html#unified-tutorial)[Basic Relationship Patterns](https://docs.sqlalchemy.org/en/20/orm/basic_relationships.html)# Original Setup ```{python}#| label: setup#| context: setupimport osimport datetimeimport pandas as pdimport numpy as npimport hl_risk as hl# Load the homelessness case & household membership dataframeshl_case = hl.load_hl_case()hl_case_hh = hl.load_hl_case_hh()# Load the homelessness client, high level support client, # and rough sleeper dataframes hl_client = hl.load_hl_client()hl_high = hl.load_hl_high()hl_rough = hl.load_hl_rough()# Load the homeless households & people dataframeshl_hh = hl.load_hl_hh()hl_person = hl.load_hl_person()# Number of rows in each dataframen_hl_case = hl_case.shape[0]n_hl_case_hh = hl_case_hh.shape[0]# Earliest and last casecase_earliest_date = hl_case["application_date"].min()case_last_date = hl_case["application_date"].max()```# ER diagramAn Entity Relation (ER) diagram of the homelessness data.## SQLite databaseCreate a SQLite database from our dataframes. ```{python}#| label: sqlite-dfsfrom __future__ import annotationsfrom typing import Listfrom sqlalchemy import create_enginefrom sqlalchemy import Integerfrom sqlalchemy import Columnfrom sqlalchemy import ForeignKeyfrom sqlalchemy.orm import DeclarativeBase #??from sqlalchemy.ext.automap import automap_basefrom sqlalchemy.orm import Mappedfrom sqlalchemy.orm import mapped_columnfrom sqlalchemy.orm import relationshipfrom eralchemy import render_er# Path of our database with just dataframe tablesharh_dfs_db ="data/harh-dfs.db"# Delete database if it already existsif os.path.exists(harh_dfs_db): os.remove(harh_dfs_db)# Create new SQLite database fileurl_engine_dfs ="sqlite:///"+ harh_dfs_dbengine_dfs = create_engine(url_engine_dfs)# Save dataframes as SQLite tableshl_case.to_sql('hl_case', engine_dfs, if_exists='replace')hl_case_hh.to_sql('hl_case_hh', engine_dfs, if_exists='replace')hl_client.to_sql('hl_client', engine_dfs, if_exists='replace')# class Base(DeclarativeBase):# pass# Draw from databaserender_er(url_engine_dfs, 'resources/erd_harh_dfs.png')```## RelationshipsAdd the relationships between the database tables.TODO: More efficient way of adding creating tables with relationships```{python}#| label: relationships# Path of our project database (with dataframe tables & relationships)harh_db ="data/harh.db"# Delete database if it already existsif os.path.exists(harh_db): os.remove(harh_db)# Create new SQLite database fileurl_engine ="sqlite:///"+ harh_dbengine = create_engine(url_engine)# Create a second SQLite database fileengine = create_engine(url_engine)# Use automappingBase = automap_base()# Define relationshipsclass HlCase(Base): __tablename__ ="hl_case" hra_case_id: Mapped[int] = mapped_column(primary_key=True) children: Mapped[List["HlClient"]] = relationship(back_populates="parent")class HlClient(Base): __tablename__ ="hl_client" index = mapped_column(Integer, primary_key=True) hra_case_id = mapped_column(ForeignKey("hl_case.hra_case_id")) parent = relationship("Parent", back_populates="children")# Continue with automappingBase.prepare(engine_dfs)# Write schema to new SQLite dbBase.metadata.create_all(engine)# Append rows from dataframes to SQLite tableshl_case.to_sql('hl_case', engine, if_exists='append')hl_case_hh.to_sql('hl_case_hh', engine, if_exists='append')hl_client.to_sql('hl_client', engine, if_exists='append')# # Draw from SQLAlchemy base# render_er(Base, 'resources/erd_harh.png')# Draw from databaserender_er(url_engine, 'resources/erd_harh.png')```