load(str_c(data_folder,"attendance_inclusion_data_model.RData"))
This work was undertaken by the Sheffield City Council Business Intelligence team from around September 2023. New analysis was carried out on available data with the aim of understanding school attendance in Sheffield and informing the requirements of the city’s response. This report summarises the findings of that analysis, along with commentary derived from discussions of those findings with colleagues in SCC, Learn Sheffield and Sheffield schools.
This report covers the following:
Within the same analysis but out of scope of this report are:
Unless otherwise stated, absence refers to both authorised and unauthorised absences. Correspondingly, attendance refers to registered time in the classroom. Absence in this report may include periods of study leave and approved offsite activity.
Unless otherwise stated, the word year refers to the academic or exam year. So 2023 refers to the period of schooling between September ’22 and July ’23.
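The year convention above can be sketched as a small helper. This is only an illustration: `academic_year` is a hypothetical function, not part of the report's data model, and it assumes that every academic year starts on 1 September, which will not match exact term dates.

```r
# Hypothetical helper: map a calendar date to the academic-year label
# used in this report (the calendar year in which the school year ends).
# Assumption: the academic year starts on 1 September.
academic_year <- function(date) {
  date <- as.Date(date)
  yr <- as.integer(format(date, "%Y"))
  mth <- as.integer(format(date, "%m"))
  # September to December belong to the academic year ending next summer
  ifelse(mth >= 9, yr + 1L, yr)
}

# Both dates fall within the academic year labelled 2023
academic_year(c("2022-09-05", "2023-07-21"))
```

Under this convention a date in the Autumn term of 2024 would be labelled 2025, which is why the 2025 points in the charts cover the Autumn term only.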
Attendance, exclusion and school registration data and student details used in this report are from Capita One, retrieved from the OSCAR database, which is maintained by the Performance & Analysis Service (PAS). Supplementary information on school types and locations, geography & deprivation are held in spreadsheets.
An R script gathers, combines, processes and aggregates this data into a data model. That data model was last updated 16/8/24 to include the first release of the full year 2024 attendance data.
A more detailed description of this data processing is provided in the appendix.
v0.1 - 1/7/24 - Giles Robinson. First complete draft for circulation.
v0.2 - 16/8/24 - GR - updated with latest data, full 2024 academic year, various revisions, analysis of daily data; young carers
Recent changes in overall attendance, by the major reasons covered by Department for Education (DfE) absence codes. We also discuss some codes that do not count as absences, but contribute to the picture around attendance, such as late present.
The COVID-19 pandemic and lockdowns saw a significant drop in attendance rates, although many of these trends began before the pandemic. Secondary age pupils were affected more than those in primary. The 2024 data shows a gap continuing to open up, with attendance improving in primary schools but worsening in secondary schools.
A significant factor in secondary school absence in 2024 is the return of study leave as a coded absence reason (see below). This accounts for around 1% of secondary age absences.
ggplot(attend_year_phase |>
filter(phase %in% c("Primary","Secondary"),
year >= 2018
) |>
mutate(grey_flag = if_else(year==2025,0,1)),
aes(x = year, y = percent_present, alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
scale_y_continuous(labels = scales::percent) +
labs(title = "Overall attendance in Sheffield <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "percentage of available sessions marked present per year; 2025 is Autumn term only",
caption = "data from Capita One") +
annotate("text", x = 2020.3, y = 0.9, label = "COVID-19", size = 2.5, hjust = 1.1) +
geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) +
MetBrewer::scale_fill_met_d("Egypt")+
coord_cartesian(clip = "off")
Our data prior to 2018 is less reliable and less complete, but taking a longer view suggests that at least some of the drivers of recent trends predate the pandemic. Attendance improved to a peak in 2016 and then gradually declined, particularly in secondary schools. Attendance was already worsening before COVID, but the pandemic brought a much sharper drop, and attendance at secondary schools is now lower than in any year for which we have data.
ggplot(attend_year_phase |>
filter(phase %in% c("Primary","Secondary")) |>
mutate(grey_flag = if_else(year==2025,0,1)),
aes(x = year, y = percent_present, alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
scale_y_continuous(labels = scales::percent) +
labs(title = "Attendance in Sheffield <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "All available data; percentage of available sessions marked present per year; 2025 is Autumn term only",
caption = "data from Capita One") +
annotate("text", x = 2020.3, y = 0.9, label = "COVID-19", size = 3, hjust = 1.1) +
geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) +
MetBrewer::scale_fill_met_d("Egypt")+
coord_cartesian(clip = "off")
The recorded data on illness shows an increase year on year. The big rise into 2022 particularly affected primary age children, and was probably a mixture of COVID-19 itself and post-lockdown viral bounce-back.
Patterns in the day-level data and feedback from head teachers suggest that we are probably not seeing the true picture on illness. Differences in reporting (and the honesty of parents), policy and recording may be as significant here as changes in actual illness. See the later charts plotting a full year day-by-day.
ggplot(attend_year_phase |>
filter(phase %in% c("Primary","Secondary"),
year >= 2018) |>
mutate(grey_flag = if_else(year==2025,0,1)),
aes(x = year, y = percent_illness, alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
scale_y_continuous(labels = scales::percent, limits = c(0,0.04)) +
labs(title = "Illness by academic year in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "percentage of sessions missed per year (code I); 2025 is Autumn term only",
caption = "data from Capita One") +
#annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1) +
#geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) +
MetBrewer::scale_fill_met_d("Egypt")+
coord_cartesian(clip = "off")
The data on illness is worth monitoring in 2025 and considering in relation to schools’ recording policies, particularly at the day level, and in proximity to bank holidays and half term holidays. It’s possible that the new DfE rules around penalty notices and family holidays will create perverse incentives to increase reported illness rates.
Head teachers report that lateness, even if marked as late present, can have a significant impact on activities that are regularly done first thing in the day, such as phonics. Lateness can be recorded as late present or late absent, the latter meaning that the child attends only after the registers have closed. Both categories are on the rise, with late absence in primary schools in particular a growing problem. In secondary schools, late present is more common, and rising.
late_present <- attend_year_phase |>
filter(phase %in% c("Primary","Secondary"),
year >= 2018) |>
select(year, phase, percent_late_pres) |>
rename(value = percent_late_pres) |>
mutate(
category = "late present",
grey_flag = if_else(year==2025,0,1)
)
ggplot(
late_present,aes(x = year, y = value, alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal +
scale_y_continuous(labels = scales::percent) +
labs(title = "Lateness (marked present) by academic year in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "percentage of sessions missed per year (code L); 2025 is Autumn term only",
caption = "data from Capita One") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none", axis.text.x = eb, axis.title.x = eb, axis.title.y = eb,
strip.background = eb, strip.placement = "top", strip.text = element_text(size = 11)) +
MetBrewer::scale_fill_met_d("Egypt") +
#facet_grid(rows = vars(category), scales = "free_y") +
coord_cartesian(clip = "off")
remove(late_present)
Late absent (after registers closed) is dramatically up in 2025 (Autumn term only showing here):
late_absent <- attend_year_phase |>
filter(phase %in% c("Primary","Secondary"),
year >= 2018) |>
select(year, phase, percent_late_absent) |>
rename(value = percent_late_absent) |>
mutate(
category = "late absent",
grey_flag = if_else(year==2025,0,1)
)
ggplot(
late_absent,aes(x = year, y = value, alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal +
scale_y_continuous(labels = scales::percent) +
labs(title = "Lateness (marked absent) by academic year in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "percentage of sessions missed per year (code U); 2025 is Autumn term only",
caption = "data from Capita One") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none", axis.text.x = eb, axis.title.x = eb, axis.title.y = eb,
strip.background = eb, strip.placement = "top", strip.text = element_text(size = 11)) +
MetBrewer::scale_fill_met_d("Egypt") +
coord_cartesian(clip = "off")
remove(late_absent)
Absences for family holidays are higher in primary, but have risen in secondary also. The cost of living crisis likely plays a part here. Rates appear lower in 2024, but at the time of writing the summer term is missing from the 2024 data, which is excluded from the plot below. Family holidays can be authorised or unauthorised, but due to differences in recording and coding policy between schools, both are grouped together here.
New DfE guidance and harsher penalties around family holiday absences come into effect August 2024, and are likely to impact this trend.
ggplot(attend_year_phase |>
filter(phase %in% c("Primary","Secondary"),
year >= 2018) |>
mutate(grey_flag = if_else(year==2025,0,1)),
aes(x = year, y = percent_family_holiday, alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
scale_y_continuous(labels = scales::percent) +
labs(title = "Family holidays by academic year in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "percentage of sessions missed per year (codes F, H & G); 2025 is Autumn term only",
caption = "data from Capita One") +
annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1, colour = "gray") +
geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) +
MetBrewer::scale_fill_met_d("Egypt")+
coord_cartesian(clip = "off")
Exclusions have risen very rapidly, particularly in secondary schools. This is mostly driven by temporary suspensions, largely as schools clamp down on what is classified as persistent disruptive behaviour. This makes only a small contribution to overall absence rates, but is growing, and for some children is a major contribution to their overall school absence. Exclusion rates in 2025 (part year at the time of writing) look to have levelled off.
ggplot(attend_year_phase |>
filter(phase %in% c("Primary","Secondary"),
year >= 2018) |>
mutate(grey_flag = if_else(year==2025,0,1)),
aes(x = year, y = percent_excluded, alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
scale_y_continuous(labels = scales::percent) +
labs(title = "Absence due to exclusion, by academic year in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "percentage of sessions missed per year (code E); 2025 is Autumn term only",
caption = "data from Capita One") +
#annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1) +
#geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) +
MetBrewer::scale_fill_met_d("Egypt") +
coord_cartesian(clip = "off")
2024 saw the return of study leave as a coded absence reason, with a significant impact on overall attendance levels in secondary and particularly Y11. At the time of writing, 2025 is the Autumn term only and shows only minimal study leave.
ggplot(attend_year_phase |>
filter(phase %in% c("Primary","Secondary"),
year >= 2018) |>
mutate(grey_flag = if_else(year==2025,0,1)),
aes(x = year, y = percent_study_leave, alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
scale_y_continuous(labels = scales::percent) +
labs(title = "Study leave, by academic year in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "percentage of sessions missed per year (code S); 2025 is Autumn term only",
caption = "data from Capita One") +
annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1) +
geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) +
MetBrewer::scale_fill_met_d("Egypt") +
coord_cartesian(clip = "off")
The plot below comprises two DfE codes. Code N is intended as a placeholder until schools can establish a reason for absence, and code O is for unknown or other circumstances. Here both are grouped together, though the bulk is code O.
Although the increase has levelled off into 2024 and seems to be down in 2025, the no reason category remains the biggest contributor to overall absence rates in secondary, and the increase in no reason absences is the biggest contributor to the post-pandemic rise in absences. Furthermore, no reason absences are significantly more prevalent in more deprived areas of the city, where attendance in general is poorer.
We can draw two possible conclusions from this: parents and children are (more than ever before) not reporting the true reasons for absence, and the DfE codes are no longer suitable for capturing those reasons. In either case, this represents a serious blind spot in the data.
Analysis of recorded case notes and text on Capita One, along with interviews or surveys of pupils, teachers, parents or community groups, is required to understand the stories behind these no reason absences.
ggplot(attend_year_phase |>
filter(phase %in% c("Primary","Secondary"),
year >= 2018) |>
mutate(grey_flag = if_else(year==2025,0,1)),
aes(x = year, y = percent_no_reason, alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
scale_y_continuous(labels = scales::percent) +
labs(title = "Absent with no reason, by academic year in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "percentage of sessions missed per year (codes N & O); 2025 is Autumn term only",
caption = "data from Capita One") +
annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1) +
geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) +
MetBrewer::scale_fill_met_d("Egypt")+
coord_cartesian(clip = "off")
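The charts in this section group several DfE register codes into single categories, for example N and O as "no reason" and F, H and G as family holidays. As a rough sketch of that grouping, the lookup below uses only the codes named in the chart subtitles of this report. The object and function names are hypothetical, and the data model may derive these categories differently.

```r
# Illustrative lookup from DfE register code to the category used in the charts.
# Built only from the codes named in this report's chart subtitles;
# `absence_code_groups` and `group_absence_code` are hypothetical names.
absence_code_groups <- c(
  I = "illness",
  L = "late present",   # recorded, but does not count as an absence
  U = "late absent",    # arrived after the registers closed
  F = "family holiday", H = "family holiday", G = "family holiday",
  E = "excluded",
  S = "study leave",
  N = "no reason", O = "no reason"
)

# Return the category for each code; codes not in the lookup give NA
group_absence_code <- function(code) {
  unname(absence_code_groups[toupper(code)])
}

group_absence_code(c("N", "O", "h"))
```

Codes outside the lookup return `NA`, which makes any unmapped register codes easy to spot before rates are aggregated.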
Finally, the two charts below summarise the contributions of each of these coded absence reasons to the overall absence picture, during 2023:
coded_absence_2023 <- attend_year_phase |>
ungroup() |>
filter(
year == 2023,
phase %in% c("Primary","Secondary")
) |>
select(phase,
`missing data` = percent_missing,
illness = percent_illness,
`family holidays` = percent_family_holiday,
excluded = percent_excluded,
`medical appointments` = percent_med_appt,
late = percent_late_absent,
`study leave` = percent_study_leave,
`approved offsite activity` = percent_approved_offsite,
`no reason` = percent_no_reason) |>
pivot_longer(-phase,
names_to = "category",
values_to = "percent") |>
arrange(percent)
wf <- coded_absence_2023 |>
filter(phase == "Primary") |>
mutate(cat_wrap = str_wrap(category, width = 10)) |>
select(-phase, -category)
waterfall(wf,
rect_text_labels = scales::percent(wf$percent, accuracy = 0.1),
calc_total = TRUE,
total_rect_text = scales::percent(sum(wf$percent),0.1),
total_axis_text = "total",
total_rect_color = "gray"
) +
theme(axis.line = eb, axis.ticks = eb, axis.text.y = eb) +
labs(title = "Coded absences in Sheffield primary schools - 2022-23",
subtitle = "% of available sessions missed")
wf <- coded_absence_2023 |>
filter(phase == "Secondary") |>
mutate(cat_wrap = str_wrap(category, width = 10)) |>
select(-phase, -category)
waterfall(wf,
rect_text_labels = scales::percent(wf$percent, accuracy = 0.1),
calc_total = TRUE,
total_rect_text = scales::percent(sum(wf$percent),0.1),
total_axis_text = "total",
total_rect_color = "gray"
) +
theme(axis.line = eb, axis.ticks = eb, axis.text.y = eb) +
labs(title = "Coded absences in Sheffield secondary schools - 2022-23",
subtitle = "% of available sessions missed")
Looking at how attendance varies with age, gender and ethnicity, and how this picture is changing over time.
Absence is a little higher in Y1 and Y2 when children are very young, and is otherwise level through primary. The transition to secondary school is associated with a big increase in absence, which continues year on year up to Y11. As we’ll see later on, this drop at the transition into Y7 and the subsequent decline are more severe for groups with particular risk factors.
# calculate average absence by ncy
attend_ncy <- attend |>
filter(year >= 2018,
ncy >= 1 & ncy <= 11) |>
summarise_attendance(grouping_vars = c("ncy", "stud_id")) |>
group_by(ncy) |>
summarise (mean.percent_absent = mean(percent_absent, na.rm = TRUE),
sd.percent_absent = sd(percent_absent, na.rm = TRUE),
n.percent_absent = n() ) |>
mutate(se.percent_absent = sd.percent_absent / sqrt(n.percent_absent),
lower.ci.percent_absent = mean.percent_absent - qt(1 - (0.05 / 2), n.percent_absent - 1) * se.percent_absent,
upper.ci.percent_absent = mean.percent_absent + qt(1 - (0.05 / 2), n.percent_absent - 1) * se.percent_absent)
# plot
ggplot(attend_ncy, aes(x = ncy, y = mean.percent_absent)) +
geom_col(position = position_dodge(0.9), fill = "#0072B2")+
geom_errorbar(aes(ymin = lower.ci.percent_absent, ymax = upper.ci.percent_absent), width = 0.2, position = position_dodge(0.9))+
geom_text(aes(label = scales::percent(round(mean.percent_absent,3))), vjust = 2, colour = "white", size = 3, position = position_dodge(0.9)) +
labs(title = "Absence by school year",
subtitle = "Average percentage of available sessions not attended +- 95 CI; all reason codes; all Sheffield schools & pupils, 2018 - 2024",
x = "national curriculum year",
caption = "data from Capita One")+
barplottheme_minimal +
theme(axis.text.y = eb) +
scale_x_continuous(breaks = seq(1,11))
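The error bars in the chart above come from a t-based 95% confidence interval around the mean of the per-pupil absence rates. The calculation can be sketched in isolation as below. `ci_mean` is a hypothetical helper and the input values are made up for illustration. Unlike the chunk above, which counts rows with `n()`, this sketch drops missing values before counting.

```r
# Minimal sketch of a t-based confidence interval for a mean.
# `ci_mean` is a hypothetical helper, not part of the report's code.
ci_mean <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                 # drop missing values before counting
  n <- length(x)
  m <- mean(x)
  se <- sd(x) / sqrt(n)             # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(mean = m, lower = m - t_crit * se, upper = m + t_crit * se)
}

# Made-up per-pupil absence rates, for illustration only
ci_mean(c(0.02, 0.05, 0.08, 0.10, 0.15))
```

With many thousands of pupils per national curriculum year the standard error is very small, so the resulting intervals are narrow.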
The ImpactEd report Understanding Attendance - Report 1 identified an emerging trend of a jump in absence between Y7 and Y8. The Sheffield data does not support this, with the increase from Y7 to Y8 looking broadly the same (around a 1% increase in absence) as any other year-on-year increase within the secondary years.
Looking at trends over time for primary school years, we see that the youngest and oldest primary age children were most affected. There are encouraging signs of recovery among all primary years into 2024, and particularly in Y1.
attend |>
filter(year != 2020, year >= 2018,
school_ed_phase_corrected == "Primary") |>
ungroup() |>
summarise_attendance(grouping_vars = c("ncy","year","school_ed_phase_corrected")) |>
filter(ncy <= 11 & ncy >= 1,
child_count > 1000) |>
ungroup() |>
mutate(label = ifelse(year == max(year), ncy, NA_character_),
ncy = factor(ncy)) |>
ggplot(aes(x = year,
y = percent_present,
colour = ncy,
group = ncy,
label = label
)
) +
geom_point(shape = 1) +
geom_line() +
geom_label_repel(hjust = TRUE,min.segment.length = Inf,max.overlaps = Inf,size = 2.5) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(limits = c(2018,2026), breaks = seq(2018,2025)) +
theme(legend.position = "none", axis.title = eb, strip.background = eb, axis.line = eb, axis.ticks = eb) +
labs(title = "Primary school attendance over time by national curriculum year",
subtitle = "% of available sessions attended all Sheffield schools & pupils, 2020 excluded; 2025 is Autumn term only",
caption = "data from Capita One") +
coord_cartesian(clip = "off")
In secondary schools, we can see how disproportionately children in Y11 were affected, and encouraging signs of recovery in years 7 and 9. It is worth noting that the children in years 10 and 11 in 2024 were those who had their crucial Y6 and Y7 transition years disrupted by the pandemic.
attend |>
filter(year != 2020, year >= 2018,
school_ed_phase_corrected == "Secondary") |>
ungroup() |>
summarise_attendance(grouping_vars = c("ncy","year","school_ed_phase_corrected")) |>
filter(ncy <= 11 & ncy >= 1,
child_count > 1000) |>
ungroup() |>
mutate(label = ifelse(year == max(year), ncy, NA_character_),
ncy = factor(ncy)) |>
ggplot(aes(x = year,
y = percent_present,
colour = ncy,
group = ncy,
label = label
)
) +
geom_point(shape = 1) +
geom_line() +
geom_label_repel(hjust = TRUE,min.segment.length = Inf,max.overlaps = Inf,size = 2.5) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(limits = c(2018,2026), breaks = seq(2018,2025)) +
theme(legend.position = "none", axis.title = eb, strip.background = eb, axis.line = eb, axis.ticks = eb) +
labs(title = "Secondary school attendance over time by national curriculum year",
subtitle = "% of available sessions attended all Sheffield schools & pupils, 2020 excluded; 2025 is Autumn term only",
caption = "data from Capita One") +
coord_cartesian(clip = 'off')
The drop-off in Y11 is driven in large part by an increase in study leave.
These trends will be explored in more detail in the Trends by annual cohort section later in this report.
Looking at overall school attendance since 2021, girls attend slightly better than boys, a difference of about 0.5 percentage points.
The gender time series show boys and girls moving in lockstep through primary school, separated by about half a percentage point:
attend_year_gender_phase |>
filter(!is.na(gender), year >= 2018, phase == "Primary") |>
ungroup() |>
mutate(label = if_else(year == max(year), case_when(gender == "M" ~ "boys", gender == "F" ~ "girls"), NA_character_)) |>
ggplot(aes(x = year,
y = percent_present,
colour = gender, group = gender,
label = label)) +
geom_point(size = 3) +
geom_line() +
barplottheme_minimal +
theme(legend.position = "none", axis.title.x = eb, legend.title = eb) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = seq(2018,2026, by = 1)) +
geom_text_repel(hjust = TRUE, nudge_x = 0.5, min.segment.length = Inf) +
labs(title = "Primary school attendance by year and gender",
subtitle = "% of sessions attended per year; 2025 is Autumn term only",
caption = "data from Capita One")
In secondary we see boys’ attendance overtaking girls’ in the aftermath of the pandemic, but both continuing to decline into 2024.
attend_year_gender_phase |>
filter(!is.na(gender), year >= 2018, phase == "Secondary") |>
ungroup() |>
mutate(label = if_else(year == max(year), case_when(gender == "M" ~ "boys", gender == "F" ~ "girls"), NA_character_)) |>
ggplot(aes(x = year,
y = percent_present,
colour = gender, group = gender,
label = label)) +
geom_point(size = 3) +
geom_line() +
barplottheme_minimal +
theme(legend.position = "none", axis.title.x = eb, legend.title = eb) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = seq(2018,2026, by = 1)) +
geom_text_repel(hjust = TRUE, nudge_x = 0.5, min.segment.length = Inf) +
labs(title = "Secondary school attendance by year and gender",
subtitle = "% of sessions attended per year; 2025 is Autumn term only",
caption = "data from Capita One")
attend |>
filter(!is.na(gender),
gender != "U",
year >= 2018,
ncy >= 0, ncy <= 11) |>
group_by(ncy, gender) |>
presence_mean_calc() |>
ungroup() |>
mutate(label = if_else(ncy == max(ncy), case_when(gender == "M" ~ "boys", gender == "F" ~ "girls"), NA_character_)) |>
ggplot(aes(x = ncy,
y = mean.percent_present,
colour = gender, group = gender,
label = label)) +
geom_point(size = 1) +
geom_line() +
geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.6)+
barplottheme_minimal +
theme(legend.position = "none", axis.title.x = eb, legend.title = eb) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = seq(0,11, by = 1)) +
geom_text_repel(hjust = TRUE, nudge_x = 0.5, min.segment.length = Inf) +
labs(title = "School attendance by national curriculum year and gender",
subtitle = "% of sessions attended per year since 2018",
caption = "data from Capita One")
Looking at age, gender and deprivation together, we see the pattern reversed in older children. In poorer wards of the city, girls consistently attend better than boys across all ages. In the most affluent wards, this is reversed in older children, with a gender gap widening from Y8 onwards, where boys have higher attendance.
attend |>
left_join(stud_details_joined |> select(-gender), by = "stud_id") |>
filter(year >= 2018,
ncy >= 1,
ncy <= 11,
!is.na(gender),
imd_quartile %in% c(1,4)) |>
mutate(imd_quart_name = if_else(imd_quartile == 1, "most affluent 25%", "most deprived 25%")) |>
select(-imd_quartile) |>
group_by(imd_quart_name,ncy,gender) |>
presence_mean_calc() |>
ggplot(aes(x = ncy, y = mean.percent_present,
colour = gender, group = gender)) +
geom_point() +
geom_line() +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = seq(1:11)) +
facet_grid(rows = vars(imd_quart_name)) +
theme(axis.title.y = eb, legend.position = "top", legend.title = eb) +
labs(title = "Secondary school attendance by IMD quartile, national curriculum year and gender",
subtitle = "% of sessions attended per year; 2025 is Autumn term only",
caption = "data from Capita One")
The ethnic makeup of Sheffield’s population continues to change, and there are differences in attendance rates between children in different ethnic groups. Here we summarise the data around ethnicity.
The ethnic groups and subgroups used in this analysis are those available in the Capita One source data. These don’t necessarily align with the groupings used by ONS for census data, other organisations, or in other SCC data and reporting.
With the caveat that data prior to 2018 may not be wholly complete, the attendance data allows us to take a long-term view of changes in the ethnic makeup of the Sheffield school population. Note that the free y-axis scales on the following chart mean that the lines are not directly comparable:
eth_category_volumes <- attend |>
select(year, stud_id, ethnicity_category) |> unique() |>
group_by(year, ethnicity_category) |> summarise(student_count = n_distinct(stud_id)) |> mutate(freq = student_count / sum(student_count)) |>
ungroup() |> mutate(label = ifelse(year == max(year), ethnicity_category, NA_character_),
label_n = ifelse(year %in% c(2008,2012,2016,2020,2024,2025),student_count,NA_real_)
)
ggplot(eth_category_volumes,
aes(x = year, y = student_count, colour = ethnicity_category)) +
geom_line() +
scale_x_continuous(breaks = seq(2006,2024, by = 2)) +
geom_label_repel(aes(label = label), nudge_x = 4, nudge_y = 0, alpha = 0.75, size = 2.5,
min.segment.length = Inf) +
geom_text_repel(aes(label = label_n), size = 2.5) +
facet_grid(rows = vars(fct_rev(ethnicity_category)), scales = "free_y") +
barplottheme_minimal +
theme(strip.background = eb, axis.title.x = eb, legend.position = "none", strip.text = eb, axis.text.y = eb) +
labs(title = "Pupils in Sheffield by ethnicity category",
subtitle = "unique count of pupils in attendance data per year",
caption = "data from Capita One") +
scale_colour_brewer(palette = "Dark2")
attend_eth_des_phase <- attend |>
#select(-phase) |>
#rename(phase = school_ed_phase_corrected) |>
filter(year == 2023,
phase %in% c("Primary","Secondary"),
ncy >= 1,
ncy <= 11) |>
group_by(ethnicity_description, phase) |>
summarise_attendance(grouping_vars = c("ethnicity_description", "phase")) |>
select(ethnicity_description, phase, child_count, percent_of_pupils, percent_absent)
attend_eth_des_total <- attend |>
#select(-phase) |>
#rename(phase = school_ed_phase_corrected) |>
filter(year == 2023,
phase %in% c("Primary","Secondary"),
ncy >= 1,
ncy <= 11) |>
summarise_attendance(grouping_vars = "ethnicity_description") |>
select(ethnicity_description, child_count, percent_of_pupils, percent_absent) |>
mutate(phase = "Total")
attend_phase_total <- attend |>
#select(-phase) |>
#rename(phase = school_ed_phase_corrected) |>
filter(year == 2023,
phase %in% c("Primary","Secondary"),
ncy >= 1,
ncy <= 11) |>
summarise_attendance(grouping_vars = "phase") |>
select(phase, child_count, percent_of_pupils, percent_absent) |>
mutate(#phase = "Total",
ethnicity_description = "all children")
attend_total <- attend |>
#select(-phase) |>
#rename(phase = school_ed_phase_corrected) |>
filter(year == 2023,
phase %in% c("Primary","Secondary"),
ncy >= 1,
ncy <= 11) |>
summarise_attendance(grouping_vars = "none") |>
select(child_count, percent_of_pupils, percent_absent) |>
mutate(phase = "Total",
ethnicity_description = "all children")
eth_des_table <- rbind(
attend_eth_des_phase,
attend_eth_des_total,
attend_phase_total,
attend_total) |>
pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |>
rename(`ethnicity description` = ethnicity_description) |>
ungroup() |>
arrange(desc(child_count_Total)) |>
select(
`ethnicity description`,
contains("Total"),
contains("Primary"),
contains("Secondary")
)
eth_des_table |>
gt(rowname_col = "ethnicity description") |>
tab_spanner(id = 1, label = "Primary", columns = dplyr::contains("Primary")) |>
tab_spanner(id = 2, label = "Secondary", columns = dplyr::contains("Secondary")) |>
tab_spanner(id = 3, label = "Total", columns = dplyr::contains("Total")) |>
cols_label(
contains("count") ~ "count",
contains("percent_of_pupils") ~ "% of pupils",
contains("percent_absent") ~ "% absent"
) |>
tab_header(
title = "Pupils and attendance in Sheffield by ethnicity description",
subtitle = "count of pupils on roll in 2023/24; data from School Census & Capita One attendance records") |>
tab_options(
table.align = "left",
table.font.size = 10,
heading.title.font.size = 12,
heading.subtitle.font.size= 10,
heading.align = "left",
column_labels.font.size = 14,
stub.font.size = 12
  ) |>
  cols_align(
    "left",'ethnicity description'
  ) |>
  fmt_percent(columns = contains("percent"),
decimals = 1) |>
data_color(
columns = percent_absent_Primary,
method = "numeric",
palette = "viridis") |>
data_color( columns = percent_absent_Secondary,
method = "numeric",
palette = "viridis") |>
data_color( columns = percent_absent_Total,
method = "numeric",
palette = "viridis")
Pupils and attendance in Sheffield by ethnicity description
count of pupils on roll in 2023/24; data from School Census & Capita One attendance records

ethnicity description | Total: count | Total: % of pupils | Total: % absent | Primary: count | Primary: % of pupils | Primary: % absent | Secondary: count | Secondary: % of pupils | Secondary: % absent |
---|---|---|---|---|---|---|---|---|---|
all children | 72545 | 100.0% | 8.3% | 40411 | 55.7% | 6.6% | 32181 | 44.3% | 10.5% |
White British | 41639 | 57.4% | 7.8% | 22919 | 55.0% | 5.7% | 18751 | 45.0% | 10.4% |
Black African and White/Black African | 5613 | 7.7% | 5.5% | 3277 | 58.3% | 5.0% | 2340 | 41.7% | 6.2% |
Pakistani | 5448 | 7.5% | 9.0% | 3090 | 56.7% | 8.8% | 2359 | 43.3% | 9.2% |
Any Other Ethnic Group | 3050 | 4.2% | 9.1% | 1732 | 56.8% | 8.5% | 1318 | 43.2% | 10.0% |
Any Other White Background | 2791 | 3.8% | 9.3% | 1589 | 56.9% | 7.9% | 1204 | 43.1% | 11.2% |
White/Black Caribbean | 1976 | 2.7% | 12.4% | 1122 | 56.7% | 9.1% | 856 | 43.3% | 17.0% |
Other Asian Background | 1814 | 2.5% | 8.0% | 1040 | 57.3% | 7.6% | 774 | 42.7% | 8.4% |
Gypsy, Roma and Traveller of Irish Heritage | 1761 | 2.4% | 21.1% | 928 | 52.6% | 18.0% | 835 | 47.4% | 24.6% |
White/Asian | 1641 | 2.3% | 9.3% | 970 | 59.1% | 7.8% | 671 | 40.9% | 11.6% |
Any Other Mixed | 1604 | 2.2% | 9.1% | 932 | 58.1% | 7.6% | 672 | 41.9% | 11.3% |
not known | 1521 | 2.1% | 11.6% | 613 | 40.2% | 7.3% | 912 | 59.8% | 14.4% |
Indian | 1021 | 1.4% | 6.6% | 693 | 67.9% | 7.2% | 328 | 32.1% | 5.2% |
Bangladeshi | 814 | 1.1% | 8.3% | 471 | 57.9% | 8.1% | 343 | 42.1% | 8.6% |
Any Other Black Background | 723 | 1.0% | 6.5% | 436 | 60.2% | 5.9% | 288 | 39.8% | 7.5% |
Chinese | 627 | 0.9% | 3.6% | 354 | 56.5% | 4.1% | 273 | 43.5% | 2.9% |
Black Caribbean | 392 | 0.5% | 9.1% | 188 | 48.0% | 7.1% | 204 | 52.0% | 11.2% |
Irish | 110 | 0.2% | 10.1% | 57 | 51.8% | 6.1% | 53 | 48.2% | 14.6% |
There are many ways to divide up the city geographically, but we’ll look at the 28 wards, and in particular their deprivation as measured in the 2019 Indices of Multiple Deprivation (IMD) scores. More recent (and older) measures of deprivation may be available, but the analysis is broadly the same.
The table below shows overall attendance by ward of residence during 2023.
attend_ward_phase <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  #group_by(ward, phase) |>
summarise_attendance(grouping_vars = c("ward", "phase")) |>
select(ward, phase, child_count, percent_of_pupils, percent_absent)
attend_ward_total <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = "ward") |>
select(ward, child_count, percent_of_pupils, percent_absent) |>
mutate(phase = "Total")
attend_phase_total <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = "phase") |>
select(phase, child_count, percent_of_pupils, percent_absent) |>
mutate(#phase = "Total",
ward = "Sheffield")
attend_total <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = "none") |>
select(child_count, percent_of_pupils, percent_absent) |>
mutate(phase = "Total",
ward = "Sheffield")
ward_table <- rbind(
  attend_ward_phase,
  attend_ward_total,
  attend_phase_total,
  attend_total) |>
  filter(!is.na(ward)) |>
pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |>
rename(`ethnicity description` = ward) |>
ungroup() |>
arrange(desc(child_count_Total)) |>
select(
`ethnicity description`,
contains("Total"),
contains("Primary"),
contains("Secondary")
)
ward_table |>
  gt(rowname_col = "ethnicity description") |>
tab_spanner(id = 1, label = "Primary", columns = dplyr::contains("Primary")) |>
tab_spanner(id = 2, label = "Secondary", columns = dplyr::contains("Secondary")) |>
tab_spanner(id = 3, label = "Total", columns = dplyr::contains("Total")) |>
cols_label(
contains("count") ~ "count",
contains("percent_of_pupils") ~ "% of children",
contains("percent_absent") ~ "% absent"
  ) |>
  tab_header(
title = "Pupils in Sheffield, by ward of residence",
subtitle = "pupils on roll & attendance in 2023/24; data from School Census & Capita One attendance records") |>
tab_options(
table.align = "left",
table.font.size = 10,
heading.title.font.size = 12,
heading.subtitle.font.size= 10,
heading.align = "left",
column_labels.font.size = 14,
stub.font.size = 12
  ) |>
  cols_align(
    "left",'ethnicity description'
  ) |>
  fmt_percent(columns = contains("percent"),
decimals = 1) |>
data_color(
columns = percent_absent_Primary,
method = "numeric",
palette = "viridis") |>
data_color( columns = percent_absent_Secondary,
method = "numeric",
palette = "viridis") |>
data_color(columns = percent_absent_Total,
method = "numeric",
palette = "viridis")
Pupils in Sheffield, by ward of residence
pupils on roll & attendance in 2023/24; data from School Census & Capita One attendance records

ward | Total: count | Total: % of children | Total: % absent | Primary: count | Primary: % of children | Primary: % absent | Secondary: count | Secondary: % of children | Secondary: % absent |
---|---|---|---|---|---|---|---|---|---|
Sheffield | 72545 | 100.0% | 8.3% | 40411 | 55.7% | 6.6% | 32181 | 44.3% | 10.5% |
Burngreave | 5858 | 7.8% | 11.3% | 3163 | 54.0% | 9.7% | 2697 | 46.0% | 13.2% |
Firth Park | 4140 | 5.5% | 9.7% | 2312 | 55.8% | 7.7% | 1828 | 44.2% | 12.3% |
Darnall | 3992 | 5.3% | 10.2% | 2378 | 59.5% | 8.7% | 1618 | 40.5% | 12.6% |
Manor Castle | 3759 | 5.0% | 9.8% | 2202 | 58.6% | 7.4% | 1558 | 41.4% | 13.4% |
Shiregreen & Brightside | 3603 | 4.8% | 9.0% | 2011 | 55.8% | 7.2% | 1592 | 44.2% | 11.3% |
Southey | 3509 | 4.7% | 10.5% | 1940 | 55.3% | 8.0% | 1570 | 44.7% | 13.6% |
Ecclesall | 3243 | 4.3% | 4.2% | 1762 | 54.3% | 3.6% | 1481 | 45.7% | 5.1% |
Gleadless Valley | 3177 | 4.2% | 9.9% | 1769 | 55.6% | 7.6% | 1410 | 44.4% | 13.0% |
Nether Edge & Sharrow | 2799 | 3.7% | 7.1% | 1570 | 56.1% | 6.6% | 1229 | 43.9% | 7.7% |
Park & Arbourthorne | 2687 | 3.6% | 9.9% | 1537 | 57.2% | 7.9% | 1150 | 42.8% | 12.7% |
Beauchief & Greenhill | 2650 | 3.5% | 9.2% | 1491 | 56.3% | 7.1% | 1159 | 43.7% | 12.0% |
Richmond | 2502 | 3.3% | 8.6% | 1420 | 56.7% | 6.5% | 1083 | 43.3% | 11.4% |
Dore & Totley | 2399 | 3.2% | 4.7% | 1339 | 55.8% | 4.0% | 1060 | 44.2% | 5.6% |
Hillsborough | 2340 | 3.1% | 7.7% | 1309 | 55.9% | 5.7% | 1031 | 44.1% | 10.3% |
Stannington | 2308 | 3.1% | 6.3% | 1265 | 54.8% | 4.6% | 1044 | 45.2% | 8.6% |
Woodhouse | 2277 | 3.0% | 9.0% | 1300 | 56.8% | 6.7% | 987 | 43.2% | 12.1% |
Stocksbridge & Upper Don | 2203 | 2.9% | 7.4% | 1195 | 53.9% | 5.1% | 1021 | 46.1% | 10.2% |
West Ecclesfield | 2193 | 2.9% | 8.0% | 1195 | 54.3% | 5.8% | 1005 | 45.7% | 10.6% |
Walkley | 2170 | 2.9% | 7.6% | 1337 | 61.6% | 6.4% | 834 | 38.4% | 9.6% |
Birley | 2076 | 2.8% | 8.8% | 1109 | 53.4% | 6.4% | 967 | 46.6% | 11.7% |
East Ecclesfield | 1992 | 2.7% | 7.0% | 1038 | 52.1% | 5.7% | 954 | 47.9% | 8.6% |
Graves Park | 1942 | 2.6% | 5.6% | 1103 | 56.8% | 4.5% | 839 | 43.2% | 7.3% |
Crookes & Crosspool | 1906 | 2.5% | 4.6% | 1032 | 54.1% | 4.2% | 874 | 45.9% | 5.1% |
Beighton | 1902 | 2.5% | 7.7% | 1049 | 55.2% | 6.0% | 853 | 44.8% | 10.0% |
Fulwood | 1746 | 2.3% | 4.5% | 951 | 54.5% | 3.8% | 795 | 45.5% | 5.3% |
Mosborough | 1700 | 2.3% | 8.0% | 1001 | 58.9% | 6.3% | 699 | 41.1% | 10.5% |
Broomhill & Sharrow Vale | 1501 | 2.0% | 6.5% | 885 | 59.0% | 5.5% | 616 | 41.0% | 8.1% |
City | 666 | 0.9% | 10.5% | 438 | 65.8% | 9.7% | 228 | 34.2% | 12.0% |
These ward level attendance figures line up neatly with deprivation indicators. Plotting attendance against the 2019 Indices of Multiple Deprivation (IMD) scores shows a tight correlation.
Since school attendance figures are one of the input variables to the IMD scores, there is some circular logic at work here. Even so, attendance is only one of 39 inputs, so this analysis is worth pursuing.
attend |> filter(year == 2023) |>
  summarise_attendance(grouping_vars = c("ward","ward_imd_score")) |>
  ggplot(aes(x = ward_imd_score, y = percent_present)) +
  geom_point() +
  geom_text_repel(aes(label = ward), size = 2.5,
segment.colour = "gray") +
#geom_smooth(method = "lm") +
scale_y_continuous(labels = scales::percent) +
labs(title ="School attendance by ward level deprivation",
subtitle = "Average % of sessions attended 2023; ward of residence; all ages",
caption = "data from Capita One",
y = "attendance",
x = "Indices of multiple deprivation score (2019)")
The link to deprivation has always been there but is stronger today - recreating the chart above with 2010 attendance and IMD scores shows a weaker relationship.
The link to deprivation is less evident in primary schools but stronger in secondary schools, and the gap between primary and secondary attendance widens in poorer areas of the city.
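The phase gap can be checked directly against the ward table above. The sketch below copies six rows from that table (percent absent, 2023) and computes the secondary minus primary gap in percentage points. IMD scores are not reproduced in this section, so ordering wards by total absence is used here only as a rough stand-in for deprivation; that ordering is an assumption for illustration and not the report's method.

```r
# Six rows copied from the ward table above (percent absent, 2023)
ward_gap <- data.frame(
  ward = c("Ecclesall", "Fulwood", "Hillsborough",
           "Manor Castle", "Southey", "Burngreave"),
  absent_total = c(4.2, 4.5, 7.7, 9.8, 10.5, 11.3),
  absent_primary = c(3.6, 3.8, 5.7, 7.4, 8.0, 9.7),
  absent_secondary = c(5.1, 5.3, 10.3, 13.4, 13.6, 13.2)
)

# Gap between phases in percentage points (rounded to match the table precision)
ward_gap$phase_gap <- round(ward_gap$absent_secondary - ward_gap$absent_primary, 1)

# Order by total absence (assumed stand-in for deprivation in this sketch)
ward_gap <- ward_gap[order(ward_gap$absent_total), ]
print(ward_gap[, c("ward", "phase_gap")])
```

In these rows the gap is 1.5 points in Ecclesall and Fulwood against 6.0 in Manor Castle, while Burngreave (3.5) shows that the pattern is not uniform.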
ward_data <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = c("ward","ward_imd_score", "phase")) |>
arrange(ward_imd_score)
grid <- seq(from = min(ward_data$ward_imd_score, na.rm = TRUE), to = max(ward_data$ward_imd_score, na.rm = TRUE),
            length.out = ward_data |> ungroup() |> select(ward) |> distinct() |> tally() |> pull())
ward_grid <- ward_data |>
  select(ward, ward_imd_score) |>
distinct() |>
arrange(ward_imd_score) |>
cbind(grid) |>
rename(label_sequence = 3) |>
ungroup() |>
select(-ward_imd_score)
plot_data <- ward_data |> left_join(ward_grid, by = "ward")
ggplot(plot_data,
aes(x = ward_imd_score, y = percent_present, colour = phase,
group = ward,label = ward)) +
geom_point(size = 2.5, alpha = 0.7) +
geom_line(colour = "grey70") +
scale_y_continuous(labels = scales::percent) +
#coord_cartesian(expand = FALSE, clip = "off") +
geom_text_repel(data = plot_data |> filter(phase == "Primary"),
aes(x = label_sequence, y = 0.75),
colour = "grey40", size = 2.5,
#force_pull = 0,
min.segment.length = Inf,
angle = 90,
#segment.angle = 90,
#point.padding = 0,
#max.overlaps = Inf,
#direction = "x",
nudge_y = 0.04#,
#hjust = 0,
#max.iter = 1e4, max.time = 1
  ) +
labs(title = "School attendance in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools, by ward level deprivation score",
subtitle = "% of available sessions attended in 2023 by ward of residence",
caption = "data from Capita One",
y = "attendance",
x = "Indices of multiple deprivation score (2019)") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none"
  ) +
  MetBrewer::scale_fill_met_d("Egypt")
The longer term view below compares the trend in attendance between the top and bottom quartiles of the ward level deprivation scores, at the half-term level with a trend line. The middle two quartiles are excluded from this plot. The gap between the most and least deprived areas narrowed towards the peak attendance rate in 2016, so gains were disproportionately made in poorer areas, but attendance in the most deprived quartile has fallen away more rapidly since the pandemic. Filtering this to just the Secondary phase makes little difference to the overall shape.
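The quartile grouping could be derived along the following lines. This is a sketch on made-up ward scores, not the report's code: the `imd_quartile` column comes from `stud_details_joined` and its construction is not shown in this section, and the numbering used here (1 = least deprived, 4 = most deprived) is an assumption.

```r
# Hypothetical ward-level IMD scores (illustrative values, not real data)
wards <- data.frame(
  ward = paste0("ward_", 1:8),
  ward_imd_score = c(8.2, 12.5, 15.1, 21.7, 26.3, 33.9, 41.0, 52.4)
)

# Cut the scores at their quartile boundaries.
# Assumed numbering: 1 = lowest scores (least deprived), 4 = highest (most deprived)
breaks <- quantile(wards$ward_imd_score, probs = seq(0, 1, 0.25))
wards$imd_quartile <- cut(wards$ward_imd_score, breaks = breaks,
                          labels = FALSE, include.lowest = TRUE)

# Keep only the top and bottom quartiles, as the plot does
extremes <- wards[wards$imd_quartile %in% c(1, 4), ]
print(extremes)
```

Dropping the middle two quartiles sharpens the contrast in the plot, at the cost of hiding how the wards in between behave.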
attend |>
  left_join(stud_details_joined, by = "stud_id") |>
filter(imd_quartile %in% c(1,4)) |>
mutate(imd_quartile = factor(imd_quartile)) |>
summarise_attendance(grouping_vars = c("ht_id","ht_start_date","imd_quartile")) |>
ggplot(aes(x = ht_start_date, y = percent_present, fill = imd_quartile, colour = imd_quartile)) +
geom_point() +
geom_smooth(alpha = 0.2) +
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent) +
scale_x_date()+
labs(title = "Attendance of children living in the <b><span style='color:#ce9642'>most deprived </span></b>and <b><span style='color:#3b7c70;'>least deprived</b></span> wards of the city",
subtitle = "groups are upper & lower quartiles of the IMD score of the ward of residence (2019); data points are half terms with trendline",
caption = "data from Capita One")+
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.title.x = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky")
The age profile by deprivation quartile shows how children in poorer areas have a steeper drop-off through secondary school. Children in the most affluent 25% of wards attend better across all years, but show a more pronounced drop-off into Y11, which may reflect study leave.
attend_deprivation_quartile_ncy <- attend |>
  filter(year >= 2018, ncy >= 1, ncy <= 11) |>
left_join(stud_details_joined, by = "stud_id") |>
filter(imd_quartile %in% c(1,4)) |>
mutate(imd_quartile = factor(imd_quartile)) |>
group_by(imd_quartile, ncy) |> summarise_avg()
ggplot(attend_deprivation_quartile_ncy, aes(x = ncy,
y = mean.percent_present,
colour = imd_quartile, group = imd_quartile,
  )) +
  geom_point() + geom_line() +
geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.7)+
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = seq(1,11)) +
labs(title = "Attendance of children living in the <b><span style='color:#ce9642'>most deprived </span></b>and <b><span style='color:#3b7c70;'>least deprived</b></span> wards of the city",
subtitle = "avg % of sessions attended since 2018 +-95CI; groups are upper & lower quartiles of the IMD score of the ward of residence (2019)",
caption = "data from Capita One")+
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.title.x = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky") +
  theme(axis.title.y = eb, legend.position = "none") +
  # linewidth replaces size for line geoms in ggplot2 3.4.0 and later
  geom_vline(aes(xintercept = 6.5), linetype = "dotted", colour = "gray70", linewidth = 1.2) +
annotate("text", label = "primary", y = 0.99, x = 3.5, colour = "gray40") +
annotate("text", label = "secondary", y = 0.99, x = 9, colour = "gray40")
Free School Meal (FSM) status is perhaps a better indicator of socio-economic status of children than ward of residence, since it is means tested at the family level.
fsm_table_data <- attend |>
  mutate(fsm = replace_na(fsm, "0")) |>
  #mutate(fsm = factor(fsm, levels = c("T","F"))) |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = c("phase","fsm")) |>
select(phase, fsm, child_count, percent_of_pupils, percent_absent) |>
pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |>
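  # fsm appears to be coded "1"/"0" (see replace_na() above and the 0/1 rows in the rendered table), not "T"/"F"
  mutate(fsm = fct_recode(fsm, "free school meal eligible" = "1","no fsm" = "0")) |>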
ungroup()
fsm_table_total_row <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = c("phase")) |>
select(phase, child_count, percent_of_pupils, percent_absent) |>
pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |>
mutate(fsm = "total") |>
ungroup()
fsm_table_data |>
  rbind(fsm_table_total_row) |>
rename(`free school meal` = fsm) |>
gt(rowname_col = "free school meal") |>
tab_spanner(id = 1, label = "Primary", columns = dplyr::contains("Primary")) |>
tab_spanner(id = 2, label = "Secondary", columns = dplyr::contains("Secondary")) |>
#tab_spanner(id = 3, label = "Total", columns = dplyr::contains("Total")) |>
cols_label(
contains("count") ~ "count",
contains("percent_of_pupils") ~ "% of children",
contains("percent_absent") ~ "avg % absent (2023)"
  ) |>
  tab_header(
title = "Pupils in Sheffield, by free school meal status",
subtitle = "count of pupils on roll in 2023/24; data from School Census & Capita One attendance records") |>
tab_options(
table.align = "left",
table.font.size = 10,
heading.title.font.size = 12,
heading.subtitle.font.size= 10,
heading.align = "left",
column_labels.font.size = 14,
stub.font.size = 12
  ) |>
  cols_align("left",'free school meal') |>
fmt_percent(columns = contains("percent"),
decimals = 1) |>
data_color(
columns = percent_absent_Primary,
method = "numeric",
palette = "viridis",
alpha = 0.7) |>
data_color( columns = percent_absent_Secondary,
method = "numeric",
palette = "viridis",
alpha = 0.7)
Pupils in Sheffield, by free school meal status
count of pupils on roll in 2023/24; data from School Census & Capita One attendance records

free school meal | Primary: count | Primary: % of children | Primary: avg % absent (2023) | Secondary: count | Secondary: % of children | Secondary: avg % absent (2023) |
---|---|---|---|---|---|---|
no fsm (0) | 26535 | 65.7% | 5.1% | 21628 | 67.2% | 7.3% |
free school meal eligible (1) | 13878 | 34.3% | 9.7% | 10555 | 32.8% | 17.1% |
total | 40411 | 55.7% | 6.6% | 32181 | 44.3% | 10.5% |
ggplot(attend_year_ht_fsm, aes(x = ht_start_date, y = percent_present, fill = fsm, colour = fsm)) +
geom_point() +
geom_smooth() +
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent) +
annotate("text", x = date("2020-03-31"), y = 0.6, label = "COVID-19", size = 2.5, hjust = 1.1, colour = "dark gray") +
geom_vline(xintercept = date("2020-03-31"), linetype = "longdash", colour = "dark gray") +
labs(title = "Attendance by <b><span style='color:#dd5129'>children receiving free school meals </span></b>and <b><span style='color:#0f7ba2;'>not on fsm</b></span>",
subtitle = "% of available sessions attended, with trend",
caption = "data from Capita One")+
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.title.x = eb) +
#MetBrewer::scale_fill_met_d("Egypt") +
scale_fill_manual(values = c("#0f7ba2","#dd5129")) +
scale_colour_manual(values = c("#0f7ba2","#dd5129"))
More concerning are the exclusion rates for children with Free School Meals, which are rapidly diverging from those without.
ggplot(attend_year_ht_fsm |>
filter(ht_start_date >= as_date("2016-09-01")),
aes(x = ht_start_date, y = percent_excluded, fill = fsm, colour = fsm)) +
geom_point() +
geom_smooth() +
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent) +
annotate("text", x = date("2020-03-31"), y = 0.01, label = "COVID-19", size = 2.5, hjust = 1.1, colour = "dark gray") +
geom_vline(xintercept = date("2020-03-31"), linetype = "longdash", colour = "dark gray") +
labs(title = "Exclusion rates by <b><span style='color:#dd5129'>children receiving free school meals </span></b>and <b><span style='color:#0f7ba2;'>not on fsm</b></span>",
subtitle = "% of available sessions missed, with trend line",
caption = "data from Capita One")+
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.title.x = eb) +
#MetBrewer::scale_fill_met_d("Egypt") +
scale_fill_manual(values = c("#0f7ba2","#dd5129")) +
scale_colour_manual(values = c("#0f7ba2","#dd5129"))
#geom_hline(yintercept = excl_2023_avg, linetype = "dashed", colour = "dark grey") +
#annotate("text", x = date("2023-03-31"), y = 0.0025, label = "2023 average", size = 2.5, hjust = 1.1, colour = "dark gray")
We used the postcodes of each child’s home address and school location to calculate a measure of straight-line distance between the two.
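As a rough illustration of that calculation, the sketch below assumes each postcode has already been geocoded to British National Grid eastings and northings in metres. The lookup source and the column names are hypothetical, since the report does not say how postcodes were geocoded. On a projected grid the straight-line distance is the Euclidean distance between the two points. The bins reuse the break points and labels from the report's own code.

```r
# Straight-line ("as the crow flies") distance in metres between two points
# given as British National Grid eastings and northings (assumed inputs).
crow_distance <- function(home_e, home_n, school_e, school_n) {
  sqrt((school_e - home_e)^2 + (school_n - home_n)^2)
}

# Hypothetical pupils with made-up grid coordinates
pupils <- data.frame(
  stud_id = c("a", "b", "c"),
  home_easting = c(435000, 435300, 433000),
  home_northing = c(387000, 387400, 390000),
  school_easting = c(435000, 435000, 436000),
  school_northing = c(387060, 387000, 386000)
)

pupils$dist_crow <- with(pupils, crow_distance(home_easting, home_northing,
                                               school_easting, school_northing))

# Bin the distances with the same breaks and labels as the report
pupils$dist_bin <- as.character(cut(
  pupils$dist_crow,
  breaks = c(0, 100, 200, 500, 1000, 2000, 5000, 10000, 1000000),
  labels = c("<100m", "100 - 200m", "200 - 500m", "500m - 1km",
             "1-2km", "2-5km", "5-10km", "10km+")
))
print(pupils[, c("stud_id", "dist_crow", "dist_bin")])
```

Because `cut()` uses right-closed intervals by default, a distance of exactly 0 falls outside the first bin and becomes `NA` unless `include.lowest = TRUE` is set.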
Attendance is significantly better, on average, for children who live closer to school. Children living very close to school (<100m) attend about 1.5% better on average in Primary. For secondary schools this difference is 2.3%. Conversely,
# calculated average by binned distance primary
dist_data <- sch_dist_sheff_23 |>
  filter(school_ed_phase_corrected %in% c("Primary","Secondary"),
         sen_level != "EHCP"
  ) |>
  rename(phase = school_ed_phase_corrected)
sch_dist_binned_pri <- dist_data |>
  filter(phase == "Primary") |>
mutate(dist_bin = cut(dist_crow,
breaks = c(0,100,200,500,1000,2000,5000,10000,1000000))) |>
group_by(dist_bin) |>
presence_mean_calc() |>
filter(!is.na(dist_bin)) |>
mutate(dist_bin_label = c("<100m", "100 - 200m", "200 - 500m", "500m - 1km","1-2km","2-5km","5-10km","10km+"),
phase = "Primary")
# calculated avg by binned distance secondary
sch_dist_binned_sec <- dist_data |>
  filter(phase == "Secondary") |>
mutate(dist_bin = cut(dist_crow,
breaks = c(0,100,200,500,1000,2000,5000,10000,1000000))) |>
group_by(dist_bin) |>
presence_mean_calc() |>
filter(!is.na(dist_bin)) |>
mutate(dist_bin_label = c("<100m", "100 - 200m", "200 - 500m", "500m - 1km","1-2km","2-5km","5-10km","10km+"),
phase = "Secondary")
# calculate overall averages by phase
sch_dist_binned_overall <- dist_data |>
  mutate(
dist_bin = NA_character_,
dist_bin_label = "overall avg") |>
group_by(dist_bin, dist_bin_label, phase) |>
presence_mean_calc()
sch_dist_binned <- rbind(sch_dist_binned_pri, sch_dist_binned_sec, sch_dist_binned_overall) |>
  mutate(fill_code = case_when(dist_bin_label == 'overall avg' ~ 'total',
TRUE ~ 'others'))
# plot
ggplot(sch_dist_binned, aes(x = reorder(dist_bin_label,mean.percent_present),
y = mean.percent_present,
fill = fill_code)) +
geom_col(position = position_dodge(0.9))+
geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, position = position_dodge(0.9))+
geom_text(aes(label = scales::percent(round(mean.percent_present,3))), hjust = 1,
colour = "white", size = 3,
position = position_dodge(0.9)
  ) +
  labs(title = "Attendance by distance to school",
subtitle = "Avg % sessions attended, 2023 +-95CI; straight line home to school distance; excluding children with EHCP",
caption = "data from Capita One")+
  barplottheme_minimal +
  theme(axis.title.x = eb, axis.text.y = element_text(size = 8), axis.text.x = eb,
legend.position = "none", plot.subtitle = element_text(size = 8, face = "italic"),
strip.background = eb) +
coord_flip() +
facet_grid(cols = vars(phase)) +
scale_fill_manual(values = c("others"= "#0072B2", "total" = "#b47846"))
Plotting the average distance travelled against average attendance rates for secondary schools reveals four groupings:
# the distance data is already filtered to just Sheffield schools, but here we want to remove specials & nursery:
dist_data <- sch_dist_sheff_23 |>
  filter(school_type == "mainstream",
         school_ed_phase == "Secondary")

dist_by_sch <- dist_data |>
  group_by(school_short_name, school_ed_phase) |>
summarise(mean.dist_crow = mean(dist_crow, na.rm = TRUE),
sd.dist_crow = sd(dist_crow, na.rm = TRUE),
n.dist_crow = n() ) |>
mutate(se.dist_crow = sd.dist_crow / sqrt(n.dist_crow),
lower.ci.dist_crow = mean.dist_crow - qt(1 - (0.05 / 2), n.dist_crow - 1) * se.dist_crow,
upper.ci.dist_crow = mean.dist_crow + qt(1 - (0.05 / 2), n.dist_crow - 1) * se.dist_crow)
dist_attend_by_sch <- dist_data |>
  group_by(school_short_name, school_ed_phase) |>
  presence_mean_calc()
sch_dist_by_sch <- inner_join(
  dist_by_sch,
  dist_attend_by_sch)
# plot
ggplot(sch_dist_by_sch,
aes(x = mean.dist_crow,
y = mean.percent_present#,
#colour = "dark blue"
       )) +
  geom_point(alpha = 0.7, colour = "steel blue")+
geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), colour = "steel blue", alpha = 0.5) +
geom_errorbar(aes(xmin = lower.ci.dist_crow, xmax = upper.ci.dist_crow), colour = "steel blue", alpha = 0.5) +
geom_text_repel(aes(label = school_short_name), size = 2.5, colour = "steel blue") +
labs(title = "Attendance vs distance travelled",
subtitle = "Sheffield Secondary schools; 2023 attendance rates",
x = "average straight line distance from home to school (m)",
y = "average % of sessions attended") +
scale_y_continuous(labels = scales::percent) +
theme(legend.position = "none")
Plotting the distance travelled against attendance at the child level reveals further differences. In the plot below we take one example from each of the four groups described above.
We can think of dividing these plots into four quadrants:
Notre Dame High has good attendance across the board, with variation that appears unrelated to the distance travelled. Mercia has excellent attendance and a limited distance travelled, presumably due to its oversubscription and high demand, with most data points appearing in the top left. The trend line points slightly down, as a few children who live further away have lower attendance.
Meadowhead has typical average values for both attendance and distance, appearing in the middle of the pack in the plot above. Most children attend well and those with poorer attendance generally live close by - there are few in the bottom right. Chaucer by contrast has a small but significant number of points in the bottom right quadrant - those who attend very poorly and live far away. Some of this may be explained by families failing to secure a place at closer schools, and being placed across the city, with the distance then contributing to poor attendance.
sch_dist_sheff_23 |>
  filter(school_short_name %in% c("Chaucer", "Mercia", "Notre Dame High",#)) |>
"Meadowhead")) |>
mutate(school = factor(school_short_name, levels = c("Notre Dame High","Mercia","Meadowhead","Chaucer"))) |>
ggplot(aes(x = dist_crow,
y = percent_present,
colour = school,
group = school)) +
geom_point(alpha = 0.6, size = 1.5) +
geom_smooth(alpha = 0.4) +
scale_y_continuous(labels = scales::percent) +
facet_wrap(vars(school)) +
theme(legend.title = eb,
legend.text = element_text(size = 7.5),
legend.position = "none", strip.background = eb
  ) +
  labs(title = "Attendance vs distance travelled",
subtitle = "Selected Sheffield secondary schools",
x = "straight line home to school distance (m)",
y = "% of sessions attended")
It is difficult to establish the true number of young carers in the city, and any figure depends on definitions and methods. A 2023 report by the All-Party Parliamentary Group (APPG) for young carers and adult carers cites several sources:
yc_estimate_10pc <- sheffield_pupil_population_20241 * 0.1
Applying the 10% figure to Sheffield’s pupil population would indicate over 7000 young carers in the city. Our local data identifies just 904 since 2020, so we provide the analysis here with the following caveat:
The data used in this section of the report comes from young carer type involvements in Capita One, covering around 900 children from 2020 onwards. Clearly our data doesn’t capture all young carers (and may skew towards those at the more severe end of the caring spectrum) and/or we are working with different definitions of what a young carer is. Issues with getting people of all ages to self-identify as carers are well known, and the perceived stigma attached to caring roles is likely more acute in young people - indeed this is probably a factor in explaining differences in school attendance.
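The size of that shortfall can be put in rough numbers. The sketch below assumes the pupil population is the 72,545 "all children" count from the tables earlier in this section; the report's own population figure for this estimate is not shown and may differ.

```r
# Assumption: pupil population taken from the "all children" row of the tables above
pupil_population <- 72545
# Young carers identified in local data since 2020 (figure quoted in the text)
identified_yc <- 904

# Apply the 10% prevalence figure, then compare with the locally identified count
yc_estimate_10pc <- pupil_population * 0.1
share_identified <- identified_yc / yc_estimate_10pc

cat(sprintf("share of estimated young carers identified locally: %.1f%%\n",
            100 * share_identified))
```

On these assumptions the local data would cover roughly one in eight of the young carers implied by the 10% figure.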
The involvements have an open date but no close date, so a time series analysis of volumes isn’t possible. The data also implicitly assumes that a young carer remains so for the rest of their school career.
A descriptive or demographic analysis may also be misleading, but we can make a comparison of attendance rates, which shows a significant impact. Primary age young carers attend just under 4% less than those without a caring role. In secondary school this gap rises to 10%:
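The comparison rests on flagging a pupil as a young carer from the involvement's open date onwards. A minimal sketch of that rule on made-up rows is given here in base R, to stay free of package dependencies; the report's own code uses a dplyr inequality join. The data values are hypothetical, and the column names follow those used in the report.

```r
# Hypothetical half-term attendance rows for two pupils
attend <- data.frame(
  stud_id = c(1, 1, 2, 2),
  ht_start_date = as.Date(c("2021-09-01", "2022-09-01", "2021-09-01", "2022-09-01")),
  percent_present = c(0.95, 0.90, 0.96, 0.97)
)

# Hypothetical involvement: pupil 1 is recorded as a young carer from January 2022
young_carers <- data.frame(
  stud_id = 1,
  open_date = as.Date("2022-01-10"),
  code_des = "young carer"
)

# Attach any involvement by pupil, then flag only the half terms that start
# on or after the open date. There is no close date, so the flag never switches off.
joined <- merge(attend, young_carers, by = "stud_id", all.x = TRUE)
joined <- joined[order(joined$stud_id, joined$ht_start_date), ]
is_yc <- !is.na(joined$open_date) & joined$ht_start_date >= joined$open_date
joined$yc_flag <- ifelse(is_yc, joined$code_des, "not young carer")
print(joined[, c("stud_id", "ht_start_date", "yc_flag")])
```

A pupil with more than one involvement would appear more than once after a join like this, so de-duplicating the involvements first may be worth considering.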
# can't do a time series on volumes as there are no close dates
# yc_time_series <-
# seq(ymd('2015-04-01'),ymd('2024-07-1'), by = '3 months')
# young_carers
attend_yc_phase <- attend |>
  filter(ncy >= 1, ncy <= 11,
         !phase %in% c("Nursery","6th form"),
         year >= 2020) |>
  left_join(young_carers,
            join_by(stud_id == stud_id,
                    ht_start_date >= open_date)) |>
  mutate(yc_flag = replace_na(code_des,"not young carer")) |>
group_by(yc_flag, phase) |>
presence_mean_calc()
attend_yc_ncy <- attend |>
  filter(ncy >= 1, ncy <= 11,
         year >= 2020) |>
  left_join(young_carers,
            join_by(stud_id == stud_id,
                    ht_start_date >= open_date)) |>
  mutate(yc_flag = replace_na(code_des,"not young carer")) |>
group_by(yc_flag, ncy) |>
presence_mean_calc()
attend_yc_year <- attend |>
  filter(ncy >= 1, ncy <= 11,
         year >= 2020) |>
  left_join(young_carers,
            join_by(stud_id == stud_id,
                    ht_start_date >= open_date)) |>
  mutate(yc_flag = replace_na(code_des,"not young carer")) |>
group_by(yc_flag, year) |>
presence_mean_calc()
ggplot(attend_yc_phase, aes(x = reorder(yc_flag,mean.percent_present, desc = TRUE),
y = mean.percent_present
  )) +
  geom_col(fill = "steel blue", position = position_dodge(0.9))+
geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, position = position_dodge(0.9))+
geom_text(aes(label = scales::percent(round(mean.percent_present,3))), hjust = 1,
colour = "white", size = 5,
position = position_dodge(0.9)) +
labs(title = "Attendance of young carers",
subtitle = "Avg % sessions attended, 2023 +-95CI; young carer status from capita one involvements",
caption = "data from Capita One")+
  barplottheme_minimal +
  theme(axis.title.x = eb, axis.text.y = element_text(size = 8), axis.text.x = eb,
legend.position = "none", plot.subtitle = element_text(size = 8, face = "italic"),
strip.background = eb) +
coord_flip() +
facet_grid(cols = vars(phase))
As we did for deprivation quartiles above, we can create an age profile of attendance for young carers, and compare it to pupils with no caring role. Again we see a greater impact on attendance as age increases, presumably as the expectations and stigmatisation around caring roles also increase. There is a particular drop in attendance going into year 8.
ggplot(attend_yc_ncy, aes(x = ncy,
y = mean.percent_present,
colour = yc_flag, group = yc_flag,
#label = label
  )) +
  geom_point() + geom_line() +
#geom_label_repel(hjust = 0, nudge_y = c(0.05,0.02,0.02), min.segment.length = Inf, alpha = 0.8) +
geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.7)+
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = seq(1,11)) +
labs(title = "Attendance of<b><span style='color:#ce9642'> young carers </span></b>and <b><span style='color:#3b7c70;'>those without</b></span> a caring role",
subtitle = "average % of sessions attended since 2020; young carers data from Capita One involvements",
#caption = "data from Capita One"
  ) +
  theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.title.x = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky") +
  theme(axis.title.y = eb, legend.position = "none") +
geom_vline(aes(xintercept = 6.5), linetype = "dotted", colour = "gray70", linewidth = 1.2) +
annotate("text", label = "primary", y = 0.99, x = 3.5, colour = "gray40") +
annotate("text", label = "secondary", y = 0.99, x = 9, colour = "gray40")
The attendance of young carers appears to be getting worse over time, though this may be a function of the cumulative nature of the data, which has no end dates attached, so our cohort of young carers is ageing in the system.
ggplot(attend_yc_year |>
filter(year > 2020),
aes(x = year,
y = mean.percent_present,
colour = yc_flag, group = yc_flag,
  )) +
  geom_point() + geom_line() +
geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.7)+
scale_y_continuous(labels = scales::percent) +
labs(title = "Attendance of<b><span style='color:#ce9642'> young carers </span></b>and <b><span style='color:#3b7c70;'>those without</b></span> a caring role",
subtitle = "average % of sessions attended since 2020; young carers data from Capita One involvements; 2025 data is Autumn half-term 1 only",
caption = "data from Capita One"
) +
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.title = eb,
axis.line = eb,
axis.ticks = eb) +
MetBrewer::scale_fill_met_d("Kandinsky") +
MetBrewer::scale_colour_met_d("Kandinsky")
Better long term data is required to understand volumes, impacts & the geographical distribution of young carers, as well as change over time and the provision of services to young carers.
Here we show how attendance is changing for each annual year group cohort of children, and explore some of the intersectionality between age, deprivation and special educational needs. This analysis particularly demonstrates differences in how the COVID pandemic, lockdowns and subsequent societal shifts have affected different groups.
Annual cohorts of children are referred to here as, for example, the “class of 2025” meaning the year group who began year 1 in September 2014 and will complete Y11 in July 2025. In each case there is a separate small line chart for each annual cohort. Data are labelled with the academic year and the % attendance rates, and the time period is divided into three phases: pre pandemic, during (2020 & 2021), and post pandemic - all years since. The time periods are denoted by colours or shapes, depending on the chart.
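The cohort labelling and the three pandemic phases described above can be sketched as follows. This is a minimal illustration on made-up rows, not the report's data model: the column name `finishing_year` is hypothetical, and the report's own `class_of` field may be stored differently (for example as a text label).

```r
library(dplyr)

# Hypothetical toy rows: academic year and national curriculum year (ncy)
toy <- tibble(
  year = c(2015, 2019, 2020, 2021, 2022, 2025),
  ncy  = c(1, 5, 6, 7, 8, 11)
)

toy_labelled <- toy |>
  mutate(
    # a pupil in NCY n in a given academic year completes Y11 (11 - n) years later
    finishing_year = year + (11 - ncy),
    # same three phases as used in the charts in this section
    covid_year_flag = case_when(
      year < 2020 ~ "pre-COVID",
      year %in% c(2020, 2021) ~ "lockdown years",
      year > 2021 ~ "post-pandemic"
    )
  )

toy_labelled
```

Every toy row belongs to the same cohort, so `finishing_year` is 2025 throughout; the first row (Y1 in the 2015 academic year, starting September 2014) matches the "class of 2025" example in the text.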
The first chart shows the overall picture in secondary schools. The first cohort shown here is the class of 2020, who completed most of Y11 before the pandemic struck. Their GCSE exams were wildly disrupted, but their attendance follows only a shallow decline from Y7 through to Y11. By contrast, the classes of ’23 to ’25 (on the middle row) saw dramatic drops during the COVID years, and a continued decline in the period since. The classes of ’24 and ’25 were perhaps worse hit by the pandemic, effectively missing Y6-7 and Y7-8 respectively. Finally, the bottom row shows the latest three cohorts and some small but encouraging signs of recovery: the class of ’27 have less of a drop off to Y8, and the class of ’28 had the best attendance in Y7 since before the pandemic.
##| fig-height: 8
#| warning: false
#| message: false
attend |>
#select(-phase) |>
#rename(phase = school_ed_phase_corrected) |>
mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |>
filter(cohort >= 2010, phase == "Secondary") |>
summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","phase")) |>
filter(ncy <= 11 & ncy >= 1,
child_count > 1000
) |>
ggplot(aes(x = ncy,
y = percent_present,
colour = covid_year_flag,
group = class_of
)) +
geom_point() + geom_line() +
scale_colour_manual(values = c("pre-COVID" = "#4DAF4A" ,"lockdown years" = "#E41A1C", "post-pandemic" = "#377EB8")) +
geom_text(aes(label = year), size = 2.5, nudge_y = -0.01, colour = "darkgrey") +
geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.01, alpha = 0.7) +
#scale_x_continuous(breaks = seq(7:11)) +
facet_wrap(vars(class_of))+
theme(legend.position = "top", axis.title.y = eb, strip.background = eb, axis.line.y = eb, axis.text.y = eb, axis.ticks.y = eb) +
labs(title = "Secondary school attendance by national curriculum year and annual cohort",
subtitle = "% of available sessions attended; all Sheffield schools; pandemic years are 2020 and 2021",
x = "NCY", colour = "pandemic time period", caption = "data from Capita One")
The picture in primary schools looks very different. Children generally attend better in years 2 to 4 than they do in Y1, so the underlying profile is more of a hump than the steady decline seen in secondary. The pandemic had a less dramatic effect on primary age children, although the decline still persisted into the post-pandemic years for many cohorts. However, the big difference here, and an encouraging sign for the future, is that all cohorts from the class of ’29 onwards show improvements in recent years (here coloured blue), and that the youngest cohorts are showing the fastest improvements of all.
##| fig-height: 8
#| warning: false
attend |>
#select(-phase) |>
#rename(phase = school_ed_phase_corrected) |>
mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |>
mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |>
filter(cohort >= 2016, phase == "Primary") |>
summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","phase")) |>
filter(ncy <= 11 & ncy >= 1,
child_count > 1000
) |>
ggplot(aes(x = ncy,
y = percent_present,
colour = covid_year_flag,
group = class_of
)) +
geom_point() + geom_line() +
scale_colour_manual(values = c("pre-COVID" = "#4DAF4A" ,"lockdown years" = "#E41A1C", "post-pandemic" = "#377EB8")) +
geom_text(aes(label = year), size = 2.5, nudge_y = -0.005, colour = "darkgrey") +
geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.005, alpha = 0.7) +
scale_x_continuous(breaks = seq(1:6)) +
coord_cartesian(expand = FALSE, clip = "off") +
facet_wrap(vars(class_of))+
theme(legend.position = "bottom", legend.direction = "horizontal", legend.justification = "left", axis.title.y = eb, strip.background = eb, axis.line.y = eb, axis.text.y = eb, axis.ticks.y = eb) +
labs(title = "Primary school attendance by national curriculum year and annual cohort",
subtitle = "% of available sessions attended; all Sheffield schools; pandemic years are 2020 and 2021",
x = "NCY", colour = "time period: ", caption = "data from Capita One")
Re-creating the same plot but split by deprivation quartile, it becomes clear how the effects of the pandemic were concentrated in the more deprived areas of the city. Here the middle two quartiles of deprivation have been removed, and the pairs of lines show the most and least deprived quartiles of the school population, according to the 2019 indices of multiple deprivation scores of their ward of residence.
For all annual cohorts the gap is stark: children living in more deprived areas were worse affected during the pandemic and have seen worse post-pandemic declines in attendance. If there is good news here, it is a narrowing of the gap in the latest Y7 intake.
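As a rough sketch of how pupils might be split into deprivation quartiles and reduced to the two outer groups, the example below uses `dplyr::ntile()` on made-up scores. The `imd_score` column and its values are hypothetical, and how `imd_quartile` is actually derived in the data model is not shown in this section, so treat this as one plausible approach rather than the report's method.

```r
library(dplyr)

# Hypothetical pupils with the IMD score of their ward of residence
# (higher score = more deprived)
toy_pupils <- tibble(
  stud_id   = letters[1:8],
  imd_score = c(5.2, 48.9, 12.4, 33.0, 8.1, 61.3, 20.7, 27.5)
)

toy_quartiles <- toy_pupils |>
  # ntile() splits the pupils into four equal-sized groups by score
  mutate(imd_quartile = ntile(imd_score, 4)) |>
  # keep only the outer quartiles, as in the charts in this section
  filter(imd_quartile %in% c(1, 4)) |>
  mutate(imd_quartile = case_when(imd_quartile == 1 ~ "least deprived",
                                  imd_quartile == 4 ~ "most deprived"))

toy_quartiles
```

Note that once the quartile has been recoded to text labels, any later filter has to use the label (for example `"most deprived"`) rather than the original number.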
##| fig-height: 8
#| warning: false
plot_data <- attend |>
#select(-phase) |>
left_join(stud_details_joined, by = "stud_id") |>
filter(imd_quartile %in% c(1,4)) |>
mutate(imd_quartile = case_when(imd_quartile == 1 ~ "least deprived",
imd_quartile == 4 ~ "most deprived")) |>
#rename(phase = school_ed_phase_corrected) |>
mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |>
mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |>
filter(cohort >= 2010, phase == "Secondary") |>
ungroup() |>
summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag",
"phase",
"imd_quartile"
)) |>
filter(ncy <= 11 & ncy >= 1,
child_count > 100
)
ggplot(plot_data,
aes(x = ncy,
y = percent_present,
colour = imd_quartile,
group = imd_quartile,
label = year,
shape = covid_year_flag)) +
geom_point(size = 2.5) +
scale_shape_manual(values = c("pre-COVID" = 1, "lockdown years" = 8, "post-pandemic" = 4)) +
geom_line() +
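# imd_quartile has been recoded to text labels above, so filter on the label rather than 4
geom_text(data = plot_data |> filter(imd_quartile == "most deprived"),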
aes(label = year), size = 2.5, nudge_y = -0.02, colour = "darkgrey") +
geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.02, alpha = 0.7) +
scale_y_continuous(labels = scales::percent) +
facet_wrap(vars(class_of))+
theme(axis.title = eb, strip.background = eb, legend.position = "top", strip.text = element_text(size = 7), legend.text = element_text(size = 7)) +
labs(title = "Secondary school attendance over time by national curriculum year, and deprivation quartile",
subtitle = "% of available sessions attended, top & bottom 25% by 2019 Indices of Multiple Deprivation (IMD) score of ward of residence",
caption = "data from Capita One",
shape = "COVID time period",
colour = "IMD quartile")
Repeating the same deprivation analysis for primary, we again see how the pandemic disproportionately affected children in more deprived areas, with steeper drop-offs during the lockdown years. But for the classes of ’ we see recovery after the pandemic, for all cohorts and among the more deprived q
##| fig-height: 8
#| warning: false
plot_data <- attend |>
#select(-phase) |>
left_join(stud_details_joined, by = "stud_id") |>
filter(imd_quartile %in% c(1,4)) |>
mutate(imd_quartile = case_when(imd_quartile == 1 ~ "least deprived",
imd_quartile == 4 ~ "most deprived")) |>
#rename(phase = school_ed_phase_corrected) |>
mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |>
mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |>
filter(cohort >= 2016, phase == "Primary") |>
ungroup() |>
summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","phase","imd_quartile"
)) |>
filter(ncy <= 11 & ncy >= 1,
child_count > 100
) |>
group_by(class_of)
ggplot(plot_data,
aes(x = ncy,
y = percent_present,
colour = imd_quartile,
group = imd_quartile,
label = year,
shape = covid_year_flag)) +
geom_point(size = 3) +
scale_shape_manual(values = c("pre-COVID" = 1, "lockdown years" = 8, "post-pandemic" = 4)) +
geom_line() +
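# imd_quartile has been recoded to text labels above, so filter on the label rather than 4
geom_text(data = plot_data |> filter(imd_quartile == "most deprived"),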
aes(label = year), size = 2.5, nudge_y = -0.01, colour = "darkgrey") +
geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.01, alpha = 0.7) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = seq(1,6))+
facet_wrap(vars(class_of))+
theme(legend.position = "top",
axis.title = eb, strip.background = eb) +
labs(title = "Primary school attendance over time by national curriculum year, and deprivation quartile",
subtitle = "% of available sessions attended, top & bottom 25% by 2019 Indices of Multiple Deprivation (IMD) score of ward of residence",
caption = "data from Capita One",
shape = "COVID time period",
colour = "IMD quartile")
plot_data <- attend |>
mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |>
mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |>
filter(cohort >= 2010, phase == "Secondary",
ncy >= 7, ncy <= 11) |>
ungroup() |>
summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","sen_level"
)) |>
filter(ncy <= 11 & ncy >= 1,
# TO DO - fix this - what are the right levels
(sen_level %in% c("EHCP","SEN support") | child_count > 500)
) |>
group_by(class_of)
ggplot(plot_data,
aes(x = ncy,
y = percent_present,
colour = sen_level,
group = sen_level,
label = year,
shape = covid_year_flag)) +
geom_point(size = 3) +
scale_shape_manual(values = c("pre-COVID" = 1, "lockdown years" = 8, "post-pandemic" = 4)) +
geom_line() +
geom_text(data = plot_data |> filter(sen_level == "EHCP"),
aes(label = year), size = 2.5, nudge_y = -0.05, colour = "darkgrey") +
geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.01, alpha = 0.7) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = seq(7,11))+
facet_wrap(vars(class_of))+
theme(legend.position = "top",
axis.title = eb, strip.background = eb) +
labs(title = "Secondary school attendance over time by national curriculum year, and SEN level",
subtitle = "% of available sessions attended",
caption = "data from Capita One",
shape = "COVID time period",
colour = "SEN level")
# TO DO why are there duplicates in some?
#check <- plot_data |> filter(
# class_of == "class of 2023",
# sen_level == "No SEN"
#)
plot_data <- attend |>
mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |>
mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |>
filter(cohort >= 2016, phase == "Primary") |>
ungroup() |>
summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","phase","sen_level"
)) |>
filter(ncy <= 11 & ncy >= 1,
child_count > 100
) |>
group_by(class_of)
ggplot(plot_data,
aes(x = ncy,
y = percent_present,
colour = sen_level,
group = sen_level,
label = year,
shape = covid_year_flag)) +
geom_point(size = 3) +
scale_shape_manual(values = c("pre-COVID" = 1, "lockdown years" = 8, "post-pandemic" = 4)) +
geom_line() +
#geom_text(data = plot_data |> filter(imd_quartile == 4),
# aes(label = year), size = 2.5, nudge_y = -0.01, colour = "darkgrey") +
geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.01, alpha = 0.7) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = seq(1,6))+
facet_wrap(vars(class_of))+
theme(legend.position = "top",
axis.title = eb, strip.background = eb) +
labs(title = "Primary school attendance over time by national curriculum year, and SEN level",
subtitle = "% of available sessions attended",
caption = "data from Capita One",
shape = "COVID time period",
colour = "SEN level")
Children are classed as severely absent if they miss over 50% of available sessions in any given period. This section explores the characteristics of severely absent children, and how this is changing over time.
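The definition above can be illustrated with a small sketch. The columns `sessions_possible` and `sessions_present` are hypothetical names on made-up data, and "over 50%" is read here as strictly more than half; the data model's own `severe_absence` flag may be built differently.

```r
library(dplyr)

# Hypothetical pupil-level session totals for one academic year
toy_sessions <- tibble(
  stud_id           = c("a", "b", "c", "d"),
  sessions_possible = c(380, 380, 200, 380),
  sessions_present  = c(370, 150, 100, 190)
)

toy_flagged <- toy_sessions |>
  mutate(
    percent_absent = 1 - sessions_present / sessions_possible,
    # flag pupils missing strictly more than half of their available sessions
    severe_absence = if_else(percent_absent > 0.5, 1, 0)
  )

# share of pupils in the toy data who are severely absent
mean(toy_flagged$severe_absence)
```

Pupils "c" and "d" miss exactly half of their sessions, so under this reading they are not flagged; only pupil "b" is.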
Severe absence rates (here measured over the full year) followed similar trajectories in primary and secondary schools through the pandemic, but are now diverging. As with overall absence, rates were already rising slowly before the pandemic. 2021 was a peak year, and 2022 showed recovery. Rates increased into 2023 but much more in secondary schools, and 2024 shows rates improving for primary but stable in secondary schools. If this rate in secondary schools represents a new normal, it is over double the pre-pandemic rate.
Almost 1 in 20 children at Sheffield secondary schools was severely absent in 2023.
ggplot(attend_year_phase |>
filter(phase %in% c("Primary","Secondary"),
year >= 2018) |>
mutate(grey_flag = if_else(year==2024,0,1)),
aes(x = year, y = pc_of_pupils_severely_absent, #alpha = grey_flag,
colour = phase)) +
geom_point() +
geom_line(linetype = "dashed", alpha = 0.5) +
geom_text(aes(label = year), size = 3, vjust = 1.5) +
barplottheme_minimal +
scale_y_continuous(labels = scales::percent) +
labs(title = "Severe absence by academic year in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</b></span> schools",
subtitle = "percentage of pupils missing over half of available sessions",
caption = "data from Capita One") +
annotate("text", x = 2020.3, y = 0.045, label = "COVID-19", size = 3, hjust = 1.1) +
geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none",
axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) +
MetBrewer::scale_fill_met_d("Egypt")
Next we look at the severe absence rates of groups with different characteristics. The groupings here are chosen as those that show significant differences in severe absence rates. Note that the characteristics given here are not mutually exclusive. Children with an EHCP were nearly 8% more likely to be severely absent than average. Children on free school meals were nearly 6% more likely. Children in Y11 have twice the average rate.
All primary years and a few ethnic groups have significantly lower severe absence rates.
sev_pc_all <- attend_stud_year |> filter(year == 2023) |>
group_by(severe_absence) |>
tally() |>
mutate(pc = n / sum(n),
category = "all children") |>
filter(severe_absence == 1) |>
select(pc) |> pull()
sev_pc_eth_cat <- attend_stud_year_ethcat |> filter(year == 2023) |>
group_by(ethnicity_category, severe_absence) |>
tally() |>
mutate(pc = n / sum(n)) |>
mutate(category = str_c("ethnicity category ",ethnicity_category)) |>
ungroup() |> select(-ethnicity_category) |>
filter(severe_absence == 1)
sev_pc_eth_cat_2 <- attend |> filter(year == 2023) |> summarise_attendance(grouping_vars = "ethnicity_category")
sev_pc_gender <- attend_stud_year_gender |> filter(year == 2023) |>
group_by(gender, severe_absence) |>
tally() |> mutate(pc = n / sum(n)) |>
mutate(category = str_c("gender ",gender)) |>
ungroup() |> select(-gender) |>
filter(severe_absence == 1)
sev_pc_ncy <- attend_stud_year_ncy |> filter(year == 2023) |>
group_by(ncy, severe_absence) |>
tally() |> mutate(pc = n / sum(n)) |>
mutate(category = str_c("ncy ",ncy)) |>
ungroup() |> select(-ncy) |>
filter(severe_absence == 1)
sev_pc_fsm <- attend_stud_year_fsm |> filter(year == 2023) |>
group_by(fsm, severe_absence) |>
tally() |> mutate(pc = n / sum(n)) |>
mutate(category = if_else(fsm == 1, "free school meals",
"not on free school meals")) |>
ungroup() |> select(-fsm) |>
filter(severe_absence == 1)
sev_pc_sen_level <- attend_stud_year_sen_level |> filter(year == 2023) |>
group_by(sen_level, severe_absence) |>
tally() |> mutate(pc = n / sum(n)) |>
mutate(category = str_c("SEN level - ",sen_level)) |>
ungroup() |> select(-sen_level) |>
filter(severe_absence == 1)
sev_plot_data <- rbind(
#sev_pc_all,
sev_pc_eth_cat,#sev_pc_gender,
sev_pc_ncy,
sev_pc_fsm,
sev_pc_sen_level
) |>
filter(!is.na(category))
ggplot(sev_plot_data,
aes(x = reorder(category,pc),
y = pc,
fill = pc)) +
geom_col() +
geom_text(aes(label = scales::percent(pc, accuracy = 1.1L)), size = 2.5, colour = "darkgrey",
nudge_y = 0.004) +
scale_y_continuous(labels = scales::percent) +
theme(axis.title = eb, legend.position = "none",
axis.text.y = element_text(size = 7.5)) +
labs(title = "Severe absence rates by selected pupil characteristics",
subtitle = "% of children in each group attending less than 50% of available sessions in 2023") +
geom_hline(aes(yintercept = sev_pc_all), linetype = "dotted") +
geom_text(label = str_c("all pupils ",scales::percent(sev_pc_all, accuracy = 1.1L)), x = 4.5, y = 0.045, size = 3, colour = "dark gray") +
coord_flip() +
scale_fill_distiller(palette = "Spectral")
The chart above shows relative severe absence rates of different groups, but we’ll complement that by quantifying the cohort of severely absent pupils in 2023 by their characteristics.
sa_2023 <- attend |>
filter(year == 2023,
ncy >= 1, ncy <= 11,
severe_absence == 1
) |>
left_join(stud_details_joined |>
select(stud_id, imd_quartile),
by = "stud_id") |>
select(stud_id, gender, ncy, imd_quartile, primary_specific_need) |>
group_by(stud_id) |>
slice(1)
sa_2023 |>
mutate(ncy = factor(ncy, levels = c(1,2,3,4,5,6,7,8,9,10,11))) |>
group_by(ncy) |>
count() |>
ggplot(aes(fill = ncy,
values = n)) +
expand_limits(x=c(0,0), y=c(0,0)) +
coord_equal() +
labs(
title = "Severely absent children in Sheffield, by national curriculum year",
subtitle = "Pupils missing over 50% of sessions in 2022-23",
fill = NULL, colour = NULL) +
#theme_ipsum_rc(grid="") +
theme_enhance_waffle() +
#theme(axis.line = eb, axis.text = eb, axis.ticks = eb) +
geom_waffle(
size = 0.5,
n_rows = 10,
colour = "white",
#radius = unit(1, "pt")
flip = TRUE#,
#make_proportional = TRUE
) +
facet_grid(~ncy) +
theme(axis.line = eb, axis.ticks = eb, axis.text = eb, legend.position = "none")
sa_2023 |>
filter(!is.na(imd_quartile)) |>
mutate(imd_quartile = factor(imd_quartile, levels = c(1,2,3,4))) |>
group_by(imd_quartile) |>
count() |>
ggplot(aes(fill = imd_quartile, values = n)) +
expand_limits(x=c(0,0), y=c(0,0)) +
coord_equal() +
labs(
title = "Severely absent children in Sheffield, by deprivation quartile",
subtitle = "Pupils missing over 50% of sessions in 2022-23",
fill = NULL, colour = NULL) +
#theme_ipsum_rc(grid="") +
theme_enhance_waffle() +
#theme(axis.line = eb, axis.text = eb, axis.ticks = eb) +
geom_waffle(
size = 0.5,
n_rows = 40,
colour = "white",
#radius = unit(1, "pt")
flip = TRUE#,
#make_proportional = TRUE
) +
geom_text(aes(x = c(1,2,3,4), y = (n / 40) + 2, label = n), nudge_x = 27, size = 2.5) +
facet_grid(~imd_quartile) +
theme(axis.line = eb, axis.ticks = eb, axis.text = eb, legend.position = "none")
sa_2023 |>
mutate(primary_specific_need = replace_na(primary_specific_need, "No SEN")) |>
mutate(primary_specific_need = factor(primary_specific_need)) |>
group_by(primary_specific_need) |>
tally() |>
mutate(primary_specific_need = reorder(primary_specific_need,desc(n))) |>
ggplot(aes(fill = primary_specific_need,
area = n,
label = paste(primary_specific_need,n, sep = "\n"))) +
labs(title = "Severely absent children in Sheffield, by primary specific need",
subtitle = "Pupils missing over 50% of sessions in 2022-23",
fill = NULL, colour = NULL) +
geom_treemap() +
geom_treemap_text(place = "centre",
size = 8,
force.print.labels = TRUE,
reflow = TRUE) +
theme(legend.position = "none")
In the chart below, severely absent children are classed as retained if they were also severely absent the year before, and new if not. Both categories have risen in recent years:
sa <- attend_stud_year |>
left_join(stud_details_joined) |> #might want this but not yet
left_join(attend |> select(stud_id, year, school_ed_phase_corrected) |> distinct()) |>
filter(severe_absence == 1,
school_ed_phase_corrected == "Secondary") |>
select(stud_id, year) |>
mutate(sa = 1,
prev_year = year - 1)
sa_yoy <- sa |>
left_join(sa |> select(-prev_year) |> rename(retained = sa),
join_by(stud_id == stud_id,
prev_year == year)) |>
mutate(
retained = if_else(is.na(retained),0,1),
new = if_else(retained == 0,1,0)
)
sa_yoy_crunched <- sa_yoy |>
group_by(year) |>
summarise(total = sum(sa),
new = sum(new),
retained = sum(retained),
pc_retained = sum(retained) / sum(sa)) |>
pivot_longer(cols = -year,
names_to = "category",
values_to = "value") |>
filter(year > 2006)
ggplot(sa_yoy_crunched |> filter(year >= 2018, category %in% c("new","retained")),
aes(x = year,
y = value,
colour = category,
group = category)) +
geom_point() + geom_line() +
labs(title = "Severely absent children: <b><span style='color:#dd5129'>new in the year </span></b>and <b><span style='color:#0f7ba2;'>retained from the previous year</b></span>",
subtitle = "Secondary provision only; count of children attending less than 50% of available sessions; 2024 data excludes the summer term",
caption = "data from Capita One") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none", axis.title = eb) +
MetBrewer::scale_fill_met_d("Egypt")
So the problem of severe absence is, in part, due to a cohort we could describe as chronically severely absent.
The retention rate here is calculated as the percentage of all severely absent pupils in a given year that were also severely absent the year before. In secondary schools, around 40% of children who were severely absent in 2023 were also severely absent in 2022.
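A worked example of the retention calculation on made-up data may help. The pupil IDs below are invented and chosen so that the 2023 figure comes out at 40%; the self-join on the previous year follows the same idea as the report's own code, but the names `toy_sa` and `toy_retention` are hypothetical. It assumes dplyr 1.1.0 or later for `join_by()`.

```r
library(dplyr)

# Hypothetical list of severely absent pupils: one row per pupil per year
toy_sa <- tibble(
  stud_id = c("a", "b", "c", "a", "b", "d", "e", "f"),
  year    = c(2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023)
)

toy_retention <- toy_sa |>
  mutate(prev_year = year - 1) |>
  # self-join: was the same pupil also on the previous year's list?
  left_join(
    toy_sa |> mutate(retained = 1),
    by = join_by(stud_id == stud_id, prev_year == year)
  ) |>
  mutate(retained = if_else(is.na(retained), 0, 1)) |>
  group_by(year) |>
  summarise(
    total       = n(),
    retained    = sum(retained),
    pc_retained = retained / total
  )

toy_retention
```

In the toy data, five pupils are severely absent in 2023 and two of them ("a" and "b") were also on the 2022 list, giving a retention rate of 2 / 5 = 40%. The first year in any such series will always show 0% retained, because there is no earlier year to match against.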
This retention rate has risen in recent years:
ggplot(sa_yoy_crunched |> filter(year >= 2018, category == "pc_retained"),
aes(x = year,
y = value, label = scales::percent(value, accuracy = 1.1L))) +
geom_point() + geom_line(linetype = "dotted") + geom_text(size = 2.5, nudge_y = -0.02) +
#geom_text(aes(label = scales::percent(pc_retained, accuracy = 1.1L, size = 3))) +
#geom_col(position = position_stack()) +
labs(title = "Year on year severe absence retention rate (secondary)",
subtitle = "% of severely absent children who were severely absent in the previous year",
caption = "data from Capita One") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none", axis.title = eb, axis.line.y = eb, axis.text.y = eb, axis.ticks.y = eb) +
MetBrewer::scale_fill_met_d("Egypt")
Plotting the retention rate by NCY shows increased year on year retention as children grow older. Here we’ve included the NCY profiles of two years: 2018 and 2024, showing the increased retention rates across the board into 2024.
sa_ncy <- attend_stud_year_ncy |>
left_join(stud_details_joined) |> #might want this but not yet
left_join(attend |> select(stud_id, year, school_ed_phase_corrected) |> distinct()) |>
filter(severe_absence == 1,
#school_ed_phase_corrected == "Secondary",
ncy >= 6, ncy <= 11) |>
select(stud_id, year, ncy) |>
mutate(sa = 1,
prev_year = year - 1)
sa_yoy <- sa_ncy |>
left_join(sa_ncy |> select(-prev_year, -ncy) |> rename(retained = sa),
join_by(stud_id == stud_id,
prev_year == year)) |>
mutate(
retained = if_else(is.na(retained),0,1),
new = if_else(retained == 0,1,0)
) |>
filter(ncy >= 7)
sa_yoy_ncy_crunched <- sa_yoy |>
filter(year %in% c(2018,
#2019,
#2020,
#2021,
#2022,
#2023,
2024)) |>
group_by(year, ncy) |>
summarise(total = sum(sa),
new = sum(new),
retained = sum(retained),
pc_retained = sum(retained) / sum(sa)) |>
#pivot_longer(cols = c(-year,-ncy),
# names_to = "category",
# values_to = "value") |>
mutate(year = factor(year)) |>
mutate(label = if_else(ncy == max(ncy), year, NA_character_))
ggplot(sa_yoy_ncy_crunched,#|> filter(category == "pc_retained"),
aes(x = ncy,
y = pc_retained,
colour = year,
group = year,
label = label)) +
geom_point() + geom_line() +
geom_label_repel() +
scale_y_continuous(labels = scales::percent)+
labs(title = "Severely absent children - year on year retention rate by NCY",
subtitle = "Secondary schools only; of children severely absent for the year, the % who were also severely absent the previous year",
caption = "data from Capita One") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none", axis.title = eb)
Following on from the question of turnover & retention, we considered the trajectories of pupils with severe absence. We carried out sampling work on those with severe absence during secondary school, in order to find patterns or groups.
We categorised around 1400 such pupils into one of 7 categories:
This work is described in more detail in a separate short report [LINK], but the main takeaways were:
The analysis so far in this report has used data aggregated up to the half term or annual level. During the course of this project we processed the raw daily data (recorded as a string of symbols and codes) to allow analysis of attendance at the level of the individual day.
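To give a feel for this kind of processing, here is a minimal sketch that unpacks a string of session marks into one row per session and then aggregates by day. The raw layout is an assumption made purely for illustration (one character per session, AM then PM, weekdays only, starting from a known date); the real extract format is not described in this section. The codes used are `/` and `\` for present, `I` for illness, and `N` or `O` for no recorded reason, matching the codes named in the chart subtitles later in this section.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw format: one character per session, two sessions per school day
toy_raw <- tibble(
  stud_id    = c("a", "b"),
  start_date = as.Date("2023-01-09"),  # a Monday
  marks      = c("/\\/\\II/\\/\\", "/\\NN/\\/\\OO")
)

toy_daily <- toy_raw |>
  # split each string into single characters, then one row per session
  mutate(mark = strsplit(marks, "")) |>
  unnest(mark) |>
  group_by(stud_id) |>
  mutate(
    session   = row_number(),
    date      = start_date + (session - 1) %/% 2,  # two sessions per day
    present   = mark %in% c("/", "\\"),
    illness   = mark == "I",
    no_reason = mark %in% c("N", "O")
  ) |>
  ungroup()

# aggregate to the individual day across all pupils
toy_by_day <- toy_daily |>
  group_by(date) |>
  summarise(
    percent_present = mean(present),
    illness         = sum(illness),
    no_reason       = sum(no_reason)
  )

toy_by_day
```

A real extract would also need to handle weekends, holidays and pupils joining or leaving mid-year, none of which this sketch attempts.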
Fridays (and to a lesser extent Mondays) see significantly lower attendance than the other days of the week.
NOTE - in this render the day level data is not available
ggplot(attend_weekday_phase |>
filter(!week_day %in% c("Sat","Sun")),
aes(x = week_day,
y = percent_present,
#fill = phase,
label = scales::percent(percent_present, accuracy = 0.1L))
) +
geom_col(fill = "steel blue", position = "dodge") +
geom_text(position = position_dodge(0.9), colour = "white", size = 3, vjust = 1.5, fontface = "bold")+
scale_y_continuous(labels = scales::percent) +
labs(title = "Attendance by day of the week",
subtitle = "percentage of sessions attended since 2016; all Sheffield pupils",
caption = "data from Capita One") +
theme(plot.title = element_markdown(size = 12),
legend.position = "none", axis.title = eb,
strip.background = eb, axis.text.y = eb) +
facet_grid(cols = vars(phase)) +
barplottheme_minimal
Looking at a time series, we see that Friday’s lower attendance is nothing new, and the gap has not really changed over time:
NOTE - in this render the day level data is not available
ggplot(attend_weekday_phase_year |>
filter(year <= 2023) |>
mutate(label = if_else(year == max(year), week_day, NA_character_)),
aes(x = year,
y = percent_present,
colour = week_day,
group = week_day,
label = label)) +
geom_point() +
geom_line() +
geom_label_repel(aes(x = 2023.5),
size = 2,
vjust = 1,
min.segment.length = Inf) +
facet_wrap(vars(phase), scales = "free_y", nrow = 2) +
scale_x_continuous(limits = c(2018,2025), breaks = seq(2018,2023)) +
theme(legend.position = "none", axis.title = eb, strip.background = eb,
strip.placement = "top")+
labs(title = "Attendance by weekday & year")
The day level data allows us to visualise an entire school year. Here we see how key points in the year and particular dates impact on school attendance. When the data are aggregated to the term level, there is very little seasonal variation, but differences at the day level are more dramatic than the differences we see between demographic groups.
In particular, we can see the impacts of:
ggplot(day_2023 |>
filter(half_term != 0,
date != as_date("2023-05-01")) |>
mutate(label = case_when(
date == as_date("2022-12-12") ~ "week before Christmas",
date == as_date("2023-03-10") ~ "heavy snowfall",
date == as_date("2023-02-01") ~ "teachers strike",
date == as_date("2023-02-28") ~ "teachers strike",
date == as_date("2023-03-16") ~ "teachers strikes",
date == as_date("2023-04-21") ~ "Eid al-Fitr",
date == as_date("2023-06-21") ~ "study leave",
date == as_date("2023-06-28") ~ "Eid al-Adha",
date == as_date("2023-07-21") ~ "end of term",
TRUE ~ NA_character_)),
aes(x = date,
y = 1 - percent_present,
fill = 1 - percent_present,
label = label
)) +
geom_col() +
geom_text_repel(fontface = "italic", nudge_x = -1, size = 3, nudge_y = 0.02, colour = "gray40") +
scale_y_continuous(labels = scales::percent) +
theme(legend.position = "none", strip.background = eb, axis.title = eb,
#axis.text.x = element_text(angle = 90),
axis.line = eb, axis.ticks = eb, strip.text = element_text(size = 12)) +
scale_x_date(date_labels = "%d-%b") +
scale_fill_viridis_c(option = "viridis", direction = -1) +
facet_wrap(vars(half_term_name), scales = "free_x") +
labs(title = "School absence in Sheffield Schools - a full academic year - 2022/23",
subtitle = "each bar = 1 day; % of available sessions missed; all schools & all pupils")
Recreating the same plot for absences coded as illness (though this time showing the count of sick days rather than the % of available sessions) shows how rates increased dramatically through the run up to Christmas, peaked on Fridays (and to a lesser extent Mondays) throughout the year, and were significantly lower in the summer.
NOTE - in this render the day level data is not available
ggplot(attend_daily |> filter(year == 2023, time_category == "term time"),
aes(x = date,
y = illness,
fill = illness)) +
geom_col() +
theme(legend.position = "none", strip.background = eb, axis.title = eb,
axis.text.x = element_text(angle = 90)) +
scale_x_date(date_labels = "%d-%b") +
scale_fill_viridis_c(option= "mako",direction = -1) +
facet_wrap(vars(half_term_name), scales = "free_x") +
labs(title = "Daily illness in Sheffield Schools - 2022/23",
subtitle = "Each bar = 1 day; count of sessions marked code I; all schools & all pupils")
The day level no reason plot shows a similar shape to the illness plot, which suggests that at least some of the no reason absences are explained by genuine sickness.
NOTE - in this render the day level data is not available
ggplot(attend_daily |> filter(year == 2023, time_category == "term time"),
aes(x = date,
y = no_reason,
fill = no_reason)) +
geom_col() +
theme(legend.position = "none", strip.background = eb, axis.title = eb,
axis.text.x = element_text(angle = 90)) +
scale_x_date(date_labels = "%d-%b") +
scale_fill_viridis_c(option= "magma", direction = -1) +
facet_wrap(vars(half_term_name), scales = "free_x") +
labs(title = "Absence with no recorded reason in Sheffield Schools - 2022/23",
subtitle = "Each bar = 1 day; count of sessions coded N or O; all schools & all pupils")