Attendance in Sheffield Schools

Author

Giles Robinson

Published

February 10, 2025

Code
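# assumes tidyverse, ggrepel, gt and ggtext are attached, and that data_folder is defined in an earlier setup chunk
load(str_c(data_folder,"attendance_inclusion_data_model.RData"))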

1 Introduction

1.1 Background & scope

This work was undertaken by the Sheffield City Council Business Intelligence team from around September 2023. New analysis was carried out on available data with the aim of understanding school attendance in Sheffield and informing the requirements of the city’s response. This report summarises the findings of that analysis, along with commentary derived from discussions of those findings with colleagues in SCC, Learn Sheffield and Sheffield schools.

This report covers the following:

  • recent trends
  • benchmarking and comparisons
  • key drivers of absences
  • demographic differences:
      ◦ age
      ◦ gender
      ◦ ethnicity
  • geography & deprivation
  • distance to school
  • young carers
  • severe absence (<50% attendance)
  • the absence patterns of annual year cohorts
  • day level data analysis - mapping out a school year

Within the same analysis but out of scope of this report are:

  • special educational needs (this is covered in depth in the SNA report)
  • the performance of individual schools
  • exclusions
  • the reach and effectiveness of existing teams, services and interventions

Some terms & definitions

Unless otherwise stated, absence refers to both authorised and unauthorised absences. Correspondingly, attendance refers to registered time in the classroom. Absence in this report may include periods of study leave and approved offsite activity.

Unless otherwise stated, the word year refers to the academic or exam year. So 2023 refers to the period of schooling between September ’22 and July ’23.
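
As a minimal sketch of this convention, the helper below maps a calendar date to the academic year it falls in. The function name `academic_year` and the 1 September rollover are assumptions made for illustration; they are not taken from the report's data model.

```r
# Hypothetical helper (not part of the report's data model):
# label a date with the year in which its academic year ends.
# Assumption: the academic year rolls over on 1 September.
academic_year <- function(date) {
  date <- as.Date(date)
  yr <- as.integer(format(date, "%Y"))
  mth <- as.integer(format(date, "%m"))
  # September to December belong to the academic year ending next summer
  ifelse(mth >= 9, yr + 1L, yr)
}

academic_year(c("2022-09-05", "2023-07-21", "2023-09-04"))
# 2023 2023 2024
```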

1.2 Data sources & processing

Attendance, exclusion and school registration data and student details used in this report are from Capita One, retrieved from the OSCAR database, which is maintained by the Performance & Analysis Service (PAS). Supplementary information on school types and locations, geography & deprivation are held in spreadsheets.

An R script gathers, combines, processes and aggregates this data into a data model. That data model was last updated 16/8/24 to include the first release of the full year 2024 attendance data.

A more detailed description of this data processing is provided in the appendix.

1.3 Version control

v0.1 - 1/7/24 - Giles Robinson. First complete draft for circulation.
v0.2 - 16/8/24 - GR - updated with latest data, full 2024 academic year, various revisions, analysis of daily data; young carers

3 Demographics

Looking at how attendance varies with age, gender and ethnicity, and how this picture is changing over time.

3.1 Age

Absence is a little higher in Y1 and Y2, when children are very young, and is then level through primary. The transition to secondary school is associated with a big increase in absence, which continues year on year up to Y11. As we’ll see later on, this drop in attendance on transition into Y7, and the subsequent decline, is more severe for groups with particular risk factors.

Code
# calculate average absence by ncy (national curriculum year), with a 95% t-based confidence interval
attend_ncy <- attend |> 
  filter(year >= 2018,
         ncy >= 1 & ncy <= 11) |> 
  summarise_attendance(grouping_vars = c("ncy", "stud_id")) |> 
  group_by(ncy) |> 
  summarise (mean.percent_absent = mean(percent_absent, na.rm = TRUE),
             sd.percent_absent = sd(percent_absent, na.rm = TRUE),
             n.percent_absent = n() )  |> 
  mutate(se.percent_absent = sd.percent_absent / sqrt(n.percent_absent),
  lower.ci.percent_absent = mean.percent_absent - qt(1 - (0.05 / 2), n.percent_absent - 1) * se.percent_absent,
  upper.ci.percent_absent = mean.percent_absent + qt(1 - (0.05 / 2), n.percent_absent - 1) * se.percent_absent)

# plot
ggplot(attend_ncy, aes(x = ncy, y = mean.percent_absent)) +
  geom_col(position = position_dodge(0.9), fill = "#0072B2")+
  geom_errorbar(aes(ymin = lower.ci.percent_absent, ymax = upper.ci.percent_absent), width = 0.2, position = position_dodge(0.9))+
    geom_text(aes(label = scales::percent(round(mean.percent_absent,3))), vjust = 2, colour = "white", size = 3, position = position_dodge(0.9)) +
      labs(title = "Absence by school year",
           subtitle = "Average percentage of available sessions not attended +/- 95% CI; all reason codes; all Sheffield schools & pupils, 2018 - 2024",
           x = "national curriculum year",
           caption = "data from Capita One")+
  barplottheme_minimal +
  theme(axis.text.y = eb) +
  scale_x_continuous(breaks = seq(1,11))
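
The error bars in the chunk above are t-based 95% confidence intervals around the mean absence rate. The sketch below reproduces that calculation on a handful of made-up per-pupil absence rates so the arithmetic can be checked in isolation; `mean_ci` is a hypothetical helper name and the values are illustrative only, not Sheffield data.

```r
# Made-up per-pupil absence rates for one year group (illustrative only)
percent_absent <- c(0.02, 0.05, 0.08, 0.04, 0.11, 0.06)

# Hypothetical helper: mean with a two-sided t-based confidence interval
mean_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                        # drop missing values first
  n <- length(x)
  se <- sd(x) / sqrt(n)                    # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(mean = mean(x),
    lower = mean(x) - t_crit * se,
    upper = mean(x) + t_crit * se)
}

mean_ci(percent_absent)
```

One difference from the chunk above: this sketch counts only non-missing values in n, whereas `n()` counts every row in the group, including rows where the rate is missing.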

Note

The ImpactEd report Understanding Attendance - Report 1 identified an emerging trend of a jump in absence between Y7 and Y8. The Sheffield data does not support this: the increase in absence from Y7 to Y8, at around 1 percentage point, looks broadly the same as any other year-on-year increase within secondary years.

Looking at trends over time for primary school years, we see that the youngest and oldest primary age children saw the largest falls in attendance. There are encouraging signs of recovery among all primary years into 2024, particularly in Y1.

Code
attend |> 
  filter(year != 2020, year >= 2018,
         school_ed_phase_corrected == "Primary") |> 
  ungroup() |> 
  summarise_attendance(grouping_vars = c("ncy","year","school_ed_phase_corrected")) |> 
  filter(ncy <= 11 & ncy >= 1,
         child_count > 1000) |> 
  ungroup() |> 
  mutate(label = ifelse(year == max(year), ncy, NA_character_),
         ncy = factor(ncy)) |> 
  ggplot(aes(x = year,
        y = percent_present,
        colour = ncy,
        group = ncy,
        label = label
        )
    ) +
  geom_point(shape = 1) +
  geom_line() +
  geom_label_repel(hjust = TRUE,min.segment.length = Inf,max.overlaps = Inf,size = 2.5) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(limits = c(2018,2026), breaks = seq(2018,2025)) +
  theme(legend.position = "none", axis.title = eb, strip.background = eb, axis.line = eb, axis.ticks = eb) +
  labs(title = "Primary school attendance over time by national curriculum year",
           subtitle = "% of available sessions attended all Sheffield schools & pupils, 2020 excluded; 2025 is Autumn term only",
           caption = "data from Capita One") +
  coord_cartesian(clip = "off")

In secondary schools, we can see how disproportionately children in Y11 were affected, alongside encouraging signs of recovery in years 7 and 9. It is worth noting that the children in years 10 and 11 in 2024 were those who had their crucial Y6 and Y7 transition years disrupted by the pandemic.

Code
attend |> 
  filter(year != 2020, year >= 2018,
         school_ed_phase_corrected == "Secondary") |> 
  ungroup() |> 
  summarise_attendance(grouping_vars = c("ncy","year","school_ed_phase_corrected")) |> 
  filter(ncy <= 11 & ncy >= 1,
         child_count > 1000) |> 
  ungroup() |> 
  mutate(label = ifelse(year == max(year), ncy, NA_character_),
         ncy = factor(ncy)) |> 
  ggplot(aes(x = year,
        y = percent_present,
        colour = ncy,
        group = ncy,
        label = label
        )
    ) +
  geom_point(shape = 1) +
  geom_line() +
  geom_label_repel(hjust = TRUE,min.segment.length = Inf,max.overlaps = Inf,size = 2.5) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(limits = c(2018,2026), breaks = seq(2018,2025)) +
  theme(legend.position = "none", axis.title = eb, strip.background = eb, axis.line = eb, axis.ticks = eb) +
  labs(title = "Secondary school attendance over time by national curriculum year",
           subtitle = "% of available sessions attended all Sheffield schools & pupils, 2020 excluded; 2025 is Autumn term only",
           caption = "data from Capita One") +
  coord_cartesian(clip = 'off')

The drop off in Y11 is driven in large part by an increase in study leave.

These trends will be explored in more detail in the Trends by annual cohort section later in this report.

3.2 Gender

Looking at overall school attendance since 2021, girls attend slightly better than boys, a difference of about 0.5 percentage points.

The gender time series show boys and girls moving in lockstep through primary school, separated by about half a percentage point:

Code
attend_year_gender_phase |> 
  filter(!is.na(gender), year >= 2018, phase == "Primary") |>
  ungroup() |> 
  mutate(label = if_else(year == max(year), case_when(gender == "M" ~ "boys", gender == "F" ~ "girls"), NA_character_)) |> 
       ggplot(aes(x = year,
                  y = percent_present,
                  colour = gender, group = gender,
                  label = label)) + 
  geom_point(size = 3) + 
  geom_line() +
  barplottheme_minimal +
  theme(legend.position = "none", axis.title.x = eb, legend.title = eb) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(2018,2026, by = 1)) +
  geom_text_repel(hjust = TRUE, nudge_x = 0.5, min.segment.length = Inf) +
  labs(title = "Primary school attendance by year and gender", 
       subtitle = "% of sessions attended per year; 2025 is Autumn term only",
           caption = "data from Capita One")

In secondary schools we see boys’ attendance overtaking girls’ in the aftermath of the pandemic, with both continuing to decline into 2024.

Code
attend_year_gender_phase |> 
  filter(!is.na(gender), year >= 2018, phase == "Secondary") |>
  ungroup() |> 
  mutate(label = if_else(year == max(year), case_when(gender == "M" ~ "boys", gender == "F" ~ "girls"), NA_character_)) |> 
       ggplot(aes(x = year,
                  y = percent_present,
                  colour = gender, group = gender,
                  label = label)) + 
  geom_point(size = 3) + 
  geom_line() + 
  barplottheme_minimal +
  theme(legend.position = "none", axis.title.x = eb, legend.title = eb) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(2018,2026, by = 1)) +
  geom_text_repel(hjust = TRUE, nudge_x = 0.5, min.segment.length = Inf) +
  labs(title = "Secondary school attendance by year and gender", 
       subtitle = "% of sessions attended per year; 2025 is Autumn term only",
           caption = "data from Capita One")

Code
attend |> 
  filter(!is.na(gender), 
         gender != "U",
         year >= 2018,
         ncy >= 0, ncy <= 11) |>
  group_by(ncy, gender) |> 
  presence_mean_calc() |> 
  ungroup() |>
  mutate(label = if_else(ncy == max(ncy), case_when(gender == "M" ~ "boys", gender == "F" ~ "girls"), NA_character_)) |>
  ggplot(aes(x = ncy,
             y = mean.percent_present,
             colour = gender, group = gender,
             label = label)) + 
    geom_point(size = 1) + 
    geom_line() + 
    geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.6)+
    barplottheme_minimal +
    theme(legend.position = "none", axis.title.x = eb, legend.title = eb) +
    scale_y_continuous(labels = scales::percent) +
    scale_x_continuous(breaks = seq(0,11, by = 1)) +
    geom_text_repel(hjust = TRUE, nudge_x = 0.5, min.segment.length = Inf) +
    labs(title = "School attendance by national curriculum year and gender", 
       subtitle = "% of sessions attended per year since 2018",
           caption = "data from Capita One")

Looking at age, gender and deprivation together, the picture differs by area. In poorer wards of the city, girls consistently attend better than boys across all ages. In the most affluent wards, this is reversed in older children: from Y8 onwards boys have higher attendance, and the gender gap widens.

Code
attend |>
  left_join(stud_details_joined |> select(-gender), by = "stud_id") |> 
  filter(year >= 2018,
         ncy >= 1,
         ncy <= 11,
         !is.na(gender),
         imd_quartile %in% c(1,4)) |> 
  mutate(imd_quart_name = if_else(imd_quartile == 1, "most affluent 25%", "most deprived 25%")) |> 
  select(-imd_quartile) |> 
  group_by(imd_quart_name,ncy,gender) |> 
  presence_mean_calc() |> 
  ggplot(aes(x = ncy, y = mean.percent_present,
             colour = gender, group = gender)) + 
  geom_point() +
  geom_line() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(1, 11)) +
  facet_grid(rows = vars(imd_quart_name)) +
  theme(axis.title.y = eb, legend.position = "top", legend.title = eb) +
  labs(title = "School attendance by IMD quartile, national curriculum year and gender", 
       subtitle = "% of sessions attended per year; 2025 is Autumn term only",
           caption = "data from Capita One")

3.3 Ethnicity

The ethnic makeup of Sheffield’s population continues to change, and there are differences in attendance rates between children in different ethnic groups. Here we summarise the data around ethnicity.

Caution

The ethnic groups and subgroups used in this analysis are those available in the Capita One source data. These don’t necessarily align with the groupings used by ONS for census data, by other organisations, or in other SCC data and reporting.

With the caveat that data prior to 2018 may not be wholly complete, the attendance data allows us to take a long-term view of changes in the ethnic makeup of the Sheffield school population. Note that the free y-axis scales on the following chart mean that the lines are not directly comparable:

Code
eth_category_volumes <- attend |>
  select(year, stud_id, ethnicity_category) |> unique() |> 
  group_by(year, ethnicity_category) |> summarise(student_count = n_distinct(stud_id)) |> mutate(freq = student_count / sum(student_count)) |>
  ungroup() |> mutate(label = ifelse(year == max(year), ethnicity_category, NA_character_),
                      label_n = ifelse(year %in% c(2008,2012,2016,2020,2024,2025),student_count,NA_real_)
                      )

ggplot(eth_category_volumes,
       aes(x = year, y = student_count, colour = ethnicity_category)) +
  geom_line() +
  scale_x_continuous(breaks = seq(2006,2024, by = 2)) +
  geom_label_repel(aes(label = label), nudge_x = 4, nudge_y = 0, alpha = 0.75, size = 2.5,
                   min.segment.length = Inf) +
  geom_text_repel(aes(label = label_n), size = 2.5) +
  facet_grid(rows = vars(fct_rev(ethnicity_category)), scales = "free_y") +
  barplottheme_minimal +
  theme(strip.background = eb, axis.title.x = eb, legend.position = "none", strip.text = eb, axis.text.y = eb) +
  labs(title = "Pupils in Sheffield by ethnicity category",
       subtitle = "unique count of pupils in attendance data per year",
       caption = "data from Capita One") +
  scale_colour_brewer(palette = "Dark2")

Code
attend_eth_des_phase <- attend |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |> 
  group_by(ethnicity_description, phase) |>
  summarise_attendance(grouping_vars = c("ethnicity_description", "phase")) |> 
  select(ethnicity_description, phase, child_count, percent_of_pupils, percent_absent)
 
attend_eth_des_total <- attend |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = "ethnicity_description") |> 
  select(ethnicity_description, child_count, percent_of_pupils, percent_absent) |> 
  mutate(phase = "Total")

attend_phase_total <- attend |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = "phase") |> 
  select(phase, child_count, percent_of_pupils, percent_absent) |> 
  mutate(#phase = "Total",
         ethnicity_description = "all children")

attend_total <- attend |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = "none") |> 
  select(child_count, percent_of_pupils, percent_absent) |> 
  mutate(phase = "Total",
         ethnicity_description = "all children")

eth_des_table <- rbind(
  attend_eth_des_phase,
  attend_eth_des_total,
  attend_phase_total,
  attend_total) |> 
  pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |> 
  rename(`ethnicity description` = ethnicity_description) |>
  ungroup() |> 
  arrange(desc(child_count_Total)) |> 
  select(
    `ethnicity description`,
     contains("Total"),
    contains("Primary"),
    contains("Secondary")
  )

eth_des_table |> 
  gt(rowname_col = "ethnicity description") |> 
  tab_spanner(id = 1, label = "Primary", columns = dplyr::contains("Primary")) |> 
  tab_spanner(id = 2, label = "Secondary", columns = dplyr::contains("Secondary")) |> 
  tab_spanner(id = 3, label = "Total", columns = dplyr::contains("Total")) |> 
  cols_label(
    contains("count") ~ "count",
    contains("percent_of_pupils") ~ "% of pupils",
    contains("percent_absent") ~ "% absent"
    ) |> 
  tab_header(
    title = "Pupils and attendance in Sheffield by ethnicity description",
    subtitle = "count of pupils on roll in 2023/24; data from School Census & Capita One attendance records") |>
  tab_options(
      table.align = "left",
      table.font.size = 10,
      heading.title.font.size = 12,
      heading.subtitle.font.size= 10,
      heading.align = "left",
      column_labels.font.size = 14,
      stub.font.size = 12
    ) |>
  cols_align(
    "left",'ethnicity description'
  ) |> 
  fmt_percent(columns = contains("percent"),
              decimals = 1) |> 
  data_color(
    columns = percent_absent_Primary,
    method = "numeric",
    palette = "viridis") |> 
  data_color(  columns = percent_absent_Secondary,
    method = "numeric",
    palette = "viridis") |> 
    data_color(  columns = percent_absent_Total,
    method = "numeric",
    palette = "viridis")

Pupils and attendance in Sheffield by ethnicity description
count of pupils on roll in 2023/24; data from School Census & Capita One attendance records

| ethnicity description | Total count | Total % of pupils | Total % absent | Primary count | Primary % of pupils | Primary % absent | Secondary count | Secondary % of pupils | Secondary % absent |
|---|---|---|---|---|---|---|---|---|---|
| all children | 72545 | 100.0% | 8.3% | 40411 | 55.7% | 6.6% | 32181 | 44.3% | 10.5% |
| White British | 41639 | 57.4% | 7.8% | 22919 | 55.0% | 5.7% | 18751 | 45.0% | 10.4% |
| Black African and White/Black African | 5613 | 7.7% | 5.5% | 3277 | 58.3% | 5.0% | 2340 | 41.7% | 6.2% |
| Pakistani | 5448 | 7.5% | 9.0% | 3090 | 56.7% | 8.8% | 2359 | 43.3% | 9.2% |
| Any Other Ethnic Group | 3050 | 4.2% | 9.1% | 1732 | 56.8% | 8.5% | 1318 | 43.2% | 10.0% |
| Any Other White Background | 2791 | 3.8% | 9.3% | 1589 | 56.9% | 7.9% | 1204 | 43.1% | 11.2% |
| White/Black Caribbean | 1976 | 2.7% | 12.4% | 1122 | 56.7% | 9.1% | 856 | 43.3% | 17.0% |
| Other Asian Background | 1814 | 2.5% | 8.0% | 1040 | 57.3% | 7.6% | 774 | 42.7% | 8.4% |
| Gypsy, Roma and Traveller of Irish Heritage | 1761 | 2.4% | 21.1% | 928 | 52.6% | 18.0% | 835 | 47.4% | 24.6% |
| White/Asian | 1641 | 2.3% | 9.3% | 970 | 59.1% | 7.8% | 671 | 40.9% | 11.6% |
| Any Other Mixed | 1604 | 2.2% | 9.1% | 932 | 58.1% | 7.6% | 672 | 41.9% | 11.3% |
| not known | 1521 | 2.1% | 11.6% | 613 | 40.2% | 7.3% | 912 | 59.8% | 14.4% |
| Indian | 1021 | 1.4% | 6.6% | 693 | 67.9% | 7.2% | 328 | 32.1% | 5.2% |
| Bangladeshi | 814 | 1.1% | 8.3% | 471 | 57.9% | 8.1% | 343 | 42.1% | 8.6% |
| Any Other Black Background | 723 | 1.0% | 6.5% | 436 | 60.2% | 5.9% | 288 | 39.8% | 7.5% |
| Chinese | 627 | 0.9% | 3.6% | 354 | 56.5% | 4.1% | 273 | 43.5% | 2.9% |
| Black Caribbean | 392 | 0.5% | 9.1% | 188 | 48.0% | 7.1% | 204 | 52.0% | 11.2% |
| Irish | 110 | 0.2% | 10.1% | 57 | 51.8% | 6.1% | 53 | 48.2% | 14.6% |

4 Geography & deprivation

There are many ways to divide up the city geographically, but we’ll look at the 28 wards, and in particular their deprivation as measured in the 2019 Indices of Multiple Deprivation (IMD) scores. More recent (and older) measures of deprivation may be available, but the analysis is broadly the same.

4.1 Attendance by ward

The table below shows overall attendance by ward of residence during 2023.

Code
attend_ward_phase <- attend |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |> 
  #group_by(ward, phase) |>
  summarise_attendance(grouping_vars = c("ward", "phase")) |> 
  select(ward, phase, child_count, percent_of_pupils, percent_absent)
 
attend_ward_total <- attend |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = "ward") |> 
  select(ward, child_count, percent_of_pupils, percent_absent) |> 
  mutate(phase = "Total")

attend_phase_total <- attend |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = "phase") |> 
  select(phase, child_count, percent_of_pupils, percent_absent) |> 
  mutate(#phase = "Total",
         ward = "Sheffield")

attend_total <- attend |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = "none") |> 
  select(child_count, percent_of_pupils, percent_absent) |> 
  mutate(phase = "Total",
         ward = "Sheffield")

ward_table <- rbind(
  attend_ward_phase,
  attend_ward_total,
  attend_phase_total,
  attend_total) |> 
  filter(!is.na(ward)) |> 
  pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |> 
  rename(`ethnicity description` = ward) |>
  ungroup() |> 
  arrange(desc(child_count_Total)) |> 
  select(
    `ethnicity description`,
     contains("Total"),
    contains("Primary"),
    contains("Secondary")
  )

ward_table |> 
  gt(rowname_col = "ethnicity description") |> 
  tab_spanner(id = 1, label = "Primary", columns = dplyr::contains("Primary")) |> 
  tab_spanner(id = 2, label = "Secondary", columns = dplyr::contains("Secondary")) |> 
  tab_spanner(id = 3, label = "Total", columns = dplyr::contains("Total")) |> 
  cols_label(
    contains("count") ~ "count",
    contains("percent_of_pupils") ~ "% of children",
    contains("percent_absent") ~ "% absent"
    ) |> 
  tab_header(
    title = "Pupils in Sheffield, by ward of residence",
    subtitle = "pupils on roll & attendance in 2023/24; data from School Census & Capita One attendance records") |>
  tab_options(
      table.align = "left",
      table.font.size = 10,
      heading.title.font.size = 12,
      heading.subtitle.font.size= 10,
      heading.align = "left",
      column_labels.font.size = 14,
      stub.font.size = 12
    ) |>
  cols_align(
    "left",'ethnicity description'
  ) |> 
  fmt_percent(columns = contains("percent"),
              decimals = 1) |> 
  data_color(
    columns = percent_absent_Primary,
    method = "numeric",
    palette = "viridis") |> 
  data_color(  columns = percent_absent_Secondary,
    method = "numeric",
    palette = "viridis") |> 
    data_color(columns = percent_absent_Total,
    method = "numeric",
    palette = "viridis")

Pupils in Sheffield, by ward of residence
pupils on roll & attendance in 2023/24; data from School Census & Capita One attendance records

| ward | Total count | Total % of children | Total % absent | Primary count | Primary % of children | Primary % absent | Secondary count | Secondary % of children | Secondary % absent |
|---|---|---|---|---|---|---|---|---|---|
| Sheffield | 72545 | 100.0% | 8.3% | 40411 | 55.7% | 6.6% | 32181 | 44.3% | 10.5% |
| Burngreave | 5858 | 7.8% | 11.3% | 3163 | 54.0% | 9.7% | 2697 | 46.0% | 13.2% |
| Firth Park | 4140 | 5.5% | 9.7% | 2312 | 55.8% | 7.7% | 1828 | 44.2% | 12.3% |
| Darnall | 3992 | 5.3% | 10.2% | 2378 | 59.5% | 8.7% | 1618 | 40.5% | 12.6% |
| Manor Castle | 3759 | 5.0% | 9.8% | 2202 | 58.6% | 7.4% | 1558 | 41.4% | 13.4% |
| Shiregreen & Brightside | 3603 | 4.8% | 9.0% | 2011 | 55.8% | 7.2% | 1592 | 44.2% | 11.3% |
| Southey | 3509 | 4.7% | 10.5% | 1940 | 55.3% | 8.0% | 1570 | 44.7% | 13.6% |
| Ecclesall | 3243 | 4.3% | 4.2% | 1762 | 54.3% | 3.6% | 1481 | 45.7% | 5.1% |
| Gleadless Valley | 3177 | 4.2% | 9.9% | 1769 | 55.6% | 7.6% | 1410 | 44.4% | 13.0% |
| Nether Edge & Sharrow | 2799 | 3.7% | 7.1% | 1570 | 56.1% | 6.6% | 1229 | 43.9% | 7.7% |
| Park & Arbourthorne | 2687 | 3.6% | 9.9% | 1537 | 57.2% | 7.9% | 1150 | 42.8% | 12.7% |
| Beauchief & Greenhill | 2650 | 3.5% | 9.2% | 1491 | 56.3% | 7.1% | 1159 | 43.7% | 12.0% |
| Richmond | 2502 | 3.3% | 8.6% | 1420 | 56.7% | 6.5% | 1083 | 43.3% | 11.4% |
| Dore & Totley | 2399 | 3.2% | 4.7% | 1339 | 55.8% | 4.0% | 1060 | 44.2% | 5.6% |
| Hillsborough | 2340 | 3.1% | 7.7% | 1309 | 55.9% | 5.7% | 1031 | 44.1% | 10.3% |
| Stannington | 2308 | 3.1% | 6.3% | 1265 | 54.8% | 4.6% | 1044 | 45.2% | 8.6% |
| Woodhouse | 2277 | 3.0% | 9.0% | 1300 | 56.8% | 6.7% | 987 | 43.2% | 12.1% |
| Stocksbridge & Upper Don | 2203 | 2.9% | 7.4% | 1195 | 53.9% | 5.1% | 1021 | 46.1% | 10.2% |
| West Ecclesfield | 2193 | 2.9% | 8.0% | 1195 | 54.3% | 5.8% | 1005 | 45.7% | 10.6% |
| Walkley | 2170 | 2.9% | 7.6% | 1337 | 61.6% | 6.4% | 834 | 38.4% | 9.6% |
| Birley | 2076 | 2.8% | 8.8% | 1109 | 53.4% | 6.4% | 967 | 46.6% | 11.7% |
| East Ecclesfield | 1992 | 2.7% | 7.0% | 1038 | 52.1% | 5.7% | 954 | 47.9% | 8.6% |
| Graves Park | 1942 | 2.6% | 5.6% | 1103 | 56.8% | 4.5% | 839 | 43.2% | 7.3% |
| Crookes & Crosspool | 1906 | 2.5% | 4.6% | 1032 | 54.1% | 4.2% | 874 | 45.9% | 5.1% |
| Beighton | 1902 | 2.5% | 7.7% | 1049 | 55.2% | 6.0% | 853 | 44.8% | 10.0% |
| Fulwood | 1746 | 2.3% | 4.5% | 951 | 54.5% | 3.8% | 795 | 45.5% | 5.3% |
| Mosborough | 1700 | 2.3% | 8.0% | 1001 | 58.9% | 6.3% | 699 | 41.1% | 10.5% |
| Broomhill & Sharrow Vale | 1501 | 2.0% | 6.5% | 885 | 59.0% | 5.5% | 616 | 41.0% | 8.1% |
| City | 666 | 0.9% | 10.5% | 438 | 65.8% | 9.7% | 228 | 34.2% | 12.0% |

4.2 Economic deprivation

These ward level attendance figures line up neatly with deprivation indicators. Plotting attendance against the 2019 Indices of Multiple Deprivation (IMD) scores shows a tight correlation.

Caution

Since school attendance figures are one of the input variables to the IMD scores, there is some circular logic at work here. Even so, attendance is only one of 39 inputs, so this analysis is worth pursuing.

Code
attend |> filter(year == 2023) |> 
  summarise_attendance(grouping_vars = c("ward","ward_imd_score")) |> 
  ggplot(aes(x = ward_imd_score, y = percent_present,
             )) + geom_point() +
      geom_text_repel(aes(label = ward), size = 2.5,
                      segment.colour = "gray") +
      #geom_smooth(method = "lm") +
      scale_y_continuous(labels = scales::percent) +
      labs(title ="School attendance by ward level deprivation", 
           subtitle = "Average % of sessions attended 2023; ward of residence; all ages",
           caption = "data from Capita One",
           y = "attendance",
           x = "Indices of multiple deprivation score (2019)")

The link to deprivation has always been there but is stronger today - recreating the chart above with 2010 attendance and IMD scores shows a weaker relationship.

The link to deprivation is less evident in primary schools but stronger in secondary schools, and the gap between primary and secondary attendance widens in poorer areas of the city.
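
The widening gap can be checked directly against the ward table in section 4.1. The sketch below takes the primary and secondary absence rates for four wards copied from that table (two with high overall absence, two with low) and computes the difference in percentage points. The data frame and column names are made up for this illustration.

```r
# Absence rates (%) copied from the ward table in section 4.1
ward_absence <- data.frame(
  ward      = c("Burngreave", "Southey", "Ecclesall", "Fulwood"),
  primary   = c(9.7, 8.0, 3.6, 3.8),
  secondary = c(13.2, 13.6, 5.1, 5.3)
)

# Gap between secondary and primary absence, in percentage points
ward_absence$gap_pp <- round(ward_absence$secondary - ward_absence$primary, 1)

# Largest gaps first
ward_absence[order(-ward_absence$gap_pp), ]
```

For these four wards the gap is 3.5 and 5.6 points in Burngreave and Southey, against 1.5 points in both Ecclesall and Fulwood.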

Code
ward_data <- attend |>
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = c("ward","ward_imd_score", "phase")) |> 
  arrange(ward_imd_score)

grid <- seq(from = min(ward_data$ward_imd_score, na.rm = TRUE), to = max(ward_data$ward_imd_score, na.rm = TRUE),
            length.out = ward_data |> ungroup() |> select(ward) |> distinct() |> tally() |> pull())

ward_grid <- ward_data |> 
  select(ward, ward_imd_score) |> 
  distinct() |> 
  arrange(ward_imd_score) |> 
  cbind(grid) |> 
  rename(label_sequence = 3) |> 
  ungroup() |> 
  select(-ward_imd_score)

plot_data <- ward_data |> left_join(ward_grid, by = "ward")
  
ggplot(plot_data,
         aes(x = ward_imd_score, y = percent_present, colour = phase, 
             group = ward,label = ward)) + 
    geom_point(size = 2.5, alpha = 0.7) + 
    geom_line(colour = "grey70") +
  scale_y_continuous(labels = scales::percent) +
  #coord_cartesian(expand = FALSE, clip = "off") +
  geom_text_repel(data = plot_data |> filter(phase == "Primary"), 
        aes(x = label_sequence, y = 0.75),
        colour = "grey40", size = 2.5,
        #force_pull   = 0,
        min.segment.length = Inf, 
        angle = 90,
        #segment.angle = 90,
        #point.padding = 0,
        #max.overlaps = Inf,
        #direction = "x", 
        nudge_y = 0.04#,
        #hjust = 0,
        #max.iter = 1e4, max.time = 1
        ) +

    labs(title = "School attendance in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</span></b> schools, by ward level deprivation score",
    subtitle = "% of available sessions attended in 2023 by ward of residence",
    caption = "data from Capita One",
    y = "attendance",
    x = "Indices of multiple deprivation score (2019)") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none"
        ) +
  # phase is mapped to colour (not fill), so the palette must be applied to the colour scale
  MetBrewer::scale_colour_met_d("Egypt")

The longer term view below compares the trend in attendance between the top and bottom quartiles of the ward level deprivation scores, at the half-term level with a trend line. The middle two quartiles are excluded from this plot. The gap between the most and least deprived areas narrowed towards the peak attendance rate in 2016, so gains were disproportionately made in poorer areas, but attendance in the most deprived quartile has fallen away more rapidly since the pandemic. Filtering this to just the Secondary phase makes little difference to the overall shape.
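
The `imd_quartile` field used in the following chunks comes from the data model, and how it is derived is not shown in this report. As a rough sketch of one way such a grouping could be built, the example below cuts made-up ward-level scores at their quartiles, with quartile 1 holding the lowest scores (least deprived) and quartile 4 the highest (most deprived). All names and values here are illustrative assumptions.

```r
# Made-up ward-level IMD scores (illustrative only; not Sheffield data)
ward_scores <- data.frame(
  ward      = paste("ward", LETTERS[1:8]),
  imd_score = c(5, 10, 15, 20, 25, 30, 35, 40)
)

# Quartile boundaries of the scores
breaks <- quantile(ward_scores$imd_score, probs = seq(0, 1, by = 0.25))

# Assign each ward to a quartile: 1 = lowest scores, 4 = highest scores
ward_scores$imd_quartile <- as.integer(
  cut(ward_scores$imd_score, breaks = breaks,
      include.lowest = TRUE, labels = FALSE)
)

# Keep only the top and bottom quartiles, as in the plots in this section
subset(ward_scores, imd_quartile %in% c(1, 4))
```

Whether the quartiles are taken across wards, as here, or across pupils changes the group sizes, so this choice would need checking against the data model.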

Code
attend |> 
  left_join(stud_details_joined, by = "stud_id") |>
  filter(imd_quartile %in% c(1,4)) |> 
  mutate(imd_quartile = factor(imd_quartile)) |> 
  summarise_attendance(grouping_vars = c("ht_id","ht_start_date","imd_quartile")) |> 
ggplot(aes(x = ht_start_date, y = percent_present, fill = imd_quartile, colour = imd_quartile)) + 
  geom_point() + 
  geom_smooth(alpha = 0.2) + 
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent) +
  scale_x_date()+
  labs(title = "Attendance of children living in the <b><span style='color:#ce9642'>most deprived </span></b>and <b><span style='color:#3b7c70;'>least deprived</span></b> wards of the city",
       subtitle = "groups are upper & lower quartiles of the IMD score of the ward of residence (2019); data points are half terms with trendline",
       caption = "data from Capita One")+
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title.x = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky")

The age profile by deprivation quartile shows how children in poorer areas have a steeper drop off through secondary school. Children in the most affluent 25% of wards attend better across all years, but show a more pronounced drop off into Y11, which may reflect study leave.

Code
attend_deprivation_quartile_ncy <- attend |> 
  filter(year >= 2018, ncy >= 1, ncy <= 11) |> 
  left_join(stud_details_joined, by = "stud_id") |>
  filter(imd_quartile %in% c(1,4)) |> 
  mutate(imd_quartile = factor(imd_quartile)) |> 
  group_by(imd_quartile, ncy) |> summarise_avg() 

ggplot(attend_deprivation_quartile_ncy, aes(x = ncy, 
                       y = mean.percent_present,
                       colour = imd_quartile, group = imd_quartile,
                       )) +
  geom_point() + geom_line() +
  geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.7)+
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(1,11)) +
labs(title = "Attendance of children living in the <b><span style='color:#ce9642'>most deprived </span></b>and <b><span style='color:#3b7c70;'>least deprived</span></b> wards of the city",
       subtitle = "avg % of sessions attended since 2018 +/- 95% CI; groups are upper & lower quartiles of the IMD score of the ward of residence (2019)",
       caption = "data from Capita One")+
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title.x = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky") +
  theme(axis.title.y = eb, legend.position = "none") +
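  # dotted divider between primary (Y1-Y6) and secondary (Y7-Y11); linewidth replaces the deprecated size argument for lines
  geom_vline(xintercept = 6.5, linetype = "dotted", colour = "gray70", linewidth = 1.2) +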
  annotate("text", label = "primary", y = 0.99, x = 3.5, colour = "gray40") +
  annotate("text", label = "secondary", y = 0.99, x = 9, colour = "gray40")

4.3 Free School Meals

Free School Meal (FSM) status is perhaps a better indicator of socio-economic status of children than ward of residence, since it is means tested at the family level.

Code
fsm_table_data <- attend |> 
  mutate(fsm = replace_na(fsm, "0")) |> 
  #mutate(fsm = factor(fsm, levels = c("T","F"))) |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = c("phase","fsm")) |> 
  select(phase, fsm, child_count, percent_of_pupils, percent_absent) |> 
 pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |> 
  # fsm is coded "1"/"0" here (missing values were set to "0" above)
  mutate(fsm = fct_recode(fsm, "free school meal eligible" = "1","no fsm" = "0")) |> 
  ungroup()

fsm_table_total_row <- attend |> 
  #select(-phase) |> 
  #rename(phase = school_ed_phase_corrected) |> 
  filter(year == 2023,
         phase %in% c("Primary","Secondary"),
         ncy >= 1,
         ncy <= 11) |>
  summarise_attendance(grouping_vars = c("phase")) |> 
  select(phase, child_count, percent_of_pupils, percent_absent) |> 
 pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |> 
  mutate(fsm = "total") |> 
  ungroup() 

fsm_table_data |> 
  rbind(fsm_table_total_row) |> 
  rename(`free school meal` = fsm) |>
  gt(rowname_col = "free school meal") |> 
  tab_spanner(id = 1, label = "Primary", columns = dplyr::contains("Primary")) |> 
  tab_spanner(id = 2, label = "Secondary", columns = dplyr::contains("Secondary")) |> 
  #tab_spanner(id = 3, label = "Total", columns = dplyr::contains("Total")) |> 
  cols_label(
    contains("count") ~ "count",
    contains("percent_of_pupils") ~ "% of children",
    contains("percent_absent") ~ "avg % absent (2023)"
    ) |> 
  tab_header(
    title = "Pupils in Sheffield, by free school meal status",
    subtitle = "count of pupils on roll in 2022/23; data from School Census & Capita One attendance records") |>
  tab_options(
      table.align = "left",
      table.font.size = 10,
      heading.title.font.size = 12,
      heading.subtitle.font.size= 10,
      heading.align = "left",
      column_labels.font.size = 14,
      stub.font.size = 12
    ) |>
  cols_align("left",'free school meal') |> 
  fmt_percent(columns = contains("percent"),
              decimals = 1) |> 
  data_color(
    columns = percent_absent_Primary,
    method = "numeric",
    palette = "viridis",
    alpha = 0.7) |>
  data_color(  columns = percent_absent_Secondary,
    method = "numeric",
    palette = "viridis",
    alpha = 0.7)
Pupils in Sheffield, by free school meal status
count of pupils on roll in 2022/23; data from School Census & Capita One attendance records

                                Primary                                      Secondary
free school meal                count   % of children  avg % absent (2023)   count   % of children  avg % absent (2023)
0 (no fsm)                      26535   65.7%          5.1%                  21628   67.2%          7.3%
1 (free school meal eligible)   13878   34.3%          9.7%                  10555   32.8%          17.1%
total                           40411   55.7%          6.6%                  32181   44.3%          10.5%

Code
ggplot(attend_year_ht_fsm, aes(x = ht_start_date, y = percent_present, fill = fsm, colour = fsm)) + 
  geom_point() + 
  geom_smooth() + 
  barplottheme_minimal +
    scale_y_continuous(labels = scales::percent) +
  annotate("text", x = date("2020-03-31"), y = 0.6, label = "COVID-19", size = 2.5, hjust = 1.1, colour = "dark gray") +
  geom_vline(xintercept = date("2020-03-31"), linetype = "longdash", colour = "dark gray") +
  labs(title = "Attendance by <b><span style='color:#dd5129'>children receiving free school meals </span></b>and <b><span style='color:#0f7ba2;'>not on fsm</span></b>",
       subtitle = "% of available sessions attended, with trend",
       caption = "data from Capita One")+
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title.x = eb) +
  #MetBrewer::scale_fill_met_d("Egypt") +
  scale_fill_manual(values = c("#0f7ba2","#dd5129")) +
  scale_colour_manual(values = c("#0f7ba2","#dd5129"))

More concerning are the exclusion rates for children on free school meals, which are rapidly diverging from those of children without.

Code
ggplot(attend_year_ht_fsm |> 
         filter(ht_start_date >= as_date("2016-09-01")), 
       aes(x = ht_start_date, y = percent_excluded, fill = fsm, colour = fsm)) + 
  geom_point() + 
  geom_smooth() + 
  barplottheme_minimal +
    scale_y_continuous(labels = scales::percent) +
  annotate("text", x = date("2020-03-31"), y = 0.01, label = "COVID-19", size = 2.5, hjust = 1.1, colour = "dark gray") +
  geom_vline(xintercept = date("2020-03-31"), linetype = "longdash", colour = "dark gray") +
  labs(title = "Exclusion rates by <b><span style='color:#dd5129'>children receiving free school meals </span></b>and <b><span style='color:#0f7ba2;'>not on fsm</span></b>",
       subtitle = "% of available sessions missed, with trend line",
       caption = "data from Capita One")+
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title.x = eb) +
  #MetBrewer::scale_fill_met_d("Egypt") +
  scale_fill_manual(values = c("#0f7ba2","#dd5129")) +
  scale_colour_manual(values = c("#0f7ba2","#dd5129"))

Code
   #geom_hline(yintercept = excl_2023_avg, linetype = "dashed", colour = "dark grey") +
  #annotate("text", x = date("2023-03-31"), y = 0.0025, label = "2023 average", size = 2.5, hjust = 1.1, colour = "dark gray")

4.4 Distance to school

We used the postcodes of each child’s home address and school location to calculate a measure of straight-line distance between the two.
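
A minimal sketch of a straight-line calculation of this kind, assuming each postcode has already been matched to British National Grid eastings and northings in metres (for example via a postcode lookup file). The helper name and the coordinates are illustrative assumptions, not the report's actual implementation:

```r
# Hypothetical helper: straight-line ("as the crow flies") distance in metres
# between two points given as British National Grid eastings/northings.
# On a projected grid in metres, Euclidean distance is a reasonable
# approximation at city scale.
crow_distance_m <- function(home_e, home_n, school_e, school_n) {
  sqrt((school_e - home_e)^2 + (school_n - home_n)^2)
}

# Invented coordinates: two pupils and one school
pupils <- data.frame(
  stud_id = c("a", "b"),
  home_e  = c(435000, 435300),
  home_n  = c(387000, 387400)
)
school_e <- 435000
school_n <- 387000

pupils$dist_crow <- crow_distance_m(pupils$home_e, pupils$home_n,
                                    school_e, school_n)
pupils
```

Postcode centroids are only approximate locations, so very short distances in particular should be read with some caution.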

Attendance is significantly better, on average, for children who live closer to school. In primary schools, children living very close to school (<100m) attend about 1.5% better on average; in secondary schools this difference is 2.3%.

Code
# calculated average by binned distance primary
dist_data <- sch_dist_sheff_23 |> 
  filter(school_ed_phase_corrected %in% c("Primary","Secondary"),
         sen_level != "EHCP"
         ) |> 
  rename(phase = school_ed_phase_corrected)

sch_dist_binned_pri <- dist_data |> 
  filter(phase == "Primary") |>   
  mutate(dist_bin = cut(dist_crow, 
                       breaks = c(0,100,200,500,1000,2000,5000,10000,1000000))) |> 
  group_by(dist_bin) |> 
  presence_mean_calc() |> 
  filter(!is.na(dist_bin)) |> 
  mutate(dist_bin_label = c("<100m", "100 - 200m", "200 - 500m", "500m - 1km","1-2km","2-5km","5-10km","10km+"),
         phase = "Primary")

# calculated avg by binned distance secondary
sch_dist_binned_sec <- dist_data |> 
  filter(phase == "Secondary") |>   
  mutate(dist_bin = cut(dist_crow, 
                       breaks = c(0,100,200,500,1000,2000,5000,10000,1000000))) |> 
  group_by(dist_bin) |> 
  presence_mean_calc() |> 
  filter(!is.na(dist_bin)) |> 
  mutate(dist_bin_label = c("<100m", "100 - 200m", "200 - 500m", "500m - 1km","1-2km","2-5km","5-10km","10km+"),
         phase = "Secondary")

# calculate overall averages by phase
sch_dist_binned_overall <- dist_data |> 
  mutate(
    dist_bin = NA_character_, 
    dist_bin_label = "overall avg") |> 
  group_by(dist_bin, dist_bin_label, phase) |> 
  presence_mean_calc()

sch_dist_binned <- rbind(sch_dist_binned_pri, sch_dist_binned_sec, sch_dist_binned_overall) |>   
  mutate(fill_code = case_when(dist_bin_label == 'overall avg' ~ 'total',
                         TRUE ~ 'others')) 

# plot
ggplot(sch_dist_binned, aes(x = reorder(dist_bin_label,mean.percent_present), 
                       y = mean.percent_present,
                       fill = fill_code)) +
  geom_col(position = position_dodge(0.9))+
  geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, position = position_dodge(0.9))+
    geom_text(aes(label = scales::percent(round(mean.percent_present,3))), hjust = 1, 
              colour = "white", size = 3,
              position = position_dodge(0.9)
              ) +
      labs(title = "Attendance by distance to school",
           subtitle = "Avg % sessions attended, 2023 +-95CI; straight line home to school distance; excluding children with EHCP",
           caption = "data from Capita One")+
  barplottheme_minimal +
  theme(axis.title.x = eb, axis.text.y = element_text(size = 8), axis.text.x = eb,
        legend.position = "none", plot.subtitle = element_text(size = 8, face = "italic"),
        strip.background = eb) +
  coord_flip() +
  facet_grid(cols = vars(phase)) +
  scale_fill_manual(values = c("others"= "#0072B2", "total" = "#b47846"))
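
The distance bands above are built with cut() and then labelled by position, which relies on every band being present in the data. A hedged alternative sketch passes the labels to cut() directly. The toy distances are invented, and include.lowest = TRUE is an assumption about how a distance of exactly 0m should be treated:

```r
# Band edges in metres, as used in the report's code
breaks <- c(0, 100, 200, 500, 1000, 2000, 5000, 10000, 1000000)
labels <- c("<100m", "100 - 200m", "200 - 500m", "500m - 1km",
            "1-2km", "2-5km", "5-10km", "10km+")

# Toy distances (made up for illustration)
dist_crow <- c(0, 50, 100, 150, 750, 12000)

# Intervals are right-closed by default, e.g. (100,200];
# include.lowest = TRUE keeps a distance of exactly 0 in the first band
# instead of turning it into NA
dist_bin_label <- cut(dist_crow, breaks = breaks, labels = labels,
                      include.lowest = TRUE)
table(dist_bin_label)
```

Because the labels are attached to the breaks, a band with no pupils simply shows a count of zero and cannot shift the labels of the other bands.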

Plotting the average distance travelled against average attendance rates for secondary schools reveals four groupings:

  • on the right are two specialist facilities (UTC Sheffield & UTC Sheffield Olympic Legacy Park) and two Catholic schools (All Saints and Notre Dame). All of these may incentivise pupils to travel further than normal.
  • the main bunch of schools in the middle seems to show a linear relationship between distance and attendance, though this relationship is weak, relies on us discarding the outliers (more on these below), and may not be causal.
  • above this group, Mercia, Tapton and High Storrs are all in affluent areas of the city, and show higher attendance with average distance travelled.
  • below this group, Chaucer shows average distance travelled and below-average attendance, though, as we’ll see below, the average distance travelled disguises some significant differences.
Code
# the distance data is already filtered to just Sheffield schools, but here we want to remove specials & nursery:
dist_data <- sch_dist_sheff_23 |> 
  filter(school_type == "mainstream",
         school_ed_phase == "Secondary")

dist_by_sch <- dist_data |> 
  group_by(school_short_name, school_ed_phase) |> 
  summarise(mean.dist_crow = mean(dist_crow, na.rm = TRUE),
             sd.dist_crow = sd(dist_crow, na.rm = TRUE),
             n.dist_crow = n() )  |> 
  mutate(se.dist_crow = sd.dist_crow / sqrt(n.dist_crow),
  lower.ci.dist_crow = mean.dist_crow - qt(1 - (0.05 / 2), n.dist_crow - 1) * se.dist_crow,
  upper.ci.dist_crow = mean.dist_crow + qt(1 - (0.05 / 2), n.dist_crow - 1) * se.dist_crow)

dist_attend_by_sch <- dist_data |> 
  group_by(school_short_name, school_ed_phase) |> 
  presence_mean_calc()

sch_dist_by_sch <- inner_join(
  dist_by_sch,
  dist_attend_by_sch)

# plot
ggplot(sch_dist_by_sch, 
       aes(x = mean.dist_crow,
           y = mean.percent_present#,
           #colour = "dark blue"
           )
       ) +
  geom_point(alpha = 0.7,  colour = "steel blue")+
  geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present),  colour = "steel blue", alpha = 0.5) +
  geom_errorbar(aes(xmin = lower.ci.dist_crow, xmax = upper.ci.dist_crow),  colour = "steel blue", alpha = 0.5) +
    geom_text_repel(aes(label = school_short_name), size = 2.5,  colour = "steel blue") +
      labs(title = "Attendance vs distance travelled",
           subtitle = "Sheffield Secondary schools; 2023 attendance rates",
           x = "average straight line distance from home to school (m)",
           y = "average % of sessions attended") +
  scale_y_continuous(labels = scales::percent) +
  theme(legend.position = "none")
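
The error bars in these charts are 95% confidence intervals around a mean. The distance intervals are computed explicitly in the code above; presence_mean_calc() is not shown in this section, so the sketch below is only an assumption about an equivalent calculation, using a hypothetical helper mean_ci() and invented data:

```r
# Hypothetical helper: mean with a two-sided t-based confidence interval
mean_ci <- function(x, level = 0.95) {
  x  <- x[!is.na(x)]               # drop missing values before counting
  n  <- length(x)
  m  <- mean(x)
  se <- sd(x) / sqrt(n)            # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(mean = m, lower.ci = m - t_crit * se, upper.ci = m + t_crit * se)
}

# Toy attendance rates for five pupils (made up)
percent_present <- c(0.90, 0.95, 0.85, 0.92, 0.88)
ci <- mean_ci(percent_present)
ci
```

One design choice here is that n counts only non-missing values, so a school with many missing distances would not get an artificially narrow interval.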

Plotting the distance travelled against attendance at the child level reveals further differences. In the plot below we take one example from each of the four groups described above.

We can think of dividing these plots into four quadrants:

[Figure: distance quadrants]

Notre Dame High has good attendance across the board, regardless of the distance travelled. Mercia has excellent attendance and a limited distance travelled, presumably due to its oversubscription and high demand, with most datapoints appearing in the top left. The trend line points slightly down, as a few children who live further away have lower attendance.
Meadowhead has typical values for both attendance and distance, appearing in the middle of the pack in the plot above. Most children attend well, and those with poorer attendance generally live close by - there are few in the bottom right. Chaucer, by contrast, has a small but significant number of points in the bottom right quadrant - those who attend very poorly and live far away. Some of this may be explained by families failing to secure a place at closer schools and being placed across the city, with the distance then contributing to poor attendance.

Code
sch_dist_sheff_23 |> 
  filter(school_short_name %in% c("Chaucer", "Mercia", "Notre Dame High", "Meadowhead")) |>
  mutate(school = factor(school_short_name, levels = c("Notre Dame High","Mercia","Meadowhead","Chaucer"))) |> 
  ggplot(aes(x = dist_crow,
             y = percent_present,
             colour = school,
             group = school)) +
  geom_point(alpha = 0.6, size = 1.5) +
  geom_smooth(alpha = 0.4) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(vars(school)) +
  theme(legend.title = eb,
        legend.text = element_text(size = 7.5),
        legend.position = "none", strip.background = eb
        ) +
  labs(title = "Attendance vs distance travelled",
           subtitle = "Selected Sheffield secondary schools",
           x = "straight line home to school distance (m)",
           y = "% of sessions attended")

5 Young carers

It is difficult to establish the true number of young carers in the city, and any figure depends on the definitions & methods used. A 2023 report from the all party parliamentary group (APPG) for young carers and adult carers cites several sources:

  • 1.6% of pupils (2021 Census)
  • 0.5% of pupils (2023 school census)

The report places little confidence in these first two, preferring the estimates of two surveys:

  • 10% of all pupils provide high or very high levels of care (BBC / University of Nottingham)
  • 13% of pupils surveyed (COVID Social Mobility & Opportunities study)
Code
yc_estimate_10pc <- sheffield_pupil_population_20241 * 0.1
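
To give a sense of scale, the sketch below applies each of the prevalence rates cited above to a pupil count. The count used is the 2023 total of primary and secondary pupils from the free school meals table earlier in this report (40,411 + 32,181); this is an assumption for illustration and not necessarily the population figure used in the calculation above:

```r
# Assumed pupil count: primary + secondary totals from the FSM table
pupil_count <- 40411 + 32181

# Prevalence rates as cited in the APPG report
prevalence <- c(
  "2021 Census"                                 = 0.016,
  "2023 school census"                          = 0.005,
  "BBC / University of Nottingham"              = 0.10,
  "COVID Social Mobility & Opportunities study" = 0.13
)

estimated_young_carers <- round(pupil_count * prevalence)
estimated_young_carers

# Share of the 10% estimate that is visible in local records (904 children)
round(904 / estimated_young_carers[["BBC / University of Nottingham"]], 3)
```
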

Applying the 10% figure to Sheffield’s pupil population would indicate over 7000 young carers in the city. Our local data identifies just 904 since 2020, so we provide the analysis here with the following caveat:

data on young carers

The data used in this section of the report comes from young carer type involvements in Capita One, covering around 900 children from 2020 onwards. Clearly our data doesn’t capture all young carers (and may skew towards those at the more severe end of the caring spectrum) and/or we are working with different definitions of what a young carer is. Issues with getting people of all ages to self-identify as carers are well known, and the perceived stigma attached to caring roles is likely more acute in young people - indeed this is probably a factor in explaining differences in school attendance.

The involvements have an open date but no close date, so a time series analysis of volumes isn’t possible, and the data implicitly assumes that a young carer remains so for the rest of their school career.

A descriptive or demographic analysis may also be misleading, but we can make a comparison of attendance rates, which shows a significant difference. Primary age young carers attend just under 4% less than those without a caring role. In secondary school this gap rises to 10%:

Code
# can't do a time series on volumes as there are no close dates
# yc_time_series <- 
#   seq(ymd('2015-04-01'),ymd('2024-07-1'), by = '3 months')
#   young_carers

attend_yc_phase <- attend |>  
  filter(ncy >= 1, ncy <= 11,
         !phase %in% c("Nursery","6th form"),
         year >= 2020) |> 
  left_join(young_carers,
            join_by(stud_id == stud_id,
                    ht_start_date >= open_date)) |> 
  mutate(yc_flag = replace_na(code_des,"not young carer")) |> 
  group_by(yc_flag, phase) |> 
  presence_mean_calc()

attend_yc_ncy <- attend |>  
  filter(ncy >= 1, ncy <= 11,
         year >= 2020) |> 
  left_join(young_carers,
            join_by(stud_id == stud_id,
                    ht_start_date >= open_date)) |> 
  mutate(yc_flag = replace_na(code_des,"not young carer")) |> 
  group_by(yc_flag, ncy) |> 
  presence_mean_calc()

attend_yc_year <- attend |>  
  filter(ncy >= 1, ncy <= 11,
         year >= 2020) |> 
  left_join(young_carers,
            join_by(stud_id == stud_id,
                    ht_start_date >= open_date)) |> 
  mutate(yc_flag = replace_na(code_des,"not young carer")) |> 
  group_by(yc_flag, year) |> 
  presence_mean_calc()
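
The three summaries above flag a pupil as a young carer from the half term in which their involvement opened onwards, using an inequality join. A toy sketch of that join, assuming dplyr 1.1.0 or later (where join_by() is available) and using invented records:

```r
library(dplyr)

# Invented attendance rows: one row per pupil per half term
attend_toy <- tibble(
  stud_id       = c(1, 1, 1, 2),
  ht_start_date = as.Date(c("2021-09-01", "2022-01-04",
                            "2022-09-05", "2022-01-04"))
)

# Invented involvement record: pupil 1 identified as a young carer in Dec 2021
young_carers_toy <- tibble(
  stud_id   = 1,
  open_date = as.Date("2021-12-01"),
  code_des  = "young carer"
)

# Keep every attendance row; attach the involvement only to half terms
# starting on or after the open date
flagged <- attend_toy |>
  left_join(young_carers_toy,
            join_by(stud_id == stud_id, ht_start_date >= open_date)) |>
  mutate(yc_flag = coalesce(code_des, "not young carer"))

flagged |> select(stud_id, ht_start_date, yc_flag)
```

Because there is no close date, every later half term matches, which is the "remains a carer" assumption noted in the caveat above. A pupil with more than one involvement record would be duplicated by a join of this shape, so de-duplicating the lookup first may be worth considering.
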
Code
# reorder() has no `desc` argument (it was silently ignored), so it is dropped here
ggplot(attend_yc_phase, aes(x = reorder(yc_flag, mean.percent_present), 
                       y = mean.percent_present
                       )) +
  geom_col(fill = "steel blue", position = position_dodge(0.9))+
  geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, position = position_dodge(0.9))+
    geom_text(aes(label = scales::percent(round(mean.percent_present,3))), hjust = 1, 
              colour = "white", size = 5,
              position = position_dodge(0.9)) +
      labs(title = "Attendance of young carers",
           subtitle = "Avg % sessions attended since 2020 +-95CI; young carer status from Capita One involvements",
           caption = "data from Capita One")+
  barplottheme_minimal +
  theme(axis.title.x = eb, axis.text.y = element_text(size = 8), axis.text.x = eb,
        legend.position = "none", plot.subtitle = element_text(size = 8, face = "italic"),
        strip.background = eb) +
  coord_flip() +
  facet_grid(cols = vars(phase))

As we did for deprivation quartiles above, we can create an age profile of attendance for young carers and compare it to pupils with no caring role. Again we see a greater impact on attendance as age increases, presumably as the expectations and stigmatisation around caring roles also increase. There is a particular drop in attendance going into year 8.

Code
ggplot(attend_yc_ncy, aes(x = ncy, 
                       y = mean.percent_present,
                       colour = yc_flag, group = yc_flag,
                       #label = label
)) +
  geom_point() + geom_line() +
  #geom_label_repel(hjust = 0, nudge_y = c(0.05,0.02,0.02), min.segment.length = Inf, alpha = 0.8) +
  geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.7)+
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(1,11)) +
labs(title = "Attendance of<b><span style='color:#ce9642'> young carers </span></b>and <b><span style='color:#3b7c70;'>those without</span></b> a caring role",
       subtitle = "average % of sessions attended since 2020; young carers data from Capita One involvements",
       #caption = "data from Capita One"
     )+
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title.x = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky") +
  theme(axis.title.y = eb, legend.position = "none") +
  geom_vline(aes(xintercept = 6.5), linetype = "dotted", colour = "gray70", linewidth = 1.2) +
  annotate("text", label = "primary", y = 0.99, x = 3.5, colour = "gray40") +
  annotate("text", label = "secondary", y = 0.99, x = 9, colour = "gray40")

The attendance of young carers appears to be getting worse over time, though this may be a function of the cumulative nature of the data, which has no end dates attached, so our cohort of young carers is ageing in the system.

Code
ggplot(attend_yc_year |> 
         filter(year > 2020), 
       aes(x = year, 
           y = mean.percent_present,
           colour = yc_flag, group = yc_flag,
           )) +
  geom_point() + geom_line() +
  geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.7)+
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Attendance of<b><span style='color:#ce9642'> young carers </span></b>and <b><span style='color:#3b7c70;'>those without</span></b> a caring role",
       subtitle = "average % of sessions attended since 2020; young carers data from Capita One involvements; 2025 data is Autumn half-term 1 only",
       caption = "data from Capita One"
     )+
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title = eb,
        axis.line = eb,
        axis.ticks = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky")

recommendation

Better long term data is required to understand volumes, impacts & the geographical distribution of young carers, as well as change over time and the provision of services to young carers.

7 Severe absences

Children are classed as severely absent if they miss over 50% of available sessions in any given period. This section explores the characteristics of severely absent children, and how this is changing over time.
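
As a sketch of that definition, assuming a table with one row per pupil and period holding counts of possible and attended sessions (the column names sessions_possible and sessions_attended are hypothetical, not those of the report's data model):

```r
# Invented pupil-year totals
sessions <- data.frame(
  stud_id           = c("a", "b", "c"),
  sessions_possible = c(380, 380, 200),
  sessions_attended = c(361, 190, 90)
)

sessions$percent_absent <-
  1 - sessions$sessions_attended / sessions$sessions_possible

# Severely absent: missing more than 50% of available sessions.
# Under this wording a pupil missing exactly 50% (pupil "b") is not flagged.
sessions$severe_absence <- as.integer(sessions$percent_absent > 0.5)
sessions
```

The flag depends on the period chosen: a pupil can be severely absent in a single term without being severely absent over the full year.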

Severe absence rates (here measured over the full year) followed similar trajectories in primary and secondary schools through the pandemic, but are now diverging. As with overall absence, rates were already rising slowly before the pandemic. 2021 was a peak year, and 2022 showed recovery. Rates increased into 2023 but much more in secondary schools, and 2024 shows rates improving for primary but stable in secondary schools. If this rate in secondary schools represents a new normal, it is over double the pre-pandemic rate.

Important

Almost 1 in 20 children at Sheffield secondary schools was severely absent in 2023.

Code
ggplot(attend_year_phase |> 
         filter(phase %in% c("Primary","Secondary"),
                year >= 2018) |> 
         mutate(grey_flag = if_else(year==2024,0,1))
       , aes(x = year, y = pc_of_pupils_severely_absent, #alpha = grey_flag, 
             colour = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Severe absence by academic year in <b><span style='color:#dd5129'>primary </span></b>and <b><span style='color:#0f7ba2;'>secondary</span></b> schools",
    subtitle = "percentage of pupils missing over half of available sessions",
    caption = "data from Capita One") +
    annotate("text", x = 2020.3, y = 0.045, label = "COVID-19", size = 3, hjust = 1.1) +
    geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) +
  # phase is mapped to colour (not fill), so the palette must be set on the colour scale
  MetBrewer::scale_colour_met_d("Egypt")

Next we look at the severe absence rates of groups with different characteristics. The groupings here are chosen as those that show significant differences in severe absence rates. Note that the characteristics given here are not mutually exclusive. Children with an EHCP were nearly 8% more likely to be severely absent than average. Children on free school meals were nearly 6% more likely. Children in Y11 have twice the average rate.

All primary years and a few ethnic groups have significantly lower severe absence rates.

Code
sev_pc_all <- attend_stud_year |> filter(year == 2023) |> 
  group_by(severe_absence) |> 
  tally() |> 
  mutate(pc = n / sum(n),
         category = "all children") |> 
  filter(severe_absence == 1) |> 
  select(pc) |> pull()

sev_pc_eth_cat <- attend_stud_year_ethcat |> filter(year == 2023) |> 
  group_by(ethnicity_category, severe_absence) |> 
  tally() |> 
  mutate(pc = n / sum(n)) |> 
  mutate(category = str_c("ethnicity category ",ethnicity_category)) |> 
  ungroup() |> select(-ethnicity_category) |> 
  filter(severe_absence == 1)

sev_pc_eth_cat_2 <- attend |> filter(year == 2023) |> summarise_attendance(grouping_vars = "ethnicity_category")

sev_pc_gender <- attend_stud_year_gender |> filter(year == 2023) |>
 group_by(gender, severe_absence) |>
 tally() |> mutate(pc = n / sum(n)) |>
 mutate(category = str_c("gender ",gender)) |>
 ungroup() |> select(-gender) |>
 filter(severe_absence == 1)

sev_pc_ncy <- attend_stud_year_ncy |> filter(year == 2023) |> 
  group_by(ncy, severe_absence) |> 
  tally() |> mutate(pc = n / sum(n)) |> 
  mutate(category = str_c("ncy ",ncy)) |> 
  ungroup() |> select(-ncy) |> 
  filter(severe_absence == 1)

sev_pc_fsm <- attend_stud_year_fsm |> filter(year == 2023) |> 
  group_by(fsm, severe_absence) |> 
  tally() |> mutate(pc = n / sum(n)) |> 
  mutate(category = if_else(fsm == 1, "free school meals",
                            "not on free school meals")) |> 
  ungroup() |> select(-fsm) |> 
  filter(severe_absence == 1)

sev_pc_sen_level <- attend_stud_year_sen_level |> filter(year == 2023) |> 
  group_by(sen_level, severe_absence) |> 
  tally() |> mutate(pc = n / sum(n)) |> 
  mutate(category = str_c("SEN level - ",sen_level)) |> 
  ungroup() |> select(-sen_level) |> 
  filter(severe_absence == 1)

sev_plot_data <- rbind(
  #sev_pc_all,
  sev_pc_eth_cat,
  #sev_pc_gender,
  sev_pc_ncy,
  sev_pc_fsm,
  sev_pc_sen_level
) |> 
  filter(!is.na(category))

ggplot(sev_plot_data,
       aes(x = reorder(category,pc),
           y = pc,
           fill = pc)) +
  geom_col() +
  geom_text(aes(label = scales::percent(pc, accuracy = 0.1)), size = 2.5, colour = "darkgrey",
            nudge_y = 0.004) +
  scale_y_continuous(labels = scales::percent) +
  theme(axis.title = eb, legend.position = "none",
        axis.text.y = element_text(size = 7.5)) +
  labs(title = "Severe absence rates by selected pupil characteristics",
       subtitle = "% of children in each group attending less than 50% of available sessions in 2023") +
  geom_hline(aes(yintercept = sev_pc_all), linetype = "dotted") +
  # annotate() draws the label once, rather than once per row of the data
  annotate("text", label = str_c("all pupils ",scales::percent(sev_pc_all, accuracy = 0.1)), x = 4.5, y = 0.045, size = 3, colour = "dark gray") +
    coord_flip() +
  scale_fill_distiller(palette = "Spectral")

The chart above shows relative severe absence rates of different groups, but we’ll complement that by quantifying the cohort of severely absent pupils in 2023 by their characteristics.

Code
sa_2023 <- attend |> 
  filter(year == 2023,
         ncy >= 1, ncy <= 11,
         severe_absence == 1
  ) |> 
  left_join(stud_details_joined |> 
              select(stud_id, imd_quartile), 
            by = "stud_id") |> 
  select(stud_id, gender, ncy, imd_quartile, primary_specific_need) |> 
  group_by(stud_id) |> 
  slice(1)
Code
sa_2023 |> 
  mutate(ncy = factor(ncy, levels = c(1,2,3,4,5,6,7,8,9,10,11))) |> 
  group_by(ncy) |> 
  count() |> 
  ggplot(aes(fill = ncy,
        values = n)) + 
  expand_limits(x=c(0,0), y=c(0,0)) +
  coord_equal() +
  labs(
    title = "Severely absent children in Sheffield, by national curriculum year",
    subtitle = "Pupils missing over 50% of sessions in 2022-23",
    fill = NULL, colour = NULL) +
  #theme_ipsum_rc(grid="") +
  theme_enhance_waffle() +
  #theme(axis.line = eb, axis.text = eb, axis.ticks = eb) +
  geom_waffle(
    size = 0.5,
    n_rows = 10,
    colour = "white",
    #radius = unit(1, "pt")
    flip = TRUE#,
    #make_proportional = TRUE
    ) +
  facet_grid(~ncy) +
  theme(axis.line = eb, axis.ticks = eb, axis.text = eb, legend.position = "none")

Code
sa_2023 |> 
  filter(!is.na(imd_quartile)) |> 
  mutate(imd_quartile = factor(imd_quartile, levels = c(1,2,3,4))) |> 
  group_by(imd_quartile) |> 
  count() |> 
  ggplot(aes(fill = imd_quartile, values = n)) + 
  expand_limits(x=c(0,0), y=c(0,0)) +
  coord_equal() +
  labs(
    title = "Severely absent children in Sheffield, by deprivation quartile",
    subtitle = "Pupils missing over 50% of sessions in 2022-23",
    fill = NULL, colour = NULL) +
  #theme_ipsum_rc(grid="") +
  theme_enhance_waffle() +
  #theme(axis.line = eb, axis.text = eb, axis.ticks = eb) +
  geom_waffle(
    size = 0.5,
    n_rows = 40,
    colour = "white",
    #radius = unit(1, "pt")
    flip = TRUE#,
    #make_proportional = TRUE
    ) +
  geom_text(aes(x = c(1,2,3,4), y = (n / 40) + 2, label = n), nudge_x = 27, size = 2.5) +
  facet_grid(~imd_quartile) +
  theme(axis.line = eb, axis.ticks = eb, axis.text = eb, legend.position = "none")

Code
sa_2023 |> 
  mutate(primary_specific_need = replace_na(primary_specific_need, "No SEN")) |> 
  mutate(primary_specific_need = factor(primary_specific_need)) |> 
  group_by(primary_specific_need) |> 
  tally() |> 
  mutate(primary_specific_need = reorder(primary_specific_need,desc(n))) |> 
  ggplot(aes(fill = primary_specific_need, 
             area = n,
             label = paste(primary_specific_need,n, sep = "\n"))) + 
  labs(title = "Severely absent children in Sheffield, by primary specific need",
       subtitle = "Pupils missing over 50% of sessions in 2022-23",
       fill = NULL, colour = NULL) +
  geom_treemap() +
  geom_treemap_text(place = "centre",
                    size = 8,
                    force.print.labels = TRUE,
                    reflow = TRUE) +
  theme(legend.position = "none")

7.1 Severe absence - turnover and retention

In the chart below, severely absent children are classed as retained if they were also severely absent the year before, and new if not. Both categories have risen in recent years:

Code
sa <- attend_stud_year |> 
  left_join(stud_details_joined) |> #might want this but not yet
  left_join(attend |> select(stud_id, year, school_ed_phase_corrected) |> distinct()) |> 
  filter(severe_absence == 1,
         school_ed_phase_corrected == "Secondary") |>
  select(stud_id, year) |> 
  mutate(sa = 1,
         prev_year = year - 1)

sa_yoy <- sa |> 
  left_join(sa |> select(-prev_year) |> rename(retained = sa),
            join_by(stud_id == stud_id,
                    prev_year == year)) |>
    mutate(
      retained = if_else(is.na(retained),0,1),
      new = if_else(retained == 0,1,0)
      )

sa_yoy_crunched <- sa_yoy |> 
  group_by(year) |> 
  summarise(total = sum(sa),
            new = sum(new),
            retained = sum(retained),
            pc_retained = sum(retained) / sum(sa)) |> 
  pivot_longer(cols = -year,
               names_to = "category",
               values_to = "value") |> 
  filter(year > 2006)

ggplot(sa_yoy_crunched |> filter(year >= 2018, category %in% c("new","retained")),
    aes(x = year,
y = value,
colour = category,
group = category)) +
geom_point() + geom_line() +
 labs(title = "Severely absent children: <b><span style='color:#dd5129'>new in the year </span></b>and <b><span style='color:#0f7ba2;'>retained from the previous year</span></b>",
   subtitle = "Secondary provision only; count of children attending less than 50% of available sessions; 2024 data excludes the summer term",
   caption = "data from Capita One") +
 theme(plot.title = element_markdown(size = 12),
       legend.position = "none", axis.title = eb) +
 # category is mapped to colour (not fill), so the palette must be set on the colour scale
 MetBrewer::scale_colour_met_d("Egypt")

So the problem of severe absence is, in part, due to a cohort we could describe as chronically severely absent.

The retention rate here is calculated as the percentage of all severely absent pupils in a given year who were also severely absent the year before. In secondary schools, around 40% of children who were severely absent in 2023 were also severely absent in 2022.
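
A toy illustration of that calculation, using invented pupil-year records and base R rather than the self-join used in the report's code:

```r
# Invented records: one row per severely absent pupil per year
sa_toy <- data.frame(
  stud_id = c(1, 2, 3, 1, 2, 4, 5),
  year    = c(2022, 2022, 2022, 2023, 2023, 2023, 2023)
)

# A pupil-year is "retained" if the same pupil also appears in the previous year
key      <- paste(sa_toy$stud_id, sa_toy$year)
prev_key <- paste(sa_toy$stud_id, sa_toy$year - 1)
sa_toy$retained <- as.integer(prev_key %in% key)

# Retention rate per year = retained / all severely absent in that year
retention <- aggregate(retained ~ year, data = sa_toy, FUN = mean)
retention
```

Note that the earliest year in any dataset will always show a retention rate of zero, simply because there is no previous year to match against.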

This retention rate has risen in recent years:

Code
ggplot(sa_yoy_crunched |> filter(year >= 2018, category == "pc_retained"),
    aes(x = year,
y = value, label = scales::percent(value, accuracy = 0.1))) +
geom_point() + geom_line(linetype = "dotted") + geom_text(size = 2.5, nudge_y = -0.02) +
#geom_text(aes(label = scales::percent(pc_retained, accuracy = 1.1L, size = 3))) +
 #geom_col(position = position_stack()) +
 labs(title = "Year on year severe absence retention rate (secondary)",
   subtitle = "% of severely absent children who were severely absent in the previous year",
   caption = "data from Capita One") +
 theme(plot.title = element_markdown(size = 12),
       legend.position = "none", axis.title = eb, axis.line.y = eb, axis.text.y = eb, axis.ticks.y = eb) +
 MetBrewer::scale_fill_met_d("Egypt")

Plotting the retention rate by NCY shows increased year on year retention as children grow older. Here we’ve included the NCY profiles of two years: 2018 and 2024, showing the increased retention rates across the board into 2024.

Code
sa_ncy <- attend_stud_year_ncy |> 
  left_join(stud_details_joined) |> #might want this but not yet
  left_join(attend |> select(stud_id, year, school_ed_phase_corrected) |> distinct()) |> 
  filter(severe_absence == 1,
         #school_ed_phase_corrected == "Secondary",
         ncy >= 6, ncy <= 11) |>
  select(stud_id, year, ncy) |> 
  mutate(sa = 1,
         prev_year = year - 1)
sa_yoy <- sa_ncy |> 
  left_join(sa_ncy |> select(-prev_year, -ncy) |> rename(retained = sa),
            join_by(stud_id == stud_id,
                    prev_year == year)) |>
    mutate(
      retained = if_else(is.na(retained),0,1),
      new = if_else(retained == 0,1,0)
      ) |>
  filter(ncy >= 7)

sa_yoy_ncy_crunched <- sa_yoy |>
  filter(year %in% c(2018,
                     #2019,
                     #2020,
                     #2021,
                     #2022,
                     #2023,
                     2024)) |> 
  group_by(year, ncy) |> 
  summarise(total = sum(sa),
            new = sum(new),
            retained = sum(retained),
            pc_retained = sum(retained) / sum(sa)) |> 
  #pivot_longer(cols = c(-year,-ncy),
  #             names_to = "category",
  #             values_to = "value") |> 
  mutate(year = factor(year)) |> 
  mutate(label = if_else(ncy == max(ncy), year, NA_character_))
 
ggplot(sa_yoy_ncy_crunched,#|> filter(category == "pc_retained"),
    aes(x = ncy,
        y = pc_retained,
        colour = year,
        group = year,
        label = label)) +
  geom_point() + geom_line() +
  geom_label_repel() +
  scale_y_continuous(labels = scales::percent)+
  labs(title = "Severely absent children - year on year retention rate by NCY",
       subtitle = "Secondary schools only; of children severely absent for the year, the % who were also severely absent the previous year",
   caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
       legend.position = "none", axis.title = eb)

7.2 Severe absence - trajectory analysis

Following on from the question of turnover & retention, we considered the trajectories of pupils with severe absence. We carried out sampling work on those with severe absence during secondary school, in order to find patterns or groups.

We categorised around 1400 such pupils into one of 7 categories:

  • steady decline
  • dipper (one or two years of severe absence followed by a return to normal levels)
  • persistently severely absent (over 50% of available years were severely absent)
  • zero & out (normal absence levels followed by a year at zero attendance and no further available data)
  • severely absent in y11 only
  • severely absent in y7 only
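
To make the category definitions concrete, here is a hypothetical rule-based sketch in base R. It is not the method used in the sampling work: the function name `classify_trajectory`, the order in which the rules are applied, the test for "steady decline" and the "unclassified" fallback are all assumptions made for illustration. Only the 50% thresholds come from the definitions above.

```r
# Hypothetical classifier; rule order and the "steady decline" test are assumptions.
# `att` is a named vector of annual attendance rates (0-1) for y7..y11, NA = no data.
classify_trajectory <- function(att) {
  att <- att[!is.na(att)]      # keep only years with data
  n   <- length(att)
  sa  <- att < 0.5             # severe absence: under 50% attendance
  if (n == 0 || !any(sa)) return(NA_character_)   # never severely absent
  yrs <- names(att)

  # normal levels, then a final year at zero, with data ending before y11
  if (n >= 2 && yrs[n] != "y11" && att[n] == 0 && !any(sa[-n])) return("zero & out")
  if (sum(sa) == 1 && yrs[sa] == "y11") return("severely absent in y11 only")
  if (sum(sa) == 1 && yrs[sa] == "y7")  return("severely absent in y7 only")
  if (mean(sa) > 0.5)                   return("persistently severely absent")
  if (sum(sa) <= 2 && !sa[n])           return("dipper")
  if (n > 1 && all(diff(att) < 0))      return("steady decline")
  "unclassified"
}

# Made-up pupils, one per trajectory
pupils <- list(
  a = c(y7 = 0.95, y8 = 0.90, y9 = 0.60, y10 = 0.45, y11 = 0.30),
  b = c(y7 = 0.90, y8 = 0.40, y9 = 0.92, y10 = 0.90, y11 = 0.93),
  c = c(y7 = 0.40, y8 = 0.30, y9 = 0.45, y10 = 0.70, y11 = 0.20),
  d = c(y7 = 0.95, y8 = 0.92, y9 = 0.00, y10 = NA,   y11 = NA)
)
trajectories <- sapply(pupils, classify_trajectory)
print(trajectories)
```

Because a pupil can satisfy more than one definition (a pupil severely absent in y7 only is also, strictly, a dipper), any automated version needs an explicit rule order like the one assumed here.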

This work is described in more detail in a separate short report [LINK], but the main takeaways were:

  • boys were overrepresented in all severely absent groups
  • the “zero & out” group are likely an artefact of data & recording, resulting mostly from pupils leaving the city but remaining on roll for one or more terms.
  • those persistently severely absent were more likely to have

8 Daily attendance patterns

The analysis so far in this report has used data aggregated up to the half term or annual level. During the course of this project we processed the raw daily data (recorded as a string of symbols and codes) to allow analysis of attendance at the level of the individual day.
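
The exact layout of the raw register string is not described in this report, so the following is only a sketch under stated assumptions: one character per session, two sessions (AM then PM) per day, with `/` and `\` marking presence. The codes I, N and O are the ones named in the plot subtitles later in this section. The string, start date and column names are made up.

```r
# Hypothetical register string for one pupil: 5 days x 2 sessions (AM, PM)
register   <- "/\\/\\II/\\NO"          # "\\" is a single backslash in R
start_date <- as.Date("2023-01-09")    # a Monday

codes  <- strsplit(register, "")[[1]]  # one element per session
n_days <- length(codes) / 2

sessions <- data.frame(
  date    = rep(start_date + seq_len(n_days) - 1, each = 2),
  session = rep(c("AM", "PM"), times = n_days),
  code    = codes
)

# Flag each session by the type of mark it carries
sessions$present   <- as.integer(sessions$code %in% c("/", "\\"))
sessions$illness   <- as.integer(sessions$code == "I")
sessions$no_reason <- as.integer(sessions$code %in% c("N", "O"))

# Collapse to one row per day
daily <- aggregate(cbind(present, illness, no_reason) ~ date,
                   data = sessions, FUN = sum)
print(daily)
```

Once every pupil's string is expanded in this way, the resulting table can be summed by date, weekday or half term.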

8.1 Week day

Fridays (and to a lesser extent Mondays) see significantly lower attendance than the other days of the week.

NOTE - in this render the day level data is not available

Code
ggplot(attend_weekday_phase |> 
            filter(!week_day %in% c("Sat","Sun")),
       aes(x = week_day,
           y = percent_present,
           #fill = phase,
           label = scales::percent(percent_present, accuracy = 0.1))
       ) +
  geom_col(fill = "steel blue", position = "dodge") +
  geom_text(position = position_dodge(0.9), colour = "white", size = 3, vjust = 1.5, fontface = "bold")+
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Attendance by day of the week",
    subtitle = "percentage of sessions attended since 2016; all Sheffield pupils",
    caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none", axis.title = eb,
        strip.background = eb, axis.text.y = eb) +
  facet_grid(cols = vars(phase)) +
  barplottheme_minimal
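
The plot above relies on a pre-computed table of attendance by weekday, and its construction is not shown here. As a rough sketch of how such a summary could be derived from day level counts, the example below uses a made-up `daily` table with hypothetical `present` and `possible` session counts.

```r
# Toy day level data (hypothetical counts): two school weeks, Monday to Friday
daily <- data.frame(
  date     = as.Date("2023-01-09") + c(0:4, 7:11),
  present  = c(94, 95, 95, 94, 90, 93, 95, 96, 94, 89),
  possible = 100
)

# "%u" gives the ISO weekday number (1 = Monday), which avoids locale-dependent names
day_names <- c("Mon", "Tue", "Wed", "Thu", "Fri")
daily$week_day <- factor(day_names[as.integer(format(daily$date, "%u"))],
                         levels = day_names)

# Sum sessions first, then divide, so busier days carry proportionate weight
by_day <- aggregate(cbind(present, possible) ~ week_day, data = daily, FUN = sum)
by_day$percent_present <- by_day$present / by_day$possible
print(by_day)
```

Summing sessions before dividing, rather than averaging daily percentages, keeps the result consistent with a "percentage of sessions attended" measure when the number of possible sessions varies from day to day.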

Looking at a time series, we see that Friday’s lower attendance is nothing new, and the gap has not really changed over time:

NOTE - in this render the day level data is not available

Code
ggplot(attend_weekday_phase_year |> 
         filter(year <= 2023) |> 
         mutate(label = if_else(year == max(year), week_day, NA_character_)),
       aes(x = year,
           y = percent_present,
           colour = week_day,
           group = week_day,
           label = label)) +
         geom_point() +
         geom_line() +
         geom_label_repel(aes(x = 2023.5),
                          size = 2,
                          vjust = 1,
                          min.segment.length = Inf) +
  facet_wrap(vars(phase), scales = "free_y", nrow = 2) +
  scale_x_continuous(limits = c(2018,2025), breaks = seq(2018,2023)) +
  theme(legend.position = "none", axis.title = eb, strip.background = eb,
        strip.placement = "top")+
  labs(title = "Attendance by weekday & year")

8.2 School attendance across the year

The day level data allows us to visualise an entire school year. Here we see how key points in the year and particular dates impact on school attendance. When the data are aggregated to the term level, there is very little seasonal variation, but differences at the day level are more dramatic than the differences we see between demographic groups.

In particular, we can see the impacts of:

  • the first and last days of term
  • growing absence rates in the run up to Christmas
  • a wave of teachers’ strikes
  • heavy snowfall in March
  • Eid
  • the days immediately after bank holidays
  • study leave
  • increasing absence through the final summer term

NOTE - in this render the day level data is not available

Code
ggplot(day_2023 |> 
         filter(half_term != 0,
                date != as_date("2023-05-01")) |> 
       mutate(label = case_when(
         date == as_date("2022-12-12") ~ "week before Christmas",
         date == as_date("2023-03-10") ~ "heavy snowfall",
         date == as_date("2023-02-01") ~ "teachers' strike",
         date == as_date("2023-02-28") ~ "teachers' strike",
         date == as_date("2023-03-16") ~ "teachers' strike",
         date == as_date("2023-04-21") ~ "Eid al-Fitr",
         date == as_date("2023-06-21") ~ "study leave",
         date == as_date("2023-06-28") ~ "Eid al-Adha",
         date == as_date("2023-07-21") ~ "end of term",
         TRUE ~ NA_character_)),
       aes(x = date,
           y = 1 - percent_present,
           fill = 1 - percent_present,
           label = label
           )) +
  geom_col() +
  geom_text_repel(fontface = "italic", nudge_x = -1, size = 3, nudge_y = 0.02, colour = "gray40") +
  scale_y_continuous(labels = scales::percent) +
  theme(legend.position = "none", strip.background = eb, axis.title = eb,
        #axis.text.x = element_text(angle = 90), 
        axis.line = eb, axis.ticks = eb, strip.text = element_text(size = 12)) +
  scale_x_date(date_labels = "%d-%b") +
  scale_fill_viridis_c(option = "viridis", direction = -1) +
  facet_wrap(vars(half_term_name), scales = "free_x") +
  labs(title = "School absence in Sheffield Schools - a full academic year - 2022/23",
       subtitle = "each bar = 1 day; % of available sessions missed; all schools & all pupils")

Recreating the same plot for absences coded as illness (this time showing the count of sick days rather than the % of available sessions) shows a dramatic increase through the run up to Christmas, peaks on Fridays (and to a lesser extent Mondays) throughout the year, and a significantly lower rate in the summer.

NOTE - in this render the day level data is not available

Code
ggplot(attend_daily |> filter(year == 2023, time_category == "term time"),
       aes(x = date,
           y = illness,
           fill = illness)) +
  geom_col() +
  theme(legend.position = "none", strip.background = eb, axis.title = eb,
        axis.text.x = element_text(angle = 90)) +
  scale_x_date(date_labels = "%d-%b") +
  scale_fill_viridis_c(option= "mako",direction = -1) +
  facet_wrap(vars(half_term_name), scales = "free_x") +
  labs(title = "Daily illness in Sheffield Schools - 2022/23",
       subtitle = "Each bar = 1 day; count of sessions marked code I; all schools & all pupils")

The day level "no reason" plot shows a similar shape to the illness plot, which suggests that at least some of the "no reason" absences are explained by genuine sickness.

NOTE - in this render the day level data is not available

Code
ggplot(attend_daily |> filter(year == 2023, time_category == "term time"),
       aes(x = date,
           y = no_reason,
           fill = no_reason)) +
  geom_col() +
  theme(legend.position = "none", strip.background = eb, axis.title = eb,
        axis.text.x = element_text(angle = 90)) +
  scale_x_date(date_labels = "%d-%b") +
  scale_fill_viridis_c(option= "magma", direction = -1) +
  facet_wrap(vars(half_term_name), scales = "free_x") +
  labs(title = "Absence with no recorded reason in Sheffield Schools - 2022/23",
       subtitle = "Each bar = 1 day; count of sessions coded N or O; all schools & all pupils")
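
The "similar shape" of the illness and no reason plots is judged by eye in this report. One way it could be quantified, sketched here with made-up daily counts rather than the real data, is a rank correlation between the two daily series. The column names mirror those used in the plots above; the numbers are purely illustrative.

```r
# Toy daily counts (hypothetical), one row per school day
daily <- data.frame(
  illness   = c(120, 150, 180, 260, 310, 140, 90, 60),
  no_reason = c(40, 55, 60, 85, 100, 50, 30, 35)
)

# Spearman's rho compares the ordering of days, not the size of the counts,
# so it is not distorted by illness counts being several times larger
rho <- cor(daily$illness, daily$no_reason, method = "spearman")
print(rho)
```

A value close to 1 would support the reading that the two series rise and fall together, although it would not by itself show that the no reason absences are caused by sickness.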