Attendance in Sheffield Schools

Author

Giles Robinson

Published

February 10, 2025

1 Introduction

1.1 Background & scope

This work was undertaken by the Sheffield City Council Business Intelligence team from around September 2023. New analysis was carried out on available data with the aim of understanding school attendance in Sheffield and informing the requirements of the city’s response. This report summarises the findings of that analysis, along with commentary derived from discussions of those findings with colleagues in SCC, Learn Sheffield and Sheffield schools.

This report covers the following:

recent trends
benchmarking and comparisons
key drivers of absences
demographic differences:
age
gender
ethnicity
geography & deprivation
distance to school
young carers
severe absence (<50% attendance)
the absence patterns of annual year cohorts
day level data analysis - mapping out a school year

Within the same analysis but out of scope of this report are:

special educational needs (this is covered in depth in the SNA report)
the performance of individual schools
exclusions
the reach and effectiveness of existing teams, services and interventions

some terms & definitions

Unless otherwise stated, absence refers to both authorised and unauthorised absences. Correspondingly, attendance refers to registered time in the classroom. Absence in this report may include periods of study leave and approved offsite activity.

Unless otherwise stated, the word year refers to the academic or exam year. So 2023 refers to the period of schooling between September ’22 and July ’23.
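The year convention above can be expressed as a small helper. This is a minimal sketch in base R, not the report's actual processing code: the function name `academic_year` and the 1 September cut-off are assumptions made for illustration.

```r
# Map a calendar date to the academic (exam) year label used in this report.
# Assumption: the academic year rolls over on 1 September, so dates from
# September to December are labelled with the following calendar year.
academic_year <- function(date) {
  date <- as.Date(date)
  yr <- as.integer(format(date, "%Y"))
  mth <- as.integer(format(date, "%m"))
  ifelse(mth >= 9, yr + 1L, yr)
}

# September '22 and July '23 both fall in the 2023 year
academic_year(c("2022-09-05", "2023-07-21", "2023-09-04"))
```

Under this convention, the first two dates are labelled 2023 and the third 2024.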

1.2 Data sources & processing

Attendance, exclusion and school registration data and student details used in this report are from Capita One, retrieved from the OSCAR database, which is maintained by the Performance & Analysis Service (PAS). Supplementary information on school types and locations, geography & deprivation are held in spreadsheets.

An R script gathers, combines, processes and aggregates this data into a data model. That data model was last updated 16/8/24 to include the first release of the full year 2024 attendance data.

1.3 Release notes

1/7/24 - Giles Robinson. First complete draft for circulation.
16/8/24 - GR - updated with latest data, full 2024 academic year, various revisions, analysis of daily data; young carers
9/5/25 - GR - significant update with data now available up to Easter 2025.

2 Trends

This section covers recent changes in overall attendance, broken down by the major reasons covered by Department for Education (DfE) absence codes. We also discuss some codes that do not count as absences but contribute to the picture around attendance, such as late present.

2.1 Overall attendance

The COVID-19 pandemic and lockdowns saw a significant drop in attendance rates - although many of these trends began before the pandemic. Secondary age pupils were affected more than those in primary. In 2024 the gap continued to grow, with attendance improving in primary schools but worsening in secondary (though this was, in part, a result of the return of study leave as a coded reason, accounting for around 1% of secondary absences).

Important

At the time of writing (May 2025), there has been a significant shift into the current year. Primary school attendance continues to improve, while secondary has improved sharply on the previous year.

Our data prior to 2018 is less reliable and less complete, but taking a longer view suggests that at least some of the drivers of recent trends predate the pandemic. Attendance was improving to a peak in 2016, and was gradually dropping away from there, particularly in secondary schools. Things were getting worse before COVID, but the pandemic accelerated the decline sharply - despite recent improvements, attendance in secondary schools remains well below pre-pandemic levels.

2.2 Illness

The recorded data on illness shows an increase year on year. The big rise into 2022 particularly affected primary age children, and was probably a mixture of COVID-19 itself and post-lockdown viral bounce-back. Illness rates increased slightly into 2025.

Caution

Patterns in the day level data and feedback from head teachers suggest that we are probably not seeing the true picture on illness. Differences in reporting (and the honesty of parents), policy and recording may be as significant here as changes in actual illness.

Recommendation

The data on illness is worth monitoring in 2025 and considering in relation to schools’ recording policies, particularly at the day level, and in proximity to bank holidays and half term holidays. It’s possible that the new DfE rules around penalty notices and family holidays will create perverse incentives to increase reported illness rates, especially as we head into the 2025 summer term. It might be detectable in the day level data (see later in this report).

2.3 Lateness

Head teachers report that lateness, even if marked as late present, can have a significant impact on activities that are regularly done first thing in the day, such as phonics. Lateness can be recorded as late present or late absent, the latter meaning that the child attends only after the registers have closed. Both categories are on the rise, with late absence in primary schools in particular a growing problem. In secondary schools, late present is more common - and has been rising in recent years.

Late absent marks (after registers closed) make up a much smaller percentage of available sessions, but in secondary schools they are dramatically up in 2025 (note that this data does not yet include the summer term):

2.4 Family holidays

Absences for family holidays are higher in primary, but have risen in secondary also. The cost of living crisis likely plays a part here. Rates appear lower in 2024, but at the time of writing the summer term is missing from the 2024 data, which is excluded from the plot below. Family holidays can be authorised or unauthorised, but due to differences in recording and coding policy between schools, both are grouped together here.

New DfE guidance and harsher penalties around family holiday absences came into effect in August 2024, and the chart above appears to show a significant reduction, though this is mostly because the summer term data is not yet (as of May ’25) available. Breaking this down by term shows two things:
- in all previous years (COVID period aside, and the lockdown years have been removed here), the summer term has been the time when most holidays are taken
- so far in 2025, the impact of the new rules is apparent in the autumn and spring term data, though the impact appears small, and levels are still well above pre-COVID years. The real test of this policy will be the rates in the summer term.

2.5 Exclusions

Exclusions have risen very rapidly, particularly in secondary schools. This is mostly driven by temporary suspensions, largely as schools clamp down on what is classified as persistent disruptive behaviour. This makes only a small contribution to overall absence rates, but is growing, and for some children is a major contribution to their overall school absence. Exclusion rates in 2025 (part year at the time of writing) look to have levelled off.

2.6 Study leave

2024 saw the return of study leave as a coded absence reason, with a significant impact on overall attendance levels in secondary and particularly Y11. At the time of writing the 2025 data does not include the summer term and shows only minimal study leave.

2.7 No reason

The plot below comprises two DfE codes. Code N is intended as a placeholder until schools can establish a reason for absence, and code O is for unknown or other circumstances. Here both are grouped together - though the bulk is code O.

Important

The increase in no reason absences levelled off into 2024 and is down dramatically in 2025. Some of this is likely due to the changes to recording from September 2024. Even so, the no reason category remains the biggest contributor to overall absence rates in secondary, and the increase in no reason absences is the biggest contributor to the post-pandemic rise in absences. Furthermore, no reason absences are significantly more prevalent in more deprived areas of the city, where attendance in general is poorer.

We can draw two possible conclusions from this: parents and children are not reporting the true reasons for absence, or the DfE codes are no longer suitable for capturing those reasons. In either case, this represents a serious blind spot in the data.

Recommendation

Analysis of recorded case notes and text on Capita One, along with interviews or surveys of pupils, teachers, parents or community groups, may help to understand the stories behind these no reason absences.

Finally, the two charts below summarise the contributions of each of these coded absence reasons to the overall absence picture, during 2023-2024:

3 Demographics

Looking at how attendance varies with age, gender and ethnicity, and how this picture is changing over time.

3.1 Age

Absence is a little higher in Y1 and Y2 when children are very young, and is level through the rest of primary. The transition to secondary school is associated with a big increase in absence, which continues year on year up to Y11. As we’ll see later on, this transition drop into Y7 and the subsequent decline are more severe for groups with particular risk factors.

Note

The ImpactEd report Understanding Attendance - Report 1 identified an emerging trend of a jump in absence between Y7 and Y8. The Sheffield data does not support this, with the increase from Y7 to Y8 looking broadly the same - around a 1% increase in absence - as any other year on year increase within secondary years.

Looking at trends over time for primary school years, we see that the youngest and oldest primary age children were most affected. There are encouraging signs of recovery among all primary years into 2024, and particularly in Y1.

In secondary schools, we can see how disproportionately children in Y11 were affected, along with encouraging signs of recovery in years 7 and 9. It is worth noting that the children in years 10 and 11 in 2024 were those who had their crucial Y6 and Y7 transition years disrupted by the pandemic.

The drop off in Y11 is driven in part by study leave in 2024; this is yet to occur in the 2025 year.

These trends will be explored in more detail in the Trends by annual cohort section later in this report.

3.2 Gender

Looking at overall school attendance since 2021, girls attend slightly better than boys, a difference of about 0.5%.

The gender time series show boys and girls moving in lockstep through primary school, separated by about half a percentage point:

In secondary we see boys’ attendance overtaking girls in the aftermath of the pandemic, but all continuing to decline into 2024.

Looking at age, gender and deprivation together, we see the pattern reversed in older children. In poorer wards of the city, girls consistently attend better than boys across all ages. In the most affluent wards, this is reversed in older children, with a gender gap widening from Y8 onwards, where boys have higher attendance.

3.3 Ethnicity

The ethnic makeup of Sheffield’s population continues to change, and there are differences in attendance rates between children in different ethnic groups. Here we summarise the data around ethnicity.

Caution

The ethnic groups and subgroups used in this analysis are those available in the Capita One source data. These don’t necessarily align with the groupings used by ONS for census data, other organisations, or in other SCC data and reporting.

With the caveat that data prior to 2018 may not be wholly complete, the attendance data allows us to take a long term view of changes in the ethnic makeup of the Sheffield school population. Note that the free y-axis scales on the following chart mean that the lines are not directly comparable:

Pupils and attendance in Sheffield by ethnicity description
pupils on roll in 2023/24; data from School Census & Capita One attendance records
	Total			Primary			Secondary
	count	% of pupils	% absent 2023/24	count	% of pupils	% absent 2023/24	count	% of pupils	% absent 2023/24
all children	73154	100.0%	8.5%	40342	55.1%	6.1%	32821	44.9%	11.7%
White British	41229	56.4%	8.5%	22372	54.3%	5.6%	18859	45.7%	12.0%
Black African and White/Black African	6223	8.5%	5.1%	3616	58.1%	4.1%	2607	41.9%	6.5%
Pakistani	5522	7.5%	8.2%	3133	56.7%	7.1%	2390	43.3%	9.7%
Any Other Ethnic Group	3144	4.3%	8.4%	1767	56.2%	6.9%	1379	43.8%	10.3%
Any Other White Background	2763	3.8%	9.3%	1546	56.0%	7.1%	1217	44.0%	12.1%
White/Black Caribbean	1971	2.7%	12.6%	1098	55.7%	8.7%	874	44.3%	17.9%
Other Asian Background	1863	2.5%	7.2%	1089	58.5%	6.2%	774	41.5%	8.7%
Gypsy, Roma and Traveller of Irish Heritage	1696	2.3%	21.2%	881	51.9%	16.0%	817	48.1%	27.0%
White/Asian	1679	2.3%	8.5%	958	57.0%	6.4%	722	43.0%	11.6%
Any Other Mixed	1623	2.2%	9.1%	934	57.5%	6.7%	689	42.5%	12.5%
not known	1443	2.0%	12.4%	607	42.1%	7.3%	836	57.9%	16.2%
Indian	1278	1.7%	6.1%	866	67.8%	5.8%	412	32.2%	6.7%
Bangladeshi	830	1.1%	8.1%	476	57.3%	7.1%	354	42.7%	9.6%
Any Other Black Background	773	1.1%	6.2%	450	58.2%	5.1%	323	41.8%	7.8%
Chinese	647	0.9%	4.1%	329	50.9%	3.5%	318	49.1%	4.7%
Black Caribbean	367	0.5%	9.1%	169	46.0%	5.8%	198	54.0%	12.1%
Irish	103	0.1%	8.3%	51	49.5%	4.8%	52	50.5%	12.0%

4 Geography & deprivation

There are many ways to divide up the city geographically, but we’ll look at the 28 wards, and in particular their deprivation as measured in the 2019 Indices of Multiple Deprivation (IMD) scores. More recent (and older) measures of deprivation may be available, but the analysis is broadly the same.

4.1 Attendance by ward

The table below shows overall attendance by ward of residence during 2023-24.

Pupils in Sheffield, by ward of residence
pupils on roll & attendance in 2023/24; data from School Census & Capita One attendance records
	Total			Primary			Secondary
	count	% of children	% absent 2023/24	count	% of children	% absent 2023/24	count	% of children	% absent 2023/24
Sheffield	73154	100.0%	8.5%	40342	55.1%	6.1%	32821	44.9%	11.7%
Burngreave	5720	7.6%	10.5%	3052	53.3%	8.1%	2672	46.7%	13.3%
Firth Park	4217	5.6%	9.5%	2369	56.2%	6.9%	1848	43.8%	13.0%
Darnall	4020	5.3%	10.2%	2368	58.9%	7.7%	1652	41.1%	14.2%
Manor Castle	3868	5.1%	9.5%	2188	56.6%	6.4%	1680	43.4%	13.8%
Shiregreen & Brightside	3656	4.8%	9.8%	1984	54.3%	6.7%	1673	45.7%	13.5%
Southey	3560	4.7%	10.7%	1975	55.5%	7.7%	1586	44.5%	14.4%
Ecclesall	3205	4.2%	4.7%	1727	53.9%	3.4%	1479	46.1%	6.3%
Gleadless Valley	3187	4.2%	9.7%	1787	56.1%	7.6%	1400	43.9%	12.6%
Nether Edge & Sharrow	2876	3.8%	6.6%	1603	55.7%	5.1%	1273	44.3%	8.5%
Park & Arbourthorne	2750	3.6%	9.9%	1555	56.5%	7.1%	1195	43.5%	14.4%
Beauchief & Greenhill	2702	3.6%	9.0%	1532	56.7%	6.8%	1171	43.3%	12.0%
Richmond	2553	3.4%	9.2%	1430	56.0%	6.5%	1124	44.0%	12.7%
Dore & Totley	2419	3.2%	5.3%	1340	55.4%	3.6%	1079	44.6%	7.3%
Woodhouse	2336	3.1%	9.9%	1320	56.5%	6.7%	1016	43.5%	14.0%
Hillsborough	2310	3.1%	7.8%	1275	55.2%	5.1%	1035	44.8%	11.6%
Stannington	2280	3.0%	6.8%	1234	54.1%	4.2%	1046	45.9%	10.0%
Walkley	2224	2.9%	7.3%	1366	61.4%	5.3%	858	38.6%	10.6%
West Ecclesfield	2168	2.9%	8.5%	1142	52.7%	5.5%	1026	47.3%	11.6%
Stocksbridge & Upper Don	2160	2.9%	8.8%	1134	52.5%	5.3%	1026	47.5%	12.7%
Birley	2041	2.7%	9.6%	1101	53.9%	6.3%	940	46.1%	13.6%
Graves Park	1988	2.6%	6.0%	1097	55.2%	4.3%	891	44.8%	8.1%
East Ecclesfield	1975	2.6%	7.5%	1032	52.3%	5.5%	943	47.7%	9.6%
Beighton	1913	2.5%	8.1%	1056	55.2%	6.2%	857	44.8%	10.7%
Crookes & Crosspool	1892	2.5%	6.1%	972	51.4%	3.9%	920	48.6%	8.4%
Fulwood	1733	2.3%	5.6%	933	53.8%	3.6%	800	46.2%	8.1%
Mosborough	1724	2.3%	8.1%	979	56.8%	6.1%	745	43.2%	11.1%
Broomhill & Sharrow Vale	1459	1.9%	6.1%	877	60.1%	4.9%	582	39.9%	8.1%
City	705	0.9%	9.2%	464	65.8%	8.2%	241	34.2%	11.4%

4.2 Economic deprivation

These ward level attendance figures line up neatly with deprivation indicators. Plotting attendance against the 2019 Indices of Multiple Deprivation (IMD) scores shows a tight correlation.

Caution

Since school attendance figures are one of the input variables to the IMD scores, there is some circular logic at work here. Even so, attendance is only one of 39 inputs, so this analysis is worth pursuing.

The link to deprivation has always been there but is stronger today - recreating the chart above with 2010 attendance and IMD scores shows a weaker relationship.

The link to deprivation is less evident in primary schools, but stronger in secondary schools, and the gap between primary and secondary attendance widens in poorer areas of the city.

This longer term view below compares the trend in attendance between the top and bottom quartiles of the ward level deprivation scores, at the half-term level with a trend-line. The middle two quartiles are excluded from this plot. The gap between the most and least deprived areas narrowed towards the peak attendance rate in 2016, so gains were disproportionately made in poorer areas, but the most deprived quartile then falls away more rapidly since the pandemic.
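The quartile comparison described above, and the attendance gap between the two extreme groups, could be derived along the following lines. This is a sketch only: the ward names, IMD scores and attendance figures are invented, quartiles are taken over wards rather than pupils, and the unweighted mean of ward rates is a simplification of whatever weighting the report's data model applies.

```r
# Hypothetical ward-level data: IMD score (higher = more deprived) and
# attendance rate (%). All values are invented for illustration.
wards <- data.frame(
  ward = paste0("ward_", 1:8),
  imd_score = c(8, 12, 15, 21, 28, 35, 42, 50),
  attendance = c(95.5, 95.0, 94.2, 93.1, 92.0, 91.4, 90.3, 89.8)
)

# Split wards into four groups at the quartiles of the IMD score.
breaks <- quantile(wards$imd_score, probs = seq(0, 1, by = 0.25))
wards$imd_quartile <- cut(
  wards$imd_score,
  breaks = breaks,
  include.lowest = TRUE,
  labels = c("least", "q2", "q3", "most")
)

# Compare only the least and most deprived quartiles (middle two dropped).
least <- mean(wards$attendance[wards$imd_quartile == "least"])
most <- mean(wards$attendance[wards$imd_quartile == "most"])

# Attendance gap in percentage points between the two extremes.
gap <- least - most
gap
```

Repeating the calculation per year (or per half term) and plotting `gap` over time would give the change in the deprivation gap.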

Finally, since it’s not so easy to read from the above charts, we can look at the change in the difference in attendance between the most and least deprived quartiles of the city. Plotting this reveals that although attendance is increasing in both primary & secondary, and across all levels of deprivation, the gap between the most and least deprived quartiles of the city is reducing in primary schools, but continues to grow in secondary:

The age profile by deprivation quartile shows how children in poorer areas have a steeper drop off through secondary school. Children in the most affluent 25% of wards attend better across all years, but show a more significant dropoff into Y11. Could study leave be a factor here?

4.3 Free School Meals

Free School Meal (FSM) status is perhaps a better indicator of socio-economic status of children than ward of residence, since it is means tested at the family level.

Pupils in Sheffield, by free school meal status
count of pupils on roll in 2023/24; data from School Census & Capita One attendance records
	Primary			Secondary
	count	% of children	avg % absent (2023)	count	% of children	avg % absent (2023)
no FSM (0)	26477	65.6%	4.6%	21639	65.9%	8.4%
FSM (1)	13865	34.4%	8.9%	11182	34.1%	17.9%
total	40342	55.1%	6.1%	32821	44.9%	11.7%

More concerning are the exclusion rates for children with Free School Meals, which are rapidly diverging from those without.

4.4 Distance to school

We used the postcodes of each child’s home address and school location to calculate a measure of straight-line distance between the two.
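A straight-line distance of this kind is commonly computed with the haversine formula once each postcode has been converted to a longitude and latitude. The sketch below is base R and is not taken from the report's code; the function name `haversine_m`, the mean Earth radius and the example coordinates are assumptions. The report's setup chunk loads the geosphere package, whose `distHaversine()` function offers an equivalent calculation, so that may be what was actually used.

```r
# Great-circle ("straight-line") distance in metres between two points
# given in decimal degrees, using the haversine formula.
# Assumption: a spherical Earth with mean radius 6,371 km.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) fractionally above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical home and school coordinates (not real addresses)
home <- c(lon = -1.470, lat = 53.380)
school <- c(lon = -1.490, lat = 53.395)
haversine_m(home["lon"], home["lat"], school["lon"], school["lat"])
```

The function is vectorised, so it can be applied to whole columns of pupil and school coordinates at once.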

Attendance is significantly better, on average, for children who live closer to school. Children living very close to school (<100m) attend about 1.5% better on average in primary. For secondary schools this difference is 2.3%.

Plotting the average distance travelled against average attendance rates for secondary schools reveals four groupings:

On the right are two specialist facilities (UTC Sheffield & UTC Sheffield Olympic Legacy Park) and two Catholic schools (All Saints and Notre Dame). All of these may incentivise pupils to travel further than normal.
The main bunch of schools in the middle seems to show a linear relationship between distance and attendance. This relationship is weak, relies on us discarding the outliers (more on these below), and may not be causal.
Outlying this group above, Mercia, Tapton and High Storrs schools are all in affluent areas of the city, and show higher attendance at an average distance travelled.
Below this group, Chaucer school shows an average distance travelled and below average attendance, though, as we’ll see below, the average distance travelled disguises some significant differences.

Plotting the distance travelled against attendance at the child level reveals further differences. In the plot below we take one example from each of the four groups described above.

We can think of dividing these plots into four quadrants:

Notre Dame High has good attendance across the board, which varies regardless of the distance travelled. Mercia has excellent attendance, and a limited distance travelled, presumably due to its oversubscription and high demand, with most datapoints appearing in the top left. The trend line points slightly down, as a few children who live further away have lower attendance.
Meadowhead has typical average values for both attendance and distance, appearing in the middle of the pack in the plot above. Most children attend well and those with poorer attendance generally live close by - there are few in the bottom right. Chaucer by contrast has a small but significant number of points in the bottom right quadrant - those who attend very poorly and live far away. Some of this may be explained by families failing to secure a place at closer schools, and being placed across the city, with the distance then contributing to poor attendance.

5 Young carers

It is difficult to establish the true number of young carers in the city, and any estimate depends on definitions & methods. A 2023 report from the all party parliamentary group (APPG) for young carers and adult carers cites several sources:

1.6% of pupils (2021 Census)
0.5% of pupils (2023 school census)

Though it places little confidence in these first two, preferring the estimates of two surveys:

10% of all pupils provide high or very high levels of care (BBC / University of Nottingham)
13% of pupils surveyed (COVID Social Mobility & Opportunities study)

Applying the 10% figure to Sheffield’s pupil population would indicate over 7000 young carers in the city. Our local data identifies just 904 since 2020, so we provide the analysis here with the following caveat:

data on young carers

The data used in this section of the report comes from young carer type involvements in Capita One, covering around 900 children from 2020 onwards. Clearly our data doesn’t capture all young carers (and may skew towards those at the more severe end of the caring spectrum) and/or we are working with different definitions of what a young carer is. Issues with getting people of all ages to self-identify as carers are well known, and the perceived stigma attached to caring roles is likely more acute in young people - indeed this is probably a factor in explaining differences in school attendance.

The involvements have an open date, but no close date, so a time series analysis of volumes isn’t possible. It also means that the data implicitly assumes that a young carer remains so for the rest of their school career.
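To give a sense of scale for the gap between the survey-based estimate and the local records, the arithmetic can be laid out as below. The figures are taken from this report; comparing a cumulative count since 2020 with the 2023/24 roll is a rough approximation rather than a like-for-like measure.

```r
pupils_on_roll <- 73154      # all pupils on roll in 2023/24 (tables in this report)
survey_prevalence <- 0.10    # BBC / University of Nottingham survey estimate
recorded_carers <- 904       # young carers identified locally since 2020

# Number of young carers implied by the survey estimate
expected_carers <- pupils_on_roll * survey_prevalence

# Locally recorded young carers as a share of all pupils
recorded_share <- recorded_carers / pupils_on_roll

# Recorded young carers as a share of the survey-implied number
coverage <- recorded_carers / expected_carers

round(expected_carers)          # 7315
round(100 * recorded_share, 1)  # 1.2
round(100 * coverage)           # 12
```

On these figures the local records cover roughly one in eight of the young carers the survey estimate would imply.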

A descriptive or demographic analysis may also be misleading, but we can make a comparison of attendance rates, which shows a significant impact. Primary age young carers attend just under 4% less than those without a caring role. In secondary school this gap rises to 10%:

As we did for deprivation quartiles above, we can create an age profile of attendance for young carers, and compare it to pupils with no caring role. Again we see a greater impact on attendance as age increases, and presumably as the expectations and stigmatisation around caring roles also increase. There is a particular drop in attendance going into year 8.

Along with other groups, the attendance of young carers improved into 2025.

Note that some of the decline seen here may be a function of the cumulative nature of the data, which has no end dates attached, so our cohort of young carers is ageing in the system.

Recommendation

Better long term data is required to understand volumes, impacts & the geographical distribution of young carers, as well as change over time and the provision of services to young carers.

6 Trends by annual cohort

Here we show how attendance is changing for each annual year group cohort of children, and explore some of the intersectionality between age, deprivation and special educational needs. This analysis particularly demonstrates differences in how the COVID pandemic, lockdowns and subsequent societal shifts have affected different groups.

Annual cohorts of children are referred to here as, for example, the “class of 2025” meaning the year group who began year 1 in September 2014 and will complete Y11 in July 2025. In each case there is a separate small line chart for each annual cohort. Data are labelled with the academic year and the % attendance rates, and the time period is divided into three phases: pre pandemic, during (2020 & 2021), and post pandemic - all years since. The time periods are denoted by colours or shapes, depending on the chart.
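The cohort label and the three pandemic phases described above could be derived from the academic year and national curriculum year as sketched below. The function names are hypothetical, and the sketch assumes every child progresses one year group per year, so it ignores pupils educated outside their chronological year group.

```r
# "Class of" label: the academic year in which the cohort completes Y11.
# academic_year is the exam-year label (2015 = Sept 2014 to July 2015),
# ncy is the national curriculum year (1 to 11).
class_of <- function(academic_year, ncy) {
  academic_year + (11L - ncy)
}

# Phase of the time series relative to the pandemic, as defined above:
# during = 2020 and 2021, pre = earlier years, post = all years since.
pandemic_phase <- function(academic_year) {
  ifelse(academic_year < 2020, "pre",
         ifelse(academic_year <= 2021, "during", "post"))
}

# Children in Y1 during the 2015 academic year (starting September 2014)
class_of(2015L, 1L)
pandemic_phase(c(2019, 2020, 2021, 2022))
```

The first call returns 2025, matching the "class of 2025" example in the text.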

The first chart shows the overall picture in secondary schools. The first cohort shown here is the class of 2020, who completed most of Y11 before the pandemic struck. Their GCSE exams were wildly disrupted, but their attendance follows only a shallow decline from Y7 through to Y11. By contrast, the classes of ’23 to ’25 (on the middle row) saw dramatic drops during the COVID years, and a continued decline in the period since. The classes of ’24 and ’25 were perhaps worst hit by the pandemic, effectively missing Y6-7 and Y7-8 respectively. Finally, the bottom row shows the latest three cohorts and some small but encouraging signs of recovery: the class of ’27 have less of a drop off to Y8, and the class of ’28 had the best attendance in Y7 since before the pandemic.

The picture in primary schools looks very different. Children generally attend better in years 2 to 4 than they do in Y1, so the underlying profile is more of a hump than the steady decline seen in secondary. The pandemic had a less dramatic effect on primary age children, though the decline also persisted into the post-pandemic years for many cohorts. However, the big difference here, and an encouraging sign for the future, is that all cohorts from the class of ’29 onwards show improvements in recent years (here coloured blue), and that the youngest cohorts are showing the fastest improvements of all.

Re-creating the same plot but split by deprivation quartile, it becomes clear how the effects of the pandemic were concentrated in the more deprived areas of the city. Here the middle two quartiles of deprivation have been removed, and the pairs of lines show the most and least deprived quartiles of the school population, according to the 2019 indices of multiple deprivation scores of their ward of residence.

For all annual cohorts, the gap is stark: children living in more deprived areas were worse affected during the pandemic and have seen worse post-pandemic declines in attendance. If there is good news here, it is a narrowing of the gap in the latest Y7 intake.

Repeating the same deprivation analysis for primary, we again see how the pandemic disproportionately affected children in more deprived areas, with steeper dropoffs during the lockdown years. But we can also see recovery after the pandemic, for all cohorts and with steeper rates of increase for children in more deprived areas - though the deprivation gap still remains.

7 Severe absences

Children are classed as severely absent if they miss over 50% of available sessions in any given period. This section explores the characteristics of severely absent children, and how this is changing over time.
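As a minimal illustration of that definition, the flag can be computed from session counts per pupil. The column names and values below are invented and do not reflect the structure of the report's data model.

```r
# Hypothetical per-pupil session counts for one academic year
pupils <- data.frame(
  pupil_id = c("a", "b", "c", "d"),
  sessions_possible = c(380, 380, 200, 380),
  sessions_absent = c(20, 190, 101, 250)
)

# Share of available sessions missed
pupils$absence_rate <- pupils$sessions_absent / pupils$sessions_possible

# Severely absent = missed OVER 50% of available sessions,
# so a pupil at exactly 50% (pupil "b") is not flagged
pupils$severely_absent <- pupils$absence_rate > 0.5

# Proportion of pupils flagged as severely absent
mean(pupils$severely_absent)
```

Pupil "c" shows why the rate is taken against the sessions available to that pupil: a child on roll for only part of the year can still be flagged.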

Important

Almost 1 in 20 children at Sheffield secondary schools was severely absent in 2023.

Severe absences in secondary schools appear to have peaked in 2024.

Next we look at the severe absence rates of groups with different characteristics in 2023-24. The groupings here are chosen as those that show significant differences in severe absence rates. Note that the characteristics given here are not mutually exclusive. Children with an EHCP were nearly 8% more likely to be severely absent than average. Children in Y11 have twice the average rate.

All primary years, and a few ethnic groups have significantly lower severe absence rates.

The chart above shows relative severe absence rates of different groups, but we’ll complement that by quantifying the cohort of severely absent pupils in 2023 by their characteristics.

7.1 Severe absence - turnover and retention

It seems likely that there are children for whom severe absence is for some reason a persistent behaviour, and children for whom a severe absence happens in one or more years for some specific reason - like a crisis of health or personal circumstances. To try to understand this, we looked at year on year turnover and retention in the cohort of severely absent children. In the chart below, severely absent children are classed as retained if they were also severely absent the year before, and new if not. Both categories have risen in recent years:

So the problem of severe absence is, in part, due to a cohort we could describe as chronically severely absent.

The retention rate here is calculated as the percentage of all severely absent pupils in a given year who were also severely absent the year before. In secondary schools, around 40% of children who were severely absent in 2023 were also severely absent in 2022.
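The calculation can be sketched as follows. The table of severely absent pupils is invented, and the function name `retention_rate` is hypothetical rather than taken from the report's code.

```r
# Hypothetical table: one row per pupil per year in which they were
# severely absent
severe <- data.frame(
  pupil_id = c("a", "b", "c", "b", "c", "d", "e", "f"),
  year = c(2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023)
)

# Share of pupils severely absent in `year` who were also severely
# absent in the previous year ("retained"); the remainder are "new"
retention_rate <- function(severe, year) {
  current <- unique(severe$pupil_id[severe$year == year])
  previous <- unique(severe$pupil_id[severe$year == year - 1])
  if (length(current) == 0) {
    return(NA_real_)
  }
  mean(current %in% previous)
}

retention_rate(severe, 2023)  # 0.4: pupils b and c out of b, c, d, e, f
```

Note that the denominator is the current year's severely absent cohort, not the previous year's, so the measure describes how much of this year's problem is carried over.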

This retention rate has risen in recent years:

Plotting the retention rate by national curriculum year (NCY) shows increased year on year retention as children grow older. Here we’ve included the NCY profiles of two years, 2018 and 2024, showing the increased retention rates across the board into 2024.

8 Daily attendance patterns

The analysis so far in this report has used data aggregated up to the half term or annual level. During the course of this project we processed the raw daily data (recorded as a string of symbols and codes) to allow analysis of attendance at the level of the individual day.
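The general idea of unpacking such a string can be sketched as below. The layout is an assumption made for illustration (one character per session, two sessions per school day, no gaps for weekends or holidays), not the actual Capita One format. The codes shown are DfE register codes: `/` and `\` for present in the morning and afternoon, `L` for late before registers closed, `I` for illness and `O` for unauthorised absence.

```r
# Hypothetical register string for one pupil over five school days.
# In R source, "\\" is a single backslash character.
marks <- "/\\/\\IIL\\O\\"
start_date <- as.Date("2024-01-08")  # assumed first school day (a Monday)

# One row per session: split the string into single characters
codes <- strsplit(marks, "")[[1]]
n_days <- length(codes) / 2
sessions <- data.frame(
  date = rep(start_date + seq_len(n_days) - 1, each = 2),
  session = rep(c("AM", "PM"), times = n_days),
  code = codes
)

# Simplification: only these three codes are treated as attending.
# Real data has further codes (e.g. approved educational activity,
# planned closures) that need their own handling.
present_codes <- c("/", "\\", "L")
sessions$present <- sessions$code %in% present_codes

# Share of sessions attended on each day
daily <- tapply(sessions$present, format(sessions$date), mean)
daily
```

Once the data is in this one-row-per-session shape, any of the day level summaries in this section can be produced by grouping on the date or on the code.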

8.1 Week day

Fridays (and to a lesser extent Mondays) see significantly lower attendance than the other days of the week.
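A weekday comparison of this kind can be produced from day level rates as sketched below. The dates and attendance figures are invented, and the simple mean of daily rates is an assumption; weighting by the number of sessions available each day would be more exact.

```r
# Hypothetical daily attendance rates (%) for two school weeks
dates <- as.Date("2024-01-08") + c(0:4, 7:11)  # Mon-Fri, twice
attendance <- c(93, 94, 94, 93, 90, 92, 94, 95, 93, 91)

# as.POSIXlt()$wday gives 0 = Sunday, 1 = Monday, ..., 6 = Saturday
day_names <- c("Mon", "Tue", "Wed", "Thu", "Fri")
weekday <- factor(day_names[as.POSIXlt(dates)$wday], levels = day_names)

# Average attendance for each day of the week
by_day <- tapply(attendance, weekday, mean)
by_day

# Day of the week with the lowest average attendance
names(which.min(by_day))
```

Using the numeric weekday rather than `weekdays()` keeps the result independent of the locale the script runs in.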

Looking at a time series, we see that Friday’s lower attendance is nothing new, and the gap has not really changed over time:

8.2 School attendance across the year

The day level data allows us to visualise an entire school year. Here we see how key points in the year and particular dates impact on school attendance. When the data are aggregated to the term level, there is very little seasonal variation, but differences at the day level are more dramatic than the differences we see between demographic groups.

In particular, we can see the impacts of:

the first and last days of term
growing absence rates in the run-up to Christmas
a wave of teachers’ strikes
heavy snowfall in March
Eid
the days immediately after bank holidays
study leave
increasing absence through the final summer term

Here is the same chart for the 2023-24 year:

Recreating the same plot for absences coded as illness (though this time showing the count of sick days rather than the % of available sessions) shows rates increasing dramatically through the run-up to Christmas, peaking on Fridays (and to a lesser extent Mondays) throughout the year, and falling to a significantly lower level in the summer. There are also spikes in illness on the last day of each half term (except the summer). This is the plot for 2024 but the pattern is very similar in other years.

The day level no reason plot shows a similar shape to the illness plot. We could read this as suggesting that at least some of the no reason absences are explained by genuine sickness, although the major spikes on the last days of term may be due to unrecorded family holidays or other absences.

It’s worth comparing the 2023 and 2024 plots for no reason absences. As well as reduced levels of no reason absences throughout the year, 2024 sees much less seasonal variation - such as the steady build up to Christmas - although the end of term spikes are more pronounced.

9 Conclusion

School attendance is affected by a multitude of factors: age, economic deprivation, special educational needs, caring responsibilities, the culture of individual schools, the attitude of families and ultimately the children themselves. Factors associated with lower attendance are intersectional and compound each other.

The pandemic dominates the recent history of school attendance (and much else besides). COVID-19 lockdowns, social distancing and school closures were all surely transformative in cultural attitudes to school attendance, and the impacts were felt differently in different places. However, it would be a mistake to place too much emphasis on COVID-19 alone. Deprivation and the cost of living, the rise of smartphones and social media, and changes around special educational needs (both prevalence and attitudes) are all surely factors, many of which will have influenced one another. Much of this is not recorded in the available data, and the interactions between these forces will be complex.

The good news is that despite the widespread risk factors identified here and despite recent social and cultural shifts, school attendance is recovering. Encouragingly, this recovery is strongest among the youngest cohorts of children. Recent changes to recording and the rules appear to be having an impact, but most inequalities persist, and some continue to widen. The coming years will tell if school attendance can recover to levels seen before the pandemic, and if the most vulnerable children can be helped to attend school as well as their peers.

This report is one of several produced under the inclusion & attendance data science project. There are also dedicated reports on Special Educational Needs (strategic needs analysis), the impact and effectiveness of services & interventions, and attendance by early years foundation stage attainment. Links to these are at the top of the SCC Data Science site.

If you have further questions about the data, analysis and narrative in this report please contact the Sheffield City Council Performance & Insight Team, or email giles.robinson@sheffield.gov.uk

--- title: "Attendance in Sheffield Schools" author: "Giles Robinson" date: 2025-02-10 editor: visual format: html: code-tools: true code-fold: true toc: true toc-location: left toc-depth: 4 number-sections: true number-depth: 4 fig-cap-location: top other-links: - text: Back to SCC Data Science site home href: https://scc-data-science.sheffield.gov.uk/ execute: warning: false message: false echo: false knitr: opts_chunk: out.width: "100%" --- ```{r} #| label: setup # clear the environment remove(list = ls()) # load packages library(tidyverse) library(janitor) library(lubridate) library(ggtext) library(ggrepel) library(gghighlight) library(kableExtra) library(MetBrewer) library(corrplot) library(ggcorrplot) #library(shadowtext) library(readxl) library(ggstatsplot) library(geosphere) library(ggridges) library(forecast) library(tsibble) library(gt) library(waffle) library(treemapify) library(waterfalls) # specify data folder data_folder <- str_c("S:/Public Health/Policy Performance Communications/Business Intelligence/Projects/EIP/data/inclusion/") # copy to excel function copy_excel <- function(input) {write.table(input, file = "clipboard-20000", sep = "\t", row.names = F)} # ggplot themes eb <- element_blank() # Set default ggplot theme theme_set( theme_classic() + theme( #plot.title = element_text(), plot.subtitle = element_text(size = 9, face = "italic"), plot.caption = element_text(size = 8, face = "italic"), plot.title.position = "plot", plot.title = element_markdown(size = 12), strip.background = eb ) ) # theme for minimal bar charts barplottheme_minimal <- theme( axis.title.y = eb, axis.line.y = eb, axis.ticks.y = eb, axis.line.x = eb, axis.ticks.x = eb ) gannt_theme <- theme_classic() + theme( plot.title = element_text(size = 12), plot.subtitle = element_text(size = 8, face = "italic"), plot.caption = element_text(size = 8, face = "italic"), plot.title.position = "plot", axis.title = eb, axis.line.y = eb, axis.ticks.y = eb, axis.text.y = eb, legend.position 
= "right", legend.title = eb, legend.text = element_text(size = 8) ) # Connect to OSCAR database via ODBC #oscar_con <- DBI::dbConnect( # odbc::odbc(), # Driver = "Oracle in OraClient12Home1", # Dbq = "SCPRFLVE", # UID = if (Sys.getenv("oscar_userid") == "") { # rstudioapi::askForPassword("OSCAR User ID") # } else { # Sys.getenv("oscar_userid") # }, # PWD = if (Sys.getenv("oscar_pwd") == "") { # rstudioapi::askForPassword("OSCAR Password") # } else { # Sys.getenv("oscar_pwd") # }, # timeout = 10 #) # Connect to LCS database via ODBC #lcs_con <- DBI::dbConnect( # odbc::odbc(), # Driver = "SQL Server Native Client 11.0", # Server = "shef-biprd-01.syhapp.com, 1438", # Database = "HDM_Local", # UID = if (Sys.getenv("lcs_userid") == "") { # rstudioapi::askForPassword("LCS User ID") # } else { # Sys.getenv("lcs_userid") # }, # PWD = if (Sys.getenv("lcs_pwd") == "") { # rstudioapi::askForPassword("LCS Password") # } else { # Sys.getenv("lcs_pwd") # }, # timeout = 10 #) ## connect to LAS database via ODBC #las_con <- DBI::dbConnect( # odbc::odbc(), # #dsn = "LAS", # Driver = "SQL Server Native Client 11.0", # Server = "shef-biprd-01.syhapp.com, 1436", # Database = "HDM", # UID = if (Sys.getenv("las_userid") == "") { # rstudioapi::askForPassword("LAS User ID") # } else { # Sys.getenv("las_userid") # }, # PWD = if (Sys.getenv("las_pwd") == "") { # rstudioapi::askForPassword("LAS Password") # } else { # Sys.getenv("las_pwd") # }, # timeout = 10 #) # summarising attendance function # this is copied from the attendance & exclusion data model. # any changes made there should be reflected here & vice versa # note that the groupings appear TWICE in this function, once for grouped data and once for the "no grouping" scenario (grouping_vars = "none"). Any changes must be consistent across both. 
summarise_attendance <- function(input_data, grouping_vars) { ifelse (grouping_vars == "none", { # Aggregate without grouping result <- input_data |> mutate(zero_attendance = if_else(present == 0, 1, 0)) |> summarise(child_count = n_distinct(stud_id, na.rm = TRUE), row_count = n(), possible_sessions = sum(possible_sessions, na.rm = TRUE), present = sum(present, na.rm = TRUE), authorised = sum(authorised, na.rm = TRUE), unauthorised = sum(unauthorised, na.rm = TRUE), missing = sum(missing, na.rm = TRUE), excluded = sum(excluded, na.rm = TRUE), family_holiday_agreed = sum(family_holiday_agreed, na.rm = TRUE), family_holiday_not_agreed = sum(family_holiday_not_agreed, na.rm = TRUE), family_holiday_total = sum(family_holiday_total, na.rm = TRUE), illness = sum(illness, na.rm = TRUE), med_appt = sum(med_appt, na.rm = TRUE), no_reason = sum(no_reason, na.rm = TRUE), late_absent = sum(late_absent, na.rm = TRUE), late_pres = sum(late_pres, na.rm = TRUE), late_total = sum(late_absent, na.rm = TRUE) + sum(late_pres, na.rm = TRUE), study_leave = sum(study_leave, na.rm = TRUE), approved_offsite = sum(approved_offsite, na.rm = TRUE), fixed_exclusions = sum(fixed_exclusions, na.rm = TRUE), perm_exclusions = sum(perm_exclusions, na.rm = TRUE), total_exclusions = sum(total_exclusions, na.rm = TRUE), persistent_absent_count = sum(persistent_absence, na.rm = TRUE), severe_absent_count = sum(severe_absence, na.rm = TRUE), zero_attendance_count = sum(zero_attendance, na.rm = TRUE) ) |> mutate(percent_of_pupils = child_count / sum(child_count, na.rm = TRUE), percent_present = present / possible_sessions, percent_auth_absence = authorised / possible_sessions, percent_unauth_absence = unauthorised / possible_sessions, percent_missing = missing / possible_sessions, percent_family_holiday_agreed = family_holiday_agreed / possible_sessions, percent_family_holiday_not_agreed = family_holiday_not_agreed / possible_sessions, percent_family_holiday = family_holiday_total / possible_sessions, 
percent_excluded = excluded / possible_sessions, percent_illness = illness / possible_sessions, percent_med_appt = med_appt / possible_sessions, percent_no_reason = no_reason / possible_sessions, percent_late_absent = late_absent / possible_sessions, percent_late_pres = late_pres / possible_sessions, percent_late_total = late_total / possible_sessions, percent_study_leave = study_leave / possible_sessions, percent_approved_offsite = approved_offsite / possible_sessions, pc_of_pupils_persistent_absent = persistent_absent_count / row_count, pc_of_pupils_severely_absent = severe_absent_count / row_count, pc_of_pupils_zero_attendance = zero_attendance_count / row_count ) |> mutate(percent_absent = 1 - percent_present) }, { # Group by specified variables and then summarize result <- input_data |> mutate(zero_attendance = if_else(present == 0, 1, 0)) |> group_by(across(all_of(grouping_vars))) |> summarise(child_count = n_distinct(stud_id, na.rm = TRUE), row_count = n(), possible_sessions = sum(possible_sessions, na.rm = TRUE), present = sum(present, na.rm = TRUE), authorised = sum(authorised, na.rm = TRUE), unauthorised = sum(unauthorised, na.rm = TRUE), missing = sum(missing, na.rm = TRUE), excluded = sum(excluded, na.rm = TRUE), family_holiday_agreed = sum(family_holiday_agreed, na.rm = TRUE), family_holiday_not_agreed = sum(family_holiday_not_agreed, na.rm = TRUE), family_holiday_total = sum(family_holiday_total, na.rm = TRUE), illness = sum(illness, na.rm = TRUE), med_appt = sum(med_appt, na.rm = TRUE), no_reason = sum(no_reason, na.rm = TRUE), late_absent = sum(late_absent, na.rm = TRUE), late_pres = sum(late_pres, na.rm = TRUE), late_total = sum(late_absent, na.rm = TRUE) + sum(late_pres, na.rm = TRUE), study_leave = sum(study_leave, na.rm = TRUE), approved_offsite = sum(approved_offsite, na.rm = TRUE), fixed_exclusions = sum(fixed_exclusions, na.rm = TRUE), perm_exclusions = sum(perm_exclusions, na.rm = TRUE), total_exclusions = sum(total_exclusions, na.rm = 
TRUE), persistent_absent_count = sum(persistent_absence, na.rm = TRUE), severe_absent_count = sum(severe_absence, na.rm = TRUE), zero_attendance_count = sum(zero_attendance, na.rm = TRUE) ) |> mutate(percent_of_pupils = child_count / sum(child_count, na.rm = TRUE), percent_present = present / possible_sessions, percent_auth_absence = authorised / possible_sessions, percent_unauth_absence = unauthorised / possible_sessions, percent_missing = missing / possible_sessions, percent_family_holiday_agreed = family_holiday_agreed / possible_sessions, percent_family_holiday_not_agreed = family_holiday_not_agreed / possible_sessions, percent_family_holiday = family_holiday_total / possible_sessions, percent_excluded = excluded / possible_sessions, percent_illness = illness / possible_sessions, percent_med_appt = med_appt / possible_sessions, percent_no_reason = no_reason / possible_sessions, percent_late_absent = late_absent / possible_sessions, percent_late_pres = late_pres / possible_sessions, percent_late_total = late_total / possible_sessions, percent_study_leave = study_leave / possible_sessions, percent_approved_offsite = approved_offsite / possible_sessions, pc_of_pupils_persistent_absent = persistent_absent_count / row_count, pc_of_pupils_severely_absent = severe_absent_count / row_count, pc_of_pupils_zero_attendance = zero_attendance_count / row_count )|> mutate(percent_absent = 1 - percent_present) } ) return(result) } # function to create average summary with 95% confidence intervals summarise_avg <- function(input_data) { summarise (input_data, mean.percent_present = mean(percent_present, na.rm = TRUE), sd.percent_present = sd(percent_present, na.rm = TRUE), n.percent_present = n() ) |> mutate(se.percent_present = sd.percent_present / sqrt(n.percent_present), lower.ci.percent_present = mean.percent_present - qt(1 - (0.05 / 2), n.percent_present - 1) * se.percent_present, upper.ci.percent_present = mean.percent_present + qt(1 - (0.05 / 2), n.percent_present - 1) * 
    se.percent_present
  )
}

# percentage function with 95% CI (normal approximation)
# the standard error divides by the total count, sum(n), not the category count
percent_calc <- function(input_data) {
  input_data |>
    tally() |>
    mutate(freq = n / sum(n)) |>
    mutate(
      l_ci = freq - (1.96 * sqrt((freq * (1 - freq)) / sum(n))),
      u_ci = freq + (1.96 * sqrt((freq * (1 - freq)) / sum(n)))
    )
}

# mean attendance function with 95% CI
presence_mean_calc <- function(input_data) {
  input_data |>
    summarise(
      mean.percent_present = mean(percent_present, na.rm = TRUE),
      sd.percent_present = sd(percent_present, na.rm = TRUE),
      n.percent_present = n()
    ) |>
    mutate(
      se.percent_present = sd.percent_present / sqrt(n.percent_present),
      lower.ci.percent_present = mean.percent_present - qt(1 - (0.05 / 2), n.percent_present - 1) * se.percent_present,
      upper.ci.percent_present = mean.percent_present + qt(1 - (0.05 / 2), n.percent_present - 1) * se.percent_present
    )
}
```

```{r}
#| label: load data

load(str_c(data_folder,"attendance_inclusion_data_model.RData"))
```

# Introduction

## Background & scope

This work was undertaken by the Sheffield City Council Business Intelligence team from around September 2023. New analysis was carried out on available data with the aim of understanding school attendance in Sheffield and informing the requirements of the city's response. This report summarises the findings of that analysis, along with commentary derived from discussions of those findings with colleagues in SCC, Learn Sheffield and from Sheffield schools.

This report covers the following:

- recent trends
- benchmarking and comparisons
- key drivers of absences
- demographic differences:
- age
- gender
- ethnicity
- geography & deprivation
- distance to school
- young carers
- severe absence (\<50% attendance)
- the absence patterns of annual year cohorts
- day level data analysis - mapping out a school year

Within the same analysis but out of scope of this report are:

- special educational needs (this is covered in depth in the SNA report)
- the performance of individual schools
- exclusions
- the reach and effectiveness of existing teams, services and interventions

::: callout-note
## some terms & definitions

Unless otherwise stated, *absence* refers to both *authorised* and *unauthorised* absences. Correspondingly, *attendance* refers to registered time in the classroom. *Absence* in this report may include periods of study leave and approved offsite activity.

Unless otherwise stated, the word *year* refers to the academic or exam year. So 2023 refers to the period of schooling between September '22 and July '23.
:::

## Data sources & processing

Attendance, exclusion and school registration data and student details used in this report are from Capita One, retrieved from the OSCAR database, which is maintained by the Performance & Analysis Service (PAS). Supplementary information on school types and locations, geography & deprivation are held in spreadsheets.

An R script gathers, combines, processes and aggregates this data into a data model. That data model was last updated 16/8/24 to include the first release of the full year 2024 attendance data.

## Release notes

1/7/24 - Giles Robinson. First complete draft for circulation.

16/8/24 - GR - updated with latest data, full 2024 academic year, various revisions, analysis of daily data; young carers

9/5/24 - GR - significant update with data now available up to Easter 2025.

# Trends

This section looks at recent changes in overall attendance, broken down by the major reasons covered by Department for Education (DfE) absence codes. We also discuss some codes that do not count as absences, but contribute to the picture around attendance, such as *late present*.

## Overall attendance

The COVID-19 pandemic and lockdowns saw a significant drop in attendance rates - although many of these trends began before the pandemic. Secondary age pupils were affected more than those in primary. In 2024 the gap continued to grow, with attendance improving in primary but worsening in secondary (though this was, in part, a result of the return of study leave as a coded reason, accounting for around 1% of secondary absences).

::: callout-important
At the time of writing (May 2025), there has been a significant shift into the current year. Primary school attendance continues to improve, while secondary has improved sharply on the previous year.
:::

```{r}
#| label: plot overall attendance by year and phase post 2018
#| fig-height: 4

ggplot(attend_year_phase |>
         filter(phase %in% c("Primary","Secondary"),
                year >= 2018
                ) |>
         mutate(grey_flag = if_else(year==2025,0,1))
       , aes(x = year, y = percent_present, alpha = grey_flag, colour = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Overall attendance in Sheffield primary and secondary schools",
       subtitle = "percentage of available sessions marked present per year; 2025 is part year",
       caption = "data from Capita One") +
  annotate("text", x = 2020.3, y = 0.9, label = "COVID-19", size = 2.5, hjust = 1.1) +
  geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.text.x = eb,
        axis.title.x = eb,
        axis.title.y = eb) +
MetBrewer::scale_fill_met_d("Egypt")+ coord_cartesian(clip = "off") ``` Our data prior to 2018 is less reliable and less complete, but taking a longer view suggests that at least some of the drivers of recent trends predate the pandemic. Attendance was improving to a peak in 2016, and was gradually dropping away from there, particularly in secondary schools. Things were getting worse before COVID, but the pandemic changed everything - despite recent improvements, attendance in Secondary schools remains well below pre-pandemic levels. ```{r} #| label: plot overall attendance by year and phase all time #| fig-height: 4 ggplot(attend_year_phase |> filter(phase %in% c("Primary","Secondary")) |> mutate(grey_flag = if_else(year==2025,0,1)) , aes(x = year, y = percent_present, alpha = grey_flag, colour = phase)) + geom_point() + geom_line(linetype = "dashed", alpha = 0.5) + geom_text(aes(label = year), size = 3, vjust = 1.5) + barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal scale_y_continuous(labels = scales::percent) + labs(title = "Attendance in Sheffield primary and secondary schools", subtitle = "All available data; percentage of available sessions marked present per year; 2025 is part year", caption = "data from Capita One") + annotate("text", x = 2020.3, y = 0.9, label = "COVID-19", size = 3, hjust = 1.1) + geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") + theme(plot.title = element_markdown(size = 12), legend.position = "none", axis.text.x = eb, axis.title.x = eb, axis.title.y = eb) + MetBrewer::scale_fill_met_d("Egypt")+ coord_cartesian(clip = "off") ``` ## Illness The recorded data on illness shows an increase year on year. The big rise into 2022 particularly affected primary age children, and was probably a mixture of COVID-19 itself and post-lockdown viral bounce-back. Illness rates increased slightly into 2025. 

::: callout-caution
Patterns in the day level data, and feedback from head teachers, suggest that we are probably not seeing the true picture on illness. Differences in reporting (and the honesty of parents), policy and recording may be as significant here as changes in actual illness.
:::

```{r}
#| label: plot illness by year and phase
#| fig-height: 3.5

ggplot(attend_year_phase |>
         filter(phase %in% c("Primary","Secondary"),
                year >= 2018) |>
         mutate(grey_flag = if_else(year==2025,0,1))
       , aes(x = year, y = percent_illness, alpha = grey_flag, colour = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
  scale_y_continuous(labels = scales::percent, limits = c(0,0.04)) +
  labs(title = "Illness by academic year in primary and secondary schools",
       subtitle = "percentage of sessions missed per year (code I); 2025 is part year",
       caption = "data from Capita One") +
  #annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1) +
  #geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.text.x = eb,
        axis.title.x = eb,
        axis.title.y = eb) +
  MetBrewer::scale_fill_met_d("Egypt")+
  coord_cartesian(clip = "off")
```

::: callout-tip
## recommendation

The data on illness is worth monitoring in 2025 and considering in relation to schools' recording policies, particularly at the day level, and in proximity to bank holidays and half term holidays. It's possible that the new DfE rules around penalty notices and family holidays will create perverse incentives to increase *reported* illness rates, especially as we head into the 2025 summer term. It might be detectable in the day level data (see later in this report).
:::

## Lateness

Head teachers report that lateness, even if marked as *late present*, can have a significant impact on activities that are regularly done first thing in the day, such as phonics.

Lateness can be recorded as *late present* or *late absent*, the latter meaning that the child attends only after the registers have closed. Both categories are on the rise, with late absence in primary schools in particular a growing problem. In secondary schools, *late present* is more common - and has been rising in recent years.

```{r}
#| label: plot late present by phase
#| fig-height: 3.5

late_present <- attend_year_phase |>
  filter(phase %in% c("Primary","Secondary"),
         year >= 2018) |>
  select(year, phase, percent_late_pres) |>
  rename(value = percent_late_pres) |>
  mutate(
    category = "late present",
    grey_flag = if_else(year==2025,0,1)
    )

ggplot(
  late_present,
  aes(x = year, y = value, alpha = grey_flag, colour = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Lateness (marked present) by academic year in primary and secondary schools",
       subtitle = "percentage of sessions missed per year (code L); 2025 is part year",
       caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.text.x = eb,
        axis.title.x = eb,
        axis.title.y = eb,
        strip.background = eb,
        strip.placement = "top",
        strip.text = element_text(size = 11)) +
  MetBrewer::scale_fill_met_d("Egypt") +
  #facet_grid(rows = vars(category), scales = "free_y") +
  coord_cartesian(clip = "off")

remove(late_present)
```

Late absences (after registers have closed) make up a much smaller percentage of available sessions, but in secondary schools this is dramatically up in 2025 (note that this data does not yet include the summer term):

```{r}
#| label: plot late absent by phase and lateness type
#| fig-height: 3.5

late_absent <-
attend_year_phase |> filter(phase %in% c("Primary","Secondary"), year >= 2018) |> select(year, phase, percent_late_absent) |> rename(value = percent_late_absent) |> mutate( category = "late absent", grey_flag = if_else(year==2025,0,1) ) ggplot( late_absent, aes(x = year, y = value, alpha = grey_flag, colour = phase)) + geom_point() + geom_line(linetype = "dashed", alpha = 0.5) + geom_text(aes(label = year), size = 3, vjust = 1.5) + barplottheme_minimal + scale_y_continuous(labels = scales::percent) + labs(title = "Lateness (marked absent) by academic year in primary and secondary schools", subtitle = "percentage of sessions missed per year (code U); 2025 is part year", caption = "data from Capita One") + theme(plot.title = element_markdown(size = 12), legend.position = "none", axis.text.x = eb, axis.title.x = eb, axis.title.y = eb, strip.background = eb, strip.placement = "top", strip.text = element_text(size = 11)) + MetBrewer::scale_fill_met_d("Egypt") + coord_cartesian(clip = "off") remove(late_absent) ``` ## Family holidays Absences for family holidays are higher in primary, but have risen in secondary also. The cost of living crisis likely plays a part here. Rates appear lower in 2024, but at the time of writing the summer term is missing from the 2024 data, which is excluded from the plot below. Family holidays can be authorised or unauthorised, but due to differences in recording and coding policy between schools, both are grouped together here. 

```{r}
#| label: plot family holidays by year and phase
#| fig-height: 3.5

ggplot(attend_year_phase |>
         filter(phase %in% c("Primary","Secondary"),
                year >= 2018) |>
         mutate(grey_flag = if_else(year==2025,0,1))
       , aes(x = year, y = percent_family_holiday, alpha = grey_flag, colour = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Family holidays by academic year in primary and secondary schools",
       subtitle = "percentage of sessions missed per year (codes F, H & G); 2025 is part year",
       caption = "data from Capita One") +
  annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1, colour = "gray") +
  geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.text.x = eb,
        axis.title.x = eb,
        axis.title.y = eb) +
  MetBrewer::scale_fill_met_d("Egypt")+
  coord_cartesian(clip = "off")
```

New DfE guidance and harsher penalties around family holiday absences came into effect in August 2024, and the chart above appears to show a significant reduction, though this is mostly because the summer term data is not yet available (as of May '25). Breaking this down by term shows two things:

- in all previous years (COVID period aside, and the lockdown years have been removed here), the summer term has been the time when most holidays are taken
- so far in 2025, the impact of the new rules is apparent in the autumn and spring term data - though the impact appears small, and levels are still well above pre-COVID years.

The real test of this policy will be the rates in the summer term ```{r} #| label: calculate attendance by year, term name and school phase attend_year_term_phase <- summarise_attendance(input_data = attend, grouping_vars = c("year","term_name","phase")) |> group_by(phase) |> mutate(label = if_else(year == max(year),term_name,NA_character_)) ``` ```{r} #| label: plot family holidays by term name primary plot_data <- attend_year_term_phase |> filter(phase == "Primary", !year %in% c(2020,2021), year >= 2018) ggplot(plot_data, aes(x = year, y = percent_family_holiday, #alpha = grey_flag, colour = term_name)) + geom_point() + geom_line(data = plot_data |> filter(year<= 2020),linetype = "dashed", alpha = 0.75) + geom_line(data = plot_data |> filter(year>= 2022),linetype = "dashed", alpha = 0.75) + barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal scale_y_continuous(labels = scales::percent) + scale_x_continuous(breaks = seq(2018,2025, by = 1)) + labs(title = "Family holidays by academic year Sheffield primary schools, by year and term", subtitle = "percentage of sessions missed per year (codes F, H & G); 2020 & 2021 removed; summer 2025 Summer not yet available", caption = "data from Capita One") + annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1, colour = "gray") + geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") + theme(plot.title = element_markdown(size = 12), legend.position = "top", legend.title = eb, axis.title.x = eb, axis.title.y = eb) + MetBrewer::scale_fill_met_d("Egypt")+ coord_cartesian(clip = "off") ``` ```{r} #| label: plot family holidays by term name secondary plot_data <- attend_year_term_phase |> filter(phase == "Secondary", !year %in% c(2020,2021), year >= 2018) ggplot(plot_data, aes(x = year, y = percent_family_holiday, #alpha = grey_flag, colour = term_name)) + geom_point() + geom_line(data = plot_data |> filter(year<= 2020),linetype = "dashed", alpha = 0.75) + 
geom_line(data = plot_data |> filter(year>= 2022),linetype = "dashed", alpha = 0.75) + barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal scale_y_continuous(labels = scales::percent) + scale_x_continuous(breaks = seq(2018,2025, by = 1)) + labs(title = "Family holidays by academic year Sheffield secondary schools, by year and term", subtitle = "percentage of sessions missed per year (codes F, H & G); 2020 & 2021 removed; summer 2025 Summer not yet available", caption = "data from Capita One") + annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1, colour = "gray") + geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") + theme(plot.title = element_markdown(size = 12), legend.position = "top", legend.title = eb, axis.title.x = eb, axis.title.y = eb) + MetBrewer::scale_fill_met_d("Egypt")+ coord_cartesian(clip = "off") ``` ## Exclusions Exclusions have risen very rapidly, particularly in secondary schools. This is mostly driven by temporary suspensions, largely as schools clamp down on what is classified as *persistent disruptive behaviour*. This makes only a small contribution to overall absence rates, but is growing, and for some children is a major contribution to their overall school absence. Exclusion rates in 2025 (part year at the time of writing) look to have levelled off. 

```{r}
#| label: plot exclusion by year and phase
#| fig-height: 3.5

ggplot(attend_year_phase |>
         filter(phase %in% c("Primary","Secondary"),
                year >= 2018) |>
         mutate(grey_flag = if_else(year==2025,0,1))
       , aes(x = year, y = percent_excluded, alpha = grey_flag, colour = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Absence due to exclusion, by academic year in primary and secondary schools",
       subtitle = "percentage of sessions missed per year (code E); 2025 is part year",
       caption = "data from Capita One") +
  #annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1) +
  #geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.text.x = eb,
        axis.title.x = eb,
        axis.title.y = eb) +
  MetBrewer::scale_fill_met_d("Egypt") +
  coord_cartesian(clip = "off")
```

## Study leave

2024 saw the return of study leave as a coded absence reason, with a significant impact on overall attendance levels in secondary and particularly in Year 11. At the time of writing the 2025 data does not include the summer term and shows only minimal study leave.

```{r}
#| label: plot study leave by year and phase
#| fig-height: 3.5

ggplot(attend_year_phase |>
         filter(phase %in% c("Primary","Secondary"),
                year >= 2018) |>
         mutate(grey_flag = if_else(year==2025,0,1))
       , aes(x = year, y = percent_study_leave, alpha = grey_flag, colour = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Study leave, by academic year in primary and secondary schools",
       subtitle = "percentage of sessions missed per year (code S); 2025 is part year",
       caption = "data from Capita One") +
  annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1) +
  geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.text.x = eb,
        axis.title.x = eb,
        axis.title.y = eb) +
  MetBrewer::scale_fill_met_d("Egypt") +
  coord_cartesian(clip = "off")
```

## No reason

The plot below comprises two DfE codes. Code N is intended as a placeholder until schools can establish a reason for absence, and code O is for unknown or other circumstances. Here both are grouped together - though the bulk is code O.

::: callout-important
The increase in *no reason* absences levelled off into 2024 and is down dramatically in 2025. Some of this is likely due to the changes to recording from September 2024. Even so, the *no reason* category remains the biggest contributor to overall absence rates in secondary, and the increase in *no reason* absences is the biggest contributor to the post-pandemic rise in absences. Furthermore, *no reason* absences are significantly more prevalent in more deprived areas of the city, where attendance in general is poorer.
:::

We can draw two possible conclusions from this: either parents and children are not reporting the true reasons for absence, or the DfE codes are no longer suitable for capturing those reasons. In either case, this represents a serious blind spot in the data.

::: callout-tip
## recommendation

Analysis of recorded case notes and text on Capita One, along with interviews or surveys of pupils, teachers, parents or community groups, may help to understand the stories behind these *no reason* absences.
:::

```{r}
#| label: plot no reason by year and phase
#| fig-height: 3.5
ggplot(attend_year_phase |>
         filter(phase %in% c("Primary","Secondary"), year >= 2018) |>
         mutate(grey_flag = if_else(year==2025,0,1)),
       aes(x = year, y = percent_no_reason, alpha = grey_flag, colour = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Absent with no reason, by academic year in primary and secondary schools",
       subtitle = "percentage of sessions missed per year (codes N & O); 2025 is part year",
       caption = "data from Capita One") +
  annotate("text", x = 2020.3, y = 0.001, label = "COVID-19", size = 3, hjust = 1.1) +
  geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.text.x = eb,
        axis.title.x = eb,
        axis.title.y = eb) +
  MetBrewer::scale_fill_met_d("Egypt") +
  coord_cartesian(clip = "off")
```

Finally, the two charts below summarise the contributions of each of these coded absence reasons to the overall absence picture during 2023-24:

```{r}
#| label: get 2024 summary by phase and coded reason
coded_absence_2024 <- attend_year_phase |>
  ungroup() |>
  filter(
    year == 2024,
    phase %in% c("Primary","Secondary")
  ) |>
  select(phase,
         `missing data` = percent_missing,
         illness = percent_illness,
         `family holidays` = percent_family_holiday,
         excluded = percent_excluded,
         `medical appointments` = percent_med_appt,
         late = percent_late_absent,
         `study leave` = percent_study_leave,
         `approved offsite activity` = percent_approved_offsite,
         `no reason` = percent_no_reason) |>
  pivot_longer(-phase, names_to = "category", values_to = "percent") |>
  arrange(percent)
```

```{r}
#| label: plot coded absence primary
#| fig-height: 4
wf <- coded_absence_2024 |>
  filter(phase == "Primary") |>
  mutate(cat_wrap = str_wrap(category, width = 10)) |>
  select(-phase, -category)

waterfall(wf,
          rect_text_labels = scales::percent(wf$percent, accuracy = 0.1),
          calc_total = TRUE,
          total_rect_text = scales::percent(sum(wf$percent),0.1),
          total_axis_text = "total",
          total_rect_color = "gray"
          ) +
  theme(axis.line = eb, axis.ticks = eb, axis.text.y = eb) +
  labs(title = "Coded absences in Sheffield primary schools - 2023-24",
       subtitle = "% of available sessions missed")
```

```{r}
#| label: plot coded absence secondary
#| fig-height: 4
wf <- coded_absence_2024 |>
  filter(phase == "Secondary") |>
  mutate(cat_wrap = str_wrap(category, width = 10)) |>
  select(-phase, -category)

waterfall(wf,
          rect_text_labels = scales::percent(wf$percent, accuracy = 0.1),
          calc_total = TRUE,
          total_rect_text = scales::percent(sum(wf$percent),0.1),
          total_axis_text = "total",
          total_rect_color = "gray"
          ) +
  theme(axis.line = eb, axis.ticks = eb, axis.text.y = eb) +
  labs(title = "Coded absences in Sheffield secondary schools - 2023-24",
       subtitle = "% of available sessions missed")
```

# Demographics

This section looks at how attendance varies with age, gender and ethnicity, and how this picture is changing over time.

## Age

Absence is a little higher in Y1 and Y2 when children are very young, and is otherwise level through primary. The transition to secondary school is associated with a big increase in absence, which continues year on year up to Y11.
As we'll see later on, this *transition drop* into Y7 and the subsequent decline are more severe for groups with particular risk factors.

```{r}
#| label: calculate and plot attendance by school year
#| fig-height: 4

# calculate average absence by ncy, with a 95% confidence interval
attend_ncy <- attend |>
  filter(year >= 2018, ncy >= 1 & ncy <= 11) |>
  summarise_attendance(grouping_vars = c("ncy", "stud_id")) |>
  group_by(ncy) |>
  summarise(mean.percent_absent = mean(percent_absent, na.rm = TRUE),
            sd.percent_absent = sd(percent_absent, na.rm = TRUE),
            n.percent_absent = n()
            ) |>
  mutate(se.percent_absent = sd.percent_absent / sqrt(n.percent_absent),
         lower.ci.percent_absent = mean.percent_absent - qt(1 - (0.05 / 2), n.percent_absent - 1) * se.percent_absent,
         upper.ci.percent_absent = mean.percent_absent + qt(1 - (0.05 / 2), n.percent_absent - 1) * se.percent_absent)

# plot
ggplot(attend_ncy, aes(x = ncy, y = mean.percent_absent)) +
  geom_col(position = position_dodge(0.9), fill = "#0072B2") +
  geom_errorbar(aes(ymin = lower.ci.percent_absent, ymax = upper.ci.percent_absent),
                width = 0.2,
                position = position_dodge(0.9)) +
  geom_text(aes(label = scales::percent(round(mean.percent_absent,3))),
            vjust = 2,
            colour = "white",
            size = 3,
            position = position_dodge(0.9)) +
  labs(title = "Absence by school year",
       subtitle = "Average percentage of available sessions not attended +/- 95% CI; all reason codes; all Sheffield schools & pupils, 2018 - 2024",
       x = "national curriculum year",
       caption = "data from Capita One") +
  barplottheme_minimal +
  theme(axis.text.y = eb) +
  scale_x_continuous(breaks = seq(1,11))
```

::: callout-note
The ImpactEd report [Understanding Attendance - Report 1](https://www.evaluation.impactedgroup.uk/research-and-resources/understanding-attendance) identified an emerging trend of a jump in absence between Y7 and Y8.
The Sheffield data does not support this, with the increase from Y7 to Y8 looking broadly the same - around a 1 percentage point increase in absence - as any other year on year increase within secondary years.
:::

Looking at trends over time for primary school years, we see that the youngest and oldest primary age children were most affected. There are encouraging signs of recovery among all primary years into 2024, and particularly in Y1.

```{r}
#| label: plot attendance by year & ncy - primary
#| fig-height: 3.5
attend |>
  filter(year != 2020, year >= 2018, school_ed_phase_corrected == "Primary") |>
  ungroup() |>
  summarise_attendance(grouping_vars = c("ncy","year","school_ed_phase_corrected")) |>
  filter(ncy <= 11 & ncy >= 1, child_count > 1000) |>
  ungroup() |>
  mutate(label = ifelse(year == max(year), ncy, NA_character_),
         ncy = factor(ncy)) |>
  ggplot(aes(x = year,
             y = percent_present,
             colour = ncy,
             group = ncy,
             label = label
             )
         ) +
  geom_point(shape = 1) +
  geom_line() +
  geom_label_repel(hjust = TRUE,min.segment.length = Inf,max.overlaps = Inf,size = 2.5) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(limits = c(2018,2026), breaks = seq(2018,2025)) +
  theme(legend.position = "none",
        axis.title = eb,
        strip.background = eb,
        axis.line = eb,
        axis.ticks = eb) +
  labs(title = "Primary school attendance over time by national curriculum year",
       subtitle = "% of available sessions attended; all Sheffield schools; 2020 excluded; 2025 is part-year",
       caption = "data from Capita One") +
  coord_cartesian(clip = "off")
```

In secondary schools, we can see how disproportionately children in Y11 have been affected, alongside encouraging signs of recovery in years 7 and 9. It is worth noting that the children in years 10 and 11 in 2024 were those who had their crucial Y6 and Y7 transition years disrupted by the pandemic.

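The cohort arithmetic behind that last point can be checked with a short sketch. The helper `ncy_in_year()` is hypothetical (it is not part of the report's data model) and assumes a pupil moves up exactly one national curriculum year per academic year, with years labelled by exam year as elsewhere in this report.

```r
# Hypothetical helper, not part of the report's data model.
# Assumes a pupil moves up exactly one curriculum year per academic year.
ncy_in_year <- function(ncy_now, year_now, year_then) {
  ncy_now - (year_now - year_then)
}

# Academic year 2020 (2019/20) is the first year disrupted by the pandemic.
# Which curriculum years were the 2024 Y10 and Y11 cohorts in at that point?
cohort_2020 <- ncy_in_year(ncy_now = c(10, 11), year_now = 2024, year_then = 2020)
cohort_2020
# returns 6 7
```

So, under that assumption, the 2024 Y10 cohort was in Y6 and the 2024 Y11 cohort was in Y7 during the 2019/20 academic year.
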
```{r}
#| label: plot attendance by year & ncy - secondary
#| fig-height: 3.5
attend |>
  filter(year != 2020, year >= 2018, school_ed_phase_corrected == "Secondary") |>
  ungroup() |>
  summarise_attendance(grouping_vars = c("ncy","year","school_ed_phase_corrected")) |>
  filter(ncy <= 11 & ncy >= 1, child_count > 1000) |>
  ungroup() |>
  mutate(label = ifelse(year == max(year), ncy, NA_character_),
         ncy = factor(ncy)) |>
  ggplot(aes(x = year,
             y = percent_present,
             colour = ncy,
             group = ncy,
             label = label
             )
         ) +
  geom_point(shape = 1) +
  geom_line() +
  geom_label_repel(hjust = TRUE,min.segment.length = Inf,max.overlaps = Inf,size = 2.5) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(limits = c(2018,2026), breaks = seq(2018,2025)) +
  theme(legend.position = "none",
        axis.title = eb,
        strip.background = eb,
        axis.line = eb,
        axis.ticks = eb) +
  labs(title = "Secondary school attendance over time by national curriculum year",
       subtitle = "% of available sessions attended; all Sheffield schools; 2020 excluded; 2025 is part-year",
       caption = "data from Capita One") +
  coord_cartesian(clip = 'off')
```

::: callout-note
The drop off in Y11 is driven in part by study leave in 2024; this is yet to occur in the 2025 year.
:::

These trends will be explored in more detail in the *Trends by annual cohort* section later in this report.

## Gender

Looking at overall school attendance since 2021, girls attend slightly better than boys, a difference of about 0.5 percentage points.
The gender time series show boys and girls moving in lockstep through primary school, separated by about half a percentage point:

```{r}
#| label: plot primary attendance by gender
#| fig-height: 3
#| warning: false
attend_year_gender_phase |>
  filter(!is.na(gender), year >= 2018, phase == "Primary") |>
  ungroup() |>
  mutate(label = if_else(year == max(year),
                         case_when(gender == "M" ~ "boys",
                                   gender == "F" ~ "girls"),
                         NA_character_)) |>
  ggplot(aes(x = year, y = percent_present, colour = gender, group = gender, label = label)) +
  geom_point(size = 3) +
  geom_line() +
  barplottheme_minimal +
  theme(legend.position = "none",
        axis.title.x = eb,
        legend.title = eb) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(2018,2026, by = 1)) +
  geom_text_repel(hjust = TRUE, nudge_x = 0.5, min.segment.length = Inf) +
  labs(title = "Primary school attendance by year and gender",
       subtitle = "% of sessions attended per year; 2025 is part year",
       caption = "data from Capita One")
```

In secondary we see boys' attendance overtaking girls' in the aftermath of the pandemic, but both continuing to decline into 2024.

```{r}
#| label: plot secondary attendance by gender
#| fig-height: 3
#| warning: false
attend_year_gender_phase |>
  filter(!is.na(gender), year >= 2018, phase == "Secondary") |>
  ungroup() |>
  mutate(label = if_else(year == max(year),
                         case_when(gender == "M" ~ "boys",
                                   gender == "F" ~ "girls"),
                         NA_character_)) |>
  ggplot(aes(x = year, y = percent_present, colour = gender, group = gender, label = label)) +
  geom_point(size = 3) +
  geom_line() +
  barplottheme_minimal +
  theme(legend.position = "none",
        axis.title.x = eb,
        legend.title = eb) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(2018,2026, by = 1)) +
  geom_text_repel(hjust = TRUE, nudge_x = 0.5, min.segment.length = Inf) +
  labs(title = "Secondary school attendance by year and gender",
       subtitle = "% of sessions attended per year; 2025 is part year",
       caption = "data from Capita One")
```

```{r}
#| label: plot attendance by ncy and gender
#| fig-height: 3
#| warning: false
attend |>
  filter(!is.na(gender), gender != "U", year >= 2018, ncy >= 0, ncy <= 11) |>
  group_by(ncy, gender) |>
  presence_mean_calc() |>
  ungroup() |>
  mutate(label = if_else(ncy == max(ncy),
                         case_when(gender == "M" ~ "boys",
                                   gender == "F" ~ "girls"),
                         NA_character_)) |>
  ggplot(aes(x = ncy, y = mean.percent_present, colour = gender, group = gender, label = label)) +
  geom_point(size = 1) +
  geom_line() +
  geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present),
                width = 0.2,
                alpha = 0.6) +
  barplottheme_minimal +
  theme(legend.position = "none",
        axis.title.x = eb,
        legend.title = eb) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(0,11, by = 1)) +
  geom_text_repel(hjust = TRUE, nudge_x = 0.5, min.segment.length = Inf) +
  labs(title = "School attendance by national curriculum year and gender",
       subtitle = "% of sessions attended per year since 2018",
       caption = "data from Capita One")
```

Looking at age, gender and deprivation together, we see the pattern reversed in older
children. In poorer wards of the city, girls consistently attend better than boys across all ages. In the most affluent wards, this is reversed in older children, with a gender gap widening from Y8 onwards, where boys have higher attendance.

```{r}
attend |>
  left_join(stud_details_joined |> select(-gender), by = "stud_id") |>
  filter(year >= 2018, ncy >= 1, ncy <= 11, !is.na(gender), imd_quartile %in% c(1,4)) |>
  mutate(imd_quart_name = if_else(imd_quartile == 1, "most affluent 25%", "most deprived 25%")) |>
  select(-imd_quartile) |>
  group_by(imd_quart_name,ncy,gender) |>
  presence_mean_calc() |>
  ggplot(aes(x = ncy, y = mean.percent_present, colour = gender, group = gender)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(1:11)) +
  facet_grid(rows = vars(imd_quart_name)) +
  theme(axis.title.y = eb,
        legend.position = "top",
        legend.title = eb) +
  labs(title = "School attendance by IMD quartile, national curriculum year and gender",
       subtitle = "% of sessions attended per year; 2025 is part year",
       caption = "data from Capita One")
```

## Ethnicity

The ethnic makeup of Sheffield's population continues to change, and there are differences in attendance rates between children in different ethnic groups. Here we summarise the data around ethnicity.

::: callout-caution
The ethnic groups and subgroups used in this analysis are those available in the Capita One source data. These don't necessarily align with the groupings used by ONS for census data, by other organisations, or in other SCC data and reporting.
:::

With the caveat that data prior to 2018 may not be wholly complete, the attendance data allows us to take a long term view of changes in the ethnic makeup of the Sheffield school population.
Note that the free y-axis scales on the following chart mean that the lines are not directly comparable:

```{r}
#| label: ethnic groups over time
#| warning: false
#| message: false
eth_category_volumes <- attend |>
  select(year, stud_id, ethnicity_category) |>
  unique() |>
  group_by(year, ethnicity_category) |>
  summarise(student_count = n_distinct(stud_id)) |>
  mutate(freq = student_count / sum(student_count)) |>
  ungroup() |>
  mutate(label = ifelse(year == max(year), ethnicity_category, NA_character_),
         label_n = ifelse(year %in% c(2008,2012,2016,2020,2024,2025),student_count,NA_real_)
         )

ggplot(eth_category_volumes, aes(x = year, y = student_count, colour = ethnicity_category)) +
  geom_line() +
  scale_x_continuous(breaks = seq(2006,2024, by = 2)) +
  geom_label_repel(aes(label = label),
                   nudge_x = 4,
                   nudge_y = 0,
                   alpha = 0.75,
                   size = 2.5,
                   min.segment.length = Inf) +
  geom_text_repel(aes(label = label_n), size = 2.5) +
  facet_grid(rows = vars(fct_rev(ethnicity_category)), scales = "free_y") +
  barplottheme_minimal +
  theme(strip.background = eb,
        axis.title.x = eb,
        legend.position = "none",
        strip.text = eb,
        axis.text.y = eb) +
  labs(title = "Pupils in Sheffield by ethnicity category",
       subtitle = "unique count of pupils in attendance data per year",
       caption = "data from Capita One") +
  scale_colour_brewer(palette = "Dark2")
```

```{r}
#| label: ethnicity description SEN summary table
#| message: false
#| fig-width: 10
attend_eth_des_phase <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |>
  group_by(ethnicity_description, phase) |>
  summarise_attendance(grouping_vars = c("ethnicity_description", "phase")) |>
  select(ethnicity_description, phase, child_count, percent_of_pupils, percent_absent)

attend_eth_des_total <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |>
  summarise_attendance(grouping_vars = "ethnicity_description") |>
  select(ethnicity_description, child_count, percent_of_pupils, percent_absent) |>
  mutate(phase = "Total")

attend_phase_total <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |>
  summarise_attendance(grouping_vars = "phase") |>
  select(phase, child_count, percent_of_pupils, percent_absent) |>
  mutate(#phase = "Total",
         ethnicity_description = "all children")

attend_total <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |>
  summarise_attendance(grouping_vars = "none") |>
  select(child_count, percent_of_pupils, percent_absent) |>
  mutate(phase = "Total",
         ethnicity_description = "all children")

eth_des_table <- rbind(
  attend_eth_des_phase,
  attend_eth_des_total,
  attend_phase_total,
  attend_total) |>
  pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |>
  rename(`ethnicity description` = ethnicity_description) |>
  ungroup() |>
  arrange(desc(child_count_Total)) |>
  select(
    `ethnicity description`,
    contains("Total"),
    contains("Primary"),
    contains("Secondary")
  )

eth_des_table |>
  gt(rowname_col = "ethnicity description") |>
  tab_spanner(id = 1, label = "Primary", columns = dplyr::contains("Primary")) |>
  tab_spanner(id = 2, label = "Secondary", columns = dplyr::contains("Secondary")) |>
  tab_spanner(id = 3, label = "Total", columns = dplyr::contains("Total")) |>
  cols_label(
    contains("count") ~ "count",
    contains("percent_of_pupils") ~ "% of pupils",
    contains("percent_absent") ~ "% absent 2023/24"
  ) |>
  tab_header(
    title = "Pupils and attendance in Sheffield by ethnicity description",
    subtitle = "pupils on roll in 2023/24; data from School Census & Capita One attendance records") |>
  tab_options(
    table.align = "left",
    table.font.size = 10,
    heading.title.font.size = 12,
    heading.subtitle.font.size= 10,
    heading.align = "left",
    column_labels.font.size = 12,
    stub.font.size = 12
  ) |>
  cols_align( "left",'ethnicity description' ) |>
  fmt_percent(columns = contains("percent"), decimals = 1) |>
  data_color(
    columns = percent_absent_Primary,
    method = "numeric",
    palette = "viridis") |>
  data_color(
    columns = percent_absent_Secondary,
    method = "numeric",
    palette = "viridis") |>
  data_color(
    columns = percent_absent_Total,
    method = "numeric",
    palette = "viridis")
```

# Geography & deprivation

There are many ways to divide up the city geographically, but we'll look at the 28 wards, and in particular their deprivation as measured in the 2019 Indices of Multiple Deprivation (IMD) scores. More recent (and older) measures of deprivation may be available, but the analysis is broadly the same.

## Attendance by ward

The table below shows overall attendance by ward of residence during 2023-24.

```{r}
#| label: ward summary table
#| message: false
#| fig-width: 10
attend_ward_phase <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |>
  #group_by(ward, phase) |>
  summarise_attendance(grouping_vars = c("ward", "phase")) |>
  select(ward, phase, child_count, percent_of_pupils, percent_absent)

attend_ward_total <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |>
  summarise_attendance(grouping_vars = "ward") |>
  select(ward, child_count, percent_of_pupils, percent_absent) |>
  mutate(phase = "Total")

attend_phase_total <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |>
  summarise_attendance(grouping_vars = "phase") |>
  select(phase, child_count, percent_of_pupils, percent_absent) |>
  mutate(#phase = "Total",
         ward = "Sheffield")

attend_total <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |>
  summarise_attendance(grouping_vars = "none") |>
  select(child_count, percent_of_pupils, percent_absent) |>
  mutate(phase = "Total",
         ward = "Sheffield")

ward_table <- rbind(
  attend_ward_phase,
  attend_ward_total,
  attend_phase_total,
  attend_total) |>
  filter(!is.na(ward)) |>
  pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |>
  rename(`ethnicity description` = ward) |>
  ungroup() |>
  arrange(desc(child_count_Total)) |>
  select(
    `ethnicity description`,
    contains("Total"),
    contains("Primary"),
    contains("Secondary")
  )

ward_table |>
  gt(rowname_col = "ethnicity description") |>
  tab_spanner(id = 1, label = "Primary", columns = dplyr::contains("Primary")) |>
  tab_spanner(id = 2, label = "Secondary", columns = dplyr::contains("Secondary")) |>
  tab_spanner(id = 3, label = "Total", columns = dplyr::contains("Total")) |>
  cols_label(
    contains("count") ~ "count",
    contains("percent_of_pupils") ~ "% of children",
    contains("percent_absent") ~ "% absent 2023/24"
  ) |>
  tab_header(
    title = "Pupils in Sheffield, by ward of residence",
    subtitle = "pupils on roll & attendance in 2023/24; data from School Census & Capita One attendance records") |>
  tab_options(
    table.align = "left",
    table.font.size = 10,
    heading.title.font.size = 12,
    heading.subtitle.font.size= 10,
    heading.align = "left",
    column_labels.font.size = 12,
    stub.font.size = 12
  ) |>
  cols_align( "left",'ethnicity description' ) |>
  fmt_percent(columns = contains("percent"), decimals = 1) |>
  data_color(
    columns = percent_absent_Primary,
    method = "numeric",
    palette = "viridis") |>
  data_color(
    columns = percent_absent_Secondary,
    method = "numeric",
    palette = "viridis") |>
  data_color(columns = percent_absent_Total,
             method = "numeric",
             palette = "viridis")
```

## Economic deprivation

These ward level attendance figures line up neatly with deprivation
indicators. Plotting attendance against the 2019 Indices of Multiple Deprivation (IMD) scores shows a tight correlation.

::: callout-caution
Since school attendance figures are one of the input variables to the IMD scores, there is some circular logic at work here. Even so, attendance is only one of 39 inputs, so this analysis is worth pursuing.
:::

```{r}
#| label: plot attendance by ward level imd score
#| warning: false
attend |>
  filter(year == 2024) |>
  summarise_attendance(grouping_vars = c("ward","ward_imd_score")) |>
  ggplot(aes(x = ward_imd_score, y = percent_present)) +
  geom_point() +
  geom_text_repel(aes(label = ward), size = 2.5, segment.colour = "gray") +
  #geom_smooth(method = "lm") +
  scale_y_continuous(labels = scales::percent) +
  labs(title ="School attendance by ward level deprivation",
       subtitle = "Average % of sessions attended 2024; ward of residence; all ages",
       caption = "data from Capita One",
       y = "attendance",
       x = "Indices of multiple deprivation score (2019)")
```

The link to deprivation has always been there but is stronger today - recreating the chart above with 2010 attendance and IMD scores shows a weaker relationship.

The link to deprivation is less evident in primary schools, but stronger in secondary schools, and the gap between primary and secondary attendance widens in poorer areas of the city.

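One way to put a number on how "tight" or "strong" this relationship is would be a correlation coefficient and the slope of a simple linear fit. The sketch below is illustrative only: it assumes one row per ward with the `ward_imd_score` and `percent_present` columns used in the chart above, and the values in `ward_example` are made up for the example, not Sheffield figures.

```r
# Illustrative values only: these are NOT Sheffield figures.
# Assumes one row per ward, as in the ward level chart.
ward_example <- data.frame(
  ward_imd_score  = c(8, 12, 19, 27, 35, 44),
  percent_present = c(0.950, 0.945, 0.930, 0.920, 0.905, 0.890)
)

# Pearson correlation: values near -1 indicate a tight negative relationship
r_example <- cor(ward_example$ward_imd_score, ward_example$percent_present)

# Slope of a simple linear fit: change in attendance per IMD score point
fit_example <- lm(percent_present ~ ward_imd_score, data = ward_example)
slope_example <- unname(coef(fit_example)["ward_imd_score"])

round(r_example, 2)
slope_example
```

Running the same calculation separately on two years, or on each phase, would give comparable figures for statements such as "stronger today" or "stronger in secondary schools".
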
```{r}
#| label: plot attendance by ward level imd score and phase
#| warning: false
#| message: false
ward_data <- attend |>
  #select(-phase) |>
  #rename(phase = school_ed_phase_corrected) |>
  filter(year == 2023, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |>
  summarise_attendance(grouping_vars = c("ward","ward_imd_score", "phase")) |>
  arrange(ward_imd_score)

grid <- seq(from = min(ward_data$ward_imd_score, na.rm = TRUE),
            to = max(ward_data$ward_imd_score, na.rm = TRUE),
            length.out = ward_data |> ungroup() |> select(ward) |> distinct() |> tally() |> pull())

ward_grid <- ward_data |>
  select(ward, ward_imd_score) |>
  distinct() |>
  arrange(ward_imd_score) |>
  cbind(grid) |>
  rename(label_sequence = 3) |>
  ungroup() |>
  select(-ward_imd_score)

plot_data <- ward_data |>
  left_join(ward_grid, by = "ward")

ggplot(plot_data, aes(x = ward_imd_score, y = percent_present, colour = phase, group = ward,label = ward)) +
  geom_point(size = 2.5, alpha = 0.7) +
  geom_line(colour = "grey70") +
  scale_y_continuous(labels = scales::percent) +
  #coord_cartesian(expand = FALSE, clip = "off") +
  geom_text_repel(data = plot_data |> filter(phase == "Primary"),
                  aes(x = label_sequence, y = 0.75),
                  colour = "grey40",
                  size = 2.5,
                  #force_pull = 0,
                  min.segment.length = Inf,
                  angle = 90,
                  #segment.angle = 90,
                  #point.padding = 0,
                  #max.overlaps = Inf,
                  #direction = "x",
                  nudge_y = 0.04#,
                  #hjust = 0,
                  #max.iter = 1e4, max.time = 1
                  ) +
  labs(title = "School attendance in primary and secondary schools, by ward level deprivation score",
       subtitle = "% of available sessions attended in 2023 by ward of residence",
       caption = "data from Capita One",
       y = "attendance",
       x = "Indices of multiple deprivation score (2019)") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none"
        ) +
  MetBrewer::scale_fill_met_d("Egypt")
```

This longer term view below compares the trend in attendance between the top and bottom quartiles of the ward level deprivation scores, at the half-term level with a trend-line.
The middle two quartiles are excluded from this plot. The gap between the most and least deprived areas narrowed towards the peak attendance rate in 2016, so gains were disproportionately made in poorer areas, but attendance in the most deprived quartile has fallen away more rapidly since the pandemic.

```{r}
#| label: plot attendance by imd quartile and half term with trend
#| warning: false
attend |>
  left_join(stud_details_joined, by = "stud_id") |>
  filter(imd_quartile %in% c(1,4)) |>
  mutate(imd_quartile = factor(imd_quartile)) |>
  summarise_attendance(grouping_vars = c("ht_id","ht_start_date","imd_quartile")) |>
  ggplot(aes(x = ht_start_date, y = percent_present, fill = imd_quartile, colour = imd_quartile)) +
  geom_point() +
  geom_smooth(alpha = 0.2) +
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent) +
  scale_x_date() +
  labs(title = "Attendance of children living in the most deprived and least deprived wards of the city",
       subtitle = "groups are upper & lower quartiles of the IMD score of the ward of residence (2019); data points are half terms with trendline",
       caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title.x = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky")
```

```{r}
#| label: get data attendance by imd quartile and year
#| warning: false

# get attendance
attend_imdq_year <- attend |>
  left_join(stud_details_joined, by = "stud_id") |>
  filter(imd_quartile %in% c(1,4), year >= 2018) |>
  mutate(imd_quartile = factor(imd_quartile)) |>
  summarise_attendance(grouping_vars = c("year","imd_quartile","phase"))

# calculate the gap between most & least deprived quartiles
attend_imdq_gap <- attend_imdq_year |>
  filter(!phase %in% c("Nursery","6th form")) |>
  select(year, phase, imd_quartile, percent_present) |>
  pivot_wider(names_from = "imd_quartile", values_from = "percent_present") |>
  mutate(gap = `1` - `4`)
```

```{r}
#| label: plot primary attendance by imd quartile and year
#| warning: false
ggplot(attend_imdq_year |> filter(phase == "Primary"),
       aes(x = year, y = percent_present, fill = imd_quartile, colour = imd_quartile)) +
  geom_point() +
  geom_line() +
  #geom_smooth(alpha = 0.2) +
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  #scale_x_date()+
  labs(title = "Primary school attendance of children in the most deprived and least deprived wards of Sheffield",
       subtitle = "groups are upper & lower quartiles of the IMD score of the ward of residence (2019); data points are academic years",
       caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title.x = eb) +
  MetBrewer::scale_colour_met_d("Kandinsky")
```

```{r}
#| label: plot secondary attendance by imd quartile and year
#| warning: false
ggplot(attend_imdq_year |> filter(phase == "Secondary"),
       aes(x = year, y = percent_present, fill = imd_quartile, colour = imd_quartile)) +
  geom_point() +
  geom_line() +
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  scale_x_continuous(breaks = seq(2018,2025,by = 1)) +
  labs(title = "Secondary school attendance of children in the most deprived and least deprived wards of Sheffield",
       subtitle = "groups are upper & lower quartiles of the IMD score of the ward of residence (2019); data points are academic years",
       caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title.x = eb) +
  MetBrewer::scale_colour_met_d("Kandinsky")
```

Finally here, since it's not so easy to read from the above charts, we can look at the change in the _difference_ in attendance between the most and least deprived quartiles of the city.
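
As a minimal sketch of that calculation, the gap is simply attendance in quartile 1 (most affluent) minus attendance in quartile 4 (most deprived) for the same year. The report's own version uses `pivot_wider()` on `attend_imdq_year`; the base R version below uses a made-up data frame, `attend_example`, whose values are illustrative and not Sheffield figures.

```r
# Illustrative values only: these are NOT Sheffield figures.
attend_example <- data.frame(
  year            = c(2023, 2023, 2024, 2024),
  imd_quartile    = c(1, 4, 1, 4),  # 1 = most affluent 25%, 4 = most deprived 25%
  percent_present = c(0.950, 0.900, 0.955, 0.900)
)

least_deprived <- subset(attend_example, imd_quartile == 1)
most_deprived  <- subset(attend_example, imd_quartile == 4)

# One row per year, with each quartile's attendance side by side
gap_example <- merge(least_deprived, most_deprived,
                     by = "year", suffixes = c("_q1", "_q4"))

# Gap as a proportion: multiply by 100 for percentage points
gap_example$gap <- gap_example$percent_present_q1 - gap_example$percent_present_q4
gap_example[, c("year", "gap")]
```

A positive gap means children in the most affluent quartile attend more; a gap that grows from one year to the next means the two groups are diverging.
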
Plotting this gap reveals that although attendance is increasing in both primary & secondary, and across all levels of deprivation, the gap between the most and least deprived quartiles of the city is reducing in primary schools, but continues to grow in secondary:

```{r}
#| label: plot the gap between imd q 1 and 4 by year
ggplot(attend_imdq_gap,
       aes(x = year, y = gap, colour = phase, fill = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal + # Use theme_minimal() instead of barplottheme_minimal
  scale_y_continuous(labels = scales::percent) +
  labs(title = "The deprivation gap in Sheffield primary and secondary schools",
       subtitle = "Difference in % attendance between top 25% and bottom 25% indices of multiple deprivation (IMD) score by ward of residence",
       caption = "data from Capita One") +
  geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
  theme(plot.title = element_markdown(size = 12),
        plot.subtitle = element_text(size = 8),
        legend.position = "none",
        axis.text.x = eb,
        axis.title.x = eb,
        axis.title.y = eb) +
  MetBrewer::scale_fill_met_d("Egypt") +
  coord_cartesian(clip = "off")
```

The age profile by deprivation quartile shows how children in poorer areas have a steeper drop off through secondary school. Children in the most affluent 25% of wards attend better across all years, but show a more significant drop off into Y11. Could study leave be a factor here?

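The error bars on the age profile charts are 95% confidence intervals around the mean. The sketch below mirrors the formula used for the *Absence by school year* chart (mean plus or minus the t quantile times the standard error). The helper `mean_ci()` and its input values are illustrative assumptions: the internals of `summarise_avg()` are not shown in this report and may differ, for example in how missing values are counted.

```r
# Hypothetical helper: mean with a t-based confidence interval.
# Missing values are dropped before n is counted (an assumption).
mean_ci <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  se <- sd(x) / sqrt(n)
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  c(mean = m, lower = m - t_crit * se, upper = m + t_crit * se)
}

# Illustrative per-pupil attendance rates only: NOT Sheffield figures.
x_example <- c(0.97, 0.92, 0.88, 0.95, 0.99, 0.81)
ci_example <- mean_ci(x_example)
ci_example
```

With the very large pupil counts in each curriculum year, intervals calculated this way are narrow, which is why the error bars on these charts are small.
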
```{r}
#| label: plot attendance age profile by imd quartile
attend_deprivation_quartile_ncy <- attend |>
  filter(year >= 2018, ncy >= 1, ncy <= 11) |>
  left_join(stud_details_joined, by = "stud_id") |>
  filter(imd_quartile %in% c(1,4)) |>
  mutate(imd_quartile = factor(imd_quartile)) |>
  group_by(imd_quartile, ncy) |>
  summarise_avg()

ggplot(attend_deprivation_quartile_ncy,
       aes(x = ncy, y = mean.percent_present, colour = imd_quartile, group = imd_quartile)) +
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present),
                width = 0.2,
                alpha = 0.7) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(1,11)) +
  labs(title = "Attendance of children living in the most deprived and least deprived wards of the city",
       subtitle = "avg % of sessions attended since 2018 +-95CI; groups are upper & lower quartiles of the IMD score of the ward of residence (2019)",
       caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title.x = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky") +
  theme(axis.title.y = eb,
        legend.position = "none") +
  geom_vline(aes(xintercept = 6.5), linetype = "dotted", colour = "gray70", size = 1.2) +
  annotate("text", label = "primary", y = 0.99, x = 3.5, colour = "gray40") +
  annotate("text", label = "secondary", y = 0.99, x = 9, colour = "gray40")
```

## Free School Meals

Free School Meal (FSM) status is perhaps a better indicator of socio-economic status of children than ward of residence, since it is means tested at the family level.

```{r} #| label: table of children by phase and free school meal #| message: false fsm_table_data <- attend |> mutate(fsm = replace_na(fsm, "0")) |> #mutate(fsm = factor(fsm, levels = c("T","F"))) |> #select(-phase) |> #rename(phase = school_ed_phase_corrected) |> filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |> summarise_attendance(grouping_vars = c("phase","fsm")) |> select(phase, fsm, child_count, percent_of_pupils, percent_absent) |> pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |> mutate(fsm = fct_recode(fsm, "free school meal eligible" = "T","no fsm" = "F")) |> ungroup() fsm_table_total_row <- attend |> #select(-phase) |> #rename(phase = school_ed_phase_corrected) |> filter(year == 2024, phase %in% c("Primary","Secondary"), ncy >= 1, ncy <= 11) |> summarise_attendance(grouping_vars = c("phase")) |> select(phase, child_count, percent_of_pupils, percent_absent) |> pivot_wider(names_from = phase, values_from = c(child_count,percent_of_pupils, percent_absent)) |> mutate(fsm = "total") |> ungroup() fsm_table_data |> rbind(fsm_table_total_row) |> rename(`free school meal` = fsm) |> gt(rowname_col = "free school meal") |> tab_spanner(id = 1, label = "Primary", columns = dplyr::contains("Primary")) |> tab_spanner(id = 2, label = "Secondary", columns = dplyr::contains("Secondary")) |> #tab_spanner(id = 3, label = "Total", columns = dplyr::contains("Total")) |> cols_label( contains("count") ~ "count", contains("percent_of_pupils") ~ "% of children", contains("percent_absent") ~ "avg % absent (2023)" ) |> tab_header( title = "Pupils in Sheffield, by free school meal status", subtitle = "count of pupils on roll in 2023/24; data from School Census & Capita One attendance records") |> tab_options( table.align = "left", table.font.size = 10, heading.title.font.size = 12, heading.subtitle.font.size= 10, heading.align = "left", column_labels.font.size = 14, stub.font.size = 12 ) |> 
cols_align("left",'free school meal') |> fmt_percent(columns = contains("percent"), decimals = 1) |> data_color( columns = percent_absent_Primary, method = "numeric", palette = "viridis", alpha = 0.7) |> data_color( columns = percent_absent_Secondary, method = "numeric", palette = "viridis", alpha = 0.7) ``` ```{r} #| label: attendance by fsm status and half term ggplot(attend_year_ht_fsm, aes(x = ht_start_date, y = percent_present, fill = fsm, colour = fsm)) + geom_point() + geom_smooth() + barplottheme_minimal + scale_y_continuous(labels = scales::percent) + annotate("text", x = date("2020-03-31"), y = 0.6, label = "COVID-19", size = 2.5, hjust = 1.1, colour = "dark gray") + geom_vline(xintercept = date("2020-03-31"), linetype = "longdash", colour = "dark gray") + labs(title = "Attendance by children receiving free school meals and not on fsm", subtitle = "% of available sessions attended, with trend", caption = "data from Capita One")+ theme(plot.title = element_markdown(size = 12), legend.position = "none", axis.title.x = eb) + #MetBrewer::scale_fill_met_d("Egypt") + scale_fill_manual(values = c("#0f7ba2","#dd5129")) + scale_colour_manual(values = c("#0f7ba2","#dd5129")) ``` More concerning are the exclusion rates for children with Free School Meals, which are rapidly diverging from those without. 
```{r} ggplot(attend_year_ht_fsm |> filter(ht_start_date >= as_date("2016-09-01")), aes(x = ht_start_date, y = percent_excluded, fill = fsm, colour = fsm)) + geom_point() + geom_smooth() + barplottheme_minimal + scale_y_continuous(labels = scales::percent) + annotate("text", x = date("2020-03-31"), y = 0.01, label = "COVID-19", size = 2.5, hjust = 1.1, colour = "dark gray") + geom_vline(xintercept = date("2020-03-31"), linetype = "longdash", colour = "dark gray") + labs(title = "Exclusion rates by children receiving free school meals and not on fsm", subtitle = "% of available sessions missed, with trend line", caption = "data from Capita One")+ theme(plot.title = element_markdown(size = 12), legend.position = "none", axis.title.x = eb) + #MetBrewer::scale_fill_met_d("Egypt") + scale_fill_manual(values = c("#0f7ba2","#dd5129")) + scale_colour_manual(values = c("#0f7ba2","#dd5129")) #geom_hline(yintercept = excl_2023_avg, linetype = "dashed", colour = "dark grey") + #annotate("text", x = date("2023-03-31"), y = 0.0025, label = "2023 average", size = 2.5, hjust = 1.1, colour = "dark gray") ``` ## Distance to school We used the postcodes of each child's home address and school location to calculate a measure of straight-line distance between the two. Attendance is significantly better, on average for children who live closer to school. Children living very close to school (\<100m) attend about 1.5% better on average in Primary. For secondary schools this difference is 2.3%. 
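As a rough illustration of the distance measure described above, the sketch below computes a straight-line distance and bins it using the same breaks as the chart. It assumes postcodes have already been geocoded to British National Grid eastings and northings in metres. The toy data, the column names and the `crow_distance()` helper are hypothetical and are not taken from the report's data model.

```r
# Hypothetical toy data: home and school postcode centroids as eastings and
# northings in metres (an assumption; the report only states that postcodes
# were used to derive a straight-line distance).
pupils <- data.frame(
  stud_id    = c("a", "b", "c"),
  home_east  = c(435000, 435030, 433000),
  home_north = c(387000, 387040, 390000),
  sch_east   = c(435000, 435000, 436000),
  sch_north  = c(387000, 387000, 386000)
)

# Straight-line ("as the crow flies") distance between two grid points
crow_distance <- function(e1, n1, e2, n2) {
  sqrt((e1 - e2)^2 + (n1 - n2)^2)
}

pupils$dist_crow <- with(
  pupils,
  crow_distance(home_east, home_north, sch_east, sch_north)
)

# Bin the distances with the same breaks as the chart, attaching the labels
# directly to the bins so that an empty bin cannot shift the labelling.
pupils$dist_bin <- cut(
  pupils$dist_crow,
  breaks = c(0, 100, 200, 500, 1000, 2000, 5000, 10000, 1000000),
  labels = c("<100m", "100 - 200m", "200 - 500m", "500m - 1km",
             "1-2km", "2-5km", "5-10km", "10km+")
)

pupils[, c("stud_id", "dist_crow", "dist_bin")]
```

With the default settings of `cut()`, a distance of exactly 0 m falls outside the first bin and becomes `NA`; passing `include.lowest = TRUE` would count such pupils in the first bin instead.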
Conversely, ```{r average absence by binned school distance all provision} # calculated average by binned distance primary dist_data <- sch_dist_sheff_23 |> filter(school_ed_phase_corrected %in% c("Primary","Secondary"), sen_level != "EHCP" ) |> rename(phase = school_ed_phase_corrected) sch_dist_binned_pri <- dist_data |> filter(phase == "Primary") |> mutate(dist_bin = cut(dist_crow, breaks = c(0,100,200,500,1000,2000,5000,10000,1000000))) |> group_by(dist_bin) |> presence_mean_calc() |> filter(!is.na(dist_bin)) |> mutate(dist_bin_label = c("<100m", "100 - 200m", "200 - 500m", "500m - 1km","1-2km","2-5km","5-10km","10km+"), phase = "Primary") # calculated avg by binned distance secondary sch_dist_binned_sec <- dist_data |> filter(phase == "Secondary") |> mutate(dist_bin = cut(dist_crow, breaks = c(0,100,200,500,1000,2000,5000,10000,1000000))) |> group_by(dist_bin) |> presence_mean_calc() |> filter(!is.na(dist_bin)) |> mutate(dist_bin_label = c("<100m", "100 - 200m", "200 - 500m", "500m - 1km","1-2km","2-5km","5-10km","10km+"), phase = "Secondary") # calculate overall averages by phase sch_dist_binned_overall <- dist_data |> mutate( dist_bin = NA_character_, dist_bin_label = "overall avg") |> group_by(dist_bin, dist_bin_label, phase) |> presence_mean_calc() sch_dist_binned <- rbind(sch_dist_binned_pri, sch_dist_binned_sec, sch_dist_binned_overall) |> mutate(fill_code = case_when(dist_bin_label == 'overall avg' ~ 'total', TRUE ~ 'others')) # plot ggplot(sch_dist_binned, aes(x = reorder(dist_bin_label,mean.percent_present), y = mean.percent_present, fill = fill_code)) + geom_col(position = position_dodge(0.9))+ geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, position = position_dodge(0.9))+ geom_text(aes(label = scales::percent(round(mean.percent_present,3))), hjust = 1, colour = "white", size = 3, position = position_dodge(0.9) ) + labs(title = "Attendance by distance to school", subtitle = "Avg % sessions attended, 
2023 +-95CI; straight line home to school distance; excluding children with EHCP",
       caption = "data from Capita One")+
  barplottheme_minimal +
  theme(axis.title.x = eb, axis.text.y = element_text(size = 8), axis.text.x = eb, legend.position = "none", plot.subtitle = element_text(size = 8, face = "italic"), strip.background = eb) +
  coord_flip() +
  facet_grid(cols = vars(phase)) +
  scale_fill_manual(values = c("others"= "#0072B2", "total" = "#b47846"))
```

Plotting the average distance travelled against average attendance rates for secondary schools reveals four groupings:

- On the right are two specialist facilities (UTC Sheffield and UTC Sheffield Olympic Legacy Park) and two Catholic schools (All Saints and Notre Dame). All of these may incentivise pupils to travel further than normal.
- The main bunch of schools in the middle seems to show a linear relationship between distance and attendance, though this relationship is weak, relies on us discarding the outliers (more on these below), and may not be causal.
- Above this group, Mercia, Tapton and High Storrs are all in affluent areas of the city, and show higher attendance with average distance travelled.
- Below this group, Chaucer shows average distance travelled and below-average attendance, though, as we'll see below, the average distance travelled disguises some significant differences.
```{r} #| label: plot secondary school distance travelled v attendance #| fig-height: 6.5 # the distance data is already filtered to just Sheffield schools, but here we want to remove specials & nursery: dist_data <- sch_dist_sheff_23 |> filter(school_type == "mainstream", school_ed_phase == "Secondary") dist_by_sch <- dist_data |> group_by(school_short_name, school_ed_phase) |> summarise(mean.dist_crow = mean(dist_crow, na.rm = TRUE), sd.dist_crow = sd(dist_crow, na.rm = TRUE), n.dist_crow = n() ) |> mutate(se.dist_crow = sd.dist_crow / sqrt(n.dist_crow), lower.ci.dist_crow = mean.dist_crow - qt(1 - (0.05 / 2), n.dist_crow - 1) * se.dist_crow, upper.ci.dist_crow = mean.dist_crow + qt(1 - (0.05 / 2), n.dist_crow - 1) * se.dist_crow) dist_attend_by_sch <- dist_data |> group_by(school_short_name, school_ed_phase) |> presence_mean_calc() sch_dist_by_sch <- inner_join( dist_by_sch, dist_attend_by_sch) # plot ggplot(sch_dist_by_sch, aes(x = mean.dist_crow, y = mean.percent_present#, #colour = "dark blue" ) ) + geom_point(alpha = 0.7, colour = "steel blue")+ geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), colour = "steel blue", alpha = 0.5) + geom_errorbar(aes(xmin = lower.ci.dist_crow, xmax = upper.ci.dist_crow), colour = "steel blue", alpha = 0.5) + geom_text_repel(aes(label = school_short_name), size = 2.5, colour = "steel blue") + labs(title = "Attendance vs distance travelled", subtitle = "Sheffield Secondary schools; 2023 attendance rates", x = "average straight line distance from home to school (m)", y = "average % of sessions attended") + scale_y_continuous(labels = scales::percent) + theme(legend.position = "none") ``` Plotting the distance travelled against attendance at the child level reveals further differences. In the plot below we take one example from each of the four groups described above. 
We can think of dividing these plots into four quadrants:

![distance quadrants](attendance-in-sheffield-schools_files/figure-html/school_distance_quadrants.png)

*Notre Dame High* has good attendance across the board, regardless of the distance travelled.

*Mercia* has excellent attendance and a limited distance travelled, presumably due to its oversubscription and high demand, with most datapoints appearing in the top left. The trend line points slightly down, as a few children who live further away have lower attendance.

*Meadowhead* has typical average values for both attendance and distance, appearing in the middle of the pack in the plot above. Most children attend well and those with poorer attendance generally live close by - there are few in the bottom right.

*Chaucer* by contrast has a small but significant number of points in the bottom right quadrant - those who attend very poorly and live far away. Some of this may be explained by families failing to secure a place at closer schools, and being placed across the city, with the distance then contributing to poor attendance.
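The quadrant reading above could be made countable with something like the sketch below. The cut-offs (3,000 m and 90% attendance) and the `classify_quadrant()` helper are purely illustrative assumptions; the report does not define numeric quadrant boundaries.

```r
# Hypothetical helper: label each child by plot quadrant.
# dist_cut and attend_cut are illustrative thresholds, not values from the report.
classify_quadrant <- function(dist_crow, percent_present,
                              dist_cut = 3000, attend_cut = 0.9) {
  vertical   <- ifelse(percent_present >= attend_cut, "top", "bottom")
  horizontal <- ifelse(dist_crow <= dist_cut, "left", "right")
  paste(vertical, horizontal)
}

# Toy child-level data (made up for illustration)
toy <- data.frame(
  dist_crow       = c(500, 6000, 800, 7000),
  percent_present = c(0.97, 0.95, 0.60, 0.40)
)
toy$quadrant <- classify_quadrant(toy$dist_crow, toy$percent_present)

# Count of children per quadrant
table(toy$quadrant)
```

Counting children per quadrant for each school would put a number on observations such as the cluster of Chaucer pupils in the bottom right.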
```{r} #| label: plot distance v attendance at child level #| fig-height: 6 sch_dist_sheff_23 |> filter(school_short_name %in% c("Chaucer", "Mercia", "Notre Dame High",#)) |> "Meadowhead")) |> mutate(school = factor(school_short_name, levels = c("Notre Dame High","Mercia","Meadowhead","Chaucer"))) |> ggplot(aes(x = dist_crow, y = percent_present, colour = school, group = school)) + geom_point(alpha = 0.6, size = 1.5) + geom_smooth(alpha = 0.4) + scale_y_continuous(labels = scales::percent) + facet_wrap(vars(school)) + theme(legend.title = eb, legend.text = element_text(size = 7.5), legend.position = "none", strip.background = eb ) + labs(title = "Attendance vs distance travelled", subtitle = "Selected Sheffield secondary schools", x = "straight line home to school distance (m)", y = "% of sessions attended") ``` # Young carers It is difficult to establish the true number of young carers in the city - and perhaps dependent on definitions & methods. A 2023 all party parliamentary group (APPG) for young carers and adult carers [report](https://carers.org/downloads/appg-for-young-carers-and-young-adults-carers-reportlr.pdf) cites several sources: - 1.6% of pupils (2021 Census) - 0.5% of pupils (2023 school census) Though it places little confidence in these first two, preferring the estimates of two surveys: - 10% of all pupils provide high or very high levels of care (BBC / University of Nottingham) - 13% of pupils surveyed (COVID Social Mobility & Opportunities study) ```{r} #| label: estimate Sheffield young carer volumes yc_estimate_10pc <- sheffield_pupil_population_20241 * 0.1 ``` Applying the 10% figure to Sheffield's pupil population would indicate over 7000 young carers in the city. 
Our local data identifies just `r young_carers |> select(stud_id) |> distinct() |> tally() |> pull()` since 2020, so we provide the analysis here with the following caveat:

::: callout-caution
## data on young carers

The data used in this section of the report comes from *young carer* type involvements in Capita One, covering around 900 children from 2020 onwards. Clearly our data doesn't capture all young carers (and may skew towards those at the more severe end of the caring spectrum) and/or we are working with different definitions of what a *young carer* is. Issues with getting people of all ages to self-identify as carers are well known, and the perceived stigma attached to caring roles is likely more acute in young people - indeed this is probably a factor in explaining differences in school attendance.
:::

The involvements have an open date but no close date, so a time series analysis of volumes isn't possible, and the data implicitly assumes that a young carer remains so for the rest of their school career. A descriptive or demographic analysis may also be misleading, but we can make a comparison of attendance rates, which shows a significant impact. Primary age young carers attend just under 4% less than those without a caring role.
In secondary school this gap rises to 10%: ```{r} #| label: calculate young carers summary data # can't do a time series on volumes as there are no close dates # yc_time_series <- # seq(ymd('2015-04-01'),ymd('2024-07-1'), by = '3 months') # young_carers attend_yc_phase <- attend |> filter(ncy >= 1, ncy <= 11, !phase %in% c("Nursery","6th form"), year >= 2020) |> left_join(young_carers, join_by(stud_id == stud_id, ht_start_date >= open_date)) |> mutate(yc_flag = replace_na(code_des,"not young carer")) |> group_by(yc_flag, phase) |> presence_mean_calc() attend_yc_ncy <- attend |> filter(ncy >= 1, ncy <= 11, year >= 2020) |> left_join(young_carers, join_by(stud_id == stud_id, ht_start_date >= open_date)) |> mutate(yc_flag = replace_na(code_des,"not young carer")) |> group_by(yc_flag, ncy) |> presence_mean_calc() attend_yc_year <- attend |> filter(ncy >= 1, ncy <= 11, year >= 2020) |> left_join(young_carers, join_by(stud_id == stud_id, ht_start_date >= open_date)) |> mutate(yc_flag = replace_na(code_des,"not young carer")) |> group_by(yc_flag, year) |> presence_mean_calc() ``` ```{r} #| label: plot attendance by young carers & phase #| fig-height: 3 ggplot(attend_yc_phase, aes(x = reorder(yc_flag,mean.percent_present, desc = TRUE), y = mean.percent_present )) + geom_col(fill = "steel blue", position = position_dodge(0.9))+ geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, position = position_dodge(0.9))+ geom_text(aes(label = scales::percent(round(mean.percent_present,3))), hjust = 1, colour = "white", size = 5, position = position_dodge(0.9)) + labs(title = "Attendance of young carers", subtitle = "Avg % sessions attended, 2023 +-95CI; young carer status from capita one involvements", caption = "data from Capita One")+ barplottheme_minimal + theme(axis.title.x = eb, axis.text.y = element_text(size = 8), axis.text.x = eb, legend.position = "none", plot.subtitle = element_text(size = 8, face = "italic"), strip.background = 
eb) + coord_flip() + facet_grid(cols = vars(phase)) ``` As we did for deprivation quartiles above, we can create an age profile of attendance for young carers, and compare it to pupils with no caring role. Again we see the greater impact on attendance as age increases, and presumably the expectations and stigmatisation around caring roles also increases. There is a particular drop in attendance going into year 8. ```{r} ggplot(attend_yc_ncy, aes(x = ncy, y = mean.percent_present, colour = yc_flag, group = yc_flag, #label = label )) + geom_point() + geom_line() + #geom_label_repel(hjust = 0, nudge_y = c(0.05,0.02,0.02), min.segment.length = Inf, alpha = 0.8) + geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.7)+ scale_y_continuous(labels = scales::percent) + scale_x_continuous(breaks = seq(1,11)) + labs(title = "Attendance of young carers and those without a caring role", subtitle = "average % of sessions attended since 2020; young carers data from Capita One involvements", #caption = "data from Capita One" )+ theme(plot.title = element_markdown(size = 12), legend.position = "none", axis.title.x = eb) + MetBrewer::scale_fill_met_d("Kandinsky") + MetBrewer::scale_colour_met_d("Kandinsky") + theme(axis.title.y = eb, legend.position = "none") + geom_vline(aes(xintercept = 6.5), linetype = "dotted", colour = "gray70", linewidth = 1.2) + annotate("text", label = "primary", y = 0.99, x = 3.5, colour = "gray40") + annotate("text", label = "secondary", y = 0.99, x = 9, colour = "gray40") ``` Along with other groups, the attendance of young carers improved into 2025. 
Note that some of the decline seen here may be a function of the cumulative nature of the data, which has no end dates attached, so our cohort of young carers is ageing in the system.

```{r}
#| label: plot attendance by young carers over time
#| fig-height: 3
ggplot(attend_yc_year |> filter(year > 2020),
       aes(x = year, y = mean.percent_present, colour = yc_flag, group = yc_flag)) +
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin = lower.ci.percent_present, ymax = upper.ci.percent_present), width = 0.2, alpha = 0.7) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Attendance of young carers and those without a caring role",
       subtitle = "average % of sessions attended since 2020; young carers data from Capita One involvements; 2025 data is part year",
       caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12), legend.position = "none", axis.title = eb, axis.line = eb, axis.ticks = eb) +
  MetBrewer::scale_fill_met_d("Kandinsky") +
  MetBrewer::scale_colour_met_d("Kandinsky")
```

::: callout-tip
## recommendation

Better long term data is required to understand volumes, impacts & the geographical distribution of young carers, as well as change over time and the provision of services to young carers.
:::

# Trends by annual cohort

Here we show how attendance is changing for each annual year group cohort of children, and explore some of the intersectionality between age, deprivation and special educational needs. This analysis particularly demonstrates differences in how the COVID pandemic, lockdowns and subsequent societal shifts have affected different groups. Annual cohorts of children are referred to here as, for example, the "class of 2025", meaning the year group who began year 1 in September 2014 and will complete Y11 in July 2025. In each case there is a separate small line chart for each annual cohort.
Data are labelled with the academic year and the % attendance rate, and the time period is divided into three phases: *pre* pandemic, *during* (2020 & 2021), and *post* pandemic (all years since). The time periods are denoted by colours or shapes, depending on the chart.

The first chart shows the overall picture in secondary schools. The first cohort shown here is the class of 2020, who completed most of Y11 before the pandemic struck. Their GCSE exams were wildly disrupted, but their attendance follows only a shallow decline from Y7 through to Y11. By contrast, the classes of '23 to '25 (on the middle row) saw dramatic drops during the COVID years, and a *continued decline* in the period since. The classes of '24 and '25 were perhaps hit hardest by the pandemic, effectively missing Y6-7 and Y7-8 respectively. Finally, the bottom row shows the latest three cohorts and some small but encouraging signs of recovery: the class of '27 have less of a drop-off into Y8, and the class of '28 had the best attendance in Y7 since before the pandemic.
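The cohort and time-period labels described above can be sketched as follows. In the report these come from the data model (`class_of`) and from a `case_when()` in each chunk; the helper functions below are hypothetical and simply restate the definitions in the text, assuming `year` is the academic (exam) year and `ncy` the national curriculum year.

```r
# "Class of" = the academic year in which the cohort completes Y11
# (hypothetical helper restating the definition in the text)
class_of <- function(year, ncy) {
  year + (11 - ncy)
}

# Pandemic time period, mirroring the case_when() used in the report's chunks
covid_period <- function(year) {
  period <- ifelse(year < 2020, "pre-COVID",
                   ifelse(year <= 2021, "lockdown years", "post-pandemic"))
  factor(period, levels = c("pre-COVID", "lockdown years", "post-pandemic"))
}

# Toy rows: Y1 in 2014/15, Y11 in 2019/20, Y8 in 2020/21, Y7 in 2023/24
toy <- data.frame(year = c(2015, 2020, 2021, 2024),
                  ncy  = c(1, 11, 8, 7))
toy$class_of        <- class_of(toy$year, toy$ncy)
toy$covid_year_flag <- covid_period(toy$year)
toy
```

The first row reproduces the example in the text: children in Y1 during 2014/15 are the class of 2025.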
```{r} #| label: plot attendance by annual cohort & ncy - secondary #| fig-height: 8 #| warning: false #| message: false annual_cohort_theme <- theme( legend.position = "top",legend.direction = "vertical",legend.box = "horizontal", legend.title = element_text(size = 8), legend.text = element_text(size = 8), axis.title = eb, strip.background = eb, axis.line.y = eb, axis.text.y = eb, axis.ticks.y = eb) attend |> mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |> filter(cohort >= 2010, phase == "Secondary") |> summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","phase")) |> filter(ncy <= 11 & ncy >= 1, child_count > 1000 ) |> ggplot(aes(x = ncy, y = percent_present, colour = covid_year_flag, group = class_of )) + geom_point() + geom_line() + scale_colour_manual(values = c("pre-COVID" = "#4DAF4A" ,"lockdown years" = "#E41A1C", "post-pandemic" = "#377EB8")) + geom_text(aes(label = year), size = 2.5, nudge_y = -0.02, colour = "darkgrey") + geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.02, alpha = 0.7) + facet_wrap(vars(class_of)) + annual_cohort_theme + theme(axis.title.x = element_text()) + labs(title = "Secondary school attendance by national curriculum year and annual cohort", subtitle = "% of available sessions attended; all Sheffield schools; pandemic years are 2020 and 2021", x = "NCY", colour = "pandemic time period", caption = "data from Capita One") + coord_cartesian(clip = "off") ``` The picture in primary schools looks very different. Children generally attend better in years 2 to 4 than they do in Y1, so the underlying profile is more of a hump than a steady decline seen in secondary. The pandemic had a less dramatic effect on primary age children, and the decline also persisted into the post-pandemic years for many cohorts. 
However the big difference here, and an encouraging sign for the future, is that all cohorts from the class of '29 onwards show improvements in recent years (here coloured blue), and that the youngest cohorts are showing the fastest improvements of all. ```{r} #| label: plot attendance by annual cohort & ncy - primary #| fig-height: 8 #| warning: false attend |> #select(-phase) |> #rename(phase = school_ed_phase_corrected) |> mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |> mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |> filter(cohort >= 2016, phase == "Primary") |> summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","phase")) |> filter(ncy <= 11 & ncy >= 1, child_count > 1000 ) |> ggplot(aes(x = ncy, y = percent_present, colour = covid_year_flag, group = class_of )) + geom_point() + geom_line() + scale_colour_manual(values = c("pre-COVID" = "#4DAF4A" ,"lockdown years" = "#E41A1C", "post-pandemic" = "#377EB8")) + geom_text(aes(label = year), size = 2.5, nudge_y = -0.005, colour = "darkgrey") + geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.005, alpha = 0.7) + scale_x_continuous(breaks = seq(1:6)) + coord_cartesian(expand = FALSE, clip = "off") + facet_wrap(vars(class_of))+ annual_cohort_theme + theme(axis.title.x = element_text()) + labs(title = "Primary school attendance by national curriculum year and annual cohort", subtitle = "% of available sessions attended; all Sheffield schools; pandemic years are 2020 and 2021", x = "NCY", colour = "time period: ", caption = "data from Capita One") + coord_cartesian(clip = "off") ``` Re-creating the same plot but split by deprivation quartile, it becomes clear how the effects of the pandemic were concentrated in the more deprived areas of the city. 
Here the middle two quartiles of deprivation have been removed, and the pairs of lines show the most and least deprived quartiles of the school population, according to the 2019 indices of multiple deprivation scores of their ward of residence. For all annual cohorts the gap is stark: children living in more deprived areas were worse affected during the pandemic and have seen worse post-pandemic declines in attendance. If there is good news here, it is a narrowing of the gap in the latest Y7 intake.

```{r}
#| label: plot attendance by deprivation, annual cohort & ncy - secondary
#| fig-height: 8
#| warning: false
plot_data <- attend |>
  left_join(stud_details_joined, by = "stud_id") |>
  filter(imd_quartile %in% c(1,4)) |>
  mutate(imd_quartile = case_when(imd_quartile == 1 ~ "least deprived",
                                  imd_quartile == 4 ~ "most deprived")) |>
  mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID",
                                     year == 2020 ~ "lockdown years",
                                     year == 2021 ~ "lockdown years",
                                     year > 2021 ~ "post-pandemic")) |>
  mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |>
  filter(cohort >= 2010, phase == "Secondary") |>
  ungroup() |>
  summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag", "phase", "imd_quartile" )) |>
  filter(ncy <= 11 & ncy >= 1, child_count > 100 )

ggplot(plot_data, aes(x = ncy, y = percent_present, colour = imd_quartile, group = imd_quartile, label = year, shape = covid_year_flag)) +
  geom_point(size = 2.5) +
  scale_shape_manual(values = c("pre-COVID" = 1, "lockdown years" = 8, "post-pandemic" = 4)) +
  geom_line() +
  # imd_quartile was recoded to text labels above, so filter on the label
  # (a comparison with 4 would match no rows and drop the year labels)
  geom_text(data = plot_data |> filter(imd_quartile == "most deprived"),
            aes(label = year), size = 2.5, nudge_y = -0.02, colour = "darkgrey") +
  geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1)), size = 2.5, nudge_y = 0.02, alpha = 0.7) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(vars(class_of))+
  annual_cohort_theme +
  theme(strip.text = element_text(size = 7),
legend.text = element_text(size = 7)) + labs(title = "Secondary school attendance over time by annual cohort, national curriculum year and deprivation quartile", subtitle = "% of available sessions attended, top & bottom 25% by 2019 Indices of Multiple Deprivation (IMD) score of ward of residence", caption = "data from Capita One", shape = "COVID time period", colour = "IMD quartile") + coord_cartesian(clip = "off") ``` Repeating the same deprivation analysis for primary, and again we see how the pandemic disproportionately affected children in more deprived areas, with steeper dropoffs during the lockdown years. But we can also see recovery after the pandemic, for all cohorts and with steeper rates of increase for children in more deprived areas - but the deprivation gap still remains. ```{r} #| label: plot attendance by deprivation, annual cohort & ncy - primary #| fig-height: 8 #| warning: false plot_data <- attend |> #select(-phase) |> left_join(stud_details_joined, by = "stud_id") |> filter(imd_quartile %in% c(1,4)) |> mutate(imd_quartile = case_when(imd_quartile == 1 ~ "least deprived", imd_quartile == 4 ~ "most deprived")) |> #rename(phase = school_ed_phase_corrected) |> mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |> mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |> filter(cohort >= 2016, phase == "Primary") |> ungroup() |> summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","phase","imd_quartile" )) |> filter(ncy <= 11 & ncy >= 1, child_count > 100 ) |> group_by(class_of) ggplot(plot_data, aes(x = ncy, y = percent_present, colour = imd_quartile, group = imd_quartile, label = year, shape = covid_year_flag)) + geom_point(size = 3) + scale_shape_manual(values = c("pre-COVID" = 1, "lockdown years" = 8, "post-pandemic" = 4)) + geom_line() + geom_text(data = 
plot_data |> filter(imd_quartile == 4), aes(label = year), size = 2.5, nudge_y = -0.01, colour = "darkgrey") + geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.01, alpha = 0.7) + scale_y_continuous(labels = scales::percent) + scale_x_continuous(breaks = seq(1,6))+ facet_wrap(vars(class_of))+ annual_cohort_theme + theme(axis.title.x = element_text()) + labs(title = "Primary school attendance over time by annual cohort, national curriculum year and deprivation quartile", subtitle = "% of available sessions attended, top & bottom 25% by 2019 Indices of Multiple Deprivation (IMD) score of ward of residence", caption = "data from Capita One", shape = "COVID time period", colour = "IMD quartile") + coord_cartesian(clip = "off") ``` ```{r} #| label: plot attendance by cohort & SEN level - secondary #| fig-height: 8 #| warning: false #| message: false plot_data <- attend |> mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |> mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |> filter(cohort >= 2010, phase == "Secondary", ncy >= 7, ncy <= 11) |> ungroup() |> summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","sen_level" )) |> filter(ncy <= 11, ncy >= 1) #group_by(class_of) # because children sometimes repeat years, there are some small groups who attended a given NCY in a different year to the rest of their cohort. 
Here we'll remove those: # find the top rows by child count top_rows = plot_data |> group_by(ncy, class_of, sen_level) |> slice_max(child_count) |> ungroup() # filter the plot data to remove smaller groups plot_data = plot_data |> inner_join(top_rows) |> filter(child_count > 100) # plot ggplot(plot_data, aes(x = ncy, y = percent_present, colour = sen_level, group = sen_level, label = year, shape = covid_year_flag)) + geom_point(size = 3) + scale_shape_manual(values = c("pre-COVID" = 1, "lockdown years" = 8, "post-pandemic" = 4)) + geom_line() + geom_text(data = plot_data |> filter(sen_level == "EHCP"), aes(label = year), size = 2.5, nudge_y = -0.05, colour = "darkgrey") + geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.01, alpha = 0.7) + scale_y_continuous(labels = scales::percent) + scale_x_continuous(breaks = seq(7,11))+ facet_wrap(vars(class_of))+ annual_cohort_theme + theme(axis.title.x = element_text()) + labs(title = "Secondary school attendance over time by annual cohort, national curriculum year, and SEN level", subtitle = "% of available sessions attended, top & bottom 25% by 2019 Indices of Multiple Deprivation (IMD) score of ward of residence", caption = "data from Capita One", shape = "COVID time period", colour = "SEN level") ``` ```{r} #| label: plot attendance by cohort & SEN level - primary #| fig-height: 8 #| warning: false #| message: false plot_data <- attend |> mutate(covid_year_flag = case_when(year < 2020 ~ "pre-COVID", year == 2020 ~ "lockdown years", year == 2021 ~ "lockdown years", year > 2021 ~ "post-pandemic")) |> mutate(covid_year_flag = fct_relevel(covid_year_flag, c("pre-COVID","lockdown years","post-pandemic"))) |> filter(cohort >= 2016, phase == "Primary") |> ungroup() |> summarise_attendance(grouping_vars = c("ncy","class_of","year","covid_year_flag","phase","sen_level" )) |> filter(ncy <= 11 & ncy >= 1) |> group_by(class_of) # because children sometimes repeat years, there are some 
small groups who attended a given NCY in a different year to the rest of their cohort. Here we'll remove those: # find the top rows by child count top_rows = plot_data |> group_by(ncy, class_of, sen_level) |> slice_max(child_count) |> ungroup() # filter the plot data to remove smaller groups plot_data = plot_data |> inner_join(top_rows) |> filter(child_count > 100) ggplot(plot_data, aes(x = ncy, y = percent_present, colour = sen_level, group = sen_level, label = year, shape = covid_year_flag)) + geom_point(size = 2) + scale_shape_manual(values = c("pre-COVID" = 1, "lockdown years" = 8, "post-pandemic" = 4)) + geom_line() + #geom_text(data = plot_data |> filter(imd_quartile == 4), # aes(label = year), size = 2.5, nudge_y = -0.01, colour = "darkgrey") + geom_text(aes(label = scales::percent(percent_present, accuracy = 0.1L)), size = 2.5, nudge_y = 0.01, alpha = 0.7) + scale_y_continuous(labels = scales::percent) + scale_x_continuous(breaks = seq(1,6))+ facet_wrap(vars(class_of)) + annual_cohort_theme + theme(axis.title.x = element_text()) + labs(title = "Primary school attendance over time by national curriculum year, and SEN level", subtitle = "% of available sessions attended, top & bottom 25% by 2019 Indices of Multiple Deprivation (IMD) score of ward of residence", caption = "data from Capita One", shape = "COVID time period", colour = "SEN level") ``` # Severe absences Children are classed as severely absent if they miss over 50% of available sessions in any given period. This section explores the characteristics of severely absent children, and how this is changing over time. ::: callout-important Almost 1 in 20 children at Sheffield secondary schools was severely absent in 2023. ::: Severe absences in secondary schools appear to have peaked in 2024. 
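As a minimal sketch of the definition at the top of this section, the severe absence flag can be derived from session counts as below. The data frame and column names (`sessions_possible`, `sessions_absent`) are hypothetical and used only for illustration; they are not the names used in the data model.

```r
# Toy data, one row per pupil for a single year.
# Column names are illustrative assumptions, not the data model's own.
sessions <- data.frame(
  stud_id           = c("a", "b", "c", "d"),
  sessions_possible = c(380, 380, 200, 380),
  sessions_absent   = c(19, 200, 100, 190)
)

# Share of available sessions missed
sessions$absence_rate <- sessions$sessions_absent / sessions$sessions_possible

# "Over 50%" is read here as a strict inequality,
# so a pupil missing exactly half of their sessions is not flagged
sessions$severe_absence <- as.integer(sessions$absence_rate > 0.5)

# Share of pupils who are severely absent
pc_severely_absent <- mean(sessions$severe_absence)
```

Whether a pupil at exactly 50% counts as severely absent is a boundary choice. The strict inequality follows the wording "miss over 50%" used here.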

```{r}
#| label: plot severe absences by year and phase
#| fig-height: 4

ggplot(attend_year_phase |>
         filter(phase %in% c("Primary","Secondary"),
                year >= 2018) |>
         mutate(grey_flag = if_else(year==2025,0,1)),
       aes(x = year, y = pc_of_pupils_severely_absent, alpha = grey_flag, colour = phase)) +
  geom_point() +
  geom_line(linetype = "dashed", alpha = 0.5) +
  geom_text(aes(label = year), size = 3, vjust = 1.5) +
  barplottheme_minimal +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Severe absence by academic year in primary and secondary schools",
       subtitle = "percentage of pupils missing over half of available sessions; 2025 is part year",
       caption = "data from Capita One") +
  annotate("text", x = 2020.3, y = 0.045, label = "COVID-19", size = 3, hjust = 1.1) +
  geom_vline(xintercept = 2020.3, linetype = "longdash", colour = "light gray") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.text.x = eb,
        axis.title.x = eb,
        axis.title.y = eb) +
  MetBrewer::scale_fill_met_d("Egypt")
```

Next we look at the severe absence rates of groups with different characteristics in 2023-24. The groupings here are chosen as those that show significant differences in severe absence rates. Note that the characteristics given here are not mutually exclusive.

Children with an EHCP were nearly 8% more likely to be severely absent than average. Children in Y11 have twice the average rate. All primary years and a few ethnic groups have significantly lower severe absence rates.

```{r severe absence characteristics} sev_pc_all <- attend_stud_year |> filter(year == 2024) |> group_by(severe_absence) |> tally() |> mutate(pc = n / sum(n), category = "all children") |> filter(severe_absence == 1) |> select(pc) |> pull() sev_pc_eth_cat <- attend_stud_year_ethcat |> filter(year == 2024) |> group_by(ethnicity_category, severe_absence) |> tally() |> mutate(pc = n / sum(n)) |> mutate(category = str_c("ethnicity category ",ethnicity_category)) |> ungroup() |> select(-ethnicity_category) |> filter(severe_absence == 1) sev_pc_eth_cat_2 <- attend |> filter(year == 2024) |> summarise_attendance(grouping_vars = "ethnicity_category") sev_pc_gender <- attend_stud_year_gender |> filter(year == 2024) |> group_by(gender, severe_absence) |> tally() |> mutate(pc = n / sum(n)) |> mutate(category = str_c("gender ",gender)) |> ungroup() |> select(-gender) |> filter(severe_absence == 1) sev_pc_ncy <- attend_stud_year_ncy |> filter(year == 2024) |> group_by(ncy, severe_absence) |> tally() |> mutate(pc = n / sum(n)) |> mutate(category = str_c("ncy ",ncy)) |> ungroup() |> select(-ncy) |> filter(severe_absence == 1) sev_pc_fsm <- attend_stud_year_fsm |> filter(year == 2024) |> group_by(fsm, severe_absence) |> tally() |> mutate(pc = n / sum(n)) |> mutate(category = if_else(fsm == 1, "free school meals", "not on free school meals")) |> ungroup() |> select(-fsm) |> filter(severe_absence == 1) sev_pc_sen_level <- attend_stud_year_sen_level |> filter(year == 2024) |> group_by(sen_level, severe_absence) |> tally() |> mutate(pc = n / sum(n)) |> mutate(category = str_c("SEN level - ",sen_level)) |> ungroup() |> select(-sen_level) |> filter(severe_absence == 1) sev_plot_data <- rbind( #sev_pc_all, sev_pc_eth_cat, #sev_pc_gender, sev_pc_ncy, sev_pc_fsm, sev_pc_sen_level ) |> filter(!is.na(category)) ggplot(sev_plot_data, aes(x = reorder(category,pc), y = pc, fill = pc)) + geom_col() + geom_text(aes(label = scales::percent(pc, accuracy = 1.1L)), size = 2.5, colour = "darkgrey", 
nudge_y = 0.004) + scale_y_continuous(labels = scales::percent) + theme(axis.title = eb, legend.position = "none", axis.text.y = element_text(size = 7.5)) + labs(title = "Severe absence rates by selected pupil characteristics", subtitle = "% of children in each group attending less than 50% of available sessions in 2024") + geom_hline(aes(yintercept = sev_pc_all), linetype = "dotted") + geom_text(label = str_c("all pupils ",scales::percent(sev_pc_all, accuracy = 1.1L)), x = 4.5, y = 0.045, size = 3, colour = "dark gray") + coord_flip() + scale_fill_distiller(palette = "Spectral") ``` The chart above shows relative severe absence *rates* of different groups, but we'll complement that by quantifying the cohort of severely absent pupils in 2023 by their characteristics. ```{r} #| label: get severe absence 2023 characteristics data sa_2023 <- attend |> filter(year == 2023, ncy >= 1, ncy <= 11, severe_absence == 1 ) |> left_join(stud_details_joined |> select(stud_id, imd_quartile), by = "stud_id") |> select(stud_id, gender, ncy, imd_quartile, primary_specific_need) |> group_by(stud_id) |> slice(1) ``` ```{r} #| label: waffle plot severe absence by ncy sa_2023 |> mutate(ncy = factor(ncy, levels = c(1,2,3,4,5,6,7,8,9,10,11))) |> group_by(ncy) |> count() |> ggplot(aes(fill = ncy, values = n)) + expand_limits(x=c(0,0), y=c(0,0)) + coord_equal() + labs( title = "Severely absent children in Sheffield, by national curriculum year", subtitle = "Pupils missing over 50% of sessions in 2022-23", fill = NULL, colour = NULL) + #theme_ipsum_rc(grid="") + theme_enhance_waffle() + #theme(axis.line = eb, axis.text = eb, axis.ticks = eb) + geom_waffle( size = 0.5, n_rows = 10, colour = "white", #radius = unit(1, "pt") flip = TRUE#, #make_proportional = TRUE ) + facet_grid(~ncy) + theme(axis.line = eb, axis.ticks = eb, axis.text = eb, legend.position = "none") ``` ```{r} #| label: waffle plot severe absence by imd quartile sa_2023 |> filter(!is.na(imd_quartile)) |> mutate(imd_quartile = 
factor(imd_quartile, levels = c(1,2,3,4))) |> group_by(imd_quartile) |> count() |> ggplot(aes(fill = imd_quartile, values = n)) + expand_limits(x=c(0,0), y=c(0,0)) + coord_equal() + labs( title = "Severely absent children in Sheffield, by deprivation quartile", subtitle = "Pupils missing over 50% of sessions in 2022-23", fill = NULL, colour = NULL) + #theme_ipsum_rc(grid="") + theme_enhance_waffle() + #theme(axis.line = eb, axis.text = eb, axis.ticks = eb) + geom_waffle( size = 0.5, n_rows = 40, colour = "white", #radius = unit(1, "pt") flip = TRUE#, #make_proportional = TRUE ) + geom_text(aes(x = c(1,2,3,4), y = (n / 40) + 2, label = n), nudge_x = 27, size = 2.5) + facet_grid(~imd_quartile) + theme(axis.line = eb, axis.ticks = eb, axis.text = eb, legend.position = "none") ``` ```{r} #| label: waffle plot severe absence by sen specific need sa_2023 |> mutate(primary_specific_need = replace_na(primary_specific_need, "No SEN")) |> mutate(primary_specific_need = factor(primary_specific_need)) |> group_by(primary_specific_need) |> tally() |> mutate(primary_specific_need = reorder(primary_specific_need,desc(n))) |> ggplot(aes(fill = primary_specific_need, area = n, label = paste(primary_specific_need,n, sep = "\n"))) + labs(title = "Severely absent children in Sheffield, by primary specific need", subtitle = "Pupils missing over 50% of sessions in 2022-23", fill = NULL, colour = NULL) + geom_treemap() + geom_treemap_text(place = "centre", size = 8, force.print.labels = TRUE, reflow = TRUE) + theme(legend.position = "none") ``` ## Severe absence - turnover and retention It seems likely that there are children for whom severe absence is for some reason a persistent behaviour, and children for whom a severe absence happens in one or more years for some specific reason - like a crisis of health or personal circumstances. 
To try to understand this, we looked at year on year turnover and retention in the cohort of severely absent children. In the chart below, severely absent children are classed as *retained* if they were also severely absent the year before, and *new* if not. Both categories have risen in recent years:

```{r}
#| label: plot severely absent children by retention category
#| warning: false
#| message: false
#| fig-height: 3.5

# severely absent secondary pupils, one row per pupil per year
sa <- attend_stud_year |>
  left_join(stud_details_joined) |> #might want this but not yet
  left_join(attend |> select(stud_id, year, school_ed_phase_corrected) |> distinct()) |>
  filter(severe_absence == 1,
         school_ed_phase_corrected == "Secondary") |>
  select(stud_id, year) |>
  mutate(sa = 1,
         prev_year = year - 1)

# self-join to find pupils who were also severely absent the year before
sa_yoy <- sa |>
  left_join(sa |> select(-prev_year) |> rename(retained = sa),
            join_by(stud_id == stud_id, prev_year == year)) |>
  mutate(
    retained = if_else(is.na(retained),0,1),
    new = if_else(retained == 0,1,0)
  )

sa_yoy_crunched <- sa_yoy |>
  group_by(year) |>
  summarise(total = sum(sa),
            new = sum(new),
            retained = sum(retained),
            pc_retained = sum(retained) / sum(sa)) |>
  pivot_longer(cols = -year, names_to = "category", values_to = "value") |>
  filter(year > 2006)

ggplot(sa_yoy_crunched |>
         filter(year >= 2018,
                category %in% c("new","retained")),
       aes(x = year, y = value, colour = category, group = category)) +
  geom_point() +
  geom_line() +
  labs(title = "Severely absent children: new in the year and retained from the previous year",
       subtitle = "Secondary provision only; count of children attending less than 50% of available sessions; 2024 data excludes the summer term",
       caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title = eb) +
  MetBrewer::scale_fill_met_d("Egypt")
```

So the problem of severe absence is, in part, due to a cohort we could describe as *chronically severely absent*.
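The *new* / *retained* classification described above can be illustrated on a handful of made-up records. This is a simplified base R sketch rather than the report's own code: the pupil IDs and years are invented, and each row stands for one pupil who was severely absent in that year.

```r
# Toy data: one row per pupil per year in which they were severely absent.
# IDs and years are invented for illustration only.
sa <- data.frame(
  stud_id = c("a", "b", "a", "c", "a", "c", "d"),
  year    = c(2022, 2022, 2023, 2023, 2024, 2024, 2024)
)

# Build a pupil-year key, and the same key shifted back one year
key      <- paste(sa$stud_id, sa$year)
prev_key <- paste(sa$stud_id, sa$year - 1)

# A pupil is "retained" if they also appear in the previous year, otherwise "new"
sa$retained <- as.integer(prev_key %in% key)
sa$new      <- 1L - sa$retained

# Count each category by year and derive the share retained
summary_by_year <- aggregate(cbind(new, retained) ~ year, data = sa, FUN = sum)
summary_by_year$pc_retained <-
  summary_by_year$retained / (summary_by_year$new + summary_by_year$retained)
```

Every pupil in the earliest year of the data is necessarily classed as *new*, because there is no earlier year to compare against. The first year of any such series should be read with that in mind.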
The retention rate here is calculated as the percentage of all severely absent pupils in a given year who were also severely absent the year before. In secondary schools, around 40% of children who were severely absent in 2023 were also severely absent in 2022. This retention rate has risen in recent years:

```{r}
#| label: plot severely absent retention rate over time
#| fig-height: 3

ggplot(sa_yoy_crunched |>
         filter(year >= 2018,
                category == "pc_retained"),
       aes(x = year, y = value, label = scales::percent(value, accuracy = 1.1L))) +
  geom_point() +
  geom_line(linetype = "dotted") +
  geom_text(size = 2.5, nudge_y = -0.02) +
  #geom_text(aes(label = scales::percent(pc_retained, accuracy = 1.1L, size = 3))) +
  #geom_col(position = position_stack()) +
  labs(title = "Year on year severe absence retention rate (secondary)",
       subtitle = "% of severely absent children who were severely absent in the previous year",
       caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title = eb,
        axis.line.y = eb,
        axis.text.y = eb,
        axis.ticks.y = eb) +
  MetBrewer::scale_fill_met_d("Egypt")
```

Plotting the retention rate by NCY shows increased year on year retention as children grow older. Here we've included the NCY profiles of two years: 2018 and 2024, showing the increased retention rates across the board into 2024.

```{r} #| label: plot sa retention rate by NCY sa_ncy <- attend_stud_year_ncy |> left_join(stud_details_joined) |> #might want this but not yet left_join(attend |> select(stud_id, year, school_ed_phase_corrected) |> distinct()) |> filter(severe_absence == 1, #school_ed_phase_corrected == "Secondary", ncy >= 6, ncy <= 11) |> select(stud_id, year, ncy) |> mutate(sa = 1, prev_year = year - 1) sa_yoy <- sa_ncy |> left_join(sa_ncy |> select(-prev_year, -ncy) |> rename(retained = sa), join_by(stud_id == stud_id, prev_year == year)) |> mutate( retained = if_else(is.na(retained),0,1), new = if_else(retained == 0,1,0) ) |> filter(ncy >= 7) sa_yoy_ncy_crunched <- sa_yoy |> filter(year %in% c(2018, #2019, #2020, #2021, #2022, #2023, 2024)) |> group_by(year, ncy) |> summarise(total = sum(sa), new = sum(new), retained = sum(retained), pc_retained = sum(retained) / sum(sa)) |> #pivot_longer(cols = c(-year,-ncy), # names_to = "category", # values_to = "value") |> mutate(year = factor(year)) |> mutate(label = if_else(ncy == max(ncy), year, NA_character_)) ggplot(sa_yoy_ncy_crunched,#|> filter(category == "pc_retained"), aes(x = ncy, y = pc_retained, colour = year, group = year, label = label)) + geom_point() + geom_line() + geom_label_repel() + scale_y_continuous(labels = scales::percent)+ labs(title = "Severely absent children - year on year retention rate by NCY", subtitle = "Secondary schools only; of children severely absent for the year, the % who were also severely absent the previous year", caption = "data from Capita One") + theme(plot.title = element_markdown(size = 12), legend.position = "none", axis.title = eb) ``` # Daily attendance patterns The analysis so far in this report has used data aggregated up to the half term or annual level. During the course of this project we processed the raw daily data (recorded as a string of symbols and codes) to allow analysis of attendance at the level of the individual day. 
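The raw register format is not shown in this report, so the following is only a sketch of the kind of processing involved. It assumes a hypothetical layout in which each pupil has a single string with one character per session, in date order, with two sessions (AM and PM) per school day. `/` and `\` are taken to be the register marks for morning and afternoon presence, and I, N and O are the absence codes referred to later in this section. The start date and the mark string are invented.

```r
# Hypothetical register string: one character per session, AM then PM for each day.
# "\\" is a single backslash once parsed by R.
marks      <- "/\\/\\IINO/\\"
start_date <- as.Date("2023-09-04")  # illustrative start date (a Monday)

# Split the string into one code per session
codes  <- strsplit(marks, "")[[1]]
n_days <- length(codes) / 2

# One row per session, with the date repeated for AM and PM
sessions <- data.frame(
  date    = rep(start_date + seq_len(n_days) - 1, each = 2),
  session = rep(c("AM", "PM"), times = n_days),
  code    = codes
)

# Flag the categories of interest
sessions$present   <- as.integer(sessions$code %in% c("/", "\\"))
sessions$illness   <- as.integer(sessions$code == "I")
sessions$no_reason <- as.integer(sessions$code %in% c("N", "O"))

# Aggregate to one row per day
daily <- aggregate(cbind(present, illness, no_reason) ~ date,
                   data = sessions, FUN = sum)
daily$percent_present <- daily$present / 2
```

A real register would also need to handle weekends, holidays and planned closures, along with the many other attendance codes. All of these are ignored here.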

## Week day

Fridays (and to a lesser extent Mondays) see significantly lower attendance than the other days of the week.

```{r}
load(file = "S:/Public Health/Policy Performance Communications/Business Intelligence/Projects/EIP/data/inclusion/attendance_inclusion_day_level_data_model_part_1.RData")
load(file = "S:/Public Health/Policy Performance Communications/Business Intelligence/Projects/EIP/data/inclusion/attendance_inclusion_day_level_data_model_part_2.RData")
```

```{r}
#| label: plot attendance by week day

ggplot(attend_weekday_phase |> filter(!week_day %in% c("Sat","Sun")),
       aes(x = week_day,
           y = percent_present,
           #fill = phase,
           label = scales::percent(percent_present, accuracy = 0.1L))
       ) +
  geom_col(fill = "steel blue", position = "dodge") +
  geom_text(position = position_dodge(0.9), colour = "white", size = 3, vjust = 1.5, fontface = "bold")+
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Attendance by day of the week",
       subtitle = "percentage of sessions attended since 2016; all Sheffield pupils",
       caption = "data from Capita One") +
  theme(plot.title = element_markdown(size = 12),
        legend.position = "none",
        axis.title = eb,
        strip.background = eb,
        axis.text.y = eb) +
  facet_grid(cols = vars(phase)) +
  barplottheme_minimal
```

Looking at a time series, we see that Friday's lower attendance is nothing new, and the gap has not really changed over time:

```{r}
ggplot(attend_weekday_phase_year |>
         filter(year <= 2024) |>
         mutate(label = if_else(year == max(year), week_day, NA_character_)),
       aes(x = year, y = percent_present, colour = week_day, group = week_day, label = label)) +
  geom_point() +
  geom_line() +
  geom_label_repel(aes(x = 2023.5), size = 2, vjust = 1, min.segment.length = Inf) +
  facet_wrap(vars(phase), scales = "free_y", nrow = 2) +
  scale_x_continuous(limits = c(2018,2025), breaks = seq(2018,2024)) +
  theme(legend.position = "none",
        axis.title = eb,
        strip.background = eb,
        strip.placement = "top")+
  labs(title = "Attendance by weekday & year")
```

## School attendance across the year

The day level data allows us to visualise an entire school year. Here we see how key points in the year and particular dates affect school attendance. When the data are aggregated to the term level, there is very little seasonal variation, but differences at the day level are more dramatic than the differences we see between demographic groups. In particular, we can see the impacts of:

- the first and last days of term
- a growing absence rate in the run-up to Christmas
- a wave of teachers' strikes
- heavy snowfall in March
- Eid
- the days immediately after bank holidays
- study leave
- increasing absence through the final summer term

```{r}
#| label: plot 2023 full year
#| message: false
#| warning: false
#| fig-height: 8
#| fig-width: 10

ggplot(day_2023 |>
         filter(half_term != 0,
                date != as_date("2023-05-01"),
                !is.na(year),
                total_sessions > 1000) |>
         mutate(label = case_when(
           date == as_date("2022-12-12") ~ "week before Christmas",
           date == as_date("2023-03-10") ~ "heavy snowfall",
           date == as_date("2023-02-01") ~ "teachers strike",
           date == as_date("2023-02-28") ~ "teachers strike",
           date == as_date("2023-03-16") ~ "teachers strikes",
           date == as_date("2023-04-21") ~ "Eid al-Fitr",
           date == as_date("2023-06-21") ~ "study leave",
           date == as_date("2023-06-28") ~ "Eid al-Adha",
           date == as_date("2023-07-21") ~ "end of term",
           TRUE ~ NA_character_)),
       aes(x = date,
           y = 1 - percent_present,
           fill = 1 - percent_present,
           label = label
           )) +
  geom_col() +
  geom_text_repel(fontface = "italic", nudge_x = -1, size = 3, nudge_y = 0.02, colour = "gray40") +
  scale_y_continuous(labels = scales::percent) +
  theme(legend.position = "none",
        strip.background = eb,
        axis.title = eb,
        axis.text.x = element_text(size = 6.5),
        axis.line = eb,
        axis.ticks = eb,
        strip.text = element_text(size = 12)) +
  scale_x_date(date_labels = "%d-%b-%y") +
  scale_fill_viridis_c(option = "viridis", direction = -1) +
  facet_wrap(vars(half_term_name), scales = "free_x") +
  # the y axis is 1 - percent_present, so the subtitle describes sessions missed
  labs(title = "School absence in Sheffield Schools - a full academic year - 2022/23",
       subtitle = "each bar = 1 day; % of available sessions missed; all schools & all pupils")
```

Here is the same chart for the 2023-24 year:

```{r}
#| label: plot 2024 full year
#| message: false
#| warning: false
#| fig-height: 8
#| fig-width: 10

ggplot(day_2024 |>
         filter(half_term != 0,
                #date != as_date("2023-05-01"),
                !is.na(year),
                total_sessions > 1000,
                half_term_name != "summer holiday") |>
         mutate(label = case_when(
           date == as_date("2023-12-18") ~ "week before Christmas",
           #date == as_date("2023-03-10") ~ "heavy snowfall",
           #date == as_date("2023-02-01") ~ "teachers strike",
           #date == as_date("2023-04-21") ~ "Eid al-Fitr",
           date == as_date("2024-06-21") ~ "study leave",
           date == as_date("2024-06-17") ~ "Eid al-Adha",
           date == as_date("2024-07-19") ~ "end of term",
           TRUE ~ NA_character_)),
       aes(x = date,
           y = 1 - percent_present,
           fill = 1 - percent_present,
           label = label
           )) +
  geom_col() +
  geom_text_repel(fontface = "italic", nudge_x = 2, size = 3, nudge_y = 0.02, colour = "gray40") +
  scale_y_continuous(labels = scales::percent) +
  theme(legend.position = "none",
        strip.background = eb,
        axis.title = eb,
        axis.text.x = element_text(size = 6.5),
        axis.line = eb,
        axis.ticks = eb,
        strip.text = element_text(size = 12)) +
  scale_x_date(date_labels = "%d-%b-%y") +
  scale_fill_viridis_c(option = "viridis", direction = -1) +
  facet_wrap(vars(half_term_name), scales = "free_x") +
  # the y axis is 1 - percent_present, so the subtitle describes sessions missed
  labs(title = "School absence in Sheffield Schools - a full academic year - 2023/24",
       subtitle = "each bar = 1 day; % of available sessions missed; all schools & all pupils")
```

Recreating the same plot for absences coded as *illness* (though this time showing the count of sick days rather than the % of available sessions) shows rates increasing dramatically in the run-up to Christmas, peaks on Fridays (and to a lesser extent Mondays) throughout the year, and a significantly lower rate in the summer.
There are also spikes in illness on the last day of each half term (except the summer). This is the plot for 2024 but the pattern is very similar in other years.

```{r}
#| label: plot 2024 full year illness
#| fig-height: 8
#| fig-width: 10

ggplot(attend_daily |>
         filter(year == 2024,
                time_category == "term time",
                half_term_name != "summer holiday"),
       aes(x = date, y = illness, fill = illness)) +
  geom_col() +
  theme(legend.position = "none",
        strip.background = eb,
        axis.title = eb,
        axis.text.x = element_text(angle = 90)) +
  scale_x_date(date_labels = "%d-%b") +
  scale_fill_viridis_c(option= "mako",direction = -1) +
  facet_wrap(vars(half_term_name), scales = "free_x") +
  labs(title = "Daily illness in Sheffield Schools - 2023/24",
       subtitle = "Each bar = 1 day; count of sessions marked code I; all schools & all pupils")
```

The day level *no reason* plot shows a similar shape to the illness plot. We could read this as suggesting that at least some of the *no reason* absences are explained by genuine sickness, although the major spikes on the last days of term may be due to unrecorded family holidays or other absences.

It's worth comparing the 2023 and 2024 plots for *no reason* absences. As well as reduced levels of *no reason* absences throughout the year, 2024 sees much less seasonal variation - such as the steady build up to Christmas - although the end of term spikes are more pronounced.

```{r} #| label: plot 2023 full year no_reason #| fig-height: 8 #| fig-width: 10 ggplot(attend_daily |> filter(year == 2023, time_category == "term time", half_term_name != "summer holiday"), aes(x = date, y = no_reason, fill = no_reason)) + geom_col() + theme(legend.position = "none", strip.background = eb, axis.title = eb, axis.text.x = element_text(angle = 90)) + scale_x_date(date_labels = "%d-%b") + scale_fill_viridis_c(option= "magma", direction = -1) + facet_wrap(vars(half_term_name), scales = "free_x") + labs(title = "Absence with no recorded reason in Sheffield Schools - 2022/23", subtitle = "Each bar = 1 day; count of sessions coded N or O; all schools & all pupils") ``` ```{r} #| label: plot 2024 full year no_reason #| fig-height: 8 #| fig-width: 10 ggplot(attend_daily |> filter(year == 2024, time_category == "term time", half_term_name != "summer holiday"), aes(x = date, y = no_reason, fill = no_reason)) + geom_col() + theme(legend.position = "none", strip.background = eb, axis.title = eb, axis.text.x = element_text(angle = 90)) + scale_x_date(date_labels = "%d-%b") + scale_fill_viridis_c(option= "magma", direction = -1) + facet_wrap(vars(half_term_name), scales = "free_x") + labs(title = "Absence with no recorded reason in Sheffield Schools - 2023/24", subtitle = "Each bar = 1 day; count of sessions coded N or O; all schools & all pupils") ``` # Conclusion School attendance is affected by a multitude of factors: age, economic deprivation, special educational needs, caring responsibilities, the culture of individual schools, the attitude of families and ultimately the children themselves. Factors associated with lower attendance are intersectional and compound each other. The pandemic dominates the recent history of school attendance (and much else besides). COVID-19 lockdowns, social distancing and school closures were all surely transformative in cultural attitudes to school attendance, and the impacts were felt differently in different places. 
However, it would be a mistake to place too much emphasis on COVID-19 alone - deprivation & the cost of living; the rise of smartphones and social media; changes around special educational needs (both prevalences and attitudes) - these are all surely factors, many of which will have influenced one another. Much of this is not recorded in the available data, and the interactions between these forces will be complex.

The good news is that despite the widespread risk factors identified here and despite recent social and cultural shifts, school attendance is recovering. Encouragingly, this recovery is strongest among the youngest cohorts of children. Recent changes to recording and the rules appear to be having an impact, but most inequalities persist, and some continue to widen.

The coming years will tell if school attendance can recover to levels seen before the pandemic, and if the most vulnerable children can be helped to attend school as well as their peers.

This report is one of several produced under the inclusion & attendance data science project - there are also dedicated reports on Special Educational Needs (strategic needs analysis), the impact and effectiveness of services & interventions, and attendance by early years foundation stage attainment. Links to these are at the top of the [SCC Data Science site](https://scc-data-science.sheffield.gov.uk/).

If you have further questions about the data, analysis and narrative in this report please contact the Sheffield City Council Performance & Insight Team, or email [giles.robinson@sheffield.gov.uk](mailto:giles.robinson@sheffield.gov.uk)