Census 2021 outputs: content design and release phase proposals

Closes 5 Oct 2021

Section 4: Proposals for feasibility work to derive new variables

In the previous section, we discussed the new and changed questions and shared our proposals for the new or changed classifications.

In this section, we consider how to combine existing data, often from multiple questions, in new ways to meet a wider range of current user needs. Each of our proposals below is at an early stage of development, and we aim to further understand user needs through this consultation. Following the consultation, we will use the feedback to prioritise the proposals where we have evidence of user need. However, the release of information remains dependent on research proving the resultant variable is of sufficient quality to meet data users’ needs.

Some proposals are about understanding the impact that the coronavirus pandemic has had on different communities and how we all live. Other proposals relate to wider potential data needs, such as better understanding the student population. We’ve grouped these into three areas:

  • education and employment
  • health and living arrangements
  • accommodation type and vacant addresses

Education and employment

We’re investigating seven new variables linked to education and employment.

Route to highest level of education

We’re aware that an important user need linked to education data is to better understand the qualifications route taken. For example, are the qualifications gained academic, vocational or a mixture of both?

In the 2011 Census, we produced a qualification gained variable. However, because of the complexity of this variable, its use was limited. For 2021, we’re investigating producing a consolidated variable that will allow users to interpret the information more easily.

Before developing a proposal for this classification, we need to further understand what information users need about the routes taken through education.


Students

In the 2011 Census, we defined students as respondents in full-time education who were aged four years and over. For 2021, we’ve raised the lower age limit slightly, to those who are aged five years and over.

We’re also proposing an additional variable that is more specific. This could be all students in full-time education who are aged 18 years and over. Alternatively, we could include only independent students. We would do this by defining them as students in full-time education who are aged 18 and over and not living with parents.
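
As an illustration only, the candidate student definitions above could be expressed as simple rules. The following Python sketch assumes a hypothetical person record with the fields `age`, `full_time_student` and `lives_with_parents`. These names are invented for the example and are not Census 2021 variable names.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical fields, invented for this sketch
    age: int
    full_time_student: bool
    lives_with_parents: bool


def is_student_2021(p: Person) -> bool:
    """Full-time education, aged five years and over."""
    return p.full_time_student and p.age >= 5


def is_adult_student(p: Person) -> bool:
    """First option: full-time education, aged 18 years and over."""
    return p.full_time_student and p.age >= 18


def is_independent_student(p: Person) -> bool:
    """Second option: as the first, but not living with parents."""
    return is_adult_student(p) and not p.lives_with_parents
```

Under these rules, every independent student is also an adult student, so the second option always identifies a subset of the first.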

Not in employment, education or training (NEETs)

This indicator variable would highlight the population aged 16 to 24 years who are not in employment, education or training. As the Census 2021 questions are harmonised with the Labour Force Survey (LFS), we’d apply the same definitions.
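
A minimal sketch of the headline rule is shown below, in Python. It is a simplification: the LFS definitions of employment, education and training are more detailed than two yes/no flags, and the parameter names are assumptions made for this example.

```python
def is_neet(age: int, in_employment: bool, in_education_or_training: bool) -> bool:
    """Flag people aged 16 to 24 who are not in employment, education or training.

    The two boolean inputs are hypothetical stand-ins for the fuller
    LFS-harmonised definitions.
    """
    if not 16 <= age <= 24:
        # Outside the age range covered by the indicator
        return False
    return not in_employment and not in_education_or_training
```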

Temporarily away from work

In 2011, the question on activity last week was primarily used to find out if someone was economically active. We’re exploring the creation of “Temporarily away from work ill, on holiday or temporarily laid off” as an indicator variable. This would also include those away from work because of furlough, self-isolation or quarantine.

We’re also investigating the production of an “On maternity or paternity leave” indicator.

Key or critical worker

This proposed indicator would show people whose occupation was critical to the response to the coronavirus pandemic, as defined by the UK government. We’d define this in line with our other publications on critical workers.

Skills mismatch

Skills mismatch is where a person’s educational level is significantly different to the average level of qualification within their occupation. For example, a person would be defined as under-employed in their role if both:

  • their highest level of education is degree level or above
  • the average qualification for their occupation is at level 3, for example an NVQ level 3 or one A level

We did not formally output this information from the 2011 Census, but we have published reports on the topic. These include Graduates’ labour market outcomes during the coronavirus (COVID-19) pandemic: occupational switches and skill mismatch. Some of these publications use the statistical methodology that the International Labour Organisation (ILO) uses. We’d derive this variable using highest level of qualification and occupation.
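
The comparison described in this section could be sketched as follows. This is not the ILO methodology, only an illustration under stated assumptions: qualification levels are coded as numbers (with degree level or above coded as 4), a gap of one level counts as "significantly different" (the text does not set a threshold), and the label `under-qualified` for the opposite direction is invented for the example.

```python
def skills_mismatch(
    person_level: float,
    occupation_average_level: float,
    min_gap: float = 1.0,  # assumed threshold, not taken from the text
) -> str:
    """Compare a person's highest qualification level with the average
    level of qualification within their occupation."""
    gap = person_level - occupation_average_level
    if gap >= min_gap:
        # Education well above the occupation's average
        return "under-employed"
    if gap <= -min_gap:
        # Education well below the occupation's average (assumed label)
        return "under-qualified"
    return "matched"
```

With these assumptions, a person with a degree (level 4) in an occupation averaging level 3 is classed as under-employed, matching the example given above.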

Economic risk created by the coronavirus pandemic indicator

The Business Impact of Coronavirus (COVID-19) Survey has identified the industries most at risk in the pandemic. The survey identifies different risks, such as risk of unemployment, for those working in different sectors.

This variable would apply the findings of that survey to census data to identify populations at financial risk because of the pandemic. For example, a person working in hairdressing could be identified as being at financial risk. It may be possible to produce a more detailed classification than this binary one.
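
In its binary form, the indicator amounts to a lookup from a person's industry to a set of at-risk industries. The Python sketch below shows the idea. The set is a placeholder containing only the hairdressing example from the text. A real list would come from the survey findings, and the function and variable names are assumptions.

```python
# Placeholder only: hairdressing is the single example given in the text
AT_RISK_INDUSTRIES = frozenset({"hairdressing"})


def at_financial_risk(industry: str, at_risk_industries=AT_RISK_INDUSTRIES) -> bool:
    """Return True if the person's industry is in the at-risk set."""
    # Normalise case and surrounding spaces before the lookup
    return industry.strip().lower() in at_risk_industries
```

A more detailed classification could replace the set with a mapping from industry to a risk category.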

Health and living arrangements

We’re investigating four new variables for health and living arrangements.

COVID-19 health risk

We could potentially define an output based on the vaccination priority groups by using the self-reported general health status to identify at-risk groups. For example, previous research has shown that there’s a strong relationship between a response of “limited a lot” to the question on how conditions limit your activities, and people being clinically extremely vulnerable.

Houses in multiple occupation (HMO)

The government’s definition states that a dwelling is an HMO if at least three tenants live there, forming more than one household, and they share toilet, bathroom or kitchen facilities with other tenants. If a dwelling meets this definition and five or more tenants live there, then it is classed as a large HMO. There are different legal obligations for landlords of HMOs and large HMOs.
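
As a rough illustration, the definition in the paragraph above can be written as a small classification rule. The Python sketch below covers only the conditions stated in that paragraph, and its function and parameter names are assumptions.

```python
def classify_hmo(tenants: int, households: int, shares_facilities: bool) -> str:
    """Classify a dwelling using the conditions stated above.

    tenants: number of tenants living in the dwelling
    households: number of separate households those tenants form
    shares_facilities: whether toilet, bathroom or kitchen facilities are shared
    """
    is_hmo = tenants >= 3 and households > 1 and shares_facilities
    if not is_hmo:
        return "not an HMO"
    # A large HMO meets the same definition with five or more tenants
    return "large HMO" if tenants >= 5 else "HMO"
```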

In 2011, we didn’t output data meeting these definitions. For 2021, we’re investigating the need for outputs to fit this definition. Initial feasibility work suggests we may need to define an HMO as a household with three or more unrelated people living in it. This would help ensure we do not count multigenerational households as HMOs.

Multigenerational households

A multigenerational household is defined as any household with more than two generations resident. For example, this could be when children of any age, one or more parents and one or more grandparents live together. This indicator would identify households that meet that definition.
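
One way to picture the rule is to give each resident a generation number and count the distinct values. The Python sketch below assumes such a number is already available (for example 0 for a grandparent, 1 for a parent, 2 for a child). In practice it would have to be derived from the relationships reported within the household, which is not shown here.

```python
from typing import Iterable


def is_multigenerational(generations: Iterable[int]) -> bool:
    """Return True if more than two distinct generations are resident.

    generations: one hypothetical generation number per resident.
    """
    return len(set(generations)) > 2
```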

Living apart together

We’re investigating the feasibility of producing a variable indicating the population who live separately to their current partners. The populations we could identify are:

  • those who are married or in a civil partnership but not living with the person that relationship is with
  • those who are spending 30 or more days a year at a partner’s address, whether married or in a civil partnership or not
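
The two populations listed above can overlap, so a sketch of the derivation is easier to read if it returns every group a person falls into. The Python example below is illustrative only. The parameter names and group labels are assumptions, not Census 2021 variable names.

```python
def living_apart_together_groups(
    married_or_civil_partnered: bool,
    lives_with_that_partner: bool,
    days_per_year_at_partner_address: int,
) -> set:
    """Return the set of 'living apart together' groups a person belongs to."""
    groups = set()
    if married_or_civil_partnered and not lives_with_that_partner:
        groups.add("married or civil partnered, living apart")
    if days_per_year_at_partner_address >= 30:
        groups.add("30 or more days a year at partner's address")
    return groups
```

An empty set means the person falls into neither group.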

Accommodation type and vacant addresses

We’re investigating four new variables linked to accommodation type and vacant addresses.

Care home resident

There’s an existing variable on the type of communal establishment a person is resident in, but this will not provide targeted information on care home residents. This separate indicator would denote if a person was resident in a care home.

To complement this product, we’ve produced a new age classification with the following categories:

  • Aged 0 to 64 years
  • Aged 65 to 69 years
  • Aged 70 to 74 years
  • Aged 75 to 79 years
  • Aged 80 to 84 years
  • Aged 85 to 89 years
  • Aged 90 years and over
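
For illustration, an age in completed years could be assigned to one of the categories above with a simple lookup. The Python sketch below uses the standard library `bisect` module. The function and constant names are assumptions.

```python
from bisect import bisect_right

# Lower bound of each category, in the order listed above
LOWER_BOUNDS = [0, 65, 70, 75, 80, 85, 90]
LABELS = [
    "Aged 0 to 64 years",
    "Aged 65 to 69 years",
    "Aged 70 to 74 years",
    "Aged 75 to 79 years",
    "Aged 80 to 84 years",
    "Aged 85 to 89 years",
    "Aged 90 years and over",
]


def care_home_age_band(age: int) -> str:
    """Return the category label for an age in completed years."""
    if age < 0:
        raise ValueError("age must be zero or greater")
    # bisect_right counts how many lower bounds are <= age
    return LABELS[bisect_right(LOWER_BOUNDS, age) - 1]
```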

Type of vacant address

We have data from the census collection process that we could potentially use as an indication of whether an address is vacant, a holiday let or a second home. We could then check this against administrative data. We’re investigating if this is of sufficient quality to use in analysis of census counts of vacant addresses.

Resident in a mobile or temporary structure

This proposed indicator would show the population who live in a mobile or temporary structure. For example, this might include a boat or caravan. This could indicate a more transient population. We also have some operational data on the type of mobile or temporary accommodation people were usually resident in on Census Day. We’re investigating whether this is of sufficient quality to use in analytical products.

Homeless (including people sleeping rough and ‘sofa-surfers’)

There are many sorts of homelessness, and we’re developing variables to provide data on those we can. The communal establishment questionnaire contains the response option “hostel or temporary shelter for the homeless”, which may produce some data on ‘rough sleeper’ populations.

Local authorities and charities helped our field force to distribute census forms and access codes to the homeless population. They did this through night and day shelters to encourage this group to respond.

We also added a response option “Staying temporarily (no usual UK address)” to the individual census forms. This may allow analysis of people with no usual address staying in B&Bs and hotels at the time of the census.

It may also be possible to use the household questionnaire to identify households that have temporary residents with no other address. This would not be a count of individuals, only households, and we’re investigating the usefulness of this in analysis of ‘sofa-surfers’ and other inadequately housed populations.

We’re investigating the quality of data collected on the homeless population through these response options. We’re also looking at how we can use administrative data collected as part of the pandemic to supplement census counts.

We’d like to know if you have analytical needs for any of the proposed new variables or indicators. If you do, we’d like to understand the level of detail you’d require and the definitions you’d ideally want us to use in creating them.

We’d also like to know if you have any other needs for new derived variables using Census 2021 data.