Coronavirus: Loneliness

Coronavirus: Loneliness#

An irregular shaped spreadsheet dataset originally published by the UK Office for National Statistics.

Tutorial Structure#

With these example tutorials I’m going to comment heavily and cover nuances in a follow up section (with liberal targetted previews as needed) as it’s the easiest way to grapple with new ideas. It may also be worth opening up these notebooks yourself (they’re in ./jupyterbook in the tidychef github repo) so you can run, alter and generally have a play about with this yourself as part of your learning.

We’ll cover:

source data
requirements, what we’re aiming to do here
show the full script (all logic commented)
output the selection preview
nuances (where applicable)
view the output

This sequencing is necessary as the output for some of the example is really long so that necessitates it coming last. If you’re viewing this via a jupyter book (i.e on the site) you can navigate between the above sections via your right hand menu.

Note - these tutorial scripts might seem verbose due all the comments but that’s ok (this is a tutorial after all). If you take them out you end up with a fairly succinct and human readable encapsulation of what would otherwise (with existing tools) be a rather convoluted and fragile set of instructions to express.

In virtually all cases I’ll make heavy use of preview and bounded to only look at relevant parts of what can be quite large datasets. Downloads links are provided for the source data.

Source#

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xlsx/coronavirusandlonelinessreferencetable.xlsx", tables="1.4 Loneliness by sex")
preview(table, bounded="A1:F24")

1.4 Loneliness by sex

	A	B	C	D	E	F
1	Proportion of people who feel lonely, by sex, Great Britain 3rd April to 3rd May 2020
2
3	Definitions: Feeling often/ always lonely Adults aged 16 and over across Great Britain were asked how often they felt lonely, where the options were "often/always" / "some of the time" / "occasionally" / "hardly ever" / "never". Those reporting feeling lonely often / always is used as the indicator. Feeling lonely in the past 7 days Adults aged 16 and over across Great Britain were asked "In the past seven days, how has your well-being being affected?". This indicator is measured by those then reporting "Feeling lonely".
4
5				Percentage
6		Male	Female	All People
7	Often/Always	5.1	4.9	5
8	Lower 95% Confidence Interval	3.7	3.9	4.1
9	Upper 95% Confidence Interval	6.6	6	5.9
10
11	Sample size (rounded)	2400	2860	5260
12				Percentage
13		Male	Female	All People
14	Felt lonely in past 7 days	29.8	31.8	30.9
15	Lower 95% Confidence Interval	25.8	28.5	28.4
16	Upper 95% Confidence Interval	33.8	35.1	33.5
17
18	Sample size (rounded)	1000	1450	2440
19	Source: Opinions and Lifestyle Survey, Office for National Statistics
20
21	Notes:
22	1. All respondents were asked "How often do you feel lonely?". Therefore the base population for those reporting often/ always lonely is those aged 16 and over in Great Britain (n= 5,260)
23	2. Respondents were asked 'In the past seven days, how has your well-being being affected?' giving the option for noting 'Feeling lonely' if they have already noted that they were 'very worried' or 'somewhat worried' about the 'effect that Coronavirus (COVID-19) is having on their life right now?' and if they reported 'My well-being is being affected (for example, boredom, loneliness, anxiety and stress)' when asked "In which ways is Coronavirus (COVID-19) affecting your life?" (n= 2,440)
24

From an xlsx source which can be downloaded here.

Requirements#

We’ll take the priciple description of lonliness (Often/Always etc) as a “Lonliness” column.
We’ll take sex as a “Sex” column.
Sample size will be taken as a “Sample Size” column.
We’ll add a unit of measure so our data makes sense.

One the whole thiis is a particularly good example or a very irregularly presented table, in terms of readibility this was clearly very much constructed for a human not a machine to read.

from tidychef import acquire, preview
from tidychef.direction import right, down, left, up
from tidychef.output import TidyData, Column
from tidychef.selection import XlsxSelectable

# Get the data via the acquire api
table: XlsxSelectable = acquire.xlsx.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xlsx/coronavirusandlonelinessreferencetable.xlsx", tables="1.4 Loneliness by sex")

# Lets find the unwanted footer information so we can dicount it where needed
# - (a) select the cell with the text "Source: " in it
# - (b) confirm there's only one of those
# - (c) expand to a box selection (think "hold clicking a cell and dragging down right" in excel)
unwanted_footer = table.cell_containing_string("Source: ", strict=False).expand_to_box()
table = table - unwanted_footer

# - (a) get the column with Male in it
# - (b) narrow it down to just specifc cells from that column with Male in them
# - (c) extend/extrude the selection to inlcude two cells to the right of each
sex = table.column_containing_strings(["Male"]).re("Male").extrude(right(2)).label_as("Sex")

# - (a) filter the data to cells with the string "Sample size" in them
# - (b) confirm there's just two of those
# - (c) select all cells located to the right of those two
# - (d) desregard any blank cells you just selected
sample_size = table.cells_containing_string("Sample size", strict=False).assert_len(2).fill(right).is_not_blank().label_as("Sample Size")

# For observations, we just want all numeric values that aren't the sample size
observations = table.is_numeric().label_as("Observations") - sample_size

# Loneliness is anything to the left of an observation cell thats not a number
loneliness = observations.fill(left).is_not_numeric()

# Create a bounded preview inline but also write the full preview to path
preview(observations, sex, sample_size, loneliness, bounded="A1:F20")
preview(observations, sex, sample_size, loneliness, path="loneliness.html")

# Construct the tidydata table
tidy_data = TidyData(
    observations, 
    Column(sex.attach_directly(down)),
    Column(sample_size.attach_directly(up)),
    Column(loneliness.attach_directly(right)),
    Column.constant("Unit of measure", "Percentage")
)

tidy_data.to_csv("loneliness.csv")

Observations

Sex

Sample Size

Observations

1.4 Loneliness by sex

	A	B	C	D	E	F
1	Proportion of people who feel lonely, by sex, Great Britain 3rd April to 3rd May 2020
2
3	Definitions: Feeling often/ always lonely Adults aged 16 and over across Great Britain were asked how often they felt lonely, where the options were "often/always" / "some of the time" / "occasionally" / "hardly ever" / "never". Those reporting feeling lonely often / always is used as the indicator. Feeling lonely in the past 7 days Adults aged 16 and over across Great Britain were asked "In the past seven days, how has your well-being being affected?". This indicator is measured by those then reporting "Feeling lonely".
4
5				Percentage
6		Male	Female	All People
7	Often/Always	5.1	4.9	5
8	Lower 95% Confidence Interval	3.7	3.9	4.1
9	Upper 95% Confidence Interval	6.6	6	5.9
10
11	Sample size (rounded)	2400	2860	5260
12				Percentage
13		Male	Female	All People
14	Felt lonely in past 7 days	29.8	31.8	30.9
15	Lower 95% Confidence Interval	25.8	28.5	28.4
16	Upper 95% Confidence Interval	33.8	35.1	33.5
17
18	Sample size (rounded)	1000	1450	2440
19	Source: Opinions and Lifestyle Survey, Office for National Statistics
20

Nuances#

Subtracting junk text#

There’s a nice example here if why tidychef supports the substraction of one selection from another. Human first tables have a nasty habit of including all sorts of ad hoc footers and side comments that don’t fit into any sort of regular structured data view of a resource.

unwanted_footer = table.filter(filters.contains_string("Source: ")).assert_one().expand_to_box()

which allows us to casually disregard the junk text later on as needed, i.e

loneliness = loneliness - unwanted_footer # clear it up a bit

Sex - why the double ‘Male’ selection?#

This probably seems odd at first. The key point here is re is slow.

So this would work:

sex = table.re("Male").extrude(right(2)).label_as("Sex")

but this is smarter

sex = table.column_containing_strings(["Male"]).re("Male").extrude(right(2)).label_as("Sex")

…in complete honestly, “slow” is relative and unless you’re running re against 10’s of thousands of cells it’s probably fine, even then its minutes not hours -neverthelss- it is a lot slower than the other standard methods so I wanted to make the point. It’s good practice to always aim to tighten your selection prior to re where it’s viable to.

Output#

The full preview can be viewed here.

The tidy data can be downloaded here and a full inline preview of the tidydata generated is shown below.

print(tidy_data)

Observations	Sex	Sample Size	Observations	Unit of measure
Often/Always	Male	2400	Often/Always	Percentage
Often/Always	Female	2860	Often/Always	Percentage
Often/Always	All People	5260	Often/Always	Percentage
Lower 95% Confidence Interval	Male	2400	Lower 95% Confidence Interval	Percentage
Lower 95% Confidence Interval	Female	2860	Lower 95% Confidence Interval	Percentage
Lower 95% Confidence Interval	All People	5260	Lower 95% Confidence Interval	Percentage
Upper 95% Confidence Interval	Male	2400	Upper 95% Confidence Interval	Percentage
Upper 95% Confidence Interval	Female	2860	Upper 95% Confidence Interval	Percentage
Upper 95% Confidence Interval	All People	5260	Upper 95% Confidence Interval	Percentage
Felt lonely in past 7 days	Male	1000	Felt lonely in past 7 days	Percentage
Felt lonely in past 7 days	Female	1450	Felt lonely in past 7 days	Percentage
Felt lonely in past 7 days	All People	2440	Felt lonely in past 7 days	Percentage
Lower 95% Confidence Interval	Male	1000	Lower 95% Confidence Interval	Percentage
Lower 95% Confidence Interval	Female	1450	Lower 95% Confidence Interval	Percentage
Lower 95% Confidence Interval	All People	2440	Lower 95% Confidence Interval	Percentage
Upper 95% Confidence Interval	Male	1000	Upper 95% Confidence Interval	Percentage
Upper 95% Confidence Interval	Female	1450	Upper 95% Confidence Interval	Percentage
Upper 95% Confidence Interval	All People	2440	Upper 95% Confidence Interval	Percentage

Coronavirus: Loneliness

Contents

Coronavirus: Loneliness#

Tutorial Structure#

Source#

1.4 Loneliness by sex

Requirements#

1.4 Loneliness by sex

Nuances#

Subtracting junk text#

Sex - why the double ‘Male’ selection?#

Output#