TidyData: Columns & Direct#
This section gives the first simple examples of how we create TidyData from our selections.
Source Data#
The data source we’re using for these examples is shown below:
Note - this particular table has some very verbose headers we don’t care about, so we’ll be using |
---|
The full data source can be downloaded here. We’ll be using th 8th tab named “Table 3a”.
from typing import List
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3a")
preview(table, bounded="A4:H10")
Table 3a
A | B | C | D | E | F | G | H | |
4 | Percentage change period on period | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J | MV5K | MV5L | MV5M |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 | MV57 | MV58 | MV59 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q | MV4R | MV4S | MV4T |
9 | 1998 | -19 | 1 | -1.2 | -2.8 | 5.4 | 1.9 | 8.4 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 | 12.6 | 3.2 | 12 |
TidyData & Columns#
This example introduces two new classes.
TidyData
is a class representing tidy data created from the selections.Column
represents a single column of data within this TidyData.
Note - for this section we’re going to use only a small sample of the potential observations we could select. This will look a little odd in the previews but is necessary to restrict the size of the TidyData previews to something practical for the context of this documentation. |
---|
Example 1: Time Periods#
The simplest example where we just extract the observations and time period.
A new and critical part will be the following:
tidy_data = TidyData(
observations,
Column(period.finds_observations_directly(right))
)
Which specifies the visual relationship between the “period” selection of cells and the “observations” selection of cells is… “observations are directly to the right of period”
from tidychef import acquire, preview
from tidychef.direction import right
from tidychef.selection import XlsxSelectable
from tidychef.output import Column, TidyData
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3a")
# Create our selections
# Note, we're not taking all observations as we want to keep the
# output suitably small for this example.
observations = table.excel_ref("B9:C18").label_as("Value")
period = table.excel_ref("A9:A18").label_as("Period")
# First preview the selections
preview(observations, period, bounded="A4:H18")
# Then create some simple TidyData
tidy_data = TidyData(
observations,
Column(period.attach_directly(right))
)
# And view it
print(tidy_data)
Value |
Period |
Table 3a
A | B | C | D | E | F | G | H | |
4 | Percentage change period on period | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J | MV5K | MV5L | MV5M |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 | MV57 | MV58 | MV59 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q | MV4R | MV4S | MV4T |
9 | 1998 | -19 | 1 | -1.2 | -2.8 | 5.4 | 1.9 | 8.4 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 | 12.6 | 3.2 | 12 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 | -5.3 | -10.7 | 0.8 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 | 0.9 | 2.2 | -0.7 |
13 | 2002 | 13.2 | 9 | 9.4 | 13.1 | 26.5 | -20.7 | 3.3 |
14 | 2003 | 13.8 | 24.8 | 23.6 | -5.7 | 25.5 | 5.6 | -3.6 |
15 | 2004 | 20.1 | 21.5 | 21.4 | -12.7 | 12.3 | 3 | 10.3 |
16 | 2005 | -5.9 | 2.9 | 2 | -4.1 | -10 | -2 | -4.3 |
17 | 2006 | 17.9 | 0.3 | 1.9 | -7.8 | -8.1 | 8.4 | 8.5 |
18 | 2007 | 15.5 | -1.6 | 0.2 | -1.4 | -1.7 | -2.4 | 10.1 |
Value | Period |
-19 | 1998 |
1 | 1998 |
-13.3 | 1999 |
-10 | 1999 |
25.5 | 2000 |
11.7 | 2000 |
2.3 | 2001 |
-6.7 | 2001 |
13.2 | 2002 |
9 | 2002 |
13.8 | 2003 |
24.8 | 2003 |
20.1 | 2004 |
21.5 | 2004 |
-5.9 | 2005 |
2.9 | 2005 |
17.9 | 2006 |
0.3 | 2006 |
15.5 | 2007 |
-1.6 | 2007 |
Example 2: Tidy Data#
Now the above example doesn’t accomplish a lot so lets expand our code a little more to add in some more columns.
If you’ve been working the documentation so far you should be able to make sense of the following - there’s nothing new here.
Do take your time and go line by line to make sure you understand this, it’s probably the most key example in this documentation.
from tidychef import acquire, preview
from tidychef.direction import right, down
from tidychef.selection import XlsxSelectable
from tidychef.output import Column, TidyData
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3a")
# Create our selections
observations = table.excel_ref("B9:C18").label_as("Value")
period = table.excel_ref("A9:A18").label_as("Period")
housing = table.excel_ref("B5").expand(right).label_as("Housing")
annual_dataset_code = housing.shift(down).label_as("Annual Dataset Identifier")
quarterly_dataset_code = annual_dataset_code.shift(down).label_as("Quarterly Dataset Identifier")
monthly_dataset_code = quarterly_dataset_code.shift(down).label_as("Monthly Dataset Identifier")
# First preview the selections
preview(observations, housing, annual_dataset_code, quarterly_dataset_code, monthly_dataset_code, period, bounded="A4:H18")
# Then create some simple TidyData
tidy_data = TidyData(
observations,
Column(period.attach_directly(right)),
Column(housing.attach_directly(down)),
Column(annual_dataset_code.attach_directly(down)),
Column(quarterly_dataset_code.attach_directly(down)),
Column(monthly_dataset_code.attach_directly(down))
)
# And view it
print(tidy_data)
Value |
Housing |
Annual Dataset Identifier |
Quarterly Dataset Identifier |
Monthly Dataset Identifier |
Period |
Table 3a
A | B | C | D | E | F | G | H | |
4 | Percentage change period on period | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J | MV5K | MV5L | MV5M |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 | MV57 | MV58 | MV59 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q | MV4R | MV4S | MV4T |
9 | 1998 | -19 | 1 | -1.2 | -2.8 | 5.4 | 1.9 | 8.4 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 | 12.6 | 3.2 | 12 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 | -5.3 | -10.7 | 0.8 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 | 0.9 | 2.2 | -0.7 |
13 | 2002 | 13.2 | 9 | 9.4 | 13.1 | 26.5 | -20.7 | 3.3 |
14 | 2003 | 13.8 | 24.8 | 23.6 | -5.7 | 25.5 | 5.6 | -3.6 |
15 | 2004 | 20.1 | 21.5 | 21.4 | -12.7 | 12.3 | 3 | 10.3 |
16 | 2005 | -5.9 | 2.9 | 2 | -4.1 | -10 | -2 | -4.3 |
17 | 2006 | 17.9 | 0.3 | 1.9 | -7.8 | -8.1 | 8.4 | 8.5 |
18 | 2007 | 15.5 | -1.6 | 0.2 | -1.4 | -1.7 | -2.4 | 10.1 |
Value | Period | Housing | Annual Dataset Identifier | Quarterly Dataset Identifier | Monthly Dataset Identifier |
-19 | 1998 | Public new housing | MV5H | MV54 | MV4O |
1 | 1998 | Private new housing | MV5I | MV55 | MV4P |
-13.3 | 1999 | Public new housing | MV5H | MV54 | MV4O |
-10 | 1999 | Private new housing | MV5I | MV55 | MV4P |
25.5 | 2000 | Public new housing | MV5H | MV54 | MV4O |
11.7 | 2000 | Private new housing | MV5I | MV55 | MV4P |
2.3 | 2001 | Public new housing | MV5H | MV54 | MV4O |
-6.7 | 2001 | Private new housing | MV5I | MV55 | MV4P |
13.2 | 2002 | Public new housing | MV5H | MV54 | MV4O |
9 | 2002 | Private new housing | MV5I | MV55 | MV4P |
13.8 | 2003 | Public new housing | MV5H | MV54 | MV4O |
24.8 | 2003 | Private new housing | MV5I | MV55 | MV4P |
20.1 | 2004 | Public new housing | MV5H | MV54 | MV4O |
21.5 | 2004 | Private new housing | MV5I | MV55 | MV4P |
-5.9 | 2005 | Public new housing | MV5H | MV54 | MV4O |
2.9 | 2005 | Private new housing | MV5I | MV55 | MV4P |
17.9 | 2006 | Public new housing | MV5H | MV54 | MV4O |
0.3 | 2006 | Private new housing | MV5I | MV55 | MV4P |
15.5 | 2007 | Public new housing | MV5H | MV54 | MV4O |
-1.6 | 2007 | Private new housing | MV5I | MV55 | MV4P |