Preview#
For these example we’re going to use table two of this sample xls file as show below. And use the bounded=
keyword to keep the previews small.
Note - |
---|
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 1a")
preview(table, bounded="A1:G11")
Table 1a
A | B | C | D | E | F | G | |
1 | Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector | ||||||
2 | This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance. | ||||||
3 | Source: Construction Output and Employment from the Office for National Statistics | ||||||
4 | 2019=100 | ||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work |
6 | Dataset identifier code | MV36 | MV37 | MVL7 | MV38 | MV39 | MV3A |
7 | 1997 | 30.8 | 44.8 | 42.6 | 61.2 | 57.6 | 152.1 |
8 | 1998 | 24.9 | 45.3 | 42 | 59.5 | 60.7 | 155 |
9 | 1999 | 21.6 | 40.7 | 37.7 | 57.9 | 68.3 | 159.9 |
10 | 2000 | 27.1 | 45.5 | 42.6 | 54.3 | 64.7 | 142.7 |
11 | 2001 | 27.7 | 42.5 | 40.1 | 58.1 | 65.3 | 145.8 |
Selection & Preview#
We’re going to make use of two selection methods now as follows.
.excel_ref()
- use excel cel references to explicitly select a range of cells.label_as()
- give a user friendly label to a selection of cells
We’re going to start by making and previewing some simple selections with .excel_ref()
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 1a")
# Create our selections
time = table.excel_ref("A7:A11")
housing = table.excel_ref("B5:G5")
data_identifier_code = table.excel_ref("B6:G6")
# Note on multiple selections
# - Any selections for previewing are just passed as positional arguments to preview().
# - You dont need to pass in a blank selection, that is only necessary where no selections have been made.
preview(time, housing, data_identifier_code, bounded="A1:G11")
Unnamed Selection: 0 |
Unnamed Selection: 1 |
Unnamed Selection: 2 |
Table 1a
A | B | C | D | E | F | G | |
1 | Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector | ||||||
2 | This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance. | ||||||
3 | Source: Construction Output and Employment from the Office for National Statistics | ||||||
4 | 2019=100 | ||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work |
6 | Dataset identifier code | MV36 | MV37 | MVL7 | MV38 | MV39 | MV3A |
7 | 1997 | 30.8 | 44.8 | 42.6 | 61.2 | 57.6 | 152.1 |
8 | 1998 | 24.9 | 45.3 | 42 | 59.5 | 60.7 | 155 |
9 | 1999 | 21.6 | 40.7 | 37.7 | 57.9 | 68.3 | 159.9 |
10 | 2000 | 27.1 | 45.5 | 42.6 | 54.3 | 64.7 | 142.7 |
11 | 2001 | 27.7 | 42.5 | 40.1 | 58.1 | 65.3 | 145.8 |
……and…beware the gotcha!#
And this makes a good example of why you need to be careful with the bounded=
keyword.
As shown below it can hide information you may need to know - to make this clear lets extend the last preview another two rows and columns.
preview(time, housing, data_identifier_code, bounded="A1:I13")
Unnamed Selection: 0 |
Unnamed Selection: 1 |
Unnamed Selection: 2 |
Table 1a
A | B | C | D | E | F | G | H | I | |
1 | Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector | ||||||||
2 | This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance. | ||||||||
3 | Source: Construction Output and Employment from the Office for National Statistics | ||||||||
4 | 2019=100 | ||||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work | All new work |
6 | Dataset identifier code | MV36 | MV37 | MVL7 | MV38 | MV39 | MV3A | MV3B | MV3C |
7 | 1997 | 30.8 | 44.8 | 42.6 | 61.2 | 57.6 | 152.1 | 84.3 | 63.5 |
8 | 1998 | 24.9 | 45.3 | 42 | 59.5 | 60.7 | 155 | 91.4 | 65.2 |
9 | 1999 | 21.6 | 40.7 | 37.7 | 57.9 | 68.3 | 159.9 | 102.3 | 67.2 |
10 | 2000 | 27.1 | 45.5 | 42.6 | 54.3 | 64.7 | 142.7 | 103.1 | 67.3 |
11 | 2001 | 27.7 | 42.5 | 40.1 | 58.1 | 65.3 | 145.8 | 102.3 | 67.2 |
12 | 2002 | 31.4 | 46.3 | 43.9 | 65.7 | 82.6 | 115.7 | 105.7 | 71.4 |
13 | 2003 | 35.7 | 57.8 | 54.3 | 62 | 103.7 | 122.2 | 101.9 | 75.6 |
From here we’re going to continue using bounded=
for reasons of practicality, just be aware of this gotcha in your own scripts.
Labelling Selections#
Lets do a similar thing but this time let’s use .label_as()
to give our cell selection some semantic meaning.
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
tables: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 1a")
# Create our selections
time = table.excel_ref("A7:A11").label_as("Time")
housing = table.excel_ref("B5:G5").label_as("Housing")
data_identifier_code = table.excel_ref("B6:G6").label_as("Data Identifier Code")
preview(time, housing, data_identifier_code, bounded="A1:G11")
Time |
Housing |
Data Identifier Code |
Table 1a
A | B | C | D | E | F | G | |
1 | Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector | ||||||
2 | This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance. | ||||||
3 | Source: Construction Output and Employment from the Office for National Statistics | ||||||
4 | 2019=100 | ||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work |
6 | Dataset identifier code | MV36 | MV37 | MVL7 | MV38 | MV39 | MV3A |
7 | 1997 | 30.8 | 44.8 | 42.6 | 61.2 | 57.6 | 152.1 |
8 | 1998 | 24.9 | 45.3 | 42 | 59.5 | 60.7 | 155 |
9 | 1999 | 21.6 | 40.7 | 37.7 | 57.9 | 68.3 | 159.9 |
10 | 2000 | 27.1 | 45.5 | 42.6 | 54.3 | 64.7 | 142.7 |
11 | 2001 | 27.7 | 42.5 | 40.1 | 58.1 | 65.3 | 145.8 |
Excel#
As previously mentioned, the default behaviour for preview()
is to show excel style column letters and row numbers.
This is nearly always the practical choice when processing but can lead to some confusion when previewing your work (especially where previewing a non excel format).
As a nod to these scenarios you can use the show_excel=
keyword to change this behaviour as per the following example.
preview(time, housing, data_identifier_code, bounded="A1:G11", show_excel=False)
Time |
Housing |
Data Identifier Code |
Table 1a
Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector | ||||||
This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance. | ||||||
Source: Construction Output and Employment from the Office for National Statistics | ||||||
2019=100 | ||||||
Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work |
Dataset identifier code | MV36 | MV37 | MVL7 | MV38 | MV39 | MV3A |
1997 | 30.8 | 44.8 | 42.6 | 61.2 | 57.6 | 152.1 |
1998 | 24.9 | 45.3 | 42 | 59.5 | 60.7 | 155 |
1999 | 21.6 | 40.7 | 37.7 | 57.9 | 68.3 | 159.9 |
2000 | 27.1 | 45.5 | 42.6 | 54.3 | 64.7 | 142.7 |
2001 | 27.7 | 42.5 | 40.1 | 58.1 | 65.3 | 145.8 |
XY#
Underneath the hood tidychef uses the x (horizontal) and y (vertical) index of the cells to calculate relative positions.
The can in some cases be useful information to know and is sometimes included in error messages.
If you want to show x and y values on your preview you can use the show_xy=
keyword.
preview(time, housing, data_identifier_code, bounded="A1:G11", show_xy=True, show_excel=False)
Time |
Housing |
Data Identifier Code |
Table 1a
x/y | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
0 | Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector | ||||||
1 | This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance. | ||||||
2 | Source: Construction Output and Employment from the Office for National Statistics | ||||||
3 | 2019=100 | ||||||
4 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work |
5 | Dataset identifier code | MV36 | MV37 | MVL7 | MV38 | MV39 | MV3A |
6 | 1997 | 30.8 | 44.8 | 42.6 | 61.2 | 57.6 | 152.1 |
7 | 1998 | 24.9 | 45.3 | 42 | 59.5 | 60.7 | 155 |
8 | 1999 | 21.6 | 40.7 | 37.7 | 57.9 | 68.3 | 159.9 |
9 | 2000 | 27.1 | 45.5 | 42.6 | 54.3 | 64.7 | 142.7 |
10 | 2001 | 27.7 | 42.5 | 40.1 | 58.1 | 65.3 | 145.8 |
or both together
preview(time, housing, data_identifier_code, bounded="A1:G11", show_xy=True)
Time |
Housing |
Data Identifier Code |
Table 1a
x/y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | |
A | B | C | D | E | F | G | ||
0 | 1 | Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector | ||||||
1 | 2 | This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance. | ||||||
2 | 3 | Source: Construction Output and Employment from the Office for National Statistics | ||||||
3 | 4 | 2019=100 | ||||||
4 | 5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work |
5 | 6 | Dataset identifier code | MV36 | MV37 | MVL7 | MV38 | MV39 | MV3A |
6 | 7 | 1997 | 30.8 | 44.8 | 42.6 | 61.2 | 57.6 | 152.1 |
7 | 8 | 1998 | 24.9 | 45.3 | 42 | 59.5 | 60.7 | 155 |
8 | 9 | 1999 | 21.6 | 40.7 | 37.7 | 57.9 | 68.3 | 159.9 |
9 | 10 | 2000 | 27.1 | 45.5 | 42.6 | 54.3 | 64.7 | 142.7 |
10 | 11 | 2001 | 27.7 | 42.5 | 40.1 | 58.1 | 65.3 | 145.8 |