Preview#

For these example we’re going to use table two of this sample xls file as show below. And use the bounded= keyword to keep the previews small.

Note - bounded= is useful for presentational purposes such as this, but should generally be avoided when writing processing scripts as its possible to hide data that you might need to know about. .

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 1a")
preview(table, bounded="A1:G11")

Table 1a

ABCDEFG
1Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector
2This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.
3Source: Construction Output and Employment from the Office for National Statistics
42019=100
5Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new work
6Dataset identifier codeMV36MV37MVL7MV38MV39MV3A
7199730.844.842.661.257.6152.1
8199824.945.34259.560.7155
9199921.640.737.757.968.3159.9
10200027.145.542.654.364.7142.7
11200127.742.540.158.165.3145.8

Selection & Preview#

We’re going to make use of two selection methods now as follows.

  • .excel_ref() - use excel cel references to explicitly select a range of cells

  • .label_as() - give a user friendly label to a selection of cells

We’re going to start by making and previewing some simple selections with .excel_ref()

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 1a")

# Create our selections
time = table.excel_ref("A7:A11")
housing = table.excel_ref("B5:G5")
data_identifier_code = table.excel_ref("B6:G6")

# Note on multiple selections
# - Any selections for previewing are just passed as positional arguments to preview().
# - You dont need to pass in a blank selection, that is only necessary where no selections have been made.
preview(time, housing, data_identifier_code, bounded="A1:G11")
Unnamed Selection: 0
Unnamed Selection: 1
Unnamed Selection: 2

Table 1a

ABCDEFG
1Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector
2This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.
3Source: Construction Output and Employment from the Office for National Statistics
42019=100
5Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new work
6Dataset identifier codeMV36MV37MVL7MV38MV39MV3A
7199730.844.842.661.257.6152.1
8199824.945.34259.560.7155
9199921.640.737.757.968.3159.9
10200027.145.542.654.364.7142.7
11200127.742.540.158.165.3145.8

……and…beware the gotcha!#

And this makes a good example of why you need to be careful with the bounded= keyword.

As shown below it can hide information you may need to know - to make this clear lets extend the last preview another two rows and columns.

preview(time, housing, data_identifier_code, bounded="A1:I13")
Unnamed Selection: 0
Unnamed Selection: 1
Unnamed Selection: 2

Table 1a

ABCDEFGHI
1Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector
2This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.
3Source: Construction Output and Employment from the Office for National Statistics
42019=100
5Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new workPrivate commercial new workAll new work
6Dataset identifier codeMV36MV37MVL7MV38MV39MV3AMV3BMV3C
7199730.844.842.661.257.6152.184.363.5
8199824.945.34259.560.715591.465.2
9199921.640.737.757.968.3159.9102.367.2
10200027.145.542.654.364.7142.7103.167.3
11200127.742.540.158.165.3145.8102.367.2
12200231.446.343.965.782.6115.7105.771.4
13200335.757.854.362103.7122.2101.975.6

From here we’re going to continue using bounded= for reasons of practicality, just be aware of this gotcha in your own scripts.

Labelling Selections#

Lets do a similar thing but this time let’s use .label_as() to give our cell selection some semantic meaning.

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

tables: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 1a")

# Create our selections
time = table.excel_ref("A7:A11").label_as("Time")
housing = table.excel_ref("B5:G5").label_as("Housing")
data_identifier_code = table.excel_ref("B6:G6").label_as("Data Identifier Code")

preview(time, housing, data_identifier_code, bounded="A1:G11")
Time
Housing
Data Identifier Code

Table 1a

ABCDEFG
1Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector
2This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.
3Source: Construction Output and Employment from the Office for National Statistics
42019=100
5Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new work
6Dataset identifier codeMV36MV37MVL7MV38MV39MV3A
7199730.844.842.661.257.6152.1
8199824.945.34259.560.7155
9199921.640.737.757.968.3159.9
10200027.145.542.654.364.7142.7
11200127.742.540.158.165.3145.8

Excel#

As previously mentioned, the default behaviour for preview() is to show excel style column letters and row numbers.

This is nearly always the practical choice when processing but can lead to some confusion when previewing your work (especially where previewing a non excel format).

As a nod to these scenarios you can use the show_excel= keyword to change this behaviour as per the following example.

preview(time, housing, data_identifier_code, bounded="A1:G11", show_excel=False)
Time
Housing
Data Identifier Code

Table 1a

Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector
This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.
Source: Construction Output and Employment from the Office for National Statistics
2019=100
Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new work
Dataset identifier codeMV36MV37MVL7MV38MV39MV3A
199730.844.842.661.257.6152.1
199824.945.34259.560.7155
199921.640.737.757.968.3159.9
200027.145.542.654.364.7142.7
200127.742.540.158.165.3145.8

XY#

Underneath the hood tidychef uses the x (horizontal) and y (vertical) index of the cells to calculate relative positions.

The can in some cases be useful information to know and is sometimes included in error messages.

If you want to show x and y values on your preview you can use the show_xy= keyword.

preview(time, housing, data_identifier_code, bounded="A1:G11", show_xy=True, show_excel=False)
Time
Housing
Data Identifier Code

Table 1a

x/y0123456
0Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector
1This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.
2Source: Construction Output and Employment from the Office for National Statistics
32019=100
4Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new work
5Dataset identifier codeMV36MV37MVL7MV38MV39MV3A
6199730.844.842.661.257.6152.1
7199824.945.34259.560.7155
8199921.640.737.757.968.3159.9
9200027.145.542.654.364.7142.7
10200127.742.540.158.165.3145.8

or both together

preview(time, housing, data_identifier_code, bounded="A1:G11", show_xy=True)
Time
Housing
Data Identifier Code

Table 1a

x/y0123456
ABCDEFG
01Table 1a: Construction output in Great Britain, volume, seasonally adjusted, index numbers, by sector
12This worksheet contains one table. Some shorthand is used in this table [R&M] = repair and maintenance.
23Source: Construction Output and Employment from the Office for National Statistics
342019=100
45Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new work
56Dataset identifier codeMV36MV37MVL7MV38MV39MV3A
67199730.844.842.661.257.6152.1
78199824.945.34259.560.7155
89199921.640.737.757.968.3159.9
910200027.145.542.654.364.7142.7
1011200127.742.540.158.165.3145.8