Selection: Simple

Selection: Simple#

Understanding Selection in TidyChef#

Selection is the process of identifying parts of your spreadsheet that you want to work with—such as specific cells, rows, columns, or blocks of data. Because visually structured tables often contain headers, footnotes, and irregular layouts, simply reading the whole sheet won’t produce tidy data.

TidyChef uses Selectables — programmable, composable selectors—to precisely target these meaningful data elements. Mastering selection lets you extract and reshape only the relevant information, forming the foundation for clean, analysis-ready datasets.

Source Data#

The data source we’re using for these examples is shown below:

Note - this particular table has some very verbose headers we don’t care about, so we’ll be using `bounded=` to remove them from the previews as well as to show just the subset of data we’re working with.

The full data source can be downloaded here. We’ll be using th 8th tab named “Table 3a”.

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3a")
preview(table, bounded="A4:H10")

Table 3a

	A	B	C	D	E	F	G	H
4	Percentage change period on period
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work	Public other new work	Private industrial new work	Private commercial new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J	MV5K	MV5L	MV5M
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56	MV57	MV58	MV59
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q	MV4R	MV4S	MV4T
9	1998	-19	1	-1.2	-2.8	5.4	1.9	8.4
10	1999	-13.3	-10	-10.3	-2.8	12.6	3.2	12

Row(s), Column(s) & Ranges#

These are easier to show that tell, so we’re going to give quick examples of each of these.

Selecting a row#

Lets just select a row based on its row mumber.

row3 = table.row("7").label_as("Just row 7")
preview(row3, bounded="A5:H8")

Just row 7

Table 3a

	A	B	C	D	E	F	G	H
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work	Public other new work	Private industrial new work	Private commercial new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J	MV5K	MV5L	MV5M
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56	MV57	MV58	MV59
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q	MV4R	MV4S	MV4T

Selecting a row conditionally#

We know we want one row and its a row with at least one cell with “Time period” in it.

row_with_time_period = table.row_containing_strings(["Time period"]).label_as("A rows with 'Time period' in it")
preview(row_with_time_period, bounded="A4:H6")

A rows with 'Time period' in it

Table 3a

	A	B	C	D	E	F	G	H
4	Percentage change period on period
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work	Public other new work	Private industrial new work	Private commercial new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J	MV5K	MV5L	MV5M

Note - notice its strings plural not string singular, this allows us to be extra specific if we need to, example:

table.row_containing_strings(["Foo", "Bar", "Baz"])

This would still only return exactly one row, but only where said row contained a call ppulated with the value “Foo” and another call populated with the value “Bar” and another one populated with the word “Baz”.

Selecting a column#

column_c = table.column("C").label_as("I am column C")
preview(column_c, bounded="B2:D10")

I am column C

Table 3a

	B	C	D
2
3
4
5	Public new housing	Private new housing	Total new housing
6	MV5H	MV5I	MVM3
7	MV54	MV55	MVM7
8	MV4O	MV4P	MVM2
9	-19	1	-1.2
10	-13.3	-10	-10.3

Selecting a column conditionally#

column = table.column_containing_strings(["Private new housing"])
preview(column_c, bounded="B2:D10") 

I am column C

Table 3a

	B	C	D
2
3
4
5	Public new housing	Private new housing	Total new housing
6	MV5H	MV5I	MVM3
7	MV54	MV55	MVM7
8	MV4O	MV4P	MVM2
9	-19	1	-1.2
10	-13.3	-10	-10.3

There’s a point worth understanding in this example, the latter (find by expected contents) is typically more robust than the formcer (explciit reference) as its a good chance of contiuing to work even fi the layout shifts.

Selecting a single cell conditionally#

Very much what is sounds like, in this case you only pass one string because you’re only even looking for one match.

cell = table.cell_containing_string("MV5I").label_as("Single cell selection")
preview(cell, bounded="B4:D8") 

Single cell selection

Table 3a

	B	C	D
4
5	Public new housing	Private new housing	Total new housing
6	MV5H	MV5I	MVM3
7	MV54	MV55	MVM7
8	MV4O	MV4P	MVM2

Understanding `expand_to_box()`#

For virtual all tabulated datasource the observations (the actual values) appear in a box or rectangular, shape - as shown below

preview(table.excel_ref("B9:E12").label_as("Box like selection of value cells"), bounded="A5:E12")

Box like selection of value cells

Table 3a

	A	B	C	D	E
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q
9	1998	-19	1	-1.2	-2.8
10	1999	-13.3	-10	-10.3	-2.8
11	2000	25.5	11.7	12.9	-6.2
12	2001	2.3	-6.7	-5.8	7.1

1. The way you instinctively want to read the above blue selection is the same way all human beings do,starting from the top left of the blue box your eyes moves left to right, top to bottom - i.e the same way you’d read a book.
1. .expand_to_box() mirrors this quirk of how we read, so it:
  - Starts with a single cell (the top-left part of the box)
  - Sweeps right gathering all the cells
  - Sweeps down gathering all cells beneath those cells.
  - Removes any cells that are blank

example follows:

start_cell = table.excel_ref("B9")
box_of_cells = start_cell.expand_to_box().label_as("Box like selection of value cells")
preview(box_of_cells, bounded="A5:E12")

Box like selection of value cells

Table 3a

	A	B	C	D	E
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q
9	1998	-19	1	-1.2	-2.8
10	1999	-13.3	-10	-10.3	-2.8
11	2000	25.5	11.7	12.9	-6.2
12	2001	2.3	-6.7	-5.8	7.1

…wait…. can’t we just use that handy excel_ref() thing that keeps cropping up?

Yes you 100% can ….. but…. that’s an explicit reference, so your code breaks the second someone moves whatever you’re targetting so there’s a trade off to consider.

Excel ref is convenient and very useful thing you’ll almost certainly be making use of, but it (and all explicit references) carry an inherent fragility. Decided where to be explicit vs conditional with your selections is very context dependent but always something you should be considering when working with tidychef.

Speaking of…..

Excel ref#

This is analogous to when you click and drag on a spreadhseet to select a rand of cells, examples follow:

preview(table.excel_ref("B9"), bounded="A5:E12")

Unnamed Selection: 0

Table 3a

	A	B	C	D	E
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q
9	1998	-19	1	-1.2	-2.8
10	1999	-13.3	-10	-10.3	-2.8
11	2000	25.5	11.7	12.9	-6.2
12	2001	2.3	-6.7	-5.8	7.1

preview(table.excel_ref("B9:D11"), bounded="A5:E12")

Unnamed Selection: 0

Table 3a

	A	B	C	D	E
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q
9	1998	-19	1	-1.2	-2.8
10	1999	-13.3	-10	-10.3	-2.8
11	2000	25.5	11.7	12.9	-6.2
12	2001	2.3	-6.7	-5.8	7.1

preview(table.excel_ref("8:10"), bounded="A5:E12")

Unnamed Selection: 0

Table 3a

	A	B	C	D	E
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q
9	1998	-19	1	-1.2	-2.8
10	1999	-13.3	-10	-10.3	-2.8
11	2000	25.5	11.7	12.9	-6.2
12	2001	2.3	-6.7	-5.8	7.1

preview(table.excel_ref("D:E"), bounded="A5:E12")

Unnamed Selection: 0

Table 3a

	A	B	C	D	E
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q
9	1998	-19	1	-1.2	-2.8
10	1999	-13.3	-10	-10.3	-2.8
11	2000	25.5	11.7	12.9	-6.2
12	2001	2.3	-6.7	-5.8	7.1

Is Numeric and Is Not Numeric#

These common methods are convenient aliases two very popular filters (more on filters later) and they work largely as you’d image.

numeric = table.is_numeric().label_as("Is numeric")
preview(numeric, bounded="A5:E12")

Is numeric

Table 3a

	A	B	C	D	E
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q
9	1998	-19	1	-1.2	-2.8
10	1999	-13.3	-10	-10.3	-2.8
11	2000	25.5	11.7	12.9	-6.2
12	2001	2.3	-6.7	-5.8	7.1

not_numeric = table.is_not_numeric().label_as("In not numeric")
preview(not_numeric, bounded="A5:E12")

In not numeric

Table 3a

	A	B	C	D	E
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work
6	Annual dataset identifier code	MV5H	MV5I	MVM3	MV5J
7	Quarterly dataset identifier code	MV54	MV55	MVM7	MV56
8	Monthly dataset identifier code	MV4O	MV4P	MVM2	MV4Q
9	1998	-19	1	-1.2	-2.8
10	1999	-13.3	-10	-10.3	-2.8
11	2000	25.5	11.7	12.9	-6.2
12	2001	2.3	-6.7	-5.8	7.1

Selection: Simple

Contents

Selection: Simple#

Understanding Selection in TidyChef#

Source Data#

Table 3a

Row(s), Column(s) & Ranges#

Selecting a row#

Table 3a

Selecting a row conditionally#

Table 3a

Selecting a column#

Table 3a

Selecting a column conditionally#

Table 3a

Selecting a single cell conditionally#

Table 3a

Understanding expand_to_box()#

Table 3a

Table 3a

Excel ref#

Table 3a

Table 3a

Table 3a

Table 3a

Is Numeric and Is Not Numeric#

Table 3a

Table 3a

Understanding `expand_to_box()`#