Selection: Simple
Understanding Selection in TidyChef
Selection is the process of identifying parts of your spreadsheet that you want to work with—such as specific cells, rows, columns, or blocks of data. Because visually structured tables often contain headers, footnotes, and irregular layouts, simply reading the whole sheet won’t produce tidy data.
TidyChef uses Selectables
— programmable, composable selectors—to precisely target these meaningful data elements. Mastering selection lets you extract and reshape only the relevant information, forming the foundation for clean, analysis-ready datasets.
Source Data
The data source we’re using for these examples is shown below:
The full data source can be downloaded here. We’ll be using th 8th tab named “Table 3a”.
Table 3a
| A | B | C | D | E | F | G | H |
4 | Percentage change period on period | | | | | | | |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J | MV5K | MV5L | MV5M |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 | MV57 | MV58 | MV59 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q | MV4R | MV4S | MV4T |
9 | 1998 | -19 | 1 | -1.2 | -2.8 | 5.4 | 1.9 | 8.4 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 | 12.6 | 3.2 | 12 |
Row(s), Column(s) & Ranges
These are easier to show that tell, so we’re going to give quick examples of each of these.
Selecting a row
Lets just select a row based on its row mumber.
Table 3a
| A | B | C | D | E | F | G | H |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J | MV5K | MV5L | MV5M |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 | MV57 | MV58 | MV59 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q | MV4R | MV4S | MV4T |
Selecting a row conditionally
We know we want one row and its a row with at least one cell with “Time period” in it.
A rows with 'Time period' in it |
Table 3a
| A | B | C | D | E | F | G | H |
4 | Percentage change period on period | | | | | | | |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J | MV5K | MV5L | MV5M |
Note - notice its strings plural not string singular, this allows us to be extra specific if we need to, example:
table.row_containing_strings(["Foo", "Bar", "Baz"])
This would still only return exactly one row, but only where said row contained a call ppulated with the value “Foo” and another call populated with the value “Bar” and another one populated with the word “Baz”.
Selecting a column
Table 3a
| B | C | D |
2 | | | |
3 | | | |
4 | | | |
5 | Public new housing | Private new housing | Total new housing |
6 | MV5H | MV5I | MVM3 |
7 | MV54 | MV55 | MVM7 |
8 | MV4O | MV4P | MVM2 |
9 | -19 | 1 | -1.2 |
10 | -13.3 | -10 | -10.3 |
Selecting a column conditionally
Table 3a
| B | C | D |
2 | | | |
3 | | | |
4 | | | |
5 | Public new housing | Private new housing | Total new housing |
6 | MV5H | MV5I | MVM3 |
7 | MV54 | MV55 | MVM7 |
8 | MV4O | MV4P | MVM2 |
9 | -19 | 1 | -1.2 |
10 | -13.3 | -10 | -10.3 |
There’s a point worth understanding in this example, the latter (find by expected contents) is typically more robust than the formcer (explciit reference) as its a good chance of contiuing to work even fi the layout shifts.
Selecting a single cell conditionally
Very much what is sounds like, in this case you only pass one string because you’re only even looking for one match.
Table 3a
| B | C | D |
4 | | | |
5 | Public new housing | Private new housing | Total new housing |
6 | MV5H | MV5I | MVM3 |
7 | MV54 | MV55 | MVM7 |
8 | MV4O | MV4P | MVM2 |
Understanding expand_to_box()
For virtual all tabulated datasource the observations (the actual values) appear in a box or rectangular, shape - as shown below
Box like selection of value cells |
Table 3a
| A | B | C | D | E |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q |
9 | 1998 | -19 | 1 | -1.2 | -2.8 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 |
The way you instinctively want to read the above blue selection is the same way all human beings do,starting from the top left of the blue box your eyes moves left to right, top to bottom - i.e the same way you’d read a book.
.expand_to_box()
mirrors this quirk of how we read, so it:
Starts with a single cell (the top-left part of the box)
Sweeps right gathering all the cells
Sweeps down gathering all cells beneath those cells.
Removes any cells that are blank
example follows:
Box like selection of value cells |
Table 3a
| A | B | C | D | E |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q |
9 | 1998 | -19 | 1 | -1.2 | -2.8 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 |
…wait…. can’t we just use that handy excel_ref()
thing that keeps cropping up?
Yes you 100% can ….. but…. that’s an explicit reference, so your code breaks the second someone moves whatever you’re targetting so there’s a trade off to consider.
Excel ref is convenient and very useful thing you’ll almost certainly be making use of, but it (and all explicit references) carry an inherent fragility. Decided where to be explicit vs conditional with your selections is very context dependent but always something you should be considering when working with tidychef.
Speaking of…..
Excel ref
This is analogous to when you click and drag on a spreadhseet to select a rand of cells, examples follow:
Table 3a
| A | B | C | D | E |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q |
9 | 1998 | -19 | 1 | -1.2 | -2.8 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 |
Table 3a
| A | B | C | D | E |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q |
9 | 1998 | -19 | 1 | -1.2 | -2.8 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 |
Table 3a
| A | B | C | D | E |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q |
9 | 1998 | -19 | 1 | -1.2 | -2.8 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 |
Table 3a
| A | B | C | D | E |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q |
9 | 1998 | -19 | 1 | -1.2 | -2.8 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 |
Is Numeric and Is Not Numeric
These common methods are convenient aliases two very popular filters (more on filters later) and they work largely as you’d image.
Table 3a
| A | B | C | D | E |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q |
9 | 1998 | -19 | 1 | -1.2 | -2.8 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 |
Table 3a
| A | B | C | D | E |
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work |
6 | Annual dataset identifier code | MV5H | MV5I | MVM3 | MV5J |
7 | Quarterly dataset identifier code | MV54 | MV55 | MVM7 | MV56 |
8 | Monthly dataset identifier code | MV4O | MV4P | MVM2 | MV4Q |
9 | 1998 | -19 | 1 | -1.2 | -2.8 |
10 | 1999 | -13.3 | -10 | -10.3 | -2.8 |
11 | 2000 | 25.5 | 11.7 | 12.9 | -6.2 |
12 | 2001 | 2.3 | -6.7 | -5.8 | 7.1 |