Selection: Operators#
This section uses the excel_ref()
method shown previously to give examples of how operators can be used with cell selections.
Source Data#
The data source we’re using for these examples is shown below:
Note - this particular table has some very verbose headers we don’t care about, so we’ll be using |
---|
The full data source can be downloaded here. We’ll be using th 6th tab named “Table2a”.
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
preview(table, bounded="A4:H10")
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Subtraction#
Here is an example of subracting one cell selection from another.
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
# Lets make two selection
# First - lets select all the cells we're looking at
all_viewed_cells = table.excel_ref("A4:H10")
# Secondly lets just select the contents of column D
column_d_cells = table.excel_ref('D')
# Note lets subtract one from the other and preview it
everying_bar_d = all_viewed_cells - column_d_cells
preview(everying_bar_d, bounded="A4:H10")
Unnamed Selection: 0 |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Now for a more practical example, lets select the whole of the row 4 then subtract the cell containing the text “Dataset identifier code”.
Notice how we can do this quite succinctly.
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
selection = table.row("6") - table.excel_ref('A6')
preview(selection.label_as('Data identifier codes'), bounded="A4:H10")
Data identifier codes |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Union#
Possibly contrary to expectations the correct operator for combining cell selections is via Union
not Addition
.
This is because any given cell is a unique entity and should never appear more than once in any given selection.
Envision it this way:
If we implemented an addition +
operator then…
[A1, B1, C1] + [B1, C1, D1] = [A1, B1, B1, C1, C1, D1]
And this is obviously not the behviour that we want. Instead we use the union |
operator. To create a combined object of unique cells, i.e:
[A1, B1, C1] | [B1, C1, D1] = [A1, B1, C1, D1]
Now, let’s see what this looks like in practice..
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
selection = table.excel_ref("C4:C10") | table.excel_ref('D4:D10') | table.excel_ref('E4:E10')
preview(selection.label_as('C, D & E'), bounded="A4:H10")
# Lets also print out the length of the selection
print(f'Selection has length: {len(selection)}')
C, D & E |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Selection has length: 21
but because its a union we can use overlapping selections without introducing duplicate cells.
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
selection = table.excel_ref("C4:D10") | table.excel_ref('D4:E10')
preview(selection.label_as('C, D & E'), bounded="A4:H10")
# Lets also print out the length of the selection
print(f'Selection has length: {len(selection)}')
C, D & E |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Selection has length: 21
Note the length (number of selected cells in this case) is the same as the previous example even though we presented tidychef with overlapping selections.