Selection: Operators

Selection: Operators#

This section uses the excel_ref() method shown previously to give examples of how operators can be used with cell selections.

Source Data#

The data source we’re using for these examples is shown below:

Note - this particular table has some very verbose headers we don’t care about, so we’ll be using bounded= to remove them from the previews as well as to show just the subset of data we’re working with.

The full data source can be downloaded here. We’ll be using th 6th tab named “Table2a”.

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
preview(table, bounded="A4:H10")

Table 2a

ABCDEFGH
4£ million
5Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new workPrivate commercial new work
6Dataset identifier codeMV3WMV3XMVL9MV3YMV3ZMV42MV43
7199721581758819690140156010872125517
8199817481776419448136276334888527655
9199915151598017436132477129916730963
10200019011785319693124306753818231200

Subtraction#

Here is an example of subracting one cell selection from another.

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")

# Lets make two selection
# First - lets select all the cells we're looking at
all_viewed_cells = table.excel_ref("A4:H10")

# Secondly lets just select the contents of column D
column_d_cells = table.excel_ref('D')

# Note lets subtract one from the other and preview it
everying_bar_d = all_viewed_cells - column_d_cells
preview(everying_bar_d, bounded="A4:H10")
Unnamed Selection: 0

Table 2a

ABCDEFGH
4£ million
5Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new workPrivate commercial new work
6Dataset identifier codeMV3WMV3XMVL9MV3YMV3ZMV42MV43
7199721581758819690140156010872125517
8199817481776419448136276334888527655
9199915151598017436132477129916730963
10200019011785319693124306753818231200

Now for a more practical example, lets select the whole of the row 4 then subtract the cell containing the text “Dataset identifier code”.

Notice how we can do this quite succinctly.

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")

selection = table.row("6") - table.excel_ref('A6')

preview(selection.label_as('Data identifier codes'), bounded="A4:H10")
Data identifier codes

Table 2a

ABCDEFGH
4£ million
5Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new workPrivate commercial new work
6Dataset identifier codeMV3WMV3XMVL9MV3YMV3ZMV42MV43
7199721581758819690140156010872125517
8199817481776419448136276334888527655
9199915151598017436132477129916730963
10200019011785319693124306753818231200

Union#

Possibly contrary to expectations the correct operator for combining cell selections is via Union not Addition.

This is because any given cell is a unique entity and should never appear more than once in any given selection.

Envision it this way:

If we implemented an addition + operator then…

[A1, B1, C1] + [B1, C1, D1] = [A1, B1, B1, C1, C1, D1]

And this is obviously not the behviour that we want. Instead we use the union | operator. To create a combined object of unique cells, i.e:

[A1, B1, C1] | [B1, C1, D1] = [A1, B1, C1, D1]

Now, let’s see what this looks like in practice..

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")

selection = table.excel_ref("C4:C10") | table.excel_ref('D4:D10') | table.excel_ref('E4:E10')

preview(selection.label_as('C, D & E'), bounded="A4:H10")

# Lets also print out the length of the selection
print(f'Selection has length: {len(selection)}')
C, D & E

Table 2a

ABCDEFGH
4£ million
5Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new workPrivate commercial new work
6Dataset identifier codeMV3WMV3XMVL9MV3YMV3ZMV42MV43
7199721581758819690140156010872125517
8199817481776419448136276334888527655
9199915151598017436132477129916730963
10200019011785319693124306753818231200

Selection has length: 21

but because its a union we can use overlapping selections without introducing duplicate cells.

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")

selection = table.excel_ref("C4:D10") | table.excel_ref('D4:E10')

preview(selection.label_as('C, D & E'), bounded="A4:H10")

# Lets also print out the length of the selection
print(f'Selection has length: {len(selection)}')
C, D & E

Table 2a

ABCDEFGH
4£ million
5Time periodPublic new housingPrivate new housingTotal new housingInfrastructure new workPublic other new workPrivate industrial new workPrivate commercial new work
6Dataset identifier codeMV3WMV3XMVL9MV3YMV3ZMV42MV43
7199721581758819690140156010872125517
8199817481776419448136276334888527655
9199915151598017436132477129916730963
10200019011785319693124306753818231200

Selection has length: 21

Note the length (number of selected cells in this case) is the same as the previous example even though we presented tidychef with overlapping selections.