Selection: Waffle#
A waffle()
is a method for selecting cells that exist at the intersection of two selections.
It’s a concept that’s easiest to explain with examples.
Source Data#
The data source we’re using for these examples is shown below:
Note - this particular table has some very verbose headers we don’t care about, so we’ll be using |
---|
The full data source can be downloaded here. We’ll be using th 6th tab named “Table2a”.
from tidychef import acquire, preview
from tidychef.direction import right, up
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
preview(table, bounded="A4:H10")
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Now lets image we have two selections as follows
codes = table.excel_ref("B5:H5").label_as("Codes")
years = table.excel_ref("A7:A10").label_as("Years")
preview(codes, years, bounded="A4:H10")
Codes |
Years |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Now we have our column values we want to get observations - this is where waffle()
comes in!
The syntax is as follows:
<selection1>.waffle(<direction>, <selection2>)
So in our example we’re starting with the “years” selection and looking right to create a waffle against the “codes” selection, as follows:
waffled_selection = years.waffle(right, codes).label_as("Values")
preview(waffled_selection, bounded="A4:H10")
Values |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Why the direction?#
Is the typical question at this point. It’s because you can always interpret a waffle along more than one axis.
As an example, lets use the same command but look to waffle up rather than right.
waffled_selection = years.waffle(up, codes).label_as("Values")
preview(waffled_selection, bounded="A4:H10")
Values |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
You cannot waffle overlaps#
The waffle()
method is very strict in that it will not allow you to waffle two selections that overlap on the relevant axis.
Again, we’ll show this with an example.
codes = table.excel_ref("B5:H5").label_as("Codes")
years = table.excel_ref("C7:C10").label_as("Years")
preview(codes, years, bounded="A4:H10")
waffled_selection = years.waffle(right, codes)
Codes |
Years |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
---------------------------------------------------------------------------
AmbiguousWaffleError Traceback (most recent call last)
Cell In[5], line 4
2 years = table.excel_ref("C7:C10").label_as("Years")
3 preview(codes, years, bounded="A4:H10")
----> 4 waffled_selection = years.waffle(right, codes)
File ~/.pyenv/versions/3.12.11/lib/python3.12/site-packages/tidychef/utils/decorators/dontmutate.py:24, in dontmutate.<locals>.wrapper(self, *args, **kwargs)
21 @wraps(method)
22 def wrapper(self, *args, **kwargs):
23 self = copy.deepcopy(self)
---> 24 return method(self, *args, **kwargs)
File ~/.pyenv/versions/3.12.11/lib/python3.12/site-packages/tidychef/selection/selectable.py:537, in Selectable.waffle(self, direction, additional_selection)
535 highest_x = dfc.maximum_x_offset(self.cells)
536 if any([x for x in additional_selection if x.x <= highest_x]):
--> 537 raise AmbiguousWaffleError(
538 "When using waffle right, your additional selections must all "
539 "be right of your initial selections."
540 )
541 if direction.is_left:
542 lowest_x = dfc.minimum_x_offset(self.cells)
AmbiguousWaffleError: When using waffle right, your additional selections must all be right of your initial selections.