Selection: Filters#
Filters are a means of applying conditional logic to filter down any given selection of cells.
Some standard filters are provided and will be explained here. We will also give some simple examples of how to make your own filters.
Source Data#
The data source we’re using for these examples is shown below:
Note - this particular table has some very verbose headers we don’t care about, so we’ll be using |
---|
The full data source can be downloaded here. We’ll be using th 6th tab named “Table2a”.
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
preview(table, bounded="A4:H10")
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
The Intention Of Filters#
Pre-existing filters are imported via the filters
module and typically accessed via dot notation as arguments to the .filter()
method.
This is a conscious design choice to allow us to freely add additional filters without polluting or over complicating the tidychef api.
Note - while this is a reasonably future proofed pattern, the intention here is to principally empower users to create filters or libraries of filters for their own use cases, DRY etc. |
---|
Filter Numeric#
Filters that work with the numeric or non numeric values of cells.
from tidychef import acquire, preview, filters
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
numeric = table.filter(filters.is_numeric).label_as("Cells that are numeric")
not_numeric = table.filter(filters.is_not_numeric).label_as("Cells that are NOT numeric")
preview(numeric, not_numeric, bounded="A4:H10")
Cells that are numeric |
Cells that are NOT numeric |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Filter Contains String#
This filter is a simple substring matcher, filtering down the selection of cells to just cells whose value contains the provided string.
from tidychef import acquire, preview, filters
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
housing = table.filter(filters.contains_string("housing")).label_as("Cells containing 'housing'")
preview(housing, bounded="A4:H10")
Cells containing 'housing' |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Custom Filters#
So a filter is just…
A function of python callable that returns True or False when given a “cell”
A “cell” is a tidychef object, the value (cell contents) of a cell can be access with
.value
For now we’ll demonstrate how to create a filter using a simple function and a lambda function.
from tidychef import acquire, preview
from tidychef.models.source.cell import Cell
from tidychef.selection import XlsxSelectable
def m_filter(cell: Cell) -> bool:
"""
A simple custom filter to find cells starting with a capital M
"""
return cell.value.startswith("M")
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
housing = table.filter(m_filter).label_as("Cells starting with M")
preview(housing, bounded="A4:H10")
Cells starting with M |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |
Now the same thing but using a lambda function
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 2a")
housing = table.filter(lambda cell: cell.value.startswith("M")).label_as("Cells starting with M")
preview(housing, bounded="A4:H10")
Cells starting with M |
Table 2a
A | B | C | D | E | F | G | H | |
4 | £ million | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MV3W | MV3X | MVL9 | MV3Y | MV3Z | MV42 | MV43 |
7 | 1997 | 2158 | 17588 | 19690 | 14015 | 6010 | 8721 | 25517 |
8 | 1998 | 1748 | 17764 | 19448 | 13627 | 6334 | 8885 | 27655 |
9 | 1999 | 1515 | 15980 | 17436 | 13247 | 7129 | 9167 | 30963 |
10 | 2000 | 1901 | 17853 | 19693 | 12430 | 6753 | 8182 | 31200 |