Selection: Regex#
No python data wrangling framework would be complete without a nod to regular expressions.
The full syntax of regex is beyond the scope of this tutorial (but you can learn more here or via many online sources) and it’s included as a precaution, you should rarely if ever find you need to dip into regex.
This page is just some simple example on how to use regex in relation to tidychef cells.
IMPORTANT - regex is powerful …but… it’s also quite slow, as a general rule always look to use the other methods whenever they’ll do the same job.
Source Data#
The data source we’re using for these examples is shown below:
Note - this particular table has some very verbose headers we don’t care about, so we’ll be using |
---|
The full data source can be downloaded here. We’ll be using th 10th tab named “Table 3c”.
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3c")
preview(table, bounded="A4:H10")
Table 3c
A | B | C | D | E | F | G | H | |
4 | Percentage change 3 months on previous 3 months | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MVO6 | MVO7 | MVO8 | MVO9 | MVP2 | MVP3 | MVP4 |
7 | Jun 2010 | 5.6 | 9.8 | 8.8 | 3 | 4.3 | 3.7 | 1.9 |
8 | Jul 2010 | 2 | 5.6 | 4.8 | 0.2 | -0.2 | 9.7 | 3.5 |
9 | Aug 2010 | 5.5 | 4.5 | 4.7 | -2.9 | -2.9 | 24.4 | 5.9 |
10 | Sep 2010 | 11.7 | 7.5 | 8.5 | -6.8 | -3.3 | 16.1 | 5.3 |
Simple Regex Examples#
The following are simple examples of how to use regex with tidychef selections.
Note, for brevity we use the common shorthand re
for regex.
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable
table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3c")
# cells beginning with a capital M
m_cells = table.re("M.*").label_as("Cells starting with a capital M")
# cells containing the word "housing"
housing = table.re(".*housing.*").label_as("Cells containing the word housing")
# cells containing the word "work"
work = table.re(".*work.*").label_as("Cells containing the word work")
# cells ending in a year
year = table.re(".*[0-9][0-9][0-9][0-9]").label_as("Cells ending in a year")
preview(m_cells, housing, work, year, bounded="A4:H10")
Cells starting with a capital M |
Cells containing the word housing |
Cells containing the word work |
Cells ending in a year |
Table 3c
A | B | C | D | E | F | G | H | |
4 | Percentage change 3 months on previous 3 months | |||||||
5 | Time period | Public new housing | Private new housing | Total new housing | Infrastructure new work | Public other new work | Private industrial new work | Private commercial new work |
6 | Dataset identifier code | MVO6 | MVO7 | MVO8 | MVO9 | MVP2 | MVP3 | MVP4 |
7 | Jun 2010 | 5.6 | 9.8 | 8.8 | 3 | 4.3 | 3.7 | 1.9 |
8 | Jul 2010 | 2 | 5.6 | 4.8 | 0.2 | -0.2 | 9.7 | 3.5 |
9 | Aug 2010 | 5.5 | 4.5 | 4.7 | -2.9 | -2.9 | 24.4 | 5.9 |
10 | Sep 2010 | 11.7 | 7.5 | 8.5 | -6.8 | -3.3 | 16.1 | 5.3 |