Apply#
This section explain how to use the apply=
keyword with the Column()
class to do simple data cleansing during extraction.
Source Data#
The data source we’re using for these examples is shown below:
The full data source can be viewed here.
from tidychef import acquire, preview
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
What is Apply#
The apply=
keyword allowed you to pass in a single python “callable” (a function or class that can be passed arguments).
Such callables are passed into the Column()
class such as as per the following snippet:
Column(selection.attach_directly(left), apply=myfunction)
In all cases the callable is a thing that acts upon the contents of the specific column at the point of extraction (so it’ll change your output, never your input).
Apply: Function Example#
The following is an example of using apply=
with a simple user defined function.
from tidychef import acquire, preview
from tidychef.direction import right
from tidychef.selection import CsvSelectable, filters
from tidychef.output import TidyData, Column
# this is our custom function for apply
def lennon_iser(cell_value: str):
"""
A simple function to prefix " Lennon" where the value of
a cell is "John"
"""
if cell_value == "John":
return cell_value+" Lennon"
return cell_value
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
observations = table.filter(filters.is_numeric).label_as("Observations")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Members")
preview(observations, members)
tidy_data = TidyData(
observations,
Column(members.attach_directly(right), apply=lennon_iser)
)
print(tidy_data)
Observations |
Members |
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
Observations | Members |
1 | John Lennon |
5 | John Lennon |
9 | John Lennon |
2 | Keith |
6 | Keith |
10 | Keith |
2 | Paul |
6 | Paul |
10 | Paul |
3 | Mick |
7 | Mick |
11 | Mick |
2 | George |
7 | George |
11 | George |
3 | Charlie |
8 | Charlie |
12 | Charlie |
4 | Ringo |
8 | Ringo |
12 | Ringo |
5 | Ronnie |
9 | Ronnie |
13 | Ronnie |
Apply: Lambda Example#
Here’s the same example but using a lambda function
from tidychef import acquire, filters, preview
from tidychef.direction import right
from tidychef.output import TidyData, Column
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
observations = table.filter(filters.is_numeric).label_as("Observations")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Members")
preview(observations, members)
tidy_data = TidyData(
observations,
Column(members.attach_directly(right), apply=lambda x: x+" Lennon" if x == "John" else x)
)
print(tidy_data)
Observations |
Members |
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
Observations | Members |
1 | John Lennon |
5 | John Lennon |
9 | John Lennon |
2 | Keith |
6 | Keith |
10 | Keith |
2 | Paul |
6 | Paul |
10 | Paul |
3 | Mick |
7 | Mick |
11 | Mick |
2 | George |
7 | George |
11 | George |
3 | Charlie |
8 | Charlie |
12 | Charlie |
4 | Ringo |
8 | Ringo |
12 | Ringo |
5 | Ronnie |
9 | Ronnie |
13 | Ronnie |
Apply: Class#
This final example creates a callable python class.
This is a more complex example but also a powerful one, as a class based callable can hold data and allows you to make far more complex conditionals.
If you’re still relatively early in your journey with python you can safely come back to this at a later time. It’s useful but the function and lambda function implementations are sufficient for many if not most use cases.
For this example we’re going to add the surname of all beatles members.
from typing import Dict
from tidychef import acquire, preview
from tidychef.direction import right
from tidychef.output import TidyData, Column
from tidychef.selection import CsvSelectable
# this is our custom class
class SurnamesFromDict:
def __init__(self, name_surname_dict: Dict[str,str]):
"""
A class that adds surnames when first names are provided
"""
self.name_surname_dict = name_surname_dict
def __call__(self, cell_value):
return f"{cell_value} {self.name_surname_dict[cell_value]}"
# now we create an instance of our custom class with the data lookups
surname_getter = SurnamesFromDict({
"John": "Lennon",
"Paul": "McCartney",
"George": "Harrison",
"Ringo": "Starr"
})
# Now we do the extraction
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
observations = table.excel_ref('C4:E7').label_as("Observations")
members = table.excel_ref("B").is_not_blank().label_as("Members")
preview(observations, members)
tidy_data = TidyData(
observations,
Column(members.attach_directly(right), apply=surname_getter)
)
print(tidy_data)
Observations |
Members |
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
Observations | Members |
1 | John Lennon |
5 | John Lennon |
9 | John Lennon |
2 | Paul McCartney |
6 | Paul McCartney |
10 | Paul McCartney |
2 | George Harrison |
7 | George Harrison |
11 | George Harrison |
4 | Ringo Starr |
8 | Ringo Starr |
12 | Ringo Starr |