Apply#

This section explain how to use the apply= keyword with the Column() class to do simple data cleansing during extraction.

Source Data#

The data source we’re using for these examples is shown below:

The full data source can be viewed here.

from tidychef import acquire, preview
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

What is Apply#

The apply= keyword allowed you to pass in a single python “callable” (a function or class that can be passed arguments).

Such callables are passed into the Column() class such as as per the following snippet:

Column(selection.attach_directly(left), apply=myfunction)

In all cases the callable is a thing that acts upon the contents of the specific column at the point of extraction (so it’ll change your output, never your input).

Apply: Function Example#

The following is an example of using apply= with a simple user defined function.

from tidychef import acquire, preview
from tidychef.direction import right
from tidychef.selection import CsvSelectable, filters
from tidychef.output import TidyData, Column

# this is our custom function for apply
def lennon_iser(cell_value: str):
    """
    A simple function to prefix " Lennon" where the value of
    a cell is "John"
    """
    if cell_value == "John":
        return cell_value+" Lennon"
    return cell_value


table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

observations = table.filter(filters.is_numeric).label_as("Observations")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Members")
preview(observations, members)

tidy_data = TidyData(
    observations,
    Column(members.attach_directly(right), apply=lennon_iser)
)
print(tidy_data)
Observations
Members

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

ObservationsMembers
1John Lennon
5John Lennon
9John Lennon
2Keith
6Keith
10Keith
2Paul
6Paul
10Paul
3Mick
7Mick
11Mick
2George
7George
11George
3Charlie
8Charlie
12Charlie
4Ringo
8Ringo
12Ringo
5Ronnie
9Ronnie
13Ronnie


Apply: Lambda Example#

Here’s the same example but using a lambda function

from tidychef import acquire, filters, preview
from tidychef.direction import right
from tidychef.output import TidyData, Column
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

observations = table.filter(filters.is_numeric).label_as("Observations")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Members")
preview(observations, members)

tidy_data = TidyData(
    observations,
    Column(members.attach_directly(right), apply=lambda x: x+" Lennon" if x == "John" else x)
)
print(tidy_data)
Observations
Members

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

ObservationsMembers
1John Lennon
5John Lennon
9John Lennon
2Keith
6Keith
10Keith
2Paul
6Paul
10Paul
3Mick
7Mick
11Mick
2George
7George
11George
3Charlie
8Charlie
12Charlie
4Ringo
8Ringo
12Ringo
5Ronnie
9Ronnie
13Ronnie


Apply: Class#

This final example creates a callable python class.

This is a more complex example but also a powerful one, as a class based callable can hold data and allows you to make far more complex conditionals.

If you’re still relatively early in your journey with python you can safely come back to this at a later time. It’s useful but the function and lambda function implementations are sufficient for many if not most use cases.

For this example we’re going to add the surname of all beatles members.

from typing import Dict
from tidychef import acquire, preview
from tidychef.direction import right
from tidychef.output import TidyData, Column
from tidychef.selection import CsvSelectable

# this is our custom class
class SurnamesFromDict:

    def __init__(self, name_surname_dict: Dict[str,str]):
        """
        A class that adds surnames when first names are provided
        """
        self.name_surname_dict = name_surname_dict

    def __call__(self, cell_value):
        return f"{cell_value} {self.name_surname_dict[cell_value]}"

# now we create an instance of our custom class with the data lookups
surname_getter = SurnamesFromDict({
        "John": "Lennon",
        "Paul": "McCartney",
        "George": "Harrison",
        "Ringo": "Starr"
    })


# Now we do the extraction
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
observations = table.excel_ref('C4:E7').label_as("Observations")
members = table.excel_ref("B").is_not_blank().label_as("Members")
preview(observations, members)

tidy_data = TidyData(
    observations,
    Column(members.attach_directly(right), apply=surname_getter)
)
print(tidy_data)
Observations
Members

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

ObservationsMembers
1John Lennon
5John Lennon
9John Lennon
2Paul McCartney
6Paul McCartney
10Paul McCartney
2George Harrison
7George Harrison
11George Harrison
4Ringo Starr
8Ringo Starr
12Ringo Starr