TidyData: Horizontal Conditions#

When extracting data there will be columns whose value will depend conditionally on the values of other columns. In tidychef we handle this scenario with what we call HorizontalConditions.

Source Data#

The data source we’re using for these examples is shown below:

The full data source can be viewed here.

from tidychef import acquire, preview
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

Syntax#

The basic syntax for constructing a horizontal conditional is as follows:

Column.horizontal_condition(<name>, <callable>)

The <callable> is a python function or lambda function that operates upon a dictionary.

The Horizontal Condition Dictionary#

The dictionary in question consists of:

  • keys: the names of the extracted columns

  • values: the values extracted against the same observation the horoxontal condition is operating against.


Example:

Let’s imagine the you run a transform that will create the following line of tidy data.

Observation

Member

Assets

Band

5

John

Cars

Beatles

If you were to add a horizonal condition to the TidyData constructor, then for that specific observation the dictionary accessible to the horizontal condition would be:

{
    "Observation": "5",
    "Member": "John",
    "Assets": "Cars",
    "Band" : "Beatles
}

This will be shown in context in the examples below.

Condition: Singer In The Beatles#

For our first example, we’ll create a horizontal condition columns to identify members of The Rolling Stones whose names end with “e”.

You’ve seen this example before, so mainly focus on the new Column.horizontal_condition() syntax.

from typing import Dict
from tidychef import acquire, preview, filters
from tidychef.direction import right, below
from tidychef.selection import CsvSelectable, filters
from tidychef.output import TidyData, Column

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

def e_ending_stones(line: Dict[str, str]) -> str:
    """
    Returns "True" as string if member ends with "e"
    and band is "Rolling Stones"
    """
    if line["Member"].endswith("e") and line["Band"] == "Rolling Stones":
        return "True"
    return "False"

observations = table.filter(filters.is_numeric).label_as("Observation")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Band")
assets = table.excel_ref('2').is_not_blank().label_as("Asset")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Member")
preview(observations, bands, assets, members)

tidy_data = TidyData(
    observations,
    Column(bands.attach_closest(right)),
    Column(assets.attach_directly(below)),
    Column(members.attach_directly(right)),
    Column.horizontal_condition("Stones ending with e", e_ending_stones)
)
print(tidy_data)
Observation
Band
Asset
Member

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

ObservationBandAssetMemberStones ending with e
1BeatlesHousesJohnFalse
5BeatlesCarsJohnFalse
9BeatlesBoatsJohnFalse
2Rolling StonesHousesKeithFalse
6Rolling StonesCarsKeithFalse
10Rolling StonesBoatsKeithFalse
2BeatlesHousesPaulFalse
6BeatlesCarsPaulFalse
10BeatlesBoatsPaulFalse
3Rolling StonesHousesMickFalse
7Rolling StonesCarsMickFalse
11Rolling StonesBoatsMickFalse
2BeatlesHousesGeorgeFalse
7BeatlesCarsGeorgeFalse
11BeatlesBoatsGeorgeFalse
3Rolling StonesHousesCharlieTrue
8Rolling StonesCarsCharlieTrue
12Rolling StonesBoatsCharlieTrue
4BeatlesHousesRingoFalse
8BeatlesCarsRingoFalse
12BeatlesBoatsRingoFalse
5Rolling StonesHousesRonnieTrue
9Rolling StonesCarsRonnieTrue
13Rolling StonesBoatsRonnieTrue