TidyData: Closest#

This section explains and gives examples of defining a closest visual relationship.

Source Data#

The data source we’re using for these examples is shown below:

The full data source can be viewed here.

from tidychef import acquire, preview
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

The Problem#

If you look at the below preview, there are two problem cells in this table:

from tidychef import acquire, preview
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Bands")
preview(bands)
Bands

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

These cells are problematic as they do not exist in a specific cardinal direction to the observations that they inform.

In other words attach_directly() will not work.

Instead, we need to use attach_closest()

The Solution#

Firstly, lets create some selections of cells.

from tidychef import acquire, preview, filters
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

observations = table.filter(filters.is_numeric).label_as("Observations")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Bands")
preview(observations, bands)
Observations
Bands

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

Now think of how you’d describe this visual relationship in plain english. They way we think about it in tidychef would be:

For each cell in the “bands” selection, the observations are the ones that are the “closest” to the right.

As a tidychef method, this is expressed as bands.attach_closest(right).

Lets try this in practice.

from tidychef import acquire, preview, filters
from tidychef.direction import right
from tidychef.output import TidyData, Column
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

observations = table.filter(filters.is_numeric).label_as("Observations")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Bands")
preview(observations, bands)

tidy_data = TidyData(
    observations,
    Column(bands.attach_closest(right))
)
print(tidy_data)
Observations
Bands

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

ObservationsBands
1Beatles
5Beatles
9Beatles
2Rolling Stones
6Rolling Stones
10Rolling Stones
2Beatles
6Beatles
10Beatles
3Rolling Stones
7Rolling Stones
11Rolling Stones
2Beatles
7Beatles
11Beatles
3Rolling Stones
8Rolling Stones
12Rolling Stones
4Beatles
8Beatles
12Beatles
5Rolling Stones
9Rolling Stones
13Rolling Stones


Putting it all together#

Now let’s combine this new closest method with the direct method previously detailed.

from tidychef import acquire, preview, filters
from tidychef.direction import right, below
from tidychef.output import TidyData, Column
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

observations = table.filter(filters.is_numeric).label_as("Observations")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Bands")
assets = table.excel_ref('2').is_not_blank().label_as("Assets")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Members")
preview(observations, bands, assets, members)

tidy_data = TidyData(
    observations,
    Column(bands.attach_closest(right)),
    Column(assets.attach_directly(below)),
    Column(members.attach_directly(right))

)
print(tidy_data)
Observations
Bands
Assets
Members

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

ObservationsBandsAssetsMembers
1BeatlesHousesJohn
5BeatlesCarsJohn
9BeatlesBoatsJohn
2Rolling StonesHousesKeith
6Rolling StonesCarsKeith
10Rolling StonesBoatsKeith
2BeatlesHousesPaul
6BeatlesCarsPaul
10BeatlesBoatsPaul
3Rolling StonesHousesMick
7Rolling StonesCarsMick
11Rolling StonesBoatsMick
2BeatlesHousesGeorge
7BeatlesCarsGeorge
11BeatlesBoatsGeorge
3Rolling StonesHousesCharlie
8Rolling StonesCarsCharlie
12Rolling StonesBoatsCharlie
4BeatlesHousesRingo
8BeatlesCarsRingo
12BeatlesBoatsRingo
5Rolling StonesHousesRonnie
9Rolling StonesCarsRonnie
13Rolling StonesBoatsRonnie