TidyData: Closest#
This section explains and gives examples of defining a closest
visual relationship.
Source Data#
The data source we’re using for these examples is shown below:
The full data source can be viewed here.
from tidychef import acquire, preview
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
The Problem#
If you look at the below preview, there are two problem cells in this table:
from tidychef import acquire, preview
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Bands")
preview(bands)
Bands |
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
These cells are problematic as they do not exist in a specific cardinal direction to the observations that they inform.
In other words attach_directly()
will not work.
Instead, we need to use attach_closest()
The Solution#
Firstly, lets create some selections of cells.
from tidychef import acquire, preview, filters
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
observations = table.filter(filters.is_numeric).label_as("Observations")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Bands")
preview(observations, bands)
Observations |
Bands |
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
Now think of how you’d describe this visual relationship in plain english. They way we think about it in tidychef
would be:
For each cell in the “bands” selection, the observations are the ones that are the “closest” to the right.
As a tidychef method, this is expressed as bands.attach_closest(right)
.
Lets try this in practice.
from tidychef import acquire, preview, filters
from tidychef.direction import right
from tidychef.output import TidyData, Column
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
observations = table.filter(filters.is_numeric).label_as("Observations")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Bands")
preview(observations, bands)
tidy_data = TidyData(
observations,
Column(bands.attach_closest(right))
)
print(tidy_data)
Observations |
Bands |
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
Observations | Bands |
1 | Beatles |
5 | Beatles |
9 | Beatles |
2 | Rolling Stones |
6 | Rolling Stones |
10 | Rolling Stones |
2 | Beatles |
6 | Beatles |
10 | Beatles |
3 | Rolling Stones |
7 | Rolling Stones |
11 | Rolling Stones |
2 | Beatles |
7 | Beatles |
11 | Beatles |
3 | Rolling Stones |
8 | Rolling Stones |
12 | Rolling Stones |
4 | Beatles |
8 | Beatles |
12 | Beatles |
5 | Rolling Stones |
9 | Rolling Stones |
13 | Rolling Stones |
Putting it all together#
Now let’s combine this new closest method with the direct method previously detailed.
from tidychef import acquire, preview, filters
from tidychef.direction import right, below
from tidychef.output import TidyData, Column
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
observations = table.filter(filters.is_numeric).label_as("Observations")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Bands")
assets = table.excel_ref('2').is_not_blank().label_as("Assets")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Members")
preview(observations, bands, assets, members)
tidy_data = TidyData(
observations,
Column(bands.attach_closest(right)),
Column(assets.attach_directly(below)),
Column(members.attach_directly(right))
)
print(tidy_data)
Observations |
Bands |
Assets |
Members |
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
Observations | Bands | Assets | Members |
1 | Beatles | Houses | John |
5 | Beatles | Cars | John |
9 | Beatles | Boats | John |
2 | Rolling Stones | Houses | Keith |
6 | Rolling Stones | Cars | Keith |
10 | Rolling Stones | Boats | Keith |
2 | Beatles | Houses | Paul |
6 | Beatles | Cars | Paul |
10 | Beatles | Boats | Paul |
3 | Rolling Stones | Houses | Mick |
7 | Rolling Stones | Cars | Mick |
11 | Rolling Stones | Boats | Mick |
2 | Beatles | Houses | George |
7 | Beatles | Cars | George |
11 | Beatles | Boats | George |
3 | Rolling Stones | Houses | Charlie |
8 | Rolling Stones | Cars | Charlie |
12 | Rolling Stones | Boats | Charlie |
4 | Beatles | Houses | Ringo |
8 | Beatles | Cars | Ringo |
12 | Beatles | Boats | Ringo |
5 | Rolling Stones | Houses | Ronnie |
9 | Rolling Stones | Cars | Ronnie |
13 | Rolling Stones | Boats | Ronnie |