Bands
Our simple example from the repo splash page.
Tutorial Structure
In these example tutorials I’m going to comment heavily and cover nuances in a follow-up section (with liberal, targeted previews as needed), as that’s the easiest way to grapple with new ideas. It may also be worth opening these notebooks yourself (they’re in ./jupyterbook in the tidychef GitHub repo) so you can run, alter and generally have a play with them as part of your learning.
We’ll cover:
source data
requirements, what we’re aiming to do here
show the full script (all logic commented)
output the selection preview
nuances (where applicable)
view the output
This ordering is needed because the output for some of the examples is very long, so it has to come last. If you’re viewing this as a Jupyter Book (i.e. on the site) you can navigate between the above sections via the right-hand menu.
Note: these tutorial scripts might seem verbose due to all the comments, but that’s ok (this is a tutorial after all). If you take the comments out, you end up with a fairly succinct and human-readable encapsulation of what would otherwise (with existing tools) be a rather convoluted and fragile set of instructions to express.
In virtually all cases I’ll make heavy use of preview and bounded to look only at the relevant parts of what can be quite large datasets. Download links are provided for the source data.
Source
For this example we’re extracting the following table:
It comes from a CSV source, which can be viewed here.
Specification
We want a “Value” column to hold the observations
We want other columns of: “Band”, “Name”, “Asset”
We want to preview selections inline.
We want to output a single tidy data CSV named “bands_tidy.csv”
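To make the target concrete, here is a minimal standard-library sketch (not tidychef) of the output the specification describes: a CSV whose header row is Value, Band, Asset, Name, with one observation per row. The three rows are copied from the output table at the end of this page, and the helper name write_tidy_csv is hypothetical.

```python
import csv
import io

# Column order required by the specification: the observation column comes first.
FIELDNAMES = ["Value", "Band", "Asset", "Name"]

# A few tidy rows copied from this tutorial's output table (one observation per row).
ROWS = [
    {"Value": "1", "Band": "Beatles", "Asset": "Houses", "Name": "John"},
    {"Value": "5", "Band": "Beatles", "Asset": "Cars", "Name": "John"},
    {"Value": "2", "Band": "Rolling Stones", "Asset": "Houses", "Name": "Keith"},
]


def write_tidy_csv(rows, stream):
    """Hypothetical helper: write tidy rows as CSV with a fixed header order."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer so the sketch runs without touching the filesystem.
buffer = io.StringIO()
write_tidy_csv(ROWS, buffer)
print(buffer.getvalue())
```

To write a real file, open it with open("bands_tidy.csv", "w", newline="") and pass that file object instead of the StringIO buffer; in the recipe, tidychef handles this step via tidy_data.to_csv.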
Recipe
from tidychef import acquire, preview
from tidychef.direction import down, right, below
from tidychef.output import Column, TidyData
# Load a CSV table from a URL
table = acquire.csv.http(
"https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv"
)
# Select numeric observations and label them
observations = table.is_numeric().label_as("Value")
# Select headers and label them
bands = table.row_containing_strings(["Beatles"]).is_not_blank().label_as("Band")
assets = table.row_containing_strings(["Cars"]).is_not_blank().label_as("Asset")
names = table.cell_containing_string("Beatles").shift(down).expand_to_box().is_not_numeric().label_as("Name")
# We'll request a preview to see our selections
preview(observations, bands, assets, names)
# Build tidy data by attaching observations and headers
tidy_data = TidyData(
observations,
Column(bands.attach_closest(right)),
Column(assets.attach_directly(below)),
Column(names.attach_directly(right)),
)
# Export the tidy data to CSV
tidy_data.to_csv("bands_tidy.csv")
Value |
Band |
Asset |
Name |
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
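For intuition about what the three lookups in the recipe are doing, here is a plain-Python sketch on a small hand-typed grid modelled on the preview above. It is not how tidychef is implemented, the function names are hypothetical, the exact cell positions are assumed, and only the first two member rows are included. It reads the direction argument as "where the observations sit relative to the header", which is consistent with the output table: assets.attach_directly(below) takes the nearest header straight above an observation in the same column, names.attach_directly(right) takes the nearest header to its left in the same row, and bands.attach_closest(right) takes the header in the nearest column at or to the left of it, on any row.

```python
# Hand-typed grid modelled on the preview above. Cell positions are ASSUMED,
# and only the first two member rows (John/Keith, Paul/Mick) are included.
GRID = [
    ["", "", "Houses", "Cars", "Boats", "", "Houses", "Cars", "Boats"],
    ["", "Beatles", "", "", "", "Rolling Stones", "", "", ""],
    ["", "John", "1", "5", "9", "Keith", "2", "6", "10"],
    ["", "Paul", "2", "6", "10", "Mick", "3", "7", "11"],
]


def is_numeric(text):
    """Rough stand-in for a numeric check: non-negative integers only."""
    return text.strip().isdigit()


def cells(grid):
    """Yield (row, col, value) for every cell, row by row, left to right."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            yield r, c, value


def directly_above(headers, r, c):
    """Nearest header in the same column, above the observation."""
    hits = [(hr, hc) for (hr, hc) in headers if hc == c and hr < r]
    return headers[max(hits)] if hits else None


def directly_left(headers, r, c):
    """Nearest header in the same row, to the left of the observation."""
    hits = [(hr, hc) for (hr, hc) in headers if hr == r and hc < c]
    return headers[max(hits)] if hits else None


def closest_left(headers, r, c):
    """Header in the nearest column at or left of the observation, on any row."""
    hits = [(hc, hr) for (hr, hc) in headers if hc <= c]
    if not hits:
        return None
    hc, hr = max(hits)
    return headers[(hr, hc)]


# Selections: numeric cells are observations; headers map (row, col) -> text.
observations = [(r, c, v) for r, c, v in cells(GRID) if is_numeric(v)]
assets = {(0, c): v for c, v in enumerate(GRID[0]) if v}
bands = {(1, c): v for c, v in enumerate(GRID[1]) if v}
names = {(r, c): v for r, c, v in cells(GRID) if r >= 2 and v and not is_numeric(v)}

# One tidy row per observation, resolving each header by its lookup rule.
tidy = [
    {
        "Value": v,
        "Band": closest_left(bands, r, c),
        "Asset": directly_above(assets, r, c),
        "Name": directly_left(names, r, c),
    }
    for r, c, v in observations
]

for row in tidy:
    print(row)
```

The distinction that matters is "directly" versus "closest": a band header only sits above the first of its member columns, so a strict same-column lookup would miss most observations, whereas a nearest-column-to-the-left lookup covers the whole block.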
Nuances
As this is our first example it is reasonably easy to follow along with; the only thing I’d really stop and consider is the TidyData class, specifically this:
tidy_data = TidyData(
observations,
Column(bands.attach_closest(right)),
Column(assets.attach_directly(below)),
Column(names.attach_directly(right)),
)
which is really just
tidy_data = TidyData(
<your values>, # This becomes the "Value" column
<a column and how it visually relates to those values>, # Becomes the next column "Band"
<another column and how it visually relates to those values>, # Becomes the next column "Asset"
<another column and how it visually relates to those values>, # Becomes the next column "Name"
)
That’s probably the key insight here: every entry passed to the TidyData class becomes a column in your output file, and the columns are presented in the order you specify them (note: your value/observation column is always first).
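The "one entry, one column, in the order given" rule can be mimicked in a few lines of plain Python. The sketch below is hypothetical and is not the tidychef API: each column is a pair of a header label and a lookup function, the observation value is always written first, and the remaining columns follow in the order they were passed in. The toy lookups assume a simple row/column layout for the Beatles block only.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An observation is (row, col, value); a lookup maps an observation to a header value.
Observation = Tuple[int, int, str]
Lookup = Callable[[Observation], Optional[str]]


def build_tidy(value_label: str, observations: List[Observation],
               *columns: Tuple[str, Lookup]) -> List[Dict[str, Optional[str]]]:
    """Hypothetical TidyData-like helper: value column first, then columns in argument order."""
    rows = []
    for obs in observations:
        row: Dict[str, Optional[str]] = {value_label: obs[2]}  # value always comes first
        for label, lookup in columns:  # dicts keep insertion order, so columns follow the arguments
            row[label] = lookup(obs)
        rows.append(row)
    return rows


# Toy lookups for the Beatles block only, keyed on an ASSUMED row/column layout.
assets_by_col = {2: "Houses", 3: "Cars", 4: "Boats"}
names_by_row = {2: "John", 3: "Paul"}

rows = build_tidy(
    "Value",
    [(2, 2, "1"), (2, 3, "5"), (3, 2, "2")],
    ("Band", lambda obs: "Beatles"),
    ("Asset", lambda obs: assets_by_col.get(obs[1])),
    ("Name", lambda obs: names_by_row.get(obs[0])),
)

print(list(rows[0]))  # → ['Value', 'Band', 'Asset', 'Name']
print(rows[2])        # → {'Value': '2', 'Band': 'Beatles', 'Asset': 'Houses', 'Name': 'Paul'}
```

Passing the Name pair before the Band pair would move Name ahead of Band in every row, while Value would stay first.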
Outputs
The output generated by the above script can be downloaded here or viewed below.
print(tidy_data)
Value | Band | Asset | Name |
1 | Beatles | Houses | John |
5 | Beatles | Cars | John |
9 | Beatles | Boats | John |
2 | Rolling Stones | Houses | Keith |
6 | Rolling Stones | Cars | Keith |
10 | Rolling Stones | Boats | Keith |
2 | Beatles | Houses | Paul |
6 | Beatles | Cars | Paul |
10 | Beatles | Boats | Paul |
3 | Rolling Stones | Houses | Mick |
7 | Rolling Stones | Cars | Mick |
11 | Rolling Stones | Boats | Mick |
2 | Beatles | Houses | George |
7 | Beatles | Cars | George |
11 | Beatles | Boats | George |
3 | Rolling Stones | Houses | Charlie |
8 | Rolling Stones | Cars | Charlie |
12 | Rolling Stones | Boats | Charlie |
4 | Beatles | Houses | Ringo |
8 | Beatles | Cars | Ringo |
12 | Beatles | Boats | Ringo |
5 | Rolling Stones | Houses | Ronnie |
9 | Rolling Stones | Cars | Ronnie |
13 | Rolling Stones | Boats | Ronnie |
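Because the output is tidy (one observation per row), it is easy to regroup. As a quick sanity check, the sketch below uses only the standard library to read CSV text in the same shape as the table above and pivot it back into one entry per member. It assumes the downloaded bands_tidy.csv has the header row Value,Band,Asset,Name; only the first six rows are inlined so the example runs without the file.

```python
import csv
import io
from collections import defaultdict

# First six rows of the tidy output, inlined so the sketch runs without the file.
TIDY_CSV = """Value,Band,Asset,Name
1,Beatles,Houses,John
5,Beatles,Cars,John
9,Beatles,Boats,John
2,Rolling Stones,Houses,Keith
6,Rolling Stones,Cars,Keith
10,Rolling Stones,Boats,Keith
"""

# Pivot back to "wide": one dict per (band, member), keyed by asset.
wide = defaultdict(dict)
for record in csv.DictReader(io.StringIO(TIDY_CSV)):
    wide[(record["Band"], record["Name"])][record["Asset"]] = int(record["Value"])

for (band, name), assets in wide.items():
    print(band, name, assets)
```

To check the real file instead, open bands_tidy.csv with newline="" and pass the file object to csv.DictReader in place of the StringIO wrapper.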