Bands

Bands#

Our simple example from the repo splash page.

Tutorial Structure#

With these example tutorials I’m going to comment heavily and cover nuances in a follow up section (with liberal targetted previews as needed) as it’s the easiest way to grapple with new ideas. It may also be worth opening up these notebooks yourself (they’re in ./jupyterbook in the tidychef github repo) so you can run, alter and generally have a play about with this yourself as part of your learning.

We’ll cover:

source data
requirements, what we’re aiming to do here
show the full script (all logic commented)
output the selection preview
nuances (where applicable)
view the output

This sequencing is necessary as the output for some of the example is really long so that necessitates it coming last. If you’re viewing this via a jupyter book (i.e on the site) you can navigate between the above sections via your right hand menu.

Note - these tutorial scripts might seem verbose due all the comments but that’s ok (this is a tutorial after all). If you take them out you end up with a fairly succinct and human readable encapsulation of what would otherwise (with existing tools) be a rather convoluted and fragile set of instructions to express.

In virtually all cases I’ll make heavy use of preview and bounded to only look at relevant parts of what can be quite large datasets. Downloads links are provided for the source data.

Source#

For this example we’re extracting the following table:

From a csv source which can be viewed here.

Specification#

We want a “Value” column to hold the observations
We want other columns of: “Band”, “Name”, “Asset”
We want to preview selections inline.
We want to output one tidy data csv as “bands_tidy.csv”

Recipe#

from tidychef import acquire, filters, preview
from tidychef.direction import down, right, below
from tidychef.output import Column, TidyData

# Load a CSV table from a URL
table = acquire.csv.http(
    "https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv"
)

# Select numeric observations and label them
observations = table.is_numeric().label_as("Value")

# Select headers and label them
bands = table.row_containing_strings(["Beatles"]).is_not_blank().label_as("Band")
assets = table.row_containing_strings(["Cars"]).is_not_blank().label_as("Asset")
names = table.cell_containing_string("Beatles").shift(down).expand_to_box().is_not_numeric().label_as("Name")

# We'll request a preview to see our selections
preview(observations, bands, assets, names)

# Build tidy data by attaching observations and headers
tidy_data = TidyData(
    observations,
    Column(bands.attach_closest(right)),
    Column(assets.attach_directly(below)),
    Column(names.attach_directly(right)),
)

# Export the tidy data to CSV
tidy_data.to_csv("bands_tidy.csv")

Value

Band

Asset

Name

Unnamed Table

	A	B	C	D	E	F	G	H	I	J	K
1
2			Houses	Cars	Boats				Houses	Cars	Boats
3	Beatles						Rolling Stones
4		John	1	5	9			Keith	2	6	10
5		Paul	2	6	10			Mick	3	7	11
6		George	2	7	11			Charlie	3	8	12
7		Ringo	4	8	12			Ronnie	5	9	13
8

Nuances#

So this is our initial example so is reasonaby easy to follow along with, the only thing that I’d really stop and consider is the TidyData class, this:

tidy_data = TidyData(
    observations,
    Column(bands.attach_closest(right)),
    Column(assets.attach_directly(below)),
    Column(names.attach_directly(right)),
)

which is really just

tidy_data = TidyData(
    <your values>,                                                           # This becomes the "Value" column                                                     
    <a column and how it visually related to those values>                   # Becomes the next column "Band" 
    <another column and how it visually relates to those values>             # Becomes the next column "Asset"
    <another column and how it visually related to those values              # Becomes the next column "Name"
)

That’s probably the key insight here, every row entry in the TidyData class becomes a column in your output file and they’re presented in the order you speciy them (note - your value/observation column is always first).

Outputs#

The output generated by the above script can be downloaded here or viewed below.

print(tidy_data)

Value	Band	Asset	Name
1	Beatles	Houses	John
5	Beatles	Cars	John
9	Beatles	Boats	John
2	Rolling Stones	Houses	Keith
6	Rolling Stones	Cars	Keith
10	Rolling Stones	Boats	Keith
2	Beatles	Houses	Paul
6	Beatles	Cars	Paul
10	Beatles	Boats	Paul
3	Rolling Stones	Houses	Mick
7	Rolling Stones	Cars	Mick
11	Rolling Stones	Boats	Mick
2	Beatles	Houses	George
7	Beatles	Cars	George
11	Beatles	Boats	George
3	Rolling Stones	Houses	Charlie
8	Rolling Stones	Cars	Charlie
12	Rolling Stones	Boats	Charlie
4	Beatles	Houses	Ringo
8	Beatles	Cars	Ringo
12	Beatles	Boats	Ringo
5	Rolling Stones	Houses	Ronnie
9	Rolling Stones	Cars	Ronnie
13	Rolling Stones	Boats	Ronnie