Bands#

Our simple example from the repo splash page.

Tutorial Structure#

With these example tutorials I’m going to comment heavily and cover nuances in a follow up section (with liberal targetted previews as needed) as it’s the easiest way to grapple with new ideas. It may also be worth opening up these notebooks yourself (they’re in ./jupyterbook in the tidychef github repo) so you can run, alter and generally have a play about with this yourself as part of your learning.

We’ll cover:

  • source data

  • requirements, what we’re aiming to do here

  • show the full script (all logic commented)

  • output the selection preview

  • nuances (where applicable)

  • view the output

This sequencing is necessary as the output for some of the example is really long so that necessitates it coming last. If you’re viewing this via a jupyter book (i.e on the site) you can navigate between the above sections via your right hand menu.

Note - these tutorial scripts might seem verbose due all the comments but that’s ok (this is a tutorial after all). If you take them out you end up with a fairly succinct and human readable encapsulation of what would otherwise (with existing tools) be a rather convoluted and fragile set of instructions to express.

In virtually all cases I’ll make heavy use of preview and bounded to only look at relevant parts of what can be quite large datasets. Downloads links are provided for the source data.

Source#

For this example we’re extracting the following table:

From a csv source which can be viewed here.

Specification#

  • We want a “Value” column to hold the observations

  • We want other columns of: “Band”, “Name”, “Asset”

  • We want to preview selections inline.

  • We want to output one tidy data csv as “bands_tidy.csv”

Recipe#

from tidychef import acquire, filters, preview
from tidychef.direction import down, right, below
from tidychef.output import Column, TidyData

# Load a CSV table from a URL
table = acquire.csv.http(
    "https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv"
)

# Select numeric observations and label them
observations = table.is_numeric().label_as("Value")

# Select headers and label them
bands = table.row_containing_strings(["Beatles"]).is_not_blank().label_as("Band")
assets = table.row_containing_strings(["Cars"]).is_not_blank().label_as("Asset")
names = table.cell_containing_string("Beatles").shift(down).expand_to_box().is_not_numeric().label_as("Name")

# We'll request a preview to see our selections
preview(observations, bands, assets, names)

# Build tidy data by attaching observations and headers
tidy_data = TidyData(
    observations,
    Column(bands.attach_closest(right)),
    Column(assets.attach_directly(below)),
    Column(names.attach_directly(right)),
)

# Export the tidy data to CSV
tidy_data.to_csv("bands_tidy.csv")
Value
Band
Asset
Name

Unnamed Table

ABCDEFGHIJK
1
2HousesCarsBoatsHousesCarsBoats
3BeatlesRolling Stones
4John159Keith2610
5Paul2610Mick3711
6George2711Charlie3812
7Ringo4812Ronnie5913
8

Nuances#

So this is our initial example so is reasonaby easy to follow along with, the only thing that I’d really stop and consider is the TidyData class, this:

tidy_data = TidyData(
    observations,
    Column(bands.attach_closest(right)),
    Column(assets.attach_directly(below)),
    Column(names.attach_directly(right)),
)

which is really just

tidy_data = TidyData(
    <your values>,                                                           # This becomes the "Value" column                                                     
    <a column and how it visually related to those values>                   # Becomes the next column "Band" 
    <another column and how it visually relates to those values>             # Becomes the next column "Asset"
    <another column and how it visually related to those values              # Becomes the next column "Name"
)

That’s probably the key insight here, every row entry in the TidyData class becomes a column in your output file and they’re presented in the order you speciy them (note - your value/observation column is always first).

Outputs#

The output generated by the above script can be downloaded here or viewed below.

print(tidy_data)
ValueBandAssetName
1BeatlesHousesJohn
5BeatlesCarsJohn
9BeatlesBoatsJohn
2Rolling StonesHousesKeith
6Rolling StonesCarsKeith
10Rolling StonesBoatsKeith
2BeatlesHousesPaul
6BeatlesCarsPaul
10BeatlesBoatsPaul
3Rolling StonesHousesMick
7Rolling StonesCarsMick
11Rolling StonesBoatsMick
2BeatlesHousesGeorge
7BeatlesCarsGeorge
11BeatlesBoatsGeorge
3Rolling StonesHousesCharlie
8Rolling StonesCarsCharlie
12Rolling StonesBoatsCharlie
4BeatlesHousesRingo
8BeatlesCarsRingo
12BeatlesBoatsRingo
5Rolling StonesHousesRonnie
9Rolling StonesCarsRonnie
13Rolling StonesBoatsRonnie