Pandas

The following is a quick note on how to transfer data extracted via tidychef into a pandas dataframe.

Note - pandas is a huge and ever-evolving ecosystem in its own right. As such, we’re not going to bring it into tidychef as a dependency (it has very different goals, and we like our simple dependency chain). We do, however, provide the following conveniences for quickly shunting tidy data outputs into pandas in a suitably decoupled way.

Source Data

The data source we’re using for these examples is shown below:

The full data source can be viewed here.

from tidychef import acquire, preview
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)

Unnamed Table

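  | A       | B      | C      | D    | E     | F | G              | H       | I      | J    | K     |
1 |         |        |        |      |       |   |                |         |        |      |       |
2 |         |        | Houses | Cars | Boats |   |                |         | Houses | Cars | Boats |
3 | Beatles |        |        |      |       |   | Rolling Stones |         |        |      |       |
4 |         | John   | 1      | 5    | 9     |   |                | Keith   | 2      | 6    | 10    |
5 |         | Paul   | 2      | 6    | 10    |   |                | Mick    | 3      | 7    | 11    |
6 |         | George | 2      | 7    | 11    |   |                | Charlie | 3      | 8    | 12    |
7 |         | Ringo  | 4      | 8    | 12    |   |                | Ronnie  | 5      | 9    | 13    |
8 |         |        |        |      |       |   |                |         |        |      |       |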

TidyData.to_dict()

The following is an example of using the TidyData.to_dict() method.

Note - the dict in question is the same structure used by the pandas DataFrame.from_dict() method - this will form our mechanism of handover.

import json
from tidychef import acquire, preview, filters
from tidychef.direction import right, below
from tidychef.selection import CsvSelectable
from tidychef.output import TidyData, Column

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

observations = table.filter(filters.is_numeric).label_as("Observation")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Band")
assets = table.excel_ref('2').is_not_blank().label_as("Asset")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Member")
preview(observations, bands, assets, members)

tidy_data = TidyData(
    observations,
    Column(bands.attach_closest(right)),
    Column(assets.attach_directly(below)),
    Column(members.attach_directly(right))
)

# See the non dict tidydata
print(tidy_data)

# Now see it in dictionary form.
# We'll use json to prettify the dict version
print(json.dumps(tidy_data.to_dict(), indent=2))
Preview key: Observation, Band, Asset, Member

Unnamed Table

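  | A       | B      | C      | D    | E     | F | G              | H       | I      | J    | K     |
1 |         |        |        |      |       |   |                |         |        |      |       |
2 |         |        | Houses | Cars | Boats |   |                |         | Houses | Cars | Boats |
3 | Beatles |        |        |      |       |   | Rolling Stones |         |        |      |       |
4 |         | John   | 1      | 5    | 9     |   |                | Keith   | 2      | 6    | 10    |
5 |         | Paul   | 2      | 6    | 10    |   |                | Mick    | 3      | 7    | 11    |
6 |         | George | 2      | 7    | 11    |   |                | Charlie | 3      | 8    | 12    |
7 |         | Ringo  | 4      | 8    | 12    |   |                | Ronnie  | 5      | 9    | 13    |
8 |         |        |        |      |       |   |                |         |        |      |       |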

Observation | Band           | Asset  | Member
1           | Beatles        | Houses | John
5           | Beatles        | Cars   | John
9           | Beatles        | Boats  | John
2           | Rolling Stones | Houses | Keith
6           | Rolling Stones | Cars   | Keith
10          | Rolling Stones | Boats  | Keith
2           | Beatles        | Houses | Paul
6           | Beatles        | Cars   | Paul
10          | Beatles        | Boats  | Paul
3           | Rolling Stones | Houses | Mick
7           | Rolling Stones | Cars   | Mick
11          | Rolling Stones | Boats  | Mick
2           | Beatles        | Houses | George
7           | Beatles        | Cars   | George
11          | Beatles        | Boats  | George
3           | Rolling Stones | Houses | Charlie
8           | Rolling Stones | Cars   | Charlie
12          | Rolling Stones | Boats  | Charlie
4           | Beatles        | Houses | Ringo
8           | Beatles        | Cars   | Ringo
12          | Beatles        | Boats  | Ringo
5           | Rolling Stones | Houses | Ronnie
9           | Rolling Stones | Cars   | Ronnie
13          | Rolling Stones | Boats  | Ronnie

{
  "Observation": [
    "1",
    "5",
    "9",
    "2",
    "6",
    "10",
    "2",
    "6",
    "10",
    "3",
    "7",
    "11",
    "2",
    "7",
    "11",
    "3",
    "8",
    "12",
    "4",
    "8",
    "12",
    "5",
    "9",
    "13"
  ],
  "Band": [
    "Beatles",
    "Beatles",
    "Beatles",
    "Rolling Stones",
    "Rolling Stones",
    "Rolling Stones",
    "Beatles",
    "Beatles",
    "Beatles",
    "Rolling Stones",
    "Rolling Stones",
    "Rolling Stones",
    "Beatles",
    "Beatles",
    "Beatles",
    "Rolling Stones",
    "Rolling Stones",
    "Rolling Stones",
    "Beatles",
    "Beatles",
    "Beatles",
    "Rolling Stones",
    "Rolling Stones",
    "Rolling Stones"
  ],
  "Asset": [
    "Houses",
    "Cars",
    "Boats",
    "Houses",
    "Cars",
    "Boats",
    "Houses",
    "Cars",
    "Boats",
    "Houses",
    "Cars",
    "Boats",
    "Houses",
    "Cars",
    "Boats",
    "Houses",
    "Cars",
    "Boats",
    "Houses",
    "Cars",
    "Boats",
    "Houses",
    "Cars",
    "Boats"
  ],
  "Member": [
    "John",
    "John",
    "John",
    "Keith",
    "Keith",
    "Keith",
    "Paul",
    "Paul",
    "Paul",
    "Mick",
    "Mick",
    "Mick",
    "George",
    "George",
    "George",
    "Charlie",
    "Charlie",
    "Charlie",
    "Ringo",
    "Ringo",
    "Ringo",
    "Ronnie",
    "Ronnie",
    "Ronnie"
  ]
}
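
The dictionary printed above is column-oriented: each key is a column name and each value is a list holding one entry per observation. The sketch below illustrates that shape in plain Python, with no tidychef or pandas needed. It uses the first three rows copied by hand from the output above rather than a live to_dict() call, and the variable names (tidy_dict, rows) are illustrative only.

```python
# First three rows, copied by hand from the printed dictionary above.
# In real use this would be the result of tidy_data.to_dict().
tidy_dict = {
    "Observation": ["1", "5", "9"],
    "Band": ["Beatles", "Beatles", "Beatles"],
    "Asset": ["Houses", "Cars", "Boats"],
    "Member": ["John", "John", "John"],
}

# Every column should have the same length for the dict to describe a table.
lengths = {len(values) for values in tidy_dict.values()}
assert len(lengths) == 1, "columns differ in length"

# Zip the columns together to recover one dict per row.
columns = list(tidy_dict)
rows = [dict(zip(columns, row)) for row in zip(*tidy_dict.values())]

for row in rows:
    print(row)
```

Note that the observation values are strings in this structure, which matters once the data is handed to another tool.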

A dictionary like the one above can be passed directly to pandas, as follows:

import pandas as pd

# ... the code from the example above, which defines tidy_data ...

tidy_data_dict = tidy_data.to_dict()
dataframe = pd.DataFrame.from_dict(tidy_data_dict)
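
Because the printed dictionary holds observation values as strings, a likely next step after the handover is converting that column to numbers. The following is a minimal, self-contained sketch of that step. It assumes pandas is installed and substitutes a hand-copied four-row subset of the output above for a live tidy_data.to_dict() call; the totals variable and the groupby are illustrative choices only, not part of tidychef.

```python
import pandas as pd

# Hand-copied subset of the dictionary shown earlier (an assumption made
# for illustration; normally this comes from tidy_data.to_dict()).
tidy_data_dict = {
    "Observation": ["1", "5", "2", "6"],
    "Band": ["Beatles", "Beatles", "Rolling Stones", "Rolling Stones"],
    "Asset": ["Houses", "Cars", "Houses", "Cars"],
    "Member": ["John", "John", "Keith", "Keith"],
}

dataframe = pd.DataFrame.from_dict(tidy_data_dict)

# The observations arrive as strings, so convert them explicitly
# before doing any arithmetic.
dataframe["Observation"] = pd.to_numeric(dataframe["Observation"])

# Example of ordinary pandas work once the handover is done:
# total observations per band.
totals = dataframe.groupby("Band")["Observation"].sum()
print(totals)
```

Keeping the conversion on the pandas side fits the decoupled approach described at the top of this page: tidychef hands over a plain dict, and any typing decisions are made in pandas.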