Pandas
The following is a quick note on how to transfer data extracted with tidychef into a pandas DataFrame.
Note: pandas is a huge, ever-evolving ecosystem in its own right. For that reason we are not going to make it a tidychef dependency (the two projects have very different goals, and we like our simple dependency chain). We do, however, provide the following conveniences for quickly moving tidy data outputs into pandas in a suitably decoupled way.
Source Data
The data source we’re using for these examples is shown below:
The full data source can be viewed here.
from tidychef import acquire, preview
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)
Unnamed Table
|   | A | B | C | D | E | F | G | H | I | J | K |
| 1 |   |   |   |   |   |   |   |   |   |   |   |
| 2 |   |   | Houses | Cars | Boats |   |   |   | Houses | Cars | Boats |
| 3 | Beatles |   |   |   |   |   | Rolling Stones |   |   |   |   |
| 4 |   | John | 1 | 5 | 9 |   |   | Keith | 2 | 6 | 10 |
| 5 |   | Paul | 2 | 6 | 10 |   |   | Mick | 3 | 7 | 11 |
| 6 |   | George | 2 | 7 | 11 |   |   | Charlie | 3 | 8 | 12 |
| 7 |   | Ringo | 4 | 8 | 12 |   |   | Ronnie | 5 | 9 | 13 |
| 8 |   |   |   |   |   |   |   |   |   |   |   |
TidyData.to_dict()
The following is an example of using the TidyData.to_dict() method.
Note: the returned dictionary has the structure expected by the pandas DataFrame.from_dict() method (each column name mapped to a list of that column's values). This is our mechanism of handover.
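Before the full tidychef example, here is a minimal plain-Python sketch of that column-oriented shape. It uses three rows copied from the tidy output shown later on this page. Each key is a column name mapped to a list, and row i is made up of the i-th item of every list. pandas' default `orient="columns"` in `DataFrame.from_dict()` reads such a dict as one column per key. The helpers `rows_to_columns` and `columns_to_rows` are hypothetical illustrations only. They are not part of tidychef or pandas.

```python
from typing import Dict, List


# Hypothetical helper: turn a header plus rows into the column-oriented
# {column name: [values, ...]} layout that DataFrame.from_dict() accepts.
def rows_to_columns(header: List[str], rows: List[List[str]]) -> Dict[str, List[str]]:
    columns: Dict[str, List[str]] = {name: [] for name in header}
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"row {row!r} does not match header {header!r}")
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


# Hypothetical helper: the inverse, rebuilding rows from the column lists.
def columns_to_rows(columns: Dict[str, List[str]]) -> List[List[str]]:
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("every column list must have the same length")
    # dicts keep insertion order, so zip() walks the columns left to right
    return [list(row) for row in zip(*columns.values())]


header = ["Observation", "Band", "Asset", "Member"]
rows = [
    ["1", "Beatles", "Houses", "John"],
    ["5", "Beatles", "Cars", "John"],
    ["2", "Rolling Stones", "Houses", "Keith"],
]

columns = rows_to_columns(header, rows)
print(columns["Observation"])  # ['1', '5', '2']
print(columns_to_rows(columns) == rows)  # True
```

The equal-length check matters. A dict of lists with mismatched lengths cannot be read as rows, and pandas rejects it too.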
import json
from tidychef import acquire, preview, filters
from tidychef.direction import right, below
from tidychef.selection import CsvSelectable
from tidychef.output import TidyData, Column
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
observations = table.filter(filters.is_numeric).label_as("Observation")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Band")
assets = table.excel_ref('2').is_not_blank().label_as("Asset")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Member")
preview(observations, bands, assets, members)
tidy_data = TidyData(
observations,
Column(bands.attach_closest(right)),
Column(assets.attach_directly(below)),
Column(members.attach_directly(right))
)
# Print the TidyData object itself (rendered as a table)
print(tidy_data)
# Now print it in dictionary form,
# using json to pretty-print the dict version
print(json.dumps(tidy_data.to_dict(), indent=2))
Observation | Band | Asset | Member
Unnamed Table
|   | A | B | C | D | E | F | G | H | I | J | K |
| 1 |   |   |   |   |   |   |   |   |   |   |   |
| 2 |   |   | Houses | Cars | Boats |   |   |   | Houses | Cars | Boats |
| 3 | Beatles |   |   |   |   |   | Rolling Stones |   |   |   |   |
| 4 |   | John | 1 | 5 | 9 |   |   | Keith | 2 | 6 | 10 |
| 5 |   | Paul | 2 | 6 | 10 |   |   | Mick | 3 | 7 | 11 |
| 6 |   | George | 2 | 7 | 11 |   |   | Charlie | 3 | 8 | 12 |
| 7 |   | Ringo | 4 | 8 | 12 |   |   | Ronnie | 5 | 9 | 13 |
| 8 |   |   |   |   |   |   |   |   |   |   |   |
Observation | Band | Asset | Member |
1 | Beatles | Houses | John |
5 | Beatles | Cars | John |
9 | Beatles | Boats | John |
2 | Rolling Stones | Houses | Keith |
6 | Rolling Stones | Cars | Keith |
10 | Rolling Stones | Boats | Keith |
2 | Beatles | Houses | Paul |
6 | Beatles | Cars | Paul |
10 | Beatles | Boats | Paul |
3 | Rolling Stones | Houses | Mick |
7 | Rolling Stones | Cars | Mick |
11 | Rolling Stones | Boats | Mick |
2 | Beatles | Houses | George |
7 | Beatles | Cars | George |
11 | Beatles | Boats | George |
3 | Rolling Stones | Houses | Charlie |
8 | Rolling Stones | Cars | Charlie |
12 | Rolling Stones | Boats | Charlie |
4 | Beatles | Houses | Ringo |
8 | Beatles | Cars | Ringo |
12 | Beatles | Boats | Ringo |
5 | Rolling Stones | Houses | Ronnie |
9 | Rolling Stones | Cars | Ronnie |
13 | Rolling Stones | Boats | Ronnie |
{
"Observation": [
"1",
"5",
"9",
"2",
"6",
"10",
"2",
"6",
"10",
"3",
"7",
"11",
"2",
"7",
"11",
"3",
"8",
"12",
"4",
"8",
"12",
"5",
"9",
"13"
],
"Band": [
"Beatles",
"Beatles",
"Beatles",
"Rolling Stones",
"Rolling Stones",
"Rolling Stones",
"Beatles",
"Beatles",
"Beatles",
"Rolling Stones",
"Rolling Stones",
"Rolling Stones",
"Beatles",
"Beatles",
"Beatles",
"Rolling Stones",
"Rolling Stones",
"Rolling Stones",
"Beatles",
"Beatles",
"Beatles",
"Rolling Stones",
"Rolling Stones",
"Rolling Stones"
],
"Asset": [
"Houses",
"Cars",
"Boats",
"Houses",
"Cars",
"Boats",
"Houses",
"Cars",
"Boats",
"Houses",
"Cars",
"Boats",
"Houses",
"Cars",
"Boats",
"Houses",
"Cars",
"Boats",
"Houses",
"Cars",
"Boats",
"Houses",
"Cars",
"Boats"
],
"Member": [
"John",
"John",
"John",
"Keith",
"Keith",
"Keith",
"Paul",
"Paul",
"Paul",
"Mick",
"Mick",
"Mick",
"George",
"George",
"George",
"Charlie",
"Charlie",
"Charlie",
"Ringo",
"Ringo",
"Ringo",
"Ronnie",
"Ronnie",
"Ronnie"
]
}
A dictionary like the one above can be passed directly to pandas, as follows:
import pandas as pd
# ... the code from the example above, which builds tidy_data ...
tidy_data_dict = tidy_data.to_dict()
dataframe = pd.DataFrame.from_dict(tidy_data_dict)
print(dataframe)
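Every value in the dictionary above is a string, including the observations. The resulting DataFrame columns therefore hold strings, typically with the `object` dtype, although newer pandas releases may infer a dedicated string dtype instead.

Here is a minimal follow-up sketch, assuming you want numeric observations. It converts that column with `pd.to_numeric()` and then, optionally, uses `DataFrame.pivot()` to return to a wide layout. To stay self-contained, it builds a small dict by hand instead of calling tidychef. The dict holds four of the rows from the tidy output above, in the same shape as `tidy_data.to_dict()`. Swap in your real `tidy_data_dict` when using it.

```python
import pandas as pd

# Same shape as tidy_data.to_dict(): column name -> list of string values.
# These four rows are copied from the tidy output above.
tidy_data_dict = {
    "Observation": ["1", "5", "2", "6"],
    "Band": ["Beatles", "Beatles", "Rolling Stones", "Rolling Stones"],
    "Asset": ["Houses", "Cars", "Houses", "Cars"],
    "Member": ["John", "John", "Keith", "Keith"],
}

dataframe = pd.DataFrame.from_dict(tidy_data_dict)
# String columns: object dtype, or a dedicated string dtype in newer pandas
print(dataframe.dtypes)

# Convert the observations to numbers; errors="raise" fails loudly on bad cells
dataframe["Observation"] = pd.to_numeric(dataframe["Observation"], errors="raise")

# Optional: pivot back to one row per member and one column per asset
wide = dataframe.pivot(index="Member", columns="Asset", values="Observation")
print(wide)
```

`pivot()` raises an error if any index/column pair repeats. Here each Member/Asset pair occurs once, as it also does in the full tidy output, because every member belongs to a single band.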