TidyData: Observation Formatting#
This section deals with how to format the observations column as part of a tidy data extraction.
Some common examples of where you’d want to do this:
The observation data has been created with superfluous “.0”s after all integer observations.
The observation data has a very high degree of precision beyond what is required or can be ingested by your data systems.
The observations have superflous characters or notations you wish to remove (note: how to capture this sort of data marker metadata as its own column is covered in the intermediate section)
Note: As with everything else in this framework this is just a tool. Whether and to what extent you should be using it depends on your own use case and domain knowledge.
Source Data#
The data source we’re using for these examples is shown below:
The full data source can be viewed here.
from tidychef import acquire, preview
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
Explanation#
To format the observations you pass a callable to the TidyData
constructor as a obs_apply=
keyword argument.
Some simple examples follow using lambda functions (though as with the other callables, you could instead use function or classes with the __call__
dunder method).
Lambda Example 1:#
Replace trailing “.0” values with an empty string.
tidy_data = TidyData(
observations,
Column(bands.attach_closest(right)),
obs_apply=lambda x: x.replace(".0", "")
)
Lambda Example 2:#
Remove the last character is the last character is non numeric.
tidy_data = TidyData(
observations,
Column(bands.attach_closest(right)),
obs_apply=lambda x: x[:-1] if not x[:-1].isnumeric() else x
)
The point is that any callable that takes a str
and returns a str
can be passed to obs_apply
.
Example#
To show this in practice we’re just going to prefix each observation with “foo “.
from tidychef import acquire, preview, filters
from tidychef.direction import right, below
from tidychef.output import TidyData, Column
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
observations = table.filter(filters.is_numeric).label_as("Observations")
bands = (table.excel_ref("A3") | table.excel_ref("G3")).label_as("Bands")
assets = table.excel_ref('2').is_not_blank().label_as("Assets")
members = (table.excel_ref("B") | table.excel_ref("H")).is_not_blank().label_as("Members")
preview(observations, bands, assets, members)
tidy_data = TidyData(
observations,
Column(bands.attach_closest(right)),
Column(assets.attach_directly(below)),
Column(members.attach_directly(right)),
obs_apply=lambda x: "foo "+x
)
print(tidy_data)
Observations |
Bands |
Assets |
Members |
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
Observations | Bands | Assets | Members |
foo 1 | Beatles | Houses | John |
foo 5 | Beatles | Cars | John |
foo 9 | Beatles | Boats | John |
foo 2 | Rolling Stones | Houses | Keith |
foo 6 | Rolling Stones | Cars | Keith |
foo 10 | Rolling Stones | Boats | Keith |
foo 2 | Beatles | Houses | Paul |
foo 6 | Beatles | Cars | Paul |
foo 10 | Beatles | Boats | Paul |
foo 3 | Rolling Stones | Houses | Mick |
foo 7 | Rolling Stones | Cars | Mick |
foo 11 | Rolling Stones | Boats | Mick |
foo 2 | Beatles | Houses | George |
foo 7 | Beatles | Cars | George |
foo 11 | Beatles | Boats | George |
foo 3 | Rolling Stones | Houses | Charlie |
foo 8 | Rolling Stones | Cars | Charlie |
foo 12 | Rolling Stones | Boats | Charlie |
foo 4 | Beatles | Houses | Ringo |
foo 8 | Beatles | Cars | Ringo |
foo 12 | Beatles | Boats | Ringo |
foo 5 | Rolling Stones | Houses | Ronnie |
foo 9 | Rolling Stones | Cars | Ronnie |
foo 13 | Rolling Stones | Boats | Ronnie |