From Messy Tables to Tidy Data
Tidychef is an open-source Python library designed to solve a common, frustrating problem: extracting clean, usable data from spreadsheets and other messy table formats that were made for humans, not machines. It bridges the critical gap between visually complex tables and analysis-ready dataframes.
Messy Table
Region: North
Product | Q1 Sales | Q2 Sales
Apples | 100 | 120
Merged Header for Oranges | 150
Tidy Dataframe
Region | Product | Quarter | Sales
North | Apples | Q1 | 100
North | Apples | Q2 | 120
North | Oranges | Q2 | 150
Simple Example Problem Data
Two separate blocks (NHS Trust A & City Council), each with the same sub-columns (Budget, Staff, Projects) and a non-standard way of indicating metrics. The layout is easy for a human to read, but painful to parse.
| | Budget | Staff | Projects | | Budget | Staff | Projects |
|---|---|---|---|---|---|---|---|
| NHS Trust A | | | | City Council | | | |
| Finance | 12M | 120 | 15 | Planning | 8M | 80 | 12 |
| HR | 6M | 60 | 8 | Housing | 5M | 45 | 10 |
| IT | 4M | 40 | 6 | Transport | 3M | 30 | 7 |
| Operations | 10M | 100 | 20 | Environmental | 2M | 20 | 5 |
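The Budget figures carry a suffix ("12M") while Staff and Projects are plain counts, so a pipeline will usually need a normalisation step after reshaping. The helper below is a minimal, hypothetical sketch (not part of tidychef), assuming that "K", "M" and "B" mean thousands, millions and billions:

```python
# Hypothetical helper, not a tidychef function.
# Assumption: "K" = thousands, "M" = millions, "B" = billions; no suffix = plain count.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_metric(raw: str) -> int:
    """Convert a human-readable cell such as '12M' or '120' into an integer."""
    text = raw.strip().upper()
    if text and text[-1] in MULTIPLIERS:
        # Parse as float first so values like '2.5M' are accepted, then round
        # to avoid floating-point artefacts before converting to int.
        return int(round(float(text[:-1]) * MULTIPLIERS[text[-1]]))
    return int(text)


print(parse_metric("12M"))  # 12000000
print(parse_metric("120"))  # 120
```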
The Tidychef Solution
A long-form tidy output with the columns Organisation, Department, Metric and Value. The spatial structure of the sheet is expressed programmatically through Tidychef's declarative API, which makes this complex layout easy to reshape.
Organisation | Department | Metric | Value |
---|---|---|---|
NHS Trust A | Finance | Budget | 12M |
NHS Trust A | Finance | Staff | 120 |
NHS Trust A | Finance | Projects | 15 |
NHS Trust A | HR | Budget | 6M |
NHS Trust A | HR | Staff | 60 |
NHS Trust A | HR | Projects | 8 |
NHS Trust A | IT | Budget | 4M |
NHS Trust A | IT | Staff | 40 |
NHS Trust A | IT | Projects | 6 |
NHS Trust A | Operations | Budget | 10M |
NHS Trust A | Operations | Staff | 100 |
NHS Trust A | Operations | Projects | 20 |
City Council | Planning | Budget | 8M |
City Council | Planning | Staff | 80 |
City Council | Planning | Projects | 12 |
City Council | Housing | Budget | 5M |
City Council | Housing | Staff | 45 |
City Council | Housing | Projects | 10 |
City Council | Transport | Budget | 3M |
City Council | Transport | Staff | 30 |
City Council | Transport | Projects | 7 |
City Council | Environmental | Budget | 2M |
City Council | Environmental | Staff | 20 |
City Council | Environmental | Projects | 5 |
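The reshaping depends only on where cells sit relative to one another: each value takes its metric from the header row directly above it, its organisation from the nearest band label to its left, and its department from the row label at the left edge of its block. The plain-Python sketch below expresses those rules over a list-of-lists copy of the example grid. It illustrates the idea and is not tidychef's API; the `tidy` function and its row parameters are hypothetical.

```python
from typing import List, Tuple

# The example sheet as a list of rows; "" marks an empty cell.
GRID = [
    ["", "Budget", "Staff", "Projects", "", "Budget", "Staff", "Projects"],
    ["NHS Trust A", "", "", "", "City Council", "", "", ""],
    ["Finance", "12M", "120", "15", "Planning", "8M", "80", "12"],
    ["HR", "6M", "60", "8", "Housing", "5M", "45", "10"],
    ["IT", "4M", "40", "6", "Transport", "3M", "30", "7"],
    ["Operations", "10M", "100", "20", "Environmental", "2M", "20", "5"],
]

TidyRow = Tuple[str, str, str, str]  # (Organisation, Department, Metric, Value)


def tidy(grid: List[List[str]], metric_row: int = 0, band_row: int = 1,
         first_data_row: int = 2) -> List[TidyRow]:
    # A non-empty cell in the band row marks the label column that starts a block.
    label_cols = [c for c, v in enumerate(grid[band_row]) if v]
    # Each block runs up to (but not including) the next label column.
    ends = label_cols[1:] + [len(grid[band_row])]
    rows: List[TidyRow] = []
    for start, end in zip(label_cols, ends):
        organisation = grid[band_row][start]
        for r in range(first_data_row, len(grid)):
            department = grid[r][start]  # row label at the left edge of the block
            for c in range(start + 1, end):
                value = grid[r][c]
                if value:  # empty cells carry no observation
                    rows.append((organisation, department, grid[metric_row][c], value))
    return rows


for row in tidy(GRID):
    print(" | ".join(row))
```

Iterating block by block, rather than row by row across the whole sheet, keeps all of NHS Trust A's rows together, matching the order of the table above.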
Core Capabilities
Tidychef provides a declarative API to handle the most common pitfalls of messy tabulated data sources.
Visual Selection
Select data cells based on their spatial relationship to headers (e.g., "all cells to the right of 'Year' and below 'Sales'").
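The same idea can be sketched without any library by treating a sheet as a grid of (row, column) coordinates and intersecting two spatial selections. The helpers `find`, `right_of` and `below` and the small example grid are hypothetical stand-ins, not tidychef functions, and the sketch assumes "to the right of" means any later column and "below" means any later row:

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column), zero-based

# Hypothetical example sheet.
GRID: List[List[str]] = [
    ["Year", "2022", "2023"],
    ["Sales", "", ""],
    ["North", "100", "120"],
    ["South", "90", "95"],
]


def find(grid: List[List[str]], text: str) -> Cell:
    """Return the coordinates of the first cell whose value equals text."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == text:
                return (r, c)
    raise ValueError(f"{text!r} not found")


def all_cells(grid: List[List[str]]) -> Set[Cell]:
    return {(r, c) for r, row in enumerate(grid) for c in range(len(row))}


def right_of(grid: List[List[str]], anchor: Cell) -> Set[Cell]:
    """Every cell in a column to the right of the anchor's column."""
    return {(r, c) for r, c in all_cells(grid) if c > anchor[1]}


def below(grid: List[List[str]], anchor: Cell) -> Set[Cell]:
    """Every cell in a row below the anchor's row."""
    return {(r, c) for r, c in all_cells(grid) if r > anchor[0]}


# "All cells to the right of 'Year' and below 'Sales'" as a set intersection.
selection = right_of(GRID, find(GRID, "Year")) & below(GRID, find(GRID, "Sales"))
values = [GRID[r][c] for r, c in sorted(selection)]
print(values)  # ['100', '120', '90', '95']
```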
Reproducible Pipelines
Define a repeatable recipe for cleaning a specific report format, making data pipelines robust and easy to update.
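A recipe is essentially a function written once for a given report layout and rerun on every new release. The sketch below is hypothetical: `sales_recipe` assumes the layout of the small Region/Product example at the top of this page, stored as CSV with a blank Q1 cell for Oranges, and the South report is invented input used only to show the recipe being reused.

```python
import csv
import io
from typing import List, Tuple

TidyRow = Tuple[str, str, str, str]  # (Region, Product, Quarter, Sales)


def sales_recipe(csv_text: str) -> List[TidyRow]:
    """Hypothetical recipe for one fixed layout: row 0 is 'Region,<name>',
    row 1 holds the quarter headers, later rows hold a product and its sales."""
    grid = list(csv.reader(io.StringIO(csv_text)))
    region = grid[0][1]
    quarters = [h.replace(" Sales", "") for h in grid[1][1:]]
    tidy: List[TidyRow] = []
    for row in grid[2:]:
        product, figures = row[0], row[1:]
        for quarter, sales in zip(quarters, figures):
            if sales:  # blank cells carry no observation
                tidy.append((region, product, quarter, sales))
    return tidy


NORTH = "Region,North\nProduct,Q1 Sales,Q2 Sales\nApples,100,120\nOranges,,150\n"
# Invented second report in the same layout, used only to show reuse.
SOUTH = "Region,South\nProduct,Q1 Sales,Q2 Sales\nApples,90,95\nOranges,70,80\n"

for report in (NORTH, SOUTH):
    for row in sales_recipe(report):
        print(" | ".join(row))
```

Because the layout rules live in one function, a change in the report format means updating one place rather than every downstream script.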
Multi-Format Support
Works out-of-the-box with various formats including `.xls`, `.xlsx`, `.ods`, and `.csv`.
Pandas Integration
Seamlessly integrates with the data science ecosystem by outputting clean data directly into a Pandas DataFrame.
Competitive Landscape
Tidychef fills a specific niche that other popular tools don't fully address. See how it compares.
The radar chart below visualizes the strengths of each tool across key areas. Tidychef stands out by combining programmatic, scriptable wrangling with selections based on a table's visual layout.
vs. OpenRefine
Pros of Tidychef
- Fully programmatic and scriptable
- Ideal for automated, reproducible pipelines
- Version-controllable (part of a codebase)
Cons of Tidychef
- Requires Python knowledge
- Less interactive/visual for exploration
Value Proposition & Adoption Potential
By focusing on a well-defined, common problem, Tidychef offers significant value to a range of data professionals, giving it strong potential for adoption.
Target User Base
Data Analysts & Scientists
Who need to quickly turn messy source files into analyzable data.
Data Engineers
Building robust and automated ETL/ELT pipelines from unreliable sources.
Government & Research
Working with public data often released in complex table formats.
Key to Adoption
Niche Focus
It solves one problem exceptionally well, making it a clear choice for the task.
Ecosystem Fit
It doesn't seek to replace Pandas, but to complement it, easing adoption.
Documentation
Clear examples and training materials will be crucial for user onboarding.