From Messy Tables to Tidy Data

Tidychef is an open-source Python library designed to solve a common, frustrating problem: extracting clean, usable data from spreadsheets and other messy table formats that were made for humans, not machines. It bridges the critical gap between visually complex tables and analysis-ready dataframes.

Messy Table

Region: North

Product | Q1 Sales | Q2 Sales

Apples | 100 | 120

Merged Header for Oranges | 150

→

Tidy Dataframe

Region | Product | Quarter | Sales

North | Apples | Q1 | 100

North | Apples | Q2 | 120

North | Oranges | Q2 | 150

Simple Example Problem Data

Two separate blocks (NHS Trust A and City Council), each with the same sub-columns (Budget, Staff, Projects) and a non-standard way of indicating the metrics. The layout is human-readable, but parsing it is painful.

 | Budget | Staff | Projects | | Budget | Staff | Projects
NHS Trust A | | | | City Council | | |
Finance | 12M | 120 | 15 | Planning | 8M | 80 | 12
HR | 6M | 60 | 8 | Housing | 5M | 45 | 10
IT | 4M | 40 | 6 | Transport | 3M | 30 | 7
Operations | 10M | 100 | 20 | Environmental | 2M | 20 | 5

The Tidychef Solution

A long-form tidy output with columns: Organisation, Department, Metric, Value. The spatial structure is expressed programmatically through tidychef's declarative API, which makes this complex layout straightforward to reshape.

Organisation | Department | Metric | Value
NHS Trust A | Finance | Budget | 12M
NHS Trust A | Finance | Staff | 120
NHS Trust A | Finance | Projects | 15
NHS Trust A | HR | Budget | 6M
NHS Trust A | HR | Staff | 60
NHS Trust A | HR | Projects | 8
NHS Trust A | IT | Budget | 4M
NHS Trust A | IT | Staff | 40
NHS Trust A | IT | Projects | 6
NHS Trust A | Operations | Budget | 10M
NHS Trust A | Operations | Staff | 100
NHS Trust A | Operations | Projects | 20
City Council | Planning | Budget | 8M
City Council | Planning | Staff | 80
City Council | Planning | Projects | 12
City Council | Housing | Budget | 5M
City Council | Housing | Staff | 45
City Council | Housing | Projects | 10
City Council | Transport | Budget | 3M
City Council | Transport | Staff | 30
City Council | Transport | Projects | 7
City Council | Environmental | Budget | 2M
City Council | Environmental | Staff | 20
City Council | Environmental | Projects | 5
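
To make the reshaping concrete, here is a minimal plain-Python sketch of the same transformation. It does not use tidychef's API; the `GRID` layout (metric labels in the first row, organisation names in the second, four columns per block) and the helper name `tidy_blocks` are assumptions made for illustration.

```python
# Plain-Python illustration only; this is NOT tidychef's API.
# Assumed layout: row 0 holds metric labels, row 1 holds organisation names,
# and each organisation occupies a block of four columns.
GRID = [
    ["", "Budget", "Staff", "Projects", "", "Budget", "Staff", "Projects"],
    ["NHS Trust A", "", "", "", "City Council", "", "", ""],
    ["Finance", "12M", "120", "15", "Planning", "8M", "80", "12"],
    ["HR", "6M", "60", "8", "Housing", "5M", "45", "10"],
    ["IT", "4M", "40", "6", "Transport", "3M", "30", "7"],
    ["Operations", "10M", "100", "20", "Environmental", "2M", "20", "5"],
]


def tidy_blocks(grid, block_width=4):
    """Turn side-by-side blocks into (organisation, department, metric, value) rows."""
    metric_row, org_row, *data_rows = grid
    rows = []
    for start in range(0, len(metric_row), block_width):
        organisation = org_row[start]
        metrics = metric_row[start + 1 : start + block_width]
        for data in data_rows:
            department = data[start]
            values = data[start + 1 : start + block_width]
            for metric, value in zip(metrics, values):
                rows.append((organisation, department, metric, value))
    return rows


TIDY_ROWS = tidy_blocks(GRID)
for row in TIDY_ROWS[:3]:
    print(" | ".join(row))
```

Even for this small table the hand-written version hard-codes the block width and row positions, which is the kind of positional bookkeeping a declarative description of the layout is meant to replace.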

Core Capabilities

Tidychef provides a declarative API to handle the most common pitfalls of messy tabulated data sources.

🗺️

Visual Selection

Select data cells based on their spatial relationship to headers (e.g., "all cells to the right of 'Year' and below 'Sales'").
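
The idea of selecting by position can be sketched in plain Python as follows. This is not tidychef's API: the `Cell` type, the helpers `to_cells` and `find`, and the sample grid are hypothetical, and the rule is read here as "any column to the right of 'Year' and any row below 'Sales'", which is only one possible interpretation.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    row: int
    col: int
    value: str


def to_cells(grid):
    """Flatten a list-of-lists grid into cells that remember their position."""
    return [Cell(r, c, v) for r, line in enumerate(grid) for c, v in enumerate(line)]


def find(cells, label):
    """Return the single cell holding `label`, or fail loudly."""
    matches = [cell for cell in cells if cell.value == label]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one {label!r} cell, found {len(matches)}")
    return matches[0]


# Hypothetical sample grid, made up for this illustration.
GRID = [
    ["Year", "Sales", "Costs"],
    ["2021", "100", "70"],
    ["2022", "120", "80"],
]

cells = to_cells(GRID)
year = find(cells, "Year")
sales = find(cells, "Sales")

# Assumed reading of the rule: any column right of "Year", any row below "Sales".
SELECTED = [c for c in cells if c.col > year.col and c.row > sales.row]
print([c.value for c in SELECTED])
```

Because the selection is anchored on header labels rather than fixed cell addresses, it should keep working if the table moves within the sheet, as long as the labels stay unique.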

🔄

Reproducible Pipelines

Define a repeatable recipe for cleaning a specific report format, making data pipelines robust and easy to update.
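
As a rough illustration of what a repeatable recipe means, the sketch below describes a report layout once and applies it to two reports of the same shape. It is plain Python, not tidychef's API: the `Recipe` class and its fields are hypothetical, `NORTH` mirrors the Region/Product example at the top of this page, and the `SOUTH` values are invented sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """Hypothetical layout description: where the labels live in a report."""
    header_row: int      # row holding the column labels, e.g. "Q1 Sales"
    label_col: int       # column holding the row labels, e.g. product names
    context_cell: tuple  # (row, col) of a cell such as "Region: North"

    def apply(self, grid):
        """Return (region, product, quarter, sales) rows, skipping blank cells."""
        r, c = self.context_cell
        region = grid[r][c].split(":", 1)[1].strip()
        headers = grid[self.header_row]
        tidy = []
        for line in grid[self.header_row + 1 :]:
            for col, value in enumerate(line):
                if col == self.label_col or value == "":
                    continue
                quarter = headers[col].split()[0]  # "Q1 Sales" -> "Q1"
                tidy.append((region, line[self.label_col], quarter, value))
        return tidy


NORTH = [
    ["Region: North", "", ""],
    ["Product", "Q1 Sales", "Q2 Sales"],
    ["Apples", "100", "120"],
    ["Oranges", "", "150"],
]
# Invented values, same layout as NORTH.
SOUTH = [
    ["Region: South", "", ""],
    ["Product", "Q1 Sales", "Q2 Sales"],
    ["Apples", "90", "95"],
    ["Pears", "40", ""],
]

RECIPE = Recipe(header_row=1, label_col=0, context_cell=(0, 0))
for report in (NORTH, SOUTH):
    for row in RECIPE.apply(report):
        print(" | ".join(row))
```

Keeping the layout description separate from the data means a change in the report format is handled by editing one recipe rather than every script that reads the report.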

📦

Multi-Format Support

Works out-of-the-box with various formats including `.xls`, `.xlsx`, `.ods`, and `.csv`.

🀝

Pandas Integration

Seamlessly integrates with the data science ecosystem by outputting clean data directly into a Pandas DataFrame.

Competitive Landscape

Tidychef fills a specific niche that other popular tools don't fully address. See how it compares.

The radar chart below visualizes the strengths of each tool across key areas. Tidychef excels at programmatic, visual-based wrangling, a unique combination.


vs. OpenRefine

Pros of Tidychef

  • Fully programmatic and scriptable
  • Ideal for automated, reproducible pipelines
  • Version-controllable (part of a codebase)

Cons of Tidychef

  • Requires Python knowledge
  • Less interactive/visual for exploration

Value Proposition & Adoption Potential

By focusing on a well-defined, common problem, Tidychef offers significant value to a range of data professionals, giving it strong potential for adoption.

Target User Base

📊

Data Analysts & Scientists

Who need to quickly turn messy source files into analyzable data.

⚙️

Data Engineers

Building robust and automated ETL/ELT pipelines from unreliable sources.

🏛️

Government & Research

Working with public data often released in complex table formats.

Key to Adoption

🎯

Niche Focus

It solves one problem exceptionally well, making it a clear choice for the task.

🧩

Ecosystem Fit

It doesn't seek to replace Pandas, but to complement it, easing adoption.

📚

Documentation

Clear examples and training materials will be crucial for user onboarding.