TidyData: Concatenation

TidyData: Concatenation#

This section details how you can concatenate multiple tidy data sheets together when you output them.

Source Data#

The data source we’re using for these examples is shown below:

Note - this particular table has some very verbose headers we don’t care about, so we’ll be using `bounded=` to remove them from the previews as well as to show just the subset of data we’re working with.

The full data source can be downloaded here. We’ll be using th 10th tab named “Table 3c”.

from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3c")
preview(table, bounded="A4:H10")

Table 3c

	A	B	C	D	E	F	G	H
4	Percentage change 3 months on previous 3 months
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work	Public other new work	Private industrial new work	Private commercial new work
6	Dataset identifier code	MVO6	MVO7	MVO8	MVO9	MVP2	MVP3	MVP4
7	Jun 2010	5.6	9.8	8.8	3	4.3	3.7	1.9
8	Jul 2010	2	5.6	4.8	0.2	-0.2	9.7	3.5
9	Aug 2010	5.5	4.5	4.7	-2.9	-2.9	24.4	5.9
10	Sep 2010	11.7	7.5	8.5	-6.8	-3.3	16.1	5.3

The “Two” Sources#

To keep these examples reasonably brief we’re going to create two TidyData class from the above as follow:

one with observations from B7 though H8
one with observations from B9 through H10

from tidychef import acquire, preview
from tidychef.direction import right, down
from tidychef.output import TidyData, Column
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3c")


houses = table.excel_ref("B5").expand(right).label_as("Houses")
dataset_identifier_code = houses.shift(down).label_as("Dataset Identifier Code")
period = table.excel_ref("A7").expand(down).label_as("Period")

observations1 = table.excel_ref("B7:H8").label_as("Observations")
observations2 = table.excel_ref("B9:H10").label_as("Observations")

## Preview Both Sets of Selection

print("Preview selections for tidydata1")
preview(observations1, houses, dataset_identifier_code, period, bounded="A4:H10")

print("Preview selections for tidydata1")
preview(observations2, houses, dataset_identifier_code, period, bounded="A4:H10")

Preview selections for tidydata1

Observations

Houses

Dataset Identifier Code

Period

Table 3c

	A	B	C	D	E	F	G	H
4	Percentage change 3 months on previous 3 months
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work	Public other new work	Private industrial new work	Private commercial new work
6	Dataset identifier code	MVO6	MVO7	MVO8	MVO9	MVP2	MVP3	MVP4
7	Jun 2010	5.6	9.8	8.8	3	4.3	3.7	1.9
8	Jul 2010	2	5.6	4.8	0.2	-0.2	9.7	3.5
9	Aug 2010	5.5	4.5	4.7	-2.9	-2.9	24.4	5.9
10	Sep 2010	11.7	7.5	8.5	-6.8	-3.3	16.1	5.3

Preview selections for tidydata1

Observations

Houses

Dataset Identifier Code

Period

Table 3c

	A	B	C	D	E	F	G	H
4	Percentage change 3 months on previous 3 months
5	Time period	Public new housing	Private new housing	Total new housing	Infrastructure new work	Public other new work	Private industrial new work	Private commercial new work
6	Dataset identifier code	MVO6	MVO7	MVO8	MVO9	MVP2	MVP3	MVP4
7	Jun 2010	5.6	9.8	8.8	3	4.3	3.7	1.9
8	Jul 2010	2	5.6	4.8	0.2	-0.2	9.7	3.5
9	Aug 2010	5.5	4.5	4.7	-2.9	-2.9	24.4	5.9
10	Sep 2010	11.7	7.5	8.5	-6.8	-3.3	16.1	5.3

Create and preview TidyData objects#

tidydata1 = TidyData(
    observations1,
    Column(period.attach_directly(right)),
    Column(dataset_identifier_code.attach_directly(down)),
    Column(houses.attach_directly(down))
)
print("Preview tidydata1")
print(tidydata1)

tidydata2 = TidyData(
    observations2,
    Column(period.attach_directly(right)),
    Column(dataset_identifier_code.attach_directly(down)),
    Column(houses.attach_directly(down))
)
print("Preview tidydata2")
print(tidydata2)

Preview tidydata1

Observations	Period	Dataset Identifier Code	Houses
5.6	Jun 2010	MVO6	Public new housing
9.8	Jun 2010	MVO7	Private new housing
8.8	Jun 2010	MVO8	Total new housing
3	Jun 2010	MVO9	Infrastructure new work
4.3	Jun 2010	MVP2	Public other new work
3.7	Jun 2010	MVP3	Private industrial new work
1.9	Jun 2010	MVP4	Private commercial new work
2	Jul 2010	MVO6	Public new housing
5.6	Jul 2010	MVO7	Private new housing
4.8	Jul 2010	MVO8	Total new housing
0.2	Jul 2010	MVO9	Infrastructure new work
-0.2	Jul 2010	MVP2	Public other new work
9.7	Jul 2010	MVP3	Private industrial new work
3.5	Jul 2010	MVP4	Private commercial new work

Preview tidydata2

Observations	Period	Dataset Identifier Code	Houses
5.5	Aug 2010	MVO6	Public new housing
4.5	Aug 2010	MVO7	Private new housing
4.7	Aug 2010	MVO8	Total new housing
-2.9	Aug 2010	MVO9	Infrastructure new work
-2.9	Aug 2010	MVP2	Public other new work
24.4	Aug 2010	MVP3	Private industrial new work
5.9	Aug 2010	MVP4	Private commercial new work
11.7	Sep 2010	MVO6	Public new housing
7.5	Sep 2010	MVO7	Private new housing
8.5	Sep 2010	MVO8	Total new housing
-6.8	Sep 2010	MVO9	Infrastructure new work
-3.3	Sep 2010	MVP2	Public other new work
16.1	Sep 2010	MVP3	Private industrial new work
5.3	Sep 2010	MVP4	Private commercial new work

Concatenation via ‘+’#

The simplest way to concatenate two TidyData objects is just to add them together.

tidy_data_all = tidydata1+tidydata2
print(tidy_data_all)

Observations	Period	Dataset Identifier Code	Houses
5.6	Jun 2010	MVO6	Public new housing
9.8	Jun 2010	MVO7	Private new housing
8.8	Jun 2010	MVO8	Total new housing
3	Jun 2010	MVO9	Infrastructure new work
4.3	Jun 2010	MVP2	Public other new work
3.7	Jun 2010	MVP3	Private industrial new work
1.9	Jun 2010	MVP4	Private commercial new work
2	Jul 2010	MVO6	Public new housing
5.6	Jul 2010	MVO7	Private new housing
4.8	Jul 2010	MVO8	Total new housing
0.2	Jul 2010	MVO9	Infrastructure new work
-0.2	Jul 2010	MVP2	Public other new work
9.7	Jul 2010	MVP3	Private industrial new work
3.5	Jul 2010	MVP4	Private commercial new work
5.5	Aug 2010	MVO6	Public new housing
4.5	Aug 2010	MVO7	Private new housing
4.7	Aug 2010	MVO8	Total new housing
-2.9	Aug 2010	MVO9	Infrastructure new work
-2.9	Aug 2010	MVP2	Public other new work
24.4	Aug 2010	MVP3	Private industrial new work
5.9	Aug 2010	MVP4	Private commercial new work
11.7	Sep 2010	MVO6	Public new housing
7.5	Sep 2010	MVO7	Private new housing
8.5	Sep 2010	MVO8	Total new housing
-6.8	Sep 2010	MVO9	Infrastructure new work
-3.3	Sep 2010	MVP2	Public other new work
16.1	Sep 2010	MVP3	Private industrial new work
5.3	Sep 2010	MVP4	Private commercial new work

Concatenation via a list#

The next method is to use the TidyData.from_tidy_list() constructor.

tidy_data_all = TidyData.from_tidy_list([tidydata1, tidydata2])
print(tidy_data_all)

Observations	Period	Dataset Identifier Code	Houses
5.6	Jun 2010	MVO6	Public new housing
9.8	Jun 2010	MVO7	Private new housing
8.8	Jun 2010	MVO8	Total new housing
3	Jun 2010	MVO9	Infrastructure new work
4.3	Jun 2010	MVP2	Public other new work
3.7	Jun 2010	MVP3	Private industrial new work
1.9	Jun 2010	MVP4	Private commercial new work
2	Jul 2010	MVO6	Public new housing
5.6	Jul 2010	MVO7	Private new housing
4.8	Jul 2010	MVO8	Total new housing
0.2	Jul 2010	MVO9	Infrastructure new work
-0.2	Jul 2010	MVP2	Public other new work
9.7	Jul 2010	MVP3	Private industrial new work
3.5	Jul 2010	MVP4	Private commercial new work
5.5	Aug 2010	MVO6	Public new housing
4.5	Aug 2010	MVO7	Private new housing
4.7	Aug 2010	MVO8	Total new housing
-2.9	Aug 2010	MVO9	Infrastructure new work
-2.9	Aug 2010	MVP2	Public other new work
24.4	Aug 2010	MVP3	Private industrial new work
5.9	Aug 2010	MVP4	Private commercial new work
11.7	Sep 2010	MVO6	Public new housing
7.5	Sep 2010	MVO7	Private new housing
8.5	Sep 2010	MVO8	Total new housing
-6.8	Sep 2010	MVO9	Infrastructure new work
-3.3	Sep 2010	MVP2	Public other new work
16.1	Sep 2010	MVP3	Private industrial new work
5.3	Sep 2010	MVP4	Private commercial new work

Concatenation via arguments#

Lastly, you can pass in objects of class TidyData as positional arguments to the TidyData.from_tidy() constructor.

tidy_data_all = TidyData.from_tidy(tidydata1, tidydata2)
print(tidy_data_all)

Observations	Period	Dataset Identifier Code	Houses
5.6	Jun 2010	MVO6	Public new housing
9.8	Jun 2010	MVO7	Private new housing
8.8	Jun 2010	MVO8	Total new housing
3	Jun 2010	MVO9	Infrastructure new work
4.3	Jun 2010	MVP2	Public other new work
3.7	Jun 2010	MVP3	Private industrial new work
1.9	Jun 2010	MVP4	Private commercial new work
2	Jul 2010	MVO6	Public new housing
5.6	Jul 2010	MVO7	Private new housing
4.8	Jul 2010	MVO8	Total new housing
0.2	Jul 2010	MVO9	Infrastructure new work
-0.2	Jul 2010	MVP2	Public other new work
9.7	Jul 2010	MVP3	Private industrial new work
3.5	Jul 2010	MVP4	Private commercial new work
5.5	Aug 2010	MVO6	Public new housing
4.5	Aug 2010	MVO7	Private new housing
4.7	Aug 2010	MVO8	Total new housing
-2.9	Aug 2010	MVO9	Infrastructure new work
-2.9	Aug 2010	MVP2	Public other new work
24.4	Aug 2010	MVP3	Private industrial new work
5.9	Aug 2010	MVP4	Private commercial new work
11.7	Sep 2010	MVO6	Public new housing
7.5	Sep 2010	MVO7	Private new housing
8.5	Sep 2010	MVO8	Total new housing
-6.8	Sep 2010	MVO9	Infrastructure new work
-3.3	Sep 2010	MVP2	Public other new work
16.1	Sep 2010	MVP3	Private industrial new work
5.3	Sep 2010	MVP4	Private commercial new work

Misaligned Headers Error#

Where concatenating together TidyData the column headers must match precisely.

The following is an example of what happens if you try and concatenate two TidyData classes with misaligned headers.

# We're going to redefine tidydata2 but this time we're going to label the observations column
# such that it no longer matches tidydata1 

tidydata2 = TidyData(
    observations2.label_as("Value"),
    Column(period.attach_directly(right)),
    Column(dataset_identifier_code.attach_directly(down)),
    Column(houses.attach_directly(down))
)

# And we'll preview both so you can see the column headers no longer match
print("Preview tidydata1")
print(tidydata1)

print("Preview tidydata2")
print(tidydata2)

Preview tidydata1

Observations	Period	Dataset Identifier Code	Houses
5.6	Jun 2010	MVO6	Public new housing
9.8	Jun 2010	MVO7	Private new housing
8.8	Jun 2010	MVO8	Total new housing
3	Jun 2010	MVO9	Infrastructure new work
4.3	Jun 2010	MVP2	Public other new work
3.7	Jun 2010	MVP3	Private industrial new work
1.9	Jun 2010	MVP4	Private commercial new work
2	Jul 2010	MVO6	Public new housing
5.6	Jul 2010	MVO7	Private new housing
4.8	Jul 2010	MVO8	Total new housing
0.2	Jul 2010	MVO9	Infrastructure new work
-0.2	Jul 2010	MVP2	Public other new work
9.7	Jul 2010	MVP3	Private industrial new work
3.5	Jul 2010	MVP4	Private commercial new work

Preview tidydata2

Value	Period	Dataset Identifier Code	Houses
5.5	Aug 2010	MVO6	Public new housing
4.5	Aug 2010	MVO7	Private new housing
4.7	Aug 2010	MVO8	Total new housing
-2.9	Aug 2010	MVO9	Infrastructure new work
-2.9	Aug 2010	MVP2	Public other new work
24.4	Aug 2010	MVP3	Private industrial new work
5.9	Aug 2010	MVP4	Private commercial new work
11.7	Sep 2010	MVO6	Public new housing
7.5	Sep 2010	MVO7	Private new housing
8.5	Sep 2010	MVO8	Total new housing
-6.8	Sep 2010	MVO9	Infrastructure new work
-3.3	Sep 2010	MVP2	Public other new work
16.1	Sep 2010	MVP3	Private industrial new work
5.3	Sep 2010	MVP4	Private commercial new work

# Now we try and concatenate them - and get an error!

tidydata1 + tidydata2

---------------------------------------------------------------------------
MisalignedHeadersError                    Traceback (most recent call last)
Cell In[9], line 3
      1 # Now we try and concatenate them - and get an error!
----> 3 tidydata1 + tidydata2

File ~/.pyenv/versions/3.12.11/lib/python3.12/site-packages/tidychef/utils/decorators/dontmutate.py:24, in dontmutate.<locals>.wrapper(self, *args, **kwargs)
     21 @wraps(method)
     22 def wrapper(self, *args, **kwargs):
     23     self = copy.deepcopy(self)
---> 24     return method(self, *args, **kwargs)

File ~/.pyenv/versions/3.12.11/lib/python3.12/site-packages/tidychef/output/tidydata.py:223, in TidyData.__add__(self, other_tidy_data)
    220 # Error if we're joining two TidyData objects
    221 # with different headers
    222 if self._data[0] != other_tidy_data._data[0]:
--> 223     raise MisalignedHeadersError(
    224         f"""
    225         You are attempting to sum two tidy data
    226         outputs but they do not have the same
    227         column headers.
    228 
    229         TidyData1 headers:
    230         {self._data[0]}
    231 
    232         TidyData2 headers:
    233         {other_tidy_data._data[0]}
    234     """
    235     )
    237 # Since the headers match, join all but the header
    238 # row from the new source
    239 self._data += other_tidy_data._data[1:]

MisalignedHeadersError: 
                You are attempting to sum two tidy data
                outputs but they do not have the same
                column headers.

                TidyData1 headers:
                [(VIRTUAL CELL, value:"Observations"), (VIRTUAL CELL, value:"Period"), (VIRTUAL CELL, value:"Dataset Identifier Code"), (VIRTUAL CELL, value:"Houses")]

                TidyData2 headers:
                [(VIRTUAL CELL, value:"Value"), (VIRTUAL CELL, value:"Period"), (VIRTUAL CELL, value:"Dataset Identifier Code"), (VIRTUAL CELL, value:"Houses")]
            

TidyData: Concatenation

Contents

TidyData: Concatenation#

Source Data#

Table 3c

The “Two” Sources#

Table 3c

Table 3c

Create and preview TidyData objects#

Concatenation via ‘+’#

Concatenation via a list#

Concatenation via arguments#

Misaligned Headers Error#