Time#

One of the challenges of working with spreadsheet data is that spreadsheets have specific handling of time that doesn’t always translate well when parsed into python.

In all cases tidychef will seek to format the time type cells via the exact same human readable format said cells would appear as when viewed via excel.

The process is as follows:

  • When an excel cell is read that has an excel type of time.

  • tidychef will check the excel information for how that time type is formatted for presentation.

  • tidychef will apply that formatting so the value in tidychef matches that you’d expect from viewing the original source.

  • if custom or unknown time formatting is applied, tidychef will fall back on the raw cell value and will show a warning.

An example of this behaviour and how you can handle these scenarios is detailed below.

from typing import List

from tidychef import acquire, preview
from tidychef.selection import XlsSelectable

tables: List[XlsSelectable] = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/house-prices.xls")
WARNING:root:When processing table "Contents" an unknown excel time format "[$-F800]dddd\,\ mmmm\ dd\,\ yyyy" was encountered. Using raw cell value instead. 
For more details on handling excel time formatting see tidychef documentation. Cell(s) in question (max 5 shown): x:2, y:13

Now lets have a look at the cell in question

table: XlsSelectable = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/house-prices.xls",
                                               tables="Contents")
suspect_cell = table.excel_ref("C14")
preview(suspect_cell, show_xy=True)
WARNING:root:When processing table "Contents" an unknown excel time format "[$-F800]dddd\,\ mmmm\ dd\,\ yyyy" was encountered. Using raw cell value instead. 
For more details on handling excel time formatting see tidychef documentation. Cell(s) in question (max 5 shown): x:2, y:13
Unnamed Selection: 0

Contents

x/y0123456789101112131415
ABCDEFGHIJKLMNOP
01
12House Price Index
23Publication Date: 24 May 2023
34To access data tables, select the table headings or tabs.
45To return to contents click "Back to contents" link at the top of each page.
56
67The previous ONS HPI has now been replaced by the UKHPI for which the main publication can be found;
78https://www.gov.uk/government/collections/uk-house-price-index-reports
89
910Cover sheet
1011Notes
1112
1213Next Scheduled Up-date
1314Quarterly45154.0
1415
1516DatasetFrequencyHouse Price Indices - descriptionPreviously found in DCLG
1617Table 11QuarterlySimple average house prices, by new/other dwellings, type of buyer and region, United Kingdom, from 1992Live Table 504
1718Table 15QuarterlySimple average house prices, mortgage advances and incomes of borrowers, by new/other dwellings, type of buyer and region, United Kingdom, from 1992Live Table 514
1819
1920The data published in these tables are based on a sub-sample of RMS data. These results will therefore differ from results produced using full sample data. For further information please contact the ONS using the contact details on the previous page.

You’ll notice it just contains a number, that’s a python datetime object representing the specific moment in time in a way that python understands.

In this instance someone has created quite a complex custom time formatting string of [$-F800]dddd\,\ mmmm\ dd\,\ yyyy to represent the value (as expressed in the original file) as:

Wednesday, August 16th, 2023

So how do we specify the time formatting to use for these kinds of custom and complex time formats?

Specifying time formatting#

Tidychef allows you to map excel formats to the standard patterns supported by the python datatime.strftime() method.

This is done by passing an appropriate dictionary into acquire with the keyword argument custom_time_formats= to map excel time formats to a strftime() pattern.

In this instance we’ll provide the stftime() format string for the excel pattern in question.

Example follows:

from tidychef import acquire, preview
from tidychef.selection import XlsSelectable

table: XlsSelectable = acquire.xls.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/xls/house-prices.xls",
                                        custom_time_formats={r"[$-F800]dddd\,\ mmmm\ dd\,\ yyyy": "%A, %B %d, %Y"}, tables="Contents")
suspect_cell = table.excel_ref("C14")
preview(suspect_cell, show_xy=True)
Unnamed Selection: 0

Contents

x/y0123456789101112131415
ABCDEFGHIJKLMNOP
01
12House Price Index
23Publication Date: 24 May 2023
34To access data tables, select the table headings or tabs.
45To return to contents click "Back to contents" link at the top of each page.
56
67The previous ONS HPI has now been replaced by the UKHPI for which the main publication can be found;
78https://www.gov.uk/government/collections/uk-house-price-index-reports
89
910Cover sheet
1011Notes
1112
1213Next Scheduled Up-date
1314QuarterlyWednesday, August 16, 2023
1415
1516DatasetFrequencyHouse Price Indices - descriptionPreviously found in DCLG
1617Table 11QuarterlySimple average house prices, by new/other dwellings, type of buyer and region, United Kingdom, from 1992Live Table 504
1718Table 15QuarterlySimple average house prices, mortgage advances and incomes of borrowers, by new/other dwellings, type of buyer and region, United Kingdom, from 1992Live Table 514
1819
1920The data published in these tables are based on a sub-sample of RMS data. These results will therefore differ from results produced using full sample data. For further information please contact the ONS using the contact details on the previous page.

Note#

tidychef comes with a large number of excel format to strftime() translations as default but given the high level of customization an exhaustive mapping is difficult to create.

Users are encouraged to PR additional mappings as encountered to this file.