Selection: Validation

This section details how to use the validate() method with your extractions.

This method does not alter the selection in any way; it lets you confirm that you have selected the cell values you were expecting.

Note - we’re going to use the tidychef against module here, which is a collection of premade, user-friendly validation classes. More information on this module, how it works and how to write your own validators comes later in this guide.

For our purposes here:

  • validate() - is how you validate cell selections.

  • against - is a collection of tools that validate() makes use of.

Source Data

The data source we’re using for these examples is shown below:

The full data source can be viewed here.

from tidychef import acquire, preview
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)

Unnamed Table

   A        B       C       D     E      F  G               H        I       J     K
1
2                   Houses  Cars  Boats                              Houses  Cars  Boats
3  Beatles                                  Rolling Stones
4           John    1       5     9                         Keith    2       6     10
5           Paul    2       6     10                        Mick     3       7     11
6           George  2       7     11                        Charlie  3       8     12
7           Ringo   4       8     12                        Ronnie   5       9     13
8

.validate(against.items())

The against.items() validator compares the value of each cell to the contents of a list.
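
As a rough illustration of the check being described, here is a plain-Python sketch that does not use tidychef at all. The function name check_items and the list of strings standing in for a cell selection are hypothetical; the message wording is copied from the error output shown later in this section.

```python
from typing import List


def check_items(values: List[str], allowed: List[str]) -> List[str]:
    """Return one message per value that is missing from `allowed`.

    Hypothetical helper for illustration only; it is not part of tidychef.
    """
    return [
        f'"{value}" not in list: {allowed}'
        for value in values
        if value not in allowed
    ]


# Stand-in for the non-blank cells of column B in the source data.
column_b = ["John", "Paul", "George", "Ringo"]

# Every value is in the list, so there is nothing to report.
print(check_items(column_b, ["John", "Paul", "Ringo", "George"]))

# "George" has been dropped from the list, so one message comes back.
print(check_items(column_b, ["John", "Paul", "Ringo"]))
```

The order of the list does not matter in this sketch; only membership is tested.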

The following example does not raise an AssertionError, because every selected cell has a value that is in the list.

from tidychef import acquire, against
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

validated_selection = table.excel_ref('B').is_not_blank().validate(against.items(["John", "Paul", "Ringo", "George"]))

This example, by contrast, raises an AssertionError because “George” has been removed from the list.

Note - we wrap the call in try/except and print the exception purely so that it doesn’t stop the execution of this notebook; you don’t need to do this in practice.

from tidychef import acquire, against
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

try:
    table.excel_ref('B').is_not_blank().validate(against.items(["John", "Paul", "Ringo"]))
except Exception as err:
    print(err)
When making selections from table: Unnamed Table the
following validation errors were encountered:
"George" not in list: ['John', 'Paul', 'Ringo']
                

.validate(against.regex())

The against.regex() validator compares the value of each cell to see if it matches the provided regular expression.
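
This page does not say which of Python's matching functions the validator relies on, so the sketch below makes no claim about tidychef itself. It only shows, using the standard re module, how re.search, re.match and re.fullmatch can disagree on a value such as "Johnny" (a hypothetical value that is not in the source data), and how anchoring the pattern as ^John$ makes all three agree. The helper name compare is an assumption made for illustration.

```python
import re
from typing import Tuple


def compare(pattern: str, value: str) -> Tuple[bool, bool, bool]:
    """Report whether search, match and fullmatch each accept `value`."""
    return (
        bool(re.search(pattern, value)),     # pattern found anywhere in the value
        bool(re.match(pattern, value)),      # pattern found at the start of the value
        bool(re.fullmatch(pattern, value)),  # pattern covers the whole value
    )


for value in ["John", "Johnny", "Elton John"]:
    # Unanchored pattern: the three functions can give different answers.
    print(value, "John", compare("John", value))
    # Anchored pattern: only the exact value "John" is accepted by any of them.
    print(value, "^John$", compare("^John$", value))
```

If exact matching is what you intend, writing the anchors into the pattern keeps the result the same whichever matching function ends up being applied.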

The following example raises an error for every cell whose content does not match “John”.

Note - we wrap the call in try/except and print the exception purely so that it doesn’t stop the execution of this notebook; you don’t need to do this in practice.

from tidychef import acquire, against
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

try:
    table.excel_ref('B').is_not_blank().validate(against.regex("John"))
except Exception as err:
    print(err)
When making selections from table: Unnamed Table the
following validation errors were encountered:
"Paul" does not match pattern: "John"
"George" does not match pattern: "John"
"Ringo" does not match pattern: "John"
                

validate(): viewing a lone exception

There will be occasions where you want less verbose exceptions: if you have 1000 invalid values, you probably don’t want an exception message that lists all of them.

If you pass the keyword argument raise_first_error=True to validate(), it raises as soon as it encounters the first invalid cell rather than collecting all of the validation error messages first.
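
The two behaviours can be sketched in plain Python. The loop below is modelled on the fragment of Selectable.validate visible in the traceback at the end of this page, where a validator is called on each cell and supplies a message through msg(). Everything else is an assumption made for illustration: EqualsValidator, run_validation, the use of plain strings in place of cells, and ValueError in place of tidychef's CellValidationError.

```python
from typing import List


class EqualsValidator:
    """Hypothetical validator: callable returning a bool, plus a msg() method."""

    def __init__(self, expected: str):
        self.expected = expected

    def __call__(self, cell: str) -> bool:
        return cell == self.expected

    def msg(self, cell: str) -> str:
        return f'"{cell}" does not equal "{self.expected}"'


def run_validation(cells: List[str], validator, raise_first_error: bool = False) -> None:
    """Raise on the first invalid cell, or collect every message and raise once."""
    validation_errors = []
    for cell in cells:
        if not validator(cell):
            if raise_first_error:
                # Stop immediately: the message names a single cell.
                raise ValueError(validator.msg(cell))
            validation_errors.append(validator.msg(cell))
    if validation_errors:
        # Report everything that was collected in one exception.
        raise ValueError("\n".join(validation_errors))


# Stand-in for the non-blank cells of column B in the source data.
cells = ["John", "Paul", "George", "Ringo"]

for flag in (False, True):
    try:
        run_validation(cells, EqualsValidator("John"), raise_first_error=flag)
    except ValueError as err:
        print(f"raise_first_error={flag}: {len(str(err).splitlines())} message(s)")
```

With raise_first_error=True the loop in this sketch stops at "Paul", which mirrors the single-message error shown in the traceback at the end of this page.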

The example that follows reruns the against.regex("John") validation from the previous section with this keyword argument. As it’s also the last example on this page, we allow the error to be raised properly.

from tidychef import acquire, against
from tidychef.selection import CsvSelectable

table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")

table.excel_ref('B').is_not_blank().validate(against.regex("John"), raise_first_error=True)
---------------------------------------------------------------------------
CellValidationError                       Traceback (most recent call last)
Cell In[5], line 6
      2 from tidychef.selection import CsvSelectable
      4 table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
----> 6 table.excel_ref('B').is_not_blank().validate(against.regex("John"), raise_first_error=True)

File ~/.pyenv/versions/3.12.11/lib/python3.12/site-packages/tidychef/selection/selectable.py:457, in Selectable.validate(self, validator, raise_first_error)
    455             if not validator(cell):
    456                 if raise_first_error:
--> 457                     raise CellValidationError(
    458                         f"""
    459 When making selections from table: {self.name} the
    460 following validation error was encountered:
    461 {validator.msg(cell)}
    462                 """
    463                     )
    464                 else:
    465                     validation_errors.append(validator.msg(cell))

CellValidationError: 
When making selections from table: Unnamed Table the
following validation error was encountered:
"Paul" does not match pattern: "John"