Selection: Validation
This section details how to use the validate() method with your extractions.
The purpose of this method is not to alter the selection in any way; it is to let you confirm that you have selected the cell values you were expecting.
Note - for our purposes here, we’re going to use the following from tidychef:
- validate() is how you validate cell selections.
- against is a collection of tools that validate() makes use of.
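To make the division of labour concrete, here is a minimal plain-Python sketch of the pattern. It is not tidychef's implementation. It assumes, as the traceback at the end of this page suggests, that a validator is a callable returning True or False for a cell and exposing a msg(cell) method that describes a failure. The Cell class, the Validator protocol, IsAlphaValidator and validate_cells are all hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a selected cell: a value plus its position."""

    value: str
    x: int
    y: int


class Validator(Protocol):
    """Assumed validator shape: callable returning True for a valid cell, plus msg()."""

    def __call__(self, cell: Cell) -> bool: ...

    def msg(self, cell: Cell) -> str: ...


class IsAlphaValidator:
    """Hypothetical validator: a cell is valid if its value is purely alphabetic."""

    def __call__(self, cell: Cell) -> bool:
        return cell.value.isalpha()

    def msg(self, cell: Cell) -> str:
        return f'"{cell.value}" is not purely alphabetic'


class CellValidationError(Exception):
    """Local exception for this sketch (tidychef defines its own)."""


def validate_cells(cells: List[Cell], validator: Validator) -> List[Cell]:
    """Check every cell, collect all failure messages, and raise once at the end.

    The list is returned untouched: validation never alters the selection.
    """
    errors = [validator.msg(cell) for cell in cells if not validator(cell)]
    if errors:
        raise CellValidationError(
            "The following validation errors were encountered:\n" + "\n".join(errors)
        )
    return cells


# Column B of the example table (x/y are zero-based here, an arbitrary choice).
selection = [Cell("John", 1, 3), Cell("Paul", 1, 4), Cell("George", 1, 5), Cell("Ringo", 1, 6)]
checked = validate_cells(selection, IsAlphaValidator())
print(checked is selection)  # True: the same, unmodified selection comes back
```

The key design point is that the validator only answers "valid or not" and explains failures. The surrounding loop decides how to report them, which is why the same validator can be reused across different selections.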
Source Data
The data source we’re using for these examples is shown below:
The full data source can be viewed here.
from tidychef import acquire, preview
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
preview(table)
Unnamed Table
A | B | C | D | E | F | G | H | I | J | K | |
1 | |||||||||||
2 | Houses | Cars | Boats | Houses | Cars | Boats | |||||
3 | Beatles | Rolling Stones | |||||||||
4 | John | 1 | 5 | 9 | Keith | 2 | 6 | 10 | |||
5 | Paul | 2 | 6 | 10 | Mick | 3 | 7 | 11 | |||
6 | George | 2 | 7 | 11 | Charlie | 3 | 8 | 12 | |||
7 | Ringo | 4 | 8 | 12 | Ronnie | 5 | 9 | 13 | |||
8 |
.validate(against.items())
The against.items() validator checks that the value of each selected cell appears in a given list.
So the following example will not raise an error, because every selected cell has a value that is in the list.
from tidychef import acquire, against
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
# Select the non-blank cells in column B, then check each value is in the list
validated_selection = table.excel_ref('B').is_not_blank().validate(against.items(["John", "Paul", "Ringo", "George"]))
Whereas this example will raise an error, as we’ve removed “George” from the list.
Note - we’re using a try/except block to catch and print the exception purely so that it doesn’t stop execution of this notebook; you don’t need to do anything like this in practice.
from tidychef import acquire, against
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
try:
    table.excel_ref('B').is_not_blank().validate(against.items(["John", "Paul", "Ringo"]))
except Exception as err:
    print(err)
When making selections from table: Unnamed Table the
following validation errors were encountered:
"George" not in list: ['John', 'Paul', 'Ringo']
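As a rough illustration of the comparison that against.items() describes, here is a plain-Python sketch of an items-style validator. It is not tidychef's implementation. The Cell and ItemsValidator classes are hypothetical, and the msg() wording is modelled on the error printed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a selected cell."""

    value: str


class ItemsValidator:
    """Sketch of an items-style check: a cell is valid if its value is in `items`."""

    def __init__(self, items: List[str]):
        self.items = items

    def __call__(self, cell: Cell) -> bool:
        return cell.value in self.items

    def msg(self, cell: Cell) -> str:
        # Mirrors the wording of the error printed above.
        return f'"{cell.value}" not in list: {self.items}'


column_b = [Cell(name) for name in ["John", "Paul", "George", "Ringo"]]
validator = ItemsValidator(["John", "Paul", "Ringo"])
for cell in column_b:
    if not validator(cell):
        print(validator.msg(cell))  # "George" not in list: ['John', 'Paul', 'Ringo']
```

Keeping items as a list preserves the order shown in the message. For very long lists, also storing a set would make each membership check faster.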
.validate(against.regex())
The against.regex() validator checks whether the value of each cell matches the provided regular expression.
So the following example will raise an error for every selected cell whose content does not match “John”.
Note - as before, we use a try/except block purely so that the exception doesn’t stop execution of this notebook; you don’t need to do this in practice.
from tidychef import acquire, against
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
try:
    table.excel_ref('B').is_not_blank().validate(against.regex("John"))
except Exception as err:
    print(err)
When making selections from table: Unnamed Table the
following validation errors were encountered:
"Paul" does not match pattern: "John"
"George" does not match pattern: "John"
"Ringo" does not match pattern: "John"
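Here is a similar plain-Python sketch of a regex-style validator. RegexValidator and Cell are hypothetical names, and the sketch assumes a cell passes only when its whole value matches the pattern (re.fullmatch). tidychef's own matching rules may differ; for example, the library might match only at the start of the value.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a selected cell."""

    value: str


class RegexValidator:
    """Sketch of a regex-style check (assumes whole-value matching via fullmatch)."""

    def __init__(self, pattern: str):
        self.pattern = pattern
        self._compiled = re.compile(pattern)  # compile once, reuse for every cell

    def __call__(self, cell: Cell) -> bool:
        return self._compiled.fullmatch(cell.value) is not None

    def msg(self, cell: Cell) -> str:
        # Mirrors the wording of the errors printed above.
        return f'"{cell.value}" does not match pattern: "{self.pattern}"'


validator = RegexValidator("John")
for name in ["John", "Paul", "George", "Ringo"]:
    cell = Cell(name)
    if not validator(cell):
        print(validator.msg(cell))
```

Compiling the pattern once in __init__ avoids re-parsing it for every cell. Under these fullmatch semantics, a pattern such as r"[A-Z][a-z]+" would accept all four names.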
validate(): viewing a lone exception
There will be occasions where you’ll want less verbose exceptions (if you have 1000 invalid values, you probably don’t want an exception message listing all of them).
If you pass the keyword argument raise_first_error=True to validate(), it will raise immediately upon encountering the first invalid cell rather than collecting all the validation error messages first.
In the example that follows we run the same code as above but include this keyword argument (as it’s also the last example on this page, we’ll let the error be raised properly).
from tidychef import acquire, against
from tidychef.selection import CsvSelectable
table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
table.excel_ref('B').is_not_blank().validate(against.regex("John"), raise_first_error=True)
---------------------------------------------------------------------------
CellValidationError Traceback (most recent call last)
Cell In[5], line 6
2 from tidychef.selection import CsvSelectable
4 table: CsvSelectable = acquire.csv.http("https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv")
----> 6 table.excel_ref('B').is_not_blank().validate(against.regex("John"), raise_first_error=True)
File ~/.pyenv/versions/3.12.11/lib/python3.12/site-packages/tidychef/selection/selectable.py:457, in Selectable.validate(self, validator, raise_first_error)
455 if not validator(cell):
456 if raise_first_error:
--> 457 raise CellValidationError(
458 f"""
459 When making selections from table: {self.name} the
460 following validation error was encountered:
461 {validator.msg(cell)}
462 """
463 )
464 else:
465 validation_errors.append(validator.msg(cell))
CellValidationError:
When making selections from table: Unnamed Table the
following validation error was encountered:
"Paul" does not match pattern: "John"
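The branch shown in the traceback above can be mirrored in plain Python: raise on the first failure, or otherwise append validator.msg(cell) to a list. This is a sketch under assumptions, not tidychef's code. Cell, ItemsValidator and validate_cells are hypothetical, and CellValidationError here is a local stand-in for the library's exception.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a selected cell."""

    value: str


class CellValidationError(Exception):
    """Local exception for this sketch; tidychef defines its own."""


class ItemsValidator:
    """Hypothetical items-style validator used to drive the example."""

    def __init__(self, items: List[str]):
        self.items = items

    def __call__(self, cell: Cell) -> bool:
        return cell.value in self.items

    def msg(self, cell: Cell) -> str:
        return f'"{cell.value}" not in list: {self.items}'


def validate_cells(cells: List[Cell], validator, raise_first_error: bool = False) -> List[Cell]:
    errors: List[str] = []
    for cell in cells:
        if not validator(cell):
            if raise_first_error:
                # Fail fast: report only this cell and skip the rest.
                raise CellValidationError(validator.msg(cell))
            errors.append(validator.msg(cell))
    if errors:
        # Collect-all mode: every failure goes into one exception message.
        raise CellValidationError("\n".join(errors))
    return cells


cells = [Cell(name) for name in ["John", "Paul", "George", "Ringo"]]
try:
    validate_cells(cells, ItemsValidator(["John"]), raise_first_error=True)
except CellValidationError as err:
    print(err)  # "Paul" not in list: ['John']
```

Failing fast keeps the message short and stops work early. The trade-off is that you only learn about one bad value per run.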