# TidyData: Column: validate=

The following is an introduction to basic validation in tidychef and usage of the `validate=` keyword provided by the `Column` class when constructing `TidyData`.

| <span style="color:green">Note - this keyword uses exactly the same `against` module we used when explaining the `<selectable>.validate()` method earlier in this documentation. The difference here is intent (explained in more detail below).</span>|
|-----------------------------------------|

## validate= vs .validate()

The key difference to understand is as follows:

- The `<selectable>.validate()` method runs against **selections**, so it polices your _extraction logic_.
- Validations via the `Column(validate=)` keyword run against the **output**, so they police your _final product_.

Consider the following scenarios:

- 1.) You want to select an "anchor cell" or a selection of cells for the sole purpose of subtracting it from another selection. It could be important to confirm these selections are accurate, but because they're not directly extracted values, `validate=` will never see them (just the consequences of them), so the `<selectable>.validate()` method is more appropriate.

- 2.) You are using `apply=` to cleanse cell values _at the point of extraction_ and need to make sure the correct things are happening. The `<selectable>.validate()` method will **never see these cleansed values**, but `validate=` will.

There are nuances as to where it's best to use each, but the pithy version is "use both strategies wherever possible, and as much as it's practical to do so".
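The second scenario can be illustrated without tidychef at all. The following is a minimal plain-Python sketch, not tidychef code: `raw_cells`, `cleanse` and `is_valid` are hypothetical stand-ins for a selection, an `apply=` callable and a validator respectively, and the cell values are made up for illustration.

```python
import re

# Hypothetical raw cell values as they might sit in a spreadsheet selection.
raw_cells = [" L2KX", "L2N8 ", "L2MH"]


def cleanse(value: str) -> str:
    """Stand-in for an apply= callable: trim stray whitespace."""
    return value.strip()


def is_valid(value: str) -> bool:
    """Stand-in for a validator: capital L followed by three alphanumerics."""
    return re.fullmatch(r"L[A-Z0-9]{3}", value) is not None


# Checking the selection sees the raw, uncleansed values ...
selection_check = [is_valid(v) for v in raw_cells]

# ... while checking the output sees the values after cleansing.
output_check = [is_valid(cleanse(v)) for v in raw_cells]

print(selection_check)
print(output_check)
```

The same validator gives different answers depending on where it runs, which is why the two strategies complement rather than replace each other.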

## Source Data

The data source we're using for these examples is shown below:

| <span style="color:green">Note - this particular table has some very verbose headers we don't care about, so we'll be using `bounded=` to remove them from the previews as well as to show just the subset of data we're working with.</span>|
|-----------------------------------------|

The [full data source can be downloaded here](https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx). We'll be using the 10th tab, named "Table 3c".

In [None]:
from tidychef import acquire, preview
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3c")
preview(table, bounded="A4:H10")

## Simple Regex Validation

For this example we're going to use the same `against` module we used earlier in this documentation. It provides a simple regex validator that works in exactly the way explained above.

i.e. `against` is just a convenience; you could define this validator yourself.
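As a rough idea of what a hand-rolled regex validator might look like, here is a self-contained sketch. It does not import tidychef: `StandInCell` is a hypothetical stand-in for a cell, and the "callable returning a bool, plus a `msg` method" shape is an assumption made for illustration rather than a statement of the interface tidychef requires.

```python
import re
from dataclasses import dataclass


@dataclass
class StandInCell:
    """Hypothetical stand-in for a cell; only .value is modelled."""
    value: str


@dataclass
class StartsWithPattern:
    """Hand-rolled regex validator: calling it returns True for valid cells."""
    pattern: str

    def __call__(self, cell: StandInCell) -> bool:
        # re.match only anchors at the start of the string.
        return re.match(self.pattern, cell.value) is not None

    def msg(self, cell: StandInCell) -> str:
        # Message to report when the cell fails validation.
        return f'"{cell.value}" does not match pattern: "{self.pattern}"'


validator = StartsWithPattern("L.*")
results = {v: validator(StandInCell(v)) for v in ["L2KX", "M123"]}
print(results)
```

Keeping the pattern as a field means one class can serve any number of columns, each with its own regex.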

The following example finds an invalid cell using the regex provided:

In [None]:
from tidychef import acquire, against
from tidychef.direction import right, down
from tidychef.output import TidyData, Column
from tidychef.selection import XlsxSelectable

table: XlsxSelectable = acquire.xlsx.http("https://github.com/mikeAdamss/tidychef/raw/main/tests/fixtures/xlsx/ons-oic.xlsx", tables="Table 3c")

observations = table.excel_ref("B7:H10").label_as("Observations")
dataset_identifier_code = table.excel_ref("B6").expand(right).label_as("Dataset Identifier Codes")

# Note: matches a regex of capital L followed by anything
tidy_data = TidyData(
    observations,
    Column(dataset_identifier_code.attach_directly(down), validate=against.regex("L.*"))
)

print(tidy_data)

## A Note on Lazy Evaluation

One thing you may notice about the above is that the validation error does not occur until we try to print the `tidy_data` variable. This is because the `TidyData` class uses _lazy evaluation_.

Simply put, this means the tidy data is not extracted until the last possible moment, in this case when we print.
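The general pattern can be sketched in a few lines of plain Python. This is a toy illustration of lazy evaluation only, not how `TidyData` is implemented: `LazyTable` and its methods are hypothetical names.

```python
class LazyTable:
    """Toy illustration of lazy evaluation; not tidychef's implementation."""

    def __init__(self, values, validator):
        # Only store the inputs; no extraction or validation happens here.
        self._values = values
        self._validator = validator
        self._rows = None

    def _evaluate(self):
        # Validate and build the rows once, on first use.
        if self._rows is None:
            for value in self._values:
                if not self._validator(value):
                    raise ValueError(f"invalid value: {value!r}")
            self._rows = list(self._values)
        return self._rows

    def __str__(self):
        return "\n".join(self._evaluate())


# Constructing the object raises nothing, even though "M123" is invalid.
lazy = LazyTable(["L2KX", "M123"], lambda v: v.startswith("L"))

try:
    print(lazy)  # evaluation, and therefore validation, happens here
    error = None
except ValueError as exc:
    error = str(exc)
    print(error)
```

A practical consequence is that a validation failure surfaces at the line that first consumes the data, not at the line that declared the validator.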