check_output*

The @check_output decorator enables you to add simple data quality checks to your code.

For example:

import pandas as pd
import numpy as np
from hamilton.function_modifiers import check_output

@check_output(
    data_type=np.int64,
    range=(0,100),
)
def some_int_data_between_0_and_100() -> pd.Series:
    pass

The check_output decorator takes keyword arguments that each correspond to one of the default validators. Passing an argument adds the matching default validator to the list of checks. The example above thus creates two validators: one that checks the datatype of the series, and one that checks whether the data falls within a given range.
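To make the argument-to-validator mapping concrete, here is a minimal plain-Python sketch of the idea. It is not Hamilton's implementation: `TypeValidator`, `RangeValidator`, `CANDIDATES`, and `resolve_validators` are hypothetical names invented for illustration, and the sketch validates plain lists rather than pandas series.

```python
from typing import Any, List, Tuple


class TypeValidator:
    """Hypothetical validator: every value must be an instance of the given type."""

    arg = "data_type"  # keyword argument that switches this validator on

    def __init__(self, expected: type):
        self.expected = expected

    def validate(self, data: List[Any]) -> bool:
        return all(isinstance(value, self.expected) for value in data)


class RangeValidator:
    """Hypothetical validator: every value must lie within inclusive bounds."""

    arg = "range"

    def __init__(self, bounds: Tuple[float, float]):
        self.low, self.high = bounds

    def validate(self, data: List[Any]) -> bool:
        return all(self.low <= value <= self.high for value in data)


# Illustrative stand-in for a list of default validator candidates.
CANDIDATES = [TypeValidator, RangeValidator]


def resolve_validators(**kwargs: Any) -> list:
    """Build one validator per keyword argument, matched on the `arg` name."""
    validators = []
    for key, value in kwargs.items():
        matches = [candidate for candidate in CANDIDATES if candidate.arg == key]
        if not matches:
            raise ValueError(f"No validator registered for argument {key!r}")
        validators.append(matches[0](value))
    return validators


# Two keyword arguments produce two validators, as in the example above.
validators = resolve_validators(data_type=int, range=(0, 100))
results = [validator.validate([1, 50, 100]) for validator in validators]
```

The point of the design is that the decorator call stays declarative: each keyword names a check, and an unknown keyword is rejected up front instead of being silently ignored.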

Note that you can also specify custom validators using the @check_output_custom decorator.

See data_quality for more information on available validators and how to build custom ones.

Note that we also have a plugin that allows you to use pandera. There are two ways to access it:

1. @check_output(schema=pandera_schema)

2. @h_pandera.check_output() on a function that declares a typed pandera dataframe as an output


Reference Documentation

class hamilton.function_modifiers.check_output(importance: str = 'warn', default_validator_candidates: List[Type[BaseDefaultValidator]] = None, target_: str | Collection[str] | None | ellipsis = None, **default_validator_kwargs: Any)

The @check_output decorator enables you to add simple data quality checks to your code.

For example:

import pandas as pd
import numpy as np
from hamilton.function_modifiers import check_output

@check_output(
    data_type=np.int64,
    range=(0,100),
    importance="warn",
)
def some_int_data_between_0_and_100() -> pd.Series:
    ...

The check_output decorator takes in arguments that each correspond to one of the default validators. These arguments tell it to add the default validator to the list. The above thus creates two validators, one that checks the datatype of the series, and one that checks whether the data is in a certain range.

Pandera example that shows how to use the check_output decorator with a Pandera schema:

import pandas as pd
import pandera as pa
from hamilton.function_modifiers import check_output
from hamilton.function_modifiers import extract_columns

schema = pa.DataFrameSchema(...)

@extract_columns('col1', 'col2')
@check_output(schema=schema, target_="builds_dataframe", importance="fail")
def builds_dataframe(...) -> pd.DataFrame:
    ...

__init__(importance: str = 'warn', default_validator_candidates: List[Type[BaseDefaultValidator]] = None, target_: str | Collection[str] | None | ellipsis = None, **default_validator_kwargs: Any)

Creates the check_output decorator.

This constructs the default validator class.

Note that this creates a whole set of default validators. TODO – enable construction of custom validators using check_output.custom(*validators).

Parameters:
  • importance – How important it is that the default validators pass: either “warn” or “fail”.

  • default_validator_candidates – List of validators to be considered for this check.

  • default_validator_kwargs – keyword arguments to be passed to the validator.

  • target_ – a target specifying which nodes to decorate. See the docs in check_output_custom for a quick overview and the docs in function_modifiers.base.NodeTransformer for more detail.
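The importance parameter is easiest to read as a policy for what happens when a check does not pass. The sketch below illustrates one such policy in plain Python, under the assumption that “warn” logs the failure and “fail” raises. `ValidationFailed` and `act_on_result` are hypothetical names for illustration, not part of Hamilton's API.

```python
import logging

logger = logging.getLogger(__name__)


class ValidationFailed(Exception):
    """Hypothetical exception for a failed check whose importance is "fail"."""


def act_on_result(passes: bool, message: str, importance: str = "warn") -> None:
    """Apply an importance level to the outcome of a single check."""
    if importance not in ("warn", "fail"):
        raise ValueError(f"importance must be 'warn' or 'fail', got {importance!r}")
    if passes:
        return  # nothing to report
    if importance == "fail":
        # Assumption: a failing check with importance "fail" stops execution.
        raise ValidationFailed(message)
    # Assumption: a failing check with importance "warn" only logs.
    logger.warning(message)
```

With this policy, a pipeline can start every check at “warn” to observe how often it trips, then promote it to “fail” once the data is known to be clean.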

class hamilton.function_modifiers.check_output_custom(*validators: DataValidator, target_: str | Collection[str] | None | ellipsis = None)

Class to use if you want to implement your own custom validators.

Come chat with us on Slack if you’re interested in this!

__init__(*validators: DataValidator, target_: str | Collection[str] | None | ellipsis = None)

Creates a check_output_custom decorator. This allows passing of custom validators that implement the DataValidator interface.

Parameters:
  • validators – The validators to use.

  • target_ –

    The nodes to check the output of. For more detail read the docs in function_modifiers.base.NodeTransformer, but your options are:

    1. None: This will check just the “final node” (the node that is returned by the decorated function).

    2. ... (Ellipsis): This will check all nodes in the subDAG that the decorated function creates.

    3. string: This will check the node with the given name.

    4. Collection[str]: This will check all nodes specified in the list.

    In all likelihood, you don’t want ..., but the others are useful.
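The four target_ options above can be sketched as a small selection function. This is an illustration in plain Python, not Hamilton's NodeTransformer logic: `select_targets` and the node names are hypothetical, and nodes are represented simply by their names.

```python
from typing import List


def select_targets(target: object, final_node: str, all_nodes: List[str]) -> List[str]:
    """Return the names of the nodes whose output would be checked."""
    if target is None:
        # Option 1: only the node returned by the decorated function.
        return [final_node]
    if target is ...:
        # Option 2: every node in the subDAG.
        return list(all_nodes)
    # Options 3 and 4: a single name or a collection of names.
    # The str check comes first because a str is itself iterable.
    names = [target] if isinstance(target, str) else list(target)
    missing = [name for name in names if name not in all_nodes]
    if missing:
        raise ValueError(f"Unknown target node(s): {missing}")
    return names


# Hypothetical subDAG with three nodes, where "summary" is the final node.
nodes = ["raw", "cleaned", "summary"]

select_targets(None, "summary", nodes)                # ["summary"]
select_targets(..., "summary", nodes)                 # ["raw", "cleaned", "summary"]
select_targets("cleaned", "summary", nodes)           # ["cleaned"]
select_targets(["raw", "cleaned"], "summary", nodes)  # ["raw", "cleaned"]
```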

class hamilton.plugins.h_pandera.check_output(importance: str = 'warn', target: str | Collection[str] | None | ellipsis = None)

__init__(importance: str = 'warn', target: str | Collection[str] | None | ellipsis = None)

Specific output-checker for pandera schemas. This decorator utilizes the output type of the function, which has to be of type pandera.typing.pandas.DataFrame or pandera.typing.pandas.Series, with an annotation argument.

Parameters:
  • schema – The schema to use for validation. If this is not provided, then the output type of the function is used.

  • importance – Importance level (either “warn” or “fail”) – see documentation for check_output for more details.

  • target – The target of the decorator – see documentation for check_output for more details.

Let’s look at equivalent examples to demonstrate:

import pandera as pa
import pandas as pd
from hamilton.plugins import h_pandera
from pandera.typing.pandas import DataFrame

class MySchema(pa.DataFrameModel):
    a: int
    b: float
    c: str = pa.Field(nullable=True)  # For example, allow None values
    d: float    # US dollars

@h_pandera.check_output()
def foo() -> DataFrame[MySchema]:
    return pd.DataFrame() # will fail

import pandas as pd
import pandera as pa
from hamilton import function_modifiers

schema = pa.DataFrameSchema({
    "a": pa.Column(pa.Int),
    "b": pa.Column(pa.Float),
    "c": pa.Column(pa.String, nullable=True),
    "d": pa.Column(pa.Float),
})

@function_modifiers.check_output(schema=schema)
def foo() -> pd.DataFrame:
    return pd.DataFrame() # will fail

These two are functionally equivalent. Note that we do not (yet) support modification of the output.