tag*

Allows you to attach metadata to a node, i.e. the node produced by any function decorated with @tag. A common use is to mark nodes as belonging to a particular data product, or to flag them for GDPR/privacy purposes.

For instance:

import pandas as pd
from hamilton.function_modifiers import tag

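def intermediate_column() -> pd.Series:
    # placeholder logic; in practice this would load or compute a column
    return pd.Series([1, 2, 3])

@tag(data_product='final', pii='true')
def final_column(intermediate_column: pd.Series) -> pd.Series:
    # Hamilton wires in the `intermediate_column` node by parameter name
    return intermediate_column * 2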

How do I query by tags?

Right now, we don’t have a dedicated interface for querying by tags; however, we do expose them via the driver. list_available_variables() returns every available output along with its name, type and tags, so you can filter the available outputs for specific tag matches. E.g.

from hamilton import driver
dr = driver.Driver(...)  # create driver as required
all_possible_outputs = dr.list_available_variables()
desired_outputs = [o.name for o in all_possible_outputs
                   if 'my_tag_value' == o.tags.get('my_tag_key')]
output = dr.execute(desired_outputs)
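The filter above matches a single key/value pair. To match several tags at once, a small helper along the following lines could be used. This is a sketch under stated assumptions: `AvailableVariable` is a hypothetical stand-in that models only the `name`, `type` and `tags` attributes used above, `select_by_tags` is not part of Hamilton, and treating a list-valued tag as a match when it contains the requested value is our own choice.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Union

TagValue = Union[str, List[str]]


@dataclass
class AvailableVariable:
    """Hypothetical stand-in for an entry returned by dr.list_available_variables()."""

    name: str
    type: object
    tags: Dict[str, TagValue] = field(default_factory=dict)


def _matches(actual: Optional[TagValue], wanted: str) -> bool:
    # Assumption: a list-valued tag matches if the wanted value is one of its elements.
    if isinstance(actual, list):
        return wanted in actual
    return actual == wanted


def select_by_tags(variables: Iterable[AvailableVariable], **required: str) -> List[str]:
    """Names of variables whose tags match every required key/value pair."""
    return [
        v.name
        for v in variables
        if all(_matches(v.tags.get(key), value) for key, value in required.items())
    ]


variables = [
    AvailableVariable("intermediate_column", object),
    AvailableVariable("final_column", object, {"data_product": "final", "pii": "true"}),
]
print(select_by_tags(variables, data_product="final", pii="true"))  # ['final_column']
```

Because the helper only reads `.name` and `.tags`, it should also accept the objects returned by `dr.list_available_variables()`.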

Reference Documentation

class hamilton.function_modifiers.tag(*, target_: str | Collection[str] | None | ellipsis = None, bypass_reserved_namespaces_: bool = False, **tags: str | List[str])

Decorator class that adds tags to a node. Tags take the form of key/value pairs. Tag keys can contain dots to specify namespaces, but this is usually reserved for special cases (e.g. subdecorators) that make use of them. Tags are usually passed in as keyword arguments, so we expect them to be un-namespaced in most uses.

That is, using:

@tag(my_tag='tag_value')
def my_function(some_input: int) -> int:
    return some_input + 1

is un-namespaced, because a keyword argument name (the part before the ‘=’) cannot contain a ‘.’.

But using:

@tag(**{'my.tag': 'tag_value'})
def my_function(some_input: int) -> int:
    return some_input + 1

lets you put dots in the key, and so namespace your tags.
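The dictionary-unpacking form works because of plain Python semantics: `**{...}` passes arbitrary string keys as keyword arguments, including keys that could never be written as identifiers. The sketch below shows this with a stand-in function, plus a hypothetical `group_by_namespace` helper (not part of Hamilton) that splits keys on their first dot, as one way to read namespaced tags back.

```python
from typing import Dict


def collect_tags(**tags: str) -> Dict[str, str]:
    # Receives keyword arguments exactly as a decorator's **tags parameter would.
    return tags


def group_by_namespace(tags: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: group keys by the text before the first dot ('' = no namespace)."""
    grouped: Dict[str, Dict[str, str]] = {}
    for key, value in tags.items():
        namespace, name = key.split(".", 1) if "." in key else ("", key)
        grouped.setdefault(namespace, {})[name] = value
    return grouped


# collect_tags(namespace.tag_key='tag_value') would be a SyntaxError,
# but unpacking a dict passes the dotted key through unchanged.
tags = collect_tags(foo="bar", **{"namespace.tag_key": "tag_value"})
print(group_by_namespace(tags))
# {'': {'foo': 'bar'}, 'namespace': {'tag_key': 'tag_value'}}
```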

Tag values should be strings; as the signature above shows, lists of strings are also accepted (see the tags parameter below).

Hamilton also reserves the right to make the following changes:
  • adding purely positional arguments
  • disallowing a certain set of top-level prefixes (e.g. any tag whose top level is one of the values in RESERVED_TAG_PREFIX)
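A reserved-prefix check of the kind described above (compare the bypass_reserved_namespaces_ parameter documented below) could look roughly like the following. This is a sketch, not Hamilton's implementation: `RESERVED_PREFIXES` holds made-up placeholder values rather than the contents of RESERVED_TAG_PREFIX, and `check_tag_keys` and its `bypass_reserved` flag are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Union

# Placeholder prefixes for illustration only; NOT Hamilton's actual reserved set.
RESERVED_PREFIXES: FrozenSet[str] = frozenset({"hamilton", "internal"})


def check_tag_keys(
    tags: Dict[str, Union[str, List[str]]],
    reserved: FrozenSet[str] = RESERVED_PREFIXES,
    bypass_reserved: bool = False,
) -> None:
    """Raise ValueError if a key's top-level prefix is reserved, unless bypassed."""
    if bypass_reserved:
        return
    for key in tags:
        top_level = key.split(".", 1)[0]  # 'a.b.c' -> 'a'; 'plain' -> 'plain'
        if top_level in reserved:
            raise ValueError(f"tag key {key!r} uses reserved prefix {top_level!r}")


check_tag_keys({"data_product": "final", "my.tag": "tag_value"})  # passes
check_tag_keys({"hamilton.flag": "x"}, bypass_reserved=True)  # passes: check skipped
```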

Example usage:

@tag(foo='bar', a_tag_key='a_tag_value', **{'namespace.tag_key': 'tag_value'})
def my_function(some_input: int) -> int:
    return some_input + 1

__init__(*, target_: str | Collection[str] | None | ellipsis = None, bypass_reserved_namespaces_: bool = False, **tags: str | List[str])

Constructor for adding tag annotations to a function.

Parameters:
  • bypass_reserved_namespaces_ – Whether to bypass reserved namespace checking.

  • target_

    Target nodes to decorate. This can be one of the following:

    • None: tag all nodes produced by this function that are “final” (i.e. no other node produced by this function depends on them)

    • Ellipsis (…): tag all nodes produced by this function

    • Collection[str]: tag only the nodes with the specified names

    • str: tag only the node with the specified name

  • tags – Tag keys are always strings, so the type annotation here describes only the values, which are strings or lists of strings. Implicitly the type is Dict[str, Union[str, List[str]]], but the PEP 484 convention is to annotate **kwargs with the value type alone, i.e. Union[str, List[str]].
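The four accepted forms of target_ can be modelled as a small resolution function over the nodes that one decorated function produces. In this sketch the node graph is a hypothetical dict mapping each produced node name to the names it depends on, `resolve_targets` is not a Hamilton API, and the “final” rule follows the description above. Names that the function does not produce are silently ignored here; the real decorator may behave differently.

```python
from typing import Dict, Set

# Hypothetical model: each node produced by one decorated function -> names it depends on.
NodeGraph = Dict[str, Set[str]]


def resolve_targets(produced: NodeGraph, target: object = None) -> Set[str]:
    """Return the names of the produced nodes that a target_ value selects."""
    if target is None:
        # "Final" nodes: no other produced node depends on them.
        depended_on = set().union(*produced.values())
        return {name for name in produced if name not in depended_on}
    if target is Ellipsis:
        return set(produced)  # every produced node
    if isinstance(target, str):
        return {target} & set(produced)  # a single name
    return set(target) & set(produced)  # any other collection of names


# A hypothetical function whose intermediate raw_df node feeds outputs a and b.
produced = {"raw_df": set(), "a": {"raw_df"}, "b": {"raw_df"}}
print(sorted(resolve_targets(produced)))  # ['a', 'b']
print(sorted(resolve_targets(produced, ...)))  # ['a', 'b', 'raw_df']
```

The str case is checked before the general collection case because a string is itself a collection of characters.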

class hamilton.function_modifiers.tag_outputs(**tag_mapping: Dict[str, str | List[str]])
__init__(**tag_mapping: Dict[str, str | List[str]])

Creates a tag_outputs decorator.

Note that this does not currently validate that node names are spelled correctly, as it accepts a superset of nodes.

Parameters:

tag_mapping – Mapping of output name to tags; this is akin to applying @tag to each individual output produced by the function.

Example usage:

import pandas as pd
from hamilton.function_modifiers import extract_columns, tag_outputs

@tag_outputs(**{'a': {'a_tag': 'a_tag_value'}, 'b': {'b_tag': 'b_tag_value'}})
@extract_columns("a", "b")
def example_tag_outputs() -> pd.DataFrame:
    return pd.DataFrame({"a": [1], "b": [2]})
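Conceptually, tag_outputs looks up each produced output in the mapping and attaches the corresponding tags. The sketch below models that with plain dictionaries; `apply_tag_outputs` is a hypothetical helper, and merging onto pre-existing tags (with mapping values winning on conflicts) is our assumption. In line with the note above, it does not validate names: mapping entries for outputs that are not produced are simply never read.

```python
from typing import Dict, List, Union

Tags = Dict[str, Union[str, List[str]]]


def apply_tag_outputs(node_tags: Dict[str, Tags], tag_mapping: Dict[str, Tags]) -> Dict[str, Tags]:
    """Return new per-output tags with tag_mapping[name] merged into each produced output."""
    result: Dict[str, Tags] = {}
    for name, existing in node_tags.items():
        # Outputs without an entry keep their tags; unknown names in tag_mapping are ignored.
        result[name] = {**existing, **tag_mapping.get(name, {})}
    return result


produced_tags = {"a": {}, "b": {"owner": "team_x"}}
mapping = {"a": {"a_tag": "a_tag_value"}, "b": {"b_tag": "b_tag_value"}, "c": {"c_tag": "unused"}}
print(apply_tag_outputs(produced_tags, mapping))
# {'a': {'a_tag': 'a_tag_value'}, 'b': {'owner': 'team_x', 'b_tag': 'b_tag_value'}}
```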