Schema Enforcer Custom Validators


We’re excited to introduce support for custom validators in Schema Enforcer. Schema Enforcer provides a framework for testing structured data against schema definitions, using JSON Schema and, now, custom Python validators. You can check out Introducing Schema Enforcer for more background and an introduction to the tool.

What Is a Custom Validator?

A custom validator is a Python module that allows you to run any logic against your data on a per-host basis.

Let’s start with an example. What if you want to validate that every edge router has at least two core interfaces defined?

Here’s a possible way we could model our data in an Ansible host_var file:

---
hostname: "az-phx-pe01"
pair_rtr: "az-phx-pe02"
upstreams: []
interfaces:
  MgmtEth0/0/CPU0/0:
    ipv4: "172.16.1.1"
  Loopback0:
    ipv4: "192.168.1.1"
    ipv6: "2001:db8:1::1"
  GigabitEthernet0/0/0/0:
    ipv4: "10.1.0.1"
    ipv6: "2001:db8::"
    peer: "az-phx-pe02"
    peer_int: "GigabitEthernet0/0/0/0"
    type: "core"
  GigabitEthernet0/0/0/1:
    ipv4: "10.1.0.37"
    ipv6: "2001:db8::12"
    peer: "co-den-p01"
    peer_int: "GigabitEthernet0/0/0/2"
    type: "core"

In this example, each physical interface has a type key, which we can evaluate in our custom validator. JSON Schema can be used to validate that this field exists and contains a desired value (e.g., “core”, “access”, etc.). However, it cannot check whether there are at least two interfaces with this key set to “core”.
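
For illustration, here is a minimal sketch using the jsonschema Python library directly (the trimmed schema below is an assumption for this post, not a schema taken from Schema Enforcer): JSON Schema can constrain the value of each interface’s type key, but it has no way to require that at least two interfaces carry type: core.

from jsonschema import validate

# Trimmed, hypothetical schema: each interface may carry a "type" key,
# and when present it must be one of the known values.
schema = {
    "type": "object",
    "properties": {
        "interfaces": {
            "type": "object",
            "additionalProperties": {
                "type": "object",
                "properties": {"type": {"enum": ["core", "access"]}},
            },
        }
    },
}

# Only one core interface is defined, yet validation passes;
# counting interfaces of a given type is beyond plain JSON Schema.
validate(
    instance={"interfaces": {"GigabitEthernet0/0/0/0": {"type": "core"}}},
    schema=schema,
)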

JMESPath Custom Validators

As a shortcut for basic use cases, Schema Enforcer provides the JmesPathModelValidation class. This class supports using JMESPath queries against your data along with generic comparison operators. The logic is provided by the base class, so no Python is required beyond setting a few variables.

To solve the preceding example, we can use the following custom validator:

from schema_enforcer.schemas.validator import JmesPathModelValidation

class CheckInterface(JmesPathModelValidation):  # pylint: disable=too-few-public-methods
    top_level_properties = ["interfaces"]
    id = "CheckInterface"  # pylint: disable=invalid-name
    left = "interfaces.*[@.type=='core'][] | length([?@])"
    right = 2
    operator = "gte"
    error = "Less than two core interfaces"

The top_level_properties variable maps this validator to the interfaces object in our data. The real work is done by the left, right, and operator variables. Think of these as part of an expression:

{left} {operator} {right}

Or for our example:

"interfaces.*[@.type=='core'][] | length([?@])" gte 2

This custom validator uses the JMESPath expression to query the data. The query returns all interfaces that have type of “core”. The output is piped to a built-in JMESPath function that gives us the length of the return value. When applied to our example data, the value of the query is 2. When checked by our custom validator, this host will pass, as the value of the query is greater than or equal to 2.
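
If you want to experiment with the expression before wiring it into a validator, you can evaluate it with the jmespath Python library. A minimal sketch, assuming the library is installed and the host_var data above has been loaded into a Python dictionary (trimmed here to the relevant keys):

import jmespath

# Relevant subset of the az-phx-pe01 host_var data shown above.
data = {
    "interfaces": {
        "Loopback0": {"ipv4": "192.168.1.1"},
        "GigabitEthernet0/0/0/0": {"ipv4": "10.1.0.1", "type": "core"},
        "GigabitEthernet0/0/0/1": {"ipv4": "10.1.0.37", "type": "core"},
    }
}

core_count = jmespath.search("interfaces.*[@.type=='core'][] | length([?@])", data)
print(core_count)  # 2, so the "gte 2" comparison passes for this host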

root@b295daf33db5:/local/examples/ansible3# schema-enforcer ansible --show-checks
Found 2 hosts in the inventory
Ansible Host              Schema ID
--------------------------------------------------------------------------------
az_phx_pe01               ['CheckInterface']
az_phx_pe02               ['CheckInterface']

In the preceding output, we see the CheckInterface validator is applied to two hosts.

When Schema Enforcer is run against the inventory, the output shows if any hosts fail the validation. If a host fails, the error message defined in the CheckInterface class error variable will be shown.

root@b295daf33db5:/local/examples/ansible3# schema-enforcer ansible
Found 2 hosts in the inventory
FAIL | [ERROR] Less than two core interfaces [HOST] az_phx_pe02 [PROPERTY]
root@b295daf33db5:/local/examples/ansible3#

Advanced Use Cases

For more advanced use cases, Schema Enforcer provides the BaseValidation class which can be used to build your own complex validation classes. BaseValidation provides two helper functions for reporting pass/fail: add_validation_pass and add_validation_error. Schema Enforcer will automatically call the validate method of your custom class for all instances of your data. The logic as to whether a validator passes or fails is up to your implementation.
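
As a bare-bones illustration before the full example below (the class name and the some_key lookup are hypothetical), a BaseValidation subclass only needs to implement validate and report results through the two helper methods:

from schema_enforcer.schemas.validator import BaseValidation


class CheckExample(BaseValidation):
    """Minimal skeleton of a custom validator."""

    def validate(self, data, strict):
        """Called by Schema Enforcer for every instance of the data."""
        if data.get("some_key"):  # placeholder check; put your own logic here
            self.add_validation_pass()
        else:
            self.add_validation_error("some_key is missing or empty")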

Since we can run arbitrary logic against the data using Python, one possible use case is to check data against some external service. In the example below, a simple BGP peer data file is checked against the ARIN database to validate that the name is correct.

Sample Data

---
bgp_peers:
  - asn: 6939
    name: "Hurricane Electric LLC"
  - asn: 701
    name: "VZW"
  - asn: 100000
    name: "Private"

Validator

"""Custom validator for BGP peer information."""
import requests

from schema_enforcer.schemas.validator import BaseValidation


class CheckARIN(BaseValidation):
    """Verify that BGP peer name matches ARIN ASN information."""

    def validate(self, data, strict):
        """Validate BGP peers for each host."""
        headers = {"Accept": "application/json"}
        for peer in data["bgp_peers"]:
            # pylint: disable=invalid-name
            r = requests.get(f"http://whois.arin.net/rest/asn/{peer['asn']}", headers=headers)
            if r.status_code != requests.codes.ok:  # pylint: disable=no-member
                self.add_validation_error(f"ARIN lookup failed for peer {peer['name']} with ASN {peer['asn']}")
                continue
            arin_info = r.json()
            arin_name = arin_info["asn"]["orgRef"]["@name"]
            if peer["name"] != arin_name:
                self.add_validation_error(
                    f"Peer name {peer['name']} for ASN {peer['asn']} does not match ARIN database: {arin_name}"
                )
            else:
                self.add_validation_pass()

If we run Schema Enforcer with this validator, we get the following output:

root@da72aae39ede:/local/examples/example4# schema-enforcer validate --show-checks
Structured Data File                               Schema ID
--------------------------------------------------------------------------------
./bgp/peers.yml                                    ['CheckARIN']
root@da72aae39ede:/local/examples/example4# schema-enforcer validate
FAIL | [ERROR] Peer name VZW for ASN 701 does not match ARIN database: MCI Communications Services, Inc. d/b/a Verizon Business [FILE] ./bgp/peers.yml [PROPERTY]
FAIL | [ERROR] ARIN lookup failed for peer Private with ASN 100000 [FILE] ./bgp/peers.yml [PROPERTY]

You could expand this example to do other validation, such as checking that the ASN is valid before making the request to ARIN.
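
As a sketch of that idea (the helper below and its private-use range checks are illustrative additions, not part of the example above), a small guard could run before the HTTP request:

def asn_is_public(asn: int) -> bool:
    """Return True only for ASNs inside the 32-bit range and outside the private-use blocks (RFC 6996)."""
    if not 1 <= asn <= 4294967295:
        return False
    if 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294:
        return False
    return True

Inside validate, the loop could then call self.add_validation_error() and continue for any peer where asn_is_public(peer["asn"]) returns False, skipping the ARIN lookup entirely.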

For more information on this Schema Enforcer feature, see the docs. And if you have any interesting use cases, please let us know!




Filter JSON Data in Ansible Using json_query


Parsing structured JSON data in Ansible playbooks is a common task. Nowadays, the JSON format is heavily used by equipment vendors to represent complex objects in a structured way to allow programmatic interaction with devices. JSON utilizes two main data structures:

  • object – an unordered collection of key/value pairs (like Python dict)
  • array – an ordered sequence of objects (like Python list)

Ansible provides many built-in capabilities for consuming JSON, using Ansible-specific filters or the Jinja2 built-in filters. The number of available filters, and knowing which to use and when, can be overwhelming at first, and the desired result often requires multiple filters chained together. This can lead to complex task definitions, making playbook maintenance more difficult.

The built-in json_query filter provides the functionality for filtering, shaping, and transforming JSON data. It uses the third-party jmespath library, which implements JMESPath, a powerful JSON query language for parsing complex structured data.

Setup

The jmespath third-party library must be installed on the host for the json_query filter to operate.

pip install jmespath

Data

Those familiar with Palo Alto firewalls will recognize a modified version of an application content update query. This result dataset will be used to demonstrate how to manipulate JSON data using the json_query filter.

response:
      result:
        content-updates:
          entry:
          - app-version: 8368-6520
            current: 'no'
            downloaded: 'no'
            filename: panupv2-all-apps-8368-6520
            version: 8368-6520
          - app-version: 8369-6522
            current: 'no'
            downloaded: 'no'
            filename: panupv2-all-apps-8369-6522
            version: 8369-6522
          - app-version: 8367-6513
            current: 'no'
            downloaded: 'no'
            filename: panupv2-all-apps-8367-6513
            version: 8367-6513
          - app-version: 8371-6531
            current: 'no'
            downloaded: 'no'
            filename: panupv2-all-apps-8371-6531
            version: 8371-6531
          - app-version: 8366-6503
            current: 'no'
            downloaded: 'no'
            filename: panupv2-all-apps-8366-6503
            version: 8366-6503
          - app-version: 8370-6526
            current: 'no'
            downloaded: 'no'
            filename: panupv2-all-apps-8370-6526
            version: 8370-6526
          - app-version: 8373-6537
            current: 'no'
            downloaded: 'yes'
            filename: panupv2-all-apps-8373-6537
            version: 8373-6537
          - app-version: 8365-6501
            current: 'no'
            downloaded: 'no'
            filename: panupv2-all-apps-8365-6501
            version: 8365-6501
          - app-version: 8364-6497
            current: 'no'
            downloaded: 'no'
            filename: panupv2-all-apps-8364-6497
            version: 8364-6497
          - app-version: 8372-6534
            current: 'yes'
            downloaded: 'yes'
            filename: panupv2-all-apps-8372-6534
            version: 8372-6534

On Palo Alto devices, the default stdout response is returned as a JSON-encoded string. This string can be passed to the from_json filter to produce a valid JSON data structure. A result variable is set to simplify the readability of the following examples; since the json_query filter expects valid JSON as input, the Jinja2 expression below could also be passed directly into the json_query filter.

- name: "SET FACT FOR DEVICE STDOUT RESPONSE"
  set_fact:
    result: "{{ (content_info['stdout'] | from_json)['response']['result'] }}"

The yaml callback plugin is configured in ansible.cfg for the following examples. This will render output to the console terminal in YAML format, which can be slightly easier to read. See references at the end for Ansible callback usage.

# Use the YAML callback plugin for output
stdout_callback = yaml

Practical Examples

The JMESPath Operators table below summarizes some of the most common operators used in the jmespath query language. See references at the end for the jmespath specification.

JMESPath Operators

  • @ – The current node being evaluated.
  • * – Wildcard. All elements.
  • .key – Dot notation to access the value of the given key.
  • [index0, index1, ..] – Indexing array elements, like a list.
  • [?expression] – Filter expression. Boolean evaluation.
  • && – AND expression.
  • | – Pipe expression, like a unix pipe.
  • &expression – Using an expression evaluation as a data type.
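
These operators can be prototyped outside of Ansible with the jmespath library installed earlier. The short sketch below (using a trimmed, hypothetical subset of the entry data shown above) exercises the wildcard, filter, and pipe operators:

import jmespath

# Trimmed subset of the content-updates entry array shown above.
data = {
    "entry": [
        {"version": "8368-6520", "downloaded": "no", "filename": "panupv2-all-apps-8368-6520"},
        {"version": "8373-6537", "downloaded": "yes", "filename": "panupv2-all-apps-8373-6537"},
    ]
}

print(jmespath.search("entry[*].version", data))                          # ['8368-6520', '8373-6537']
print(jmespath.search("entry[?downloaded=='yes'].filename", data))        # ['panupv2-all-apps-8373-6537']
print(jmespath.search("entry[?downloaded=='yes'].filename | [0]", data))  # panupv2-all-apps-8373-6537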

Basic Filter

Using the result data, the json_query filter is applied to the valid JSON with a query string argument. The query string provides the expression that is evaluated against the data to return filtered output. The default data type returned by the json_query filter is a list.

In the Basic Filter example below, the query string selects the version key for each element in the entry array.

- name: "BASIC FILTER"
  debug:
    msg: "{{ result['content-updates'] | json_query('entry[*].version') }}"
TASK [BASIC FILTER - VERSION] ***************************************************
ok: [demo-fw01] =>
  msg:
  - 8368-6520
  - 8369-6522
  - 8367-6513
  - 8371-6531
  - 8366-6503
  - 8370-6526
  - 8373-6537
  - 8365-6501
  - 8364-6497
  - 8372-6534

Return Array Element

A pipe expression can be used to select a value from the array by declaring the desired index location. Within the query string, this operates in much the same way as a unix pipe.

In the Array Index example below, the first element is selected. Array indexing begins at zero.

- name: "ARRAY INDEX VALUE"
  debug:
    msg: "{{ result['content-updates'] | json_query('entry[*].version | [0]') }}"

The response output is a string value for the first element in the array.

TASK [ARRAY INDEX VALUE] *********************************************************
ok: [demo-fw01] =>
  msg: 8368-6520

Filter

Using a filter expression within the json_query string, the query result can be filtered using standard comparison operators. A filter expression allows each element of the array to be evaluated against the expression. If the result evaluates to true, the element is included in the returned result.

The equality comparison operator is used in the following example to retrieve the filename(s) of the software version(s) downloaded on the device. Elements in the array that have downloaded=='yes' will have their filename included in the returned result. This provides the ability to use certain keys as the selection criteria and return the value(s) of other keys in the result.

- name: "FILTER EXACT MATCH"
  debug:
    msg: "{{ result['content-updates'] | json_query('entry[?downloaded==`yes`].filename') }}"

The output shows the list of filenames returned.

TASK [FILTER EXACT MATCH] ********************************************************
ok: [demo-fw01] =>
  msg:
  - panupv2-all-apps-8373-6537
  - panupv2-all-apps-8372-6534

Function

The jmespath library provides built-in functions to assist with transformation and filtering tasks; for example, the max_by function returns the maximum element in an array, as evaluated by the supplied expression. The following task selects the filename of the entry with the highest app-version value in the entry array. The & defines an expression data type, which is evaluated against each element when processed by the function.

Using an Ansible variable for the query string can be a cleaner approach to the query definition. It also helps with string quotations that are necessary when the key name is hyphenated. When using an expression data type, the jmespath library requires the key name within quotation marks, e.g., &"key-name". Where key names are not hyphenated, quotation marks are not required, e.g., &keyname.

- name: "MAX BY APP-VERSION"
  set_fact:
    content_file: "{{ result['content-updates'] | json_query(querystr) }}"
  vars:
    querystr: 'max_by(entry, &"app-version").filename'

The resulting filename value is returned.

TASK [JMESPATH FUNCTION] *********************************************************
ok: [demo-fw01] =>
  msg: panupv2-all-apps-8373-6537

Using single quotes in the previous example causes the expression to evaluate to the literal string 'app-version' for every element rather than a reference to the key, so the first element of the array is returned instead.

vars:
  querystr: "max_by(entry, &'app-version').filename"
TASK [MAX BY APP-VERSION] *******************************************************
ok: [demo-fw01] =>
  msg: panupv2-all-apps-8368-6520

Query String with Dynamic Variable

Ansible facts can be substituted into the query string. Again, quotation marks are relevant, so it is preferable to define a separate string as a variable within the task.

In this example, the filter expression is used with a built-in function, contains. This function returns a boolean result based on whether each element's version value contains the search string. The search string in this case is an Ansible variable passed into the query using a Jinja2 template.

- name: "FUNCTION WITH VARIABLE"
  debug:
    msg: "{{ result['content-updates'] | json_query(querystr) }}"
  vars:
    querystr: "entry[?contains(version, '{{ content_version }}')].filename | [0]"
    content_version: "8368-6520"

Again, the indexing capability is used to select the first filename from the returned list.

TASK [FUNCTION WITH VARIABLE] **************************************************
ok: [demo-fw01] =>
  msg: panupv2-all-apps-8368-6520

Multiple Expressions

Multiple expressions can be evaluated using the logical AND operator (&&), following the normal truth table rules. Within the filter expression, two conditions are provided to be evaluated. This filters the resulting data based on two selection criteria and provides a list of version values.

- name: "MULTIPLE FILTER EXPRESSIONS"
  debug:
    msg: "{{ result['content-updates']  | json_query(querystr) }}"
  vars:
    querystr: "entry[?contains(filename, '{{ version }}') && downloaded==`yes`].version"
    version: "8373-6537"

In this case, one element is returned in the list.

TASK [MULTIPLE FILTER EXPRESSIONS] ***********************************************
ok: [demo-fw01] =>
  msg:
  - 8373-6537

Tips

Here are some general tips that could be useful while developing a playbook using json_query.

  • Get familiar with basic jmespath expressions using the interactive tutorial.
  • Ensure the data provided as input to the json_query filter is a valid JSON object.
  • Use a test playbook, using the same data as your original playbook, with tasks dedicated to printing query string evaluation output to the console.
  • Be mindful of quotes within a query string, especially when using linters that specify a standard that may conflict with the jmespath specification.

Conclusion

The JMESPath specification has good documentation and library support in numerous languages. It does require learning a new query language, but hopefully this guide will help you get started with some common use cases.

The json_query filter is a powerful tool to have at your disposal. It can solve many JSON parsing tasks within your playbook, avoiding the need to write custom filters. If you’d like to see more, please let us know.

-Paddy

References




Using Schema Enforcer with Docker and CI


Recently Network to Code open sourced schema-enforcer, and immediately my mind turned to integrating this tool with CI pipelines. The goal is to have fast, repeatable, and reusable pipelines that ensure the integrity of the data stored in Git repositories. We will accomplish repeatability and reusability by packaging schema-enforcer with Docker and publishing the image to a common Docker registry.

Why integrate with CI pipelines?

By integrating repositories containing structured data with a CI pipeline that enforces schema, you are better able to trust the repeatability of the downstream automation that consumes the structured data. This is critical when using the data as a source of truth for automation to consume. It also helps you react faster to an incorrect schema, before the data is used by a configuration tool such as Ansible. Imagine being able to empower other teams to make changes to data repositories and trust that the automation is performing the checks an engineer performs manually today.

How containers can speed up CI execution.

Containers can be a catalyst to speeding up the process of CI execution for the following reasons:

  1. Having purpose-built containers in CI allows for standardized pipelines with little setup time.
  2. Sourcing from a pre-built image to execute a single test command removes the need to build an image or manage a virtual environment per repository.
  3. Reducing build times by using pre-built images allows for a faster-running pipeline and helps shorten the feedback loop to the end user.

Example with privately hosted GitLab.

For today’s example I am using my locally hosted GitLab and Docker Registry. This was done to showcase the power of building internal resources that can be easily integrated with on-premises solutions. This example could easily be adapted to run in GitHub and Travis CI with the same level of effectiveness and speed of execution.

Building a container to use in CI.

Click Here for documentation on Dockerfile construction and docker build commands. The Dockerfile starts with python:3.8 as a base. We then set the working directory, install schema-enforcer, and lastly set the default entrypoint and command for the container image.

Dockerfile

FROM python:3.8

WORKDIR /usr/src/app

RUN python -m pip install schema-enforcer

ENTRYPOINT ["schema-enforcer"]

CMD ["validate", "--show-pass"]

Publishing schema-enforcer container to private registry.

Click Here for documentation on hosting a private docker registry. If using Docker Hub, the image tag would change to <namespace>/<container name>:tag. If I were to push this to my personal Docker Hub namespace, the image would be whitej6/schema-enforcer:latest.

docker build -t registry.whitej6.com/ntc/docker/schema-enforcer:latest .

docker push registry.whitej6.com/ntc/docker/schema-enforcer:latest

Integrating GitLab Runner with data repo.

For the first use case, we are starting with example1 in the schema-enforcer repository located here. We then add a docker-compose.yml, which mounts the full project repo into the previously built container, and create a pipeline with two stages in .gitlab-ci.yml, which is triggered on every commit.

➜  schema-example git:(master) ✗ tree -a -I '.git' 
.
├── .gitlab-ci.yml
├── chi-beijing-rt1
│   ├── dns.yml # This will be the offending file in the failing CI pipeline.
│   └── syslog.yml
├── docker-compose.yml
├── eng-london-rt1
│   ├── dns.yml
│   └── ntp.yml
└── schema
    └── schemas
        ├── dns.yml # This will be the schema definition that triggers in the failure.
        ├── ntp.yml
        └── syslog.yml

4 directories, 9 files

Click Here for documentation on docker-compose and structuring the docker-compose.yml file. We are defining a single service called schema that uses the image we just published to the Docker registry and mounts the current working directory of the pipeline execution into the container at /usr/src/app. We are using the default entrypoint and command specified in the Dockerfile (schema-enforcer validate --show-pass), but this could be overridden in the service definition. For instance, if we would like to enable the strict flag, we would add command: ['validate', '--show-pass', '--strict'] inside the schema service. Keep in mind the command attribute of a service overrides the CMD directive in the Dockerfile.

---
version: "3.8"
services:
  schema:
    # Uncomment the next line to enable strict on schema-enforcer
    # command: ['validate', '--show-pass', '--strict']
    image: registry.whitej6.com/ntc/docker/schema-enforcer:latest
    volumes:
      - ./:/usr/src/app/

Click Here for documentation on structuring the .gitlab-ci.yml file. We are defining two stages in the pipeline, and each stage has one job. The first stage ensures we have the most up-to-date container image for schema-enforcer and then runs the schema service from the docker-compose.yml file. By specifying --exit-code-from schema, we pass the exit code from the schema service back to the docker-compose command. The commands specified in script determine whether the job succeeds: if the schema service returns a non-zero exit code, the job and pipeline are marked as failed. The second stage ensures we are good tenants of Docker and clean up after ourselves; docker-compose down removes any containers or networks associated with this project.

---
stages:
  - test
  - clean

test:
  stage: test
  script:
    - docker-compose pull
    - docker-compose up --exit-code-from schema schema

clean:
  stage: clean
  script:
    - docker-compose down || true
  when: always

Failing.

In this example, chi-beijing-rt1/dns.yml has a boolean value where the schema in schema/schemas/dns.yml expects a string in IPv4 format. As you can see, the container returned a non-zero exit code, failing the pipeline and blocking the merge into a protected branch.

chi-beijing-rt1/dns.yml

# jsonschema: schemas/dns_servers
---
dns_servers:
  - address: true # This is a boolean value and we are expecting a string value in an IPv4 format
  - address: "10.2.2.2"

schema/schemas/dns.yml

---
$schema: "http://json-schema.org/draft-07/schema#"
$id: "schemas/dns_servers"
description: "DNS Server Configuration schema."
type: "object"
properties:
  dns_servers:
    type: "array"
    items:
      type: "object"
      properties:
        name:
          type: "string"
        address: # This is the specific property that will be used in the failed example.
          type: "string"
          format: "ipv4"
        vrf:
          type: "string"
      required:
        - "address"
      uniqueItems: true
required:
  - "dns_servers"

Runner output.

We see exactly which file and attribute fails the pipeline along with the runtime of the pipeline in seconds. 

Blocked Merge Request.

When sourcing from a branch with a failing pipeline, GitLab has the ability to block merging until the pipeline succeeds. Because the pipeline is triggered on each commit, we can resolve the issue in the next commit, which triggers a new pipeline. Once the issue has been resolved, the Merge button is no longer greyed out and the branch can be merged into the target branch.

Passing.

Now the previous error has been corrected and a new commit has been made on the same branch. GitLab has rerun the same pipeline with the new commit, and upon passing, the branch can be merged into the protected branch.

chi-beijing-rt1/dns.yml

# jsonschema: schemas/dns_servers
---
dns_servers:
  - address: "10.2.2.3" # This is the value that has been updated to align with the schema definition.
  - address: "10.2.2.2"

Runner output.

With the issue resolved and committed, we now see the previously offending file is passing the pipeline. 

Fixed Merge Request.

The merge request is now able to be merged into the target branch.

As a network engineer by trade who has come into automation, I have at times found it difficult to trust the machine that builds the machine, let alone to trust others eager to collaborate. Building safeguards for schema into my early pipelines would have saved me a tremendous amount of time and headache.

Friends don't let friends merge bad data.


