
sodadata / soda-core

Data Contracts engine for the modern data stack. https://www.soda.io

2,308 stars
259 forks
166 issues
Python, Shell, Makefile

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing sodadata/soda-core in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

To optimize performance, source files are loaded only when you start an analysis.


Repository Overview (README excerpt)


Soda Core — Data Contracts Engine

Soda Core is a data quality and data contract verification engine. It lets you define data quality contracts in YAML and automatically validate both schema and data across your data stack.

Soda Core provides the Soda Command-Line Interface (CLI) and Python API, which you can use to generate, test, publish, and verify contracts. These operations can be executed locally during development, embedded programmatically within your data pipelines (Airflow, Dagster, Prefect, etc.), or executed remotely when connected to Soda Cloud.

Highlights

• Define data contracts using a clean, human-readable YAML syntax
• Run checks on PostgreSQL, Snowflake, BigQuery, Databricks, DuckDB, and more
• Use 50+ built-in data quality checks for common and advanced validations
• Integrate with Soda Cloud for centralized management and anomaly detection monitoring

Setup

This repository hosts the open source Soda Core packages which are installable using the **Public PyPI installation flow** described in Soda's documentation.

Requirements

To use Soda, you must have installed the following on your system.

• **Python 3.9, 3.10, 3.11, or 3.12** To check your existing version, use the CLI command: or . If you have not already installed Python, consider using to manage multiple versions of Python in your environment. **Note:** While Python 3.12 is the highest officially supported version, there are no known issues preventing use of Python 3.13+.
• **UV (recommended) or Pip 21.0 or greater** We recommend using UV for faster and more reliable package management. To install UV, see the UV installation guide. Alternatively, you can use pip (version 21.0+). To check your pip version:

Installation

Soda Core v4 open source packages are available on public PyPI and have the form .

Using UV (recommended)

Using pip

Replace with the appropriate package for your data source.
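
The Python requirement stated under Requirements (3.9 through 3.12 officially supported, 3.13+ with no known issues) can also be checked from a script. The following is a minimal sketch and is not part of Soda Core; the function name `python_support_status` and its three return labels are assumptions made for illustration.

```python
import sys

# Minor versions the README lists as officially supported (Python 3.9 to 3.12).
SUPPORTED_MINORS = (9, 10, 11, 12)


def python_support_status(version_info=sys.version_info):
    """Classify a Python version against the README's stated support range.

    Returns "supported" for 3.9 to 3.12, "untested" for 3.13 and later
    (the README reports no known issues there), and "unsupported" otherwise.
    The labels are hypothetical, chosen for this sketch only.
    """
    major, minor = version_info[0], version_info[1]
    if major != 3 or minor < min(SUPPORTED_MINORS):
        return "unsupported"
    if minor > max(SUPPORTED_MINORS):
        return "untested"
    return "supported"


if __name__ == "__main__":
    # Report the status of the interpreter running this script.
    print(python_support_status())
```

Running the script with the interpreter you plan to use for Soda prints one of the three labels, which a setup script could use to warn before installing packages.
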
For a list of supported data sources, packages, and configurations, see the data source reference for Soda Core.

Working with legacy Soda Core v3

Soda package names have changed with the release of version 4. Legacy version 3 open source packages have the form . For example, to install Soda Core v3 for Postgres, pinning the version at :

For a list of supported data sources and other details, see the v3 documentation within this repository. For information about Soda Core v3, see the v3 README file and the Soda v3 online documentation.

Development

Prerequisites

• Python 3.10+
• UV (recommended) or pip 21.0+

Setup (UV — recommended)

Clone the repo and install all workspace packages with dev dependencies:

Setup (pip — alternative)

If you don't have UV, you can still set up a dev environment with pip:

Running tests

Pre-commit checks

Quickstart

The examples show minimal configurations. For more detailed examples and explanations, see the Soda Cloud documentation. To see detailed logs, add or to any of these commands.

Configure a data source

To define a local configuration for your data source and validate the connection, run the following commands.

Create data source config

Parameter | Required | Description
--- | --- | ---
, | Yes | Output file path for the data source YAML configuration file.

By default, the YAML file generated as is a template for PostgreSQL connections. To learn how to populate a data source configuration file, see the data source reference for Soda Core.

Test data source config

Parameter | Required | Description
--- | --- | ---
, | Yes | Path to a data source YAML configuration file.

Create a contract

Create and populate a new contract YAML file. To understand how to write a contract, see the online documentation, or the example below, which is configured to test a table or view with qualified name within a data source named . This table is assumed to have columns named , , and .
The data source name must match the property in the data source configuration file.

Verify a contract locally

To evaluate a contract, run a contract verification scan:

Parameter | Required | Description
--- | --- | ---
, | Yes | Path to a data source YAML configuration file.
, | Yes | Path to a data contract YAML configuration file.

Interact with Soda Cloud

To execute commands remotely, connect Soda Core to Soda Cloud. To learn how to configure Soda Cloud, see the documentation about configuring data sources and datasets and working with contracts.

> **Request a free Soda Cloud account**
>
> Request a free account to evaluate Soda Cloud. You’ll get access for up to three datasets.

Connect to Soda Cloud

To connect Soda Core to Soda Cloud, generate a Soda Cloud config file:

Parameter | Required | Description
--- | --- | ---
, | Yes | Output file path for the Soda Cloud YAML configuration file.

Obtain Soda Cloud credentials and add them to the Soda Cloud config file. To generate credentials, follow this procedure.

To test the connection:

Parameter | Required | Description
--- | --- | ---
, | Yes | Path to a Soda Cloud YAML configuration file.

Publish to Soda Cloud

To publish a local contract to Soda Cloud as the source of truth:

Parameter | Required | Description
--- | --- | ---
, | Yes | Path to a contract YAML file.
, | Yes | Path to Soda Cloud YAML configuration file.

To publish local contract verification results to Soda Cloud, add a Soda Cloud YAML configuration file and enable the flag:

Parameter | Required | Description
--- | --- | ---
, | Yes | Path to a data source YAML configuration file.
, | Yes | Path to a data contract YAML configuration file.
, | Yes | Path to a Soda Cloud YAML configuration file.
, | No | Publish results and contract to Soda Cloud. Requires "Manage contract" permission; learn about permissions here.
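
The parameter tables above mark up to three YAML files as required inputs: a data source configuration, a contract, and a Soda Cloud configuration. A pipeline step might confirm those files exist before invoking Soda at all. The following is a minimal sketch of such a pre-flight check; it does not call Soda, and the function name `missing_config_files` and the file names in the usage example are hypothetical.

```python
from pathlib import Path


def missing_config_files(paths):
    """Return the labels whose path does not point to an existing file.

    `paths` maps a human-readable label (for example "contract") to a
    file path. An empty result means every required file is present.
    """
    return [label for label, path in paths.items() if not Path(path).is_file()]


if __name__ == "__main__":
    # Hypothetical file names; substitute the paths used in your project.
    required = {
        "data source config": "ds_config.yml",
        "contract": "contract.yml",
        "Soda Cloud config": "sc_config.yml",
    }
    missing = missing_config_files(required)
    if missing:
        print("Missing required files: " + ", ".join(missing))
    else:
        print("All required configuration files are present.")
```

Failing early on a missing file keeps the error message specific to the pipeline's own setup instead of surfacing later from the tool being called.
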

Verify a contract remotely using Soda Agent

To verify contracts via Soda Cloud using the Soda Agent, configure a dataset and contract in Soda Cloud and configure an agent in your environment. To obtain the Soda Cloud dataset identifier, for example , open t…