vertti / daffy

Lightweight DataFrame validation decorators for Pandas, Polars, Modin, and PyArrow. No custom types required.

57 stars
5 forks
1 issue
Python

Repository Overview (README excerpt)

Daffy: Validate pandas & Polars DataFrames with Python Decorators

**Validate your pandas and Polars DataFrames at runtime with simple Python decorators.** Daffy catches missing columns, wrong data types, and invalid values before they cause downstream errors in your data pipeline. Also supports Modin and PyArrow DataFrames.

• ✅ **Column & dtype validation**: lightweight, minimal overhead
• ✅ **Value constraints**: nullability, uniqueness, range checks
• ✅ **Row validation with Pydantic**: when you need deeper checks
• ✅ **Works with pandas, Polars, Modin, PyArrow**: no lock-in

---

Installation

or with conda:

Works with whatever DataFrame library you already have installed. Python 3.10–3.14.

---

Quickstart

If a column is missing, has wrong dtype, or violates a constraint, **Daffy fails fast** with a clear error message at the function boundary.

---

Why Daffy?

Most DataFrame validation tools are schema-first (define schemas separately) or pipeline-wide (run suites over datasets). **Daffy is decorator-first:** validate inputs and outputs where transformations happen.

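To make the decorator-first idea concrete, here is a minimal sketch of the general pattern in plain Python. It is not Daffy's implementation or API: `require_columns` and the `Frame` stand-in are hypothetical names invented for this illustration, and the check covers only missing columns (no dtypes or value constraints).

```python
from functools import wraps


def require_columns(*required):
    """Hypothetical decorator (not Daffy's API): fail fast if columns are missing."""

    def decorator(func):
        @wraps(func)
        def wrapper(df, *args, **kwargs):
            # Works with any object exposing `.columns`, e.g. a pandas DataFrame.
            present = list(df.columns)
            missing = [name for name in required if name not in present]
            if missing:
                raise ValueError(f"{func.__name__}: missing columns {missing}")
            return func(df, *args, **kwargs)

        return wrapper

    return decorator


class Frame:
    """Tiny stand-in for a DataFrame so the sketch needs no third-party library."""

    def __init__(self, data):
        self.data = dict(data)

    @property
    def columns(self):
        return list(self.data)


@require_columns("brand", "price")
def total_price(df):
    # The transformation itself stays free of validation code.
    return sum(df.data["price"])
```

Calling `total_price` with a frame that lacks `price` raises `ValueError` before the function body runs, so the error surfaces at the function boundary rather than deep inside the transformation.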
| | |
| ------------------------ | -------------------------------------------------------------------------------- |
| **Non-intrusive** | Just add decorators: no refactoring, no custom DataFrame types, no schema files |
| **Easy to adopt** | Add in 30 seconds, remove just as fast if needed |
| **In-process** | No external stores, orchestrators, or infrastructure |
| **Pay for what you use** | Column validation is essentially free; opt into row validation when needed |

---

Examples

Column validation

Regex column matching

Match dynamic column names with regex patterns:

Value constraints

Vectorized checks with zero row iteration overhead:

Available checks: , , , , , , , , ,

Also supported: , , , ,

Nullability and uniqueness

Row validation with Pydantic

For complex, cross-field validation:

---

Daffy vs Alternatives

| Use Case | Daffy | Pandera | Great Expectations |
| ---------------------------- | :-----------------: | :----------------: | :-----------------: |
| Function boundary guardrails | ✅ Primary focus | ⚠️ Possible | ❌ Not designed for |
| Quick column/type checks | ✅ Lightweight | ⚠️ Requires schemas | ⚠️ Requires setup |
| Complex statistical checks | ⚠️ Limited | ✅ Extensive | ✅ Extensive |
| Pipeline/warehouse QA | ❌ Not designed for | ⚠️ Some support | ✅ Primary focus |
| Multi-backend support | ✅ | ⚠️ Varies | ✅ |

---

Configuration

Configure Daffy project-wide via :

---

Documentation

Full documentation available at **daffy.readthedocs.io**

• Getting Started: quick introduction
• Usage Guide: comprehensive reference
• API Reference: decorator signatures
• Changelog: version history

---

Contributing

Issues and pull requests welcome on GitHub.

License

MIT