mozilla / bigquery-etl
Bigquery ETL
Repository Overview (README excerpt)
BigQuery ETL
This repository contains Mozilla Data Team's:
• Derived ETL jobs that do not require a custom container
• User-defined functions (UDFs)
• Airflow DAGs for scheduled bigquery-etl queries
• Tools for query & UDF deployment, management and scheduling
For more information, see https://mozilla.github.io/bigquery-etl/

Quick Start

Prerequisites
• **Pyenv** (optional) Recommended if you want to install different versions of Python, see instructions here. After the installation of pyenv, make sure that your terminal app is configured to run the shell as a login shell.
• **Homebrew** (not required, but useful for Mac) - Follow the instructions here to install Homebrew on your Mac.
• **Python 3.11** - (see this guide for instructions if you're on a Mac and haven't installed anything other than the default system Python).

GCP CLI tools
• **For Mozilla Employees (not in Data Engineering)** - Set up GCP command line tools, as described on docs.telemetry.mozilla.org. Note that some functionality (e.g. writing UDFs or backfilling queries) may not be allowed. Run to authenticate against GCP.
• **For Data Engineering** - In addition to setting up the command line tools, you will want to log in to if making changes to production systems. Run (if you have not run it previously).

Installing bqetl
• Clone the repository
• Install the command line tool
• Install standard pre-commit hooks
Finally, if you are using Visual Studio Code, you may also wish to use our recommended defaults:
And you should now be set up to start working in the repo. The easiest way to do this for many tasks is to use . You may also want to read up on common workflows.

Releasing a new version of
To push a new version of to PyPI, update the in . The version numbers follow the CalVer scheme, with the _Micro_ version numbers starting at 1. For example, for the first package version getting published in March 2024, the version would be .
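The CalVer rule above can be sketched in a few lines of Python. This is only an illustration, not the project's release tooling: the helper name `next_calver` is hypothetical, and the `YEAR.MONTH.MICRO` layout without zero padding is an assumption, since the excerpt does not show the exact version string.

```python
from datetime import date


def next_calver(today: date, published: list[str]) -> str:
    """Return the next version string for the month of `today`.

    Assumes a YEAR.MONTH.MICRO layout (an assumption, not confirmed by
    the excerpt), with MICRO starting at 1 for each new month.
    """
    prefix = f"{today.year}.{today.month}."
    micros = []
    for version in published:
        # Only versions from the same year and month affect the next MICRO.
        if version.startswith(prefix):
            micro = version[len(prefix):]
            if micro.isdigit():
                micros.append(int(micro))
    # No release yet this month: max() falls back to 0, so MICRO is 1.
    return f"{prefix}{max(micros, default=0) + 1}"


# First release in March 2024, with no earlier releases that month.
print(next_calver(date(2024, 3, 15), []))
# Third release in March 2024; the February release is ignored.
print(next_calver(date(2024, 3, 15), ["2024.2.7", "2024.3.1", "2024.3.2"]))
```

Under these assumptions the first call yields a Micro number of 1 and the second yields 3, which matches the "Micro starts at 1" rule stated in the README.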