# databricks / databricks-sdk-py

**Databricks SDK for Python (Beta)**
## Repository Overview (README excerpt)
> **Beta**: This SDK is supported for production use cases, but we do expect future releases to have some interface changes; see Interface stability. We are keen to hear feedback from you on these SDKs. Please file issues, and we will address them.

See also:

- the SDK for Java
- the SDK for Go
- the Terraform Provider
- the cloud-specific docs (AWS, Azure, GCP)
- the API reference on readthedocs

The Databricks SDK for Python includes functionality to accelerate development with Python for the Databricks Lakehouse. It covers all public Databricks REST API operations. The SDK's internal HTTP client is robust and handles failures on different levels by performing intelligent retries.

### Contents

- Getting started
- Code examples
- Authentication
- Long-running operations
- Paginated responses
- Retries
- Single-sign-on with OAuth
- User Agent Request Attribution
- Error handling
- Logging
- Integration with …
- Interface stability

### Getting started

Install the Databricks SDK for Python via `pip install databricks-sdk` and instantiate a `WorkspaceClient`. The Databricks SDK for Python is compatible with Python 3.7 _(until June 2023)_, 3.8, 3.9, 3.10, and 3.11.

**Note:** Databricks Runtime starting from version 13.1 includes a bundled version of the Python SDK. It is highly recommended to upgrade to the latest version, which you can do by running `%pip install --upgrade databricks-sdk` in a notebook cell, followed by `dbutils.library.restartPython()`.

### Code examples

The Databricks SDK for Python comes with a number of examples demonstrating how to use the library for various common use cases, including:

- Using the SDK with OAuth from a webserver
- Using long-running operations
- Authenticating a client app using OAuth

These examples and more are located in the `examples/` directory of the GitHub repository. Some other examples of using the SDK include:

- Unity Catalog Automated Migration heavily relies on the Python SDK for working with Databricks APIs.
- ip-access-list-analyzer checks & prunes invalid entries from IP Access Lists.

### Authentication

If you use Databricks configuration profiles or Databricks-specific environment variables for Databricks authentication, the only code required to start working with a Databricks workspace is instantiating `WorkspaceClient` with no arguments, which instructs the Databricks SDK for Python to use its default authentication flow. The conventional name for the variable that holds the workspace-level client of the Databricks SDK for Python is `w`, which is shorthand for _workspace_.

In this section:

- Default authentication flow
- Databricks native authentication
- Azure native authentication
- Overriding .databrickscfg
- Additional authentication configuration options

#### Default authentication flow

If you run the Databricks Terraform Provider, the Databricks SDK for Go, the Databricks CLI, or applications that target the Databricks SDKs for other languages, they will most likely all interoperate nicely together. By default, the Databricks SDK for Python tries the following authentication methods, in the following order, until one succeeds:

- Databricks native authentication
- Azure native authentication
- If the SDK is unsuccessful at this point, it returns an authentication error and stops running.

You can instruct the Databricks SDK for Python to use a specific authentication method by setting the `auth_type` argument as described in the following sections.

For each authentication method, the SDK searches for compatible authentication credentials in the following locations, in the following order. Once the SDK finds a compatible set of credentials that it can use, it stops searching:

- Credentials that are hard-coded into configuration arguments. :warning: **Caution**: Databricks does not recommend hard-coding credentials into arguments, as they can be exposed in plain text in version control systems. Use environment variables or configuration profiles instead.
- Credentials in Databricks-specific environment variables.
- For Databricks native authentication, credentials in the `.databrickscfg` file's configuration profile from its default file location (`~` for Linux or macOS, and `%USERPROFILE%` for Windows).
- For Azure native authentication, the SDK searches for credentials through the Azure CLI as needed.

Depending on the Databricks authentication method, the SDK uses the following information. Presented are the `WorkspaceClient` and `AccountClient` arguments (which have corresponding `.databrickscfg` file fields), their descriptions, and any corresponding environment variables.

#### Databricks native authentication

By default, the Databricks SDK for Python initially tries Databricks token authentication (`auth_type='pat'` argument). If the SDK is unsuccessful, it then tries Workload Identity Federation (WIF). See Supported WIF for the supported JWT token providers.

- For Databricks token authentication, you must provide `host` and `token`; or their environment variable or `.databrickscfg` file field equivalents.
- For Databricks OIDC authentication, you must provide the `host`, `client_id`, and `token_audience` _(optional)_ either directly, through the corresponding environment variables, or in your `.databrickscfg` configuration file.
- For Azure DevOps OIDC authentication, the `token_audience` is irrelevant, as the audience is always set to `api://AzureADTokenExchange`. Also, the `System.AccessToken` pipeline variable required for the OIDC request must be exposed as the `SYSTEM_ACCESSTOKEN` environment variable, following Pipeline variables.

| Argument | Description | Environment variable |
|----------|-------------|----------------------|
| `host` | _(String)_ The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint. | `DATABRICKS_HOST` |
| `account_id` | _(String)_ The Databricks account ID for the Databricks accounts endpoint. Only has effect when `host` is either `https://accounts.cloud.databricks.com/` _(AWS)_, `https://accounts.azuredatabricks.net/` _(Azure)_, or `https://accounts.gcp.databricks.com/` _(GCP)_. | `DATABRICKS_ACCOUNT_ID` |
| `token` | _(String)_ The Databricks personal access token (PAT) _(AWS, Azure, and GCP)_ or Azure Active Directory (Azure AD) token _(Azure)_. | `DATABRICKS_TOKEN` |
| | _(String)_ The… | |
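As a concrete illustration of the configuration-profile fields referenced above, a minimal `.databrickscfg` might look like the following sketch. The host, token, and profile names are placeholders, not real credentials:

```ini
# ~/.databrickscfg (Linux/macOS) or %USERPROFILE%\.databrickscfg (Windows)

# Profile used when no profile name is specified.
[DEFAULT]
host  = https://my-workspace.cloud.databricks.com
token = dapi0123456789abcdef

# A named profile for a second workspace.
[DEV]
host  = https://my-dev-workspace.cloud.databricks.com
token = dapi0123456789abcdef
```

A named profile can be selected with the `profile` argument (for example, `WorkspaceClient(profile="DEV")`) or via the `DATABRICKS_CONFIG_PROFILE` environment variable.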