Created: Jun 5, 2025
Created by: Smars Hu
This guide summarizes the security best practices for production environments: controlling permissions via Service Principal + RBAC, storing sensitive credentials with Databricks Secret Scope + Key Vault, and centrally configuring Spark credentials at the cluster level.
1. Control Permissions via Service Principal + RBAC
A Service Principal is an application identity registered in Azure AD, used for programs or services to access Azure resources as a non-human identity.
RBAC (Role-Based Access Control) allows fine-grained control over this identity’s access to Azure resources.
- Register an application in Azure AD to generate a Service Principal.
- Assign the minimum required permissions. For example, assign the SP the `Storage Blob Data Contributor` role on the Storage Account.
- Ensure the SP’s `client_id` and `client_secret` are properly secured and never exposed in code.
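The registration and role assignment above can be performed with the Azure CLI. A sketch under stated assumptions: the SP name, subscription ID, resource group, and storage account name below are placeholders, not values from this guide.

```shell
# Register an application and create its Service Principal.
# The command prints the appId (client_id) and password (client_secret);
# capture them once and store the secret in Key Vault, never in code.
az ad sp create-for-rbac --name sp-databricks-prod

# Grant least privilege: Storage Blob Data Contributor scoped to a single
# Storage Account, not Owner/Contributor on the whole subscription.
az role assignment create \
  --assignee <client-id> \
  --role "Storage Blob Data Contributor" \
  --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
```

Scoping the role assignment to one storage account (rather than the subscription) is what makes the RBAC grant "minimum required".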
2. Store Sensitive Credentials with Databricks Secret Scope + Key Vault
Secret Scope is a mechanism in Databricks for storing sensitive information (such as SP secrets, access tokens, etc.), supporting two modes:
- Databricks-managed Scope
- Azure Key Vault-backed Scope (recommended)
1. Create and configure a Key Vault in Azure to store the SP’s `client_secret`.
2. Create a Secret Scope in Databricks that is bound to the Key Vault:

```shell
databricks secrets create-scope --scope kv-scope --scope-backend-type AZURE_KEYVAULT \
  --resource-id <keyvault-resource-id> --dns-name https://<kv-name>.vault.azure.net/
```

3. Access credentials using `dbutils.secrets.get()`:

```python
dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")
```
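When cluster-level configuration (section 3) is not an option, the retrieved secret can feed the same OAuth properties per session from within a notebook. A sketch that only runs inside a Databricks notebook, where `spark` and `dbutils` are injected by the runtime; the storage account, client ID, and tenant ID are placeholders:

```python
# Databricks notebook only: `spark` and `dbutils` are predefined there.
storage = "<storage>"
client_secret = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")

# Session-level keys drop the "spark.hadoop." prefix used in cluster config.
spark.conf.set(f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net",
               "<client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net",
               client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
```

The secret value stays in a variable resolved at runtime; it never appears as a literal in the notebook source.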
3. Centrally Configure Spark Credentials at the Cluster Level to Enhance Security and Maintainability
Centralized cluster-level credential configuration avoids repetitive setup in notebooks and improves security and consistency.
Configure the following items in the Spark Config of the Databricks cluster to access ADLS using OAuth + SP:

```
spark.hadoop.fs.azure.account.auth.type.<storage>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage>.dfs.core.windows.net <client-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage>.dfs.core.windows.net {{secrets/kv-scope/sp-client-secret}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
```

Once configured, all notebooks attached to the cluster can seamlessly access ADLS.
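The five cluster-level properties follow a fixed pattern per storage account, so they can be generated rather than hand-typed. A minimal sketch; the helper name and its arguments are illustrative, not part of any Databricks API:

```python
def adls_oauth_spark_conf(storage_account, client_id, tenant_id,
                          secret_scope, secret_key):
    """Build the cluster-level Spark config entries for OAuth access to one
    ADLS Gen2 account. The secret value itself is never embedded; the
    {{secrets/...}} placeholder is resolved by Databricks at cluster start."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    prefix = "spark.hadoop.fs.azure.account"
    return {
        f"{prefix}.auth.type.{suffix}": "OAuth",
        f"{prefix}.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"{prefix}.oauth2.client.id.{suffix}": client_id,
        f"{prefix}.oauth2.client.secret.{suffix}":
            f"{{{{secrets/{secret_scope}/{secret_key}}}}}",
        f"{prefix}.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# Example: emit the lines to paste into the cluster's Spark Config box.
for key, value in adls_oauth_spark_conf("mystorageacct", "<client-id>",
                                        "<tenant-id>", "kv-scope",
                                        "sp-client-secret").items():
    print(key, value)
```

Generating the block this way keeps the secret-reference syntax consistent and makes adding a second storage account a one-line change.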
- Credentials hardcoded in code are easily leaked if accidentally uploaded to Git or other version control systems.
- Updating credentials requires manual modification of all notebooks, which is hard to maintain.
- This does not comply with enterprise information security policies.
- All sensitive information should be managed via Secret Scope.
- All permission control should be implemented via RBAC, not by granting broad access through shared account keys.
| Component | Responsibility |
|---|---|
| Service Principal | Azure AD identity for accessing protected resources |
| RBAC | Controls SP’s access to resources (least privilege) |
| Azure Key Vault | Securely stores SP secrets and other sensitive information |
| Databricks Secret Scope | Connects to Key Vault and securely references secrets in notebooks or cluster configs |
| Spark Config in Cluster | Globally configures credentials so all notebooks connected to the cluster have access automatically |
[Notebook]
↓ (connect)
[Databricks Cluster] —— Spark Config reads ——> [Secret Scope (KeyVault-backed)]
↓
[Service Principal Credentials]
↓
Accesses [Storage Account] via OAuth
↑
Access granularity controlled by Azure RBAC
- Use Service Principal + RBAC for fine-grained permission control
- Store all credentials in Key Vault and inject them securely via Secret Scope
- Credential configuration should be centralized at the cluster level, not scattered in notebooks
- All notebook users should access storage systems through the cluster, and no plaintext passwords or keys should appear in code
This configuration pattern is the current standard practice for enterprises running Databricks on Azure, meeting requirements for security, maintainability, and compliance.