
Commit e4bc9bb

Merge pull request #72 from forcedotcom/jo_sf_cli

sf cli auth

2 parents: 5c7fe22 + 048a907

9 files changed: 814 additions & 65 deletions


README.md

Lines changed: 74 additions & 11 deletions
@@ -12,7 +12,10 @@ Use of this project with Salesforce is subject to the [TERMS OF USE](./TERMS_OF_
 - JDK 17
 - Docker support like [Docker Desktop](https://docs.docker.com/desktop/)
 - A salesforce org with some DLOs or DMOs with data and this feature enabled (it is not GA)
-- An [External Client App](#creating-an-external-client-app)
+- **One of the following** for authentication:
+  - A Salesforce org already authenticated via the [Salesforce CLI](https://developer.salesforce.com/tools/salesforcecli)
+    (simplest — no External Client App needed)
+  - An [External Client App](#creating-an-external-client-app) configured with OAuth settings
 
 ## Installation
 
 The SDK can be downloaded directly from PyPI with `pip`:
@@ -65,6 +68,13 @@ datacustomcode configure
 datacustomcode run ./payload/entrypoint.py
 ```
 
+> [!TIP]
+> **Already using the Salesforce CLI?** If you have authenticated an org with `sf org login web
+> --alias myorg`, you can skip `datacustomcode configure` entirely:
+> ```zsh
+> datacustomcode run ./payload/entrypoint.py --sf-cli-org myorg
+> ```
+
 > [!IMPORTANT]
 > The example entrypoint.py requires a `Account_std__dll` DLO to be present. And in order to deploy the script (next step), the output DLO (which is `Account_std_copy__dll` in the example entrypoint.py) also needs to exist and be in the same dataspace as `Account_std__dll`.
 
@@ -183,17 +193,19 @@ Options:
 - `--auth-type TEXT`: Authentication method (default: `oauth_tokens`)
   - `oauth_tokens` - OAuth tokens with refresh_token
   - `client_credentials` - Server-to-server using client_id/secret only
-- `--login-url TEXT`: Salesforce login URL
 
-For OAuth Tokens authentication:
-- `--client-id TEXT`: External Client App Client ID
-- `--client-secret TEXT`: External Client App Client Secret
-- `--refresh-token TEXT`: OAuth refresh token (see [Obtaining Refresh Token](#obtaining-refresh-token-and-core-token))
-- `--core-token TEXT`: (Optional) OAuth core/access token - if not provided, it will be obtained using the refresh token
+You will be prompted for the following depending on auth type:
+
+*Common to all auth types:*
+- **Login URL**: Salesforce login URL
+- **Client ID**: External Client App Client ID
+
+*For OAuth Tokens authentication:*
+- **Client Secret**: External Client App Client Secret
+- **Redirect URI**: OAuth redirect URI
 
-For Client Credentials authentication (server-to-server):
-- `--client-id TEXT`: External Client App Client ID
-- `--client-secret TEXT`: External Client App Client Secret
+*For Client Credentials authentication:*
+- **Client Secret**: External Client App Client Secret
 
 ##### Using Environment Variables (Alternative)
 
@@ -255,6 +267,9 @@ Options:
 - `--config-file TEXT`: Path to configuration file
 - `--dependencies TEXT`: Additional dependencies (can be specified multiple times)
 - `--profile TEXT`: Credential profile name (default: "default")
+- `--sf-cli-org TEXT`: Salesforce CLI org alias or username (e.g. `dev1`). Fetches
+  credentials via `sf org display` — no `datacustomcode configure` step needed.
+  Takes precedence over `--profile` if both are supplied.
 
 
 #### `datacustomcode zip`
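
An illustrative `datacustomcode run` invocation combining the new flag with the existing options documented in the hunk above (the alias `dev1` and the dependency spec are placeholder values; per the docs, `--sf-cli-org` takes precedence over `--profile` when both are supplied):

```zsh
# Illustrative values only: requires an org already authenticated with
# `sf org login web --alias dev1`.
datacustomcode run ./payload/entrypoint.py \
  --sf-cli-org dev1 \
  --dependencies "pandas>=2.0"
```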
@@ -277,7 +292,7 @@ Options:
 - `--version TEXT`: Version of the transformation job (default: "0.0.1")
 - `--description TEXT`: Description of the transformation job (default: "")
 - `--network TEXT`: docker network (default: "default")
-- `--cpu-size TEXT`: CPU size for the deployment (default: "CPU_XL"). Available options: CPU_L(Large), CPU_XL(Extra Large), CPU_2XL(2X Large), CPU_4XL(4X Large)
+- `--cpu-size TEXT`: CPU size for the deployment (default: `CPU_2XL`). Available options: CPU_L(Large), CPU_XL(Extra Large), CPU_2XL(2X Large), CPU_4XL(4X Large)
 
 
 ## Docker usage
@@ -365,6 +380,54 @@ You can read more about Jupyter Notebooks here: https://jupyter.org/
 
 You now have all fields necessary for the `datacustomcode configure` command.
 
+### Using the Salesforce CLI for authentication
+
+The [Salesforce CLI](https://developer.salesforce.com/tools/salesforcecli) (`sf`) lets you authenticate an org once and then reference it by alias across tools — including this SDK via `--sf-cli-org`.
+
+#### Installing the Salesforce CLI
+
+Follow the [official install guide](https://developer.salesforce.com/docs/atlas.en-us.sfdx_setup.meta/sfdx_setup/sfdx_setup_install_cli.htm), or use a package manager:
+
+```zsh
+# macOS (Homebrew)
+brew install sf
+
+# npm (all platforms)
+npm install --global @salesforce/cli
+```
+
+Verify the install:
+```zsh
+sf --version
+```
+
+#### Authenticating an org
+
+**Browser-based (recommended for developer orgs and sandboxes):**
+```zsh
+# Production / Developer Edition
+sf org login web --alias myorg
+
+# Sandbox
+sf org login web --alias mysandbox --instance-url https://test.salesforce.com
+
+# Custom domain
+sf org login web --alias myorg --instance-url https://mycompany.my.salesforce.com
+```
+
+Each command opens a browser tab. After you log in and approve access, the CLI stores the session locally.
+
+**Verify the stored org and confirm the alias:**
+```zsh
+sf org list
+sf org display --target-org myorg
+```
+
+Once authenticated, pass the alias directly to `datacustomcode run`:
+```zsh
+datacustomcode run ./payload/entrypoint.py --sf-cli-org myorg
+```
+
 ### Obtaining Refresh Token and Core Token
 
 If you're using OAuth Tokens authentication, the initial configure will retrieve and store tokens. Run `datacustomcode auth` to refresh these when they expire.
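
As a side note on the `sf org display` verification step added above: the command also supports machine-readable output, which is the form a tool would consume. A quick check, assuming the standard `sf` JSON shape (an access token and instance URL under the `result` key):

```zsh
# Print the stored credentials as JSON; the `result` object carries the
# access token and instance URL that SDK-style integrations read.
sf org display --target-org myorg --json
```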

src/datacustomcode/cli.py

Lines changed: 14 additions & 2 deletions
@@ -16,7 +16,11 @@
 import json
 import os
 import sys
-from typing import List, Union
+from typing import (
+    List,
+    Optional,
+    Union,
+)
 
 import click
 from loguru import logger
@@ -294,12 +298,20 @@ def scan(filename: str, config: str, dry_run: bool, no_requirements: bool):
 @click.option("--config-file", default=None)
 @click.option("--dependencies", default=[], multiple=True)
 @click.option("--profile", default="default")
+@click.option(
+    "--sf-cli-org",
+    default=None,
+    help="SF CLI org alias or username. Fetches credentials via `sf org display`.",
+)
 def run(
     entrypoint: str,
     config_file: Union[str, None],
     dependencies: List[str],
     profile: str,
+    sf_cli_org: Optional[str],
 ):
     from datacustomcode.run import run_entrypoint
 
-    run_entrypoint(entrypoint, config_file, dependencies, profile)
+    run_entrypoint(
+        entrypoint, config_file, dependencies, profile, sf_cli_org=sf_cli_org
+    )
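
The CLI only threads `sf_cli_org` through to `run_entrypoint`; the credential lookup itself lives in `src/datacustomcode/io/reader/sf_cli.py`, which this page does not show. A minimal sketch of how such a lookup could work, assuming `sf org display --json` returns `accessToken` and `instanceUrl` under `result` (the helper below is hypothetical, not the SDK's code):

```python
# Hypothetical sketch, not the SDK's implementation.
import json
import subprocess


def _sf_cli_credentials(org: str) -> tuple[str, str]:
    """Return (access_token, instance_url) for an SF CLI-authenticated org."""
    proc = subprocess.run(
        ["sf", "org", "display", "--target-org", org, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the org alias is unknown
    )
    result = json.loads(proc.stdout)["result"]
    return result["accessToken"], result["instanceUrl"]
```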

src/datacustomcode/io/reader/query_api.py

Lines changed: 41 additions & 38 deletions
@@ -22,50 +22,21 @@
     Union,
 )
 
-import pandas.api.types as pd_types
-from pyspark.sql.types import (
-    BooleanType,
-    DoubleType,
-    LongType,
-    StringType,
-    StructField,
-    StructType,
-    TimestampType,
-)
 from salesforcecdpconnector.connection import SalesforceCDPConnection
 
 from datacustomcode.credentials import AuthType, Credentials
 from datacustomcode.io.reader.base import BaseDataCloudReader
+from datacustomcode.io.reader.sf_cli import SFCLIDataCloudReader
+from datacustomcode.io.reader.utils import _pandas_to_spark_schema
 
 if TYPE_CHECKING:
-    import pandas
     from pyspark.sql import DataFrame as PySparkDataFrame, SparkSession
-    from pyspark.sql.types import AtomicType
+    from pyspark.sql.types import AtomicType, StructType
 
 logger = logging.getLogger(__name__)
 
 
 SQL_QUERY_TEMPLATE: Final = "SELECT * FROM {} LIMIT {}"
-PANDAS_TYPE_MAPPING = {
-    "object": StringType(),
-    "int64": LongType(),
-    "float64": DoubleType(),
-    "bool": BooleanType(),
-}
-
-
-def _pandas_to_spark_schema(
-    pandas_df: pandas.DataFrame, nullable: bool = True
-) -> StructType:
-    fields = []
-    for column, dtype in pandas_df.dtypes.items():
-        spark_type: AtomicType
-        if pd_types.is_datetime64_any_dtype(dtype):
-            spark_type = TimestampType()
-        else:
-            spark_type = PANDAS_TYPE_MAPPING.get(str(dtype), StringType())
-        fields.append(StructField(column, spark_type, nullable))
-    return StructType(fields)
 
 
 def create_cdp_connection(
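
`_pandas_to_spark_schema` and `PANDAS_TYPE_MAPPING` are removed here in favor of an import from `datacustomcode.io.reader.utils` (that module's contents are not shown on this page, but the removed lines above document the helper's behavior). A short sketch of the relocated helper in use, with hypothetical column data:

```python
# Sketch: deriving a Spark schema from pandas dtypes with the moved helper.
import pandas as pd
from pyspark.sql import SparkSession

from datacustomcode.io.reader.utils import _pandas_to_spark_schema

spark = SparkSession.builder.getOrCreate()
pandas_df = pd.DataFrame({"Id__c": [1, 2], "Name__c": ["a", "b"]})

# int64 maps to LongType and object to StringType, per the mapping removed above.
schema = _pandas_to_spark_schema(pandas_df)
spark_df = spark.createDataFrame(pandas_df, schema=schema)
```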
@@ -136,6 +107,7 @@ class QueryAPIDataCloudReader(BaseDataCloudReader):
     Supports multiple authentication methods:
     - OAuth Tokens (default, needs client_id/secret with refresh_token)
     - Client Credentials (server-to-server, needs client_id/secret only)
+    - SF CLI (uses ``sf org display`` access token via the REST API directly)
 
     Supports dataspace configuration for querying data within specific dataspaces.
     When a dataspace is provided (and not "default"), queries are executed within
@@ -149,6 +121,7 @@ def __init__(
         spark: SparkSession,
         credentials_profile: str = "default",
         dataspace: Optional[str] = None,
+        sf_cli_org: Optional[str] = None,
     ) -> None:
         """Initialize QueryAPIDataCloudReader.
 
@@ -160,14 +133,30 @@ def __init__(
             dataspace: Optional dataspace identifier. If provided and not "default",
                 the connection will be configured for the specified dataspace.
                 When None or "default", uses the default dataspace.
+            sf_cli_org: Optional SF CLI org alias or username. When set, the
+                reader delegates to :class:`SFCLIDataCloudReader` which calls
+                the Data Cloud REST API directly using the token obtained from
+                ``sf org display``, bypassing the CDP token-exchange flow.
         """
         self.spark = spark
-        credentials = Credentials.from_available(profile=credentials_profile)
-        logger.debug(
-            "Initializing QueryAPIDataCloudReader with "
-            f"auth_type={credentials.auth_type.value}"
-        )
-        self._conn = create_cdp_connection(credentials, dataspace)
+        if sf_cli_org:
+            logger.debug(
+                f"Initializing QueryAPIDataCloudReader with SF CLI org '{sf_cli_org}'"
+            )
+            self._sf_cli_reader: Optional[SFCLIDataCloudReader] = SFCLIDataCloudReader(
+                spark=spark,
+                sf_cli_org=sf_cli_org,
+                dataspace=dataspace,
+            )
+            self._conn = None
+        else:
+            self._sf_cli_reader = None
+            credentials = Credentials.from_available(profile=credentials_profile)
+            logger.debug(
+                "Initializing QueryAPIDataCloudReader with "
+                f"auth_type={credentials.auth_type.value}"
+            )
+            self._conn = create_cdp_connection(credentials, dataspace)
 
     def read_dlo(
         self,
@@ -186,8 +175,15 @@ def read_dlo(
         Returns:
             PySparkDataFrame: The PySpark DataFrame.
         """
+        sf_cli_reader: Optional[SFCLIDataCloudReader] = getattr(
+            self, "_sf_cli_reader", None
+        )
+        if sf_cli_reader is not None:
+            return sf_cli_reader.read_dlo(name, schema, row_limit)
+
         query = SQL_QUERY_TEMPLATE.format(name, row_limit)
 
+        assert self._conn is not None
         pandas_df = self._conn.get_pandas_dataframe(query)
 
         # Convert pandas DataFrame to Spark DataFrame
@@ -214,8 +210,15 @@ def read_dmo(
         Returns:
             PySparkDataFrame: The PySpark DataFrame.
         """
+        sf_cli_reader: Optional[SFCLIDataCloudReader] = getattr(
+            self, "_sf_cli_reader", None
+        )
+        if sf_cli_reader is not None:
+            return sf_cli_reader.read_dmo(name, schema, row_limit)
+
         query = SQL_QUERY_TEMPLATE.format(name, row_limit)
 
+        assert self._conn is not None
         pandas_df = self._conn.get_pandas_dataframe(query)
 
         # Convert pandas DataFrame to Spark DataFrame
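
Putting the pieces together, a minimal sketch of the new code path end to end (assumes a local Spark session, an org aliased `myorg`, an `Account_std__dll` DLO, and that `schema` may be omitted when calling `read_dlo`):

```python
# Sketch: reading a DLO through the SF CLI delegation path.
from pyspark.sql import SparkSession

from datacustomcode.io.reader.query_api import QueryAPIDataCloudReader

spark = SparkSession.builder.getOrCreate()

# With sf_cli_org set, __init__ builds an SFCLIDataCloudReader and leaves
# self._conn as None; read_dlo/read_dmo then delegate to it.
reader = QueryAPIDataCloudReader(spark, sf_cli_org="myorg")
df = reader.read_dlo("Account_std__dll", row_limit=100)
df.show()
```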
