Merged
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,45 @@
# Changelog

## 1.0.1

### Added

- **`llm_gateway_generate_text()` UDF wrapper for AI-powered DataFrame transformations.**

New method on proxy providers to generate AI completions in DataFrame operations via the `llm_gateway_generate` UDF.
Contributor
In this and the two other mentions of the UDF, can we remove the `llm_gateway_generate` name of the built-in UDF? I think it makes sense to document that the implementation is a built-in UDF, so that users can understand its usage. But the name might change over time, and it is opaque/hidden from the user of the SDK. My proposal would be "via a built-in UDF" for the end of this sentence, and similar for the other mentions of the UDF name.


```python
from datacustomcode import Client
from pyspark.sql.functions import col

client = Client()

# Generate summaries in a DataFrame column
df = df.withColumn(
    "summary",
    client._proxy.llm_gateway_generate_text(
        "Summarize {company}: revenue={revenue}, CEO={ceo}",
        {
            "company": col("company"),
            "revenue": col("revenue"),
            "ceo": col("ceo")
        },
        llmModelId="sfdc_ai__DefaultGPT4Omni",
        maxTokens=200
    )
)
```

**Local Development:** Returns a placeholder string (no model call is made)
**BYOC Production:** Calls the real `llm_gateway_generate` UDF

**Parameters:**
- `template` (str): Prompt template using `{placeholder}` syntax
- `values` (dict or Column): Dict mapping placeholder names to Columns, or a pre-built `named_struct` Column
- `llmModelId` (str): Model identifier (required, e.g., `"sfdc_ai__DefaultGPT4Omni"`)
- `maxTokens` (int): Maximum response length (required, e.g., `200`)
Contributor

I don't think this is correct: my understanding is that `maxTokens` provides a ceiling on how many LLM tokens can be used, rather than a maximum response text length. Might want to double-check the LLM Gateway docs.
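Since the `{placeholder}` template syntax above is plain Python formatting, it can be sanity-checked before calling the UDF. A hypothetical helper (an assumption, not part of the SDK) that reports placeholders missing from the `values` dict, using only the standard library:

```python
from string import Formatter

# Hypothetical helper (not part of the SDK): report template placeholders
# that have no matching key in the values dict.
def missing_placeholders(template: str, values: dict) -> set:
    needed = {field for _, field, _, _ in Formatter().parse(template) if field}
    return needed - values.keys()

print(missing_placeholders(
    "Summarize {company}: revenue={revenue}, CEO={ceo}",
    {"company": None, "revenue": None},
))
# → {'ceo'}
```

`Formatter.parse` yields one tuple per template segment; the second element is the placeholder name (or `None` for literal text).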



## 1.0.0

### Breaking Changes
30 changes: 27 additions & 3 deletions README.md
@@ -155,7 +155,7 @@ You should only need the following methods:
* `write_to_dmo(name, spark_dataframe, write_mode)` – Write to a Data Lake Object by name with a Spark dataframe

For example:
```python
from datacustomcode import Client

client = Client()
@@ -166,10 +166,34 @@
sdf = client.read_dlo('my_DLO')
client.write_to_dlo('output_DLO', sdf, "overwrite")  # write_mode shown with an example value
```

### LLM Gateway

> [!WARNING]
> Currently we only support reading from DMOs and writing to DMOs or reading from DLOs and writing to DLOs, but they cannot mix.
Contributor

@markdlv-sf Apr 6, 2026
Why remove this warning? I think it should stay in the section above this one.

Generate AI completions in DataFrame transformations using the LLM gateway UDF.

```python
from datacustomcode import Client
from pyspark.sql.functions import col

client = Client()

# Use template with placeholders
df = df.withColumn(
    "summary",
    client._proxy.llm_gateway_generate_text(
        "Summarize {company}: revenue={revenue}, CEO={ceo}",
        {
            "company": col("company"),
            "revenue": col("revenue"),
            "ceo": col("ceo")
        },
        llmModelId="sfdc_ai__DefaultGPT4Omni",
        maxTokens=200
    )
)
```

> [!WARNING]
> This method returns a placeholder string in local development and won't execute. It only works when deployed, where it calls the real LLM Gateway service via the `llm_gateway_generate` UDF.
Contributor
Nit: could we remove "and won't execute" from this sentence? It almost sounds like it won't work locally at all, but really it (should) return a placeholder response string successfully.
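The local behavior under discussion can be reproduced without a Spark session; this mirrors what the `LocalProxyClientProvider` stub in this PR returns (no model is called locally):

```python
# Mirrors the local stub in this PR: no model call is made in local
# development; a fixed descriptive string comes back instead.
def llm_gateway_generate_text(template, values, llmModelId: str, maxTokens: int) -> str:
    return f"Using Generate Text with {llmModelId} and maxTokens: {maxTokens}"

print(llm_gateway_generate_text(
    "Summarize {company}", {}, llmModelId="sfdc_ai__DefaultGPT4Omni", maxTokens=200
))
# → Using Generate Text with sfdc_ai__DefaultGPT4Omni and maxTokens: 200
```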


## CLI

3 changes: 3 additions & 0 deletions src/datacustomcode/proxy/client/LocalProxyClientProvider.py
@@ -27,3 +27,6 @@ def __init__(self, **kwargs: object) -> None:

    def call_llm_gateway(self, llmModelId: str, prompt: str, maxTokens: int) -> str:
        return f"Hello, thanks for using {llmModelId}. So many tokens: {maxTokens}"

    def llm_gateway_generate_text(self, template: str, values, llmModelId: str, maxTokens: int) -> str:
        return f"Using Generate Text with {llmModelId} and maxTokens: {maxTokens}"
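A possible refinement (an assumption, not in this PR): have the local stub substitute literal values into the template via `str.format`, so local runs produce output that is easier to eyeball than a fixed string:

```python
# Hypothetical alternative local stub (not part of this PR): fill the
# template with literal values instead of returning a fixed string.
def llm_gateway_generate_text(template: str, values: dict, llmModelId: str, maxTokens: int) -> str:
    # Locally, values maps placeholder names to plain strings here; in
    # production the real UDF receives Spark Columns instead.
    return template.format(**values)

print(llm_gateway_generate_text(
    "Summarize {company}: revenue={revenue}",
    {"company": "Acme", "revenue": "10M"},
    llmModelId="sfdc_ai__DefaultGPT4Omni",
    maxTokens=200,
))
# → Summarize Acme: revenue=10M
```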
3 changes: 3 additions & 0 deletions src/datacustomcode/proxy/client/base.py
@@ -25,3 +25,6 @@ def __init__(self):

    @abstractmethod
    def call_llm_gateway(self, llmModelId: str, prompt: str, maxTokens: int) -> str: ...

    @abstractmethod
    def llm_gateway_generate_text(self, template: str, values, llmModelId: str, maxTokens: int) -> str: ...