add llm_gateway_generate_text #83
Changes from 1 commit
@@ -1,5 +1,45 @@
# Changelog

## 1.0.1

### Added

- **`llm_gateway_generate_text()` UDF wrapper for AI-powered DataFrame transformations.**

New method on proxy providers to generate AI completions in DataFrame operations via the `llm_gateway_generate` UDF.

```python
from datacustomcode import Client
from pyspark.sql.functions import col

client = Client()

# Any Spark DataFrame with company, revenue and ceo columns
df = client.read_dlo("my_DLO")

# Generate summaries in a DataFrame column
df = df.withColumn(
    "summary",
    client._proxy.llm_gateway_generate_text(
        "Summarize {company}: revenue={revenue}, CEO={ceo}",
        {
            "company": col("company"),
            "revenue": col("revenue"),
            "ceo": col("ceo")
        },
        llmModelId="sfdc_ai__DefaultGPT4Omni",
        maxTokens=200
    )
)
```

**Local Development:** Returns a placeholder string (doesn't execute)
**BYOC Production:** Calls the real `llm_gateway_generate` UDF

**Parameters:**
- `template` (str): Prompt template with `{placeholder}` syntax
- `values` (dict or Column): Dict mapping placeholders to Columns, or a pre-built `named_struct`
- `llmModelId` (str): Model identifier (required, e.g., `"sfdc_ai__DefaultGPT4Omni"`)
- `maxTokens` (int): Maximum number of tokens in the response (required, e.g., 200)

Contributor
I don't think this is correct; my understanding would be that
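Since every `{placeholder}` in `template` needs a matching key in `values`, a small pre-flight check can catch typos before a job is deployed. The sketch below is a hypothetical helper, not part of the `datacustomcode` SDK; it assumes the gateway uses the same `{name}` field syntax that Python's `str.format` understands, and it relies only on the standard library's `string.Formatter().parse`.

```python
from string import Formatter
from typing import Iterable, Set


def template_fields(template: str) -> Set[str]:
    """Return the placeholder names used in a {placeholder}-style template."""
    fields: Set[str] = set()
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        # field_name is None for plain text with no placeholder after it,
        # and "" for a positional "{}" field, which has no name to match.
        if field_name:
            fields.add(field_name)
    return fields


def check_template(template: str, keys: Iterable[str]) -> None:
    """Hypothetical pre-flight check that placeholders and keys line up."""
    fields = template_fields(template)
    provided = set(keys)
    missing = fields - provided  # placeholders without a value
    unused = provided - fields  # values that no placeholder refers to
    if missing or unused:
        raise ValueError(
            f"missing values for {sorted(missing)}; unused keys {sorted(unused)}"
        )


# Passes silently: every placeholder has exactly one matching key.
check_template(
    "Summarize {company}: revenue={revenue}, CEO={ceo}",
    ["company", "revenue", "ceo"],
)
```

The dict of Columns can be passed as `keys` directly, because iterating a dict yields its keys. The check also rejects unused keys; if extra entries in `values` turn out to be harmless for the UDF, that branch can be dropped.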

## 1.0.0

### Breaking Changes

@@ -155,7 +155,7 @@ You should only need the following methods:
* `write_to_dmo(name, spark_dataframe, write_mode)` – Write to a Data Model Object by name with a Spark dataframe

For example:
```
```python
from datacustomcode import Client

client = Client()
@@ -166,10 +166,34 @@ sdf = client.read_dlo('my_DLO')
client.write_to_dlo('output_DLO')
```

### LLM Gateway

> [!WARNING]
> Currently we only support reading from DMOs and writing to DMOs, or reading from DLOs and writing to DLOs, but not mixing the two.

Contributor
Why remove this warning? I think it should stay in the section above this one.

Generate AI completions in DataFrame transformations using the LLM Gateway UDF.

```python
from datacustomcode import Client
from pyspark.sql.functions import col

client = Client()

# Any Spark DataFrame with company, revenue and ceo columns
df = client.read_dlo("my_DLO")

# Use template with placeholders
df = df.withColumn(
    "summary",
    client._proxy.llm_gateway_generate_text(
        "Summarize {company}: revenue={revenue}, CEO={ceo}",
        {
            "company": col("company"),
            "revenue": col("revenue"),
            "ceo": col("ceo")
        },
        llmModelId="sfdc_ai__DefaultGPT4Omni",
        maxTokens=200
    )
)
```
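The example above passes `values` as a dict of Columns; the changelog entry for this PR says a pre-built `named_struct` is accepted as well. One way to build such a struct is Spark SQL's `named_struct` function. The helper below is hypothetical (not part of the SDK) and generates that SQL expression from a list of placeholder names, assuming each placeholder is filled from a column with the same name.

```python
import re
from typing import Sequence

# Plain SQL identifiers only; other names would need extra quoting rules.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def named_struct_sql(names: Sequence[str]) -> str:
    """Build a Spark SQL named_struct(...) expression from placeholder names.

    Each struct field is labelled with the placeholder name and filled from
    the DataFrame column of the same name.
    """
    if not names:
        raise ValueError("at least one placeholder name is required")
    parts = []
    for name in names:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsupported placeholder name: {name!r}")
        # 'name' is the field label (string literal); `name` is the column.
        parts.append(f"'{name}', `{name}`")
    return f"named_struct({', '.join(parts)})"


print(named_struct_sql(["company", "revenue", "ceo"]))
# named_struct('company', `company`, 'revenue', `revenue`, 'ceo', `ceo`)
```

In a Spark job the string could be turned into a Column with `pyspark.sql.functions.expr(...)` and passed as the second argument in place of the dict. Whether the wrapper expects exactly this field layout is an assumption; the PR only says a pre-built `named_struct` is accepted.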

> [!WARNING]
> This method returns a placeholder string in local development and won't execute. It only works when deployed, where it calls the real LLM Gateway service via the `llm_gateway_generate` UDF.

Contributor
Nit: could we remove "and won't execute" from this sentence? It almost sounds like it won't work locally at all, but really it (should) return a placeholder response string successfully.
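Because the method returns only a placeholder locally, the actual prompts never appear in local output. A hedged way to inspect them before deploying is to render the template against a few sample rows. The helper below is hypothetical and not part of the SDK; it assumes the gateway substitutes `{placeholder}` fields the way Python's `str.format` does, which the PR does not confirm.

```python
from typing import Any, Dict, Iterable, List


def preview_prompts(template: str, rows: Iterable[Dict[str, Any]]) -> List[str]:
    """Render the prompt each row would produce (hypothetical local helper).

    Assumes the gateway fills {placeholder} fields the same way str.format
    does. Extra keys in a row are ignored, since str.format ignores unused
    keyword arguments; a missing key raises KeyError.
    """
    return [template.format(**row) for row in rows]


# Hypothetical sample rows standing in for collected DataFrame rows.
sample_rows = [
    {"company": "Example Co", "revenue": 1200000, "ceo": "A. Smith"},
    {"company": "Sample Ltd", "revenue": 850000, "ceo": "B. Jones", "region": "EU"},
]

for prompt in preview_prompts(
    "Summarize {company}: revenue={revenue}, CEO={ceo}", sample_rows
):
    print(prompt)
```

Against a real DataFrame, the rows could come from `[row.asDict() for row in df.select("company", "revenue", "ceo").limit(5).collect()]`, so a misspelled column surfaces as a `KeyError` before anything is deployed.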

## CLI

In this and the two other mentions of the UDF, can we remove the `llm_gateway_generate` name of the built-in UDF? I think it makes sense to document the implementation of this being a built-in UDF, so that users can understand its usage. But the name of it might change over time, and is opaque/hidden from the user of the SDK. My proposal would be "via a built-in UDF" for the end of this sentence, and similar for other mentions of the UDF name?