Hi Evidently team,
We are currently using Evidently for ML observability, specifically the Data Drift reports. For numerical columns, the data drift visualizations (e.g., chi-square–based plots / line graphs) use a binned representation, where the x-axis corresponds to bin indices created internally by Evidently.
At the moment, it appears that the binning strategy and bin size are fixed or automatically determined, and the x-axis shows these bin indices rather than user-controlled bins.
Questions:
- Is there any existing parameter or configuration that allows users to:
  - Customize the number of bins or bin width, or
  - Control how the bin indices on the x-axis are created for data drift plots?
- Additionally, we’ve observed that when the dataset size is small, each bin sometimes ends up containing only a single data point. In such cases, the drift plot appears to reflect the raw data values rather than an aggregated statistic (e.g., mean/median) over a batch of data points.
  - Is this the expected behavior?
  - Is there a way to enforce a minimum number of points per bin or control the aggregation used within each bin?
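To make the second question concrete, here is a minimal sketch of the behavior we have in mind. It is plain Python and does not use Evidently's API. The helper name `aggregate_in_batches` and its parameters `min_points_per_bin` and `agg` are hypothetical and only illustrate the idea: consecutive values are grouped into bins holding at least a minimum number of points, and each bin is reduced to one statistic.

```python
from statistics import mean, median


def aggregate_in_batches(values, min_points_per_bin=5, agg=mean):
    """Group consecutive values into bins and reduce each bin to one number.

    Hypothetical helper for illustration only; not part of Evidently.
    Every bin holds at least `min_points_per_bin` points, unless the whole
    input is shorter than that, in which case a single bin holds everything.
    """
    if min_points_per_bin < 1:
        raise ValueError("min_points_per_bin must be >= 1")

    values = list(values)
    # Number of bins we can fill; a short tail is merged into the last bin.
    n_bins = max(1, len(values) // min_points_per_bin)

    bins = []
    for i in range(n_bins):
        start = i * min_points_per_bin
        # The last bin absorbs any leftover points.
        end = start + min_points_per_bin if i < n_bins - 1 else len(values)
        bins.append(values[start:end])

    # Skip the empty bin that appears only when the input itself is empty.
    return [agg(b) for b in bins if b]


if __name__ == "__main__":
    sample = list(range(12))
    print(aggregate_in_batches(sample, min_points_per_bin=5, agg=mean))
    print(aggregate_in_batches(sample, min_points_per_bin=5, agg=median))
```

With something like this, the x-axis of the plot would be the batch index and the y-axis the chosen per-batch statistic, so a small dataset would yield fewer, fuller bins rather than one point per bin.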
If these options are not currently supported, please let us know if there is a recommended workaround or if this is something planned for future releases.
Thanks in advance, and we appreciate all the work on Evidently!