Self-serve CSV user data export has poor query performance #5954

@bjester

Description

This issue is not open for contribution. Visit Contributing guidelines to learn about the contributing process and how to find suitable issues.

Overview

The generateusercsv_task, which can be triggered from a user's settings page, issues queries that perform poorly, particularly for users with many channels.

Complexity: Medium
Target branch: unstable

Context

A single query issued by the task ran for nearly 3 hours before it was killed. The task had been queued for a user with large upload usage (20 GB) and many channels.

The Change

Trace generateusercsv_task to each query it produces, and optimize those queries as needed using techniques already known to work for query optimization:

  • aligning filters with indices
  • using CTEs
  • avoiding complex joins
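As a minimal illustration of the first technique, the sketch below uses Python's built-in sqlite3 module as a stand-in for Studio's PostgreSQL database, with a hypothetical file table and a hypothetical index name (file_uploaded_by_idx). It compares the query plan for the same filter before and after adding an index on the filtered column. On PostgreSQL the equivalent check would use EXPLAIN, and the plan text looks different.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Return the detail text of each row of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the fourth is the human-readable plan step.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for a file table; this is not Studio's real schema.
conn.execute(
    "CREATE TABLE file (id INTEGER PRIMARY KEY, uploaded_by_id INTEGER,"
    " checksum TEXT, file_size INTEGER)"
)
conn.executemany(
    "INSERT INTO file (uploaded_by_id, checksum, file_size) VALUES (?, ?, ?)",
    [(i % 50, f"checksum{i}", i) for i in range(1000)],
)

# The filter the query applies; the goal is for an index to match it.
sql = "SELECT checksum, file_size FROM file WHERE uploaded_by_id = ?"

before = query_plan(conn, sql, (7,))  # no matching index: full table scan
conn.execute("CREATE INDEX file_uploaded_by_idx ON file (uploaded_by_id)")
after = query_plan(conn, sql, (7,))   # index now aligned with the filter

print("before:", before)
print("after: ", after)
```

The exact wording of SQLite's plan output varies between versions, so it is safest to look only for a full scan before and a search using the named index after.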

Special attention should be given to file-related queries.
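For file-related queries, one way to apply the CTE technique is to narrow to the user's channels first and only then collect files from those channels. The sketch below assumes a heavily simplified, hypothetical schema (channel_editors, contentnode, file) rather than Studio's real models, and again uses sqlite3 in place of PostgreSQL. It also assumes identical files share a checksum, so files are deduplicated on it.

```python
import sqlite3

# Hypothetical, heavily simplified schema; Studio's real models differ.
SCHEMA = """
CREATE TABLE channel_editors (channel_id TEXT, user_id INTEGER);
CREATE TABLE contentnode (id INTEGER PRIMARY KEY, channel_id TEXT);
CREATE TABLE file (id INTEGER PRIMARY KEY, contentnode_id INTEGER, checksum TEXT, file_size INTEGER);
CREATE INDEX channel_editors_user_idx ON channel_editors (user_id);
CREATE INDEX contentnode_channel_idx ON contentnode (channel_id);
CREATE INDEX file_contentnode_idx ON file (contentnode_id);
"""

# Narrow to the user's channels first, then collect distinct files from only those channels.
USER_FILES_SQL = """
WITH user_channels AS (
    SELECT channel_id FROM channel_editors WHERE user_id = ?
),
user_files AS (
    SELECT DISTINCT f.checksum, f.file_size
    FROM file AS f
    JOIN contentnode AS n ON n.id = f.contentnode_id
    WHERE n.channel_id IN (SELECT channel_id FROM user_channels)
)
SELECT COUNT(*), COALESCE(SUM(file_size), 0) FROM user_files
"""


def build_fixture() -> sqlite3.Connection:
    """Two users: user 1 edits channels a and b, user 2 edits channel c."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO channel_editors VALUES (?, ?)", [("a", 1), ("b", 1), ("c", 2)])
    conn.executemany("INSERT INTO contentnode VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    conn.executemany(
        "INSERT INTO file (contentnode_id, checksum, file_size) VALUES (?, ?, ?)",
        # Checksum "x" appears in two channels but should be counted once.
        [(1, "x", 100), (2, "x", 100), (2, "y", 50), (3, "z", 999)],
    )
    return conn


conn = build_fixture()
print(conn.execute(USER_FILES_SQL, (1,)).fetchone())  # (2, 150)
```

PostgreSQL 12 and later may inline a CTE into the outer query unless it is declared MATERIALIZED, so whether the CTE actually changes the plan should be confirmed with EXPLAIN rather than assumed.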

How to Get There

The task can be triggered from the /en/settings/#/account page using the EXPORT DATA button.

Out of Scope

Any queries not related to generateusercsv_task

Acceptance Criteria

  • For each optimized query, before-and-after SQL dumps and EXPLAIN analysis are ideal for communicating the improvement.
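One way to produce those dumps is a small helper that records every statement a piece of code executes and then prints each statement with its plan. This sketch again uses sqlite3: Connection.set_trace_callback records each executed statement, and each captured statement is then run through EXPLAIN QUERY PLAN. export_user_data is a hypothetical stand-in for the task's query work, and its queries use literal values so the captured text can be explained directly. Within Django itself, django.test.utils.CaptureQueriesContext and QuerySet.explain() serve similar purposes against PostgreSQL.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def capture_sql(conn: sqlite3.Connection):
    """Collect the text of every SQL statement executed on conn inside the block."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        yield statements
    finally:
        conn.set_trace_callback(None)  # stop tracing so later EXPLAINs are not captured


def dump_with_plans(conn: sqlite3.Connection, statements: list) -> str:
    """Render each captured statement followed by its EXPLAIN QUERY PLAN steps."""
    lines = []
    for sql in statements:
        lines.append(sql.strip())
        # Assumes literal values; statements with ? placeholders would need their parameters.
        for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
            lines.append("    -> " + row[3])
    return "\n".join(lines)


def export_user_data(conn: sqlite3.Connection) -> None:
    """Hypothetical stand-in for the queries generateusercsv_task issues."""
    conn.execute("SELECT COUNT(*) FROM file WHERE uploaded_by_id = 7").fetchone()
    conn.execute("SELECT SUM(file_size) FROM file").fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (id INTEGER PRIMARY KEY, uploaded_by_id INTEGER, file_size INTEGER)")
conn.execute("CREATE INDEX file_uploaded_by_idx ON file (uploaded_by_id)")

with capture_sql(conn) as statements:
    export_user_data(conn)

dump = dump_with_plans(conn, statements)
print(dump)
```

Running the helper once against the original code and once after a change yields a pair of dumps that can be compared side by side.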

Testing

Ideally, tests should exist (and be written if they do not) before any changes are made, to ensure the changes do not break the task's functionality.
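A characterization test along these lines can pin down behavior before a query is rewritten: it runs the original and the rewritten query against the same fixture and asserts that they return the same rows. The schema, both queries, and the fixture are hypothetical, and sqlite3 stands in for the PostgreSQL test database. The module can be run with python -m unittest.

```python
import sqlite3
import unittest

# Hypothetical, simplified schema used only to illustrate the testing pattern.
SCHEMA = """
CREATE TABLE channel_editors (channel_id TEXT, user_id INTEGER);
CREATE TABLE contentnode (id INTEGER PRIMARY KEY, channel_id TEXT);
CREATE TABLE file (id INTEGER PRIMARY KEY, contentnode_id INTEGER, checksum TEXT, file_size INTEGER);
"""

# "Before": a single query joining through the editors table.
OLD_SQL = """
SELECT DISTINCT f.checksum, f.file_size
FROM file AS f
JOIN contentnode AS n ON n.id = f.contentnode_id
JOIN channel_editors AS e ON e.channel_id = n.channel_id
WHERE e.user_id = ?
"""

# "After": narrow to the user's channels in a CTE first.
NEW_SQL = """
WITH user_channels AS (
    SELECT channel_id FROM channel_editors WHERE user_id = ?
)
SELECT DISTINCT f.checksum, f.file_size
FROM file AS f
JOIN contentnode AS n ON n.id = f.contentnode_id
WHERE n.channel_id IN (SELECT channel_id FROM user_channels)
"""


class RewriteEquivalenceTest(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA)
        self.conn.executemany("INSERT INTO channel_editors VALUES (?, ?)", [("a", 1), ("b", 1), ("c", 2)])
        self.conn.executemany("INSERT INTO contentnode VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
        self.conn.executemany(
            "INSERT INTO file (contentnode_id, checksum, file_size) VALUES (?, ?, ?)",
            [(1, "x", 100), (2, "x", 100), (2, "y", 50), (3, "z", 999)],
        )

    def tearDown(self):
        self.conn.close()

    def rows(self, sql, user_id):
        # Sort so the comparison does not depend on row order, which neither query guarantees.
        return sorted(self.conn.execute(sql, (user_id,)).fetchall())

    def test_rewrite_returns_same_rows(self):
        for user_id in (1, 2, 3):  # user 3 edits no channels
            with self.subTest(user_id=user_id):
                self.assertEqual(self.rows(OLD_SQL, user_id), self.rows(NEW_SQL, user_id))

    def test_known_result_for_user_with_shared_files(self):
        self.assertEqual(self.rows(NEW_SQL, 1), [("x", 100), ("y", 50)])
```

Writing the equivalence test against the original query first, and only then swapping in the rewrite, keeps the test honest about what the task produced before the change.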

References

https://learningequality.slack.com/archives/C0WHZ9FPX/p1780509080212179
