Skip to content

Latest commit

 

History

History
344 lines (221 loc) · 9.16 KB

File metadata and controls

344 lines (221 loc) · 9.16 KB

🎯 Subarrays With Exactly K Distinct Integers #992

This code solves the classic “Subarrays with Exactly K Distinct Integers” problem (LeetCode 992). You’re using the very powerful pattern:

🧠 exactly(K) = atMost(K) - atMost(K - 1)

Let’s turn this into clean, structured notes you can drop straight into your DSA repo: problem, intuition, brute force, optimal approach, full code, tests, complexity, variations, FAQs… with a bit of personality sprinkled in 😄


🧾 Problem Statement

Given an integer array nums and an integer k, return the number of contiguous subarrays that contain exactly k distinct integers.

Example

nums = [1, 2, 1, 2, 3], k = 2

Valid subarrays with exactly 2 distinct:
[1, 2]
[1, 2, 1]
[2, 1, 2]
[1, 2]
Total = 4

Return 4.


🧠 High-Level Intuition

Counting exactly k distinct elements is tricky directly… but counting at most k is much easier with sliding window.

Key idea:

  • Let:

    • F(k) = number of subarrays with at most k distinct integers.
  • Then:

    • Subarrays with exactly k distinct = F(k) - F(k-1).

Because:

  • Any subarray with exactly k distinct is counted in F(k),
  • But not counted in F(k-1) (since it has too many distinct elements),
  • So subtracting removes all subarrays with fewer than k distinct, leaving only those with exactly k.

So the whole problem reduces to:

  1. Efficiently compute atMostK(nums, k)
  2. Efficiently compute atMostK(nums, k-1)
  3. Answer = difference.

🐢 Brute Force Approach

Idea

Check every subarray nums[i..j]:

  1. For each i from 0 to n-1:

    • Keep a frequency map from i to each j ≥ i.
    • Count distinct elements in nums[i..j].
    • If distinct == k, increment answer.

Complexity

  • There are O(n²) subarrays.
  • Maintaining distinct count per (i,j) naive → O(n) each.
  • Total: O(n³) naive, or O(n²) with careful reuse of counts.

Way too slow for large n (like 10⁵). Time to bring in the sliding window superheroes. 🦸


⚡ Optimal Approach — Sliding Window for “At Most K Distinct”

Core concept

We’ll create a helper:

int atMostK(vector<int>& nums, int K)

that returns the number of subarrays with at most K distinct integers.

We use a sliding window [left..right] with:

  • unordered_map<int, int> freq to count occurrences of each integer
  • K (we treat it as “allowed distinct remaining” and decrement/increment as needed)
  • For each right, we extend window and if we exceed K distinct, we shrink from the left until valid again.

For each position right, once the window [left..right] has at most K distinct numbers, the number of valid subarrays ending at right is:

right - left + 1

Why? Because all subarrays:

  • [left, right]
  • [left+1, right]
  • [left+2, right]
  • [right, right]

have at most K distinct numbers (since they are smaller windows inside a valid window).

We add these to a running count.


🧩 Walking Through atMostK

Here’s your helper function, annotated:

int atMostK(vector<int>& nums, int K) {
    unordered_map<int, int> freq;
    int left = 0, count = 0;

    for (int right = 0; right < nums.size(); right++) {
        // When we see a new distinct number
        if (freq[nums[right]] == 0) {
            K--;  // we now use one of our allowed distinct slots
        }
        freq[nums[right]]++;

        // If we've used more than K distinct numbers, shrink from the left
        while (K < 0) {
            freq[nums[left]]--;
            if (freq[nums[left]] == 0) {
                K++;  // we freed up a distinct number
            }
            left++;
        }

        // All subarrays ending at 'right' with starting index in [left..right] are valid
        count += (right - left + 1);
    }

    return count;
}

Important subtle point

K is passed by value into atMostK (copied), so modifications inside atMostK don’t affect the original k in subarraysWithKDistinct. This is crucial and totally safe.

Within atMostK, we interpret K as “how many more distinct integers we’re allowed to introduce into the current window.”


🧮 Putting It Together: exactly K = atMost(k) - atMost(k-1)

Your main function:

int subarraysWithKDistinct(vector<int>& nums, int k) {
    return atMostK(nums, k) - atMostK(nums, k - 1);
}

This uses the identity:

# exactly k = # at most k# at most (k − 1)

Done. Clean. Beautifully mathy. 🧡


✅ Polished Full Code (with tiny improvements)

#include <bits/stdc++.h>
using namespace std;

class Solution {
public:
    long long atMostK(const vector<int>& nums, int K) {
        unordered_map<int, int> freq;
        int left = 0;
        long long count = 0;

        for (int right = 0; right < (int)nums.size(); ++right) {
            if (freq[nums[right]] == 0) {
                K--; // we are using one more distinct integer
            }
            freq[nums[right]]++;

            // If we have more than K distinct, shrink window from left
            while (K < 0) {
                freq[nums[left]]--;
                if (freq[nums[left]] == 0) {
                    K++; // freed one distinct type
                }
                left++;
            }

            // Number of subarrays ending at right with at most K distinct
            count += (right - left + 1);
        }

        return count;
    }

    int subarraysWithKDistinct(vector<int>& nums, int k) {
        long long ans = atMostK(nums, k) - atMostK(nums, k - 1);
        return (int)ans;
    }
};

🔍 Example Walkthrough

Let’s walk the classic example:

nums = [1, 2, 1, 2, 3], k = 2
  1. Compute atMostK(nums, 2):

    • We count all subarrays with at most 2 distinct integers.
  2. Compute atMostK(nums, 1):

    • We count all subarrays with at most 1 distinct integer.
  3. Subtract them:

    • Those with exactly 2 distinct = atMost(2) − atMost(1).

I won’t expand every step here (would be long), but if you want, we can do a fully tabulated dry run later.


⏱ Complexity Analysis

  • Time:

    • atMostK is O(n) (each element is visited at most twice: once by right, once by left).
    • We call it twice → O(2n) → O(n) overall.
  • Space:

    • unordered_map<int,int> stores at most k or number of distinct elements in nums, so O(min(n, distinct(nums))).
    • Overall: O(n) in the worst case, but often much less.

🧪 Test Cases

1. Basic

nums = {1,2,1,2,3}, k = 2 -> 4
nums = {1,2,1,3,4}, k = 3 -> ?  // try it by hand

2. Edge Cases

nums = {1,1,1,1}, k = 1 -> 10 (all subarrays)
nums = {1,2,3},   k = 1 -> 3 (each single element)
nums = {1,2,3},   k = 3 -> 1 (whole array)
nums = {},        k = 1 -> 0

3. More Interesting

nums = {1,2,1,3,4}, k = 2
// Subarrays with exactly 2 distinct: [1,2], [1,2,1], [2,1], [1,3], [3,4] etc.

⚠️ Common Pitfalls

  • Forgetting the K < 0 condition in the inner while loop (must be strictly < 0, not <=).
  • Confusing exactly K with at most K — remember to subtract atMost(k-1).
  • Not handling large counts (use long long in heavy test scenarios).
  • Not realizing K is passed by value, and being worried that it’s being “mutated globally” — it’s local to each atMostK call.

🔁 Variations & Related Problems

This pattern is super reusable — MCU-level reusable 😂

  • Strings:

    • “Substrings with Exactly K Distinct Characters” — same logic but with char instead of int.
  • Arrays of other types:

    • Just change unordered_map<int,int> to unordered_map<whatever,int>.
  • At most K distinct but no “exactly”:

    • Directly return atMostK(nums, k).
  • Longest subarray with at most K distinct:

    • Similar sliding window, but tracking max length instead of count.

💡 Tips for Interviews

  • Explicitly state the identity:

    “I’ll compute the number of subarrays with at most K distinct, call it F(K). Then answer = F(K) − F(K−1).”

  • Explain the sliding window invariant:

    “At every step, the window [left..right] contains at most K distinct integers.”

  • Mention amortized O(n) time:

    “Both pointers move forward only and never move back.”

  • If asked about optimization:

    • Point out unordered_map is average O(1); could use map but that’s O(log n)`.

❓ FAQs

Q: Why treat K as mutable inside atMostK? A: It’s a neat trick: instead of maintaining a separate distinctCount, we let K represent how many more distinct integers we’re allowed to include. When we add a new distinct, we do K--; when we remove one completely, we do K++.

Q: Could I implement atMostK by storing a distinctCount variable instead? A: Absolutely. You’d keep distinctCount, increment it when a new key is introduced, decrement when a key’s freq hits 0, and check while (distinctCount > K).

Q: Does this work if nums has negative numbers or large values? A: Yes — values don’t matter for distinct count. unordered_map<int,int> handles any int.