Skip to content

Latest commit

 

History

History
194 lines (122 loc) · 7.75 KB

File metadata and controls

194 lines (122 loc) · 7.75 KB

Subsets II (subsets without duplicates)🌲✨

Problem statement 📝

Input: array nums (may contain duplicates). Output: list of all unique subsets (each subset is a list of numbers). Order of subsets doesn’t matter. Each subset may be returned in any order, but there must be no duplicate subsets in the result.

Example:

nums = [1,2,2]
Output (any order): [ [], [1], [2], [1,2], [2,2], [1,2,2] ]

Intuition ✨

If nums had no duplicates, generating all subsets is straightforward: at index i either include nums[i] or exclude it (2 choices per element) → 2^n subsets.

With duplicates, the same subset can be produced in multiple ways if we treat identical values as distinct by position. To avoid duplicates we:

  1. Sort nums first — equal numbers are adjacent.
  2. During recursion at a given level (same ind), skip any nums[i] that equals nums[i-1] (and i > ind). That prevents choosing the second occurrence as the first element at this recursion level — thus avoiding duplicate subsets.

This produces subsets in non-decreasing order (because we move left-to-right through the sorted array) and ensures uniqueness.


Brute-force (why naive is bad) 🐢

  • Enumerate all 2^n subsets (e.g., using bitmasks), then use a set of vectors (or serialize subsets) to remove duplicates.
  • Costly: generating 2^n for n up to 20–30 is okay, but deduping via hashing/sets has added overhead. Sorting each subset for canonical form and hashing is extra work. Backtracking with skip is simpler and more efficient.

Optimal approach — Backtracking with duplicate-skip ✅

Key rules:

  • Sort nums.

  • Recurse with an index ind, and maintain current subset ds.

  • At a recursion level, loop i from ind to n-1:

    • If i > ind && nums[i] == nums[i-1], continue; (skip duplicate choices at same level).
    • Include nums[i]ds.push_back(nums[i]), recurse with i+1, then backtrack (pop_back()).
  • Also include the path where you don't pick any more elements — typically by pushing ds when you reach or consider each level. Two common patterns:

    • Push ds at every recursive invocation (gives subsets as you go).
    • Or push only at base case ind == n. Both produce all subsets; pushing at each node gives an iterative-like order.

Either way, the duplicate-skip rule prevents duplicate subsets.

Full idiomatic C++ solution (backtracking)

#include <bits/stdc++.h>
using namespace std;

class Solution {
public:
    vector<vector<int>> subsetsWithDup(vector<int>& nums) {
        vector<vector<int>> res;
        vector<int> ds;
        sort(nums.begin(), nums.end());
        backtrack(0, nums, ds, res);
        return res;
    }

private:
    void backtrack(int ind, const vector<int>& nums, vector<int>& ds, vector<vector<int>>& res) {
        // record current subset (we record at every node so we include partial subsets)
        res.push_back(ds);

        for (int i = ind; i < (int)nums.size(); ++i) {
            // skip duplicates at the same recursion level
            if (i > ind && nums[i] == nums[i - 1]) continue;

            ds.push_back(nums[i]);
            backtrack(i + 1, nums, ds, res);  // move to next index; each element used at most once
            ds.pop_back();
        }
    }
};

Notes:

  • We push_back(ds) at the start of backtrack, so the empty subset is included and every extension is recorded in DFS order.
  • i > ind && nums[i] == nums[i-1] ensures duplicates at the same recursion depth are skipped.
  • Passing nums by const& avoids copies.

Example walkthrough

nums = [1,2,2] → after sorting [1,2,2]:

  • Start ind=0, ds=[] → record [].

  • i=0: pick 1ds=[1] → record [1].

    • recurse ind=1:

      • i=1: pick 2ds=[1,2] → record [1,2].

        • recurse ind=2: i=2: pick 2ds=[1,2,2] → record [1,2,2].
        • backtrack (pop last 2)
      • i=2: i>ind and nums[2] == nums[1] → skip (prevents [1,2] duplication at this level)

  • backtrack, i=1 at top-level: pick 2ds=[2] → record [2].

    • recurse ind=2: i=2: pick 2ds=[2,2] → record [2,2].
  • i=2 at top-level: i>ind && nums[2] == nums[1] → skip.

Result: [[], [1], [1,2], [1,2,2], [2], [2,2]] — unique subsets only.


Complexity analysis 📊

  • Time complexity: O(2^n * n) in worst case. There are up to 2^n subsets (n = nums.size()), and copying/pushing a subset into results takes up to O(n) time. Duplicate-skip prunes many branches when duplicates exist.

  • Space complexity:

    • Recursion depth O(n) (stack + current subset ds).
    • Output size can be O(2^n * n) in worst case (dominates memory usage).
    • Additional overhead: sorting is O(n log n).

Test cases 🧪

  1. Simple with duplicates

    • Input: [1,2,2] Output (order may vary): [[], [1], [2], [1,2], [2,2], [1,2,2]]
  2. All elements same

    • Input: [2,2,2] Output: [[], [2], [2,2], [2,2,2]]
  3. No duplicates

    • Input: [1,2,3] Output: all 2^3 = 8 subsets.
  4. Empty input

    • Input: [] → Output: [[]]
  5. Larger mix

    • Input: [1,2,2,3] — verify duplicates handled and all unique subsets produced.

Tips & tricks ✨

  • Sort first. This is the single most important step for handling duplicates easily.
  • Skip duplicates only at the same recursion level using the i > ind && nums[i] == nums[i-1] guard.
  • Record subsets at every recursion call (push current ds), or push only at base-case — both work. Recording at every node yields the empty set first and then progressively larger subsets.
  • Use ds.reserve(nums.size()) if you want modest micro-optimizations.
  • If you want subsets sorted lexicographically, sort nums and then the generation order will be deterministic (DFS order), but you may need to post-process if strict lexicographic order is required.

Variations & related problems 🔁

  • Subsets (no duplicates) — identical approach without duplicate-skip.

  • Subsets II iterative — iterative approach that uses previous size to avoid duplicates:

    • Maintain start index for each unique value group and only extend subsets formed before this value's first appearance.
  • Combination Sum / Permutations / Partition problems — similar backtracking skeletons but different constraints/choices.

  • Generate unique permutations with duplicates — uses similar sort + skip rules but with different loop structure.

FAQs ❓

Q: Why do we skip when i > ind && nums[i] == nums[i-1]?

A: Because choosing nums[i] at this recursion depth would create the same subsets as choosing the previous identical value nums[i-1]. We only want the first occurrence at this level to avoid duplicates.

Q: Can we avoid sorting?

A: You can avoid sorting by using a hash-set to deduplicate final results, but that is usually slower and more memory-heavy. Sorting plus skip is succinct and efficient.

Q: Where to push the current subset into results — start or base case?

A: Both are valid:

  • Push at every call records current partial subset (includes empty set).
  • Push at base case (ind==n) records when you've considered all elements. Either yields a complete power set when combined with recursive exploration. I prefer pushing at every call for clarity.

Q: Does the algorithm handle negative numbers?

A: Yes— duplicates logic is independent of sign. Sorting works for negatives as well. No pruning based on value is used here.

Q: Complexity concerns for large n?

A: The number of subsets grows exponentially; practical n for exhaustive subset generation is typically ≤ 20–25. For larger n, you need problem-specific constraints or different objectives (e.g., count only, find subset with property, etc.).