Skip to content

Latest commit

 

History

History
199 lines (167 loc) · 7.15 KB

File metadata and controls

199 lines (167 loc) · 7.15 KB

Key Insights

Shared Emails vs Names

// Input
account1 = ["Google", "a", "b", "c"]
account2 = ["Google", "c", "d"]
account3 = ["Google", "x", "y"]
account4 = ["OpenAI", "p", "q"]

// Output
["Google", "a", "b", "c", "d"]  // Merge account1 and account2
["Google", "x", "y"]            // account3 is another account with the same name "Google"
["OpenAI", "p", "q"]            // account4 is another account with the same name "OpenAI"
  • Each email uniquely identifies a person, the same email doesn't belong to multiple people.
  • But names are not unique, multiple people can have the same name.

Intuitions

1. "Common Attribute = Link" Trigger

The rule: "Two accounts definitely belong to the same person if there is some common email to both accounts."

Whenever a problem states that two separate entities become "related", "connected", or "the same group" because they share a specific item, you are looking at a graph problem.

2. Transitive Property

  • Account A has email x.
  • Account B has email x. -> A and B are the same person.
  • Account B also has email y.
  • Account C has email y. -> B and C are the same person.

-> Account A and C are the same person, even though they share no email directly.

The indirect relationship (A is connected to C through B) is transitive, it's the definition of connectivity in graph.

3. Goal: "Clustering" (Connected Components)

If the problem asks you to "group" or "merge" all related items into distinct lists, it's asking for connected components in graph:

  • A component is an isolated island of nodes that are all connected to each other.
  • The problem is asking "How many isolated islands exist? And who are in each island?"

Whenever a problem asks you to group items into isolated clusters based on shared relationships, it's a graph problem.

"Aha!" Moment

  • Input: ["Google", "a", "b", "c"]
  • Graph intuition: "Oh, this input tells me that I can connect a to b, and b to c. I just poll on a, and see what other emails are connected to it. (b and c)

Modeling

In graph problem, we need to decide the nodes and edges of the graph. We treat eaach email as a node. And the problem states: "Two accounts definitely belong to the same person if there is some common email to both accounts." This implies a transitive relationship between the emails:

  1. Acount A has a and b.
  2. Then a and b belongs to the same person.
  3. So there's a connection between a and b.

Union: For each account list: ["Google", "a", "b", "c"], we take the first email as the root, and "union" (connect) the other emails to the root.

account1 = ["Google", "a", "b", "c"]
a -- b
a -- c

Same component: a -- b -- c

account2 = ["Google", "c", "d"]
c -- d

Same component: c -- d

Because a -- b -- c and c -- d, so a -- b -- c -- d is also a same component, they shared the same "parent", "root" or "representative" email. We can use union find to find the "root" of each email.

Then attach the name "Google" to the final group of a -- b -- c.

Union Find

Based on the above intuition, we can use union find to find the "root" of each email.

  1. Create a union find data structure.
  2. Iterate through every account:
    • Union: Within each account, connect the first email to the rest of the emails.
    • Find: Find the "root" of each email.

We could use DFS or BFS in this problem, but union find is more preferred because:

  1. It's efficient to handle dynamic connectivity. (merge groups on the fly)
  2. We don't need the path between two emails, only case if they are connected.
class Solution {
    fun accountsMerge(accounts: List<List<String>>): List<List<String>> {
        val uf = UnionFind<String>()
        val emailToName = mutableMapOf<String, String>()
        for (account in accounts) {
            val name = account.first()
            val firstEmail = account[1]
            for (i in 1 until account.size) {
                val email = account[i]
                emailToName[email] = name
                uf.union(firstEmail, email)
            }
        }
        val components = mutableMapOf<String, MutableList<String>>()
        for (email in emailToName.keys) {
            val parent = uf.find(email)
            components.getOrPut(parent) { mutableListOf() }.add(email)
        }
        val results = mutableListOf<MutableList<String>>()
        for ((parent, emails) in components) {
            val name = emailToName[parent]!!
            emails.sort()
            val nameEmails = mutableListOf<String>(name)
            nameEmails.addAll(emails)
            results.add(nameEmails)
        }
        return results
    }
}

class UnionFind<T> {
    private val parentMap = mutableMapOf<T, T>()
    private val sizeMap = mutableMapOf<T, Int>()

    fun find(x: T): T {
        // If the node is completely new, create it as its own root.
        if (x !in parentMap) {
            parentMap[x] = x
            sizeMap[x] = 1
            return x
        }
        // Path compression
        if (parentMap[x] != x) {
            parentMap[x] = find(parentMap[x]!!)
        }
        return parentMap[x]!!
    }

    fun union(x: T, y: T): Boolean {
        val parentX = find(x)
        val parentY = find(y)

        // Already in the same family tree
        if (parentX == parentY) return false

        // Find the size of root from the two nodes, not nodes `x`, `y` itself.
        val sizeX = sizeMap[parentX] ?: 1
        val sizeY = sizeMap[parentY] ?: 1

        // Attach the smaller tree under the larger tree
        if (sizeX < sizeY) {
            parentMap[parentX] = parentY
            sizeMap[parentY] = sizeX + sizeY
        } else {
            parentMap[parentY] = parentX
            sizeMap[parentX] = sizeX + sizeY
        }
        return true
    }
}

DFS

fun accountsMerge(accounts: List<List<String>>): List<List<String>> {
    val emailNameMap = HashMap<String, String>()
    val graph = HashMap<String, MutableList<String>>()
    for (account in accounts) {
        val name = account.first()
        val source = account[1]
        if (source !in graph) graph[source] = mutableListOf<String>()
        emailNameMap[source] = name
        for (i in 2 until account.size) {
            val to = account[i]
            if (to !in graph) graph[to] = mutableListOf<String>()
            graph[source]!!.add(to)
            graph[to]!!.add(source)
            emailNameMap[to] = name
        }
    }

    val visited = HashSet<String>()
    val ans = mutableListOf<MutableList<String>>()
    for (source in graph.keys) {
        val collected = mutableListOf<String>()
        if (source !in visited) {
            dfs(graph, source, visited, collected)
            collected.sort()
            collected.add(0, emailNameMap[source]!!)
            ans.add(collected)
        }
    }
    return ans
}

private fun dfs(graph: HashMap<String, MutableList<String>>, i: String, visited: HashSet<String>, collected: MutableList<String>) {
    if (i in visited) return
    visited.add(i)
    collected.add(i)
    graph[i]?.forEach { adj ->
        dfs(graph, adj, visited, collected)
    }
}