Skip to content

Latest commit

 

History

History
242 lines (166 loc) · 6.99 KB

0839._Similar_String_Groups.md

File metadata and controls

242 lines (166 loc) · 6.99 KB

839. Similar String Groups

难度: Hard

刷题内容

原题连接

内容描述

Two strings X and Y are similar if we can swap two letters (in different positions) of X, so that it equals Y.

For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars", "rats", or "arts".

Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}.  Notice that "tars" and "arts" are in the same group even though they are not similar.  Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.

We are given a list A of strings.  Every string in A is an anagram of every other string in A.  How many groups are there?

Example 1:

Input: ["tars","rats","arts","star"]
Output: 2
Note:

A.length <= 2000
A[i].length <= 1000
A.length * A[i].length <= 20000
All words in A consist of lowercase letters only.
All words in A have the same length and are anagrams of each other.
The judging time limit has been increased for this question.

解题方案

思路 1 - 时间复杂度: O(N^2 * W)- 空间复杂度: O(N^2)******

开始没有看到这句话All words in A have the same length and are anagrams of each other., 导致isSimilar函数写的很繁琐

普通的union-find,超时,原因是当我们的A很长,但是word都很短的时候,我们没有必要两两word比较一下,因为A.length * A[i].length <= 20000, 假设说我们的A长度为10000,可能就会超时了,因为双重循环里面还要check similar,然后可能还要union

时间复杂度中的W为A中每个word的长度

class UnionFind(object):
    def __init__(self, n):  # 初始化uf数组和组数目
        self.count = n
        self.uf = [i for i in range(n)]

    def find(self, x):  # 判断节点所属于的组
        while x != self.uf[x]:
            self.uf[x] = self.uf[self.uf[x]]
            x = self.uf[x]
        return self.uf[x]

    def union(self, x, y):  # 连接两个节点
        x_root = self.find(x)
        y_root = self.find(y)
        if x_root == y_root:
            return
        self.uf[y_root] = x_root
        self.count -= 1

    def getCount(self):  # 返回所有组的数目
        return self.count 
    
class Solution:
    def numSimilarGroups(self, A):
        """
        :type A: List[str]
        :rtype: int
        """
        uf = UnionFind(len(A))
        def isSimilar(s1, s2):
            tmp = []
            for i in range(len(s1)):
                if s1[i] != s2[i]:
                    tmp.append(s1[i])
                    tmp.append(s2[i])
            if len(tmp) != 4:
                return False
            return tmp[0] == tmp[-1] and tmp[1] == tmp[2]
        
        for (i1, word1), (i2, word2) in \
                itertools.combinations(enumerate(A), 2):
            if isSimilar(word1, word2):
                uf.union(i1, i2)
        return uf.getCount()

思路 2 - 时间复杂度: O(N^2 * W) or O(N * W^3)- 空间复杂度: O(N^2)******

吸取了上面思路的教训,我们分两种方法来解决,

  • len(A) < len(A[0]) ** 2的时候,即A比较短,采用brute force on every two word. O(N^2 * W)
  • check all possible similar words. O(N * W^3)

这个解法在python2可以过,python3过不了,无语。。

class UnionFind(object):
    def __init__(self, n):  # 初始化uf数组和组数目
        self.count = n
        self.uf = [i for i in range(n)]

    def find(self, x):  # 判断节点所属于的组
        while x != self.uf[x]:
            self.uf[x] = self.uf[self.uf[x]]
            x = self.uf[x]
        return self.uf[x]

    def union(self, x, y):  # 连接两个节点
        x_root = self.find(x)
        y_root = self.find(y)
        if x_root == y_root:
            return
        self.uf[y_root] = x_root
        self.count -= 1

    def getCount(self):  # 返回所有组的数目
        return self.count 
    
class Solution:
    def numSimilarGroups(self, A):
        """
        :type A: List[str]
        :rtype: int
        """
        uf = UnionFind(len(A))
        
        def isSimilar(s1, s2):
            return sum(i != j for i, j in zip(s1, s2)) == 2
        
        lookup = {word: idx for idx, word in enumerate(A)}
        if len(A) < len(A[0]) ** 2: # short A
            for (i, w1), (j, w2) in itertools.combinations(enumerate(A), 2):
                if isSimilar(w1, w2):
                    uf.union(i, j)
        else: # short word
            for idx, w1 in enumerate(A):
                for i, j in itertools.combinations(range(len(A[0])), 2):
                    w2 = w1[:i] + w1[j] + w1[i+1:j] + w1[i] + w1[j+1:]
                    if w2 in lookup:
                        uf.union(idx, lookup[w2])
        return uf.getCount()

思路 3 - 时间复杂度: O(N^2 * W) or O(N * W^3)- 空间复杂度: O(N^2)******

我们换一种union-find, 用字典实现的,然后我们直接union的就是word本身,参考寒神

这里有一个很奇怪的case,第59个test case是这样的:["aaaaaaa", "aaaaaaa", "aaaaaaa"..."aaaaaaa", "aaaaaaa", "aaaaaaa"],

虽然我觉得很困惑,因为anagram的定义是:

An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, 
typically using all the original letters exactly once.

也就是说所有的word都必须different才行,

但是这里为了AC没办法了,

  • 只能在第一行加上一个A = list(set(A))
  • 或者不加A = list(set(A))也可以,但是要把self.count = len(A)改成self.count = len(uf)

beats 24.32%

class Solution:
    def numSimilarGroups(self, A):
        """
        :type A: List[str]
        :rtype: int
        """
        A = list(set(A))
        uf = {word: word for word in A}
        self.count = len(A)
        
        def find(x):
            if x != uf[x]:
                uf[x] = find(uf[x])
            return uf[x]
        
        def union(x, y):
            x_root = find(x)
            y_root = find(y)
            if x_root == y_root:
                return
            uf[x_root] = y_root
            self.count -= 1
        
        def isSimilar(s1, s2):
            return sum(i != j for i, j in zip(s1, s2)) == 2
        
        if len(A) < len(A[0]): # short A
            for w1, w2 in itertools.combinations(A, 2):
                if isSimilar(w1, w2):
                    union(w1, w2)
        else: # short word
            for w1 in A:
                for i, j in itertools.combinations(range(len(A[0])), 2):
                    w2 = w1[:i] + w1[j] + w1[i+1:j] + w1[i] + w1[j+1:]
                    if w2 in uf:
                        union(w1, w2)
        return self.count