Wednesday, March 16, 2011

Find duplicates

从MITBBS上面看来的brainteaser题目:
====================
发信人: hytt (荷叶田田), 信区: JobHunting

标 题: 两道algorithm电面题(update 答案)
发信站: BBS 未名空间站 (Wed Mar 16 13:31:32 2011, 美东)

两种方法底下有很多人都做出来了。最简单的方法是求array的和,求1-1000的和,求两者差。面试的人给我的algorithm是,假设原来的arry是a,另创一个array b, 把a中的element作b 的index: 例如a10的数是3,那末在b3里存1。每次存入1到b里时都check,如果已经有1存入就是重复的那个了。第二题同理可作。

——————————————————————————

应该算很简单的,我实在是比较水,学艺不精:

1) A big array with a thousand and one elements storing integers from 1 to 1000, not sorted. One number is duplicated. How do you find the duplicated number most efficiently?

2) An array with 1000-1 elements storing integers from 1 to 1000, not sorted. One number is missing in the array. How do you find that missing number in the most efficient way.
====================
荷叶田田的第二个题在NetApp的电面中已经碰到过,就不再赘述了。最简单的方法就是用加减法:先把1到1000连续加起来,再把给定的数组中每个元素的值全部相加,两个结果相减,就得到missing的那个数了。当然如果有两个数missing的话,就要用异或的方法了。

正如荷叶田田所提到的,第一个题也可以用上述的加减法来判断duplicate的数字。不过,如果两个或两个以上的数字出现重复,这个方法就不太好使了。他/她的面试官提到的方法本质上就是利用了一个associative container,比如map或者hash table。

以下的程序可供参考:
#include <iostream>
#include <map>

#define NELEMS(array) (sizeof(array)/sizeof(array[0]))

using namespace std;

int main()
{
    int arr[8] = {3, 2, 7, 4, 4, 6, 1, 5};
    map counters;

    for (int i = 0; i < NELEMS(arr); ++i)
        ++counters[arr[i]];

    for (map::const_iterator it = counters.begin();
          it != counters.end(); ++it)
    {
        cout << it->first << "\t" << it->second << endl;
    }

    return 0;
}
这段代码是将《Accelerated C++》里面Section 7.2里面的例子稍作修改而来。

更全面的总结可以看这里:http://geeksforgeeks.org/?p=7953,其中的Method 2、3、4比较有意思。

Method 2 (Use Count array)

Traverse the array once. While traversing, keep track of count of all elements in the array using a temp array count[] of size n, when you see an element whose count is already set, print it as duplicate.

This method uses the range given in the question to restrict the size of count[], but doesn’t use the data that there are only two repeating elements.

#include <stdio.h>
#include <stdlib.h>

void printRepeating(int arr[], int size)
{
    int *count = (int *)calloc(sizeof(int), (size - 2));
    int i;

    printf("Repeating elements are ");

    for (i = 0; i < size; i++)
    {
        if (count[arr[i]] == 1)
            printf(" %d ", arr[i]);
        else
            count[arr[i]]++;
    }

    free(count);
    return;
}

int main()
{
    int arr[] = {4, 2, 4, 5, 2, 3, 1};
    int arr_size = sizeof(arr)/sizeof(arr[0]);

    printRepeating(arr, arr_size);

    getchar();
    return 0;
}

Time Complexity: O(n)
Auxiliary Space: O(n)

Method 3 (Make two equations)

We have to find two numbers, so two unknowns. We know the sum of n numbers is n(n+1)/2 and product is n!. Make two equations using these sum and product formulas, and get values of two unknowns using the two equations.

Let summation of all numbers in array be S and product be P.
Let the numbers which are being repeated are X and Y.

X + Y = S – n(n+1)/2
XY = P/n!

Using above two equations, we can find out X and Y. For array = 4 2 4 5 2 3 1, we get S = 21 and P as 960.

X + Y = 21 – 15 = 6
XY = 960/5! = 8

X – Y = sqrt((X+Y)^2 – 4*XY) = sqrt(4) = 2

Using below two equations, we easily get X = (6 + 2)/2 and Y = (6-2)/2

X + Y = 6
X – Y = 2

Thanks to geek4u for suggesting this method. As pointed by Beginer , there can be addition and multiplication overflow problem with this approach.

#include <stdio.h>
#include <stdlib.h>>
#include <math.h>

/* function to get factorial of n */
int fact(int n);

void printRepeating(int arr[], int size)
{
    int S = 0; /* S is for sum of elements in arr[] */
    int P = 1; /* P is for product of elements in arr[] */
    int x, y; /* x and y are two repeating elements */
    int D; /* D is for difference of x and y, i.e., x-y */
    int n = size - 2, i;

    /* Calculate Sum and Product of all elements in arr[] */
    for (i = 0; i < size; i++)
    {
        S = S + arr[i];
        P = P*arr[i];
    }

    S = S - n*(n+1)/2; /* S is x + y now */
    P = P/fact(n); /* P is x*y now */

    D = sqrt(S*S - 4*P); /* D is x - y now */

    x = (D + S)/2;
    y = (S - D)/2;

    printf("The two Repeating elements are %d & %d", x, y);
    return;
}

int fact(int n)
{
    return (n == 0)? 1 : n*fact(n-1);
}

int main()
{
    int arr[] = {4, 2, 4, 5, 2, 3, 1};
    int arr_size = sizeof(arr)/sizeof(arr[0]);

    printRepeating(arr, arr_size);

    getchar();
    return 0;
}

Time Complexity: O(n)
Auxiliary Space: O(1)

Method 4 (Use XOR)

Thanks to neophyte for suggesting this method.

The approach used here is similar to method 2 of this post.

Let the repeating numbers be X and Y, if we xor all the elements in the array and all integers from 1 to n, then the result is X xor Y.

The 1’s in binary representation of X xor Y is corresponding to the different bits between X and Y. Suppose that the kth bit of X xor Y is 1, we can xor all the elements in the array and all integers from 1 to n, whose kth bits are 1. The result will be one of X and Y.

void printRepeating(int arr[], int size)
{
    int xor = arr[0]; /* Will hold xor of all elements */
    int set_bit_no; /* Will have only single set bit of xor */
    int i;
    int n = size - 2;
    int x = 0, y = 0;

    /* Get the xor of all elements in arr[] and {1, 2 .. n} */
    for (i = 0; i < size; i++)
        xor ^= arr[i];
    for (i = 1; i <= n; i++)
        xor ^= i;

    /* Get the rightmost set bit in set_bit_no */
    set_bit_no = xor & ~(xor-1);

    /* Now divide elements in two sets by comparing rightmost set
        bit of xor with bit at same position in each element. */
    for (i = 0; i < size; i++)
    {
        if (arr[i] & set_bit_no)
            x = x ^ arr[i]; /* XOR of first set in arr[] */
        else
            y = y ^ arr[i]; /* XOR of second set in arr[] */
    }
    for (i = 1; i <= n; i++)
    {
        if (i & set_bit_no)
            x = x ^ i; /* XOR of first set in arr[] and {1, 2, ...n } */
        else
            y = y ^ i; /* XOR of second set in arr[] and {1, 2, ...n } */
    }

    printf("\n The two repeating elements are %d & %d ", x, y);
    return;
}

int main()
{
    int arr[] = {4, 2, 4, 5, 2, 3, 1};
    int arr_size = sizeof(arr)/sizeof(arr[0]);

    printRepeating(arr, arr_size);

    getchar();
    return 0;
}

No comments:

Post a Comment