Thursday, March 24, 2011

临时抱佛脚

将要迎来到美国之后的第一个onsite interview了,来自一个半导体芯片公司Qualcomm,大公司,很不错。希望不要浪掷机会。

赶紧上mitbbs的JobHunting版去找以前有关Qualcomm的面试题,还真看见一个有用的:
---
发信人: Zhuimeng1314 ( 追梦一生), 信区: JobHunting
标 题: Qualcomm 电面面经
发信站: BBS 未名空间站 (Fri Mar 11 21:31:21 2011, 美东)

问简历
如何测试一个DVD player

第二题
int x=20; int y=35;
x=y+++x++;
y=++y+++x;
问之后x 和y 是多少

第三题
怎样交换两个数而不用temp (XOR)

哎。。简历被问得SB了。。

 
 
发信人: luckyseeker (lucky), 信区: JobHunting
标 题: Re: Qualcomm 电面面经
发信站: BBS 未名空间站 (Fri Mar 11 23:33:28 2011, 美东)

我靠,第一题我用C和java分别run了一下,居然结果不一样。

java的结果(run in eclipse 3.6.1)
x=55, y=36 //第一步
x=56, y=93 //第二步

C的结果(cygwin + gcc)
x=56, y=36 //第一步
x=57, y=94 //第二步

顺便说一下,y=++y+++x;两个都报错,至少要空一格写成:y=++y+ ++x;
觉得java下面,第一步算出来x=20+55 = 55,后面的x++不运行。
但是C下面,x=55之后,x++还要先加再用一次,就变成了56。
跟内存模式有关么?是不是java下面的left side构造了一个新地址for x?



发信人: Zhuimeng1314 ( 追梦一生), 信区: JobHunting
标 题: Re: Qualcomm 电面面经
发信站: BBS 未名空间站 (Sat Mar 12 01:09:42 2011, 美东)

我问用来播放什么,用来干什么。。他不说。。。
不过我倒是没想到遥控器什么的
还有什么是bt测试?我完全不知道。。
盗版盘用英文怎么说?

【 在 HanSolo7 (隼) 的大作中提到: 】
: 如何测试一个DVD player:
: 先询问对方: 用来做什么?播放什么?谁使用?接什么电视?在哪个国家?
: 测试项目: DVD机器本身,遥控器,连接线
: 然后测试各种功能:普通测试,极限测试,bt测试(比如同时操作遥控器和DVD机
: 器上的按钮,盗版盘测试)。。。。



发信人: done (折埋), 信区: JobHunting
标 题: Re: Qualcomm 电面面经
发信站: BBS 未名空间站 (Sat Mar 12 01:10:49 2011, 美东)

以前版上有无数人说过,这++的题不同编译器的结果是不一样的



发信人: done (折埋), 信区: JobHunting
标 题: Re: Qualcomm 电面面经
发信站: BBS 未名空间站 (Sat Mar 12 01:15:42 2011, 美东)

我认为凭自己的理解知道i++和++i的区别去计算结果就好了。
或者如牛人说的,直接答"如果谁敢写这种code就马上炒了他/她"

【 在 Zhuimeng1314 ( 追梦一生) 的大作中提到: 】
: 哦~~那他拿来折磨我。。。
---
对这个帖子,我所关心的是第三个题:如何交换两个数而不用临时变量。

正如楼主说用,用xor可以做到。参见Bit Twidding Hacks上面的文章:http://graphics.stanford.edu/~seander/bithacks.html#SwappingValuesXOR

Swapping values with XOR

#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

This is an old trick to exchange the values of the variables a and b without using extra space for a temporary variable.

On January 20, 2005, Iain A. Fleming pointed out that the macro above doesn't work when you swap with the same memory location, such as SWAP(a[i], a[j]) with i == j. So if that may occur, consider defining the macro as (((a) == (b))

(((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))). On July 14, 2009, Hallvard Furuseth suggested that on some machines, (((a) ^ (b)) && ((b) ^= (a) ^= (b), (a) ^= (b))) might be faster, since the (a) ^ (b) expression is reused.

还有一种方法是用加减法:http://graphics.stanford.edu/~seander/bithacks.html#SwappingValuesSubAdd

Swapping values with subtraction and addition

#define SWAP(a, b) ((&(a) == &(b)) \
                                 (((a) -= (b)), ((b) += (a)), ((a) = (b) - (a))))

This swaps the values of a and b without using a temporary variable. The initial check for a and b being the same location in memory may be omitted when you know this can't happen. (The compiler may omit it anyway as an optimization.) If you enable overflows exceptions, then pass unsigned values so an exception isn't thrown. The XOR method that follows may be slightly faster on some machines. Don't use this with floating-point numbers (unless you operate on their raw integer representations).

Sanjeev Sivasankaran suggested I add this on June 12, 2007. Vincent Lefèvre pointed out the potential for overflow exceptions on July 9, 2008

这种方法更易于理解,不过有overflow的危险,且不能用于浮点数的交换。话说回来,异或算法也不是没有缺点。比如当两个要交换的值相同的时候,用这种方法会使之成为零(因为A^A == 0)。

对于这两种算法的证明和分析,可以参考wikipedia上面的链接:http://en.wikipedia.org/wiki/XOR_swap_algorithm

核心在于异或几条类似于加法的性质。
The binary operation XOR over bit strings of length N exhibits the following properties (where denotes XOR):

L1. Commutativity: A ^ B = B ^ A (中文名称:交换律)
L2. Associativity: (A ^ B) ^ C = A ^ (B ^ C) (中文名称:结合律)
L3. Identity exists: there is a bit string, 0, (of length N) such that for any A ^ 0 = A (这个性质中文该怎么讲?)
L4. Each element is its own inverse: for each A, A ^ A = 0. (这个性质和加法就有些区别了)

值得注意的是,加减swap可以看作是异或swap的一个扩展。

Friday, March 18, 2011

Find both the duplicated number and the missing number

How to find the duplicate and missing number in an array of unsorted set of continuous numbers (1 to N)?

Solution from http://anuviswan.blogspot.com/2007/08/missing-and-duplicate-number-algorithm.html:

1. Find RealSum = ( sum of all numbers in array )
2. Find RealProduct = ( product of all numbers in array )
3. Find ExpectedSum = n*(n+1) / 2 where n is number of integers
4. Find ExpectedProduct = n!

Equation 1 : X - Y = (RealSum - ExpectedSum)
Equation 2 : X/Y = (RealProduct / ExpectedProduct)

Solve this equation.

X is the duplicated number
Y is the missing number

这种算法易于理解,且实现很简单,但有overflow的危险。能否用异或的方法解决呢?其实也是可以的。

把这个给定的set里面所有的数都xor一遍,可以看出,duplicate的两个数字必定相互抵消。换句话说,相当于duplicate的数字missing了。

从而我们可以得出结论:如果set里面有duplicate,那么finding duplicate的问题可以等价为finding missing element的问题。如果set里面有一个数字duplicate,那么可以转化为求一个missing number的问题。如果set里面有一个duplicate和一个missing,那么找出它们和找出两个missing numbers是等价的。

Wednesday, March 16, 2011

Find duplicates

从MITBBS上面看来的brainteaser题目:
====================
发信人: hytt (荷叶田田), 信区: JobHunting

标 题: 两道algorithm电面题(update 答案)
发信站: BBS 未名空间站 (Wed Mar 16 13:31:32 2011, 美东)

两种方法底下有很多人都做出来了。最简单的方法是求array的和,求1-1000的和,求两者差。面试的人给我的algorithm是,假设原来的arry是a,另创一个array b, 把a中的element作b 的index: 例如a10的数是3,那末在b3里存1。每次存入1到b里时都check,如果已经有1存入就是重复的那个了。第二题同理可作。

——————————————————————————

应该算很简单的,我实在是比较水,学艺不精:

1) A big array with a thousand and one elements storing integers from 1 to 1000, not sorted. One number is duplicated. How do you find the duplicated number most efficiently?

2) An array with 1000-1 elements storing integers from 1 to 1000, not sorted. One number is missing in the array. How do you find that missing number in the most efficient way.
====================
荷叶田田的第二个题在NetApp的电面中已经碰到过,就不再赘述了。最简单的方法就是用加减法:先把1到1000连续加起来,再把给定的数组中每个元素的值全部相加,两个结果相减,就得到missing的那个数了。当然如果有两个数missing的话,就要用异或的方法了。

正如荷叶田田所提到的,第一个题也可以用上述的加减法来判断duplicate的数字。不过,如果两个或两个以上的数字出现重复,这个方法就不太好使了。他/她的面试官提到的方法本质上就是利用了一个associative container,比如map或者hash table。

以下的程序可供参考:
#include <iostream>
#include <map>

#define NELEMS(array) (sizeof(array)/sizeof(array[0]))

using namespace std;

int main()
{
    int arr[8] = {3, 2, 7, 4, 4, 6, 1, 5};
    map counters;

    for (int i = 0; i < NELEMS(arr); ++i)
        ++counters[arr[i]];

    for (map::const_iterator it = counters.begin();
          it != counters.end(); ++it)
    {
        cout << it->first << "\t" << it->second << endl;
    }

    return 0;
}
这段代码是将《Accelerated C++》里面Section 7.2里面的例子稍作修改而来。

更全面的总结可以看这里:http://geeksforgeeks.org/?p=7953,其中的Method 2、3、4比较有意思。

Method 2 (Use Count array)

Traverse the array once. While traversing, keep track of count of all elements in the array using a temp array count[] of size n, when you see an element whose count is already set, print it as duplicate.

This method uses the range given in the question to restrict the size of count[], but doesn’t use the data that there are only two repeating elements.

#include <stdio.h>
#include <stdlib.h>

void printRepeating(int arr[], int size)
{
    int *count = (int *)calloc(sizeof(int), (size - 2));
    int i;

    printf("Repeating elements are ");

    for (i = 0; i < size; i++)
    {
        if (count[arr[i]] == 1)
            printf(" %d ", arr[i]);
        else
            count[arr[i]]++;
    }

    free(count);
    return;
}

int main()
{
    int arr[] = {4, 2, 4, 5, 2, 3, 1};
    int arr_size = sizeof(arr)/sizeof(arr[0]);

    printRepeating(arr, arr_size);

    getchar();
    return 0;
}

Time Complexity: O(n)
Auxiliary Space: O(n)

Method 3 (Make two equations)

We have to find two numbers, so two unknowns. We know the sum of n numbers is n(n+1)/2 and product is n!. Make two equations using these sum and product formulas, and get values of two unknowns using the two equations.

Let summation of all numbers in array be S and product be P.
Let the numbers which are being repeated are X and Y.

X + Y = S – n(n+1)/2
XY = P/n!

Using above two equations, we can find out X and Y. For array = 4 2 4 5 2 3 1, we get S = 21 and P as 960.

X + Y = 21 – 15 = 6
XY = 960/5! = 8

X – Y = sqrt((X+Y)^2 – 4*XY) = sqrt(4) = 2

Using below two equations, we easily get X = (6 + 2)/2 and Y = (6-2)/2

X + Y = 6
X – Y = 2

Thanks to geek4u for suggesting this method. As pointed by Beginer , there can be addition and multiplication overflow problem with this approach.

#include <stdio.h>
#include <stdlib.h>>
#include <math.h>

/* function to get factorial of n */
int fact(int n);

void printRepeating(int arr[], int size)
{
    int S = 0; /* S is for sum of elements in arr[] */
    int P = 1; /* P is for product of elements in arr[] */
    int x, y; /* x and y are two repeating elements */
    int D; /* D is for difference of x and y, i.e., x-y */
    int n = size - 2, i;

    /* Calculate Sum and Product of all elements in arr[] */
    for (i = 0; i < size; i++)
    {
        S = S + arr[i];
        P = P*arr[i];
    }

    S = S - n*(n+1)/2; /* S is x + y now */
    P = P/fact(n); /* P is x*y now */

    D = sqrt(S*S - 4*P); /* D is x - y now */

    x = (D + S)/2;
    y = (S - D)/2;

    printf("The two Repeating elements are %d & %d", x, y);
    return;
}

int fact(int n)
{
    return (n == 0)? 1 : n*fact(n-1);
}

int main()
{
    int arr[] = {4, 2, 4, 5, 2, 3, 1};
    int arr_size = sizeof(arr)/sizeof(arr[0]);

    printRepeating(arr, arr_size);

    getchar();
    return 0;
}

Time Complexity: O(n)
Auxiliary Space: O(1)

Method 4 (Use XOR)

Thanks to neophyte for suggesting this method.

The approach used here is similar to method 2 of this post.

Let the repeating numbers be X and Y, if we xor all the elements in the array and all integers from 1 to n, then the result is X xor Y.

The 1’s in binary representation of X xor Y is corresponding to the different bits between X and Y. Suppose that the kth bit of X xor Y is 1, we can xor all the elements in the array and all integers from 1 to n, whose kth bits are 1. The result will be one of X and Y.

void printRepeating(int arr[], int size)
{
    int xor = arr[0]; /* Will hold xor of all elements */
    int set_bit_no; /* Will have only single set bit of xor */
    int i;
    int n = size - 2;
    int x = 0, y = 0;

    /* Get the xor of all elements in arr[] and {1, 2 .. n} */
    for (i = 0; i < size; i++)
        xor ^= arr[i];
    for (i = 1; i <= n; i++)
        xor ^= i;

    /* Get the rightmost set bit in set_bit_no */
    set_bit_no = xor & ~(xor-1);

    /* Now divide elements in two sets by comparing rightmost set
        bit of xor with bit at same position in each element. */
    for (i = 0; i < size; i++)
    {
        if (arr[i] & set_bit_no)
            x = x ^ arr[i]; /* XOR of first set in arr[] */
        else
            y = y ^ arr[i]; /* XOR of second set in arr[] */
    }
    for (i = 1; i <= n; i++)
    {
        if (i & set_bit_no)
            x = x ^ i; /* XOR of first set in arr[] and {1, 2, ...n } */
        else
            y = y ^ i; /* XOR of second set in arr[] and {1, 2, ...n } */
    }

    printf("\n The two repeating elements are %d & %d ", x, y);
    return;
}

int main()
{
    int arr[] = {4, 2, 4, 5, 2, 3, 1};
    int arr_size = sizeof(arr)/sizeof(arr[0]);

    printRepeating(arr, arr_size);

    getchar();
    return 0;
}

Friday, March 4, 2011

NetApp的Phone Interview惨败(五)

9. 一个很大的整型数组(length possibly reaches millions),里面有duplicate,用怎样的方法可以快速地去除这些duplicate?

我感觉应该先把这个数组sort一下,但是之后该怎么办呢?完全没有思路。当时只提到了用quicksort排序先,没想到面试官接着又问quicksort的工作原理。

亏了我前晚在看1337coder的blog关于recursion的文章的时候顺带查了一下The C Programming Language, 2nd Edition, by K&R,大概是第87页上谈到了quicksort()的实现。The Practice of Programming by K&R的第32~34页上也谈到了其实现,还给出了算法的量级。

我就和他大致讲了一下quicksort的实现思路。不料他又继续追问这个算法的量级,这个TCPL没有提到,TPOP提到了,但我没有仔细看。结果随口报了个O(logN)的量级,他没做声。后来我翻书一查,才发现是O(NLogN)的量级。操,又挨了一刀。

OK,转入正题。究竟该怎样做才可以remove duplicates呢?下面是我在mitbbs上的求助和高手的应助:
---
一个很大的整型数组,里面有duplicate,用怎样的方法可以快速地去除这些duplicate?我感觉应该先把这个数组sort一下,但是之后该怎么办呢?完全没有思路。

by johny

bitmap比较好,当然sort也可以做。这个有个问题就是,可能in-place remove 所用的两个指针所指的位置不能同时装入内存,这个问题还没想好如何解决。

by Chevy

bitmap不是图形格式吗?能把数组变成一个bitmap?不太懂啊。能不能给段代码的例子?

by johny

bitmap你可以看看programming pearls的第一章

by Chevy

还有其他的方法吗?

by johny

我知道的就这些了。如果本来已经排序好了的话,可以用binary search,在O(logN)时间内找到一个。

by Chevy

binary search倒是可以用,但找到以后,还要把duplicate从这个大数组里面剔除,我所能想到的只有remove之后,再依次把后面的element向前挪一位依次填坑,但似乎这样太影performance。

当时面试官给的提示是再新建一个数组,似乎他的意思是把没有duplicate的element往新数组里面扔。而且他还提到的hashtable,这个地方该怎样应用hashtable呢?

by johny

你每次遇到一个数,先判断已经在不在hashtable中,在的话说明重复;不在的话,插入hashtable。

by Chevy

Remove duplicates 的 Sort 的方法如下:

/* Remove the duplicates in place, and returns
   the new size of the array. Assumes the array
   is already sorted. */
int removeDuplicates(int A[], int n) {
    int j = 0;

    for (int i = 0; i < n; i++) {
        if (A[i] != A[j])
            A[++j] = A[i];
    }

    return j+1;
}

by ihasleetcode

已经验证过了,确实可以实现in place remove duplicates. 顺便问一下,这段code在你的blog里面出现过吗?这是个有名的算法吗?

当时面试官给的提示是再新建一个数组,似乎他的意思是把没有duplicate的element往新数组里面扔。他还提到要用hashtable的方法,这个地方该怎样应用hashtable呢?据他说用hashtable的话,就用不着先对数组进行排序了。

我对hashtable没什么概念,只知道是个key-value pair的table。似乎还要定义一个hash function来建立从key到value之间的mapping关系。

如果用面试官所说的方法,该怎么解呢?可否给个example code?

by johny
---
还没有完全解开,但已经有了大致的思路。其一是参照Programming Pearl第一章去用所谓的“bitmap”;其二是“每次遇到一个数,先判断已经在不在hashtable中,在的话说明重复;不在的话,插入hashtable”,但关于hash的知识还得补;其三为in-place removal as 1337 exampled(当然要先quicksort一把)。


后续的帖子:
---
不要sorting的remove duplicate:

const int REMOVE = -1000;

int removeduplicate(int a[], int n) {
    unordered_map hash;

    for(int i = 0; i < n; i++) {
        if(hash.find(a[i]) != hash.end()) {
            a[i] = REMOVE;
        } else {
            hash[a[i]] = 1;
        }
    }

    int j=0;
    int i=0;

    while (i < n) {
        while (a[i] == REMOVE) {
            i++;
        }
        a[j] = a[i];
        i++;
        j++;
    }

    return j;
}

by icn9

谢谢啊。这两天我也在学习hashtable的方法,刚好看到C++里面有个map可以用。调试了一下你的程序,稍作调整后的code如下:

#include <iostream>
//#include <unordered_map>
#include <map>

using namespace std;

const int REMOVE = -1000;

int remove_duplicate(int a[], int n) {
    int i = 0, j = 0;
    //unordered_map hash;
    map hash;

    for (i = 0; i < n; ++i) {
        if (hash.find(a[i]) != hash.end()) {
            a[i] = REMOVE;
        } else {
            hash[a[i]] = 1;
        }
    }

    j = 0;
    i = 0;

    while (i < n) {
        while (a[i] == REMOVE) {
            i++;
        }
        a[j] = a[i];
        i++;
        j++;
    }

    return j;
}

int main()
{
    int arr[8] = {3, 2, 7, 4, 4, 6, 1, 5};
    int len = 0, i = 0;

    len = remove_duplicate(arr, sizeof(arr));
    cout << "The new size of arr[] is " << len << endl;

    for (i = 0; i < len; ++i)
        cout << arr[i] << "\t";
    cout << endl;

    getchar();
    return 0;
}

我在windows xp下的用的是Dev-C++,似乎比较老,因此而不支持,因此我把它改成map了。但编译通过后,跑出来的结果很奇怪,见附图。能帮我看一下是什么导致的吗?


by johny

You passed a wrong size of the array and accessed memory out of boundary:

    int arr[8] = {3, 2, 7, 4, 4, 6, 1, 5};
    ...
    len = remove_duplicate(arr, sizeof(arr));

When sizeof is applied to an array, the result is the size in bytes of the array in memory. Thus, it should be changed to:

    len = remove_duplicate(arr, sizeof(arr)/sizeof(int));

by recursive

多谢,确实是这个问题。我的疏忽。改成remove_duplicate(arr, sizeof(arr)/sizeof(arr[0]))了,编译通过,运行正常。多谢指出这个错误。

不过,Dev-C++似乎不支持unordered_map。这个模板类不在C++的标准库里面吗?但是我用了1337coder的coding panel,也是有通不过这个模板类的编译。

为什么呢?

by johny

你的问题不在于map和unordered_map的区别。
不过如果你要用到unordered_map,可以这样写:

    #include <tr1/unordered_map>
    ...
    tr1::unordered_map hash;

同时这两个东西是有区别的,你应该要能理解他们的不同。

by recursive

: 你的问题不在于map和unordered_map的区别。

那倒是。用unordered_map的想法是不想用map对数组重新排序。节省一点CPU开销,出于performance的考虑。不过对这个测试程序来说,也许用不着。

: 不过如果你要用到unordered_map,可以这样写:
:     #include <tr1/unordered_map>
:     ...
:     tr1::unordered_map hash;
: 同时这两个东西是有区别的,你应该要能理解他们的不同。

谢谢提醒啊。加上你提示的改变之后,1337的coding panel可以通过,并顺利跑起来了。但是Dev-C++的编译还是报错。看来应该是编译器有点老了(2005年最后一个release)。

你的意思是unordered_map与tr1::unordered_map有区别吗?

by johny
---
这个问题到这里基本告一段落,特别感谢Chevy、1337、icn9与recursive同学。

NetApp的Phone Interview惨败(四)

8. 给我9999个数字,数值连续从1到10000,但缺了一个。要求用一种非常简洁的方法得出缺了哪个数字?

完全不会,面试官的答案是从1连续加到10000,得到A;再把给定的9999个数字加起来,得到B;然后A - B即可。

他后来又说,如果是9998个数字,数值从1到10000,缺两个。这种情况该怎么处理?

以下是我在mitbbs上求助和高手的应助:
---
给我9999个数字,数值连续从1到10000,但缺了一个。要求用一种非常简洁的方法得出缺了哪个数字?

by johny

方法很多:
a. 全部加起来,看缺多少,就是哪个数
b. 1xor2xor...10000,然后继续xor给的9999个数字

by Chevy

这第一个方法就是面试官后来跟我说的。不过他后来又说,如果是9998个数字,数值从1到10000,缺两个,加法就貌似不行了。这种情况下,该用什么方法呢?

by johny

两个的话,可以解方程:
x+y+其他9998个数 = 1+...+10000
x^2+y^2+...=1^2+...+10000^2

by Chevy

两个仍然用xor
不然很可能溢出

by fzblg

不明白这个xor解法的原理是什么,可以简单讲一下或给个参考文献吗?

by johny

xor有个性质就是,对于一个数a,a xor a = 0
所以用这个性质其他9999个数都消掉了。

by Chevy

没找到,我试着讲一下:仍然按照Chevy的方法做。

如果是一个数:最后的结果就是M。
如果是两个数:结果就是M1 xor M2。

从这个结果中挑一个bit为1的,就可以把原数组划分为两组。missing numbers分别在这两个组中。

然后,按照1个数的方法再分别做一遍。据说这个思路可以扩展到3个,4个数去。

有个类似的http://geeksforgeeks.org/?p=2457

by fzblg
---
从这些帖子,我可以理解一个数的解法。但对于两个数的解法,还是有些迷惑。



去到fzblg提供的链接,仔细读了一番,终于基本搞明白了。

Given an array in which all numbers except two are repeated once. (i.e. we have 2n+2 numbers and n numbers are occurring twice and remaining two have occurred once). Find those two numbers in the most efficient way.

Method 1(Use Sorting)
First sort all the elements. In the sorted array, by comparing adjacent elements we can easily get the non-repeating elements. Time complexity of this method is O(nLogn).

Method 2(Use XOR)
Let x and y be the non-repeating elements we are looking for and arr[] be the input array. First calculate the XOR of all the array elements.

        xor = arr[0]^arr[1]^arr[2].....arr[n-1]

All the bits that are set in xor will be set in one non-repeating element (x or y) and not in other. So if we take any set bit of xor and divide the elements of the array in two sets – one set of elements with same bit set and other set with same bit not set. By doing so, we will get x in one set and y in another set. Now if we do XOR of all the elements in first set, we will get first non-repeating element, and by doing same in other set we will get the second non-repeating element.

Let us see an example.
        arr[] = {2, 4, 7, 9, 2, 4}
1) Get the XOR of all the elements.
        xor = 2^4^7^9^2^4 = 14 (1110)
2) Get a number which has only one set bit of the xor.
    Since we can easily get the rightmost set bit, let us use it.
        set_bit_no = xor & ~(xor-1) = (1110) & ~(1101) = 0010
    Now set_bit_no will have only set as rightmost set bit of xor.
3) Now divide the elements in two sets and do xor of
    elements in each set, and we get the non-repeating
    elements 7 and 9. Please see implementation for this
    step.

Implementation:

#include <stdio.h>
#include <stdlib.h>

/* This function sets the values of *x and
   *y to non-repeating elements in an array
   arr[] of size n */
void get2NonRepeatingNos(int arr[], int n, int *x, int *y)
{
  int xor = arr[0]; /* Will hold xor of all elements */
  int set_bit_no;  /* Will have only single set bit of xor */
  int i;
  *x = 0;
  *y = 0;  

    /* Get the xor of all elements */
  for(i = 1; i < n; i++)
    xor ^= arr[i];

  /* Get the rightmost set bit in set_bit_no */
  set_bit_no = xor & ~(xor-1);

    /* Now divide elements in two sets by comparing
     rightmost set bit of xor with bit at same
     position in each element. */
  for (i=0; i < n; i++)
  {
    if(arr[i] & set_bit_no)
      *x = *x ^ arr[i]; /* XOR of 1st set */
    else
      *y = *y ^ arr[i]; /* XOR of 2nd set */
  }


/* Driver program to test above function */
int main()

  int arr[] = {2, 3, 7, 9, 11, 2, 3, 11};
//johny.0
  /* if we declare "int x" and "int y" here, and
     call "get2NonRepeatingNos(..., &x, &y)", we
     would not need to free them later */
//johny.1
  int *x = (int *)malloc(sizeof(int));
  int *y = (int *)malloc(sizeof(int));

  get2NonRepeatingNos(arr, 8, x, y); 
  printf("The non-repeating elements are %d and %d",
         *x, *y); 
  getchar();

//johny.0
  free(x);
  free(y);

  return 0; 
//johny.1
}

Time Complexity: O(n)
Auxiliary Space: O(1)


Can some1 explain me the logical part of how it works? I find it difficult to analyse it.

by nn

XOR of two same numbers results in 0(000..00).

XOR of two different numbers x and y results in a number which contains set bits at the places where x and y differ. So if x and y are 10...0100 and 11...1001, then result would be 01...1101.

So the idea is to XOR all the elements in set. In the result xor, all repeating elements would nullify each other. The result would contain the set bits where two non-repeating elements differ.

Now, if we take any set bit of the result xor and again do XOR of the subset where that particular bit is set, we get the one non-repeating element. And for other non-repeating element we can take the subset where that particular bit is not set.

We have chosen the rightmost set bit of the xor as it is easy to find out.

by Sandeep


这个帖子和里面的code基本把要点都讲到了,非常好。其实,昨天面试官出的题目本质上就是这个题。设想9998个数的数值是1到10000,缺了两个。然后我再补上连续的1到10000,原来找漏的问题不就是变成找出两个non-repeated numbers了吗?


对于缺两个数的情形,程锦后来提出一个巧妙的算法:在1~10000中,大致找到一个中点,比如5000这个数,分别计算1+...+5000 = A;5001+...+10000 = B。接着再把给定的9998个数分成两堆,一堆是1到5000,另一堆是5001到10000,分别求两堆的和,设为C和D。然后A-B,再C-D,如果两个差俱不为0,那么说明漏掉的数一边一个,得出的两个差值就是漏掉的两个数。如果任何一个差为0,就说明两个漏掉的数在另一堆里面。假设是在5001到10000这一堆里面,则下一步为在这一堆里面找一个中点,再重复上面的步骤。这样,总有一个时刻,会有两个差都不为0的时候。

这是个类似binary search的解法。虽然程锦没有CS的背景,可她有一颗比我聪明得多的脑袋,呵呵。

NetApp的Phone Interview惨败(三)

6. CIFS SMB protocol里面,当要read big chunks of data的时候,client端send出去的message type是什么?

这个我也不会。以前在工作中都是用Wireshark检查SMB的network trace,而且主要是关注authentication这块(所用的msg type似乎是SESSION SETUP ANDX REQUEST ),很少bug是发生在data transfer中的,所以也未曾关注过其msg type。不过,只要手上有一个SMB的trace,用Wireshark应该很容易看得出来的。

依稀记得这个msg type是“User Data”?不确切,网络上也找不到这方面的准确资料。留待以后手上有trace再说吧。Not a critical problem, after all.


7. 一个storage server的运行过程中,突然出现stop responding的现象。如果不看code,不看GDB的trace和应用程序的log,该怎样快速诊断出可能的原因?

当时面试官和我说的是这种问题,第一个想法应该是不是disk IO出问题了?也有可能是database的操作出了问题,因为database的操作和读写硬盘紧密相关。不过我想,不看log,不看trace,怎么能断定是disk IO的问题?莫非去看硬盘指示灯?

因为电面表现太差,也没好意思去详细问这个问题的思路。

Thursday, March 3, 2011

NetApp的Phone Interview惨败(二)

4. Perl里面,$_的含义是什么?

说实话,我的Perl都是自学的,工作中基本用不到。不过既然写进简历里面(号称精通Perl),就免不了要被问到。而且一问就问倒了。

以下是从Perl in A Nutshell的Section 4.4.1里面抄来的:

The most common special variable is $_, which contains the default input and pattern-searching string. For example:

foreach ('hickory','dickory','doc') {
        print;
}

The first time the loop is executed, "hickory" is printed. The second time around, "dickory" is printed, and the third time, "doc" is printed. That's because in each iteration of the loop, the current string is placed in $_ and is used by default by print. Here are the places where Perl will assume $_, even if you don't specify it:
  • Various unary functions, including functions such as ord and int, as well as the all file tests (-f, -d), except for -t, which defaults to STDIN.
  • Various list functions such as print and unlink.
  • The pattern-matching operations m//, s///, and tr/// when used without an =~ operator.
  • The default iterator variable in a foreach loop if no other variable is supplied.
  • The implicit iterator variable in the grep and map functions.  
  • The default place to put an input record when a line-input operation's result is tested by itself as the sole criterion of a while test (i.e., ). Note that outside of a while test, this does not happen.
说实话,还是看不太明白,毕竟没有真正在工作中应用过,印象不深。Interviewer看我这个粗浅的问题都答不上来,就没再往下问Perl的东西了。


5. C++里面,virtual class的作用是什么?

操,我只听说过virtual function,可从来没听说过virtual class,这下抓瞎了。听他的意思好象是用在inheritance里面,防止出现多个instance?我记不太清楚他说的答案了。

万能的Google又一次找到了答案:http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/topic/com.ibm.xlcpp8a.doc/language/ref/cplr135.htm

其实就是多重继承时,若中间层的class共有同一个祖先,则定义中间层级的class时,将他们声明为继承virtual祖先class就可以防止底层class(从多个中间层的class继承下来)拥有多个祖先class的copy。其实,Bjarne Stroustrup大神的C++ Bible之Section 15.2.4里面也谈到了这个问题。

NetApp的Phone Interview惨败(一)

1. What is the difference between a mutex and a spinlock?

当头第一棒,一下子就把我捶晕了。mutex我当然明白了,就是个binary semaphore。但是spinlock我从没接触过。

在网上搜索一番,总算找到了答案:
http://www.alexonlinux.com/pthread-spinlocks
http://www.alexonlinux.com/pthread-mutex-vs-pthread-spinlock

According to the author, spinlock is more effective (than mutex) in term of CPU consumption and speed.

Things to note:

(1) Spinlocks perform better than mutexes in this particular setup, on multi-processor computer.
(2) Spinlocks are totally useless on uni-processor computers. This is due to a nature of spinlocks.


2. GDB调试程序的时候,attach之后,怎样显示thread information?

这个我也傻眼了。以前用GDB的时候最多就是用backtrace(bt)命令检查一下coredump里面的信息,还真没用过attach,更不要说显示线程信息了。

还是Google帮忙:
http://sourceware.org/gdb/onlinedocs/gdb/Threads.html
http://developer.apple.com/library/mac/#documentation/DeveloperTools/gdb/gdb/gdb_5.html
http://stackoverflow.com/questions/47701/is-there-a-way-to-attach-a-debugger-to-a-multi-threaded-python-process

The most basic command is "info threads".


3. In network socket programming, what are the basic APIs?

我提到先用socket()做initialization,再分配address,whether it is IPv4 or IPv6。接着在server端是listen() and accept(),在client端,开始也是initialization,接着应该是某个send request的function,但是具体我记不清了,以前在Richard Stevens的Unix Network Programming, 2nd Edition, Volume 1里面见过。

Interviewer后来说client端send出request的function是connect(),server端还缺少一个bind()。

考这个题我稍微有点意见,何必问这种书上一查就有的东西呢?当然,如果说从这个问题察看我网络编程的熟练程度,我也无话可说。

还是查一下Stevens的书吧,先看server的例子:

1 #include "unp.h"

2 int
3 main(int argc, char **argv)
4 {
5     int listenfd, connfd;
6     pid_t childpid;
7     socklen_t clilen;
8     struct sockaddr_in cliaddr, servaddr;

9     listenfd = Socket (AF_INET, SOCK_STREAM, 0);

10   bzero(&servaddr, sizeof(servaddr));
11   servaddr.sin_family = AF_INET;
12   servaddr.sin_addr.s_addr = htonl (INADDR_ANY);
13   servaddr.sin_port = htons (SERV_PORT);

14   Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));

15   Listen(listenfd, LISTENQ);

16   for ( ; ; ) {
17       clilen = sizeof(cliaddr);
18       connfd = Accept(listenfd, (SA *) &cliaddr, &clilen);

19       if ( (childpid = Fork()) == 0) { /* child process */
20           Close(listenfd); /* close listening socket */
21           str_echo(connfd); /* process the request */
22           exit (0);
23       }
24       Close(connfd); /* parent closes connected socket */
25   }
26 }

1 #include "unp.h"

2 void
3 str_echo(int sockfd)
4 {
5     ssize_t n;
6     char buf[MAXLINE];

7   again:
8     while ( (n = read(sockfd, buf, MAXLINE)) > 0)
9         Writen(sockfd, buf, n);

10   if (n < 0 && errno == EINTR)
11       goto again;
12   else if (n < 0)
13       err_sys("str_echo: read error");
14 }

摘自Section 5.2 and 5.3,code都是self explanatory的,不用多说什么。

再来看看client端的code:

1 #include "unp.h"

2 int
3 main(int argc, char **argv)
4 {
5     int sockfd;
6     struct sockaddr_in servaddr;

7     if (argc != 2)
8         err_quit("usage: tcpcli ");

9     sockfd = Socket(AF_INET, SOCK_STREAM, 0);

10   bzero(&servaddr, sizeof(servaddr));
11   servaddr.sin_family = AF_INET;
12   servaddr.sin_port = htons(SERV_PORT);
13   Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);

14   Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));

15   str_cli(stdin, sockfd); /* do it all */

16   exit(0);
17 }

1 #include "unp.h"

2 void
3 str_cli(FILE *fp, int sockfd)
4 {
5     char sendline[MAXLINE], recvline[MAXLINE];

6     while (Fgets(sendline, MAXLINE, fp) != NULL) {
7         Writen(sockfd, sendline, strlen (sendline));

8         if (Readline(sockfd, recvline, MAXLINE) == 0)
9             err_quit("str_cli: server terminated prematurely");

10       Fputs(recvline, stdout);
11   }
12 }

摘自Section 5.4 and 5.5,也不用多作解释。没啥办法,手不熟,难怪会被问倒。

Wikipedia上面有个简洁的介绍:http://en.wikipedia.org/wiki/Berkeley_sockets,可供快速参考。

后来interviewer又问我,如果server有两个port可以accept connection from client,应该用怎样的function?我提到select(),他认可了,没有追究怎样实现的细节。问了我也不会。

这篇文章(http://stackoverflow.com/questions/3655053/python-listen-on-two-ports)谈到一些,不过细节还不是很深入。UNP第1卷第6章I/O Multiplexing也可以参考。

Basically, the idea is to create two sockets with each port, and listen on these two sockets. Then select() will let you know which socket is ready and when, so you can accept() appropriately.

这个Unix Socket FAQ论坛(http://developerweb.net/viewtopic.php?pid=23096)里面的mlampkin也简明扼要地列出了以下的要点:

The select function is designed to handle multiple sockets so the only changes you would need to make are:

* Create 3 more server sockets ( bound / listening ) with the new addresses - port pairs.
* Change the select nfds parameter from 1 to 4, add the new server sockets to the readfds group and potentially the errorfds group.
* When you call select and it returns, just iterate thru the readfds group with FD_ISSET to determine which socket actually is accepting a new connection.

While you could add a lot more complexity, in simplest terms that really should be about it...

by Michael