Thursday, October 18, 2012

Interview Question at LSSiData

There is a huge database listing of around 100 million records, divided into some 200 plain text files, with the records in CSV format.

Given 5000 telephone numbers, how do you retrieve the records matching those telephone numbers?

Brute force is what comes to mind first:
For each given telephone number, read the CSV files one line at a time (each record is one line in a CSV file) and extract the telephone number field. Compare the field with the given telephone number. If there is a match, write the record to an output file; otherwise, discard it.

Shortcoming: this needs 5000 passes over the 100 million records, one pass per telephone number, or roughly 5 × 10^11 comparisons in total, which is a performance nightmare.
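
To make the cost concrete, here is a minimal Python sketch of the brute-force approach. The function name `brute_force_lookup`, the `phone_column` parameter, and the use of in-memory `io.StringIO` objects in place of the 200 real files are all assumptions for illustration; the actual record layout was not specified in the interview.

```python
import csv
import io


def brute_force_lookup(csv_streams, wanted_numbers, phone_column=0):
    """Scan every record once per wanted number (hypothetical helper).

    csv_streams: seekable text streams standing in for the CSV files.
    phone_column: assumed index of the telephone number field.
    """
    matches = []
    for number in wanted_numbers:        # 5000 iterations in the interview setting
        for stream in csv_streams:
            stream.seek(0)               # rewind: every number triggers a full pass
            for record in csv.reader(stream):
                if len(record) > phone_column and record[phone_column] == number:
                    matches.append(record)
    return matches


# Made-up sample data: two tiny "files" instead of 200 large ones.
files = [
    io.StringIO("555-0001,Alice\n555-0002,Bob\n"),
    io.StringIO("555-0003,Carol\n555-0001,Dave\n"),
]
for row in brute_force_lookup(files, ["555-0001", "555-0003"]):
    print(row)
```

The outer loop over `wanted_numbers` is what multiplies the work: the full data set is re-read once for every number.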

How can we improve the performance? A hash table is the answer:
Organize the 5000 telephone numbers into a hash table: hash each telephone number to an index into an array of buckets, and store the number in that bucket. Then, as we examine each record, we hash its telephone number field and look it up in the table, which takes constant time on average. If there is a match, we append the record to the linked list associated with that key. This way, we need to traverse the 100 million records only once.
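
The single-pass idea can be sketched in Python as follows. This is only an illustration under stated assumptions: Python's built-in `dict` serves as the hash table, a plain `list` stands in for the linked list of matching records, and the names `hash_lookup` and `phone_column`, as well as the in-memory `io.StringIO` "files", are hypothetical.

```python
import csv
import io


def hash_lookup(csv_streams, wanted_numbers, phone_column=0):
    """Collect matching records in one pass over the data (hypothetical helper).

    Returns a dict mapping each wanted number to the list of its records.
    phone_column: assumed index of the telephone number field.
    """
    # Build the table once: one empty bucket of records per wanted number.
    table = {number: [] for number in wanted_numbers}

    for stream in csv_streams:           # each file is read exactly once
        for record in csv.reader(stream):
            if len(record) <= phone_column:
                continue                 # skip blank or malformed lines
            bucket = table.get(record[phone_column])  # average O(1) lookup
            if bucket is not None:
                bucket.append(record)
    return table


# Made-up sample data: two tiny "files" instead of 200 large ones.
files = [
    io.StringIO("555-0001,Alice\n555-0002,Bob\n"),
    io.StringIO("555-0003,Carol\n555-0001,Dave\n"),
]
for number, records in hash_lookup(files, ["555-0001", "555-0003"]).items():
    print(number, records)
```

With this layout the work is about one lookup per record, so it grows with the 100 million records rather than with the product of records and telephone numbers.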

For details on hash tables, please refer to Sections 2.9 and 3.3 of "The Practice of Programming" by Brian Kernighan and Rob Pike.

Thursday, March 8, 2012

Sorting Through Some Old Photos

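On April 6, 2011, the church rented a barbecue shelter in a State Park near Raleigh, NC for an event. This photo was taken with Dadi's little son, and Monkey squeezed in to make a funny face too.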

So chubby, really adorable.



A photo taken around April 11, 2011 with our host family during a church event in Atlanta. This is the host family's son.

The one on my left is the brother who hosted us this time, Yue Wangjun. Many thanks to them for their hospitality.

Wednesday, February 29, 2012

The Last Day of February

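I'm fairly happy today because I solved an OpenLDAP problem on my own: making the certificate option LDAP_OPT_X_TLS_REQUIRE_CERT for SSL/TLS connections take effect separately for each connection/binding, instead of overriding the setting for the whole life span of the process once it is set. I spent quite some time searching online and finally found the answer here: http://www.mailinglistarchive.com/postfix-users@postfix.org/msg57688.html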

It seems the search skills I have built up hunting for resources online were not wasted; they come in handy at work too, hehe.

Here is a photo of Beibei, a snapshot taken during a break in our soccer game on the lawn in front of Lexmark Building 082:

Saturday, January 21, 2012

Year of the Rabbit in Review

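Overall, the past year was pretty good. Although I stayed at home for nearly a year, I also got to enjoy time with my wife and son, and the happiness of an ordinary life.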

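During this time, I also finished entering the second SGF edition of 《围棋死活辞典》 (the Go life-and-death dictionary, Volume 1, compiled by Cho Chikun). It is much more refined than the first edition, and if you study it carefully, it is a great help for improving at Go.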

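I also joined many book forums, downloaded quite a few good books, and got to know quite a few fellow readers, even though I have no idea when I will be able to settle down and read all these books properly. Still, the feeling of "I have it, I can" is really nice. And the exchanges with fellow readers are a pleasure too.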

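Most importantly, I found a job near the end of the year, so I finally have an income. Perhaps the good wishes from fellow players on the Feiyang Go forum did the trick. At the start of the year, when I finished entering 《围棋死活辞典》 and posted it on the Feiyang forum, I mentioned that I was out of work, which was why I had the time for such a tedious task. Many players wished me a job in the Year of the Rabbit, and I did not expect it to actually come true. I have mixed feelings about it all.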

Xiaobao celebrating a belated birthday with Dad in Lexington