Friday, April 29, 2011

Facbook Algorithm

The Social Network (2010) - FaceMash Algorithm





I saw Social Network three times in 1 week. Not for entertainment. Not because I had nothing better to do. But because I wanted to understand the math and computer science fundae used in the movie. I would like to discuss one particularly interesting scene from the movie.

You may remember Mark inviting his friend Eduardo to give him his chess algorithm at the beginning of the movie (Mark was drinking, blogging and hacking simultaneously and creating Facemash.com). You may also remember the scribbles on the window:




and



What is this? This is actually the math behind Elo Rating System. Elo rating system is a method for calculating the relative skill levels of players in two-player games such as chess. It is named after its creator Arpad Elo, a Hungarian-born American physics professor.

As explained in the movie, Facemash was quite simple. Not unlike hotornot.com, students went to a page and 2 random images of girls were picked and presented to them. The students then had to click on the hottest girl presented to them and then another set of two girls would be presented asking the students to repeat the same actions they had done. The difference with hotornot was that the girls presented were all Harvard students. In other words, the students were rating girls of Harvard based on their looks (You can imagine why Mark got into trouble).

The algorithm used - The Elo Rating system - insured a fair rating and ranking of the girls. A hot girl A winning over hot girl B girl would gain more points than winning (or being picked) against ugly girl C. Same goes for the following: ugly girl C wins over ugly girl D. ugly girl C gains some points in her general ranking. If ugly girl C wins over hot girl A then ugly girl C gains more points because the ranking of hot girl A is much higher than ugly girl D. The previous scenario is roughly what the algorithm implemented by Mark was doing. It was somewhat insuring a level of fairness despite the misogynistic nature of the product.

In today’s society, the Elo Rating system is used by many rating and ranking system to predict the outcome of matches but also insure a level of fairness between teams of different levels playing against each others. FIFA uses it to rank football teams.

You can read more about the system on wikipedia page(http://en.wikipedia.org/wiki/Elo_rating_system).

Wednesday, April 13, 2011

Problem Set-3 Dynamic Programming

Dynamic Programming Practice Problems
This site contains a collection of practice dynamic programming problems and their solutions. The problems listed below are also available in a pdf handout. To view the solution to one of the problems below, click on its title. To view the solutions, you'll need a machine which can view Macromedia Flash animations and which has audio output. If you want, you can also view a quick review from recitation on how to solve the integer knapsack problem (with multiple copies of items allowed) using dynamic programming.
Problems:

1. Maximum Value Contiguous Subsequence. Given a sequence of n real numbers A(1) ... A(n), determine a contiguous subsequence A(i) ... A(j) for which the sum of elements in the subsequence is maximized.

2. Making Change. You are given n types of coin denominations of values v(1) < v(2) < ... < v(n) (all integers). Assume v(1) = 1, so you can always make change for any amount of money C. Give an algorithm which makes change for an amount of money C with as few coins as possible. [on problem set 4]

3. Longest Increasing Subsequence. Given a sequence of n real numbers A(1) ... A(n), determine a subsequence (not necessarily contiguous) of maximum length in which the values in the subsequence form a strictly increasing sequence. [on problem set 4]

4. Box Stacking. You are given a set of n types of rectangular 3-D boxes, where the i^th box has height h(i), width w(i) and depth d(i) (all real numbers). You want to create a stack of boxes which is as tall as possible, but you can only stack a box on top of another box if the dimensions of the 2-D base of the lower box are each strictly larger than those of the 2-D base of the higher box. Of course, you can rotate a box so that any side functions as its base. It is also allowable to use multiple instances of the same type of box.

5. Building Bridges. Consider a 2-D map with a horizontal river passing through its center. There are n cities on the southern bank with x-coordinates a(1) ... a(n) and n cities on the northern bank with x-coordinates b(1) ... b(n). You want to connect as many north-south pairs of cities as possible with bridges such that no two bridges cross. When connecting cities, you can only connect city i on the northern bank to city i on the southern bank. (Note: this problem was incorrectly stated on the paper copies of the handout given in recitation.)

6. Integer Knapsack Problem (Duplicate Items Forbidden). This is the same problem as the example above, except here it is forbidden to use more than one instance of each type of item.

7. Balanced Partition. You have a set of n integers each in the range 0 ... K. Partition these integers into two subsets such that you minimize |S1 - S2|, where S1 and S2 denote the sums of the elements in each of the two subsets.

8. Edit Distance. Given two text strings A of length n and B of length m, you want to transform A into B with a minimum number of operations of the following types: delete a character from A, insert a character into A, or change some character in A into a new character. The minimal number of such operations required to transform A into B is called the edit distance between A and B.

9. Counting Boolean Parenthesizations. You are given a boolean expression consisting of a string of the symbols 'true', 'false', 'and', 'or', and 'xor'. Count the number of ways to parenthesize the expression such that it will evaluate to true. For example, there is only 1 way to parenthesize 'true and false xor true' such that it evaluates to true.

10. Optimal Strategy for a Game. Consider a row of n coins of values v(1) ... v(n), where n is even. We play a game against an opponent by alternating turns. In each turn, a player selects either the first or last coin from the row, removes it from the row permanently, and receives the value of the coin. Determine the maximum possible amount of money we can definitely win if we move first.

11. Two-Person Traversal of a Sequence of Cities. You are given an ordered sequence of n cities, and the distances between every pair of cities. You must partition the cities into two subsequences (not necessarily contiguous) such that person A visits all cities in the first subsequence (in order), person B visits all cities in the second subsequence (in order), and such that the sum of the total distances travelled by A and B is minimized. Assume that person A and person B start initially at the first city in their respective subsequences.

12. Bin Packing (Simplified Version). You have n1 items of size s1, n2 items of size s2, and n3 items of size s3. You'd like to pack all of these items into bins each of capacity C, such that the total number of bins used is minimized.

Source http://people.csail.mit.edu/bdean/6.046/dp/
http://net.pku.edu.cn/~course/cs101/2007/resource/Intro2Algorithm/book6/chap16.htm
http://www.cs.berkeley.edu/~vazirani/algorithms/chap6.pdf
http://www.ms.unimelb.edu.au/~moshe/dp/
http://web.iiit.ac.in/~avidullu/pdfs/dynprg/
http://coders-stop.blogspot.com/search/label/Dynamic%20Programming

Sunday, April 10, 2011

Study of Advance data Structure & Algorithm With Code

http://www.stanford.edu/~blp/avl/libavl.html/index.html#toc_AVL-Trees

Problems Set 1- Data Structure

Data Structures

K Nearest Neighbor Algorithm Implementation and Overview

http://www.codeproject.com/KB/recipes/K_nearest_neighbor.aspx

Stacks, Queues, and Lists


3-1. A common problem for compilers and text editors is determining whether the parentheses in a string are balanced and properly nested. For example, the string ((())())() contains properly nested pairs of parentheses, which the strings )()( and ()) do not. Give an algorithm that returns true if a string contains properly nested and balanced parentheses, and false if otherwise. For full credit, identify the position of the first offending parenthesis if the string is not properly nested and balanced.

(Solution 3.1)


3-2. Write a program to reverse the direction of a given singly-linked list. In other words, after the reversal all pointers should now point backwards. Your algorithm should take linear time.


(Solution 3.2)


3-3. We have seen how dynamic arrays enable arrays to grow while still achieving constant-time amortized performance. This problem concerns extending dynamic arrays to let them both grow and shrink on demand.

1. Consider an underflow strategy that cuts the array size in half whenever the array falls below half full. Give an example sequence of insertions and deletions where this strategy gives a bad amortized cost.
2. Then, give a better underflow strategy than that suggested above, one that achieves constant amortized cost per deletion.

(Solution 3.3)


Trees and Other Dictionary Structures


3-4. Design a dictionary data structure in which search, insertion, and deletion can all be processed in O(1) time in the worst case. You may assume the set elements are integers drawn from a finite set 1,2,..,n, and initialization can take O(n) time.

(Solution 3.4)


3-5. Find the overhead fraction (the ratio of data space over total space) for each of the following binary tree implementations on n nodes:

1. All nodes store data, two child pointers, and a parent pointer. The data field requires four bytes and each pointer requires four bytes.
2. Only leaf nodes store data; internal nodes store two child pointers. The data field requires four bytes and each pointer requires two bytes.

(Solution 3.5)


3-6. Describe how to modify any balanced tree data structure such that search, insert, delete, minimum, and maximum still take O(logn) time each, but successor and predecessor now take O(1) time each. Which operations have to be modified to support this?


3-7. Suppose you have access to a balanced dictionary data structure, which supports each of the operations search, insert, delete, minimum, maximum, successor, and predecessor in O(logn) time. Explain how to modify the insert and delete operations so they still take O(logn) but now minimum and maximum take O(1) time. (Hint: think in terms of using the abstract dictionary operations, instead of mucking about with pointers and the like.)

(Solution 3.7)


3-8. Design a data structure to support the following operations:

1. insert(x,T) -- Insert item x into the set T.
2. delete(k,T) -- Delete the kth smallest element from T.
3. member(x,T) -- Return true iff x \in T.

All operations must take O(logn) time on an n-element set.


3-9. A concatenate operation takes two sets S1 and S2, where every key in S1 is smaller than any key in S2, and merges them together. Give an algorithm to concatenate two binary search trees into one binary search tree. The worst-case running time should be O(h), where h is the maximal height of the two trees.

(Solution 3.9)


Applications of Tree Structures


3-10. In the bin-packing problem, we are given n metal objects, each weighing between zero and one kilogram. Our goal is to find the smallest number of bins that will hold the n objects, with each bin holding one kilogram at most.

1. The best-fit heuristic for bin packing is as follows. Consider the objects in the order in which they are given. For each object, place it into the partially filled bin with the smallest amount of extra room after the object is inserted.. If no such bin exists, start a new bin. Design an algorithm that implements the best-fit heuristic (taking as input the n weights w1,w2,...,wn and outputting the number of bins used) in O(nlogn) time.
2. Repeat the above using the worst-fit heuristic, where we put the next object in the partially filled bin with the largest amount of extra room after the object is inserted.


3-11. Suppose that we are given a sequence of n values x1,x2,...,xn and seek to quickly answer repeated queries of the form: given i and j, find the smallest value in x_i,\ldots,x_j.

1. Design a data structure that uses O(n2) space and answers queries in O(1) time.
2. Design a data structure that uses O(n) space and answers queries in O(logn) time. For partial credit, your data structure can use O(nlogn) space and have O(logn) query time.

(Solution 3.11)


3-12. Suppose you are given an input set S of n numbers, and a black box that if given any sequence of real numbers and an integer k instantly and correctly answers whether there is a subset of input sequence whose sum is exactly k. Show how to use the black box O(n) times to find a subset of S that adds up to k.


3-13. Let A[1..n] be an array of real numbers. Design an algorithm to perform any sequence of the following operations:

1. Add(i,y) -- Add the value y to the ith number.
2. Partial-sum(i) -- Return the sum of the first i numbers, i.e.

\sum_{j=1}^i A[j]. There are no insertions or deletions; the only change is to the values of the numbers. Each operation should take O(logn) steps. You may use one additional array of size n as a work space.

(Solution 3.13)


3-14. Extend the data structure of the previous problem to support insertions and deletions. Each element now has both a key and a value. An element is accessed by its key. The addition operation is applied to the values, but the elements are accessed by its key. The Partial-sum operation is different.

1. Add(k,y) -- Add the value y to the item with key k.
2. Insert(k,y) -- Insert a new item with key k and value y.
3. Delete(k) -- Delete the item with key k.
4. Partial-sum(k) --

Return the sum of all the elements currently in the set whose key is less than y, i.e. \sum_{x_j

3-15. Design a data structure that allows one to search, insert, and delete an integer X in O(1) time (i.e., constant time, independent of the total number of integers stored). Assume that 1 \leq X \leq n and that there are m + n units of space available, where m is the maximum number of integers that can be in the table at any one time. (Hint: use two arrays A[1..n] and B[1..m].) You are not allowed to initialize either A or B, as that would take O(m) or O(n) operations. This means the arrays are full of random garbage to begin with, so you must be very careful.

(Solution 3.15)


Implementation Projects


3-16. Implement versions of several different dictionary data structures, such as linked lists, binary trees, balanced binary search trees, and hash tables. Conduct experiments to assess the relative performance of these data structures in a simple application that reads a large text file and reports exactly one instance of each word that appears within it. This application can be efficiently implemented by maintaining a dictionary of all distinct words that have appeared thus far in the text and inserting/reporting each word that is not found. Write a brief report with your conclusions.


3-17. A Caesar shift (see cryptography) is a very simple class of ciphers for secret messages. Unfortunately, they can be broken using statistical properties of English. Develop a program capable of decrypting Caesar shifts of sufficiently long texts.

(Solution 3.17)

Interview Problems


3-18. What method would you use to look up a word in a dictionary?


3-19. Imagine you have a closet full of shirts. What can you do to organize your shirts for easy retrieval?

(Solution 3.19)


3-20. Write a function to find the middle node of a singly-linked list.

(Solution 3.20)


3-21. Write a function to compare whether two binary trees are identical. Identical trees have the same key value at each position and the same structure.

(Solution 3.21)


3-22. Write a program to convert a binary search tree into a linked list.


(Solution 3.22)


3-23. Implement an algorithm to reverse a linked list. Now do it without recursion.

(Solution 3.23)


3-24. What is the best data structure for maintaining URLs that have been visited by a Web crawler? Give an algorithm to test whether a given URL has already been visited, optimizing both space and time.

(Solution 3.24)


3-25. You are given a search string and a magazine. You seek to generate all the characters in search string by cutting them out from the magazine. Give an algorithm to efficiently determine whether the magazine contains all the letters in the search string.

(Solution 3.25)


3-26. Reverse the words in a sentence---i.e., My name is Chris becomes Chris is name My. Optimize for time and space.


3-27. Determine whether a linked list contains a loop as quickly as possible without using any extra storage. Also, identify the location of the loop.

(Solution 3.27)


3-28. You have an unordered array X of n integers. Find the array M containing n elements where Mi is the product of all integers in X except for Xi. You may not use division. You can use extra memory. (Hint: There are solutions faster than O(n2).)


3-29. Give an algorithm for finding an ordered word pair (e.g., New York) occurring with the greatest frequency in a given webpage. Which data structures would you use? Optimize both time and space.

(Solution 3.29)

Source For Solution

http://www2.algorithm.cs.sunysb.edu/mediawiki/index.php/Data-structures-TADM2E-2