Monday, July 4, 2011

Google Interview Question Set-3

1. Given a linked list structure where every node represents a linked list and contains two pointers of its type:
(i) a pointer to the next node in the main list;
(ii) a pointer to a linked list of which this node is the head.
Write a C function to flatten the list into a single linked list.
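One common way to do this (sketched here in Java rather than C, with assumed field names `next` and `down`) is to splice each node's sub-list in right after the node itself; since the loop then walks into the spliced nodes, nested sub-lists get flattened too:

```java
// Hypothetical node type for this problem; the field names are assumptions.
class ListNode {
    int val;
    ListNode next;  // next node in the main list
    ListNode down;  // head of the sub-list hanging off this node
    ListNode(int v) { val = v; }
}

class Flattener {
    // Splice each node's down-list in immediately after the node. The loop
    // continues into the spliced nodes, so nested sub-lists are handled too.
    // O(total number of nodes) overall.
    static ListNode flatten(ListNode head) {
        for (ListNode cur = head; cur != null; cur = cur.next) {
            if (cur.down != null) {
                ListNode tail = cur.down;
                while (tail.next != null) tail = tail.next;
                tail.next = cur.next;   // sub-list now sits before the old next
                cur.next = cur.down;
                cur.down = null;
            }
        }
        return head;
    }
}
```

The same splicing logic translates directly to a C function over a struct with `next` and `down` pointers.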

2. Hundreds of millions of irregularly shaped objects are moving in random directions. Provide a data structure and algorithm to detect collisions.

3. You are given an array of integers A[1..n] and a maximum sliding window of size w. You need to output an array B where B[i] is the maximum of A[i..i+w-1].
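This is commonly solved in O(n) with a deque that keeps indices of candidates in decreasing value order, so the front of the deque is always the current window maximum. A sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SlidingWindowMax {
    // Monotonic deque: it holds indices whose values are strictly decreasing,
    // so the front is the maximum of the current window. Each index is pushed
    // and popped at most once, giving O(n) total.
    static int[] maxSlidingWindow(int[] a, int w) {
        int n = a.length;
        int[] b = new int[n - w + 1];
        Deque<Integer> dq = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            // Drop smaller values from the back; they can never be a max again.
            while (!dq.isEmpty() && a[dq.peekLast()] <= a[i]) dq.pollLast();
            dq.addLast(i);
            // Drop the front if it has slid out of the window.
            if (dq.peekFirst() <= i - w) dq.pollFirst();
            if (i >= w - 1) b[i - w + 1] = a[dq.peekFirst()];
        }
        return b;
    }
}
```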

4. [Design] Find duplicate profiles in Facebook.

5. You have an array of 0s and 1s, and you want to output all the intervals (i, j) where the number of 0s and the number of 1s are equal.

6. You have a stream of random numbers that are inserted into an expanding array. How can you maintain the current median, and what is the complexity?
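The usual answer is a pair of heaps: a max-heap for the lower half of the numbers and a min-heap for the upper half. Each insert is O(log n) and reading the median is O(1). A sketch:

```java
import java.util.Collections;
import java.util.PriorityQueue;

class RunningMedian {
    // Invariant: every element of lo <= every element of hi, and
    // lo.size() is either equal to hi.size() or one larger.
    private final PriorityQueue<Integer> lo = new PriorityQueue<>(Collections.reverseOrder()); // max-heap
    private final PriorityQueue<Integer> hi = new PriorityQueue<>(); // min-heap

    void add(int x) {
        lo.offer(x);
        hi.offer(lo.poll());                            // push the largest of lo up
        if (hi.size() > lo.size()) lo.offer(hi.poll()); // rebalance sizes
    }

    double median() {
        return lo.size() > hi.size() ? lo.peek() : (lo.peek() + hi.peek()) / 2.0;
    }
}
```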

7. Given n arrays, pick one number from each array such that the sum of their pairwise absolute differences is minimum. For example, if there are three arrays

A = {4, 10, 15, 20}
B = {1, 13, 29}
C = {5, 14, 28}

find three numbers a, b, c (one from each array) such that |a-b| + |b-c| + |c-a| is minimum. Here the answer is a = 15, b = 13, and c = 14.
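A useful observation: |a-b| + |b-c| + |c-a| always equals 2 * (max - min) of the triple, so it is enough to minimize the spread. With one pointer per sorted array, always advancing the pointer at the smallest current value gives an O(n) sketch (assuming the arrays are sorted, as in the example):

```java
class MinDiffTriple {
    // |a-b| + |b-c| + |c-a| == 2 * (max - min), so minimize max - min.
    // Requires the three arrays to be sorted ascending.
    static int[] pick(int[] A, int[] B, int[] C) {
        int i = 0, j = 0, k = 0;
        int[] best = {A[0], B[0], C[0]};
        int bestSpread = Integer.MAX_VALUE;
        while (i < A.length && j < B.length && k < C.length) {
            int a = A[i], b = B[j], c = C[k];
            int spread = Math.max(a, Math.max(b, c)) - Math.min(a, Math.min(b, c));
            if (spread < bestSpread) { bestSpread = spread; best = new int[]{a, b, c}; }
            // Advance the pointer sitting at the smallest value: raising the
            // minimum is the only way the spread can shrink.
            if (a <= b && a <= c) i++;
            else if (b <= a && b <= c) j++;
            else k++;
        }
        return best;
    }
}
```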

8. [Coding] Write a C/C++ program to create a bitmap of any size, as determined by the user. If the user asks for a 64K bitmap, create a bitmap 64K bits long. Provide set and unset methods.
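The question asks for C/C++; here is the same idea sketched in Java (to match the rest of the code on this page), packing 64 flags per long. A C version would do the same shift-and-mask arithmetic over an array of unsigned longs:

```java
class Bitmap {
    // One bit per flag, 64 flags packed into each long.
    private final long[] words;

    Bitmap(int nbits) { words = new long[(nbits + 63) / 64]; }

    void set(int i)    { words[i >> 6] |=  (1L << (i & 63)); }
    void unset(int i)  { words[i >> 6] &= ~(1L << (i & 63)); }
    boolean get(int i) { return (words[i >> 6] & (1L << (i & 63))) != 0; }
}
```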

9. Reorder the first string based on the second string.
E.g., for (tractor, car) the output could be carrtto, carrott, or carrtot.

10. There is a straight road with n milestones. You are given an array of the distances between all pairs of milestones, in some random order. Find the positions of the milestones.
Example:
Consider a road with 4 milestones (a, b, c, d):
a ---- b ------ c -- d
Distance between a and b = 3
Distance between a and c = 8
Distance between a and d = 10
Distance between b and c = 5
Distance between b and d = 7
Distance between c and d = 2
All the above values are given in a random order, say 7, 10, 5, 2, 8, 3.
The output must be 3, 5, 2 (or 2, 5, 3).

11. Given a set of KEY->VALUE pairs such that each KEY is unique, describe a method of storing these pairs on disk, and a method for accessing the corresponding VALUE given a KEY. Assume that RAM is fixed at 1gb and the set of pairs requires 40gb.

12. Given an array with duplicates and two numbers as input, return the minimum distance between occurrences of the two numbers in the array, with minimum complexity.

13. You are given an array of integers, say array1 of size n. You have to create another array (say array2) and fill it by the following rule: array2[i] holds the value of the left-most integer in the range array1[i+1] to array1[n-1] that is greater than array1[i]; if no such element exists, assign -1.
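This is the classic "next greater element to the right" problem: the left-most greater element in array1[i+1..n-1] is simply the first greater element encountered when scanning right from i. A stack of indices still waiting for a greater value solves it in O(n). A sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class NextGreater {
    // Stack-based next-greater-to-the-right pass: the stack holds indices
    // still waiting for a greater value. Each index is pushed and popped at
    // most once, so the whole pass is O(n).
    static int[] nextGreater(int[] a) {
        int n = a.length;
        int[] res = new int[n];
        java.util.Arrays.fill(res, -1);     // default when no greater element exists
        Deque<Integer> waiting = new ArrayDeque<>();
        for (int j = 0; j < n; j++) {
            while (!waiting.isEmpty() && a[waiting.peek()] < a[j]) {
                res[waiting.pop()] = a[j];  // a[j] is the first greater value to the right
            }
            waiting.push(j);
        }
        return res;
    }
}
```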

14. An external system has an array. You are given 3 inbuilt functions which perform their operation in O(1) time.
a) get(i): returns ith element of array
b) length(): returns length of array
c) reverse(i, j): reverses elements of subarray starting from i and ending with j (both inclusive)

Write function SortArray() using these three functions.

15. Given a binary tree, find 2 leaf nodes say X and Y such that F(X,Y) is maximum where F(X,Y) = sum of nodes in the path from root to X + sum of nodes in the path from root to Y – sum of nodes in the common path from root to first common ancestor of the Nodes X and Y.

16. Given an array A, determine whether all elements in the sorted version of A are consecutive, in less than O(n log n) time.
e.g. A: 5 4 1 2 3 — on sorting: 1 2 3 4 5, all elements are consecutive → true
A: 1 9 2 22 — on sorting: 1 2 9 22, all elements are NOT consecutive → false
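One O(n)-time approach (using O(n) extra space): the sorted values are consecutive exactly when max - min == n - 1 and the array contains no duplicates. A sketch:

```java
class ConsecutiveCheck {
    // O(n) time, O(n) space: consecutive iff max - min == n - 1 and
    // there are no duplicate values.
    static boolean allConsecutive(int[] a) {
        int n = a.length, min = a[0], max = a[0];
        for (int x : a) { min = Math.min(min, x); max = Math.max(max, x); }
        if (max - min != n - 1) return false;
        boolean[] seen = new boolean[n];
        for (int x : a) {
            if (seen[x - min]) return false;  // a duplicate breaks consecutiveness
            seen[x - min] = true;
        }
        return true;
    }
}
```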

17. Given a string (assume there are no spaces or punctuation), write code that returns the maximum length of a substring that is repeated more than once.

18. Given a network of the employees of a company, with edges between employees who are friends with each other, divide the employees into two teams such that no two people who are friends are on the same team.
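This is asking whether the friendship graph is bipartite, which BFS 2-coloring answers in O(V + E). A sketch, assuming the network is given as an adjacency list:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

class TeamSplit {
    // BFS 2-coloring: the split exists iff the friendship graph is bipartite.
    // Returns the team (0 or 1) per employee, or null if an odd cycle makes
    // the split impossible.
    static int[] twoTeams(List<List<Integer>> adj) {
        int n = adj.size();
        int[] team = new int[n];
        java.util.Arrays.fill(team, -1);            // -1 = not yet assigned
        Queue<Integer> queue = new ArrayDeque<>();
        for (int s = 0; s < n; s++) {               // the graph may be disconnected
            if (team[s] != -1) continue;
            team[s] = 0;
            queue.add(s);
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int v : adj.get(u)) {
                    if (team[v] == -1) { team[v] = 1 - team[u]; queue.add(v); }
                    else if (team[v] == team[u]) return null;  // two friends on one team
                }
            }
        }
        return team;
    }
}
```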

Wednesday, June 29, 2011

Recursion World

A list of some of the best resources for learning recursion and backtracking and, most importantly, for starting to think recursively.

1. Recursion Basics, Solving Problems recursively, raise to the power example and mechanics of what is going to happen, Choosing a subset
Video Lecture

2. Thinking Recursively, Permute Code, More Examples, Tree of Recursive calls
Video Lecture http://www.youtube.com/watch?v=uFJhEPrbycQ&feature=PlayList&p=FE6E58F856038C69&index=8

3. Testing with different cases, subsets, subset strategy, Exhaustive Recursion, Recursive Backtracking, Anagram Finder, N Queens
Video Lecture http://www.youtube.com/watch?v=NdF1QDTRkck&feature=PlayList&p=FE6E58F856038C69&index=9

4. Backtracking Pseudocode, Sudoku Solver, Sudoku Code, Cryptarithmetic
Video Lecture http://www.youtube.com/watch?v=p-gpaIGRCQI&feature=PlayList&p=FE6E58F856038C69&index=10

Reference: http://see.stanford.edu/see/courseInfo.aspx?coll=11f4f422-5670-4b4c-889c-008262e09e4e (Stanford Engineering Everywhere)

Monday, June 27, 2011

Google Interview Shared -Question Set 2

1. Throughout this column we have used the simple definition that words are separated by white space. Many real documents, such as those in HTML or RTF, contain formatting commands. How could you deal with such commands? Are there any other tasks that you might need to perform?
2. On a machine with ample main memory, how could you use the C++ STL sets or maps to solve the searching problem in Section 13.8? How much memory does it consume compared to McIlroy's structure?
3. How much speedup can you achieve by incorporating the special-purpose malloc of Solution 9.2 into the hashing program of Section 15.1?
4. When a hash table is large and the hash function scatters the data well, almost every list in the table has only a few elements. If either of these conditions is violated, though, the time spent searching down the list can be substantial. When a new string is not found in the hash table in Section 15.1, it is placed at the front of the list. To simulate hashing trouble, set NHASH to 1 and experiment with this and other list strategies, such as appending to the end of the list, or moving the most recently found element to the front of the list.
5. When we looked at the output of the word frequency programs in Section 15.1, it was most interesting to print the words in decreasing frequency. How would you modify the C and C++ programs to accomplish this task? How would you print only the M most common words, where M is a constant such as 10 or 1000?
6. Given a new input string, how would you search a suffix array to find the longest match in the stored text? How would you build a GUI interface for this task?
7. Our program for finding duplicated strings was very fast for "typical" inputs, but it can be very slow (greater than quadratic) for some inputs. Time the program on such an input. Do such inputs ever arise in practice?
8. How would you modify the program for finding duplicated strings to find the longest string that occurs more than M times?
9. Given two input texts, find the longest string that occurs in both.
10. Show how to reduce the number of pointers in the duplication program by pointing only to suffixes that start on word boundaries. What effect does this have on the output produced by the program?
11. Implement a program to generate letter-level Markov text.
12. How would you use the tools and techniques of Section 15.1 to generate (order-0 or non-Markov) random text?
13. The program for generating word-level Markov text is at this book's web site. Try it on some of your documents.
14. How could you use hashing to speed up the Markov program?
15. Shannon's quote in Section 15.3 describes the algorithm he used to construct Markov text; implement that algorithm in a program. It gives a good approximation to the Markov frequencies, but not the exact form; explain why not. Implement a program that scans the entire string from scratch to generate each word (and thereby uses the true frequencies).
16. How would you use the techniques of this column to assemble a word list for a dictionary (the problem that Doug McIlroy faced in Section 13.8)? How would you build a spelling checker without using a dictionary? How would you build a grammar checker without using rules of grammar?
17. Investigate how techniques related to k-gram analysis are used in applications such as speech recognition and data compression.

Sunday, June 26, 2011

15 Puzzle Problem

Optimal 8/15-Puzzle Solver

The 8-puzzle is a classic problem in AI that can be solved with the A* algorithm.

* A* maintains two lists, called open and closed.
* At the beginning of the algorithm, the initial node is placed on the open list. The list is sorted according to an admissible heuristic that measures how close the state of the node is to the goal state.
* At each step, bestNode is removed from the open list. In this case, bestNode is always the head of the open list. If the state of bestNode is the goal state, then the algorithm terminates. Otherwise bestNode is expanded (we determine all the possible states that can be reached within a single move), and bestNode is placed on the closed list.
* The successors of bestNode are placed on the open list if either:
o the successor is not already on the open or closed list, or
o the successor is already on the open or closed list but has a lower cost.

Many 15-puzzles cannot practically be solved with the A* algorithm, since it generates too many new states and consumes a great deal of memory maintaining these lists. To solve this larger puzzle, Iterative-Deepening A* (IDA*) can be used. Like the A* algorithm, it finds optimal solutions when paired with an admissible heuristic, but it is much more efficient with respect to space. IDA* is described as follows:

* Set threshold equal to the heuristic evaluation of the initial state.
* Conduct a depth-first search, pruning a branch when the cost of the latest node exceeds threshold. If a solution is found during the search, return it.
* If no solution is found during the current iteration, increase threshold by the minimum amount by which it was exceeded, and go back to the previous step.

Three heuristics have been implemented for comparison's sake:

* Manhattan Distance
* Manhattan Distance + Linear Conflict (O. Hansson, A. Mayer, and M. Yung, "Criticizing Solutions to Relaxed Models Yields Powerful Admissible Heuristics," Information Sciences, Vol. 63, Issue 3, pp. 207-227, 1992)
* Additive Pattern Database Heuristic with Static 6-6-3 Partitioning (A. Felner, R. Korf, and S. Hanan, "Additive Pattern Database Heuristics," Journal of AI Research, Vol. 22, pp. 279-318, 2004)
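The first heuristic is simple enough to sketch. For each tile (blank excluded), add its horizontal plus vertical distance from its goal square; this is admissible because every move slides one tile exactly one square. The goal ordering below (tiles 1..n*n-1 with the blank last) is an assumption, not necessarily the applet's internal representation:

```java
class ManhattanDistance {
    // Sum over every tile (blank excluded) of its horizontal plus vertical
    // distance from its goal square. Board is row-major, 0 is the blank, and
    // the assumed goal is 1, 2, ..., n*n-1 followed by the blank.
    static int h(int[] board, int n) {
        int dist = 0;
        for (int i = 0; i < board.length; i++) {
            int tile = board[i];
            if (tile == 0) continue;
            int goal = tile - 1;  // goal index of this tile under the assumed ordering
            dist += Math.abs(i / n - goal / n) + Math.abs(i % n - goal % n);
        }
        return dist;
    }
}
```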


This JAR file is about 5.6 MB. Please be patient while it loads.
You must accept the certificate in order to run multi-threaded IDA*.

Download standalone version: PuzzleSolver.jar


15-Puzzle
Screenshot of 8/15-Puzzle Solver.


Source Code

* AStar.java
* AStarNode.java
* AboutDialog.java
* Algorithm.java
* Application.java
* ApplicationStarter.java
* BFSNode.java
* Centerable.java
* CenterableDialog.java
* DFSWorker.java
* Entry.java
* FibonacciHeap.java
* GUI.java
* IDAStar.java
* IDAStarNode.java
* ImagePanel.java
* MessageBox.java
* Node.java
* PatternDatabaseGenerator.java
* PrimitiveHashMap.java
* Puzzle.java
* PuzzleConfiguration.java
* PuzzleSolver.java
* Utility.java

Source http://www.brian-borowski.com/Software/Puzzle/

Implementation

This applet provides a means of comparing how different algorithms and heuristics perform on the sliding-tile puzzles. A* is more than acceptable for solving the 8-puzzle, with its 9! / 2 = 181,440 states and optimal solutions of up to 31 moves. It becomes painfully slow, however, for even average-length solutions of the 15-puzzle, which has 16! / 2 = 10,461,394,944,000 states and optimal solutions of up to 80 moves.

IDA* works well on most 15-puzzle configurations, especially when paired with the additive pattern database heuristic. 6-6-3 partitioning provides a good compromise between speed and the amount of memory required to hold the state-to-cost mappings, of which there are 11,534,880. The application runs slightly faster on 64-bit systems running a 64-bit JVM, since there are operations on long primitives.

Starting with version 2.2.0, multi-threaded IDA* can be used to more efficiently solve puzzles. This new algorithm starts with a breadth-first search of the tree down to a level that has a number of nodes greater than or equal to the number of threads. Any nodes that contain the same board configuration are pruned before starting the thread pool. Then multiple depth-first search workers crawl through the search space, looking for a solution of up to threshold moves, as discussed above.

The number of threads is currently hard-coded at twice the number of available cores. This seemingly unusual number has provided the best performance in most cases on my AMD Phenom II x6. Ideally, we want the algorithm to minimize CPU idle time. Idle time occurs toward the end of an iteration in IDA*, when the depth-first search workers start to taper off. Without more complicated code, the next iteration cannot start until we are certain there is no solution for the given threshold. Thus, starting more threads than cores keeps the CPU at max capacity over most of the iteration and generally performs better than starting fewer threads, even with the added cost of context switching.

Why bother with multi-threading? A single-threaded Java implementation of IDA* cannot compete with the same implementation in C. When profiling this code, one will observe that a large majority of execution time is spent looking up costs in Node.h(). The part of this method that implements the pattern database heuristic is simple, but Java's array bounds checking truly hinders performance. In a simple test of Java versus C (compiled with gcc -O2) where array elements are accessed in random order millions of times, C outperformed Java by a factor of 8. When applying the applet's pattern database heuristic, arrays are used only for cost lookups and to store the path to the goal state; everything else has been implemented with primitives and shift operations. So, in order to improve performance, one must turn to multi-threading.

Usage

The 8/15-Puzzle Solver is easy to use. The options should be straightforward. To enter the configuration of a puzzle, simply list the numbers of the tiles from left to right, top to bottom, each separated by a comma. Enter 0 for the space. For example, "2,4,0,1,8,5,3,6,7" is what you would enter for the board configuration below:

2 4
1 8 5
3 6 7

Note that you must use IDA* to solve 15-puzzles, as the A* option has been purposely disabled. Multi-threading is not available unless you accept the certificate (since the thread pool cannot be shut down with the default policy settings), and is not available for the 8-puzzle (since a single thread is more than sufficient).

Saturday, June 25, 2011

Google Interview Shared -Question Set 1

Question #1) Given k sorted streams where each stream could possibly be infinite in length, describe an efficient algorithm to merge the k streams into a new stream (also in sorted order).

Question #2) Given a set of KEY->VALUE pairs such that each KEY is unique, describe a method of storing these pairs on disk, and a method for accessing the corresponding VALUE given a KEY. Assume that RAM is fixed at 1gb and the set of pairs requires 40gb.

HINT: we are trying to minimize page-transfers

Question #3) Given N computers networked together, with each computer storing N integers, describe a procedure for finding the median of all of the numbers. Assume that a computer can only hold O(N) integers (i.e. no computer can store all N^2 integers). Also assume that there exists a computer on the network without integers, that we can use to interface with the computers storing the integers.

Question #4) Given the sequence S1 = {a,b,c,d,...,x,y,z,aa,ab,ac.... } and given that this sequence corresponds (term for term) to the sequence S2 = {1,2,3,4,....}
Write code to convert an element of S1 to the corresponding element of S2. Write code to convert an element of S2 to the corresponding element of S1.
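S1 is bijective base-26 (a = 1, ..., z = 26, aa = 27, ...); the subtlety versus ordinary base conversion is that there is no zero digit, hence the decrement when converting back. A sketch:

```java
class SequenceConvert {
    // S1 -> S2: bijective base-26, where a=1, ..., z=26, aa=27, ...
    static long toNumber(String s) {
        long n = 0;
        for (char c : s.toCharArray()) n = n * 26 + (c - 'a' + 1);
        return n;
    }

    // S2 -> S1: peel off digits; the n-- compensates for the missing zero digit.
    static String fromNumber(long n) {
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            n--;                                  // shift this digit to 0-based
            sb.append((char) ('a' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }
}
```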

Question #5) Given a binary tree with the following constraints:
a) A node has either both a left and right child OR no children
b) The right child of a node is either a leaf or NULL

write code to invert this tree. HINT: Draw this out

Question #6) Given a square with side length = 1, describe all points inside square that are closer to the center of the square than to the edge of the square.

Question #7) How many 0's are at the end of N!

hint: look at the prime factorization of N!
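Following the hint: each trailing zero comes from a factor 10 = 2 * 5, and factors of 2 are always more plentiful than factors of 5, so it suffices to count the factors of 5 in N!: the multiples of 5, plus the multiples of 25, and so on (Legendre's formula). A sketch:

```java
class TrailingZeros {
    // Count factors of 5 in n!: floor(n/5) + floor(n/25) + floor(n/125) + ...
    // p is a long so p *= 5 cannot overflow for any int n.
    static int count(int n) {
        int zeros = 0;
        for (long p = 5; p <= n; p *= 5) zeros += n / p;
        return zeros;
    }
}
```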

Question #8) Given an array A[string], an array of strings where each string represents a word in a text document. Also given 3 search terms T1, T2, and T3 and 3 corresponding sorted sequences of integers S1, S2, and S3 where each integer in Si represents an index in A where search term Ti occured (i.e. S1, S2, and S3 contain the locations of the search terms in our array of words). Now find a minimal subarray of A that contains all of the search terms T1, T2, and T3. Extend this algorithm for an arbitrary number of search terms.

hint: think of the brute force algorithm first

Question #9) Design a data structure that supports push(), pop(), and min() all in O(1) time
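The standard trick is a second stack that records, for each height of the main stack, the minimum of everything at or below it. A sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MinStack {
    // Two parallel stacks: mins.peek() is always the minimum of every element
    // currently in main, so push(), pop(), and min() are all O(1).
    private final Deque<Integer> main = new ArrayDeque<>();
    private final Deque<Integer> mins = new ArrayDeque<>();

    void push(int x) {
        main.push(x);
        mins.push(mins.isEmpty() ? x : Math.min(x, mins.peek()));
    }

    int pop() { mins.pop(); return main.pop(); }

    int min() { return mins.peek(); }
}
```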

Q: "You are shrunk to the height of a nickel and your mass is proportionally reduced so as to maintain your original density. You are then thrown into an empty glass blender. The blades will start moving in 60 seconds. What do you do?"

Q: "How would you find out if a machine's stack grows up or down in memory?"

Q: "Explain a database in three sentences to your eight-year-old nephew."

Q: "How many gas stations would you say there are in the United States?"


In case you were wondering, no, I didn't get the job. I attended one of their "College Days" a couple weeks back for my second round of interviews. I was only asked 6 of the above questions (#1, #2, #3, #4, #5, #8), and out of them I answered 3 correctly, 2 sort of correctly after receiving hints, and 1 I didn't answer at all (this was also my first interview on campus and my first real tech interview ever). Basically my impression is: if you want a job at Google, you'd better eventually arrive at complete and optimal solutions to all of the questions.

I feel like I could have done better (I was nervous and only got 3 hours of sleep the night before), but I learned a lot about technical interviews, and here are some tips:

**** IMPORTANT TIPS FOR INTERVIEWING AT TECH COMPANIES:

#1) If you're asked to write code, first formulate an algorithm in your head. Next, DOUBLE check that algorithm, even if it takes 5 minutes, and even if it takes drawing some detailed example cases. Whatever you do, make 100% SURE that what you're going to code is correct, THEN write the code.

After you're finished writing the code, the interviewer will probably have you explain the code and illustrate a test run of it, possibly on a whiteboard. If your code is incorrect, you both will eventually arrive at that conclusion, and by the time you've found out it's incorrect, there won't be enough time to fix it. This is why it's so important to get it right the first time. This exact scenario happened to me: I tried writing up a solution real fast (#5), but it backfired on me.

#2) If you're asked to describe an algorithm for solving a particular problem efficiently, FIRST think of the brute-force case. The brute-force algorithm will most likely be very obvious, so go over the brute-force approach in detail first. This is key, because a thorough investigation of the brute-force algorithm will help you understand the problem space better, and may lead you directly to the efficient algorithm. I didn't do this during my first interview (problem #8) and I didn't arrive at the answer. I tried the same problem at home one night, worked out the brute-force algorithm, and within minutes I had the optimal solution.

#3) Don't try to rush, just stay calm and go through things methodically.

Friday, June 24, 2011

P & NP Problems of Computer Science

http://www.cs.gsu.edu/~cscskp/Algorithms/NP/NP.html

Solving Sudoku Using Object Oriented programming Java & Tree Data Structure

Solving Sudokus with Java and tree data structures

with one comment

A few weeks ago I published Java and tree data structures, a post where I mentioned that I would try to put together some code to handle tree data structures. Well, this is the first installment of what I hope will become a series of articles/code that will look into creating this tree library.

This first article/code is going to be about building trees dynamically. It is going to be illustrated by an example of how to use dynamic tree building to find the solution for a Sudoku. All the source code is available for download from an SVN repository; the URLs are:

Tree source code: https://subversion.assembla.com/svn/making-good-software/trunk/mgsTrees
Sudoku source code: https://subversion.assembla.com/svn/making-good-software/trunk/mgsSudoku
Dynamic tree building to find a solution

One widely used technique in computer science is to use a tree data structure to find the solution to something whose solution is not obvious and requires many steps. One example could be chess, or a puzzle, a crossword, or, as in this case, a sudoku.

The premise is simple: given an initial state, keep building a tree of developed states from this original state until we find the state we were looking for. In the specific case of a sudoku, start building a tree where the branches are the different valid developments for that sudoku, and stop when we find the solution.

The key element for building a tree dynamically is recursion: starting from the original state, we find what the next candidates are, and then we do the same with each of them. In the mgsTrees library this can be easily accomplished by implementing an interface:
public interface ExpandTreeStrategy<T> {
    ExpandAction expand(T toExpand);
}

The return type of the ExpandTreeStrategy is an ExpandAction. An ExpandAction comes in 3 flavors:

* stopBranch: signals that this node is not expandable. In the Sudoku solver this is used when an illegal Sudoku is reached.
* stopTree: signals that it is not necessary to keep building the tree. In the Sudoku solver this is used when the solution is found.
* continueWith: used to specify the next states to look at. In the Sudoku solver this is used when an intermediate sudoku is reached, to specify what the next candidates are.

This is how to create the ExpandTreeStrategy implementation for the Sudoku solver:

public class ExpandSudokuLookingForSolutionStrategy implements ExpandTreeStrategy<Sudoku> {
    private final SudokuAnaliser sudokuAnaliser;

    public ExpandSudokuLookingForSolutionStrategy(SudokuAnaliser sudokuAnaliser) {
        this.sudokuAnaliser = sudokuAnaliser;
    }

    @Override
    public ExpandAction expand(Sudoku toExpand) {
        SudokuAnalysis sudokuAnalysis = sudokuAnaliser.analise(toExpand);

        if (sudokuAnalysis.isSolved()) return ExpandAction.stopTree();
        if (!sudokuAnalysis.isLegal()) return ExpandAction.stopBranch();

        if (sudokuAnalysis.hasDirectSolutions()) {
            return continueWithSudokuWithAllDirectSolutionsFilled(sudokuAnalysis);
        } else if (sudokuAnalysis.hasNextCandidates()) {
            return continueWithSudokuVariantsForSquareLocationWithLessPossibleValues(sudokuAnalysis);
        } else {
            throw new RuntimeException("It is not possible to have a sudoku that is not solved, legal, and without direct solutions or candidates");
        }
    }

    private ExpandAction continueWithSudokuVariantsForSquareLocationWithLessPossibleValues(SudokuAnalysis sudokuAnalysis) {
        List<SudokuMovement> movements = sudokuAnalysis.getPossibleMovementsForSquareWithLessPossibleValues();
        List<Sudoku> nextSudokus = new ArrayList<Sudoku>();
        for (SudokuMovement movement : movements) {
            nextSudokus.add(sudokuAnalysis.getSudoku().with(movement));
        }
        return ExpandAction.continueWith(nextSudokus);
    }

    private ExpandAction continueWithSudokuWithAllDirectSolutionsFilled(SudokuAnalysis sudokuAnalysis) {
        List<SudokuMovement> movements = sudokuAnalysis.getAllDirectSolutionMovements();
        return ExpandAction.continueWith(sudokuAnalysis.getSudoku().with(movements));
    }
}

In the end, the entire tree-building logic is wired up in the TreeFactory class, which has a method, buildTree, that takes the starting point for the tree, and whose constructor takes the ExpandTreeStrategy to be used to build the tree. The result of this call is a TreeResult, which has three main methods:

* getTree(): the resulting tree.
* isBuildInterrupted(): tells you whether the build was interrupted; in the case of the Sudoku solver this means that the solution was found.
* getLastBranch(): a sequentially ordered list of the nodes of the last branch, starting from the root (original) node. In the case of the Sudoku solver, when the solution is found (isBuildInterrupted() == true), this is used to retrieve the sequence of Sudokus from the original one to the solution.

public class TreeFactory<T> {
    private final ExpandTreeStrategy<T> expandTreeStrategy;

    public TreeFactory(ExpandTreeStrategy<T> expandTreeStrategy) {
        this.expandTreeStrategy = expandTreeStrategy;
    }

    public TreeResult<T> buildTree(T seed) {
        System.out.println(seed);
        ExpandAction expandActionResult = expandTreeStrategy.expand(seed);
        if (expandActionResult.isStopBranch()) return TreeResult.stopBranch(seed);
        if (expandActionResult.isStopTree()) return TreeResult.stopTree(seed);

        Node<T> rootNode = new Node<T>(seed);
        List<Node<T>> currentBranch = new ArrayList<Node<T>>();
        currentBranch.add(rootNode);

        Iterator<T> childIterator = expandActionResult.getChildIterator();
        TreeResult<T> lastSubTreeResult = null;
        while (childIterator.hasNext()) {
            Node<T> next = new Node<T>(childIterator.next());
            lastSubTreeResult = buildTree(next.getContent());
            rootNode.addChild(lastSubTreeResult.getTree().getRootNode());
            if (lastSubTreeResult.isBuildInterrupted()) {
                currentBranch.addAll(lastSubTreeResult.getLastNodeBranch());
                return TreeResult.interrupt(rootNode, currentBranch);
            }
        }

        currentBranch.addAll(lastSubTreeResult.getLastNodeBranch());
        return TreeResult.buildCompleted(rootNode, currentBranch);
    }
}

The final code to solve any given Sudoku will look like:
public static void main(String[] args) {
    Sudoku sudoku = new Sudoku(
        new SquareValue[][]{
            {__, __, __, __, __, __, _6, __, __},
            {__, __, _3, _7, __, __, __, __, _1},
            {__, _8, __, _1, __, _6, _9, _4, __},

            {_7, __, __, _5, __, __, __, __, __},
            {__, __, _5, __, _4, __, _8, __, __},
            {__, __, __, __, __, _2, __, __, _7},

            {__, _2, _9, _6, __, _4, __, _3, __},
            {_6, __, __, __, __, _8, _4, __, __},
            {__, __, _4, __, __, __, __, __, __}
        }
    );

    IteratorStrategy rowNavigator = OneLineIteratorStrategy.horizontal();
    IteratorStrategy columnNavigator = OneLineIteratorStrategy.vertical();
    IteratorStrategy matrixNavigator = MatrixIteratorStrategy.instance();
    SudokuProcessor sudokuProcessor = new SudokuProcessorImpl(rowNavigator, columnNavigator, matrixNavigator);
    SudokuAnaliser sudokuAnaliser = new SudokuAnaliserImpl(sudokuProcessor);
    ExpandTreeStrategy<Sudoku> expandTreeStrategy = new ExpandSudokuLookingForSolutionStrategy(sudokuAnaliser);
    TreeFactory<Sudoku> treeFactory = new TreeFactory<Sudoku>(expandTreeStrategy);

    TreeResult<Sudoku> buildTree = treeFactory.buildTree(sudoku);
    List<Sudoku> lastBranch = buildTree.getLastBranch();
    System.out.println("Solution: " + lastBranch.get(lastBranch.size() - 1));
}

Source
Tree :https://subversion.assembla.com/svn/making-good-software/trunk/mgsTrees
Sudoku:https://subversion.assembla.com/svn/making-good-software/trunk/mgsSudoku

Designing The Chess Engine Using Object Oriented programming Java

The Chess Board class
view plaincopy to clipboardprint?


public class ChessBoard {
private final Squares squares;

public ChessBoard() {
this (new Squares());
}

public ChessBoard(Squares squares) {
this.squares = squares;
}

public ChessBoard performMovement(Piece piece, Location locationFrom, Location locationTo) {
Squares copy = squares.copy();
copy.setContent(SquareContent.EMPTY_SQUARE, locationFrom.getCoordinateX(), locationFrom.getCoordinateY());
copy.setContent(piece, locationTo.getCoordinateX(), locationTo.getCoordinateY());
return new ChessBoard(copy);
}

public ChessBoard performMovement(PieceOnLocation pieceInBoard, Location locationTo) {
return performMovement(pieceInBoard.getPiece(), pieceInBoard.getLocation(), locationTo);
}

public ChessBoard performMovement(Movement movement) {
return performMovement(movement.getMovingPiece(), movement.getFrom(), movement.getTo());
}

public SquareContent getSquareContent(Location location) {
return squares.getContent(location.getCoordinateX(), location.getCoordinateY());
}

public ChessBoard addPiece(Piece piece, Location location) {
Squares copy = squares.copy();
copy.setContent (piece, location.getCoordinateX(), location.getCoordinateY());
return new ChessBoard(copy);
}

public List getPieces(Color movingColor) {
List pieces = new ArrayList();
for (Location location : Location.values()) {
SquareContent content = squares.getContent(location.getCoordinateX(), location.getCoordinateY());
if (! content.isEmpty()){
Piece piece = (Piece) content;
if (piece.getColor() == movingColor) pieces.add(piece.on(location));
}
}
return pieces;
}

public List find(Piece pieceToFind) {
List pieces = new ArrayList();
for (Location location : Location.values()) {
SquareContent content = squares.getContent(location.getCoordinateX(), location.getCoordinateY());
if (! content.isEmpty()){
Piece piece = (Piece) content;
if (piece == pieceToFind) pieces.add(piece.on(location));
}
}
return pieces;
}

public PieceOnLocation getPieceOnLocation(Location location) {
return new PieceOnLocation(getSquareContent(location).asPiece(), location);
}

public ChessBoard empty(Location location) {
Squares copy = squares.copy();
copy.setContent(SquareContent.EMPTY_SQUARE, location.getCoordinateX(), location.getCoordinateY());
return new ChessBoard(copy);
}
}

About the Chess Board class, it is important to notice three things:

1. It is thread safe. All its member variables are private and final, and it is immutable: every "changing" operation creates a copy of the object with the new state.
2. It delegates all the functionality to Squares. Squares is the main class; it contains all the logic for holding SquareContent objects (pieces and empty squares). It is wrapped by the board to achieve thread safety.
3. It is easy to create and customise a board: we only have to create a new board and call the method addPiece (see the example below).

Creating a new board with a few pieces.


board = new ChessBoard().
addPiece(Piece.BLACK_PAWN, Location.C5).
addPiece(Piece.WHITE_PAWN, Location.D5).
addPiece(Piece.WHITE_KING, Location.D4);

The Chess Analiser

This class contains the main entry points for the core logic. In particular, we are interested in the method findReachableLocations and in its default implementation, ChessAnaliserImpl.


public interface ChessAnaliser {
[...]
public List findReachableLocations(PieceOnLocation pieceOnLocation, ChessBoard board, Movement previousMovement);
}


1. public class ChessAnaliserImpl implements ChessAnaliser {
2. @Override
3. public List findReachableLocations(PieceOnLocation pieceOnLocation, ChessBoard board, Movement previousMovement) {
4. Piece piece = pieceOnLocation.getPiece();
5. List toReturn = new ArrayList();
6.
7. for (MovementLine movementLine : chessReader.findMovementLines(pieceOnLocation)) {
8. if (movementLine.isApplicable (board, previousMovement)){
9. List potentiallyReachablePositions = movementLine.filterPotentiallyReachablePositions (pieceOnLocation.getColor(), board);
10. for (Location location : potentiallyReachablePositions) {
11. ChessBoard nextPossibleBoard = nextBoard(board, movementLine.getType(), pieceOnLocation, location, previousMovement);
12. if (!isInCheck(nextPossibleBoard, piece.getColor())){
13. toReturn.add(location);
14. }
15. }
16. }
17. }
18.
19. return toReturn;
20. }
21. }


Take into consideration:

1. [Line 07] The base of any movement is the movement line. The MovementLine class defines the type of movement and the different locations that a piece with a given starting position can potentially move to for that movement type. Movement lines don't take other pieces on the board into consideration.
2. [Line 08] The method isApplicable, in the movement line, checks whether that movement line is applicable for a given board, taking the previous movement into consideration. The clearest example is the movement line for a pawn performing an "en passant" capture: this method checks that all the conditions for en passant hold.
3. [Line 09] The method filterPotentiallyReachablePositions checks, of all the possible locations in a movement line, which are actually reachable; if a piece is standing in the path of that movement line, it filters out all the positions beyond that piece (see examples below).
4. [Lines 11 and 12] For each potential location the given piece can move to, it computes what the next board would be and then checks whether the movement would leave that color's pieces in check. That would be an illegal movement, so the location is discarded.

Tying everything together

The following are real examples to find what the possible locations for the given pieces are:

public class ChessAnaliserImplIntegrationTest {
    private ChessAnaliserImpl testObj;
    private ChessReader chessReader = new ChessReaderImpl();
    private ChessBoard board;

    @BeforeMethod(groups = "integration")
    public void setup() {
        testObj = new ChessAnaliserImpl(chessReader);
    }

    @Test(groups = "integration")
    public void testFindReachableLocations_forRook_withSameColorObstacles() {
        //GIVEN
        board = new ChessBoard().
                addPiece(Piece.WHITE_ROOK, Location.A1).
                addPiece(Piece.WHITE_QUEEN, Location.A5).
                addPiece(Piece.WHITE_KNIGHT, Location.C1).
                addPiece(Piece.WHITE_KING, Location.C8);

        //WHEN
        List actualReachablePositions = testObj.findReachableLocations(board.getPieceOnLocation(Location.A1), board, null);

        //THEN
        Assert.assertEquals(actualReachablePositions.size(), 4);
        Assert.assertEquals(actualReachablePositions.get(0), Location.B1);
        Assert.assertEquals(actualReachablePositions.get(1), Location.A2);
        Assert.assertEquals(actualReachablePositions.get(2), Location.A3);
        Assert.assertEquals(actualReachablePositions.get(3), Location.A4);
    }

    @Test(groups = "integration")
    public void testFindReachableLocations_forRook_withDifferentColorObstacles() {
        //GIVEN
        board = new ChessBoard().
                addPiece(Piece.WHITE_ROOK, Location.A1).
                addPiece(Piece.BLACK_QUEEN, Location.A5).
                addPiece(Piece.WHITE_KNIGHT, Location.C1).
                addPiece(Piece.WHITE_KING, Location.C8);

        //WHEN
        List actualReachablePositions = testObj.findReachableLocations(board.getPieceOnLocation(Location.A1), board, null);

        //THEN
        Assert.assertEquals(actualReachablePositions.size(), 5);
        Assert.assertEquals(actualReachablePositions.get(0), Location.B1);
        Assert.assertEquals(actualReachablePositions.get(1), Location.A2);
        Assert.assertEquals(actualReachablePositions.get(2), Location.A3);
        Assert.assertEquals(actualReachablePositions.get(3), Location.A4);
        Assert.assertEquals(actualReachablePositions.get(4), Location.A5);
    }

    @Test(groups = "integration")
    public void testFindReachableLocations_forRook_whenCantMove() {
        //GIVEN
        board = new ChessBoard().
                addPiece(Piece.WHITE_KING, Location.D1).
                addPiece(Piece.WHITE_ROOK, Location.C2).
                addPiece(Piece.BLACK_QUEEN, Location.A4).
                addPiece(Piece.WHITE_KNIGHT, Location.C1);

        //WHEN
        List actualReachablePositions = testObj.findReachableLocations(board.getPieceOnLocation(Location.C2), board, null);

        //THEN
        Assert.assertEquals(actualReachablePositions.size(), 0);
    }

    @Test(groups = "integration")
    public void testFindReachableLocations_forPawn_whenInFirstRow() {
        //GIVEN
        board = new ChessBoard().
                addPiece(Piece.BLACK_PAWN, Location.A7).
                addPiece(Piece.BLACK_KING, Location.D4);

        //WHEN
        List actualReachablePositions = testObj.findReachableLocations(board.getPieceOnLocation(Location.A7), board, null);

        //THEN
        Assert.assertEquals(actualReachablePositions.size(), 2);
        Assert.assertEquals(actualReachablePositions.get(0), Location.A6);
        Assert.assertEquals(actualReachablePositions.get(1), Location.A5);
    }

    @Test(groups = "integration")
    public void testFindReachableLocations_forPawn_enPassant() {
        //GIVEN
        board = new ChessBoard().
                addPiece(Piece.BLACK_PAWN, Location.C5).
                addPiece(Piece.WHITE_PAWN, Location.D5).
                addPiece(Piece.WHITE_KING, Location.D4);

        Movement enPassantEnabler = new Movement(Piece.BLACK_PAWN, Location.C7, Location.C5);
        //WHEN
        List actualReachablePositions = testObj.findReachableLocations(board.getPieceOnLocation(Location.D5), board, enPassantEnabler);

        //THEN
        Assert.assertEquals(actualReachablePositions.size(), 1);
        Assert.assertEquals(actualReachablePositions.get(0), Location.C6);
    }
}

Source Code: http://subversion.assembla.com/svn/making-good-software/tags/mgsChess/blog20110515/src/com/mgs/chess/core/

Sunday, June 12, 2011

Cryptographic Algorithm MD5 Exposed

Java simple class to compute MD5 hash

abstract
MD5 is a cryptographic message digest algorithm. It is one of the most widely used hashing functions, producing a 128-bit digest (32 hex digits) from any data. (Note that MD5 is no longer considered collision resistant, so it should not be relied on for security-sensitive checksums.) While Java has built-in cryptographic checksum classes, it is somewhat awkward to use them for a simple task: calculate the MD5 hash of a string and return its 32-character hexadecimal representation.

compatible
Sun Java 1.4 or higher

We use the MessageDigest class, first setting it to use the MD5 algorithm, then feeding it the source data and getting a byte array with the hash value. To convert this array to a hex string we use our own method, convertToHex.
Here is the AeSimpleMD5 class, whose single public static method MD5 returns the hexadecimal string representation of the MD5 hash of its input argument:
source code: Java

import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class AeSimpleMD5 {

private static String convertToHex(byte[] data) {
    StringBuffer buf = new StringBuffer();
    for (int i = 0; i < data.length; i++) {
        int halfbyte = (data[i] >>> 4) & 0x0F;
        int two_halfs = 0;
        do {
            if ((0 <= halfbyte) && (halfbyte <= 9))
                buf.append((char) ('0' + halfbyte));
            else
                buf.append((char) ('a' + (halfbyte - 10)));
            halfbyte = data[i] & 0x0F;
        } while (two_halfs++ < 1);
    }
    return buf.toString();
}

public static String MD5(String text)
        throws NoSuchAlgorithmException, UnsupportedEncodingException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    md.update(text.getBytes("iso-8859-1"), 0, text.length());
    byte[] md5hash = md.digest();
    return convertToHex(md5hash);
}
}

You may use this class by calling this method directly:
source code: Java

String MD5_ad1 = AeSimpleMD5.MD5("Java solutions class libraries");
String MD5_ad2 = AeSimpleMD5.MD5("Java Toolkits");

One more example allows the user to input a string from the console and prints its MD5 hash:
source code: Java

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.security.NoSuchAlgorithmException;

public class Ex01 {

    public static void main(String[] args) throws IOException {
        BufferedReader userInput = new BufferedReader(new InputStreamReader(System.in));

        System.out.println("Enter string:");
        String rawString = userInput.readLine();

        try {
            System.out.println("MD5 hash of string: " + AeSimpleMD5.MD5(rawString));
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}

warning
This class is suitable for calculating hashes of small chunks of in-memory string data. To process large external files (such as ISO images) you should use the MessageDigest class directly.
There may be difficulties when processing string data in encodings other than iso-8859-1.

Sunday, June 5, 2011

Prime Checking Exposed - Miller-Rabin Algorithm

Miller Rabin Primality Test


The Miller-Rabin primality test is a probabilistic test to check whether a number is prime or not. It relies on a set of equalities that hold true for prime values, then checks whether or not they hold for the number we want to test for primality.

Theory

1> Fermat's little theorem states that if p is a prime and 1 ≤ a < p, then a^(p-1) ≡ 1 (mod p).
2> If p is a prime and x^2 ≡ 1 (mod p), then x ≡ 1 (mod p) or x ≡ -1 (mod p).
3> If n is an odd prime then n-1 is an even number and can be written as n-1 = 2^s * d with d odd. By Fermat's little theorem, either a^d ≡ 1 (mod n) or a^(2^r * d) ≡ -1 (mod n) for some 0 ≤ r ≤ s-1.
4> The Miller-Rabin primality test is based on the contrapositive of the above claim. That is, if we can find an a such that a^d ≢ 1 (mod n) and a^(2^r * d) ≢ -1 (mod n) for all 0 ≤ r ≤ s-1, then a is a witness of the compositeness of n and we can say n is not a prime. Otherwise, n may be a prime.
5> We test our number N for some random a and either declare that N is definitely composite or probably prime. The probability that a composite number is reported as prime after k iterations is at most (1/4)^k.

Algorithm

Input : A number N to be tested and a variable iteration - the number
of values of 'a' for which the algorithm will test N.
Output : false if N is definitely composite and true if N is probably prime.

Write N-1 as 2^s * d, with d odd
For each iteration
    Pick a random a in [2, N-2]
    x = a^d mod N
    if x == 1 or x == N-1
        Next iteration
    for r = 1 to s-1
        x = x^2 mod N
        if x == 1
            return false
        if x == N-1
            Next iteration
    if x != N-1
        return false

return true

public static class RabinMiller
{
    // Single shared RNG: creating a new Random() inside the loop would
    // reseed from the clock and can repeat the same values.
    private static readonly Random r = new Random();

    // Modular exponentiation: computes (a^e) mod m without overflowing,
    // unlike Math.Pow(), which loses precision for large exponents.
    private static long ModPow(long a, long e, long m)
    {
        long result = 1;
        a %= m;
        while (e > 0)
        {
            if ((e & 1) == 1)
                result = result * a % m;
            a = a * a % m;
            e >>= 1;
        }
        return result;
    }

    public static bool isPrime(int p)
    {
        if (p < 2)
            return false;
        if (p != 2 && p % 2 == 0)
            return false;

        // Write p-1 as 2^s * d with d odd; only the odd part d is needed,
        // since the doubling loop below recovers the powers of two.
        int d = p - 1;
        while (d % 2 == 0)
            d >>= 1;

        for (int i = 1; i < 11; i++)
        {
            long a = r.Next(p - 1) + 1;   // random a in [1, p-1]
            long temp = d;
            long mod = ModPow(a, temp, p);

            // Square mod (doubling the implicit exponent temp) until we
            // reach the exponent p-1 or hit a residue of 1 or p-1.
            while (temp != p - 1 && mod != 1 && mod != p - 1)
            {
                mod = mod * mod % p;
                temp *= 2;
            }

            // a is a witness of compositeness unless the last residue was
            // p-1 or the exponent is still the odd part d.
            if (mod != p - 1 && temp % 2 == 0)
                return false;
        }
        return true;
    }
}

Wednesday, May 25, 2011

Advanced Data Structures: B-Tree, B+ Tree, RB Tree, AVL Tree & Their Applications

Introduction

B-Tree is a balanced m-way tree. This discussion from Wiki is a good material to introduce one to the characteristics and node layout of a B-Tree algorithm: http://en.wikipedia.org/wiki/B-tree.

This article discusses an in-memory B-Tree implementation. Although B-Tree is typically used in file/database system indexing (see my B-Tree disk based version for details @: http://www.4atech.net), there is a significant value in implementing a B-Tree based collection or dictionary in the .NET framework or in any language/platform for that matter.

Advantages of B-Tree In-memory

Typical in-memory sorted dictionary data structures today are based on the Binary Tree algorithm, which is not to be confused with B-Tree. Each node of a Binary Tree can contain a single item, whereas a B-Tree node can contain a user-defined number of items, and its nodes are kept balanced. This is a very important differentiation. Since each node holds a single item, storing a large number of items in a Binary Tree generates a tall, narrow node graph with numerous nodes.

In a corresponding B-Tree implementation, the graph will tend to be shorter and wider, with far fewer nodes. This is because a node in a B-Tree is typically configured to hold numerous items; e.g., a B-Tree dictionary with 12 items will require a single node to contain them, while a Binary Tree will require 12 nodes, which can be scattered around the heap (memory). Increasing our item count to thousands, hundreds of thousands, or millions, this becomes a significant differentiation and a significant optimization that a B-Tree based sorted dictionary can provide: a million single-item nodes in a Binary Tree vs. around eighty-three thousand nodes in a B-Tree with a 12-item node setup, and far fewer still if the user specifies more items per node.

With this characterization, it is easy to imagine that a Binary Tree will tend to use more memory, and will tend to generate more memory fragmentation than a respective dictionary utilizing the B-Tree algorithm. And in production environments, we can't afford to have a system that is prone to memory fragmentation as in time, the application will degrade in performance and can cause out of memory conditions due to lack of contiguous allocable space.

More Detailed Information

Source
B-Tree http://www.codeproject.com/KB/collections/BTreeSortedDictionary.aspx
B+ Tree http://www.codeproject.com/KB/files/NTFSUndelete.aspx
RB Tree http://www.codeproject.com/KB/recipes/redblackcs.aspx
AVL Tree
1 http://www.codeproject.com/KB/architecture/avl_tree.aspx
2 http://www.codeproject.com/KB/architecture/avl_cpp.aspx
3 http://www.codeproject.com/KB/cpp/avltree.aspx

Saturday, May 21, 2011

CRC Implementation Code in C

CRCs are among the best checksums available to detect and/or correct errors in communications transmissions. Unfortunately, the modulo-2 arithmetic used to compute CRCs doesn't map easily into software. This article shows how to implement an efficient CRC in C.

I'm going to complete my discussion of checksums by showing you how to implement CRCs in software. I'll start with a naïve implementation and gradually improve the efficiency of the code as I go along. However, I'm going to keep the discussion at the level of the C language, so further steps could be taken to improve the efficiency of the final code simply by moving into the assembly language of your particular processor.

For most software engineers, the overwhelmingly confusing thing about CRCs is their implementation. Knowing that all CRC algorithms are simply long division algorithms in disguise doesn't help. Modulo-2 binary division doesn't map particularly well to the instruction sets of off-the-shelf processors. For one thing, generally no registers are available to hold the very long bit sequence that is the numerator. For another, modulo-2 binary division is not the same as ordinary division. So even if your processor has a division instruction, you won't be able to use it.
Modulo-2 binary division

Before writing even one line of code, let's first examine the mechanics of modulo-2 binary division. We'll use the example in Figure 1 to guide us. The number to be divided is the message augmented with zeros at the end. The number of zero bits added to the message is the same as the width of the checksum (what I call c); in this case four bits were added. The divisor is a c+1-bit number known as the generator polynomial.

Figure 1. An example of modulo-2 binary division

The modulo-2 division process is defined as follows:

* Call the uppermost c+1 bits of the message the remainder.
* Beginning with the most significant bit in the original message and for each bit position that follows, look at the c+1 bit remainder:
      o If the most significant bit of the remainder is a one, the divisor is said to divide into it. If that happens (just as in any other long division) it is necessary to indicate a successful division in the appropriate bit position in the quotient and to compute the new remainder. In the case of modulo-2 binary division, we simply:
            + Set the appropriate bit in the quotient to a one, and
            + XOR the remainder with the divisor and store the result back into the remainder.
      o Otherwise (if the first bit is not a one):
            + Set the appropriate bit in the quotient to a zero, and
            + XOR the remainder with zero (no effect).
      o Left-shift the remainder, shifting in the next bit of the message. The bit that's shifted out will always be a zero, so no information is lost.

The final value of the remainder is the CRC of the given message.

What's most important to notice at this point is that we never use any of the information in the quotient, either during or after computing the CRC. So we won't actually need to track the quotient in our software implementation. Also note here that the result of each XOR with the generator polynomial is a remainder that has zero in its most significant bit. So we never lose any information when the next message bit is shifted into the remainder.
Bit by bit

Listing 1 contains a naive software implementation of the CRC computation just described. It simply attempts to implement that algorithm as it was described above for this one particular generator polynomial. Even though the unnecessary steps have been eliminated, it's extremely inefficient. Multiple C statements (at least the decrement and compare, binary AND, test for zero, and left shift operations) must be executed for each bit in the message. Given that this particular message is only eight bits long, that might not seem too costly. But what if the message contains several hundred bytes, as is typically the case in a real-world application? You don't want to execute dozens of processor opcodes for each byte of input data.

#define POLYNOMIAL 0xD8  /* 11011 followed by 0's */

uint8_t
crcNaive(uint8_t const message)
{
    uint8_t remainder;

    /*
     * Initially, the dividend is the remainder.
     */
    remainder = message;

    /*
     * For each bit position in the message....
     */
    for (uint8_t bit = 8; bit > 0; --bit)
    {
        /*
         * If the uppermost bit is a 1...
         */
        if (remainder & 0x80)
        {
            /*
             * XOR the previous remainder with the divisor.
             */
            remainder ^= POLYNOMIAL;
        }

        /*
         * Shift the next bit of the message into the remainder.
         */
        remainder = (remainder << 1);
    }

    /*
     * Return only the relevant bits of the remainder as CRC.
     */
    return (remainder >> 4);

}   /* crcNaive() */

Listing 1. A naive CRC implementation in C
Code clean up

Before we start making this more efficient, the first thing to do is to clean this naive routine up a bit. In particular, let's start making some assumptions about the applications in which it will most likely be used. First, let's assume that our CRCs are always going to be 8-, 16-, or 32-bit numbers. In other words, that the remainder can be manipulated easily in software. That means that the generator polynomials will be 9, 17, or 33 bits wide, respectively. At first it seems we may be stuck with unnatural sizes and will need special register combinations, but remember these two facts:

* The most significant bit of any generator polynomial is always a one
* The uppermost bit of the XOR result is always zero and promptly shifted out of the remainder

Since we already have the information in the uppermost bit and we don't need it for the XOR, the polynomial can also be stored in an 8-, 16-, or 32-bit register. We can simply discard the most significant bit. The register size that we use will always be equal to the width of the CRC we're calculating.

As long as we're cleaning up the code, we should also recognize that most CRCs are computed over fairly long messages. The entire message can usually be treated as an array of unsigned data bytes. The CRC algorithm should then be iterated over all of the data bytes, as well as the bits within those bytes.

The result of making these two changes is the code shown in Listing 2. This implementation of the CRC calculation is still just as inefficient as the previous one. However, it is far more portable and can be used to compute a number of different CRCs of various widths.

/*
 * The width of the CRC calculation and result.
 * Modify the typedef for a 16 or 32-bit CRC standard.
 */
typedef uint8_t crc;

#define WIDTH  (8 * sizeof(crc))
#define TOPBIT (1 << (WIDTH - 1))

crc
crcSlow(uint8_t const message[], int nBytes)
{
    crc remainder = 0;

    /*
     * Perform modulo-2 division, a byte at a time.
     */
    for (int byte = 0; byte < nBytes; ++byte)
    {
        /*
         * Bring the next byte into the remainder.
         */
        remainder ^= (message[byte] << (WIDTH - 8));

        /*
         * Perform modulo-2 division, a bit at a time.
         */
        for (uint8_t bit = 8; bit > 0; --bit)
        {
            /*
             * Try to divide the current data bit.
             */
            if (remainder & TOPBIT)
            {
                remainder = (remainder << 1) ^ POLYNOMIAL;
            }
            else
            {
                remainder = (remainder << 1);
            }
        }
    }

    /*
     * The final remainder is the CRC result.
     */
    return (remainder);

}   /* crcSlow() */

Listing 2. A more portable but still naive CRC implementation
Byte by byte

The most common way to improve the efficiency of the CRC calculation is to throw memory at the problem. For a given input remainder and generator polynomial, the output remainder will always be the same. If you don't believe me, just reread that sentence as "for a given dividend and divisor, the remainder will always be the same." It's true. So it's possible to precompute the output remainder for each of the possible byte-wide input remainders and store the results in a lookup table. That lookup table can then be used to speed up the CRC calculations for a given message. The speedup is realized because the message can now be processed byte by byte, rather than bit by bit.

The code to precompute the output remainders for each possible input byte is shown in Listing 3. The computed remainder for each possible byte-wide dividend is stored in the array crcTable[]. In practice, the crcInit() function could either be called during the target's initialization sequence (thus placing crcTable[] in RAM) or it could be run ahead of time on your development workstation with the results stored in the target device's ROM.

crc crcTable[256];

void
crcInit(void)
{
    crc remainder;

    /*
     * Compute the remainder of each possible dividend.
     */
    for (int dividend = 0; dividend < 256; ++dividend)
    {
        /*
         * Start with the dividend followed by zeros.
         */
        remainder = dividend << (WIDTH - 8);

        /*
         * Perform modulo-2 division, a bit at a time.
         */
        for (uint8_t bit = 8; bit > 0; --bit)
        {
            /*
             * Try to divide the current data bit.
             */
            if (remainder & TOPBIT)
            {
                remainder = (remainder << 1) ^ POLYNOMIAL;
            }
            else
            {
                remainder = (remainder << 1);
            }
        }

        /*
         * Store the result into the table.
         */
        crcTable[dividend] = remainder;
    }

}   /* crcInit() */

Listing 3. Computing the CRC lookup table

Of course, whether it is stored in RAM or ROM, a lookup table by itself is not that useful. You'll also need a function to compute the CRC of a given message that is somehow able to make use of the values stored in that table. Without going into all of the mathematical details of why this works, suffice it to say that the previously complicated modulo-2 division can now be implemented as a series of lookups and XORs. (In modulo-2 arithmetic, XOR is both addition and subtraction.)

A function that uses the lookup table contents to compute a CRC more efficiently is shown in Listing 4. The amount of processing to be done for each byte is substantially reduced.

crc
crcFast(uint8_t const message[], int nBytes)
{
    uint8_t data;
    crc remainder = 0;

    /*
     * Divide the message by the polynomial, a byte at a time.
     */
    for (int byte = 0; byte < nBytes; ++byte)
    {
        data = message[byte] ^ (remainder >> (WIDTH - 8));
        remainder = crcTable[data] ^ (remainder << 8);
    }

    /*
     * The final remainder is the CRC.
     */
    return (remainder);

}   /* crcFast() */

Listing 4. A more efficient, table-driven, CRC implementation

As you can see from the code in Listing 4, a number of fundamental operations (left and right shifts, XORs, lookups, and so on) still must be performed for each byte even with this lookup table approach. So to see exactly what has been saved (if anything), I compiled both crcSlow() and crcFast() with IAR's C compiler for the PIC family of eight-bit RISC processors. I figured that compiling for such a low-end processor would give us a good worst-case comparison of the number of instructions needed for these different types of CRC computations. The results of this experiment were as follows:

* crcSlow(): 185 instructions per byte of message data
* crcFast(): 36 instructions per byte of message data

So, at least on one processor family, switching to the lookup table approach results in a more than five-fold performance improvement. That's a pretty substantial gain considering that both implementations were written in C. A bit more could probably be done to improve the execution speed of this algorithm if an engineer with a good understanding of the target processor were assigned to hand-code or tune the assembly code. My somewhat-educated guess is that another two-fold performance improvement might be possible. Actually achieving that is, as they say in textbooks, left as an exercise for the curious reader.
CRC standards and parameters

Now that we've got our basic CRC implementation nailed down, I want to talk about the various types of CRCs that you can compute with it. As I mentioned last month, several mathematically well understood and internationally standardized CRC generator polynomials exist and you should probably choose one of those, rather than risk inventing something weaker.

In addition to the generator polynomial, each of the accepted CRC standards also includes certain other parameters that describe how it should be computed. Table 1 contains the parameters for three of the most popular CRC standards. Two of these parameters are the "initial remainder" and the "final XOR value". The purpose of these two c-bit constants is similar to the final bit inversion step added to the sum-of-bytes checksum algorithm. Each of these parameters helps eliminate one very special, though perhaps not uncommon, class of ordinarily undetectable difference. In effect, they bulletproof an already strong checksum algorithm.

Parameter                CRC-CCITT    CRC-16    CRC-32
Width                    16 bits      16 bits   32 bits
(Truncated) Polynomial   0x1021       0x8005    0x04C11DB7
Initial Remainder        0xFFFF       0x0000    0xFFFFFFFF
Final XOR Value          0x0000       0x0000    0xFFFFFFFF
Reflect Data?            No           Yes       Yes
Reflect Remainder?       No           Yes       Yes
Check Value              0x29B1       0xBB3D    0xCBF43926

Table 1. Computational parameters for popular CRC standards

To see what I mean, consider a message that begins with some number of zero bits. The remainder will never contain anything other than zero until the first one in the message is shifted into it. That's a dangerous situation, since packets beginning with one or more zeros may be completely legitimate and a dropped or added zero would not be noticed by the CRC. (In some applications, even a packet of all zeros may be legitimate!) The simple way to eliminate this weakness is to start with a nonzero remainder. The parameter called initial remainder tells you what value to use for a particular CRC standard. And only one small change is required to the crcSlow() and crcFast() functions:

crc remainder = INITIAL_REMAINDER;

The final XOR value exists for a similar reason. To implement this capability, simply change the value that's returned by crcSlow() and crcFast() as follows:

return (remainder ^ FINAL_XOR_VALUE);

If the final XOR value consists of all ones (as it does in the CRC-32 standard), this extra step will have the same effect as complementing the final remainder. However, implementing it this way allows any possible value to be used in your specific application.

In addition to these two simple parameters, two others exist that impact the actual computation. These are the binary values "reflect data" and "reflect remainder". The basic idea is to reverse the bit ordering of each byte within the message and/or the final remainder. The reason this is sometimes done is that a good number of the hardware CRC implementations operate on the "reflected" bit ordering of bytes that is common with some UARTs. Two slight modifications of the code are required to prepare for these capabilities.

What I've generally done is to implement one function and two macros. This code is shown in Listing 5. The function is responsible for reflecting a given bit pattern. The macros simply call that function in a certain way.

#define REFLECT_DATA(X)       ((uint8_t) reflect((X), 8))
#define REFLECT_REMAINDER(X)  ((crc) reflect((X), WIDTH))

uint32_t
reflect(uint32_t data, uint8_t nBits)
{
    uint32_t reflection = 0;


    /*
     * Reflect the data about the center bit.
     */
    for (uint8_t bit = 0; bit < nBits; ++bit)
    {
        /*
         * If the LSB is set, set the reflection of it.
         */
        if (data & 0x01)
        {
            reflection |= ((uint32_t) 1 << ((nBits - 1) - bit));
        }

        data = (data >> 1);
    }

    return (reflection);

}   /* reflect() */

Listing 5. Reflection macros and function

By inserting the macro calls at the two points that reflection may need to be done, it is easier to turn reflection on and off. To turn either kind of reflection off, simply redefine the appropriate macro as (X). That way, the unreflected data byte or remainder will be used in the computation, with no overhead cost. Also note that for efficiency reasons, it may be desirable to compute the reflection of all of the 256 possible data bytes in advance and store them in a table, then redefine the REFLECT_DATA() macro to use that lookup table.
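As a sketch of that last suggestion, a 256-entry reflection table can be precomputed once at startup; the reflectedByte[] and reflectInit() names here are illustrative, not taken from the downloadable implementation:

```c
#include <stdint.h>

static uint8_t reflectedByte[256];

/* One-time setup: precompute the bit-reversed value of every
 * possible byte so that REFLECT_DATA() can be redefined as a
 * simple table lookup. */
void
reflectInit(void)
{
    for (int value = 0; value < 256; ++value)
    {
        uint8_t reflection = 0;

        /* Mirror each of the 8 bits about the center. */
        for (int bit = 0; bit < 8; ++bit)
        {
            if (value & (1 << bit))
            {
                reflection |= (uint8_t) (1 << (7 - bit));
            }
        }

        reflectedByte[value] = reflection;
    }
}   /* reflectInit() */
```

With this table in place, REFLECT_DATA(X) could be redefined as reflectedByte[(X)], trading 256 bytes of memory for the per-byte loop in reflect().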

Tested, full-featured implementations of both crcSlow() and crcFast() are available for download. These implementations include the reflection capabilities just described and can be used to implement any parameterized CRC formula. Simply change the constants and macros as necessary.

The final parameter that I've included in Table 1 is a "check value" for each CRC standard. This is the CRC result that's expected for the simple nine-character ASCII test message "123456789". To test your implementation of a particular standard, simply invoke your CRC computation on that message and check the result:

crcInit();

checksum = crcFast("123456789", 9);

If checksum has the correct value after this call, then you know your implementation is correct. This is a handy way to ensure compatibility between two communicating devices with different CRC implementations or implementors.
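As a self-contained version of that check, the sketch below folds the CRC-CCITT parameters from Table 1 into a bit-at-a-time routine (modeled on the article's crcSlow(); the crcCcitt name is mine) and validates it against the 0x29B1 check value:

```c
#include <stdint.h>

/* CRC-CCITT parameters from Table 1. This standard uses no
 * reflection, so only the initial remainder and final XOR value
 * need to be applied. */
#define POLYNOMIAL        0x1021
#define INITIAL_REMAINDER 0xFFFF
#define FINAL_XOR_VALUE   0x0000
#define WIDTH             16
#define TOPBIT            (1 << (WIDTH - 1))

typedef uint16_t crc;

crc
crcCcitt(uint8_t const message[], int nBytes)
{
    crc remainder = INITIAL_REMAINDER;

    for (int byte = 0; byte < nBytes; ++byte)
    {
        /* Bring the next byte into the remainder. */
        remainder ^= (crc) (message[byte] << (WIDTH - 8));

        /* Perform modulo-2 division, one bit at a time. */
        for (uint8_t bit = 8; bit > 0; --bit)
        {
            if (remainder & TOPBIT)
            {
                remainder = (remainder << 1) ^ POLYNOMIAL;
            }
            else
            {
                remainder = (remainder << 1);
            }
        }
    }

    return (remainder ^ FINAL_XOR_VALUE);
}   /* crcCcitt() */
```

Calling crcCcitt() on the nine bytes of "123456789" should produce 0x29B1, matching the check value column of Table 1.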

Additive CheckSum Algorithm

Whenever you connect two or more computers together with the intent of exchanging information, you assume that the exchange will take place without errors. But what if some of the data is lost or corrupted in transit? Communication protocols usually attempt to detect such errors automatically. To do that they use checksums.

The most important part of listening to someone speak is ensuring that you've heard them correctly. Your brain performs the tasks of error detection and correction for you, automatically. It does this by examining extra bits of information from the speaker and the speech; if a phrase or sentence makes sense as a whole and it makes sense coming from the mouth of the particular speaker, then the individual words were probably heard correctly. The same principle applies when you are reading. But what happens when computers are communicating with one another? How does the receiving computer know if an error has occurred in transmission?

Establishing correctness is more difficult for computers than humans. At the lowest level, communication between computers consists of nothing but a stream of binary digits. Meaning is only assigned to that particular sequence of bits at higher levels. We call that meaningful sequence of bits the message; it is analogous to a spoken or written phrase. If one or more bits within the message are inverted (a logic one becomes a logic zero, or vice versa) as it travels between computers, the receiver has no way to detect the error. No environmental or syntactical context is available to the receiver, since it cannot understand the message in its transmitted form.
Achieving parity

If we want communicating computers to detect and correct transmission errors automatically, we must provide a replacement for context. This usually takes the form of an error correction code or error detection code. A simple type of error detection code that you are probably already familiar with is called a parity bit. A parity bit is a single, extra binary digit that is appended to the message by the sender and transmitted along with it. Depending on the type of parity used, the parity bit ensures that the total number of logic ones sent is even (even parity) or odd (odd parity). For an example of even parity, consider the sequence:

10101110 1

in which the eight-bit message contains five ones and three zeros. The parity bit is one in this case to force the total number of ones in the transmitted data stream to be even.
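Computed in code, the even-parity bit is just the XOR of all the bits in the data; a minimal sketch for a single byte (the function name is illustrative):

```c
#include <stdint.h>

/* Compute the even-parity bit for one byte: returns 1 when the
 * byte contains an odd number of ones, so that appending the
 * parity bit makes the total count of ones even. */
uint8_t
evenParityBit(uint8_t byte)
{
    uint8_t parity = 0;

    while (byte != 0)
    {
        parity ^= (byte & 1);   /* XOR-accumulate each bit. */
        byte >>= 1;
    }

    return parity;
}   /* evenParityBit() */
```

For the byte 10101110 (0xAE) above, which contains five ones, the function returns 1, matching the appended parity bit in the example.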

When a parity bit is appended to a message, one additional bit of data must be sent between the computers. So there must be some benefit associated with the lost bandwidth. The most obvious benefit is that if any single bit in the message is inverted during transmission, the number of ones will either increase or decrease by one, thus making the received number of ones odd and the parity incorrect. The receiver can now detect such an error automatically and request a retransmission of the entire stream of bits. Interestingly, the receiver can also now detect any odd number of bit inversions. (Note that all of this still applies even when the parity bit is one of the inverted bits.)

However, as you may have already noticed, if two bits are inverted (two ones become zeros, for example), the parity of the stream of bits will not change; a receiver will not be able to detect such errors. In fact, the same is true for any even number of bit inversions. A parity bit is a weak form of error detection code. It has a small cost (one bit per message), but it is unable to detect many types of possible errors. (By the way, odd parity has the same costs, benefits, and weaknesses as even parity.)

Perhaps this is not acceptable for your application. An alternative that might occur to you is to send the entire message twice: if you're already giving up some bandwidth to an error detection code, why not give up half of it and use a second copy of the message as the code? The problem is that you could actually lose much more than half of the available bandwidth. If a computer receives two copies of a message and the bits that comprise them aren't exactly the same, it cannot tell which of the two is correct. In fact, it may be that both copies contain errors. So the receiver must request a retransmission whenever there is any discrepancy between the message and the copy sent as an error detection code.

With that in mind, let's compare the bandwidth costs of using a parity bit vs. resending the entire message. We'll assume that all messages are 1,000-bits long and that the communications channel has a bit error rate (average number of inverted bits) of one per 10,000 bits sent. Using a parity bit, we'd spend 0.1% of our bandwidth sending the error detection code (one bit of error protection for 1,000 bits of message) and have to retransmit one out of every 10 messages due to errors. If we send two complete copies of each message instead, the smallest unit of transmission is 2,000 bits (50% of our bandwidth is now spent sending the error detection code). In addition, we'll have to retransmit one out of every five messages. Therefore, the achievable bandwidths are approximately 90% and 40%, respectively. As the width of the code increases, the message plus code lengthens and becomes more vulnerable to bit errors and, as a result, expensive retransmissions.

From this type of analysis it should be clear that keeping the size of an error detection code as small as possible is a good thing, even if it does mean that some types of errors will be undetectable. (Note that even sending two copies of the message is not a perfect solution. If the same bit errors should happen to occur in both copies, the errors will not be detected by the receiver.) In addition to reducing the bandwidth cost of transmitting the error detection code itself, this also increases the message throughput. But we still don't want to go so far as using a one-bit error detection scheme like parity. That would let too many possible errors escape detection.

In practice, of course, we can achieve far better error detection capabilities than just odd numbers of inverted bits. But in order to do so we have to use error detection codes that are more complex than simple parity, and also contain more bits. I'll spend the remainder of this column and most of the next two discussing the strengths and weaknesses of various types of checksums, showing you how to compute them, and explaining how each can be best employed for purposes of error detection.
Checksums

As the name implies, a checksum usually involves a summation of one sort or another. For example, one of the most common checksum algorithms involves treating the message like a sequence of bytes and summing them. Listing 1 shows how this simple algorithm might be implemented in C.

uint8_t
Sum(uint8_t const message[], int nBytes)
{
    uint8_t sum = 0;


    while (nBytes-- > 0)
    {
        sum += *(message++);
    }

    return (sum);

}   /* Sum() */

Listing 1. A sum-of-bytes checksum

The sum-of-bytes algorithm is straightforward to compute. However, understanding its strengths and weaknesses as a checksum is more difficult. What types of errors does it detect? What errors is it unable to detect? These are important factors to consider whenever you are selecting a checksum algorithm for a particular application. You want the algorithm you choose to be well matched to the types of errors you expect to occur in your transmission environment. For example, a parity bit would be a poor choice for a checksum if bit errors will frequently occur in pairs.

A noteworthy weakness of the sum-of-bytes algorithm is that no error will be detected if the entire message and checksum are somehow received as a string of all zeros. (A message of all zeros is a possibility, and the sum of a large block of zeros is still zero.) The simplest way to overcome this weakness is to add a final step to the checksum algorithm: invert the bits of the final sum. With that change, the proper checksum for a message of all zeros becomes FFh. That way, if the message and checksum are both all zeros, the receiver will know that something's gone terribly wrong. A modified version of the checksum implementation is shown in Listing 2.

uint8_t
SumAndInvert(uint8_t const message[], int nBytes)
{
    uint8_t sum = 0;


    while (nBytes-- > 0)
    {
        sum += *(message++);
    }

    return (~sum);

}   /* SumAndInvert() */

Listing 2. A slightly more robust sum-of-bytes checksum algorithm
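To see the difference the inversion makes, compare the two routines on a message of all zeros: Sum() returns 00h, indistinguishable from a dead link, while SumAndInvert() returns FFh. (Here SumAndInvert() is expressed in terms of Sum() for brevity; it is behaviorally identical to Listing 2.)

```c
#include <stdint.h>

/* Same algorithm as Listing 1: an eight-bit sum, so carries out
 * of the top bit are simply discarded (sum is modulo 256). */
uint8_t
Sum(uint8_t const message[], int nBytes)
{
    uint8_t sum = 0;

    while (nBytes-- > 0)
    {
        sum += *(message++);
    }

    return (sum);
}   /* Sum() */

/* Listing 2's variant: invert the final sum so that an all-zeros
 * message no longer has an all-zeros checksum. */
uint8_t
SumAndInvert(uint8_t const message[], int nBytes)
{
    return (uint8_t) ~Sum(message, nBytes);
}   /* SumAndInvert() */
```

Note also that Sum() over the two bytes {0xFF, 0x02} yields 0x01: the carry out of bit 7 is lost. This discarded carry is exactly what the ones complement (end-around carry) sum discussed later puts back in.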

This final inversion does not affect any of the other error detection capabilities of this checksum, so let's go back to discussing the basic sum-of-bytes algorithm in Listing 1. First, it should be obvious that any single bit inversion in the message or checksum will be detectable. Such an error will always affect at least one bit within the checksum. (It could affect more, of course, but never fewer.) Observe that the sum-of-bytes is performed by essentially lining up all of the bytes that comprise the message and performing addition on them, like this:

  10110101
  11000000
  00001101
+ 11100011
----------
  01100101

Because of this mathematical arrangement, simultaneous single-bit inversions can occur in any number of the "columns" and still be caught: no matter how the inversions are distributed, the lowest-order column containing an error will alter the checksum, provided that column holds only one inverted bit. This is an important point, because it defines the broader class of errors this type of checksum detects. Two or more bit inversions in higher-order columns may cancel each other out, but as long as the lowest-order column with an error has only one, it doesn't matter what other errors may be hidden within the message.

Now let's step back for a minute and talk about errors more formally. What exactly does a transmission error look like? Well, the first and most obvious type of error is a single bit that is inverted. That happens sometimes and is easy to detect (even a parity bit will do the job). Other times, bit inversions are part of an error burst. Not all of the bits within an error burst must be inverted. The defining characteristic is simply an inverted bit on each end. So an error burst may be 200 bits long, but contain just two inverted bits--one at each end.

A sum-of-bytes checksum will detect the vast majority of error bursts, no matter what their length. However, describing exactly which ones is generally difficult. Only one class of error bursts is always detectable: those of length eight bits or less. As long as the two inverted bits that bound the error burst have no more than six bits between them, the error will always be detected by this algorithm. That's because our previous requirement for error detection--that the lowest-order column with an error have only one error--is always met when the length of the error burst is no longer than the width of the checksum. We can know with certainty that such errors will always be detected.

Of course, many longer error bursts will also be detected. The probability of detecting a random error burst longer than eight bits is 99.6%. Errors will be overlooked only if the modified message has exactly the same sum as the original one, for which there is a 2^-8 (1 in 256) chance. That's much better error detection than a simple parity bit, and for not too much more cost.

The sum-of-bytes algorithm becomes even more powerful when the width of the checksum is increased. In other words, increasing the number of bits in the checksum causes it to detect even more types of errors. A 16-bit sum-of-words checksum will detect all single bit errors and all error bursts of length 16 bits or fewer. It will also detect 99.998% of longer error bursts. A 32-bit sum will detect even more errors. In practice, this increase in performance must be weighed against the increased cost of sending the extra checksum bits as part of every exchange of data.
Internet checksum

Many protocol stacks include some sort of a checksum within each protocol layer. The TCP/IP suite of protocols is no exception in this regard. In addition to a checksum at the lowest layer (within Ethernet packets, for example), checksums also exist within each IP, UDP, and TCP header. Figure 1 shows what some of these headers look like in the case of some data sent via UDP/IP. Here the fields of the IP header are summed to generate the 16-bit IP checksum and the data, fields of the UDP header, and certain fields of the IP header are summed to generate the 16-bit UDP checksum.

Figure 1. UDP and IP headers with checksum fields highlighted

A function that calculates the IP header checksum is shown in Listing 3. This function can be used before sending an IP packet or immediately after receiving one. If the packet is about to be sent, the checksum field should be set to zero before calculating the checksum and filled with the returned value afterward. If the packet has just been received, the checksum routine is expected to return zero to indicate that the IP header was received correctly: the ones complement sum of a valid header, including its checksum field, is 0xFFFF, which the final inversion turns into zero. This result is a property of the type of addition used to compute the IP checksum.

uint16_t
NetIpChecksum(uint16_t const ipHeader[], int nWords)
{
    uint32_t sum = 0;


    /*
     * IP headers always contain an even number of bytes.
     */
    while (nWords-- > 0)
    {
        sum += *(ipHeader++);
    }

    /*
     * Use carries to compute 1's complement sum.
     */
    sum = (sum >> 16) + (sum & 0xFFFF);
    sum += (sum >> 16);

    /*
     * Return the inverted 16-bit result.
     */
    return ((uint16_t) ~sum);

}   /* NetIpChecksum() */

Listing 3. IP header checksum calculation

The IP header to be checksummed should be passed to NetIpChecksum() as an array of 16-bit words. Since the length of an IP header is always a multiple of four bytes, it is sufficient to provide the header length as a number of 16-bit words. The checksum is then computed by the function and returned to the caller, either for insertion into the outgoing header or for validation of a received header's contents.
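To make the sending and receiving uses concrete, here is a sketch pairing the Listing 3 algorithm (with an explicit 32-bit accumulator, so that carries out of bit 15 are retained for the fold) with a sample 20-byte IPv4 header. The header values are illustrative; 0xB861 in the sixth word is the checksum this same algorithm produces for them.

```c
#include <stdint.h>

/* Same algorithm as Listing 3. */
uint16_t
NetIpChecksum(uint16_t const ipHeader[], int nWords)
{
    uint32_t sum = 0;

    while (nWords-- > 0)
    {
        sum += *(ipHeader++);
    }

    /* Fold the accumulated carries back in (end-around carry). */
    sum  = (sum >> 16) + (sum & 0xFFFF);
    sum += (sum >> 16);

    return ((uint16_t) ~sum);
}   /* NetIpChecksum() */

/* A sample 20-byte IPv4 header as ten 16-bit words; the sixth
 * word (0xB861) is its correct header checksum. */
uint16_t const sampleHeader[10] = {
    0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
    0xB861, 0xC0A8, 0x0001, 0xC0A8, 0x00C7
};
```

On the sending side, zeroing the checksum word and calling NetIpChecksum() yields 0xB861; on the receiving side, calling it on the intact header yields zero, indicating a valid header.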

When you first look at this function, you may be overcome with a feeling of deja vu. The IP checksum algorithm begins in much the same way as the sum-of-bytes algorithm I discussed earlier. However, this algorithm is actually different. First, of course, we're computing a 16-bit checksum instead of an eight-bit one, so we're summing words rather than bytes. That difference is obvious. Less obvious is that we're actually computing a ones complement sum.

Recall that most computers store integers in a twos complement representation. Simply adding two integers, as we did in the previous algorithm, will therefore result in a twos complement sum. In order to compute the ones complement sum, we need to perform our addition with "end around carry." This means that carries out of the most significant bit (MSB) are not discarded, as they were previously. Instead, carries are added back into the checksum at the least significant bit (LSB). This could be done after each addition, but testing for a carry after each addition is expensive in C. A faster way is to let the carries accumulate in the upper half of a 32-bit accumulator. Once the sum-of-words is complete, the upper half of the 32-bit accumulator is turned into a 16-bit value and added to the 16-bit twos complement sum (the lower half). One final carry is possible at that point, and must be included in the final sum. The IP checksum is the inverted ones complement sum of all of the words in the IP header.
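The per-addition variant mentioned above can be sketched as a helper that folds the carry back in immediately after each add; OnesComplementAdd() is an illustrative name, not from the article:

```c
#include <stdint.h>

/* One step of ones complement addition: add two 16-bit words in a
 * 32-bit accumulator, then fold the carry out of bit 15 back into
 * the least significant bit (end-around carry). */
uint16_t
OnesComplementAdd(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t) a + (uint32_t) b;

    return (uint16_t) ((sum & 0xFFFF) + (sum >> 16));
}   /* OnesComplementAdd() */
```

For example, 0x8000 + 0x8001 produces a twos complement result of 0x0001 with a carry out; adding that carry back in gives the ones complement sum 0x0002. Listing 3 defers this fold until after all the words are summed, which is cheaper but arithmetically equivalent.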

For checksum purposes, ones complement arithmetic has an important advantage over twos complement arithmetic. Recall that the biggest weakness of a parity bit is that it can't detect a pair of bit errors or any even number of errors, for that matter. A twos complement sum suffers from a similar weakness. Only one of the bits in the sum (the MSB) is affected by changes in the most significant of the 16 columns. If an even number of bit errors occurs within that column, the checksum will appear to be correct and the error will not be detected. A ones complement sum does not suffer this weakness.

Because carries out of the MSB are added back into the LSB during a ones complement sum, errors in any one column of the data will be reflected in more than one bit of the checksum. So a ones complement sum is a stronger checksum (for example, will detect more errors) than a twos complement sum, and only slightly more expensive. Hence, it is chosen for use in a lot of different situations.

The checksums within the UDP and TCP headers are computed in exactly the same way as the IP checksum. In fact, the only major difference is the set of words over which the sum is calculated. (A minor difference is that UDP checksums are optional.) In both cases, these layer 4 protocols include the message, their own header fields, and a few important fields of the IP header in the checksum calculation. The inclusion of some IP header fields in the UDP and TCP checksums is one of the biggest reasons that these layers of the protocol stack cannot be completely separated from the IP layer in a software implementation. The reason that IP checksums include only the IP header is that many intermediate computers must validate, read, and modify the contents of the IP header as a packet travels from its source to the destination. By reducing the number of words involved in the calculation, the speed with which all IP packets move across the Internet is improved. Including a larger set of words in the UDP and TCP checksums does not slow down communications; rather, it increases the likelihood that the message received is the same as the message sent.

Source http://www.netrino.com/Embedded-Systems/How-To/Additive-Checksums