I, Me, MySelf and CS Problems

Wednesday, May 25, 2011

Advance Data Structure B-Tree, B+ Tree, RB Tree, AVL Tree & It Application

Introduction

B-Tree is a balanced m-way tree. This discussion from Wiki is a good material to introduce one to the characteristics and node layout of a B-Tree algorithm: http://en.wikipedia.org/wiki/B-tree.

This article discusses an in-memory B-Tree implementation. Although B-Tree is typically used in file/database system indexing (see my B-Tree disk based version for details @: http://www.4atech.net), there is a significant value in implementing a B-Tree based collection or dictionary in the .NET framework or in any language/platform for that matter.

Advantages of B-Tree In-memory

Typical in-memory sorted dictionary data structures today are based on the Binary Tree algorithm, which is not to be confused with B-Tree. Each node of a Binary Tree can contain a single item, whereas a B-Tree can contain a user defined number of items per node, and its nodes are kept balanced. This is a very important differentiation. Being that each node has a single item, storing a large number of items in a Binary Tree will generate a tall and narrow node graph with numerous nodes.

In a corresponding B-Tree implementation, the graph will tend to be shorter and wider with a lot fewer nodes. This is because a node in a B-Tree is typically configured to have numerous items; e.g., a B-Tree dictionary with 12 items will require a single node to contain the items, and a Binary Tree will require 12 nodes which can be scattered around in the heap (memory). Increasing our item sampling to thousands, hundreds of thousands, or millions, we're talking about a significant differentiation and significant optimization that a corresponding B-Tree based sorted dictionary can provide. A million single item node of a Binary Tree vs. around eighty three thousand of a B-Tree, for a 12 item node setup, and even far less if the user specifies more items per node than mentioned.

With this characterization, it is easy to imagine that a Binary Tree will tend to use more memory, and will tend to generate more memory fragmentation than a respective dictionary utilizing the B-Tree algorithm. And in production environments, we can't afford to have a system that is prone to memory fragmentation as in time, the application will degrade in performance and can cause out of memory conditions due to lack of contiguous allocable space.

More &v Detailed Information

Source
B-Tree http://www.codeproject.com/KB/collections/BTreeSortedDictionary.aspx
B+ Tree http://www.codeproject.com/KB/files/NTFSUndelete.aspx
RB Tree http://www.codeproject.com/KB/recipes/redblackcs.aspx
AVL Tree
1 http://www.codeproject.com/KB/architecture/avl_tree.aspx
2 http://www.codeproject.com/KB/architecture/avl_cpp.aspx
3 http://www.codeproject.com/KB/cpp/avltree.aspx

Saturday, May 21, 2011

CRC Implementation Code in C

CRCs are among the best checksums available to detect and/or correct errors in communications transmissions. Unfortunately, the modulo-2 arithmetic used to compute CRCs doesn't map easily into software. This article shows how to implement an efficient CRC in C.

I'm going to complete my discussion of checksums by showing you how to implement CRCs in software. I'll start with a naïve implementation and gradually improve the efficiency of the code as I go along. However, I'm going to keep the discussion at the level of the C language, so further steps could be taken to improve the efficiency of the final code simply by moving into the assembly language of your particular processor.

For most software engineers, the overwhelmingly confusing thing about CRCs is their implementation. Knowing that all CRC algorithms are simply long division algorithms in disguise doesn't help. Modulo-2 binary division doesn't map particularly well to the instruction sets of off-the-shelf processors. For one thing, generally no registers are available to hold the very long bit sequence that is the numerator. For another, modulo-2 binary division is not the same as ordinary division. So even if your processor has a division instruction, you won't be able to use it.
Modulo-2 binary division

Before writing even one line of code, let's first examine the mechanics of modulo-2 binary division. We'll use the example in Figure 1 to guide us. The number to be divided is the message augmented with zeros at the end. The number of zero bits added to the message is the same as the width of the checksum (what I call c); in this case four bits were added. The divisor is a c+1-bit number known as the generator polynomial.

Modulo-2 Binary Division Example

Figure 1. An example of modulo-2 binary division

The modulo-2 division process is defined as follows:

* Call the uppermost c+1 bits of the message the remainder
* Beginning with the most significant bit in the original message and for each bit position that follows, look at the c+1 bit remainder:
o If the most significant bit of the remainder is a one, the divisor is said to divide into it. If that happens (just as in any other long division) it is necessary to indicate a successful division in the appropriate bit position in the quotient and to compute the new remainder. In the case of modulo-2 binary division, we simply:
+ Set the appropriate bit in the quotient to a one, and
+ XOR the remainder with the divisor and store the result back into the remainder
o Otherwise (if the first bit is not a one):
+ Set the appropriate bit in the quotient to a zero, and
+ XOR the remainder with zero (no effect)
o Left-shift the remainder, shifting in the next bit of the message. The bit that's shifted out will always be a zero, so no information is lost.

The final value of the remainder is the CRC of the given message.

What's most important to notice at this point is that we never use any of the information in the quotient, either during or after computing the CRC. So we won't actually need to track the quotient in our software implementation. Also note here that the result of each XOR with the generator polynomial is a remainder that has zero in its most significant bit. So we never lose any information when the next message bit is shifted into the remainder.
Bit by bit

Listing 1 contains a naive software implementation of the CRC computation just described. It simply attempts to implement that algorithm as it was described above for this one particular generator polynomial. Even though the unnecessary steps have been eliminated, it's extremely inefficient. Multiple C statements (at least the decrement and compare, binary AND, test for zero, and left shift operations) must be executed for each bit in the message. Given that this particular message is only eight bits long, that might not seem too costly. But what if the message contains several hundred bytes, as is typically the case in a real-world application? You don't want to execute dozens of processor opcodes for each byte of input data.

#define POLYNOMIAL 0xD8 /* 11011 followed by 0's */

uint8_t
crcNaive(uint8_t const message)
{
uint8_t remainder;

/*
* Initially, the dividend is the remainder.
*/
remainder = message;

/*
* For each bit position in the message....
*/
for (uint8_t bit = 8; bit > 0; --bit)
{
/*
* If the uppermost bit is a 1...
*/
if (remainder & 0x80)
{
/*
* XOR the previous remainder with the divisor.
*/
remainder ^= POLYNOMIAL;
}

/*
* Shift the next bit of the message into the remainder.
*/
remainder = (remainder << 1);
}

/*
* Return only the relevant bits of the remainder as CRC.
*/
return (remainder >> 4);

} /* crcNaive() */

Listing 1. A naive CRC implementation in C
Code clean up

Before we start making this more efficient, the first thing to do is to clean this naive routine up a bit. In particular, let's start making some assumptions about the applications in which it will most likely be used. First, let's assume that our CRCs are always going to be 8-, 16-, or 32-bit numbers. In other words, that the remainder can be manipulated easily in software. That means that the generator polynomials will be 9, 17, or 33 bits wide, respectively. At first it seems we may be stuck with unnatural sizes and will need special register combinations, but remember these two facts:

* The most significant bit of any generator polynomial is always a one
* The uppermost bit of the XOR result is always zero and promptly shifted out of the remainder

Since we already have the information in the uppermost bit and we don't need it for the XOR, the polynomial can also be stored in an 8-, 16-, or 32-bit register. We can simply discard the most significant bit. The register size that we use will always be equal to the width of the CRC we're calculating.

As long as we're cleaning up the code, we should also recognize that most CRCs are computed over fairly long messages. The entire message can usually be treated as an array of unsigned data bytes. The CRC algorithm should then be iterated over all of the data bytes, as well as the bits within those bytes.

The result of making these two changes is the code shown in Listing 2. This implementation of the CRC calculation is still just as inefficient as the previous one. However, it is far more portable and can be used to compute a number of different CRCs of various widths.

/*
* The width of the CRC calculation and result.
* Modify the typedef for a 16 or 32-bit CRC standard.
*/
typedef uint8_t crc;

#define WIDTH (8 * sizeof(crc))
#define TOPBIT (1 << (WIDTH - 1))

crc
crcSlow(uint8_t const message[], int nBytes)
{
crc remainder = 0;

/*
* Perform modulo-2 division, a byte at a time.
*/
for (int byte = 0; byte < nBytes; ++byte)
{
/*
* Bring the next byte into the remainder.
*/
remainder ^= (message[byte] << (WIDTH - 8));

/*
* Perform modulo-2 division, a bit at a time.
*/
for (uint8_t bit = 8; bit > 0; --bit)
{
/*
* Try to divide the current data bit.
*/
if (remainder & TOPBIT)
{
remainder = (remainder << 1) ^ POLYNOMIAL;
}
else
{
remainder = (remainder << 1);
}
}
}

/*
* The final remainder is the CRC result.
*/
return (remainder);

} /* crcSlow() */

Listing 2. A more portable but still naive CRC implementation
Byte by byte

The most common way to improve the efficiency of the CRC calculation is to throw memory at the problem. For a given input remainder and generator polynomial, the output remainder will always be the same. If you don't believe me, just reread that sentence as "for a given dividend and divisor, the remainder will always be the same." It's true. So it's possible to precompute the output remainder for each of the possible byte-wide input remainders and store the results in a lookup table. That lookup table can then be used to speed up the CRC calculations for a given message. The speedup is realized because the message can now be processed byte by byte, rather than bit by bit.

The code to precompute the output remainders for each possible input byte is shown in Listing 3. The computed remainder for each possible byte-wide dividend is stored in the array crcTable[]. In practice, the crcInit() function could either be called during the target's initialization sequence (thus placing crcTable[] in RAM) or it could be run ahead of time on your development workstation with the results stored in the target device's ROM.

crc crcTable[256];

void
crcInit(void)
{
crc remainder;

/*
* Compute the remainder of each possible dividend.
*/
for (int dividend = 0; dividend < 256; ++dividend)
{
/*
* Start with the dividend followed by zeros.
*/
remainder = dividend << (WIDTH - 8);

/*
* Perform modulo-2 division, a bit at a time.
*/
for (uint8_t bit = 8; bit > 0; --bit)
{
/*
* Try to divide the current data bit.
*/
if (remainder & TOPBIT)
{
remainder = (remainder << 1) ^ POLYNOMIAL;
}
else
{
remainder = (remainder << 1);
}
}

/*
* Store the result into the table.
*/
crcTable[dividend] = remainder;
}

} /* crcInit() */

Listing 3. Computing the CRC lookup table

Of course, whether it is stored in RAM or ROM, a lookup table by itself is not that useful. You'll also need a function to compute the CRC of a given message that is somehow able to make use of the values stored in that table. Without going into all of the mathematical details of why this works, suffice it to say that the previously complicated modulo-2 division can now be implemented as a series of lookups and XORs. (In modulo-2 arithmetic, XOR is both addition and subtraction.)

A function that uses the lookup table contents to compute a CRC more efficiently is shown in Listing 4. The amount of processing to be done for each byte is substantially reduced.

crc
crcFast(uint8_t const message[], int nBytes)
{
uint8_t data;
crc remainder = 0;

/*
* Divide the message by the polynomial, a byte at a time.
*/
for (int byte = 0; byte < nBytes; ++byte)
{
data = message[byte] ^ (remainder >> (WIDTH - 8));
remainder = crcTable[data] ^ (remainder << 8);
}

/*
* The final remainder is the CRC.
*/
return (remainder);

} /* crcFast() */

Listing 4. A more efficient, table-driven, CRC implementation

As you can see from the code in Listing 4, a number of fundamental operations (left and right shifts, XORs, lookups, and so on) still must be performed for each byte even with this lookup table approach. So to see exactly what has been saved (if anything) I compiled both crcSlow() and crcFast() with IAR's C compiler for the PIC family of eight-bit RISC processors. 1 I figured that compiling for such a low-end processor would give us a good worst-case comparison for the numbers of instructions to do these different types of CRC computations. The results of this experiment were as follows:

* crcSlow(): 185 instructions per byte of message data
* crcFast(): 36 instructions per byte of message data

So, at least on one processor family, switching to the lookup table approach results in a more than five-fold performance improvement. That's a pretty substantial gain considering that both implementations were written in C. A bit more could probably be done to improve the execution speed of this algorithm if an engineer with a good understanding of the target processor were assigned to hand-code or tune the assembly code. My somewhat-educated guess is that another two-fold performance improvement might be possible. Actually achieving that is, as they say in textbooks, left as an exercise for the curious reader.
CRC standards and parameters

Now that we've got our basic CRC implementation nailed down, I want to talk about the various types of CRCs that you can compute with it. As I mentioned last month, several mathematically well understood and internationally standardized CRC generator polynomials exist and you should probably choose one of those, rather than risk inventing something weaker.

In addition to the generator polynomial, each of the accepted CRC standards also includes certain other parameters that describe how it should be computed. Table 1 contains the parameters for three of the most popular CRC standards. Two of these parameters are the "initial remainder" and the "final XOR value". The purpose of these two c-bit constants is similar to the final bit inversion step added to the sum-of-bytes checksum algorithm. Each of these parameters helps eliminate one very special, though perhaps not uncommon, class of ordinarily undetectable difference. In effect, they bulletproof an already strong checksum algorithm.

CRC-CCITT

CRC-16

CRC-32
Width

16 bits

16 bits

32 bits
(Truncated) Polynomial

0x1021

0x8005

0x04C11DB7
Initial Remainder

0xFFFF

0x0000

0xFFFFFFFF
Final XOR Value

0x0000

0x0000

0xFFFFFFFF
Reflect Data?

No

Yes

Yes
Reflect Remainder?

No

Yes

Yes
Check Value

0x29B1

0xBB3D

0xCBF43926

Table 1. Computational parameters for popular CRC standards

To see what I mean, consider a message that begins with some number of zero bits. The remainder will never contain anything other than zero until the first one in the message is shifted into it. That's a dangerous situation, since packets beginning with one or more zeros may be completely legitimate and a dropped or added zero would not be noticed by the CRC. (In some applications, even a packet of all zeros may be legitimate!) The simple way to eliminate this weakness is to start with a nonzero remainder. The parameter called initial remainder tells you what value to use for a particular CRC standard. And only one small change is required to the crcSlow() and crcFast() functions:

crc remainder = INITIAL_REMAINDER;

The final XOR value exists for a similar reason. To implement this capability, simply change the value that's returned by crcSlow() and crcFast() as follows:

return (remainder ^ FINAL_XOR_VALUE);

If the final XOR value consists of all ones (as it does in the CRC-32 standard), this extra step will have the same effect as complementing the final remainder. However, implementing it this way allows any possible value to be used in your specific application.

In addition to these two simple parameters, two others exist that impact the actual computation. These are the binary values "reflect data" and "reflect remainder". The basic idea is to reverse the bit ordering of each byte within the message and/or the final remainder. The reason this is sometimes done is that a good number of the hardware CRC implementations operate on the "reflected" bit ordering of bytes that is common with some UARTs. Two slight modifications of the code are required to prepare for these capabilities.

What I've generally done is to implement one function and two macros. This code is shown in Listing 5. The function is responsible for reflecting a given bit pattern. The macros simply call that function in a certain way.

#define REFLECT_DATA(X) ((uint8_t) reflect((X), 8))
#define REFLECT_REMAINDER(X) ((crc) reflect((X), WIDTH))

uint32_t
reflect(uint32_t data, uint8_t nBits)
{
uint32_t reflection = 0;

/*
* Reflect the data about the center bit.
*/
for (uint8_t bit = 0; bit < nBits; ++bit)
{
/*
* If the LSB bit is set, set the reflection of it.
*/
if (data & 0x01)
{
reflection |= (1 << ((nBits - 1) - bit));
}

data = (data >> 1);
}

return (reflection);

} /* reflect() */

Listing 5. Reflection macros and function

By inserting the macro calls at the two points that reflection may need to be done, it is easier to turn reflection on and off. To turn either kind of reflection off, simply redefine the appropriate macro as (X). That way, the unreflected data byte or remainder will be used in the computation, with no overhead cost. Also note that for efficiency reasons, it may be desirable to compute the reflection of all of the 256 possible data bytes in advance and store them in a table, then redefine the REFLECT_DATA() macro to use that lookup table.

Tested, full-featured implementations of both crcSlow() and crcFast() are available for download. These implementations include the reflection capabilities just described and can be used to implement any parameterized CRC formula. Simply change the constants and macros as necessary.

The final parameter that I've included in Table 1 is a "check value" for each CRC standard. This is the CRC result that's expected for the simple ASCII test message "123456789." To test your implementation of a particular standard, simply invoke your CRC computation on that message and check the result:

crcInit();

checksum = crcFast("123456789", 9);

If checksum has the correct value after this call, then you know your implementation is correct. This is a handy way to ensure compatibility between two communicating devices with different CRC implementations or implementors.

Additive CheckSum Algorithm

Whenever you connect two or more computers together with the intent of exchanging information, you assume that the exchange will take place without errors. But what if some of the data is lost or corrupted in transit? Communication protocols usually attempt to detect such errors automatically. To do that they use checksums.

The most important part of listening to someone speak is ensuring that you've heard them correctly. Your brain performs the tasks of error detection and correction for you, automatically. It does this by examining extra bits of information from the speaker and the speech; if a phrase or sentence makes sense as a whole and it makes sense coming from the mouth of the particular speaker, then the individual words were probably heard correctly. The same principle applies when you are reading. But what happens when computers are communicating with one another? How does the receiving computer know if an error has occurred in transmission?

Establishing correctness is more difficult for computers than humans. At the lowest level, communication between computers consists of nothing but a stream of binary digits. Meaning is only assigned to that particular sequence of bits at higher levels. We call that meaningful sequence of bits the message; it is analogous to a spoken or written phrase. If one or more bits within the message are inverted (a logic one becomes a logic zero, or vice versa) as it travels between computers, the receiver has no way to detect the error. No environmental or syntactical context is available to the receiver, since it cannot understand the message in its transmitted form.
Achieving parity

If we want communicating computers to detect and correct transmission errors automatically, we must provide a replacement for context. This usually takes the form of an error correction code or error detection code. A simple type of error detection code that you are probably already familiar with is called a parity bit. A parity bit is a single, extra binary digit that is appended to the message by the sender and transmitted along with it. Depending on the type of parity used, the parity bit ensures that the total number of logic ones sent is even (even parity) or odd (odd parity). For an example of even parity, consider the sequence:

10101110 1

in which the eight-bit message contains five ones and three zeros. The parity bit is one in this case to force the total number of ones in the transmitted data stream to be even.

When a parity bit is appended to a message, one additional bit of data must be sent between the computers. So there must be some benefit associated with the lost bandwidth. The most obvious benefit is that if any single bit in the message is inverted during transmission, the number of ones will either increase or decrease by one, thus making the received number of ones odd and the parity incorrect. The receiver can now detect such an error automatically and request a retransmission of the entire stream of bits. Interestingly, the receiver can also now detect any odd number of bit inversions. (Note that all of this still applies even when the parity bit is one of the inverted bits.)

However, as you may have already noticed, if two bits are inverted (two ones become zeros, for example), the parity of the stream of bits will not change; a receiver will not be able to detect such errors. In fact, the same is true for any even number of bit inversions. A parity bit is a weak form of error detection code. It has a small cost (one bit per message), but it is unable to detect many types of possible errors. (By the way, odd parity has the same costs, benefits, and weaknesses as even parity.)

Perhaps this is not acceptable for your application. An alternative that might occur to you is to send the entire message twice. If you're already spending bandwidth sending an error detection code, why not spend half of the bandwidth? The problem is that you could actually be spending much more than half of the available bandwidth. If a computer receives two copies of a message and the bits that comprise them aren't exactly the same, it cannot tell which one of the two is correct. In fact, it may be that both copies contain errors. So the receiver must request a retransmission whenever there is a discrepancy between the message and the copy sent as an error detection code.

With that in mind, let's compare the bandwidth costs of using a parity bit vs. resending the entire message. We'll assume that all messages are 1,000-bits long and that the communications channel has a bit error rate (average number of inverted bits) of one per 10,000 bits sent. Using a parity bit, we'd spend 0.1% of our bandwidth sending the error detection code (one bit of error protection for 1,000 bits of message) and have to retransmit one out of every 10 messages due to errors. If we send two complete copies of each message instead, the smallest unit of transmission is 2,000 bits (50% of our bandwidth is now spent sending the error detection code). In addition, we'll have to retransmit one out of every five messages. Therefore, the achievable bandwidths are approximately 90% and 40%, respectively. As the width of the code increases, the message plus code lengthens and becomes more vulnerable to bit errors and, as a result, expensive retransmissions.

From this type of analysis it should be clear that keeping the size of an error detection code as small as possible is a good thing, even if it does mean that some types of errors will be undetectable. (Note that even sending two copies of the message is not a perfect solution. If the same bit errors should happen to occur in both copies, the errors will not be detected by the receiver.) In addition to reducing the bandwidth cost of transmitting the error detection code itself, this also increases the message throughput. But we still don't want to go so far as using a one-bit error detection scheme like parity. That would let too many possible errors escape detection.

In practice, of course, we can achieve far better error detection capabilities than just odd numbers of inverted bits. But in order to do so we have to use error detection codes that are more complex than simple parity, and also contain more bits. I'll spend the remainder of this column and most of the next two discussing the strengths and weaknesses of various types of checksums, showing you how to compute them, and explaining how each can be best employed for purposes of error detection.
Checksums

As the name implies, a checksum usually involves a summation of one sort or another. For example, one of the most common checksum algorithms involves treating the message like a sequence of bytes and summing them. Listing 1 shows how this simple algorithm might be implemented in C.

uint8_t
Sum(uint8_t const message[], int nBytes)
{
uint8_t sum = 0;

while (nBytes-- > 0)
{
sum += *(message++);
}

return (sum);

} /* Sum() */

Listing 1. A sum-of-bytes checksum

The sum-of-bytes algorithm is straightforward to compute. However, understanding its strengths and weaknesses as a checksum is more difficult. What types of errors does it detect? What errors is it unable to detect? These are important factors to consider whenever you are selecting a checksum algorithm for a particular application. You want the algorithm you choose to be well matched to the types of errors you expect to occur in your transmission environment. For example, a parity bit would be a poor choice for a checksum if bit errors will frequently occur in pairs.

A noteworthy weakness of the sum-of-bytes algorithm is that no error will be detected if the entire message and data are somehow received as a string of all zeros. (A message of all zeros is a possibility, and the sum of a large block of zeros is still zero.) The simplest way to overcome this weakness is to add a final step to the checksum algorithm: invert the bits of the final sum. The new proper checksum for a message of all zeros would be FFh. That way, if the message and checksum are both all zeros, the receiver will know that something's gone terribly wrong. A modified version of the checksum implementation is shown in Listing 2.

uint8_t
SumAndInvert(uint8_t const message[], int nBytes)
{
uint8_t sum = 0;

while (nBytes-- > 0)
{
sum += *(message++);
}

return (~sum);

} /* SumAndInvert() */

Listing 2. A slightly more robust sum-of-bytes checksum algorithm

This final inversion does not affect any of the other error detection capabilities of this checksum, so let's go back to discussing the basic sum-of-bytes algorithm in Listing 1. First, it should be obvious that any single bit inversion in the message or checksum will be detectable. Such an error will always affect at least one bit within the checksum. (It could affect more, of course, but not less.) Observe that the sum-of-bytes is performed by essentially lining up all of the bytes that comprise the message and performing addition on them, like this:

10110101
11000000
00001101
+ 11100011
----------
01100101

Because of this mathematical arrangement, simultaneous single-bit inversions could occur in each of any number of the "columns." At least one bit of the checksum will always be affected. No matter how the inversions occur, at least the lowest-order column with an error will alter the checksum. This is an important point, because it helps us to understand the broader class of errors that can be detected by this type of checksum. I'll say it again: no matter how the inversions occur, at least the lowest order column with an error will alter the checksum, which means that two or more bit inversions in higher-order columns may cancel each other out. As long as the lowest-order column with an error has only one, it doesn't matter what other errors may be hidden within the message.

Now let's step back for a minute and talk about errors more formally. What exactly does a transmission error look like? Well, the first and most obvious type of error is a single bit that is inverted. That happens sometimes and is easy to detect (even a parity bit will do the job). Other times, bit inversions are part of an error burst. Not all of the bits within an error burst must be inverted. The defining characteristic is simply an inverted bit on each end. So an error burst may be 200 bits long, but contain just two inverted bits--one at each end.

A sum-of-bytes checksum will detect the vast majority of error bursts, no matter what their length. However, describing exactly which ones is generally difficult. Only one class of error bursts is always detectable: those of length eight bits or less. As long as the two inverted bits that bound the error burst have no more than six bits between them, the error will always be detected by this algorithm. That's because our previous requirement for error detection--that the lowest-order column with an error have only one error--is always met when the length of the error burst is no longer than the width of the checksum. We can know with certainty that such errors will always be detected.

Of course, many longer error bursts will also be detected. The probability of detecting a random error burst longer than eight bits is 99.6%. Errors will only be overlooked if the modified message has exactly the same sum as the original one, for which there is a 2-8 chance. That's much better error detection than a simple parity bit, and for not too much more cost.

The sum-of-bytes algorithm becomes even more powerful when the width of the checksum is increased. In other words, increasing the number of bits in the checksum causes it to detect even more types of errors. A 16-bit sum-of-words checksum will detect all single bit errors and all error bursts of length 16 bits or fewer. It will also detect 99.998% of longer error bursts. A 32-bit sum will detect even more errors. In practice, this increase in performance must be weighed against the increased cost of sending the extra checksum bits as part of every exchange of data.
Internet checksum

Many protocol stacks include some sort of a checksum within each protocol layer. The TCP/IP suite of protocols is no exception in this regard. In addition to a checksum at the lowest layer (within Ethernet packets, for example), checksums also exist within each IP, UDP, and TCP header. Figure 1 shows what some of these headers look like in the case of some data sent via UDP/IP. Here the fields of the IP header are summed to generate the 16-bit IP checksum and the data, fields of the UDP header, and certain fields of the IP header are summed to generate the 16-bit UDP checksum.

Figure 1. UDP and IP headers with checksum fields highlighted

A function that calculates the IP header checksum is shown in Listing 3. This function can be used before sending an IP packet or immediately after receiving one. If the packet is about to be sent, the checksum field should be set to zero before calculating the checksum and filled with the returned value afterward. If the packet has just been received, the checksum routine is expected to return a value of 0xFFFF to indicate that the IP header was received correctly. This result is a property of the type of addition used to compute the IP checksum.

uint16_t
NetIpChecksum(uint16_t const ipHeader[], int nWords)
{
uint16_t sum = 0;

/*
* IP headers always contain an even number of bytes.
*/
while (nWords-- > 0)
{
sum += *(ipHeader++);
}

/*
* Use carries to compute 1's complement sum.
*/
sum = (sum >> 16) + (sum & 0xFFFF);
sum += sum >> 16;

/*
* Return the inverted 16-bit result.
*/
return ((unsigned short) ~sum);

} /* NetIpChecksum() */

Listing 3. IP header checksum calculation

The IP header to be checksummed should be passed to NetIpChecksum() as an array of 16-bit words. Since the length of an IP header is always a multiple of four bytes, it is sufficient to provide the header length as a number of 16-bit words. The checksum is then computed by the function and returned to the caller for insertion into the header for validation of the header contents.

When you first look at this function, you may be overcome with a feeling of deja vu. The IP checksum algorithm begins in much the same way as the sum-of-bytes algorithm I discussed earlier. However, this algorithm is actually different. First, of course, we're computing a 16-bit checksum instead of an eight-bit one, so we're summing words rather than bytes. That difference is obvious. Less obvious is that we're actually computing a ones complement sum.

Recall that most computers store integers in a twos complement representation. Simply adding two integers, as we did in the previous algorithm, will therefore result in a twos complement sum. In order to compute the ones complement sum, we need to perform our addition with "end around carry." This means that carries out of the most significant bit (MSB) are not discarded, as they were previously. Instead, carries are added back into the checksum at the least significant bit (LSB). This could be done after each addition, but testing for a carry after each addition is expensive in C. A faster way is to let the carries accumulate in the upper half of a 32-bit accumulator. Once the sum-of-words is complete, the upper half of the 32-bit accumulator is turned into a 16-bit value and added to the 16-bit twos complement sum (the lower half). One final carry is possible at that point, and must be included in the final sum. The IP checksum is the inverted ones complement sum of all of the words in the IP header.

For checksum purposes, ones complement arithmetic has an important advantage over twos complement arithmetic. Recall that the biggest weakness of a parity bit is that it can't detect a pair of bit errors or any even number of errors, for that matter. A twos complement sum suffers from a similar weakness. Only one of the bits in the sum (the MSB) is affected by changes in the most significant of the 16 columns. If an even number of bit errors occurs within that column, the checksum will appear to be correct and the error will not be detected. A ones complement sum does not suffer this weakness.

Because carries out of the MSB are added back into the LSB during a ones complement sum, errors in any one column of the data will be reflected in more than one bit of the checksum. So a ones complement sum is a stronger checksum (for example, will detect more errors) than a twos complement sum, and only slightly more expensive. Hence, it is chosen for use in a lot of different situations.

The checksums within the UDP and TCP headers are computed in exactly the same way as the IP checksum. In fact, the only major difference is the set of words over which the sum is calculated. (A minor difference is that UDP checksums are optional.) In both cases, these layer 4 protocols include the message, their own header fields, and a few important fields of the IP header in the checksum calculation. The inclusion of some IP header fields in the UDP and TCP checksums is one of the biggest reasons that these layers of the protocol stack cannot be completely separated from the IP layer in a software implementation. The reason that IP checksums include only the IP header is that many intermediate computers must validate, read, and modify the contents of the IP header as a packet travels from its source to the destination. By reducing the number of words involved in the calculation, the speed with which all IP packets move across the Internet is improved. Including a larger set of words in the UDP and TCP checksums does not slow down communications; rather, it increases the likelihood that the message received is the same as the message sent.

Source http://www.netrino.com/Embedded-Systems/How-To/Additive-Checksums

Friday, April 29, 2011

Facbook Algorithm

The Social Network (2010) - FaceMash Algorithm

I saw Social Network three times in 1 week. Not for entertainment. Not because I had nothing better to do. But because I wanted to understand the math and computer science fundae used in the movie. I would like to discuss one particularly interesting scene from the movie.

You may remember Mark inviting his friend Eduardo to give him his chess algorithm at the beginning of the movie (Mark was drinking, blogging and hacking simultaneously and creating Facemash.com). You may also remember the scribbles on the window:

and

What is this? This is actually the math behind Elo Rating System. Elo rating system is a method for calculating the relative skill levels of players in two-player games such as chess. It is named after its creator Arpad Elo, a Hungarian-born American physics professor.

As explained in the movie, Facemash was quite simple. Not unlike hotornot.com, students went to a page and 2 random images of girls were picked and presented to them. The students then had to click on the hottest girl presented to them and then another set of two girls would be presented asking the students to repeat the same actions they had done. The difference with hotornot was that the girls presented were all Harvard students. In other words, the students were rating girls of Harvard based on their looks (You can imagine why Mark got into trouble).

The algorithm used - The Elo Rating system - insured a fair rating and ranking of the girls. A hot girl A winning over hot girl B girl would gain more points than winning (or being picked) against ugly girl C. Same goes for the following: ugly girl C wins over ugly girl D. ugly girl C gains some points in her general ranking. If ugly girl C wins over hot girl A then ugly girl C gains more points because the ranking of hot girl A is much higher than ugly girl D. The previous scenario is roughly what the algorithm implemented by Mark was doing. It was somewhat insuring a level of fairness despite the misogynistic nature of the product.

In today’s society, the Elo Rating system is used by many rating and ranking system to predict the outcome of matches but also insure a level of fairness between teams of different levels playing against each others. FIFA uses it to rank football teams.

You can read more about the system on wikipedia page(http://en.wikipedia.org/wiki/Elo_rating_system).

Wednesday, April 13, 2011

Problem Set-3 Dynamic Programming

Dynamic Programming Practice Problems
This site contains a collection of practice dynamic programming problems and their solutions. The problems listed below are also available in a pdf handout. To view the solution to one of the problems below, click on its title. To view the solutions, you'll need a machine which can view Macromedia Flash animations and which has audio output. If you want, you can also view a quick review from recitation on how to solve the integer knapsack problem (with multiple copies of items allowed) using dynamic programming.
Problems:

1. Maximum Value Contiguous Subsequence. Given a sequence of n real numbers A(1) ... A(n), determine a contiguous subsequence A(i) ... A(j) for which the sum of elements in the subsequence is maximized.

2. Making Change. You are given n types of coin denominations of values v(1) < v(2) < ... < v(n) (all integers). Assume v(1) = 1, so you can always make change for any amount of money C. Give an algorithm which makes change for an amount of money C with as few coins as possible. [on problem set 4]

3. Longest Increasing Subsequence. Given a sequence of n real numbers A(1) ... A(n), determine a subsequence (not necessarily contiguous) of maximum length in which the values in the subsequence form a strictly increasing sequence. [on problem set 4]

4. Box Stacking. You are given a set of n types of rectangular 3-D boxes, where the i^th box has height h(i), width w(i) and depth d(i) (all real numbers). You want to create a stack of boxes which is as tall as possible, but you can only stack a box on top of another box if the dimensions of the 2-D base of the lower box are each strictly larger than those of the 2-D base of the higher box. Of course, you can rotate a box so that any side functions as its base. It is also allowable to use multiple instances of the same type of box.

5. Building Bridges. Consider a 2-D map with a horizontal river passing through its center. There are n cities on the southern bank with x-coordinates a(1) ... a(n) and n cities on the northern bank with x-coordinates b(1) ... b(n). You want to connect as many north-south pairs of cities as possible with bridges such that no two bridges cross. When connecting cities, you can only connect city i on the northern bank to city i on the southern bank. (Note: this problem was incorrectly stated on the paper copies of the handout given in recitation.)

6. Integer Knapsack Problem (Duplicate Items Forbidden). This is the same problem as the example above, except here it is forbidden to use more than one instance of each type of item.

7. Balanced Partition. You have a set of n integers each in the range 0 ... K. Partition these integers into two subsets such that you minimize |S1 - S2|, where S1 and S2 denote the sums of the elements in each of the two subsets.

8. Edit Distance. Given two text strings A of length n and B of length m, you want to transform A into B with a minimum number of operations of the following types: delete a character from A, insert a character into A, or change some character in A into a new character. The minimal number of such operations required to transform A into B is called the edit distance between A and B.

9. Counting Boolean Parenthesizations. You are given a boolean expression consisting of a string of the symbols 'true', 'false', 'and', 'or', and 'xor'. Count the number of ways to parenthesize the expression such that it will evaluate to true. For example, there is only 1 way to parenthesize 'true and false xor true' such that it evaluates to true.

10. Optimal Strategy for a Game. Consider a row of n coins of values v(1) ... v(n), where n is even. We play a game against an opponent by alternating turns. In each turn, a player selects either the first or last coin from the row, removes it from the row permanently, and receives the value of the coin. Determine the maximum possible amount of money we can definitely win if we move first.

11. Two-Person Traversal of a Sequence of Cities. You are given an ordered sequence of n cities, and the distances between every pair of cities. You must partition the cities into two subsequences (not necessarily contiguous) such that person A visits all cities in the first subsequence (in order), person B visits all cities in the second subsequence (in order), and such that the sum of the total distances travelled by A and B is minimized. Assume that person A and person B start initially at the first city in their respective subsequences.

12. Bin Packing (Simplified Version). You have n1 items of size s1, n2 items of size s2, and n3 items of size s3. You'd like to pack all of these items into bins each of capacity C, such that the total number of bins used is minimized.

Source http://people.csail.mit.edu/bdean/6.046/dp/
http://net.pku.edu.cn/~course/cs101/2007/resource/Intro2Algorithm/book6/chap16.htm
http://www.cs.berkeley.edu/~vazirani/algorithms/chap6.pdf
http://www.ms.unimelb.edu.au/~moshe/dp/
http://web.iiit.ac.in/~avidullu/pdfs/dynprg/
http://coders-stop.blogspot.com/search/label/Dynamic%20Programming

Sunday, April 10, 2011

Study of Advance data Structure & Algorithm With Code

http://www.stanford.edu/~blp/avl/libavl.html/index.html#toc_AVL-Trees

Problems Set 1- Data Structure

Data Structures

K Nearest Neighbor Algorithm Implementation and Overview

http://www.codeproject.com/KB/recipes/K_nearest_neighbor.aspx

Stacks, Queues, and Lists

3-1. A common problem for compilers and text editors is determining whether the parentheses in a string are balanced and properly nested. For example, the string ((())())() contains properly nested pairs of parentheses, which the strings )()( and ()) do not. Give an algorithm that returns true if a string contains properly nested and balanced parentheses, and false if otherwise. For full credit, identify the position of the first offending parenthesis if the string is not properly nested and balanced.

(Solution 3.1)

3-2. Write a program to reverse the direction of a given singly-linked list. In other words, after the reversal all pointers should now point backwards. Your algorithm should take linear time.

(Solution 3.2)

3-3. We have seen how dynamic arrays enable arrays to grow while still achieving constant-time amortized performance. This problem concerns extending dynamic arrays to let them both grow and shrink on demand.

1. Consider an underflow strategy that cuts the array size in half whenever the array falls below half full. Give an example sequence of insertions and deletions where this strategy gives a bad amortized cost.
2. Then, give a better underflow strategy than that suggested above, one that achieves constant amortized cost per deletion.

(Solution 3.3)

Trees and Other Dictionary Structures

3-4. Design a dictionary data structure in which search, insertion, and deletion can all be processed in O(1) time in the worst case. You may assume the set elements are integers drawn from a finite set 1,2,..,n, and initialization can take O(n) time.

(Solution 3.4)

3-5. Find the overhead fraction (the ratio of data space over total space) for each of the following binary tree implementations on n nodes:

1. All nodes store data, two child pointers, and a parent pointer. The data field requires four bytes and each pointer requires four bytes.
2. Only leaf nodes store data; internal nodes store two child pointers. The data field requires four bytes and each pointer requires two bytes.

(Solution 3.5)

3-6. Describe how to modify any balanced tree data structure such that search, insert, delete, minimum, and maximum still take O(logn) time each, but successor and predecessor now take O(1) time each. Which operations have to be modified to support this?

3-7. Suppose you have access to a balanced dictionary data structure, which supports each of the operations search, insert, delete, minimum, maximum, successor, and predecessor in O(logn) time. Explain how to modify the insert and delete operations so they still take O(logn) but now minimum and maximum take O(1) time. (Hint: think in terms of using the abstract dictionary operations, instead of mucking about with pointers and the like.)

(Solution 3.7)

3-8. Design a data structure to support the following operations:

1. insert(x,T) -- Insert item x into the set T.
2. delete(k,T) -- Delete the kth smallest element from T.
3. member(x,T) -- Return true iff x \in T.

All operations must take O(logn) time on an n-element set.

3-9. A concatenate operation takes two sets S1 and S2, where every key in S1 is smaller than any key in S2, and merges them together. Give an algorithm to concatenate two binary search trees into one binary search tree. The worst-case running time should be O(h), where h is the maximal height of the two trees.

(Solution 3.9)

Applications of Tree Structures

3-10. In the bin-packing problem, we are given n metal objects, each weighing between zero and one kilogram. Our goal is to find the smallest number of bins that will hold the n objects, with each bin holding one kilogram at most.

1. The best-fit heuristic for bin packing is as follows. Consider the objects in the order in which they are given. For each object, place it into the partially filled bin with the smallest amount of extra room after the object is inserted.. If no such bin exists, start a new bin. Design an algorithm that implements the best-fit heuristic (taking as input the n weights w1,w2,...,wn and outputting the number of bins used) in O(nlogn) time.
2. Repeat the above using the worst-fit heuristic, where we put the next object in the partially filled bin with the largest amount of extra room after the object is inserted.

3-11. Suppose that we are given a sequence of n values x1,x2,...,xn and seek to quickly answer repeated queries of the form: given i and j, find the smallest value in x_i,\ldots,x_j.

1. Design a data structure that uses O(n2) space and answers queries in O(1) time.
2. Design a data structure that uses O(n) space and answers queries in O(logn) time. For partial credit, your data structure can use O(nlogn) space and have O(logn) query time.

(Solution 3.11)

3-12. Suppose you are given an input set S of n numbers, and a black box that if given any sequence of real numbers and an integer k instantly and correctly answers whether there is a subset of input sequence whose sum is exactly k. Show how to use the black box O(n) times to find a subset of S that adds up to k.

3-13. Let A[1..n] be an array of real numbers. Design an algorithm to perform any sequence of the following operations:

1. Add(i,y) -- Add the value y to the ith number.
2. Partial-sum(i) -- Return the sum of the first i numbers, i.e.

\sum_{j=1}^i A[j]. There are no insertions or deletions; the only change is to the values of the numbers. Each operation should take O(logn) steps. You may use one additional array of size n as a work space.

(Solution 3.13)

3-14. Extend the data structure of the previous problem to support insertions and deletions. Each element now has both a key and a value. An element is accessed by its key. The addition operation is applied to the values, but the elements are accessed by its key. The Partial-sum operation is different.

1. Add(k,y) -- Add the value y to the item with key k.
2. Insert(k,y) -- Insert a new item with key k and value y.
3. Delete(k) -- Delete the item with key k.
4. Partial-sum(k) --

Return the sum of all the elements currently in the set whose key is less than y, i.e. \sum_{x_j

3-15. Design a data structure that allows one to search, insert, and delete an integer X in O(1) time (i.e., constant time, independent of the total number of integers stored). Assume that 1 \leq X \leq n and that there are m + n units of space available, where m is the maximum number of integers that can be in the table at any one time. (Hint: use two arrays A[1..n] and B[1..m].) You are not allowed to initialize either A or B, as that would take O(m) or O(n) operations. This means the arrays are full of random garbage to begin with, so you must be very careful.

(Solution 3.15)

Implementation Projects

3-16. Implement versions of several different dictionary data structures, such as linked lists, binary trees, balanced binary search trees, and hash tables. Conduct experiments to assess the relative performance of these data structures in a simple application that reads a large text file and reports exactly one instance of each word that appears within it. This application can be efficiently implemented by maintaining a dictionary of all distinct words that have appeared thus far in the text and inserting/reporting each word that is not found. Write a brief report with your conclusions.

3-17. A Caesar shift (see cryptography) is a very simple class of ciphers for secret messages. Unfortunately, they can be broken using statistical properties of English. Develop a program capable of decrypting Caesar shifts of sufficiently long texts.

(Solution 3.17)

Interview Problems

3-18. What method would you use to look up a word in a dictionary?

3-19. Imagine you have a closet full of shirts. What can you do to organize your shirts for easy retrieval?

(Solution 3.19)

3-20. Write a function to find the middle node of a singly-linked list.

(Solution 3.20)

3-21. Write a function to compare whether two binary trees are identical. Identical trees have the same key value at each position and the same structure.

(Solution 3.21)

3-22. Write a program to convert a binary search tree into a linked list.

(Solution 3.22)

3-23. Implement an algorithm to reverse a linked list. Now do it without recursion.

(Solution 3.23)

3-24. What is the best data structure for maintaining URLs that have been visited by a Web crawler? Give an algorithm to test whether a given URL has already been visited, optimizing both space and time.

(Solution 3.24)

3-25. You are given a search string and a magazine. You seek to generate all the characters in search string by cutting them out from the magazine. Give an algorithm to efficiently determine whether the magazine contains all the letters in the search string.

(Solution 3.25)

3-26. Reverse the words in a sentence---i.e., My name is Chris becomes Chris is name My. Optimize for time and space.

3-27. Determine whether a linked list contains a loop as quickly as possible without using any extra storage. Also, identify the location of the loop.

(Solution 3.27)

3-28. You have an unordered array X of n integers. Find the array M containing n elements where Mi is the product of all integers in X except for Xi. You may not use division. You can use extra memory. (Hint: There are solutions faster than O(n2).)

3-29. Give an algorithm for finding an ordered word pair (e.g., New York) occurring with the greatest frequency in a given webpage. Which data structures would you use? Optimize both time and space.

(Solution 3.29)

Source For Solution

http://www2.algorithm.cs.sunysb.edu/mediawiki/index.php/Data-structures-TADM2E-2