Reprinted from http://blog.csdn.net/liwenjia1981/archive/2010/07/13/5731040.aspx

Programming of the United States 3.3

After reading the title, no clue

Thinking in Solving the book very well, the distance between the first two strings must be a known number, must be less than the sum of the two strings.

By deletion of the two strings have become the empty string.

Recursive method as shown in the book, the code knocked out, there is little difference

view plaincopy to clipboardprint?

#include <stdlib.h> #include <stdio.h> #include <string.h> void calDistance(char *a, int aBegin, int aEnd, char * b, int bBegin, int bEnd, int& dis); int min(int a, int b, int c); int main() { char a[50]; char b[50]; int dis = 0; printf("First String :\n"); scanf("%s", a); printf("Second String : \n"); scanf("%s", b); int aLen = strlen(a); int bLen = strlen(b); calDistance(a,0,aLen-1, b, 0, bLen-1, dis); printf("The two string distance is :%d\n", dis); return 0; } int min(int a, int b, int c) { if(a < b && a < c) return a; if(b <a && b < c) return b; return c; } void calDistance(char *a, int aBegin, int aEnd, char * b, int bBegin, int bEnd, int& dis) { if(aBegin > aEnd) { if(bBegin > bEnd) return ; else { dis = dis + bEnd - bBegin + 1; return ; } } else if(bBegin > bEnd) { dis = dis + aEnd - aBegin + 1; return ; } if(a[aBegin] == b[bBegin]) calDistance(a, aBegin + 1, aEnd, b, bBegin + 1, bEnd, dis); else { int num1, num2, num3; num1 = num2 = num3 = dis + 1; calDistance(a, aBegin + 1, aEnd, b, bBegin, bEnd, num1); // To give b[bBegin] Before a[aBegin] calDistance(a, aBegin, aEnd, b, bBegin + 1, bEnd, num2); // To give a[aBegin] Before b[bBegin] calDistance(a, aBegin + 1, aEnd, b, bBegin + 1, bEnd, num3); // Will a[aBegin] Replacement for b[bBegin], Or will b[bBegin] Replacement for a[aBegin] dis = min(num1, num2, num3); } }

Because of the existence of repeated sub-code problem, optimization, calculated using the sub-structure of a temporary problem, once again calling for direct search results.

Related to repeated sub-problems associated with dynamic programming ... the concrete realization of the memorandum, ah need to reference the ...

Reference article:

http://tech.ddvip.com/2009-05/1243159182120785_3.html

DP Solution of the first code switch

Always take full advantage of the dynamic programming algorithm overlapping sub-problems, each sub-problem only through a solution, the solution is stored in a necessary, you can view the table, and each look-up table of time constants.

Dynamic programming algorithm with bottom-up.

view plaincopy to clipboardprint?

/* * A loop method using dynamic programming. * Calculate from bottom to top. */ int calculateStringDistance(string strA, string strB) { int lenA = (int)strA.length(); int lenB = (int)strB.length(); int c[lenA+1][lenB+1]; // Record the distance of all begin points of each string // i: begin point of strA // j: begin point of strB for(int i = 0; i < lenA; i++) c[i][lenB] = lenA - i; for(int j = 0; j < lenB; j++) c[lenA][j] = lenB - j; c[lenA][lenB] = 0; for(int i = lenA-1; i >= 0; i--) for(int j = lenB-1; j >= 0; j--) { if(strB[j] == strA[i]) c[i][j] = c[i+1][j+1]; else c[i][j] = minValue(c[i][j+1], c[i+1][j], c[i+1][j+1]) + 1; } return c[0][0]; }

Sub memorandum practice:

Memorandum is a variant of dynamic programming, both with the usual dynamic programming efficiency, but also uses a top-down strategy.

Added a note of the recursive algorithm for each sub-problem solution in the form of a table entry in the record. Beginning of each table entry initially contains a special value to indicate that the table entry to be filled. When the recursive algorithm is first encountered in the implementation of a sub-problem, the solution to calculate and fill in the table. Each subsequent encounter of the sub-problem, as long as the view and return to the previous value can be filled.

The following is a recursive algorithm to do the original version of the memorandum, and by a Boolean variable to control whether memoize memorandum, and a Boolean variable to control whether to print debug call process. Interested in reading are available through the control of the two Boolean variables to compare versions of the memorandum and non-complexity version of the memorandum.

view plaincopy to clipboardprint?

#include <iostream> #define M 100 using namespace std; const bool debug = false; // Whether to print debug info const bool memoize = true; // Whether to use memoization unsigned int cnt = 0; // Line number for the debug info int memoizedDistance[M][M]; // Matrix for memoiztion int minValue(int a, int b, int c) { if(a < b && a < c) return a; else if(b < a && b < c) return b; else return c; } /* * A recursive method which can be decorated by memoization. * Calculate from top to bottom. */ int calculateStringDistance(string strA, int pABegin, int pAEnd, string strB, int pBBegin, int pBEnd) { if(memoize && memoizedDistance[pABegin][pBBegin] >= 0) return memoizedDistance[pABegin][pBBegin]; if(pABegin > pAEnd) { if(pBBegin > pBEnd) { if(memoize) memoizedDistance[pABegin][pBBegin] = 0; if(debug) cout << cnt++ << ": m(" << pABegin << "," << pBBegin << ")=0" << endl; return 0; } else { int temp = pBEnd - pBBegin + 1; if(memoize) memoizedDistance[pABegin][pBBegin] = temp; if(debug) cout << cnt++ << ": m(" << pABegin << "," << pBBegin << ")=" << temp << endl; return temp; } } if(pBBegin > pBEnd) { if(pABegin > pAEnd) { if(memoize) memoizedDistance[pABegin][pBBegin] = 0; if(debug) cout << cnt++ << ": m(" << pABegin << "," << pBBegin << ")=0" << endl; return 0; } else { int temp = pAEnd - pABegin + 1; if(memoize) memoizedDistance[pABegin][pBBegin] = temp; if(debug) cout << cnt++ << ": m(" << pABegin << "," << pBBegin << ")=" << temp << endl; return temp; } } if(strA[pABegin] == strB[pBBegin]) { int temp = calculateStringDistance(strA, pABegin+1, pAEnd, strB, pBBegin+1, pBEnd); if(memoize) memoizedDistance[pABegin][pBBegin] = temp; if(debug) cout << cnt++ << ": m(" << pABegin << "," << pBBegin << ")=" << temp << endl; return temp; } else { int t1 = calculateStringDistance(strA, pABegin, pAEnd, strB, pBBegin+1, pBEnd); int t2 = calculateStringDistance(strA, pABegin+1, pAEnd, strB, pBBegin, pBEnd); int t3 = calculateStringDistance(strA, pABegin+1, pAEnd, strB, pBBegin+1, pBEnd); int temp = minValue(t1, t2, t3) + 1; if(memoize) memoizedDistance[pABegin][pBBegin] = temp; if(debug) cout << cnt++ << ": m(" << pABegin << "," << pBBegin << ")=" << temp << endl; return temp; } } int main() { if(memoize) { // initialize the matrix : memoizedDistance[][] for(int i = 0; i < M; i++) for(int j = 0; j < M; j++) memoizedDistance[i][j] = -1; // -1 means unfilled cell yet } string strA = "abcdfef"; string strB = "a"; cout << endl << "Similarity = " << 1.0 / (1 + calculateStringDistance(strA, 0, (int)strA.length()-1, strB, 0, (int)strB.length()-1)) << endl; return 0; }

Summary:

Can be calculated, if the dynamic programming or do not have the memorandum, the worst case complexity is about: lenA! * LenB! . Using dynamic programming complexity is O ((lenA +1) * (lenB +1)). Recursive way and do memorandum for the worst case complexity O ((lenA +1) * (lenB +1)).

In practice, if all sub-problems to be calculated at least once, then a bottom-up dynamic programming algorithm is usually better than a top-down method is better to do a memorandum of constant factor, because the former without the cost of recursive and maintenance costs are also smaller tables.

In addition, in some problems, you can also use the dynamic programming algorithm in the table access mode to further reduce the time or space requirements. Or, if the sub problem space in some sub-There is no need to solve the problem, do memorandum approaches have only Xie affirmative requirement that sub-problem solutions advantage for this issue is.

http://blog.csdn.net/jeiwt/archive/2009/12/10/4981876.aspx

-------------------------------------------------- -------------------------------------------------- ----

Two strings have a lot of similarity calculation, are compiled from the algorithm based on the similarity between their own under the summary.

Briefly under the Levenshtein Distance (LD): LD may be to measure the similarity of two strings. The distance between them is that a string into a string in the process of add, delete, modify the value.

For example:

If str1 = "test", str2 = "test", then LD (str1, str2) = 0. Without conversion.

If str1 = "test", str2 = "tent", then LD (str1, str2) = 1. str1 the "s" conversion "n", conversion of a character, so is 1.

If the distance between them greater, indicating they are more different.

Levenshtein distance is the first Russian scientist Vladimir Levenshtein in 1965, invented, with his name. Not spelling, you can call it edit distance (edit distance).

Levenshtein distance can be used to:

Spell checking (spell check)

Speech recognition (statement recognition)

DNA analysis (DNA analysis)

Plagiarism detection (plagiarism detection)

LD with m * n matrix stored distance value. Algorithm about the process:

The length of str1 or str2 is 0 return another string length.

Initialize (n +1) * (m +1) matrix d, and let the first row and column values from 0 to growth.

Scan the two strings (n * m level), if: str1 [i] == str2 [j], with the temp record it as 0. Otherwise recorded as a temp. Then in the matrix d [i] [j] confers d [i-1] [j] +1, d [i] [j-1] +1, d [i-1] [j-1] + temp c are the minimum.

After scanning, returns the last value matrix is d [n] [m]

The final return of their distance. According to the similarity of how it obtained from? Because their maximum distance is the maximum length of two strings. The string is not very sensitive. Similarity of formula I are set to 1 - their distance / maximum string length.

Source:

package com.chenlb.algorithm; /** * Edit distance of two string similarity * * @author chenlb 2008-6-24 In the afternoon 06:41:55 */ public class Similarity { private int min( int one, int two, int three) { int min = one; if (two < min) { min = two; } if (three < min) { min = three; } return min; } public int ld(String str1, String str2) { int d[][]; // Matrix int n = str1.length(); int m = str2.length(); int i; // Traverse str1 int j; // Traverse the str2 char ch1; // str1 By char ch2; // str2 By int temp; // Records of the same character, in a matrix location value increments , It is not 0 1 if (n == 0 ) { return m; } if (m == 0 ) { return n; } d = new int [n + 1 ][m + 1 ]; for (i = 0 ; i <= n; i ++ ) { // Initialize the first column d[i][ 0 ] = i; } for (j = 0 ; j <= m; j ++ ) { // Initialize the first row d[ 0 ][j] = j; } for (i = 1 ; i <= n; i ++ ) { // Traverse str1 ch1 = str1.charAt(i - 1 ); // Go to match str2 for (j = 1 ; j <= m; j ++ ) { ch2 = str2.charAt(j - 1 ); if (ch1 == ch2) { temp = 0 ; } else { temp = 1 ; } // Top left + 1, +1, Corner + take minimum temp d[i][j] = min(d[i - 1 ][j] + 1 , d[i][j - 1 ] + 1 , d[i - 1 ][j - 1 ] + temp); } } return d[n][m]; } public double sim(String str1, String str2) { int ld = ld(str1, str2); return 1 - ( double ) ld / Math.max(str1.length(), str2.length()); } public static void main(String[] args) { Similarity s = new Similarity(); String str1 = " chenlb.blogjava.net " ; String str2 = " chenlb.javaeye.com " ; System.out.println( " ld= " + s.ld(str1, str2)); System.out.println( " sim= " + s.sim(str1, str2)); } }

I do not know the formula sim method is reasonable and think poorly thought, ^ _ ^