2012
Abreu, J.
Detección de regularidades en contornos 2D, cálculo aproximado de medianas y su aplicación en tareas de clasificación PhD Thesis
2012.
Abstract | BibTeX | Tags: ISIC 2010, TIASA
@phdthesis{k293,
title = {Detección de regularidades en contornos 2D, cálculo aproximado de medianas y su aplicación en tareas de clasificación},
author = {J. Abreu},
editor = {J. R. Rico},
year = {2012},
date = {2012-01-01},
urldate = {2012-01-01},
organization = {Universidad de Alicante},
abstract = {In this work, we address two main problems: identifying regularities in contours encoded by Freeman chain codes and constructing average contours from them. Solutions to both problems are proposed that rely on the information gathered from the Levenshtein edit distance computation. We describe a new method for quantifying the regularity of contours encoded by Freeman chain codes and for comparing them in terms of a similarity criterion. The criterion allows subsequences to be found in the minimal-cost edit sequence that specify an alignment of similar contour segments. Two external parameters adjust the similarity criterion. The information about each similar part is encoded by strings that represent an average contour region. An explanation of how to construct a prototype based on the identified regularities is also given. The reliability of the prototypes is evaluated by replacing groups of contour samples with new prototypes used as the training set in a classification task. In this way, the size of the data set can be reduced without noticeably affecting its representational power for classification purposes. Experimental results show that this scheme reduces the size of the training data set by about 80% while the classification error increases by only 0.45% in one of the three data sets studied. This thesis also presents a new fast algorithm for computing an approximation to the mean of two strings of characters representing a 2D shape, together with its application to a new Wilson-based editing procedure. The approximate mean is built by including some symbols from the two original strings. In addition, a greedy variant of the algorithm is studied, which reduces the time required to compute an approximate mean. The new dataset editing scheme relaxes the criterion for deleting instances used by the Wilson editing procedure: not all instances misclassified by their nearest neighbors are pruned. Instead, an artificial instance is added to the dataset in the hope of classifying the instance correctly in the future. The new artificial instance is the approximate mean of the misclassified sample and its same-class nearest neighbor. Experiments carried out on three widely known contour databases show that the proposed algorithm performs very well in computing the mean of two strings, outperforming methods proposed by other authors. In particular, the low computational time required by the heuristic approach makes it very suitable for long strings. Results also show that the proposed preprocessing scheme reduces the classification error in about 83% of the trials, and there is empirical evidence that using the greedy approximation to compute the approximate mean does not affect the performance of the editing procedure. Finally, a new algorithm for computing an approximation to the mean of a set of strings is presented. The approximate mean is computed through successive improvements of a partial solution. In each iteration, the edit distances from the partial solution to all the strings in the set are computed, which yields the frequency of each edit operation at every position of the approximate mean. A goodness index for each edit operation is then computed by multiplying its frequency by its cost. Each operation is tested, starting with the one with the highest index, to verify whether applying it to the partial solution leads to an improvement. If so, a new iteration begins from the new approximate mean. The algorithm finishes once all the operations have been examined without a better solution being found. Comparative experiments involving Freeman chain codes encoding 2D shapes show that the quality of the approximate mean string is similar to that of other approaches, while convergence is much faster.},
keywords = {ISIC 2010, TIASA},
pubstate = {published},
tppubtype = {phdthesis}
}
In this work, we address two main problems: identifying regularities in contours encoded by Freeman chain codes and constructing average contours from them. Solutions to both problems are proposed that rely on the information gathered from the Levenshtein edit distance computation. We describe a new method for quantifying the regularity of contours encoded by Freeman chain codes and for comparing them in terms of a similarity criterion. The criterion allows subsequences to be found in the minimal-cost edit sequence that specify an alignment of similar contour segments. Two external parameters adjust the similarity criterion. The information about each similar part is encoded by strings that represent an average contour region. An explanation of how to construct a prototype based on the identified regularities is also given. The reliability of the prototypes is evaluated by replacing groups of contour samples with new prototypes used as the training set in a classification task. In this way, the size of the data set can be reduced without noticeably affecting its representational power for classification purposes. Experimental results show that this scheme reduces the size of the training data set by about 80% while the classification error increases by only 0.45% in one of the three data sets studied.
This thesis also presents a new fast algorithm for computing an approximation to the mean of two strings of characters representing a 2D shape, together with its application to a new Wilson-based editing procedure. The approximate mean is built by including some symbols from the two original strings. In addition, a greedy variant of the algorithm is studied, which reduces the time required to compute an approximate mean. The new dataset editing scheme relaxes the criterion for deleting instances used by the Wilson editing procedure: not all instances misclassified by their nearest neighbors are pruned. Instead, an artificial instance is added to the dataset in the hope of classifying the instance correctly in the future. The new artificial instance is the approximate mean of the misclassified sample and its same-class nearest neighbor. Experiments carried out on three widely known contour databases show that the proposed algorithm performs very well in computing the mean of two strings, outperforming methods proposed by other authors. In particular, the low computational time required by the heuristic approach makes it very suitable for long strings. Results also show that the proposed preprocessing scheme reduces the classification error in about 83% of the trials, and there is empirical evidence that using the greedy approximation to compute the approximate mean does not affect the performance of the editing procedure.
Finally, a new algorithm for computing an approximation to the mean of a set of strings is presented. The approximate mean is computed through successive improvements of a partial solution. In each iteration, the edit distances from the partial solution to all the strings in the set are computed, which yields the frequency of each edit operation at every position of the approximate mean. A goodness index for each edit operation is then computed by multiplying its frequency by its cost. Each operation is tested, starting with the one with the highest index, to verify whether applying it to the partial solution leads to an improvement. If so, a new iteration begins from the new approximate mean. The algorithm finishes once all the operations have been examined without a better solution being found. Comparative experiments involving Freeman chain codes encoding 2D shapes show that the quality of the approximate mean string is similar to that of other approaches, while convergence is much faster.
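The set-mean refinement described in the last paragraph of the abstract can be sketched as a short program. The Python sketch below is only an illustration under assumed unit edit costs; the function names (levenshtein, edit_operations, approximate_mean) and the toy data are hypothetical and are not taken from the thesis implementation.

# Illustrative sketch of the iterative set-mean refinement: repeatedly edit a
# candidate mean string so that its total Levenshtein distance to the set drops.
# Unit costs are assumed, so the goodness index reduces to operation frequency.
from collections import Counter

def levenshtein(a, b):
    """Standard dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute / match
        prev = cur
    return prev[-1]

def edit_operations(a, b):
    """Backtrack one minimal-cost edit script turning a into b.
    Returns tuples (position_in_a, kind, symbol)."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append((i - 1, "sub", b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append((i - 1, "del", a[i - 1]))
            i -= 1
        else:
            ops.append((i, "ins", b[j - 1]))
            j -= 1
    return ops

def apply_op(s, op):
    """Apply a single edit operation to string s."""
    pos, kind, sym = op
    if kind == "sub":
        return s[:pos] + sym + s[pos + 1:]
    if kind == "del":
        return s[:pos] + s[pos + 1:]
    return s[:pos] + sym + s[pos:]   # insertion before pos

def approximate_mean(strings, start=None):
    """Try the highest-goodness edit operations first and keep any single
    change that lowers the total distance; stop when none improves."""
    mean = start if start is not None else strings[0]
    best = sum(levenshtein(mean, s) for s in strings)
    improved = True
    while improved:
        improved = False
        counts = Counter(op for s in strings for op in edit_operations(mean, s))
        for op, _ in counts.most_common():   # descending frequency = goodness
            candidate = apply_op(mean, op)
            total = sum(levenshtein(candidate, s) for s in strings)
            if total < best:
                mean, best, improved = candidate, total, True
                break                        # restart from the improved mean
    return mean, best

if __name__ == "__main__":
    # Toy strings over a Freeman-chain-code-like alphabet 0-7.
    shapes = ["00112233", "0112233", "001122334", "0011223"]
    mean, total = approximate_mean(shapes)
    print(mean, total)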