
Three Factors Affecting the Predictive Performance of ANNs: Pre-Processing Method, Data Distribution and Training Mechanism

Adrian Costea, Iulian Nastac, Three Factors Affecting the Predictive Performance of ANNs: Pre-Processing Method, Data Distribution and Training Mechanism. TUCS Technical Reports 679, Turku Centre for Computer Science, 2005.

Abstract:

In this paper we analyze the implications of three different factors (preprocessing method, data distribution and training mechanism) on the classification performance of artificial neural networks (ANNs). We use three preprocessing approaches: no preprocessing, normalization and division by the maximum absolute values. We study the implications of input data distributions by using five datasets with different distributions: the real data and the uniform, normal, logistic and Laplace distributions. We test two training mechanisms: a gradient-descent technique improved by a retraining procedure (RT), and a genetic algorithm (GA), which is based on the principles of natural evolution. The results show statistically significant influences of all individual and combined factors on both training and testing performance. A major difference from other related studies is that, for both training mechanisms, we train the network using as the starting solution the one obtained when constructing the network architecture. In other words, we use a hybrid approach that refines a previously obtained solution. We found that when the starting solution had relatively low accuracy rates (80-90%), the GA clearly outperformed the retraining procedure, while the difference was small to nonexistent when the starting solution had relatively high accuracy rates (95-98%). As reported in other studies, we found little to no evidence of crossover operator influence on the GA performance.
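
The three preprocessing approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not define them precisely, so this example assumes that "normalization" means rescaling each input column to zero mean and unit standard deviation, and that "division by the maximum absolute values" is applied per column. The function names and the sample data are hypothetical and are not taken from the report.

```python
from statistics import mean, pstdev

def no_preprocessing(column):
    """Return the column unchanged (baseline case)."""
    return list(column)

def normalize(column):
    """Assumed reading of 'normalization': zero mean, unit standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

def divide_by_max_abs(column):
    """Scale the column into [-1, 1] by dividing by its largest absolute value."""
    peak = max(abs(x) for x in column)
    if peak == 0:
        # An all-zero column is left as zeros.
        return [0.0 for _ in column]
    return [x / peak for x in column]

# Hypothetical input column, for illustration only.
sample = [2.0, -4.0, 6.0, 8.0]
raw = no_preprocessing(sample)
standardized = normalize(sample)
scaled = divide_by_max_abs(sample)
```

Under these assumptions, `scaled` keeps the sign and relative size of each value while bounding it to [-1, 1], whereas `standardized` recenters the column around zero. Which variant the report actually used for "normalization" should be checked against the full text.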

BibTeX entry:

@TECHREPORT{tCoNa05a,
  title = {Three Factors Affecting the Predictive Performance of ANNs: Pre-Processing Method, Data Distribution and Training Mechanism},
  author = {Costea, Adrian and Nastac, Iulian},
  number = {679},
  series = {TUCS Technical Reports},
  publisher = {Turku Centre for Computer Science},
  year = {2005},
  keywords = {Artificial Neural Networks, Genetic Algorithms, preprocessing method, data distribution, training mechanism},
  ISBN = {952-12-1530-5},
}

Belongs to TUCS Research Unit(s): Data Mining and Knowledge Management Laboratory
