Feature Reduction for Network Intrusion Detection using Principal Component Analysis in Data Mining

Authors

  • Geraldin Dela Cruz, Tarlac Agricultural University

Keywords:

Data Mining, Decision Trees, PCA, intrusion detection, WEKA

Abstract

Data mining has emerged as an active domain of research. It is an analytic process designed to explore data in search of
consistent patterns and systematic relationships between variables. In data mining, patterns in large volumes of data are analyzed
to extract useful information or knowledge. Discovering hidden information in historical data is among its important tasks, while its
ultimate goal is prediction. Before the data mining process, data cleaning and preprocessing are performed to reduce noise and redundancy
in the data. In this paper, Principal Component Analysis (PCA) was used to reduce the dimensions of the KDDCup99 dataset.
The goal was to reduce data dimensionality, noise, and redundancy in order to find a useful feature subset with a high
influence on predicting network intrusion, and to reduce computational time. The study used the WEKA software in the experiment,
specifically the J48, RandomTree, and RandomForest decision tree algorithms, which are capable of detecting intrusions. The
algorithms were first trained using 10-fold cross-validation, and the generated models were applied and tested. The results were then
compared between the original and reduced datasets. The experiment showed improvements in detecting network intrusions
on the reduced dataset compared with the original. This finding can be attributed to PCA as the preprocessing mechanism. It is
recommended that similar studies be conducted using other classification algorithms and integrating clustering techniques to perform
anomaly detection and reduce the detection error rate. Future work includes implementing the generated model in a real-time environment.
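
The dimensionality reduction step described in the abstract can be illustrated with a small sketch. The study ran PCA through WEKA; the code below is not WEKA's implementation and does not use the KDDCup99 data. It is a minimal plain-Python illustration, assuming a tiny made-up feature table in which one column is a redundant multiple of another, and it uses power iteration with deflation as one simple way to obtain the leading principal components. All function names here are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iters=500):
    """Power iteration: dominant eigenvalue and unit eigenvector
    of a symmetric positive semi-definite matrix."""
    d = len(matrix)
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # matrix has no remaining variance
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                for i in range(d))
    return value, v


def pca(rows, k):
    """Return (components, explained_variances, projected_rows)
    for the top k principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        variances.append(value)
        # Deflation: subtract the component just found from the matrix.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return components, variances, projected


# Made-up toy table: the second column is exactly twice the first,
# so three features carry only two dimensions of information.
rows = [[float(x), 2.0 * x, float(x % 2)] for x in range(10)]
components, variances, reduced = pca(rows, k=2)
total_variance = sum(covariance(center(rows))[i][i] for i in range(3))
```

In this toy case the two retained components account for essentially all of the variance in the three original columns, which is the sense in which PCA removes redundancy. A classifier such as a decision tree would then be trained on `reduced` rather than on `rows`.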

Published

2018-01-01