Abstract: Decision tree construction uses an attribute selection measure to choose the attribute that best partitions the tuples into distinct classes; this measure determines the topological structure among the attributes. The key step in constructing a decision tree is attribute splitting: at a given node, branches are created according to the different values of a chosen attribute, with the goal of making each resulting subset as "pure" as possible.
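The notion of "purity" above can be made concrete with Shannon entropy, the measure used by ID3. This is a minimal illustrative sketch (not from the original text): a subset whose tuples all share one class has entropy 0, while an evenly mixed subset has maximal entropy.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels: 0 means perfectly 'pure'."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A pure subset has entropy 0; an evenly mixed two-class subset has entropy 1.
print(entropy(["yes", "yes", "yes"]))       # 0.0
print(entropy(["yes", "no", "yes", "no"]))  # 1.0
```

The splitting step then chooses the attribute whose branches minimize the weighted entropy of the resulting subsets.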
Improvements to decision tree algorithms generally address the following requirements:

Enhancing the accuracy and effectiveness of decision tree classification. For example, if an attribute takes a whole range of discrete or continuous values, the ID3 algorithm must first group or discretize those values, and therefore needs to be extended to handle such attribute types. When an attribute is multi-valued, ID3's use of information gain as the measure for selecting the test attribute harms classification accuracy and validity; measures such as the information gain ratio, the Gini index, or the G-statistic can instead be used so that the selection criterion suits multi-valued attributes.
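The bias described above can be demonstrated directly. The following sketch (an illustration, not part of the original text) scores a candidate attribute with information gain, gain ratio, and Gini index; an ID-like attribute with a unique value per sample receives maximal information gain, while the gain ratio correctly penalizes it.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_scores(values, labels):
    """Score one candidate attribute: (info gain, gain ratio, Gini index).

    `values` holds the attribute's value per sample, `labels` the class per sample.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information penalizes many-valued attributes (C4.5's fix to ID3).
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    gain_ratio = info_gain / split_info if split_info else 0.0
    gini_index = sum(len(g) / n * gini(g) for g in groups.values())
    return info_gain, gain_ratio, gini_index

labels = ["yes", "yes", "no", "no"]
# A unique value per sample: maximal info gain but a reduced gain ratio.
print(split_scores([1, 2, 3, 4], labels))          # (1.0, 0.5, 0.0)
# A genuinely useful two-way split: same info gain, higher gain ratio.
print(split_scores(["a", "a", "b", "b"], labels))  # (1.0, 1.0, 0.0)
```

Both splits tie on information gain, so ID3 cannot tell them apart; the gain ratio prefers the two-way split, which is the behavior C4.5 relies on.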
Algorithm space complexity requirements. Common implementations of ID3 and C4.5 assume the training samples are resident in main memory; when samples must be repeatedly swapped in and out of main memory, the decision tree algorithm can become very inefficient.
Algorithm time complexity requirements. When the number of samples is large, and each sample has many attributes each taking many values, decision tree algorithms often need to scan the entire database multiple times; improving the algorithm's scalability and reducing its running time is therefore crucial. [9]
Decision tree algorithms are often evaluated against a set of criteria that reflect the relative merits of different algorithms. The main aspects are: tree size, accuracy, complexity, robustness, scalability, and interpretability.
Tree size. When generating a decision tree, one always tries to reduce the size of the tree while still meeting performance requirements. Tree size has two metrics: the number of internal nodes and the number of leaves. The simpler the generated decision tree, the stronger its classification and prediction ability. [3]
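The two size metrics can be computed with a single traversal. This is a minimal sketch using a hypothetical `Node` class (not defined in the original text): a node with no children is a leaf, anything else is an internal node.

```python
class Node:
    """Hypothetical decision tree node; an empty child list marks a leaf."""
    def __init__(self, children=None):
        self.children = children or []

def tree_size(node):
    """Return (internal_node_count, leaf_count) for a decision tree."""
    if not node.children:
        return 0, 1
    internal, leaves = 1, 0
    for child in node.children:
        i, l = tree_size(child)
        internal += i
        leaves += l
    return internal, leaves

# A root with one nested test and three leaves: 2 internal nodes, 3 leaves.
tree = Node([Node([Node(), Node()]), Node()])
print(tree_size(tree))  # (2, 3)
```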
Algorithm accuracy. The accuracy of a decision tree algorithm refers to its classification accuracy: the proportion of test-set records that are classified correctly out of the total number of records (equivalently, one minus the misclassification rate). Accuracy is mainly improved through pruning techniques. [5]
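As a quick illustration (not from the original text), the accuracy measure described above reduces to a one-line computation over the held-out test set:

```python
def accuracy(y_true, y_pred):
    """Classification accuracy: correctly classified test records / total records."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 4 of 5 test records classified correctly => accuracy 0.8.
print(accuracy(["yes", "no", "no", "yes", "yes"],
               ["yes", "no", "yes", "yes", "yes"]))  # 0.8
```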
Complexity of the algorithm. Computational complexity depends on the specific implementation details and the hardware environment. In data mining, the object of operation is a huge database, so the space and time complexity of the algorithm is a very important concern.
Algorithm robustness. Robustness concerns whether, when the given data contain noise or missing attribute values, the decision tree algorithm can still handle the data effectively, so as to preserve the correctness of classification prediction.
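One simple way to cope with missing values is majority imputation before splitting. This sketch is a baseline illustration only (C4.5 itself instead distributes such samples fractionally across branches):

```python
from collections import Counter

def impute_most_common(values):
    """Replace missing attribute values (None) with the most common observed value."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

print(impute_most_common(["sunny", None, "rain", "sunny", None]))
# ['sunny', 'sunny', 'rain', 'sunny', 'sunny']
```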
Algorithm scalability. Scalability mainly concerns massive data: a scalable decision tree algorithm can quickly and accurately find the main classification rules hidden in a large dataset.
Algorithm interpretability. A decision tree is a representation of knowledge; the classification rules extracted from the generated tree should be clear and easy to understand, for only then is the decision tree algorithm meaningful.
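Rule extraction itself is mechanical: every root-to-leaf path yields one IF-THEN rule. The sketch below assumes a hypothetical dictionary layout for the tree (internal nodes carry an attribute name and a value-to-child map; leaves carry a class label), which is not specified in the original text.

```python
def extract_rules(node, conditions=()):
    """Turn each root-to-leaf path of a decision tree into an IF-THEN rule."""
    if "label" in node:  # leaf node
        cond = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {cond} THEN class = {node['label']}"]
    rules = []
    for value, child in node["children"].items():
        rules += extract_rules(child, conditions + ((node["attribute"], value),))
    return rules

tree = {"attribute": "outlook",
        "children": {"sunny": {"label": "no"},
                     "rain": {"attribute": "windy",
                              "children": {"true": {"label": "no"},
                                           "false": {"label": "yes"}}}}}
for rule in extract_rules(tree):
    print(rule)
# IF outlook = sunny THEN class = no
# IF outlook = rain AND windy = true THEN class = no
# IF outlook = rain AND windy = false THEN class = yes
```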