Abstract: Decision tree construction uses an attribute selection measure to choose the attribute that best partitions the tuples into distinct classes; this measure determines the topological structure among the attributes. The key step in constructing a decision tree is attribute splitting: at a given node, branches are created according to the different values of a chosen attribute, with the goal of making each resulting subset as "pure" as possible.
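The notion of "purity" above can be made concrete with Shannon entropy, the measure used by ID3. This is a minimal illustrative sketch (not from the original text): a subset whose tuples all share one class has entropy 0, while an evenly mixed subset has maximal entropy.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels: 0 means perfectly 'pure'."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A pure subset has entropy 0; an evenly mixed two-class subset has entropy 1.
print(entropy(["yes", "yes", "yes"]))       # 0.0
print(entropy(["yes", "no", "yes", "no"]))  # 1.0
```

The splitting step then chooses the attribute whose branches minimize the weighted entropy of the resulting subsets.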
Improvements to decision tree algorithms generally address the following requirements:

Enhancing the accuracy and effectiveness of decision tree classification. For example, if an attribute takes a whole range of discrete or continuous values, the ID3 algorithm must first group or discretize those values, and therefore needs to be extended to handle such attribute types. When an attribute is multi-valued, ID3's use of information gain as the measure for selecting the test attribute harms classification accuracy and validity; measures such as the information gain ratio, the Gini index, or the G-statistic can instead be used so that the selection criterion suits multi-valued attributes.
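The bias described above can be demonstrated directly. The following sketch (an illustration, not part of the original text) scores a candidate attribute with information gain, gain ratio, and Gini index; an ID-like attribute with a unique value per sample receives maximal information gain, while the gain ratio correctly penalizes it.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_scores(values, labels):
    """Score one candidate attribute: (info gain, gain ratio, Gini index).

    `values` holds the attribute's value per sample, `labels` the class per sample.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information penalizes many-valued attributes (C4.5's fix to ID3).
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    gain_ratio = info_gain / split_info if split_info else 0.0
    gini_index = sum(len(g) / n * gini(g) for g in groups.values())
    return info_gain, gain_ratio, gini_index

labels = ["yes", "yes", "no", "no"]
# A unique value per sample: maximal info gain but a reduced gain ratio.
print(split_scores([1, 2, 3, 4], labels))          # (1.0, 0.5, 0.0)
# A genuinely useful two-way split: same info gain, higher gain ratio.
print(split_scores(["a", "a", "b", "b"], labels))  # (1.0, 1.0, 0.0)
```

Both splits tie on information gain, so ID3 cannot tell them apart; the gain ratio prefers the two-way split, which is the behavior C4.5 relies on.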
Algorithm space complexity requirements. Common implementations of ID3 and C4.5 assume the training samples are resident in main memory; when samples must be repeatedly swapped in and out of main memory, the decision tree algorithm can become very inefficient.
Algorithm time complexity requirements. When the number of samples is large, and each sample has many attributes each taking many values, decision tree algorithms often need to scan the entire database multiple times; improving the algorithm's scalability and reducing its running time is therefore crucial. [9]
Decision tree algorithms are often evaluated against a set of criteria that reflect the relative merits of different algorithms. The main aspects are: tree size, accuracy, complexity, robustness, scalability, and interpretability.
Tree size. When generating a decision tree, one always tries to reduce the size of the tree while still meeting performance requirements. Tree size has two metrics: the number of internal nodes and the number of leaves. The simpler the generated decision tree, the stronger its classification and prediction ability. [3]
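The two size metrics can be computed with a single traversal. This is a minimal sketch using a hypothetical `Node` class (not defined in the original text): a node with no children is a leaf, anything else is an internal node.

```python
class Node:
    """Hypothetical decision tree node; an empty child list marks a leaf."""
    def __init__(self, children=None):
        self.children = children or []

def tree_size(node):
    """Return (internal_node_count, leaf_count) for a decision tree."""
    if not node.children:
        return 0, 1
    internal, leaves = 1, 0
    for child in node.children:
        i, l = tree_size(child)
        internal += i
        leaves += l
    return internal, leaves

# A root with one nested test and three leaves: 2 internal nodes, 3 leaves.
tree = Node([Node([Node(), Node()]), Node()])
print(tree_size(tree))  # (2, 3)
```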
Algorithm accuracy. The accuracy of a decision tree algorithm refers to its classification accuracy: the proportion of test-set records that are classified correctly out of the total number of records (equivalently, one minus the misclassification rate). Accuracy is mainly improved through pruning techniques. [5]
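As a quick illustration (not from the original text), the accuracy measure described above reduces to a one-line computation over the held-out test set:

```python
def accuracy(y_true, y_pred):
    """Classification accuracy: correctly classified test records / total records."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 4 of 5 test records classified correctly => accuracy 0.8.
print(accuracy(["yes", "no", "no", "yes", "yes"],
               ["yes", "no", "yes", "yes", "yes"]))  # 0.8
```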
Complexity of the algorithm. Computational complexity depends on the specific implementation details and the hardware environment. In data mining, the object of operation is a huge database, so the space and time complexity of the algorithm is a very important concern.
Algorithm robustness. Robustness concerns whether, when the given data contain noise or missing attribute values, the decision tree algorithm can still handle the data effectively, so as to preserve the correctness of classification prediction.
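One simple way to cope with missing values is majority imputation before splitting. This sketch is a baseline illustration only (C4.5 itself instead distributes such samples fractionally across branches):

```python
from collections import Counter

def impute_most_common(values):
    """Replace missing attribute values (None) with the most common observed value."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

print(impute_most_common(["sunny", None, "rain", "sunny", None]))
# ['sunny', 'sunny', 'rain', 'sunny', 'sunny']
```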
Algorithm scalability. Scalability mainly concerns massive data: a scalable decision tree algorithm can quickly and accurately find the main classification rules hidden in a large dataset.
Algorithm interpretability. A decision tree is a representation of knowledge; the classification rules extracted from the generated tree should be clear and easy to understand, for only then is the decision tree algorithm meaningful.
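Rule extraction itself is mechanical: every root-to-leaf path yields one IF-THEN rule. The sketch below assumes a hypothetical dictionary layout for the tree (internal nodes carry an attribute name and a value-to-child map; leaves carry a class label), which is not specified in the original text.

```python
def extract_rules(node, conditions=()):
    """Turn each root-to-leaf path of a decision tree into an IF-THEN rule."""
    if "label" in node:  # leaf node
        cond = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {cond} THEN class = {node['label']}"]
    rules = []
    for value, child in node["children"].items():
        rules += extract_rules(child, conditions + ((node["attribute"], value),))
    return rules

tree = {"attribute": "outlook",
        "children": {"sunny": {"label": "no"},
                     "rain": {"attribute": "windy",
                              "children": {"true": {"label": "no"},
                                           "false": {"label": "yes"}}}}}
for rule in extract_rules(tree):
    print(rule)
# IF outlook = sunny THEN class = no
# IF outlook = rain AND windy = true THEN class = no
# IF outlook = rain AND windy = false THEN class = yes
```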