Machine Learning (40717), Fall 2008
Sharif University of Technology, Computer Engineering
Assignment 2: Machine Learning Tools (1) “Weka”
From: 1387/8/21
Due: 1387/9/4

What is Weka:
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato. Weka is free software available under the GNU General Public License. It is a collection of machine learning algorithms used especially in data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes.
In this assignment you will work with Weka and use it to analyze different machine learning algorithms.

Downloading and installing Weka:
There are different options for downloading and installing it on your system. See:
http://www.cs.waikato.ac.nz/~ml/weka/index.html

Learning Weka:
You must use the Weka Knowledge Explorer. For more information about it, see:
http://www.cs.waikato.ac.nz/~ml/weka/gui_explorer.html

Downloading Data Sets:
You must download two datasets, iris.arff and spambase.arff, from:
http://www.hakank.org/weka/iris.arff
http://www.hakank.org/weka/spambase.arff

What you need to do:
1. For each dataset D, perform the following experiments using 1X-fold cross-validation, where X is the last digit of your student ID (for example, if X = 2, use 12-fold cross-validation):
   a. Create a classifier based on D using J48 (C4.5):
      • Run error-based pruning (the default) and test the effect of the confidence factor parameter (if X is odd, use {0.25, 0.5, and 0.75}; otherwise use {0.3, 0.5, and 0.7}).
      • Run "Reduced error pruning" (which uses a validation set) and test the effect of the validation set portion, set through the numFolds option of J48, where one fold is held out for pruning (if X is odd, use {3, 5, and 7}; otherwise use {4, 6, and 8}).
   b. Create a classifier based on D using KNN (IBk), while examining the effect of:
      • Different values of K (if X is odd, use {3, 5, and 7}; otherwise use {4, 9, and 14}).
      • Weighting by distance (1/distance and 1-distance).
   c. Create a classifier based on D using NaiveBayes.
   d. Create a classifier based on D using ID3. (You must first discretize the non-nominal attributes using the Discretize filter.)
2. Summarize your experimental results in tables and graphs.
3. Compare the performance of ID3, KNN, J48, and NaiveBayes based on the accuracy they yielded in the above experiments. Draw conclusions about their relative performance with respect to datasets of different nature and with different parameter settings.

Note: Don’t forget to perform all experiments under 1X-fold cross-validation! (X is the last digit of your student ID; for example, if X = 2, use 12-fold cross-validation.)
If you have questions, feel free to contact Mr. Ghasempour at ghasempour@ce.sharif.edu

Delivery format:
1. Briefly explain what you did with Weka for this assignment. Present your results completely, using graphs, tables, etc., and analyze them.
2. Upload your results, your document, and any other needed files as a single compressed file, only through Sharif Courseware (http://cw.sharif.edu). The file name should be HW_X_ID_fullName.rar, where X is the homework number and ID is your student number, for example “HW_2_87111111_RasoulMohammadiNasiri.rar”.
3. Check your file before uploading; the uploaded file must not be corrupted.
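
As an illustration only, the parameter rules from step 1 can be written down as a short Java sketch. The class and method names here are hypothetical and are not part of Weka; the sketch only encodes the fold-count rule (1X means 10 + X) and the odd/even parameter sets stated in the assignment, so that you can check which settings apply to your student ID before running the experiments in the Explorer.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Weka): derives the experiment settings from a student ID. */
final class ExperimentPlan {

    /** X is the last digit of the student ID. */
    static int lastDigit(String studentId) {
        char c = studentId.charAt(studentId.length() - 1);
        if (c < '0' || c > '9') {
            throw new IllegalArgumentException("Student ID must end with a digit: " + studentId);
        }
        return c - '0';
    }

    /** "1X-fold" reads as the digit 1 followed by X, i.e. 10 + X folds. */
    static int folds(int x) {
        return 10 + x;
    }

    /** Confidence factors for J48 error-based pruning. */
    static double[] confidenceFactors(int x) {
        return (x % 2 == 1) ? new double[] {0.25, 0.5, 0.75} : new double[] {0.3, 0.5, 0.7};
    }

    /** Validation set portions for J48 reduced error pruning. */
    static int[] pruningPortions(int x) {
        return (x % 2 == 1) ? new int[] {3, 5, 7} : new int[] {4, 6, 8};
    }

    /** Values of K for the KNN (IBk) experiments. */
    static int[] kValues(int x) {
        return (x % 2 == 1) ? new int[] {3, 5, 7} : new int[] {4, 9, 14};
    }

    public static void main(String[] args) {
        // Default to the example student number used in the delivery format section.
        String id = args.length > 0 ? args[0] : "87111111";
        int x = lastDigit(id);
        System.out.println("Cross-validation folds: " + folds(x));
        System.out.println("J48 confidence factors: " + Arrays.toString(confidenceFactors(x)));
        System.out.println("J48 pruning portions:   " + Arrays.toString(pruningPortions(x)));
        System.out.println("KNN values of K:        " + Arrays.toString(kValues(x)));
    }
}
```

Running the class with your own student number as the single argument prints the settings that apply to you; the actual experiments are still carried out in Weka itself.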