machine-learning – An introduction to Classificiation: Generating several models using Weka – Train the first classifier: Setting a baseline with ZeroR

ZeroR is a simple classifier. It doesn’t operate per instance instead it operates on general distribution of the classes. It selects the class with the largest a priori probability. It is not a good classifier in the sense that it doesn’t use any information in the candidate, but it is often used as a baseline. Note: Other baselines can be used aswel, such as: Industry standard classifiers or handcrafted rules

 // First we tell our data that it's class is hidden in the last attribute
 data.setClassIndex(data.numAttributes() -1);
 // Then we split the data in to two sets
 // randomize first because we don't want unequal distributions
 data.randomize(new java.util.Random(0));
 Instances testset = new Instances(data, 0, 50);
 Instances trainset = new Instances(data, 50, 99);
 
 // Now we build a classifier
 // Train it with the trainset
 ZeroR classifier1 = new ZeroR();
 classifier1.buildClassifier(trainset);
 // Next we test it against the testset
 Evaluation Test = new Evaluation(trainset);
 Test.evaluateModel(classifier1, testset);
 System.out.println(Test.toSummaryString());

The largest class in the set gives you a 34% correct rate. (50 out of 149)

The actual distribution of the classes in the set

Note: The ZeroR performs around 30%. This is because we splitted randomly into a train and test set. The largest set in the train set, will thusly be the smallest in the test set. Crafting a good test/train set can be worth your while

if you want to reproduce, please indicate the source:
machine-learning – An introduction to Classificiation: Generating several models using Weka – Train the first classifier: Setting a baseline with ZeroR - CodeDay