python - AUC calculation in decision tree in scikit-learn

- May 15, 2011

I am using scikit-learn with Python 2.7 on Windows. What is wrong with my code for calculating AUC? Thanks.

from sklearn.datasets import load_iris
from sklearn.cross_validation import cross_val_score
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
#print cross_val_score(clf, iris.data, iris.target, cv=10, scoring="precision")
#print cross_val_score(clf, iris.data, iris.target, cv=10, scoring="recall")
print cross_val_score(clf, iris.data, iris.target, cv=10, scoring="roc_auc")

Traceback (most recent call last):
  File "c:/users/foo/pycharmprojects/codeexercise/decisiontree.py", line 8, in <module>
    print cross_val_score(clf, iris.data, iris.target, cv=10, scoring="roc_auc")
  File "c:\python27\lib\site-packages\sklearn\cross_validation.py", line 1433, in cross_val_score
    for train, test in cv)
  File "c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 658, in dispatch_one_batch
    self._dispatch(tasks)
  File "c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 180, in __init__
    self.results = batch()
  File "c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "c:\python27\lib\site-packages\sklearn\cross_validation.py", line 1550, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "c:\python27\lib\site-packages\sklearn\cross_validation.py", line 1606, in _score
    score = scorer(estimator, X_test, y_test)
  File "c:\python27\lib\site-packages\sklearn\metrics\scorer.py", line 159, in __call__
    raise ValueError("{0} format not supported".format(y_type))
ValueError: multiclass format not supported

Edit 1: It looks like scikit-learn can decide the thresholds without any machine learning model, and I am wondering why:

import numpy as np
from sklearn.metrics import roc_curve
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)
print fpr
print tpr
print thresholds
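On the Edit 1 question: roc_curve never needs a model because it only takes the true labels and one score per sample, and it uses the distinct score values themselves as the candidate thresholds. The sketch below (assuming Python 3 with a recent NumPy and scikit-learn; the helper manual_roc is hypothetical and not part of scikit-learn) sweeps those thresholds by hand for the same four samples and compares the result with sklearn's roc_curve and roc_auc_score.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve


def manual_roc(y_true, scores, pos_label):
    """Hypothetical helper: build an ROC curve by sweeping every distinct
    score as a threshold. No classifier is involved, only labels and scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    is_pos = y_true == pos_label
    n_pos = is_pos.sum()
    n_neg = (~is_pos).sum()
    # Start at (0, 0): a threshold above every score predicts nothing positive.
    fpr, tpr = [0.0], [0.0]
    thresholds = np.unique(scores)[::-1]  # distinct scores, highest first
    for t in thresholds:
        predicted_pos = scores >= t
        tpr.append((predicted_pos & is_pos).sum() / float(n_pos))
        fpr.append((predicted_pos & ~is_pos).sum() / float(n_neg))
    return np.array(fpr), np.array(tpr), thresholds


y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = manual_roc(y, scores, pos_label=2)
manual_auc = auc(fpr, tpr)  # trapezoidal area under the (fpr, tpr) points
print(thresholds)
print(manual_auc)  # 0.75

# sklearn's curve, with intermediate points kept, traces the same fpr/tpr values.
sk_fpr, sk_tpr, _ = roc_curve(y, scores, pos_label=2, drop_intermediate=False)
print(np.allclose(fpr, sk_fpr) and np.allclose(tpr, sk_tpr))
print(roc_auc_score(y, scores))
```

The thresholds sklearn reports are the same score values, apart from an extra leading threshold it adds so the curve starts at (0, 0); that is why the example in the question works without any classifier.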

ROC curves (and therefore roc_auc) in sklearn only work for binary classification:

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
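The error message appears to come from a check on the type of y: iris.target has three classes, so sklearn.utils.multiclass.type_of_target labels it "multiclass", and the roc_auc scorer only accepts a binary (or multilabel-indicator) target. A minimal sketch, assuming Python 3 and scikit-learn 0.18 or later (where cross_val_score lives in sklearn.model_selection instead of the old sklearn.cross_validation), keeps only two of the iris classes so the target becomes binary; the choice of versicolor vs. virginica is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import type_of_target

iris = load_iris()
print(type_of_target(iris.target))  # multiclass: three classes, as in the question

# Keep only versicolor (1) and virginica (2) so the target has two classes.
mask = iris.target != 0
X_binary = iris.data[mask]
y_binary = iris.target[mask]
print(type_of_target(y_binary))  # binary

clf = DecisionTreeClassifier(random_state=0)
# The same kind of call as in the question now runs, giving one AUC per fold.
binary_auc = cross_val_score(clf, X_binary, y_binary, cv=10, scoring="roc_auc")
print(binary_auc.mean())
```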

One way to get around this issue is to binarize the labels and extend the classification to a one-vs-all scheme. In sklearn you can use sklearn.preprocessing.LabelBinarizer. The documentation is here:

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.labelbinarizer.html
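A minimal sketch of that one-vs-all approach, under the same assumptions (Python 3, scikit-learn 0.18 or later): LabelBinarizer turns the three iris labels into three 0/1 indicator columns, and each column gets its own cross-validated AUC with the same DecisionTreeClassifier. Reporting one score per class plus their unweighted mean (a macro average) is a choice made here for illustration, not something the documentation prescribes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelBinarizer
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# One 0/1 indicator column per class: column k means "class k vs. all others".
lb = LabelBinarizer()
Y = lb.fit_transform(iris.target)  # shape (150, 3)

clf = DecisionTreeClassifier(random_state=0)
per_class_auc = {}
for k, class_label in enumerate(lb.classes_):
    # Each column is a binary target, so the "roc_auc" scorer accepts it.
    fold_scores = cross_val_score(clf, iris.data, Y[:, k], cv=10, scoring="roc_auc")
    per_class_auc[iris.target_names[class_label]] = fold_scores.mean()

for name, score in per_class_auc.items():
    print(name, round(score, 3))
print("macro average:", round(np.mean(list(per_class_auc.values())), 3))
```

Newer scikit-learn releases (0.22 and later) also offer scoring="roc_auc_ovr", which computes a one-vs-rest AUC from the multiclass model's predicted probabilities and avoids the manual loop.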
