python 2.7 - TypeError in CountVectorizer scikit-learn: Expected string or buffer

- August 15, 2015

I am trying to solve a classification problem. When I feed the text to CountVectorizer, it gives this error:

expected string or buffer.

Is something wrong with my dataset? The message column contains a mixture of numbers, words, and special characters.

Here is a sample of what the messages look like:

    0    have not received gifts ordered ok
    1    hth wells idyll mcgill kooky bbc.co
    2    test test test 1 test
    3    test
    4    hello reward points
    5    hi, can koovs coupons or vouchers here...

Here is the code I used for classification:

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer

    # Load the training data and take the raw message column as input
    df = pd.read_excel('training_data.xlsx')
    x_train = df.message
    print(x_train.shape)

    # Map each category name to an integer class label
    map_class_label = {'checkin': 0, 'greeting': 1, 'more reward options': 2,
                       'noclass': 3, 'other': 4, 'points': 5,
                       'referral points': 6, 'snapbill': 7, 'thanks': 8,
                       'voucher not working': 9, 'voucher': 10}
    df['label_num'] = df['final category'].map(map_class_label)
    y_train = df.label_num

    # Build the document-term matrix (this is the call that raises the error)
    vectorizer = CountVectorizer(lowercase=False, decode_error='ignore')
    x_train_dtm = vectorizer.fit_transform(x_train)

You need to convert the message column to strings with astype, because the data contains some numeric values:

    df = pd.read_excel('training_data.xlsx')
    df['message'] = df['message'].values.astype('unicode')
    ...
    ...
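To see the conversion end to end without the Excel file, here is a minimal sketch. The small in-memory DataFrame is made up for illustration (it only mimics the sample messages, with one numeric cell added), and it uses `astype(str)`, which is my assumption for a Python 3 setup; on Python 2.7 the `.values.astype('unicode')` form from the answer is the safer choice when messages contain non-ASCII text.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for training_data.xlsx: a text column in which
# one cell was read as a number instead of a string.
df = pd.DataFrame({
    "message": [
        "have not received gifts ordered ok",
        12345,                      # numeric cell, the likely cause of the TypeError
        "test test test 1 test",
        "hello reward points",
    ]
})

# Cast every value to text before vectorizing.
df["message"] = df["message"].astype(str)

vectorizer = CountVectorizer(lowercase=False)
x_train_dtm = vectorizer.fit_transform(df["message"])

# One row per message; the numeric cell is now an ordinary token.
print(x_train_dtm.shape)
print("12345" in vectorizer.vocabulary_)
```

Note that the default token pattern only keeps tokens of two or more word characters, so the lone `1` in the third message is dropped while `12345` is kept.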
