Python 2.7 - mapper_parsing_exception error when indexing a PDF in Elasticsearch


I am trying to index a PDF using Elasticsearch 2.3.4 and Python. I want to extract the text and metadata from the PDF and index them. I am using the mapper-attachments plugin.

When I try to index the document, I get a 'mapper_parsing_exception' error. This is my code:

    import json
    from elasticsearch import Elasticsearch

    # configuration
    dir = 'd:/qa_testing/testing/data'
    es_host = {"host": "localhost", "port": 9200}
    index_name = 'testing'
    type_name = 'documents'
    url = "d:/xyz.pdf"

    es = Elasticsearch(hosts=[es_host])

    # mapping: type "documents" with an attachment field "cv"
    mapping = {
        "mappings": {
            "documents": {
                "properties": {
                    "cv": {"type": "attachment"}
                }
            }
        }
    }

    # read the PDF and base64-encode it (Python 2.7)
    file64 = open(url, "rb").read().encode("base64")
    data_dict = {'cv': file64}
    data_dict = json.dumps(data_dict)

    res = es.indices.create(index=index_name, body=mapping)

    es.index(index=index_name, body=data_dict, doc_type="attachment", id=1)

Error:

    Traceback (most recent call last):
      File "c:/users/537095/desktop/qa/indexingworkspace/mainworkspace/index3.py", line 51, in <module>
        es.index(index = index_name, body = data_dict ,doc_type = "attachment", id=1)
      File "c:\python27\lib\site-packages\elasticsearch\client\utils.py", line 69, in _wrapped
        return func(*args, params=params, **kwargs)
      File "c:\python27\lib\site-packages\elasticsearch\client\__init__.py", line 261, in index
        _make_path(index, doc_type, id), params=params, body=body)
      File "c:\python27\lib\site-packages\elasticsearch\transport.py", line 329, in perform_request
        status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
      File "c:\python27\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 106, in perform_request
        self._raise_error(response.status, raw_data)
      File "c:\python27\lib\site-packages\elasticsearch\connection\base.py", line 105, in _raise_error
        raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
    RequestError: TransportError(400, u'mapper_parsing_exception', u'failed to parse')

What am I doing wrong?

You need to change doc_type. It should be documents, not attachment, so that it matches the type defined in your mapping:

    es.index(index=index_name, body=data_dict, doc_type="documents", id=1)
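
One way to avoid this kind of mismatch is to define the type name once and reuse it for both the mapping and the index call. The following is only a minimal sketch under that assumption: the helper names build_document and index_pdf and the constants are made up for illustration and are not part of the elasticsearch client, and `es` is assumed to be an elasticsearch.Elasticsearch 2.x client with the mapper-attachments plugin installed on the server. It uses base64.b64encode instead of encode("base64") so the encoding step also works outside Python 2.7.

```python
import base64
import json

# Hypothetical constants for illustration; values taken from the question.
INDEX_NAME = "testing"
TYPE_NAME = "documents"  # single source of truth for the document type

# The mapping declares the attachment field under TYPE_NAME.
MAPPING = {
    "mappings": {
        TYPE_NAME: {
            "properties": {
                "cv": {"type": "attachment"}
            }
        }
    }
}


def build_document(pdf_bytes):
    """Return the JSON body for one PDF, with the content base64-encoded."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({"cv": encoded})


def index_pdf(es, pdf_path, doc_id):
    """Index the PDF at pdf_path under the type declared in MAPPING.

    `es` is assumed to be an elasticsearch.Elasticsearch client; the index
    is assumed to exist already (created with MAPPING as its body).
    """
    with open(pdf_path, "rb") as handle:
        body = build_document(handle.read())
    # doc_type reuses TYPE_NAME, so it cannot drift from the mapping.
    return es.index(index=INDEX_NAME, doc_type=TYPE_NAME, body=body, id=doc_id)
```

With a real client, this would be used roughly as es.indices.create(index=INDEX_NAME, body=MAPPING) followed by index_pdf(es, "d:/xyz.pdf", 1).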
