Python 2.7 - mapper_parsing_exception error when indexing a PDF in Elasticsearch
I am trying to index a PDF using Elasticsearch 2.3.4 and Python. I want to extract the text and metadata from the PDF and index them. I am using the mapper_attachment plugin.
When I try to index the document, I get a 'mapper_parsing_exception' error. The following is my code:
    import json
    from elasticsearch import Elasticsearch

    # configuration
    dir = 'd:/qa_testing/testing/data'
    es_host = {"host": "localhost", "port": 9200}
    index_name = 'testing'
    type_name = 'documents'
    url = "d:/xyz.pdf"

    es = Elasticsearch(hosts=[es_host])

    # the attachment field "cv" is declared under the "documents" type
    mapping = {
        "mappings": {
            "documents": {
                "properties": {
                    "cv": {"type": "attachment"}
                }
            }
        }
    }

    # read the PDF and base64-encode it (Python 2 idiom)
    file64 = open(url, "rb").read().encode("base64")
    data_dict = {'cv': file64}
    data_dict = json.dumps(data_dict)

    res = es.indices.create(index=index_name, body=mapping)
    es.index(index=index_name, body=data_dict, doc_type="attachment", id=1)
Error:

    Traceback (most recent call last):
      File "c:/users/537095/desktop/qa/indexingworkspace/mainworkspace/index3.py", line 51, in <module>
        es.index(index = index_name, body = data_dict ,doc_type = "attachment", id=1)
      File "c:\python27\lib\site-packages\elasticsearch\client\utils.py", line 69, in _wrapped
        return func(*args, params=params, **kwargs)
      File "c:\python27\lib\site-packages\elasticsearch\client\__init__.py", line 261, in index
        _make_path(index, doc_type, id), params=params, body=body)
      File "c:\python27\lib\site-packages\elasticsearch\transport.py", line 329, in perform_request
        status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
      File "c:\python27\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 106, in perform_request
        self._raise_error(response.status, raw_data)
      File "c:\python27\lib\site-packages\elasticsearch\connection\base.py", line 105, in _raise_error
        raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
    RequestError: TransportError(400, u'mapper_parsing_exception', u'failed to parse')

What am I doing wrong?
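You need to change doc_type: it should be documents, not attachment. Your mapping declares the cv attachment field under the documents type, so the document has to be indexed into that same type.

    es.index(index=index_name, body=data_dict, doc_type="documents", id=1)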
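To catch this kind of mismatch before the request is sent, here is a minimal sketch that checks the doc_type against the types declared in the mapping and builds the request body. The helper names `build_attachment_body` and `check_doc_type` are hypothetical, not part of the elasticsearch client, and the sample bytes stand in for a real PDF. It also uses `base64.b64encode` instead of `.encode("base64")`, since the latter only works on Python 2.

```python
import base64
import json

# Same mapping as in the question: the attachment field lives under "documents".
mapping = {
    "mappings": {
        "documents": {
            "properties": {
                "cv": {"type": "attachment"}
            }
        }
    }
}


def build_attachment_body(raw_bytes, field="cv"):
    """Hypothetical helper: JSON body with the file base64-encoded under `field`."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({field: encoded})


def check_doc_type(mapping, doc_type):
    """Hypothetical helper: raise if doc_type is not declared in the mapping."""
    declared = sorted(mapping.get("mappings", {}))
    if doc_type not in declared:
        raise ValueError(
            "doc_type %r is not in the mapping; declared types: %s"
            % (doc_type, ", ".join(declared))
        )
    return doc_type


# Stand-in bytes so the sketch runs without a real PDF on disk.
body = build_attachment_body(b"%PDF-1.4 sample")
doc_type = check_doc_type(mapping, "documents")  # "attachment" would raise here

# With a running cluster and the client from the question:
# es.index(index=index_name, doc_type=doc_type, body=body, id=1)
```

Passing "attachment" to `check_doc_type` raises a ValueError that lists documents as the only declared type, which points at the problem more directly than the server-side 'failed to parse' message.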