Elasticsearch - Find document by term which is only part of given query-string
I have a problem with searching in Elasticsearch and hope that you can help.
I want to find a document whose field is indexed with the keyword tokenizer and only lowercased by the index analyzer. When the resulting term is contained in the search query, I want Elasticsearch to find the document.
Example search:
"query": {
"match": {
"categoryNames": "CD&DVD Aufbewahrung schwarz"
}
}
Document I want to find:
"_source": {
"categoryId": 11972638,
"categoryNames": [
"DVD-Koffer",
"CD-Koffer",
"CD-Aufbewahrung",
"DVD-Aufbwahrung",
"DVD-Ordner",
"EDV-DVD-Aufbewahrung",
"EDV-CD-Aufbewahrung",
"CD&DVD Aufbewahrung",
"Multimediabox"
],
"lvl3Id": 11972638
}
Index Analyzer:
"analysis" : {
"analyzer" : {
"default" : {
"type": "custom",
"tokenizer": "keyword",
"filter" : ["lowercase"]
}
}
}
Term vectors of the document I want to find:
"cd&dvd aufbewahrung": {
"term_freq": 1,
"tokens": [
...
]
},
"cd-aufbewahrung": {
"term_freq": 1,
"tokens": [
...
]
},
"cd-koffer": {
"term_freq": 1,
"tokens": [
...
]
},
....
I get no result. When I search only for "CD&DVD aufbewahrung", I find the document.
I think that Elasticsearch is trying to find the term "CD&DVD Aufbewahrung schwarz", which does not exist, instead of matching "CD&DVD Aufbewahrung" and ignoring "schwarz".
The search cannot use the standard analyzer, because it is important that only "CD&DVD Aufbewahrung" matches "CD&DVD Aufbewahrung", and not, for example, a term that contains only "Aufbewahrung" or "Aufbewahrung CD&DVD", which would be matched if the query were split on whitespace, for example.
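To make the diagnosis above concrete, here is a minimal Python sketch (not Elasticsearch itself) that imitates the keyword tokenizer plus lowercase filter. The names `keyword_lowercase_terms` and `would_match` are made up for illustration, and the in-memory set stands in for the indexed terms; the point is only that the whole query string becomes one single term.

```python
# Illustration only: a tiny stand-in for the index analyzer shown above
# (keyword tokenizer + lowercase filter). Helper names are hypothetical.

CATEGORY_NAMES = [
    "DVD-Koffer", "CD-Koffer", "CD-Aufbewahrung", "DVD-Aufbwahrung",
    "DVD-Ordner", "EDV-DVD-Aufbewahrung", "EDV-CD-Aufbewahrung",
    "CD&DVD Aufbewahrung", "Multimediabox",
]


def keyword_lowercase_terms(text):
    """Keep the whole input as one token and lowercase it."""
    return [text.lower()]


# Terms as they would end up in the index for the document above.
INDEXED_TERMS = {
    term for name in CATEGORY_NAMES for term in keyword_lowercase_terms(name)
}


def would_match(query):
    """True if any analyzed query term equals an indexed term."""
    return any(term in INDEXED_TERMS for term in keyword_lowercase_terms(query))


print(keyword_lowercase_terms("CD&DVD Aufbewahrung schwarz"))
# -> ['cd&dvd aufbewahrung schwarz']
print(would_match("CD&DVD Aufbewahrung schwarz"))  # -> False
print(would_match("CD&DVD aufbewahrung"))          # -> True
```

Because the query is analyzed with the same analyzer as the field, the extra word "schwarz" ends up inside the single query term, so no indexed term can equal it.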
A few example searches with my expectations for the document above:
CD&DVD Aufbewahrung -> Found
CD&DVD aufbewahrung -> Found
schwarz CD&DVD Aufbewahrung -> Found
CD&DVD Aufbewahrung gelb -> Found
schwarz CD&DVD Aufbewahrung gelb -> Found
CD&DVD schwarz Aufbewahrung -> Not Found
Aufbewahrung CD&DVD -> Not Found
schwarz CD & DVD Aufbewahrung -> Not Found
schwarzCD&DVD Aufbewahrung -> Not Found
Does anyone have an idea how to fix this?
elasticsearch match-query
edited Nov 21 '18 at 13:32
asked Nov 21 '18 at 12:49
Marcel Balzer
1 Answer
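A custom analyzer with the Shingle Token Filter may be helpful here. The field is still indexed as a single lowercased keyword term, while the search analyzer splits the query on whitespace and emits shingles of 2 to 4 consecutive words (plus the single words, which the shingle filter outputs by default), so a run of consecutive query words can match an indexed term exactly. Please see the code below: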
Mapping
PUT /so53412408
{
"settings": {
"analysis": {
"analyzer": {
"lowercase_keyword": {
"tokenizer": "keyword",
"filter": [
"lowercase"
]
},
"lowercase_shingle": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"my_shingle"
]
}
},
"filter": {
"my_shingle": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 4
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"categoryNames": {
"type": "text",
"analyzer": "lowercase_keyword",
"search_analyzer": "lowercase_shingle"
}
}
}
}
}
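One way to inspect what the search analyzer produces is the `_analyze` API of the index. The following Python sketch only builds such a request with the standard library. The base URL `http://localhost:9200` and the helper name `build_analyze_request` are assumptions for illustration; sending the request is left commented out because it needs a running cluster.

```python
import json
import urllib.request

# Assumption: a local single-node cluster; adjust to your environment.
ES_URL = "http://localhost:9200"


def build_analyze_request(index, analyzer, text):
    """Build (but do not send) a request to the index-level _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request(
    "so53412408", "lowercase_shingle", "schwarz CD&DVD Aufbewahrung"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually run it against a cluster (not executed here):
# with urllib.request.urlopen(request) as response:
#     for token in json.load(response)["tokens"]:
#         print(token["token"])
```

Running the same request with `lowercase_keyword` instead of `lowercase_shingle` should show the single term that is stored at index time.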
Sample data
POST /so53412408/_doc
{
"categoryNames": [
"DVD-Koffer",
"CD-Koffer",
"CD-Aufbewahrung",
"DVD-Aufbwahrung",
"DVD-Ordner",
"EDV-DVD-Aufbewahrung",
"EDV-CD-Aufbewahrung",
"CD&DVD Aufbewahrung",
"Multimediabox"
]
}
Search query
GET /so53412408/_search
{
"query": {
"match": {
"categoryNames": "schwarzCD&DVD Aufbewahrung"
}
}
}
Results
CD&DVD Aufbewahrung -> Found
CD&DVD aufbewahrung -> Found
schwarz CD&DVD Aufbewahrung -> Found
CD&DVD Aufbewahrung gelb -> Found
schwarz CD&DVD Aufbewahrung gelb -> Found
CD&DVD schwarz Aufbewahrung -> Not Found
Aufbewahrung CD&DVD -> Not Found
schwarz CD & DVD Aufbewahrung -> Not Found
schwarzCD&DVD Aufbewahrung -> Not Found
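The results above can be reasoned about offline with a rough Python sketch of the two analyzers. This is a simplification, not Elasticsearch's implementation: it assumes whitespace splitting, lowercasing, shingles of 2 to 4 words plus single words, and that a document is found as soon as one query term equals an indexed term. The function names are made up for illustration.

```python
# Rough emulation of the mapping above; helper names are hypothetical.

CATEGORY_NAMES = [
    "DVD-Koffer", "CD-Koffer", "CD-Aufbewahrung", "DVD-Aufbwahrung",
    "DVD-Ordner", "EDV-DVD-Aufbewahrung", "EDV-CD-Aufbewahrung",
    "CD&DVD Aufbewahrung", "Multimediabox",
]

# Index side (lowercase_keyword): each value becomes one lowercased term.
INDEXED_TERMS = {name.lower() for name in CATEGORY_NAMES}


def shingle_terms(text, min_size=2, max_size=4):
    """Search side (lowercase_shingle): single words plus word shingles."""
    tokens = text.lower().split()
    terms = set(tokens)  # the shingle filter also emits single words by default
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                terms.add(" ".join(tokens[start:start + size]))
    return terms


def is_found(query):
    """True if at least one query term equals an indexed term."""
    return bool(shingle_terms(query) & INDEXED_TERMS)


QUERIES = [
    "CD&DVD Aufbewahrung",
    "CD&DVD aufbewahrung",
    "schwarz CD&DVD Aufbewahrung",
    "CD&DVD Aufbewahrung gelb",
    "schwarz CD&DVD Aufbewahrung gelb",
    "CD&DVD schwarz Aufbewahrung",
    "Aufbewahrung CD&DVD",
    "schwarz CD & DVD Aufbewahrung",
    "schwarzCD&DVD Aufbewahrung",
]

for query in QUERIES:
    print(f"{query} -> {'Found' if is_found(query) else 'Not Found'}")
```

Note that with `max_shingle_size` set to 4, a category name of more than four words could not be matched this way; the limit would have to be raised for longer names.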
Thanks a lot, works perfectly.
– Marcel Balzer
Nov 22 '18 at 11:23
You're welcome. Great to hear that :)
– Piotr Pradzynski
Nov 22 '18 at 11:52
answered Nov 21 '18 at 15:35
Piotr Pradzynski