Re-indexing an Elasticsearch Index with Existing Data
Recently, I had to modify a field mapping in one of my existing Elasticsearch indices. The problem is that you cannot change the mapping of a field that already exists; you have to create a new index with the mapping you want. But if I delete the existing index, all my data will be gone!
So, what to do now???
Here comes a great solution! Elasticsearch has introduced an API for exactly this: _reindex. Let's see how we can use it to achieve our goal!
Before that, I am going to quote from the Elasticsearch documentation:
Reindex does not attempt to set up the destination index. It does not copy the settings of the source index. You should set up the destination index prior to running a _reindex action, including setting up mappings, shard counts, replicas, etc.
So, let's create a backup index first, with its own settings:
PUT project_copy
{
"settings" : {
"index" : {
"number_of_shards" : 5,
"number_of_replicas" : 2
}
}
}
Response:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "project_copy"
}
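Since _reindex does not copy settings, you may want to derive the backup index's settings from the source index instead of typing them by hand. Below is a minimal Python sketch of that idea, assuming you have already fetched the output of GET /project/_settings. The helper name destination_body and the list of keys to drop are my own assumptions (they are the generated settings that usually show up in that output), so check them against what your cluster returns.

```python
# Settings that Elasticsearch generates itself and that cannot be supplied
# when creating an index. This list is an assumption based on what
# GET /<index>/_settings typically returns; compare it with your own output.
GENERATED_KEYS = {"uuid", "creation_date", "provided_name", "version"}


def destination_body(settings_response, source_index):
    """Turn a GET /<index>/_settings response into a create-index body."""
    index_settings = settings_response[source_index]["settings"]["index"]
    reusable = {
        key: value
        for key, value in index_settings.items()
        if key not in GENERATED_KEYS
    }
    return {"settings": {"index": reusable}}


# Example response shape; the uuid, date and version values are made up.
example = {
    "project": {
        "settings": {
            "index": {
                "number_of_shards": "5",
                "number_of_replicas": "2",
                "uuid": "made-up-uuid",
                "creation_date": "1500000000000",
                "provided_name": "project",
                "version": {"created": "7000099"},
            }
        }
    }
}

print(destination_body(example, "project"))
```

The resulting dictionary can be sent as the body of PUT project_copy. Mappings would need the same treatment through GET /project/_mapping, which this sketch does not cover.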
Let’s copy all the data from the project index to project_copy.
POST _reindex
{
"source": {
"index": "project",
"size": 10000
},
"dest": {
"index": "project_copy",
"version_type": "external"
}
}
Response:
{
"took" : 749,
"timed_out" : false,
"total" : 8,
"updated" : 0,
"created" : 8,
"deleted" : 0,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
The request above is mostly self-explanatory, so I will only explain what size and version_type do.
size: the number of documents processed per batch (the default is 1000).
version_type: with external, Elasticsearch preserves each document's version from the source index, creates any documents that are missing from the destination, and updates any documents whose version in the destination is older than in the source.
Now, check that all the data made it into the project_copy index:
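To make the two options concrete, here is a small Python sketch that builds the same _reindex body. The function name reindex_body and its parameter names are hypothetical; only the JSON keys come from the request above.

```python
def reindex_body(source, dest, batch_size=1000, version_type=None):
    """Build the JSON body for POST _reindex.

    batch_size maps to source.size (documents per batch).
    version_type is only added when given, so the destination otherwise
    uses Elasticsearch's default versioning behaviour.
    """
    body = {
        "source": {"index": source, "size": batch_size},
        "dest": {"index": dest},
    }
    if version_type is not None:
        body["dest"]["version_type"] = version_type
    return body


# Same request as in the post: 10000 documents per batch, versions preserved.
print(reindex_body("project", "project_copy", 10000, "external"))
```

Leaving version_type out is a deliberate choice in the sketch: the option is optional in the API, so the helper only sends it when you ask for it.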
GET /project_copy/_search
Response: for obvious reasons I have omitted some of the data and show only the basics here. There are 8 entries in total.
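A search is one way to check the copy. The _reindex response shown earlier also carries enough information for a first sanity check, which matters because the next step deletes the original index. The Python sketch below applies a rule of thumb of my own (not an official check): no timeout, no failures, no version conflicts, and every document either created or updated. The function name reindex_ok is hypothetical.

```python
def reindex_ok(response):
    """Return True when every source document was written without errors."""
    written = response["created"] + response["updated"]
    return (
        not response["timed_out"]
        and not response["failures"]
        and response["version_conflicts"] == 0
        and written == response["total"]
    )


# The relevant fields of the response from the first _reindex call above.
response = {
    "timed_out": False,
    "total": 8,
    "updated": 0,
    "created": 8,
    "version_conflicts": 0,
    "failures": [],
}

print(reindex_ok(response))
```

If this returns False, it is safer to stop and investigate before touching the original index.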
Nice! Our backup is done! Let's change the field mapping of our target index.
We will set one field to keyword in our project index. Since an existing field's mapping cannot be changed, we first delete the original project index (the backup stays untouched):
DELETE /project
Now, create it again with the changes we want (you can add more field mappings as needed):
PUT project
{
"settings" : {
"index" : {
"number_of_shards" : 5,
"number_of_replicas" : 2
}
},
"mappings": {
"properties":{
"projectName":{
"type": "keyword"
}
}
}
}
Our task is almost done! Now repeat the earlier reindex step, this time copying the backup data into the new project index:
POST _reindex
{
"source": {
"index": "project_copy",
"size": 10000
},
"dest": {
"index": "project",
"version_type": "external"
}
}
Verify the data:
GET /project/_search
Voila! We have successfully changed our index field-mapping and reindexed the old data!
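To recap, the whole procedure can be written down as an ordered list of requests. The Python sketch below only builds that list and does not talk to a cluster; the function name mapping_change_plan and its parameters are hypothetical. Note that the delete step targets the original index, because the backup has to survive until the data has been copied back.

```python
def mapping_change_plan(index, backup, mappings, shards=5, replicas=2,
                        batch_size=10000):
    """Return the requests, in order, as (method, path, body) tuples."""
    settings = {
        "index": {"number_of_shards": shards, "number_of_replicas": replicas}
    }

    def reindex(src, dst):
        # Same body as in the post: batch size plus preserved versions.
        return {
            "source": {"index": src, "size": batch_size},
            "dest": {"index": dst, "version_type": "external"},
        }

    return [
        ("PUT", f"/{backup}", {"settings": settings}),        # create backup
        ("POST", "/_reindex", reindex(index, backup)),        # copy data out
        ("DELETE", f"/{index}", None),                        # drop original
        ("PUT", f"/{index}",
         {"settings": settings, "mappings": mappings}),       # new mapping
        ("POST", "/_reindex", reindex(backup, index)),        # copy data back
    ]


plan = mapping_change_plan(
    "project",
    "project_copy",
    {"properties": {"projectName": {"type": "keyword"}}},
)
for method, path, _body in plan:
    print(method, path)
```

Each tuple corresponds to one console request from this post, so the list can be read as a checklist or fed to whatever HTTP client you prefer.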