Re-Indexing an Elasticsearch Index with Existing Data

Recently, I needed to modify the field mapping of an existing Elasticsearch index. The problem is, there is no way to change an existing field mapping without creating a new index. But if I delete the existing index, all my data will be gone!

So, what to do now???

Here comes a great solution! Elasticsearch has introduced the Reindex API (_reindex). Let’s see how we can use it to achieve our goal!

Before that, I am gonna quote from the Elasticsearch documentation:

Reindex does not attempt to set up the destination index. It does not copy the settings of the source index. You should set up the destination index prior to running a _reindex action, including setting up mappings, shard counts, replicas, etc.
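
Before touching anything, it also helps to look at the current mapping of the source index, so we know exactly what we are about to recreate (assuming the index is named project, as in the rest of this post):

GET /project/_mapping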

So, let’s create a backup index:

PUT project_copy
{
  "settings" : {
    "index" : {
      "number_of_shards" : 5,
      "number_of_replicas" : 2
    }
  }
}

Response:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "project_copy"
}

Let’s copy all the data from the project index to project_copy.

POST _reindex
{
  "source": {
    "index": "project",
    "size": 10000
  },
  "dest": {
    "index": "project_copy",
    "version_type": "external"
  }
}

Response:

{
  "took" : 749,
  "timed_out" : false,
  "total" : 8,
  "updated" : 0,
  "created" : 8,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

The above request is mostly self-explanatory, but two options deserve a closer look. size controls how many documents are processed per batch: _reindex scrolls through the source index in batches of this size (the default is 1000). version_type: external preserves the versioning from the source index as-is: Elasticsearch creates any documents that are missing in the destination and updates any whose version in the source is newer.
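
One consequence of version_type: external is worth knowing: if you re-run the reindex, documents that already exist in the destination with an equal or higher version are reported as version conflicts, and by default the request aborts on the first one. If you would rather skip those documents and continue, the API accepts a conflicts option; a sketch of the same request with it:

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "project",
    "size": 10000
  },
  "dest": {
    "index": "project_copy",
    "version_type": "external"
  }
}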

Now, let’s check that all the data made it into the project_copy index:

GET /project_copy/_search

Response: for brevity, I have omitted the document bodies and shown only the essentials here: a total of 8 entries, matching the source index.
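
By the way, if all you want is the document count rather than the hits themselves, the standard _count endpoint is quicker:

GET /project_copy/_count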

Nice! Our backup is done! Let’s change the field mapping of our target index.

We will set one field as a keyword in our project index. Before that, we first delete that index:

DELETE /project

Now, create it with the changes we want (you can add more field mappings as per your need):

PUT project
{
  "settings" : {
    "index" : {
      "number_of_shards" : 5,
      "number_of_replicas" : 2
    }
  },
  "mappings": {
    "properties": {
      "projectName": {
        "type": "keyword"
      }
    }
  }
}
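
Before pouring the data back in, it is worth confirming that the new mapping actually took effect, again using the standard mapping endpoint:

GET /project/_mapping

The response should now list projectName with "type": "keyword".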

Our task is almost done! Now repeat the previous step to reindex the backup data back into it:

POST _reindex
{
  "source": {
    "index": "project_copy",
    "size": 10000
  },
  "dest": {
    "index": "project",
    "version_type": "external"
  }
}

Verify the data:

GET /project/_search

Voila! We have successfully changed our index’s field mapping and reindexed the old data!
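
As an optional last step, once you have verified everything in the project index, you can drop the backup index to reclaim the space (only do this if you no longer need it, of course):

DELETE /project_copy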
