Posts

Showing posts from 2020

Re-Indexing Elasticsearch index with Existing Data

Recently, I needed to change a field mapping in an existing Elasticsearch index. The problem is that the mapping of an existing field cannot be changed in place; you have to create a new index with the new mapping. But if I delete the existing index, all my data will be gone! So, what to do now? Here comes a great solution: Elasticsearch provides an API for exactly this, _reindex. Let's see how we can use it to achieve our goal. Before that, I am going to quote from the Elasticsearch API documentation: "Reindex does not attempt to set up the destination index. It does not copy the settings of the source index. You should set up the destination index prior to running a _reindex action, including setting up mappings, shard counts, replicas, etc." So, let's create a backup index: PUT project_copy { "settings" : { "index" : { "number_of_shards" : 5, "number_of_replicas" : 2 } } } Response: { "acknowledged" : true, ...
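As a rough illustration of the reindex step this excerpt is leading up to, the sketch below builds the JSON body for a POST _reindex request. It is not taken from the post: the source index name project is an assumption (the excerpt only names the backup index project_copy), and build_reindex_body is a hypothetical helper.

```python
import json
from typing import Any, Dict


def build_reindex_body(source_index: str, dest_index: str) -> Dict[str, Any]:
    """Build the request body for POST _reindex.

    The destination index must already exist with the desired settings and
    mappings, because _reindex does not copy them from the source index.
    """
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # "project" is an assumed source index name; "project_copy" is the
    # backup index created in the excerpt.
    body = build_reindex_body("project", "project_copy")
    print("POST _reindex")
    print(json.dumps(body, indent=2))
```

The printed body can be pasted into the Kibana console or sent with any HTTP client once the destination index has been created.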

Java with MINIO file operations: upload, download, delete

If you have started working with MinIO, I guess you have already heard plenty about it. It is simply an object storage service. Quoting from their documentation: "MinIO is a High Performance Object Storage released under Apache License v2.0. It is API compatible with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics and application data workloads." Today, we will see how to connect to MinIO, store documents as MinIO objects, and retrieve them from the MinIO server with Spring (Java). I should mention that their documentation is very good, and you can achieve all of this by following it. But if you need live example code, you can follow along here. First of all, you need a MinIO server running on your machine. If you are struggling with that, check the MinIO Quickstart Guide; it is pretty straightforward. If it runs successfully, you will see a screen like this: Check that you have got the default accessKey and secretKey for ...

Elasticsearch Deep Pagination : Search_After

Those who use Elasticsearch for data storage or search sometimes need pagination. When we search on Google, we see the first page of results, and below the results there are links to the other result pages by page number; we can move back and forth by clicking a page number. Anyone who has implemented pagination in Elasticsearch the traditional way, with the from and size parameters, knows that by default Elasticsearch does not return results beyond the first 10,000 hits of a query (from + size is capped by the index.max_result_window setting, which defaults to 10,000). Those who want a short overview of the different types of ES pagination can check Elasticsearch, how we paginated over 10 000 items. I am not going to describe all of it here, only give a short brief, as the title suggests. Elasticsearch provides mechanisms for paginating past 10k records, and one of them is the search_after method, which is much better for some...
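To make the search_after idea concrete, here is a minimal sketch of the paging loop. It is not the post's code: the fields created_at and doc_id are hypothetical, and fake_search is an in-memory stand-in for the search endpoint so the loop can run without a cluster. The request body shape (size, sort, search_after) follows the Elasticsearch search API; each request passes the sort values of the last hit from the previous page instead of a from offset.

```python
from typing import Any, Dict, List, Optional


def build_search_after_body(size: int, last_sort: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build a search body that pages with search_after instead of from/size."""
    body: Dict[str, Any] = {
        "size": size,
        # A deterministic sort with a tiebreaker is needed so pages do not
        # overlap. Both field names are hypothetical.
        "sort": [{"created_at": "asc"}, {"doc_id": "asc"}],
    }
    if last_sort is not None:
        body["search_after"] = last_sort
    return body


def fake_search(docs: List[Dict[str, Any]], body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """In-memory stand-in for the search endpoint (assumption, not a real client)."""
    ordered = sorted(docs, key=lambda d: (d["created_at"], d["doc_id"]))
    after = body.get("search_after")
    if after is not None:
        # Keep only documents that sort strictly after the last seen hit.
        ordered = [d for d in ordered if (d["created_at"], d["doc_id"]) > tuple(after)]
    page = ordered[: body["size"]]
    return [{"_source": d, "sort": [d["created_at"], d["doc_id"]]} for d in page]


def scan_all(docs: List[Dict[str, Any]], size: int) -> List[Dict[str, Any]]:
    """Collect every document by following search_after until a page is empty."""
    collected: List[Dict[str, Any]] = []
    last_sort: Optional[List[Any]] = None
    while True:
        hits = fake_search(docs, build_search_after_body(size, last_sort))
        if not hits:
            break
        collected.extend(hit["_source"] for hit in hits)
        last_sort = hits[-1]["sort"]  # sort values of the last hit drive the next page
    return collected
```

Against a real cluster, only fake_search would change: the same body would be sent to the index's _search endpoint and the hits read from the response.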

Kafka connect to Elastic Search: Sink kafka Stream data to Elastic search

We have learned how to stream data from the MySQL change log to a Kafka topic via a Kafka source connector in Kafka Stream API: MySQL CDC to apache Kafka with debezium. Now we will learn how to ingest this data into Elasticsearch and index those documents. So, our target is: when a record in a MySQL table changes, publish the record to the Kafka topic related to that table, then ingest it into Elasticsearch and index it. I am assuming you already have the services below up and configured from the previous blog post mentioned above, with compatible versions: ZooKeeper: running; Kafka Broker: running; Kafka Connect: running (we will rerun it the same way we did before, but with an extra config file, as we will see below). Requirements: Elastic Search: v_7.0.1; Kibana: v_7.0.1; Kafka-Elastic connector jar: kafka-connect-elasticsearch-5.4.0.jar [download the updated confluent platform, v5.4.0, and go to confluent folder-> s...