Moving to managed: The case for the Amazon Elasticsearch Service

Prior to joining AWS, I led a development team that built mobile advertising solutions with Elasticsearch. Elasticsearch is a popular open-source search and analytics engine for log analytics, real-time application monitoring, clickstream analysis, and (of course) search. The platform I was responsible for was essential to driving my company’s business.

My team ran a self-managed implementation of Elasticsearch on AWS. At the time, a managed Elasticsearch offering wasn’t available. We had to build the scripting and tooling to deploy our Elasticsearch clusters across three geographical regions. This included the following tasks (and more):

  • Configuring the networking, routing, and firewall rules to enable the cluster to communicate.
  • Securing Elasticsearch management APIs from unauthorized access.
  • Creating load balancers for request distribution across data nodes.
  • Creating automatic scaling groups to replace the instances if there were issues.
  • Automating the configuration.
  • Managing the upgrades for security concerns.

If I had one word to describe the experience, it would be “painful.” Deploying and managing your own Elasticsearch clusters at scale takes a lot of time and knowledge to do it properly. Perhaps most importantly, it took my engineers away from doing what they do best—innovating and producing solutions for our customers.

Amazon Elasticsearch Service (Amazon ES) launched on October 1, 2015, almost 2 years after I joined AWS. Almost 5 years later, Amazon ES is in the best position ever to provide you with a compelling feature set that enables your search-related needs. With Amazon ES, you get a fully managed service that makes it easy to deploy, operate, and scale Elasticsearch clusters securely and cost-effectively in the AWS Cloud. Amazon ES offers direct access to the Elasticsearch APIs, which makes your existing code and applications using Elasticsearch work seamlessly with the service.

Amazon ES provisions all the resources for your Elasticsearch cluster and launches it in any Region of your choice within minutes. It automatically detects and replaces failed Elasticsearch nodes, which reduces the overhead associated with self-managed infrastructures. You can scale your cluster horizontally or vertically, up to 3 PB of data, with zero downtime through a single API call or a few clicks on the AWS Management Console. With this flexibility, Amazon ES can support any workload from single-node development clusters to production-scale, multi-node clusters.

Amazon ES also provides a robust set of Kibana plugins, free of any licensing fees. Features like fine-grained access control, alerting, index state management, and SQL support are just a few of the many examples. The Amazon ES feature set originates from the needs of customers such as yourself and through open-source community initiatives such as Open Distro for Elasticsearch (ODFE).

You need to factor several considerations into your decision to move to a managed service. Obviously, you want your teams focused on doing meaningful work that propels the growth of your company. Deciding what processes you offload to a managed service versus what are best self-managed can be a challenge. Based on my experience managing Elasticsearch at my prior employer, and having worked with thousands of customers who have migrated to AWS, I consider the following sections important topics for you to review.

Workloads

Before migrating to a managed service, you might look to what others are doing in their “vertical,” whether it be in finance, telecommunications, legal, ecommerce, manufacturing, or any number of other markets. You can take comfort in knowing that thousands of customers across these verticals successfully deploy their search, log analytics, SIEM, and other workloads on Amazon ES.

Elasticsearch is by default a search engine. Compass uses Amazon ES to scale their search infrastructure and build a complete, scalable, real estate home-search solution. By using industry-leading search and analytical tools, they make every listing in the company’s catalog discoverable to consumers and help real estate professionals find, market, and sell homes faster.

With powerful tools such as aggregations and alerting, Elasticsearch is widely used for log analytics workloads to gain insights into operational activities. As Intuit moves to a cloud hosting architecture, the company is on an “observability” journey to transform the way it monitors its applications’ health. Intuit used Amazon ES to build an observability solution, which provides visibility to its operational health across the platform, from containers to serverless applications.

When it comes to security, Sophos is a worldwide leader in next-generation cybersecurity, and protects its customers from today’s most advanced cyber threats. Sophos developed a large-scale security monitoring and alerting system using Amazon ES and other AWS components because they know Amazon ES is well suited for security use cases at scale.

Whether it be finding a home, detecting security events, or assisting developers with finding issues in applications, Amazon ES supports a wide range of use cases and workloads.

Cost

Any discussion around operational best practices has to factor in cost. With Amazon ES, you can select the optimal instance type and storage option for your workload with a few clicks on the console. If you’re uncertain of your compute and storage requirements, Amazon ES has on-demand pricing with no upfront costs or long-term commitments. When you know your workload requirements, you can lock in significant cost savings with Reserved Instance pricing for Amazon ES.

Compute and infrastructure costs are just one part of the equation. At AWS, we encourage customers to evaluate their Total Cost of Ownership (TCO) when comparing solutions. As an organizational decision-maker, you have to consider all the related cost benefits when choosing to replace your self-managed environment. Some of the factors I encourage my customers to consider are:

  • How much are you paying to manage the operation of your cluster 24/7/365?
  • How much do you spend on building the operational components, such as support processes, as well as automated and manual remediation procedures for clusters in your environment?
  • What are the license costs for advanced features?
  • What costs do you pay for networking between the clusters or the DNS services to expose your offerings?
  • How much do you spend on backup processes and how quickly can you recover from any failures?

The beauty of Amazon ES is that you do not need to focus on these issues. Amazon ES provides operational teams to manage your clusters, automated hourly backups of data for 14 days for your cluster, automated remediation of events with your cluster, and incremental license-free features as one of the basic tenants of the service.

You also need to pay particular close attention to managing the cost of storing your data in Elasticsearch. In the past, to keep storage costs from getting out of control, self-managed Elasticsearch users had to rely on solutions that were complicated to manage across data tiers and in some cases didn’t give you quick access to that data. AWS solved this problem with UltraWarm, a new, low-cost storage tier. UltraWarm lets you store and interactively analyze your data, backed by Amazon Simple Storage Service (Amazon S3) using Elasticsearch and Kibana, while reducing your cost per GB by almost 90% over existing hot storage options.

Security

In my conversations with customers, their primary concern is security. One data breech can cost millions and forever damage a company’s reputation. Providing you with the tools to secure your data is a critical component of our service. For your data in Amazon ES, you can do the following:

Many customers want to have a single sign-on environment when integrating with Kibana. Amazon ES offers Amazon Cognito authentication for Kibana. You can choose to integrate identity providers such as AWS Single Sign-On, PingFederate, Okta, and others. For more information, see Integrating Third-Party SAML Identity Providers with Amazon Cognito User Pools.

Recently, Amazon ES introduced fine-grained access control (FGAC). FGAC provides granular control of your data on Amazon ES. For example, depending on who makes the request, you might want a search to return results from only one index. You might want to hide certain fields in your documents or exclude certain documents altogether. FGAC gives you the power to control who sees what data exists in your Amazon ES domain.

Compliance

Many organizations need to adhere to a number of compliance standards. Those that have experienced auditing and certification activities know that ensuring compliance is an expensive, complex, and long process. However, by using Amazon ES, you benefit from the work AWS has done to ensure compliance with a number of important standards. Amazon ES is PCI DSS, SOC, ISO, and FedRamp compliant to help you meet industry-specific or regulatory requirements. Because Amazon ES is a HIPAA-eligible service, processing, storing and transmitting PHI can help you accelerate development of these sensitive workloads.

Amazon ES is part of the services in scope of the most recent assessment. You can build solutions on top of Amazon ES with the knowledge that independent auditors acknowledge that the service meets the bar for these important industry standards.

Availability and resiliency

When you build an Elasticsearch deployment either on premises or in cloud environments, you need to think about how your implementation can survive failures. You also need to figure out how you can recover from failures when they occur. At AWS, we like to plan for the fact that things do break, such as hardware failures and disk failures, to name a few.

Unlike virtually every other technology infrastructure provider, each AWS Region has multiple Availability Zones. Each Availability Zone consists of one or more data centers, physically separated from one another, with redundant power and networking. For high availability and performance of your applications, you can deploy applications across multiple Availability Zones in the same Region for fault tolerance and low latency. Availability Zones interconnect with fast, private fiber-optic networking, which enables you to design applications that automatically fail over between Availability Zones without interruption. Availability Zones are more highly available, fault tolerant and scalable than traditional single or multiple data center infrastructures.

Amazon ES offers you the option to deploy your instances across one, two, or three AZs. If you’re running development or test workloads, pick the single-AZ option. Those running production-grade workloads should use two or three Availability Zones.

For information, see Increase availability for Amazon Elasticsearch Service by Deploying in three Availability Zones. Additionally, deploying in multiple Availability Zones with dedicated master nodes means that you get the benefit of the Amazon ES SLA.

Operations

A 24/7/365 operational team with experience managing thousands of Elasticsearch clusters around the globe monitors Amazon ES. If you need support, you can get expert guidance and assistance across technologies from AWS Support to achieve your objectives faster at lower costs. I want to underline the importance of having a single source for support for your cloud infrastructure. Amazon ES doesn’t run in isolation, and having support for your entire cloud infrastructure from a single source greatly simplifies the support process. AWS also provides you with the option to use Enterprise level support plans, where you can have a dedicated technical account manager who essentially becomes a member of your team and is committed to your success with AWS.

Using tools on Amazon ES such as alerting, which provides you with a means to take action on events in your data, and index state management, which enables you to automate activities like rolling off logs, gives you additional operation features that you don’t need to build.

When it comes to monitoring your deployments, Amazon ES provides you with a wealth of Amazon CloudWatch metrics with which you can monitor all your Amazon ES deployments within a “single pane of glass.” For more information, see Monitoring Cluster Metrics with Amazon CloudWatch.

Staying current is another important topic. To enable access to newer versions of Elasticsearch and Kibana, Amazon ES offers in-place Elasticsearch upgrades for domains that run versions 5.1 and later. Amazon ES provides access to the most stable and current versions in the open-source community as long as the distribution passes our stringent security evaluations. Our service prides itself on the fact that we offer you a version that has passed our own internal AWS security reviews.

AWS integrations and other benefits

AWS has a broad range of services that seamlessly integrate with Amazon ES. Like many customers, you may want to monitor the health and performance of your native cloud services on AWS. Most AWS services log events into Amazon CloudWatch Logs. You can configure a log group to stream data it receives to your Amazon ES domain through a CloudWatch Logs subscription.

The volume of log data can be highly variable, and you should consider buffering layers when operating at a large scale. Buffering allows you to design stability into your processes. When designing for scale, this is one of easiest ways I know to avoid overwhelming your cluster with spikey ingestion events. Amazon Kinesis Data Firehose has a direct integration with Amazon ES and offers buffering and retries as part of the service. You configure Amazon ES as a destination through a few simple settings, and data can begin streaming to your Amazon ES domain.

Increased speed and agility

When building new products and tuning existing solutions, you need to be able to experiment. As part of that experimentation, failing fast is an accepted process that gives your team the ability to try new approaches to speed up the pace of innovation. Part of that process involves using services that allow you to create environments quickly and, should the experiment fail, start over with a new approach or use different features that ultimately allow you to achieve the desired results.

With Amazon ES, you receive the benefit of being able to provision an entire Elasticsearch cluster, complete with Kibana, on the “order of minutes” in a secure, managed environment. If your testing doesn’t produce the desired results, you can change the dimensions of your cluster horizontally or vertically using different instance offerings within the service via a single API call or a few clicks on the console.

When it comes to deploying your environment, native tools such as Amazon CloudFormation provide you with deployment tooling that gives you the ability to create entire environments through configuration scripting via JSON or YAML. The AWS Command Line Interface (AWS CLI) provides command line tooling that also can spin up domains with a small set of commands. For those who want to ride the latest wave in scripting their environments, the AWS CDK has a module for Amazon ES.

Conclusion

Focusing your teams on doing important, innovative work creating products and services that differentiate your company is critical. Amazon ES is an essential tool to provide operational stability, security, and performance of your search and analytics infrastructure. When you consider the following benefits Amazon ES provides, the decision to migrate is simple:

  • Support for search, log analytics, SIEM, and other workloads
  • Innovative functionality using UltraWarm to help you manage your costs
  • Highly secure environments that address PCI and HIPAA workloads
  • Ability to offload operational processes to an experienced provider that knows how to operate Elasticsearch at scale
  • Plugins at no additional cost that provide fine-grained access, vector-based similarity algorithms, or alerting and monitoring with the ability to automate incident response.

You can get started using Amazon ES with the AWS Free Tier. This tier provides free usage of up to 750 hours per month of a t2.small.elasticsearch instance and 10 GB per month of optional EBS storage (Magnetic or General Purpose).

Over the course of the next few months, I’ll be co-hosting a series of posts that introduce migration patterns to help you move to Amazon ES. Additionally, AWS has a robust partner ecosystem and a professional services team that provide you with skilled and capable people to assist you with your migration.

 


About the Author

Kevin Fallis (@AWSCodeWarrior) is an AWS specialist search solutions architect.His passion at AWS is to help customers leverage the correct mix of AWS services to achieve success for their business goals. His after-work activities include family, DIY projects, carpentry, playing drums, and all things music.