⚠️ Alpha Disclaimer

This open-source project is currently under construction and is subject to frequent changes and updates. For more information about our alpha release, see our Alpha Disclaimer.


Overview

The MDAI Cluster provides a self-service control plane for OpenTelemetry, making it simple for users to oversee their observability telemetry and control costs efficiently.

MDAI Overview

📚 What is an MDAI Cluster?

Curious about what our cluster offers?

Let us inform you!

🌐 We're Open-Source

We're committed to improving open source, whether that's through our in-house projects or through contributions back to the community.

Learn more

🏛️ Our Architecture

Curious about what super-powers our infrastructure?

We're happy to share!

🚀 Installation

Ready to install? We have a few options for both learning and production use!

Let's learn!

🚢 Start Sending Data

Ready to send data to your MDAI Cluster? We support a few methods.

Start sending now!

📈 Usage Guide

Already installed a cluster, but want to learn more about usage?

We will show you the way!




Want to jump ahead to install? Wait no more!
Installation guide ⏭️

⚠️ DISCLAIMER

This alpha release is an early version of our open source solution and is intended for testing and feedback purposes only. Please note that there are many changes and enhancements planned for future iterations, and the current version represents just the initial phase of development.

Primary Functionality

The primary functionality of this alpha release is an installer for cloud-based telemetry pipelines running specifically in AWS environments. The installer is designed to facilitate the setup and provisioning of telemetry pipelines, allowing users to quickly deploy and configure the necessary components for monitoring and data collection.

Important Notes

  • This release may contain bugs, limitations, and incomplete features. We encourage users to report any issues encountered during testing.
  • The user interface and user experience (UI/UX) may undergo significant improvements in future updates.
  • Features, configurations, and integrations are subject to change based on user feedback and development priorities.
  • Data generated or collected during this alpha phase may not be representative of final production data.

Feedback and Contributions

We value your feedback and contributions to help us improve our open source solution. Please share your thoughts, suggestions, and bug reports with us via support@mydecisive.ai.

Thank you for participating in our alpha testing phase!




⏪ Back to Overview Next Step: Installation ⏩

Open-Source

The MDAI Cluster is built from a combination of homegrown and external Open-Source projects.

Homegrown components - things we built

Externally built Open-Source tech we use

Read more about the tools that super power us!

  1. Kubernetes
  2. Prometheus
  3. OpenTelemetry

Things we're contributing to...

Coming soon! Check back to learn more about where we're spending our efforts to improve Open-Source beyond MDAI




Want to jump ahead to install? Wait no more!
Installation guide ⏭️

Architecture Overview

10,000 ft. view

We want to simplify the lives of DevOps teams by providing an on-prem (in your own cloud) solution that can give full control over telemetry pipelines, while maintaining existing use of Observability Platforms and/or tools.

System & Users

1,000 ft. view

Here's a brief diagram for how we are using our preferred technologies to build our infrastructure to help the DevOps community!

Container Diagram

What super powers the MDAI Cluster?

Kubernetes (K8s)

Kubernetes is a powerful container orchestration platform that automates deployment, scaling, and management of containerized applications. It provides a flexible and scalable infrastructure for running modern microservices architectures, enabling rapid development, deployment, and scaling of applications. To learn more about k8s, visit kubernetes.io.

OpenTelemetry (OTel)

OpenTelemetry is an open-source observability framework that enables standardization and instrumentation of telemetry data across distributed systems. By integrating OTel with your applications, you can automatically capture and export metrics, traces, and logs, providing deep insights into application behavior, performance, and dependencies.

Prometheus

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability in dynamic environments like Kubernetes. It excels at collecting and storing time-series data, enabling real-time monitoring, querying, and alerting based on application metrics, service health, and resource utilization.

Why use these technologies together?

Unified Observability

Integrating Prometheus, OTel, and Kubernetes allows for unified observability across the entire application stack. You can collect metrics, traces, and logs from Kubernetes resources, containerized applications, and infrastructure components, providing holistic insights into system health, performance, and behavior.

Dynamic Environment Support

Kubernetes' dynamic nature requires a monitoring solution that can adapt to changes in application topology and resource allocation. Prometheus and OTel are well-suited for dynamic environments, providing auto-discovery, dynamic instrumentation, and flexible querying capabilities.

Standardization and Interoperability

OpenTelemetry promotes standardization and interoperability by providing consistent instrumentation and telemetry formats across different programming languages, frameworks, and observability tools. By integrating Prometheus and OTel, you can leverage a common telemetry format and instrumentation library, facilitating data sharing and analysis across the ecosystem.

Scalability and Extensibility

Prometheus and OTel are designed for scalability and extensibility, making them suitable for monitoring large-scale, distributed systems running on Kubernetes. You can scale your monitoring infrastructure horizontally to handle increased workload and integrate additional observability tools and services as needed.




Want to jump ahead to install? Wait no more!
Installation guide ⏭️

Introduction to the MDAI Cluster

What the MyDecisive.ai (MDAI) Cluster is...

  • An Open-Source, Cost-control tool for Telemetry Pipelines

  • A consolidated OpenTelemetry configuration and management tool

  • A self-monitoring container for all infrastructure

  • A pipeline analysis tool

What the MDAI Cluster is not...

  • A proprietary Observability vendor/platform

  • A standalone monitoring platform

  • A replacement for comprehensive monitoring solutions

  • A managed service

  • A multi-cloud supported toolkit (AWS Only)

Why use the MDAI Cluster...

Cost savings

Why send your data when you don't have to? The MDAI Cluster prevents data from being emitted to destinations you've configured as “muted”.

Control

Enables users to take control of reducing telemetry flow in their pipelines.

How to use our docs...

Contributing

Want to help build the future of Telemetry Pipelines? Check out our Contribution Page to find out how to help.

Who we want to help...

  • DevOps Managers
  • DevOps Engineers

Alpha Caveats

Warning: This codebase is experimental and in active development. Proceed with caution and use at your own risk. Please review our list of expectations and limitations for this release.

License

MyDecisive.ai MDAI Cluster 2024 by DecisiveAI, Inc. is licensed under Apache 2.




Want to jump ahead to install? Wait no more!
Installation guide ⏭️

⚠️ Alpha Disclaimer

This open-source project is currently under construction and is subject to frequent changes and updates. For more information about our alpha release, see our Alpha Disclaimer.


Supported Installation Methods

There are currently two supported methods for installing an MDAI Cluster:

  1. Local - on your machine
  2. AWS - in your AWS account

Local Installation

Want to try the magic locally before you install an MDAI Cluster in the cloud?

Check out our Quick Start Guide to learn how to test the MyDecisive MDAI Cluster in a local environment.

AWS Installation

Ready to visualize your observability pipelines and identify your top cost drivers?

Install an MDAI Cluster in AWS.

Additional installation methods will be supported in the future.


💻 Go to Local Install ⏭️

☁️ Go to AWS Install ⏭️

Quick Start - Introduction


Using our local installation method, we hope to...

  1. Help you install our infrastructure in less than 5 minutes.
  2. Validate, locally, that the MDAI Cluster is a great tool for your telemetry management needs.
  3. Fall in love with us and our solution before you install in the cloud.

Getting Started

To begin visualizing and managing your telemetry pipelines using an MDAI Cluster, you will need to complete the following steps:

  1. 🛠️ Configure it
  2. 🚀 Deploy it (locally)
  3. 📈 Send telemetry to it
  4. 🔑 Access it
  5. ✅ Verify it!


Ready... Set...
🏃 Let's get started! 🏃‍♀️

Next Step: Prerequisites ⏩

Prerequisites


System Dependencies

Currently, our documentation is tailored for users operating on Linux or macOS. It also assumes the following:

  • You have 🍺 homebrew installed.
  • You're using bash, though you can modify these configurations and commands as needed for your preferred command line interface.
  • If you're using macOS, you also have the Xcode Command Line Tools installed (see the quick check after this list).
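A quick sanity check for the items above:

# Homebrew present?
brew --version

# Xcode Command Line Tools present? (prints the install path if they are)
xcode-select -p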


Note on user requirements

We aim to make our solutions accessible to everyone. However, we recommend having an intermediate understanding of OpenTelemetry and Kubernetes, or a strong determination, before proceeding with this installation.

Here are a few great resources for ramping up on these subjects:



⏪ Back to Quick Start Next Step: Local Installation ⏩

Installation


There are three options for installation:

  • Semi-automatic Install - I want to install all system dependencies myself to be aware of and manage the MDAI requirements alongside my other software requirements.

  • Automated Install - I want the fastest method for installation possible, so I can skip ahead to testing out the MDAI Cluster! I'm not worried about how the system dependencies impact my other development environments.

  • CLI Install - Our brand new recommended method for installing and managing MDAI clusters locally combines the benefits of the first two methods and has fewer dependencies.

Choose your adventure:

Option 1: Semi-Automatic Install ⏩

Option 2: Automated Install ⏭️

Option 3: CLI Install ⏭️

🎵 Here's some mood music 🎵



⏪ Back to Prerequisites Next Step: Semi-automatic Install ⏩

Semi-Automated installation


Check us out!

First things first, you have to check out our MDAI infrastructure installation repo.

git clone https://github.com/DecisiveAI/mdai-inkops.git

Prerequisites

As mentioned, the manual steps in this install method exist so that you have full visibility into, and control of, your system dependencies (one hedged way to install them is sketched after the list below).

  • Install go and ensure GOBIN is set, for example:
export GOBIN=${GOBIN:-$(go env GOPATH)/bin}
  • Install node
  • Install docker
  • Install helm
  • Install kind for local cluster management using docker containers
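For example, one hedged way to pull these in with Homebrew (Docker Desktop ships as a cask; go, node, helm, and kind are regular formulae):

brew install go node helm kind
brew install --cask docker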

Install Commands

make -f ./make/Makefile-local-recipes create-mdai-semi-auto

Note: Once the MDAI Cluster is installed, your k8s context will be switched automatically to the new cluster.

Troubleshooting

You may run into an error running this command related to the Docker daemon. Please see our Troubleshooting Guide to see if this issue is relevant to you and how to resolve it.


⏪ Back to Install Next Step: Configure ⏩

Automated installation


Check us out!

First things first, you have to check out our MDAI infrastructure installation repo.

git clone https://github.com/DecisiveAI/mdai-inkops.git

Prerequisites

Our automated installation process sets up all of the required dependencies, including:

  • Docker
  • Kind cluster
  • Node
  • Go
  • Helm (we support Helm v3.13.0 and later)

Here are the installation steps:

  • Install kind for local cluster management using docker containers
  • Run automated installation script
make -f ./make/Makefile-local-recipes create-mdai

Note: Once the MDAI Cluster is installed, your k8s context will be switched automatically to the new cluster.

Troubleshooting

You may run into an error running this command related to the Docker daemon. Please see our Troubleshooting Guide to see if this issue is relevant to you and how to resolve it.


⏪ Back to Semi-Automated Install Next Step: Configure ⏩

Install and manage local MDAI cluster with CLI

Our CLI will help you install and manage MDAI clusters locally.

Prerequisite

Since we use a kind cluster, you must have Docker installed.

Download and install MDAI CLI

Install prebuilt binary via shell script

This command will download and install release v0.1.0. You can check for and download the latest release here.

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/decisiveai/mdai-cli/releases/download/v0.1.0/mdai-installer.sh | sh

Install via Homebrew

To install prebuilt binaries via Homebrew, run:

brew install decisiveai/tap/mdai

Install MDAI cluster

To install the MDAI Cluster into a local kind cluster, run:

mdai install

It will take a couple of minutes to spin up your cluster and get all k8s pods up and running.

cli-install.png

You can check pod status by running

kubectl get pods --all-namespaces

Which should look like:

img.png

Manage MDAI cluster

Our CLI has a sample cluster configuration built into the installation process.
It starts two replicas of the "gateway" OTel collector.
Let's check the OTel collector configuration by running:

mdai get --config otel

For a full reference of CLI commands, check here or run:

mdai --help

⏪ Back to Automated installation Next Step: Verify ⏩

⏲️ Time to Configure OpenTelemetry! 📈

Warning: ⚠️ This configuration method is deprecated ⚠️
New documentation on how to use the mdai operator configuration instead is coming soon.

Let's update your MDAI Cluster's OTel configuration file.

🤔 You've got a couple options...

Option 1 - 👢 Bootstrap our built-in Collector Configuration

We have provided a default configuration for a simplified version of the OpenTelemetry collector at values/otelcol-config.yaml. Using our boilerplate can accelerate your deployment and configuration using OpenTelemetry.

MOARRR CHOICES!

You can...

  1. Use the config as is to accelerate through configuration
  2. Make modifications right now that better fit your pipeline needs

Learn more about the specifications for the OTEL Collector to make the best decisions for your telemetry pipeline configuration needs.

Option 2 - 🧳 Bring Your Own Configuration

Want to use a custom or tried-and-true OTel configuration file that already works for you?

Simply update the configuration file at values/otel-config.yaml.

Apply change

After you've configured your OTel pipelines to your heart's desire, you can apply your changes by running:

# Update the config file within the cluster directory; it will be applied to your cluster instance automatically via a k8s operator.
kubectl apply -f values/otel-config.yaml

⏪ Back to Install Next Step: Verify ⏩

Verify local MDAI Cluster


Configure MDAI Console (UI) to use localhost

Find your MDAI Console pod

One of the pods installed contains a web app. Let's list all cluster pods to find your pod's name.

Note: look for the pod whose name starts with mdai-console-<hash_id>

kubectl get pods

Enable port forwarding from cluster to localhost

# Example: kubectl port-forward mdai-console-abcd123-xyz1 5173:5173
kubectl port-forward <POD_NAME> <PORT>:<PORT>

View Console on your desired port

You should now be able to view the MDAI Console at your newly configured port; in our case, we used :5173.

🐙🎉 View yours at http://localhost:<configured_port>
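If you'd like a quick sanity check from the terminal first, this hedged one-liner (assuming you forwarded port 5173) should return an HTTP response from the Console:

curl -I http://localhost:5173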

A bright and shiny MDAI Cluster Console


⏪ Back to Configure Next Step: Generate & Collect Telemetry ⏩

Validate data flow


  1. View the local MDAI console
  2. As telemetry flows through the MDAI Cluster, you will see counts increase in the console, color-coded by telemetry type. 🐙🎉

The MDAI Cluster Console showing data flows

Note: Data flowing to debug exporters is not counted towards data flow totals in the right sidebar.


⏪ Back to Generate & Collect Telemetry Next Step: AutoScaling ⏩

MDAI Cluster Life Cycle Management

Overview

MDAI Cluster lifecycle management involves post-creation operations to control the behavior and state of the components of each MDAI Cluster or the MDAI Cluster as a whole.

The common use cases for lifecycle management are highlighted below...

  1. Disable MDAI Cluster Ingress - Disable ingress to validate that data flow has stopped.
  2. Enable MDAI Cluster Ingress - Enable ingress to validate that data flow has been re-instantiated.
  3. Uninstall MDAI Cluster - Destroy all artifacts associated with the MDAI Cluster.

🚨 Important considerations for Life Cycle Management 🚨

  • Ensure that you have the necessary permissions and access rights to perform these operations.
  • Always review and test configurations before applying changes to avoid unintended consequences.
  • Use caution when destroying a cluster, as it will permanently delete all associated resources.

⏪ Back to: Validate Next Step: Disable the MDAI Cluster ⏩

Disable MDAI Cluster Ingress

You may need to temporarily halt incoming traffic to perform maintenance or troubleshoot issues without impacting data flow.

How to Disable...

Manual

  • Set replica count to 0 in values/params-values-otel.yaml:
spec:
  # 2 replicas recommended
  replicas: 0
  • Apply config to cluster

kubectl apply -f values/params-values-otel.yaml

Automated

Check back soon, we're working on an automated flow for disabling your MDAI Cluster.



⏪ Back to: Lifecycle Management Overview Next Step: Enable MDAI Cluster ⏩

Enable MDAI Cluster Ingress

After completing any tasks related to disabling your MDAI Cluster instance (i.e., maintenance or MDAI Cluster data flow resolution), you likely want to resume ingress and ensure that data flow resumes smoothly.

How to Enable...

Manual

  • Set replica count to 2 in values/params-values-otel.yaml:
spec:
  # 2 replicas recommended
  replicas: 2
  • Apply config to cluster

kubectl apply -f values/params-values-otel.yaml

Automated

Check back soon, we're working on an automated flow for disabling your MDAI Cluster.



⏪ Back to: Disable MDAI Cluster Next Step: Uninstall MDAI Cluster ⏩

Uninstall the MDAI Cluster


Tired of using the MDAI Cluster locally? 😭 We're sorry to see you go, but we understand.

If you have feedback for us, please let us know why!

  1. Support team email
  2. Feedback form

Automated uninstall

When decommissioning a cluster or performing a clean-up operation, you need to delete all resources and configurations. Follow the steps below to uninstall your local MDAI Cluster.

Ensure you have the correct k8s context selected.

Make sure your k8s context is set to kind-mdai-local cluster:

kubectl config get-contexts

Switch the context if needed:

kubectl config use-context kind-mdai-local

Uninstall Cluster

Run the automated uninstall script:

 make -f ./make/Makefile-local-recipes delete-mdai

Optional: Uninstall all helm artifacts

If you want to remove all installed helm artifacts (i.e., you don't use them in your other local setups), run the following:

 make -f ./make/Makefile-local-recipes delete-mdai-all

Data Persistence - Prometheus

Currently, there is no data persistence for Prometheus. If you destroy your MDAI Cluster, you'll also be destroying your prometheus instance and all associated data. 🪦

Until MDAI is able to support a more robust persistence layer or snapshot capability, we recommend using a remote storage option for your persistence needs.
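If you go that route, here's a minimal, hedged sketch of a Prometheus remote_write block; the URL is a placeholder for whatever remote storage backend you choose:

remote_write:
  - url: https://remote-storage.example.com/api/v1/write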



⏪ Back to: Enable MDAI Cluster Next Step: Congrats ⏩

🎉 🐙 🎉 You did it! 🎉 🐙 🎉


That's all for now!

We hope you've enjoyed using the MDAI Cluster for visibility into your o11y costs and taking telemetry pipeline management back into your own hands.

📚 Learn more!

Are you interested in learning about advanced configuration options?

Do you have questions, concerns, feedback?

Please give us your unfiltered thoughts!


⏪ Back to: Installation Next up: MDAI Console Usage ⏭️

🪄 Introduction 🪄


We hope that deploying an MDAI Cluster to AWS will help you to:

  1. ☁️ Manage your MDAI Cluster in the cloud.
  2. 🔐 Enable access to your MDAI Cluster via your AWS account settings, whether it's for multiple users or just yourself.
  3. 🌎 Use DNS to connect your custom domain to any MDAI Cluster entry-points.
  4. 💵 Visualize your pipeline data flows, giving you instant insights into your telemetry cost drivers and the net impact of your defined data processors.

Future versions of the MDAI Cluster will be compatible with other cloud service providers.

Getting Started

To begin visualizing your telemetry pipelines using an MDAI Cluster, you will need to complete the following steps:

  1. 🛠️ Configure it
  2. 🚀 Deploy it
  3. 📈 Send telemetry to it
  4. 🔑 Access it
  5. ✅ Verify it!


Ready... Set...
🏃 Let's get started! 🏃‍♀️

Next Step: Prerequisites ⏩

AWS Install Prerequisites


System Requirements

Ensure your development environment has access to all of the following software and settings.

Note: This page assumes you’re using bash, but you can modify these configurations and commands as needed for your preferred command line interface.

  • Install Go (1.20 or higher) from source, or use homebrew: brew install go
  • Ensure the GOBIN environment variable is set; if unset, initialize it appropriately, for example:
export GOBIN=${GOBIN:-$(go env GOPATH)/bin}

Note: We install Helm automatically; however, we only support Helm v3.13.0 and later. If you already have an earlier version of helm installed, we cannot guarantee our install process will work as expected.
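A quick way to check which Helm is already on your PATH (output format may vary by version):

helm version --short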

AWS Requirements

User Requirements

We never want to exclude anyone from using our solutions, but we do encourage you to obtain an intermediate knowledge of AWS (or a ton of grit) before continuing with this install. Here's a great AWS Training resource!



⏪ Back to: Getting Started Next Step: Setup AWS CLI ⏩

Programmatic Access for the AWS CLI


To deploy an MDAI Cluster, we use the AWS CLI for access to and control of the various AWS tech we leverage.

Prerequisites

Before you install a cluster, you will need to Create an AWS IAM Admin Account.

Setup the AWS CLI

We've found the Getting Started with the AWS CLI guide extremely helpful in setting up everything from prerequisites to programmatic access via your desired terminal tools.

There are two methods we recommend for programmatic access to your AWS resources via the AWS CLI, though additional options can be found here.

The AWS SSO option is more secure and generally recommended by AWS and AWS Practitioners.

Configure

Option 1: AWS SSO Config

Login

# Assumes your machine is already configured for access to AWS via `aws configure sso`.
aws sso login --profile <AWS_PROFILE>

Option 2: Short-term credentials

The AWS Short-term credentials option is lightweight and simple to set up with a few commands.

AWS Short-term Credentials

After configuration, you should be able to run AWS CLI commands.
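For example, a quick, hedged check that the CLI can reach your account with the profile you configured:

# Should print your account ID, user ID, and ARN
aws sts get-caller-identity --profile <AWS_PROFILE>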



🐙 Check us out!


🎵 Here's some mood music 🎵

Clone the MDAI InkOps Toolkit

Visit our GitHub repo to access our MDAI InkOps Toolkit

git clone git@github.com:DecisiveAI/mdai-inkops.git

Navigate to the directory where you just installed our toolkit.

# go to directory for the mdai-inkops toolkit
cd mdai-inkops

⏪ Back to: Configure AWS CLI Next Step: Configure AWS Env ⏩

Let's update your AWS Environment Configs


Navigate to the values/aws.env file and start inputting the environment configuration that's relevant to your AWS account and infrastructure configurations.

The AWS Basics

These fields are essential for AWS account access and resource control in AWS. These settings guarantee the MDAI Cluster will be installed in your desired account and region, and created with a user profile that has the correct permissions to create infrastructure. This is especially useful if these values differ from the default configurations in your AWS SSO/auth settings.

| Key | Example Value | Description |
| --- | --- | --- |
| AWS_REGION | us-east-1 | AWS Region to be used for MDAI cluster creation |
| AWS_ACCOUNT | 123123456 | AWS Account to be used for MDAI cluster creation |
| AWS_PROFILE | Admin-123123456 | AWS Profile to be used for MDAI cluster creation |

MDAI Cluster - Helpful Contexts

| Key | Example Value | Description |
| --- | --- | --- |
| MDAI_CLUSTER_NAME | my-mdai-cluster | A friendly name for your cluster |
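For orientation, a hedged example of what these basics might look like once filled in to values/aws.env (values taken from the example tables above, not real account data):

AWS_REGION=us-east-1
AWS_ACCOUNT=123123456
AWS_PROFILE=Admin-123123456
MDAI_CLUSTER_NAME=my-mdai-cluster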

READONLY: EC2 and EKS Configurations

These values should be considered READONLY.

The values can technically be changed; however, this is the minimum required configuration for running the MDAI infrastructure with performance and cost in mind. You can adjust the values to fit your needs, but note that it's at your own risk. We have not tested beyond this configuration, but plan to in the near future.

| Key | READONLY Value | Description |
| --- | --- | --- |
| MDAI_EC2_INSTANCE_CLASS | t2 | EC2 Class used for nodes in the EKS cluster |
| MDAI_EC2_INSTANCE_SIZE | micro | EC2 Size used for nodes in the EKS cluster |
| MDAI_CLUSTER_CAPACITY | 12 | Number of cluster EC2 nodes |

Opt-in/out: MDAI Console UI Security

This is an opt-in/out feature that enables security via authentication on the Console UI app. By default, your MDAI Console application will be exposed to the open internet for anyone to access.

Post-configuration, you will be presented with an option to configure more advanced authentication options in our AWS Authentication for your MDAI Console guide.

If you choose to remain opted out, ignore the fields MDAI_UI_HOSTNAME and MDAI_UI_USER_POOL_DOMAIN.

| Key | Example Value | Description |
| --- | --- | --- |
| COGNITO | false (default) | Opt-in to include Cognito in your cluster infrastructure |
| MDAI_UI_HOSTNAME | mdai.yourdomain.dev | A required field to map your MDAI Console UI URL to your custom domain via a CNAME record. This hostname will also be used during the SSL Cert creation for this flow. |
| MDAI_UI_USER_POOL_DOMAIN | yourdomain | The domain name Cognito uses to bind your domain to, e.g., yourdomain.com would use yourdomain here (https://yourdomain.auth.AWS_REGION.amazoncognito.com). This setting guarantees your domain will have a TLS (SSL) certificate auto-generated and auto-discovered during install. |
Due to our MDAI Cluster's Load Balancer capabilities, alongside security measures, we're only able to support full MDAI Cluster compatibility with a limited list of supported AWS Regions.
Should you choose to opt in to Cognito, please be sure the AWS_REGION you selected is one of the supported regions.

Opt-in/out: Node AutoScaling

The default is true. If you're unsure whether you want this, please change it to false.

Karpenter provides dynamic and efficient scaling of your Kubernetes clusters. It automatically adjusts the number of nodes based on the current workload, ensuring optimal resource utilization and cost efficiency.

| Key | Example Value | Description |
| --- | --- | --- |
| KARPENTER | true (default) | An opt-in/out feature that enables autoscaling on the EC2 node sizes you have selected. |

If you opt in to this feature, we handle the configuration under the hood for you. You can read more about this configuration in our Node AutoScaling Guide.


⏪ Back to: Check Us Out! Next Step: OTel config ⏩

⏲️ Time for OpenTelemetry! 📈

Let's update your MDAI Cluster's OTel configuration file.

🤔 You've got a couple options...

Option 1 - 👢 Bootstrap our built-in Collector Configuration

We have provided a default configuration for a simplified version of the OpenTelemetry collector at values/otelcol-config.yaml. Using our boilerplate can accelerate your deployment and configuration using OpenTelemetry.

MOARRR CHOICES!

You can...

  1. Use the config as is and update it at will later
  2. Start making modifications right now

Learn more about the specifications for the OTEL Collector to make the best decisions for your telemetry pipeline configuration needs.

Option 2 - 🧳 Bring Your Own Configuration

Want to use a custom or tried-and-true OTel configuration file that already works for you?

Simply update the configuration file at templates/mdai-operator.yaml.


🔒 Securing your MDAI Cluster with Certs!


🙆‍♀️ Let's take a quick stretch break..

You might be feeling like this right now "🙄 ... more configs?!?"

😬 We know, we know.. so many options.

Trust us, we're truly here to help ease the pain of configuration and security for your MDAI Cluster. Hang with us, we're so close to done!

🙆🏽‍♂️ Okay.. one more stretch.. that's better.. back to it!

Step 1: Assigning SSL Certificates to your ingress endpoints

Option 1: 🚜 AutoGen a Cert

Let us help! We created a simple command that not only autogenerates a cert, but also uploads it to your AWS account based on the aws.env values you updated earlier.

make cert

🎉 CERT GENERATION COMPLETE! 🎉

There are no further steps for this option. You can move on to applying your configurations.

Next Step: Applying configuration ⏩

Option 2: 🧳 Bring Your Own Cert

Once you have your certificate in AWS ACM, it will be accessible via its Amazon Resource Name (ARN). This ARN needs to be provided as a configuration parameter during the configuration phase, so make note of it. ACM ARN

🚨 ‼️ NOTE: If you BYO cert, please copy the cert ARN after you import it to ACM! ‼️ 🚨

Manual step: Add Cert ARN to your OTel config

🪰 Remember that ARN we just pestered you about above? 🪰

Let's add it to the templates/mdai-operator.yaml file as the value for the following keys (a sketch follows below):

  • service.beta.kubernetes.io/aws-load-balancer-ssl-cert

K8 ssl certificate

  • alb.ingress.kubernetes.io/certificate-arn

K8 ssl certificate
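Putting it together, a hedged sketch of how those annotations might look inside templates/mdai-operator.yaml; the exact placement depends on the file's structure, and the ARN below is a placeholder:

metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:<AWS_REGION>:<AWS_ACCOUNT>:certificate/<CERT_ID>
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:<AWS_REGION>:<AWS_ACCOUNT>:certificate/<CERT_ID>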


⏪ Back to: OTel Configuration Next Step: Applying configuration ⏩

🔧 Apply configuration to the MDAI Cluster


We're now going to run a configuration script that takes all those configs you've worked so hard on and applies them to the deployment scripts.

🐾 Go ahead.. do it.. I triple dog dare ya! 🐾

make config



🧐... Wait, that's it? What the...

🚀 It's time! Let's deploy 🚀


⏪ Back to: Adding Certs Next Step: Deploy your MDAI Cluster! ⏩

Deploying to AWS


🎶 Time for more theme music 🎶

💪 Congratulations! You're about to provision an MDAI Cluster! 💪

IMPORTANT! After deployment, the MDAI Cluster will be active and running in AWS. The current version of the MDAI Cluster was designed to be as cost-effective as possible, however you should still carefully monitor your spend and manage your MDAI Cluster capacity accordingly. If the MDAI Cluster is not actively being used for testing or managing telemetry pipelines, you may wish to shut it down to minimize your spend.

We hope to have benchmarks soon for cost and performance of our MDAI Clusters.

Deploy the MDAI Cluster

make install
Don't walk away just yet!

There's one more manual verification step required before CloudFormation will deploy all necessary infrastructure to get your MDAI Cluster up and running as soon as possible.

The install workflow

  1. The install will check and bootstrap the CDK Toolkit if it's not present for your configured region. bootstrap.png

  2. The CDK will output the detected changes and ask you to accept or reject the changes. Review all output carefully before proceeding! stack-details.png

  3. Follow the progress of the stack deployment process through the terminal interface. stack-details.png or AWS Console -> CloudFormation. CFN Status

  4. The installation process will add a new context to your kubeconfig. You can switch contexts by running: kubectl config use-context <desired_context> Detailed output will be stored in cdk-output.json.

A few notes while your stack is being provisioned...

  1. 🍿 Grab some popcorn! 🍿 Average install time is ~20-30 minutes.

  2. 👀 Don't want to monitor your terminal? Check out your stack's provisioning status in the AWS Console.

<!-- Change AWS_REGION to the region you deployed to -->
https://AWS_REGION.console.aws.amazon.com/cloudformation/home?region=AWS_REGION#/stacks?filteringText=&filteringStatus=active&viewNested=true
  3. 🏛️ View your stack's architecture in the CloudFormation Designer tool.

To find your stack ARN:

CFN Stack ARN

<!-- Change AWS_REGION to the region you deployed to -->
<!-- Change STACK_ARN to your stack's ARN -- shown above  -->
https://AWS_REGION.console.aws.amazon.com/cloudformation/designer/home?region=AWS_REGION&stackId=<STACK_ARN>#

Want to see how the MDAI Cluster runs locally?

🛠️ Build a local version while you're waiting! QuickStart


⏪ Back to: Configure Your MDAI Cluster Next Step: Verify Installation ⏩

Verify installation


Verify the MDAI Cluster via kubectl

Ensure your cluster is up and running.

Verify namespaces

List out all namespaces

kubectl get ns

Expected output

Verify namespaces

Verify pods

List out all pods in the default namespace.

kubectl get pods

Expected output

Your output for default configuration should be similar to:

Verify pods

Verify the Console is up and running

Step 1: Navigate to your AWS EC2 Load Balancer in the AWS Console

Note: Replace AWS_REGION with the region you deployed to.

https://AWS_REGION.console.aws.amazon.com/ec2/home?region=AWS_REGION#LoadBalancers

Step 2: Find the LB where the name is mdai-console

LB List

Step 3: Copy DNS Name from mdai-console LB

LB DNS Name

OPTIONAL: Use CNAME Record for Console Access

If you'd prefer to use a custom domain to access your MDAI Console (e.g., your-console.yourdomain.com), you can copy the Console Load Balancer endpoint and create a CNAME record with your host provider (e.g., GoDaddy).

Once updated, you'll need to wait for the DNS mapping to resolve.

You may also need to add an SSL Cert to create a secure connection to your app using this mechanism.
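Once the DNS change has propagated, a quick hedged way to confirm the mapping (assuming your-console.yourdomain.com is the hostname you created):

# Should print the mdai-console Load Balancer DNS name
dig +short your-console.yourdomain.com CNAME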

Step 4: Navigate to the URL and view the console

Console Data Flow


⏪ Back to: Deploy Next Step: Validate ⏩

Validate data flow

  1. From the last step in Verify, you should've found your Console UI Load Balancer link. Please visit that URL to view your pipeline configuration and data flow.
  2. As telemetry flows through the MDAI Cluster, you will see counts increase in the console, color-coded by telemetry type. 🐙🎉

The MDAI Cluster Console showing pipeline composition and data flow

Note: Data flowing to debug exporters is not counted towards data flow totals in the right sidebar.


⏪ Back to: Verify Next Step: UI Auth Options ⏩

Options for AWS Authentication for your MDAI Console

This documentation outlines how to configure AWS access for the MDAI Console. We will cover three scenarios:

🌐 Public access (no authentication)

[Pre-Installation] - Fastest, least secure

Note: You should've already explored this option pre-install.

You just need to spin this darn cluster up and don't mind your Console being open to any user on the internet.

🔒 Cognito

[Post-Installation] - Secure, slightly complex

AWS Cognito provides a scalable and secure authentication mechanism for your application. You get a login interface and can connect it to your custom domain.

🥷 VPC-based access

[Post-Installation] - Most secure, most complex

This is the most secure option as you have the most control over inbound traffic/access to the cluster.


Choose your adventure...

Please select from the options above you'd like to use for auth for your MDAI Console.

🌐 Public access (no authentication)

🔒 Cognito

🥷 VPC-based access

🌐 Public access (no authentication)


Long hair, don't care? Just want to test us out without auth on your Console? NO PROBLEM!

During setup you have the option to opt out of the default Cognito-based auth solution by setting the flag COGNITO=false in the aws.env file. This will prevent any auth-related resources from being installed in your infrastructure in the future.

# An opt-in/out feature that enables auth on the Console UI app.
# Default is true. If you're unsure whether you want this, please change it to false.
# If you select false, ignore fields MDAI_UI_HOSTNAME and MDAI_UI_USER_POOL_DOMAIN
COGNITO=true

# Hostname to be used for the Console endpoint (e.g., eu-west-2.yourdomain.dev)
MDAI_UI_HOSTNAME=

# The domain name to tie your auth to, e.g., 'example.com' would use 'example' here (e.g., yourdomain)
MDAI_UI_USER_POOL_DOMAIN=

Some bad news...

Currently, we don't have steps to retroactively apply Cognito if you change your mind, but with your feedback to support@mydecisive.ai or by opening an issue, we'll prioritize these post-install flows for you!


⏪ Back to: Validate Installation Next Step: Lifecycle Overview ⏩

🔒 Cognito


We knew it wouldn't be acceptable to leave your precious new resources unprotected and open to any user of the internet!

...Enter Cognito...

We've added the ability to connect Amazon Cognito easily to your new MDAI Cluster to limit access for undesired users.

Cognito is not supported in all AWS Regions. If you run into issues post-install, this is likely the reason. A typical error you would see looks like this:
Failed deploy model due to failed to create listener rule: ValidationError: Action type 'authenticate-cognito' must be one of 'redirect,fixed-response,forward,authenticate-oidc' status code: 400, request id: ########-####-####-####-############

The above error can be found in the AWS Console > EKS service.

You can access the ingress output manually:
EKS > Clusters > [YOUR_CLUSTER] > Resources > Service and networking > Ingresses > ui-alb-ingress

You can access the ingress output by hitting this link after updating the values in brackets:
https://[AWS_REGION].console.aws.amazon.com/eks/home?region=[AWS_REGION]#/clusters/[YOUR_CLUSTER]/ingresses/ui-alb-ingress?namespace=default

Our installation includes the following resources upon creating an MDAI Cluster:

  1. User pool mdai-user-pool
  2. App client mdai-app-client

⚙️ Pre-built configuration

The following environment variables need to be defined in values/aws.env during the pre-build configuration step:

MDAI_UI_HOSTNAME This is the hostname for the UI endpoint, which will be used as the main UI hostname. It should directly relate to a DNS CNAME record pointing to the DNS hostname generated by AWS for the mdai-console Load Balancer endpoint.

MDAI_UI_USER_POOL_DOMAIN This variable contains the prefix for the user pool domain. An Amazon Cognito domain is used with this installation, so this variable represents the prefix for that domain. This domain name will become part of the auth redirect URL.

🧍User(s) creation

The only manual step required is user creation.

Option 1: Via the AWS Console UI

Please follow the Amazon Cognito User Pool documentation steps to create user(s) in mdai-user-pool.

Option 2: Via the AWS CLI

If you prefer to use the command line, this quick command will find the user group created as a result of the MDAI stack and perform the user create operations.

for up_id in $(aws cognito-idp list-user-pools --max-results 1  --region <your-region> --profile <your-profile> --query 'UserPools[?Name==`mdai-user-pool`].[Id]' --output text); do
    echo "Creaded user ${up_id}"
    aws cognito-idp admin-create-user --user-pool-id $up_id --username user@yourdomain.com --temporary-password P@55w0rd_fun --region <your-region> --profile <your-profile>
done


⏪ Back to: Validate Installation Next Step: Lifecycle Overview ⏩

🥷 VPC Access


For secure access within a VPC, ensure your Kubernetes cluster and app are configured to operate within the VPC.

Configuring VPC for EKS

1. Create EKS Cluster within VPC

During install you should have created an EKS cluster. You should be able to find your cluster name and region in the aws.env file used for cluster installation. You can use the following link to navigate to your cluster Networking settings and view the relevant subnets.

https://<your-region>.console.aws.amazon.com/eks/home?region=<your-region>#/clusters/<your-cluster-name>?selectedTab=cluster-networking-tab

Cluster > Networking tab

Note the subnets and security groups are created for you.

2. Configure Security Groups

Set up the correct security group rules for your desired level of security.

VPC Peering (if necessary):

If the app needs to communicate with resources in another VPC, you might consider

  1. Setting up VPC peering
  2. Setting up route tables

⏪ Back to: Validate Installation Next Step: Lifecycle Overview ⏩

MDAI Cluster Life Cycle Management

Overview

MDAI Cluster lifecycle management involves post-creation operations to control the behavior and state of the components of each MDAI Cluster or the MDAI Cluster as a whole.

The common use cases for lifecycle management are highlighted below...

  1. Disable MDAI Cluster Ingress - Disable ingress to validate that data flow has stopped.
  2. Enable MDAI Cluster Ingress - Enable ingress to validate that data flow has been re-instantiated.
  3. Destroy MDAI Cluster - Destroy all artifacts associated with the MDAI Cluster.

🚨 Important considerations for Life Cycle Management 🚨

  • Ensure that you have the necessary permissions and access rights to perform these operations.
  • Always review and test configurations before applying changes to avoid unintended consequences.
  • Use caution when destroying a cluster, as it will permanently delete all associated resources.

Disable MDAI Cluster Ingress

You may need to temporarily halt incoming traffic to perform maintenance or troubleshoot issues without impacting data flow.

How to Disable

Manual

  • Set replica count to 0 in values/params-values-otel.yaml:
spec:
  # 2 replicas recommended
  replicas: 0
  • Apply config to cluster

kubectl apply -f values/params-values-otel.yaml


⏪ Back to: UI Auth (Cognito) Next Step: Enable MDAI Cluster ⏩

Enable MDAI Cluster Ingress

After completing any tasks related to disabling your MDAI Cluster instance (i.e., maintenance or MDAI Cluster data flow resolution), you likely want to resume ingress and ensure that data flow resumes smoothly.

How to Enable...

Manual

  • Set replica count to 2 in values/params-values-otel.yaml:
spec:
  # 2 replicas recommended
  replicas: 2
  • Apply config to cluster

kubectl apply -f values/params-values-otel.yaml

Automated

Check back soon, we're working on an automated flow for disabling your MDAI Cluster.



⏪ Back to: Disable MDAI Cluster Next Step: Destroy MDAI Cluster ⏩

Uninstall the MDAI Cluster


Tired of using the MDAI Cluster? 😭 We're sorry to see you go, but we understand.

Feedback - Where are the rough edges?

We'd love to hear your thoughts! Please let us know how we can improve by:

🧨 Time to destroy

Due to AWS CDK limitations, several additional steps are required to fully remove all MDAI Cluster resources from AWS. Follow the steps below, in order, to destroy your MDAI Cluster and all associated resources.

Remove the helm chart for the console

helm uninstall mdai-console

Scripts for destroying MDAI Cluster stack

Step 1: Allow scripts to be executed

You may need to make the destroy scripts executable if you don't already have permission to execute them:

chmod +x ./make/scripts/delete-load-balancers.sh
chmod +x ./make/scripts/delete-stack.sh
chmod +x ./make/scripts/delete-network-artifacts.sh

Step 2: Ensure you have your env variable available

Upon MDAI Cluster creation, you should have a .env file full of variables. It will look something like the example below, but with values populated.

MDAI_UI_ACM_ARN=
AWS_REGION=
AWS_ACCOUNT=
AWS_PROFILE=
MDAI_EC2_INSTANCE_CLASS=
MDAI_EC2_INSTANCE_SIZE=
MDAI_CLUSTER_CAPACITY=
MDAI_CLUSTER_NAME=
COGNITO=
MDAI_UI_HOSTNAME=
MDAI_UI_USER_POOL_DOMAIN=
AWS_SSO_ROLE=
Our scripts will use these environment variables, so be sure to double-check that you're running them against the correct environment.

Step 3: Delete all your load balancers

This step is instantaneous and you shouldn't have to wait at all for changes to take effect.

./make/scripts/delete-load-balancers.sh

Step 4: Delete your stack

Grab your popcorn! 🍿 This step takes anywhere from 15-20 minutes, as you'll be deleting all the artifacts created for the cluster during installation.

Note: The destroy process will run for a while and may, and likely will, return an error due to some resources having dependencies and being deleted out of order. Once the delete fails, please proceed to steps 5 & 6.
# Delete the stack
./make/scripts/delete-stack.sh

You should be able to follow the delete progress in one of two ways...

  1. Via the AWS Console UI (replace AWS_REGION with your desired region)
https://AWS_REGION.console.aws.amazon.com/cloudformation/home?region=AWS_REGION#/stacks?filteringText=&filteringStatus=active&viewNested=true
  2. Via the CLI
aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" \
    --region "$REGION" \
    --profile "$PROFILE" \
    --query "Stacks[0].StackStatus" \
    --output text

🎵 Here's some mood music while you wait. 🎵

Step 5: Delete network artifacts

One of the main reasons your stack fails to delete is the networking layer of your cluster. This script should help delete the artifacts and resources related to networking.

./make/scripts/delete-network-artifacts.sh
Note: You may need to run this a couple of times to succeed. However, if the output continues to show artifacts that are not deleting, you can use step 6 to finalize the destroy process.

Step 6: Force delete the stack

./make/scripts/force-delete-stacks.sh

Step 7: Run the destroy process again via AWS Console UI

If all else fails, you should be able to visit your Stack url in the AWS Console UI and retry delete with the force delete option.


⏪ Back to: Enable MDAI Cluster Next Step: Generate Telemetry ⏩

Intro: Generate and Receive Telemetry


What kind of user are you?

TelemetryGen

I don't have any existing agents/collectors that I can use to send telemetry right now.

Start with TelemetryGen now!

Postman & mock MLT payloads

I want to send some payloads using Postman to my MDAI Cluster!

Send sample data with Postman

curl or grpcurl & mock MLT payloads

I want to use my terminal to send some payloads using curl or grpcurl to my MDAI Cluster!

Send sample data via terminal

Real Collectors, Real pipelines

I have existing sources of telemetry I'd love to point to my new MDAI Cluster!

Setup your collectors with OTel pipelines now!




⏪ Back to Installation Next Step: TelemetryGen ⏩

Using TelemetryGen to send data to MDAI Cluster

Why use TelemetryGen?

Using TelemetryGen is a great option if you're not ready to commit to the costs associated with ingress/egress to your cluster. It's all local to the cluster you have just deployed, so there will be no additional charges aside from the compute required to generate and process the telemetry.

Let's begin!

In our example, we'll:

  1. Create or use an existing cronjob or job
  2. Apply the cronjob/job to your cluster
  3. Watch telemetry flow through the console
  4. Destroy the cronjob/job
  5. Eat cake!

Step 1 - Create or use cronjobs

You can create a CronJob (or use one from the /jobs directory).
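If you don't have one handy, here's a hedged sketch of a CronJob that uses the OpenTelemetry telemetrygen tool to push traces to the gateway collector every five minutes. The image tag and the gateway-collector:4317 service address are assumptions; adjust them to match your cluster.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: telemetrygen-traces
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: telemetrygen
              # Assumed image location for the telemetrygen tool
              image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest
              args:
                - traces
                - --otlp-endpoint=gateway-collector:4317   # assumed collector service name
                - --otlp-insecure
                - --duration=30s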

Step 2 - Apply the cronjob to your cluster

kubectl apply -f <cron_filename.yaml>

Step 3 - Validate job is running

Via kubectl

# Get job by name - replace job-name with your cronjob's name
kubectl get cronjob job-name

# Still not enough? Watch your cronjob using the --watch command
kubectl get cronjob --watch

At this point, you should also be able to see data flowing through the MDAI Console to validate data flow.

Step 4 - Destroy the cronjob/job

# Delete job by name - replace job-name with your cronjob's name
kubectl delete cronjob job-name
It is critical that you delete the cronjob once testing is complete, otherwise MDAI Cluster costs will continue to increase as the MDAI Cluster continues to process all incoming telemetry data.

Step 5 - Eat cake 🍰!

You sent your first telemetry record to MDAI!

🎉 That's it! Your MDAI Cluster is now ready to use! 🎉

Want more ways to send your cluster data?

Still curious about the other ways to generate and send telemetry?



⏪ Back to Intro: Generate and Receive Telemetry

Using curl or grpcurl to send data to MDAI Cluster

⚠️ Note: These docs assume you have already installed an MDAI Cluster! ⚠️

Finding and setting HTTP_HOST and GRPC_HOST

Not sure where to get the variables for GRPC_HOST and HTTP_HOST?

Cloud

GRPC_HOST & GRPC PORT

You can find your GRPC_HOST value by running:

kubectl get ingress -n default

Your output should look something like this...

example ingress output

For gateway_ingress, note the HOSTS value. This will be used for your GRPC_HOST

Your GRPC_PORT will always be :443, as this is the secure listener for your gRPC endpoint.

You can also make note of the mdai-console-ingress ADDRESS value to validate requests sent from Postman later.

HTTP_HOST & HTTP_PORT

You can find your HTTP_HOST value by running:

kubectl get svc gateway-collector -n default

Your output should look something like this...

example ingress output

For the gateway_collector service, note the EXTERNAL-IP value. This will be used for your HTTP_HOST

Your HTTP_PORT will always be :4318, as this is the secure listener for your http endpoint.

Local

For local testing, your GRPC_HOST and HTTP_HOST will always be localhost.

Port-forward your ingress endpoints

The ports, however, will need to be forwarded to their respective testing ports. Luckily, there are two pods created during cluster install. You should be able to forward each of them to receive requests, one for http requests and another for gRPC requests.

kubectl get pod

Your output should look like...

example pods output

Make note of your gateway-collector pod names. For simplicity, we will use the standard OTLP ports:

  • 4317 for gRPC requests
  • 4318 for http requests
GRPC_PORT=4317
HTTP_PORT=4318

Now let's forward those ports!

# Example port-forward request
kubectl port-forward <POD_NAME> <PORT>:<PORT>

# gRPC Example

kubectl port-forward gateway-collector-abcd123-xyz1 4317:4317

# HTTP Example

kubectl port-forward gateway-collector-abcd123-1zyx 4318:4318

Set the *_OTLP_ENDPOINT variables in your terminal

HTTP_HOST=<ENTER THIS USING THE OUTCOME FROM ABOVE STEPS>
HTTP_PORT=<ENTER THIS USING THE OUTCOME FROM ABOVE STEPS>
GRPC_HOST=<ENTER THIS USING THE OUTCOME FROM ABOVE STEPS>
GRPC_PORT=<ENTER THIS USING THE OUTCOME FROM ABOVE STEPS>
GRPC_OTLP_ENDPOINT="$GRPC_HOST:$GRPC_PORT"
HTTP_OTLP_ENDPOINT="$HTTP_HOST:$HTTP_PORT"

Start firing off requests

Sending a curl

Send a curl request to the collector (e.g. for Logs):

# execute the request using the data from the example log data file
curl -X POST -H "Content-Type: application/json" -d @./examples/http_service/logs.json -i http://${HTTP_OTLP_ENDPOINT}/v1/logs

Sending a grpcurl

Pre-req Install grpcurl

Install grpcurl using your preferred method in the grpcurl Installation Guide

or

Via homebrew

brew install grpcurl

Send a request

Warning: You must run this command from the directory where you cloned our InkOps repo. Alternatively, you may copy the /examples directory from the InkOps repo to your machine and run this command from that location, changing the path as needed.
# use data from ./examples/grpc_service/logs.json AND
# run the request using the data (as stdin) copied from above
cat ./examples/grpc_service/logs.json |
grpcurl -vv -insecure \
    -d @ \
    -proto examples/protos/opentelemetry/proto/collector/logs/v1/logs_service.proto \
    -import-path examples/protos \
    $GRPC_OTLP_ENDPOINT \
    opentelemetry.proto.collector.logs.v1.LogsService/Export

Note: Remember to change the Service path and Service Namespace when sending other signals (traces/metrics).
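For example, here's a hedged sketch of the same call for traces, assuming a matching payload exists under ./examples/grpc_service/:

cat ./examples/grpc_service/traces.json |
grpcurl -vv -insecure \
    -d @ \
    -proto examples/protos/opentelemetry/proto/collector/trace/v1/trace_service.proto \
    -import-path examples/protos \
    $GRPC_OTLP_ENDPOINT \
    opentelemetry.proto.collector.trace.v1.TraceService/Export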

Feedback

Are you experiencing issues or would like to submit a request to our team?





Coming from Local Install?

⏪ Back to Sending Data to MDAI Want to validate telemetry data flowing through your pipelines?
Next Step: Validate ⏩

Know how to validate data flow already? Next Step: Congrats! ⏩





Coming from AWS Install?

⏪ Back to Verify Installation Want to validate telemetry data flowing through your pipelines?
Next Step: Validate ⏩

Know how to validate data flow already? Next Step: Congrats! ⏩

Using Postman to send data to MDAI Cluster


Sending Telemetry via Postman

Visit our MDAI Postman github repo to learn how to configure Postman using helpful environment variables and our pre-defined postman collection for sending logs, metrics, and traces via HTTP requests.




Coming from Local Install?

⏪ Back to Verify Installation Want to validate telemetry data flowing through your pipelines?
Next Step: Validate ⏩

Know how to validate data flow already? Next Step: Congrats! ⏩





Coming from AWS Install?

⏪ Back to Verify Installation Want to validate telemetry data flowing through your pipelines?
Next Step: Validate ⏩

Know how to validate data flow already? Next Step: Congrats! ⏩

Using a real agent/collector to send data to MDAI Cluster

Visit the OTel Collector Docs site to learn more about configuring your collector based on your specific data sources and destinations.

General setup and connectivity from collector to MDAI

  1. Determine which sources of data (collector/agent) you'd like to point at your MDAI Cluster instance.
  2. Use your CNAME (from your host provider) or DNS (from AWS LB) as mentioned in our Ingress documentation.
  3. Configure your agent/collector to point to the CNAME or DNS as mentioned in our Ingress documentation.
  4. SEE RESULTS! Verify data flow

Where can I find my ingress endpoints?

Finding and setting HTTP_HOST and GRPC_HOST

Not sure where to get the variables for GRPC_HOST and HTTP_HOST?

Cloud

GRPC_HOST & GRPC PORT

You can find your GRPC_HOST value by running:

kubectl get ingress -n default

Your output should look something like this...

example ingress output

For gateway_ingress, note the HOSTS value. This will be used for your host

Your port will always be :443, as this is the secure listener for your gRPC endpoint.

You can also make note of the mdai-console-ingress ADDRESS value to validate requests sent from Postman later.

Note: Your ingress endpoint for gRPC connections will be in the following format... ${host}:${port}

HTTP_HOST & HTTP_PORT

You can find your http ingress endpoint value by running:

kubectl get svc gateway-collector -n default

Your output should look something like this...

example ingress output

For the gateway_collector service, note the EXTERNAL-IP value. This will be used for your host

Your port will always be :4318, as this is the secure listener for your http endpoint.

Note: Your ingress endpoint for http connections will be in the following format... https://${host}:${port}

Examples coming soon

While we know these docs will help guide you to the right configuration, we understand you might need further help. We're actively working on examples connecting commonly used collectors/agents to an MDAI Cluster.
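In the meantime, here's a minimal, hedged sketch of the exporter side of a collector config once you've substituted your own CNAME and EXTERNAL-IP (both endpoints below are placeholders):

exporters:
  otlp:        # gRPC, via the CNAME you created for the gRPC Load Balancer
    endpoint: otlp.grpc.endpoint.your-domain.io:443
  otlphttp:    # HTTP, via the gateway-collector EXTERNAL-IP
    endpoint: https://<EXTERNAL-IP>:4318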

If you have special requests for how-to guides...


⏪ Back to Intro: Generate and Receive Telemetry Next Step: TelemetryGen ⏩

Advanced Configurations

We have a few advanced configuration options that you can apply to your Cluster.

AutoScaling

Local only

AWS

Modules

Other

Coming soon...

AutoScaling

OpenTelemetry Collector

We have built an HPA config for one of our cluster's core components, the OTel collector app (lib/mdai-operator.yaml), with the following settings:

# 2 replicas recommended
replicas: 2
# mandatory only when the autoscaler is enabled;
# otherwise optional. Can be used to limit resource consumption.
resources:
  limits:
    cpu: 500m
    memory: 500Mi
  requests:
    cpu: 250m
    memory: 250Mi
Note: If you decide to change these settings, it is possible that operations and performance of the OTel collector and cluster may become sub-optimal. We've tested with this configuration and cannot guarantee performance outside of these bounds.

Benchmarking - Plans for the future!

We are working hard to understand and create multiple configurations that are resource and cost-effective under predictable and variable workload demands.

We will soon publish a report of the current constraints we are using for testing the above configuration, as well as others.

Other Cluster Components

📺 Coming soon! Stay tuned! 📺

We will shortly begin work on optimizing HPA (or any other scaling method) for both of the following components, pending community feedback:

  1. MDAI Console
  2. Datalyzer module

Have some recommendations for where we should spend our energy?

Are you experiencing any pain using the MDAI Cluster? Do you have specific configurations or workload demands you'd like us to help test?



⏪ Back to: Installation Next up: MDAI Console Usage ⏭️

Node AutoScaling

Karpenter NodePool Settings

The following configuration is used for Karpenter to properly autoscale the EC2 Nodes that your Cluster runs on.

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["t"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t2"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["t2.micro"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
Note: If you decide to change these settings, it is possible that operations and performance of your Cluster may become sub-optimal. We've tested with this configuration and cannot guarantee performance outside of these bounds.
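
Assuming your Cluster uses Karpenter's v1beta1 CRDs (as the manifest above does), you can verify that node autoscaling is working with a few standard kubectl commands; the manifest path below is illustrative:

kubectl apply -f nodepool.yaml                      # wherever this NodePool manifest lives
kubectl get nodepools                               # confirm the pool was registered
kubectl get nodeclaims -o wide                      # one NodeClaim per node Karpenter launches
kubectl get nodes -l karpenter.sh/nodepool=default  # nodes provisioned by this pool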

Benchmarking - Plans for the future!

We are working hard to understand and create multiple configurations that are resource and cost-effective under predictable and variable workload demands.

We will soon publish a report of the current constraints we are using for testing the above configuration, as well as others.

Have some recommendations for where we should spend our energy?

Are you experiencing any pain using the MDAI Cluster? Do you have specific configurations or workload demands you'd like us to help test?



⏪ Back to: Installation Next up: MDAI Console Usage ⏭️

Managing Ingress for the MDAI Cluster's OTel Collector

DNS

To expose the MDAI Cluster's OpenTelemetry Collector to the public, we've simplified the provisioning of two types of Load Balancers.

  1. AWS Application Load Balancer mdai-grpc-endpoint
  2. AWS Network Load Balancer mdai-non-grpc-endpoint

These Load Balancers receive gRPC and non-gRPC (HTTP, TCP, UDP) data respectively.

The Load Balancers can be referred to by their DNS names, which are assigned by AWS. DNS LB Names

For secure connectivity, it’s highly recommended - and for TLS (SSL) endpoints, required - that you create CNAME DNS records in your domain, pointing to the Load Balancer DNS names generated by AWS.

gRPC

In order to provide routing for multiple gRPC receivers of the same type, each receiver should have its own hostname in your public domain. Additional configuration steps are required to provide a mapping between the receiver name and its proposed hostname. A single gRPC Load Balancer will be able to manage ingest from multiple gRPC sources. The current version of the MDAI Cluster only provisions a single gRPC Load Balancer by default.

Example

LB DNS name: mdai-grpc-endpoint.us-east-1.elb.amazonaws.com

OTel Collector Configuration Excerpt (this example has 2 gRPC receivers):

service:
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]

Required CNAME records in domain your-domain.io:

| CNAME | AWS DNS NAME | Access URL |
| ----- | ------------ | ---------- |
| jaeger.grpc.endpoint | mdai-grpc-endpoint.us-east-1.elb.amazonaws.com | jaeger.grpc.endpoint.your-domain.io |
| otlp.grpc.endpoint | mdai-grpc-endpoint.us-east-1.elb.amazonaws.com | otlp.grpc.endpoint.your-domain.io |

Manual updates for OpenTelemetry Collector configuration

You would then need to update your OTel configuration to reflect these changes before the MDAI Cluster can receive telemetry via the Access URL.

Navigate to your values/params-values-otel.yaml file:

collectorEndpoints:
  otlp: otlp.grpc.endpoint.your-domain.io
  jaeger: jaeger.grpc.endpoint.your-domain.io
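
Before sending data, you can confirm the CNAME records resolve to the Load Balancer. A quick sketch using the example hostnames above (each query should return the mdai-grpc-endpoint LB DNS name):

dig +short otlp.grpc.endpoint.your-domain.io CNAME
dig +short jaeger.grpc.endpoint.your-domain.io CNAME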

non-gRPC

Example

LB DNS name mdai-non-grpc-endpoint.us-east-1.elb.amazonaws.com

Required CNAME records in domain your-domain.io:

| CNAME | AWS DNS NAME | Access URL |
| ----- | ------------ | ---------- |
| endpoint | mdai-non-grpc-endpoint.us-east-1.elb.amazonaws.com | endpoint.your-domain.io |

TLS (SSL)

TLS (SSL) Certificates should be generated and issued for all chosen host names.

Steps for Issuing or Importing a Certificate

  • Choose to Import or Issue a Certificate
  • Once you have access to your certificate in AWS ACM, it will be accessible via its Amazon Resource Name (ARN). These ARNs need to be provided as configuration parameters during the configuration phase, so make note of them (the lookup sketch below can help). ACM ARN
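
If you need to look the ARNs up again later, here's a sketch using the AWS CLI (the region is an example):

aws acm list-certificates --region us-east-1 \
  --query 'CertificateSummaryList[].{Domain:DomainName,Arn:CertificateArn}' --output table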

gRPC

  • TLS (SSL) is MANDATORY for the gRPC endpoint.

  • From the example above, the gRPC host names that would need certificates are:

    • jaeger.grpc.endpoint.your-domain.io
    • otlp.grpc.endpoint.your-domain.io
  • After identifying the certificate ARN, you are required to update the values/params-values-otel.yaml file:

spec:
  ingress:
    annotations:
      alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:1234567890:certificate/th15-15-@n-@ut0g3n3r@ted-numb3r

non-gRPC

  • SSL is not mandatory, but is highly recommended for non-gRPC ingress.

  • From the example above, the non-gRPC host name that would need a certificate is:

    • endpoint.your-domain.io
  • After identifying the certificate ARN, you are required to update the values/params-values-otel.yaml file:

metadata:
  name: test-collector
  namespace: default
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:1234567890:certificate/th15-15-@n-@ut0g3n3r@ted-numb3r

Console

DNS

To expose your MDAI Console UI to the public Internet, an AWS Application Load Balancer mdai-console will be provisioned. This Load balancer can be referred to by its DNS name, which is provided by AWS. For secure connectivity, it’s highly recommended - and for TLS (SSL) endpoints, required - that you create CNAME DNS records for your domain that point to the Load Balancer DNS names generated by AWS.

Example

LB DNS name: mdai-console-12345678910.us-east-1.elb.amazonaws.com

Required CNAME records in domain your-domain.io:

| CNAME | AWS DNS NAME | Access URL |
| ----- | ------------ | ---------- |
| mdai-console | mdai-console-12345678910.us-east-1.elb.amazonaws.com | mdai-console.your-domain.io |

TLS (SSL)

TLS (SSL) can be enabled for the MDAI Console UI endpoint. To do that, a TLS (SSL) certificate should be issued for the chosen hostname (e.g., mdai-console.your-domain.io from the example above), imported into AWS ACM, and accessed via its Amazon Resource Name (ARN). This ARN needs to be provided as a configuration parameter during the configuration phase in the file: values/aws.env

Example: MDAI_UI_ACM_ARN=arn:aws:acm:us-east-1:12345678900:certificate/eaa7da72-1c8d-4277-8131-ab9dfdc35f37



Back to Deploy Your MDAI Cluster

Next Step: Verify Your Installation >>

Datalyzer Module

Primary objective

Track and analyze the total and actual OpenTelemetry Protocol (OTLP) bytes transferred by each source (service/resource) through a collector.

How it works

We built a dedicated OTLP service that data is exported to, which identifies the sender (by service.name) of each pLog, pMetric, and pTrace record and measures the payload size of every record flowing through your pipelines (see the sketch after the list below).

This helps us answer your questions:

  1. Who's sending the most data into and out of my cluster?
  2. How much are they sending in, and how much are they planning to send out of my cluster?
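
To make that concrete, the measurement works by teeing pipeline data to an additional OTLP destination. The sketch below is purely illustrative: the Datalyzer service name, port, and pipeline layout are assumptions, and the MDAI installer wires up the real equivalent for you.

exporters:
  otlp/datalyzer:                               # hypothetical in-cluster Datalyzer endpoint
    endpoint: datalyzer.default.svc.cluster.local:4317
    tls:
      insecure: true
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlp/datalyzer, debug]        # tee to Datalyzer alongside your normal exporter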

See it in action

  1. Visit the Analysis tab in your console.
  2. Send some payloads (real or fake).
  3. See the sender and size of messages in and out of your cluster!
  4. Save money, eat cake instead! 🍰

Usage Guide

Using the MDAI Console

OTel Management

Use Cases

Telemetry Muting

The MyDecisive MDAI Cluster Console

Introduction

The MyDecisive MDAI Cluster console provides an at-a-glance display of all the telemetry data that has flowed through your MDAI Cluster in the last hour. It is a web application accessible via HTTPS within the MDAI Cluster.

Data Flow Overview

The console offers three primary areas to help you understand how your data flows:

  • Sankey Diagram (A): The main section displays a Sankey diagram showing the data flow.
  • Filters (B): Filters at the top allow you to customize what you see.
  • Sidebar (C): On the right, a sidebar summarizes your data totals and data reduction.

Sankey Diagram

The MDAI Cluster Console showing data flows

The initial view (A) of the diagram shows how your different data types (Metrics, Logs, and Traces) flow from receivers to exporters, allowing you to see the overall data volumes flowing through the system.

Hovering over any component highlights the specific data path(s), helping you zero in on a specific issue.

The MDAI Cluster Console showing data flows

Pipeline View

The MDAI Cluster Console showing pipeline configuration

Clicking on any component in the Sankey diagram will filter and switch the view to a pipeline view. This shows all data types interacting with that component and highlights its configuration in the textual configuration view, allowing you to see the composition of your pipeline.


The MDAI Cluster Console with pipelines mode switch highlighted

As you view the console to understand your data flows, you can switch between the Pipeline and Data flow views at any time to see the appropriate level of detail.

Filters

The MDAI Cluster Console showing content filter options

The filters (B) at the top-right of the console allow you to constrain the data flow and pipeline views to specific components or data types. These filters can be combined, and options are automatically refined based on your selections.

Top Talkers and Top Listeners

The Top Talkers filter shows the top three receivers by data volume, and the Top Listeners filter shows the top three exporters by data volume.

More Filters

The More Filters menu allows you to filter your view by receiver, processor, data type, or exporter.

Summary of Data Totals and Data Reduction

The console sidebar (C) displays:

  • A count of receivers, pipelines, and exporters running on your collectors
  • The total volume of data flowing through them in the last hour

The sidebar updates with other changes on the page, allowing you to optimize your pipelines for maximum cost efficiency.

Note: Traffic flowing to debug exporters is not counted in data flow totals.

Telemetry Pipeline Manual Muting

⚠️ This page assumes that you've already completed the installation process, either locally or in the cloud. ⚠️

TL;DR

Tell me what I'm getting with Manual Mute

An MDAI Cluster provides the ability to mute Metrics, Logs, and/or Traces within your OpenTelemetry pipelines. You can do this with our CLI, or from your MDAI Cluster's Console.




Telemetry Muting

One of our primary goals is to give you back full control of your telemetry pipelines. Moving and storing data is expensive... it doesn't have to be. Imagine a world where you only send data to observability vendors when you need to, e.g., leading up to an event (incident, spike in activity, etc.).

📣 Introducing Manual Muting! 📣

What is Telemetry Muting?

Simple! Telemetry muting is the act of stopping data flow through a telemetry pipeline, period.

Through our muting capability, you can create rules in your data pipelines that help you stop and start data flow. These rules can be enabled or disabled on demand, though for now only manually. We plan to continue building out this capability, so stay tuned for more updates.

Why use Telemetry Muting?

Cost Savings & Telemetry Control

As we mentioned, it's very expensive to send data into and out of your technology envelope. Not only is it expensive, but not all data is high-priority data.

Manual muting affords you the ability to start and stop data flow, saving money by sending only the highest-priority data, as you define it. This can be achieved by:

  1. A CLI command to apply mute rules to your OTel pipelines
  2. Using the MDAI Console's streamlined rule builder wizard to apply mute rules to your OTel pipelines

Note: Both of these options will work together, so you can choose what works best for you after experimenting.
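
If you're curious what a mute amounts to inside a collector, it's conceptually similar to dropping records before they reach your exporters. The sketch below uses the stock OpenTelemetry filter processor to drop all log records; it's an analogy under our own assumptions (including the processor name), not how the MDAI Cluster actually implements mute rules.

processors:
  filter/mute_logs:          # hypothetical name; drops every log record while enabled
    error_mode: ignore
    logs:
      log_record:
        - "true"             # OTTL condition that always matches, so all logs are dropped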

❗️Limited support❗️

Our first iteration of muting only allows you to mute one or all of your Metrics, Logs, and Traces pipeline definitions. Don't worry, we're hard at work expanding the Telemetry Muting capability.

Future Enhancements

Based on usage and adoption of this feature, we plan to expand the ability to have telemetry and cost controls.

If you have feedback or suggestions about the future of automute, let us know!




Choose your adventure:

Mute with CLI
Mute with Console

Mute Rules

What are mute rules?

Mute rules are contextual configurations (instructions) that control data flow through the defined OTel pipeline configuration for a given cluster.

Available Rules

  • Add a rule - Allows a user to apply muting instructions to a given pipeline.

  • Toggle a mute rule - Allows a user to enable or disable an existing mute rule for a given pipeline.

  • Edit a mute rule - Allows a user to modify the muting instructions for a given pipeline.

  • Delete a mute rule - Allows a user to delete a muting instruction.




Start muting with the CLI now! ⏭️

Start muting with the Console now! ⏭️

Manual Muting via the MDAI CLI

Let's see muting in action with the MDAI CLI!

Note: You can find more helpful information for CLI usage in our CLI repo.

Add a mute rule

Toggle a mute rule

Unmute and remove rule

Manual Muting via the MDAI Console UI

Let's see muting in action within the Console!

Add a mute rule

Toggle a mute rule

Edit a mute rule

Delete a mute rule

Updating OpenTelemetry Configurations

If you find you'd like to modify your OpenTelemetry collector configuration, you have the option to change it.

Modify your OTel Config

Navigate to the lib/mdai-operator.yaml file and modify it to fit your needs.
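
For instance, a small edit might add a batch processor to one of the pipelines. The surrounding structure below is illustrative; match it to what's actually in your lib/mdai-operator.yaml:

processors:
  batch: {}                   # example addition
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]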

Apply changes to your collector

You can now apply the changes to the cluster using the following command.

kubectl apply -f lib/mdai-operator.yaml

You should start seeing changes reflected immediately in the Console UI.
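
If you'd like to confirm the rollout from the command line as well (the namespace here follows the defaults used elsewhere in this guide), check the collector resource and watch its pods come back up:

kubectl get opentelemetrycollectors -n default
kubectl get pods -n default -w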

Contributing guide

Coming soon! Check back shortly.

Don't want to wait? Email us at support@mydecisive.ai

Troubleshooting

🐞 Known issues

Docker daemon not started

If you encounter an error during installation where Docker is the culprit, please ensure the Docker daemon is running. Here's a helpful Docker guide to help resolve this issue.

An alternative to using the command line to start the Docker daemon is to open the Docker Desktop application.

This is an example of the error you would see during install related to the docker daemon. docker error

Script timeout during automated install

This is an example of the error you would see during the automated install if there is a timeout. docker error

🤔 What can I do to fix it? It depends on where in the process the timeout occurred. Let's investigate some potential fixes!

Option 1: Run a make target following a common breaking point

We've noticed that the typical timeout location is immediately following the ✅ Complete mdai-wait-for-pods! checkpoint. You should be able to run the following command to finish the install process.

make -f ./make/Makefile-local-install mdai-deploy-config

Option 2: Re-run the automated install process

If you errored out at a different checkpoint, try running the make -f ./make/Makefile-local-recipes create-mdai command again.

Option 3: Report an issue to us!

Still not working? Please report a new GitHub issue!

Cognito callback url not configured to your domain

This is an example of the error you would see during the automated install when the Cognito callback URL is not configured for your domain. cognito callbackurl

How to fix...

Option 1: Add your LB url as the callback url

Go to your AWS Cognito UserPool settings and add a callback url that represents your load balancer's URL. cognito callbackurl

Option 2: Create a CNAME record for your MDAI Console in your domain host.

Go to the host for your domain (e.g., GoDaddy) and add a CNAME record that maps the MDAI_UI_USER_POOL_DOMAIN value you set in the aws.env file to your load balancer's DNS name. cognito callbackurl

🔎 Report a new issue!

If this doesn't fix the error, feel free to report a new GitHub issue!

Alpha Expectations

Warning: This is an Alpha release. This codebase is experimental and in active development. Proceed with caution and use at your own risk.

  1. Limited Functionality: The alpha release will have limited functionality compared to the final product. Users should expect only basic features and capabilities to be available.

  2. Experimental Features: Some features may be experimental or in early stages of development. Users should be aware that these features may change or be removed in future releases.

  3. Bugs and Issues: Users should expect to encounter bugs, glitches, and issues while using the alpha release. They should be prepared to provide feedback and report any issues they encounter.

  4. Incomplete Documentation: Documentation for the alpha release may be incomplete or missing entirely. Users should expect to rely on their own experimentation and troubleshooting skills to use the software effectively.

  5. Limited Support: Support for the alpha release may be limited or non-existent. Users should not expect immediate assistance or troubleshooting help from the development team.

  6. Data Loss or Corruption: Due to the experimental nature of the alpha release, there is a risk of data loss or corruption. Users should avoid using the software with critical or sensitive data.

  7. Changes and Updates: The alpha release is a work in progress, and changes and updates are to be expected. Users should be prepared for frequent updates, changes to features, and improvements over time.

  8. Community Engagement: Users are encouraged to provide feedback, suggestions, and bug reports to help improve the software. Community engagement and collaboration are essential for the success of the alpha release.

  9. Testing and Evaluation: The alpha release is primarily intended for testing and evaluation purposes. Users should use the software to provide feedback on its usability, functionality, and performance.

  10. No Guarantees: There are no guarantees or warranties associated with the alpha release. Users should use the software at their own risk and understand that it may not be suitable for production use.