Enhancing SRE Operations for a Unicorn Security Startup with an AI-Powered Chat Application

Executive Summary

A global leader in the Secure Access Service Edge (SASE) industry struggled with operational bottlenecks in its Site Reliability Engineering (SRE) team, largely due to challenges accessing relevant information and correlating it during incidents. With critical operational information spread across Jira and Confluence, the team experienced information fatigue, making it cumbersome to sift through extensive documentation and reports to identify relevant insights. This led to slower incident response times, a complex onboarding process for new employees who struggled to absorb the processes, and overlooked correlations between current tickets and past incidents.

To overcome these challenges, Crest Data built and deployed a bespoke AI chatbot integrated with Slack that optimises the SRE process. Using a Retrieval-Augmented Generation (RAG) pipeline built with LlamaIndex and OpenAI models, the system lets engineers search Jira and Confluence with natural language queries and returns real-time, contextual information. Through information correlation and document summarization, Crest Data helped the startup cut incident resolution times by more than 40% and onboarding times by 30%. This scalable solution not only removed the need for manual search and research but also improved collaboration by providing the right answers within the team’s communication platform.

About the Customer

The customer is a global leader in cloud security, taking a data-centric approach to protecting users and information across all environments. Their security platform offers unrivaled visibility and real-time threat protection for cloud services, websites, and private applications, accessible from any location or device. By operating one of the world’s largest and fastest security networks, the company provides sophisticated security solutions that address the complexities of modern cloud infrastructure.

Customer Challenge

The customer, a pioneer in the Secure Access Service Edge (SASE) space, faced several operational challenges within its Site Reliability Engineering (SRE) team that hurt productivity and incident management. The main issues were:

  • Slow Incident Resolution: Manually searching for and cross-referencing key information across tools such as Jira and Confluence during incidents was painfully slow, resulting in delayed resolutions.
  • Data Overload: The team found it challenging to sift through a large number of documents and past incident reports and extract insights from them, which made it harder to quickly access the required information.
  • New Team Member Assimilation: The assimilation process for new SRE team members was complicated by the need to digest detailed processes and resolutions to past incidents in the absence of summaries.
  • Knowledge Fragmentation: Operational insights and correlations between active tickets and past incidents were often missed, making it difficult to prepare for incident escalation and impacting team efficiency.

Customer Solution

To address the challenges the SRE team was facing, Crest Data developed a custom AI-driven chat application. The application was built specifically to connect with the team’s existing workflow tools, Jira and Confluence, so that the team can access data and insights from both in one place.

Key features of the solution include:

  • Natural Language Querying: SRE team members can search Jira tickets and Confluence documents using simple, natural language, eliminating the need for time-consuming manual search.
  • Dynamic Information Linking: The tool automatically recommends relevant documents and comments to link active tickets with past information, enabling better resolution of complex problems.
  • Automated Summarization: To accelerate onboarding, the tool offers summaries of important documents, enabling new employees to quickly familiarise themselves with complex SRE processes.
  • Cross-Referencing Capabilities: It enables cross-referencing between open incidents and historical information to better prepare for escalations.
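The linking and cross-referencing features above can be illustrated with a small sketch. It is not Crest Data's implementation: the deployed tool relies on embeddings and a vector database, whereas this example substitutes plain token overlap (Jaccard similarity) so that it runs with the Python standard library alone. The `Incident` class, the ticket keys, and the summaries are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Incident:
    key: str      # hypothetical ticket key, e.g. "SRE-101"
    summary: str  # free-text summary of the incident


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_incidents(
    active: Incident, history: list[Incident], top_k: int = 2
) -> list[tuple[str, float]]:
    """Rank past incidents by token overlap with the active ticket."""
    active_tokens = tokens(active.summary)
    scored = [
        (past.key, jaccard(active_tokens, tokens(past.summary)))
        for past in history
    ]
    # Drop incidents that share no tokens, then sort best match first.
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical sample data, for illustration only.
history = [
    Incident("SRE-101", "Gateway latency spike after certificate rotation"),
    Incident("SRE-102", "Disk usage alert on logging cluster"),
    Incident("SRE-103", "Certificate expiry caused gateway TLS handshake failures"),
]
active = Incident("SRE-200", "Gateway TLS errors following certificate rotation")

for key, score in related_incidents(active, history):
    print(key, round(score, 2))
```

With this sample data, the two certificate-related incidents are suggested for the active ticket, while the disk usage alert shares no tokens with it and is dropped. An embedding-based similarity would play the same role in a production system, with the advantage of matching related wording rather than only identical words.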

Architecture Highlights:

The tool was developed with a cutting-edge AI stack to provide reliable and user-friendly responses:

  • Retrieval-Augmented Generation (RAG): Leverages LlamaIndex and OpenAI models to provide relevant, accurate answers.
  • Vector Database: Uses Chroma DB to store embeddings of operational data, allowing for quick access to relevant data.
  • Slack Integration: The user interface is entirely hosted within Slack, enabling the team to work collaboratively and access the data in their main communication channels.
  • Flask Backend: A Flask backend processes user queries and integrates with Confluence and Jira.
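The flow these components implement (retrieve relevant content, then generate an answer grounded in it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed system: it does not call LlamaIndex, OpenAI, Chroma DB, Slack, or Flask. A bag-of-words vector stands in for real embeddings, an in-memory dictionary stands in for the vector database, and the `generate` argument is a placeholder for the language model call. All document ids and texts are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the ids of the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(
        chunks, key=lambda cid: cosine(q, embed(chunks[cid])), reverse=True
    )
    return ranked[:top_k]


def build_prompt(query: str, chunks: dict[str, str], chunk_ids: list[str]) -> str:
    """Assemble a prompt that grounds the model in the retrieved chunks."""
    context = "\n".join(f"[{cid}] {chunks[cid]}" for cid in chunk_ids)
    return (
        "Answer using only the context below and cite the source ids.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def answer(
    query: str,
    chunks: dict[str, str],
    generate: Callable[[str], str],
    top_k: int = 2,
) -> str:
    """Retrieve context, build the prompt, and delegate to the model callable."""
    chunk_ids = retrieve(query, chunks, top_k)
    return generate(build_prompt(query, chunks, chunk_ids))


# Hypothetical knowledge base mixing Confluence pages and Jira tickets.
chunks = {
    "CONF-runbook-dns": "Runbook: restart the DNS resolver pods when lookups time out",
    "JIRA-SRE-77": "Incident: DNS lookups timed out after resolver config change",
    "CONF-oncall": "On-call rotation schedule and escalation contacts",
}
query = "Why do DNS lookups time out?"

# Placeholder model: returns the prompt unchanged so the flow can be inspected.
print(answer(query, chunks, generate=lambda prompt: prompt))
```

Passing the model in as a callable keeps retrieval and prompt assembly testable without network access. In a setup like the one described above, the backend would receive the question from Slack, run the retrieval step against the vector database, and post the generated answer back to the channel.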

Crest Data further added value by providing a scalable and maintainable system with periodic updates, ensuring the solution continues to evolve alongside the customer’s operational needs.

Outcomes

The deployment of the AI-powered chat application by Crest Data revolutionised the SRE team’s processes, resulting in several positive outcomes and benefits:

  • Significant Decrease in Incident Resolution Time: The startup’s time to resolution during incidents decreased by more than 40%, because the application’s natural language search reduced the time required to find key data across multiple platforms.
  • Faster Onboarding of SREs: The onboarding process for new SREs became 30% more efficient. Summaries of long documents within the application helped new hires get up to speed quickly and absorb how past incidents were resolved.
  • Improved Decision-Making and Insights: The system’s ability to automatically correlate current tickets with historical data, together with its document summaries, allowed the SRE team to make decisions with confidence.
  • Enhanced Team Collaboration: The integration of the solution with Slack improved team collaboration and communication, as key insights were available within the team’s communication platform.
  • Efficient Knowledge Sharing: The application used automated updates to provide the team with the latest operational data, removing manual knowledge-sharing barriers.

About Crest Data

Crest Data is a data and AI-driven technology solutions provider for enterprises and technology innovators across cybersecurity and cloud security, helping them move faster and more securely. Our expertise lies in Conversational AI & chatbots, using RAG (Retrieval-Augmented Generation) to provide contextual insights from multiple data sources. Our AIOps for Incident Management solutions have achieved a significant reduction in incident resolution time.