<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>relataly.com</title>
	<atom:link href="https://www.relataly.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.relataly.com/</link>
	<description>The Business AI Blog</description>
	<lastBuildDate>Mon, 23 Jun 2025 21:33:48 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://www.relataly.com/wp-content/uploads/2023/04/cropped-AI-cat-Icon-White.png</url>
	<title>relataly.com</title>
	<link>https://www.relataly.com/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">175977316</site>	<item>
		<title>Agentic Web Scraping with Azure AI Foundry Agent Service: Insights from Building AIUseCaseHub.com</title>
		<link>https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/</link>
					<comments>https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sun, 15 Jun 2025 19:59:33 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Language Generation]]></category>
		<category><![CDATA[Marketing Automation]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[OpenAI]]></category>
		<category><![CDATA[Prompt Engineering]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[Vector Databases]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=14376</guid>

					<description><![CDATA[<p>Recently, I’ve been exploring how to leverage intelligent agents to streamline the discovery and organization of real-world AI use cases from across the web. This experimentation led me to develop AIUseCaseHub.com, a platform that employs a multi-agent backend powered by Azure AI Foundry for finding and scraping cases online. In this article, I’d like to ... <a title="Agentic Web Scraping with Azure AI Foundry Agent Service: Insights from Building AIUseCaseHub.com" class="read-more" href="https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/" aria-label="Read more about Agentic Web Scraping with Azure AI Foundry Agent Service: Insights from Building AIUseCaseHub.com">Read more</a></p>
<p>The post <a href="https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/">Agentic Web Scraping with Azure AI Foundry Agent Service: Insights from Building AIUseCaseHub.com</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Recently, I’ve been exploring how to leverage intelligent agents to streamline the discovery and organization of real-world AI use cases from across the web. This experimentation led me to develop <strong><a href="http://www.aiusecasehub.com">AIUseCaseHub.com</a></strong>, a platform that employs a multi-agent backend powered by Azure AI Foundry for finding and scraping cases online. In this article, I’d like to share some insights into how the agentic scraping process works. And I&#8217;ll provide some practical advice on building similar cloud-based projects through strategic use of Azure technologies and prompt engineering.</p>



<h2 class="wp-block-heading">Intro: What is AIUseCaseHub.com?</h2>



<p><strong>AIUseCaseHub.com</strong> is a dynamic web app designed to curate and showcase AI use cases spanning various industries, countries, and sources from across the internet. I created this platform out of a recurring need—colleagues and partners regularly approach me with questions such as, </p>



<ul class="wp-block-list">
<li><em>“What interesting AI implementations are happening in finance?”, </em></li>



<li><em>“Can you share examples of AI use cases in healthcare?”, or </em></li>



<li><em>“Which Swiss customers have successfully implemented AI agents?”</em>. </li>
</ul>



<p>Recognizing the value of a consolidated resource, I started building agents to proactively monitor the web for AI implementations. The idea was to create an agentic flow that systematically searches the web for relevant cases and organizes the information.</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-kadence-image kb-image14376_8f654e-ce size-full"><img fetchpriority="high" decoding="async" width="1556" height="1306" data-attachment-id="14377" data-permalink="https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/image-38/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2025/06/image-2.png" data-orig-size="1556,1306" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2025/06/image-2.png" src="https://www.relataly.com/wp-content/uploads/2025/06/image-2.png" alt="" class="kb-img wp-image-14377" srcset="https://www.relataly.com/wp-content/uploads/2025/06/image-2.png 1556w, https://www.relataly.com/wp-content/uploads/2025/06/image-2.png 300w, https://www.relataly.com/wp-content/uploads/2025/06/image-2.png 512w, https://www.relataly.com/wp-content/uploads/2025/06/image-2.png 768w, https://www.relataly.com/wp-content/uploads/2025/06/image-2.png 1536w" sizes="(max-width: 1556px) 100vw, 1556px" /><figcaption>Overview of AIUseCaseHub.com, which allows filtering and searching for AI use cases.</figcaption></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-full"><img decoding="async" width="1870" height="1176" data-attachment-id="14380" data-permalink="https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/image-39/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2025/06/image-3.png" data-orig-size="1870,1176" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2025/06/image-3.png" src="https://www.relataly.com/wp-content/uploads/2025/06/image-3.png" alt="" class="wp-image-14380" srcset="https://www.relataly.com/wp-content/uploads/2025/06/image-3.png 1870w, https://www.relataly.com/wp-content/uploads/2025/06/image-3.png 300w, https://www.relataly.com/wp-content/uploads/2025/06/image-3.png 512w, https://www.relataly.com/wp-content/uploads/2025/06/image-3.png 768w, https://www.relataly.com/wp-content/uploads/2025/06/image-3.png 1536w" sizes="(max-width: 1650px) 100vw, 1650px" /><figcaption class="wp-element-caption">The <a href="https://www.aiusecasehub.com/ai-use-case-explorer">case explorer</a> uses embeddings and principal component analysis (PCA) to illustrate the use cases on a two-dimensional chart.</figcaption></figure>
</div>
</div>



<p>Once I recognized the significant value the collected use cases provided for my own work, I decided to share them more broadly by turning the internal database into a public web platform. Today, <strong>AIUseCaseHub.com</strong> is freely accessible to everyone, allowing anyone to easily explore and discover impactful AI implementations.</p>



<h2 class="wp-block-heading">What are Agents?</h2>



<p>Following the rise of Generative AI (GenAI), the next major shift revolves around <strong><a href="https://news.microsoft.com/source/features/ai/ai-agents-what-they-are-and-how-theyll-change-the-way-we-work/">AI agents</a></strong>. Essentially, these agents are large language models (LLMs) equipped with greater autonomy and the ability to actively use external tools. Sounds complicated? It&#8217;s really not. The trick is to empower an AI model to independently execute practical actions—like inserting data into databases, generating tickets, or sending emails—by deciding when and how to perform these tasks.</p>



<p>Integrating tools with LLMs isn&#8217;t entirely new; early approaches required meticulous parsing of structured outputs to initiate actions. Back when GPT-3 first gained popularity, getting an LLM to effectively utilize external tools was notably challenging. So where does the enthusiasm around agents come from? I believe it largely stems from advances in platforms like Microsoft&#8217;s Azure AI Agent Service that make building agents much easier than in the past. </p>



<p>Platforms such as <a href="https://learn.microsoft.com/en-us/azure/ai-services/agents/overview">Azure AI Foundry Agent Service</a> now offer critical agent functionalities out-of-the-box—including robust conversation management (incl. memory and thread handling) and seamless tool integration—allowing developers to concentrate on instruction design and tool customization. This ease of use and lowered technical barriers truly defines the transformative power of today&#8217;s AI agents.</p>



<p>Modern agents adeptly break down intricate tasks into simpler subtasks, independently handling each step while maintaining a clear overall context. You may also have heard of standards like the Model Context Protocol (MCP) or Agent-to-Agent (A2A). These enhance interoperability and provide a great long-term vision for agent-to-agent communication. However, I believe the core breakthrough is the ease with which it is now possible to create and operate powerful agentic systems.</p>
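<p>To make the tool-use idea concrete, here is a minimal, framework-free sketch of the loop that agent platforms implement for you: a model decision (mocked here as a plain function) names a tool and its arguments, and the runtime dispatches the call. All names in this snippet are illustrative, not part of any Azure SDK.</p>

```python
# Minimal sketch of an agent tool-use loop (all names illustrative).
# A real platform replaces decide() with an LLM call that returns
# a tool name and arguments; the runtime then executes the tool.

def insert_case(title: str) -> str:
    """Example tool: pretend to insert a record into a database."""
    return f"inserted: {title}"

TOOLS = {"insert_case": insert_case}

def decide(observation: str) -> dict:
    """Stand-in for the LLM's decision step."""
    if "new case" in observation:
        return {"tool": "insert_case", "args": {"title": "Contoso AI rollout"}}
    return {"tool": None, "args": {}}

def run_agent(observation: str) -> str:
    decision = decide(observation)
    if decision["tool"] is None:
        return "no action taken"
    # The runtime, not the model, actually executes the tool.
    return TOOLS[decision["tool"]](**decision["args"])

print(run_agent("found a new case on the web"))
```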



<h2 class="wp-block-heading">Agentic Scraping</h2>



<p><strong>Web scraping</strong> is the automated process of extracting information from websites—transforming unstructured web content into structured, usable data. It&#8217;s frequently used to gather valuable insights, track market trends, or consolidate information scattered across multiple sources online.</p>



<p>Traditionally, web scraping relies on manual scripting or rigid automation tools designed for specific webpage layouts. The downside? These tools often break when even small webpage elements change, so they require continuous adjustments to maintain data relevance and quality. Even minor alterations, such as changes in HTML elements or button placements, can disrupt conventional scraping workflows. This is why agents are particularly well-suited to web scraping: their increased autonomy allows them to adapt to changes and handle complexity effectively.</p>



<p>Agents can dynamically leverage multiple tools to respond flexibly to evolving conditions. They can execute web searches, open URLs, and reason over structured and unstructured content scraped from the web. Their ability to try alternative approaches makes them more fault-tolerant, providing a more robust and resilient alternative to traditional automation methods.</p>



<p>Let’s now take a look at the business challenge of web scraping and how agents can help.</p>



<h2 class="wp-block-heading">The Business Challenge: Monitoring AI Use Cases Online</h2>



<p>So <em>why is it so hard to find and monitor AI use cases?</em> Let me explain. I define an AI use case as a real-world implementation of AI with a project-like nature. Such cases are scattered across the web and published by many sources. Cloud providers like Microsoft often publish them on industry-specific or event-specific sites. News outlets pick up these stories and republish them in different formats. </p>



<p>On top of that, customers, consulting firms, and technology partners share their own versions. This fragmented landscape makes it tricky to pinpoint all relevant pages. In addition, multiple sites might report the same use case in slightly different ways, creating a high risk of duplicates. Furthermore, new cases appear on a daily basis, which demands frequent updates.</p>



<p>Instead of tracking a few known sites manually, I tackled this challenge with an agent-driven approach. It leverages an agent’s ability to use tools, such as performing web searches and opening URLs, to extract content from the web directly. I also added techniques to boost data quality and reduce the chance of duplicates in the final dataset.</p>



<h2 class="wp-block-heading">Agentic Scraping Architecture </h2>



<p>Returning to <strong>AIUseCaseHub.com</strong>, today multiple specialized agents work continuously—24 hours a day, seven days a week—to populate the platform with relevant AI use cases. The overall scraping architecture operates as a streamlined pipeline or queue, where numerous potential articles are assessed, but only the highest-quality entries reach the curated &#8220;gold&#8221; database.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="2560" height="1440" data-attachment-id="14392" data-permalink="https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/picture2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2025/06/Picture2-scaled.png" data-orig-size="2560,1440" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Picture2" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2025/06/Picture2-scaled.png" src="https://www.relataly.com/wp-content/uploads/2025/06/Picture2-scaled.png" alt="" class="wp-image-14392" srcset="https://www.relataly.com/wp-content/uploads/2025/06/Picture2-scaled.png 2560w, https://www.relataly.com/wp-content/uploads/2025/06/Picture2-scaled.png 300w, https://www.relataly.com/wp-content/uploads/2025/06/Picture2-scaled.png 512w, https://www.relataly.com/wp-content/uploads/2025/06/Picture2-scaled.png 768w, https://www.relataly.com/wp-content/uploads/2025/06/Picture2-scaled.png 1536w, https://www.relataly.com/wp-content/uploads/2025/06/Picture2-scaled.png 2048w" sizes="(max-width: 1650px) 100vw, 1650px" /><figcaption class="wp-element-caption">Agentic Backend Architecture of <a href="http://www.AIUseCaseHub.com">AIUseCaseHub.com</a></figcaption></figure>



<p>Four specialized agents collaboratively manage this workflow, each with clearly defined responsibilities:</p>



<h4 class="wp-block-heading"><strong>Screener Agent</strong></h4>



<p>Searches the web for promising AI cases using traditional web search methods. It carefully screens results based on criteria such as real-world implementation, involvement of Microsoft technologies, clearly identified customers, demonstrated business impact or measurable outcomes, and a specific focus on AI, agents, or automation.</p>



<h4 class="wp-block-heading"><strong>Writer Agent</strong></h4>



<p>Specializes in extracting structured information from selected web articles, utilizing multiple extraction methods. It meticulously captures over 30 different data points per case—including Industry, Country, Customer Name, and Partner Name. Due to its significant role in shaping overall data quality, the Writer Agent is particularly crucial within the workflow.</p>



<h4 class="wp-block-heading"><strong>Reviewer Agent</strong></h4>



<p>Performs rigorous quality assurance checks and identifies potential duplicates. Additionally, the Reviewer Agent provides feedback directly into the search logs, creating a valuable feedback loop that continually refines the screening process.</p>



<h4 class="wp-block-heading"><strong>(Social Media Agent)</strong></h4>



<p>Summarizes validated use cases from the gold database and publishes concise updates directly to my <a href="https://bsky.app/profile/flo7up.bsky.social">BlueSky Social</a> account, ensuring broad visibility and engagement with a wider audience.</p>



<h3 class="wp-block-heading">Agent Orchestration</h3>



<p>For orchestrating the agents behind <strong>AIUseCaseHub.com</strong>, I rely on the <strong>Azure AI Agent Service</strong>, which simplifies agent creation by managing conversation memory and providing an <a href="https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/sdk-overview?pivots=programming-language-csharp">intuitive Python SDK</a> to seamlessly integrate various tools. Azure AI Agent Service supports multiple LLM models, including OpenAI&#8217;s GPT-4.1, GPT-4.1-mini, and GPT-4.1-nano. This flexibility allows me to strategically optimize costs by selecting the right model for each job. For example, I use smaller, less expensive models for simpler tasks such as web search while reserving powerful models for complex operations such as data extraction.</p>



<p>Agents still require a dedicated runtime environment for orchestration—this is the core system that invokes the LLMs, manages interactions with tools, and handles results. To fulfill this role efficiently, I host the orchestration logic on <strong><a href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-overview">Azure Functions</a></strong>, a cost-effective, serverless computing platform. Moreover, most agent-specific tools are implemented as standalone Azure Functions exposing individual APIs, integrated via lightweight wrappers. This modular setup not only enables easy reuse of tools across different agents but also provides scalability and flexibility as the agent ecosystem grows.</p>
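<p>The wrapper pattern can be sketched in a few lines: each tool is a standalone callable (in production, an HTTP call to its own Azure Function), and a small registry maps tool names to callables so different agents can reuse the same tools. The registry and tool names below are my illustration, not the Azure SDK&#8217;s interface.</p>

```python
# Illustrative tool registry: each tool is a standalone callable
# (in production, a POST to its own Azure Function endpoint) that
# agents share through a thin wrapper. All names are hypothetical.
import json

REGISTRY: dict = {}

def tool(name: str):
    """Register a function as a reusable agent tool."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@tool("log_search_term")
def log_search_term(term: str) -> str:
    return json.dumps({"logged": term})

@tool("get_db_stats")
def get_db_stats() -> str:
    return json.dumps({"industries": {"Finance": 99}})

def call_tool(name: str, **kwargs) -> str:
    """The wrapper an agent run invokes; in production this would
    call out to the tool's own API."""
    return REGISTRY[name](**kwargs)

print(call_tool("log_search_term", term="AI in banking"))
```

Because each tool sits behind its own interface, swapping an implementation or reusing a tool in another agent requires no change to the orchestration code.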



<p>Let’s now delve deeper into the specific tools utilized within this orchestration setup.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="2560" height="1460" data-attachment-id="14391" data-permalink="https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/picture1-2/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2025/06/Picture1-scaled.png" data-orig-size="2560,1460" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Picture1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2025/06/Picture1-scaled.png" src="https://www.relataly.com/wp-content/uploads/2025/06/Picture1-scaled.png" alt="" class="wp-image-14391" srcset="https://www.relataly.com/wp-content/uploads/2025/06/Picture1-scaled.png 2560w, https://www.relataly.com/wp-content/uploads/2025/06/Picture1-scaled.png 300w, https://www.relataly.com/wp-content/uploads/2025/06/Picture1-scaled.png 512w, https://www.relataly.com/wp-content/uploads/2025/06/Picture1-scaled.png 768w, https://www.relataly.com/wp-content/uploads/2025/06/Picture1-scaled.png 1536w, https://www.relataly.com/wp-content/uploads/2025/06/Picture1-scaled.png 2048w" sizes="(max-width: 1650px) 100vw, 1650px" /><figcaption class="wp-element-caption">Overview of the Agentic Scraping Process and the Tools used in the Process</figcaption></figure>



<h3 class="wp-block-heading">Tools Used in Agentic Scraping</h3>



<p>The scraping workflow leverages a variety of specialized tools integrated into the agent orchestration to ensure efficient data collection, high-quality output, and robust content deduplication:</p>



<ul class="wp-block-list">
<li><strong>Web Search</strong>:<br>Executes traditional web searches, returning approximately 30 fresh URLs per search. Only URLs not previously processed are retrieved, ensuring high data quality from the outset. The specific search terms are intelligently determined by the Screener Agent.</li>



<li><strong>Content Scraper</strong>:<br>Attempts multiple approaches to reliably access URLs and extract textual content from articles, providing resilience against common scraping challenges.</li>



<li><strong>Cosmos DB Integration</strong>:<br>Handles storage and retrieval operations in a structured, scalable manner using Azure Cosmos DB, enabling smooth data flow between agents and persistent data management.</li>



<li><strong>Search Term Logging</strong>:<br>Logs all executed search terms, providing valuable insights into search effectiveness and ensuring continuous improvement of the agent’s search strategy.</li>



<li><strong>Database Statistics</strong>:<br>Provides agents with real-time analytics from the database, such as industry and country distributions, helping guide decision-making for future searches and balancing the diversity of collected cases.</li>



<li><strong>Feedback Tools</strong>:<br>Enables the Writer Agent to provide detailed feedback to the Screener Agent about irrelevant or duplicate cases, alongside explanations for each rejection. This feedback loop continuously refines the search and screening processes.</li>



<li><strong>Data Quality Tools</strong>:<br>Ensures consistency and accuracy in critical fields, such as industry and customer names, by leveraging predefined taxonomies. The agents proactively query existing database entries to prevent inconsistencies or variations (e.g., avoiding multiple variants like &#8220;Zurich Insurance AG,&#8221; &#8220;Zurich Insurance,&#8221; and &#8220;Zurich Germany&#8221;).</li>



<li><strong>Deduplication Checks</strong>:<br>Generates embeddings for every new use case entering the queue, enabling efficient detection of similar cases. If cases exceed a similarity threshold, indicating potential duplication, they&#8217;re merged by linking additional sources to the existing entry rather than creating duplicates. These embeddings also support advanced exploration features, such as a visual 2D explorer of use cases and recommendations for related cases.</li>
</ul>
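<p>The deduplication check described above can be sketched without any cloud services: compare a new case&#8217;s embedding against stored ones by cosine similarity and merge sources when a threshold is exceeded. The toy three-dimensional vectors and the 0.95 threshold are illustrative; real embeddings come from an embedding model and are stored alongside each case.</p>

```python
# Sketch of embedding-based deduplication. Vectors and the threshold
# are toy values; in production the embeddings come from an embedding
# model and live in the database alongside each use case.
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def upsert_case(cases: list, new_case: dict, threshold: float = 0.95) -> list:
    for existing in cases:
        if cosine(existing["embedding"], new_case["embedding"]) >= threshold:
            # Likely the same case: link the new source instead of duplicating.
            existing["sources"].extend(new_case["sources"])
            return cases
    cases.append(new_case)
    return cases

gold = [{"title": "Zurich Insurance claims AI",
         "embedding": [1.0, 0.0, 0.1], "sources": ["a.com"]}]
gold = upsert_case(gold, {"title": "Zurich Insurance AG claims AI",
                          "embedding": [0.99, 0.01, 0.12], "sources": ["b.com"]})
print(len(gold), gold[0]["sources"])
```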



<h3 class="wp-block-heading">Single Agent vs Multi-Agent Workflows</h3>



<p>It’s common for agentic projects to begin with a single, general-purpose agent, which is then divided into multiple specialized agents as complexity grows. This was exactly the case when I started building the agentic scraping pipeline for <strong>AIUseCaseHub.com</strong>. Initially, I had a single agent handling all tools and tasks. However, as I added more features to improve performance and accuracy, this all-in-one setup became increasingly unwieldy—especially for the data extraction step, which requires detailed instructions and careful handling.</p>



<p>At that point, it made sense to split the original agent into two specialized roles: one agent dedicated to screening for relevant cases and another focused entirely on writing and structuring the extracted information. This separation significantly improved both results and maintainability.</p>



<p>Yet, some tasks related to reviewing and ensuring data quality didn’t quite fit neatly into either of these two agents. To handle these checks more effectively, I added a separate <strong>Reviewer Agent</strong>. This three-agent setup proves to be a solid fit for now: it strikes a good balance between leveraging agents for process automation and keeping the overall workflow streamlined and manageable, with minimal unnecessary back-and-forth.</p>



<h2 class="wp-block-heading">Multi-Agent Flow</h2>



<p>There is currently no master agent orchestrating the entire process. Instead, each agent handles a specific step in the workflow. Agents don’t communicate with each other directly but coordinate indirectly through <strong>Azure Cosmos DB</strong>—by reading, writing, and retrieving records.</p>



<p>The <strong>Screener Agent</strong> runs on a 30-minute schedule. It can launch multiple searches in one run until it finds relevant cases. Once the Screener has added a potential case to the database, the <strong>Writer Agent</strong> and <strong>Reviewer Agent</strong> take over. They are triggered automatically by Cosmos DB’s <strong>change feed monitoring</strong>, which works seamlessly with <strong>Azure Functions</strong>. When a new record appears in the initial stage container, the Writer Agent processes it. The same mechanism triggers the Reviewer Agent for the next step.</p>



<p>The scraping pipeline is not error-free. Therefore, I added retry and repeat operations to handle failed steps, making the entire process more robust. This design is optimized for Azure Functions, which on the Consumption plan have a default execution timeout of five minutes per run. Keeping each step small and stateless avoids the need for complex async logic and makes testing and maintenance much easier.</p>
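<p>The retry behavior can be sketched as a small wrapper with a capped attempt count; because each step is stateless, a failed run can simply be repeated. The attempt limit of three is my illustrative choice, not a value from the production pipeline.</p>

```python
# Sketch of a capped retry wrapper for a stateless pipeline step.
def with_retries(step, max_attempts: int = 3):
    def run(*args, **kwargs):
        last_error = None
        for attempt in range(1, max_attempts + 1):
            try:
                return step(*args, **kwargs)
            except Exception as exc:  # in production: catch specific errors
                last_error = exc
        raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error
    return run

calls = {"n": 0}

def flaky_extract(url: str) -> str:
    """Simulated step that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient scrape failure")
    return f"extracted {url}"

result = with_retries(flaky_extract)("https://example.com/case")
print(result, calls["n"])
```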



<h3 class="wp-block-heading">A Word on the Cost of Running Scraping Agents</h3>



<p>Running agents is generally not cheap. Higher-quality models typically produce fewer errors and complete tasks faster, but at a higher price. For steps where mistakes could have a significant negative impact, it’s worthwhile to invest in stronger models to maintain accuracy and reliability.</p>



<p>Splitting the workflow into multiple agents instead of relying on a single, all-purpose agent helps optimize cost and performance by allowing different model sizes for different tasks. For example, in my setup, the Writer and Reviewer Agents run on the more capable but more expensive <strong>GPT-4.1</strong>. On the other hand, the token-intensive first step—screening for potential cases—uses the lighter and more cost-effective <strong>GPT-4.1-mini</strong>.</p>



<p>Naturally, more agent runs directly translate to higher costs, so it’s crucial to minimize searches that yield no valuable results. Additionally, scraping efficiency tends to decline slightly over time as easily discoverable cases are exhausted, which means it gets progressively harder for the agents to find fresh, unique examples.</p>



<h2 class="wp-block-heading">Learning Agents</h2>



<p>LLMs like GPT-4.1, by design, can’t truly learn autonomously—they can only adapt their behavior within each session using updated information in their context window. To make my scraping agents more effective, I implemented a pseudo-learning mechanism to guide the <strong>Screener Agent</strong> toward finding more diverse and relevant cases.</p>



<p>Without this, the Screener Agent often repeats the same search terms, reducing the chances of uncovering new content. This is essentially a <strong>variance challenge</strong>: ensuring the agent continuously explores untapped areas of the web rather than circling familiar ground.</p>



<p>To address this, before defining and executing each new search, the Screener Agent calls a custom tool that provides a <strong>performance report</strong>. This report includes two key insights:</p>



<ol class="wp-block-list">
<li><strong>Search Log:</strong> A summary of the last 30 search terms used, along with the number of valid cases each produced that successfully made it into the final gold table.</li>



<li><strong>Database Statistics:</strong> A snapshot of the current distribution of use cases in the gold table, broken down by industry and by country.</li>
</ol>



<p>This approach introduces a system of dynamic feedback. It equips the agent to vary its search strategy intelligently, improving both efficiency and discovery quality.</p>



<p><strong>Example performance report provided to the Screener Agent:</strong></p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">{
  &quot;recent_searches&quot;: &quot;SearchTerm # CaseHits # DateSearchedUTC\n----------------------------------------------------------------------\nMicrosoft Azure AI in Telemedicine Optimization for Teladoc US # 0 # 2025-05-01T20:07:50\nMicrosoft Power Platform in Customer Service Automation for Unilever UK # 0 # 2025-05-01T20:07:47\nMicrosoft Azure AI Supply Chain Optimization for Nestle Switzerland # 0 # 2025-05-01T20:07:45\nMicrosoft AI in Fraud Detection Banking for JPMorgan Chase US # 0 # 2025-05-01T20:07:43  ...&quot;,

  &quot;statistics&quot;: {
    &quot;industry_stats&quot;: {
      &quot;Healthcare&quot;: 133,
      &quot;Retail&quot;: 57,
      &quot;Finance&quot;: 99,
      &quot;Tech &amp; Comms&quot;: 87,
      &quot;Manufacturing&quot;: 135,
      &quot;Automotive&quot;: 23,
      &quot;Legal&quot;: 49,
      &quot;Education&quot;: 23,
      &quot;Pharma&quot;: 24,
      &quot;Logistics&quot;: 33,
      &quot;Insurance&quot;: 132,
      &quot;Energy &amp; Utilities&quot;: 45,
      &quot;Consumer &amp; Food&quot;: 35,
      &quot;Other&quot;: 19,
      &quot;Agriculture&quot;: 37,
      &quot;Real Estate&quot;: 21,
      &quot;Professional Services&quot;: 56,
      &quot;Public Sector&quot;: 39
    },
    &quot;country_stats&quot;: {
      &quot;US&quot;: 307,
      &quot;CH&quot;: 91,
      &quot;DK&quot;: 13,
		...
    }
  }
}</pre></div>
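<p>A report in the shape above could be assembled from the raw logs with a few lines of aggregation. This is a simplified sketch: the field names mirror the example report, but the log and row formats are my illustration.</p>

```python
# Sketch: aggregate raw search-log entries and gold-table rows into
# the performance report handed to the Screener Agent. The log and
# row shapes are simplified illustrations.
from collections import Counter

search_log = [
    {"term": "Azure AI fraud detection banking", "hits": 0, "when": "2025-05-01T20:07:43"},
    {"term": "Azure OpenAI retail personalization", "hits": 2, "when": "2025-05-01T20:09:02"},
]
gold_rows = [
    {"industry": "Finance", "country": "US"},
    {"industry": "Retail", "country": "CH"},
    {"industry": "Finance", "country": "US"},
]

def build_report(log, rows, last_n: int = 30) -> dict:
    return {
        "recent_searches": [
            f"{e['term']} # {e['hits']} # {e['when']}" for e in log[-last_n:]
        ],
        "statistics": {
            "industry_stats": dict(Counter(r["industry"] for r in rows)),
            "country_stats": dict(Counter(r["country"] for r in rows)),
        },
    }

report = build_report(search_log, gold_rows)
print(report["statistics"]["industry_stats"])
```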



<p>This mechanism of providing feedback to the Screener Agent has proven crucial for maintaining high search performance. It ensures sufficient variety, continuously uncovering fresh use cases from across the web, spanning different industries and countries. You can validate that the feedback process has an impact by reviewing the agent’s thoughts in the log files (as shown below). There you will frequently see comments such as &#8220;I focus on this underrepresented field&#8221; or &#8220;I focus on areas where I had frequent hits.&#8221;</p>



<figure class="wp-block-image size-full is-resized"><img decoding="async" width="1557" height="1272" data-attachment-id="14382" data-permalink="https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/image-43/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2025/06/image-4.png" data-orig-size="1557,1272" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2025/06/image-4.png" src="https://www.relataly.com/wp-content/uploads/2025/06/image-4.png" alt="The agents are running 24 / 7 and have already identified and scraped more than 1000 use cases in just a few weeks." class="wp-image-14382" style="width:740px;height:auto" srcset="https://www.relataly.com/wp-content/uploads/2025/06/image-4.png 1557w, https://www.relataly.com/wp-content/uploads/2025/06/image-4.png 300w, https://www.relataly.com/wp-content/uploads/2025/06/image-4.png 512w, https://www.relataly.com/wp-content/uploads/2025/06/image-4.png 768w, https://www.relataly.com/wp-content/uploads/2025/06/image-4.png 1536w" sizes="(max-width: 1557px) 100vw, 1557px" /><figcaption class="wp-element-caption">The <a href="https://www.aiusecasehub.com/search-log">agent log</a> provides insights into the agentic scraping process</figcaption></figure>



<h2 class="wp-block-heading">Summary</h2>



<p>Building <strong>AIUseCaseHub.com</strong> has been an eye-opening journey in using modern agentic workflows with Microsoft Azure’s AI Agent Service. I combined a robust multi-agent setup, serverless orchestration on Azure Functions, and tailored tools for search, scraping, and deduplication. The result is a system that runs non-stop, finding, refining, and sharing real-world AI use cases—saving countless hours and surfacing insights that would otherwise stay buried.</p>



<p>For me personally, this project has shown how easy and powerful agentic design has become. What once needed complex engineering now works with cloud-native services and smart prompt design. If you’re looking to automate content discovery, organize domain knowledge, or test agent-driven ideas, I hope this deep dive sparks ideas and gives you a head start. </p>



<p>What is also worth mentioning: a lot of the code I wrote for the front-end, and some parts of the backend, was vibe-coded using GitHub Copilot. It&#8217;s an extremely powerful way to demonstrate that something works in a short amount of time. But the experience of vibe coding this app deserves a separate article of its own. </p>



<p>The journey doesn’t stop here. <strong>AIUseCaseHub</strong> will keep growing to cover more technologies and industries. Explore it to see how AI transforms businesses worldwide—and stay tuned for what’s next.</p>



<p>Right now, the hub only features projects using Microsoft technology. But I plan to include use cases from other cloud providers soon.</p>



<h2 class="wp-block-heading">Sources and Useful Links</h2>



<ul class="wp-block-list">
<li><a class="" href="https://www.aiusecasehub.com">AIUseCaseHub.com</a> — Explore the live use case platform.</li>



<li><a class="" href="https://bsky.app/">BlueSky Social</a> — Where the Social Media Agent publishes summarized use cases.</li>



<li><a class="" href="https://learn.microsoft.com/en-us/azure/ai-services/agent-service/overview">Microsoft Azure AI Agent Service (Preview)</a> — Overview, architecture, and quickstart.</li>



<li><a class="" href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-overview">Azure Functions Documentation</a> — Learn how to run serverless orchestration.</li>



<li><a class="" href="https://learn.microsoft.com/en-us/azure/cosmos-db/introduction">Azure Cosmos DB Documentation</a> — Guide for storing structured data and using change feed.</li>



<li><a class="" href="https://learn.microsoft.com/en-us/azure/ai-services/ai-foundry/overview">Azure AI Foundry</a> — Foundation for building and managing AI applications.</li>
</ul>



<div class="wp-block-jetpack-related-posts">
<h2 class="wp-block-heading">Related Posts</h2>
</div>



<p></p>
<p>The post <a href="https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/">Agentic Web Scraping with Azure AI Foundry Agent Service: Insights from Building AIUseCaseHub.com</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/agentic-web-scraping-with-azure-ai-foundry-agent-service-insights-from-building-aiusecasehub-com/14376/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">14376</post-id>	</item>
		<item>
		<title>Six Shortcomings of Current LLMs I Expect From AGI</title>
		<link>https://www.relataly.com/the-road-towards-general-artificial-intelligence-agi-a-few-thoughts-on-current-ai-limitations/14322/</link>
					<comments>https://www.relataly.com/the-road-towards-general-artificial-intelligence-agi-a-few-thoughts-on-current-ai-limitations/14322/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sat, 17 Feb 2024 12:59:39 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[OpenAI]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=14322</guid>

					<description><![CDATA[<p>Large language models (LLMs) have made significant leaps in recent years. The amazing capabilities of ChatGPT &#38; co have revived the discussion around the emergence of a General Artificial Intelligence. AGI aims to match human intelligence in every way. And may not even be so far from that. Already today, LLMs can write essays, compose ... <a title="Six Shortcomings of Current LLMs I Expect From AGI" class="read-more" href="https://www.relataly.com/the-road-towards-general-artificial-intelligence-agi-a-few-thoughts-on-current-ai-limitations/14322/" aria-label="Read more about Six Shortcomings of Current LLMs I Expect From AGI">Read more</a></p>
<p>The post <a href="https://www.relataly.com/the-road-towards-general-artificial-intelligence-agi-a-few-thoughts-on-current-ai-limitations/14322/">Six Shortcomings of Current LLMs I Expect From AGI</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Large language models (LLMs) have made significant leaps in recent years. The amazing capabilities of ChatGPT &amp; co have revived the discussion around the emergence of Artificial General Intelligence. AGI aims to match human intelligence in every way, and it may not even be so far from that goal. Already today, LLMs can write essays, compose music, and even generate art. Yet, when we look at the spectrum of intelligence, there&#8217;s still an obvious gap between today&#8217;s models and human intelligence. Even defining AGI is a hard task that has been the subject of heavy discourse. In this article, I will therefore share a more personal perspective on the shortcomings of current LLMs and what I would personally anticipate from an AGI. These are the elements that I believe will signal AGI&#8217;s arrival. </p>



<p>Also: </p>



<ul class="wp-block-list">
<li><a href="https://www.relataly.com/from-pirates-to-nobleman-simulating-conversations-between-openais-chatgpt-and-itself-using-python/13525/">From Pirates to Nobleman: Simulating Multi-Agent Conversations using OpenAI’s ChatGPT and Python</a></li>



<li><a href="https://www.relataly.com/voice-conversations-with-azure-ai/14291/">Building a Conversational Voice Bot with Azure OpenAI and Python: The Future of Human and Machine Interaction</a></li>
</ul>



<h2 class="wp-block-heading">What is AGI?</h2>



<p>My current understanding of Artificial General Intelligence is that it represents the hypothetical ability of an AI system to understand, learn, and apply knowledge in a way that is indistinguishable from a human being. This means an AGI would be able to perform any intellectual task that a human can, with equal or better proficiency. An AI with super-human intelligence is also called an &#8220;Artificial Super Intelligence&#8221;. </p>



<p>An AGI is the kind of intelligence that would allow a machine to reason in complex situations, make judgments under uncertainty, and plan for the long term. Some also argue an AGI will only be achieved if it has consciousness. While this may sound like the stuff of science fiction, the steps toward AGI are being taken in the real world today.</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>At the forefront of this ambitious journey is OpenAI, the organization behind ChatGPT. OpenAI&#8217;s mission is to <a href="https://openai.com/about">steer the development of General Artificial Intelligence</a> towards benefiting all of humanity, encapsulating a vision that melds technological advancement with ethical stewardship. Next, let&#8217;s look at some shortcomings of current models that may provide clues on what we may encounter on the road towards AGI.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="231" data-attachment-id="14323" data-permalink="https://www.relataly.com/the-road-towards-general-artificial-intelligence-agi-a-few-thoughts-on-current-ai-limitations/14322/image-2-18/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2024/02/image-2.png" data-orig-size="819,369" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-2" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2024/02/image-2.png" src="https://www.relataly.com/wp-content/uploads/2024/02/image-2-512x231.png" alt="" class="wp-image-14323" srcset="https://www.relataly.com/wp-content/uploads/2024/02/image-2.png 512w, https://www.relataly.com/wp-content/uploads/2024/02/image-2.png 300w, https://www.relataly.com/wp-content/uploads/2024/02/image-2.png 768w, https://www.relataly.com/wp-content/uploads/2024/02/image-2.png 819w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">OpenAI&#8217;s vision for the future of AGI: https://openai.com/about</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">Shortcomings of Current LLMs</h2>



<p>Despite their groundbreaking achievements, current LLMs fall short of the nuanced capabilities expected from Artificial General Intelligence (AGI) in six key areas: </p>



<ul class="wp-block-list">
<li>Empathy in Writing</li>



<li>Contextual Understanding of Visuals</li>



<li>Conceptual Synthesis</li>



<li>Genuine Learning Memory</li>



<li>The Art of Silence</li>



<li>Visual Ideation</li>
</ul>



<p>Let&#8217;s look at the six capabilities in more detail.</p>



<h3 class="wp-block-heading">Showing Empathy</h3>



<p>The first shortcoming I see is that current models are not good at showing empathy. Current AI can simulate empathy, much like a well-rehearsed actor. But true empathy requires an understanding that goes beyond algorithms and pattern recognition—it&#8217;s about connecting on a human level. The AGI of the future would be capable of this genuine emotional engagement, providing comfort or joy that feels truly sincere. For example, an AGI should be capable of dynamically adjusting its tone (and voice) and breaking out of its current pattern to create a bond during a conversation. </p>



<h3 class="wp-block-heading">Contextual Understanding</h3>



<p>Today&#8217;s LLMs can describe what they see in a picture, but understanding the story behind the image is another matter. AGI will not only describe but comprehend scenes, recognizing the emotions, the history, and the nuances that a human might intuitively understand. The latest GPT-4 Vision model shows some sparks of this capability. It can understand how different objects in an image are related. For example, when there is water coming into a house, it can recognize that this is an undesired state. However, there are limitations: the model struggles to tell whether an object is in the foreground or the background of an image, and to assess how far objects are from each other. So there is still room for improvement. </p>



<h3 class="wp-block-heading">Conceptual Synthesis</h3>



<p>The capability for Conceptual Synthesis in AGI transcends the mere blending of existing ideas. It embodies the ability to conduct research independently, apply concepts to novel problems, and, crucially, develop entirely new concepts autonomously. This advanced synthesis is not just about rehashing known information but about pushing the boundaries of innovation and knowledge.</p>



<p>AGI&#8217;s prowess in this area would mean that it could dive into the vast sea of human knowledge, identify gaps or opportunities for advancement, and forge new paths in science, technology, arts, and beyond. For instance, in the realm of medical research, AGI could uncover connections between disparate studies and datasets, proposing novel treatments or uncovering previously unknown disease mechanisms. In environmental science, it might develop unique strategies for sustainability that combine ecological science, urban planning, and renewable energy technologies in ways no human or current AI has conceived.</p>



<p>Furthermore, AGI&#8217;s ability to autonomously develop new concepts means it could theoretically initiate its own research projects without human guidance, identifying areas of potential breakthrough and dedicating resources to explore them. This level of initiative and creativity could significantly accelerate the pace of innovation, potentially solving complex global challenges more rapidly than current human-led efforts.</p>



<p>The implications of such capabilities are profound. They suggest a future where AGI partners with humans not merely as tools or assistants but as co-creators and innovators, contributing original ideas and solutions that are currently beyond human conception. This partnership could redefine the landscape of research and development, making what we now consider science fiction into science fact.</p>



<h3 class="wp-block-heading">Genuine Learning Memory</h3>



<p>AGI will remember interactions not as data points but as experiences, learning from them in a way that is dynamic and evolving. This means an AGI could continue a conversation from weeks ago, recall past emotions, and build upon previous ideas, creating a continuity of intelligence that today&#8217;s LLMs can&#8217;t achieve.</p>



<h3 class="wp-block-heading">The Art of Silence</h3>



<p>The power of well-timed words is undeniable, yet the value of strategic silence holds equal weight. Unlike current models like ChatGPT, which operate on a prompt-response basis without discernment on when to speak, AGI will master the art of silence. It will understand when providing a listening ear outweighs the need for immediate advice, recognizing the moments when simply being present is more beneficial than any verbal input. AGI&#8217;s sophistication will extend beyond the automatic generation of responses to include the ability to discern the appropriate moments for engagement, effectively timing its interactions to align with the nuanced dynamics of human communication. This evolution marks a significant departure from the current limitations, showcasing AGI&#8217;s capacity for judgment and empathy in conversation.</p>



<h3 class="wp-block-heading">Visual Ideation</h3>



<p>Current language models have the capability to generate images; however, they often face challenges when it comes to explaining complex concepts through diagrams and sketches. For instance, when tasked with illustrating <a href="https://en.wikipedia.org/wiki/Porter%27s_five_forces_analysis">Michael E. Porter&#8217;s Five Forces model</a>, the results provided by Dall-E highlighted some of these limitations.</p>



<p>Prompt to ChatGPT: &#8220;Illustrate 5 forces from Michael E. Porter&#8221;</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large is-resized"><img decoding="async" width="512" height="512" data-attachment-id="14327" data-permalink="https://www.relataly.com/the-road-towards-general-artificial-intelligence-agi-a-few-thoughts-on-current-ai-limitations/14322/image-3-12/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2024/02/image-3.png" data-orig-size="1024,1024" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-3" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2024/02/image-3.png" src="https://www.relataly.com/wp-content/uploads/2024/02/image-3-512x512.png" alt="" class="wp-image-14327" style="width:250px;height:auto" srcset="https://www.relataly.com/wp-content/uploads/2024/02/image-3.png 512w, https://www.relataly.com/wp-content/uploads/2024/02/image-3.png 300w, https://www.relataly.com/wp-content/uploads/2024/02/image-3.png 140w, https://www.relataly.com/wp-content/uploads/2024/02/image-3.png 768w, https://www.relataly.com/wp-content/uploads/2024/02/image-3.png 1024w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">What ChatGPT proposed</figcaption></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-full"><img decoding="async" width="350" height="223" data-attachment-id="14328" data-permalink="https://www.relataly.com/the-road-towards-general-artificial-intelligence-agi-a-few-thoughts-on-current-ai-limitations/14322/image-4-11/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2024/02/image-4.png" data-orig-size="350,223" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-4" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2024/02/image-4.png" src="https://www.relataly.com/wp-content/uploads/2024/02/image-4.png" alt="The actual illustration of 5 Forces" class="wp-image-14328" srcset="https://www.relataly.com/wp-content/uploads/2024/02/image-4.png 350w, https://www.relataly.com/wp-content/uploads/2024/02/image-4.png 300w" sizes="(max-width: 350px) 100vw, 350px" /><figcaption class="wp-element-caption">The actual illustration of 5 Forces</figcaption></figure>
</div>
</div>



<p>ChatGPT&#8217;s response included an attempt to visually represent the concept, followed by an actual illustration of the Five Forces model. This experience underscores the current gap in language models&#8217; ability to convey intricate ideas visually in a manner that is both intuitive and informative.</p>



<p>The advent of Artificial General Intelligence (AGI) is expected to revolutionize this aspect. AGI will possess the capability to not only illustrate complex concepts in ways that are easily understandable but also to innovate by creating diagrams and sketches that dynamically incorporate its own ideas. This will significantly enhance our ability to bridge the divide between abstract theories and their tangible representations, thereby enriching our comprehension of complex subjects.</p>



<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<p>This article embarked on a journey through the current landscape of LLMs and their progression towards the much-anticipated goal of AGI. We examined key areas where today&#8217;s AI technologies, including notable examples like ChatGPT, fall short of the comprehensive capabilities expected from AGI. These areas include empathic writing, contextual understanding of visuals, innovative conceptual synthesis, genuine learning memory, the nuanced art of silence, and the ability to create and interpret complex visual ideation.</p>



<p>It&#8217;s important to recognize that the six capabilities outlined are not exhaustive. The path to achieving AGI is likely to uncover additional prerequisites and challenges that we have yet to consider. As we continue to advance, our understanding of what constitutes true artificial general intelligence will evolve, revealing new frontiers of knowledge and technology.</p>



<p>Looking ahead, the journey toward AGI is both exciting and uncertain. Based on the current pace of innovation and the challenges that lie ahead, I personally believe that we are approximately 3-5 years away from realizing the first AGI. This timeframe allows for the development of the necessary technical capabilities, but I wonder whether it is enough for humanity to prepare for the implications. There is now an urgent need to develop the ethical frameworks necessary to ensure that AGI benefits all of humanity. As we move forward, it&#8217;s crucial that we continue to engage in thoughtful dialogue and collaboration across disciplines to navigate the complexities of this next great leap in artificial intelligence.</p>



<h2 class="wp-block-heading">Sources and Further Reading:</h2>



<ul class="wp-block-list">
<li><a href="https://www.amazon.com/Future-Mind-Scientific-Understand-Enhance/dp/038553082X">The Future of the Mind: The Scientific Quest to Understand, Enhance, and Empower the Mind by Michio Kaku</a></li>



<li><a href="https://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dangers,_Strategies">&#8220;Superintelligence: Paths, Dangers, Strategies&#8221; by Nick Bostrom</a></li>



<li><a href="https://www.technologyreview.com/topic/artificial-intelligence/">https://www.technologyreview.com/topic/artificial-intelligence/</a></li>



<li><a href="https://openai.com/about">https://openai.com/about</a></li>



<li><a href="https://openai.com/research">https://openai.com/research</a></li>
</ul>
<p>The post <a href="https://www.relataly.com/the-road-towards-general-artificial-intelligence-agi-a-few-thoughts-on-current-ai-limitations/14322/">Six Shortcomings of Current LLMs I Expect From AGI</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/the-road-towards-general-artificial-intelligence-agi-a-few-thoughts-on-current-ai-limitations/14322/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">14322</post-id>	</item>
		<item>
		<title>Building a Conversational Voice Bot with Azure OpenAI and Python: The Future of Human and Machine Interaction</title>
		<link>https://www.relataly.com/voice-conversations-with-azure-ai/14291/</link>
					<comments>https://www.relataly.com/voice-conversations-with-azure-ai/14291/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Thu, 08 Feb 2024 21:09:50 +0000</pubDate>
				<category><![CDATA[Azure Machine Learning]]></category>
		<category><![CDATA[ChatBots]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[OpenAI]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=14291</guid>

					<description><![CDATA[<p>OpenAI and Microsoft have just released a new generation of text-to-speech models that take synthetic speech to a new level. In my latest project I have combined these new models with Azure OpenAI&#8217;s conversational capabilities. The result is a conversational voice bot that uses Generative AI to converse with users in natural spoken language. ... <a title="Building a Conversational Voice Bot with Azure OpenAI and Python: The Future of Human and Machine Interaction" class="read-more" href="https://www.relataly.com/voice-conversations-with-azure-ai/14291/" aria-label="Read more about Building a Conversational Voice Bot with Azure OpenAI and Python: The Future of Human and Machine Interaction">Read more</a></p>
<p>The post <a href="https://www.relataly.com/voice-conversations-with-azure-ai/14291/">Building a Conversational Voice Bot with Azure OpenAI and Python: The Future of Human and Machine Interaction</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>OpenAI and Microsoft have just <a href="https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-openai-service-announces-assistants-api-new-models-for/ba-p/4049940">released a new generation of text-to-speech models</a> that take synthetic speech to a new level. In my latest project, I combined these new models with Azure OpenAI&#8217;s conversational capabilities. The result is a voice bot that uses Generative AI to converse with users in natural spoken language. </p>



<p>This article describes the Python implementation of this project. The bot is designed to understand spoken language and process it through OpenAI GPT-4. It responds with contextually aware dialogue, all in natural-sounding speech. This seamless integration facilitates a conversational flow that feels intuitive and engaging. The voice processing capabilities enable users to have meaningful exchanges with the bot as if they were conversing with another person. Testing the bot was a lot of fun. It felt a bit like the iconic scene from Iron Man where the hero converses with an AI assistant.</p>



<p>Here is an example of the audio response quality:</p>



<figure class="wp-block-audio"><audio controls src="https://www.relataly.com/wp-content/uploads/2024/02/ssml_output.wav"></audio></figure>



<p>Also:</p>



<ul class="wp-block-list">
<li><a href="https://www.relataly.com/from-pirates-to-nobleman-simulating-conversations-between-openais-chatgpt-and-itself-using-python/13525/">From Pirates to Nobleman: Simulating Multi-Agent Conversations using OpenAI’s ChatGPT and Python</a></li>



<li><a href="https://www.relataly.com/text-to-sql-with-llms-embracing-the-future-of-data-interaction/14234/" target="_blank" rel="noreferrer noopener">Text-to-SQL with LLMs &#8211; Embracing the Future of Data Interaction</a></li>
</ul>



<h2 class="wp-block-heading"><strong>Understanding the Voice Bot</strong></h2>



<p>The magic begins with the user speaking to the bot. Azure Cognitive Services transcribes the spoken words into text, which is then fed into Azure&#8217;s OpenAI service. Here, the text is processed, and a response is generated based on the conversation&#8217;s context and history. Finally, the text-to-speech model transforms the response back into speech, completing the cycle of interaction. This process showcases the potential of AI in understanding and participating in human-like conversations.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="1966" height="512" data-attachment-id="14294" data-permalink="https://www.relataly.com/voice-conversations-with-azure-ai/14291/image-32/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2024/02/image.png" data-orig-size="1966,512" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2024/02/image.png" src="https://www.relataly.com/wp-content/uploads/2024/02/image.png" alt="" class="wp-image-14294" srcset="https://www.relataly.com/wp-content/uploads/2024/02/image.png 1966w, https://www.relataly.com/wp-content/uploads/2024/02/image.png 300w, https://www.relataly.com/wp-content/uploads/2024/02/image.png 512w, https://www.relataly.com/wp-content/uploads/2024/02/image.png 768w, https://www.relataly.com/wp-content/uploads/2024/02/image.png 1536w" sizes="(max-width: 1650px) 100vw, 1650px" /></figure>



<h3 class="wp-block-heading" id="h-prerequisites-azure-service-integration">Prerequisites &amp; Azure Service Integration</h3>



<p>Our conversational voice bot is built upon two pivotal Azure services: Cognitive Speech Services and OpenAI. Billing for these services is pay-per-use. Unless you process large numbers of requests, the costs for experimenting with them are relatively low. </p>



<h4 class="wp-block-heading" id="h-azure-cognitive-speech-services">Azure Cognitive Speech Services</h4>



<p><a href="https://azure.microsoft.com/en-us/products/ai-services/ai-speech">Azure AI Speech Services (previously Cognitive Speech Services)</a> provide the tools necessary for speech-to-text conversion, enabling our voice bot to understand spoken language. This service boasts advanced speech recognition capabilities, ensuring accurate transcription of user speech into text. Furthermore, it powers the text-to-speech synthesis that transforms generated text responses back into natural-sounding voice. This allows for a truly conversational experience. </p>



<p>The newest generation of OpenAI text-to-speech models is now also <a href="https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/announcing-openai-text-to-speech-voices-on-azure-openai-service/ba-p/4049696">available in Azure AI Speech</a>. These models can synthesize voices at a previously unseen level of quality. I am most impressed by their capability to alter intonation dynamically to express emotions.</p>



<h4 class="wp-block-heading" id="h-azure-openai-service">Azure OpenAI Service</h4>



<p>At the heart of our project lies <a href="https://azure.microsoft.com/en-us/products/ai-services/openai-service">Azure&#8217;s OpenAI service</a>, which uses the power of models like GPT-4 to generate context-aware responses. Once Azure Cognitive Speech Services transcribe the user&#8217;s speech into text, this text is sent to OpenAI. The OpenAI model then processes the input and generates a completion. The service&#8217;s ability to understand context and generate engaging responses is what makes our voice bot remarkably human-like.</p>
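<p>To make the shape of this step concrete, here is a hedged sketch of sending transcribed speech to the Azure OpenAI chat completions API. The environment variable names and the deployment name are illustrative assumptions, not the project&#8217;s actual configuration:</p>

```python
# Hedged sketch: send the transcribed user speech to Azure OpenAI and
# read back the completion. Credential names and the deployment name
# ("gpt-4") are assumptions for illustration.
import os

def build_messages(system_prompt: str, transcript: str) -> list[dict]:
    """Assemble the chat payload from the system prompt and the
    user's transcribed speech."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": transcript},
    ]

def ask_azure_openai(transcript: str) -> str:
    # Requires `pip install openai` and real Azure credentials;
    # shown here only for the overall shape of the call.
    from openai import AzureOpenAI
    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-02-01",
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # the name of your Azure deployment
        messages=build_messages("You are a helpful voice assistant.", transcript),
    )
    return resp.choices[0].message.content
```

The returned text would then be handed to the speech synthesizer to be spoken aloud.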



<h2 class="wp-block-heading" id="h-implementation-detailed-code-walkthrough">Implementation: Detailed Code Walkthrough</h2>



<p>Let&#8217;s start with the implementation! We kick things off with <strong>Azure Service Authentication</strong>, where we set up our conversational voice bot to communicate with Azure and OpenAI&#8217;s advanced services. Then, <strong>Speech Recognition</strong> steps in, acting as our bot&#8217;s ears, converting spoken words into text. Next up, <strong>Processing and Response Generation</strong> uses OpenAI&#8217;s GPT-4 to turn text into context-aware responses. <strong>Speech Synthesis</strong> then gives our bot a voice, transforming text responses back into spoken words for a natural conversational flow. Finally, <strong>Managing the Conversation</strong> keeps the dialogue coherent and engaging. Through these steps, we create a voice bot that offers an intuitive and engaging conversational experience. Let&#8217;s discuss these sections one by one. </p>



<p>As always, you can find the code on GitHub:</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns14291_9e80c9-12"><a class="kb-button kt-button button kb-btn14291_e0f677-ca kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/AzureOpenAIVoiceAssistant/tree/main" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn14291_8c1344-11 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly Github Repo </span></a></div>



<h3 class="wp-block-heading">Step #1 Azure Service Authentication </h3>



<p>First, we get our ducks in a row with <strong>Azure Service Authentication</strong>. This is where the magic starts, setting the stage for our conversational voice bot to interact with Azure&#8217;s brainy suite of Cognitive Services and the fantastic OpenAI models. By fetching API keys and setting up our service regions, we&#8217;re essentially giving our project the keys to the kingdom.</p>



<p>To use dotenv, create a .env file in your project&#8217;s root folder. <a href="https://pypi.org/project/python-dotenv/">Here</a> is more information on how this works.</p>
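<p>For reference, a minimal .env file for this project might look like the following. The variable names match the constants read in the code below; the values are placeholders, not real credentials.</p>

```shell
# Azure AI Speech credentials (placeholder values)
SPEECH_KEY=your-speech-resource-key
SERVICE_REGION=westeurope

# Azure OpenAI credentials (placeholder values)
OPENAI_API_KEY=your-azure-openai-key
OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
```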



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">import os
from dotenv import load_dotenv
import azure.cognitiveservices.speech as speechsdk
from openai import AzureOpenAI

# Load environment variables from .env file
load_dotenv()

# Constants from .env file
SPEECH_KEY = os.getenv('SPEECH_KEY')
SERVICE_REGION = os.getenv('SERVICE_REGION')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
OPENAI_ENDPOINT = os.getenv('OPENAI_ENDPOINT')

# Azure Speech Configuration
speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SERVICE_REGION)
speech_config.speech_recognition_language = &quot;en-US&quot;
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# OpenAI Configuration
openai_client = AzureOpenAI(
    api_key=OPENAI_API_KEY,
    api_version=&quot;2023-12-01-preview&quot;,
    azure_endpoint=OPENAI_ENDPOINT
)
</pre></div>



<h3 class="wp-block-heading">Step #2 Speech Recognition</h3>



<p>The user&#8217;s spoken input is captured and converted into text using Azure&#8217;s Speech-to-Text service. This involves initializing the speech recognition service with Azure credentials and configuring it to listen for and transcribe spoken language in real-time.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def recognize_from_microphone():
    # Configure the recognizer to use the default microphone.
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    # Create a speech recognizer with the specified audio and speech configuration.
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    print(&quot;Speak into your microphone.&quot;)
    # Perform speech recognition and wait for a single utterance.
    speech_recognition_result = speech_recognizer.recognize_once_async().get()

    # Process the recognition result based on its reason.
    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print(&quot;Recognized: {}&quot;.format(speech_recognition_result.text))
        # Return the recognized text if speech was recognized.
        return speech_recognition_result.text
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print(&quot;No speech could be recognized: {}&quot;.format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print(&quot;Speech Recognition canceled: {}&quot;.format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print(&quot;Error details: {}&quot;.format(cancellation_details.error_details))
            print(&quot;Did you set the speech resource key and region values?&quot;)
    # Return 'error' if recognition failed or was canceled.
    return 'error'</pre></div>



<h3 class="wp-block-heading">Step #3 Processing and Response Generation</h3>



<p>Once we&#8217;ve got your words neatly transcribed, it&#8217;s time for the <strong>Processing and Response Generation</strong> phase. This is where OpenAI steps in, acting like the brain behind the operation. It takes your spoken words, now in text form, and churns out responses that are nothing short of conversational gold. We nudge OpenAI&#8217;s GPT-4 into generating replies that feel as natural as chatting with a close friend over coffee. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def openai_request(conversation, temperature=0.9, model_engine='gpt-4'):
    # Send a request to Azure OpenAI with the conversation context and
    # return the generated completion. The AzureOpenAI client was
    # initialized once during authentication (Step #1).
    response = openai_client.chat.completions.create(model=model_engine, messages=conversation, temperature=temperature, max_tokens=500)
    return response.choices[0].message.content</pre></div>



<h3 class="wp-block-heading">Step #4 Speech Synthesis</h3>



<p>Next up, we tackle <strong>Speech Synthesis</strong>. If the previous step was the brain, consider this the voice of our operation. Taking the AI-generated text, we transform it back into speech—like turning lead into gold, but for conversations. Through Azure&#8217;s Text-to-Speech service, we&#8217;re able to give our bot a voice that&#8217;s not only clear but also surprisingly human-like. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">def synthesize_audio(input_text):
    # Define SSML (Speech Synthesis Markup Language) for input text.
    ssml = f&quot;&quot;&quot;
        &lt;speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'&gt;
            &lt;voice name='en-US-OnyxMultilingualNeuralHD'&gt;
                &lt;p&gt;
                    {input_text}
                &lt;/p&gt;
            &lt;/voice&gt;
        &lt;/speak&gt;
        &quot;&quot;&quot;
    
    audio_filename_path = &quot;audio/ssml_output.wav&quot;  # Define the output audio file path.
    print(ssml)
    # Synthesize speech from the SSML and wait for completion.
    result = speech_synthesizer.speak_ssml_async(ssml).get()

    # Save the synthesized audio to a file if synthesis was successful.
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        with open(audio_filename_path, &quot;wb&quot;) as audio_file:
            audio_file.write(result.audio_data)
        print(f&quot;Speech synthesized and saved to {audio_filename_path}&quot;)
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print(f&quot;Speech synthesis canceled: {cancellation_details.reason}&quot;)
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print(f&quot;Error details: {cancellation_details.error_details}&quot;)


# Create the audio directory if it doesn't exist.
if not os.path.exists('audio'):
    os.makedirs('audio')</pre></div>



<h3 class="wp-block-heading">Step #5 Managing the Conversation</h3>



<p>Finally, we bring it all together in the Managing the Conversation step. This is where we ensure the chat keeps rolling, looping through listening, thinking, and speaking. We keep track of what&#8217;s been said to keep the conversation relevant and engaging. </p>



<p>The system message below makes the bot talk like a pirate. You can easily adjust the system message and in this way customize the bot&#8217;s behavior.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">conversation = [{&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: &quot;You are a helpful assistant that talks like a pirate. If you encounter any issues, just tell a pirate joke or a story.&quot;}]

while True:
    user_input = recognize_from_microphone()  # Recognize user input from the microphone.
    if user_input == 'error':
        continue  # Skip this turn if speech recognition failed or was canceled.
    conversation.append({&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: user_input})  # Add user input to the conversation context.

    assistant_response = openai_request(conversation)  # Get the assistant's response based on the conversation.

    conversation.append({&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: assistant_response})  # Add the assistant's response to the context.

    print(assistant_response)
    synthesize_audio(assistant_response)  # Synthesize the assistant's response into audio.</pre></div>



<p>Throughout these steps, the conversation&#8217;s context is managed meticulously to ensure continuity and relevance in the bot&#8217;s responses, making the interaction feel more like a dialogue between humans.</p>
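<p>One practical aspect of context management is keeping the conversation history from growing without bound, since every past message is re-sent to the model on each turn. The sketch below shows one simple strategy, keeping the system message plus only the most recent turns; the window size of six messages is an arbitrary choice for illustration, and a production bot might count tokens or summarize older turns instead.</p>

```python
def trim_conversation(conversation, max_messages=6):
    """Keep the system message plus the most recent messages.

    Illustrative strategy only: real systems might count tokens or
    summarize older turns instead of dropping them.
    """
    system_messages = [m for m in conversation if m["role"] == "system"]
    other_messages = [m for m in conversation if m["role"] != "system"]
    return system_messages + other_messages[-max_messages:]
```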



<h2 class="wp-block-heading">Current Challenges</h2>



<p>Despite the promising capabilities of our voice bot, the journey through its development and interaction has presented a few challenges that underscore the complexity of human-machine communication.</p>



<h4 class="wp-block-heading">Slow Response Time</h4>



<p>One of the notable hurdles is the slow response time experienced during interactions. The process, from speech recognition through to response generation and back to speech synthesis, involves several steps that can introduce latency. This latency can detract from the user experience, as fluid conversations typically require quick exchanges. Optimizing the interaction flow and exploring more efficient data processing methods may mitigate this issue in the future.</p>
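<p>A first step toward reducing latency is measuring where the time actually goes. The sketch below times each stage of the pipeline with a small context manager; the stage names are illustrative, and the real recognition, completion, and synthesis calls would replace the placeholder statements.</p>

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage):
    """Record the wall-clock duration of one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# In the main loop, each stage would be wrapped like this:
with timed("speech_recognition"):
    pass  # user_input = recognize_from_microphone()
with timed("openai_request"):
    pass  # assistant_response = openai_request(conversation)
with timed("speech_synthesis"):
    pass  # synthesize_audio(assistant_response)

print({stage: f"{seconds:.3f}s" for stage, seconds in timings.items()})
```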



<h4 class="wp-block-heading">Handling Pauses in Speech</h4>



<p>Another challenge lies in the bot&#8217;s handling of longer pauses while speaking. The current setup does not always allow users to pause thoughtfully without triggering the end of their input. This may sometimes lead to a situation where the model cuts off speech prematurely. This limitation points to the need for more sophisticated speech recognition algorithms capable of distinguishing between a conversational pause and the end of an utterance.</p>
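<p>The underlying problem can be illustrated with a toy heuristic: treat a pause as the end of an utterance only once the trailing silence exceeds a minimum duration. The function below works on hypothetical per-frame audio energy values; all thresholds are illustrative, not tuned values, and a real fix would live inside the speech recognizer&#8217;s configuration rather than application code.</p>

```python
def is_end_of_utterance(frame_energies, silence_threshold=0.01,
                        min_silence_frames=30):
    """Return True only when the last `min_silence_frames` frames all
    fall below the energy threshold, i.e. the speaker has stopped.

    A shorter, thoughtful pause does not end the utterance.
    """
    if len(frame_energies) < min_silence_frames:
        return False
    tail = frame_energies[-min_silence_frames:]
    return all(energy < silence_threshold for energy in tail)
```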



<h2 class="wp-block-heading">Summary</h2>



<p>This article has shown how you can build a conversational voice bot in Python with the latest pretrained AI models. The project showcases the incredible potential of combining Azure Cognitive Services with OpenAI&#8217;s conversational models. I hope by now you understand the technical feasibility of creating voice-based applications and how they open up a world of possibilities for human-machine interaction. As we continue to refine and enhance this technology, the line between talking to a machine and conversing with a human will become ever more blurred, leading us into a future where AI companionship becomes reality.</p>



<p>This exploration of Azure Cognitive Services and OpenAI&#8217;s integration within a voice bot is just the beginning. As AI continues to evolve, the ways in which we interact with technology will undoubtedly transform, making our interactions more natural, intuitive, and, most importantly, human.</p>



<p>Also: <a href="https://www.relataly.com/business-use-cases-for-openai-gpt-models-chatgpt-davinci/12200/" target="_blank" rel="noreferrer noopener">9 Business Use Cases of OpenAI&#8217;s ChatGPT</a></p>



<h2 class="wp-block-heading">Sources and Further Reading</h2>



<ul class="wp-block-list">
<li><a href="https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/announcing-openai-text-to-speech-voices-on-azure-openai-service/ba-p/4049696" target="_blank" rel="noreferrer noopener">Microsoft announces OpenAI text-to-speech voices</a></li>



<li><a href="https://azure.microsoft.com/en-us/products/ai-services/openai-service" target="_blank" rel="noreferrer noopener">Azure OpenAI Service Documentation</a></li>



<li><a href="https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-openai-service-announces-assistants-api-new-models-for/ba-p/4049940" target="_blank" rel="noreferrer noopener">Recent Azure OpenAI Update</a></li>



<li><a href="https://azure.microsoft.com/en-us/products/ai-services/ai-speech" target="_blank" rel="noreferrer noopener">AI Speech Service Documentation</a></li>
</ul>
<p>The post <a href="https://www.relataly.com/voice-conversations-with-azure-ai/14291/">Building a Conversational Voice Bot with Azure OpenAI and Python: The Future of Human and Machine Interaction</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/voice-conversations-with-azure-ai/14291/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		<enclosure url="https://www.relataly.com/wp-content/uploads/2024/02/ssml_output.wav" length="1256846" type="audio/wav" />

		<post-id xmlns="com-wordpress:feed-additions:1">14291</post-id>	</item>
		<item>
		<title>Text-to-SQL with LLMs &#8211; Embracing the Future of Data Interaction</title>
		<link>https://www.relataly.com/text-to-sql-with-llms-embracing-the-future-of-data-interaction/14234/</link>
					<comments>https://www.relataly.com/text-to-sql-with-llms-embracing-the-future-of-data-interaction/14234/#respond</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Thu, 28 Dec 2023 11:07:58 +0000</pubDate>
				<category><![CDATA[Finance]]></category>
		<category><![CDATA[Healthcare]]></category>
		<category><![CDATA[Insurance]]></category>
		<category><![CDATA[Logistics]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[Text-to-sql]]></category>
		<category><![CDATA[AI in Finance]]></category>
		<category><![CDATA[Beginner Tutorials]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=14234</guid>

					<description><![CDATA[<p>In an age where data is the cornerstone of decision-making, the ability to interact seamlessly with databases is invaluable. This is where Text-to-SQL, powered by Large Language Models (LLMs), is revolutionizing the way we handle data. But what exactly is Text-to-SQL, and how are LLMs like GPT-3 and Google&#8217;s PaLM making a difference? Text-to-SQL technology ... <a title="Text-to-SQL with LLMs &#8211; Embracing the Future of Data Interaction" class="read-more" href="https://www.relataly.com/text-to-sql-with-llms-embracing-the-future-of-data-interaction/14234/" aria-label="Read more about Text-to-SQL with LLMs &#8211; Embracing the Future of Data Interaction">Read more</a></p>
<p>The post <a href="https://www.relataly.com/text-to-sql-with-llms-embracing-the-future-of-data-interaction/14234/">Text-to-SQL with LLMs &#8211; Embracing the Future of Data Interaction</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In an age where data is the cornerstone of decision-making, the ability to interact seamlessly with databases is invaluable. This is where Text-to-SQL, powered by Large Language Models (LLMs), is revolutionizing the way we handle data. But what exactly is Text-to-SQL, and how are LLMs like GPT-3 and Google&#8217;s PaLM making a difference? </p>



<p>Text-to-SQL technology is a bridge between natural language and database queries. Traditionally, querying databases required proficiency in SQL, a barrier for many. Text-to-SQL changes this, enabling queries in plain language that are then translated into SQL. </p>



<p class="has-text-align-left">Instead of writing an SQL query such as: </p>



<p class="has-global-color-11-background-color has-background"><code>SELECT name FROM employees WHERE department = 'Sales';</code> </p>



<p>to find out the names of all employees in the Sales department, imagine simply asking: </p>



<p class="has-contrast-2-background-color has-background">&#8220;<em>Who are the employees in the Sales department?</em>&#8221; </p>



<p>Text-to-SQL applications are increasingly gaining traction, offering a user-friendly bridge between the intricate world of SQL queries and the straightforwardness of business language. </p>



<p>In this article, we&#8217;ll delve into three key strategies for implementing Text-to-SQL applications: the Context-Window Approach, the Retrieval-Augmented Generation (RAG) Approach, and End-to-End Fine-Tuning. Each of these methods offers unique advantages and challenges in the quest to make data more accessible and interactive.</p>



<p>Let&#8217;s get things started!</p>



<p>Also: <a href="https://www.relataly.com/business-use-cases-for-openai-gpt-models-chatgpt-davinci/12200/" target="_blank" rel="noreferrer noopener">9 Business Use Cases for OpenAI GPT</a></p>



<h2 class="wp-block-heading">LLM-Basics for Text-to-SQL</h2>



<p>LLMs like GPT-4 or LLaMA 2 are pretrained AI models that carry out tasks when presented with a prompt (the input). Among other things, they are capable of converting natural language into SQL statements. </p>



<p>Also: <a href="https://www.relataly.com/mastering-prompt-engineering-for-chatgpt-a-practical-guide-for-businesses/13134/">ChatGPT Prompt Engineering Guide: Practical Advice for Business Use Cases</a></p>



<p>If you just give them the input query along with the command to convert it to SQL, they will try to infer the schema in the most straightforward way. What usually works better is to give them additional information about the database, such as relationships, keys, and attributes. </p>



<p>A recent paper evaluates the capabilities of ChatGPT for converting natural language to SQL: <br><a href="https://arxiv.org/pdf/2305.09645.pdf" target="_blank" rel="noreferrer noopener">A comprehensive evaluation of ChatGPT’s zero-shot Text-to-SQL capability</a></p>
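<p>To make this concrete, the schema information is typically passed as part of a system message ahead of the user&#8217;s question. The sketch below assembles such a chat-style prompt; the schema text, instructions, and message layout are illustrative assumptions, not a fixed API.</p>

```python
def build_text_to_sql_messages(schema_description, user_question):
    """Assemble a chat prompt that shows the model the database schema
    before asking it to translate the question into SQL."""
    system_prompt = (
        "You translate natural language questions into SQL. "
        "Use only the tables and columns in this schema:\n"
        f"{schema_description}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question},
    ]

messages = build_text_to_sql_messages(
    "Table employees(id, name, department)",
    "Who are the employees in the Sales department?",
)
```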



<h2 class="wp-block-heading">Database Schema vs Token Limit</h2>



<p>The current state is that LLMs can only process a certain amount of data at once. This amount is defined by the token limit, or context window. For instance, the standard model GPT-3.5-Turbo has a context window of 16,000 tokens, equating to roughly 12,000 words of English text. While this is sufficient for simpler databases with a few tables, for more intricate schemas you will quickly hit that limit.</p>
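<p>Because of this limit, it is worth estimating how many tokens a schema description will consume before sending it. A crude rule of thumb for English text is about 1.3 tokens per word; exact counts require the model&#8217;s own tokenizer (for OpenAI models, the tiktoken library). The sketch below uses only the rough word-count estimate.</p>

```python
def estimate_tokens(text, tokens_per_word=1.3):
    """Rough token estimate from the word count.

    Rule-of-thumb only; use the model's tokenizer for exact counts.
    """
    return int(len(text.split()) * tokens_per_word)

schema_text = "Table: Employees - employee_id, name, department_id, role"
token_estimate = estimate_tokens(schema_text)
```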



<h3 class="wp-block-heading">From Zero-Shot to Few-Shot Learning</h3>



<p>LLMs can fulfill many tasks out of the box, just by receiving the task description. However, you can typically improve their performance and tailor answers to your expectations by providing them with examples. The same is true for Text-to-SQL, in particular when you want the LLM to understand your data structure. </p>



<p>We differentiate between providing no examples (zero-shot), a single example (one-shot), and multiple examples (few-shot). The table below provides a few examples of these three types. The number of examples we can provide is limited by the context window. In general, adding more (high-quality) examples will lead to better and more consistent results, but it will also increase costs and latency, because the LLM needs to process the additional data. In the end, it is about finding a sound balance. </p>



<figure class="wp-block-table"><table><thead><tr><th>Learning Type</th><th>Natural Language Query Example</th><th>Hypothetical SQL Conversion</th></tr></thead><tbody><tr><td><strong>Zero-Shot</strong></td><td>&#8220;List all products priced above $100.&#8221;</td><td><code>SELECT * FROM products WHERE price &gt; 100;</code></td></tr><tr><td><strong>One-Shot</strong></td><td>Q: &#8220;Show me all employees in the marketing department&#8221; <br>A: <code>SELECT * FROM employees WHERE department = 'Marketing';</code> <br>Q: &#8220;Find all orders placed in July 2021.&#8221;</td><td><code>SELECT * FROM orders WHERE order_date BETWEEN '2021-07-01' AND '2021-07-31';</code></td></tr><tr><td><strong>Few-Shot</strong></td><td>Q: &#8220;Show employees in the IT department&#8221; <br>A: <code>SELECT * FROM employees WHERE department = 'IT';</code> <br>Q: &#8220;List products in category &#8216;Electronics&#8217;&#8221; <br>A: <code>SELECT * FROM products WHERE category = 'Electronics';</code> <br>Q: &#8220;Display orders that were canceled.&#8221;</td><td><code>SELECT * FROM orders WHERE status = 'Canceled';</code></td></tr></tbody></table><figcaption class="wp-element-caption">Zero-Shot, One-Shot and Few-Shot Learning: Samples for Text-to-SQL</figcaption></figure>
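<p>The few-shot pattern from the table can be assembled programmatically. The sketch below encodes each example as a user/assistant message pair ahead of the new question; the example queries are taken from the table above, while the function and system prompt are illustrative.</p>

```python
def build_few_shot_messages(examples, question):
    """Build a chat message list: question/SQL example pairs followed
    by the new question to translate."""
    messages = [{"role": "system",
                 "content": "Translate natural language to SQL."}]
    for nl_query, sql in examples:
        messages.append({"role": "user", "content": nl_query})
        messages.append({"role": "assistant", "content": sql})
    messages.append({"role": "user", "content": question})
    return messages

examples = [
    ("Show employees in the IT department",
     "SELECT * FROM employees WHERE department = 'IT';"),
    ("List products in category 'Electronics'",
     "SELECT * FROM products WHERE category = 'Electronics';"),
]
messages = build_few_shot_messages(examples, "Display orders that were canceled.")
```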



<div style="height:25px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading">LLM-Based Approaches for Implementing Text-to-SQL</h2>



<p>The integration of Text-to-SQL with Large Language Models (LLMs) offers a groundbreaking way to bridge natural language and database queries. However, there is no one-size-fits-all approach: depending on the complexity of the database structure, a different approach may be suitable. </p>



<p>In this article, I will discuss three common approaches to implementing a Text-to-SQL solution: </p>



<ol class="wp-block-list">
<li><strong>Everything in Context Window</strong></li>



<li><strong>Retrieval-Augmented Generation</strong></li>



<li><strong>LLM Fine-Tuning</strong></li>
</ol>



<p>Understanding these different approaches is crucial for leveraging Text-to-SQL technology effectively. </p>



<p>The table below gives an overview of the three approaches:</p>



<figure class="wp-block-table"><table><thead><tr><th>Feature</th><th class="has-text-align-left" data-align="left">1. Everything in Context Window</th><th class="has-text-align-left" data-align="left">2. Retrieval-Augmented Generation</th><th class="has-text-align-left" data-align="left">3. LLM Fine-Tuning</th></tr></thead><tbody><tr><td><strong>How it works</strong></td><td class="has-text-align-left" data-align="left">Directly processes user query and adds database schema information into the LLM&#8217;s context window.</td><td class="has-text-align-left" data-align="left">Identifies key user intents and entities, then retrieves relevant schema information to feed into the LLM.</td><td class="has-text-align-left" data-align="left">Involves further training of a pre-trained LLM on specific data related to Text-to-SQL tasks.</td></tr><tr><td><strong>Advantages</strong></td><td class="has-text-align-left" data-align="left">Simple and straightforward for small-scale databases with few tables.</td><td class="has-text-align-left" data-align="left">More scalable and effective for complex databases; avoids overloading the context window.</td><td class="has-text-align-left" data-align="left">Tailors the LLM to specific use cases, offering high accuracy for complex and specialized queries.</td></tr><tr><td><strong>Limitations</strong></td><td class="has-text-align-left" data-align="left">Limited by the LLM&#8217;s token window size, leading to potential issues with complex databases.</td><td class="has-text-align-left" data-align="left">Requires an efficient mechanism to identify and retrieve relevant schema information.</td><td class="has-text-align-left" data-align="left">Resource-intensive in terms of data preparation and computational needs.</td></tr><tr><td><strong>Ideal Use Case</strong></td><td class="has-text-align-left" data-align="left">Suitable for simpler databases with straightforward relationships.</td><td class="has-text-align-left" data-align="left">Effective for databases with complex relationships and structures.</td><td class="has-text-align-left" data-align="left">Best for specialized applications where high precision and domain specificity are required.</td></tr><tr><td><strong>Ease of Implementation</strong></td><td class="has-text-align-left" data-align="left">Relatively easy to implement but with limited scalability.</td><td class="has-text-align-left" data-align="left">Moderately complex; relies on efficient data retrieval mechanisms.</td><td class="has-text-align-left" data-align="left">Complex and demands significant investment in data preparation and fine-tuning.</td></tr></tbody></table><figcaption class="wp-element-caption">Overview of three approaches for building Text-to-SQL applications.</figcaption></figure>



<p>Let&#8217;s look into these approaches in more detail.</p>



<h2 class="wp-block-heading"><strong>Approach 1: Everything in Context Window</strong></h2>



<p>The most straightforward method for implementing Text-to-SQL is the &#8220;Everything in Context Window&#8221; approach. The idea is to input a simplified version of your database schema directly into the context window of the LLM. This method is particularly useful for enabling the LLM to understand and generate SQL queries based on natural language input that corresponds to your specific database structure. </p>



<p>Here&#8217;s a more detailed description of this approach:</p>



<h3 class="wp-block-heading">Key Aspects of the Context Window Approach</h3>



<ol class="wp-block-list">
<li><strong>Schema Simplification:</strong>
<ul class="wp-block-list">
<li>The goal is to distill the database schema down to its core elements. This means including table names, column names, primary keys, foreign keys, and data types.</li>



<li>The simplified schema should be concise yet comprehensive enough for the LLM to understand the relationships and constraints within the database.</li>
</ul>
</li>



<li><strong>Formatted Text Input:</strong>
<ul class="wp-block-list">
<li>The schema is typically formatted as plain text. This could be in the form of a list or a table-like structure that is easy for the model to parse.</li>



<li>Consistency in formatting across different tables and relationships is crucial for clarity.</li>
</ul>
</li>



<li><strong>Inclusion of Relationships:</strong>
<ul class="wp-block-list">
<li>Clearly indicate how different tables are related. Specify which columns serve as primary keys and which are foreign keys that link to other tables.</li>



<li>Describing relationships is vital for the LLM to accurately generate SQL queries involving joins and complex queries.</li>
</ul>
</li>
</ol>



<h3 class="wp-block-heading">Example of a Simplified Schema Format</h3>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">Database Schema Overview:

Table: Employees
- employee_id (Primary Key)
- name
- department_id (Foreign Key: Departments.department_id)
- role

Table: Departments
- department_id (Primary Key)
- department_name

Relationships:
- Employees.department_id -&gt; Departments.department_id (Employees are linked to Departments)</pre></div>
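<p>In code, this approach amounts to little more than concatenating the schema text and the user question into a single prompt. Here is a minimal Python sketch; the prompt wording and helper name are illustrative choices, not a fixed API, and the model call itself is left out:</p>

```python
def build_text_to_sql_prompt(schema_text: str, user_question: str) -> str:
    """Combine a simplified schema and a user question into one LLM prompt."""
    return (
        "You are a Text-to-SQL assistant.\n"
        "Given the database schema below, answer with a single SQL query.\n\n"
        f"{schema_text}\n\n"
        f"Question: {user_question}\n"
        "SQL:"
    )

# The example schema from above, as plain text.
schema = """Table: Employees
- employee_id (Primary Key)
- name
- department_id (Foreign Key: Departments.department_id)
- role

Table: Departments
- department_id (Primary Key)
- department_name"""

prompt = build_text_to_sql_prompt(schema, "List all employees in the Sales department.")
print(prompt)
```

<p>The resulting string would then be sent to the LLM of your choice as a single completion or chat request.</p>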



<h3 class="wp-block-heading">Advantages &amp; Limitations</h3>



<p>This approach gives the model an immediate understanding of the database structure, leading to more accurate SQL query generation. It is easy to implement and cost-efficient for simple databases with few tables and simple relationships.</p>



<p>A few things to consider:</p>



<ul class="wp-block-list">
<li>Be mindful of the LLM’s context window size limitations. Overloading the context window with too much information can lead to less effective query generation. </li>



<li>While the schema should be detailed enough to provide a clear understanding, it should also be concise to prevent overwhelming the model.</li>



<li>The effectiveness of this approach can vary depending on the specific LLM&#8217;s training and capabilities in parsing and utilizing the provided schema information.</li>
</ul>
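<p>The first point can be turned into a quick plausibility check before each request. The four-characters-per-token heuristic and the limits below are rough assumptions; an exact count requires the model&#8217;s own tokenizer (e.g. a library such as tiktoken for OpenAI models):</p>

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_context_window(schema_text: str, question: str,
                        context_limit: int = 8192,
                        reserve_for_answer: int = 512) -> bool:
    """Check whether schema + question plausibly fit the context window,
    keeping some tokens in reserve for the generated SQL answer."""
    used = rough_token_estimate(schema_text) + rough_token_estimate(question)
    return used + reserve_for_answer <= context_limit

# A small schema fits easily; a very large one should be moved to RAG instead.
print(fits_context_window("Table: Employees ...", "List all employees."))
```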



<h2 class="wp-block-heading"><strong>Approach 2: Augmentation Retrieval Generation (RAG)</strong></h2>



<p>Retrieval-Augmented Generation (RAG) is a more dynamic method of integrating Text-to-SQL with Large Language Models (LLMs). Suppose the LLM needs a large amount of metadata about the structured data to generate correct SQL. If that metadata becomes too extensive, an approach that relies purely on the context window won&#8217;t work. The next best alternative is to structure and store the meta information about the database, its tables, and relationships in a knowledge base. We then retrieve only the information that is needed to process the user query. Let&#8217;s look at this approach in more detail.</p>



<h3 class="wp-block-heading">The Two Phases of the RAG Approach</h3>



<p>Unlike directly inputting all database metadata into the LLM, the RAG approach operates in two phases.</p>



<ol class="wp-block-list">
<li><strong>Identification Phase:</strong>
<ul class="wp-block-list">
<li>The LLM first processes the user&#8217;s natural language query to identify key entities and the user&#8217;s intent.</li>



<li>This phase focuses on understanding what the user is asking for without yet delving into database specifics.</li>
</ul>
</li>



<li><strong>Retrieval and Augmentation Phase:</strong>
<ul class="wp-block-list">
<li>The system then performs a targeted search in a separate database or knowledge base to retrieve relevant metadata. This could involve fetching information about specific tables, columns, or relationships pertinent to the user’s query.</li>



<li>This retrieved information is then augmented or combined with the original user query, creating an enriched context for the LLM.</li>
</ul>
</li>
</ol>
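<p>The two phases can be sketched in a few lines of Python. In this toy sketch the knowledge base is a plain dictionary and &#8220;retrieval&#8221; is a naive keyword match; a real system would typically use embeddings and a vector store, but the two-phase flow is the same. Table names and descriptions are hypothetical:</p>

```python
# Hypothetical metadata knowledge base: one short description per table.
METADATA = {
    "Transactions": "Table Transactions: client_name, transaction_date, amount",
    "Employees": "Table Employees: employee_id, name, department_id, role",
    "Departments": "Table Departments: department_id, department_name",
}

def identify_entities(user_query: str) -> list:
    """Phase 1 (simplified): find which tables the query seems to reference."""
    q = user_query.lower()
    return [t for t in METADATA if t.lower().rstrip("s") in q]

def retrieve_and_augment(user_query: str) -> str:
    """Phase 2: fetch only the matching metadata and build the enriched prompt."""
    relevant = [METADATA[t] for t in identify_entities(user_query)]
    return ("Relevant schema:\n" + "\n".join(relevant)
            + f"\n\nQuestion: {user_query}\nSQL:")

print(retrieve_and_augment("Show me the latest transactions of client John Doe."))
```

<p>Only the <code>Transactions</code> metadata ends up in the prompt, keeping the context small even if the knowledge base holds hundreds of tables.</p>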



<h3 class="wp-block-heading">End-to-End Example</h3>



<p><strong>Natural Language Query:</strong> </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">&quot;Show me the latest transactions of client John Doe.&quot;</pre></div>



<p><strong>Identification Phase:</strong></p>



<ul class="wp-block-list">
<li>The LLM analyzes this query to identify key entities and intents. Here, the key entities are &#8220;transactions&#8221; and &#8220;John Doe,&#8221; and the intent is to retrieve recent transaction records for this specific client.</li>
</ul>



<p><strong>Retrieval and Augmentation Phase:</strong></p>



<ul class="wp-block-list">
<li>The system then searches an external database or knowledge base for metadata related to &#8220;transactions&#8221; and &#8220;John Doe.&#8221;</li>



<li>It might retrieve information like the table where transactions are stored (e.g., <code>Transactions</code> table), the relevant columns (e.g., <code>client_name</code>, <code>transaction_date</code>, <code>amount</code>), and the specific client details (e.g., records where <code>client_name = 'John Doe'</code>).</li>
</ul>



<p><strong>Enriched Query for LLM:</strong></p>



<ul class="wp-block-list">
<li>The retrieved information is combined with the original query, forming an enriched context. The LLM now understands that it needs to generate a SQL query for the <code>Transactions</code> table, specifically targeting records related to &#8220;John Doe&#8221; and focusing on the most recent entries.</li>
</ul>



<p><strong>Resultant SQL Query (Hypothetical):</strong></p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">SELECT * FROM Transactions WHERE client_name = 'John Doe' ORDER BY transaction_date DESC LIMIT 10;</pre></div>



<p>In this example, the RAG approach effectively breaks down the process, initially focusing on understanding the user&#8217;s query and then retrieving specific database details necessary for formulating an accurate SQL query. This approach allows for handling complex queries in a more structured and efficient manner.</p>



<h3 class="wp-block-heading">Advantages &amp; Limitations</h3>



<p>Compared to a pure Context Window Approach, the RAG approach is better suited for larger and more complex databases, as it avoids overwhelming the LLM with excessive information at once. By providing only relevant information, it maintains the model&#8217;s efficiency and improves the accuracy of the generated SQL queries. It can handle dynamic queries more effectively as it retrieves and processes information based on each specific query.</p>



<p>While more scalable than the context window approach, RAG can still struggle with extremely complex databases, particularly those with many intricate relationships. The approach may face challenges in maintaining consistency in query responses, especially when dealing with varying or ambiguous user intents. The effectiveness of this approach is partly contingent on the robustness and accuracy of the external information retrieval system. Considering how to structure the information about tables and relationships is key. Duplicate or similar names may pose additional challenges.</p>



<h2 class="wp-block-heading"><strong>Approach 3:</strong> <strong>LLM Fine-Tuning</strong></h2>



<p>One of the most potent strategies in integrating Text-to-SQL with LLMs is the fine-tuning approach. This method involves custom training of a pre-trained LLM on specific datasets relevant to the particular database and use case. Fine-tuning allows the model to adapt to the unique characteristics and requirements of a specific domain or dataset, thus improving its ability to generate accurate SQL queries from natural language inputs.</p>



<h3 class="wp-block-heading">The Process of Fine-Tuning</h3>



<ol class="wp-block-list">
<li><strong>Dataset Preparation:</strong> This step involves creating or assembling a dataset that is representative of the specific use case. For a Text-to-SQL application, this would typically include pairs of natural language queries and their corresponding SQL queries, tailored to the specific database schema.</li>



<li><strong>Initial Model Training:</strong> The process begins with a pre-trained LLM, such as GPT-3 or BERT, which has already learned a broad array of language patterns and structures.</li>



<li><strong>Custom Training (Fine-Tuning):</strong> The model is then further trained (fine-tuned) on the prepared dataset. This stage helps the model to align its language understanding capabilities with the specific patterns, terminology, and structures found in the target domain or database.</li>



<li><strong>Iterative Refinement:</strong> Fine-tuning is often an iterative process. The model&#8217;s performance is continuously evaluated and refined based on feedback and performance metrics. This could involve adjusting training parameters, adding more data, or tweaking the model architecture.</li>
</ol>



<h3 class="wp-block-heading">Example of Data Preparation for Fine-Tuning:</h3>



<p>The dataset should consist of pairs of natural language queries and their respective SQL queries. These pairs act as examples that the model will learn from.</p>



<ol class="wp-block-list">
<li><strong>Natural Language Query:</strong>
<ul class="wp-block-list">
<li>This is a user&#8217;s question or request stated in everyday language.</li>



<li>Example: &#8220;What is the total revenue from sales this month?&#8221;</li>
</ul>
</li>



<li><strong>Corresponding SQL Query:</strong>
<ul class="wp-block-list">
<li>This is the SQL command that represents the natural language query.</li>



<li>Example: <code>SELECT SUM(revenue) FROM sales WHERE date BETWEEN '2021-07-01' AND '2021-07-31';</code></li>
</ul>
</li>
</ol>
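<p>For chat-style fine-tuning services, such pairs are commonly serialized as one JSON object per line (JSONL). A minimal sketch follows; the exact record shape mirrors the chat format used by OpenAI-style fine-tuning, and the second example pair is invented for illustration, so check your provider&#8217;s fine-tuning documentation for the required schema:</p>

```python
import json

# Hypothetical training pairs: natural language question -> SQL query.
pairs = [
    ("What is the total revenue from sales this month?",
     "SELECT SUM(revenue) FROM sales WHERE date BETWEEN '2021-07-01' AND '2021-07-31';"),
    ("How many employees work in the Sales department?",
     "SELECT COUNT(*) FROM Employees e JOIN Departments d "
     "ON e.department_id = d.department_id WHERE d.department_name = 'Sales';"),
]

def to_jsonl(pairs) -> str:
    """Serialize (question, sql) pairs into chat-style JSONL training records."""
    lines = []
    for question, sql in pairs:
        record = {"messages": [
            {"role": "system", "content": "Translate the question into SQL."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": sql},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_jsonl(pairs))
```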



<p><strong>Creating a Representative Dataset:</strong></p>



<ul class="wp-block-list">
<li>The dataset should cover a broad range of queries that reflect different types of SQL operations such as <code>SELECT</code>, <code>UPDATE</code>, <code>JOIN</code>, <code>GROUP BY</code>, etc.</li>



<li>It should include queries of varying complexities – from simple queries involving a single table to more complex ones that require joins across multiple tables.</li>
</ul>



<p><strong>Annotation and Accuracy:</strong></p>



<ul class="wp-block-list">
<li>Each pair in the dataset must be accurately annotated to ensure that the SQL query correctly represents the natural language query.</li>



<li>It’s crucial to verify the correctness of both the SQL queries and their natural language counterparts.</li>
</ul>



<p><strong>Diversity and Domain-Specific Data:</strong></p>



<ul class="wp-block-list">
<li>The dataset should be diverse, covering different aspects and structures within the database.</li>



<li>For domain-specific applications, include terminology and query structures relevant to that domain.</li>
</ul>



<h3 class="wp-block-heading">Advantages &amp; Limitations</h3>



<p>The fine-tuning approach is the most sophisticated of the three and best suited for complex databases with intricate relationships and many attributes. While fine-tuning offers a tailored and often more accurate approach, it is generally more resource-intensive and costly. It requires a significant investment in terms of data preparation and computational resources but can yield superior results, especially for specialized or complex applications. Also be aware that you need high-quality training data, which can be challenging to generate. In general, the more complex your database and the possible user queries, the more training data will be required. Also consider that changes to the database will require you to repeat the fine-tuning process and add new training data, which can be difficult in fast-changing environments.</p>



<p>Each of these approaches has its unique strengths and is suitable for different scenarios in Text-to-SQL applications. The choice depends on factors such as the complexity of the database, the volume of data, and the specific requirements of the application.</p>



<h2 class="wp-block-heading">Additional Considerations</h2>



<p>Finally, a few additional things to consider when building Text-to-SQL applications with LLMs.</p>



<ul class="wp-block-list">
<li><strong>Combining Approaches:</strong> For sophisticated use cases, a hybrid approach combining fine-tuning and RAG can be employed. This combination leverages the strengths of both methods, offering a robust solution for complex scenarios.</li>



<li><strong>Self-Correction Mechanism:</strong> Incorporating a self-correction mechanism into the Text-to-SQL process can significantly enhance the accuracy and reliability of the generated SQL queries. This involves the LLM identifying potential errors or ambiguities in its initial query generation based on the database response and iteratively refining its output. Self-correction is particularly valuable in dynamic environments where database schemas evolve or user queries vary significantly.</li>



<li><strong>Balancing Complexity and Performance:</strong> While self-correction adds a layer of sophistication, it also requires careful balance to avoid excessive computational demands. This feature is particularly beneficial in scenarios where accuracy is paramount, and resources permit iterative processing.</li>
</ul>
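<p>The self-correction mechanism can be expressed as a small retry loop: execute the generated SQL, and if the database raises an error, feed that error back to the model for a corrected attempt. Below is a sketch using Python&#8217;s built-in sqlite3 module; <code>generate_sql</code> is a stand-in for your actual LLM invocation, and the stub model here is hard-coded purely to demonstrate the loop:</p>

```python
import sqlite3

def self_correcting_query(conn, question, generate_sql, max_attempts=3):
    """Ask the model for SQL; on a database error, retry with the error as feedback."""
    feedback = ""
    for _ in range(max_attempts):
        sql = generate_sql(question, feedback)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Capture the failed query and the database error for the next attempt.
            feedback = f"Previous query failed: {sql!r} -> {exc}"
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts. {feedback}")

# Demo with an in-memory database and a stub that 'fixes itself' after one error.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (revenue REAL)")
conn.execute("INSERT INTO sales VALUES (100.0), (250.0)")

def stub_model(question, feedback):
    # First attempt references a wrong column; the error feedback triggers the fix.
    return "SELECT SUM(revenue) FROM sales" if feedback else "SELECT SUM(amount) FROM sales"

print(self_correcting_query(conn, "Total revenue?", stub_model))  # [(350.0,)]
```

<p>In production, the number of attempts bounds the extra latency and cost, which is exactly the complexity-versus-performance balance described above.</p>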



<h2 class="wp-block-heading">Summary</h2>



<p>The adoption of Text-to-SQL in business processes transcends mere convenience; it represents a pivotal stride towards making data access more democratic. This innovation empowers individuals without deep SQL expertise to retrieve and scrutinize data, significantly enhancing decision-making processes and streamlining business operations.</p>



<p>In this blog article, we delve into the transformative impact of integrating Text-to-SQL technology into business processes. We explore three primary approaches: Everything in Context Window, Retrieval-Augmented Generation (RAG), and LLM Fine-Tuning. Each approach is examined for its unique challenges and benefits, from intuitive database interactions to handling complex data structures and tailoring solutions to specific use cases.</p>



<p>Embrace this transformative journey with Text-to-SQL, and unlock the full potential of your data.</p>



<h2 class="wp-block-heading">Sources</h2>



<ul class="wp-block-list">
<li><a href="https://arxiv.org/pdf/2312.14725.pdf" target="_blank" rel="noreferrer noopener">Enhancing Text-to-SQL Translation for Financial System Design</a></li>



<li><a href="https://arxiv.org/pdf/2305.09645.pdf" target="_blank" rel="noreferrer noopener">A comprehensive evaluation of ChatGPT’s zero-shot Text-to-SQL capability</a></li>



<li><a href="https://arxiv.org/pdf/2305.09645.pdf" target="_blank" rel="noreferrer noopener">StructGPT: A General Framework for Large Language Model to Reason over Structured Data</a></li>



<li><a href="https://medium.com/@shivansh.kaushik/talk-to-your-database-using-rag-and-llms-42eb852d2a3c">Talk to your Database using RAG and LLMs</a></li>
</ul>
<p>The post <a href="https://www.relataly.com/text-to-sql-with-llms-embracing-the-future-of-data-interaction/14234/">Text-to-SQL with LLMs &#8211; Embracing the Future of Data Interaction</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/text-to-sql-with-llms-embracing-the-future-of-data-interaction/14234/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">14234</post-id>	</item>
		<item>
		<title>Building a Virtual AI Assistant (aka Copilot) for Your Software Application: Harnessing the Power of LLMs like ChatGPT</title>
		<link>https://www.relataly.com/building-a-digital-ai-assistant-for-your-software-application/14056/</link>
					<comments>https://www.relataly.com/building-a-digital-ai-assistant-for-your-software-application/14056/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Wed, 05 Jul 2023 12:45:27 +0000</pubDate>
				<category><![CDATA[ChatBots]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Healthcare]]></category>
		<category><![CDATA[Insurance]]></category>
		<category><![CDATA[Language Generation]]></category>
		<category><![CDATA[Logistics]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[OpenAI]]></category>
		<category><![CDATA[Responsible AI]]></category>
		<category><![CDATA[Retail]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Vector Databases]]></category>
		<category><![CDATA[AI in E-Commerce]]></category>
		<category><![CDATA[AI in Finance]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Intermediate Tutorials]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=14056</guid>

					<description><![CDATA[<p>Welcome to the dawn of a new era in digital interaction! With the advent of Generative AI, we&#8217;re witnessing a remarkable revolution that&#8217;s changing the very nature of how we interact with software and digital services. This change is monumental. Leading the charge are the latest generation of AI-powered virtual assistants, aka &#8220;AI copilots&#8221;. Unlike ... <a title="Building a Virtual AI Assistant (aka Copilot) for Your Software Application: Harnessing the Power of LLMs like ChatGPT" class="read-more" href="https://www.relataly.com/building-a-digital-ai-assistant-for-your-software-application/14056/" aria-label="Read more about Building a Virtual AI Assistant (aka Copilot) for Your Software Application: Harnessing the Power of LLMs like ChatGPT">Read more</a></p>
<p>The post <a href="https://www.relataly.com/building-a-digital-ai-assistant-for-your-software-application/14056/">Building a Virtual AI Assistant (aka Copilot) for Your Software Application: Harnessing the Power of LLMs like ChatGPT</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Welcome to the dawn of a new era in digital interaction! With the advent of <a href="https://www.relataly.com/openai-gpt-chatgpt-in-a-business-context-whats-the-value-proposition/12282/" target="_blank" rel="noreferrer noopener">Generative AI</a>, we&#8217;re witnessing a remarkable revolution that&#8217;s changing the very nature of how we interact with software and digital services. This change is monumental. Leading the charge are the latest generation of AI-powered virtual assistants, aka &#8220;AI copilots&#8221;. Unlike traditional narrow AI models, these are capable of understanding user needs, intents, and questions expressed in plain, natural language. </p>



<p>We are talking about nothing less than the next evolution in software design and user experience, driven by recent advances in generative AI and Large Language Models (LLMs) like <a href="https://openai.com/" target="_blank" rel="noreferrer noopener">OpenAI&#8217;s ChatGPT</a>, <a href="https://bard.google.com/?hl=en" target="_blank" rel="noreferrer noopener">Google Bard</a>, or <a href="https://www.anthropic.com/index/introducing-claude" target="_blank" rel="noreferrer noopener">Anthropic&#8217;s Claude</a>. </p>



<p>Thanks to LLMs, user interactions are no longer bound by the constraints of a traditional user interface with forms and buttons. Whether it&#8217;s creating a proposal in Word, editing an image, or opening a claim in an insurance app, users can express their needs in natural language &#8211; a profound change in our interactions with software and services. </p>



<p>Despite the hype around these new virtual AI assistants, our understanding of how to build an LLM-powered virtual assistant remains scant. So, if you wonder how to take advantage of LLMs and build a virtual assistant for your app, this article is for you. This post will probe into the overarching components needed to create a virtual AI assistant. We will look at the architecture and its components, including LLMs, knowledge store, cache, conversational logic, and APIs.</p>



<p><strong>Also: </strong></p>



<ul class="wp-block-list">
<li><a href="https://www.relataly.com/business-use-cases-for-openai-gpt-models-chatgpt-davinci/12200/" target="_blank" rel="noreferrer noopener">9 Business Use Cases of OpenAI’s ChatGPT</a></li>



<li><a href="https://www.relataly.com/eliminating-friction-how-openais-gpt-streamlines-online-experiences-and-reduces-the-need-for-google-searches/13171/" target="_blank" rel="noreferrer noopener">Using LLMs (OpenAI’s ChatGPT) to Streamline Digital Experiences</a></li>



<li><a href="https://www.relataly.com/mastering-prompt-engineering-for-chatgpt-a-practical-guide-for-businesses/13134/" target="_blank" rel="noreferrer noopener">ChatGPT Prompt Engineering Guide: Practical Advice for Business Use Cases</a></li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="1016" height="924" data-attachment-id="13898" data-permalink="https://www.relataly.com/?attachment_id=13898#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_several_pilots_standing_in_front_of_an_airplane_colorful_b9c1a19e-5c8b-497a-b3d5-c3eddc25f4e2-min.png" data-orig-size="1016,924" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Flo7up_several_pilots_standing_in_front_of_an_airplane_colorful_b9c1a19e-5c8b-497a-b3d5-c3eddc25f4e2-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_several_pilots_standing_in_front_of_an_airplane_colorful_b9c1a19e-5c8b-497a-b3d5-c3eddc25f4e2-min.png" src="https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_several_pilots_standing_in_front_of_an_airplane_colorful_b9c1a19e-5c8b-497a-b3d5-c3eddc25f4e2-min.png" alt="Image of human pilots standing in front of an airplane, symbolizing the role of AI Copilots in shaping our interaction with software and its design." 
class="wp-image-13898" srcset="https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_several_pilots_standing_in_front_of_an_airplane_colorful_b9c1a19e-5c8b-497a-b3d5-c3eddc25f4e2-min.png 1016w, https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_several_pilots_standing_in_front_of_an_airplane_colorful_b9c1a19e-5c8b-497a-b3d5-c3eddc25f4e2-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_several_pilots_standing_in_front_of_an_airplane_colorful_b9c1a19e-5c8b-497a-b3d5-c3eddc25f4e2-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_several_pilots_standing_in_front_of_an_airplane_colorful_b9c1a19e-5c8b-497a-b3d5-c3eddc25f4e2-min.png 768w" sizes="(max-width: 1016px) 100vw, 1016px" /><figcaption class="wp-element-caption">The new generation of virtual ai assistants inspires a profound change in the way we interact with software and digital services.</figcaption></figure>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h2 class="wp-block-heading">Virtual AI Assistants at the Example of Microsoft M365 Copilot</h2>



<p>Advances in virtual AI assistants are closely linked to ChatGPT and other LLMs from US-based startup OpenAI. Microsoft has forged a partnership with OpenAI to bring the latest advances in AI to their products and services. Microsoft has announced these &#8220;Copilots&#8221; across major applications, including M365 and the Power Platform. </p>



<p>Here are some capabilities of these Copilots within M365:</p>



<ul class="wp-block-list">
<li>In <strong>PowerPoint</strong>, Copilot allows users to create presentations based on a given context, such as a Word document, for example by stating &#8220;<em>Create a 10-slide product presentation based on the following product documentation.</em>&#8220;</li>



<li>In <strong>Word</strong>, Copilot can adjust the tone of writing a text or transform a few keywords into a complete paragraph. Simply type something like &#8220;<em>Create a proposal for a 3-month contract for customer XYZ based on doc ADF</em>.&#8221;</li>



<li>In <strong>Excel</strong>, Copilot helps users analyze datasets, as well as create or modify them. For example, it can summarize a dataset in natural language and describe trends. </li>



<li>Let&#8217;s not forget <strong>Outlook</strong>! Your new AI Copilot helps you organize your emails and calendar. It assists you in crafting email responses, scheduling meetings, and even provides summaries of key points from the ones you missed.</li>
</ul>



<p>However, these are merely a handful of examples. If you want to learn more about Copilot in M365, this YouTube video provides an excellent overview: <a href="https://www.youtube.com/watch?v=VlM9a469LE0" target="_blank" rel="noreferrer noopener">Microsoft 365 Copilot Explained: How Microsoft Just Changed the Future of Work</a>. The potential of AI copilots extends far beyond the scope of Office applications and can elevate any software or service to a new level. No wonder large software companies like <a href="https://www.reuters.com/technology/sap-ceo-huge-growth-potential-generative-ai-2023-06-28/" target="_blank" rel="noreferrer noopener">SAP</a> and Adobe have announced plans to upgrade their products with copilot features.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="982" height="848" data-attachment-id="13892" data-permalink="https://www.relataly.com/?attachment_id=13892#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/06/image-7.png" data-orig-size="982,848" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-7" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/06/image-7.png" src="https://www.relataly.com/wp-content/uploads/2023/06/image-7.png" alt="Microsoft has announced a whole series of copilots for its products, ranging from digital assistants in M365 office apps to its Azure cloud platform." class="wp-image-13892" srcset="https://www.relataly.com/wp-content/uploads/2023/06/image-7.png 982w, https://www.relataly.com/wp-content/uploads/2023/06/image-7.png 300w, https://www.relataly.com/wp-content/uploads/2023/06/image-7.png 512w, https://www.relataly.com/wp-content/uploads/2023/06/image-7.png 768w" sizes="(max-width: 982px) 100vw, 982px" /><figcaption class="wp-element-caption">Microsoft has <a href="https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/" target="_blank" rel="noreferrer noopener">announced a whole fleet of virtual AI assistants</a> for its products. These range from copilots in M365 office apps to services of its Azure cloud platform.</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading" id="h-how-llms-enable-a-new-generation-of-virtual-ai-assistants">How LLMs Enable a New Generation of Virtual AI Assistants</h2>



<p>Virtual AI assistants are anything but new. Indeed, their roots can be traced back to innovative ventures such as the paperclip assistant, <a href="https://en.wikipedia.org/wiki/Office_Assistant" target="_blank" rel="noreferrer noopener">Clippy, from Microsoft Word</a> &#8211; a pioneering attempt at enhancing user experience. Later on, this was followed by the introduction of conventional chatbots.</p>



<p>Nonetheless, these early iterations had their shortcomings. Their limited capacity to comprehend and assist users with tasks outside of their defined parameters hampered their success on a larger scale. The inability to adapt to a wider range of user queries and requests kept these virtual AI assistants confined within their initial scope, restricting their growth and wider acceptance. So if we talk about this next generation of virtual AI assistants, what has truly revolutionized the scene? In essence, the true innovation lies in the emergence of LLMs such as OpenAI&#8217;s GPT-4.</p>



<h3 class="wp-block-heading">LLMs &#8211; A Game Changer for Conversational User Interface Design</h3>



<p>Over time, advancements in machine learning, natural language processing, and vast data analytics transformed the capabilities of AI assistants. Modern AI models, like GPT-4, can understand context, engage in more human-like conversations, and offer solutions to a broad spectrum of queries. Furthermore, the integration of AI assistants into various devices and platforms, along with the increase in cloud computing, expanded their reach and functionality. These technological shifts have reshaped the scene, making AI assistants more adaptable, versatile, and user-friendly than ever before.</p>



<p>Take, for example, an AI model like GPT. A user might instruct, &#8220;Could you draft an email to John about the meeting tomorrow?&#8221; Not only would the AI grasp the essence of this instruction, but it could also produce a draft email seamlessly.</p>



<p>Yet, it&#8217;s not solely their adeptness at discerning user intent that sets LLMs apart. They also exhibit unparalleled proficiency in generating programmatic code to interface with various software functions. Imagine directing your software with, &#8220;Generate a pie chart that visualizes this year&#8217;s sales data by region,&#8221; and witnessing the software promptly fulfilling your command.</p>



<h3 class="wp-block-heading">A Revolution in Software Design and User Experience</h3>



<p>The advanced language understanding offered by LLMs unburdens developers from the painstaking task of constructing every possible dialog or function an assistant might perform. Rather, developers can harness the generative capabilities of LLMs and integrate them with their application&#8217;s API. This integration facilitates a myriad of user options without the necessity of explicitly designing them.</p>



<p>The outcome of this is far-reaching, extending beyond the immediate relief for developers. It sets the stage for a massive transformation in the software industry and the broader job market, affecting how developers are trained and what skills are prioritized. Furthermore, it alters our everyday interaction with technology, making it more intuitive and efficient. </p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h2 class="wp-block-heading">Components of a Modern Virtual AI Assistant aka AI Copilot</h2>



<p>By now you should have some idea of what modern virtual AI assistants are. Next, let&#8217;s look at the technical components that need to come together. </p>



<p>The illustration below displays the main components of an LLM-powered virtual AI assistant:</p>



<ul class="wp-block-list">
<li>A &#8211; Conversational UI for providing the user with a chat experience</li>



<li>B &#8211; LLMs such as GPT-3.5 or GPT-4 </li>



<li>C &#8211; Knowledge store for grounding your bot in enterprise data and dynamically providing few-shot examples. </li>



<li>D &#8211; Conversation logic for intent recognition and tracking conversations. </li>



<li>E &#8211; Application API as an interface to trigger and perform application functionality. </li>



<li>F &#8211; Cache for maintaining an instant mapping between often encountered user intents and structured LLM responses. </li>
</ul>



<p>Let&#8217;s look at these components in more detail.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="3531" height="2753" data-attachment-id="14173" data-permalink="https://www.relataly.com/building-a-digital-ai-assistant-for-your-software-application/14056/virtual-assistant-architecture-components-1/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png" data-orig-size="3531,2753" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="virtual-assistant-architecture-components-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png" src="https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png" alt="" class="wp-image-14173" srcset="https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png 3531w, https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png 300w, https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png 512w, https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png 768w, https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png 1536w, https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png 2048w, https://www.relataly.com/wp-content/uploads/2023/08/virtual-assistant-architecture-components-1.png 3300w" sizes="(max-width: 
1650px) 100vw, 1650px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h3 class="wp-block-heading">A) Conversational Application Frontend</h3>



<p>Incorporating virtual AI assistants into a software application or digital service often involves the use of a conversational user interface, typically embodied in a chat window that showcases previous interactions. The seamless integration of this interface as an intrinsic part of the application is vital.</p>



<p>A lot of applications employ a standard chatbot methodology, where the virtual AI assistant provides feedback to users in natural language or other forms of content within the chat window. Yet, a more dynamic and efficacious approach is to merge natural language feedback with alterations in the traditional user interface (UI). This dual approach not only enhances user engagement but also improves the overall user experience.</p>



<p>Microsoft&#8217;s M365 Copilot is a prime example of this approach. Instead of simply feeding responses back to the user in the chat window, the virtual assistant also manipulates elements in the traditional UI based on user input. It may highlight options, auto-fill data, or direct the user&#8217;s attention to certain parts of the screen. This combination of dynamic UI manipulation and natural language processing creates a more interactive and intuitive user experience, guiding the user toward their goal in a more efficient and engaging way.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="926" height="292" data-attachment-id="13999" data-permalink="https://www.relataly.com/?attachment_id=13999#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/07/image-5.png" data-orig-size="926,292" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-5" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/07/image-5.png" src="https://www.relataly.com/wp-content/uploads/2023/07/image-5.png" alt="Sample Copilot chat window in M365 Office" class="wp-image-13999" srcset="https://www.relataly.com/wp-content/uploads/2023/07/image-5.png 926w, https://www.relataly.com/wp-content/uploads/2023/07/image-5.png 300w, https://www.relataly.com/wp-content/uploads/2023/07/image-5.png 512w, https://www.relataly.com/wp-content/uploads/2023/07/image-5.png 768w" sizes="(max-width: 926px) 100vw, 926px" /><figcaption class="wp-element-caption">M365 Copilot chat window in M365 Office </figcaption></figure>



<p>When designing the UI for a virtual AI assistant, there are several key considerations. Firstly, the interface should be intuitive, ensuring users can easily navigate and understand how to interact with the AI. Secondly, the AI should provide feedback in a timely manner, so the user isn&#8217;t left waiting for a response. Thirdly, the system should be designed to handle errors gracefully, providing helpful error messages and suggestions when things don&#8217;t go as planned. Finally, the AI should keep the human in the loop and assist them in using AI safely. </p>



<p>Also: <a href="https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/" target="_blank" rel="noreferrer noopener">Building “Chat with your Data” Apps using Embeddings, ChatGPT, and Cosmos DB for Mongo DB vCore</a></p>



<h3 class="wp-block-heading">B) Large Language Model</h3>



<p>At the interface between users and the assistant sits the large language model. It translates users&#8217; requests and questions into code, actions, and responses that are shown to the user. Here, we are talking about foundational models like GPT-3.5-Turbo or GPT-4. In addition, if you are working with extensive content, you may use an embedding LLM that converts text or images into mathematical vectors as part of your knowledge store. An example of such an embedding model is text-embedding-ada-002.</p>



<p>It&#8217;s important to understand that the user is not directly interacting with the LLM. Instead, you may want to put some control logic between the user and the LLM that steers the conversation. This logic can enrich prompts with additional data from the knowledge store or an online search API such as Google or Bing. This process of injecting data into a prompt depending on the user input is known as <a href="https://arxiv.org/abs/2005.11401" target="_blank" rel="noreferrer noopener">Retrieval Augmented Generation</a>. </p>



<p>Typical tasks performed by the LLM: </p>



<ul class="wp-block-list">
<li>Generating natural language responses based on the user’s query and the retrieved data from the knowledge store.</li>



<li>Recognizing and classifying user intent.</li>



<li>Generating code snippets (or API requests) that can be executed by the application or the user to achieve a desired outcome in your application.</li>



<li>Converting content into embeddings to retrieve relevant information from a vector-based knowledge store.</li>



<li>Generating summaries, paraphrases, translations, or explanations of the retrieved data or the generated responses.</li>



<li>Generating suggestions, recommendations, or feedback for the user to improve their experience or achieve their goals.</li>
</ul>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h3 class="wp-block-heading">C) Knowledge Store</h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Let&#8217;s dive into the &#8220;Knowledge Store&#8221; and why it&#8217;s vital. You might think feeding a huge prompt explaining your app logic to an LLM, like ChatGPT, would work, but that&#8217;s not the case. As of June 2023, LLMs have context limits. For instance, GPT-3.5-Turbo can handle up to 4k tokens, only a few pages of text. This limitation applies not just to the input, but to the output too. Hence, cramming everything into one prompt isn&#8217;t efficient or quick.</p>



<p>Instead, pair your LLM with a knowledge store, like a vector database (more on this in our article on <a href="https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/" target="_blank" rel="noreferrer noopener">Vector Databases</a>). Essentially, this is your system&#8217;s information storage, which efficiently retrieves data. Whichever storage you use, a search algorithm is crucial to fetch items based on user input. For vector databases, the typical way of doing this is by using similarity search.</p>
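<p>Similarity search over a vector store can be sketched with cosine similarity. The 3-dimensional toy vectors below are placeholders for real embeddings (e.g. 1,536 dimensions for text-embedding-ada-002), and the store is an in-memory list rather than an actual vector database.</p>

```python
import math

# Sketch of similarity search: each document is stored alongside an
# embedding vector; queries are ranked by cosine similarity.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

VECTOR_STORE = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("chart settings", [0.0, 0.8, 0.6]),
    ("login help", [0.1, 0.2, 0.9]),
]

def nearest(query_vec, k=1):
    """Return the k documents most similar to the query vector."""
    ranked = sorted(
        VECTOR_STORE,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:k]]

print(nearest([0.85, 0.15, 0.05]))
```

<p>In practice, the query vector is produced by running the user&#8217;s input through the same embedding model that was used to index the documents; vector databases add indexing structures so this lookup stays fast at scale.</p>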



<p><strong>Token Limitations</strong></p>



<p>Curious about GPT models&#8217; token limits? Here&#8217;s a quick breakdown:</p>



<ul class="wp-block-list">
<li><strong>GPT-3.5-Turbo Model (4,000 tokens):</strong> About 7-8 DIN A4 pages</li>



<li><strong>GPT-4 Standard Model (8,000 tokens):</strong> Around 14-16 DIN A4 pages</li>



<li><strong>GPT-3.5-Turbo-16K Model (16,000 tokens):</strong> Approximately 28-32 DIN A4 pages</li>



<li><strong>GPT-4-32K Model (32,000 tokens):</strong> Estimated at 56-64 DIN A4 pages</li>
</ul>
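<p>A rough token budget check can be done without calling the API. The rule of thumb used below (~4 characters per token for English text) is an approximation; exact counts require the model&#8217;s tokenizer, such as OpenAI&#8217;s tiktoken library.</p>

```python
# Back-of-the-envelope token budgeting against the context limits
# listed above. estimate_tokens uses the rough 4-characters-per-token
# heuristic, not a real tokenizer.

CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 4_000,
    "gpt-4": 8_000,
    "gpt-3.5-turbo-16k": 16_000,
    "gpt-4-32k": 32_000,
}

def estimate_tokens(text):
    return max(1, len(text) // 4)

def fits(model, prompt, reserved_for_output=500):
    """Check whether a prompt leaves enough room for the response,
    since the limit covers input and output combined."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_LIMITS[model]

print(fits("gpt-3.5-turbo", "hello " * 2000))
```

<p>Reserving part of the budget for the response matters: a prompt that technically fits but leaves no output tokens will produce a truncated answer.</p>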
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<p></p>
</div>
</div>



<h3 class="wp-block-heading">D) Conversation Control Logic</h3>



<p>The conversation also needs a conductor to ensure it stays in harmony and doesn&#8217;t veer off the rails. This is the role of the conversation logic. An integral part of your app&#8217;s core software, the conversation logic bridges all the elements to deliver a seamless user experience. It includes several subcomponents. Meta prompts, for instance, help guide the conversation in the desired direction and provide boundaries for the assistant&#8217;s activities. For example, the meta prompt may include a list of basic intent categories that help the LLM understand what the user wants to do. </p>
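<p>A meta prompt with intent categories might be assembled like this. The categories, wording, and chat-message layout (the system/user role convention used by chat-style LLM APIs) are illustrative assumptions, not a prescribed format.</p>

```python
# Sketch of a meta prompt (system message) that constrains the LLM
# to a fixed set of intent categories.

INTENT_CATEGORIES = ["open_ticket", "create_chart", "navigate", "small_talk", "unknown"]

META_PROMPT = (
    "You are an in-app assistant. Classify the user's message into "
    "exactly one of these intents: " + ", ".join(INTENT_CATEGORIES) + ". "
    "Respond with the intent name only. If the request is outside "
    "the application's scope, respond with 'unknown'."
)

def build_messages(user_input):
    """Assemble the chat messages handed to the LLM API."""
    return [
        {"role": "system", "content": META_PROMPT},
        {"role": "user", "content": user_input},
    ]

msgs = build_messages("Show me a pie chart of sales by region")
print(msgs[0]["role"], "->", msgs[1]["content"])
```

<p>Constraining the model to a closed list of intents is what makes the downstream logic deterministic: any response outside the list can be rejected and retried rather than executed.</p>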



<p>Another subcomponent is the connection to the knowledge store that allows the assistant to draw from a vast array of data to augment prompts handed over to the large language model. Moreover, the logic incorporates checks on the assistant&#8217;s activities and its generated content. These checks act like safety nets, mitigating risks and preventing unwanted outcomes. It&#8217;s akin to a quality control mechanism, keeping the assistant&#8217;s output in check and safeguarding against responses that might derail the user&#8217;s experience or even break the application.</p>



<h3 class="wp-block-heading">E) Application API</h3>



<p>Users expect their commands to initiate actions within your application. To fulfill these expectations, the application needs an API that can interact with various app functions. Consider the API as the nerve center of your app, facilitating access to its features and user journey. This API enables the AI assistant to guide users to specific pages, fill in forms, execute tasks, display information, and more. Tools like Microsoft Office even have their own language for this, while Python code, SQL statements, or generic REST requests usually suffice for most applications. </p>
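<p>Mapping a recognized intent onto the application API can be as simple as a lookup table. The endpoints and parameter names below are hypothetical; a real application would dispatch the resulting request spec through its HTTP client or service layer.</p>

```python
# Sketch of translating a recognized intent (plus parameters
# extracted by the LLM) into a REST request against the app's API.

INTENT_TO_ENDPOINT = {
    "open_ticket": ("POST", "/api/tickets"),
    "create_chart": ("POST", "/api/charts"),
    "navigate": ("GET", "/api/pages/{page}"),
}

def to_request(intent, params):
    """Build a request spec the application layer can execute.
    Extra params not used in the URL template are passed as the body."""
    method, path = INTENT_TO_ENDPOINT[intent]
    return {"method": method, "url": path.format(**params), "body": params}

req = to_request("create_chart", {"type": "pie", "metric": "sales", "group_by": "region"})
print(req["method"], req["url"])
```

<p>Keeping this mapping in application code, rather than letting the LLM emit raw URLs, is one way to ensure the assistant can only trigger functions you have explicitly exposed.</p>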



<p>Applications based on a microservice architecture have an edge in this regard, as APIs are inherent to their design. If your application misses some APIs, remember, there&#8217;s no rush to provide access to all functions from the outset. You can start by supporting basic functionalities via chat and gradually expand over time. This allows you to learn from user interactions, continuously refine your offering, and ensure your AI assistant remains a useful and efficient tool for your users.</p>



<p>So, now that we&#8217;ve laid down the foundation, let&#8217;s buckle up and take a journey through the workflow of a modern virtual assistant. Trust me, it&#8217;s a fascinating trip ahead!</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h3 class="wp-block-heading">F) Cache</h3>



<p>Implementing a cache in your virtual AI assistant can significantly boost performance and decrease response times. Particularly useful for frequently recurring user intents, caching stores the outcomes of these intents for quicker access in future instances. However, a well-designed cache shouldn&#8217;t key directly on specific inputs, as there is too much variety in human language. Instead, caching could be woven into the application&#8217;s logic in the mid-layers of your OpenAI prompt flow. </p>
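<p>The idea of caching on normalized intents rather than raw input can be sketched as follows. The keyword-based <code>classify_intent</code> stub stands in for the LLM intent-recognition call, and the cached response is a placeholder for real structured LLM output; in production, this layer might live in a store like Redis.</p>

```python
# Sketch of caching structured LLM responses by recognized intent
# rather than by raw user input, since phrasing varies too much
# for exact-match keys on the input text.

CACHE = {}

def classify_intent(user_input):
    """Stub for the LLM intent-recognition step (hypothetical rules)."""
    text = user_input.lower()
    if "ticket" in text:
        return ("open_ticket",)
    if "chart" in text:
        return ("create_chart",)
    return ("unknown",)

def handle(user_input):
    key = classify_intent(user_input)
    if key in CACHE:
        return CACHE[key]  # cache hit: skip the expensive LLM call
    response = {"intent": key[0], "action": "run_" + key[0]}  # stand-in for LLM output
    CACHE[key] = response
    return response

handle("Please open a ticket")       # miss: populates the cache
hit = handle("I need a new ticket")  # differently phrased, same intent: hit
print(hit["intent"])
```

<p>Two differently worded requests land on the same cache entry because the key is the normalized intent, not the raw text &#8211; which is exactly why caching at this mid-layer works where input-level caching would not.</p>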



<p>This strategy ensures frequently repeated intents are handled more swiftly, enhancing user experience. It&#8217;s critical to remember that cache integration is application-specific, and thoughtful design is vital to avoid unintentionally inducing inefficiencies.</p>



<p>While a well-implemented cache can speed up responses, it also introduces additional complexity. Effective cache management is crucial for avoiding resource drains, requiring strategies for data storage duration, updates, and purging.</p>



<p>The exact impact and efficiency of this caching strategy will depend on your application specifics, including the distribution and repetition of user intents. In the upcoming articles, we&#8217;ll explore this topic further, discussing efficient cache integration in AI assistant systems.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="467" height="156" data-attachment-id="14076" data-permalink="https://www.relataly.com/building-a-digital-ai-assistant-for-your-software-application/14056/image-11-14/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/07/image-11.png" data-orig-size="467,156" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-11" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/07/image-11.png" src="https://www.relataly.com/wp-content/uploads/2023/07/image-11.png" alt="" class="wp-image-14076" srcset="https://www.relataly.com/wp-content/uploads/2023/07/image-11.png 467w, https://www.relataly.com/wp-content/uploads/2023/07/image-11.png 300w" sizes="(max-width: 467px) 100vw, 467px" /><figcaption class="wp-element-caption">An example of a caching technology would be <a href="https://redis.com/docs/caching-at-scale-with-redis/?utm_source=google&amp;utm_medium=cpc&amp;utm_term=cache&amp;utm_campaign=why_re-land_caching-cacheonly-emea-20125229738&amp;utm_content=why_re-eb-caching_at_scale_with_redis&amp;gclid=CjwKCAjwqZSlBhBwEiwAfoZUIIR30kfBaYLKfRO1ab0cxKcUB2og6UbR22oyogPcrj087B0CSp3TZRoC11gQAvD_BwE" target="_blank" rel="noreferrer noopener">Redis</a>.</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">Considerations on the Architecture of Virtual AI Assistants</h2>



<p>Designing a virtual AI assistant is an intricate process that blends cutting-edge technology with a keen understanding of user behavior. It&#8217;s about creating an efficient tool that not only simplifies tasks and optimizes workflows but also respects and preserves user autonomy. This section of our article will delve into the key considerations that guide the architecture of a virtual AI assistant. We&#8217;ll discuss the importance of user control, the strategic selection and use of GPT models, the benefits of starting simple, and the potential expansion as you gain confidence in your system&#8217;s stability and efficiency. As we journey through these considerations, remember the ultimate goal: creating a virtual AI assistant that augments user capabilities, enhances user experience, and breathes new life into software applications.</p>



<h3 class="wp-block-heading">Keep the User in Control</h3>



<p>At the heart of any virtual AI assistant should be the principle of user control. While automation can optimize tasks and streamline workflows, it is crucial to remember that your assistant is there to assist, not usurp. Balancing AI automation with user control is essential to crafting a successful user experience.</p>



<p>Take, for instance, the scenario of a user wanting to open a support ticket within your application. In this situation, your assistant could guide the user to the correct page, auto-fill known details like the user&#8217;s name and contact information, and even suggest possible problem categories based on the user&#8217;s descriptions. By doing so, the virtual AI assistant has significantly simplified the process for the user, making it quicker and less burdensome.</p>



<p>However, the user retains control throughout the process, making the final decisions. They can edit the pre-filled details, choose the problem category, and write the issue description in their own words. They&#8217;re in command, and the virtual AI assistant is there to assist, helping to avoid errors, speed up the process, and generally make the experience smoother and more efficient.</p>



<p>This balance between user control and AI assistance is not only about maintaining a sense of user agency; it is also about trust. Users need to trust that the AI is there to help them, not to take control away from them. If the AI seems too controlling or makes decisions that the user disagrees with, this can erode trust and hinder user acceptance.</p>



<h3 class="wp-block-heading">Mix and Match Models</h3>



<p>Another crucial consideration is the use of different GPT models. Each model comes with its own set of strengths, weaknesses, response times, costs, and token limits. It&#8217;s not just about capabilities. Sometimes, it&#8217;s unnecessary to deploy a complex GPT-4 model for simpler tasks in your workflow. Alternatives like Ada or GPT-3.5-Turbo might be more suitable and cost-effective for functions like intent recognition.</p>



<p>Reserve the heavy-duty models for tasks requiring an extended token limit or dealing with complex operations. One such task is the final, augmented prompt that creates the API call. If you&#8217;re working with a vector database, you&#8217;ll also need an embedding model. Be mindful that these models come with different vector sizes, and once you start building your database with a specific size, it can be challenging to switch without migrating your entire vector content.</p>



<h3 class="wp-block-heading">Think Big but Start Simple </h3>



<p>It&#8217;s always a good idea to start simple &#8211; maybe with a few intents to kick things off. As you gain experience and confidence in building virtual assistant apps, you can gradually integrate additional intents and API calls. And don&#8217;t forget to keep your users involved! Consider incorporating a feedback mechanism, allowing users to report any issues and suggest improvements. This will enable you to fine-tune your prompts and database content effectively.</p>



<p>As your application becomes more comprehensive, you might want to explore model fine-tuning for specific tasks. However, this step should be considered only when your virtual AI assistant functionality has achieved a certain level of stability. Fine-tuning a model can be quite costly, especially if you decide to change the intent categories after training.</p>



<h2 class="wp-block-heading">Digital LLM-based Assistants &#8211; A Major Business Opportunity</h2>



<p>From a business standpoint, upgrading software products and services with LLM-powered virtual AI assistants presents a significant opportunity to differentiate in the market and even innovate their business model. Many organizations are already contemplating the inclusion of virtual assistants as part of subscription packages or premium offerings. As the market evolves, software lacking a natural language interface may be perceived as outdated and struggle to compete.</p>



<p>AI-powered virtual assistants are likely to inspire a whole new generation of software applications and enable a new wave of digital innovations. By enhancing convenience and efficiency in user inputs, virtual assistants unlock untapped potential and boost productivity. Moreover, they empower users to fully leverage the diverse range of features offered by software applications, which often remain underutilized.</p>



<p>I strongly believe that LLM-driven virtual AI assistants are the next milestone in software design and will revolutionize software applications across industries. And remember, this is just the first generation of virtual assistants. The future possibilities are virtually endless and we can&#8217;t wait to see what&#8217;s next! Indeed, the emergence of natural language interfaces is expected to trigger a ripple effect of subsequent innovations, for example, in areas such as standardization, workflow automation, and user experience design.</p>



<h2 class="wp-block-heading">Summary</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>In this article, we delved into the fascinating world of virtual AI assistants, powered by LLMs. We started by exploring how the advanced language understanding of LLMs is revolutionizing software design, easing the workload of developers, and reshaping user experiences with technology.</p>



<p>Next, we provided an overview of the key architectural components of a modern virtual AI assistant: the <strong>Conversational Application Frontend</strong>, <strong>Large Language Model</strong>, <strong>Knowledge Store</strong>, and <strong>Conversation Control Logic</strong>. We also introduced the concept of an <strong>Application API </strong>and the novel idea of a <strong>Cache </strong>for storing and quickly retrieving common user intents. Each component was discussed in the context of their roles and how they work together to create a seamless, interactive, and efficient user experience.</p>



<p>We then discussed architecture considerations, emphasizing the necessity of maintaining user control while leveraging the power of AI automation. We talked about the judicious use of different GPT models based on task requirements, the advantages of starting with simple implementations and progressively scaling up, and the benefits of user feedback in continuously refining the system.</p>



<p>This journey of AI in software applications, from concept to reality, isn&#8217;t just about innovation. It&#8217;s about unlocking innovative business models with AI and boosting user engagement and productivity. As natural language interfaces continue to reshape software automation, the opportunities for harnessing the power of virtual AI assistants are endless. Stay tuned as we explore the workflows further in the next article.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="289" data-attachment-id="14071" data-permalink="https://www.relataly.com/building-a-digital-ai-assistant-for-your-software-application/14056/flo7up_a_mechanic_looking_at_the_engine_of_an_airplane_colorful_fd860957-d8af-48f4-a207-deb0ea13230d-min-1/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_a_mechanic_looking_at_the_engine_of_an_airplane_colorful_fd860957-d8af-48f4-a207-deb0ea13230d-min-1.png" data-orig-size="1426,806" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Flo7up_a_mechanic_looking_at_the_engine_of_an_airplane_colorful_fd860957-d8af-48f4-a207-deb0ea13230d-min-1" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_a_mechanic_looking_at_the_engine_of_an_airplane_colorful_fd860957-d8af-48f4-a207-deb0ea13230d-min-1.png" src="https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_a_mechanic_looking_at_the_engine_of_an_airplane_colorful_fd860957-d8af-48f4-a207-deb0ea13230d-min-1-512x289.png" alt="A mechanic working on an airplace engine" class="wp-image-14071" srcset="https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_a_mechanic_looking_at_the_engine_of_an_airplane_colorful_fd860957-d8af-48f4-a207-deb0ea13230d-min-1.png 512w, https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_a_mechanic_looking_at_the_engine_of_an_airplane_colorful_fd860957-d8af-48f4-a207-deb0ea13230d-min-1.png 300w, 
https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_a_mechanic_looking_at_the_engine_of_an_airplane_colorful_fd860957-d8af-48f4-a207-deb0ea13230d-min-1.png 768w, https://www.relataly.com/wp-content/uploads/2023/07/Flo7up_a_mechanic_looking_at_the_engine_of_an_airplane_colorful_fd860957-d8af-48f4-a207-deb0ea13230d-min-1.png 1426w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">In this article, we have gone through the components of an LLM-powered virtual assistant aka &#8220;AI copilot&#8221;. In the next article, we will dive deeper into the processing logic and follow a prompt into the engine of an intelligent assistant. </figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">Sources and Further Reading</h2>



<ul class="wp-block-list">
<li><a href="https://arxiv.org/pdf/2306.03460.pdf" target="_blank" rel="noreferrer noopener">Natural Language Commanding via Program Synthesis</a></li>



<li><a href="https://en.wikipedia.org/wiki/Office_assistant" target="_blank" rel="noreferrer noopener">Wikipedia.org/Office Assistant</a></li>



<li><a href="https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/" target="_blank" rel="noreferrer noopener">Microsoft Blogs &#8211; Introducing Microsoft 365 Copilot</a></li>



<li><a href="https://www.reuters.com/technology/sap-ceo-huge-growth-potential-generative-ai-2023-06-28/" target="_blank" rel="noreferrer noopener">Reuters &#8211; Sap CEO Huge Growth Potential in Generative AI</a></li>



<li><a href="https://news.microsoft.com/source/features/ai/microsoft-outlines-framework-for-building-ai-apps-and-copilots-expands-ai-plugin-ecosystem/" target="_blank" rel="noreferrer noopener">Microsoft Outlines Framework for Building AI Apps and Copilots, Expands AI Plugin Ecosystem</a></li>



<li><a href="https://workspace.google.com/blog/product-announcements/generative-ai?hl=en" target="_blank" rel="noreferrer noopener">Google Announces Digital Assistants in their Worksuite</a></li>



<li><a href="https://www.youtube.com/watch?v=B2-8wrF9Okc" target="_blank" rel="noreferrer noopener">Youtube, Microsoft Mechanics &#8211; How Microsoft 365 Copilot works</a></li>



<li><a href="https://www.youtube.com/watch?v=VlM9a469LE0" target="_blank" rel="noreferrer noopener">Youtube, Lisa Crosby &#8211; Microsoft 365 Copilot Explained: How Microsoft Just Changed the Future of Work</a></li>
</ul>
<p>The post <a href="https://www.relataly.com/building-a-digital-ai-assistant-for-your-software-application/14056/">Building a Virtual AI Assistant (aka Copilot) for Your Software Application: Harnessing the Power of LLMs like ChatGPT</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/building-a-digital-ai-assistant-for-your-software-application/14056/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">14056</post-id>	</item>
		<item>
		<title>Building &#8220;Chat with your Data&#8221; Apps using Embeddings, ChatGPT, and Cosmos DB for Mongo DB vCore</title>
		<link>https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/</link>
					<comments>https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sat, 27 May 2023 13:25:08 +0000</pubDate>
				<category><![CDATA[ChatBots]]></category>
		<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Healthcare]]></category>
		<category><![CDATA[Insurance]]></category>
		<category><![CDATA[Language Generation]]></category>
		<category><![CDATA[Logistics]]></category>
		<category><![CDATA[Marketing Automation]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[OpenAI]]></category>
		<category><![CDATA[Prompt Engineering]]></category>
		<category><![CDATA[Retail]]></category>
		<category><![CDATA[Sentiment Analysis]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[Vector Databases]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=13687</guid>

					<description><![CDATA[<p>Artificial Intelligence (AI), in particular, the advent of OpenAI&#8217;s ChatGPT, has revolutionized how we interact with technology. Chatbots powered by this advanced language model can engage users in intricate, natural language conversations, marking a significant shift in AI capabilities. However, one thing that ChatGPT isn&#8217;t designed for is integrating personalized or proprietary knowledge – it&#8217;s ... <a title="Building &#8220;Chat with your Data&#8221; Apps using Embeddings, ChatGPT, and Cosmos DB for Mongo DB vCore" class="read-more" href="https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/" aria-label="Read more about Building &#8220;Chat with your Data&#8221; Apps using Embeddings, ChatGPT, and Cosmos DB for Mongo DB vCore">Read more</a></p>
<p>The post <a href="https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/">Building &#8220;Chat with your Data&#8221; Apps using Embeddings, ChatGPT, and Cosmos DB for Mongo DB vCore</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Artificial Intelligence (AI), in particular, the advent of OpenAI&#8217;s ChatGPT, has revolutionized how we interact with technology. Chatbots powered by this advanced language model can engage users in intricate, natural language conversations, marking a significant shift in AI capabilities. However, one thing that ChatGPT isn&#8217;t designed for is integrating personalized or proprietary knowledge – it&#8217;s built to draw upon general knowledge, not specifics about you or your organization. That&#8217;s where the concept of Retrieval Augmented Generation (RAG) comes into play. This article explores the exciting prospect of building your own ChatGPT that lets users ask questions on a custom knowledge base.</p>



<p>In this tutorial, we&#8217;ll unveil the mystery behind enterprise ChatGPT, guiding you through the process of creating your very own custom ChatGPT &#8211; an AI-powered chatbot based on OpenAI&#8217;s powerful Generative Pretrained Transformers (GPT) technology. We&#8217;ll use Python and delve into the world of <a href="https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/" target="_blank" rel="noreferrer noopener">vector databases</a>, specifically, <a href="https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/introduction" target="_blank" rel="noreferrer noopener">Mongo API for Azure Cosmos DB</a>, to show you how you can make a large knowledgebase available to ChatGPT that can go way beyond the <a href="https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them" target="_blank" rel="noreferrer noopener">typical token limitation of GPT models</a>.</p>



<p>Whether you&#8217;re an expert, an AI enthusiast, or new to the field, this guide walks you through building your own ChatGPT. With clear instructions, practical examples, and tips, we aim to make the process both informative and empowering.</p>



<p>Along the way, we&#8217;ll break down complex AI concepts and show you how to customize your chatbot, whether you&#8217;re working from home or the office. Ready to start this exciting journey? Keep reading!</p>



<p>Also: </p>



<ul class="wp-block-list">
<li><a href="https://www.relataly.com/how-to-build-a-twitter-news-bot-with-openai-and-newsapi/13581/" target="_blank" rel="noreferrer noopener">How to Build a Twitter News Bot with OpenAI ChatGPT and NewsAPI in Python</a></li>



<li><a href="https://www.relataly.com/from-pirates-to-nobleman-simulating-conversations-between-openais-chatgpt-and-itself-using-python/13525/" target="_blank" rel="noreferrer noopener">From Pirates to Nobleman: Simulating Conversations between Various Characters using OpenAI’s ChatGPT and Python</a></li>
</ul>



<h2 class="wp-block-heading">Note on the Use of Vector DBs and Costs</h2>



<p>Please note that this tutorial describes a business use case that utilizes a Cosmos DB for Mongo DB vCore hosted on the Azure cloud. </p>



<p>Alternatively, you can set up an open-source vector database on your local machine, such as Milvus. Be aware that certain code adjustments will be necessary to proceed with the open-source alternative. </p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading" id="h-why-custom-chatgpt-is-so-powerful-and-versatile">Why Custom ChatGPT is so Powerful and Versatile</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>I believe we have all tested ChatGPT, and probably like me, you have been impressed by its remarkable capabilities. However, ChatGPT has a significant limitation: it can only answer questions and perform tasks based on the public knowledge base it was trained on. </p>



<p>Imagine having a chatbot based on ChatGPT that communicates effectively and truly understands the nuances of your business, sector, or even a particular topic of interest. That&#8217;s the power of a custom ChatGPT. A tailor-made chatbot allows for specialized conversations, providing the needed information and drawing from a unique database you&#8217;ve developed. </p>



<p>This becomes particularly beneficial in industries with specific terminologies or when you have a large database of knowledge that you want to make easily accessible and interactive. A custom ChatGPT, with its personalized and relevant responses, ensures a better user experience, effectively saving time and increasing productivity. </p>



<p>Let&#8217;s delve into how to build such a solution. Spoiler: it does not work by simply putting all the content into the prompt. But there is a great alternative. </p>



<h2 class="wp-block-heading">Understanding the Building Blocks of Custom ChatGPT with Retrieval Augmented Generation</h2>



<p>The foundational technology behind ChatGPT is OpenAI&#8217;s Generative Pre-trained Transformer models (GPT). These models understand language by predicting the next word in a sentence and are trained on a diverse range of internet text. However, the GPT models, such as the GPT-3.5, have a limitation of processing 4096 tokens at a time. A token in this context is a chunk of text which can be as small as one character or as long as one word. For example, the phrase &#8220;ChatGPT is great&#8221; is four tokens long.</p>



<p>Another challenge with Foundation Models such as ChatGPT is that they are trained on large-scale datasets that were available at the time of their training. This means they are not aware of any data created after their training period. Also, because they&#8217;re trained on broad, general-domain datasets, they may be less effective for tasks requiring domain-specific knowledge.</p>



<h3 class="wp-block-heading">How Retrieval Augmented Generation (RAG) Helps </h3>



<p>Retrieval-Augmented Generation (RAG) is a method that combines the strength of transformer models with external knowledge to augment their understanding and applicability. Here&#8217;s a brief explanation:</p>



<p>To address these limitations, RAG retrieves relevant information from an external data source and uses it to augment the input to the foundation model. This makes the model&#8217;s responses more informed and relevant.</p>



<h3 class="wp-block-heading">Data Sources</h3>



<p>The external data can come from various sources like databases, document repositories, or APIs. To make this data compatible with the RAG approach, both the data and user queries are converted into numerical representations (embeddings) using language models.</p>



<h3 class="wp-block-heading">Data Preparation as Embeddings</h3>






<p>To circumvent the token limitation and make your extensive data available to ChatGPT, we turn the data into embeddings. These are mathematical representations of your data, converting words, sentences, or documents into vectors. The advantage of using embeddings is that they capture the semantic meaning of the text, going beyond keywords to understand the context. In essence, similar information will have similar vectors, allowing us to cluster related information together and separate them from a semantically different text.</p>
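<p>To make this concrete, here is a minimal sketch of how vector similarity works. The toy three-dimensional vectors below merely stand in for real Ada embeddings (which have 1536 dimensions); the function and values are illustrative and not part of the tutorial&#8217;s codebase.</p>

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 for semantically similar vectors,
    # close to 0.0 for unrelated ones.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for three pieces of text
vec_cat = [0.9, 0.1, 0.0]
vec_kitten = [0.85, 0.15, 0.05]
vec_invoice = [0.0, 0.2, 0.95]

print(cosine_similarity(vec_cat, vec_kitten))   # high: semantically close
print(cosine_similarity(vec_cat, vec_invoice))  # low: semantically distant
```

<p>A similarity search over a vector database is, in essence, this comparison performed efficiently across millions of stored vectors.</p>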



<h3 class="wp-block-heading">Storing the Data in Vector Databases</h3>



<p>The embeddings, which are essentially vectors, need to be stored in a database that&#8217;s efficient at storing and searching through these high-dimensional data. This is where Azure&#8217;s Cosmos Mongo DB comes into play. It&#8217;s a vector search database specifically designed for this task.</p>



<h3 class="wp-block-heading">Matching Queries to Knowledge</h3>



<p>The RAG model compares the embeddings of user queries with those in the knowledge base to identify relevant information. The user&#8217;s original query is then augmented with context from similar documents in the knowledge base.</p>
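<p>Conceptually, the augmentation step is plain string assembly: the retrieved chunks become a context block that is prepended to the user&#8217;s question before the prompt is sent to the model. The function name and prompt wording below are illustrative assumptions, not the tutorial&#8217;s exact code.</p>

```python
def build_augmented_prompt(user_question, retrieved_chunks):
    # Concatenate the most similar knowledge-base chunks into a context
    # block, then instruct the model to answer only from that context.
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_question}"
    )

prompt = build_augmented_prompt(
    "What is our refund policy?",
    ["Refunds are granted within 30 days of purchase.",
     "Refund requests must include the order number."],
)
print(prompt)
```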



<h3 class="wp-block-heading">Input to the Foundation Model</h3>



<p>This augmented input is sent to the foundation model, enhancing its understanding and response quality.</p>



<h3 class="wp-block-heading">Updates</h3>



<p>Importantly, the knowledge base and associated embeddings can be updated asynchronously, ensuring that the model remains up-to-date even as new information is added to the data sources.</p>



<p>In sum, RAG extends the utility of foundation models by incorporating external, up-to-date, domain-specific knowledge into their understanding and output.</p>



<p>By incorporating these components, you&#8217;ll be creating a robust custom ChatGPT that not only understands the user&#8217;s queries but also has access to your own information, giving it the ability to respond with precision and relevance. </p>



<p>Ready to dive into the technicalities? Stay tuned!</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="277" data-attachment-id="13775" data-permalink="https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/flo7up_a_vector_database_colorful_popart_with_an_ai_robot_worki_46d21322-5bd9-49f0-b1a7-b7b1a17536d5-copy-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_a_vector_database_colorful_popart_with_an_AI_robot_worki_46d21322-5bd9-49f0-b1a7-b7b1a17536d5-Copy-min.png" data-orig-size="1432,776" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Flo7up_a_vector_database_colorful_popart_with_an_AI_robot_worki_46d21322-5bd9-49f0-b1a7-b7b1a17536d5-Copy-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_a_vector_database_colorful_popart_with_an_AI_robot_worki_46d21322-5bd9-49f0-b1a7-b7b1a17536d5-Copy-min.png" src="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_a_vector_database_colorful_popart_with_an_AI_robot_worki_46d21322-5bd9-49f0-b1a7-b7b1a17536d5-Copy-min-512x277.png" alt="A tailor-made chatbot allows for specialized conversations, providing the exact information needed, drawing from a unique database that you've developed. 
" class="wp-image-13775" srcset="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_a_vector_database_colorful_popart_with_an_AI_robot_worki_46d21322-5bd9-49f0-b1a7-b7b1a17536d5-Copy-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_a_vector_database_colorful_popart_with_an_AI_robot_worki_46d21322-5bd9-49f0-b1a7-b7b1a17536d5-Copy-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_a_vector_database_colorful_popart_with_an_AI_robot_worki_46d21322-5bd9-49f0-b1a7-b7b1a17536d5-Copy-min.png 768w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_a_vector_database_colorful_popart_with_an_AI_robot_worki_46d21322-5bd9-49f0-b1a7-b7b1a17536d5-Copy-min.png 1432w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">A tailor-made chatbot allows for specialized conversations, providing the exact information needed, drawing from a unique database that you&#8217;ve developed. </figcaption></figure>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h2 class="wp-block-heading">Building the Custom &#8220;Chat with Your Data&#8221; App in Python</h2>



<p>Now that we&#8217;ve discussed the theory behind building a custom ChatGPT and seen some exciting real-world applications, it&#8217;s time to put our knowledge into action! In this practical segment of our guide, we&#8217;re going to demonstrate how you can build a custom ChatGPT solution using Python.</p>



<p>Our project will involve storing a sample PDF document in Cosmos Mongo DB and developing a chatbot capable of answering questions based on the content of this document. This practical exercise will guide you through the entire process, including turning your PDF content into embeddings, storing these embeddings in the Cosmos Mongo DB, and finally integrating it all with ChatGPT to build an interactive chatbot.</p>



<p>If you&#8217;re new to Python, don&#8217;t worry, we&#8217;ll be breaking down the code and explaining each step in a straightforward manner. Let&#8217;s roll up our sleeves, fire up our Python environments, and get coding! Stay tuned as we embark on this exciting hands-on journey into the world of custom chatbots.</p>



<p>The code is available on the GitHub repository.</p>



<div class="wp-block-kadence-advancedbtn kb-buttons-wrap kb-btns_c8e02b-b1"><a class="kb-button kt-button button kb-btn_022d60-c9 kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-tutorials/blob/master/300%20Distributed%20Computing%20-%20Analyzing%20Zurich%20Weather%20Data%20using%20PySpark.ipynb" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fe_eye kt-btn-icon-side-left"><svg viewBox="0 0 24 24"  fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M1 12s4-8 11-8 11 8 11 8-4 8-11 8-11-8-11-8z"/><circle cx="12" cy="12" r="3"/></svg></span><span class="kt-btn-inner-text">View on GitHub </span></a>

<a class="kb-button kt-button button kb-btn_8db802-ce kt-btn-size-standard kt-btn-width-type-full kb-btn-global-inherit  kt-btn-has-text-true kt-btn-has-svg-true  wp-block-button__link wp-block-kadence-singlebtn" href="https://github.com/flo7up/relataly-public-python-API-tutorials" target="_blank" rel="noreferrer noopener"><span class="kb-svg-icon-wrap kb-svg-icon-fa_github kt-btn-icon-side-left"><svg viewBox="0 0 496 512"  fill="currentColor" xmlns="http://www.w3.org/2000/svg"  aria-hidden="true"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg></span><span class="kt-btn-inner-text">Relataly GitHub Repo </span></a></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading">How to Set Up Vector Search in Cosmos DB</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>First, you must understand that you will need a database to store the embeddings. It does not necessarily have to be a vector database. Still, this type of database will make your solution more performant and robust, particularly when you want to store large amounts of data.</p>



<p><a href="https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/" target="_blank" rel="noreferrer noopener">Azure Cosmos DB for MongoDB vCore</a> is the first MongoDB-compatible offering to feature Vector Search. With this feature, you can store, index, and query high-dimensional vector data directly in Azure Cosmos DB for MongoDB vCore, eliminating the need for data transfer to alternative platforms for vector similarity search capabilities. Here are the steps to set it up:</p>



<ol class="wp-block-list">
<li><strong>Choose Your Azure Cosmos DB Architecture:</strong> Azure Cosmos DB for MongoDB provides two types of architectures, RU-based and vCore-based. Each has its strengths and is best suited for certain types of applications. Choose the one that best fits your needs. If you&#8217;re looking to lift and shift existing MongoDB apps and run them as-is on a fully supported managed service, the vCore-based option could be the perfect fit.</li>



<li><strong>Configure Your Vector Search:</strong> Once your database architecture is set up, you can integrate your AI-based applications, including those using OpenAI embeddings, with your data already stored in Cosmos DB.</li>



<li><strong>Build and Deploy Your AI Application:</strong> With Vector Search set up, you can now build an AI application that takes advantage of this feature. As suggested next steps, you could create a Go app using Azure Cosmos DB for MongoDB or deploy Azure Cosmos DB for MongoDB vCore using a Bicep template.</li>
</ol>
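<p>For reference, a vector index on a vCore collection is created with a <code>createIndexes</code> database command. The sketch below builds such a command for the tutorial&#8217;s <code>isocodes</code> collection; the field name <code>embedding</code> and the <code>numLists</code> value are assumptions you should adapt to your own schema and data volume.</p>

```python
# Index specification for Cosmos DB for MongoDB vCore vector search.
# "dimensions" must match the embedding model (1536 for
# text-embedding-ada-002); "COS" selects cosine similarity.
vector_index_spec = {
    "createIndexes": "isocodes",
    "indexes": [
        {
            "name": "vectorSearchIndex",
            "key": {"embedding": "cosmosSearch"},
            "cosmosSearchOptions": {
                "kind": "vector-ivf",
                "numLists": 1,
                "similarity": "COS",
                "dimensions": 1536,
            },
        }
    ],
}

# With the `db` handle from the connection setup you would then run:
# db.command(vector_index_spec)
print(vector_index_spec["indexes"][0]["name"])
```

<p>Cosine similarity suits OpenAI embeddings well, and increasing <code>numLists</code> trades indexing cost for faster searches on larger collections.</p>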



<p>Azure Cosmos DB for MongoDB vCore&#8217;s Vector Search feature is a game-changer for AI application development. It enables you to unlock new insights from your data, leading to more accurate and powerful applications.</p>



<h2 class="wp-block-heading">Cosmos DB for Mongo DB Usage Models</h2>



<p>Regarding <a href="https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/choose-model" target="_blank" rel="noreferrer noopener">Cosmos DB for Mongo DB</a>, there are two options to choose from: Request Unit (RU) Database Account and vCore Cluster. Each option follows a different pricing model to suit diverse needs.</p>



<p>The Request Unit (RU) Database Account operates on a pay-per-use basis. With this model, you are billed based on the number of requests and the level of provisioned throughput consumed by your workload.</p>



<p>As of 27 May 2023, <a href="https://devblogs.microsoft.com/cosmosdb/introducing-vector-search-in-azure-cosmos-db-for-mongodb-vcore/" target="_blank" rel="noreferrer noopener">the brand new vector search function is only available for the vCore Cluster option</a>, which is why we will use this setup for this tutorial. The vCore Cluster offers a reserved managed instance. Under this option, you are charged a fixed amount on a monthly basis, providing more predictable costs for your usage.</p>



<p>Once you have created your vCore instance, you must collect your connection string and make it available to your Python script. You can do this either by storing it in <a href="https://azure.microsoft.com/en-us/products/key-vault/">Azure Key Vault</a> (which I would recommend) or by storing it locally on your computer or in the code (which I would not recommend for obvious security reasons).</p>



<figure class="wp-block-image size-full"><img decoding="async" width="1536" height="588" data-attachment-id="13774" data-permalink="https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/image-7-6/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/05/image-7.png" data-orig-size="1536,588" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-7" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/05/image-7.png" src="https://www.relataly.com/wp-content/uploads/2023/05/image-7.png" alt="" class="wp-image-13774" srcset="https://www.relataly.com/wp-content/uploads/2023/05/image-7.png 1536w, https://www.relataly.com/wp-content/uploads/2023/05/image-7.png 300w, https://www.relataly.com/wp-content/uploads/2023/05/image-7.png 512w, https://www.relataly.com/wp-content/uploads/2023/05/image-7.png 768w" sizes="(max-width: 1536px) 100vw, 1536px" /><figcaption class="wp-element-caption">When it comes to Cosmos DB for Mongo DB, there are two options to choose from: Request Unit (RU) Database Account and vCore Cluster. </figcaption></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="334" height="512" data-attachment-id="13772" data-permalink="https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/image-5-4/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/05/image-5.png" data-orig-size="606,929" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image-5" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/05/image-5.png" src="https://www.relataly.com/wp-content/uploads/2023/05/image-5-334x512.png" alt="Azure Cosmos DB for Mongo DB is a new offering that is specifically designed for vector use cases (incl. embeddings)" class="wp-image-13772" srcset="https://www.relataly.com/wp-content/uploads/2023/05/image-5.png 334w, https://www.relataly.com/wp-content/uploads/2023/05/image-5.png 196w, https://www.relataly.com/wp-content/uploads/2023/05/image-5.png 606w" sizes="(max-width: 334px) 100vw, 334px" /><figcaption class="wp-element-caption">Azure Cosmos DB for Mongo DB is a new offering that is designed explicitly for vector use cases (incl. embeddings)</figcaption></figure>
</div>
</div>



<h2 class="wp-block-heading">Using other Vector Databases</h2>



<p>While Cosmos DB is a popular choice for vector databases, I would like to note that other options are available in the market. You can still benefit from this tutorial if you decide to utilize a different vector database, such as Pinecone or Chroma. However, it is necessary to make code adjustments tailored to the APIs and functionalities of the specific vector database you choose.</p>



<p>Specifically, you will need to modify the &#8220;insert embedding functions&#8221; and &#8220;similarity search functions&#8221; to align with the requirements and capabilities of your chosen vector database. These functions typically have variations that are specific to each vector database.</p>



<p>By customizing the code according to your selected vector database&#8217;s API, you can successfully adapt the tutorial to suit your specific database choice. This allows you to leverage the principles and concepts this tutorial covers, regardless of the vector database you opt for.</p>



<p>Also: <a href="https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/" target="_blank" rel="noreferrer noopener">Vector Databases: The Rising Star in Generative AI Infrastructure</a></p>



<h3 class="wp-block-heading">Prerequisites</h3>



<p>Before diving into the code, it’s essential to ensure that you have the proper setup for your Python 3 environment and have installed all the necessary packages. If you do not have a Python environment, follow the instructions in&nbsp;<a href="https://www.relataly.com/anaconda-python-environment-machine-learning/1663/" target="_blank" rel="noreferrer noopener">this tutorial</a>&nbsp;to set up the&nbsp;<a href="https://www.anaconda.com/products/individual" target="_blank" rel="noreferrer noopener">Anaconda Python environment</a>. This will provide you with a robust and versatile environment well-suited for machine learning and data science tasks.</p>



<p>In this tutorial, we will be working with several libraries:</p>



<ul class="wp-block-list">
<li>openai</li>



<li>pymongo</li>



<li>PyPDF2</li>



<li>python-dotenv</li>
</ul>



<p>Should you decide to use <a href="https://azure.microsoft.com/en-us/products/key-vault/" target="_blank" rel="noreferrer noopener">Azure Key Vault</a>, then you also need the following Python libraries:</p>



<ul class="wp-block-list">
<li>azure-identity</li>



<li>azure-keyvault-secrets</li>
</ul>



<p>You can install the OpenAI Python library using console commands:</p>



<ul class="wp-block-list">
<li><em>pip install&nbsp;</em>openai</li>



<li><em>conda install -c conda-forge&nbsp;</em>openai&nbsp;(if you are using the Anaconda package manager)</li>
</ul>



<h3 class="wp-block-heading">Step #1: Authentication and DB Setup</h3>



<p>Let&#8217;s start with authentication and the setup of the API keys. After making the necessary imports, the code gets things ready to connect to essential services &#8211; OpenAI and Cosmos DB &#8211; and makes sure it can access these services properly.</p>



<ol class="wp-block-list">
<li><strong>Fetching Credentials:</strong> The script starts by setting up a connection to a service called Azure Key Vault to retrieve some crucial credentials securely. These are like &#8220;passwords&#8221; that the script needs to access various resources.</li>



<li><strong>Setting Up AI Services:</strong> Then, it prepares to connect to two different AI services. One is a version that&#8217;s hosted by Azure, and the other is the standard, public version.</li>



<li><strong>Establishing Database Connection:</strong> Lastly, the script sets up a connection to a database service, specifically to a certain collection within the Cosmos DB database. The script also checks if the connection to the database was successful by sending a &#8220;ping&#8221; &#8211; if it receives a response, it knows the connection is good.</li>
</ol>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}">from azure.identity import AzureCliCredential
from azure.keyvault.secrets import SecretClient
import openai
import logging
import tiktoken
import pandas as pd
import pymongo
from dotenv import load_dotenv
load_dotenv()
# Set up the Azure Key Vault client and retrieve the Blob Storage account credentials
keyvault_name = ''
openaiservicename = ''
client = SecretClient(f&quot;https://{keyvault_name}.vault.azure.net/&quot;, AzureCliCredential())
print('keyvault service ready')
# AzureOpenAI Service
def setup_azureopenai():
    openai.api_key = client.get_secret('openai-api-key').value
    openai.api_type = &quot;azure&quot;
    openai.api_base = f'https://{openaiservicename}.openai.azure.com'
    openai.api_version = '2023-05-15'
    print('azure openai service ready')
# public openai service
def setup_public_openai():
    openai.api_key = client.get_secret('openai-api-key-public').value
    print('public openai service ready')
DB_NAME = &quot;hephaestus&quot;
COLLECTION_NAME = 'isocodes'
def setup_cosmos_connection():
    COSMOS_CLUSTER_CONNECTION_STRING = client.get_secret('cosmos-cluster-string').value
    cosmosclient = pymongo.MongoClient(COSMOS_CLUSTER_CONNECTION_STRING)
    db = cosmosclient[DB_NAME]
    collection = cosmosclient[DB_NAME][COLLECTION_NAME]
    # Send a ping to confirm a successful connection
    try:
        cosmosclient.admin.command('ping')
        print(&quot;Pinged your deployment. You successfully connected to MongoDB!&quot;)
    except Exception as e:
        print(e)
    return collection, db
setup_public_openai()
collection, db = setup_cosmos_connection()</pre></div>



<p>We have now set everything up to interact with our Cosmos DB for MongoDB vCore instance.</p>



<h3 class="wp-block-heading">Step #2 Functions for Populating the Vector DB</h3>



<p>Next, we prepare and insert data into the database as embeddings. The preparation process turns the text content into embeddings. Each embedding is a list of floats representing the meaning of a specific part of the text in a way the AI system can understand.</p>



<p>We create the embeddings by sending text (for example, a paragraph of a document) to an OpenAI embedding model that returns the embedding. There are two options for using OpenAI: You can use the Azure OpenAI engine and deploy your own Ada embedding model. Alternatively, you can use the public OpenAI Ada embedding model. </p>



<p>We&#8217;ll use the public OpenAI&#8217;s <a href="https://platform.openai.com/docs/guides/embeddings" target="_blank" rel="noreferrer noopener">text-embedding-ada-002</a>. Remember that the model is designed to return embeddings, not text. Model inference may incur costs based on the data processed. Refer to <a href="https://openai.com/pricing" target="_blank" rel="noreferrer noopener">OpenAI</a> or the Azure OpenAI service for pricing details. </p>



<p>Finally, the code inserts the prepared requests (which now include both the original text and the corresponding embeddings) into the database. The function returns the unique IDs assigned to these newly inserted items in the database. In this way, the code processes and stores the necessary information in the database for later use.</p>
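<p>To make the stored document shape concrete, here is a minimal, self-contained sketch of the preparation step. The embedding call is stubbed out with a placeholder callable, since the real vector comes from the OpenAI API; everything else mirrors the structure inserted into the collection.</p>

```python
def prepare_content(text_content, embed):
    """Build the list of documents inserted into the collection.

    `embed` is any callable returning a list of floats; in the tutorial
    this is the OpenAI embedding call, here it is a stand-in stub.
    """
    return [{"textContent": text_content, "vectorContent": embed(text_content)}]

# Stub embedding (1536 dimensions, like text-embedding-ada-002), for illustration only.
docs = prepare_content("Tax returns are due in March.", lambda text: [0.0] * 1536)
print(docs[0]["textContent"])          # the original text
print(len(docs[0]["vectorContent"]))   # 1536
```

<p>Each stored document therefore pairs the raw text (used later to build the prompt) with its vector (used for the similarity search).</p>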



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># prepare content for insertion into cosmos db
def prepare_content(text_content):
  embeddings = create_embeddings_with_openai(text_content)
  request = [
    {
    &quot;textContent&quot;: text_content, 
    &quot;vectorContent&quot;: embeddings}
  ]
  return request
# create embeddings
def create_embeddings_with_openai(input):
    #print('Generating response from OpenAI...')
    ###### uncomment for AzureOpenAI model usage and comment code below
    # embeddings = openai.Embedding.create( 
    #     engine='&lt;name of the embedding deployment &gt;', 
    #     input=input)[&quot;data&quot;][0][&quot;embedding&quot;]
    ###### public openai model usage and comment code above
    embeddings = openai.Embedding.create(
        model='text-embedding-ada-002', 
        input=input)[&quot;data&quot;][0][&quot;embedding&quot;]
    
    # Number of embeddings    
    # print(len(embeddings))
    return embeddings
# insert the requests
def insert_requests(text_input):
    request = prepare_content(text_input)
    return collection.insert_many(request).inserted_ids
# Creates a searchable index for the vector content
def create_index():
  
  # delete and recreate the index. This might only be necessary once.
  collection.drop_indexes()
  embedding_len = 1536
  print(f'creating index with embedding length: {embedding_len}')
  db.command({
    'createIndexes': COLLECTION_NAME,
    'indexes': [
      {
        'name': 'vectorSearchIndex',
        'key': {
          &quot;vectorContent&quot;: &quot;cosmosSearch&quot;
        },
        'cosmosSearchOptions': {
          'kind': 'vector-ivf',
          'numLists': 100,
          'similarity': 'COS',
          'dimensions': embedding_len
        }
      }
    ]
  })
# Resets the DB and deletes all values from the collection to avoid duplicates
#collection.delete_many({})</pre></div>



<h3 class="wp-block-heading">Step #3 Document Cracking and Populating the DB</h3>



<p>The next step is to break down the PDF document into smaller chunks of text (in this case, &#8216;records&#8217;) and then process these records for future use. You can repeat this process for any document that you want to make available to OpenAI. </p>



<p>You can use any PDF you like, as long as it contains readable text (if it does not, run OCR first). For demo purposes, I will use a <a href="https://www.zh.ch/content/dam/zhweb/bilder-dokumente/themen/steuern-finanzen/steuern/quellensteuer/infobl%C3%A4tter/div_q_informationsblatt_qs_2021_EN.pdf" target="_blank" rel="noreferrer noopener">tax document from Zurich</a>. Put the document in the folder data/vector_db_data/ in your root folder and provide the name to the Python script. </p>



<p>Want to read in many documents at once? Read the PDF documents from the folder, use their names to populate a list, and then wrap the insert function in a for loop that iterates through the list of document names.</p>
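<p>A minimal sketch of that loop could look like this. The folder name follows the tutorial&#8217;s layout and is an assumption; slice_pdf_into_records and insert_requests are the functions from the steps of this tutorial.</p>

```python
from pathlib import Path

def collect_pdf_paths(folder):
    """Return the sorted PDF file paths in `folder`, or an empty list if it is missing."""
    folder = Path(folder)
    if not folder.exists():
        return []
    return sorted(str(p) for p in folder.glob("*.pdf"))

# Usage sketch (functions from the tutorial steps):
# for pdf_path in collect_pdf_paths("../data/vector_db_data"):
#     for record in slice_pdf_into_records(pdf_path, max_sentences=20):
#         insert_requests(record)
```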



<h4 class="wp-block-heading">#3.1 Document Slicing Considerations </h4>



<p>To convert a PDF into embeddings, the first step is to divide it into smaller content slices. The slicing process plays a crucial role as it affects the information provided to the OpenAI GPT model when answering user questions. If the slices are too large, the model may encounter token limitations. Conversely, if they are too small, the model may not receive sufficient content to answer the question effectively. It is important to strike a balance between the number of slices and their length to optimize the results, considering that the search process may yield multiple outcomes.</p>



<p>There are several approaches to handle the slicing process. One option is to define the slices based on a specific number of sentences or paragraphs. Alternatively, you can iteratively slice the document, allowing for some overlap between the data in the vector database. This approach has the advantage of providing more precise information to answer questions, but it also increases the data volume in the vector database, which can impact speed and cost considerations.</p>
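<p>A compact way to implement the overlapping variant is a sliding window over the sentence list. The sketch below is illustrative only; the window size and overlap are assumptions you should tune for your documents.</p>

```python
def slice_with_overlap(sentences, size=20, overlap=5):
    """Yield windows of `size` sentences where consecutive windows
    share `overlap` sentences, so context is not cut mid-thought."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    for start in range(0, max(len(sentences) - overlap, 1), step):
        yield sentences[start:start + size]

sentences = [f"Sentence {i}." for i in range(7)]
chunks = list(slice_with_overlap(sentences, size=4, overlap=2))
# each chunk starts with the last two sentences of the previous one
```

<p>The trade-off mentioned above is visible here: a larger overlap duplicates more sentences across records, which improves retrieval precision but grows the vector database.</p>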



<h4 class="wp-block-heading">#3.2 Cracking the Document and Inserting Embeddings into the Vector DB</h4>



<p>Running the code below will first define a function that breaks text into separate paragraphs based on line breaks. Another function slices the PDF into records, each containing at most &#8216;max_sentences&#8217; sentences. We use the Python library PyPDF2 to extract text from each page of the PDF and Python&#8217;s built-in regular expressions to split the text into sentences and paragraphs. Note that for better results, you could also use a professional document content extraction tool such as Azure Form Recognizer.</p>



<p>The code then opens a specific PDF file (&#8216;zurich_tax_info_2023.pdf&#8217;) and slices it into records, each containing no more than the number of sentences defined by &#8216;max_sentences&#8217;. After that, the code inserts these records into the vector database. Finally, we print the count of documents in the database collection, which shows how many pieces of data are stored in this part of the database.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># document cracking functions to slice the PDF into records
import re
import PyPDF2

def split_text_into_paragraphs(text):
    paragraphs = re.split(r'\n{2,}', text)
    return paragraphs
def slice_pdf_into_records(pdf_path, max_sentences):
    records = []
    
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        
        for page in reader.pages:
            text = page.extract_text()
            paragraphs = split_text_into_paragraphs(text)
            
            current_record = ''
            sentence_count = 0
            
            for paragraph in paragraphs:
                sentences = re.split(r'(?&lt;=[.!?])\s+', paragraph)
                
                for sentence in sentences:
                    current_record += sentence + ' '  # keep a space between sentences
                    
                    sentence_count += 1
                    
                    if sentence_count &gt;= max_sentences:
                        records.append(current_record)
                        current_record = ''
                        sentence_count = 0
                
                if sentence_count &lt; max_sentences:
                    current_record += ' '  # Add space between paragraphs
            
            # If there is remaining text after the loop, add it as a record
            if current_record:
                records.append(current_record)
    
    return records
# get file from root/data folder
pdf_path = '../data/vector_db_data/zurich_tax_info_2023.pdf'
max_sentences = 20  # Adjust the slice size as per your requirement
result = slice_pdf_into_records(pdf_path, max_sentences)
# print the length of result
print(f'{len(result)} vectors created with maximum {max_sentences} sentences each.')
# Insert the records and preview the first five
for i, record in enumerate(result):
    insert_requests(record)
    if i &lt; 5:
        print(record[0:100])
        print('-------------------')
create_index()
print(f'number of records in the vector DB: {collection.count_documents({})}')</pre></div>



<p>After slicing the document and inserting the embeddings into the vector database, we can proceed with functions for similarity search and prompting. </p>



<h3 class="wp-block-heading">Step #4 Functions for Similarity Search and Prompts to ChatGPT</h3>



<p>This section of code provides a set of functions to perform a vector search in the Cosmos DB, make a request to the ChatGPT 3.5 Turbo model for generating responses, and create prompts for the OpenAI model to use in generating those responses.</p>



<h4 class="wp-block-heading">#4.1 How the Search Part Works </h4>



<p>Let me briefly explain how the search process works. We have now reached the stage where a user poses a question, and we use the OpenAI model to supply an answer drawn from our vector database. The key point is that the question is transformed into embeddings, and the knowledge base is then searched for similar embeddings that match the information requested in the user&#8217;s prompt. </p>



<p>The vector database yields the most suitable results and inserts them into another prompt tailored for ChatGPT. This model, distinct from the embedding model, generates text. Thus, the final interaction with the ChatGPT model incorporates both the user&#8217;s question and the results from the vector database, which are the most fitting responses to the question. This combination should ideally aid the model in providing the appropriate answer. Now, let&#8217;s turn our attention to the corresponding code.</p>
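<p>Conceptually, the search boils down to cosine similarity between the question vector and each stored vector, with the database returning the k best matches. The sketch below illustrates this with a brute-force scan over toy 2-dimensional vectors; real embeddings have 1536 dimensions, and Cosmos DB uses an IVF index instead of a full scan.</p>

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def brute_force_search(query_vector, documents, k=2):
    """Return the k documents whose 'vectorContent' is most similar to the query."""
    ranked = sorted(documents,
                    key=lambda d: cosine_similarity(query_vector, d["vectorContent"]),
                    reverse=True)
    return ranked[:k]

docs = [
    {"textContent": "Tax deadlines in Zurich", "vectorContent": [0.9, 0.1]},
    {"textContent": "Pizza recipes",           "vectorContent": [0.1, 0.9]},
    {"textContent": "Filing a tax return",     "vectorContent": [0.8, 0.2]},
]
top = brute_force_search([1.0, 0.0], docs, k=2)  # the two tax-related documents win
```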



<h4 class="wp-block-heading">#4.2 Setting up the Functions for Vector Search</h4>



<p>The vector_search function takes as input a query vector (representing a user&#8217;s question in vector form) and an optional parameter to limit the number of results. It then conducts a search in the Cosmos DB, looking for entries whose vector content is most similar to the query vector.</p>



<p>Next, the openai_request function makes a request to OpenAI&#8217;s ChatGPT 3.5 Turbo model to generate a response. This function takes a formatted conversation history (or &#8216;prompt&#8217;) and sends it to the model, which then generates a response. The content of the generated response is then returned.</p>



<p>The create_prompt function constructs the conversation history for the OpenAI model. It takes the user&#8217;s question and a JSON object containing results from the database search and constructs a list of system and user messages. This list then serves as the prompt for the ChatGPT model, instructing it to answer the user&#8217;s tax question based only on the provided sources, with the added instruction to translate the response to English. The constructed prompt is then returned by the function.</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># Cosmos DB Vector Search API Command
def vector_search(vector_query, max_number_of_results=2):
  results = collection.aggregate([
    {
      '$search': {
        &quot;cosmosSearch&quot;: {
          &quot;vector&quot;: vector_query,
          &quot;path&quot;: &quot;vectorContent&quot;,
          &quot;k&quot;: max_number_of_results
        },
      &quot;returnStoredSource&quot;: True
      }
    }
  ])
  return results
# openAI request - ChatGPT 3.5 Turbo Model
def openai_request(prompt, model_engine='gpt-3.5-turbo'):
    completion = openai.ChatCompletion.create(model=model_engine, messages=prompt, temperature=0.2, max_tokens=500)
    return completion.choices[0].message.content
# define the OpenAI prompt from the user question and search results
def create_prompt(user_question, result_json):
    instructions = f'You are an assistant that answers questions based on sources provided. \
    If the information is not in the provided source, you answer with &quot;I don\'t know&quot;. '
    task = f&quot;{user_question} Translate the response to English. \n \
    source: {result_json}&quot;
    
    prompt = [{&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: instructions }, 
              {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: task }]
    return prompt</pre></div>



<p>You can easily change the voice and tone in which the ChatGPT answers questions by including the respective instructions in the create_prompt function. </p>
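<p>As a hedged sketch, a tone-aware variant could look like the following; the tone phrasing is an assumption, so adapt it to your use case.</p>

```python
def create_prompt_with_tone(user_question, result_json, tone="concise and formal"):
    """Variant of create_prompt that injects a voice/tone instruction into the
    system message. `tone` is free text, e.g. "friendly and casual"."""
    instructions = (
        "You are an assistant that answers questions based on sources provided. "
        f"Answer in a {tone} tone. "
        "If the information is not in the provided source, answer with \"I don't know\"."
    )
    task = f"{user_question}\nsource: {result_json}"
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": task},
    ]

prompt = create_prompt_with_tone("When is the deadline?", ["..."], tone="friendly and casual")
```

<p>Because the tone lives in the system message, you can change the assistant&#8217;s voice per request without touching the rest of the pipeline.</p>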



<p>Also: <a href="https://www.relataly.com/chatgpt-style-guide-understanding-voice-and-tone-options-for-engaging-conversations/13065/">ChatGPT Style Guide: Understanding Voice and Tone Prompt Options for Engaging Conversations</a></p>



<h3 class="wp-block-heading">Step #5 Testing the Custom ChatGPT Solution</h3>



<p>This part of the code works with the previous functions to facilitate a complete question-answering cycle with Cosmos DB and OpenAI&#8217;s ChatGPT 3.5 Turbo model.</p>



<p>Now comes the most exciting part. To test the solution, define a question and then execute the code below to run the search process. </p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text/x-python&quot;,&quot;theme&quot;:&quot;monokai&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}"># define OpenAI Prompt 
user_question = &quot;When do I have to submit my tax return?&quot;
# generate embeddings for the question
user_question_embeddings = create_embeddings_with_openai(user_question)
# search for the question in the cosmos db
search_results = vector_search(user_question_embeddings, 1)
print(search_results)
# prepare the results for the openai prompt
result_json = []
# remove all empty values from the results
search_results = [x for x in search_results if x]
# print each document in the result
for doc in search_results:
    display(doc.get('_id'), doc.get('textContent'), doc.get('vectorContent')[0:5])
    result_json.append(doc.get('textContent'))
# create the prompt
prompt = create_prompt(user_question, result_json)
display(prompt)
# generate the response
response = openai_request(prompt)
display(f'User question: {user_question}')
display(f'OpenAI response: {response}')</pre></div>



<p>&#8216;User question: When do I have to submit my tax return?&#8217;</p>



<div class="wp-block-codemirror-blocks-code-block code-block"><pre class="CodeMirror" data-setting="{&quot;showPanel&quot;:false,&quot;languageLabel&quot;:false,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;null&quot;,&quot;mime&quot;:&quot;text/plain&quot;,&quot;theme&quot;:&quot;3024-day&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:true,&quot;readOnly&quot;:true,&quot;disableCopy&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Plain Text&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;text&quot;}">'OpenAI response: When do I have to submit my tax return? \n\nAll natural persons who had their residence in the canton of Zurich on December 31, 2022, or who owned properties or business premises (or business operations) in the canton of Zurich, must submit a tax return for 2022 in the calendar year 2023. Taxpayers with a residence in another canton also have to submit a tax return for 2022 in the calendar year 2023 if they ended their tax liability in the canton of Zurich by giving up a property or business premises during the calendar year 2022. If you turned 18 in the tax period 2022 (persons born in 2004), you must submit your own tax return (for the tax period 2022) for the first time in the calendar year 2023.'</pre></div>



<p>As of May 2023, the knowledge base of ChatGPT 3.5 is limited to the timeframe before September 2021. So it&#8217;s evident that the response of our custom ChatGPT solution is based on the individual information provided in the vector database. Remember that we did not fine-tune the GPT model, so the model itself does not inherently know anything about your private data and instead uses the data that was dynamically provided to it as part of the prompt. </p>



<h2 class="wp-block-heading">Real-World Applications of &#8220;Chat with Your Data&#8221;</h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Custom ChatGPT boosts efficiency, personalizes services, and improves experiences across industries. Here are some examples:</p>



<ul class="wp-block-list">
<li><strong>Customer Support:</strong> Companies can use ChatGPT for 24/7 customer service. With data from manuals, FAQs, and support docs, it delivers fast, accurate answers, enhancing customer satisfaction and lessening staff workload.</li>



<li><strong>Healthcare</strong>: ChatGPT can respond to patient questions using medical texts and care guidelines. It offers data on symptoms, treatments, side effects, and preventive care, helping both healthcare providers and patients.</li>



<li><strong>Legal Sector</strong>: Law firms can use ChatGPT with legal texts, court decisions, and case studies for answering legal questions, offering case references, or explaining legal terms.</li>



<li><strong>Financial Services:</strong> Banks can use ChatGPT to extend their customer service and give customers advice based on their individual financial situation.</li>



<li><strong>E-Learning:</strong> Schools and e-learning platforms can use ChatGPT to tutor students. Using textbooks, notes, and research papers, it helps students understand complex topics, solve problems, or guide them through a course.</li>
</ul>



<p>In short, any sector needing a large information database for queries or services can use custom ChatGPT. It enhances engagement and efficiency by offering personalized experiences.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading">Summary</h2>



<p>In this comprehensive guide, we&#8217;ve journeyed through the fascinating process of creating a customized ChatGPT that lets users chat with your business data. We started with understanding the immense value a tailored ChatGPT brings to the table and dove into its ability to produce specialized responses sourced from a custom knowledge base. This tailored approach enhances user experiences, saves time, and bolsters productivity.</p>



<p>We went behind the scenes to reveal the vital elements of crafting a custom ChatGPT: OpenAI&#8217;s GPT models, data embeddings, and vector databases like Cosmos DB for MongoDB vCore. We clarified how these components synergize to transcend the token limitations inherent to GPT models. By integrating the components in Python, we broadened ChatGPT&#8217;s ability to answer queries based on your private knowledge base, thereby offering contextually appropriate responses.</p>



<p>I hope this tutorial was able to illustrate the business value of ChatGPT and its versatile utility across a variety of sectors, including customer service, healthcare, legal services, finance, e-learning, and CRM data analytics. Each instance emphasized the transformative potential of a personalized ChatGPT in delivering efficient, targeted solutions.</p>



<p>I hope you found this article helpful. If you have any questions or remarks, please drop them in the comment section.</p>



<h2 class="wp-block-heading">Sources and Further Reading</h2>



<ul class="wp-block-list">
<li><a href="https://learn.microsoft.com/en-us/azure/cosmos-db/introduction" target="_blank" rel="noreferrer noopener">Azure Cosmos DB</a></li>



<li><a href="https://openai.com/pricing" target="_blank" rel="noreferrer noopener">OpenAI pricing</a></li>



<li><a href="https://learn.microsoft.com/en-us/azure/cognitive-services/openai/">Azure OpenAI</a></li>



<li><a href="https://azure.microsoft.com/en-au/products/cognitive-services/openai-service" target="_blank" rel="noreferrer noopener">Semantic search</a></li>



<li><a href="https://platform.openai.com/docs/guides/embeddings" target="_blank" rel="noreferrer noopener">What are embeddings?</a></li>



<li><a href="https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search" target="_blank" rel="noreferrer noopener">Using vector search on embeddings in Azure Cosmos DB for MongoDB vCore</a></li>



<li>OpenAI ChatGPT helped to revise this article</li>



<li>Images created with <a href="https://www.midjourney.com/home/?callbackUrl=%2Fapp%2F" target="_blank" rel="noreferrer noopener">Midjourney</a></li>
</ul>
<p>The post <a href="https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/">Building &#8220;Chat with your Data&#8221; Apps using Embeddings, ChatGPT, and Cosmos DB for Mongo DB vCore</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/step-by-step-guide-to-building-your-own-chatgpt-on-a-custom-knowledge-base-in-python-leveraging-mongo-db-and-embeddings/13687/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">13687</post-id>	</item>
		<item>
		<title>Vector Databases: The Rising Star in Generative AI Infrastructure</title>
		<link>https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/</link>
					<comments>https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/#comments</comments>
		
		<dc:creator><![CDATA[Florian Follonier]]></dc:creator>
		<pubDate>Sat, 06 May 2023 22:43:36 +0000</pubDate>
				<category><![CDATA[ChatBots]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[OpenAI]]></category>
		<guid isPermaLink="false">https://www.relataly.com/?p=13599</guid>

					<description><![CDATA[<p>Artificial intelligence (AI) continues its rapid evolution, with new advancements and innovations emerging on a frequent basis. A key enabler of these advancements is the robust infrastructure needed to store, process, and analyze colossal amounts of data. One critical part of this infrastructure is the vector database, a powerful solution for managing unstructured data types ... <a title="Vector Databases: The Rising Star in Generative AI Infrastructure" class="read-more" href="https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/" aria-label="Read more about Vector Databases: The Rising Star in Generative AI Infrastructure">Read more</a></p>
<p>The post <a href="https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/">Vector Databases: The Rising Star in Generative AI Infrastructure</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Artificial intelligence (AI) continues its rapid evolution, with new advancements and innovations emerging on a frequent basis. A key enabler of these advancements is the robust infrastructure needed to store, process, and analyze colossal amounts of data. One critical part of this infrastructure is the vector database, a powerful solution for managing unstructured data types including text, audio, images, and videos in numerical form.</p>



<p>Vector databases have gained traction in the AI sphere due to their ability to efficiently manage similarity searches across thousands of columns. They play a crucial role in powering large language models and other advanced AI applications. In this article, we will delve into the fundamentals of vector databases, their significance in AI infrastructure, and their transformative potential in managing and analyzing unstructured data.</p>



<p>Also: </p>



<p><a href="https://www.relataly.com/business-use-cases-for-openai-gpt-models-chatgpt-davinci/12200/" target="_blank" rel="noreferrer noopener">9 Business Use Cases of OpenAI’s ChatGPT</a><br><a href="https://www.relataly.com/eliminating-friction-how-openais-gpt-streamlines-online-experiences-and-reduces-the-need-for-google-searches/13171/">Using LLMs (OpenAI’s ChatGPT) to Streamline Digital Experiences</a></p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h2 class="wp-block-heading">Why are Vector Databases Integral to AI Infrastructure? </h2>



<p>The rise of vector databases is closely tied to the growing importance of embeddings in advanced generative AI applications. Embeddings are high-dimensional vectors that represent unstructured data, such as text, images, and audio, in a continuous numerical space. These vectors are essential for advanced generative AI applications such as natural language processing, computer vision, and speech recognition, where they are used to represent and analyze complex data.</p>



<h3 class="wp-block-heading">The Role of Embeddings in Generative AI</h3>



<p>Embeddings play a vital role in advanced generative AI applications such as natural language processing (NLP), where they are used to represent and analyze complex data. In the context of NLP, an embedding captures the semantic and syntactic meaning of words or sentences in a vector format that can be fed as input into deep learning models.</p>



<p>An example of an embedding for text could be representing the sentence &#8220;I love pizza&#8221; as a 300-dimensional vector, where each dimension captures a specific feature or attribute of the sentence, such as word count, the presence of certain keywords, or sentiment. The process of generating embeddings for natural language is typically done using pre-trained language models like OpenAI&#8217;s GPT or <a href="https://www.relataly.com/category/machine-learning-algorithms/bidirectional-encoder-representations-from-transformers-bert/">BERT</a>.</p>



<p>The length of an embedding vector is arbitrary and can vary depending on the specific use case and the model used to generate the embeddings. The quality of the embeddings can significantly affect the performance of NLP tasks such as language modeling, sentiment analysis, machine translation, and question-answering systems.</p>
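<p>To make the idea tangible, here is a deliberately simplified count-based vector. Real embedding models learn dense semantic vectors whose individual dimensions are not interpretable, so treat this purely as an illustration of &#8220;text in, list of numbers out&#8221;.</p>

```python
def toy_embedding(sentence, vocabulary):
    """Toy 'embedding': one dimension per vocabulary word, valued by word count.
    Illustration only; models like text-embedding-ada-002 instead return
    learned 1536-dimensional dense vectors."""
    words = sentence.lower().split()
    return [words.count(term) for term in vocabulary]

vocabulary = ["i", "love", "pizza", "pasta"]
vector = toy_embedding("I love pizza", vocabulary)  # [1, 1, 1, 0]
```

<p>Sentences with similar wording end up with similar vectors, which is the property that makes vector similarity search possible.</p>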



<p>Large language models (LLMs) are one of the most advanced AI applications that heavily rely on embeddings. These models have billions of parameters, and embeddings play a crucial role in training and fine-tuning these models to perform a wide range of NLP tasks.</p>



<h3 class="wp-block-heading">SQL Databases and Their Limitations in Handling High-Dimensional Embeddings</h3>



<p>SQL databases are designed to work with structured data that has a fixed schema and is typically stored in tables with rows and columns. In contrast, embeddings are high-dimensional vectors that represent unstructured data such as text, images, and audio in a continuous numerical space. Embeddings can have hundreds or even thousands of dimensions, making them unsuitable for storage in traditional SQL databases, which are optimized for working with smaller, fixed-dimensional datasets.</p>



<h3 class="wp-block-heading">Benefits of Vector Databases</h3>



<p>Vector databases are natively designed to handle high-dimensional vectors, such as embeddings. They can therefore provide a more scalable and efficient solution for storing, querying, and analyzing large amounts of unstructured data. With their ability to efficiently handle similarity searches across thousands of columns, vector databases have become an essential component of AI infrastructure, powering large language models and other advanced AI applications.</p>



<p>There are several reasons why vector databases are well-suited to handle embeddings:</p>



<ol class="wp-block-list">
<li><strong>Efficient storage</strong>: Vector databases are designed to store high-dimensional vectors efficiently, allowing them to handle large quantities of data while using minimal storage space. This is important for embeddings, which can have hundreds or thousands of dimensions.</li>



<li><strong>High-performance similarity search</strong>: Vector databases use specialized algorithms and data structures to perform high-performance similarity searches on embeddings. This allows users to quickly find the closest embeddings to a given query, making them well-suited for tasks such as image or text similarity search.</li>



<li><strong>Scalability</strong>: Vector databases are highly scalable, allowing them to easily handle large datasets. This is important for embeddings, which are often used in large language models and other AI applications that require vast amounts of data.</li>



<li><strong>Flexibility</strong>: Vector databases can handle various data types, including text, images, audio, and video. This makes them well-suited for a wide range of AI applications.</li>
</ol>
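<p>The similarity search mentioned above can be sketched in plain Python with a brute-force cosine-similarity scan. Production vector databases replace this linear scan with approximate nearest-neighbor indexes (such as HNSW), but the retrieval logic is the same in spirit; the tiny three-dimensional vectors below are made-up illustration data.</p>

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, vectors, k=2):
    """Brute-force top-k search; vector databases replace this O(n) scan
    with approximate indexes such as HNSW to stay fast at scale."""
    scored = sorted(vectors.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Tiny made-up "embedding store" for illustration
store = {
    "pizza":  [0.9, 0.1, 0.0],
    "pasta":  [0.8, 0.2, 0.1],
    "planet": [0.0, 0.1, 0.9],
}
print(nearest([1.0, 0.0, 0.0], store, k=2))  # ['pizza', 'pasta']
```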



<p>Overall, the specialized design of vector databases makes them well-suited for handling high-dimensional vectors such as embeddings, making them a crucial component of modern AI infrastructure.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="503" data-attachment-id="13605" data-permalink="https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/flo7up_multiple_points_in_a_high-dimensional_space_connected_by_8963e82a-1393-4173-b080-9149fc046c49-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_multiple_points_in_a_high-dimensional_space_connected_by_8963e82a-1393-4173-b080-9149fc046c49-min.png" data-orig-size="1012,994" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Flo7up_multiple_points_in_a_high-dimensional_space_connected_by_8963e82a-1393-4173-b080-9149fc046c49-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_multiple_points_in_a_high-dimensional_space_connected_by_8963e82a-1393-4173-b080-9149fc046c49-min.png" src="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_multiple_points_in_a_high-dimensional_space_connected_by_8963e82a-1393-4173-b080-9149fc046c49-min-512x503.png" alt="An embedding is a high-dimensional vector that represents unstructured data, such as text, images, and audio, in a continuous numerical space." 
class="wp-image-13605" srcset="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_multiple_points_in_a_high-dimensional_space_connected_by_8963e82a-1393-4173-b080-9149fc046c49-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_multiple_points_in_a_high-dimensional_space_connected_by_8963e82a-1393-4173-b080-9149fc046c49-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_multiple_points_in_a_high-dimensional_space_connected_by_8963e82a-1393-4173-b080-9149fc046c49-min.png 768w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_multiple_points_in_a_high-dimensional_space_connected_by_8963e82a-1393-4173-b080-9149fc046c49-min.png 1012w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">An embedding is a high-dimensional vector that represents unstructured data, such as text, images, and audio, in a continuous numerical space.</figcaption></figure>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h2 class="wp-block-heading">Semantic Search as a Way to Create Custom ChatGPT </h2>



<p>OpenAI&#8217;s approach to embeddings is an unsupervised learning method known as &#8220;representation learning.&#8221; The model learns to represent the data in a useful way for downstream tasks like natural language processing without being explicitly told what features to extract or how to represent the data. This approach has been highly effective in training LLMs, which can generate human-like text with remarkable accuracy.</p>



<p>However, one limitation of OpenAI&#8217;s models is that they can process only a limited amount of input at a time. GPT-3.5, for example, has a context limit of 4,096 tokens, which means it cannot search a larger knowledge base in a single request. This is where embeddings come into play.</p>
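<p>A common workaround for this token limit, sketched below, is to split long documents into chunks that each fit the model&#8217;s context window before embedding and storing them. The four-characters-per-token ratio used here is only a rough rule of thumb, not an exact tokenizer.</p>

```python
def chunk_text(text: str, max_tokens: int = 4096, chars_per_token: int = 4):
    """Split text into whitespace-delimited chunks that roughly fit a
    model's context window. Assumes ~4 characters per token, which is
    a crude heuristic; real pipelines use the model's tokenizer."""
    max_chars = max_tokens * chars_per_token
    chunks, current, length = [], [], 0
    for word in text.split():
        # Flush the current chunk before it would exceed the budget
        if length + len(word) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = "word " * 10_000
print(len(chunk_text(doc, max_tokens=100)))  # 125 chunks, each <= 400 chars
```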



<p>Vector databases are becoming increasingly popular for their ability to find meaning in unstructured data, a vital feature for advanced AI applications such as semantic search. Semantic search works much like ChatGPT, but it operates on a custom knowledge base. That knowledge can be anything from customer relationship management (CRM) data to technical manuals and research and development (R&amp;D) documents. The data needs to be stored somewhere that supports querying at low latency, and vector databases are perfectly suited to this task thanks to the advantages outlined above. The growing popularity of vector databases thus also reflects companies&#8217; growing interest in building custom ChatGPT applications on top of their internal knowledge.</p>
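<p>The retrieval step of such a custom-knowledge-base setup can be sketched as follows. Note that the <code>embed</code> function below is a stand-in (a trivial character-frequency vector, for illustration only); in a real system it would call an embedding model, the index would live in a vector database, and the retrieved text would be passed to the LLM as context.</p>

```python
import math

def embed(text: str) -> list[float]:
    # Placeholder: in practice, call an embedding model (e.g. OpenAI's
    # embeddings endpoint or a BERT encoder). Here we use a trivial
    # 26-dimensional character-frequency vector, for illustration only.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

# 1) Index the knowledge base (CRM notes, manuals, R&D docs, ...)
knowledge_base = {
    "refund-policy": "Customers can request a refund within 30 days.",
    "shipping":      "Orders ship within two business days.",
}
index = {doc_id: embed(text) for doc_id, text in knowledge_base.items()}

# 2) Retrieve the most relevant document for a user question
question = "How do I get my money back, can I request a refund?"
best = max(index, key=lambda doc_id: cosine(embed(question), index[doc_id]))
print(best)  # refund-policy

# 3) The retrieved text would then be given to the LLM as context:
# prompt = f"Answer using this context:\n{knowledge_base[best]}\n\nQ: {question}"
```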



<h2 class="wp-block-heading">Increasing Investments in Vector Database Startups</h2>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="512" height="288" data-attachment-id="13601" data-permalink="https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png" data-orig-size="1452,816" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min" data-image-description="&lt;p&gt;vector database&lt;/p&gt;
" data-image-caption="&lt;p&gt;vector database&lt;/p&gt;
" data-large-file="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png" src="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min-512x288.png" alt="OpenAI's approach to embeddings is an unsupervised learning method known as &quot;representation learning.&quot; The model learns to represent the data in a useful way for downstream tasks like natural language processing without being explicitly told what features to extract or how to represent the data. " class="wp-image-13601" srcset="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png 768w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png 1452w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption class="wp-element-caption">OpenAI&#8217;s approach to embeddings is an unsupervised learning method known as &#8220;representation learning.&#8221; The model learns to represent the data in a useful way for downstream tasks like natural language processing without being explicitly told what features to extract or how to represent the data.  </figcaption></figure>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Given the recent hype around AI, it&#8217;s no wonder that companies are investing heavily in vector databases to improve the accuracy and efficiency of their AI applications. This trend is reflected in the recent <a href="https://www.calcalistech.com/ctechnews/article/sjveg7ux2" target="_blank" rel="noreferrer noopener">funding rounds of vector database startups</a> such as Pinecone, Chroma, and Weaviate. However, established players also offer solutions for building AI applications on top of custom knowledge bases. For example, <a href="https://learn.microsoft.com/en-us/azure/search/semantic-search-overview" target="_blank" rel="noreferrer noopener">Azure Cognitive Search</a> is a powerful Microsoft service for building and deploying AI applications that leverage vector search, and <a href="https://cloud.google.com/blog/topics/developers-practitioners/find-anything-blazingly-fast-googles-vector-search-technology?hl=en" target="_blank" rel="noreferrer noopener">Vertex AI Matching Engine</a> is Google&#8217;s vector search offering. Despite the challenges posed by new startups, these established players remain competitive and continue to offer valuable solutions for businesses seeking to implement vector search in their AI workflows.</p>



<h2 class="wp-block-heading">A Close Look at Popular Vector Databases: Pinecone, Chroma, and Weaviate</h2>



<p>Finally, let&#8217;s take a look at three different dedicated vector databases that are optimized for working with vectors: Pinecone, Chroma, and Weaviate.</p>



<h3 class="wp-block-heading">Pinecone</h3>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-large"><img decoding="async" width="510" height="512" data-attachment-id="13651" data-permalink="https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min/#main" data-orig-file="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min.png" data-orig-size="904,908" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min" data-image-description="" data-image-caption="" data-large-file="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min.png" src="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min-510x512.png" alt="Given the recent hype around AI, it's no wonder that companies are investing heavily in vector databases to improve the accuracy and efficiency of their algorithms." 
class="wp-image-13651" srcset="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min.png 510w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min.png 140w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min.png 768w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_tech_investors_throwing_money_on_ai_startups_colorful_po_9d334af2-b3d8-40ba-891e-479f2a48c2e3-min.png 904w" sizes="(max-width: 510px) 100vw, 510px" /><figcaption class="wp-element-caption">Given the recent hype around AI, it&#8217;s no wonder that companies are investing heavily in vector databases to improve the accuracy and efficiency of their algorithms. </figcaption></figure>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p><a href="https://www.pinecone.io/" target="_blank" rel="noreferrer noopener">Pinecone</a> is a cloud-native vector database designed for high-performance, low-latency, and scalable vector similarity search. It can handle both dense and sparse vectors, making it a versatile choice for a wide range of use cases. Pinecone provides an easy-to-use API that allows users to add, search, and retrieve vectors with just a few lines of code. It also offers hybrid search functionality, which enables users to mix traditional text-based search with vector search.</p>



<p>One of the key advantages of Pinecone is its scalability. It can handle billions of vectors and provides automatic sharding and load balancing to ensure that search requests are distributed evenly across the available resources. Pinecone also offers real-time indexing and search. This means that new vectors are available for search immediately after they are added.</p>



<h3 class="wp-block-heading">Chroma</h3>



<p><a href="https://www.trychroma.com/" target="_blank" rel="noreferrer noopener">Chroma </a>is a simple, lightweight vector search database that can be used to build an in-memory document-vector store. It provides an easy-to-use API and is an excellent choice for users who want a quick and simple solution for vector similarity search. By default, Chroma vectorizes documents with a Sentence Transformers model, but it can also be configured to use custom embedding functions.</p>



<p>One of the key advantages of Chroma is its simplicity. It can be set up and configured quickly and easily and doesn&#8217;t require any special hardware or software. Chroma is also highly customizable, supporting custom vectorization models, custom similarity functions, and more. If you are looking for the Chroma Python library, <a href="https://pypi.org/project/chromadb/#:~:text=Embeddings%20databases%20(also%20known%20as,)%20embeddings%2C%20or%20your%20own." target="_blank" rel="noreferrer noopener">here</a> it is.</p>



<h3 class="wp-block-heading">Weaviate</h3>



<p>Weaviate is a feature-rich vector database designed for complex data modeling and search use cases. It provides a GraphQL API with support for vector similarity search and a range of other advanced search and filtering features. Weaviate can store and search various data types, including structured data, unstructured data, and images.</p>



<p>One of the key advantages of Weaviate is its flexibility. It can be used to build highly customized search applications with complex data models and search requirements. Weaviate also provides advanced search and filtering features, including geospatial search, range search, and fuzzy search. It also supports data federation, which enables users to search across multiple data sources.</p>



<h2 class="wp-block-heading">Other Noteworthy Vector Databases</h2>



<p>The vector databases above are just some examples. Several other vector databases may be worth a look too: </p>



<ul class="wp-block-list">
<li><a href="https://github.com/facebookresearch/faiss" target="_blank" rel="noreferrer noopener">Faiss</a>: Developed by Facebook&#8217;s AI Research team, Faiss provides efficient similarity search and clustering of dense vectors.</li>



<li><a href="https://github.com/nmslib/hnswlib" target="_blank" rel="noreferrer noopener">Hnswlib</a>: An open-source library for Approximate Nearest Neighbor Search, Hnswlib offers excellent speed and accuracy with minimal resource usage.</li>



<li><a href="https://milvus.io/" target="_blank" rel="noreferrer noopener">Milvus</a>: An open-source vector database designed for AI and analytics, Milvus offers scalable, reliable, and customizable solutions.</li>



<li><a href="https://qdrant.tech/" target="_blank" rel="noreferrer noopener">qdrant</a>: Qdrant is a vector similarity search engine with extended filtering capabilities, designed for organizing and searching large-scale vector data.</li>
</ul>



<p>Additionally, some databases that were not specifically designed for vectors can still handle them more efficiently than traditional SQL databases. Examples include <a href="https://azure.microsoft.com/en-us/free/cosmos-db/search/?ef_id=_k_Cj0KCQjw9deiBhC1ARIsAHLjR2AqM7_4aJksJcFeNuOFAJChAsJUZSifGrvRlkgBTPg049kcYzsh4B4aAi8ZEALw_wcB_k_&amp;OCID=AIDcmmtg9dwtad_SEM__k_Cj0KCQjw9deiBhC1ARIsAHLjR2AqM7_4aJksJcFeNuOFAJChAsJUZSifGrvRlkgBTPg049kcYzsh4B4aAi8ZEALw_wcB_k_&amp;gclid=Cj0KCQjw9deiBhC1ARIsAHLjR2AqM7_4aJksJcFeNuOFAJChAsJUZSifGrvRlkgBTPg049kcYzsh4B4aAi8ZEALw_wcB" target="_blank" rel="noreferrer noopener">Azure Cosmos DB</a>, <a href="https://www.elastic.co/?ultron=B-Stack-Trials-EMEA-C-Exact&amp;gambit=Stack-Core&amp;blade=adwords-s&amp;hulk=paid&amp;Device=c&amp;thor=elastic%20search%20engine&amp;gclid=Cj0KCQjw9deiBhC1ARIsAHLjR2ALCdy5sjOysA2IzYynQqizjlRUE1gTnpG9LWo_Wv9c_39vbBaBh0MaAi8PEALw_wcB" target="_blank" rel="noreferrer noopener">Elasticsearch</a>, and <a href="https://redis.com/docs/9-essential-database-capabilities/?utm_source=google&amp;utm_medium=cpc&amp;utm_term=redis%20db&amp;utm_campaign=redis360-brand-emea-19645427181&amp;utm_content=eb-nine_essential_database_capabilities&amp;gclid=Cj0KCQjw9deiBhC1ARIsAHLjR2CF-28k17pDOoZ3G_7hEwAgw5fzeItCa0mST9pU8z5XhRvjawQlKmIaArKjEALw_wcB" target="_blank" rel="noreferrer noopener">Redis</a>.</p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<h2 class="wp-block-heading" id="h-summary">Summary</h2>



<p>OpenAI&#8217;s advancements may have fueled the initial hype around AI, but demand for the infrastructure that supports AI applications is now on the rise. Vector databases are increasingly in the spotlight due to their proficiency in managing unstructured data and conducting efficient similarity searches across vast collections of vectors. With the escalating demand for AI infrastructure, vector databases are anticipated to continue gaining momentum. As businesses and organizations strive to leverage the power of AI, the reliance on vector databases will only grow, making them a cornerstone of future AI infrastructure.</p>



<p>I hope this article was helpful. If you have any remarks or questions, please let me know in the comments. </p>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"></div>
</div>



<h2 class="wp-block-heading" id="h-sources-and-further-reading">Sources and Further Reading</h2>



<ul class="wp-block-list">
<li><a href="https://en.wikipedia.org/wiki/Distributional%E2%80%93relational_database" target="_blank" rel="noreferrer noopener">https://en.wikipedia.org/wiki/Distributional%E2%80%93relational_database</a></li>



<li><a href="https://www.pinecone.io/" target="_blank" rel="noreferrer noopener">https://www.pinecone.io/</a></li>



<li><a href="https://www.trychroma.com/" target="_blank" rel="noreferrer noopener">https://www.trychroma.com/</a></li>



<li><a href="https://www.calcalistech.com/ctechnews/article/sjveg7ux2" target="_blank" rel="noreferrer noopener">https://www.calcalistech.com/ctechnews/article/sjveg7ux2</a></li>



<li><a href="https://weaviate.io/" target="_blank" rel="noreferrer noopener">https://weaviate.io/</a></li>



<li><a href="https://analyticsindiamag.com/why-are-investors-flocking-to-vector-databases/" target="_blank" rel="noreferrer noopener">https://analyticsindiamag.com/why-are-investors-flocking-to-vector-databases/</a></li>



<li>ChatGPT helped to revise this article.</li>



<li>Images generated with <a href="http://www.midjourney.com" target="_blank" rel="noreferrer noopener">Midjourney</a></li>
</ul>
<p>The post <a href="https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/">Vector Databases: The Rising Star in Generative AI Infrastructure</a> appeared first on <a href="https://www.relataly.com">relataly.com</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.relataly.com/vector-databases-the-rising-star-in-generative-ai-infrastructure/13599/feed/</wfw:commentRss>
			<slash:comments>8</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">13599</post-id>	</item>
	</channel>
</rss>
