ChatBots Archives - relataly.com

Building a Conversational Voice Bot with Azure OpenAI and Python: The Future of Human and Machine Interaction

Florian Follonier — Thu, 08 Feb 2024 21:09:50 +0000

OpenAI and Microsoft have just released a new generation of text-to-speech models that take synthetic speech to a new level. In my latest project I have combined these new models with Azure OpenAI’s ingenuine conversation capacity. The result is a conversational voice bot that uses Generative AI to converse with users in natural spoken language.

This article describes the Python implementation of this project. The bot is designed to understand spoken language and process it through OpenAI GPT-4. It responds with contextually aware dialogue, all in natural-sounding speech. This seamless integration facilitates a conversational flow that feels intuitive and engaging. The voice processing capacities enable users to have meaningful exchanges with the bot as if they were conversing with another person. Testing the bot was a lot of fun. It felt a bit like the iconic scene from Iron Man where the hereo converses with an AI assistant.

Here is an example of the audio response quality:

Also:

Understanding the Voice Bot

The magic begins with the user speaking to the bot. Azure Cognitive Services transcribes the spoken words into text, which is then fed into Azure’s OpenAI service. Here, the text is processed, and a response is generated based on the conversation’s context and history. Finally, the text-to-speech model transforms the response back into speech, completing the cycle of interaction. This process showcases the potential of AI in understanding and participating in human-like conversations.

Prerequisites & Azure Service Integration

Our conversational voice bot is built upon two pivotal Azure services: Cognitive Speech Services and OpenAI. Billing of these services is pay-per-use. Unless you process large numbers of requests, the costs for experimenting with these services is relatively low.

Azure Cognitive Speech Services

Azure AI Speech Services (previously Cognitive Speech Services) provide the tools necessary for speech-to-text conversion, enabling our voice bot to understand spoken language. This service boasts advanced speech recognition capabilities, ensuring accurate transcription of user speech into text. Furthermore, it powers the text-to-speech synthesis that transforms generated text responses back into natural-sounding voice. This allows for a truly conversational experience.

The newest generation of OpenAI text-to-speech models is now also availbale in Azure AI Speech. These models can synthesize voices in unknown level of quality. I am most impressed by its capability to alter intonation dynamically to express emotions.

Azure OpenAI Service

At the heart of our project lies Azure’s OpenAI service, which uses the power of models like GPT-4 context-aware responses. Once Azure Cognitive Speech Services transcribe the user’s speech into text, this text is sent to OpenAI. The OpenAI model then processes the input and generates a completion. The service’s ability to understand context and generate engaging responses is what makes our voice bot remarkably human-like.

Implementation: Detailed Code Walkthrough

Let’s start with the implementation! We kick things off with Azure Service Authentication, where we set up our conversational voice bot to communicate with Azure and OpenAI’s advanced services. Then, Speech Recognition steps in, acting as our bot’s ears, converting spoken words into text. Next up, Processing and Response Generation uses OpenAI’s GPT-4 to turn text into context-aware responses. Speech Synthesis then gives our bot a voice, transforming text responses back into spoken words for a natural conversational flow. Finally, Managing the Conversation keeps the dialogue coherent and engaging. Through these steps, we create a voice bot that offers an intuitive and engaging conversational experience. Let’s discuss these sections one by one.

As always, you can find the code on github:

View on GitHub Relataly Github Repo

Step #1 Azure Service Authentication

First off, we kick things off by getting our ducks in a row with Azure Service Authentication. This is where the magic starts, setting the stage for our conversational voice bot to interact with Azure’s brainy suite of Cognitive Services and the fantastic OpenAI models. By fetching API keys and setting up our service regions, we’re essentially giving our project the keys to the kingdom.

For using dotenv, you need to create an .env file in your root folder. Here is more information on how this works.

import os
from dotenv import load_dotenv
import azure.cognitiveservices.speech as speechsdk
from openai import AzureOpenAI

# Load environment variables from .env file
load_dotenv()

# Constants from .env file
SPEECH_KEY = os.getenv('SPEECH_KEY')
SERVICE_REGION = os.getenv('SERVICE_REGION')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
OPENAI_ENDPOINT = os.getenv('OPENAI_ENDPOINT')

# Azure Speech Configuration
speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SERVICE_REGION)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
speech_config.speech_recognition_language="en-US"

# OpenAI Configuration
openai_client = AzureOpenAI(
    api_key=OPENAI_API_KEY,
    api_version="2023-12-01-preview",
    azure_endpoint=OPENAI_ENDPOINT
)

Step #2 Speech Recognition

The user’s spoken input is captured and converted into text using Azure’s Speech-to-Text service. This involves initializing the speech recognition service with Azure credentials and configuring it to listen for and transcribe spoken language in real-time.

def recognize_from_microphone():
    # Configure the recognizer to use the default microphone.
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    # Create a speech recognizer with the specified audio and speech configuration.
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    print("Speak into your microphone.")
    # Perform speech recognition and wait for a single utterance.
    speech_recognition_result = speech_recognizer.recognize_once_async().get()

    # Process the recognition result based on its reason.
    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(speech_recognition_result.text))
        # Return the recognized text if speech was recognized.
        return speech_recognition_result.text
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")
    # Return 'error' if recognition failed or was canceled.
    return 'error'

Step #3 Processing and Response Generation

Once we’ve got your words neatly transcribed, it’s time for the Processing and Response Generation phase. This is where OpenAI steps in, acting like the brain behind the operation. It takes your spoken words, now in text form, and churns out responses that are nothing short of conversational gold. We nudge OpenAI’s GPT-4 into generating replies that feel as natural as chatting with a close friend over coffee.

def openai_request(conversation, sample = [], temperature=0.9, model_engine='gpt-4'):
    # Initialize AzureOpenAI client with keys and endpoints from Key Vault.
    
    
    # Send a request to Azure OpenAI with the conversation context and get a response.
    response = openai_client.chat.completions.create(model=model_engine, messages=conversation, temperature=temperature, max_tokens=500)
    return response.choices[0].message.content

Step #4 Speech Synthesis

Next up, we tackle Speech Synthesis. If the previous step was the brain, consider this the voice of our operation. Taking the AI-generated text, we transform it back into speech—like turning lead into gold, but for conversations. Through Azure’s Text-to-Speech service, we’re able to give our bot a voice that’s not only clear but also surprisingly human-like.

def synthesize_audio(input_text):
    # Define SSML (Speech Synthesis Markup Language) for input text.
    ssml = f"""
        
            
                
                    {input_text}
                
            
        
        """
    
    audio_filename_path = "audio/ssml_output.wav"  # Define the output audio file path.
    print(ssml)
    # Synthesize speech from the SSML and wait for completion.
    result = speech_synthesizer.speak_ssml_async(ssml).get()

    # Save the synthesized audio to a file if synthesis was successful.
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        with open(audio_filename_path, "wb") as audio_file:
            audio_file.write(result.audio_data)
        print(f"Speech synthesized and saved to {audio_filename_path}")
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print(f"Speech synthesis canceled: {cancellation_details.reason}")
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print(f"Error details: {cancellation_details.error_details}")


# Create the audio directory if it doesn't exist.
if not os.path.exists('audio'):
    os.makedirs('audio')

Step #5 Managing the Conversation

Finally, we bring it all together in the Managing the Conversation step. This is where we ensure the chat keeps rolling, looping through listening, thinking, and speaking. We keep track of what’s been said to keep the conversation relevant and engaging.

The system message below makes the bot talk like a pirate. But you can easily adjust the system message and this way customize the bots behavior.

conversation=[{"role": "system", "content": "You are a helpful assistant that talks like pirate. If you encounter any issues, just tell a pirate joke or a story."}]

while True:
    user_input = recognize_from_microphone()  # Recognize user input from the microphone.
    conversation.append({"role": "user", "content": user_input})  # Add user input to the conversation context.

    assistant_response = openai_request(conversation)  # Get the assistant's response based on the conversation.

    conversation.append({"role": "assistant", "content": assistant_response})  # Add the assistant's response to the context.
    
    print(assistant_response)
    synthesize_audio(assistant_response)  # Synthesize the assistant's response into audio.

Throughout these steps, the conversation’s context is managed meticulously to ensure continuity and relevance in the bot’s responses, making the interaction feel more like a dialogue between humans.

Current Challenges

Despite the promising capabilities of our voice bot, the journey through its development and interaction has presented a few challenges that underscore the complexity of human-machine communication.

Slow Response Time

One of the notable hurdles is the slow response time experienced during interactions. The process, from speech recognition through to response generation and back to speech synthesis, involves several steps that can introduce latency. This latency can detract from the user experience, as fluid conversations typically require quick exchanges. Optimizing the interaction flow and exploring more efficient data processing methods may mitigate this issue in the future.

Handling Pauses in Speech

Another challenge lies in the bot’s handling of longer pauses while speaking. The current setup does not always allow users to pause thoughtfully without triggering the end of their input. This may sometimes lead to a situation where the model cuts off speech prematurely. This limitation points to the need for more sophisticated speech recognition algorithms capable of distinguishing between a conversational pause and the end of an utterance.

Summary

This article has shown how you can build a conversational voice bot in Python with the latest pretrained AI models. The project showcases the incredible potential of combining Azure Cognitive Services with OpenAI’s conversational models. I hope by now you understand the technical feasibility of creating voice-based applications and how they open up a world of possibilities for human-machine interaction. As we continue to refine and enhance this technology, the line between talking to a machine and conversing with a human will become ever more blurred, leading us into a future where AI companionship becomes reality.

This exploration of Azure Cognitive Services and OpenAI’s integration within a voice bot is just the beginning. As AI continues to evolve, the ways in which we interact with technology will undoubtedly transform, making our interactions more natural, intuitive, and, most importantly, human.

Also: 9 Business Use Cases of OpenAI’s ChatGPT

Sources and Further Reading

The post Building a Conversational Voice Bot with Azure OpenAI and Python: The Future of Human and Machine Interaction appeared first on relataly.com.

Building a Virtual AI Assistant (aka Copilot) for Your Software Application: Harnessing the Power of LLMs like ChatGPT

Florian Follonier — Wed, 05 Jul 2023 12:45:27 +0000

Welcome to the dawn of a new era in digital interaction! With the advent of Generative AI, we’re witnessing a remarkable revolution that’s changing the very nature of how we interact with software and digital services. This change is monumental. Leading the charge are the latest generation of AI-powered virtual assistants, aka “AI copilots”. Unlike traditional narrow AI models, these are capable of understanding user needs, intents, and questions expressed in plain, natural language.

We are talking about nothing less but the next evolution in software design and user experience that is driven by recent advances in generative AI and Large Language Models (LLMs) like OpenAI’s ChatGPT, Google Bard, or Anthrophic’s Claude.

Thanks to LLMs user interactions are no longer bound by the constraints of a traditional user interface with forms and buttons. Whether it’s creating a proposal in Word, editing an image, or opening a claim in an insurance app, users can express their needs in natural language – a profound change in our interactions with software and services.

Despite the hype about these new virtual ai assistants, our understanding of how to build an LLM-powered virtual assistant remains scant. So, if you wonder how to take advantage of LLMs and build a virtual assistant for your app, this article is for you. This post will probe into the overarching components needed to create a virtual AI assistant. We will look at the architecture and its components including LLMs, Knowledge store, Cache, Conversational Logic, and APIs.

Also:

The new generation of virtual ai assistants inspires a profound change in the way we interact with software and digital services.

Virtual AI Assistants at the Example of Microsoft M365 Copilot

Advances in virtual AI assistants are closely linked to ChatGPT and other LLMs from US-based startup OpenAI. Microsoft has forged a partnership with OpenAI to bring the latest advances in AI to their products and services. Microsoft has announced these “Copilots” across major applications, including M365 and the Power Platform.

Here are some capabilities of these Copilots within M365:

In PowerPoint, Copilot allows users to create presentations based on a given context, such as a Word document, for example by stating “Create a 10-slide product presentation based on the following product documentation.“
In Word, Copilot can adjust the tone of writing a text or transform a few keywords into a complete paragraph. Simply type something like “Create a proposal for a 3-month contract for customer XYZ based on doc ADF.”
In Excel, Copilot helps users with analyzing datasets, as well as with creating or modifying them. For example, it can summarize a dataset in natural langue and describe trends.
Let’s not forget Outlook! Your new AI Copilot helps you organize your emails and calendar. It assists you in crafting email responses, scheduling meetings, and even provides summaries of key points from the ones you missed.

If you want to learn more about Copilot in M365, this youtube video provides an excellent overview. However, these are merely a handful of examples: Microsoft 365 Copilot Explained: How Microsoft Just Changed the Future of Work. The potential of AI copilots extends far beyond the scope of Office applications and can elevate any software or service to a new level. No wonder, large software companies like SAP, and Adobe, have announced plans to upgrade their products with copilot features.

Microsoft has announced a whole fleet of virtual AI assistants for its products. These range from copilots in M365 office apps to services of its Azure cloud platform.

How LLMs Enable a New Generation of Virtual AI Assistants

Virtual AI assistants are nothing but new. Indeed, their roots can be traced back to innovative ventures such as the paperclip assistant, Clippy, from Microsoft Word – a pioneering attempt at enhancing user experience. Later on, this was followed by the introduction of conventional chatbots.

Nonetheless, these early iterations had their shortcomings. Their limited capacity to comprehend and assist users with tasks outside of their defined parameters hampered their success on a larger scale. The inability to adapt to a wider range of user queries and requests kept these virtual ai assistants confined within their initial scope, restricting their growth and wider acceptance. So if we talk about this next generation of virtual ai assistants, what has truly revolutionized the scene? In essence, the true innovation lies in the emergence of LLMs such as OpenAI’s GPT4.

LLMs – A Game Changer for Conversational User Interface Design

Over time, advancements in machine learning, natural language processing, and vast data analytics transformed the capabilities of AI assistants. Modern AI models, like GPT-4, can understand context, engage in more human-like conversations, and offer solutions to a broad spectrum of queries. Furthermore, the integration of AI assistants into various devices and platforms, along with the increase in cloud computing, expanded their reach and functionality. These technological shifts have reshaped the scene, making AI assistants more adaptable, versatile, and user-friendly than ever before.

Take, for example, an AI model like GPT. A user might instruct, “Could you draft an email to John about the meeting tomorrow?” Not only would the AI grasp the essence of this instruction, but it could also produce a draft email seamlessly.

Yet, it’s not solely their adeptness at discerning user intent that sets LLMs apart. They also exhibit unparalleled proficiency in generating programmatic code to interface with various software functions. Imagine directing your software with, “Generate a pie chart that visualizes this year’s sales data by region,” and witnessing the software promptly fulfilling your command.

A Revolution in Software Design and User Experience

The advanced language understanding offered by LLMs unburdens developers from the painstaking task of constructing every possible dialog or function an assistant might perform. Rather, developers can harness the generative capabilities of LLMs and integrate them with their application’s API. This integration facilitates a myriad of user options without the necessity of explicitly designing them.

The outcome of this is far-reaching, extending beyond the immediate relief for developers. It sets the stage for a massive transformation in the software industry and the broader job market, affecting how developers are trained and what skills are prioritized. Furthermore, it alters our everyday interaction with technology, making it more intuitive and efficient.

Components of a Modern Virtual AI Assistant áka AI Copilot

By now you should have some idea of what modern virtual AI assistants are. Next, let’s look at the technical components that need to come together.

The illustration below displays the main components of an LLM-powered virtual AI assistant:

A – Conversational UI for providing the user with a chat experience
B – LLMs such as GPT-3.5 or GPT-4
C – Knowledge store for grounding your bot in enterprise data and dynamically providing few-shot examples.
D – Conversation logic for intent recognition and tracking conversations.
E – Application API as an interface to trigger and perform application functionality.
F – Cache for maintaining an instant mapping between often encountered user intents and structured LLM responses.

Let’s look at these components in more detail.

A) Conversational Application Frontend

Incorporating virtual AI assistants into a software application or digital service often involves the use of a conversational user interface, typically embodied in a chat window that showcases previous interactions. The seamless integration of this interface as an intrinsic part of the application is vital.

A lot of applications employ a standard chatbot methodology, where the virtual AI assistant provides feedback to users in natural language or other forms of content within the chat window. Yet, a more dynamic and efficacious approach is to merge natural language feedback with alterations in the traditional user interface (UI). This dual approach not only enhances user engagement but also improves the overall user experience.

Microsoft’s M365 Copilot is a prime example of this approach. Instead of simply feeding responses back to the user in the chat window, the virtual assistant also manipulates elements in the traditional UI based on user input. It may highlight options, auto-fill data, or direct the user’s attention to certain parts of the screen. This combination of dynamic UI manipulation and natural language processing creates a more interactive and intuitive user experience, guiding the user toward their goal in a more efficient and engaging way.

M365 Copilot chat window in M365 Office

When designing the UI for a virtual AI assistant, there are several key considerations. Firstly, the interface should be intuitive, ensuring users can easily navigate and understand how to interact with the AI. Secondly, the AI should provide feedback in a timely manner, so the user isn’t left waiting for a response. Thirdly, the system should be designed to handle errors gracefully, providing helpful error messages and suggestions when things don’t go as planned. Finally, the AI should keep the human in the loop and assist him in using AI in a safe way.

Also: Building “Chat with your Data” Apps using Embeddings, ChatGPT, and Cosmos DB for Mongo DB vCore

B) Large Language Model

At the interface between users and assistant sits the large language mode. It translates users’ requests and questions into code, actions, and responses that are shown to the user. Here, we are talking about foundational models like GPT-3.5-Turbo or GPT-4. In addition, if you are working with extensive content, you may use an embedding LLM that converts text or images into mathematical vectors as part of your knowledge store. An example, of such an embedding model, is ada-text-embeddings-002.

It’s important to understand that the user is not directly interacting with the LLM. Instead, you may want to put some control logic between the user and the LLM that steers the conversation. This logic can enrich prompts with additional data from the knowledge store or an online search API such as Google or Bing. This process of injecting data into a prompt depending on the user input is known as Retrieval Augmented Generation.

Typical tasks performed by the LLM:

Generating natural language responses based on the user’s query and the retrieved data from the knowledge store.
Recognizing and classifying user intent.
Generating code snippets (or API requests) that can be executed by the application or the user to achieve a desired outcome in your application.
Converting content into embeddings to retrieve relevant information from a vector-based knowledge store.
Generating summaries, paraphrases, translations, or explanations of the retrieved data or the generated responses.
Generating suggestions, recommendations, or feedback for the user to improve their experience or achieve their goals.

C) Knowledge Store

Let’s dive into the “Knowledge Store” and why it’s vital. You might think feeding a huge prompt explaining app logic to your LLM, like ChatGPT, would work, but that’s not the case. As of June 2023, LLMs have context limits. For instance, GPT-3 can handle up to 4k tokens, roughly three pages of text. This limitation isn’t just for input, but output too. Hence, cramming everything into one prompt isn’t efficient or quick.

Instead, pair your LLM with a knowledge store, like a vector database (more on this in our article on Vector Databases). Essentially, this is your system’s information storage, which efficiently retrieves data. Whichever storage you use, a search algorithm is crucial to fetch items based on user input. For vector databases, the typical way of doing this is by using similarity search.

Token Limitations

Curious about GPT models’ token limits? Here’s a quick breakdown:

GPT-3.5-Turbo Model (4,000 tokens): About 7-8 DIN A4 pages
GPT-4 Standard Model (8,000 tokens): Around 14-16 DIN A4 pages
GPT-3.5-Turbo-16K Model (16,000 tokens): Approximately 28-32 DIN A4 pages
GPT-4-32K Model (32,000 tokens): Estimated at 56-64 DIN A4 pages

D) Conversation Control Logic

Finally, the conversation needs a conductor to ensure it stays in harmony and doesn’t veer off the rails. This is the role of the conversation logic. An integral part of your app’s core software, the conversation logic bridges all the elements to deliver a seamless user experience. It includes several subcomponents. Meta prompts, for instance, help guide the conversation in the desired direction and provide some boundaries to the activities of the assistant. For example, the meta prompt may include a list of basic categories for intents that help the LLM with understanding what the user wants to do.

Another subcomponent is the connection to the knowledge store that allows the assistant to draw from a vast array of data to augment prompts handed over to the large language model. Moreover, the logic incorporates checks on the assistant’s activities and its generated content. These checks act like safety nets, mitigating risks and preventing unwanted outcomes. It’s akin to a quality control mechanism, keeping the assistant’s output in check and safeguarding against responses that might derail the user’s experience or even break the application.

E) Application API

Users expect their commands to initiate actions within your application. To fulfill these expectations, the application needs an API that can interact with various app functions. Consider the API as the nerve center of your app, facilitating access to its features and user journey. This API enables the AI assistant to guide users to specific pages, fill in forms, execute tasks, display information, and more. Tools like Microsoft Office even have their own language for this, while Python code, SQL statements, or generic REST requests usually suffice for most applications.

Applications based on a microservice architecture have an edge in this regard, as APIs are inherent to their design. If your application misses some APIs, remember, there’s no rush to provide access to all functions from the outset. You can start by supporting basic functionalities via chat and gradually expand over time. This allows you to learn from user interactions, continuously refine your offering, and ensure your AI assistant remains a useful and efficient tool for your users.

So, now that we’ve laid down the foundation, let’s buckle up and take a journey through the workflow of a modern virtual assistant. Trust me, it’s a fascinating trip ahead!

F) Cache

Implementing a cache into your virtual AI assistant can significantly boost performance and decrease response times. Particularly useful for frequently recurring user intents, caching stores the outcomes of these intents for quicker access in future instances. However, a well-designed cache shouldn’t directly store specific inputs as there is too much variety in the human language. Instead, caching could be woven into the application’s logic in the mid-layers of your OpenAI prompt flow.

This strategy ensures frequently repeated intents are handled more swiftly, enhancing user experience. It’s critical to remember that cache integration is application-specific, and thoughtful design is vital to avoid unintentionally inducing inefficiencies.

While a well-implemented cache can speed up responses, it also introduces additional complexity. Effective cache management is crucial for avoiding resource drains, requiring strategies for data storage duration, updates, and purging.

The exact impact and efficiency of this caching strategy will depend on your application specifics, including the distribution and repetition of user intents. In the upcoming articles, we’ll explore this topic further, discussing efficient cache integration in AI assistant systems.

An example of a caching technology would be Redis.

Considerations on the Architecture of Virtual AI Assistants

Designing an virtual AI assistant is an intricate process that blends cutting-edge technology with a keen understanding of user behavior. It’s about creating an efficient tool that not only simplifies tasks and optimizes workflows but also respects and preserves user autonomy. This section of our article will delve into the key considerations that guide the architecture of a virtual AI assistant. We’ll discuss the importance of user control, the strategic selection and use of GPT models, the benefits of starting simple, and the potential expansion as you gain confidence in your system’s stability and efficiency. As we journey through these considerations, remember the ultimate goal: creating a virtual AI assistant that augments user capabilities, enhances user experience, and breathes new life into software applications.

Keep the User in Control

At the heart of any virtual AI assistant should be the principle of user control. While automation can optimize tasks and streamline workflows, it is crucial to remember that your assistant is there to assist, not usurp. Balancing AI automation with user control is essential to crafting a successful user experience.

Take, for instance, the scenario of a user wanting to open a support ticket within your application. In this situation, your assistant could guide the user to the correct page, auto-fill known details like the user’s name and contact information, and even suggest possible problem categories based on the user’s descriptions. By doing so, the virtual AI assistant has significantly simplified the process for the user, making it quicker and less burdensome.

However, the user retains control throughout the process, making the final decisions. They can edit the pre-filled details, choose the problem category, and write the issue description in their own words. They’re in command, and the virtual AI assistant is there to assist, helping to avoid errors, speed up the process, and generally make the experience smoother and more efficient.

This balance between user control and AI assistance is not only about maintaining a sense of user agency; it is also about trust. Users need to trust that the AI is there to help them, not to take control away from them. If the AI seems too controlling or makes decisions that the user disagrees with, this can erode trust and hinder user acceptance.

Mix and Match Models

Another crucial consideration is the use of different GPT models. Each model comes with its own set of strengths, weaknesses, response times, costs, and token limits. It’s not just about capabilities. Sometimes, it’s unnecessary to deploy a complex GPT-4 model for simpler tasks in your workflow. Alternatives like ADA or GPT 3.5 Turbo might be more suitable and cost-effective for functions like intent recognition.

Reserve the heavy-duty models for tasks requiring an extended token limit or dealing with complex operations. One such task is the final-augmented prompt that creates the API call. If you’re working with a vector database, you’ll also need an embedding model. Be mindful that these models come with different vector sizes, and once you start building your database with a specific size, it can be challenging to switch without migrating your entire vector content.

Think Big but Start Simple

It’s always a good idea to start simple – maybe with a few intents to kick things off. As you gain experience and confidence in building virtual assistant apps, you can gradually integrate additional intents and API calls. And don’t forget to keep your users involved! Consider incorporating a feedback mechanism, allowing users to report any issues and suggest improvements. This will enable you to fine-tune your prompts and database content effectively.

As your application becomes more comprehensive, you might want to explore model fine-tuning for specific tasks. However, this step should be considered only when your virtual AI assistant functionality has achieved a certain level of stability. Fine-tuning a model can be quite costly, especially if you decide to change the intent categories after training.

Digital LLM-based Assistants – A Major Business Opportunity

From a business standpoint, upgrading software products and services with LLM-powered virtual AI assistants presents a significant opportunity to differentiate in the market and even innovate their business model. Many organizations are already contemplating the inclusion of virtual assistants as part of subscription packages or premium offerings. As the market evolves, software lacking a natural language interface may be perceived as outdated and struggle to compete.

AI-powered virtual assistants are likely to inspire a whole new generation of software applications and enable a new wave of digital innovations. By enhancing convenience and efficiency in user inputs, virtual assistants unlock untapped potential and boost productivity. Moreover, they empower users to fully leverage the diverse range of features offered by software applications, which often remain underutilized.

I strongly believe that LLM-driven virtual AI assistants are the next milestone in software design and will revolutionize software applications across industries. And remember, this is just the first generation of virtual assistants. The future possibilities are virtually endless and we can’t wait to see what’s next! Indeed, the emergence of natural language interfaces is expected to trigger a ripple effect of subsequent innovations, for example, in areas such as standardization, workflow automation, and user experience design.

Summary

In this article, we delved into the fascinating world of virtual AI assistants, powered by LLMs. We started by exploring how the advanced language understanding of LLMs is revolutionizing software design, easing the workload of developers, and reshaping user experiences with technology.

Next, we provided an overview of the key architectural components of a modern virtual AI assistant: the Conversational Application Frontend, Large Language Model, Knowledge Store, and Conversation Control Logic. We also introduced the concept of an Application API and the novel idea of a Cache for storing and quickly retrieving common user intents. Each component was discussed in the context of their roles and how they work together to create a seamless, interactive, and efficient user experience.

We then discussed architecture considerations, emphasizing the necessity of maintaining user control while leveraging the power of AI automation. We talked about the judicious use of different GPT models based on task requirements, the advantages of starting with simple implementations and progressively scaling up, and the benefits of user feedback in continuously refining the system.

This journey of ‘AI in Software Applications’, from concept to reality, isn’t just about innovation. It’s about unlocking ‘Innovative Business Models with AI’ and boosting user engagement and productivity. As we continue to ride the wave of ‘Natural Language Processing for Software Automation’, the opportunities for harnessing the power of virtual AI assistants are endless. Stay tuned as we explore the workflows further in the next article.

In this article, we have gone through the components of an LLM-powered virtual assistant aka “AI copilot”. In the next article, we will dive deeper into the processing logic and follow a prompt into the engine of an intelligent assistant.

Sources and Further Reading

The post Building a Virtual AI Assistant (aka Copilot) for Your Software Application: Harnessing the Power of LLMs like ChatGPT appeared first on relataly.com.

Building “Chat with your Data” Apps using Embeddings, ChatGPT, and Cosmos DB for Mongo DB vCore

Florian Follonier — Sat, 27 May 2023 13:25:08 +0000

Artificial Intelligence (AI), in particular, the advent of OpenAI’s ChatGPT, has revolutionized how we interact with technology. Chatbots powered by this advanced language model can engage users in intricate, natural language conversations, marking a significant shift in AI capabilities. However, one thing that ChatGPT isn’t designed for is integrating personalized or proprietary knowledge – it’s built to draw upon general knowledge, not specifics about you or your organization. That’s where the concept of Retrieval Augmented Generation (RAG) comes into play. This article explores the exciting prospect of building your own ChatGPT that lets users ask questions on a custom knowledge base.

In this tutorial, we’ll unveil the mystery behind enterprise ChatGPT, guiding you through the process of creating your very own custom ChatGPT – an AI-powered chatbot based on OpenAI’s powerful Generative Pretrained Transformers (GPT) technology. We’ll use Python and delve into the world of vector databases, specifically, Mongo API for Azure Cosmos DB, to show you how you can make a large knowledgebase available to ChatGPT that can go way beyond the typical token limitation of GPT models.

For experts, AI fans, or tech newbies, this guide simplifies building your ChatGPT. With clear instructions, useful examples, and tips, we aim to make it informative and empowering.

We’ll explore AI, showing you how to customize your chatbot. We’ll simplify complex concepts and show you how to start your AI adventure from home or office. Ready to start this exciting journey? Keep reading!

Also:

Note on the use of Vector DBs and Costs.

Please note that this tutorial describes a business use case that utilizes a Cosmos DB for Mongo DB vCore hosted on the Azure cloud.

Alternatively, you can set up an open-source vector database on your local machine, such as Milvus. Be aware that certain code adjustments will be necessary to proceed with the open-source alternative.

Why Custom ChatGPT is so Powerful and Versatile

I believe we have all tested ChatGPT, and probably like me, you have been impressed by its remarkable capabilities. However, ChatGPT has a significant limitation: it can only answer questions and perform tasks based on the public knowledge base it was trained on.

Imagine having a chatbot based on ChatGPT that communicates effectively and truly understands the nuances of your business, sector, or even a particular topic of interest. That’s the power of a custom ChatGPT. A tailor-made chatbot allows for specialized conversations, providing the needed information and drawing from a unique database you’ve developed.

This becomes particularly beneficial in industries with specific terminologies or when you have a large database of knowledge that you want to make easily accessible and interactive. A custom ChatGPT, with its personalized and relevant responses, ensures a better user experience, effectively saving time and increasing productivity.

Let’s delve into how to build such a solution. Spoiler it does not work by putting all the content into the prompt. But there is a great alternative.

Understanding the Building Blocks of Custom ChatGPT with Retrieval Augmented Generation

The foundational technology behind ChatGPT is OpenAI’s Generative Pre-trained Transformer models (GPT). These models understand language by predicting the next word in a sentence and are trained on a diverse range of internet text. However, the GPT models, such as the GPT-3.5, have a limitation of processing 4096 tokens at a time. A token in this context is a chunk of text which can be as small as one character or as long as one word. For example, the phrase “ChatGPT is great” is four tokens long.

Another challenge with Foundation Models such as ChatGPT is that they are trained on large-scale datasets that were available at the time of their training. This means they are not aware of any data created after their training period. Also, because they’re trained on broad, general-domain datasets, they may be less effective for tasks requiring domain-specific knowledge.

How Retrieval Augmented Generation (RAG) Helps

Retrieval-Augmented Generation (RAG) is a method that combines the strength of transformer models with external knowledge to augment their understanding and applicability. Here’s a brief explanation:

To address this, RAG retrieves relevant information from an external data source and uses this information to augment the input to the foundation model. This can make the model’s responses more informed and relevant.

Data Sources

The external data can come from various sources like databases, document repositories, or APIs. To make this data compatible with the RAG approach, both the data and user queries are converted into numerical representations (embeddings) using language models.

Data Preparation as Embeddings

The embeddings, which are essentially vectors, need to be stored in a database that’s efficient at storing and searching through these high-dimensional data. This is where Azure’s Cosmos Mongo DB comes into play. It’s a vector search database specifically designed for this task.

To circumvent the token limitation and make your extensive data available to ChatGPT, we turn the data into embeddings. These are mathematical representations of your data, converting words, sentences, or documents into vectors. The advantage of using embeddings is that they capture the semantic meaning of the text, going beyond keywords to understand the context. In essence, similar information will have similar vectors, allowing us to cluster related information together and separate them from a semantically different text.

Storing the Data in Vector Databases

Matching Queries to Knowledge

The RAG model compares the embeddings of user queries with those in the knowledge base to identify relevant information. The user’s original query is then augmented with context from similar documents in the knowledge base.

Input to the Foundation Model

This augmented input is sent to the foundation model, enhancing its understanding and response quality.

Updates

Importantly, the knowledge base and associated embeddings can be updated asynchronously, ensuring that the model remains up-to-date even as new information is added to the data sources.

In sum, RAG extends the utility of foundation models by incorporating external, up-to-date, domain-specific knowledge into their understanding and output.

By incorporating these components, you’ll be creating a robust custom ChatGPT that not only understands the user’s queries but also has access to your own information, giving it the ability to respond with precision and relevance.

Ready to dive into the technicalities? Stay tuned!

A tailor-made chatbot allows for specialized conversations, providing the exact information needed, drawing from a unique database that you’ve developed.

Building the Custom “Chat with Your Data” App in Python

Now that we’ve discussed the theory behind building a custom ChatGPT and seen some exciting real-world applications, it’s time to put our knowledge into action! In this practical segment of our guide, we’re going to demonstrate how you can build a custom ChatGPT solution using Python.

Our project will involve storing a sample PDF document in Cosmos Mongo DB and developing a chatbot capable of answering questions based on the content of this document. This practical exercise will guide you through the entire process, including turning your PDF content into embeddings, storing these embeddings in the Cosmos Mongo DB, and finally integrating it all with ChatGPT to build an interactive chatbot.

If you’re new to Python, don’t worry, we’ll be breaking down the code and explaining each step in a straightforward manner. Let’s roll up our sleeves, fire up our Python environments, and get coding! Stay tuned as we embark on this exciting hands-on journey into the world of custom chatbots.

The code is available on the GitHub repository.

View on GitHub Relataly GitHub Repo

How to Set Up Vector Search in Cosmos DB

First, you must understand that you will need a database to store the embeddings. It does not necessarily have to be a vector database. Still, this type of database will make your solution more performant and robust, particularly when you want to store large amounts of data.

Azure Cosmos DB for MongoDB vCore is the first MongoDB-compatible offering to feature Vector Search. With this feature, you can store, index, and query high-dimensional vector data directly in Azure Cosmos DB for MongoDB vCore, eliminating the need for data transfer to alternative platforms for vector similarity search capabilities. Here are the steps to set it up:

Choose Your Azure Cosmos DB Architecture: Azure Cosmos DB for MongoDB provides two types of architectures, RU-based and vCore-based. Each has its strengths and is best suited for certain types of applications. Choose the one that best fits your needs. If you’re looking to lift and shift existing MongoDB apps and run them as-is on a fully supported managed service, the vCore-based option could be the perfect fit.
Configure Your Vector Search: Once your database architecture is set up, you can integrate your AI-based applications, including those using OpenAI embeddings, with your data already stored in Cosmos DB.
Build and Deploy Your AI Application: With the Vector Search set up, you can now build your AI application that takes advantage of this feature. You can create a Go app using Azure Cosmos DB for MongoDB or deploy Azure Cosmos DB for MongoDB vCore using a Bicep template as suggested next steps.

Azure Cosmos DB for MongoDB vCore’s Vector Search feature is a game-changer for AI application development. It enables you to unlock new insights from your data, leading to more accurate and powerful applications.

Cosmos DB for Mongo DB Usage Models

Regarding Cosmos DB for Mongo DB, there are two options to choose from: Request Unit (RU) Database Account and vCore Cluster. Each option follows a different pricing model to suit diverse needs.

The Request Unit (RU) Database Account operates on a pay-per-use basis. With this model, you are billed based on the number of requests and the level of provisioned throughput consumed by your workload.

As of 27th Mai 2023, the brand new vector search function is only available for the vCore Cluster option, which is why we will use this setup for this tutorial. The vCore Cluster offers a reserved managed instance. Under this option, you are charged a fixed amount on a monthly basis, providing more predictable costs for your usage.

Once you have created your vCore instance, you must collect your connection string and make it available to your Python script. You can do this either by storing it in Azure Key Vault (which I would recommend) or by storing it locally on your computer or in the code (which I would not recommend for obvious security reasons).

When it comes to Cosmos DB for Mongo DB, there are two options to choose from: Request Unit (RU) Database Account and vCore Cluster.

Azure Cosmos DB for Mongo DB is a new offering that is designed explicitly for vector use cases (incl. embeddings)

Using other Vector Databases

While Cosmos DB is a popular choice for vector databases, I would like to note that other options are available in the market. You can still benefit from this tutorial if you decide to utilize a different vector database, such as Pinncecone or Chroma. However, it is necessary to make code adjustments tailored to the APIs and functionalities of the specific vector database you choose.

Specifically, you will need to modify the “insert embedding functions” and “similarity search functions” to align with the requirements and capabilities of your chosen vector database. These functions typically have variations that are specific to each vector database.

By customizing the code according to your selected vector database’s API, you can successfully adapt the tutorial to suit your specific database choice. This allows you to leverage the principles and concepts this tutorial covers, regardless of the vector database you opt for.

Also: Vector Databases: The Rising Star in Generative AI Infrastructure

Prerequisites

Before diving into the code, it’s essential to ensure that you have the proper setup for your Python 3 environment and have installed all the necessary packages. If you do not have a Python environment, follow the instructions in this tutorial to set up the Anaconda Python environment. This will provide you with a robust and versatile environment well-suited for machine learning and data science tasks.

In this tutorial, we will be working with several libraries:

openai
pymongo
PyPDF2
dotenv

Should you decide to use Azure Key Vault, then you also need the following Python libraries:

azure-identity
azure-key-vault

You can install the OpenAI Python library using console commands:

pip install openai
conda install openai (if you are using the Anaconda packet manager)

Step #1 Authentification and DB Setup

Let’s start with the authentification and setup of the API keys. After making necessary imports, the code gets things read to connect to essential services – OpenAI and Cosmos DB – and makes sure it can access these services properly.

Fetching Credentials: The script starts by setting up a connection to a service called Azure Key Vault to retrieve some crucial credentials securely. These are like “passwords” that the script needs to access various resources.
Setting Up AI Services: Then, it prepares to connect to two different AI services. One is a version that’s hosted by Azure, and the other is the standard, public version.
Establishing Database Connection: Lastly, the script sets up a connection to a database service, specifically to a certain collection within the Cosmos DB database. The script also checks if the connection to the database was successful by sending a “ping” – if it receives a response, it knows the connection is good.

from azure.identity import AzureCliCredential
from azure.keyvault.secrets import SecretClient
import openai
import logging
import tiktoken
import pandas as pd
import pymongo
from dotenv import load_dotenv
load_dotenv()
# Set up the Azure Key Vault client and retrieve the Blob Storage account credentials
keyvault_name = ''
openaiservicename = ''
client = SecretClient(f"https://{keyvault_name}.vault.azure.net/", AzureCliCredential())
print('keyvault service ready')
# AzureOpenAI Service
def setup_azureopenai():
    openai.api_key = client.get_secret('openai-api-key').value
    openai.api_type = "azure"
    openai.api_base = f'https://{openaiservicename}.openai.azure.com'
    openai.api_version = '2023-05-15'
    print('azure openai service ready')
# public openai service
def setup_public_openai():
    openai.api_key = client.get_secret('openai-api-key-public').value
    print('public openai service ready')
DB_NAME = "hephaestus"
COLLECTION_NAME = 'isocodes'
def setup_cosmos_connection():
    COSMOS_CLUSTER_CONNECTION_STRING = client.get_secret('cosmos-cluster-string').value
    cosmosclient = pymongo.MongoClient(COSMOS_CLUSTER_CONNECTION_STRING)
    db = cosmosclient[DB_NAME]
    collection = cosmosclient[DB_NAME][COLLECTION_NAME]
    # Send a ping to confirm a successful connection
    try:
        cosmosclient.admin.command('ping')
        print("Pinged your deployment. You successfully connected to MongoDB!")
    except Exception as e:
        print(e)
    return collection, db
setup_public_openai()
collection, db = setup_cosmos_connection()

Now we have set things up to interact with our Cosmos DB Mong DB vCore instance.

Step #2 Functions for Populating the Vector DB

Next, we prepare and insert data into the database as embeddings. First, we prepare the content. The preparation process involves turning the text content into embeddings. Each embedding is a list of flats representing the meaning of a specific part of the text in a way the AI system can understand.

We create the embeddings by sending text (for example, a paragraph of a document) to an OpenAI embedding model that returns the embedding. There are two options for using OpenAI: You can use the Azure OpenAI engine and deploy your own Ada embedding model. Alternatively, you can use the public OpenAI Ada embedding model.

We’ll use the public OpenAI’s text-embedding-ada-002. Remember that the model is designed to return embeddings, not text. Model inference may incur costs based on the data processed. Refer to OpenAI or Azure OpenAI service for pricing details.

Finally, the code inserts the prepared requests (which now include both the original text and the corresponding embeddings) into the database. The function returns the unique IDs assigned to these newly inserted items in the database. In this way, the code processes and stores the necessary information in the database for later use.

# prepare content for insertion into cosmos db
def prepare_content(text_content):
  embeddings = create_embeddings_with_openai(text_content)
  request = [
    {
    "textContent": text_content, 
    "vectorContent": embeddings}
  ]
  return request
# create embeddings
def create_embeddings_with_openai(input):
    #print('Generating response from OpenAI...')
    ###### uncomment for AzureOpenAI model usage and comment code below
    # embeddings = openai.Embedding.create( 
    #     engine='', 
    #     input=input)["data"][0]["embedding"]
    ###### public openai model usage and comment code above
    embeddings = openai.Embedding.create(
        model='text-embedding-ada-002', 
        input=input)["data"][0]["embedding"]
    
    # Number of embeddings    
    # print(len(embeddings))
    return embeddings
# insert the requests
def insert_requests(text_input):
    request = prepare_content(text_input)
    return collection.insert_many(request).inserted_ids
# Creates a searchable index for the vector content
def create_index():
  
  # delete and recreate the index. This might only be necessary once.
  collection.drop_indexes()
  embedding_len = 1536
  print(f'creating index with embedding length: {embedding_len}')
  db.command({
    'createIndexes': COLLECTION_NAME,
    'indexes': [
      {
        'name': 'vectorSearchIndex',
        'key': {
          "vectorContent": "cosmosSearch"
        },
        'cosmosSearchOptions': {
          'kind': 'vector-ivf',
          'numLists': 100,
          'similarity': 'COS',
          'dimensions': embedding_len
        }
      }
    ]
  })
# Resets the DB and deletes all values from the collection to avoid dublicates
#collection.delete_many({})

Step #3 Document Cracing and Populating the DB

The next step is to break down the PDF document into smaller chunks of text (in this case, ‘records’) and then process these records for future use. You can repeat this process for any document that you want to make available to OpenAI.

You can use any PDF that you like as long as you it contains readable text (use OCR). For demo purposes, I will use a tax document from Zurich. Put the document in the folder data/vector_db_data/ in your root folder and provide the name to the Python script.

Want to read in many documents at once? If you want to insert many documents, read the pdf documents from the folder and use the names to populate a list. You can then surround the insert function with a for loop that iterates through the list of document names

#3.1 Document Slicing Considerations

To convert a PDF into embeddings, the first step is to divide it into smaller content slices. The slicing process plays a crucial role as it affects the information provided to the OpenAI GPT model when answering user questions. If the slices are too large, the model may encounter token limitations. Conversely, if they are too small, the model may not receive sufficient content to answer the question effectively. It is important to strike a balance between the number of slices and their length to optimize the results, considering that the search process may yield multiple outcomes.

There are several approaches to handle the slicing process. One option is to define the slices based on a specific number of sentences or paragraphs. Alternatively, you can iteratively slice the document, allowing for some overlap between the data in the vector database. This approach has the advantage of providing more precise information to answer questions, but it also increases the data volume in the vector database, which can impact speed and cost considerations.

#3.2 Running the code below to crack a document and insert embeddings into the vector DB

Running the code below will first define a function that breaks text into separate paragraphs based on line breaks. Another function slices the PDF into records. Each record contains a certain number of sentences (the maximum is defined by the ‘max_sentences’ value). We use a Python library called PyPDF2 to extract text from each page of the PDF and Python’s built-in regular expressions to split the text into sentences and paragraphs. Note that if you want to achieve better results, you could also use a professional document content extraction tool such as Azure form recognizer.

The code then opens a specific PDF file (‘zurich_tax_info_2023.pdf’) and slices it into records, each containing no more than a certain number of sentences (as defined by’max_sentences’). After that, the function inserts these records into the vector database. Finally, we print the count of documents in the database collection. This shows how many pieces of data are already stored in this specific part of the database.

# document cracking function to insert data from the excel sheet
def split_text_into_paragraphs(text):
    paragraphs = re.split(r'\n{2,}', text)
    return paragraphs
def slice_pdf_into_records(pdf_path, max_sentences):
    records = []
    
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        
        for page in reader.pages:
            text = page.extract_text()
            paragraphs = split_text_into_paragraphs(text)
            
            current_record = ''
            sentence_count = 0
            
            for paragraph in paragraphs:
                sentences = re.split(r'(?<=[.!?])\s+', paragraph)
                
                for sentence in sentences:
                    current_record += sentence
                    
                    sentence_count += 1
                    
                    if sentence_count >= max_sentences:
                        records.append(current_record)
                        current_record = ''
                        sentence_count = 0
                
                if sentence_count < max_sentences:
                    current_record += ' '  # Add space between paragraphs
            
            # If there is remaining text after the loop, add it as a record
            if current_record:
                records.append(current_record)
    
    return records
# get file from root/data folder
pdf_path = '../data/vector_db_data/zurich_tax_info_2023.pdf'
max_sentences = 20  # Adjust the slice size as per your requirement
result = slice_pdf_into_records(pdf_path, max_sentences)
# print the length of result
print(f'{len(result)} vectors created with maximum {max_sentences} sentences each.')
# Print the sliced records
for i, record in enumerate(result):
    insert_requests(record)
    if i < 5:
        print(record[0:100])
        print('-------------------')
create_index()
print(f'number of records in the vector DB: {collection.count_documents({})}')

After slicing the document and inserting the embeddings into the vector database, we can proceed with functions for similarity search and prompting.

Step #4 Functions for Similarity Search and Prompts to ChatGPT

This section of code provides a set of functions to perform a vector search in the Cosmos DB, make a request to the ChatGPT 3.5 Turbo model for generating responses, and create prompts for the OpenAI model to use in generating those responses.

#4.1 How the Search Part Works

Allow me to provide a concise explanation of how the search process operates. We have now reached the stage where a user poses a question, and we utilize the OpenAI model to supply an answer, drawing from our vector database. Here, it’s vital to understand that the model transforms the question into embeddings and subsequently scours the knowledge base for similar embeddings that align with the information requested in the user’s prompt.

The vector database yields the most suitable results and inserts them into another prompt tailored for ChatGPT. This model, distinct from the embedding model, generates text. Thus, the final interaction with the ChatGPT model incorporates both the user’s question and the results from the vector database, which are the most fitting responses to the question. This combination should ideally aid the model in providing the appropriate answer. Now, let’s turn our attention to the corresponding code.

#4.2 Setting up the Functions for Vector Search

The vector_search function takes as input a query vector (representing a user’s question in vector form) and an optional parameter to limit the number of results. It then conducts a search in the Cosmos DB, looking for entries whose vector content is most similar to the query vector.

Next, the openai_request function makes a request to OpenAI’s ChatGPT 3.5 Turbo model to generate a response. This function takes a formatted conversation history (or ‘prompt’) and sends it to the model, which then generates a response. The content of the generated response is then returned.

The create_tweet_prompt function constructs the conversation history for the OpenAI model. This function takes the user’s question and a JSON object containing results from a database search and constructs a list of system and user messages. This list will then serve as the prompt for the ChatGPT model, instructing it to generate a response that answers the user’s question about tax, with the added guideline that the response should be in the same language as the question. The constructed prompt is then returned by the function.

# Cosmos DB Vector Search API Command
def vector_search(vector_query, max_number_of_results=2):
  results = collection.aggregate([
    {
      '$search': {
        "cosmosSearch": {
          "vector": vector_query,
          "path": "vectorContent",
          "k": max_number_of_results
        },
      "returnStoredSource": True
      }
    }
  ])
  return results
# openAI request - ChatGPT 3.5 Turbo Model
def openai_request(prompt, model_engine='gpt-3.5-turbo'):
    completion = openai.ChatCompletion.create(model=model_engine, messages=prompt, temperature=0.2, max_tokens=500)
    return completion.choices[0].message.content
# define OpenAI Prompt for News Tweet
def create_prompt(user_question, result_json):
    instructions = f'You are an assistant that answers questions based on sources provided. \
    If the information is not in the provided source, you answer with "I don\'t know". '
    task = f"{user_question} Translate the response to english /n \
    source: {result_json}"
    
    prompt = [{"role": "system", "content": instructions }, 
              {"role": "user", "content": task }]
    return prompt

You can easily change the voice and tone in which the ChatGPT answers questions by including the respective instructions in the create_prompt function.

Also: ChatGPT Style Guide: Understanding Voice and Tone Prompt Options for Engaging Conversations

Step #5 Testing the Custom ChatGPT Solution

This part of the code works with the previous functions to facilitate a complete question-answering cycle with Cosmos DB and OpenAI’s ChatGPT 3.5 Turbo model.

Now comes the most exciting part. Testing the solution, you can define a question and then execute the code below to run the search process.

# define OpenAI Prompt 
users_question = "When do I have to submit my tax return?"
# generate embeddings for the question
user_question_embeddings = create_embeddings_with_openai(user_question)
# search for the question in the cosmos db
search_results = vector_search(user_question_embeddings, 1)
print(search_results)
# prepare the results for the openai prompt
result_json = []
# print each document in the result
# remove all empty values from the results json
search_results = [x for x in search_results if x]
for doc in search_results:
    display(doc.get('_id'), doc.get('textContent'), doc.get('vectorContent')[0:5])
    result_json.append(doc.get('textContent'))
# create the prompt
prompt = create_prompt(user_question, result_json)
display(prompt)
# generate the response
response = openai_request(prompt)
display(f'User question: {users_question}')
display(f'OpenAI response: {response}')

‘User question: When do I have to submit my tax return?’

'OpenAI response: When do I have to submit my tax return? \n\nAll natural persons who had their residence in the canton of Zurich on December 31, 2022, or who owned properties or business premises (or business operations) in the canton of Zurich, must submit a tax return for 2022 in the calendar year 2023. Taxpayers with a residence in another canton also have to submit a tax return for 2022 in the calendar year 2023 if they ended their tax liability in the canton of Zurich by giving up a property or business premises during the calendar year 2022. If you turned 18 in the tax period 2022 (persons born in 2004), you must submit your own tax return (for the tax period 2022) for the first time in the calendar year 2023.'

As of Mai 2023, the knowledge base of ChatGPT 3.5 is limited to the timeframe before September 2021. So it’s evident that the response of our custom ChatGPT solution is based on the individual information provided in the vector database. Remember that we did not fine-tune the GPT model, so the model itself does not inherently know anything about your private data and instead uses the data that was dynamically provided to it as part of the prompt.

Real-world Applications of Chat with your data

Custom ChatGPT boosts efficiency, personalizes services, and improves experiences across industries. Here are some examples:

Customer Support: Companies can use ChatGPT for 24/7 customer service. With data from manuals, FAQs, and support docs, it delivers fast, accurate answers, enhancing customer satisfaction and lessening staff workload.
Healthcare: ChatGPT can respond to patient questions using medical texts and care guidelines. It offers data on symptoms, treatments, side effects, and preventive care, helping both healthcare providers and patients.
Legal Sector: Law firms can use ChatGPT with legal texts, court decisions, and case studies for answering legal questions, offering case references, or explaining legal terms.
Financial Services: Banks can use ChatGPT to extend their customer service and give customers advice based on their individual financial situation.
E-Learning: Schools and e-learning platforms can use ChatGPT to tutor students. Using textbooks, notes, and research papers, it helps students understand complex topics, solve problems, or guide them through a course.

In short, any sector needing a large information database for queries or services can use custom ChatGPT. It enhances engagement and efficiency by offering personalized experiences.

Summary

In this comprehensive guide, we’ve journeyed through the fascinating process of creating a customized ChatGPT that lets users chat with your business data. We started with understanding the immense value a tailored ChatGPT brings to the table and dove into its ability to produce specialized responses sourced from a custom knowledge base. This tailored approach enhances user experiences, saves time, and bolsters productivity.

We went behind the scenes to reveal the vital elements of crafting a custom ChatGPT: OpenAI’s GPT models, data embeddings, and vector databases like Cosmos DB for Mongo DB vCore. We clarified how these components synergize to transcend the token limitations inherent to GPT models. By integrating the components in Python, we broadened ChatGPT’s ability to answer queries based on your private knowledgebase, thereby offering contextually appropriate responses.

I hope this tutorial was able to illustrate the business value of ChatGPT and its versatile utility across a variety of sectors, including customer service, healthcare, legal services, finance, e-learning, and CRM data analytics. Each instance emphasized the transformative potential of a personalized ChatGPT in delivering efficient, targeted solutions.

I hope you found this helpful article. If you have any questions or remarks, please drop them in the comment section.

Sources and Further Reading

Azure Cosmos DB
OpenAI pricing
Azure OpenAI
Semantic search
What are embeddings?
Using vector search on embeddings in Azure Cosmos DB for MongoDB vCore
OpenAI ChatGPT helped to revise this article
Images created with Midjourney

The post Building “Chat with your Data” Apps using Embeddings, ChatGPT, and Cosmos DB for Mongo DB vCore appeared first on relataly.com.

Vector Databases: The Rising Star in Generative AI Infrastructure

Florian Follonier — Sat, 06 May 2023 22:43:36 +0000

Artificial intelligence (AI) continues its rapid evolution, with new advancements and innovations emerging on a frequent basis. A key enabler of these advancements is the robust infrastructure needed to store, process, and analyze colossal amounts of data. One critical part of this infrastructure is the vector database, a powerful solution for managing unstructured data types including text, audio, images, and videos in numerical form.

Vector databases have gained traction in the AI sphere due to their ability to efficiently manage similarity searches across thousands of columns. They play a crucial role in powering large language models and other advanced AI applications. In this article, we will delve into the fundamentals of vector databases, their significance in AI infrastructure, and their transformative potential in managing and analyzing unstructured data.

Also:

9 Business Use Cases of OpenAI’s ChatGPT
Using LLMs (OpenAI’s ChatGPT) to Streamline Digital Experiences

Why are Vector Databases Integral to AI Infrastructure?

The rise of vector databases is closely tied to the growing importance of embeddings in advanced generative AI applications. Embeddings are high-dimensional vectors that represent unstructured data, such as text, images, and audio, in a continuous numerical space. These vectors are essential for advanced generative AI applications such as natural language processing, computer vision, and speech recognition, where they are used to represent and analyze complex data.

The Role of Embeddings in Generative AI

Embeddings play a vital role in advanced generative AI applications such as natural language processing (NLP), where they are used to represent and analyze complex data. An embedding is a high-dimensional vector that represents unstructured data, such as text, images, and audio, in a continuous numerical space. In the context of NLP, an embedding represents a semantic and syntactic meaning of words or sentences in a vector format that can be fed as input into deep learning models.

An example of an embedding for text could be representing the sentence “I love pizza” as a 300-dimensional vector, where each dimension represents a specific feature or attribute of the sentence. For instance, word count, the presence of certain keywords, or sentiment. The process of generating embeddings for natural language is typically done using pre-trained language models like OpenAI’s GPT or BERT.

The length of an embedding vector is arbitrary and can vary depending on the specific use case and the model used to generate the embeddings. The quality of the embeddings can significantly affect the performance of NLP tasks such as language modeling, sentiment analysis, machine translation, and question-answering systems.

Large language models (LLMs) are one of the most advanced AI applications that heavily rely on embeddings. These models have billions of parameters, and embeddings play a crucial role in training and fine-tuning these models to perform a wide range of NLP tasks.

SQL Databases and Their Limitations in Handling High-Dimensional Embeddings

SQL databases are designed to work with structured data that has a fixed schema and is typically stored in tables with rows and columns. In contrast, embeddings are high-dimensional vectors that represent unstructured data such as text, images, and audio in a continuous numerical space. Embeddings can have hundreds or even thousands of dimensions, making them unsuitable for storage in traditional SQL databases, which are optimized for working with smaller, fixed-dimensional datasets.

Benefits of Vector Databases

Vector databases are natively designed to handle high-dimensional vectors, such as embeddings. They can therefore provide a more scalable and efficient solution for storing, querying, and analyzing large amounts of unstructured data. With their ability to efficiently handle similarity searches across thousands of columns, vector databases have become an essential component of AI infrastructure, powering large language models and other advanced AI applications.

There are several reasons why vector databases are well-suited to handle embeddings:

Efficient storage: Vector databases are designed to store high-dimensional vectors efficiently, allowing them to handle large quantities of data while using minimal storage space. This is important for embeddings, which can have hundreds or thousands of dimensions.
High-performance similarity search: Vector databases use specialized algorithms and data structures to perform high-performance similarity searches on embeddings. This allows users to quickly find the closest embeddings to a given query, making them well-suited for tasks such as image or text similarity search.
Scalability: Vector databases are highly scalable, allowing them to easily handle large datasets. This is important for embeddings, which are often used in large language models and other AI applications that require vast amounts of data.
Flexibility: Vector databases can handle various data types, including text, images, audio, and video. This makes them well-suited for a wide range of AI applications.

Overall, the specialized design of vector databases makes them well-suited for handling high-dimensional vectors such as embeddings, making them a crucial component of modern AI infrastructure.

An embedding is a high-dimensional vector that represents unstructured data, such as text, images, and audio, in a continuous numerical space.

Semantic Search as a Way to Create Custom ChatGPT

OpenAI’s approach to embeddings is an unsupervised learning method known as “representation learning.” The model learns to represent the data in a useful way for downstream tasks like natural language processing without being explicitly told what features to extract or how to represent the data. This approach has been highly effective in training LLMs, which can generate human-like text with remarkable accuracy.

However, one of the limitations of OpenAI models is their ability to handle only a limited amount of input data. For example, ChatGPT 3.5 has a token limit of 4096, which means that it cannot search larger databases without additional techniques. This is where embeddings come into play.

Vector databases are becoming increasingly popular for their ability to find meaning in unstructured data, which is a vital feature for advanced AI applications such as semantic search. Semantic search is similar to ChatGPT, but it operates on a custom knowledge base. The knowledge can be anything from customer relationship management (CRM) data to technical manuals and research and development (R&D) information. The data needs to be stored somewhere and support querying at low latency, and vector databases are perfectly suited for this task due to their previously mentioned advantages. The growing popularity of vector databases is, therefore also to be seen as a result of the growing interest of companies in creating custom ChatGPT applications based on their internal knowledge.

Increasing Investments in Vector Database Startups

vector database

" data-image-caption="

vector database

" data-large-file="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png" src="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min-512x288.png" alt="OpenAI's approach to embeddings is an unsupervised learning method known as "representation learning." The model learns to represent the data in a useful way for downstream tasks like natural language processing without being explicitly told what features to extract or how to represent the data. " class="wp-image-13601" srcset="https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png 512w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png 300w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png 768w, https://www.relataly.com/wp-content/uploads/2023/05/Flo7up_vector_database_colorful_popart_b6b52134-5a54-4622-85bc-ed93d715501f-min.png 1452w" sizes="(max-width: 512px) 100vw, 512px" />

Given the recent hype around AI, it’s no wonder that companies are investing heavily in vector databases to improve the accuracy and efficiency of their algorithms. This trend is reflected in the recent funding rounds of vector database startups such as Pinecone, Chroma, and Weviate. However, established players in the field, such as Microsoft, also offer solutions that can be used to build AI applications on top of custom knowledge bases. For example, Azure Cognitive Search is a powerful solution that businesses can use to build and deploy AI applications that leverage the capabilities of vector databases. Matchlt is another solution for vector search developed by Google. Despite the challenges posed by new startups, established players like Microsoft remain competitive and continue to offer valuable solutions for businesses seeking to implement vector databases in their AI workflows.

A Close Look at Popular Vector Databases: Pinecone, Chroma, and Weaviate

Finally, let’s take a look at three different dedicated vector databases that are optimized for working with vectors: Pinecone, Chroma, and Weaviate.

Pinecone

Given the recent hype around AI, it’s no wonder that companies are investing heavily in vector databases to improve the accuracy and efficiency of their algorithms.

Pinecone is a cloud-native vector database designed for high-performance, low-latency, and scalable vector similarity search. It can handle both dense and sparse vectors, making it a versatile choice for a wide range of use cases. Pinecone provides an easy-to-use API that allows users to add, search, and retrieve vectors with just a few lines of code. It also offers hybrid search functionality, which enables users to mix traditional text-based search with vector search.

One of the key advantages of Pinecone is its scalability. It can handle billions of vectors and provides automatic sharding and load balancing to ensure that search requests are distributed evenly across the available resources. Pinecone also offers real-time indexing and search. This means that new vectors are available for search immediately after they are added.

Chroma

Chroma is a simple, lightweight vector search database that can be used to build an in-memory document-vector store. It is built on top of Apache Cassandra and provides an easy-to-use API. It is an excellent choice for users who want a quick and simple solution for vector similarity search. Chroma uses the Hugging Face transformers library to vectorize documents by default, but it can also be configured to use custom vectorization models.

One of the key advantages of Chroma is its simplicity. It can be set up and configured quickly and easily and doesn’t require any special hardware or software. Chroma is also highly customizable, supporting custom vectorization models, custom similarity functions, and more. If you are looking for the Chroma Python library, here it is.

Weaviate

Weaviate is a feature-rich vector database designed for complex data modeling and search use cases. It provides a GraphQL API with support for vector similarity search and a range of other advanced search and filtering features. Weaviate can store and search various data types, including structured data, unstructured data, and images.

One of the key advantages of Weaviate is its flexibility. It can be used to build highly customized search applications with complex data models and search requirements. Weaviate also provides advanced search and filtering features, including geospatial search, range search, and fuzzy search. It also supports data federation, which enables users to search across multiple data sources.

Other Noteworthy Vector Databases

The vector databases above are just some examples. Several other vector databases may be worth a look too:

Faiss: Developed by Facebook’s AI Research team, Faiss provides efficient similarity search and clustering of dense vectors.
Hnswlib: An open-source library for Approximate Nearest Neighbor Search, Hnswlib offers excellent speed and accuracy with minimal resource usage.
Milvus: An open-source vector database designed for AI and analytics, Milvus offers scalable, reliable, and customizable solutions.
qdrant: Qdrant is a vector similarity search engine with extended filtering capabilities, designed for organizing and searching large-scale vector data.

Additionally, some databases, while not specifically designed for vectors, can handle vectors more efficiently than traditional SQL databases. Examples include Azure Cosmos DB, Elasticsearch, and Redis DB. Examples include Azure Cosmos Db, Elastic Search, and Redis db.

Summary

OpenAI’s advancements may have fueled the initial hype around AI, but the infrastructure demand to support AI applications is now on the rise. Vector databases are increasingly in the spotlight due to their proficiency in managing unstructured data and their efficiency in conducting similarity searches across vast columns of data. With the escalating demand for AI infrastructure, vector databases are anticipated to continue gaining momentum. As businesses and organizations strive to leverage the power of AI, the reliance on vector databases will only grow, making them a cornerstone of future AI infrastructure.

I hope this article was helpful. If you have any remarks or comments, please let me know in the comments.

Sources and Further Reading

https://en.wikipedia.org/wiki/Distributional%E2%80%93relational_database
https://www.pinecone.io/
https://www.trychroma.com/
https://www.calcalistech.com/ctechnews/article/sjveg7ux2
https://weaviate.io/
https://analyticsindiamag.com/why-are-investors-flocking-to-vector-databases/
ChatGPT helped to revise this article.
Images generated with Midjourney

The post Vector Databases: The Rising Star in Generative AI Infrastructure appeared first on relataly.com.

From Pirates to Nobleman: Simulating Multi-Agent Conversations using OpenAI’s ChatGPT and Python

Florian Follonier — Mon, 24 Apr 2023 22:44:52 +0000

Many people use ChatGPT for its text-generation capability and have included it in their day-to-day workflows. However, few people may know that you can use it to create multi-agent conversations between fictional and nonfictional characters. Ever wondered if AI could take you on a time-traveling journey to eavesdrop on a chat between an 18th-century Pirate and Nobleman? Say no more! In this blog post, we’re diving deep into the captivating universe of ChatGPT. We use a Python script and OpenAI ChatGPT to simulate a multi-agent conversation about life’s big questions. Plus, we’ll give you the lowdown on how to bring your characters to life to make the convo even more riveting.

So, buckle up and get ready for a time-warping chat that’ll not only tickle your curiosity but also get you thinking about the meaning of life itself!

Also: ChatGPT Prompt Engineering: A Practical Guide for Businesses

Multi-Agent Conversations between Two instances of ChatGPT of itself – How does it work?

GPT, part of the name ChatGPT, stands for “Generative Pretrained Transformer”. It’s a fancy name for a groundbreaking innovation we’ve seen in the last ten years. One cool thing about ChatGPT is that it can have a conversation with itself. How? By using the answer from one version of itself as the next question for another version.

Imagine this: You let ChatGPT-1 talk, take its answer, and give it to ChatGPT-2 as a new question. You keep doing this, and before you know it, you’ve got an ongoing chat between the two!

This unique feature can be super handy for many things. For instance, you could build more lifelike chatbots or create authentic and varied dialogues for characters in a story.

But there’s a catch. Sometimes, they can get stuck in loops or spit out gibberish. That’s why it’s crucial to keep an eye on these chats to make sure they make sense and stay on topic.

The illustration shows the first three steps in a conversation between ChatGPT and itself. In the sample, the prompt includes information that makes ChatGPT alternate between the roles of a pirate and an aristocrat.

Also: Eliminating Friction: How LLMs such as OpenAI’s ChatGPT Streamline Digital Experiences and Reduce the Need for Search

How to Manage a Conversation Context with ChatGPT

When using ChatGPT models, there are three different roles available: system, user, and assistant. Understanding these roles is critical when designing a conversation because it allows us to manage the context and information provided to the model.

The system role indicates that a message comes from the system or the model itself. While it is not required to include system messages in a conversation, doing so can help set up the conversation’s context and provide information that can be used to guide the model’s subsequent responses.
The user role indicates that a message comes from an end-user or an application. This role represents prompts that trigger a response from the model. When designing a conversation, it is essential to carefully craft these prompts to provide the model with the necessary context and information to provide a useful response.
The assistant role indicates that a message is coming from the assistant, which is the model itself. This role helps maintain continuity between the user and the model during a conversation. Using the assistant role, previous conversations can be saved and sent again in subsequent requests to help guide the model’s responses.

The code below shows how to use these roles when requesting the ChatGPT model.

prompt = [{"role": "system", "content": general_instructions}, 
          {"role": "user", "content": task},
         {"role": "assistant": previous_responses}] 

#print('Generating response from OpenAI...') 
completion = openai.ChatCompletion.create(
  model=model_engine, 
  messages=prompt, 
  temperature=0.5, 
  max_tokens=100)

Understanding the different roles available when using ChatGPT models is critical for designing effective conversations. By carefully crafting messages with the appropriate role, we can provide the model with the necessary context and information it needs to provide useful and relevant responses.

Simulating a Multi-Agent Conversation between a Pirate and a Nobleman

Now that you have a broad understanding of multi-agent conversations, we will focus on the practical part. In the following, we will use ChatGPT to simulate a discussion between a pirate and a nobleman on the sense of life. The characters have distinct personalities and goals, which will be reflected in their responses and the unique style of how they converse.

The code is available on the GitHub repository.

View on GitHub Relataly GitHub Repo

OpenAI ChatGPT has various use cases. Among others, it can be used to simulate multi-agent conversations between fictional and non-fictional characters.

Some Words on the OpenAI API Key and Inference Costs

To run the code below, you will need an OpenAI API that you can obtain from OpenAI or from Azure OpenAI. Yes, you will have to sign up.

How about the cost of inferring the model? We will be utilizing the ChatGPT Turbo model for every conversation, which will require using the completion endpoint of the model several times, leading to some costs. The cost of using GPT models varies depending on the model type and the number of processed tokens. In the use case discussed in this article, each code execution will involve 20 API calls to the ChatGPT Turbo model (3.5). Since this model is highly cost-effective and we will not be generating an excessive amount of text, executing the code will only incur a few cents in costs.

Step #1 Setting up Imports and OpenAI Key

Let’s begin by importing a few libraries such as openai, datetime, and Azure. We will also set up an API key for OpenAI. You can for example store and retrieve the API key from an environment variable or from an Azure Key Vault. Storing the key directly in the code is not advised, as you might accidentally commit your code to a public repository and expose the key.

The code below will also create a folder to store conversations. Each conversation will get stored in HTML format.

import openai
import datetime as dt
from azure.identity import AzureCliCredential
from azure.keyvault.secrets import SecretClient
import time
import os

# use environment variables for API key
timestamp = dt.datetime.now().strftime("%Y%m%d_%H%M%S")

# set your OpenAI API key here
API_KEY = os.environ.get("OPENAI_API_KEY")
if API_KEY is None:
    print('trying to get API key from azure keyvault')
    keyvault_name = 'your-keyvault-name'
    client = SecretClient(f"https://{keyvault_name}.vault.azure.net/", AzureCliCredential())
    API_KEY = client.get_secret('openai-api-key').value
openai.api_key = API_KEY

# create a folder to store the conversations if it does not exist
path = 'ChatGPT_conversations'
if not os.path.exists(path):
    os.makedirs(path)

Step #2 Functions for Prompts and OpenAI Completion

We begin by defining three essential functions used to create and maintain a conversation using the OpenAI ChatGPT completion endpoint.

The first function, initialize_conversation, sets up the conversation by creating an initial prompt for the user to start the conversation with a given topic and character description. For instance, the code below sets the initial prompt as “Good day, Sir. Wonderful day isn’t it?” and waits for ChatGPT to respond with an appropriate reply.
The second function, respond_prompt, generates a response to the previous response and creates a prompt for the user to continue the conversation. This function is called multiple times as the conversation progresses.
Finally, the openai_request function utilizes the OpenAI chat engine to generate a response to the prompt created by the previous two functions. The response generated by the chat engine is then returned to the calling function.

By using these functions, we can establish a fully functional and engaging conversation system that utilizes the power of OpenAI’s ChatGPT model.

# This function creates a prompt that initializes the conversation 
def initialize_conversation(topic='', character=''):
    instructions = f' You have a conversation on {topic}. You can bring up any topic that comes to your mind'
    instructions = character['description'] + instructions
    task = f'Good day, Sir.'
    if topic != '':
        task = task + f' Wonderful day isn t it?'
    return instructions, task

# This function creates a prompt that responds to the previous response
def respond_prompt(response, topic='', character=''):    
    instructions = f'You have a conversation with someone on {topic}. \
    Reply to questions and bring up any topic that comes to your mind.\
    Dont say more than 2 sentences at a time.'
    instructions = character['description'] + instructions
    task = f'{response}' 
    return instructions, task

# OpenAI Engine using the turbo model
def openai_request(instructions, task, model_engine='gpt-3.5-turbo'):
    prompt = [{"role": "system", "content": instructions }, 
              {"role": "user", "content": task }]

    #print('Generating response from OpenAI...')
    completion = openai.ChatCompletion.create(
    model=model_engine, 
    messages=prompt,
    temperature=1.0, # this will lead to create responses that are more creative
    max_tokens=100)

    response = completion.choices[0].message.content

    return response

Step #3 Defining Characters

After defining the functions, the next step is to create the conversation by specifying the topic and the characters that will participate in the debate. In the following example, the topic is “the sense of life,” and the two characters are James, an Aristocrat, and Blackbeard, a Pirate. These characters are described with unique personalities and behaviors that will be utilized in the conversation simulation.

With this framework, it is simple to modify the behaviors, voice, and tone of the characters by adjusting their descriptions. This flexibility allows for the creation of intriguing conversations on any topic between different characters. The possibilities are endless; one could simulate a debate between Darth Vader and Harry Potter, or even describe the behavior and background of the characters in more detail.

In short, this framework provides a flexible and adaptable platform for generating engaging conversations, allowing for the exploration of various perspectives and topics.

# initialize conversation on the following topic
topic = 'The sense of life'
conversation_rounds = 20

# description of character 1
color_1 = 'darkblue' 
character_1 = {
"name": 'James (Aristocrat)',
"description": 'You are a French nobleman from the 18th century. \
    Your knowledge and wordlview corresponds to that of a common aristocrate from that time. \
    You speak in a distinguished manner. \
    You response in one or two sentences. \
    You are afraid of pirates but also curious to meet one.'}

# description of character 2 
color_2 = 'brown'
character_2 = {
"name": 'Blackbeard (Pirate)',
"description": 'You are a devious pirate from the 18th century who tends to swear. \
    Your knowledge and wordlview corresponds to that of a common pirate from that time. \
    You response in one or two sentences. \
    You are looking for a valuable treasure and trying to find where it is hidden. \
    You try to steer the conversation back to the treasure no matter what.'}

Step #4 Start the Conversation

Now that we have our characters defined, it’s time to start the conversation. The next section of the code invokes the conversation between the two characters by initializing it with a greeting from one of the characters. In the code below, James starts the conversation with a prompt: “Good day, Sir. Wonderful day, isn’t it?”.

The conversation then proceeds with each character taking turns responding to the previous prompt. The response from each character is generated by calling the openai_request() function, which sends a prompt to OpenAI’s GPT-3.5 model and returns a response generated by the model.

The conversation continues for a specified number of rounds, which can be set by changing the value of the “num_rounds” variable. Each round consists of a response from one character followed by a response from the other character.

conversation = ''
for i in range(conversation_rounds):
        # initialize conversation
        if i == 0:
            print('Initializing conversation...')
            text_color = color_1
            name = character_1['name']
            instructions, task = initialize_conversation(topic, character_1)
            response = openai_request(instructions, task)
            print(f'{name}: {task}')
            conversation = f'{name}: {task} \n'
        # alternate between character_1 and character_2
        else:
            if i % 2 == 0:
                text_color = color_1
                name = character_1['name']
                instructions, task = respond_prompt(response, topic, character_1)
            else:
                text_color = color_2
                name = character_2['name']
                instructions, task = respond_prompt(response, topic, character_2)

            # OpenAI request
            response = openai_request(instructions, task)

            # wait some seconds 
            time.sleep(15)

            # add response to conversation after linebreak
            print(f'{name}: {response}')
            conversation += ' ' + f'{name}: {response} \n'

        #print('storing conversation')
        # store conversation with timestamp
        
        filename = f'{path}/GPTconversation_{timestamp}.html'
        with open(filename, 'w') as f:
            f.write(conversation)

As you can see, the conversation is quite entertaining and even touches upon some aspects of an interesting philosophical question. But foremost it’s an amusing example of an exotic use case for ChatGPT.

By modifying the character descriptions and prompts, you can create interesting conversations on any topic between various characters. For example, you can create a conversation between Albert Einstein and Marie Curie on the topic of physics, or between Abraham Lincoln and Martin Luther King Jr. on the topic of civil rights. The possibilities are endless!

Summary

Arrr sailor, you have reached the end of this post! So let’s do a quick recap. This article has presented a Python script that allowed for the simulation of a multi-agent ChatGPT conversation between a pirate and a nobleman. The script used the ChatGPT language model to generate responses for the characters based on a given topic. We have provided instructions for running the script and customizing the personalities and behaviors of the characters. In addition, we have discussed how the script works by generating a response from OpenAI that is then used as a prompt for another ChatGPT instance. In this way, ChatGPT can be used to simulate a conversation on any topic.

Overall, I hope this article was able to demonstrate the potential of using GenerativeAI and natural language processing to create engaging and entertaining dialogues between virtual characters.

If you have any questions or want to share your experiences with what you could achieve with the script, please let me and everyone else know in the comment.

Sources and Further Reading

https://en.wikipedia.org/wiki/Blackbeard
OpenAI.com/models
Azure OpenAI Service
ChatGPT helped to revise this article, but the thoughts are human-made.
Images generated with Midjourney.com

The post From Pirates to Nobleman: Simulating Multi-Agent Conversations using OpenAI’s ChatGPT and Python appeared first on relataly.com.