Conversation Flow

The following walkthrough traces the flow of a conversation in Ayushma, covering the API endpoints, database models, and external-service integrations involved at each step.

Explanation:

  1. User Input: The user initiates a conversation by providing input either through text or audio.
  2. Language Selection: The user selects the desired language for the conversation.
  3. Chat Creation: The front-end calls the Chat Creation API endpoint to create a new chat session in the database. This generates a unique Chat ID.
  4. Converse API: The front-end calls the Converse API endpoint with the Chat ID and the user input (a client-side sketch of steps 3-5 appears directly after this list).
  5. Conditional Logic:
    • Text Input: If the user provided text input, it is sent directly to the Converse API with the text parameter.
    • Audio Input: If the user provided audio input, the Speech-to-Text API is called first to transcribe the audio into text. The transcribed text is then sent to the Converse API.
  6. OpenAI API / Pinecone: The back-end processes the user's query, retrieving relevant document context from the Pinecone index and calling the OpenAI API to generate a response (sketched after the integrations list below).
  7. AI Response: The AI generates a response based on the user's query and available information.
  8. Translate?: If the user's selected language is not English, the AI response is translated to the target language using a translation API.
  9. Text-to-Speech?: If audio output is enabled, the AI response (translated or in English) is converted into speech using a Text-to-Speech API, producing an audio file (steps 8-9 are sketched in the second code example after this list).
  10. Store ChatMessage: The response, along with any generated audio, is stored as a ChatMessage in the database, associated with the corresponding Chat and Project.
  11. Response: The final response, either as text or audio, is sent back to the user through the front-end interface.
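
To make steps 3-5 concrete, here is a minimal client-side sketch in Python. The endpoint paths, parameter names, and payload shapes are assumptions for illustration only; check Ayushma's API reference for the actual routes.

```python
from typing import Optional

import requests

BASE_URL = "https://ayushma.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}  # assumed auth scheme


def create_chat(title: str, project_id: str) -> str:
    """Step 3: create a chat session and return its unique Chat ID."""
    resp = requests.post(
        f"{BASE_URL}/chats/",  # assumed route
        json={"title": title, "project": project_id},
        headers=HEADERS,
    )
    resp.raise_for_status()
    return resp.json()["id"]


def converse(chat_id: str, text: Optional[str] = None,
             audio_path: Optional[str] = None) -> dict:
    """Steps 4-5: send user input to the Converse API.

    Typed text goes straight to the Converse endpoint; audio input is
    first transcribed via the Speech-to-Text endpoint (route names assumed).
    """
    if text is None and audio_path is not None:
        # Audio branch of step 5: transcribe before conversing.
        with open(audio_path, "rb") as f:
            stt = requests.post(f"{BASE_URL}/speech_to_text/",
                                files={"audio": f}, headers=HEADERS)
        stt.raise_for_status()
        text = stt.json()["transcript"]

    resp = requests.post(
        f"{BASE_URL}/chats/{chat_id}/converse/",  # assumed route
        json={"text": text},
        headers=HEADERS,
    )
    resp.raise_for_status()
    return resp.json()  # step 11: the final text/audio response


# Example usage:
# chat_id = create_chat("Fever symptoms", project_id="proj-123")
# reply = converse(chat_id, text="What should I do for a mild fever?")
```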
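
Steps 8 and 9 run server-side after the model replies. The sketch below uses the OpenAI Python SDK for both translation (via a chat prompt) and speech synthesis; since the integrations list below notes that Google services may be used instead, the model and helper choices here are assumptions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def translate(text: str, target_language: str) -> str:
    """Step 8: translate the AI response when the chat language is not English."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {
                "role": "system",
                "content": f"Translate the user's message into {target_language}. "
                           "Reply with the translation only.",
            },
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content


def text_to_speech(text: str, out_path: str = "response.mp3") -> str:
    """Step 9: synthesize the (possibly translated) response as audio."""
    speech = client.audio.speech.create(
        model="tts-1",  # assumed TTS model
        voice="alloy",
        input=text,
    )
    with open(out_path, "wb") as f:
        f.write(speech.content)  # raw audio bytes from the SDK response
    return out_path


# Example usage:
# reply_in_hindi = translate("Drink fluids and rest.", "Hindi")
# audio_file = text_to_speech(reply_in_hindi)
```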

Database Models Involved:

  • ChatMessage: Stores individual messages within a chat session.
  • Chat: Represents a chat session with a title, user, project, and list of associated messages.
  • Project: Defines the configuration and settings for a specific project, including the prompt, API keys, and document references.
  • Document: Represents a document that has been ingested into Ayushma for reference during conversations.
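
A simplified sketch of how these models could relate, written as Django-style ORM declarations; the field names are illustrative assumptions, not Ayushma's actual schema.

```python
from django.db import models


class Project(models.Model):
    # Configuration for a deployment: system prompt, keys, documents.
    title = models.CharField(max_length=255)
    prompt = models.TextField()


class Document(models.Model):
    # An ingested reference document, embedded into the Pinecone index.
    project = models.ForeignKey(Project, on_delete=models.CASCADE,
                                related_name="documents")
    title = models.CharField(max_length=255)
    file = models.FileField(upload_to="documents/")


class Chat(models.Model):
    # One conversation session belonging to a user and a project.
    title = models.CharField(max_length=255)
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    project = models.ForeignKey(Project, on_delete=models.CASCADE)


class ChatMessage(models.Model):
    # A single message (and optional synthesized audio) within a chat.
    chat = models.ForeignKey(Chat, on_delete=models.CASCADE,
                             related_name="messages")
    message = models.TextField()
    audio = models.FileField(upload_to="audio/", null=True, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
```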

External Services and Integrations:

  • Speech-to-Text Engine: Whisper or Google Speech-to-Text is used to transcribe audio input into text.
  • Text-to-Speech Engine: OpenAI or Google Text-to-Speech is used to convert text responses into speech.
  • Pinecone Index: Stores vector embeddings of documents for efficient retrieval during conversations.
  • OpenAI API: Provides access to OpenAI's language models for generating responses and performing other AI tasks.
  • Translation API: Facilitates real-time translation of messages between different languages.
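
Putting the Pinecone and OpenAI pieces together, the retrieval path in step 6 could look roughly like this: embed the query, fetch the nearest document chunks, and ground the completion in that context. The index name, model names, and metadata layout are assumptions for illustration.

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()               # OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_KEY")      # assumed key handling
index = pc.Index("ayushma-documents")  # assumed index name


def answer(query: str, system_prompt: str) -> str:
    """Step 6 sketch: retrieval-augmented generation over ingested documents."""
    # Embed the user's query (embedding model is an assumption).
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    ).data[0].embedding

    # Fetch the most similar document chunks from the Pinecone index.
    result = index.query(vector=embedding, top_k=3, include_metadata=True)
    context = "\n\n".join(m.metadata["text"] for m in result.matches)  # assumed key

    # Generate a response grounded in the retrieved context.
    resp = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[
            {"role": "system", "content": f"{system_prompt}\n\nContext:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content
```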