Conversation Flow
The following walkthrough traces the flow of a conversation in Ayushma, including the API endpoints called, the database models involved, and the integrations with external services.
Explanation:
- User Input: The user initiates a conversation by providing input either through text or audio.
- Language Selection: The user selects the desired language for the conversation.
- Chat Creation: The front-end calls the Chat Creation API endpoint to create a new chat session in the database. This generates a unique Chat ID.
- Converse API: The front-end calls the Converse API endpoint with the Chat ID and the user's input (see the request sketch after this list).
- Conditional Logic:
  - Text Input: If the user provided text input, it is sent directly to the Converse API with the text parameter.
  - Audio Input: If the user provided audio input, the Speech-to-Text API is called first to transcribe the audio into text. The transcribed text is then sent to the Converse API.
- OpenAI API / Pinecone: The back-end embeds the user's query, retrieves relevant document chunks from the Pinecone index, and passes them to the OpenAI API to generate a response (see the retrieval sketch after this list).
- AI Response: The model generates a response grounded in the user's query and the retrieved context.
- Translate?: If the user's selected language is not English, the AI response is translated into the target language using a translation API.
- Text-to-Speech?: If audio output is enabled, the AI response (translated or in English) is converted into speech using a Text-to-Speech API, producing an audio file (see the post-processing sketch after this list).
- Store ChatMessage: The response, along with any generated audio, is stored as a ChatMessage in the database, associated with the corresponding Chat and Project.
- Response: The final response, either as text or audio, is sent back to the user through the front-end interface.
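To make the request flow above concrete, here is a minimal client-side sketch in Python. The base URL, endpoint paths, payload fields, and authorization header are illustrative assumptions, not Ayushma's actual API contract; consult the API reference for the real shapes.

```python
import requests

BASE_URL = "https://ayushma.example.com/api"   # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}  # auth scheme assumed

# Chat Creation: create a new chat session and obtain its unique Chat ID.
chat = requests.post(
    f"{BASE_URL}/chats/",
    headers=HEADERS,
    json={"title": "New conversation", "language": "hi"},  # fields assumed
).json()
chat_id = chat["id"]

def converse(text: str) -> dict:
    """Send typed or transcribed text to the (assumed) Converse endpoint."""
    response = requests.post(
        f"{BASE_URL}/chats/{chat_id}/converse/",
        headers=HEADERS,
        json={"text": text},
    )
    response.raise_for_status()
    return response.json()

# Text input goes straight to the Converse API; audio input is transcribed
# first (see the speech-to-text sketch later on this page), then follows
# the same path.
reply = converse("What should I do for a mild fever?")
```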
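On the back end, the interaction with Pinecone and the OpenAI API is a retrieval-augmented generation step: embed the query, fetch similar document chunks, and let the model answer from that context. The sketch below assumes the openai and pinecone Python SDKs; the index name, metadata key, model names, and prompt wiring are illustrative, not Ayushma's exact configuration.

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key="...")  # key handling is up to the deployment
pc = Pinecone(api_key="...")
index = pc.Index("ayushma-docs")       # index name is hypothetical

def answer(query: str, system_prompt: str) -> str:
    # 1. Embed the user's query.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    ).data[0].embedding

    # 2. Retrieve the most similar document chunks from the Pinecone index.
    results = index.query(vector=embedding, top_k=5, include_metadata=True)
    context = "\n\n".join(m.metadata["text"] for m in results.matches)  # "text" key assumed

    # 3. Ask the chat model to answer grounded in the retrieved context.
    completion = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return completion.choices[0].message.content
```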
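The optional translation and text-to-speech steps can be sketched as a small post-processing function. This version assumes the google-cloud-translate client and OpenAI's text-to-speech endpoint; Ayushma may wire either step to a different provider.

```python
from google.cloud import translate_v2 as translate
from openai import OpenAI

translate_client = translate.Client()  # requires Google credentials to be configured
openai_client = OpenAI()

def postprocess(text: str, language: str, audio_enabled: bool) -> tuple[str, bytes | None]:
    # Translate?: only when the selected language is not English.
    if language != "en":
        text = translate_client.translate(text, target_language=language)["translatedText"]

    # Text-to-Speech?: only when audio output is enabled.
    audio = None
    if audio_enabled:
        speech = openai_client.audio.speech.create(
            model="tts-1", voice="alloy", input=text,  # model and voice are illustrative
        )
        audio = speech.read()  # raw audio bytes to store alongside the ChatMessage
    return text, audio
```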
Database Models Involved:
- ChatMessage: Stores individual messages within a chat session.
- Chat: Represents a chat session with a title, user, project, and list of associated messages.
- Project: Defines the configuration and settings for a specific project, including the prompt, API keys, and document references.
- Document: Represents a document that has been ingested into Ayushma for reference during conversations.
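A simplified Django sketch of how these four models could relate; the field names and options are illustrative, and the authoritative schema lives in Ayushma's source.

```python
from django.conf import settings
from django.db import models

class Project(models.Model):
    """Per-project configuration: prompt, keys, and ingested documents."""
    title = models.CharField(max_length=255)
    prompt = models.TextField()

class Document(models.Model):
    """A document ingested for reference during conversations."""
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    title = models.CharField(max_length=255)
    file = models.FileField(upload_to="documents/")

class Chat(models.Model):
    """A chat session with a title, owned by a user within a project."""
    title = models.CharField(max_length=255)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    project = models.ForeignKey(Project, on_delete=models.CASCADE)

class ChatMessage(models.Model):
    """One message in a chat; audio is attached when TTS is enabled."""
    chat = models.ForeignKey(Chat, related_name="messages", on_delete=models.CASCADE)
    message = models.TextField()
    audio = models.FileField(upload_to="audio/", null=True, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
```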
External Services and Integrations:
- Speech-to-Text Engine: Whisper or Google Speech-to-Text transcribes audio input into text (see the transcription sketch after this list).
- Text-to-Speech Engine: OpenAI or Google Text-to-Speech is used to convert text responses into speech.
- Pinecone Index: Stores vector embeddings of documents for efficient retrieval during conversations.
- OpenAI API: Provides access to OpenAI's language models for generating responses and performing other AI tasks.
- Translation API: Facilitates real-time translation of messages between different languages.
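For the speech-to-text step, here is a minimal sketch using OpenAI's Whisper endpoint; the Google Speech-to-Text path would use its own client, and the file handling here is illustrative.

```python
from openai import OpenAI

client = OpenAI()

def transcribe(audio_path: str) -> str:
    """Transcribe an audio file to text with the Whisper API."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return result.text

# The transcribed text is then sent to the Converse API exactly like typed input.
```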