Media Tools
Media tools enable your AI agents to process multimedia content including audio, video, and images.
Audio Transcription
Convert audio files to text using Deepgram's speech-to-text technology.
Supported Formats
- MP3
- WAV
- M4A
- FLAC
- OGG
- Other common audio formats
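As a minimal sketch, a pre-upload check against the formats named above might look like the following. The function name and the extension set are illustrative assumptions, not part of the product. Because the tool accepts more formats than are listed here, a `False` result only means the extension is not one of the five named above.

```python
from pathlib import Path

# Extensions for the formats named in the list above (assumption: the
# usual file extensions). The tool may accept other formats as well.
LISTED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def is_listed_audio_format(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in LISTED_AUDIO_EXTENSIONS


if __name__ == "__main__":
    for name in ("call.MP3", "interview.flac", "notes.txt"):
        print(name, is_listed_audio_format(name))
```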
Configuration Options
| Setting | Description |
|---|---|
| Context Prompt | Additional context to improve accuracy |
| Key Terms | Industry-specific terminology |
| Find/Replace | Correct common transcription errors |
Key Terms
Improve transcription of specialized vocabulary:
- Company names
- Product names
- Technical terms
- Proper nouns
Example:
Gen8, Acme Corp, TechFlow, API Gateway
Find/Replace
Automatically correct common errors:
jean eight -> Gen8
a.i. -> AI
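To illustrate the idea, here is a minimal sketch of how rules like these could be applied to a finished transcript. It assumes case-insensitive, literal phrase matching. The product's actual matching behavior is not documented here, and `apply_find_replace` is a hypothetical helper, not part of the tool.

```python
import re

# Example rules in the "find -> replace" style shown above.
RULES = {
    "jean eight": "Gen8",
    "a.i.": "AI",
}


def apply_find_replace(transcript: str, rules: dict[str, str]) -> str:
    """Replace each 'find' phrase with its correction, ignoring case."""
    for find, replacement in rules.items():
        # re.escape treats the phrase literally, so the dots in "a.i."
        # match only real dots. The lambda keeps the replacement text
        # from being interpreted as a regex template.
        transcript = re.sub(
            re.escape(find),
            lambda _match, text=replacement: text,
            transcript,
            flags=re.IGNORECASE,
        )
    return transcript


if __name__ == "__main__":
    print(apply_find_replace("We launched Jean Eight with a.i. features", RULES))
```

Rules are applied in order, so a later rule sees the output of an earlier one. Keep that in mind if one correction could produce text that another rule matches.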
Image Description
This tool analyzes and describes uploaded images using AI vision capabilities.
Configuration
| Setting | Description |
|---|---|
| Context Prompt | Guide what to focus on |
When a user uploads an image, the AI examines the visual content, provides a detailed description, and can answer follow-up questions about what it sees. You can guide the analysis with a context prompt that tells the AI what to focus on, for example "Focus on identifying products and their prices" or "Describe any text visible in the image."
Common uses include document analysis, product identification, screenshot interpretation, and extracting data from visual content.
Meeting Bot
The meeting bot joins and transcribes live meetings automatically. It works with Zoom, Google Meet, Microsoft Teams, and other major video conferencing platforms.
Configuration
| Setting | Description |
|---|---|
| Bot Name | Name displayed in the meeting |
| Bot Image | Avatar shown for the bot |
| Entry Message | Message sent when bot joins |
| Key Terms | Improve transcription accuracy |
The workflow is simple: a user provides a meeting link, the AI sends a bot to join, and the bot records and transcribes in real time. Once the meeting ends, the transcript becomes available in the chat for review and further processing.
Best practices include informing meeting participants about the recording, setting a professional bot name and appropriate image, and configuring key terms for better accuracy with technical or company-specific terminology.
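The settings and best practices above can be sketched as a small checklist. Everything here is hypothetical: `MeetingBotConfig` and `best_practice_warnings` are illustrative names, the field types are assumptions, and the check for the word "record" in the entry message is only a rough heuristic for whether participants are told about the recording.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingBotConfig:
    """Hypothetical container mirroring the settings table above."""

    bot_name: str
    bot_image: str = ""  # assumed to be a path or URL to the avatar
    entry_message: str = ""
    key_terms: list[str] = field(default_factory=list)


def best_practice_warnings(config: MeetingBotConfig) -> list[str]:
    """Return one warning for each best practice the config misses."""
    warnings = []
    if not config.bot_name.strip():
        warnings.append("Set a professional bot name.")
    if not config.bot_image.strip():
        warnings.append("Set an appropriate bot image.")
    if "record" not in config.entry_message.lower():
        warnings.append("Entry message does not mention the recording.")
    if not config.key_terms:
        warnings.append("No key terms configured.")
    return warnings


if __name__ == "__main__":
    config = MeetingBotConfig(
        bot_name="Acme Notetaker",
        bot_image="avatar.png",
        entry_message="This meeting is being recorded and transcribed.",
        key_terms=["Gen8", "API Gateway"],
    )
    print(best_practice_warnings(config))
```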
Cancel Meeting Bot
This tool removes a bot from an ongoing meeting. It is useful when a meeting ends early, the bot is no longer needed, or the bot joined the wrong meeting.
PDF Transcription
This tool extracts and analyzes content from PDF documents. It handles text extraction, analyzes images within PDFs, recognizes tables, and processes multi-page documents.
Example Workflows
Meeting summary: A user provides a meeting link, and the AI sends a bot to join and transcribe the meeting. After the meeting, the AI provides the transcript along with key points and action items.
Audio analysis: A user uploads a customer call recording. The AI transcribes it and identifies the key topics discussed, such as product inquiries, pricing questions, and support requests.
Image processing: A user uploads a product photo and asks what product it is. The AI analyzes the image and identifies the product model, visible specifications, and features.
Tips for Better Results
For better audio transcriptions, use clear audio with minimal background noise, speak at a moderate pace, and use quality microphones when recording.
For better image analysis, use high-resolution images with good lighting, ensure text is legible, and avoid motion blur.
Providing context always helps. Tell the AI what you're looking for, what industry or domain the content relates to, and what type of content to expect. Results will be more accurate and relevant when the AI understands the context.
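As a rough sketch, the three kinds of context mentioned above (what you are looking for, the domain, and the expected content type) can be assembled into one context prompt. The `build_context_prompt` helper and its label format are assumptions for illustration. The tools accept free-form text, so any clear wording works.

```python
def build_context_prompt(goal: str, domain: str = "", content_type: str = "") -> str:
    """Combine the goal, domain, and content type into a short prompt."""
    parts = [f"Goal: {goal.strip()}"]
    if domain.strip():
        parts.append(f"Domain: {domain.strip()}")
    if content_type.strip():
        parts.append(f"Content type: {content_type.strip()}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        build_context_prompt(
            goal="Identify products and their prices",
            domain="Retail",
            content_type="Product photo",
        )
    )
```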