Media Tools

Media tools enable your AI agents to process multimedia content including audio, video, and images.

Audio Transcription

Convert audio files to text using Deepgram's speech-to-text technology.

Supported Formats

  • MP3
  • WAV
  • M4A
  • FLAC
  • OGG
  • And more audio formats
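
As a rough illustration, a client could pre-check a file's extension against the formats listed above before uploading. This is only a sketch: the function name and the extension set are assumptions made for this example, and because other formats are also accepted, a miss here does not mean the file will be rejected.

```python
from pathlib import Path

# Extensions for the formats named in the list above. The documentation
# says more formats are accepted, so this set is deliberately incomplete.
LISTED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def is_listed_audio_format(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in LISTED_AUDIO_EXTENSIONS


print(is_listed_audio_format("customer-call.MP3"))
print(is_listed_audio_format("notes.txt"))
```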

Configuration Options

Setting         Description
Context Prompt  Additional context to improve accuracy
Key Terms       Industry-specific terminology
Find/Replace    Correct common transcription errors

Key Terms

Improve transcription of specialized vocabulary:

  • Company names
  • Product names
  • Technical terms
  • Proper nouns

Example:

Gen8, Acme Corp, TechFlow, API Gateway
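
If you keep key terms in a single comma-separated line like the example above, a small helper can tidy the list before you enter it. This sketch assumes comma-separated input, which the example suggests but the settings do not state. The helper name is hypothetical and not part of the product.

```python
def parse_key_terms(raw: str) -> list[str]:
    """Split a comma-separated string into trimmed, de-duplicated terms."""
    seen = set()
    terms = []
    for part in raw.split(","):
        term = part.strip()
        key = term.casefold()  # compare case-insensitively
        if term and key not in seen:
            seen.add(key)
            terms.append(term)  # keep the first spelling encountered
    return terms


print(parse_key_terms("Gen8, Acme Corp, TechFlow, API Gateway"))
```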

Find/Replace

Automatically correct common errors:

jean eight -> Gen8
a.i. -> AI
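
Conceptually, rules like these act as a post-processing pass over the transcript text. The sketch below shows one way such a pass could work. It assumes case-insensitive, whole-phrase matching, which may not be how the product matches, and the function name is hypothetical.

```python
import re


def apply_find_replace(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (find, replace) rule to the text in order."""
    for find, replace in rules:
        # Match the phrase only when it is not embedded in a longer word.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(find) + r"(?!\w)", re.IGNORECASE
        )
        # A function replacement keeps backslashes in `replace` literal.
        text = pattern.sub(lambda _match, r=replace: r, text)
    return text


rules = [("jean eight", "Gen8"), ("a.i.", "AI")]
print(apply_find_replace("We demoed jean eight with a.i. features.", rules))
# prints: We demoed Gen8 with AI features.
```

Rules run in the order given, so a later rule sees the output of earlier ones.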

Image Description

This tool analyzes and describes uploaded images using AI vision capabilities.

Configuration

Setting         Description
Context Prompt  Guide what to focus on

When a user uploads an image, the AI examines the visual content, provides a detailed description, and can answer follow-up questions about what it sees. You can guide the analysis with a context prompt that tells the AI what to focus on, for example "Focus on identifying products and their prices" or "Describe any text visible in the image."

Common uses include document analysis, product identification, screenshot interpretation, and extracting data from visual content.

Meeting Bot

The meeting bot joins and transcribes live meetings automatically. It works with Zoom, Google Meet, Microsoft Teams, and other major video conferencing platforms.

Configuration

Setting        Description
Bot Name       Name displayed in the meeting
Bot Image      Avatar shown for the bot
Entry Message  Message sent when bot joins
Key Terms      Improve transcription accuracy
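
For teams that track these settings in code or configuration files, the four settings above could be modeled as a small record. This is only a sketch: the class, field names, sample values, and the non-empty name check are assumptions for illustration, not the product's schema.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingBotConfig:
    """Hypothetical container mirroring the four settings in the table."""
    bot_name: str                       # name displayed in the meeting
    bot_image: str = ""                 # avatar reference (format assumed)
    entry_message: str = ""             # message sent when the bot joins
    key_terms: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: a blank display name is treated as a mistake.
        if not self.bot_name.strip():
            raise ValueError("bot_name must not be empty")


config = MeetingBotConfig(
    bot_name="Acme Notetaker",
    entry_message="This meeting is being recorded and transcribed.",
    key_terms=["Gen8", "TechFlow"],
)
print(config.bot_name)
```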

The workflow is simple: a user provides a meeting link, the AI sends a bot to join, and the bot records and transcribes in real time. Once the meeting ends, the transcript becomes available in the chat for review and further processing.

Best practices include informing meeting participants about the recording, setting a professional bot name and appropriate image, and configuring key terms for better accuracy with technical or company-specific terminology.

Cancel Meeting Bot

This tool removes a bot from an ongoing meeting. It is useful when a meeting ends early, the bot is no longer needed, or it joined the wrong meeting.
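
The meeting bot workflow and cancellation can be pictured as a small state machine. The sketch below is a mental model only. The state names, the allowed transitions, and the choice to treat cancellation as a final state are assumptions, not documented behavior.

```python
from enum import Enum


class BotState(Enum):
    REQUESTED = "requested"                # user has provided a meeting link
    IN_MEETING = "in_meeting"              # bot is recording and transcribing
    TRANSCRIPT_READY = "transcript_ready"  # meeting ended, transcript in chat
    CANCELLED = "cancelled"                # bot was removed from the meeting


# Assumed transitions: a bot can be cancelled until the meeting ends.
ALLOWED = {
    BotState.REQUESTED: {BotState.IN_MEETING, BotState.CANCELLED},
    BotState.IN_MEETING: {BotState.TRANSCRIPT_READY, BotState.CANCELLED},
    BotState.TRANSCRIPT_READY: set(),
    BotState.CANCELLED: set(),
}


def advance(current: BotState, target: BotState) -> BotState:
    """Return the target state if the transition is allowed, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


state = advance(BotState.REQUESTED, BotState.IN_MEETING)
state = advance(state, BotState.CANCELLED)
print(state.value)
```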

PDF Transcription

This tool extracts and analyzes content from PDF documents. It extracts text, analyzes images within PDFs, recognizes tables, and processes multi-page documents.

Example Workflows

Meeting summary: A user provides a meeting link, and the AI sends a bot to join and transcribe the meeting. Afterward, the AI provides the transcript along with key points and action items.

Audio analysis: A user uploads a customer call recording. The AI transcribes it and identifies the key topics discussed, such as product inquiries, pricing questions, and support requests.

Image processing: A user uploads a product photo and asks what product it is. The AI analyzes the image and identifies the product model, visible specifications, and features.

Tips for Better Results

For better audio transcriptions, use clear audio with minimal background noise, speak at a moderate pace, and use quality microphones when recording.

For better image analysis, use high-resolution images with good lighting, ensure text is legible, and avoid motion blur.

Providing context always helps. Tell the AI what you're looking for, what industry or domain the content relates to, and what type of content to expect. Results will be more accurate and relevant when the AI understands the context.
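
One way to make this a habit is to assemble the context prompt from the three pieces mentioned above: what you are looking for, the domain, and the content type. The helper below is a hypothetical sketch. Its name and output wording are illustrative, and the tools accept any free-form context text.

```python
def build_context_prompt(goal: str, domain: str = "", content_type: str = "") -> str:
    """Combine the goal, domain, and content type into one prompt string."""
    parts = [goal.strip()]
    if domain.strip():
        parts.append(f"Domain: {domain.strip()}.")
    if content_type.strip():
        parts.append(f"Content type: {content_type.strip()}.")
    return " ".join(parts)


print(build_context_prompt(
    "Focus on identifying products and their prices.",
    domain="retail",
    content_type="product photo",
))
# prints: Focus on identifying products and their prices. Domain: retail. Content type: product photo.
```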