Accelerate Database Performance Troubleshooting Using Grafana Assistant: A Step-by-Step Guide

Last updated: 2026-05-18 14:05:25 | Level: Intermediate

Introduction

Is your database suddenly sluggish? Grafana Cloud Database Observability already provides rich signals: RED metrics (Rate, Errors, Duration), execution samples, wait event breakdowns, and visual explain plans. But visibility alone isn't enough. You also need to know why a query's P99 latency spiked, or what an obscure wait event like wait/synch/mutex/innodb means. The new Grafana Assistant integration combines AI with your actual observability data to deliver targeted, actionable insights without manual context assembly. This guide walks you through using it to troubleshoot performance issues faster.

What You Need

  • A Grafana Cloud account with Database Observability enabled
  • Access to the Grafana Assistant integration (available in the query detail view)
  • One or more databases with observed slow queries or degraded performance
  • Familiarity with basic Grafana navigation (metrics, logs, dashboards)
  • Optional: A specific time range and query ID to investigate

Step-by-Step Instructions

Step 1: Identify the Slow Query

Navigate to your Database Observability dashboard. Look for queries with elevated duration (P99 latency spikes) or rising error rates. Click into any query that appears problematic. You'll see detailed time-series data: duration, rows examined, rows returned, wait events, and more. This is your starting point. Skip to Step 2 if you've already identified a query.
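
To make "elevated P99" concrete, here is a minimal sketch of ranking queries by tail latency. It assumes you have exported per-query latency samples yourself. The query IDs and sample values are made up, and the nearest-rank percentile is a simplification, not a description of how Grafana computes P99.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def rank_by_p99(samples_by_query):
    """Return (query_id, p99_ms) pairs, slowest first."""
    ranked = [(qid, percentile(ms, 99)) for qid, ms in samples_by_query.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical latency samples in milliseconds, keyed by a made-up query ID.
samples = {
    "q_orders_by_customer": [12, 14, 13, 15, 480, 13, 12, 16, 14, 13],
    "q_user_lookup": [3, 4, 3, 5, 4, 3, 4, 4, 3, 5],
}

ranking = rank_by_p99(samples)
for query_id, p99_ms in ranking:
    print(f"{query_id}: P99 = {p99_ms} ms")
```

The query at the top of the ranking is the natural candidate to click into first.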

Step 2: Open the Grafana Assistant

Once you're on the query detail page, locate the Grafana Assistant panel (usually a chat icon or button labeled "Assistant"). Click to open the assistant interface. A chat box appears, pre-loaded with context from your current view—time range, query text, schema, and execution plan. No need to paste SQL or explain the schema manually.

Step 3: Use Pre-Built AI Prompts

The assistant offers purpose-built AI buttons designed by database engineers. For example, you'll see options like:

  • "Why is this query slow?"
  • "Get recommendations on changes"
  • "Analyze wait events"

Click the "Why is this query slow?" button. The assistant instantly queries your real Prometheus and Loki data sources within the selected time window, using actual metrics, logs, and schema metadata.
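
The assistant's internals are not described in this guide, so the following only illustrates what a time-windowed query against Prometheus looks like in general. It builds a request for the standard Prometheus HTTP API range endpoint (/api/v1/query_range). The base URL and the metric name db_query_duration_seconds_bucket are hypothetical placeholders, not names taken from Grafana Cloud.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def window_params(promql, end, window_minutes=15, step_seconds=30):
    """Build query_range parameters covering the window that ends at `end`."""
    start = end - timedelta(minutes=window_minutes)
    return {
        "query": promql,
        "start": start.timestamp(),  # Unix seconds
        "end": end.timestamp(),
        "step": step_seconds,
    }


def range_query_url(base_url, params):
    """Join a Prometheus base URL with the range-query endpoint and parameters."""
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


# Hypothetical metric name; substitute whatever your data source exposes.
PROMQL = (
    "histogram_quantile(0.99, "
    "sum by (le) (rate(db_query_duration_seconds_bucket[5m])))"
)

end_time = datetime(2026, 5, 18, 14, 0, tzinfo=timezone.utc)
params = window_params(PROMQL, end_time)
url = range_query_url("https://prometheus.example.com", params)
print(url)
```

Nothing is sent over the network here. The sketch only shows how a selected time range becomes the start, end, and step parameters of a range query.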

Step 4: Analyze the Results

The assistant synthesizes the data into a health assessment. For instance, it might report:

  • Duration is spiking because rows examined are 50 times the rows returned (inefficient filtering).
  • P99 is 12x the median, indicating intermittent slowness.
  • CPU time is healthy, but wait events consume 40% of execution time.

Wait events like wait/synch/mutex/innodb are automatically translated: "During this wait, the database is contending for an internal mutex lock..." The assistant provides specific advice (e.g., "Consider using a covering index to reduce row scanning").
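
The three findings above come down to simple ratios. As a rough sketch of that arithmetic, the function below derives them from summary numbers. The thresholds (10x, 5x, 25%) are arbitrary assumptions chosen for illustration, and the function is not the assistant's actual logic.

```python
def assess_query(p50_ms, p99_ms, rows_examined, rows_returned, wait_ms, total_ms):
    """Return plain-language findings derived from summary metrics."""
    findings = []

    # Many rows examined per row returned suggests inefficient filtering.
    scan_ratio = rows_examined / max(rows_returned, 1)
    if scan_ratio >= 10:  # assumed threshold
        findings.append(f"rows examined are {scan_ratio:.0f}x rows returned")

    # A P99 far above the median points to intermittent slowness.
    tail_ratio = p99_ms / p50_ms if p50_ms > 0 else 0.0
    if tail_ratio >= 5:  # assumed threshold
        findings.append(f"P99 is {tail_ratio:.0f}x the median")

    # A large wait share means the time goes to waiting rather than CPU work.
    wait_share = wait_ms / total_ms if total_ms > 0 else 0.0
    if wait_share >= 0.25:  # assumed threshold
        findings.append(f"wait events consume {wait_share:.0%} of execution time")

    return findings


# Numbers chosen to mirror the example report above.
findings = assess_query(
    p50_ms=10, p99_ms=120,
    rows_examined=50_000, rows_returned=1_000,
    wait_ms=400, total_ms=1_000,
)
for line in findings:
    print("-", line)
```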

Step 5: Apply Recommendations

Based on the analysis, implement the suggested changes. Open the visual explain plan to confirm the index suggestion or rewrite the query. After applying changes, return to the dashboard and verify metrics (duration, wait events) improve. You can also ask follow-up questions in the same chat without losing context.
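
To see how an explain plan confirms a covering-index suggestion, here is a small self-contained sketch. It uses Python's built-in sqlite3 module purely as a stand-in, because it runs anywhere. Your production database (MySQL, PostgreSQL, and so on) has its own EXPLAIN syntax and output. The orders table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
    [(i % 100, "open", i * 1.5) for i in range(1000)],
)

QUERY = "SELECT status, total FROM orders WHERE customer_id = ?"


def plan(connection, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Without an index, the filter forces a scan of the whole table.
before = plan(conn, QUERY, (42,))

# The index holds every column the query touches, so the table is never read.
conn.execute(
    "CREATE INDEX idx_orders_customer_cover "
    "ON orders (customer_id, status, total)"
)
after = plan(conn, QUERY, (42,))

print("before:", before)
print("after: ", after)
```

The first plan should report a scan of orders and the second a search using a covering index. The exact wording varies by SQLite version.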

Step 6: Repeat or Refine

For other slow queries, repeat Steps 1-5. Use the "Get recommendations on changes" button for proactive optimization. The assistant never stores your query text or schema; each analysis is ephemeral and privacy-preserving.

Tips for Best Results

  • Start with the pre-built prompts—they're optimized for common issues like lock contention, table scans, or high wait times. Generic prompts work too, but the guided buttons offer more precise context.
  • Understand wait events: The assistant demystifies cryptic names (e.g., wait/io/table/sql/handler is table I/O, the time spent in storage-engine handler calls that read or write rows). Leverage this to pinpoint root causes.
  • Use a narrow time window when possible. The assistant queries actual data, so a focused range (e.g., 15 minutes) yields more actionable results than a 24-hour view.
  • Combine with visual explain plans: After AI suggestions, cross-reference with the visual plan to validate index usage or table access patterns.
  • Iterate quickly: The assistant's real-time querying means you can test changes and re-analyze immediately. No manual reprocessing.
  • Security note: The assistant never uses your data for model training—only for the current context. This ensures sensitive schema remains confidential.
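
As a companion to the wait-event tip above, here is a minimal sketch of how slash-separated MySQL Performance Schema wait instrument names can be grouped by prefix. The prefixes are standard instrument categories. The one-line descriptions are informal paraphrases written for this sketch, not the assistant's output.

```python
# Standard Performance Schema wait prefixes with informal descriptions.
WAIT_CATEGORIES = {
    "wait/io/file": "file I/O",
    "wait/io/table": "table I/O (row reads and writes through the storage engine)",
    "wait/io/socket": "network socket I/O",
    "wait/lock/table": "table lock",
    "wait/lock/metadata": "metadata lock",
    "wait/synch/mutex": "mutex contention",
    "wait/synch/rwlock": "read/write lock contention",
    "wait/synch/cond": "waiting on a condition signal",
}


def describe_wait_event(name):
    """Map an instrument name to a short description by its category prefix."""
    for prefix, description in WAIT_CATEGORIES.items():
        if name == prefix or name.startswith(prefix + "/"):
            return description
    return "unrecognized wait event"


for event in ("wait/synch/mutex/innodb", "wait/io/table/sql/handler"):
    print(f"{event}: {describe_wait_event(event)}")
```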

By following these steps, you'll transform vague slowdowns into concrete fixes using Grafana Assistant's AI-powered integration. Happy troubleshooting!