
Monitoring

Monitor your temporary networks with metrics, logs, and dashboards

This guide shows you how to set up monitoring for your temporary networks using Prometheus for metrics and Promtail for logs.

Overview

tmpnet provides built-in integration with:

  • Prometheus - Collect and store metrics from all nodes
  • Promtail - Aggregate logs from all nodes
  • Grafana - Visualize metrics and logs in dashboards

Monitoring helps you:

  • Debug network behavior
  • Identify performance bottlenecks
  • Analyze consensus patterns
  • Track resource usage
  • Troubleshoot issues

Prerequisites

Install Monitoring Tools

The easiest way to get the required tools is using Nix:

# Start a development shell with monitoring tools
nix develop

This provides both prometheus and promtail binaries.

Alternative: Manual Installation

Install tools manually if you don't use Nix:

Prometheus:

# macOS
brew install prometheus

# Linux - download from prometheus.io/download

Promtail:

# Download from GitHub releases
# github.com/grafana/loki/releases

Configure Monitoring Backend

You'll need access to a Prometheus and Loki backend. Set these environment variables:

export PROMETHEUS_URL="https://your-prometheus.example.com"
export PROMETHEUS_PUSH_URL="https://your-prometheus.example.com/api/v1/push"
export PROMETHEUS_USERNAME="your-username"
export PROMETHEUS_PASSWORD="your-password"

export LOKI_URL="https://your-loki.example.com"
export LOKI_PUSH_URL="https://your-loki.example.com/loki/api/v1/push"
export LOKI_USERNAME="your-username"
export LOKI_PASSWORD="your-password"
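If you wire these settings into your own tooling, a quick preflight check catches missing configuration early. A Go sketch (the `missingVars` helper is ours, not part of tmpnet) that reports which of the variables above are unset:

```go
package main

import (
	"fmt"
	"os"
)

// requiredVars lists the monitoring backend settings described above.
var requiredVars = []string{
	"PROMETHEUS_URL", "PROMETHEUS_PUSH_URL",
	"PROMETHEUS_USERNAME", "PROMETHEUS_PASSWORD",
	"LOKI_URL", "LOKI_PUSH_URL",
	"LOKI_USERNAME", "LOKI_PASSWORD",
}

// missingVars returns the names whose lookup yields an empty value.
func missingVars(lookup func(string) string) []string {
	var missing []string
	for _, name := range requiredVars {
		if lookup(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	for _, name := range missingVars(os.Getenv) {
		fmt.Println("unset:", name)
	}
}
```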

Quick Start

Start Collectors

Start Prometheus and Promtail collectors:

# Start metrics collection
tmpnetctl start-metrics-collector

# Start log collection
tmpnetctl start-logs-collector

Start Your Network

Create a network - monitoring will be automatic:

tmpnetctl start-network

Output includes a Grafana link:

Started network /home/user/.tmpnet/networks/20240312-143052.123456
Network metrics: https://grafana.example.com/...

View Metrics and Logs

Click the Grafana link or open the URL saved at:

cat ~/.tmpnet/networks/latest/metrics.txt

Stop Collectors

When done, stop the collectors:

tmpnetctl stop-metrics-collector
tmpnetctl stop-logs-collector

Monitoring Configuration

Service Discovery

tmpnet uses file-based service discovery to automatically configure monitoring:

Prometheus Configuration:

~/.tmpnet/prometheus/file_sd_configs/
└── [network-uuid]-[node-id].json

Promtail Configuration:

~/.tmpnet/promtail/file_sd_configs/
└── [network-uuid]-[node-id].json

When a node starts, tmpnet automatically creates these configuration files. When a node stops, the files are removed.

Metric Labels

All metrics include these labels for filtering:

  • network_uuid - Unique identifier for the network
  • node_id - Node ID
  • is_ephemeral_node - Whether the node is ephemeral
  • network_owner - User-defined network owner identifier

When running in GitHub Actions, additional labels are added:

  • gh_repo - Repository name
  • gh_workflow - Workflow name
  • gh_run_id - Run ID
  • gh_job_id - Job ID
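In queries, these labels become a PromQL/LogQL label selector. A small Go helper (our own sketch, not a tmpnet API) shows how a label map corresponds to selector syntax:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selector builds a label selector such as {network_uuid="abc"}
// from a label map, with keys sorted for deterministic output.
func selector(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	fmt.Println(selector(map[string]string{
		"network_uuid":      "abc-123-def-456",
		"is_ephemeral_node": "false",
	}))
}
```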

Custom Grafana Instance

Use a custom Grafana instance:

export GRAFANA_URI="https://your-grafana.example.com/d/your-dashboard-id"

The emitted links will use your custom Grafana instance.

Monitoring in Code

Enable Monitoring Programmatically

Start collectors from Go code:

import (
    "os"

    "github.com/ava-labs/avalanchego/tests/fixture/tmpnet"
)

// Start Prometheus
err := tmpnet.StartPrometheus(
    prometheusURL,
    prometheusUsername,
    prometheusPassword,
    prometheusPushURL,
    os.Stdout, // Progress output
)
if err != nil {
    panic(err)
}

// Start Promtail
err = tmpnet.StartPromtail(
    lokiURL,
    lokiUsername,
    lokiPassword,
    lokiPushURL,
    os.Stdout,
)
if err != nil {
    panic(err)
}

Verify Collection

Check that metrics and logs are being collected:

// Check metrics
err := tmpnet.CheckMetricsExist(context.Background(), prometheusURL, prometheusUsername, prometheusPassword, network.UUID)
if err != nil {
    fmt.Println("Metrics not found:", err)
}

// Check logs
err = tmpnet.CheckLogsExist(context.Background(), lokiURL, lokiUsername, lokiPassword, network.UUID)
if err != nil {
    fmt.Println("Logs not found:", err)
}

Monitoring Patterns

Development Workflow

For local development:

# Start collectors once
tmpnetctl start-metrics-collector
tmpnetctl start-logs-collector

# Create/destroy networks as needed
tmpnetctl start-network
# ... test ...
tmpnetctl stop-network

# Collectors keep running
# Stop when done with all testing
tmpnetctl stop-metrics-collector
tmpnetctl stop-logs-collector

Test Isolation

Filter to specific networks using the network UUID:

# In Grafana, filter by:
network_uuid="abc-123-def-456"

Each network gets a unique UUID, making it easy to isolate results.
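The same UUID also appears in the Grafana links tmpnet emits, as a var-network_uuid query parameter. A sketch of constructing such a link with Go's net/url (the dashboard URL is illustrative, and `dashboardLink` is our own helper):

```go
package main

import (
	"fmt"
	"net/url"
)

// dashboardLink appends the network_uuid template-variable filter
// to a Grafana dashboard URL.
func dashboardLink(base, uuid string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("var-network_uuid", uuid)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	link, err := dashboardLink("https://grafana.example.com/d/your-dashboard-id", "abc-123-def-456")
	if err != nil {
		panic(err)
	}
	fmt.Println(link)
}
```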

Ephemeral Node Monitoring

Track ephemeral nodes separately:

# Filter to only ephemeral nodes:
is_ephemeral_node="true"

# Filter to only permanent nodes:
is_ephemeral_node="false"

Common Metrics

tmpnet collects standard AvalancheGo metrics:

Node Health

  • avalanche_network_peers - Number of connected peers
  • avalanche_P_vm_blks_accepted - Accepted blocks on P-Chain
  • avalanche_health_checks_failing - Failing health checks

Network Activity

  • avalanche_network_msgs_sent - Messages sent
  • avalanche_network_msgs_received - Messages received
  • avalanche_network_bandwidth_throttler_inbound_acquired_bytes - Inbound bandwidth

Consensus

  • avalanche_snowman_polls_successful - Successful consensus polls
  • avalanche_snowman_polls_failed - Failed consensus polls
  • avalanche_P_blks_processing - Blocks currently processing

Performance

  • avalanche_P_vm_blks_processing_time - Block processing time
  • go_goroutines - Number of goroutines
  • go_memstats_alloc_bytes - Memory allocated

Custom VM Metrics

If your custom VM exports Prometheus metrics, they'll be collected automatically.

Log Collection

Log Levels

Configure log verbosity per node:

node.Flags = tmpnet.FlagsMap{
    "log-level": "debug", // trace, debug, info, warn, error, fatal
    "log-display-level": "info",
}

Log Queries in Grafana

Example queries in Grafana Explore (Loki):

# All logs for a network
{network_uuid="abc-123"}

# Error logs only (case-insensitive match)
{network_uuid="abc-123"} |~ "(?i)error"

# Logs from a specific node
{network_uuid="abc-123", node_id="NodeID-7Xhw2..."}

# Search for specific patterns
{network_uuid="abc-123"} |= "consensus"

# Regex search
{network_uuid="abc-123"} |~ "block \\d+ accepted"

Structured Logging

Enable JSON structured logging for easier parsing:

network.DefaultFlags = tmpnet.FlagsMap{
    "log-format": "json",
}
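JSON logs can then be parsed field by field, whether by Loki or by your own tooling. A Go sketch, assuming illustrative level/msg field names (the actual fields depend on the configured log format):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logEntry models one JSON-formatted log line; field names here
// are illustrative.
type logEntry struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

// parseLogLine decodes a single structured log line.
func parseLogLine(line string) (logEntry, error) {
	var entry logEntry
	err := json.Unmarshal([]byte(line), &entry)
	return entry, err
}

func main() {
	entry, err := parseLogLine(`{"level":"info","msg":"block accepted"}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s\n", entry.Level, entry.Msg)
}
```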

Troubleshooting

Collectors Won't Start

Check if already running:

ps aux | grep prometheus
ps aux | grep promtail

Stop existing processes:

tmpnetctl stop-metrics-collector
tmpnetctl stop-logs-collector

No Metrics Appear

Verify collectors are running:

ps aux | grep prometheus

Check service discovery configs exist:

ls ~/.tmpnet/prometheus/file_sd_configs/

Verify network UUID:

cat ~/.tmpnet/networks/latest/config.json | jq -r '.uuid'

Check Prometheus is scraping:

# Check Prometheus logs
tail -f ~/.tmpnet/prometheus/*.log

No Logs Appear

Check Promtail is running:

ps aux | grep promtail

Verify log files exist:

ls ~/.tmpnet/networks/latest/NodeID-*/logs/

Check Promtail configuration:

ls ~/.tmpnet/promtail/file_sd_configs/

Can't Access Grafana

Check the metrics link:

cat ~/.tmpnet/networks/latest/metrics.txt

Verify GRAFANA_URI is set:

echo $GRAFANA_URI

Check that the network UUID appears in the URL: it should include var-network_uuid=YOUR_UUID

CI/CD Integration

GitHub Actions

Use the provided GitHub Action for automated monitoring:

- name: Run tests with monitoring
  uses: ./.github/actions/run-monitored-tmpnet-cmd
  with:
    run: ./scripts/test.sh
    prometheus_url: ${{ secrets.PROMETHEUS_URL }}
    prometheus_push_url: ${{ secrets.PROMETHEUS_PUSH_URL }}
    prometheus_username: ${{ secrets.PROMETHEUS_USERNAME }}
    prometheus_password: ${{ secrets.PROMETHEUS_PASSWORD }}
    loki_url: ${{ secrets.LOKI_URL }}
    loki_push_url: ${{ secrets.LOKI_PUSH_URL }}
    loki_username: ${{ secrets.LOKI_USERNAME }}
    loki_password: ${{ secrets.LOKI_PASSWORD }}

The action automatically:

  • Starts collectors
  • Runs your tests
  • Stops collectors
  • Uploads network artifacts
  • Emits Grafana links in logs

Custom CI Systems

For other CI systems:

#!/bin/bash
set -e

# Start collectors
tmpnetctl start-metrics-collector
tmpnetctl start-logs-collector

# Ensure cleanup on exit
trap "tmpnetctl stop-metrics-collector; tmpnetctl stop-logs-collector" EXIT

# Run your tests
./run-tests.sh

# Metrics link is in the network directory
cat ~/.tmpnet/networks/latest/metrics.txt

Advanced Monitoring

Custom Metrics Dashboard

Create custom Grafana dashboards using tmpnet labels:

# Panel: Network Message Rate
rate(avalanche_network_msgs_sent{network_uuid="$network_uuid"}[1m])

# Panel: Block Processing Time (P-Chain)
histogram_quantile(0.99, rate(avalanche_P_vm_blks_processing_time_bucket[5m]))

# Panel: Active Validators
avalanche_P_vm_validators_count{network_uuid="$network_uuid"}

Alerting

Set up alerts based on network behavior:

# Alert when nodes disconnect
avalanche_network_peers < 4

# Alert on high block processing time
histogram_quantile(0.99, rate(avalanche_P_vm_blks_processing_time_bucket[5m])) > 1000

# Alert on failed health checks
avalanche_health_checks_failing > 0

Metric Retention

Metrics are retained according to your Prometheus backend configuration. For long-running tests, ensure sufficient retention.
