Monitoring
Monitor your temporary networks with metrics, logs, and dashboards
This guide shows you how to set up monitoring for your temporary networks using Prometheus for metrics and Promtail for logs.
Overview
tmpnet provides built-in integration with:
- Prometheus - Collect and store metrics from all nodes
- Promtail - Aggregate logs from all nodes
- Grafana - Visualize metrics and logs in dashboards
Monitoring helps you:
- Debug network behavior
- Identify performance bottlenecks
- Analyze consensus patterns
- Track resource usage
- Troubleshoot issues
Prerequisites
Install Monitoring Tools
The easiest way to get the required tools is using Nix:
```bash
# Start a development shell with monitoring tools
nix develop
```

This provides both the prometheus and promtail binaries.
Alternative: Manual Installation
Install tools manually if you don't use Nix:
Prometheus:

```bash
# macOS
brew install prometheus

# Linux - download from prometheus.io/download
```

Promtail:

```bash
# Download from GitHub releases
# github.com/grafana/loki/releases
```

Configure Monitoring Backend
You'll need access to a Prometheus and Loki backend. Set these environment variables:
```bash
export PROMETHEUS_URL="https://your-prometheus.example.com"
export PROMETHEUS_PUSH_URL="https://your-prometheus.example.com/api/v1/push"
export PROMETHEUS_USERNAME="your-username"
export PROMETHEUS_PASSWORD="your-password"
export LOKI_URL="https://your-loki.example.com"
export LOKI_PUSH_URL="https://your-loki.example.com/loki/api/v1/push"
export LOKI_USERNAME="your-username"
export LOKI_PASSWORD="your-password"
```

Quick Start
Start Collectors
Start Prometheus and Promtail collectors:
```bash
# Start metrics collection
tmpnetctl start-metrics-collector

# Start log collection
tmpnetctl start-logs-collector
```

Start Your Network
Create a network - monitoring will be automatic:
```bash
tmpnetctl start-network
```

Output includes a Grafana link:

```
Started network /home/user/.tmpnet/networks/20240312-143052.123456
Network metrics: https://grafana.example.com/...
```

View Metrics and Logs
Click the Grafana link or open the URL saved at:
```bash
cat ~/.tmpnet/networks/latest/metrics.txt
```

Stop Collectors
When done, stop the collectors:
```bash
tmpnetctl stop-metrics-collector
tmpnetctl stop-logs-collector
```

Monitoring Configuration
Service Discovery
tmpnet uses file-based service discovery to automatically configure monitoring:
Prometheus Configuration:

```
~/.tmpnet/prometheus/file_sd_configs/
└── [network-uuid]-[node-id].json
```

Promtail Configuration:

```
~/.tmpnet/promtail/file_sd_configs/
└── [network-uuid]-[node-id].json
```

When a node starts, tmpnet automatically creates these configuration files. When a node stops, the files are removed.
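tmpnet manages these files for you, but for reference, a Prometheus file_sd target file follows the standard shape of a JSON array of target groups. The address, UUID, node ID, and owner below are made up for illustration:

```json
[
  {
    "targets": ["127.0.0.1:9650"],
    "labels": {
      "network_uuid": "abc-123-def-456",
      "node_id": "NodeID-7Xhw2mDxuDS44j42TCB6U5579esbSt3Lg",
      "is_ephemeral_node": "false",
      "network_owner": "my-tests"
    }
  }
]
```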
Metric Labels
All metrics include these labels for filtering:
- network_uuid - Unique identifier for the network
- node_id - Node ID
- is_ephemeral_node - Whether the node is ephemeral
- network_owner - User-defined network owner identifier
When running in GitHub Actions, additional labels are added:
- gh_repo - Repository name
- gh_workflow - Workflow name
- gh_run_id - Run ID
- gh_job_id - Job ID
Custom Grafana Instance
Use a custom Grafana instance:

```bash
export GRAFANA_URI="https://your-grafana.example.com/d/your-dashboard-id"
```

The emitted links will use your custom Grafana instance.
Monitoring in Code
Enable Monitoring Programmatically
Start collectors from Go code:
```go
import (
    "os"

    "github.com/ava-labs/avalanchego/tests/fixture/tmpnet"
)

// Start Prometheus
err := tmpnet.StartPrometheus(
    prometheusURL,
    prometheusUsername,
    prometheusPassword,
    pushURL,
    os.Stdout, // Progress output
)
if err != nil {
    panic(err)
}

// Start Promtail
err = tmpnet.StartPromtail(
    lokiURL,
    lokiUsername,
    lokiPassword,
    pushURL,
    os.Stdout,
)
if err != nil {
    panic(err)
}
```

Verify Collection
Check that metrics and logs are being collected:
```go
// Check metrics
err := tmpnet.CheckMetricsExist(context.Background(), prometheusURL, prometheusUsername, prometheusPassword, network.UUID)
if err != nil {
    fmt.Println("Metrics not found:", err)
}

// Check logs
err = tmpnet.CheckLogsExist(context.Background(), lokiURL, lokiUsername, lokiPassword, network.UUID)
if err != nil {
    fmt.Println("Logs not found:", err)
}
```

Monitoring Patterns
Development Workflow
For local development:
```bash
# Start collectors once
tmpnetctl start-metrics-collector
tmpnetctl start-logs-collector

# Create/destroy networks as needed
tmpnetctl start-network
# ... test ...
tmpnetctl stop-network
# Collectors keep running

# Stop when done with all testing
tmpnetctl stop-metrics-collector
tmpnetctl stop-logs-collector
```

Test Isolation
Filter to specific networks using the network UUID:
```
# In Grafana, filter by:
network_uuid="abc-123-def-456"
```

Each network gets a unique UUID, making it easy to isolate results.
Ephemeral Node Monitoring
Track ephemeral nodes separately:
```
# Filter to only ephemeral nodes:
is_ephemeral_node="true"

# Filter to only permanent nodes:
is_ephemeral_node="false"
```

Common Metrics
tmpnet collects standard AvalancheGo metrics:
Node Health
- avalanche_network_peers - Number of connected peers
- avalanche_P_vm_blks_accepted - Accepted blocks on P-Chain
- avalanche_health_checks_failing - Failing health checks
Network Activity
- avalanche_network_msgs_sent - Messages sent
- avalanche_network_msgs_received - Messages received
- avalanche_network_bandwidth_throttler_inbound_acquired_bytes - Inbound bandwidth
Consensus
- avalanche_snowman_polls_successful - Successful consensus polls
- avalanche_snowman_polls_failed - Failed consensus polls
- avalanche_P_blks_processing - Blocks currently processing
Performance
- avalanche_P_vm_blks_processing_time - Block processing time
- go_goroutines - Number of goroutines
- go_memstats_alloc_bytes - Memory allocated
Custom VM Metrics
If your custom VM exports Prometheus metrics, they'll be collected automatically.
Log Collection
Log Levels
Configure log verbosity per node:
```go
node.Flags = tmpnet.FlagsMap{
    "log-level":         "debug", // trace, debug, info, warn, error, fatal
    "log-display-level": "info",
}
```

Log Queries in Grafana
Example queries in Grafana Explore (Loki):
```
# All logs for a network
{network_uuid="abc-123"}

# Error logs only (case-insensitive regex match)
{network_uuid="abc-123"} |~ "(?i)error"

# Logs from a specific node
{network_uuid="abc-123", node_id="NodeID-7Xhw2..."}

# Search for specific patterns
{network_uuid="abc-123"} |= "consensus"

# Regex search
{network_uuid="abc-123"} |~ "block \\d+ accepted"
```

Structured Logging
Enable JSON structured logging for easier parsing:
```go
network.DefaultFlags = tmpnet.FlagsMap{
    "log-format": "json",
}
```

Troubleshooting
Collectors Won't Start
Check whether they are already running:

```bash
ps aux | grep prometheus
ps aux | grep promtail
```

Stop existing processes:

```bash
tmpnetctl stop-metrics-collector
tmpnetctl stop-logs-collector
```

No Metrics Appear
Verify collectors are running:

```bash
ps aux | grep prometheus
```

Check that the service discovery configs exist:

```bash
ls ~/.tmpnet/prometheus/file_sd_configs/
```

Verify the network UUID:

```bash
cat ~/.tmpnet/networks/latest/config.json | jq -r '.uuid'
```

Check that Prometheus is scraping:

```bash
# Check Prometheus logs
tail -f ~/.tmpnet/prometheus/*.log
```
Check that Promtail is running:

```bash
ps aux | grep promtail
```

Verify log files exist:

```bash
ls ~/.tmpnet/networks/latest/NodeID-*/logs/
```

Check the Promtail configuration:

```bash
ls ~/.tmpnet/promtail/file_sd_configs/
```

Can't Access Grafana
Check the metrics link:

```bash
cat ~/.tmpnet/networks/latest/metrics.txt
```

Verify GRAFANA_URI is set:

```bash
echo $GRAFANA_URI
```

Check that the network UUID is in the URL: it should contain var-network_uuid=YOUR_UUID.
CI/CD Integration
GitHub Actions
Use the provided GitHub Action for automated monitoring:
```yaml
- name: Run tests with monitoring
  uses: ./.github/actions/run-monitored-tmpnet-cmd
  with:
    run: ./scripts/test.sh
    prometheus_url: ${{ secrets.PROMETHEUS_URL }}
    prometheus_push_url: ${{ secrets.PROMETHEUS_PUSH_URL }}
    prometheus_username: ${{ secrets.PROMETHEUS_USERNAME }}
    prometheus_password: ${{ secrets.PROMETHEUS_PASSWORD }}
    loki_url: ${{ secrets.LOKI_URL }}
    loki_push_url: ${{ secrets.LOKI_PUSH_URL }}
    loki_username: ${{ secrets.LOKI_USERNAME }}
    loki_password: ${{ secrets.LOKI_PASSWORD }}
```

The action automatically:
- Starts collectors
- Runs your tests
- Stops collectors
- Uploads network artifacts
- Emits Grafana links in logs
Custom CI Systems
For other CI systems:
```bash
#!/bin/bash
set -e

# Start collectors
tmpnetctl start-metrics-collector
tmpnetctl start-logs-collector

# Ensure cleanup on exit
trap "tmpnetctl stop-metrics-collector; tmpnetctl stop-logs-collector" EXIT

# Run your tests
./run-tests.sh

# Metrics link is in the network directory
cat ~/.tmpnet/networks/latest/metrics.txt
```

Advanced Monitoring
Custom Metrics Dashboard
Create custom Grafana dashboards using tmpnet labels:
```
# Panel: Network Message Rate
rate(avalanche_network_msgs_sent{network_uuid="$network_uuid"}[1m])

# Panel: Block Processing Time (P-Chain)
histogram_quantile(0.99, rate(avalanche_P_vm_blks_processing_time_bucket[5m]))

# Panel: Active Validators
avalanche_P_vm_validators_count{network_uuid="$network_uuid"}
```

Alerting
Set up alerts based on network behavior:
```
# Alert when nodes disconnect
avalanche_network_peers < 4

# Alert on high block processing time
histogram_quantile(0.99, rate(avalanche_P_vm_blks_processing_time_bucket[5m])) > 1000

# Alert on failed health checks
avalanche_health_checks_failing > 0
```

Metric Retention
Metrics are retained according to your Prometheus backend configuration. For long-running tests, ensure sufficient retention.