Backup and DR data is a valuable business asset, and ensuring that it’s safe and accessible is essential. In particular, you want to be able to monitor your backups to ensure that the data is indeed protected and that you can quickly recover it in the event of a disaster, user error, or failed upgrade.
Users of Google Cloud’s Backup and DR service often ask:
- What are the best practices for monitoring my backup and recovery jobs?
- How can I proactively identify issues and troubleshoot?
- How can I create alerts and be notified about important events?
The good news is that Google Cloud Backup and DR now integrates with Cloud Logging and Cloud Monitoring. You can monitor backup events, jobs, appliance health, resource consumption, and user actions in the same way you monitor other Google Cloud workloads and services.
With this integration, you can now:
- Monitor events related to backup and restore – With detailed Google Cloud Backup and DR service event logs in Cloud Logging, you can use log-based alerts to create customisable notifications in near real time. This lets you monitor important events such as backup job failures, jobs that did not run, local backup storage saturation, and network failures.
- Configure fine-grained alerting – You can write custom event queries on a wide variety of dimensions, such as event severity, application type, application name, job type, job name, and error message. This gives you the flexibility to define alerts for specific events or conditions, which reduces noise and helps you identify and fix issues more easily.
- Get notified on your preferred notification channel – Google Cloud offers seven pre-built notification channels that let you receive customisable notifications directly over email, SMS, Slack, or the Google Cloud mobile app. Alternatively, you can integrate with your own monitoring and event management tools using webhooks or Pub/Sub.
- Derive useful insights to troubleshoot issues – With Log Analytics, you can use BigQuery to run SQL queries against your log data and generate operational insights, which can help you reduce time spent troubleshooting. For example, you can analyze the key reasons that backup and restore jobs fail for a given application or application type.
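To make the idea of a custom event query more concrete, here is a minimal Python sketch that assembles a Cloud Logging query string from a severity floor and a set of field matches. The payload field names (`jobType`, `appName`) and their values are hypothetical placeholders, not the service's documented schema; check the Backup and DR event logs page for the real field names before using a filter like this in an alert.

```python
def build_log_filter(min_severity="ERROR", **fields):
    """Build a Cloud Logging query string from a severity floor and field matches.

    The keyword arguments are treated as jsonPayload field names. They are
    illustrative assumptions, not the documented Backup and DR log schema.
    """
    clauses = [f"severity>={min_severity}"]
    for name, value in sorted(fields.items()):
        # Escape backslashes and double quotes so the value stays one string literal.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'jsonPayload.{name}="{escaped}"')
    return " AND ".join(clauses)


# Hypothetical example: failed snapshot jobs for one application.
example_filter = build_log_filter(jobType="snapshot", appName="payroll-db")
print(example_filter)
# severity>=ERROR AND jsonPayload.appName="payroll-db" AND jsonPayload.jobType="snapshot"
```

A string like this could be pasted into the Logs Explorer to preview matching entries before it is attached to a log-based alert. Narrowing by several dimensions at once is what keeps the resulting alert from firing on unrelated events.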
The integration also lets you proactively discover deeper issues related to backup and recovery, such as capacity-related issues or certain job failures that happen periodically, by monitoring recurring events in the logs over time. With log-based metrics, you can:
- Create trend charts to track important metrics, such as the number of backup or restore jobs that failed during a day, week, or month.
- Receive a notification when the number of occurrences crosses a threshold. For example, receive a notification if a snapshot pool saturates more than five times a week.
- Monitor trends in data, such as latency values in logs, and receive a notification if the values change in an unacceptable way.
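The threshold idea in the list above can be illustrated locally with a short Python sketch. It is not the Cloud Monitoring API; in practice a log-based metric and an alerting policy do this counting for you. The function name, the default threshold of five, and the seven-day window are assumptions chosen to mirror the snapshot pool example.

```python
from datetime import datetime, timedelta


def exceeds_threshold(event_times, threshold=5, window=timedelta(days=7)):
    """Return True if more than `threshold` events fall inside any rolling window."""
    times = sorted(event_times)
    start = 0
    for end, current in enumerate(times):
        # Move the left edge forward until the span is shorter than `window`.
        while current - times[start] >= window:
            start += 1
        # Count of events between the left edge and the current event, inclusive.
        if end - start + 1 > threshold:
            return True
    return False


# Hypothetical saturation events: one per day for six consecutive days.
base = datetime(2024, 1, 1)
saturation_events = [base + timedelta(days=i) for i in range(6)]
print(exceeds_threshold(saturation_events))  # True: six events within one week
```

The sliding window counts events in any seven-day span rather than in fixed calendar weeks, so a burst that straddles a week boundary is still caught.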
Getting started
The time to learn that your backup failed should never be when you go to restore. By integrating our Backup and DR service with Cloud Monitoring and Cloud Logging, you can get valuable assurances about the health of your backup and business continuity processes with the same tools that you use to manage other Google Cloud workloads. To get started, check out the Backup and DR event logs page. You can also watch a video that shows you how to set up and use the service and configure some custom alerts.