Guidelines for Backup and Disaster Recovery

This article provides a clear strategy for MySQL database backups, disaster recovery (DR), routine maintenance, and troubleshooting. Its goal is to protect data, keep the application running, and reduce downtime during unexpected issues.

This guideline applies to critical components of the OvalEdge application, including: 

  • MySQL Databases
  • Application Servers (Tomcat)
  • Elasticsearch
  • Log and Storage Management

Backup Strategy

Backup Types and Schedule

| Backup Type | Description | Frequency | Retention | Storage Location |
|---|---|---|---|---|
| Automated Daily Backup | A complete copy of the MySQL database using AutoDB Backup Script. | Daily (off-peak) | 7 days | Onsite and offsite (e.g., cloud storage) |
| Binary Log Backup | Continuously backs up MySQL binary logs to enable point-in-time recovery (PITR) by replaying changes up to a specific moment. | Continuous | 7 days | Onsite and offsite |

Note: Store backups in encrypted form and restrict access to authorized personnel only.
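
The point-in-time recovery described for the Binary Log Backup can be illustrated with a minimal sketch. It only builds the argument list for the standard mysqlbinlog utility, using its --start-datetime and --stop-datetime options. The function name and the binary log file names are hypothetical, and the sketch assumes the daily backup has already been restored before the logs are replayed.

```python
from datetime import datetime


def build_binlog_replay_command(binlog_files, stop_at, start_at=None):
    """Return the argv list for a mysqlbinlog call that replays events up to stop_at."""
    if not binlog_files:
        raise ValueError("at least one binary log file is required")
    fmt = "%Y-%m-%d %H:%M:%S"
    argv = ["mysqlbinlog"]
    if start_at is not None:
        # Skip events already contained in the restored daily backup.
        argv.append(f"--start-datetime={start_at.strftime(fmt)}")
    # Stop replaying just before the moment the unwanted change happened.
    argv.append(f"--stop-datetime={stop_at.strftime(fmt)}")
    # Files must be listed in the order they were written.
    argv.extend(binlog_files)
    return argv


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    cmd = build_binlog_replay_command(
        ["binlog.000001", "binlog.000002"],
        stop_at=datetime(2024, 1, 15, 9, 30, 0),
    )
    print(cmd)
```

In practice the output of such a command is typically piped into the mysql client on the restored server. The sketch stops at building the command so that it can be reviewed before anything is replayed.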

Backup Implementation

  1. Automation:
    Schedule backup tasks using cron jobs or task schedulers to run at specified intervals without manual intervention.
  2. Compression and Encryption:
    1. Compress backup files to optimize storage utilization.
    2. Encrypt backups using industry-standard encryption protocols to safeguard sensitive data during storage and transmission.
  3. Testing and Validation:
    1. Regularly restore backups to a test environment to verify data integrity and the effectiveness of the restoration process.
    2. Monitor backup logs and set up alerts for any failures or anomalies.
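
The compression, validation, and retention steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not the AutoDB Backup Script itself: the function names and the .gz naming are hypothetical, and encryption is deliberately left out because it depends on the tooling chosen for the environment.

```python
import gzip
import hashlib
import shutil
import time
from pathlib import Path

RETENTION_DAYS = 7  # matches the 7-day retention in the backup schedule


def compress_backup(dump_path):
    """Gzip a dump file and return the path of the compressed copy."""
    dump_path = Path(dump_path)
    gz_path = dump_path.with_name(dump_path.name + ".gz")
    with open(dump_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def prune_old_backups(backup_dir, retention_days=RETENTION_DAYS, now=None):
    """Delete *.gz files older than retention_days and return the removed paths."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    removed = []
    for path in sorted(Path(backup_dir).glob("*.gz")):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Recording the checksum next to each backup gives the restore test a cheap first check: if the digest of the offsite copy differs from the recorded one, the file was damaged in transfer and there is no point in attempting a restore from it.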

Disaster Recovery (DR) Plan

DR Strategy

  1. Replication
    1. Implement MySQL Replication to maintain near-real-time database copies on separate servers, preferably in different geographic locations, to protect against regional outages.
  2. Failover and Recovery Procedures
    1. Deploy monitoring tools that detect failures and automatically switch to standby replicas, minimizing downtime.
    2. Document failover procedures, update them regularly, and train team members to execute them effectively.
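
The failure-detection part of the strategy above can be pictured with a small sketch. It assumes a monitoring loop that probes the primary at regular intervals and only signals a failover after several consecutive failed probes, so that a single dropped probe does not trigger a switch. The class name and the threshold of three are illustrative assumptions; the probe itself and the replica promotion are outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks probe results and signals when a failover should be triggered."""

    failure_threshold: int = 3      # hypothetical value; tune per environment
    consecutive_failures: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True once the threshold is reached."""
        if healthy:
            # Any successful probe resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


if __name__ == "__main__":
    monitor = FailoverMonitor(failure_threshold=3)
    for result in [False, False, True, False, False, False]:
        print(result, monitor.record(result))
```

Requiring a streak of failures trades a slightly longer detection time for fewer false failovers, which matters because promoting a replica is much harder to undo than waiting for one more probe.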

DR Guidelines

  1. Automated Failover
    1. Configure automated failover mechanisms that seamlessly promote standby replicas to the primary role during a failure.
    2. Ensure applications are configured to reconnect to the new primary database instance without manual intervention.
      Note: If the new primary has a different MySQL hostname, update the hostname in the oasis.properties file.
  2. Regular DR Drills
    1. Conduct quarterly disaster recovery drills to test the effectiveness of backups, replication, and failover mechanisms.
    2. Update DR plans based on lessons learned from drills and changes to the infrastructure.
  3. Documentation
    1. Maintain comprehensive documentation of all backup and recovery procedures, configurations, and contact information for responsible personnel.
    2. Store documentation in accessible locations for quick reference during emergencies.

[Figure: Architecture Diagram for Disaster Recovery]
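
The hostname update mentioned in the Automated Failover note could be scripted along these lines. This is a sketch built on an assumption: that oasis.properties stores the MySQL location as a jdbc:mysql:// URL. The actual property names in that file are not given in this article, so the key and hostnames in the example are hypothetical, and the function simply rewrites the host in any such URL it finds.

```python
import re

# Matches the scheme prefix and the host part of a MySQL JDBC URL.
_JDBC_MYSQL = re.compile(r"(jdbc:mysql://)([^:/?\s]+)")


def point_to_new_primary(properties_text, new_host):
    """Replace the host in every jdbc:mysql:// URL found in the text."""
    # A function replacement avoids escape handling if new_host has backslashes.
    return _JDBC_MYSQL.sub(lambda m: m.group(1) + new_host, properties_text)


if __name__ == "__main__":
    # Hypothetical property key and hostnames, for illustration only.
    before = "db.url=jdbc:mysql://db-old.example.internal:3306/appdb\n"
    print(point_to_new_primary(before, "db-new.example.internal"))
```

The port, database name, and query parameters are left untouched, so only the host changes. Keeping a copy of the original file before writing the result back makes the change easy to revert after a failback.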

Troubleshooting Application Downtime

Common Causes for Downtime

  1. The Tomcat service is not running.
  2. The Elasticsearch service is stopped or unreachable.
  3. MySQL is not accessible from the application server.
  4. The server has insufficient disk space.

Troubleshooting Steps

  1. Verify Application URL
    1. If the application does not load in the browser, proceed with the server checks in the following steps.
  2. Check Tomcat Logs
    1. Review ovaledge.log in the Tomcat log folder (for both the UI and Job Tomcat instances, if applicable) for error messages or stack traces.
  3. Validate Service Status
    1. Use services.msc (Windows) to verify that the Tomcat and Elasticsearch services are running.
    2. Restart any service that is down. Start Elasticsearch before Tomcat, because Tomcat may depend on it.
      [Figure: Elasticsearch Service]
      [Figure: Tomcat Service]
  4. Check MySQL Connectivity
    1. Run either of the following commands on the application server to confirm that the MySQL server is reachable:
      telnet <MySQL IP> 3306
      nc -zv <MySQL IP> 3306
    2. Ensure firewall rules and security groups allow traffic between the application server and MySQL.
  5. Validate Disk Space
    1. Ensure at least 30 GB of free disk space for optimal application performance.
    2. Use df -h (Linux) or check the disk properties (Windows) to monitor available space.
  6. Use Troubleshooting Script
    1. Use the provided Troubleshooting Script to collect service status, disk usage, and recent logs for analysis.
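
The MySQL connectivity and disk space checks in the steps above can also be run from a short script. The sketch below is not the provided Troubleshooting Script. It is an illustration using only the Python standard library, with hypothetical function names: a TCP connection attempt stands in for telnet or nc -zv, and shutil.disk_usage stands in for df -h.

```python
import shutil
import socket

MIN_FREE_GB = 30  # minimum free space named in the disk space step


def port_is_reachable(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def free_space_gb(path="/"):
    """Return the free space, in GiB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 2**30


def has_enough_disk(path="/", min_free_gb=MIN_FREE_GB):
    """Return True if the filesystem holding path has at least min_free_gb free."""
    return free_space_gb(path) >= min_free_gb


if __name__ == "__main__":
    # Replace the placeholder address with the MySQL server's IP or hostname.
    print("MySQL reachable:", port_is_reachable("127.0.0.1", 3306))
    print("Enough disk space:", has_enough_disk("/"))
```

A successful connection only shows that the port is open from the application server. It does not confirm that the credentials used by the application are valid, so a failed login would still need to be checked in ovaledge.log.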

Maintenance Activities

  1. Logs File Activity
    1. Configuration
      Log files can grow quickly, consuming disk space and potentially degrading application performance. The log cleanup script retains only the logs from the last 10 days and automatically deletes older ones. For details, refer to the Logs cleanup document: Logs File Activity.
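
The 10-day log retention described above might look like the following in script form. This is a hedged sketch rather than the script from the Logs cleanup document: the function name, the file pattern, and the dry-run option are assumptions added for illustration.

```python
import time
from pathlib import Path


def delete_old_logs(log_dir, keep_days=10, pattern="*.log*", dry_run=False, now=None):
    """Delete files matching pattern that are older than keep_days.

    Returns the names of the files that were (or, with dry_run, would be) deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400
    deleted = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # Age is judged by last modification time, not by the file name.
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            deleted.append(path.name)
    return deleted
```

Running it once with dry_run=True shows which files would be removed, which is a reasonable precaution before scheduling the cleanup to run unattended.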