


Latest revision as of 14:53, 29 May 2024

Backup scheme

Our backup model is set up to keep copies of everything we use in several places, all easily restorable.

Goals

The goals we have in mind for our long-term backup system are as follows.

  1. Nothing is stored only once. Everything has at least one redundancy.
  2. Everything is stored in more than one location. In most cases this is exodus and somewhere else.
  3. During normal (non-emergency) operation, everything is easily accessible.
  4. Backups should be incremental where possible to protect against error code ID-10T. Keep working snapshots from the last hours, days, weeks, months and years.
  5. Where possible, restoring from a failure should be fast. Transferring large amounts of data over a network across the world should be a last resort that never needs to happen.
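
Goal 4's "snapshots from the last hours, days, weeks, months and years" can be pictured as a bucketed retention policy. The following is only a minimal sketch: the keep counts in `RULES` and the function name `snapshots_to_keep` are hypothetical, and this is not the configuration used on our servers. For each rule it keeps the newest snapshot in each of the most recent periods that contain one.

```python
from datetime import datetime

# Hypothetical retention rules: a function that names the period a snapshot
# falls into, and how many of the most recent periods to keep a snapshot for.
RULES = {
    "hourly": (lambda t: (t.year, t.month, t.day, t.hour), 24),
    "daily": (lambda t: (t.year, t.month, t.day), 7),
    "weekly": (lambda t: tuple(t.isocalendar())[:2], 4),
    "monthly": (lambda t: (t.year, t.month), 12),
    "yearly": (lambda t: (t.year,), 5),
}


def snapshots_to_keep(snapshots, rules=RULES):
    """Return the set of snapshot times retained by at least one rule."""
    keep = set()
    newest_first = sorted(snapshots, reverse=True)
    for period_of, count in rules.values():
        seen = []
        for ts in newest_first:
            period = period_of(ts)
            if period in seen:
                continue  # a newer snapshot already covers this period
            if len(seen) == count:
                break  # enough periods covered for this rule
            seen.append(period)
            keep.add(ts)
    return keep
```

Anything not in the returned set could be pruned. Older snapshots thin out gradually, which keeps storage bounded while still allowing recovery from mistakes noticed late.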

The current system

There are 2 types of backups.

  1. A mirror backup
    1. is a carbon copy clone of a data source stored in a repository.
    2. Very convenient for access because it's an exact replica. To restore from a mirror is as simple as copying it back over.
    3. The mirror can be stored in the same place as the original, making copy times small because data transfer is entirely local. This is of course less redundant.
    4. The mirror can be stored in a remote location. These backups are more reliable but slower to restore.
    5. Usually differential. This means each backup run copies only the differences between the original and the mirror, rather than copying everything every time.
    6. We use rsync for this
  2. An incremental backup
    1. is a type of backup that keeps multiple "snapshots" of data over time.
    2. Not an exact replica. The data is usually stored as many compressed files in a systematic structure, but not one that is intuitive to a user browsing the files.
    3. Can also be either local or remote. Same pros and cons
    4. Incremental. This is a step further than differential. It copies only the differences since the last backup, but also keeps a "snapshot" of the backup before the changes
      1. This is essentially a save-point.
      2. At any time you can go back to any previous snapshot
    5. We use borg backup for this
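
The difference between the two types can be shown with a toy model. This is a sketch only: files are modelled as a dict of path to content, and `mirror_sync` and `SnapshotStore` are hypothetical names. It does not reflect how rsync or Borg actually store or transfer data.

```python
def mirror_sync(source, mirror):
    """Make mirror an exact replica of source, touching only what differs.

    Returns the sorted paths that were copied or deleted.
    """
    changed = [p for p in source if mirror.get(p) != source[p]]
    removed = [p for p in mirror if p not in source]
    for p in changed:
        mirror[p] = source[p]
    for p in removed:
        del mirror[p]
    return sorted(changed + removed)


class SnapshotStore:
    """Incremental store: each snapshot records only changes since the last."""

    def __init__(self):
        self.deltas = []  # one (changed: dict, removed: set) pair per snapshot

    def backup(self, source):
        current = self.restore(len(self.deltas))
        changed = {p: c for p, c in source.items() if current.get(p) != c}
        removed = {p for p in current if p not in source}
        self.deltas.append((changed, removed))

    def restore(self, n):
        """Rebuild the files as of snapshot n (1-based); 0 gives an empty state."""
        state = {}
        for changed, removed in self.deltas[:n]:
            state.update(changed)
            for p in removed:
                state.pop(p, None)
        return state
```

After a file is overwritten or deleted, the mirror only holds the new state, while `restore(n)` on the snapshot store can still return any earlier save-point.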

These backups can be on a basic disk or on a RAID.

We intend to have a RAID running on exodus /mnt/RAID. We have four 4TB disks that will form a RAID 6 array, giving us 8TB of usable storage. This will chiefly be used as mirror storage but will also host the Nextcloud data files on exodus. RAID 6 gives us 2 disks of redundancy; we can lose up to 2 disks without data loss. Additionally, when a disk fails on exodus, an automated email is sent to it@msgeducation.com reporting the degraded array.
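
The 8TB figure follows from RAID 6 reserving the equivalent of two disks for parity. A small worked sketch (the function name `raid6_usable_tb` is made up for illustration):

```python
def raid6_usable_tb(disks, disk_tb):
    """Usable capacity of a RAID 6 array built from equally sized disks.

    RAID 6 stores two parity blocks per stripe, so two disks' worth of
    space goes to redundancy. It needs at least four disks.
    """
    if disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (disks - 2) * disk_tb


print(raid6_usable_tb(4, 4))  # our planned array: four 4TB disks
```

With four disks, half of the raw 16TB goes to redundancy; adding disks later would improve that ratio, since the parity overhead stays at two disks.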

We will also have a 2TB SSD running on a Raspberry Pi called planc. This is a portable backup pot that will connect only through ZeroTier and host only backups, so it can be plugged in anywhere with an internet connection and continue backing places up. This device will host incremental backups only, running Borg Backup from any original location.


Here is where we plan to store data and where it will be backed up:

  1. Server msgcnx
    1. msgcnxFiles
      1. Mirror to exodus /media/RAID/cnx
      2. Mirror to local disk
    2. msgwiki
      1. Mirror to exodus /media/RAID/cnx
      2. Mirror to local disk
    3. Moodle (msgcnx, msghan, msgcgk)
      1. Mirror to exodus /media/RAID/cnx
      2. Mirror to local disk
    4. Wordpress (msgcnx, msghan, msgcgk, gegpak, msgdeh, msgdel)
      1. Mirror to exodus /media/RAID/cnx
      2. Mirror to local disk
    5. PUBLIC www folder
      1. Mirror to exodus /media/RAID/cnx
      2. Mirror to local disk
  2. Server msgvte
    1. msgvtefiles
      1. Mirror to exodus /media/RAID/vte
      2. Mirror to local disk
    2. Wordpress (msgvte)
      1. Mirror to exodus /media/RAID/vte
      2. Mirror to local disk
    3. Moodle (msgvte)
      1. Mirror to exodus /media/RAID/vte
      2. Mirror to local disk
    4. Public_HTML_MSGVTE
      1. Mirror to exodus /media/RAID/vte
      2. Mirror to local disk
  3. Server exodus
    1. /media/RAID
      1. Stores
        1. /nextcloud
          1. All nextcloud instances for all campuses
        2. /vte
          1. Mirror backups of the vte server
        3. /cnx
          1. Mirror backups of the cnx server
        4. /education
          1. MSGEDU stored data
        5. /exodusBackups
          1. Backups of exodus /etc, /var/www, and databases
      2. Incremental to planc
        1. All of these things are backed up to planc incrementally with borg backup once a day
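
Goal 2 (everything in more than one location) can be checked mechanically against a plan like the one above. This is a sketch under assumptions: the `PLAN` dict covers only a few entries from the list, and the "host/item" and "host:path" notations and the function names are invented for the example.

```python
# Item (written as "host/item") -> places its backups go (written as "host:path").
# Only a subset of the plan above, in a notation invented for this sketch.
PLAN = {
    "msgcnx/msgcnxFiles": ["exodus:/media/RAID/cnx", "msgcnx:local disk"],
    "msgcnx/msgwiki": ["exodus:/media/RAID/cnx", "msgcnx:local disk"],
    "msgvte/msgvtefiles": ["exodus:/media/RAID/vte", "msgvte:local disk"],
    "exodus/media/RAID": ["planc:borg"],
}


def locations(item, copies):
    """Distinct hosts holding the item: its original host plus every backup host."""
    hosts = {item.split("/", 1)[0]}
    hosts.update(copy.split(":", 1)[0] for copy in copies)
    return hosts


def single_location_items(plan):
    """Items that exist on only one host, which would violate goal 2."""
    return sorted(item for item, copies in plan.items()
                  if len(locations(item, copies)) < 2)


print(single_location_items(PLAN))  # an empty list means goal 2 holds
```

A mirror to a local disk on the same host does not count as a second location here, which is why each entry also needs a copy on exodus or planc.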

In addition, clear and easy recovery steps need to be written so that Mr. Smith, who has a basic knowledge of Linux but nothing more, can restore a backup from both exodus and planc with only the credentials. See Borg Backup for planc instructions.

Here's how this holds up against our goals:

  1. Everything has a minimum of one backup.
  2. Everything has a minimum of two locations. Everything is on the exodus RAID and somewhere else.
  3. Everything is nearby and easily accessible with the possible exception of nextcloud, which is on exodus and geographically about as far away from the majority of users as possible.
    1. We could put nextcloud on the cnx server but for now it's not worthwhile.
      1. Pros are things are closer and theoretically faster (for people not on a vpn).
      2. Cons are that the CNX server is significantly less reliable due to power outages (these take out the internet, and the UPS is insufficient), and it would need significantly more storage.
  4. Everything user-facing (that is, accessed by non-techies) has an incremental backup to planc. The only things that do not have incremental backups are the Moodle instances, WordPress instances, msgwiki, and public file shares. These could easily be added if we decide to.
  5. Satisfactory
    1. Server msgvte is all set: it has 2 copies of everything locally, plus incremental backups on planc and mirror backups on exodus, in that order of recovery speed.
    2. Server msgcnx is also all set: it stores local backups, and the only critical item is msgcnxFiles, which is also on planc and exodus. Everything else is either being phased out or would be fast to restore even from exodus, and it all has local copies as well.
    3. Server exodus: The most critical server by far. A hard drive failure can be resolved with zero downtime thanks to the RAID, which tolerates up to 2 failed disks. If more than 2 disks fail, data would need to be transferred from planc, which is slow, but such a failure is unlikely.

We should look into having a procedure in place where, if any one server goes offline due to a problem that cannot be resolved (say, terrorists attack and it sacrifices itself in heroic lifesaving action), its functionality can be entirely restored from planc on a different device at a different location. This could also be useful if we find we need to get something online quickly and don't want to transfer data across the world.

Backup Notifications

There should be notifications if backups cannot be made or encounter issues.

Exodus
  • an e-mail is sent if Borg is not able to create a backup
  • to be clarified :)
PlanC
  • a Line notification is sent if the last Borg backup is older than 2 days
  • script is on planc under /usr/local/bin/backup-notify.sh
  • runs from a cron job once a day
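
The age check behind the PlanC notification could look roughly like the following. This is a sketch, not the contents of backup-notify.sh: the function name `backup_is_stale` is hypothetical, and how the timestamp of the last Borg backup is obtained and how the Line message is sent are left out.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=2)  # threshold from the PlanC rule above


def backup_is_stale(last_backup, now=None, max_age=MAX_AGE):
    """True if the last backup is older than max_age.

    Both times are naive datetimes in the same timezone.
    """
    if now is None:
        now = datetime.now()
    return now - last_backup > max_age
```

A daily cron job would call this with the time of the newest archive and send a notification only when it returns True, so a single missed run stays quiet while two missed days raise an alert.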

Old system

The old setup (redone April 2024) had the following repositories of information, backed up to the following places. None of the backups were incremental; all were mirrors synced regularly (most daily, some weekly).

  1. Server msgcnx
    1. msgcnxFiles: shared on the msgcnx LAN and used by local computers
      • Some folders (Teacher) are backed up to next.msgcnx.com on exodus
      • Backed up to /media/cnx/backup_from_cnx/msgcnxFiles
    2. MediaWiki msgwiki
      • Database backed up to /media/cnx/backup_from_cnx/databases on exodus
      • www backed up to /media/cnx/backup_from_cnx/www on exodus
    3. ~/.scripts
      • Backed up to /media/cnx/backup_from_cnx/scripts on exodus
    4. Moodle msgcnx: unused except as reference and source for test conversion
      • Database backed up to /media/cnx/backup_from_cnx/databases on exodus
      • Data directory backed up to /media/cnx/backup_from_cnx/moodledatacnx
    5. Moodle msgcgk
      • Database backed up to /media/cnx/backup_from_cnx/databases on exodus
      • Data directory backed up to /media/cnx/backup_from_cnx/moodledatacgk
    6. Moodle msghan
      • Database backed up to /media/cnx/backup_from_cnx/databases on exodus
      • Data directory backed up to /media/cnx/backup_from_cnx/moodledatahan
    7. Wordpress msgcnx.com
      • Database backed up to /media/cnx/backup_from_cnx/databases on exodus
      • www backed up to /media/cnx/backup_from_cnx/www/html on exodus
    8. Wordpress msghan.com
      • Database backed up to /media/cnx/backup_from_cnx/databases on exodus
      • www backed up to /media/cnx/backup_from_cnx/www/html on exodus
    9. Wordpress msgcgk.com
      • Database backed up to /media/cnx/backup_from_cnx/databases on exodus
      • www backed up to /media/cnx/backup_from_cnx/www/html on exodus
    10. Wordpress gegpak.com
      • Database not backed up
      • www backed up to /media/cnx/backup_from_cnx/www/html on exodus
    11. Wordpress msgdel.com
      1. No known backup
    12. Wordpress msgdeh.com
      1. No known backup
    13. OneDrive: Gradekeeper files from CGK, VTE, HAN and CNX
      • Backed up to /media/cnx/backup_from_cnx/oneDriveBackups
      • No longer necessary as grades are on Populi now
  2. Server msgvte
    1. msgvtefiles: shared on webdav.msgvte.com and used by VTE teachers
      • Backs up to /media/vte/backup_from_vte on exodus
      • Syncs to next.msgvte.com on exodus
      • Backs up to a local daily and weekly disk
    2. Wordpress msgvte.com
      • Database backed up to /media/vte/backup_from_vte/databases on exodus
      • www backed up to /media/vte/backup_from_vte/www on exodus
      • Backs up to a local daily and weekly disk
    3. Moodle msgvte
      • Database backed up to /media/vte/backup_from_vte/databases on exodus
      • Data directory backed up to /media/vte/backup_from_vte/moodledatavte
      • Backs up to a local daily and weekly disk
    4. Public_HTML_MSGVTE
      • Backed up to /media/vte/backup_from_vte on exodus
      • Backs up to a local daily and weekly disk
    5. ~/.scripts
      • Backed up to /media/vte/backup_from_vte/scripts on exodus
      • Backs up to a local daily and weekly disk
  3. Server exodus
    1. /media/education/yearbook
      1. Unknown source. This is the only backup, and it is still being updated from somewhere we have not identified
    2. /media/education/personalBackups
      1. Backups of a personal Demoss computer
    3. /media/nextcloud
      1. Nextcloud data directory of all campuses
      2. Contains msgcgk, msgcnx, msgdeh, msgdel, msgeducation, msghan, msgvte, walt, gegpak
      3. Many campuses do not use this, or even know that it exists
      4. No known complete backups. Portions are stored elsewhere for edge access but not all.

How the old system held up against our goals:

  1. Some things are not redundant (Nextcloud data, some wordpress)
  2. Everything that is backed up is backed up to an external location. Some things still live only on exodus, and a couple of things only on msgcnx.
  3. msgcnxFiles can only be fully and easily accessed on the msgcnx LAN. This is not a big issue, since the most commonly used things are on Nextcloud as well.
  4. The closest thing we have to any incremental backups are the daily and weekly backups on vte. Even these are full clones updated regularly, and not snapshots.
  5. VTE could restore very quickly in case of a hardware fault due to immediately accessible local backups. CNX and EXO would both be in trouble and would require long distance data transfer from each other. In case of a location failure, such as robbery or natural disaster, all servers would require long distance recovery.