Backup scheme
Our backup model is set up to keep copies of everything we use in several places, where they are easily restorable.
Goals
The goals we have in mind for our long-term backup system are as follows.
- Nothing is stored only once. Everything has at least one redundancy.
- Everything is stored in more than one location. In most cases this is exodus and somewhere else.
- During normal (non-emergency) operation, everything is easily accessible.
- Backups should be incremental where possible to protect against error code ID-10T. Keep working snapshots from the last hours, days, weeks, months and years.
- Where possible, restoring from a failure should be fast. Transferring large amounts of data over a network across the world should be a last resort that never needs to happen.
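The snapshot goal above (hours, days, weeks, months and years) can be sketched as a tiered retention rule. This is a minimal illustration only: the function name and the default counts per tier are assumptions, not settings we have chosen, and the logic is a simplified stand-in rather than the exact algorithm of any backup tool.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(timestamps, hourly=24, daily=7, weekly=4, monthly=12, yearly=5):
    """Return the set of snapshot timestamps a tiered retention rule would keep.

    For each tier, the newest snapshot in each of the most recent N distinct
    periods (hour, day, ISO week, month, year) is kept. The default counts
    are placeholders, not agreed values.
    """
    tiers = [
        (hourly, lambda t: (t.year, t.month, t.day, t.hour)),
        (daily, lambda t: (t.year, t.month, t.day)),
        (weekly, lambda t: tuple(t.isocalendar())[:2]),  # (ISO year, ISO week)
        (monthly, lambda t: (t.year, t.month)),
        (yearly, lambda t: (t.year,)),
    ]
    newest_first = sorted(timestamps, reverse=True)
    keep = set()
    for limit, period_of in tiers:
        seen = set()
        for ts in newest_first:
            period = period_of(ts)
            if period in seen:
                continue  # an newer snapshot already represents this period
            if len(seen) >= limit:
                break  # this tier has all the periods it is allowed
            seen.add(period)
            keep.add(ts)
    return keep


# Hypothetical input: one snapshot per hour for three days.
start = datetime(2024, 1, 25)
stamps = [start + timedelta(hours=h) for h in range(72)]
kept = snapshots_to_keep(stamps, hourly=2, daily=2, weekly=0, monthly=0, yearly=0)
```

With those example settings, `kept` holds the two newest hourly snapshots plus the last snapshot of the previous day. Everything else would be eligible for deletion.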
Current system
The current setup (1/25/24) has the following repositories of information, backed up to the following places. None of the backups are incremental; all are mirrors synced regularly (most daily, some weekly).
- Server msgcnx
- msgcnxFiles- shared on the msgcnx LAN and used by local computers
- Some folders (Teacher) are backed up to next.msgcnx.com on exodus
- Backed up to /media/cnx/backup_from_cnx/msgcnxFiles
- MediaWiki msgwiki
- Database backed up to /media/cnx/backup_from_cnx/databases on exodus
- www backed up to /media/cnx/backup_from_cnx/www on exodus
- ~/.scripts
- Backed up to /media/cnx/backup_from_cnx/scripts on exodus
- Moodle msgcnx- unused except as reference and source for test conversion
- Database backed up to /media/cnx/backup_from_cnx/databases on exodus
- Data directory backed up to /media/cnx/backup_from_cnx/moodledatacnx
- Moodle msgcgk
- Database backed up to /media/cnx/backup_from_cnx/databases on exodus
- Data directory backed up to /media/cnx/backup_from_cnx/moodledatacgk
- Moodle msghan
- Database backed up to /media/cnx/backup_from_cnx/databases on exodus
- Data directory backed up to /media/cnx/backup_from_cnx/moodledatahan
- Wordpress msgcnx.com
- Database backed up to /media/cnx/backup_from_cnx/databases on exodus
- www backed up to /media/cnx/backup_from_cnx/www/html on exodus
- Wordpress msghan.com
- Database backed up to /media/cnx/backup_from_cnx/databases on exodus
- www backed up to /media/cnx/backup_from_cnx/www/html on exodus
- Wordpress msgcgk.com
- Database backed up to /media/cnx/backup_from_cnx/databases on exodus
- www backed up to /media/cnx/backup_from_cnx/www/html on exodus
- Wordpress gegpak.com
- Database not backed up
- www backed up to /media/cnx/backup_from_cnx/www/html on exodus
- Wordpress msgdel.com
- No known backup
- Wordpress msgdeh.com
- No known backup
- OneDrive- Gradekeeper files from CGK, VTE, HAN and CNX
- Backed up to /media/cnx/backup_from_cnx/oneDriveBackups
- No longer necessary as grades are on Populi now
- Server msgvte
- msgvtefiles- shared on webdav.msgvte.com and used by VTE teachers
- Backs up to /media/vte/backup_from_vte on exodus
- Syncs to next.msgvte.com on exodus
- Backs up to a local daily and weekly disk
- Wordpress msgvte.com
- Database backed up to /media/vte/backup_from_vte/databases on exodus
- www backed up to /media/vte/backup_from_vte/www on exodus
- Backs up to a local daily and weekly disk
- Moodle msgvte
- Database backed up to /media/vte/backup_from_vte/databases on exodus
- Data directory backed up to /media/vte/backup_from_vte/moodledatavte
- Backs up to a local daily and weekly disk
- Public_HTML_MSGVTE
- Backed up to /media/vte/backup_from_vte on exodus
- Backs up to a local daily and weekly disk
- ~/.scripts
- Backed up to /media/vte/backup_from_vte/scripts on exodus
- Backs up to a local daily and weekly disk
- Server exodus
- /media/education/yearbook
- Unknown source. This is the only backup, and is currently coming from somewhere
- /media/education/personalBackups
- Backups of a personal Demoss computer
- /media/nextcloud
- Nextcloud data directory of all campuses
- Contains msgcgk, msgcnx, msgdeh, msgdel, msgeducation, msghan, msgvte, walt, gegpak
- Many campuses do not use this, or even know that it exists
- No known complete backups. Portions are stored elsewhere for edge access but not all.
How the current system holds up against our goals:
- Some things are not redundant (Nextcloud data, some wordpress)
- Everything that is backed up is backed up to an external location. Some things still live only on exodus, and a couple of things only on msgcnx.
- msgcnxFiles is fully and easily accessible only on the msgcnx LAN. This is not a big issue, since the most commonly used things are on nextcloud as well.
- The closest thing we have to any incremental backups are the daily and weekly backups on vte. Even these are full clones updated regularly, and not snapshots.
- VTE could restore very quickly in case of a hardware fault due to immediately accessible local backups. CNX and EXO would both be in trouble and would require long distance data transfer from each other. In case of a location failure, such as robbery or natural disaster, all servers would require long distance recovery.
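The redundancy gaps noted above could be found mechanically from an inventory. The sketch below is only an illustration: the dictionary layout and the function name are assumptions, and the entries are a small sample copied from the list above rather than the full inventory.

```python
# Sample of the current inventory: where each item lives and where it is
# backed up. An empty list means no known (or no complete) backup.
INVENTORY = {
    "msgcnxFiles": {"home": "msgcnx", "backups": ["exodus"]},
    "msgvtefiles": {"home": "msgvte", "backups": ["exodus", "msgvte local disk"]},
    "gegpak.com database": {"home": "msgcnx", "backups": []},
    "msgdel.com": {"home": "msgcnx", "backups": []},
    "msgdeh.com": {"home": "msgcnx", "backups": []},
    "/media/nextcloud": {"home": "exodus", "backups": []},
}


def unprotected(inventory):
    """Return the names of items that are stored only once, sorted by name."""
    return sorted(name for name, item in inventory.items() if not item["backups"])


def single_site(inventory):
    """Return items whose backups all sit on the same server as the original."""
    result = []
    for name, item in inventory.items():
        backups = item["backups"]
        if backups and all(b.startswith(item["home"]) for b in backups):
            result.append(name)
    return sorted(result)


missing = unprotected(INVENTORY)
```

Running the same check against the planned layout later in this page would be a quick way to confirm that every item ends up with at least one backup.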
The new system
We are currently planning to set up the following system instead.
There are 2 types of backups.
- A mirror backup
- is a carbon copy clone of a data source stored in a repository.
- Very convenient for access because it's an exact replica. To restore from a mirror is as simple as copying it back over.
- The mirror can be stored in the same place as the original, making copy times small because data transfer is entirely local. This is of course less redundant.
- The mirror can be stored in a remote location. These backups are more reliable but slower to restore.
- Usually differential. This means each backup copies only the differences between the original and the mirror, rather than copying everything every time.
- An incremental backup
- is a type of backup that keeps multiple "snapshots" of data over time.
- Not an exact replica. Usually data is stored as many zip files in a systematic structure, but not an intuitive one to a user looking at the files
- Can also be either local or remote. Same pros and cons
- Incremental. This is a step further than differential. It copies only the differences since the last backup, but also keeps a "snapshot" of the backup before the changes
- This is essentially a save-point.
- At any time you can go back to any previous snapshot
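The difference between the two backup types can be shown with a toy model in which a "disk" is just a dictionary of file names and contents. All names here are made up for illustration, and the snapshot store keeps full copies for brevity where a real tool would store only the changes.

```python
def mirror_sync(source, mirror):
    """Make `mirror` an exact replica of `source`, touching only what differs.

    Returns the sorted list of paths that were copied or deleted.
    """
    changed = [p for p, data in source.items() if mirror.get(p) != data]
    removed = [p for p in mirror if p not in source]
    for p in changed:
        mirror[p] = source[p]
    for p in removed:
        del mirror[p]
    return sorted(changed + removed)


class SnapshotStore:
    """Keeps every backup run as a separate, restorable save-point."""

    def __init__(self):
        self._snapshots = []  # one {path: content} dict per backup run

    def backup(self, source):
        # Simplification: a full copy is stored. What matters for this
        # illustration is that older states stay reachable.
        self._snapshots.append(dict(source))
        return len(self._snapshots) - 1  # snapshot id

    def restore(self, snapshot_id):
        return dict(self._snapshots[snapshot_id])


files = {"lesson.docx": "good version"}
mirror, store = {}, SnapshotStore()
mirror_sync(files, mirror)
first = store.backup(files)

files["lesson.docx"] = "accidentally overwritten"
mirror_sync(files, mirror)  # the mirror now holds the mistake too
store.backup(files)

recovered = store.restore(first)["lesson.docx"]  # still "good version"
```

After the accidental overwrite, the mirror faithfully replicates the damage, while the first snapshot still returns the good version. That is the case the incremental backups are meant to cover.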
These backups can be on a basic disk or on a RAID.
We intend to have a RAID running on exodus /mnt/RAID. We have three 4TB disks that will form the RAID 5, giving us 8TB of usable storage (one disk's worth of space goes to parity). This will chiefly be used as mirror storage but will also host nextcloud data files on exodus.
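The 8TB figure follows from how RAID 5 works: one disk's worth of space is used for parity, so usable capacity is the disk count minus one, times the disk size. A small helper (the function name is just for illustration) makes the arithmetic explicit, assuming all disks are the same size.

```python
def raid5_usable_tb(disk_count, disk_size_tb):
    """Usable capacity of a RAID 5 array built from equally sized disks."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    # One disk's worth of capacity is consumed by parity.
    return (disk_count - 1) * disk_size_tb


planned_tb = raid5_usable_tb(3, 4)  # three 4TB disks
```

Adding a fourth 4TB disk later would raise usable space to 12TB. The array would still tolerate only one disk failure at a time.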
We will also have a 2TB SSD running on a Raspberry Pi called planc. This is a portable backup pot that will connect only through ZeroTier and host only backups, so it can be plugged in anywhere with an internet connection and continue backing places up. This device will host incremental backups only, running Borg Backup from any original location.
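As a rough sketch of what a backup run to planc might look like, the helpers below build Borg command lines (Borg 1.x syntax) without executing them. The repository URL, user name, source path and retention counts are all placeholders invented for this example. None of them are decided yet.

```python
# Hypothetical values: planc's address on ZeroTier, the login user and the
# repository path are not defined anywhere in this page.
PLANC_REPO = "ssh://backup@planc.example/mnt/ssd/borg/msgcnxFiles"


def borg_create_cmd(repo, source_dir):
    """Build a `borg create` command that adds one new snapshot (archive)."""
    # {hostname} and {now} are placeholders that Borg itself expands.
    archive = f"{repo}::{{hostname}}-{{now}}"
    return ["borg", "create", "--stats", "--compression", "lz4", archive, source_dir]


def borg_prune_cmd(repo, hourly=24, daily=7, weekly=4, monthly=12, yearly=5):
    """Build a `borg prune` command that thins out old snapshots.

    The counts are assumptions standing in for a retention policy we have
    not yet agreed on.
    """
    return [
        "borg", "prune",
        f"--keep-hourly={hourly}",
        f"--keep-daily={daily}",
        f"--keep-weekly={weekly}",
        f"--keep-monthly={monthly}",
        f"--keep-yearly={yearly}",
        repo,
    ]


create_cmd = borg_create_cmd(PLANC_REPO, "/srv/msgcnxFiles")  # path is hypothetical
prune_cmd = borg_prune_cmd(PLANC_REPO)
```

Each list could be passed to `subprocess.run(cmd, check=True)` from a scheduled script on the originating server. Because Borg deduplicates, repeated runs send only new or changed data over the ZeroTier link.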
Here is where we plan to store data and where it will be backed up:
- Server msgcnx
- msgcnxFiles
- Mirror to exodus /media/RAID/cnx
- Incremental to planc
- msgwiki
- Mirror to exodus /media/RAID/cnx
- Incremental not necessary?
- Moodle (msgcnx, msghan, msgcgk)
- Mirror to exodus /media/cnx
- Incremental not necessary?
- Wordpress (msgcnx, msghan, msgcgk, gegpak, msgdeh, msgdel)
- Mirror to exodus /media/cnx
- Incremental not necessary?
- Is there a PUBLIC www folder?
- Server msgvte
- msgvtefiles
- Mirror to exodus /media/vte
- Mirror to local disk
- Incremental to planc
- Wordpress (msgvte)
- Mirror to exodus /media/vte
- Mirror to local disk
- Incremental not necessary?
- Moodle (msgvte)
- Mirror to exodus /media/vte
- Mirror to local disk
- Incremental not necessary?
- Public_HTML_MSGVTE
- Mirror to exodus /media/vte
- Mirror to local disk
- Incremental not necessary?
- Server exodus
- /media/nextcloud
- Is on a RAID for single redundancy
- Incremental to planc
Here's how this holds up against our goals:
- Nothing is stored only once. Everything has at least one redundancy.
- Everything is stored in more than one location. In most cases this is exodus and somewhere else.
- During normal (non-emergency) operation, everything is easily accessible.
- Backups should be incremental where possible to protect against error code ID-10T. Keep working snapshots from the last hours, days, weeks, months and years.
- Where possible, restoring from a failure should be fast. Transferring large amounts of data over a network across the world should be a last resort that never needs to happen.
- Everything has a minimum of one backup.
- Everything has a minimum of two locations. Everything is on the exodus RAID and somewhere else.
- Everything is nearby and easily accessible with the possible exception of nextcloud, which is on exodus and geographically about as far away from the majority of users as possible.
- We could put nextcloud on the cnx server but for now it's not worthwhile.
- Pros are things are closer and theoretically faster (for people not on a vpn).
- Cons are the CNX server is significantly less reliable due to power outages (which take out the internet; the UPS is insufficient), and it would need significantly more storage.
- Everything user-facing (that is, accessed by non-techies) has an incremental backup to planc. The only things that do not have incremental backups are Moodle instances, wordpress instances, msgwiki, and public file shares. These things could easily be added if we decide to.
- Satisfactory
- Server msgvte is all set: it has 2 copies of everything locally, plus incremental backups on planc and mirror backups on exodus, in that order of recovery speed.
- Server msgcnx: The only critical thing is msgcnxFiles which is on planc and also exodus. Everything else is being phased out anyway, or would be fast to restore even from exodus. Should consider local backups.
- Server exodus: The most critical server by far. A hard drive failure can be resolved with zero downtime due to the RAID. In the event of 2 hard drive failures, data would need to be transferred from planc, which is slow, but that scenario is unlikely.
We should look into having a procedure in place so that if any one server goes offline due to a problem that cannot be resolved (say, terrorists attack and it sacrifices itself in heroic lifesaving action), its functionality can be entirely restored from planc on a different device at a different location. This could also be useful if we find we need to get something online quickly and don't want to transfer data across the world.
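One small building block for such a procedure is deciding which copy to restore from. The sketch below encodes the recovery order described for each server above, fastest first. The table layout and function name are illustrative, and the planc-before-exodus order for msgcnx is an assumption carried over from the msgvte case.

```python
# Restore sources per server, fastest first.
RESTORE_ORDER = {
    "msgvte": ["local disk", "planc", "exodus"],
    "msgcnx": ["planc", "exodus"],  # order assumed, see note above
    "exodus": ["RAID rebuild", "planc"],
}


def pick_restore_source(server, unavailable=()):
    """Return the fastest restore source that is still reachable, or None."""
    for source in RESTORE_ORDER[server]:
        if source not in unavailable:
            return source
    return None


# Example: the msgvte site is lost entirely, so its local disks are gone too.
fallback = pick_restore_source("msgvte", unavailable={"local disk"})
```

In that example the function falls back to planc, which matches the idea of rebuilding a lost server from planc on a different device at a different location.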