1.1. At least 2 SysAdmins with access to each server | |||
There should be *at least* two sysadmins with access to every server, in order to improve our currently poor bus factor and increase our resilience.
|
1.2. Use standard procedures where possible, unless there is a good reason to proceed differently | |
Some day another admin may have to step into your setup to help, and the more standard the setup, the easier it will be for them to get comfortable patching the system. "Standard procedures" is more a concept than a documented reality, so let's use a rule of thumb: standard is what the distro or its packaged tools provide, and non-standard is anything overly clever. Keep It Simple and Silly, because the next person who needs to touch it might do so during a crisis in the middle of the night, not fully awake yet. |
1.3. Document your setup, especially the non-standard procedures | |
Every sysadmin has their own recipes for improving this or that part of a server, which deviate from the more standard procedures documented elsewhere. Document your changes so that this important knowledge is not lost (in the unfortunate event that a bus "visits" you, or something else happens), and so that, if needed, another person can easily find out what you changed and how to proceed from there. |
1.3.1. Use pages in the Teams structure | |
To avoid scattering information across wiki pages, use the Teams wiki structure, which has child pages for all teams on tiki.org. For instance, SysAdmin-related pages should be child pages of the Tiki Admin Group substructure. |
1.3.2. Please (b)log your changes | |
Sysadmins also have the Community Infrastructure Blog, where Community Infrastructure sysadmins log their activities so that all other sysadmins know what has been changed, configured, or otherwise done. |
1.4. Respect the environment set up by another admin | |
This rule matters because more than one admin will be able to log in to each server, which improves our bus factor and the resilience of our Community. |
1.4.1. Don't mess with anything | |
There should be backups (which should also be documented, as the previous rules suggest), but it is much better not to have to retrieve and restore them at all, since restoring usually brings unexpected surprises or loses some content due to factors outside our control. |
1.4.2. Fetch data only | |
The safest way to work with a server set up by another admin is to only fetch data from it, place that data somewhere else, and experiment or fine-tune there, whether to split services, clone servers, or similar. |
1.4.3. Discuss changes in advance where possible, and report back when urgent changes were needed | |
If there is a very good reason to change something (e.g., patching a system after an important vulnerability has been reported while the main sysadmin is unavailable for too long), discuss it with the other admins before making the change whenever possible. If you had to apply a change urgently for some important reason, report it back to that admin (or the other admins) as soon as possible. |
Rationale | |
This way the community can stay responsive: if anything bad happens, we can fetch sites from the server and set them up on a new one managed by someone else. These concepts are already being used successfully by several admins on some servers (amette/Nelson, Xavi/Ferran/Alex, Xavi/Alex, Xavi/Carlos, ...). They have not been extensively tested (amette hasn't yet found a bus he liked enough to jump in front of), but when Xavi had a road accident that took him out for many months, the servers set up this way evolved organically where needed and the users of those services didn't notice.
These rules were initially drafted during TikiFest2014-Toronto-Tiki13Alpha and improved after TikiFestSysAdmin-2014.
|