alias "operations fail" alias "sysadmins in endless loops" alias "sysadmins keeping themselves busy"
Inspired by all the input I got at PuppetConf, I decided to list all the interesting points.
testing modules (have you ever done a rollout with 100% working software?), read the release notes!
release management (plan your releases - easy step forward and easy step back, have different people handle the test and production environments to test your documentation!)
devs and ops (sometimes it's better to have a developer in the ops team - script on top of a script on top of a script on top of a script => developers see "all scripts" and can simplify the program)
deal with errors: (it's okay to have network problems and server failures - how is your setup structured? simple point of stupidity) (there is more between OKAY and ERROR) (QUEUE!)
mysql is for "web scale": focus on the strengths each product was made for (compare all the real features!)
alerting => (how do you deal with a service error: notify team nagios or team pager? - sms, email, jabber?) - notify the people who can fix the problem!
updates (every update can cause problems, but the more frequently you update or change things, the smaller the long-term risk)
something new? (be an inventor? NO - use something that already exists!)
knowledge silos (responsibility for services - nobody sees the whole structure! - silos build walls!)
netboot (do not netboot all things - it doesn't scale!)
logging (analyze your logs (logstash) or send them to >/dev/null, disable debug logging in production!)
network - free for all (split it, you don't want broadcasts from printers and clients everywhere)
multi-site (if you have it, use it - decentralized backups)
test your backups (I don't mean by deleting production data!)
disaster recovery + test plans
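
To make the "deal with errors" and "alerting" points a bit more concrete, here is a minimal Python sketch. It is not something shown at the conference, just one way to read those notes: a third state between OKAY and ERROR, a queue that buffers work while a service is down, and alerts routed to whoever can fix the problem. All names in it (`Status`, `BufferedSender`, `OWNERS`, `route_alert`, the team names) are made up for illustration.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    OKAY = "okay"
    DEGRADED = "degraded"  # service is down, but work is queued and nothing is lost
    ERROR = "error"        # queue is full, new work has to be rejected


class BufferedSender:
    """Queue items while the downstream service is unreachable."""

    def __init__(self, send, max_queue=1000):
        self.send = send            # callable; assumed to raise ConnectionError on failure
        self.max_queue = max_queue
        self.queue = deque()

    def flush(self):
        """Deliver queued items in order and stop at the first failure."""
        while self.queue:
            try:
                self.send(self.queue[0])
            except ConnectionError:
                return Status.DEGRADED
            self.queue.popleft()
        return Status.OKAY

    def submit(self, item):
        """Queue the item, try to deliver the backlog, report the resulting state."""
        if len(self.queue) >= self.max_queue:
            self.flush()            # maybe the service is back by now
        if len(self.queue) >= self.max_queue:
            return Status.ERROR     # still full: the item is rejected
        self.queue.append(item)
        return self.flush()


# Hypothetical mapping from a service to the team that can actually fix it.
OWNERS = {"mysql": "dba-team", "nginx": "web-team"}


def route_alert(service, status):
    """Return (team, channel) for an alert, or None if nothing needs attention."""
    if status is Status.OKAY:
        return None
    channel = "sms" if status is Status.ERROR else "email"
    return (OWNERS.get(service, "ops-team"), channel)
```

With this, a short network outage only moves the sender to DEGRADED and produces an email to the owning team; nobody gets an sms until the queue is actually full. The thresholds and channels are the part you would tune for your own setup.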