Expectations

  • Software engineers are responsible for:
    • Design and performance of queries and tables
    • Basic DB topology design, including the placement of DB servers
    • Monitoring and alerting on application-level metrics (but not DB-level metrics)
  • DBAs are responsible for:
    • Monitoring and alerting on DB-layer metrics
    • Backing up and restoring DBs from/to different servers
    • Failover/failback drills
  • The DBA candidate should be able to answer at least 80% of the questions below
  • Communication-wise, check whether they know and follow the STAR method (Situation, Task, Action, Result)
  • Culture-wise, check whether they demonstrate:
    • Ownership
    • Disagree and commit
    • Dive deep

Questions

  • Give two cases of deadlocks you have seen on the DB, and explain how you resolved them
    • Be able to explain isolation levels
    • Be able to explain the differences between lock types (e.g., shared vs. exclusive, row-level vs. table-level)
    • Be able to explain how to detect and fix deadlocks
  • Give two cases of slow queries you have seen on the DB, and explain how you resolved them
    • Show use of the slow query log, query plans (e.g., EXPLAIN output), and Unix performance debugging commands (e.g., top, vmstat, iostat)
    • Monitoring and alerting should be in place to catch a slow query within 5 minutes for OLTP workloads, or high resource usage for OLAP workloads
    • Be able to explain common indexing patterns and identify when a table has too many or too few indices
  • Give two cases of replication lag you have seen on the DB, and explain how you resolved them
    • Be able to explain the replication lag alert threshold they set, which should be no more than 10 minutes
    • Be able to explain the HA and DR setup of the replication tool, and the operational practices that keep it highly available
    • Be able to describe the verification procedure after replication is fixed
  • What backup/restore speed can I expect for a DB?
    • Should gather more context (e.g., data size, hardware, network, tooling) without being prompted. Red flag if they jump into an estimate immediately
    • Be able to describe the verification procedure after a restore is done
    • Be able to explain common parameters of the dump/restore tools they use
  • Design or explain DR drills for the DBs you are responsible for
    • Knows the meaning of RTO (recovery time objective), RPO (recovery point objective), and MTTR (mean time to recovery). Red flag if they cannot specify these targets before jumping into drill details
    • The DR drills should cover multiple scenarios, from the simplest failure to a complete outage. At least 3 scenarios should be covered
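To make the RTO/RPO/MTTR criteria concrete, a drill's results can be scored against its targets. The following is a minimal Python sketch, assuming each drill scenario records three timestamps (outage start, service restored, newest write present after recovery). The DrillResult fields, the function names, the three scenarios, and the example targets (30-minute RTO, 5-minute RPO) are all hypothetical illustrations, not values from this guide; MTTR here is simply the mean observed downtime across drills.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class DrillResult:
    """Timestamps recorded during one DR drill (hypothetical field names)."""
    scenario: str
    outage_start: datetime          # when the simulated failure began
    service_restored: datetime      # when the DB accepted traffic again
    last_recovered_write: datetime  # newest committed write present after recovery


def achieved_rto(r: DrillResult) -> timedelta:
    # Downtime actually observed in this drill.
    return r.service_restored - r.outage_start


def achieved_rpo(r: DrillResult) -> timedelta:
    # Data-loss window: writes committed between the newest recovered write
    # and the failure are lost. Clamp at zero if nothing was lost.
    return max(r.outage_start - r.last_recovered_write, timedelta(0))


def mttr(results: list[DrillResult]) -> timedelta:
    # Mean time to recovery: average observed downtime across drills.
    return timedelta(seconds=mean(achieved_rto(r).total_seconds() for r in results))


def meets_targets(r: DrillResult, rto_target: timedelta, rpo_target: timedelta) -> bool:
    return achieved_rto(r) <= rto_target and achieved_rpo(r) <= rpo_target


# Hypothetical targets and three drill scenarios, from simplest to complete outage.
RTO_TARGET = timedelta(minutes=30)
RPO_TARGET = timedelta(minutes=5)
t0 = datetime(2024, 1, 1, 2, 0)
results = [
    DrillResult("replica loss", t0, t0 + timedelta(minutes=2), t0),
    DrillResult("primary failover", t0, t0 + timedelta(minutes=10), t0 - timedelta(seconds=30)),
    DrillResult("full outage, restore from backup", t0, t0 + timedelta(minutes=90),
                t0 - timedelta(minutes=15)),
]

for r in results:
    ok = meets_targets(r, RTO_TARGET, RPO_TARGET)
    print(f"{r.scenario}: RTO={achieved_rto(r)}, RPO={achieved_rpo(r)}, meets targets={ok}")
print(f"MTTR across drills: {mttr(results)}")
```

In this made-up data the full-outage scenario misses both targets, which is exactly the kind of gap a candidate should be able to discuss once the targets have been stated up front.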