Troubleshooting is one of the most important skills for the operations professional.
Troubleshooting is the process of identifying an issue, researching it, and ultimately fixing it.
Research is a foundational aspect of troubleshooting. After an issue has been discovered, it is usually up to the operations team to carry out each of these steps.
Troubleshooting skills improve with experience. At the start of your career, you won’t have much knowledge about what can go wrong or how to fix a broken deployment. However, as you continue to research and fix issues, your troubleshooting skills will grow.
Note
This chapter is in no way exhaustive. As you continue throughout your career, you will learn about new techniques, tools, and solutions to issues.
We will provide some opinionated guidelines to give you a foundation, but eventually you will home in on the most efficient approach for you and your team.
The following sections present multiple case studies that illustrate the eight troubleshooting steps. The objective of these sections is to help you gain a strong understanding of the troubleshooting process.
The final “How to Troubleshoot” section will introduce tools, methodologies, tips, and tricks to help you understand how to perform the troubleshooting process.
Although development troubleshooting will be mentioned in this chapter, we will primarily focus on operations troubleshooting since that has been the focus of this Azure course.
Note
The issues throughout this chapter are framed as problems in the Coding Events API. Many of them may be similar, or identical, to issues you encounter in other deployments.
To start, let’s build some troubleshooting knowledge by examining some case studies on issues that could arise in a Coding Events API deployment.