Linux Kernel 6.13 to Support Display of Stuck Task Counts, Aiding Administrators in Fault Diagnosis
On Windows, Task Manager makes it easy to see which processes are running and which have become unresponsive or frozen. The Linux kernel is now set to offer something loosely comparable.
The kernel itself does not provide a graphical interface, of course. Instead, it will expose the number of tasks that have been detected as hung, giving system administrators a figure they can use to judge the state of the system.
The patch adding this feature has been merged for Linux kernel 6.13. Once that version is officially released and systems are updated to it, the hung task count will be available.
The new entry, /proc/sys/kernel/hung_task_detect_count, reports the total number of tasks that have been detected as hung since the system was booted.
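As a rough illustration of how the counter could be read from a script, here is a minimal Python sketch. The function name read_hung_task_count is hypothetical, and the sketch assumes the file contains a single integer; it also assumes the file may be absent, as it would be on kernels older than 6.13 or built without hung task detection (CONFIG_DETECT_HUNG_TASK).

```python
from pathlib import Path
from typing import Optional

# Path named in the article. Assumed to hold a single integer.
HUNG_TASK_COUNT_PATH = Path("/proc/sys/kernel/hung_task_detect_count")


def read_hung_task_count(path: Path = HUNG_TASK_COUNT_PATH) -> Optional[int]:
    """Return the counter value, or None if the file is missing or unreadable.

    The function name is hypothetical; it is not part of any kernel tooling.
    """
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File absent (older kernel, feature not built in) or unexpected content.
        return None


if __name__ == "__main__":
    count = read_hung_task_count()
    if count is None:
        print("hung_task_detect_count is not available on this kernel")
    else:
        print(f"hung tasks detected since boot: {count}")
```

Returning None instead of raising keeps the script usable on machines that have not yet been updated to a kernel with this entry.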
For example, a Linux server that has been running continuously for more than 200 days will almost inevitably have run into some process problems along the way, and these are reflected in the count. If the number of hung tasks suddenly jumps at some point, that is a strong sign that something is wrong with the server.
In such cases, the count gives system administrators a quick first indication that a software or hardware fault may be present. The count alone is not enough for a diagnosis, but it serves as a warning that prompts administrators to investigate promptly and track down the specific cause.
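Because the counter is cumulative since boot, spotting a sudden increase means comparing successive readings rather than looking at a single value. The sketch below shows one way this might be done on a series of samples that have already been collected (for example by a periodic job). The function name find_jumps and the threshold parameter are hypothetical, and the sample values are made up purely for illustration.

```python
from typing import Iterable, Iterator, Tuple


def find_jumps(readings: Iterable[int], threshold: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (index, increase) for every reading that rose by at least
    `threshold` compared with the reading before it.

    Hypothetical helper: `readings` are successive samples of the
    cumulative hung task count, oldest first.
    """
    previous = None
    for index, current in enumerate(readings):
        if previous is not None and current - previous >= threshold:
            yield index, current - previous
        # A lower value than before (e.g. after a reboot) simply becomes
        # the new baseline and is not reported.
        previous = current


# Illustrative samples only, not real measurements.
samples = [0, 0, 1, 1, 9]
for index, increase in find_jumps(samples, threshold=5):
    print(f"sample {index}: count rose by {increase}")
```

A suitable threshold and sampling interval depend on the workload; a jump flagged this way is only a prompt to look at the kernel log for the corresponding hung task warnings.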
For now, however, there is no convenient tool that reports on the hung task count, and it may take more time before the feature develops into something that lets administrators assess the situation more intuitively.