QUALITY IMPROVEMENT

Evaluation of Deep Learning Models for Identifying Surgical Actions and Measuring Performance

Published on

March 2, 2020

JAMA Network Open

Shuja Khalid, Mitchell Goldenberg, Teodor Grantcharov, Babak Taati, Frank Rudzicz

Overview

This quality improvement study evaluated a framework for assessing surgical video clips using deep learning models. The objective was to categorize clips based on the surgical step being performed and the surgeon's competence level. The study analyzed 103 video clips of 8 surgeons performing knot tying, suturing, and needle passing. The deep learning models were trained to estimate categorical outputs such as performance level (novice, intermediate, expert) and surgical actions.

Using only video input, the models identified surgical actions more reliably than performance levels: mean precision was 0.97 for surgical actions and 0.77 for performance levels, with corresponding mean recalls of 0.98 and 0.78. These findings suggest that deep learning can identify associations in surgical video clips, a step toward an automated feedback mechanism that would let surgeons learn from their experiences and refine their skills. Such a mechanism could address two limitations of current practice: evaluation is subjective, and manually reviewing every surgical procedure is infeasible.

Results

Using only video input, the architectures classified both surgical actions and performance levels.

For surgical actions, the embedding representation had a mean (root mean square error [RMSE]) precision of 1.00 (0) for suturing, 0.99 (0.01) for knot tying, and 0.91 (0.11) for needle passing, resulting in a mean (RMSE) precision of 0.97 (0.01). Its mean (RMSE) recall was 0.94 (0.08) for suturing, 1.00 (0) for knot tying, and 0.99 (0.01) for needle passing, resulting in a mean (RMSE) recall of 0.98 (0.01).

For performance levels, it estimated scores on the Objective Structured Assessment of Technical Skill Global Rating Scale categories, with a mean (RMSE) precision of 0.85 (0.09) for novice level, 0.67 (0.07) for intermediate level, and 0.79 (0.12) for expert level, resulting in a mean (RMSE) precision of 0.77 (0.04). Its mean (RMSE) recall was 0.85 (0.05) for novice level, 0.69 (0.14) for intermediate level, and 0.80 (0.13) for expert level, resulting in a mean (RMSE) recall of 0.78 (0.03).
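
To make the reported metrics concrete, the following is a minimal Python sketch of how per-class precision and recall can be computed from a confusion matrix, and how the overall means in the Results follow from the per-class values. It is not the authors' code. The function names and the example confusion matrix are hypothetical, and the sketch assumes the overall figures are unweighted means across the three classes, which matches the reported numbers after rounding.

```python
def per_class_precision_recall(confusion):
    """Return (precision, recall) lists, one entry per class.

    confusion[i][j] is the number of clips whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    precision, recall = [], []
    for k in range(n):
        true_positives = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision.append(true_positives / predicted_k if predicted_k else 0.0)
        recall.append(true_positives / actual_k if actual_k else 0.0)
    return precision, recall


def macro_mean(values):
    """Unweighted mean across classes (assumed averaging scheme)."""
    return sum(values) / len(values)


# Hypothetical confusion matrix for three classes; NOT data from the study.
example_confusion = [
    [10, 0, 0],
    [1, 9, 0],
    [0, 1, 9],
]
example_precision, example_recall = per_class_precision_recall(example_confusion)

# Per-class values as reported in the Results section.
reported = {
    "action precision": [1.00, 0.99, 0.91],       # suturing, knot tying, needle passing
    "action recall": [0.94, 1.00, 0.99],
    "performance precision": [0.85, 0.67, 0.79],  # novice, intermediate, expert
    "performance recall": [0.85, 0.69, 0.80],
}
macro = {name: round(macro_mean(vals), 2) for name, vals in reported.items()}

for name, value in macro.items():
    print(f"{name}: {value:.2f}")
```

Under this assumption, the computed means reproduce the reported overall values of 0.97 and 0.98 for surgical actions and 0.77 and 0.78 for performance levels.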