With 19 million cataract surgeries performed annually, cataract surgery is the most common surgical procedure in the world . A cataract is a clouding of the lens, inside the eye, which leads to a decrease in vision. The purpose of the surgery is to remove this lens and replace it with an artificial lens. The entire procedure can be performed with small incisions only. The natural lens is indeed broken into small pieces with the help of a high-frequency ultrasound device before leaving the eye. As for the artificial lens, it is rolled up inside an injector before entering the eye. The surgery typically lasts 15 minutes. During the surgery, the patient’s eye is under a microscope and the output of the microscope is video-recorded.
 Trikha S, Turnbull AM, Morris RJ, Anderson DF, Hossain P. The journey to femtosecond laser-assisted cataract surgery: new beginnings or false dawn? Eye. 2013; 27:461–473.
A dataset of 100 videos from 50 cataract surgeries has been collected. Surgeries were performed in Brest University Hospital between January 22, 2015 and September 10, 2015. Reasons for surgery included age-related cataract, traumatic cataract and refractive errors. Patients were 61 years old on average (minimum: 23, maximum: 83, standard deviation: 10). There were 38 females and 12 males. Informed consent was obtained from all patients. Surgeries were performed by three surgeons: a renowned expert (48 surgeries), a one-year experienced surgeon (1 surgery) and an intern (1 surgery). Surgeries were performed under an OPMI Lumera T microscope (Carl Zeiss Meditec, Jena, Germany). Two videos were recorded for each surgery: one recording tool-tissue interactions through the surgical microscope, the other one recording the surgical tray (see Fig. 1). In total, more than nine hours of surgery (i.e. 18 hours of videos) have been recorded.
|1. surgical microscope (© copyright Zeiss)||2. surgical tray with the articulated arm|
Surgical Microscope Videos
The first set of videos was recorded with a 180I camera (Toshiba, Tokyo, Japan) and a MediCap USB200 recorder (MediCapture, Plymouth Meeting, USA). The frame definition was 1920x1080 pixels and the frame rate was approximately 30 frames per second.
Surgical Tray Videos
The second set of videos was recorded with an HDR-PJ5301 video camera (Sony, Tokyo, Japan) attached to the surgical tray by an articulated arm. The frame definition was 1920x1080 pixels and the frame rate was 50 frames per second.
The two video recordings were started manually, by pressing two different buttons. Thereforefore, pairs of videos from the same surgery had to be synchronized retrospectively (see Fig. 2). Synchronization was based on tool usage analysis: videos were synchronized in such a way that no tool appears both on the tray and under the microscope (see Fig. 3). Tray videos were resampled to match the frame rate of the microscope videos. When necessary, empty frames (black frames) were added in microscope or tray videos to ensure that each frame in the microscope video is associated with exactly one frame in the tray video.
|1. surgical microscope video||2. surgical tray video|
Because of inconsistencies among video decoders, we decided to convert each video into one directory containing one JPEG image per frame.
All surgical tools visible in microscope videos were first listed and labeled by the surgeons: a list of 21 surgical tools was obtained (see Fig 4).
|1. biomarker||2. Charleux cannula||3. hydrodissection cannula||4. Rycroft cannula||5. viscoelastic cannula||6. cotton||7. capsulorhexis cystotome|
|8. Bonn forceps||9. capsulorhexis forceps||10. Troutman forceps||11. needle holder||12. irrigation / aspiration handpiece||13. phacoemulsifier handpiece||14. vitrectomy handpiece|
|15. implant injector||16. primary incision knife||17. secondary incision knife||18. micromanipulator||19. suture needle||20. Mendez ring||21. Vannas scissors|
Annotations were then collected for these 21 tools in each video. Annotators were two non-M.D. cataract surgery experts trained by an experienced ophthalmologist to recognize each surgical tool. They were instructed to indicate (through a web interface) when:
- a surgical tool starts or stops touching the eyeball in the microscope video,
- a surgical tool enters or exists the field of view in the tray video.
Frame-level labels were then determined automatically for each annotator. These labels are binary for microscope videos: 1 means that the tool is in contact with the eyeball, 0 means that it is not. They can be any non-negative integer in tray videos: they indicate the number of visible instances of this tool on the tray.
Next, annotations from both experts were adjudicated. Disagreements on the list of tools in contact with the eyeball, or on the list of tools visible on the surgical tray, were automatically detected. In case of disagreement, experts watched the video together and jointly determine the actual contact or visibility status. However, the precise contact or visibility timing was not adjudicated. Therefore, a real-valued reference standard was obtained. For microscope videos:
- 0 means that both experts agree that the tool is not in contact,
- 1 means that both experts agree that the tool is in contact,
- 0.5 means that experts disagree.
For tray videos, the average tool instance count is provided.
Some statistics about the annotated videos are included below. In the microscope videos, inter-rater agreement about tool contact is measured using Cohen’s Kappa. In the tray videos, inter-rater agreement about the number of visible instances of each tool is measured using linear-weighted Cohen’s Kappa.
Kappa in Microscope Videos
- Before adjudication: k = 0.894 +/- 0.096
- After adjudication: k = 0.909 +/- 0.098
Linear-weighted Kappa in Tray videos
- Before adjudication: k = 0.752 +/- 0.257
- After adjudication: k = 0.988 +/- 0.039
We can see that the adjudication task was more complex in the tray videos: due to the large number of tools, the annotators often lost track of a few of them.
Each video is associated with one CSV file containing one row per frame in addition to a header. Because videos were converted into directories of JPEG images, each row is in fact associated with one JPEG image. The first column contains the frame index (i.e. the name of the JPEG image without the extension). The second column contains a binary value indicating whether or not this frame is an empty frame added for synchronization purposes. The remaining 21 columns contain the real-valued reference standard for each surgical tool.
Training/Test Set Separation
The dataset was divided into a training set (25 pairs of synchronized videos) and a test set (25 pairs of synchronized videos). Division was made in such a way that 1) each tool appears in the same number of videos from both subsets (plus or minus one) and 2) the test set only contains videos from surgeries performed by the renowned expert. Apart from that, division was made at random.