Are Female Radiologists More Prone to Speech Recognition Errors?
Speech recognition software makes significantly more errors when used by female radiologists than when used by their male counterparts.
That’s the surprise headline that swept the medical world not long ago, when researchers at the University of Maryland School of Medicine raised new questions about the accuracy of female physicians who use speech recognition (SR) applications. This in a specialty long known for leveraging technology to reduce clinical error, not increase it.
Commenting in the June 4 issue of Diagnostic Imaging, lead author and University of Maryland radiology resident Syed Ali, M.D., concluded, "The discrepancy may have a significant negative impact on reporting accuracy and productivity for female radiologists."
The announced findings caused quite a stir, but study designer Khan M. Siddiqui, M.D., tells ACR News Scan that the results were considerably more mixed than the lopsided, male-over-female story that got so much ink in the media. Here’s a closer look.
Beyond “He Said, She Said”
Conducted at the Baltimore Veterans Affairs (VA) Medical Center, the gender study was part of the VA’s comprehensive, multi-year “Reading Room of the Future Project” helmed by Eliot L. Siegel, M.D., professor in the Diagnostic Imaging Department at the University of Maryland School of Medicine and Chief of Imaging at the VA Maryland Healthcare System.
Dr. Siegel’s team, which gained prominence in 1993 when it debuted the world’s first filmless, enterprise picture archiving and communication system (PACS), is gathering data that will help design reading rooms optimized for today’s deadline-driven radiologist. The need for improvement is acute: studies show that radiologists using PACS often complain of eye strain, neck and shoulder pain, repetitive stress injuries, and more.
According to Dr. Siddiqui — chief of Imaging Informatics and Cardiac CT and MR at the VA Maryland Healthcare System — the idea to study gender differences in SR actually came from female faculty members in the University’s radiology department. As the department was starting to embrace the latest, greatest SR technology, these faculty wanted to go back to digital dictation.
Seeking answers to their concerns about error rates, the University of Maryland team began isolating baseline differences that might degrade performance, including such variables as gender, age, computer proficiency, accent, and talking speed. Ironically, another factor they examined was the “white noise” system installed to mask background sounds in the “Reading Room of the Future” itself.
The researchers gave five male and five female residents commercial SR medical software and had them dictate 10 standardized radiology reports containing a total of 2,123 words. Male error rates ranged from a low of 2.5 percent to a high of 13.9 percent versus 1.5 percent to 20.6 percent for the women.
Notably, one of the female residents recorded the group’s lowest error rate on a single report: 1.5 percent. The most error-prone male’s highest rate (13.9 percent) was nearly 10 times that figure, while the most error-prone female’s highest rate (20.6 percent) was nearly 14 times it. On average, the five men narrowly outperformed the five women.
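The “nearly 10 times” and “nearly 14 times” comparisons can be checked with simple arithmetic. The sketch below is a minimal Python illustration, assuming those comparisons divide each group’s highest reported error rate by the 1.5 percent low; the names `RANGES` and `ratio_to_group_best` are hypothetical and do not come from the study.

```python
# Error-rate ranges reported in the article, in percent of words misrecognized.
# Assumption: the 1.5 percent low (a female resident) is the group's best single-report rate.
RANGES = {
    "male": (2.5, 13.9),
    "female": (1.5, 20.6),
}


def ratio_to_group_best(ranges):
    """Divide each group's worst error rate by the lowest rate seen in any group."""
    best = min(low for low, _ in ranges.values())
    return {group: high / best for group, (_, high) in ranges.items()}


if __name__ == "__main__":
    for group, ratio in ratio_to_group_best(RANGES).items():
        print(f"{group}: worst rate is {ratio:.1f}x the group's best")
```

Run as a script, it prints 9.3 for the male group and 13.7 for the female group, consistent with the rounded figures above.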
If the overall gender picture was mixed, one thing was strikingly clear: the high error rates among both male and female radiologists suggested that out-of-the-box SR tools are less accurate than advertised, Dr. Siddiqui says. Unmodified SR applications, he notes, may have an error rate approaching “something like 10 percent,” a figure that dwarfs the observed gender differences.
At the same time, the results reinforce what SR manufacturers have long maintained: doing a little extra SR training, and learning how to finesse the system, rewards radiologists with the clinical trifecta of fewer errors, higher productivity, and reduced stress.
Dr. Siddiqui says SR technology is more vulnerable to extraneous noise than dictation to a human transcriptionist, and that this vulnerability can compromise accuracy. A reading room that seems acoustically acceptable for SR reporting may in fact be the hidden cause of a department’s frustratingly high error rates: poor acoustic design can significantly reduce the accuracy of a state-of-the-art SR system even when the room has no obvious noise from extraneous sources, such as a nearby lobby or MR scanner.
The Real Problem: High-Pitched Voices
While the study found statistically significant variation between male and female error rates, Dr. Siddiqui notes that error rates tracked even more closely with whether a voice was high- or low-pitched. For whatever reason, current SR technology cannot process high-pitched voices as accurately as low-pitched ones, which translates into more errors for women. Few trade publications reporting the story made this key distinction; most focused on gender instead.
One regret Dr. Siddiqui has is that his study did not measure the pitch of each of the 10 radiologists. “In hindsight, we should have done that,” he says. “Then instead of saying ‘male and female,’ we could have looked at the precise pitch of each voice as it relates to error rates.”
Questions remain. Why do SR applications understand a baritone male voice better than a soprano or alto? Dr. Siddiqui says more research is needed, but speculates that SR software “was optimized for a deep, male voice.” Indeed, posters on SR message boards this summer theorized that SR developers crafted their products in male-dominated environments. Had their test subjects been all-female, it is conceivable that headlines in May might have read: “Male Radiologists More Prone to Speech Recognition Errors.”
Dr. Siddiqui says the small difference between male and female error rates may pale next to the effects of as-yet-unstudied factors. Further research will examine how accents affect accuracy. In time, we may know which voice is optimized for (or penalized by) SR technology, be it the “urban” dialect, an East Tennessee accent, or accents from Queens, west Texas, or coastal Maine. Researchers will also compare the accuracy of native and non-native English speakers, which may emerge as a hot-button issue as radiologic studies continue to be offshored.
An “Easy” Solution
While some female radiologists may wonder if manufacturers are working on women-specific SR applications, Dr. Siddiqui says female radiologists — and male radiologists with difficult-to-process voices — can easily increase their accuracy by being more mindful of their speaking voice and doing extra training. In most cases, this entails doing “maybe one more or two more sessions than are regularly recommended,” he says. Other possible solutions include training the system to recognize especially problematic words and using macros for key passages.
Dr. Siddiqui identifies a second solution: digitally altering or “preprocessing” the female voice. “The incoming [digital] sound can be modified for maximum accuracy,” he says. Similarly, manufacturers could install filters on the back end of SR systems to score even greater gains.
Dr. Siddiqui hopes his study has made SR vendors more aware “that there is some kind of difference, and that either they investigate it more or build a filter to compensate for it.” The more people talk about this issue, he says, the more likely an industry solution. But given the proprietary nature of SR technology, he adds, “We don’t know what they are doing and not doing.”
These days at the University of Maryland School of Medicine, the radiology staff reports substantially higher accuracy rates than those measured in the studies. Experience, however, has taught Dr. Siddiqui that there is no one-size-fits-all solution.
“Everybody is using it differently,” he says. “One of the things we learned immediately is that there is no standard way of optimizing SR. Just as everybody wears different clothes, everybody has his or her own modified dictation style. One radiologist changed her macros, another changed the way she says various medical terms, while another changed her dictation style by avoiding certain hard-to-understand words.”
Dr. Siddiqui, a native of Pakistan, says his accuracy with SR is very high despite his moderate accent. “I’ve been doing SR for five years now and have learned what works and what doesn’t. If I speak without proper modification, I experience higher error rates; it’s that simple.”
For reasons not yet entirely understood, then, SR software seems to “understand” some voices far more easily than others. This fact hasn’t soured Dr. Siddiqui on the technology, as he believes fully mature SR technology is just a “couple of years away.”
In the meantime, he reminds, “When a transcriptionist is typing something, she is not only writing the words and sentences, but she is understanding what you are saying. If you mumble or skip something, she can fill it in. With current SR applications, the technology is not comprehending what you are saying in the same way.”
Offering a parting thought, he observes, “Some people really love SR from the get-go. One of our female radiologists here really likes it, because the SR did a better job at converting her words and sentences than a transcriptionist who did not understand her accent.”
