Translation is hard work, and the more dissimilar two languages are, the harder it gets. French to Spanish? No problem. Ancient Greek into Esperanto? Much more difficult. But sign language is a unique case, and uniquely difficult to translate, because it is fundamentally different from spoken and written languages. All the same, SignAll has been working hard for years to make accurate, real-time ASL translation a reality.
One might think that with all the advances in artificial intelligence and computer vision, a problem as interesting and beneficial to solve as this one would be swarmed by the best minds in the field. Even from a cynical market-expansion point of view, an Echo or TV that understands sign language could attract millions of new (and very grateful) customers.
Unfortunately, this does not seem to be the case – leaving small companies like Budapest-based SignAll to do the hard work on behalf of this underserved group. And it turns out that translating sign language in real time is even more complicated than it sounds.
CEO Zsolt Robotka and R&D manager Marton Kajtar were exhibiting at CES this year, where I talked to them about the company, the challenges they face and how they expect the field to evolve. (I was glad to learn the company was also at Disrupt SF in 2016, even though I missed them then.)
Perhaps the most interesting thing to me about the whole business is just how interesting and complex the problem is to solve.
“It’s multi-channel communication – it’s not just about shapes or hand movements,” says Robotka. “If you really want to translate sign language, you have to track the whole upper body and facial expressions as well – which makes the computer vision very difficult.”
That makes for a difficult problem from the outset, because of the sheer volume in which subtle movements must be tracked. The current setup uses a Kinect 2 placed more or less centrally and three RGB cameras positioned a foot or two away. The system must be reconfigured for each new user, because just as everyone speaks differently, all ASL users sign differently.
“We need this complex configuration because we can work around the lack of resolution, both temporal and spatial (refresh rate and pixel count), by having different points of view,” said Kajtar. “You can have quite complex finger configurations, and traditional hand-skeletonization methods don’t work because the fingers occlude each other. So we use the side cameras to resolve the occlusion.”
As if that were not enough, facial expressions and slight variations in gestures also inform what is said, for example by adding emotion or indicating direction. And then there is the fact that sign language is fundamentally different from English or any other common spoken language. This is not transcription – it’s full-on translation.
“The nature of the language is continuous signing, which makes it difficult to tell when one sign ends and the next begins,” said Robotka. “But it’s also a very different language; you can’t translate word for word, just recognizing signs against a vocabulary.”
SignAll’s system works with complete sentences, not just individual words presented sequentially. A system that detects and translates one sign after another (limited versions of which do exist) would be likely to produce misinterpretations or overly simplistic representations of what was said. While that might be fine for simple things like asking for directions, meaningful communication has levels of complexity that must be accurately detected and reproduced.
Somewhere between those two options is what SignAll is aiming for with its first public pilot of the system, at Gallaudet University. This school for the deaf in Washington, D.C. is renovating its welcome center, and SignAll will be installing a translation booth there so that hearing people can interact with deaf staff.
It’s a good opportunity to test this, Robotka said, because usually the information deficit runs the other way: a deaf person needing information from a hearing one. Visitors who can’t sign can speak, and their query is turned into text (unless the staff member can read lips); the staff member responds in sign, which is then translated into text or synthesized speech.
It sounds complicated – and from a technical standpoint it is – but in practice neither person needs to do anything other than communicate as they normally would, and they can understand each other. When you think about it, that’s really quite remarkable.
To prepare for the pilot, SignAll and Gallaudet worked together to build a database of signs specific to the application in question or local to the university itself. There is no complete 3D representation of all signs – if such a thing is even possible – so for now the system is tailored to the environment in which it is deployed, with domain-specific gestures added to its database.
“It has been a huge effort to collect the 3D data for all these signs; we just finished, with their support,” said Robotka. “We did some interviews and collected a few conversations that took place, to make sure we have all the elements and signs of the language. We expect to do this kind of customization work for the first pilots.”
This long-running project is a sobering reminder of both the possibilities and the limitations of technology. True, automatic translation of sign language is a goal that has only become feasible thanks to advances in computer vision, machine learning and imaging. But unlike many other translation or computer-vision tasks, it requires a great deal of human input at every step of the way – not just to achieve basic accuracy, but to make sure the human element is preserved.
After all, this is not just about reading a foreign news article or communicating while abroad; it concerns a class of people who are fundamentally excluded from what most of us consider ordinary in-person communication. Improving their lot is worth the wait.
Top image: SignAll