Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater
source audio |
|
source text |
puede decirme el camino al hotel, por favor? |
target text |
can you tell me the way to the hotel please? |
source audio |
|
source text |
puede decirme el camino al hotel, por favor? |
target text |
can you tell me the way to the hotel please? |
~100 languages supported in Google Translate
~3000 languages without a written form
“Moun kwense nan Sakre Kè nan Pòtoprens”
People trapped in Sacred Heart Church, PauP
international rescue teams face language barrier
volunteers helped create parallel text corpora in a short time
source audio |
|
source text |
? |
target text |
yes well and the car |
what can we do in this situation?
source audio |
|
pseudo text |
c38c92 |
target text |
yes well and the car |
use a substitute for ASR
an unsupervised state of the art speech-to-text toolkit
+
off-the-shelf standard machine translation toolkit
Positives
learns some meaningful content word translations
~10 hours of training data
Negatives
big improvements required
audio utterance | pseudotext | human translation |
---|---|---|
|
c1 | yes well and the car |
c1 c2 | yes well and hows it going | |
c2 | this work | |
c2 | call him from work | |
IBM Model 1 : p(english word | cluster id)
transcription | human translation |
---|---|
sı́ pues y el carrito | yes well and the car |
sı́ pues y qué tal vas con | yes well and hows it going |
este trabajo y se | this work |
llama del trabajo mi | call him from work |
IBM Model 1 : p(english word | source word)
audio utterances | prediction @ K=1 | human translation |
---|---|---|
|
(yes) | yes well and the car |
(yes) (job) | yes well and hows it going |
audio utterances | prediction @ K=2 | human translation |
---|---|---|
|
(yes) (well) | yes well and the car |
(yes) (well) (work) (job) | yes well and hows it going |
out-of-vocabulary | |||
train/test split | oracle | ||
realistic | 17% | ||
optimistic | 10% |
transcript | human translation | prediction | cómo anda el plan escolar | how is the school plan going | school, going |
dile que le mando saludos | tell him that i say hi | say, hi | |
sí con dos dientes menos | yeah with two teeth less | denture, yeah, teeth |
Thank you. Questions!
our pred. | gold pred. | transcript | pseudotext |
---|---|---|---|
WELL COMPUTERS EVERYTHING | AHA EVERYTHING COMPUTERS | CO AJá COMPUTADORAS PUES | 1576 1737 11816 |
(☞゚∀゚)☞
our pred. | gold pred. | transcript | pseudotext |
---|---|---|---|
TELL MAMY SAD FUN | SAD THINKING DAY BIRTHDAY | TRISTE PENSANDO DE QUE EL DíA DE TU CUMPLEAñOS | 293 348 379 2514 |
(ಥ﹏ಥ)
our pred. | gold pred. | transcript | pseudotext |
---|---|---|---|
OH WELL 'S PROBLEM HEY WELL | AH BUENO ESA ES LA CUESTIóN OYE BUENO Y | 1354 |
(╯°□°)╯︵ ┻┻
spk 1: me mande mi acta de bautismo
spk 2: una nueva
spk 1: sí oye mamá
spk 3: sí mi hijo
spk 1: mi acta de bautismo
spk 3: sí
cluster | source words | purity |
---|---|---|
(c2) cómo, trabajo, tal vas con | 0.18 | |
claro, claro, claro, ... | 1 | |
: ) | 0.95 | |
mhmm | 1 |
acoustic clusters with errors = 66%
audio | words |
---|---|
bien (female voice) | |
bien (male voice) |
# acoustic clusters for bien = 271
accuracy of patterns across calls = ~10%