
Japanese translator and morphological analysis

Toritoribe

松葉解禁
Staff member
Moderator
Joined
Feb 22, 2008
Messages
14,830
Ratings
2 1,537
Try verbs with a kanji stem whose okurigana contains っ (small tsu), such as 知った, 入った, 送った. Your program has an obvious bug in its romaji transliterations and hiragana readings. It also can't read dates such as 19日 correctly.
 

haibuihoang

A curious learner
Joined
Apr 22, 2012
Messages
53
Ratings
5
Try verbs with a kanji stem whose okurigana contains っ (small tsu), such as 知った, 入った, 送った. Your program has an obvious bug in its romaji transliterations and hiragana readings. It also can't read dates such as 19日 correctly.
Thank you very much for pointing that out. The sentence is decomposed by MeCab, which splits it into 知っ and た. That's why the Kana2romaji conversion failed. I will find a way to fix this.
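The failure comes from transliterating each token on its own: a token-final っ doubles the consonant at the start of the following token, which a per-token converter cannot see. One possible fix, sketched below in Python, is to glue any reading that ends in っ onto the reading that follows it before transliterating. This assumes the analyzer's readings have already been converted to hiragana; the tiny kana table and the names `kana_to_romaji` and `romanize_tokens` are hypothetical and for illustration only.

```python
# Tiny hiragana -> Hepburn romaji table (illustration only; a real converter
# needs the full kana set plus digraphs such as きゃ or しょ).
ROMAJI = {
    "あ": "a", "い": "i", "う": "u", "お": "o", "く": "ku", "し": "shi",
    "た": "ta", "ち": "chi", "て": "te", "は": "ha", "る": "ru", "ん": "n",
}
SOKUON = "っ"  # small tsu


def kana_to_romaji(kana: str) -> str:
    """Romanize hiragana, doubling the consonant that follows a small tsu."""
    out = []
    pending_sokuon = False
    for ch in kana:
        if ch == SOKUON:
            pending_sokuon = True
            continue
        syllable = ROMAJI[ch]
        if pending_sokuon:
            # Hepburn writes っち as "tchi"; otherwise repeat the first letter.
            out.append("t" if syllable.startswith("ch") else syllable[0])
            pending_sokuon = False
        out.append(syllable)
    if pending_sokuon:
        # Token-final small tsu: the consonant to double is in the next token,
        # so it cannot be resolved here; mark it with an apostrophe.
        out.append("'")
    return "".join(out)


def romanize_tokens(readings: list[str]) -> list[str]:
    """Romanize analyzer readings, first gluing any reading that ends in
    small tsu onto the reading that follows it."""
    merged: list[str] = []
    for reading in readings:
        if merged and merged[-1].endswith(SOKUON):
            merged[-1] += reading
        else:
            merged.append(reading)
    return [kana_to_romaji(r) for r in merged]


# Per-token conversion loses the doubled consonant; merged conversion keeps it.
print(kana_to_romaji("しっ"), kana_to_romaji("た"))  # shi' ta
print(romanize_tokens(["しっ", "た"]))               # ['shitta']
print(romanize_tokens(["おくっ", "た"]))             # ['okutta']
```

With this change, 知っ + た is romanized as shitta rather than as two broken fragments.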
 
Joined
Jan 14, 2009
Messages
1,660
Ratings
393
Definitely some weird bugs:

Original Japanese sentence
我輩 は 猫 で ある

Translated Romaji/Kana
wagahai wa neko de aru

Translated English
Wagahaihanekodearu
I don't think any form of automatic translation is much good for learners. It is likely to contain errors (there is no perfect automatic transcription or translation), so you still have to work out whether what it gives you back is correct, and if you can do that, you might as well do it manually in the first place.

If the project interests you, do continue, but given the scale of the problem, the chances of getting it, by yourself, to the point where it can handle arbitrary text with high accuracy are low (machine translation is not a solved problem).
 

Toritoribe

He seems to just copy and paste Google Translate output into the "Translated English" section.
 

haibuihoang

He seems to just copy and paste Google Translate output into the "Translated English" section.
I don't think any form of automatic translation is much good for learners
The translated English comes from Google's translation engine. Of course machine translation is very difficult, and I don't think an individual can do better than Google. Even Google's output is far from a good or even acceptable translation, and you can learn nothing from it. That's why the sentence analysis comes first (it uses MeCab, in case you want to know), the Japanese-to-romaji/kana conversion comes second, and the machine-translated section serves only as a reference. Many Japanese learners (especially beginners like me) really appreciate the first and second parts, but they may not be very useful for advanced users like many of you here.
 

Toritoribe

The most fundamental problem is that MeCab parses sentences according to 国文法 (kokubunpō), the traditional pedagogic grammar taught in Japanese schools to native speakers. It was originally designed to analyze Classical Japanese and was later applied to Modern Japanese, so it is awkward for parsing Modern Japanese. For instance, its conjugation table has no past, potential, passive, or many other basic forms. Instead, in this grammar the past-tense suffix た is treated as an auxiliary verb, and the て of the -te form is treated as a particle. Almost all non-native learners are unfamiliar with this way of parsing. I don't think it's useful, especially for beginners, since it's quite confusing.

Wait, I learned 入った is the past form of 入る, but this program says 入っ is an inflection of 入る and た is an auxiliary verb. WHY???
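To present 入った the way learner grammars do, one option is to post-process the analyzer output and glue auxiliary verbs (助動詞) and the conjunctive て/で back onto the preceding verb. The Python sketch below assumes tokens that carry a surface form, an IPADIC-style part-of-speech label, and a base form; the `Token`/`Word` classes, `merge_verbs`, and the small label table are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    surface: str  # text as it appears in the sentence
    pos: str      # IPADIC-style label: 動詞 verb, 助動詞 auxiliary, 助詞 particle
    base: str     # dictionary (base) form


@dataclass
class Word:
    surface: str
    base: str
    form: str  # learner-facing description


# Hypothetical labels keyed by the material glued onto the verb.
FORM_LABELS = {
    "た": "plain past", "だ": "plain past",
    "て": "te-form", "で": "te-form",
    "ない": "plain negative", "ません": "polite negative",
}


def is_verb_suffix(tok: Token) -> bool:
    """Auxiliary verbs and the conjunctive て/で belong to the verb in
    learner grammars, even though school grammar splits them off."""
    return tok.pos == "助動詞" or (tok.pos == "助詞" and tok.surface in ("て", "で"))


def merge_verbs(tokens: list[Token]) -> list[Word]:
    words: list[Word] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.pos != "動詞":
            words.append(Word(tok.surface, tok.base, tok.pos))
            i += 1
            continue
        # Collect every following token that belongs to this verb.
        j = i + 1
        while j < len(tokens) and is_verb_suffix(tokens[j]):
            j += 1
        suffix = "".join(t.surface for t in tokens[i + 1:j])
        if suffix in FORM_LABELS:
            form = FORM_LABELS[suffix]
        elif not suffix and tok.surface == tok.base:
            form = "dictionary form"
        else:
            form = "inflected"
        words.append(Word(tok.surface + suffix, tok.base, form))
        i = j
    return words


print(merge_verbs([Token("入っ", "動詞", "入る"), Token("た", "助動詞", "た")]))
# [Word(surface='入った', base='入る', form='plain past')]
```

The merged word keeps the verb's base form, so a dictionary link can still point to 入る while the learner sees 入った labeled as a plain past form.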

BTW, Google Translate can provide romaji transliterations, too, and it returns correct answers even for 知った, 入った, and 送った. Is there any advantage to using your program?
 

haibuihoang

...
Wait, I learned 入った is the past form of 入る, but this program says 入っ is an inflection of 入る and た is an auxiliary verb. WHY???
I've been searching for a good morphological analyzer for a while, but MeCab seems to be the most popular one. I'd greatly appreciate it if someone could point me to more accurate software.
By the way, I've made some effort to make the romaji transliteration more accurate. For example: Japanese Translator | RomajiDesu
The romaji output is okutta, as expected.

BTW, Google Translate can provide romaji transliterations, too, and it returns correct answers even for 知った, 入った, and 送った. Is there any advantage to using your program?
As many others have pointed out above, Google Translate is not (yet) a solution, but it can still serve as a reference. Coming back to the example: when you hover the mouse over '送っ', it shows the base (dictionary) form 送る, which links to its definition page in the dictionary: Meaning of 送る in Japanese | RomajiDesu Japanese dictionary. Before this, I often searched the dictionary for a word and got nothing back because it wasn't in its base form. The color-coded particles also help me see the structure of a sentence.
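The base-form lookup described here amounts to falling back from the surface form to the analyzer's dictionary form. A minimal Python sketch, with a hypothetical three-entry dictionary and a hypothetical `look_up` helper:

```python
from typing import Optional

# Hypothetical mini-dictionary keyed by dictionary (base) form.
DICTIONARY = {"送る": "to send", "入る": "to enter", "知る": "to know"}


def look_up(surface: str, base: str) -> Optional[str]:
    """Try the surface form first, then fall back to the analyzer's base form."""
    if surface in DICTIONARY:
        return DICTIONARY[surface]
    return DICTIONARY.get(base)


print(look_up("送っ", "送る"))  # to send
```

The fallback only helps when the analyzer's base form is itself right, which is why a wrong split (such as でる for 住んでる) still leads to the wrong entry.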

PS. I've checked 入った: it shows that the original form is 入る and the word type is verb (screenshot). Where did you see that 入っ is an inflection?
 
Joined
Jan 14, 2009
Messages
1,660
Ratings
393
Because you haven't highlighted the た. If you mouse over it, it shows "auxiliary verb". Wouldn't that be confusing to a beginner? 入っ is not, by itself, a conjugation of 入る, and はいっ is not pronounced "hait".

By comparison, the WWWJDIC text glosser, while hardly perfect, returns for 入った:
Possible inflected verb or adjective: (plain, past)
入る 【いる; はいる】
(then meanings)

Worse, if I put in something like 住んでる, it splits it into 住ん and でる, with a link to the dictionary meanings of でる!
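A glosser like this can be approximated with rule-based deinflection: strip a known inflected ending, substitute a candidate dictionary ending, and keep only candidates that actually exist in the dictionary. This is a generic technique and not necessarily how WWWJDIC works internally; the rule list, headword list, and `deinflect` function below are a small hypothetical subset, written in Python.

```python
# Hypothetical subset of deinflection rules: (inflected ending, dictionary
# ending, description). Several rules share an ending because, e.g.,
# った can come from verbs ending in う, つ, or る.
RULES = [
    ("った", "う", "plain past"), ("った", "つ", "plain past"),
    ("った", "る", "plain past"),
    ("んだ", "む", "plain past"), ("んだ", "ぶ", "plain past"),
    ("んだ", "ぬ", "plain past"),
    ("ってる", "う", "-te iru (contracted)"), ("ってる", "つ", "-te iru (contracted)"),
    ("ってる", "る", "-te iru (contracted)"),
    ("んでる", "む", "-te iru (contracted)"), ("んでる", "ぶ", "-te iru (contracted)"),
    ("んでる", "ぬ", "-te iru (contracted)"),
    ("た", "る", "plain past (ichidan)"),
]

# Hypothetical headword list; a real glosser would use a full dictionary.
HEADWORDS = {"入る", "知る", "送る", "住む", "見る"}


def deinflect(word: str) -> list[tuple[str, str]]:
    """Return (dictionary form, description) pairs found in HEADWORDS."""
    candidates = []
    if word in HEADWORDS:
        candidates.append((word, "dictionary form"))
    for ending, replacement, description in RULES:
        if word.endswith(ending):
            base = word[: -len(ending)] + replacement
            if base in HEADWORDS:
                candidates.append((base, description))
    return candidates


print(deinflect("入った"))    # [('入る', 'plain past')]
print(deinflect("住んでる"))  # [('住む', '-te iru (contracted)')]
```

Because every candidate is checked against the dictionary, 住んでる resolves to 住む instead of producing a link to でる.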

If the issue is that someone has trouble with grammar and with parsing sentences into words they can look up in the dictionary, then my recommendation is to work on grammar and practice parsing. There's no simple technological solution because computers are dumber than people.

For example, here's a simple "textbook" type sentence.
走ってはいけません

The parser splits it into "走っ / て / は / いけ / ませ / ん", labels て as a particle rather than part of the verb, and gives no clear information that いけません is a negative verb (Google also gets the translation wrong). If I click ませ, it links me to the meanings of the verb ます!
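Constructions such as 〜てはいけません span several tokens, so one way to explain them is to match runs of base forms against a table of grammar patterns. A minimal Python sketch, assuming the analyzer reports the base forms shown (走る, て, は, いける, ます, ん); the pattern table and `find_patterns` are hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str
    base: str  # dictionary form, as an IPADIC-style analyzer might report it


# Hypothetical pattern table: a run of base forms -> learner-facing note.
PATTERNS = [
    (("て", "は", "いける", "ない"), "〜てはいけない: 'must not' (prohibition)"),
    (("て", "は", "いける", "ます", "ん"), "〜てはいけません: 'must not' (polite prohibition)"),
    (("ます", "ん"), "〜ません: polite negative"),
]


def find_patterns(tokens: list[Token]) -> list[tuple[int, str, str]]:
    """Return (start index, matched surface text, note) for every pattern hit."""
    bases = [t.base for t in tokens]
    hits = []
    for pattern, note in PATTERNS:
        n = len(pattern)
        for i in range(len(bases) - n + 1):
            if tuple(bases[i:i + n]) == pattern:
                surface = "".join(t.surface for t in tokens[i:i + n])
                hits.append((i, surface, note))
    return sorted(hits)


sentence = [Token("走っ", "走る"), Token("て", "て"), Token("は", "は"),
            Token("いけ", "いける"), Token("ませ", "ます"), Token("ん", "ん")]
for start, text, note in find_patterns(sentence):
    print(start, text, note)
```

Matching on base forms rather than surface text lets one table entry cover conjugated variants, and the overlapping ません hit shows how a nested construction can be reported alongside the larger one.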

You should make it clear on the "translation" page where your information comes from. You may have mentioned it elsewhere on the site, but both here and on the site, when you say things like "I've just developed this Japanese translator and morphological analysis", it sounds like you've done more work than just strapping an existing parser and Google Translate together.

I am sure your intentions are good, but it's just not going to fly.
 

haibuihoang

Thank you very much for pointing out the obvious limitations and for suggesting that I credit the translation and parser sources on the page (though I had already put them on the copyright page).
Following your suggestion, I've removed most of the confusing dictionary-linked words.
Anyway, I have to agree with you on this:
There's no simple technological solution because computers are dumber than people.
 