
Japanese & search engines

rajs20

Hokage
Joined
Oct 23, 2004
In Japanese, there's a lot less punctuation than in English... in particular, there usually aren't any spaces between words.

So, does anyone know how search engines like Google handle that? Especially if the page is in hiragana... For example, if you search for "おもい" (heavy), how does it know that "とおもいます" (I think) shouldn't be included in the results?

Thanks,
raj
 

PaulTB

Manga Psychic
Joined
Jan 22, 2004
rajs20 said:
In Japanese, there's a lot less punctuation than in English... in particular, there usually aren't any spaces between words.

So, does anyone know how search engines like Google handle that?
Actually I do - and it's something you have to watch out for.

Now there are a few things you have to beware of when searching for Japanese with Google.

1. Japanese or Chinese?

You should be able to spot it yourself, but Google can't reliably tell the difference between a Japanese search term and a Chinese one.

Therefore you should always either a) make sure your language is set to Japanese (the easiest way is to start from Google) or b) make sure there is some kana in your search (the easiest way is to add の after a space).

If you don't do this you will sometimes get very bad search results.

2. Do not expect Google to avoid 'partial word' matches. For example, in English, if you search for 'excitable' you won't get any matches for 'unexcitable'. In Japanese, however, if you search for もちはわかります you will get matches for きもちはわかります.

3. Google may sometimes 'helpfully' split words in your search term.

For example, I recently did a Google search on 積層窓 (which I think means 'laminated window', but don't worry, it's hardly ever used).

However, the results returned were for 積層 窓: in other words, every page containing both 積層 and 窓, but not necessarily the 'word' 積層窓.

To get only the results for 積層窓, it has to be enclosed in double quotes ("積層窓").
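To make points 2 and 3 concrete, here is a toy Python sketch. It is my own illustration, not how Google actually indexes pages (the thread doesn't say how it does). It contrasts whole-word matching, which only works when the text has spaces, with plain substring matching, and compares a split query (every piece must appear somewhere) with a quoted one (the pieces must appear together). The function names and the two sample page snippets are made up for the example.

```python
import re


def word_match(page: str, term: str) -> bool:
    """Whole-word match: only meaningful when the text marks word boundaries."""
    # \w+ splits on spaces and punctuation; an unspaced Japanese sentence
    # comes back as one long "word".
    return term in re.findall(r"\w+", page)


def substring_match(page: str, term: str) -> bool:
    """Boundary-free match: the term may occur anywhere, even inside a word.
    A quoted query like "積層窓" amounts to this test."""
    return term in page


def and_match(page: str, terms: list[str]) -> bool:
    """Split query: each piece must occur somewhere, not necessarily together."""
    return all(t in page for t in terms)


if __name__ == "__main__":
    # Point 2: word boundaries stop "excitable" matching inside "unexcitable"...
    print(word_match("The dog is unexcitable.", "excitable"))        # False
    # ...but unspaced Japanese gives whole-word matching nothing to work with,
    print(word_match("きもちはわかります", "わかります"))               # False
    # so a fragment search ends up matching inside longer words.
    print(substring_match("きもちはわかります", "もちはわかります"))    # True
    print(substring_match("とおもいます", "おもい"))                    # True

    # Point 3: two invented page snippets, searched split vs quoted.
    pages = ["積層ガラスの窓", "積層窓の施工例"]
    print([p for p in pages if and_match(p, ["積層", "窓"])])
    # ['積層ガラスの窓', '積層窓の施工例']
    print([p for p in pages if substring_match(p, "積層窓")])
    # ['積層窓の施工例']
```

Real engines generally deal with unspaced Japanese using dictionary-based word segmentation or character n-gram indexes rather than a bare substring scan; which approach Google used isn't stated here.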

There are other points but that covers most of it.
 

GaijinPunch

遠いから行きません (It's far, so I'm not going)
Joined
Nov 25, 2004
Errr... Google has an "only search for Japanese pages" option that should solve any of those problems. Besides, it should be able to tell the language based on the encoding. I don't think I've ever gotten a Chinese page for what I've searched for... could've been my search criteria though.

In my own experiences, I've gotten the best results by using spaces wisely. I've gotten very bad results when typing in a whole sentence like I would in English.
 

PaulTB

Manga Psychic
Joined
Jan 22, 2004
GaijinPunch said:
Errr... Google has an "only search for Japanese pages" option
And the easiest way to get to it is, as I said, to start from Google.
GaijinPunch said:
that should solve any of those problems.
Which only solves _one_ of those problems.
GaijinPunch said:
Besides, it should be able to tell the language based on the encoding.
ぶ・ぶー。(Bzzt, wrong!) If you look closely at my post, you'll see it says "... can't reliably tell the difference between a Japanese search term and a Chinese one."
That's got nothing to do with the encoding of the pages you are looking for.

GaijinPunch said:
I don't think I've ever gotten a Chinese page for what I've searched for... could've been my search criteria though.
There are two separate possibilities - both of which happened to me lots when I carelessly forgot my own advice.

1. By some quirk, Google looks only in Chinese pages for your search term. Result: you get far fewer pages than you should, and they are all in Chinese.
2. Google looks in both Chinese and Japanese pages (which is actually the proper default behaviour), and your word happens to exist in both languages. Result: you get far more pages than you should, and most of them are in Chinese.
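The tip about adding some kana (such as の) to a search works because hiragana and katakana are essentially Japanese-only, while kanji are shared with Chinese. Below is a toy Python check along those lines. It is a heuristic assumed for illustration, not Google's actual language detection, and the function names are hypothetical. It uses the Unicode Hiragana and Katakana blocks (U+3040 to U+30FF) and, for kanji, only the main CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def contains_kana(text: str) -> bool:
    """True if text has any hiragana or katakana (U+3040 to U+30FF)."""
    return any(0x3040 <= ord(ch) <= 0x30FF for ch in text)


def contains_cjk_ideograph(text: str) -> bool:
    """True if text has a character from the main CJK Unified Ideographs
    block (U+4E00 to U+9FFF), which Japanese and Chinese share."""
    return any(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)


def guess_query_script(query: str) -> str:
    """Rough guess at which language a search term is written in."""
    if contains_kana(query):
        return "japanese"    # kana are essentially Japanese-only
    if contains_cjk_ideograph(query):
        return "ambiguous"   # kanji only: could be Japanese or Chinese
    return "other"


if __name__ == "__main__":
    for q in ["積層窓", "積層窓 の", "おもい", "window"]:
        print(q, "->", guess_query_script(q))
```

With these inputs, 積層窓 comes out "ambiguous", while 積層窓 の and おもい come out "japanese": a kanji-only term is exactly the case where the search can drift into Chinese pages.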

Now I am more sensitive to these points because I've used Google a very great deal to check how often obscure words and words with odd kanji are used.
 

BrennaCeDria

Lovely Angel
Joined
May 5, 2004
Also, a page in a given language is typically hosted in a country where that language is spoken, so if the domain ends in co.jp, the page is most likely Japanese rather than Chinese.
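A minimal Python sketch of that domain hint, assuming you have the page's full URL (with a scheme). The function name is hypothetical, and a .jp host is only a weak signal, since plenty of Japanese pages live on .com or other domains.

```python
from urllib.parse import urlparse


def hosted_under_jp(url: str) -> bool:
    """Weak hint only: True if the URL's host is under the .jp domain
    (co.jp, ne.jp, ac.jp, or a plain .jp name)."""
    # hostname is None when the URL has no network location (e.g. no scheme)
    host = urlparse(url).hostname or ""
    return host.endswith(".jp")


if __name__ == "__main__":
    for url in ["http://www.example.co.jp/", "http://www.example.com/", "http://www.example.cn/"]:
        print(url, hosted_under_jp(url))
```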
 

rajs20

Hokage
Joined
Oct 23, 2004
Sweet, thanks, that's exactly the kind of info I was wondering about :)
 