Mojibake (文字化け) fixing - info and requests

nekojita

先輩
Joined
14 Jan 2009
Messages
1,660
Reaction score
443
Dear all:

I now have access to a pre-move backup of the forums, and am (slowly) trying to tackle all the mojibake (this stuff: 文字化け) in the Learning Japanese sub-forum.

For the moment, I'm just working backwards in time. Since this is a manual process it will be quite slow. Therefore, if there are any old threads/posts you would particularly like restored, drop me a note here and I'll put them at the top of the list.
 

thomas

Unswerving cyclist
Admin
Joined
14 Mar 2002
Messages
14,137
Reaction score
6,598
Thanks @nekojita !

I'm proceeding in the same fashion in the 一般的なフォーラム section and will join your efforts in the Learning Japanese subsection once that's accomplished.
 

eeky

先輩
Joined
8 Jun 2010
Messages
2,431
Reaction score
22
Since this is a manual process it will be quite slow.
I don't exactly understand the technical details of what has gone wrong, but is there really no way that this conversion can be automated? Are you having to retype content by hand? It sounds a complete nightmare!
 

nekojita

先輩
Joined
14 Jan 2009
Messages
1,660
Reaction score
443
My understanding is that there's no automatic way.

I have access to the backups, so what I do is get the old thread and copy and paste it lock, stock, and barrel. (I can go directly via the post number in the URL, so this bit is easy.)

For whatever reason, often only the first post or just the title is mucked up, so it's not as bad as it could be.
 

eeky

先輩
Joined
8 Jun 2010
Messages
2,431
Reaction score
22
Do you know how many posts in total are messed up?
 

nekojita

先輩
Joined
14 Jan 2009
Messages
1,660
Reaction score
443
From what I've seen so far, nearly every single thread in the Learning Japanese sub-forum that had some kanji/kana in it has at least one post mojibake-d. Weirdly, sometimes if somebody had quoted the person, the quoted portion is fine (and sometimes, a post that was fine turned into mojibake when quoted).
 

eeky

先輩
Joined
8 Jun 2010
Messages
2,431
Reaction score
22
From what I've seen so far, nearly every single thread in the Learning Japanese that had some kanji/kana in it has at least one post mojibake-d.
TBH, then, doing this by hand seems a bit crazy, unless I am missing something. If you can locate the uncorrupted posts in a backup and manually copy and paste them to the live system, then surely a program can be written to do that automatically.
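
To make the suggestion concrete, here is a minimal sketch of what such a program might look like. It assumes the backup and the live forum can each be read into a plain mapping from post number to post text; the function name `posts_to_restore` and the sample data are hypothetical and do not reflect how the forum software actually stores posts.

```python
def posts_to_restore(backup: dict[int, str], live: dict[int, str]) -> dict[int, str]:
    """Return {post_id: backup_text} for posts whose live text differs from the backup.

    Hypothetical helper: it only compares text and knows nothing about
    the real database schema of the forum.
    """
    return {
        post_id: text
        for post_id, text in backup.items()
        if post_id in live and live[post_id] != text
    }


# Sample data (invented): post 101 was garbled in the move, post 102 was not,
# and post 103 was written after the backup was taken.
backup = {101: "文字化け", 102: "Hello"}
live = {
    101: "文字化け".encode("utf-8").decode("latin-1"),
    102: "Hello",
    103: "A newer post",
}

to_fix = posts_to_restore(backup, live)
```

One caveat with this approach: a post that was legitimately edited after the backup was taken also differs from the backup, so a blind copy would undo that edit. The differences would still need some review.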
 

thomas

Unswerving cyclist
Admin
Joined
14 Mar 2002
Messages
14,137
Reaction score
6,598
If you can locate the uncorrupted posts in a backup and manually copy and paste them to the live system, then surely a program can be written to do that automatically.

Last year we were trying to do exactly that. I invested three months and a not insignificant amount of money into converting the DB, with very modest success. Automatising the conversion often resulted in more gibberish, hence the decision to do it manually - even if it may take a while.
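
For readers wondering how an automated conversion can make things worse, here is a small illustration. It assumes the common case of UTF-8 bytes being misread as Latin-1; the encodings actually involved in the forum move are not stated in this thread, and the function names are invented for the example.

```python
def garble(text: str) -> str:
    """Simulate mojibake: UTF-8 bytes misread as Latin-1 (an assumed pairing)."""
    return text.encode("utf-8").decode("latin-1")


def naive_repair(text: str) -> str:
    """Reverse garble() on everything, replacing whatever does not fit."""
    return text.encode("latin-1", errors="replace").decode("utf-8", errors="replace")


def careful_repair(text: str) -> str:
    """Reverse garble() only when the round trip is clean; otherwise leave the text alone."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


good = "文字化け"
bad = garble(good)

# A garbled post is recovered by either function.
print(naive_repair(bad) == good)
print(careful_repair(bad) == good)

# A post that was never garbled is destroyed by the blanket conversion,
# because its characters cannot be encoded as Latin-1 at all.
print(naive_repair(good))

# The careful version leaves it untouched.
print(careful_repair(good) == good)
```

If a database holds a mix of garbled and intact posts, which matches what nekojita describes above, running a blanket conversion over all of them repairs some and ruins others. Even the careful version can guess wrong on text that happens to look like mojibake, so it is a sketch of the idea and not a reliable fix.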
 

eeky

先輩
Joined
8 Jun 2010
Messages
2,431
Reaction score
22
Yet the conversion happens reliably when you copy and paste manually as Nekojita described? What component is it that does the reliable conversion in that case, that cannot be incorporated into an automated process? I am asking out of curiosity, you understand. I am not doubting your expertise.
 

thomas

Unswerving cyclist
Admin
Joined
14 Mar 2002
Messages
14,137
Reaction score
6,598
Oh, I'm claiming no expertise at all.

What I did is to consult all resources available on DB conversion and to experiment on a backup. When the results were not satisfactory, I hired a developer. He actually managed to clean a large portion of the vB tables, but far from all. It would still have required a lot of manual clean-up; however, on importing the DB into Xenforo most tables turned into mojibake again.

JREF isn't really a commercial venture (the ads barely bring in the server fees), so when the costs skyrocketed with no end in sight I threw in the towel.

Yes, the manual fixing will take a while, but it will eventually be accomplished.
 

LewiiG

先輩
Joined
27 Dec 2012
Messages
221
Reaction score
13
My threads? I was looking back on them for memories and reference and such but it turned out I couldn't really do that xD if that is not a reasonable request then it's fine; I will just be patient :)
 

Toritoribe

松葉解禁
Moderator
Joined
22 Feb 2008
Messages
17,846
Reaction score
4,101
I've checked and revised your 40 most recent posts. I'll do the rest next time. :)

EDIT:
Mission Accomplished!
 

thomas

Unswerving cyclist
Admin
Joined
14 Mar 2002
Messages
14,137
Reaction score
6,598
Just a little update: in the past few days I have been devoting a lot of time to cleaning up 文字化け, but it's still a Herculean task. I would just like to remind you that if you would like us to put priority on specific threads please post them here.
 

thomas

Unswerving cyclist
Admin
Joined
14 Mar 2002
Messages
14,137
Reaction score
6,598
I just realised that this thread hasn't been updated in four years.

I want to point out that in a massive time-consuming effort we cleaned up the 日本語 forum as well as some of the more important threads in other subfora about three years ago. If you chance upon any mojibake threads please do report them by contacting me via PC.

Thank you. :)
 