
Mojibake (文字化け) fixing - info and requests

nekojita

先輩
Joined
Jan 14, 2009
Messages
1,660
Reaction score
439
Dear all:

I now have access to a pre-move backup of the forums, and am (slowly) trying to tackle all the mojibake (this stuff: 文字化け) in the Learning Japanese sub-forum.

For the moment, I'm just working backwards in time. Since this is a manual process it will be quite slow. Therefore, if there are any old threads/posts you would particularly like restored, drop me a note here and I'll put them at the top of the list.
 

thomas

Unswerving cyclist
Admin
Joined
Mar 14, 2002
Messages
9,321
Reaction score
1,053
Thanks @nekojita !

I'm proceeding in the same fashion in the 一般的なフォーラム section and will join your efforts in the Learning Japanese subsection once that's accomplished.
 

Toritoribe

松葉解禁
Staff member
Moderator
Joined
Feb 22, 2008
Messages
15,618
Reaction score
2,361
I, too, can do the same thing when nekojita-san is offline. :)
 

eeky

先輩
Joined
Jun 8, 2010
Messages
2,431
Reaction score
22
nekojita said:
Since this is a manual process it will be quite slow.
I don't exactly understand the technical details of what has gone wrong, but is there really no way that this conversion can be automated? Are you having to retype content by hand? It sounds a complete nightmare!
 

nekojita

先輩
Joined
Jan 14, 2009
Messages
1,660
Reaction score
439
My understanding is that there's no automatic way.

I have access to the backups, so what I do is get the old thread, and copy-paste lock, stock, and barrel. (I can directly go via post number in the url so this bit is easy).

For whatever reason, often only the first post or just the title is mucked up, so it's not as bad as it could be.
 

eeky

先輩
Joined
Jun 8, 2010
Messages
2,431
Reaction score
22
nekojita said:
My understanding is that there's no automatic way.

I have access to the backups, so what I do is get the old thread, and copy-paste lock, stock, and barrel. (I can directly go via post number in the url so this bit is easy).

For whatever reason, often only the first post or just the title is mucked up, so it's not as bad as it could be.
Do you know how many posts in total are messed up?
 

nekojita

先輩
Joined
Jan 14, 2009
Messages
1,660
Reaction score
439
From what I've seen so far, nearly every single thread in the Learning Japanese sub-forum that had some kanji/kana in it has at least one mojibake-d post. Weirdly, sometimes if somebody had quoted the person, the quoted portion is fine (and sometimes, a post that was fine turned into mojibake when quoted).
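
Since the damage seems tied to posts containing kanji or kana, one way to shortlist the posts worth checking might look like the sketch below. It assumes the backup can be read as a mapping from post number to post text; `backup_posts` and `posts_to_check` are hypothetical names, not part of the forum software.

```python
import re

# Hiragana, katakana, and the main CJK ideograph block (kanji).
JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def posts_to_check(backup_posts):
    """Return the sorted post numbers whose backup text contains kana or kanji."""
    return sorted(
        post_id for post_id, text in backup_posts.items()
        if JAPANESE.search(text)
    )


# Hypothetical sample data standing in for the real backup.
sample = {3: "こんにちは", 1: "hello", 2: "漢字 test"}
print(posts_to_check(sample))
```

This only says which posts contained Japanese before the move; whether each one is actually garbled on the live site still has to be checked.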
 

eeky

先輩
Joined
Jun 8, 2010
Messages
2,431
Reaction score
22
nekojita said:
From what I've seen so far, nearly every single thread in the Learning Japanese that had some kanji/kana in it has at least one post mojibake-d.
TBH, then, doing this by hand seems a bit crazy, unless I am missing something. If you can locate the uncorrupted posts in a backup and manually copy and paste them to the live system, then surely a program can be written to do that automatically.
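
A minimal sketch of what such a program might look like, assuming both the backup and the live forum can be read as mappings from post number to post text. The names `live_posts`, `backup_posts`, and `restore_from_backup` are hypothetical, and nothing here reflects the actual database schema of either forum package.

```python
def restore_from_backup(live_posts, backup_posts):
    """Return (restored, changed): a copy of live_posts with differing posts
    replaced by their backup text, plus the sorted post numbers replaced.

    Both arguments are hypothetical dicts mapping post number -> post text.
    Posts that exist only on the live site (written after the move) are kept.
    """
    restored = dict(live_posts)
    changed = []
    for post_id, backup_text in backup_posts.items():
        if post_id in restored and restored[post_id] != backup_text:
            restored[post_id] = backup_text
            changed.append(post_id)
    return restored, sorted(changed)


# Hypothetical sample data: post 1 is garbled live, post 3 is newer than the backup.
live = {1: "æ\x96\x87å\xad\x97", 2: "hello", 3: "posted after the move"}
backup = {1: "文字", 2: "hello"}
restored, changed = restore_from_backup(live, backup)
print(changed)
```

Two caveats apply to this sketch: it only helps if the backup text can itself be read with the correct encoding, and it would overwrite any post that was legitimately edited after the move.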
 

thomas

Unswerving cyclist
Admin
Joined
Mar 14, 2002
Messages
9,321
Reaction score
1,053
eeky said:
If you can locate the uncorrupted posts in a backup and manually copy and paste them to the live system, then surely a program can be written to do that automatically.
Last year we were trying to do exactly that. I invested three months and a not insignificant amount of money into converting the DB, with very modest success. Automatising the conversion often resulted in more gibberish, hence the decision to do it manually - even if it may take a while.
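
To illustrate how an automated pass can produce more gibberish rather than less, here is a small sketch. It assumes the common case where UTF-8 bytes were misread as Latin-1; the thread does not say which encodings were actually involved, so treat the pairing and the helper names `garble` and `try_repair` as assumptions.

```python
def garble(text):
    """Simulate mojibake: UTF-8 bytes misread as Latin-1 (an assumed pairing)."""
    return text.encode("utf-8").decode("latin-1")


def try_repair(text):
    """Reverse one round of garble(); return None if the text does not fit."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


original = "文字化け"
once = garble(original)
twice = garble(once)  # what a second, blind conversion pass would produce

print(try_repair(once) == original)               # one pass undoes one round
print(try_repair(original))                       # clean text does not fit the pattern
print(try_repair(try_repair(twice)) == original)  # two rounds need two passes
```

A database that mixes clean rows, once-garbled rows, and twice-garbled rows cannot be fixed by one uniform conversion: whichever pass is chosen, some rows come out worse than they went in.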
 

eeky

先輩
Joined
Jun 8, 2010
Messages
2,431
Reaction score
22
thomas said:
Last year we were trying to do exactly that. I invested three months and a not insignificant amount of money into converting the DB, with very modest success. Automatising the conversion often resulted in more gibberish, hence the decision to do it manually - even if it may take a while.
Yet the conversion happens reliably when you copy and paste manually as Nekojita described? What component is it that does the reliable conversion in that case, that cannot be incorporated into an automated process? I am asking out of curiosity, you understand. I am not doubting your expertise.
 

thomas

Unswerving cyclist
Admin
Joined
Mar 14, 2002
Messages
9,321
Reaction score
1,053
Oh, I'm claiming no expertise at all.

What I did was consult all the resources available on DB conversion and experiment on a backup. When the results were not satisfactory, I hired a developer. He actually managed to clean a large portion of the vB tables, but far from all of them. It would still have required a lot of manual clean-up; however, on importing the DB into XenForo most tables turned into mojibake again.

JREF isn't really a commercial venture (the ads barely bring in the server fees), so when the costs skyrocketed with no end in sight I threw in the towel.

Yes, the manual fixing will take a while, but it will eventually be accomplished.
 

LewiiG

先輩
Joined
Dec 27, 2012
Messages
221
Reaction score
13
My threads? I was looking back on them for memories and reference and such, but it turned out I couldn't really do that xD If that is not a reasonable request then it's fine; I will just be patient :)
 

Toritoribe

松葉解禁
Staff member
Moderator
Joined
Feb 22, 2008
Messages
15,618
Reaction score
2,361
I've checked and revised your 40 most recent posts. I'll do the rest next time. :)

EDIT:
Mission Accomplished!
 

thomas

Unswerving cyclist
Admin
Joined
Mar 14, 2002
Messages
9,321
Reaction score
1,053
Just a little update: in the past few days I have been devoting a lot of time to cleaning up 文字化け, but it's still a Herculean task. I would just like to remind you that if you would like us to put priority on specific threads please post them here.
 