
Mojibake (文字化け) fixing - info and requests

Joined
Jan 14, 2009
Messages
1,660
Ratings
393
Dear all:

I now have access to a pre-move backup of the forums, and am (slowly) trying to tackle all the mojibake (this stuff: 文字化け) in the Learning Japanese sub-forum.

For the moment, I'm just working backwards in time. Since this is a manual process it will be quite slow. Therefore, if there are any old threads/posts you would particularly like restored, drop me a note here and I'll put them at the top of the list.
 

thomas

Unswerving cyclist
Admin
Joined
Mar 14, 2002
Messages
8,771
Ratings
1 762
Thanks @nekojita !

I'm proceeding in the same fashion in the 一般的なフォーラム section and will join your efforts in the Learning Japanese subsection once that's accomplished.
 
Joined
Jun 8, 2010
Messages
2,431
Ratings
21
Since this is a manual process it will be quite slow.
I don't exactly understand the technical details of what has gone wrong, but is there really no way that this conversion can be automated? Are you having to retype content by hand? It sounds a complete nightmare!
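For readers with the same question, here is a minimal sketch of how mojibake commonly arises. The thread does not say which encodings were involved in the forum move, so the UTF-8 and Latin-1 pairing and the helper name `read_with_wrong_encoding` are assumptions made purely for illustration.

```python
def read_with_wrong_encoding(text: str) -> str:
    """Store text as UTF-8 bytes, then read the bytes back as Latin-1.

    The encoding pair is an assumption for illustration; the real
    migration may have involved different character sets.
    """
    stored_bytes = text.encode("utf-8")
    return stored_bytes.decode("latin-1")


original = "文字化け"

# Same encoding in both directions: the text survives.
assert original.encode("utf-8").decode("utf-8") == original

# Wrong encoding on the way back: every byte becomes its own character.
garbled = read_with_wrong_encoding(original)
print(repr(garbled))
```

Each kana or kanji takes three bytes in UTF-8, so the four characters come back as twelve unrelated ones. As long as none of those bytes are lost along the way, the mistake can in principle be undone by reversing the two steps.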
 
Joined
Jan 14, 2009
Messages
1,660
Ratings
393
My understanding is that there's no automatic way.

I have access to the backups, so what I do is get the old thread, and copy-paste lock, stock, and barrel. (I can directly go via post number in the url so this bit is easy).

For whatever reason, often only the first post or just the title is mucked up, so it's not as bad as it could be.
 
Joined
Jun 8, 2010
Messages
2,431
Ratings
21
My understanding is that there's no automatic way.

I have access to the backups, so what I do is get the old thread, and copy-paste lock, stock, and barrel. (I can directly go via post number in the url so this bit is easy).

For whatever reason, often only the first post or just the title is mucked up, so it's not as bad as it could be.
Do you know how many posts in total are messed up?
 
Joined
Jan 14, 2009
Messages
1,660
Ratings
393
From what I've seen so far, nearly every single thread in the Learning Japanese sub-forum that had some kanji/kana in it has at least one post mojibake-d. Weirdly, sometimes if somebody had quoted the person, the quoted portion is fine (and sometimes, a post that was fine turned into mojibake when quoted).
 
Joined
Jun 8, 2010
Messages
2,431
Ratings
21
From what I've seen so far, nearly every single thread in the Learning Japanese sub-forum that had some kanji/kana in it has at least one post mojibake-d.
TBH, then, doing this by hand seems a bit crazy, unless I am missing something. If you can locate the uncorrupted posts in a backup and manually copy and paste them to the live system, then surely a program can be written to do that automatically.
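The idea in this post could be sketched roughly as follows. Everything here is hypothetical: the dictionaries keyed by post number, the function names, and the assumption that both the backup and the live text are already available as correctly decoded strings, which may well be the hard part in practice.

```python
from __future__ import annotations


def contains_japanese(text: str) -> bool:
    """True if the text has any hiragana, katakana, or common kanji."""
    return any(
        "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"
        for ch in text
    )


def posts_to_restore(backup: dict[int, str], live: dict[int, str]) -> list[int]:
    """List post numbers whose backup copy has Japanese text and differs from live.

    Both mappings are hypothetical stand-ins for the real databases.
    """
    return sorted(
        post_id
        for post_id, old_text in backup.items()
        if post_id in live
        and contains_japanese(old_text)
        and live[post_id] != old_text
    )
```

A list like this would only be a set of candidates for review. A post that its author edited after the backup was taken would be flagged too, so overwriting the live text blindly could lose genuine changes.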
 

thomas

Unswerving cyclist
Admin
Joined
Mar 14, 2002
Messages
8,771
Ratings
1 762
If you can locate the uncorrupted posts in a backup and manually copy and paste them to the live system, then surely a program can be written to do that automatically.
Last year we were trying to do exactly that. I invested three months and a not insignificant amount of money into converting the DB, with very modest success. Automatising the conversion often resulted in more gibberish, hence the decision to do it manually - even if it may take a while.
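One plausible way an automated pass can produce more gibberish, sketched under the assumption that the damage is UTF-8 text read as Latin-1 (the thread does not confirm this). The functions `garble` and `try_repair` are illustrative names, not part of any forum or database tool.

```python
def garble(text: str) -> str:
    """Simulate the assumed damage: UTF-8 bytes read back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def try_repair(text: str) -> str:
    """Reverse one layer of the assumed damage, or return the text unchanged.

    If the reversal is not clean, the text is left alone rather than
    replaced with a guess.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "文字化け"
once = garble(original)

# A blanket conversion run over text that is already garbled adds a
# second layer of damage instead of removing the first.
twice = garble(once)

# If even one byte was replaced along the way, the reversal fails and
# the guarded function leaves the text as it found it.
damaged = once.replace("\x81", "?")
```

Here `try_repair(once)` recovers the original, and correct Japanese text passes through untouched because it cannot be encoded as Latin-1. Text like `twice` needs two passes, and text like `damaged` cannot be recovered this way at all, which is where a clean backup copy becomes the only reliable source.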
 
Joined
Jun 8, 2010
Messages
2,431
Ratings
21
Last year we were trying to do exactly that. I invested three months and a not insignificant amount of money into converting the DB, with very modest success. Automatising the conversion often resulted in more gibberish, hence the decision to do it manually - even if it may take a while.
Yet the conversion happens reliably when you copy and paste manually as Nekojita described? What component is it that does the reliable conversion in that case, that cannot be incorporated into an automated process? I am asking out of curiosity, you understand. I am not doubting your expertise.
 

thomas

Unswerving cyclist
Admin
Joined
Mar 14, 2002
Messages
8,771
Ratings
1 762
Oh, I'm claiming no expertise at all.

What I did was consult all the resources available on DB conversion and experiment on a backup. When the results were not satisfactory, I hired a developer. He actually managed to clean a large portion of the vB tables, but far from all. It would still have required a lot of manual clean-up; however, on importing the DB into XenForo most tables turned into mojibake again.

JREF isn't really a commercial venture (the ads barely bring in the server fees), so when the costs skyrocketed with no end in sight I threw in the towel.

Yes, the manual fixing will take a while, but it will eventually be accomplished.
 
Joined
Dec 27, 2012
Messages
221
Ratings
14
My threads? I was looking back on them for memories and reference and such but it turned out I couldn't really do that xD if that is not a reasonable request then it's fine; I will just be patient :)
 

Toritoribe

松葉解禁
Staff member
Moderator
Joined
Feb 22, 2008
Messages
14,830
Ratings
2 1,537
I've checked and revised your 40 most recent posts. I'll do the rest next time. :)

EDIT:
Mission Accomplished!
 

thomas

Unswerving cyclist
Admin
Joined
Mar 14, 2002
Messages
8,771
Ratings
1 762
Just a little update: in the past few days I have been devoting a lot of time to cleaning up 文字化け, but it's still a Herculean task. I would just like to remind you that if you would like us to put priority on specific threads please post them here.
 