
Mojibake (文字化け) fixing - info and requests

nekojita

先輩
14 Jan 2009
Dear all:

I now have access to a pre-move backup of the forums, and am (slowly) trying to tackle all the mojibake (this stuff: 文字化け) in the Learning Japanese sub-forum.

For the moment, I'm just working backwards in time. Since this is a manual process it will be quite slow. Therefore, if there are any old threads/posts you would particularly like restored, drop me a note here and I'll put them at the top of the list.
 
Thanks @nekojita !

I'm proceeding in the same fashion in the 一般的なフォーラム section and will join your efforts in the Learning Japanese subsection once that's accomplished.
 
Since this is a manual process it will be quite slow.
I don't exactly understand the technical details of what has gone wrong, but is there really no way that this conversion can be automated? Are you having to retype content by hand? It sounds a complete nightmare!
 
My understanding is that there's no automatic way.

I have access to the backups, so what I do is get the old thread and copy-paste it lock, stock, and barrel. (I can go directly to a post via its number in the URL, so this bit is easy.)

For whatever reason, often only the first post or just the title is mucked up, so it's not as bad as it could be.
 
Do you know how many posts in total are messed up?
 
From what I've seen so far, nearly every single thread in the Learning Japanese sub-forum that had some kanji/kana in it has at least one mojibake-d post. Weirdly, sometimes if somebody had quoted the person, the quoted portion is fine (and sometimes a post that was fine turned into mojibake when quoted).
 
From what I've seen so far, nearly every single thread in the Learning Japanese sub-forum that had some kanji/kana in it has at least one mojibake-d post.
TBH, then, doing this by hand seems a bit crazy, unless I am missing something. If you can locate the uncorrupted posts in a backup and manually copy and paste them to the live system, then surely a program can be written to do that automatically.
 
If you can locate the uncorrupted posts in a backup and manually copy and paste them to the live system, then surely a program can be written to do that automatically.

Last year we were trying to do exactly that. I invested three months and a not insignificant amount of money into converting the DB, with very modest success. Automatising the conversion often resulted in more gibberish, hence the decision to do it manually - even if it may take a while.
 
Yet the conversion happens reliably when you copy and paste manually as Nekojita described? What component is it that does the reliable conversion in that case, that cannot be incorporated into an automated process? I am asking out of curiosity, you understand. I am not doubting your expertise.
 
Oh, I'm claiming no expertise at all.

What I did was consult all the resources available on DB conversion and experiment on a backup. When the results were not satisfactory, I hired a developer. He actually managed to clean a large portion of the vB tables, but far from all of them. It would still have required a lot of manual clean-up; however, on importing the DB into XenForo, most tables turned into mojibake again.

JREF isn't really a commercial venture (the ads barely bring in the server fees), so when the costs skyrocketed with no end in sight I threw in the towel.

Yes, the manual fixing will take a while, but it will eventually be accomplished.
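
For anyone curious why an automated pass can make things worse, here is a minimal Python sketch. It assumes one common cause of mojibake, UTF-8 bytes being read back as Latin-1; the thread does not say which encodings were actually involved in the vB to XenForo move, and the function names are made up for illustration. It shows that a single repair pass fixes text corrupted once, leaves clean text alone, but still leaves gibberish behind if a row was corrupted twice.

```python
def make_mojibake(text: str) -> str:
    """Simulate the ASSUMED corruption: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def try_repair(text: str) -> str:
    """Try to reverse one round of the assumed corruption.

    If the text cannot be mapped back to bytes and re-read as UTF-8,
    it is returned unchanged (it was probably fine already).
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "文字化け"
broken_once = make_mojibake(original)      # corrupted by one bad conversion
broken_twice = make_mojibake(broken_once)  # corrupted again by a second one

# One pass repairs a singly corrupted post and leaves clean text untouched.
repaired = try_repair(broken_once)
untouched = try_repair(original)

# One pass over a doubly corrupted post only gets back to the first
# layer of gibberish; a second pass is needed to recover the text.
still_broken = try_repair(broken_twice)
fully_repaired = try_repair(still_broken)
```

In a real database the rows are a mix of clean, once-corrupted, and possibly twice-corrupted text, and a script cannot always tell which is which, which is consistent with the mixed results described above.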
 
My threads? I was looking back on them for memories and reference and such but it turned out I couldn't really do that xD if that is not a reasonable request then it's fine; I will just be patient :)
 
I've checked and revised your 40 most recent posts. I'll do the rest next time. :)

EDIT:
Mission Accomplished!
 
Just a little update: in the past few days I have been devoting a lot of time to cleaning up 文字化け, but it's still a Herculean task. I would just like to remind you that if you would like us to put priority on specific threads please post them here.
 
I just realised that this thread hasn't been updated in four years.

I want to point out that, in a massive and time-consuming effort, we cleaned up the 日本語 forum as well as some of the more important threads in other subfora about three years ago. If you chance upon any mojibake threads, please do report them by contacting me via PC.

Thank you. :)
 