
How to rename Unicode Chinese files to Pinyin?

I googled and found no existing tools, which surprised me. The only useful piece of information I found was a mapping file that contains a Unicode-to-Pinyin table. So I had to do it myself: write a script that converts the Unicode Chinese file names to Pinyin using the mapping file.
Since I was doing the Python Challenge at the time, naturally I scripted something in Python to get the job done.

Here is why I needed it. I have an HDTV that can play music from a USB drive. When I wanted to play the songs I downloaded from the Voice of China, I ran into a problem. The file names of the songs contained many Unicode Chinese characters, and the TV apparently doesn’t support them: it doesn’t display those Chinese characters at all. For example:

01 04张玮 – High歌.mp3
05 09吉克隽逸 – I Fell Good.mp3

I can only see:
01 04 – High.mp3
05 09 – I Fell Good.mp3

Those are still readable, but the following ones are ridiculous:
11 11 – .mp3
11 12 – .mp3
11 13 – .mp3
11 14 – .mp3

I had no idea which song was which when I tried to choose one. Their actual file names are:
11 11大山 – 王妃.mp3
11 12王韵壹 – 你快乐所以我快乐.mp3
11 13金池 – 后知后觉.mp3
11 14吴莫愁 – 痒.mp3

I put the mapping file and the script in one folder and all the files to be renamed in a subfolder “VoC”, then just ran the script. In the end I got file names like these. They are not perfect, but I am able to tell what songs they are:
11 11 DaShan – WangFei.mp3
11 12 WangYunYi – NiKuaiLeSuoYiWoKuaiLe.mp3
11 13 JinChi – HouZhiHouJue.mp3
11 14 WuMoChou – Yang.mp3

I hope you find my solution helpful. Here is my Python script.

# renameCH2Pinyin.py
# Rename filename from Chinese characters to capitalized pinyin using the
# mapping file and taking out the tone numbers

import os
import re

# File uni2pinyin is a mapping from hex to Pinyin with a tone number
with open('uni2pinyin') as f: # the file is closed automatically
    wf = f.read() # read the whole mapping file

os.chdir('voc') # to rename all files in sub folder 'voc'
myulist = os.listdir(u'.') # read all file names in unicode mode
for x in myulist: # each file name
    filenamePY = ''
    for y in x: # each character
        if 0x4e00 <= ord(y) <= 0x9fff: # Chinese Character Unicode range
            hexCH = (hex(ord(y))[2:]).upper() # strip leading '0x' and change
                                              # to uppercase
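            p = re.compile(hexCH + r'\t([a-z]+)\d*') # define the match pattern
            mp = p.search(wf)
            if mp: # the character is in the mapping file
                filenamePY+=mp.group(1).title() # get the pinyin without the tone
                                                # number and capitalize it
            else:
                filenamePY+=y # no mapping found, keep the character as is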
        else:
            filenamePY+=y
    print x
    filename = filenamePY
    print filename
    os.rename(x, filename)
os.chdir('..') # go back to the parent folder

This is the link where I got the mapping file:

ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/data/Uni2Pinyin.gz
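
The script above searches the whole mapping text with a new regular expression for every character. A minimal sketch of an alternative, written for Python 3, parses the mapping into a dictionary once and then looks characters up. It assumes each data line has the form HEX, a tab, then the pinyin with a tone number, and it keeps only the first reading on a line. The names load_mapping and to_pinyin and the two sample lines are made up for illustration; they are not taken from the real mapping file.

```python
import re

def load_mapping(text):
    """Build {character: 'Pinyin'} from mapping text (assumed HEX<TAB>pinyin+tone)."""
    table = {}
    for line in text.splitlines():
        # Comment lines and anything not starting with a hex code are skipped.
        m = re.match(r'([0-9A-Fa-f]{4,5})\t([a-z]+)', line)
        if m:
            table[chr(int(m.group(1), 16))] = m.group(2).title()
    return table

def to_pinyin(name, table):
    """Replace every mapped character; leave everything else untouched."""
    return ''.join(table.get(ch, ch) for ch in name)

# Hypothetical sample in the assumed format (U+5927 and U+5C71).
sample = "# sample lines\n5927\tda4\n5C71\tshan1\n"
table = load_mapping(sample)
print(to_pinyin("11 11\u5927\u5c71.mp3", table))  # prints: 11 11DaShan.mp3
```

Characters missing from the table are simply kept, so a file name is never cut short by an unmapped character.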

Python Challenge – level 12 solution

The picture file name is evil1.jpg.
It led me to try evil2.jpg.
The URL evil2.jpg exists and it says “not jpg but .gfx”
The URL http://www.pythonchallenge.com/pc/return/evil2.gfx exists, and it gives us the binary file evil2.gfx.
evil2.gfx is 5 picture files mixed together byte by byte. I found this by looking at the binary file with a hex editor.
The following code separates the file back into the original pictures:

# file evil2.gfx is a combined file from 5 pictures:
# byte 0 belongs to picture 1, byte 1 to picture 2, and so on
f = open('evil2.gfx','rb')
g = f.read()
f.close()

for n in range(5):
    # 'wb' instead of 'ab', so running the script twice does not append
    out = open('evil2-%d.jpg' % (n + 1), 'wb')
    out.write(g[n::5]) # every 5th byte, starting at offset n
    out.close()

Each of the created files gives a piece of the answer:
dis – pro – port – ional – ity
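
The hex-editor check can also be scripted. The sketch below (Python 3) splits interleaved data and guesses the type of each piece from its first bytes, using the usual JPEG, PNG and GIF signatures. The helper names and the two tiny fake streams are hypothetical and only there to make the example self-contained; real data would come from evil2.gfx.

```python
def deinterleave(data, n):
    """Return n byte strings; piece k holds bytes k, k+n, k+2n, ..."""
    return [data[k::n] for k in range(n)]

def sniff_image_type(data):
    """Guess an image type from its leading signature bytes."""
    if data.startswith(b'\xff\xd8\xff'):
        return 'jpg'
    if data.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png'
    if data.startswith((b'GIF87a', b'GIF89a')):
        return 'gif'
    return 'unknown'

# Two fake 8-byte "files", interleaved by hand for the demo.
fake_png = b'\x89PNG\r\n\x1a\n'
fake_gif = b'GIF89a\x00\x00'
mixed = bytes(b for pair in zip(fake_png, fake_gif) for b in pair)

for piece in deinterleave(mixed, 2):
    print(sniff_image_type(piece))
```

A check like this would show whether every piece really is a JPEG before giving all of them a .jpg extension.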

Python Challenge – level 11 solution

The image is a mix of 2 pictures. The darker one has the answer.
The following code extracts the one that has the answer. The other picture is produced by the same code with the arguments of the 2 getpixel calls swapped.

from PIL import Image

im = Image.open('cave.jpg')
for j in range(im.size[1] // 2):
    for i in range(im.size[0] - 2):
        if i % 2 == 0:
            im.putpixel((i, j), im.getpixel((i, j*2)))
        else:
            # getpixel takes one (x, y) tuple, not two separate arguments
            im.putpixel((i, j), im.getpixel((i, j*2 + 1)))
im.save('cave3.jpg')
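
To make the pixel picking easier to follow, here is a small Python 3 sketch of the same idea on a plain grid of values instead of an image. It pulls both pictures out in one pass: even columns take from the even row for the first picture, odd columns from the odd row, and the second picture gets the leftovers. The function name split_checkerboard and the demo grid are made up for illustration.

```python
def split_checkerboard(grid):
    """Split a grid into two half-height grids, mirroring the loop above."""
    first, second = [], []
    for j in range(len(grid) // 2):
        row_a, row_b = [], []
        for i in range(len(grid[0])):
            if i % 2 == 0:
                row_a.append(grid[2 * j][i])      # even column: even row
                row_b.append(grid[2 * j + 1][i])
            else:
                row_a.append(grid[2 * j + 1][i])  # odd column: odd row
                row_b.append(grid[2 * j][i])
        first.append(row_a)
        second.append(row_b)
    return first, second

demo = [['A', 'b', 'A', 'b'],
        ['b', 'A', 'b', 'A']]
first, second = split_checkerboard(demo)
print(first)   # [['A', 'A', 'A', 'A']]
print(second)  # [['b', 'b', 'b', 'b']]
```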

Python Challenge – level 10 solution

I was lazy about this one. The code is borrowed from wiki:

def look_and_say(member):
    while True:
        yield member
        breakpoints = ([0] + [i for i in range(1, len(member)) 
                              if member[i - 1] != member[i]]
                        + [len(member)])
        groups = [member[breakpoints[i - 1]:breakpoints[i]]
                  for i in range(1, len(breakpoints))]
        member = ''.join(str(len(group)) + group[0] for group in groups)
 
# Print the 10-element sequence beginning with "1"
sequence = look_and_say("1")
# for i in range(10):
#    print sequence.next()
# The 2 lines above are the only code I changed, as below:
for i in range(31):
    print len(sequence.next())
>>>execfile('pc10.py')
1
2
2
4
6
6
8
10
14
20
26
34
46
62
78
102
134
176
226
302
408
528
678
904
1182
1540
2012
2606
3410
4462
5808
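
For comparison, a shorter way to get the same numbers, sketched in Python 3 with itertools.groupby. This is not the borrowed code, just an alternative that groups runs of equal digits; the function names next_term and term_lengths are my own.

```python
from itertools import groupby

def next_term(term):
    """Describe each run of equal digits as <run length><digit>."""
    return ''.join(str(len(list(run))) + digit for digit, run in groupby(term))

def term_lengths(count, start="1"):
    """Lengths of the first `count` terms of the sequence."""
    lengths = []
    term = start
    for _ in range(count):
        lengths.append(len(term))
        term = next_term(term)
    return lengths

print(term_lengths(31)[-1])  # 5808, the last number in the output above
```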

Python Challenge – level 9 solution

The solution is easy, but it took me some time to figure out how to use the data.

# draw lines based on the given coordinates

from PIL import Image, ImageDraw
t1 = [146,399,163,403,170,393,169,391,166,386,170,381,170,371,170,355,169,346,167,335,170,329,170,320,170,
310,171,301,173,290,178,289,182,287,188,286,190,286,192,291,194,296,195,305,194,307,191,312,190,316,
190,321,192,331,193,338,196,341,197,346,199,352,198,360,197,366,197,373,196,380,197,383,196,387,192,
389,191,392,190,396,189,400,194,401,201,402,208,403,213,402,216,401,219,397,219,393,216,390,215,385,
215,379,213,373,213,365,212,360,210,353,210,347,212,338,213,329,214,319,215,311,215,306,216,296,218,
290,221,283,225,282,233,284,238,287,243,290,250,291,255,294,261,293,265,291,271,291,273,289,278,287,
279,285,281,280,284,278,284,276,287,277,289,283,291,286,294,291,296,295,299,300,301,304,304,320,305,
327,306,332,307,341,306,349,303,354,301,364,301,371,297,375,292,384,291,386,302,393,324,391,333,387,
328,375,329,367,329,353,330,341,331,328,336,319,338,310,341,304,341,285,341,278,343,269,344,262,346,
259,346,251,349,259,349,264,349,273,349,280,349,288,349,295,349,298,354,293,356,286,354,279,352,268,
352,257,351,249,350,234,351,211,352,197,354,185,353,171,351,154,348,147,342,137,339,132,330,122,327,
120,314,116,304,117,293,118,284,118,281,122,275,128,265,129,257,131,244,133,239,134,228,136,221,137,
214,138,209,135,201,132,192,130,184,131,175,129,170,131,159,134,157,134,160,130,170,125,176,114,176,
102,173,103,172,108,171,111,163,115,156,116,149,117,142,116,136,115,129,115,124,115,120,115,115,117,
113,120,109,122,102,122,100,121,95,121,89,115,87,110,82,109,84,118,89,123,93,129,100,130,108,132,110,
133,110,136,107,138,105,140,95,138,86,141,79,149,77,155,81,162,90,165,97,167,99,171,109,171,107,161,
111,156,113,170,115,185,118,208,117,223,121,239,128,251,133,259,136,266,139,276,143,290,148,310,151,
332,155,348,156,353,153,366,149,379,147,394,146,399]

t2 = [156,141,165,135,169,131,176,130,187,134,191,140,191,146,186,150,179,155,175,157,168,157,163,157,159,
157,158,164,159,175,159,181,157,191,154,197,153,205,153,210,152,212,147,215,146,218,143,220,132,220,
125,217,119,209,116,196,115,185,114,172,114,167,112,161,109,165,107,170,99,171,97,167,89,164,81,162,
77,155,81,148,87,140,96,138,105,141,110,136,111,126,113,129,118,117,128,114,137,115,146,114,155,115,
158,121,157,128,156,134,157,136,156,136]

im = Image.open('good.jpg')

draw = ImageDraw.Draw(im)
for i in range(0,len(t1)-2,2): # len-2 so the last segment is drawn too
    draw.line((t1[i],t1[i+1],t1[i+2],t1[i+3]), fill = 128, width = 3)
for i in range(0,len(t2)-2,2):
    draw.line((t2[i],t2[i+1],t2[i+2],t2[i+3]), fill = 128, width = 2)
im.save('good2.jpg')

The script draws the outline of a bull on the original picture, then saves it as good2.jpg.
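
The part that took me time was reading the flat list as x, y, x, y, ... pairs. A minimal Python 3 sketch of that step, without any drawing, is below. The helper names to_points and to_segments and the short demo list are made up for illustration.

```python
def to_points(flat):
    """Turn [x0, y0, x1, y1, ...] into [(x0, y0), (x1, y1), ...]."""
    return list(zip(flat[0::2], flat[1::2]))

def to_segments(flat):
    """Pair each point with the next one: one entry per line to draw."""
    points = to_points(flat)
    return list(zip(points, points[1:]))

demo = [0, 0, 10, 0, 10, 10, 0, 0]  # a closed triangle
print(to_points(demo))
print(len(to_segments(demo)))  # 3 segments for 4 points
```

N points always give N - 1 segments, which is an easy way to check the loop bounds.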

Python Challenge – level 8 solution

This one is relatively easy, perhaps because not much scripting is needed.
The riddle designer gave a hint on the forum saying “Look at the password box hint”. Clicking the insect opens a dialog with a hint saying “inflate”. Someone on the forum also mentioned the sound made by the insect. So it has to be bz2 and decompress.

>>> import bz2
>>> un = 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00\x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
>>> pw = 'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'
>>> print bz2.decompress(un)
huge
>>> print bz2.decompress(pw)
file
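
Another clue is the way both strings begin. A short Python 3 sketch shows that data compressed with bz2 starts with the same bytes, so the prefix alone is enough to recognize the format. The helper looks_like_bz2 is a made-up name, and the sketch compresses its own test data rather than reusing the strings from the page.

```python
import bz2

def looks_like_bz2(data):
    """A bz2 stream starts with the bytes 'BZh'."""
    return data[:3] == b'BZh'

packed = bz2.compress(b'huge')    # default compression level is 9
print(packed[:10])                # header, level digit, block marker
print(looks_like_bz2(packed))     # True
print(bz2.decompress(packed))     # b'huge'
```

Note that in Python 3 the data has to be bytes (a b'...' literal), not a str.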

Python Challenge – level 7 solution

It took me 2 days. I have no idea what the forum hint2: (i * 7, 43) means. It was misleading, and I wasted a lot of time thinking about it. The solution, at least mine, has nothing to do with it.

import string
from PIL import Image
im = Image.open("oxygen.png")
m = im.getdata()
i = 0
y = m[0][0]
s = ""
for x in m:
    if (x[0] == x[1] == x[2]) and (y == x[0]): #a grey pixel and same as last pixel
        i+=1
    else:
        # if i > 0: print i
        # for some reason, "print s" showing nothing
        # therefore using string.printable to solve it
        if chr(y) in (string.printable):
            s+=chr(y)*((i+2)/6)  # make every dot 6 pixels wide
                                 # and show first only 4 pixels wide 's' dot 
        i = 0
        y = x[0]
print s
>>> execfile('pc7-1.py')
smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]

Then copy the list of numbers into a second script to get the final answer:

import string
s = ""
t = 105, 110, 116, 101, 103, 114, 105, 116, 121

for i in range(len(t)):
    if chr(t[i]) in string.printable:
        s+=chr(t[i])
print s
>>> execfile('pc7-2.py')
integrity
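
The copy-and-paste step can be skipped by pulling the numbers out of the message directly. A minimal Python 3 sketch, assuming the message contains one bracketed list of character codes as in the output above; the function name decode_bracketed_codes is made up.

```python
import re

def decode_bracketed_codes(message):
    """Find the first [...] list of numbers and turn it into text."""
    inside = re.search(r'\[([^\]]*)\]', message).group(1)
    return ''.join(chr(int(n)) for n in re.findall(r'\d+', inside))

message = ("smart guy, you made it. the next level is "
           "[105, 110, 116, 101, 103, 114, 105, 116, 121]")
print(decode_bracketed_codes(message))  # integrity
```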