[korean_project(1)] Parsing JSON data and saving it to a txt file


mihee 2023. 1. 3. 16:31

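I downloaded the Korean typeface image dataset from AI Hub and wanted to understand its data format.

However, the JSON file is too large to open comfortably in an editor, so I parsed it with Python instead.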
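Before loading everything, a quick way to get a feel for a file that is too big for an editor is to read only its first few hundred characters. The following is a minimal sketch of that idea; the helper name `peek` and the default of 300 characters are my own choices for illustration, and UTF-8 encoding is an assumption about the file.

```python
def peek(path, n_chars=300, encoding="utf-8"):
    """Return the first n_chars characters of a text file without reading the rest."""
    with open(path, encoding=encoding) as f:
        return f.read(n_chars)

# Example with the path used in this post:
# print(peek('./handwriting_data_info_clean.json'))
```

This only shows the start of the file, so it reveals the outer structure but not how the later sections look.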

1. Read the JSON file and check its keys

import json
import numpy as np

# Load the whole JSON file into memory (assuming it is UTF-8 encoded)
with open('./handwriting_data_info_clean.json', encoding='utf-8') as f:
    datas = json.load(f)

print(datas.keys())  # dict_keys(['info', 'images', 'annotations', 'licenses'])

Of these, the data I need is under annotations, so I will pull out only that part and take a look at it.
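To decide which key to dig into, it can help to print the type and size of each top-level value rather than the values themselves. Here is a rough sketch; `summarize` is a hypothetical helper, and the small dictionary below is a made-up stand-in for the real `datas` object.

```python
def summarize(data):
    """Map each top-level key to (type name, length or None)."""
    summary = {}
    for key, value in data.items():
        size = len(value) if isinstance(value, (list, dict, str)) else None
        summary[key] = (type(value).__name__, size)
    return summary

# Made-up stand-in data; with the real file, pass `datas` instead.
sample = {"info": {"name": "demo"}, "annotations": [{"id": "a"}, {"id": "b"}]}
for key, (kind, size) in summarize(sample).items():
    print(key, kind, size)
```

For a list-valued key such as annotations, printing only the first element (for example `datas['annotations'][0]`) is usually enough to see the fields of one record.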

2. Get annotations and store only the needed parts in a new list

The annotations data: of its fields, I decided to use only text and image_id.

annotations = datas['annotations']
# print(type(annotations))  # list
arr = []
for annotation in annotations:
    attribute = annotation['attributes']
    # Keep only records labeled as a single character (syllable)
    if attribute['type'] == '글자(음절)':
        # Store [image file name, label text]
        arr.append([annotation['image_id'] + '.png', annotation['text']])
new_arr = np.array(arr)
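The filter above keeps only records whose type is '글자(음절)'. To see which other type values exist and how many records each has, a counter over the annotations is one option. This is a sketch under the assumption that every record has the same `attributes` / `type` layout used above; `count_types` is a hypothetical helper, and the sample records are invented for the demo.

```python
from collections import Counter

def count_types(annotations):
    """Count how often each attributes['type'] value occurs (None if missing)."""
    return Counter(a.get('attributes', {}).get('type') for a in annotations)

# Invented sample records; with the real data, pass datas['annotations'].
sample = [
    {"image_id": "0001", "text": "가", "attributes": {"type": "글자(음절)"}},
    {"image_id": "0002", "text": "나", "attributes": {"type": "글자(음절)"}},
    {"image_id": "0003", "text": "other", "attributes": {"type": "something else"}},
]
print(count_types(sample))
```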

3. Save the array holding only the extracted data in txt format

# Write one "file_name text" pair per line, separated by a space
np.savetxt('./annotation.txt', new_arr, fmt='%s', delimiter=' ', encoding='utf-8')
print("-------------save done.---------")
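As a sanity check, the saved file can be read back into (file name, text) pairs. The sketch below assumes each line has the form `file_name text` with a single space after the file name, and that the file is readable as UTF-8 (pass a different `encoding` if it was written otherwise). `load_pairs` is a hypothetical helper, not part of NumPy.

```python
def load_pairs(path, encoding="utf-8"):
    """Read 'file_name text' lines back into a list of (file_name, text) tuples."""
    pairs = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the first space only, so the text part may contain spaces
            name, text = line.split(" ", 1)
            pairs.append((name, text))
    return pairs

# Example with the file written above:
# print(load_pairs('./annotation.txt')[:5])
```

A line without any space would raise a ValueError here, which is a reasonable signal that the file does not match the expected layout.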
