[Python] How to increase OCR efficiency with image pre-processing of text documents?

Stack · September 13, 2024

I want to write a Python script with a standardised pre-processing step that can be applied to all documents. FYI: I work with legal documents, and some of them are in even worse condition than the uploaded sample.

def process(self, images: list[Image]):
    # global page_image
    for page_index, image in enumerate(images, start=1):
        image_np = self.convert_pil_to_np(image)
        image_np = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)

        gray = cv2.cvtColor(image_np, cv2.COLOR_BGR2GRAY)

        blur1 = cv2.medianBlur(gray, 3)
        # display_image('blur image', blur1)

        thresh = cv2.threshold(blur1, 200, 255, cv2.THRESH_BINARY)[1]
        # display_image('simple thresh image', thresh)

        thresh1 = cv2.adaptiveThreshold(thresh, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                        cv2.THRESH_BINARY_INV, 11, 2)

        blur2 = cv2.medianBlur(thresh1, 3)

        # display_image('blur2 image', thresh1)

        edges = cv2.Canny(thresh1, 100, 200)
        # display_image('edges image', edges)

        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 5))

        # morph = cv2.morphologyEx(blur2, cv2.MORPH_DILATE, kernel, iterations=1)
        morph1 = cv2.morphologyEx(blur2, cv2.MORPH_CLOSE, kernel, iterations=1)
        contour = cv2.findContours(morph1, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[0]
        # display_image('morph image', morph1)
        image_np2 = image_np.copy()

        cv2.drawContours(image_np2, contour, -1, (0, 255, 0), 2, cv2.LINE_AA)
        gray1 = cv2.cvtColor(image_np2, cv2.COLOR_BGR2GRAY)

        # display_image("contour", image_np2)
        final = cv2.bitwise_and(morph1, gray1)
        display_image("contour", final)

I wrote this script after a lot of research, but it doesn't work.
