[Python] How to get consistent results in tabular PDF parsing with llama-parse?

Stack · September 13, 2024

I was parsing some PDF files with llama-parse in Python, using the code below:

import os
import pandas as pd

import nest_asyncio
nest_asyncio.apply()

os.environ["LLAMA_CLOUD_API_KEY"] = "some_key_id"
key_input = "some_key_id"

from llama_parse import LlamaParse

# Run the llama-parse job and request markdown output
doc_parsed = LlamaParse(
    result_type="markdown",
    api_key=key_input,
).load_data(r"Path\myfile.pdf")

The result of parsing the same document is now different from what the same code produced earlier. The difference is in the | separators and the line breaks used to separate tabular text.

Is there a way to get the old results back from llama-parse, or to fix some parameters (for example, the model) so that it always produces the same output? I want to build analytics on top of this output using the same code logic.

Last month's llama results:

print(doc_parsed[5].text[:1000])

# Information

|Name|: Mr. XXX|
|---|---|
|Age/Sex|: XX YRS/M|
|Lab Id.|: 0124080X|
|Refered By|: Self|
|Sample Collection On|: 03/Aug/2024 08:30AM|
|Collected By|: XXX|
|Sample Lab Rec. On|: 03/Aug/2024 11:50 AM|
|Collection Mode|: HOME COLLECTION|
|Reporting On|: 03/Aug/2024 02:48 PM|
|BarCode|: XXX|

# Test Results

|Test Name|Result|Biological Ref. Int.|Unit|
|---|---|---|---|

Llama results on the same PDF now:

print(doc_parsed[5].text[:1000])

# Report

Name: Mr. XXX

Age/Sex: XXX YRS/M

Lab Id: 0124080X

Referred By: Self

Sample Collection On: 03/Aug/2024 08:30 AM

Collected By: XXX

Sample Lab Rec. On: 03/Aug/2024 11:50 AM

Collection Mode: HOME COLLECTION

Reporting On: 03/Aug/2024 02:48 PM

BarCode: XXX

# Test Results

Test Name
Result
Biological Ref. Int.
Unit

Desired Results:

# The part above doesn't matter, but the Test Results rows should be separated by |
# Test Results

|Test Name|Result|Biological Ref. Int.|Unit|
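
To illustrate why the | separators matter for analytics, here is a minimal sketch of how pipe-separated rows like the one above could be split into cells. It is plain Python and does not use llama-parse; the helper names `split_table_row` and `extract_table_rows` are hypothetical and chosen only for this example.

```python
def split_table_row(line):
    """Split one markdown table row such as '|a|b|' into a list of cells.

    Returns None for lines that are not pipe-delimited rows, and for
    separator rows such as '|---|---|'.
    """
    stripped = line.strip()
    if not (stripped.startswith("|") and stripped.endswith("|")):
        return None  # not a pipe-delimited row
    cells = [cell.strip() for cell in stripped[1:-1].split("|")]
    # Skip markdown separator rows made only of dashes and colons
    if all(cell and set(cell) <= set("-:") for cell in cells):
        return None
    return cells


def extract_table_rows(markdown_text):
    """Collect the cells of every pipe-delimited row in a markdown string."""
    rows = []
    for line in markdown_text.splitlines():
        cells = split_table_row(line)
        if cells is not None:
            rows.append(cells)
    return rows


# Hypothetical sample mirroring the desired output shown above
sample = "# Test Results\n\n|Test Name|Result|Biological Ref. Int.|Unit|\n|---|---|---|---|\n"
print(extract_table_rows(sample))
```

With the newer output shape, where each header cell sits on its own line without | characters, a helper like this finds no rows at all, which is why the change breaks logic built on the old format.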

Has the model behind the service changed, causing this difference? Can I pin the model to get consistent results?
