I have a Python script that lets the user select a folder containing PDF files and converts them to text files.
The script works perfectly when the content is not Arabic. I get this error:
Traceback (most recent call last):
  File "C:\Users\test\Downloads\pdf-txt\text maker.py", line 32, in <module>
    path=list[i]
IndexError: list index out of range
import os
from os import chdir, getcwd, listdir, path
import codecs
import pyPdf
from time import strftime

def check_path(prompt):
    ''' (str) -> str
    Verifies if the provided absolute path does exist.
    '''
    abs_path = raw_input(prompt)
    while path.exists(abs_path) != True:
        print "\nThe specified path does not exist.\n"
        abs_path = raw_input(prompt)
    return abs_path

print "\n"
folder = check_path("Provide absolute path for the folder: ")
list=[]
directory=folder
for root,dirs,files in os.walk(directory):
    for filename in files:
        if filename.endswith('.pdf'):
            t=os.path.join(directory,filename)
            list.append(t)
m=len(list)
i=0
while i<=len(list):
    path=list[i]
    head,tail=os.path.split(path)
    var="\\"
    tail=tail.replace(".pdf",".txt")
    name=head+var+tail
    content = ""
    # Load PDF into pyPDF
    ##pdf = pyPdf.PdfFileReader(file(path, "rb"))
    pdf = pyPdf.PdfFileReader(codecs.open(path, "rb", encoding='UTF-8'))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    print strftime("%H:%M:%S"), " pdf -> txt "
    f=open(name,'w')
    f.decode(content.encode('UTF-8'))
    ## f.write(content.encode("UTF-8"))
    f.write(content)
    f.close
The error can probably be fixed just by changing
while i<=len(list):
to:
while i<len(list):
because in Python the valid indices for a list with N elements are 0, 1, ..., N-1; trying to access element N raises an IndexError.
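A quick way to see the off-by-one is a small standalone check. This is only a sketch: the three-element list and the names `paths`, `visited`, and `past_end_ok` are made up for illustration and are not taken from the script.

```python
# Hypothetical stand-in for the list of PDF paths built by the script.
paths = ["a.pdf", "b.pdf", "c.pdf"]

# Valid indices are 0 .. len(paths) - 1, so the loop condition must be strict.
i = 0
visited = []
while i < len(paths):
    visited.append(paths[i])
    i += 1  # the counter has to advance, otherwise the loop never ends

# Index len(paths) is one past the end and raises IndexError.
try:
    paths[len(paths)]
    past_end_ok = True
except IndexError:
    past_end_ok = False
```

With `<=` in the condition, the last pass of the loop would evaluate `paths[3]`, which is exactly the failing access in the `try` block.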
If a list's last index is n, then the length of the list is n+1.
This means you do NOT want to access list[len(list)], i.e. index n+1, because that element does not exist!
I believe the only wrong line in your code is the while; it should be:
while i < len(list):
And not
while i <= len(list):
You do not want i to take the value len(list).
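Two further things in the posted script look worth checking even after that change: the while loop never increments i, and the inner `for i in range(...)` page loop overwrites the same counter. Iterating over the list directly avoids both. The sketch below is one possible restructuring, not a tested replacement for the script: `find_pdfs`, `txt_path_for`, and `convert_all` are hypothetical helper names, and `extract_text` is an assumed callable (PDF path in, unicode text out) standing in for the pyPdf page loop.

```python
import io
import os


def find_pdfs(folder):
    """Return full paths of all .pdf files under folder (hypothetical helper)."""
    found = []
    for root, dirs, files in os.walk(folder):
        for filename in files:
            if filename.lower().endswith(".pdf"):
                # Join with root, not the top folder, so files in subfolders resolve.
                found.append(os.path.join(root, filename))
    return found


def txt_path_for(pdf_path):
    """Swap the file extension for .txt (hypothetical helper)."""
    base, _ext = os.path.splitext(pdf_path)
    return base + ".txt"


def convert_all(folder, extract_text):
    """Write one .txt per PDF; extract_text is an assumed callable: path -> text."""
    written = []
    for pdf_path in find_pdfs(folder):  # no manual index, so no off-by-one
        content = extract_text(pdf_path)
        out_path = txt_path_for(pdf_path)
        # Encode on write; UTF-8 can represent Arabic characters.
        with io.open(out_path, "w", encoding="utf-8") as f:
            f.write(content)
        written.append(out_path)
    return written
```

In the original script, `extract_text` would wrap the page loop, ideally with its own loop variable (for example `page_no`) so it cannot clash with an outer counter. Whether `extractText()` returns usable Arabic text depends on the PDF itself; this sketch only makes sure that whatever text comes back is written out as UTF-8.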