Stack Overflow: How to fix the error displayed on the Python shell?
[-4] [2] Georges
[2017-12-11 12:00:02]
[ python pdf text arabic ]
[ https://stackoverflow.com/questions/47752488/how-to-fix-the-error-displayed-on-python-shell ]

I have a Python script that asks the user for the path of a folder containing PDF files and converts each PDF to a text file.

The script works perfectly when the content is not Arabic.

Error displayed:

Traceback (most recent call last):
  File "C:\Users\test\Downloads\pdf-txt\text maker.py", line 32, in <module>
    path=list[i]
IndexError: list index out of range

code:

import os
from os import chdir, getcwd, listdir, path
import codecs
import pyPdf
from time import strftime

def check_path(prompt):
    ''' (str) -> str
    Verifies if the provided absolute path does exist.
    '''
    abs_path = raw_input(prompt)
    while path.exists(abs_path) != True:
        print "\nThe specified path does not exist.\n"
        abs_path = raw_input(prompt)
    return abs_path    

print "\n"

folder = check_path("Provide absolute path for the folder: ")

list=[]
directory=folder
for root,dirs,files in os.walk(directory):
    for filename in files:
        if filename.endswith('.pdf'):
            t=os.path.join(directory,filename)
            list.append(t)

m=len(list)
i=0
while i<=len(list):
    path=list[i]
    head,tail=os.path.split(path)
    var="\\"

    tail=tail.replace(".pdf",".txt")
    name=head+var+tail



    content = ""
    # Load PDF into pyPDF
    ##pdf = pyPdf.PdfFileReader(file(path, "rb"))
    pdf = pyPdf.PdfFileReader(codecs.open(path, "rb", encoding='UTF-8'))


    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    print strftime("%H:%M:%S"), " pdf  -> txt "
    f=open(name,'w')
    f.decode(content.encode('UTF-8'))
   ## f.write(content.encode("UTF-8"))
    f.write(content)
    f.close
[+1] [2017-12-11 13:08:58] fabiob

The error can probably be fixed by changing

while i<=len(list):

to:

while i<len(list):

because in Python the valid indices for a list with N elements are 0, 1, ..., N-1. Trying to access element N raises an IndexError.
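
As a minimal sketch of the point above (the file names are hypothetical and stand in for the paths the script collects), the loop below stops before the index reaches the length of the list, and a separate check shows that the index one past the end raises IndexError. It is written so that it runs on Python 3, whereas the script in the question is Python 2.

```python
# Hypothetical stand-in for the list of PDF paths the script collects.
paths = ["a.pdf", "b.pdf", "c.pdf"]

# Valid indices run from 0 to len(paths) - 1, so the loop must stop
# before i reaches len(paths).
visited = []
i = 0
while i < len(paths):
    visited.append(paths[i])
    i += 1

# Index len(paths) is one past the end and raises IndexError.
try:
    paths[len(paths)]
    past_end_error = None
except IndexError as exc:
    past_end_error = str(exc)

print(visited)
print(past_end_error)
```

With `<=` in the loop condition, the final pass would perform exactly the out-of-range access that the try block isolates here.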


I fixed this line, but now I get another error: Traceback (most recent call last): File "C:\Users\test\Downloads\pdf-txt\text maker.py", line 33, in <module> path=list[i] IndexError: list index out of range - Georges
[0] [2017-12-11 13:09:03] oldabl

If a list's last index is n, then the length of the list is n+1. This means you must never access list[len(list)], that is, index n+1, because that element does not exist.

I believe the only wrong line in your code is the while condition. It should be:

while i < len(list):

And not

while i <= len(list):

You do not want i to take the value len(list).


I fixed this line, but now I get another error: Traceback (most recent call last): File "C:\Users\test\Downloads\pdf-txt\text maker.py", line 33, in <module> path=list[i] IndexError: list index out of range - Georges
That's because the for loop inside your while loop reuses the same variable i, so the page counter overwrites the file index. Use a different variable, such as j, in the for loop. You also need to increment i on every pass of the while loop; otherwise it will keep processing the same list entry. - oldabl
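
The two changes suggested in the comment above can be sketched as follows. This is only an illustration under stated assumptions: each "PDF" is a hypothetical list of page strings rather than a real pyPdf reader, and `extract_all` is an invented helper name. The outer index i is advanced once per document, and the inner loop uses its own variable j. It is written so that it runs on Python 3, whereas the script in the question is Python 2.

```python
# Hypothetical data: each document is a list of page strings,
# standing in for the pages a PDF reader would return.
pdfs = [["p1", "p2", "p3"], ["q1"], ["r1", "r2"]]


def extract_all(documents):
    """Concatenate the pages of every document, one string per document."""
    results = []
    i = 0
    while i < len(documents):
        content = ""
        # The inner loop uses j, so it cannot overwrite the outer index i.
        for j in range(len(documents[i])):
            content += documents[i][j] + "\n"
        results.append(content)
        i += 1  # advance to the next document on every pass
    return results


for text in extract_all(pdfs):
    print(repr(text))
```

Iterating directly with `for document in documents:` would remove the manual index altogether, which is usually the simpler choice when the index itself is not needed.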