I have written code that extracts the text from a PDF file with Python and the PyPDF2 library. The code works well for most documents, but sometimes it returns strange characters. I think that's because the PDF has a watermark over the page, so the text is not recognised. Here is the relevant part of the code:

    import requests
    # ... (rest of the setup not shown; read_pdf is the PyPDF2 reader)
    pdf_file_text = 'PDF File: ' + pdf_link + '\n\n'
    for page in range(read_pdf.getNumPages()):
        # ... (extraction of page_content for this page not shown)
        page_content = page_content.replace("\n\n\n", "\n").strip()

This is the result I'm getting: #$%˘˘

My question is: how can I fix this problem? Is there a way to remove the watermark from the page, or something like that? Or can this be fixed some other way, in case the watermark/logo is not the cause?

The garbled text you're getting has nothing to do with the watermark in the document.
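A minimal sketch of the per-page cleanup from the question, extended with a rough check that flags pages whose extracted text looks like the #$%˘˘ output above. It assumes you already have each page's text as a string (for example from PyPDF2's per-page text extraction). The helper names clean_page, looks_garbled and build_report, and the 0.3 threshold, are hypothetical and chosen only for illustration. The check only detects suspicious pages. It does not repair them; flagged pages would need a different extraction approach, for example OCR.

```python
def clean_page(page_content: str) -> str:
    """Collapse triple newlines and trim whitespace, as in the question's loop."""
    return page_content.replace("\n\n\n", "\n").strip()


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Heuristic: flag text where too many characters are not letters, digits or spaces.

    The 0.3 threshold is an arbitrary assumption; tune it on your own documents.
    """
    if not text:
        return False
    # str.isalnum() accepts letters and digits from any script, so non-Latin text is not flagged.
    ordinary = sum(1 for ch in text if ch.isalnum() or ch.isspace())
    return (len(text) - ordinary) / len(text) > threshold


def build_report(pdf_link: str, pages: list[str]) -> str:
    """Assemble output like the question's code, replacing suspicious pages with a marker."""
    pdf_file_text = "PDF File: " + pdf_link + "\n\n"
    for number, raw in enumerate(pages, start=1):
        page_content = clean_page(raw)
        if looks_garbled(page_content):
            page_content = "[page %d: extracted text looks garbled]" % number
        pdf_file_text += page_content + "\n\n"
    return pdf_file_text


if __name__ == "__main__":
    # Hypothetical input: one normal page and one page like the garbled result in the question.
    print(build_report("https://example.com/doc.pdf", ["Hello, world.\n\n\n", "#$%˘˘"]))
```

Marking a garbled page instead of silently dropping it keeps the page numbering visible, so you can see which pages need another approach.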