Extract data from PDF and all Microsoft Office files in Python

$ pip install slate
$ pip install pdfminer
import slate

# Open the file in binary mode; slate.PDF extracts the text of each page.
with open('sample.pdf', 'rb') as f:
    pdf_text = slate.PDF(f)

print(pdf_text)
Output: ['Sample text...', '......', '......']
import slate

# For a password-protected PDF, pass the password as the second argument.
with open('test_doc.pdf', 'rb') as f:
    pdf_text = slate.PDF(f, "pass the PDF file password here")

print(pdf_text)
Output: ['Sample text...', '......', '......']
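The output above is a list with one string per page, and the raw text often carries stray newlines and runs of spaces. A minimal sketch of a post-processing step follows. The helper names `clean_pages` and `extract_pages` are hypothetical and not part of slate; the only slate call used is `slate.PDF(f)` / `slate.PDF(f, password)` as shown above.

```python
import re


def clean_pages(pages):
    """Collapse whitespace runs in each page string and drop empty pages.

    `pages` is any iterable of strings, such as the per-page list
    shown in the output above. (Hypothetical helper, not part of slate.)
    """
    cleaned = []
    for page in pages:
        # Replace any run of spaces, tabs, or newlines with a single space.
        text = re.sub(r"\s+", " ", page).strip()
        if text:
            cleaned.append(text)
    return cleaned


def extract_pages(path, password=None):
    """Read a PDF with slate and return its cleaned per-page text.

    Assumes slate is installed; it is imported here so that
    clean_pages can be used without it. (Hypothetical helper.)
    """
    import slate

    with open(path, 'rb') as f:
        if password is None:
            pages = slate.PDF(f)
        else:
            pages = slate.PDF(f, password)
    return clean_pages(pages)
```

With slate installed, `extract_pages('sample.pdf')` would return the cleaned page strings, and `extract_pages('test_doc.pdf', 'secret')` would do the same for a protected file. Both file names and the password here are placeholders.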

This article was originally published on the MicroPyramid blog.
