Extract data from PDF and all Microsoft Office files in Python

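Install slate and pdfminer, which slate uses under the hood:

$ pip install slate
$ pip install pdfminer

import slate

# slate.PDF reads the file and returns a list with one string of text per page
with open('sample.pdf', 'rb') as f:
    pdf_text = slate.PDF(f)

print(pdf_text)

Output: ['Sample text...', '......', '......']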
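Because slate.PDF gives you a list with one string per page, as the output above shows, you will usually post-process it before using the text. The sketch below assumes you want whitespace normalised and the pages joined into a single string. `clean_pages` and `pages_to_text` are hypothetical helpers, not part of slate, and the sample page list is made up for illustration.

```python
import re
from typing import List, Sequence


def clean_pages(pages: Sequence[str]) -> List[str]:
    """Collapse whitespace runs in each page and drop pages that end up empty."""
    cleaned = []
    for page in pages:
        # Extracted page text often contains line breaks and form feeds (\x0c);
        # squash every whitespace run into a single space.
        text = re.sub(r"\s+", " ", page).strip()
        if text:
            cleaned.append(text)
    return cleaned


def pages_to_text(pages: Sequence[str], separator: str = "\n\n") -> str:
    """Join the cleaned pages into one string, separated by a blank line."""
    return separator.join(clean_pages(pages))


if __name__ == "__main__":
    # Made-up stand-in for the list that slate.PDF(f) returns
    pages = ["Sample   text\nfrom page one.\x0c", "   \x0c", "Page two\ttext."]
    print(len(clean_pages(pages)))  # → 2
    print(pages_to_text(pages))
```

With slate you would pass its result straight in, e.g. `pages_to_text(slate.PDF(f))` inside the `with` block. Collapsing all whitespace suits search and indexing; drop the `re.sub` call if you need to keep the original line breaks.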
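For a password-protected PDF, pass the password as the second argument:

import slate

with open('test_doc.pdf', 'rb') as f:
    # The second argument is the password used to decrypt the file
    pdf_text = slate.PDF(f, "pass the PDF file password here")

print(pdf_text)

Output: ['Sample text...', '......', '......']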
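If you process a mix of files, it can help to decide up front whether a PDF needs a password at all. A rough sketch, assuming a simple byte scan is good enough: an encrypted PDF declares an `/Encrypt` entry in its trailer (or cross-reference stream) dictionary, so its presence is a strong hint. `looks_encrypted` and `extra_args` are hypothetical helpers, and the check can give a false positive if the literal text `/Encrypt` happens to appear in an uncompressed content stream.

```python
from typing import Optional, Tuple


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: encrypted PDFs carry an /Encrypt key in the trailer or
    cross-reference stream dictionary, which is stored as plain text."""
    return b"/Encrypt" in pdf_bytes


def extra_args(pdf_bytes: bytes, password: Optional[str]) -> Tuple[str, ...]:
    """Build the optional positional arguments for the extractor call.

    Returns (password,) when the file looks encrypted, and () otherwise so
    the extractor keeps its own default.
    """
    if not looks_encrypted(pdf_bytes):
        return ()
    if password is None:
        # Hypothetical policy: fail early instead of letting extraction fail later
        raise ValueError("PDF appears to be encrypted but no password was given")
    return (password,)


if __name__ == "__main__":
    sample = b"%PDF-1.4\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF"
    print(looks_encrypted(sample))       # → True
    print(extra_args(sample, "secret"))  # → ('secret',)
```

To combine it with slate, read the bytes with `data = f.read()`, rewind with `f.seek(0)`, then call `slate.PDF(f, *extra_args(data, password))`, so a password is only passed for files that look encrypted. Some PDFs are encrypted with an empty user password; pass `""` rather than `None` for those.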

This article was originally published on the MicroPyramid blog.

Python, Django, Android and IOS, reactjs, react-native, AWS, Salesforce consulting & development company