How to Extract Text & Images Easily from MS Office Files
You may need to extract images or text from an MS Word or MS PowerPoint file. Doing this by hand usually means copying and pasting one page at a time, which takes a long time with very large files.
Well, there is a simple trick for extracting images and text from files in the newer formats, i.e. DOCX, PPTX and XLSX. For files in the older formats, i.e. DOC, PPT and XLS, all you need is a free tool that extracts the images quickly and easily.
Note: For the demonstrations in this post, we will use only an MS Word file. The process is the same for MS PowerPoint and MS Excel files.
Here’s what this article covers:
- How to extract images & text from DOCX, PPTX, XLSX files
- How to extract images from a single DOC, PPT or XLS file
- How to extract images from multiple DOC, PPT or XLS files
- How to extract images with “Save as Web Page” method
- How to extract plain text instead of XML
How to Extract Images & Text from DOCX, PPTX, XLSX Files
Before following the steps, open the folder containing your files. Click Organize > Folder and Search Options > View and uncheck Hide extensions for known file types. Now you can see the file extension after each filename.
- Locate and select the file you want to extract images and text from (note: it is better to work on a copy of the file). In this example, our target file is named Sample File.docx.
- Press F2 to rename the file and replace the extension with .zip.
- A warning will ask you to confirm the change of file extension. Click Yes.
- Right-click the ZIP file and click Extract files.
- Locate and open the folder containing the extracted data, then open the word folder.
- In it you will see a few folders and XML files. The media folder holds the extracted images. For the extracted text, open the document.xml file with Notepad or XML Notepad.
Here’s what you will find in the media folder.
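The rename-and-unzip steps above can also be scripted. Below is a minimal Python sketch, assuming the file is in one of the new formats and that its images sit in a media folder one level down (word/media for DOCX and, by the same layout, ppt/media or xl/media). The function name extract_media and the output folder name are made up for illustration.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_media(office_file, output_dir):
    """Copy every file in the media folder of a DOCX, PPTX or XLSX file to output_dir.

    Returns the list of saved filenames.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # New-format Office files are ZIP archives, so they open without renaming.
    with zipfile.ZipFile(office_file) as archive:
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            # Keep only entries shaped like word/media/image1.png.
            if len(parts) == 3 and parts[1] == "media":
                target = output_dir / parts[2]
                target.write_bytes(archive.read(name))
                saved.append(target.name)
    return saved
```

Calling extract_media("Sample File.docx", "Sample File images") should leave the same images you would find in the media folder after unzipping by hand. The original file is only read, so no copy is needed.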
How to Extract Images from a Single DOC, PPT or XLS File
If you want to extract images from MS Office files in the older formats, the above method won’t work, because those files are not ZIP archives. You need a free tool called Office Image Extraction Wizard for this purpose. The tool works with MS Office files as far back as 2012 and it can process one or multiple MS Office files in one go.
- Download and install Office Image Extraction Wizard.
- Choose the document you want to extract images from (for this example, we’re using a file named Ch1.doc), and select the output folder. You can opt to have a folder created to house all your output images by ticking the option Create a folder here. Once you are done, click Next.
- Click Start to begin the process.
- Once the image extraction process is finished, click on Click here to open destination folder and it will open the output folder.
- The program has created a Ch1 folder.
- Inside the folder are the extracted images.
How to Extract Images from Multiple DOC, PPT or XLS Files
- To extract images from multiple files in the DOC, PPT or XLS formats, tick the Batch mode option found at the bottom left.
- Click on Add Files and then select the files you want to extract images from. Hold the Ctrl key to select multiple files in one go. After selecting the files, click Next.
- Click Start.
- When the process is completed, locate and open the output folder. Here, you will see a folder for each original file (two in this example), named after the file. Open these folders to see the images extracted from each MS Office file.
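For new-format files, a similar batch result (one output folder per source file) can be sketched in Python without extra software. This is only an illustration: it does not handle DOC, PPT or XLS files, which still need a tool like the one above, and the function name extract_media_batch is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath

NEW_FORMATS = {".docx", ".pptx", ".xlsx"}


def extract_media_batch(source_dir, output_dir):
    """Extract images from every DOCX, PPTX and XLSX file in source_dir.

    Each file gets its own folder under output_dir, named after the file
    without its extension. Returns {source filename: number of images saved}.
    """
    counts = {}
    for office_file in sorted(Path(source_dir).iterdir()):
        if not office_file.is_file():
            continue
        if office_file.suffix.lower() not in NEW_FORMATS:
            continue  # old-format files are not ZIP archives
        if office_file.name.startswith("~$"):
            continue  # temporary lock files left by an open Office program
        target_dir = Path(output_dir) / office_file.stem
        target_dir.mkdir(parents=True, exist_ok=True)
        count = 0
        with zipfile.ZipFile(office_file) as archive:
            for name in archive.namelist():
                parts = PurePosixPath(name).parts
                # Keep only entries shaped like word/media/image1.png.
                if len(parts) == 3 and parts[1] == "media":
                    (target_dir / parts[2]).write_bytes(archive.read(name))
                    count += 1
        counts[office_file.name] = count
    return counts
```

Note that two files sharing a name but not an extension (say report.docx and report.pptx) would end up in the same output folder in this sketch.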
How to Extract Images with "Save as Web Page" Method
There is another method that will work with both newer and older MS Office files.
- Open the DOCX or XLSX file, click on File > Save As > Computer > Browse and save the file as Web Page.
- Locate the folder that carries the filename you gave the Web Page. Here, you will see all the images extracted from the file.
How to Extract Plain Text Instead of XML
- Open the DOCX file and click on File > Save As > Computer > Browse. Choose to save the file as Plain Text (for XLSX files, save it as Text (Tab delimited)).
- Locate and open the text file under the name you saved it with. This text file will contain only the text from your original file, without any formatting.
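If you would rather not open Word at all, the plain text can also be pulled straight out of the document.xml file inside a DOCX. Here is a rough Python sketch, assuming the standard WordprocessingML layout in which paragraphs are w:p elements and text runs are w:t elements. The function name docx_to_text is made up for illustration. Tabs, manual line breaks, headers and footers are ignored, and text inside text boxes may appear twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_file):
    """Return the body text of a DOCX file, one line per paragraph."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W + "p"):
        # Each w:t element holds one run of text within the paragraph.
        runs = [node.text or "" for node in paragraph.iter(W + "t")]
        lines.append("".join(runs))
    return "\n".join(lines)
```

For example, print(docx_to_text("Sample File.docx")) would print the document text without the surrounding XML tags.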
If you know any other method or tool to extract images from MS Office files, please mention it in the comments section.