What is the tool for this?

Christopher Chaltain chaltain at gmail.com
Fri Jun 3 06:48:48 UTC 2016


Tim was just asking if you were talking about a file ending in .ppt or 
one ending in .pptx. You can just tell by the extension of the file name 
and you don't really need to know when it was created. The .pptx 
extension indicates that it's a newer file format being used by 
Microsoft, which is why Tim called it a newer .pptx file.
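
For what it's worth, that extension check is easy to script. Here's a
minimal sketch; the function name ppt_kind is just something I made up
for illustration, and it only looks at the file name, never at the
contents of the file.

```shell
# Hypothetical helper: classify a presentation by its file extension only.
ppt_kind() {
    # Lower-case the name so .PPTX and .pptx are treated alike.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *.pptx) echo "new" ;;      # newer zip-based format
        *.ppt)  echo "old" ;;      # older format
        *)      echo "unknown" ;;  # not a PowerPoint extension
    esac
}
```

So "ppt_kind meeting.pptx" prints "new", and a script could use that to
decide which conversion route to try.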

I too appreciate the information Tim provided, and if this led me to 
files that had a better format and were easier to work with, then I'd do 
whatever it took to get them. I can always make things more 
efficient with scripting if I need to perform a task multiple times, but 
ending up with a more accurate file will save time in the long 
run.
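
To make the scripting point concrete, here's a rough sketch of how I
might wrap the slide-extraction loop from Tim's message (quoted below)
so the slide count doesn't have to be typed in by hand. The function
name slides_to_text is my own invention, and it assumes the slides are
named slide1.xml, slide2.xml and so on with no gaps in the numbering.

```shell
# Hypothetical helper: print the text of every slideN.xml in a directory,
# in numeric slide order, with the XML tags stripped out.
slides_to_text() {
    dir=${1:-.}
    # Count the slide files instead of hard-coding the number.
    count=0
    for f in "$dir"/slide*.xml; do
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    # Walk 1..count so slide10 comes after slide9, not after slide1.
    i=1
    while [ "$i" -le "$count" ]; do
        if [ -f "$dir/slide$i.xml" ]; then
            sed 's/<[^>]*>//g' "$dir/slide$i.xml"
            echo    # blank line between slides
        fi
        i=$((i + 1))
    done | cat -s   # squeeze runs of blank lines
}
```

After unzipping as Tim describes, something like
"slides_to_text prez/ppt/slides > output.txt" should then do the whole
job in one go.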


On 02/06/16 22:03, Karen Lewellen wrote:
> Oh my goodness!
> Well fortunately for me it was a simple matter  of Ken the administrator
> at shellworld to install unoconv
>   I ran the program first creating a listener channel as instructed,
> then ran   unoconv on the file which created its pdf.
> I have no idea whether it was new or old ppt, I did not create the thing.
> Anyway once in pdf format a simple pdftotext produced  the text file.
> once I found a rather terrific page on running the unoconv program the
> process
>   likely took me all of 2 minutes.
> I love that I could have chosen a different  format for the output, but
> between pdftotext when the file is a baby hippopotamus  as it was in
> this case, or robobraille for when the file is more reasonable in size I
> got the job done.
> I truly honor the dedication of some, but speaking only for myself
> having to  do all those steps would keep me in another operating system
> for sure...my professional deadlines alone require swift solutions.
> My thanks too goes to the person who gave me the name of the front end
> tool.  Seems very shell service friendly much like antiword and unrtf.
> cheers,
> Kare
>
>
> On Thu, 2 Jun 2016, Tim Chase wrote:
>
>> On June  1, 2016, Karen Lewellen wrote:
>>> My Linux experience is rooted at shellworld which is now using
>>> Ubuntu. I just got a PowerPoint file for a meeting, and because of
>>> its size,  I cannot use the back door method I normally tap into
>>> for converting it into something else.
>>> Is there a program like antiword or unrtf to convert PowerPoint at
>>> the command line?
>>
>> Is it an old .ppt or a new .pptx file?  There was a "ppthtml" tool
>> around that could convert the older .ppt files to HTML in a fashion.
>> The site hosting the source code no longer seems to be available
>> though.  If it's a newer .pptx file, it's really just a .zip file
>> with a different extension.  So you can
>>
>>   mkdir prez
>>   mv presentation.pptx prez/presentation.pptx.zip
>>   cd prez
>>   unzip presentation.pptx.zip
>>   cd ppt/slides/
>>
>> There are a bunch of slide*.xml files in here which you can either edit:
>>
>>   $EDITOR slide*.xml
>>
>> or strip out the XML tags:
>>
>>   for i in {1..20} ; do sed 's/<[^>]*>//g' slide${i}.xml ; done |
>>   cat -s > output.txt
>>
>> where "20" is the number of slides in the presentation (which you
>> should be able to get from the output of "ls slide*.xml | wc -l").
>>
>> The reason for using the "for" loop with the numbers is because the
>> slides aren't zero-padded, meaning when it sorts the names, you'd get
>> slide1.xml, slide10.xml, slide11.xml, slide2.xml, slide3.xml, etc.
>> Known as lexicographical sorting, this will be hard to read.  So by
>> iterating over them in numerical order, they should make more sense.
>>
>> Alternatively, if you have LibreOffice installed, it should
>> theoretically be able to do conversions.  Based on my
>> experimentation, you have to convert the .ppt[x] to PDF first:
>>
>>  libreoffice --headless --convert-to pdf presentation.pptx
>>
>> and then convert that to something else.  The "poppler-utils" package
>> (at least that's what it's called in Debian) has both a pdftotext and
>> pdftohtml utility.  I recommend either plain-text:
>>
>>  pdftotext presentation.pdf presentation.txt
>>  ${EDITOR:-vi} presentation.txt
>>
>> or HTML:
>>
>>  pdftohtml presentation.pdf presentation.html
>>  lynx presentation.html
>>
>> I snagged a couple random PPT files off the web and tried the
>> libreoffice method and they all came out much better than I expected
>> (and much, much, MUCH better than the hackish attempts to extract the
>> text as given at the top of this message).
>>
>> So if you have libreoffice + poppler-utils installed and can use
>> those, that's your best bet.  If you don't have them and can't get
>> them installed, then using some of the extraction hacks above might
>> at least get some form of the content out.
>>
>> Hopefully these give you some options to get at the content in the
>> presentations.
>>
>> -tim
>> (an avowed despiser of PPT files)
>>
>>
>>
>>
>> _______________________________________________
>> Blinux-list mailing list
>> Blinux-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/blinux-list
>>
>>
>

-- 
Christopher (CJ)
chaltain at Gmail



