Paul Liebrand's Weblog

Welcome to my blog mainly about SharePoint

Missing Metadata in SharePoint with PDFs

27 March, 2008 (07:17) | SharePoint | By: Paul Liebrand

I recently got a call from my end users stating that their metadata was mysteriously disappearing when they worked with PDFs stored in SharePoint.

Take the following scenario.  A user creates a PDF by scanning a paper document using Adobe Acrobat Professional. They then upload that document into SharePoint and assign it the required metadata (shown below).

pdf_initial1

A few days later, the user decides they need to edit the PDF. They open it by selecting Edit Document from the drop-down menu.

pdf_edit

Once Adobe Acrobat Professional launches, they perform an OCR on the document so they can edit it. They select OCR Text Recognition > Recognize Text Using OCR from the Document menu and let it grind away for a few minutes. With the OCR complete, the user finishes the edits and simply clicks Save.

After closing Adobe Acrobat Professional, the SharePoint document library refreshes and all the metadata is gone (see below).

pdf_missing

How unfortunate!

Cause 

After researching the issue, I determined that when you perform an OCR on a document that has more than one page, Adobe Acrobat actually deletes the original document (thus removing all metadata associated with it) and creates a brand spanking new file in its place. The evidence of this can be found by looking in the SharePoint recycle bin.

Doing a similar test on a file stored on a local drive had the same results. If you fill out the Summary tab by right-clicking a PDF document and going to Properties, and then perform the steps above, you will also lose your metadata.
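
To make the difference between a normal save and a delete-and-recreate save concrete, here is a small toy model in Python. It is only a sketch: ToyLibrary, save_in_place, and ocr_style_save are names I made up for illustration, the column values are sample data, and none of this is the real SharePoint or Acrobat API. It assumes, as described above, that metadata belongs to the library item rather than to the file name.

```python
class ToyLibrary:
    """Hypothetical stand-in for a document library (not a SharePoint API)."""

    def __init__(self):
        self.items = {}        # file name -> {"content": ..., "metadata": ...}
        self.recycle_bin = []  # deleted items end up here

    def upload(self, name, content, metadata=None):
        # A new item starts with whatever metadata is supplied, or none at all.
        self.items[name] = {"content": content, "metadata": dict(metadata or {})}

    def save_in_place(self, name, content):
        # A normal save replaces the content; the item and its metadata survive.
        self.items[name]["content"] = content

    def delete(self, name):
        # Deleting removes the whole item, metadata included.
        self.recycle_bin.append(self.items.pop(name))


def ocr_style_save(library, name, content):
    """Mimic the behaviour described above: delete the original, then create a new file."""
    library.delete(name)
    library.upload(name, content)


library = ToyLibrary()
library.upload("scan.pdf", b"scanned pages", {"Customer": "Contoso", "Year": "2008"})

library.save_in_place("scan.pdf", b"scanned pages, minor edit")
metadata_after_in_place_save = dict(library.items["scan.pdf"]["metadata"])

ocr_style_save(library, "scan.pdf", b"pages with OCR text")
metadata_after_ocr_save = dict(library.items["scan.pdf"]["metadata"])

print("After in-place save:", metadata_after_in_place_save)
print("After OCR-style save:", metadata_after_ocr_save)
print("Items in recycle bin:", len(library.recycle_bin))
```

The in-place save keeps both columns, while the OCR-style save leaves a new item with empty metadata and the old item sitting in the recycle bin, which matches what I saw in the document library.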

I called Adobe support on this issue and they first responded with “Please explain how SharePoint works, we are unfamiliar with it.” Following my explanation and being put on hold several times, I finally got this response: “We believe the problem is caused by SharePoint.”

I attempted numerous times to convince the phone support operator that it was not a SharePoint-specific problem, but I was unsuccessful.

Solution

I came up with a solution that meets the needs of my users and thought I would share it. It is not revolutionary, but it uses built-in SharePoint functionality.

If your users need to manipulate a PDF document in any way, have them follow these steps.

Check out the document to the local drafts folder

pdf_checkout pdf_localdraft

This process will actually put the PDF in the user's My Documents\SharePoint Drafts folder on their computer. Any further edits to the PDF by this user will be made to their local copy.

Once they have completed the edits, they simply check the file back in. This will move the file from their local computer back into the SharePoint library. This time, all PDF edits and metadata remain intact. (Yippee!)
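
The same idea can be sketched for the check-out workflow. Again, this is a toy Python model under my own assumptions, not SharePoint code: check_out, edit_like_ocr, check_in, and demo are hypothetical names, the library is just a dictionary, and a temporary folder stands in for the SharePoint Drafts folder. The point it illustrates is that the delete-and-recreate save only ever touches the local draft, so the library item and its metadata are never deleted.

```python
import tempfile
from pathlib import Path


def check_out(library, name, drafts_dir):
    """Copy the item's content to a local draft; the library item stays where it is."""
    draft = Path(drafts_dir) / name
    draft.write_bytes(library[name]["content"])
    library[name]["checked_out"] = True
    return draft


def edit_like_ocr(draft, new_content):
    """Mimic the delete-and-recreate save, but only against the local draft file."""
    draft.unlink()
    draft.write_bytes(new_content)


def check_in(library, name, draft):
    """Push the draft's content back into the existing item and remove the local copy."""
    library[name]["content"] = draft.read_bytes()
    library[name]["checked_out"] = False
    draft.unlink()


def demo():
    # Sample item; the metadata value is made up for illustration.
    library = {
        "scan.pdf": {
            "content": b"scanned pages",
            "metadata": {"Customer": "Contoso"},
            "checked_out": False,
        }
    }
    # A temporary folder plays the role of the local drafts folder.
    with tempfile.TemporaryDirectory() as drafts_dir:
        draft = check_out(library, "scan.pdf", drafts_dir)
        edit_like_ocr(draft, b"pages with OCR text")
        check_in(library, "scan.pdf", draft)
    return library["scan.pdf"]


item = demo()
print(item["metadata"], item["content"])
```

After demo() runs, the item carries the new content and still has its original metadata, because nothing in the library was ever deleted.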

Conclusion

I have mainly seen this problem occur with the OCR process and sometimes with the amend process when creating a PDF from a scanner. To be safe, I have recommended that my users follow the steps above whenever they manipulate a PDF document. It is not perfect, but it works.

  • James

    Brilliant. We have had a similar problem here (editing PDFs) and your post above has solved the problem perfectly.

    Thanks

    James

  • Guest

    Your post is exactly the problem I am trying to deal with.
    I think I may be missing something: my SharePoint site does not have Edit Document as a context link for PDF files and does not give me the option to use the local drafts folder on check-out. Is there some configuration that may be missing that would allow my site to handle PDFs in the way yours does?

  • http://photography.paulliebrand.com Paul Liebrand

    Make sure you have followed the instructions for adding PDF files to SharePoint, including the icon (http://grounding.co.za/blogs/neil/archive/2008/12/02/working-with-pdf-s-and-sharepoint.aspx). Adding the PDF entry to the DOCICON.XML file is what makes the “Edit in” option show up on the drop-down menu.

  • dds

    Hi Paul, I am having a problem with .dwg files when I am working with them in SharePoint 2010.
    When I click on a .dwg file in the SharePoint document library, it starts downloading, but my requirement is to open it in the browser.
    I have downloaded DWG TrueView 2013 and added a file type entry in the Central Administration file types.
    I also added a MIME type in IIS, but have not found a solution. Can you help me with this problem?

  • Joe

    Hello Paul – I have the same issue. When I try to check out a .dwg file, it checks out to some place that I cannot figure out. It doesn’t give me an option to save it to the local hard drive. I would appreciate any help.

    I also have another issue where I cannot open .dwg files in SharePoint like one can with .doc/.xls files.

    Thanks
    Joe