Paul Liebrand's Weblog

Welcome to my blog mainly about SharePoint


Missing Metadata in SharePoint with PDFs

27 March, 2008 (07:17) | SharePoint | By: Paul Liebrand

I recently got a call from my end-users stating that their metadata was mysteriously disappearing when they were working with PDFs stored in SharePoint.

Take the following scenario.  A user creates a PDF by scanning a paper document using Adobe Acrobat Professional. They then upload that document into SharePoint and assign it the required metadata (shown below).

pdf_initial1

A few days later, the user decides they need to edit the PDF. They open the PDF by using the drop-down menu and selecting Edit Document.

pdf_edit

Once Adobe Acrobat Professional launches, they perform an OCR on the document so they can edit it. They select OCR Text Recognition > Recognize Text Using OCR from the Document menu and let it grind away for a few minutes. With the OCR complete, the user finishes the edits and simply clicks Save.

After closing Adobe Acrobat Professional, the SharePoint document library refreshes and all the metadata is gone (see below).

pdf_missing

How unfortunate!

Cause 

After researching the issue, I determined that when you perform an OCR on a document that has more than one page, Adobe Acrobat actually deletes the original document (thus removing all metadata associated with it) and creates a brand spanking new file in its place. The evidence of this can be found by looking in the SharePoint recycle bin.

Doing a similar test on a file stored on a local drive had the same results.  If you fill out the Summary tab by right-clicking a PDF document and choosing Properties, and then perform the steps above, you will also lose your metadata.
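The delete-and-recreate behaviour described above can be illustrated with a toy model. This is only a sketch: `ToyLibrary`, `save_in_place`, and `save_by_replacing` are hypothetical names made up for illustration, not SharePoint or Acrobat APIs, and the model simply assumes that metadata belongs to the library item rather than to the file's bytes.

```python
import itertools


class ToyLibrary:
    """Hypothetical in-memory stand-in for a document library (not a SharePoint API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.items = {}        # file name -> {"id", "content", "metadata"}
        self.recycle_bin = []  # deleted items end up here

    def upload(self, name, content, metadata=None):
        # A brand new item gets a new id and only the metadata passed in.
        self.items[name] = {
            "id": next(self._ids),
            "content": content,
            "metadata": dict(metadata or {}),
        }
        return self.items[name]["id"]

    def overwrite(self, name, content):
        # In-place save: same item, new bytes, metadata untouched.
        self.items[name]["content"] = content

    def delete(self, name):
        self.recycle_bin.append(self.items.pop(name))


def save_in_place(library, name, new_content):
    library.overwrite(name, new_content)


def save_by_replacing(library, name, new_content):
    # Assumed model of the behaviour described in the post:
    # the original is deleted and a new file is created in its place.
    library.delete(name)
    library.upload(name, new_content)


library = ToyLibrary()
library.upload("scan.pdf", b"scanned pages", {"Department": "Finance"})

save_in_place(library, "scan.pdf", b"edited pages")
metadata_after_in_place = library.items["scan.pdf"]["metadata"]

save_by_replacing(library, "scan.pdf", b"OCR'd pages")
metadata_after_replace = library.items["scan.pdf"]["metadata"]

print(metadata_after_in_place)   # metadata survives an in-place save
print(metadata_after_replace)    # the replacement item starts with no metadata
print(len(library.recycle_bin))  # the original item sits in the recycle bin
```

In this model the in-place save keeps the "Department" value, while the replacing save leaves an item with empty metadata and one entry in the recycle bin, which matches what I saw in the library.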

I called Adobe support on this issue and they first responded with “Please explain how SharePoint works, we are unfamiliar with it”. Following my explanation and being put on hold several times, I finally got this response, “We believe the problem is caused by SharePoint.”

I attempted numerous times to convince the phone support operator that it was not a SharePoint specific problem but was unsuccessful.

Solution

I came up with a solution that meets the needs of my users and thought I would share it.  It is not revolutionary but uses built-in SharePoint functionality.

If your users need to manipulate a PDF document in any way, have them follow these steps.

Check out the document to the local drafts folder

pdf_checkout pdf_localdraft

This process will actually put the PDF in the user's My Documents\SharePoint Drafts folder on their computer.  Any further edits to the PDF by this user will be made to their local copy.

Once they have completed the edits, they simply check the file back in. This will move the file from their local computer back into the SharePoint library, and this time all PDF edits and metadata remain intact. (Yippee!)
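Why the local drafts checkout helps can be sketched with another toy model. Everything here is an assumption made for illustration: `ToyLibrary`, `check_out`, `check_in`, `edit_by_replacing`, and the user name "alice" are hypothetical, and the sketch simply assumes that check-in writes the edited bytes back into the existing library item, so the delete-and-recreate only ever happens to the local draft.

```python
class ToyLibrary:
    """Hypothetical in-memory document library (not a SharePoint API)."""

    def __init__(self):
        self.items = {}  # file name -> {"content", "metadata", "checked_out_to"}

    def upload(self, name, content, metadata):
        self.items[name] = {
            "content": content,
            "metadata": dict(metadata),
            "checked_out_to": None,
        }

    def check_out(self, name, user):
        item = self.items[name]
        if item["checked_out_to"] is not None:
            raise RuntimeError(f"{name} is already checked out")
        item["checked_out_to"] = user
        return item["content"]  # bytes to be copied to the local drafts folder

    def check_in(self, name, user, content):
        item = self.items[name]
        if item["checked_out_to"] != user:
            raise PermissionError(f"{name} is not checked out to {user}")
        # Assumption: the existing item is updated, so its metadata is kept.
        item["content"] = content
        item["checked_out_to"] = None


def edit_by_replacing(drafts, name, new_content):
    # Model of an editor that deletes the file and writes a new one,
    # but this time only inside the local drafts folder.
    del drafts[name]
    drafts[name] = new_content


library = ToyLibrary()
library.upload("scan.pdf", b"scanned pages", {"Department": "Finance"})

local_drafts = {}  # stands in for My Documents\SharePoint Drafts
local_drafts["scan.pdf"] = library.check_out("scan.pdf", "alice")
edit_by_replacing(local_drafts, "scan.pdf", b"OCR'd pages")
library.check_in("scan.pdf", "alice", local_drafts.pop("scan.pdf"))

item = library.items["scan.pdf"]
print(item["content"], item["metadata"])
```

The library item is never deleted in this flow, which is the point of the workaround: the destructive save touches only the draft copy.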

Conclusion

I have mainly seen this problem occur with the OCR process and sometimes with the amend process when creating a PDF from a scanner. To be safe, I have recommended that my users follow the solution above whenever they manipulate a PDF document. It is not perfect, but it works.


  • BfromB
    Problem exists also for other (non-OCR) edit actions of a PDF document. Luckily your solution works for those too!
  • Chandra,

    Not that I am aware of. You might have to contact Autodesk and ask them about that, but I doubt they have an option.

    You will probably have to solve it the same way we did for PDF files. Do the local drafts checkout, make your changes, and then check it back in again.

    Thanks,

    Paul Liebrand
  • Chandra Ojha
    I do have the same issue with AutoCAD dwg files. AutoCAD also does the same thing by deleting the original and creating a new file, thus losing SharePoint metadata. Is there a way to stop this behaviour in AutoCAD, any suggestions?
  • Yes. It is unfortunate it works this way. But for once, this is not a Microsoft problem -- Adobe needs to resolve this issue.
  • If only I had read this before deploying a workflow that copies docs from one site to another!! Note to self: Always test with pdf as well as office documents!
    ;-)
  • No, because it is technically creating a whole new file behind the scenes. If you do the check out to local drafts and then manipulate it and then check it back in a new version will be created.
  • brett
    In your example, is a new version created if you have versioning enabled?

    Thanks,

    Brett
  • Sam
    I was wondering how you were able to get the "Edit with Acrobat" to work. I followed this blog post

    --LINK REMOVED DUE TO MALWARE NOTICE--

    But whenever I go to edit the PDF I get an error saying "A windows sharepoint services compatible application could not be found." Acrobat 9 is installed on the client machine. Any ideas?
  • Correct. The "Send To > Other Destination" itself is not designed to copy the metadata. However, it works with Office documents because the metadata is stored inside the document's custom properties. If the destination library has the same metadata fields as the Office document being copied, SharePoint will suck that information in.
  • Karen
    Have you noticed a similar scenario regarding missing metadata when using the Send To Other Location option in SharePoint? The metadata on the source file remains; however, it is not carried over to the destination location.
  • @Drew,

    Unfortunately I have not -- this definitely seems like something a 3rd party provider would handle. The problem is that Office applications treat metadata in a slightly different way than other file formats such as PDF.

    You might want to communicate your frustration back to Bamboo Solutions in the hopes they can get them resolved.

    Paul
  • Drew
    Hi Paul,

    I'm also slightly off-topic, but I was wondering whether you have found any way to automatically copy/include the SharePoint metadata associated with a Microsoft Office document when saving as a PDF? We're using the Bamboo Solutions product, but personally I find it limiting and it's been fraught with problems to get up and running for us...
    I have searched high and low on the internet with little luck.

    drew
  • Kev,

    Unfortunately not (to my knowledge anyway). A co-worker of mine pointed me to a company that has built a "Save to SharePoint" feature for Adobe Acrobat. I have not had an opportunity to check it out yet but it looks interesting. It may offer this capability. Check them out if you are interested: http://www.macroview.com.au/MacroView_Adobe_Sav...
  • Sam
    Hello Paul,

    This is a little off topic, but I see you have integrated being able to edit .pdf files with SharePoint. My question is, is your SharePoint site using the default port 80 to accomplish this? I can open, edit, and save .pdf files when on the default port 80, but when I use virtual sites (i.e. any other port), I lose that functionality and cannot seem to figure out how to simulate it without having to define my own ActiveX control. Anyway, any information would help, thanks.
  • Unfortunately not. This is something Adobe would have to design into their product. I am willing to bet that Adobe will start adding more SharePoint integration into future versions of Acrobat as SharePoint continues to get a bigger presence in the Enterprise. Do not get me wrong -- Acrobat has the ability to store metadata, or custom properties as they call it, in a PDF. However, this information is not synced with the SharePoint list.
  • Kev
    Thanks for this, Paul. On a similar theme, is there a way to attach metadata to a PDF within Adobe such as you can do with the Information Panel when using MS Word?