Why does everyone still think of PDF as just a read-only file format?

Ricardo Lee

2022-08-04 11:26


1. Can PDFs be edited? Yes. But PDFs are far harder to edit than common document formats (plain text, Markdown, TeX, reST, or Word). The format was never designed to make a document easy to edit.

 


 
2. Consider JPG and PSD as an analogy. Dragging a PDF into Adobe Illustrator (AI) for editing is like dragging a JPG into Photoshop to cut something out of it. Put another way, modifying a PDF is like modifying an AI or CDR file: you are essentially editing a multi-page vector drawing.

 
3. A JPG or PNG carries no structural information such as layers and masks; it is all pixels. Similarly, a PDF typically carries no structural information such as paragraphs, chapters, headers, and footers. Its content is a series of instructions along the lines of "select a font and write a string of text at position (X, Y)" or "draw a picture at position (X, Y)".
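To make the "write this string at this position" idea concrete, here is a minimal sketch that builds the kind of drawing instructions a PDF page holds. The operators `BT`, `Tf`, `Td`, `Tj`, and `ET` are standard PDF text operators; the `TextRun` type, the function names, and the font resource name `F1` are hypothetical and exist only for this illustration. It produces only the instruction text, not a complete PDF file.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextRun:
    """One positioned string (hypothetical helper type for this sketch)."""
    x: float
    y: float
    font: str    # page font resource name, e.g. "F1" (assumed to be defined)
    size: float
    text: str


def escape_pdf_string(s: str) -> str:
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def content_stream(runs: list[TextRun]) -> str:
    """Emit one text object per run: pick a font, move to (x, y), show text."""
    lines = []
    for r in runs:
        lines.append("BT")                                  # begin text object
        lines.append(f"/{r.font} {r.size:g} Tf")            # select font and size
        lines.append(f"{r.x:g} {r.y:g} Td")                 # move to position
        lines.append(f"({escape_pdf_string(r.text)}) Tj")   # show the string
        lines.append("ET")                                  # end text object
    return "\n".join(lines)


if __name__ == "__main__":
    print(content_stream([
        TextRun(72, 720, "F1", 12, "First line of a paragraph"),
        TextRun(72, 706, "F1", 12, "second line of the same paragraph"),
    ]))
```

Nothing in the output says that the two strings belong to one paragraph. That relationship exists only in the reader's eye.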

 


 
4. If Illustrator manages to identify some elements (such as headers and footers) in an imported PDF for you, it only means that its algorithm engineers are very good and that inferring special elements from the relative positions of text blocks is accurate enough. Why is it so difficult to convert PDF to Word? One reason is that, in a sense, this is "reverse engineering".

 
For example, I just asked a colleague to test this by dragging a PDF generated by LaTeX into Illustrator for editing. As expected, the paragraphs had been cut into separate single-line text objects. Want to edit whole paragraphs with automatic line wrapping, as in Word? Not possible. Creating new paragraphs in Illustrator might work.
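The "reverse engineering" involved can be sketched as a simple heuristic: given single-line text objects and their vertical positions, guess which lines belong to the same paragraph from the gaps between them. This is only an illustration under stated assumptions, not how any particular editor or converter works. The `Line` type, the `group_paragraphs` function, and the `gap_factor` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """A single-line text object as recovered from a page (hypothetical type)."""
    y: float     # baseline position; in PDF coordinates y grows upward
    text: str


def group_paragraphs(lines, line_height, gap_factor=1.5):
    """Join consecutive lines whose vertical gap is at most
    gap_factor * line_height; a larger gap starts a new paragraph."""
    ordered = sorted(lines, key=lambda ln: -ln.y)   # top of the page first
    paragraphs = []
    current = []
    prev_y = None
    for ln in ordered:
        if prev_y is not None and (prev_y - ln.y) > gap_factor * line_height:
            paragraphs.append(" ".join(current))
            current = []
        current.append(ln.text)
        prev_y = ln.y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    page = [
        Line(700, "The first paragraph starts here"),
        Line(688, "and continues on a second line."),
        Line(652, "A larger gap suggests a new paragraph."),
    ]
    for p in group_paragraphs(page, line_height=12):
        print(p)
```

A heuristic like this breaks down quickly on multi-column layouts, captions, or tight spacing, which is part of why PDF to Word conversion is hard.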

 
5. What's more, in some PDFs every page is a scanned image, with no actual text at all, so nothing can be copied. If you want to edit the content of such a PDF, you can either edit it as a picture or find out which OCR technology works best.
 
 


 
6. To keep the "Portable" in PDF, the fonts a document uses are split into subsets and embedded in the file, and this is a huge pitfall in the PDF technology stack. If you think about it, even Microsoft Office has not fully solved it: when Office documents are exported to PDF, embedding fonts in OTF format is not supported, and the text is converted into bitmaps (not even vector graphics).
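As a rough sketch of what subsetting means, the snippet below collects only the characters a document actually uses and derives a subset font name. The PDF specification names subset fonts with a tag of six uppercase letters followed by a plus sign, and leaves the choice of letters arbitrary. Deriving the tag from a hash is purely an assumption of this sketch, and all function names are hypothetical. A real subsetter also has to rewrite the font's glyph tables, which is not shown here.

```python
import hashlib
import string


def used_characters(texts):
    """Return the sorted set of characters that appear in the given strings."""
    return sorted(set("".join(texts)))


def subset_tag(chars):
    """Derive a six-uppercase-letter tag from the character set.
    Hashing is just one arbitrary way to pick the letters (assumption)."""
    digest = hashlib.sha256("".join(chars).encode("utf-8")).digest()
    return "".join(string.ascii_uppercase[b % 26] for b in digest[:6])


def subset_font_name(base_name, texts):
    """Return (subset font name, characters kept) for the given text."""
    chars = used_characters(texts)
    return f"{subset_tag(chars)}+{base_name}", chars


if __name__ == "__main__":
    name, kept = subset_font_name("ExampleSerif", ["Hello", "PDF"])
    print(name, kept)
```

Because only the glyphs for the characters that were kept end up in the file, an editor that opens the PDF later cannot type new characters in that embedded font.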

 
7. Drag a PDF into Illustrator, and most of the time a prompt like "can't recognize a font, replace it with the default?" will appear, because the sources and formats of fonts embedded in PDFs are inherently odd. More than 90 percent of the time, the fonts embedded in the PDF will not match any locally installed font that Illustrator can find. This confirms the point from the other direction: a PDF is a finalized thing, and it is not meant to be changed.
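The font mismatch can be illustrated with a small sketch that strips the six-letter subset tag from embedded font names and checks the result against a list of installed fonts. The font names and the installed list are made-up inputs, and the function names are hypothetical. Real editors use more elaborate matching, so treat this as an illustration only.

```python
import re

# Subset fonts are named with six uppercase letters and a plus sign, e.g. "ABCDEF+CMR10".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def base_font_name(pdf_font_name):
    """Drop the subset tag, if present, to get the underlying font name."""
    return SUBSET_PREFIX.sub("", pdf_font_name)


def find_missing_fonts(pdf_font_names, installed):
    """Return embedded fonts with no same-named installed font (case-insensitive)."""
    installed_lower = {name.lower() for name in installed}
    missing = []
    for name in pdf_font_names:
        base = base_font_name(name)
        if base.lower() not in installed_lower and base not in missing:
            missing.append(base)
    return missing


if __name__ == "__main__":
    embedded = ["ABCDEF+CMR10", "GHIJKL+CMBX12", "Helvetica"]
    local = ["Helvetica", "Arial", "Times New Roman"]
    print(find_missing_fonts(embedded, local))
```

Even when the names do match, the locally installed font may be a different version from the one that was embedded, so the layout can still shift.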
This article was published in PDF Editor blog
