Ricardo Lee

2022-08-04 11:26

1. Can PDFs be edited? Yes. But PDFs are far less editable than common document formats (plain text, Markdown, TeX, reST, or Word). The format was never designed to make a document easy for people to edit.


2. Take JPG and PSD as an analogy. Dragging a PDF into Adobe Illustrator (AI) to edit it is like dragging a JPG into Photoshop to cut out a subject. Put another way, modifying a PDF is like modifying an AI/CDR file: you are essentially editing a multi-page vector drawing.

3. JPG and PNG contain no structural information such as layers or masks; they are just pixels. Similarly, a PDF contains no structural information such as paragraphs, chapters, headers, or footers. What it stores is closer to a list of drawing instructions: "select a font and, at horizontal coordinate X and vertical coordinate Y, write a string of text / draw a picture", and so on.
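To make this concrete, here is a minimal sketch of what a page's text looks like inside a PDF. The content stream below is hand-written for illustration, not taken from a real file; `Tf` (select font), `Td` (move text position) and `Tj` (show string) are standard PDF text operators, while `positioned_text` is a hypothetical helper that understands only those three and ignores everything else a real stream can contain.

```python
import re

# A hand-written, simplified page content stream (illustrative only).
CONTENT = b"""
BT
/F1 12 Tf
72 720 Td
(Why is it so difficult to convert) Tj
0 -14 Td
(PDF to Word?) Tj
ET
"""

# Tokens: a (string literal), a /Name, or any other operand/operator.
TOKEN = re.compile(rb"\((?:[^()\\]|\\.)*\)|/[^\s()/]+|[^\s()/]+")


def positioned_text(stream):
    """Return (x, y, font, size, text) for every Tj in the stream.

    Only BT, ET, Tf, Td and Tj are handled; this is a sketch, not a parser.
    """
    out, operands = [], []
    x = y = 0.0
    font, size = None, 0.0
    for tok in TOKEN.findall(stream):
        if tok == b"BT":            # begin text object: position resets
            x = y = 0.0
            operands.clear()
        elif tok == b"Tf":          # /FontName size Tf
            font, size = operands[-2].decode(), float(operands[-1])
            operands.clear()
        elif tok == b"Td":          # dx dy Td: move relative to line start
            x += float(operands[-2])
            y += float(operands[-1])
            operands.clear()
        elif tok == b"Tj":          # (string) Tj: draw text at current position
            text = operands[-1][1:-1].decode("latin-1")
            out.append((x, y, font, size, text))
            operands.clear()
        elif tok == b"ET":
            operands.clear()
        else:
            operands.append(tok)
    return out


for item in positioned_text(CONTENT):
    print(item)
```

Nothing in the stream says "these two strings are one paragraph"; the file only records where each string is drawn.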


4. If Illustrator can identify some elements (such as headers and footers) in an imported PDF for you, it only means that its algorithm engineers are very good and that inferring special elements from the relative positions of text blocks is accurate enough. Why is it so hard to convert PDF to Word? One reason is that, in a sense, it is "reverse engineering".

For example, I just asked a colleague to test this by dragging a LaTeX-generated PDF into Illustrator for editing. As expected, the paragraphs were chopped into separate single-line text objects. Want to edit a whole paragraph as in Word and have the lines rewrap? No such thing. Creating a new paragraph inside Illustrator might work.
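The kind of "reverse engineering" involved can be sketched as follows. This is only a toy heuristic under stated assumptions: each line has already been extracted as a `(y, text)` pair in PDF coordinates (y grows upward), the normal line spacing `leading` is known, and a larger vertical gap means a new paragraph. The function name and both parameters are hypothetical; real converters use far more signals (indentation, fonts, columns).

```python
def group_paragraphs(lines, leading=14.0, tolerance=0.3):
    """Merge single-line text fragments into paragraphs by vertical gap.

    lines: iterable of (y, text); y is the baseline in PDF coordinates.
    A gap larger than leading * (1 + tolerance) starts a new paragraph.
    """
    paragraphs, current, prev_y = [], [], None
    # Read top to bottom: larger y comes first on a PDF page.
    for y, text in sorted(lines, key=lambda item: -item[0]):
        if prev_y is not None and (prev_y - y) > leading * (1 + tolerance):
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        prev_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


fragments = [
    (720.0, "Can PDFs be edited? Yes, but far"),
    (706.0, "less easily than a Word file."),
    (678.0, "The format was not designed"),
    (664.0, "for editing."),
]
print(group_paragraphs(fragments))
```

The guess is only as good as the threshold: a slightly different line spacing or a two-column layout is enough to break it, which is why the result depends so heavily on the tool's algorithms.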

5. What's more, in some PDFs every page is just a scanned image: there is not a single character of text, and nothing can be copied. If you want to edit the content of such a PDF, you can either edit it as a picture or see whose OCR technology is better.

6. To guarantee the "Portable" in PDF, the fonts used are split into subsets and embedded in the file, and this is a major pitfall in the PDF technology stack. Think about it: even Microsoft Office has not fully solved it. When an Office document is exported to PDF, embedding fonts in OTF format is not supported, and the text is converted to bitmaps (not even vector graphics).

7. Drag a PDF into Illustrator and, most of the time, you will get a prompt like "This font cannot be recognized; replace it with the default?" That is because the sources and formats of fonts embedded in PDFs are inherently odd. More than 90 percent of the time, the fonts embedded in the PDF will not match any locally installed font that Illustrator can find. This confirms the point from the other direction: a PDF is a finalized artifact, and it does not want you to change it.
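As a rough illustration of the font problem: in a PDF, an embedded font subset is conventionally named with a tag of six uppercase letters and a plus sign in front of the real font name (for example `ABCDEF+CMR10`). The sketch below assumes you already have the list of font names from a PDF and a list of locally installed font names; both lists and both function names are hypothetical, and this is not how Illustrator actually does its matching.

```python
import re

# Subset tag convention: six uppercase letters, then "+", then the font name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def split_subset_name(base_font):
    """Return (is_subset, name_without_tag) for a PDF font name."""
    match = SUBSET_TAG.match(base_font)
    if match:
        return True, base_font[match.end():]
    return False, base_font


def missing_fonts(base_fonts, installed):
    """List PDF fonts (tags stripped) with no same-named installed font."""
    installed_lower = {name.lower() for name in installed}
    missing = []
    for base_font in base_fonts:
        _, name = split_subset_name(base_font)
        if name.lower() not in installed_lower:
            missing.append(name)
    return missing


pdf_fonts = ["ABCDEF+CMR10", "GHIJKL+CMBX12", "Helvetica"]  # made-up example
local_fonts = ["Helvetica", "Arial"]                        # made-up example
print(missing_fonts(pdf_fonts, local_fonts))
```

Even when a name does match, the embedded subset carries only the glyphs the document actually used, so typing a new character in the "same" font may still require a full local copy of it.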
