Two common software pasteboard text processing problems
Published:
Adobe Acrobat Reader DC 在使用Adobe Acrobat Reader DC阅读文献时,我发现一个小问题:在复制文献内容时,如果有单词包含’ff’/’fi’等字符串,复制出来后会被错误符号替代,在不同的文本处理界面上可能显示为错误符或者空白。这个错误的复现概率非常高,且改用Chrome内置的PDF阅读功能后即消失,故初步判断原因为Acrobat Reader在编解码过程中的错误处理。
When using Adobe Acrobat Reader DC to read the document, I found a small problem: when copying the content of the document, if there is a word that contains strings such as’ff’/’fi’, it will be replaced by an incorrect symbol after being copied, in a different text Error characters or blanks may be displayed on the processing interface. The recurrence probability of this error is very high, and it disappears after switching to Chrome’s built-in PDF reading function. Therefore, the preliminary judgment is due to the error handling of Acrobat Reader in the encoding and decoding process.
Figure 1. Issue example.
经过考证,这个问题长期且普遍存在。我在Tex论坛上找到了一些端倪:TeX编码标准将某些组合识别为连字 [1]。很显然,在对Tex编译结果的处理这项工作上,Chrome比Adobe做得更好一些。
After research, this problem is long-term and widespread. I found some clues on the Tex forum: The TeX coding standard recognizes certain combinations as ligatures [1]. Obviously, Chrome does a better job than Adobe in processing the results of Tex compilation.
Figure 2. TeX’s Roman Fonts [1].
OneNote to WeChat 从OneNote上复制文本到微信的Windows PC客户端,会变成字非常小的图片。鉴于记事本和网页文本输入框在这项工作上的良好表现,以及不能且不应当反编译微信PC客户端(虽然我知道用的是魔改版Electron框架),暂时猜测这是微信团队对粘贴板内容处理的设计所导致。
Copying text from OneNote to WeChat’s Windows PC client will turn into very small pictures. In view of the good performance of the notepad and web text input box in this work, and the inability and should not to decompile the WeChat PC client (although I know that the Electron framework is modified by the magic version), I guess this is the WeChat team’s response to the pasteboard. The content processing is superfluous due to design.
Figure 3. Issue example.
Reference
- TeX Book, Chapter 9, TeX’s Roman Fonts, p51.