In recent news, DeepSeek has open-sourced its latest research, the DeepSeek-OCR model, on GitHub. The newly released optical character recognition (OCR) model has approximately 3 billion parameters and marks the team's first exploration into the feasibility of using "optical 2D mapping compression" to process long text contexts.
The DeepSeek-OCR model consists of two main components: a DeepEncoder and a DeepSeek3B-MoE-A570M decoder. The DeepEncoder is designed to operate efficiently on high-resolution inputs, keeping activation memory low while achieving high compression ratios and producing a suitably small number of vision tokens. The decoder then reconstructs the textual content from these vision tokens.
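The two-stage flow described above can be sketched conceptually. Everything in this mock (function names, token counts, the fixed 10:1 ratio) is an illustrative assumption, not the actual DeepSeek-OCR API:

```python
# Conceptual mock of the two-stage pipeline: an encoder compresses a
# high-resolution page into a small set of vision tokens, and a decoder
# maps those tokens back to text. All names and numbers are illustrative.
from dataclasses import dataclass


@dataclass
class VisionTokens:
    count: int  # number of compressed vision tokens for one page


def deep_encoder(page_text_tokens: int, ratio: int = 10) -> VisionTokens:
    # Compresses roughly `ratio` text tokens into one vision token.
    return VisionTokens(count=max(1, page_text_tokens // ratio))


def moe_decoder(tokens: VisionTokens, ratio: int = 10) -> int:
    # Reconstructs approximately the original number of text tokens.
    return tokens.count * ratio


vis = deep_encoder(1000)   # ~1,000 text tokens -> 100 vision tokens
print(vis.count)           # 100
print(moe_decoder(vis))    # 1000
```

The point of the sketch is only the shape of the data flow: the decoder never sees the original text, just the compressed vision tokens.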
According to the experimental results, the model performs well when the number of text tokens stays within ten times the number of vision tokens, i.e., a compression ratio below 10x: OCR decoding accuracy reaches about 97%. Even when the compression ratio is pushed to 20x, accuracy remains around 60%, demonstrating the approach's robustness and its potential for handling heavily compressed long texts.
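The compression ratio quoted above is simply the count of text tokens divided by the count of vision tokens. A minimal sketch, where the token counts are made-up illustrative values rather than figures from the paper:

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of original text tokens to compressed vision tokens."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Illustrative page: ~1,000 text tokens rendered into 100 vision tokens.
print(compression_ratio(1000, 100))  # 10.0

# Per the reported results: ratios up to ~10x keep accuracy near 97%,
# while ~20x falls to roughly 60%.
print(compression_ratio(1000, 50))   # 20.0
```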
The research team emphasizes that this development opens new avenues for the study of long contextual compression techniques and offers fresh insights into the memory and forgetting mechanisms within large language models. This breakthrough could pave the way for more efficient processing of lengthy textual data in various artificial intelligence applications.